Date: 2010jun29
Language: perl
Q. What is the best way to parse XML or HTML?
A. For quick extraction from simple, predictable markup, use a non-greedy match with the s and g options.
For example, if you have an RSS feed, which is XML with
multiple <item> elements, do this:
my @a = $content =~ m|<item>(.*?)</item>|sg;
for my $i (@a)
{
    print "item=$i\n";
}
Here is what's happening:
We use | to delimit the match so we don't have to escape the / in </item>.
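The .* matches any run of characters. Adding the ? makes it non-greedy,
so the match stops at the first </item> and we get one <item> at a time.
Without the ?, the .* would run from the first <item> to the last </item>,
because the tags in between are just more characters.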
The s option lets . match newlines too, so the text is treated as one
long line. So this works if an <item> spans several lines.
The g option gets all (global) matches. In list context, as here, it
returns the captured text from every match.
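To see what the ? and the s option each contribute, here is a small sketch that runs the same pattern three ways. The feed text in $content is made up for illustration, and the names @lazy, @greedy and @no_s are just labels for this example.

```perl
use strict;
use warnings;

# Hypothetical sample feed; the first item spans two lines.
my $content = "<rss>\n<item>first\nentry</item>\n<item>second</item>\n</rss>\n";

# Non-greedy with s and g: one capture per <item>, newlines included.
my @lazy = $content =~ m|<item>(.*?)</item>|sg;

# Greedy: .* runs on to the LAST </item>, so both items land in one capture.
my @greedy = $content =~ m|<item>(.*)</item>|sg;

# Without s, . cannot cross a newline, so the two-line item is missed.
my @no_s = $content =~ m|<item>(.*?)</item>|g;

print scalar(@lazy), " non-greedy, ", scalar(@greedy), " greedy, ",
      scalar(@no_s), " without s\n";
```

With this sample, @lazy holds both items, @greedy holds a single string that swallows the tags between them, and @no_s holds only the one-line item.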