Dave's Brain

Perl: use non-greedy matches to quickly parse XML

Date: 2010jun29
Language: perl

Q.  What is the best way to parse XML or HTML?

A.  Use non-greedy matches, along with the s and g options.

For example, if you have an RSS feed, which is XML with
multiple <item> elements, do this:

	# $content holds the raw text of the feed
	my @a = $content =~ m|<item>(.*?)</item>|sg;

	for my $i (@a)
	{
		print "item=$i\n";
	}
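
As a self-contained sketch of the same idea, the script below runs the match against a small made-up feed and then applies a second non-greedy match inside each item to pull out its title. The sample feed, the example.com links and the @titles variable are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample feed; a real one would be read from a file or fetched.
my $content = <<'END_FEED';
<rss><channel>
<item>
<title>First post</title>
<link>http://example.com/1</link>
</item>
<item>
<title>Second post</title>
<link>http://example.com/2</link>
</item>
</channel></rss>
END_FEED

my @titles;
for my $item ($content =~ m|<item>(.*?)</item>|sg)
{
	# Same non-greedy idea, applied inside a single item.
	my ($title) = $item =~ m|<title>(.*?)</title>|s;
	push @titles, $title if defined $title;
}

print "title=$_\n" for @titles;
```

This only works when the tags are written exactly as the pattern expects; an <item> with attributes, or items nested inside items, would need a different pattern.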

Here is what's happening:

We use | to delimit the match so we don't have to escape the / in </item>.

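The .* matches any run of characters.  Adding the ? makes it non-greedy,
so the match stops at the first </item> and we get one <item> at a time.
A greedy .* would run on to the last </item>, because the tags in
between are just more characters to it.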
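
A small sketch of the difference, using a made-up one-line string with two items. The variable names @greedy and @lazy are just for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: two items on one line.
my $content = '<item>first</item><item>second</item>';

# Greedy: .* runs on to the LAST </item>, so there is a single capture
# that swallows the tags in between.
my @greedy = $content =~ m|<item>(.*)</item>|sg;

# Non-greedy: .*? stops at the FIRST </item>, giving one capture per item.
my @lazy = $content =~ m|<item>(.*?)</item>|sg;

print "greedy: ", scalar(@greedy), " match: $greedy[0]\n";
print "lazy:   ", scalar(@lazy), " matches: @lazy\n";
```

The greedy version finds one match containing "first</item><item>second", while the non-greedy version finds "first" and "second" separately.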

The s option lets . match newlines too, so the whole string is
treated as a single line.  So this works if an item spans several lines.
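
To see what the s option changes, here is a minimal sketch with a made-up item that spans three lines. Without s, the . cannot cross a newline and the match fails.

```perl
use strict;
use warnings;

# Hypothetical sample input: one item spread over several lines.
my $content = "<item>\n<title>Hello</title>\n</item>";

# Without s, . does not match "\n", so nothing is found.
my @without_s = $content =~ m|<item>(.*?)</item>|g;

# With s, . matches "\n" as well, so the whole item body is captured.
my @with_s = $content =~ m|<item>(.*?)</item>|sg;

print "without s: ", scalar(@without_s), " match(es)\n";
print "with s:    ", scalar(@with_s), " match(es)\n";
```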

The g option gets all (global) matches.  In list context the match
returns every captured item, which is what fills @a.
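
A short sketch of the g option, again on made-up input. It compares a list-context match with and without g, and also shows the common alternative of looping with while in scalar context, where each pass picks up the next match in $1.

```perl
use strict;
use warnings;

# Hypothetical sample input: three items.
my $content = '<item>a</item><item>b</item><item>c</item>';

# Without g, list context returns only the captures of the first match.
my @first_only = $content =~ m|<item>(.*?)</item>|s;

# With g, list context returns the capture from every match.
my @all = $content =~ m|<item>(.*?)</item>|sg;

# In scalar context, g makes each pass of the loop continue where the
# previous match ended.
my @seen;
while ($content =~ m|<item>(.*?)</item>|sg)
{
	push @seen, $1;
}

print "first only: @first_only\n";
print "all:        @all\n";
print "loop:       @seen\n";
```

The while form is handy when each item should be processed as it is found rather than collected into an array first.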

Copyright © 2008-2012, dave - Code samples on Dave's Brain are licensed under the Creative Commons Attribution 2.5 License. However, other material, including English text, has all rights reserved.