I am mikek's Blog

The personal blog of Mike Karthauser, who you may know from Brightstorm Limited or the internet.

PHP: Parsing HTML to find Links

From blogging to log analysis and search engine optimisation (SEO), people are looking for scripts that can parse web pages and RSS feeds from other websites, for example to see where their traffic is coming from.
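The RSS side of this is usually the easier part, because feeds are meant to be XML. Below is a minimal PHP sketch, assuming an RSS 2.0 feed that is well-formed XML, that lists the `<link>` of every `<item>` using SimpleXML. The function name `rss_item_links` is hypothetical and chosen only for illustration.

```php
<?php
// Minimal sketch (hypothetical helper): list the <link> of every <item>
// in an RSS 2.0 feed. Assumes the feed is well-formed XML.
function rss_item_links(string $xml): array
{
    // Collect libxml errors internally instead of printing warnings,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Malformed XML, or a document without <channel>, yields no links.
    if ($feed === false || !isset($feed->channel)) {
        return [];
    }

    $links = [];
    foreach ($feed->channel->item as $item) {
        // Cast the element to a string and strip surrounding whitespace.
        $link = trim((string) $item->link);
        if ($link !== '') {
            $links[] = $link;
        }
    }
    return $links;
}
```

Real-world feeds are not always valid XML, so a failed parse here returns an empty list rather than a partial result.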

Parsing your own HTML should be no problem, assuming you use consistent formatting, but once you set your sights on parsing other people’s HTML, the frustration really sets in. This article at Art of Web presents some regular expressions, along with a commentary, that will hopefully point you in the right direction.
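As a rough illustration of the regular-expression approach, here is a minimal PHP sketch that collects the `href` values of `<a>` tags. The function name `extract_links` and the pattern itself are assumptions made for this example, not the expressions from the Art of Web article. It copes with double-quoted, single-quoted and unquoted attributes and decodes entities such as `&amp;`, but it will still trip over unusual markup, such as a `>` inside another attribute's value.

```php
<?php
// Minimal sketch (hypothetical helper): extract href values from <a> tags
// with a regular expression. Illustrative pattern only; messy third-party
// HTML can still defeat it.
function extract_links(string $html): array
{
    // <a, then any attributes (without crossing '>'), then whitespace + href=
    // followed by a "double", 'single' or unquoted value. Requiring whitespace
    // before "href" avoids matching attributes such as data-href.
    $pattern = '/<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';

    $links = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Only one of the three groups matched; trailing unmatched
            // groups are absent from $m, hence the ?? fallbacks.
            $url = $m[1] ?? '';
            if ($url === '') {
                $url = $m[2] ?? '';
            }
            if ($url === '') {
                $url = $m[3] ?? '';
            }
            if ($url !== '') {
                // Turn entities like &amp; back into plain characters.
                $links[] = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
            }
        }
    }
    return $links;
}
```

For HTML from other sites, PHP's built-in `DOMDocument::loadHTML()` combined with `getElementsByTagName('a')` is a more forgiving alternative to hand-written patterns, at the cost of a little more code.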


Filed under: development
