Tuesday, 20 August 2013

Regex with preg_match_all on HTML a tags to retrieve names, ids and hrefs

I've been trying to use regular expressions to extract information from
HTML a tags, for three separate forms the tag may take:
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a href="#Anchor_Two" name="Anchor_Two" > Anchor Two Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
So far I have a regular expression that extracts all the attributes from
a given HTML tag:
/(\\w+)\s*=\\s*("[^"]*"|\'[^\']*\'|[^"\'\\s>]*)/
I also have one that matches links whose href attribute is set:
/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU
But I can't seem to create a pattern that matches the other combinations
of attributes a link tag may have:
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
Links that do not have the href attribute set are not picked up by my
current pattern, so not all the anchors can be retrieved.
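
One possible way around this, shown here only as a minimal sketch, is to
drop href from the tag pattern and work in two passes: first match every
a tag that has at least one attribute, then run the attribute pattern over
the captured attribute string. The variable names ($tagPattern,
$attrPattern, $anchors) and the 'text' key are made up for illustration,
and the sketch assumes simple markup like the three examples above (no
nested a tags, no ">" inside attribute values).

```php
<?php
// Sample input: the three anchor forms from the question.
$html = <<<'HTML'
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a href="#Anchor_Two" name="Anchor_Two" > Anchor Two Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
HTML;

// Pass 1: match every <a ...>...</a> pair without requiring href.
// Group 1 is the raw attribute string, group 2 the link text.
$tagPattern = '/<a\s([^>]*)>(.*?)<\/a>/si';

// Pass 2: the attribute pattern from the question, written for a
// single-quoted PHP string. Group 1 is the name, group 2 the value.
$attrPattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';

$anchors = [];
preg_match_all($tagPattern, $html, $tags, PREG_SET_ORDER);
foreach ($tags as $tag) {
    // 'text' is a hypothetical key for the link text, not an HTML attribute.
    $anchor = ['text' => trim($tag[2])];
    preg_match_all($attrPattern, $tag[1], $attrs, PREG_SET_ORDER);
    foreach ($attrs as $attr) {
        // Strip the surrounding quotes, if any, from the value.
        $anchor[strtolower($attr[1])] = trim($attr[2], "\"'");
    }
    $anchors[] = $anchor;
}

print_r($anchors);
```

With this approach each entry in $anchors carries whichever of id, name
and href the tag actually had, so anchors without href are no longer
skipped. Regular expressions remain fragile on real-world HTML; for
anything beyond simple cases, PHP's DOMDocument class is likely a more
robust choice.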
