python - How do I use regular expressions to parse HTML tags? -

- April 15, 2012

I was wondering how to extract the value of an HTML element using a regular expression (in Python, preferably).

For example, <a href="http://google.com"> hello world! </a>

What regex would I use to extract hello world! from the above HTML?

    >>> from BeautifulSoup import BeautifulSoup
    >>> html = '<a href="http://google.com"> hello world! </a>'
    >>> soup = BeautifulSoup(html)
    >>> soup.a.string
    u' hello world! '
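For comparison, here is a rough Python 3 sketch that uses only the standard library. It shows a regex that answers the literal question for this one snippet, next to a parser-based version built on `html.parser`. The `AnchorText` class is a hypothetical helper written for this illustration, not part of any library, and the regex is only expected to hold up on simple, well-formed input like the example.

```python
import re
from html.parser import HTMLParser

html = '<a href="http://google.com"> hello world! </a>'

# Regex approach: fine for this exact snippet, but it breaks on nested tags,
# attribute values containing '>', comments, and similar real-world markup.
match = re.search(r'<a\b[^>]*>(.*?)</a>', html, re.IGNORECASE | re.DOTALL)
regex_text = match.group(1) if match else None


class AnchorText(HTMLParser):
    """Collect the text inside every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside an <a> element
        self.texts = []   # one string per <a> element seen

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.depth += 1
            self.texts.append('')

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


parser = AnchorText()
parser.feed(html)
parser.close()
parser_text = parser.texts[0]

print(repr(regex_text))
print(repr(parser_text))
```

Both approaches return the same string here, surrounding spaces included. The difference only shows up on messier pages, where the parser keeps working and the pattern usually needs patching.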

This, for instance, will print out all the links on a page:

    # Python 2, BeautifulSoup 3
    import urllib2
    from BeautifulSoup import BeautifulSoup

    q = urllib2.urlopen('https://stackoverflow.com/questions/3884419/')
    soup = BeautifulSoup(q.read())

    # Print each anchor's text and target, falling back to its id
    for link in soup.findAll('a'):
        if link.has_key('href'):
            print str(link.string) + " -> " + link['href']
        elif link.has_key('id'):
            print "id: " + link['id']
        else:
            print "???"

Output:

    Stack Exchange -> http://stackexchange.com
    log in -> /users/login?returnurl=%2fquestions%2f3884419%2f
    careers -> http://careers.stackoverflow.com
    meta -> http://meta.stackoverflow.com
    ...
    id: flag-post-3884419
    None -> /posts/3884419/revisions
    ...
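The same link listing can be sketched in Python 3 with only the standard library, which may be handy when BeautifulSoup is not installed. This is an approximation, not a port: `LinkLister` is a hypothetical class, the `sample` markup is invented for the demonstration (it is not the real page), and the text handling only roughly mirrors `link.string` by printing `None` when an anchor contains no text.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Record one line per <a> element: text -> href, id, or '???'."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._current = None   # attribute dict of the <a> being read, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._current = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != 'a' or self._current is None:
            return
        # An anchor with no text (for example, only an image) yields None
        text = ''.join(self._text).strip() or None
        if 'href' in self._current:
            self.lines.append('%s -> %s' % (text, self._current['href']))
        elif 'id' in self._current:
            self.lines.append('id: ' + self._current['id'])
        else:
            self.lines.append('???')
        self._current = None


# Invented sample markup standing in for a downloaded page
sample = '''
<a href="http://stackexchange.com">Stack Exchange</a>
<a id="flag-post-1">flag</a>
<a href="/posts/1/revisions"><img src="edit.png"></a>
<a name="top">top</a>
'''

lister = LinkLister()
lister.feed(sample)
lister.close()
for line in lister.lines:
    print(line)
```

To run it against a live page, the `sample` string could be replaced with the decoded body returned by `urllib.request.urlopen(...)`.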
