Wednesday, September 30, 2009

Python - regular expression backreference example

# use the power of regular expressions
# bite the bullet and review the regular expression syntax 

import re

# let's say you have created the next search engine
# your search engine extracts the contents of
# the <title></title> tags
theString = """ <lots of garbage and # what not and this

title is going to be cool> <myTitle> will be awesome.
And once you get <title>the title is here</title> and then
there is the end """

# you compile a regular expression to search
# for the contents of the title tag
# (this is where the regular expression syntax
# comes in handy)
# the one thing to notice is that there are
# parentheses surrounding the contents of the title tag.
# They form a capturing group (often loosely called a
# backreference).  Once we've run the search
# we'll be able to refer to whatever the group matched.
# the raw string (r'...') keeps backslashes literal, and '/'
# needs no escaping in a Python regular expression
p = re.compile(r'<title>(.+)</title>')

# now search theString
m = p.search(theString)

# you can test whether or not your
# regular expression was successful
if m:
    print("regular expression search successful!")
    # referencing group #1 returns the first parenthesized group
    print("the title contents are:", m.group(1))
    # group #0 is the entire regular expression result
    print("the entire regular expression returned:", m.group(0))
else:
    print("regular expression search returns no results")

#   regular expression search successful!
#   the title contents are: the title is here
#   the entire regular expression returned: <title>the title is here</title>
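
Strictly speaking, a backreference is a reference to a group from inside the pattern itself (or from a replacement string), written as \1, \2 and so on. The sketch below is only an illustration of that idea and is not part of the search engine example: the names tag_pattern, sample and swapped, and the sample strings, are made up for the demonstration.

```python
import re

# \1 inside the pattern refers back to whatever group 1 captured,
# so the closing tag must carry the same name as the opening tag.
# (tag_pattern and sample are hypothetical names for this sketch)
tag_pattern = re.compile(r'<(\w+)>(.+?)</\1>')

# <b>...</i> is mismatched on purpose, so it should be skipped
sample = "<b>bold</i> and <title>the title is here</title>"

match = tag_pattern.search(sample)
if match:
    tag_name = match.group(1)   # the tag name that \1 had to repeat
    tag_body = match.group(2)   # the text between the matching tags
    print("tag:", tag_name)
    print("contents:", tag_body)

# backreferences also work in replacement strings:
# \2 and \1 swap the two captured words around the '@'
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', "user@host")
print(swapped)
```

Regular expressions are still a poor fit for parsing arbitrary HTML; for anything beyond a quick extraction like this one, a real HTML parser is probably the safer choice.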

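
One caveat about the (.+) group used for the title: + is greedy, so if the text ever contained more than one title tag, the group would run from the first opening tag to the last closing tag. A minimal sketch of the difference, assuming a made-up input string (two_titles) that is not in the original example:

```python
import re

# hypothetical input with two title tags, for illustration only
two_titles = "<title>first</title> filler <title>second</title>"

# greedy: .+ grabs as much as it can before the LAST </title>
greedy = re.search(r'<title>(.+)</title>', two_titles)
# non-greedy: .+? stops at the FIRST </title> it can reach
lazy = re.search(r'<title>(.+?)</title>', two_titles)

greedy_text = greedy.group(1)
lazy_text = lazy.group(1)
print("greedy:", greedy_text)
print("non-greedy:", lazy_text)

# findall returns the contents of the group for every match
all_titles = re.findall(r'<title>(.+?)</title>', two_titles)
print(all_titles)
```

With a single title tag, as in the example above, both forms give the same result, so the greedy version is fine there.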
