Monday, September 21, 2009

Python - browse the web with Python

# retrieve the html from a web site 
import urllib2
import urllib

# query string
# in this example these GET parameters don't
# do anything. They are just here to show
# off the urlencode() function
qs = {}
qs['q'] = "items to search for"
qs['i'] = 22
qs_values = urllib.urlencode(qs)

# append everything together
url = "http://www.example.com"
full_url = url + '?' + qs_values
print "my full url: %s" %(full_url)

# get the data from the web
data = urllib2.urlopen(full_url)

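# data now has all the html/css/javascript in it
# each line keeps its trailing newline and print adds another,
# which is why the output below is double-spaced
for item in data:
    print item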


## output:
##my full url: http://www.example.com?q=items+to+search+for&i=22
##<HTML>
##
##<HEAD>
##
## <TITLE>Example Web Page</TITLE>
##
##</HEAD>
##
##<body>
##
##<p>You have reached this web page by typing &quot;example.com&quot;,
##
##&quot;example.net&quot;,
##
## or &quot;example.org&quot; into your web browser.</p>
##
##<p>These domain names are reserved for use in documentation and are not available
##
## for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC
##
## 2606</a>, Section 3.</p>
##
##</BODY>
##
##</HTML>
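
The listing above is Python 2 (urllib2 and the print statement). On Python 3 the same pieces live in urllib.parse and urllib.request. Below is a minimal sketch of the equivalent steps, not a drop-in replacement: the helper names build_url and fetch are made up for illustration, and the UTF-8 decoding and 10 second timeout are assumptions rather than anything the original script does.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base, params):
    """Return base with params appended as an encoded query string."""
    # hypothetical helper: mirrors url + '?' + urllib.urlencode(qs) above
    return base + "?" + urlencode(params)


def fetch(url, timeout=10):
    """Download url and return the body as text."""
    # assumption: the page is UTF-8; undecodable bytes are replaced
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# same query string as the original example
full_url = build_url("http://www.example.com",
                     {"q": "items to search for", "i": 22})
print("my full url: %s" % full_url)
```

To actually download the page, call print(fetch(full_url)). That step needs network access, so it is left out of the block itself.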
