python - Downloading a webpage using urllib2 results in garbled junk? (only sometimes)


Why do I get readable HTML text when I request this page:

http://itunes.apple.com/us/app/mobile/id381057839

but garbled junk when I request this one?

http://itunes.apple.com/us/app/mobile/id375562663

I use the same download() function in Python for both:

    import socket
    import urllib2

    def download(source_url):
        try:
            socket.setdefaulttimeout(10)
            agent = "mozilla/5.0 (windows; u; windows nt 6.1; en-us; rv:1.9.2.10) gecko/20100914 alexatoolbar/alxf-1.54 firefox/3.6.10 gtb7.1"
            ree = urllib2.Request(source_url)
            ree.add_header('user-agent', agent)
            ree.add_header("accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
            ree.add_header("accept-language", "en-us,en;q=0.5")
            ree.add_header("accept-charset", "iso-8859-1,utf-8;q=0.7,*;q=0.7")
            # This header tells the server that a compressed reply is acceptable
            ree.add_header("accept-encoding", "gzip,deflate")
            ree.add_header("host", "itunes.apple.com")
            resp = urllib2.urlopen(ree)
            htmlsource = resp.read()
            return htmlsource
        except Exception, e:
            print e

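Solved. It was a compression issue: the request sends an Accept-Encoding header that allows gzip, so the server may reply with a gzip-compressed body, and urllib2 does not decompress it automatically. Decompressing the response fixes it: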

    import gzip
    import random
    import socket
    import StringIO
    import urllib2

    def download(source_url):
        try:
            socket.setdefaulttimeout(10)
            agents = ['mozilla/4.0 (compatible; msie 5.5; windows nt 5.0)',
                      'mozilla/4.0 (compatible; msie 7.0b; windows nt 5.1)',
                      'microsoft internet explorer/4.0b1 (windows 95)',
                      'opera/8.00 (windows nt 5.1; u; en)']
            ree = urllib2.Request(source_url)
            ree.add_header('user-agent', random.choice(agents))
            ree.add_header('accept-encoding', 'gzip')
            opener = urllib2.build_opener()
            resp = opener.open(ree)
            h = resp.read()
            # Only gunzip when the server says the body is gzip-compressed;
            # a plain reply is returned unchanged instead of raising an error.
            if resp.info().get('Content-Encoding') == 'gzip':
                compressedstream = StringIO.StringIO(h)
                gzipper = gzip.GzipFile(fileobj=compressedstream)
                return gzipper.read()
            return h
        except Exception, e:
            print e
            return ""
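For anyone on Python 3, where urllib2 no longer exists, here is a rough sketch of the same idea with the decompression step pulled out into its own function so it can be checked without hitting the network. The names decode_body and download are my own, and the use of urllib.request plus handling of "deflate" are assumptions that go beyond the code above, so treat this as a starting point rather than a drop-in replacement.

```python
import gzip
import urllib.request
import zlib


def decode_body(raw, content_encoding):
    """Decompress raw response bytes according to a Content-Encoding value.

    decode_body is a hypothetical helper name, not part of any library.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # Most servers send a zlib-wrapped deflate stream
            return zlib.decompress(raw)
        except zlib.error:
            # Some send a raw deflate stream with no zlib header
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No (or unknown) encoding: hand the bytes back untouched
    return raw


def download(source_url, timeout=10):
    """Fetch a URL and return the body as bytes, decompressed if needed."""
    request = urllib.request.Request(
        source_url, headers={"Accept-Encoding": "gzip"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))


if __name__ == "__main__":
    # Offline check of the decompression step
    page = b"<html><body>hello</body></html>"
    print(decode_body(gzip.compress(page), "gzip") == page)
```

Checking the Content-Encoding response header, rather than always gunzipping, is what makes this work for both the pages that come back compressed and the ones that come back as plain HTML.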
