Python HTTP redirect requests forbidden
I'm trying to scrape a website whose URL is redirected, but requesting it programmatically gives me a 403 error code (Forbidden). I can paste the URL into a browser, and the browser follows the redirect without trouble.
To show a simple example, I'm trying to go to: http://en.wikipedia.org/w/index.php?title=mike_tyson
I've tried urllib2 and mechanize, but neither works. I'm new to web programming and was wondering whether there are other tricks I need in order to follow the redirect.
Thanks!
Edit
Okay, so I messed up. I was looking for alternative methods because I was trying to scrape an mp3. The download itself succeeded, but the mp3 came out mangled.
It turns out this was somehow related to downloading on Windows, or to my current Python version. I tested the code on an Ubuntu distro and the mp3 file downloaded fine.
So I just used a simple urllib2.urlopen and it worked perfectly!
I wonder why downloading on Windows mangled the mp3?
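One common cause of this symptom, though nothing in the post confirms it was the cause here, is writing the downloaded bytes to a file opened in text mode. On Windows, text mode translates newline bytes, which corrupts binary data such as an mp3, while on Linux text and binary mode behave the same for this purpose. Below is a minimal sketch of a binary-safe download, assuming Python 3's urllib.request rather than the urllib2 module used in the post. The helper names save_stream and download_binary are hypothetical.

```python
import shutil
import urllib.request


def save_stream(stream, dest_path):
    """Copy a binary file-like object to dest_path without altering any bytes."""
    # "wb" opens the file in binary mode, so no newline translation happens
    # on any platform, including Windows.
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(stream, out)


def download_binary(url, dest_path):
    """Fetch url and store the raw response body at dest_path (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        save_stream(response, dest_path)
```

Calling download_binary with the mp3's URL and a local file name would write the file byte for byte. The only part that matters for the Windows symptom is the "wb" mode.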
Try changing the mechanize flag so that it does not respect robots.txt. Also, consider changing the User-Agent HTTP header:
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.set_handle_robots(False)
>>> br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
Web servers will then treat you as if you were running MS Internet Explorer 6 rather than a bot. If you ignore the robots.txt restrictions, your bot will continue to work until it gets blocked.
>>> br.open('http://en.wikipedia.org/w/index.php?title=mike_tyson')  # doctest: +ELLIPSIS
<response_seek_wrapper at 0x... whose wrapped object = <closeable_response at 0x... whose fp = <socket._fileobject object at 0x...>>>
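The same User-Agent idea can be sketched with the standard library alone. This assumes Python 3's urllib.request (the post itself uses Python 2's urllib2), and the names BROWSER_UA, build_request and fetch are hypothetical helpers for illustration, not part of any library.

```python
import urllib.request

# User-Agent string taken from the mechanize example above.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def build_request(url, user_agent=BROWSER_UA):
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA):
    """Open url, following redirects, and return (final_url, body_bytes)."""
    # urlopen follows HTTP redirects by default; geturl() reports where
    # the request finally landed.
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.geturl(), response.read()
```

Calling fetch('http://en.wikipedia.org/w/index.php?title=mike_tyson') would return the redirect target and the page body, provided the server accepts the header. Whether a particular site still answers 403 depends on that site's own policy.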