Python HTTP Redirect requests forbidden

- June 15, 2013

I'm trying to scrape a website whose URL is redirected, but requesting it programmatically gives me a 403 error code (Forbidden). I can paste the URL into a browser, and the browser follows the redirect without any problem...

To show a simple example, I'm trying to go to: http://en.wikipedia.org/w/index.php?title=mike_tyson

I've tried urllib2 and mechanize, but neither works. I'm fairly new to web programming and was wondering whether there are other tricks I need in order to follow the redirect!

Thanks!

Edit

Okay, so I really messed this up. I was originally looking for alternative methods because I was trying to scrape an MP3. I was managing to download the MP3 successfully, but it came out mangled.

It turns out this was somehow related to downloading it on Windows, or to my current Python version. I tested the code on my Ubuntu distro and the MP3 file downloaded perfectly fine...

So I just used a simple urllib2.urlopen and it worked perfectly!

I wonder why downloading on Windows mangled the MP3?
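The post never confirms the cause, but one common explanation for this symptom is that the output file was opened in text mode. In Python 2 on Windows, text mode rewrites every newline byte as a carriage return plus newline on write, which corrupts binary data such as an MP3. Ubuntu does no such translation, which would match the behaviour described. The sketch below assumes that explanation and shows the binary-mode write; the `save_binary` helper, the stand-in payload, and the file name are hypothetical.

```python
import os
import tempfile


def save_binary(data, path):
    """Write raw bytes to `path` with no newline translation."""
    # "wb" is the important part: binary mode writes the bytes unchanged
    # on every platform, whereas text mode ("w") may alter newline bytes.
    with open(path, "wb") as f:
        f.write(data)


# Hypothetical stand-in for downloaded MP3 bytes; it deliberately
# contains newline bytes, which text mode on Windows would alter.
payload = b"ID3\x03\x00\n\r\n\xff\xfb\x90\n"
path = os.path.join(tempfile.gettempdir(), "example.mp3")
save_binary(payload, path)

# Reading back in binary mode should return exactly what was written.
with open(path, "rb") as f:
    roundtrip = f.read()
print(roundtrip == payload)
```

With a real download, the bytes returned by the response's `read()` would be passed to `save_binary` in place of the stand-in payload.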

Try setting the mechanize flag so that it does not respect robots.txt. Also, consider changing your User-Agent HTTP header:

>>> import mechanize
>>> br = mechanize.Browser()
>>> br.set_handle_robots(False)
>>> br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]

Web servers will then treat you as if you were running MS Internet Explorer 6, rather than as a bot. If the site restricts bots in its robots.txt, your bot may continue to work only until it gets blocked.

>>> br.open('http://en.wikipedia.org/w/index.php?title=mike_tyson')  #doctest: +ELLIPSIS
<response_seek_wrapper at 0x... whose wrapped object = <closeable_response at 0x... whose fp = <socket._fileobject object at 0x...>>>
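The same User-Agent idea can be sketched with the standard library alone. The example below is an assumption-laden sketch rather than part of the original answer: it uses Python 3's `urllib.request` (the successor to `urllib2`), and the helper names `build_request` and `fetch` are hypothetical. `urlopen` follows HTTP redirects by default, so no extra handler is needed for that part.

```python
import urllib.request

# Header value copied from the mechanize example above.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Open `url`, following redirects, and return (final_url, body_bytes)."""
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        # geturl() reports the URL that was finally retrieved,
        # which differs from `url` if a redirect was followed.
        return resp.geturl(), resp.read()


# Building the request involves no network traffic, so the header
# can be inspected before anything is sent.
req = build_request("http://en.wikipedia.org/w/index.php?title=mike_tyson")
print(req.get_header("User-agent"))
```

Calling `fetch(...)` with the same URL performs the actual request. Whether a given server accepts it still depends on that server's own policy.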
