python - Passing arguments inside Scrapy spider through lambda callbacks
Hi,

I have a short spider:

    from scrapy.contrib.spiders import CrawlSpider
    from scrapy.http import Request

    class TestSpider(CrawlSpider):
        name = "test"
        allowed_domains = ["google.com", "yahoo.com"]
        start_urls = ["http://google.com"]

        def parse2(self, response, i):
            print "page2, i: ", i
            # traceback.print_stack()

        def parse(self, response):
            for i in range(5):
                print "page1 : ", i
                link = "http://www.google.com/search?q=" + str(i)
                yield Request(link, callback=lambda r: self.parse2(r, i))

I expect this output:

    page1 :  0
    page1 :  1
    page1 :  2
    page1 :  3
    page1 :  4
    page2 :  0
    page2 :  1
    page2 :  2
    page2 :  3
    page2 :  4

However, the actual output is this:

    page1 :  0
    page1 :  1
    page1 :  2
    page1 :  3
    page1 :  4
    page2 :  4
    page2 :  4
    page2 :  4
    page2 :  4
    page2 :  4

So the argument passed in callback=lambda r: self.parse2(r, i) somehow goes wrong.

What's wrong with the code?
The lambdas are accessing i, which is held in a closure, so they all reference the same variable: the value of i in your parse function at the time the lambdas are called. A simpler reconstruction of the phenomenon:

    >>> def do(x):
    ...     for i in range(x):
    ...         yield lambda: i
    ...
    >>> delayed = list(do(3))
    >>> for d in delayed:
    ...     print d()
    ...
    2
    2
    2

You can see that the i in each lambda is bound to the variable i in the function do. Each lambda returns whatever value it currently has, and Python keeps that scope alive for as long as any of the lambdas are alive, to preserve the value for them. This is what's referred to as a closure.
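To make it visible that the lookup happens at call time rather than at definition time, here is a small sketch (plain Python, not part of the original answer; the name immediate is just for illustration) that calls each lambda while the loop is still running: because i still holds its current value at the moment of the call, the "expected" numbers come out.

    >>> def immediate(x):
    ...     for i in range(x):
    ...         f = lambda: i
    ...         print f()   # called while i still holds the loop's current value
    ...
    >>> immediate(3)
    0
    1
    2

It is only when the calls are deferred until after the loop has finished, as in do above, that every lambda sees the final value.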
A simple but ugly workaround:

    >>> def do(x):
    ...     for i in range(x):
    ...         yield lambda i=i: i
    ...
    >>> delayed = list(do(3))
    >>> for d in delayed:
    ...     print d()
    ...
    0
    1
    2

This works because, inside the loop, the current value of i is bound to the parameter i of the lambda. Alternatively (and maybe a little bit clearer): lambda r, x=i: (r, x). The important part is that by making the assignment outside the body of the lambda (which is only executed later), you bind the parameter to the current value of i instead of the value it takes at the end of the loop. That way the lambdas are not closed over i and each can have its own value.
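If the default-argument trick feels too ugly, functools.partial is another way to freeze the current value; the argument is evaluated when partial() is called, not when the resulting callable is invoked. A minimal sketch, not from the original answer (identity is just an illustrative name):

    >>> from functools import partial
    >>> def identity(i):
    ...     return i
    ...
    >>> def do(x):
    ...     for i in range(x):
    ...         yield partial(identity, i)   # i is evaluated here, at partial() time
    ...
    >>> for d in list(do(3)):
    ...     print d()
    ...
    0
    1
    2

In the Scrapy case the same idea would look like callback=partial(self.parse2, i=i): the callback receives the response as its first positional argument, so parse2 ends up being called as self.parse2(response, i=i).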
So you need to change the line

    yield Request(link, callback=lambda r: self.parse2(r, i))

to

    yield Request(link, callback=lambda r, i=i: self.parse2(r, i))

and you're cherry.
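Scrapy also has built-in ways to carry data into a callback without a lambda at all. Below is a sketch of the same spider methods using Request.meta, plus the newer cb_kwargs variant; this is an aside, not part of the original answer:

    # Pass i through the request's meta dict; parse2 reads it back from response.meta.
    def parse(self, response):
        for i in range(5):
            link = "http://www.google.com/search?q=" + str(i)
            yield Request(link, callback=self.parse2, meta={"i": i})

    def parse2(self, response):
        i = response.meta["i"]
        print "page2, i: ", i

    # In Scrapy 1.7+ you can instead pass keyword arguments directly:
    #     yield Request(link, callback=self.parse2, cb_kwargs={"i": i})
    # and keep the original parse2(self, response, i) signature.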