python - Passing arguments inside Scrapy spider through lambda callbacks
Hi,

I have a short spider:

    from scrapy.contrib.spiders import CrawlSpider
    from scrapy.http import Request

    class TestSpider(CrawlSpider):
        name = "test"
        allowed_domains = ["google.com", "yahoo.com"]
        start_urls = ["http://google.com"]

        def parse2(self, response, i):
            print "page2, i: ", i
            # traceback.print_stack()

        def parse(self, response):
            for i in range(5):
                print "page1 : ", i
                link = "http://www.google.com/search?q=" + str(i)
                yield Request(link, callback=lambda r: self.parse2(r, i))

I expect this output:

    page1 :  0
    page1 :  1
    page1 :  2
    page1 :  3
    page1 :  4
    page2 :  0
    page2 :  1
    page2 :  2
    page2 :  3
    page2 :  4

However, the actual output is this:

    page1 :  0
    page1 :  1
    page1 :  2
    page1 :  3
    page1 :  4
    page2 :  4
    page2 :  4
    page2 :  4
    page2 :  4
    page2 :  4

So the argument passed in callback=lambda r: self.parse2(r, i) somehow goes wrong.

What's wrong with the code?
The lambdas are accessing i, which is held in a closure, so they all reference the same variable: the value of i in your parse function at the time the lambdas are called. A simpler reconstruction of the phenomenon:

    >>> def do(x):
    ...     for i in range(x):
    ...         yield lambda: i
    ...
    >>> delayed = list(do(3))
    >>> for d in delayed:
    ...     print d()
    ...
    2
    2
    2

You can see that the i in each lambda is bound to the variable i in the function do. Each lambda returns whatever value it currently has, and Python keeps that scope alive for as long as any of the lambdas are alive, to preserve the value for them. This is what's referred to as a closure.
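To make it visible that the lookup happens at call time rather than at definition time, here is a small sketch (plain Python, not part of the original answer; the name immediate is just for illustration) that calls each lambda while the loop is still running: because i still holds its current value at the moment of the call, the "expected" numbers come out.

    >>> def immediate(x):
    ...     for i in range(x):
    ...         f = lambda: i
    ...         print f()   # called while i still holds the loop's current value
    ...
    >>> immediate(3)
    0
    1
    2

It is only when the calls are deferred until after the loop has finished, as in do above, that every lambda sees the final value.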
A simple but ugly workaround:

    >>> def do(x):
    ...     for i in range(x):
    ...         yield lambda i=i: i
    ...
    >>> delayed = list(do(3))
    >>> for d in delayed:
    ...     print d()
    ...
    0
    1
    2

This works because, inside the loop, the current value of i is bound to the parameter i of the lambda. Alternatively (and maybe a little bit clearer): lambda r, x=i: (r, x). The important part is that by making the assignment outside the body of the lambda (which is only executed later), you bind the parameter to the current value of i instead of the value it takes at the end of the loop. That way the lambdas are not closed over i and each can have its own value.
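If the default-argument trick feels too ugly, functools.partial is another way to freeze the current value; the argument is evaluated when partial() is called, not when the resulting callable is invoked. A minimal sketch, not from the original answer (identity is just an illustrative name):

    >>> from functools import partial
    >>> def identity(i):
    ...     return i
    ...
    >>> def do(x):
    ...     for i in range(x):
    ...         yield partial(identity, i)   # i is evaluated here, at partial() time
    ...
    >>> for d in list(do(3)):
    ...     print d()
    ...
    0
    1
    2

In the Scrapy case the same idea would look like callback=partial(self.parse2, i=i): the callback receives the response as its first positional argument, so parse2 ends up being called as self.parse2(response, i=i).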
So you need to change the line

    yield Request(link, callback=lambda r: self.parse2(r, i))

to

    yield Request(link, callback=lambda r, i=i: self.parse2(r, i))

and you're cherry.
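Scrapy also has built-in ways to carry data into a callback without a lambda at all. Below is a sketch of the same spider methods using Request.meta, plus the newer cb_kwargs variant; this is an aside, not part of the original answer:

    # Pass i through the request's meta dict; parse2 reads it back from response.meta.
    def parse(self, response):
        for i in range(5):
            link = "http://www.google.com/search?q=" + str(i)
            yield Request(link, callback=self.parse2, meta={"i": i})

    def parse2(self, response):
        i = response.meta["i"]
        print "page2, i: ", i

    # In Scrapy 1.7+ you can instead pass keyword arguments directly:
    #     yield Request(link, callback=self.parse2, cb_kwargs={"i": i})
    # and keep the original parse2(self, response, i) signature.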