python - Passing arguments inside Scrapy spider through lambda callbacks
Hi,
I have this short spider code:
from scrapy.http import Request
from scrapy.contrib.spiders import CrawlSpider

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["google.com", "yahoo.com"]
    start_urls = ["http://google.com"]

    def parse2(self, response, i):
        print "page2, i: ", i
        # traceback.print_stack()

    def parse(self, response):
        for i in range(5):
            print "page1 : ", i
            link = "http://www.google.com/search?q=" + str(i)
            yield Request(link, callback=lambda r: self.parse2(r, i))
and I expect this output:
page1 : 0
page1 : 1
page1 : 2
page1 : 3
page1 : 4
page2 : 0
page2 : 1
page2 : 2
page2 : 3
page2 : 4
However, the actual output is this:
page1 : 0
page1 : 1
page1 : 2
page1 : 3
page1 : 4
page2 : 4
page2 : 4
page2 : 4
page2 : 4
page2 : 4
So the argument passing in callback=lambda r: self.parse2(r, i) is somehow wrong. What's wrong with the code?
The lambdas accessing i are all being held in a closure referencing the same variable, so they see the value of i in your parse function at the time the lambdas are called. A simpler reconstruction of the phenomenon is:
>>> def do(x):
...     for i in range(x):
...         yield lambda: i
...
>>> delayed = list(do(3))
>>> for d in delayed:
...     print d()
...
2
2
2
You can see that the i's in the lambdas are bound to the variable i in the function do. They will return whatever value it currently has, and Python keeps that scope alive as long as any of the lambdas are alive, in order to preserve the value for them. This is what's referred to as a closure.
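To make the late-binding behaviour concrete, here is a small additional sketch (plain Python, the do and fs names are just illustrative): the lambdas look up i when they are called, not when they are created, so rebinding i after the loop changes what every one of them returns.

>>> def do(x):
...     fs = []
...     for i in range(x):
...         fs.append(lambda: i)   # closes over the variable i, not its current value
...     i = 99                     # rebind i after the loop has finished
...     return fs
...
>>> [f() for f in do(3)]
[99, 99, 99]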
A simple but ugly workaround:
>>> def do(x):
...     for i in range(x):
...         yield lambda i=i: i
...
>>> delayed = list(do(3))
>>> for d in delayed:
...     print d()
...
0
1
2
This works because, inside the loop, the current value of i is bound to the parameter i of the lambda. Alternatively (and maybe a little bit clearer) you could write lambda r, x=i: (r, x). The important part is that by making the assignment outside the body of the lambda (which is only executed later), you bind the variable to the current value of i instead of the value it takes at the end of the loop. That makes the lambdas not closed over i, so they can each have their own value.
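If the default-argument trick feels too magical, functools.partial achieves the same early binding a bit more explicitly. A minimal sketch of the same toy example (identity here is just a throwaway helper, not anything from the question):

>>> import functools
>>> def identity(i):
...     return i
...
>>> delayed = [functools.partial(identity, i) for i in range(3)]
>>> for d in delayed:
...     print d()
...
0
1
2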
So you need to change the line

yield Request(link, callback=lambda r: self.parse2(r, i))

to

yield Request(link, callback=lambda r, i=i: self.parse2(r, i))

and you should be all set.
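As a side note, a common way to avoid the lambda entirely in Scrapy is to carry the value on the request itself via its meta dict (newer Scrapy versions also offer cb_kwargs for this). A rough sketch of how parse and parse2 from the question might look with that approach; treat it as an illustration rather than a drop-in fix:

# Inside the same spider class as above, replacing parse and parse2:
def parse(self, response):
    for i in range(5):
        print "page1 : ", i
        link = "http://www.google.com/search?q=" + str(i)
        # carry i along on the request instead of capturing it in a lambda
        yield Request(link, callback=self.parse2, meta={"i": i})

def parse2(self, response):
    # read the value back off the response's meta dict
    i = response.meta["i"]
    print "page2, i: ", i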