I have a Scrapy spider that needs to crawl pages 1 to 100. I generate the URLs in a loop, like this:
def start_requests(self):
    pages = []
    for i in range(1, 100):
        newpage = scrapy.Request("http://www.yyyy.com/yyy/yyy-list.php?page=%s" % i)
        pages.append(newpage)
    return pages
Is this correct?
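import scrapy

# Listing URL with a placeholder for the page number
url_prefix = "http://www.yyyy.com/yyy/yyy-list.php?page={}"


class YyyySpider(scrapy.spiders.Spider):
    name = "Yyyy"
    allowed_domains = ["yyyy.com"]
    # range(1, 101) covers pages 1 through 100 inclusive
    start_urls = [url_prefix.format(i) for i in range(1, 101)]

    def parse(self, response):
        # Name each file after its page number so pages do not overwrite each other
        page = response.url.split("page=")[-1]
        filename = "yyy-page-{}.html".format(page)
        with open(filename, "wb") as f:
            f.write(response.body)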
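Something like this should work. Note that range(1, 100) in the question stops at page 99; range(1, 101) is needed to reach page 100.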
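The page range and the per-page file name can be checked outside Scrapy before running the spider. The following is only a rough sketch: the helper names build_page_urls and filename_for, and the "yyy-list-page-N.html" naming scheme, are made up for illustration and are not part of Scrapy or of the code above.

```python
from urllib.parse import urlsplit, parse_qs

# Same listing URL as in the thread, with a placeholder for the page number
URL_TEMPLATE = "http://www.yyyy.com/yyy/yyy-list.php?page={}"


def build_page_urls(first=1, last=100):
    """Return the listing URLs for pages first..last, both inclusive."""
    # range() excludes its upper bound, hence last + 1
    return [URL_TEMPLATE.format(i) for i in range(first, last + 1)]


def filename_for(url):
    """Derive a distinct output file name from the page query parameter."""
    query = parse_qs(urlsplit(url).query)
    # Fall back to "unknown" if the URL carries no page parameter
    page = query.get("page", ["unknown"])[0]
    return "yyy-list-page-{}.html".format(page)


if __name__ == "__main__":
    urls = build_page_urls()
    print(len(urls), urls[0], urls[-1])
    print(filename_for(urls[0]))
```

Inside a spider, the list from build_page_urls could be assigned to start_urls, and filename_for(response.url) could pick the output file in parse, assuming the site does not redirect the listing URLs.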