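As we all know, the usual way to download an image is to right-click it and choose "Save as". So
in a Scrapy Pipeline, which statement actually performs the download?
Is it the yield Request('absolute image URL') in get_media_requests?
Any advice would be appreciated.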
- First, use xpath to store the image URLs in a designated field of the item.
- Then, use a Pipeline to download the images in that field to local disk.
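The first step above can be sketched with the standard library alone. This is only a stand-in for Scrapy's selectors, under stated assumptions: PAGE_URL, PAGE_HTML, and extract_image_urls are hypothetical names made up for illustration, and the markup is well-formed so that ElementTree can parse it. The image_urls field name matches the one the Pipeline in this answer reads.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical page URL and markup, used only for illustration.
PAGE_URL = "http://example.com/gallery/index.html"
PAGE_HTML = """
<html><body>
  <img src="/static/a.jpg"/>
  <img src="b.png"/>
</body></html>
"""


def extract_image_urls(page_url, markup):
    """Step 1: collect absolute image URLs into the item's image_urls field."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports a limited XPath subset; './/img' finds every <img>.
    # In a Scrapy spider this would typically be
    # response.xpath('//img/@src').getall() plus response.urljoin(src).
    srcs = [img.get("src") for img in root.findall(".//img") if img.get("src")]
    # Relative links are resolved against the page URL so the Pipeline
    # receives URLs it can request directly.
    return {"image_urls": [urljoin(page_url, src) for src in srcs]}


item = extract_image_urls(PAGE_URL, PAGE_HTML)
print(item["image_urls"])
```

The resulting item is what the second step consumes: the Pipeline only looks at the image_urls field and does not care how it was filled.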
Below is an image-download Pipeline that I implemented myself.
You can use it as a reference: https://github.com/ZhangBohan/fun_crawler/blob/master/fun/pipelines.py
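# -*- coding: utf-8 -*-
import os

import requests

from fun import settings


class ImageDownloadPipeline(object):
    def process_item(self, item, spider):
        if 'image_urls' in item:
            images = []
            # Each spider gets its own directory under IMAGES_STORE.
            dir_path = '%s/%s' % (settings.IMAGES_STORE, spider.name)
            if not os.path.exists(dir_path):
                os.makedirs(dir_path)
            for image_url in item['image_urls']:
                # Build the file name from the URL path: drop "scheme://host"
                # and join the remaining segments with "_".
                us = image_url.split('/')[3:]
                image_file_name = '_'.join(us)
                file_path = '%s/%s' % (dir_path, image_file_name)
                images.append(file_path)
                # Skip images that were already downloaded.
                if os.path.exists(file_path):
                    continue
                # Send the request before opening the file, so a failed
                # request does not leave an empty file that is skipped later.
                response = requests.get(image_url, stream=True)
                with open(file_path, 'wb') as handle:
                    # Write the body in 1 KB chunks.
                    for block in response.iter_content(1024):
                        if not block:
                            break
                        handle.write(block)
            # Record the local paths on the item.
            item['images'] = images
        return item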
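To make the file-naming rule in the Pipeline above easier to see, here is a minimal sketch that isolates it. The helper names image_file_name and local_image_path are hypothetical and do not appear in the linked repository, and os.path.join is swapped in for the '%s/%s' formatting as one possible alternative.

```python
import os


def image_file_name(image_url):
    # 'http://host/a/b/c.jpg'.split('/') gives
    # ['http:', '', 'host', 'a', 'b', 'c.jpg'];
    # dropping the first three parts leaves only the path segments.
    return "_".join(image_url.split("/")[3:])


def local_image_path(store_dir, spider_name, image_url):
    # Hypothetical helper: same layout as the Pipeline
    # (store directory / spider name / file name), built with os.path.join.
    return os.path.join(store_dir, spider_name, image_file_name(image_url))


print(image_file_name("http://example.com/a/b/c.jpg"))         # a_b_c.jpg
print(image_file_name("http://example.com/img/x.png?size=2"))  # img_x.png?size=2
```

Two things follow from this rule: a query string stays in the file name, and because the host is dropped, images with the same path on different hosts map to the same local file.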