I need to use proxy IPs with Scrapy. Is there a built-in middleware for this?
If so, how do I use it? A link to an article or the docs would be appreciated, thanks!
Save the following code as middlewares.py in your project directory:
import base64
import random

class ProxyMiddleware(object):
    proxyList = ['36.250.69.4:80', '58.18.52.168:3128', '58.253.238.243:80', '60.191.164.22:3128', '60.191.167.93:3128']

    def process_request(self, request, spider):
        # Pick a random proxy from the list for this request
        pro_adr = random.choice(self.proxyList)
        print("USE PROXY -> " + pro_adr)
        request.meta['proxy'] = "http://" + pro_adr
        # The proxies above are free ones with no credentials. If your proxy
        # requires a username and password, also add the following:
        # proxy_user_pass = "USERNAME:PASSWORD"
        # encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        # request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
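The Basic-auth encoding for the credentials case can be checked on its own, outside Scrapy. A minimal sketch (USERNAME/PASSWORD are placeholders, not real credentials):

    import base64

    # Hypothetical credentials, for illustration only
    proxy_user_pass = "USERNAME:PASSWORD"

    # HTTP Basic auth: base64-encode the "user:password" string
    encoded = base64.b64encode(proxy_user_pass.encode("ascii")).decode("ascii")
    header_value = "Basic " + encoded
    print(header_value)  # Basic VVNFUk5BTUU6UEFTU1dPUkQ=
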
Then add the following to your settings.py file:
DOWNLOADER_MIDDLEWARES = {
    # Replace pro_name with your own project name
    'pro_name.middlewares.ProxyMiddleware': 100,
    # On old Scrapy versions this path was
    # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}
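The middleware itself is easy to exercise without a running crawl: `process_request` only touches `request.meta`, so a stand-in request object is enough. A sketch with a hypothetical `FakeRequest` class (not part of Scrapy) standing in for `scrapy.Request`:

    import random

    # Minimal stand-in for scrapy.Request, for illustration only
    class FakeRequest:
        def __init__(self):
            self.meta = {}
            self.headers = {}

    class ProxyMiddleware(object):
        proxyList = ['36.250.69.4:80', '58.18.52.168:3128']

        def process_request(self, request, spider):
            pro_adr = random.choice(self.proxyList)
            request.meta['proxy'] = "http://" + pro_adr

    mw = ProxyMiddleware()
    req = FakeRequest()
    mw.process_request(req, spider=None)
    print(req.meta['proxy'])  # e.g. http://36.250.69.4:80

Because our middleware's priority (100) is lower than HttpProxyMiddleware's (110), it runs first and sets `request.meta['proxy']` before the built-in middleware processes the request.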
This is probably what you're looking for: http://doc.scrapy.org/en/mast...