Python crawler scraping Wikipedia fails with [Errno 65] No route to host

The code is as follows:

# -*- coding: utf-8 -*-
import bs4
import re
import requests

from bs4 import BeautifulSoup

def work(html):
    soup = BeautifulSoup(html,'html.parser')
    print(soup.prettify())

use_data = {}
use_data['url'] = r'https://zh.wikipedia.org/zh/\%E9\%A2\%9C\%E8\%89\%B2\%E5\%88\%97\%E8\%A1\%A8'
proxy = {"http":"http://72.46.135.119:21071","https":"https://72.46.135.119:21071"} # shadowsocks server address
# response = requests.get(use_data['url'])
response = requests.get(use_data['url'],proxies = proxy,verify=False)
print type(requests.get(use_data['url']).text) # check the encoding
response.encoding = 'gbk'
work(response.text)

Without the proxy (with the proxy line commented out), running response = requests.get(use_data['url']) raises this error:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zh.wikipedia.org', port=443): Max retries exceeded with url: /zh/%5C%E9%5C%A2%5C%9C%5C%E8%5C%89%5C%B2%5C%E5%5C%88%5C%97%5C%E8%5C%A1%5C%A8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x109c3e810>: Failed to establish a new connection: [Errno 65] No route to host',))

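I then tried enabling the proxy, using a shadowsocks server I bought myself, but the request fails because it cannot connect to the proxy. Can a Python crawler use a shadowsocks server as its proxy? If not, what is a suitable and convenient way to scrape Wikipedia through a proxy? Thanks.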

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='zh.wikipedia.org', port=443): Max retries exceeded with url: /zh/%5C%E9%5C%A2%5C%9C%5C%E8%5C%89%5C%B2%5C%E5%5C%88%5C%97%5C%E8%5C%A1%5C%A8 (Caused by ProxyError('Cannot connect to proxy.', error(54, 'Connection reset by peer')))
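A side note not raised in the thread: both tracebacks show `%5C` in the request path, which is the percent-encoding of a backslash. So the literal `\` characters in the URL string are being sent as part of the path. This is not what causes the connection failure, but it would request the wrong page once the connection works. A minimal sketch of building the URL from the page title instead, assuming Python 3 and the standard-library `urllib.parse.quote` (the helper name `page_url` is hypothetical; the title 颜色列表 is what the percent-escapes in the question decode to):

```python
from urllib.parse import quote

# Base path taken from the URL in the question.
BASE = "https://zh.wikipedia.org/zh/"


def page_url(title):
    """Return the article URL with the title percent-encoded as UTF-8."""
    # quote() encodes non-ASCII characters as %XX escapes, with no
    # stray backslashes in the result.
    return BASE + quote(title)
```

Calling `page_url("颜色列表")` gives the same escapes as in the question, minus the backslashes.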

I think you are using the proxy the wrong way. requests supports HTTP proxies, whereas shadowsocks is a SOCKS5 proxy.
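A minimal sketch of one way to act on this, not a definitive setup. It assumes a shadowsocks client (for example `sslocal`) is running on your own machine and exposing a SOCKS5 listener at 127.0.0.1:1080; that address and port are assumptions, so check your client configuration. It also assumes requests 2.10.0 or later installed with SOCKS support (`pip install "requests[socks]"`). The helper names `build_socks_proxies` and `fetch` are hypothetical.

```python
def build_socks_proxies(host="127.0.0.1", port=1080):
    """Return a requests-style proxies dict for a local SOCKS5 listener.

    The default host and port are assumptions: they must match the local
    shadowsocks client, not the remote shadowsocks server.
    """
    # socks5h:// asks the proxy to resolve hostnames as well, so DNS
    # lookups also go through the tunnel.
    proxy_url = "socks5h://{}:{}".format(host, port)
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxies, timeout=10):
    """Fetch a URL through the given proxies with TLS verification left on."""
    # Imported here so build_socks_proxies() works without the dependency;
    # SOCKS support needs: pip install "requests[socks]"
    import requests

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With the local client running, `fetch("https://zh.wikipedia.org/", build_socks_proxies())` would go through the tunnel. Pointing `proxies` at the remote server address, as in the question, fails because that port speaks the shadowsocks protocol rather than HTTP or SOCKS5.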

I suggest editing your hosts file, not disabling certificate verification, and using the MediaWiki API.

For the IP to put in the hosts file, just look up the one used by a Wikipedia in another language.
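The MediaWiki API suggestion above might look roughly like the sketch below. It uses the `action=parse` module at `https://zh.wikipedia.org/w/api.php` with `formatversion=2`, and leaves certificate verification at its default (on). The helper names are hypothetical, and the User-Agent string is a placeholder to replace with your own contact details.

```python
API_URL = "https://zh.wikipedia.org/w/api.php"


def build_parse_params(title):
    """Build query parameters for the MediaWiki action=parse module."""
    return {
        "action": "parse",
        "page": title,
        "prop": "text",        # rendered HTML of the page
        "format": "json",
        "formatversion": "2",  # "text" comes back as a plain string
    }


def extract_html(payload):
    """Pull the rendered HTML out of a decoded action=parse response."""
    if "error" in payload:
        raise RuntimeError(payload["error"].get("info", "MediaWiki API error"))
    return payload["parse"]["text"]


def fetch_page_html(title, proxies=None, timeout=10):
    """Request a page through the API; verify is left at its default (True)."""
    import requests  # third-party; imported here so the helpers above need no dependency

    # Placeholder User-Agent (assumption): replace with your own details.
    headers = {"User-Agent": "example-crawler/0.1 (contact: you@example.com)"}
    response = requests.get(API_URL, params=build_parse_params(title),
                            headers=headers, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return extract_html(response.json())
```

The string returned by `fetch_page_html("颜色列表")` can be passed to `BeautifulSoup(html, 'html.parser')` exactly as in the question's `work` function, without scraping the full page chrome.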


You could take a look at http://stackoverflow.com/questions/12601316/how-to-make-python-requests-work-via-socks-proxy
