
UnicodeEncodeError: 'ascii' codec can't encode character

Following the book Web Scraping with Python: Collecting Data from the Modern Web, the example in chapter 3 raises the following error when run:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 585: ordinal not in range(128)

After some checking, the error comes from this line: print(bsObj.find(id="mw-content-text").findAll("p")[0])
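For context: print() must encode every string to the encoding of sys.stdout, and '\xa0' is a non-breaking space that Wikipedia pages use heavily. If the console reports an ASCII encoding, any such character fails. A minimal check, assuming an ASCII-configured console:

import sys

print(sys.stdout.encoding)   # if this is 'ascii' (or 'ANSI_X3.4-1968'), print() must
                             # squeeze every string into ASCII before writing it out
"\xa0".encode("ascii")       # raises the same UnicodeEncodeError as the print() above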

There are some suggested fixes online, most of which say it is a limitation of print, for example:

http://blog.csdn.net/jim7424994/article/details/22675759

But that approach did not solve the problem for me, even though the example on that page itself runs normally. How would you go about solving this? Any pointers would be appreciated.

The code is as follows:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re


pages = set()
def getLinks(pageUrl):
    global pages
    html = urlopen("http://en.wikipedia.org"+pageUrl)
    # Name the parser explicitly so BeautifulSoup does not warn about guessing one
    bsObj = BeautifulSoup(html, "html.parser")
    try:
        print(bsObj.h1.get_text())
        # This is the line that raises the UnicodeEncodeError
        print(bsObj.find(id="mw-content-text").findAll("p")[0])
        print(bsObj.find(id="ca-edit").find("span").find("a").attrs['href'])
    except AttributeError:
        print("This page is missing something! No worries though!")

    # findAll (not find), so the loop iterates over every internal /wiki/ link;
    # iterating over a single find() result walks that tag's children instead
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                newPage = link.attrs['href']
                print("--------------\n"+newPage)
                pages.add(newPage)
                getLinks(newPage)
getLinks("")

Solved: it was indeed a print problem.
The fix is as follows:
import io
import sys

# Rewrap the underlying binary stdout in a UTF-8 text stream so print() can emit non-ASCII
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
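For reference, the same effect is available without wrapping the stream by hand; both of the following are standard Python, not specific to this book. On Python 3.7+ the existing stream can be reconfigured in place:

import sys

# Python 3.7+: switch stdout to UTF-8 without replacing the stream object
sys.stdout.reconfigure(encoding="utf-8")

Alternatively, set the PYTHONIOENCODING=utf-8 environment variable before starting the interpreter, which makes Python use UTF-8 for stdin/stdout/stderr from the start.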
