Here is the code:
import urllib.request
url = 'http://learnvimscriptthehardway.stevelosh.com/chapters/16.html'
response = urllib.request.urlopen(url)
vim = response.read().decode('utf-8',errors = 'ignore')
with open("vim.html","w") as f:
    f.write(vim)
Output:
Traceback (most recent call last):
  File "/Users/zhangzhimin/PycharmProjects/py3 attempt/vim.py", line 8, in <module>
    f.write(vim)
UnicodeEncodeError: 'ascii' codec can't encode character '\u0193' in position 35: ordinal not in range(128)
Process finished with exit code 1
I'm a beginner just starting out with Python. Any help would be appreciated.
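After some digging, I found that the problem is that many web pages are served gzip-compressed. Import the gzip module and decompress the HTML data before decoding it.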
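A minimal sketch of that decompression step, assuming the response body may or may not be gzip-compressed. urllib.request does not decompress bodies on its own, so the hypothetical helper maybe_gunzip below (not part of any library) checks for the gzip magic bytes first. The bodies here are simulated so the sketch runs without network access. Note that this only concerns turning the raw bytes into text; the UnicodeEncodeError in the traceback above is raised later, at f.write.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(body: bytes) -> bytes:
    """Hypothetical helper: decompress body if it looks like gzip data."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Simulated response bodies standing in for response.read().
plain_body = "<html>\u0193</html>".encode("utf-8")
gzipped_body = gzip.compress(plain_body)

# Both paths end in the same decoded text.
html_from_gzip = maybe_gunzip(gzipped_body).decode("utf-8")
html_from_plain = maybe_gunzip(plain_body).decode("utf-8")
print(html_from_gzip == html_from_plain)
```

With a real request, the same helper would be applied to response.read() before calling decode.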
Add # -*- coding: utf-8 -*- at the top of the file. I'd also recommend using Python 3.
First, declare the source file's encoding at the top of the file:
# coding: utf-8
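Second, import the sys module and set the default encoding. Note that this trick only works in Python 2: in Python 3, reload is no longer a builtin and sys.setdefaultencoding does not exist.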
import sys
reload(sys)
sys.setdefaultencoding('utf8')
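Since the question's code is Python 3 (it imports urllib.request), a rough way to see what is going on there is to inspect the encodings Python actually uses. The sketch below assumes that open() in text mode typically falls back to the locale's preferred encoding when no encoding argument is given, which would explain an 'ascii' codec appearing in the traceback. The variable names are illustrative only.

```python
import locale
import sys

# Encoding that open() typically uses in text mode when none is specified.
default_file_encoding = locale.getpreferredencoding(False)
print("open() default encoding:", default_file_encoding)

# Encoding for implicit str/bytes conversions; fixed to utf-8 in Python 3.
print("sys default encoding:", sys.getdefaultencoding())

# Reproduce the failure from the traceback without touching any file.
caught = None
try:
    "\u0193".encode("ascii")
except UnicodeEncodeError as exc:
    caught = exc
    print("ascii cannot encode it:", exc.reason)

# The same character encodes without trouble as UTF-8.
utf8_bytes = "\u0193".encode("utf-8")
print(utf8_bytes)
```

If the first line prints an ASCII-only encoding such as US-ASCII or ANSI_X3.4-1968, the file in the question is being opened with that codec, and passing an explicit encoding to open() sidesteps it.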
This line decodes the response into unicode:
vim = response.read().decode('utf-8',errors = 'ignore')
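But unicode has to be encoded again when it is written to a file, for example as utf-8.
So either call f.write(vim.encode('utf-8')), or don't decode in the first place. In Python 3, writing bytes requires opening the file in binary mode ('wb'); alternatively, keep the decoded string and pass encoding='utf-8' to open().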
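Both options can be sketched as follows for Python 3. This is only an illustration: the string html stands in for the decoded page, and save_text, save_bytes, and the temporary file names are hypothetical, chosen so the example runs without network access.

```python
import os
import tempfile

# Stand-in for the decoded page; '\u0193' is the character from the traceback.
html = "Learn Vimscript the Hard Way \u0193"


def save_text(path, text):
    # Option 1: keep the str and let open() encode it as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def save_bytes(path, data):
    # Option 2: write bytes directly; the file must be opened in binary mode.
    with open(path, "wb") as f:
        f.write(data)


tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "vim_text.html")
bytes_path = os.path.join(tmp_dir, "vim_bytes.html")

save_text(text_path, html)
save_bytes(bytes_path, html.encode("utf-8"))

# Both files end up with identical contents on disk.
with open(text_path, "rb") as f1, open(bytes_path, "rb") as f2:
    text_file_bytes = f1.read()
    bytes_file_bytes = f2.read()
print(text_file_bytes == bytes_file_bytes)
```

Applied to the question's code, option 1 means changing the open call to open("vim.html", "w", encoding="utf-8"); option 2 means skipping the decode and writing response.read() to a file opened with "wb".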