Semi-automatic simulated login to Douban
Code:
#!/usr/bin/python
#coding:utf-8
__author__ = 'eyu Fanne'
import requests
from bs4 import BeautifulSoup
headers={
"Host":"www.douban.com",
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
"Accept-Language":"zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
"Accept-Encoding":"gzip, deflate",
"Connection":"keep-alive"
}
s=requests.session()
s.headers.update(headers)
html_url = s.get('https://www.douban.com/accounts/login',headers=headers)
print s.cookies.items()
print "html_url code %s" %html_url.status_code
html_txt = html_url.text
html_soup = BeautifulSoup(html_txt,'lxml')
img_soup = html_soup.find_all('img',class_="captcha_image")
for img_i in img_soup:
    print img_i['src']
    cap_img=img_i['src']
for i in html_soup.find_all("input",attrs={"name":"captcha-id"}):
    print i['value']
    cap_i = i['value']
captcha_solution=raw_input('输入验证码:')  # prompt: "Enter captcha:"
captcha_id=cap_i
print captcha_solution
print captcha_id
url_data={
"source":"index_nav",
"form_email":"*********",
"form_password":"*******",
"captcha-solution":captcha_solution,
"captcha-id":captcha_id,
}
s_login=s.post(html_url,data=url_data,headers=headers)
print s.cookies.items()
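The account and password have been replaced with asterisks. When the script runs it prints the captcha image URL, and I read the image and type the captcha in by hand.
Error output: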
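The page-parsing step (finding the `captcha_image` img tag and the hidden `captcha-id` input) can also be sketched with only the standard library's `html.parser`. This is a swapped-in technique, not what the script above uses (it uses BeautifulSoup with lxml), and it is written for Python 3. The sample HTML is a hypothetical snippet shaped after the selectors in the script, not Douban's real markup. Returning `None` when a field is missing also avoids the `NameError` the original loop would hit on `cap_i` if the page contained no captcha.

```python
from html.parser import HTMLParser


class CaptchaFieldParser(HTMLParser):
    """Collect the captcha image URL and the hidden captcha-id value."""

    def __init__(self):
        super().__init__()
        self.captcha_src = None
        self.captcha_id = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class may hold several names, or be missing / valueless
        classes = (attrs.get("class") or "").split()
        if tag == "img" and "captcha_image" in classes:
            self.captcha_src = attrs.get("src")
        elif tag == "input" and attrs.get("name") == "captcha-id":
            self.captcha_id = attrs.get("value")


def extract_captcha(html_text):
    """Return (image_url, captcha_id); either is None if not found."""
    parser = CaptchaFieldParser()
    parser.feed(html_text)
    parser.close()
    return parser.captcha_src, parser.captcha_id


# Hypothetical sample markup (assumption, not Douban's actual page)
SAMPLE = """
<form>
  <img id="captcha_image" class="captcha_image"
       src="https://www.douban.com/misc/captcha?id=abc:en&amp;size=s">
  <input type="hidden" name="captcha-id" value="abc:en">
</form>
"""

if __name__ == "__main__":
    print(extract_captcha(SAMPLE))
```

`html.parser` unescapes attribute values, so `&amp;` in the `src` comes back as a plain `&`.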
[('bid', '"X1c3XEWFnhQ"')]
html_url code 200
https://www.douban.com/misc/captcha?id=ArzwwQ6Yv33e0BU7MawrL62d:en&size=s
ArzwwQ6Yv33e0BU7MawrL62d:en
输入验证码:thought
thought
ArzwwQ6Yv33e0BU7MawrL62d:en
Traceback (most recent call last):
File "D:/360_svn/eyugame_python_exercise/121_remote_pro/crawler_ex/get_douban_move/douban_login.py", line 48, in <module>
s_login=s.post(html_url,data=url_data,headers=headers)
File "C:\Python27_x86\lib\site-packages\requests\sessions.py", line 508, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "C:\Python27_x86\lib\site-packages\requests\sessions.py", line 451, in request
prep = self.prepare_request(req)
File "C:\Python27_x86\lib\site-packages\requests\sessions.py", line 382, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "C:\Python27_x86\lib\site-packages\requests\models.py", line 304, in prepare
self.prepare_url(url, params)
File "C:\Python27_x86\lib\site-packages\requests\models.py", line 362, in prepare_url
to_native_string(url, 'utf8')))
requests.exceptions.MissingSchema: Invalid URL '<Response [200]>': No schema supplied. Perhaps you meant http://<Response [200]>?
Process finished with exit code 1
Where is the problem?
One more question about requests:
http://docs.python-requests.org/en/latest/user/advanced/
s = requests.Session()
Here Session is capitalized, but in some places I have seen a lowercase session. What is the difference?
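In requests, `Session` is the class, while `session()` is a module-level helper function that just returns `Session()`, so both give the same kind of object. Recent versions of requests document `session()` as deprecated and kept only for backward compatibility, so `requests.Session()` is the form to prefer. A minimal offline check (Python 3, requires requests, makes no network requests):

```python
import requests

# Session is the class; session() is a helper that returns Session()
s1 = requests.Session()
s2 = requests.session()

print(type(s1).__name__, type(s2).__name__)  # Session Session
print(isinstance(s2, requests.Session))      # True

# Preferred style: use the class as a context manager so the
# session's connection pools are closed when the block ends.
with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    print(s.headers["User-Agent"])           # Mozilla/5.0

s1.close()
s2.close()
```

Either spelling keeps cookies (such as `bid`) across requests; the difference is only in naming and documentation status.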
===========
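Update:
The simulated-login problem is solved. It was in the final post request: the first argument I passed was not the URL but the Response object html_url.
Modified code: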
#!/usr/bin/python
#coding:utf-8
__author__ = 'eyu Fanne'
import requests
from bs4 import BeautifulSoup
headers={
"Host":"www.douban.com",
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
"Accept-Language":"zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
"Accept-Encoding":"gzip, deflate",
"Connection":"keep-alive"
}
s=requests.session()
s.headers.update(headers)
login_url=r'https://www.douban.com/accounts/login'
html_url = s.get(login_url,headers=headers)
print s.cookies.items()
print "html_url code %s" %html_url.status_code
html_txt = html_url.text
html_soup = BeautifulSoup(html_txt,'lxml')
img_soup = html_soup.find_all('img',class_="captcha_image")
for img_i in img_soup:
    print img_i['src']
    cap_img=img_i['src']
for i in html_soup.find_all("input",attrs={"name":"captcha-id"}):
    print i['value']
    cap_i = i['value']
captcha_solution=raw_input('输入验证码:')  # prompt: "Enter captcha:"
captcha_id=cap_i
print captcha_solution
print captcha_id
url_data={
"source":"index_nav",
"form_email":"******",
"form_password":"******",
"captcha-solution":captcha_solution,
"captcha-id":captcha_id,
}
s_login=s.post(login_url,data=url_data,headers=headers)
print s.cookies.items()
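In the corrected `s.post(login_url, data=url_data, ...)` call, `data` is a dict, which requests sends as an `application/x-www-form-urlencoded` body, essentially what the standard library's `urlencode` produces for string values. The Python 3 sketch below shows roughly what that body looks like; all field values are placeholders, not real credentials or a real captcha id.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder values; the real ones come from the page and the user
url_data = {
    "source": "index_nav",
    "form_email": "user@example.com",
    "form_password": "secret",
    "captcha-solution": "opposite",
    "captcha-id": "abc:en",
}

body = urlencode(url_data)
print(body)
# source=index_nav&form_email=user%40example.com&form_password=secret&captcha-solution=opposite&captcha-id=abc%3Aen

# Decoding the body gives back the same fields
decoded = {k: v[0] for k, v in parse_qs(body).items()}
print(decoded == url_data)  # True
```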
Output:
[('bid', '"Ojx9+4qSsdw"')]
html_url code 200
https://www.douban.com/misc/captcha?id=ryEmaBD2QermvX2BSPncxIuY:en&size=s
ryEmaBD2QermvX2BSPncxIuY:en
输入验证码:opposite
opposite
ryEmaBD2QermvX2BSPncxIuY:en
[('bid', '"Ojx9+4qSsdw"'), ('ck', '"malX"'), ('dbcl2', '"41572135:JiIAk8PlKLw"'), ('ue', '"896661380@qq.com"')]
Process finished with exit code 0
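Why the first version failed: `s.get()` returns a `Response` object, not a URL. requests converts whatever it receives as the URL into a string, and a `Response` renders as `<Response [200]>`, which has no `http://` or `https://` scheme; the traceback above reports this as `MissingSchema` (the exact exception class may differ between requests/urllib3 versions). The Python 3 sketch below mimics that check with `urllib.parse.urlsplit`; `FakeResponse` and `has_http_scheme` are hypothetical helpers, with `FakeResponse` copying only the repr and `.url` attribute of a real `Response`.

```python
from urllib.parse import urlsplit


class FakeResponse:
    """Hypothetical stand-in for requests.Response (repr and .url only)."""

    def __init__(self, status_code, url):
        self.status_code = status_code
        self.url = url

    def __repr__(self):
        return "<Response [%s]>" % self.status_code


def has_http_scheme(url):
    """Rough version of the failing check: is str(url) an http(s) URL?"""
    return urlsplit(str(url)).scheme in ("http", "https")


login_url = "https://www.douban.com/accounts/login"
html_url = FakeResponse(200, login_url)

print(str(html_url))                  # <Response [200]>
print(has_http_scheme(html_url))      # False: no scheme, so requests rejects it
print(has_http_scheme(login_url))     # True
print(has_http_scheme(html_url.url))  # True
```

Passing the URL string, as the corrected code does with `login_url`, is the straightforward fix; on a real `Response`, `.url` also holds a string (the final URL after any redirects).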
I still don't understand the Session/session question at the end, though.
Haha, a slip of the hand, I guess?
s_login=s.post(html_txt,data=url_data,headers=headers)