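Problem description
The selector matches the right elements in Firefox,
but it has no effect when passed to soup (BeautifulSoup).
Please pay particular attention to css_path in the reproduction code below.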
Environment
Python 2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Jan 29 2016, 14:26:21) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
---
Metadata-Version: 2.0
Name: lxml
Version: 3.5.0
Summary: Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
Home-page: http://lxml.de/
Author: lxml dev team
Author-email: lxml-dev@lxml.de
License: UNKNOWN
Location: c:\anaconda2\lib\site-packages
Requires:
Metadata-Version: 1.1
Name: beautifulsoup4
Version: 4.4.1
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Location: c:\anaconda2\lib\site-packages
Requires:
Steps to reproduce
Copy the code and run it.
Relevant code
from __future__ import absolute_import, unicode_literals
import requests
import lxml
from bs4 import BeautifulSoup
url = 'http://v2ex.com/?tab=hot'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
}
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'lxml')
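# CSS path generated by the browser (Firefox)
css_path = 'html body div#Wrapper div.content div#Main div.box div.cell.item table tbody tr td span.item_title a'
print(soup.select(css_path))  # Returns an empty list here, which is not what I expected
# This CSS selector is the correct one: .item_title>a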
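The empty result can probably be reproduced offline. Browsers insert an implicit `<tbody>` into every table when they build the DOM, so a path copied from the browser contains a `tbody` step even when the HTML source has none. The `lxml` parser does not add that element, so the long path matches nothing. The sketch below assumes that this is what happens on the real page (I have not checked the actual v2ex markup); the HTML string is a hypothetical stand-in, and it uses `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page: the source contains no <tbody>.
html = """
<div id="Main"><div class="box"><div class="cell item">
<table><tr><td>
<span class="item_title"><a href="/t/1">Example topic</a></span>
</td></tr></table>
</div></div></div>
"""

# 'html.parser' ships with Python; like 'lxml', it does not add a <tbody>.
soup = BeautifulSoup(html, 'html.parser')

# Browser-style path with the implicit tbody step: matches nothing.
with_tbody = soup.select('div.cell.item table tbody tr td span.item_title a')
# Same path without tbody: matches the link.
without_tbody = soup.select('div.cell.item table tr td span.item_title a')
# The short selector from the post also matches the link.
short_form = soup.select('.item_title > a')

print(with_tbody)
print(without_tbody)
print(short_form)
```

The first `print` shows an empty list, while the other two show the link. A parser that follows the browser's algorithm, such as `html5lib`, would add the `<tbody>` and behave like Firefox.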
Error message
Screenshots
What I have already tried without success (with relevant links)
Simplified problem
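Solved.
The right approach: use a dedicated CSS/XPath selector tool to generate a usable selector. The one the browser generates itself is simply not usable.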
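If a browser-generated path has to be reused anyway, one thing worth trying is to drop the `tbody` steps, since browsers add that element to tables even when the HTML source lacks it. This is only a sketch: `drop_implicit_tbody` is a hypothetical helper, not part of BeautifulSoup, and it only handles space-separated descendant paths like the one in this post (it would break a path that uses `>` combinators). Whether the cleaned path then matches on the real page is untested.

```python
def drop_implicit_tbody(css_path):
    """Remove bare `tbody` steps from a space-separated descendant selector."""
    steps = css_path.split()
    return ' '.join(step for step in steps if step != 'tbody')


css_path = ('html body div#Wrapper div.content div#Main div.box '
            'div.cell.item table tbody tr td span.item_title a')
cleaned = drop_implicit_tbody(css_path)
print(cleaned)
```

The cleaned string can be passed to `soup.select` in place of the original `css_path`. A short, class-based selector such as `.item_title > a` is still less fragile than any full path.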