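Problem description
I originally planned to:
build an XML object
select nodes from it with XPath syntax
But XPath refused to select anything.
It later turned out that
the XML object I had built was not what I expected at all (no wonder nothing was selected).
As far as I can tell my steps are correct, so what exactly went wrong?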
Environment
python 2.7.11+ (default, Apr 17 2016, 14:00:29)
[GCC 5.3.1 20160413] on linux2
pip show lxml
---
Metadata-Version: 1.1
Name: lxml
Version: 3.5.0
Summary: Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
Home-page: http://lxml.de/
Author: lxml dev team
Author-email: lxml-dev@lxml.de
License: UNKNOWN
Location: /usr/lib/python2.7/dist-packages
Requires:
Classifiers:
Development Status :: 5 - Production/Stable
Intended Audience :: Developers
Intended Audience :: Information Technology
License :: OSI Approved :: BSD License
Programming Language :: Cython
Programming Language :: Python :: 2
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.2
Programming Language :: Python :: 3.3
Programming Language :: Python :: 3.4
Programming Language :: Python :: 3.5
Programming Language :: C
Operating System :: OS Independent
Topic :: Text Processing :: Markup :: HTML
Topic :: Text Processing :: Markup :: XML
Topic :: Software Development :: Libraries :: Python Modules
Steps to reproduce
Relevant code
from lxml import etree
xml = """
<html>
<head>
<title>Example page</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>
or <a href="http://example.com/">example.com</a>.</p>
</body>
</html>
"""
page = etree.XML(xml)
print page.text  # not what I expected
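A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the ElementTree API that lxml follows, `page.text` holds only the text between the opening `<html>` tag and its first child, which here is just whitespace, so an empty-looking result does not by itself mean the tree was built wrongly. The check below inspects the tree through its tag, children, and serialized form instead. The `try`/`except` fallback to the standard library and the variable names `root_text`, `child_tags`, `serialized`, `hrefs`, and `title` are my own assumptions for illustration. `findall` and `findtext` accept only a limited XPath subset; with lxml, `page.xpath('//a/@href')` should return the same links.

```python
try:
    from lxml import etree  # the library used in the post
except ImportError:
    # Assumption: fall back to the stdlib ElementTree API so the sketch
    # still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

xml = """
<html>
<head>
<title>Example page</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>
or <a href="http://example.com/">example.com</a>.</p>
</body>
</html>
"""

page = etree.XML(xml)

# .text is only the text between <html> and its first child (whitespace here).
root_text = page.text

# The tree itself is intact: the root tag and its children are as written.
child_tags = [child.tag for child in page]

# Serializing the element shows the whole parsed document (as bytes).
serialized = etree.tostring(page)

# Path-based selection on the parsed tree.
hrefs = [a.get("href") for a in page.findall(".//a")]
title = page.findtext("head/title")

print(repr(root_text))
print(child_tags)
print(hrefs)
print(title)
```

If `child_tags` and `hrefs` come back populated, the parse itself is fine and the problem more likely lies in how the result is inspected or in the XPath expression being used.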