Python: extracting anchor text with BeautifulSoup

The hyperlink looks like this:

<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>

I want to extract the anchor text: 国际足球巴西世界杯
My current approach:

    a = """<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>"""
    soup = BeautifulSoup(a)
    print(soup.contents[0].string)  # prints None

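This prints None. I know the cause is that the a tag contains another tag (span), so .string has no single string to return. How can I get the full anchor text?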
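To make the cause concrete, here is a minimal sketch of how .string behaves on this markup. It assumes bs4 is installed and uses the built-in "html.parser"; the variable name link is only illustrative and is not taken from the question.

```python
from bs4 import BeautifulSoup

html = '<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>'
# Assumption: the stdlib-backed "html.parser" is used, so no extra parser is needed.
link = BeautifulSoup(html, "html.parser").a

# <a> has two children (a <span> and a bare string), so .string is None.
print(link.string)
# <span> has exactly one child string, so .string returns it.
print(link.span.string)
# get_text() concatenates every string found under the tag.
print(link.get_text())
```

In other words, .string only works when a tag has a single child, while get_text() (or its shorthand .text) walks all descendants.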

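    from bs4 import BeautifulSoup

    html = '<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>'
    # Name the parser explicitly; newer bs4 releases warn when it is omitted.
    soup = BeautifulSoup(html, "html.parser")
    # .text gathers the strings of all descendants, including the nested <span>.
    print(soup.select('a')[0].text)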

Solution:

    from bs4 import BeautifulSoup

    a = """<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>"""
    soup = BeautifulSoup(a, "html.parser")
    print(soup.text)  # 国际足球巴西世界杯

Note: this requires the bs4 version of BeautifulSoup.
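If the pieces of the anchor text need to stay distinguishable, bs4 also offers get_text() with a separator and the stripped_strings generator. The following is only a sketch under the same assumptions as above (bs4 installed, "html.parser" as the parser); the names link, joined, parts and href are illustrative.

```python
from bs4 import BeautifulSoup

html = '<a target="_blank" href="http://www.baidu.com"><span id="video_hl">国际足球</span>巴西世界杯</a>'
# Assumption: "html.parser" is chosen to avoid depending on lxml or html5lib.
link = BeautifulSoup(html, "html.parser").find("a")

# Join the text fragments with a space instead of concatenating them directly.
joined = link.get_text(separator=" ", strip=True)
# Collect each whitespace-stripped fragment separately.
parts = list(link.stripped_strings)
# Tag attributes are read with dictionary-style access.
href = link["href"]

print(joined)
print(parts)
print(href)
```

Calling find("a") on the soup scopes the extraction to the link itself, which matters once the document contains more than this one tag.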
