<dd
isStop = "1" class='isStop'
matchcode="201409066001"
matchnumcn ="周六001"
starttime = "1409994000000"
endtime ="1409993820000"
isattention = "0"
hostname="北九州" guestname="福冈黄蜂"
leagueid = "533"
hostteamid = "46148"
visitteamid = "12193"
matchid="1000817"
leagueName="J2联赛"
class="league_533"
style="display: none;"
ishot="0"
>
pass</dd>
For example, what I want to get is the value in
style="display: none;"
that is, the none in this attribute. How can I extract it?
# -*- coding: utf-8 -*-
s = """ <dd
isStop = "1" class='isStop'
matchcode="201409066001"
matchnumcn ="周六001"
starttime = "1409994000000"
endtime ="1409993820000"
isattention = "0"
hostname="北九州" guestname="福冈黄蜂"
leagueid = "533"
hostteamid = "46148"
visitteamid = "12193"
matchid="1000817"
leagueName="J2联赛"
class="league_533"
style="display: none;"
ishot="0"
>
pass</dd>"""
from pyquery import PyQuery
p = PyQuery(s)
a=p("dd")
print(a.attr('style'))
print(a.attr('hostname'))
Output:
display: none;
北九州
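pyquery hands back the whole declaration string, display: none;. To get just none, one option is to split the inline style into name/value pairs. A minimal sketch; parse_style is a hypothetical helper name, and it assumes simple declarations with no ";" inside values (for example, no url(...) containing semicolons):

```python
def parse_style(style):
    """Split an inline CSS string like "display: none;" into a dict.

    Hypothetical helper: naive split on ';' and the first ':', so values
    that themselves contain ';' are not handled.
    """
    declarations = {}
    for part in style.split(";"):
        if ":" not in part:
            continue  # skip empty pieces, e.g. after the trailing ';'
        name, value = part.split(":", 1)
        declarations[name.strip().lower()] = value.strip()
    return declarations


style = "display: none;"  # the string a.attr('style') returned above
print(parse_style(style)["display"])  # none
```

Unlike slicing a fixed number of characters, this keeps working when the attribute holds several declarations, such as "color: red; display: none;".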
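1. Ideally, Python's cgi would have a dedicated method for getting the style (or any other attribute) out of HTML. This style has no id or name, and it isn't a value either, so I don't know whether it can be retrieved.
2. My super clumsy idea: wrap this whole block in ''' quotes, then in a separate .py file open() the block to be searched, read each line with readlines(), match style="display:" with a regex, and then cut the value out with string slicing.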
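On point 1: the cgi module deals with CGI requests and form data, not HTML parsing, but the standard library's html.parser module does expose every attribute of a tag, with or without an id or name. It also avoids the separate file and readlines() from point 2, since it can parse the string directly. A minimal Python 3 sketch using a trimmed copy of the <dd> above; the class name DdAttrCollector is made up for illustration:

```python
from html.parser import HTMLParser  # Python 3 module name


class DdAttrCollector(HTMLParser):
    """Hypothetical helper: record the attributes of every <dd> start tag."""

    def __init__(self):
        super().__init__()
        self.dd_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "dd":
            # attrs is a list of (name, value) pairs; names arrive lowercased
            self.dd_attrs.append(dict(attrs))


# Trimmed copy of the <dd> from the question
markup = '''<dd isStop = "1" class='isStop' hostname="北九州"
style="display: none;" ishot="0">pass</dd>'''

parser = DdAttrCollector()
parser.feed(markup)
first = parser.dd_attrs[0]
print(first["style"])     # display: none;
print(first["hostname"])  # 北九州
```

Note that attribute names come back lowercased (isStop becomes isstop), so look them up in lowercase.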
Use tag.attrs["style"], then a regex.
Here's the code:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
tag_content = """
<dd
isStop = "1" class='isStop'
matchcode="201409066001"
matchnumcn ="周六001"
starttime = "1409994000000"
endtime ="1409993820000"
isattention = "0"
hostname="北九州" guestname="福冈黄蜂"
leagueid = "533"
hostteamid = "46148"
visitteamid = "12193"
matchid="1000817"
leagueName="J2联赛"
class="league_533"
style="display: none;"
ishot="0">
pass</dd>
"""
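from bs4 import BeautifulSoup
# Name the parser explicitly; otherwise bs4 warns and uses whichever parser is installed
tag_soup = BeautifulSoup(tag_content, "html.parser")
style_str = tag_soup.dd["style"]  # "display: none;"
# Take the text after the first ":", trim whitespace, then drop the trailing ";"
print(style_str.split(":", 1)[1].strip().rstrip(";"))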
Beautiful Soup can't return "none" directly, but it easily gets you display: none;, and plain Python string handling can trim that down from there.
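If you prefer the regex route suggested earlier (applying a regex to tag.attrs["style"]), a minimal sketch could look like this. get_display is a hypothetical helper, and it assumes the style value is an ordinary inline CSS string:

```python
import re


def get_display(style):
    """Hypothetical helper: return the CSS display value in an inline style, or None."""
    # Anchor at the start or after a ';' so only a whole "display" property matches,
    # then capture everything up to the next ';' or the end of the string
    match = re.search(r"(?:^|;)\s*display\s*:\s*([^;]+)", style, re.IGNORECASE)
    return match.group(1).strip() if match else None


print(get_display("display: none;"))  # none
```

In the Beautiful Soup code, the input would be style_str (or tag_soup.dd.attrs["style"]).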