
A question about using parent in BeautifulSoup

I am reading the book Web Scraping with Python (《Python网络数据采集》), which contains this piece of code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html=urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj=BeautifulSoup(html,"html.parser")
print(bsObj.find("img",{"src:":"../img/gifts/img1.jpg"}).parent.previous_siblingget_text())

But running it fails with this error:

AttributeError: 'NoneType' object has no attribute 'parent'

The source of the page at the URL being scraped is as follows:

<html>
<head>
<style>
img{
    width:75px;
}
table{
    width:50%;
}
td{
    margin:10px;
    padding:10px;
}
.wrapper{
    width:800px;
}
.excitingNote{
    font-style:italic;
    font-weight:bold;
}
</style>
</head>
<body>
<div id="wrapper">
<img src="../img/gifts/logo.jpg" style="float:left;">
<h1>Totally Normal Gifts</h1>
<div id="content">Here is a collection of totally normal, totally reasonable gifts that your friends are sure to love! Our collection is
hand-curated by well-paid, free-range Tibetan monks.<p>
We haven't figured out how to make online shopping carts yet, but you can send us a check to:<br>
123 Main St.<br>
Abuja, Nigeria
</br>We will then send your totally amazing gift, pronto! Please include an extra $5.00 for gift wrapping.</div>
<table id="giftList">
<tr><th>
Item Title
</th><th>
Description
</th><th>
Cost
</th><th>
Image
</th></tr>

<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
<span class="excitingNote">Now with super-colorful bell peppers!</span>
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>

<tr id="gift2" class="gift"><td>
Russian Nesting Dolls
</td><td>
Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td><td>
$10,000.52
</td><td>
<img src="../img/gifts/img2.jpg">
</td></tr>

<tr id="gift3" class="gift"><td>
Fish Painting
</td><td>
If something seems fishy about this painting, it's because it's a fish! <span class="excitingNote">Also hand-painted by trained monkeys!</span>
</td><td>
$10,005.00
</td><td>
<img src="../img/gifts/img3.jpg">
</td></tr>

<tr id="gift4" class="gift"><td>
Dead Parrot
</td><td>
This is an ex-parrot! <span class="excitingNote">Or maybe he's only resting?</span>
</td><td>
$0.50
</td><td>
<img src="../img/gifts/img4.jpg">
</td></tr>

<tr id="gift5" class="gift"><td>
Mystery Box
</td><td>
If you love suprises, this mystery box is for you! Do not place on light-colored surfaces. May cause oil staining. <span class="excitingNote">Keep your friends guessing!</span>
</td><td>
$1.50
</td><td>
<img src="../img/gifts/img6.jpg">
</td></tr>
</table>
</p>
<div id="footer">
&copy; Totally Normal Gifts, Inc. <br>
+234 (617) 863-0736
</div>

</div>
</body>
</html>

As far as I can tell, the logic of the book's code is fine, so why does the IDE report a 'NoneType' object? Am I using find incorrectly? How should this code be written?


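Setting aside the missing dot between previous_sibling and get_text, the problem is how the attribute is passed to find: the key should be src, not src: with a trailing colon.
Keyword-argument style: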
bsObj.find("img",src="../img/gifts/img1.jpg").parent.previous_sibling.get_text()
attrs style:
bsObj.find("img",attrs={"src":"../img/gifts/img1.jpg"}).parent.previous_sibling.get_text()

Reference: http://beautifulsoup.readthedocs.io/zh_CN/latest/#find
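
To see why the original call raises the AttributeError, here is a minimal sketch that runs against an inline copy of one table row rather than the live page. SAMPLE_HTML is an assumed, trimmed-down stand-in for the real page source, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Assumed sample: one row of the gift table, trimmed from the page source.
SAMPLE_HTML = """
<table id="giftList">
<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The key "src:" (with a stray colon) names an attribute that no tag has,
# so find() matches nothing and returns None. Calling .parent on that
# None is what produces the AttributeError.
missing = soup.find("img", {"src:": "../img/gifts/img1.jpg"})
print(missing)  # None

# Both call styles below filter on the real "src" attribute.
by_keyword = soup.find("img", src="../img/gifts/img1.jpg")
by_attrs = soup.find("img", attrs={"src": "../img/gifts/img1.jpg"})

# img -> enclosing <td> -> the <td> immediately before it (the price cell).
price = by_keyword.parent.previous_sibling.get_text().strip()
print(price)  # $15.00
```

In other words, find does not complain about an unknown attribute name; it simply reports that nothing matched by returning None.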


from urllib.request import urlopen
from bs4 import BeautifulSoup

html=urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj=BeautifulSoup(html,"html.parser")      
print(bsObj.find("img",{"src":"../img/gifts/img1.jpg"}).parent.previous_sibling.get_text())

Your code has several mistakes:

  1. "src:" should be "src"

  2. parent.previous_siblingget_text() should be parent.previous_sibling.get_text()
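
Since find returns None whenever nothing matches, one way to make this kind of mistake easier to spot is to check the result before chaining attribute access. The sketch below is only an illustration: the helper name price_for_image and the inline HTML string are hypothetical and not from the book.

```python
from bs4 import BeautifulSoup

# Assumed sample: one row of the gift table, trimmed from the page source.
SAMPLE_HTML = """
<table id="giftList">
<tr id="gift1" class="gift"><td>
Vegetable Basket
</td><td>
$15.00
</td><td>
<img src="../img/gifts/img1.jpg">
</td></tr>
</table>
"""


def price_for_image(soup, src):
    """Return the text of the cell just before the image's cell, or None.

    Hypothetical helper: returns None instead of raising when the image
    (or the preceding cell) cannot be found.
    """
    img = soup.find("img", attrs={"src": src})
    if img is None:
        # No <img> with this exact src value exists in the document.
        return None
    cell = img.parent.previous_sibling
    if cell is None:
        # The image's cell is the first child of its row.
        return None
    return cell.get_text().strip()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(price_for_image(soup, "../img/gifts/img1.jpg"))  # $15.00
print(price_for_image(soup, "../img/gifts/nope.jpg"))  # None
```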




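{"src:":"../img/gifts/img1.jpg"} should be changed to {"src":"../img/gifts/img1.jpg"}. find is looking up the src attribute, and that name has no colon in it. Also, the end of the line should be .previous_sibling.get_text(). Check the code more carefully.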
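
A related caveat about .previous_sibling: on this page the cells are written as </td><td> with nothing in between, so the previous sibling really is the price cell. If the markup had whitespace between cells, .previous_sibling would be a text node instead. The sketch below uses an assumed HTML string (not the real page) to show the difference, and uses find_previous_sibling("td") as one possible way to skip text nodes.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumed sample: unlike the real page, a newline separates the two cells.
HTML = """<table><tr>
<td>$15.00</td>
<td><img src="../img/gifts/img1.jpg"></td>
</tr></table>"""

soup = BeautifulSoup(HTML, "html.parser")
cell = soup.find("img", attrs={"src": "../img/gifts/img1.jpg"}).parent

# With html.parser the newline between the cells is kept as a text node,
# so the immediate previous sibling is a string, not a <td> tag.
print(isinstance(cell.previous_sibling, NavigableString))  # True

# find_previous_sibling("td") walks backwards until it reaches a <td> tag.
price = cell.find_previous_sibling("td").get_text().strip()
print(price)  # $15.00
```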
