First, get the average rating. // to be continued
Open the page:
https://book.douban.com/subject/26853356/comments/
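Comparing it with the next page shows how the following pages are addressed:
https://book.douban.com/subject/26853356/comments/hot?p=2
The "next page" (后一页) link comes from the paginator markup in the page source: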
<ul class="comment-paginator">
  <li class="p">
    <span class="page-disabled">第一页</span>
  </li>
  <li class="p">
    <span class="page-disabled">前一页</span>
  </li>
  <li class="p">
    <a class="page-btn" href="hot?p=2">后一页</a>
  </li>
</ul>
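The href in the paginator is relative ("hot?p=2"), so it has to be resolved against the URL of the current page before it can be requested. The following is a minimal sketch of that step using only the standard library. The names next_page_url and PAGINATOR_HTML are hypothetical, and the regular expression is a simplification that assumes the link looks exactly like the snippet above; the live page may differ.

```python
import re
from urllib.parse import urljoin

# Trimmed copy of the paginator markup shown above (assumed shape).
PAGINATOR_HTML = '''
<ul class="comment-paginator">
<li class="p"><a class="page-btn" href="hot?p=2">后一页</a></li>
</ul>
'''

def next_page_url(current_url, html):
    """Return the absolute URL of the next page, or None if there is no link."""
    match = re.search(r'<a class="page-btn" href="([^"]+)">后一页</a>', html)
    if match is None:
        return None
    # urljoin resolves the relative href against the current page URL.
    return urljoin(current_url, match.group(1))

first = 'https://book.douban.com/subject/26853356/comments/'
second = next_page_url(first, PAGINATOR_HTML)
print(second)
```

Resolving against the current URL, rather than concatenating strings, keeps working on later pages, where the current URL already ends in "hot?p=2".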
""" Extract the first 50 short comments of a book and compute the average rating """
import requests
from bs4 import BeautifulSoup
import re
sum = 0
url = 'https://book.douban.com/subject/26853356/comments/'
pattern_s = re.compile(')
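p = []
base_url = url  # first-page URL; the pagination hrefs are relative to it
while len(p) < 50:
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    p.extend(re.findall(pattern_s, r.text))
    # The "next page" link holds a relative href such as "hot?p=2"
    btn = soup.find_all('a', 'page-btn', string="后一页")
    if not btn:
        break  # no further pages
    # Build the URL from the base each time instead of appending to the last one
    url = base_url + btn[0].attrs['href']
p = p[:50]  # keep only the first 50 ratings
for star in p:
    sum += int(star)
print("the average value is : {:.2f} ".format(sum/len(p)))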
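The averaging step can be tried without any network access. The sketch below assumes the regular expression yields numeric strings (the sample values are made up for illustration), and the helper name average_rating is hypothetical. It relies on the built-in sum, so it should not be pasted into a script that rebinds sum as the listing above does.

```python
def average_rating(stars, limit=50):
    """Average the first `limit` rating strings; return None if there are none."""
    values = [int(s) for s in stars[:limit]]
    if not values:
        return None
    return sum(values) / len(values)

# Made-up sample ratings, standing in for the scraped list p.
sample = ['50', '40', '30', '40']
print("the average value is : {:.2f}".format(average_rating(sample)))
```

Returning None for an empty list avoids the division by zero that would occur if no rating was matched.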
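Here p.extend(re.findall(pattern_s, r.text)) uses the list method extend,
which appends the elements of another list to the end of the list, in order. extend accepts any iterable, so passing a string appends its characters one by one: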
>>> a = [1, 2, 3, 4]
>>> b = [1, 2, 3, 4, 5]
>>> a.extend(b)
>>> a
[1, 2, 3, 4, 1, 2, 3, 4, 5]
>>> c = 'test'
>>> a.extend(c)
>>> a
[1, 2, 3, 4, 1, 2, 3, 4, 5, 't', 'e', 's', 't']
>>> c