Author: mobiledu2502929547 | Source: Internet | 2023-09-14 08:35
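1. The find and find_all methods
1.1 find vs. find_all
- find stops at the first tag that matches the conditions and returns that single element (or None if nothing matches).
- find_all collects every matching tag and returns all of them as a list.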
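The difference between the two return types is easiest to see side by side. The sketch below is a minimal example: the markup is hypothetical and made up for illustration, and it uses the built-in "html.parser" instead of lxml so that no extra parser has to be installed. It also shows that find behaves like find_all with limit=1, except that it unwraps the list.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration
html = """
<div>
  <a href="/jobs/1">Job 1</a>
  <a href="/jobs/2">Job 2</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")            # only the first matching Tag
all_links = soup.find_all("a")    # every match, in a list (a ResultSet)
missing = soup.find("table")      # no match: find returns None
empty = soup.find_all("table")    # no match: find_all returns an empty list
# find is essentially find_all with limit=1, minus the list wrapper
same_as_find = soup.find_all("a", limit=1)[0]

print(first.get_text())                   # Job 1
print([a.get_text() for a in all_links])  # ['Job 1', 'Job 2']
print(missing, len(empty))                # None 0
print(same_as_find is first)              # True
```

In practice, check find's result for None before using it. You can loop over find_all's result directly, because an empty list is safe.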
a_list = soup.find_all('a')
for a in a_list:
    # Option 1: index the tag like a dictionary
    # href = a['href']
    # print(href)
    # Option 2: read the value from the attrs dictionary
    href = a.attrs['href']
    print(href)
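Both options above raise KeyError when an a tag has no href attribute, which is common on real pages (named anchors, JavaScript links). The following sketch uses hypothetical markup to compare them with Tag.get, which returns None instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second <a> has no href attribute
html = '<a href="/a">A</a><a name="anchor">B</a>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
# Indexing the tag and indexing its attrs dict read the same value
print(link["href"], link.attrs["href"])  # /a /a

hrefs = []
for a in soup.find_all("a"):
    # a["href"] and a.attrs["href"] raise KeyError when the attribute is absent;
    # a.get("href") returns None instead, so missing links can be skipped
    href = a.get("href")
    if href is not None:
        hrefs.append(href)

print(hrefs)  # ['/a']
```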
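1.2 Filtering with find and find_all
- Keyword arguments: use the attribute name as the keyword and the attribute value as its value. Because class is a reserved word in Python, write class_ instead.
- The attrs argument: put the attribute conditions in a dictionary and pass it as attrs.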
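Here is a minimal sketch of the two filter styles on a hypothetical table. The keyword form and the attrs form return the same rows. attrs is also the way to filter on attributes whose names are not valid Python identifiers, such as the made-up data-id attribute used here.

```python
from bs4 import BeautifulSoup

# Hypothetical table rows, made up for this example
html = """
<table>
  <tr class="even" data-id="1"><td>A</td></tr>
  <tr class="odd" data-id="2"><td>B</td></tr>
  <tr class="even" data-id="3"><td>C</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword form: class is a Python keyword, so BeautifulSoup accepts class_
by_keyword = soup.find_all("tr", class_="even")
# attrs form: the same filter expressed as a dictionary
by_attrs = soup.find_all("tr", attrs={"class": "even"})
# data-id cannot be written as a keyword argument, so attrs is required
by_data = soup.find_all("tr", attrs={"data-id": "2"})

print([tr["data-id"] for tr in by_keyword])  # ['1', '3']
print([tr["data-id"] for tr in by_attrs])    # ['1', '3']
print(by_data[0].td.string)                  # B
```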
trs1 = soup.find_all('tr', class_='even')  # equivalent: attrs={'class': 'even'}
for tr in trs1:
    print(tr)
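One detail of class_ matters on real pages: class is a multi-valued attribute, so class_='even' also matches a tag such as class="even highlight". The sketch below uses hypothetical rows to illustrate this. It also shows that a value containing a space is compared against the whole class string instead.

```python
from bs4 import BeautifulSoup

# Hypothetical rows: the second row carries two CSS classes
html = """
<table>
  <tr class="even"><td>A</td></tr>
  <tr class="even highlight"><td>B</td></tr>
  <tr class="odd"><td>C</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches if ANY of a tag's classes equals "even"
any_even = soup.find_all("tr", class_="even")
# A value with a space is compared against the full class attribute string
exact = soup.find_all("tr", class_="even highlight")

print([tr.td.string for tr in any_even])  # ['A', 'B']
print([tr.td.string for tr in exact])     # ['B']
```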
Full example:
from bs4 import BeautifulSoup
# HTML of the job-listing table to parse
html = """
"""
soup = BeautifulSoup(html, 'lxml')
# 1. Get all tr tags
# print(soup.tr)
# print(soup.find('tr'))  # both return only the first tr tag, so find_all is needed
# trs = soup.find_all('tr')
# for tr in trs:
#     print(tr)
#     print('-' * 50)
# 2. Get the second tr tag (limit=2 stops the search after two matches)
# tr = soup.find_all('tr', limit=2)[1]
# print(tr)
# 3. Get all tr tags whose class is "even"
trs1 = soup.find_all('tr', class_='even')  # equivalent: attrs={'class': 'even'}
for tr in trs1:
    print(tr)
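# 4. All a tags with id="test" and class="test" (both conditions must hold)
# Avoid naming the variable "list": it would shadow the built-in list() used in step 6
# a_tags = soup.find_all('a', id='test', class_='test')
# for a in a_tags:
#     print(a)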
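# 5. The href of every a tag
a_list = soup.find_all('a')
for a in a_list:
    # Option 1: index the tag like a dictionary
    # href = a['href']
    # print(href)
    # Option 2: read the value from the attrs dictionary
    href = a.attrs['href']
    print(href)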
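# 6. Get all job listings (slice off the first row, the table header)
trs2 = soup.find_all('tr')[1:]
lists = []
for tr in trs2:
    # Option 1: read each cell explicitly
    # tds = tr.find_all('td')
    # name = tds[0].string
    # category = tds[1].string
    # info = {}
    # info['name'] = name
    # info['category'] = category
    # lists.append(info)
    # Option 2: collect every non-blank text fragment in the row
    infos = list(tr.stripped_strings)
    print(infos)
# print(lists)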
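Step 6 mentions two ways to read a row. The sketch below runs both ideas on a small job table that is hypothetical, since the article's own HTML is not reproduced. .string keeps whatever whitespace surrounds a cell's text, while stripped_strings trims it. Pairing the header texts with each row via zip to build dictionaries is an illustrative choice, not necessarily the author's approach.

```python
from bs4 import BeautifulSoup

# Hypothetical job table; column names and rows are made up for illustration
html = """
<table>
  <tr><th>Name</th><th>Category</th></tr>
  <tr><td>
      Backend Engineer
  </td><td>Tech</td></tr>
  <tr><td>Designer</td><td>Design</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")
header = list(rows[0].stripped_strings)  # ['Name', 'Category']

# .string returns the cell's text node as-is, surrounding whitespace included
raw = rows[1].td.string

jobs = []
for tr in rows[1:]:  # skip the header row
    # stripped_strings yields each text fragment with leading/trailing
    # whitespace removed and skips fragments that are only whitespace
    cells = list(tr.stripped_strings)
    jobs.append(dict(zip(header, cells)))

print(repr(raw))
print(jobs)
# [{'Name': 'Backend Engineer', 'Category': 'Tech'}, {'Name': 'Designer', 'Category': 'Design'}]
```

Keep in mind that zip silently drops values if a row has fewer cells than the header. This shortcut works only for regular tables.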
2. The select method
select accepts CSS selector syntax, which is often more convenient than chaining find/find_all calls.
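To illustrate the convenience claim, the following sketch runs the same query both ways on hypothetical markup: every link inside the table with id "jobs", but not links elsewhere on the page. With find/find_all this takes two steps. With select, one descendant selector is enough.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the comparison
html = """
<table id="jobs">
  <tr><th>Name</th></tr>
  <tr class="even"><td><a href="/jobs/1">Job 1</a></td></tr>
  <tr class="odd"><td><a href="/jobs/2">Job 2</a></td></tr>
</table>
<a href="/about">About</a>
"""
soup = BeautifulSoup(html, "html.parser")

# find/find_all: locate the table first, then search inside it
table = soup.find("table", id="jobs")
via_find = [a["href"] for a in table.find_all("a")]

# select: a single descendant selector expresses the same query
via_select = [a["href"] for a in soup.select("table#jobs a")]

print(via_find)    # ['/jobs/1', '/jobs/2']
print(via_select)  # ['/jobs/1', '/jobs/2']
```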
Usage:
from bs4 import BeautifulSoup
html = """
"""
soup = BeautifulSoup(html, 'lxml')
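# All tr tags
trs = soup.select('tr')
# print(trs)
# The second tr tag (select always returns a list, so index into it)
trs1 = soup.select('tr')[1]
# print(trs1)
# tr tags whose class attribute is exactly "even"
trs2 = soup.select('tr[class="even"]')
# print(trs2)
# The href of every a tag
a_tags = soup.select('a')
for a in a_tags:
    href = a['href']
    # print(href)
# The text of every row
trs4 = soup.select('tr')
for tr in trs4:
    info = list(tr.stripped_strings)
    print(info)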
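The attribute selector tr[class="even"] compares against the whole class string, so it misses rows that carry additional classes. The class selector tr.even matches any row whose class list contains "even". The sketch below uses hypothetical rows to show the difference. It also uses select_one, which returns the first match (or None) instead of a list.

```python
from bs4 import BeautifulSoup

# Hypothetical rows; the third one has two classes
html = """
<table>
  <tr><th>Name</th></tr>
  <tr class="even"><td>A</td></tr>
  <tr class="even highlight"><td>B</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute selector: exact match against the full class string
exact = soup.select('tr[class="even"]')
# Class selector: matches any tr whose class list contains "even"
by_class = soup.select("tr.even")
# select_one returns the first match (or None), not a list
first_even = soup.select_one("tr.even")

print([tr.td.string for tr in exact])     # ['A']
print([tr.td.string for tr in by_class])  # ['A', 'B']
print(first_even.td.string)               # A
```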