Author: 罗丝012 | Source: Internet | 2023-09-16 03:59
I'm trying to get the temperature (in degrees) from this weather website, but I keep getting a blank result back. Here is my code:
import requests
from bs4 import BeautifulSoup
# -----------------------------get site info------------------------------- #
URL = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"
request = requests.get(URL)
# print(request.content)
# ----------------------parse site info---------------- #
soup = BeautifulSoup(request.content,'html5lib')
#print(soup.prettify().encode("utf-8"))
weatherdata = soup.find('span',class_='temp')
print(weatherdata)
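These values are probably rendered dynamically, that is, filled in by JavaScript running in the page. requests.get() only returns the markup received from the server, without applying any client-side changes, so it never waits for those scripts to run.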
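One way to check this diagnosis is to look at what the raw server markup actually contains for the target element. The sketch below is only an illustration: the names TempSpanFinder and classify_temp_span are hypothetical, and it uses the standard-library html.parser instead of BeautifulSoup purely so that it runs without extra installs. It reports whether a span with class "temp" is missing, present but empty, or already filled.

```python
from html.parser import HTMLParser


class TempSpanFinder(HTMLParser):
    """Collects the text inside the first <span> whose class list contains 'temp'."""

    def __init__(self):
        super().__init__()
        self.found = False  # True once a matching span has been opened
        self.depth = 0      # > 0 while we are inside the matching span
        self.parts = []     # text fragments seen inside the matching span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # Nested span inside the match: track it so the end tags balance.
            self.depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if "temp" in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def classify_temp_span(html):
    """Return 'missing', 'empty' or 'filled' for the first span.temp in html."""
    finder = TempSpanFinder()
    finder.feed(html)
    finder.close()
    if not finder.found:
        return "missing"
    return "filled" if "".join(finder.parts).strip() else "empty"
```

Calling classify_temp_span(request.text) on the response from the question would be one way to tell the cases apart. A result of "empty" or "missing" would be consistent with the value being filled in later by JavaScript.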
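You may be able to use the Selenium Chrome WebDriver to load the page URL and get the page source. (Alternatively, you can use the Firefox driver.)
Go to chrome://settings/help
to check your current Chrome version, then download the ChromeDriver release that matches that version. Make sure to save the driver file either in a folder on your PATH
or in the folder that contains your Python script.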
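Before launching the browser, it can help to confirm that the driver file really is where Selenium will look for it. The following is a minimal sketch under stated assumptions: find_driver is a hypothetical helper name, and the default file name "chromedriver" assumes a Linux or macOS download (on Windows the file is usually named chromedriver.exe).

```python
import os
import shutil


def find_driver(name="chromedriver", script_dir="."):
    """Return a path to the driver binary, or None if it cannot be found.

    Looks on PATH first, then in script_dir (assumed to be the folder
    that holds the Python script).
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path
    local = os.path.join(script_dir, name)
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return os.path.abspath(local)
    return None
```

If find_driver() returns None, the driver is neither on PATH nor next to the script, which is the situation the second reference below is about.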
Try this:
from bs4 import BeautifulSoup as bs
from selenium.webdriver import Chrome  # pip install selenium
from selenium.webdriver.chrome.options import Options

url = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"

# Make it headless, i.e. run in the background without opening a Chrome window
chrome_options = Options()
chrome_options.add_argument("--headless")

# Use Chrome to get the page with JavaScript-generated content.
# Note: newer Selenium 4 releases no longer accept executable_path; there the
# driver path is passed as service=Service("./chromedriver"), with Service
# imported from selenium.webdriver.chrome.service.
with Chrome(executable_path="./chromedriver", options=chrome_options) as browser:
    browser.get(url)
    page_source = browser.page_source

# Parse the final page source
soup = bs(page_source, 'html.parser')
weatherdata = soup.find('span', class_='temp')
print(weatherdata.text)
10
References:
Get page generated with Javascript in Python
selenium - chromedriver executable needs to be in PATH
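The problem seems to be that the data is loaded via JavaScript, so it takes a while for the value of that particular span to appear. When you make the request, the span is still empty and is only filled in later. One possible solution is to use Selenium to wait for the page to load before extracting the HTML.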
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.theweathernetwork.com/ca/hourly-weather-forecast/ontario/oakville"

browser = webdriver.Chrome()
# get() blocks until the browser reports the page as loaded (default strategy)
browser.get(url)
html = browser.page_source
browser.quit()  # close the browser once the HTML has been captured

soup = BeautifulSoup(html, 'html.parser')
elem = soup.find('span', class_='temp')
print(elem.text)
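If the span is still empty right after the page load, the usual remedy is to poll until the value appears. Selenium ships its own explicit-wait facility (WebDriverWait) for this; the block below is not that API but a library-independent sketch of the same idea. The name poll_until and its parameters are hypothetical, and the read_temp usage in the comments assumes a browser session that is still open.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no value after {timeout} seconds")
        time.sleep(interval)


# Hypothetical usage with an open Selenium session named `browser`:
#
#   def read_temp():
#       soup = BeautifulSoup(browser.page_source, "html.parser")
#       span = soup.find("span", class_="temp")
#       return span.text.strip() if span else ""
#
#   print(poll_until(read_temp))
```

Re-reading browser.page_source on every attempt matters here, because a snapshot taken once would never reflect content that scripts add afterwards.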