This gets the data as JSON straight from the AJAX request:

```python
import json
from pprint import pprint as pp

import requests

# Browser-like headers for the AJAX call; the request body is sent as JSON
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36',
    'Content-Type': 'application/json',
    'Referer': 'http://droughtmonitor.unl.edu/MapsAndData/DataTables.aspx',
    'X-Requested-With': 'XMLHttpRequest',
}

# Request the contiguous-US ("conus") table
data = json.dumps({'area': 'conus', 'type': 'conus', 'statstype': '1'})
ajax = requests.post("http://droughtmonitor.unl.edu/Ajax.aspx/ReturnTabularDM",
                     data=data,
                     headers=headers)
ajax.raise_for_status()  # fail loudly on an HTTP error instead of parsing an error page
pp(ajax.json())
```
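You can get all the data you need from the returned JSON. If you `print(len(ajax.json()["d"]))`, you will see that 853 rows come back, so it looks like you get the data from all 35 pages in one go. Even if you did parse the HTML page, you would still have to repeat that 34 more times; taking the JSON from the AJAX request makes parsing easy, and everything comes from a single POST.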
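A minimal sketch of pulling the rows out of the response, assuming the body has the usual ASP.NET AJAX shape where the result is wrapped in a top-level `"d"` key (as in the output shown in this answer). `extract_rows` is a hypothetical helper name; it works on the raw response text so it can be tried without a network call:

```python
import json


def extract_rows(body: str) -> list:
    """Parse the JSON text of a ReturnTabularDM response and return its records.

    Assumes the records are a list stored under the top-level "d" key.
    """
    payload = json.loads(body)
    rows = payload.get("d") if isinstance(payload, dict) else None
    if not isinstance(rows, list):
        raise ValueError("expected a list under the 'd' key")
    return rows


# Usage with a tiny hand-written body (two dates taken from the CA output below)
sample = '{"d": [{"Date": "2016-05-03", "D0": 95.73}, {"Date": "2016-04-26", "D0": 95.76}]}'
rows = extract_rows(sample)
print(len(rows))  # 2
```

With the `ajax` response from above, `extract_rows(ajax.text)` should return the same list as `ajax.json()["d"]`, and its length is the row count mentioned here.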
To filter by state, set `type` to `state` and `area` to the state code, here `CA`:

```python
# Reuses json, pp, requests and headers from the first snippet
data = json.dumps({'type': 'state', 'statstype': '1', 'area': 'CA'})
ajax = requests.post("http://droughtmonitor.unl.edu/Ajax.aspx/ReturnTabularDM",
                     data=data,
                     headers=headers)
pp(ajax.json())
```
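Since the national and state requests differ only in the payload, a small helper can build either one. This is a sketch: `build_payload` is a hypothetical name, and it assumes that any area other than `conus` is a two-letter state code sent with `type` set to `state`, exactly as in the two requests in this answer:

```python
import json


def build_payload(area: str, statstype: str = "1") -> str:
    """Return the JSON body for a ReturnTabularDM POST.

    "conus" requests the contiguous-US table; anything else is treated as a
    state code such as "CA" (an assumption based on the conus and CA requests).
    """
    if area.lower() == "conus":
        body = {"area": "conus", "type": "conus", "statstype": statstype}
    else:
        body = {"area": area.upper(), "type": "state", "statstype": statstype}
    return json.dumps(body)


print(build_payload("conus"))  # {"area": "conus", "type": "conus", "statstype": "1"}
print(build_payload("ca"))     # {"area": "CA", "type": "state", "statstype": "1"}
```

The result would be passed as `data=build_payload("CA")` to the same `requests.post` call, with the same headers.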
Again, a short snippet of the output:

```
{u'd': [{u'D0': 95.73,
         u'D1': 89.68,
         u'D2': 74.37,
         u'D3': 49.15,
         u'D4': 21.04,
         u'Date': u'2016-05-03',
         u'FileDate': u'20160503',
         u'None': 4.27,
         u'ReleaseID': 890,
         u'__type': u'DroughtMonitorData.DmData'},
        {u'D0': 95.76,
         u'D1': 90.09,
         u'D2': 74.37,
         u'D3': 49.15,
         u'D4': 21.04,
         u'Date': u'2016-04-26',
         u'FileDate': u'20160426',
         u'None': 4.24,
         u'ReleaseID': 889,
         u'__type': u'DroughtMonitorData.DmData'},
```
You will see that this matches what is shown on the page.
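To save the rows, the standard `csv` module is enough. This sketch assumes each record has the keys shown in the output above (`Date`, `None`, `D0` to `D4`) and drops the rest (`FileDate`, `ReleaseID`, `__type`); `rows_to_csv` and the column order are illustrative choices, not something the endpoint prescribes:

```python
import csv
import io

# Columns to keep, assumed from the sample output; any other keys are ignored
FIELDS = ["Date", "None", "D0", "D1", "D2", "D3", "D4"]


def rows_to_csv(rows: list, fileobj) -> None:
    """Write drought monitor records to an open text file as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage with the first record of the CA output (Python 2 u'' prefixes dropped)
sample = [{"D0": 95.73, "D1": 89.68, "D2": 74.37, "D3": 49.15, "D4": 21.04,
           "Date": "2016-05-03", "FileDate": "20160503", "None": 4.27,
           "ReleaseID": 890, "__type": "DroughtMonitorData.DmData"}]
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

For a real file, open it with `newline=""` as the `csv` documentation recommends, e.g. `with open("ca.csv", "w", newline="") as f: rows_to_csv(ajax.json()["d"], f)`.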