Author: mobiledu2502868653 | Source: Internet | 2023-07-19 13:32
Preface: this article, compiled by the editors of 编程笔记#, covers material on learning the requests module, in the hope that it serves as a useful reference.
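import requests

url = 'https://item.jd.com/2967929.html'
try:
    r = requests.get(url)
    r.raise_for_status()              # raise HTTPError on a 4xx/5xx status
    r.encoding = r.apparent_encoding  # use the encoding detected from the body
    print(r.text[:1000])
except requests.RequestException:
    print("Fetch failed")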
The code above fetches a JD product page.
import requests

url = 'https://www.amazon.cn/gp/product/B01M8L5Z3Y'
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present a browser-like User-Agent
    r = requests.get(url, headers=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[1000:2000])
except requests.RequestException:
    print("Fetch failed")
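The code above fetches an Amazon (amazon.cn) product page. Compared with the JD example, it adds the headers parameter, to show that some sites block page requests that identify themselves as coming from a program.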
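To see what the headers argument changes without contacting any site, the request can be prepared locally. This is a minimal sketch, assuming requests is installed; `prepared_user_agent` is a hypothetical helper name and not part of requests. It relies on `Session.prepare_request`, which merges the session's default headers with the ones passed in.

```python
import requests


def prepared_user_agent(url, headers=None):
    """Return the User-Agent that requests would send, without sending anything.

    Hypothetical helper, for illustration only.
    """
    session = requests.Session()
    request = requests.Request('GET', url, headers=headers)
    prepared = session.prepare_request(request)  # merges session defaults
    return prepared.headers['User-Agent']


url = 'https://www.amazon.cn/gp/product/B01M8L5Z3Y'
default_ua = prepared_user_agent(url)
custom_ua = prepared_user_agent(url, {'user-agent': 'Mozilla/5.0'})
print(default_ua)  # starts with "python-requests/"
print(custom_ua)
```

The default value is what a site sees when no headers are supplied, which is presumably what such sites filter on.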
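import requests

keyword = 'Python'
try:
    kv = {'wd': keyword}
    r = requests.get('http://www.baidu.com/s', params=kv)
    # A Baidu search URL usually looks like: http://www.baidu.com/s?wd=<keyword>
    # params appends "?wd=Python" to the URL. As of 2020-02-25 a browser-like
    # User-Agent is also needed (see the Amazon example), because Baidu also
    # blocks the default python-requests User-Agent.
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException:
    print("Fetch failed")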
The code above submits a search keyword and fetches the result page. The details behind the comment in the code are as follows:
print(r.request.headers)
# {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
r = requests.get('http://www.baidu.com/s', params=kv)
Change this to:
r = requests.get('http://www.baidu.com/s', headers={'user-agent': 'Mozilla/5.0'}, params=kv)
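The effect of params and headers on the outgoing request can be inspected without sending it. A minimal sketch, assuming requests is installed; the variable names are illustrative. `requests.Request(...).prepare()` builds the URL locally, and it should match what `r.request.url` reports after a real call when no redirect occurs.

```python
import requests

kv = {'wd': 'Python'}
prepared = requests.Request(
    'GET',
    'http://www.baidu.com/s',
    params=kv,
    headers={'user-agent': 'Mozilla/5.0'},
).prepare()

print(prepared.url)                    # http://www.baidu.com/s?wd=Python
print(prepared.headers['user-agent'])  # Mozilla/5.0
```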
import requests
import os

url = 'http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg'
root = "D:/pics/"
path = root + url.split('/')[-1]  # name the file after the last URL segment
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()
        with open(path, 'wb') as f:
            f.write(r.content)  # write the binary response body
        print('File saved')
    else:
        print('File already exists')
except (requests.RequestException, OSError):
    print("Fetch failed")
The code above saves an image from the National Geographic (China) website.
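The file-handling part of the download can be separated from the network call and exercised on its own. This is a sketch under assumptions: `save_bytes` is a hypothetical helper, the payload is placeholder bytes rather than a real image, and a temporary directory stands in for D:/pics/. It uses `os.makedirs(..., exist_ok=True)` in place of the exists-then-mkdir check.

```python
import os
import tempfile


def save_bytes(root, url, data):
    """Write data under root, named after the last URL segment.

    Hypothetical helper. Returns (path, written), where written is False
    if the file already existed and was left untouched.
    """
    os.makedirs(root, exist_ok=True)  # no error if the directory exists
    path = os.path.join(root, url.split('/')[-1])
    if os.path.exists(path):
        return path, False
    with open(path, 'wb') as f:  # the with block closes the file
        f.write(data)
    return path, True


url = 'http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg'
root = os.path.join(tempfile.mkdtemp(), 'pics')  # stand-in for D:/pics/
fake_image = b'placeholder bytes, not a real JPEG'

first = save_bytes(root, url, fake_image)
second = save_bytes(root, url, fake_image)
print(first[1], second[1])  # True False
```

In a real script, `r.content` from `requests.get(url)` would be passed as `data`.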
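import requests
import re

# Non-greedy capture group. The pattern as shown has no literal delimiters
# around the group, so on its own it matches an empty string.
c = re.compile(r"(.*?)")
url = 'http://www.ip138.com/iplookup.asp?ip='
try:
    # The URL differs slightly from the one used in the tutorial
    r = requests.get(url + '202.204.80.112' + '&action=2', headers={'User-Agent': 'Mozilla/5.0'})
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    cMatch = c.search(r.text)
    print(cMatch.group(1))
except requests.RequestException:
    print("Fetch failed")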
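The code above looks up the location an IP address belongs to. Compared with Song Tian's (嵩天) code, it adds a regular expression so that the result is matched out directly.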
D:\python_work\venv\Scripts\python.exe D:/python_work/test.py
本站数据:北京市海淀区 北京理工大学 教育网 (Site data: Haidian District, Beijing; Beijing Institute of Technology; education network)
Process finished with exit code 0
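To illustrate what a non-greedy group does once it has literal delimiters around it, here is a sketch on a made-up HTML fragment. The `<li>` markup and the second list item are assumptions for illustration and are not taken from the ip138 page; only the first item's text comes from the output above.

```python
import re

# Hypothetical fragment; the real page markup may differ.
html = '<li>本站数据:北京市海淀区 北京理工大学 教育网</li><li>second item</li>'

lazy = re.compile(r'<li>(.*?)</li>')   # stops at the first closing tag
greedy = re.compile(r'<li>(.*)</li>')  # runs on to the last closing tag

lazy_match = lazy.search(html)
greedy_match = greedy.search(html)

# search() returns None when nothing matches, so guard before group()
lazy_text = lazy_match.group(1) if lazy_match else None
greedy_text = greedy_match.group(1) if greedy_match else None
print(lazy_text)    # only the first item's text
print(greedy_text)  # spans both items, including the tags between them
```

The guard on `None` matters in the script as well: if the page layout changes and the pattern no longer matches, `cMatch.group(1)` raises `AttributeError`, which `except requests.RequestException` does not catch.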