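I've been learning Python lately, and after working through a fair amount of reading material and videos I got interested in web scraping. So far I have scraped text, images, and videos from web pages. Text needs little explanation: you just pull it out of the page by its tags. Images and videos, on the other hand, have to be downloaded once you have obtained their links. Here is a backup of my image and video download code: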
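import os
import requests

# Example url: http://dynamic-image.yesky.com/740x-/uploadImages/2016/338/21/7058TW4EAC62.JPG
# Example path: D:\\pic\\  (a directory)
def pic_down(url, path):
    # Every call saves to the same name, so each download overwrites the previous one
    fileName = os.path.join(path, 'pic.jpg')
    imgRes = requests.get(url)
    imgRes.raise_for_status()  # stop on 404/500 instead of saving an error page as a JPEG
    with open(fileName, 'wb') as f:
        f.write(imgRes.content)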
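def audio_down(url, path):
    # Despite the name, this is the video downloader; path is the full target file path,
    # and an existing partial file at path is resumed with an HTTP Range request
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.3.2.1000 Chrome/30.0.1599.101 Safari/537.36"}
        pre_content_length = 0  # length (size) of the previous response body
        # Receive the video data
        while True:
            # If the file already exists, resume the download from its current size
            if os.path.exists(path):
                headers['Range'] = 'bytes=%d-' % os.path.getsize(path)
            res = requests.get(url, stream=True, headers=headers)
            content_length = int(res.headers['content-length'])
            # If the current response is shorter than the previous one, or the data already
            # received equals the current response length, consider the video complete
            if content_length < pre_content_length or (os.path.exists(path) and os.path.getsize(path) == content_length):
                break
            pre_content_length = content_length
            # Append the received video data chunk by chunk, so whatever arrived
            # before an interruption stays on disk and the next run can resume from it
            with open(path, 'ab') as file:
                for chunk in res.iter_content(chunk_size=1024 * 1024):
                    file.write(chunk)
                file.flush()
            print('received data, file size: %d, response length: %d' % (os.path.getsize(path), content_length))
    except Exception as e:
        print(e)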
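Both functions take a `url` that first has to be pulled out of the page. The post doesn't say which parser was used, so the following is only a minimal sketch built on the standard library's `html.parser`. It assumes the media links sit in the `src` attribute of `img`, `video`, or `source` tags. `MediaLinkParser` and `extract_media_links` are hypothetical names, and relative links are resolved against the page URL with `urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaLinkParser(HTMLParser):
    """Hypothetical helper: collect image and video URLs from an HTML page."""

    MEDIA_TAGS = {'img', 'video', 'source'}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag in self.MEDIA_TAGS:
            src = dict(attrs).get('src')
            if src:
                # turn relative links such as /img/a.jpg into absolute URLs
                self.links.append(urljoin(self.base_url, src))


def extract_media_links(html, base_url):
    """Return the absolute src URLs of all img/video/source tags, in page order."""
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p>hello</p><img src="/a.jpg"><video><source src="v.mp4"></video>'
print(extract_media_links(page, 'http://example.com/page/index.html'))
# ['http://example.com/a.jpg', 'http://example.com/page/v.mp4']
```

In a real crawl the `html` argument would be something like `requests.get(page_url).text`, and each returned link could then be passed to `pic_down` or `audio_down`.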
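`pic_down` always writes to `pic.jpg`, so downloading several images keeps only the last one. One way around this is sketched below. It assumes the last segment of the URL path is a usable file name, which holds for the example URL but not for every site. The hypothetical helper `filename_from_url` takes the name from the link itself and falls back to `pic.jpg` when the path has no file name.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default='pic.jpg'):
    """Hypothetical helper: use the last segment of the URL path as the file name."""
    # urlsplit drops the query string and fragment; unquote decodes %20 and friends
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)
    # an empty name (path ends in '/') or '.'/'..' is not a safe file name
    if name in ('', '.', '..'):
        return default
    return name


print(filename_from_url('http://dynamic-image.yesky.com/740x-/uploadImages/2016/338/21/7058TW4EAC62.JPG'))
# 7058TW4EAC62.JPG
```

Inside `pic_down`, the line building `fileName` would then become `os.path.join(path, filename_from_url(url))`.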
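`audio_down` decides that the video is complete by comparing `content-length` values, which is a heuristic. When a Range request succeeds, the server answers `206 Partial Content` and states the full size in the `Content-Range` header, for example `bytes 1000-4999/5000`. When the requested range starts at or past the end, it answers `416` and should send `bytes */5000`. The sketch below assumes a well-formed byte-range header. It parses that header so the completion test can compare the bytes on disk with the real total. `parse_content_range` and `is_complete` are hypothetical names, and an unknown part (`*`) comes back as `None`.

```python
import re

# "bytes first-last/total", "bytes first-last/*" or "bytes */total"
_CONTENT_RANGE = re.compile(r'^bytes (?:(\d+)-(\d+)|\*)/(\d+|\*)$')


def _to_int(part):
    # None or '*' means the value is unknown
    return None if part in (None, '*') else int(part)


def parse_content_range(value):
    """Hypothetical helper: return (first, last, total), or None if not a byte range."""
    m = _CONTENT_RANGE.match(value.strip())
    if m is None:
        return None
    first, last, total = m.groups()
    return _to_int(first), _to_int(last), _to_int(total)


def is_complete(bytes_on_disk, content_range):
    """True when the local file already holds the whole resource."""
    parsed = parse_content_range(content_range)
    if parsed is None or parsed[2] is None:
        return False  # total size unknown: this header cannot decide it
    return bytes_on_disk >= parsed[2]


print(parse_content_range('bytes 1000-4999/5000'))
# (1000, 4999, 5000)
```

In the download loop the header would come from `res.headers.get('Content-Range', '')`. If the server ignores `Range` and returns a plain `200`, the header is absent and the existing length comparison is still needed.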