
Automating PDF document downloads with Python


Enter a PDF number and this Python script automatically downloads the matching document from freepatentsonline.com.

#!/usr/bin/env python3
# coding=utf-8
# Version: python3.6.1
# File: requests_freepatentsonline_com.py
# Author: lgsp_Harold
import os
import requests
from lxml import etree

# Directory where the downloaded PDFs are saved.
dir_path = './files/freepatentsonline_com/'

if not os.path.exists(dir_path):
    os.makedirs(dir_path)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}


while True:
    number = input('(Enter Q to quit) Enter the PDF number: ').strip()
    if number == 'Q':
        break
    # The response is parsed as HTML: the page embeds the real PDF in an iframe.
    url = 'https://www.freepatentsonline.com/' + number + '.pdf'
    pdf_response = requests.get(url=url, headers=headers)

    doc = etree.HTML(pdf_response.text)
    # The attribute test in the original predicate ('//center[@]') is missing
    # and is not valid XPath, so match any <center> that directly holds an <iframe>.
    sources = doc.xpath('//center/iframe/@src')
    if not sources:
        print('%s: no embedded PDF found' % number)
        continue
    download = sources[0]

    # Example src value:
    # https://s3.amazonaws.com/pdf.sumobrain.com/US9039490B2.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIBOKHYOLP4MBMRGQ%2F20210715%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210715T000000Z&X-Amz-Expires=173822&X-Amz-SignedHeaders=host&X-Amz-Signature=ade0d0aad351dc65cb130810793964e11a6970120fe6bb3258a9728424db6a42#view=FitH
    # Remove the viewer fragment so that only the file URL remains.
    download_url = download.replace('#view=FitH', '')

    file = requests.get(download_url, headers=headers)

    file_path = os.path.join(dir_path, number + '.pdf')

    with open(file_path, 'wb') as f:
        f.write(file.content)
    print('%s: PDF downloaded successfully' % number)
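The script places the typed number directly into both the request URL and the output file name, and removes the viewer fragment with a fixed string replace. As a rough sketch of how those two steps could be made a little more defensive, the helpers below use only the standard library. The names `is_valid_number`, `build_page_url`, and `strip_fragment` are hypothetical and not part of the original script, and the letters-and-digits pattern is an assumption based on the `US9039490B2` example; other number formats may exist on the site.

```python
import re
from urllib.parse import urldefrag

# Base URL taken from the script above.
BASE_URL = 'https://www.freepatentsonline.com/'

# Assumption: document numbers contain only ASCII letters and digits,
# as in 'US9039490B2'. Loosen the pattern if other forms are needed.
NUMBER_PATTERN = re.compile(r'[A-Za-z0-9]+')


def is_valid_number(number):
    """Return True if number is safe to use in a URL and a file name."""
    return NUMBER_PATTERN.fullmatch(number) is not None


def build_page_url(number):
    """Build the page URL that the script requests for a document number."""
    if not is_valid_number(number):
        raise ValueError('unexpected document number: %r' % number)
    return BASE_URL + number + '.pdf'


def strip_fragment(url):
    """Drop any '#...' fragment (for example '#view=FitH') from url."""
    return urldefrag(url).url
```

Rejecting input such as `../x` before it reaches `open()` keeps downloads inside the target directory, and `urldefrag` removes whatever fragment the iframe carries rather than one specific string.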

 

I only know a little about this, just a little....
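One further point: the script writes whatever bytes come back, so an expired link or an error page would be saved with a `.pdf` extension. A minimal sketch of a guard, assuming a strict check on the `%PDF-` signature that PDF files begin with, might look like the following. The helper names `looks_like_pdf` and `save_if_pdf` are hypothetical, and a file with stray bytes before its header would be rejected by this check.

```python
# Signature found at the start of PDF files.
PDF_MAGIC = b'%PDF-'


def looks_like_pdf(content):
    """Return True if the byte string begins with the PDF signature."""
    return content.startswith(PDF_MAGIC)


def save_if_pdf(content, file_path):
    """Write content to file_path only if it looks like a PDF.

    Returns True when the file was written, False otherwise.
    """
    if not looks_like_pdf(content):
        return False
    with open(file_path, 'wb') as f:
        f.write(content)
    return True
```

In the download loop, `save_if_pdf(file.content, file_path)` could stand in for the plain `open`/`write` pair, with the success message printed only when it returns `True`.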


