Using Python to Download High-Resolution Cover Images from Nature Journals

Author: 陈可不能哭 | Source: Internet | 2023-08-24 04:35


Nature is one of the most prestigious journals in science, and its covers are consistently well designed, combining scientific and artistic appeal.

To fetch covers from the Nature family of journals quickly, this post uses Python's requests module to automate the HTTP requests and the BeautifulSoup module to parse the HTML.

import os

import requests
from bs4 import BeautifulSoup

# Output folder ("nature 封面\nature 正刊" = "nature covers\nature main journal")
path = 'C:\\Users\\User\\Desktop\\nature 封面\\nature 正刊'
# path = os.getcwd()
if not os.path.exists(path):
    os.makedirs(path)
    print("Created folder: nature 正刊")

# Change which volumes to download here.
# Volumes are downloaded from newest to oldest, so start_volume must be >= end_volume.
start_volume = 501
end_volume = 500

# nature_url = 'https://www.nature.com/ng/volumes/'  # Nature Genetics
nature_url = 'https://www.nature.com/nature/volumes/'  # Nature (main journal)
kv = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}

while start_volume >= end_volume:
    try:
        volume_url = nature_url + str(start_volume)
        volume_response = requests.get(url=volume_url, headers=kv, timeout=120)
    except Exception:
        print(str(start_volume) + " request failed")
        # Log the failed volume to 异常.txt ("errors.txt")
        with open(path + "\\异常.txt", 'at') as txt:
            txt.write(str(start_volume) + " request failed\n")
        # Move on to the next volume; without this the loop retries the same volume forever
        start_volume -= 1
        continue

    volume_response.encoding = 'utf-8'
    volume_soup = BeautifulSoup(volume_response.text, 'html.parser')
    # The <ul> that holds the cover thumbnails of every issue in this volume
    ul_tag = volume_soup.find_all(
        'ul',
        class_='ma0 clean-list grid-auto-fill grid-auto-fill-w220 very-small-column medium-row-gap')
    img_list = ul_tag[0].find_all("img")

    issue_number = 0
    for img_tag in img_list:
        issue_number += 1
        filename = path + '\\' + str(start_volume) + '_' + str(issue_number) + '.png'
        if os.path.exists(filename):
            print(filename + " already exists")
            continue
        print("Loading...........................")
        # Thumbnails are served at width 200; request the width-1000 version instead
        img_url = 'https:' + img_tag.get("src").replace("w200", "w1000")
        try:
            img_response = requests.get(img_url, timeout=240, headers=kv)
        except Exception:
            print(start_volume, issue_number, '??????????? request failed ????????')
            with open(path + "\\异常.txt", 'at') as txt:
                txt.write(str(start_volume) + '_' + str(issue_number) + " request failed\n")
            continue
        with open(filename, 'wb') as imgfile:
            imgfile.write(img_response.content)
        print("Downloaded image: " + str(start_volume) + '_' + str(issue_number))

    start_volume -= 1

Result: [screenshot not included]
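The step that turns a thumbnail into a high-resolution image is just string manipulation, so it can be tried without any network access. The sketch below isolates that step. The helper names `hi_res_cover_url` and `cover_filename` and the example URL are hypothetical and exist only for illustration; the only behavior taken from the script is prefixing `https:` and replacing `w200` with `w1000`.

```python
def hi_res_cover_url(src, width=1000):
    """Build a full-size https URL from a thumbnail src attribute.

    Assumes, as the script above does, that src is protocol-relative
    (starts with "//") and contains the width marker "w200".
    """
    if src.startswith("//"):
        src = "https:" + src
    return src.replace("w200", "w" + str(width))


def cover_filename(volume, issue_number):
    """Name files <volume>_<issue>.png, matching the script above."""
    return str(volume) + "_" + str(issue_number) + ".png"


# Hypothetical thumbnail src, for illustration only
example_src = "//example.com/w200/cover/501/7465"
print(hi_res_cover_url(example_src))
print(cover_filename(501, 1))
```

Keeping the URL rewrite in a small function makes it easy to try another width if the server supports it, which is an assumption to check against the real site.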

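The code above automatically downloads covers for Nature and Nature Genetics. The sites of these two journals are structured slightly differently from those of the other Nature journals, which can be scraped with the following code: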

import os

import requests
from bs4 import BeautifulSoup

# URL abbreviations of other journals. This dict is not used by the script;
# copy entries into nature_journal below to download them.
other_journals = {
    'nature biomedical engineering': 'natbiomedeng',
    'nature methods': 'nmeth',
    'nature astronomy': 'natastron',
    'nature medicine': 'nm',
    'nature protocols': 'nprot',
    'nature microbiology': 'nmicrobiol',
    'nature cell biology': 'ncb',
    'nature nanotechnology': 'nnano',
    'nature immunology': 'ni',
    'nature energy': 'nenergy',
    'nature materials': 'nmat',
    'nature cancer': 'natcancer',
    'nature neuroscience': 'neuro',
    'nature machine intelligence': 'natmachintell',
    'nature metabolism': 'natmetab',
    'nature food': 'natfood',
    'nature ecology & evolution': "natecolevol",
    "nature structural & molecular biology": "nsmb",
    "nature physics": "nphys",
    "nature human behavior": "nathumbehav",
    "nature chemical biology": "nchembio"
}

nature_journal = {
    # Put the journals to download here
    'nature plants': 'nplants',
    'nature biotechnology': 'nbt'
}
folder_Name = "nature 封面"  # "nature covers"
kv = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'
}


def makefile(path):
    # Create the folder, or raise AssertionError if it already exists
    folder = os.path.exists(path)
    if not folder:
        os.makedirs(path)
        print("Make file -- " + path + " -- successfully!")
    else:
        raise AssertionError


################################################################
def getCover(url, journal, year, filepath, startyear=2022, endyear=2022):
    # endyear must not exceed startyear: volumes are walked from newest to oldest
    if not (endyear <= year <= startyear):
        return
    try:
        issue_response = requests.get("https://www.nature.com" + url,
                                      timeout=120,
                                      headers=kv)
    except Exception:
        print(journal + " " + str(year) + " Error")
        return
    issue_response.encoding = 'utf-8'
    if 'Page not found' in issue_response.text:
        print(journal + " Page not found")
        return
    issue_soup = BeautifulSoup(issue_response.text, 'html.parser')
    cover_image = issue_soup.find_all("img", class_='image-constraint pt10')
    for image in cover_image:
        image_url = image.get("src")
        print("Start loading img.............................")
        image_url = image_url.replace("w200", "w1000")
        # The URL ends in a one- or two-character number; pad it to two characters
        if (image_url[-2] == '/'):
            month = "0" + image_url[-1]
        else:
            month = image_url[-2:]
        image_name = nature_journal[journal] + "_" + str(year) + "_" + month + ".png"
        if os.path.exists(filepath + journal + "\\" + image_name):
            print(image_url + " already exists")
            continue
        print(image_url)
        try:
            image_response = requests.get("https:" + image_url,
                                          timeout=240,
                                          headers=kv)
        except Exception:
            print("Failed to fetch image: " + image_name)
            continue
        with open(filepath + journal + "\\" + image_name, 'wb') as downloaded_img:
            downloaded_img.write(image_response.content)


def main():
    try:
        path = os.getcwd() + '\\'
        makefile(path + folder_Name)
    except Exception:
        print("Folder -- nature 封面 -- already exists")
    path = path + folder_Name + "\\"
    for journal in nature_journal:
        try:
            makefile(path + journal)
        except AssertionError:
            print("File -- " + path + " -- already exists!")
        try:
            volume_response = requests.get("https://www.nature.com/" +
                                           nature_journal[journal] +
                                           "/volumes",
                                           timeout=120,
                                           headers=kv)
        except Exception:
            print(journal + " request failed")
            continue
        volume_response.encoding = 'utf-8'
        volume_soup = BeautifulSoup(volume_response.text, 'html.parser')
        volume_list = volume_soup.find_all(
            'ul',
            class_='clean-list ma0 clean-list grid-auto-fill medium-row-gap background-white')
        number_of_volume = 0
        for volume_child in volume_list[0].children:
            if volume_child == '\n':
                continue
            issue_url = volume_child.find_all("a")[0].get("href")
            print(issue_url)
            # 2020 is taken as the year of the newest volume listed. With
            # startyear=endyear=2022 below, no year passes the filter in getCover,
            # so adjust these values to match the years you want to download.
            print(2020 - number_of_volume)
            getCover(issue_url,
                     journal,
                     year=(2020 - number_of_volume),
                     filepath=path,
                     startyear=2022,
                     endyear=2022)
            number_of_volume += 1


if __name__ == "__main__":
    main()
    print("Finish Everything!")

Result: [screenshot not included]
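The file-naming logic in `getCover` depends on the last path segment of the image URL, which the script labels `month`. A minimal sketch of that step on its own, with no network access, could look like the following. The helper names `issue_label` and `cover_name` and the example URLs are hypothetical; the zero-padding rule and the `<abbreviation>_<year>_<label>.png` pattern follow the script above.

```python
def issue_label(image_url):
    """Return the two-character label taken from the end of a cover URL.

    Mirrors the script above: a single trailing digit is zero-padded,
    otherwise the last two characters are used.
    """
    last_segment = image_url.rsplit("/", 1)[-1]
    return last_segment[-2:].zfill(2)


def cover_name(abbreviation, year, image_url):
    """Build the file name <abbreviation>_<year>_<label>.png."""
    return abbreviation + "_" + str(year) + "_" + issue_label(image_url) + ".png"


# Hypothetical cover URLs, for illustration only
print(cover_name("nplants", 2022, "//example.com/w1000/cover/nplants/8/3"))
print(cover_name("nbt", 2022, "//example.com/w1000/cover/nbt/40/12"))
```

Splitting on the final `/` rather than indexing fixed positions gives the same result for one- and two-digit endings and is a little easier to read.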
