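Crawlers are a commonly used kind of program, and Python makes them very easy to write thanks to several dedicated libraries. This post records some frequently used Python crawler snippets for future reference.
1 requests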
import requests
url ='http://onevanillachecker.com/'
rep = requests.get(url)
rep.encoding = 'utf-8'
print(rep.text)
Notes on some common parameters
import random
import requests
url ='http://onevanillachecker.com/'
header={
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'zh-CN,zh;q=0.8',
'Connection': 'keep-alive',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'
}
timeout = random.choice(range(80, 180))
rep = requests.get(url,headers = header,timeout = timeout)
rep.encoding = 'utf-8'
print(rep.text)
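The snippets above assume the request always succeeds. As a minimal sketch of how failures could be handled, the helper below wraps the same requests.get call; the name fetch_html and the (5, 30) connect/read timeout values are assumptions made for illustration, not part of the original notes.

```python
import requests


def fetch_html(url, headers=None, timeout=(5, 30)):
    """Return the page text, or None if the request fails.

    timeout is (connect seconds, read seconds); both values are illustrative.
    """
    try:
        rep = requests.get(url, headers=headers, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions instead of parsing error pages
        rep.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class for timeouts, connection errors,
        # invalid URLs and HTTP errors raised by raise_for_status()
        print('request failed:', exc)
        return None
    rep.encoding = 'utf-8'
    return rep.text
```

Calling fetch_html('http://onevanillachecker.com/', headers=header) would then return either the decoded page or None, so the caller can skip or retry a failed page.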
2 urllib2
import urllib2
req = urllib2.Request('http://onevanillachecker.com/')
response = urllib2.urlopen(req)
html = response.read()
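urllib2 exists only in Python 2; in Python 3 the same Request and urlopen calls live in urllib.request. The following is a rough Python 3 equivalent of the snippet above. The helper name fetch_text, the 30 second timeout and the User-Agent string are assumptions chosen for this sketch.

```python
import urllib.request


def fetch_text(url, timeout=30):
    """Fetch a URL with the Python 3 standard library and return decoded text."""
    # The User-Agent value is a placeholder chosen for this sketch
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=timeout) as response:
        raw = response.read()  # bytes, not str
        # Fall back to UTF-8 when the response does not declare a charset
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset, errors='replace')
```

Unlike Python 2, read() returns bytes here, which is why the sketch decodes the body before returning it.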
3 beautifulsoup
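BeautifulSoup is a library for parsing pages, and it is very convenient to use.
Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
Below are a few commonly used features, noted for future reference.
Installation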
pip install beautifulsoup4
Basic usage
from bs4 import BeautifulSoup
import urllib2
req = urllib2.Request('http://onevanillachecker.com/')
response = urllib2.urlopen(req)
html = response.read()
# Parse the page with BeautifulSoup; naming the parser explicitly avoids a warning
soup = BeautifulSoup(html, 'html.parser')
print(soup.title)
# <title>One Vanilla Gift Card Balance Check -Official Website</title>
print(soup.title.name)
# title
print(soup.title.string)
# One Vanilla Gift Card Balance Check -Official Website
print(soup.title.parent.name)
# head
print(soup.p)
# Prints the first <p> element, whose text is:
# Life happens every day. And OneVanilla
# helps make it simpler. Shop, dine, fill 'er up
# and more - all with one prepaid card.
# print(soup.p['class'])
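print(soup.a)
# Prints the first <a> element (link text: Vanilla Gift Card)
print(soup.find_all('a'))
# Prints a list of every <a> element on the page; the link texts include:
# Vanilla Gift Card, Check Vanilla 3 Balance,
# Vanilla Gift Cards, Where to Buy,
# Sign In, About Vanilla Gift Card,
# Using Your Vanilla Gift Card, Try Vanilla Gift
# ......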
print(soup.find(alt="2"))
# Prints the first element whose alt attribute is "2", or None if there is none
print(soup.get_text())
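A common next step after find_all('a') is pulling out each link's text and target. The sketch below does that on a small inline page so it runs without a network connection; SAMPLE_HTML and extract_links are hypothetical names and content made up for illustration, not taken from the site above.

```python
from bs4 import BeautifulSoup

# A small inline page standing in for a downloaded one (hypothetical content)
SAMPLE_HTML = """
<html><head><title>Demo</title></head>
<body>
<p class="intro">Hello</p>
<a href="/buy">Where to Buy</a>
<a href="/signin">Sign In</a>
<a>No link target</a>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.find_all('a'):
        href = a.get('href')  # .get returns None instead of raising KeyError
        if href:
            links.append((a.get_text(strip=True), href))
    return links


links = extract_links(SAMPLE_HTML)
print(links)
```

Using a.get('href') rather than a['href'] matters because indexing a tag by a missing attribute raises KeyError, which is also the likely reason soup.p['class'] fails on a <p> without a class.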