Author: 手机用户2502871605 | Source: 互联网 | 2024-10-09 14:05
When I implemented a spider with Scrapy, I wanted to rotate its proxy so that the server wouldn't block my requests for coming too frequently from a single IP. I already knew how to change the proxy in plain Scrapy, either through a downloader middleware or by setting the request meta directly.
However, I use the scrapy_splash package to execute JavaScript for my spider, and there I found it difficult to change the proxy: as I understand it, scrapy_splash sends the request to a Splash server, which renders the page's JS on our behalf.
In fact, when I use Scrapy alone the proxy works fine, but it stops having any effect once I use scrapy_splash.
So is there any way to set a proxy for a scrapy_splash request?
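(A sketch of one possibility, assuming the Splash HTTP API's `proxy` render argument is what applies here: pass the proxy through the SplashRequest's `args` so that the Splash server itself connects via the proxy. The helper name below is my own, not part of either library.)

```python
# Sketch, assuming Splash's render endpoints accept a 'proxy' argument
# (as documented in the Splash HTTP API). The helper name is hypothetical.
def splash_args_with_proxy(proxy, wait=0.5):
    """Build a SplashRequest args dict that asks Splash to fetch via `proxy`."""
    return {'wait': wait, 'proxy': proxy}

# Hypothetical usage in a spider:
# yield scrapy_splash.SplashRequest(url, callback=self.parse,
#     args=splash_args_with_proxy('http://1.2.3.4:8080'))
```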
Please help me, thank you!
Edited 4 hours later:
I have added the related settings in my project's settings file and written the downloader middleware below. As I mentioned before, this works for plain scrapy but not for scrapy_splash:
```python
import json
import random

class RandomIpProxyMiddleware(object):
    def __init__(self, ip=''):
        self.ip = ip
        ip_get()  # refreshes the IP pool file (defined elsewhere in my project)
        with open('carhome\\ip.json', 'r') as f:
            self.IPPool = json.loads(f.read())

    def process_request(self, request, spider):
        thisip = random.choice(self.IPPool)
        request.meta['proxy'] = "http://{}".format(thisip['ipaddr'])
```
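For completeness, the middleware also has to be registered in the project settings; a minimal fragment (the module path `carhome.middlewares` is an assumption based on the project name used above):

```python
# settings.py -- register the downloader middleware (the module path
# 'carhome.middlewares' is an assumption based on the project name above)
DOWNLOADER_MIDDLEWARES = {
    'carhome.middlewares.RandomIpProxyMiddleware': 543,
}
```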
And here is the code in the spider with scrapy_splash:
```python
yield scrapy_splash.SplashRequest(
    item, callback=self.parse, args={'wait': 0.5})
```
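A hedged sketch of how the same middleware idea might be adapted for Splash requests: instead of only setting `request.meta['proxy']` (which would affect the crawler-to-Splash hop, not the page fetch), put the proxy into the request's Splash render arguments. This assumes scrapy-splash stores those arguments under `request.meta['splash']['args']`; the class name and IP pool below are illustrative:

```python
import random

# Hypothetical pool; in the project above it is loaded from carhome\ip.json
IPPOOL = [{'ipaddr': '1.2.3.4:8080'}, {'ipaddr': '5.6.7.8:3128'}]

class RandomIpSplashProxyMiddleware:
    """Sketch: route the proxy into the Splash render args rather than
    request.meta['proxy'], assuming scrapy-splash keeps its arguments
    under request.meta['splash']['args']."""

    def process_request(self, request, spider):
        thisip = random.choice(IPPOOL)
        proxy = "http://{}".format(thisip['ipaddr'])
        if 'splash' in request.meta:
            # Ask the Splash server itself to fetch the page via the proxy.
            request.meta['splash']['args']['proxy'] = proxy
        else:
            # Plain scrapy request: the usual per-request proxy.
            request.meta['proxy'] = proxy
```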
And here is the code in the spider without this plugin:
```python
yield scrapy.Request(item, callback=self.parse)
```