
Can we set a proxy for the spider using scrapy_splash?

When I implemented a spider using Scrapy, I wanted to change its proxy so that the server wouldn't block my requests for coming too frequently from a single IP. I already know how to change the proxy in plain Scrapy, either with a middleware or by setting the proxy directly in the request's meta.



However, I use the scrapy_splash package to execute the JavaScript for my spider, and there I found it difficult to change the proxy because, as far as I understand, scrapy_splash itself uses a proxy server to render the website's JS for us.

In fact, the proxy works fine when I use only Scrapy, but it has no effect when I use scrapy_splash.



So is there any way to set a proxy for the requests made through scrapy_splash?

Please help. Thank you!

Edited 4 hours later:

I have set the related settings in settings.py and written this in middlewares.py. As I mentioned before, this only works for Scrapy but not for scrapy_splash:

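import json
import random


class RandomIpProxyMiddleware(object):
    """Downloader middleware that assigns a random proxy to each request."""

    def __init__(self, ip=''):
        self.ip = ip
        # ip_get() is a helper defined elsewhere in the project (not shown here).
        ip_get()
        # Load the proxy pool from disk.
        with open('carhome\\ip.json', 'r') as f:
            self.IPPool = json.loads(f.read())

    def process_request(self, request, spider):
        # Pick a random proxy and attach it to the outgoing request.
        thisip = random.choice(self.IPPool)
        request.meta['proxy'] = "http://{}".format(thisip['ipaddr'])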

And here is the code in the spider with scrapy_splash:

    yield scrapy_splash.SplashRequest(
        item, callback=self.parse, args={'wait': 0.5})

Here is the code in the spider without this plugin:

    yield scrapy.Request(item, callback=self.parse)
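
One direction that may be worth trying, sketched here under assumptions rather than as a confirmed fix: with scrapy_splash, the request Scrapy actually sends goes to the Splash server, so a proxy set in request.meta would not apply to the page fetch that Splash performs. Splash's HTTP API documents a `proxy` argument (a proxy profile name or a proxy URL), and the `args` dict of a SplashRequest is passed on to Splash. The helper name `splash_args_with_proxy` and the sample addresses below are hypothetical; the pool format mirrors the `ipaddr` entries used in the middleware above.

```python
import random


def splash_args_with_proxy(ip_pool, wait=0.5):
    """Build a SplashRequest `args` dict that carries a randomly chosen proxy.

    Assumes each pool entry is a dict with an 'ipaddr' key holding "host:port",
    as in the ip.json file read by the middleware in the question.
    """
    entry = random.choice(ip_pool)
    return {
        'wait': wait,
        # Splash's documented `proxy` argument accepts a proxy URL.
        'proxy': "http://{}".format(entry['ipaddr']),
    }


# Hypothetical pool using documentation-only addresses.
pool = [{'ipaddr': '203.0.113.10:8080'}, {'ipaddr': '203.0.113.11:8080'}]
args = splash_args_with_proxy(pool)

# In the spider (requires scrapy_splash), the dict would be used like this:
#     yield scrapy_splash.SplashRequest(item, callback=self.parse, args=args)
```

Whether the Splash instance can reach the chosen proxy depends on where Splash runs (for example, inside a Docker container), so this would need to be checked against the actual setup.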



   


