
Fixing garbled output from gzip responses with HttpClient

Author: Jesus_kk | Source: Internet | 2023-09-12 17:48


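■Suspected cause of the problem

The request was caught by anti-scraping protection because it carried no cookie. What the request returns is a piece of JavaScript that has to be run

to generate the cookie. Notice args1 in it: that is the key. The content below it is not an encoding either; it is simply obfuscated JavaScript.

Sites with anti-scraping protection require the request to carry some basic HTTP headers so that it looks like a browser.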
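As a minimal sketch of "carrying basic HTTP headers", the example below builds a GET request with browser-like headers. It uses the JDK's own java.net.http API (Java 11+) rather than the Apache HttpClient used in this post, so that it runs without extra dependencies. The class name BrowserLikeRequest, the header values, and the example URL are assumptions chosen for illustration. Headers alone may not be enough when the site also expects a cookie computed by JavaScript.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds a GET request that resembles a browser request.
final class BrowserLikeRequest {

    static HttpRequest build(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                // Illustrative values; copy real ones from a browser's developer tools.
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
                .header("Accept-Encoding", "gzip, deflate")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Only prints the headers; no network request is sent here.
        HttpRequest request = build("https://example.com/");
        request.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

With Apache HttpClient the same idea is expressed through httpGet.addHeader(name, value), as the code in this post already does for Accept-Encoding.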

https://www.jianshu.com/p/401a25134b89

■Preface

The output of running the following code

■Code

package com.sxz.timecontroal;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLDecoder;
import java.util.zip.GZIPInputStream;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class CheckTimeWithNet {

    static final String LOGINURL = "https://blog.csdn.net/sxzlc/article/list/3";

    public static void main(final String[] args) {
        // DefaultHttpClient is deprecated since HttpClient 4.3; kept as in the original.
        final DefaultHttpClient httpclient = new DefaultHttpClient();
        final HttpGet httpGet = new HttpGet(LOGINURL);
        httpGet.addHeader("Accept-Encoding", "gzip, deflate");

        final HttpResponse response;
        try {
            response = httpclient.execute(httpGet);
        } catch (final IOException ioException) {
            // Also covers ClientProtocolException, which extends IOException.
            ioException.printStackTrace();
            return;
        }

        // Verify that the response is HTTP 200 OK.
        final int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != HttpStatus.SC_OK) {
            System.out.println("Unexpected HTTP status: " + statusCode);
            return;
        }

        System.out.println("---------------------Status code Info Start---------------------");
        System.out.println(response.getStatusLine());
        System.out.println("---------------------Status code Info end ---------------------");

        System.out.println("---------------------Head Info Start---------------------");
        for (final Header h : response.getAllHeaders()) {
            System.out.println(h.getName() + ":" + h.getValue());
        }
        System.out.println("---------------------Head Info End ---------------------");

        String getResult = null;
        try {
            // response.setEntity(new GzipDecompressingEntity(response.getEntity()));
            // getResult = EntityUtils.toString(response.getEntity(), "UTF-8");
            getResult = getStringFromResponseUzip(response);
        } catch (final Exception exception) {
            // Print the failure instead of silently swallowing it.
            exception.printStackTrace();
        }
        System.out.println(getResult);
    }

    // Returns the body as text, decompressing it first when Content-Encoding says gzip.
    public static String getStringFromResponseUzip(final HttpResponse response) throws IOException {
        if (response == null) {
            return null;
        }
        for (final Header h : response.getHeaders("Content-Encoding")) {
            System.out.println(h.getValue());
            if (h.getValue().indexOf("gzip") > -1) {
                // gzip response: decompress, then decode the bytes as UTF-8.
                String responseText = "";
                try (InputStream in = response.getEntity().getContent();
                     InputStreamReader isr = new InputStreamReader(new GZIPInputStream(in), "UTF-8")) {
                    responseText = getStringFromStream(isr);
                    // responseText = URLDecoder.decode(responseText, "utf-8");
                } catch (final IOException exception) {
                    exception.printStackTrace();
                }
                return responseText;
            }
        }
        return EntityUtils.toString(response.getEntity(), "utf-8");
    }

    // Reads the whole stream line by line, joining lines with CRLF.
    public static String getStringFromStream(final InputStreamReader isr) throws IOException {
        final StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(isr)) {
            String tmp;
            while ((tmp = br.readLine()) != null) {
                sb.append(tmp);
                sb.append("\r\n");
            }
        }
        return sb.toString();
    }
}

■Output

---------------------Status code Info Start---------------------
HTTP/1.1 200 OK
---------------------Status code Info end ---------------------
---------------------Head Info Start---------------------
Server:Tengine
Date:Sat, 07 Dec 2019 12:20:38 GMT
Content-Type:text/html; charset=utf-8
Transfer-Encoding:chunked
Connection:keep-alive
Set-Cookie:acw_tc=2760820215757212385795097e52a909ebbcda96b20e30f4c216c0bfbc89e6;path=/;HttpOnly;Max-Age=2678401
Content-Encoding:gzip
cache-control:no-cache, no-store
Pragma:no-cache
Strict-Transport-Security:max-age=86400
---------------------Head Info End ---------------------
gzip

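■Follow-up

After decompression the content turns out to be hex-escaped text; still unresolved...

\x65 z

This is caused by URL encoding; decoding it with URLDecoder solves it.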
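One point worth checking here: a fragment such as \x65 has the form of a JavaScript-style hex escape, not a percent-encoded sequence (which would be %65), and URLDecoder leaves \xNN sequences untouched. Assuming the content really is made of \xNN escapes, the sketch below shows one way to turn them back into characters. The class name HexEscapeDecoder is hypothetical, and only two-digit escapes are handled, each mapped to a single character, so multi-byte UTF-8 sequences written as several \xNN escapes would need extra handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces JavaScript-style \xNN escapes with the characters they denote.
final class HexEscapeDecoder {

    // Matches a literal backslash, an 'x', and exactly two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        Matcher matcher = HEX_ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy the text before the escape, then the decoded character.
            out.append(input, last, matcher.start());
            out.append((char) Integer.parseInt(matcher.group(1), 16));
            last = matcher.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal "\\x65\\x7a" is the eight-character text \x65\x7a.
        System.out.println(decode("\\x65\\x7a")); // prints "ez"
    }
}
```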

Thanks to [gybao] for the help.

https://bbs.csdn.net/topics/395274030

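However, when I ran the earlier code again without URLDecoder, it simply succeeded.

How I produced the garbled result earlier is unclear. The suspected cause is recorded below.

■Code after further modification

Notes on the current latest code

When execution enters the branch at line 79 of the original listing (apparently the gzip branch), no garbling occurs whether or not line 85 (apparently the URLDecoder.decode call) is present.

Code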

package com.sxz.timecontroal;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLDecoder;
import java.util.zip.GZIPInputStream;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class CheckTimeWithNet {

    // static final String LOGINURL = "https://blog.csdn.net/sxzlc?orderby=ViewCount";
    static final String LOGINURL = "https://blog.csdn.net/sxzlc/article/list/2?orderby=ViewCount";

    public static void main(final String[] args) {
        // DefaultHttpClient is deprecated since HttpClient 4.3; kept as in the original.
        final DefaultHttpClient httpclient = new DefaultHttpClient();
        final HttpGet httpGet = new HttpGet(LOGINURL);
        httpGet.addHeader("Accept-Encoding", "gzip, deflate");

        final HttpResponse response;
        try {
            response = httpclient.execute(httpGet);
        } catch (final IOException ioException) {
            // Also covers ClientProtocolException, which extends IOException.
            ioException.printStackTrace();
            return;
        }

        // Verify that the response is HTTP 200 OK.
        final int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != HttpStatus.SC_OK) {
            System.out.println("Unexpected HTTP status: " + statusCode);
            return;
        }

        System.out.println("---------------------Status code Info Start---------------------");
        System.out.println(response.getStatusLine());
        System.out.println("---------------------Status code Info end ---------------------");

        System.out.println("---------------------Head Info Start---------------------");
        for (final Header h : response.getAllHeaders()) {
            System.out.println(h.getName() + ":" + h.getValue());
        }
        System.out.println("---------------------Head Info End ---------------------");

        String getResult = null;
        try {
            // response.setEntity(new GzipDecompressingEntity(response.getEntity()));
            // getResult = EntityUtils.toString(response.getEntity(), "UTF-8");
            getResult = getStringFromResponseUzip(response);
        } catch (final Exception exception) {
            // Print the failure instead of silently swallowing it; getResult stays null.
            exception.printStackTrace();
        }
        System.out.println(getResult);
    }

    // Returns the body as text, decompressing it first when Content-Encoding says gzip.
    public static String getStringFromResponseUzip(final HttpResponse response) throws IOException {
        if (response == null) {
            return null;
        }
        for (final Header h : response.getHeaders("Content-Encoding")) {
            System.out.println(h.getValue());
            if (h.getValue().indexOf("gzip") > -1) {
                // gzip response: decompress, then decode the bytes as UTF-8.
                String responseText = "";
                try (InputStream in = response.getEntity().getContent();
                     InputStreamReader isr = new InputStreamReader(new GZIPInputStream(in), "UTF-8")) {
                    responseText = getStringFromStream(isr);
                    // Throws IllegalArgumentException (not an IOException) on a malformed
                    // '%' sequence, in which case the exception propagates to main.
                    responseText = URLDecoder.decode(responseText, "UTF-8");
                } catch (final IOException exception) {
                    exception.printStackTrace();
                }
                System.out.println("---------------------is gzip---------------------");
                return responseText;
            }
        }
        System.out.println("---------------------is not gzip---------------------");
        return EntityUtils.toString(response.getEntity(), "utf-8");
    }

    // Reads the whole stream line by line, joining lines with CRLF.
    public static String getStringFromStream(final InputStreamReader isr) throws IOException {
        final StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(isr)) {
            String tmp;
            while ((tmp = br.readLine()) != null) {
                sb.append(tmp);
                sb.append("\r\n");
            }
        }
        return sb.toString();
    }
}

Result of running the code above

When the GET request does not ask for gzip:

---

Suspected cause of the problem:

-------------------------------------------------------

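■Suspected cause

The site is probably still doing some special processing on its side.

It worked in the morning because the response returned by the site was not gzip-compressed,

whereas in the afternoon the same URL returned a gzip-compressed response, so it could not be parsed correctly.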
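When the same URL may or may not come back compressed, one defensive option is to inspect the body itself rather than rely only on the Content-Encoding header: a gzip stream always starts with the two magic bytes 0x1F 0x8B. The sketch below is a self-contained illustration of that check using only the JDK (Java 9+ for readAllBytes). The class GzipSniffer and its method names are hypothetical, and the sample HTML is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decodes a response body whether or not it is gzip-compressed.
final class GzipSniffer {

    // A gzip stream begins with the magic bytes 0x1F 0x8B.
    static boolean looksGzipped(byte[] body) {
        return body.length >= 2 && (body[0] & 0xFF) == 0x1F && (body[1] & 0xFF) == 0x8B;
    }

    // Decompresses when needed, then decodes the bytes as UTF-8.
    static String decodeBody(byte[] body) throws IOException {
        if (!looksGzipped(body)) {
            return new String(body, StandardCharsets.UTF_8);
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Test helper: compresses text so the example needs no network access.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>\u4f60\u597d</body></html>";
        // Both the plain and the compressed body decode to the same text.
        System.out.println(decodeBody(html.getBytes(StandardCharsets.UTF_8)).equals(html));
        System.out.println(decodeBody(gzip(html)).equals(html));
    }
}
```

With Apache HttpClient the raw bytes could be obtained with EntityUtils.toByteArray(response.getEntity()) before passing them to such a helper.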

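■Symptom 1

Running the same code again in the afternoon produced garbled output again.

Adding the decode step did not help either (lines 87 and 88 of the original listing); presumably decoding failed and null was returned directly.

■Symptom 2

In the morning, running curl against the URL above in a cmd window

returned the page HTML. In the afternoon it no longer did, and the result looked like the following.

■Checking the URL encoding

I extracted part of the garbled text above and confirmed that it is URL-encoded, but decoding the entire string returns null.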
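One possible explanation for the null, not confirmed by the original post: URLDecoder.decode throws IllegalArgumentException when the input contains a '%' that is not followed by two hex digits, which is common in HTML and CSS (for example "width: 100%;"). If that exception is swallowed by a surrounding catch block, the result variable keeps its initial null value. The sketch below illustrates the behaviour; the class UrlDecodePitfall and its sample strings are made up, and the Charset overload of decode requires Java 10+.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding a fragment works, decoding arbitrary markup may not.
final class UrlDecodePitfall {

    // Returns the decoded text, or null when the input is not valid percent-encoding.
    static String tryDecode(String text) {
        try {
            return URLDecoder.decode(text, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException malformed) {
            // Raised for '%' followed by non-hex characters or by fewer than two characters.
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed percent-encoded fragment decodes normally.
        System.out.println(tryDecode("%E4%BD%A0%E5%A5%BD"));
        // A stray '%' in markup makes the whole decode fail, so this prints "null".
        System.out.println(tryDecode("<div style=\"width: 100%;\">a+b</div>"));
    }
}
```

Note also that URLDecoder turns every '+' into a space, so running it over a whole HTML page would alter the content even when it does not throw.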

■Additional note

Also, the garbled response seems to contain far less content than the full page HTML returned in the morning!

-------------------------------------------------------

■Follow-up (result note 1)

As for why the response is sometimes gzip and sometimes not:

the suspected reason is load balancing, so each request may reach a different server.
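One way to test the load-balancing hypothesis is to repeat the request and count how often each combination of Server and Content-Encoding headers appears. The sketch below does this with the JDK's java.net.http client (Java 11+) instead of Apache HttpClient; the class BackendFingerprint, its fingerprint method, and the command-line arguments are assumptions made for illustration. It sends requests only when a URL is passed on the command line.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: tallies which server/encoding combinations answer repeated requests.
final class BackendFingerprint {

    // Summarises the two headers that differed between the morning and afternoon runs.
    static String fingerprint(HttpHeaders headers) {
        String server = headers.firstValue("Server").orElse("(none)");
        String encoding = headers.firstValue("Content-Encoding").orElse("(none)");
        return "Server=" + server + ", Content-Encoding=" + encoding;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java BackendFingerprint.java <url> [attempts]");
            return;
        }
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0]))
                .header("Accept-Encoding", "gzip, deflate")
                .GET()
                .build();

        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < attempts; i++) {
            // The body is discarded; only the response headers matter here.
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            counts.merge(fingerprint(response.headers()), 1, Integer::sum);
        }
        counts.forEach((key, count) -> System.out.println(count + " x " + key));
    }
}
```

If more than one fingerprint shows up across the attempts, that would be consistent with different backends answering, although it would not prove it.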

Two Nginx-based distributions (OpenResty and Tengine)

・Server header for the gzip response: Tengine

------------------------------------

・Server header for the non-gzip response

------------------------------------

TODO

------------------------------------
