Nokogiri Works with XML, Not So Much with HTML

Author: 手机用户2502927973 | Source: Internet | 2023-05-27 13:50

I'm having an issue getting Nokogiri to work properly. I'm using version 1.4.4 with Ruby 1.9.2.

I have both libxml2 and libxslt installed and up to date. When I run a Ruby script against XML, it works great.

require 'nokogiri'

# Parse the local XML file and print the text of every <name> element.
doc = Nokogiri::XML(File.open("test.xml"))
doc.css("name").each do |node|
  puts node.text
end

At the command line, running ruby test.rb returns:

Name 1
Name 2
Name 3

And the crowd goes wild. I tweak a few things, make a few adjustments to the code...

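require 'nokogiri'
require 'open-uri'

# Fetch the page and print the text of every <p> element.
# (On Ruby 3.0 and later, call URI.open here; open-uri no longer patches Kernel#open.)
doc = Nokogiri::HTML(open("http://domain.tld"))
doc.css("p").each do |node|
  puts node.text
end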

Back at the command line, ruby test.rb returns... nothing! Just a new, empty line.

Is there any reason that it will work with an XML file, but not HTML?

1 Solution

#1

To debug this sort of problem we need more information from you. Since you're not giving a working URL, and because we know that Nokogiri works fine for this sort of problem, the debugging falls on you.

Here's what I would do to test:

In IRB:

1. Do you get output when you do: open('http://whateverURLyouarehiding.com').read

2. If that returns a valid document, what do you get when you wrap the previous open statement in Nokogiri::HTML(...)? Keep the .read from the previous step, so that Nokogiri receives the body of the page, NOT an IO stream.

3. Try #2 above, but remove the .read. That will tell you whether Nokogiri has a problem reading an IO stream, though I seriously doubt it does, since I use it that way all the time. If that fails, I'd suspect a problem on your system.

4. If you're getting a document in #2 and #3, then the problem could be in your accessor; I suspect what you're looking for doesn't exist.

5. If it does exist, then check the value of doc.errors after Nokogiri parses the document. It could be finding errors in the document, and, if so, they'll be captured there.
