
Nokogiri - Works with XML, not so much with HTML


I'm having an issue getting Nokogiri to work properly. I'm using version 1.4.4 with Ruby 1.9.2.


I have both libxml2 and libxslt installed and up to date. When I run a Ruby script against an XML file, it works great.


require 'nokogiri'

# Parse the local XML file and print the text of every <name> element
doc = Nokogiri::XML(File.read("test.xml"))
doc.css("name").each do |node|
  puts node.text
end

At the command line I run ruby test.rb, which returns:


Name 1
Name 2
Name 3

And the crowd goes wild. I tweak a few things, make a few adjustments to the code...


require 'nokogiri'
require 'open-uri'

# Fetch the page (open-uri lets Kernel#open read URLs in Ruby 1.9) and print every <p>
doc = Nokogiri::HTML(open("http://domain.tld"))
doc.css("p").each do |node|
  puts node.text
end

Back at the command line, ruby test.rb returns... nothing! Just a new, empty line.


Is there any reason it would work with an XML file but not with HTML?


1 answer

#1



To debug this sort of problem, we need more information from you. Since you haven't given a working URL, and since we know Nokogiri handles this kind of task fine, the debugging falls to you.


Here's what I would do to test:


In IRB:

  1. Do you get output when you run: open('http://whateverURLyouarehiding.com').read

  2. If that returns a valid document, what do you get when you wrap the previous open statement in Nokogiri::HTML(...)? Keep the .read from the previous step, so Nokogiri receives the body of the page, NOT an IO stream.

  3. Try #2 above, but remove the .read. That will tell you whether Nokogiri has a problem reading an IO stream, though I seriously doubt it does, since I use it all the time. If it fails at that point, I'd suspect a problem on your system.

  4. If you're getting a document in #2 and #3, then the problem could be in your accessor (the CSS selector you pass to doc.css); I suspect what you're looking for doesn't exist.

  5. If it does exist, then check the value of doc.errors after Nokogiri parses the document. It could be finding errors in the document, and, if so, they'll be captured there.

