
Parser error: XML declaration allowed only at the start of the document


I have an XML file that contains multiple XML declarations, like the following:



 
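(Apart from element1, which the second answer refers to, the tag names and declaration attributes below are placeholders.)

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <item>
  <element1>Stefan</element1>
  <element2>42</element2>
  <element3>Shirt</element3>
  <element4>3000</element4>
 </item>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
 <item>
  <element1>Damon</element1>
  <element2>32</element2>
  <element3>Jeans</element3>
  <element4>4000</element4>
 </item>
</root>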


When I try to load the XML with

$data = simplexml_load_file("testdoc.xml") or die("Error: Cannot create object");

it gives me the following errors:

Warning: simplexml_load_file(): testdoc.xml:11: parser error : XML declaration allowed only at the start of the document in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file():  in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): ^ in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): testdoc.xml:12: parser error : Extra content at the end of the document in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file():  in C:\xampp\htdocs\crea\services\testxml.php on line 3

Warning: simplexml_load_file(): ^ in C:\xampp\htdocs\crea\services\testxml.php on line 3
Error: Cannot create object

Please let me know how to parse this XML, or how to split it into a number of separate XML files that I can read. The file size is around 1 GB.

2 answers

#1


4  

The second <?xml ... ?> declaration line needs to be removed. Only one XML declaration is allowed in a file, and it must be at the very start of the document.
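To confirm that stray declarations are the problem, it can help to count them before changing anything. The following is a minimal sketch, assuming a POSIX shell with grep; `sample.xml` and its contents are a made-up stand-in for the real file.

```shell
# Build a small stand-in for the real file (hypothetical content).
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Stefan</element1>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Damon</element1>
</root>
EOF

# Count the lines that contain an XML declaration; [?] matches a literal "?".
decl_count=$(grep -c '<[?]xml ' sample.xml)
echo "declarations: $decl_count"

# List the line number of each declaration; every hit after line 1 is an error.
decl_lines=$(grep -n '<[?]xml ' sample.xml | cut -d: -f1 | tr '\n' ' ')
echo "on lines: $decl_lines"
```

grep streams the file, so the same check should also be practical on a file of around 1 GB.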

Strictly speaking, you also need to have a single root element (though I've seen lenient parsers). Just wrap the contents in a pseudo tag (called wrapper here as a placeholder), so that your file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<wrapper>
    <root>
        ...
    </root>
    <root>
        ...
    </root>
</wrapper>

Solution for (very) large files:


Use sed to remove the offending XML declarations and printf to add a single XML declaration plus a single enclosing root element. A sequence of bash commands follows (wrapper is a placeholder name for the new root element):

  printf "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<wrapper>\n" >out.xml
  sed '/<[?]xml /d' in.xml >>out.xml
  printf "\n</wrapper>\n" >>out.xml

in.xml denotes your original file, out.xml the purged result.

printf prints a single XML declaration and the opening and closing tags. sed edits a file line by line, performing actions that depend on regex pattern matches. The pattern to match is the start of the XML declaration (<?xml followed by a space). The question mark is written as [?] so that it is always matched literally; in GNU sed, \? would instead act as a "zero or one" quantifier and the pattern would match any line containing "xml ". The action to perform is to delete that line.
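The sed approach can be tried end to end on a small sample before running it on the full file. This is only a sketch under assumptions: the sample content and the `wrapper` root name are invented for illustration, and the pattern is written as `<[?]xml ` so that the question mark is matched literally.

```shell
# Hypothetical two-document input, standing in for the real in.xml.
cat > in.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Stefan</element1>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Damon</element1>
</root>
EOF

# Write one declaration and an opening wrapper tag, then the input with
# every declaration line deleted, then the closing wrapper tag.
{
  printf '<?xml version="1.0" encoding="UTF-8"?>\n<wrapper>\n'
  sed '/<[?]xml /d' in.xml
  printf '</wrapper>\n'
} > out.xml

cat out.xml

# Sanity checks: one declaration left, both original root elements kept.
remaining=$(grep -c '<[?]xml ' out.xml)
roots=$(grep -c '<root>' out.xml)
echo "declarations: $remaining, root elements: $roots"
```

Grouping the three commands in braces opens out.xml once instead of three times; the result is the same as appending with >>.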

Notes:

  • The backslashes in the commands escape characters that would otherwise have a special meaning at the position where they occur.
  • sed is also available for Windows and macOS.

Alternate solution

Another option is to split the file into individual well-formed files (taken from a Stack Overflow answer):

csplit -z -f 'temp' -b 'out%03d.xml' in.xml '/<\?xml /' {*}

This produces files named tempout000.xml, tempout001.xml, ... (csplit joins the -f prefix and the -b suffix). To be safe with the automatic numbering, you should know at least the order of magnitude of the number of individual files that were concatenated into your input file. Alternatively, take the size of the input file in bytes as an upper bound and use -b 'out%09d.xml' in the above command.
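The splitting step can be checked on a small sample first. The sketch below assumes GNU csplit (the -z and -b options and the {*} repeat count are GNU extensions); the sample content and the `part` prefix are hypothetical.

```shell
# Hypothetical two-document input, standing in for the real in.xml.
cat > in.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Stefan</element1>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Damon</element1>
</root>
EOF

# -s: do not print byte counts; -z: drop empty pieces;
# -f/-b: each output name is the prefix followed by the formatted suffix.
csplit -s -z -f 'part' -b '%03d.xml' in.xml '/<[?]xml /' '{*}'

# Each piece should now hold exactly one declaration.
for piece in part*.xml; do
  printf '%s: %s declaration(s)\n' "$piece" "$(grep -c '<[?]xml ' "$piece")"
done
```

Each resulting piece is a standalone document that simplexml_load_file() should be able to read on its own.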

#2


1  

This is not valid XML. You will need to use string functions to split it or, more precisely, to read it part by part.

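// Each embedded document starts with an XML declaration; matching its
// opening characters is enough to find the boundaries.
$xmlDeclaration = '<?xml ';
$filename = 'testdoc.xml'; // the file from the question

$file = new SplFileObject($filename, 'r');
$file->setFlags(SplFileObject::SKIP_EMPTY);
$buffer = '';
foreach ($file as $line) {
  if (FALSE === strpos($line, $xmlDeclaration)) {
    // Still inside the current document: keep collecting lines.
    $buffer .= $line;
  } else {
    // A new declaration starts: process the finished document first.
    outputBuffer($buffer);
    $buffer = $line;
  }
}
// Process the last document in the file.
outputBuffer($buffer);

// Parse one complete document and print the text of its first element1.
function outputBuffer($buffer) {
  if (!empty($buffer)) {
    $dom = new DOMDocument();
    $dom->loadXml($buffer);
    $xpath = new DOMXPath($dom);
    echo $xpath->evaluate('string(//element1)'), "\n";
  }
}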

Output:


Stefan
Damon
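The same part-by-part idea can also be sketched in the shell, for example to pre-split the file where csplit is not available. This assumes a POSIX awk; the `doc%03d.xml` naming and the sample content are hypothetical, and the sed extraction at the end is only a quick check, not a real XML parser.

```shell
# Hypothetical two-document input, standing in for the real file.
cat > in.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Stefan</element1>
</root>

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <element1>Damon</element1>
</root>
EOF

# Start a new output file at every declaration line, closing the previous
# one so that only a single output file is open at any time.
awk '/<[?]xml /{ if (out) close(out); out = sprintf("doc%03d.xml", n++) }
     out { print > out }' in.xml

# Quick check: pull the text of element1 out of each piece.
names=$(sed -n 's:.*<element1>\(.*\)</element1>.*:\1:p' doc*.xml)
echo "$names"
```

Like the PHP loop, this reads one line at a time, so memory use does not depend on the size of the input file.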
