
Efficiently parsing a 100 GB XML file

The file I have is a bit unstructured and messy. It is called foo.xml, is 100 GB in size, and looks like this:


    
         
             some_path_1
             another_path_1
         
    
    
        some_text_again
        some_text_again
    
 .
 .
 .
 

The expected output I need is:


some_path_1
another_path_1
attrib: string=blah
some_text_again
attrib: attribs=yes, labs=check
some_text_again

Currently I am using the lxml parser, like this:

from lxml import etree

# etree.parse builds the whole tree in memory before iteration starts
root = etree.parse('foo.xml').getroot()
for i in root.iterchildren():
    pass  # do something

What would be a better way to do this, given that the file is 100 GB?

2 Answers

#1



I had the same problem with a huge file and found that I had to parse it incrementally.


import xml.etree.ElementTree as ET

# Request only "end" events, so each element is complete when it is yielded
context = ET.iterparse(result_file_name, events=["end"])
# turn it into an iterator
context = iter(context)
for event, elem in context:
    if event == "end":
        .....
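
To make the incremental approach concrete, here is a minimal sketch using only the standard library. The function name `iter_lines`, the sample document, and its element names are hypothetical (the real structure of foo.xml is not shown in full), and the order of the emitted lines follows the order in which elements close, which may differ from the exact output requested in the question. The part that matters for a 100 GB file is that each element is cleared once it has been handled, and the root is emptied after each of its direct children, so memory use stays roughly constant.

```python
import io
import xml.etree.ElementTree as ET


def iter_lines(source):
    """Yield one output line per text node or attribute set, element by element."""
    # "start" events are requested only to grab the root and track nesting depth.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is always the start of the root
    depth = 0  # number of currently open elements below the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        if elem is root:
            continue  # closing tag of the root: nothing left to emit
        depth -= 1
        text = (elem.text or "").strip()
        if text:
            yield text
        if elem.attrib:
            pairs = ", ".join(f"{name}={value}" for name, value in elem.attrib.items())
            yield "attrib: " + pairs
        elem.clear()  # drop this element's text, attributes and children
        if depth == 0:
            root.clear()  # detach finished top-level children from the root


# Hypothetical sample; a real run would pass the path "foo.xml" instead.
sample = io.BytesIO(
    b'<root>'
    b'<item string="blah"><path>some_path_1</path><path>another_path_1</path></item>'
    b'<item attribs="yes" labs="check">some_text_again</item>'
    b'</root>'
)
for line in iter_lines(sample):
    print(line)
```

Because `iter_lines` is a generator, the lines can be written to an output file as they are produced rather than collected in a list. lxml offers an `iterparse` function with a similar interface, so the same pattern should carry over if lxml is preferred.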

#2



Using XSLT 3.0 with streaming enabled, this would be:


  
  
  
  {.}&#xa;
  attrib: {
     string-join(@* ! (name() || '=' || .), ', ')
  }


