Author: 书友36110188 | Source: Internet | 2024-10-19 19:03
The file I have is a bit unstructured and messy. I have foo.xml, a 100 GB file that looks like this:
some_path_1
another_path_1
some_text_again
some_text_again
.
.
.
The expected output I need is:
some_path_1
another_path_1
attrib: string=blah
some_text_again
attrib: attribs=yes, labs=check
some_text_again
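The mapping from input to expected output can be sketched as follows, assuming each element's text is printed first and its attributes, if any, follow on a single `attrib:` line, walking elements in document order. The markup in `sample` and the helper name `record_lines` are hypothetical, since the element names are not shown above; the sketch uses the standard library's `xml.etree.ElementTree`, whose `Element.iter()` visits a subtree depth-first in document order.

```python
import xml.etree.ElementTree as ET


def record_lines(record):
    """Return the output lines for one record (hypothetical helper).

    For every element in the record, in document order, emit its stripped
    text, then an 'attrib: key=value, ...' line if it has attributes.
    """
    lines = []
    for elem in record.iter():  # depth-first, document order
        text = (elem.text or "").strip()
        if text:
            lines.append(text)
        if elem.attrib:
            pairs = ", ".join(f"{k}={v}" for k, v in elem.attrib.items())
            lines.append(f"attrib: {pairs}")
    return lines


# Hypothetical markup: element names are invented for illustration only.
sample = ET.fromstring(
    '<record>some_path_1'
    '<path string="blah">another_path_1</path>'
    '<item attribs="yes" labs="check">some_text_again</item>'
    '<item>some_text_again</item>'
    '</record>'
)

for line in record_lines(sample):
    print(line)
```

With this assumed markup the loop prints the six expected lines shown above. Whether an `attrib:` line belongs to the element whose text precedes it is itself an assumption about the real file.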
Currently I am using the lxml parser, like this:
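from lxml import etree

# etree.parse() builds the whole document tree in memory before the loop runs
root = etree.parse('foo.xml').getroot()
for child in root.iterchildren():
    pass  # do something with each child element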
What would be a better way to do this, given that the file is 100 GB?
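A common alternative for a file this size is to stream it with `iterparse` instead of calling `etree.parse()`: each top-level record is handled as soon as its closing tag is read, then detached from the root so the in-memory tree never grows to the size of the file. The sketch below is a minimal version of that technique, assuming the records of interest are the direct children of the root (as in the `iterchildren()` loop above). The generator name `iter_records` is hypothetical, and it uses the standard library's `xml.etree.ElementTree.iterparse`; `lxml.etree.iterparse` also accepts an `events` argument and its elements support `clear()`, so the same structure should carry over.

```python
import xml.etree.ElementTree as ET


def iter_records(source):
    """Yield each direct child of the root element, fully parsed, one at a time.

    `source` is a filename or a binary file object. After a record has been
    handed out it is detached from the root, so memory use depends on the
    size of one record rather than on the whole file.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first 'start' event belongs to the root
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has just been closed
            yield elem  # the record and its whole subtree are complete here
            # Detach processed records from the root (this also discards the
            # root's own text and attributes, which the loop never uses).
            root.clear()
```

The original loop then becomes `for child in iter_records('foo.xml'):`. Memory stays bounded by the largest single record, so a file whose data sits inside one enormous top-level element would need a finer-grained split.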
2 solutions