You can simply strip out all the tags:
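>>> import re
>>> txt = """
... <bookstore>
... <book>
... <title>Everyday Italian</title>
... <author>Giada De Laurentiis</author>
... <year>2005</year>
... <price>300.00</price>
... </book>
... <book>
... <title>Harry Potter</title>
... <author>J K. Rowling</author>
... <year>2005</year>
... <price>625.00</price>
... </book>
... </bookstore>
... """
>>> exp = re.compile(r'<.*?>')  # non-greedy: each match ends at the first '>'
>>> text_only = exp.sub('', txt).strip()
>>> text_only
'Everyday Italian\nGiada De Laurentiis\n2005\n300.00\n\n\nHarry Potter\nJ K. Rowling\n2005\n625.00'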
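A regular expression can be fooled by XML that is legal but unusual, such as a `>` inside an attribute value. As an alternative that the answer itself does not use, here is a minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the input is well-formed XML; `strip_tags` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_text):
    """Return the non-blank text pieces of an XML document, in document order."""
    root = ET.fromstring(xml_text)
    # itertext() yields each element's text and each child's tail, in order;
    # whitespace-only pieces (the newlines between tags) are dropped
    return [piece.strip() for piece in root.itertext() if piece.strip()]


txt = """<bookstore>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>300.00</price>
</book>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>625.00</price>
</book>
</bookstore>"""

print(strip_tags(txt))
```

For example, `strip_tags('<a b="1 > 0">hi</a>')` returns `['hi']`, whereas a non-greedy `<.*?>` regex would leave `0">hi` behind.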
But if all you want is to search a file for some text on Linux, you can use grep:
burhan@sandbox:~$ grep "Harry Potter" file.xml
<title>Harry Potter</title>
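If grep isn't available (on Windows, for example), roughly the same line-by-line search can be done in Python. This is a minimal sketch, not a grep replacement: it assumes a UTF-8 file and does a plain substring match rather than grep's regular-expression matching. `grep_lines` is a hypothetical helper name, not a library function.

```python
def grep_lines(path, needle):
    """Yield (line_number, line) for each line of the file that contains needle."""
    # Assumes the file is UTF-8 encoded
    with open(path, encoding="utf-8") as f:
        # Number lines from 1, as `grep -n` does
        for number, line in enumerate(f, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

Usage: `for number, line in grep_lines("file.xml", "Harry Potter"): print(f"{number}:{line}")` prints each matching line prefixed with its line number, similar to `grep -n`.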
To search the file, either use the grep command above, or open the file and search it in Python:
>>> import re
>>> exp = re.compile(r'<.*?>')
>>> with open('file.xml') as f:
...     text_only = exp.sub('', f.read()).strip()
...
>>> if 'Harry Potter' in text_only:
...     print('It exists')
... else:
...     print('It does not')
...
It exists
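Searching the stripped text also reports a hit if "Harry Potter" appears in an author name or any other field. If the goal is specifically a book with that title, a structured lookup may fit better. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the file contains `<book>` elements with `<title>` children (those element names are an assumption about the file's layout); `find_books_by_title` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def find_books_by_title(source, title):
    """Return every <book> element whose <title> text equals title.

    source may be a filename or a binary file object. The <book>/<title>
    element names are assumed, not taken from a known schema.
    """
    root = ET.parse(source).getroot()
    # iter() visits the root and all descendants, so nesting depth does not matter;
    # findtext() returns None when <title> is missing, hence the `or ""`
    return [
        book
        for book in root.iter("book")
        if (book.findtext("title") or "").strip() == title
    ]
```

Usage: `if find_books_by_title("file.xml", "Harry Potter"): print("It exists")`. Comparing the title text directly, instead of building an XPath string, avoids quoting problems when the title itself contains an apostrophe.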