作者:Qualcommtjmag_716 | 来源:互联网 | 2023-05-18 22:35
IamtryingtoedittextusingpythonregexthatoriginatesfromanMSWorddocumentthathasbeenc
I am trying to edit text using python regex that originates from an MS Word document that has been created by someone else. The document has specific formatting and equations that need to be preserved. I save the .docx file as a .xml and edit with python. Unfortunately, Word adds XML tags that split the words and messes with my regular expressions. Example(this is the format that Word outputs):
我正在尝试使用python正则表达式编辑文本,该正则表达式源自其他人创建的MS Word文档。该文档具有需要保留的特定格式和方程式。我将.docx文件保存为.xml并使用python进行编辑。不幸的是,Word添加了XML标签,用我的正则表达式分隔单词和混乱。示例(这是Word输出的格式):
awesome
敬畏
some
I have attempted to remove the tags with regular expressions and have had little success. Any help is appreciated.
我试图用正则表达式删除标签,但收效甚微。任何帮助表示赞赏。
EDIT: The solution does not have to incorporate Python or regex
编辑:解决方案不必包含Python或正则表达式
1 个解决方案