Python: How to split a string by non-alpha characters

Author: 手机用户2502901575_836 | Source: Internet | 2023-07-16 08:14

I'm trying to use Python to parse lines of C++ source code. The only thing I am interested in is include directives.

    #include "header.hpp"

I want it to be flexible and still work with poor coding styles like:

          #   include"header.hpp"

I have gotten to the point where I can read lines and trim whitespace before and after the #. However, I still need to find out which directive it is by reading the string until a non-alpha character is encountered, regardless of whether it is a space, quote, tab or angle bracket.

So basically my question is: how can I split a string that starts with alpha characters at the first non-alpha character?

I think I might be able to do this with a regex, but I have not found anything in the documentation that looks like what I want.

Also, if anyone has advice on how to get the file name inside the quotes or angle brackets, that would be a plus.

7 answers

#1

You can do that with a regex. However, you can also use a simple while loop.

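def splitnonalpha(s):
    # Start at index 1 so the leading '#' stays in the first part.
    pos = 1
    # The source is truncated after "while pos"; the condition and return
    # below are reconstructed to match the test output.
    while pos < len(s) and s[pos].isalpha():
        pos += 1
    return (s[:pos], s[pos:])

Test:
>>> splitnonalpha('#include"blah.hpp"')
('#include', '"blah.hpp"')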
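
The regex route mentioned in this answer might look like the sketch below. The helper name `split_directive` and the choice to return `None` for lines that are not directives are illustrative assumptions, not part of the original answer.

```python
import re

# Optional whitespace, '#', optional whitespace, then a run of ASCII letters.
_DIRECTIVE = re.compile(r'\s*#\s*([A-Za-z]+)')

def split_directive(line):
    """Return (directive, remainder), or None if the line is not a directive.

    Hypothetical helper: the name and return convention are assumptions.
    """
    m = _DIRECTIVE.match(line)
    if m is None:
        return None
    # m.end() is the index of the first character after the directive name.
    return m.group(1), line[m.end():]

print(split_directive('   #   include"header.hpp"'))  # ('include', '"header.hpp"')
print(split_directive('#define X 1'))                  # ('define', ' X 1')
print(split_directive('int main() {}'))                # None
```

Unlike the while loop, this version also tolerates whitespace around the `#`, so the line does not need to be trimmed first.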

                        
                           
							  
#2

Your instinct on using regex is correct.

import re
re.split(r'[^a-zA-Z]', string_to_split)

The [^a-zA-Z] part means "any character that is not an ASCII letter".
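
One thing to keep in mind is that `re.split` splits at every non-letter, not just the first one. A minimal sketch of the difference, using the standard `maxsplit` argument (the sample string is made up for illustration):

```python
import re

s = 'include"header.hpp"'

# Splitting on every non-letter breaks the file name apart and leaves a
# trailing empty string where the closing quote was.
print(re.split(r'[^a-zA-Z]', s))              # ['include', 'header', 'hpp', '']

# maxsplit=1 stops after the first non-letter, so the remainder stays intact.
print(re.split(r'[^a-zA-Z]', s, maxsplit=1))  # ['include', 'header.hpp"']
```

Note that the separator itself (here the opening quote) is consumed by the split.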
							     
							                          
                           
							  
#3

You can use a regex. The \W token matches all non-word characters (roughly the same as non-alphanumeric). Word characters are A-Z, a-z, 0-9, and _. If you want to match underscores as well, you can use [\W_].
>>> import re
>>> line = '#   include"header.hpp"  ' 
>>> m = re.match(r'^\s*#\s*include\W+([\w\.]+)\W*$', line)
>>> m.group(1)
'header.hpp'
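
The pattern above only accepts names made of word characters and dots, so a header such as `<sys/types.h>` would not match. A possible variant that accepts anything between the delimiters is sketched below; `parse_include` and its `(filename, is_system)` return value are hypothetical names chosen for illustration.

```python
import re

# Either "name" or <name>; the name may contain slashes, dashes, and so on.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')

def parse_include(line):
    """Return (filename, is_system), or None if this is not an include line.

    Hypothetical helper: is_system is True for the <...> form.
    """
    m = INCLUDE_RE.match(line)
    if m is None:
        return None
    quoted, angled = m.groups()
    if quoted is not None:
        return quoted, False
    return angled, True

print(parse_include('#   include"header.hpp"  '))  # ('header.hpp', False)
print(parse_include('#include <sys/types.h>'))     # ('sys/types.h', True)
print(parse_include('#define X 1'))                # None
```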

							     
							                          
                           
							  
#4

import re
s = 'foo bar- blah/hm.lala'
print(re.findall(r"\w+",s))
 
Output: ['foo', 'bar', 'blah', 'hm', 'lala']
							     
							                          
                           
							  
#5

In my opinion, the two best options mentioned by others are re.split and re.findall:
>>> import re
>>> re.split(r'\W+', '#include "header.hpp"')
['', 'include', 'header', 'hpp', '']
>>> re.findall(r'\w+', '#include "header.hpp"')
['include', 'header', 'hpp']
 
A quick benchmark:
>>> import timeit
>>> setup = r"import re; word_pattern = re.compile(r'\w+'); sep_pattern = re.compile(r'\W+')"
>>> iterations = 10**6
>>> timeit.timeit(r"re.findall(r'\w+', '#header foo bar!')", setup=setup, number=iterations)
3.000092029571533
>>> timeit.timeit("word_pattern.findall('#header foo bar!')", setup=setup, number=iterations)
1.5247418880462646
>>> timeit.timeit(r"re.split(r'\W+', '#header foo bar!')", setup=setup, number=iterations)
3.786440134048462
>>> timeit.timeit("sep_pattern.split('#header foo bar!')", setup=setup, number=iterations)
2.256173849105835
 
The functional difference is that re.split keeps empty tokens. That is usually not useful for tokenization, but the following gives the same result as the re.findall solution:
>>> list(filter(bool, re.split(r'\W+', '#include "header.hpp"')))
['include', 'header', 'hpp']
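
To rerun a comparison like this outside the interactive prompt, a standalone script along the following lines could be used. The function names and the reduced iteration count are assumptions for illustration, and the timings will differ by machine and Python version.

```python
import re
import timeit

TEXT = '#header foo bar!'
WORD = re.compile(r'\w+')
SEP = re.compile(r'\W+')

def with_findall():
    return WORD.findall(TEXT)

def with_split():
    # Drop the empty strings that split leaves at the edges.
    return [tok for tok in SEP.split(TEXT) if tok]

# Both approaches should produce the same tokens.
assert with_findall() == with_split()

for fn in (with_findall, with_split):
    seconds = timeit.timeit(fn, number=100_000)
    print(f'{fn.__name__}: {seconds:.3f} s')
```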

							     
							                          
                           
							  
#6

While not exact, most parsers match include directives with something like this:
(?m)^\h*#\h*include\h*["<](\w[\w.]*)\h*[">]
Here (?m) is multi-line mode and \h is horizontal whitespace (equivalent to [^\S\r\n]).
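
Python's built-in `re` module does not recognize `\h`, so the pattern needs the `[^\S\r\n]` spelling there. A minimal sketch of the same pattern in Python, with a made-up sample string:

```python
import re

# [^\S\r\n] = any whitespace other than CR/LF, standing in for \h.
H = r'[^\S\r\n]'
INCLUDE_RE = re.compile(
    r'^' + H + r'*#' + H + r'*include' + H + r'*["<](\w[\w.]*)' + H + r'*[">]',
    re.MULTILINE,  # same effect as the inline (?m) flag
)

source = '  #  include"header.hpp"\n#include <vector>\nint x;\n'
print(INCLUDE_RE.findall(source))  # ['header.hpp', 'vector']
```

Because of `re.MULTILINE`, `^` matches at the start of every line, so the whole file can be searched in one call.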
							     
							                          
                           
							  
#7

This works: 
import re

test_str = '    #   include "header.hpp"'

match = re.match(r'\s*#\s*include\s*("[\w.]*")', test_str)
if match:
    print(match.group(1))
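
The same idea can be applied line by line to a whole source file. The sketch below is one way to do that; `find_includes` is a hypothetical helper, and its pattern accepts both quotes and angle brackets and returns the name without the delimiters, which differs from the snippet above.

```python
import re

# Name between "..." or <...>, captured without the delimiters.
INCLUDE_RE = re.compile(r'\s*#\s*include\s*["<]([^">]+)[">]')

def find_includes(lines):
    """Yield (line_number, filename) for each include directive.

    `lines` is any iterable of strings, for example an open file object.
    """
    for number, line in enumerate(lines, start=1):
        m = INCLUDE_RE.match(line)
        if m:
            yield number, m.group(1)

sample = [
    '#include <iostream>\n',
    '    #   include "header.hpp"\n',
    'int main() { return 0; }\n',
]
print(list(find_includes(sample)))  # [(1, 'iostream'), (2, 'header.hpp')]
```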




    
        