Python regex: replace numbers and special characters except years

I want to replace all non-alphabetic characters with spaces, excluding years between 1950 and 2029. E.g.:

ab-c 0123 4r. a2017 2010 -> ab c r a 2010

My attempt so far, trying to blacklist the dates via a negative look-ahead:

re.sub('(?!\b19[5-9][0-9]\b|\b20[0-2][0-9]\b)([^A-Za-z]+)', ' ', string)

Since this doesn't work, any help is greatly appreciated!
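
A short sketch of why the attempt above likely fails, as far as I can tell. Two things go wrong: in a non-raw string '\b' is a backspace character rather than a word boundary, and even with a raw string the negative look-ahead is only checked at the position where a match starts, so `[^A-Za-z]+` still swallows any year later in the same run. The variable names here are just for illustration:

```python
import re

text = 'ab-c 0123 4r. a2017 2010'

# In a non-raw string literal, '\b' is the backspace character (0x08),
# so the original pattern never contained a word-boundary assertion.
assert '\b' == '\x08'

# Same pattern as a raw string: the look-ahead is tested only where the
# match begins, then [^A-Za-z]+ consumes digits, spaces and years alike.
pattern = r'(?!\b19[5-9][0-9]\b|\b20[0-2][0-9]\b)([^A-Za-z]+)'
result = re.sub(pattern, ' ', text)
print(repr(result))
# 'ab c r a '
```

The year 2010 disappears because it sits in the same non-letter run as `2017 `, and the look-ahead is not re-evaluated inside that run.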

3 Answers

#1


2  

You could use a simple regex and pass a function to check if it's a year:

import re

def replace_non_year_numbers(m):
  number = int(m.group(0))
  if 1950 <= number <= 2029:
    return str(number)
  else:
    return ''

print(re.sub(r'\d+', replace_non_year_numbers, 'ab-c 0123 4r. a2017 2010'))
# 'ab-c  r. a2017 2010'

To keep the regex and the logic simple, you could remove special characters in a second step:

only_years = re.sub(r'\d+', replace_non_year_numbers, 'ab-c 0123 4r. a2017 2010')
no_special_char = re.sub('[^A-Za-z0-9 ]', ' ', only_years)
print(re.sub(' +', ' ', no_special_char))
# ab c r a2017 2010

#2


1  

Let's select what you want to keep in your result. Look at the regex:

(
  (?

As a one-liner:

((?

You can test it on regex101.

Let's put that in a Python script:

$ cat test.py
import re

pattern = r"(?:(?

which gives:

$ python test.py
ab c r a 2010 a 1955 abc
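
The "select what you want to keep" idea can be sketched as follows. This is not the pattern from this answer (which is cut off above) but a minimal version under my own assumptions: a token is kept if it is a run of ASCII letters, or a year from 1950 to 2029 that is not attached to any letter or digit. The names `KEEP` and `keep_letters_and_years` are hypothetical.

```python
import re

# Assumed pattern: a standalone year (1950-2029), or a run of letters.
# The lookarounds reject years glued to letters or other digits.
KEEP = re.compile(
    r"(?<![A-Za-z0-9])(?:19[5-9][0-9]|20[0-2][0-9])(?![A-Za-z0-9])"
    r"|[A-Za-z]+"
)

def keep_letters_and_years(text):
    # findall returns the full matches because the pattern has no
    # capturing groups; joining with a space replaces everything else.
    return " ".join(KEEP.findall(text))

print(keep_letters_and_years("ab-c 0123 4r. a2017 2010"))
# ab c r a 2010
```

Because the tokens are joined with single spaces, there is no need for a separate step to collapse repeated spaces.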

#3


0  

Not too pretty, but I would use multiple replaces:

import re

def check_if_year(m):
  number = int(m.group(0))
  if 1950 <= number <= 2029:
    return str(number)
  else:
    return ' '

s = 'ab-c 0123 4r. a2017 2010 1800'             # Added 1800 for testing
print(s)
print('ab c r a 2010')
t = re.sub(r'[^A-Za-z0-9 ]+', ' ', s)           # Only non-alphanumeric
t = re.sub(r'(?!\b\d{4}\b)(?

ideone demo

(?!\b\d{4}\b)(?

This matches any number as long as it is not a 4-digit number standing alone (nothing but whitespace or the start/end of the string around it), and I'm using (? so that it won't attempt to match in the middle of a number.
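
For reference, a minimal runnable sketch of this multi-replace idea. The second pattern and the names `keep_year` and `clean` are my own assumptions rather than the code from this answer: a standalone 4-digit number is captured in group 1 and kept only if it falls in range, and every other digit run becomes a space.

```python
import re

def keep_year(m):
    # Group 1 is set only for a standalone 4-digit number.
    year = m.group(1)
    if year is not None and 1950 <= int(year) <= 2029:
        return year
    return ' '

def clean(s):
    # Step 1: anything that is not a letter, digit or space becomes a space.
    t = re.sub(r'[^A-Za-z0-9 ]+', ' ', s)
    # Step 2: standalone 4-digit numbers go to group 1; the second
    # alternative catches all remaining digit runs (e.g. the 2017 in a2017).
    t = re.sub(r'(?<![A-Za-z0-9])([0-9]{4})(?![A-Za-z0-9])|[0-9]+', keep_year, t)
    # Step 3: collapse repeated spaces and trim the ends.
    return ' '.join(t.split())

print(clean('ab-c 0123 4r. a2017 2010 1800'))
# ab c r a 2010
```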
