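I have a file with 225,000 lines, many of which are similar to one another. I want to delete all the similar lines and keep only the first line of each "type". An example is below.
I have a file that looks like this: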
./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180512_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180513_083000.log.gz
./ACT_HERE_REPORT_MEMO_APPROVED_20180515_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180327.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180328.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180329.xls
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180331.xls
./Archive/20150919-084501.SOMETHING
./Archive/20150922-084501.SOMETHING
./Archive/20150923-084500.SOMETHING
./Archive/20150924-084500.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20170310.20170310-201023.txt.gz
./TEST/TEST.20170313.20170313-011035.txt.gz
./TEST/TEST.20170313.20170313-024006.txt.gz
./TEST/TEST.20170313.20170313-041018.txt.gz
./TEST/TEST.20180402-011024.log.gz
./TEST/TEST.20180402-011200.log.gz
./TEST/TEST.20180402-061113.log.gz
./TEST/TEST.20180402-081013.log.gz
./TEST/TEST.20180402-101012.log.gz
I want it to end up like this:
./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./Archive/20150919-084501.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20180402-011024.log.gz
Toto.. 5
Ctrl+H
Find what: ((^.+?)[-_.\d]+(\..+\R))(?:\2[-_.\d]+\3)+
Replace with: $1
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(               # start group 1
  (             # start group 2
    ^           # beginning of line
    .+?         # 1 or more any character but newline, not greedy
  )             # end group 2
  [-_.\d]+      # 1 or more hyphen, underscore, dot or digit
  (             # start group 3
    \.          # a dot
    .+          # 1 or more any character
    \R          # any kind of linebreak
  )             # end group 3
)               # end group 1
(?:             # non capture group
  \2            # backreference to group 2
  [-_.\d]+      # 1 or more hyphen, underscore, dot or digit
  \3            # backreference to group 3
)+              # end group, must appear 1 or more times
Result for given example:
./ACT_HERE_REPORT_MEMO_APPROVED_20180510_083000.log.gz
./ACT_HERE_SOMETHING_MEMO_APPROVED_20180326.xls
./Archive/20150919-084501.SOMETHING
./TEST/TEST.20170310.20170310-181017.txt.gz
./TEST/TEST.20180402-011024.log.gz
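For anyone who wants to try the same replacement outside Notepad++, here is a minimal Python sketch of the pattern. It is an illustration under assumptions, not part of the Notepad++ steps: the function name keep_first_of_each_type is hypothetical, and because Python's re module does not support \R, the line break is spelled out as (?:\r\n|\n|\r).

```python
import re

# Same structure as the Notepad++ pattern, with \R written out explicitly.
# re.MULTILINE makes ^ match at the start of every line.
PATTERN = re.compile(
    r"((^.+?)[-_.\d]+(\..+(?:\r\n|\n|\r)))(?:\2[-_.\d]+\3)+",
    re.MULTILINE,
)


def keep_first_of_each_type(text: str) -> str:
    """Collapse each run of similar lines to its first line (group 1)."""
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    sample = (
        "./Archive/20150919-084501.SOMETHING\n"
        "./Archive/20150922-084501.SOMETHING\n"
        "./TEST/TEST.20180402-011024.log.gz\n"
        "./TEST/TEST.20180402-011200.log.gz\n"
    )
    print(keep_first_of_each_type(sample), end="")
```

Note that the pattern requires a line break after every line it matches, so a final line with no trailing newline is not recognized as a duplicate. Adding a newline at the end of the file before running the replacement avoids that.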