Author: 技术潜行者 | Source: Internet | 2023-09-03 11:28
I have a gzipped file, tradedata.txt.gz, which contains millions of records. The file has about 50 fields separated by |. The 45th field can contain values such as 0000, 0002, 0003, 0004, or a blank (null) value. I want to filter the file and keep only the rows whose 45th field is 0000, 0002, or blank. I want to do this in the fastest way possible, using awk, perl, or any other language.
For example, the data looks like this (I am only displaying a few fields for illustration purposes, so the field of interest appears here as the 4th field):
abc|234|test|0000|test2|1
abc|2343|test1|0002|test2|1
abc|2345|test3|0004|test2|1
abc|2346|test4|0004|test2|1
abc|2347|test5|0003|test2|1
abc|2348|test6||test2|1
abc|234|test|0003|test2|1
The results after filtering the data should be:
abc|234|test|0000|test2|1
abc|2343|test1|0002|test2|1
abc|2348|test6||test2|1
As you can see, I am only pulling records whose value is 0000, 0002, or blank. Can someone help with this request using awk, perl, or any other language that does it the fastest way?
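The filter described in the question can be sketched in Python as follows. This is only a minimal sketch, not a benchmarked answer: the function names `filter_lines` and `filter_gzip` are hypothetical, the file is assumed to be UTF-8 text, and whether this runs faster than an awk or perl one-liner on the real data is not established here. It streams the gzipped file line by line and splits each line only up to the field being tested.

```python
import gzip
import sys

# Values to keep in the target field; the empty string stands for a blank field.
KEEP = {"0000", "0002", ""}


def filter_lines(lines, field_no=45, keep=KEEP, sep="|"):
    """Yield the lines whose field_no-th field (1-based) is in keep."""
    idx = field_no - 1
    for line in lines:
        # Split no further than the target field; the rest of the line
        # stays in one trailing piece and is never inspected.
        fields = line.rstrip("\r\n").split(sep, field_no)
        if len(fields) > idx and fields[idx] in keep:
            yield line


def filter_gzip(path, out=sys.stdout, **kwargs):
    """Stream a gzipped text file and write the matching lines to out.

    Assumption: the file is UTF-8 encoded text.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        out.writelines(filter_lines(fh, **kwargs))
```

For the real file this would be called as `filter_gzip("tradedata.txt.gz")`, which tests the 45th field by default. For the shortened sample rows in the question, `field_no=4` selects the column shown. Lines with fewer fields than `field_no` are dropped rather than treated as blank.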
2 solutions