
Filter lines by number of fields

I am filtering very long text files in Linux (usually > 1GB) to keep only the lines I am interested in. I use this command:

cat ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | $decoder > ./path/to/result.txt

$decoder is the path to a program I was given to decode these files. The problem is that it only accepts lines with exactly 7 fields, that is, 7 strings separated by spaces (e.g. "11 22 33 44 55 66 77"). Whenever a line with more or fewer fields is passed to this program, it crashes, and I get a broken pipe error message.

To fix it, I wrote a super simple script in Bash:

while read line ; do
    if [[ $( echo $line | awk '{ print NF }') == 7 ]]; then
        echo $line;
    fi;
done

But the problem is that it now takes ages to finish. Before, it took seconds; now it takes ~30 minutes.

Does anyone know a better/faster way to do this? Thank you in advance.

1 answer

#1


You could insert awk into the pipeline instead. There is no need to rely on a Bash loop:

LC_ALL=C fgrep -f ./my/patterns.txt ./my/file.txt | awk 'NF == 7' | "$decoder" > ./path/to/result.txt
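
To see what the NF == 7 filter does on its own, here is a minimal sketch. The function name keep_seven_fields and the sample lines are made up for illustration and are not part of the original question. By default awk splits each line on runs of spaces and tabs, so NF is the number of whitespace-separated fields.

```shell
# Hypothetical helper (not from the original post): pass through only
# lines that have exactly 7 whitespace-separated fields.
keep_seven_fields() {
    # A bare pattern with no action prints every line for which it is true.
    awk 'NF == 7' "$@"
}

# Made-up sample input: only the first and last lines have 7 fields.
printf '%s\n' \
    '11 22 33 44 55 66 77' \
    '11 22 33' \
    '1 2 3 4 5 6 7 8' \
    'a b c d e f g' |
keep_seven_fields
```

A single awk process handles the whole stream here, whereas the Bash loop in the question starts a new awk process for every input line, which is the likely reason for the slowdown.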

Alternatively, awk can go first in the pipeline. Performance may be better that way:

awk 'NF == 7' ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | "$decoder" > ./path/to/result.txt

You can merge fgrep and awk into a single awk command; however, I'm not sure whether that would affect anything that requires LC_ALL=C, or whether it would give better performance.
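
A merged version might look roughly like the sketch below. It is an illustration under stated assumptions, not a drop-in replacement: the function name filter_fields is hypothetical, the pattern file is assumed to be non-empty, patterns are treated as fixed strings (as fgrep does) by using awk's index() function, and empty pattern lines are skipped. Because it loops over every pattern for each candidate line, it may well be slower than fgrep when the pattern file is large, so it is worth timing on real data.

```shell
# Hypothetical helper (not from the original post):
#   filter_fields PATTERN_FILE INPUT_FILE
# Prints lines of INPUT_FILE that have exactly 7 fields and contain at
# least one line of PATTERN_FILE as a fixed substring.
filter_fields() {
    LC_ALL=C awk '
        # First file: remember each non-empty line as a fixed-string pattern.
        # (Assumes the pattern file is not empty, otherwise NR == FNR would
        # also be true for the input file.)
        NR == FNR { if ($0 != "") pats[$0] = 1; next }

        # Second file: only look at lines with exactly 7 fields.
        NF == 7 {
            for (p in pats)
                if (index($0, p) > 0) { print; next }
        }
    ' "$1" "$2"
}
```

With the paths from the question, it would be called as: filter_fields ./my/patterns.txt ./my/file.txt | "$decoder" > ./path/to/result.txt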

