Author: mobiledu2502891657 | Source: Internet | 2023-05-17 22:42
I am filtering very long text files on Linux (usually > 1 GB) to keep only the lines I am interested in. I use this command:
cat ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | $decoder > ./path/to/result.txt
$decoder is the path to a program I was given to decode these files. The problem is that it only accepts lines with 7 fields, that is, 7 strings separated by spaces (e.g. "11 22 33 44 55 66 77"). Whenever a line with more or fewer fields is passed to this program, it crashes and I get a broken pipe error message.
To fix it, I wrote a super simple script in Bash:
while read -r line ; do
    if [[ $( echo "$line" | awk '{ print NF }') == 7 ]]; then
        echo "$line";
    fi;
done
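The loop above forks two extra processes (`echo` and `awk`) for every single line, which is what makes it so slow on a > 1 GB file. A likely much faster equivalent (a sketch, not taken from the original post) lets one awk process do the same field-count filtering over the whole stream:

```shell
# Keep only lines with exactly 7 whitespace-separated fields.
# A single awk process scans the entire input, instead of
# spawning `echo | awk` once per line as the Bash loop does.
awk 'NF == 7' ./my/file.txt
```

In the full pipeline this filter would sit just before the decoder, e.g. `LC_ALL=C fgrep -f ./my/patterns.txt ./my/file.txt | awk 'NF == 7' | $decoder > ./path/to/result.txt`.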
But the problem is that it now takes ages to finish: before, it took seconds, and now it takes ~30 minutes.
Does anyone know a better/faster way to do this? Thank you in advance.
1 Solution