What language is to binary, as Perl is to text?

I am looking for a scripting (or higher level programming) language (or e.g. modules for Python or similar languages) for effortlessly analyzing and manipulating binary data in files (e.g. core dumps), much like Perl allows manipulating text files very smoothly.

Things I want to do include presenting arbitrary chunks of the data in various forms (binary, decimal, hex), converting data from one endianness to another, etc. That is, things you would normally use C or assembly for, but I'm looking for a language that allows writing tiny pieces of code for highly specific, one-time purposes very quickly.

Any suggestions?

11 answers

#1


Things I want to do include presenting arbitrary chunks of the data in various forms (binary, decimal, hex), converting data from one endianness to another, etc. That is, things you would normally use C or assembly for, but I'm looking for a language that allows writing tiny pieces of code for highly specific, one-time purposes very quickly.

Well, while it may seem counter-intuitive, I found Erlang extremely well suited for this, mainly because of its powerful support for pattern matching, even on bytes and bits (the "Erlang Bit Syntax"). This makes it very easy to write even quite advanced programs that inspect and manipulate data at the byte level and even at the bit level:

Since 2001, the functional language Erlang comes with a byte-oriented datatype (called binary) and with constructs to do pattern matching on a binary.

And to quote informIT.com:

(Erlang) Pattern matching really starts to get fun when combined with the binary type. Consider an application that receives packets from a network and then processes them. The four bytes in a packet might be a network byte-order packet type identifier. In Erlang, you would just need a single processPacket function that could convert this into a data structure for internal processing. It would look something like this:

processPacket(<<1:32/big, RestOfPacket/binary>>) ->
    % Process type one packets
    % (RestOfPacket/binary binds everything after the 4-byte type field)
    ...
;
processPacket(<<2:32/big, RestOfPacket/binary>>) ->
    % Process type two packets
    ...
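
For comparison, here is a rough sketch of the same dispatch in Python, where the header has to be unpacked explicitly instead of being matched in the function head. The function name, the returned strings and the error handling are assumptions made for illustration; only the 32-bit big-endian type field comes from the quoted example.

```python
import struct


def process_packet(packet: bytes) -> str:
    """Dispatch on a 32-bit big-endian type field at the start of the packet."""
    if len(packet) < 4:
        raise ValueError("packet shorter than its 4-byte type field")
    # ">I" = big-endian (network order) unsigned 32-bit integer.
    (packet_type,) = struct.unpack_from(">I", packet)
    rest_of_packet = packet[4:]
    if packet_type == 1:
        return f"type one, {len(rest_of_packet)} payload bytes"
    if packet_type == 2:
        return f"type two, {len(rest_of_packet)} payload bytes"
    raise ValueError(f"unknown packet type {packet_type}")


print(process_packet(b"\x00\x00\x00\x01hello"))
```

The Erlang version expresses the length check, the byte order and the split into header and payload in a single pattern, which is the point the quote is making.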

So Erlang, with its built-in support for pattern matching and its functional style, is pretty expressive; see for example this implementation of uuencode in Erlang:

uuencode(BitStr) ->
    << <<(X+32):8>> || <<X:6>> <= BitStr >>.
uudecode(Text) ->
    << <<(X-32):6>> || <<X:8>> <= Text >>.
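
To make the regrouping concrete, the sketch below does the same 6-bit/8-bit conversion in Python. It is only an illustration of the idea (no line lengths or padding as in real uuencoded files), the function names are made up, and unlike the Erlang comprehension it raises an error when the input does not divide evenly instead of dropping the leftover bits.

```python
def uuencode_groups(data: bytes) -> bytes:
    """Split data into 6-bit groups and add 32 to each group."""
    nbits = len(data) * 8
    if nbits % 6:
        raise ValueError("length in bits must be a multiple of 6")
    value = int.from_bytes(data, "big")
    # Walk from the most significant 6-bit group down to the least significant.
    groups = [(value >> shift) & 0x3F for shift in range(nbits - 6, -1, -6)]
    return bytes(group + 32 for group in groups)


def uudecode_groups(text: bytes) -> bytes:
    """Subtract 32 from each character and pack the 6-bit values back into bytes."""
    nbits = len(text) * 6
    if nbits % 8:
        raise ValueError("length in bits must be a multiple of 8")
    value = 0
    for char in text:
        value = (value << 6) | ((char - 32) & 0x3F)
    return value.to_bytes(nbits // 8, "big")


encoded = uuencode_groups(b"Cat")
print(encoded)                   # b'0V%T'
print(uudecode_groups(encoded))  # b'Cat'
```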

For an introduction, see Bitlevel Binaries and Generalized Comprehensions in Erlang. You may also want to check out some of the following pointers:

  • Parsing Binaries with erlang, lamers inside
  • More File Processing with Erlang
  • Learning Erlang and Adobe Flash format same time
  • Large Binary Data is (not) a Weakness of Erlang
  • Programming Efficiently with Binaries and Bit Strings
  • Erlang bit syntax and network programming
  • erlang, the language for network programming (1)
  • Erlang, the language for network programming Issue 2: binary pattern matching
  • An Erlang MIDI File Reader/Writer
  • Erlang Bit Syntax
  • Comprehending endianness
  • Playing with Erlang
  • Erlang: Pattern Matching Declarations vs Case Statements/Other
  • A Stream Library using Erlang Binaries
  • Bit-level Binaries and Generalized Comprehensions in Erlang
  • Applications, Implementation and Performance Evaluation of Bit Stream Programming in Erlang

#2


Perl's pack and unpack?

#3


The Python bitstring module was written for this purpose. It lets you take arbitrary slices of binary data and offers a number of different interpretations through Python properties. It also gives plenty of tools for constructing and modifying binary data.

For example:

>>> from bitstring import BitArray, ConstBitStream
>>> s = BitArray('0x00cf')                           # 16 bits long
>>> print(s.hex, s.bin, s.int)                       # Some different views
00cf 0000000011001111 207
>>> s[2:5] = '0b001100001'                           # slice assignment
>>> s.replace('0b110', '0x345')                      # find and replace
2                                                    # 2 replacements made
>>> s.prepend([1])                                   # Add 1 bit to the start
>>> s.byteswap()                                     # Byte reversal
>>> ordinary_string = s.tobytes()                    # Back to Python bytes (zero-padded to a whole byte)

There are also functions for bit-wise reading and navigation in the bitstring, much like in files; in fact this can be done straight from a file without reading it into memory:

>>> s = ConstBitStream(filename='somefile.ext')
>>> hex_code, a, b = s.readlist('hex:32, uint:7, uint:13')
>>> s.find('0x0001')         # Seek to next occurrence, if found
True

There are also views with different endiannesses as well as the ability to swap endianness and much more - take a look at the manual.

#4


Take a look at Python bitstring; it looks like exactly what you want :)

#5


I use 010 Editor all the time to view binary files. It is specifically geared toward working with them.

It has an easy-to-use C-like scripting language to parse binary files and present them in a very readable way (as a tree, with color-coded fields, and so on). There are some example scripts for parsing ZIP and BMP files.

Whenever I create a binary file format, I always write a little script for 010 Editor to view the files. If you've got some header files with some structs, making a reader for binary files is a matter of minutes.

#6


Any high-level programming language with pack/unpack functions will do. Perl, Python and Ruby can all do it; it's a matter of personal preference. I wrote a bit of binary parsing in each of these and felt that Ruby was the easiest and most elegant for this task.
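
As a small illustration of the pack/unpack style, here is a sketch using Python's struct module. The 8-byte record layout (a 4-byte magic followed by two 16-bit fields) is invented for the example and does not describe any particular file format.

```python
import struct

# Hypothetical 8-byte record: 4-byte magic, 16-bit version, 16-bit flags.
record = bytes.fromhex("7f454c4600020001")

# ">" = big-endian, "<" = little-endian; "4s" = 4 raw bytes, "H" = unsigned 16-bit.
magic, version_be, flags_be = struct.unpack(">4sHH", record)
_, version_le, flags_le = struct.unpack("<4sHH", record)

print(magic, version_be, flags_be)  # the fields read as big-endian
print(version_le, flags_le)         # the same bytes read as little-endian

# pack() goes the other way: re-emit the record with the 16-bit fields byte-swapped.
swapped = struct.pack("<4sHH", magic, version_be, flags_be)
print(swapped.hex())                # 7f454c4602000100
```

Perl's pack/unpack and Ruby's Array#pack/String#unpack follow the same idea of a template string that describes the field layout.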

#7


Why not use a C interpreter? I always used them to experiment with snippets, but you could use one to script something like you describe without too much trouble.

I have always liked EiC. It was dead, but the project has been resurrected lately. EiC is surprisingly capable and reasonably quick. There is also CINT. Both can be compiled for different platforms, though I think CINT needs Cygwin on Windows.

#8


Python's standard library has some of what you require -- the array module in particular lets you easily read parts of binary files, swap endianness, etc; the struct module allows for finer-grained interpretation of binary strings. However, neither is quite as rich as you require: for example, to present the same data as bytes or halfwords, you need to copy it between two arrays (the numpy third-party add-on is much more powerful for interpreting the same area of memory in several different ways), and, for example, to display some bytes in hex there's nothing much "bundled" beyond a simple loop or list comprehension such as [hex(b) for b in thebytes[start:stop]]. I suspect there are reusable third-party modules to facilitate such tasks yet further, but I can't point you to one...
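
A short sketch of the standard-library pieces mentioned above, written for Python 3. The sample bytes are made up. The memoryview.cast call is an addition that is not mentioned in the answer: in current Python versions it gives a byte view and a halfword view of the same buffer without copying between two arrays.

```python
from array import array

raw = bytes.fromhex("0102030405060708")  # stand-in for a chunk read from a file

# array: treat the chunk as unsigned 16-bit items, then swap the bytes of each item.
swapped = array("H", raw)
assert swapped.itemsize == 2             # true on mainstream platforms
swapped.byteswap()

# memoryview.cast: view the same buffer as bytes or as halfwords, no copy needed.
view = memoryview(raw)
as_bytes = list(view)                    # eight 8-bit values
as_halfwords = list(view.cast("H"))      # four 16-bit values, native byte order

# Hex presentation, with and without an explicit comprehension.
print(raw.hex())                         # 0102030405060708
print(swapped.tobytes().hex())           # 0201040306050807
print([hex(b) for b in raw[2:5]])        # ['0x3', '0x4', '0x5']
```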

#9


Forth can also be pretty good at this, but it's a bit arcane.

#10


Well, if speed is not a consideration and you want Perl, then translate each "line" of binary into a line of characters: 0s and 1s. Yes, I know there are no linefeeds in binary :) but presumably you have some fixed size (a byte or some other unit) with which you can break up the binary blob.

Then just use Perl's string processing on that data :)
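
The same trick can be sketched in Python rather than Perl: render each fixed-size unit as a line of '0' and '1' characters, then use ordinary text tools such as regular expressions on the result. The helper name, the sample bytes and the pattern are made up for the example.

```python
import re


def to_bit_lines(data: bytes, unit: int = 1) -> list:
    """Render each unit-byte chunk of data as one line of '0'/'1' characters."""
    return [
        "".join(f"{byte:08b}" for byte in data[i:i + unit])
        for i in range(0, len(data), unit)
    ]


blob = bytes([0x00, 0xCF, 0xFF, 0x10])
lines = to_bit_lines(blob)
print(lines)    # ['00000000', '11001111', '11111111', '00010000']

# Plain text processing: keep the units whose low nibble is all ones.
matches = [line for line in lines if re.search(r"1{4}$", line)]
print(matches)  # ['11001111', '11111111']
```

This uses eight characters per byte, so it only suits small inputs or one-off inspections, as the answer says.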

#11


If you're doing binary level processing, it is very low level and likely needs to be very efficient and have minimal dependencies/install requirements.

So I would go with C - handles bytes well - and you can probably google for some library packages that handle bytes.

Going with something like Erlang introduces inefficiencies, dependencies, and other baggage you probably don't want with a low-level library.
