What is the right way to handle binary data in Python?

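I have a tuple of 12 ints, each in the range 0-255, equivalent to an unsigned char[12] array in C, 96 bits in total. I want to split it into 16 values of 6 bits each, which is also 96 bits.
My current approach: starting from the low bits, mask each byte to keep 6 bits and output one value, then join the leftover high bits with the next byte to form a new value, and repeat until there are 16 outputs.
This approach looks very C-like. Is there a more Pythonic way?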

irainy

2016-03-04 15:26:51 +08:00

py3>>> int("{:b}".format(255)[:6], base = 2)

mulog

2016-03-04 15:59:28 +08:00

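Your description isn't very clear, so I'll assume it works like this.
Say your tuple is (1, 127, ...)
i.e. -> 00000001 01111111
You said to take six bits starting from the low end, which gives
-> 100000
Then the remaining high bits are joined with the low bits of the next byte and six more are taken
-> 00 1111
and then
-> 1110xx
Finally, I assume you need to convert them back to int.

Something thrown together: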
# assuming t is the tuple holding 12 ints
bits = "".join([bin(i)[2:].zfill(8)[::-1] for i in t])
result = [int(bits[i:i+6], 2) for i in range(0, len(bits), 6)]
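
As a worked example of the string-based approach above, the sketch below runs it on the (1, 127, ...) input from this post. The remaining ten bytes are assumed to be zero, and the names `as_written` and `low_first` are made up for illustration. It also shows one thing worth checking: `int(chunk, 2)` treats the leftmost digit as the high bit, so reversing each chunk first may be closer to what the original question asks for.

```python
t = (1, 127) + (0,) * 10  # example input; only the first two bytes are non-zero

# Each byte as 8 binary digits, least significant bit first.
bits = "".join(format(b, "08b")[::-1] for b in t)
chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]

as_written = [int(c, 2) for c in chunks]       # leftmost digit read as the high bit
low_first = [int(c[::-1], 2) for c in chunks]  # leftmost digit read as the low bit

print(as_written[:3])  # [32, 15, 56]
print(low_first[:3])   # [1, 60, 7]
```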

wentian

2016-03-04 17:22:10 +08:00

Python has the struct module. I use it all the time to modify binary files.
What kind of answers are these above?

2016-03-04 17:50:43 +08:00

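+1 for the built-in struct module. Besides, byte-level operations mostly look clumsy anyway, so there is no real Pythonic versus un-Pythonic here.
I don't really understand the answers above either.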

chinuno

2016-03-04 17:55:54 +08:00

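struct seems to work only in whole bytes (8 bits at a time).
For 6-bit groups you have to keep track of the bits and handle them yourself.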
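
To illustrate the point about struct working in whole bytes, here is a small sketch. The input bytes and variable names are made up for illustration. struct can read the 12 bytes as individual bytes or as wider integers, but the 6-bit slicing still has to be done by hand with shifts and masks.

```python
import struct

raw = bytes((1, 127) + (0,) * 10)  # example 12-byte input

# The smallest struct format unit is one byte ("B"); there is no 6-bit code.
as_bytes = struct.unpack("12B", raw)

# Read the 96 bits as a 64-bit and a 32-bit little-endian word, then combine.
low, high = struct.unpack("<QI", raw)
x = low | (high << 64)

# Slice the combined integer into 6-bit groups, lowest bits first.
groups = [(x >> shift) & 0x3F for shift in range(0, 96, 6)]
print(groups[:3])  # [1, 60, 7]
```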

raiz

2016-03-04 19:31:15 +08:00

The smallest unit in struct is also a byte, so manual shifting and joining is unavoidable.
````
SPLIT_SIZE_BIT = 6
JOINED_SIZE_BIT = 8
SPLIT_MASK = 0X3F
SPLIT_LEN = 16

....
def split(self, t):
    in_iter = iter(t)
    splited = []
    remain = 0
    remain_bit_cnt = 0
    while len(splited) < SPLIT_LEN:
        while remain_bit_cnt < SPLIT_SIZE_BIT:
            b = next(in_iter)
            b <<= remain_bit_cnt
            remain |= b
            remain_bit_cnt += JOINED_SIZE_BIT
        splited.append(remain & SPLIT_MASK)
        remain >>= SPLIT_SIZE_BIT
        remain_bit_cnt -= SPLIT_SIZE_BIT
    # print(splited)
    return splited
....
````
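
For anyone who wants to try the accumulator approach above outside a class, a minimal standalone sketch follows. The function name `split_bits` and its parameters are hypothetical, not from the post, and it yields however many groups the input produces instead of stopping at a fixed count.

```python
def split_bits(data, width=6, byte_bits=8):
    """Yield `width`-bit groups from `data`, least significant bits first."""
    mask = (1 << width) - 1
    acc = 0       # bits not yet emitted; the lowest bits are the oldest
    acc_bits = 0  # number of valid bits currently held in acc
    for b in data:
        acc |= b << acc_bits
        acc_bits += byte_bits
        while acc_bits >= width:
            yield acc & mask
            acc >>= width
            acc_bits -= width
    if acc_bits:  # leftover bits when the total is not a multiple of width
        yield acc


groups = list(split_bits((1, 127) + (0,) * 10))
print(len(groups))  # 16
print(groups[:3])   # [1, 60, 7]
```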

leavic

2016-03-04 21:03:12 +08:00

+1 for the struct module; it's what I rely on for processing hex files.

test0x01

2016-03-04 21:31:38 +08:00

It depends on the requirements. If this function is called very frequently, I would rather write it in C and call it from Python.

mulog

2016-03-04 21:38:28 +08:00

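@raiz Here you go. I misunderstood your requirement earlier, so I changed it a little.
It's roughly 50% slower than what you posted, but if performance really matters for this part, you might as well go straight to C.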
https://gist.github.com/mulog1990/50e1f3d8993db801663e

savebox

2016-03-05 09:02:23 +08:00

import base64, string
t=range(12)
E = string.maketrans('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',''.join([chr(c) for c in range(64)]))
print [ord(c) for c in base64.b64encode("".join([chr(c) for c in t[::-1]])).translate(E)][::-1]

This should be a bit faster.
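
The snippet above uses Python 2 idioms (string.maketrans on str, the print statement). A possible Python 3 port of the same base64 trick is sketched below; `split6_base64`, `ALPHABET` and `TO_VALUE` are names chosen here for illustration. It assumes the input length is a multiple of 3 bytes, so that base64 adds no "=" padding.

```python
import base64

# Translation table from each base64 alphabet character to its 6-bit value.
ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
TO_VALUE = bytes.maketrans(ALPHABET, bytes(range(64)))


def split6_base64(t):
    """Return the 6-bit groups of t, lowest bits of t[0] first."""
    # Reversing the bytes makes base64 emit the highest group first,
    # so the result is reversed again to get low-first order.
    encoded = base64.b64encode(bytes(t[::-1]))
    return list(encoded.translate(TO_VALUE))[::-1]


t = tuple(range(12))
print(split6_base64(t))
```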

calease

2016-03-05 12:11:36 +08:00

The answer two posts up could split the string with re.findall('......', x) and become a one-liner:
[int(x, 2) for x in re.findall('......', "".join([bin(i)[2:].zfill(8) for i in t][::-1]))]

ruoyu0088

2016-03-05 19:53:20 +08:00

In Python 3 you can use int.from_bytes to convert a byte sequence to an integer:

import random

data = [random.randint(0, 255) for _ in range(12)]
x = int.from_bytes(bytearray(data), "big")
r = [(x >> i) & 0x3f for i in range(90, -1, -6)]
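
The snippet above reads the bytes big-endian and emits the highest 6 bits first. If the low-bits-first order described in the original question is wanted, a variant might look like the sketch below. The function names `split6_low_first` and `join6_low_first` are hypothetical; the second one is included only to check that the split can be reversed.

```python
def split6_low_first(data):
    """Split a byte sequence into 6-bit groups, lowest bits of data[0] first."""
    x = int.from_bytes(bytes(data), "little")
    return [(x >> shift) & 0x3F for shift in range(0, len(data) * 8, 6)]


def join6_low_first(groups, nbytes):
    """Inverse of split6_low_first: rebuild nbytes bytes from 6-bit groups."""
    x = 0
    for i, g in enumerate(groups):
        x |= g << (6 * i)
    return tuple(x.to_bytes(nbytes, "little"))


data = (1, 127) + (0,) * 10
groups = split6_low_first(data)
print(groups[:3])                         # [1, 60, 7]
print(join6_low_first(groups, 12) == data)  # True
```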

raiz

2016-03-05 23:51:03 +08:00

@ruoyu0088 this one should be elegant and efficient
