I happened to come across the following question on Stack Overflow:
Please explain me what does this piece of code do. h should be 32Byte result from sha256 calculation. I am rewriting parts of this code for my project in C++ and I'm not sure if this switches byte order per 4byte chunk or change byte order on whole 32byte number.

def reverse_hash(h):
    return struct.pack('>IIIIIIII', *struct.unpack('>IIIIIIII', h)[::-1])[::-1]

And, how does this array index work ? [::-1] Thanks for any and all info
Python's slice syntax ([::-1]) is easy enough to understand, but the use of struct in that code puzzled me.
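For readers less familiar with the [::-1] slice from the question, here is a minimal sketch. It assumes Python 3 (the rest of this post uses Python 2, where the same slice works on str), and the variable names are made up for illustration.

```python
# A slice with step -1 walks the sequence from the last item to the first.
values = (1, 2, 3, 4)
reversed_values = values[::-1]

data = b"0123"
reversed_data = data[::-1]

print(reversed_values)  # (4, 3, 2, 1)
print(reversed_data)    # b'3210'
```

The slice returns a new object and leaves the original sequence unchanged.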
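So I looked up how the struct module is used.
This module performs conversions between Python values and C structs represented as Python strings.
For the usage itself, the documentation is enough.
When I got to byte order, however, I became confused.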
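Different architectures use different byte orders. There are roughly three kinds: big-endian, little-endian, and bi-endian. (Apparently there is also middle-endian.)
Simply put:
Big-endian stores the high-order byte at the lowest address and the low-order byte at the highest address.
Little-endian stores the low-order byte at the lowest address and the high-order byte at the highest address.
Bi-endian means the byte order is configurable.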
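The two main definitions can be seen directly by serializing one integer both ways. This is a small sketch assuming Python 3 (int.to_bytes does not exist in Python 2); the value 0x12345678 is an arbitrary example.

```python
value = 0x12345678  # arbitrary example value

# Index 0 of each result corresponds to the lowest address.
big = value.to_bytes(4, "big")        # high-order byte first
little = value.to_bytes(4, "little")  # low-order byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

The two byte strings are exact mirror images of each other, which is all that the choice of byte order changes for a single integer.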
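My understanding:
1. Data written to memory does not change afterwards. Big-endian and little-endian only describe the order in which the bytes are interpreted.
2. Byte order only concerns the order of the bytes inside a single multi-byte unit. From one unit to the next, addresses simply increase linearly.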
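Both points can be checked with a short sketch. It assumes Python 3 (int.from_bytes), and the four bytes in raw are arbitrary illustration data, not taken from the examples in this post.

```python
raw = b"\x01\x02\x03\x04"  # bytes at increasing addresses

# Point 1: the stored bytes are the same; only the interpretation differs.
as_big = int.from_bytes(raw[:2], "big")        # 0x0102
as_little = int.from_bytes(raw[:2], "little")  # 0x0201

# Point 2: split into 2-byte units. The units stay in address order;
# byte order only affects how each unit is read.
units = [raw[i:i + 2] for i in range(0, len(raw), 2)]
little_units = [int.from_bytes(u, "little") for u in units]

print(hex(as_big), hex(as_little))
print([hex(v) for v in little_units])
```

The first unit still comes first in little_units; only the two bytes inside each unit swap roles.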
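First, an example from the wiki: how the string "XRAY" is laid out in memory.

Character values for XRAY:

character   hex value
X           0x58
R           0x52
A           0x41
Y           0x59

With one byte as the storage unit (addresses increase from right to left):

...  "Y"  "A"  "R"  "X"  ...

With two bytes as the storage unit, the actual memory layout that represents "XRAY" (addresses increase from right to left):

...  "AY"  "XR"  ...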
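Test code:
# coding: utf-8
import struct
s="XRAY"
# No prefix means native byte order, which is little-endian on the author's x86 machine
little_s_uchar_hex=map(hex,struct.unpack("BBBB",s))
print "little_s_uchar_hex:",little_s_uchar_hex
little_s_ushort_hex=map(hex,struct.unpack("HH",s))
print "little_s_ushort_hex:",little_s_ushort_hex
# ">" forces big-endian regardless of the machine
big_s_uchar_hex=map(hex,struct.unpack(">BBBB",s))
print "big_s_uchar_hex:",big_s_uchar_hex
big_s_ushort_hex=map(hex,struct.unpack(">HH",s))
print "big_s_ushort_hex:",big_s_ushort_hex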
#output:
'''
little_s_uchar_hex: ['0x58', '0x52', '0x41', '0x59']
little_s_ushort_hex: ['0x5258', '0x5941']
big_s_uchar_hex: ['0x58', '0x52', '0x41', '0x59']
big_s_ushort_hex: ['0x5852', '0x4159']
'''
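Look at the value of little_s_ushort_hex. I am on an x86 machine (little-endian), so each 2-byte unit is read with the byte at the lower address as the low-order byte.
Written out high-order byte first, the two values are:
0x52 0x58 0x59 0x41
The bytes in memory are still 0x58 0x52 0x41 0x59 in increasing address order. Reading the sequence above from right to left gives A Y X R, that is, the "AY" "XR" layout (addresses growing from right to left).
This matches the representation in the wiki.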
Now another example:
unpack an 8-character string into 8 unsigned chars, 4 unsigned shorts, 2 unsigned ints, and 1 unsigned long long.
#!/usr/bin/env python2
# coding: utf-8
import struct

string = 'hjflyllx'  # my preferred string
print ('string:%s' % string)
string_hex = map(hex, map(ord, string))
print ('-' * 20)
print ('string_hex:')
print (string_hex)

# (type name, struct format) pairs; every format consumes all 8 bytes
formats = [('uchar', 'BBBBBBBB'), ('ushort', 'HHHH'), ('uint', 'II'), ('ullong', 'Q')]
for name, fmt in formats:
    print ('-' * 20 + name + ' big and little endianness' + '-' * 20)
    # No prefix means native byte order (little-endian on the author's x86 machine);
    # '>' forces big-endian.
    for order, prefix in (('little', ''), ('big', '>')):
        values = struct.unpack(prefix + fmt, string)
        print ('%s_%s_string:' % (order, name))
        print (values)
        print ('%s_%s_string_hex:' % (order, name))
        print (map(hex, values))
#output:
'''
string:hjflyllx
--------------------
string_hex:
['0x68', '0x6a', '0x66', '0x6c', '0x79', '0x6c', '0x6c', '0x78']
--------------------uchar big and little endianness--------------------
little_uchar_string:
(104, 106, 102, 108, 121, 108, 108, 120)
little_uchar_string_hex:
['0x68', '0x6a', '0x66', '0x6c', '0x79', '0x6c', '0x6c', '0x78']
big_uchar_string:
(104, 106, 102, 108, 121, 108, 108, 120)
big_uchar_string_hex:
['0x68', '0x6a', '0x66', '0x6c', '0x79', '0x6c', '0x6c', '0x78']
--------------------ushort big and little endianness--------------------
little_ushort_string:
(27240, 27750, 27769, 30828)
little_ushort_string_hex:
['0x6a68', '0x6c66', '0x6c79', '0x786c']
big_ushort_string:
(26730, 26220, 31084, 27768)
big_ushort_string_hex:
['0x686a', '0x666c', '0x796c', '0x6c78']
--------------------uint big and little endianness--------------------
little_uint_string:
(1818651240, 2020371577)
little_uint_string_hex:
['0x6c666a68', '0x786c6c79']
big_uint_string:
(1751803500, 2037148792)
big_uint_string_hex:
['0x686a666c', '0x796c6c78']
--------------------ullong big and little endianness--------------------
little_ullong_string:
(8677429850801597032,)
little_ullong_string_hex:
['0x786c6c796c666a68']
big_ullong_string:
(7523938743555484792,)
big_ullong_string_hex:
['0x686a666c796c6c78']
'''
Below are some tables, assuming the addresses start at 100. In the multi-byte tables, the characters of each unit are listed high-order byte first.

string (one byte per unit):

address   character   hex value
100       h           0x68
101       j           0x6a
102       f           0x66
103       l           0x6c
104       y           0x79
105       l           0x6c
106       l           0x6c
107       x           0x78

little ushort:

address   characters   hex value
100       jh           0x6a68
102       lf           0x6c66
104       ly           0x6c79
106       xl           0x786c

little uint:

address   characters   hex value
100       lfjh         0x6c666a68
104       xlly         0x786c6c79

little ulonglong:

address   characters   hex value
100       xllylfjh     0x786c6c796c666a68
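The uchar results show that when the unit is a single byte, big-endian and little-endian give the same result.
For units larger than one byte, the bytes inside each corresponding unit are exactly reversed, but the order from one unit to the next is always increasing.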
Now let's look at one of the answers:
>>> h = ''.join(map(str, range(0,21)))
>>> h
'01234567891011121314151617181920'
>>> struct.pack('>IIIIIIII', *struct.unpack('>IIIIIIII', h)[::-1])[::-1]
'32107654019821114131615181710291'

Equivalent expression:

>>> struct.pack('<IIIIIIII', *struct.unpack('>IIIIIIII', h))
'32107654019821114131615181710291'
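Focus on the equivalent expression given there:
>>> struct.pack('<IIIIIIII', *struct.unpack('>IIIIIIII', h))
Why does this produce the same result?
Unpacking a string with one byte order and packing it with the other reverses the bytes inside each unit.
By now you can probably see why!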
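Assume again that memory starts at address 100, and analyze just one unit (4 bytes):

address   character
100       '0'
101       '1'
102       '2'
103       '3'

Unpacking as big-endian first reads the integer whose big-endian byte representation is '0123', with '0' as the high-order byte.
Packing as little-endian then writes the low-order byte first and the high-order byte last, so the bytes come out in the order 103 down to 100, which gives '3210'.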
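To tie this back to the original question, the sketch below compares three variants on the same 32-byte input. It assumes Python 3, so h is bytes rather than a Python 2 str. reverse_hash is the expression from the question, reverse_hash_equivalent is the expression from the answer, and reverse_hash_manual is a hypothetical helper added here only to spell out the per-chunk reversal.

```python
import struct

def reverse_hash(h):
    """The expression from the question, for a 32-byte input."""
    return struct.pack(">IIIIIIII", *struct.unpack(">IIIIIIII", h)[::-1])[::-1]

def reverse_hash_equivalent(h):
    """The equivalent expression from the answer."""
    return struct.pack("<IIIIIIII", *struct.unpack(">IIIIIIII", h))

def reverse_hash_manual(h):
    """Reverse the bytes of each 4-byte chunk, keeping the chunks in order."""
    return b"".join(h[i:i + 4][::-1] for i in range(0, len(h), 4))

h = "".join(map(str, range(0, 21))).encode("ascii")  # 32 bytes, as in the answer
result = reverse_hash(h)
print(result)  # b'32107654019821114131615181710291'
print(result == reverse_hash_equivalent(h) == reverse_hash_manual(h))  # True
```

So the code swaps the byte order within each 4-byte chunk; it does not reverse the 32-byte value as a whole.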
References:
http://docs.python.org/2/library/struct.html
http://en.wikipedia.org/wiki/Endianness
http://stackoverflow.com/questions/20882693/what-does-this-piece-of-python-indexing-code-do