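A brief look at the structure of .pyc files.
A .pyc file is a Python bytecode file, which the Python virtual machine can execute.
In Python 3, Python automatically caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc; this is the Python bytecode file. The version part normally encodes the interpreter and its version, for example util.cpython-38.pyc.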
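To see the naming scheme on a given interpreter, the standard library can be asked where it would cache a module. This is a small sketch; the file name util.py is only an illustration and does not need to exist.

```python
import importlib.util
import sys

# The tag placed between the module name and ".pyc",
# for example "cpython-38" on CPython 3.8.
cache_tag = sys.implementation.cache_tag

# The path the import system would use to cache bytecode for util.py.
pyc_path = importlib.util.cache_from_source("util.py")

print(cache_tag)
print(pyc_path)  # typically something like __pycache__/util.cpython-38.pyc
```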
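Inside the Python virtual machine, bytecode is represented by a PyCodeObject. A .pyc file is the on-disk form of that bytecode.
A module's PyCodeObject is created when the module is loaded, that is, at import time (either by compiling the source or by reading a cached .pyc).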
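The same kind of object can be produced by hand with the built-in compile(), which makes the relationship between source, code object, and execution visible. A minimal sketch; the source string and the "<demo>" file name are made up for the example.

```python
import types

source = "x = 6 * 7\n"

# compile() returns a code object, the Python-level view of a PyCodeObject.
code = compile(source, "<demo>", "exec")

namespace = {}
exec(code, namespace)  # the interpreter executes the bytecode in this namespace

print(isinstance(code, types.CodeType))
print(namespace["x"])
```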
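Running python test.py compiles test.py to bytecode and interprets it, but does not write a test.pyc.
If test.py loads another module, for example import util, Python compiles util.py to bytecode, writes the cached .pyc for util (under __pycache__ in Python 3), and then interprets the bytecode.
To generate test.pyc, use the built-in py_compile module.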
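A short sketch of compiling a script explicitly with py_compile. The temporary directory and the contents of test.py are placeholders for the example; the command-line equivalent is python -m py_compile test.py.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "test.py")
    with open(source_path, "w", encoding="utf-8") as f:
        f.write("print('hello')\n")

    # With no cfile argument the result goes to __pycache__/test.<tag>.pyc.
    # doraise=True turns compile errors into exceptions instead of messages.
    pyc_path = py_compile.compile(source_path, doraise=True)
    pyc_exists = os.path.exists(pyc_path)
    print(pyc_path, pyc_exists)
```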
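When a module is loaded and both the .py and the .pyc exist, Python tries to use the .pyc. By default the .pyc records the modification time of the source it was compiled from; if that recorded value no longer matches the .py file, Python recompiles the .py and updates the .pyc.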
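The check can be approximated by hand. The sketch below assumes the Python 3.7+ header layout (magic, flags, source mtime, source size, each four bytes, little-endian) and a timestamp-based .pyc. The helper name is_pyc_stale is made up for this example; the real import system also validates the magic number.

```python
import os
import py_compile
import struct
import tempfile


def is_pyc_stale(source_path, pyc_path):
    """Return True if a timestamp-based .pyc no longer matches its source."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, size = struct.unpack("<III", header[4:16])
    if flags & 0x01:
        raise ValueError("hash-based .pyc: timestamps are not used")
    stat = os.stat(source_path)
    return (mtime != (int(stat.st_mtime) & 0xFFFFFFFF)
            or size != (stat.st_size & 0xFFFFFFFF))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "util.py")
    with open(src, "w", encoding="utf-8") as f:
        f.write("VALUE = 1\n")
    pyc = py_compile.compile(
        src, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    stale_before = is_pyc_stale(src, pyc)

    # Appending changes the source size, so the cached header no longer matches.
    with open(src, "a", encoding="utf-8") as f:
        f.write("OTHER = 2\n")
    stale_after = is_pyc_stale(src, pyc)

print(stale_before, stale_after)
```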
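Python source code is always compiled to bytecode (a binary form) before it runs. In its original form, a .pyc file stores the result as three parts:
A four-byte magic number (a two-byte version value followed by \r\n)
A four-byte timestamp (the modification time of the source file)
A PyCodeObject, serialized in marshal format

Later versions extended the header. Magic 3210 (Python 3.3a1) added a four-byte source size after the timestamp, and magic 3392 (Python 3.7a4, PEP 552) added a four-byte flags field after the magic number, so the header is 16 bytes from Python 3.7 on. Only the code object is marshal data; the header fields are written as raw little-endian integers.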
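The layout can be checked by splitting a freshly compiled file by hand. This sketch assumes a timestamp-based .pyc written by Python 3.7 or later (16-byte header: magic, flags, mtime, size). The function name read_pyc and the demo.py file are made up for the example.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile


def read_pyc(pyc_path):
    """Split a timestamp-based Python 3.7+ .pyc into header fields and code."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    magic = data[:4]                                   # version word + b"\r\n"
    flags, mtime, size = struct.unpack("<III", data[4:16])
    code = marshal.loads(data[16:])                    # the code object
    return magic, flags, mtime, size, code


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "wb") as f:
        f.write(b"print('hi')\n")
    pyc = py_compile.compile(
        src, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    magic, flags, mtime, size, code = read_pyc(pyc)
    source_size = os.path.getsize(src)

print(magic == importlib.util.MAGIC_NUMBER, flags, size == source_size)
print(type(code).__name__)
```

marshal data is only guaranteed to load in the Python version that wrote it, so the file is compiled and read by the same interpreter here.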
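iPlayForSG's blog collects the Magic Number reference table below. The list appears to be taken from the comment block in CPython's importlib/_bootstrap_external.py.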

    Known values:
    #     Python 1.5:   20121
    #     Python 1.5.1: 20121
    #     Python 1.5.2: 20121
    #     Python 1.6:   50428
    #     Python 2.0:   50823
    #     Python 2.0.1: 50823
    #     Python 2.1:   60202
    #     Python 2.1.1: 60202
    #     Python 2.1.2: 60202
    #     Python 2.2:   60717
    #     Python 2.3a0: 62011
    #     Python 2.3a0: 62021
    #     Python 2.3a0: 62011 (!)
    #     Python 2.4a0: 62041
    #     Python 2.4a3: 62051
    #     Python 2.4b1: 62061
    #     Python 2.5a0: 62071
    #     Python 2.5a0: 62081 (ast-branch)
    #     Python 2.5a0: 62091 (with)
    #     Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
    #     Python 2.5b3: 62101 (fix wrong code: for x, in ...)
    #     Python 2.5b3: 62111 (fix wrong code: x += yield)
    #     Python 2.5c1: 62121 (fix wrong lnotab with for loops and
    #                          storing constants that should have been removed)
    #     Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
    #     Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
    #     Python 2.6a1: 62161 (WITH_CLEANUP optimization)
    #     Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
    #     Python 2.7a0: 62181 (optimize conditional branches:
    #                          introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
    #     Python 2.7a0  62191 (introduce SETUP_WITH)
    #     Python 2.7a0  62201 (introduce BUILD_SET)
    #     Python 2.7a0  62211 (introduce MAP_ADD and SET_ADD)
    #     Python 3000:   3000
    #                    3010 (removed UNARY_CONVERT)
    #                    3020 (added BUILD_SET)
    #                    3030 (added keyword-only parameters)
    #                    3040 (added signature annotations)
    #                    3050 (print becomes a function)
    #                    3060 (PEP 3115 metaclass syntax)
    #                    3061 (string literals become unicode)
    #                    3071 (PEP 3109 raise changes)
    #                    3081 (PEP 3137 make __file__ and __name__ unicode)
    #                    3091 (kill str8 interning)
    #                    3101 (merge from 2.6a0, see 62151)
    #                    3103 (__file__ points to source file)
    #     Python 3.0a4: 3111 (WITH_CLEANUP optimization).
    #     Python 3.0b1: 3131 (lexical exception stacking, including POP_EXCEPT
    #                         #3021)
    #     Python 3.1a1: 3141 (optimize list, set and dict comprehensions:
    #                         change LIST_APPEND and SET_ADD, add MAP_ADD #2183)
    #     Python 3.1a1: 3151 (optimize conditional branches:
    #                         introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE
    #                         #4715)
    #     Python 3.2a1: 3160 (add SETUP_WITH #6101)
    #                   tag: cpython-32
    #     Python 3.2a2: 3170 (add DUP_TOP_TWO, remove DUP_TOPX and ROT_FOUR #9225)
    #                   tag: cpython-32
    #     Python 3.2a3  3180 (add DELETE_DEREF #4617)
    #     Python 3.3a1  3190 (__class__ super closure changed)
    #     Python 3.3a1  3200 (PEP 3155 __qualname__ added #13448)
    #     Python 3.3a1  3210 (added size modulo 2**32 to the pyc header #13645)
    #     Python 3.3a2  3220 (changed PEP 380 implementation #14230)
    #     Python 3.3a4  3230 (revert changes to implicit __class__ closure #14857)
    #     Python 3.4a1  3250 (evaluate positional default arguments before
    #                        keyword-only defaults #16967)
    #     Python 3.4a1  3260 (add LOAD_CLASSDEREF; allow locals of class to override
    #                        free vars #17853)
    #     Python 3.4a1  3270 (various tweaks to the __class__ closure #12370)
    #     Python 3.4a1  3280 (remove implicit class argument)
    #     Python 3.4a4  3290 (changes to __qualname__ computation #19301)
    #     Python 3.4a4  3300 (more changes to __qualname__ computation #19301)
    #     Python 3.4rc2 3310 (alter __qualname__ computation #20625)
    #     Python 3.5a1  3320 (PEP 465: Matrix multiplication operator #21176)
    #     Python 3.5b1  3330 (PEP 448: Additional Unpacking Generalizations #2292)
    #     Python 3.5b2  3340 (fix dictionary display evaluation order #11205)
    #     Python 3.5b3  3350 (add GET_YIELD_FROM_ITER opcode #24400)
    #     Python 3.5.2  3351 (fix BUILD_MAP_UNPACK_WITH_CALL opcode #27286)
    #     Python 3.6a0  3360 (add FORMAT_VALUE opcode #25483)
    #     Python 3.6a1  3361 (lineno delta of code.co_lnotab becomes signed #26107)
    #     Python 3.6a2  3370 (16 bit wordcode #26647)
    #     Python 3.6a2  3371 (add BUILD_CONST_KEY_MAP opcode #27140)
    #     Python 3.6a2  3372 (MAKE_FUNCTION simplification, remove MAKE_CLOSURE
    #                         #27095)
    #     Python 3.6b1  3373 (add BUILD_STRING opcode #27078)
    #     Python 3.6b1  3375 (add SETUP_ANNOTATIONS and STORE_ANNOTATION opcodes
    #                         #27985)
    #     Python 3.6b1  3376 (simplify CALL_FUNCTIONs & BUILD_MAP_UNPACK_WITH_CALL
    #                         #27213)
    #     Python 3.6b1  3377 (set __class__ cell from type.__new__ #23722)
    #     Python 3.6b2  3378 (add BUILD_TUPLE_UNPACK_WITH_CALL #28257)
    #     Python 3.6rc1 3379 (more thorough __class__ validation #23722)
    #     Python 3.7a1  3390 (add LOAD_METHOD and CALL_METHOD opcodes #26110)
    #     Python 3.7a2  3391 (update GET_AITER #31709)
    #     Python 3.7a4  3392 (PEP 552: Deterministic pycs #31650)
    #     Python 3.7b1  3393 (remove STORE_ANNOTATION opcode #32550)
    #     Python 3.7b5  3394 (restored docstring as the first stmt in the body;
    #                         this might affected the first line number #32911)
    #     Python 3.8a1  3400 (move frame block handling to compiler #17611)
    #     Python 3.8a1  3401 (add END_ASYNC_FOR #33041)
    #     Python 3.8a1  3410 (PEP570 Python Positional-Only Parameters #36540)
    #     Python 3.8b2  3411 (Reverse evaluation order of key: value in dict
    #                         comprehensions #35224)
    #     Python 3.8b2  3412 (Swap the position of positional args and positional
    #                         only args in ast.arguments #37593)
    #     Python 3.8b4  3413 (Fix "break" and "continue" in "finally" #37830)

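To match a value from the table against a real file, note that the first two bytes of the magic number hold the value in little-endian order and the last two are \r\n. A small sketch using the running interpreter's own magic number and the 3413 entry from the table:

```python
import importlib.util

magic = importlib.util.MAGIC_NUMBER            # 4 bytes for the running interpreter
version_word = int.from_bytes(magic[:2], "little")
print(magic.hex(), version_word)

# Going the other way for a table entry: 3413 (Python 3.8b4).
py38_magic = (3413).to_bytes(2, "little") + b"\r\n"
print(py38_magic.hex())  # 550d0d0a
```

Comparing the first four bytes of a .pyc with values built this way is a common first step when identifying which Python version produced the file.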
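The PyCodeObject structure looks like this (the layout shown matches CPython 2.x; Python 3 adds fields such as co_kwonlyargcount):

    typedef struct {
        PyObject_HEAD
        int co_argcount;          /* number of arguments, except *args */
        int co_nlocals;           /* number of local variables */
        int co_stacksize;         /* entries needed for the evaluation stack */
        int co_flags;             /* CO_... flags */
        PyObject *co_code;        /* instruction opcodes */
        PyObject *co_consts;      /* constants used */
        PyObject *co_names;       /* names used */
        PyObject *co_varnames;    /* local variable names */
        PyObject *co_freevars;    /* free variable names */
        PyObject *co_cellvars;    /* cell variable names */

        PyObject *co_filename;    /* file the code was loaded from */
        PyObject *co_name;        /* name of the function or module */
        int co_firstlineno;       /* first source line number */
        PyObject *co_lnotab;      /* bytecode offset <-> line number mapping */
        void *co_zombieframe;     /* for optimization only */
        PyObject *co_weakreflist; /* supports weak references to code objects */
    } PyCodeObject;
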
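Most of these fields are visible from Python as attributes of a code object. A small sketch; the function add is made up for the example.

```python
def add(a, b):
    total = a + b
    return total


code = add.__code__
print(code.co_argcount)   # 2
print(code.co_nlocals)    # 3
print(code.co_varnames)   # ('a', 'b', 'total')
print(code.co_name)       # add
# These depend on the interpreter version and on where the code is run from.
print(code.co_consts, code.co_names, code.co_filename, code.co_firstlineno)
```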
Every PyCodeObject is run by calling the following function:

    PyObject *PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)

This is one of Python's heavyweight functions. Its job is to execute the bytecode, and all Python code runs by going through it.
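The instructions that the evaluation loop steps through can be listed with the standard dis module. A minimal sketch; the function add is made up for the example, and the exact opcode names differ between Python versions.

```python
import dis


def add(a, b):
    return a + b


# Each instruction is one step of the evaluation loop.
instructions = list(dis.get_instructions(add))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```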
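The PyFrameObject data structure:

    typedef struct _frame {
        PyObject_VAR_HEAD
        struct _frame *f_back;    /* previous frame, or NULL */
        PyCodeObject *f_code;     /* code object being executed */
        PyObject *f_builtins;     /* builtin symbol table */
        PyObject *f_globals;      /* global symbol table */
        PyObject *f_locals;       /* local symbol table */
        PyObject **f_valuestack;  /* points after the last local */
        PyObject **f_stacktop;    /* next free slot in f_valuestack */
        PyObject *f_trace;        /* trace function */

        /* exception info saved for this frame */
        PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

        PyThreadState *f_tstate;  /* owning thread state */
        int f_lasti;              /* last instruction executed */
        int f_lineno;             /* current line number */
        int f_iblock;             /* index into f_blockstack */
        PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* try and loop blocks */
        PyObject *f_localsplus[1];             /* locals + stack, dynamically sized */
    } PyFrameObject;
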
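Several of these fields can be read from Python through frame objects. The sketch below uses sys._getframe(), which is a CPython implementation detail; the function and variable names are made up for the example.

```python
import sys


def inner():
    frame = sys._getframe()      # frame currently executing inner()
    caller = frame.f_back        # frame of the function that called inner()
    return (frame.f_code.co_name,
            caller.f_code.co_name,
            caller.f_locals["marker"])


def outer():
    marker = "set in outer"
    return inner()


result = outer()
print(result)  # ('inner', 'outer', 'set in outer')
```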
References:
Python bytecode https://www.cnblogs.com/ningjing213/p/16224595.html  
How Python programs are executed http://tech.uc.cn/?p=1932  
Structure of .pyc files https://blog.csdn.net/weixin_45055269/article/details/105682945  
Python Magic Number reference table and how to repair pyc files https://www.cnblogs.com/Here-is-SG/p/15885799.html