CPython's Integer Objects

0. References

This article draws on the following reference:

Bilibili, "CPython Source Code Analysis":

P4: 4. long object analysis and hooks

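1. The integer type

Two facts to keep in mind first:

The int type seen at the Python level is implemented inside CPython as the long type (PyLongObject).
CPython handles integers in two cases: large integers and small integers.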

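1.1. Header definition of the integer object

Let's start with the header file for integer objects in CPython.

The following code is excerpted from Include/longobject.h on the 3.8 branch of the CPython source: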

/* Long (arbitrary precision) integer object interface */

typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */

PyAPI_DATA(PyTypeObject) PyLong_Type;

The comment in the code above says that _longobject is defined in Include/longintrepr.h, as follows:

/* Long integer representation.
   The absolute value of a number is equal to
        SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero.  Also, in all cases, for all valid i,
        0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.

   CAUTION:  Generic code manipulating subtypes of PyVarObject has to
   aware that ints abuse  ob_size's sign bit.
*/

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

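Explanation:

PyObject_VAR_HEAD: extremely important. For an introduction, see my separate article "An Introduction to PyObject and Related Objects in CPython".
digit ob_digit[1];: the comment above explains it in detail, and there is no need to understand it fully yet. For now, keep the following in mind:
1. It is a one-dimensional array declared with a single element; the allocation function reserves extra memory when a value needs more digits.
2. The number of ob_digit elements in use is tied to ob_size: abs(ob_size) is the digit count, and the sign of ob_size carries the sign of the integer. ob_size is defined in PyObject_VAR_HEAD, where it records the number of items in a variable-length object.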
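To make the comment in longintrepr.h more concrete, here is a minimal Python sketch that models the layout it describes. The helper names to_digits and from_digits are made up for this illustration and are not part of CPython; the digit width is read from sys.int_info, so the sketch follows whatever the running interpreter was built with.

```python
import sys

# Bits stored per digit in this build (commonly 30, sometimes 15).
SHIFT = sys.int_info.bits_per_digit
MASK = (1 << SHIFT) - 1

def to_digits(n):
    """Return (ob_size, ob_digit) modeled on the layout in longintrepr.h."""
    digits = []
    m = abs(n)
    while m:
        digits.append(m & MASK)   # least significant digit first
        m >>= SHIFT
    # Negative numbers are represented with ob_size < 0, zero with ob_size == 0.
    size = len(digits) if n >= 0 else -len(digits)
    return size, digits

def from_digits(size, digits):
    """Rebuild the value: SUM(ob_digit[i] * 2**(SHIFT*i)), signed by ob_size."""
    total = sum(d << (SHIFT * i) for i, d in enumerate(digits[:abs(size)]))
    return -total if size < 0 else total

for value in (0, 1, -1, 2**100 + 7, -(2**100 + 7)):
    size, digits = to_digits(value)
    print(value, size, digits, from_digits(size, digits) == value)
```

This only imitates the representation in pure Python; it does not read the real memory of an int object, and newer CPython versions have changed the struct layout.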

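1.2. C implementation of the integer object

Next, let's look at the C implementation of the integer object.

The following code is excerpted from Objects/longobject.c on the 3.8 branch of the CPython source: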

PyTypeObject PyLong_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",                                      /* tp_name */
    offsetof(PyLongObject, ob_digit),           /* tp_basicsize */
    sizeof(digit),                              /* tp_itemsize */
    0,                                          /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    long_to_decimal_string,                     /* tp_repr */
    &long_as_number,                            /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    (hashfunc)long_hash,                        /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    PyObject_GenericGetAttr,                    /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE |
        Py_TPFLAGS_LONG_SUBCLASS,               /* tp_flags */
    long_doc,                                   /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    long_richcompare,                           /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    long_methods,                               /* tp_methods */
    0,                                          /* tp_members */
    long_getset,                                /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    0,                                          /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    0,                                          /* tp_init */
    0,                                          /* tp_alloc */
    long_new,                                   /* tp_new */
    PyObject_Del,                               /* tp_free */
};

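Explanation:

PyLong_Type has type PyTypeObject; the initializer above fills in the concrete value of each slot for this type.
1. Of tp_as_number, tp_as_sequence, and tp_as_mapping, only the first is assigned; the other two are set to 0.
2. tp_doc is set to long_doc.
3. There are many more slots, which I won't go through here.
For an introduction to PyTypeObject, see my separate article "An Introduction to PyObject and Related Objects in CPython".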
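Several of these slots can be observed from Python itself. The sketch below is only an illustration run against whatever CPython interpreter you have; the dictionary name slots and the other variable names are made up here, and the exact sizes depend on the build, so no concrete numbers are claimed.

```python
import sys

# Type attributes that mirror slots in the PyLong_Type initializer.
slots = {
    "ob_type": type(int),               # PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tp_name": int.__name__,            # "int"
    "tp_basicsize": int.__basicsize__,  # offsetof(PyLongObject, ob_digit) in 3.8
    "tp_itemsize": int.__itemsize__,    # sizeof(digit)
}
for name, value in slots.items():
    print(name, "=", value)

# tp_as_number is filled in, so arithmetic methods exist; the sequence and
# mapping slots are 0, so int has no __len__ or __getitem__.
has_number_protocol = hasattr(int, "__add__")
has_container_protocol = hasattr(int, "__len__") or hasattr(int, "__getitem__")
print(has_number_protocol, has_container_protocol)

# Going from a one-digit value to a two-digit value should cost one more item.
one_digit = 1
two_digits = 1 << sys.int_info.bits_per_digit
growth = sys.getsizeof(two_digits) - sys.getsizeof(one_digit)
print(growth == int.__itemsize__)
```

The last check relies on how CPython reports the size of int objects, which is an implementation detail and may not hold on other interpreters.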

1.3. A simple example

Above we went through the members of the type structure for Python's long type. One of them is tp_doc, and long_doc is defined as follows:

PyDoc_STRVAR(long_doc,
"int([x]) -> integer\n\
int(x, base=10) -> integer\n\
\n\
Convert a number or string to an integer, or return 0 if no arguments\n\
are given.  If x is a number, return x.__int__().  For floating point\n\
numbers, this truncates towards zero.\n\
\n\
If x is not a number or if base is given, then x must be a string,\n\
bytes, or bytearray instance representing an integer literal in the\n\
given base.  The literal can be preceded by '+' or '-' and be surrounded\n\
by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.\n\
Base 0 means to interpret the base from the string as an integer literal.\n\
>>> int('0b100', base=0)\n\
4");

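Let's run a quick test in the interactive interpreter to check whether the documentation of Python's int type really reads this way:

[Screenshot: interactive session displaying the documentation of int]

As you can see, the two match exactly.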
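The same check can be done as text rather than a screenshot. This is a small sketch, assuming a CPython interpreter; the docstring wording can differ slightly between versions, so the snippet only looks for one sentence quoted from long_doc above. The variable names are chosen for this example.

```python
# tp_doc surfaces in Python as the __doc__ attribute of the type.
doc = int.__doc__
print(doc)

# One sentence from long_doc, split across two source lines in the C string.
expected_fragment = "Convert a number or string to an integer"
print(expected_fragment in doc)

# The example given at the end of the docstring.
parsed = int('0b100', base=0)
print(parsed)
```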

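2. Python's small integers

You may wonder why Python splits integers into large and small and handles the two cases differently.

The reason: small integers are used very heavily in Python. Creating a new object on every use and freeing it once it is no longer needed would clearly slow programs down. So during initialization Python instantiates a range of small integer objects ahead of time and keeps them in an object pool; later uses take the existing object directly instead of allocating and freeing again.

In short: this design is an optimization in which Python applies an object pool to small integer objects.

The screenshot below confirms this:

[Screenshot: interactive session illustrating small-integer caching]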
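A text version of this kind of check might look like the sketch below. The function name same_object is made up for this example. Parsing from a string at run time avoids the compiler merging equal literals, which would otherwise hide the effect. The behaviour shown is a CPython implementation detail, not a language guarantee.

```python
def same_object(text):
    """Parse the same decimal text twice and report whether both results are one object."""
    a = int(text)
    b = int(text)
    return a is b

# A small value is expected to come from the pool on CPython,
# while a large value gets a fresh object on each call.
print(same_object("100"))
print(same_object("1000000"))
```

This is why identity tests (is) should not be used to compare integers by value; use == instead.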

The natural follow-up question is: exactly which numbers count as small integers in Python?

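The answer is in Objects/longobject.c on the 3.8 branch of the CPython source. The file first defines two macros:

NSMALLPOSINTS: the number of non-negative small integers kept in the pool (counting from 0)
NSMALLNEGINTS: the number of negative small integers kept in the pool

As follows: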

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

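It then defines a function and a macro that handle the case where the argument is a small integer:

get_small_int: returns the pooled small integer object
CHECK_SMALL_INT: a macro that checks whether a value is a small integer and, if so, returns the pooled object

Code: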

static PyObject *
get_small_int(sdigit ival)
{
    PyObject *v;
    assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
    v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
    Py_INCREF(v);
#ifdef COUNT_ALLOCS
    if (ival >= 0)
        _Py_quick_int_allocs++;
    else
        _Py_quick_neg_int_allocs++;
#endif
    return v;
}
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)
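With the macro values above, the range test -NSMALLNEGINTS <= ival < NSMALLPOSINTS covers -5 through 256 inclusive. The following Python sketch models the range check and the index arithmetic only; small_ints here is an ordinary list standing in for the C array, and is_small_int is a name invented for this illustration.

```python
NSMALLPOSINTS = 257   # values copied from the macros above
NSMALLNEGINTS = 5

# Stand-in for the C array small_ints: index 0 holds -5, the last index holds 256.
small_ints = list(range(-NSMALLNEGINTS, NSMALLPOSINTS))

def is_small_int(ival):
    """Same range test as CHECK_SMALL_INT."""
    return -NSMALLNEGINTS <= ival < NSMALLPOSINTS

def get_small_int(ival):
    """Look up the pooled value the way the C function indexes the array."""
    assert is_small_int(ival)
    return small_ints[ival + NSMALLNEGINTS]

print(len(small_ints))
print(get_small_int(-5), get_small_int(0), get_small_int(256))
print(is_small_int(-6), is_small_int(257))
```

The offset ival + NSMALLNEGINTS is what lets negative values index into an array that starts at 0.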

3. Python's large integers

Here we mainly look at the C code that allocates memory for large integer objects: