Removing Columns from a DataFrame

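When working with data, removing one or more columns from a DataFrame object is a common operation. There are several ways to do it, and a number of details are worth paying attention to.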

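First, the approach generally regarded as the "correct" one is the drop method of DataFrame. It may have come to be seen as the standard approach under the influence of SQL, where DROP is used for deletion.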

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(25).reshape((5,5)), columns=list("abcde"))

display(df)

try:
    df.drop('b')
except KeyError as ke:
    print(ke)

    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
"['b'] not found in axis"

The call above raised an error. Why? Because the drop method removes rows by default.
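A minimal sketch of why the label was not found, assuming the same 5x5 frame as above: with the default axis, drop searches the row index, and 'b' exists only among the column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# drop looks along axis 0 (the row index) unless told otherwise
in_index = "b" in df.index      # the row labels are 0..4
in_columns = "b" in df.columns  # 'b' is a column label

print(in_index)    # False
print(in_columns)  # True
```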

Passing axis=0 or axis='rows' both mean dropping rows, and the labels parameter can also be used to drop rows.

df.drop(0)                # drop a row, on axis 0 or 'rows'
df.drop(0, axis=0)        # same
df.drop(0, axis='rows')   # same
df.drop(labels=0)         # same
df.drop(labels=[0])       # same

# result
    a   b   c   d   e
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

How to Remove Columns

To remove a column, specify axis or use the columns parameter, as shown below:

df.drop('b', axis=1)         # drop a column
df.drop('b', axis='columns') # same
df.drop(columns='b')         # same
df.drop(columns=['b'])       # same

# output
    a   c   d   e
0   0   2   3   4
1   5   7   8   9
2  10  12  13  14
3  15  17  18  19
4  20  22  23  24

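This removes one column. Note that drop returns a new object, so you can bind the result to a new variable while the original stays unchanged. To modify the original DataFrame instead, add the argument inplace=True.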

df2 = df.drop('b', axis=1)

print(df2.columns)
print(df.columns)

# result
Index(['a', 'c', 'd', 'e'], dtype='object')
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
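The inplace=True variant mentioned earlier can be sketched as follows. The variable name df_inplace is only illustrative; the point is that the call changes the frame itself and returns None, so the result should not be assigned back.

```python
import numpy as np
import pandas as pd

df_inplace = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# With inplace=True, drop modifies the frame itself and returns None
result = df_inplace.drop(columns="b", inplace=True)

print(result)                    # None
print(list(df_inplace.columns))  # ['a', 'c', 'd', 'e']
```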

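It is also worth noting that you can drop rows and columns in the same call by using index and columns together, and that each accepts multiple values, so several rows or columns can be dropped at once.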

df.drop(index=[0,2], columns=['b','c'])

# result
    a   d   e
1   5   8   9
3  15  18  19
4  20  23  24

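Instead of the drop method, the same result can be achieved through indexing. There are several ways to do this; one of them, shown below, uses .loc with .isin and negates the result.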

df.loc[~df.index.isin([0,2]), ~df.columns.isin(['b', 'c'])]

# result
    a   d   e
1   5   8   9
3  15  18  19
4  20  23  24
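As a quick check, assuming the same 5x5 frame, the drop call and the .loc selection can be compared directly. The names dropped and selected are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# Remove rows 0 and 2 and columns b and c in two different ways
dropped = df.drop(index=[0, 2], columns=["b", "c"])
selected = df.loc[~df.index.isin([0, 2]), ~df.columns.isin(["b", "c"])]

# Both approaches produce the same frame and leave df untouched
print(dropped.equals(selected))  # True
print(list(df.columns))          # ['a', 'b', 'c', 'd', 'e']
```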

If any of this is unclear, see the detailed explanation in 《跟老齐学Python：数据分析》.

Other Approaches

Besides the methods demonstrated above, there are other ways to remove a column.

del df['a']
df

# result
    b   c   d   e
0   1   2   3   4
1   6   7   8   9
2  11  12  13  14
3  16  17  18  19
4  21  22  23  24

The original df['a'] is gone. This is an in-place modification, just like calling drop with inplace=True.

But do not assume that del always works; it can leave you puzzled.

As we know, a column of a DataFrame can also be reached through attribute access such as df.b. I do not recommend this style, but many people doing data science work use it.

df.b

# result
0     1
1     6
2    11
3    16
4    21
Name: b, dtype: int64

For merely looking at the data this does no harm, but:

del df.b

# result
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-0dca358a6ef9> in <module>
----> 1 del df.b

AttributeError: b

This raises an error. Puzzling, isn't it: why does del df['b'] work while del df.b does not?

That is what we examine next. Only by working through the details can the root of the problem be made clear.

First, del df['b'] works because DataFrame implements the __delitem__ method, which is called when del df['b'] is executed. But what about del df.b: does it call this method?
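One way to see the difference, as a small sketch on the same 5x5 frame: calling __delitem__ directly has the same effect as del df['b'], while del df.c takes a different path and fails.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

# del df['b'] is translated by Python into this method call
df.__delitem__("b")
print(list(df.columns))  # ['a', 'c', 'd', 'e']

# del df.c goes through attribute deletion instead, which does not remove columns
try:
    del df.c
    attr_delete_failed = False
except AttributeError:
    attr_delete_failed = True

print(attr_delete_failed)  # True
print("c" in df.columns)   # True: the column is still there
```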

To find out, we can define a simple class that, for now, uses a dict as the container for its data. This class is of course not the real DataFrame.

class StupidFrame:
    def __init__(self, columns):
        self.columns = columns
        
    def __delitem__(self, item):
        del self.columns[item]
        
    def __getitem__(self, item):
        return self.columns[item]
    
    def __setitem__(self, item, val):
        self.columns[item] = val
            
f = StupidFrame({'a': 1, 'b': 2, 'c': 3})
print("StupidFrame value for a:", f['a'])
print("StupidFrame columns: ", f.columns)
del f['b']
f.d = 4
print("StupidFrame columns: ", f.columns)

# result
StupidFrame value for a: 1
StupidFrame columns:  {'a': 1, 'b': 2, 'c': 3}
StupidFrame columns:  {'a': 1, 'c': 3}

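Look closely at the operations above and at the StupidFrame code. Using [] on the instance supports deletion, assignment, and lookup. But executing f.d = 4 did not add a key-value pair with key d to the columns attribute created in StupidFrame; it added an ordinary attribute named d to the instance f.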
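Where the value actually went can be checked by inspecting the instance with vars. The sketch below repeats a trimmed-down StupidFrame so that it runs on its own.

```python
class StupidFrame:
    """Trimmed copy of the class above: a dict of columns plus item lookup."""

    def __init__(self, columns):
        self.columns = columns

    def __getitem__(self, item):
        return self.columns[item]


f = StupidFrame({'a': 1, 'c': 3})
f.d = 4  # plain attribute assignment, handled by the default object behavior

print(vars(f))           # {'columns': {'a': 1, 'c': 3}, 'd': 4}
print('d' in f.columns)  # False: the dict of columns was not touched
```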

So, to make f.d equivalent to f['d'], the StupidFrame class also needs a __getattr__ method, together with a __setattr__ method to handle assignment (for details on these two methods, see 《Python大学实用教程》).

class StupidFrameAttr:
    def __init__(self, columns):
        self.__dict__['columns'] = columns
        
    def __delitem__(self, item):
        del self.__dict__['columns'][item]
        
    def __getitem__(self, item):
        return self.__dict__['columns'][item]
    
    def __setitem__(self, item, val):
        self.__dict__['columns'][item] = val
        
    def __getattr__(self, item):
        if item in self.__dict__['columns']:
            return self.__dict__['columns'][item]
        elif item == 'columns':
            return self.__dict__[item]
        else:
            raise AttributeError
    
    def __setattr__(self, item, val):
        if item != 'columns':
            self.__dict__['columns'][item] = val
        else:
            raise ValueError("Overwriting columns prohibited") 

            
f = StupidFrameAttr({'a': 1, 'b': 2, 'c': 3})
print("StupidFrameAttr value for a", f['a'])
print("StupidFrameAttr columns: ", f.columns)
del f['b']
print("StupidFrameAttr columns: ", f.columns)
print("StupidFrameAttr value for a", f.a)
f.d = 4
print("StupidFrameAttr columns: ", f.columns)
del f['d']
print("StupidFrameAttr columns: ", f.columns)
f.d = 5
print("StupidFrameAttr columns: ", f.columns)
del f.d

# result
StupidFrameAttr value for a 1
StupidFrameAttr columns:  {'a': 1, 'b': 2, 'c': 3}
StupidFrameAttr columns:  {'a': 1, 'c': 3}
StupidFrameAttr value for a 1
StupidFrameAttr columns:  {'a': 1, 'c': 3, 'd': 4}
StupidFrameAttr columns:  {'a': 1, 'c': 3}
StupidFrameAttr columns:  {'a': 1, 'c': 3, 'd': 5}
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-fd29f59ea01e> in <module>
     39 f.d = 5
     40 print("StupidFrameAttr columns: ", f.columns)
---> 41 del f.d

AttributeError: d

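Attribute access and assignment now work, but del f.d still fails: the class does not define __delattr__, so Python falls back to the default behavior, which looks for an ordinary instance attribute named d and finds none.

To make deletion work as well, override the __delattr__ method in the class, as shown below: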

class StupidFrameDelAttr(StupidFrameAttr):
    def __delattr__(self, item):
        # trivial implementation using the data model methods
        del self.__dict__['columns'][item]

f = StupidFrameDelAttr({'a': 1, 'b': 2, 'c': 3})
print("StupidFrameDelAttr value for a", f['a'])
print("StupidFrameDelAttr columns: ", f.columns)
del f['b']
print("StupidFrameDelAttr columns: ", f.columns)
print("StupidFrameDelAttr value for a", f.a)
f.d = 4
print("StupidFrameDelAttr columns: ", f.columns)
del f.d 
print("StupidFrameDelAttr columns: ", f.columns)

# result
StupidFrameDelAttr value for a 1
StupidFrameDelAttr columns:  {'a': 1, 'b': 2, 'c': 3}
StupidFrameDelAttr columns:  {'a': 1, 'c': 3}
StupidFrameDelAttr value for a 1
StupidFrameDelAttr columns:  {'a': 1, 'c': 3, 'd': 4}
StupidFrameDelAttr columns:  {'a': 1, 'c': 3}

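Now the root cause of the problem with using del on a DataFrame attribute should be clear. This is not to say that the DataFrame class is implemented exactly as above; the classes here are just a brief way to illustrate the cause.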
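The dispatch rule behind all of this can be shown in isolation. The Tracer class below is a hypothetical helper, not part of pandas; it only records which special method each del statement reaches.

```python
class Tracer:
    """Records which special method each deletion statement invokes."""

    def __init__(self):
        self.calls = []

    def __delitem__(self, key):
        # reached by: del obj[key]
        self.calls.append(("__delitem__", key))

    def __delattr__(self, name):
        # reached by: del obj.name
        self.calls.append(("__delattr__", name))


t = Tracer()
del t["b"]
del t.b
print(t.calls)  # [('__delitem__', 'b'), ('__delattr__', 'b')]
```

The two statements look similar but are routed to separate methods, so supporting one does not imply supporting the other.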

So, to remove a column from a DataFrame in Pandas, it is best to use the object's drop method.

One more reminder: do not create new columns with the df.column_name form either, as that is also error-prone.
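A short sketch of the pitfall, assuming the same 5x5 frame and a hypothetical new column named f: attribute assignment sets a plain attribute on the object, while bracket assignment creates the column. pandas may emit a warning for the first assignment, which is silenced here.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape((5, 5)), columns=list("abcde"))

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about this assignment
    df.f = [0, 1, 2, 3, 4]           # sets a plain attribute, not a column

created_by_attribute = "f" in df.columns
print(created_by_attribute)  # False

df["f"] = [0, 1, 2, 3, 4]    # bracket assignment creates the column
created_by_brackets = "f" in df.columns
print(created_by_brackets)   # True
```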

References

[1]. https://www.wrighters.io/how-to-remove-a-column-from-a-dataframe/

Original link: https://blog.csdn.net/qiwsir/article/details/114867900
