pandas怎么去除nan_如何删除在特定列中的值为NaN的Pandas DataFrame行

  • Post author:
  • Post category:其他


小编典典

这个问题已经解决,但是…

…还要考虑伍特(Wouter)在其原始评论中提出的解决方案。dropna()大熊猫内置了处理丢失数据(包括)的功能。除了通过手动执行可能会提高的性能之外,这些功能还带有多种可能有用的选项。

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df

Out[26]:

0 1 2

0 NaN NaN NaN

1 2.677677 -1.466923 -0.750366

2 NaN 0.798002 -0.906038

3 0.672201 0.964789 NaN

4 NaN NaN 0.050742

5 -1.250970 0.030561 -2.678622

6 NaN 1.036043 NaN

7 0.049896 -0.308003 0.823295

8 NaN NaN 0.637482

9 -0.310130 0.078891 NaN

In [27]: df.dropna() #drop all rows that have any NaN values

Out[27]:

0 1 2

1 2.677677 -1.466923 -0.750366

5 -1.250970 0.030561 -2.678622

7 0.049896 -0.308003 0.823295

In [28]: df.dropna(how=’all’) #drop only if ALL columns are NaN

Out[28]:

0 1 2

1 2.677677 -1.466923 -0.750366

2 NaN 0.798002 -0.906038

3 0.672201 0.964789 NaN

4 NaN NaN 0.050742

5 -1.250970 0.030561 -2.678622

6 NaN 1.036043 NaN

7 0.049896 -0.308003 0.823295

8 NaN NaN 0.637482

9 -0.310130 0.078891 NaN

In [29]: df.dropna(thresh=2) #Drop row if it does not have at least two values that are **not** NaN

Out[29]:

0 1 2

1 2.677677 -1.466923 -0.750366

2 NaN 0.798002 -0.906038

3 0.672201 0.964789 NaN

5 -1.250970 0.030561 -2.678622

7 0.049896 -0.308003 0.823295

9 -0.310130 0.078891 NaN

In [30]: df.dropna(subset=[1]) #Drop only if NaN in specific column (as asked in the question)

Out[30]:

0 1 2

1 2.677677 -1.466923 -0.750366

2 NaN 0.798002 -0.906038

3 0.672201 0.964789 NaN

5 -1.250970 0.030561 -2.678622

6 NaN 1.036043 NaN

7 0.049896 -0.308003 0.823295

9 -0.310130 0.078891 NaN

还有其他选项(请参见http://pandas.pydata.org/pandas-docs/stable/generation/pandas.DataFrame.dropna.html上的文档),包括删除列而不是行。

2020-02-15



版权声明:本文为weixin_39870155原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。