pandas: checking whether the values in one column appear in another column

Post author: xfxia
Post published: August 23, 2023
Post category: Other

Checking whether the values in one column also appear in another column is a common data-processing task. Suppose a DataFrame has two columns:

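import pandas as pd

# Two lists of city names; each list becomes one column after transposing.
data = [['北京', '上海', '深圳', '广州', '杭州', '南京', '武汉', '成都', '苏州', '青岛'],
        ['上海', '南京', '杭州', '苏州', '无锡', '广州', '深圳', '东莞', '香港', '澳门']]
df = pd.DataFrame(data)
df = df.T
df.columns = ['cities1', 'cities2']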

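Suppose we want to check whether the values in the cities2 column also appear in the cities1 column. The first thing that comes to mind is probably in or not in.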

df.cities2 in df.cities1

However, this does not work:

TypeError: 'Series' objects are mutable, thus they cannot be hashed
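A small sketch of why this fails, using a made-up three-city Series for illustration: as far as I can tell, in applied to a Series tests the index labels rather than the values, and a label lookup needs to hash its left operand, which a Series does not support. Testing a single value against the underlying array behaves differently:

```python
import pandas as pd

# Hypothetical three-row Series, just for illustration.
cities1 = pd.Series(['北京', '上海', '深圳'])

# `in` on a Series looks at the index labels (0, 1, 2 here), not the values.
label_hit = 1 in cities1              # 1 is an index label
value_miss = '上海' in cities1         # '上海' is a value, not a label

# Testing against the underlying NumPy array checks the values instead.
value_hit = '上海' in cities1.values

print(label_hit, value_miss, value_hit)  # True False True
```

This is why the later steps compare against df.cities1.values rather than df.cities1 itself.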

Since comparing whole columns does not work, the natural next step is the df.apply method:

df[df.apply(lambda x: x.cities2 in x.cities1)]

Unfortunately, this still raises an error:

AttributeError: 'Series' object has no attribute 'cities2'

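This is because apply defaults to axis=0, which means the function is applied column by column. Each column is passed in as a Series indexed by row labels, so it has no cities2 attribute. We need to work row by row, so we have to add axis=1: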
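To make the difference between the two axes concrete, here is a minimal sketch on a made-up two-row frame. It only records which labels the function can see in each mode:

```python
import pandas as pd

# Hypothetical two-row frame, just for illustration.
df = pd.DataFrame({'cities1': ['北京', '上海'], 'cities2': ['上海', '南京']})

# axis=0 (the default): the function receives one column at a time,
# so the Series it sees is indexed by row labels.
per_column = df.apply(lambda col: ','.join(map(str, col.index)), axis=0)

# axis=1: the function receives one row at a time,
# so the Series it sees is indexed by column names.
per_row = df.apply(lambda row: ','.join(map(str, row.index)), axis=1)

print(per_column['cities1'])  # 0,1
print(per_row[0])             # cities1,cities2
```

Only in the axis=1 case do the names cities1 and cities2 exist on the object passed to the lambda.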

df[df.apply(lambda x: x.cities2 in x.cities1, axis=1)]

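This time it runs without error, but the result is not what we want. Inside each row, x.cities1 and x.cities2 are single strings, so the expression only tests whether the cities2 string is a substring of the cities1 string in the same row. To test against the whole column, compare with df.cities1.values instead: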

df.cities2[df.apply(lambda x: x.cities2 in df.cities1.values, axis=1)]

At last we get the result we wanted: the values in the cities2 column that also appear in the cities1 column.
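The same selection can also be written without apply. The sketch below uses Series.isin, which is not what the original post used but is the usual vectorized way to express this test in pandas. It rebuilds the example frame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'cities1': ['北京', '上海', '深圳', '广州', '杭州', '南京', '武汉', '成都', '苏州', '青岛'],
    'cities2': ['上海', '南京', '杭州', '苏州', '无锡', '广州', '深圳', '东莞', '香港', '澳门'],
})

# Boolean mask: True where the cities2 value occurs anywhere in cities1.
mask = df['cities2'].isin(df['cities1'])

# Keep only the cities2 values that passed the test.
matched = df.loc[mask, 'cities2']

print(matched.tolist())
# ['上海', '南京', '杭州', '苏州', '广州', '深圳']
```

Because isin works on the whole column at once, it avoids calling a Python function once per row, which tends to matter on larger frames.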

We can also add a column to store the result of the comparison:

df['result'] = df.apply(lambda x: 'yes' if x.cities2 in df.cities1.values else 'no', axis=1)

The result is as follows:

cities1    cities2    result
北京       上海       yes
上海       南京       yes
深圳       杭州       yes
广州       苏州       yes
杭州       无锡       no
南京       广州       yes
武汉       深圳       yes
成都       东莞       no
苏州       香港       no
青岛       澳门       no
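A yes/no column like this can also be built in a vectorized way. The following is a sketch that combines Series.isin with numpy.where, neither of which appears in the original post. It rebuilds the example frame so that it is self-contained:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cities1': ['北京', '上海', '深圳', '广州', '杭州', '南京', '武汉', '成都', '苏州', '青岛'],
    'cities2': ['上海', '南京', '杭州', '苏州', '无锡', '广州', '深圳', '东莞', '香港', '澳门'],
})

# 'yes' where the cities2 value occurs anywhere in cities1, otherwise 'no'.
df['result'] = np.where(df['cities2'].isin(df['cities1']), 'yes', 'no')

print(df['result'].tolist())
# ['yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'no', 'no', 'no']
```

If a boolean column is enough, the isin mask can be assigned directly and the numpy.where step dropped.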

Original article: https://blog.csdn.net/weixin_35097945/article/details/112828224
