pandas Data Cleaning



Using NumPy

  • Converting between a list and a NumPy array
import numpy as np
list_data = [1, 2, 3]
# list to NumPy array
array_data = np.array(list_data)
# NumPy array to list
list_data = array_data.tolist()
  • Saving and loading a NumPy array as a .npy file
# Save:
import numpy as np
numpy_array = np.array([1, 2, 3])
np.save('log.npy', numpy_array)

# Load:
import numpy as np
numpy_array = np.load('log.npy')
  • Common uses of numpy.mean() (mean of each row, mean of each column)
import numpy as np
x = np.array([1,2,3,4,5])
y = np.array([0,2,3,4,6])
z = np.array([[1,2],[3,4]])  # 2-D array
np.mean(x==y)  # fraction of positions where the condition holds (3 of 5 here)
Out[5]: 0.6
np.mean(x)  # mean of all elements
Out[6]: 3.0

np.mean(z)  # mean over the whole 2-D array
Out[10]: 2.5
np.mean(z,axis=0)  # mean of each column
Out[11]: array([ 2.,  3.])
np.mean(z,axis=1)  # mean of each row
Out[12]: array([ 1.5,  3.5])

Using pandas

  • Selecting several columns in pandas
import pandas as pd

data = pd.read_csv("dirty_data.csv")
data = data.iloc[:, 0:13]  # select the first 13 columns by position
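  • Reading and writing CSV with pandas (including a workaround when a file cannot be read because of its encoding)
DF = pd.read_csv(r'test.csv', encoding='gbk')
DF.to_csv(r'test.csv', encoding='gbk')

# If read_csv fails because of an encoding problem
def preprocess(path):
    try:
        return pd.read_csv(path)
    except UnicodeDecodeError:
        # Re-read the text with the platform default encoding, silently
        # dropping bytes that cannot be decoded
        with open(path, 'r', errors='ignore') as f:
            contents = f.read()
        # Overwrite the original file as UTF-8 (this modifies the file on disk)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(contents)
    # Read the rewritten file; read_csv already returns a DataFrame
    return pd.read_csv(path)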
  • Getting the list of column names from a DataFrame
data.columns.tolist()  # data is a DataFrame, e.g. the one read above
  • Merging and combining in pandas

Reference:

https://www.jianshu.com/p/fe47c70d31f9
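
As a rough illustration of merging and combining, here is a minimal sketch using pd.merge and pd.concat. The frames and the column names (key, x, y) are made up for this example and do not come from the referenced article.

```python
import pandas as pd

# Hypothetical sample frames; the column names are illustrative only
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Database-style join on a shared column; how="inner" keeps only matching keys
merged = pd.merge(left, right, on="key", how="inner")

# Stack frames vertically; ignore_index=True renumbers the rows from 0
stacked = pd.concat([left, left], ignore_index=True)

print(merged)
print(stacked.shape)
```

Other values of how ("left", "right", "outer") keep unmatched rows from one or both sides and fill the missing cells with NaN.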

  • Renaming DataFrame columns in pandas

Reference:

https://www.cnblogs.com/hhh5460/p/5816774.html
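
Two common ways to rename columns can be sketched as follows. The frame and the names used here are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical sample frame
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# rename() returns a new frame and only touches the columns listed in the mapping
renamed = df.rename(columns={"a": "alpha"})

# Assigning to .columns replaces every name at once; the list length must match
df.columns = ["first", "second"]

print(renamed.columns.tolist())
print(df.columns.tolist())
```

rename is usually the safer choice when only a few columns change, because it does not depend on column order.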

  • Aggregation and group operations in pandas

References:

https://blog.csdn.net/baoshuowl/article/details/79870706


https://www.cnblogs.com/huiyang865/p/5577772.html
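
A minimal sketch of grouping and aggregating, assuming a small made-up table with a city column and a sales column (both names are illustrative):

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "sales": [10, 20, 5, 15, 10],
})

# One aggregation per group: total sales for each city
totals = df.groupby("city")["sales"].sum()

# Several aggregations at once; the result has one column per function
summary = df.groupby("city")["sales"].agg(["mean", "max", "count"])

print(totals)
print(summary)
```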

Data cleaning with pandas
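
A few typical cleaning steps, sketched on made-up data: removing duplicate rows, dropping rows with a missing key field, and filling the remaining gaps. The column names and the choice of the mean as fill value are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical dirty data: one duplicated row, one missing name, missing ages
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", None],
    "age": [30.0, np.nan, np.nan, 25.0],
})

# Drop rows that are exact duplicates of an earlier row
deduped = df.drop_duplicates()

# Drop rows where "name" is missing
named = deduped.dropna(subset=["name"])

# Fill the remaining missing ages with the mean of the known ages
cleaned = named.fillna({"age": named["age"].mean()})

print(cleaned)
```

Each call returns a new DataFrame, so the original df is left untouched and every intermediate step can be inspected.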



Data type conversion in pandas


https://www.cnblogs.com/onemorepoint/p/9404753.html


https://www.cnblogs.com/c-w20140301/p/11379026.html
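
The usual conversion tools can be sketched like this, using a hypothetical frame in which every column starts out as strings:

```python
import pandas as pd

# Hypothetical sample data; every column is read in as text
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "price": ["1.5", "n/a", "3.0"],
    "day": ["2020-01-01", "2020-01-02", "2020-01-03"],
})

# astype works when every value converts cleanly
df["id"] = df["id"].astype(int)

# to_numeric with errors="coerce" turns unparseable values into NaN
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# to_datetime parses date strings into datetime64 values
df["day"] = pd.to_datetime(df["day"])

print(df.dtypes)
```

astype raises an error on a value such as "n/a", which is why the sketch uses to_numeric for the column that may contain bad entries.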

Sorting


http://sofasofa.io/forum_main_post.php?postid=1000411
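
A short sketch of sorting a DataFrame by values and by index. The data and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["c", "a", "b"], "score": [2, 3, 2]})

# Sort by one column, largest first
by_score = df.sort_values("score", ascending=False)

# Sort by several columns, with one direction per column
by_both = df.sort_values(["score", "name"], ascending=[True, True])

# Sort by the index to get back the original row order
restored = by_both.sort_index()

print(by_both)
```

sort_values keeps the original index labels, so calling reset_index(drop=True) afterwards is a common follow-up when a fresh 0-based index is wanted.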

Moving a column to a different position in a pandas DataFrame


https://blog.csdn.net/sinat_41701878/article/details/80945861
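
Two ways to move a column are sketched below on a hypothetical three-column frame. Which one the linked post uses is not stated here, so treat both as illustrations rather than the author's method.

```python
import pandas as pd

# Hypothetical sample frame
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Option 1: select the columns in the desired order (returns a new frame)
reordered = df[["c", "a", "b"]]

# Option 2: pop the column and insert it at a position (modifies df in place)
col = df.pop("c")
df.insert(0, "c", col)

print(reordered.columns.tolist())
print(df.columns.tolist())
```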



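Copyright notice: This is an original article by csdjia11, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.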