Python – Processing CSV Data

Reading data from CSV (comma-separated values) files is a fundamental task in data science. Data from many different sources is often exported to CSV so that other systems can use it. The pandas library can read a CSV file in full, or in part by selecting only certain columns and rows.

Input as CSV File

A CSV file is a text file in which the values in each row are separated by commas. Consider the following data, stored in a file named input.csv.

You can create this file in a plain-text editor such as Windows Notepad by copying and pasting the data. Save the file as input.csv, choosing the "All Files (*.*)" type in the Save As dialog so that Notepad does not append a .txt extension.

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
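
As an alternative to a text editor, the file can also be written from Python. The following is a minimal sketch; the names CSV_TEXT and write_sample_csv are illustrative and not part of the original tutorial, and the file is written to the current working directory.

```python
from pathlib import Path

# The sample data from this tutorial, kept as a plain string.
CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""


def write_sample_csv(path):
    """Write the sample data to `path` and return it as a Path object."""
    path = Path(path)
    path.write_text(CSV_TEXT, encoding="utf-8")
    return path


# Hypothetical location: 'input.csv' in the current working directory.
created = write_sample_csv("input.csv")
print(f"Wrote {created.resolve()}")
```

With the file in the current directory, the later examples can read it with 'input.csv' in place of 'path/input.csv'.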

Reading a CSV File

The read_csv function of the pandas library reads the content of a CSV file into Python as a pandas DataFrame. Pass it the path to the file, either absolute or relative to the current working directory.


import pandas as pd

# Replace 'path/input.csv' with the actual location of the file
data = pd.read_csv('path/input.csv')
print(data)

When we execute the above code, it produces the following result. Note the extra leftmost column starting at zero: it is not read from the file but is the default row index that pandas adds to the DataFrame.


   id    name  salary  start_date        dept
0   1    Rick  623.30  2012-01-01          IT
1   2     Dan  515.20  2013-09-23  Operations
2   3   Tusar  611.00  2014-11-15          IT
3   4    Ryan  729.00  2014-05-11          HR
4   5    Gary  843.25  2015-03-27     Finance
5   6   Rasmi  578.00  2013-05-21          IT
6   7  Pranab  632.80  2013-07-30  Operations
7   8    Guru  722.50  2014-06-17     Finance
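
To see that the leading numbers are an index rather than a sixth column, the DataFrame can be inspected directly. The sketch below is an illustration, not part of the original tutorial: it reads the same data from an in-memory string through io.StringIO so that it runs without a file on disk, and the name CSV_TEXT is an assumption made for the example.

```python
import io

import pandas as pd

# Same sample data as input.csv, held in memory for a self-contained example.
CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""

# read_csv accepts a file-like object as well as a path.
data = pd.read_csv(io.StringIO(CSV_TEXT))

print(data.shape)          # (8, 5): eight rows, five columns
print(list(data.columns))  # column names taken from the header line
print(data.index)          # the automatically generated row index
print(data.dtypes)         # the type pandas inferred for each column
```

The shape reports five columns, matching the header line of the file, so the index is kept separately from the data. The dtypes output also shows that start_date is read as text by default; read_csv has a parse_dates parameter for converting such columns to dates.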

Reading Specific Rows

Specific rows of a given column can be selected from the DataFrame that read_csv returns. Note that read_csv still loads the whole file; we then slice the resulting DataFrame, as in the code below, to get the first 5 rows of the column named salary.


import pandas as pd
data = pd.read_csv('path/input.csv')

# Take the first 5 rows, then select the salary column
print(data[0:5]['salary'])

When we execute the above code, it produces the following result.


0    623.30
1    515.20
2    611.00
3    729.00
4    843.25
Name: salary, dtype: float64
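
The slice above is applied after the whole file has been parsed. For large files it may be preferable to limit what is read in the first place. The following sketch, which goes beyond the original tutorial, uses the nrows and usecols parameters of read_csv for this; the data is read from an in-memory string (CSV_TEXT, a name chosen for the example) so the code is self-contained.

```python
import io

import pandas as pd

CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""

# Parse only the salary column and only the first 5 data rows.
first_five = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["salary"], nrows=5)
print(first_five["salary"])

# For comparison: read everything, then take the first 5 salaries.
data = pd.read_csv(io.StringIO(CSV_TEXT))
print(data["salary"].head(5))
```

Both approaches yield the same five values; the first simply avoids loading rows and columns that are not needed.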

Reading Specific Columns

Specific columns can be selected from the DataFrame returned by read_csv. We use the label-based indexer .loc for this purpose. It is written with square brackets and takes a row selector followed by a column selector. Here we display the salary and name columns for all rows.


import pandas as pd
data = pd.read_csv('path/input.csv')

# Select all rows (:) and the salary and name columns by label
print(data.loc[:, ['salary', 'name']])

When we execute the above code, it produces the following result.


   salary    name
0  623.30    Rick
1  515.20     Dan
2  611.00   Tusar
3  729.00    Ryan
4  843.25    Gary
5  578.00   Rasmi
6  632.80  Pranab
7  722.50    Guru
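
Columns can also be restricted while the file is being read, using the usecols parameter of read_csv. This is a sketch added for illustration rather than the tutorial's own method. One detail worth knowing is that usecols ignores the order of the names it is given and returns the columns in file order, so the example reorders them afterwards. CSV_TEXT is a name assumed for the example.

```python
import io

import pandas as pd

CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""

# Only the two requested columns are parsed; they come back in file order.
subset = pd.read_csv(io.StringIO(CSV_TEXT), usecols=["salary", "name"])
print(list(subset.columns))  # ['name', 'salary']

# Index with a list of names to put the columns in the desired order.
reordered = subset[["salary", "name"]]
print(reordered)
```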


Reading Specific Columns and Rows

The .loc indexer can also select specific columns and specific rows at the same time. Here we pass a list of row labels and a list of column names to display the salary and name columns for the rows labeled 1, 3 and 5.


import pandas as pd
data = pd.read_csv('path/input.csv')

# Select the rows labeled 1, 3 and 5 and the salary and name columns
print(data.loc[[1, 3, 5], ['salary', 'name']])

When we execute the above code, it produces the following result.


   salary   name
1   515.2    Dan
3   729.0   Ryan
5   578.0  Rasmi
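
The numbers 1, 3 and 5 above are index labels, which happen to coincide with row positions because the default index is used. The sketch below illustrates the difference under an assumption not made in the original tutorial: the id column is used as the index through the index_col parameter of read_csv. With that index, .loc looks rows up by id, while .iloc still selects by position.

```python
import io

import pandas as pd

CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""

# Use the id column as the row index instead of the default 0, 1, 2, ...
by_id = pd.read_csv(io.StringIO(CSV_TEXT), index_col="id")

# .loc selects by label: here the labels are the ids 2, 4 and 6.
selected_by_label = by_id.loc[[2, 4, 6], ["salary", "name"]]
print(selected_by_label)

# .iloc selects by position: the 2nd, 4th and 6th rows (positions 1, 3, 5).
selected_by_position = by_id.iloc[[1, 3, 5]][["salary", "name"]]
print(selected_by_position)
```

Both selections return the rows for Dan, Ryan and Rasmi, reached once through their ids and once through their positions.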


Reading Specific Columns for a Range of Rows

The .loc indexer also accepts a range of row labels. Here we display the salary and name columns for the rows labeled 2 through 6. Unlike ordinary Python slicing, a .loc slice includes both the start and the end label.


import pandas as pd
data = pd.read_csv('path/input.csv')

# Select the rows labeled 2 through 6 (inclusive) and the salary and name columns
print(data.loc[2:6, ['salary', 'name']])

When we execute the above code, it produces the following result.


   salary    name
2  611.00   Tusar
3  729.00    Ryan
4  843.25    Gary
5  578.00   Rasmi
6  632.80  Pranab
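
The output contains five rows because a label slice with .loc includes its end point. The following sketch, added for illustration, contrasts this with the position-based .iloc indexer, which follows the usual Python convention of excluding the end. The data is read from an in-memory string (CSV_TEXT, a name assumed for the example).

```python
import io

import pandas as pd

CSV_TEXT = """\
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Label-based slice: rows labeled 2, 3, 4, 5 and 6 (end label included).
by_label = data.loc[2:6, ["salary", "name"]]
print(len(by_label))     # 5

# Position-based slice: positions 2, 3, 4 and 5 (end position excluded).
by_position = data.iloc[2:6][["salary", "name"]]
print(len(by_position))  # 4
```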

Translated from: https://www.tutorialspoint.com/python_data_science/python_processing_csv_data.htm