This article is translated from:
Add one row to pandas DataFrame
I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows, one by one.
What is the best way to do this?
I successfully created an empty DataFrame with:
res = DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row and fill in a single field with:
res = res.set_value(len(res), 'qty1', 10.0)
It works, but it seems very odd :-/ (and it fails when adding a string value).
How can I add a new row to my DataFrame (with columns of different types)?
#1
Reference:
https://stackoom.com/question/ixi9/向pandas-DataFrame添加一行
#2
You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
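A minimal sketch of both options, reusing the column names from the question (note that DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so pd.concat() is the safer choice on recent versions):
import pandas as pd

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))

# pd.concat(): build the new row as a one-row DataFrame and concatenate it
new_row = pd.DataFrame([{'lib': 'name0', 'qty1': 10.0, 'qty2': 1}])
res = pd.concat([res, new_row], ignore_index=True)

# DataFrame.append() (removed in pandas 2.0) did the same thing on older versions:
# res = res.append({'lib': 'name0', 'qty1': 10.0, 'qty2': 1}, ignore_index=True)

print(res)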
#3
In case you can get all of the data for the data frame upfront, there is a much faster approach than appending to the data frame row by row (see the sketches below):
- Create a list of dictionaries in which each dictionary corresponds to an input data row.
- Create a data frame from this list.
I had a similar task for which appending to a data frame row by row took 30 minutes, while creating the data frame from a list of dictionaries completed within seconds.
import pandas as pd

rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..)
    rows_list.append(dict1)

df = pd.DataFrame(rows_list)
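As a concrete illustration of the same pattern, here is a small sketch; the input_rows tuples and column names are made up for the example:
import pandas as pd

# hypothetical input rows: (name, qty1, qty2) tuples
input_rows = [('name0', 3, 3), ('name1', 2, 4), ('name2', 2, 8)]

rows_list = []
for name, qty1, qty2 in input_rows:
    rows_list.append({'lib': name, 'qty1': qty1, 'qty2': qty2})

df = pd.DataFrame(rows_list)
print(df)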
#4
For efficient appending, see How to add an extra row to a pandas dataframe and Setting With Enlargement.
Add rows through loc/ix on a non-existing key index (ix has since been deprecated, so prefer loc). e.g.:
In [1]: se = pd.Series([1, 2, 3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64
Or:
In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
   ...:                    columns=['A', 'B'])

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5
#5
>>> import pandas as pd
>>> from numpy.random import randint
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
...
>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
#6
If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):
import pandas as pd
import numpy as np

# we know we're going to have 5 rows of data
numberOfRows = 5

# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2'))

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    # loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
In [23]: df
Out[23]:
  lib qty1 qty2
0  -1   -1   -1
1   0    0    0
2  -1    0   -1
3   0   -1    0
4  -1    0    0
Speed comparison
In [30]: %timeit tryThis() # function wrapper for this answer
In [31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop
And, as noted in the comments, with a size of 6000 the speed difference becomes even larger:
"Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s"
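The bodies of tryThis() and tryOther() are not shown in the answer; a plausible reconstruction of the two benchmark wrappers, assuming tryThis() preallocates the index as above and tryOther() grows an empty frame row by row via enlargement, could look like this:
import numpy as np
import pandas as pd

numberOfRows = 5

def tryThis():
    # preallocate the index, then assign each row in place
    df = pd.DataFrame(index=np.arange(0, numberOfRows),
                      columns=('lib', 'qty1', 'qty2'))
    for x in np.arange(0, numberOfRows):
        df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
    return df

def tryOther():
    # start empty and let .loc enlarge the frame on every iteration
    df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
    for x in range(numberOfRows):
        df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
    return df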