This article is translated from:
Add one row to pandas DataFrame
I understand that pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows, one by one.
What is the best way to do this?
I successfully created an empty DataFrame with:
res = DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row and fill in a single field with:
res = res.set_value(len(res), 'qty1', 10.0)
It works, but it seems very odd :-/ (and it fails when adding a string value).
How can I add a new row to my DataFrame (with columns of different types)?
#1
Reference:
https://stackoom.com/question/ixi9/向pandas-DataFrame添加一行
#2
You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
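A minimal sketch of both options, reusing the column names from the question (note that DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so pd.concat() is the safer choice on recent versions):
import pandas as pd

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))

# pd.concat(): build the new row as a one-row DataFrame and concatenate it
new_row = pd.DataFrame([{'lib': 'name0', 'qty1': 10.0, 'qty2': 1}])
res = pd.concat([res, new_row], ignore_index=True)

# DataFrame.append() (removed in pandas 2.0) did the same thing on older versions:
# res = res.append({'lib': 'name0', 'qty1': 10.0, 'qty2': 1}, ignore_index=True)

print(res)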
#3
In case you can get all of the data for the data frame upfront, there is a much faster approach than appending to the data frame row by row (see the sketches below):
- Create a list of dictionaries in which each dictionary corresponds to an input data row.
- Create a data frame from this list.
I had a similar task for which appending to a data frame row by row took 30 minutes, while creating the data frame from a list of dictionaries completed within seconds.
import pandas as pd

rows_list = []
for row in input_rows:
    dict1 = {}
    # get input row in dictionary format
    # key = col_name
    dict1.update(blah..)
    rows_list.append(dict1)

df = pd.DataFrame(rows_list)
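As a concrete illustration of the same pattern, here is a small sketch; the input_rows tuples and column names are made up for the example:
import pandas as pd

# hypothetical input rows: (name, qty1, qty2) tuples
input_rows = [('name0', 3, 3), ('name1', 2, 4), ('name2', 2, 8)]

rows_list = []
for name, qty1, qty2 in input_rows:
    rows_list.append({'lib': name, 'qty1': qty1, 'qty2': qty2})

df = pd.DataFrame(rows_list)
print(df)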
#4
For efficient appending, see How to add an extra row to a pandas dataframe and Setting With Enlargement.
Add rows through loc/ix on a non-existing key index (ix has since been deprecated, so prefer loc). e.g.:
In [1]: se = pd.Series([1, 2, 3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64
Or:
In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
   ...:                    columns=['A', 'B'])

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4

In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5
#5
>>> import pandas as pd
>>> from numpy.random import randint
>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))
...
>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6
#6
If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):
import pandas as pd
import numpy as np

# we know we're going to have 5 rows of data
numberOfRows = 5

# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2'))

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    # loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
In [23]: df
Out[23]:
  lib qty1 qty2
0  -1   -1   -1
1   0    0    0
2  -1    0   -1
3   0   -1    0
4  -1    0    0
Speed comparison
In [30]: %timeit tryThis() # function wrapper for this answer
In [31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop
And, as noted in the comments, with a size of 6000 the speed difference becomes even larger:
"Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s"
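The bodies of tryThis() and tryOther() are not shown in the answer; a plausible reconstruction of the two benchmark wrappers, assuming tryThis() preallocates the index as above and tryOther() grows an empty frame row by row via enlargement, could look like this:
import numpy as np
import pandas as pd

numberOfRows = 5

def tryThis():
    # preallocate the index, then assign each row in place
    df = pd.DataFrame(index=np.arange(0, numberOfRows),
                      columns=('lib', 'qty1', 'qty2'))
    for x in np.arange(0, numberOfRows):
        df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
    return df

def tryOther():
    # start empty and let .loc enlarge the frame on every iteration
    df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
    for x in range(numberOfRows):
        df.loc[x] = [np.random.randint(-1, 1) for n in range(3)]
    return df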