Raw data
The data has just two columns: a timestamp (Datetime) and the power consumption (PJME_MW):
Datetime,PJME_MW
2002-12-31 01:00:00,26498.0
2002-12-31 02:00:00,25147.0
2002-12-31 03:00:00,24574.0
2002-12-31 04:00:00,24393.0
2002-12-31 05:00:00,24860.0
2002-12-31 06:00:00,26222.0
2002-12-31 07:00:00,28702.0
2002-12-31 08:00:00,30698.0
...
2018-01-01 19:00:00,44343.0
2018-01-01 20:00:00,44284.0
2018-01-01 21:00:00,43751.0
2018-01-01 22:00:00,42402.0
2018-01-01 23:00:00,40164.0
2018-01-02 00:00:00,38608.0
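Preparing the training and test sets
Split the data into a training set and a test set at 2015-01-01:
import pandas as pd

# Load the hourly data; the first column is parsed into a DatetimeIndex
pjme = pd.read_csv('PJME_hourly.csv', index_col=[0], parse_dates=[0])
split_date = '2015-01-01'
pjme_train = pjme.loc[pjme.index <= split_date].copy()
pjme_test = pjme.loc[pjme.index > split_date].copy()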
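One detail of this split is easy to miss: when a DatetimeIndex is compared with the string '2015-01-01', pandas reads the string as midnight on that date. The row stamped 2015-01-01 00:00:00 therefore lands in the training set, and the test set starts at 01:00. The sketch below illustrates this on a tiny made-up frame; the `toy` names and the values are hypothetical and not part of the original data.

```python
import pandas as pd

# Hypothetical three-row frame straddling the split date (values are made up).
idx = pd.to_datetime([
    "2014-12-31 23:00:00",
    "2015-01-01 00:00:00",
    "2015-01-01 01:00:00",
])
toy = pd.DataFrame({"PJME_MW": [1.0, 2.0, 3.0]}, index=idx)

split_date = "2015-01-01"  # parsed as 2015-01-01 00:00:00 in the comparison
toy_train = toy.loc[toy.index <= split_date]
toy_test = toy.loc[toy.index > split_date]

print(toy_train.index.max())  # last timestamp in the training part
print(toy_test.index.min())   # first timestamp in the test part
```

The two parts never overlap, because `<=` and `>` are complementary.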
Build the features:
def create_features(df, label=None):
    """Derive calendar features from the DatetimeIndex; also return the label column if requested."""
    df['date'] = df.index  # index: DatetimeIndex
    df['hour'] = df['date'].dt.hour  # dt: DatetimeProperties, hour: Series
    df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0 ... Sunday=6
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['day_of_year'] = df['date'].dt.dayofyear
    df['day_of_month'] = df['date'].dt.day
    # dt.weekofyear was deprecated in pandas 1.1 and removed in 2.0;
    # isocalendar().week gives the same ISO week number
    df['week_of_year'] = df['date'].dt.isocalendar().week.astype(int)
    X = df[['hour', 'day_of_week', 'quarter', 'month', 'year', 'day_of_year', 'day_of_month', 'week_of_year']]
    if label:
        y = df[label]
        return X, y
    return X

# Training set
X_train, y_train = create_features(pjme_train, label='PJME_MW')
# Test set
X_test, y_test = create_features(pjme_test, label='PJME_MW')
X_train:
                     hour  day_of_week  quarter  month  year  day_of_year  day_of_month  week_of_year
Datetime
2002-12-31 01:00:00     1            1        4     12  2002          365            31             1
2002-12-31 02:00:00     2            1        4     12  2002          365            31             1
2002-12-31 03:00:00     3            1        4     12  2002          365            31             1
2002-12-31 04:00:00     4            1        4     12  2002          365            31             1
2002-12-31 05:00:00     5            1        4     12  2002          365            31             1
...
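The first row may look odd: the date is 31 December, yet week_of_year is 1. The week number follows ISO 8601, where 2002-12-31 (a Tuesday) already belongs to week 1 of 2003. The sketch below recomputes the eight features for that single timestamp with the standard library only, as a cross-check of the table. `calendar_features` is a hypothetical helper written for this illustration, not part of the original code.

```python
from datetime import datetime


def calendar_features(ts):
    """Return the eight calendar features for one timestamp (stdlib only)."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday=0 ... Sunday=6, like pandas dayofweek
        "quarter": (ts.month - 1) // 3 + 1,    # months 1-3 -> 1, ..., 10-12 -> 4
        "month": ts.month,
        "year": ts.year,
        "day_of_year": ts.timetuple().tm_yday,
        "day_of_month": ts.day,
        "week_of_year": ts.isocalendar()[1],   # ISO 8601 week number
    }


first_row = calendar_features(datetime(2002, 12, 31, 1, 0, 0))
print(first_row)
```

Note that year stays 2002 while the ISO week already counts toward 2003, so the pair (year, week_of_year) is not a unique week identifier around New Year.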
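Model -> training -> prediction
import xgboost as xgb

# Model
reg = xgb.XGBRegressor(n_estimators=1000)
# Training: stop early once the RMSE on the last eval set (the test set) has not improved for 50 rounds.
# Note: xgboost 1.6+ deprecates early_stopping_rounds in fit(); there, pass it to the XGBRegressor constructor instead.
reg.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)], early_stopping_rounds=50)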
[0] validation_0-rmse:29710.4 validation_1-rmse:28762.5
Multiple eval metrics have been passed: 'validation_1-rmse' will be used for early stopping.
Will train until validation_1-rmse hasn't improved in 50 rounds.
[1] validation_0-rmse:26822.6 validation_1-rmse:25892.2
[2] validation_0-rmse:24211.2 validation_1-rmse:23286.6
[3] validation_0-rmse:21885.1 validation_1-rmse:20967.5
[4] validation_0-rmse:19780.3 validation_1-rmse:18868.5
...
[195] validation_0-rmse:2844.33 validation_1-rmse:3754.45
[196] validation_0-rmse:2842.94 validation_1-rmse:3754.73
[197] validation_0-rmse:2840.57 validation_1-rmse:3754.88
[198] validation_0-rmse:2838.73 validation_1-rmse:3754.71
[199] validation_0-rmse:2837.81 validation_1-rmse:3753.66
Stopping. Best iteration:
[149] validation_0-rmse:2923.17 validation_1-rmse:3712.2
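The log ends at round 199 and reports round 149 as the best, which is exactly 50 rounds without improvement on validation_1-rmse. The rule can be sketched in plain Python as follows. This is a simplified illustration of the patience logic, not xgboost's actual implementation; `early_stop` and the toy scores are hypothetical.

```python
def early_stop(scores, patience):
    """Return (best_round, last_round) for a lower-is-better metric.

    Training stops at the first round that is `patience` rounds past
    the best one without any improvement.
    """
    best_round, best_score = 0, float("inf")
    for i, score in enumerate(scores):
        if score < best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            return best_round, i
    # Ran out of rounds before the patience was exhausted
    return best_round, len(scores) - 1


# Made-up validation scores: the minimum is at round 2
toy_scores = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7]
best_round, last_round = early_stop(toy_scores, patience=2)
print(best_round, last_round)
```

With a patience of 50 the same rule matches the log above: best round 149, last round 199.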
# Prediction
y_pred = reg.predict(X_test)
[28804.365 27663.098 27125.912 ... 34988.7 32725.598 31440.66 ]
Evaluation
RMSE: Root Mean Square Error, the square root of the mean squared difference between the predictions and the true values.
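As a minimal sketch, RMSE can be computed with the standard library alone. The `rmse` helper below is a hypothetical name introduced for illustration; it accepts any two equal-length sequences of numbers, so it should also work on `y_test` and `y_pred` from the steps above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy check with made-up numbers: the errors are 3 and 4, so RMSE = sqrt((9 + 16) / 2)
toy_rmse = rmse([0.0, 0.0], [3.0, 4.0])
print(toy_rmse)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same unit as the target column.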