Python Bike Rental System: Bike-Sharing Data Visualization in Python

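Project description: This project uses the bike-sharing dataset from the Kaggle competition (Bike Sharing Demand | Kaggle), which covers 2011 to 2012 for a city in the United States. The dataset includes the rental date and time, weather, season, temperature, "feels like" temperature, relative humidity, wind speed, and more. We clean the data, compute descriptive statistics, analyze how each of these factors affects rentals, and build basic visualizations of the results.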

import warnings

warnings.filterwarnings('ignore')  # suppress warning output

import pandas as pd

import numpy as np

from datetime import datetime

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

data = pd.read_csv('Biketrain.csv')

# Data cleaning: split the datetime column into date, hour, month, and weekday

data['date'] = data['datetime'].map(lambda x: x.split()[0])

data['hour'] = data['datetime'].map(lambda x: x.split()[1].split(':')[0])

data['month'] = data['datetime'].map(lambda x: x.split(' ')[0].split('-')[1]).astype('int')  # split on a single space, then on '-'

data['weekday'] = data['date'].map(lambda x: datetime.strptime(x, '%Y-%m-%d').isoweekday())  # '%Y' must be uppercase (four-digit year)

# isoweekday() returns 1-7 for Monday through Sunday

data.head()
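As an alternative to string splitting, the same fields can be derived by parsing the column once with pd.to_datetime and reading the components from the .dt accessor. The following is only a minimal sketch on a made-up two-row frame; the name `sample` and its values are assumptions for illustration, not part of the dataset. Note that hour comes out as an integer here, whereas the code above keeps it as a string.

```python
import pandas as pd

# Hypothetical two-row sample in the same 'YYYY-MM-DD HH:MM:SS' format as the dataset
sample = pd.DataFrame({'datetime': ['2011-01-01 05:00:00', '2012-07-15 17:00:00']})

# Parse the strings once, then read each component from the .dt accessor
parsed = pd.to_datetime(sample['datetime'])
sample['date'] = parsed.dt.strftime('%Y-%m-%d')
sample['hour'] = parsed.dt.hour
sample['month'] = parsed.dt.month
# dt.dayofweek is 0-6 with Monday = 0, so add 1 to match isoweekday()'s 1-7
sample['weekday'] = parsed.dt.dayofweek + 1

print(sample)
```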

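# Dictionary mapping with map

# Season, weather, and weekday are stored as numbers; convert them to names for readability:

# Keys run from 1 to 7 to match isoweekday() (1 = Monday, 7 = Sunday)

week_days = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday', 5: 'Friday', 6: 'Saturday', 7: 'Sunday'}

data['weekday_name'] = data['weekday'].map(week_days)

season_name = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}

data['season_name'] = data['season'].map(season_name)

weather_name = {1: 'Sunny', 2: 'Cloudy', 3: 'Light Rain', 4: 'Heavy Rain'}

data['weather_name'] = data['weather'].map(weather_name)

data.drop('datetime', axis=1, inplace=True)

# inplace=True: modify the original object directly, without creating a new one

# inplace=False: leave the original untouched and return a new object holding the result

data.head()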
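One detail worth keeping in mind with dictionary mapping: Series.map returns NaN for any value that has no matching key, so the key range has to match the values being mapped. The short sketch below uses made-up values (the names `zero_based` and `iso_based` are hypothetical) to show what happens when weekday numbers from isoweekday() meet a dictionary keyed 0-6.

```python
import pandas as pd

# isoweekday() yields 1 (Monday) through 7 (Sunday)
weekday = pd.Series([1, 6, 7])

# A dictionary keyed 0-6 has no entry for 7, so Sunday silently becomes NaN
zero_based = {0: 'Sunday', 1: 'Monday', 6: 'Saturday'}
missing = weekday.map(zero_based).isna().tolist()
print(missing)

# A dictionary keyed 1-7 covers every value that isoweekday() can return
iso_based = {1: 'Monday', 6: 'Saturday', 7: 'Sunday'}
names = weekday.map(iso_based).tolist()
print(names)
```

A quick `isna().sum()` on the mapped column is a cheap way to catch this kind of mismatch.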

'''Variable descriptions: datetime - hourly date + timestamp

season - 1 = spring, 2 = summer, 3 = fall, 4 = winter

holiday - whether the day is considered a holiday (0: no; 1: yes)

workingday - whether the day is neither a weekend nor holiday (0: no; 1: yes)

weather - 1: Clear, Few clouds, Partly cloudy, Partly cloudy

2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog

temp - temperature in Celsius (actual temperature)

atemp - "feels like" temperature in Celsius

humidity - relative humidity

windspeed - wind speed

casual - number of non-registered user rentals initiated

registered - number of registered user rentals initiated

count - number of total rentals'''

#https://www.cnblogs.com/zqiguoshang/p/5744563.html

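# Use corr to compute the correlation matrix and visualize it, to pick out the variables most strongly related to count

data_corr = data.select_dtypes(include='number').corr()  # numeric columns only; the name columns are strings

fig = plt.figure(1)  # create figure number 1

ax1 = plt.subplot(1, 1, 1)

# plt.subplot(111) and plt.subplot(1, 1, 1) are equivalent: a grid of 1 row and 1 column, drawing in the first cell (cells are numbered row by row)

fig.set_size_inches(11, 11)  # resize the figure

sns.heatmap(data_corr, ax=ax1, annot=True, square=False)

# annot is short for annotate and defaults to False; when True, each cell of the heatmap shows its value

# square controls whether the cells are drawn as squares; the default is False

plt.show()

# Season, weather, temperature, humidity, wind speed, and month turn out to have a comparatively large effect on the number of rentals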
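Reading a large heatmap by eye can be error-prone, so it may help to list each variable's correlation with count directly, ordered by absolute value. This is a sketch on synthetic data: the frame `demo`, its columns, and the coefficients used to generate it are all assumptions for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: 'count' is built mostly from temp, less from humidity, not from noise
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(0, 40, n)
humidity = rng.uniform(20, 100, n)
demo = pd.DataFrame({
    'temp': temp,
    'humidity': humidity,
    'noise': rng.normal(0, 1, n),
    'label': ['a'] * n,  # non-numeric column, excluded below
    'count': 5 * temp - humidity + rng.normal(0, 10, n),
})

# Correlation of every numeric column with the target, excluding the target itself
corr_with_count = demo.select_dtypes(include='number').corr()['count'].drop('count')

# Order by strength of the relationship, keeping the sign for interpretation
order = corr_with_count.abs().sort_values(ascending=False).index
ranked = corr_with_count.reindex(order)
print(ranked)
```

Sorting by absolute value keeps strongly negative relationships near the top instead of burying them at the bottom of the list.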

# Bar chart: categorical vs. continuous

# Analyze how month, season, and weekday affect bike usage; for season and weekday, combine the analysis with the hour dimension

fig, (ax1, ax2, ax3) = plt.subplots(3, 1)  # note the comma: subplots returns the figure and the axes

fig.set_size_inches(11, 15)

Month_avg = pd.DataFrame(data.groupby('month')['count'].mean()).reset_index()

sns.barplot(ax=ax1, data=Month_avg, x='month', y='count')

ax1.set(xlabel='Month', ylabel='Avg_num', title='Average Number By Month')

# Line chart

# hue: draw one line per value of the given categorical variable

Season_hour_avg = pd.DataFrame(data.groupby(['hour', 'season_name'], sort=True)['count'].mean()).reset_index()

sns.pointplot(ax=ax2, data=Season_hour_avg, x='hour', y='count', hue='season_name')  # points are joined by lines by default

ax2.set(xlabel='Hour', ylabel='Count', title='Average Count By Hour Each Season')

# Line chart

Week_hour_avg = pd.DataFrame(data.groupby(['hour', 'weekday'], sort=True)['count'].mean()).reset_index()

sns.pointplot(ax=ax3, data=Week_hour_avg, x='hour', y='count', hue='weekday')

ax3.set(xlabel='Hour', ylabel='Count', title='Average Count By Hour Each Weekday')

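May through October is the peak rental season each year, and spring rentals are clearly lower than in the other seasons.

On working days, usage is concentrated around 8 a.m. and 5-6 p.m., which matches the commuting rush hours.

On weekends the peak period differs from working days and mostly falls between 11:00 and 17:00.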
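Peak hours like these can also be read from a table rather than from the line chart. The sketch below groups by the dataset's workingday flag and hour, then asks for the hour with the highest average in each group. The numbers in `demo` are invented for illustration only.

```python
import pandas as pd

# Hypothetical hourly averages for working days (1) and non-working days (0)
demo = pd.DataFrame({
    'workingday': [1, 1, 1, 1, 0, 0, 0, 0],
    'hour':       [8, 12, 17, 23, 8, 12, 17, 23],
    'count':      [400, 150, 450, 40, 80, 300, 280, 60],
})

# Rows are hours, columns are workingday values, cells are mean rentals
hourly = demo.groupby(['workingday', 'hour'])['count'].mean().unstack('workingday')

# idxmax returns, for each column, the row label (hour) of the largest value
peak_hour = hourly.idxmax()
print(hourly)
print(peak_hour)
```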

# Effect of weather and wind speed on bike usage

climateDf = data[['count', 'weather', 'weather_name', 'temp', 'atemp', 'humidity', 'windspeed']]

climateDf = pd.concat([climateDf, data['hour'].astype(int)], axis=1)

fig, axes = plt.subplots(2, 1, figsize=(10, 14))  # note the comma; figsize sets the figure width and height in inches

# Select the count column before aggregating so the string column weather_name is not summed or averaged

df_sum = climateDf.groupby('weather')['count'].sum()

df_avg = climateDf.groupby('weather')['count'].mean()

df_weather = pd.concat([df_sum, df_avg], axis=1).reset_index()

# Dual-axis chart: bike usage under different weather conditions

ax1 = plt.subplot(2, 1, 1)  # number of rows, number of columns, and which cell to draw in

df_weather.columns = ['weather', 'sum', 'mean']

df_weather['sum'].plot(kind='bar', width=0.4, ax=ax1, alpha=0.6, label='')

# Arguments: chart type, bar width, target axes, transparency, and the label shown in the legend

df_weather['mean'].plot(ax=ax1, style='b--.', alpha=0.6, secondary_y=True, label='Mean')

# 'b--.' means blue, dashed line, point markers; the default line style is '-'

ax1.set_xticks(range(len(df_weather)))  # the bars are drawn at positions 0, 1, 2, ...

ax1.set_xlabel('Weather')  # set the x-axis label

ax1.set_xticklabels(list(df_weather['weather'].map(weather_name)), rotation='horizontal')

# Set the x-axis tick labels; rotation is the rotation angle of the text

ax1.set_ylabel('Sum of rental')

ax1.right_ax.set_ylabel('Avg of rental')

ax1.set_title('The rental number of bike_sharing in 2011-2012 with different weather')

# Bike usage at different wind speeds

ax2 = plt.subplot(2, 1, 2)

df_sum2 = climateDf.groupby('windspeed')['count'].sum()

df_avg2 = climateDf.groupby('windspeed')['count'].mean()

df_wind = pd.concat([df_sum2, df_avg2], axis=1).reset_index()

df_wind.columns = ['windspeed', 'sum', 'mean']

# Both series are plotted against the row position of each distinct windspeed value (sorted ascending), not against the windspeed value itself

df_wind['sum'].plot(ax=ax2, kind='area', alpha=0.5, label='')  # area chart

df_wind['mean'].plot(style='b--.', alpha=0.7, ax=ax2, secondary_y=True, label='Mean')

ax2.set_ylabel('Sum of rental')

ax2.right_ax.set_ylabel('Avg of rental')

ax2.set_title('The rental number of bike_sharing in 2011-2012 with different windspeed')

ax2.set_xlabel('Windspeed')

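The better the weather and the lower the wind speed (below 6), the higher the rental volume. Extreme weather and high wind speeds (above 25), however, show comparatively high average rentals. These are anomalies: such conditions occur on very few days, so their averages fluctuate widely.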
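The point about rare conditions can be checked by reporting how many rows back each average. The sketch below adds a row count next to the sum and mean; the frame `demo` and its values are made up, and the result column names `total`, `average`, and `n_rows` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical rentals: weather category 4 appears only once
demo = pd.DataFrame({
    'weather': [1, 1, 1, 1, 2, 2, 2, 4],
    'count':   [200, 180, 220, 240, 150, 170, 160, 164],
})

# Named aggregation: sum, mean, and the number of rows behind each group
summary = demo.groupby('weather')['count'].agg(
    total='sum',
    average='mean',
    n_rows='count',
)
print(summary)
```

In this toy data the single row for category 4 has an average above category 2, but with n_rows equal to 1 that average carries very little weight.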

# Scatter plots

# https://blog.csdn.net/qq_17278169/article/details/54927014 (parameter reference)

fig, axes = plt.subplots(3, 1, figsize=(10, 13))  # 3 rows, 1 column

ax1 = plt.subplot(3, 1, 1)

df_hum = climateDf[['humidity', 'count']]

ax1.scatter(df_hum['humidity'], df_hum['count'], s=df_hum['count'] / 5, c=df_hum['count'], marker='.', alpha=0.8)

# s: marker size; c: marker color

ax1.set_title('The rental number of bike_sharing in 2011-2012 with different humidity')

ax1.set_xlabel('Humidity')

ax1.set_ylabel('Number')

ax2 = plt.subplot(3, 1, 2)

df_temp = climateDf[['temp', 'count']]

ax2.scatter(df_temp['temp'], df_temp['count'], s=df_temp['count'] / 5, c=df_temp['count'], marker='.', alpha=0.8)

ax2.set_title('The rental number of bike_sharing in 2011-2012 with different temperature')

ax2.set_xlabel('Temperature')

ax2.set_ylabel('Number')

ax3 = plt.subplot(3, 1, 3)

df_windspeed = climateDf[['windspeed', 'count']]

ax3.scatter(df_windspeed['windspeed'], df_windspeed['count'], s=df_windspeed['count'] / 5, c=df_windspeed['count'], marker='.', alpha=0.8)

ax3.set_title('The rental number of bike_sharing in 2011-2012 with different windspeed')

ax3.set_xlabel('Windspeed')

ax3.set_ylabel('Number')

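Rental volume decreases as humidity increases;

Rental volume first rises and then falls as temperature increases, with the best temperature between 25 and 30 degrees;

Within a certain range (5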
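A rise-then-fall pattern such as the temperature one can be summarized numerically by binning the variable with pd.cut and averaging count per bin. This is a sketch only: the frame `demo`, its values, and the 10-degree bin edges are assumptions picked for illustration.

```python
import pandas as pd

# Hypothetical temperature and rental values
demo = pd.DataFrame({
    'temp':  [3, 8, 12, 18, 22, 27, 29, 33, 38],
    'count': [40, 60, 110, 180, 230, 300, 310, 260, 200],
})

# Cut temperature into 10-degree bands: (0, 10], (10, 20], (20, 30], (30, 40]
bins = [0, 10, 20, 30, 40]
demo['temp_band'] = pd.cut(demo['temp'], bins=bins)

# Mean rentals per band; observed=True keeps only bands that contain data
band_mean = demo.groupby('temp_band', observed=True)['count'].mean()
best_band = band_mean.idxmax()
print(band_mean)
print(best_band)
```

Narrower bins give a finer picture but leave fewer rows in each bin, so the averages become noisier.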
