Python Bike Rental System: Visualizing Bike-Sharing Data with Python



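Project description: this project uses the 2011-2012 bike-sharing dataset for a US city provided by the Kaggle competition (Bike Sharing Demand | Kaggle). The dataset includes the rental date and time, weather, season, temperature, feels-like temperature, humidity and wind speed. We clean the data, compute descriptive statistics, analyze how the rental date, weather, season, temperature, feels-like temperature, humidity and wind speed affect rentals, and build basic visualizations of the results.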

import warnings
warnings.filterwarnings('ignore')  # suppress warning output

from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Jupyter magic; remove this line when running as a plain script
%matplotlib inline

data = pd.read_csv('Biketrain.csv')

# Data cleaning: split the datetime column into date, hour, month and weekday
data['date'] = data['datetime'].map(lambda x: x.split()[0])
data['hour'] = data['datetime'].map(lambda x: x.split()[1].split(':')[0])
data['month'] = data['datetime'].map(lambda x: x.split(' ')[0].split('-')[1]).astype('int')  # split on a single space
data['weekday'] = data['date'].map(lambda x: datetime.strptime(x, '%Y-%m-%d').isoweekday())  # note the uppercase 'Y'
# isoweekday() returns 1-7 for Monday through Sunday

data.head()
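As an alternative to string splitting, the same fields can be derived with pandas' datetime accessor. The following is only a sketch on made-up rows that follow the "YYYY-MM-DD HH:MM:SS" layout; `sample` and `parsed` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Made-up rows in the same "YYYY-MM-DD HH:MM:SS" layout as the datetime column.
sample = pd.DataFrame({"datetime": ["2011-01-01 00:00:00",
                                    "2011-01-02 13:00:00",
                                    "2012-07-16 08:00:00"]})

parsed = pd.to_datetime(sample["datetime"])      # illustrative helper name
sample["date"] = parsed.dt.strftime("%Y-%m-%d")  # date as a string
sample["hour"] = parsed.dt.hour                  # integers 0-23
sample["month"] = parsed.dt.month                # integers 1-12
sample["weekday"] = parsed.dt.dayofweek + 1      # 1 = Monday ... 7 = Sunday, like isoweekday()

print(sample)
```

One design difference: here `hour` comes out as an integer, whereas the string-splitting version yields zero-padded strings such as '08'.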

# Dictionary mapping with map
# Season, weather and weekday are numeric codes; convert them to names for readability:
week_days = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday', 5: 'Friday', 6: 'Saturday', 7: 'Sunday'}  # keys match isoweekday(): 1 = Monday ... 7 = Sunday
data['weekday_name'] = data['weekday'].map(week_days)
season_name = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
data['season_name'] = data['season'].map(season_name)
weather_name = {1: 'Sunny', 2: 'Cloudy', 3: 'Light Rain', 4: 'Heavy Rain'}
data['weather_name'] = data['weather'].map(weather_name)
data.drop('datetime', axis=1, inplace=True)
# inplace=True: modify the original object directly, without creating a new one
# inplace=False: leave the original untouched and return a new object holding the result

data.head()
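`Series.map` with a dict silently produces NaN for any value that has no key in the dict, so it is worth checking that the keys cover the 1-7 range returned by `isoweekday()`. A minimal sketch with made-up values (the dict and variable names are illustrative only):

```python
import pandas as pd

weekday = pd.Series([1, 5, 7])  # isoweekday() values: Monday, Friday, Sunday

zero_based = {0: "Sunday", 1: "Monday", 5: "Friday"}  # has no key 7
iso_based = {1: "Monday", 5: "Friday", 7: "Sunday"}   # keys follow isoweekday()

mapped_zero = weekday.map(zero_based)  # 7 has no key, so it becomes NaN
mapped_iso = weekday.map(iso_based)    # every value is mapped

missing = int(mapped_zero.isna().sum())
print(mapped_zero.tolist(), mapped_iso.tolist())
print("unmapped values:", missing)
```

Counting NaN values right after a `map` call is a cheap way to catch a key mismatch before it propagates into the plots.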

'''Variable descriptions:
datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday (0: no; 1: yes)
workingday - whether the day is neither a weekend nor holiday (0: no; 1: yes)
weather - 1: Clear, Few clouds, Partly cloudy
          2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
          3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
          4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius (actual temperature)
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals'''

#https://www.cnblogs.com/zqiguoshang/p/5744563.html

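# Use corr to compute the correlation matrix of the dataset and visualize it, to pick the variables that most affect count
data_corr = data.select_dtypes('number').corr()  # numeric columns only; string columns such as date cannot be correlated
fig = plt.figure(1)  # create figure number 1
ax1 = plt.subplot(1, 1, 1)
# plt.subplot(111) and plt.subplot(1, 1, 1) are equivalent: a grid of 1 row and 1 column, drawing in the first cell (cells are numbered row by row)
fig.set_size_inches(11, 11)  # resize the figure
sns.heatmap(data_corr, ax=ax1, annot=True, square=False)
# annot is short for annotate and defaults to False; with annot=True each cell shows its value
# square controls whether the heatmap cells are drawn square; the default is False
plt.show()
# Season, weather, temperature, humidity, wind speed and month turn out to have a relatively strong influence on rental counts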
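Reading the strongest relationships off a heatmap by eye can be error-prone, so it may help to rank the features by the absolute value of their correlation with `count`. The sketch below uses synthetic data (the coefficients and column values are invented purely for illustration), not the Kaggle file.

```python
import numpy as np
import pandas as pd

# Synthetic data: count depends strongly on temp, weakly on humidity, not on noise.
rng = np.random.default_rng(0)
n = 500
temp = rng.uniform(0, 40, n)
humidity = rng.uniform(20, 100, n)
toy = pd.DataFrame({
    "temp": temp,
    "humidity": humidity,
    "noise": rng.normal(0, 1, n),
    "count": 10 * temp - 2 * humidity + rng.normal(0, 20, n),
})
toy["label"] = "x"  # a string column, excluded from the correlation below

corr_with_count = (
    toy.select_dtypes("number")   # keep numeric columns only
       .corr()["count"]           # correlations with the target
       .drop("count")             # drop the trivial self-correlation
       .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_count)
```

Sorting by absolute value keeps strong negative relationships (such as humidity here) near the top instead of at the bottom of the list.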

# Bar chart: categorical vs. continuous
# Analyze how month, season and weekday affect bike usage; season and weekday are combined with the hour dimension
fig, (ax1, ax2, ax3) = plt.subplots(3, 1)  # unpack the three axes
fig.set_size_inches(11, 15)
Month_avg = pd.DataFrame(data.groupby('month')['count'].mean()).reset_index()
sns.barplot(ax=ax1, data=Month_avg, x='month', y='count')
ax1.set(xlabel='Month', ylabel='Avg_num', title='Average Number By Month')
# Line chart
# hue: draw one line per level of the given categorical variable
Season_hour_avg = pd.DataFrame(data.groupby(['hour', 'season_name'], sort=True)['count'].mean()).reset_index()
sns.pointplot(ax=ax2, data=Season_hour_avg, x='hour', y='count', hue='season_name')  # points are joined by lines by default
ax2.set(xlabel='Hour', ylabel='Count', title='Average Count By Hour Each Season')
# Line chart
Week_hour_avg = pd.DataFrame(data.groupby(['hour', 'weekday'], sort=True)['count'].mean()).reset_index()
sns.pointplot(ax=ax3, data=Week_hour_avg, x='hour', y='count', hue='weekday')
ax3.set(xlabel='Hour', ylabel='Count', title='Average Count By Hour Each Weekday')

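May through October is the peak rental season each year, and rentals in spring are clearly lower than in the other seasons.

On working days, usage is concentrated around 8 a.m. and 5-6 p.m., which matches the commuting rush hours.

On weekends the peak hours differ from working days and fall mostly between 11:00 and 17:00.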
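The peak hours read off the line charts can also be extracted numerically. The sketch below uses made-up hourly values shaped to mimic the pattern described above (commuter peaks on working days, a midday hump otherwise); the numbers and the `workingday` split are assumptions for illustration, not results from the dataset.

```python
import pandas as pd

# Made-up hourly counts: commuter peaks on working days, a midday hump otherwise.
rows = []
for hour in range(24):
    working = 300 if hour in (8, 17, 18) else 80
    weekend = 250 - 10 * abs(hour - 14) if 11 <= hour <= 17 else 60
    rows.append({"hour": hour, "workingday": 1, "count": working})
    rows.append({"hour": hour, "workingday": 0, "count": weekend})
toy = pd.DataFrame(rows)

# Mean count per hour, one column per day type (0 = non-working, 1 = working).
hourly = toy.groupby(["workingday", "hour"])["count"].mean().unstack("workingday")

# The three busiest hours for each day type.
peaks = {flag: sorted(hourly[flag].nlargest(3).index.tolist())
         for flag in hourly.columns}
print(peaks)
```

On the real data, the same `groupby` and `nlargest` calls could be applied to `data` after converting `hour` to an integer.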

# Effect of weather and wind on bike usage
climateDf = data[['count', 'weather', 'weather_name', 'temp', 'atemp', 'humidity', 'windspeed']]
climateDf = pd.concat([climateDf, data['hour'].astype(int)], axis=1)
fig, axes = plt.subplots(2, 1, figsize=(10, 14))  # figsize sets the figure width and height in inches
# select 'count' before aggregating so the string column weather_name is not summed or averaged
df_sum = climateDf.groupby('weather')['count'].sum()
df_avg = climateDf.groupby('weather')['count'].mean()
df_weather = pd.concat([df_sum, df_avg], axis=1).reset_index()

# Dual-axis chart: bike usage under different weather
ax1 = plt.subplot(2, 1, 1)  # rows, columns, and index of the selected plotting area
df_weather.columns = ['weather', 'sum', 'mean']
df_weather['sum'].plot(kind='bar', width=0.4, ax=ax1, alpha=0.6, label='')
# chart kind, bar width, target axes, transparency, and the label shown in the legend
df_weather['mean'].plot(ax=ax1, style='b--.', alpha=0.6, secondary_y=True, label='Mean')
# style 'b--.' means blue, dashed line, point markers
ax1.set_xticks(range(len(df_weather)))  # bars sit at positions 0..n-1, not at the weather codes 1..4
ax1.set_xlabel('Weather')  # x-axis label
ax1.set_xticklabels(['Sunny', 'Cloudy', 'Light Rain', 'Heavy Rain'], rotation='horizontal')
# tick label text; rotation sets the rotation angle
ax1.set_ylabel('Sum of rental')
ax1.right_ax.set_ylabel('Avg of rental')
ax1.set_title('The rental number of bike_sharing in 2011-2012 with different weather')

# Bike usage under different wind speeds
ax2 = plt.subplot(2, 1, 2)
df_sum2 = climateDf.groupby('windspeed')['count'].sum()
df_avg2 = climateDf.groupby('windspeed')['count'].mean()
df_wind = pd.concat([df_sum2, df_avg2], axis=1).reset_index()
df_wind.columns = ['windspeed', 'sum', 'mean']
# note: both series are plotted against the row index of df_wind (distinct wind speeds in ascending order)
df_wind['sum'].plot(ax=ax2, kind='area', alpha=0.5, label='')  # area chart
df_wind['mean'].plot(style='b--.', alpha=0.7, ax=ax2, secondary_y=True, label='Mean')
ax2.set_ylabel('Sum of rental')
ax2.right_ax.set_ylabel('Avg of rental')
ax2.set_title('The rental number of bike_sharing in 2011-2012 with different windspeed')
ax2.set_xlabel('Windspeed')

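The better the weather and the lower the wind speed (below 6), the higher the rental volume. Extreme weather and high wind speeds (above 25), however, show a comparatively high average rental count. These are outliers: such conditions occur so rarely that the averages computed for them fluctuate widely.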
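One way to make the point about rare conditions concrete is to report the number of records next to each group's sum and mean, so that averages resting on very few rows stand out. A minimal sketch on made-up numbers; the threshold of 3 records is an arbitrary assumption.

```python
import pandas as pd

# Made-up records: weather code 4 appears only once.
toy = pd.DataFrame({
    "weather": [1, 1, 1, 1, 2, 2, 2, 4],
    "count":   [120, 200, 90, 310, 100, 150, 80, 164],
})

# Record count, total and mean rentals per weather code.
summary = toy.groupby("weather")["count"].agg(["size", "sum", "mean"])
summary["few_samples"] = summary["size"] < 3  # arbitrary cutoff for "too few records"
print(summary)
```

A mean flagged with `few_samples` rests on too few records to compare reliably with the other groups.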

# Scatter plots
# https://blog.csdn.net/qq_17278169/article/details/54927014 (parameter reference)
fig, axes = plt.subplots(3, 1, figsize=(10, 13))  # 3 rows, 1 column
ax1 = plt.subplot(3, 1, 1)
df_hum = climateDf[['humidity', 'count']]
ax1.scatter(df_hum['humidity'], df_hum['count'], s=df_hum['count'] / 5, c=df_hum['count'], marker='.', alpha=0.8)
# s: marker size; c: marker color
ax1.set_title('The rental number of bike_sharing in 2011-2012 with different humidity')
ax1.set_xlabel('Humidity')
ax1.set_ylabel('Number')

ax2 = plt.subplot(3, 1, 2)
df_temp = climateDf[['temp', 'count']]
ax2.scatter(df_temp['temp'], df_temp['count'], s=df_temp['count'] / 5, c=df_temp['count'], marker='.', alpha=0.8)
ax2.set_title('The rental number of bike_sharing in 2011-2012 with different temperature')
ax2.set_xlabel('Temperature')
ax2.set_ylabel('Number')

ax3 = plt.subplot(3, 1, 3)
df_windspeed = climateDf[['windspeed', 'count']]
ax3.scatter(df_windspeed['windspeed'], df_windspeed['count'], s=df_windspeed['count'] / 5, c=df_windspeed['count'], marker='.', alpha=0.8)
ax3.set_title('The rental number of bike_sharing in 2011-2012 with different windspeed')
ax3.set_xlabel('Windspeed')
ax3.set_ylabel('Number')
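Scatter plots show the overall shape, but a binned summary can make a trend such as "rises, then falls" easier to quantify. The sketch below bins temperature with `pd.cut` and averages the counts per bin. The data and the bin edges are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up observations: rentals rise with temperature, then fall again.
toy = pd.DataFrame({
    "temp":  [3, 8, 12, 18, 22, 27, 28, 33, 37],
    "count": [20, 40, 90, 150, 210, 260, 280, 230, 180],
})

bins = [0, 10, 20, 25, 30, 35, 40]                # assumed bin edges in Celsius
toy["temp_bin"] = pd.cut(toy["temp"], bins=bins)  # right-closed intervals, e.g. (25, 30]

# Mean rentals per temperature bin, and the bin with the highest mean.
by_bin = toy.groupby("temp_bin", observed=True)["count"].mean()
best_bin = by_bin.idxmax()
print(by_bin)
print("bin with highest mean:", best_bin)
```

The same approach could be applied to `climateDf['temp']`, `climateDf['humidity']` or `climateDf['windspeed']`, with bin edges chosen to suit each variable's range.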


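As humidity increases, rentals decrease;

Rentals first rise and then fall as temperature increases, with the best temperature between 25 and 30 degrees;

Within a certain range (5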