Data Analysis: Descriptive Analysis of the Tips Dataset

Post author: xfxia
Post published: August 30, 2023
Post category: Other

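Project Introduction

In the service industries of many Western countries, customers customarily give their servers a tip. This project studies tip data collected from the restaurant industry.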

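Data Acquisition

The data is the tips example dataset that the third-party Python library seaborn provides through sns.load_dataset (the file is downloaded from seaborn's online data repository on first use, so an internet connection is needed). The dataset has 7 fields: the total bill excluding the tip (total_bill), the tip amount (tip), the customer's sex (sex), the day of the week (day), the meal period (time), the party size (size), and whether the customer smokes (smoker).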
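For the second research question it helps to know which of these seven fields are numeric and which are categorical labels. A minimal sketch, assuming the columns carry the dtypes that tips.info() reports further down (float64/int64 for the numbers, category for the labels); the helper name split_columns and the two-row sample are hypothetical and only make the code runnable without a download:

```python
import pandas as pd

def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Split column names into (numeric, categorical) lists by dtype."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(include="category").columns.tolist()
    return numeric, categorical

# Hypothetical two-row stand-in copied from the first rows of tips.head();
# with seaborn and a network connection, use sns.load_dataset("tips") instead.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "tip": [1.01, 1.66],
    "sex": ["Female", "Male"],
    "smoker": ["No", "No"],
    "day": ["Sun", "Sun"],
    "time": ["Dinner", "Dinner"],
    "size": [2, 3],
})
for col in ("sex", "smoker", "day", "time"):
    sample[col] = sample[col].astype("category")  # mimic the dtypes seaborn assigns

numeric, categorical = split_columns(sample)
print("numeric:", numeric)          # numeric: ['total_bill', 'tip', 'size']
print("categorical:", categorical)  # categorical: ['sex', 'smoker', 'day', 'time']
```

select_dtypes keeps the original column order, so the lists follow the DataFrame's layout.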

# Display the output of every expression in a cell, not just the last one

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # default is 'last_expr'

# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
tips = sns.load_dataset('tips')
tips.head()

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4

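Defining the Questions

The analysis of the tips dataset focuses on two questions:

Is the tip amount correlated with the total bill?

Is the tip amount associated with the day of the week, the meal period, the party size, or whether the customer smokes?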
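Each question maps onto a short pandas expression: question 1 is a Pearson correlation between total_bill and tip (Series.corr), and question 2 compares the mean tip across the values of a grouping column (DataFrame.groupby). A sketch with hypothetical helper names; the five rows are copied from the tips.head() output and serve only to make the code runnable offline. They are far too few to answer either question, so on the real data pass the full tips DataFrame instead:

```python
import pandas as pd

# The five rows printed by tips.head(); a stand-in for the full 244-row DataFrame.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

def tip_bill_correlation(df: pd.DataFrame) -> float:
    """Question 1: Pearson correlation coefficient between total_bill and tip."""
    return df["total_bill"].corr(df["tip"])

def mean_tip_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Question 2: average tip for each value of a grouping column."""
    # observed=True only matters for categorical columns (as in the seaborn data):
    # it limits the result to categories that actually occur.
    return df.groupby(column, observed=True)["tip"].mean()

print(round(tip_bill_correlation(sample), 3))
for col in ("day", "time", "size", "smoker"):
    print(mean_tip_by(sample, col))
```

A correlation near +1 would indicate that larger bills tend to come with larger tips; the group means only describe differences and do not by themselves show that a factor causes them.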

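Data Cleaning and Preparation

tips.info()  # inspect the structure of the data

# Shape (244, 7): 244 records with 7 fields each
# The output shows no missing values, and each column's dtype matches the kind of data it holds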

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
total_bill    244 non-null float64
tip           244 non-null float64
sex           244 non-null category
smoker        244 non-null category
day           244 non-null category
time          244 non-null category
size          244 non-null int64
dtypes: category(4), float64(2), int64(1)
memory usage: 7.2 KB

tips.isna().sum()  # double-check for missing values

# No column has missing values

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64
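The two checks above (shape and dtypes from info(), missing counts from isna().sum()) can be bundled into one function that returns a list of problems, so a re-run on freshly loaded data fails visibly instead of relying on reading printed output. A minimal sketch; the name check_clean is hypothetical, and the expected shape (244, 7) is taken from this section's output and is an assumption about the data you load:

```python
import pandas as pd

def check_clean(df: pd.DataFrame, expected_shape=(244, 7)) -> list[str]:
    """Return a list of data problems; an empty list means every check passed."""
    problems = []
    # 1. Row and column count, as reported by tips.info()
    if df.shape != expected_shape:
        problems.append(f"shape {df.shape} != expected {expected_shape}")
    # 2. Missing values per column, as in tips.isna().sum()
    missing = df.isna().sum()
    for col, n in missing[missing > 0].items():
        problems.append(f"{col}: {n} missing value(s)")
    # 3. The two money columns should be floating point
    for col in ("total_bill", "tip"):
        if col in df and not pd.api.types.is_float_dtype(df[col]):
            problems.append(f"{col} has dtype {df[col].dtype}, expected float")
    return problems

# Illustrative three-row frame (from tips.head()) with one tip deliberately removed
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, None, 3.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "No"],
    "day": ["Sun", "Sun", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 3],
})
print(check_clean(demo, expected_shape=(3, 7)))
print(check_clean(demo.dropna(), expected_shape=(2, 7)))
```

On the real data, check_clean(tips) returning an empty list corresponds to the conclusion drawn above.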

Data Exploration

1. Relationship between the total bill and the tip

# Summary statistics for the tip amount
tips.describe()['tip']

count    244.000000
mean       2.998279
std        1.383638
min        1.000000
25%        2.000000
50%        2.900000
75%        3.562500
max       10.000000
Name: tip, dtype: float64
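The 25%, 50% and 75% rows of describe() are quantiles computed with linear interpolation, the default of Series.quantile. With n sorted values x[0] ≤ … ≤ x[n-1], quantile q sits at position p = q × (n - 1), and the result is x[floor(p)] + (p - floor(p)) × (x[ceil(p)] - x[floor(p)]). For the 244 tips, the 75% position is 0.75 × 243 = 182.25, so 3.5625 lies a quarter of the way from the 183rd to the 184th smallest tip. The sketch below (hypothetical helper linear_quantile) reproduces the rule on the first four tips from tips.head() and compares it with pandas:

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Quantile via linear interpolation between the two nearest sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)              # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

# First four tips from tips.head(); chosen because their quartile positions
# (0.75, 1.5, 2.25) fall between values, so the interpolation is exercised.
tip = pd.Series([1.01, 1.66, 3.50, 3.31])
for q in (0.25, 0.50, 0.75):
    print(f"{q:.0%}: manual={linear_quantile(tip, q):.4f}  pandas={tip.quantile(q):.4f}")
```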

tips['tip'].hist(bins=20, figsize=(8, 6))
plt.xlabel('tip')
plt.ylabel('freq')
plt.title('Distribution of tip amount', pad=12)
plt.show()
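The bins=20 argument splits the range from the smallest to the largest tip into 20 equal-width intervals. With the minimum of 1.0 and maximum of 10.0 from describe(), each bar covers (10.0 - 1.0) / 20 = 0.45, so the first bar counts tips from 1.00 up to (but not including) 1.45. pandas draws the histogram with matplotlib, whose hist bins the data the way numpy.histogram does, so the edges and counts can be inspected without plotting. A sketch on illustrative values only: the five tips from tips.head() plus the dataset's minimum and maximum, so that the bin range matches the full data:

```python
import numpy as np

# Illustrative values: the five tips shown by tips.head() plus the dataset's
# min (1.0) and max (10.0) from describe(), so the bins span the same range.
values = np.array([1.00, 1.01, 1.66, 3.50, 3.31, 3.61, 10.00])

counts, edges = np.histogram(values, bins=20)   # 20 equal-width bins -> 21 edges
width = edges[1] - edges[0]

print("bin width:", round(width, 2))
for left, right, n in zip(edges[:-1], edges[1:], counts):
    if n:  # only list non-empty bins
        print(f"{left:.2f} to {right:.2f}: {n}")
```

Every bin is half-open except the last, which also includes the maximum, so the 10.0 tip is counted in the final bar.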

Original article: https://blog.csdn.net/weixin_45556639/article/details/105469280
