栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 软件开发 > 后端开发 > Python

数据集:大学毕业生收入

Python 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

数据集:大学毕业生收入

数据集 大学毕业生收入

下载地址 本文以绘制直方图为主。

1. 字段描述 字段名称字段类型字段说明Major_code整型专业代码。Major字符型专业名称。Major_category字符型专业所属目录。Total整型总人数。Employed整型就业人数。Employed_full_time_year_round整型全年全职在岗人数。Unemployed整型失业人数。Unemployment_rate浮点型失业率。Median整型收入的中位数。P25th整型收入的25百分位数。P75th浮点型收入的75百分位数。 2. 数据预处理 2.1 导包
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import warnings
warnings.filterwarnings( ignore )
2.2 读取数据
df pd.read_csv( 大学毕业生收入数据集.csv )
3. 数据预览 3.1 预览数据
print(df.head())

结果

Major_code Major ... P25th P75th
0 1100 GENERAL AGRICULTURE ... 34000 80000.0
1 1101 AGRICULTURE PRODUCTION AND MANAGEMENT ... 36000 80000.0
2 1102 AGRICULTURAL EConOMICS ... 40000 98000.0
3 1103 ANIMAL SCIENCES ... 30000 72000.0
4 1104 FOOD SCIENCE ... 38500 90000.0
3.2 查看基本信息
df.info()

结果

RangeIndex: 173 entries, 0 to 172
Data columns (total 11 columns):
 # Column Non-Null Count Dtype 
--- ------ -------------- ----- 
 0 Major_code 173 non-null int64 
 1 Major 173 non-null object 
 2 Major_category 173 non-null object 
 3 Total 173 non-null int64 
 4 Employed 173 non-null int64 
 5 Employed_full_time_year_round 173 non-null int64 
 6 Unemployed 173 non-null int64 
 7 Unemployment_rate 173 non-null float64
 8 Median 173 non-null int64 
 9 P25th 173 non-null int64 
 10 P75th 173 non-null float64
dtypes: float64(2), int64(7), object(2)
3.3 查看重复值
print(df.duplicated().sum())

结果

0
3.4 查看缺失值
print(df.isnull().sum())

结果

Major_code 0
Major 0
Major_category 0
Total 0
Employed 0
Employed_full_time_year_round 0
Unemployed 0
Unemployment_rate 0
Median 0
P25th 0
P75th 0
dtype: int64
4. 数据集描述性信息
describe df.describe()
print(describe)

结果

Major_code Total ... P25th P75th
count 173.000000 1.730000e 02 ... 173.000000 173.000000
mean 3879.815029 2.302566e 05 ... 38697.109827 82506.358382
std 1687.753140 4.220685e 05 ... 9414.524761 20805.330126
min 1100.000000 2.396000e 03 ... 24900.000000 45800.000000
25% 2403.000000 2.428000e 04 ... 32000.000000 70000.000000
50% 3608.000000 7.579100e 04 ... 36000.000000 80000.000000
75% 5503.000000 2.057630e 05 ... 42000.000000 95000.000000
max 6403.000000 3.123510e 06 ... 78000.000000 210000.000000
[8 rows x 9 columns]

可在变量视图中查看describe

5. 数据分析 5.1 各专业种类 Major_category 的专业分支个数
Major_category_counts df[ Major_category ].value_counts()
print(Major_category_counts)
rects plt.bar(range(1,17),Major_category_counts);
for rect in rects: #rects 是三根柱子的集合
 height rect.get_height()
 plt.text(rect.get_x() rect.get_width() / 2, height, str(height), size 12, ha center , va bottom )
interval [ Engineering , Education , Humanities Liberal Arts , Biology Life Science , Business , Health , Computers Mathematics , Agriculture Natural Resources , Physical Sciences , Social Science , Psychology Social Work , Arts , Industrial Arts Consumer Services , Law Public Policy , Communications Journalism , Interdisciplinary ]
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Number of Branches by Major Category )
plt.ylabel( Counts )
plt.show()

结果

Engineering 29
Education 16
Humanities Liberal Arts 15
Biology Life Science 14
Business 13
Health 12
Computers Mathematics 11
Agriculture Natural Resources 10
Physical Sciences 10
Social Science 9
Psychology Social Work 9
Arts 8
Industrial Arts Consumer Services 7
Law Public Policy 5
Communications Journalism 4
Interdisciplinary 1
Name: Major_category, dtype: int64

图示

结论
由于机械类专业发展历史悠久 故相对来说机械类专业分支数相较其他大类专业要多

5.2 各大类专业收入
averageMoney []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Median ][j]
 averageMoney.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageMoney);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Annual salary by Major Category )
plt.ylabel( Moneys )
plt.show()

图示

结论
由于机械类专业与人工智能、自动化等领域相关 故平均工资比较高 计算机与数学类专业发展前景很好 但是小公司工资普遍不高 大公司工资相对来说较高。

5.3 各大类专业失业率
averageUnemployRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Unemployment_rate ][j]
 averageUnemployRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageUnemployRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Unemployment Rate by Major Category )
plt.ylabel( Rate )
plt.show()

图示

结论
艺术类专业由于可变动性特别大 加上对人才的要求相对来说较为苛刻 故失业率较高。

5.4 各大类专业就业率
averageEmployRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Employed ][j] / df[ Total ][j]
 averageEmployRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageEmployRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Employment Rate by Major Category )
plt.ylabel( Rate )
plt.show()

图示

结论
相对来说 由于计算机的发展前景 计算机与数学类的就业率较高。

5.5 各大类专业全年全职在岗率
averageFullTimeRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Employed_full_time_year_round ][j] / df[ Employed ][j]
 averageFullTimeRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageFullTimeRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Full-Time Rate by Major Category )
plt.ylabel( Rate )
plt.show()

图示

5.6 各大类专业总人数
averageNum []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Total ][j]
 averageNum.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageNum);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Total Numbers by Major Category )
plt.ylabel( Counts )
plt.show()

图示

5.7 就业失业比
EUratio []
for i in range(len(interval)):
 EUratio.append(averageEmployRate[i]/averageUnemployRate[i])
plt.bar(range(1,17),EUratio);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Employment-Unemployment Ratio by Major Category )
plt.ylabel( Ratio )
plt.show()

图示

结论
相对来说 农业就业的门槛低 就业率高的同时失业率低。

6. 完整代码
# 导包
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import warnings
warnings.filterwarnings( ignore )
# 读取数据
df pd.read_csv( 大学毕业生收入数据集.csv )
# 预览数据
print(df.head())
# 规范字段名称(本数据集已经较为规范)
# 查看基本信息
df.info()
# 查看重复值
print(df.duplicated().sum())
# 查看缺失值
print(df.isnull().sum())
# 查看数据集描述性信息
describe df.describe()
print(describe)
# 统计表中每个专业种类 Major_category 的个数
Major_category_counts df[ Major_category ].value_counts()
print(Major_category_counts)
rects plt.bar(range(1,17),Major_category_counts);
for rect in rects: #rects 是三根柱子的集合
 height rect.get_height()
 plt.text(rect.get_x() rect.get_width() / 2, height, str(height), size 12, ha center , va bottom )
interval [ Engineering , Education , Humanities Liberal Arts , Biology Life Science , Business , Health , Computers Mathematics , Agriculture Natural Resources , Physical Sciences , Social Science , Psychology Social Work , Arts , Industrial Arts Consumer Services , Law Public Policy , Communications Journalism , Interdisciplinary ]
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Number of Branches by Major Category )
plt.ylabel( Counts )
plt.show()
# 对各大类专业收入作统计并作图
averageMoney []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Median ][j]
 averageMoney.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageMoney);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Annual salary by Major Category )
plt.ylabel( Moneys )
plt.show()
# 对各大类专业失业率作统计并作图
averageUnemployRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Unemployment_rate ][j]
 averageUnemployRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageUnemployRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Unemployment Rate by Major Category )
plt.ylabel( Rate )
plt.show()
# 对各大类专业就业率作统计并作图
averageEmployRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Employed ][j] / df[ Total ][j]
 averageEmployRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageEmployRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Employment Rate by Major Category )
plt.ylabel( Rate )
plt.show()
# 对各大类专业全年全职在岗率作统计并作图 没有早退的 
averageFullTimeRate []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Employed_full_time_year_round ][j] / df[ Employed ][j]
 averageFullTimeRate.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageFullTimeRate);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Full-Time Rate by Major Category )
plt.ylabel( Rate )
plt.show()
# 对各大类专业总人数作统计并作图
averageNum []
for i in range(len(interval)):
 sum 0
 for j in range(173):
 if df[ Major_category ][j] interval[i]:
 sum sum df[ Total ][j]
 averageNum.append(sum/Major_category_counts[i])
plt.bar(range(1,17),averageNum);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Average Total Numbers by Major Category )
plt.ylabel( Counts )
plt.show()
# 对各大类专业就业失业比作统计并作图
EUratio []
for i in range(len(interval)):
 EUratio.append(averageEmployRate[i]/averageUnemployRate[i])
plt.bar(range(1,17),EUratio);
plt.xticks(range(1,17),interval,rotation 90);
plt.title( Employment-Unemployment Ratio by Major Category )
plt.ylabel( Ratio )
plt.show()
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/267313.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号