第9章 汽车销售数分析系统
1 创建数据库cardb
hive>create database if not exists cardb;
hive>use cardb;
2 创建数据表car
hive>create external table cars(province string, month int, city string, country string, year int, cartype string, productor string, brand string, mold string, owner string, nature string, number int, ftype string, outv int, power double, fuel string, length int, width int, height int, xlength int, xwidth int, xheight int, count int, base int, front int, norm string, tnumber int, total int, curb int, hcurb int,
passenger string, zhcurb int, business string, dtype string, fmold string, fbusiness string, name string, age int, sex string )row format delimited fields terminated by ',' location '/cars';
hive> desc cars;
province string month int city string country string year int cartype string productor string brand string mold string owner string nature string number int ftype string outv int power double fuel string length int width int height int xlength int xwidth int xheight int count int base int front int norm string tnumber int total int curb int hcurb int passenger string zhcurb int business string dtype string fmold string fbusiness string name string age int sex string
3 加载数据文件到数据表cars
1)使用XShell的XFtp文件传输工具,将Win7系统D:Hive教学教学课件Hive离线计算-配套视频代码数据资料\汽车销售系统datacar.txt数据文件上传到虚拟机Linux系统的/root目录
2) hive> load data local inpath '/root/car.txt' overwrite into table cars; 加载数据文件
3) hive> select * from cars limit 10; 查询前10条数据
4 统计cars表的数据行数
hive> select count(*) from cars;
OK
70640
说明:cars表总共有70640行数据
hive>select count(*) from cars where nature is not null and nature != '';
OK
70362
说明:car表中nature列值不为null且不为空字符串的数据行共有70362行,70640-70362=278行数据的nature列缺失值,车辆使用性质未知
5 统计车辆的使用性质:商用车、乘用车的数量
hive> select nature,count(*) as cnt from cars group by nature having nature is not null and nature != '';
中小学生校车 119 公交客运 1742 公路客运 1448 出租客运 2 初中生校车 2 小学生校车 111 工程救险 1 幼儿校车 17 救护 1 教练 26 旅游客运 219 消防 7 租赁 24 警用 165 非营运 66478
说明:car表中nature列值不为null且不为空字符串的数据行共有119+1742+1448+2+2+111+1+17+1+26+219+7+24+165+66478=70362行
hive>select '非营运', sum(if(a.nature='非营运',a.cnt,0)), '营运', sum(if(a.nature!='非营运',a.cnt,0))
from (select nature,count(*) as cnt from cars
group by nature having nature is not null and nature != '') a;
OK
非营运 66478 营运 3884
Time taken: 47.985 seconds, Fetched: 1 row(s)
说明:car表的nature列值为“非营运”的数据行有66478行,nature列值为“中小学生校车”、“公交客运”等属于营运性质的数据行有3884行,总共66478+3884=70362行
6 按月统计山西省2013年的汽车销售比例
hive> select sum(number) as total from cars where province='山西省' and year='2013';
OK
70362
说明:山西省2013年按年份统计的汽车销售总量是70362; 执行select * from cars where number !=1; 查询不到任何结果,说明cars表的所有行number列值都是1,因此sum(number)恰好等于总行数70362
hive> select month, sum(number) as ss from cars where province='山西省' and year='2013' group by month;
1 10413 2 4103 3 6548 4 4635 5 5151 6 3903 7 4449 8 4488 9 4889 10 7352 11 7074 12 7357
说明:山西省2013年按月份统计的汽车销售数量
hive> select month, c1.ss/c.total
from
(select month, sum(number) as ss from cars where province='山西省' and year='2013' group by month) c1,
(select sum(number) as total from cars where province='山西省' and year='2013') c;
OK
1 0.14799181376311077 2 0.05831272561894204 3 0.09306159574770473 4 0.06587362496802251 5 0.0732071288479577 6 0.05547028225462608 7 0.06323015263920867 8 0.06378442909525028 9 0.06948352804070379 10 0.1044882180722549 11 0.10053722179585571 12 0.1045592791563628
说明:山西省2013年每个月的汽车销售比例,即每月的销售数量/一年的销售数量



