Pandas Study Notes (5): Data Types and Missing Values

1. Theory

1. Check data types with the dtype attribute (dtypes for every column of a DataFrame)

reviews.price.dtype
dtype('float64')
reviews.dtypes
country        object
description    object
                ...  
variety        object
winery         object
Length: 13, dtype: object
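
Since the wine reviews dataset is not included here, the sketch below uses a small hypothetical DataFrame in its place. It shows that a Series reports one type through .dtype, while a DataFrame reports one type per column through .dtypes. A column that mixes numbers with missing values comes out as float64, because the missing entries are stored as NaN.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine `reviews` DataFrame
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", None],
    "points": [87, 87, 90],
    "price": [None, 15.0, 14.0],
})

# A single column (Series) exposes its type through the .dtype attribute
print(reviews.price.dtype)  # float64

# A DataFrame exposes one dtype per column through .dtypes
print(reviews.dtypes)
```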

2. Convert data types with astype

reviews.points.astype('float64')
0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64
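
Here is a minimal sketch on a hypothetical three-value Series. It shows that astype returns a new, converted Series and does not modify the original in place. To keep the result, assign it back, for example reviews["points"] = reviews.points.astype("float64").

```python
import pandas as pd

# Hypothetical integer scores standing in for reviews.points
points = pd.Series([87, 87, 90], name="points")

# Convert the integer column to floating point
as_float = points.astype("float64")
print(as_float.tolist())  # [87.0, 87.0, 90.0]

# astype returns a new Series; the original keeps its integer dtype
print(points.dtype)  # int64
```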

3. Find entries with missing values using isnull

reviews[pd.isnull(reviews.country)]
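
The line above works in two steps: pd.isnull builds a boolean mask, and indexing the DataFrame with that mask keeps only the rows where the mask is True. The sketch below spells out both steps on hypothetical toy data. It also shows the complement, pd.notnull.

```python
import pandas as pd

# Hypothetical toy data with two missing countries
reviews = pd.DataFrame({
    "country": ["Italy", None, "US", None],
    "points": [87, 88, 90, 85],
})

# Step 1: a boolean Series, True where country is missing
mask = pd.isnull(reviews.country)
print(mask.tolist())  # [False, True, False, True]

# Step 2: boolean indexing keeps only the rows where the mask is True
missing_country = reviews[mask]

# pd.notnull is the complement: rows where country is present
has_country = reviews[pd.notnull(reviews.country)]
```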

4. Fill missing values with fillna

reviews.region_2.fillna("Unknown")
0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: object
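
In the sketch below, fillna replaces NaN with a constant and returns a new Series, leaving the original one untouched. The values are hypothetical. The second part fills a numeric column with its median instead of a constant; this is only one common choice, and it is not taken from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical region_2 values with gaps
region_2 = pd.Series([np.nan, "Central Coast", np.nan], name="region_2")

filled = region_2.fillna("Unknown")
print(filled.tolist())  # ['Unknown', 'Central Coast', 'Unknown']

# The original Series still contains its two NaN values
print(region_2.isnull().sum())  # 2

# Hypothetical numeric column: fill gaps with the median of the known values
price = pd.Series([15.0, np.nan, 65.0])
price_filled = price.fillna(price.median())
print(price_filled.tolist())  # [15.0, 40.0, 65.0]
```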

5. Replace values with replace

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")
0            @kerino
1         @vossroger
             ...    
129969    @vossroger
129970    @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object
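
This is a small sketch on hypothetical handles. Besides the single old-value/new-value form used above, Series.replace also accepts a dict, which lets several substitutions happen in one call. The "@voss" replacement is invented purely for illustration.

```python
import pandas as pd

# Hypothetical handles standing in for reviews.taster_twitter_handle
handles = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"],
                    name="taster_twitter_handle")

# Replace one exact value with another
renamed = handles.replace("@kerinokeefe", "@kerino")
print(renamed.tolist())  # ['@kerino', '@vossroger', '@kerino']

# A dict maps several old values to new ones in a single call
renamed_many = handles.replace({"@kerinokeefe": "@kerino", "@vossroger": "@voss"})
print(renamed_many.tolist())  # ['@kerino', '@voss', '@kerino']
```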

2. Practice

1. What is the data type of the points column in the dataset?

dtype = reviews.points.dtype

2. Create a Series from entries in the points column, but convert the entries to strings. Hint: strings are str in native Python.

point_strings = reviews.points.astype('str')
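
Here is a quick check on hypothetical values that the conversion really yields Python strings. Passing the builtin str and the string 'str' request the same conversion. The resulting dtype is reported as object in pandas 1.x and 2.x, and newer releases may report a dedicated string dtype instead. For that reason, the check below looks at the values rather than the dtype.

```python
import pandas as pd

# Hypothetical scores standing in for reviews.points
points = pd.Series([87, 88, 90], name="points")

# The builtin str works the same as the string 'str' here
point_strings = points.astype(str)
print(point_strings.tolist())  # ['87', '88', '90']

# Every element is now a Python string
print(all(isinstance(v, str) for v in point_strings))  # True
```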

3. Sometimes the price column is null. How many reviews in the dataset are missing a price?

n_missing_prices = pd.isnull(reviews.price).sum()
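
The answer works because True counts as 1 and False as 0 when a boolean Series is summed, so the sum of the mask equals the number of missing values. The sketch below uses hypothetical data. It also applies the method form isnull().sum() to a whole DataFrame, which gives a per-column count of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns
reviews = pd.DataFrame({
    "price": [15.0, np.nan, 14.0, np.nan],
    "region_1": ["Etna", None, None, "Napa Valley"],
})

# Summing the boolean mask counts the True (missing) entries
n_missing_prices = pd.isnull(reviews.price).sum()
print(n_missing_prices)  # 2

# Method form on the whole DataFrame: one missing-value count per column
missing_per_column = reviews.isnull().sum()
print(missing_per_column)
```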

4. What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the region_1 field. This field is often missing data, so replace missing values with Unknown. Sort in descending order. Your output should look something like this:

Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64

reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
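
Below is a minimal sketch of the same chain on a hypothetical region_1 Series. Filling with exactly 'Unknown' matters, because the expected output groups the missing entries under that label. By default, value_counts already sorts counts in descending order, so the explicit sort_values(ascending=False) is optional. Also, pandas 2.x names the resulting Series "count" rather than "region_1".

```python
import pandas as pd

# Hypothetical region_1 values with three gaps
region_1 = pd.Series(["Napa Valley", None, "Napa Valley", None, None, "Etna"],
                     name="region_1")

# Fill gaps first so they are counted under one label, then count occurrences
reviews_per_region = region_1.fillna("Unknown").value_counts()
print(reviews_per_region.index.tolist())  # ['Unknown', 'Napa Valley', 'Etna']
print(reviews_per_region.tolist())        # [3, 2, 1]
```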