
Data Processing with the Pandas Library


# Import the pandas library under the conventional alias "pd"
import pandas as pd

A preview of the "homelessness" dataset used throughout:

region              state     individuals  family_members  state_pop
East South Central  Alabama        2570.0           864.0    4887681
Pacific             Alaska         1434.0           582.0     735139
Mountain            Arizona        7259.0          2606.0    7158024
West South Central  Arkansas       2280.0           432.0    3009733
...                 ...               ...             ...        ...
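
The post does not show how `homelessness` is loaded. As a minimal sketch for following along, the five rows visible in this post can be typed in by hand. This is only a stand-in: the full dataset has 51 rows, and the values below are copied from the preview and the `.head()` output shown in this post, not read from the original data file.

```python
import pandas as pd

# Hand-built stand-in for the first five rows shown in this post.
# The real dataset has 51 rows; its source file is not given here.
homelessness = pd.DataFrame(
    {
        "region": ["East South Central", "Pacific", "Mountain",
                   "West South Central", "Pacific"],
        "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        "individuals": [2570.0, 1434.0, 7259.0, 2280.0, 109008.0],
        "family_members": [864.0, 582.0, 2606.0, 432.0, 20964.0],
        "state_pop": [4887681, 735139, 7158024, 3009733, 39461588],
    }
)

print(homelessness)
```

With this stand-in, the calls in the rest of the post run unchanged, although the printed numbers will differ from the 51-row outputs.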

 

.head() / .info() / .shape / .describe() 

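# Print the first five rows of the DataFrame
print(homelessness.head())

# Show column names, non-null counts, and dtypes.
# .info() prints its report itself and returns None, so print() adds a final "None" line.
print(homelessness.info())

# Print the DataFrame's dimensions as (rows, columns)
print(homelessness.shape)

# Print summary statistics for the numeric columns, e.g. mean, standard deviation, quartiles
print(homelessness.describe())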

Output:
                   region       state  individuals  family_members  state_pop
    0  East South Central     Alabama       2570.0           864.0    4887681
    1             Pacific      Alaska       1434.0           582.0     735139
    2            Mountain     Arizona       7259.0          2606.0    7158024
    3  West South Central    Arkansas       2280.0           432.0    3009733
    4             Pacific  California     109008.0         20964.0   39461588
    
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 51 entries, 0 to 50
    Data columns (total 5 columns):
     #   Column          Non-Null Count  Dtype  
    ---  ------          --------------  -----  
     0   region          51 non-null     object 
     1   state           51 non-null     object 
     2   individuals     51 non-null     float64
     3   family_members  51 non-null     float64
     4   state_pop       51 non-null     int64  
    dtypes: float64(2), int64(1), object(2)
    memory usage: 2.4+ KB
    None


    (51, 5)


           individuals  family_members  state_pop
    count       51.000          51.000  5.100e+01
    mean      7225.784        3504.882  6.406e+06
    std      15991.025        7805.412  7.327e+06
    min        434.000          75.000  5.776e+05
    25%       1446.500         592.000  1.777e+06
    50%       3082.000        1482.000  4.461e+06
    75%       6781.500        3196.000  7.341e+06
    max     109008.000       52070.000  3.946e+07
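
Two details in the output above are easy to miss: `.info()` returns `None` (hence the stray `None` line when it is wrapped in `print()`), and `.describe()` skips non-numeric columns. The sketch below checks both on a small hand-made frame; the name `sample` is hypothetical and the values are copied from rows shown in this post.

```python
import pandas as pd

# Small stand-in frame; values copied from rows shown in this post.
sample = pd.DataFrame(
    {
        "state": ["Alabama", "Alaska", "Arizona"],
        "individuals": [2570.0, 1434.0, 7259.0],
        "state_pop": [4887681, 735139, 7158024],
    }
)

# .info() writes its report to stdout and returns None.
result = sample.info()

# .shape is an attribute (no parentheses) holding (rows, columns).
n_rows, n_cols = sample.shape

# .describe() summarises numeric columns only, so "state" is left out.
summary = sample.describe()
print(summary.loc[["mean", "std"]])
```

Calling `sample.info()` on its own line, without `print()`, avoids the extra `None`.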

.values / .columns / .index

# The underlying data as a 2-D NumPy array
print(homelessness.values)

# The column labels
print(homelessness.columns)

# The row labels (the index)
print(homelessness.index)

Output:
    [['East South Central' 'Alabama' 2570.0 864.0 4887681]
     ['Pacific' 'Alaska' 1434.0 582.0 735139]
     :                         :                           :
     ['South Atlantic' 'West Virginia' 1021.0 222.0 1804291]
     ['East North Central' 'Wisconsin' 2740.0 2167.0 5807406]
     ['Mountain' 'Wyoming' 434.0 205.0 577601]]


    Index(['region', 'state', 'individuals', 'family_members', 'state_pop'], dtype='object')


    Int64Index([0, 1, ... 48, 49, 50], dtype='int64')
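
As a side note, `.values` still works, but the pandas documentation recommends `.to_numpy()` for new code. A minimal sketch on a hypothetical two-row frame (`sample` is not part of the original post) shows the three pieces a DataFrame is made of:

```python
import numpy as np
import pandas as pd

# Two rows copied from the dataset preview in this post.
sample = pd.DataFrame(
    {"state": ["Alabama", "Alaska"], "individuals": [2570.0, 1434.0]}
)

arr = sample.to_numpy()      # the data, as a NumPy array
cols = list(sample.columns)  # the column labels
idx = list(sample.index)     # the row labels

print(type(arr), arr.shape)
print(cols)
print(idx)
```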

.sort_values

# Sort by the "individuals" column in ascending order
one_column_increase = homelessness.sort_values('individuals')

# Sort by the "individuals" column in descending order
one_column_decrease = homelessness.sort_values('individuals', ascending=False)

# Sort by "region" in ascending (alphabetical) order, then by "family_members" in descending order within each region
multi_column_incre_decre = homelessness.sort_values(["region", "family_members"], ascending=[True, False])

Checking each result with .head():

 output:
    one_column_increase.head() is  
                    region         state  individuals  family_members  state_pop
    50            Mountain       Wyoming        434.0           205.0     577601
    34  West North Central  North Dakota        467.0            75.0     758080
    7       South Atlantic      Delaware        708.0           374.0     965479
    39         New England  Rhode Island        747.0           354.0    1058287
    45         New England       Vermont        780.0           511.0     624358


    one_column_decrease.head() is
                   region       state  individuals  family_members  state_pop
    4              Pacific  California     109008.0         20964.0   39461588
    32        Mid-Atlantic    New York      39827.0         52070.0   19530351
    9       South Atlantic     Florida      21443.0          9587.0   21244317
    43  West South Central       Texas      19199.0          6111.0   28628666
    47             Pacific  Washington      16424.0          5880.0    7523869


    multi_column_incre_decre.head() is
                    region      state  individuals  family_members  state_pop
    13  East North Central   Illinois       6752.0          3891.0   12723071
    35  East North Central       Ohio       6929.0          3320.0   11676341
    22  East North Central   Michigan       5209.0          3142.0    9984072
    49  East North Central  Wisconsin       2740.0          2167.0    5807406
    14  East North Central    Indiana       3776.0          1482.0    6695497
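
The outputs above keep each row's original index label, which is why the index looks shuffled after sorting. The sketch below reproduces the multi-column sort on a hypothetical four-row frame (`sample`, with values copied from this post) and shows one way to renumber the rows afterwards with `reset_index(drop=True)`.

```python
import pandas as pd

# Four rows copied from the data shown in this post.
sample = pd.DataFrame(
    {
        "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
        "state": ["Alaska", "Arizona", "California", "Wyoming"],
        "family_members": [582.0, 2606.0, 20964.0, 205.0],
    }
)

# Regions A-Z first; within each region, largest family_members first.
by_region_then_family = sample.sort_values(
    ["region", "family_members"], ascending=[True, False]
)
print(by_region_then_family)

# sort_values keeps the old row labels; drop them to renumber from 0.
renumbered = by_region_then_family.reset_index(drop=True)
print(renumbered)
```

Note that `sort_values` returns a new DataFrame and leaves `sample` itself in its original order.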

 

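Common ways to select a subset

        Selecting columns by name:

# Select the single column "individuals" (returns a Series)
one_col = homelessness["individuals"]

# Select the two columns "state" and "family_members" (returns a DataFrame)
two_cols = homelessness[["state","family_members"]]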

        Selecting rows by matching values:

# Find the rows whose "state" is one of several given names
fix_state = ["California", "Arizona", "Nevada", "Utah"] # the names to match
name_find_rows = homelessness[homelessness["state"].isin(fix_state)] # filter with .isin()

        Selecting rows with Boolean conditions:

# Find the rows where "family_members" is below 1000 and "region" is "Pacific"
logi_find_rows = homelessness[(homelessness['family_members']<1000)&(homelessness['region']=='Pacific')]
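
To make the row filters concrete, here is a minimal sketch on a hypothetical five-row frame (`sample`, with values copied from this post). It also shows why each condition is wrapped in parentheses, and how `.loc` can filter rows and pick columns in one step.

```python
import pandas as pd

# Five rows copied from the .head() output in this post.
sample = pd.DataFrame(
    {
        "region": ["East South Central", "Pacific", "Mountain",
                   "West South Central", "Pacific"],
        "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        "family_members": [864.0, 582.0, 2606.0, 432.0, 20964.0],
    }
)

# .isin() builds a True/False Series; names missing from the data
# (Nevada and Utah here) simply match nothing.
wanted = ["California", "Arizona", "Nevada", "Utah"]
by_name = sample[sample["state"].isin(wanted)]

# & binds more tightly than < and ==, so each condition needs
# its own parentheses.
mask = (sample["family_members"] < 1000) & (sample["region"] == "Pacific")
small_pacific = sample[mask]

# .loc takes the row mask and a list of columns together.
small_pacific_cols = sample.loc[mask, ["state", "family_members"]]

print(by_name)
print(small_pacific_cols)
```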

 Creating a new column

# Create "indiv_per_10k": the number of homeless individuals per 10,000 state residents
homelessness["indiv_per_10k"] = 10000 * homelessness["individuals"] / homelessness["state_pop"]
 output:
                  region       state  individuals  family_members  state_pop  indiv_per_10k
    0  East South Central     Alabama       2570.0          864.0    4887681          5.258
    1             Pacific      Alaska       1434.0          582.0     735139         19.507
    2            Mountain     Arizona       7259.0         2606.0    7158024         10.141
    3  West South Central    Arkansas       2280.0          432.0    3009733          7.575
    4             Pacific  California     109008.0        20964.0   39461588         27.624
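
Assigning to `homelessness["indiv_per_10k"]` modifies the DataFrame in place. As an alternative sketch, `DataFrame.assign` returns a new frame and leaves the original untouched. The frame `sample` below is hypothetical, with values copied from this post; the arithmetic is the same as above.

```python
import pandas as pd

# Three rows copied from the data shown in this post.
sample = pd.DataFrame(
    {
        "state": ["Alabama", "Alaska", "Arizona"],
        "individuals": [2570.0, 1434.0, 7259.0],
        "state_pop": [4887681, 735139, 7158024],
    }
)

# Element-wise arithmetic on whole columns: no explicit loop is needed.
with_rate = sample.assign(
    indiv_per_10k=10000 * sample["individuals"] / sample["state_pop"]
)

# Round only for display; with_rate keeps full precision.
print(with_rate.round(3))
```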

 
