
10 minutes to pandas


Related: 10 minutes to pandas (English original) | pandas10minnutes bilingual notes 01 | 02 | 03 | [04, to be updated]

This part covers the following sections:
4.Missing data
5.Operations
6.Merge

4.Missing data

pandas primarily uses the value np.nan to represent missing data. By default, it is not included in computations. See the Missing Data section.
Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data:
import numpy as np
import pandas as pd
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1
df
                   A         B         C         D    F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0
2013-01-05 -0.420209  2.178601  2.552643  0.733452  4.0
2013-01-06  0.450958  1.065650  0.171798  0.701391  5.0
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
df1
                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  NaN  NaN
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  NaN
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  NaN
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  NaN

To drop any rows that have missing data:

df1.dropna(how="any")
Empty DataFrame
Columns: [A, B, C, D, F, E]
Index: []
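
Every row of df1 has NaN in column E, which is why how="any" leaves no rows here. The following is a minimal sketch, using a small hypothetical frame (`demo` is an illustration, not the df1 from the text), of how the `how` and `subset` arguments change which rows are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only; not the df1 from the text.
demo = pd.DataFrame(
    {"A": [1.0, 2.0, np.nan], "E": [np.nan, np.nan, np.nan]}
)

drop_any = demo.dropna(how="any")      # drops rows with at least one NaN
drop_all = demo.dropna(how="all")      # drops rows where every value is NaN
drop_on_a = demo.dropna(subset=["A"])  # only column A decides

print(len(drop_any), len(drop_all), len(drop_on_a))
```

Restricting the check with `subset` is often the more useful choice when one column is entirely missing, as E is in df1.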

Filling missing data:

df1.fillna(value=5)
                   A         B         C         D    F    E
2013-01-01  0.184624 -1.042814  0.444349 -0.259771  5.0  5.0
2013-01-02 -0.744011 -0.390294 -0.133267  0.952179  1.0  5.0
2013-01-03  1.003910  0.718454 -0.082483  2.182944  2.0  5.0
2013-01-04 -2.222158 -0.509435 -0.367156  0.852158  3.0  5.0
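
A single scalar is not the only way to fill gaps. As a rough sketch on a hypothetical frame (`demo` and its fill values are assumptions chosen for illustration), fillna also accepts a per-column mapping, and ffill carries the last valid value forward:

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only.
demo = pd.DataFrame({"F": [np.nan, 1.0, np.nan], "E": [np.nan, np.nan, 2.0]})

# A different fill value for each column.
per_column = demo.fillna({"F": 0.0, "E": -1.0})

# Carry the last valid observation forward; leading NaN values stay NaN.
carried = demo.ffill()

print(per_column)
print(carried)
```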

To get the boolean mask where values are NaN:

pd.isna(df1)
                A      B      C      D      F      E
2013-01-01  False  False  False  False   True   True
2013-01-02  False  False  False  False  False   True
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True

5.Operations

See the Basic section on Binary Ops.

5.1Stats

Operations in general exclude missing data.
Performing a descriptive statistic:

df.mean()
A   -0.291148
B    0.336694
C    0.430981
D    0.860392
F    3.000000
dtype: float64

Same operation on the other axis:

df.mean(axis=1)
2013-01-01   -0.168403
2013-01-02    0.136921
2013-01-03    1.164565
2013-01-04    0.150682
2013-01-05    1.808897
2013-01-06    1.477959
Freq: D, dtype: float64
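
The statement that operations exclude missing data can be checked on a tiny Series. This is a minimal sketch with made-up values (`values` is hypothetical); `skipna` is the parameter that controls the behavior:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

mean_default = values.mean()             # NaN is skipped: (1 + 3) / 2
mean_strict = values.mean(skipna=False)  # NaN propagates into the result
n_valid = values.count()                 # counts non-missing entries only

print(mean_default, mean_strict, n_valid)
```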

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension:

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
s
2013-01-01    NaN
2013-01-02    NaN
2013-01-03    1.0
2013-01-04    3.0
2013-01-05    5.0
2013-01-06    NaN
Freq: D, dtype: float64
df.sub(s, axis="index")
                   A         B         C         D    F
2013-01-01       NaN       NaN       NaN       NaN  NaN
2013-01-02       NaN       NaN       NaN       NaN  NaN
2013-01-03  0.003910 -0.281546 -1.082483  1.182944  1.0
2013-01-04 -5.222158 -3.509435 -3.367156 -2.147842  0.0
2013-01-05 -5.420209 -2.821399 -2.447357 -4.266548 -1.0
2013-01-06       NaN       NaN       NaN       NaN  NaN
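
To see alignment and broadcasting separately from the random data, here is a small sketch on hypothetical values (`frame` and `offsets` are illustrative names). The Series is matched to rows by index label, then subtracted from every column; rows without a matching label become NaN:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("20130101", periods=3)
frame = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [1.0, 2.0, 3.0]}, index=idx)

# The Series covers only the last two dates, so the first row has no partner.
offsets = pd.Series([1.0, 2.0], index=idx[1:])

# axis="index" aligns the Series with the row labels and broadcasts over columns.
result = frame.sub(offsets, axis="index")
print(result)
```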
5.2Apply

Applying functions to the data:

df.apply(np.cumsum)
                   A         B         C         D     F
2013-01-01  0.184624 -1.042814  0.444349 -0.259771   NaN
2013-01-02 -0.559387 -1.433107  0.311082  0.692408   1.0
2013-01-03  0.444523 -0.714653  0.228599  2.875352   3.0
2013-01-04 -1.777635 -1.224088 -0.138557  3.727510   6.0
2013-01-05 -2.197844  0.954513  2.414086  4.460962  10.0
2013-01-06 -1.746887  2.020164  2.585884  5.162353  15.0
df.apply(lambda x: x.max() - x.min())
A    3.226068
B    3.221415
C    2.919799
D    2.442716
F    4.000000
dtype: float64
df.apply(lambda x: x.max() - x.min(),axis=1)
2013-01-01    1.487163
2013-01-02    1.744011
2013-01-03    2.265428
2013-01-04    5.222158
2013-01-05    4.420209
2013-01-06    4.828202
Freq: D, dtype: float64
5.3Histogramming

See more at Histogramming and Discretization.

s = pd.Series(np.random.randint(0, 7, size=10))
s
0    5
1    2
2    6
3    6
4    4
5    1
6    2
7    3
8    1
9    2
dtype: int64
s.value_counts()
2    3
6    2
1    2
5    1
4    1
3    1
dtype: int64
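
As a possible extension of the count above, the sketch below reuses the ten values shown in the text and adds relative frequencies and a simple discretization step. The bin edges and the labels "low" and "high" are assumptions made for this illustration:

```python
import pandas as pd

# The same ten values as in the output above.
s = pd.Series([5, 2, 6, 6, 4, 1, 2, 3, 1, 2])

# Relative frequencies instead of raw counts.
freq = s.value_counts(normalize=True)

# Discretize into two hypothetical bins, (0, 3] and (3, 6], then count per bin.
binned = pd.cut(s, bins=[0, 3, 6], labels=["low", "high"])
bin_counts = binned.value_counts()

print(freq)
print(bin_counts)
```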
5.4String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them). See more at Vectorized String Methods.

s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
s.str.lower()
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
type(s)
pandas.core.series.Series
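
The remark about regular expressions matters as soon as a pattern contains a special character. A short sketch with hypothetical strings, comparing the default regex match of str.contains with a literal match (`na=False` is passed so missing entries are reported as False):

```python
import numpy as np
import pandas as pd

# Hypothetical strings chosen so that "." behaves differently in each mode.
s = pd.Series(["A", "aXb", "a.b", np.nan, "dog"])

# Default: the pattern is a regular expression, so "." matches any character.
regex_hits = s.str.contains("a.b", na=False)

# regex=False: the pattern is matched literally.
literal_hits = s.str.contains("a.b", regex=False, na=False)

print(regex_hits.tolist())
print(literal_hits.tolist())
```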

6.Merge

6.1Concat

pandas provides various facilities for easily combining Series and DataFrame objects, with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations.
See the Merging section.
Concatenating pandas objects together with concat():

df = pd.DataFrame(np.random.randn(10, 4))
df
          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218
pieces = [df[:3], df[3:7], df[7:]]
pieces
[          0         1         2         3
 0  0.488970  1.237504 -1.640805 -0.672117
 1  0.390873  0.906830  0.260662  0.119989
 2 -0.854710 -0.535410  1.641878  0.321487,
           0         1         2         3
 3 -0.134780  0.555554  1.024371 -0.103164
 4 -1.241929 -0.116488 -0.922242 -2.066726
 5 -0.432397  2.018692 -0.536801  0.074576
 6  1.452204 -0.587196  0.918798  1.192130,
           0         1         2         3
 7  0.819954  0.224358 -0.022698 -0.745293
 8  0.266344 -0.321944  1.251543  0.603333
 9 -0.491671  0.278449  0.194751  1.056218]
pieces[0]
          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
pd.concat(pieces)
          0         1         2         3
0  0.488970  1.237504 -1.640805 -0.672117
1  0.390873  0.906830  0.260662  0.119989
2 -0.854710 -0.535410  1.641878  0.321487
3 -0.134780  0.555554  1.024371 -0.103164
4 -1.241929 -0.116488 -0.922242 -2.066726
5 -0.432397  2.018692 -0.536801  0.074576
6  1.452204 -0.587196  0.918798  1.192130
7  0.819954  0.224358 -0.022698 -0.745293
8  0.266344 -0.321944  1.251543  0.603333
9 -0.491671  0.278449  0.194751  1.056218
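
The pieces above come from one frame, so their index labels never collide. When the inputs were built separately, the labels usually repeat. A minimal sketch with two hypothetical frames (`first` and `second`) showing the default behavior and two common alternatives:

```python
import pandas as pd

# Hypothetical inputs that both use the default index 0, 1.
first = pd.DataFrame({"x": [1, 2]})
second = pd.DataFrame({"x": [3, 4]})

stacked = pd.concat([first, second])                        # keeps the original labels
renumbered = pd.concat([first, second], ignore_index=True)  # fresh labels 0..3
labelled = pd.concat([first, second], keys=["a", "b"])      # outer level names the source

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(labelled.loc["b"]["x"].tolist())
```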

Note:
Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy, and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.
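
A sketch of that recommendation, assuming the rows arrive one at a time (the `records` list and its fields are hypothetical): collect plain dicts in a list and call the constructor once at the end.

```python
import pandas as pd

# Hypothetical records; in practice these might come from a loop or a parser.
records = []
for i in range(3):
    records.append({"id": i, "square": i * i})

# One constructor call instead of growing a DataFrame row by row.
df_records = pd.DataFrame(records)
print(df_records)
```

Appending to a Python list is cheap, whereas each row added to an existing DataFrame copies the data.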

6.2Join

SQL-style merges. See the Database style joining section.

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})

left
   key  lval
0  foo     1
1  foo     2
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
right
   key  rval
0  foo     4
1  foo     5
pd.merge(left, right, on="key")
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5
pd.merge(left, right)
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5
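
The four rows come from the duplicated key: each of the two "foo" rows on the left pairs with each of the two on the right. If such duplication would be a data error in your case, one option (shown here as a sketch, not as part of the original walkthrough) is the `validate` argument, which raises `pd.errors.MergeError` when the keys are not unique:

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

# Many-to-many merge: 2 x 2 combinations of the duplicated key.
merged = pd.merge(left, right, on="key")
print(len(merged))

# Ask pandas to check that keys are unique on both sides.
caught = False
try:
    pd.merge(left, right, on="key", validate="one_to_one")
except pd.errors.MergeError:
    caught = True
print(caught)
```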

Another example, this time with unique keys:

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
pd.merge(left, right, on="key")
   key  lval  rval
0  foo     1     4
1  bar     2     5
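
Both examples use the default inner join. As an illustrative sketch with hypothetical keys that only partly overlap ("baz" and "qux" are made up), the `how` argument selects the join type and `indicator=True` adds a `_merge` column recording where each row came from:

```python
import pandas as pd

# Hypothetical frames whose keys only partly overlap.
left = pd.DataFrame({"key": ["foo", "bar", "baz"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["foo", "bar", "qux"], "rval": [4, 5, 6]})

# Inner join (the default): only keys present in both frames.
inner = pd.merge(left, right, on="key")

# Outer join: all keys, with NaN where one side has no match.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(outer)
```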