Data Science Libraries -- Day 3

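- Basic array operations in NumPy
- Reading files with NumPy
- Indexing and slicing in NumPy
- Modifying values in NumPy arrays
- nan and common statistical methods in NumPy
- Filling nan values
- Concatenating arrays
- Other NumPy methods
- Generating random numbers with NumPy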

Basic array operations in NumPy

Below are some basic ways to use NumPy for array processing:

import numpy as np
import random

t1 = np.array([1, 2, 3])
print(t1)
print(type(t1))

t2 = np.arange(4, 10, 3)
print(t2)
print(type(t2))
print(t2.dtype)

# Data types in NumPy: "i1" is shorthand for int8
t3 = np.array(range(1, 4), dtype="i1")
print(t3)
print(t3.dtype)

# Boolean dtype in NumPy: nonzero values become True, 0 becomes False
t4 = np.array([1, 1, 0, 1, 0, 0], dtype=bool)
print(t4)
print(t4.dtype)

# Changing the data type (astype returns a new array)

t5 = t4.astype("int8")
print(t5)
print(t5.dtype)

# Floating-point numbers in NumPy (default dtype float64)
t6 = np.array([random.random() for i in range(10)])
print(t6)
print(t6.dtype)

# Round to 2 decimal places
t7 = np.round(t6, 2)
print(t7)

# The shape attribute and the reshape method

t8 = np.array([[[1, 2, 3], [4, 5, 6]], [[10, 11, 12], [13, 14, 15]]])
print(t8.shape)

t9 = np.arange(12)  # reshape returns a reshaped array and leaves t9 itself unchanged
print(t9.reshape((3, 4)))
print(t9)
t9 = t9.reshape((3, 4))
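# Two ways to turn t9 back into a 1-D array
print(t9.reshape(12))  # reshape to 12 elements in a single dimension
print(t9.flatten())    # flatten always returns a 1-D copy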

# Arithmetic on NumPy arrays is element-wise and follows broadcasting rules
t10 = np.array(range(5))
print(t10 / 5)
print(t10 + 2)
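Broadcasting is not limited to scalars. The sketch below illustrates the rule with made-up arrays: shapes are compared from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The array names are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)    # shape (3, 4)
row = np.array([10, 20, 30, 40])   # shape (4,): stretched across the 3 rows
col = np.array([[1], [2], [3]])    # shape (3, 1): stretched across the 4 columns

print(a + row)   # each row of a gets [10, 20, 30, 40] added
print(a * col)   # row i of a is multiplied by col[i]

# Shapes (3, 4) and (3,) are incompatible: trailing dimensions 4 and 3 differ
try:
    a + np.array([1, 2, 3])
    error_raised = False
except ValueError as err:
    error_raised = True
    print("broadcast error:", err)
```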

Transposing NumPy arrays

import numpy as np


t1 = np.arange(24).reshape((4, 6))
# Three ways to transpose a NumPy array
print("Original array:\n", t1)
print("Method 1:\n", t1.transpose())
print("-" * 100)
print("Method 2:\n", t1.T)
print("-" * 100)
print("Method 3:\n", t1.swapaxes(1, 0))

Output:

Original array:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
Method 1:
 [[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
----------------------------------------------------------------------------------------------------
Method 2:
 [[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
----------------------------------------------------------------------------------------------------
Method 3:
 [[ 0  6 12 18]
 [ 1  7 13 19]
 [ 2  8 14 20]
 [ 3  9 15 21]
 [ 4 10 16 22]
 [ 5 11 17 23]]
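For a 2-D array the three methods agree, but they differ on higher-dimensional arrays. Here is a small sketch with an arbitrary 3-D array: `.T` reverses all axes, `transpose` accepts an explicit axis order, and `swapaxes` swaps exactly two axes. It also shows that the transpose is a view and shares memory with the original.

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)
print(t.T.shape)                   # .T reverses all axes
print(t.transpose(1, 0, 2).shape)  # explicit axis order
print(t.swapaxes(0, 1).shape)      # swaps only axes 0 and 1

m = np.arange(6).reshape(2, 3)
mt = m.T          # a view, not a copy
mt[0, 1] = 100    # writes through to m[1, 0]
print(m)
```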


Reading files with NumPy

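Method: np.loadtxt. This is its signature in the NumPy version used here, taken from the NumPy source without the internal @set_module decorator. Defaults may differ in other versions:

np.loadtxt(fname, dtype=float, comments='#', delimiter=None,
           converters=None, skiprows=0, usecols=None, unpack=False,
           ndmin=0, encoding='bytes', max_rows=None)

The most commonly used parameters are:
fname: the file, file name, or path to read
dtype: the data type of the result (float by default)
delimiter: the string that separates values, e.g. "," for CSV. None splits on whitespace
skiprows: the number of lines to skip at the start, e.g. 1 to skip a header row
usecols: which columns to read, e.g. (0, 2)
unpack: if True, the result is transposed so each column can be unpacked into its own variable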
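The sketch below shows `np.loadtxt` on a tiny CSV. The data is hypothetical, and `io.StringIO` stands in for a real file path so the example runs without a file on disk. It skips the header with `skiprows`, selects columns with `usecols`, and splits the columns into separate variables with `unpack`.

```python
import io
import numpy as np

# Hypothetical CSV content with a header row
csv_text = """id,views,likes
1,100,10
2,250,30
3,80,5
"""

# Read the whole table as integers, skipping the header line
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)
print(data)

# Read only columns 1 and 2; unpack=True returns one array per column
views, likes = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1,
                          usecols=(1, 2), dtype=int, unpack=True)
print(views, likes)
```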

Indexing and slicing in NumPy

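import numpy as np

t1 = np.arange(24).reshape(4, 6)
print("Array t1:\n", t1)
# Select a single element: row 0, column 1
print("----------")
t0 = t1[0, 1]
print("Single element:\n", t0)
# Select rows
print("----------")
t2 = t1[2, :]
print("Single row:\n", t2)
print("----------")
t3 = t1[[0, 2]]  # rows 0 and 2
print("Multiple rows (1):\n", t3)
print("----------")
t4 = t1[[0, 2], 2:]  # rows 0 and 2, columns 2 onward
print("Multiple rows (2):\n", t4)
print("----------")
# Select a column: the list [3] keeps the result 2-D (t1[:, 3] would be 1-D)
t5 = t1[:, [3]]
print("Column:\n", t5)

# Select multiple rows and columns: rows 0-1, columns 2-3 (slice ends are excluded)
t6 = t1[0:2, 2:4]
print("Multiple rows and columns:\n", t6)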

Output:

Array t1:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
----------
Single element:
 1
----------
Single row:
 [12 13 14 15 16 17]
----------
Multiple rows (1):
 [[ 0  1  2  3  4  5]
 [12 13 14 15 16 17]]
----------
Multiple rows (2):
 [[ 2  3  4  5]
 [14 15 16 17]]
----------
Column:
 [[ 3]
 [ 9]
 [15]
 [21]]
Multiple rows and columns:
 [[2 3]
 [8 9]]
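Two more indexing forms are easy to confuse with the ones above, so here is a minimal sketch on the same `t1`. Paired index lists select individual points rather than a sub-block. A boolean mask selects every element where a condition holds and returns a 1-D result.

```python
import numpy as np

t1 = np.arange(24).reshape(4, 6)
print(t1[:, 3].shape)     # an integer index drops the axis
print(t1[:, [3]].shape)   # a list index keeps it

# Paired lists pick the points (0, 1) and (2, 3), not a 2x2 block
points = t1[[0, 2], [1, 3]]
print(points)

# A boolean mask selects every element where the condition is True
print(t1[t1 > 20])
```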


Modifying values in NumPy arrays

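Notes:
1. where() is NumPy's ternary operation: np.where(condition, x, y) takes x where condition is True and y everywhere else.
2. clip(a_min, a_max) replaces every value smaller than the first argument with that argument, and every value larger than the second argument with the second argument.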

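import numpy as np

t1 = np.arange(24).reshape(4, 6)
print("Array t1:\n", t1)
print('---------')
# Boolean indexing modifies t1 in place. It is commented out here so that
# the examples below still operate on the original t1
# t1[t1<16] = 1
# print(t1)
print('---------')
# Values below 10 become 0, all others become 20
t2 = np.where(t1 < 10, 0, 20)
print(t2)
print('---------')
# Values below 10 are raised to 10, values above 18 are lowered to 18
t3 = t1.clip(10, 18)
print(t3)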

Output:

Array t1:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
---------
---------
[[ 0  0  0  0  0  0]
 [ 0  0  0  0 20 20]
 [20 20 20 20 20 20]
 [20 20 20 20 20 20]]
---------
[[10 10 10 10 10 10]
 [10 10 10 10 10 11]
 [12 13 14 15 16 17]
 [18 18 18 18 18 18]]
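The boolean-index assignment that is commented out above can be tried safely on a copy, as in the sketch below. It also shows a form of `np.where` that the text does not cover: with only a condition, `np.where` returns the row and column indices where the condition holds.

```python
import numpy as np

t1 = np.arange(24).reshape(4, 6)
t2 = t1.copy()      # work on a copy so t1 stays unchanged
t2[t2 < 16] = 1     # boolean-mask assignment modifies t2 in place
print(t2)

# With a single argument, np.where returns the indices where the condition is True
rows, cols = np.where(t1 > 20)
print(rows, cols)
```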

nan and common statistical methods in NumPy

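nan: "not a number", a value that is not a number.
Note: nan appears when a local file is read as float and some values are missing. It also appears after an invalid calculation, such as infinity minus infinity.
inf: inf is positive infinity and -inf is negative infinity.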

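Points to note about nan:
1. Two nan values are not equal to each other (np.nan == np.nan is False).
2. Any calculation involving nan produces nan.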

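Property 1 lets you count the nans in an array. t != t is True only at nan positions, so np.count_nonzero(t != t) returns the number of nans.
To replace every nan with 0, assign through a nan mask: t[np.isnan(t)] = 0.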
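The following sketch, using a made-up 1-D array, checks both properties of nan. It counts nans with `t != t` and with `np.isnan`, and replaces them with 0 on a copy.

```python
import numpy as np

t = np.array([1.0, np.nan, 3.0, np.nan])
print(np.nan == np.nan)                # False: nan is never equal to itself
print(np.count_nonzero(t != t))        # 2: t != t is True only at the nan positions
print(np.count_nonzero(np.isnan(t)))   # 2: the same count via np.isnan
print(t.sum())                         # nan: any calculation involving nan gives nan
print(np.inf - np.inf)                 # nan: an invalid calculation

t_zero = t.copy()
t_zero[np.isnan(t_zero)] = 0           # replace every nan with 0
print(t_zero)                          # [1. 0. 3. 0.]
```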

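Common statistical functions in NumPy:
Sum: t.sum(axis=None)
Mean: t.mean(axis=None), strongly affected by outliers
Median: np.median(t, axis=None)
Maximum: t.max(axis=None)
Minimum: t.min(axis=None)
Range: np.ptp(t, axis=None), the difference between the maximum and the minimum
Standard deviation: t.std(axis=None)
With axis=None the statistic is computed over the whole array. axis=0 computes it down each column, and axis=1 computes it across each row.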
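The sketch below applies these functions to a small sample array and shows the effect of `axis`. The last two lines use a made-up outlier to show why the mean is sensitive to outliers while the median is not. `t.std()` uses the population formula (ddof=0) by default.

```python
import numpy as np

t = np.array([[1, 2, 3], [4, 5, 6]])
print(t.sum(), t.sum(axis=0), t.sum(axis=1))   # whole array, per column, per row
print(t.mean(axis=0))                          # column means
print(np.median(t, axis=1))                    # row medians
print(t.max(axis=0), t.min(axis=1))            # column maxima, row minima
print(np.ptp(t, axis=0))                       # max - min of each column
print(t.std())                                 # population standard deviation

# The mean is pulled toward the outlier 100; the median barely moves
s = np.array([1, 2, 3, 100])
print(s.mean(), np.median(s))
```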

Filling nan values

Code:

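import numpy as np


# Replace every nan with the mean of the non-nan values in its column.
# t[:, i] is a view, so t is modified in place (and also returned).
def fill_Nan(t):
    for i in range(t.shape[1]):  # iterate over the columns
        temp_col = t[:, i]  # the current column
        nan_num = np.count_nonzero(temp_col != temp_col)  # nan != nan, so this counts the nans
        if nan_num != 0:  # nonzero means this column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # the non-nan values of the column
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()  # fill the nans with the column mean

    return t


if __name__ == "__main__":
    # Create a 3x4 2-D array. It must be float because nan is a float value
    t1 = np.arange(12).reshape(3, 4).astype("float")
    # Set the values in the second row, from the third column onward, to nan
    t1[1, 2:] = np.nan
    print("Original array:\n", t1)
    # Call the function
    fill_Nan(t1)
    print("Filled array:\n", t1)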

Output:

Original array:
 [[ 0.  1.  2.  3.]
 [ 4.  5. nan nan]
 [ 8.  9. 10. 11.]]
Filled array:
 [[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
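The same column-mean fill can also be written without a Python loop. The helper `fill_nan_vectorized` below is hypothetical and is not part of the original code. It uses `np.nanmean` to compute each column's mean while ignoring nan, then writes those means into the nan positions of a copy, so the input is left unchanged.

```python
import numpy as np


def fill_nan_vectorized(t):
    """Hypothetical helper: return a copy of t with each nan replaced by its column mean."""
    col_means = np.nanmean(t, axis=0)      # mean of each column, skipping nan
    rows, cols = np.where(np.isnan(t))     # coordinates of every nan
    filled = t.copy()
    filled[rows, cols] = col_means[cols]   # use the mean of the matching column
    return filled


t1 = np.arange(12).reshape(3, 4).astype("float")
t1[1, 2:] = np.nan
result = fill_nan_vectorized(t1)
print(result)
```

Unlike `fill_Nan`, this version does not modify its argument. That matters if the original data with nans is still needed later.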

Concatenating arrays

Code:

import numpy as np

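# Create two 3x4 2-D arrays
t1 = np.arange(12).reshape(3, 4)
t2 = np.arange(48, 60).reshape(3, 4)
# Vertical stacking: t2's rows go below t1's (the column counts must match)
t3 = np.vstack((t1, t2))
print(t3)
# Vertical split: cut t3 into 2 equal parts along the rows
t4 = np.vsplit(t3, 2)
print(t4)
# Horizontal stacking: t2's columns go to the right of t1's (the row counts must match)
t5 = np.hstack((t1, t2))
print(t5)
# Horizontal split: cut t5 into 2 equal parts along the columns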
t6 = np.hsplit(t5, 2)
print(t6)

Output:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [48 49 50 51]
 [52 53 54 55]
 [56 57 58 59]]
[array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]]), array([[48, 49, 50, 51],
       [52, 53, 54, 55],
       [56, 57, 58, 59]])]
[[ 0  1  2  3 48 49 50 51]
 [ 4  5  6  7 52 53 54 55]
 [ 8  9 10 11 56 57 58 59]]
[array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]]), array([[48, 49, 50, 51],
       [52, 53, 54, 55],
       [56, 57, 58, 59]])]
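The stack and split helpers are shorthand for the general `np.concatenate` and `np.split`, which take an explicit `axis`. The sketch below uses the same two arrays and checks that the results match.

```python
import numpy as np

t1 = np.arange(12).reshape(3, 4)
t2 = np.arange(48, 60).reshape(3, 4)

v = np.concatenate((t1, t2), axis=0)   # same as np.vstack((t1, t2))
h = np.concatenate((t1, t2), axis=1)   # same as np.hstack((t1, t2))
top, bottom = np.split(v, 2, axis=0)   # same as np.vsplit(v, 2)
left, right = np.split(h, 2, axis=1)   # same as np.hsplit(h, 2)
print(v.shape, h.shape)
```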

Other NumPy methods

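1. Get the positions of the maximum and minimum values:
np.argmax(t, axis=0)
np.argmin(t, axis=0)
2. Create an array of all zeros: np.zeros((3, 4))
3. Create an array of all ones: np.ones((3, 4))
4. Create a square array (identity matrix) with ones on the diagonal: np.eye(3)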
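Here is a quick sketch of these methods on a small made-up array. With `axis=0`, `argmax` returns the row index of the maximum in each column. With `axis=1`, `argmin` returns the column index of the minimum in each row. Without `axis`, the index refers to the flattened array.

```python
import numpy as np

t = np.array([[1, 9, 3], [7, 2, 8]])
print(np.argmax(t, axis=0))   # row index of the max in each column
print(np.argmin(t, axis=1))   # column index of the min in each row
print(np.argmax(t))           # index into the flattened array

z = np.zeros((3, 4))   # 3x4 array of 0.0
o = np.ones((3, 4))    # 3x4 array of 1.0
e = np.eye(3)          # 3x3 identity matrix
print(z, o, e, sep="\n")
```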

Generating random numbers with NumPy

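Function	Description
np.random.rand(d0, d1, ..., dn)	Creates an array of shape (d0, d1, ..., dn) filled with uniformly distributed floats in [0, 1)
np.random.randn(d0, d1, ..., dn)	Creates an array of shape (d0, d1, ..., dn) of floats drawn from the standard normal distribution (mean 0, standard deviation 1)
np.random.randint(low, high, (shape))	Draws random integers from [low, high) (high is excluded) into an array of the given shape
np.random.uniform(low, high, (size))	Produces an array of uniformly distributed floats in [low, high). low is the start, high is the end, and size is the shape
np.random.normal(loc, scale, (size))	Draws samples from a normal distribution centered at loc (its mean) with standard deviation scale. size is the shape
np.random.seed(s)	Sets the random seed s. Computers generate pseudo-random numbers, so setting the same seed produces the same random numbers every time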
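The sketch below calls each function from the table with arbitrary arguments. It uses the legacy global-state `np.random` functions listed above. Reseeding with the same value reproduces the same numbers, and the integer and uniform draws stay within their half-open ranges.

```python
import numpy as np

np.random.seed(10)
a = np.random.rand(2, 3)      # uniform floats in [0, 1)
np.random.seed(10)
b = np.random.rand(2, 3)      # same seed, so the same numbers as a
print(np.array_equal(a, b))

z = np.random.randn(2, 3)                  # standard normal samples
ints = np.random.randint(10, 20, (3, 4))   # integers from 10 to 19
u = np.random.uniform(1, 5, (2, 2))        # floats in [1, 5)
n = np.random.normal(0, 1, (1000,))        # 1000 samples, mean 0, std 1
print(ints)
print(u)
```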
