datanlysis_Python

数据分析

1.matlab
2.Python
3.Numpy

一、Numpy是什么？

Numrical Python，数值的Python，应用于数值分析领域的Python语言工具；
Numpy是一个开源的科学计算库；
Numpy弥补了作为通用编程语言的Python在数值计算方面，能力弱，速度慢的不足；
Numpy拥有丰富的数学函数、强大的多维数组和优异的运算性能；
Numpy与Scipy、scikit、matplotlib等其他科学计算库可以很好的协调工作；
Numpy可以取代matlab等工具，允许用户进行速度开发的同时完成交互式的原型设计
vector.py

import time
import numpy as np

def o_vector_add(n: int) -> list:
    a, b = [], []
    for i in range(n):
        a.append(i ** 2)
        b.append(i ** 3)
    c = []
    for x, y in zip(a, b):
        c.append(x + y)
    return c

def n_vector_add(n: int) -> np.array:
    return np.arange(n) ** 2 + np.arange(n) ** 3

def runtime(funcName: str, *args) -> float:
    start = time.time()
    result = funcName(*args)
    end = time.time()
    return end - start

if __name__ == '__main__':
    t1 = runtime(o_vector_add, 1000000)
    t2 = runtime(n_vector_add, 1000000)
    print(t1, t2)

二、多维数组 1. numpy中的多维数组是numpy.ndarray类类型的对象，可以用于表示数据结构中的任意维度的数组； 2. 创建多维数组对象 2.1 numpy.arange(起始,终止,步长) ——> 一维数组，首元素就是起始值，尾元素在终止值之前的最后一个元素，步长即每次递增的公差。缺省起始值为0，缺省步长为1 2.2 numpy.array(任何可被解释为数组的容器） 3. 内存连续，元素同质 4. ndarray.dtype属性表示元素的数据类型。通过dtype参数和astype()方法可以指定和修改元素的数据类型。 5. ndarray.shape属性表示数组的维度：(高维度数, …, 低维度数) array.py 6. 元素索引

数组[索引]
数组[行索引][列索引]
数组[页索引][行索引][列索引]
数组[页索引, 行索引, 列索引]

7. numpy的内置类型和自定义类型 7.1 numpy的内置类型

bool_ 1字节布尔型，True/False
int8 1字节有符号整型，-128-127
int16 2字节有符号整型
int32 4字节有符号整型
int64 8字节有符号整型
uint8 1字节无符号整型，0-255
uint16 2字节无符号整型
uint32 4字节无符号整型
uint64 8字节无符号整型
float16 2字节浮点型
float32 4字节浮点型
float64 8字节浮点型
complex64 8字节复数型
complex128 16字节复数型
str_ 字符串型

7.2 自定义类型

通过dtype将多个相同或者不同的numpy内置类型组合成某种复合类型，用于数组元素的数据类型。

7.2.1 除了使用内置类型的全称以外还可以通过类型编码字符串简化类型的说明

numpy.int8 -> i1
numpy.int16 -> i2
numpy.uint32 -> u4
numpy.float64 -> f8
numpy.complex123 -> c16
numpy.str_ -> U字符数
numpy.bool_ -> b

7.2.2对于多字节整数可以加上字节序前缀

< -> 小端字节序，低数位低地址；
0x1234
L H
0x34 0x12
= -> 处理器系统默认

-> 大端字节序，低数位高地址
0x1234
L H
0x12 0x34

	代码：dtype.py

import numpy as np
a = np.array([('ABC', [1, 2, 3])], dtype='U3, 3i4')
print(a.dtype)
print(a[0]['f0'])
print(a[0]['f1'])
print(a[0]['f1'][0])

b = np.array([('ABC', [1, 2, 3])], dtype=[
('name', np.str_, 3),
('scores', np.int32, 3)
])
print(b.dtype)
print(b[0]['name'])
print(b[0]['scores'])
print(b[0]['scores'][0])

c = np.array([('ABC', [1, 2, 3])], dtype={
'names': ['name', 'scores'],
'formats': ['U3', '3i4']
})
print(c.dtype)
print(c[0]['name'])
print(c[0]['scores'])
print(c[0]['scores'][0])

d = np.array([('ABC', [1, 2, 3])], dtype={
'name': ('U3', 0),
'scores': ('3i4', 12)
})
print(d.dtype)
print(d[0]['name'])
print(d[0]['scores'])
print(d[0]['scores'][0])

e = np.array([0x1234], dtype=(
' 
8. 切片 
数组[起始:终止:步长, 起始:终止:步长, …]
 缺省起始：首（步长为正）、尾（步长为负）
 缺省终止：尾后（步长为正）、首前（步长为负）
 缺省步长：1
 靠近端部的一个或几个连续的维度使用缺省切片，可以使用"…"表示。 
代码：slice.py 
import numpy as np
a = np.arange(1, 10)
print(a)
print(a[:3]) # 1 2 3
print(a[3:6]) # 4 5 6
print(a[6:]) # 7 8 9
print(a[::-1]) # 9 8 7 6 5 4 3 2 1
print(a[:-4:-1]) # 9 8 7
print(a[-4:-7:-1]) # 6 5 4
print(a[-7::-1]) # 3 2 1

b = np.arange(1, 25).reshape(2, 3, 4)
print(b)
print(b[:, 0, 0]) # 1 13
print(b[0, :, :]) # 1-12相当于b[0, ...]
print(b[0, 1, ::2]) # 5 7
print(b[:, :, 1])
print(b[:, 1])
print(b[-1, 1:, 2:])
 
9. 改变维度 
9.1视图变维：针对一个数组对象获取其不同维度的视图 
数组.reshape(新维度) -> 数组的新维度视图
 数组.ravel() -> 数组的一维视图 
9.2 复制变维：针对一个数组对象获取其不同维度的副本 
数组.flatten() -> 数组的一维副本 
9.3 就地变维 
数组.shape = (新维度)
 数组.resize(新维度) 
9.4 视图转置 
数组.transpose() -> 数组的转置视图
 数组.T：转置视图属性
 至少二维数组才能转置 
代码：reshape.py 
import numpy as np
a = np.arange(1, 9)
print(a)
b = a.reshape(2, 4)
print(b)
c = b.reshape(2, 2, 2)
print(c)
d = c.ravel()
print(d)
e = c.flatten()
print(e)
f = b.reshape(2, 2, 2).copy()
print(f)
a += 10
print(a, b, c, d, e, f, sep='n')
a.shape = (2, 2, 2)
print(a)
a.resize(2, 4)
print(a)
g = a.transpose() # 等价于a.T
print(g)
# print(np.array([e]).T)
print(e.reshape(-1, 1))
 
10. 组合与拆分 
10.1 垂直组合/拆分 
numpy.vstack((上, 下))
 numpy.vsplit(数组, 份数) -> 子数组集合 
10.2 水平组合/拆分 
numpy.hstack((左, 右))
 numpy.hsplit(数组, 份数) -> 子数组集合 
10.3 深度组合/拆分 
numpy.dstack((前, 后))
 numpy.dsplit(数组, 份数) -> 子数组集合 
10.4 行/列组合 
numpy.row_stack((上, 下))
 numpy.column_stack((左, 右)) 
代码：stack.py 
import numpy as np
a = np.arange(11, 20).reshape(3, 3)
b = np.arange(21, 30).reshape(3, 3)
c = np.vstack((a, b))
print(a, b, c, sep='n')
a, b = np.vsplit(c, 2)
print(a, b, sep='n')
c = np.dstack((a, b))
print(c)
a, b = np.dsplit(c, 2)
print(a.T[0].T, b.T[0].T, sep='n')
a = a.ravel()
b = b.ravel()
print(a, b, sep='n')
c = np.row_stack((a, b))
print(c)
# c = np.column_stack((a, b))
c = np.c_[a, b]
print(c)
 
11. ndarray类的属性 
dtype - 元素的类型
 shape - 数组的维度
 T - 转置视图
 ndim - 维数
 size - 元素数
 itemsize - 元素的字节数
 nbytes - 总字节数 = size x itemsizef
 flat - 扁平迭代器
 real - 实部数组
 imag - 虚部数组
 数组.tolist() -> 列表对象 
代码：attr.py 
import numpy as np
a = np.array([
[1+1j, 2+4j, 3+7j],
[4+2j, 5+5j, 6+8j],
[7+3j, 8+6j, 9+9j]
])
print(a.dtype, a.dtype.str, a.dtype.char)
print(a.shape)
print(a.ndim)
print(a.size, len(a))
print(a.itemsize)
print(a.nbytes)
print(a.T)
print(a.real, a.imag, sep='n')
for elem in a.flat:
print(elem)
print(a.flat[[1, 3, 5]])
a.flat[[2, 4, 6]] = 0
print(a)
 
三、数据可视化：matplotlib.pyplot（mp） 
1. 基本函数 
1.1 mp.plot(水平座标数组, 垂直坐标数组) 
x:[1 2 3 4]
 y:[5 6 7 8] 
1.2 mp.plot(…, label=图例文本,linestyle=线型,linewidth=线宽,color=颜色) 
1.3 mp.xlim(左边界, 有边界) 
1.4 mp.ylim(底边界, 顶边界) 
1.5 mp.xticks(刻度位置数组, 刻度文本数组) 
1.6 mp.yticks(刻度位置数组, 刻度文本数组) 
1.7 ax = mp.gca() # 获取当前坐标轴 
ax.spines[“left”].set_position((‘data’, 0))
 ax.spines[“top”].set_color(颜色) 
1.8 mp.legend(loc=‘upper left’) #‘lower left’ 显示图例 
1.9 mp.scatter(水平坐标数组, 垂直坐标数组,marker=点型,s=大小,edgecolor=勾边色,facecolor=填充色,zorder=Z序) 
1.10 mp.annotate 
mp.annotate(
 备注文本,
 xy=目标位置,
 xycoords=目标坐标系,
 xytext=文本的位置,
 textcoords=文本坐标系,
 fontsize=字体大小,
 arrowprops=箭头属性
 ) 
代码：plot1.py 
    import numpy as np
    import matplotlib.pyplot as mp

    space = 1.1

    x = np.linspace(-2*np.pi, 2*np.pi, 1000)
    y_sin = np.sin(x)
    y_cos = np.cos(x) / 2
    xo = np.pi * 3 / 4
    yo_cos = np.cos(xo) / 2
    yo_sin = np.sin(xo)

    mp.figure("正余弦函数")
    mp.xlim(x.min() * space, x.max() * space)
    mp.ylim(y_sin.min() * space, y_sin.max() * space)
    mp.xticks([-2 * np.pi, -np.pi, 0, np.pi, np.pi * 3 / 2, 2 * np.pi],
              [r'$-2pi$', r'$-pi$', r'$0$', r'$pi$', r'$frac{3pi}{2}$', r'$2pi$'])
    mp.yticks([-1, -0.5, 0.5, 1])
    ax = mp.gca()
    ax.spines["left"].set_position(("data", 0))
    ax.spines["bottom"].set_position(("data", 0))
    ax.spines["right"].set_color('none')
    ax.spines["top"].set_color('none')
    mp.plot(x, y_cos, label=r'$y=frac{1}{2}cos(x)$', linestyle=":", linewidth=1, color="dodgerblue")
    mp.plot(x, y_sin, label=r'$y=sin(x)$', linestyle="-", linewidth=0.5, color="orangered")
    mp.scatter([xo, xo], [yo_cos, yo_sin], s=60, edgecolors="limegreen", facecolor="white", zorder=3)
    mp.plot([xo, xo], [yo_cos, yo_sin], linestyle="--", linewidth=1, color="limegreen")
    mp.annotate(
        r"$frac{1}{2}cos(frac{3pi}{4})=-frac{sqrt{2}}{4}$",
        xy=(xo, yo_cos),
        xycoords="data",
        xytext=(-90, -40),
        textcoords="offset points",
        fontsize=14,
        arrowprops=dict(
            arrowstyle="->",
            connectionstyle="arc3, rad=.2"
        )
    )
    mp.annotate(
        r"$sin(frac{3pi}{4})=frac{sqrt{2}}{2}$",
        xy=(xo, yo_sin),
        xycoords="data",
        xytext=(30, 40),
        textcoords="offset points",
        fontsize=14,
        arrowprops=dict(
            arrowstyle="->",
            connectionstyle="arc3, rad=.2"
        )
    )

    mp.legend(loc="upper left")
    mp.show()
 
2. 图形对象 
mp.figure(图形对象名,figsize=窗口大小,dpi=分辨率,facecolor=颜色) 
代码：fig.py 
import numpy as np
import matplotlib.pyplot as mp

x = np.linspace(-np.pi, np.pi, 1000)
cos_y = np.cos(x) / 2
sin_y = np.sin(x)
mp.figure("Figure Object 1", figsize=(4, 3), dpi=120, facecolor="lightgray")
mp.title("Figure Object 1", fontsize=12)
mp.xlabel("x", fontsize=10)
mp.ylabel("y", fontsize=10)
mp.tick_params(labelsize=8)
mp.grid(linestyle=":")

mp.show()
 
3. 子图 
3.1 缺省布局 
mp.subplot(行数,列数,图号)
 mp.subplot(2,3,1)
 mp.subplot(231) 
sub1.py 
import matplotlib.pyplot as mp

m, n = 3, 4

mp.figure(facecolor="lightgray")
for i in range(m * n):
    mp.subplot(m, n, i+1)
    mp.xticks(())
    mp.yticks(())
    mp.text(0.5, 0.5, str(i+1), ha="center", va="center", size=36, alpha=0.5)

mp.tight_layout()
mp.show()
 
3.2 栅格布局 
import matplotlib.gridspec as mg
 gs = mg.GridSpec(行数,列数)#栅格布局器
 mp.subplot(gs[行,列]) 
代码：sub2.py 
import matplotlib.gridspec as mg
import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
gs = mg.GridSpec(3, 3)
mp.subplot(gs[0, :2])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "1", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[:2, 2])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "2", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[2, 1:])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "3", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[1:, 0])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "4", ha="center", va="center", size=36, alpha=0.5)

mp.subplot(gs[1, 1])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "5", ha="center", va="center", size=36, alpha=0.5)

mp.tight_layout()
mp.show()
 
3.3 自由布局 
mp.axes([左下角水平坐标,左下角垂直坐标,宽度,高度])所有的尺寸参数都是比例 
sub3.py 
import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
mp.axes([0.03, 0.038, 0.94, 0.924])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "1", ha="center", va="center", size=36, alpha=0.5)

mp.axes([0.63, 0.076, 0.31, 0.308])
mp.xticks(())
mp.yticks(())
mp.text(0.5, 0.5, "2", ha="center", va="center", size=36, alpha=0.5)

mp.show()
 
4. 坐标刻度定位器 
定位器对象 = mp.xxxLocator(…)
 ax = mp.gca()
 ax.xaxis.set_major_locator(定位器对象) # 主刻度
 ax.xaxis.set_minor_locator(定位器对象) # 次刻度 
代码：tick.py 
import numpy as np
import matplotlib.pyplot as mp

mp.figure(facecolor="lightgray")
locators = [
    "mp.NullLocator()",
    "mp.MaxNLocator(nbins=3, steps=[1, 3, 5, 7, 9])",
    "mp.FixedLocator(locs=[0, 2.5, 5, 7.5, 10])",
    "mp.AutoLocator()",
    "mp.IndexLocator(offset=0.5, base=1.5)",
    "mp.MultipleLocator()",
    "mp.LinearLocator(numticks=21)",
    "mp.LogLocator(base=2, subs=[1.0])"
]
n_locators = len(locators)
for i, locator in enumerate(locators):
    mp.subplot(n_locators, 1, i+1)
    mp.xlim(0, 10)
    mp.ylim(-1, 1)
    mp.yticks(())
    ax = mp.gca()
    ax.spines["left"].set_color("none")
    ax.spines["top"].set_color("none")
    ax.spines["right"].set_color("none")
    ax.spines["bottom"].set_position(("data", 0))
    ax.xaxis.set_major_locator(eval(locator))
    ax.xaxis.set_minor_locator(mp.MultipleLocator(0.1))
    mp.plot(np.arange(11), np.zeros(11), color="none")
    mp.text(5, 0.3, locator[3:], ha="center", size=12)

mp.tight_layout()
mp.show()
 
5. 散点图 
scatter.py 
import numpy as np
import matplotlib.pyplot as mp

n = 1000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)
d = np.sqrt(x ** 2, y ** 2)

mp.figure("Scatter", facecolor="lightgray")
mp.title("Scatter", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.scatter(
    x, y,
    marker="*",# * D s
    s=60,
    c=d,
    cmap="jet_r",
    alpha=0.5
)

mp.show()
 
6. 区域填充 
mp.fill_between(水平坐标数组,垂直坐标的起点数组,垂直坐标终点数组,条件,color=颜色,alpha=透明度) 
代码：fill.py 
import numpy as np
import matplotlib.pyplot as mp

n = 1000
x = np.linspace(0, 8 * np.pi, n)
sin_y = np.sin(x)
cos_y = np.cos(x / 2) / 2

mp.figure("Fill", facecolor="lightgray")
mp.title("Fill", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(
    x, sin_y,
    color="dodgerblue",
    label=r"$y=sin(x)$"
)
mp.plot(
    x, cos_y,
    color="limegreen",
    label=r"$y=frac{1}{2}cos(frac{x}{2})$"
)
mp.fill_between(x, cos_y, sin_y, cos_y < sin_y, color="orangered", alpha=0.5)
mp.fill_between(x, cos_y, sin_y, cos_y > sin_y, color="dodgerblue", alpha=0.5)

mp.legend()
mp.show()
 
7. 柱状图 
mp.bar(水平座标数组,高度数组,ec=边缘颜色,fc=填充颜色,label=标签文本) 
代码：bar.py 
import numpy as np
import matplotlib.pyplot as mp

n = 12
x = np.arange(n)
y1 = (1 - x / n) * np.random.uniform(0.5, 1.0, n) # 区间内产生n个随机数
y2 = (1 - x / n) * np.random.uniform(0.5, 1.0, n)

mp.figure("Bar", facecolor="lightgray")
mp.title("Bar", fontsize=20)
mp.ylim(-1.25, 1.25)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.xticks(x, x + 1)
mp.tick_params(labelsize=10)
mp.grid(axis="y", linestyle=":")
mp.bar(
    x, y1,
    ec="white",
    fc="dodgerblue",
    label="Sample1"
)
mp.bar(
    x, -y2,
    ec="white",
    fc="orangered",
    label="Sample2",
    alpha=0.5
)
for _x, _y1, _y2 in zip(x, y1, y2):
    mp.text(_x, _y1, "%.2f" % _y1, ha="center", size=9)
    mp.text(_x, -_y2, "%.2f" % -_y2, ha="center", size=9, va="top")

mp.legend()
mp.show()
 
8. 等高线图 
mp.contour(x,y,z,线数,colors=颜色,linewidths=线宽) -> 线
 mp.contourf(x,y,z,线数,cmap=颜色映射) -> 色带 
代码：cntr.py 
import numpy as np
import matplotlib.pyplot as mp

n = 1000
x, y = np.meshgrid(np.linspace(-3, 3, n), np.linspace(-3, 3, n))
z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)

mp.figure("Contour", facecolor="lightgray")
mp.subplot(121)
mp.title("Contourf", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.contourf(
    x, y, z,
    10,
    cmap="jet"
)
cntr = mp.contour(
    x, y, z,
    8,
    colors="black",
    linewidths=0.5
)
mp.clabel(cntr, inline_spacing=0.8, fmt="%.1f", fontsize=8)

mp.subplot(122)
mp.title("Contour", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
cntr = mp.contour(
    x, y, z,
    8,
    colors="black",
    linewidths=0.5
)
mp.clabel(cntr, inline_spacing=1, fmt="%.1f", fontsize=10)

mp.tight_layout()
mp.show()
 
9. 热相图 
mp.imshow(矩阵, cmap=颜色映射,origin=垂直轴的方向) 
代码：hot.py 
import numpy as np
import matplotlib.pyplot as mp

n = 1000
x, y = np.meshgrid(np.linspace(-3, 3, n), np.linspace(-3, 3, n))
z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)

mp.figure("Hot", facecolor="lightgray")
mp.subplot(121)
mp.title("Hot", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.imshow(
    z,
    cmap="jet",
    origin="lower"
)

mp.subplot(122)
mp.title("Contour", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
cntr = mp.contour(
    x, y, z,
    8,
    colors="black",
    linewidths=0.5
)
mp.clabel(cntr, inline_spacing=1, fmt="%.1f", fontsize=10)

mp.tight_layout()
mp.show()
 
10. 饼图 
mp.pie(值列表,间隙列表,标签,颜色列表,格式串,shadow=是否带阴影,startangle=起始角度) 
代码：pie.py 
import matplotlib.pyplot as mp

mp.figure("Pie", facecolor="lightgray")
mp.title("Pie", fontsize=20)
mp.pie(
    [26, 17, 21, 29, 11],
    [0.05, 0.01, 0.01, 0.01, 0.01],
    ["Python", "Javascript", "C++", "C", "PHP"],
    ["dodgerblue", "orangered", "limegreen", "violet", "gold"],
    "%d%%",
    shadow=True,
    startangle=90
)
mp.axis("equal")

mp.show()
 
11. 三维曲面 
from mpl_toolkits.mplot3d import axes3d
 ax = mp.axes(projection=“3d”)
 或者
 fig = mp.figure()
 ax = fig.add_subplot(121, projection=“3d”)
 ax.plot_surface(x,y,z,rstride=行距,cstridd=列距,cmap=颜色映射) -> 三维曲面
 ax.plot_wireframe(x,y,z,rstride=行距,cstride=列距,linewidth=线宽,color=颜色) 
代码：plot3d.py 
import numpy as np
import matplotlib.pyplot as mp
from mpl_toolkits.mplot3d import axes3d

n = 1000
num = 10

x, y = np.meshgrid(np.linspace(-3, 3, n), np.linspace(-3, 3, n))
z = (1 - x / 2 + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)

fig = mp.figure("3D Wireframe", facecolor="lightgray")
ax = fig.add_subplot(121, projection="3d")
mp.title("3D Lines", fontsize=20)
ax.set_xlabel("x", fontsize=14)
ax.set_ylabel("y", fontsize=14)
ax.set_zlabel("z", fontsize=14)
mp.tick_params(labelsize=10)
ax.plot_wireframe(
    x, y, z,
    rstride=num,
    cstride=num,
    linewidth=0.5,
    color="orangered"
)

ax1 = fig.add_subplot(122, projection="3d")
mp.title("3D Surface", fontsize=20)
ax1.set_xlabel("x", fontsize=14)
ax1.set_ylabel("y", fontsize=14)
ax1.set_zlabel("z", fontsize=14)
mp.tick_params(labelsize=10)
ax1.plot_surface(
    x, y, z,
    rstride=num,
    cstride=num,
    cmap="jet"
)

mp.show()
 
12. 三维散点 
ax.scatter(x,y,z,s=大小,c=颜色,marker=点型) 
代码：s3.py 
import numpy as np
import matplotlib.pyplot as mp
from mpl_toolkits.mplot3d import axes3d

n = 1000
x = np.random.normal(0, 1, n)
y = np.random.normal(0, 1, n)
z = np.random.normal(0, 1, n)
d = np.sqrt(x ** 2 + y ** 2 + z ** 2)

mp.figure("Scatter3D", facecolor="lightgray")
ax = mp.axes(projection="3d")
ax.set_xlabel("x", fontsize=14)
ax.set_ylabel("y", fontsize=14)
ax.set_zlabel("z", fontsize=14)
mp.tick_params(labelsize=10)
ax.scatter(
x, y, z,
s=60,
c=d,
cmap="jet_r",
alpha=0.5,
marker="*"
)

mp.show()
 
四、numpy的通用函数 
1. 读取文本文件 
numpy.loadtxt(
 文件名,
 delimiter=分隔符,
 usecols=选择列,
 unpack=是否解包,
 dtype=目标类型,
 converters=转换器
 ) -> 二维数组 
2. 保存文本文件 
numpy.savetxt(
 文件名,
 二维数组,
 delimiter=分隔符,
 fmt=格式
 ) 
代码：txt.py 
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
print(a)
np.savetxt("test.csv", a, delimiter=",", fmt="%d")
b = np.loadtxt("test.csv", delimiter=",", dtype="i4")
print(b)

c = np.loadtxt("test.csv", delimiter=",", usecols=(0, 2), dtype="i4")
print(c)

d, e = np.loadtxt("test.csv", delimiter=",", usecols=(0, 2), unpack=True, dtype="i4, f8")
print(d, e)
 
代码 k.py 
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

def dmy2ymd(dmy):
    dmy = str(dmy, encoding="utf-8")
    date = dt.datetime.strptime(dmy, "%d-%m-%Y").date()
    ymd = date.strftime("%Y-%m-%d")
    return ymd

def getStockData():
    dates, openings, highests, lowests, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 4, 5, 6, 7),
        unpack=True,
        dtype="M8[D], f8, f8, f8, f8",
        # M8[D]为numpy自带的时间类型
        # converters={2: dmy2ymd} 日期转换回调函数
    )
    return dates, openings, highests, lowests, closes

def plotCandle(d, o, h, l, c):
    mp.figure("Candle Stick", facecolor="lightgray")
    mp.title("Candle Stick", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    mp.tick_params(labelsize=10)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO)) # 日期刻度
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y")) # 指定主刻度格式
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = d.astype(md.datetime.datetime)
    rise = c - o >= 0.01
    fall = o - c >= 0.01
    fc = np.zeros(dates.size, dtype="3f4")
    ec = np.zeros(dates.size, dtype="3f4")
    fc[rise], fc[fall] = (1, 1, 1), (0, 0.5, 0)
    ec[rise], ec[fall] = (1, 0, 0), (0, 0.5, 0)
    mp.bar(dates, h - l, 0, l, color=fc, edgecolor=ec) # 画线
    mp.bar(dates, c - o, 0.8, o, color=fc, edgecolor=ec) # 画蜡烛
    mp.gcf().autofmt_xdate() # 日期刻度自动化调整
    mp.show()

if __name__ == '__main__':
    d, o, h, l, c = getStockData()
    plotCandle(d=d, o=o, h=h, l=l, c=c)
 
3. 算术平均值 
样本：S = [s1,s2,…,sn]
 算术平均值：m = (s1+s2+…+sn)/n
 numpy.mean(样本数组) -> 算术平均值 
代码：mean.py 
import numpy as np

def getStockData():
    closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(7,),
        dtype="f8",
        # M8[D]为numpy自带的时间类型
        # converters={2: dmy2ymd} 日期转换回调函数
    )
    return closes

def mymean(closes):
    mean = 0
    for c in closes:
        mean += c
    mean /= closes.size
    return mean

def numpymean(closes):
    return np.mean(closes)

if __name__ == '__main__':
    print(mymean(getStockData()))
    print(numpymean(getStockData()))
 
4. 加权平均值 
样本：S = [s1,s2,…,sn]
 权重：W = [w1,w2,…,wn]
 加权平均值：a = (s1w1+s2w2+…+snwn)/(w1+w2+…+wn)
 numpy.average(样本数组,weights=权重数组) -> 加权平均值 
成交量加权平均价格（VWAP）
 时间加权平均价格（TWAP）
 代码：vwap.py、twap.py 
5. 最大值和最小值 
5.1 max/min 
获取一个数组中的最大/最小元素
 a: 9 7 5
 3 1 8
 6 6 1
 numpy.max(a) -> 9
 numpy.min(a) -> 1 
5.2 maximum/minimum 
在两个数组的对应元素之间构造最大值/最小值数组
 b: 6 1 9
 7 1 7
 4 4 5
 numpy.maximum(a,b) ->
 9 7 9
 7 1 8
 6 6 5 
5.3 ptp 
极差，一个数组的最大值和最小值之差
 numpy.ptp(数组) -> 数组.max()-数组.min()
 价格波动幅度=某一种价格的极差
 价格波动范围=最高的最高价-最低的最低价 
代码：max.py 
import numpy as np

a = np.random.randint(10, 100, 9).reshape(3, 3)
b = np.random.randint(10, 100, 9).reshape(3, 3)
print(a)
print(np.max(a), a.max())
print(np.min(a), a.min())
print(np.argmax(a), a.argmax()) # 最大值的下标
print(b)
print(np.maximum(a, b))
print(np.minimum(a, b))
 
6. 中位数 
将多个样本按照大小顺序排列，居于中间位置的元素即为中位数。 
A：样本集
 L：样本数
 M = (A[(L-1)//2]+A[L//2])/2
 numpy.median(数组) -> 中位数 
代码：med.py 
import numpy as np

def getStockData():
    closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(7,),
        dtype="f8",
        # M8[D]为numpy自带的时间类型
        # converters={2: dmy2ymd} 日期转换回调函数
    )
    return closes

def mymedian(closes):
    sorted_prices = np.msort(closes)
    l = sorted_prices.size
    median = (sorted_prices[int((l-1)/2)] + sorted_prices[int(l/2)]) / 2
    return median

def numpymedian(closes):
    return np.median(closes)

if __name__ == '__main__':
    print(mymedian(getStockData()))
    print(numpymedian(getStockData()))
 
7. 标准差 
样本：S = [s1,s2,…,sn]
 均值：m = (s1+s2+…+sn)/n
 离差：D = [s1-m,s2-m,…,sn-m]
 方差：v = ((s1-m)^2+(s2-m)2+…+(sn-m)^2)/n
 标准差：std = sqrt(v)(方均根离差)
 numpy.std(数组,ddof=非自由度) -> 标准差
 总体方差和总体标准差：…/n
 样本方差和样本标准差：…/(n-1) 
代码：var.py 
import numpy as np

def getStockData():
    closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(7,),
        dtype="f8",
    )
    return closes

def mypstd(closes):
    mean = np.mean(closes)
    devs = closes - mean
    var = (devs ** 2).mean()
    std = np.sqrt(var)
    return std

def numpypstd(closes):
    return np.std(closes)

def mysstd(closes):
    mean = np.mean(closes)
    devs = closes - mean
    svar = (devs ** 2).sum() / (devs.size - 1)
    sstd = np.sqrt(svar)
    return sstd

def numpysstd(closes):
    return np.std(closes, ddof=1)

if __name__ == '__main__':
    print(mypstd(getStockData()))
    print(numpypstd(getStockData()))
    print(mysstd(getStockData()))
    print(numpysstd(getStockData()))
 
8. 针对日期的处理 
8.1 星期数据 
数组[关系表达式]：关系表达式的值是一个布尔数组，其中为True的元素对应于数组中满足关系表达式的元素，以上下标运算的值就是数组中对应布尔数组中为True的元素
 np.where(关系表达式) -> 数组中满足关系表达式的元素的下标数组
 np.take(数组,下标数组) -> 数组中由下标数组所标识的元素的集合 
8.2 星期汇总 
np.apply_along_axis(函数,轴向,高维数组)
 在高维数组中，沿着指定轴向，提起低维子数组，作为参数传递给特定的函数，并将其返回值按照同样的轴向组成新的数组返回给调用者。
 轴向：二维，0-行，1-列
 三维，0-页，1-行，2-列 
代码：week.py 
import numpy as np
import datetime as dt

def dmy2wdays(ymd):
    ymd = str(ymd, encoding="utf-8")
    date = dt.datetime.strptime(ymd, "%Y-%m-%d").date()
    wday = date.weekday() # 0-6表示周一到周日
    return wday

def getStockData():
    wdays, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 7),
        unpack=True,
        dtype="f8, f8",
        converters={2: dmy2wdays}
    )
    return wdays, closes

def ave_close_every_day(wdays, closes):
    ave_closing_prices = np.zeros(5)
    for wday in range(ave_closing_prices.size):
        # ave_closing_prices[wday] = closes[wdays == wday].mean()
        # ave_closing_prices[wday] = closes[np.where(wdays == wday)].mean()
        ave_closing_prices[wday] = np.take(closes, np.where(wdays == wday)).mean()
    for wday, ave_closing_price in zip(
        ["MON", "TUE", "WED", "THU", "FRI"],
        ave_closing_prices
    ):
        print(wday, np.round(ave_closing_price, 2))
    return ave_closing_prices

if __name__ == '__main__':
    wdays, closes = getStockData()
    average = ave_close_every_day(wdays, closes)
    print(average)
 
代码：axis.py 
import numpy as np

def pingfang(x):
    print("pingfang", x)
    return x * x

X = np.array([
    [11, 12, 13],
    [4, 5, 6],
    [7, 8, 9]
])
Y = np.apply_along_axis(pingfang, 1, X) # 0表示沿着行方向
print(Y)
 
9. 一维卷积 
a:[1 2 3 4 5] - 被卷积数组
 b:[6 7 8] - 卷积核数组 (进行卷积时要倒置)
 c = a(x)b =
 [6 19 40 61 82 67 40] - full(完全卷积)
 [19 40 61 82 67] - same(同尾卷积)
 [40 61 82] - valid(有效卷积) 
      6 19 40 61 82 67 40
0  0  1  2  3  4  5  0  0
8  7  6
   8  7  6
      8  7  6
         8  7  6
            8  7  6
               8  7  6
                  8  7  6
 
numpy.convolve(a, b, “full”/“same”/“valid”) 
代码：conv.py 
import numpy as np

a = np.arange(1, 6)
print("a:", a)
b = np.arange(6, 9)
print("b:", b)
c = np.convolve(a, b, "full")
print("c(full):", c)
c = np.convolve(a, b, "same")
print("c(same):", c)
c = np.convolve(a, b, "valid")
print("c(valid):", c)
 
移动均线 ma.py 
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

def getStockData():
    dates, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def ma5_line(closes):
    ma5 = np.zeros(closes.size - 4)
    for i in range(ma5.size):
        ma5[i] = closes[i:i+5].mean()
    return ma5

def ma5_line_conv(closes):
    return np.convolve(closes, np.ones(5) / 5, "valid")

def ma5_exp_line(closes):
    weights = np.exp(np.linspace(-1, 0, 5))
    weights /= weights.sum()
    # 因为做卷积的时候卷积核数组会倒序，所以提前把权重数组倒序
    ma5 = np.convolve(closes, weights[::-1], "valid")
    return ma5

def plot(dates, closes, ma51, ma52, ma53):
    mp.figure("Moving Average", facecolor="lightgray")
    mp.title("Moving Average", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    mp.plot(dates, closes, c="lightgray", label="Closing Price")
    mp.plot(dates[4:], ma51, c="orangered", linewidth=0.5, label="MA-51")
    mp.plot(dates[4:], ma52, c="orangered", alpha=0.2, linewidth=4, label="MA-52")
    mp.plot(dates[4:], ma53, c="limegreen", label="MA-53")
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    d, c = getStockData()
    ma51 = ma5_line(c)
    ma52 = ma5_line_conv(c)
    ma53 = ma5_exp_line(c)
    plot(d, c, ma51, ma52, ma53)
 
布林带
 中轨：移动均线
 上轨：中轨+2x标准差
 下轨：中轨-2x标准差 
代码：bb.py 
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

N = 5

def getStockData():
    dates, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def ma5_line_conv(closes):
    medios = np.convolve(closes, np.ones(N) / N, "valid")
    stds = np.zeros(medios.size)
    for i in range(stds.size):
        stds[i] = np.std(closes[i:i+N])
    lowers = medios - 2 * stds
    uppers = medios + 2 * stds
    return medios, lowers, uppers

def plot(dates, closes, medios, lowers, uppers):
    mp.figure("Moving Average", facecolor="lightgray")
    mp.title("Moving Average", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    mp.plot(dates, closes, c="lightgray", label="Closing Price")
    mp.plot(dates[N-1:], medios, c="dodgerblue", linewidth=0.5, label="Medio")
    mp.plot(dates[N-1:], lowers, c="limegreen", linewidth=0.5, label="Lower")
    mp.plot(dates[N-1:], uppers, c="orangered", linewidth=0.5, label="Upper")
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    d, c = getStockData()
    medios, lowers, uppers = ma5_line_conv(c)
    plot(d, c, medios, lowers, uppers)
 
10. 线性模型 
y = kx + b 
10.1 线性预测 
a b c d e f ?
 d = aA+bB+cC
 e = bA+cB+dC >A B C
 f = cA+dB+eC/
 ? = dA+eB+fC
 /a b c /A / d 
 |b c d|X|B| = |e |
 c d e/ C/  f /
 a x b
 =numpy.linalg.lstsq(a, b)
 ? = bx 
代码：line.py 
import numpy as np
import pandas as pd
import matplotlib.pyplot as mp
import matplotlib.dates as md

N = 6

def getStockData():
    dates, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def line_predict(closes):
    pred_prices = np.zeros(closes.size - 2 * N + 1)
    for i in range(pred_prices.size):
        a = np.zeros((N, N))
        for j in range(N):
            a[j, ] = closes[i + j: i + j + N]
        b = closes[i + N: i + N * 2]
        x = np.linalg.lstsq(a, b, rcond=None)[0]
        pred_prices[i] = b.dot(x) # 点乘，适量乘
    return pred_prices


def plot(dates, closes, pred_prices):
    mp.figure("Stock Price Prediction", facecolor="lightgray")
    mp.title("Stock Price Prediction", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    mp.plot(dates, closes, "o-", c="lightgray", label="Closing Price")
    dates = np.append(dates, dates[-1] + pd.tseries.offsets.BDay())
    mp.plot(dates[N * 2:], pred_prices, "o-", c="orangered", label="Predicted Price")
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    d, c = getStockData()
    predict = line_predict(c)
    plot(d, c, predict)
 
10.2 线性拟合 
kx + b = y
 kx1 + b = y1
 kx2 + b = y2
 …
 kxn + b = yn
 /x1 1 /k /y1
 |x2 1|X|b|=|y2|
 |… |  / |…|
 xn 1/ yn/
 a x b
 x= np.linalg.lstsq(a, b) 
代码：trend.py 
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

def getStockData():
    dates, openings, highests, lowests, closes = np.loadtxt(
        "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
        max_rows=31,
        delimiter=",",
        skiprows=1,
        usecols=(2, 4, 5, 6, 7),
        unpack=True,
        dtype="M8[D], f8, f8, f8, f8",
    )
    return dates, openings, highests, lowests, closes

def plotCandle(d, o, h, l, c):
    # 基准点
    trend_points = (h + l + c) / 3
    spreads = h - l
    resistance_points = trend_points + spreads
    support_points = trend_points - spreads
    days = d.astype(int)
    a = np.column_stack((days, np.ones_like(days)))
    x = np.linalg.lstsq(a, trend_points, rcond=None)[0]
    trend_line = days * x[0] + x[1]
    x = np.linalg.lstsq(a, resistance_points, rcond=None)[0]
    resistance_line = days * x[0] + x[1]
    x = np.linalg.lstsq(a, support_points, rcond=None)[0]
    support_line = days * x[0] + x[1]
    mp.figure("Trend", facecolor="lightgray")
    mp.title("Trend", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    mp.tick_params(labelsize=10)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO)) # 日期刻度
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y")) # 指定主刻度格式
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = d.astype(md.datetime.datetime)
    rise = c - o >= 0.01
    fall = o - c >= 0.01
    fc = np.zeros(dates.size, dtype="3f4")
    ec = np.zeros(dates.size, dtype="3f4")
    fc[rise], fc[fall] = (1, 1, 1), (0.85, 0.85, 0.85)
    ec[rise], ec[fall] = (0.85, 0.85, 0.85), (0.85, 0.85, 0.85)
    mp.bar(dates, h - l, 0, l, color=fc, edgecolor=ec) # 画线
    mp.bar(dates, c - o, 0.8, o, color=fc, edgecolor=ec) # 画蜡烛
    mp.scatter(dates, trend_points, c="dodgerblue", alpha=0.5, s=60, zorder=2)
    mp.scatter(dates, resistance_points, c="orangered", alpha=0.5, s=60, zorder=2)
    mp.scatter(dates, support_points, c="limegreen", alpha=0.5, s=60, zorder=2)
    mp.plot(dates, trend_line, c="dodgerblue", linewidth=2, label="Trend")
    mp.plot(dates, resistance_line, c="orangered", linewidth=2, label="Resistance")
    mp.plot(dates, support_line, c="limegreen", linewidth=2, label="Support")
    mp.gcf().autofmt_xdate() # 日期刻度自动化调整
    mp.legend()
    mp.show()

if __name__ == '__main__':
    d, o, h, l, c = getStockData()
    plotCandle(d=d, o=o, h=h, l=l, c=c)
 
11. 裁剪、压缩和累乘 
1）ndarray.clip(min=最小值,max=最大值)
 将调用数组中小于min的元素设置为min，大于max的元素设置为max
 2)ndarray.compress(条件) 返回调用数组中满足给定条件的元素
 3)ndarray.prod() 返回调用数组中各个元素的累积
 4)ndarray.cumprod() 返回调用数组中各元素计算累乘的过程数组 
代码：ndarr.py 
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
print(a)
b = a.clip(min=3, max=7)
print(b)
c = a.compress(a.ravel() > 3).reshape(-1, 3)
print(c)
d = a.compress(a.ravel() < 7).reshape(-1, 3)
print(d)
e = a.compress((3 < a.ravel()) & (a.ravel() < 7))
print(e)
f = a.prod()
print(f)
h = a.cumprod()
print(h)
 
12. 相关性 
a = [a1,a2,…,an]
 b = [b1,b2,…,bn]
 均值：
 ave(a) = (a1+a2+…+an)/n
 ave(b) = (b1+b2+…+bn)/n
 离差：
 dev(a) = [a1,a2,…,an] - ave(a)
 dev(b) = [b1,b2,…,bn] - ave(b)
 方差：
 var(a) = ave(dev(a)*dev(a))
 var(b) = ave(dev(b)*dev(b))
 标准差：
 std(a) = sqrt(var(a))
 std(b) = sqrt(var(b))
 协方差：（绝对值越大，表示相关性越强，+为正相关，-为负相关）
 cov(a, b) = ave(dev(a)*dev(b))
 cov(b, a) = ave(dev(b)*dev(a))
 相关性系数：（为了消除波动影响，除以标准差）
 ave(dev(a)*dev(b))/std(a)*std(b)
 ave(dev(b)*dev(a))/std(b)*std(a)
 [-1, 1]:正负表示了相关性的方向为正或反，绝对值表示了相关性的强弱，大越强，小越弱
 相关性矩阵：
 /var(a)/std(a)std(a)=1 cov(a,b)/std(a)std(b)
 | |
 cov(b,a)/std(b)std(a) var(b)/std(b)std(b)=1/
 numpy.corrcoef(a, b) -> 相关性矩阵 
代码：corr.py 
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

N = 5

def getStockData(csvName, skip=1):
    dates, closes = np.loadtxt(
        csvName,
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=skip,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def plot(dates, closes1, closes2):
    mp.figure("Correlation Of Returns", facecolor="lightgray")
    mp.title("Correlation Of Returns", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    mp.plot(dates[:-1], closes1, c="lightgray", label="Closing Price1")
    mp.plot(dates[:-1], closes2, c="orangered", label="Closing Price2")
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    csvName1 = "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV"
    csvName2 = "C:/Users/Hasee/Desktop/stocks/深证A股/000159.SZ.CSV"
    d1, c1 = getStockData(csvName1)
    d2, c2 = getStockData(csvName2, skip=3468)
    # diff数组中的后一个减去前一个的到size为n-1的另一个数组，returns代表收益率
    returns1 = np.diff(c1) / c1[:-1]
    returns2 = np.diff(c2) / c2[:-1]
    ave_a = np.mean(returns1)
    dev_a = returns1 - ave_a
    var_a = np.mean(dev_a * dev_a)
    std_a = np.sqrt(var_a)

    ave_b = np.mean(returns2)
    dev_b = returns2 - ave_b
    var_b = np.mean(dev_b * dev_b)
    std_b = np.sqrt(var_b)

    cov_ab = np.mean(dev_a * dev_b)
    cov_ba = np.mean(dev_b * dev_a)
    covs = np.array([
        [var_a, cov_ab],
        [cov_ba, var_b]
    ])
    stds = np.array([
        [std_a * std_a, std_a * std_b],
        [std_b * std_a, std_b * std_b]
    ])
    corr = covs / stds
    print(corr)
    corr = np.corrcoef(returns1, returns2)
    print(corr)
    plot(d1, returns1, returns2)
 
13. 多项式拟合 
用一个无穷级数表示一个可微函数。实际上任何可微函数，总可以用一个N次多项式函数来近似，而比N次幂更高阶的部分可以作为无穷小量而被忽略不计。
 f(x)=p0x^n + p1x^n-1 + p2x^n-2 +…+ pnx^0
 y0 = f(x0)
 y1 = f(x1)
 y2 = f(x2)
 …
 yn = f(xn)
 numpy.ployfit(自变量数组,函数值数组,最高次幂) -> [p0,p1,…,pn]
 numpy.polyval([p0,p1,…,pn], 自变量数组) -> 函数值数组
 numpy.roots([p0,p1,…,pn]) -> 多项式方程的根
 numpy.polyder([p0,p1,…,pn]) -> 导函数系数数组 
代码：poly.py 
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

N = 5

def getStockData(csvName, skip=1):
    dates, closes = np.loadtxt(
        csvName,
        max_rows=31,
        delimiter=",",
        skiprows=skip,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def plot(dates, closes, pcloses, d, p):
    mp.figure("Polynomial Fitting", facecolor="lightgray")
    mp.title("Polynomial Fitting", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    mp.scatter(dates, closes, c="limegreen", alpha=0.5, s=60, label="Difference Price")
    mp.plot(dates, pcloses, c="dodgerblue", linewidth=3, label="Polynomial Fitting")
    for i in range(1, d.size):
        mp.annotate("", xytext=(d[i-1], p[i-1]), xy=(d[i], p[i]), size=40, arrowprops=dict(arrowstyle="fancy", color="orangered", alpha=0.25))
    mp.scatter(d, prices, marker="^", c="orangered", s=80, label="Peek", zorder=4)
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    csvName1 = "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV"
    csvName2 = "C:/Users/Hasee/Desktop/stocks/深证A股/000159.SZ.CSV"
    d1, c1 = getStockData(csvName1)
    d2, c2 = getStockData(csvName2, skip=3468)
    days = d1.astype(int)
    diff_closing_prices = c1 - c2
    p = np.polyfit(days, diff_closing_prices, 5)
    poly_closing_prices = np.polyval(p, days)
    q = np.polyder(p)
    roots = np.roots(q)
    reals = roots[np.isreal(roots)].real # 拿出实数部分
    peeks = [[days[0], np.polyval(p, days[0])]]
    for real in reals:
        if days[0] < real and real < days[-1]:
            peeks.append([real, np.polyval(p, real)])
    peeks.append([days[-1], np.polyval(p, days[-1])])
    peeks.sort()
    peeks = np.array(peeks)
    dates, prices = np.hsplit(peeks, 2)
    dates.astype(int).astype("M8[D]").astype(md.datetime.datetime)
    plot(days, diff_closing_prices, poly_closing_prices, dates, prices)
 
14. 符号数组 
a:[10 -20 30 0 40 -50 -60 0 70]
 numpy.sign(a) -> [1 -1 1 0 1 -1 -1 0 1]
 净额成交量（OBV）
 numpy.piecewise(被判断数组, [条件1,条件2,…],[标志1,标志2,…]) -> 满足每个条件的标志数组 
代码：obv.py 
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

def getStockData(csvName, skip=1):
    dates, closes, volumes = np.loadtxt(
        csvName,
        max_rows=31,
        delimiter=",",
        skiprows=skip,
        usecols=(2, 7, 8),
        unpack=True,
        dtype="M8[D], f8, f8",
    )
    return dates, closes, volumes

def plot(dates, obvs):
    mp.figure("On-Balance Volume", facecolor="lightgray")
    mp.title("On-Balance Volume", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("OBV", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(axis="y", linestyle=":")
    dates = dates[1:].astype(md.datetime.datetime)
    rise = obvs > 0
    fall = obvs < 0
    fc = np.zeros(dates.size, dtype="3f4")
    ec = np.zeros(dates.size, dtype="3f4")
    fc[rise], fc[fall] = (1, 0, 0), (0, 0.5, 0)
    ec[rise], ec[fall] = (1, 1, 1), (1, 1, 1)
    mp.bar(dates, obvs, color=fc, edgecolor=ec, label="OBV")

    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    csvName1 = "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV"
    d, c, v = getStockData(csvName1)
    diff_closing_price = np.diff(c) # diff差分数组
    # sign_closing_price = np.sign(diff_closing_price)
    sign_closing_price = np.piecewise(diff_closing_price, [diff_closing_price < 0, diff_closing_price == 0, diff_closing_price > 0], [-1, 0, 1])
    obvs = v[1:] * sign_closing_price
    plot(d, obvs)
 
15. 矢量化 
def 标量函数(标量参数1,标量参数2, …):
 …
 return 标量返回值1, 标量返回值2, …
 np.vectorize(标量函数) -> 矢量函数
 矢量函数(矢量参数1,矢量参数2,…) -> 矢量返回值1,矢量返回值2,… 
代码：vec.py 
import numpy as np

def fun(a, b):
    return a + b, a - b, a * b

A = np.array([10, 20, 30])
B = np.array([100, 200, 300])
C = np.vectorize(fun)(A, B)
print(C)
 
代码：sim.py 
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

def getStockData(csvName, skip=1):
    dates, opens, highest, lowest, closes = np.loadtxt(
        csvName,
        max_rows=31,
        delimiter=",",
        skiprows=skip,
        usecols=(2, 4, 5, 6, 7),
        unpack=True,
        dtype="M8[D], f8, f8, f8, f8",
    )
    return dates, opens, highest, lowest, closes

def profit(opens, highest, lowest, closes):
    buying_price = opens * 0.99
    if lowest <= buying_price <= highest:
        return (closes - buying_price) * 100 / buying_price
    return np.nan

def plot(dates, profits, gain_dates, gain_profits, loss_dates, loss_profits):
    mp.figure("Trading Simulation", facecolor="lightgray")
    mp.title("Trading Simulation", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Profit", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    if dates.size > 0:
        dates = dates.astype(md.datetime.datetime)
        mp.plot(dates, profits, c="gray", label="Profit")
        mp.axhline(y=profits.mean(), linestyle=":", color="gray")
    if gain_dates.size > 0:
        gain_dates = gain_dates.astype(md.datetime.datetime)
        mp.plot(gain_dates, gain_profits, "o", c="orangered", label="Gain Profit")
        mp.axhline(y=gain_profits.mean(), linestyle=":", color="orangered")
    if loss_dates.size > 0:
        loss_dates = loss_dates.astype(md.datetime.datetime)
        mp.plot(loss_dates, loss_profits, "o", c="limegreen", label="Loss Profit")
        mp.axhline(y=loss_profits.mean(), linestyle=":", color="limegreen")

    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    csvName1 = "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV"
    d, o, h, l, c = getStockData(csvName1)
    profits = np.vectorize(profit)(o, h, l, c)
    nan = np.isnan(profits)
    dates, profits = d[~nan], profits[~nan]
    gain_dates, gain_profits = dates[profits > 0], profits[profits > 0]
    loss_dates, loss_profits = dates[profits < 0], profits[profits < 0]
    plot(dates, profits, gain_dates, gain_profits, loss_dates, loss_profits)
 
16. 数据平滑与特征值 
卷积降噪--------->曲线拟合-------->特征值
 消除随机噪声干扰 获得数学模型 反映业务特征 
y = f(x) -> y1=f(x1)
 y = g(x) -> y1=g(x1)
 f(x1)=g(x1)
 f(x1)-g(x1)=0
 f(x)-g(x)=0 的根就是x1
 np.polysub(p1,p2) -> p3
 np.roots(p3) -> x1 
代码：smr.py 
import datetime as dt
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md

N = 8

def getStockData(csvName, skip=1):
    dates, closes = np.loadtxt(
        csvName,
        # max_rows=191,
        max_rows=31,
        delimiter=",",
        skiprows=skip,
        usecols=(2, 7),
        unpack=True,
        dtype="M8[D], f8",
    )
    return dates, closes

def ma5_line_conv(closes):
    medios = np.convolve(closes, np.ones(N) / N, "valid")
    stds = np.zeros(medios.size)
    for i in range(stds.size):
        stds[i] = np.std(closes[i:i+N])
    lowers = medios - 2 * stds
    uppers = medios + 2 * stds
    return medios, lowers, uppers

def plot(dates, closes1, closes2, a_smooth_returns, b_smooth_returns, a_fitted_returns, b_fitted_returns, inters):
    mp.figure("Smoothing Returns", facecolor="lightgray")
    mp.title("Smoothing Returns", fontsize=20)
    mp.xlabel("Date", fontsize=14)
    mp.ylabel("Price", fontsize=14)
    ax = mp.gca()
    ax.xaxis.set_major_locator(md.WeekdayLocator(byweekday=md.MO))
    ax.xaxis.set_minor_locator(md.DayLocator())
    ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
    mp.tick_params(labelsize=10)
    mp.grid(linestyle=":")
    dates = dates.astype(md.datetime.datetime)
    d, r = np.hsplit(inters, 2)
    date = d.astype(int).astype("M8[D]").astype(md.datetime.datetime)
    mp.plot(dates[:-1], closes1, c="dodgerblue", label="Closing Price1", alpha=0.25)
    mp.plot(dates[:-1], closes2, c="orangered", label="Closing Price2", alpha=0.25)
    mp.plot(dates[N-1:-1], a_smooth_returns, c="dodgerblue", label="Smooth Price1", alpha=0.75)
    mp.plot(dates[N-1:-1], b_smooth_returns, c="orangered", label="Smooth Price2", alpha=0.75)
    mp.plot(dates[N-1:-1], a_fitted_returns, linewidth=3, c="dodgerblue", label="Fitted Price2")
    mp.plot(dates[N-1:-1], b_fitted_returns, linewidth=3, c="orangered", label="Fitted Price2")
    mp.scatter(date, r, marker="o", c="firebrick", s=30, lw=1, zorder=3)
    mp.legend()
    mp.gcf().autofmt_xdate()
    mp.show()

if __name__ == '__main__':
    csvName1 = "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV"
    csvName2 = "C:/Users/Hasee/Desktop/stocks/深证A股/000159.SZ.CSV"
    d1, c1 = getStockData(csvName1)
    d2, c2 = getStockData(csvName2, skip=3468)
    # diff数组中的后一个减去前一个的到size为n-1的另一个数组，returns代表收益率
    returns1 = np.diff(c1) / c1[:-1]
    returns2 = np.diff(c2) / c2[:-1]
    weights = np.hanning(N) # 汉宁窗
    weights /= weights.sum()
    a_smooth_returns = np.convolve(returns1, weights, "valid")
    b_smooth_returns = np.convolve(returns2, weights, "valid")

    days = d1[N-1:-1].astype(int)
    degree = 4
    a_p = np.polyfit(days, a_smooth_returns, deg=degree)
    a_fitted_returns = np.polyval(a_p, days)
    b_p = np.polyfit(days, b_smooth_returns, deg=degree)
    b_fitted_returns = np.polyval(b_p, days)
    sub_p = np.polysub(a_p, b_p)
    roots = np.roots(sub_p)
    reals = roots[np.isreal(roots)].real
    inters = []
    for real in reals:
        if days[0] <= real <= days[-1]:
            inters.append([real, np.polyval(a_p, real)])
    inters.sort()
    inters = np.array(inters)
    plot(d1, returns1, returns2, a_smooth_returns, b_smooth_returns, a_fitted_returns, b_fitted_returns, inters)
 
五、矩阵和ufunc 
1. 矩阵 
ndarray -> matrix
 numpy.matrix(可被解释为矩阵的二维容器,copy=[True]/False) -> 矩阵对象
 1 2 3
 4 5 6
 “1 2 3; 4 5 6”
 numpy.mat(可被解释为矩阵的二维容器)
 数据共享，相当于copy=False的matrix()
 numpy.bmat(“A B; C D”)
 A: 1 2 B: 5 6
 3 4 7 8
 C: 9 10 D: 13 14
 11 12 15 16
 T - 转置
 I - 逆矩阵
 A x B = E: 1 0 0
 0 1 0
 0 0 1
 B = A.I
 a = np.array([
 [1,2],
 [3,4]
 ])
 b = np.array([
 [5,6],
 [7,8]
 ])
 c = a * b
 5 12
 21 32 
a = np.mat([
 [1,2],
 [3,4]
 ])
 b = np.mat([
 [5,6],
 [7,8]
 ])
 c = a * b:
 19 22
 43 50 
       A D
       B E
       C F
a b c  m n
d e f  o p
m = aA+bB+cC
n = aD+bE+cF
o = dA+eB+fC
p = dD+eE+fF
 
代码：mat.py 
import numpy as np

a = np.array([
    [1, 2],
    [3, 4],
])
b = np.matrix(a, copy=False)
print(b, type(b))
c = np.mat(a)
print(c, type(c))
a *= 10
print(a)
print(b)
print(c)
d = np.mat("1 2; 3 4")
print(d)
e = np.mat("5 6; 7 8")
f = np.bmat("d e")
print(f)
g = np.bmat("d; e")
h = d.I
print(h)
print(h * d)
i = f.I # 广义逆矩阵，适用于非方阵
print(i)
print(i * f)
j = np.array([
    [5, 6],
    [7, 8]
])
k = a * j
print(a, j, k, sep="n")
a = np.mat(a)
j = np.mat(j)
k = a * j
print(a, j, k, sep="n")
 
2. ufunc，统一（泛）化函数 
2.1 numpy.frompyfunc(标量函数,参数个数,返回值个数) 
返回-> numpy.ufunc类型的函数对象
 ufunc函数对象(矢量参数,…) -> 矢量返回值,… 
代码：vec2.py 
import numpy as np

def fun(a, b):
    return a + b, a - b, a * b

A = np.array([10, 20, 30])
B = np.array([100, 200, 300])
C = np.vectorize(fun)(A, B)
print(C)
C = np.frompyfunc(fun, 2, 3)(A, B)
print(C)

def foo(a):
    def bar(b):
        return a + b, a - b, a * b
    return np.frompyfunc(bar, 1, 3)

C = foo(100)(A)
print(C)
C = foo(B)(A)
print(C)
 
2.2 numpy.add 
reduce - 累加
 accumulate - 累加过程
 reduceat - 在指定位置累加
 outer - 外和 
代码：add.py 
import numpy as np

a = np.arange(1, 7)
print(a)
b = np.add.reduce(a)
print(b)
c = np.add.accumulate(a)
print(c)
d = np.add.reduceat(a, [0, 2, 4])
#   4   7   11
# 1 2 3 4 5 6
# 0   2   4
print(d)
e = np.add.outer([10, 20, 30], a) # 外和
#  +  1  2  3  4  5  6
# 10 11 12 13 14 15 16
# 20 21 22 23 24 25 26
# 30 31 32 33 34 35 36
print(e)
f = np.outer([10, 20, 30], a) # 外积
#  *  1  2  3   4   5   6
# 10 10 20 30  40  50  60
# 20 20 40 60  80 100 120
# 30 30 60 90 120 150 180
print(f)
 
2.3 除法 
A.真除
 numpy.true_divide()
 numpy.divide()
 /
 [5 5 -5 -5]<真除>[2 -2 2 -2]=[2.5 -2.5 -2.5 2.5]
 B.地板除
 numpy.floor_divide()
 //
 [5 5 -5 -5]<地板除>[2 -2 2 -2]=[2 -3 -3 2]
 C.天花板除
 [5 5 -5 -5]<天花板除>[2 -2 2 -2]=[3 -2 -2 3]
 D.截断除
 [5 5 -5 -5]<真除>[2 -2 2 -2]=[2 -2 -2 2] 
代码：div.py 
import numpy as np

a = np.array([5, 5, -5, -5])
b = np.array([2, -2, 2, -2])
print(a, b)
c = np.true_divide(a, b)
d = np.divide(a, b)
e = a / b
print(c, d, e)
f = np.floor_divide(a, b)
g = a // b
print(f, g)
h = np.ceil(a / b).astype(int)
print(h)
i = np.trunc(a / b).astype(int)
print(i)
j = (a / b).astype(int)
print(j)
 
2.4 余数 
被除数<除以>除数=商…余数
 除数<乘以>商+余数=被除数
 地板余数：做地板除所得到的余数
 numpy.remainder()
 numpy.mod()
 %
 [5 5 -5 -5]<地板除>[2 -2 2 -2]=[2 -3 -3 2]…[1 -1 1 -1]
 截断余数：做截断除所得到的余数
 numpy.fmod()
 [5 5 -5 -5]<截断除>[2 -2 2 -2]=[2 -2 -2 2]…[1 1 -1 -1] 
代码：mod.py 
import numpy as np

a = np.array([5, 5, -5, -5])
b = np.array([2, -2, 2, -2])

print(a, b)
c = np.remainder(a, b)
d = np.mod(a, b)
e = a % b
print(c, d, e)
f = np.fmod(a, b)
print(f)
 
 python中几乎所有的算术和关系运算符都被numpy借助ufunc实现为可对数组操作的矢量化运算符
 代码：fibo.py
 1 1 1 1 1 1
 1 0 1 0 1 0
 1 1 2 1 3 2 5 3
 1 0 1 1 2 1 3 2 …
 f1f2 f3 f4 f5 fn
  F^2 3 4 n-1
  
import numpy as np

N = 50

fn = int((np.mat("1. 1.; 1. 0.") ** (N -1))[0, 0])
print(fn)
r = np.sqrt(5)
res = int((((1 + r) / 2) ** N - ((1-r) / 2) ** N) / r)
print(res)
 
2.5 numpy中的三角函数都是ufunc对象，可以对参数数组中的每个元素进行三角函数运算，并将运算结果以数组形式返回 
x = Asin(at+pi/2)
 y = Bsin(bt) 
4 sin((2k-1)t)
 — x ------------
 pi 2k-1 
代码：lissa.py 
import numpy as np
import matplotlib.pyplot as mp

A, a, B, b = 10, 9, 5, 8
t = np.linspace(0, 2 * np.pi, 201)
x = A * np.sin(a * t + np.pi / 2)
y = B * np.sin(b * t)

mp.figure("Lissajous", facecolor="lightgray")
mp.title("Lissajous", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.grid(linestyle=":")
mp.plot(
    x, y,
    c="orangered",
    label="Lissajous"
)
mp.legend()
mp.show()
 
代码：squr.py 
import numpy as np
import matplotlib.pyplot as mp

def wave(n):
    k = np.arange(1, n + 1)
    def f(x):
        return 4 / np.pi * np.sum(np.sin((2 * k - 1) * x) / (2 * k - 1))
    return np.frompyfunc(f, 1, 1)

x = np.linspace(0, 2 * np.pi, 201)
y1 = wave(1)(x)
y2 = wave(2)(x)
y3 = wave(10)(x)
y4 = wave(100)(x)

mp.figure("Lissajous", facecolor="lightgray")
mp.title("Lissajous", fontsize=20)
mp.xlabel("x", fontsize=14)
mp.ylabel("y", fontsize=14)
mp.grid(linestyle=":")
mp.plot(
    x, y1,
    c="orangered",
    label="n=1"
)
mp.plot(
    x, y2,
    c="dodgerblue",
    label="n=2"
)
mp.plot(
    x, y3,
    c="limegreen",
    label="n=10"
)
mp.plot(
    x, y4,
    c="gray",
    label="n=100"
)
mp.legend()
mp.show()
 
2.6 实现位运算的ufunc 
A.异或：^/xor/bitwise_xor
 1 ^ 0 = 1
 1 ^ 1 = 0
 0 ^ 0 = 0
 0 ^ 1 = 1
 运用：if a^b < 0 then a 和b异号 
B.与：&/and/bitwise_and
 1 & 0 = 0
 1 & 1 = 1
 0 & 0 = 0
 0 & 1 = 0
 运用：
 1 2^0 00000001 -1 -> 00000000
 2 2^1 00000010 -1 -> 00000001
 4 2^2 00000100 -1 -> 00000011
 8 2^3 00001000 -1 -> 00000111
 16 2^4 00010000 -1 -> 00001111
 __&__/
 |
 0
 if a & (a-1) == 0 then a 是2的幂 
C.移位
 #<< /lshift/left_shift (除2)
 #>> /rshift/right_shift (乘2)
 代码：bit.py 
import numpy as np

a = np.array([0, 1, -2, 3, 4])
b = np.array([-5, 6, 7, 8, -9])
c = a ^ b
# c = a.__xor__(b)
# c = np.bitwise_xor(a, b)
print(a, b, c, sep="n")
print(np.where(c < 0)[0])
d = np.arange(1, 21)
print(d)
e = d & (d - 1)
print(e)
print(d[e == 0])
 
六、Numpy的子模块 
1. 线性代数模块 
1.1 矩阵的逆：inv() 
在线性代数中，矩阵A与其逆矩阵A^-1的乘积是一个单位矩阵I。
 使用numpy.linalg.inv()函数求矩阵的逆矩阵，要求必须是方阵，即行列数相等的矩阵。 
代码：inv.py 
import numpy as np

a = np.matrix("1 2 3; 8 9 4; 7 6 5")
print(a)
b = np.linalg.inv(a)
print(b)
c = a * b
print(c)
 
1.2 解线（一次）性方程组：solve() 
x-2y+z=0
 2y-8z-8=0
 -4x+5y+9z+9=0 
1x + -2y + 1z = 0
 0x + 2y + -8z = 8
 -4x + 5y + 9z = -9 
1 -2 1 x 0
 0 2 -8 X y = 8
 -4 5 9 z -9 
a x b
 x = numpy.linalg.lstsq(a,b)[0]
 x = numpy.linalg.solve(a,b) 
代码：solve.py 
import numpy as np

A = np.mat("1 -2 1; 0 2 -8; -4 5 9")
B = np.mat("0; 8; -9")
X = np.linalg.solve(A, B)
print(X)
 
1.3 特征值和特征向量：eig 
对于n阶方阵A，如果存在数a和非零n维向量x，使得Ax = ax，则称a是矩阵A的一个特征值，x是矩阵A属于特征值a的特征向量。
 numpy.linalg.eig(A) -> a, x
 a: 1 2
 | |
 x: 1 2
 3 4
 5 6 
代码：eig.py 
import numpy as np

A = np.mat("3 -2; 1 0")
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)
print(eigvecs)
print(A * eigvecs[:, 0])
print(eigvals[0] * eigvecs[:, 0])
 
1.4 奇异值分解：svd() 
对于一个满足特定条件的矩阵M，可以被分解为三个矩阵的乘积，M=USV，其中U和V都是正交矩阵，即UU^T=I，VVT=I，S矩阵除主对角线上以外的元素均为0，主对角线上的元素被称为矩阵M的奇异值。
 numpy.linalg.svd(M) -> U,S主对角线上的元素,V 
代码：svd.py 
import numpy as np

M = np.mat("4 11 14; 8 7 -2")
print(M)
U, s, V = np.linalg.svd(M, full_matrices=False)
S = np.diag(s)
print(U, S, V, sep="n")
print(U * U.T, V * V.T, sep="n")
print("usv", U * S * V)
 
1.5 广义逆矩阵：pinv() 
代码：pinv.py 
import numpy as np

A = np.mat("11 12 13 14; 20 21 22 15; 19 18 17 16")
print("A:", A)
B = np.linalg.pinv(A)
print("B:", B)
C = A * B
print("C:", C)
 
1.6 行列式：det() 
a b
 c d ad-bc 
a b c
 d e f
 g h i
 a(ei-fh)-b(di-fg)+c(dh-eg)
 numpy.linalg.det(方阵) -> 行列式的值 
代码：det.py 
import numpy as np

A = np.mat("2 1; 3 4")
print(A)
a = np.linalg.det(A)
print(a)
B = np.mat("3 2 1; 4 9 8; 5 6 7")
print(B)
b = np.linalg.det(B)
print(b)
 
2. 快速傅里叶变换（fft) 
s = F(t) -> (A/P,fai) = G(f) 
y = Asin(wx+fai)
 w1 -> A1,f1
 w2 -> A2,f2
 …
 (A,fai) = f(w)
  
numpy.fft.fftfreq(采样总数, 采样周期) -> 采样频率数组
 numpy.fft.fft(原始信号数组) -> 返回复数数组(Ai+faij)
 A = np.abs(复数数组) 
代码：fft.py 
import numpy as np
import matplotlib.pyplot as mp
import numpy.fft as nf

times = np.linspace(0, 2 * np.pi, 201)
sigs1 = 4 / (1 * np.pi) * np.sin(1 * times)
sigs2 = 4 / (3 * np.pi) * np.sin(3 * times)
sigs3 = 4 / (5 * np.pi) * np.sin(5 * times)
sigs4 = 4 / (7 * np.pi) * np.sin(7 * times)
sigs5 = 4 / (9 * np.pi) * np.sin(9 * times)
sigs6 = sigs1 + sigs2 + sigs3 + sigs4 + sigs5
freqs = nf.fftfreq(times.size, times[1]-times[0]) # 返回角频率数组
ffts = nf.fft(sigs6) # 返回复数数组
pows = np.abs(ffts)

sigs7 = nf.ifft(ffts)


mp.figure("FFT", facecolor="lightgray")
mp.subplot(121)
mp.title("Time Domain", fontsize=16)
mp.xlabel("Time", fontsize=12)
mp.ylabel("Signal", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(times, sigs1, label="{:.4f}".format(1 / (2 * np.pi)))
mp.plot(times, sigs2, label="{:.4f}".format(3 / (2 * np.pi)))
mp.plot(times, sigs3, label="{:.4f}".format(5 / (2 * np.pi)))
mp.plot(times, sigs4, label="{:.4f}".format(7 / (2 * np.pi)))
mp.plot(times, sigs5, label="{:.4f}".format(9 / (2 * np.pi)))
mp.plot(times, sigs6, label="{:.4f}".format(1 / (2 * np.pi)))
mp.plot(times, sigs7, linewidth=6, alpha=0.5, label="{:.4f}".format(1 / (2 * np.pi)))
mp.legend()

mp.subplot(122)
mp.title("Frequency domain", fontsize=16)
mp.xlabel("Frequency", fontsize=12)
mp.ylabel("Power", fontsize=12)
mp.plot(freqs[freqs >= 0], pows[freqs >= 0], c="orangered", label="Frequency Spectrum")
mp.legend()
mp.tight_layout()
mp.show()
 
代码：filter.py 
import numpy as np
import numpy.fft as nf
import scipy.io.wavfile as wf
import matplotlib.pyplot as mp

sample_rate, noised_sigs = wf.read("D:/pythonStudy/datalysis/notes/noised.wav")
noised_sigs = noised_sigs / 2 ** 15
times = np.arange(len(noised_sigs)) / sample_rate
freqs = nf.fftfreq(times.size, d=1/sample_rate)
noised_ffts = nf.fft(noised_sigs)
noised_pows = np.abs(noised_ffts)
fund_freq = freqs[noised_pows.argmax()]
noised_indices = np.where(np.abs(freqs) != fund_freq)
filter_ffts = noised_ffts.copy()
filter_ffts[noised_indices] = 0
filter_pows = np.abs(filter_ffts)

filter_sigs = nf.ifft(filter_ffts).real
wf.write("D:/pythonStudy/datalysis/notes/filter.wav", sample_rate, (filter_sigs * 2 ** 15).astype(np.int16))

mp.figure("Filter", facecolor="lightgray")
mp.subplot(221)
mp.title("Time Domaiin", fontsize=16)
mp.ylabel("Signal", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(times[:178], noised_sigs[:178], c="orangered", label="Noised")
mp.legend()

mp.subplot(222)
mp.title("Frequency Domain", fontsize=16)
mp.ylabel("Power", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.semilogy(freqs[freqs >= 0], noised_pows[freqs >= 0], label="noised", c="limegreen")
mp.legend()

mp.subplot(224)
mp.xlabel("Frequency", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(freqs[freqs >= 0], filter_pows[freqs >= 0], label="Filter", c="dodgerblue")
mp.legend()

mp.subplot(223)
mp.title("Time Domaiin", fontsize=16)
mp.xlabel("Time", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(times[:178], filter_sigs[:178], c="hotpink", label="Filter")
mp.legend()

mp.tight_layout()
mp.show()
 
3. 随机数模块(random) 
3.1 二项分布 
numpy.random.binomial(n, p, size)
 ——>包含size个随机数的数组，其中每个随机数来自n次尝试中的成功次数，每次尝试成功的概率为p。 
猜硬币游戏：初始筹码1000，每轮猜9次，猜对5次或5次以上为赢，筹码加一，否则为输，筹码减一。模拟10000轮，记录筹码数的变化。binomial(9, 0.5, 10000) 
代码：bi.py 
import numpy as np
import matplotlib.pyplot as mp

outcomes = np.random.binomial(9, 0.5, 10000)
chips = [1000]
for outcome in outcomes:
    if outcome >= 5:
        chips.append(chips[-1] + 1)
    else:
        chips.append(chips[-1] - 1)
chips = np.array(chips)
if chips[-1] < chips[0]:
    color = "limegreen"
elif chips[-1] > chips[0]:
    color = "orangered"
else:
    color = "dodgerblue"

mp.figure("Binomial", facecolor="lightgray")
mp.title("Binomial", fontsize=16)
mp.xlabel("Round", fontsize=12)
mp.ylabel("Chips", fontsize=12)
mp.tick_params(labelsize=10)
mp.plot(chips, c=color, label="Ships")
mp.legend()
mp.show()
 
3.2 超几何分布 
numpy.random.hypergeometric(ngood, nbad, nsample, size)
 ——>包含size个随机数的数组，其中每个随机数来自随机抽取nsample个样本中好样本的个数，总样本中共有ngood个好样本，nbad个坏样本。 
摸球球游戏：将25个好球和1个坏球放在一起，每轮摸出3个球，全为好球加1分，若有坏球则减6分。模拟100轮，记录分值的变化。hypergeometric(25, 1, 3, 100)
 代码：hyper.py 
import numpy as np
import matplotlib.pyplot as mp

outcomes = np.random.hypergeometric(25, 1, 3, 100)
scores = [0]
for outcome in outcomes:
    if outcome == 3:
        scores.append(scores[-1] + 1)
    else:
        scores.append(scores[-1] - 7)
scores = np.array(scores)
if scores[-1] < scores[0]:
    color = "limegreen"
elif scores[-1] > scores[0]:
    color = "orangered"
else:
    color = "dodgerblue"

mp.figure("Hypergeometric", facecolor="lightgray")
mp.title("Hypergeometric", fontsize=16)
mp.xlabel("Round", fontsize=12)
mp.ylabel("scores", fontsize=12)
mp.tick_params(labelsize=10)
mp.plot(scores, c=color, label="Score")
mp.legend()
mp.show()
 
3.3 正态分布 
numpy.random.normal(size) -> 包含size个随机数的数组，其中每个随机数服从标准正态分布规律，即平均值为0，标准差为1的正态分布。
 【1 1 2 1 1 2 2 2 5 1 2 3 … 10 10】
 【1,3】20
 【4,6】60
 【7,9】40
 代码：norm.py 
import numpy as np
import matplotlib.pyplot as mp

samples = np.random.normal(size=10000)
mp.figure("Standard Normal", facecolor="lightgray")
mp.title("Standard Normal", fontsize=16)
mp.xlabel("Sample", fontsize=12)
mp.ylabel("Occurrence", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(axis="y", linestyle=":")
bins = mp.hist(samples, 100, density=True, edgecolor="steelblue", facecolor="deepskyblue", label="Standard Normal")[1]
# hist直方图，density，归一化，100画100个方块
# bins返回每一个块的左边的横坐标
probs = np.exp(-bins ** 2 / 2) / np.sqrt(2 * np.pi)
mp.plot(bins, probs, "o-", c="orangered", label="Probability")
mp.legend()
mp.show()
 
七、Numpy的专用函数 
1. 间接联合排序 
间接：获取排序样本的下标。
 原始序列：8 2 3 1 7 4 6 5 9
 序列下标：0 1 2 3 4 5 6 7 8
 直接排序：1 2 3 4 5 6 7 8 9
 间接排序：3 1 2 5 7 6 4 0 8
 姓名：张三 李四 王五 赵六 陈七
 成绩：90 70 50 80 60
 2 4 1 3 0
 年龄：20 30 30 20 40
 3 0 2 1 4
 numpy.lexsort((参考序列, 待排序列)) -> 索引序列
 numpy.sort_complex(复数数组) -> 按实部的升序排列，实部相同的参考虚部的升序 
代码：sort.py 
import numpy as np

names = np.array([
    "张三",
    "李四",
    "王五",
    "赵六",
    "陈七"
])
scores = np.array([90, 70, 50, 80, 60])
ages = np.array([20, 30, 30, 20, 40])
# 按照年龄的升序打印姓名
# 年龄相同的按成绩的升序
print(names[np.lexsort((scores, ages))])
c = ages + scores * 1j
print(c)
d = np.sort_complex(c)
print(d)
 
2. 最大值最小值 
numpy.xxx
 max/min
 argmax/argmin
 nanmax/nanmin
 nanargmax/nanargmin 
max - 最大值
 min - 最小值
 arg - 间接下标
 nan - 忽略无效值 
代码:nan.py 
import numpy as np

a = np.array([1, 2, np.nan, 3, 4])
print(np.max(a), np.min(a))
print(np.argmax(a), np.argmin(a))
print(np.nanmax(a), np.nanmin(a))
print(np.nanargmax(a), np.nanargmin(a))
 
3. 有序插入 
有序序列：[1, 2, 4, 5, 6, 8, 9]
 被插序列：[7, 3]
 将被插序列插入到有序序列的什么位置，结果还是有序的？
 numpy.searchsorted(有序序列，被插序列) -> 插入位置
 numpy.insert(有序序列, 插入位置, 被插序列) -> 插入结果 
代码：insert.py 
import numpy as np

a = np.array([1, 2, 4, 5, 6, 8, 9])
b = np.array([7, 3])
c = np.searchsorted(a, b)
print(c)
d = np.insert(a, c, b)
print(d)
 
4. 定积分 
y = f(x)
 /b
 |f(x)dx
 /a
 import scipy.integrate as si
 def f(x):
 y = …x…
 return y
 si.quad(f, a, b)[0] -> 定积分值 
代码：integ.py 
import numpy as np
import scipy.integrate as si
import matplotlib.pyplot as mp
import matplotlib.patches as mc

def f(x):
    y = 2 * x ** 2 + 3 * x + 4
    return y
a, b = -5, 5
quad = si.quad(f, a, b)[0]
print(quad)
x1 = np.linspace(a, b, 1001)
y1 = f(x1)

n = 50
x2 = np.linspace(a, b, n + 1)
y2 = f(x2)
area = 0
for i in range(n):
    area += (y2[i] + y2[i + 1]) * (x2[i + 1] - x2[i]) / 2
print(area)

mp.figure("Integral", facecolor="lightgray")
mp.title("Integral", fontsize=16)
mp.xlabel("x", fontsize=12)
mp.ylabel("y", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(x1, y1, linewidth=8, c="orangered", label="$y=2x^2+3x+4$", zorder=0)
for i in range(n):
    mp.gca().add_patch( # 将图列加入到当前窗口
        mc.Polygon(
            [[x2[i], 0],
             [x2[i], y2[i]],
             [x2[i + 1], y2[i + 1]],
             [x2[i + 1], 0]],
            fc="deepskyblue",
            ec="dodgerblue",
            alpha=0.5
        ))
mp.legend()
mp.show()
 
5. 插值 
 
import scipy.interpolate as si
 si.interp1d(离散样本水平坐标, 离散样本垂直坐标, kind=插值器种类) -> 一维插值器对象
 一维插值器对象(插值样本水平坐标) -> 插值样本垂直坐标 
代码：inter.py 
import numpy as np
import scipy.interpolate as si
import matplotlib.pyplot as mp

min_x, max_x = -2.5, 2.5
con_x = np.linspace(min_x, max_x, 1001)
con_y = np.sinc(con_x)
dis_x = np.linspace(min_x, max_x, 11)
dis_y = np.sinc(dis_x)
linear = si.interp1d(dis_x, dis_y)
lin_x = np.linspace(min_x, max_x, 51)
lin_y = linear(lin_x)
cubic = si.interp1d(dis_x, dis_y, kind="cubic")
cub_x = np.linspace(min_x, max_x, 51)
cub_y = cubic(cub_x)

mp.figure("Interpolation", facecolor="lightgray")
mp.subplot(221)
mp.title("Continuous", fontsize=16)
mp.ylabel("y", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(con_x, con_y, c="hotpink", label="Continuous")
mp.legend()

mp.subplot(222)
mp.title("Discrete", fontsize=16)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.scatter(dis_x, dis_y, c="hotpink", label="Continuous")
mp.legend()

mp.subplot(223)
mp.title("Linear", fontsize=16)
mp.xlabel("x", fontsize=12)
mp.ylabel("y", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(lin_x, lin_y, "o-", c="limegreen", label="Linear")
mp.scatter(dis_x, dis_y, c="hotpink", zorder=3)
mp.legend()

mp.subplot(224)
mp.title("Linear", fontsize=16)
mp.xlabel("x", fontsize=12)
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
mp.plot(cub_x, cub_y, "o-", c="dodgerblue", label="Cubic")
mp.scatter(dis_x, dis_y, c="hotpink", zorder=3)
mp.legend()

mp.show()
 
6. 金融计算 
6.1 绘制K线图 
import mpl_finance as mf
 mf.candlestick_ohlc(坐标图对象, 日期和开高低收价格数组, K线实体部分的宽度(0-1), 阳线颜色, 阴线颜色) 
代码：k.py 
import numpy as np
import matplotlib.pyplot as mp
import matplotlib.dates as md
import mpl_finance as mf
dates, openings, highests, lowests, closes = np.loadtxt(
    "C:/Users/Hasee/Desktop/stocks/深证A股/000166.SZ.CSV",
    max_rows=191,
    delimiter=",",
    skiprows=1,
    usecols=(2, 4, 5, 6, 7),
    unpack=True,
    dtype="M8[D], f8, f8, f8, f8"
)

mp.figure("Candle", facecolor="lightgray")
mp.title("Candle", fontsize=16)
mp.xlabel("Date", fontsize=12)
mp.ylabel("Price", fontsize=12)
ax = mp.gca()
ax.xaxis.set_major_locator(md.MonthLocator())
ax.xaxis.set_major_formatter(md.DateFormatter("%d %b %Y"))
mp.tick_params(labelsize=10)
mp.grid(linestyle=":")
dates = dates.astype(md.datetime.datetime)
dates = md.date2num(dates)
ohlcs = np.column_stack((dates, openings, highests, lowests, closes))
mf.candlestick_ohlc(mp.gca(), ohlcs, 0.8, "orangered", "limegreen")
mp.gcf().autofmt_xdate()
mp.show()
 
6.2 终值和现值 
代码：fin.py 
import numpy_financial as nf
# 终值=nf.fv(利率, 期数, 每期支付, 现值)
# 将1000元以1%的年利率存入银行5年，每年加存100元
# 到期后本息合计多少钱？
fv = nf.fv(0.01, 5, -100, -1000)
print(round(fv, 2))

# 现值=nf.pv(利率, 期数, 每期支付, 终值)
# 将多少钱以1%的年利率存入银行5年，每年加存100元
# 到期后本息合计fv元？
pv = nf.pv(0.01, 5, -100, fv)
print(round(pv, 2))

# 净现值=nf.npv(利率, 现金流)
# 将1000元以1%的年利率存入银行5年，每年加存100元
# 相当于一次性存入多少元？
npv = nf.npv(0.01, [-1000, -100, -100, -100, -100, -100])
print(round(npv, 2))
fv = nf.fv(0.01, 5, 0, npv)
print(round(fv, 2))

# 内部收益率=nf.irr(现金流)
# 将1000元存入银行5年，以后逐年提现100元，200元，300元，400元，500元
# 银行的年利率达到多少，可在最后一次提现后偿清全部本息？即净现值为0
irr = nf.irr([-1000, 100, 200, 300, 400, 500])
print(round(irr, 2))
npv = nf.npv(irr, [-1000, 100, 200, 300, 400, 500])
print(round(npv, 2))

# 每期支付=nf.pmt(利率, 期数, 现值)
# 以1%的年利率从银行贷款1000元，分5年还清
# 平均每年还多少钱？
pmt = nf.pmt(0.01, 5, 1000)
print(round(pmt, 2))

# 期数=nf.nper(利率, 每期支付, 现值)
# 以1%的年利率贷款1000元，平均每年换pmt元
# 多少年后能还清？
nper = nf.nper(0.01, pmt, 1000)
print(nper)

# 利率=nf.rate(期数, 每期支付, 现值, 终止)
# 从银行贷款1000元，平均每年还pmt元，nper年还清
# 年利率是多少？
rate = nf.rate(nper, pmt, 1000, 0)
print(round(rate, 2))

datanlysis

Python相关栏目本月热门文章