The Road to Deep Learning 2: A Summary of Using NumPy

Introduction to NumPy

As the name suggests, NumPy can be read as "Numerical Python": a Python library for working with numbers. The following passage from the official NumPy website introduces it:

NumPy is the fundamental package for scientific computing in Python.
It is a Python library that provides a multidimensional array object,
various derived objects (such as masked arrays and matrices), and an
assortment of routines for fast operations on arrays, including
mathematical, logical, shape manipulation, sorting, selecting, I/O,
discrete Fourier transforms, basic linear algebra, basic statistical
operations, random simulation and much more.

At the core of the NumPy package, is the ndarray object. This
encapsulates n-dimensional arrays of homogeneous data types, with many
operations being performed in compiled code for performance. There are
several important differences between NumPy arrays and the standard
Python sequences:

  • NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will
    create a new array and delete the original.
  • The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception:
    one can have arrays of (Python, including NumPy) objects, thereby
    allowing for arrays of different sized elements.
  • NumPy arrays facilitate advanced mathematical and other types of operations on
    large numbers of data. Typically, such operations are executed more
    efficiently and with less code than is possible using Python’s
    built-in sequences.
  • A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support
    Python-sequence input, they convert such input to NumPy arrays prior
    to processing, and they often output NumPy arrays. In other words, in
    order to efficiently use much (perhaps even most) of today’s
    scientific/mathematical Python-based software, just knowing how to use
    Python’s built-in sequence types is insufficient - one also needs to
    know how to use NumPy arrays.

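From the passage above, NumPy mainly covers the creation of and computation on multidimensional arrays. Note that the array here is not Python's built-in list type and is far more capable. NumPy offers many of the initialization routines that scientific computing needs, such as arrays filled with normally distributed random values and multidimensional matrices. It also includes routines for processing and analyzing arrays, which usually run considerably faster than the equivalent pure-Python loops.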
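To make the difference between a Python list and an ndarray concrete, here is a minimal sketch. The variable names and the random seed are my own choices for illustration, not something from the NumPy documentation.

```python
import numpy as np

values = [1, 2, 3, 4]

# With a plain Python list, arithmetic needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

# With an ndarray, the same operation is applied to every element at once.
arr = np.array(values)
doubled_arr = arr * 2

# Draw a 2x3 array of samples from a standard normal distribution.
rng = np.random.default_rng(seed=0)  # the seed is an arbitrary choice
samples = rng.standard_normal((2, 3))

print(doubled_list)    # [2, 4, 6, 8]
print(doubled_arr)     # [2 4 6 8]
print(samples.shape)   # (2, 3)
```

Note that multiplying the list itself by 2 would repeat the list instead of doubling each element, which is one reason the array type is more convenient for numeric work.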

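Getting Started with NumPy

Installing and Importing NumPy

NumPy comes bundled with the Anaconda distribution. If you have only installed plain Python, run the following in a terminal: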

pip install --upgrade numpy

Once the installation finishes, just add import numpy to your .py file.

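NumPy Attributes and Methods

A handy tip: in a Jupyter notebook, typing ? after a function or method name and running the cell opens its help documentation. The ? can also be placed before the name, as in the package example here.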

?numpy          # show the help for the numpy package
numpy.random?   # show the help for the numpy.random module

The table below summarizes some of NumPy's attributes and methods:

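| Name | Description | Example | Notes |
| --- | --- | --- | --- |
| array(list or tuple) | Create an array | t=[2,3]; numpy.array(t) | Arrays can be created in several ways |
| arange(start, stop, step) | Create an array from a range | a=numpy.arange(4,7,2) | step is the spacing; start is included, stop is excluded |
| ndim | Number of dimensions of the array, returned as an int | a.ndim or numpy.ndim(a) | Mostly used with matrices or higher-dimensional arrays |
| shape | Shape of a multidimensional array, e.g. (rows, columns) for a matrix | a.shape or numpy.shape(a) | Shape of complex dimensions |
| add() | Element-wise addition; numpy.add(a,b) is equivalent to a+b | numpy.add(a,b) | |
| zeros([rows, columns, ...]) | Create an array filled with zeros | numpy.zeros([3,5]) | |
| ones([rows, columns, ...]) | Create an array filled with ones | numpy.ones([2,2]) | |
| dot() | Matrix multiplication, unlike the element-wise a*b | numpy.dot(a,b) | See matrix multiplication |
| dtype | Data type of the array elements | numpy.array(t,dtype="int32") | Sets the data type |
| itemsize | Size in bytes of one element | a.itemsize | |
| fromfunction(function, shape, dtype) | Create an array by calling a function on the index grids | numpy.fromfunction(f,(3,4),dtype="int64") | f is defined in the appendix |
| round(array, decimals) | Round, like Python's round() | numpy.round(t,3) | decimals is the number of decimal places to keep |
| allclose(array1, array2) | Check whether two arrays are equal within a tolerance | See the matrix section | Checking equality of matrices |
| swapaxes() | Swap two axes | a.swapaxes(0,1) | a needs at least two dimensions |
| bincount() | Count the occurrences of each non-negative integer value | np.bincount(np.array([1,2,3,4,5,6,4,5,6,4])) | See the appendix code; the output length is the largest value plus one |
| meshgrid() | Build coordinate grids from coordinate vectors | x,y=np.meshgrid(np.arange(3,6,1),np.arange(3,6,1)) | See the appendix code |
| where(condition,a,b) | Pick values from a where condition is true, otherwise from b | np.where([True,False],a,b) | |
| sum() | Sum | np.sum(a) | |
| mean() | Mean | c.mean() | |
| std() | Standard deviation | c.std() | |
| in1d(a,b) | Test whether each element of a is in b | np.in1d(a,b) | Newer NumPy versions recommend np.isin |
| unique(array) | Extract the unique elements | np.unique([1,1,234,2,2]) | |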
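A few of the table entries are easy to mix up, so here is a small sketch that checks them directly. The arrays m and c are made up for illustration.

```python
import numpy as np

a = np.arange(4, 7, 2)        # start=4, stop=7 (excluded), step=2
print(a)                      # [4 6]
print(a.ndim, a.shape)        # 1 (2,)

z = np.zeros([3, 5])          # 3 rows, 5 columns, every entry is 0
print(z.shape)                # (3, 5)

m = np.array([[1, 2], [3, 4]])
elementwise = m * m           # multiplies matching entries
matrix_product = np.dot(m, m) # rows times columns
print(elementwise.tolist())   # [[1, 4], [9, 16]]
print(matrix_product.tolist())# [[7, 10], [15, 22]]

c = np.array([1.0, 2.0, 3.0, 4.0])
print(np.isclose(c.std() ** 2, c.var()))  # True: std() is the square root of the variance
```

The last line is the reason std() is described as the standard deviation and not the variance.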

Matrix Operations

Matrix Transpose

Use a.T or the transpose() method:

>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])
>>>
>>> np.transpose(x)
array([[0, 2],
       [1, 3]])
>>>
>>> x = np.ones((1, 2, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(2, 1, 3)
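As a rough sketch of how the different transpose spellings relate, the snippet below only looks at shapes. The arrays are arbitrary examples of mine.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
print(x.T.shape)                          # (3, 2)

y = np.ones((1, 2, 3))
reversed_axes = y.T                       # .T reverses the order of all axes
permuted = np.transpose(y, (1, 0, 2))     # explicit permutation of the axes
swapped = y.swapaxes(0, 2)                # exchange only axis 0 and axis 2

print(reversed_axes.shape)                # (3, 2, 1)
print(permuted.shape)                     # (2, 1, 3)
print(swapped.shape)                      # (3, 2, 1)
```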

Matrix Inverse

The linalg subpackage collects NumPy's linear algebra routines.

import numpy as np
np.linalg.inv(np.dot(a.T,a))
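Here is a minimal sketch of the call above with concrete numbers. The matrix a reuses the values from the appendix, while the singular matrix is an example I added to show what happens when no inverse exists.

```python
import numpy as np

a = np.array([[1.0, 2.0], [34.0, 4.0]])   # same values as the appendix example
gram = np.dot(a.T, a)
gram_inv = np.linalg.inv(gram)
ok = np.allclose(np.dot(gram, gram_inv), np.eye(2))
print(ok)   # True

singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is twice the first
raised = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    raised = True
    print("not invertible:", err)
```

inv() raises numpy.linalg.LinAlgError for a singular matrix, so wrapping the call in try/except is a reasonable habit when the input is not known to be invertible.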

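Checking Equality

To verify that an inverse is correct, check that the product of the matrix and its inverse is (approximately) equal to the identity matrix.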

a_t=np.linalg.inv(a)
print(np.dot(a,a_t))
np.allclose(np.dot(a,a_t),np.eye(2))
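The reason for using allclose() here instead of an exact comparison is floating-point rounding. The sketch below illustrates this, again assuming the 2x2 matrix from the appendix.

```python
import numpy as np

# Classic rounding example: the sum is not exactly 0.3 in binary floating point.
exact_scalar = (0.1 + 0.2 == 0.3)
close_scalar = np.allclose(0.1 + 0.2, 0.3)
print(exact_scalar, close_scalar)   # False True

a = np.array([[1.0, 2.0], [34.0, 4.0]])
product = np.dot(a, np.linalg.inv(a))

# An exact test may or may not pass, depending on rounding in the inverse.
exact_matrix = np.array_equal(product, np.eye(2))
# A tolerant test is the safer way to compare floating-point results.
close_matrix = np.allclose(product, np.eye(2))
print(close_matrix)                 # True
```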

Saving and Loading NumPy Data

Below are the commonly used ways of storing arrays in files.
Storing arrays to files

# store a single array
np.save("a.npy",a)
print(np.load("a.npy"))
# store several arrays in one file
np.savez("b.npz",x=a,y=d)
np.load("b.npz")['y']
# store as a text file
np.savetxt("x.txt",a)
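The following sketch runs a full save and load round trip. It writes into a temporary directory so nothing is left behind, and the use of tempfile and np.loadtxt is my addition, not part of the original example.

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2], [34, 4]])
d = np.array([[1, 4], [4, 5]])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    npz_path = os.path.join(tmp, "b.npz")
    txt_path = os.path.join(tmp, "x.txt")

    np.save(npy_path, a)                 # binary format, keeps dtype and shape
    a_back = np.load(npy_path)

    np.savez(npz_path, x=a, y=d)         # several named arrays in one archive
    with np.load(npz_path) as archive:   # close the archive when done
        y_back = archive["y"]

    np.savetxt(txt_path, a)              # human-readable text
    txt_back = np.loadtxt(txt_path)      # values come back as floats

print(a_back.dtype == a.dtype)   # True
print(txt_back.dtype)            # float64
```

The .npy and .npz formats preserve the dtype, while the text format stores values as formatted floats, so it is better suited to inspection than to exact round trips.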

Appendix

Usage examples for the points covered above.

We can create numpy arrays from native python tuples or lists.

import numpy as np
tp=([1,2],[34,4])
a=np.array(tp,dtype="int32")
print(a.shape)
(2, 2)

print(np.arange(5,18,1))   # values increasing by a fixed step within the range
[ 5  6  7  8  9 10 11 12 13 14 15 16 17]

print(np.linspace(0,10,11)) # a fixed number of evenly spaced points over the range
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]

c=np.arange(30).reshape(3,5,2)
c
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9]],

       [[10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29]]])
# sum over the last axis
c.sum(axis=2)
array([[ 1,  5,  9, 13, 17],
       [21, 25, 29, 33, 37],
       [41, 45, 49, 53, 57]])
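To see what the axis argument does, this small sketch sums the same array along each axis and prints only the resulting shapes.

```python
import numpy as np

c = np.arange(30).reshape(3, 5, 2)

# Summing along an axis removes that axis from the shape.
sum0 = c.sum(axis=0)
sum1 = c.sum(axis=1)
sum2 = c.sum(axis=2)
total = c.sum()          # no axis: sum of every element

print(sum0.shape)        # (5, 2)
print(sum1.shape)        # (3, 2)
print(sum2.shape)        # (3, 5)
print(total)             # 435
```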
# matrix product of a and d
d=[1,4],[4,5]
d=np.array(d)
a.dot(d)
array([[  9,  14],
       [ 50, 156]])

One-dimensional arrays

np.arange(3,25).shape
(22,)

np.shape(a)
(2, 2)

np.zeros([3,5])
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

np.dot(a,d)   # matrix product
array([[  9,  14],
       [ 50, 156]])

a*d   # element-wise product
array([[  1,   8],
       [136,  20]])

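One thing to watch out for: slicing a NumPy array returns a view, so modifying the slice also modifies the original array. Slicing a Python list creates a copy, so lists do not have this side effect.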

num=[1,3,45,6,7,8]
n=np.array(num)
a_slice=n[2:4]
a_slice[0]=1000000
print(a_slice)
print(n)
[1000000       6]
[      1       3 1000000       6       7       8]
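If the original array has to stay untouched, one option is an explicit copy. The sketch below contrasts a view, a copy, and a list slice, with variable names of my own choosing.

```python
import numpy as np

n = np.array([1, 3, 45, 6, 7, 8])

view = n[2:4]                          # basic slicing returns a view
shares = np.shares_memory(n, view)
print(shares)                          # True

safe = n[2:4].copy()                   # an explicit copy owns its data
safe[0] = 1000000
print(n.tolist())                      # [1, 3, 45, 6, 7, 8], unchanged

lst = [1, 3, 45, 6, 7, 8]
lst_slice = lst[2:4]                   # list slicing creates a new list
lst_slice[0] = 1000000
print(lst)                             # [1, 3, 45, 6, 7, 8], unchanged
```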
n.itemsize   # bytes per element: 4 here (int32), 8 where the default integer is int64
4

def f(x,y):                        # create an array from a function
    return 4*x+y
nd=np.fromfunction(f,(3,4),dtype="int")
nd
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
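It may help to know that fromfunction() does not call f once per element. It passes whole index arrays in a single call, which the following sketch checks using np.indices.

```python
import numpy as np

def f(x, y):
    # x and y arrive as whole index arrays, not as single numbers
    return 4 * x + y

nd = np.fromfunction(f, (3, 4), dtype=int)

# np.indices builds the same row and column index grids explicitly.
rows, cols = np.indices((3, 4))
same = np.array_equal(nd, 4 * rows + cols)
print(same)          # True
print(nd.tolist())   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Because of this, f has to be written with operations that work on arrays, such as arithmetic, and not with scalar-only logic like a plain if statement.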

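An example of conditional selection: each row holds one employee's working hours over five days, and a boolean mask on the names picks out the row for john.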

name=['wangjun','xiaoming','john','tom','qingqing']
names=np.array(name)
worktime=np.round(np.random.randn(5,5)+8.0,2)
worktime
array([[8.95, 7.16, 7.48, 8.49, 8.27],
       [8.26, 9.41, 6.98, 8.78, 8.62],
       [9.05, 8.41, 9.86, 8.74, 9.02],
       [7.86, 6.18, 7.34, 7.55, 7.12],
       [9.7 , 7.54, 6.47, 7.71, 9.21]])
# select john's row with a boolean mask
print(worktime[names=='john'])
[[9.05 8.41 9.86 8.74 9.02]]

# transpose(2) fails on a 2-D array: the axes argument must list both axes, e.g. (1, 0)
worktime.transpose(2)
ValueError: axes don't match array
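The selection above can be sketched with fixed numbers so the result is reproducible. The worktime values here are placeholders I made up, and treating column 3 as Thursday assumes the week starts on Monday in column 0.

```python
import numpy as np

names = np.array(['wangjun', 'xiaoming', 'john', 'tom', 'qingqing'])
# Placeholder data: 5 employees x 5 days, not the random values from the post.
worktime = np.arange(25, dtype=float).reshape(5, 5)

mask = names == 'john'            # one boolean per employee
john_row = worktime[mask]         # rows where the mask is True, shape (1, 5)
thursday = worktime[:, 3]         # assumed: column 3 is Thursday

print(mask.tolist())              # [False, False, True, False, False]
print(john_row.tolist())          # [[10.0, 11.0, 12.0, 13.0, 14.0]]
print(thursday.tolist())          # [3.0, 8.0, 13.0, 18.0, 23.0]
print(worktime.transpose(1, 0).shape)   # (5, 5): a valid transpose for a 2-D array
```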
# verify the inverse: a times its inverse should be close to the identity
a_t=np.linalg.inv(a)
print(np.dot(a,a_t))
np.allclose(np.dot(a,a_t),np.eye(2))
[[1. 0.]
 [0. 1.]]
True
#bincount()
np.bincount(np.array([1,2,3,4,200,1000,4,5,6,4]))
array([0, 1, 1, ..., 0, 0, 1], dtype=int64)
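The output above is long because bincount() returns one counter for every integer from 0 up to the largest value. A smaller input makes this easier to read; the minlength call is an extra I added to show how to pad the result.

```python
import numpy as np

counts = np.bincount(np.array([1, 2, 3, 4, 5, 6, 4, 5, 6, 4]))
print(counts.tolist())   # [0, 1, 1, 1, 3, 2, 2]
print(len(counts))       # 7: the largest value (6) plus one

# counts[k] is how many times the value k occurs, so 4 occurs three times.
print(counts[4])         # 3

padded = np.bincount(np.array([1, 2, 2]), minlength=5)
print(padded.tolist())   # [0, 1, 2, 0, 0]
```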
pointx,pointy=np.meshgrid(np.arange(-10,10,0.02),np.arange(-10,10,0.020))
pointx
array([[-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98],
       [-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98],
       [-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98],
       ...,
       [-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98],
       [-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98],
       [-10.  ,  -9.98,  -9.96, ...,   9.94,   9.96,   9.98]])
pointy
array([[-10.  , -10.  , -10.  , ..., -10.  , -10.  , -10.  ],
       [ -9.98,  -9.98,  -9.98, ...,  -9.98,  -9.98,  -9.98],
       [ -9.96,  -9.96,  -9.96, ...,  -9.96,  -9.96,  -9.96],
       ...,
       [  9.94,   9.94,   9.94, ...,   9.94,   9.94,   9.94],
       [  9.96,   9.96,   9.96, ...,   9.96,   9.96,   9.96],
       [  9.98,   9.98,   9.98, ...,   9.98,   9.98,   9.98]])
# meshgrid: evaluate z = x^2 + y^2 on the grid and plot it
import matplotlib.pyplot as mp
z=pointx**2+pointy**2
mp.imshow(z)
mp.colorbar()
mp.show()

[Figure: heat map of z = pointx**2 + pointy**2 with a color bar]
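With a tiny grid it is easier to see what meshgrid() returns. This sketch uses three x values and two y values chosen for illustration and skips the plotting.

```python
import numpy as np

x, y = np.meshgrid(np.arange(3, 6, 1), np.arange(0, 2, 1))

# x repeats the x values along every row, y repeats each y value across its row.
print(x.tolist())   # [[3, 4, 5], [3, 4, 5]]
print(y.tolist())   # [[0, 0, 0], [1, 1, 1]]

# Any element-wise formula is then evaluated at every grid point at once.
z = x**2 + y**2
print(z.tolist())   # [[9, 16, 25], [10, 17, 26]]
```

The result has one row per y value and one column per x value, which is why the grids in the larger example have the same shape as the image that gets plotted.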

# where
print(a)
print(d)
np.where([False,True],a,d)
[[ 1  2]
 [34  4]]
[[1 4]
 [4 5]]
array([[1, 2],
       [4, 4]])
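The result above follows from broadcasting: the two-element condition is applied to every row. Here is a short sketch of that, plus an element-wise condition that I added as a more typical use.

```python
import numpy as np

a = np.array([[1, 2], [34, 4]])
d = np.array([[1, 4], [4, 5]])

# [False, True] is broadcast across rows: column 0 comes from d, column 1 from a.
picked = np.where([False, True], a, d)
print(picked.tolist())    # [[1, 2], [4, 4]]

# An element-wise condition: keep values of a above 3, otherwise use 0.
clipped = np.where(a > 3, a, 0)
print(clipped.tolist())   # [[0, 0], [34, 4]]
```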
# in1d
np.in1d([2,3,4],a)
array([ True, False,  True])
np.unique([1,1,234,2,2])
array([  1,   2, 234])
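As a small follow-up, the sketch below repeats the membership test with np.isin, which newer NumPy releases recommend over in1d, and asks unique() for the counts as well. Both calls are additions of mine to the original examples.

```python
import numpy as np

a = np.array([[1, 2], [34, 4]])

# np.isin tests each element of the first argument against the values in a.
membership = np.isin([2, 3, 4], a)
print(membership.tolist())               # [True, False, True]

# return_counts=True also reports how often each unique value occurs.
values, counts = np.unique([1, 1, 234, 2, 2], return_counts=True)
print(values.tolist(), counts.tolist())  # [1, 2, 234] [2, 2, 1]
```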
# store a single array
np.save("a.npy",a)
print(np.load("a.npy"))
# store several arrays in one file
np.savez("b.npz",x=a,y=d)
np.load("b.npz")['y']
# store as a text file
np.savetxt("x.txt",a)
[[ 1  2]
 [34  4]]


References

  1. https://www.numpy.org/devdocs/user/quickstart.html

Copyright © 2016 - 2020 Life-long Learning All Rights Reserved.
