Andrew Ng (吴恩达)

# Neural Network

  • Example: House Price Prediction

Standard Neural Network, CNN (convolutional, for images and similar data), RNN (recurrent, for sequence and time-series data)

Structured Data: tabular data
Unstructured Data: text, audio, video, images

Why does it work so well?
scale of data + scale of computation + better algorithms

# Basics of NN

# Binary Classification

Given an input $x$, the output $y$ is either 0 or 1.

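For a training set with $m$ examples:
The $m$ input vectors are stacked as columns into a matrix $X$ of shape $n_x \times m$.
The $m$ labels form a row vector $Y$ of shape $1 \times m$.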
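To make the shape convention concrete, here is a minimal sketch that builds $X$ and $Y$ from a handful of examples. The toy values and the names `examples` and `labels` are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: m = 3 examples, each with n_x = 2 features.
examples = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
labels = [0, 1, 1]

# Stack each example as a column, so X has shape (n_x, m).
X = np.stack(examples, axis=1)
# The labels form a row vector, so Y has shape (1, m).
Y = np.array(labels).reshape(1, -1)

n_x, m = X.shape
print(X.shape, Y.shape)
```

With this layout, `X[:, i]` is the i-th example and `Y[0, i]` is its label.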

# Logistic Regression

A binary classification algorithm.
Given $x$, want $\hat{y} = P(y=1|x)$, $x \in \mathbb{R}^{n_x}$
Parameters: weights $w$, bias $b$
Output: $\hat{y} = \sigma(w^Tx + b)$

$w$ is a column vector of shape $n_x \times 1$: there is one weight per feature.
$\sigma(z) = \frac{1}{1+e^{-z}}$ (the sigmoid function, which keeps $\hat{y}$ strictly between 0 and 1, i.e. $\hat{y} \in (0, 1)$)

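Using `math` (scalar input only):

```python
import math

def basic_sigmoid(x):
    # math.exp only accepts scalars, not arrays
    return 1 / (1 + math.exp(-x))
```

Using `numpy` (works on scalars, vectors, and matrices):

```python
import numpy as np

def sigmoid(x):
    # np.exp is applied element-wise
    return 1 / (1 + np.exp(-x))
```

Gradient of the sigmoid:

```python
def sigmoid_derivative(x):
    s = sigmoid(x)
    ds = s * (1 - s)  # sigma'(x) = sigma(x) * (1 - sigma(x))
    return ds
```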

# Logistic Regression Cost Function

Given $\{(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})\}$, want $\hat{y}^{(i)} \approx y^{(i)}$
That is, each prediction should be close to the true label.
Loss function: defined on a single example
$L(\hat{y}, y) = -[y \log(\hat{y}) + (1-y) \log(1-\hat{y})]$
A squared-error loss would make the optimization problem non-convex, with multiple local optima, so the cross-entropy loss is used instead.
Cost function: defined over all $m$ examples (the whole training set)

$$J(w, b) = \frac{1}{m} \sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^m [y^{(i)} \log(\hat{y}^{(i)}) + (1-y^{(i)}) \log(1-\hat{y}^{(i)})]$$

The goal is to find the parameters $w, b$ that minimize $J(w, b)$.
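As a minimal sketch, the cost $J$ can be computed with numpy as follows. The function name `cross_entropy_cost` and the sample values are illustrative assumptions; the predictions `A` are assumed to lie strictly between 0 and 1 so that the logs are finite.

```python
import numpy as np

def cross_entropy_cost(A, Y):
    """Mean cross-entropy cost for predictions A and labels Y, both of shape (1, m)."""
    m = Y.shape[1]
    # Per-example loss L(y_hat, y), computed element-wise
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return float(np.sum(losses) / m)

# Hypothetical predictions and labels for m = 3 examples
A = np.array([[0.9, 0.2, 0.5]])
Y = np.array([[1, 0, 1]])
cost = cross_entropy_cost(A, Y)
print(cost)
```

Confident correct predictions (such as 0.9 for a label of 1) contribute little to the cost, while uncertain or wrong ones contribute more.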

# Gradient Descent

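To build intuition, ignore $b$ for now and consider only $w$. For $J(w)$:

$$\text{repeat} \; \{ w := w - \alpha \frac{\partial J(w)}{\partial w} \}$$

$\alpha$ is the learning rate, i.e. the step size used in each iteration. The direction of each update comes from the sign of the derivative, which depends on which side of the minimum the current $w$ lies.

For $J(w, b)$:

$$\text{repeat} \; \{ w := w - \alpha \frac{\partial J(w, b)}{\partial w}; \quad b := b - \alpha \frac{\partial J(w, b)}{\partial b} \}$$

A derivative is the rate of change of a function; a gradient gives the direction and magnitude of the rate of change of a multivariable function.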
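The update rule can be tried on a toy problem. The sketch below assumes a made-up convex function $J(w) = (w - 3)^2$, whose derivative is $2(w - 3)$; the helper name `gradient_descent` and the hyperparameter values are illustrative only.

```python
def gradient_descent(grad, w0, alpha=0.1, num_iters=100):
    """Repeat w := w - alpha * dJ/dw for a fixed number of iterations."""
    w = w0
    for _ in range(num_iters):
        w = w - alpha * grad(w)
    return w

# Hypothetical cost J(w) = (w - 3)^2, minimized at w = 3
def grad_J(w):
    return 2 * (w - 3)

# Starting left of the minimum the derivative is negative, so w increases;
# starting right of it the derivative is positive, so w decreases.
w_from_left = gradient_descent(grad_J, w0=-5.0)
w_from_right = gradient_descent(grad_J, w0=10.0)
print(w_from_left, w_from_right)
```

Both runs approach $w = 3$, which illustrates that the sign of the step is handled by the derivative rather than by $\alpha$.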

# Logistic Regression Gradient Descent

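On a single example, with $a = \hat{y} = \sigma(z)$:

Apply the chain rule. First differentiate $L(a, y)$ with respect to $a$:

$$\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}$$

Then differentiate $a$ with respect to $z$ (the sigmoid derivative):

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right) \cdot a(1-a) = a - y$$

Then differentiate $z$ with respect to $w$:

$$z = w^Tx + b = w_1 x_1 + w_2 x_2 + ... + w_n x_n + b, \quad \text{so} \quad \frac{\partial z}{\partial w_i} = x_i$$

Therefore $\frac{\partial L}{\partial w_i} = x_i (a - y)$ and $\frac{\partial L}{\partial b} = a - y$.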
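The chain-rule result can be sanity-checked numerically. This is a minimal sketch with made-up values for `w`, `b`, `x`, `y`; the helper names `loss` and `single_example_grads` are hypothetical. It compares the analytic gradient for $w_1$ against a central finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, b, x, y):
    """Cross-entropy loss on one example; w and x are column vectors."""
    a = sigmoid((w.T @ x).item() + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def single_example_grads(w, b, x, y):
    a = sigmoid((w.T @ x).item() + b)
    dz = a - y      # dL/dz from the chain rule
    dw = x * dz     # dL/dw_i = x_i * dz
    db = dz         # dz/db = 1
    return dw, db

# Hypothetical values, chosen so that z = 0 and a = 0.5
w = np.array([[0.5], [-0.3]])
b = 0.1
x = np.array([[1.0], [2.0]])
y = 1.0

dw, db = single_example_grads(w, b, x, y)

# Central finite difference for the first weight
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0, 0] += eps
w_minus[0, 0] -= eps
numeric_dw0 = (loss(w_plus, b, x, y) - loss(w_minus, b, x, y)) / (2 * eps)
print(dw[0, 0], numeric_dw0)
```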

# Logistic Regression Gradient Descent on m samples

For $m$ examples:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^m L(a^{(i)}, y^{(i)}) \quad \text{where} \quad a^{(i)} = \sigma(z^{(i)}) = \sigma(w^Tx^{(i)} + b)$$

A sequential implementation has to use for loops, which is too slow, so matrix operations are used instead.
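For reference, the sequential version might look like the sketch below: one loop over the $m$ examples and a nested loop over the features. The function name `gradients_with_loops` and the toy data are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradients_with_loops(w, b, X, Y):
    """Cost and gradients for logistic regression using explicit loops (slow)."""
    n_x, m = X.shape
    J, dw, db = 0.0, np.zeros((n_x, 1)), 0.0
    for i in range(m):                      # loop over examples
        z_i = b
        for j in range(n_x):                # loop over features
            z_i += w[j, 0] * X[j, i]
        a_i = sigmoid(z_i)
        y_i = Y[0, i]
        J += -(y_i * np.log(a_i) + (1 - y_i) * np.log(1 - a_i))
        dz_i = a_i - y_i
        for j in range(n_x):                # loop over features again
            dw[j, 0] += X[j, i] * dz_i
        db += dz_i
    return J / m, dw / m, db / m

# Hypothetical data: n_x = 2 features, m = 2 examples
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Y = np.array([[1, 0]])
w = np.zeros((2, 1))
b = 0.0
J, dw, db = gradients_with_loops(w, b, X, Y)
print(J, dw.ravel(), db)
```

The nested loops cost on the order of $m \cdot n_x$ Python-level iterations per gradient step, which is what vectorization removes.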

# Vectorization

Avoid explicit for-loops; use matrix/vector operations instead.

$$u = Av, \quad u_i = \sum_{j=1}^n A_{ij} v_j$$

Implementation:

```python
u = np.dot(A, v)
```

Given a column vector $v$:

$$v^T = [v_1, v_2, ..., v_n]$$

apply the exponential to every element of $v$:

$$u^T = [e^{v_1}, e^{v_2}, ..., e^{v_n}]$$

With a for loop:

```python
u = np.zeros((n, 1))
for i in range(n):
    u[i] = np.exp(v[i])
```

Improved (vectorized) version:

```python
u = np.exp(v)
# Other element-wise and reduction functions:
# np.log(), np.sum(), np.mean(), np.abs(), np.maximum(v, 0), ...
```

# Vectorization for Logistic Regression

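For logistic regression, first try removing one for loop (the loop over the features, by treating $dw$ as a vector).

That still leaves a loop over the $m$ examples.
Now remove all for loops:

Vectorized forward pass for logistic regression:

```python
Z = np.dot(w.T, X) + b    # shape (1, m); b is broadcast across all m columns
A = 1 / (1 + np.exp(-Z))  # shape (1, m)
```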

# Vectorized Logistic Regression’s Gradient Computation
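As a minimal sketch, the gradients over all $m$ examples can be computed without any explicit loop, using $dZ = A - Y$, $dw = \frac{1}{m} X \, dZ^T$ and $db = \frac{1}{m} \sum dZ$. The function names `propagate` and `gradient_descent_step`, the learning rate, and the toy data below are assumptions for illustration.

```python
import numpy as np

def propagate(w, b, X, Y):
    """Vectorized gradients for logistic regression. X: (n_x, m), Y: (1, m)."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b        # (1, m)
    A = 1 / (1 + np.exp(-Z))      # (1, m)
    dZ = A - Y                    # (1, m), one dL/dz per example
    dw = np.dot(X, dZ.T) / m      # (n_x, 1)
    db = float(np.sum(dZ) / m)    # scalar
    return dw, db

def gradient_descent_step(w, b, X, Y, alpha=0.1):
    """One update w := w - alpha * dw, b := b - alpha * db."""
    dw, db = propagate(w, b, X, Y)
    return w - alpha * dw, b - alpha * db

# Hypothetical data: n_x = 2 features, m = 2 examples
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Y = np.array([[1, 0]])
w = np.zeros((2, 1))
b = 0.0
dw, db = propagate(w, b, X, Y)
w_new, b_new = gradient_descent_step(w, b, X, Y)
print(dw.ravel(), db)
```

A loop over gradient descent iterations is still needed; only the loops over examples and features are removed.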

# Broadcasting in python

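Broadcasting: numpy automatically expands arrays of compatible shapes so that element-wise operations work without explicit loops.

The `axis` argument specifies the direction along which an operation runs:
axis = 0 : vertical (down the rows, one result per column)
axis = 1 : horizontal (across the columns, one result per row)

A note on python/numpy vectors:

In computations, prefer explicit column or row vectors of shape (n, 1) or (1, n) over rank-1 arrays of shape (n,); otherwise subtle bugs can occur.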
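The sketch below illustrates both points on made-up arrays: how `axis` and broadcasting interact, and why rank-1 arrays can be surprising. All variable names and values are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])               # shape (2, 3)

col_sums = A.sum(axis=0)                       # sums vertically, shape (3,)
row_sums = A.sum(axis=1, keepdims=True)        # sums horizontally, shape (2, 1)

# (2, 3) / (1, 3): the row of column sums is broadcast down both rows
percent = 100 * A / col_sums.reshape(1, 3)

# Rank-1 array: transposing it does nothing, and a "matrix product"
# with its transpose is an inner product, not an outer product.
a = np.arange(5.0)                             # shape (5,)
inner = np.dot(a, a.T)                         # scalar

# Explicit column vector: the transpose is a real row vector
col = a.reshape(5, 1)                          # shape (5, 1)
outer = np.dot(col, col.T)                     # shape (5, 5)
print(a.T.shape, col.T.shape, outer.shape)
```

Calling `reshape` on an array is cheap, so it is a convenient way to make the intended shape explicit.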

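Normalization (scale each row to unit L2 norm):

```python
x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # shape (rows, 1)
x = x / x_norm  # broadcasting divides each row by its own norm
```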

# Implementing softmax with numpy

Used for multi-class classification.

$$\text{for } x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix} x_1 & x_2 & ... & x_n \end{bmatrix}) = \begin{bmatrix} \frac{e^{x_1}}{\sum_{j}e^{x_j}} & \frac{e^{x_2}}{\sum_{j}e^{x_j}} & ... & \frac{e^{x_n}}{\sum_{j}e^{x_j}} \end{bmatrix}$$

For a matrix $x \in \mathbb{R}^{m \times n}$, let $x_{ij}$ denote the element in the $i$-th row and $j$-th column.

$$softmax(x) = softmax\begin{bmatrix} x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} \end{bmatrix} = \begin{pmatrix} softmax\text{(first row of x)} \\ softmax\text{(second row of x)} \\ ... \\ softmax\text{(last row of x)} \end{pmatrix}$$

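softmax (applied row by row):

```python
import numpy as np

def softmax(x):
    x_exp = np.exp(x)                         # element-wise exponential
    s = np.sum(x_exp, axis=1, keepdims=True)  # row sums, shape (m, 1)
    return x_exp / s                          # broadcasting divides each row by its sum
```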
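One practical caveat: `np.exp` overflows for large inputs. A common variant, which is not part of the notes above, subtracts each row's maximum before exponentiating; this leaves the result unchanged mathematically because the shift cancels in the ratio. The name `stable_softmax` and the inputs are illustrative.

```python
import numpy as np

def stable_softmax(x):
    """Row-wise softmax that avoids overflow by shifting each row by its max."""
    shifted = x - np.max(x, axis=1, keepdims=True)  # largest entry per row becomes 0
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

# Hypothetical input: the second row would overflow with a plain np.exp
x = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])
probs = stable_softmax(x)
print(probs)
```

Each row of the output sums to 1, so it can be read as a probability distribution over the classes.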

# Logistic Regression’s Cost Function

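$$\text{if } y = 1: \; p(y|x) = \hat{y} \qquad \text{if } y = 0: \; p(y|x) = 1 - \hat{y}$$

Interpretation: $p(y|x)$ is the probability the model assigns to the true label $y$ given the input $x$, i.e. the probability that the prediction is correct.
Taking the negative log of this probability gives the cross-entropy loss:

$$L(\hat{y}, y) = -[y \log(\hat{y}) + (1-y) \log(1-\hat{y})]$$

which is exactly the loss function.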
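As a sketch of the intermediate steps, the two cases can be merged into a single expression and the log taken. The last line additionally assumes that the training examples are independent and identically distributed, which is an assumption stated here rather than in the notes.

$$
\begin{aligned}
p(y|x) &= \hat{y}^{\,y} (1-\hat{y})^{1-y} \\
\log p(y|x) &= y \log(\hat{y}) + (1-y) \log(1-\hat{y}) = -L(\hat{y}, y) \\
\log \prod_{i=1}^m p(y^{(i)}|x^{(i)}) &= -\sum_{i=1}^m L(\hat{y}^{(i)}, y^{(i)}) = -m \, J(w, b)
\end{aligned}
$$

Under that assumption, maximizing the likelihood of the training labels is equivalent to minimizing the cost $J(w, b)$.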

# Neural Network Overview

# Neural Network Representation

A NN with a single hidden layer:
$a^{[0]}$ denotes the input layer, $a^{[1]}$ the hidden layer, and $a^{[2]}$ the output layer.

# Computing a Neural Network’s Output

A neural network with two hidden layers, with a Sigmoid at the output.

# Vectorizing across multiple examples
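As a minimal sketch, a forward pass over all $m$ examples at once can be written for a network with a single hidden layer, stacking the examples as columns of $X$ just as in logistic regression. The layer sizes, the random initialization, the sigmoid hidden activation, and the names `W1`, `b1`, `W2`, `b2`, `forward` are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass for one hidden layer. X has shape (n_x, m)."""
    Z1 = W1 @ X + b1      # (n_h, m); b1 is broadcast across the m columns
    A1 = sigmoid(Z1)      # hidden activations a^[1] for every example
    Z2 = W2 @ A1 + b2     # (1, m)
    A2 = sigmoid(Z2)      # outputs a^[2], one prediction per column
    return A1, A2

# Hypothetical sizes: 3 input features, 4 hidden units, 5 examples
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01
b2 = np.zeros((1, 1))

A1, A2 = forward(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)
```

Column `i` of `A2` is the prediction for example `i`, so running the network on a single column of `X` gives the same value as the corresponding column of the batched result.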
