# Gradient Descent

## The Gradient

• For a single-variable function, the gradient is simply the derivative: it gives the slope of the tangent line to the function at a given point.
• For a multivariable function, the gradient is a vector; since a vector has a direction, the gradient's direction points in the direction of steepest ascent of the function at the given point.
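As a minimal sketch (the function `f(x, y) = x**2 + y**2` and the step size `h` are illustrative choices, not from the original text), the analytic gradient can be checked against a central-difference approximation:

```python
def grad_f(x, y):
    """Analytic gradient of f(x, y) = x**2 + y**2."""
    return (2 * x, 2 * y)

def numeric_grad(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (gx, gy)

f = lambda x, y: x**2 + y**2
print(grad_f(1.0, 2.0))           # (2.0, 4.0)
print(numeric_grad(f, 1.0, 2.0))  # numerically close to (2.0, 4.0)
```

The gradient vector `(2x, 2y)` at any point indeed points directly away from the minimum at the origin, i.e. in the direction of steepest ascent.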

## The Gradient Descent Algorithm

### The meaning of α

In gradient descent, α is called the learning rate or step size: it controls how far each update moves. It must not be too large, or a step may overshoot the minimum; it must also not be too small, or the algorithm will take far too long to converge.
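A minimal sketch of the update rule `x ← x − α·∇f(x)`, using the illustrative function `f(x) = x**2` (not from the original text) to show both failure modes of α:

```python
def gradient_descent(grad, x0, alpha, steps=100):
    """Repeatedly step opposite the gradient: x <- x - alpha * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

grad = lambda x: 2 * x  # gradient of f(x) = x**2

print(gradient_descent(grad, x0=10.0, alpha=0.1))  # converges near 0
print(gradient_descent(grad, x0=10.0, alpha=1.1))  # too large: |x| grows each step
```

With α = 0.1 each step multiplies x by 0.8, so the iterate shrinks toward the minimum; with α = 1.1 each step multiplies x by −1.2, so the iterate oscillates and diverges.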

## Drawbacks

• Parameter updates become very slow where the valley floor is nearly flat, since the gradient there is close to zero
• It may converge to a local minimum instead of the global one
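The local-minimum drawback can be demonstrated with a small non-convex example; the function `f(x) = x**4 - 2x**2 + 0.5x` below is an illustrative choice, not from the original text. It has a global minimum near x ≈ −1.06 and a shallower local minimum near x ≈ 0.93, and plain gradient descent lands in whichever basin the starting point sits in:

```python
def gradient_descent(grad, x0, alpha=0.01, steps=2000):
    """Plain gradient descent: x <- x - alpha * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - alpha * grad(x)
    return x

# gradient of f(x) = x**4 - 2*x**2 + 0.5*x
grad = lambda x: 4 * x**3 - 4 * x + 0.5

print(gradient_descent(grad, x0=-2.0))  # ~ -1.06, the global minimum
print(gradient_descent(grad, x0=2.0))   # ~  0.93, stuck in the local minimum
```

Nothing in the update rule distinguishes the two basins: the algorithm only follows the local slope, so the right-hand start never sees the deeper minimum on the left.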

## Stochastic Gradient Descent

It is an iterative method for optimizing a differentiable objective function, and can be seen as a stochastic approximation of gradient descent. It is called stochastic because samples are selected randomly (or shuffled) instead of being processed as a single group (as in standard gradient descent) or in the order they appear in the training set.

Note that in each iteration (also called an update), the gradient is evaluated at a single sample x_i instead of over the set of all samples.

The key difference from standard (batch) gradient descent is that only one piece of data from the dataset is used to calculate the step, and that piece of data is picked randomly at each step.
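The per-sample update can be sketched as follows; the one-parameter linear model `y = w * x` and the data are illustrative assumptions, not from the original text:

```python
import random

def sgd(data, grad_one, w0, alpha, steps=2000, seed=0):
    """SGD: each step uses the gradient at ONE randomly chosen sample."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        x, y = rng.choice(data)           # pick a single sample at random
        w = w - alpha * grad_one(w, x, y)  # step using that sample's gradient only
    return w

# Fit y = w * x by minimizing the per-sample squared error (1/2)*(w*x - y)**2,
# whose gradient with respect to w is (w*x - y) * x.
grad_one = lambda w, x, y: (w * x - y) * x
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]  # true w = 3

w = sgd(data, grad_one, w0=0.0, alpha=0.05)
print(w)  # close to 3.0
```

Batch gradient descent would average the gradient over all four samples before each step; here each step costs only one gradient evaluation, at the price of noisier updates.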

