Deep Learning & Convolutional Neural Networks Study Notes (2): Going Deeper

June 15, 2018 · by Manchery

Concepts

  • padding
    • valid convolutions: $n-f+1$
    • same convolutions: using padding, $p={f-1\over 2}$
  • stride: $s$
  • output size: $\left\lfloor {n+2p-f \over s} \right\rfloor + 1$ (helper sketch after this list)
  • type of layers
    • Conv (CONV)
    • Pooling (POOL)
    • Fully-connected (FC)
  • cross correlation vs. convolution
  • logistic vs. softmax
  • regularization (penalty sketch after this list)
    • L1: induces sparse weights, helps prevent overfitting
    • L2: helps prevent overfitting
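
A minimal helper for the output-size formula above (a sketch; the function name is mine, the notation follows these notes):

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Output spatial size of a convolution: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

assert conv_output_size(n=6, f=3) == 4        # valid: n - f + 1
assert conv_output_size(n=6, f=3, p=1) == 6   # same: p = (f - 1) / 2 for odd f
assert conv_output_size(n=7, f=3, s=2) == 3   # strided convolution
```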
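
And a sketch of how the two penalties enter a loss, assuming a weight matrix `W` and a regularization strength `lam` (both hypothetical names):

```python
import numpy as np

def regularized_loss(base_loss, W, lam, kind="l2"):
    """Add an L1 (sparsity-inducing) or L2 penalty to a base loss."""
    if kind == "l1":
        return base_loss + lam * np.sum(np.abs(W))  # drives weights to exactly 0
    return base_loss + lam * np.sum(W ** 2)         # shrinks weights smoothly
```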

Deep learning

  • Deep learning framework
    • Caffe/Caffe2, TensorFlow, etc.
    • choosing
      • ease of programming
      • running speed
      • truly open
  • optimization algorithm for GD
    • mini-batch size: 64, 128, 256, 512
    • tricks
      • exp weighted average
      • bias correction
    • GD with momentum
    • RMSprop
    • Adam, $\beta_1=0.9,\beta_2=0.999,\epsilon=10^{-8}$
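
A minimal NumPy sketch of one Adam step, which combines the two exponentially weighted averages (momentum and RMSprop) with bias correction; the variable names are my own:

```python
import numpy as np

def adam_step(w, dw, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w with gradient dw; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * dw        # momentum: exp weighted avg of gradients
    v = beta2 * v + (1 - beta2) * dw ** 2   # RMSprop: exp weighted avg of squared grads
    m_hat = m / (1 - beta1 ** t)            # bias correction matters for early steps
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```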

Convolutional neural network

  • Backpropagation: derivatives (NumPy sketch at the end of this list)
    • $dA += \sum_{h=0}^{n_H-1} \sum_{w=0}^{n_W-1} W_c \times dZ_{hw}$
    • $dW_c += \sum_{h=0}^{n_H-1} \sum_{w=0}^{n_W-1} a_{slice} \times dZ_{hw}$
    • $db = \sum_h \sum_w dZ_{hw}$
  • Some classic CNNs
    • LeNet-5
      • $n_H, n_W$ shrink while $n_C$ grows
      • conv-pool-conv-pool-fc-fc-output
      • 60000 parameters
    • AlexNet
      • 60M parameters
      • ReLU
      • Multi-GPU
      • LRN (local response normalization)
    • VGG-16
  • ResNets
    • plain vs. residual
    • shortcut: $a^{[l+2]} = g(z^{[l+2]}+a^{[l]})$
    • plain networks: training error can rise as depth grows; ResNets avoid this (sketch at the end of this list)
    • reason: $a^{[l]}$ can pass straight through to $a^{[l+2]}$, so the identity is easy to learn and extra depth does not hurt
  • 1 x 1 convolution (Network in Network)
    • raises or lowers the channel dimension $n_C$
    • adds nonlinearity
    • reduces computation
  • Inception
    • Inception module: run 1x1 / 3x3 / 5x5 convolutions and pooling in parallel, then concatenate the outputs
    • bottleneck layer: cuts the computation cost (worked example at the end of this list)
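
The three backprop formulas above translate almost line for line into NumPy. A sketch for one example and one filter, assuming stride 1 and no padding (function name and shapes are mine):

```python
import numpy as np

def conv_backward_single(dZ, A_prev, W):
    """Backprop through a conv layer: one example, one filter, stride 1, no padding.

    dZ:     (n_H, n_W)               gradient w.r.t. the conv output
    A_prev: (n_Hp, n_Wp, n_C_prev)   input activation
    W:      (f, f, n_C_prev)         filter
    """
    f = W.shape[0]
    n_H, n_W = dZ.shape
    dA_prev = np.zeros_like(A_prev)
    dW = np.zeros_like(W)
    db = np.sum(dZ)                                    # db = sum_h sum_w dZ_hw
    for h in range(n_H):
        for w in range(n_W):
            a_slice = A_prev[h:h+f, w:w+f, :]
            dA_prev[h:h+f, w:w+f, :] += W * dZ[h, w]   # dA += W_c * dZ_hw
            dW += a_slice * dZ[h, w]                   # dW_c += a_slice * dZ_hw
    return dA_prev, dW, db
```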
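
Why a ResNet should not degrade with depth is visible in a few lines; `weights_fn` below is a hypothetical stand-in for the block's two weight layers:

```python
import numpy as np

def residual_block(a_l, weights_fn):
    """a[l+2] = g(z[l+2] + a[l]): the shortcut adds a_l before the nonlinearity."""
    z_l2 = weights_fn(a_l)              # main path
    return np.maximum(0, z_l2 + a_l)    # ReLU over (z + shortcut)

# If the weight layers collapse to zero, the block reduces to relu(a_l),
# i.e. the identity for a post-ReLU activation, so extra depth cannot hurt.
a = np.array([1.0, 0.0, 3.0])
print(residual_block(a, lambda x: np.zeros_like(x)))  # [1. 0. 3.]
```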
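
The computation saving from a bottleneck layer, using the standard lecture figures (a $28\times 28\times 192$ input, a $5\times 5$ conv with 32 filters). Convolving directly:

$$28\cdot 28\cdot 32\cdot (5\cdot 5\cdot 192)\approx 120\text{M multiplications}$$

Going through a $1\times 1$ bottleneck that first shrinks the input to 16 channels:

$$28\cdot 28\cdot 16\cdot 192 + 28\cdot 28\cdot 32\cdot (5\cdot 5\cdot 16)\approx 2.4\text{M}+10.0\text{M}\approx 12.4\text{M}$$

roughly a tenth of the cost, with the $1\times 1$ layer also adding a nonlinearity.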

Tips

  • Transfer learning
    • the more training data you have, the fewer parameters you freeze
  • Data augmentation: almost always helps in computer vision (sketch after this list)
    • mirroring/random cropping/rotation/shearing
    • local warping
    • color shifting
  • Data vs. hand-engineering
    • little data: more hand-engineering/transfer learning
    • lots of data: simple algorithm, less hand-engineering
  • Tips for winning competition
    • Ensembling
    • multi-crop at test time: 10-crop
    • use open source code
  • Hyperparameters tuning
    • rough order of importance: $\alpha$; then $\beta$, #hidden units, mini-batch size
    • random values: don’t use a grid
    • coarse to fine: zoom in on the best-performing region and re-sample there
    • an appropriate scale: linear vs. log scale (sampling sketch after this list)
    • babysitting one model (pandas) vs. training many models in parallel (caviar)
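
A minimal NumPy sketch of three of the augmentations above; the crop fraction and color-shift range are arbitrary choices of mine:

```python
import numpy as np

def augment(img, rng):
    """Cheap augmentations for an (H, W, 3) float image in [0, 1]."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                       # mirroring (horizontal flip)
    h, w, _ = img.shape
    top = rng.integers(0, h // 8 + 1)               # random cropping
    left = rng.integers(0, w // 8 + 1)
    img = img[top:top + 7 * h // 8, left:left + 7 * w // 8, :]
    shift = rng.uniform(-0.1, 0.1, size=3)          # color shifting per RGB channel
    return np.clip(img + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
out = augment(np.zeros((32, 32, 3)), rng)           # -> shape (28, 28, 3)
```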
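
And a sketch of random search on an appropriate scale (the ranges are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# learning rate: sample the exponent uniformly, not alpha itself
alpha = 10 ** rng.uniform(-4, 0)        # log scale over [1e-4, 1]

# momentum beta lives near 1, so sample 1 - beta on a log scale
beta = 1 - 10 ** rng.uniform(-3, -1)    # beta in [0.9, 0.999]

# hidden units: a plain linear scale is fine
n_hidden = int(rng.integers(50, 101))
```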

Exercises

Convolution model - Step by Step - v2

Convolution model - Application - v1

Tensorflow Tutorial