Deep Learning

Keith Dillon
Spring 2019


Topic 10: Convolutional Neural Networks (CNNs, ConvNets)

Outline:

  • Convolution
  • Convolutional layers
  • Pooling layers
  • Keras CNN
  • Using a pre-optimized model

Reading:

CNNs

  • "perhaps the greatest success story of biologically-inspired artificial intelligence" -GBC
  • ImageNet 2012 challenge (and computer visions competitions since).
  • Best approach for many major computer vision tasks.
  • Applied to text, speech signals, and other data types also.

Inspiration - Computer vision

[Figure: recognizing a complex object by combining simple features]
To recognize a complex object, we first recognize simple objects, then recognize combinations of those simple objects ("features").

Note that the features can be anywhere in the image, hence "unstructured".

Biological inspiration - the Receptive Field

[Figure: center-surround receptive fields]
  • "Center-surround" fields at every point in visual field

  • Can implement via 2D convolution

  • Combine in subsequent neurons to make more sophisticated receptive fields.

Convolution

A "sliding dot product": \begin{align} s[i] &= \sum_{n=-\infty}^\infty x[n] w[i-n] = (x * w)(i) \end{align} Compare to \begin{align} s'[i] &= \sum_{n=-\infty}^\infty x[n] w[i+n] = (x \star w)(i) \end{align}

Exercise: Compute convolution of w = [-1, +1, -1] with:

  • x = [1, 2, 3, 4, 6, 7, 7, 6]
  • x = [0, 0, 0, -1, +1, -1, 0]

for $i = 0, 1, \ldots, N$

How might you deal with the edges?
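
One way to check the exercise, and one answer to the edge question, is NumPy's np.convolve, which implements the flipped-kernel formula above and offers three edge conventions via its mode argument. A minimal sketch:

import numpy as np

w  = np.array([-1, +1, -1])
x1 = np.array([1, 2, 3, 4, 6, 7, 7, 6])
x2 = np.array([0, 0, 0, -1, +1, -1, 0])

print(np.convolve(x1, w, mode='full'))   # zero-pad both ends; output length N+M-1
print(np.convolve(x1, w, mode='same'))   # zero-pad; output keeps length N
print(np.convolve(x1, w, mode='valid'))  # no padding; output shrinks to N-M+1
print(np.convolve(x2, w, mode='same'))   # peaks at 3 where w matches the pattern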

What do the following convolution kernels do?

  • w=[1,1,1]
  • w=[-1, +1]

2D Convolution

\begin{align} s[i,j] &= \sum_{m=0}^{M}\sum_{n=0}^{N} x[m,n]\, w[i-m, j-n] = (x * w)(i,j) \\ s'[i,j] &= \sum_{m=0}^{M}\sum_{n=0}^{N} x[m,n]\, w[i+m, j+n] = (x \star w)(i,j) \end{align}
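
SciPy provides both operations: convolve2d flips the kernel (the $s[i,j]$ formula), while correlate2d does not ($s'[i,j]$). A minimal sketch:

import numpy as np
from scipy.signal import convolve2d

x = np.zeros((7, 7))
x[3, 3] = 1.0                      # an impulse: convolving reproduces the kernel
w = np.array([[ 0.,  1.,  0.],
              [ 1., -4.,  1.],
              [ 0.,  1.,  0.]])    # a Laplacian (edge-detecting) kernel

print(convolve2d(x, w, mode='same'))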

Recall sliding dot product $\rightarrow$ pattern matching

[Figure: 2D convolution as pattern matching]

Q: How do we implement Convolution as a network layer?

Recall the fully-connected (a.k.a. densely-connected, or dense) layer

[Figure: fully-connected (dense) layer]

Suppose the input is an image, and the output is another image (with convolution applied). Apply the equation: \begin{align} s[i] &= \sum_{n=0}^N x[n]\, w[i-n] = (x * w)(i) \end{align}
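
This makes the answer concrete: a convolutional layer is a dense layer whose weight matrix has the (flipped) kernel repeated along its diagonals and zeros everywhere else. A minimal NumPy sketch of that matrix:

import numpy as np

x = np.array([1., 2., 3., 4., 6., 7., 7., 6.])
w = np.array([-1., +1., -1.])
N, M = len(x), len(w)

W = np.zeros((N - M + 1, N))     # "valid" convolution: output shrinks
for i in range(N - M + 1):
    W[i, i:i + M] = w[::-1]      # same flipped kernel on every row

print(W @ x)                             # the dense-layer view
print(np.convolve(x, w, mode='valid'))   # same result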

Convolutional Layer

[Figure: weights of a convolutional layer]
  • Parameter sharing (same color = same weight)

  • Far fewer weights than a dense layer.

  • Also note how the edges are handled: the image gets smaller.

Edge handling

Two options:

  1. Edges are thrown away, so each layer gets smaller
  2. Pad the image first, so the convolution shrinks it back to the original size (see the sketch below)
[Figure: edge handling with and without padding]
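
In Keras, these two options correspond to the padding argument of a convolutional layer; a small sketch comparing them:

from keras import layers, models

m = models.Sequential()
m.add(layers.Conv2D(1, (3, 3), padding='valid', input_shape=(28, 28, 1)))  # option 1: 28 -> 26
m.add(layers.Conv2D(1, (3, 3), padding='same'))                            # option 2: stays 26
m.summary()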

Multiple Channels & Kernels

We know how to make a 2D image into a 1D vector (right?). What, however, is a color image? It is a 3D array: height × width × channels.

We can also have multiple convolution kernels for the same input image.

[Figure: convolution with multiple channels and multiple kernels]
Treat it as a "tensor convolution" (sketched below).
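
A minimal NumPy sketch of that tensor convolution (a sliding correlation, as convolutional layers actually compute it): each of the K kernels spans all C input channels and produces one output feature map.

import numpy as np

H, W, C, K, k = 28, 28, 3, 32, 3      # a color image (C=3) and 32 kernels of size 3x3
x = np.random.rand(H, W, C)
kernels = np.random.randn(k, k, C, K)

out = np.zeros((H - k + 1, W - k + 1, K))
for i in range(H - k + 1):
    for j in range(W - k + 1):
        patch = x[i:i + k, j:j + k, :]                  # k x k x C patch
        out[i, j, :] = np.tensordot(patch, kernels, 3)  # dot product over all patch axes

print(out.shape)   # (26, 26, 32): one 26x26 feature map per kernel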

Biological inspiration II

[Figure: retina and optic nerve]
  • Retina $\approx$ 100 million photoreceptors
  • Optic nerve $\approx$ 1 million axons

100x downsampling.

Stride

Downsample while doing convolution.

[Figure: strided convolution]
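
Conceptually, a stride-2 convolution is an ordinary convolution followed by keeping every other output (in practice, the skipped outputs are simply never computed). A one-line sketch:

import numpy as np

x = np.arange(10, dtype=float)
w = np.array([1., 1., 1.])

print(np.convolve(x, w, mode='valid')[::2])   # stride 2: keep every other output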

Pooling Layer

[Figure: max-pooling layer]
Downsampling by taking the max over a small region.
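
A minimal NumPy sketch of 2x2 max-pooling, using a reshape to group pixels into 2x2 blocks:

import numpy as np

x = np.random.rand(28, 28)
pooled = x.reshape(14, 2, 14, 2).max(axis=(1, 3))   # max within each 2x2 block
print(pooled.shape)   # (14, 14)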

Combining everything...

[Figure: a complete convolutional network]

First test a Dense network on the MNIST dataset

In [3]:
import tensorflow as tf            # used below for tf.nn activations and the optimizer
import keras
import matplotlib.pyplot as plt

mnist = keras.datasets.mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
In [4]:
print(X_train.shape)
print(X_test.shape)
print(y_train)
(60000, 28, 28)
(10000, 28, 28)
[5 0 4 ... 5 6 8]
In [10]:
showMNIST10x10()   # helper defined earlier in the notebook (not shown); displays a 10x10 grid of samples

"Preprocessing"

Note pixel values range from 0 to 255 (8-bit integers).

We want to "normalize" them to between 0 and 1.

In [11]:
plt.imshow(X_train[0])
plt.colorbar();
In [12]:
X_train = X_train / 255.0
X_test = X_test / 255.0

Dense network model

In [13]:
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(128, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.softmax))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
In [14]:
model.compile(optimizer=tf.train.AdamOptimizer(), 
              loss='categorical_crossentropy',
              metrics=['accuracy'])
In [15]:
from keras.utils import to_categorical
y_train_cat = to_categorical(y_train)

model.fit(X_train, y_train_cat, epochs=5)
Epoch 1/5
60000/60000 [==============================] - 5s 89us/step - loss: 0.2577 - acc: 0.9261
Epoch 2/5
60000/60000 [==============================] - 5s 77us/step - loss: 0.1148 - acc: 0.9657
Epoch 3/5
60000/60000 [==============================] - 5s 79us/step - loss: 0.0792 - acc: 0.9759
Epoch 4/5
60000/60000 [==============================] - 5s 79us/step - loss: 0.0600 - acc: 0.9815
Epoch 5/5
60000/60000 [==============================] - 6s 96us/step - loss: 0.0469 - acc: 0.9856
Out[15]:
<keras.callbacks.History at 0x1c544cbc2b0>
In [16]:
y_test_cat = to_categorical(y_test)
test_loss, test_acc = model.evaluate(X_test, y_test_cat)

print('Test accuracy:', test_acc)
10000/10000 [==============================] - 0s 42us/step
Test accuracy: 0.9782

Using Validation data

In [17]:
from sklearn.model_selection import train_test_split

# carve a validation set out of the original test set
X_test, X_valid, y_test, y_valid = train_test_split(X_test, y_test, test_size=0.33)

from keras.utils import to_categorical
y_test_cat = to_categorical(y_test)
y_valid_cat = to_categorical(y_valid)
In [18]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(128, activation=tf.nn.relu))
model.add(layers.Dense(10, activation=tf.nn.softmax))

model.compile(optimizer=tf.train.AdamOptimizer(), 
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train_cat, validation_data=(X_valid,y_valid_cat), epochs=20)
Train on 60000 samples, validate on 3300 samples
Epoch 1/20
60000/60000 [==============================] - 5s 89us/step - loss: 0.2612 - acc: 0.9247 - val_loss: 0.1361 - val_acc: 0.9567
Epoch 2/20
60000/60000 [==============================] - 5s 90us/step - loss: 0.1128 - acc: 0.9661 - val_loss: 0.1003 - val_acc: 0.9697
Epoch 3/20
60000/60000 [==============================] - 6s 98us/step - loss: 0.0779 - acc: 0.9765 - val_loss: 0.0797 - val_acc: 0.9755
Epoch 4/20
60000/60000 [==============================] - 5s 86us/step - loss: 0.0585 - acc: 0.9825 - val_loss: 0.0815 - val_acc: 0.9764
Epoch 5/20
60000/60000 [==============================] - 5s 87us/step - loss: 0.0439 - acc: 0.9869 - val_loss: 0.0738 - val_acc: 0.9773
Epoch 6/20
60000/60000 [==============================] - 5s 81us/step - loss: 0.0367 - acc: 0.9886 - val_loss: 0.0744 - val_acc: 0.9755
Epoch 7/20
60000/60000 [==============================] - 5s 83us/step - loss: 0.0291 - acc: 0.9906 - val_loss: 0.0830 - val_acc: 0.9752
Epoch 8/20
60000/60000 [==============================] - 5s 82us/step - loss: 0.0233 - acc: 0.9927 - val_loss: 0.0764 - val_acc: 0.9806
Epoch 9/20
60000/60000 [==============================] - 5s 85us/step - loss: 0.0196 - acc: 0.9939 - val_loss: 0.0780 - val_acc: 0.9782
Epoch 10/20
60000/60000 [==============================] - 5s 86us/step - loss: 0.0157 - acc: 0.9956 - val_loss: 0.0776 - val_acc: 0.9779
Epoch 11/20
60000/60000 [==============================] - 5s 89us/step - loss: 0.0139 - acc: 0.9959 - val_loss: 0.0798 - val_acc: 0.9776
Epoch 12/20
60000/60000 [==============================] - 5s 90us/step - loss: 0.0124 - acc: 0.9959 - val_loss: 0.0882 - val_acc: 0.9791
Epoch 13/20
60000/60000 [==============================] - 5s 85us/step - loss: 0.0093 - acc: 0.9974 - val_loss: 0.0842 - val_acc: 0.9800
Epoch 14/20
60000/60000 [==============================] - 5s 87us/step - loss: 0.0099 - acc: 0.9969 - val_loss: 0.0867 - val_acc: 0.9806
Epoch 15/20
60000/60000 [==============================] - 6s 99us/step - loss: 0.0086 - acc: 0.9975 - val_loss: 0.0826 - val_acc: 0.9821
Epoch 16/20
60000/60000 [==============================] - 6s 99us/step - loss: 0.0078 - acc: 0.9973 - val_loss: 0.0978 - val_acc: 0.9785
Epoch 17/20
60000/60000 [==============================] - 7s 114us/step - loss: 0.0068 - acc: 0.9979 - val_loss: 0.0977 - val_acc: 0.9791
Epoch 18/20
60000/60000 [==============================] - 5s 89us/step - loss: 0.0067 - acc: 0.9981 - val_loss: 0.1000 - val_acc: 0.9782
Epoch 19/20
60000/60000 [==============================] - 6s 102us/step - loss: 0.0051 - acc: 0.9985 - val_loss: 0.1215 - val_acc: 0.9733
Epoch 20/20
60000/60000 [==============================] - 5s 87us/step - loss: 0.0068 - acc: 0.9980 - val_loss: 0.1078 - val_acc: 0.9803
In [19]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend();
In [21]:
plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend();

...what are these plots telling us?

Now use Convolutional and max-pooling layers

In [2]:
# reload and normalize the data for clarity

mnist = keras.datasets.mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train / 255.0
X_test = X_test / 255.0

# note the new step needed - convolutional layers expect a channel dimension
X_train = X_train.reshape((60000, 28, 28, 1))
X_test = X_test.reshape((10000, 28, 28, 1))

from sklearn.model_selection import train_test_split
X_test, X_valid, y_test, y_valid = train_test_split(X_test, y_test, test_size=0.33)

from keras.utils import to_categorical
y_train_cat = to_categorical(y_train)
y_test_cat  = to_categorical(y_test)
y_valid_cat = to_categorical(y_valid)
In [3]:
from keras import layers
from keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
In [4]:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
flatten_1 (Flatten)          (None, 7744)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                495680    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
=================================================================
Total params: 515,146
Trainable params: 515,146
Non-trainable params: 0
_________________________________________________________________
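
The Param # column can be checked by hand: a Conv2D layer has (kh * kw * C_in + 1) * C_out parameters (the +1 is the bias), and a Dense layer has (inputs + 1) * outputs:

print((3*3*1  + 1) * 32)   # conv2d_1: 320
print((3*3*32 + 1) * 64)   # conv2d_2: 18496
print((7744 + 1) * 64)     # dense_1: 495680  (7744 = 11*11*64, flattened)
print((64 + 1) * 10)       # dense_2: 650
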
In [5]:
model.compile(optimizer=tf.train.AdamOptimizer(), 
              loss='categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(X_train, y_train_cat, validation_data=(X_valid,y_valid_cat), epochs=20)
Train on 60000 samples, validate on 3300 samples
Epoch 1/20
60000/60000 [==============================] - 40s 671us/step - loss: 0.1236 - acc: 0.9622 - val_loss: 0.0452 - val_acc: 0.9885
Epoch 2/20
60000/60000 [==============================] - 39s 658us/step - loss: 0.0394 - acc: 0.9878 - val_loss: 0.0348 - val_acc: 0.9891
Epoch 3/20
60000/60000 [==============================] - 40s 665us/step - loss: 0.0268 - acc: 0.9915 - val_loss: 0.0330 - val_acc: 0.9909
Epoch 4/20
60000/60000 [==============================] - 39s 642us/step - loss: 0.0192 - acc: 0.9936 - val_loss: 0.0226 - val_acc: 0.9936
Epoch 5/20
60000/60000 [==============================] - 40s 659us/step - loss: 0.0133 - acc: 0.9958 - val_loss: 0.0292 - val_acc: 0.9918
Epoch 6/20
60000/60000 [==============================] - 40s 673us/step - loss: 0.0096 - acc: 0.9970 - val_loss: 0.0354 - val_acc: 0.9909
Epoch 7/20
60000/60000 [==============================] - 39s 655us/step - loss: 0.0085 - acc: 0.9972 - val_loss: 0.0305 - val_acc: 0.9921
Epoch 8/20
60000/60000 [==============================] - 40s 667us/step - loss: 0.0071 - acc: 0.9977 - val_loss: 0.0467 - val_acc: 0.9912
Epoch 9/20
60000/60000 [==============================] - 40s 671us/step - loss: 0.0054 - acc: 0.9984 - val_loss: 0.0308 - val_acc: 0.9930
Epoch 10/20
60000/60000 [==============================] - 40s 664us/step - loss: 0.0059 - acc: 0.9983 - val_loss: 0.0427 - val_acc: 0.9927
Epoch 11/20
60000/60000 [==============================] - 43s 717us/step - loss: 0.0058 - acc: 0.9983 - val_loss: 0.0312 - val_acc: 0.9930
Epoch 12/20
60000/60000 [==============================] - 40s 663us/step - loss: 0.0031 - acc: 0.9991 - val_loss: 0.0515 - val_acc: 0.9918
Epoch 13/20
60000/60000 [==============================] - 40s 663us/step - loss: 0.0051 - acc: 0.9985 - val_loss: 0.0351 - val_acc: 0.9927
Epoch 14/20
60000/60000 [==============================] - 43s 715us/step - loss: 0.0038 - acc: 0.9989 - val_loss: 0.0381 - val_acc: 0.9924
Epoch 15/20
60000/60000 [==============================] - 42s 693us/step - loss: 0.0032 - acc: 0.9990 - val_loss: 0.0441 - val_acc: 0.9930
Epoch 16/20
60000/60000 [==============================] - 41s 686us/step - loss: 0.0034 - acc: 0.9990 - val_loss: 0.0358 - val_acc: 0.9933
Epoch 17/20
60000/60000 [==============================] - 43s 714us/step - loss: 0.0035 - acc: 0.9990 - val_loss: 0.0531 - val_acc: 0.9906
Epoch 18/20
60000/60000 [==============================] - 42s 699us/step - loss: 0.0043 - acc: 0.9990 - val_loss: 0.0484 - val_acc: 0.9906
Epoch 19/20
60000/60000 [==============================] - 36s 598us/step - loss: 0.0041 - acc: 0.9990 - val_loss: 0.0381 - val_acc: 0.9924
Epoch 20/20
60000/60000 [==============================] - 36s 593us/step - loss: 0.0024 - acc: 0.9993 - val_loss: 0.0572 - val_acc: 0.9900
In [6]:
model.save('myCNN1.h5')
C:\Users\keith\Anaconda3\lib\site-packages\keras\engine\saving.py:118: UserWarning: TensorFlow optimizers do not make it possible to access optimizer attributes or optimizer state after instantiation. As a result, we cannot save the optimizer as part of the model save file.You will have to compile your model again after loading it. Prefer using a Keras optimizer instead (see keras.io/optimizers).
  'TensorFlow optimizers do not '
In [33]:
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend();
In [31]:
test_loss, test_acc = model.evaluate(X_test, y_test_cat)
6700/6700 [==============================] - 1s 201us/step
In [32]:
test_acc
Out[32]:
0.9905970149253731

Using pre-optimized models

As you can see, convolutional networks perform quite a bit better than dense networks, but they are also much slower to train.

The real power becomes evident when they are trained on huge datasets, which also allow much deeper networks to be optimized without (too much) overfitting.

However, this requires significant supercomputing resources and days to weeks of optimization time.

Rather than optimizing a network from scratch, we can instead load a pre-optimized network and use it.
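
For example, Keras ships several networks pre-trained on ImageNet under keras.applications; a minimal sketch loading one (the weights are downloaded automatically on first use):

from keras.applications import VGG16

model = VGG16(weights='imagenet')   # pre-optimized on ImageNet; a large download on first call
model.summary()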