CS:4980:006 Deep Learning Assignment 4: Due 10/2/2018
In this assignment, we will study the Python program network3.py, which creates multi-layer neural networks in Theano, and its application to the MNIST data set mnist.pkl.gz.
Problem 1.
Use network3.py to create two fully connected neural networks for MNIST with the following architectures: [784, 100, 100, 100, 10] and [784, 200, 100, 10], where the output layer is a softmax layer. Training for 30 epochs, try the three activation functions (using Theano's implementations) for the hidden layers: sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU). Please answer the following questions: (1) Experiment with various learning rates and report their performances. (2) The size of an architecture is the total number of scalars (weights and biases) used in the architecture. What are the sizes of the two architectures?
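For concreteness, here is a minimal sketch of how the first architecture might be built and trained, assuming the standard network3.py API (Network, FullyConnectedLayer, SoftmaxLayer, ReLU, load_data_shared) from the book's code; the learning rate 0.1 is just one value to try. For question (2), recall that a fully connected layer from n_in to n_out neurons contributes n_in*n_out weights plus n_out biases.

import network3
from network3 import Network, FullyConnectedLayer, SoftmaxLayer, ReLU
# sigmoid and tanh are also available through network3's imports

training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10

# Architecture [784, 100, 100, 100, 10] with ReLU hidden layers;
# swap activation_fn for sigmoid or tanh to run the comparison.
net = Network([
    FullyConnectedLayer(n_in=784, n_out=100, activation_fn=ReLU),
    FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
    FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)

# 30 epochs; vary eta (here 0.1) across runs for question (1)
net.SGD(training_data, 30, mini_batch_size, 0.1,
        validation_data, test_data)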
Problem 2.
Make a copy of network3.py, name it network3BN.py, and implement the idea of batch normalization in this file. For each layer of the network (whether a fully connected or softmax layer), introduce two scalar Theano variables, self.gamma and self.beta, whose values can be learned by SGD. Create a function
batchNorm(z_wsum, gamma, beta)
where z_wsum is a batch of weighted sums and gamma and beta are the batch-normalization parameters. The function will be used by each layer: it normalizes z_wsum into z_norm and returns (gamma*z_norm + beta). Since z_wsum, gamma, and beta are Theano expressions, the result is also a Theano expression.
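A minimal sketch of such a function is below, normalizing each neuron's weighted sum across the mini-batch dimension; the small eps term for numerical stability is an assumption, not part of the specification above.

import theano.tensor as T

def batchNorm(z_wsum, gamma, beta, eps=1e-6):
    # z_wsum: (mini_batch_size, n_out) Theano matrix of weighted sums.
    # Normalize each column (neuron) to zero mean, unit variance over the batch.
    mean = T.mean(z_wsum, axis=0)
    var = T.var(z_wsum, axis=0)
    z_norm = (z_wsum - mean) / T.sqrt(var + eps)
    # gamma and beta rescale and shift the normalized values; because they are
    # Theano variables, the result stays a symbolic expression that SGD can
    # differentiate through.
    return gamma * z_norm + beta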
Experiment with this batch-normalized implementation at various learning rates, compare it against the best performer from Problem 1 on the same architecture, and draw your conclusions.
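One hypothetical way to make gamma and beta learnable is shown below, using the batchNorm sketch above on a random batch; inside a layer, the same shared variables would be stored as self.gamma and self.beta and appended to self.params so that network3BN.py's SGD updates them like weights and biases. Names and initial values here are illustrative assumptions.

import numpy as np
import theano
import theano.tensor as T

# Learnable scalars, initialized to the identity transform (gamma=1, beta=0)
gamma = theano.shared(np.asarray(1.0, dtype=theano.config.floatX), name='gamma')
beta = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), name='beta')

z = T.matrix('z_wsum')           # a mini-batch of weighted sums
out = batchNorm(z, gamma, beta)  # still a symbolic Theano expression
f = theano.function([z], out)

batch = np.random.randn(10, 5).astype(theano.config.floatX)
print(f(batch))  # each column now has ~zero mean and unit variance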
Please submit everything required, including the changed code and output of a sample run, in the ICON dropbox for Assignment 4 before the deadline.
Thank you!