CS:4980:006 Deep Learning Assignment 4: Due 10/2/2018
In this assignment, we will study the Python program network3.py, which creates multi-layer neural networks in Theano, and its application to the MNIST data set mnist.pkl.gz.
Problem 1.
Use network3.py to create two fully connected neural networks for MNIST with the following architectures: [784, 100, 100, 100, 10] and [784, 200, 100, 10], where the output layer is a softmax layer. Training for 30 epochs, try the three activation functions (using Theano's implementations) for the hidden layers: sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU). Please answer the following questions: (1) Experiment with various learning rates and report their performances. (2) The size of an architecture is the total number of scalars (weights and biases) used in the architecture. What are the sizes of the two architectures?
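For concreteness, here is a minimal sketch of how the first architecture might be built and trained, assuming the standard network3.py API (Network, FullyConnectedLayer, SoftmaxLayer, ReLU, load_data_shared) from the book's code; the learning rate 0.1 is just one value to try. For question (2), recall that a fully connected layer from n_in to n_out neurons contributes n_in*n_out weights plus n_out biases.

import network3
from network3 import Network, FullyConnectedLayer, SoftmaxLayer, ReLU
# sigmoid and tanh are also available through network3's imports

training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10

# Architecture [784, 100, 100, 100, 10] with ReLU hidden layers;
# swap activation_fn for sigmoid or tanh to run the comparison.
net = Network([
    FullyConnectedLayer(n_in=784, n_out=100, activation_fn=ReLU),
    FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
    FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)

# 30 epochs; vary eta (here 0.1) across runs for question (1)
net.SGD(training_data, 30, mini_batch_size, 0.1,
        validation_data, test_data)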
Problem 2.
Make a copy of network3.py, name it network3BN.py, and implement the idea of batch normalization in this file. For each layer of the network (whether a fully connected or softmax layer), introduce two scalar Theano variables, self.gamma and self.beta, whose values can be learned by SGD. Create a function
batchNorm(z_wsum, gamma, beta)
where z_wsum is a batch of weighted sums and gamma and beta are the batch-normalization parameters. The function will be used by each layer: it normalizes z_wsum into z_norm and returns (gamma*z_norm + beta). Since z_wsum, gamma, and beta are Theano expressions, the result is also a Theano expression.
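A minimal sketch of such a function is below, normalizing each neuron's weighted sum across the mini-batch dimension; the small eps term for numerical stability is an assumption, not part of the specification above.

import theano.tensor as T

def batchNorm(z_wsum, gamma, beta, eps=1e-6):
    # z_wsum: (mini_batch_size, n_out) Theano matrix of weighted sums.
    # Normalize each column (neuron) to zero mean, unit variance over the batch.
    mean = T.mean(z_wsum, axis=0)
    var = T.var(z_wsum, axis=0)
    z_norm = (z_wsum - mean) / T.sqrt(var + eps)
    # gamma and beta rescale and shift the normalized values; because they are
    # Theano variables, the result stays a symbolic expression that SGD can
    # differentiate through.
    return gamma * z_norm + beta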
Experiment with this batch-normalized implementation at various learning rates, compare it against the best performer from Problem 1 on the same architecture, and draw your conclusions.
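One hypothetical way to make gamma and beta learnable is shown below, using the batchNorm sketch above on a random batch; inside a layer, the same shared variables would be stored as self.gamma and self.beta and appended to self.params so that network3BN.py's SGD updates them like weights and biases. Names and initial values here are illustrative assumptions.

import numpy as np
import theano
import theano.tensor as T

# Learnable scalars, initialized to the identity transform (gamma=1, beta=0)
gamma = theano.shared(np.asarray(1.0, dtype=theano.config.floatX), name='gamma')
beta = theano.shared(np.asarray(0.0, dtype=theano.config.floatX), name='beta')

z = T.matrix('z_wsum')           # a mini-batch of weighted sums
out = batchNorm(z, gamma, beta)  # still a symbolic Theano expression
f = theano.function([z], out)

batch = np.random.randn(10, 5).astype(theano.config.floatX)
print(f(batch))  # each column now has ~zero mean and unit variance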
Please submit everything required, including the changed code and output of a sample run, in the ICON dropbox for Assignment 4 before the deadline.
Thank you!