CS:4980:006 Deep Learning Assignment 3: Due 9/20/2018


In this assignment, we will reuse the Python program for creating multi-layer general neural networks, mlnnSGD.py, and its application mnist.py to the MNIST data set mnist.pkl.gz.

  1. To see the impact of weight decay, before we update the weights for each mini-batch, we multiply the weights by one minus a small constant mu (see the weight-decay sketch after this list). For the default architecture (784, 30, 10) in mnist.py, try the following values for mu: 0.001, 0.005, 0.01, 0.05, and 0.1, with each of the learning rates 0.05, 0.1, 0.2, and 0.4. Record the error rate in each case. What conclusions can you draw from this experiment?

  2. For the MNIST data set, create a network with the architecture (784, 60, 30, 10), using the quadratic cost function with mini-batch size 100 and 10 epochs. Record the minimal error rate on the test set for each learning rate in 0.01, 0.02, 0.04, 0.08, and 0.16, when the initial weights are multiplied by 0.5, 0.2, 0.1, and 0.05 (see the initialization sketch after this list). What conclusion can be drawn from this experiment?

  3. For the same architecture and the best weight multiplier from the previous problem, experiment with various learning rates, three different activation functions for the hidden layers (sigmoid, hyperbolic tangent, and rectified linear), and two cost functions with sigmoid output (quadratic cost and cross-entropy), as sketched after this list. What conclusion can be drawn from this experiment?
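For reference, here is a minimal sketch of how the weight-decay step in Problem 1 might be added to the mini-batch update. It assumes a network class in the style of mlnnSGD.py that stores its parameters in self.weights and self.biases and computes per-example gradients with self.backprop; those names are assumptions, not necessarily the ones used in the actual file.

    import numpy as np

    def update_mini_batch(self, mini_batch, eta, mu):
        """One SGD step, with weight decay applied before the gradient update."""
        # Weight decay: shrink every weight by the factor (1 - mu)
        # before this mini-batch's gradient step.  (Attribute names
        # are assumed; adapt to the actual ones in mlnnSGD.py.)
        self.weights = [(1.0 - mu) * w for w in self.weights]

        # Accumulate gradients over the mini-batch, then take one step.
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]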
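Similarly, the initial-weight scaling in Problem 2 might look like the following sketch, assuming mlnnSGD.py initializes weights with standard normal samples; the function name init_weights and the scale parameter are illustrative, not taken from the file.

    import numpy as np

    def init_weights(sizes, scale):
        """Randomly initialize a network, scaling the initial weights."""
        biases = [np.random.randn(y, 1) for y in sizes[1:]]
        weights = [scale * np.random.randn(y, x)
                   for x, y in zip(sizes[:-1], sizes[1:])]
        return weights, biases

    # Example: the Problem 2 architecture with initial weights scaled by 0.1.
    weights, biases = init_weights([784, 60, 30, 10], 0.1)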
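Finally, the activation and cost choices in Problem 3 correspond to standard formulas. The sketch below shows the three hidden-layer activations with their derivatives, and the output-layer error (delta) that each cost produces at a sigmoid output layer; note that with cross-entropy the sigmoid-derivative factor cancels, which is the usual reason it trains faster.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def tanh(z):
        return np.tanh(z)

    def tanh_prime(z):
        return 1.0 - np.tanh(z) ** 2

    def relu(z):
        return np.maximum(0.0, z)

    def relu_prime(z):
        return (z > 0).astype(z.dtype)

    def quadratic_delta(a, y, z):
        # Quadratic cost C = ||a - y||^2 / 2 at a sigmoid output:
        # delta = (a - y) * sigmoid'(z)
        return (a - y) * sigmoid_prime(z)

    def cross_entropy_delta(a, y, z):
        # Cross-entropy cost at a sigmoid output: the sigmoid'(z)
        # factor cancels, so delta = a - y.
        return a - y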

Please submit everything required, including the changed code and the output of a sample run, to the ICON dropbox for Assignment 3 before the deadline.

Thank you!