CS:4980:006 Deep Learning Assignment 3: Due 9/20/2018
In this assignment, we will reuse the Python program for creating general multi-layer neural networks, mlnnSGD.py, and its application mnist.py to the MNIST data set mnist.pkl.gz.
- To see the impact of weight decay, before updating the weights for
each mini-batch, we multiply the weights by one minus a small
constant μ (a sketch of this update appears after this list). For
the default architecture (784, 30, 10) in mnist.py, try the
following values for μ: 0.001, 0.005, 0.01, 0.05, and 0.1, using the
learning rates 0.05, 0.1, 0.2, and 0.4. Record the error rate in
each case. What conclusions can you draw from this experiment?
- For the MNIST data set, create a network with the architecture
(784, 60, 30, 10), using the quadratic cost function with mini-batch
size 100 and 10 epochs. Record the minimal error rate on the test
set for the learning rates 0.01, 0.02, 0.04, 0.08, and 0.16, with
the initial weights multiplied by 0.5, 0.2, 0.1, and 0.05 (a sample
sweep appears after this list). What conclusion can be drawn from
this experiment?
- For the same architecture and the best weight multiplier from the
first problem, experiment with various learning rates; three
different activation functions for the hidden layers: sigmoid,
hyperbolic tangent, and rectified linear; and two cost functions
with sigmoid output: quadratic cost and cross entropy (sample
definitions appear after this list). What conclusion can be drawn
from this experiment?
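
For the weight-decay step in the first problem, here is a minimal
sketch, assuming the network keeps its parameters in lists
self.weights and self.biases of NumPy arrays and has a backprop(x, y)
method, in the style of Michael Nielsen's network code; these names
are assumptions about mlnnSGD.py, so adapt them to the actual code.

    import numpy as np

    def update_mini_batch_with_decay(self, mini_batch, eta, mu):
        """One SGD step with weight decay: shrink every weight matrix
        by (1 - mu) before applying the usual gradient update.
        Hypothetical method for a Nielsen-style network class."""
        self.weights = [(1.0 - mu) * w for w in self.weights]
        # Accumulate the gradients over the mini-batch as usual.
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_b, delta_w = self.backprop(x, y)  # assumed existing method
            nabla_b = [nb + db for nb, db in zip(nabla_b, delta_b)]
            nabla_w = [nw + dw for nw, dw in zip(nabla_w, delta_w)]
        # Standard averaged gradient-descent step.
        self.weights = [w - (eta / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (eta / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]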
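
For the second problem, a sweep over the learning rates and
initial-weight multipliers might look like the sketch below. The
names Network, SGD, evaluate, and the data loader are assumptions
about the interface of mlnnSGD.py and mnist.py (a Nielsen-style
interface); substitute the actual names from the course code.

    from mlnnSGD import Network   # assumed module and class names
    import mnist                  # assumed to expose the MNIST loader

    training_data, validation_data, test_data = mnist.load_data_wrapper()  # assumed

    results = {}
    for scale in [0.5, 0.2, 0.1, 0.05]:
        for eta in [0.01, 0.02, 0.04, 0.08, 0.16]:
            net = Network([784, 60, 30, 10])
            # Scale the randomly initialized weights by the given multiplier.
            net.weights = [scale * w for w in net.weights]
            best_errors = len(test_data)
            for epoch in range(10):                  # run one epoch at a time
                net.SGD(training_data, 1, 100, eta)  # so we can test after each
                errors = len(test_data) - net.evaluate(test_data)
                best_errors = min(best_errors, errors)
            results[(scale, eta)] = best_errors / len(test_data)
            print(scale, eta, results[(scale, eta)])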
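
For the third problem, the three hidden-layer activations and the
output-layer error terms of the two cost functions (with a sigmoid
output layer) can be written as follows; this is a sketch of the
standard definitions, not code copied from mlnnSGD.py.

    import numpy as np

    # Candidate hidden-layer activation functions and their derivatives.
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def tanh(z):
        return np.tanh(z)

    def tanh_prime(z):
        return 1.0 - np.tanh(z) ** 2

    def relu(z):
        return np.maximum(0.0, z)

    def relu_prime(z):
        return np.where(z > 0, 1.0, 0.0)

    # Output-layer error (delta) for the two costs; z is the weighted
    # input, a = sigmoid(z) the output activation, y the one-hot target.
    def quadratic_delta(z, a, y):
        return (a - y) * sigmoid_prime(z)

    def cross_entropy_delta(z, a, y):
        # The sigmoid_prime factor cancels for cross-entropy, which is
        # why it avoids the learning slowdown on saturated outputs.
        return a - y

The cancellation in cross_entropy_delta is what lets the cross-entropy
cost keep learning when an output neuron saturates, which is worth
watching for in the recorded error rates.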
Please submit everything required, including the changed code and
output of a sample run, in the ICON dropbox for Assignment 3 before
the deadline.
Thank you!