Author: Brian A. Ree
                            
                            
                            
                            
                            1: Class Template
                            
                                Let's start by coming up with a general outline for our neural network class. From our experience with the previous 
                                tutorial we can define three general functions our neural network class should support.
                            
                            
                            
                                - init: Setup our nodes in the three layers we talked about in the previous tutorial.
 
                                - train: Refine our connection weights after being fed input from a training set of data and compare results to an expected outcome.
 
                                - query: Give an answer from the output nodes after being given an input.
 
                            
                            
                            
                                Again, these are the general tasks our class must support, but there may be more things it'll need to do; we'll cross that bridge when we come to it.
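
                                To keep that outline in front of us, here is a minimal sketch of the class, assuming we call it neuralNetwork (the name is just a placeholder); the method bodies are filled in over the rest of this tutorial.

# A minimal sketch of the class outline; the bodies are filled in below.
# Assumes numpy and scipy are installed.
import numpy
import scipy.special

class neuralNetwork:
    # Initialize a simple 3 layer neural network.
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        pass

    # Train the network on one example.
    def train(self, inputs_list, answers_list):
        pass

    # Query the network for an output given some input.
    def query(self, inputs_list):
        pass
# eclass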
                            
                            
                            
                            
                            2: Initializing the Network
                            
                                Our simple little network has three layers: input, hidden, and output. The hidden layer is really just a middle layer, but since all of its inputs are fed from the
                                input layer we consider it hidden, because we can't get at its values as easily as we can with the input and output layers.
                                We don't want to hard code anything in our network, so we'll take passed-in parameters to set the size of each layer. We have to be careful and make sure that our input
                                parameters make sense. We should check that our input, hidden, and output node counts all match. We also need to set a learning rate.
                                Remember, our learning rate controls how we step toward the lowest point on our error curve. If this rate is very high we'll completely overshoot the minimum
                                and never find it, or we could end up bouncing back and forth around some local minimum of the error curve. We should choose a small learning rate, something less than 1, so that
                                the adjustments to our weights stay small.
                            
                            
                            
# Initialize a simple 3 layer neural network.
def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
    # Set the number of nodes in each layer.
    self.inodes = inputnodes
    self.hnodes = hiddennodes
    self.onodes = outputnodes
    # if self.inodes != self.hnodes or self.inodes != self.onodes:
    #    print("Error: You must provide node counts of the same size.")
    #    print("Unexpected results may occur.")
    # eif
    # Set the learning rate.
    self.lr = learningrate
    if self.lr > 1.0:
        print("Error: You must provide a learning rate that is less than or equal to one.")
        print("Unexpected results may occur.")
    # eif
    # Set the link weight matrices, wih and who.
    self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
    self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
    # Set the activation function for our neurons.
    self.activation_function = lambda x: scipy.special.expit(x)
# edef
                             
                            
                            
                                We offload our input parameters to instance variables so we can access them in our other methods.
                                Next up we store the learning rate in an instance variable, and then we do a quick check to make sure the learning rate isn't too large.
                                For now we'll just print an error message if we detect something wrong with our input parameters.
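
                                As a quick sanity check, creating an instance might look something like this, assuming the class is named neuralNetwork as in the sketch above (the node counts and learning rate are arbitrary):

# Hypothetical example: a tiny 3-3-3 network with a learning rate of 0.3.
n = neuralNetwork(3, 3, 3, 0.3)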
                            
                            
                            
                            
                            3: Initializing the Weights
                            
                                The next step is to create a network of neural nodes and links. The most important part of the network is the link weights.
                                They're used to calculate the signal being fed forward and the back propagated error. It is the link weights that are refined
                                in an attempt to improve the network.
                            
                            
                            
                                Weights can be represented as a matrix, so we can define them as follows:
                            
                            
                            
                                - A matrix for the weights for the links between the input and hidden layers, w_input_hidden of size self.hnodes by self.inodes or hidden_nodes by input_nodes.
 
                                - A matrix for the weights for the links between the hidden and output layers, w_hidden_output of size self.onodes by self.hnodes or output_nodes by hidden_nodes.
 
                            
                            
                            
                                Common practice is to set the initial link weight values to small random numbers. The following numpy function
                                generates an array of values selected randomly between 0 and 1, with a size of rows X columns.
                            
                            
                            
numpy.random.rand(rows, columns)
                             
                            
                            
                                We'll use a slightly more refined approach, sampling each weight from a normal distribution centered at zero with a standard deviation of 1 over the square root of the number of incoming links into a node. Let's set up our weight matrices.
                            
                            
                            
# Set the link weight matrices, wih and who.
self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
                             
                            
                            
                                Initializing the link weight matrices is a subtle but important step in the neural network design process.
                                We initialize a matrix of shape self.hnodes X self.inodes for our wih class variable, and we
                                initialize a matrix of shape self.onodes X self.hnodes for our who class variable.
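
                                To make the shapes concrete, here's a quick sketch with arbitrary node counts (3 input, 4 hidden, and 2 output nodes are just example values):

import numpy

# Example node counts, chosen only to make the shapes easy to tell apart.
inodes, hnodes, onodes = 3, 4, 2
wih = numpy.random.normal(0.0, pow(hnodes, -0.5), (hnodes, inodes))
who = numpy.random.normal(0.0, pow(onodes, -0.5), (onodes, hnodes))
print(wih.shape)  # (4, 3) -> hidden_nodes rows by input_nodes columns
print(who.shape)  # (2, 4) -> output_nodes rows by hidden_nodes columns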
                            
                            
                            
                            
                            4: Querying the Network
                            
                                By querying the network we are asking the network to infer an answer based on some input values. The inferred output is generated
                                by feeding the input signals forward through our neural network; the weights of the connections between our artificial neurons moderate those signals and
                                produce an output signal. Our class method, query, takes the input values, feeds them through our neural network, and returns the
                                network's output.
                            
                            
                            
def query(self, inputs_list):
    # Convert inputs to a two dimensional matrix
    inputs = numpy.array(inputs_list, ndmin=2).T
    # Calculate the signals going into the hidden layer
    hidden_inputs = numpy.dot(self.wih, inputs)
    # Calculate the signals emerging from the hidden layer
    hidden_outputs = self.activation_function(hidden_inputs)
    # Calculate signals into final output layer
    final_inputs = numpy.dot(self.who, hidden_outputs)
    # Calculate the signals emerging from the final output layer
    final_outputs = self.activation_function(final_inputs)
    return final_outputs
# edef
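
                                Before walking through each step, here's what a call might look like, assuming the small 3-3-3 network n created earlier (the input values are arbitrary):

# Hypothetical query: feed three input values through the network.
print(n.query([1.0, 0.5, -1.5]))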
                             
                            
                            
                                To perform this task we need to pass the input signals from the input layer of nodes, through the hidden layer, and out
                                of the final output layer. Remember also that we use the link weights to moderate the signals as they feed into any given hidden or
                                output node, and we use the sigmoid activation function to alter the signals coming out of the respective network nodes.
                            
                            
                            
                                Before we can move forward we need to format the input data so that it has the proper shape.
                                We're planning to use this matrix in future calculations, so we have to make sure we can use it in a
                                matrix dot product.
                            
                            
                            
# Convert inputs to a two dimensional matrix
inputs = numpy.array(inputs_list, ndmin=2).T
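
                                For example, a plain Python list of three values becomes a 3 by 1 column vector after this conversion (the values here are arbitrary):

import numpy

inputs_list = [1.0, 0.5, -1.5]
inputs = numpy.array(inputs_list, ndmin=2).T
print(inputs.shape)  # (3, 1)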
                             
                            
                            
                                Using matrices, we can combine the matrix of weights for the links between the input and hidden layers with the matrix
                                of inputs to generate the signals that are fed into the hidden layer nodes.
                                
                                x_hidden = w_input_hidden * inputs (where * is the matrix dot product)
                            
                            
                            
                                Now look how easy it is to express this in Python.
                            
                            
                            
# Calculate the signals going into the hidden layer
hidden_inputs = numpy.dot(self.wih, inputs)
                             
                            
                            
                                To get the signals emerging from the hidden nodes, we apply the sigmoid activation function to each of them.
                                Remember, the sigmoid function is available in the scipy.special library as the expit function, and we created a
                                shortcut for it in our initialization method, self.activation_function = lambda x: scipy.special.expit(x).
                                
                                o_hidden = sigmoid(x_hidden)
                                
                                Now to express this in Python.
                            
                            
                            
# Calculate the signals emerging from the hidden layer
hidden_outputs = self.activation_function(hidden_inputs)
                             
                            
                            
                                This step stores the signals emerging from the hidden layer nodes in the matrix called hidden_outputs. The process used
                                for the signals between the hidden and output nodes is similar.
                            
                            
                            
# Calculate signals into final output layer
final_inputs = numpy.dot(self.who, hidden_outputs)
    
# Calculate the signals emerging from the final output layer
final_outputs = self.activation_function(final_inputs)
                             
                            
                            
                                The next method we'll flesh out is the train method. Remember, there are two phases to training: the first is calculating
                                the output just as the query method does, and the second is back propagating the errors to inform the network how
                                the link weights should be refined.
                            
                            
                            
                            
                            5: Training the Network
                            
                                Now that we have our init and query methods defined we have to complete the train method.
                                The train method will run data through our network and adjust the network weights based on a comparison between the
                                expected output and the generated output. There are two parts to the training step.
                            
                            
                            
                                - 1: Working out the output for a given training example. This is the same functionality as the query method.
 
                                - 2: Working out the network weight adjustments by taking the error, the expected outcome compared to the generated outcome, and back propagating
                                it through the network.
 
                            
                            
                            
def train(self, inputs_list, answers_list):
    # Convert inputs to a two dimensional matrix
    inputs = numpy.array(inputs_list, ndmin=2).T
    answers = numpy.array(answers_list, ndmin=2).T
    # Calculate the signals going into the hidden layer
    hidden_inputs = numpy.dot(self.wih, inputs)
    # Calculate the signals emerging from the hidden layer
    hidden_outputs = self.activation_function(hidden_inputs)
    # Calculate signals into final output layer
    final_inputs = numpy.dot(self.who, hidden_outputs)
    # Calculate the signals emerging from the final output layer
    final_outputs = self.activation_function(final_inputs)
    # Output layer error is the (answer - guess)
    output_errors = answers - final_outputs
    # Hidden layer error is the output_errors, split by weights, recombined at hidden nodes
    hidden_errors = numpy.dot(self.who.T, output_errors)
    # Update the weights for the links between the hidden and output layers
    self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
    # Update the weights for the links between the input and hidden layers
    self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
# edef
                             
                            
                            
                                This code is almost exactly the same as that in the query method, because we're feeding the signal forward from the input layer to the final
                                output layer in exactly the same way. The only difference thus far is that we have an additional parameter, answers_list, defined in the
                                train function, because you can't train the network without an expected output.
                            
                            
                            
                                The inputs_list and answers_list are converted into numpy arrays. We're getting closer to the back propagation step and the weight
                                refinement based on the error. First, let's calculate the error.
                            
                            
                            
# Output layer error is the (answer - guess)
output_errors = answers - final_outputs
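
                                For example, with purely illustrative numbers, if the expected answers were 0.1, 0.9, 0.1 and the network produced 0.3, 0.8, 0.2, the output errors would be roughly -0.2, 0.1, -0.1:

import numpy

# Purely illustrative numbers: the error is simply (answer - guess), element by element.
answers = numpy.array([[0.1], [0.9], [0.1]])
final_outputs = numpy.array([[0.3], [0.8], [0.2]])
output_errors = answers - final_outputs
print(output_errors.T)  # approximately [[-0.2  0.1 -0.1]]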
                             
                            
                            
                                Next we need to calculate the back-propagated errors for the hidden layer nodes. The matrix form of this calculation is as follows.
                                
                                errors_hidden = weights_T_hidden_output * errors_output
                                
                                Where weights_T_hidden_output is the matrix transpose of the weights_hidden_output matrix.
                                Again, we are altering the shape of the matrix so that we can use it properly in our matrix expressions.
                                This is expressed in Python as follows.
                            
                            
                            
# Hidden layer error is the output_errors, split by weights, recombined at hidden nodes
hidden_errors = numpy.dot(self.who.T, output_errors)
                             
                            
                            
                                We now have what we need to refine the weights between each layer. For the weights between the hidden and output layers, we
                                use the output_errors variable. For the weights between the input and the hidden layers, we use the hidden_errors list we
                                just calculated.
                            
                            
                            
                                The expression for updating the weights for the link between a node j and a node k in the next layer is a matrix of the 
                                following form.
                                
                                DELTA W_jk = ALPHA . E_k . sigmoid(O_k) . (1 - sigmoid(O_k)) * O_T_j
                            
                            
                            
                                The alpha is the learning rate, and the sigmoid is the node activation function we saw before. Remember that the . denotes
                                element-by-element (and scalar) multiplication, and * is the matrix dot product. The Python code for this expression is as follows.
                            
                            
                            
# Update the weights for the links between the hidden and output layers
self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
                             
                            
                            
                                The code for the other weights between the input and hidden layers will be very similar.
                            
                            
                            
# Update the weights for the links between the input and hidden layers
self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
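
                                Putting it all together, a single training pass on our hypothetical 3-3-3 network n from earlier might look like this (the input and answer values are arbitrary placeholders):

# Hypothetical training step: three input values and three expected output values.
n.train([1.0, 0.5, -1.5], [0.1, 0.9, 0.1])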
                             
                            
                            
                                In the next tutorial we'll use our code to recognize handwritten numbers!! What?!?!