**Neural Network Implemented in Matlab**

demo.m is the main file that runs the computation and calls the other functions (separate .m files):
- app_newton – my own implementation of Newton's algorithm in Matlab using the approximate Hessian matrix with second-order network derivatives
- backprop – function that calculates the derivatives (Jacobian) and the Hessian matrix
- exact_newton – Newton's algorithm using the exact Hessian computed with the R-propagation algorithm
- gradient_descent – the classical backpropagation algorithm
- model_network – model description from the figure in neural networks coursework2.pdf
- weight_initilaization – function for random weight matrix initialization
To build and compare the efficiency of the 3 learning algorithms, I performed the following steps:
1) describe the model and generate the input data (X, Y) for training
2) initialize the multilayer perceptron weights and parameters
3) choose the simulation parameters (num_epoch, learning rate)
4) run the main loop over epochs, training the 3 algorithms on the same input data: a) the classical backpropagation algorithm (studied during the lectures); b) Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (given in the book below); c) Newton's algorithm using the approximate Hessian with second-order network derivatives
5) plot the loss function over iterations to compare the convergence speed of the 3 chosen algorithms
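The key difference between the first-order and second-order updates in steps 4a–4c can be illustrated on a toy quadratic loss. This is an illustrative Python sketch, not the Matlab code; the matrix A and vector b are assumed for demonstration only. For a quadratic, the Hessian is constant, so Newton's method reaches the minimum in a single step while gradient descent takes many small ones.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w'Aw - b'w, with gradient Aw - b
# and constant Hessian A (values chosen only for illustration).
A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

# Gradient descent: many small steps with a fixed learning rate.
w_gd = np.zeros(2)
for _ in range(500):
    w_gd = w_gd - 0.1 * grad(w_gd)

# Newton's method: one step w - H^{-1} grad solves the quadratic exactly.
w_newton = np.zeros(2)
w_newton = w_newton - np.linalg.solve(A, grad(w_newton))

w_star = np.linalg.solve(A, b)  # closed-form minimizer for comparison
```

On a real MLP loss the Hessian changes with the weights, so Newton's step must be recomputed (exactly via R-propagation, or approximately) at every epoch.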

Figure 1. Convergence plot for 3 learning algorithms with simulation dataset
In the first step, I generate data from the following simple neural network. The function model_network performs this step, producing 10000 (x, y) samples. Then I initialize a neural network with 5 hidden units suitable for training. For weight initialization, I created the function weight_initilaization(), which randomly generates the hidden and output weight matrices.
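The data generation and weight initialization steps can be sketched as follows. This is an illustrative Python version, not the Matlab code: the target function, the input range, and the 0.1 initialization scale are assumptions, while the sample count (10000) and hidden-layer size (5) come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate 10000 (x, y) samples, as model_network does in the report;
# the sine target here is a stand-in for the actual teacher network.
n_samples, n_in, n_hidden = 10000, 1, 5
X = rng.uniform(-1.0, 1.0, size=(n_samples, n_in))
Y = np.sin(np.pi * X)

# Random initialization of hidden and output weight matrices,
# as weight_initilaization() does; the 0.1 scale is an assumption.
W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
W2 = 0.1 * rng.standard_normal((n_hidden, 1))
```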
First, I implemented my own version of the gradient descent (GD) algorithm in Matlab, computing the first derivatives with respect to the weights via backpropagation in the function gradient_descent(). I also calculated the Hessian matrix with the R-backpropagation algorithm from Nabney, I. (2002), Netlab: Algorithms for Pattern Recognition, Springer Advances in Pattern Recognition series, pp. 160-163, "Fast Multiplication by the Hessian" (the R-propagation algorithm). The output of this function is used by Newton's second-order optimizer. For comparison, I also implemented the second-order algorithm with an approximate Hessian matrix in the app_newton() function.
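The backpropagation gradient computation at the core of gradient_descent() can be sketched for a one-hidden-layer tanh network with squared-error loss. This is a minimal Python sketch under those assumptions (the exact Hessian via R-propagation is too long for a short example); the gradient is verified against a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 1))
Y = np.sin(X)
W1 = 0.1 * rng.standard_normal((1, 5))   # hidden-layer weights
W2 = 0.1 * rng.standard_normal((5, 1))   # output-layer weights

def loss_and_grads(W1, W2):
    H = np.tanh(X @ W1)                  # hidden activations
    out = H @ W2                         # linear output unit
    err = out - Y
    loss = 0.5 * np.mean(err ** 2)
    n = len(X)
    dW2 = H.T @ err / n                  # gradient w.r.t. output weights
    dH = (err @ W2.T) * (1 - H ** 2)     # backprop through tanh
    dW1 = X.T @ dH / n                   # gradient w.r.t. hidden weights
    return loss, dW1, dW2

loss, dW1, dW2 = loss_and_grads(W1, W2)

# Finite-difference check of one weight's gradient.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (loss_and_grads(W1p, W2)[0] - loss) / eps
```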
To sum up, I implemented 3 optimizer algorithms:
1) the classical backpropagation algorithm
2) Newton's algorithm using the exact Hessian computed with the R-propagation algorithm
3) Newton's algorithm using the approximate Hessian computed with the Levenberg-Marquardt optimization algorithm
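The Levenberg-Marquardt Hessian approximation used by the third optimizer replaces the exact Hessian of a sum-of-squares loss with J'J + λI, where J is the Jacobian of the residuals. A minimal Python sketch, with an illustrative random Jacobian standing in for the network's:

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.standard_normal((100, 4))   # residual Jacobian (100 samples, 4 parameters)
r = rng.standard_normal(100)        # residuals (illustrative values)
lam = 1e-3                          # damping parameter

# Gauss-Newton / Levenberg-Marquardt approximation: H ~ J'J + lam*I.
# The damping term makes H positive definite, so the solve never fails.
H_approx = J.T @ J + lam * np.eye(4)
step = np.linalg.solve(H_approx, J.T @ r)   # LM update direction
```

The damping λ interpolates between a Gauss-Newton step (small λ) and a scaled gradient step (large λ), which is why this variant is often more robust than Newton's method with the exact Hessian.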

Figure 2. Convergence plot for 3 learning algorithms with sunspot dataset

The Monte Carlo simulation was run for 500 epochs with a learning rate of 0.001. Figure 1 shows that Newton's algorithm using the approximate Hessian computed with the Levenberg-Marquardt optimization algorithm (red curve) performs best, compared with the classical backpropagation algorithm (blue curve) and Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (green curve). I also tested the 3 learning algorithms on the sunspot dataset (sunspot.dat). Figure 2 shows that on this dataset Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (green curve) performs best. Before building the model, I normalized the features. From these results, we can conclude that the MLP is not the best-performing neural network architecture for time series prediction.
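The feature normalization applied before training on the sunspot data can be sketched as a standard z-score transform; the exact scheme used in the Matlab code is not specified, so this is one common choice, not necessarily the author's.

```python
import numpy as np

# Z-score normalization: subtract the mean and divide by the standard
# deviation so the feature has mean 0 and unit variance (assumed scheme).
x = np.array([10.0, 20.0, 30.0, 40.0])
x_norm = (x - x.mean()) / x.std()
```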