
# Anisotropic Diffusion Expert

New Delhi, India

## Sumeet A

Bachelor of Science, Physics Quantum Coding, Osmania University

Profession

Certified computer vision homework help expert

Skills

I am a certified academic assistant currently working with Matlab Assignment Experts to provide educational support to college and university students. My primary area of specialization is computer vision, and so far I have delivered numerous projects on topics such as projective geometry, graph cut estimation, the level set method, motion fields and vectors, the medial axis, particle filtering, gesture recognition, Markov random fields, optical flow, and more. We should definitely collaborate if you are looking for top-quality solutions, on-time delivery, and a better understanding of computer vision.


## Neural Network Implemented in Matlab

demo.m is the main file; it runs the computations by calling the following functions (separate .m files):

- app_newton – my own version of Newton's algorithm in Matlab, using an approximate Hessian matrix built from second-order network derivatives
- backprop – function that calculates the derivatives (Jacobian) and the Hessian matrix
- exact_newton – Newton's algorithm using the exact Hessian computed with the R-propagation algorithm
- gradient_descent – the classical backpropagation algorithm
- model_network – model description from the figure in neural networks coursework2.pdf
- weight_initilaization – function for random weight matrix initialization

To build and compare the efficiency of the three learning algorithms, I performed the following steps:

1) Describe the model and generate the input data (X, Y) for the training algorithms.
2) Initialize the multilayer perceptron weights and parameters.
3) Choose the simulation parameters (num_epoch, learning rate).
4) Run the main loop over epochs, training the three algorithms on the same input data:
   a) the classical backpropagation algorithm (studied during the lectures);
   b) Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (from the reference below);
   c) Newton's algorithm using an approximate Hessian matrix built from second-order network derivatives.
5) Plot the loss function over the iterations to compare the convergence speed of the three algorithms.

Figure 1. Convergence plot for the 3 learning algorithms on the simulated dataset.

In the first step, I generate data from a simple neural network. The function model_network performs this step and outputs 10000 samples (x, y). Then I initialize a neural network with 5 hidden units for training. For the weight initialization, I created the function weight_initilaization(), which generates random hidden and output weight matrices.

First, I implemented my own version of the gradient descent (GD) algorithm in Matlab, computing the first derivatives with respect to the weights by backpropagation. In the function gradient_descent() I also calculated the Hessian matrix with the R-backpropagation algorithm from Nabney, I. (2002), Netlab: Algorithms for Pattern Recognition, Springer, Advances in Pattern Recognition series, pp. 160-163, "Fast Multiplication by the Hessian" (the R-propagation algorithm). The output of this function is used by Newton's second-order optimizer. For comparison, I also implemented the approximate-Hessian second-order algorithm in the app_newton() function.

To sum up, I implemented three optimizer algorithms:

1) the classical backpropagation algorithm;
2) Newton's algorithm using the exact Hessian computed with the R-propagation algorithm;
3) Newton's algorithm using an approximate Hessian computed with the Levenberg-Marquardt optimization algorithm.

The Monte-Carlo simulation was run for 500 epochs with a learning rate of 0.001. Figure 1 shows that Newton's algorithm using the approximate Hessian computed with the Levenberg-Marquardt optimization algorithm (red curve) performs best, compared with the classical backpropagation algorithm (blue curve) and Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (green curve). I also ran the three learning algorithms on the sunspot dataset (sunspot.dat). Figure 2 shows that Newton's algorithm using the exact Hessian computed with the R-propagation algorithm (green curve) performs best there.

Figure 2. Convergence plot for the 3 learning algorithms on the sunspot dataset.
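To make the steps above concrete, here is a minimal sketch of one epoch of the classical backpropagation (gradient descent) update for a one-hidden-layer network with tanh hidden units and a linear output. The variable names (X, Y, W1, b1, W2, b2, eta) and the toy data are illustrative assumptions, not the actual coursework code.

```matlab
% Minimal sketch of one epoch of classical backpropagation (gradient descent)
% for a 1-hidden-layer MLP with tanh hidden units and a linear output.
% All names and the toy data are illustrative assumptions.

% illustrative training data standing in for the output of model_network()
X = randn(1, 10000);                       % 10000 samples, as in the write-up
Y = sin(2*X) + 0.1*randn(1, 10000);        % assumed toy target

n_hidden = 5;                              % 5 hidden units, as in the write-up
[n_in, n_samples] = size(X);
eta = 0.001;                               % learning rate used in the simulations

% random weight initialization (the role of weight_initilaization())
W1 = 0.1*randn(n_hidden, n_in);  b1 = zeros(n_hidden, 1);
W2 = 0.1*randn(1, n_hidden);     b2 = 0;

% ---- forward pass ----
A1   = W1*X + b1*ones(1, n_samples);       % hidden pre-activations
Z    = tanh(A1);                           % hidden activations
Yhat = W2*Z + b2*ones(1, n_samples);       % linear output

E    = Yhat - Y;                           % residuals
loss = 0.5*sum(E.^2)/n_samples;            % mean squared error (recorded per epoch)

% ---- backward pass (error signals and gradients) ----
delta2 = E/n_samples;                      % output-layer error signal
delta1 = (1 - Z.^2) .* (W2'*delta2);       % hidden-layer error signal

gW2 = delta2*Z';   gb2 = sum(delta2, 2);
gW1 = delta1*X';   gb1 = sum(delta1, 2);

% ---- gradient-descent update ----
W2 = W2 - eta*gW2;   b2 = b2 - eta*gb2;
W1 = W1 - eta*gW1;   b1 = b1 - eta*gb1;
```

In the full experiment this block would sit inside the loop over epochs, with the loss appended to a vector for the convergence plots.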
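The exact-Hessian variant relies on the fast Hessian-vector product (R-propagation) cited from Nabney (2002). The sketch below is my own minimal rendering of that product H*v for the same one-hidden-layer tanh network with squared-error loss; it reuses the quantities from the previous sketch, and the packing of the direction (V1, Vb1, V2, Vb2) is an assumption rather than the coursework's exact_newton code.

```matlab
% Minimal sketch of the R-propagation (Pearlmutter) trick: compute the
% Hessian-vector product H*v without ever forming the Hessian explicitly.
% Reuses X, W1, b1, W2, Z, delta2 from the previous sketch; the direction
% (V1, Vb1, V2, Vb2) is an arbitrary vector in weight space.

N   = size(X, 2);
V1  = randn(size(W1));   Vb1 = randn(size(b1));
V2  = randn(size(W2));   Vb2 = randn(size(b2));

% ---- R-forward pass: directional derivatives of the activations ----
RA1   = V1*X + Vb1*ones(1, N);
RZ    = (1 - Z.^2) .* RA1;
RYhat = V2*Z + Vb2*ones(1, N) + W2*RZ;

% ---- R-backward pass: directional derivatives of the error signals ----
Rdelta2 = RYhat/N;                                  % linear output, squared error
Rdelta1 = (-2*Z.*RZ) .* (W2'*delta2) ...
        + (1 - Z.^2) .* (V2'*delta2 + W2'*Rdelta2);

% ---- blocks of H*v, one per weight matrix / bias vector ----
HvW2 = Rdelta2*Z' + delta2*RZ';
Hvb2 = sum(Rdelta2, 2);
HvW1 = Rdelta1*X';                                  % X is fixed, only Rdelta1 contributes
Hvb1 = sum(Rdelta1, 2);
```

Applying this product to each unit direction in weight space yields the exact Hessian column by column, which is how the exact Newton step can be assembled for a network this small.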
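The app_newton() variant is described as using an approximate Hessian in the Levenberg-Marquardt style. A minimal sketch of that idea, assuming the squared-error loss so the Hessian can be approximated by J'*J + lambda*I from the network Jacobian, is shown below; the Jacobian assembly and the damping value are illustrative, not the coursework code.

```matlab
% Minimal sketch of a Levenberg-Marquardt style approximate-Hessian step:
% for a squared-error loss the Hessian is approximated by J'*J + lambda*I,
% where J is the Jacobian of the network output w.r.t. all weights.
% Reuses X, Y, W1, b1, W2, b2, Z, Yhat from the first sketch.

N     = size(X, 2);
n_par = numel(W1) + numel(b1) + numel(W2) + numel(b2);

J = zeros(N, n_par);                   % one Jacobian row per sample
for n = 1:N
    x    = X(:, n);
    z    = Z(:, n);
    d_a1 = W2' .* (1 - z.^2);          % dy/da1 (hidden pre-activations)
    dW1  = d_a1 * x';                  % dy/dW1
    J(n, :) = [dW1(:)', d_a1', z', 1]; % [dW1, db1, dW2, db2] blocks
end

E      = (Yhat - Y)';                  % residuals, N x 1
lambda = 1e-2;                         % assumed damping parameter

g        = J'*E;                       % gradient of the squared-error loss
H_approx = J'*J + lambda*eye(n_par);   % Levenberg-Marquardt approximate Hessian

step = H_approx \ g;                   % damped (approximate) Newton step

% unpack the step and update each weight block
k   = 0;
dW1 = reshape(step(k+1:k+numel(W1)), size(W1));  k = k + numel(W1);
db1 = step(k+1:k+numel(b1));                     k = k + numel(b1);
dW2 = reshape(step(k+1:k+numel(W2)), size(W2));  k = k + numel(W2);
db2 = step(k+1);

W1 = W1 - dW1;   b1 = b1 - db1;
W2 = W2 - dW2;   b2 = b2 - db2;
```

Because the approximation drops the second-derivative terms and adds damping, each step is cheaper and numerically better conditioned than the exact Newton step, which is consistent with the faster convergence seen in Figure 1.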
Before model building, I applied feature normalization to the data. From the above results, we can conclude that the MLP is not the best-performing neural network architecture for time series prediction.
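Regarding the feature normalization mentioned above, a minimal sketch of the usual z-score normalization is given below; it assumes the sunspot series is loaded from sunspot.dat, and the column layout and variable names are assumptions.

```matlab
% Minimal sketch of z-score feature normalization before model building.
% Assumes sunspot.dat is a plain ASCII file and that the last column holds
% the sunspot counts; both are assumptions about the dataset layout.
raw    = load('sunspot.dat');
series = raw(:, end);

mu    = mean(series);
sigma = std(series);
series_norm = (series - mu) / sigma;   % zero mean, unit variance

% the same mu and sigma should be reused to de-normalize the predictions
```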