Backpropagation
1.1 Theory and Examples
The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of Bernard Widrow and Marcian Hoff were designed to train single-layer perceptron-like networks. These single-layer networks suffer from the disadvantage that they are only able to solve linearly separable classification problems. Both Rosenblatt and Widrow were aware of these limitations and proposed multilayer networks that could overcome them, but they were not able to generalize their algorithms to train these more powerful networks.
Apparently the first description of an algorithm to train multilayer networks was contained in the thesis of Paul Werbos in 1974. This thesis presented the algorithm in the context of general networks, with neural networks as a special case, and was not disseminated in the neural network community. It was not until the mid 1980s that the backpropagation algorithm was rediscovered and widely publicized. It was rediscovered independently by David Rumelhart, Geoffrey Hinton and Ronald Williams, by David Parker, and by Yann Le Cun. The algorithm was popularized by its inclusion in the book Parallel Distributed Processing, which described the work of the Parallel Distributed Processing Group led by psychologists David Rumelhart and James McClelland. The publication of this book spurred a torrent of research in neural networks. The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network. In this chapter we will first investigate the capabilities of multilayer networks and then present the backpropagation algorithm.
1.2.1 Multilayer Perceptrons
We first introduce the notation for multilayer networks. For ease of reference we have reproduced the diagram of the three-layer perceptron in Figure 1-1. Note that we have simply cascaded three perceptron networks. The output of the first network is the input to the second network, and the output of the second network is the input to the third network. Each layer may have a different number of neurons, and even a different transfer function. We are using superscripts to identify the layer number. Thus, the weight matrix for the first layer is written as W1 and the weight matrix for the second layer is written as W2.
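The cascading described above can be sketched in a few lines of code. The layer sizes, transfer functions, and variable names below are illustrative assumptions, not taken from Figure 1-1; the point is only that each layer's output becomes the next layer's input, with one weight matrix and bias vector per layer.

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function (a common hidden-layer choice)."""
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    """Linear transfer function (a common output-layer choice)."""
    return n

rng = np.random.default_rng(0)

# Weight matrices and biases for each layer; the superscript in the text
# (W1, W2, ...) becomes a suffix here. Assumed sizes: 2 inputs -> 3 -> 3 -> 1.
W1, b1 = rng.standard_normal((3, 2)), rng.standard_normal((3, 1))
W2, b2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 1))
W3, b3 = rng.standard_normal((1, 3)), rng.standard_normal((1, 1))

def forward(p):
    """Cascade the three layers: each output feeds the next layer."""
    a1 = logsig(W1 @ p + b1)    # first-layer output
    a2 = logsig(W2 @ a1 + b2)   # second layer takes a1 as input
    a3 = purelin(W3 @ a2 + b3)  # third layer takes a2 as input
    return a3

p = np.array([[1.0], [-1.0]])   # a sample input vector
print(forward(p).shape)          # single output neuron: shape (1, 1)
```

Note that each layer can use a different number of neurons (the row count of its weight matrix) and a different transfer function, exactly as the text states.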