Date of publication: 2017-09-04 21:16
Radial basis function (RBF) networks are FFNNs with radial basis functions as activation functions. There’s nothing more to it. That doesn’t mean they don’t have their uses, but most FFNNs with other activation functions don’t get their own name; RBF networks mostly earned theirs by being invented at the right time.
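For concreteness, a minimal sketch of what such an activation looks like (my own toy example, using a Gaussian basis function; the function name, centres and `gamma` width are all illustrative, not from any particular library):

```python
import numpy as np

def rbf_activation(x, centers, gamma=1.0):
    """Gaussian radial basis activation: each unit's response decays
    with the distance of the input from that unit's centre."""
    # x: (n_features,), centers: (n_units, n_features)
    dists = np.linalg.norm(centers - x, axis=1)
    return np.exp(-gamma * dists ** 2)

# A unit whose centre coincides with the input responds with 1.0;
# far-away units respond with values near 0.
h = rbf_activation(np.array([0.0, 0.0]),
                   np.array([[0.0, 0.0], [3.0, 4.0]]))
```

The key point is that the response depends only on the *distance* from a centre, not on a weighted sum passed through a squashing function as in ordinary FFNNs.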
Also, is there some specific name for the ordinary autoencoder to let people know that you are talking about an autoencoder that compresses the data? Perhaps “compressive autoencoder”? To me, the term “autoencoder” includes all kinds of autoencoders, e.g. also denoising, variational and sparse autoencoders, not just “compressive” (?) autoencoders. (So to me it feels a bit wrong to talk about “autoencoders” as if all of them compress the data.)
In training deep networks, it is usually helpful to anneal the learning rate over time. A good intuition to have in mind is that with a high learning rate, the system contains too much kinetic energy and the parameter vector bounces around chaotically, unable to settle down into deeper, but narrower parts of the loss function. Knowing when to decay the learning rate can be tricky: decay it too slowly and you’ll waste computation bouncing around chaotically with little improvement for a long time. But decay it too aggressively and the system will cool too quickly, unable to reach the best position it can. There are three common ways of implementing learning rate decay:
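The three schedules most often seen in practice (step decay, exponential decay, and 1/t decay) can be sketched as follows; the hyperparameter values here are arbitrary placeholders, not recommendations:

```python
import numpy as np

def step_decay(lr0, epoch, drop=0.5, every=10):
    # Step decay: multiply the rate by `drop` every `every` epochs.
    return lr0 * drop ** (epoch // every)

def exp_decay(lr0, epoch, k=0.1):
    # Exponential decay: lr = lr0 * exp(-k * t).
    return lr0 * np.exp(-k * epoch)

def inv_decay(lr0, epoch, k=0.1):
    # 1/t decay: lr = lr0 / (1 + k * t).
    return lr0 / (1 + k * epoch)

lr = step_decay(0.1, epoch=20)  # 0.1 * 0.5**2 = 0.025
```

Step decay is the most common in practice, since its hyperparameters (drop factor and interval) are easier to interpret than the decay constant `k`.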
We note that optimization for deep networks is currently a very active area of research. In this section we highlight some established and common techniques you may see in practice, briefly describe their intuition, but leave a detailed analysis outside the scope of the class. We provide some further pointers for the interested reader.
I reckon a combination of FF and possibly ELM. Sounds like a large deep FF network but with permanent learned dropout in the deeper layers. Interesting!
And finally, Kohonen networks (KN, also self-organising (feature) map, SOM, SOFM) “complete” our overview. KNs utilise competitive learning to classify data without supervision. Input is presented to the network, after which the network assesses which of its neurons most closely match that input. These neurons are then adjusted to match the input even better, dragging along their neighbours in the process. How much the neighbours are moved depends on their distance to the best matching units. KNs are sometimes not considered neural networks either.
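A single step of this competitive update might look like the following sketch (my own minimal version, with units on a 1-D grid and made-up values for the learning rate and neighbourhood width):

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One Kohonen/SOM update: find the best matching unit (BMU) and
    pull it, and more weakly its grid neighbours, towards the input."""
    # weights: (n_units, n_features); units assumed to sit on a 1-D grid.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    grid = np.arange(len(weights))
    # Neighbourhood function: influence falls off with grid distance to the BMU.
    influence = np.exp(-(grid - bmu) ** 2 / (2 * sigma ** 2))
    return weights + lr * influence[:, None] * (x - weights)

w = np.array([[0.0], [1.0], [5.0]])
w = som_step(w, np.array([0.2]))  # unit 0 wins and moves most
```

Note there is no error signal compared against a target anywhere: the only learning signal is which unit matched best, which is why this counts as unsupervised.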
Nesterov Momentum is a slightly different version of the momentum update that has recently been gaining popularity. It enjoys stronger theoretical convergence guarantees for convex functions and in practice it also consistently works slightly better than standard momentum.
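The distinguishing feature is that the gradient is evaluated at the "looked-ahead" position `x + mu * v` rather than at the current position. A sketch on a toy quadratic loss (the loss, learning rate and momentum values are my own illustrative choices):

```python
import numpy as np

def grad(x):
    # Gradient of the toy loss f(x) = x**2.
    return 2 * x

x, v = np.array([3.0]), np.array([0.0])
mu, lr = 0.9, 0.1  # momentum coefficient and learning rate

for _ in range(100):
    # Nesterov: take the gradient at the looked-ahead point x + mu*v,
    # instead of at x as in standard momentum.
    v = mu * v - lr * grad(x + mu * v)
    x = x + v

# x has converged close to the minimum at 0
```

Intuitively, since the momentum term is about to carry us to `x + mu * v` anyway, it makes sense to correct using the gradient there rather than at the stale current position.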
One problem with drawing them as node maps: it doesn’t really show how they’re used. For example, variational autoencoders (VAE) may look just like autoencoders (AE), but the training process is actually quite different. The use-cases for trained networks differ even more, because VAEs are generators, where you insert noise to get a new sample. AEs simply map whatever they get as input to the closest training sample they “remember”. I should add that this overview is in no way clarifying how each of the different node types works internally (but that’s a topic for another day).
Vincent, Pascal, et al. “Extracting and composing robust features with denoising autoencoders.” Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
The idea behind deeper SVMs is that they allow for classification tasks more complex than binary. Usually it would just be the direct one-to-one connections as seen in the first layer. It’s set up like that because each neuron in the input represents a point in the possibility space, and the network tries to separate the inputs with a margin as large as possible.
The second important quantity to track while training a classifier is the validation/training accuracy. This plot can give you valuable insights into the amount of overfitting in your model:
I have never seen SVMs classified as neural networks. The paper does call them support vector _networks_ and does compare/contrast them to perceptrons, but I still think it’s a fundamentally different approach to learning. It’s not connectionist.
Thanks for your quick reply! Could you please give me some reference papers about SVM networks? I’m interested in research on how neural networks could be combined with these algorithms.
Denoising autoencoders (DAE) are AEs where we don’t feed just the input data, but the input data with noise (like making an image more grainy). We compute the error the same way though, so the output of the network is compared to the original input without noise. This encourages the network to learn not details but broader features, as learning smaller features often turns out to be “wrong” because they constantly change with the noise.
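The asymmetry between what the network sees and what it is scored against is the whole trick, and is easy to sketch (my own toy example; the encoder/decoder here are untrained identity placeholders, and the noise level is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.3):
    # Make a "grainy" version of the input by adding Gaussian noise.
    return x + noise_std * rng.normal(size=x.shape)

def dae_loss(encode, decode, x):
    """The network sees the corrupted input, but the reconstruction
    error is measured against the ORIGINAL, clean input."""
    x_noisy = corrupt(x)
    x_hat = decode(encode(x_noisy))
    return np.mean((x_hat - x) ** 2)

# With identity encode/decode, the loss is simply the noise power,
# which a real DAE reduces by learning robust features.
loss = dae_loss(lambda x: x, lambda h: h, np.zeros(100))
```

A plain AE would compare `x_hat` against `x_noisy` instead, which is exactly what lets it get away with memorising noise-level detail.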
Note that it is possible to know if a kink was crossed in the evaluation of the loss. This can be done by keeping track of the identities of all “winners” in a function of form \(max(x,y)\); that is, whether x or y was higher during the forward pass. If the identity of at least one winner changes when evaluating \(f(x+h)\) and then \(f(x-h)\), then a kink was crossed and the numerical gradient will not be exact.
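The check above can be sketched for a single \(max(x,y)\) node as follows (a minimal illustration of the idea, not a full gradient checker):

```python
def grad_check_max(x, y, h=1e-5):
    """Centred numerical gradient of f = max(x, y) w.r.t. x, flagging
    whether a kink (a change of the max 'winner') was crossed."""
    f = lambda a, b: max(a, b)
    winner_plus = 'x' if x + h >= y else 'y'    # winner at f(x+h, y)
    winner_minus = 'x' if x - h >= y else 'y'   # winner at f(x-h, y)
    kink_crossed = winner_plus != winner_minus
    num_grad = (f(x + h, y) - f(x - h, y)) / (2 * h)
    return num_grad, kink_crossed

g, crossed = grad_check_max(1.0, -2.0)   # far from the kink: exact grad 1.0
g2, crossed2 = grad_check_max(1.0, 1.0)  # at the kink: grad unreliable
```

When `kink_crossed` is true, the numerical gradient at that coordinate should simply be skipped rather than reported as a mismatch with the analytic gradient.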