
Instance-based Learning: Radial Basis Functions

 

Learning with radial basis functions is a function approximation method closely related to distance-weighted regression and to artificial neural networks. In this blog, we'll take a look at radial basis function (RBF) networks.

 

The learned hypothesis takes the form

f̂(x) = w0 + Σu=1..k wu Ku(d(xu, x))

 

Here each xu is an instance from X, and the kernel function Ku(d(xu, x)) is defined so that it decreases as the distance d(xu, x) increases. The user-supplied constant k specifies the number of kernel functions to be included.

 

Although f̂(x) is a global approximation to f(x), the contribution from each of the Ku(d(xu, x)) terms is localized to a region around the point xu.

 

It is common to choose each kernel function Ku(d(xu, x)) to be a Gaussian centered at the point xu with some variance σu².

 

The function f̂(x) can be viewed as defining a two-layer network: the first layer computes the values of the various Ku(d(xu, x)), and the second layer computes a linear combination of these first-layer unit values.

 

  • The activation of each hidden unit is determined by a Gaussian function centered at some instance xu. As a result, its activation will be close to zero unless the input x is near xu.

 

  • The output unit computes a linear combination of the hidden unit activations. Although the network described here has only one output, multiple output units can also be included.

 

Here ai(x) are the attributes describing instance x, and f̂(x) = w0 + Σu=1..k wu Ku(d(xu, x)), as given above.

 

One common choice for Ku(d(xu, x)) is the Gaussian function centered at xu with variance σu²:

Ku(d(xu, x)) = exp( −d²(xu, x) / (2σu²) )
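As a concrete illustration, here is a minimal Python sketch of the hypothesis and kernel above. The function names gaussian_kernel and rbf_predict, and the use of squared Euclidean distance for d(xu, x), are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(center, x, sigma):
    """Ku(d(xu, x)) = exp(-d^2(xu, x) / (2 * sigma_u^2))."""
    d2 = np.sum((np.asarray(center) - np.asarray(x)) ** 2)   # squared Euclidean distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_predict(x, centers, sigmas, w0, weights):
    """f_hat(x) = w0 + sum_u wu * Ku(d(xu, x))."""
    activations = np.array([gaussian_kernel(c, x, s) for c, s in zip(centers, sigmas)])
    return w0 + np.dot(weights, activations)
```

Here rbf_predict evaluates f̂ at a single query point; the centers, widths, and weights would be chosen as described in the training procedure below.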

 

Given a collection of training examples of the target function, RBF networks are typically trained in two stages.

 

First, the number of hidden units k is chosen, and each hidden unit u is defined by choosing the values of xu and σu² that determine its kernel function Ku(d(xu, x)).

 

Second, the weights wu are trained to maximize the fit of the network to the training data under a global error criterion (e.g., the sum of squared errors). Because the kernel functions are held fixed during this second stage, the linear weights wu can be learned very efficiently.
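Below is a minimal sketch of this second stage, assuming the centers xu and widths σu have already been chosen: because the kernel activations are then fixed, learning w0 and the wu amounts to ordinary linear least squares. The helper names kernel_matrix and fit_output_weights are illustrative.

```python
import numpy as np

def kernel_matrix(X, centers, sigmas):
    """Row i, column u holds Ku(d(xu, xi)) for training input xi."""
    X = np.asarray(X, dtype=float)
    return np.array([[np.exp(-np.sum((x - c) ** 2) / (2.0 * s ** 2))
                      for c, s in zip(centers, sigmas)]
                     for x in X])

def fit_output_weights(X, y, centers, sigmas):
    """With the kernels held fixed, minimizing the summed squared error
    over w0 and wu is an ordinary linear least-squares problem."""
    K = kernel_matrix(X, centers, sigmas)
    A = np.hstack([np.ones((len(K), 1)), K])      # first column corresponds to w0
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1:]                      # w0, (w1 ... wk)
```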

 

One approach is to allocate a Gaussian kernel function for each training example (xi, f(xi)), centered at the point xi. Each of these kernels may be assigned the same width σ².

 

With this choice, the RBF network learns a global approximation to the target function in which each training example (xi, f(xi)) can influence the value of f̂ only in the neighborhood of xi.

 

One advantage of this choice of kernel functions is that it allows the RBF network to fit the training data exactly. That is, for any set of m training examples, the weights w0…wm for combining the m Gaussian kernel functions can be set so that f̂(xi) = f(xi) for each training example (xi, f(xi)).
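To make the exact-fit claim concrete, the following sketch places one Gaussian at every training point and solves for the weights directly. It assumes a common width σ, a 2-D array X of distinct training inputs, and fixes w0 at zero so that the system is square.

```python
import numpy as np

def fit_exact(X, y, sigma):
    """One Gaussian kernel per training example; with w0 fixed at 0, solving the
    m x m system G w = y reproduces every target value f(xi) exactly (for the
    Gaussian kernel, G is non-singular when the training points are distinct)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]                    # pairwise differences
    G = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * sigma ** 2))
    return np.linalg.solve(G, np.asarray(y, dtype=float))    # weights w1 ... wm
```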

 

Another option is to choose a set of kernel functions that is smaller than the number of training examples. This can be much more efficient than the first approach, especially when the number of training examples is large.

 

The set of kernel functions may be distributed with centers spaced uniformly throughout the instance space X. Alternatively, we may wish to distribute the centers nonuniformly, especially if the instances themselves are distributed nonuniformly over X.

 

In that case, we may pick kernel function centers by randomly selecting a subset of the training instances, thereby sampling the underlying distribution of instances.
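A small sketch of this sampling strategy (the helper name sample_centers and the use of NumPy's random generator are assumptions):

```python
import numpy as np

def sample_centers(X, k, seed=None):
    """Pick k kernel centers by sampling training instances without replacement,
    so the centers follow the same distribution as the observed instances."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    return np.asarray(X)[idx]
```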

 

Alternatively, we may identify prototypical clusters of instances and attach a kernel function to each cluster. The kernel functions can be placed in this way by unsupervised clustering algorithms that fit the training instances (but not their target values) with a mixture of Gaussians.

 

The EM algorithm provides one method for choosing the means of a set of k Gaussians that best fit the observed instances. In the EM algorithm, the means are chosen to maximize the probability of observing the instances xi, given the k estimated means.
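As an illustration of center selection by clustering, the sketch below uses a plain k-means loop rather than the full EM fit of a Gaussian mixture; k-means can be viewed as the hard-assignment special case of that EM procedure, and the helper name kmeans_centers is an assumption.

```python
import numpy as np

def kmeans_centers(X, k, iters=20, seed=None):
    """Cluster the inputs (ignoring their target values) and use the cluster
    means as kernel centers. k-means is the hard-assignment special case of EM
    for a mixture of equal-variance spherical Gaussians."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assignment step: attach each instance to its nearest current center.
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned instances.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers
```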

 

Note that such unsupervised clustering does not take the target function value f(xi) of the instances into account when determining the kernel centers. In this case, the target values f(xi) influence only the learning of the output layer weights wu.

 

Radial basis function networks provide a global approximation to the target function, represented by a linear combination of many local kernel functions.

 

The value of each kernel function is non-negligible only when the input x falls into the region defined by that kernel's particular center and width.

 

The network can therefore be viewed as a smooth linear combination of many local approximations to the target function. One key advantage of RBF networks is that they can be trained much more efficiently than feedforward networks trained with BACKPROPAGATION.

 

This is because the two layers of an RBF network are trained separately.

 

Training RBF Networks:

 

  • For the kernel function Ku(d(xu,x)), which xu should be used?
    • Use training instances (this reflects the instance distribution) or scatter the centers uniformly throughout the instance space

 

  • What is the best way to train weights (assuming Gaussian Ku)?
    • First, choose a variance (and perhaps a mean) for each Ku, e.g., using EM.
    • Then hold each Ku fixed and train the linear output layer; fitting linear functions in this way is efficient.

 
