biosim.core.util
Class AdaptiveKernelDensityEstimator

java.lang.Object
  extended by biosim.core.util.AdaptiveKernelDensityEstimator

public class AdaptiveKernelDensityEstimator
extends java.lang.Object


Nested Class Summary
static class AdaptiveKernelDensityEstimator.NormalKernel
           
 
Constructor Summary
AdaptiveKernelDensityEstimator(int dim, Kernel kernel)
           
 
Method Summary
 void add(double[] sample)
           
 double estimate(double[] target, double bandwidth, int k)
          Estimate f(x) using an adaptive bandwidth.
static void main(java.lang.String[] args)
           
 int numSamples()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AdaptiveKernelDensityEstimator

public AdaptiveKernelDensityEstimator(int dim,
                                      Kernel kernel)
Method Detail

numSamples

public int numSamples()

add

public void add(double[] sample)

estimate

public double estimate(double[] target,
                       double bandwidth,
                       int k)
Estimate f(x) using an adaptive bandwidth. Kernel density estimates that use a fixed bandwidth can perform poorly in regions of the space with few sample points. An adaptive bandwidth estimate adjusts the width of the kernel at each sample point x_j according to the distance from that sample point to its kth nearest neighbor (d_jk). The exact equation used is:

f(x) = (1/n) * sum_{j=1..n} [ (bandwidth*d_jk)^(-M) * K( (x - x_j) / (bandwidth*d_jk) ) ]

where n is the number of sample points, M is the dimensionality of x, and K is the kernel. According to Breiman, Meisel, and Purcell (who originally proposed this technique in 1977), finding a good combination of k and the bandwidth can be difficult. They suggest choosing an initial k either as 10% of the number of sample points or by plotting the average value of d_jk versus k and picking a value of k "past the knee of the curve", and then optimizing a goodness-of-fit metric by making small changes to k and the bandwidth individually. They note that small values of k usually lead to very poor results.
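The equation above can be illustrated with a minimal, self-contained sketch. It is not the source of this class: the names AdaptiveKdeSketch, normalKernel, and kthNeighborDistance are hypothetical, Euclidean distance is assumed for d_jk, and a standard Gaussian stands in for the Kernel argument, whose interface is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an adaptive-bandwidth kernel density estimate.
class AdaptiveKdeSketch {
    private final int dim;                                   // M, the dimensionality of x
    private final List<double[]> samples = new ArrayList<>();

    AdaptiveKdeSketch(int dim) { this.dim = dim; }

    void add(double[] sample) { samples.add(sample.clone()); }

    int numSamples() { return samples.size(); }

    // Euclidean distance between two points (assumed metric).
    static double distance(double[] a, double[] b) {
        double sq = 0.0;
        for (int m = 0; m < a.length; m++) {
            double diff = a[m] - b[m];
            sq += diff * diff;
        }
        return Math.sqrt(sq);
    }

    // Standard Gaussian kernel in u.length dimensions (assumed kernel).
    static double normalKernel(double[] u) {
        double sq = 0.0;
        for (double v : u) sq += v * v;
        return Math.exp(-0.5 * sq) / Math.pow(2.0 * Math.PI, u.length / 2.0);
    }

    // d_jk: distance from sample j to its kth nearest neighbor (excluding itself).
    double kthNeighborDistance(int j, int k) {
        double[] d = new double[samples.size() - 1];
        int idx = 0;
        for (int i = 0; i < samples.size(); i++) {
            if (i != j) d[idx++] = distance(samples.get(j), samples.get(i));
        }
        Arrays.sort(d);
        return d[k - 1];
    }

    // f(x) = (1/n) * sum_j (bandwidth*d_jk)^(-M) * K((x - x_j) / (bandwidth*d_jk)).
    // Note: duplicate samples can make d_jk zero; this sketch does not guard against that.
    double estimate(double[] target, double bandwidth, int k) {
        int n = samples.size();
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be between 1 and n - 1");
        }
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            double h = bandwidth * kthNeighborDistance(j, k); // per-sample kernel width
            double[] u = new double[dim];
            for (int m = 0; m < dim; m++) {
                u[m] = (target[m] - samples.get(j)[m]) / h;
            }
            sum += Math.pow(h, -dim) * normalKernel(u);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        AdaptiveKdeSketch kde = new AdaptiveKdeSketch(1);
        for (double x : new double[] {0, 1, 2, 3, 4}) kde.add(new double[] {x});
        System.out.println(kde.estimate(new double[] {2.0}, 1.0, 1));
    }
}
```

This sketch recomputes every d_jk on each call, which costs O(n^2 log n) per estimate. If many targets are evaluated for the same k, caching the d_jk values would likely be worthwhile.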


main

public static void main(java.lang.String[] args)