biosim.core.util
Class AdaptiveKernelDensityEstimator
java.lang.Object
biosim.core.util.AdaptiveKernelDensityEstimator
public class AdaptiveKernelDensityEstimator
extends java.lang.Object
Method Summary
void add(double[] sample)
double estimate(double[] target, double bandwidth, int k)
    Estimate f(x) using an adaptive bandwidth.
static void main(java.lang.String[] args)
int numSamples()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AdaptiveKernelDensityEstimator
public AdaptiveKernelDensityEstimator(int dim,
Kernel kernel)
numSamples
public int numSamples()
add
public void add(double[] sample)
estimate
public double estimate(double[] target,
double bandwidth,
int k)
Estimate f(x) using an adaptive bandwidth. Kernel density estimates that
use a fixed bandwidth can perform poorly in regions of the space with few
sample points. An adaptive bandwidth estimate adjusts the width of the
kernel at each sample point x_j according to the distance from that sample
point to its kth nearest neighbor (d_jk). The exact equation used is:
f(x) = (1/n) * sum_{j=1..n} (bandwidth * d_jk)^(-M) * K((x - x_j) / (bandwidth * d_jk))
where n is the number of sample points, M is the dimensionality of x, and
K is the kernel. According to Breiman, Meisel, and Purcell (who originally
proposed this technique in 1977), finding a good combination of k and the
bandwidth can be difficult. They suggest starting with k = 10% of the
number of sample points, or plotting the average value of d_jk versus k
and choosing a value for k "past the knee of the curve", and then
optimizing a goodness-of-fit metric by making small changes to k and the
bandwidth individually. They note that small values of k usually lead to
very poor results.
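The computation described above can be sketched as follows. This is a minimal illustration, not the biosim implementation: the class name AdaptiveKdeSketch and its helper methods are hypothetical, the kernel is assumed to be a standard Gaussian (the real class takes a Kernel object whose interface is not documented here), distances are assumed to be Euclidean, and neighbors are found by brute force. The meanKthDistance helper gives the average d_jk for a given k, which is the quantity one would plot against k to look for the knee of the curve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an adaptive-bandwidth kernel density estimate.
class AdaptiveKdeSketch {
    private final int dim;                              // M, the dimensionality of x
    private final List<double[]> samples = new ArrayList<>();

    AdaptiveKdeSketch(int dim) {
        this.dim = dim;
    }

    void add(double[] sample) {
        samples.add(sample.clone());
    }

    // Euclidean distance (an assumption; the source does not name the metric).
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // d_jk: distance from sample j to its kth nearest neighbor among the other samples.
    double kthNeighborDistance(int j, int k) {
        double[] dists = new double[samples.size() - 1];
        int idx = 0;
        for (int i = 0; i < samples.size(); i++) {
            if (i != j) {
                dists[idx++] = distance(samples.get(j), samples.get(i));
            }
        }
        Arrays.sort(dists);
        return dists[k - 1];
    }

    // Standard Gaussian kernel in dim dimensions (assumed stand-in for K).
    private double gaussianKernel(double[] u) {
        double sq = 0.0;
        for (double v : u) {
            sq += v * v;
        }
        return Math.exp(-0.5 * sq) / Math.pow(2.0 * Math.PI, dim / 2.0);
    }

    // f(x) = (1/n) * sum_j (bandwidth * d_jk)^(-M) * K((x - x_j) / (bandwidth * d_jk))
    double estimate(double[] target, double bandwidth, int k) {
        int n = samples.size();
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must satisfy 1 <= k < number of samples");
        }
        double total = 0.0;
        for (int j = 0; j < n; j++) {
            // Per-sample kernel width; this is zero if sample j has k exact duplicates,
            // which this sketch does not guard against.
            double h = bandwidth * kthNeighborDistance(j, k);
            double[] u = new double[dim];
            for (int m = 0; m < dim; m++) {
                u[m] = (target[m] - samples.get(j)[m]) / h;
            }
            total += Math.pow(h, -dim) * gaussianKernel(u);
        }
        return total / n;
    }

    // Average of d_jk over all samples, for plotting against k.
    double meanKthDistance(int k) {
        double sum = 0.0;
        for (int j = 0; j < samples.size(); j++) {
            sum += kthNeighborDistance(j, k);
        }
        return sum / samples.size();
    }

    public static void main(String[] args) {
        AdaptiveKdeSketch kde = new AdaptiveKdeSketch(1);
        for (int i = 0; i < 5; i++) {
            kde.add(new double[] {i});
        }
        System.out.println("f(2.0) with k = 1: " + kde.estimate(new double[] {2.0}, 1.0, 1));
        for (int k = 1; k <= 4; k++) {
            System.out.println("k = " + k + ", mean d_jk = " + kde.meanKthDistance(k));
        }
    }
}
```

The brute-force neighbor search costs O(n^2 log n) per call to estimate, which is acceptable for small sample sets; caching d_jk per sample or using a spatial index would be the natural refinement for larger ones.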
main
public static void main(java.lang.String[] args)