I am a Ph.D. candidate at the department of Computer Science and Applied Mathematics at the Weizmann Institute of Science in Israel, working with Prof. Boaz Nadler. I am interested in many aspects of computational and mathematical statistics.
Office 124, Ziskind building, Weizmann Institute of Science, Israel.
Consider nonparametric regression (e.g. Nadaraya-Watson) in a high-dimensional metric space. In the general case, this typically requires an unreasonable number of labeled points due to the curse of dimensionality. We make two simplifying assumptions:
Manifold assumption: The data points are supported on (or near) a low-dimensional manifold.
Semi-supervised data set: In addition to the labeled points, we are given a large sample of unlabeled points
If there are many unlabeled points, then we can use them to estimate geodesic distances along the (unknown) manifold. This is done by connecting pairs of close points by a weighted edge with weight equal to their euclidean distance. Geodesic distances in the resulting graph approximate geodesic distances in the manifold. We propose to use this idea to apply standard nonparametric methods, but using the manifold distances instead of the euclidean distances. Thereby avoiding the curse of dimensionality.
Figure shamelessly stolen from IsoMap homepage
One way to think about manifold/graph-based semi-supervised methods, is to consider their Euclidean analogues. Consider the following table:
|Nonparametric regression method||Analogous approach for graphs/manifolds|
|Regression using orthogonal functions||Laplacian eigenvector regression|
|Wavelet regression||Graph multiscale wavelet regression|
|Nadaraya-Watson / K-nearest neighbor||Geodesic nearest-neighbor|
K nearest-neighbors and Nadaraya-watson regression are fundamental nonparametric regression method, but their manifold analogues have barely been studied. This was the initial motivation for this project. Results:
Minimax optimality: given a sufficient number of unlabeled data points from an unknown manifold, the geodesic K nearest-neighbors regressor obtains the finite-sample minimax bound for a Lipschitz function on the manifold, as if it were completely specified.
Good performance: on manifold-structured signals.
Efficient computation: regression and classification methods based on geodesic nearest neighbors can be efficiently computed, both for the transductive and the inductive cases of semi-supervised learning.
This paper presents a new algorithm that efficiently finds the k nearest labeled vertices for all vertices in the graph. We combine this algorithm with the ideas above to a problem of semi-supervised indoor localization using WiFi fingerprints.
Amit Moscovich, Ariel Jaffe, Boaz Nadler Minimax-optimal semi-supervised regression on unknown manifolds, accepted for publication in AISTATS 2017.
Code for efficiently finding K-nearest-labeled vertices: Python
This study started with an idea to modify the Higher Criticism statistic so as to improve its finite-sample behavior. After a lot of digging, this turned out to be a rediscovery of the Mn goodness-of-fit statistic proposed by Berk and Jones in 1979.
In this work we present a new derivation of the exact Berk-Jones statistics, prove that they are asymptotically optimal for the detection of Sparse Mixtures and consistent. We also compare them to other goodness-of-fit statistics and present an efficient method to compute the p-values of the one-sided statistics.
Overall, I believe these results demonstrate that the exact Berk-Jones statistics are an interesting alternative to other goodness-of-fit statistics such as Kolmogorov-Smirnov and Anderson-Darling, despite having received (much) less attention. They are particularly suited for the detection of alternatives that differ from the null hypothesis at the tails of the distribution.
Amit Moscovich, Boaz Nadler, Clifford Spiegelman On the exact Berk-Jones statistics and their p-value calculation, Electronic Journal of Statistics (2016). (Supplementary material)
Amit Moscovich, Boaz Nadler, Clifford Spiegelman (2014) Tail sensitive Goodness-of-fit, Poster presentation at the spring school "Structural Inference in Statistics: Adaptation and Efficiency". (somewhat outdated)
For computing the p-value of the Mn, Mn+ and Mn- statistics, see the Boundary Crossing section.
Code that reproduces all of the figures in the paper: exact_bj_paper_produce_figures.zip
Consider a one-dimensional homogeneous Poisson process. Given arbitrary boundary functions from above and below, how can we compute the boundary crossing probability of this process? This problem has several applications in statistics, including the computation of exact p-values and power for supremum-based continuous goodness-of-fit statistics, such as Kolmogorov-Smirnov, Berk-Jones and others.
Despite more than 50 years of research, the most efficient practical methods to date compute this probability in O(n^3) time, where n is an upper bound on the boundary functions. In this work, we discovered a simple and practical algorithm for this problem that solves it in O(n^2 log n) and more complicated O(n^2) algorithms, both for the one-sided and two-sided problem.
Amit Moscovich, Boaz Nadler Fast calculation of boundary crossing probabilities for Poisson processes, Statistics & Probability letters (2016), in press.
Amit Moscovich, Boaz Nadler, Faster calculation of boundary crossing probabilities for Poisson processes, in preparation.
Amit Moscovich, Boaz Nadler (2015) Fast computation of boundary crossing probabilities for the empirical CDF, poster+oral presentation at the Machine Learning Summer School in Kyoto.
A fast C++ implementation of several fast algorithms discussed in these papers: crossing-probability
We propose a new nonparametric approach for binary classification that exploits the modeling flexibility of sparse graphical models. We assume that each class can be represented by a family of undirected sparse graphical models, specifically a forest-structured distribution. Our procedure requires the nonparametric estimation of only one and two-dimensional marginal densities to transform the data into a space where a linear classifier is optimal. Experiments with simulated and real data indicate that the proposed method is competitive with popular methods across a range of applications.
Joint work with Mary Frances Dorn and Clifford Spiegelman of Texas A&M university and Boaz Nadler of the Weizmann institute of science.
Paper in preparation.