Fachbereich Informatik - Aktuell
Disputation Matthäus Kleindeßner
am Freitag, 21. Juli 2017 um 13 Uhr c.t. in Raum A104, Sand 1, EG.
Machine Learning in a Setting of Ordinal Distance Information
Berichterstatter 1: Prof. Dr. Ulrike von Luxburg
Berichterstatter 2: Prof. Dr. Matthias Hein
In a typical machine learning scenario we are given numerical dissimilarity values between objects (or feature representations of objects, from which such dissimilarity values can readily be computed). I study the relaxed setting of ordinal distance information, in which we are only provided with binary answers to comparisons of dissimilarity values such as d(A,B)<d(A,C). This setting has attracted interest in recent years, mainly due to the possibility of simple collection of such data by means of crowdsourcing.
In my talk, I will first discuss the ordinal embedding approach to machine learning in a setting of ordinal distance information. The idea is to map the objects of a data set to points in a Euclidean space such that the points preserve the given ordinal data as well as possible (with respect to the Euclidean interpoint distances). I present a result stating the asymptotic uniqueness of ordinal embeddings. It says that for Euclidean data sets, which permit perfect ordinal embeddings, the points are uniquely determined up to a similarity transformation and small individual offsets that uniformly go to zero as the size of the data set goes to infinity. My result is the first of this kind in the literature and proves a long-standing claim dating back to the 1960s.
Then I will present two estimators for the intrinsic dimension of a data set that are based on only ordinal distance information. Although dimensionality estimation is a well-studied problem with a long history, all previous estimators from the literature are based on dissimilarity values. I proved both estimators to be statistically consistent. This result is particularly appealing when combining it with the above-mentioned uniqueness result. I discuss why we might expect one of the estimators to perform better than the other one and show some experiments that confirm these findings.
Finally, I will briefly sketch two approaches that provide generic alternatives to the ordinal embedding approach. In the first approach, I make use of the concepts of the lens depth function and the k-relative neighborhood graph. The second approach consists of defining data-dependent kernel functions on a data set that can be evaluated given only ordinal distance information.