Closing the optimality gap using affinity propagation

-
Brendan Frey, University of Toronto

An important problem in science and engineering is how to find and associate constituent patterns or motifs in large amounts of high-dimensional data. Examples include the identification and modeling of object parts in images, and the detection and association of RNA motifs that regulate tissue-dependent gene splicing in mammals. One approach is to identify a subset of representative data exemplars that are used to summarize and model the data. This is an NP-hard problem that is traditionally solved approximately by randomly choosing an initial subset of data points and then iteratively refining it. I'll describe a method called 'affinity propagation', which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Affinity propagation is a general-purpose method and has been applied in a variety of areas, including digital communications, genomics, transcriptomics and document analysis. I'll outline open problems and possible future directions of research.