Closing the optimality gap using affinity propagation
Closing the optimality gap using affinity propagation
An important problem in science and engineering is how to find and associate constituent patterns or motifs in large amounts of high-dimensional data. Examples include the identification and modeling of object parts in images, and the detection and association of RNA motifs that regulate tissue-dependent gene splicing in mammals. One approach is to identify a subset of representative data exemplars that are used to summarize and model the data. This is an NP-hard problem that is traditionally solved approximately by randomly choosing an initial subset of data points and then iteratively refining it. I'll describe a method called 'affinity propagation', which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Affinity propagation is a general-purpose method and has been applied in a variety of areas, including digital communications, genomics, transcriptomics and document analysis. I'll outline open problems and possible future directions of research.