Structure-based audio fingerprinting for music retrieval
Publication Type: Conference Paper
Source: Int. Soc. for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, p. 55-60 (2012)
URL: http://ismir2012.ismir.net/event/papers/055-ismir-2012.pdf
Abstract:
Content-based approaches to music retrieval are of great relevance as they do not require any kind of manually generated annotations. In this paper, we introduce the concept of structure fingerprints, which are compact descriptors of the musical structure of an audio recording. Given a recorded music performance, structure fingerprints facilitate the retrieval of other performances sharing the same underlying structure. Rather than explicitly determining musical structure, our fingerprints can be thought of as a probability density function derived from a self-similarity matrix. We show that the proposed fingerprints can be compared using simple Euclidean distances, without the complex warping operations required in previous approaches. Experiments on a collection of Chopin Mazurkas reveal that structure fingerprints facilitate robust and efficient content-based music retrieval. Furthermore, we give a musically informed discussion that also deepens the understanding of the popular Mazurka dataset.
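As a rough, hypothetical illustration of the idea described in this abstract (not the authors' exact pipeline), the sketch below builds a cosine self-similarity matrix from frame-wise audio features, resamples it to a fixed grid, and normalizes it into a density-like fingerprint that can be compared with a plain Euclidean distance. The feature representation, the fixed grid size K, and the normalization scheme are all assumptions here.

# Minimal sketch, assuming beat- or frame-synchronous features (e.g. chroma);
# the paper's exact fingerprint construction may differ.
import numpy as np

def structure_fingerprint(features: np.ndarray, K: int = 32) -> np.ndarray:
    """features: (n_frames, n_dims) array of audio features."""
    # Cosine self-similarity matrix of the performance.
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    S = X @ X.T                                      # (n_frames, n_frames)
    # Resample to a fixed K x K grid so fingerprints of performances with
    # different lengths become directly comparable.
    idx = np.linspace(0, len(S) - 1, K).round().astype(int)
    F = S[np.ix_(idx, idx)]
    # Shift to non-negative values and normalize to unit mass, so the
    # fingerprint can be read as a 2-D probability density.
    F = F - F.min()
    return F / (F.sum() + 1e-9)

def fingerprint_distance(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Plain Euclidean distance between flattened fingerprints."""
    return float(np.linalg.norm(fp_a.ravel() - fp_b.ravel()))

# Usage with random stand-in features (replace with real chroma frames):
rng = np.random.default_rng(0)
fp1 = structure_fingerprint(rng.random((400, 12)))
fp2 = structure_fingerprint(rng.random((350, 12)))
print(fingerprint_distance(fp1, fp2))

In this sketch, fixing the grid size is what removes the need for warping when comparing performances of different lengths; retrieval then reduces to nearest-neighbour search in Euclidean space.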
Unsupervised detection of music boundaries by time series structure features
Publication Type: Conference Paper
Source: AAAI Conf. on Artificial Intelligence, AAAI Press, Toronto, Canada, p. 1613-1619 (2012)
URL: http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4907
Keywords: Time Series Structure; Features
Abstract:
Locating boundaries between coherent and/or repetitive segments of a time series is a challenging problem pervading many scientific domains. In this paper we propose an unsupervised method for boundary detection, combining three basic principles: novelty, homogeneity, and repetition. In particular, the method uses what we call structure features, a representation encapsulating both local and global properties of a time series. We demonstrate the usefulness of our approach in detecting music structure boundaries, a task that has received much attention in recent years and for which several benchmark datasets and publicly available annotations exist. We find that our method significantly outperforms the best accuracies published so far. Importantly, our boundary detection approach is generic and thus applicable to a wide range of time series beyond the music and audio domains.
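For illustration only, the sketch below applies a classic checkerboard-kernel novelty curve (Foote-style) over a self-similarity matrix and picks its peaks as boundaries. This is a simplified stand-in capturing only the novelty principle, not the structure-feature method proposed in the paper; the kernel size, threshold, and peak-picking rule are assumptions.

# Hypothetical novelty-based boundary sketch, not the paper's algorithm.
import numpy as np

def novelty_boundaries(features: np.ndarray, kernel_size: int = 16,
                       threshold: float = 0.3) -> list[int]:
    """features: (n_frames, n_dims) array; returns candidate boundary frames."""
    X = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-9)
    S = X @ X.T                                     # cosine self-similarity
    # Checkerboard kernel: +1 on within-segment blocks, -1 on cross blocks.
    half = kernel_size // 2
    sign = np.ones(kernel_size)
    sign[half:] = -1
    kernel = np.outer(sign, sign)
    # Slide the kernel along the main diagonal to obtain a novelty curve.
    n = len(S)
    novelty = np.zeros(n)
    for i in range(half, n - half):
        patch = S[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(kernel * patch)
    novelty = (novelty - novelty.min()) / (np.ptp(novelty) + 1e-9)
    # Boundaries = local maxima of the novelty curve above a threshold.
    return [i for i in range(1, n - 1)
            if novelty[i] > threshold
            and novelty[i] >= novelty[i - 1]
            and novelty[i] >= novelty[i + 1]]

# Usage with synthetic features containing three homogeneous segments:
rng = np.random.default_rng(1)
protos = rng.random((3, 12))
feats = np.vstack([p + 0.05 * rng.normal(size=(100, 12)) for p in protos])
print(novelty_boundaries(feats))   # peaks expected near frames 100 and 200

The paper's structure features additionally encode repetition (global structure), which this purely local novelty sketch does not capture.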
