The use of complementary techniques of machine learning to discover knowledge in real complex domains
Publication Type:
Thesis
Source:
Universitat Politècnica de Catalunya, Barcelona, p.240 (2002)
Keywords:
Artificial Intelligence; clinical data; data analysis and modeling; data mining; diagnosis; Fuzzy; fuzzy representation; Machine Learning; WOWA aggregation
Abstract:
This thesis is concerned with developing and refining a collection of methods and tools which can be applied to the different steps of the Data Mining process. Data Mining is understood as the analysis of data using sophisticated tools and methods, which include aspects of data representation, data exploration, knowledge discovery, data modelling and data aggregation. Data Mining can be applied in real and complex domains, such as the domain of clinical prognosis, as well as to artificial test or benchmark data. Medical informatics is a dynamic area where new approaches and techniques are constantly being developed, the objective being to improve current data representation, modelling and aggregation methods to achieve better diagnosis and prognosis. In this work we focus on two medical data domains: prognosis for ICU patients and diagnosis of Sleep Apnea cases, although it is proposed that the techniques have general use for any data domain. A key approach which is used for data processing and representation is that of fuzzy logic techniques. Existing techniques, such as neural networks, tree induction and standard statistical analysis methods (correlation, principal components and regression models), are benchmarked against the data.
We carry out a survey of existing techniques, authors and their approaches, in order to establish their strong and weak points, limitations, and opportunities where improvement may be achieved.
The first major area under consideration is data representation: how to define a unified scheme which encompasses different data types, such as numeric, continuous, ordered categorical, unordered categorical, binary and fuzzy; how to define membership functions; how to measure differences and similarities in the data. This is followed by a comprehensive benchmarking of existing AI and statistical algorithms on a real ICU medical dataset, comparing the ‘Data Mining’ results to methods proposed by the author.
We define ‘fuzzy covariance’ as a value which permits the measurement of the relation between two fuzzy variables. Previous fuzzy covariance work was limited to the covariance of a fuzzy cluster to its fuzzy prototype [Gustafson79]. More recent authors [Nakamori97][Wangh95][Watada94] have created specialised fuzzy covariance calculations tailored for specific applications. In this work, a general fuzzy covariance algorithm, which measures the fuzzy covariance between two fuzzy variables, has been conceived, developed and tested. The initial work, based on the Hartigan joining algorithm and fuzzy covariances, evolves into and is contrasted with the later work on data and attribute fusion using the WOWA aggregation operator.
‘Aggregation operators’ are considered as a method for modelling data for clinical diagnosis, and use ‘relevance’ and ‘reliability’ meta-data together with grades of membership to enhance the information which the aggregation operator receives in order to model the data. We also make enhancements to the WOWA operator, to enable it to process data with missing values and we develop a novel method for learning the weighting vectors.
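As a rough illustration of the WOWA (weighted OWA) operator referred to in this abstract, the following Python sketch combines an importance vector `p` (one weight per data source) with an OWA-style vector `w` (one weight per rank position). It assumes a piecewise-linear interpolation of the cumulative `w` weights, which is a simplification and may differ from the interpolation used in the thesis. It does not cover the thesis's enhancements for missing values or for learning the weighting vectors. The names `wowa`, `p` and `w` are illustrative only.

```python
def wowa(values, p, w):
    """Sketch of a WOWA aggregation (assumption: linear interpolation).

    values: the inputs to aggregate (e.g. grades of membership)
    p:      importance weights per input, non-negative, summing to 1
    w:      OWA weights per rank position, non-negative, summing to 1
    """
    n = len(values)
    if n == 0 or len(p) != n or len(w) != n:
        raise ValueError("values, p and w must be non-empty and equally long")

    # Cumulative OWA weights at the knots i/n, for i = 0..n.
    cum_w = [0.0]
    for wi in w:
        cum_w.append(cum_w[-1] + wi)

    def w_star(x):
        # Piecewise-linear function through the points (i/n, cum_w[i]).
        x = min(max(x, 0.0), 1.0)
        pos = x * n
        i = min(int(pos), n - 1)
        frac = pos - i
        return cum_w[i] + frac * (cum_w[i + 1] - cum_w[i])

    # Visit the inputs from largest to smallest value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)

    total = 0.0
    acc = 0.0  # accumulated importance of the inputs visited so far
    for i in order:
        new_acc = acc + p[i]
        omega = w_star(new_acc) - w_star(acc)  # effective weight of input i
        total += omega * values[i]
        acc = new_acc
    return total
```

Under these assumptions, a uniform `w` reduces the result to the weighted mean with weights `p`, and a uniform `p` reduces it to a plain OWA with weights `w`.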
Extracting semantic information from an on-line Carnatic music forum
Publication Type:
Conference Paper
Source:
Int. Soc. for Music Information Retrieval Conf. (ISMIR), Porto, Portugal, p.355-360 (2012)
URL:
http://ismir2012.ismir.net/event/papers/355-ismir-2012.pdf
Abstract:
By mining user-generated text content we can obtain music-related information that could not otherwise be extracted from audio signals or symbolic score representations. In this paper we propose a methodology for extracting musically-relevant semantic information from an online discussion forum, rasikas.org, dedicated to the Carnatic music tradition. For that we define a dictionary of relevant terms such as raagas, taalas, performers, composers, and instruments, and create a complex network representation by matching such dictionary against the forum posts. This network representation is used to identify popular terms within the forum, as well as relevant co-occurrences and semantic relationships. This way, for instance, we are able to guess the instrument of a performer with 95% accuracy, to discover the confusion between two raagas with different naming conventions, or to infer semantic relationships regarding lineage or musical influence. This contribution is a first step towards the creation of ontologies for a culture-specific art music tradition.
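The dictionary-matching step described in this abstract can be sketched roughly as follows. This is only a minimal illustration, assuming whole-word, case-insensitive matching and a network whose edge weights count the posts in which two dictionary terms appear together; the paper's actual matching rules and network construction may differ. The function name `cooccurrence_network` and its arguments are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_network(posts, dictionary):
    """Match dictionary terms against posts and count co-occurrences.

    Returns (node_weight, edge_weight):
      node_weight[term]   = number of posts mentioning the term
      edge_weight[(a, b)] = number of posts mentioning both a and b,
                            with a < b in sorted order
    """
    # Assumption: a term matches only as a whole word, ignoring case.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in dictionary
    }
    node_weight = Counter()
    edge_weight = Counter()
    for post in posts:
        found = sorted(t for t, pat in patterns.items() if pat.search(post))
        node_weight.update(found)
        edge_weight.update(combinations(found, 2))
    return node_weight, edge_weight
```

Popular terms would then correspond to high node weights, and candidate relationships (for example, a performer and an instrument) to heavily weighted edges.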
Folksonomy-based tag recommendation for online audio clip sharing
Publication Type:
Conference Paper
Source:
Int. Soc. for Music Information Retrieval Conf. (ISMIR), Porto, Portugal (2012)
URL:
http://ismir2012.ismir.net/event/papers/073-ismir-2012.pdf
Keywords:
tags
Abstract:
Collaborative tagging has emerged as an efficient way to semantically describe online resources shared by a community of users. However, tag descriptions present some drawbacks such as tag scarcity or concept inconsistencies. In these situations, tag recommendation strategies can help users in adding meaningful tags to the resources being described. Freesound is an online audio clip sharing site that uses collaborative tagging to describe a collection of more than 130,000 sound samples. In this paper we propose four algorithm variants for tag recommendation based on tag co-occurrence in the Freesound folksonomy. On the basis of removing a number of tags that have to be later predicted by the algorithms, we find that using ranks instead of raw tag similarities produces statistically significant improvements. Moreover, we show how specific strategies for selecting the appropriate number of tags to be recommended can significantly improve algorithms' performance. These two aspects provide insight into some of the most basic components of tag recommendation systems, and we plan to exploit them in future real-world deployments.
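To make the contrast between rank-based and similarity-based aggregation concrete, here is a small Python sketch of co-occurrence-based tag recommendation. It is not the paper's implementation: the cosine-style normalisation, the neighbourhood size `k`, the linear rank score and the function names are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(tagged_items):
    """Tag-tag similarity from co-occurrence counts (cosine-style, assumed)."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    sim = defaultdict(dict)
    for (a, b), c in pair_count.items():
        s = c / sqrt(tag_count[a] * tag_count[b])
        sim[a][b] = s
        sim[b][a] = s
    return sim


def recommend(input_tags, sim, top_n=3, use_ranks=True, k=10):
    """Recommend tags for a resource that already has input_tags.

    use_ranks=True  sums rank-based scores (k for the nearest neighbour,
                    k - 1 for the next, and so on);
    use_ranks=False sums the raw similarity values instead.
    """
    scores = Counter()
    for tag in input_tags:
        neighbours = sorted(
            sim.get(tag, {}).items(), key=lambda kv: (-kv[1], kv[0])
        )[:k]
        for rank, (candidate, s) in enumerate(neighbours):
            if candidate in input_tags:
                continue  # never recommend a tag that is already present
            scores[candidate] += (k - rank) if use_ranks else s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]
```

A fixed `top_n` is the simplest choice here; the abstract notes that strategies for selecting the number of recommended tags also affect performance, which this sketch does not attempt to model.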
Patterns, regularities, and evolution of contemporary popular music
Publication Type:
Conference Paper
Source:
Complexitat.Cat, Barcelona (2012)
URL:
http://www.complexitat.cat/seminars/112/
Abstract:
Popular music is a key cultural expression that has captured listeners' attention for ages. Many of the structural regularities underlying musical discourse are yet to be discovered and, accordingly, their historical evolution remains formally unknown. In this contribution we use tools and concepts from statistical physics and complex networks to study and quantify the evolution of western contemporary popular music. In it, we unveil a number of patterns and metrics characterizing the generic usage of primary musical facets such as pitch, timbre, and loudness. Moreover, we find many of these patterns and metrics to be consistently stable for a period of more than fifty years, thus pointing towards a great degree of conventionalism in this type of music. Nonetheless, we prove important changes or trends. These are related to the restriction of pitch transitions, the homogenization of the timbral palette, and the growing loudness levels. The obtained results suggest that our perception of new popular music would be largely rooted in these changing characteristics. Hence, an old tune could perfectly sound novel and fashionable, provided that it consisted of common harmonic progressions, changed the instrumentation, and increased the average loudness.
An Evolutionary Optimization Approach for Categorical Data Protection
Publication Type:
Conference Paper
Source:
Privacy and Anonymity in the Information Society 2012, Berlin (2012)
Keywords:
genetic algorithms; data privacy; categorical data; data mining; information loss; disclosure risk
Abstract:
The continuously growing amount of public sensitive data has increased the risk of breaching the privacy of the people or institutions represented in those datasets. Many protection methods have been developed to address this problem by either distorting or generalizing data, while taking into account the difficult trade-off between data utility (information loss) and protection against disclosure (disclosure risk).
In this paper we present an optimization approach for data protection based on an evolutionary algorithm which is guided by a combination of information loss and disclosure risk measures. In this way, state-of-the-art protection methods are combined to obtain new data protections with a better trade-off between these two measures. The paper presents several experimental results that assess the performance of our approach.
Analysis of On-line Social Networks Represented as Graphs – Extraction of an Approximation of Community Structure Using Sampling
Publication Type:
Conference Paper
Source:
MDAI 2012, Springer-Verlag, Volume 7647, Girona, Catalunya, p. 149-160 (2012)
Abstract:
In this paper we benchmark two distinct algorithms for extracting community structure from on-line social networks (OSNs) represented as graphs, considering how we can representatively sample an OSN graph while maintaining its community structure. We also evaluate the extraction algorithms’ optimum value (modularity) for the number of communities using five well-known benchmarking datasets, two of which represent real OSN data. We also consider the assignment of the filtering and sampling criteria for each dataset. We find that the extraction algorithms work well for finding the major communities in the original and the sampled datasets. The quality of the results is measured using an NMI (Normalized Mutual Information) type metric to identify the grade of correspondence between the communities generated from the original data and those generated from the sampled data. We find that a representative sampling is possible which preserves the key community structures of an OSN graph, significantly reducing computational cost and also making the resulting graph structure easier to visualize. Finally, comparing the communities generated by each algorithm, we identify the grade of correspondence.
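For readers unfamiliar with NMI, the following Python sketch computes one common variant, 2·I(A;B) / (H(A) + H(B)), between two community assignments over the same set of nodes. The abstract only says "an NMI type metric", so the exact normalisation used in the paper is an assumption here, as is the convention that two trivial single-community partitions score 1.0. When comparing an original graph with a sample, the two label lists would have to be restricted to the nodes present in both.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions.

    labels_a[i] and labels_b[i] are the community labels assigned to
    node i by the two partitions being compared.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and equally long")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    # Mutual information I(A;B) from the contingency counts.
    mi = sum(
        (nij / n) * log(nij * n / (count_a[a] * count_b[b]))
        for (a, b), nij in joint.items()
    )
    # Entropies H(A) and H(B) of each partition.
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if h_a + h_b == 0:
        return 1.0  # assumed convention: both partitions are one community
    return 2 * mi / (h_a + h_b)
```

The score does not depend on how communities are named, only on how nodes are grouped, so relabelled but identical partitions reach the maximum value.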
Técnicas para el Análisis de Datos Clínicos
Publication Type:
Book
Source:
Diaz de Santos, p. 352 (2005)
ISBN:
847978721X
Keywords:
data analysis; clinical data; statistics; data mining; prognosis; diagnosis
Abstract:
This book is aimed at people who, for professional or academic reasons, need to analyse patient data in order to make a diagnosis or a prognosis. It explains in detail the various statistical and machine learning techniques and their application to the analysis of clinical data. In addition, the book describes, in a structured way, a series of adapted techniques and original approaches, based on the author's experience and collaborations in this field.
