Audio Analysis

In guitar playing, both hands are used: one presses the strings against the fretboard and the other plucks them. The fretting hand mainly determines the notes, while the plucking hand mainly determines the note onsets and timbral properties. However, the fretting hand is also involved in creating note onsets and in expressive articulations such as legato, appoggiatura, glissando, and vibrato.

Our research has begun by studying the identification of attack articulations such as legato, appoggiatura, and glissando. Specifically, we are designing a system to automatically identify and analyze expressive performances. The system is composed of three main processes: segmentation and feature extraction, acquisition (learning) of models for identifying expressive guitar resources, and analysis of guitar performances.

Analysis Blocks

Figure 1: System Architecture

Segmentation and Feature Extraction

Our approach first determines the note onsets caused by plucking the strings. Next, a finer-grained analysis is performed inside the regions delimited by two plucking onsets.

Plucking Detection

The task of this module is to determine the onsets caused by the plucking hand, i.e. right-hand onsets. As right-hand onsets are more percussive than left-hand onsets, we use the High Frequency Content (HFC) measure. HFC is sensitive to abrupt onsets but not sensitive enough to respond to the changes of fundamental frequency caused by the left hand.
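The HFC idea can be illustrated with a minimal sketch. It assumes the common definition of HFC (each spectral bin's energy weighted by its bin index) and a simple peak-picking rule on the frame-to-frame increase. The function names, frame sizes, and threshold are illustrative assumptions, not the settings used in our system.

```python
import numpy as np

def hfc_curve(signal, frame_size=1024, hop_size=512):
    """Return one HFC value per frame: sum over bins k of k * |X[k]|^2."""
    window = np.hanning(frame_size)
    bins = np.arange(frame_size // 2 + 1)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    values = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_size
        frame = signal[start:start + frame_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        values[i] = np.sum(bins * magnitude ** 2)
    return values

def pick_onsets(hfc, threshold_ratio=0.3):
    """Return frame indices where the HFC increase peaks above a threshold.

    threshold_ratio is a hypothetical tuning parameter (fraction of the
    largest increase observed in the curve).
    """
    rise = np.maximum(np.diff(hfc, prepend=hfc[0]), 0.0)
    threshold = threshold_ratio * rise.max()
    onsets = []
    for i in range(1, len(rise) - 1):
        if rise[i] > threshold and rise[i] >= rise[i - 1] and rise[i] > rise[i + 1]:
            onsets.append(i)
    return onsets
```

Because high bins receive large weights, a percussive pluck (broadband energy) produces a sharp rise in the curve, whereas a smooth pitch change made by the fretting hand moves energy only slightly along the frequency axis and produces a much smaller rise.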

Pitch Detection

The task of this module is to analyze the sound fragment between two plucking onsets. First, two points are determined: the end of the attack and the start of the release. We then use additional algorithms with a lower threshold in order to capture the changes in fundamental frequency inside each sound fragment. Specifically, the Complex Domain algorithm is used to determine the peaks and Yin is used for the fundamental frequency estimation.
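To make the fundamental frequency step concrete, here is a simplified sketch of the core Yin steps for a single frame: the difference function, the cumulative mean normalized difference, and an absolute threshold. It omits parabolic interpolation and other refinements of the published algorithm. The function name, frequency range, and threshold value are assumptions chosen for illustration.

```python
import numpy as np

def yin_f0(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.15):
    """Estimate f0 for one frame; returns (f0_in_hz, aperiodicity)."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sample_rate / f_max)
    tau_max = int(sample_rate / f_min)
    n = len(frame) - tau_max  # integration window; frame must be longer than tau_max

    # Step 1: difference function d(tau)
    diff = np.array([np.sum((frame[:n] - frame[tau:tau + n]) ** 2)
                     for tau in range(tau_max + 1)])

    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = np.ones_like(diff)
    running_sum = np.cumsum(diff[1:])
    cmnd[1:] = diff[1:] * np.arange(1, tau_max + 1) / np.maximum(running_sum, 1e-12)

    # Step 3: first lag that dips below the threshold, followed to its local minimum
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau, float(cmnd[tau])
        tau += 1

    # Fallback: no dip found, so use the global minimum in the search range
    tau = tau_min + int(np.argmin(cmnd[tau_min:tau_max + 1]))
    return sample_rate / tau, float(cmnd[tau])
```

The second returned value, the normalized difference at the chosen lag, is commonly used as an aperiodicity measure: it is close to 0 for a clean periodic tone and grows as the frame becomes noisier.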

Extracting Sound Features

We plan to combine several state-of-the-art feature extraction algorithms. Currently, we use features such as amplitude, aperiodicity, and fundamental frequency; this list will be extended.

Figure 2: Amplitude, f0 and aperiodicity of a Legato.

Figure 3: Amplitude, f0 and aperiodicity of a Glissando.


Before acquiring articulation models, two pre-processing steps are applied to the extracted features: smoothing and scaling. Smoothing reduces the impact of noise in feature extraction. Scaling produces a fixed-length representation.
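A minimal sketch of these two steps follows. It assumes a moving-average filter for smoothing and linear interpolation for rescaling along the time axis. Both choices, along with the function names and default sizes, are illustrative assumptions rather than the exact methods used in our system.

```python
import numpy as np

def smooth(curve, width=5):
    """Moving average with edge padding; width is assumed to be odd."""
    curve = np.asarray(curve, dtype=float)
    kernel = np.ones(width) / width
    padded = np.pad(curve, width // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def rescale_length(curve, target_length=100):
    """Resample a curve to a fixed number of points by linear interpolation."""
    curve = np.asarray(curve, dtype=float)
    old_positions = np.linspace(0.0, 1.0, num=len(curve))
    new_positions = np.linspace(0.0, 1.0, num=target_length)
    return np.interp(new_positions, old_positions, curve)
```

After both steps, feature curves taken from fragments of different durations have the same number of points and can be compared point by point.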

The first technique we use is histogram envelope calculation, which computes the peak density of a stream of data. Specifically, we want to model the regions where peaks are densely concentrated.
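One possible reading of this step is sketched below: local maxima of a feature curve are located, and their positions are counted in equal-width bins, so that bins with high counts mark regions where peaks cluster. This is an interpretation made for illustration only. The function names and the bin count are assumptions, and the actual envelope computation may differ.

```python
import numpy as np

def peak_positions(curve):
    """Indices of interior local maxima of a 1-D curve."""
    curve = np.asarray(curve, dtype=float)
    is_peak = (curve[1:-1] > curve[:-2]) & (curve[1:-1] >= curve[2:])
    return np.flatnonzero(is_peak) + 1

def histogram_envelope(curve, n_bins=10):
    """Count peaks per equal-width bin along the curve (peak density)."""
    peaks = peak_positions(curve)
    counts, edges = np.histogram(peaks, bins=n_bins, range=(0, len(curve)))
    return counts, edges
```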

Next, we use SAX (Symbolic Aggregate Approximation) to construct articulation models. SAX is a symbolic representation used in time series analysis that reduces dimensionality while preserving the properties of the curves. An example of legato and glissando models, using aperiodicity, is shown below.

Figure 4: Legato model using SAX representation.

Figure 5: Glissando model using SAX representation.
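For readers unfamiliar with SAX, the standard procedure can be sketched as follows: z-normalize the series, average it over equal-length segments (Piecewise Aggregate Approximation), and map each segment mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. The segment count and alphabet size below are illustrative assumptions, not the values used to build the models in Figures 4 and 5.

```python
import numpy as np

# Breakpoints dividing a standard normal into equiprobable regions,
# as tabulated in the SAX literature for small alphabets.
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
    5: [-0.84, -0.25, 0.25, 0.84],
}

def sax(series, n_segments=8, alphabet_size=4):
    """Convert a numeric series to a SAX word of n_segments letters."""
    series = np.asarray(series, dtype=float)
    std = series.std()
    # z-normalize; a constant series maps to all zeros
    z = (series - series.mean()) / std if std > 0 else np.zeros_like(series)
    # Piecewise Aggregate Approximation: mean of each segment
    paa = np.array([segment.mean() for segment in np.array_split(z, n_segments)])
    # Discretize each segment mean against the breakpoints
    symbols = np.searchsorted(BREAKPOINTS[alphabet_size], paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)

# A steadily rising curve becomes an ascending word.
print(sax(np.linspace(0.0, 1.0, 80), n_segments=4, alphabet_size=4))  # abcd
```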


The current performance annotation process is simple. When a new performance is presented to the system, first the segmentation and feature extraction process is performed. Then, for each fragment considered a candidate to contain an expressive articulation, its distance to the articulation models is computed.
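The distance step can be sketched as below. The sketch assumes that fragments and models are fixed-length numeric vectors, that Euclidean distance is used, and that a fragment is labeled with its nearest model. None of these choices is specified above, and the function name and toy values are hypothetical.

```python
import numpy as np

def classify_fragment(fragment, models):
    """Return (nearest_label, distances) for a fragment against all models.

    models maps an articulation label to a reference vector of the same
    length as the fragment.
    """
    fragment = np.asarray(fragment, dtype=float)
    distances = {
        label: float(np.linalg.norm(fragment - np.asarray(model, dtype=float)))
        for label, model in models.items()
    }
    nearest = min(distances, key=distances.get)
    return nearest, distances

# Toy vectors, made up for illustration only.
toy_models = {
    "legato": [0.0, 0.0, 1.0, 1.0],
    "glissando": [0.0, 0.3, 0.7, 1.0],
}
label, distances = classify_fragment([0.0, 0.1, 0.9, 1.0], toy_models)
print(label)  # legato
```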

Borrowing from Carlevaro’s guitar exercises, we recorded a collection of ascending and descending chromatic scales. Legato and glissando examples were recorded by a professional classical guitar performer. The performer was asked to play chromatic scales in three different regions of the guitar fretboard (we recorded notes from the first 12 frets, with each recording concentrating on 4 specific frets). From these recordings we obtained 72 examples of expressive articulations.

A deeper description of the analysis process can be found in our CMMR-2010 and SMC-2010 publications. A summary of the results presented at SMC-2010 is shown below.

Table 1: Performance results of our system.

Recordings                       Performance
Ascending Legato                 100 %
Descending Legato                66.6 %
Ascending Glissando              83.3 %
Descending Glissando             77.7 %
Glissando in Metallic Strings    77.7 %
Glissando in Nylon Strings       83.3 %
Legato in Metallic Strings       86.6 %
Legato in Nylon Strings          73.3 %