Guitar playing uses both hands: one presses the strings against the fretboard and the other plucks them. The fretting hand mainly determines the notes, while the plucking hand mainly determines the note onsets and timbral properties. However, the fretting hand is also involved in creating note onsets and in expressive articulations such as legato, appoggiatura, glissando, and vibrato.
Our research has begun by studying the identification of attack articulations such as legato, appoggiatura, and glissando. Specifically, we are designing a system to automatically identify and analyze expressive performances. The system is composed of three main processes: segmentation and feature extraction, acquisition (learning) of models for identifying expressive guitar resources, and analysis of guitar performances.
Figure 1: System Architecture
Segmentation and Feature Extraction
Our approach first determines the note onsets caused by plucking the strings. Next, a finer-grained analysis is performed inside each region delimited by two plucking onsets.
The task of this module is to determine the onsets caused by the plucking hand, i.e. right-hand onsets. Because right-hand onsets are more percussive than left-hand onsets, we use the High Frequency Content (HFC) measure. HFC is sensitive to abrupt onsets but not sensitive enough to detect the changes in fundamental frequency caused by the left hand.
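As a minimal sketch of HFC-based onset detection, the code below weights each spectral bin's energy by its bin index and marks frames where the HFC curve rises sharply. The frame size, hop size, threshold ratio, and the simple peak-picking rule are illustrative assumptions, not the settings used in our system.

```python
import numpy as np

def hfc_curve(signal, frame_size=1024, hop_size=512):
    """Return one High Frequency Content value per frame.

    Assumes len(signal) >= frame_size. All parameters are illustrative.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    bins = np.arange(frame_size // 2 + 1)  # bin index acts as the frequency weight
    hfc = np.empty(n_frames)
    for i in range(n_frames):
        start = i * hop_size
        frame = signal[start:start + frame_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        hfc[i] = np.sum(bins * magnitude ** 2)
    return hfc

def pick_onsets(hfc, threshold_ratio=0.3):
    """Return frame indices where the positive HFC rise peaks above a threshold."""
    rise = np.maximum(np.diff(hfc, prepend=hfc[0]), 0.0)
    threshold = threshold_ratio * rise.max()
    onsets = []
    for i in range(1, len(rise) - 1):
        if rise[i] > threshold and rise[i] >= rise[i - 1] and rise[i] > rise[i + 1]:
            onsets.append(i)
    return onsets
```

Because the weighting favours high-frequency energy, a percussive pluck produces a sharp rise in the curve, whereas a smooth pitch change within a sustained note changes it far less.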
The task of this module is to analyze the sound fragment between two plucking onsets. First, two points are determined: the end of the attack and the start of the release. We then use additional algorithms with a lower threshold to capture the changes in fundamental frequency inside each sound fragment. Specifically, the Complex Domain algorithm is used to determine the peaks and YIN is used for fundamental frequency estimation.
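To illustrate the fundamental frequency step, the following is a simplified sketch of the YIN idea: a difference function, its cumulative mean normalization, and an absolute threshold. It omits the parabolic interpolation and local refinement steps of the full algorithm, and the frequency range and threshold are assumed values. The normalized difference at the chosen lag is returned as a rough aperiodicity measure.

```python
import numpy as np

def yin_f0(frame, sample_rate, f_min=80.0, f_max=1000.0, threshold=0.15):
    """Estimate f0 of one frame with a simplified YIN.

    Returns (f0_hz, aperiodicity). The frame must be longer than
    sample_rate / f_min samples. All parameter values are illustrative.
    """
    tau_min = int(sample_rate / f_max)
    tau_max = int(sample_rate / f_min)
    n = len(frame) - tau_max  # integration window length

    # Step 1: squared difference between the frame and its lagged copy.
    diff = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        delta = frame[:n] - frame[tau:tau + n]
        diff[tau] = np.dot(delta, delta)

    # Step 2: cumulative mean normalized difference.
    cmnd = np.ones(tau_max + 1)
    running = np.cumsum(diff[1:])
    cmnd[1:] = diff[1:] * np.arange(1, tau_max + 1) / np.maximum(running, 1e-12)

    # Step 3: first lag below the threshold, followed down to its local minimum;
    # fall back to the global minimum if nothing is below the threshold.
    candidates = np.where(cmnd[tau_min:] < threshold)[0]
    if len(candidates):
        tau = int(candidates[0]) + tau_min
        while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1
    else:
        tau = int(np.argmin(cmnd[tau_min:])) + tau_min
    return sample_rate / tau, float(cmnd[tau])
```

Applying such an estimator frame by frame inside a fragment yields an f0 curve in which a left-hand articulation appears as a pitch change without a new plucking onset.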
Extracting Sound Features
We plan to combine several state-of-the-art feature extraction algorithms. Currently, we use features such as amplitude, aperiodicity, and fundamental frequency, and we plan to extend this list.
Figure 2: Amplitude, f0 and aperiodicity of a Legato.
Figure 3: Amplitude, f0 and aperiodicity of a Glissando.
Before acquiring articulation models, two pre-processing steps are applied to the obtained features: smoothing and scaling. Smoothing is applied to reduce the impact of noise in feature extraction. The goal of scaling is to obtain a fixed length representation.
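The two pre-processing steps can be sketched as follows. This is an illustrative version that assumes a moving-average filter for smoothing and linear interpolation for scaling; the window width and target length are hypothetical values.

```python
import numpy as np

def smooth(values, width=5):
    """Moving-average smoothing; width should be odd so the length is preserved."""
    kernel = np.ones(width) / width
    padded = np.pad(values, width // 2, mode="edge")  # repeat edge values
    return np.convolve(padded, kernel, mode="valid")

def scale_to_length(values, target_length=100):
    """Linearly resample a feature curve to a fixed number of points."""
    source_positions = np.linspace(0.0, 1.0, num=len(values))
    target_positions = np.linspace(0.0, 1.0, num=target_length)
    return np.interp(target_positions, source_positions, values)
```

After scaling, fragments of different durations share the same number of points, so they can be compared position by position.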
The first technique we use is histogram envelope calculation, which estimates the peak density of a stream of data. Specifically, we want to model the places where peaks are concentrated.
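One possible reading of the peak density idea is sketched below: local maxima of a feature curve are located and their positions are counted in equal-width bins. The strict local-maximum rule and the number of bins are assumptions made for illustration, and the actual envelope computation may differ.

```python
import numpy as np

def peak_positions(values):
    """Indices of strict local maxima of a 1-D curve."""
    v = np.asarray(values, dtype=float)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])
    return np.where(is_peak)[0] + 1

def peak_density(values, n_bins=10):
    """Histogram of peak positions, normalized to sum to 1 (all zeros if no peaks)."""
    peaks = peak_positions(values)
    counts, _ = np.histogram(peaks, bins=n_bins, range=(0, len(values)))
    total = counts.sum()
    return counts / total if total else counts.astype(float)
```

Bins with high values mark the regions of the fragment where peaks cluster.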
Next, to construct articulation models, we use SAX (Symbolic Aggregate Approximation), a symbolic representation used in time series analysis that reduces dimensionality while preserving the properties of the curves. Examples of legato and glissando models, using aperiodicity, are shown in Figures 4 and 5.
Figure 4: Legato model using SAX representation.
Figure 5: Glissando model using SAX representation.
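For readers unfamiliar with SAX, the transformation can be sketched in three steps: z-normalize the curve, average it over equal segments (piecewise aggregate approximation), and map each segment mean to a symbol using breakpoints that divide a standard normal distribution into equiprobable regions. The alphabet size of 4 and the segment count are assumptions chosen for this example.

```python
import numpy as np

# Breakpoints splitting a standard normal distribution into 4 equiprobable regions.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax(values, n_segments=8):
    """Convert a 1-D curve into a SAX word of n_segments symbols."""
    v = np.asarray(values, dtype=float)
    std = v.std()
    z = (v - v.mean()) / std if std > 0 else np.zeros_like(v)
    # Piecewise aggregate approximation: mean of each nearly equal chunk.
    paa = np.array([chunk.mean() for chunk in np.array_split(z, n_segments)])
    # Index of the region each segment mean falls into.
    symbols = np.searchsorted(BREAKPOINTS, paa)
    return "".join(ALPHABET[s] for s in symbols)
```

For instance, a steadily rising curve of 16 points reduced to 4 segments gives the word `abcd`.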
The current performance annotation process is simple. When a new performance is presented to the system, first the segmentation and feature extraction process is performed. Then, for each fragment considered a candidate to contain an expressive articulation, its distance to the articulation models is computed.
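The annotation step can be illustrated as a nearest-model search. The sketch below assumes that fragments and models are fixed-length numeric vectors compared with Euclidean distance; the distance measure, the `max_distance` rejection parameter, and the function name are hypothetical choices for illustration.

```python
import numpy as np

def classify_fragment(fragment, models, max_distance=None):
    """Return (label, distance) for the closest model.

    models maps a label such as "legato" to a vector of the same length
    as fragment. If max_distance is given and even the closest model is
    farther than that, the label is None.
    """
    fragment = np.asarray(fragment, dtype=float)
    distances = {
        label: float(np.linalg.norm(fragment - np.asarray(model, dtype=float)))
        for label, model in models.items()
    }
    label = min(distances, key=distances.get)
    if max_distance is not None and distances[label] > max_distance:
        return None, distances[label]
    return label, distances[label]
```

A rejection threshold of this kind lets a candidate fragment remain unlabeled when it resembles none of the articulation models.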
Borrowing from Carlevaro’s guitar exercises, we recorded a collection of ascending and descending chromatic scales. Legato and glissando examples were recorded by a professional classical guitar performer. The performer was asked to play chromatic scales in three different regions of the guitar fretboard (we recorded notes from the first 12 frets, with each recording concentrated on 4 specific frets). From these recordings we obtained 72 examples of expressive articulations.
|Ascending Legato|100 %|
|Descending Legato|66.6 %|
|Ascending Glissando|83.3 %|
|Descending Glissando|77.7 %|
|Glissando in Metallic Strings|77.7 %|
|Glissando in Nylon Strings|83.3 %|
|Legato in Metallic Strings|86.6 %|
|Legato in Nylon Strings|73.3 %|