Riede, Tobias : Vocal changes in animals during disorders


Chapter 2. Fundamentals of sound production and sound analysis

2.1 General aspects of laryngeal anatomy

Although the gross anatomy of the larynx is similar in all mammals (Negus, 1949; Schneider, 1964; Harrison, 1995; Nickel et al. 1998), Paulsen (1967) already noted the variability of the laryngeal fine structure in relation to its function in sound generation. This section discusses some aspects of the larynx (Fig. 2.1) with regard to their structural variability between and, mainly, within species.

Figure 2.1: Horizontal section of a canid larynx.


The larynx belongs to the respiratory tract. Its primary function is protecting the respiratory tract against food and foreign bodies. Sound production is a secondary function. This situation results in multi-use constraints, i.e. the same structure serves multiple functions and its anatomy is a compromise between these functions.

The laryngeal cartilage framework in mammals consists of four types of cartilage: one thyroid, one cricoid, one epiglottic and two (paired) arytaenoid cartilages. The arytaenoid cartilages are the phylogenetically oldest parts. They can be traced back to the arytaenoid plates (or Cartilago lateralis) of the Urodela, where they are situated laterally at the entrance of the trachea.

In recent mammals these paired cartilages basically show three degrees of freedom of movement. The position of the arytaenoids changes prior to vocalization and during breathing. The mobility of the arytaenoids varies between species (Nickel et al. 1998). The left and right arytaenoid cartilages are each attached to the cricoid cartilage by two muscles (Musculus cricoarytaenoideus dorsalis and lateralis). The vocal folds are stretched between the arytaenoid cartilages and the thyroid cartilage.

The thyroid cartilage (Cartilago thyroidea) is an innovation in the monotremes. It forms the ventral and lateral parts of the larynx. The thyroid shows high variability between as well as within species. A bulla-like enlargement that might affect the resonance characteristics of the vocal tract can be found, for instance, in the marsupials, the musk deer (Schneider 1964) and the takin (Frey, Hofmann in press). In several species the thyroid tends to ossify: the cartilage tissue is replaced by bone with increasing age. The resulting lower flexibility of the thyroid is discussed as one cause of an individual's ontogenetic vocal change (Titze 1994). The thyroid serves as the insertion point of the vocal folds.

The vocal folds basically consist of a muscle (Musculus vocalis), a ligament (Ligamentum vocale), connective tissue and a mucosal cover. The vocal folds are stretched between the thyroid and arytaenoid cartilages. The position of the vocal folds in relation to the airstream (at a right angle or otherwise) is species-specific (Schneider 1964). In clinical conditions, intraspecific shape variability of the vocal folds has been discussed in humans (Wendler et al. 1996), and a few data exist for mongrel dogs used in experiments (Jiang et al. 1994).

Laryngeal muscles are broadly divided into internal and external muscles. External muscles connect the larynx cranially to the hyoid bone and caudally to the sternum. They are responsible for the up and down movement of the larynx in humans and probably, to a certain extent, also in animals (Fitch, pers. comm.), which affects vocal tract length and vocal tract shape. The internal muscles of the larynx are responsible for opening and closing the glottis. The Musculus cricoarytaenoideus dorsalis is the only abductor of the vocal folds, i.e. it separates the dorsal ends of the vocal folds.

The internal laryngeal muscles are innervated by branches of the vagus nerve, i.e. the Nervus laryngeus.

The mucosal cover consists of a cutaneous mucous membrane from the epiglottis to the vocal folds and a respiratory mucous membrane in the lower parts. The histological structure of the lamina propria of the vocal fold mucosa varies considerably among animals (Kurita et al. 1986).

2.2 Physiology of laryngeal sound production

The vocal apparatus transforms aerodynamic energy into acoustic energy. The aerodynamic energy is supplied by the subglottic pressure, which is maintained by the muscles of expiration. We distinguish voiced sound and turbulent noise. Voiced sound is generated by self-sustained oscillations of the vocal folds, which may be periodic (resulting in harmonic sound; syn. tonal) or aperiodic (resulting in noise; syn. atonal). Self-sustained oscillations of the vocal folds are mainly supported by a mechanical force that follows Bernoulli's law. This myoelastic-aerodynamic theory was stated by van den Berg (1958) in the following terms:

The fundamental frequency of the glottis generator is equal to the frequency of the vocal fold vibrations and depends on several interrelated factors: (1) the effective mass of the vibrating part of the vocal fold; (2) the effective tension in the vibrating part of the vocal fold; (3) the effective area of the rima glottidis during a cycle, which determines the effective value of the Bernoulli effect in the glottis; (4) the effective subglottic pressure and (5) the damping of the vocal fold.

These aerodynamic, myoelastic and geometric properties of the voice apparatus can be parameterized more or less accurately. Even the simplest vocal fold models include nonlinear terms and more than 3 dynamical variables. Consequently, when the dynamics are chaotic, vocal fold behaviour cannot be predicted over long times, because tiny errors in the measurement of the initial state can result in complete uncertainty in forecasting a later state. The theory of nonlinear dynamics provides appropriate methods to systematize such irregular and chaotic dynamical behaviour (Herzel 1993; Titze et al. 1993). The synthesis of the theory of nonlinear dynamics with the notion of the vocal folds as coupled oscillators resulted in acceptably realistic modelling of vocal fold behaviour (Herzel et al. 1994). The most important merit of this approach is its ability to explain not only the normal, regular vibration patterns of the vocal folds and their interactions, but also the commonly occurring irregular behaviour (Herzel et al. 1998). Whereas the regular patterns correspond to harmonic vocalization, the irregular patterns correspond to subharmonics, biphonation and deterministic chaos, which are henceforth referred to as nonlinear phenomena.
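The loss of long-term predictability under chaotic dynamics can be illustrated with a toy system. The sketch below is not a vocal fold model: it uses the logistic map, a standard textbook example of deterministic chaos, and the parameter value, starting point and size of the initial error are assumptions chosen only for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return all states."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states


def divergence(x0=0.3, error=1e-10, steps=80):
    """Absolute difference between two runs whose initial states differ by `error`."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + error, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gap = divergence()
    # The gap starts at the size of the measurement error and grows
    # until the two runs are effectively unrelated.
    for step in (0, 10, 20, 30, 40, 60):
        print(f"step {step:2d}: difference {gap[step]:.3e}")
```

Both runs obey exactly the same deterministic rule; only the assumed measurement error in the initial state differs, yet after a few dozen iterations the forecast carries no information about the actual state.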

The most important contribution of nonlinear dynamics to bioacoustic research is the explanation of the acoustic phenomena of subharmonics, biphonation and deterministic chaos. Nonlinear dynamics has provided evidence that, in addition to central neural control, a further level of temporal organisation is provided by nonlinear oscillation dynamics that are intrinsic to the larynx as well as to the avian syrinx. Detailed spectral and temporal examination of the acoustic product revealed these nonlinear phenomena. They are consistent with transitions in the dynamical state of the in vitro larynx (Berry et al. 1994), the in vivo human larynx (Mergell 1998), the in vitro avian syrinx (Fee et al. 1998), the in vivo avian syrinx (Goller, Larson 1997) and probably also the in vivo nonhuman mammalian larynx (Fitch, unpublished data).

Following the production of the primary acoustic signal at the larynx ('sound source'), the sound passes through the vocal tract, an anatomical structure that includes all cavities cranial to the glottis (pharyngeal, oral and nasal cavities). While passing through the vocal tract the primary signal is changed, an effect described by the source-tract theory (Fant 1960). The vocal tract selects (filters) a subset of the frequencies of the primary signal (i.e. the secondary signal) for radiation from the mouth (the final, tape-recorded signal). In spectral terms, the selected subsets of frequencies are narrow regions of the frequency spectrum that represent the resonance characteristics, i.e. the resonance frequencies, of the vocal tract. The resonance frequencies have the special name 'formants' (from Latin formare, to form; according to Herrmann 1890), a term used in human phoniatrics (Fant 1960, Titze 1994) and animal bioacoustics (e.g. Lieberman et al., 1969; Nowicki, 1987; McComb, 1988; Fitch & Hauser, 1995).
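The filtering of a source signal by a resonance can be sketched numerically. The example below is a strong simplification under stated assumptions: the source is an idealised pulse train with equally strong harmonics, the vocal tract is reduced to a single two-pole digital resonator, and all function names and numerical values (sampling rate, fundamental frequency, resonance frequency and bandwidth) are hypothetical.

```python
import cmath
import math


def pulse_train(f0, fs, n_samples):
    """Idealised source: one unit impulse per fundamental period."""
    period = round(fs / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, freq, bandwidth, fs):
    """Two-pole digital resonator standing in for one vocal tract resonance."""
    r = math.exp(-math.pi * bandwidth / fs)      # pole radius from bandwidth
    theta = 2.0 * math.pi * freq / fs            # pole angle from centre frequency
    b1, b2 = 2.0 * r * math.cos(theta), -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def amplitude_at(signal, freq, fs):
    """Magnitude of the discrete Fourier sum of `signal` at one frequency."""
    w = -2j * math.pi * freq / fs
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))


def strongest_harmonic(signal, f0, fs, n_harmonics=20):
    """Return the harmonic frequency (Hz) that carries the largest amplitude."""
    harmonics = [f0 * k for k in range(1, n_harmonics + 1)]
    return max(harmonics, key=lambda f: amplitude_at(signal, f, fs))


if __name__ == "__main__":
    fs, f0 = 8000, 100
    source = pulse_train(f0, fs, 4000)
    radiated = resonator(source, freq=500, bandwidth=80, fs=fs)
    print("strongest harmonic after filtering:",
          strongest_harmonic(radiated, f0, fs), "Hz")
```

The fundamental frequency of the filtered signal is still set by the source, whereas the harmonic that dominates the spectrum is set by the resonance. A real vocal tract has several resonances, so this single-resonator sketch only shows the principle.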

2.3 Laryngeal pathology and vocal changes in animals

A complete list of laryngeal disorders, and of diseases indirectly affecting the larynx, will not be given here. Rather, some of the more common ones will be described briefly, in particular those in which vocal changes have been mentioned as a clinical symptom.

Congenital laryngeal malformations

Congenital laryngeal malformations are known in the domestic dog (Canis familiaris), for instance laryngeal hypoplasia in brachycephalic dogs or subglottic stenosis (Venker-van Haagen 1992), which probably mainly impair the supply of aerodynamic energy.

In horses (Equus caballus) hemiplegia laryngis can be a severe problem: a paralysis of the Musculus cricoarytaenoideus dorsalis that mostly occurs on the left side (Ohnesorge et al. 1993). As a result, control of the left arytaenoid cartilage and the left vocal fold is lost. The most pronounced clinical symptom is a stenotic (inspiratory) noise during exercise ('roaring', in German: 'Kehlkopfpfeifen') (Wintzer, 1997). The vocalization of such patients is lower in intensity and sounds hoarse (Wintzer, 1997). The paralysis is caused by a unilateral chronic degenerative axonopathy (destruction of the axon) of the Nervus laryngeus recurrens, which innervates the Musculus cricoarytaenoideus dorsalis (Cahill, Goulden 1987). The disease is transmitted genetically (Ohnesorge et al. 1993).

Laryngeal inflammation/ Systemic infections

Pathogenic strains of the bacterium E. coli are associated with a systemic infection, colienterotoxaemia, in juvenile, newly weaned piglets (Sus scrofa f. domestica). Among other symptoms, the toxins produced by the bacterium cause edema of the laryngeal mucosa, which is expected to affect the vibration characteristics of the vocal folds. Dysphonia and aphonia were observed in those piglets (Schulze et al. 1980).

The most common cause of laryngitis in dogs is infectious tracheobronchitis (kennel cough), a viral/bacterial disease which causes inflammation of the laryngeal mucosa (Bemis 1992). Local irritations of the vocal folds in the dog can also be caused by a day of continuous barking (hyperphonation, Gray et al. 1987; Gray, Titze 1988) and by intratracheal intubation during anaesthesia (Leonard et al. 1992).

In pseudorabies (syn. Aujeszky's disease) infections (Herpesvirus), frequent vocalization ('...as in pain...') is mentioned as a clinical symptom in dogs (Monroe, 1989) and loss of voice in pigs (Plonait, Bickhardt 1997). Since the virus is neurotropic, i.e. it affects the nervous system, it is assumed that the conspicuous vocalization is caused by neural coordinative dysfunction. Excessive vocalization was also observed in rabies-infected cats (Fogelman et al., 1993), cattle and sheep (Hudson et al., 1996).

Miscellaneous tissue changes

Laryngeal neoplasias have been described in dogs (Wheeldon et al. 1982; Carlisle et al. 1991) and horses (Jones, 1994). In dogs the most common sign was a hoarse bark or a loss of voice (Venker-van Haagen 1992).

2.4 Signal analysis

Signal analysis provides a basis for assessing the vocal repertoire of individuals and species, and for relating variation in signal structure to variation in the phenotypic attributes of the signaller. Correlations are usually found between signal structure and both social and ecological contexts of signal production (e.g. Falls 1982; Wells 1988; Gouzoules, Gouzoules 1990). Signal analysis may also provide important hints for studying mechanisms of sound production (insects: e.g. Elsner 1994; birds: e.g. Suthers, Goller 1997; mammals: e.g. Lieberman 1969; Brown, Cannito 1995; Fitch 1997).

Hypotheses arising from such correlational data can be tested experimentally by playbacks of synthetic sounds (Hopp, Morton 1998; Jouventin et al. 1999), and acoustic analysis of natural sounds provides the information needed to generate such stimuli.

What is the smallest analyzed unit? For mammalian acoustic signals, Tembrock (1977, 1996 b) suggests that acoustic units be sorted according to their duration or temporal pattern: first, into simple utterances (pulsed, short, long) of similar spectral characteristics; second, into compound calls (consisting of two or more spectral characteristics in temporal succession); and third, into sequences of calls of varying duration and spectral pattern. If two or more animals phonate simultaneously, he terms it 'supraphonation' (e.g. wolf chorusing).

Uniparametric (e.g. Green 1975) or multiparametric (e.g. Todt et al. 1995) approaches have been used for repertoire analysis: from the temporal (time series) or spectral (spectrogram, spectrum) visual representation of an acoustic utterance, a single parameter or a set of parameters (e.g. duration of the call, fundamental frequency) is measured and related to the social or ecological context. The multiparametric approach (Schrader, Hammerschmidt, 1998) basically involves two steps. First, acoustic analyses are used to characterize the spectral energy distribution, the fundamental frequency and the temporal characteristics of particular distinct call types. Second, vocalizations are classified by caller identity (e.g. Smith et al. 1982; Hammerschmidt, Todt 1995; Riede 1997; Rendall et al. 1998; Schön et al. 1999), population identity (Mitani et al. 1999) or situational context (Fischer 1998) on the basis of discriminant function analysis.
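Two of the parameters named above, call duration and fundamental frequency, can be measured from a time series as sketched below. This is a minimal illustration, not the procedure of the cited studies: the amplitude threshold, the autocorrelation-based frequency estimate, the search range and all function names are assumptions made for this example.

```python
import math


def call_bounds(signal, rel_threshold=0.1):
    """Indices of the first and last sample whose magnitude exceeds the threshold."""
    peak = max(abs(x) for x in signal)
    loud = [n for n, x in enumerate(signal) if abs(x) > rel_threshold * peak]
    return loud[0], loud[-1]


def fundamental_frequency(segment, fs, f_min=50.0, f_max=1000.0):
    """Estimate F0 (Hz) from the lag with the largest autocorrelation."""
    min_lag = int(fs / f_max)
    max_lag = int(fs / f_min)

    def autocorr(lag):
        return sum(segment[n] * segment[n + lag]
                   for n in range(len(segment) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs / best_lag


def measure_call(signal, fs):
    """Return a small parameter set: duration (s) and fundamental frequency (Hz)."""
    start, end = call_bounds(signal)
    segment = signal[start:end + 1]
    return {"duration": (end - start + 1) / fs,
            "f0": fundamental_frequency(segment, fs)}


if __name__ == "__main__":
    fs = 8000
    # Synthetic "call": 0.2 s of a 200 Hz tone surrounded by silence.
    tone = [math.sin(2.0 * math.pi * 200 * n / fs) for n in range(1600)]
    recording = [0.0] * 1600 + tone + [0.0] * 1600
    print(measure_call(recording, fs))
```

In a multiparametric analysis, a dictionary of this kind would be computed for every call and the resulting table passed to a classification procedure such as discriminant function analysis, which is not shown here.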

A more production-oriented way of parameter extraction implements the 'source-tract theory' (Fant 1960). Source characteristics are mainly fundamental frequency parameters measured in the spectrum.

As previously mentioned, the sound source can be considered as a system of several coupled oscillators: the left and right vocal folds and the sub- and supraglottal resonators. Applying the concepts of nonlinear dynamics means analysing signals with nonlinear techniques, for instance generalized mutual information, dimensions and Lyapunov exponents (Herzel et al. 1998). These measures exploit signal properties quite different from those shown in spectrograms, since they are based on phase space reconstruction and hence reflect attractor properties instead of frequency patterns.
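Phase space reconstruction is commonly done by time-delay embedding, in which each point is built from delayed copies of the measured signal. The sketch below assumes this method; the function name, the embedding dimension and the delay are illustrative choices, not values taken from the cited work.

```python
import math


def delay_embed(signal, dimension, delay):
    """Build points (x[n], x[n + delay], ..., x[n + (dimension - 1) * delay])."""
    span = (dimension - 1) * delay
    return [tuple(signal[n + k * delay] for k in range(dimension))
            for n in range(len(signal) - span)]


if __name__ == "__main__":
    # A strictly periodic signal with a period of 40 samples.
    signal = [math.sin(2.0 * math.pi * n / 40) for n in range(400)]
    # A delay of a quarter period turns the sine into (sin, cos) pairs.
    points = delay_embed(signal, dimension=2, delay=10)
    radii = [math.hypot(x, y) for x, y in points]
    print("min radius:", min(radii), "max radius:", max(radii))
```

For this periodic test signal the reconstructed points lie on a closed loop, the simplest kind of attractor. Measures such as dimensions or Lyapunov exponents are then computed on point sets of this kind rather than on the spectrum.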

Earlier studies using conventional tools of voice research and bioacoustics (spectrograms, spectra) showed that nonlinear phenomena such as biphonation, subharmonics and deterministic chaos are very common in human and nonhuman mammal vocalization (Titze 1994; Wilden et al. 1998) as well as in bird vocalization (Fee et al. 1998). A first step towards a functional understanding is to quantify those phenomena in an individual's repertoire. In order to detect nonlinear phenomena in sustained phonation, narrow-band spectrograms are required. The (dis)appearance of spectral peaks due to nonlinear behaviour of the underlying dynamical system can then be monitored. In this way subharmonics (related to period doubling or tripling), biphonation (two independent frequencies) and chaos have been identified by spectrograms in animal utterances.
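Monitoring the appearance of a subharmonic peak with a narrow-band analysis can be sketched as follows. The example is a simplified illustration on a synthetic signal: the frame length, the window, the assumption that the fundamental frequency is known in advance, and the ratio used as an indicator are all choices made for this sketch, not part of the cited methods.

```python
import cmath
import math


def hann(n_samples):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_samples)
            for n in range(n_samples)]


def frame_amplitude(frame, window, freq, fs):
    """Windowed spectral magnitude of one frame at a single frequency."""
    w = -2j * math.pi * freq / fs
    return abs(sum(x * h * cmath.exp(w * n)
                   for n, (x, h) in enumerate(zip(frame, window))))


def subharmonic_ratio(signal, fs, f0, frame_len):
    """Per-frame ratio of the amplitude at f0/2 to the amplitude at f0."""
    window = hann(frame_len)
    ratios = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        half = frame_amplitude(frame, window, f0 / 2.0, fs)
        full = frame_amplitude(frame, window, f0, fs)
        ratios.append(half / full)
    return ratios


if __name__ == "__main__":
    fs, f0 = 8000, 200
    signal = []
    for n in range(8000):
        x = math.sin(2.0 * math.pi * f0 * n / fs)
        if n >= 4000:
            # From here on a component at f0/2 is present (period doubling).
            x += 0.5 * math.sin(2.0 * math.pi * (f0 / 2) * n / fs)
        signal.append(x)
    # Frames of 800 samples (100 ms) give a frequency resolution of about
    # 10 Hz, fine enough to separate 100 Hz from 200 Hz.
    for i, ratio in enumerate(subharmonic_ratio(signal, fs, f0, 800)):
        print(f"frame {i}: f0/2 to f0 ratio = {ratio:.3f}")
```

The long analysis frame is what makes the analysis narrow-band: a short frame would smear the components at f0/2 and f0 into one broad peak and the onset of the subharmonic would go unnoticed.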

Tract characteristics are mainly the resonance frequencies, the formants. Formant frequencies are measured using linear predictive coding (LPC) via autocorrelation (Markel, Gray 1976; Owren, Bernacki 1997). LPC produces an envelope of the spectrum. One strength of LPC is that it provides objective estimates of formant characteristics. The principle of LPC is that the values of the signal in the time domain are approximated by linear combinations of the previous values. A set of predictor coefficients is calculated such that the mean square error between the signal values and the linearly predicted values is minimized.
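The autocorrelation method can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: it uses the Levinson-Durbin recursion to solve for the predictor coefficients, writes the prediction as x̂[n] = -(a1·x[n-1] + ... + ap·x[n-p]), omits the windowing and pre-emphasis that are usually applied in practice, and reports only the single largest peak of the envelope. All function names and test values are hypothetical.

```python
import cmath
import math


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns ([1, a1, ..., ap], residual energy)."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= 1.0 - k * k                  # remaining mean square error
    return a, error


def lpc_envelope_peak(signal, fs, order, step_hz=10.0):
    """Frequency (Hz) at which the LPC spectral envelope 1/|A(f)| is largest."""
    a, _ = levinson_durbin(autocorrelation(signal, order), order)

    def envelope(freq):
        z = cmath.exp(-2j * math.pi * freq / fs)
        return 1.0 / abs(sum(c * z ** k for k, c in enumerate(a)))

    grid = [k * step_hz for k in range(1, int(fs / 2 / step_hz))]
    return max(grid, key=envelope)


if __name__ == "__main__":
    # Test signal: impulse response of one resonance at 700 Hz (bandwidth 100 Hz).
    fs = 8000
    r = math.exp(-math.pi * 100 / fs)
    theta = 2.0 * math.pi * 700 / fs
    y = []
    for n in range(800):
        x = 1.0 if n == 0 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x + 2.0 * r * math.cos(theta) * y1 - r * r * y2)
    print("envelope peak near", lpc_envelope_peak(y, fs, order=2), "Hz")
```

A prediction order of 2 is enough here because the test signal contains one resonance; real vocalizations require a higher order, and formant estimates are then read from several local maxima of the envelope rather than from its global maximum.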

In summary, some researchers relate particular acoustic units to mechanisms of sound production, while others define acoustic properties in terms of perceptually relevant patterns. Still other investigators use acoustic characters from the visualized version of the signal. These diverse goals lead to different ways of defining, describing, and labeling particular acoustic structures. Thus, a description of typical examples of each animal signal should include a labeled time series (oscillogram) and a spectrogram (2-D spectrogram). The use of such figures avoids the confusion and errors that can arise from the assumption that a particular term always refers to the same acoustic unit.



© The compilation of content and the presentation of this publication, as well as its electronic processing, are protected by copyright. Any use not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, editing, and storage and processing in electronic systems.

DiML DTD Version 2.0
Certified document server
of the Humboldt-Universität zu Berlin
HTML version created on:
Thu Sep 28 14:01:51 2000