Riede, Tobias: Vocal changes in animals during disorders

Institut für Biologie


Vocal changes in animals during disorders
Dissertation

submitted in fulfilment of the requirements for the academic degree
doctor rerum naturalium (Dr. rer. nat.)

to the Mathematisch-Naturwissenschaftliche Fakultät I (Faculty of Mathematics and Natural Sciences I)
of the Humboldt-Universität zu Berlin

by Diplom-Biologe Tobias Riede
tobiasriede@web.de

Dean of the Faculty: Prof. Dr. B. Ronacher

Reviewers:
Prof. em. Dr. Dr. h.c. mult. Günter Tembrock
Prof. Dr. Hanspeter Herzel
Prof. Dr. Dietmar Todt

Submitted: 7 February 2000

Date of doctoral examination: 26 June 2000

Abstract

If a disease affects the sender's physiology, or only its sound-generating apparatus, what impact does this have on the voice, and how can the resulting vocal change be described? These questions were the central issue of this work, which therefore focuses on the sender's side: the acoustic signal and the mechanism of sound production. First, nonlinear phenomena (acoustic events arising from particular vibration patterns of the vocal folds) were investigated in three case studies. In all three cases, nonlinear phenomena occurred more frequently in the disordered animal. Second, the harmonic-to-noise ratio (HNR), an acoustic parameter not previously used in animal bioacoustics, was applied to dog barks to quantify dysphonia. Normal-sounding dogs occupy a middle HNR range, whereas dysphonic dogs fall outside this range, toward both higher and lower HNR values. Additionally, selected aspects of vocal fold and vocal tract anatomy were investigated with respect to their significance for laryngeal sound generation.

This dissertation includes WAVE files, which can be downloaded from the section Attached Audiofiles.

Keywords:
bioacoustics, nonlinear phenomena, harmonic-to-noise ratio, vocal tract
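The abstract uses the harmonic-to-noise ratio (HNR) as a measure of dysphonia. The procedure used in Chapter 4 appears to be spectrum-based (Figure 4.1 shows a spectrum with its 10-point moving average, and Figure 4.2 is a flow chart of the calculation) and is not reproduced here. To illustrate only what an HNR value expresses, the sketch below swaps in a different, widely used technique: a simplified autocorrelation estimate, HNR = 10·log10(r / (1 - r)), where r is the highest normalized autocorrelation within a plausible pitch-period range. The function names, the default f0 search range (100 to 2000 Hz) and the synthetic test signal are hypothetical, and values from this sketch are not comparable with the HNR values reported in the thesis.

```python
import math
import random


def autocorrelation_hnr(signal, sample_rate, f0_min=100.0, f0_max=2000.0):
    """Simplified autocorrelation-based HNR estimate, in dB.

    Assumption: `signal` is a short, roughly stationary voiced segment.
    The pitch period is searched between 1/f0_max and 1/f0_min seconds.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove any DC offset
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_r = 0.0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[: n - lag], x[lag:]  # the segment and its lag-shifted copy
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0.0:
            best_r = max(best_r, num / den)  # normalized correlation at this lag

    if best_r <= 0.0:
        return float("-inf")  # no periodic component detected
    best_r = min(best_r, 1.0 - 1e-6)  # cap so a perfectly periodic tone stays finite
    # r is read as the harmonic share of the energy, 1 - r as the noise share
    return 10.0 * math.log10(best_r / (1.0 - best_r))


def synthetic_call(f0, sample_rate, duration, noise_sd, seed=0):
    """Hypothetical test signal: a unit-amplitude sine at f0 plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(sample_rate * duration)
    return [math.sin(2.0 * math.pi * f0 * i / sample_rate) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


if __name__ == "__main__":
    clean = synthetic_call(500.0, 16000, 0.1, noise_sd=0.0)
    noisy = synthetic_call(500.0, 16000, 0.1, noise_sd=0.5)
    print(f"clean: {autocorrelation_hnr(clean, 16000):.1f} dB")
    print(f"noisy: {autocorrelation_hnr(noisy, 16000):.1f} dB")
```

In this sketch a pure tone reaches the cap of roughly 60 dB, added noise lowers the estimate to a few dB, and white noise gives a negative value. The autocorrelation approach was chosen only because it is compact and well known; it is not the method of Chapter 4.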

Zusammenfassung (German abstract)

What influence does a disease of the sound-generating structures have on the vocal output? How can a change in voice be described? These questions were the central theme of the investigations. The work was carried out exclusively on the sender's side, considering the acoustic signal and the mechanism by which it is generated. First, nonlinear phenomena were examined in three case studies. Nonlinear phenomena are acoustic events attributable to a particular vibration behaviour of the vocal folds. In all three cases, nonlinear phenomena occurred most frequently in the diseased animal. In a further study, the harmonic-to-noise ratio was applied to dog barks; this acoustic parameter had not previously been used in bioacoustics. Normal-sounding dogs appear to occupy a middle HNR range, whereas dogs with dysphonia fall outside this range. In addition, the anatomy of the vocal folds and the vocal tract was investigated in order to understand particular aspects of laryngeal voice production.



Table of Contents

Front page: Vocal changes in animals during disorders
Acknowledgements
Attached Audiofiles
1 Introduction
2 Fundamentals of sound production and sound analysis
2.1 General aspects of laryngeal anatomy
2.2 Physiology of the laryngeal sound production
2.3 Laryngeal pathology and vocal changes in animals
2.4 Signal analysis
3 Nonlinear phenomena - common components of mammalian vocalization or indicators for disorders: three case studies
3.1 The Japanese macaque infant
3.1.1 Material and Methods
3.1.2 Results
3.1.3 Discussion
3.2 The domestic cat infant
3.2.1 Case history and course of the disease
3.2.2 Acoustic analysis
3.2.3 Results
3.2.4 Discussion
3.3 The dog-wolf hybrid
3.3.1 Material and Methods
3.3.2 Results
3.3.3 Discussion
4 The harmonic-to-noise-ratio applied to dog barks
4.1 Introduction
4.2 Material and Methods
4.3 Results
4.4 Discussion
5 Vocal tract length and acoustics of vocalization in the domestic dog
5.1 Formant frequencies and vocal tract length
5.2 Materials and Methods
5.2.1 Subjects
5.2.2 Anatomical Measures
5.2.3 Acoustical Measurements
5.3 Results
5.4 Discussion
5.5 Conclusion
6 Summary
7 Zusammenfassung
Bibliography
Declaration

Table of Tables

Table 3.1: Distribution of subharmonics, biphonations and frequency jumps in the vocalizations of ten individuals. The relative frequency (rel. fr. [%]) of all three phenomena was calculated from 200 calls per individual.
Table 3.2: Means and standard deviations of 10 parameters extracted from the calls: f0 A - fundamental frequency at the beginning of the call; f0 E - fundamental frequency at the end of the call; f0 max - maximal fundamental frequency; t max - time from the beginning of the call to the point of maximal fundamental frequency; t gesamt - total call duration; 1st quart, 2nd quart, 3rd quart - points of the first, second and third energy quartiles; f1/f0 - ratio of the relative amplitudes of the second harmonic and the fundamental frequency; f-peak - frequency of the highest peak in the spectrum; Hz - hertz; ms - milliseconds; N - number of calls investigated. The animal was discharged from the clinic on the 8th day and was recorded acoustically on two days thereafter.
Table 3.3: Number of calls containing nonlinear phenomena. SH - calls containing subharmonics; BP - calls containing biphonation; CH - calls containing deterministic chaos; SH+CH - calls containing subharmonics and deterministic chaos; N - number of calls investigated.
Table 3.4: Total number and percentage of calls containing nonlinear phenomena, and relative duration of nonlinear phenomena, in five animals. The calls come from different choruses.
Table 4.1: Framed pairs in the matrix mark the individuals whose auditory impression was evaluated by 5 subjects. The numbers in the framed boxes read as 'the animal in the first column sounds more hoarse than the corresponding animal in the first row'. The subjects' evaluation agrees with the HNR measure above the diagonal, but not below it. HNR means are given in the second row; n or c in the third row indicates whether the individual belongs to the normal or the clinic sample.
Table 4.2: HNR means and standard deviations of the twenty individuals considered in this study. m - male, w - female.
Table 5.1: Raw morphological and acoustic data for 47 individual dogs. VTL - vocal tract length; Df - formant dispersion; CV - coefficient of variation; F1, F2 - first and second formant
Table 5.2: Basic descriptive data for the acoustic and anatomical variables of the dogs used in the study (n - sample size; d - standard deviation; S.E. - standard error of the mean; 'min' and 'max' - minimum and maximum values, respectively; VTL - vocal tract length; Df - formant dispersion; F1 - lowest formant frequency).
Table 5.3: Pearson correlation coefficients between the acoustic and anatomical variables measured in this study. *** - significant at P<0.001; ** - significant at P<0.01; * - significant at P<0.05; ns - not significant (P>0.05). (F1 - lowest formant frequency; VTL - vocal tract length; Df - formant dispersion)

Table of Figures

Figure 2.1: Horizontal section of a canid larynx.
Figure 3.1a: Time series and spectrograms of two single calls, each containing a frequency jump (FJ). The FJ appears as a sudden change in the fundamental frequency and is indicated by the arrows.
Figure 3.1b: Time series, spectrogram, averaged power spectrum and a zoomed segment of the time series of a call containing subharmonics (SH), indicated by the arrow in the spectrogram. The SH appear as parallel lines between the overtones ('harmonics') of the fundamental frequency. The power spectrum gives further information: the energy peaks of the SH lie at a fixed distance (indicated by the numbers in the power spectrum) from the overtones of the fundamental frequency (f0 = 1.0 kHz); here this distance is about 0.5 to 0.55 kHz. The zoomed subharmonic segment of the time series also shows a characteristic pattern.
Figure 3.1c: The biphonic call shows two characteristic features. First, some elements contain lines between the overtones of the original fundamental frequency that are not parallel; second, the spacing of these lines is not necessarily related to the original f0. The distance between the parallel lines (here: 0.45 kHz) represents the f0 of the second pitch, which appears in the spectrogram only through its overtones, because the energy of its f0 line is too low to be visible. The zoomed segment of the time series shows a characteristic pattern.
Figure 3.2: Spectrographic representation of the nonlinear phenomena (subharmonics, chaos, biphonation) in the calls of the cat. Harmonic windows appear within chaotic segments.
Figure 3.3a: Three calls from the third day in the clinic. The calls show a harmonic structure; the middle call starts with a noisy (chaotic) segment. A few calls from the fourth day show nonlinear phenomena. The time series is shown above each spectrogram to give an impression of the amplitude modulation.
Figure 3.3b: Three calls from the seventh day in the clinic. These calls show harmonic segments with a clear fundamental frequency and further harmonics, but a high proportion of calls also contained segments with nonlinear phenomena. The left call begins with a chaotic segment that passes into a short subharmonic regime and ends with a harmonic structure. The middle call shows biphonation. The right call contains chaotic and subharmonic segments as well as harmonic windows.
Figure 3.4: Time series and spectrogram of a chorus. The upper graphs show the first 20 s, the middle graphs the next 20 s, and the lower graphs the final 20 s of the chorus. Arrows mark the calls uttered by 'Schaka' over their total duration.
Figure 3.5: Time series and spectrograms of a 10 s excerpt of a chorus containing calls uttered by 'Schaka'. The first call (between arrows 1 and 2) was harmonic (f0 at the beginning of the call 390 Hz, at the end 410 Hz, maximum 470 Hz; call duration 3.7 s) and was overlaid by the call of another animal with a very similar fundamental frequency. Schaka's second call started harmonically with increasing fundamental frequency (arrow 3), then changed abruptly to a nonperiodic element that begins with chaos and continues into biphonic structures. The biphonation ended suddenly (arrow 4) and passed into a harmonic element whose fundamental frequency decreased from 430 Hz to 380 Hz (arrow 5). This element was overlaid by the howl of another animal with increasing fundamental frequency. Note that the time series shows a high amplitude during the chaotic and biphonic episode.
Figure 3.6: The call starts (arrow 1) with a chaotic element that passes into a harmonic part with side bands 65 Hz apart. This part changes abruptly (arrow 2) to a harmonic part with a fundamental frequency of 430 Hz, which decreases to 390 Hz (arrow 3). The total call duration is 6.4 s.
Figure 3.7: Two calls of 'Schaka' from the middle of a howling ceremony, both overlaid by calls of other animals. Both calls start with a chaotic element (arrows 1 and 4, respectively), followed by a harmonic window and a second, much longer chaotic element (ending at arrows 2 and 5, respectively). Both calls end with a harmonic element whose fundamental frequency decreases from 410 Hz to 390 Hz and from 430 Hz to 380 Hz, respectively.
Figure 3.8: Time series and spectrograms of two vocalizations. The left one is from a 24-year-old woman who was asked to imitate Schaka's vocalization, shown on the right. The woman is able to produce biphonation intentionally.
Figure 3.9: Time series, spectrogram and spectrum of the biphonic call from Figure 2. The spectrum represents only a short term segment (50 ms) around 0.9 s in the spectrogram. Indications of vocal tract resonances are found with LPC analysis around 550 Hz and 1800 Hz.
Figure 3.10: Horizontal sections of the middle part of the vocal fold of Schaka (with vocal lip) and of a normal sounding animal (without vocal lip). Compare to the section of the larynx in figure 2.1.
Figure 4.1: Schematic drawing of an original spectrum curve with harmonic peaks (thin line) and the 10-point-moving-average curve of the same spectrum (thick line).
Figure 4.2: Flow chart of the HNR calculation procedure.
Figure 4.3: Spectrograms, time series and spectra of three barks with different HNR values, illustrating that harmonic peaks are more prominent in calls with higher HNR values.
Figure 4.4: HNR mean and standard deviation of twenty individuals considered in this study.
Figure 4.5: The left diagram shows how 50 calls per individual are distributed into three HNR groups according to their HNR values. The middle diagram shows the expected distribution if the discrimination assigned 100% of the calls correctly. The right diagram shows the result of the discriminant analysis, i.e. the assignment of calls to the three groups.
Figure 4.6: Assignment of calls to three HNR groups. In the first diagram, 50 calls per individual were used to calculate the discriminant function, and the same 50 calls were then assigned to the groups. The second and third diagrams show the results when the sample was divided into a training set and a test set. Circle size corresponds to the number of calls.
See equation 1
See equation 2
See equation 3
See equation 4
Figure 5.1: Schematic drawing of the anatomical and morphometric features used in this study, as observed by radiography. Line 1 represents vocal tract length (VTL) and line 2 skull length; both were measured on digitized images of the radiographs in NIH Image, with reference to a 1 cm calibration square.
Figure 5.2: Relationship between (A) vocal tract length, VTL and body mass (m in kg), (B) VTL and formant dispersion, Df and (C) formant dispersion and body mass. The lines represent the linear regression lines, with the equations and r² values given in each case.
Figure 5.3: Waterfall representation of the LPC curves of a growl, showing the formant distribution over approximately one second. This 3-D spectrogram display shows the Fourier transform spectra of several time slices; successive time slices overlap by 75%.
Figure 5.4: Consistency of formant frequencies measured across different growls for each of four individual dogs. Each growl is represented by a set of formant frequencies; for instance, the first growl of dog 1 has four formants at approximately 500 Hz, 1800 Hz, 3000 Hz and 4200 Hz (dog 1: Dachshund; dog 2: Rottweiler; dog 3: Irish Setter; dog 4: mixed breed, mass 10 kg).
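Chapter 5 relates formant dispersion (Df) to vocal tract length (VTL). A common way to connect the two, assumed here rather than taken from the thesis, treats the vocal tract as a uniform tube closed at the glottis and open at the mouth: its resonances are then evenly spaced by c/(2·VTL), so VTL ≈ c/(2·Df), with Df taken as the average spacing (F_N - F_1)/(N - 1) between the lowest and highest measured formants. The speed of sound c = 350 m/s and the function names are assumptions; the formant values are the approximate ones given in the Figure 5.4 caption for the first growl of dog 1 (a Dachshund). Real vocal tracts are not uniform tubes, so this is only a rough sketch; the relationships reported in the thesis come from its own measurements and regressions (Figure 5.2).

```python
def formant_dispersion(formants_hz):
    """Average spacing between adjacent formants: (F_N - F_1) / (N - 1), in Hz."""
    f = sorted(formants_hz)
    if len(f) < 2:
        raise ValueError("need at least two formant frequencies")
    return (f[-1] - f[0]) / (len(f) - 1)


def vtl_from_dispersion(df_hz, speed_of_sound=350.0):
    """Vocal tract length in metres for a uniform tube closed at one end.

    Assumption: resonances of such a tube are spaced by c / (2 * L),
    so L = c / (2 * Df). The speed of sound c (m/s) is also an assumption.
    """
    return speed_of_sound / (2.0 * df_hz)


if __name__ == "__main__":
    # Approximate formants of the first growl of dog 1 (Figure 5.4 caption)
    formants = [500.0, 1800.0, 3000.0, 4200.0]
    df = formant_dispersion(formants)
    vtl_cm = 100.0 * vtl_from_dispersion(df)
    print(f"Df = {df:.0f} Hz, VTL = {vtl_cm:.1f} cm")  # prints: Df = 1233 Hz, VTL = 14.2 cm
```

Under these assumptions the estimate scales linearly with c: using 343 m/s (dry air at 20 °C) instead of 350 m/s would shorten the estimated VTL by about 2%.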


© The compilation and presentation of the content of this publication, as well as its electronic processing, are protected by copyright. Any use not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, adaptation, and storage and processing in electronic systems.

DiML DTD Version 2.0
Certified document server
of the Humboldt-Universität zu Berlin
HTML version created on:
Thu Sep 28 14:01:51 2000