

Perception of Overtone Singing

Chen-Gia Tsai

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0. In normal voices the formant bandwidths are so large that the formants contribute only to the perception of timbre. In overtone-singing voices, the sharp formant Fk can also contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0. In the autocorrelation functions the height of the peak at 1/9f0, which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk is audible once the strongest harmonic is larger than the adjacent harmonics by 10 dB.
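The bandwidth dependence can be illustrated with a small numerical sketch. This is not the pitch model used for Fig. 1. It assumes a steady vowel made of harmonics of f0 whose powers follow a Lorentzian resonance curve centred on 9f0, and it uses the fact that the normalized autocorrelation of such a signal is the power-weighted sum of the cosines of its harmonics. The function names, the Lorentzian shape, the number of harmonics and the three bandwidths are illustrative assumptions.

```python
import math

F0 = 150.0          # fundamental frequency in Hz (from the text)
N_FORMANT = 9       # the formant sits on the 9th harmonic (from the text)
N_HARMONICS = 30    # number of harmonics kept (assumption)


def harmonic_powers(bandwidth_hz):
    """Relative power of each harmonic under a Lorentzian formant (assumed shape)."""
    centre = N_FORMANT * F0
    powers = []
    for k in range(1, N_HARMONICS + 1):
        detune = (k * F0 - centre) / (bandwidth_hz / 2.0)
        powers.append(1.0 / (1.0 + detune * detune))
    return powers


def pitch_strength(bandwidth_hz):
    """Normalized autocorrelation of the vowel at the lag 1/(9*f0)."""
    lag = 1.0 / (N_FORMANT * F0)
    powers = harmonic_powers(bandwidth_hz)
    weighted = sum(p * math.cos(2.0 * math.pi * k * F0 * lag)
                   for k, p in enumerate(powers, start=1))
    return weighted / sum(powers)


def peak_to_neighbour_db(bandwidth_hz):
    """Level of the 9th harmonic relative to the 10th (equal to the 8th here)."""
    powers = harmonic_powers(bandwidth_hz)
    return 10.0 * math.log10(powers[N_FORMANT - 1] / powers[N_FORMANT])


for bw in (450.0, 150.0, 50.0):
    print(f"bandwidth {bw:5.0f} Hz: strength {pitch_strength(bw):.2f}, "
          f"peak exceeds neighbours by {peak_to_neighbour_db(bw):.1f} dB")
```

Under these assumptions the autocorrelation peak at 1/9f0 grows as the bandwidth shrinks, and a 10 dB margin over the adjacent harmonics corresponds to a bandwidth of 100 Hz.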

Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding f0 and listeners become indifferent to it. Conversely, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, by which the components radiated by the same object should be grouped and perceived as a single entity. Stream segregation occurs in the quasi-periodic voices of overtone-singing through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing sounds so extraordinary when we first hear it.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.


http://www.soundtransformations.b

CHEN-GIA TSAI : Perception of Overtone Singing, TAIWAN

CHEN-GIA TSAI : Kargyraa and meditation, TAIWAN


Kargyraa and meditation : Chen-Gia Tsai

Pipe model of a Kargyraa singer’s vocal tract

The melody pitch f1 (the centre frequency of the first formant) in Kargyraa voices is determined by the mouth opening. A perturbation method predicts the resonance shift caused by a bore enlargement at a position x0 of a pipe with an irregular geometry (e.g., Fletcher & Rossing 1991). During a performance of Kargyraa, the bore diameter of the vocal tract changes at the lips, a pressure node for all modes. Hence, an enlargement of the mouth opening leads to an increase in the centre frequencies of the first and second formants (Tsai 2001).
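The direction of this shift can be checked with a first-order perturbation estimate. The sketch below is a simplification, not the calculation of the cited works: it assumes a uniform tube closed at the glottis (x = 0) and open at the lips (x = L), for which a small fractional area change dS/S shifts the nth resonance by df/f = -(1/L) ∫ (dS/S) cos(2 k_n x) dx, with k_n = (2n-1)π/(2L). The tube length, the size of the enlargement and the function name are illustrative assumptions.

```python
import math


def relative_shift(mode, length, x_start, x_end, area_change, steps=2000):
    """First-order relative resonance shift df/f of a closed-open tube.

    mode        : 1 for the first resonance, 2 for the second, ...
    length      : tube length in metres (closed at x = 0, open at x = length)
    x_start/end : section whose cross-section is changed, in metres
    area_change : fractional area change dS/S in that section (positive = wider)
    """
    k = (2 * mode - 1) * math.pi / (2.0 * length)   # wavenumber of the mode
    dx = (x_end - x_start) / steps
    integral = 0.0
    for i in range(steps):                           # midpoint rule
        x = x_start + (i + 0.5) * dx
        integral += area_change * math.cos(2.0 * k * x) * dx
    return -integral / length


L = 0.17  # vocal tract length in metres (assumed typical adult value)
for mode in (1, 2):
    at_lips = relative_shift(mode, L, L - 0.01, L, 0.2)
    print(f"mode {mode}: 20% wider over the last 1 cm -> df/f = {at_lips:+.4f}")
```

In this idealized tube a widening at the lips raises both the first and the second resonance, whereas the same widening at the glottal end would lower the first resonance.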


Figure: (a) Spectrogram of a Kargyraa song “the far side of a dry riverbed”; (b) and (c) are two snapshot spectra of (a). They show f2=2f1.

This pipe model does not predict (1) the small bandwidth of the first and second formants, and (2) “mode-locking” f2=2f1. I hypothesize that periodic vorticity bursts at the diffuser-like supraglottal structures are responsible for producing the strong components at f1 and 2f1.

Subharmonic generation

In Kargyraa, there is a nonlinear coupling between the two pairs of the vocal folds, which can lead to either entrainments or chaos. While 1:2 entrainment can produce beautiful voices of Kargyraa, pathological voices with the involvement of chaotic vibration of the ventricular folds have a hoarse quality (ventricular dysphonia).

Based on recordings of high-speed images of the laryngeal movement, Lindestad and colleagues (2001) reported that during Kargyraa singing the ventricular folds vibrated with complete but short closures at half the frequency of the true vocal folds, thus contributing to subharmonic generation.
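A toy signal shows why a closure at half the true-fold frequency produces subharmonics. The sketch below is not a model of the larynx: it assumes a smooth pulse train at F0 for the true vocal folds and a gate that briefly attenuates the flow once every two true-fold periods for the ventricular folds. The sampling rate, F0, gate depth and gate duration are all illustrative assumptions.

```python
import cmath
import math

FS = 8000      # sampling rate in Hz (assumption)
F0 = 200.0     # true vocal fold frequency in Hz (assumption)
N = 800        # 0.1 s of signal: a whole number of periods of F0 and F0/2


def glottal_pulses(n):
    """Toy true-fold source: one smooth pulse per period of F0."""
    s = math.sin(2.0 * math.pi * F0 * n / FS)
    return s * s if s > 0.0 else 0.0


def ventricular_gate(n, depth=0.6):
    """Toy ventricular folds: a short closure once per TWO true-fold periods."""
    period = int(2 * FS / F0)             # 80 samples = one period of F0/2
    closed = (n % period) < period // 4   # closure lasts a quarter of that period
    return 1.0 - depth if closed else 1.0


def magnitude_at(signal, freq):
    """Magnitude of the discrete Fourier component of the signal at freq (Hz)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(signal))
    return abs(acc) / len(signal)


plain = [glottal_pulses(n) for n in range(N)]
gated = [glottal_pulses(n) * ventricular_gate(n) for n in range(N)]

for name, sig in (("true folds only", plain), ("with ventricular gate", gated)):
    print(f"{name}: |X(F0/2)| = {magnitude_at(sig, F0 / 2):.4f}, "
          f"|X(F0)| = {magnitude_at(sig, F0):.4f}")
```

Without the gate the component at F0/2 vanishes. With the gate every second pulse is weakened, the waveform repeats only every two true-fold periods, and a component at F0/2 appears.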

Autonomic functions

It seems that stiffness of the ventricular folds cannot be manipulated by will, because they contain very few muscle fibres. However, the constantly increased ventricular function and repetitive closure may lead to new functional and anatomical changes in the interior of the larynx (such as ventricular hypertrophy) and, possibly, to a new system of innervation.

On the other hand, evidence of psycho-emotional, cerebellar or midbrain (e.g., Parkinsonism) types of ventricular dysphonia suggests sub-cortical influences on the ventricular folds.

It is interesting to note that Tibetan monks do not practice their vocalization. They improve the control of the ventricular folds through meditation! Meditation is a conscious mental process that induces a set of integrated physiologic changes termed the relaxation response. The elastic property of the ventricular folds may be affected by meditation through autonomic functions. They become so relaxed that they vibrate with complete closures at half the frequency of the true vocal folds. In contrast, emotional stress can lead to adduction and vibration of the stiff ventricular folds with incomplete closures. Because lower subharmonics are weak in such melancholic voices, they sound rough.

Tibetan monks stated repeatedly that while singing overtones one should always make a special effort to attune heart and mind to the meaning of the holy moment (Smith and Stevens 1967).

An overtone singer and researcher related the psychological mechanism underlying overtone singing during meditation to “a higher sound awareness”: When we meditate by way of singing the need to make pleasant or even beautiful sounds moves to the background. It is not the singing that decides whether we enter a truly meditative state of mind. More important is that we listen to ourselves, that we search for the voice inside. We are not concerned with personal judgments about our voice or with the personality in our voice. Singing harmonics automatically focuses the mind more than most other types of singing, because we essentially sing just one tone and listen to its internal dynamics. Overtones demand from us a higher than normal sound awareness. They fulfil a service in certain spiritual traditions and have a built-in symbolic association with ‘things high’. They have the exceptional ability to unite voices to the highest degree and a tendency to unify the body and the mind. (van Tongeren 2002:207)

It is my hypothesis that overtone singing focuses the mind automatically on the weak pitch of the prominent nth harmonic. This form of meditation is designed to lead one to a subjective experience of absorption with the object of focus. From the viewpoint of neuroscience it seems appropriate that a model for this kind of meditation begins with activation of the prefrontal cortex and the cingulate gyrus. Brain imaging studies have suggested that tasks requiring sustained attention are initiated via activity in the prefrontal cortex, particularly in the right hemisphere, and the cingulate gyrus appears to be involved in focusing attention. In an excellent review paper on the neural basis of meditation, Newberg and Iversen (2003) proposed a neurophysiological network possibly underlying meditative states. They discussed the prefrontal cortex effects on thalamic activation, posterior superior parietal lobule deafferentation, hippocampal and amygdalar activation, hypothalamic and autonomic nervous system changes, autonomic-cortical activity, and neurotransmitter activity. Although their model may provide a general framework for studying the neural basis of meditation, it should be noted that there are categories and subcategories of meditation that may be associated with different neural activity. For example, overtone singing by Tibetan monks belongs to the meditation category in which the subjects focus their attention on a particular object. When the object is the melody composed of overtones, the mental task and thus neural activity may differ from the meditation technique that focuses the mind on an image, phrase, or word, because of the involvement of supraglottal structures.

Nitric oxide mechanisms


Nonadrenergic, noncholinergic (NANC) nerves, which cause relaxation of airway smooth muscle, have been described in several species including man. Nitric oxide appears to account for all the NANC response in human central and peripheral airways in vitro. A recent review on meditation stressed the importance of the involvement of nitric oxide during meditation (Esch et al. 2004, see also Kim et al. 2005). Based on these findings I propose a model for Tibetan overtone chanting:

The loop underlying Tibetan overtone chanting can be described as: (1) a monk adducts and relaxes the ventricular folds; (2) he sings overtones; (3) he focuses his mind on the weak pitch of reinforced overtones; (4) this concentration triggers autonomic functions and nitric oxide mechanisms that in turn lead to a relaxation of the smooth muscles in the supraglottal structures.

References

Andersson K, et al. (1998) Etiology and treatment of psychogenic voice disorders: results of a follow-up study of thirty patients. J Voice 12: 96-106.

Doersten PG, Izdebski K, Ross JC, Cruz RM. (1992). Ventricular dysphonia: a profile of 40 cases. Laryngoscope 102: 1296-1301.

D’Antonio L, et al. (1987) Perceptual-physiologic approach to evaluation and treatment of dysphonia. Ann Otol Rhinol Laryngol 96: 187-190.

Esch T, Guarna M, Bianchi E, Zhu W, Stefano GB. (2004) Commonalities in the central nervous system’s involvement with complementary medical therapies: limbic morphinergic processes. Med Sci Monit. 10(6):MS6-17.

Hisa Y, Koike S, Tadaki N, Bamba H, Shogaki K, Uno T. (1999) Neurotransmitters and neuromodulators involved in laryngeal innervation. Ann Otol Rhinol Laryngol Suppl. 178:3-14.

Kim DH, Moon YS, Kim HS, Jung JS, Park HM, Suh HW, Kim YH, Song DK. (2005) Effect of Zen Meditation on serum nitric oxide activity and lipid peroxidation. Prog Neuropsychopharmacol Biol Psychiatry 29(2):327-31.

Lazar SW, Bush G, Gollub RL, Fricchione GL, Khalsa G, Benson H. (2000) Functional brain mapping of the relaxation response and meditation. Neuroreport 11(7):1581-5.

Newberg AB, Iversen J. (2003) The neural basis of the complex mental task of meditation: neurotransmitter and neurochemical considerations. Med Hypotheses 61(2):282-91.

van Tongeren, M. (2002) Overtone singing – physics and metaphysics of harmonics in East and West. The Netherlands: Fusica, Amsterdam.

Yuceturk AV, Yilmaz H, Egrilmez M, and Karaca S. (2003) Voice analysis and videolaryngostroboscopy in patients with Parkinson’s disease. Eur Arch Otorhinolaryngol. 2002 259(6):290-3.

http://soundtransformations.co.uk/KargyraaandmeditationchenGiaTsai.htm

Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao : False vocal fold surface waves during Sygyt singing: A hypothesis, TAIWAN


False vocal fold surface waves during Sygyt singing: A hypothesis

Chen-Gia Tsai, Yio-Wha Shau, and Tzu-Yu Hsiao

Abstract

Overtone singing is a vocal technique found in Central Asian cultures, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0. The pitch of nF0 arises from a very sharp formant. Current physical modelling of overtone singing asserts that the harmonic at nF0 is emphasized by a resonance of the vocal tract. However, this approach could not explain the extraordinarily small bandwidth of this formant.

This paper offers a hypothesis that surface waves (Rayleigh waves) of the false vocal folds might actively amplify the harmonic at nF0 in a specific technique of overtone singing: Sygyt. We propose a loop for harmonic amplification, which is composed of (1) the vocal tract with resonance nF0, (2) surface waves of the false vocal folds, and (3) a varicose jet separating from the false folds. This model receives indirect support from an experimental study on a novel human vocalization, which is characterized by a prominent component at 4 kHz. During this pure tonal vocalization, false fold surface vibrations were detected by ultrasound colour Doppler imaging. High-frequency false fold surface waves may also occur during Sygyt singing.

1. Introduction

Overtone singing (or throat singing, biphonic singing) is a vocal technique found in Central Asian cultures such as Tuva and Mongolia, by which one singer produces a high pitch of nF0 along with a low drone pitch of F0 (F0 is the fundamental frequency, n = 6, 7, …13 in typical performances). The voice of overtone singing is characterized by a sharp formant centered at nF0, as can be seen in Figs. 1 and 2. Traditional techniques of overtone singing include Khoomei, Sygyt, Kargyraa and others.

There are two approaches to the physical modelling of overtone singing: (1) the double-source theory [1], which asserts the existence of a second sound source that is responsible for the melody pitch; and (2) the resonance theory, which asserts that a harmonic is emphasized by an extreme resonance of the vocal tract. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone was regarded as robust support for the resonance theory [2].
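The restriction of the melody to the harmonic series can be made concrete with a short calculation. The sketch below lists the melody pitches nF0 for n = 6 to 13 and their intervals above the drone in cents. The drone frequency of 150 Hz and the function name are illustrative assumptions.

```python
import math


def melody_pitches(f0_hz, harmonics=range(6, 14)):
    """Return (n, frequency in Hz, interval above the drone in cents) per harmonic."""
    return [(n, n * f0_hz, 1200.0 * math.log2(n)) for n in harmonics]


for n, freq, cents in melody_pitches(150.0):
    print(f"n = {n:2d}: {freq:6.1f} Hz, {cents:7.1f} cents above the drone")
```

Because the steps between neighbouring harmonics shrink as n grows, the available melody notes form a fixed, unequally spaced scale tied to the drone.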

Recent attempts at physical modelling of Sygyt were concerned with calculating the transfer function of the vocal tract using one-dimensional models, successfully predicting the formant frequency [2,3]. From a theoretical standpoint, however, this approach may not be suitable for a tract with a rapidly flaring bell section. A Sygyt singer raises the tongue so that the tract shape changes abruptly at the narrowing of the tongue (marked with a red dot in Fig. 1b), where the assumption of planar wave fronts breaks down, and evanescent cross-modes can be excited in this flaring section even at low frequencies [4]. This may lead to errors in transfer function calculations using one-dimensional models. An alternative approach using Matched Asymptotic Expansions for modelling a Sygyt singer’s vocal tract was proposed in [5].

In a two-resonator theory, a Sygyt singer’s vocal tract was modelled as a coupled system of a longitudinal resonator extending from the glottis to the narrowing of the tongue, and a Helmholtz resonator extending from the articulation by the tongue to the mouth exit. Experiments showed that for some Sygyt voices with a sharp formant the two resonances were matched, while a melody pitch could be perceived even when the resonances were not exactly matched [6]. Although the formant magnitude was shown to be increased by resonance matching [3], it is unclear whether resonance matching reduces the formant bandwidth.
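For orientation, the resonance of the front (Helmholtz) cavity can be estimated with the textbook formula f = (c / 2π) · sqrt(A / (V · l)), where A and l are the area and effective length of the opening and V is the cavity volume. The sketch below only evaluates this formula. The dimensions are hypothetical values chosen for illustration, not measurements from [6], and end corrections are ignored.

```python
import math

SOUND_SPEED = 343.0  # m/s in air at about 20 degrees C (assumption)


def helmholtz_frequency(neck_area_m2, cavity_volume_m3, neck_length_m):
    """Resonance frequency in Hz of an ideal Helmholtz resonator."""
    return (SOUND_SPEED / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m))


# Hypothetical front cavity: 10 cm^3 volume, 1 cm long lip opening.
for area_cm2 in (0.25, 0.5, 1.0):
    f = helmholtz_frequency(area_cm2 * 1e-4, 10e-6, 0.01)
    print(f"lip opening {area_cm2:.2f} cm^2 -> about {f:.0f} Hz")
```

With these made-up dimensions the resonance lies in the range of the reinforced harmonics and rises as the lip opening widens, which is the kind of tuning that resonance matching would require.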

From a psychoacoustic point of view, a small bandwidth of the prominent formant is critical to a clear melody in Sygyt singing. A preliminary study using an autocorrelation model for pitch extraction suggested that the pitch strength of nF0 increased along with the Q value of this formant, with the formant magnitude playing a secondary role [5]. The spectrum of the Sygyt voice shown in Fig. 1a has the 12th harmonic approximately 15 dB stronger than its flanking components. If the amplification of this harmonic cannot be explained in terms of vocal tract impedance, it should be attributed to the source signal.
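A rough estimate shows how sharp a passive resonance would have to be to produce such a spectrum on its own. The sketch below assumes a flat source spectrum and a single second-order resonance centred exactly on the nth harmonic, approximated near its peak by a Lorentzian, so that the neighbouring harmonics are attenuated by 10·log10(1 + (2Q/n)²) dB. Both assumptions and the function names are illustrative; the calculation is not taken from [5].

```python
import math


def level_difference_db(q, n):
    """Level of the harmonic at the formant centre minus that of its neighbours (dB)."""
    return 10.0 * math.log10(1.0 + (2.0 * q / n) ** 2)


def required_q(difference_db, n):
    """Q needed for the centre harmonic to exceed its neighbours by difference_db."""
    return 0.5 * n * math.sqrt(10.0 ** (difference_db / 10.0) - 1.0)


q = required_q(15.0, 12)
print(f"Q of about {q:.0f}, i.e. a bandwidth of about 1/{q:.0f} of the formant frequency")
```

Under these assumptions a 15 dB margin at the 12th harmonic calls for a Q of roughly 33.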

The insufficiency of the resonance theory is even more notable in another technique of overtone singing: Kargyraa. A Kargyraa singer uses his false vocal folds to produce a low-pitched drone, manipulating his mouth opening to change the vocal tract resonance. Spectra in Fig. 2 show that the centre frequencies of the first and second formants of Kargyraa voices always stand in the ratio of 1:2. This strange phenomenon suggests an unknown glottal source that produces the outstanding component at F1, and its second harmonic.

The goal of this study is to offer a physical model based on a nonlinear loop that explains the harmonic amplification in Sygyt. This model asserts that surface waves (Rayleigh waves) of the adducted false vocal folds can actively amplify a harmonic. We first discuss the interactions between the false vocal fold surface waves (FVFSWs), the glottal flow and acoustic waves. A preliminary experiment that provided indirect evidence of this model is then addressed.

2. Theory

2.1. Rayleigh surface waves

The Rayleigh surface wave is a specific superposition of a transverse wave and a longitudinal wave of an elastic solid (see, e.g. [7]). Its amplitude is significant only near the surface and attenuates exponentially with depth. The trajectories of material particles are ellipses. At the surface the normal displacement is about 1.5 times the tangential displacement. The velocity of Rayleigh waves, independent of the wavelength, is about 0.9 times the transverse wave velocity. Rayleigh’s theory of surface waves has been generalized to viscoelastic solids (see, e.g. [8]).

The assumption of Rayleigh surface wave on the false vocal folds is supported, although indirectly, by recent measurements of the medial surface dynamics of the vocal folds [9]. The trajectories of fleshpoints were approximately ellipses, with the length ratio of the two axes varying in the range of 1.5-2.0. This value is in remarkable agreement with Rayleigh’s theory of surface waves.
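The figures quoted for Rayleigh waves can be reproduced for an idealized medium. The sketch below assumes a homogeneous, isotropic, purely elastic half-space, which the false folds are not, and solves the classical Rayleigh equation xi^3 - 8 xi^2 + (24 - 16 eta) xi - 16 (1 - eta) = 0 for xi = (c_R/c_T)^2, where eta = (c_T/c_L)^2 follows from Poisson's ratio. The surface axis ratio is then (2 - xi) / (2 sqrt(1 - xi)). The Poisson's ratios used are illustrative; a value near 0.5 is a common assumption for nearly incompressible soft tissue.

```python
import math


def rayleigh_xi(poisson_ratio, tol=1e-12):
    """Solve the Rayleigh equation for xi = (c_R / c_T)**2 by bisection."""
    # eta = (c_T / c_L)**2 expressed through Poisson's ratio
    eta = (1.0 - 2.0 * poisson_ratio) / (2.0 * (1.0 - poisson_ratio))

    def f(xi):
        return xi**3 - 8.0 * xi**2 + (24.0 - 16.0 * eta) * xi - 16.0 * (1.0 - eta)

    lo, hi = 0.0, 1.0            # f(lo) < 0 < f(hi) for 0 <= nu <= 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def speed_ratio(poisson_ratio):
    """Rayleigh wave speed divided by transverse wave speed."""
    return math.sqrt(rayleigh_xi(poisson_ratio))


def axis_ratio(poisson_ratio):
    """Normal over tangential displacement amplitude at the free surface."""
    xi = rayleigh_xi(poisson_ratio)
    return (2.0 - xi) / (2.0 * math.sqrt(1.0 - xi))


for nu in (0.25, 0.4, 0.5):
    print(f"nu = {nu:.2f}: c_R/c_T = {speed_ratio(nu):.3f}, "
          f"axis ratio = {axis_ratio(nu):.2f}")
```

For Poisson's ratios between 0.25 and 0.5 this idealized calculation gives speed ratios of about 0.92 to 0.96 and axis ratios of about 1.5 to 1.8, comparable to the values quoted in the text.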

2.2. Physical modelling of Sygyt

Here we propose a physical model that describes how FVFSWs absorb the energy of the glottal flow and acoustic waves.


The false folds are significantly adducted during Sygyt singing. Hence, the volume flow through them (UF) is sensitive to FVFSWs. FVFSWs are supposed to be triggered by the acoustic pressure, which is dominated by the resonance of the vocal tract at nF0. So we assume an FVFSW with the frequency of nF0.

Based on the assumption of elliptic movements of fleshpoints on the false folds, snapshots of this wave can be obtained. The ellipses in Figs. 3b and 3c represent the trajectory of fleshpoints. We consider the energy exchange between the flow and the tissue at a single point. In Fig. 3b the work done by the viscous flow at this point is positive. In Fig. 3c the flow separates upstream, performing no work (or positive work, if back-flow appears) at this point. It can easily be seen that over a period the FVFSW absorbs energy from the flow in the vicinity of the flow separation point, which moves back and forth at a crest of the FVFSW, modulating the flow through the false folds at a frequency of nF0. This induces varicose oscillations of UF, which produce the harmonic at nF0 in the source signal. This harmonic is in turn reinforced by the strong vocal tract resonance at nF0.

The net work done by the sinusoidal acoustic wave with frequency nF0 at a point on the false fold over a period can be positive or negative, depending on the phase relationship between the FVFSW and the acoustic pressure. We suppose that within a half wavelength of the FVFSW in the vicinity of the flow separation point, the FVFSW absorbs the acoustic energy of the harmonic at nF0. Away from this flow separation point, the FVFSW is expected to decay rapidly because of large viscous losses in the tissue during high frequency vibrations. We thus conclude that the total work done by the acoustic wave on the FVFSW is positive.
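The phase dependence of the net work can be illustrated with a one-point calculation. The sketch below integrates the product of a sinusoidal pressure and a sinusoidal surface velocity over one period; analytically the result is (P·V·T/2)·cos(phase). The amplitudes, the frequency of 1800 Hz and the function name are illustrative assumptions, and the sign convention (work counted positive when pressure and velocity point the same way) is chosen for the example.

```python
import math


def net_work_per_cycle(phase, pressure_amp=1.0, velocity_amp=1.0,
                       frequency=1800.0, steps=1000):
    """Work per unit area done by p(t) on a surface moving with v(t) over one period."""
    period = 1.0 / frequency
    dt = period / steps
    work = 0.0
    for i in range(steps):                     # midpoint rule over one period
        t = (i + 0.5) * dt
        p = pressure_amp * math.cos(2.0 * math.pi * frequency * t)
        v = velocity_amp * math.cos(2.0 * math.pi * frequency * t - phase)
        work += p * v * dt
    return work


for degrees in (0, 45, 90, 135, 180):
    w = net_work_per_cycle(math.radians(degrees))
    print(f"phase lag {degrees:3d} deg: net work per cycle = {w:+.2e}")
```

The work is positive while the surface velocity lags the pressure by less than a quarter period, vanishes at exactly a quarter period, and turns negative beyond it.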

To sum up, a loop for Sygyt is established in terms of (1) linear resonator: the vocal tract with resonance at nF0, (2) energy source: pressure difference across the false glottis, and (3) nonlinear amplifier: a flow separating from curved walls with mucosal layers receiving acoustic feedback. This self-sustained oscillator differs from the true vocal folds in that the false fold mucosa does not vibrate at any intrinsic resonance, but rather responds to the acoustic pressure.

2.3. Discussion

The present model explains the crucial role of the adduction of the false folds in Sygyt technique. Because of this adduction the flow velocity over their mucosal layers is high enough to supply the energy for sustaining FVFSWs. It is interesting to note that FVFSWs have been observed in patients suffering from ventricular dysphonia [10], although their frequencies appeared to be much lower than those during Sygyt singing.

From an empirical standpoint, learning Sygyt is much more difficult than the resonance theory implies. In workshops on overtone singing, it has been repeatedly observed that only very few people are able to produce voices with a clear melody pitch. The present model predicts that one cannot sing Sygyt well, even when shaping the tract perfectly, if the false folds are not correctly adducted or their mucosal layers do not have the proper shape, thickness, and viscoelastic properties.

The loop described in our model tends to “unify” the double-source theory and the resonance theory of overtone singing. Whereas the true vocal folds and the vocal tract are, as usual, viewed as the independent source and filter, the false fold mucosa plays a key role in introducing acoustic feedback into the loop for harmonic amplification.

The present model for Sygyt might also shed new light on the production of high-frequency, whistle-like voice type of birds, dolphins, whales, and groaning dogs. In this regard, our model is an updated version of the double-source theory [1], which already drew parallels between the sounding mechanisms of overtone singing and the whistle-like voice type, which is produced with the false folds adducted.

It is interesting to compare the harmonic-amplification loop with the sounding mechanism of flute-type instruments, which is based on a loop composed of a vibrating jet and acoustic waves filtered by a resonator. In the case of flutes the jet separates from the musician’s lips, travelling along the mouth of the resonator towards a sharp edge. When the instrument produces a tone, the jet oscillates at one of the resonances of the pipe. The acoustic flow field near the flow separation point excites sinuous oscillations of the jet. At the sharp edge, the jet is directed alternately toward the inside and the outside of the resonator. This pulsing injection induces an equivalent pressure difference across the mouth that excites and maintains acoustic waves in the pipe [11]. The jet, like the false fold mucosa, does not vibrate at any intrinsic resonance. It should be noted that the acoustic flow induces sinuous oscillations of the jet at the mouth hole of a flute, whereas the acoustic pressure excites FVFSWs that induce varicose oscillations of the glottal flow.

While a varicose jet is essential for whistle-like sound production, the role of wall vibration is not fully understood. It has been suggested that the sounding mechanism of human whistling is a loop composed of the jet and the oral cavity with a prominent resonance. The pressure fluctuations due to the acoustic wave at the flow separation point could induce varicose oscillations of the jet without any wall vibration. This model is in an interesting contrast to our model of Sygyt, which assumes vibrations of the compliant walls. To examine the assumption of FVFSWs in our model of Sygyt, we measure surface vibrations during whistle-like singing in vivo.

3. Experimental Study

3.1. Whistle-like voice type


The present model of “varicose jet oscillations induced by surface waves of curved walls in the vicinity of the flow separation point” may provide insight into the production of the whistle-like voice type in birds and mammals. It has been suggested that the production mechanism of bird whistled song might be related to a retraction of the syringeal membranes while in oscillation so that they no longer completely close, leading to a great reduction in the harmonic content of the flow. An alternative explanation of whistled song is that it is produced by pure aerodynamic means without any vibrating surfaces [12]. However, recent experimental studies favour the sounding mechanism of vibrating surface [13,14].

After some practice, humans can imitate a dog’s groaning to produce high-frequency whistle-like voices, which have a prominent component at approximately 4 kHz, as shown in Fig. 4c. We hypothesize that the mechanism underlying this vocalization is a varicose jet induced by FVFSWs.

Medical ultrasound (US) provides an ideal non-invasive method for observing high-frequency surface vibrations with small amplitude, because the vibratory artefact of colour Doppler imaging (CDI) detects surface velocity rather than displacement. In previous studies, CDI was used to measure the frequency and the length of the vocal folds during normal phonation [15,16]. In the present experiment we employ this technique to detect FVFSWs during whistle-like singing.

3.2. Methods

A commercially available, high-resolution US scanner (HDI-5000, ATL, Bothell, WA) with a 5- to 12-MHz linear-array transducer (L12-5 38 mm, ATL) was used in this study. The frame rate in B-mode was about 25 Hz. In the colour mode, the pulse-repetition rate was 10,000 Hz and the measuring velocity range was set at 0 to 128.3 cm/s with baseline offset, which resulted in a frame rate of about 7 Hz. The US scan head was placed horizontally at the midline of the thyroid cartilage lamina on one side (Fig. 4a). The subject was the first author of this paper, a healthy man aged 33 with normal vocal function. For this experiment he had practiced the whistle-like vocalization for a week.
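Two back-of-envelope checks help to read these settings. The sketch below first converts a vibration amplitude into a peak surface velocity, v = 2π·f·A, which is the quantity Doppler imaging responds to. It then applies the standard pulsed-Doppler relation, in which the unaliased velocity span equals c·PRF/(2·f_tx) for motion along the beam, to see which transmit frequency would match the stated span. The sound speed of 1540 m/s, the 10 µm displacement and the function names are assumptions, and the actual transmit frequency is not given in the text.

```python
import math

SOUND_SPEED = 1540.0   # m/s, conventional soft-tissue value (assumption)


def peak_surface_velocity(frequency_hz, displacement_m):
    """Peak velocity of a sinusoidal vibration: v = 2*pi*f*A."""
    return 2.0 * math.pi * frequency_hz * displacement_m


def implied_doppler_frequency(prf_hz, velocity_span_m_s):
    """Transmit frequency for which the unaliased span equals c*PRF/(2*f_tx)."""
    return SOUND_SPEED * prf_hz / (2.0 * velocity_span_m_s)


v = peak_surface_velocity(4000.0, 10e-6)         # hypothetical 10 micrometre vibration
f_tx = implied_doppler_frequency(10000.0, 1.283)  # settings quoted in the text
print(f"peak velocity at 4 kHz: {100 * v:.1f} cm/s")
print(f"implied Doppler transmit frequency: {f_tx / 1e6:.1f} MHz")
```

Under these assumptions a vibration of only 10 µm at 4 kHz already moves at about 25 cm/s, well inside the 0 to 128.3 cm/s range, and the stated span is consistent with a transmit frequency near 6 MHz, which lies within the transducer band.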

3.3. Results

CDI colour artefacts detected surface vibrations of the right false vocal fold during pure tonal singing (Fig. 4d). During warming up of this vocalization, surface vibrations of the right vocal fold and the false fold were observed (Fig. 4b).

The frequency of pure tonal singing was found to range from 3.7 kHz to 4.6 kHz. Outside this range the voice loses its pure tonal characteristic, with breathy noises accumulating at the prominent resonance.

4. Concluding Remarks

The observation of false fold surface vibrations during pure tonal singing provides indirect support of our model for Sygyt. As FVFSWs may generate 4 kHz pure tonal voices with the second harmonic 30 dB (or more) weaker than the fundamental, it should be possible that a Sygyt singer amplifies a selected harmonic of the voice produced by the true vocal folds through FVFSWs.

The role of acoustic feedback in FVFSW generation is not fully understood. When the acoustic wave filtered by the resonator is strong enough to trigger FVFSWs, a loop for pure tonal vocalization may be established. If not, periodic FVFSWs may not occur. The laryngeal ventricle may be the Helmholtz resonator that is responsible for the prominent resonance at 3.7-4.6 kHz. However, this “resonance” model appears to conflict with experimental results on pure tonal vocalization in birds [13,14]. If the frequency of surface waves is not determined by the tract resonance, it should be determined by the tissue curvature, elastic properties, and the flow speed. In the case of Sygyt singing, however, it has not been reported that a singer manipulates the false folds to change the melody pitch. Further research is needed to compare the sounding mechanisms of Sygyt singing and the pure tonal vocalization.

One implication of our surface wave model is that the vertical motion of fleshpoints on the true/false vocal folds may be critical to their self-sustained oscillation. The two-mass and three-mass models of the vocal folds [17,18] do not take into account the ellipse-like motion of vocal fold fleshpoints, which is consistent with Rayleigh’s theory of surface waves and has been demonstrated in excised canine larynx experiments [9]. We suggest that the vertical motion of fleshpoints near the flow separation point can absorb the kinetic energy of the glottal flow through viscous shear force.

The effect of surface viscous shear stress exerted by a flow also plays a central role in the system of a pair of fluttering flags in wind. This system shows some notable similarities to the glottis. When the inter-flag distance lies in a definite range the flags flutter in an out-of-phase state and generate a pulsating flow, with striking similarities to the vocal fold vibration in the chest register. Flow visualizations showed significant shear stress on the flags exerted by the flow [19]. This finding suggests that viscous shear stress on the vocal fold mucosa should not be ignored, especially in vocalizations with a large open quotient.

Besides the viscosity effect, the surface shear stress may be attributed to the carrying-along of the varicose flow. It was observed in a pair of flags that the flag wave propagates along with the flow, while the wave of an isolated flag propagates in the direction opposite to the flow. Note that the surface shear stress dominates the system of a pair of flags but not an isolated flag [19]. It is likely that the surface shear stress is due to the effect that a varicose or sinuous flow carries along the flag wave. This approach may shed new light on the mechanism of the self-sustained oscillation of the vocal folds.

5. References

[1] Chernov, B.; and Maslov, V. 1987. Larynx double sound generator. Proc. XI Congress of Phonetic Sciences, Tallinn 6, 40-43.

[2] Adachi, S.; and Yamada, M. 1999. An acoustical study of sound production in biphonic singing, Xöömij. J. Acoust. Soc. Am. 105(5), 2920-2932.

[3] Kob, M. 2002. Physical modeling of the singing voice. PhD thesis, Aachen University (RWTH).

[4] Pagneux, V.; Amir, N.; and Kergomard, J. 1996. A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation. J. Acoust. Soc. Am. 100, 2034-2048.

[5] Tsai, C.G. 2004. Physics and perception of overtone singing. URL: http://jia.yogimont.net/overtonesinging/

[6] Kob, M.; and Neuschaefer-Rube, C. 2004. Acoustic properties of the vocal tract resonances during Sygyt singing. Proc. of the International Symposium on Musical Acoustics, Nara, Japan.

[7] Achenbach, J.D. 1984. Wave propagation in elastic solids. Elsevier, New York.

[8] Romeo, M. 2001. Rayleigh waves on a viscoelastic solid half-space. J. Acoust. Soc. Am. 110 (1), 59-67.

[9] Berry, D.A.; Montequin, D.W.; and Tayama, N. 2001. High-speed digital imaging of the medial surface of the vocal folds. J. Acoust. Soc. Am. 110(5), 2539-2547.

[10] Nasri, S.; Jasleen, J.; Gerratt, B.R.; Sercarz, J.A.; Wenokur, R.; and Berke, G.S. 1996. Ventricular dysphonia: a case of false vocal fold mucosal travelling wave. Am. J. Otolaryngol. 17(6), 427-431.

[11] Verge, M.P.; Caussé, R.; Fabre, B.; Hirschberg, A.; Wijnands, A.P.J.; and van Steenbergen, A. 1994. Jet oscillations and jet drive in recorder-like instruments. Acustica 2, 403-419.

[12] Gaunt, A.S.; Gaunt, S.L.L.; and Casey, R.M. 1982. Syringeal mechanics reassessed: evidence from Streptopelia. Auk 99, 474-494.

[13] Brittan-Powell, E.F.; Dooling, R.F.; Larsen, O.N.; and Heaton, J.T. 1997. Mechanisms of vocal production in budgerigars (Melopsittacus undulatus). J. Acoust. Soc. Am. 101, 578-589.

[14] Ballintijn, M.R.; and Cate, C.T. 1998. Sound production in the collared dove: a test of the ‘whistle’ hypothesis. J. Experimental Biology 201, 1637-1649.

[15] Shau, Y.W.; Wang, C.L.; Hsieh, F.J.; and Hsiao, T.Y. 2001. Noninvasive assessment of vocal fold mucosal wave velocity using color Doppler imaging. Ultrasound Med. Biol. 27, 1451-1460.

[16] Hsiao, T.Y.; Wang, C.L.; Chen, C.N.; Hsieh, F.J.; and Shau, Y.W. 2002. Elasticity of human vocal folds measured in vivo using color Doppler imaging. Ultrasound Med. Biol. 28, 1145-1152.

[17] Ishizaka, K.; and Flanagan, J.L. 1972. Synthesis of voiced sounds from a two-mass model of the vocal cords. Bell Syst. Tech. J. 51(6), 1233-1268.

[18] Story, B.H.; and Titze, I.R. 1995. Voice simulation with a body cover model of the vocal folds. J. Acoust. Soc. Am.97, 1249-1260.

[19] Zhang, J.; Childress, S.; Libchaber, A.; and Shelley, M. 2000. Flexible filaments in a flowing soap film as a model for one-dimensional flags in a two-dimensional wind. Nature 408, 835-839.

http://www.soundtransformations.co.uk/FalsevocalfoldsurfacewavesduringSygytsinging.htm

Physical Modelling of the vocal tract of a Sygyt singer

Chen-Gia Tsai

Source theory vs. Resonance theory

Two types of overtone-singing should be distinguished: Sygyt and Kargyraa. In Sygyt performances, the rising tongue divides the vocal tract into two cavities, which are connected by a narrow channel, whereas the tongue does not rise in Kargyraa performances.

Up until now, two major theories have been proposed on the production of the melody pitch: (1) The ‘double-source’ theory (Chernov & Maslov 1987), which asserts the existence of a second sound source such as a whistle-like mechanism formed by the narrowing of the false vocal folds (ventricular folds) in addition to the true vocal fold vibration; and (2) the ‘resonance’ theory, which asserts that only a glottal sound source exists, but that an upper harmonic is so emphasized by an extreme resonance of the vocal tract that it is segregated from the other components and heard as another pitch. The fact that the melody pitches producible by the singer are limited to the harmonic series of the drone supports the resonance theory (Adachi & Yamada 1999).

Physical modelling of the resonance of the vocal tract of Sygyt singers includes: (1) rear cavity theory, (2) front cavity theory, and (3) resonance-matching theory. The glottal sound source of Sygyt voices is rich in harmonics. This has been attributed to the short open duration of the glottis (Bloothooft et al. 1992, Adachi & Yamada 1999).

Rear cavity theory

Based on vocal tract shape measurements by MRI, Adachi and Yamada (1999) reported that the resonance of the rear cavity, that is, from the glottis to the narrowing of the tongue, produced the sharp formant Fk. The resonance of the front cavity, that is, from the articulation by the tongue to the mouth exit, was not critical to the production of the melody pitch. The length of the rear cavity decreases as fk increases.

Adachi and Yamada (1999) synthesized tones from transfer functions calculated with and without the front cavity, finding that the front cavity did not affect the formant frequencies, although the magnitude of Fk decreased due to the lack of the front cavity resonance. It is important to note that Adachi and Yamada calculated the transfer functions of a Sygyt singer’s vocal tract using a one-dimensional model, in which the tract shape was approximated as a succession of cones. While such models are widely used in speech research, I argue that the change in the tract shape at the articulation point is so abrupt that the assumption of planar-wave fronts clearly breaks down. Theoretically, one-dimensional models are unsuitable for a Sygyt singer’s vocal tract.

In practice, the rear cavity theory is not supported by a non-traditional technique of overtone-singing used by Tran Quang Hai, who calls it the ‘one-cavity technique’ because the tongue does not rise to divide the vocal tract into two cavities. There is, however, an articulation point at the soft palate, as when pronouncing the velar /ng/. The melody of fk is produced by manipulating the opening of the front cavity, while the rear cavity, that is, from the glottis to the soft palate, remains unchanged. This technique suggests that the front cavity may be more important for the production of fk.

Front cavity theory

Based on preliminary impedance measurements of the vocal tract using a Jew’s harp, Tsai (2001) reported that the resonance of the front cavity determined fk. The author modelled the front cavity as a Helmholtz resonator driven by a flow source U1 at the articulation point. The transfer function can be calculated according to Eq. (6.65) in [Fletcher & Rossing 1991].
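
As a rough illustration of the Helmholtz-resonator picture, the sketch below evaluates the textbook lumped-element resonance frequency f = (c / 2π) · sqrt(S / (V · L)). It is not Eq. (6.65) of Fletcher & Rossing, which gives the full transfer function. The cavity volume, opening area, effective neck length and speed of sound are hypothetical values, chosen only so that the result falls in the range of typical melody pitches.

```python
import math

def helmholtz_frequency(volume_m3, opening_area_m2, neck_length_m, c=350.0):
    """Resonance frequency (Hz) of an ideal Helmholtz resonator.

    c is the speed of sound in warm, humid air (assumed 350 m/s).
    neck_length_m is the effective neck length, end corrections included.
    """
    return (c / (2.0 * math.pi)) * math.sqrt(
        opening_area_m2 / (volume_m3 * neck_length_m)
    )

# Hypothetical front cavity: 10 cm^3 volume, 1 cm effective neck length.
VOLUME = 10e-6       # m^3
NECK_LENGTH = 0.01   # m

# Narrowing the mouth opening lowers the resonance; widening it raises it.
for area_cm2 in (0.5, 1.0, 2.0):
    f = helmholtz_frequency(VOLUME, area_cm2 * 1e-4, NECK_LENGTH)
    print(f"opening {area_cm2:.1f} cm^2 -> resonance {f:.0f} Hz")
```

In this idealization the neck length is held fixed; in a real mouth the end correction changes with the opening, so the numbers only indicate the direction of the effect.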

Owing to the tract shape at the articulation point, the flow U1 is presumed to be incompressible. It is known that in regions of fast change in pipe geometry, such as a tone hole or the pipe termination, the Helmholtz number He<<1 implies that the wave equation can locally be approximated by the Laplace equation, which describes an incompressible potential flow (Hirschberg & Kergomard 1995). In overtone-singing, the acoustic flow at the articulation point is therefore incompressible (compact region). This is not true for normal phonations.
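
The compactness condition can be checked with a one-line estimate. The sketch below takes the Helmholtz number as He = 2πfL/c (some authors omit the factor 2π); the constriction length of 5 mm and the speed of sound of 350 m/s are assumed values for illustration, not measurements from the source.

```python
import math

def helmholtz_number(frequency_hz, length_m, c=350.0):
    """He = 2*pi*f*L/c: size of a region relative to the acoustic wavelength."""
    return 2.0 * math.pi * frequency_hz * length_m / c

# Hypothetical values: melody pitch 9 * 150 Hz, constriction about 5 mm long.
he = helmholtz_number(9 * 150.0, 0.005)
print(f"He = {he:.2f}")
```

A value well below 1 is what justifies treating the flow through the constriction as locally incompressible.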

The front cavity theory fails to explain the small bandwidth of Fk. Fig. 2 compares the matched theoretical spectral envelopes and recorded spectra of a Sygyt voice and a Jew’s harp tone, both of which I produced with the same front cavity. It can be seen that the Fk bandwidth of the voice is smaller than that of the Jew’s harp tone. The latter was produced without the rear cavity because the rising tongue completely closed the channel between the front and the rear cavities. This discrepancy suggests that the rear cavity may play a role in sharpening Fk.


Figure 2: Spectra of a Sygyt voice (left) and a Jew’s harp tone (right) produced with the same front cavity.

Resonance-matching theory

The resonance-matching theory takes into account the contributions of both the front and the rear cavities, whose resonances are more or less matched to produce a sharp Fk. Kob (2002) reported that an improvement of the second resonance by about 15 dB was achieved by matching two resonance frequencies, which was fulfilled by manipulating the mouth opening. Although this theory appears to ‘unify’ the rear-cavity and front-cavity theories, it should be noted that according to Table 6.1 in [Kob 2002], the resonance of the front cavity was merely close to the second resonance of the rear cavity; Fk could be sharp enough for pitch production without an exact resonance-matching.

Discussion

Kob (2002) calculated the transfer functions of a Sygyt singer’s vocal tract using an improved method of continuous-time interpolated multiconvolution (Barjau et al. 1999), which was originally developed to calculate the impulse response of wind instruments with tone-hole discontinuities. However, this approach does not predict the flow field at the articulation point. Fig. 3 displays the shape of a Sygyt singer’s vocal tract and the potential field at the articulation point. As can be seen from the isobar (equal-potential) lines, the acoustic flow has a higher velocity near the tongue. This contradicts the assumption of planar-wave fronts in Kob’s calculation.


Figure 3: Shape of a Sygyt singer’s vocal tract (left) and the isobar lines at the articulation point (right).

The limitations of one-dimensional models of the vocal tract or the bore of wind instruments should be borne in mind: even at low frequencies evanescent cross-modes will be excited in the rapidly flaring bell section because of strong mode coupling (e.g., Pagneux et al. 1996). In a Sygyt singer’s vocal tract, one-dimensional models are suitable only for the rear cavity.

The vocal tract should be divided into four regions, in which the wave equations take different approximate forms. In light of Matched Asymptotic Expansions, the global solution can be obtained by ‘gluing’ the local solutions together (Hirschberg & Kergomard 1995). The four regions are (1) the rear cavity, (2) the compact region at the articulation point, (3) the front cavity as a Helmholtz resonator, and (4) the compact region at the mouth opening. The rear cavity is approximated as a succession of cones, where the acoustic field is governed by the Webster equation for He<<1. At the articulation point and at the mouth opening, the incompressible air is approximated as a piston. The front cavity is a Helmholtz resonator with a short neck.

If the transfer function of a Sygyt singer’s vocal tract does not predict the small bandwidth of the second formant, one should consider the possible effect of acoustic feedback to the glottal source (Levin and Edgerton 1999). This may be related to the nonlinear effect of the adducted ventricular folds.

CHEN-GIA TSAI : Physical Modelling of the vocal tract of a Sygyt singer

CHEN-GIA TSAI : Perception of Overtone Singing , TAIWAN

Perception of Overtone Singing : Chen-Gia Tsai

Pitch strength

Voices of overtone-singing differ from normal voices in having a sharp formant Fk (k denotes Khöömei), which elicits the melody pitch fk = nf0. For normal voices, the bandwidths of formants are always so large that the formants merely contribute to the perception of timbre. For overtone-singing voices, the sharp formant Fk can contribute to the perception of pitch.

A pitch model based on autocorrelation analysis predicts that the strength of fk increases as the bandwidth of Fk decreases. Fig. 1 compares the spectra and autocorrelation functions of three synthesized single-formant vowels with the same fundamental frequency f0 = 150 Hz and formant frequency 9f0. In the autocorrelation functions, the height of the peak at lag 1/(9f0), which represents the pitch strength of 9f0, increases as the formant bandwidth decreases. Fig. 1 suggests that the pitch of fk is audible once the strongest harmonic exceeds the adjacent harmonics by 10 dB.
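
The trend can be reproduced with a minimal sketch. It is a simplification, not the synthesis or pitch model behind Fig. 1: it assumes a flat source spectrum shaped by a single second-order resonance, and it uses the plain signal autocorrelation, which for a sum of harmonics with powers p_n is r(τ) = Σ p_n cos(2π n f0 τ) / Σ p_n. The bandwidth values and the function names are chosen here for illustration.

```python
import math

def harmonic_powers(f0, formant_hz, bandwidth_hz, n_harmonics=30):
    """Power of each harmonic after a second-order resonance (flat source assumed)."""
    powers = []
    for n in range(1, n_harmonics + 1):
        f = n * f0
        denom = (formant_hz**2 - f**2) ** 2 + (bandwidth_hz * f) ** 2
        powers.append(formant_hz**4 / denom)
    return powers

def pitch_strength(f0, formant_hz, bandwidth_hz):
    """Normalized autocorrelation of the harmonic complex at lag 1/formant_hz."""
    powers = harmonic_powers(f0, formant_hz, bandwidth_hz)
    tau = 1.0 / formant_hz
    num = sum(p * math.cos(2.0 * math.pi * (i + 1) * f0 * tau)
              for i, p in enumerate(powers))
    return num / sum(powers)

def peak_prominence_db(f0, formant_hz, bandwidth_hz):
    """Level of the strongest harmonic above its stronger neighbour, in dB."""
    powers = harmonic_powers(f0, formant_hz, bandwidth_hz)
    k = max(range(len(powers)), key=powers.__getitem__)
    neighbours = [powers[i] for i in (k - 1, k + 1) if 0 <= i < len(powers)]
    return 10.0 * math.log10(powers[k] / max(neighbours))

F0 = 150.0
for bw in (400.0, 100.0, 25.0):
    s = pitch_strength(F0, 9 * F0, bw)
    d = peak_prominence_db(F0, 9 * F0, bw)
    print(f"bandwidth {bw:5.0f} Hz: strength {s:.2f}, prominence {d:.1f} dB")
```

Under these assumptions the autocorrelation peak at lag 1/(9f0) grows toward 1 as the bandwidth shrinks, while the prominence of the 9th harmonic over its neighbours rises. The absolute numbers depend on the assumed source spectrum and should not be read as the values plotted in Fig. 1.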



Figure 1: Spectra (left) and autocorrelation functions (right) of three single-formant vowels.

Stream segregation

Besides the bandwidth of Fk, the musical context also plays a role in the perception of fk. During a performance of overtone-singing, the low pitch of f0 is always held constant. When fk moves up and down, the pitch sensation of f0 may be suppressed by the preceding f0, and listeners become indifferent to it. In contrast, if f0 and fk change simultaneously, listeners tend to hear the pitch contour of f0, while the stream of fk may be more difficult to trace.

The multi-pitch effect in overtone-singing highlights a limitation of auditory scene analysis, by which the components radiated by the same object should be grouped and perceived as a single entity. Stream segregation occurs in the quasi-periodic voices of overtone-singing through the segregation/grouping mechanism based on pitch. This may explain why overtone-singing sounds so extraordinary on first hearing.

Perception of rapid fluctuations

Tuvans employ a range of vocalizations to imitate natural sounds. Such singing voices (e.g., Ezengileer and Borbannadir) are characterized by rapid spectral fluctuations, evoking the sensation of rhythm, timbre vibrato or trill.

CHEN-GIA TSAI : articles on Overtone Singing , TAIWAN

TSAI Chen-Gia : Overtone Singing

Overtone Singing
Chen-Gia Tsai
* Perception of overtone singing
* Physical modeling of the vocal tract of a Sygyt singer
* False vocal fold surface waves during Sygyt singing: A hypothesis
* Kargyraa and meditation

The voice of overtone singing is characterized by a prominent formant. In this spectrum of a sound produced by a Taiwanese overtone singer, the 10th harmonic is stronger than its flanking components by more than 25 dB. It is not fully understood how the formant becomes so sharp.
Introduction
Overtone singing, also known as throat singing, is a vocal technique found in Central Asian cultures, by which one singer produces two pitches simultaneously. When listening to the performance, a high pitch of n*f0 can be perceived along with a low drone pitch of f0.

References
Adachi, S., and Yamada, M. (1999). An acoustical study of sound production in biphonic singing, Xoomij. J. Acoust. Soc. Am. 105(5), 2920-2932.

Bloothooft, G., Bringmann, E., Capellen, M., Luipen, J., Thomassen, K. (1992). Acoustics and perception of overtone singing. J. Acoust. Soc. Am. 92(4), 1827-1836.

Chernov, B. and Maslov, V. (1987). Larynx double sound generator. Proc. XI Congress of Phonetic Sciences, Tallinn 6, 40-43.

Fletcher, N.H., and Rossing, T.D. (1991). The Physics of Musical Instruments. Springer-Verlag.

Hirschberg, A., and Kergomard, J. (1995). Aerodynamics of wind instruments. In: Mechanics of Musical Instruments. Springer-Verlag, 291-369.

Kob, M. (2002). Physical Modeling of Singing Voice. Dissertation, University of Technology Aachen, Logos Berlin.

Levin, T.C., and Edgerton, M.E. (1999). The throat singers of Tuva. Scientific American. Sep-1999, 80-87.

Lindestad, P.A., Sodersten, M., Merker, B., Granqvist, S. (2001). Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice 15(1), 78-85.

MacDonald, A.W., Cohen, J.D., Stenger, V.A., and Carter, C.S. (2000). Dissociating the role of dorsolateral prefrontal cortex and anterior cingulate cortex in cognitive control. Science 288, 1835-1837.

Pagneux, V., Amir, N., and Kergomard, J. (1996). A study of wave propagation in varying cross-section waveguides by modal decomposition. Part I. Theory and validation. J. Acoust. Soc. Am. 100, 2034-2048.

Tsai, C.G. (2001). Physical foundations of overtone-singing. Science Monthly 375, 209-216. [in Chinese]

Links

* http://www.avantart.com/postcards/etuva.html
* http://www.acoustics.org/press/144th/Sakakibara.htm
* http://www.scs-intl.com/cgi-bin/webzonetuva/zone.cgi?list