Singing Synthesis

Singing synthesis calls for different techniques from speech synthesis. The main reason is that in singing about 95% of the time is spent sounding vowels, whereas in speech the figure is nearer 65%. Vowels are the most important part of singing: it is possible to perform music using vowels alone, with consonants adding the final touches. Vowels and consonants can be likened to an orchestra and its percussion section. With just the orchestra, many pieces can be made, but the best will often involve the percussion as well. In contrast, very few pieces are written for unaccompanied percussion (the only one I know of is Stonewave by Wallin, 1990, whereas I know many for orchestra without percussion).

The difficulties in producing accurate singing synthesis are fairly evident. They stem from the huge variety of techniques that have to be simulated: vibrato, glissando, dynamics, breath, the change of 'gear' when moving into a higher or lower vocal range, and so on.
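To give a feel for what simulating two of these involves, here is a minimal Python sketch of a pitch contour that glides from one note to another (glissando) and then holds the target note with vibrato. It is only an illustration, not the method used for the examples on this site: the function name pitch_contour, the 5.5 Hz vibrato rate, the 50 cent depth and the 100 Hz control rate are all assumed values chosen for the example.

```python
import math

def pitch_contour(start_hz, end_hz, glide_s, total_s,
                  vib_rate_hz=5.5, vib_depth_cents=50.0, control_rate=100):
    """Return a list of frequencies in Hz, one per control step.

    All default values are illustrative assumptions, not measured data.
    """
    steps = int(round(total_s * control_rate))
    contour = []
    for i in range(steps):
        t = i / control_rate
        # Glissando: interpolate in log-frequency so the glide covers
        # equal musical intervals in equal times.
        frac = min(t / glide_s, 1.0) if glide_s > 0 else 1.0
        base = start_hz * (end_hz / start_hz) ** frac
        # Vibrato: a slow sinusoidal pitch deviation, measured in cents,
        # applied only once the target note has been reached.
        if t >= glide_s:
            cents = vib_depth_cents * math.sin(
                2.0 * math.pi * vib_rate_hz * (t - glide_s))
        else:
            cents = 0.0
        contour.append(base * 2.0 ** (cents / 1200.0))
    return contour

# Glide up one octave in half a second, then hold for 1.5 seconds.
contour = pitch_contour(220.0, 440.0, glide_s=0.5, total_s=2.0)
```

A contour like this would then drive the fundamental frequency of whatever synthesis method is in use. Dynamics, breath and register changes each need a similar control signal of their own, which is where much of the difficulty comes from.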

There are various techniques for synthesising singing. The one used for the examples on this website is FOF (formant wave-function synthesis), implemented in CSound. CSound is a programming language for producing music in which the sound-generating algorithms are specified at a fairly low level. This means that a lot of code is needed to get even the most basic sounds, but with enough code any sound at all can be produced, and this lack of limits is useful.
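As a rough illustration of the idea behind FOF, the Python sketch below builds a pitched tone with a single formant by starting one short, decaying sine-wave grain per period of the fundamental and adding the overlapping grains together. This is a simplified sketch under stated assumptions, not the CSound code used for the examples on this site: the function names fof_grain and fof_tone, the 16 kHz sample rate, the 3 ms attack and the 20 ms grain length are all hypothetical choices made for the example.

```python
import math

def fof_grain(formant_hz, bandwidth_hz, rise_s, dur_s, sample_rate):
    """One grain: a sine at the formant frequency, with a raised-cosine
    attack and an exponential decay whose rate is set by the bandwidth."""
    alpha = math.pi * bandwidth_hz   # decay rate (wider band = faster decay)
    beta = math.pi / rise_s          # attack shape parameter
    n = int(round(dur_s * sample_rate))
    grain = []
    for i in range(n):
        t = i / sample_rate
        attack = 0.5 * (1.0 - math.cos(beta * t)) if t < rise_s else 1.0
        grain.append(attack * math.exp(-alpha * t)
                     * math.sin(2.0 * math.pi * formant_hz * t))
    return grain

def fof_tone(f0_hz, formant_hz, bandwidth_hz, total_s,
             sample_rate=16000, rise_s=0.003, grain_s=0.02):
    """Overlap-add one grain per fundamental period.

    The default values are assumptions chosen for illustration only.
    """
    out = [0.0] * int(round(total_s * sample_rate))
    grain = fof_grain(formant_hz, bandwidth_hz, rise_s, grain_s, sample_rate)
    period = sample_rate / f0_hz     # samples between grain onsets
    k = 0
    while True:
        start = int(round(k * period))
        if start >= len(out):
            break
        # Add this grain into the output, clipping at the end of the buffer.
        for i, g in enumerate(grain):
            if start + i >= len(out):
                break
            out[start + i] += g
        k += 1
    return out

# A 110 Hz tone with one formant near 650 Hz (values are illustrative).
tone = fof_tone(110.0, 650.0, 80.0, total_s=0.1)
```

The rate at which grains are started sets the pitch, while the sine inside each grain sets where the formant sits. A recognisable vowel needs several such streams, one per formant, summed together, which gives some idea of why even a basic sound takes a lot of code.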

Lyricos is a system that synthesises a singing voice from MIDI input; its developers, Macon and Clements, are now working on Flinger. Lyricos takes a score with phonetic lyrics, along with performance directions such as vibrato, entered using standard MIDI-compatible software. It then performs waveform synthesis and uses a bank of pre-recorded phonemes to produce the final sound, which does sound quite natural, with many aspects of the complexity of singing taken into account.
