
Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics
 


 

 
 

Hungarian Speech Databases


 

Hungarian speech database for creation of voice driven teleservices

 

  Hungarian project coordinator: Klara Vicsi

 
 

This page presents the project to create a Hungarian speech database recorded over fixed-line telephones.

The work is part of the SpeechDat-E project, which extends the Language Engineering project LE-4001 SpeechDat to Eastern European languages. The goal of the project is to collect speech databases over the fixed telephone network in which all official European languages and some major dialectal variants are represented. Such a database can provide a realistic basis both for training and testing present-day teleservices and, because of its phonetic richness, for training truly speaker-independent recognizers.

The database contains recordings that follow the SpeechDat(II) definitions for dialect, age and sex balance and for vocabulary. In planning the corpus, we have to take into account not only dialectal variety but also the special characteristics of the Hungarian language. Since Hungarian is an agglutinative language, some categories need a larger vocabulary than the mandatory minimum. We pay extra attention to the 'phonetically rich sentences and words' category in order to create a phonetically well-balanced speech database for text-independent speech recognizers. A detailed statistical analysis was prepared to examine the distribution of phonemes, diphones, triphones and syllables.
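The kind of counting behind such a statistical analysis can be sketched as follows. This is only a minimal illustration, not the project's actual tooling: the function names and the toy phoneme sequences are hypothetical, and it assumes each utterance is already available as a list of phoneme symbols. Syllable statistics are left out, since they would need a syllabification step.

```python
from collections import Counter


def ngram_counts(phonemes, n):
    """Count contiguous units of n phonemes in one utterance."""
    return Counter(
        tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)
    )


def corpus_stats(utterances):
    """Accumulate phoneme (n=1), diphone (n=2) and triphone (n=3) counts."""
    stats = {1: Counter(), 2: Counter(), 3: Counter()}
    for phonemes in utterances:
        for n in stats:
            stats[n].update(ngram_counts(phonemes, n))
    return stats


# Hypothetical toy transcriptions, not taken from the database.
utterances = [
    ["s", "o", "k"],
    ["o", "k", "o", "s"],
]
stats = corpus_stats(utterances)

# The most frequent diphones give a first view of the corpus's phonetic coverage.
for unit, count in stats[2].most_common(3):
    print("-".join(unit), count)
```

Comparing such counts across candidate sentence sets is one way to judge whether rare diphones and triphones are represented well enough.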

The voices of 1000 speakers have been recorded from all over the country, which provides a balanced distribution of dialects. The Hungarian Railway Company (MÁV Rt.), the MATÁV Rt. telecommunications company and several schools and universities help us recruit the speakers.

The speakers have to read a given text into the phone.

After recording comes the so-called annotation process. We listen to every recording and create label files containing information about the speaker and the speech, according to the database definitions. This is done with special software called A_TOOL, which was written at our laboratory. The source code (Delphi) of A_TOOL is in the public domain; you can download it here (you will need a zip decompression program to unpack it).

For more details, please visit the official website of this project:


http://www.fee.vutbr.cz/SPEECHDAT-E/