Budapest University of Technology and Economics, Department of Telecommunications and Media Informatics



Scientific and technical results


Psychological Status Monitoring by Computerised Analysis of Language phenomena (COALA-Phonetics) EUROPEAN SPACE AGENCY - No. 4000 I 08003/1 3A{L/KML

Examination of the sensitivity of acoustic-phonetic speech parameters to hypoxia and to Seasonal Affective Disorder (SAD), and development of a metric that alerts crews at an early stage of cognitive dysfunction (automatic detection), using the Concordia Antarctic Station as a human exploration analogue.


Speech prosody

In the Laboratory of Speech Acoustics, an HMM-based automatic phonological phrase segmenter has been developed, which is easily adaptable to fixed-stress languages (e.g. Hungarian and Finnish). This tool has already been used for prosodic analysis and segmentation, for word-boundary detection in ASR and, in a modified version, for automatic sentence type detection.

Current research focuses on adapting the segmenter to English, French and German; integrating duration and probabilistic features; linking prosody and syntax in speech; extracting syntactic layering and partial syntactic analysis from speech; and automatically separating syntactically motivated from non-motivated prominence/salience. As the latter is closely related to semantics and pragmatics, we also assess prosody-based extraction of semantic and pragmatic information from speech.

Development of Prosody Teaching Software

Correct speech is necessary in everyday communication. Wrong intonation and rhythm not only impair the quality of speech but can also change its meaning. Successful teaching of pronunciation is therefore very important, especially for learners (here, children) with hearing impairments, for whom learning correct intonation and rhythm is a hard task. The Prosody Teaching Software helps them with a visual display and learning tasks adapted to children. The automatic evaluation of intonation and rhythm is driven by a speech recognition engine and robust pitch calculation methods. The visual and numerical feedback on the results enables the children to practise by themselves. The software is used in everyday practice in the Dr. Török Béla Special Vocational School.
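
As an illustration of what a pitch calculation step can look like, the following is a minimal sketch of an autocorrelation-based pitch estimator for a single speech frame. It is not the method used in the Prosody Teaching Software, which the text does not specify; the function name, the search range (75-500 Hz) and the voicing threshold are assumptions chosen for the example.

```python
import math


def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=500.0,
                   voicing_threshold=0.3):
    """Estimate the fundamental frequency (Hz) of one frame.

    Returns None when the frame is silent or no lag in the search range
    reaches the (assumed) voicing threshold.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                        # silent frame

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)

    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        score = acf / energy
        if score > best_score:
            best_lag, best_score = lag, score

    if best_lag is None or best_score < voicing_threshold:
        return None                        # treated as unvoiced
    return sample_rate / best_lag


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 50 ms at 8 kHz, standing in for a voiced frame.
    rate = 8000
    tone = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(400)]
    print(estimate_pitch(tone, rate))
```

Running the estimator frame by frame gives a pitch contour that can be drawn on screen or compared with a reference contour; a practical system would add smoothing and octave-error correction.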

Clinical Depression in speech

Depression is a state of low mood and aversion to activity that can affect a person's thoughts, behavior, feelings and sense of well-being. Depressed people can feel sad, anxious, empty, hopeless, helpless, worthless, guilty, irritable or restless. They may lose interest in activities that were once pleasurable, experience loss of appetite or overeating, have problems concentrating, remembering details or making decisions, and may contemplate, attempt or commit suicide. Insomnia, excessive sleeping, fatigue, aches, pains, digestive problems or reduced energy may also be present.

Depression is a feature of some psychiatric syndromes such as major depressive disorder but it may also be a normal reaction to life events such as bereavement, a symptom of some bodily ailments or a side effect of some drugs and medical treatments.

A number of psychiatric syndromes feature depressed mood as a main symptom. The mood disorders are a group of disorders considered to be primary disturbances of mood. These include major depressive disorder (MDD; commonly called major depression or clinical depression) where a person has at least two weeks of depressed mood or a loss of interest or pleasure in nearly all activities; and dysthymia, a state of chronic depressed mood, the symptoms of which do not meet the severity of a major depressive episode. Another mood disorder, bipolar disorder, features one or more episodes of abnormally elevated mood, cognition and energy levels, but may also involve one or more episodes of depression. When the course of depressive episodes follows a seasonal pattern, the disorder (major depressive disorder, bipolar disorder, etc.) may be described as a seasonal affective disorder.

The main goal of our research is to determine how depression modifies human speech. Physicians often describe depressed speech as "faded", "slow", "monotonous", "lifeless" and "metallic". Our goal is to identify the acoustic-phonetic parameters, at the segmental and the supra-segmental level separately, that characterize the speech of depressed people.

Development of Automatic Pathological Speech Recognition and Classification System

It is well known that different disorders and malfunctions in human voice production cause detectable changes in the acoustic parameters of speech. The main goal of the research is to identify and define protocols which provide automatic detection of these acoustic parameter changes and differentiate whether they have neurological (functional dysphonia, recurrent paresis, etc.) or morphological (vocal tract, throat or tongue cancer) root causes. Furthermore, we aim to design and develop a Medical Decision Support System that medical doctors and specialists can use to collect data, support diagnosis and provide early detection of vocal tract diseases, in order to improve prevention.


Emotions in speech

A huge effort has been made in the last decade to understand the operation of the verbal channel of speech. Research into the non-verbal channel has been more limited so far, and its operation is still less understood. Besides semantic content, human speech conveys changes in tone, modulation and rhythm, which express the emotional intent, health, mood and speech style of the speaker. Emotion helps to inform us better, even if it is not expressed in words. We examined the expression of emotions and their automatic classification without taking semantic content into account. Statistical analysis of spectral and prosodic acoustic parameters provided a basis for automatic recognition. Automatic classification experiments sorted speech segments into four basic emotion categories with 80% performance. By combining machine learning algorithms with an automatic speech segmentation system, we created an emotion recognition engine that can be used in call centers and in human-machine interaction.

Influence of alcohol on acoustic parameters of speech

The effect of alcohol on human speech has been known for many years. The changes affect both the content and the acoustic parameters of speech. Alcohol acts as a depressant on the central nervous system and thus has a salient effect on speech, namely a degradation of peripheral speech motor control. The effects appear in the physical parameters of speech and can be detected with signal processing methods. In the Laboratory of Speech Acoustics we examine the speech of subjects with different blood-alcohol concentrations. We look for acoustic cues that differ significantly between subjects who have consumed alcohol and those who have not. Automatic classification methods are being developed that can decide, based on a subject's voice, whether the given person has consumed alcohol.

Multilingual speech-controlled telematics framework for personalised automotive services.

NKFP072-TELEAUTO (2008-2010)

The aim of the PACS (people assisted computer system) concept is to make it possible for the users to ask questions in a natural language, and get answers based on their geographical location. The process handles data and voice connection in a unified manner. Conversation begins with the user's request and ends with the downloading of the desired data. The questions are interpreted by a speech recognition framework. Should it fail, or be unable to serve the user's request, a human operator takes over its role.

Ygomi Europe Kft., as the leader of the consortium, is responsible for the elaboration of the whole project. The company takes an active part in the work of international standardisation associations. Ygomi participates in the work of Ertico, Europe's ITS (intelligent traffic systems) organization, which finances research and defines industrial standards in the field of intelligent transportation.
ROC Development Hungary Kft. has many years of experience in developing call centres.
Connexis Kft. has extensive experience in automotive software development and telecommunication, especially in the development of automotive communication protocols.
BME Laboratory of Speech Technology

Child-Computer Interaction Techniques: Exploring the Prosody.

TéT - Balaton project (2008-2009)

One of the primary aims of the project is to improve computer-assisted speech learning methods for handicapped children (hard of hearing and autistic children) by introducing prosody models. These models will first be used to develop the prosody of the speakers. Prosody plays an important role in communication: prosodic cues help listeners to understand the messages in a conversation, so producing the different prosodic cues correctly is very important. Moreover, we will use prosody to determine the emotional state of the speaker, which can be taken into account in the interface.

The Institute of Intelligent Systems and Robotics ISIR CNRS FRE2507 (France) has wide experience in different classification techniques in speech technology. Moreover, the institute is leading several projects involving normal and autistic children.

Cross-Modal Analysis of Verbal and Non-verbal Communication

COST Action 2102 (2006-2011)

The main objective of the Action is to develop an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of identifying the human emotional states. Several key aspects will be considered, such as the integration of the developed algorithms and procedures for application in telecommunication, and for the recognition of emotional states, gestures, speech and facial expressions, in anticipation of the implementation of intelligent avatars and interactive dialogue systems that could be exploited to improve user access to future telecommunication services.

Our Contribution

Partners: countries from Europe

Continuous speech recognition:

-  A development tool (MKBF 1.0) for constructing continuous speech recognizers has been created under Windows XP. The system is based on a statistical approach (HMM phoneme models and bigram language models with non-linear smoothing) and works in real time. The tool can construct a medium-sized speech recognizer with a vocabulary of 1,000-20,000 words. New solutions have been developed for acoustic pre-processing, for statistical model building of phonemes, and at the syntactic level (IKTA, OTKA). (2004- )
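
To make the language-model component concrete, here is a minimal sketch of a bigram model with absolute discounting interpolated with unigram probabilities. Absolute discounting is used only as a stand-in: the text says "non-linear smoothing" without naming the scheme implemented in MKBF 1.0, and the function name, the discount value and the toy sentences are assumptions.

```python
from collections import Counter, defaultdict


def train_bigram(sentences, discount=0.5):
    """Return prob(prev, cur): a smoothed bigram probability function.

    Each seen bigram count is reduced by a fixed `discount`; the freed
    probability mass is redistributed according to unigram frequencies.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(tokens[1:])            # only tokens that get predicted
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    total = sum(unigrams.values())

    def prob(prev, cur):
        p_uni = unigrams[cur] / total
        followers = bigrams.get(prev)
        if not followers:
            return p_uni                       # unseen context: unigram only
        context_count = sum(followers.values())
        discounted = max(followers[cur] - discount, 0.0) / context_count
        backoff_mass = discount * len(followers) / context_count
        return discounted + backoff_mass * p_uni

    return prob


if __name__ == "__main__":
    toy = [["a", "b"], ["a", "c"], ["a", "b"]]  # hypothetical training data
    p = train_bigram(toy)
    print(p("a", "b"), p("a", "c"), p("a", "a"))
```

With this scheme an unseen word pair such as ("a", "a") still receives a non-zero probability, which is what lets a recognizer hypothesise word sequences that never occurred in the training text.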

A prosodic recognizer has been developed. Using this recognizer, a cross-lingual study was carried out for agglutinative, fixed-stress languages such as Hungarian and Finnish, examining the word-level segmentation of continuous speech on the basis of supra-segmental parameters.

We have developed different algorithms based on either a rule-based or a data-driven approach. The best results were obtained by data-driven algorithms (HMM-based methods) using the time series of fundamental frequency and energy together. Word boundaries were marked with acceptable accuracy, even though not all of them could be found. On the basis of this study, a word-level segmenter has been developed which can indicate word boundaries with acceptable precision for both languages.

IKTA-00056/2003 project
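
The decoding step of an HMM-based boundary detector can be sketched as follows. This is only an illustration of Viterbi decoding over two states ("onset" for a stressed word-initial unit, "inside" for the rest of the word). The real system models continuous fundamental frequency and energy time series, whereas this sketch assumes they have already been reduced to two hypothetical symbols, "peak" and "flat", and every probability below is invented for the example.

```python
import math

STATES = ("onset", "inside")

# Hypothetical model parameters, not taken from the project.
START = {"onset": 0.9, "inside": 0.1}
TRANS = {
    "onset": {"onset": 0.2, "inside": 0.8},
    "inside": {"onset": 0.4, "inside": 0.6},
}
EMIT = {
    "onset": {"peak": 0.7, "flat": 0.3},
    "inside": {"peak": 0.1, "flat": 0.9},
}


def viterbi(observations):
    """Return the most likely state sequence for the observation symbols."""
    if not observations:
        return []
    # Log-probabilities avoid underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
               for s in STATES}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES,
                            key=lambda q: prev[q] + math.log(TRANS[q][s]))
            cur[s] = (prev[best_prev] + math.log(TRANS[best_prev][s])
                      + math.log(EMIT[s][obs]))
            ptr[s] = best_prev
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(["peak", "flat", "flat", "peak", "flat"]))
    # -> ['onset', 'inside', 'inside', 'onset', 'inside']
```

Each "onset" in the decoded path marks a hypothesised word boundary; in a fixed-stress language the stressed first syllable makes such onsets comparatively easy to spot from prosody alone.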


Statistical examination of languages:

- Search of optimal unit of CSR system.
- Construction of optimally sized training and testing material on the basis of the structure of the language concerned [19] (1999- )

- Examination of pronunciation variation from hand-labelled corpora

A data-driven method was used to examine pronunciation variation rules.

A systematic statistical analysis was carried out, and the results demonstrate how pronunciation variation depends on the position and the context of the sound. Pronunciation matrices were constructed, presenting the probability of the occurrences. Word-internal and cross-word rules were examined separately and compared.

The examination was based on Hungarian speech databases, but the method is adaptable to other languages. [B3], [B4] (IKTA, OMFB) (2003- )
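
A pronunciation matrix of the kind described above can be sketched as a table of relative frequencies estimated from aligned sound pairs. The input format (pairs of canonical and realised sounds, with "-" marking a deletion) and the example symbols are assumptions; the alignment of the hand-labelled corpus itself is not shown.

```python
from collections import Counter, defaultdict


def pronunciation_matrix(aligned_pairs):
    """Estimate P(realised sound | canonical sound) from aligned pairs.

    `aligned_pairs` is an iterable of (canonical, realised) tuples.
    Returns a nested dict: matrix[canonical][realised] -> probability.
    """
    counts = defaultdict(Counter)
    for canonical, realised in aligned_pairs:
        counts[canonical][realised] += 1

    matrix = {}
    for canonical, row in counts.items():
        total = sum(row.values())
        matrix[canonical] = {realised: n / total
                             for realised, n in row.items()}
    return matrix


if __name__ == "__main__":
    # Hypothetical alignments, e.g. /n/ realised as [m] before a labial.
    pairs = [("n", "n"), ("n", "m"), ("n", "n"), ("n", "n"),
             ("t", "t"), ("t", "-")]
    for canonical, row in pronunciation_matrix(pairs).items():
        print(canonical, row)
```

Building one such matrix from word-internal alignments and another from cross-word alignments would allow the two rule sets to be compared side by side.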

- Rule-based and statistical language model for Hungarian speech recognition (IKTA, OMFB) (2004- )

Database collection:

- Hungarian Reference Speech Database - Collection of read speech in office-like environments for training and testing continuous speech recognition programs running on PCs (IKTA). (2004-2005)

- SpeechDat(E) - European project (1999- ). Speech database collection through telephone lines. A realistic basis both for the training and testing of present-day teleservices and for the training of truly speaker-independent recognizers. [D6], [E6], [E7]
- BABEL - A multilingual speech database collection (1995-99) INCO-COPERNICUS project. Clear, read speech for general speech processing purposes. [B6],[15],[E1][E8],[E9]
- CHILDREN SPEECH data-base collection [D9], (1999- ). Part of the SPECO Copernicus program.

- MTBA, BESZTEL - Speech database collection through telephone lines and mobile telephones. [C3], [C16] (2000-2003)

Speech perception:

- The most relevant acoustical parameters for the perception of different speech sounds were determined.
- The role of forward and backward masking in speech perception was examined.
[1], [2], [3], [4], [5] (1977-1983).

Speech analysis:

- A simplified auditory model (involving the Bark scale, masking and loudness) was developed, and speech sounds were analyzed with this model [7], [8]. (1983-1985)
- The Bark scale representation, FFT (Hamming window, 512 points), and LPC (autocorrelation method, order 24) were compared. (1986-1988)
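
The building blocks named in the list above can be sketched in a few lines. The Hz-to-Bark conversion uses Traunmüller's published approximation, which is an assumption, since the text does not say which Bark formula the auditory model used. The LPC routine is a plain autocorrelation-method implementation with a Hamming window and Levinson-Durbin recursion, with function names chosen for the example.

```python
import math


def hz_to_bark(f_hz):
    """Approximate Bark value of a frequency (Traunmüller's formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
            for i in range(n)]


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Autocorrelation-method LPC of a Hamming-windowed frame.

    Returns (a, err): coefficients of the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, and the residual energy.
    """
    x = [s * w for s, w in zip(frame, hamming(len(frame)))]
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                      # degenerate (e.g. all-zero) frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    print(round(hz_to_bark(1000.0), 2))
    # Deterministic broadband test frame of 512 samples, order 24 as in the text.
    frame = [((i * 7919) % 101) / 50.0 - 1.0 for i in range(512)]
    coeffs, residual = lpc(frame, 24)
    print(len(coeffs), residual > 0.0)
```

Evaluating the LPC filter's frequency response, or an FFT of the same windowed frame, on a Bark-spaced frequency grid is one way the three representations could be put on a common axis for comparison.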

Speech processing for speech handicapped:

- The "Speech Corrector", an audio-visual pronunciation teaching system, has been developed for speech handicapped people. The database of the Corrector System for the Hungarian and German languages is finished [16], [17], [20]. The system makes use of speech recognition results. A demo of the program is available.
- We lead an international European Copernicus program titled SPECO: a multilingual pronunciation teaching and training method and software system for hearing and speech handicapped children. [C17], [D4], [D5], [D7], [D8], [D10] (1986- )

- Interactive hearing and speech perception therapy through the Internet (item 39)

The proposed hearing and speech perception training program will help the education of hearing impaired children through didactically well-constructed, playful exercises using specially constructed databases. Poor hearing abilities are stimulated by different environmental sounds and special speech examples. In this way we develop the ability to perceive and distinguish different sounds, and we also train the speech understanding and combination skills of children. The program will help the rehabilitation of children with cochlear implants, too. The program will be available to everybody through the Internet free of charge. [D1], [E2] (2003)

Speech enhancement:

- Some noise reduction methods were developed for speech recognizers [12], [E18], [E19]. (1988-1990)

Isolated word recognition:

- Speaker-dependent small-vocabulary system (max. 80 words), stand-alone [6], [E16].
- Speaker-dependent medium-vocabulary system (some hundred words) for IBM PCs [9].
- Speaker-independent isolated speech recognizer. It recognizes numbers and short instructions over a telephone line and in different sound fields. (1989- )

Dialogue systems:

- Adaptation of well known systems to the Hungarian language and to the Hungarian habits (1998 -)

Annotation and segmentation:

- Automatic speech segmentation on phonetic, sub-phonetic level of continuous speech, automatic labelling (1997- )
- A language-independent automatic segmentation and labelling technique has been developed for training speech recognizers and for database collection [D11,22]. (1990- )