Voice-recognition system

This page is being edited by Jetske Tamboezer EMBA09. In case of any questions/remarks contact me.

Description:

When considering voice recognition, a distinction should first be made between Speaker Recognition and Speech Recognition. Speaker Recognition means that a digital device (such as a computer, PDA or mobile phone) recognises the person who is speaking to it: it recognises the sound of a person's voice and is able to identify it. Speech Recognition means recognising what is being said, i.e. translating spoken words into written words.

To differentiate further, Speaker Recognition has two subtasks: Speaker Verification (one voice is compared to one stored template of that voice, a 1:1 match) and Speaker Identification (one voice is compared to N stored voice templates, a 1:N search).
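To make the 1:1 versus 1:N distinction concrete, the sketch below compares a voice sample against stored templates using a simple cosine similarity. It assumes each voice has already been reduced to a fixed-length feature vector by some front end; the function names, threshold and vectors are illustrative, not any particular product's API.

```python
# Minimal sketch of verification (1:1) vs identification (1:N), assuming each
# voice has already been reduced to a fixed-length "voice print" vector by
# some feature extractor; the vectors below are made up for illustration.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(sample, template, threshold=0.85):
    """Speaker verification: compare one voice to one stored template."""
    return cosine_similarity(sample, template) >= threshold

def identify(sample, templates):
    """Speaker identification: compare one voice to N stored templates
    and return the best-matching enrolled speaker."""
    scores = {name: cosine_similarity(sample, t) for name, t in templates.items()}
    return max(scores, key=scores.get), scores

# Illustrative voice prints (in practice these come from a feature extractor).
enrolled = {
    "alice": np.array([0.9, 0.1, 0.3]),
    "bob":   np.array([0.2, 0.8, 0.5]),
}
probe = np.array([0.85, 0.15, 0.28])

print(verify(probe, enrolled["alice"]))   # 1:1 check against Alice's template
print(identify(probe, enrolled))          # 1:N search over all templates
```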

The term "voice recognition" is often used for the combination of Speaker Recognition and Speech Recognition. This is the case when the recognition system is trained to a particular speaker, as with most desktop recognition software: an element of speaker recognition, which identifies the person speaking, is used to better recognise what is being said.

Various technologies have been developed to process and store voice prints. They include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation and decision trees. Ambient noise can impede collection of both the initial and subsequent voice samples. Noise reduction algorithms can be employed to improve accuracy, but incorrect application can have the opposite effect.
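As one illustration of the Gaussian mixture model approach mentioned above, the following sketch fits one GMM "voice print" per enrolled speaker and attributes a new utterance to the best-scoring model. It uses scikit-learn's GaussianMixture; the random arrays stand in for real acoustic feature frames (such as MFCCs), and the component count is an arbitrary choice.

```python
# Rough sketch of GMM-based speaker identification: one GMM is fitted per
# enrolled speaker on frames of acoustic features, and a new utterance is
# attributed to the speaker whose model gives the highest average
# log-likelihood. The random arrays below are placeholders for real frames.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def enroll(frames, n_components=4):
    """Fit a GMM 'voice print' to a speaker's feature frames."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(frames)
    return gmm

def score(gmm, frames):
    """Average per-frame log-likelihood of an utterance under a voice print."""
    return float(gmm.score(frames))

# Placeholder "feature frames" for two enrolled speakers (100 frames x 13 dims).
models = {
    "alice": enroll(rng.normal(0.0, 1.0, size=(100, 13))),
    "bob":   enroll(rng.normal(2.0, 1.0, size=(100, 13))),
}

# A new utterance is identified by the best-scoring model.
utterance = rng.normal(0.0, 1.0, size=(50, 13))
best = max(models, key=lambda name: score(models[name], utterance))
print(best)  # expected to be "alice" for this synthetic data
```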

Nowadays, many computer software programs have built-in voice recognition software, which turns spoken words into written words. These often require a person to pre-read a certain amount of text (a process called enrollment) so that the software can subsequently recognise that specific voice.
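A minimal dictation example along these lines is sketched below, using the third-party Python SpeechRecognition package as one of many possible toolkits; the audio file name is hypothetical, and the Google Web Speech backend it calls requires a network connection.

```python
# Minimal speech-to-text sketch using the SpeechRecognition package
# (one example toolkit; the WAV file name is illustrative only).
import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a recorded utterance from a WAV file.
with sr.AudioFile("dictation_sample.wav") as source:
    audio = recognizer.record(source)

try:
    text = recognizer.recognize_google(audio)   # spoken words -> written words
    print(text)
except sr.UnknownValueError:
    print("Speech was unintelligible to the recognizer.")
except sr.RequestError as err:
    print(f"Recognition service unavailable: {err}")
```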

Speaker recognition is used as a way of identifying an individual. Examples of uses are gaining access to a secure system and identifying individuals on recordings in detective/police work.

Speech recognition has a wide range of possible uses, some of which are:

  • Health care: reducing the need for medical transcriptionists and enabling search in Electronic Medical Records
  • Military: reducing pilot workload by using voice commands for certain tasks, and serving as a training aid when training air traffic controllers
  • Aiding people with disabilities: assisting people with limited use of their hands (including RSI sufferers), enabling deaf telephony, and assisting people with learning disabilities (problems with thought-to-paper communication)
  • Automatic translation
  • Robotics
  • Telecommunications and video games

Enablers:

Factors which strengthen this driving force:

1. Aging population

2. Deteriorating security, especially for children

3. Globalization

4. Broadbandization

5. Popularity of car navigation systems

Inhibitors:

Factors which weaken this driving force:

1. Technical difficulties (phonological recognition, extraction of intent, etc.)

Paradigms:

The number of internet users will increase exponentially because people of any age and any level of computer skill will be able to use the internet.

Internet use will grow further once people can access and retrieve information by voice alone.

By combining voice recognition systems with translation systems, people will be able to access all the information they need, about any part of the world, from anywhere.

The amount of information available on the web will increase exponentially because spoken language can be stored directly in web space.

Experts:

Sources for additional information about this driving force:

William S. Meisel, Ph.D., president of TMA Associates

Prof. Hiroaki Sakoe, Dept. of Intelligent Systems, Kyushu University

Timing:

1952 Bell Laboratories started to investigate speech recognition based on zero-crossing analysis

1959 Kyoto University, Japan, developed a "speech-recognition typewriter" utilizing the technology developed at Bell.

1970s The Soviet Union and Japan independently developed the DP matching (dynamic time warping) method, which normalizes differences in utterance length using dynamic programming; a small sketch of this idea appears after the timeline.

1990s The U.S. Defense Advanced Research Projects Agency (DARPA) started a dictation program for speech recognition, which led to question-and-answer voice recognition systems based on the n-gram method.
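The DP matching idea from the 1970s entry above can be sketched in a few lines: two utterances of different lengths are aligned by dynamic programming and compared via the accumulated distance. The one-dimensional sequences below stand in for real feature-vector series and are purely illustrative.

```python
# Compact sketch of DP matching (dynamic time warping): utterances of
# different lengths are aligned by dynamic programming, and the accumulated
# distance is used to compare them.
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two sequences."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # Best of match, insertion, or deletion, as in classic DP matching.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m]

# The same "word" spoken at different speeds still aligns with low cost.
slow = [1, 1, 2, 3, 3, 3, 2, 1]
fast = [1, 2, 3, 2, 1]
print(dtw_distance(slow, fast))          # small: same pattern, different tempo
print(dtw_distance(slow, [3, 3, 1, 1]))  # larger: different pattern
```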

Web Resources:

http://www.tmaa.com/