Voice-recognition system

From ScenarioThinking
 
 
Latest revision as of 06:28, 6 September 2011

Description:

When considering voice recognition, a distinction should first be made between Speaker Recognition and Speech Recognition. Speaker Recognition means that a digital device (such as a computer, PDA or mobile phone) recognises who is speaking to it: it identifies a person by the sound of their voice. Speech Recognition means recognising what is being said, i.e. translating spoken words into written words.

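Speaker Recognition is further divided into two tasks: Speaker Verification (also called speaker authentication), in which one voice sample is compared with the single stored template of the claimed identity (1:1), and Speaker Identification, in which one voice sample is compared with N stored templates to determine who is speaking (1:N).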
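The 1:1 versus 1:N distinction can be made concrete with a minimal sketch. It assumes each voice has already been reduced to a fixed-length feature vector (a hypothetical "voice print"; extraction from audio is not shown), that similarity is measured with cosine similarity, and that 0.9 is an arbitrary acceptance threshold. Real systems use statistical models and calibrated thresholds; the function names are illustrative only.

```python
import math
from typing import Dict, List, Optional

# Hypothetical fixed-length "voice print" vectors; a real system would
# derive these from audio features, which is not shown here.
Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample: Vector, claimed_template: Vector, threshold: float = 0.9) -> bool:
    """Speaker verification (1:1): accept the claimed identity if the sample
    is similar enough to that single stored template."""
    return cosine_similarity(sample, claimed_template) >= threshold


def identify(sample: Vector, templates: Dict[str, Vector],
             threshold: float = 0.9) -> Optional[str]:
    """1:N comparison (commonly called speaker identification): return the
    best-matching enrolled speaker, or None if no template clears the threshold."""
    best_name, best_score = None, -1.0
    for name, template in templates.items():
        score = cosine_similarity(sample, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    enrolled = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
    sample = [0.85, 0.15, 0.28]
    print(verify(sample, enrolled["alice"]))
    print(identify(sample, enrolled))
```

Verification answers a yes/no question about one claimed identity, whereas the 1:N search must compare against every stored template, so its cost and its chance of a false match both grow with N.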

The term Voice recognition is often used for the combination of Speaker Recognition and Speech Recognition. This applies when the recognition system is trained to a particular speaker, as is the case for most desktop recognition software: the system then includes an element of speaker recognition, using knowledge of who is speaking to better recognise what is being said.

Various technologies have been developed to process and store voice prints, including frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, neural networks, matrix representation and decision trees. Ambient noise can impede the collection of both the initial and subsequent voice samples. Noise reduction algorithms can be used to improve accuracy, but if applied incorrectly they can have the opposite effect.
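Gaussian mixture models, one of the techniques listed above, can be illustrated with a minimal sketch. It assumes each enrolled speaker is represented by a GMM with diagonal covariances over per-frame feature vectors; an utterance is scored against each speaker's model and the highest average log-likelihood wins. Feature extraction and model training (typically expectation-maximisation) are not shown, the parameters in the usage example are invented, and `DiagonalGMM` and `most_likely_speaker` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    """Gaussian mixture model with diagonal covariances (hand-set parameters;
    a real speaker model would be trained on enrollment audio)."""
    weights: List[float]          # mixture weights, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def _log_gaussian(self, x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
        # log N(x; mean, diag(var)), summed over independent dimensions
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )

    def log_likelihood(self, frame: Sequence[float]) -> float:
        # log sum_k w_k N(x; mu_k, Sigma_k), using log-sum-exp for stability
        terms = [
            math.log(w) + self._log_gaussian(frame, m, v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]
        peak = max(terms)
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    def score(self, frames: Sequence[Sequence[float]]) -> float:
        # Average per-frame log-likelihood over the whole utterance
        return sum(self.log_likelihood(f) for f in frames) / len(frames)


def most_likely_speaker(frames: Sequence[Sequence[float]],
                        models: Dict[str, DiagonalGMM]) -> str:
    """Return the speaker whose model gives the utterance the highest score."""
    return max(models, key=lambda name: models[name].score(frames))


if __name__ == "__main__":
    models = {
        "alice": DiagonalGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
        "bob": DiagonalGMM([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    print(most_likely_speaker([[0.1, 0.2], [0.9, 1.1]], models))
```

Averaging over frames keeps scores comparable between utterances of different lengths; the log-sum-exp step avoids underflow when individual component likelihoods are very small.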

Nowadays, many software programs include built-in voice recognition that turns spoken words into written text. These programs often require the user to read a certain amount of text aloud beforehand (a process called enrollment) so that they can subsequently recognise that specific voice.
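Enrollment, as described above, can be sketched as building a per-user template from the feature vectors captured while the user reads the enrollment text. This sketch assumes features have already been extracted. Averaging them into a single template, the `min_samples` threshold and the `EnrollmentStore` class are all hypothetical illustration choices, not how any particular product works.

```python
from typing import Dict, List, Sequence

Vector = List[float]


class EnrollmentStore:
    """Hypothetical store of per-user voice templates.

    Each template is the element-wise mean of the feature vectors extracted
    while the user read the enrollment text (extraction itself is not shown).
    """

    def __init__(self, min_samples: int = 3) -> None:
        self.min_samples = min_samples  # assumed minimum amount of enrollment speech
        self.templates: Dict[str, Vector] = {}

    def enroll(self, user: str, samples: Sequence[Vector]) -> Vector:
        if len(samples) < self.min_samples:
            raise ValueError(
                f"need at least {self.min_samples} samples, got {len(samples)}"
            )
        dims = len(samples[0])
        if any(len(s) != dims for s in samples):
            raise ValueError("all samples must have the same dimensionality")
        # Element-wise mean over all enrollment samples
        template = [sum(s[i] for s in samples) / len(samples) for i in range(dims)]
        self.templates[user] = template
        return template
```

For example, `EnrollmentStore().enroll("alice", [[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])` stores the template `[2.0, 3.0]`. Requiring several samples reflects why enrollment asks for a certain amount of text: a template averaged over more speech is less sensitive to any single noisy recording.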

Speaker recognition is used as a way of identifying an individual, for example to gain access to a secure system or to identify individuals on recordings in detective or police work.

Speech recognition has a wide range of possible uses, including:

  • Health care: reducing the need for medical transcriptionists and enabling search in Electronic Medical Records
  • Military: reducing pilot workload by using voice commands for certain tasks; serving as a training aid for air traffic controllers
  • Aiding people with disabilities: assisting people with limited use of their hands (including RSI sufferers); enabling telephone use for deaf people; assisting people with learning disabilities (difficulty with thought-to-paper communication)
  • Automatic translation
  • Robotics
  • Telecommunications and video games
  • Internet-based services

Enablers:

  • Technological improvement of voice-recognition software
  • Increased capacity of computers
  • Illiteracy: increased need for computer applications usable by illiterate people
  • Increased attention to the needs of people with disabilities
  • Increased availability of computers worldwide

Inhibitors:

  • The cost of developing voice recognition software
  • Other biometric technologies (such as fingerprinting and iris scans)

Paradigms:

In the future, voice recognition could make other forms of identification obsolete. It could surpass other biometrics, such as fingerprinting and iris scans, because the technology is simpler.

Increased use of computers and the internet will speed up the development of voice-recognition software.

Voice recognition could make the keyboard and mouse obsolete.

Voice recognition can be very helpful for services in regions with a high illiteracy rate, because users do not have to type to use the services. This could apply to banking, healthcare, public administration and similar services.

Experts:

William S. Meisel, Ph.D., president of TMA Associates
Karen Livescu, Ph.D., assistant professor, Toyota Technological Institute at Chicago
James R. Glass, Ph.D., Principal Research Scientist at MIT CSAIL

Timing:

1870s The roots of this technology go back to Alexander Graham Bell's inventions in the 1870s.

1952 Bell Laboratories began investigating speech recognition using zero-crossing analysis

1959 Kyoto University, Japan, developed a “speech-recognition typewriter” using technology developed at Bell Laboratories.

1964 IBM presented an early speech recognition device, the IBM Shoebox, at the New York World's Fair

1970s Researchers in the Soviet Union and Japan independently developed the DP matching method, which uses dynamic programming to normalize differences in utterance length

1980s Two distinct types of commercial products were available. The first offered speaker-independent recognition of small vocabularies and was most useful for telephone transaction processing. The second, offered by Kurzweil Applied Intelligence, Dragon Systems, and IBM, focused on large-vocabulary recognition so that text documents could be created by voice dictation.

1990s The U.S. Defense Advanced Research Projects Agency (DARPA) started a dictation program for speech recognition, which produced question-and-answer voice recognition systems using the n-gram method
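The 1970s entry above describes DP matching, which aligns a spoken utterance with a stored template even when the two are spoken at different speeds. Below is a minimal sketch of classic dynamic time warping, assuming each utterance has already been converted into a sequence of numeric feature frames (extraction is not shown). It is the textbook unconstrained form. Published DP-matching systems add refinements such as slope constraints and path-length normalisation that are omitted here, and the name `dtw_distance` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Dynamic time warping distance between two sequences of feature frames.

    D[i][j] holds the cheapest cumulative cost of aligning the first i frames
    of `a` with the first j frames of `b`. Frames may be repeated on either
    side, so a slowly spoken word can still align with a quickly spoken template.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            D[i][j] = cost + min(
                D[i - 1][j],      # advance in a only (b frame reused)
                D[i][j - 1],      # advance in b only (a frame reused)
                D[i - 1][j - 1],  # advance in both
            )
    return D[n][m]
```

For example, a slowly spoken utterance `[(1.0,), (1.0,), (2.0,), (2.0,), (3.0,), (3.0,)]` aligns with the template `[(1.0,), (2.0,), (3.0,)]` at zero cost, which is the time-length normalisation the entry refers to. The table has (n+1)(m+1) cells, so the cost grows with the product of the two lengths; this is one reason practical systems restrict the search to a band around the diagonal.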

Web Resources: