Literature review

The purpose of the STARDUST (Speech Recognition as a Control Method for People with Severe Dysarthria) project is to develop an effective speech recogniser for people with severe dysarthria, defined here as an intelligibility score of less than 30% as measured by the Assessment of Intelligibility of Dysarthric Speech tool (Yorkston and Beukelman, 1981).  Once such a recogniser has been developed, various assistive technologies can be controlled by the user’s voice.  The purpose of this literature review is to identify:

§         trials where speech recognition has been used with speech disorders, with particular reference to dysarthria.

§         appropriate speech recognition techniques.

§         the use of technology in speech therapy.

§         the level of consistency in dysarthric speech.

 

Methodology

In order to identify the appropriate papers, various techniques were employed:

1.      Search of research databases: Cochrane and MEDLINE (1997 to October 2000, week 3).

2.      Search of the DARE, Health and Technology Assessment (HTA) and NRR databases.

3.      Search of ‘grey’ literature: the AltaVista and Yahoo search engines.

4.      Journal evaluation from 1995 to October 2000: Journal of Speech and Hearing Research, Journal of Voice, Journal of Communication Disorders, Journal of Speech Communication, Journal of Speech and Language, and the Journal of Augmentative and Alternative Communication (ISAAC).

 

Research Databases

In order to address the aims of the literature review, the identified databases were searched using the following Boolean queries:

 

(voice recognition OR speech recognition OR ASR OR speech recogniser OR computer-assisted instruction) AND (speech disorder OR voice disorder OR communication disorder OR Dysarthria OR speech therapy)

 

(speech disorder OR voice disorder OR communication disorder OR Dysarthria OR speech therapy) AND (speech intelligibility OR voice intelligibility OR speech consistency OR voice consistency OR speech-language pathology OR speech variability OR voice variability)

 

Speech therapy AND (technology OR ASR OR recogniser)

 

The Internet search engines were also investigated using these or similar queries, while specific searches on dysarthria, automatic speech recognition, ASR, speech recognisers, voice recognisers, speech variability and speech consistency were also performed.  Journal articles were judged potentially relevant to the project on the basis of their titles and abstracts.

 

Literature review results

Dysarthria is a speech disorder of neurological origin (Milloy, 1991), and Enderby and Emerson (1995) suggest that it affects 170 per 100,000 people.  It has been defined as follows (Styba, 2000):

 

“a speech disorder that is due to a weakness or incoordination of the speech muscles.  Speech is slow, weak, imprecise or uncoordinated and can affect children and adults.”

 

Darley et al. (1975) identified six types of dysarthria; these are reproduced in Table 1.

 

Table 1: Clinically recognised types of dysarthria

Dysarthria type                  Lesion site

Flaccid dysarthria               Lower motor neurons
Spastic dysarthria               Upper motor neurons
Hypokinetic dysarthria           Basal ganglia and associated brainstem nuclei
Hyperkinetic dysarthria          Basal ganglia and associated brainstem nuclei
Ataxic dysarthria                Cerebellum and/or its connections
Mixed dysarthria                 Both lower and upper motor neurons
(e.g. mixed flaccid-spastic)

 

Milloy (1991) suggests that in the UK the two main tests for the presence of dysarthria are the Frenchay Dysarthria Assessment (Enderby, 1983) and Robertson’s Dysarthria Profile (Robertson, 1982).  The intelligibility of speech may be assessed with the Yorkston and Beukelman (1981) assessment, or with the aforementioned Frenchay assessment (Beech et al., 1993).  Yorkston and Beukelman (1980) define intelligibility as the accuracy with which a message is conveyed.  However, intelligibility is only one aspect of a dysarthric person’s ability to communicate; other aspects include ‘naturalness’, ‘acceptability’ and ‘bizarreness’ (Kent et al., 1989).  It may nevertheless be that intelligibility is the key factor, as it has the greatest impact on the ability to communicate.  As Peterson and Marquardt (1981) identify, if speech is distorted but in a consistent manner, then it may be intelligible because of the predictability of the errors.

 

To improve intelligibility, Yorkston et al. (1992) suggest that the rate at which speech is produced should be reduced.  However, it is unknown whether the improved intelligibility arises because slower speech allows the speaker to achieve accurate articulatory targets, or because it allows the listener more processing time, which increases their perception of intelligibility (Pilon et al., 1998).

 

In the Frenchay assessment, intelligibility is evaluated on single words, sentences and conversation (Enderby, 1983).  However, despite this assessment being widely used, Kent et al. (1989) suggested that the ten words randomly selected from a pool of fifty are not equivalent, which contributes to variability in estimated levels of intelligibility.  The Yorkston and Beukelman assessment was adapted in the Frenchay test and is therefore similar; however, Yorkston and Beukelman included the speed of speech as a contributor to intelligibility.  They suggested that faster speech is more intelligible, but while this may be true of severely impaired speech, which tends to be slow, it is questionable for more competent speech (Kent et al., 1989).

 

As Rosenbek and LaPointe (1978) point out, both of these tests are perceptual, and questions have therefore been raised regarding the validity of the assessments.  For example, Milloy (1991) comments that assessors often have experience of listening to dysarthric speech and can be inclined to rate intelligibility higher than it actually is.  Indeed, Kent et al. (1989) suggest that a score of intelligibility alone is not an interpretable statement: the conditions under which the results were obtained must be known, as intelligibility can vary with a host of variables, including the listener, the environment and the speaker.  Doubt has also been cast over the appropriateness of assessments that include words and sentences that are read; Frearson (1985) discovered that, for dysarthric speakers, read material was more intelligible than spontaneous speech.

 

Despite the possible inaccuracy of the perceptual assessments, there is evidence to suggest that the intelligibility scores obtained by one speech and language therapist reflect those of another.  In a review of one hundred and thirteen dysarthric subjects, Beech et al. (1993) discovered that eight speech and language therapists given three hours of training in the Frenchay dysarthria assessment were able to score subjects consistently: the widest range of scores was 15%, and the majority of scores were grouped more closely.

 

Perceptual evaluations provide valuable information for diagnosing and interpreting the dysarthrias; however, Baken (1987) suggested that instrumental measurement offers significant advantages over unaided perceptual judgements, as the perceptive element of the assessment can be removed.  A variety of instruments have been developed to assess specific aspects of speech production, and the particular instrument to use depends on the type of dysarthria defined in Table 1.  Various instruments enable measurement of the muscle groups at each stage of the voice production process.  However, Gerratt et al. have indicated that to date there has been little use of these techniques and instruments in clinical settings.

 

Automatic speech recognition (ASR)

Cohen (1984) comments that speech interfaces may be considered ‘more natural’ when compared to other input mechanisms, while Ferrier et al. (1995) suggest that, when communicating with other people, speech is often the preferred method of communication even for people with poor speech intelligibility.

 

People with neurologic impairments often suffer from both speech difficulties and motor control problems (Kotler and Thomas-Stonell, 1997).  Ferrier et al. suggest that many people with motor limitations of the upper limbs also have dysarthria, and because of these motor difficulties speech would seem an appropriate medium for communicating and for controlling various assistive technologies (Vanderheiden, 1985).

 

Perhaps one of the earliest examples of ASR was the voice-operated typewriter of 1969, which used Morse code to control a typewriter and various other devices, such as a radio or a lamp; for example, ‘dit-dit-dah-dah’ would turn a radio on.  The main beneficiaries of this system were believed to be tetraplegics, who have no movement below the neck (Newell and Nabavi, 1969).  As late as 1997, however, Blyth described ASR as a relatively new area of research and development and consequently an untried technology.  Nevertheless, ASR is now commonplace and comes pre-installed on many desktop computers.  Irrespective of any speech impediment, it is generally agreed that as many as 80% of those who try ASR systems fail to operate them satisfactorily (Fine, 2000).  The reason suggested for this high figure is that ASR systems are complex and require appropriate hardware and time to operate properly; however, because such systems come pre-installed, there is a belief that they will function almost immediately (Fine, 2000).

 

An overview of the ASR process, as described by Rosen and Yampolsky (2000), consists of three processes:

1.      Preprocessing: the spoken input is converted to a digital format suitable for the speech recogniser.  This consists of capturing the speech waveform (i.e. with a microphone) and low-pass filtering it, which removes high-frequency components not necessary for the identification of speech.  The low-frequency portion is then converted to a digital format and reduced to identify ‘frames’ of data that do not change significantly from beginning to end (a sketch of this step follows the list below).

2.      Recognition: the spoken input is identified.  This can be achieved by various techniques, such as Hidden Markov Models (HMMs), template matching and neural networks, which differ in speed, accuracy and storage requirements (Markowitz, 1996).  Each is outlined below.

 

Hidden Markov Models  

Huang et al. (1990) comment that this approach uses statistical theory to compare a speech model with the spoken input.  Given sufficient training data for small units or ‘frames’, these can be combined to recognise words not previously encountered by the recogniser.  Kempainen (1997) considers that US English requires approximately 45 of these units, or phonemes, and when triphones are used the current phoneme is linked with the preceding and following phonemes.
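
As an illustration only, the statistical comparison can be sketched with the forward algorithm over a toy word model; all probabilities below are invented for the example and are not drawn from the sources cited above:

    import numpy as np

    # Toy 3-state left-to-right word model: transition and emission probabilities.
    A = np.array([[0.6, 0.4, 0.0],     # state-to-state transition probabilities
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
    B = np.array([[0.8, 0.2],          # P(observation symbol | state)
                  [0.3, 0.7],
                  [0.5, 0.5]])
    pi = np.array([1.0, 0.0, 0.0])     # always start in the first state

    def forward_likelihood(obs):
        # P(observation sequence | model) via the forward recursion.
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()

    # The word model that scores the input's frames highest is the recognised word.
    print(forward_likelihood([0, 0, 1, 1]))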

 

Template matching

Rabiner and Juang (1993) describe this technique as matching the spoken input against a reference pattern; as the name suggests, it is a pattern-matching technology.  The technique functions at the word level, so every word used must have a stored template; it is therefore not efficient for large vocabularies.
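
Template matching of this kind is classically implemented with dynamic time warping, which stretches or compresses the time axis so that utterances of differing lengths can be compared; a minimal sketch, with feature vectors reduced to single numbers for brevity:

    import numpy as np

    def dtw_distance(spoken, template):
        # Dynamic-time-warping distance between two feature sequences.
        n, m = len(spoken), len(template)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(spoken[i - 1] - template[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # The stored template with the smallest distance is the recognised word.
    templates = {"yes": [1, 3, 3, 2], "no": [5, 4, 4, 1]}
    spoken = [1, 2, 3, 2]
    print(min(templates, key=lambda w: dtw_distance(spoken, templates[w])))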

 

Neural networks

Such a network consists of a set of neurons whose inputs are multiplied by weights that reflect their importance; when the weighted sum of a neuron’s inputs reaches a threshold level, the neuron fires and contributes to the output of the network (Winston, 1993), i.e. an estimate of which word the speech input refers to.
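
A minimal sketch of this weighted-sum-and-threshold behaviour for a single artificial neuron; the weights and threshold are invented for the example:

    def neuron_output(inputs, weights, threshold):
        # Each input is multiplied by its weight; the neuron fires (outputs 1)
        # when the weighted sum reaches the threshold level.
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted_sum >= threshold else 0

    # Usage: three input features, the second weighted as the most important.
    print(neuron_output([0.2, 0.9, 0.1], [0.5, 1.5, 0.25], threshold=1.0))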

 

3.      Communication: the recognised message is used to communicate with other hardware or software, for example environmental controls or speech synthesis.
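
Returning to step 1, the framing and low-pass filtering it describes can be sketched as follows; the cut-off frequency and frame sizes are illustrative assumptions rather than values taken from the sources:

    import numpy as np
    from scipy.signal import butter, lfilter

    def preprocess(waveform, sample_rate, cutoff_hz=4000.0, frame_ms=25.0, hop_ms=10.0):
        # Low-pass filter: a 5th-order Butterworth filter removes the
        # high-frequency components not needed to identify speech.
        b, a = butter(5, cutoff_hz / (sample_rate / 2.0), btype="low")
        filtered = lfilter(b, a, waveform)
        # Framing: slice the filtered signal into short overlapping windows
        # within which the signal does not change significantly.
        frame_len = int(sample_rate * frame_ms / 1000.0)
        hop = int(sample_rate * hop_ms / 1000.0)
        frames = [filtered[i:i + frame_len]
                  for i in range(0, len(filtered) - frame_len + 1, hop)]
        return np.array(frames)

    # Usage: one second of (synthetic) 16 kHz audio -> 98 frames of 400 samples.
    audio = np.random.randn(16000)
    print(preprocess(audio, 16000).shape)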

 

To aid the recognition process, word prediction can be used to improve the accuracy and/or speed of the process (Winston, 1993).  For example, the phrase ‘it would be nice if her returned the book’ would be considered incorrect by the majority of native English speakers; the phrase should instead be ‘it would be nice if she returned the book’.  Consequently, based on the previous word(s), it is possible to anticipate the next ‘kind’ of word.  However, Winston (1993) considers that a complete understanding of language, and therefore fully accurate language prediction, is still some considerable distance away.  Nevertheless, Newell et al. (1995) have developed PAL (Predictive Adaptive Lexicon), which predicts syntactically plausible words in preference to implausible ones.
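
A minimal sketch of this idea, anticipating the next word from the previous one using bigram counts; the training sentences are invented for the example, and this is not a description of PAL itself:

    from collections import Counter, defaultdict

    def train_bigrams(sentences):
        # Count which word follows which across the training sentences.
        following = defaultdict(Counter)
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                following[prev][nxt] += 1
        return following

    def predict_next(following, prev_word):
        # Return the most plausible next word given the previous one.
        counts = following.get(prev_word.lower())
        return counts.most_common(1)[0][0] if counts else None

    model = train_bigrams(["it would be nice if she returned the book",
                           "she returned the pen"])
    print(predict_next(model, "she"))   # -> 'returned'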

 

Speech recognition software divides into two groups: discrete and continuous.  Discrete recognition became commercially available during the 1980s and requires the user to speak one word at a time (BECTa, 2000).  Donegan (2000), by contrast, notes that continuous recognition accommodates more natural speech, allowing the user to speak without imposed gaps.  Continuous speech recognisers would appear to be the preferred, and more natural, choice; however, BECTa (2000) found that they require significantly more training and are harder to use.  Recognisers can also be distinguished by their dependence on the user, and again Rosen and Yampolsky (2000) define three types:

 

Speaker dependent
These use the template-matching technique to compare the spoken input with stored patterns for the specific user.  Syrdal et al. (1995) consider the main advantage of this approach to be that recognition is more accurate than in speaker-independent systems.  As the comparison is with the user’s own speech, it is less critical for the user’s speech to be ‘normal’; it just needs to be consistent (Schmitt and Tobias, 1986; Ferrier et al., 1995).  The user can therefore enter any consistent sound and have it matched to a specific outcome; for example, saying ‘seza’ might be mapped to ‘Sarah’.
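
A minimal sketch of this mapping: the user’s own recorded pattern for ‘seza’ is stored once, and any later utterance sufficiently close to it is matched to the outcome ‘Sarah’.  The feature vectors and the distance threshold are invented for the example:

    import numpy as np

    # One stored template per trained sound, recorded by this specific user.
    user_templates = {"Sarah": np.array([0.9, 0.2, 0.4])}   # pattern for 'seza'

    def recognise(utterance, templates, max_distance=0.5):
        # Match the spoken input against the user's own stored patterns.
        word, template = min(templates.items(),
                             key=lambda kv: np.linalg.norm(utterance - kv[1]))
        if np.linalg.norm(utterance - template) <= max_distance:
            return word
        return None   # too far from every stored pattern: reject

    # A consistent (if non-standard) production of 'seza' maps to 'Sarah'.
    print(recognise(np.array([0.85, 0.25, 0.45]), user_templates))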

 

Speaker independent
Huang and Lee (1991) reveal that these rely on templates created by a number of speakers and do not require training by the user.  However, Schmandt (1994) considers that recognition accuracy may not be as high as in a speaker-dependent system.  It may also be that people whose speech is significantly different to that stored in the system, for example dysarthric users, will not secure high accuracy rates.

 

Speaker adaptive

This is a hybrid approach that requires some training without requiring the user to enter every word in the system’s vocabulary.  The more the user operates the system, the more the accuracy improves (Rosen and Yampolsky, 2000).

 

Kotler and Thomas-Stonell (1997) comment that dysarthric speech exhibits excessive nasalisation, disordered speech prosody, imprecise articulation and variable speech rate.  Lysyk and Smith (1998) therefore suggest that dysarthric speakers are more intelligible at the word level, and Beukelman and Yorkston (1979) have commented that it may be difficult for some users to maintain breath support for a whole sentence.  Consequently, in terms of the most appropriate speech recognition software for dysarthric users, discrete systems would appear to be the most suitable.  Although continuous speech recognition systems can function in a discrete way, they were not designed to do so, and Dragon recommend that their continuous system is not used in a ‘word-by-word’ manner (Donegan, 2000).

 

For high recognition rates, Schmitt and Tobias (1986) consider a primary requirement for speaker-dependent systems to be that speech is consistent, while for speaker-adaptive systems Thomas-Stonell et al. (1998) suggest that speech intelligibility may be as important as perceptual judgements of consistency.  Menendez-Pidal et al. (1996) consider dysarthric speech ‘quite variable’, with inconsistencies in voice and articulation (Ferrier et al., 1995), and for the speech recogniser to function effectively the speech must be of a consistent nature.  It would therefore appear that, in order to achieve high recognition rates, the speaker-dependent approach should be used.  Indeed, Raghavendra et al. (1994) consider that clinical experience and results from case studies indicate that a discrete, speaker-dependent system is the most appropriate for dysarthric speech.

 

For example, when using a speaker-adaptive approach with severely dysarthric speech, Rosengren et al. (1995) identified several problem areas:

1.      Defective consonant clusters can result in the ASR having difficulty when words are similar.

2.      One-syllable words can be regarded as two-syllable words when speech is slow.

3.      Single words with a long voiceless drop could be interpreted as two separate words.

4.      Involuntary sound or hesitation at the beginning of a word could be misrecognised.

 

Despite various ASR systems being readily available, research into such technology with people who have speech difficulties lags far behind efforts involving ‘normal’ speech (Deller et al., 1993; Ferrier et al., 1992).  Moreover, Coleman and Myres (1991) believe that the available research indicates that ASR is not as effective for dysarthric users.  Indeed, in 2000 Rosen and Yampolsky indicated that there was little research, especially experimental research, on ASR and dysarthric speech.  What research there has been in this area has tended to be limited by small sample sizes, typically fewer than five participants.  Nevertheless, a commercial product, Clear:Voice (TM), is expected to be released in the near future and promises better recognition rates for people with dysarthria (Meskimen, 1999).

 

In terms of the accuracy of ASR, Rose and Galdo (1999) suggested that speech classified as ‘normal’ can be recognised by commercial dictation programs with accuracy rates of 95% or more; if the vocabulary is limited, accuracy of 99% or more can be achieved with speaker-dependent systems.  In their 1992 study, Ferrier et al. showed that speakers with mild dysarthria achieved recognition rates 10 to 15% lower than those achieved by speakers with ‘normal’ speech, while a subsequent study by Ferrier et al. (1995) of speakers with spastic dysarthria indicated that seven of the ten participants achieved recognition rates of 80% for the same passage within eight dictations.

 

Thomas-Stonell et al. (1998) analysed male and female dysarthric speakers with varying degrees of intelligibility and assessed the accuracy of ASR using the DragonDictate system.  The Yorkston et al. (1984) intelligibility score was used to define mild intelligibility as >70% to <=90%, moderate intelligibility as >40% to <=70%, and severe intelligibility as >10% to <=40%.  Each of these groups consisted of one male and one female.  In terms of recognition accuracy, by the fifth session of repeating the same text the recognition rates were:

§         88% for the mild dysarthria group.

§         75% for the moderate group.

§         77% for the severe group.

 

This compares with 93% for the control group with ‘normal’ speech.  The results from this study agree with those of Ferrier et al. (1992), who concluded that ASR was successful with mild dysarthria.  The Thomas-Stonell study also discovered that there was no significant correlation between dysarthric speakers with perceptually high speech consistency and high recognition rates; it was therefore suggested that perceptual speech consistency should not be used as a predictor of the success of an ASR system.  It was also reported that, since there is no simple way to measure speech consistency, clinicians should prescribe these systems to potential users and discover whether they are appropriate on an individual basis.
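
The intelligibility banding used in the study can be written out directly; a minimal sketch, with a hypothetical function name:

    def severity_group(intelligibility):
        # Map a Yorkston et al. (1984) intelligibility score (%) to the
        # severity bands used in the Thomas-Stonell et al. (1998) study.
        if 70 < intelligibility <= 90:
            return "mild"
        if 40 < intelligibility <= 70:
            return "moderate"
        if 10 < intelligibility <= 40:
            return "severe"
        return "outside the study's bands"

    print(severity_group(75))   # -> 'mild'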

 

The Thomas-Stonell et al. (1998) study was preceded by a study by Doyle et al. (1997) that was structured in the same way and used the same participants.  The difference between the two studies is that the Thomas-Stonell study used 70 randomly generated words across 5 sessions, based on Kent et al. (1989), plus the first three sentences of the Rainbow passage (Fairbanks, 1960), while the Doyle study used just the 70 randomly generated words, again across 5 sessions.  The grouping and number of participants in the Doyle study was the same, with listener intelligibility at 100% for the control group, 95% for mild dysarthria, 92% for the moderate group and 82% for the severe group.  Recognition accuracy by the end of the fifth session, using the IBM VoiceType speaker-adaptive ASR system, was:

§         79% for the control group.

§         68% for mild dysarthria.

§         53% for moderate.

§         54% for the severe group.

 

However, the severe group consisted of a female who achieved 73% accuracy and a male who achieved 35%.  One of the interesting outcomes of the research was that accuracy levels for female dysarthric speakers were generally higher than for their male counterparts.  The authors also commented that ‘reduced intelligibility due to consistent motoric breakdown during speech would likely result in less variable recognition by computer, when compared to a speaker with inconsistent speech’.

 

The only difference between the Thomas-Stonell et al. (1998) and Doyle et al. (1997) studies is therefore the training material used; even the participants in the two studies were the same.  As indicated in Table 2, the additional sentences used in the Thomas-Stonell study made a significant difference to performance.  However, it is not possible to judge whether the improved accuracy is due to the use of sentences or to the additional data entered during the training phase.  Thomas-Stonell et al. (1998) suggest that the improved accuracy is due to the higher number of multisyllabic words in the Rainbow sentences, which provide more information to the recogniser than one-syllable words.

 

Table 2: Comparison of the training techniques

Group        Recognition accuracy                       Difference
             Thomas-Stonell et al.    Doyle et al.

Control      93%                      79%               14%
Mild         88%                      68%               20%
Moderate     75%                      53%               22%
Severe       77%                      54%               23%

 

A further study, by Kotler and Thomas-Stonell (1997), of a single user with mild dysarthria and an intelligibility of 80% at the word level and 94% for sentences, as defined by the CAIDS (Yorkston et al., 1984), discovered a similar trend.  Using the 70 Kent words, accuracy was 65% at the end of session 6 and increased to 82% by the end of session 22.  Using the first three Rainbow sentences only, accuracy improved from 86% at the third session to 92% by the end of session 22.  This study also discovered that readministration of the CAIDS assessment resulted in single-word intelligibility decreasing from 80% at the beginning of the study to 70%, with sentence intelligibility increasing slightly from 94% to 95%.

 

Training the user

In order for the same sound to be interpreted as the correct word or command, the sound produced needs to be of a relatively consistent nature, and some level of training may therefore be necessary to improve speech consistency.  In the past there has been some doubt over the effectiveness of such training, with Lenneberg (1967) suggesting that people reach their potential for language learning in their teenage years and that improvements cannot be secured thereafter.  However, this view no longer appears to be adhered to, with Milloy (1991) suggesting that learning continues throughout life.

 

Indeed, Coventry et al. (1997) have indicated that over the last 25 years the use of computer training aids and visual feedback techniques has been relatively widespread, and Yamada et al. (2000) highlight some of the more common training aids available:

§         Kay Elemetrics (http://www.kayelemetrics.com) – measures acoustic output from the nose and includes the laryngograph, which provides a measure of vocal fold activity.  However, it requires the user to wear a plate that covers the teeth and part of the soft palate.  A similar system, CISTA, designed by Yamada et al., requires less interference with articulation.

§         Speech Viewer III (IBM) (http://www-3.ibm.com/able/snsspv3.html) – the best-known system that does not require the user to wear a device.  It includes games to practise speech parameters such as pitch, amplitude, duration and voicing.  However, Hatzis (1999) has identified several deficiencies, such as inaccurate feedback and the possibility of repeating previous mistakes.

§         Idioma (Granot) (http://www.el-castellano.com/diccio.html ) – designed for training of phonemes and phonemic contrasts using ASR.

§         Dr Speech (Tiger Electronics/Laureate) (http://www.synapseadaptive.com/laureate/llsmain/desc3/speechdc.html) – ‘There are technical displays of wave-forms, spectral information, the location of a produced vowel within the vowel-space, a vocal tract representation derived from the user’s production and tools for the therapist to evaluate pitch, amplitude, jitter, shimmer, tremor and other voice qualities.’

§         VideoVoice (MicroVideo) – games are provided for pitch, amplitude, duration, and voice-onset therapy.

§         Voice Prism (Language Vision) – waveform and spectrographic data are displayed for the current word and the user’s speech is matched to the ‘ideal’.  Users can then train themselves towards the ‘ideal’ voice production.

 

In addition, the ‘Sights 'n Sounds’ software package by Bungalow Software (2000), a company based in America, has had some success in improving speech.  The package requires the user to record a given word, after which the user can aurally compare their speech with an ‘accepted’ target word.  No information is given as to how the speech could be improved, but reports from users have been encouraging; for example, "I can always tell when my husband has been using ‘Sights 'n Sounds’ because his diction really improves."  A trial of Bungalow’s software by Aftonomos et al. (1997) concluded that, through the use of computer-based language therapy (in chronic aphasia), language function can be broadly and positively improved.  In addition, the use of a similar package with deaf students suggested that speech consistency and stability could be improved (Yamada et al., 2000).

 

In a case study of a single dysarthric user of an ASR system, those with whom he regularly interacted suggested that his speech had improved, with this being attributable, at least in part, to the use of the ASR (Donegan, 2000).  In combination with the Bungalow Software reports, there appears to be scope to improve a dysarthric user’s speech whilst also enabling them to communicate through the ASR.

 

Despite the developments of the last 25 years, Povel and Arends (1991) claimed that these attempts had not been very successful.  Back in 1935, Hudgins identified three criteria that must be addressed for a successful training aid, and these remain relevant today:

1.      The display must be simple and easily understood.

2.      The equipment used must be easy to use and adaptable in the classroom.

3.      The equipment must present the visual patterns while the child is speaking.

 

More recently, criteria for an appropriate training aid have been developed.  For example, Ball (1991) provides 11 design criteria, which are replicated below:

1.      There should be a simple visual representation of the acoustically complex speech signal by breaking it down into its constituent speech pattern elements.

2.      Visual patterns to represent the displayed speech pattern elements should be clear, unequivocal and intuitive, i.e. corresponding in a simple way to the normal experience of how that speech feature changes.

3.      Display software must be flexible enough to allow selection of any speech feature in order to focus auditory attention on it, and so that any required combination of features could be displayed simultaneously.

4.      Feedback should be patterned, to provide, in addition to simple right/wrong information about an utterance, an indication of where and how something went wrong.

5.      Features should all be displayed in real-time to allow immediate visual feedback that is synchronous with the auditory signal.

6.      Visual patterns should be stored to facilitate delayed visual feedback and to avoid overloading visual memory.

7.      There should be the possibility of simultaneous display of the therapist’s target and the patient’s attempt to match it.

8.      Software must be made simple to operate, so that manipulation of the display does not interfere with the therapist-patient interaction and so that patients can be left for unsupervised practice.

9.      Software should afford means of making reliable, repeatable, quantitative measures of an individual’s speech to reinforce and supplement qualitative analysis.

10.  Devices have to be functionally reliable and consistently produce the same display for identical or very similar input signals.

11.  Devices should not be prohibitively expensive.

 

Despite several training aids being commercially available, and despite literature indicating a design methodology, there are few studies that evaluate the success of training-aid interventions (Coventry et al., 1997).  In their study of training aids, Coventry et al. (1997) surveyed 193 speech and language therapists from across the UK and discovered that the therapists did not regard training aids as very satisfactory, as all of the aids used scored badly on some dimension.  This study also revealed that training aids were not used with a large percentage of caseloads and that many health trusts could not afford to purchase them or support their use.

 

Training the ASR

In order to assist in the development of speech recognisers for dysarthric users, recorded training data has been made available.  The Whitaker database of dysarthric speech (Deller et al., 1993) contains 19,275 isolated-word utterances from seven speakers and is divided into two categories: the TI-46 list and the Grandfather list.  The TI-46 list is suggested as a standard by the Texas Instruments Corporation (Doddington and Schalk, 1981) and contains the 26 letters of the alphabet, the 10 digits (0-9) and 10 control words: start, stop, yes, no, go, help, erase, rubout, repeat and enter.  The contents of the Grandfather list are provided in Table 3.

 

Table 3: The contents of the Grandfather list

missing, several, to, well, thinks, long, my,
old, you, ever, an, frock, coat, usually,
still, he, dresses, about, years, is, wish,
know, himself, buttons, all, grandfather, as, swiftly,
black, beard, in, yet, nearly, clings, ninety-three

 

 

An alternative database is the Nemours database of dysarthric speech, which contains 814 short nonsense sentences from 11 speakers with varying severity of dysarthria (Menendez-Pidal et al., 1996).  However, both of these databases are American English and may not be appropriate for UK English.  In addition, these databases may be of limited assistance to the STARDUST project, which addresses severe dysarthria.  With respect to training words for students with special needs (although not specifically people with dysarthria) wishing to use DragonDictate, Speaking to Write suggested a list of appropriate words in 1999; details can be found at http://www.edc.org/spk2wrt/Resources/cmndlst.pdf.

 

With respect to how long it takes to train a particular ASR package, Thomas-Stonell et al. (1998) have indicated that further research is required to determine how much time is needed to train a speaker-dependent or speaker-adaptive system.  Some commercial speech recognition systems suggest 8 hours of training for someone with ‘normal’ speech, and more time may be required depending on the severity of the dysarthria (Rosen and Yampolsky, 2000).  In their trial with dysarthric speech, Kotler and Thomas-Stonell (1997) discovered accuracy stability (defined as less than 10% variability in accuracy over 3 sessions) over sessions 6 to 8, while accuracy continued to improve from 65% at session 6 to 82% at session 22.  This agrees with the aforementioned study by Ferrier et al. (1995), who discovered that accuracy improvements continued after the eighth session for users with higher intelligibility scores.  It may therefore be appropriate for training to end once the adaptation curve plateaus, i.e. once there is little improvement in accuracy over repeated sessions.

 

Kotler and Thomas-Stonell (1997) also suggest that if stability (10% variation or less over three consecutive productions of the word) is not reached by the sixth session, then speech therapy is required to stabilise the speech.
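
Taking this criterion literally, stability can be checked mechanically; a minimal sketch, with invented session scores:

    def is_stable(scores, window=3, max_variation=10.0):
        # True if some run of `window` consecutive session accuracies varies
        # by no more than max_variation percentage points.
        for i in range(len(scores) - window + 1):
            run = scores[i:i + window]
            if max(run) - min(run) <= max_variation:
                return True
        return False

    sessions = [40, 52, 61, 65, 68, 66]   # accuracy (%) per training session
    print(is_stable(sessions))            # -> True (sessions 3-5 vary by 7 points)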

 

Measuring the accuracy of the recogniser

There is no established standard for measuring accuracy; however, Oki Semiconductor offers the following equation for its voice recognition processor (Kempainen, 1997):

 

Accuracy = 100% - (Esub + ½ Erej)                                                                                                   (1)

 

Esub is the substitution error, where a speaker says ‘five’ and the system recognises ‘nine’; this is described as the most crucial error, as it is the most difficult to recover from.

 

Erej is the sum of the rejection errors, which include:

1.      An error occurring due to the recognition engine not being ready for the next word.

2.      An error caused by words being too long.

3.      An error caused by spurious noise or by the wrong words being spoken.
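
Expressed directly in code, with the error terms computed as percentages of the words spoken in a test run; a minimal sketch of equation 1:

    def oki_accuracy(substitutions, rejections, total_words):
        # Accuracy (%) per Oki's formula: 100% - (Esub + 1/2 * Erej),
        # where Esub and Erej are percentages of the words spoken.
        e_sub = 100.0 * substitutions / total_words
        e_rej = 100.0 * rejections / total_words
        return 100.0 - (e_sub + 0.5 * e_rej)

    # Usage: 3 substitutions and 4 rejection errors in 100 test words -> 95.0%.
    print(oki_accuracy(3, 4, 100))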

 

Equipment considerations

In order to achieve optimum recognition and performance it is necessary to have the correct equipment.  For example, operating Dragon’s NaturallySpeaking software requires a minimum of a Pentium 133 and between 32 and 64MB of RAM (Bowes, 1999); 32MB has been described as ‘an absolute minimum’ (BECTa, 2000), while it has been suggested that the newer Dragon NaturallySpeaking 4 and ViaVoice Millennium Edition require a 500MHz processor and 128MB of RAM (BECTa, 2000).  Donegan (2000) also comments that an appropriate sound card must be used, as not all sound cards are compatible with all voice recognition systems, and sound-card software incompatibility is the most common reason for poor recognition rates (Bowes, 1999).  Alternatively, a USB microphone could be used, which bypasses the sound card; however, using a USB port does not necessarily provide better recognition (Donegan, 2000).

 

The correct microphone and its positioning are also important, as variable positioning of the microphone can make it difficult for the recogniser to obtain consistent speech.  In order to improve the quality of the input, the microphone should have active noise cancellation, which reduces background noise and speaker feedback (Donegan, 2000).  BECTa (2000) suggests that the microphones supplied with commercial software packages are ‘adequate’; however, better results can be achieved with noise-cancelling microphones.  A fuller account of the benefits and weaknesses of various microphones can be obtained from http://www.microphones.com.

 

Key points that impact upon the STARDUST project

§         There would appear to be only a small amount of published literature on dysarthric speech and ASR.

§         It has been suggested that for dysarthric speech a discrete, speaker-dependent system is the most appropriate.

§         The recogniser requires speech of a consistent nature.  For users with inconsistent speech there appears to be no real evidence on whether training improves consistency, although users of the Bungalow software have indicated that they feel their speech has improved.

§         With respect to the accuracy of ASR, consistent speech has been suggested as more important than the level of intelligibility.  Ideally, both intelligibility and consistency should therefore be measured.

§         For people with mild dysarthria, one study has suggested that accuracy is in the order of 10–15% lower than for ‘normal’ speech.  For people with moderate to severe dysarthria it has been suggested that accuracy is greatly affected.  However, the Thomas-Stonell study reported recognition accuracy of 93% for 'normal' speech, 88% for mild dysarthria, 75% for moderate and 77% for severe, although there were only two people in each of these groups.

§         The small number of studies that have been performed have tended to be based on small sample sizes or individual case studies.  A sample size in the order of 10, as suggested in the proposal, could therefore make a significant contribution to the research literature.

§         All of the studies referred to in this review have used commercial recognisers by Dragon or IBM.  Only one attempt to create a commercial system for dysarthric speech has been identified, and it is expected to be released during 2000.  However, using a discrete, speaker-adaptive recogniser such as DragonDictate and removing all of the templates would in effect make the recogniser a discrete, speaker-dependent system.

 

References

Aftonomos LB. Steele RD. Wertz RT. (1997). “Promoting recovery in chronic aphasia with an interactive technology.” Arch Phys Med Rehabil. 78(8):841-46. http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=9344303&form=6&db=m&Dopt=b.

 

Baken RJ. (1987). “Clinical Measurement of Speech and Voice.” College-Hill Press, Boston MA: 68.

 

Ball V. (1991). “Computer-based tools for assessment and remediation of speech.” British Journal of Disorders of Communication. 26:95-113. In Coventry KR. Clibbens J. Cooper M. Rood B. (1997). "Visual speech aids: a British survey of use and evaluation by speech and language therapists." European Journal of Disorders of Communication. 32(3):203-17.

 

BECTa. (2000). “DfEE/Becta SEN speech recognition project: Final report June 2000.” DfEE.

 

Beech JR. Harding L. Jone H. (1993). “Assessment in Speech and Language Therapy.” Routledge – NFER Assessment Library: 214-16.

 

Beukelman DR. Yorkston KM. (1979). “The relationship between information transfer and speech intelligibility of dysarthric speakers.” Journal of Communication disorders. 12:189-96.

 

Blyth B. (1997). “Developing a speech recognition application for survey research.” Chapter 10 in “Survey Measurements and Process Quality.” (Eds Lyberg L. Biemer P. Collins M. De-Leeuw E. Dippo C. Schwarz N. Trewin D.) John Wiley and Sons: 249-66.

 

Bowes DR. (1999). “Getting it Right and Making it Work ! Selecting the Right Speech Input Writing Software for Users with Special Needs.” 14th annual, international conference, "Technology and Persons with Disabilities." Los Angeles. March 15-20. (1999). http://www.dinf.ch/csun_99/session1014.html.

 

Bungalow Software. (2000). http://www.bungalowsoftware.com/index_bungalow.htm.

 

Cohen PR. (1984). “The pragmatics of referring and the modality of communication.” Computational Linguistics. 10:97-146.

 

Coleman CL. Myres LS. (1991). “Computer recognition of the speech of adults with cerebral palsy and dysarthria.” Augmentative and Alternative Communication. 7:34-42.

 

Coventry KR. Clibbens J. Cooper M. Rood B. (1997). "Visual speech aids: a British survey of use and evaluation by speech and language therapists." European Journal of Disorders of Communication. 32(3):203-17.

 

Darley FL. Aronson AE. Brown JR. (1975). “Motor Speech Disorders.” WB Saunders. Philadelphia. PA.

 

Deller JR. Liu MS. Ferrier LJ. Robichaud P. (1993). “The Whitaker database of dysarthric (cerebral palsy) speech.” Journal of the Acoustical Society of America. (6):3516-18.

 

Doddington GR. Schalk TB. (1981). “Speech recognition: turning theory to practice.” IEEE Spectrum. 18: 26-32.

 

Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE.

 

Doyle PC. Leeper HA. Kotler A. Thomas-Stonell N. O’Neill C. Dylke M. Rolls K. (1997). “Dysarthric speech: A comparison of computerized speech recognition and listener intelligibility.” Journal of Rehabilitation Research and Development. 34(3):309-16.

 

Enderby P. (1983). “The Frenchay dysarthria assessment.” College-Hill Press. San Diego.

 

Enderby P. Emerson L. (1995). “Does Speech and Language Therapy Work?” Singular Publications: 84.

 

Fairbanks G. (1960). “Speech and articulation drillbook.” (2nd Ed). New York: Harper and Brothers.

 

Ferrier LJ. Jarell N. Carpenter T. Shane C. (1992). “A case study of a dysarthric speaker using the DragonDictate speech recognition system.” Journal of Computer Users in Speech and Hearing. 8(1-2):33-52.

 

Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Journal of Augmentative and Alternative Communication.  11:165-174.

 

Fine B. (13th July 2000). “The Fine Line.” Computing.

 

Frearson B. (1985). "A comparison of the A.I.D.S. sentence list and spontaneous speech intelligibility scores for dysarthric speech." Australian Journal of Human Communication Disorders. 13(1):5-21.

 

Gerratt BR. Till JA. Rosenbek JC. et al. “Use and perceived value of perceptual and instrumental measures in dysarthria management.” In “Dysarthria and Apraxia of Speech: Perspectives on Management.” (Eds Moore CA. Yorkston KM. Beukelman DR.) Brookes PH. Baltimore: 77-88.

 

Hatzis A. (1999). “Optical Logo-Therapy (OLT) : Computer-Based Audio-Visual Feedback Using Interactive Visual Displays for Speech Training.” PhD thesis, University of Sheffield.

 

Huang XD. Lee KF. (1991). “On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition.” The institute of electrical and electronics engineers. 2(1):877-80.

 

Huang XD. Ariki Y. Lee KF. (1990). “Hidden Markov models for speech recognition.” Edinburgh: Edinburgh University Press.

 

Hudgins CV. (1935). "Visual aids in the correction of speech." Volta Review. 37:637-704.

 

Kempainen S. (1997). “Automatic speech recognition lets machines listen and comprehend.” http://www.ednmag.com/reg/1997/030397/05DF_02.htm.

 

Kent RD. Weismer G. Kent JF. Rosenbek JC. (1989). “Toward phonetic intelligibility testing in dysarthria.” Journal of Speech Hearing Disorders. 54:482-99.

 

Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80.

 

Lenneberg EH. (1967) “Biological Foundations of Language.” John Wiley and Sons Inc. New York.

 

Lysyk M. Smith C. (1998). “Voice recognition and dysarthria: A case study.” In proceedings of the 8th Biennial Conference of the International Society of Augmentative and Alternative Communication. Dublin:476-77.

 

Markowitz JA. (1996). “Using speech recognition.” Englewood Cliffs NJ: Prentice-Hall.

 

Menendez-Pidal X. Polikoff JB. Peters SM. Leonzio JE. Bunnell HT. (1996). “The Nemours database of dysarthric speech.” Applied Science and Engineering Laboratories (ASEL). AI DuPont Institute. PO Box 269. Wilmington. DE 19899. USA. http://searchpdf.adobe.com/proxies/1/54/58/40.html.

 

Meskimen NC. “Voice recognition solutions for challenged speaker.” Proc. “Technology and Persons with Disabilities.” Conference, California State University Northridge, March 1999.

 

Milloy NR. (1991). “Breakdown of speech: cause and remediation.” Chapman and Hall.

 

Newell AF and Nabavi CD. (1969). “VOTEM: the voice operated typewriter employing morse code.” The physics exhibition:655-57.

 

Newell AF. Arnott JL. Cairns AY. Ricketts IW. Gregor P. “Intelligent systems for speech and language impaired people: a portfolio of research.” In  (ed Edwards A.) (1995). “Extra-ordinary human-computer interaction: interfaces for users with disabilities.” Cambridge University Press:87-89.

 

Peterson HA. Marquardt TP. (1981). “Appraisal and diagnosis of speech and language disorders.” Englewood Cliffs. NJ. Prentice-Hall:59.

 

Pilon MA. McIntosh KW. Thaut MH. (1998). "Auditory vs visual timing cues as external rate control to enhance verbal intelligibility in mixed spastic-ataxic dysarthric speakers: a pilot study." Brain Injury. 12 (9):793-803.

 

Povel DJ. Arends N. (1991). "Visual speech apparatus: theoretical and practical aspects." Speech Communication. 10:59-80.

 

Rabiner L. Juang B-H. (1993). “Fundamentals of speech recognition.” Englewood Cliffs. NJ: Prentice Hall.

 

Raghavendra P. Rosengren E. Hunnicutt S. (1994). “An investigation of two speech recognition systems with dysarthric speech input.” In Proceedings of the 6th biennial conference of the International Society of Augmentative and Alternative Communication. Maastricht. The Netherlands:479-81.

 

Rose T. Galdo ED. (1999). “Designing speech-driven user interfaces.” Journal of Human-Computer Interaction. 11:127-28.

 

Rosen K. Yampolsky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Journal of Augmentative and Alternative Communication. 16:48-60.

 

Rosenbek JC. LaPointe LL. (1978). “The dysarthrias: description, diagnosis and treatment.” In “Clinical Management of Neurogenic Communication Disorders.” (Ed Johns D.) Little Brown & Co. Boston MA: 251-310.

 

Rosengren E. Raghavendra P. Hunnicutt S. (1995) “How does automatic speech recognition handle severely dysarthric speech?” The European context for assistive technology. Proceedings of the 2nd TIDE congress 26-28 April 1995.

 

Schmandt C. (1994). “Voice communication with computers.” New York. Van Nostrand Reinhold.

 

Schmitt DG. Tobias J. (1986). “Enhanced communication for a severely disabled dysarthric individual using speech recognition and speech synthesis.” Proceedings of the 9th annual conference on rehabilitation technology. RESNA 1986: Employing technology. 6:304-6.

 

Styba L. (2000). http://home.ica.net/~fred/anch10-1.htm.

 

Syrdal A. Bennett R. Greenspan S. (1995). “Applied speech technology.” Boca Raton. Fl: CRC Press.

 

Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). ”Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Journal of Augmentative and Alternative Communication. 14:51-56.

 

Vanderheiden PJ. (1985). “Writing aids.” In Webster JG. Cook AM. Tompkins WJ. Vanderheiden GC. (Eds). “Electronic aids for rehabilitation.” London: Chapman and Hall: 262-82.

 

Winston PH. (1993). “Artificial Intelligence.” Addison Wesley: 444.

 

Yamada Y. Javkin H. Youdelman K. (2000). “Assistive speech technology for persons with speech impairments.” Speech communication 30:179-87.

 

Yorkston K. Beukelman D. Traynor C. (1984). “Computerised assessment of intelligibility of dysarthric speech.” Tigard, OR: C.C. Publications.

 

Yorkston KM. Beukelman DR. (1980). “A clinician judged technique for quantifying dysarthric speech based on single-word intelligibility.” Journal of Communication Disorders. 13:15-31.

 

Yorkston KM. Beukelman DR. (1981). “Assessment of Intelligibility of Dysarthric Speech.” Austin, TX: Pro-ed.

 

Yorkston KM. Dowden PA. Beukelman DR. "Intelligibility measurement as a tool in the clinical management of dysarthric speakers." In Kent RD. (Ed). (1992). "Intelligibility in Speech Disorders." Philadelphia. PA: John Benjamins Co.: 245-85.

 


 [SJB1]Yorkston KM. Beukelman DR. (1981). “Assessment of Intelligibility of Dysarthric Speech.” Austin, TX: Pro-ed.

 [SJB2]Milloy NR. (1991). “Breakdown of speech: cause and remediation.” Chapman and Hall: 36

 [SJB3]Enderby P. Emerson L. (1995). “Does Speech and Language Therapy Work?” Singular Publications: 84

 [SJB4]http://home.ica.net/~fred/anch10-1.htm.

 [SJB5]Darley FL. Aropnson AE. Brown JR. (1975). “Motor Speech Disorders.” WB Saunders. Philadelphia. PA.

 [SJB6]Milloy NR. (1991). “Breakdown of speech: cause and remediation.” Chapman and Hall: 45

 [SJB7]Enderby P. (1983). “The Frenchay dysarthria assessment.” College-Hill Press. San Diego.

 [SJB8]Robertson SJ. (1982). “Dysarthria Profile.” Robertson. London.

 [SB9]Yorkston KM. Beukelman DR. (1981). “Assessment of intelligibility of dysarthric speech.” Austin. TX: Pro-ed.

 [SJB10]Beech JR. Harding L. Jone H. (1993). “Assessment inn speech and Language Therapy.” Routledge – NFER Assessment Library.

 [SB11]Yorkston KM. Beukelman DR. (1980). “A clinician judged technique for quantifying dysarthric speech based on single-word intelligibility.” Journal of Communication Disorders. 13:15-31.

 [SB12]Kent RD. Weismer G. Kent J. Rosenbek JC. (1989). "Toward phonetic intelligibility testing in dysarthria." Journal of Speech and Hearing Disorders. 54:482-99.

 [SB13]Peterson HA. Marquardt TP. (1981). “Appraisal and diagnosis of speech and language disorders.” Englewood Cliffs. NJ. Prentice-Hall:59.

 [SB14]Yorkston KM. Dowden PA. Beukleman DR. "Intelligibility measurement as a tool in the clinical management of dysarthric speakers." In Kent RD. (1992). "Intelligibility in speech disorders." Philadelphia. PA: John Nejamins Co.:245-85.

 [SB15]Pilon MA. McIntosh KW. Thaut MH. (1998). "Auditory vs visual timing cues as external rate control to enhance verbal intelligibility in mixed spastic-ataxic dysarthric speakers: a pilot study." Brain Injury. 12 (9):793-803.

 [SB16]Enderby P. (1983). “The Frenchay dysarthria assessment.” College-Hill Press. San Diego.

 [SB17]Kent RD. Weismer G. Kent J. Rosenbek JC. (1989). "Toward phonetic intelligibility testing in dysarthria." Journal of Speech and Hearing Disorders. 54:482-99.

 [SB18]Kent RD. Weismer G. Kent J. Rosenbek JC. (1989). "Toward phonetic intelligibility testing in dysarthria." Journal of Speech and Hearing Disorders. 54:482-99.

 [SJB19]Rosenbek JC. LaPointe LL. (1978). “The dysarthrias: description, diagnosis and treatment, in Clinical Management of Neurogenic Communication disorders. (ed D.Johs). Little Brown & Co. Boston MA: 251-310

 [SJB20]Milloy NR. (1991). “Breakdown of speech: cause and remediation.” Chapman and Hall: 46

 [SB21]Kent RD. Weismer G. Kent J. Rosenbek JC. (1989). "Toward phonetic intelligibility testing in dysarthria." Journal of Speech and Hearing Disorders. 54:482-99.

 [SB22]Frearson B. (1985). "A comparison of the A.I.D.S. sentence list and spontaneous speech intelligibility scores for dysarthric speech." Australian Journal of Human Communication Disorders. 1391):5-21.

 [SJB23]Beech JR. Harding L. Jone H. (1993). “Assessment inn speech and Language Therapy.” Routledge – NFER Assessment Library: 214-16

 [SJB24]Baken JJ. (1987). “Clinical Measurement of Speech and Voice.” College-Hill Press, Boston MA: 68

 [SJB25]Gerratt BR. Till JA. Rosenbek JC. et al. “Use and perceived value of perceptual and instrumental measures in dysarthria management.” In dysarthria and Apraxia of Speech: Perspectives on Management. (eds Moore CA. Yorkston KM. Beukelman DR.) Brookes PH. Baltimore: 77-88.

 [SJB26]Cohen PR. (1984). “The pragmatics of referring and the modality of communication.” Computational Linguistics. 10:97-146

 [SJB27]Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Journal of Augmentative and Alternative Communication.  11:165-174

 [SJB28]Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80

 [SJB29]Vanderheiden PJ. (1985). “Writing aids.” In Webster JG. Cook AM. Tompkins WJ. Vanderheiden GC. (eds) “Electronic aids for rehabilitation.” London: Chapman and Hall:262-82

 [SJB30]Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Journal of Augmentative and Alternative Communication.  11:165-174

 [SJB31]Newell AF. Nabavi CD. (1969). “VOTEM: the voice operated typewriter employing morse code.” Journal of Scientific Instruments (Journal of Physics E) 2.2:655-57

 [SJB32]Blyth B. “Developing a speech recognition application for survey research.” Chapter 10. In “Survey Measurements and Process Quality.” Eds Lyberg L Biemer P. Collins M. De-Leeuw E. Dippo C. Scwanz N. Trewin D.. John Wiley and Sons:249-66

 [SJB33]Fine B. (13th July 2000). “The Fine Line.” Computing.

 [SJB34]Fine B. (13th July 2000). “The Fine Line.” Computing.

 [SJB35]Rosen K. Yamplosky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Journal of Augmentative and alternative Communication. 16:48-60

 [SJB36]Markowitz JA. (1996). “Using speech recognition.” Englewood Cliffs NJ: Prentice-Hall.

 [SJB37]Hung XD. Ariki Y. Lee KF. (1990). “Hidden Markov models for speech recognition.” Edinburgh: Edinburgh University Press.

 [SJB38]Kempainen S. “Automatic speech recognition lets machines listen and comprehend.”  (March 1997). http://www.ednmag.com/reg/1997/030397/05DF_02.htm

 [SJB39]Rabiner L. Juang B-H. (1993). “Fundamentals of speech recognition.” Englewood Cliffs. NJ: Prentice Hall.

 [SJB40]Winston PH. (1993). “Artificial Intelligence.” Addison Wesley:444

 [SJB41]Winston PH. (1993). “Artificial Intelligence.” Addison Wesley:575-598

 [SJB42]Winston PH. (1993). “Artificial Intelligence.” Addison Wesley:575-598

 [SJB43]Newell AF. Arnott JL. Cairns AY. Ricketts IW. Gregor P. “Intelligent systems for speech and language impaired people: a portfolio of research.” In  (ed Edwards A.) (1995). “Extra-ordinary human-computer interaction: interfaces for users with disabilities.” Cambridge University Press:87-89

 [SJB45]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:8

 [SJB47]Rosen K. Yamplosky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Journal of Augmentative and alternative Communication. 16:48-60

 [SJB48]Syrdal A. Bennett R. Greenspan S. (1995). “Applied speech technology.” Boca Raton. Fl: CRC Press

 [SJB49]Schmitt DG. Tobias J. (1986). “Enhanced communication for a severely disabled dysarthric individual using speech recognition and speech synthesis.” Proceedings of the 9th annual conference on rehabilitation technology. RESNA 1986: Employing technology. 6:304-6

 

Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Journal of Augmentative and Alternative Communication.  11:165-174

 

 [SJB50]Huang XD. Lee KF. (1991). “On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition.” The institute of electrical and electronics engineers. 2(1):877-80

 [SJB51]Schmandt C. (1994). “Voice communication with computers.” New York. Van Nostrand Reinhold.

 [SJB52]Rosen K. Yamplosky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Journal of Augmentative and alternative Communication. 16:48-60

 [SJB53]Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80

 [SJB54]Lysyk M. Smith C. (1998). “Voice recognition and dysarthria: A case study.” In proceedings of the 8th Biennial Conference of the International Society of Augmentative and Alternative Communication. Dublin:476-77

 [SJB55]Beukelman DR. Yorkston KM. (1979). “The relationship between information transfer and speech intelligibility of dysarthric speakers.” Journal of Communication disorders. 12:189-96

 [SJB56]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:8

 [SJB57]Schmitt DG. Tobias J. (1986). “Enhanced communication for a severely disabled dysarthric individual using speech recognition and speech synthesis.” Proceedings of the 9th annual conference on rehabilitation technology. RESNA 1986: Employing technology. 6:304-6

 [SJB58]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). ”Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Journal of Augmentative and Alternative Communication. 14:51-56

 [SJB59]Menendez-Pidal X. Polikoff JB. Peters SM. Leonzio JE. Bunnell HT. “The Nemours database of dysarthric speech.”

 [SJB60]Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Augmentative and Alternative Communication. 11:165-174

 [SJB61]Raghavendra P. Rosengren E. Hunnicutt S. (1994). “An investigation of two speech recognition systems with dysarthric speech input.” In Proceedings of the 6th Biennial Conference of the International Society of Augmentative and Alternative Communication. Maastricht, The Netherlands:479-81

 [SJB62]Rosengren E. Raghavendra P. Hunnicutt S. (1995). “How does automatic speech recognition handle severely dysarthric speech?” The European Context for Assistive Technology: Proceedings of the 2nd TIDE Congress, 26-28 April 1995.

 [SJB63]Deller JR. Liu MS. Ferrier LJ. Robichaud P. (1993). “The Whitaker database of dysarthric (cerebral palsy) speech.” Journal of the Acoustical Society of America. 93(6):3516-3518

Ferrier LJ. Jarrell N. Carpenter T. Shane HC. (1992). “A case study of a dysarthric speaker using the DragonDictate speech recognition system.” Journal of Computer Users in Speech and Hearing. 8(1-2):33-52

 [SJB64]Coleman CL. Meyers LS. (1991). “Computer recognition of the speech of adults with cerebral palsy and dysarthria.” Augmentative and Alternative Communication. 7:34-42

 [SJB65]Rosen K. Yampolsky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Augmentative and Alternative Communication. 16:48-60

 [SJB66]Meskimen NC. (1999). “Voice recognition solutions for challenged speakers.” Proceedings of the “Technology and Persons with Disabilities” Conference, California State University Northridge, March 1999.

 [SJB67]Rose T. Galdo ED. (1999). “Designing speech-driven user interfaces.” Journal of Human-Computer Interaction. 11:127-28

 [SJB68]Ferrier LJ. Jarrell N. Carpenter T. Shane HC. (1992). “A case study of a dysarthric speaker using the DragonDictate speech recognition system.” Journal of Computer Users in Speech and Hearing. 8(1-2):33-52

 [SJB69]Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Augmentative and Alternative Communication. 11:165-174

 [SJB70]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB71]Yorkston K. Beukelman D. Traynor C. (1984). “Computerised assessment of intelligibility of dysarthric speech.” Tigard, OR: C.C. Publications.

 [SJB72]Ferrier LJ. Jarrell N. Carpenter T. Shane HC. (1992). “A case study of a dysarthric speaker using the DragonDictate speech recognition system.” Journal of Computer Users in Speech and Hearing. 8(1-2):33-52

 [SJB73]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB74]Doyle PC. Leeper HA. Kotler A. Thomas-Stonell N. O’Neill C. Dylke M. Rolls K. (1997). “Dysarthric speech: A comparison of computerized speech recognition and listener intelligibility.” Journal of Rehabilitation Research and Development. 34(3):309-316

 [SJB75]Kent RD. Weismer G. Kent JF. Rosenbek JC. (1989). “Toward phonetic intelligibility testing in dysarthria.” Journal of Speech and Hearing Disorders. 54:482-99

 [SJB76]Fairbanks G. (1960). “Voice and articulation drillbook.” (2nd Ed). New York: Harper and Brothers.

 [SJB77]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB78]Doyle PC. Leeper HA. Kotler A. Thomas-Stonell N. O’Neill C. Dylke M. Rolls K. (1997). “Dysarthric speech: A comparison of computerized speech recognition and listener intelligibility.” Journal of Rehabilitation Research and Development. 34(3):309-316

 [SJB79]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB80]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB81]Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80

 [SJB82]Lenneberg EH. (1967). “Biological Foundations of Language.” New York: John Wiley and Sons.

 [SJB83]Milloy NR. (1991). “Breakdown of speech: cause and remediation.” Chapman and Hall.

 [SB84]Coventry KR. Clibbens J. Cooper M. Rood B. (1997). “Visual speech aids: a British survey of use and evaluation by speech and language therapists.” European Journal of Disorders of Communication. 32(3):203-17.

 [SJB85]Yamada Y. Javkin H. Youdelman K. (2000). “Assistive speech technology for persons with speech impairments.” Speech Communication. 30:179-87

 [SJB86]Yamada Y. Javkin H. Youdelman K. (2000). “Assistive speech technology for persons with speech impairments.” Speech Communication. 30:179-87

 [SJB87]Hatzis A. (1999). “Optical Logo-Therapy (OLT): Computer-Based Audio-Visual Feedback Using Interactive Visual Displays for Speech Training.” PhD thesis, University of Sheffield.

 [SJB90]Aftonomos LB. Steele RD. Wertz RT. (1997). “Promoting recovery in chronic aphasia with an interactive technology.” Archives of Physical Medicine and Rehabilitation. 78(8):841-6. http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=9344303&form=6&db=m&Dopt=b

 [SJB91]Yamada Y. Javkin H. Youdelman K. (2000). “Assistive speech technology for persons with speech impairments.” Speech Communication. 30:179-87

 [SJB92]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:8

 [SB93]Povel DJ. Arends N. (1991). “Visual speech apparatus: theoretical and practical aspects.” Speech Communication. 10:59-80.

 [SB94]Hudgins CV. (1935). "Visual aids in the correction of speech." Volta Review. 37:637-704.

 [SB95]Ball V. (1991). “Computer-based tools for assessment and remediation of speech.” British Journal of Disorders of Communication. 26:95-113. Cited in Coventry KR. Clibbens J. Cooper M. Rood B. (1997). “Visual speech aids: a British survey of use and evaluation by speech and language therapists.” European Journal of Disorders of Communication. 32(3):203-17.

 [SB96]Coventry KR. Clibbens J. Cooper M. Rood B. (1997). “Visual speech aids: a British survey of use and evaluation by speech and language therapists.” European Journal of Disorders of Communication. 32(3):203-17.

 [SJB97]Deller JR. Liu MS. Ferrier LJ. Robichaud P. (1993). “The Whitaker database of dysarthric (cerebral palsy) speech.” Journal of the Acoustical Society of America. 93(6):3516-3518

 [SJB98]Doddington GR. Schalk TB. (1981). “Speech recognition: turning theory to practice.” IEEE Spectrum. 18:26-32

 [SJB99]Menendez-Pidal X. Polikoff JB. Peters SM. Leonzio JE. Bunnell HT. “The Nemours database of dysarthric speech.” Applied Science and Engineering Laboratories (ASEL). AI DuPont Institute, PO Box 269, Wilmington, DE 19899, USA. http://searchpdf.adobe.com/proxies/1/54/58/40.html

 [SJB101]Thomas-Stonell N. Kotler A. Leeper HA. Doyle PC. (1998). “Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy.” Augmentative and Alternative Communication. 14:51-56

 [SJB102]Rosen K. Yampolsky S. (2000). “Automatic speech recognition and a review of its functioning with dysarthric speech.” Augmentative and Alternative Communication. 16:48-60

 [SJB103]Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80

 [SJB104]Ferrier LJ. Shane HC. Ballard HF. Carpenter T. Benoit A. (1995). “Dysarthric Speakers’ Intelligibility and Speech Characteristics in Relation to Computer Speech Recognition.” Augmentative and Alternative Communication. 11:165-174

 [SJB105]Kotler A. Thomas-Stonell N. (1997). “Effects of speech training on the accuracy of speech recognition for an individual with a speech impairment.” Augmentative and Alternative Communication. 13:71-80

 [SJB106]Kempainen S. (March 1997). “Automatic speech recognition lets machines listen and comprehend.” http://www.ednmag.com/reg/1997/030397/05DF_02.htm

 [SJB107]Bowes DR. (1999). “Getting it Right and Making it Work! Selecting the Right Speech Input Writing Software for Users with Special Needs.” 14th Annual International Conference, “Technology and Persons with Disabilities.” Los Angeles. March 15-20. http://www.dinf.ch/csun_99/session1014.html

 [SJB110]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:32

 [SJB111]http://www.dinf.ch/csun_99/session1014.html

 [SJB112]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:32

 [SJB113]Donegan M. (2000). “Voice recognition technology in education: factors for success.” ACE:32