
Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria


David A. Moses et al. • 2021.07.15

AI-based decoding of speech signals helps a paralyzed person with anarthria regain spoken communication

 

罗思琦‡, 吴元波†, 黄鑫‡*

†Department of Neurology, The First Affiliated Hospital of the University of Science and Technology of China; ‡Research Institute, Anhui iFLYTEK Medical Information Technology Co., Ltd.

*Corresponding author

 

Patients with anarthria and paralysis have sustained damage to the language-related regions of the brain from brain disease, physical trauma, or other causes, leaving speech function impaired: they can produce only extremely limited sounds or unintelligible utterances. Common causes of anarthria include stroke, traumatic brain injury, brain tumors, and intracranial inflammatory disease, with stroke the most common (post-stroke anarthria has been reported in 20%-40% of the stroke population [1]). Post-stroke anarthria mostly results from lesions in branches of the middle or posterior cerebral artery; right-handed patients typically also have right-sided hemiplegia, so they not only cannot speak but also struggle to communicate by writing or typing. This has substantial negative consequences for patients and caregivers alike, affecting quality of life, mental health, and long-term cognitive function.


Abstract


Background

Technology to restore the ability to communicate in paralyzed persons who cannot speak has the potential to improve autonomy and quality of life. An approach that decodes words and sentences directly from the cerebral cortical activity of such patients may represent an advance over existing methods for assisted communication.

 

Methods

We implanted a subdural, high-density, multielectrode array over the area of the sensorimotor cortex that controls speech in a person with anarthria (the loss of the ability to articulate speech) and spastic quadriparesis caused by a brain-stem stroke. Over the course of 48 sessions, we recorded 22 hours of cortical activity while the participant attempted to say individual words from a vocabulary set of 50 words. We used deep-learning algorithms to create computational models for the detection and classification of words from patterns in the recorded cortical activity. We applied these computational models, as well as a natural-language model that yielded next-word probabilities given the preceding words in a sequence, to decode full sentences as the participant attempted to say them.
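The sentence-decoding approach described above (per-attempt word probabilities from a neural classifier, combined with next-word probabilities from a language model) can be sketched as a Viterbi search over word sequences. The vocabulary, probabilities, and array shapes below are toy values for illustration only, not the study's actual 50-word vocabulary or trained models:

```python
import numpy as np

# Hypothetical 3-word vocabulary, for illustration only.
VOCAB = ["hello", "i", "am"]

def viterbi_decode(emission_probs, bigram_probs):
    """Find the most likely word sequence given per-attempt classifier
    probabilities (emission_probs, shape [T, V]) and a bigram language
    model (bigram_probs[i, j] = P(word_j | word_i))."""
    T, V = emission_probs.shape
    log_em = np.log(emission_probs + 1e-12)
    log_lm = np.log(bigram_probs + 1e-12)
    # score[v] = best log-probability of any path ending in word v
    score = log_em[0].copy()
    back = np.zeros((T, V), dtype=int)
    for t in range(1, T):
        # cand[i, j] = score of extending a path ending in word i with word j
        cand = score[:, None] + log_lm + log_em[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # trace the best path backwards through the backpointers
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [VOCAB[i] for i in reversed(path)]
```

With emissions where the classifier is tied between "i" and "am" at the second attempt, the bigram prior P("i" | "hello") breaks the tie, illustrating how the language model corrects ambiguous classifier output.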

 

Results

We decoded sentences from the participant's cortical activity in real time at a median rate of 15.2 words per minute, with a median word error rate of 25.6%. In post hoc analyses, we detected 98% of the attempts by the participant to produce individual words, and we classified words with 47.1% accuracy using cortical signals that were stable throughout the 81-week study period.
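The word error rate reported above is the standard speech-recognition metric: the word-level edit distance (substitutions + insertions + deletions) between the decoded sentence and the intended sentence, divided by the number of words in the intended sentence. A minimal generic sketch (not the study's evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate between a non-empty reference sentence and a
    hypothesis sentence, via Levenshtein (edit) distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution/match
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, decoding "i am good" when the intended sentence was "i am very good" counts one deletion out of four reference words, a word error rate of 0.25.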

 

Conclusions

In a person with anarthria and spastic quadriparesis caused by a brain-stem stroke, we decoded words and sentences directly from cortical activity during attempted speech with the use of deep-learning models and a natural-language model. (Funded by Facebook and others; ClinicalTrials.gov number, NCT03698149.)





Author Information

David A. Moses, Ph.D., Sean L. Metzger, M.S., Jessie R. Liu, B.S., Gopala K. Anumanchipalli, Ph.D., Joseph G. Makin, Ph.D., Pengfei F. Sun, Ph.D., Josh Chartier, Ph.D., Maximilian E. Dougherty, B.A., Patricia M. Liu, M.A., Gary M. Abrams, M.D., Adelyn Tu-Chan, D.O., Karunesh Ganguly, M.D., Ph.D., and Edward F. Chang, M.D.
From the Department of Neurological Surgery (D.A.M., S.L.M., J.R.L., G.K.A., J.G.M., P.F.S., J.C., M.E.D., E.F.C.), the Weill Institute for Neuroscience (D.A.M., S.L.M., J.R.L., G.K.A., J.G.M., P.F.S., J.C., K.G., E.F.C.), and the Departments of Rehabilitation Services (P.M.L.) and Neurology (G.M.A., A.T.-C., K.G.), University of California, San Francisco (UCSF), San Francisco, and the Graduate Program in Bioengineering, University of California, Berkeley–UCSF, Berkeley (S.L.M., J.R.L., E.F.C.). Address reprint requests to Dr. Chang at edward.chang@ucsf.edu.

 

References

1. Beukelman DR, Fager S, Ball L, Dietz A. AAC for adults with acquired neurological conditions: a review. Augment Altern Commun 2007;23:230-242.

2. Nip I, Roth CR. Anarthria. In: Kreutzer J, DeLuca J, Caplan B, eds. Encyclopedia of clinical neuropsychology. 2nd ed. New York: Springer International Publishing, 2017.

3. Felgoise SH, Zaccheo V, Duff J, Simmons Z. Verbal communication impacts quality of life in patients with amyotrophic lateral sclerosis. Amyotroph Lateral Scler Frontotemporal Degener 2016;17:179-183.

4. Sellers EW, Ryan DB, Hauser CK. Noninvasive brain–computer interface enables communication after brainstem stroke. Sci Transl Med 2014;6(257):257re7.

5. Vansteensel MJ, Pels EGM, Bleichner MG, et al. Fully implanted brain–computer interface in a locked-in patient with ALS. N Engl J Med 2016;375:2060-2066.

6. Pandarinath C, Nuyujukian P, Blabe CH, et al. High performance communication by people with paralysis using an intracortical brain–computer interface. Elife 2017;6:e18554.

7. Brumberg JS, Pitt KM, Mantie-Kozlowski A, Burnison JD. Brain–computer interfaces for augmentative and alternative communication: a tutorial. Am J Speech Lang Pathol 2018;27:1-12.

8. Linse K, Aust E, Joos M, Hermann A. Communication matters — pitfalls and promise of high-tech communication devices in palliative care of severely physically disabled patients with amyotrophic lateral sclerosis. Front Neurol 2018;9:603.

9. Bouchard KE, Mesgarani N, Johnson K, Chang EF. Functional organization of human sensorimotor cortex for speech articulation. Nature 2013;495:327-332.

10. Lotte F, Brumberg JS, Brunner P, et al. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci 2015;9:97.

11. Guenther FH, Hickok G. Neural models of motor speech control. In: Hickok G, Small S, eds. Neurobiology of language. Cambridge, MA: Academic Press, 2015:725-740.

12. Mugler EM, Tate MC, Livescu K, Templer JW, Goldrick MA, Slutzky MW. Differential representation of articulatory gestures and phonemes in precentral and inferior frontal gyri. J Neurosci 2018;38:9803-9813.

13. Chartier J, Anumanchipalli GK, Johnson K, Chang EF. Encoding of articulatory kinematic trajectories in human speech sensorimotor cortex. Neuron 2018;98(5):1042-1054.e4.

14. Salari E, Freudenburg ZV, Branco MP, Aarnoutse EJ, Vansteensel MJ, Ramsey NF. Classification of articulator movements and movement direction from sensorimotor cortex activity. Sci Rep 2019;9:14165.

15. Herff C, Heger D, de Pesters A, et al. Brain-to-text: decoding spoken phrases from phone representations in the brain. Front Neurosci 2015;9:217.

16. Angrick M, Herff C, Mugler E, et al. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019;16:036019.

17. Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature 2019;568:493-498.

18. Moses DA, Leonard MK, Makin JG, Chang EF. Real-time decoding of question-and-answer speech dialogue using human cortical activity. Nat Commun 2019;10:3096.

19. Makin JG, Moses DA, Chang EF. Machine translation of cortical activity to text with an encoder-decoder framework. Nat Neurosci 2020;23:575-582.

20. Martin S, Iturrate I, Millán JDR, Knight RT, Pasley BN. Decoding inner speech using electrocorticography: progress and challenges toward a speech prosthesis. Front Neurosci 2018;12:422.

21. Guenther FH, Brumberg JS, Wright EJ, et al. A wireless brain–machine interface for real-time speech synthesis. PLoS One 2009;4(12):e8218.

22. Brumberg JS, Wright EJ, Andreasen DS, Guenther FH, Kennedy PR. Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Front Neurosci 2011;5:65.

23. Moses DA, Leonard MK, Chang EF. Real-time classification of auditory sentences using evoked cortical activity in humans. J Neural Eng 2018;15:036005.

24. Kneser R, Ney H. Improved backing-off for M-gram language modeling. In: Conference proceedings: 1995 International Conference on Acoustics, Speech, and Signal Processing. Vol. 1. New York: Institute of Electrical and Electronics Engineers, 1995:181-184.

25. Chen SF, Goodman J. An empirical study of smoothing techniques for language modeling. Comput Speech Lang 1999;13:359-394.

26. Viterbi AJ. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 1967;13:260-269.

27. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. In: Bengio Y, LeCun Y, eds. Workshop at the International Conference on Learning Representations. Banff, AB, Canada: ICLR Workshop, 2014.

28. Kanas VG, Mporas I, Benz HL, Sgarbas KN, Bezerianos A, Crone NE. Real-time voice activity detection for ECoG-based speech brain machine interfaces. In: 19th International Conference on Digital Signal Processing: proceedings. New York: Institute of Electrical and Electronics Engineers, 2014:862-865.

29. Dash D, Ferrari P, Dutta S, Wang J. NeuroVAD: real-time voice activity detection from non-invasive neuromagnetic signals. Sensors (Basel) 2020;20:2248.

30. Sollich P, Krogh A. Learning with ensembles: how overfitting can be useful. In: Touretzky DS, Mozer MC, Hasselmo ME, eds. Advances in neural information processing systems 8. Cambridge, MA: MIT Press, 1996:190-196.

31. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Bartlett P, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ, eds. Advances in neural information processing systems 25. Red Hook, NY: Curran Associates, 2012:1097-1105.

32. Shoham S, Halgren E, Maynard EM, Normann RA. Motor-cortical activity in tetraplegics. Nature 2001;413:793.

33. Hochberg LR, Serruya MD, Friehs GM, et al. Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 2006;442:164-171.

34. Watanabe S, Delcroix M, Metze F, Hershey JR, eds. New era for robust speech recognition: exploiting deep learning. Berlin: Springer-Verlag, 2017.

35. Wolpaw JR, Bedlack RS, Reda DJ, et al. Independent home use of a brain–computer interface by people with amyotrophic lateral sclerosis. Neurology 2018;91(3):e258-e267.

36. Silversmith DB, Abiri R, Hardy NF, et al. Plug-and-play control of a brain–computer interface through neural map stabilization. Nat Biotechnol 2021;39:326-335.

37. Chao ZC, Nagasaka Y, Fujii N. Long-term asynchronous decoding of arm motion using electrocorticographic signals in monkeys. Front Neuroeng 2010;3:3.

38. Rao VR, Leonard MK, Kleen JK, Lucas BA, Mirro EA, Chang EF. Chronic ambulatory electrocorticography from human speech cortex. Neuroimage 2017;153:273-282.

39. Pels EGM, Aarnoutse EJ, Leinders S, et al. Stability of a chronic implanted brain–computer interface in late-stage amyotrophic lateral sclerosis. Clin Neurophysiol 2019;130:1798-1803.
