Journal of Northeastern University (Natural Science) ›› 2004, Vol. 25 ›› Issue (11): 1061-1064.

Manchu character recognition post-processing based on Bayes rules and substitution set confusion matrix

Li, Jing-Jiao (1); Zhao, Ji (1)   

  1. (1) Sch. of Info. Sci. and Eng., Northeastern Univ., Shenyang 110004, China; (2) Anshan Univ. of Sci. and Technol., Anshan 114002, China
  • Received:2013-06-24 Revised:2013-06-24 Online:2004-11-15 Published:2013-06-24

Keywords: Manchu, post-processing, candidate (substitution) word set, confusion matrix, Bayes rule, feature vector, phrase database

Abstract: Recognition information from a Manchu single-word recognition system is combined with Manchu phrase information to build a statistical database of Manchu phrases and candidate (substitution) word sets. Using Bayes' rule, the posterior probabilities of the candidate words are combined with the prior probabilities of phrases, and a sound, effective, and easily implemented data structure is built to detect and correct the rejected and misrecognized words in the recognizer's output, thereby improving the recognition rate of the Manchu recognition system. Experiments show that post-processing performance depends not only on the language model but also on accurate estimation of the posterior probabilities. In addition, the higher the recognition rate of the single-character recognition (SCR) system, the stronger the error-correcting ability of the post-processing.
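
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `best_phrase`, the probability floor for unseen phrases, the exhaustive search, and the placeholder tokens `A1`, `A2`, `B1`, `B2` are all assumptions made for illustration. Each word position carries a candidate set with recognizer scores (standing in for the posterior probabilities), and a phrase table supplies prior probabilities; the phrase with the largest combined log-score is selected.

```python
import math
from itertools import product


def best_phrase(candidates, phrase_prior, floor=1e-6):
    """Pick the word sequence with the highest combined score.

    candidates   : list with one dict per word position, mapping each
                   candidate word to a recognizer score in (0, 1].
    phrase_prior : dict mapping a tuple of words to a prior probability.
    floor        : assumed prior for phrases missing from the table.
    """
    best, best_score = None, -math.inf
    # Enumerate every combination of one candidate per position.
    for words in product(*(c.keys() for c in candidates)):
        prior = phrase_prior.get(words, floor)
        # Log of (phrase prior * product of per-word recognizer scores).
        score = math.log(prior) + sum(
            math.log(c[w]) for c, w in zip(candidates, words)
        )
        if score > best_score:
            best, best_score = words, score
    return best


# Hypothetical data: the recognizer prefers "A1" in the first position,
# but only the phrase ("A2", "B1") is common in the phrase table.
candidates = [{"A1": 0.6, "A2": 0.4}, {"B1": 0.9, "B2": 0.1}]
phrase_prior = {("A2", "B1"): 0.05, ("A1", "B1"): 0.001}
print(best_phrase(candidates, phrase_prior))
```

In this toy case the phrase prior overrides the recognizer's first choice for the first word, which is the kind of correction the abstract attributes to post-processing. Exhaustive enumeration is only practical for short phrases and small candidate sets.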
