Install Emgu CV.
Emgu CV – Browse /emgucv/4.1.1 at SourceForge.net
1. Download tesseract-ocr
http://digi.bib.uni-mannheim.de/tesseract/
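2. Install
After installation, set the environment variables:
TESSDATA_PREFIX = X:\Program Files\Tesseract-OCR\
Add "X:\Tesseract-OCR" to the Path environment variable (optional; once set, it can be used for training).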
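To confirm from C# that the variable set above is visible to the process, a minimal sketch could look like the following. The class and method names are hypothetical, and it assumes the language files live in a "tessdata" subfolder of the TESSDATA_PREFIX value, as in the install path shown above.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Returns the tessdata directory implied by a TESSDATA_PREFIX value,
    // or null when the value is missing or blank.
    // Assumption: language files sit in a "tessdata" subfolder of the prefix.
    public static string ResolveTessdataDir(string prefix)
    {
        if (string.IsNullOrWhiteSpace(prefix)) return null;
        return Path.Combine(prefix, "tessdata");
    }

    static void Main()
    {
        string prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        string dir = ResolveTessdataDir(prefix);
        if (dir == null)
            Console.WriteLine("TESSDATA_PREFIX is not set.");
        else
            Console.WriteLine(dir + (Directory.Exists(dir) ? " exists." : " was not found."));
    }
}
```

A process started before the variable was changed will not see the new value, so restart Visual Studio or the console after editing environment variables.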
References:
Emgu.CV.OCR Unable to create ocr model using Path and language (jzdzhiyun's blog, CSDN)
Installing and using tesseract-ocr (褶皱的包子's blog, CSDN)
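3. Download additional language packs. You can also tick the language packs you want during installation. Place the downloaded files in the tessdata folder.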
https://github.com/tesseract-ocr/tessdata
GitHub – tesseract-ocr/tessdata_best: Best (most accurate) trained LSTM models.
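Since each language pack is a file named <lang>.traineddata, it can help to check that the packs you plan to use are actually present before creating the OCR engine. The following is a rough sketch only; the class name, method name, and the tessdata location used in Main are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LanguagePackCheck
{
    // Returns the languages from a "+"-joined string (e.g. "chi_sim+eng")
    // whose <lang>.traineddata file is absent from tessdataDir.
    public static List<string> MissingLanguages(string tessdataDir, string languages)
    {
        var missing = new List<string>();
        foreach (string lang in languages.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string file = Path.Combine(tessdataDir, lang + ".traineddata");
            if (!File.Exists(file)) missing.Add(lang);
        }
        return missing;
    }

    static void Main()
    {
        // Hypothetical location: a tessdata folder next to the executable.
        string tessdataDir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");
        foreach (string lang in MissingLanguages(tessdataDir, "chi_sim+eng"))
            Console.WriteLine("Missing language pack: " + lang + ".traineddata");
    }
}
```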
4. Usage
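using System;
using System.Windows.Forms;
using Emgu.CV;
using Emgu.CV.OCR;
using Emgu.CV.Structure;

private Tesseract _ocr; // The Tesseract OCR engine instance.

string path = Application.StartupPath + "//"; // Data path: the tessdata folder sits under the startup directory.
string language = ""; // The selected language(s).

// Work out the selected language: checkBox1 = English, checkBox2 = Chinese (chi_sim).
if (checkBox1.Checked && checkBox2.Checked)
{
    language = "chi_sim+eng";
}
else if (checkBox2.Checked)
{
    language = "chi_sim";
}
else
{
    // Nothing (or only English) selected: fall back to English and tick its box.
    language = "eng";
    checkBox1.Checked = true;
}

try
{
    // Language data: https://github.com/tesseract-ocr/tessdata
    // Alternatives noted by the author: Application.StartupPath + @"\tessdata" as the data path,
    // and OcrEngineMode.TesseractOnly or TesseractCubeCombined as the engine mode.
    // Instantiate Tesseract with the given parameters. If the path is empty,
    // the tessdata folder must be placed in the root of the Debug output directory.
    _ocr = new Tesseract(@"E:\Program Files\Tesseract-OCR\tessdata", language, OcrEngineMode.Default);
    _ocr.PageSegMode = PageSegMode.SingleBlock;

    // gray: the grayscale input image, prepared beforehand (not shown in this snippet).
    _ocr.SetImage(gray);
    int result = _ocr.Recognize();
    if (result != 0)
    {
        MessageBox.Show("Recognition failed!");
        return;
    }

    Tesseract.Character[] characters = _ocr.GetCharacters(); // Per-character recognition results.
    //Bgr drawColor = new Bgr(Color.Blue); // Blue drawing color.
    //foreach (Tesseract.Character c in characters) // Walk through each recognized character.
    //{
    //    image.Draw(c.Region, drawColor, 1); // Outline the detected region.
    //}
    //imageBox1.Image = image; // Show the image with the rectangles drawn on it.

    string text = _ocr.GetUTF8Text(); // The recognized text.
    richTextBox1.Text = text; // Display the recognized text.
}
catch (Exception ex)
{
    MessageBox.Show("Check that the language packs are in the run directory: " + ex.ToString());
}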
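The checkbox logic in the snippet above can be pulled out into a small pure function, which makes it easy to test without a form. This is only a sketch of one way to arrange it; the class name, method name, and parameter names are hypothetical, and the caller would still be responsible for ticking checkBox1 when the fallback to English applies.

```csharp
using System;

static class LanguageSelection
{
    // Mirrors the checkbox logic: both selected gives "chi_sim+eng",
    // Chinese only gives "chi_sim", anything else falls back to "eng".
    public static string BuildLanguage(bool english, bool chinese)
    {
        if (chinese) return english ? "chi_sim+eng" : "chi_sim";
        return "eng";
    }

    static void Main()
    {
        Console.WriteLine(BuildLanguage(true, true));   // chi_sim+eng
        Console.WriteLine(BuildLanguage(false, true));  // chi_sim
        Console.WriteLine(BuildLanguage(false, false)); // eng
    }
}
```

In the form, the call would then read `language = LanguageSelection.BuildLanguage(checkBox1.Checked, checkBox2.Checked);`.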
5. Result
Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)
public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)
Member of Emgu.CV.OCR.Tesseract
Summary:
Create a Tesseract OCR engine.
Parameters:
dataPath: The datapath must be the name of the parent directory of tessdata and must end in / . Any name after the last / will be stripped.
language: The language is (usually) an ISO 639-3 string or NULL will default to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change language, or just to reset the classifier. The language may be a string of the form [~]<lang>[+[~]<lang>]* indicating that multiple languages are to be loaded. Eg hin+eng will load Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign is available to override that. Eg if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages will impact both speed and accuracy, as there is more work to do to decide on the applicable language, and there is more chance of hallucinating incorrect words.
mode: OCR engine mode
whiteList: This can be used to specify a white list for OCR. e.g. specify "1234567890" to recognize digits only. Note that the white list currently seems to only work with OcrEngineMode.OEM_TESSERACT_ONLY
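To make the language-string format described above concrete, here is an illustrative parser that splits a string such as hin+~eng into its parts. It is my own sketch for reading the format, not code from Tesseract or Emgu CV, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class LanguageSpec
{
    // Splits a string such as "hin+~eng" into (language, overridden) pairs.
    // A leading '~' marks a language whose automatic loading is overridden.
    public static List<KeyValuePair<string, bool>> Parse(string spec)
    {
        var result = new List<KeyValuePair<string, bool>>();
        foreach (string part in spec.Split('+'))
        {
            bool overridden = part.Length > 0 && part[0] == '~';
            string name = overridden ? part.Substring(1) : part;
            if (name.Length == 0)
                throw new FormatException("Empty language name in: " + spec);
            result.Add(new KeyValuePair<string, bool>(name, overridden));
        }
        return result;
    }

    static void Main()
    {
        foreach (var entry in Parse("hin+~eng"))
            Console.WriteLine(entry.Key + (entry.Value ? " (not loaded)" : " (loaded)"));
    }
}
```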
Tesseract tesseract = new Tesseract();
tesseract.Init(path, lang, Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY); // path is the language data path, lang is the language
tesseract.SetVariable("tessedit_char_whitelist", "0123456789");
References:
C# OpenCV6 - License plate recognition (_iorilan's blog, CSDN)
Tesseract.GetText, Emgu.CV.OCR C# (CSharp) Code Examples – HotExamples
Getting started with OpenCVSharp: a guide (小康师兄's blog, CSDN)
http://code.google.com/p/opencvsharp/w/list