Notes on recognizing text in images with tesseract.js-offline

Post author: xfxia
Post published: September 10, 2023
Post category: Other

1. Overview

Tesseract.js is a JavaScript library that extracts words in almost any language from images.

Offline version: https://github.com/jeromewu/tesseract.js-offline
Electron version: https://github.com/jeromewu/tesseract.js-electron
Custom training data: https://github.com/jeromewu/tesseract.js-custom-traineddata
Chrome extension #1: https://github.com/jeromewu/tesseract.js-chrome-extension
Chrome extension #2: https://github.com/fxnoob/image-to-text
Firefox extension: https://github.com/gnonio/korporize
With Vue: https://github.com/jeromewu/tesseract.js-vue-app
With Angular: https://github.com/jeromewu/tesseract.js-angular-app
With React: https://github.com/jeromewu/tesseract.js-react-app
TypeScript: https://github.com/jeromewu/tesseract.js-typescript
Real-time video recognition: https://github.com/jeromewu/tesseract.js-video

2. Offline version

Clone the repository

git clone https://github.com/jeromewu/tesseract.js-offline

Install dependencies (use either command)

yarn install
npm install

Start the server

npm run start

Then visit http://127.0.0.1:3000/browser/index.html. Alternatively, skip starting the server and use "Open with Live Server" on the html file directly.

Run the English example directly with a script

const { createWorker } = require('tesseract.js');
const path = require('path');
// Language to recognize (one language at a time),
// for example 'chi_sim' or 'eng'
const language = 'eng';

// Read language data from the local lang-data directory
const worker = createWorker({
  langPath: path.join(__dirname, '..', 'lang-data'),
  logger: m => console.log(m),
});

(async () => {
  await worker.load();
  await worker.loadLanguage(language);
  await worker.initialize(language);
  const { data: { text } } = await worker.recognize(path.join(__dirname, '..', 'images/en', 'demo_eurotext.png'));
  console.log(text);
  await worker.terminate();
})();

node .\node\index.js
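
The logger option in the script prints every raw message object, which gets noisy. As a minimal sketch, assuming each message carries a status string and a progress fraction between 0 and 1 (verify this against the tesseract.js version you installed), a hypothetical formatProgress helper can reduce each message to one line:

```javascript
// Hypothetical helper: formats one logger message as "status: NN%".
// Assumes m.status is a string and m.progress is a number in [0, 1];
// a missing progress value is treated as 0.
function formatProgress(m) {
  const fraction = typeof m.progress === 'number' ? m.progress : 0;
  return `${m.status}: ${Math.round(fraction * 100)}%`;
}

// Possible replacement for the logger line in the script:
//   logger: m => console.log(formatProgress(m)),
console.log(formatProgress({ status: 'recognizing text', progress: 0.5 }));
```

Keeping the formatting in its own function leaves the worker setup unchanged and makes the output easy to adjust.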


Recognizing other languages

chi_sim is the language code for Chinese (Simplified).
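
Switching languages only changes the code passed to loadLanguage and initialize. The following is a minimal sketch, assuming the same worker API as the English script. recognizeWith is a hypothetical helper, and createWorker is passed in as an argument so the helper does not depend on how tesseract.js is installed:

```javascript
// Hypothetical helper: runs load -> loadLanguage -> initialize -> recognize
// for one language code and returns the recognized text.
// Assumes the worker API used by the English script (synchronous createWorker).
async function recognizeWith(createWorker, lang, imagePath, langPath) {
  const worker = createWorker({ langPath, logger: () => {} });
  try {
    await worker.load();
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data: { text } } = await worker.recognize(imagePath);
    return text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

With tesseract.js installed, it could be called as recognizeWith(require('tesseract.js').createWorker, 'chi_sim', imagePath, langPath), where imagePath points to your own image and langPath to the lang-data directory.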

Downloading training data for other languages

After downloading, place the file in lang-data; it is decompressed when it is used.
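
Before creating a worker, it can help to confirm that the data file is actually in place. This is a minimal sketch, assuming the files are stored gzipped as <lang>.traineddata.gz (an assumption about the naming; adjust it if your files differ). trainedDataPath and missingLanguages are hypothetical helper names:

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: language data is stored as <lang>.traineddata.gz in langDir.
function trainedDataPath(langDir, lang) {
  return path.join(langDir, `${lang}.traineddata.gz`);
}

// Returns the language codes whose data file is missing from langDir.
function missingLanguages(langDir, langs) {
  return langs.filter((lang) => !fs.existsSync(trainedDataPath(langDir, lang)));
}
```

For example, missingLanguages(path.join(__dirname, '..', 'lang-data'), ['eng', 'chi_sim']) returns an empty array when both files are present.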

Original link: https://blog.csdn.net/weixin_46037781/article/details/125799183
