.NET | Recognizing Text in Images with PaddleOCR on .NET 6

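I recently came across an open-source OCR project, PaddleOCR. It supports two ways of recognizing text: through an offline-deployed Hub Serving service, and through a local package.
Runtime environment: Windows 10
Development tool: Visual Studio 2022
.NET version: .NET 6
Packages to install: PaddleOCR, version 0.0.5, and PaddleOCRUtf8, version 0.0.5
At first I used the PaddleOCR package. English text and digits were recognized successfully, with high accuracy. Chinese text, however, came back garbled, even though the same recognition models were used. Switching to the PaddleOCRUtf8 package fixed the garbled Chinese, as shown in the image below: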
Image to recognize:
[image]
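
The article does not say why the PaddleOCR package returns garbled Chinese. Going only by the name of the package that fixes it, one plausible cause (an assumption, not something confirmed here) is that UTF-8 bytes are decoded with a different encoding somewhere along the way. The sketch below illustrates that general effect using standard .NET APIs only. `MojibakeDemo`, `Garble`, and `Repair` are hypothetical names for this illustration and are not part of either package.

```csharp
using System;
using System.Text;

// Illustration only: shows how UTF-8 text turns into garbage when its bytes
// are decoded with the wrong encoding. This is NOT a diagnosis of the
// PaddleOCR package; the cause there is an assumption.
public static class MojibakeDemo
{
    // Encode as UTF-8, then (wrongly) decode the bytes as Latin-1.
    public static string Garble(string text) =>
        Encoding.Latin1.GetString(Encoding.UTF8.GetBytes(text));

    // Undo Garble: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Encoding.Latin1.GetBytes(garbled));
}
```

ASCII letters and digits use the same single byte in both encodings, so they survive such a mix-up unchanged. That would be consistent with English and digits being recognized correctly while Chinese is not.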

The basic recognition code is as follows:

using System.Text;
using System.Text.Json;

namespace JuCheap_Demo_OCR
{
    internal class PaddleOCRService
    {
        // Base directory of the running application
        private readonly static string _basePath = AppDomain.CurrentDomain.BaseDirectory;
        // Path of the image to recognize
        private readonly static string _imagePath = Path.Combine(_basePath, "id_card.jpg");
        // Detection, recognition, and classification models, plus the character dictionary
        private readonly string _detPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_det_infer");
        private readonly string _recPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_server_v2.0_rec_infer");
        private readonly string _clsPath = Path.Combine(_basePath, "PaddleModel", "ch_ppocr_mobile_v2.0_cls_infer");
        private readonly string _charListFileListPath = Path.Combine(_basePath, "PaddleModel", "chinese_zh_dict.txt");
        // Image content as Base64, sent to the Hub Serving endpoint
        private readonly string _fileBase64 = Convert.ToBase64String(File.ReadAllBytes(_imagePath), Base64FormattingOptions.None);

        /// <summary>
        /// Local recognition with the PaddleOCR package
        /// </summary>
        public async Task RecognizeByPaddleOCR()
        {
            WriteOneLine();
            // Local package: English and digits work, Chinese comes back garbled
            PaddleOCR.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var result = await PaddleOCR.PaddleOCR.Recognize(_imagePath);
            foreach (var box in result.Boxes)
            {
                Console.WriteLine($"PaddleOCR local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Local recognition with the PaddleOCRUtf8 package
        /// </summary>
        public async Task RecognizeByPaddleOCRUtf8()
        {
            WriteOneLine();
            // Fixes the garbled Chinese output
            PaddleOCRUtf8.PaddleOCR.Initialize(_detPath, _recPath, _clsPath, _charListFileListPath, 4, true);
            var resultUtf8 = await PaddleOCRUtf8.PaddleOCR.Recognize(_imagePath);
            foreach (var box in resultUtf8.Boxes)
            {
                Console.WriteLine($"PaddleOCRUtf8 local package result={box.Text}, confidence={box.Score}");
            }
        }

        /// <summary>
        /// Recognition through a Hub Serving service set up with Python
        /// </summary>
        public async Task RecognizeByHubServing()
        {
            WriteOneLine();
            try
            {
                // Call the hub ocr_system endpoint
                using var client = new HttpClient();
                client.BaseAddress = new Uri("http://127.0.0.1:8866/");
                var postData = new { images = new string[] { _fileBase64 } };
                var content = new StringContent(JsonSerializer.Serialize(postData), Encoding.UTF8, "application/json");
                var response = await client.PostAsync("predict/ocr_system", content);
                var responseContent = await response.Content.ReadAsStringAsync();
                // NOTE: the generic type argument was lost in the published listing.
                // HubServingResponse is a placeholder name for a DTO whose Data property
                // is a list of lists of items exposing Text and Confidence.
                var responseResult = JsonSerializer.Deserialize<HubServingResponse>(responseContent);
                if (responseResult != null && responseResult.Data != null)
                {
                    foreach (var items in responseResult.Data)
                    {
                        foreach (var box in items)
                        {
                            Console.WriteLine($"HubServing result={box.Text}, confidence={box.Confidence}");
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Hub Serving recognition error: {ex.Message}");
            }
            WriteOneLine();
        }

        private void WriteOneLine()
        {
            Console.WriteLine("--------------------------------------------------------------------------------------------------");
        }
    }
}
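
The Hub Serving method deserializes the response into a type with a `Data` collection whose items expose `Text` and `Confidence`, but the listing does not show that type. A minimal sketch of what it could look like follows. The class names `HubServingResponse` and `HubServingBox` are hypothetical, and the JSON field names (`results`, `text`, `confidence`) are an assumption based on the response format described in the PaddleOCR Hub Serving documentation. Check them against the actual service response before relying on this.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO for one recognized text box.
// Assumed JSON fields: "text" and "confidence".
public class HubServingBox
{
    [JsonPropertyName("text")]
    public string Text { get; set; } = "";

    [JsonPropertyName("confidence")]
    public double Confidence { get; set; }
}

// Hypothetical DTO for the whole response.
// Assumed JSON field "results": one inner list of boxes per submitted image.
public class HubServingResponse
{
    [JsonPropertyName("results")]
    public List<List<HubServingBox>>? Data { get; set; }
}

public static class HubServingParser
{
    // Fields not declared on the DTOs are ignored by System.Text.Json by default.
    public static HubServingResponse? Parse(string json) =>
        JsonSerializer.Deserialize<HubServingResponse>(json);
}
```

With types like these, the nested `foreach` loops in `RecognizeByHubServing` iterate first over images and then over the text boxes found in each image.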

Recognition result:
[image]


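Source code:
https://gitee.com/jucheap/demo
Run the JuCheap-Demo-OCR project in that repository directly to see the result.
Summary: recognition with the local packages has some problems. For example, the text 【公民身份证】 was not recognized completely. I recommend setting up a Hub Serving service for recognition instead, as its accuracy was higher.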
