Benchmarks

This page presents simple benchmarks of various Tesseract versions and options. The input image used for the tests comes from issue 263.

Results

Average duration per run, in seconds:

Build        tessdata_best   tessdata_fast   tessdata
305          -               -               2.4713
413noavx     37.6052         5.1589          10.1519
413avx       12.7300         2.9538          4.0860
501          6.1981          2.1241          2.9107
501ap        6.1369          2.1254          2.9221
501openmp    3.4590          1.9612          2.3554
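To make the table easier to compare across builds, the durations can be turned into speedup factors. The sketch below copies the values from the table; the `RESULTS` dictionary and the `speedup` helper are illustrative names, not part of the original benchmark. For example, it shows that 501 is roughly six times faster than 413noavx with tessdata_best.

```python
# Average seconds per run, copied from the results table (None = not measured).
RESULTS = {
    "305":       {"tessdata_best": None,    "tessdata_fast": None,   "tessdata": 2.4713},
    "413noavx":  {"tessdata_best": 37.6052, "tessdata_fast": 5.1589, "tessdata": 10.1519},
    "413avx":    {"tessdata_best": 12.7300, "tessdata_fast": 2.9538, "tessdata": 4.0860},
    "501":       {"tessdata_best": 6.1981,  "tessdata_fast": 2.1241, "tessdata": 2.9107},
    "501ap":     {"tessdata_best": 6.1369,  "tessdata_fast": 2.1254, "tessdata": 2.9221},
    "501openmp": {"tessdata_best": 3.4590,  "tessdata_fast": 1.9612, "tessdata": 2.3554},
}


def speedup(baseline, build, model):
    """Return how many times faster `build` is than `baseline` for `model`.

    Returns None when either measurement is missing from the table.
    """
    base = RESULTS[baseline][model]
    new = RESULTS[build][model]
    if base is None or new is None:
        return None
    return base / new


# Compare 5.0.1 against 4.1.3 without AVX for each model set.
for model in ("tessdata_best", "tessdata_fast", "tessdata"):
    print(model, round(speedup("413noavx", "501", model), 2))
```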

Test environment details

The tests were run with the following Python code:

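import os
import timeit
# Imported up front so that loading the module is not counted in the first timed run.
import pytesseract

# Tesseract executable under test (here, the OpenMP build).
tess_exe = r"msvc.v5.openmp\tesseract.exe"
test_image = r"i263_speed.jpg"
# Model set used for this run.
os.environ['TESSDATA_PREFIX'] = r"tessdata_best\tessdata"

# Statement executed on every timed run; the two placeholders are filled in below.
code_to_test = """
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"{}"
pytesseract.pytesseract.image_to_string(r"{}", lang = 'eng')
"""

runs = 15
# timeit returns the total time in seconds; divide by the run count for the average.
elapsed_time = timeit.timeit(code_to_test.format(tess_exe, test_image), number=runs) / runs
print("\nDuration:", elapsed_time)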
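The script above measures one model set at a time. A minimal sketch of how the three columns of the results table could be collected in one pass is shown below. The `benchmark_models` helper and the directory names passed to it are assumptions for illustration; only `tessdata_best\tessdata` appears in the original script.

```python
import os
import timeit


def benchmark_models(run_once, tessdata_dirs, runs=15):
    """Time `run_once` for each TESSDATA_PREFIX in `tessdata_dirs`.

    `run_once` is any zero-argument callable that performs one OCR run.
    Returns a dict mapping each name to the average seconds per run.
    The previous TESSDATA_PREFIX value is restored afterwards.
    """
    previous = os.environ.get("TESSDATA_PREFIX")
    results = {}
    try:
        for name, path in tessdata_dirs.items():
            os.environ["TESSDATA_PREFIX"] = path
            results[name] = timeit.timeit(run_once, number=runs) / runs
    finally:
        if previous is None:
            os.environ.pop("TESSDATA_PREFIX", None)
        else:
            os.environ["TESSDATA_PREFIX"] = previous
    return results
```

With pytesseract installed, `run_once` could be `lambda: pytesseract.image_to_string(test_image, lang="eng")`, and `tessdata_dirs` a dictionary such as `{"tessdata_best": r"tessdata_best\tessdata"}` extended with the other model directories on the test machine.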

Tesseract build information

The information for each build is the output of tesseract -v.
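When comparing builds, it can help to extract the version and the "Found ..." lines from this output programmatically. The following is a small sketch; `parse_version_output` and `SIMD_FEATURES` are hypothetical names, and the parsing only assumes the line layout seen in the listings on this page.

```python
# SIMD feature names as they appear in the "Found ..." lines on this page.
SIMD_FEATURES = {"AVX2", "AVX", "FMA", "SSE", "SSE4.1"}


def parse_version_output(text):
    """Split `tesseract -v` output into (version, list of 'Found' entries)."""
    version = None
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("tesseract "):
            version = line.split(" ", 1)[1]
        elif line.startswith("Found "):
            found.append(line[len("Found "):])
    return version, found


def simd_support(found):
    """Return the SIMD features reported in the 'Found' entries."""
    return [entry for entry in found if entry in SIMD_FEATURES]


sample = """tesseract 5.0.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 2019
"""
version, found = parse_version_output(sample)
print(version, simd_support(found))
```

The text can be captured with `subprocess.run([tess_exe, "-v"], capture_output=True, text=True)`. Reading both `stdout` and `stderr` is a safe choice, since the stream used for the version text may differ between versions.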

3.05

305

This build uses the legacy engine.

tesseract 3.05.02
 leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0

4.1

413noavx

Build without AVX2/AVX/SSE4 support

tesseract 4.1.3
 leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9

413avx

Build with AVX2/AVX/SSE4 support

tesseract 4.1.3-1-ge9986
 leptonica-1.83.0 (Jan 26 2022, 19:15:03) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9

5.0

501

Build with AVX2 support

tesseract 5.0.1
 leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV

501ap

Built with the MSVC auto-parallelizer (/Qpar) and fast floating-point mode (/fp:fast), using the command: cmake -E env CXXFLAGS="/Qpar /fp:fast" cmake ..

tesseract 5.0.1
 leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV

501openmp

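The OpenMP build is known to waste a lot of CPU time. Following several user reports, OpenMP is disabled by default in 5.0.1 and later. For other versions (>= 4.x), setting the environment variable OMP_THREAD_LIMIT=1 is recommended. Input from OpenMP experts is welcome.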

tesseract 5.0.1
 leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 2019
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
 Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV
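The OMP_THREAD_LIMIT=1 recommendation for OpenMP builds can be applied per process instead of system-wide. A minimal sketch, assuming the standard `tesseract imagename outputbase -l lang` command line; `single_thread_env` and `run_tesseract` are hypothetical helper names.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


def run_tesseract(tess_exe, image, output_base, lang="eng"):
    """Run one OCR job with OMP_THREAD_LIMIT=1 set for that process only."""
    cmd = [tess_exe, image, output_base, "-l", lang]
    return subprocess.run(cmd, env=single_thread_env(), check=True)
```

Because the variable is set on a copy of the environment, other processes started from the same script keep their original threading behavior.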