Benchmarks
This page collects simple benchmarks of various Tesseract versions and options. The input image used for the tests comes from issue 263.
Results
Build | tessdata_best (s) | tessdata_fast (s) | tessdata (s) |
---|---|---|---|
305 | - | - | 2.4713 |
413noavx | 37.6052 | 5.1589 | 10.1519 |
413avx | 12.7300 | 2.9538 | 4.0860 |
501 | 6.1981 | 2.1241 | 2.9107 |
501ap | 6.1369 | 2.1254 | 2.9221 |
501openmp | 3.4590 | 1.9612 | 2.3554 |
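To make the rows easier to compare, the following sketch computes how many times faster each build is than 413noavx for the same model set. The timings are copied from the table above; the `RESULTS` and `speedup` names are only illustrative, and 305 is left out because it has a single measurement.

```python
# Timings in seconds per run, copied from the results table above.
RESULTS = {
    "413noavx": {"tessdata_best": 37.6052, "tessdata_fast": 5.1589, "tessdata": 10.1519},
    "413avx": {"tessdata_best": 12.7300, "tessdata_fast": 2.9538, "tessdata": 4.0860},
    "501": {"tessdata_best": 6.1981, "tessdata_fast": 2.1241, "tessdata": 2.9107},
    "501ap": {"tessdata_best": 6.1369, "tessdata_fast": 2.1254, "tessdata": 2.9221},
    "501openmp": {"tessdata_best": 3.4590, "tessdata_fast": 1.9612, "tessdata": 2.3554},
}


def speedup(baseline, build, model):
    """Return how many times faster `build` is than `baseline` for `model`."""
    return RESULTS[baseline][model] / RESULTS[build][model]


if __name__ == "__main__":
    for build, timings in RESULTS.items():
        row = ", ".join(
            f"{model}: {speedup('413noavx', build, model):.2f}x" for model in timings
        )
        print(f"{build:10s} {row}")
```

For example, with tessdata_best the 413avx build is about 2.95 times faster than 413noavx (37.6052 / 12.7300).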
Test environment
- Windows 10 64-bit
- Compiler: VS 2019
- CPU: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz, 6 cores
- Memory: 16 GB RAM
Testing with Python code
import os
import timeit

# Paths are relative to the current working directory; adjust them for your setup.
tess_exe = r"msvc.v5.openmp\tesseract.exe"
test_image = r"i263_speed.jpg"
# Select the model set under test (tessdata_best, tessdata_fast or tessdata).
os.environ['TESSDATA_PREFIX'] = r"tessdata_best\tessdata"

# Statement timed by timeit; the two {} placeholders are filled in below.
code_to_test = """
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"{}"
pytesseract.pytesseract.image_to_string(r"{}", lang = 'eng')
"""
runs = 15
# Average wall-clock time per OCR run, in seconds.
elapsed_time = timeit.timeit(code_to_test.format(tess_exe, test_image), number=runs) / runs
print("\nDuration:", elapsed_time)
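pytesseract runs the tesseract executable as a child process, so a similar measurement could also be sketched without it. The following is only an illustration and not the script used for the table: `time_command` is a hypothetical helper, and it assumes the usual `tesseract <image> stdout -l eng` command-line form.

```python
import subprocess
import time


def time_command(cmd, runs=15):
    """Run `cmd` `runs` times and return the average wall-clock seconds per run."""
    start = time.perf_counter()
    for _ in range(runs):
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs
```

With the paths from the script above, a call might look like `time_command([r"msvc.v5.openmp\tesseract.exe", r"i263_speed.jpg", "stdout", "-l", "eng"])`. The child process inherits the parent's environment, so `TESSDATA_PREFIX` still needs to be set beforehand.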
Tesseract build information
The information below is the output of tesseract -v.
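The listings in this section follow a regular pattern: a version line, library lines, and optional "Found ..." lines. As a rough illustration, the sketch below pulls the version and the "Found" entries out of such text. `parse_version_output` is a hypothetical helper, and the sample is abridged from the 501openmp listing on this page.

```python
def parse_version_output(text):
    """Split `tesseract -v` style output into a version string and 'Found' entries."""
    version = None
    found = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("tesseract "):
            version = line.split(" ", 1)[1]
        elif line.startswith("Found "):
            found.append(line[len("Found "):])
    return version, found


# Abridged copy of the 501openmp listing from this page.
SAMPLE = """tesseract 5.0.1
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 2019
"""

version, found = parse_version_output(SAMPLE)
print(version)
print(found)
```

A check such as `"AVX2" in found` then tells whether a given build reports AVX2 support.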
3.05
305
This build uses the legacy engine.
tesseract 3.05.02
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
4.1
413noavx
Build without AVX2/AVX/SSE4 support
tesseract 4.1.3
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
413avx
Build with AVX2/AVX/SSE4 support
tesseract 4.1.3-1-ge9986
leptonica-1.83.0 (Jan 26 2022, 19:15:03) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
5.0
501
Build with AVX2 support
tesseract 5.0.1
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV
501ap
Built with the following command: cmake -E env CXXFLAGS="/Qpar /fp:fast" cmake ..
tesseract 5.0.1
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV
501openmp
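OpenMP builds are known to waste a lot of CPU time. Because several users reported problems, OpenMP is disabled by default in 5.0.1 and later. For other versions (>= 4.x), setting the environment variable OMP_THREAD_LIMIT=1 is recommended. Input from OpenMP experts is welcome.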
tesseract 5.0.1
leptonica-1.83.0 (Dec 17 2021, 17:33:37) [MSC v.1929 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0 : libopenjp2 2.4.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 2019
Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 libzstd/1.4.9
Found libcurl/7.75.0 zlib/1.2.11 libssh2/1.10.1_DEV
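The OMP_THREAD_LIMIT=1 recommendation for OpenMP builds can be applied from Python before tesseract is started, because child processes inherit the environment they are given. This is a minimal sketch; `single_thread_env` is a hypothetical helper, and the image name in the comment is a placeholder.

```python
import os


def single_thread_env(base=None):
    """Return a copy of the environment with OpenMP limited to one thread."""
    env = dict(os.environ if base is None else base)
    env["OMP_THREAD_LIMIT"] = "1"
    return env


# Pass the copy explicitly when starting tesseract through subprocess, e.g.
#   subprocess.run(["tesseract", "image.png", "stdout"], env=single_thread_env())
# ("image.png" is a placeholder file name).
```

When calling tesseract through a wrapper that does not expose an `env` argument, setting `os.environ["OMP_THREAD_LIMIT"] = "1"` before the first call has the same effect for every child process of the script.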