You can simply wrap tesseract in a function:

    import os
    import subprocess
    import tempfile


    def ocr(path):
        # Reserve a unique base name; tesseract writes its text to <base>.txt.
        temp = tempfile.NamedTemporaryFile(delete=False)
        process = subprocess.Popen(['tesseract', path, temp.name],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.STDOUT)
        process.communicate()  # wait for tesseract to finish

        with open(temp.name + '.txt', 'r') as handle:
            contents = handle.read()

        # Remove both the output file and the placeholder temp file.
        os.remove(temp.name + '.txt')
        os.remove(temp.name)

        return contents

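
The same idea can be sketched with a temporary directory, so cleanup happens automatically and a failed tesseract run raises an error instead of surfacing later as a missing output file. This is only a rough variant under some assumptions: the name `ocr_text` and the `run` parameter are hypothetical (the latter exists only so the wrapper can be exercised without tesseract installed), and the output is assumed to be UTF-8 text.

```python
import subprocess
import tempfile
from pathlib import Path


def ocr_text(image_path, run=subprocess.run):
    """Return the text tesseract recognizes in image_path.

    `run` is a hypothetical injection point (defaults to subprocess.run)
    so the wrapper can be tested without invoking tesseract.
    """
    with tempfile.TemporaryDirectory() as workdir:
        output_base = Path(workdir) / "result"
        # tesseract appends ".txt" to the output base name itself.
        run(
            ["tesseract", str(image_path), str(output_base)],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        # Read the result before the temporary directory is deleted.
        return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_text("scan.png")` would then return the recognized text, provided the `tesseract` executable is on the `PATH`; the file name here is just a placeholder.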
If you want document segmentation and more advanced features, try OCRopus.



