Installation

This section outlines the steps required to install ImageDataExtractor.

We strongly advise installing ImageDataExtractor inside a virtual environment (click here to learn how).
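As a rough illustration of the virtual environment setup recommended above, the following sketch uses Python's built-in venv module on Linux or macOS. The directory name venv is an arbitrary choice, not something ImageDataExtractor requires, and other tools such as conda would work equally well.

```shell
# Create a virtual environment in ./venv (the directory name is an assumption).
python3 -m venv venv

# Activate it for the current shell session; `source venv/bin/activate`
# is equivalent in bash and zsh.
. venv/bin/activate

# Print the interpreter path, which should now point inside ./venv.
command -v python
```

Once the environment is active, anything installed with pip stays inside it rather than in the system-wide Python installation.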

Step 1: Install Tesseract 4

ImageDataExtractor currently uses Tesseract 4 for text recognition. You can check your existing version by running:

tesseract -v
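If you would rather script this check, the sketch below reads the major version from the first line of the output. It assumes that line has the form "tesseract 4.1.1" (optionally with a "v" before the number). The helper name tesseract_major_version is illustrative and is not part of Tesseract or ImageDataExtractor.

```shell
# Print the major version from the first line of `tesseract -v` output,
# e.g. "tesseract 4.1.1" -> "4". Prints nothing if the line does not match.
tesseract_major_version() {
    printf '%s\n' "$1" | sed -n '1s/^tesseract[[:space:]]*v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Some builds write version information to stderr, so capture both streams.
    major=$(tesseract_major_version "$(tesseract -v 2>&1)")
    if [ "$major" = "4" ]; then
        echo "Tesseract 4 found."
    else
        echo "Found Tesseract major version '${major:-unknown}'; version 4 is required."
    fi
else
    echo "Tesseract is not installed or is not on PATH."
fi
```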

If Tesseract 4 is not already installed, its source code can be downloaded here, and instructions for compiling it on your machine can be found here.

Step 2: Install ImageDataExtractor

Installing with pip is the simplest option. With your virtual environment activated, run:

pip install imagedataextractor

Alternatively, to install from source, clone the repo and move into the directory:

git clone https://github.com/by256/imagedataextractor.git
cd imagedataextractor

Activate your virtual environment and install:

python setup.py install

Finally, download the data files required for ChemDataExtractor-based document extraction:

cde data download

and you're ready to go!
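To confirm that the installation succeeded, you can test whether the package imports in the active environment. This is a minimal sketch: module_importable is a hypothetical helper, and the PYTHON_BIN variable is an assumption that lets you point the check at a specific interpreter (it defaults to python).

```shell
# Succeed if the named Python module can be imported by the chosen interpreter.
# PYTHON_BIN is an optional override; by default `python` on PATH is used.
module_importable() {
    "${PYTHON_BIN:-python}" -c "import $1" >/dev/null 2>&1
}

if module_importable imagedataextractor; then
    echo "imagedataextractor is installed."
else
    echo "imagedataextractor was not found; check that your virtual environment is active."
fi
```

If the check fails even though the installation appeared to succeed, the most likely cause is that the package was installed into a different environment than the one currently active.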
