This section outlines the steps required to install ImageDataExtractor.
We strongly advise using a virtual environment when installing ImageDataExtractor.
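The following is a minimal sketch of creating and activating such an environment. It assumes a POSIX shell and a `python3` interpreter that includes the standard-library `venv` module. The function name `make_ide_venv` and the default directory `ide-env` are hypothetical choices and are not part of ImageDataExtractor.

```shell
# Hypothetical helper: create a virtual environment in the given directory
# (default "ide-env") and activate it in the current shell.
make_ide_venv() {
  env_dir=${1:-ide-env}
  # Stop early if python3 or its venv module is unavailable.
  python3 -m venv "$env_dir" || return 1
  # The activate script must be sourced so that it changes the current shell.
  . "$env_dir/bin/activate"
}
```

Because activation modifies the calling shell, define the function in the same shell you will install from (for example by sourcing it) and then run `make_ide_venv`. Do not run it as a separate script.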
ImageDataExtractor currently uses Tesseract 4 for text recognition. You can check your existing version by running:
$ tesseract -v
If you do not have Tesseract 4, you can download its source code from the Tesseract project and follow the project's instructions for compiling it on your machine.
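You can also check the version automatically instead of reading it by eye. The sketch below parses the first line of the `tesseract -v` output, which for Tesseract 4 looks like `tesseract 4.1.1`, and assumes that output format. Both function names are hypothetical.

```shell
# Hypothetical helper: read `tesseract -v` output on stdin and print the
# major version number (e.g. "4" for "tesseract 4.1.1"), or nothing if
# the first line does not match the expected format.
tesseract_major() {
  # Some builds prefix the number with "v", so the "v" is optional.
  head -n 1 | sed -n 's/^tesseract v\{0,1\}\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: return 0 only if the installed Tesseract is 4.x.
check_tesseract4() {
  # Some older releases print the version to stderr, so merge the streams.
  major=$(tesseract -v 2>&1 | tesseract_major)
  if [ "$major" = "4" ]; then
    echo "Tesseract 4 found."
  else
    echo "Expected Tesseract 4, found: ${major:-none}" >&2
    return 1
  fi
}
```

After sourcing these functions, run `check_tesseract4`. It prints a confirmation and returns 0 when a 4.x release is on the `PATH`. It returns 1 if another version is found or Tesseract is missing.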
pip (recommended)
Installation with pip is the simplest option. Simply run:
pip install imagedataextractor
Source
Alternatively, to install from source, clone the repo and move into the directory:
git clone https://github.com/by256/imagedataextractor.git
cd imagedataextractor
Activate your virtual environment and install:
python setup.py install
Finally, download the data files required for ChemDataExtractor-based document extraction:
cde data download
and you're ready to go!
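A quick smoke test can confirm the installation. This sketch assumes that the package is importable as `imagedataextractor` and that the virtual environment is active, so `python` and `cde` refer to its interpreter and scripts. The function name `check_ide_install` is hypothetical.

```shell
# Hypothetical smoke test; assumes the import name "imagedataextractor"
# and an active virtual environment.
check_ide_install() {
  if ! python -c "import imagedataextractor" 2>/dev/null; then
    echo "ImageDataExtractor could not be imported." >&2
    return 1
  fi
  # `cde` is the ChemDataExtractor command used to download the data files.
  if ! command -v cde >/dev/null 2>&1; then
    echo "The cde command was not found on PATH." >&2
    return 1
  fi
  echo "ImageDataExtractor is installed and cde is available."
}
```

If either check fails, the function prints which step to revisit and returns 1.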