This section outlines a few advanced options for using ImageDataExtractor.
ImageDataExtractor can be used for high-throughput data extraction using two methods:
>>> ide.extract_images('<path/to/img/dir>')
>>> ide.extract_documents('<path/to/docs/dir>')
These run the extract_image and extract_document methods sequentially on every file in the target directory.
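Conceptually, the batch methods behave like a loop over the target directory. The sketch below is not ImageDataExtractor's own implementation: `run_on_directory` and `process_file` are hypothetical names, and it assumes that files are visited in sorted order and that subdirectories are skipped.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def run_on_directory(
    input_dir: Union[str, Path],
    process_file: Callable[[Path], object],
) -> Dict[str, object]:
    """Call process_file on each regular file in input_dir, one at a time.

    Hypothetical helper for illustration only; it is not part of
    ImageDataExtractor.
    """
    results = {}
    # Sorted order is an assumption made here so that runs are reproducible.
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file():  # skip subdirectories
            results[path.name] = process_file(path)
    return results
```

With such a helper, something like `run_on_directory('<path/to/img/dir>', lambda p: ide.extract_image(str(p)))` would approximate the behaviour described above, assuming `extract_image` accepts a path string as in the examples in this section.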
ImageDataExtractor also supports .zip, .tar and .tar.gz inputs.
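If you want to inspect an archive yourself before passing it to ImageDataExtractor, it can be unpacked with the Python standard library. This is a minimal sketch using `shutil.unpack_archive`; the helper names `is_supported_archive` and `unpack_to_temp` are hypothetical, and this is not how ImageDataExtractor handles archives internally.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union

# The archive extensions listed in this section.
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz")


def is_supported_archive(path: Union[str, Path]) -> bool:
    """Return True if the file name ends with one of the listed extensions."""
    return str(path).lower().endswith(ARCHIVE_SUFFIXES)


def unpack_to_temp(archive: Union[str, Path]) -> Path:
    """Unpack an archive into a fresh temporary directory and return its path.

    Hypothetical helper; the caller should remove the directory afterwards,
    for example with shutil.rmtree.
    """
    if not is_supported_archive(archive):
        raise ValueError(f"Unsupported archive type: {archive}")
    target = Path(tempfile.mkdtemp(prefix="ide_unpacked_"))
    # unpack_archive infers the format from the file extension.
    shutil.unpack_archive(str(archive), str(target))
    return target
```

The returned directory can then be browsed or passed to the batch methods like any other input directory.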
In addition to the output graphs, images are saved in the following locations for transparency:
raw_images: Contains the images that were downloaded from the article.
split_photo_images: Individual images after the first splitting algorithm (whitespace).
split_grid_images: Individual images after the second splitting algorithm (integer fractions of the grid).
CSV files containing metadata are also written to:
_raw: Contains metadata of the extracted image, including the figure caption.
_csv: Contains metadata of the split images.
To specify an output directory, add it as the second argument to any function. For example:
>>> ide.extract_image('<path/to/image/file>', '<path/to/output>')
will save all outputs to <path/to/output>.
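After a run, it can be useful to check what was written. The sketch below counts the files in each immediate subdirectory of the output directory. It assumes the image folders described above sit directly inside the output directory, which this section does not state explicitly; `count_outputs` is a hypothetical helper, not part of ImageDataExtractor.

```python
from pathlib import Path
from typing import Dict, Union


def count_outputs(output_dir: Union[str, Path]) -> Dict[str, int]:
    """Map each immediate subdirectory name to the number of files inside it.

    Hypothetical helper; the directory layout is an assumption.
    """
    counts = {}
    for sub in sorted(Path(output_dir).iterdir()):
        if sub.is_dir():
            # Count files at any depth below this subdirectory.
            counts[sub.name] = sum(1 for p in sub.rglob("*") if p.is_file())
    return counts
```

Calling `count_outputs('<path/to/output>')` returns a dictionary keyed by folder name, which makes it easy to spot a stage that produced no images.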