This page gives an introduction to getting started with ImageDataExtractor. It assumes you already have ImageDataExtractor and its requirements installed.
>>> import imagedataextractor as ide
ImageDataExtractor can be used to extract information from images directly. Alternatively, microscopy images can be automatically identified and extracted from HTML or XML documents, followed by particle extraction with ImageDataExtractor. The latter requires ChemDataExtractor to be installed.
You can view the example usage notebook here.
Provide as input a path to an image or a document, or a path to a directory of images and/or documents, as well as an output directory specifying where the results should be written. If the input image is a figure containing a panel of images, the panel will be split and extraction will be performed on each sub-image separately.
>>> data = ide.extract(input_path)
This will return a list of EMData objects, each of which contains the image, resulting segmentation, uncertainty, scalebar information and extracted quantitative data for each detected particle.
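The resulting segmentation and its uncertainty can be accessed from an individual EMData object (in the snippets that follow, data refers to one such object rather than to the whole list):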
>>> seg = data.segmentation
>>> uncertainty = data.uncertainty
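When several images are returned, the same attributes can be read in a loop. The following is a minimal sketch that relies only on the segmentation and uncertainty attributes described above; iter_segmentations is a hypothetical helper and not part of ImageDataExtractor.

```python
def iter_segmentations(data):
    """Yield (index, segmentation, uncertainty) for each EMData-like object."""
    for i, em in enumerate(data):
        yield i, em.segmentation, em.uncertainty


# Usage, assuming ImageDataExtractor is installed and input_path is defined:
# import imagedataextractor as ide
# for i, seg, unc in iter_segmentations(ide.extract(input_path)):
#     print(i, type(seg), type(unc))
```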
You can obtain a pandas DataFrame containing all extracted data from an EMData object.
>>> df = data.to_pandas()
Extracted scalebar information can be accessed from the scalebar attribute of an EMData object.
>>> sb_text = data.scalebar.text
>>> conversion = data.scalebar.conversion
>>> units = data.scalebar.units
>>> sb_contours = data.scalebar.scalebar_contour
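To show how the scalebar conversion might be applied, the sketch below converts pixel measurements into physical ones. It assumes that conversion is a physical length per pixel, expressed in the reported units; this page does not state that interpretation, so check it against your own results. px_to_length and px_to_area are hypothetical helpers, not ImageDataExtractor functions.

```python
def px_to_length(length_px, conversion):
    """Convert a length in pixels to physical units.

    Assumption: conversion is a length per pixel.
    """
    return length_px * conversion


def px_to_area(area_px, conversion):
    """Convert an area in square pixels; areas scale with conversion squared."""
    return area_px * conversion ** 2
```

Under the same assumption, a particle diameter measured in pixels would be converted with px_to_length(diameter_px, data.scalebar.conversion).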
And that's it!
ImageDataExtractor currently supports HTML documents from the Royal Society of Chemistry and XML files obtained using the Elsevier Developers Portal.
The segmentation model can be adjusted using the seg keyword arguments of ide.extract:
>>> data = ide.extract(input_path, seg_bayesian=True, seg_tu=0.0125, seg_n_samples=30, seg_device='cpu')
For optimal performance, particle segmentation is performed using Bayesian inference by default. Segmentation can also be performed discriminatively, but this is not recommended because the Bayesian version gives significant gains in accuracy and precision. Setting the seg_bayesian argument to True runs the segmentation model in the recommended Bayesian mode. The default is True.
False positives are filtered automatically using the uncertainties afforded by Bayesian inference. The threshold beyond which particles are filtered can be adjusted using the seg_tu parameter. The default is 0.0125.
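The idea of uncertainty-based filtering can be illustrated with a small sketch. It is not ImageDataExtractor's implementation: it assumes each detected particle has already been reduced to a single uncertainty score, and filter_by_uncertainty is a hypothetical helper.

```python
def filter_by_uncertainty(particles, tu=0.0125):
    """Keep (label, uncertainty) pairs whose uncertainty does not exceed tu."""
    return [(label, u) for label, u in particles if u <= tu]


# Toy data: three candidate particles with made-up uncertainty scores.
particles = [("a", 0.004), ("b", 0.0125), ("c", 0.03)]
kept = filter_by_uncertainty(particles)
# Particle "c" lies beyond the threshold and is dropped.
```

In this sketch, lowering tu discards more candidates and raising it keeps more of them.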
Performing Bayesian inference by Monte Carlo sampling slows down the extraction process noticeably. The number of Monte Carlo samples used in inference can be set using the seg_n_samples argument. The default is 30.
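A generic sketch of Monte Carlo inference shows why more samples cost more time: the model is evaluated once per sample, the outputs are averaged, and their spread serves as an uncertainty estimate. The code below uses a toy stochastic function in place of the segmentation network; noisy_model and mc_predict are hypothetical names, and nothing here reflects ImageDataExtractor's internals.

```python
import random
import statistics


def noisy_model(x, rng):
    """Toy stand-in for one stochastic forward pass."""
    return x + rng.gauss(0.0, 0.1)


def mc_predict(x, n_samples=30, seed=0):
    """Run n_samples stochastic passes; return (mean, variance) of the outputs."""
    rng = random.Random(seed)
    samples = [noisy_model(x, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


mean, var = mc_predict(1.0)
```

The run time of mc_predict grows linearly with n_samples, which is the trade-off between speed and the stability of the estimate.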
Extraction can be accelerated by using a Graphics Processing Unit (GPU). Specifying the seg_device argument as 'cuda' allows particle segmentation to be performed on a GPU, if one is available. This can speed up extraction significantly, particularly if extraction is being run in Bayesian mode. The default is 'cpu'.
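One way to choose the device at run time is sketched below. It assumes PyTorch is the backend that provides the 'cuda' device, which this page does not state explicitly, and it falls back to 'cpu' when PyTorch is missing or reports no GPU. pick_device is a hypothetical helper.

```python
def pick_device():
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumption: PyTorch is the segmentation backend
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


seg_options = {"seg_device": pick_device()}

# With ImageDataExtractor installed and input_path defined:
# import imagedataextractor as ide
# data = ide.extract(input_path, **seg_options)
```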