Document Extraction

Overview diagram

We need to extract data from the documents in the system so that we can create configurable action/condition rules based on document data.

Several kinds of data extraction are available:

Metadata extraction
- OriginalFilename
- Filesize
- MimeType
Content extraction -> see also PDF Tools
- Text extraction
  - PDF text extraction (just raw text - no text in images)
  - Image text extraction using OCR
- Structured data extraction
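
As a rough illustration of how extracted metadata could feed a condition rule, here is a minimal Python sketch. It is not the system's actual implementation: the names `DocumentMetadata`, `extract_metadata` and `is_large_pdf` are hypothetical, the original filename is assumed to be the file's name on disk, and the MIME type is only guessed from the file extension using the standard library.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DocumentMetadata:
    """Hypothetical container for the three metadata fields listed above."""
    original_filename: str
    filesize: int  # size in bytes
    mime_type: str


def extract_metadata(path: Path) -> DocumentMetadata:
    """Read basic metadata from a file on disk.

    Assumption: the MIME type is guessed from the extension only;
    the file content is not inspected.
    """
    guessed, _encoding = mimetypes.guess_type(path.name)
    return DocumentMetadata(
        original_filename=path.name,
        filesize=path.stat().st_size,
        mime_type=guessed or "application/octet-stream",
    )


def is_large_pdf(meta: DocumentMetadata, min_bytes: int = 1_000_000) -> bool:
    """Hypothetical condition rule: matches PDFs of at least min_bytes."""
    return meta.mime_type == "application/pdf" and meta.filesize >= min_bytes
```

A rule engine could evaluate a condition such as `is_large_pdf(extract_metadata(path))` and trigger the configured action when it returns `True`. Content extraction (PDF text, OCR) would plug in the same way but needs additional tooling, so it is left out of this sketch.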
