Document Extraction
We need to extract data from the documents in the system so that we can create configurable action/condition rules based on document data.
Several kinds of data extraction are possible:
- Metadata extraction
  - OriginalFilename
  - Filesize
  - MimeType
- Content extraction (see also PDF Tools)
  - Text extraction
    - PDF text extraction (raw text only; text inside images is not extracted)
    - Image text extraction using OCR
  - Structured data extraction
    - Text extraction
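
The metadata fields listed above could feed action/condition rules roughly as follows. This is a minimal sketch under stated assumptions, not the system's actual implementation: `DocumentMetadata`, `extract_metadata`, `Rule`, and `apply_rules` are hypothetical names, the original filename is assumed to be the basename of the path, and the MIME type is guessed from the filename extension with Python's standard `mimetypes` module rather than detected from the file content.

```python
import mimetypes
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DocumentMetadata:
    """Hypothetical container for the metadata fields named in the list."""
    original_filename: str
    filesize: int               # size in bytes
    mime_type: Optional[str]    # None when the extension is unknown


def extract_metadata(path: str) -> DocumentMetadata:
    """Collect filename, size, and a MIME type guessed from the extension."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return DocumentMetadata(
        original_filename=os.path.basename(path),
        filesize=os.path.getsize(path),
        mime_type=mime_type,
    )


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: run `action` when `condition` holds for a document."""
    name: str
    condition: Callable[[DocumentMetadata], bool]
    action: Callable[[DocumentMetadata], str]


def apply_rules(meta: DocumentMetadata, rules: List[Rule]) -> List[str]:
    """Return the action results of every rule whose condition matches."""
    return [rule.action(meta) for rule in rules if rule.condition(meta)]


# Example rule set (assumed for illustration only).
EXAMPLE_RULES = [
    Rule(
        name="route-pdf",
        condition=lambda m: m.mime_type == "application/pdf",
        action=lambda m: f"send {m.original_filename} to PDF text extraction",
    ),
    Rule(
        name="flag-large",
        condition=lambda m: m.filesize > 10 * 1024 * 1024,
        action=lambda m: f"flag {m.original_filename} as large",
    ),
]
```

Calling `apply_rules(extract_metadata(path), EXAMPLE_RULES)` returns one result string per matching rule. Keeping conditions and actions as plain callables is only one possible design; a configurable system would more likely load rule definitions from stored configuration.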