Extraction Templates vs. Prompt Templates
Why not just use prompt templates?
While prompt templates (such as "Classify the following text into the following categories…") can get the job done for simple extraction tasks, they fall short in a few crucial areas:
Limited to LLMs
The idea behind Parsee's extraction templates is to define a format that is not limited to use with LLMs but can also be used to train other types of models and to run predictions with them.
Messy for More Complex Extraction/Structuring Tasks
While prompt templates can work well for simple tasks such as "What is the invoice total?", they get messy very quickly once you also want to extract currencies, units, time periods, etc. alongside the "main" piece of information.
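To make the contrast concrete, here is a minimal sketch in plain Python of how a structured template can declare the main question and its accompanying attributes (currency, unit, time period) as separate fields instead of one ever-growing prompt. The classes `ExtractionItem` and `MetaItem` and the `to_prompt` helper are hypothetical and only illustrate the idea; they are not Parsee's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetaItem:
    """Hypothetical: an attribute extracted alongside the main value."""
    name: str
    question: str
    options: Optional[List[str]] = None  # restrict answers to a fixed set, if given


@dataclass
class ExtractionItem:
    """Hypothetical: the main piece of information plus its meta items."""
    name: str
    question: str
    output_type: str  # e.g. "numeric", "text", "date"
    meta: List[MetaItem] = field(default_factory=list)

    def to_prompt(self) -> str:
        # The structured definition is rendered into prompt text only at the
        # last step, so the same definition is not tied to LLMs.
        lines = [f"{self.question} (answer type: {self.output_type})"]
        for m in self.meta:
            line = f"- {m.name}: {m.question}"
            if m.options:
                line += f" Choose one of: {', '.join(m.options)}."
            lines.append(line)
        return "\n".join(lines)


invoice_total = ExtractionItem(
    name="invoice_total",
    question="What is the invoice total?",
    output_type="numeric",
    meta=[
        MetaItem("currency", "In which currency is the total stated?", ["EUR", "USD", "GBP"]),
        MetaItem("unit", "Is the amount in units, thousands or millions?", ["1", "1000", "1000000"]),
        MetaItem("period", "Which time period does the invoice cover?"),
    ],
)

print(invoice_total.to_prompt())
```

Each attribute stays a separate, named field, so adding another one means appending a `MetaItem` rather than rewriting a long prompt string by hand.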
Type Safety
While you can tell an LLM to return data in a specific format, there is no "guarantee" that it will do so. We therefore try to parse the output according to the defined data type, and if parsing fails, we return a "null" value. This ensures that the output data is always exactly in the right format.
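The parse-or-null step described above can be sketched as follows. This is a minimal illustration assuming three output types (numeric, ISO date, and a fixed list of categories); the function name `parse_output` and the type names are hypothetical and not Parsee's actual implementation. Python's `None` plays the role of the "null" value.

```python
from datetime import date
from typing import List, Optional, Union

ParsedValue = Union[float, date, str, None]


def parse_output(raw: str, output_type: str,
                 categories: Optional[List[str]] = None) -> ParsedValue:
    """Hypothetical: coerce a raw LLM answer into the declared type, or return None."""
    text = raw.strip()
    if output_type == "numeric":
        try:
            # Drop thousands separators such as "1,234.50" before converting.
            return float(text.replace(",", ""))
        except ValueError:
            return None
    if output_type == "date":
        try:
            return date.fromisoformat(text)  # expects YYYY-MM-DD
        except ValueError:
            return None
    if output_type == "category":
        # Only accept answers that exactly match one of the allowed categories.
        return text if categories and text in categories else None
    # An unknown type is a bug in the template, not in the model's answer.
    raise ValueError(f"unknown output type: {output_type}")


print(parse_output("1,234.50", "numeric"))               # 1234.5
print(parse_output("about 1200 euros", "numeric"))       # None
print(parse_output("2023-12-31", "date"))                # 2023-12-31
print(parse_output("Maybe", "category", ["Yes", "No"]))  # None
```

Returning `None` instead of a best-effort guess means downstream code only ever sees a value of the declared type or an explicit missing value.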