Comparing Parsee Document Loader vs. Langchain Document Loaders for PDFs
With the datasets in this folder we want to test how the results of an LLM extracting structured data from invoices differ across different document loaders.
Both datasets have their own READMEs with more information about the methodology, the notebooks used to create the datasets, and the evaluation results:
1. Invoice Dataset - Langchain Loader
parsee-core version used: 0.1.3.11
This dataset was created on the basis of 15 sample invoices (PDF files).
All PDF files are publicly accessible on parsee.ai. To access them, copy the "source_identifier" (first column) and paste it into this URL (replacing '{SOURCE_IDENTIFIER}' with the actual identifier):
https://app.parsee.ai/documents/view/{SOURCE_IDENTIFIER}
So for example:
https://app.parsee.ai/documents/view/1fd7fdbd88d78aa6e80737b8757290b78570679fbb926995db362f38a0d161ea
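For convenience, here is a minimal Python sketch of the URL substitution described above (the helper function name is illustrative, not part of parsee-core):

```python
# Minimal sketch: build the public Parsee Cloud URL for a given source identifier.
BASE_URL = "https://app.parsee.ai/documents/view/{SOURCE_IDENTIFIER}"

def document_url(source_identifier: str) -> str:
    # Replace the placeholder with the actual identifier from the dataset's first column
    return BASE_URL.replace("{SOURCE_IDENTIFIER}", source_identifier)

print(document_url("1fd7fdbd88d78aa6e80737b8757290b78570679fbb926995db362f38a0d161ea"))
```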
The invoices were selected randomly and are in either German or English.
The following code was used to create the dataset: jupyter notebook
The correct answers for each row were loaded from Parsee Cloud, where they were checked by a human and corrected prior to running this code.
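To illustrate the loading path being compared against Parsee, here is a minimal sketch of how a PDF's text can be obtained with the langchain PyPDF loader; the file name is a placeholder, and this is not the exact code from the dataset-creation notebook:

```python
# Sketch: extract the text that would be passed to the LLM using the langchain PyPDF loader.
# "invoice.pdf" is a placeholder file name, not part of the dataset.
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("invoice.pdf")
pages = loader.load()  # one Document per PDF page

# Concatenate the page texts into a single context string for the prompt
full_text = "\n".join(page.page_content for page in pages)
print(full_text[:500])
```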
1.1 LLM Evaluation
For the evaluation we are using the mistralai/mixtral-8x7b-instruct-v0.1 model from Replicate.
The results of the evaluation can be found here: jupyter notebook
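As a rough illustration of how the model can be queried, here is a minimal sketch using the Replicate Python client. The prompt and input parameters are placeholders and may differ from what the evaluation notebook actually uses:

```python
# Sketch: query mixtral-8x7b-instruct via the Replicate Python client.
# Requires the REPLICATE_API_TOKEN environment variable to be set.
import replicate

output = replicate.run(
    "mistralai/mixtral-8x7b-instruct-v0.1",
    input={
        "prompt": "Extract the invoice total from the following text: ...",  # placeholder prompt
        "max_new_tokens": 256,
    },
)
# The completion is returned as chunks of text (format may vary by model)
print("".join(output))
```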
1.2 Result
Even though the Parsee PDF reader was not initially designed for invoices (which often contain fragmented text pieces and tables that are difficult to structure properly), it still outperforms the langchain PyPDF reader, with a total accuracy of 88% vs. 82% for the langchain reader.
2. Revenues Dataset - Parsing Tables
This dataset consists of 15 pages from annual/quarterly reports of German companies (PDF files); the filings themselves are in English.
The goal is to evaluate two things:
How well can a state-of-the-art LLM retrieve complex structured information from the documents?
How does the Parsee.ai document loader fare against the langchain PyPDF loader for this document type?
We are using the Claude 3 Opus model for all runs here, as it was the most capable model in our prior experiments (beating GPT-4).
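For reference, a minimal sketch of how Claude 3 Opus can be queried through the Anthropic Python client; the prompt is a placeholder and this is not the exact setup used in the notebooks:

```python
# Sketch: query Claude 3 Opus via the Anthropic Python client.
# Requires the ANTHROPIC_API_KEY environment variable to be set.
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract all revenue figures from the following report page: ..."}  # placeholder
    ],
)
print(message.content[0].text)
```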
2.1 Result
Explanation of the result metrics (a sketch of how they might be computed follows this list):
- Completeness: Measures how often the model returned the expected number of answers. For example, for this file there are 5 columns containing a "Revenue" figure, so we expect the model to return 5 different "answers", each with one of the revenue figures (you can see these in the "Extracted Data" tab on Parsee Cloud).
- Revenues Correct: How many times the model extracted a valid "Revenues" figure. Completely missing answers are counted here as well (so this accounts for both wrong and missing answers).
- Revenues Correct (excluding missing answers): Disregards the cases where the model did not extract the figure at all; in other words, if the model extracted a figure (matched based on the meta information), was it the correct number?
- Meta Items Correct: How many times the model extracted all the expected meta information (time periods, currencies, etc.); missing answers are counted here as well.
- Meta Items Correct (excluding missing answers): If the model found a valid revenues number, how many times was all the meta information attached to it correct? (This does not count the cases where the answer was missing entirely.)
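The following is a hypothetical Python sketch of how these metrics could be computed. The data structures, field names, and the pairing-by-position simplification are assumptions for illustration, not the actual evaluation code from the notebook:

```python
# Hypothetical sketch of the metric definitions above; not the actual Parsee evaluation code.
from dataclasses import dataclass

@dataclass
class ExtractedAnswer:
    value: float | None  # the revenue figure the model returned (None = missing)
    meta: dict           # e.g. {"period": "FY2023", "currency": "EUR"}

def evaluate(expected: list[ExtractedAnswer], predicted: list[ExtractedAnswer]) -> dict:
    # Completeness: did the model return the expected number of answers?
    completeness = len(predicted) == len(expected)

    # Pair expected and predicted answers. The real evaluation matches on meta
    # information; pairing by position is a simplification for this sketch.
    pairs = list(zip(expected, predicted))
    found = [(exp, pred) for exp, pred in pairs if pred.value is not None]

    return {
        "completeness": completeness,
        # missing answers simply never count as correct, so they lower these scores
        "revenues_correct": sum(1 for exp, pred in pairs
                                if pred.value is not None and pred.value == exp.value),
        "revenues_correct_excl_missing": sum(1 for exp, pred in found
                                             if pred.value == exp.value),
        "meta_correct": sum(1 for exp, pred in pairs
                            if pred.value is not None and pred.meta == exp.meta),
        "meta_correct_excl_missing": sum(1 for exp, pred in found
                                         if pred.meta == exp.meta),
    }
```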