
Parsee Launch

February 29, 2024 - 2 min read

Why Parsee?

Parsee aims to be a simple, opinionated framework for structuring data from what we consider the most common sources of unstructured data: PDFs, HTML files, and images.

Parsee is NOT intended to be used in a "chat bot" format. To fully structure data, we require the output to be as concise and type-safe as possible (i.e. we want to make sure the output is just a number or an enum value). For chat-bot-style usage, many frameworks are already available. That is also why, at least in this first release, we have not prioritized integration with popular frameworks such as Langchain. We might add support for these later if enough users request it.
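To make the "just a number or enum" idea concrete, here is a minimal sketch in plain Python. It does not use the Parsee API; the names `Currency`, `parse_number` and `parse_enum` are hypothetical and only illustrate what a type-safe check on a model's raw answer could look like.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Currency(Enum):
    # Hypothetical set of allowed answers for an illustrative "currency" question.
    USD = "USD"
    EUR = "EUR"
    GBP = "GBP"


def parse_number(raw: str) -> Optional[float]:
    """Return the answer as a float, or None if it is not a bare number."""
    # Drop surrounding whitespace and thousands separators such as "1,234.5".
    cleaned = raw.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # Anything chatty ("The revenue is 5") is rejected instead of guessed at.
        return None


def parse_enum(raw: str, choices: Type[E]) -> Optional[E]:
    """Return the matching enum member (case-insensitive), or None."""
    cleaned = raw.strip().upper()
    for member in choices:
        if str(member.value).upper() == cleaned:
            return member
    return None
```

With checks like these, a free-text answer such as "The revenue is 5" is rejected rather than passed downstream, which is the behaviour a structuring pipeline wants and a chat bot does not.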

If you are interested in running multiple extraction jobs in parallel in the cloud, you can sign up for Parsee Cloud: app.parsee.ai

In Parsee Cloud you can also find pre-defined extraction templates from the community or share your own.

Parsee can be used both with LLMs and with other model architectures. The latter require a dataset, which you can create in the Python API or on Parsee Cloud. If you don't have a dataset yet, you can always use one of a range of LLMs, which can already perform most extraction tasks fairly well (see the examples).
