Powering Next-Gen AI: Datasets to Refine & Revolutionize LLMs

Curated, Comprehensive, Global Company Data for Superior AI Outcomes

High-quality, well-structured LLM training data

100+ terabytes of raw data from original sources

Information on over 100 million private companies

Data on licensing, service and loan agreements

Our Solutions

RoyaltyRange – Your Source for Premium LLM Training Data

Big Scale Data: structured information on over 100 million private companies, including detailed and consistent financials, ownership details, tagged activities, descriptions and more. Includes 100+ terabytes of linked raw data from original sources, ensuring traceability and transparency.

  • CompID – Company Data for AI Excellence


Manually Curated with Extensive Labels: each dataset is carefully curated by experts and extensively labeled, covering both quantitative and qualitative aspects of the agreements. From the names of the involved parties to detailed summaries and industry classifications, our data offers a comprehensive view that enriches AI training and functionality. Includes negative examples.

  • Royalty & Franchise Agreements
  • Intercompany Service Agreements
  • Loan Agreements

CompID Dataset – Beyond Company Financials and Insights

100+ terabytes of raw data from original sources, ensuring traceability and transparency.

Financial ratios, company names, addresses, standardized legal forms and status, and ownership relationships (while respecting privacy laws) for better context.

We offer meticulously curated datasets containing structured financial line items extracted from annual reports across multiple jurisdictions.

Sector-specific, structured data is critical to refining an LLM’s understanding of its specialized domain.

Our datasets include formulas, values, and supporting original company reports (PDF, iXBRL, XBRL, XLSX).


Royalty & Franchise Agreements Dataset – Unlocking Insights from Complex Agreements

Fully curated dataset with extensive labels on quantitative and qualitative agreement aspects.


Intercompany Services Data – Mapping the Intricacies of Service Agreements

Dataset detailing company services with labels covering tagged types, industries, scope, geographical scope, pricing (fee and remuneration details) and contract summaries.


Loan Agreements Dataset – Navigating Lending Landscapes

Curated loan agreements with labels including transaction type, parties, geographical scope, industries, terms, credit ratings, and detailed summaries.

Curated Agreement Data to Power Specific AI Use Cases

Royalty Rates dataset potential use cases:

  • Market Analysis: train models to track industry trends, royalty rates, and standard terms for better deal negotiation.
  • Predictive Modeling: forecast potential franchise success based on historical patterns.
  • Compliance Monitoring: train your models to automate the review of agreements to identify potential risks or deviations from standards.
  • In-house Knowledge Base Augmentation: the database can enhance an in-house knowledge base by providing a comprehensive reference of historical and current royalty rates and franchise terms, supporting better-informed decisions and strategy development.
  • Comparability Factors: 50+ different label types.

Service Fees dataset potential use cases:

  • In-house Knowledge Base Expansion: enrich your internal knowledge base.
  • Trained Model Validation: use the data as an independent validation set for your LegalTech models.
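
As a rough illustration of the validation use case above, the sketch below scores a model's predictions against labels from an independent validation set. The label values and the `precision_recall` helper are hypothetical and only show the general idea; they do not reflect the actual dataset schema.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    pairs = list(zip(predicted, expected))
    true_pos = sum(1 for p, e in pairs if p == 1 and e == 1)
    false_pos = sum(1 for p, e in pairs if p == 1 and e == 0)
    false_neg = sum(1 for p, e in pairs if p == 0 and e == 1)
    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels: 1 = "clause is a service fee term", 0 = "it is not".
expected = [1, 1, 0, 0, 1]   # labels from the independent validation set
predicted = [1, 0, 0, 1, 1]  # outputs of the model under test

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Keeping the validation labels separate from anything the model saw during training is what makes the resulting scores meaningful.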

Loan Rates dataset potential use cases:

  • Risk Assessment: build models to evaluate the risk profiles of borrowers and lenders.
  • Regulatory Compliance: ensure agreements adhere to ever-changing lending practices and requirements.
  • Trend Identification: detect emerging lending patterns or anomalies within specific sectors.

AI Training for Robustness

The Power of Positive and Negative Examples

Elevate your AI models with negative examples to refine their understanding of real-world contract variations. Our datasets include both positive and negative examples.
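
A minimal sketch of how positive and negative examples might be combined into one labeled training set is shown here. The sample clauses and the `build_labeled_set` helper are hypothetical and are not taken from the actual datasets.

```python
import random


def build_labeled_set(positives, negatives, seed=0):
    """Merge positive and negative examples into a shuffled list of (text, label) pairs."""
    labeled = [(text, 1) for text in positives] + [(text, 0) for text in negatives]
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(labeled)
    return labeled


# Hypothetical examples: positives contain a royalty term, negatives do not.
positives = [
    "Licensee shall pay a royalty of 5% of net sales.",
    "Franchisee shall pay a monthly royalty fee of 4% of gross revenue.",
]
negatives = [
    "The parties agree to keep all information confidential.",
    "This agreement is governed by the laws of Ontario.",
]

dataset = build_labeled_set(positives, negatives)
for text, label in dataset:
    print(label, text)
```

Shuffling matters because many training loops read examples in order, and a block of positives followed by a block of negatives can bias early updates.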

Global diversity: your AI models benefit from a wide range of data sourced from various jurisdictions, reflecting diverse accounting regulations and reporting languages.

Having detailed and well-structured datasets is essential for achieving efficient results when using AI technology.

Elevate Your AI with the Power of Data

Our global focus provides diverse data reflecting different accounting standards, enhancing LLM adaptability.

Strengthen your data architecture with seamless access to large bodies of structured and unstructured data, tailored to your requirements.

The AI Data Challenge

High-Quality LLM Training Data – The Key to Unlocking AI Potential

  • Large Language Models (LLMs) are revolutionizing AI, but their accuracy and reliability depend on the quality of data they’re trained on.
  • Generic datasets often lead to bias, inaccuracies, and the dreaded “hallucinations” in AI outputs.
  • Sector-specific, structured data is critical to refining an LLM’s understanding of its specialized domain.

Why Choose Us?

Unleash Your LLM’s Potential

  • Prevent Model Collapse: our high-quality data acts as an independent verification source, helping to mitigate “hallucinations”.
  • Legally Sourced: data obtained directly from government registries, ensuring compliance and ethical use.
  • Precision data: crafted by accounting and contract professionals in Europe and Canada.
  • Flexibility: conveniently access data via the Snowflake Marketplace or in CSV/JSON format, with supporting plain-text documents.
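
For the CSV/JSON delivery option mentioned above, a minimal loading sketch using only the Python standard library might look like this. The column names (`company_id`, `legal_form`, `country`) and the sample rows are assumptions for illustration, not the real export schema.

```python
import csv
import io
import json


def load_csv_records(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_records(text):
    """Parse JSON text into a list of dictionaries (a single object becomes a one-item list)."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical sample exports; real files would be read from disk instead.
SAMPLE_CSV = "company_id,legal_form,country\nC001,Ltd,GB\nC002,GmbH,DE\n"
SAMPLE_JSON = '[{"company_id": "C001", "legal_form": "Ltd", "country": "GB"}]'

csv_records = load_csv_records(SAMPLE_CSV)
json_records = load_json_records(SAMPLE_JSON)

print(len(csv_records), "CSV records;", len(json_records), "JSON records")
```

Both loaders return the same list-of-dictionaries shape, so downstream code does not need to care which format a file arrived in.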

Contact us today to explore how our LLM training datasets can give you the competitive edge.
