April 13, 2026

From Raw PDFs to Structured Claim Representation with Purpose-Built AI

By Andrej Evtimov

The insurance industry has discussed “data-driven claims” for years. Achieving this goal requires a new approach.

There’s a concept in molecular biology called DNA sequencing. It’s the process of taking a complex, seemingly chaotic biological sample and mapping it into a structured code that scientists can read, analyze, and use to make decisions. Before sequencing, a genome was an abstraction; after sequencing, it became actionable intelligence.

Bodily injury claim files face a similar challenge. Before structuring, they consist of disparate PDFs: police reports, medical records, demand letters, billing summaries, witness statements, and adjuster notes. While each document contains facts, they are disconnected, stored in various formats, and spread across many pages. The claim file contains valuable information but remains unstructured.

At amaise, we call a structured knowledge representation of an entire claim a CaseDNA, and it is designed to close this gap. Understanding what a CaseDNA is, how it works, and why it matters is essential for evaluating next-generation claims technology.

Key Takeaways

  • Bodily injury claim files are not disorganized; they are unstructured. Police reports, medical records, demand letters, and billing summaries each contain accurate facts. However, these facts are scattered across formats and pages, requiring adjusters to manually synthesize them before analysis can begin.
  • Structured claim intelligence is not simply an improved document viewer; it represents a fundamentally different approach to claims. A system that builds a CaseDNA reads every document, extracts injuries, treatments, liability factors, and coverage terms, and reconstructs them into a connected knowledge graph where each fact links directly to its source.
  • General-purpose AI cannot reliably address the nuances of insurance claims. Differences such as those between a disc bulge and a herniation, or between patient-reported pain and objectively confirmed findings, directly impact claim value. Models not trained on insurance-specific patterns and terminology often produce plausible but unreliable outputs, posing unacceptable risks to carriers.

Current state: What adjusters see

In most claims management systems, bodily injury claims are organized as file directories. Documents are listed by upload date or type, sometimes with a basic index. Adjusters must open each document, review its contents, take notes, and recall details from previous pages.

This scenario is common in claims handling. The challenge lies not in a lack of information, but in its format, which complicates synthesis. Medical records are organized by provider visit rather than by injury. Police reports are narrative rather than structured data. Demand letters present the plaintiff’s argument instead of a neutral assessment.

Adjusters must synthesize fragmented details to form a coherent understanding of the claim. High caseloads and limited time often compromise this work.

Future state: What structured intelligence looks like

Now consider the same claim file processed by a system that reads every document and reconstructs the information into a structured knowledge graph.

Instead of a directory of PDFs, the adjuster accesses an integrated claim view. Injuries are listed with supporting evidence, including references to relevant documents, clinical findings, treatment timelines, and whether each injury was documented before the accident date.

Accident circumstances are extracted and organized, detailing each party’s account, police documentation, and witness observations, along with areas of agreement or conflict. Liability is presented from multiple perspectives, each supported by evidence.

Coverage terms are mapped to the claim facts. Fraud indicators are highlighted and explained. Treatment trajectories are benchmarked against expected patterns for the documented injury type and severity.

Each fact in this structured view links directly to its source document and page. Adjusters can verify information instantly, and all data remains transparent and traceable. The AI handles the reading and organizing, allowing adjusters to focus on evaluation and decision-making.

How it works

Building a CaseDNA from raw claim documents requires a multi-layered process that surpasses the capabilities of general-purpose AI models.

Document Intelligence

The first layer is document intelligence. Each PDF is read, classified, and parsed. This is complex because electronic health record (EHR) formats vary, police report formats differ by jurisdiction, scan quality is inconsistent, and pages may be out of order, duplicated, or missing. The system must handle these challenges without human intervention.
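To make the page-level cleanup more concrete, here is a minimal Python sketch of one narrow piece of it: dropping duplicate pages, restoring page order, and reporting gaps. It assumes each page already carries a page number detected from its own content (for example, a footer) and that exact duplicates can be caught by hashing whitespace-normalized text. `Page` and `normalize_pages` are hypothetical names for illustration, not part of amaise's system.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical: page number detected from the page itself, not upload or scan order.
    detected_number: int
    text: str


def normalize_pages(pages: list[Page]) -> tuple[list[Page], list[int]]:
    """Drop exact duplicates, restore page order, and report missing page numbers."""
    seen_hashes: set[str] = set()
    unique: list[Page] = []
    for page in pages:
        # Hash whitespace-normalized text so copies that differ only in spacing still match.
        normalized = " ".join(page.text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique.append(page)

    # Order by the number printed on the page rather than by upload order.
    ordered = sorted(unique, key=lambda p: p.detected_number)

    # Any number between 1 and the highest detected page that never appears is a gap.
    present = {p.detected_number for p in ordered}
    highest = max(present, default=0)
    missing = [n for n in range(1, highest + 1) if n not in present]
    return ordered, missing


pages = [
    Page(3, "Discharge summary"),
    Page(1, "Admission note"),
    Page(3, "Discharge  summary"),  # duplicate copy with extra whitespace
    Page(5, "Itemized bill"),
]
ordered, missing = normalize_pages(pages)
print([p.detected_number for p in ordered], missing)  # [1, 3, 5] [2, 4]
```

Real scans rarely produce byte-identical duplicates, so a production system would need fuzzier matching; the exact-hash approach here only shows where deduplication and gap detection fit.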

Extraction

The second layer is extraction. From the parsed documents, the system identifies and extracts facts in categories such as injuries, treatments, providers, dates, accident details, liability factors, and coverage terms. This requires insurance- and medical-specific AI models, because general-purpose models cannot reliably distinguish nuanced clinical terms or document types.

Reconstruction

The third layer is reconstruction. Extracted facts are assembled into a knowledge graph, creating a structured network of relationships. Injuries connect to treatments, providers, and billing. Accident mechanisms relate to injury types, medical findings, and liability assessments. These connections turn isolated facts into actionable claim intelligence.
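A knowledge graph in which every fact keeps its provenance can be sketched with a small adjacency-list structure. The Python below is an illustrative assumption of how such a graph might be shaped: the `Fact`, `Source`, and `ClaimGraph` names, the relation labels, and the sample data are all hypothetical and do not describe the actual CaseDNA schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    document_id: str
    page: int


@dataclass(frozen=True)
class Fact:
    fact_id: str
    kind: str        # e.g. "injury", "treatment", "bill"
    value: str
    source: Source   # every fact records the document and page it came from


@dataclass
class ClaimGraph:
    facts: dict[str, Fact] = field(default_factory=dict)
    # Adjacency list: fact_id -> [(relation, target fact_id), ...]
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_fact(self, fact: Fact) -> None:
        self.facts[fact.fact_id] = fact
        self.edges.setdefault(fact.fact_id, [])

    def link(self, src_id: str, relation: str, dst_id: str) -> None:
        # Refuse edges to unknown facts so every connection stays traceable.
        if src_id not in self.facts or dst_id not in self.facts:
            raise KeyError(f"unknown fact in link {src_id!r} -> {dst_id!r}")
        self.edges[src_id].append((relation, dst_id))

    def neighbors(self, fact_id: str, relation: str) -> list[Fact]:
        return [self.facts[dst] for rel, dst in self.edges.get(fact_id, []) if rel == relation]


# Illustrative data only.
g = ClaimGraph()
g.add_fact(Fact("inj1", "injury", "L4-L5 disc herniation", Source("mri_report.pdf", 2)))
g.add_fact(Fact("trt1", "treatment", "Physical therapy, 12 sessions", Source("pt_notes.pdf", 1)))
g.add_fact(Fact("bill1", "bill", "PT invoice", Source("billing_summary.pdf", 4)))
g.link("inj1", "treated_by", "trt1")
g.link("trt1", "billed_as", "bill1")

# Walk injury -> treatment -> bill, printing where each bill was found.
for treatment in g.neighbors("inj1", "treated_by"):
    for bill in g.neighbors(treatment.fact_id, "billed_as"):
        print(treatment.value, "|", bill.value, "|", bill.source.document_id, "p.", bill.source.page)
```

Keeping the source on the fact itself, rather than in a separate lookup, means any traversal of the graph can cite its evidence without extra joins.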

Reasoning

The fourth layer is reasoning. AI agents trained in insurance claims logic analyze the knowledge graph to generate insights: whether treatment patterns are consistent, whether billing aligns with expectations, and how strong the liability arguments are from both the claimant and carrier perspectives.
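As a deliberately simple, rule-based stand-in for trained reasoning agents, the sketch below applies two hand-written checks to visit data. It flags injuries documented before the accident date and long gaps between post-accident visits. The `Visit` type, the `reasoning_flags` function, and the 30-day default threshold are hypothetical illustrations, not insurance standards or amaise's logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Visit:
    injury: str
    visit_date: date


def reasoning_flags(visits: list[Visit], accident_date: date, max_gap_days: int = 30) -> list[str]:
    """Return readable flags. max_gap_days is an illustrative threshold, not an industry standard."""
    by_injury: dict[str, list[date]] = {}
    for v in visits:
        by_injury.setdefault(v.injury, []).append(v.visit_date)

    flags: list[str] = []
    for injury, dates in sorted(by_injury.items()):
        dates.sort()
        # A record dated before the accident may point to a pre-existing condition.
        if dates[0] < accident_date:
            flags.append(f"{injury}: documented on {dates[0]}, before the accident on {accident_date}")
        # Long gaps between post-accident visits can weaken treatment-pattern consistency.
        post = [d for d in dates if d >= accident_date]
        for earlier, later in zip(post, post[1:]):
            gap = (later - earlier).days
            if gap > max_gap_days:
                flags.append(f"{injury}: {gap}-day treatment gap after {earlier}")
    return flags


visits = [
    Visit("neck strain", date(2025, 1, 10)),
    Visit("neck strain", date(2025, 3, 5)),
    Visit("neck strain", date(2025, 3, 20)),
    Visit("knee sprain", date(2025, 3, 3)),
    Visit("knee sprain", date(2025, 5, 15)),
]
for flag in reasoning_flags(visits, accident_date=date(2025, 3, 1)):
    print(flag)
```

Rules like these only work because the earlier layers have already attached dates and injuries to each visit; on raw PDFs there would be nothing to compare.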

Each layer builds on the previous one. Reasoning requires structured data, which in turn depends on accurate extraction and thorough document reading. Point solutions that address only one layer of this process often fall short in bodily injury claims. The full value lies in an integrated approach.
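The dependency between layers can be illustrated as a chain of functions in which each stage consumes only the previous stage's output. In this hedged Python sketch, `run_pipeline` and the toy lambda stages are hypothetical placeholders for the real document intelligence, extraction, reconstruction, and reasoning components.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(raw_documents: list[bytes], stages: list[tuple[str, Stage]]) -> Any:
    """Run layers in order; each layer sees only the previous layer's output."""
    data: Any = raw_documents
    for name, stage in stages:
        data = stage(data)
        # Empty output means every later layer would work on nothing, so stop here.
        if not data:
            raise ValueError(f"layer {name!r} produced no output; downstream layers cannot run")
    return data


# Toy stand-ins for the four layers (hypothetical; real layers are far more involved).
stages: list[tuple[str, Stage]] = [
    ("document intelligence", lambda docs: [d.decode("utf-8") for d in docs]),
    ("extraction", lambda texts: [t.split(":", 1)[1].strip() for t in texts if t.startswith("Diagnosis:")]),
    ("reconstruction", lambda injuries: {"injuries": injuries}),
    ("reasoning", lambda graph: [f"review treatment for {i}" for i in graph["injuries"]]),
]

print(run_pipeline([b"Diagnosis: cervical strain", b"Invoice total: $900"], stages))
```

Calling it with only the invoice document raises `ValueError` at the extraction layer, because there is nothing for reconstruction and reasoning to build on.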

Why general-purpose AI isn’t enough

A common question is why general-purpose language models are insufficient and why specialized, insurance-trained models are necessary.

Accuracy and reliability are essential. General-purpose models may generate plausible summaries, but in insurance, these summaries can be misleading.

For example, the distinction between a disc bulge and a herniation can significantly affect claim value. Similarly, differentiating between “pain reported by patient” and “pain confirmed by objective findings” impacts claim strength. These nuances require models trained on millions of real claims documents that understand insurance-specific patterns, terminology, and reasoning.

Purpose-built systems also address hallucination, which makes general-purpose models risky for insurance. When every statement in the knowledge graph links to specific source evidence, anything that cannot be verified is flagged immediately rather than presented as fact.
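One way to picture source-linked flagging: every generated statement must cite facts that exist in a fact-to-source index, and anything uncited, or citing an unknown fact, is marked as unverified. The `Statement` type, the `triage` function, and the sample data below are hypothetical; this is a minimal sketch of the idea, not amaise's verification method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    text: str
    cited_fact_ids: tuple[str, ...]  # facts the statement claims to rest on


def triage(statements: list[Statement], fact_index: dict[str, tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Split statements into (supported, flagged) using a fact_id -> (document, page) index."""
    supported: list[str] = []
    flagged: list[str] = []
    for s in statements:
        # Supported only if it cites at least one fact and every cited fact exists in the index.
        if s.cited_fact_ids and all(fid in fact_index for fid in s.cited_fact_ids):
            refs = ", ".join(f"{fact_index[fid][0]} p.{fact_index[fid][1]}" for fid in s.cited_fact_ids)
            supported.append(f"{s.text} [{refs}]")
        else:
            flagged.append(f"UNVERIFIED: {s.text}")
    return supported, flagged


fact_index = {"f1": ("mri_report.pdf", 2)}
statements = [
    Statement("MRI shows an L4-L5 herniation", ("f1",)),
    Statement("Claimant missed six weeks of work", ()),  # no source cited
    Statement("Surgery was recommended", ("f9",)),       # cites a fact that does not exist
]
supported, flagged = triage(statements, fact_index)
print(supported)
print(flagged)
```

Checking that a cited fact exists is only the first step; confirming that the source actually says what the statement claims would require a further comparison that this sketch does not attempt.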

This engineering challenge takes years to solve. It is not a matter of connecting a claims database to a language model and adding a user interface.

Instead, it requires building an end-to-end system in which document intelligence, extraction, knowledge reconstruction, and reasoning work together to meet the insurance industry’s demands for domain specificity and accuracy.

What this means for claims operations

The practical impact of structured claim intelligence is measurable and consistent among carriers that have implemented it. Adjusters spend significantly less time on document review, not by skipping documents, but because comprehensive reading is automated. The time saved is redirected to activities that improve outcomes, including claimant communication, negotiation, decision-making, and collaboration with specialists.

The quality gap between experienced and newer adjusters narrows, as both use the same structured intelligence. Veterans apply their expertise, while newcomers benefit from a comprehensive, verified understanding of each claim.

Organizational intelligence grows over time. As structured claim data accumulates, patterns emerge that individual adjusters may not detect, such as claim-escalation trends, treatment patterns linked to higher costs, and jurisdictional factors predicting litigation risk. Claims are handled proactively rather than reactively.

amaise is not simply a faster way to read documents; it offers a fundamentally different approach to understanding claims.

amaise transforms raw claim files into structured knowledge graphs powered by more than 60 insurance-trained AI models. Learn more at amaise.com.

Frequently Asked Questions

What are the four layers that turn raw PDFs into structured claim intelligence?

The process starts with document intelligence: reading, classifying, and parsing each file regardless of format or scan quality. Extraction follows, pulling out injuries, treatments, providers, dates, and liability factors. Reconstruction assembles those facts into a knowledge graph. Reasoning then applies insurance-trained logic to generate insights on treatment patterns, billing alignment, and liability strength. Each layer depends on the one before it.

How does this approach address the AI hallucination problem?

Every fact in the knowledge graph is tied to a specific source document and page. When information cannot be verified against a source, it is flagged rather than presented as a finding. This makes unverifiable statements visible immediately, instead of letting them surface later in a review or dispute.

What does this mean for adjuster performance and operational outcomes?

Adjusters spend less time reading documents, not by skipping them, but because comprehensive reading is automated. The quality gap between experienced and newer adjusters narrows, since both work from the same verified, structured view of the claim. Over time, as structured claim data accumulates, it reveals patterns that individual adjusters may not detect, shifting operations from reactive to anticipatory.