Unified.to

How to Build Legal Contract Search and Insights with RAG Using Unified's Storage and KMS APIs


February 13, 2026

Retrieval-Augmented Generation (RAG) is one of the most common patterns for building AI systems grounded in enterprise data.

In legal teams, RAG is often used to:

  • Search contracts across multiple repositories
  • Answer clause-level questions
  • Surface renewal terms and termination language
  • Ground AI responses in actual agreement text

But legal RAG systems fail when they:

  • Index documents without respecting permissions
  • Mix tenant data
  • Ignore document structure
  • Treat legal insight as inference instead of retrieval

Unified's Storage and Knowledge Management (KMS) APIs provide the normalized, real-time data layer required to implement contract RAG pipelines correctly.

This guide shows how to build a permission-aware, tenant-isolated, incrementally synchronized RAG pipeline for legal contracts using Unified.

RAG (Retrieval-Augmented Generation) is not model training.

It is a pattern:

  1. Retrieve relevant contract content.
  2. Provide that content as context.
  3. Generate a response grounded in the retrieved text.
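
The three steps above can be sketched as one small function. This is a minimal sketch, not Unified's API: `RetrieveFn`, `GenerateFn`, `buildPrompt`, and `answerFromContracts` are hypothetical names, and the retriever and model are passed in as plain functions so that the flow stays visible.

```typescript
// Hypothetical shapes for illustration; none of these come from Unified's SDK.
interface RetrievedChunk {
  objectId: string;
  text: string;
}

type RetrieveFn = (question: string) => RetrievedChunk[];
type GenerateFn = (prompt: string) => string;

// Step 2: package the retrieved excerpts as the only context the model may use.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (source: ${c.objectId})\n${c.text}`)
    .join("\n\n");
  return (
    "Answer using only the excerpts below. " +
    "If they do not contain the answer, say so.\n\n" +
    `${context}\n\nQuestion: ${question}`
  );
}

// Steps 1 to 3: retrieve, provide context, generate.
function answerFromContracts(
  question: string,
  retrieve: RetrieveFn,
  generate: GenerateFn
): string {
  const chunks = retrieve(question);
  if (chunks.length === 0) {
    // Nothing retrieved means nothing to ground on, so skip generation entirely.
    return "No matching contract text was found.";
  }
  return generate(buildPrompt(question, chunks));
}
```

Returning early when retrieval is empty is one way to keep the system from answering out of the model's general knowledge instead of the contracts.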

For legal use cases, RAG must be:

  • Deterministic
  • Permission-aware
  • Metadata-driven
  • Non-inferential

Legal RAG answers questions like:

  • Which contracts include indemnification clauses?
  • Which agreements auto-renew?
  • Which vendor contracts expire next quarter?
  • Where is termination for convenience defined?

It does not determine compliance or legal validity.

Unified provides the data retrieval layer for this architecture. You own embeddings, vector storage, and model behavior.

Where Legal Contracts Live in SaaS Systems

Enterprise contracts typically exist in:

  • File Storage (PDF contracts)
  • Knowledge Management systems (Notion, Confluence contract pages)

Unified normalizes both.

StorageFile (File Storage API)

Contracts stored as files are represented as StorageFile.

Relevant fields:

  • id
  • created_at
  • updated_at
  • name
  • parent_id
  • user_id
  • size
  • type (FILE or FOLDER)
  • mime_type
  • permissions[]
  • download_url
  • hash
  • version
  • web_url

Important RAG implications:

  • download_url expires after 1 hour (must re-fetch when reprocessing)
  • permissions[] includes:
    • user_id
    • group_id
    • roles (OWNER, READ, WRITE)
  • Explicit permission metadata is available only for file objects
  • Unified does not enforce access — your RAG system must

This makes StorageFile ideal for embedding contract PDFs with permission metadata attached.
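
As an illustration of how permissions[] might be evaluated at query time, here is a minimal sketch. It makes assumptions the article does not state: that every permission field is optional, that any role (OWNER, READ, or WRITE) implies read access, and that your application can map its own users to the provider-side user and group IDs. `Principal` and `canRead` are hypothetical names.

```typescript
// Shape mirrors the permission fields listed above; optionality is an assumption.
type Role = "OWNER" | "READ" | "WRITE";

interface FilePermission {
  user_id?: string;
  group_id?: string;
  roles?: Role[];
}

// Hypothetical representation of the querying user, already mapped to provider IDs.
interface Principal {
  userId: string;
  groupIds: string[];
}

// True when any permission entry grants the principal at least one role,
// either directly or through one of their groups. Default is deny.
function canRead(permissions: FilePermission[], principal: Principal): boolean {
  return permissions.some((p) => {
    const hasRole = (p.roles ?? []).length > 0;
    const matchesUser = p.user_id !== undefined && p.user_id === principal.userId;
    const matchesGroup =
      p.group_id !== undefined && principal.groupIds.indexOf(p.group_id) !== -1;
    return hasRole && (matchesUser || matchesGroup);
  });
}
```

A file with an empty permissions[] array is unreadable under this sketch, which errs on the side of hiding a contract rather than exposing it.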

KmsPage (Knowledge Management API)

Contracts stored as structured pages use KmsPage.

Relevant fields:

  • id
  • created_at
  • updated_at
  • title
  • type (HTML, MARKDOWN, TEXT, OTHER)
  • space_id
  • parent_id
  • is_active
  • user_id
  • download_url
  • metadata[]
  • has_children
  • web_url

Important RAG implications:

  • No explicit permissions[] structure
  • Access must be enforced at the application layer
  • Content must be fetched via download_url

RAG pipelines must treat KMS content differently than file storage content.
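
Because KmsPage carries no permissions[], the access rule has to come from your application. One possible shape, offered only as a sketch, is a policy keyed by space_id. `SpaceAcl`, `PageRef`, and `canReadPage` are hypothetical names, and scoping by space is an assumption about how your product models access.

```typescript
// Hypothetical application-owned policy: which users may read which KMS spaces.
type SpaceAcl = Map<string, Set<string>>; // space_id -> allowed user ids

interface PageRef {
  id: string;
  space_id?: string;
}

// Default deny: a page with no space, or a space with no policy entry, is not readable.
function canReadPage(page: PageRef, userId: string, acl: SpaceAcl): boolean {
  if (page.space_id === undefined) return false;
  return acl.get(page.space_id)?.has(userId) ?? false;
}
```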

Step 1: Connect Contract Data Sources

Authorize integrations using Embedded Auth.

Each integration produces a connection_id.

In RAG systems, connection_id must be treated as the tenant boundary.

All Unified endpoints require connection_id. Unified enforces isolation at the connection level.

Your vector database must also filter by connection_id.
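
One way to make that filter hard to forget is to partition the index by connection_id so that no query can run without one. The sketch below is an in-memory stand-in, not a real vector database: `TenantPartitionedIndex` is a hypothetical class, and a keyword match takes the place of similarity search to keep the example short.

```typescript
interface StoredChunk {
  objectId: string;
  text: string;
}

// One partition per connection_id, so a query can never see another tenant's chunks.
class TenantPartitionedIndex {
  private partitions = new Map<string, StoredChunk[]>();

  add(connectionId: string, chunk: StoredChunk): void {
    const partition = this.partitions.get(connectionId) ?? [];
    partition.push(chunk);
    this.partitions.set(connectionId, partition);
  }

  // connectionId is a required argument: there is no "search everything" method.
  search(connectionId: string, term: string): StoredChunk[] {
    const needle = term.toLowerCase();
    return (this.partitions.get(connectionId) ?? []).filter((c) =>
      c.text.toLowerCase().includes(needle)
    );
  }
}
```

In a hosted vector database the same idea usually maps to a namespace, collection, or mandatory metadata filter per connection.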

Step 2: Subscribe to Contract Updates

RAG systems require freshness.

Unified supports:

  • Native webhooks (push-based)
  • Virtual webhooks (polling-based)

Subscribe to:

  • File created
  • File updated
  • Page created
  • Page updated

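Deletion event support varies by provider. If it is unavailable, fall back to incremental sync via updated_gte and periodically reconcile the IDs in your index against a fresh listing, because a query filtered on updated_gte may not return objects that no longer exist.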

For legal RAG, webhook-driven indexing is strongly preferred over polling.
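
A webhook receiver typically turns incoming events into indexing work. The sketch below shows only that planning step. The event shape is hypothetical (Unified's actual webhook payload may differ), it assumes events arrive in order, and `planIndexActions` is an illustrative name. It keeps only the latest event per object so that a burst of updates triggers one re-index.

```typescript
// Hypothetical event shape; the real webhook payload may differ.
type SourceType = "storage_file" | "kms_page";
type ChangeKind = "created" | "updated" | "deleted";

interface ContractEvent {
  connectionId: string;
  sourceType: SourceType;
  change: ChangeKind;
  objectId: string;
}

interface IndexAction {
  kind: "reindex" | "remove";
  connectionId: string;
  sourceType: SourceType;
  objectId: string;
}

// Collapse a batch to the last event per object, then map it to an index action.
function planIndexActions(events: ContractEvent[]): IndexAction[] {
  const latest = new Map<string, ContractEvent>();
  for (const e of events) {
    latest.set(`${e.connectionId}:${e.sourceType}:${e.objectId}`, e);
  }
  return Array.from(latest.values()).map(
    (e): IndexAction => ({
      kind: e.change === "deleted" ? "remove" : "reindex",
      connectionId: e.connectionId,
      sourceType: e.sourceType,
      objectId: e.objectId,
    })
  );
}
```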

Step 3: Retrieve Metadata First

Example (Storage):

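// `sdk` is an initialized Unified SDK client; `connectionId` scopes the call to one tenant's integration.
const files = await sdk.storage.listStorageFiles({
  connectionId,
  limit: 100,
  // Request only the metadata needed to decide what to index.
  fields: 'id,name,type,mime_type,updated_at,permissions'
});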

Example (KMS):

const pages = await sdk.kms.listKmsPages({
  connectionId,
  limit: 100,
  fields: 'id,title,space_id,updated_at'
});

List endpoints return metadata only.

Full content must be retrieved explicitly.
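
The examples above fetch a single page of 100 results. To walk a whole repository you would repeat the call until the results run out. The sketch below assumes offset-based pagination (a `limit` and an `offset` parameter), which should be checked against the API reference. `ListPage` and `listAll` are hypothetical names, and the list call is injected so that the loop does not depend on a particular SDK signature.

```typescript
// Hypothetical wrapper around a list call such as sdk.storage.listStorageFiles.
type ListPage<T> = (args: { limit: number; offset: number }) => Promise<T[]>;

// Keep requesting pages until one comes back shorter than the page size.
async function listAll<T>(listPage: ListPage<T>, limit = 100): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += limit) {
    const page = await listPage({ limit, offset });
    all.push(...page);
    if (page.length < limit) break; // a short page means there is nothing left
  }
  return all;
}
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.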

Step 4: Retrieve Full Contract Content

Storage:

// Request download_url right before downloading, since the URL is short-lived.
const file = await sdk.storage.getStorageFile({
  connectionId,
  id: fileId,
  fields: 'download_url'
});

KMS:

const page = await sdk.kms.getKmsPage({
  connectionId,
  id: pageId,
  fields: 'download_url'
});

Fetch the content using the returned download_url.

Unified does not inline contract bodies.
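
Since download_url expires, a download step is safer if it asks for a fresh URL on every attempt instead of reusing a stored one. Here is a minimal sketch under stated assumptions: `DownloadDeps` and `fetchContractBody` are hypothetical, both the URL lookup and the HTTP call are injected, and the body is treated as text. A real PDF would be downloaded as bytes and passed through a text extractor.

```typescript
// Hypothetical dependencies, injected so the sketch does not assume a specific SDK or HTTP client.
interface DownloadDeps {
  // For example, a wrapper around getStorageFile or getKmsPage with fields: 'download_url'.
  getDownloadUrl: (id: string) => Promise<string>;
  download: (url: string) => Promise<{ ok: boolean; status: number; body: string }>;
}

async function fetchContractBody(
  id: string,
  deps: DownloadDeps,
  maxAttempts = 2
): Promise<string> {
  let lastStatus = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Request a fresh URL on every attempt rather than reusing a stored one.
    const url = await deps.getDownloadUrl(id);
    const res = await deps.download(url);
    if (res.ok) return res.body;
    lastStatus = res.status;
  }
  throw new Error(`Download failed for ${id} (last status ${lastStatus})`);
}
```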

Step 5: Chunk Contracts for RAG

Legal RAG quality depends on chunking.

Chunk by:

  • Section headings
  • Clause boundaries
  • Defined term groupings
  • Structured paragraphs

Attach metadata to each chunk:

  • connection_id
  • object_type
  • object_id
  • updated_at
  • permissions (for files)
  • space_id / parent_id

Unified normalizes the schema. You manage embedding and vector storage.
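
A heading-based splitter is one simple way to implement the chunking described above. The sketch below is a heuristic, not a legal-document parser. It assumes headings sit on their own line and start with a dotted number ("1.", "2.3") or with "Section" or "Article" followed by a number, and it will misfire on body lines that happen to start that way. `ContractChunk` and `chunkBySection` are hypothetical names.

```typescript
interface ChunkMetadata {
  connection_id: string;
  object_type: "storage_file" | "kms_page";
  object_id: string;
  updated_at: string;
}

interface ContractChunk {
  heading: string;
  text: string;
  metadata: ChunkMetadata;
}

// Assumed heading convention: "1. Term", "2.3 Fees", "Section 4", "Article IV. Indemnification".
const HEADING =
  /^(?:\d+\.(?:\d+\.?)*|(?:Section|Article)\s+[\dIVXLC]+(?:\.\d+)*\.?)(?=\s|$)/i;

function chunkBySection(text: string, metadata: ChunkMetadata): ContractChunk[] {
  const chunks: ContractChunk[] = [];
  let heading = "Preamble"; // label for any text before the first heading
  let body: string[] = [];

  // Emit the section collected so far; sections with no body text are skipped.
  const flush = (): void => {
    const joined = body.join("\n").trim();
    if (joined.length > 0) chunks.push({ heading, text: joined, metadata });
    body = [];
  };

  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (HEADING.test(trimmed)) {
      flush();
      heading = trimmed;
    } else {
      body.push(line);
    }
  }
  flush();
  return chunks;
}
```

Every chunk carries the same metadata object, so tenant and permission filters can be applied per chunk at query time without looking up the parent document.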

Step 6: Store in Your Vector Database

Unified does not:

  • Store embeddings
  • Cache payloads
  • Maintain vector indexes

Store embeddings in your infrastructure.

Use metadata to support:

  • Tenant isolation
  • Permission enforcement
  • Update replacement
  • Space/folder scoping
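
Update replacement is the item most easily gotten wrong: when a contract changes, every old chunk for that object has to go, not only the ones that still match. The sketch below uses an in-memory array as a stand-in for a vector database. `InMemoryVectorStore` and `replaceObject` are hypothetical names, and a real store would do the delete and insert through its own API.

```typescript
interface VectorRecord {
  connection_id: string;
  object_id: string;
  updated_at: string; // ISO 8601 timestamp
  text: string;
  embedding: number[];
}

class InMemoryVectorStore {
  private records: VectorRecord[] = [];

  // Drop every existing chunk of the object, then add the fresh ones,
  // so clauses removed from the contract do not linger in the index.
  replaceObject(connectionId: string, objectId: string, fresh: VectorRecord[]): void {
    this.records = this.records
      .filter((r) => !(r.connection_id === connectionId && r.object_id === objectId))
      .concat(fresh);
  }

  forConnection(connectionId: string): VectorRecord[] {
    return this.records.filter((r) => r.connection_id === connectionId);
  }
}
```

Calling `replaceObject` with an empty array removes the object altogether, which covers deletions with the same code path.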

Step 7: Permission-Aware RAG Retrieval

When a user queries:

  1. Embed the query.
  2. Retrieve top-matching chunks.
  3. Filter by:
    • connection_id
    • permissions[] (for files)
    • Application-level user rules (for pages)
  4. Pass filtered context to the model.
  5. Generate response.

File objects include explicit permission metadata.

KMS objects do not.

Your RAG system must enforce access before generation.
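
The retrieval flow above can be sketched with a brute-force cosine search. This is an illustration, not a production retriever: `IndexedChunk`, `cosine`, and `retrieve` are hypothetical, the access rule is passed in as an `isAllowed` predicate (file permissions or page rules), and the sketch applies the tenant and permission filters before ranking. The end result matches the steps above, but filtering first keeps disallowed chunks from using up slots in the top-k.

```typescript
interface IndexedChunk {
  connection_id: string;
  object_id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function retrieve(
  queryEmbedding: number[],
  chunks: IndexedChunk[],
  connectionId: string,
  isAllowed: (chunk: IndexedChunk) => boolean,
  topK = 5
): IndexedChunk[] {
  return chunks
    // Tenant boundary and access check come first, before any scoring.
    .filter((c) => c.connection_id === connectionId && isAllowed(c))
    .map((c) => ({ chunk: c, score: cosine(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}
```

Only the chunks returned here should be placed in the model's context.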

Real-Time RAG vs Static Indexing

Legal RAG systems require up-to-date contracts.

Use:

  • Webhooks for real-time ingestion
  • updated_gte for incremental sync
  • Re-chunking and re-embedding on updates

RAG without synchronization leads to stale contract answers.

Unified supports event-driven ingestion but does not manage embeddings.
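
Incremental sync with updated_gte comes down to keeping a cursor. Here is a minimal sketch in which `Syncable`, `ListUpdated`, and `syncOnce` are hypothetical names, and the list call is injected and assumed to return every object whose updated_at is at or after the cursor.

```typescript
interface Syncable {
  id: string;
  updated_at: string; // ISO 8601 timestamp
}

// Hypothetical wrapper around a list call that passes the cursor as updated_gte.
type ListUpdated<T> = (updatedGte: string) => Promise<T[]>;

// Re-index everything changed since the cursor and return the next cursor to persist.
async function syncOnce<T extends Syncable>(
  cursor: string,
  listUpdated: ListUpdated<T>,
  reindex: (item: T) => Promise<void>
): Promise<string> {
  const items = await listUpdated(cursor);
  let next = cursor;
  for (const item of items) {
    await reindex(item); // re-chunk and re-embed this object
    if (Date.parse(item.updated_at) > Date.parse(next)) next = item.updated_at;
  }
  return next;
}
```

Because the comparison is "greater than or equal", the object that set the cursor is fetched again on the next run, so the re-index step needs to be idempotent.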

Use RAG when you need:

  • Clause-level search
  • Renewal tracking
  • Cross-repository contract discovery
  • AI assistants grounded in agreement text
  • Permission-aware contract Q&A

RAG is appropriate when retrieval is required.

It is not a substitute for legal interpretation.

Unified is not 'a RAG tool.'

It is:

  • The normalized retrieval layer
  • The tenant-isolated connection layer
  • The webhook-driven synchronization layer
  • The permission-aware file metadata layer

Legal RAG systems require:

  • Real-time SaaS access
  • Stable schemas across providers
  • Explicit permission metadata
  • Deterministic tenant scoping

Unified provides those foundations.

You own embeddings, indexing, and generation.

Closing Thoughts

RAG is powerful for legal insight — but only when retrieval boundaries are clear.

Unified's Storage and Knowledge Management APIs provide:

  • Normalized contract objects
  • Real-time SaaS access
  • Webhook-driven synchronization
  • Explicit file permission metadata
  • Connection-level tenant isolation

That clarity makes RAG defensible.
