Published April 17, 2024

How to Train AI with CRM Data using Unified's CRM API

January 14, 2026

Training AI on CRM data sounds straightforward. Pull deals, pull contacts, join companies, and turn the result into rows for embeddings or model training.

In practice, this only works if the underlying CRM data is consistent across customers.

CRMs don't just differ in UI. They model pipelines, stages, ownership, and activity differently. Some systems treat deals as rigid opportunities with fixed stages. Others allow custom pipelines or stage semantics that don't align cleanly. Contacts can appear multiple times. Activities live as notes, tasks, emails, meetings, or page views, each with different structures.

If you ignore those differences, AI features become fragile. You end up hardcoding assumptions per CRM vendor or quietly dropping fields that don't behave the way you expect.

Unified's CRM API is designed to prevent that. Instead of building training logic on vendor-specific objects, you work against a single normalized set of CRM objects:

deal, contact, company, pipeline, event, lead

The same normalized layer can also power retrieval-augmented generation (RAG) workflows, where CRM notes, deals, and activity are indexed and retrieved at query time to ground AI outputs in customer history.

This guide shows how to build a training-ready dataset from that normalized layer in TypeScript, using only documented fields, list parameters, and SDK methods. No UI. No dashboards. Just a backend pipeline you can run on a schedule or in response to updated data.

Prerequisites

Node.js v18+
A Unified account with a CRM integration enabled
Your Unified API key
A customer CRM connectionId

Step 1: Set up your project

mkdir crm-ai-training-demo
cd crm-ai-training-demo
npm init -y
npm install @unified-api/typescript-sdk dotenv

Create a .env file:

UNIFIED_API_KEY=your_unified_api_key
CONNECTION_CRM=your_customer_crm_connection_id

Step 2: Initialize the SDK

import "dotenv/config";
import { UnifiedTo } from "@unified-api/typescript-sdk";

const { UNIFIED_API_KEY, CONNECTION_CRM } = process.env;

if (!UNIFIED_API_KEY) throw new Error("Missing UNIFIED_API_KEY");
if (!CONNECTION_CRM) throw new Error("Missing CONNECTION_CRM");

const sdk = new UnifiedTo({
  security: { jwt: UNIFIED_API_KEY },
});

const connectionId = CONNECTION_CRM;

Step 3: Understand the normalized CRM objects (critical)

This guide only uses fields explicitly defined in the CRM data models and list endpoints.

Deals (`CrmDeal`)

Deals are your core training record. Useful fields include:

id
created_at, updated_at (ISO-8601)
name
amount (number)
currency (3-letter ISO code)
closing_at (expected close datetime)
closed_at (actual close datetime)
stage, stage_id
pipeline, pipeline_id
source
probability (number; range not defined)
lost_reason, won_reason
user_id (reference to HrisEmployee)
contact_ids[] (references to CrmContact)
company_ids[] (references to CrmCompany)
metadata[] (structured custom fields)
raw (vendor-native payload)

Contacts (`CrmContact`)

Contacts give person-level context:

id, created_at, updated_at
name, first_name, last_name, title, department, image_url
company (organization name string)
emails[] with type enum: WORK, HOME, OTHER
telephones[] with type enum: WORK, HOME, OTHER, FAX, MOBILE
deal_ids[], company_ids[]
address (structured)
user_id
link_urls[]
metadata[]
raw
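Because emails[] carries a type, the first entry is not the only reasonable choice for a contact's email. Here is a minimal sketch of a type-aware choice. It assumes each entry exposes the address as `email` alongside the documented `type`. The `pickEmailByType` helper and its WORK, OTHER, HOME preference order are hypothetical choices for illustration:

```typescript
// Email types documented for CrmContact.emails[].type.
type EmailType = "WORK" | "HOME" | "OTHER";

// Assumed shape of one emails[] entry; the `email` field name is an assumption.
interface ContactEmail {
  email?: string;
  type?: EmailType;
}

// Hypothetical preference order: work addresses first, then other, then home.
const EMAIL_PREFERENCE: EmailType[] = ["WORK", "OTHER", "HOME"];

// Returns the highest-preference email. If no entry has a preferred type,
// it falls back to the first entry, which may have no type at all.
function pickEmailByType(emails: ContactEmail[] | undefined): ContactEmail | undefined {
  if (!emails || emails.length === 0) return undefined;
  for (const t of EMAIL_PREFERENCE) {
    const match = emails.find((e) => e.type === t);
    if (match) return match;
  }
  return emails[0];
}

// Example: the WORK entry wins even though it is listed second.
const chosen = pickEmailByType([
  { email: "jane@home.example", type: "HOME" },
  { email: "jane@acme.example", type: "WORK" },
]);
console.log(chosen?.email); // jane@acme.example
```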

Companies (`CrmCompany`)

Companies give firmographic context:

id, created_at, updated_at, name
deal_ids[], contact_ids[]
emails[], telephones[], websites[]
address
is_active, tags[]
description, industry, employees, timezone
domains[]
user_id
metadata[]
raw

Pipelines (`CrmPipeline`) optional

Use pipelines if you need stage metadata:

stages[] where each stage includes:
- id, name, active, deal_probability, is_closed, display_order

Events (`CrmEvent`) optional

Events represent activity and engagement:

type enum: NOTE, EMAIL, TASK, MEETING, CALL, MARKETING_EMAIL, FORM, PAGE_VIEW
Nested objects vary by type (note, meeting, email, call, task, marketing_email, form, page_view)
Relationship arrays: deal_ids[], company_ids[], contact_ids[], lead_ids[]
user_id
raw

Leads (`CrmLead`) optional

Leads are top-of-funnel records:

id, created_at, updated_at, name, first_name, last_name
company_id, contact_id, company_name
source, status
user_id, creator_user_id
emails[], telephones[], link_urls[]
metadata[]
raw

Step 4: Build reliable list fetchers (pagination + incremental pulls)

All list endpoints share the same pagination approach:

limit default 100 (and typically cannot exceed 100)
offset is zero-based
You know you're done when returned results < requested limit
Incremental pulls use updated_gte (ISO-8601)
Some integrations may not support every parameter; verify per-integration feature support in the dashboard
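The rules above reduce to a single loop, and every fetcher in this guide repeats it. The following sketch expresses that loop as a generic helper. `fetchAllPages` and `FetchPage` are hypothetical names. The callback stands in for an SDK list call:

```typescript
// A page fetcher receives limit/offset and resolves to one page of results.
type FetchPage<T> = (limit: number, offset: number) => Promise<T[] | undefined>;

// Walks a zero-based offset until a page comes back empty or shorter than
// the requested limit, which is the stopping rule described above.
async function fetchAllPages<T>(fetchPage: FetchPage<T>, pageSize = 100): Promise<T[]> {
  const out: T[] = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(pageSize, offset);
    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break; // short page: nothing left to fetch
    offset += pageSize;
  }
  return out;
}

// Example with an in-memory stand-in for an API: 250 records, pages of 100.
const records = Array.from({ length: 250 }, (_, i) => ({ id: String(i) }));
const fakeFetch: FetchPage<{ id: string }> = async (limit, offset) =>
  records.slice(offset, offset + limit);

fetchAllPages(fakeFetch).then((all) => console.log(all.length)); // 250

// With the SDK, the callback could wrap a list call, e.g.:
// fetchAllPages((limit, offset) =>
//   sdk.crm.listCrmDeals({ connectionId, limit, offset }));
```

Keeping the stopping rule in one place means you only change it once. The per-object fetchers below spell the loop out in full instead.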

Below are backend-only helpers that page through deals, contacts, and companies until the last page is reached.

type Sort = "name" | "updated_at" | "created_at";
type Order = "asc" | "desc";

type ListBaseOpts = {
  pageSize?: number;
  updated_gte?: string;
  sort?: Sort;
  order?: Order;
  query?: string;
  fields?: string;
  raw?: string;
};

async function fetchAllDeals(
  opts?: ListBaseOpts & {
    company_id?: string;
    contact_id?: string;
    user_id?: string;
    pipeline_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmDeals({
      connectionId,
      limit: pageSize,
      offset,
      // Unset filters are passed as undefined rather than "", so the SDK can
      // omit them instead of sending empty query parameters.
      updated_gte: opts?.updated_gte || undefined,
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query || undefined,
      company_id: opts?.company_id || undefined,
      contact_id: opts?.contact_id || undefined,
      user_id: opts?.user_id || undefined,
      pipeline_id: opts?.pipeline_id || undefined,
      fields: opts?.fields || undefined,
      raw: opts?.raw || undefined,
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

async function fetchAllContacts(
  opts?: ListBaseOpts & {
    company_id?: string;
    deal_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmContacts({
      connectionId,
      limit: pageSize,
      offset,
      // Unset filters stay undefined instead of being sent as "".
      updated_gte: opts?.updated_gte || undefined,
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query || undefined,
      company_id: opts?.company_id || undefined,
      deal_id: opts?.deal_id || undefined,
      user_id: opts?.user_id || undefined,
      fields: opts?.fields || undefined,
      raw: opts?.raw || undefined,
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

async function fetchAllCompanies(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmCompanies({
      connectionId,
      limit: pageSize,
      offset,
      // Unset filters stay undefined instead of being sent as "".
      updated_gte: opts?.updated_gte || undefined,
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query || undefined,
      deal_id: opts?.deal_id || undefined,
      contact_id: opts?.contact_id || undefined,
      user_id: opts?.user_id || undefined,
      fields: opts?.fields || undefined,
      raw: opts?.raw || undefined,
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

Step 5: Index contacts and companies for joins

Deals link to contacts and companies via ID arrays (contact_ids[], company_ids[]). Build lookup tables so joins are deterministic.

function indexById<T extends { id?: string }>(rows: T[]): Record<string, T> {
  const out: Record<string, T> = {};
  for (const r of rows) {
    if (!r.id) continue;
    out[r.id] = r;
  }
  return out;
}
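A deal's contact_ids[] or company_ids[] can reference records you did not fetch, for example after an incremental pull. The following minimal sketch makes unresolved references visible instead of silently dropping them. `resolveRefs` is a hypothetical helper that works on the lookup tables indexById returns:

```typescript
// Splits referenced IDs into records found in the index and IDs with no
// matching record, so missing joins can be counted or logged.
function resolveRefs<T>(
  ids: string[] | undefined,
  index: Record<string, T>
): { found: T[]; missing: string[] } {
  const found: T[] = [];
  const missing: string[] = [];
  for (const id of ids ?? []) {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(index, id)) found.push(index[id]);
    else missing.push(id);
  }
  return { found, missing };
}

// Example: one contact resolves, one does not.
const contactsIndex: Record<string, { id: string; title?: string }> = {
  c1: { id: "c1", title: "VP Sales" },
};
const refs = resolveRefs(["c1", "c9"], contactsIndex);
console.log(refs.found.length, refs.missing); // 1 [ 'c9' ]
```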

Step 6: Decide what your 'training row' is

A training row should be:

deterministic
stable across runs
based on fields that exist in the normalized schema
explicit about missingness
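One practical detail behind "explicit about missingness" is that JSON.stringify omits properties whose value is undefined. As a result, serialized rows with missing optional fields end up with different keys. The following sketch assumes rows are serialized as JSON. It fills every expected column with null and sorts by an ID column for run-to-run stability. `finalizeRows` is a hypothetical helper:

```typescript
// Emits exactly the listed columns for every row (extra keys are dropped),
// using null for missing values, then sorts rows by the ID column.
function finalizeRows<T extends Record<string, unknown>>(
  rows: T[],
  columns: string[],
  idColumn: string
): Record<string, unknown>[] {
  const filled = rows.map((row) => {
    const out: Record<string, unknown> = {};
    for (const col of columns) {
      const value = row[col];
      out[col] = value === undefined ? null : value;
    }
    return out;
  });
  // Plain string comparison (not locale-aware) keeps ordering identical across machines.
  return filled.sort((a, b) => {
    const ka = String(a[idColumn]);
    const kb = String(b[idColumn]);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

const finalized = finalizeRows(
  [{ deal_id: "d2", amount: 5000 }, { deal_id: "d1" }],
  ["deal_id", "amount"],
  "deal_id"
);
console.log(JSON.stringify(finalized)); // [{"deal_id":"d1","amount":null},{"deal_id":"d2","amount":5000}]
```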

Below is a conservative row shape that stays within documented fields.

type TrainingRow = {
  deal_id: string;

  deal_created_at?: string;
  deal_updated_at?: string;

  amount?: number;
  currency?: string;

  pipeline?: string;
  pipeline_id?: string;
  stage?: string;
  stage_id?: string;

  closing_at?: string;
  closed_at?: string;

  probability?: number;
  source?: string;

  owner_user_id?: string;

  contact_count: number;
  company_count: number;

  primary_contact_id?: string;
  primary_contact_title?: string;
  primary_contact_department?: string;
  primary_contact_email_type?: string;

  primary_company_id?: string;
  primary_company_industry?: string;
  primary_company_employees?: number;
  primary_company_is_active?: boolean;
  primary_company_timezone?: string;

  label?: "CLOSED" | "OPEN";
};

Note: This uses a simple label definition:

label = "CLOSED" when closed_at exists
otherwise label = "OPEN"

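That is safe because closed_at is documented as 'the date that this deal closed.' Keep two limits in mind. First, CLOSED covers both won and lost deals. Second, the label is computed from closed_at. If you train a model to predict the label, leave that field out of the inputs, along with any stage values that may encode closure (such as a closed-won stage).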

Step 7: Build training rows from deals + joins

This step converts normalized CRM objects into training rows.

function buildTrainingRows(input: {
  deals: any[];
  contactsById: Record<string, any>;
  companiesById: Record<string, any>;
}): TrainingRow[] {
  const { deals, contactsById, companiesById } = input;

  const rows: TrainingRow[] = [];

  for (const d of deals) {
    if (!d.id) continue;

    const contactIds: string[] = Array.isArray(d.contact_ids) ? d.contact_ids : [];
    const companyIds: string[] = Array.isArray(d.company_ids) ? d.company_ids : [];

    const primaryContactId = contactIds[0];
    const primaryCompanyId = companyIds[0];

    const primaryContact = primaryContactId ? contactsById[primaryContactId] : undefined;
    const primaryCompany = primaryCompanyId ? companiesById[primaryCompanyId] : undefined;

    const primaryEmail = primaryContact?.emails?.[0];
    const primaryEmailType = primaryEmail?.type;

    const row: TrainingRow = {
      deal_id: d.id,

      deal_created_at: d.created_at,
      deal_updated_at: d.updated_at,

      amount: typeof d.amount === "number" ? d.amount : undefined,
      currency: typeof d.currency === "string" ? d.currency : undefined,

      pipeline: typeof d.pipeline === "string" ? d.pipeline : undefined,
      pipeline_id: typeof d.pipeline_id === "string" ? d.pipeline_id : undefined,
      stage: typeof d.stage === "string" ? d.stage : undefined,
      stage_id: typeof d.stage_id === "string" ? d.stage_id : undefined,

      closing_at: d.closing_at,
      closed_at: d.closed_at,

      probability: typeof d.probability === "number" ? d.probability : undefined,
      source: typeof d.source === "string" ? d.source : undefined,

      owner_user_id: typeof d.user_id === "string" ? d.user_id : undefined,

      contact_count: contactIds.length,
      company_count: companyIds.length,

      primary_contact_id: primaryContactId,
      primary_contact_title: primaryContact?.title,
      primary_contact_department: primaryContact?.department,
      primary_contact_email_type: primaryEmailType,

      primary_company_id: primaryCompanyId,
      primary_company_industry: primaryCompany?.industry,
      primary_company_employees: typeof primaryCompany?.employees === "number" ? primaryCompany.employees : undefined,
      primary_company_is_active: typeof primaryCompany?.is_active === "boolean" ? primaryCompany.is_active : undefined,
      primary_company_timezone: primaryCompany?.timezone,

      label: d.closed_at ? "CLOSED" : "OPEN",
    };

    rows.push(row);
  }

  return rows;
}

Step 8: Optional enrichment with pipeline stages

If you want stage metadata (like is_closed or deal_probability), you can pull pipelines and map:

deal.pipeline_id → pipeline.id
deal.stage_id → pipeline.stages[].id

async function fetchAllPipelines(opts?: ListBaseOpts) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmPipelines({
      connectionId,
      limit: pageSize,
      offset,
      // Unset filters stay undefined instead of being sent as "".
      updated_gte: opts?.updated_gte || undefined,
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query || undefined,
      fields: opts?.fields || undefined,
      raw: opts?.raw || undefined,
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

function indexStagesByPipeline(pipelines: any[]) {
  const out: Record<string, Record<string, any>> = {};
  for (const p of pipelines) {
    if (!p.id) continue;
    const stages: any[] = Array.isArray(p.stages) ? p.stages : [];
    out[p.id] = {};
    for (const s of stages) {
      if (!s.id) continue;
      out[p.id][s.id] = s;
    }
  }
  return out;
}

You can then enrich each training row with stage attributes if both IDs exist.
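A sketch of that enrichment follows. It assumes the nested index returned by indexStagesByPipeline and the documented stage fields is_closed and deal_probability. `enrichWithStage` and the `stage_*` output columns are hypothetical. The row type is reduced to the two IDs the lookup needs:

```typescript
// Documented stage fields used here; other stage fields are ignored.
interface PipelineStage {
  id?: string;
  is_closed?: boolean;
  deal_probability?: number; // scale is not defined in the schema
}

// Minimal row shape: only the IDs needed for the lookup (TrainingRow has both).
interface StageKeyed {
  pipeline_id?: string;
  stage_id?: string;
}

type StageFeatures = {
  stage_is_closed?: boolean;
  stage_deal_probability?: number;
};

// Adds stage attributes only when both IDs exist and the stage is found;
// otherwise the new fields stay undefined, i.e. explicitly unknown.
function enrichWithStage<R extends StageKeyed>(
  row: R,
  stagesByPipeline: Record<string, Record<string, PipelineStage>>
): R & StageFeatures {
  const { pipeline_id, stage_id } = row;
  const stage =
    pipeline_id && stage_id ? stagesByPipeline[pipeline_id]?.[stage_id] : undefined;
  return {
    ...row,
    stage_is_closed: typeof stage?.is_closed === "boolean" ? stage.is_closed : undefined,
    stage_deal_probability:
      typeof stage?.deal_probability === "number" ? stage.deal_probability : undefined,
  };
}

const stagesIndex = { p1: { s1: { id: "s1", is_closed: true, deal_probability: 0.9 } } };
const enriched = enrichWithStage({ deal_id: "d1", pipeline_id: "p1", stage_id: "s1" }, stagesIndex);
console.log(enriched.stage_is_closed, enriched.stage_deal_probability); // true 0.9
```

If you train on the CLOSED/OPEN label, stage_is_closed largely mirrors that label, which is computed from closed_at. Treat it as a potential leak rather than a feature.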

Step 9: Optional enrichment with events (engagement signals)

If you want engagement features, events are the normalized place to get them. You can filter events by:

deal_id
contact_id
company_id
type

This fetcher follows the same pattern as the other list helpers, with event-specific filters:

async function fetchAllEvents(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    company_id?: string;
    user_id?: string;
    type?: string;
    lead_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmEvents({
      connectionId,
      limit: pageSize,
      offset,
      // Unset filters stay undefined instead of being sent as "".
      updated_gte: opts?.updated_gte || undefined,
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query || undefined,
      deal_id: opts?.deal_id || undefined,
      contact_id: opts?.contact_id || undefined,
      company_id: opts?.company_id || undefined,
      user_id: opts?.user_id || undefined,
      type: opts?.type || undefined,
      lead_id: opts?.lead_id || undefined,
      fields: opts?.fields || undefined,
      raw: opts?.raw || undefined,
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

A safe engagement feature you can compute without guessing semantics:

counts by type for a given deal_id or contact_id
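Here is a minimal sketch of counts by type per deal. It assumes each event exposes the documented `type` and `deal_ids[]` fields. `countEventsByDeal` is a hypothetical helper. It skips events with no type rather than guessing one, and an event linked to several deals counts once for each:

```typescript
// Event types documented for CrmEvent.type.
type EventType =
  | "NOTE" | "EMAIL" | "TASK" | "MEETING" | "CALL"
  | "MARKETING_EMAIL" | "FORM" | "PAGE_VIEW";

interface EventLike {
  type?: EventType;
  deal_ids?: string[];
}

type EventCounts = Partial<Record<EventType, number>>;

// Returns dealId -> { eventType: count }.
function countEventsByDeal(events: EventLike[]): Record<string, EventCounts> {
  const out: Record<string, EventCounts> = {};
  for (const ev of events) {
    const type = ev.type;
    if (!type) continue; // no type: skip instead of guessing
    const dealIds = ev.deal_ids ?? [];
    for (let i = 0; i < dealIds.length; i++) {
      const dealId = dealIds[i];
      if (dealIds.indexOf(dealId) !== i) continue; // duplicate ID on the same event
      if (!Object.prototype.hasOwnProperty.call(out, dealId)) out[dealId] = {};
      const counts = out[dealId];
      counts[type] = (counts[type] ?? 0) + 1;
    }
  }
  return out;
}

const dealEventCounts = countEventsByDeal([
  { type: "CALL", deal_ids: ["d1"] },
  { type: "CALL", deal_ids: ["d1", "d2"] },
  { type: "NOTE", deal_ids: ["d1"] },
  { deal_ids: ["d1"] }, // no type: skipped
]);
console.log(dealEventCounts.d1, dealEventCounts.d2); // { CALL: 2, NOTE: 1 } { CALL: 1 }
```

The same approach works for contact_ids[] if you swap the ID array the loop reads.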

Step 10: Putting it all together

This pulls deals, contacts, and companies, builds the lookup indexes, and produces training rows.

async function main() {
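  // Optional ISO-8601 watermark for incremental runs. If you set it, contacts and
  // companies are filtered too, so deals may reference records fetched in an
  // earlier run; merge with your stored copies before building the indexes.
  const updated_gte = "";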

  const deals = await fetchAllDeals({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const contacts = await fetchAllContacts({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const companies = await fetchAllCompanies({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const contactsById = indexById(contacts);
  const companiesById = indexById(companies);

  const trainingRows = buildTrainingRows({
    deals,
    contactsById,
    companiesById,
  });

  console.log("Deals:", deals.length);
  console.log("Contacts:", contacts.length);
  console.log("Companies:", companies.length);
  console.log("Training rows:", trainingRows.length);

  // At this point:
  // - write to a JSONL file
  // - insert into your DB
  // - send through an embedding pipeline
  // - create a supervised dataset for classification/fine-tuning
  console.log("Sample row:", trainingRows[0]);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
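The comments in main() list a JSONL file as one destination for the rows. Here is a minimal sketch using Node's built-in node:fs module. `toJsonl`, `writeJsonl`, and the output file name are hypothetical. The only format assumption is one JSON object per line:

```typescript
import { writeFileSync } from "node:fs";

// Serializes each row as one JSON object per line (JSON Lines).
// JSON.stringify omits properties whose value is undefined.
function toJsonl(rows: object[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n") + (rows.length ? "\n" : "");
}

// Writes the rows to disk, overwriting any existing file at `path`.
function writeJsonl(path: string, rows: object[]): void {
  writeFileSync(path, toJsonl(rows), "utf8");
}

// In main(), after buildTrainingRows(...):
// writeJsonl("training_rows.jsonl", trainingRows);
```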

Using This Dataset in a RAG Pipeline

Beyond supervised training, the same normalized CRM objects can power a RAG architecture:

Chunk and embed textual CRM fields (deal names, notes, event summaries, company descriptions).
Store embeddings in your vector database with identifiers like connection_id, deal_id, contact_id, and updated_at.
On a user query ('What happened with Acme Corp?'), retrieve the most relevant deal, event, and note segments before generating a grounded response.
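The following sketch covers the first two steps for deals. It assumes deals expose name, stage, updated_at, contact_ids[], and company_ids[], and that you pass in a company index like the one built in Step 5. `buildDealDocument`, the document ID format, and the text template are hypothetical. Splitting long text into chunks and calling an embedding model are left to your own stack:

```typescript
interface DealForRag {
  id?: string;
  name?: string;
  stage?: string;
  updated_at?: string;
  contact_ids?: string[];
  company_ids?: string[];
}

interface CompanyForRag {
  name?: string;
  industry?: string;
  description?: string;
}

// One retrievable unit: text to embed plus metadata stored next to the vector.
interface DealDocument {
  id: string;
  text: string;
  metadata: {
    connection_id: string;
    deal_id: string;
    contact_ids: string[];
    updated_at?: string;
  };
}

// Builds a plain-text summary of a deal and its first linked company,
// skipping missing fields instead of emitting placeholder text.
function buildDealDocument(
  connectionId: string,
  deal: DealForRag,
  companiesById: Record<string, CompanyForRag>
): DealDocument | undefined {
  if (!deal.id) return undefined;
  const companyId = deal.company_ids?.[0];
  const company = companyId ? companiesById[companyId] : undefined;
  const lines = [
    deal.name && `Deal: ${deal.name}`,
    deal.stage && `Stage: ${deal.stage}`,
    company?.name && `Company: ${company.name}`,
    company?.industry && `Industry: ${company.industry}`,
    company?.description && `About: ${company.description}`,
  ].filter((line): line is string => typeof line === "string" && line.length > 0);
  return {
    // Stable ID so re-indexing the same deal overwrites rather than duplicates.
    id: `${connectionId}:deal:${deal.id}`,
    text: lines.join("\n"),
    metadata: {
      connection_id: connectionId,
      deal_id: deal.id,
      contact_ids: deal.contact_ids ?? [],
      updated_at: deal.updated_at,
    },
  };
}
```

Storing updated_at in the metadata lets you re-embed only the documents whose source deal changed since the last run.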

Unified handles ingestion, normalization, and real-time updates; embeddings and vector search remain in your infrastructure.

Where this fits in an AI pipeline

Once you have trainingRows and the underlying CRM objects, you can:

embed textual fields (deal.name, company.industry, event notes, contact title/department) to build retrieval for RAG-based CRM search and assistants
train a classifier (e.g., predict CLOSED vs OPEN) using the numeric and context fields
fine-tune a model on structured examples, if its fine-tuning format accepts JSON records
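For the classifier route, the rows need to become numeric inputs. closed_at and label must stay out of those inputs because the label is computed from closed_at. The following minimal sketch uses an illustrative feature set, not a recommended one. `toFeatureVector`, `toTarget`, and the value-plus-missing-flag encoding are assumptions. amount is used as-is, which only makes sense if the dataset uses a single currency:

```typescript
// Subset of TrainingRow used here; closed_at is intentionally absent.
interface FeatureSource {
  amount?: number;
  contact_count: number;
  company_count: number;
  primary_company_employees?: number;
  primary_company_is_active?: boolean;
  label?: "CLOSED" | "OPEN";
}

// Encodes an optional number as [value, missingFlag]: [v, 0] or [0, 1].
function withMissingFlag(value: number | undefined): [number, number] {
  return value === undefined ? [0, 1] : [value, 0];
}

// Builds a fixed-length numeric vector; the label is never included.
function toFeatureVector(row: FeatureSource): number[] {
  const isActive =
    row.primary_company_is_active === undefined ? undefined : row.primary_company_is_active ? 1 : 0;
  return [
    ...withMissingFlag(row.amount),
    row.contact_count,
    row.company_count,
    ...withMissingFlag(row.primary_company_employees),
    ...withMissingFlag(isActive),
  ];
}

// Target: 1 for CLOSED, 0 otherwise.
function toTarget(row: FeatureSource): number {
  return row.label === "CLOSED" ? 1 : 0;
}

const vec = toFeatureVector({ amount: 1000, contact_count: 2, company_count: 1, primary_company_is_active: true });
console.log(vec.join(",")); // 1000,0,2,1,0,1,1,0
```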

The key is that your dataset is built on the normalized CRM objects and list semantics that Unified documents, with pagination and filtering handled explicitly and without vendor-specific branching.
