Unified.to

How to Train AI with CRM Data using Unified's CRM API


January 15, 2026

Training AI on CRM data sounds straightforward. Pull deals, pull contacts, join companies, and turn the result into rows for embeddings or model training.

In practice, this only works if the underlying CRM data is consistent across customers.

CRMs don't just differ in UI. They model pipelines, stages, ownership, and activity differently. Some systems treat deals as rigid opportunities with fixed stages. Others allow custom pipelines or stage semantics that don't align cleanly. Contacts can appear multiple times. Activities live as notes, tasks, emails, meetings, or page views, each with different structures.

If you ignore those differences, AI features become fragile. You end up hardcoding assumptions per CRM vendor or quietly dropping fields that don't behave the way you expect.

Unified's CRM API is designed to prevent that. Instead of building training logic on vendor-specific objects, you work against a single, normalized set of CRM objects:

  • deal, contact, company, pipeline, event, lead

This guide shows how to build a training-ready dataset from that normalized layer in TypeScript, using only documented fields, list parameters, and SDK methods. No UI. No dashboards. Just a backend pipeline you can run on a schedule or in response to updated data.

Prerequisites

  • Node.js v18+
  • A Unified account with a CRM integration enabled
  • Your Unified API key
  • A customer CRM connectionId

Step 1: Set up your project

mkdir crm-ai-training-demo
cd crm-ai-training-demo
npm init -y
npm install @unified-api/typescript-sdk dotenv

Create a .env file:

UNIFIED_API_KEY=your_unified_api_key
CONNECTION_CRM=your_customer_crm_connection_id

Step 2: Initialize the SDK

import "dotenv/config";
import { UnifiedTo } from "@unified-api/typescript-sdk";

const { UNIFIED_API_KEY, CONNECTION_CRM } = process.env;

if (!UNIFIED_API_KEY) throw new Error("Missing UNIFIED_API_KEY");
if (!CONNECTION_CRM) throw new Error("Missing CONNECTION_CRM");

const sdk = new UnifiedTo({
  security: { jwt: UNIFIED_API_KEY },
});

const connectionId = CONNECTION_CRM;

Step 3: Understand the normalized CRM objects (critical)

This guide only uses fields explicitly defined in the CRM data models and list endpoints.

Deals (CrmDeal)

Deals are your core training record. Useful fields include:

  • id
  • created_at, updated_at (ISO-8601)
  • name
  • amount (number)
  • currency (3-letter ISO code)
  • closing_at (expected close datetime)
  • closed_at (actual close datetime)
  • stage, stage_id
  • pipeline, pipeline_id
  • source
  • probability (number; range not defined)
  • lost_reason, won_reason
  • user_id (reference to HrisEmployee)
  • contact_ids[] (references to CrmContact)
  • company_ids[] (references to CrmCompany)
  • metadata[] (structured custom fields)
  • raw (vendor-native payload)

Contacts (CrmContact)

Contacts give person-level context:

  • id, created_at, updated_at
  • name, first_name, last_name, title, department, image_url
  • company (organization name string)
  • emails[] with type enum: WORK, HOME, OTHER
  • telephones[] with type enum: WORK, HOME, OTHER, FAX, MOBILE
  • deal_ids[], company_ids[]
  • address (structured)
  • user_id
  • link_urls[]
  • metadata[]
  • raw

Companies (CrmCompany)

Companies give firmographic context:

  • id, created_at, updated_at, name
  • deal_ids[], contact_ids[]
  • emails[], telephones[], websites[]
  • address
  • is_active, tags[]
  • description, industry, employees, timezone
  • domains[]
  • user_id
  • metadata[]
  • raw

Pipelines (CrmPipeline), optional

Use pipelines if you need stage metadata:

  • stages[] where each stage includes:
    • id, name, active, deal_probability, is_closed, display_order

Events (CrmEvent), optional

Events represent activity and engagement:

  • type enum: NOTE, EMAIL, TASK, MEETING, CALL, MARKETING_EMAIL, FORM, PAGE_VIEW
  • Nested objects vary by type (note, meeting, email, call, task, marketing_email, form, page_view)
  • Relationship arrays: deal_ids[], company_ids[], contact_ids[], lead_ids[]
  • user_id
  • raw

Leads (CrmLead), optional

Leads are top-of-funnel records:

  • id, created_at, updated_at, name, first_name, last_name
  • company_id, contact_id, company_name
  • source, status
  • user_id, creator_user_id
  • emails[], telephones[], link_urls[]
  • metadata[]
  • raw

Step 4: Build reliable list fetchers (pagination + incremental pulls)

All list endpoints share the same pagination approach:

  • limit default 100 (and typically cannot exceed 100)
  • offset is zero-based
  • You know you're done when returned results < requested limit
  • Incremental pulls use updated_gte (ISO-8601)
  • Some integrations may not support every parameter; verify per-integration feature support in the dashboard
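
For incremental pulls, one way to choose the next updated_gte value is to track the latest updated_at seen in a run. The sketch below is an assumption about how you might manage that checkpoint, not something the API prescribes: nextCheckpoint is a hypothetical helper name, and it assumes updated_at arrives as an ISO-8601 string (or a Date), as the data models in Step 3 describe.

```typescript
// Hypothetical helper: derive the next `updated_gte` from rows already fetched.
// Assumes `updated_at` is an ISO-8601 string or a Date.
type Timestamped = { updated_at?: string | Date };

function nextCheckpoint(rows: Timestamped[], previous?: string): string | undefined {
  // Start from the previous checkpoint if it parses; otherwise start from "nothing seen".
  const prevMs = previous ? Date.parse(previous) : NaN;
  let maxMs = Number.isNaN(prevMs) ? Number.NEGATIVE_INFINITY : prevMs;

  for (const r of rows) {
    if (!r.updated_at) continue;
    const ms = r.updated_at instanceof Date ? r.updated_at.getTime() : Date.parse(r.updated_at);
    if (!Number.isNaN(ms) && ms > maxMs) maxMs = ms;
  }

  // No usable timestamps and no previous checkpoint: signal a full pull next time.
  return Number.isFinite(maxMs) ? new Date(maxMs).toISOString() : undefined;
}
```

Because the filter is "greater than or equal", records stamped exactly at the checkpoint are returned again on the next run, so downstream storage should upsert by id rather than append.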

Below are safe, backend-only helpers for deals, contacts, and companies.

type Sort = "name" | "updated_at" | "created_at";
type Order = "asc" | "desc";

type ListBaseOpts = {
  pageSize?: number;
  updated_gte?: string;
  sort?: Sort;
  order?: Order;
  query?: string;
  fields?: string;
  raw?: string;
};

async function fetchAllDeals(
  opts?: ListBaseOpts & {
    company_id?: string;
    contact_id?: string;
    user_id?: string;
    pipeline_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmDeals({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      company_id: opts?.company_id ?? "",
      contact_id: opts?.contact_id ?? "",
      user_id: opts?.user_id ?? "",
      pipeline_id: opts?.pipeline_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

async function fetchAllContacts(
  opts?: ListBaseOpts & {
    company_id?: string;
    deal_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmContacts({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      company_id: opts?.company_id ?? "",
      deal_id: opts?.deal_id ?? "",
      user_id: opts?.user_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

async function fetchAllCompanies(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmCompanies({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      deal_id: opts?.deal_id ?? "",
      contact_id: opts?.contact_id ?? "",
      user_id: opts?.user_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
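
The three helpers above repeat the same offset/limit loop. If you prefer to write that loop once, a generic pager is one option. This is a minimal sketch, not part of the SDK: fetchAllPages and its fetchPage callback are hypothetical names, and the callback is assumed to return one page of results for a given offset and limit.

```typescript
// Generic offset/limit pager (hypothetical helper, not an SDK function).
// `fetchPage` is supplied by the caller and returns one page of results.
async function fetchAllPages<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[] | undefined>,
  pageSize = 100
): Promise<T[]> {
  const out: T[] = [];
  let offset = 0;

  while (true) {
    const page = await fetchPage(offset, pageSize);
    if (!page || page.length === 0) break; // empty page: nothing left

    out.push(...page);
    if (page.length < pageSize) break; // short page: this was the last one
    offset += pageSize;
  }

  return out;
}
```

Each fetcher then reduces to a closure that calls the matching list method with its own filters and passes offset and limit through.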

Step 5: Index contacts and companies for joins

Deals link to contacts and companies via ID arrays (contact_ids[], company_ids[]). Build lookup tables so joins are deterministic.

function indexById<T extends { id?: string }>(rows: T[]): Record<string, T> {
  const out: Record<string, T> = {};
  for (const r of rows) {
    if (!r.id) continue;
    out[r.id] = r;
  }
  return out;
}
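
To illustrate how such an index is used in a join, the sketch below resolves a deal's ID array against it and reports IDs that have no match, so gaps in the data stay visible instead of silently disappearing. resolveIds is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: resolve IDs against an index shaped like the one indexById returns.
// Returns matched records plus the IDs that could not be found.
function resolveIds<T>(
  ids: string[] | undefined,
  index: Record<string, T>
): { found: T[]; missing: string[] } {
  const found: T[] = [];
  const missing: string[] = [];

  for (const id of ids ?? []) {
    // Own-property check avoids accidental hits on inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(index, id)) {
      found.push(index[id]);
    } else {
      missing.push(id);
    }
  }

  return { found, missing };
}
```

A non-empty missing list usually means the referenced record was not part of the pull, which is worth logging before training.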

Step 6: Decide what your 'training row' is

A training row should be:

  • deterministic
  • stable across runs
  • based on fields that exist in the normalized schema
  • explicit about missingness

Below is a conservative row shape that stays within documented fields.

type TrainingRow = {
  deal_id: string;

  deal_created_at?: string;
  deal_updated_at?: string;

  amount?: number;
  currency?: string;

  pipeline?: string;
  pipeline_id?: string;
  stage?: string;
  stage_id?: string;

  closing_at?: string;
  closed_at?: string;

  probability?: number;
  source?: string;

  owner_user_id?: string;

  contact_count: number;
  company_count: number;

  primary_contact_id?: string;
  primary_contact_title?: string;
  primary_contact_department?: string;
  primary_contact_email_type?: string;

  primary_company_id?: string;
  primary_company_industry?: string;
  primary_company_employees?: number;
  primary_company_is_active?: boolean;
  primary_company_timezone?: string;

  label?: "CLOSED" | "OPEN";
};

Note: This uses a simple label definition:

  • label = "CLOSED" when closed_at exists
  • otherwise label = "OPEN"

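That is safe because closed_at is documented as 'the date that this deal closed.' Note that CLOSED covers both won and lost deals; if you need to separate the two, won_reason and lost_reason are the documented fields to start from.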
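
Being explicit about missingness can be as simple as emitting every expected key, with null for absent values and a companion flag. The following is one possible approach, not a requirement of the API; withMissingFlags and the _missing suffix are hypothetical names chosen for this sketch.

```typescript
// Hypothetical helper: make missing values explicit in a flat row.
// Every listed key is emitted; absent values become null and get a `<key>_missing` flag.
type FlatValue = string | number | boolean | null;
type FlatRow = Record<string, FlatValue | undefined>;

function withMissingFlags(row: FlatRow, keys: string[]): Record<string, FlatValue> {
  const out: Record<string, FlatValue> = {};
  for (const key of keys) {
    const value = row[key];
    out[key] = value ?? null; // undefined and null both become null
    out[`${key}_missing`] = value === undefined || value === null;
  }
  return out;
}
```

Passing a fixed key list keeps the column set identical across runs, even when a CRM never populates a given field.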

Step 7: Build training rows from deals + joins

This step converts normalized CRM objects into training rows.

function buildTrainingRows(input: {
  deals: any[];
  contactsById: Record<string, any>;
  companiesById: Record<string, any>;
}): TrainingRow[] {
  const { deals, contactsById, companiesById } = input;

  const rows: TrainingRow[] = [];

  for (const d of deals) {
    if (!d.id) continue;

    const contactIds: string[] = Array.isArray(d.contact_ids) ? d.contact_ids : [];
    const companyIds: string[] = Array.isArray(d.company_ids) ? d.company_ids : [];

    const primaryContactId = contactIds[0];
    const primaryCompanyId = companyIds[0];

    const primaryContact = primaryContactId ? contactsById[primaryContactId] : undefined;
    const primaryCompany = primaryCompanyId ? companiesById[primaryCompanyId] : undefined;

    const primaryEmail = primaryContact?.emails?.[0];
    const primaryEmailType = primaryEmail?.type;

    const row: TrainingRow = {
      deal_id: d.id,

      deal_created_at: d.created_at,
      deal_updated_at: d.updated_at,

      amount: typeof d.amount === "number" ? d.amount : undefined,
      currency: typeof d.currency === "string" ? d.currency : undefined,

      pipeline: typeof d.pipeline === "string" ? d.pipeline : undefined,
      pipeline_id: typeof d.pipeline_id === "string" ? d.pipeline_id : undefined,
      stage: typeof d.stage === "string" ? d.stage : undefined,
      stage_id: typeof d.stage_id === "string" ? d.stage_id : undefined,

      closing_at: d.closing_at,
      closed_at: d.closed_at,

      probability: typeof d.probability === "number" ? d.probability : undefined,
      source: typeof d.source === "string" ? d.source : undefined,

      owner_user_id: typeof d.user_id === "string" ? d.user_id : undefined,

      contact_count: contactIds.length,
      company_count: companyIds.length,

      primary_contact_id: primaryContactId,
      primary_contact_title: primaryContact?.title,
      primary_contact_department: primaryContact?.department,
      primary_contact_email_type: primaryEmailType,

      primary_company_id: primaryCompanyId,
      primary_company_industry: primaryCompany?.industry,
      primary_company_employees: typeof primaryCompany?.employees === "number" ? primaryCompany.employees : undefined,
      primary_company_is_active: typeof primaryCompany?.is_active === "boolean" ? primaryCompany.is_active : undefined,
      primary_company_timezone: primaryCompany?.timezone,

      label: d.closed_at ? "CLOSED" : "OPEN",
    };

    rows.push(row);
  }

  return rows;
}

Step 8: Optional enrichment with pipeline stages

If you want stage metadata (like is_closed or deal_probability), you can pull pipelines and map:

  • deal.pipeline_id → pipeline
  • deal.stage_id → pipeline.stages.id

async function fetchAllPipelines(opts?: ListBaseOpts) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmPipelines({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

function indexStagesByPipeline(pipelines: any[]) {
  const out: Record<string, Record<string, any>> = {};
  for (const p of pipelines) {
    if (!p.id) continue;
    const stages: any[] = Array.isArray(p.stages) ? p.stages : [];
    out[p.id] = {};
    for (const s of stages) {
      if (!s.id) continue;
      out[p.id][s.id] = s;
    }
  }
  return out;
}

You can then enrich each training row with stage attributes if both IDs exist.
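
As a rough illustration of that enrichment, the sketch below looks up the stage through an index shaped like the one indexStagesByPipeline returns and attaches two attributes. enrichWithStage and the stage_is_closed / stage_deal_probability field names are assumptions made for this example; the stage fields themselves are the ones listed for CrmPipeline in Step 3.

```typescript
// Minimal shapes for this sketch; stage fields mirror those listed for CrmPipeline stages.
type Stage = { id?: string; name?: string; is_closed?: boolean; deal_probability?: number };
type StageIndex = Record<string, Record<string, Stage>>;
type RowWithStageIds = { pipeline_id?: string; stage_id?: string };

// Hypothetical helper: attach stage attributes when both IDs resolve.
// When either ID is absent or unknown, the new fields stay undefined.
function enrichWithStage<R extends RowWithStageIds>(row: R, stagesByPipeline: StageIndex) {
  const stage =
    row.pipeline_id && row.stage_id
      ? stagesByPipeline[row.pipeline_id]?.[row.stage_id]
      : undefined;

  return {
    ...row,
    stage_is_closed: stage?.is_closed,
    stage_deal_probability: stage?.deal_probability,
  };
}
```

Applied over the output of Step 7, this would look like trainingRows.map((r) => enrichWithStage(r, stagesByPipeline)).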

Step 9: Optional enrichment with events (engagement signals)

If you want engagement features, events are the normalized place to get them. You can filter events by:

  • deal_id
  • contact_id
  • company_id
  • type

This fetcher follows the same pattern as the other list helpers, with event-specific filters added:

async function fetchAllEvents(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    company_id?: string;
    user_id?: string;
    type?: string;
    lead_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmEvents({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      deal_id: opts?.deal_id ?? "",
      contact_id: opts?.contact_id ?? "",
      company_id: opts?.company_id ?? "",
      user_id: opts?.user_id ?? "",
      type: opts?.type ?? "",
      lead_id: opts?.lead_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}

A safe engagement feature you can compute without guessing semantics:

  • counts by type for a given deal_id or contact_id
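
A minimal sketch of that count, assuming each event carries the type and deal_ids[] fields listed for CrmEvent in Step 3. countEventsByDeal is a hypothetical helper name.

```typescript
// Minimal event shape for this sketch: only the fields the count needs.
type EventLike = { type?: string; deal_ids?: string[] };

// Count events per deal and per type. An event linked to several deals counts once for each.
function countEventsByDeal(events: EventLike[]): Map<string, Map<string, number>> {
  const counts = new Map<string, Map<string, number>>();

  for (const e of events) {
    if (!e.type) continue; // skip events without a type rather than guessing one
    for (const dealId of e.deal_ids ?? []) {
      let perDeal = counts.get(dealId);
      if (!perDeal) {
        perDeal = new Map<string, number>();
        counts.set(dealId, perDeal);
      }
      perDeal.set(e.type, (perDeal.get(e.type) ?? 0) + 1);
    }
  }

  return counts;
}
```

The same loop works for contact_ids[] if you want per-contact counts; only the array being iterated changes.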

Step 10: Putting it all together

This pulls deals, contacts, and companies, builds indexes, and produces training rows.

async function main() {
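  // Leave empty for a full pull. For incremental runs, set an ISO-8601 timestamp,
  // but note that contacts and companies unchanged since then will be absent from
  // the join indexes unless you merge them with previously stored records.
  const updated_gte = "";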

  const deals = await fetchAllDeals({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const contacts = await fetchAllContacts({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const companies = await fetchAllCompanies({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const contactsById = indexById(contacts);
  const companiesById = indexById(companies);

  const trainingRows = buildTrainingRows({
    deals,
    contactsById,
    companiesById,
  });

  console.log("Deals:", deals.length);
  console.log("Contacts:", contacts.length);
  console.log("Companies:", companies.length);
  console.log("Training rows:", trainingRows.length);

  // At this point:
  // - write to a JSONL file
  // - insert into your DB
  // - send through an embedding pipeline
  // - create a supervised dataset for classification/fine-tuning
  console.log("Sample row:", trainingRows[0]);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
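
For the JSONL option mentioned in the comments above, serialization can be kept as a pure function so it is easy to test. This is a sketch under assumptions: toJsonl is a hypothetical helper, and sorting keys and writing null for undefined are design choices made here to keep output stable across runs, not requirements of the format.

```typescript
// Serialize rows as JSON Lines: one JSON object per line, newline-terminated.
// Keys are sorted so the same row always produces the same string.
function toJsonl(rows: Record<string, unknown>[]): string {
  const lines = rows.map((row) => {
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(row).sort()) {
      // JSON.stringify drops undefined values; writing null keeps the column visible.
      sorted[key] = row[key] === undefined ? null : row[key];
    }
    return JSON.stringify(sorted);
  });

  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

The resulting string can then be written with writeFileSync from Node's fs module, for example to a file named training.jsonl (the file name is only an example).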

Where this fits in an AI pipeline

Once you have trainingRows, you can:

  • embed textual fields (deal.name, company.industry, contact title/department) to build retrieval
  • train a classifier (e.g. predict CLOSED vs OPEN) using numeric/context fields
  • fine-tune with structured examples if your model accepts JSON schemas
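
For the embedding option, the textual fields first need to become a single string per deal. The sketch below is one way to do that and makes a few assumptions: toEmbeddingText and the labels are hypothetical, and deal_name is an added field, since the TrainingRow shape from Step 6 does not carry the deal name.

```typescript
// Hypothetical input shape: the textual subset of a training row.
// `deal_name` is an assumption; add it to your row type if you want it embedded.
type TextFields = {
  deal_name?: string;
  stage?: string;
  source?: string;
  primary_contact_title?: string;
  primary_contact_department?: string;
  primary_company_industry?: string;
};

// Build a labeled, line-per-field string, skipping fields that are missing or blank.
function toEmbeddingText(row: TextFields): string {
  const parts: Array<[string, string | undefined]> = [
    ["Deal", row.deal_name],
    ["Stage", row.stage],
    ["Source", row.source],
    ["Contact title", row.primary_contact_title],
    ["Contact department", row.primary_contact_department],
    ["Industry", row.primary_company_industry],
  ];

  return parts
    .filter(([, value]) => typeof value === "string" && value.trim() !== "")
    .map(([label, value]) => `${label}: ${(value ?? "").trim()}`)
    .join("\n");
}
```

Keeping the field order fixed means the same deal always yields the same text, which helps when comparing embeddings between runs.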

The key is that your dataset is built on the normalized CRM objects and list semantics that Unified documents, with pagination and filtering handled explicitly and without vendor-specific branching.
