How to Train AI with CRM Data using Unified's CRM API
January 15, 2026
Training AI on CRM data sounds straightforward. Pull deals, pull contacts, join companies, and turn the result into rows for embeddings or model training.
In practice, this only works if the underlying CRM data is consistent across customers.
CRMs don't just differ in UI. They model pipelines, stages, ownership, and activity differently. Some systems treat deals as rigid opportunities with fixed stages. Others allow custom pipelines or stage semantics that don't align cleanly. Contacts can appear multiple times. Activities live as notes, tasks, emails, meetings, or page views, each with different structures.
If you ignore those differences, AI features become fragile. You end up hardcoding assumptions per CRM vendor or quietly dropping fields that don't behave the way you expect.
Unified's CRM API is designed to prevent that. Instead of building training logic on vendor-specific objects, you work against a single normalized set of CRM objects:
deal, contact, company, pipeline, event, lead
This guide shows how to build a training-ready dataset from that normalized layer in TypeScript, using only documented fields, list parameters, and SDK methods. No UI. No dashboards. Just a backend pipeline you can run on a schedule or in response to updated data.
Prerequisites
- Node.js v18+
- A Unified account with a CRM integration enabled
- Your Unified API key
- A customer CRM connectionId
Step 1: Set up your project
mkdir crm-ai-training-demo
cd crm-ai-training-demo
npm init -y
npm install @unified-api/typescript-sdk dotenv
Create a .env file:
UNIFIED_API_KEY=your_unified_api_key
CONNECTION_CRM=your_customer_crm_connection_id
Step 2: Initialize the SDK
import "dotenv/config";
import { UnifiedTo } from "@unified-api/typescript-sdk";

const { UNIFIED_API_KEY, CONNECTION_CRM } = process.env;
if (!UNIFIED_API_KEY) throw new Error("Missing UNIFIED_API_KEY");
if (!CONNECTION_CRM) throw new Error("Missing CONNECTION_CRM");

const sdk = new UnifiedTo({
  security: { jwt: UNIFIED_API_KEY },
});
const connectionId = CONNECTION_CRM;
Step 3: Understand the normalized CRM objects (critical)
This guide only uses fields explicitly defined in the CRM data models and list endpoints.
Deals (CrmDeal)
Deals are your core training record. Useful fields include:
- id
- created_at, updated_at (ISO-8601)
- name
- amount (number)
- currency (3-letter ISO code)
- closing_at (expected close datetime)
- closed_at (actual close datetime)
- stage, stage_id
- pipeline, pipeline_id
- source
- probability (number; range not defined)
- lost_reason, won_reason
- user_id (reference to HrisEmployee)
- contact_ids[] (references to CrmContact)
- company_ids[] (references to CrmCompany)
- metadata[] (structured custom fields)
- raw (vendor-native payload)
Contacts (CrmContact)
Contacts give person-level context:
- id, created_at, updated_at
- name, first_name, last_name, title, department, image_url
- company (organization name string)
- emails[] with type enum: WORK, HOME, OTHER
- telephones[] with type enum: WORK, HOME, OTHER, FAX, MOBILE
- deal_ids[], company_ids[]
- address (structured)
- user_id
- link_urls[]
- metadata[]
- raw
Companies (CrmCompany)
Companies give firmographic context:
- id, created_at, updated_at, name
- deal_ids[], contact_ids[]
- emails[], telephones[], websites[]
- address
- is_active, tags[]
- description, industry, employees, timezone
- domains[]
- user_id
- metadata[]
- raw
Pipelines (CrmPipeline), optional
Use pipelines if you need stage metadata:
- stages[], where each stage includes: id, name, active, deal_probability, is_closed, display_order
Events (CrmEvent), optional
Events represent activity and engagement:
- type enum: NOTE, EMAIL, TASK, MEETING, CALL, MARKETING_EMAIL, FORM, PAGE_VIEW
- Nested objects vary by type (note, meeting, email, call, task, marketing_email, form, page_view)
- Relationship arrays: deal_ids[], company_ids[], contact_ids[], lead_ids[]
- user_id
- raw
Leads (CrmLead), optional
Leads are top-of-funnel records:
- id, created_at, updated_at, name, first_name, last_name
- company_id, contact_id, company_name
- source, status
- user_id, creator_user_id
- emails[], telephones[], link_urls[]
- metadata[]
- raw
Step 4: Build reliable list fetchers (pagination + incremental pulls)
All list endpoints share the same pagination approach:
- limit defaults to 100 (and typically cannot exceed 100)
- offset is zero-based
- You know you're done when returned results < requested limit
- Incremental pulls use updated_gte (ISO-8601)
- Some integrations may not support every parameter; verify per-integration feature support in the dashboard
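The pagination rules above can be captured once in a generic loop. The following is a minimal sketch, not part of the Unified SDK: fetchAllPages, PageFetcher, and the maxPages safety cap are hypothetical names introduced here, and the fetchPage callback stands in for any list call that accepts limit and offset. It runs against an in-memory stand-in so the stop conditions can be checked without a live connection.

```typescript
// Hypothetical helper (not part of the Unified SDK): drains an offset-paginated list.
type PageFetcher<T> = (limit: number, offset: number) => Promise<T[] | undefined>;

async function fetchAllPages<T>(
  fetchPage: PageFetcher<T>,
  pageSize = 100,
  maxPages = 10000 // assumed safety cap so a misbehaving endpoint cannot loop forever
): Promise<T[]> {
  const out: T[] = [];
  let offset = 0;
  for (let pageNo = 0; pageNo < maxPages; pageNo++) {
    const page = await fetchPage(pageSize, offset);
    if (!page || page.length === 0) break; // nothing left to read
    out.push(...page);
    if (page.length < pageSize) break; // a short page is the last page
    offset += pageSize;
  }
  return out;
}

// In-memory stand-in for a list endpoint that holds 250 records.
const fakeRecords = Array.from({ length: 250 }, (_, i) => ({ id: `deal_${i}` }));
const fakeFetchPage: PageFetcher<{ id: string }> = async (limit, offset) =>
  fakeRecords.slice(offset, offset + limit);
```

With a page size of 100, the stand-in is read in three calls (100, 100, 50 records), and the short third page ends the loop.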
Below are safe, backend-only helpers for deals, contacts, and companies.
type Sort = "name" | "updated_at" | "created_at";
type Order = "asc" | "desc";

type ListBaseOpts = {
  pageSize?: number;
  updated_gte?: string;
  sort?: Sort;
  order?: Order;
  query?: string;
  fields?: string;
  raw?: string;
};
async function fetchAllDeals(
  opts?: ListBaseOpts & {
    company_id?: string;
    contact_id?: string;
    user_id?: string;
    pipeline_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmDeals({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      company_id: opts?.company_id ?? "",
      contact_id: opts?.contact_id ?? "",
      user_id: opts?.user_id ?? "",
      pipeline_id: opts?.pipeline_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    // Stop on an empty page, or on a short page (fewer results than requested).
    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
async function fetchAllContacts(
  opts?: ListBaseOpts & {
    company_id?: string;
    deal_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmContacts({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      company_id: opts?.company_id ?? "",
      deal_id: opts?.deal_id ?? "",
      user_id: opts?.user_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
async function fetchAllCompanies(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    user_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmCompanies({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      deal_id: opts?.deal_id ?? "",
      contact_id: opts?.contact_id ?? "",
      user_id: opts?.user_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
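For incremental runs, one option is to derive the next updated_gte value from the records you just fetched. The sketch below assumes updated_at arrives as an ISO-8601 string (or a Date); nextWatermark is a hypothetical helper, and where you persist the watermark between runs is left open.

```typescript
// Hypothetical helper: returns the latest updated_at seen as an ISO-8601 string.
// Falls back to the previous watermark when no record has a parseable timestamp.
function nextWatermark(
  rows: Array<{ updated_at?: string | Date }>,
  previous?: string
): string | undefined {
  let maxMs = previous ? Date.parse(previous) : NaN;
  for (const r of rows) {
    if (!r.updated_at) continue; // missing timestamp: ignore
    const ms = new Date(r.updated_at).getTime();
    if (Number.isNaN(ms)) continue; // unparseable timestamp: ignore
    if (Number.isNaN(maxMs) || ms > maxMs) maxMs = ms;
  }
  return Number.isNaN(maxMs) ? undefined : new Date(maxMs).toISOString();
}

const watermark = nextWatermark(
  [
    { updated_at: "2026-01-10T08:00:00.000Z" },
    { updated_at: "2026-01-12T09:30:00.000Z" },
    {},
  ],
  "2026-01-01T00:00:00.000Z"
);
// watermark is "2026-01-12T09:30:00.000Z"
```

Because the filter name suggests a greater-than-or-equal comparison, the record sitting exactly at the watermark will likely come back on the next run, so it is worth de-duplicating by id when merging incremental results.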
Step 5: Index contacts and companies for joins
Deals link to contacts and companies via ID arrays (contact_ids[], company_ids[]). Build lookup tables so joins are deterministic.
function indexById<T extends { id?: string }>(rows: T[]): Record<string, T> {
  const out: Record<string, T> = {};
  for (const r of rows) {
    if (!r.id) continue;
    out[r.id] = r;
  }
  return out;
}
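A lookup can miss: a deal may reference a contact that was not fetched, for example when contacts are pulled incrementally with updated_gte and the referenced contact has not changed. The following sketch shows one way to make that visible instead of silently dropping it. resolveIds is a hypothetical helper that works with any index shaped like the output of indexById.

```typescript
type WithId = { id?: string };

// Hypothetical helper: resolves an ID array against an id -> record index
// and reports which IDs had no matching record.
function resolveIds<T extends WithId>(
  ids: string[] | undefined,
  byId: Record<string, T>
): { found: T[]; missing: string[] } {
  const found: T[] = [];
  const missing: string[] = [];
  for (const id of ids ?? []) {
    // Own-property check avoids false hits on inherited keys such as "constructor".
    const hit = Object.prototype.hasOwnProperty.call(byId, id) ? byId[id] : undefined;
    if (hit) found.push(hit);
    else missing.push(id);
  }
  return { found, missing };
}

const contactsIndex: Record<string, { id: string; title?: string }> = {
  c1: { id: "c1", title: "CTO" },
};
const resolved = resolveIds(["c1", "c2"], contactsIndex);
// resolved.found holds the c1 record; resolved.missing is ["c2"]
```

Logging or counting the missing IDs per run gives an early signal that the contact or company pull needs a wider window than the deal pull.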
Step 6: Decide what your 'training row' is
A training row should be:
- deterministic
- stable across runs
- based on fields that exist in the normalized schema
- explicit about missingness
Below is a conservative row shape that stays within documented fields.
type TrainingRow = {
  deal_id: string;
  deal_created_at?: string;
  deal_updated_at?: string;
  amount?: number;
  currency?: string;
  pipeline?: string;
  pipeline_id?: string;
  stage?: string;
  stage_id?: string;
  closing_at?: string;
  closed_at?: string;
  probability?: number;
  source?: string;
  owner_user_id?: string;

  contact_count: number;
  company_count: number;

  primary_contact_id?: string;
  primary_contact_title?: string;
  primary_contact_department?: string;
  primary_contact_email_type?: string;

  primary_company_id?: string;
  primary_company_industry?: string;
  primary_company_employees?: number;
  primary_company_is_active?: boolean;
  primary_company_timezone?: string;

  label?: "CLOSED" | "OPEN";
};
Note: This uses a simple label definition:
- label = "CLOSED" when closed_at exists
- otherwise label = "OPEN"
That is safe because closed_at is documented as 'the date that this deal closed.'
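The label rule can be isolated into a small function so it is easy to test and to tighten later. This is a sketch under stated assumptions rather than the article's exact rule: deriveLabel and the days_to_close feature are hypothetical additions, and unlike a plain truthiness check, this version treats an unparseable closed_at as OPEN. CLOSED here does not distinguish won from lost deals.

```typescript
type DealDates = { created_at?: string | Date; closed_at?: string | Date };

// Parses a timestamp to epoch milliseconds; returns undefined when missing or invalid.
function toMs(v?: string | Date): number | undefined {
  if (!v) return undefined;
  const ms = new Date(v).getTime();
  return Number.isNaN(ms) ? undefined : ms;
}

// Hypothetical helper: derives the label plus an optional duration feature.
// days_to_close is null when it cannot be computed, keeping missingness explicit.
function deriveLabel(d: DealDates): { label: "CLOSED" | "OPEN"; days_to_close: number | null } {
  const closed = toMs(d.closed_at);
  if (closed === undefined) return { label: "OPEN", days_to_close: null };
  const created = toMs(d.created_at);
  const days =
    created !== undefined && closed >= created ? (closed - created) / 86400000 : null;
  return { label: "CLOSED", days_to_close: days };
}

const closedDeal = deriveLabel({
  created_at: "2026-01-01T00:00:00Z",
  closed_at: "2026-01-11T00:00:00Z",
});
// closedDeal is { label: "CLOSED", days_to_close: 10 }
```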
Step 7: Build training rows from deals + joins
This step converts normalized CRM objects into training rows.
function buildTrainingRows(input: {
  deals: any[];
  contactsById: Record<string, any>;
  companiesById: Record<string, any>;
}): TrainingRow[] {
  const { deals, contactsById, companiesById } = input;
  const rows: TrainingRow[] = [];

  for (const d of deals) {
    if (!d.id) continue;

    const contactIds: string[] = Array.isArray(d.contact_ids) ? d.contact_ids : [];
    const companyIds: string[] = Array.isArray(d.company_ids) ? d.company_ids : [];

    const primaryContactId = contactIds[0];
    const primaryCompanyId = companyIds[0];
    const primaryContact = primaryContactId ? contactsById[primaryContactId] : undefined;
    const primaryCompany = primaryCompanyId ? companiesById[primaryCompanyId] : undefined;

    const primaryEmail = primaryContact?.emails?.[0];
    const primaryEmailType = primaryEmail?.type;

    const row: TrainingRow = {
      deal_id: d.id,
      deal_created_at: d.created_at,
      deal_updated_at: d.updated_at,
      amount: typeof d.amount === "number" ? d.amount : undefined,
      currency: typeof d.currency === "string" ? d.currency : undefined,
      pipeline: typeof d.pipeline === "string" ? d.pipeline : undefined,
      pipeline_id: typeof d.pipeline_id === "string" ? d.pipeline_id : undefined,
      stage: typeof d.stage === "string" ? d.stage : undefined,
      stage_id: typeof d.stage_id === "string" ? d.stage_id : undefined,
      closing_at: d.closing_at,
      closed_at: d.closed_at,
      probability: typeof d.probability === "number" ? d.probability : undefined,
      source: typeof d.source === "string" ? d.source : undefined,
      owner_user_id: typeof d.user_id === "string" ? d.user_id : undefined,
      contact_count: contactIds.length,
      company_count: companyIds.length,
      primary_contact_id: primaryContactId,
      primary_contact_title: primaryContact?.title,
      primary_contact_department: primaryContact?.department,
      primary_contact_email_type: primaryEmailType,
      primary_company_id: primaryCompanyId,
      primary_company_industry: primaryCompany?.industry,
      primary_company_employees:
        typeof primaryCompany?.employees === "number" ? primaryCompany.employees : undefined,
      primary_company_is_active:
        typeof primaryCompany?.is_active === "boolean" ? primaryCompany.is_active : undefined,
      primary_company_timezone: primaryCompany?.timezone,
      label: d.closed_at ? "CLOSED" : "OPEN",
    };

    rows.push(row);
  }

  return rows;
}
Step 8: Optional enrichment with pipeline stages
If you want stage metadata (like is_closed or deal_probability) you can pull pipelines and map:
- deal.pipeline_id → pipeline
- deal.stage_id → pipeline.stages.id
async function fetchAllPipelines(opts?: ListBaseOpts) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmPipelines({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
function indexStagesByPipeline(pipelines: any[]) {
  const out: Record<string, Record<string, any>> = {};
  for (const p of pipelines) {
    if (!p.id) continue;
    const stages: any[] = Array.isArray(p.stages) ? p.stages : [];
    out[p.id] = {};
    for (const s of stages) {
      if (!s.id) continue;
      out[p.id][s.id] = s;
    }
  }
  return out;
}
You can then enrich each training row with stage attributes if both IDs exist.
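As one possible shape for that enrichment, the sketch below looks up a row's stage in a nested pipeline-to-stage index (the same shape indexStagesByPipeline returns) and copies two stage attributes onto the row. enrichWithStage and the stage_is_closed and stage_deal_probability column names are hypothetical; missing lookups become null rather than being guessed.

```typescript
type Stage = { id?: string; name?: string; is_closed?: boolean; deal_probability?: number };
type StageIndex = Record<string, Record<string, Stage>>;

// Hypothetical helper: adds stage attributes to a row when both IDs resolve.
function enrichWithStage<R extends { pipeline_id?: string; stage_id?: string }>(
  row: R,
  stagesByPipeline: StageIndex
): R & { stage_is_closed: boolean | null; stage_deal_probability: number | null } {
  const stage =
    row.pipeline_id && row.stage_id
      ? stagesByPipeline[row.pipeline_id]?.[row.stage_id]
      : undefined;
  const isClosed = stage?.is_closed;
  const probability = stage?.deal_probability;
  return {
    ...row,
    stage_is_closed: typeof isClosed === "boolean" ? isClosed : null,
    stage_deal_probability: typeof probability === "number" ? probability : null,
  };
}

const stageIndex: StageIndex = {
  p1: { s1: { id: "s1", name: "Negotiation", is_closed: false, deal_probability: 0.4 } },
};
const enriched = enrichWithStage({ deal_id: "d1", pipeline_id: "p1", stage_id: "s1" }, stageIndex);
const unmatched = enrichWithStage({ deal_id: "d2", pipeline_id: "p1", stage_id: "s9" }, stageIndex);
// enriched has stage_is_closed false and stage_deal_probability 0.4; unmatched has null for both
```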
Step 9: Optional enrichment with events (engagement signals)
If you want engagement features, events are the normalized place to get them. You can filter events by:
- deal_id
- contact_id
- company_id
- type
This fetcher follows the same pattern as the other list helpers, with event-specific filters:
async function fetchAllEvents(
  opts?: ListBaseOpts & {
    deal_id?: string;
    contact_id?: string;
    company_id?: string;
    user_id?: string;
    type?: string;
    lead_id?: string;
  }
) {
  const pageSize = opts?.pageSize ?? 100;
  let offset = 0;
  const out: any[] = [];

  while (true) {
    const page = await sdk.crm.listCrmEvents({
      connectionId,
      limit: pageSize,
      offset,
      updated_gte: opts?.updated_gte ?? "",
      sort: opts?.sort ?? "updated_at",
      order: opts?.order ?? "asc",
      query: opts?.query ?? "",
      deal_id: opts?.deal_id ?? "",
      contact_id: opts?.contact_id ?? "",
      company_id: opts?.company_id ?? "",
      user_id: opts?.user_id ?? "",
      type: opts?.type ?? "",
      lead_id: opts?.lead_id ?? "",
      fields: opts?.fields ?? "",
      raw: opts?.raw ?? "",
    });

    if (!page || page.length === 0) break;
    out.push(...page);
    if (page.length < pageSize) break;
    offset += pageSize;
  }

  return out;
}
A safe engagement feature you can compute without guessing semantics:
- counts by type for a given deal_id or contact_id
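A per-deal count by event type can be computed in a single pass over already-fetched events. The sketch below relies only on the type field and the deal_ids[] relationship array described earlier; countEventTypesByDeal is a hypothetical helper, and an event linked to several deals is counted once for each of them.

```typescript
type EventLike = { type?: string; deal_ids?: string[] };

// Hypothetical helper: returns deal_id -> { eventType: count }.
function countEventTypesByDeal(events: EventLike[]): Map<string, Record<string, number>> {
  const out = new Map<string, Record<string, number>>();
  for (const e of events) {
    const type = e.type;
    if (!type) continue; // events without a type are skipped, not guessed
    for (const dealId of e.deal_ids ?? []) {
      const counts = out.get(dealId) ?? {};
      counts[type] = (counts[type] ?? 0) + 1;
      out.set(dealId, counts);
    }
  }
  return out;
}

const eventCounts = countEventTypesByDeal([
  { type: "EMAIL", deal_ids: ["d1"] },
  { type: "EMAIL", deal_ids: ["d1", "d2"] },
  { type: "CALL", deal_ids: ["d1"] },
  { type: "NOTE" }, // not linked to any deal, so it is not counted
]);
// eventCounts.get("d1") is { EMAIL: 2, CALL: 1 }; eventCounts.get("d2") is { EMAIL: 1 }
```

The same loop works for contact_ids[] by swapping the relationship array.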
Step 10: Putting it all together
This pulls deals, contacts, companies, builds indexes, and produces training rows.
async function main() {
  const updated_gte = ""; // optionally set for incremental runs

  const deals = await fetchAllDeals({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });
  const contacts = await fetchAllContacts({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });
  const companies = await fetchAllCompanies({
    pageSize: 100,
    updated_gte,
    sort: "updated_at",
    order: "asc",
  });

  const contactsById = indexById(contacts);
  const companiesById = indexById(companies);

  const trainingRows = buildTrainingRows({
    deals,
    contactsById,
    companiesById,
  });

  console.log("Deals:", deals.length);
  console.log("Contacts:", contacts.length);
  console.log("Companies:", companies.length);
  console.log("Training rows:", trainingRows.length);

  // At this point:
  // - write to a JSONL file
  // - insert into your DB
  // - send through an embedding pipeline
  // - create a supervised dataset for classification/fine-tuning
  console.log("Sample row:", trainingRows[0]);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
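For the JSONL option mentioned in the comments, a minimal sketch follows. toJsonl is a hypothetical helper that takes an explicit column list, so every line has the same keys in the same order, and writes absent values as null instead of omitting them (JSON.stringify drops undefined properties by default).

```typescript
// Hypothetical helper: serializes rows to JSONL with a fixed column order
// and explicit nulls for missing values.
function toJsonl(rows: Array<Record<string, unknown>>, columns: string[]): string {
  return rows
    .map((row) => {
      const normalized: Record<string, unknown> = {};
      for (const col of columns) {
        const value = row[col];
        normalized[col] = value === undefined ? null : value;
      }
      return JSON.stringify(normalized);
    })
    .join("\n");
}

const jsonl = toJsonl(
  [{ deal_id: "d1", amount: 5000 }, { deal_id: "d2" }],
  ["deal_id", "amount"]
);
// jsonl contains two lines:
// {"deal_id":"d1","amount":5000}
// {"deal_id":"d2","amount":null}
```

The resulting string can be written to disk with Node's fs.writeFileSync, or streamed line by line for larger datasets.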
Where this fits in an AI pipeline
Once you have trainingRows, you can:
- embed textual fields (deal.name, company.industry, contact title/department) to build retrieval
- train a classifier (e.g. predict CLOSED vs OPEN) using numeric/context fields
- fine-tune with structured examples if your model accepts JSON schemas
The key is that your dataset is built on the normalized CRM objects and list semantics that Unified documents, with pagination and filtering handled explicitly and without vendor-specific branching.
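For the embedding path, the textual fields first need to be flattened into one string per deal. The sketch below is one way to do that, assuming you carry the deal name alongside the row (the TrainingRow shape above does not include it). rowToEmbeddingText and its "Label: value" line format are hypothetical choices, not a requirement of any embedding model.

```typescript
type EmbeddingFields = {
  deal_name?: string;
  industry?: string;
  title?: string;
  department?: string;
};

// Hypothetical helper: builds a stable text block, skipping empty or missing fields.
function rowToEmbeddingText(fields: EmbeddingFields): string {
  const parts: Array<[string, string | undefined]> = [
    ["Deal", fields.deal_name],
    ["Industry", fields.industry],
    ["Contact title", fields.title],
    ["Contact department", fields.department],
  ];
  const lines: string[] = [];
  for (const [label, value] of parts) {
    if (typeof value !== "string") continue;
    const trimmed = value.trim();
    if (trimmed === "") continue;
    lines.push(`${label}: ${trimmed}`);
  }
  return lines.join("\n");
}

const embeddingText = rowToEmbeddingText({
  deal_name: "Acme renewal",
  industry: "Software",
  title: " CTO ",
});
// embeddingText has three lines: "Deal: Acme renewal", "Industry: Software", "Contact title: CTO"
```

Keeping the field order fixed means the same deal produces the same text on every run, which keeps embeddings comparable across refreshes.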