Published January 14, 2026

How to Train AI on HRIS Data with Unified's HR & Directory API

Most HR-driven AI features fail long before the model ever trains.

On paper, HR analytics looks straightforward. Pull employees. Read compensation. Add payroll history. Derive features like salary bands, tenure, churn risk, or cost forecasting.

In practice, those steps only work if HR data means the same thing across customers.

That's where complexity creeps in.

HR systems don't just differ in UI. They model employees, compensation, payroll, and time differently. Some systems treat compensation as a single current record. Others split it into multiple components. Payroll may arrive as clean payslips, or only as time-based shifts. Leave, termination, and employment status are often represented differently or inconsistently.

For a PM or platform engineer, this creates uncomfortable tradeoffs:

Can we trust compensation fields to be comparable across HR providers?
Is payroll data current-state or historical?
Can we train one model across all customers, or does each integration need special handling?

Many teams solve this by narrowing scope or hardcoding logic per HR vendor. That works early, but it limits who your AI features work for and how confidently you can ship changes.

Unified's HR & Directory API is designed to remove those constraints. Instead of reconciling HR semantics downstream, Unified normalizes employees, compensation, payslips, timeshifts, and timeoff upstream. Dates, enums, currencies, and identifiers behave consistently before your AI pipeline ever runs.

The same normalized HR layer can also support retrieval-augmented generation (RAG) workflows, where HR records or policy documents are indexed and retrieved to ground AI responses in real employee data.

This guide shows how to build AI-ready HR datasets on top of that normalized layer using Unified's HR & Directory API, without branching logic per provider and without guessing at data semantics.

Prerequisites

Node.js v18+
A Unified account with an HRIS integration enabled
Your Unified API key
A customer HRIS connectionId

Step 1: Set up your project

mkdir hris-ai-training
cd hris-ai-training
npm init -y
npm install @unified-api/typescript-sdk dotenv
npm install -D typescript tsx @types/node

Create a .env file:

UNIFIED_API_KEY=your_unified_api_key
CONNECTION_HRIS=your_customer_hris_connection_id

Step 2: Initialize the SDK

import "dotenv/config";
import { UnifiedTo } from "@unified-api/typescript-sdk";

const { UNIFIED_API_KEY, CONNECTION_HRIS } = process.env;

const sdk = new UnifiedTo({
  security: { jwt: UNIFIED_API_KEY! },
});

All HRIS operations live under the sdk.hris namespace.
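The non-null assertions (`!`) above satisfy the type checker but check nothing at runtime, so a missing variable only surfaces later as a confusing authentication or request error. A minimal fail-fast alternative might look like this; `requireEnv` is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper: return a required environment variable or throw at startup.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage sketch, replacing the `!` assertions:
// const apiKey = requireEnv("UNIFIED_API_KEY", process.env);
// const connectionId = requireEnv("CONNECTION_HRIS", process.env);
// const sdk = new UnifiedTo({ security: { jwt: apiKey } });
```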

Step 3: Understand the normalized HRIS objects

Before writing any model logic, it's critical to understand what Unified guarantees and what it does not.

Employees (`HrisEmployee`)

Employees are the anchor for all HRIS data. Each employee record includes identity fields, employment status, and current compensation.

Important fields you'll rely on:

id
employment_status (ACTIVE, INACTIVE)
employment_type (FULL_TIME, PART_TIME, CONTRACTOR, etc.)
hired_at, terminated_at
compensation[]

Compensation entries are current-state only. They do not include start or end dates. Multiple entries may exist per employee to represent salary, bonus, equity, or other components.
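Tenure, one of the features mentioned at the start, can be derived from `hired_at` and `terminated_at` alone. A minimal sketch, assuming both fields arrive as ISO 8601 strings or `Date` values (check what your SDK version returns); `tenureDays` is a hypothetical helper:

```typescript
// Hypothetical helper: tenure in whole days, measured up to the termination
// date if present, otherwise up to `asOf` (defaults to now).
function tenureDays(
  hiredAt: string | Date | null | undefined,
  terminatedAt?: string | Date | null,
  asOf: Date = new Date(),
): number | null {
  if (!hiredAt) return null; // no hire date: leave the feature missing rather than guess
  const start = new Date(hiredAt).getTime();
  const end = terminatedAt ? new Date(terminatedAt).getTime() : asOf.getTime();
  // Unparseable dates or an end before the start also yield a missing feature.
  if (Number.isNaN(start) || Number.isNaN(end) || end < start) return null;
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return Math.floor((end - start) / MS_PER_DAY);
}
```

Returning `null` instead of `0` keeps "unknown tenure" distinguishable from "hired today" in the training data.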

Payslips (`HrisPayslip`)

Payslips represent historical payroll events. Each record includes:

user_id (employee reference)
start_at, end_at, paid_at
gross_amount, net_amount
currency

Payslips are the primary source of historical pay data.
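Because payslips carry the historical signal, it is worth checking coverage before training on them: a missing quarter can look like a pay cut or a leave. A minimal sketch, assuming `start_at` and `end_at` are ISO 8601 strings; `findPayrollGaps` and its one-day tolerance are illustrative choices, not Unified behavior:

```typescript
// Minimal local shape for the payslip fields used here.
interface PayslipPeriod {
  start_at: string;
  end_at: string;
}

// Hypothetical helper: sort payslips chronologically and report gaps between
// the end of one pay period and the start of the next that exceed maxGapDays.
// The default of 1 day tolerates periods that end at midnight before the next starts.
function findPayrollGaps(
  payslips: PayslipPeriod[],
  maxGapDays = 1,
): Array<{ from: string; to: string; days: number }> {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const sorted = [...payslips].sort(
    (a, b) => new Date(a.start_at).getTime() - new Date(b.start_at).getTime(),
  );
  const gaps: Array<{ from: string; to: string; days: number }> = [];
  for (let i = 1; i < sorted.length; i++) {
    const prevEnd = new Date(sorted[i - 1].end_at).getTime();
    const nextStart = new Date(sorted[i].start_at).getTime();
    const days = (nextStart - prevEnd) / MS_PER_DAY;
    if (days > maxGapDays) {
      gaps.push({ from: sorted[i - 1].end_at, to: sorted[i].start_at, days });
    }
  }
  return gaps;
}
```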

Timeshifts (`HrisTimeshift`)

Timeshifts represent time-based work records. Each record includes:

employee_user_id
start_at, end_at
hours
Optional compensation entries

Timeshifts are useful for hourly or shift-based analysis, but are provider-dependent and should be treated as optional.

Note: Some HRIS providers may not support timeshifts or timeoff.

Timeoff (`HrisTimeoff`)

Timeoff records represent leave events. They include:

user_id
start_at, end_at
status, is_paid

Timeoff can be used for availability or absence features, but is not required for payroll-based training.
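For absence features, one simple signal is total leave per employee, split by `is_paid`. A minimal sketch, assuming ISO 8601 `start_at`/`end_at` strings and measuring elapsed time between them (whether `end_at` is inclusive can vary, so verify against your data). `timeoffDaysByEmployee` is hypothetical, and no `status` values are hardcoded because which statuses count as real leave is your call:

```typescript
interface TimeoffRecord {
  user_id: string;
  start_at: string;
  end_at: string;
  status?: string;
  is_paid?: boolean;
}

// Hypothetical helper: total days of timeoff per employee, split by is_paid.
// Records without is_paid are counted as unpaid. Pass `statuses` to keep only
// records whose status is in that list.
function timeoffDaysByEmployee(
  records: TimeoffRecord[],
  statuses?: string[],
): Map<string, { paidDays: number; unpaidDays: number }> {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const totals = new Map<string, { paidDays: number; unpaidDays: number }>();
  for (const r of records) {
    if (statuses && (!r.status || !statuses.includes(r.status))) continue;
    const days = (new Date(r.end_at).getTime() - new Date(r.start_at).getTime()) / MS_PER_DAY;
    if (!(days > 0)) continue; // skip zero-length, reversed, or unparseable ranges
    const t = totals.get(r.user_id) ?? { paidDays: 0, unpaidDays: 0 };
    if (r.is_paid) t.paidDays += days;
    else t.unpaidDays += days;
    totals.set(r.user_id, t);
  }
  return totals;
}
```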

Step 4: Fetch employees

Employee lists are paginated using limit and offset. The list endpoint returns a plain array, not a wrapped object.

export async function fetchAllEmployees() {
  const out = [];
  let offset = 0;
  const limit = 100;

  while (true) {
    const page = await sdk.hris.listHrisEmployees({
      connectionId: CONNECTION_HRIS!,
      limit,
      offset,
      sort: "updated_at",
      order: "asc",
    });

    if (!page || page.length === 0) break;

    out.push(...page);
    offset += limit;
  }

  return out;
}
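The same limit/offset pattern applies to the payslip and timeshift endpoints used below. One way to avoid repeating the loop is a small generic helper that accepts any page-fetching function. This is a sketch: `paginate` is hypothetical and not part of the SDK, and it assumes each call returns a plain array, as described above.

```typescript
// Hypothetical helper: drain a limit/offset endpoint whose list call returns a
// plain array. Like the loop above, it stops at the first empty page.
async function paginate<T>(
  fetchPage: (limit: number, offset: number) => Promise<T[] | null | undefined>,
  limit = 100,
): Promise<T[]> {
  const out: T[] = [];
  for (let offset = 0; ; offset += limit) {
    const page = (await fetchPage(limit, offset)) ?? [];
    if (page.length === 0) break; // no more records
    out.push(...page);
  }
  return out;
}

// Usage sketch with the SDK client from Step 2:
// const employees = await paginate((limit, offset) =>
//   sdk.hris.listHrisEmployees({ connectionId: CONNECTION_HRIS!, limit, offset }),
// );
```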

Step 5: Fetch payslips per employee

Payslips can be filtered by employee using user_id. Like employees, they are paginated with limit and offset.

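export async function fetchPayslipsForEmployee(employeeId: string) {
  const out = [];
  const limit = 100;
  // Page through all payslips; one call returns at most `limit` records.
  for (let offset = 0; ; offset += limit) {
    const page = await sdk.hris.listHrisPayslips({
      connectionId: CONNECTION_HRIS!,
      user_id: employeeId,
      limit,
      offset,
      sort: "updated_at",
      order: "asc",
    });
    if (!page || page.length === 0) break;
    out.push(...page);
  }
  return out;
}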

This returns an array of HrisPayslip.
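Calling Step 5 once per employee in sequence can be slow for large workforces, while firing every request at once risks hitting rate limits. A middle ground is a small concurrency limiter. This is a sketch: `mapWithConcurrency` is hypothetical, and a limit of 5 in the usage line is an arbitrary starting point, not a documented Unified quota.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in
// flight at once, returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain. Claiming
  // is safe without locks because this code runs on a single JS thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage sketch:
// const employees = await fetchAllEmployees();
// const payslips = await mapWithConcurrency(employees, 5, (e) => fetchPayslipsForEmployee(e.id));
```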

Step 6: Optional — fetch timeshifts

If your use case involves hourly analysis, timeshifts can be fetched and joined by employee_user_id.

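export async function fetchTimeshifts(startGte?: string, endLt?: string) {
  const out = [];
  const limit = 100;
  for (let offset = 0; ; offset += limit) {
    const page = await sdk.hris.listHrisTimeshifts({
      connectionId: CONNECTION_HRIS!,
      limit,
      offset,
      // Send date bounds only when provided; "" is not a meaningful date filter.
      ...(startGte ? { start_gte: startGte } : {}),
      ...(endLt ? { end_lt: endLt } : {}),
    });
    if (!page || page.length === 0) break; // done, or unsupported by the provider
    out.push(...page);
  }
  return out;
}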

Not all providers support timeshifts. Your pipeline should tolerate empty results.
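When joining timeshifts to employees, it helps to keep "no data" distinct from "zero hours", since some providers return nothing at all. A minimal sketch, assuming each timeshift carries `employee_user_id` and a numeric `hours` field as described in Step 3; `hoursByEmployee` is hypothetical:

```typescript
interface Timeshift {
  employee_user_id?: string;
  hours?: number;
}

// Hypothetical helper: total hours per employee. Employees with no timeshifts
// map to null, so "no data from this provider" is not confused with "worked 0 hours".
function hoursByEmployee(
  employeeIds: string[],
  shifts: Timeshift[] | undefined,
): Map<string, number | null> {
  const totals = new Map<string, number | null>();
  for (const id of employeeIds) totals.set(id, null);
  for (const s of shifts ?? []) {
    if (!s.employee_user_id || typeof s.hours !== "number") continue;
    if (!totals.has(s.employee_user_id)) continue; // ignore shifts for unknown employees
    totals.set(s.employee_user_id, (totals.get(s.employee_user_id) ?? 0) + s.hours);
  }
  return totals;
}
```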

Step 7: Normalize employees for training

This step converts raw HRIS records into a stable shape for feature extraction.

export function normalizeEmployee(e: any) {
  return {
    id: e.id,
    status: e.employment_status,
    type: e.employment_type,
    hiredAt: e.hired_at,
    terminatedAt: e.terminated_at,
    compensation: (e.compensation ?? []).map((c: any) => ({
      type: c.type,
      amount: c.amount,
      currency: c.currency,
      frequency: c.frequency,
    })),
  };
}

Do not annualize or convert currencies here. Compensation frequency and currency are explicitly modeled and should remain intact.
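If a model later needs comparable annual figures, one option is to derive them in a separate, explicit feature step and leave the normalized record untouched. The sketch below is illustrative: the frequency strings (`HOUR`, `DAY`, `WEEK`, `MONTH`, `QUARTER`, `YEAR`) are assumptions to verify against the enum your connection actually returns, the 2,080 hours and 260 working days per year are placeholder conventions (40 × 52 and 5 × 52), and `annualize` is a hypothetical helper. It keeps each entry's currency rather than converting it.

```typescript
// Assumed frequency values and periods per year; verify against your data.
const PERIODS_PER_YEAR = new Map<string, number>([
  ["HOUR", 2080], // assumption: 40 h/week x 52 weeks
  ["DAY", 260], // assumption: 5 days/week x 52 weeks
  ["WEEK", 52],
  ["MONTH", 12],
  ["QUARTER", 4],
  ["YEAR", 1],
]);

interface CompensationEntry {
  type?: string;
  amount?: number;
  currency?: string;
  frequency?: string;
}

// Hypothetical helper: annualize one compensation entry, keeping its currency.
// Unknown or non-recurring frequencies return null rather than a guess.
function annualize(c: CompensationEntry): { amount: number; currency?: string } | null {
  if (typeof c.amount !== "number" || !c.frequency) return null;
  const periods = PERIODS_PER_YEAR.get(c.frequency);
  if (periods === undefined) return null;
  return { amount: c.amount * periods, currency: c.currency };
}
```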

Step 8: Build payroll history features

Payslips provide historical labels and trends.

export function aggregatePayroll(payslips: any[]) {
  return payslips.map((p) => ({
    startAt: p.start_at,
    endAt: p.end_at,
    gross: p.gross_amount,
    net: p.net_amount,
    currency: p.currency,
  }));
}

Group by month, quarter, or fiscal period depending on your model needs.
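Building on the output of `aggregatePayroll`, a monthly rollup might look like the sketch below. Two choices here are assumptions you may want to change: each payslip is bucketed by the UTC month of its period end (`endAt`), and buckets are keyed by month and currency together so amounts in different currencies are never summed. `groupPayrollByMonth` is hypothetical.

```typescript
interface PayrollEvent {
  startAt: string;
  endAt: string;
  gross?: number;
  net?: number;
  currency?: string;
}

interface MonthBucket {
  month: string; // "YYYY-MM"
  currency: string;
  gross: number;
  net: number;
  count: number;
}

// Hypothetical helper: sum gross and net pay per (month, currency) pair.
function groupPayrollByMonth(events: PayrollEvent[]): MonthBucket[] {
  const buckets = new Map<string, MonthBucket>();
  for (const e of events) {
    const d = new Date(e.endAt);
    if (Number.isNaN(d.getTime())) continue; // skip unparseable dates
    const month = d.toISOString().slice(0, 7);
    const currency = e.currency ?? "UNKNOWN";
    const key = `${month}|${currency}`;
    const b = buckets.get(key) ?? { month, currency, gross: 0, net: 0, count: 0 };
    b.gross += e.gross ?? 0;
    b.net += e.net ?? 0;
    b.count += 1;
    buckets.set(key, b);
  }
  // Chronological order by month.
  return Array.from(buckets.values()).sort((a, b) => a.month.localeCompare(b.month));
}
```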

Step 9: Assemble AI-ready records

At this point, you can construct training rows that combine:

Current employee attributes
Current compensation structure
Historical payroll events
Optional hours worked or timeoff signals

export function buildTrainingRow(employee: any, payroll: any[]) {
  return {
    employeeId: employee.id,
    status: employee.status,
    employmentType: employee.type,
    compensation: employee.compensation,
    payrollHistory: payroll,
  };
}
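Step 5 fetches payslips one employee at a time. If you instead list payslips for the whole connection (assuming the endpoint is called without the `user_id` filter), they need to be joined back to employees before rows are built. A minimal sketch with hypothetical helpers `groupByUserId` and `assembleRows`, mirroring the row shape of `buildTrainingRow`; the join has to use raw payslips, because `aggregatePayroll` drops `user_id`.

```typescript
interface NormalizedEmployee {
  id: string;
  status?: string;
  type?: string;
  compensation: unknown[];
}

interface PayslipLike {
  user_id?: string;
}

// Hypothetical helper: index payslips by the employee they belong to.
function groupByUserId<P extends PayslipLike>(payslips: P[]): Map<string, P[]> {
  const byUser = new Map<string, P[]>();
  for (const p of payslips) {
    if (!p.user_id) continue; // unattributed payslips cannot be joined
    const list = byUser.get(p.user_id) ?? [];
    list.push(p);
    byUser.set(p.user_id, list);
  }
  return byUser;
}

// One row per employee; employees without payslips keep an empty history
// instead of silently disappearing from the dataset.
function assembleRows<P extends PayslipLike>(employees: NormalizedEmployee[], payslips: P[]) {
  const byUser = groupByUserId(payslips);
  return employees.map((e) => ({
    employeeId: e.id,
    status: e.status,
    employmentType: e.type,
    compensation: e.compensation,
    payrollHistory: byUser.get(e.id) ?? [],
  }));
}
```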

These records can be serialized to JSONL, stored in a feature store, or streamed into a training job.
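For JSONL, each row becomes one line of JSON. A minimal sketch using Node's built-in `fs` module; `toJsonl` and `writeJsonl` are hypothetical helper names:

```typescript
import { writeFileSync } from "node:fs";

// Serialize rows as JSON Lines: one JSON object per line, newline-terminated.
// JSON.stringify turns Date values into ISO strings and drops properties
// whose value is undefined.
function toJsonl(rows: unknown[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n") + (rows.length > 0 ? "\n" : "");
}

// Write the whole dataset at once; for very large exports, stream instead.
function writeJsonl(path: string, rows: unknown[]): void {
  writeFileSync(path, toJsonl(rows), "utf8");
}
```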

Optional: Using HRIS Data in a RAG Architecture

Beyond structured model training, normalized HR records can also power RAG-based assistants:

Chunk and embed relevant HR fields (for example, job titles, department descriptions, policy documents linked to roles).
Store embeddings in your vector database with identifiers like connection_id, employee_id, and updated_at.
On a user query (for example, "What is this employee's tenure?" or "What policies apply to contractors?"), retrieve the relevant segments before generating a grounded response.
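The chunking step in the list above could start as simply as rendering a few role-level fields into text. A minimal sketch: `employeeToChunk` and the `EmployeeForRag` shape are hypothetical, and embedding and vector storage are deliberately left out. Which fields to embed is a policy decision; this sketch includes only role-level fields.

```typescript
// Local shape for the fields rendered into a chunk; map these from your
// normalized employee records (field names here are illustrative).
interface EmployeeForRag {
  id: string;
  title?: string;
  department?: string;
  employmentType?: string;
  hiredAt?: string;
  updatedAt?: string;
}

interface RagChunk {
  text: string;
  metadata: { connection_id: string; employee_id: string; updated_at?: string };
}

// Hypothetical helper: render selected HR fields as a small text chunk for
// embedding, carrying connection_id, employee_id, and updated_at as metadata
// so retrieved chunks can be traced back and re-embedded when records change.
function employeeToChunk(connectionId: string, e: EmployeeForRag): RagChunk {
  const lines = [
    e.title ? `Title: ${e.title}` : undefined,
    e.department ? `Department: ${e.department}` : undefined,
    e.employmentType ? `Employment type: ${e.employmentType}` : undefined,
    e.hiredAt ? `Hired: ${e.hiredAt}` : undefined,
  ].filter((line): line is string => line !== undefined);
  return {
    text: lines.join("\n"),
    metadata: { connection_id: connectionId, employee_id: e.id, updated_at: e.updatedAt },
  };
}
```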

Unified handles ingestion and normalization across HR providers; embeddings and vector storage remain in your infrastructure.

Closing thoughts

AI features built on HR data fail when the data itself isn't reliable. Unified's HR & Directory API gives you a consistent, real-time foundation for employees, compensation, payroll, and time-based work across providers.

Once the semantics are stable, the modeling becomes the easy part.

If you're building AI features that depend on HR data, this is the layer that lets you do it once and scale it everywhere.

→ Start your 30-day free trial

→ Book a demo
