Building a transformation pipeline

This guide walks you through the simplest possible SATIF AI pipeline:

• Build phase: generate the transformation code once by feeding SATIF both the standardized data and an example of the desired output.
• Run phase: apply that code to any number of new datasources to get fresh output files.

Everything below fits in a single Python script of about 100 lines, comments included.

1 Install the dependencies

pip install "satif-ai>=0.1" fastmcp

Python 3.10+ is required.
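If the script may be launched from several environments, a small guard can fail fast on an unsupported interpreter. This is a minimal sketch using only the standard library; the names MIN_PYTHON and python_is_supported are hypothetical and not part of SATIF.

```python
import sys

# Minimum interpreter version stated in this guide.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported_python() -> None:
    """Exit with a readable message on an interpreter that is too old."""
    if not python_is_supported():
        wanted = ".".join(str(part) for part in MIN_PYTHON)
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise SystemExit(f"Python {wanted}+ is required, found {found}")
```

Calling require_supported_python() at the top of run_satif.py, before the SATIF imports, would surface the problem before any import error does.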

2 Project layout

my_project/
├── input_files/
│   └── sales.csv
├── output_examples/
│   └── expected_sales.json
└── run_satif.py

sales.csv is the raw source file.
expected_sales.json is a single example of the desired result.

During the build phase, SATIF derives the transformation by comparing the output of its generated code with this example file.
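To make the layout concrete, the sketch below creates it with made-up sample data. The column names (customer_id, amount), the three rows, and the shape of the JSON are assumptions chosen to match the instructions string used in the script of section 3; real files will differ. The helper name create_sample_project is hypothetical.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def create_sample_project(root: Path) -> None:
    """Write a tiny sales.csv and the matching expected_sales.json under root."""
    # Made-up rows for illustration only.
    rows = [
        {"customer_id": "C001", "amount": "10.50"},
        {"customer_id": "C002", "amount": "4.00"},
        {"customer_id": "C001", "amount": "2.25"},
    ]

    input_dir = root / "input_files"
    example_dir = root / "output_examples"
    input_dir.mkdir(parents=True, exist_ok=True)
    example_dir.mkdir(parents=True, exist_ok=True)

    # Raw source file.
    with (input_dir / "sales.csv").open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["customer_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)

    # Desired result: one total per customer, sorted by customer_id.
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += float(row["amount"])
    expected = [
        {"customer_id": customer_id, "total_amount": total}
        for customer_id, total in sorted(totals.items())
    ]
    (example_dir / "expected_sales.json").write_text(json.dumps(expected, indent=2))
```

Running create_sample_project(Path("my_project")) would produce the two data files shown in the tree above; run_satif.py still has to be added by hand.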

3 The script (both phases)

# run_satif.py
"""Minimal two-phase SATIF pipeline in explicit functions."""

import asyncio
from pathlib import Path

from fastmcp import FastMCP, Client
from fastmcp.client.transports import FastMCPTransport

from satif_ai.standardizers.ai import AIStandardizer
from satif_ai.transformation_builders.syncpulse import (
    SyncpulseTransformationBuilder,
)
from satif_ai.utils.openai_mcp import OpenAICompatibleMCP
from satif_sdk.code_executors.local_executor import LocalCodeExecutor
from satif_sdk.transformers.code import CodeTransformer


# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
INPUT_FILE = "input_files/sales.csv"
OUTPUT_EXAMPLE = "output_examples/expected_sales.json"
MODEL = "o4-mini"

SDIF_PATH = Path("input.sdif")
CODE_PATH = Path("transform.py")
OUTPUT_DIR = Path("generated_output")


# ---------------------------------------------------------------------------
# Phase A – BUILD (one-off)
# ---------------------------------------------------------------------------


async def build_transformation() -> None:
    """Standardize *INPUT_FILE* and generate *CODE_PATH*."""

    mcp_server = FastMCP()
    mcp_transport = FastMCPTransport(mcp=mcp_server)

    async with Client(mcp_transport) as mcp_client:
        openai_mcp = OpenAICompatibleMCP(mcp=mcp_server)
        await openai_mcp.connect()

        # 1  Standardize datasource → SDIF
        standardizer = AIStandardizer(
            mcp_server=openai_mcp,
            mcp_session=mcp_client.session,
            llm_model=MODEL,
        )
        await standardizer.standardize(
            datasource=INPUT_FILE,
            output_path=SDIF_PATH,
            overwrite=True,
        )

        # 2  Generate Python transformation code
        builder = SyncpulseTransformationBuilder(
            mcp_server=openai_mcp,
            mcp_session=mcp_client.session,
            llm_model=MODEL,
        )
        code_str = await builder.build(
            sdif=SDIF_PATH,
            output_target_files={OUTPUT_EXAMPLE: Path(OUTPUT_EXAMPLE).name},
            instructions=(
                "For every customer in sales.csv, compute total_amount and "
                "output JSON with fields: customer_id and total_amount."
            ),
        )
        CODE_PATH.write_text(code_str)


# ---------------------------------------------------------------------------
# Phase B – RUN (repeatable)
# ---------------------------------------------------------------------------


async def run_transformation() -> None:
    """Apply *CODE_PATH* to *SDIF_PATH* to produce files in *OUTPUT_DIR*."""

    transformer = CodeTransformer(
        function=CODE_PATH,
        code_executor=LocalCodeExecutor(disable_security_warning=True),
    )
    transformer.export(
        sdif=SDIF_PATH,
        output_path=OUTPUT_DIR,
    )


# ---------------------------------------------------------------------------
# Entrypoint (dev convenience): build then run.
# ---------------------------------------------------------------------------


if __name__ == "__main__":
    asyncio.run(build_transformation())
    asyncio.run(run_transformation())
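
The entrypoint above always runs both phases, so the one-off build is repeated on every invocation. One possible way to separate them is a small command-line switch. This is a sketch using only the standard library; parse_args and phases_to_run are hypothetical names, not part of SATIF.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse an optional phase argument: build, run, or all (the default)."""
    parser = argparse.ArgumentParser(description="Two-phase SATIF pipeline")
    parser.add_argument(
        "phase",
        nargs="?",
        choices=["build", "run", "all"],
        default="all",
        help="which phase to execute (default: all)",
    )
    return parser.parse_args(argv)


def phases_to_run(phase: str) -> list[str]:
    """Expand the selected phase into the ordered list of steps."""
    return ["build", "run"] if phase == "all" else [phase]


# Possible wiring inside run_satif.py (assumes the two functions defined there):
#
#     if __name__ == "__main__":
#         steps = {"build": build_transformation, "run": run_transformation}
#         for name in phases_to_run(parse_args().phase):
#             asyncio.run(steps[name]())
```

With that wiring, `python run_satif.py run` would reuse the existing input.sdif and transform.py instead of regenerating them.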

4 Run it

python run_satif.py

You will obtain:

• input.sdif – the standardized SQLite database.
• transform.py – the AI-generated transformation script.
• generated_output/expected_sales.json – the file produced by the script.
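
Because input.sdif is a SQLite database, Python's built-in sqlite3 module can show what the standardizer produced. The sketch below makes no assumption about SDIF table names; it simply lists whatever tables the file contains with their row counts. The helper name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path) -> dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    path = Path(db_path)
    if not path.is_file():
        # sqlite3.connect() would silently create an empty database otherwise.
        raise FileNotFoundError(path)

    conn = sqlite3.connect(str(path))
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, list_tables("input.sdif") can be called after the build phase to check that the rows of sales.csv were actually loaded.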

Compare the generated file with output_examples/expected_sales.json. If they differ, tweak the instructions string or provide additional example files.
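
The comparison can be automated. A minimal sketch, assuming both files are valid JSON: parsing them first means differences in whitespace or key order are ignored, while differences in values or in list order are reported. The helper name json_files_match is hypothetical.

```python
import json
from pathlib import Path


def json_files_match(generated, expected) -> bool:
    """Compare two JSON files by parsed content rather than raw text."""
    generated_data = json.loads(Path(generated).read_text())
    expected_data = json.loads(Path(expected).read_text())
    return generated_data == expected_data
```

A typical call would be json_files_match("generated_output/expected_sales.json", "output_examples/expected_sales.json"). If the order of records is not significant for your use case, sort both lists on a stable key before comparing.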

5 Where to go next

• Pass a list of paths to datasource to merge multiple inputs into one SDIF.
• Map several example outputs in output_target_files to generate multi-file transformations.
• Tune llm_model for different speed/quality trade-offs.
• Use the higher-level helpers astandardize() and atransform() when you don't need fine-grained control over the builder.
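
For the multi-file case, the output_target_files mapping can be built from a directory instead of being written by hand. The sketch below mirrors the single-entry form used in the script, {OUTPUT_EXAMPLE: Path(OUTPUT_EXAMPLE).name}, and assumes every file in the directory is an example output; collect_output_targets is a hypothetical helper, not a SATIF function.

```python
from pathlib import Path


def collect_output_targets(example_dir) -> dict[str, str]:
    """Map each example file path to the file name the transformation should emit."""
    directory = Path(example_dir)
    return {
        path.as_posix(): path.name
        for path in sorted(directory.iterdir())
        if path.is_file()  # skip sub-directories
    }
```

The result could then be passed as output_target_files=collect_output_targets("output_examples") in the builder.build(...) call.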

That's all – you now have a working SATIF transformation pipeline in two explicit steps.
