Building a transformation pipeline
This guide walks you through the simplest possible SATIF AI pipeline:
- **Build phase**: generate the transformation code once by feeding SATIF both the standardized data and an example of the desired output.
- **Run phase**: apply that code to any number of new datasources to get fresh output files.
Everything below fits in about 50 lines of Python.
1 Install the dependencies
pip install "satif-ai>=0.1" fastmcp
Python 3.10+ is required.
2 Project layout
my_project/
├── input_files/
│ └── sales.csv
├── output_examples/
│ └── expected_sales.json
└── run_satif.py
- sales.csv is the raw source file.
- expected_sales.json is a single example of the desired result.

SATIF will learn the transformation by comparing the generated output with the output example file.
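To make the layout concrete, the sketch below writes a plausible pair of files. The column names and JSON shape are purely illustrative assumptions, not anything SATIF requires:

```python
import json
from pathlib import Path

# Hypothetical sample data — adjust to your real source and target.
Path("input_files").mkdir(exist_ok=True)
Path("output_examples").mkdir(exist_ok=True)

# A raw source file: one row per sale.
Path("input_files/sales.csv").write_text(
    "customer_id,amount\nC001,120.50\nC001,30.00\nC002,99.99\n"
)

# One example of the desired result: totals per customer.
expected = [
    {"customer_id": "C001", "total_amount": 150.50},
    {"customer_id": "C002", "total_amount": 99.99},
]
Path("output_examples/expected_sales.json").write_text(json.dumps(expected, indent=2))
```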
3 The script (both phases)
# run_satif.py
"""Minimal two-phase SATIF pipeline in explicit functions."""
import asyncio
from pathlib import Path

from fastmcp import FastMCP, Client
from fastmcp.client.transports import FastMCPTransport
from satif_ai.standardizers.ai import AIStandardizer
from satif_ai.transformation_builders.syncpulse import (
    SyncpulseTransformationBuilder,
)
from satif_ai.utils.openai_mcp import OpenAICompatibleMCP
from satif_sdk.code_executors.local_executor import LocalCodeExecutor
from satif_sdk.transformers.code import CodeTransformer

# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
INPUT_FILE = "input_files/sales.csv"
OUTPUT_EXAMPLE = "output_examples/expected_sales.json"
MODEL = "o4-mini"
SDIF_PATH = Path("input.sdif")
CODE_PATH = Path("transform.py")
OUTPUT_DIR = Path("generated_output")

# ---------------------------------------------------------------------------
# Phase A – BUILD (one-off)
# ---------------------------------------------------------------------------
async def build_transformation() -> None:
    """Standardize *INPUT_FILE* and generate *CODE_PATH*."""
    mcp_server = FastMCP()
    mcp_transport = FastMCPTransport(mcp=mcp_server)
    async with Client(mcp_transport) as mcp_client:
        openai_mcp = OpenAICompatibleMCP(mcp=mcp_server)
        await openai_mcp.connect()

        # 1. Standardize datasource -> SDIF
        standardizer = AIStandardizer(
            mcp_server=openai_mcp,
            mcp_session=mcp_client.session,
            llm_model=MODEL,
        )
        await standardizer.standardize(
            datasource=INPUT_FILE,
            output_path=SDIF_PATH,
            overwrite=True,
        )

        # 2. Generate Python transformation code
        builder = SyncpulseTransformationBuilder(
            mcp_server=openai_mcp,
            mcp_session=mcp_client.session,
            llm_model=MODEL,
        )
        code_str = await builder.build(
            sdif=SDIF_PATH,
            output_target_files={OUTPUT_EXAMPLE: Path(OUTPUT_EXAMPLE).name},
            instructions=(
                "For every customer in sales.csv, compute total_amount and "
                "output JSON with fields: customer_id and total_amount."
            ),
        )
        CODE_PATH.write_text(code_str)

# ---------------------------------------------------------------------------
# Phase B – RUN (repeatable)
# ---------------------------------------------------------------------------
async def run_transformation() -> None:
    """Apply *CODE_PATH* to *SDIF_PATH* to produce files in *OUTPUT_DIR*."""
    transformer = CodeTransformer(
        function=CODE_PATH,
        code_executor=LocalCodeExecutor(disable_security_warning=True),
    )
    transformer.export(
        sdif=SDIF_PATH,
        output_path=OUTPUT_DIR,
    )

# ---------------------------------------------------------------------------
# Entrypoint (dev convenience): build then run.
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    asyncio.run(build_transformation())
    asyncio.run(run_transformation())
4 Run it
python run_satif.py
You will obtain:
- input.sdif – the standardized SQLite database.
- transform.py – the AI-generated transformation script.
- generated_output/expected_sales.json – the file produced by the script.
Compare the generated file with output_examples/expected_sales.json. If they differ, tweak the instructions string or provide additional example files.
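When both files are JSON, a small standard-library helper (independent of SATIF) makes the comparison explicit instead of eyeballing whitespace and key order; the paths in the usage note mirror the layout above:

```python
import json
from pathlib import Path

def json_files_match(generated: Path, expected: Path) -> bool:
    """Compare two JSON files structurally (whitespace and key order ignored)."""
    return json.loads(generated.read_text()) == json.loads(expected.read_text())
```

Usage for this guide's layout:

```python
ok = json_files_match(
    Path("generated_output/expected_sales.json"),
    Path("output_examples/expected_sales.json"),
)
print("match" if ok else "differs — refine the instructions string")
```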
5 Where to go next
- Pass a list of paths to datasource to merge multiple inputs into one SDIF.
- Map several example outputs in output_target_files to generate multi-file transformations.
- Tune llm_model for different speed/quality trade-offs.
- Use the higher-level helpers astandardize() and atransform() when you don't need fine-grained control over the builder.
That's all – you now have a working SATIF transformation pipeline in two explicit steps.