Transformation
The Transformation Layer is the second stage in the SATIF pipeline. Its core responsibility is to apply business-specific logic to the standardized data—contained within an SDIF file generated by the Standardization Layer—to produce the final output files in their required formats (e.g., XLSX, CSV, JSON, XML).
This layer can execute transformation logic that is either:
- AI-Generated: Developed by a Transformation Builder based on examples and natural language.
- User-Provided: Manually written by developers, conforming to SATIF's Transformer interface.
Dual Nature: Code Generation & Execution
Transformation within SATIF is a two-fold process, aligning with the BUILD and RUN cycles:
- Transformation Code Generation (BUILD Cycle): The intelligent creation of executable code that defines the transformation rules.
- Transformation Execution (RUN Cycle): The application of this generated (or user-provided) code to actual standardized data.
1. Transformation Code Generation (BUILD Cycle)
This phase focuses on automatically generating an executable script (e.g., a Python file like transformation.py) that encapsulates the necessary data manipulation logic.
- Key Component: Transformation Builder (also known as Transformation Code Builder - TCB), an AI-powered agent.
- Primary Inputs for the Builder:
  - Input Data Examples: Standardized input examples, typically an input.sdif file, often accompanied by its input_schema and representative input_sample data to guide the AI.
  - Output Data Examples: Samples of the desired output files (output_example_files). These are internally converted to output_example.sdif, output_repr (textual/structural representation), output_schema, and output_sample for the AI to analyze and understand the target structure and content.
  - Natural Language Instructions (nl_instructions): User-provided text describing the transformation goals, rules, and mappings.
  - Target Output File Names (output_files_names): Specifies the naming convention for the final output files.
- Iterative Generation Process:
  - Prompt Synthesis: The Transformation Builder constructs a detailed Prompt by integrating all provided inputs (SDIF examples, output examples, NL instructions, file names).
  - Code Generation & Evaluation Loop (controlled by max_iteration, e.g., 10 attempts):
    a. Agent Logic: The AI Agent generates a candidate transformation_code string.
    b. Agent Analysis Tools: To inform its code generation, the Agent can use:
       - execute_sql: Directly query the input_sdif_path (example SDIF) to analyze its tables, schema, and content.
       - represent_file: Obtain a structured representation of the output_example_files to better understand the target format and data organization.
    c. Code Execution: A Code Executor runs the generated transformation_code using the input_sdif_path as its data source.
    d. Comparison & Feedback: A Files Comparator compares the generated_files (output from the trial execution) against the example_files (user-provided output examples).
    e. Refinement: Based on the are_equivalent? status and detailed diff details from the comparator, the Agent refines the transformation_code in subsequent iterations.
- Output: A finalized, optimized transformation_code (string), ready to be saved as an executable script.
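The generate, execute, compare, refine loop described above can be sketched roughly as follows. This is a minimal illustration, not SATIF's implementation: the function build_transformation_code, its signature, and the three stub callables (fake_agent, fake_executor, exact_match) are hypothetical names introduced here, and output files are modeled as plain text for simplicity. Only max_iteration, input_sdif_path, example_files, and the are_equivalent/diff feedback idea come from the description above.

```python
from typing import Callable, Dict, Optional, Tuple

Files = Dict[str, str]  # output file name -> text content (a simplification)


def build_transformation_code(
    generate_code: Callable[[str, Optional[str]], str],
    execute_code: Callable[[str, str], Files],
    compare_files: Callable[[Files, Files], Tuple[bool, str]],
    prompt: str,
    input_sdif_path: str,
    example_files: Files,
    max_iteration: int = 10,
) -> Tuple[Optional[str], int]:
    """Hypothetical loop: returns (accepted code or None, attempts used)."""
    feedback: Optional[str] = None
    for attempt in range(1, max_iteration + 1):
        # a. The agent proposes candidate code, using feedback from the last attempt.
        code = generate_code(prompt, feedback)
        try:
            # c. Trial execution against the example SDIF.
            generated_files = execute_code(code, input_sdif_path)
        except Exception as exc:
            feedback = f"execution failed: {exc}"
            continue
        # d. Compare trial outputs with the user-provided examples.
        are_equivalent, diff_details = compare_files(generated_files, example_files)
        if are_equivalent:
            return code, attempt
        # e. The diff becomes the refinement signal for the next iteration.
        feedback = diff_details
    return None, max_iteration


# Stand-in components (assumptions for the demo only).
def fake_agent(prompt: str, feedback: Optional[str]) -> str:
    return "v1" if feedback is None else "v2"


def fake_executor(code: str, input_sdif_path: str) -> Files:
    body = "a,b\n1,2\n" if code == "v2" else "a,b\n"
    return {"out.csv": body}


def exact_match(generated: Files, examples: Files) -> Tuple[bool, str]:
    equal = generated == examples
    return equal, "" if equal else "outputs differ from examples"


code, attempts = build_transformation_code(
    fake_agent, fake_executor, exact_match,
    prompt="map input to out.csv",
    input_sdif_path="input.sdif",
    example_files={"out.csv": "a,b\n1,2\n"},
)
print(code, attempts)
```

Passing the agent, executor, and comparator in as callables keeps the loop itself independent of how each component is implemented, which mirrors the separation of components described in this section.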
2. Transformation Execution (RUN Cycle)
This phase applies the validated transformation_code (generated during the BUILD cycle or provided by a user) to new, live standardized data.
- Key Component: Transformation Execution Layer (the runtime environment that executes the transformation script).
- Inputs:
  - sdif_standardized file (e.g., invoices.sdif): The output from the Standardization Layer for the actual datasource being processed.
  - transformation_code: The executable script (e.g., Python code in transformation.py).
- Process: The Transformation Execution Layer invokes the transformation_code, which reads from the sdif_standardized input and applies the defined business logic (queries, data manipulation, restructuring, formatting).
- Output: The final transformed data, written to one or more output files in the specified target formats (e.g., generated_output.xlsx).
Inputs & Outputs (Summary)
- For Code Generation (BUILD):
  - Inputs: Example SDIF, example output files, NL instructions, target file names.
  - Output: Executable transformation_code (e.g., a Python script).
- For Execution (RUN):
  - Inputs: Live sdif_standardized file, transformation_code script.
  - Output: Final output files (e.g., generated_output.xlsx).
Components
- Transformation Builder (AI Agent): Generates transformation code.
- MCP Server: Serves effective prompts for the AI Agent.
- Code Executor: Runs the generated or user-provided transformation scripts in a controlled environment.
- Files Comparator: Validates generated outputs against examples during the BUILD cycle.
- Code Transformer: Executes the transformation logic.
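To make the Files Comparator's role more concrete, here is a small sketch of a comparator for text outputs that reports an equivalence flag plus diff details, using Python's standard difflib module. The ComparisonResult type and compare_text_files function are hypothetical names for illustration; how SATIF actually compares files (especially binary formats such as XLSX) is not described here.

```python
import difflib
from typing import Dict, List, NamedTuple


class ComparisonResult(NamedTuple):
    are_equivalent: bool
    diff_details: List[str]


def compare_text_files(
    generated: Dict[str, str], examples: Dict[str, str]
) -> ComparisonResult:
    """Compare generated text files against examples, keyed by file name."""
    details: List[str] = []
    for name in sorted(set(generated) | set(examples)):
        if name not in generated:
            details.append(f"missing generated file: {name}")
        elif name not in examples:
            details.append(f"unexpected generated file: {name}")
        else:
            # unified_diff yields no lines when both inputs are identical.
            details.extend(
                difflib.unified_diff(
                    examples[name].splitlines(),
                    generated[name].splitlines(),
                    fromfile=f"example/{name}",
                    tofile=f"generated/{name}",
                    lineterm="",
                )
            )
    return ComparisonResult(are_equivalent=not details, diff_details=details)


result = compare_text_files(
    generated={"out.csv": "a,b\n1,3\n"},
    examples={"out.csv": "a,b\n1,2\n"},
)
print(result.are_equivalent)
print("\n".join(result.diff_details))
```

A line-level diff like this gives the agent something specific to act on in the next iteration, rather than only a pass/fail signal.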