Transformation
The Transformation Layer is the second stage in the SATIF pipeline. Its core responsibility is to apply business-specific logic to the standardized data, contained in an SDIF file generated by the Standardization Layer, and produce the final output files in their required formats (e.g., XLSX, CSV, JSON, XML).
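Since the layer's input is an SDIF file, both transformation logic and the Builder's `execute_sql` analysis tool come down to querying it. The sketch below assumes an SDIF file is a SQLite database, which the SQL-based tooling suggests but this page does not state outright. `query_sdif` and `list_tables` are hypothetical helpers, not SATIF APIs.

```python
import sqlite3
from pathlib import Path
from typing import Any


def query_sdif(sdif_path: str, query: str) -> list[tuple[Any, ...]]:
    """Run one query against an SDIF file (ASSUMED here to be a SQLite database)."""
    # Read-only URI: exploratory queries cannot modify the standardized data.
    uri = f"{Path(sdif_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def list_tables(sdif_path: str) -> list[str]:
    """Return the names of the user tables stored in the SDIF file."""
    rows = query_sdif(
        sdif_path,
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND substr(name, 1, 7) != 'sqlite_' ORDER BY name",
    )
    return [name for (name,) in rows]
```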
This layer can execute transformation logic that is either:
- AI-Generated: Developed by a `Transformation Builder` based on examples and natural language.
- User-Provided: Manually written by developers, conforming to SATIF's `Transformer` interface.
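This page does not spell out the shape of SATIF's `Transformer` interface. Purely as an illustrative assumption, the sketch below models user-provided logic as a plain `transform(conn)` function that receives an open connection to the SDIF data (assumed SQLite) and returns a mapping of output file name to rows. The `invoices` table, its columns, and the output file name are hypothetical.

```python
import sqlite3


def transform(conn: sqlite3.Connection) -> dict[str, list[dict[str, object]]]:
    """Hypothetical user-provided transformer: total invoice amounts per customer.

    The signature and return shape are assumptions for illustration, not
    SATIF's documented Transformer interface.
    """
    rows = conn.execute(
        """
        SELECT customer, ROUND(SUM(amount), 2) AS total
        FROM invoices
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    # One output file, one dict per row; column order follows the dict keys.
    return {
        "totals_by_customer.csv": [
            {"customer": customer, "total": total} for customer, total in rows
        ]
    }
```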
Dual Nature: Code Generation & Execution
Transformation within SATIF is a two-fold process, aligning with the BUILD and RUN cycles:
- Transformation Code Generation (`BUILD` cycle): The intelligent creation of executable code that defines the transformation rules.
- Transformation Execution (`RUN` cycle): The application of this generated (or user-provided) code to actual standardized data.
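The hand-off between the two cycles is the transformation code itself: BUILD ends with a code string, and RUN starts from a script. A minimal standard-library sketch of that hand-off follows. `save_transformation` and `load_transformation` are hypothetical names, and loading via `importlib` is one possible mechanism, not necessarily the one SATIF uses.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path
from types import ModuleType


def save_transformation(code: str, path: str | Path) -> Path:
    """BUILD side: persist the finalized transformation_code string as a script."""
    target = Path(path)
    target.write_text(code, encoding="utf-8")
    return target


def load_transformation(path: str | Path) -> ModuleType:
    """RUN side: import a saved script as a module so its functions can be called."""
    spec = importlib.util.spec_from_file_location("transformation", str(Path(path)))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a transformation script from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the script's top-level code
    return module
```

Because the loaded module is ordinary Python, anything the script does at import time runs immediately, so only trusted code should be loaded this way.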
1. Transformation Code Generation (BUILD Cycle)
This phase focuses on automatically generating an executable script (e.g., a Python file like transformation.py) that encapsulates the necessary data manipulation logic.
- Key Component: `Transformation Builder` (also known as Transformation Code Builder, or TCB), an AI-powered agent.
- Primary Inputs for the Builder:
  - Input Data Examples: Standardized input examples, typically an `input.sdif` file, often accompanied by its `input_schema` and representative `input_sample` data to guide the AI.
  - Output Data Examples: Samples of the desired output files (`output_example_files`). These are internally converted to `output_example.sdif`, `output_repr` (a textual/structural representation), `output_schema`, and `output_sample` so the AI can analyze and understand the target structure and content.
  - Natural Language Instructions (`nl_instructions`): User-provided text describing the transformation goals, rules, and mappings.
  - Target Output File Names (`output_files_names`): Specifies the naming convention for the final output files.
- Iterative Generation Process:
  - Prompt Synthesis: The `Transformation Builder` constructs a detailed prompt by integrating all provided inputs (SDIF examples, output examples, NL instructions, file names).
  - Code Generation & Evaluation Loop (controlled by `max_iteration`, e.g., 10 attempts):
    - a. Agent Logic: The AI Agent generates a candidate `transformation_code` string.
    - b. Agent Analysis Tools: To inform its code generation, the Agent can use:
      - `execute_sql`: Directly query the `input_sdif_path` (example SDIF) to analyze its tables, schema, and content.
      - `represent_file`: Obtain a structured representation of the `output_example_files` to better understand the target format and data organization.
    - c. Code Execution: A `Code Executor` runs the generated `transformation_code` using the `input_sdif_path` as its data source.
    - d. Comparison & Feedback: A `Files Comparator` compares the `generated_files` (output from the trial execution) against the `example_files` (user-provided output examples).
    - e. Refinement: Based on the `are_equivalent?` status and the diff details reported by the comparator, the Agent refines the `transformation_code` in subsequent iterations.
- Output: A finalized `transformation_code` string, ready to be saved as an executable script.
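The generate, execute, compare, refine loop above can be sketched as a small driver, with the AI agent and the Code Executor plus Files Comparator passed in as callables. This is a hypothetical outline under stated assumptions: the loop stops early once the comparator reports equivalence, a crashing candidate counts as a failed attempt whose error becomes feedback, and the last candidate is returned even if `max_iteration` is exhausted. This page does not confirm any of these policies, and `build_transformation` and `Attempt` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """One BUILD iteration: the candidate code and what the comparator said."""
    code: str
    are_equivalent: bool
    diff_details: str


def build_transformation(
    generate: Callable[[Optional[Attempt]], str],
    execute_and_compare: Callable[[str], tuple[bool, str]],
    max_iteration: int = 10,
) -> tuple[str, list[Attempt]]:
    """Generate -> execute -> compare -> refine, for at most max_iteration rounds.

    generate: stand-in for the AI agent; receives the previous Attempt (None on
        the first round) so it can use the diff details as feedback.
    execute_and_compare: stand-in for the Code Executor plus Files Comparator;
        runs a candidate on the example SDIF and returns (are_equivalent, diff).
    """
    if max_iteration < 1:
        raise ValueError("max_iteration must be at least 1")
    history: list[Attempt] = []
    previous: Optional[Attempt] = None
    for _ in range(max_iteration):
        code = generate(previous)
        try:
            equivalent, diff = execute_and_compare(code)
        except Exception as exc:  # assumption: a crash is just another failed attempt
            equivalent, diff = False, f"execution error: {exc!r}"
        previous = Attempt(code, equivalent, diff)
        history.append(previous)
        if equivalent:  # assumption: stop as soon as outputs match the examples
            break
    return previous.code, history
```

Keeping the agent and the executor/comparator as injected callables makes the loop testable with stubs and keeps the retry policy separate from any particular model or file format.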
2. Transformation Execution (RUN Cycle)
This phase applies the validated transformation_code (generated during the BUILD cycle or provided by a user) to new, live standardized data.
- Key Component: `Transformation Execution Layer` (the runtime environment that executes the transformation script).
- Inputs:
  - `sdif_standardized` file (e.g., `invoices.sdif`): The output from the Standardization Layer for the actual data source being processed.
  - `transformation_code`: The executable script (e.g., Python code in `transformation.py`).
- Process: The `Transformation Execution Layer` invokes the `transformation_code`, which reads from the `sdif_standardized` input and applies the defined business logic (queries, data manipulation, restructuring, formatting).
- Output: The final transformed data, written to one or more output files in the specified target formats (e.g., `generated_output.xlsx`).
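A minimal sketch of the RUN step, assuming (hypothetically) that the script defines `transform(conn)` returning `{file_name: rows}` and that the SDIF file is SQLite. It opens the live SDIF read-only, invokes the code, and writes each result as CSV so the example stays within the standard library; XLSX, JSON, or XML writers would slot in at the same point. `run_transformation` is a hypothetical name, and plain `exec` provides no sandboxing, unlike the controlled environment the Code Executor is described as using.

```python
import csv
import sqlite3
from pathlib import Path


def run_transformation(transformation_code: str, sdif_path: str, output_dir: str) -> list[Path]:
    """Apply transformation code to a live SDIF file and write one CSV per output."""
    namespace: dict = {}
    exec(transformation_code, namespace)  # NOT sandboxed: only run trusted code
    transform = namespace.get("transform")
    if not callable(transform):
        raise TypeError("transformation_code must define a callable transform(conn)")

    # Open the live SDIF (assumed SQLite) read-only so the script cannot alter it.
    uri = f"{Path(sdif_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        outputs = transform(conn)  # assumed shape: {file_name: [row_dict, ...]}
    finally:
        conn.close()

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for file_name, rows in outputs.items():
        target = out_dir / file_name
        fieldnames = list(rows[0]) if rows else []
        with target.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```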
Inputs & Outputs (Summary)
- For Code Generation (`BUILD`):
  - Inputs: Example SDIF, example output files, NL instructions, target file names.
  - Output: Executable `transformation_code` (e.g., a Python script).
- For Execution (`RUN`):
  - Inputs: Live `sdif_standardized` file, `transformation_code` script.
  - Output: Final output files (e.g., `generated_output.xlsx`).
Components
- `Transformation Builder` (AI Agent): Generates transformation code.
- `MCP Server`: Serves effective prompts for the AI Agent.
- `Code Executor`: Runs the generated or user-provided transformation scripts in a controlled environment.
- `Files Comparator`: Validates generated outputs against examples during the BUILD cycle.
- `Code Transformer`: Executes the transformation logic.
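To make the Files Comparator's role concrete, here is a hypothetical CSV-only comparator that returns the two things the BUILD loop consumes: an equivalence flag and human-readable diff details. It compares cells as exact strings, so `10` and `10.0` count as different; a real comparator would likely normalize types, ordering, and formats, and would also handle XLSX, JSON, and XML. `compare_csv_files` is an illustrative name, not a SATIF API.

```python
import csv
from pathlib import Path


def _read_csv(path: str) -> list[list[str]]:
    """Load every row of a CSV file as a list of string cells."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


def compare_csv_files(generated: str, example: str) -> tuple[bool, str]:
    """Compare two CSV files cell by cell; return (are_equivalent, diff details)."""
    got, want = _read_csv(generated), _read_csv(example)
    problems: list[str] = []
    if got[:1] != want[:1]:
        problems.append(f"header mismatch: {got[:1]} vs {want[:1]}")
    if len(got) != len(want):
        problems.append(f"row count mismatch: {len(got)} vs {len(want)} (including header)")
    # Compare data rows pairwise; extra rows are already covered by the count check.
    for i, (g, w) in enumerate(zip(got[1:], want[1:]), start=1):
        if g != w:
            problems.append(f"row {i}: {g} vs {w}")
    return not problems, "\n".join(problems)
```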