
Transformation

The Transformation Layer is the second stage in the SATIF pipeline. Its core responsibility is to apply business-specific logic to the standardized data, held in an SDIF file generated by the Standardization Layer, and to produce the final output files in their required formats (e.g., XLSX, CSV, JSON, XML).

This layer can execute transformation logic that is either:

  • AI-Generated: Developed by a Transformation Builder based on examples and natural language.
  • User-Provided: Manually written by developers, conforming to SATIF's Transformer interface.

Dual Nature: Code Generation & Execution

Transformation within SATIF is a two-fold process, aligning with the BUILD and RUN cycles:

  1. Transformation Code Generation (BUILD Cycle): An AI agent generates executable code that defines the transformation rules.
  2. Transformation Execution (RUN Cycle): The application of this generated (or user-provided) code to actual standardized data.

1. Transformation Code Generation (BUILD Cycle)

This phase focuses on automatically generating an executable script (e.g., a Python file like transformation.py) that encapsulates the necessary data manipulation logic.

  • Key Component: Transformation Builder (also known as the Transformation Code Builder, or TCB), an AI-powered agent.

  • Primary Inputs for the Builder:

    • Input Data Examples: Standardized input examples, typically an input.sdif file, often accompanied by its input_schema and representative input_sample data to guide the AI.
    • Output Data Examples: Samples of the desired output files (output_example_files). These are internally converted to output_example.sdif, output_repr (textual/structural representation), output_schema, and output_sample for the AI to analyze and understand the target structure and content.
    • Natural Language Instructions (nl_instructions): User-provided text describing the transformation goals, rules, and mappings.
    • Target Output File Names (output_files_names): Specifies the naming convention for the final output files.
  • Iterative Generation Process:

    1. Prompt Synthesis: The Transformation Builder constructs a detailed Prompt by integrating all provided inputs (SDIF examples, output examples, NL instructions, file names).
    2. Code Generation & Evaluation Loop (controlled by max_iteration, e.g., 10 attempts):
      a. Agent Logic: The AI Agent generates a candidate transformation_code string.
      b. Agent Analysis Tools: To inform its code generation, the Agent can call:
        • execute_sql: Directly query the input_sdif_path (example SDIF) to analyze its tables, schema, and content.
        • represent_file: Obtain a structured representation of the output_example_files to better understand the target format and data organization.
      c. Code Execution: A Code Executor runs the generated transformation_code using the input_sdif_path as its data source.
      d. Comparison & Feedback: A Files Comparator compares the generated_files (output from the trial execution) against the example_files (user-provided output examples).
      e. Refinement: Based on the are_equivalent status and the diff details reported by the comparator, the Agent refines the transformation_code in subsequent iterations.
  • Output: The final transformation_code (string) produced by the loop, ready to be saved as an executable script.
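
The generate, execute, compare, refine loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SATIF's implementation: build, execute_code, compare_files, and fake_agent are hypothetical names, files are modeled as a plain dict of file name to text, and a scripted stub stands in for the AI Agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Files = Dict[str, str]  # file name -> text content (a simplification for this sketch)


@dataclass
class Comparison:
    are_equivalent: bool
    diff_details: str


def compare_files(generated: Files, expected: Files) -> Comparison:
    """Hypothetical stand-in for the Files Comparator."""
    problems = []
    for name, want in expected.items():
        got = generated.get(name)
        if got is None:
            problems.append(f"missing file: {name}")
        elif got != want:
            problems.append(f"{name}: expected {want!r}, got {got!r}")
    for name in generated.keys() - expected.keys():
        problems.append(f"unexpected file: {name}")
    return Comparison(not problems, "; ".join(problems))


def execute_code(transformation_code: str, rows: List[int]) -> Files:
    """Hypothetical stand-in for the Code Executor.

    Assumes the code defines transform(rows) -> Files. Plain exec() offers
    no isolation; a real executor would run in a controlled environment.
    """
    namespace: dict = {}
    exec(transformation_code, namespace)
    return namespace["transform"](rows)


def build(
    generate: Callable[[Optional[str]], str],
    rows: List[int],
    example_files: Files,
    max_iteration: int = 10,
) -> Tuple[Optional[str], int]:
    """Return (code, attempts) on success, or (None, max_iteration) on failure."""
    feedback: Optional[str] = None
    for attempt in range(1, max_iteration + 1):
        code = generate(feedback)  # agent proposes a candidate
        try:
            result = compare_files(execute_code(code, rows), example_files)
        except Exception as exc:  # execution errors also become feedback
            result = Comparison(False, f"execution error: {exc!r}")
        if result.are_equivalent:
            return code, attempt
        feedback = result.diff_details  # drives the next refinement
    return None, max_iteration


def fake_agent(feedback: Optional[str]) -> str:
    """Scripted stub for the AI Agent: wrong on the first try, right after feedback."""
    if feedback is None:
        return 'def transform(rows):\n    return {"out.csv": "total,0"}'
    return 'def transform(rows):\n    return {"out.csv": "total," + str(sum(rows))}'


code, attempts = build(fake_agent, [1, 2, 3], {"out.csv": "total,6"})
print(attempts)
```

With the scripted stub, the first candidate fails the comparison and the second one passes, so the loop stops after two attempts. If no candidate matches within max_iteration, the sketch returns None rather than an unverified script; how SATIF handles that case is not stated here.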

2. Transformation Execution (RUN Cycle)

This phase applies the validated transformation_code (generated during the BUILD cycle or provided by a user) to new, live standardized data.

  • Key Component: Transformation Execution Layer (the runtime environment that executes the transformation script).

  • Inputs:

    • sdif_standardized file (e.g., invoices.sdif): The output from the Standardization Layer for the actual datasource being processed.
    • transformation_code: The executable script (e.g., Python code in transformation.py).
  • Process: The Transformation Execution Layer invokes the transformation_code, which reads from the sdif_standardized input and applies the defined business logic (queries, data manipulation, restructuring, formatting).

  • Output: The final transformed data, written to one or more output files in the specified target formats (e.g., generated_output.xlsx).
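
As a rough illustration of the RUN cycle, the sketch below shows what a small transformation script might look like. It assumes the SDIF file can be opened as a SQLite database, and it invents an invoices table with customer and amount columns purely for the demo. The transform function signature is also an assumption, not SATIF's Transformer interface.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def transform(sdif_path: Path, output_dir: Path) -> Path:
    """Hypothetical business logic: total invoice amount per customer, as CSV."""
    conn = sqlite3.connect(sdif_path)
    try:
        rows = conn.execute(
            "SELECT customer, SUM(amount) FROM invoices "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()

    out_path = output_dir / "generated_output.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["customer", "total_amount"])
        writer.writerows(rows)
    return out_path


def make_demo_sdif(path: Path) -> None:
    """Create a tiny SQLite file standing in for invoices.sdif (demo data only)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO invoices VALUES (?, ?)",
            [("acme", 10.0), ("acme", 5.5), ("globex", 7.0)],
        )
        conn.commit()
    finally:
        conn.close()


def run_demo() -> List[List[str]]:
    """Build the demo input, run the transformation, and read the output back."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        sdif = tmp_dir / "invoices.sdif"
        make_demo_sdif(sdif)
        out = transform(sdif, tmp_dir)
        with out.open(newline="", encoding="utf-8") as f:
            return list(csv.reader(f))


print(run_demo())
```

The script only reads from the standardized input and writes new files, which keeps the SDIF file reusable across runs and across different transformations.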

Inputs & Outputs (Summary)

  • For Code Generation (BUILD):
    • Inputs: Example SDIF, example output files, NL instructions, target file names.
    • Output: Executable transformation_code (e.g., a Python script).
  • For Execution (RUN):
    • Inputs: Live sdif_standardized file, transformation_code script.
    • Output: Final output files (e.g., generated_output.xlsx).

Components

  • Transformation Builder (AI Agent): Generates transformation code.
  • MCP Server: Serves the prompts used by the AI Agent.
  • Code Executor: Runs the generated or user-provided transformation scripts in a controlled environment.
  • Files Comparator: Validates generated outputs against examples during the BUILD cycle.
  • Code Transformer: Executes the transformation logic.
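
To make the Code Executor's role more concrete, here is one possible way to run a transformation script in a separate process with a timeout and collect the files it writes. The function name run_script and its behavior are assumptions for illustration. A subprocess with a timeout gives process isolation only, not a security sandbox, and nothing here describes how SATIF's executor is actually built.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict


def run_script(transformation_code: str, timeout_s: float = 10.0) -> Dict[str, str]:
    """Run the code as transformation.py in a scratch directory.

    Returns {file name: text content} for every file the script created.
    Raises RuntimeError on a non-zero exit code and
    subprocess.TimeoutExpired if the script runs longer than timeout_s.
    """
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        script = work_dir / "transformation.py"
        script.write_text(transformation_code, encoding="utf-8")

        completed = subprocess.run(
            [sys.executable, str(script)],
            cwd=work_dir,          # relative output paths land in the scratch dir
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        if completed.returncode != 0:
            raise RuntimeError(f"script failed: {completed.stderr.strip()}")

        # Collect outputs before the scratch directory is deleted.
        return {
            p.name: p.read_text(encoding="utf-8")
            for p in sorted(work_dir.iterdir())
            if p.is_file() and p != script
        }


DEMO_CODE = '''
from pathlib import Path
Path("generated_output.json").write_text('{"ok": true}', encoding="utf-8")
'''

outputs = run_script(DEMO_CODE)
print(outputs)
```

Returning the generated files as a mapping makes it straightforward to hand them to a comparator during the BUILD cycle, or to copy them to their final location during the RUN cycle.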