Remote Standardizer
🚧 WORK IN PROGRESS: The RemoteStandardizer is currently under development and the cloud API service is not yet available. This documentation serves as a reference for the intended functionality.
The RemoteStandardizer
is an asynchronous component designed to interact with a remote Satif-compliant standardization API. It enables offloading standardization tasks to a cloud service, providing benefits such as avoiding local AI dependencies and running standardization as background tasks with progress tracking.
Benefits​
- Reduced Dependencies: Avoid installing heavy AI dependencies locally
- Background Processing: Run standardization as asynchronous background tasks
- Progress Tracking: Monitor standardization progress in real-time via Server-Sent Events
- Scalability: Leverage cloud infrastructure for processing large datasets
- Run History: Track and manage previous standardization runs
How It Works​
The RemoteStandardizer follows a workflow pattern:
- File Upload: Packages input files (compressing multiple files into ZIP archives)
- Run Creation: Initiates a standardization run on the remote API
- Progress Monitoring: Tracks progress via Server-Sent Events (SSE) streaming
- Result Download: Retrieves the resulting SDIF file once processing completes
Configuration​
The standardizer requires configuration of the remote API endpoints:
from satif_sdk.standardizers import RemoteStandardizer
# Using environment variables (recommended)
standardizer = RemoteStandardizer()
# Or explicit configuration
standardizer = RemoteStandardizer(
base_url="https://api.satif.example.com",
api_key="your-api-key",
timeout=600.0
)
Environment Variables​
SATIF_REMOTE_BASE_URL
: Base URL of the remote APISATIF_REMOTE_API_KEY
: API key for authentication
Custom HTTP Client​
For advanced configurations, provide a custom httpx.AsyncClient
:
import httpx
from satif_sdk.standardizers import RemoteStandardizer
custom_client = httpx.AsyncClient(
timeout=httpx.Timeout(900.0),
limits=httpx.Limits(max_connections=10)
)
standardizer = RemoteStandardizer(client=custom_client)
Basic Usage​
import asyncio
from satif_sdk.standardizers import RemoteStandardizer
async def main():
standardizer = RemoteStandardizer()
result = await standardizer.standardize(
datasource="data.csv",
output_path="output.sdif",
options={"custom_config": "value"},
log_sse_events=True
)
print(f"Standardization completed: {result.output_path}")
asyncio.run(main())
Parameters​
standardize()
Method​
datasource
: Path or list of paths to input filesoutput_path
: Path where the resulting SDIF file will be savedoptions
: Optional processing configuration sent to the remote APIlog_sse_events
: Whether to log Server-Sent Events messages (default:False
)overwrite
: Whether to overwrite existing output files (default:False
)
File Handling​
Single Files​
Individual files are uploaded directly with their original filename and MIME type.
Multiple Files​
Multiple input files are automatically compressed into a ZIP archive before upload, reducing transfer time and simplifying remote processing.
ZIP Archives​
ZIP files in the datasource are handled as single entities and processed by the remote service.
Error Handling​
The standardizer provides comprehensive error handling for:
- Network Issues: HTTP timeouts, connection failures
- Authentication: Invalid API keys or authorization failures
- File Operations: Missing files, permission issues, disk space
- Remote Processing: Failed standardization runs, invalid responses
- Download Issues: Corrupted or incomplete result files
Common exceptions:
ValueError
: Invalid datasource or configurationFileNotFoundError
: Input files don't existFileExistsError
: Output exists and overwrite is FalseRuntimeError
: Remote API or processing failureshttpx.HTTPStatusError
: HTTP-level errors
Current Status​
The RemoteStandardizer is not yet production-ready:
- Cloud API service infrastructure is under development
- API endpoints and authentication methods may change
- Documentation reflects intended functionality
For immediate standardization needs, use the local standardizers:
AIStandardizer
for AI-powered processing