Backend Architecture
Sparkless uses a modular backend architecture that allows different execution engines to be plugged in behind a common set of interfaces.
Backend Protocols
Protocol definitions for backend interfaces.
This module defines the protocols (interfaces) that backend implementations must satisfy. Using protocols enables dependency injection and makes modules testable independently.
- class sparkless.backend.protocols.QueryExecutor(*args, **kwargs)[source]
Bases:
Protocol
Protocol for executing queries on data.
This protocol defines the interface for query execution backends. Implementations can use different engines.
- __init__(*args, **kwargs)
- class sparkless.backend.protocols.DataMaterializer(*args, **kwargs)[source]
Bases:
Protocol
Protocol for materializing lazy DataFrame operations.
This protocol defines the interface for materializing queued operations on DataFrames. Implementations can use different execution engines.
Optional: a backend may implement materialize_from_plan(self, data, schema, logical_plan) to execute from a serialized logical plan. When spark.sparkless.useLogicalPlan is true or the backend is robin, the engine calls this method, if present, instead of materialize().
- can_handle_operation(op_name, op_payload)[source]
Check if this materializer can handle a specific operation.
- can_handle_operations(operations)[source]
Check if this materializer can handle a list of operations.
- __init__(*args, **kwargs)
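As a rough illustration of how a structural protocol of this shape can be satisfied and used, the sketch below defines a stand-in protocol in plain Python. MaterializerLike, LimitOnlyMaterializer, run, the materialize() signature, the (op_name, payload) tuple format, and the list-of-dicts plan format are all assumptions made for the example, not Sparkless's actual definitions.

```python
from typing import Any, Dict, List, Protocol, Tuple, runtime_checkable

Row = Dict[str, Any]
Operation = Tuple[str, Any]  # assumed shape: (op_name, op_payload)


@runtime_checkable
class MaterializerLike(Protocol):
    """Hypothetical stand-in for a materializer protocol."""

    def can_handle_operation(self, op_name: str, op_payload: Any) -> bool: ...

    def can_handle_operations(self, operations: List[Operation]) -> bool: ...

    def materialize(
        self, data: List[Row], schema: Any, operations: List[Operation]
    ) -> List[Row]: ...


class LimitOnlyMaterializer:
    """Toy backend that supports only 'limit'. It does not inherit from the
    protocol; it satisfies it structurally by having the right methods."""

    def can_handle_operation(self, op_name, op_payload):
        return op_name == "limit"

    def can_handle_operations(self, operations):
        return all(self.can_handle_operation(name, payload) for name, payload in operations)

    def materialize(self, data, schema, operations):
        rows = list(data)
        for _name, n in operations:
            rows = rows[:n]  # every operation here is a limit
        return rows

    def materialize_from_plan(self, data, schema, logical_plan):
        # Optional hook. The plan format (list of dicts) is an assumption.
        operations = [(step["op"], step["payload"]) for step in logical_plan]
        return self.materialize(data, schema, operations)


def run(backend: MaterializerLike, data, schema, operations, logical_plan=None):
    """Prefer the optional plan hook when it exists, else call materialize()."""
    plan_hook = getattr(backend, "materialize_from_plan", None)
    if logical_plan is not None and callable(plan_hook):
        return plan_hook(data, schema, logical_plan)
    return backend.materialize(data, schema, operations)
```

Because the check is structural, a test double with the same methods can be injected in place of a real backend, which is the testability benefit the module description refers to.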
- class sparkless.backend.protocols.ExportBackend(*args, **kwargs)[source]
Bases:
Protocol
Protocol for DataFrame export operations.
This protocol defines the interface for exporting DataFrames to different formats and systems. Backend implementations should provide methods for exporting to their specific target systems.
- __init__(*args, **kwargs)
Polars Backend
The default backend uses Polars for high-performance DataFrame operations.
- class sparkless.backend.polars.operation_executor.PolarsOperationExecutor(expression_translator)[source]
Bases:
object
Executes DataFrame operations using the Polars DataFrame API.
- __init__(expression_translator)[source]
Initialize operation executor.
- Parameters:
expression_translator (PolarsExpressionTranslator) – Polars expression translator instance
- apply_filter(df, condition)[source]
Apply a filter operation.
- Parameters:
df (DataFrame) – Source Polars DataFrame
condition (Any) – Filter condition (ColumnOperation or expression)
- Return type:
DataFrame
- Returns:
Filtered Polars DataFrame
- apply_with_column(df, column_name, expression, expected_field=None)[source]
Apply a withColumn operation.
- apply_join(df1, df2, on=None, how='inner', right_alias=None)[source]
Apply a join operation.
- Return type:
DataFrame
- Returns:
Joined DataFrame
- apply_union(df1, df2)[source]
Apply a union operation.
- Parameters:
df1 (DataFrame) – First DataFrame
df2 (DataFrame) – Second DataFrame
- Return type:
DataFrame
- Returns:
Unioned DataFrame
- apply_limit(df, n)[source]
Apply a limit operation.
- Parameters:
df (DataFrame) – Source Polars DataFrame
n (int) – Number of rows to return
- Return type:
DataFrame
- Returns:
Limited DataFrame
- apply_offset(df, n)[source]
Apply an offset operation (skip first n rows).
- Parameters:
df (DataFrame) – Source Polars DataFrame
n (int) – Number of rows to skip
- Return type:
DataFrame
- Returns:
DataFrame with first n rows skipped
- apply_distinct(df)[source]
Apply a distinct operation.
- Parameters:
df (DataFrame) – Source Polars DataFrame
- Return type:
DataFrame
- Returns:
DataFrame with distinct rows
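To make the row-level meaning of limit, offset, distinct, and union concrete, here is a plain-Python model over lists of dictionaries. It is only a sketch of the semantics described above and does not use Polars or reproduce the executor's implementation; the function names are hypothetical, and keeping first-seen order in distinct and keeping duplicates in union are choices of the sketch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def limit(rows: List[Row], n: int) -> List[Row]:
    """Return at most the first n rows."""
    return rows[:n]


def offset(rows: List[Row], n: int) -> List[Row]:
    """Skip the first n rows."""
    return rows[n:]


def distinct(rows: List[Row]) -> List[Row]:
    """Drop repeated rows, keeping the first occurrence.
    Assumes column values are hashable."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent row identity
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def union(first: List[Row], second: List[Row]) -> List[Row]:
    """Concatenate two row sets without removing duplicates."""
    return first + second
```

Composing the helpers mirrors chaining the executor methods: offset followed by limit selects a page of rows.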
Polars materializer for lazy DataFrame operations.
This module provides materialization of lazy DataFrame operations using Polars, replacing SQL-based materialization with Polars DataFrame operations.
- class sparkless.backend.polars.materializer.PolarsMaterializer[source]
Bases:
object
Materializes lazy operations using Polars.
Initialize Polars materializer.
- SUPPORTED_OPERATIONS = {'distinct', 'drop', 'filter', 'groupBy', 'join', 'limit', 'offset', 'orderBy', 'select', 'union', 'withColumn', 'withColumnRenamed'}
- UNSUPPORTED_OPERATIONS = {'e', 'months_between', 'pi'}
- can_handle_operation(op_name, op_payload)[source]
Check if this materializer can handle a specific operation.
- can_handle_operations(operations)[source]
Check if this materializer can handle a list of operations.
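One plausible way to implement these two checks is a set-membership test against SUPPORTED_OPERATIONS. The sketch below copies the documented set but is otherwise an assumption: the (op_name, op_payload) tuple format and the first_unsupported helper are hypothetical, and the real methods may also inspect the payload (for example against UNSUPPORTED_OPERATIONS), which is not modeled here.

```python
from typing import Any, Iterable, Optional, Tuple

# Copied from the documented class attribute.
SUPPORTED_OPERATIONS = {
    "distinct", "drop", "filter", "groupBy", "join", "limit", "offset",
    "orderBy", "select", "union", "withColumn", "withColumnRenamed",
}


def can_handle_operation(op_name: str, op_payload: Any = None) -> bool:
    """True when the operation name is in the supported set.
    The payload is ignored in this sketch."""
    return op_name in SUPPORTED_OPERATIONS


def can_handle_operations(operations: Iterable[Tuple[str, Any]]) -> bool:
    """True only when every queued operation is supported."""
    return all(can_handle_operation(name, payload) for name, payload in operations)


def first_unsupported(operations: Iterable[Tuple[str, Any]]) -> Optional[str]:
    """Hypothetical helper: name of the first unsupported operation, or None."""
    for name, payload in operations:
        if not can_handle_operation(name, payload):
            return name
    return None
```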
Expression translator for converting Column expressions to Polars expressions.
This module translates Sparkless column expressions (Column, ColumnOperation) to Polars expressions (pl.Expr) for DataFrame operations.
- class sparkless.backend.polars.expression_translator.PolarsExpressionTranslator[source]
Bases:
object
Translates Column expressions to Polars expressions.
- translate(expr, input_col_dtype=None, available_columns=None, case_sensitive=None, column_dtypes=None)[source]
Translate Column expression to Polars expression.
- Parameters:
expr (Any) – Column, ColumnOperation, or other expression
input_col_dtype (Any) – Optional Polars dtype of the input column (for to_timestamp optimization)
available_columns (Optional[List[str]]) – Optional list of available column names for case-insensitive matching
case_sensitive (Optional[bool]) – Optional case sensitivity flag. If None, the value is taken from the session.
- Return type:
Expr
- Returns:
Polars expression (pl.Expr)
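The available_columns and case_sensitive parameters suggest that column references are resolved against the known column names before translation. A minimal sketch of such a lookup follows; resolve_column is a hypothetical helper, and raising on an ambiguous match is an assumption, since the source does not say how the translator handles that case.

```python
from typing import List, Optional


def resolve_column(
    name: str,
    available_columns: Optional[List[str]] = None,
    case_sensitive: bool = False,
) -> str:
    """Map a referenced column name onto the actual column name."""
    if available_columns is None or case_sensitive:
        return name  # nothing to match against, or exact names required
    if name in available_columns:
        return name  # an exact match always wins
    lowered = name.lower()
    matches = [col for col in available_columns if col.lower() == lowered]
    if len(matches) > 1:
        raise ValueError(f"Ambiguous column reference {name!r}: {matches}")
    # Fall back to the original name when nothing matches.
    return matches[0] if matches else name
```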
Polars storage backend with Parquet file persistence.
This module provides a Polars-based storage implementation using Parquet files for persistence and in-memory DataFrames for active operations.
- class sparkless.backend.polars.storage.PolarsTable(name, schema, schema_name='default', db_path=None)[source]
Bases:
ITable
Table implementation using Polars DataFrame.
- __init__(name, schema, schema_name='default', db_path=None)[source]
Initialize Polars table.
- Parameters:
name (str) – Table name
schema (StructType) – StructType schema
schema_name (str) – Schema name (default: “default”)
db_path (Optional[str]) – Optional database path for persistence
- property schema: StructType
Get table schema.
- class sparkless.backend.polars.storage.PolarsSchema(name, db_path=None)[source]
Bases:
object
Schema implementation for Polars storage.
Initialize Polars schema.
- create_table(table, columns)[source]
Create a new table in this schema.
- Parameters:
table (str) – Table name
columns (Union[List[StructField], StructType]) – Table schema (StructType or list of StructField)
- Returns:
PolarsTable instance
- class sparkless.backend.polars.storage.PolarsStorageManager(db_path=None)[source]
Bases:
IStorageManager
Polars-based storage manager implementing the IStorageManager protocol.
Initialize Polars storage manager.
- Parameters:
db_path (Optional[str]) – Optional database path for persistent storage. If None, uses in-memory storage only.
- create_table(schema_name, table_name, fields)[source]
Create a new table.
- Parameters:
schema_name (str) – Schema name
table_name (str) – Table name
fields (Union[List[StructField], StructType]) – Table schema
- Returns:
PolarsTable instance
- get_table_schema(schema_name, table_name)[source]
Get table schema.
- Returns:
StructType schema
- query_table(schema_name, table_name, filter_expr=None)[source]
Query table with filter expression (optional method).
- Returns:
List of dictionaries representing rows
Note
Filter expressions are not supported at the storage level for Polars backend. Filtering should be done at the DataFrame level using DataFrame.filter() or by loading data into a DataFrame and applying filters there. This method returns all data from the table regardless of filter_expr parameter.
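Since the method hands back every row as a dictionary, any filtering has to happen after the call. The snippet below is a small sketch of that pattern in plain Python; all_rows is a made-up stand-in for the returned list, and filter_rows is a hypothetical helper rather than part of Sparkless.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which the predicate returns True."""
    return [row for row in rows if predicate(row)]


# Stand-in for the unfiltered list of row dictionaries the method returns.
all_rows = [{"id": 1, "age": 17}, {"id": 2, "age": 42}]

# Apply the condition on the caller's side instead of passing filter_expr.
adults = filter_rows(all_rows, lambda row: row["age"] >= 18)
```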
Storage Backends
Storage manager module.
This module provides a unified storage manager that can use different backends.
- class sparkless.storage.manager.TableMetadata(name, schema, created_at, table_schema, properties)[source]
Bases:
object
Metadata for a table in the catalog.
Initialize table metadata.
- class sparkless.storage.manager.StorageManagerFactory[source]
Bases:
object
Factory for creating storage managers.
- static create_memory_manager()[source]
Create a memory storage manager.
- Return type:
IStorageManager
- Returns:
Memory storage manager instance.
- class sparkless.storage.manager.UnifiedStorageManager(backend)[source]
Bases:
IStorageManager
Unified storage manager that can switch between backends.
- __init__(backend)[source]
Initialize unified storage manager.
- Parameters:
backend (IStorageManager) – Storage backend to use.
- get_current_schema()[source]
Return the current schema used for unqualified table references.
- set_current_schema(schema_name)[source]
Set the current schema used for unqualified table references.
- create_table(schema, table, columns)[source]
Create a new table.
- Parameters:
schema (str) – Name of the schema.
table (str) – Name of the table.
columns (Union[List[StructField], StructType]) – Table columns definition.
- get_table_schema(schema_name, table_name)[source]
Get table schema.
- Return type:
Union[Any, StructType]
- Returns:
Table schema.
- get_table_metadata(schema_name, table_name)[source]
Get table metadata including Delta-specific fields.
- switch_backend(backend)[source]
Switch to a different storage backend.
- Parameters:
backend (IStorageManager) – New storage backend to use.
- save_table_metadata(qualified_name, metadata)[source]
Store table metadata.
- Parameters:
qualified_name (str) – Qualified table name (schema.table).
metadata (TableMetadata) – Table metadata to store.
- get_table_metadata_by_name(qualified_name)[source]
Retrieve table metadata by qualified name.
- Parameters:
qualified_name (str) – Qualified table name (schema.table).
- Returns:
Table metadata or None if not found.
- list_table_metadata(schema)[source]
List all table metadata for a schema.
- Parameters:
schema (str) – Schema name.
- Returns:
List of table metadata objects.
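The unified manager's shape (a wrapped backend that can be replaced, plus metadata keyed by a qualified schema.table name) can be sketched with a simple delegation pattern. DictBackend and UnifiedManagerSketch are hypothetical classes written for this example; in particular, the source does not say whether tables or metadata carry over when the backend is switched, so the behavior shown here is an assumption.

```python
from typing import Any, Dict, List, Optional


class DictBackend:
    """Hypothetical minimal backend: schemas and tables in nested dicts."""

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[str, List[str]]] = {}

    def create_table(self, schema: str, table: str, columns: List[str]) -> None:
        self.tables.setdefault(schema, {})[table] = list(columns)

    def get_table_schema(self, schema_name: str, table_name: str) -> List[str]:
        return self.tables[schema_name][table_name]


class UnifiedManagerSketch:
    """Forwards table calls to the current backend; keeps metadata itself."""

    def __init__(self, backend: DictBackend) -> None:
        self.backend = backend
        self._current_schema = "default"
        self._metadata: Dict[str, Any] = {}  # keyed by "schema.table"

    def get_current_schema(self) -> str:
        return self._current_schema

    def set_current_schema(self, schema_name: str) -> None:
        self._current_schema = schema_name

    def create_table(self, schema: str, table: str, columns: List[str]) -> None:
        self.backend.create_table(schema, table, columns)

    def get_table_schema(self, schema_name: str, table_name: str) -> List[str]:
        return self.backend.get_table_schema(schema_name, table_name)

    def switch_backend(self, backend: DictBackend) -> None:
        # In this sketch existing tables are not copied to the new backend.
        self.backend = backend

    def save_table_metadata(self, qualified_name: str, metadata: Any) -> None:
        self._metadata[qualified_name] = metadata

    def get_table_metadata_by_name(self, qualified_name: str) -> Optional[Any]:
        return self._metadata.get(qualified_name)

    def list_table_metadata(self, schema: str) -> List[Any]:
        prefix = schema + "."
        return [meta for name, meta in self._metadata.items() if name.startswith(prefix)]
```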
Memory storage backend.
This module provides an in-memory storage implementation.
- class sparkless.storage.backends.memory.MemoryTable(name, schema)[source]
Bases:
ITable
In-memory table implementation.
- __init__(name, schema)[source]
Initialize memory table.
- Parameters:
name (str) – Table name.
schema (StructType) – Table schema.
- property schema: StructType
Get table schema.
- class sparkless.storage.backends.memory.MemorySchema(name)[source]
Bases:
object
In-memory database schema (namespace) implementation.
Initialize memory schema.
- Parameters:
name (str) – Schema name.
- create_table(table, columns)[source]
Create a new table in this schema.
- Parameters:
table (str) – Name of the table.
columns (Union[List[StructField], StructType]) – Table columns definition.
- class sparkless.storage.backends.memory.MemoryStorageManager[source]
Bases:
IStorageManager
In-memory storage manager implementation.
Initialize memory storage manager.
- get_table_schema(schema_name, table_name)[source]
Get table schema.
- Returns:
Table schema.
- get_table_metadata(schema_name, table_name)[source]
Get table metadata including Delta-specific fields.
File-based storage backend.
This module provides a file-based storage implementation using JSON files.
- class sparkless.storage.backends.file.FileTable(name, schema, file_path)[source]
Bases:
ITable
File-based table implementation.
- __init__(name, schema, file_path)[source]
Initialize file table.
- Parameters:
name (str) – Table name.
schema (StructType) – Table schema.
file_path (str) – Path to table data file.
- property schema: StructType
Get table schema.
- class sparkless.storage.backends.file.FileSchema(name, base_path)[source]
Bases:
ISchema
File-based schema implementation.
Initialize file schema.
- create_table(table, columns)[source]
Create a new table in this schema.
- Parameters:
table (str) – Name of the table.
columns (Union[List[StructField], StructType]) – Table columns definition.
- class sparkless.storage.backends.file.FileStorageManager(base_path='sparkless_storage')[source]
Bases:
IStorageManager
File-based storage manager implementation.
- __init__(base_path='sparkless_storage')[source]
Initialize file storage manager.
- Parameters:
base_path (str) – Base path for storage files.
- create_table(schema_name, table_name, fields)[source]
Create a new table.
- Parameters:
schema_name (str) – Name of the schema.
table_name (str) – Name of the table.
fields (Union[List[StructField], StructType]) – Table fields definition.
- get_table_schema(schema_name, table_name)[source]
Get table schema.
- Returns:
Table schema.
- get_table_metadata(schema_name, table_name)[source]
Get table metadata including Delta-specific fields.
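For a sense of what JSON-file persistence under a base path can look like, the sketch below writes and reads a table's rows with the standard library. The directory layout (base_path/schema/table.json) and the helper names table_path, write_rows, read_rows, and roundtrip_demo are assumptions made for illustration; the source only states that the file backend stores data in JSON files.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

Row = Dict[str, Any]


def table_path(base_path: str, schema_name: str, table_name: str) -> Path:
    """Assumed layout: one JSON file per table, grouped by schema directory."""
    return Path(base_path) / schema_name / f"{table_name}.json"


def write_rows(path: Path, rows: List[Row]) -> None:
    """Create parent directories if needed and overwrite the table file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(rows), encoding="utf-8")


def read_rows(path: Path) -> List[Row]:
    """Return the stored rows, or an empty list when the file is missing."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def roundtrip_demo() -> List[Row]:
    """Write one row to a temporary storage directory and read it back."""
    with tempfile.TemporaryDirectory() as base_path:
        path = table_path(base_path, "default", "users")
        write_rows(path, [{"id": 1, "name": "a"}])
        return read_rows(path)
```

JSON keeps the files human-readable, at the cost of limiting column values to types that JSON can represent.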