Session Management

SparkSession

SparklessSession implementation for Sparkless.

This module provides a complete mock implementation of PySpark’s SparkSession that behaves identically to the real SparkSession for testing and development. It includes session management, DataFrame creation, SQL operations, and catalog management without requiring a JVM or actual Spark installation.

Key Features:

Complete PySpark SparkSession API compatibility
DataFrame creation from various data sources
SQL query parsing and execution
Catalog operations (databases, tables)
Configuration management
Session lifecycle management

Example

>>> from sparkless.sql import SparkSession
>>> spark = SparkSession("MyApp")
>>> data = [{"name": "Alice", "age": 25}]
>>> df = spark.createDataFrame(data)
>>> df.show()
DataFrame[1 rows, 2 columns]
age name
25    Alice
>>> spark.sql("CREATE DATABASE test")

SparkContext

SparklessContext implementation for Sparkless.

This module provides a mock implementation of PySpark’s SparkContext that behaves identically to the real SparkContext for testing and development. It includes context management, JVM simulation, and logging without requiring a JVM or actual Spark installation.

Key Features:

Complete PySpark SparkContext API compatibility
JVM context simulation
Log level management
Application name management
Context lifecycle management

Example

>>> from sparkless.session import SparkContext
>>> sc = SparkContext("MyApp")
>>> sc.setLogLevel("WARN")
>>> print(sc.appName)
MyApp

class sparkless.session.context.MockJVMFunctions[source]

Bases: object

Mock JVM functions for testing without an actual JVM.

__init__()[source]: Initialize mock JVM functions.

class sparkless.session.context.JVMContext[source]

Bases: object

Mock JVM context for testing without an actual JVM.

__init__()[source]: Initialize mock JVM context.

property available: bool: Check if JVM is available.

get(key)[source]

Get JVM property.

Parameters:: key (str)
Return type:: Any

set(key, value)[source]

Set JVM property.

Parameters:

key (str)
value (Any)

Return type:

None

class sparkless.session.context.SparkContext(app_name='SparklessApp')[source]

Bases: object

SparklessContext for testing without PySpark.

Provides a comprehensive mock implementation of PySpark’s SparkContext that supports all major operations, including context management, logging, and JVM simulation, without requiring an actual Spark installation.

app_name: Application name for the Spark context.

_jvm: JVM context for JVM operations.

Example

>>> sc = SparkContext("MyApp")
>>> sc.setLogLevel("WARN")
>>> print(sc.appName)
MyApp

__init__(app_name='SparklessApp')[source]

Initialize SparkContext.

Parameters:: app_name (str) – Name of the Spark application.

setLogLevel(level)[source]

Set log level.

Parameters:: level (str) – Log level (DEBUG, INFO, WARN, ERROR, FATAL).
Return type:: None

property appName: str

Get application name.

Returns:: Application name string.

property jvm: JVMContext

Get JVM context.

Returns:: JVM context instance.

stop()[source]

Stop the Spark context.

In a real Spark context, this would stop the Spark application. This is a mock implementation.

Return type:: None

sparkUser()[source]

Return the logical Spark user associated with this context.

Return type:: str

__enter__()[source]

Context manager entry.

Return type:: SparkContext

__exit__(exc_type, exc_val, exc_tb)[source]

Context manager exit.

Parameters:

exc_type (Any)
exc_val (Any)
exc_tb (Any)

Return type:

None
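
The `__enter__` and `__exit__` methods above let a context be used in a `with` block. The following is a minimal sketch using a hypothetical stand-in class (`StubContext` is not part of Sparkless). It assumes that `__exit__` calls `stop()`, which the reference above does not confirm.

```python
from typing import Any


class StubContext:
    """Hypothetical stand-in for SparkContext; not the Sparkless class."""

    def __init__(self, app_name: str = "SparklessApp") -> None:
        self.app_name = app_name
        self.stopped = False

    @property
    def appName(self) -> str:
        return self.app_name

    def stop(self) -> None:
        # A mock has no cluster to shut down; it only records the call.
        self.stopped = True

    def __enter__(self) -> "StubContext":
        return self

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        # Assumption: leaving the block stops the context.
        self.stop()


with StubContext("MyApp") as sc:
    name = sc.appName
```

Returning `None` from `__exit__` means exceptions raised inside the block still propagate, which matches the documented return type.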

Configuration

Configuration management for Sparkless.

This module provides configuration management for Sparkless, including session configuration, runtime settings, and environment-specific configurations.

Key Features:

Complete PySpark SparkConf API compatibility
Configuration validation and type checking
Environment-specific settings
Configuration builder pattern
Runtime configuration updates

Example

>>> from sparkless.session.config import Configuration
>>> conf = Configuration()
>>> conf.set("spark.app.name", "MyApp")
>>> conf.get("spark.app.name")
'MyApp'

class sparkless.session.config.configuration.Configuration[source]

Bases: object

SparklessConf for configuration management.

Provides a comprehensive mock implementation of PySpark’s SparkConf that supports all major operations, including configuration management, validation, and environment-specific settings, without requiring an actual Spark installation.

_config: Internal configuration dictionary.

Example

>>> conf = Configuration()
>>> conf.set("spark.app.name", "MyApp")
>>> conf.get("spark.app.name")
'MyApp'

__init__()[source]: Initialize Configuration with default settings.

get(key, default=None)[source]

Get configuration value.

Parameters:

key (str) – Configuration key.
default (Optional[str]) – Default value if key not found.

Return type:

Optional[str]

Returns:

Configuration value or default.

set(key, value)[source]

Set configuration value.

Parameters:

key (str) – Configuration key.
value (Any) – Configuration value.

Return type:

None

setAll(pairs)[source]

Set multiple configuration values.

Parameters:: pairs (Dict[str, Any]) – Dictionary of key-value pairs.
Return type:: None

setMaster(master)[source]

Set master URL.

Parameters:: master (str) – Master URL.
Return type:: None

setAppName(name)[source]

Set application name.

Parameters:: name (str) – Application name.
Return type:: None

getAll()[source]

Get all configuration values.

Return type:: Dict[str, str]
Returns:: Dictionary of all configuration values.

unset(key)[source]

Unset configuration value.

Parameters:: key (str) – Configuration key to unset.
Return type:: None

contains(key)[source]

Check if configuration contains key.

Parameters:: key (str) – Configuration key.
Return type:: bool
Returns:: True if key exists, False otherwise.

is_case_sensitive()[source]

Check if case-sensitive identifier resolution is enabled.

Return type:: bool
Returns:: True if case-sensitive mode is enabled, False otherwise. Defaults to False (case-insensitive) to match PySpark behavior.
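
To illustrate what case-insensitive identifier resolution means in practice, here is a rough sketch built on a plain dictionary. It assumes the flag is stored under Spark's `spark.sql.caseSensitive` key; whether Sparkless reads that key is not stated above. The helper `resolve_column` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def is_case_sensitive(config: Dict[str, Any]) -> bool:
    """Return True only when the flag is explicitly enabled."""
    # Assumption: the flag lives under Spark's "spark.sql.caseSensitive" key.
    raw = config.get("spark.sql.caseSensitive", "false")
    return str(raw).strip().lower() == "true"


def resolve_column(name: str, columns: List[str], case_sensitive: bool) -> Optional[str]:
    """Find a column by name, honoring the case-sensitivity mode."""
    for col in columns:
        if col == name:
            return col
        if not case_sensitive and col.lower() == name.lower():
            return col
    return None
```

With the default (case-insensitive) mode, a lookup for `NAME` resolves to a column called `name`; with the flag enabled, the same lookup finds nothing.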

__str__()[source]

String representation.

Return type:: str

__repr__()[source]

Representation.

Return type:: str

class sparkless.session.config.configuration.SparkConfig(validation_mode='relaxed', enable_type_coercion=True, enable_lazy_evaluation=True)[source]

Bases: object

High-level session configuration for validation and behavior flags.

This complements Configuration (SparkConf-like key/value) with strongly-typed knobs used by the mock engine.

validation_mode: Union[strict, relaxed] | minimal

enable_type_coercion: best-effort coercion during DataFrame creation

Parameters:

validation_mode (str)
enable_type_coercion (bool)
enable_lazy_evaluation (bool)

validation_mode: str = 'relaxed'

enable_type_coercion: bool = True

enable_lazy_evaluation: bool = True

__init__(validation_mode='relaxed', enable_type_coercion=True, enable_lazy_evaluation=True)

Parameters:

validation_mode (str)
enable_type_coercion (bool)
enable_lazy_evaluation (bool)
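
The documented fields and defaults can be pictured as a small dataclass. This is a sketch only: `StubSparkConfig` is a hypothetical stand-in, and rejecting unknown modes in `__post_init__` is an assumption that the reference above does not confirm.

```python
from dataclasses import dataclass

ALLOWED_MODES = ("strict", "relaxed", "minimal")


@dataclass
class StubSparkConfig:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    validation_mode: str = "relaxed"
    enable_type_coercion: bool = True
    enable_lazy_evaluation: bool = True

    def __post_init__(self) -> None:
        # Assumption: unknown modes are rejected; the source does not say so.
        if self.validation_mode not in ALLOWED_MODES:
            raise ValueError(f"unknown validation_mode: {self.validation_mode!r}")


default_cfg = StubSparkConfig()
strict_cfg = StubSparkConfig(validation_mode="strict", enable_type_coercion=False)
```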

class sparkless.session.config.configuration.ConfigBuilder[source]

Bases: object

Configuration builder for Sparkless.

Provides a builder pattern for creating Configuration instances, with a fluent API for setting multiple configuration values.

Example

>>> builder = ConfigBuilder()
>>> conf = (builder
...     .appName("MyApp")
...     .master("local[*]")
...     .set("spark.sql.adaptive.enabled", "true")
...     .build())

__init__()[source]: Initialize ConfigBuilder.

appName(name)[source]

Set application name.

Parameters:: name (str) – Application name.
Return type:: ConfigBuilder
Returns:: Self for method chaining.

master(master)[source]

Set master URL.

Parameters:: master (str) – Master URL.
Return type:: ConfigBuilder
Returns:: Self for method chaining.

set(key, value)[source]

Set configuration value.

Parameters:

key (str) – Configuration key.
value (Any) – Configuration value.

Return type:

ConfigBuilder

Returns:

Self for method chaining.

setAll(pairs)[source]

Set multiple configuration values.

Parameters:: pairs (Dict[str, Any]) – Dictionary of key-value pairs.
Return type:: ConfigBuilder
Returns:: Self for method chaining.

build()[source]

Build the configuration.

Return type:: Configuration
Returns:: Configuration instance.
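
The chaining behavior described above comes from each setter returning the builder itself. A minimal sketch with a hypothetical `StubBuilder` that produces a plain dictionary instead of a Configuration; the `spark.app.name` and `spark.master` keys are standard Spark property names, but their use inside Sparkless is assumed.

```python
from typing import Any, Dict


class StubBuilder:
    """Hypothetical stand-in for ConfigBuilder; builds a plain dict."""

    def __init__(self) -> None:
        self._pending: Dict[str, Any] = {}

    def appName(self, name: str) -> "StubBuilder":
        # Assumption: stored under Spark's standard "spark.app.name" key.
        return self.set("spark.app.name", name)

    def master(self, master: str) -> "StubBuilder":
        # Assumption: stored under Spark's standard "spark.master" key.
        return self.set("spark.master", master)

    def set(self, key: str, value: Any) -> "StubBuilder":
        self._pending[key] = value
        return self  # returning self is what enables chaining

    def build(self) -> Dict[str, Any]:
        # Return a copy so later builder calls do not mutate the result.
        return dict(self._pending)


conf = (
    StubBuilder()
    .appName("MyApp")
    .master("local[*]")
    .set("spark.sql.adaptive.enabled", "true")
    .build()
)
```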

Catalog

Mock Catalog implementation for Sparkless.

This module provides a mock implementation of PySpark’s Catalog that behaves identically to the real Catalog for testing and development. It includes database and table management, caching operations, and catalog queries without requiring a JVM or actual Spark installation.

Key Features:

Complete PySpark Catalog API compatibility
Database management (create, list, drop)
Table management (create, list, drop, cache)
Schema validation and error handling
Integration with storage manager

Example

>>> from sparkless.session import Catalog
>>> catalog = Catalog(storage_manager)
>>> catalog.createDatabase("test_db")
>>> catalog.listDatabases()
[Database(name='test_db')]

class sparkless.session.catalog.Database(name)[source]

Bases: object

Mock database object for catalog operations.

__init__(name)[source]

Initialize Database.

Parameters:: name (str) – Database name.

__str__()[source]

String representation.

Return type:: str

__repr__()[source]

Representation.

Return type:: str

class sparkless.session.catalog.Table(name, database='default')[source]

Bases: object

Mock table object for catalog operations.

__init__(name, database='default')[source]

Initialize Table.

Parameters:

name (str) – Table name.
database (str) – Database name.

__str__()[source]

String representation.

Return type:: str

__repr__()[source]

Representation.

Return type:: str

class sparkless.session.catalog.Catalog(storage, spark=None)[source]

Bases: object

Mock Catalog for Spark session.

Provides a comprehensive mock implementation of PySpark’s Catalog that supports all major operations, including database management, table operations, and caching, without requiring an actual Spark installation.

storage: Storage manager for data persistence.

spark: Optional SparkSession reference for SQL-based operations.

Example

>>> catalog = Catalog(storage_manager, spark_session)
>>> catalog.createDatabase("test_db")
>>> catalog.listDatabases()
[Database(name='test_db')]

__init__(storage, spark=None)[source]

Initialize Catalog.

Parameters:

storage (IStorageManager) – Storage manager instance.
spark (Optional[Any]) – Optional SparkSession instance for SQL-based operations. If provided, createDatabase() will use SQL instead of direct storage calls.

get_storage_backend()[source]

Get the storage backend instance.

Public accessor method for the storage backend, allowing access without breaking encapsulation.

Return type:: IStorageManager
Returns:: The storage manager instance.

listDatabases()[source]

List all databases.

Return type:: List[Database]
Returns:: List of Database objects.

setCurrentDatabase(dbName)[source]

Set current/active database.

Parameters:: dbName (str) – Database name to set as current.
Raises:: AnalysisException – If database does not exist.
Return type:: None

currentDatabase()[source]

Get current database name.

Return type:: str
Returns:: Current database name.

currentCatalog()[source]

Get current catalog name (Spark SQL compatibility).

Return type:: str
Returns:: Catalog identifier. Sparkless exposes a single catalog.

createDatabase(name, ignoreIfExists=True)[source]

Create a database.

This method uses SQL internally to match PySpark’s behavior, where database creation is done via SQL statements rather than direct API calls. However, to avoid infinite recursion when called from SQL execution, it checks if the database already exists first and uses direct storage calls when appropriate.

Parameters:

name (str) – Database name.
ignoreIfExists (bool) – Whether to ignore if database already exists.

Raises:

IllegalArgumentException – If name is not a string or is empty.
AnalysisException – If database already exists and ignoreIfExists is False.

Return type:

None
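
The control flow described above (check for existence first, then go through SQL or directly to storage) can be sketched as follows. Everything here is a hypothetical stand-in: `StubStorage`, `StubSession`, and `create_database` are not Sparkless names, and `ValueError` and `RuntimeError` stand in for IllegalArgumentException and AnalysisException.

```python
from typing import List, Optional, Set


class StubStorage:
    """Hypothetical in-memory storage holding database names."""

    def __init__(self) -> None:
        self.databases: Set[str] = {"default"}


class StubSession:
    """Hypothetical session whose sql() writes to storage directly."""

    def __init__(self, storage: StubStorage) -> None:
        self.storage = storage
        self.statements: List[str] = []

    def sql(self, statement: str) -> None:
        self.statements.append(statement)
        # Only handles the one statement shape used in this sketch.
        prefix = "CREATE DATABASE IF NOT EXISTS "
        if statement.startswith(prefix):
            self.storage.databases.add(statement[len(prefix):])


def create_database(
    storage: StubStorage,
    name: str,
    ignoreIfExists: bool = True,
    spark: Optional[StubSession] = None,
) -> None:
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    if name in storage.databases:
        # Existence check comes first, so no SQL is issued for known names.
        if ignoreIfExists:
            return
        raise RuntimeError(f"database already exists: {name}")
    if spark is not None:
        spark.sql(f"CREATE DATABASE IF NOT EXISTS {name}")
    else:
        storage.databases.add(name)


storage = StubStorage()
session = StubSession(storage)
create_database(storage, "test_db", spark=session)
create_database(storage, "test_db", spark=session)  # second call is a no-op
```

Because the SQL handler writes to storage rather than calling back into the catalog, the two layers cannot recurse into each other in this sketch.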

dropDatabase(name, ignoreIfNotExists=True, ignore_if_not_exists=None, cascade=False)[source]

Drop a database.

Parameters:

name (str) – Database name.
ignoreIfNotExists (bool) – Whether to ignore if database doesn’t exist (PySpark style).
ignore_if_not_exists (Optional[bool]) – Whether to ignore if database doesn’t exist (Python style).
cascade (bool) – Whether to drop tables in the database (ignored in mock).

Raises:

IllegalArgumentException – If name is not a string or is empty.
AnalysisException – If database doesn’t exist and ignoreIfNotExists is False.

Return type:

None
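
Since dropDatabase accepts both a camelCase and a snake_case spelling of the same flag, the two have to be merged into one value. A small sketch of one way to do that; the helper name is hypothetical, and giving the snake_case argument precedence when it is passed explicitly is an assumption.

```python
from typing import Optional


def resolve_ignore_flag(
    ignoreIfNotExists: bool = True,
    ignore_if_not_exists: Optional[bool] = None,
) -> bool:
    """Merge the camelCase and snake_case spellings into one flag."""
    # Assumption: an explicitly passed snake_case value takes precedence.
    if ignore_if_not_exists is not None:
        return ignore_if_not_exists
    return ignoreIfNotExists
```

Using `None` as the snake_case default is what makes it possible to tell "not passed" apart from an explicit `False`.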

tableExists(tableName, dbName=None)[source]

Check if table exists.

Parameters:

tableName (str) – Table name or qualified name (schema.table).
dbName (Optional[str]) – Optional database name. Uses current database if None.

Return type:

bool

Returns:

True if table exists, False otherwise.

Raises:

IllegalArgumentException – If names are not strings or are empty.
AnalysisException – If there’s an error checking table existence.
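
Several catalog methods accept either a plain table name or a qualified `schema.table` name. The sketch below shows one plausible way to normalize the two forms. The function `split_table_name` is hypothetical, `ValueError` stands in for IllegalArgumentException, and letting a qualified name override `dbName` is an assumption.

```python
from typing import Optional, Tuple


def split_table_name(
    tableName: str, dbName: Optional[str] = None, current_db: str = "default"
) -> Tuple[str, str]:
    """Return (database, table) for a plain or qualified table name."""
    if not isinstance(tableName, str) or not tableName:
        # Stand-in for IllegalArgumentException.
        raise ValueError("tableName must be a non-empty string")
    if "." in tableName:
        # Qualified form: the part before the first dot names the database.
        database, table = tableName.split(".", 1)
        return database, table
    # Plain form: use dbName if given, else the current database.
    return (dbName if dbName is not None else current_db), tableName
```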

listTables(dbName=None)[source]

List tables in database.

Parameters:

dbName (Optional[str]) – Optional database name. Uses current database if None.

Return type:

List[Table]

Returns:

List of Table objects.

Raises:

IllegalArgumentException – If dbName is not a string or is empty.
AnalysisException – If database doesn’t exist or there’s an error.

createTable(tableName, path, source='parquet', schema=None, **options)[source]

Create table.

Parameters:

tableName (str) – Table name.
path (str) – Path to data.
source (str) – Data source format.
schema (Optional[Any]) – Table schema.
**options (Any) – Additional options.

Return type:

None

dropTable(tableName)[source]

Drop table.

Parameters:

tableName (str) – Table name or qualified name (schema.table).

Raises:

IllegalArgumentException – If table name is invalid.
AnalysisException – If table doesn’t exist or can’t be dropped.

Return type:

None

isCached(tableName)[source]

Check if table is cached.

Parameters:: tableName (str) – Table name or qualified name (schema.table).
Return type:: bool
Returns:: True if table is cached, False otherwise.
Raises:: IllegalArgumentException – If table name is invalid.

cacheTable(tableName)[source]

Cache table.

Parameters:

tableName (str) – Table name or qualified name (schema.table).

Raises:

IllegalArgumentException – If table name is invalid.
AnalysisException – If table doesn’t exist.

Return type:

None

uncacheTable(tableName)[source]

Uncache table.

Parameters:: tableName (str) – Table name or qualified name (schema.table).
Raises:: IllegalArgumentException – If table name is invalid.
Return type:: None
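
In a mock, the three caching calls above can be backed by nothing more than a set of table names. A minimal sketch with a hypothetical `StubCache` class; it omits the name validation and existence checks the real methods document.

```python
from typing import Set


class StubCache:
    """Hypothetical cache registry keyed by table name."""

    def __init__(self) -> None:
        self._cached: Set[str] = set()

    def cacheTable(self, tableName: str) -> None:
        self._cached.add(tableName)

    def uncacheTable(self, tableName: str) -> None:
        # discard() makes uncaching an uncached table a no-op.
        self._cached.discard(tableName)

    def isCached(self, tableName: str) -> bool:
        return tableName in self._cached

    def clearCache(self) -> None:
        self._cached.clear()
```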

refreshTable(tableName)[source]

Refresh table.

Parameters:: tableName (str) – Table name.
Return type:: None

refreshByPath(path)[source]

Refresh by path.

Parameters:: path (str) – Path to refresh.
Return type:: None

recoverPartitions(tableName)[source]

Recover partitions.

Parameters:: tableName (str) – Table name.
Return type:: None

getDatabase(dbName)[source]

Get database information.

Parameters:

dbName (str) – Database name.

Return type:

Database

Returns:

Database object with database information.

Raises:

IllegalArgumentException – If database name is invalid.
AnalysisException – If database doesn’t exist.

Example

>>> db = catalog.getDatabase("test_db")
>>> print(db.name)
test_db

getTable(tableName=None, dbName=None, *, databaseName=None)[source]

Get table information.

Parameters:

tableName (Optional[str]) – Table name or qualified name (schema.table). When two positional arguments are passed, this may instead hold the database name (PySpark-style ordering).
dbName (Optional[str]) – Optional database name. When two positional arguments are passed, this may instead hold the table name.
databaseName (Optional[str]) – Optional keyword argument for database name (PySpark compatibility).

Return type:

Table

Returns:

Table object with table information.

Raises:

IllegalArgumentException – If table name is invalid.
AnalysisException – If table doesn’t exist.

Example

>>> table = catalog.getTable("users", "test_db")  # Standard: (tableName, dbName)
>>> table = catalog.getTable("test_db", "users")  # PySpark style: (dbName, tableName)
>>> table = catalog.getTable(tableName="users", databaseName="test_db")  # Keyword args
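
Accepting both positional orderings requires some way to tell which argument is the table. One possible approach, sketched below, is to try the standard order first and fall back to the swapped order. This is an assumption about the mechanism, not the documented implementation; `resolve_get_table_args` and `known_tables` are hypothetical, and `ValueError` and `LookupError` stand in for IllegalArgumentException and AnalysisException.

```python
from typing import Optional, Set, Tuple


def resolve_get_table_args(
    known_tables: Set[Tuple[str, str]],
    tableName: Optional[str] = None,
    dbName: Optional[str] = None,
    *,
    databaseName: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (database, table), trying both positional orderings."""
    if tableName is None:
        raise ValueError("a table name is required")
    if databaseName is not None:
        # Keyword form is unambiguous.
        return databaseName, tableName
    if dbName is None:
        return "default", tableName
    # Assumption: prefer (tableName, dbName), then try the swapped order.
    if (dbName, tableName) in known_tables:
        return dbName, tableName
    if (tableName, dbName) in known_tables:
        return tableName, dbName
    raise LookupError(f"table not found: {tableName!r}, {dbName!r}")


known = {("test_db", "users")}
standard = resolve_get_table_args(known, "users", "test_db")
swapped = resolve_get_table_args(known, "test_db", "users")
keyword = resolve_get_table_args(known, tableName="users", databaseName="test_db")
```

All three calls resolve to the same `(database, table)` pair, mirroring the three call styles in the example above.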

clearCache()[source]

Clear cache.

Return type:: None