Session Management
SparkSession
SparklessSession implementation for Sparkless.
This module provides a complete mock implementation of PySpark’s SparkSession that behaves identically to the real SparkSession for testing and development. It includes session management, DataFrame creation, SQL operations, and catalog management without requiring a JVM or actual Spark installation.
- Key Features:
Complete PySpark SparkSession API compatibility
DataFrame creation from various data sources
SQL query parsing and execution
Catalog operations (databases, tables)
Configuration management
Session lifecycle management
Example
>>> from sparkless.sql import SparkSession
>>> spark = SparkSession("MyApp")
>>> data = [{"name": "Alice", "age": 25}]
>>> df = spark.createDataFrame(data)
>>> df.show()
DataFrame[1 rows, 2 columns]
age name
25 Alice
>>> spark.sql("CREATE DATABASE test")
SparkContext
SparklessContext implementation for Sparkless.
This module provides a mock implementation of PySpark’s SparkContext that behaves identically to the real SparkContext for testing and development. It includes context management, JVM simulation, and logging without requiring a JVM or actual Spark installation.
- Key Features:
Complete PySpark SparkContext API compatibility
JVM context simulation
Log level management
Application name management
Context lifecycle management
Example
>>> from sparkless.session import SparkContext
>>> sc = SparkContext("MyApp")
>>> sc.setLogLevel("WARN")
>>> print(sc.appName)
MyApp
- class sparkless.session.context.MockJVMFunctions[source]
Bases: object
Mock JVM functions for testing without actual JVM.
Initialize mock JVM functions.
- class sparkless.session.context.JVMContext[source]
Bases: object
Mock JVM context for testing without actual JVM.
Initialize mock JVM context.
- class sparkless.session.context.SparkContext(app_name='SparklessApp')[source]
Bases: object
SparklessContext for testing without PySpark.
Provides a mock implementation of PySpark’s SparkContext that supports the major operations, including context management, logging, and JVM simulation, without requiring an actual Spark installation.
- app_name
Application name for the Spark context.
- _jvm
JVM context for JVM operations.
Example
>>> sc = SparkContext("MyApp")
>>> sc.setLogLevel("WARN")
>>> print(sc.appName)
MyApp
Initialize SparkContext.
- Parameters:
app_name (str) – Name of the Spark application.
- __init__(app_name='SparklessApp')[source]
Initialize SparkContext.
- Parameters:
app_name (str) – Name of the Spark application.
- property jvm: JVMContext
Get JVM context.
- Returns:
JVM context instance.
- stop()[source]
Stop the Spark context.
In a real Spark context, this would stop the Spark application. This is a mock implementation.
- Return type: None
Configuration
Configuration management for Sparkless.
This module provides configuration management for Sparkless, including session configuration, runtime settings, and environment-specific configurations.
- Key Features:
Complete PySpark SparkConf API compatibility
Configuration validation and type checking
Environment-specific settings
Configuration builder pattern
Runtime configuration updates
Example
>>> from sparkless.session.config import Configuration
>>> conf = Configuration()
>>> conf.set("spark.app.name", "MyApp")
>>> conf.get("spark.app.name")
'MyApp'
- class sparkless.session.config.configuration.Configuration[source]
Bases: object
SparklessConf for configuration management.
Provides a mock implementation of PySpark’s SparkConf that supports the major operations, including configuration management, validation, and environment-specific settings, without requiring an actual Spark installation.
- _config
Internal configuration dictionary.
Example
>>> conf = Configuration()
>>> conf.set("spark.app.name", "MyApp")
>>> conf.get("spark.app.name")
'MyApp'
Initialize Configuration with default settings.
- class sparkless.session.config.configuration.SparkConfig(validation_mode='relaxed', enable_type_coercion=True, enable_lazy_evaluation=True)[source]
Bases: object
High-level session configuration for validation and behavior flags.
This complements Configuration (SparkConf-like key/value) with strongly-typed knobs used by the mock engine.
- validation_mode
One of "strict", "relaxed", or "minimal".
- enable_type_coercion
best-effort coercion during DataFrame creation
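As a rough illustration of how typed flags like these could drive behavior, the sketch below models them as a frozen dataclass. `SketchSparkConfig` and `coerce` are hypothetical names that are not part of Sparkless. Validating `validation_mode` at construction time and the exact coercion rule are assumptions, not documented behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Assumption: these are the three accepted validation modes.
_ALLOWED_MODES = ("strict", "relaxed", "minimal")


@dataclass(frozen=True)
class SketchSparkConfig:
    """Hypothetical stand-in for SparkConfig; not the Sparkless class."""

    validation_mode: str = "relaxed"
    enable_type_coercion: bool = True
    enable_lazy_evaluation: bool = True

    def __post_init__(self) -> None:
        # Reject unknown modes early (an assumption about desired behavior).
        if self.validation_mode not in _ALLOWED_MODES:
            raise ValueError(f"Unknown validation_mode: {self.validation_mode!r}")


def coerce(value: Any, target: Callable[[Any], Any], config: SketchSparkConfig) -> Any:
    """Best-effort coercion: convert when possible, otherwise keep the value."""
    if not config.enable_type_coercion:
        return value
    try:
        return target(value)
    except (TypeError, ValueError):
        return value


cfg = SketchSparkConfig()
print(coerce("25", int, cfg))   # 25
print(coerce("abc", int, cfg))  # abc
```

Keeping these flags in a typed object separate from the string-keyed Configuration means a typo in a flag name fails at construction rather than being silently stored.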
- class sparkless.session.config.configuration.ConfigBuilder[source]
Bases: object
Configuration builder for Sparkless.
Provides a builder pattern for creating Configuration instances, with a fluent API for setting multiple configuration values.
Example
>>> builder = ConfigBuilder()
>>> conf = (builder
...     .appName("MyApp")
...     .master("local[*]")
...     .set("spark.sql.adaptive.enabled", "true")
...     .build())
Initialize ConfigBuilder.
- appName(name)[source]
Set application name.
- Parameters:
name (str) – Application name.
- Return type:
- Returns:
Self for method chaining.
- master(master)[source]
Set master URL.
- Parameters:
master (str) – Master URL.
- Return type:
- Returns:
Self for method chaining.
- set(key, value)[source]
Set configuration value.
- Parameters:
- Return type:
- Returns:
Self for method chaining.
- setAll(pairs)[source]
Set multiple configuration values.
- Parameters:
- Return type:
- Returns:
Self for method chaining.
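The "Self for method chaining" contract of the builder methods above can be sketched with a small stand-in class. `SketchConfigBuilder` is hypothetical and is not the Sparkless implementation. Storing the application name under `spark.app.name` and the master URL under `spark.master` is an assumption, and `build()` returns a plain dict here where the real builder produces a Configuration.

```python
from typing import Dict, Iterable, Tuple


class SketchConfigBuilder:
    """Hypothetical stand-in illustrating a fluent builder; not Sparkless code."""

    def __init__(self) -> None:
        self._settings: Dict[str, str] = {}

    def appName(self, name: str) -> "SketchConfigBuilder":
        # Assumption: the application name lives under "spark.app.name".
        return self.set("spark.app.name", name)

    def master(self, master: str) -> "SketchConfigBuilder":
        # Assumption: the master URL lives under "spark.master".
        return self.set("spark.master", master)

    def set(self, key: str, value: str) -> "SketchConfigBuilder":
        self._settings[key] = value
        return self  # returning self is what makes chaining possible

    def setAll(self, pairs: Iterable[Tuple[str, str]]) -> "SketchConfigBuilder":
        for key, value in pairs:
            self.set(key, value)
        return self

    def build(self) -> Dict[str, str]:
        # A dict copy stands in for the Configuration instance.
        return dict(self._settings)


conf = (
    SketchConfigBuilder()
    .appName("MyApp")
    .master("local[*]")
    .setAll([("spark.sql.adaptive.enabled", "true")])
    .build()
)
print(conf["spark.app.name"])  # MyApp
```

Because every setter returns the builder itself, calls can be composed in any order before `build()` is invoked.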
Catalog
Mock Catalog implementation for Sparkless.
This module provides a mock implementation of PySpark’s Catalog that behaves identically to the real Catalog for testing and development. It includes database and table management, caching operations, and catalog queries without requiring a JVM or actual Spark installation.
- Key Features:
Complete PySpark Catalog API compatibility
Database management (create, list, drop)
Table management (create, list, drop, cache)
Schema validation and error handling
Integration with storage manager
Example
>>> from sparkless.session import Catalog
>>> catalog = Catalog(storage_manager)
>>> catalog.createDatabase("test_db")
>>> catalog.listDatabases()
[Database(name='test_db')]
- class sparkless.session.catalog.Database(name)[source]
Bases: object
Mock database object for catalog operations.
Initialize Database.
- Parameters:
name (str) – Database name.
- class sparkless.session.catalog.Table(name, database='default')[source]
Bases: object
Mock table object for catalog operations.
Initialize Table.
- class sparkless.session.catalog.Catalog(storage, spark=None)[source]
Bases: object
Mock Catalog for Spark session.
Provides a mock implementation of PySpark’s Catalog that supports the major operations, including database management, table operations, and caching, without requiring an actual Spark installation.
- storage
Storage manager for data persistence.
- spark
Optional SparkSession reference for SQL-based operations.
Example
>>> catalog = Catalog(storage_manager, spark_session)
>>> catalog.createDatabase("test_db")
>>> catalog.listDatabases()
[Database(name='test_db')]
Initialize Catalog.
- Parameters:
- get_storage_backend()[source]
Get the storage backend instance.
Public accessor method for the storage backend, allowing access without breaking encapsulation.
- Return type:
IStorageManager
- Returns:
The storage manager instance.
- currentCatalog()[source]
Get current catalog name (Spark SQL compatibility).
- Return type:
- Returns:
Catalog identifier. Sparkless exposes a single catalog.
- createDatabase(name, ignoreIfExists=True)[source]
Create a database.
This method uses SQL internally to match PySpark’s behavior, where database creation is done via SQL statements rather than direct API calls. However, to avoid infinite recursion when called from SQL execution, it checks if the database already exists first and uses direct storage calls when appropriate.
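The recursion hazard described above can be sketched with a toy catalog whose SQL executor calls back into `createDatabase`. `SketchStorage` and `SketchCatalog` are hypothetical stand-ins, and the `_in_sql` flag is one assumed way to decide when to use direct storage calls. The source does not say how Sparkless makes that decision.

```python
from typing import List, Set


class SketchStorage:
    """Hypothetical in-memory storage; stands in for the storage manager."""

    def __init__(self) -> None:
        self.databases: Set[str] = {"default"}


class SketchCatalog:
    """Hypothetical catalog showing an existence check plus a re-entry guard."""

    def __init__(self, storage: SketchStorage) -> None:
        self.storage = storage
        self.sql_log: List[str] = []
        self._in_sql = False  # True while a SQL statement is executing

    def sql(self, statement: str) -> None:
        # Toy executor: only understands "CREATE DATABASE [IF NOT EXISTS] <name>".
        self.sql_log.append(statement)
        name = statement.split()[-1]
        self._in_sql = True
        try:
            self.createDatabase(name)  # SQL execution calls back into the catalog
        finally:
            self._in_sql = False

    def createDatabase(self, name: str, ignoreIfExists: bool = True) -> None:
        # Existence check first: nothing to do if the database is already there.
        if name in self.storage.databases:
            if not ignoreIfExists:
                raise ValueError(f"Database '{name}' already exists")
            return
        if self._in_sql:
            # Already inside SQL execution: write to storage directly
            # instead of issuing another SQL statement.
            self.storage.databases.add(name)
            return
        self.sql(f"CREATE DATABASE IF NOT EXISTS {name}")


catalog = SketchCatalog(SketchStorage())
catalog.createDatabase("test_db")
catalog.createDatabase("test_db")  # second call is a no-op
print(sorted(catalog.storage.databases))  # ['default', 'test_db']
print(len(catalog.sql_log))               # 1
```

Without the guard, `createDatabase` would issue SQL, the SQL executor would call `createDatabase`, and the cycle would never terminate.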
- dropDatabase(name, ignoreIfNotExists=True, ignore_if_not_exists=None, cascade=False)[source]
Drop a database.
- Parameters:
- Raises:
IllegalArgumentException – If name is not a string or is empty.
AnalysisException – If database doesn’t exist and ignoreIfNotExists is False.
- Return type:
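A minimal sketch of the two error paths listed above follows. The exception classes are local stand-ins named after the ones in this reference, `drop_database` is a hypothetical free function, and letting the snake_case `ignore_if_not_exists` alias take precedence when it is passed is an assumption. The `cascade` flag is left out.

```python
from typing import Optional, Set


class IllegalArgumentException(Exception):
    """Local stand-in for the exception named in this reference."""


class AnalysisException(Exception):
    """Local stand-in for the exception named in this reference."""


def drop_database(
    databases: Set[str],
    name: object,
    ignoreIfNotExists: bool = True,
    ignore_if_not_exists: Optional[bool] = None,
) -> None:
    """Hypothetical helper mirroring the documented error cases."""
    # Argument validation comes first: name must be a non-empty string.
    if not isinstance(name, str) or not name:
        raise IllegalArgumentException("Database name must be a non-empty string")
    # Assumption: the snake_case alias wins when given explicitly.
    if ignore_if_not_exists is not None:
        ignoreIfNotExists = ignore_if_not_exists
    if name not in databases:
        if ignoreIfNotExists:
            return  # silently ignore a missing database
        raise AnalysisException(f"Database '{name}' does not exist")
    databases.discard(name)


dbs = {"default", "test_db"}
drop_database(dbs, "test_db")
drop_database(dbs, "missing")  # ignored by default
print(sorted(dbs))  # ['default']
```

In a test suite, the strict path can be exercised by passing `ignoreIfNotExists=False` and asserting that `AnalysisException` is raised.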
- tableExists(tableName, dbName=None)[source]
Check if table exists.
- Parameters:
- Return type:
- Returns:
True if table exists, False otherwise.
- Raises:
IllegalArgumentException – If names are not strings or are empty.
AnalysisException – If there’s an error checking table existence.
- getDatabase(dbName)[source]
Get database information.
- Parameters:
dbName (str) – Database name.
- Return type:
- Returns:
Database object with database information.
- Raises:
IllegalArgumentException – If database name is invalid.
AnalysisException – If database doesn’t exist.
Example
>>> db = catalog.getDatabase("test_db")
>>> print(db.name)
test_db
- getTable(tableName=None, dbName=None, *, databaseName=None)[source]
Get table information.
- Parameters:
tableName (Optional[str]) – Table name or qualified name (schema.table). When called with two positional args, this may be dbName (PySpark compatibility).
dbName (Optional[str]) – Optional database name. When called with two positional args, this may be tableName.
databaseName (Optional[str]) – Optional keyword argument for database name (PySpark compatibility).
- Return type:
- Returns:
Table object with table information.
- Raises:
IllegalArgumentException – If table name is invalid.
AnalysisException – If table doesn’t exist.
Example
>>> table = catalog.getTable("users", "test_db")  # Standard: (tableName, dbName)
>>> table = catalog.getTable("test_db", "users")  # PySpark style: (dbName, tableName)
>>> table = catalog.getTable(tableName="users", databaseName="test_db")  # Keyword args
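Since both positional orders are accepted, the call has to be disambiguated somehow. The sketch below shows one possible rule and is not the Sparkless logic: `resolve_table_args` is a hypothetical helper that treats the first positional argument as the database only when it names a known database and the second does not. Falling back to a database named "default" is also an assumption, based on the default shown in the Table signature.

```python
from typing import Optional, Set, Tuple


def resolve_table_args(
    known_databases: Set[str],
    tableName: Optional[str] = None,
    dbName: Optional[str] = None,
    *,
    databaseName: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (database, table). Hypothetical helper, not part of Sparkless."""
    if tableName is None:
        raise ValueError("tableName is required")
    if databaseName is not None:
        # An explicit keyword always wins.
        return databaseName, tableName
    if dbName is None:
        # Single argument: accept a qualified "schema.table" name.
        if "." in tableName:
            db, table = tableName.split(".", 1)
            return db, table
        return "default", tableName
    # Two positional args: swap only when the first is clearly a database.
    if tableName in known_databases and dbName not in known_databases:
        return tableName, dbName
    return dbName, tableName


known = {"default", "test_db"}
print(resolve_table_args(known, "users", "test_db"))  # ('test_db', 'users')
print(resolve_table_args(known, "test_db", "users"))  # ('test_db', 'users')
print(resolve_table_args(known, "test_db.users"))     # ('test_db', 'users')
```

A rule like this stays ambiguous when a table shares its name with a database, so passing `tableName=` and `databaseName=` as keywords is the safest call style.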