Welcome to Sparkless

🚀 Test PySpark code at lightning speed, no JVM required

Sparkless is a lightweight PySpark replacement that runs your tests 10x faster by eliminating JVM overhead. Your existing PySpark code works unchanged: just swap the import.

# Before
from pyspark.sql import SparkSession

# After
from sparkless.sql import SparkSession
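If a test suite needs to run against either backend, the import swap can be made conditional. The following is a minimal sketch rather than part of Sparkless: the helper names `first_available` and `load_spark_session` are hypothetical, and it assumes both packages expose `SparkSession` from their `sql` module, as the two import lines above show.

```python
import importlib
import importlib.util
from typing import Optional, Sequence


def first_available(candidates: Sequence[str]) -> Optional[str]:
    """Return the first importable module name from candidates, or None."""
    for name in candidates:
        try:
            # find_spec raises ModuleNotFoundError when the parent package
            # of a dotted name (e.g. "sparkless" for "sparkless.sql") is missing.
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            continue
    return None


def load_spark_session(candidates: Sequence[str] = ("sparkless.sql", "pyspark.sql")):
    """Import SparkSession from the first backend that is installed."""
    name = first_available(candidates)
    if name is None:
        raise ImportError(f"none of {list(candidates)} is installed")
    return importlib.import_module(name).SparkSession


# Report which backend would be used without importing it yet.
print("Selected backend:", first_available(["sparkless.sql", "pyspark.sql"]))
```

Listing `sparkless.sql` first means tests prefer the JVM-free backend and fall back to PySpark only when Sparkless is not installed.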

Key Features

  • ⚡ 10x Faster - No JVM startup (30s → 0.1s)

  • 🎯 Drop-in Replacement - Use existing PySpark code unchanged

  • 📦 Zero Java - Pure Python with Polars backend (thread-safe, no SQL required)

  • 🧪 100% Compatible - Full PySpark 3.2-3.5 API support

  • 🔄 Lazy Evaluation - Mirrors PySpark’s execution model

  • 🏭 Production Ready - 2314+ passing tests, 100% mypy typed

  • 🧵 Thread-Safe - Polars backend designed for parallel execution
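To make the lazy evaluation point concrete, here is a toy illustration of the general idea: transformations are only recorded, and nothing runs until an action such as `collect()` is called. The `ToyLazyFrame` class is a hypothetical stand-in written for this sketch. It is not how Sparkless, Polars, or PySpark implement it, and it uses plain Python callables instead of column expressions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class ToyLazyFrame:
    """Toy model of lazy evaluation (illustrative only, not Sparkless code)."""

    rows: List[Row]
    plan: Tuple[tuple, ...] = ()  # recorded transformations, not yet executed

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyFrame":
        # Transformation: record the step and return a new frame.
        return ToyLazyFrame(self.rows, self.plan + (("filter", predicate),))

    def select(self, *names: str) -> "ToyLazyFrame":
        # Transformation: record the step and return a new frame.
        return ToyLazyFrame(self.rows, self.plan + (("select", names),))

    def collect(self) -> List[Row]:
        # Action: replay the recorded plan over the data.
        out = list(self.rows)
        for op, arg in self.plan:
            if op == "filter":
                out = [row for row in out if arg(row)]
            else:
                out = [{name: row[name] for name in arg} for row in out]
        return out


calls: List[str] = []


def older_than_25(row: Row) -> bool:
    calls.append(row["name"])  # track when the predicate actually runs
    return row["age"] > 25


data = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]
pending = ToyLazyFrame(data).filter(older_than_25).select("name")
calls_before = len(calls)   # the predicate has not run yet
result = pending.collect()  # the plan executes here
print(calls_before, result)
```

The predicate is not invoked while the chain is being built, only when `collect()` replays the plan.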

Quick Start

from sparkless.sql import SparkSession, functions as F

# Create session
spark = SparkSession("MyApp")

# Your PySpark code works as-is
data = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]
df = spark.createDataFrame(data)

# All operations work
result = df.filter(F.col("age") > 25).select("name").collect()
print(result)
# Output: [Row(name='Bob')]
