# Mock Spark Features


This guide explains sparkless-specific features that are not available in PySpark, and when to use them versus PySpark-compatible APIs.

## Overview

Sparkless provides two categories of APIs:

1. **PySpark-Compatible APIs** - Use these for code that needs to work with both sparkless and PySpark
2. **sparkless Convenience APIs** - Use these for sparkless-specific test utilities and convenience features

## PySpark-Compatible APIs (Recommended)

These APIs work identically in both sparkless and PySpark. Use them when:

- Writing code that needs to work with both engines
- Following PySpark best practices
- Writing production-like code
- Sharing code with teams using PySpark

### SQL Commands

```python
from sparkless.sql import SparkSession

spark = SparkSession("MyApp")

# Create database
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")

# Create table
spark.sql("CREATE TABLE test_db.users (name STRING, age INT)")

# Insert data
spark.sql("INSERT INTO test_db.users VALUES ('Alice', 25), ('Bob', 30)")

# Query
result = spark.sql("SELECT * FROM test_db.users WHERE age > 25")
```

### Functions Module

```python
# PySpark-compatible import
from sparkless.sql import functions as F

df.select(F.col("name"), F.upper(F.col("name")))
```

### Catalog API

```python
# List databases
databases = spark.catalog.listDatabases()

# List tables
tables = spark.catalog.listTables("test_db")

# Check if table exists
exists = spark.catalog.tableExists("users", "test_db")

# Get table information
table = spark.catalog.getTable("users", "test_db")
```

## sparkless Convenience APIs

These APIs are specific to sparkless and provide convenient programmatic access. **They will not work with PySpark.**

### Storage API

The `.storage` API provides convenient programmatic access to databases and tables:

```python
from sparkless.sql import SparkSession
from sparkless.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession("MyApp")

# Create schema (database)
spark._storage.create_schema("test_db")

# Create table with schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
spark._storage.create_table("test_db", "users", schema)

# Insert data
spark._storage.insert_data("test_db", "users", [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30}
])

# Get table as DataFrame
df = spark._storage.get_table("test_db", "users")
```

**When to use:**
- Writing sparkless-specific test utilities
- Setting up test fixtures
- Need convenient programmatic access
- Code will only run with sparkless

**When NOT to use:**
- Code that needs to work with PySpark
- Production-like code
- Sharing code with PySpark users

### Enhanced Error Messages

Sparkless provides enhanced error messages with migration guidance:

```python
from sparkless.core.exceptions.analysis import AnalysisException

try:
    spark.sql("SELECT * FROM non_existent_table")
except AnalysisException as e:
    print(e)  # Includes helpful migration hints
```

The error messages automatically detect common patterns and provide hints:
- Table not found → Guidance on creating tables
- Database not found → Guidance on creating databases
- Column not found → Suggestion to check column names

### Enhanced Explain Method

Sparkless's `explain()` method provides detailed execution plans:

```python
df.explain()  # Basic plan
df.explain(extended=True)  # Extended plan with schema details
```

Shows:
- Source operations
- Pending transformations (lazy evaluation)
- Schema information (when extended=True)

### DataFrameWriter.delta() Convenience Method

Sparkless provides a convenience method for Delta Lake format:

```python
# Convenience method (sparkless)
df.write.delta("/path/to/delta_table")

# Equivalent PySpark-compatible way
df.write.format("delta").save("/path/to/delta_table")
```

Both work, but the convenience method is shorter.

## Migration Guide

### From sparkless Convenience APIs to PySpark-Compatible

If you have code using sparkless convenience APIs and want to make it PySpark-compatible:

**Before (sparkless only):**
```python
spark._storage.create_schema("test_db")
schema = StructType([StructField("name", StringType())])
spark._storage.create_table("test_db", "users", schema)
spark._storage.insert_data("test_db", "users", [{"name": "Alice"}])
```

**After (PySpark-compatible):**
```python
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
spark.sql("CREATE TABLE test_db.users (name STRING)")
spark.sql("INSERT INTO test_db.users VALUES ('Alice')")
```

### From PySpark-Compatible to sparkless Convenience APIs

If you want to use convenience APIs in sparkless-specific code:

**Before (SQL):**
```python
spark.sql("CREATE DATABASE IF NOT EXISTS test_db")
spark.sql("CREATE TABLE test_db.users (name STRING, age INT)")
spark.sql("INSERT INTO test_db.users VALUES ('Alice', 25)")
```

**After (convenience API):**
```python
spark._storage.create_schema("test_db")
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])
spark._storage.create_table("test_db", "users", schema)
spark._storage.insert_data("test_db", "users", [{"name": "Alice", "age": 25}])
```

## Best Practices

### For Production-Like Code

✅ **Use PySpark-Compatible APIs:**
- SQL commands for database/table operations
- Standard functions module import
- Catalog API for metadata operations

```python
# Good: Works with both sparkless and PySpark
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("CREATE TABLE analytics.events (timestamp TIMESTAMP, event_type STRING)")
```

### For Test Utilities

✅ **Use sparkless Convenience APIs:**
- `.storage` API for test setup
- Enhanced error messages for debugging
- Convenience methods for faster test writing

```python
# Good: Convenient for tests, but sparkless-specific
@pytest.fixture
def setup_test_data(spark):
    spark._storage.create_schema("test")
    schema = StructType([StructField("id", IntegerType())])
    spark._storage.create_table("test", "data", schema)
    return spark
```

### For Learning PySpark

✅ **Use PySpark-Compatible APIs:**
- Learn patterns that work in real PySpark
- Understand SQL-based operations
- Practice with standard PySpark APIs

## Summary

| Feature | PySpark-Compatible | sparkless Convenience |
|---------|-------------------|----------------------|
| **Storage Management** | SQL commands | `.storage` API |
| **Functions** | `from sparkless.sql import functions as F` | Same (no convenience API) |
| **Error Messages** | Standard exceptions | Enhanced with hints |
| **Explain** | Basic plan | Enhanced with details |
| **Delta Writer** | `df.write.format("delta").save()` | `df.write.delta()` |

**Recommendation:** Use PySpark-compatible APIs for code that needs to work with both engines. Use sparkless convenience APIs for sparkless-specific test utilities.

## See Also

- [Storage API Guide](storage_api_guide.md) - Detailed guide on storage APIs
- [Getting Started](getting_started.md) - Quick start guide
- [API Reference](api_reference.md) - Complete API documentation