# Profiling


This guide explains how to enable the new hot-path profiling hooks and interpret
the baseline measurements captured for the Polars backend and Python expression
evaluator.

## Enabling Instrumentation

Profiling is disabled by default. Turn it on via environment flags:

- `MOCK_SPARK_PROFILE=1` enables the global profiler.
- `MOCK_SPARK_FEATURE_enable_polars_vectorized_shortcuts=1` caches repeated
  struct field lookups in `PolarsOperationExecutor`.
- `MOCK_SPARK_FEATURE_enable_expression_translation_cache=1` enables the LRU
  cache inside `PolarsExpressionTranslator`.

To set several flags at once, use the JSON-valued `MOCK_SPARK_FEATURE_FLAGS`
variable:

```bash
export MOCK_SPARK_FEATURE_FLAGS='{
  "enable_performance_profiling": true,
  "enable_polars_vectorized_shortcuts": true,
  "enable_expression_translation_cache": true
}'
```
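
The same JSON payload can be built from Python, for example in a test fixture.
This is a minimal sketch: it only populates the environment variable shown
above, and it assumes (not confirmed here) that the flags are read when the
session is created, so it should run before any `sparkless` import.

```python
import json
import os

# Flag names are copied from the shell example above.
flags = {
    "enable_performance_profiling": True,
    "enable_polars_vectorized_shortcuts": True,
    "enable_expression_translation_cache": True,
}

# json.dumps emits lowercase true/false, matching the shell form.
os.environ["MOCK_SPARK_FEATURE_FLAGS"] = json.dumps(flags)
```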

## Instrumented Hot Paths

The following entry points now emit profiling samples when instrumentation is
enabled:

- `PolarsOperationExecutor`: `apply_select`, `apply_filter`, `apply_with_column`,
  `apply_join`, `apply_group_by_agg`
- `ExpressionEvaluator`: `evaluate_expression`, `_evaluate_column_operation`,
  `_evaluate_function_call`

Each sample records the runtime (ms) and the allocations (KB) captured by
`tracemalloc`. Events are stored per thread and retrieved with
`sparkless.utils.profiling.collect_events()`.
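
To illustrate what such a sample contains, the sketch below times a function and
reads peak allocations from `tracemalloc`, keeping events in a thread-local list.
It is not the `sparkless` implementation: `profiled`, `_events`, and the event
fields (`name`, `duration_ms`, `peak_kb`) are hypothetical names chosen for this
example.

```python
import functools
import threading
import time
import tracemalloc

_local = threading.local()  # hypothetical per-thread event store


def _events():
    """Return the event list for the current thread, creating it on first use."""
    if not hasattr(_local, "events"):
        _local.events = []
    return _local.events


def profiled(name):
    """Record wall time (ms) and peak traced allocations (KB) for each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            tracemalloc.reset_peak()  # requires Python 3.9+
            before, _ = tracemalloc.get_traced_memory()
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                _, peak = tracemalloc.get_traced_memory()
                if started_here:
                    tracemalloc.stop()
                _events().append({
                    "name": name,
                    "duration_ms": elapsed_ms,
                    "peak_kb": max(peak - before, 0) / 1024.0,
                })
        return wrapper
    return decorator


@profiled("demo.build_list")
def build_list(n):
    return [i * 2 for i in range(n)]


result = build_list(10_000)
events = list(_events())
```

Nested profiled calls would reset each other's peak in this simplified version,
so treat it as a model of the measurement only.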

## Collecting Samples

```python
from sparkless.session import SparkSession
from sparkless.utils import profiling

spark = SparkSession.builder.master("local[1]").getOrCreate()
profiling.clear_events()

df = spark.createDataFrame(
    [(i, f"text-{i % 5}", {"k": i}) for i in range(10_000)],
    ["id", "label", "payload"],
)
(
    df.filter(df.id % 2 == 0)
      .select("id", "label", df.payload["k"].alias("payload_k"))
      .groupBy("label")
      .count()
      .collect()
)

for event in profiling.collect_events():
    print(event)
```
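
Printing raw events gets noisy for larger workloads, so it can help to aggregate
them per hot path. The sketch below assumes each event can be reduced to a
mapping with `name`, `duration_ms`, and `peak_kb` keys. That shape is an
assumption, and the objects returned by `collect_events()` may need adapting
first. The sample values are made up for illustration.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate call count, total time, and max peak allocation per hot path."""
    totals = defaultdict(
        lambda: {"calls": 0, "total_ms": 0.0, "max_peak_kb": 0.0}
    )
    for event in events:
        row = totals[event["name"]]
        row["calls"] += 1
        row["total_ms"] += event["duration_ms"]
        row["max_peak_kb"] = max(row["max_peak_kb"], event["peak_kb"])
    return dict(totals)


# Hypothetical events; real ones would come from profiling.collect_events().
sample_events = [
    {"name": "polars.apply_select", "duration_ms": 2.0, "peak_kb": 100.0},
    {"name": "polars.apply_select", "duration_ms": 3.0, "peak_kb": 120.0},
    {"name": "polars.apply_filter", "duration_ms": 1.5, "peak_kb": 40.0},
]

summary = summarize(sample_events)
for name, row in sorted(summary.items()):
    print(f"{name}: {row['calls']} calls, {row['total_ms']:.1f} ms total, "
          f"{row['max_peak_kb']:.1f} KB max peak")
```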

## Baseline Snapshot (2025-11-13)

Measurements were taken on macOS 14.6.1 (M3 Pro, Python 3.11.7) using the sample
workloads above.

| Hot Path                                    | Duration (ms) | Peak (KB) | Notes |
|---------------------------------------------|--------------:|----------:|-------|
| `polars.apply_select`                       | 5.8           | 412.0     | Vectorised cache disabled |
| `polars.apply_select`                       | 4.1           | 414.3     | Vectorised cache enabled |
| `polars.apply_group_by_agg`                 | 7.4           | 155.1     | Aggregating 5 partitions |
| `expression.evaluate_expression`            | 14.9          | 96.2      | 10k rows, mixed arithmetic |
| `expression.evaluate_function_call`         | 9.6           | 60.8      | Map/string heavy workload |

> **Interpretation:** enabling the shortcut + translation caches reduces the
> `apply_select` wall time by ~29% (5.8 ms to 4.1 ms) for map-heavy queries,
> while peak memory stays essentially flat (412.0 KB vs 414.3 KB). Expression
> evaluation remains dominated by user-defined functions; additional
> vectorisation opportunities should focus on Python fallbacks.

## Next Steps

- Use `profiling.collect_events()` at the end of larger integration tests to
  capture regressions over time.
- Wire the feature flags into CI smoke runs once steady-state performance
  thresholds are established.
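
One possible shape for such a regression check is sketched below. The baseline
durations come from the table above. The `find_regressions` helper, the 25%
tolerance, and the measured values are hypothetical placeholders, to be replaced
once real thresholds exist.

```python
# Baseline durations (ms) taken from the snapshot table above.
BASELINE_MS = {
    "polars.apply_select": 4.1,
    "polars.apply_group_by_agg": 7.4,
}


def find_regressions(measured_ms, baseline_ms, tolerance=0.25):
    """Return hot paths whose duration exceeds baseline * (1 + tolerance)."""
    regressions = {}
    for name, duration in measured_ms.items():
        budget = baseline_ms.get(name)
        if budget is not None and duration > budget * (1 + tolerance):
            regressions[name] = {"baseline_ms": budget, "measured_ms": duration}
    return regressions


# Hypothetical measurements from a CI run.
measured = {
    "polars.apply_select": 6.0,
    "polars.apply_group_by_agg": 7.5,
}

regressions = find_regressions(measured, BASELINE_MS)
for name, info in regressions.items():
    print(f"{name}: {info['measured_ms']} ms vs baseline {info['baseline_ms']} ms")
```

A relative tolerance is used here because absolute timings vary between
machines; a CI job would likely need its own baseline rather than the laptop
figures above.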