Profiling

This guide explains how to enable the new hot-path profiling hooks and interpret the baseline measurements captured for the Polars backend and Python expression evaluator.

Enabling Instrumentation

Profiling is disabled by default. Turn it on via environment flags:

MOCK_SPARK_PROFILE=1 enables the global profiler.
MOCK_SPARK_FEATURE_enable_polars_vectorized_shortcuts=1 caches repeated struct field lookups in PolarsOperationExecutor.
MOCK_SPARK_FEATURE_enable_expression_translation_cache=1 enables the LRU cache inside PolarsExpressionTranslator.

You can set multiple flags using the JSON-based MOCK_SPARK_FEATURE_FLAGS variable:

export MOCK_SPARK_FEATURE_FLAGS='{
  "enable_performance_profiling": true,
  "enable_polars_vectorized_shortcuts": true,
  "enable_expression_translation_cache": true
}'
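
The same flags can also be set from Python, for example at the top of a test module. This is a minimal sketch that uses only the standard library; it assumes the flags are read from the environment when the library is imported or the session is created, so it should run before either happens.

```python
import json
import os

# Flag names copied from the JSON example above.
flags = {
    "enable_performance_profiling": True,
    "enable_polars_vectorized_shortcuts": True,
    "enable_expression_translation_cache": True,
}

# Assumption: the library reads this variable at import or session start,
# so set it before importing sparkless.
os.environ["MOCK_SPARK_FEATURE_FLAGS"] = json.dumps(flags)
```

Serialising with json.dumps avoids hand-written quoting mistakes such as Python's True in place of JSON's true.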

Instrumented Hot Paths

The following entry points now emit profiling samples when instrumentation is enabled:

PolarsOperationExecutor.apply_select/apply_filter/apply_with_column
PolarsOperationExecutor.apply_join/apply_group_by_agg
ExpressionEvaluator.evaluate_expression/_evaluate_column_operation/_evaluate_function_call

Each sample records the runtime (ms) and the memory allocations captured by tracemalloc (KB). Events are stored per thread and can be retrieved with sparkless.utils.profiling.collect_events().
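
To make the two measurements concrete, here is a standalone sketch of how a runtime and a tracemalloc peak can be captured around a block of code. It is not the library's implementation: the measure helper, the sample dictionary keys, and the demo.build_list name are all hypothetical and chosen only for illustration.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(name, sink):
    """Append one sample for the wrapped block to `sink` (hypothetical helper)."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    # Reset the peak so it reflects only the wrapped block (Python 3.9+).
    tracemalloc.reset_peak()
    baseline, _ = tracemalloc.get_traced_memory()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        _, peak = tracemalloc.get_traced_memory()
        if started_here:
            tracemalloc.stop()
        sink.append(
            {
                "name": name,
                "duration_ms": elapsed_ms,
                # Peak allocation above the level at block entry, in KB.
                "peak_kb": (peak - baseline) / 1024.0,
            }
        )


samples = []
with measure("demo.build_list", samples):
    data = [i * i for i in range(100_000)]
print(samples[0])
```

Note that tracemalloc adds overhead of its own, so durations measured while it is tracing are best compared against other profiled runs rather than against unprofiled ones.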

Collecting Samples

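from sparkless.session import SparkSession
from sparkless.utils import profiling

# Profiling must already be enabled (see "Enabling Instrumentation");
# otherwise no events are recorded.
spark = SparkSession.builder.master("local[1]").getOrCreate()

# Clear earlier events so only this workload is measured.
profiling.clear_events()

# 10,000 rows: an integer id, a label cycling over 5 values, and a dict payload.
df = spark.createDataFrame(
    [(i, f"text-{i % 5}", {"k": i}) for i in range(10_000)],
    ["id", "label", "payload"],
)

# Exercise the filter, select, and group-by hot paths; collect() materialises the result.
(
    df.filter(df.id % 2 == 0)
      .select("id", "label", df.payload["k"].alias("payload_k"))
      .groupBy("label")
      .count()
      .collect()
)

# Print every profiling event recorded for the workload above.
for event in profiling.collect_events():
    print(event)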
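
Printing raw events gets noisy for larger workloads, so it can help to aggregate them per hot path. The sketch below assumes each event can be read as a mapping with name, duration_ms, and peak_kb keys; the actual event type returned by collect_events() is not documented here, so adapt the field access accordingly. The example events are made-up stand-ins, not measurements.

```python
from collections import defaultdict


def summarize(events):
    """Group events by name and total their durations (hypothetical helper)."""
    totals = defaultdict(
        lambda: {"count": 0, "total_ms": 0.0, "max_peak_kb": 0.0}
    )
    for event in events:
        row = totals[event["name"]]
        row["count"] += 1
        row["total_ms"] += event["duration_ms"]
        row["max_peak_kb"] = max(row["max_peak_kb"], event["peak_kb"])
    return dict(totals)


# Illustrative stand-in events; field names and values are assumptions.
example_events = [
    {"name": "polars.apply_select", "duration_ms": 4.0, "peak_kb": 400.0},
    {"name": "polars.apply_select", "duration_ms": 6.0, "peak_kb": 410.0},
    {"name": "polars.apply_filter", "duration_ms": 2.0, "peak_kb": 100.0},
]

summary = summarize(example_events)
for name, row in sorted(summary.items()):
    print(name, row)
```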

Baseline Snapshot (2025-11-13)

Measurements were taken on macOS 14.6.1 (M3 Pro, Python 3.11.7) using the sample workloads above.

Hot Path	Duration (ms)	Peak Memory (KB)	Notes
`polars.apply_select`	5.8	412.0	Vectorised cache disabled
`polars.apply_select`	4.1	414.3	Vectorised cache enabled
`polars.apply_group_by_agg`	7.4	155.1	Aggregating 5 partitions
`expression.evaluate_expression`	14.9	96.2	10k rows, mixed arithmetic
`expression.evaluate_function_call`	9.6	60.8	Map/string heavy workload

Interpretation: enabling the shortcut and translation caches reduces the apply_select wall time from 5.8 ms to 4.1 ms (~29%) for map-heavy queries, while peak memory stays essentially flat (412.0 KB vs. 414.3 KB). Expression evaluation remains dominated by user-defined functions; additional vectorisation opportunities should focus on Python fallbacks.
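
The percentages follow directly from the two apply_select rows in the table. A short worked calculation, using only the figures reported above (percent_change is a helper defined here for illustration):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# apply_select rows from the baseline table: cache disabled vs. enabled.
duration_change = percent_change(5.8, 4.1)
memory_change = percent_change(412.0, 414.3)

print(f"duration: {duration_change:+.1f}%")  # duration: -29.3%
print(f"peak memory: {memory_change:+.1f}%")  # peak memory: +0.6%
```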

Next Steps

Use profiling.collect_events() at the end of larger integration tests to capture regressions over time.
Wire the feature flags into CI smoke runs once steady-state performance thresholds are established.
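
Once thresholds exist, a regression check at the end of an integration test could look roughly like the sketch below. Everything here is hypothetical: the find_regressions helper, the mapping-style events with name and duration_ms keys, and the budget values, which are placeholders rather than established thresholds.

```python
def find_regressions(events, budgets_ms):
    """Return (name, duration_ms, budget_ms) for each event over its budget."""
    over = []
    for event in events:
        budget = budgets_ms.get(event["name"])
        # Hot paths without a budget are ignored rather than failed.
        if budget is not None and event["duration_ms"] > budget:
            over.append((event["name"], event["duration_ms"], budget))
    return over


# Placeholder budgets and events for illustration only.
budgets_ms = {"polars.apply_select": 10.0, "polars.apply_group_by_agg": 15.0}
events = [
    {"name": "polars.apply_select", "duration_ms": 12.5},
    {"name": "polars.apply_group_by_agg", "duration_ms": 7.0},
    {"name": "expression.evaluate_expression", "duration_ms": 50.0},
]

regressions = find_regressions(events, budgets_ms)
print(regressions)
```

In a real test, events would come from profiling.collect_events() and the test would fail when the returned list is non-empty.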