Profiling

This guide explains how to enable the new hot-path profiling hooks and interpret the baseline measurements captured for the Polars backend and Python expression evaluator.

Enabling Instrumentation

Profiling is disabled by default. Turn it on via environment flags:

  • MOCK_SPARK_PROFILE=1 enables the global profiler.

  • MOCK_SPARK_FEATURE_enable_polars_vectorized_shortcuts=1 caches repeated struct field lookups in PolarsOperationExecutor.

  • MOCK_SPARK_FEATURE_enable_expression_translation_cache=1 enables the LRU cache inside PolarsExpressionTranslator.

You can set multiple flags using the JSON-based MOCK_SPARK_FEATURE_FLAGS variable:

export MOCK_SPARK_FEATURE_FLAGS='{
  "enable_performance_profiling": true,
  "enable_polars_vectorized_shortcuts": true,
  "enable_expression_translation_cache": true
}'
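When the flags have to be set from a Python test harness rather than a shell, the same JSON payload can be built programmatically. The sketch below uses only the flag names listed above. It assumes the library reads the variable at import or session creation time, which this guide does not confirm, so it sets the variable before sparkless would be imported.

```python
import json
import os

# Flag names are the ones listed in this guide.
flags = {
    "enable_performance_profiling": True,
    "enable_polars_vectorized_shortcuts": True,
    "enable_expression_translation_cache": True,
}

# Serialise to the JSON form used by MOCK_SPARK_FEATURE_FLAGS.
# Assumption: the variable is read at import or session creation,
# so this should run before sparkless is imported.
os.environ["MOCK_SPARK_FEATURE_FLAGS"] = json.dumps(flags)
```

Building the payload with json.dumps avoids the quoting mistakes that are easy to make when the JSON is typed by hand in a shell.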

Instrumented Hot Paths

The following entry points now emit profiling samples when instrumentation is enabled:

  • PolarsOperationExecutor.apply_select/apply_filter/apply_with_column

  • PolarsOperationExecutor.apply_join/apply_group_by_agg

  • ExpressionEvaluator.evaluate_expression/_evaluate_column_operation/_evaluate_function_call

Each sample records the runtime (ms) and the allocations captured by tracemalloc (KB). Events are stored per thread and can be retrieved with sparkless.utils.profiling.collect_events().
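To make the shape of a sample concrete, here is a minimal sketch of a hook that measures wall time and peak tracemalloc allocations and keeps events in thread-local storage. It is not the sparkless implementation: the decorator name profiled, the _events helper, and the dict keys are all hypothetical and chosen for illustration.

```python
import threading
import time
import tracemalloc
from functools import wraps

# Hypothetical per-thread event store; not the sparkless implementation.
_local = threading.local()


def _events():
    """Return the event list for the current thread, creating it on first use."""
    if not hasattr(_local, "events"):
        _local.events = []
    return _local.events


def profiled(name):
    """Record wall time (ms) and peak traced allocations (KB) for each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            tracemalloc.reset_peak()  # requires Python 3.9+
            before_bytes, _ = tracemalloc.get_traced_memory()
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                duration_ms = (time.perf_counter() - start) * 1000.0
                _, peak_bytes = tracemalloc.get_traced_memory()
                if started_here:
                    tracemalloc.stop()
                _events().append({
                    "name": name,
                    "duration_ms": duration_ms,
                    # Peak above the level traced when the call began.
                    "peak_kb": (peak_bytes - before_bytes) / 1024.0,
                })
        return wrapper
    return decorator


@profiled("demo.build_list")
def build_list(n):
    return [i * 2 for i in range(n)]


result = build_list(10_000)
demo_events = list(_events())
```

Because the sketch resets the tracemalloc peak on entry, nested profiled calls would disturb the outer call's peak reading; a production hook would need to handle that case.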

Collecting Samples

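from sparkless.session import SparkSession
from sparkless.utils import profiling

# Profiling must already be enabled (for example MOCK_SPARK_PROFILE=1) before this runs.
spark = SparkSession.builder.master("local[1]").getOrCreate()
profiling.clear_events()  # start from an empty event list

# 10,000 rows with five distinct labels and a dict-valued payload column.
df = spark.createDataFrame(
    [(i, f"text-{i % 5}", {"k": i}) for i in range(10_000)],
    ["id", "label", "payload"],
)
# Exercise the filter, select, and group-by hot paths.
(
    df.filter(df.id % 2 == 0)
      .select("id", "label", df.payload["k"].alias("payload_k"))
      .groupBy("label")
      .count()
      .collect()
)

# Print every sample recorded while the query ran.
for event in profiling.collect_events():
    print(event)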

Baseline Snapshot (2025-11-13)

Measurements were taken on macOS 14.6.1 (M3 Pro, Python 3.11.7) using the sample workloads above.

| Hot Path                          | Duration (ms) | Peak (KB) | Notes                      |
|-----------------------------------|---------------|-----------|----------------------------|
| polars.apply_select               | 5.8           | 412.0     | Vectorised cache disabled  |
| polars.apply_select               | 4.1           | 414.3     | Vectorised cache enabled   |
| polars.apply_group_by_agg         | 7.4           | 155.1     | Aggregating 5 partitions   |
| expression.evaluate_expression    | 14.9          | 96.2      | 10k rows, mixed arithmetic |
| expression.evaluate_function_call | 9.6           | 60.8      | Map/string heavy workload  |

Interpretation: enabling the shortcut and translation caches reduces apply_select wall time by about 29% (5.8 ms to 4.1 ms) for map-heavy queries, while peak memory stays essentially flat (412.0 KB to 414.3 KB). Expression evaluation remains dominated by user-defined functions, so further vectorisation work should focus on the Python fallbacks.
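The percentage quoted above follows directly from the two apply_select rows in the baseline table. As a quick check of the arithmetic:

```python
# Values copied from the apply_select rows of the baseline table.
disabled_ms, enabled_ms = 5.8, 4.1
disabled_kb, enabled_kb = 412.0, 414.3

# Relative change with the cache-disabled run as the reference.
time_reduction_pct = (disabled_ms - enabled_ms) / disabled_ms * 100
memory_change_pct = (enabled_kb - disabled_kb) / disabled_kb * 100

print(f"wall time reduction: {time_reduction_pct:.1f}%")  # 29.3%
print(f"peak memory change: {memory_change_pct:+.2f}%")   # +0.56%
```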

Next Steps

  • Use profiling.collect_events() at the end of larger integration tests to capture regressions over time.

  • Wire the feature flags into CI smoke runs once steady-state performance thresholds are established.
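A regression check along these lines could sit at the end of an integration test. This is a sketch under stated assumptions: the guide does not specify what collect_events() returns, so the code assumes each event is a dict with "name" and "duration_ms" keys, and the find_regressions helper and the 50% tolerance are hypothetical. The baseline figures come from the snapshot above and are specific to that machine.

```python
# Baseline durations (ms) copied from the snapshot in this guide.
BASELINE_MS = {
    "polars.apply_select": 5.8,
    "polars.apply_group_by_agg": 7.4,
    "expression.evaluate_expression": 14.9,
}


def find_regressions(events, baseline=BASELINE_MS, tolerance=0.5):
    """Return {name: worst_duration_ms} for hot paths slower than baseline * (1 + tolerance).

    Assumes each event is a dict with "name" and "duration_ms" keys; adapt the
    accessors to whatever profiling.collect_events() actually returns.
    """
    worst = {}
    for event in events:
        name, duration = event["name"], event["duration_ms"]
        worst[name] = max(worst.get(name, 0.0), duration)
    return {
        name: duration
        for name, duration in worst.items()
        if name in baseline and duration > baseline[name] * (1 + tolerance)
    }


# Stand-in events for illustration; in a real test these would come from
# profiling.collect_events() after the workload has run.
sample_events = [
    {"name": "polars.apply_select", "duration_ms": 6.1},
    {"name": "polars.apply_group_by_agg", "duration_ms": 12.0},
    {"name": "polars.apply_group_by_agg", "duration_ms": 7.0},
]
regressions = find_regressions(sample_events)
```

Comparing the worst sample per hot path keeps the check simple; CI runners are noisier than a developer laptop, so the tolerance would need tuning once steady-state thresholds are known.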