Profiling
This guide explains how to enable the new hot-path profiling hooks and interpret the baseline measurements captured for the Polars backend and Python expression evaluator.
Enabling Instrumentation
Profiling is disabled by default. Turn it on via environment flags:
- MOCK_SPARK_PROFILE=1 enables the global profiler.
- MOCK_SPARK_FEATURE_enable_polars_vectorized_shortcuts=1 caches repeated struct field lookups in PolarsOperationExecutor.
- MOCK_SPARK_FEATURE_enable_expression_translation_cache=1 enables the LRU cache inside PolarsExpressionTranslator.
You can set multiple flags using the JSON-based MOCK_SPARK_FEATURE_FLAGS
variable:
export MOCK_SPARK_FEATURE_FLAGS='{
"enable_performance_profiling": true,
"enable_polars_vectorized_shortcuts": true,
"enable_expression_translation_cache": true
}'
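The same variable can also be set from Python, for example in a test fixture. The sketch below is a minimal illustration that uses only the standard library; it assumes the flags are read from the environment when the library is first imported, so the assignment should happen before that import.

```python
import json
import os

# Flag names are copied from the shell example above. That the library reads
# MOCK_SPARK_FEATURE_FLAGS at import time is an assumption, not confirmed here.
feature_flags = {
    "enable_performance_profiling": True,
    "enable_polars_vectorized_shortcuts": True,
    "enable_expression_translation_cache": True,
}

# json.dumps emits lowercase true/false, matching the shell example.
os.environ["MOCK_SPARK_FEATURE_FLAGS"] = json.dumps(feature_flags)

# Round-trip the value to confirm the variable holds valid JSON.
decoded_flags = json.loads(os.environ["MOCK_SPARK_FEATURE_FLAGS"])
```

Building the value with json.dumps avoids the quoting mistakes that are easy to make when writing JSON by hand inside a shell string.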
Instrumented Hot Paths
The following entry points now emit profiling samples when instrumentation is enabled:
- PolarsOperationExecutor.apply_select / apply_filter / apply_with_column
- PolarsOperationExecutor.apply_join / apply_group_by_agg
- ExpressionEvaluator.evaluate_expression / _evaluate_column_operation / _evaluate_function_call
Each sample records the runtime (ms) and the allocations (KB) captured by
tracemalloc. Events are stored per thread and can be retrieved with
sparkless.utils.profiling.collect_events().
Collecting Samples
from sparkless.session import SparkSession
from sparkless.utils import profiling

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Discard previously recorded events so only this workload is captured.
profiling.clear_events()

df = spark.createDataFrame(
    [(i, f"text-{i % 5}", {"k": i}) for i in range(10_000)],
    ["id", "label", "payload"],
)

# Exercise the filter, select, and group-by hot paths.
(
    df.filter(df.id % 2 == 0)
    .select("id", "label", df.payload["k"].alias("payload_k"))
    .groupBy("label")
    .count()
    .collect()
)

# Print every sample recorded during the run.
for event in profiling.collect_events():
    print(event)
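Printing raw events becomes noisy once a workload triggers many calls, so it can help to group samples by hot path. The sketch below is an illustration only: the structure of a real event is not documented here, so it assumes each event behaves like a mapping with hypothetical name and duration_ms keys, and it runs on stand-in values rather than real measurements.

```python
from collections import defaultdict
from statistics import mean


def summarize(events):
    """Group events by hot-path name and report call count, mean, and max."""
    grouped = defaultdict(list)
    for event in events:
        # "name" and "duration_ms" are assumed keys; adapt them to the
        # fields that profiling.collect_events() actually returns.
        grouped[event["name"]].append(event["duration_ms"])
    return {
        name: {
            "calls": len(durations),
            "mean_ms": mean(durations),
            "max_ms": max(durations),
        }
        for name, durations in grouped.items()
    }


# Stand-in events with made-up durations, used only to show the output shape.
sample_events = [
    {"name": "apply_filter", "duration_ms": 2.0},
    {"name": "apply_select", "duration_ms": 4.0},
    {"name": "apply_select", "duration_ms": 6.0},
]
summary = summarize(sample_events)
```

With real data, the input would be the result of profiling.collect_events() after adapting the two field accessors.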
Baseline Snapshot (2025-11-13)
Measurements were taken on macOS 14.6.1 (M3 Pro, Python 3.11.7) using the sample workloads above.
| Hot Path | Duration (ms) | Peak (KB) | Notes |
|---|---|---|---|
| apply_select | 5.8 | 412.0 | Vectorised cache disabled |
| apply_select | 4.1 | 414.3 | Vectorised cache enabled |
|  | 7.4 | 155.1 | Aggregating 5 partitions |
|  | 14.9 | 96.2 | 10k rows, mixed arithmetic |
|  | 9.6 | 60.8 | Map/string heavy workload |
Interpretation: enabling the shortcut and translation caches reduces
apply_select wall time by ~29% for map-heavy queries, with stable memory usage. Expression evaluation remains dominated by user-defined functions; additional vectorisation opportunities should focus on Python fallbacks.
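The ~29% figure can be checked with a short calculation, assuming it compares the 5.8 ms row (vectorised cache disabled) with the 4.1 ms row (vectorised cache enabled) in the table above. The helper name percent_change is hypothetical.

```python
def percent_change(before, after):
    """Signed change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


# Durations and peaks are taken from the baseline table above.
duration_change = percent_change(5.8, 4.1)   # negative means faster
peak_change = percent_change(412.0, 414.3)   # positive means more memory

print(f"duration: {duration_change:.1f}%")
print(f"peak memory: {peak_change:+.1f}%")
```

Under that assumption the duration drops by about 29.3% while peak allocations grow by roughly 0.6%, which matches the description of stable memory usage.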
Next Steps
- Use profiling.collect_events() at the end of larger integration tests to capture regressions over time.
- Wire the feature flags into CI smoke runs once steady-state performance thresholds are established.
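One possible shape for such a regression check is sketched below. It is a sketch under stated assumptions, not an existing helper: find_regressions, the event keys name and duration_ms, and every threshold value are hypothetical, and the observed events are stand-ins for what profiling.collect_events() would return.

```python
def find_regressions(events, thresholds_ms):
    """Return (name, duration_ms, limit_ms) for events that exceed their limit."""
    regressions = []
    for event in events:
        limit = thresholds_ms.get(event["name"])
        # Hot paths without a configured threshold are ignored.
        if limit is not None and event["duration_ms"] > limit:
            regressions.append((event["name"], event["duration_ms"], limit))
    return regressions


# Hypothetical thresholds; real values should come from steady-state runs.
thresholds_ms = {"apply_select": 8.0, "apply_group_by_agg": 12.0}

# Stand-in events with made-up durations.
observed = [
    {"name": "apply_select", "duration_ms": 5.1},
    {"name": "apply_group_by_agg", "duration_ms": 13.5},
    {"name": "apply_filter", "duration_ms": 1.0},
]

regressions = find_regressions(observed, thresholds_ms)
```

In an integration test, asserting that the returned list is empty would fail the run and name the offending hot path. Thresholds should leave headroom for machine-to-machine variance, since CI runners rarely match the baseline hardware.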