Robin Mode Ownership Analysis: Sparkless vs robin-sparkless
Date: 2026-02-06
Reference run: test_results_robin_mode.txt (1,656 failed, 1,050 passed, 22 skipped)
Purpose: Determine whether failures require Sparkless adaptations or robin-sparkless fixes.
Executive Summary
Most failures require Sparkless adaptations. The robin-sparkless package exposes the APIs Sparkless needs (filter, select, join, group_by, etc.). However, its Column uses a method-based comparison API (e.g. col.gt(lit)) rather than Python operators (col > lit). Sparkless currently generates operator-based expressions, and these fail. In addition, the Sparkless Robin materializer has narrow can_handle_operation() rules and supports only a subset of operations.
robin-sparkless issue identified: the Column class does not implement __gt__, __lt__, etc., so a Python expression such as col > lit raises TypeError. The upstream fix would be to add operator overloads for PySpark compatibility.
1. Failure Categories
Category A: SparkUnsupportedOperationError (Sparkless)
Cause: the Robin materializer's can_handle_operation() returns False, so Sparkless fails fast by raising SparkUnsupportedOperationError.
Examples:
- test_create_map_with_literals: "Operation 'Operations: select' is not supported". A select with Column expressions (e.g. F.create_map(...).alias("map_col")) fails because the materializer only accepts select([str, str, ...]), not Column objects.
- UDF tests, withField tests, and window comparison tests: these use operations for which the materializer does not declare support.
Owner: Sparkless. Extend can_handle_operation() and add translations for:
- select with Column/expression payloads
- create_map, withField, and UDFs (if robin-sparkless supports them; otherwise document them as unsupported)
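The following is a minimal sketch of a broadened select check. It assumes that Sparkless represents select items either as plain column-name strings or as expression objects that carry an operation name. The Expr class, its op attribute and the can_handle_select() helper are hypothetical stand-ins. They are not Sparkless's actual types or the real can_handle_operation() signature.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Set


@dataclass
class Expr:
    """Hypothetical stand-in for a Sparkless column expression."""

    op: str          # e.g. "col", "lit", "create_map"
    alias: str = ""  # output name, if .alias(...) was applied


def can_handle_select(payload: Sequence[Any], translatable_ops: Set[str]) -> bool:
    """Return True only if every select item has a known Robin translation."""
    for item in payload:
        if isinstance(item, str):
            continue  # plain column name: the only case accepted today
        if isinstance(item, Expr) and item.op in translatable_ops:
            continue  # expression whose operation can be translated
        return False  # anything else keeps the existing fail-fast behaviour
    return True


# Hypothetical set of operations the materializer knows how to translate.
TRANSLATABLE = {"col", "lit"}

print(can_handle_select(["a", "b"], TRANSLATABLE))                       # True
print(can_handle_select(["a", Expr("col", "a2")], TRANSLATABLE))         # True
print(can_handle_select([Expr("create_map", "map_col")], TRANSLATABLE))  # False
```

The final return False means that any expression without a known translation still takes the existing fail-fast path. Broadening the rule therefore does not trade a clear error for a silently wrong result.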
Category B: Row count mismatch, mock=0 (mixed ownership; Sparkless translation bug confirmed)
Cause: a filter or join returns 0 rows when it should return N.
Root cause (confirmed): the Sparkless Robin materializer builds comparisons with Python operators:
```python
return robin_col > robin_lit  # Fails!
```
The robin-sparkless Column does not implement __gt__, so this raises:
```text
TypeError: '>' not supported between instances of 'builtins.Column' and 'builtins.Column'
```
robin-sparkless expects the method-based API instead:
```python
robin_col.gt(robin_lit)  # Works
```
Owner: Sparkless. Change _simple_filter_to_robin() in sparkless/backend/robin/materializer.py to use .gt(), .lt(), .ge(), .le(), .eq() and .ne() instead of >, <, >=, <=, == and !=.
Optional robin-sparkless improvement: Add __gt__, __lt__, etc. for PySpark-style col > lit usage.
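The change to _simple_filter_to_robin() amounts to replacing each Python operator with the matching Column method. The following is a minimal sketch that assumes the filter arrives as a (column, operator string, literal) triple. compare_to_robin() and FakeColumn are hypothetical. FakeColumn only mimics a method-only Column, so the example runs without robin-sparkless installed.

```python
from typing import Any, Dict

# Python comparison operator -> robin-sparkless Column method name.
_OP_TO_METHOD: Dict[str, str] = {
    ">": "gt",
    "<": "lt",
    ">=": "ge",
    "<=": "le",
    "==": "eq",
    "!=": "ne",
}


def compare_to_robin(robin_col: Any, op: str, robin_lit: Any) -> Any:
    """Build a comparison via Column methods instead of Python operators."""
    method_name = _OP_TO_METHOD.get(op)
    if method_name is None:
        raise ValueError(f"unsupported comparison operator: {op!r}")
    # For op == ">", this is robin_col.gt(robin_lit).
    return getattr(robin_col, method_name)(robin_lit)


class FakeColumn:
    """Hypothetical method-only column: no __gt__/__lt__, as described above."""

    def __init__(self, text: str) -> None:
        self.text = text

    def _cmp(self, symbol: str, other: "FakeColumn") -> "FakeColumn":
        return FakeColumn(f"({self.text} {symbol} {other.text})")

    def gt(self, other): return self._cmp(">", other)
    def lt(self, other): return self._cmp("<", other)
    def ge(self, other): return self._cmp(">=", other)
    def le(self, other): return self._cmp("<=", other)
    def eq(self, other): return self._cmp("==", other)
    def ne(self, other): return self._cmp("!=", other)


age, thirty = FakeColumn("age"), FakeColumn("30")
print(compare_to_robin(age, ">", thirty).text)  # (age > 30)

try:
    age > thirty  # the old operator-based translation
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'FakeColumn' and 'FakeColumn'
```

One caveat follows from plain Python semantics. A class that defines no __eq__ falls back to identity comparison, so col == lit would silently evaluate to False instead of raising. If the robin-sparkless Column likewise lacks __eq__, equality filters are a second reason to call .eq() explicitly.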
Category C: Operations robin-sparkless may not support
Examples: create_map, UDFs, rlike lookaround, withField, window functions with complex expressions.
Owner: Depends on robin-sparkless API:
- If robin-sparkless exposes the function, Sparkless must translate to it.
- If not, either Sparkless documents the operation as unsupported (fail-fast) or robin-sparkless adds support.
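The decision rule above can be sketched as a registry. An operation counts as supported only if a translator to robin-sparkless is registered for it, and everything else fails fast. This is an illustrative pattern that uses assumed names (register, translate, TRANSLATORS). It is not the existing materializer code. SparkUnsupportedOperationError is redefined locally so that the sketch runs on its own.

```python
from typing import Any, Callable, Dict


class SparkUnsupportedOperationError(Exception):
    """Local stand-in for the Sparkless exception of the same name."""


Translator = Callable[..., Any]

# Hypothetical registry: operation name -> function that builds the
# robin-sparkless equivalent. Only ops with a known translation appear here.
TRANSLATORS: Dict[str, Translator] = {}


def register(op_name: str) -> Callable[[Translator], Translator]:
    """Decorator that records a translator for one operation."""
    def decorator(fn: Translator) -> Translator:
        TRANSLATORS[op_name] = fn
        return fn
    return decorator


def can_handle_operation(op_name: str) -> bool:
    return op_name in TRANSLATORS


def translate(op_name: str, *args: Any) -> Any:
    if not can_handle_operation(op_name):
        # Fail fast rather than return a silently wrong result.
        raise SparkUnsupportedOperationError(
            f"Operation {op_name!r} is not supported by the Robin backend"
        )
    return TRANSLATORS[op_name](*args)


@register("filter")
def _translate_filter(df: Any, predicate: Any) -> Any:
    # Placeholder: a real translator would presumably call the
    # robin-sparkless filter API on df here.
    return ("filter", df, predicate)
```

With this design, sorted(TRANSLATORS) yields the list of supported operations directly. That list could feed the supported-vs-unsupported documentation in action item 4.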
Category D: AssertionError / wrong result (needs isolation)
Cause: Result shape or values differ from expected.
Owner: Requires per-test isolation:
- If the translation to robin-sparkless is correct, the failure is likely a robin-sparkless bug.
- If the translation is wrong, it is a Sparkless bug.
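For the per-test isolation in Category D, a useful first step is an order-insensitive diff of expected and actual rows. This separates shape problems (missing or extra rows) from ordering noise. The following is a minimal sketch that assumes rows have already been converted to hashable tuples. diff_rows() is a hypothetical helper. Deciding whether the translation itself is correct still requires inspecting the generated robin-sparkless expression by hand.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]


def diff_rows(expected: Iterable[Row], actual: Iterable[Row]) -> Dict[str, List[Row]]:
    """Order-insensitive (multiset) comparison of two result sets."""
    exp, act = Counter(expected), Counter(actual)
    return {
        # Rows the reference expects but the Robin run did not produce.
        "missing": list((exp - act).elements()),
        # Rows the Robin run produced that the reference does not contain.
        "unexpected": list((act - exp).elements()),
    }


expected = [(1, "a"), (2, "b"), (2, "b")]
actual = [(2, "b"), (3, "c")]
print(diff_rows(expected, actual))
# {'missing': [(1, 'a'), (2, 'b')], 'unexpected': [(3, 'c')]}
```

Rows containing unhashable values such as lists or dicts must be converted (for example, to tuples) before they are passed in.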
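2. Action Items
Sparkless (high priority)
| Item | File | Description |
|---|---|---|
| 1 | sparkless/backend/robin/materializer.py | Use .gt(), .lt(), .ge(), .le(), .eq(), .ne() in _simple_filter_to_robin() instead of Python comparison operators |
| 2 | Robin materializer | Broaden select handling for Column expressions where possible |
| 3 | Robin materializer | Extend can_handle_operation() and add translations for create_map, withField and UDFs where robin-sparkless supports them |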
Sparkless (medium priority)
| Item | Description |
|---|---|
| 4 | Document which operations are supported and which are unsupported for the Robin backend |
| 5 | Consider a fallback path: when Robin cannot handle an operation, use the Polars materializer if available |
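Action item 5 could take the form of a materializer chain: try Robin first, and fall back to Polars only when Robin cannot handle every operation in the plan. The following is a minimal sketch under assumed interfaces. Materializer, StubMaterializer and choose_materializer() are hypothetical, and they model can_handle_operation() only as a yes/no check per operation name.

```python
from typing import Any, Iterable, List, Optional, Protocol, Sequence


class Materializer(Protocol):
    """Hypothetical interface shared by the Robin and Polars materializers."""

    name: str

    def can_handle_operation(self, op: str) -> bool: ...

    def materialize(self, ops: Sequence[str]) -> Any: ...


def choose_materializer(
    ops: Sequence[str], chain: List[Materializer]
) -> Optional[Materializer]:
    """Return the first materializer that can handle every op in the plan."""
    for materializer in chain:
        if all(materializer.can_handle_operation(op) for op in ops):
            return materializer
    return None  # caller would raise SparkUnsupportedOperationError as today


class StubMaterializer:
    """Hypothetical stub used only to exercise choose_materializer()."""

    def __init__(self, name: str, supported: Iterable[str]) -> None:
        self.name = name
        self.supported = set(supported)

    def can_handle_operation(self, op: str) -> bool:
        return op in self.supported

    def materialize(self, ops: Sequence[str]) -> Any:
        return f"{self.name}: {', '.join(ops)}"


robin = StubMaterializer("robin", {"filter", "select", "join"})
polars = StubMaterializer("polars", {"filter", "select", "join", "create_map"})

print(choose_materializer(["filter", "join"], [robin, polars]).name)        # robin
print(choose_materializer(["select", "create_map"], [robin, polars]).name)  # polars
```

The sketch picks one materializer for the whole plan rather than switching engines mid-plan, because switching would presumably require converting intermediate data between backends. A silent fallback could also hide Robin gaps during parity runs, so it may be worth logging it or gating it behind a configuration flag.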
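robin-sparkless (optional)
| Item | Description |
|---|---|
| 1 | Add comparison operator overloads (__gt__, __lt__, etc.) to Column for PySpark-style col > lit usage |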
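Upstream, the operator overloads would delegate each Python comparison dunder to the existing method. The builtins.Column name in the TypeError suggests that Column is a native (likely PyO3) class. If so, the real change would live in robin-sparkless's Rust code, for example via PyO3's __richcmp__. The Python mixin below only illustrates the delegation pattern. DemoColumn is a hypothetical method-only column.

```python
from typing import Any


class ComparisonOperatorsMixin:
    """Map Python comparison operators onto .gt()/.lt()/... methods.

    Note: defining __eq__ sets __hash__ to None, so instances become
    unhashable unless a class also defines __hash__ explicitly.
    """

    def __gt__(self, other: Any) -> Any:
        return self.gt(other)

    def __lt__(self, other: Any) -> Any:
        return self.lt(other)

    def __ge__(self, other: Any) -> Any:
        return self.ge(other)

    def __le__(self, other: Any) -> Any:
        return self.le(other)

    def __eq__(self, other: Any) -> Any:  # type: ignore[override]
        return self.eq(other)

    def __ne__(self, other: Any) -> Any:  # type: ignore[override]
        return self.ne(other)


class DemoColumn(ComparisonOperatorsMixin):
    """Hypothetical method-only column standing in for robin-sparkless's Column."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _expr(self, method: str, other: Any) -> str:
        return f"{self.name}.{method}({other!r})"

    def gt(self, other: Any) -> str: return self._expr("gt", other)
    def lt(self, other: Any) -> str: return self._expr("lt", other)
    def ge(self, other: Any) -> str: return self._expr("ge", other)
    def le(self, other: Any) -> str: return self._expr("le", other)
    def eq(self, other: Any) -> str: return self._expr("eq", other)
    def ne(self, other: Any) -> str: return self._expr("ne", other)


age = DemoColumn("age")
print(age > 30)  # age.gt(30)
print(30 < age)  # age.gt(30), via the reflected operator
print(age == 5)  # age.eq(5)
```

When the left operand's method returns NotImplemented (as int.__lt__ does for an unknown type), Python tries the reflected operator. As a result, 30 < col resolves to col.__gt__(30) without extra work.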
3. Verification
After fixing the filter translation (item 1):
```bash
# Should pass filter and join parity
SPARKLESS_TEST_BACKEND=robin python -m pytest \
  tests/parity/dataframe/test_filter.py::TestFilterParity::test_filter_operations \
  tests/parity/dataframe/test_join.py::TestJoinParity::test_inner_join \
  -v --no-cov
```
4. Test Run Commands
```bash
# Full Robin run
SPARKLESS_TEST_BACKEND=robin SPARKLESS_BACKEND=robin python -m pytest \
  tests/ --ignore=tests/archive -n 12 --dist loadfile -v --tb=short \
  2>&1 | tee test_results_robin_mode.txt

# Sample with long tracebacks
SPARKLESS_TEST_BACKEND=robin python scripts/run_robin_failure_sample.py --max 50
```
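To track progress between runs, the failures in test_results_robin_mode.txt can be bucketed by the categories in section 1. The following is a rough sketch. It assumes pytest's short test summary lines of the form "FAILED <nodeid> - <message>", which pytest prints for failures by default. The keyword patterns are assumptions about how each category shows up in the messages. Category C has no reliable keyword, so its failures are counted under whichever bucket their message matches.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# (bucket label, pattern) pairs, checked in order. The patterns are
# assumptions about how each category from section 1 appears in messages.
CATEGORY_PATTERNS = [
    ("A: unsupported operation", re.compile(r"SparkUnsupportedOperationError")),
    ("B: row count mismatch", re.compile(r"row count mismatch", re.IGNORECASE)),
    ("D: assertion / wrong result", re.compile(r"AssertionError")),
]


def categorize_failures(lines: Iterable[str]) -> Dict[str, int]:
    """Count pytest summary failures per category bucket."""
    counts: Counter = Counter()
    for line in lines:
        # Short test summary lines look like "FAILED <nodeid> - <message>".
        if not line.startswith("FAILED "):
            continue
        for label, pattern in CATEGORY_PATTERNS:
            if pattern.search(line):
                counts[label] += 1
                break
        else:
            counts["uncategorized"] += 1
    return dict(counts)
```

Passing an open file handle for test_results_robin_mode.txt returns a dict of counts per bucket. The per-worker progress lines that pytest-xdist prints in verbose mode begin with a [gwN] prefix rather than FAILED, so they are not double-counted.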