Issue Templates for robin-sparkless (Upstream)

The following text can be copied into the robin-sparkless GitHub repo when opening issues.

Issues created from robin_sparkless_needs.md (2026-02-08)

| # | Title | Link |
| --- | --- | --- |
| 182 | select()/with_column() resolve Column expressions by name instead of evaluating them | https://github.com/eddiethedean/robin-sparkless/issues/182 |
| 184 | [Enhancement] Filter: support Column–Column comparisons (col_a > col_b) | https://github.com/eddiethedean/robin-sparkless/issues/184 |
| 185 | [Enhancement] filter(condition): document Column-only or accept literal bool | https://github.com/eddiethedean/robin-sparkless/issues/185 |
| 186 | [Enhancement] lit(): extend to date/datetime types for PySpark parity | https://github.com/eddiethedean/robin-sparkless/issues/186 |
| 187 | [Enhancement] Window API for row_number, rank, sum over window, lag, lead | https://github.com/eddiethedean/robin-sparkless/issues/187 |

Created via gh issue create -R eddiethedean/robin-sparkless with body files in docs/robin_issue_*.md.

Issues created 2026-02-06 (from ownership analysis)

| # | Title | Link |
| --- | --- | --- |
| 174 | [Enhancement] Add Python operator overloads to Column for PySpark compatibility | https://github.com/eddiethedean/robin-sparkless/issues/174 |
| 175 | [Enhancement] Join on= parameter: accept string for single column (PySpark compatibility) | https://github.com/eddiethedean/robin-sparkless/issues/175 |

Created via python scripts/create_robin_github_issues_2026_02.py

Sparkless parity issues created (earlier)

  • #1–#17: Created from initial subset (join, filter, select, transformations); see scripts/create_robin_github_issues.py.

  • 104 additional issues: Created from a broad parity run. The same tests were run in Robin mode (tests/robin_parity_broad_results.txt) and in PySpark mode (tests/pyspark_parity_failed_results.txt); issues were opened only for tests that fail with Robin and pass with PySpark, excluding the 17 above. Script: scripts/create_robin_github_issues_from_results.py (supports --dry-run to preview).

  • Second batch (19 issues): From tests/parity/sql/ and tests/parity/internal/. Robin run saved to tests/robin_parity_sql_internal_results.txt (23 failed, 32 passed). Those 23 run in PySpark → 19 passed, 4 skipped. Issues created for the 19 parity gaps. Command to reproduce results:

    SPARKLESS_TEST_BACKEND=robin SPARKLESS_BACKEND=robin python -m pytest tests/parity/sql/ tests/parity/internal/ -v --tb=line -q 2>&1 | tee tests/robin_parity_sql_internal_results.txt
    

    Then run failed IDs in PySpark and create issues:

    python scripts/create_robin_github_issues_from_results.py \
      --robin-results tests/robin_parity_sql_internal_results.txt \
      --pyspark-results tests/pyspark_parity_sql_internal_results.txt \
      --no-already-filed
    

    Use --dry-run to preview before creating issues.
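The selection rule described above (fails with Robin, passes with PySpark, not already filed) can be sketched as a small set comparison. This is an illustration only, not the logic of scripts/create_robin_github_issues_from_results.py: the function names parse_results and parity_gaps are hypothetical, and the line shape being parsed is an assumption based on typical pytest verbose output.

```python
import re

# Assumed line shape (pytest verbose output), e.g.
# "tests/parity/sql/test_x.py::test_y FAILED   [ 10%]"
RESULT_LINE = re.compile(r"^(\S+::\S+)\s+(PASSED|FAILED|SKIPPED|ERROR)\b")


def parse_results(text):
    """Map test ID -> outcome for every line matching the assumed shape."""
    outcomes = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            outcomes[match.group(1)] = match.group(2)
    return outcomes


def parity_gaps(robin_text, pyspark_text, already_filed=frozenset()):
    """Test IDs that fail under Robin, pass under PySpark, and are not yet filed."""
    robin = parse_results(robin_text)
    pyspark = parse_results(pyspark_text)
    return sorted(
        test_id
        for test_id, outcome in robin.items()
        if outcome == "FAILED"
        and pyspark.get(test_id) == "PASSED"
        and test_id not in already_filed
    )
```

Tests that are skipped in PySpark are excluded by this rule, which matches the counts above: 23 Robin failures, 19 PySpark passes, 19 issues.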


Sparkless integration note (no upstream feature request needed)

Finding: Robin-sparkless already provides what Sparkless needs:

  • Arbitrary schema: Use create_dataframe_from_rows(data, schema) where data is a list of dicts or lists and schema is a list of (column_name, dtype_str) (e.g. [("id", "bigint"), ("name", "string")]). The 3-column restriction applies only to create_dataframe().

  • Operations: The DataFrame API already has filter, select, with_column, order_by, order_by_exprs, group_by, limit, union, union_by_name, join, and GroupedData (count, sum, avg, min, max, agg, etc.).

The gap is in Sparkless: our Robin materializer currently uses only create_dataframe (3-column) and supports only filter/select/limit. We will extend it to use create_dataframe_from_rows and to translate more operations to the existing robin-sparkless API. No upstream feature issues are required for “flexible schema” or “more operations.”
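As a rough sketch of the argument shapes described above, the following helper prepares data and schema in the form create_dataframe_from_rows(data, schema) is said to accept. The helper normalize_rows and the sample columns are hypothetical, and the snippet deliberately does not call robin_sparkless, since how create_dataframe_from_rows is reached (module-level function or method on a session object) is not stated here.

```python
def normalize_rows(data, schema):
    """Return rows as lists ordered by schema; accepts dict rows or list rows."""
    names = [name for name, _dtype in schema]
    rows = []
    for row in data:
        if isinstance(row, dict):
            # Missing keys become None so every row has the same width.
            rows.append([row.get(name) for name in names])
        else:
            if len(row) != len(names):
                raise ValueError(
                    f"row has {len(row)} values, schema has {len(names)} columns"
                )
            rows.append(list(row))
    return rows


# Schema as a list of (column_name, dtype_str); four columns, which
# create_dataframe() (3-column) could not express.
schema = [("id", "bigint"), ("name", "string"), ("city", "string"), ("age", "bigint")]

# Mixed input: one dict row and one list row.
data = [
    {"id": 1, "name": "a", "city": "Oslo", "age": 30},
    [2, "b", "Rome", 41],
]

rows = normalize_rows(data, schema)
# rows and schema are what would then be handed to create_dataframe_from_rows.
```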


Bug report template

Title: [Bug] Short description of the bug

Body:

Description
[One or two sentences describing the incorrect behavior.]

To reproduce
[Minimal code or steps, e.g. Sparkless snippet that calls robin_sparkless and triggers the bug.]

# Example:
import robin_sparkless
# ...

Expected behavior
[What you expect to happen.]

Actual behavior
[What actually happens (error message, wrong result, etc.).]

Environment

  • Python version:

  • robin-sparkless version:

  • OS:

Additional context
[Optional: stack trace, logs, or links to Sparkless integration code.]