# Issue Templates for robin-sparkless (Upstream)

The following text can be copied into the [robin-sparkless](https://github.com/eddiethedean/robin-sparkless) GitHub repo when opening issues.

## Issues created from robin_sparkless_needs.md (2026-02-08)

| # | Title | Link |
|---|-------|------|
| 182 | select()/with_column() resolve Column expressions by name instead of evaluating them | https://github.com/eddiethedean/robin-sparkless/issues/182 |
| 184 | [Enhancement] Filter: support Column–Column comparisons (col_a > col_b) | https://github.com/eddiethedean/robin-sparkless/issues/184 |
| 185 | [Enhancement] filter(condition): document Column-only or accept literal bool | https://github.com/eddiethedean/robin-sparkless/issues/185 |
| 186 | [Enhancement] lit(): extend to date/datetime types for PySpark parity | https://github.com/eddiethedean/robin-sparkless/issues/186 |
| 187 | [Enhancement] Window API for row_number, rank, sum over window, lag, lead | https://github.com/eddiethedean/robin-sparkless/issues/187 |

Created via `gh issue create -R eddiethedean/robin-sparkless` with body files in `docs/robin_issue_*.md`.

## Issues created 2026-02-06 (from ownership analysis)

| # | Title | Link |
|---|-------|------|
| 174 | [Enhancement] Add Python operator overloads to Column for PySpark compatibility | https://github.com/eddiethedean/robin-sparkless/issues/174 |
| 175 | [Enhancement] Join on= parameter: accept string for single column (PySpark compatibility) | https://github.com/eddiethedean/robin-sparkless/issues/175 |

Created via `python scripts/create_robin_github_issues_2026_02.py`

## Sparkless parity issues created (earlier)

- **#1–#17:** Created from initial subset (join, filter, select, transformations); see `scripts/create_robin_github_issues.py`.
- **104 additional issues:** Created from broad parity run: same tests run in Robin mode (`tests/robin_parity_broad_results.txt`) and PySpark mode (`tests/pyspark_parity_failed_results.txt`); issues opened only for tests that **fail with Robin** and **pass with PySpark**, excluding the 17 above. Script: `scripts/create_robin_github_issues_from_results.py` (uses `--dry-run` to preview).
- **Second batch (19 issues):** From `tests/parity/sql/` and `tests/parity/internal/`. Robin run saved to `tests/robin_parity_sql_internal_results.txt` (23 failed, 32 passed). Those 23 run in PySpark → 19 passed, 4 skipped. Issues created for the 19 parity gaps.

Command to reproduce results:

```bash
SPARKLESS_TEST_BACKEND=robin SPARKLESS_BACKEND=robin python -m pytest tests/parity/sql/ tests/parity/internal/ -v --tb=line -q 2>&1 | tee tests/robin_parity_sql_internal_results.txt
```

Then run failed IDs in PySpark and create issues:

```bash
python scripts/create_robin_github_issues_from_results.py \
  --robin-results tests/robin_parity_sql_internal_results.txt \
  --pyspark-results tests/pyspark_parity_sql_internal_results.txt \
  --no-already-filed
```

Use `--dry-run` to preview before creating issues.

---

## Sparkless integration note (no upstream feature request needed)

**Finding:** Robin-sparkless already provides what Sparkless needs:

- **Arbitrary schema:** Use `create_dataframe_from_rows(data, schema)` where `data` is a list of dicts or lists and `schema` is a list of `(column_name, dtype_str)` (e.g. `[("id", "bigint"), ("name", "string")]`). The 3-column restriction applies only to `create_dataframe()`.
- **Operations:** The DataFrame API already has `filter`, `select`, `with_column`, `order_by`, `order_by_exprs`, `group_by`, `limit`, `union`, `union_by_name`, `join`, and `GroupedData` (count, sum, avg, min, max, agg, etc.).

The gap is in **Sparkless**: our Robin materializer currently uses only `create_dataframe` (3-column) and supports only filter/select/limit.
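
The input shapes that `create_dataframe_from_rows(data, schema)` is described as accepting can be illustrated with a small pure-Python sketch. The helper `rows_in_schema_order` is hypothetical (it is not part of robin-sparkless or Sparkless); it only normalizes dict or list rows into lists ordered by the schema columns. The actual robin-sparkless call is left as a comment because this note does not say where the function is exposed.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def rows_in_schema_order(
    data: Sequence[Any], schema: Sequence[tuple[str, str]]
) -> list[list[Any]]:
    """Hypothetical helper: return every row as a list ordered by the schema columns.

    `schema` is a list of (column_name, dtype_str) pairs, as described above.
    Each row in `data` may be a dict keyed by column name or a positional list.
    """
    names = [name for name, _dtype in schema]
    rows: list[list[Any]] = []
    for i, row in enumerate(data):
        if isinstance(row, Mapping):
            missing = [n for n in names if n not in row]
            if missing:
                raise ValueError(f"row {i} is missing columns: {missing}")
            rows.append([row[n] for n in names])
        else:
            if len(row) != len(names):
                raise ValueError(
                    f"row {i} has {len(row)} values, expected {len(names)}"
                )
            rows.append(list(row))
    return rows


# Four columns, i.e. more than create_dataframe()'s 3-column limit.
schema = [("id", "bigint"), ("name", "string"), ("city", "string"), ("age", "bigint")]
data = [
    {"id": 1, "name": "a", "city": "x", "age": 30},  # dict row
    [2, "b", "y", 41],                               # positional row
]
rows = rows_in_schema_order(data, schema)
print(rows)

# Assumption: the normalized rows and the schema would then be passed on as
# create_dataframe_from_rows(rows, schema); the call site is not shown here.
```

Normalizing to positional lists first keeps the materializer independent of whether callers supply dicts or lists; whether robin-sparkless needs this step at all is an open assumption.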
We will extend the Robin materializer to use `create_dataframe_from_rows` and to translate more operations to the existing robin-sparkless API. No upstream feature issues are required for “flexible schema” or “more operations.”

---

## Bug report template

**Title:** [Bug] Short description of the bug

**Body:**

**Description**
[One or two sentences describing the incorrect behavior.]

**To reproduce**
[Minimal code or steps, e.g. Sparkless snippet that calls robin_sparkless and triggers the bug.]

```python
# Example:
import robin_sparkless
# ...
```

**Expected behavior**
[What you expect to happen.]

**Actual behavior**
[What actually happens (error message, wrong result, etc.).]

**Environment**
- Python version:
- robin-sparkless version:
- OS:

**Additional context**
[Optional: stack trace, logs, or links to Sparkless integration code.]
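
The template above can also be filled in programmatically when several reports are filed at once. The following is a minimal sketch, not an existing script in this repo: the `BugReport` class and the `render_title` / `render_body` functions are hypothetical names, and the field values shown are the template's own placeholders.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    """Hypothetical container for the fields of the bug report template."""

    title: str
    description: str
    repro_code: str
    expected: str
    actual: str
    python_version: str = ""
    robin_version: str = ""
    os_name: str = ""
    context: str = ""


def render_title(report: BugReport) -> str:
    """Prefix the title the way the template does."""
    return f"[Bug] {report.title}"


def render_body(report: BugReport) -> str:
    """Render the Markdown body in the same section order as the template."""
    fence = "`" * 3  # built at runtime so the fence does not close this example
    parts = [
        "**Description**",
        report.description,
        "",
        "**To reproduce**",
        f"{fence}python",
        report.repro_code.strip(),
        fence,
        "",
        "**Expected behavior**",
        report.expected,
        "",
        "**Actual behavior**",
        report.actual,
        "",
        "**Environment**",
        f"- Python version: {report.python_version}",
        f"- robin-sparkless version: {report.robin_version}",
        f"- OS: {report.os_name}",
    ]
    if report.context:  # the template marks this section as optional
        parts += ["", "**Additional context**", report.context]
    return "\n".join(parts) + "\n"


report = BugReport(
    title="Short description of the bug",
    description="One or two sentences describing the incorrect behavior.",
    repro_code="import robin_sparkless\n# ...",
    expected="What you expect to happen.",
    actual="What actually happens.",
)
title = render_title(report)
body = render_body(report)
print(title)
print(body)
```

The rendered body can be written to a file and passed to `gh issue create -R eddiethedean/robin-sparkless` using its `--title` and `--body-file` options, matching the body-file workflow mentioned earlier in this document.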