# Troubleshooting

## Native dependency crashes (Polars / Arrow)

Sparkless uses Polars (and optionally Arrow) for execution. In some environments the process may crash during tests or when running scripts, for example with a segmentation fault (exit code 139).

### What to try

1. **Isolate the failure.** Run a minimal script that only creates a session and performs one DataFrame operation. If that crashes, the issue most likely lies in the native toolchain (Polars/Arrow/Rust) or the Python version.
2. **Pin Polars / PyArrow.** If you recently upgraded Polars or PyArrow, try pinning to a known-good version. See `pyproject.toml` or `docs/requirements.txt` for the versions used in CI.
3. **Use a supported Python version.** Sparkless supports Python 3.9–3.12. Crashes on older or newer interpreters may be caused by ABI or build mismatches.
4. **Run without coverage.** Coverage instrumentation can trigger different code paths. Run the tests with `--no-cov` to see whether the crash disappears.
5. **Avoid Hypothesis's in-process database under load.** In some environments, Hypothesis’s in-memory database can contribute to instability during collection. Running a subset of tests (for example, by path or marker) can help narrow this down.

### Pure-Python fallbacks

Some operations have **pure-Python fallbacks** that are used when the Polars path fails or is unavailable:

- **Percentile / approximate percentile** – If the Polars implementation is unavailable or raises an error, the engine may fall back to a Python implementation. Results should still be correct, but performance may be lower.
- **Covariance / correlation** – Likewise, covariance and correlation can use Python fallbacks when the native path is not used.

If you rely on these operations and see crashes, make sure you are on a supported Polars version and a supported Python version. The fallbacks are intended to improve robustness, not to work around broken native builds.

## Session or catalog errors

- **"No active SparkSession"** – Make sure you create a session (e.g. `SparkSession("MyApp")` or `SparkSession.builder.appName("MyApp").getOrCreate()`) before calling `F.col`, `F.lit`, or other session-dependent functions.
- **Catalog / database not found** – Use `spark.catalog.setCurrentDatabase("your_db")` (or the builder/config equivalent) so that session-aware helpers and schema tracking see the correct database. See [Configuration](configuration.md) and the “Session-aware literals and schema tracking” section in the getting started guide.

## Tests failing only with PySpark or only with mock

- Run with an explicit backend (`MOCK_SPARK_TEST_BACKEND=mock` or `MOCK_SPARK_TEST_BACKEND=pyspark`) so that the behavior is deterministic.
- Some tests are skipped when PySpark or Delta is not installed. See the test docstrings and `tests/conftest.py` for markers and skip conditions.

## Getting help

- **Known issues**: [Known issues](../known_issues.md) documents limitations (e.g. Delta schema evolution).
- **GitHub issues**: [github.com/eddiethedean/sparkless/issues](https://github.com/eddiethedean/sparkless/issues) for bugs and feature requests.
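For step 1 under *What to try*, it can help to run the minimal repro in a child interpreter. That way a segfault in Polars/Arrow cannot take down the calling process, and its exit status can be reported. The sketch below uses only the standard library and makes several assumptions. The `run_isolated` helper is hypothetical. The `sparkless.sql` import path and the `createDataFrame` / `filter` / `count` calls in the repro string are assumptions modelled on PySpark's API, so adjust them to your installed version.

The sketch also shows where exit code 139 comes from. On Linux and macOS, a shell reports a process killed by signal N as 128 + N, and SIGSEGV is signal 11.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical minimal repro: one session, one DataFrame operation.
# The sparkless import path and DataFrame calls are assumptions modelled on
# PySpark's API; adjust them to match your installed Sparkless version.
MINIMAL_REPRO = textwrap.dedent(
    """
    from sparkless.sql import SparkSession, functions as F

    spark = SparkSession("MyApp")
    df = spark.createDataFrame([{"x": 1}, {"x": 2}])
    print(df.filter(F.col("x") > 1).count())
    """
)


def run_isolated(code: str, timeout: float = 60.0) -> str:
    """Run `code` in a fresh interpreter and describe how it exited.

    A native crash (e.g. a segfault inside Polars/Arrow) only kills the
    child process, so the caller survives and can report what happened.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    rc = proc.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # POSIX: a negative return code means "killed by signal -rc".
        # A shell reports the same event as 128 + signum, e.g. 139 for SIGSEGV.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return f"crashed: {name} (shell exit code {128 - rc})"
    # Ordinary failure: a Python exception or sys.exit() with a non-zero code.
    return f"failed: exit code {rc}\n{proc.stderr.strip()}"


if __name__ == "__main__":
    print(run_isolated(MINIMAL_REPRO))
```

If this reports `crashed: SIGSEGV`, rerun the same script after pinning Polars/PyArrow or switching Python versions. If it reports `ok`, the crash is more likely tied to the test setup (coverage, Hypothesis, or test selection).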
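The behavior described under *Pure-Python fallbacks* follows a common try-native-then-fallback pattern. The sketch below is a hypothetical illustration of that pattern, not Sparkless's internal code. The names `with_fallback`, `percentile_py`, `covariance_py` and `correlation_py` are illustrative. The percentile uses linear interpolation between closest ranks, and the covariance is the sample (n - 1) estimate.

`try`/`except` only catches Python exceptions. A segmentation fault in a native extension ends the process before any fallback can run, which is why fallbacks cannot compensate for broken native builds.

```python
import math
from typing import Callable, Optional, Sequence


def percentile_py(values: Sequence[Optional[float]], q: float) -> Optional[float]:
    """Percentile with linear interpolation between closest ranks (illustrative)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    data = sorted(v for v in values if v is not None)  # ignore nulls
    if not data:
        return None
    pos = q * (len(data) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def covariance_py(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> Optional[float]:
    """Sample covariance over pairs where both values are non-null."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / (n - 1)


def correlation_py(xs, ys) -> Optional[float]:
    """Pearson correlation; None if either side has zero variance."""
    cov = covariance_py(xs, ys)
    vx, vy = covariance_py(xs, xs), covariance_py(ys, ys)
    if cov is None or not vx or not vy:
        return None
    return cov / math.sqrt(vx * vy)


def with_fallback(native: Callable, fallback: Callable) -> Callable:
    """Call `native`; if it raises a Python exception, call `fallback` instead.

    A segfault inside `native` is not an exception and cannot be caught here.
    """
    def run(*args, **kwargs):
        try:
            return native(*args, **kwargs)
        except Exception:
            return fallback(*args, **kwargs)
    return run


def native_percentile(values, q):
    # Hypothetical stand-in for a native (e.g. Polars-backed) implementation
    # that is unavailable in this environment.
    raise NotImplementedError("native percentile not available")


percentile = with_fallback(native_percentile, percentile_py)
```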
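The advice in *Run without coverage*, *Avoid Hypothesis in-process DB under load* and *Tests failing only with PySpark or only with mock* can be combined into one reproducible pytest invocation. This is a minimal sketch with the following assumptions:

- `build_pytest_command` is a hypothetical helper.
- `--no-cov` requires the pytest-cov plugin.
- `-m` takes a pytest marker expression.
- The test suite reads `MOCK_SPARK_TEST_BACKEND` from the environment as described above.

```python
import os
import sys
from typing import Dict, List, Optional, Sequence, Tuple

VALID_BACKENDS = ("mock", "pyspark")


def build_pytest_command(
    backend: str,
    paths: Sequence[str] = ("tests",),
    marker: Optional[str] = None,
    no_cov: bool = True,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a pytest run pinned to one backend (hypothetical helper)."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
    argv = [sys.executable, "-m", "pytest"]
    if no_cov:
        argv.append("--no-cov")  # pytest-cov flag: disable coverage instrumentation
    if marker:
        argv += ["-m", marker]  # run only tests matching this marker expression
    argv += list(paths)  # narrow the run to specific files or directories
    # Copy the current environment and pin the backend explicitly.
    env = {**os.environ, "MOCK_SPARK_TEST_BACKEND": backend}
    return argv, env
```

To run it, pass the result to `subprocess.run(argv, env=env)`. Keeping the backend in the environment rather than in shell state makes the run repeatable. Narrowing `paths` or `marker` step by step is one way to find the subset of tests that triggers the crash.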