Troubleshooting

Native dependency crashes (Polars / Arrow)

Sparkless uses Polars (and optionally Arrow) for execution. On some environments you may see process crashes (e.g. segmentation fault, exit code 139) during tests or when running scripts.

What to try

  1. Isolate the failure
    Run a minimal script that only creates a session and one DataFrame operation. If that crashes, the issue is likely in the native toolchain (Polars/Arrow/Rust) or the Python version.

  2. Pin Polars / PyArrow
    If you recently upgraded Polars or PyArrow, try pinning to a known-good version (see pyproject.toml or docs/requirements.txt for versions used in CI).

  3. Use a supported Python version
    Sparkless supports Python 3.9–3.12. Crashes on older or newer interpreters may be due to ABI or build mismatches.

  4. Run without coverage
    Coverage instrumentation can trigger different code paths. Run tests with --no-cov to see if the crash disappears.

  5. Avoid Hypothesis in-process DB under load
    In some environments, Hypothesis’s in-memory database during collection can contribute to instability. Running a subset of tests (e.g. by path or marker) can help narrow this down.

Pure-Python fallbacks

Some operations have pure-Python fallbacks when the Polars path fails or is unavailable:

  • Percentile / approximate percentile – If the Polars implementation is not available or raises, the engine may fall back to a Python implementation. Results should still be correct; performance may be lower.

  • Covariance / correlation – Similarly, covariance and correlation can use Python fallbacks when the native path is not used.

If you rely on these and see crashes, ensure you are on a supported Polars version and Python version; the fallbacks are intended to improve robustness rather than to work around broken native builds.

Session or catalog errors

  • β€œNo active SparkSession” – Ensure you create a session (e.g. SparkSession("MyApp") or SparkSession.builder.appName("MyApp").getOrCreate()) before calling F.col, F.lit, or other session-dependent functions.

  • Catalog / database not found – Use spark.catalog.setCurrentDatabase("your_db") (or the builder/config equivalent) so that session-aware helpers and schema tracking see the correct database. See Configuration and the β€œSession-aware literals and schema tracking” section in the getting started guide.

Tests failing only with PySpark or only with mock

  • Run with an explicit backend: MOCK_SPARK_TEST_BACKEND=mock or MOCK_SPARK_TEST_BACKEND=pyspark so behavior is deterministic.

  • Some tests are skipped when PySpark or Delta is not installed; see test docstrings and tests/conftest.py for markers and skip conditions.

Getting help