Troubleshooting
Native dependency crashes (Polars / Arrow)
Sparkless uses Polars (and optionally Arrow) for execution. In some environments you may see process crashes (e.g. segmentation fault, exit code 139) during tests or when running scripts.
What to try
Isolate the failure: Run a minimal script that only creates a session and performs one DataFrame operation. If that crashes, the issue is likely in the native toolchain (Polars/Arrow/Rust) or the Python version.
Pin Polars / PyArrow: If you recently upgraded Polars or PyArrow, try pinning to a known-good version (see pyproject.toml or docs/requirements.txt for versions used in CI).
Use a supported Python version: Sparkless supports Python 3.9 to 3.12. Crashes on older or newer interpreters may be due to ABI or build mismatches.
Run without coverage: Coverage instrumentation can trigger different code paths. Run tests with --no-cov to see if the crash disappears.
Avoid Hypothesis in-process DB under load: In some environments, Hypothesis's in-memory database during collection can contribute to instability. Running a subset of tests (e.g. by path or marker) can help narrow this down.
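The first three checks above can be combined into one small helper. The following is a minimal sketch, not part of Sparkless: it reports the interpreter and the installed Polars/PyArrow versions, then runs a snippet in a fresh interpreter so that a native crash kills only the child process. The helper names (python_supported, installed_version, describe_exit, run_isolated) are hypothetical, and the import path sparkless.sql together with the createDataFrame and count calls in MINIMAL_SCRIPT are assumptions based on the PySpark-style API; adjust them to match your installation.

```python
import signal
import subprocess
import sys
from importlib import metadata

# Supported range quoted in this guide: Python 3.9 to 3.12.
SUPPORTED_PYTHON = ((3, 9), (3, 12))

# Assumption: the import path and DataFrame calls mirror PySpark's API.
MINIMAL_SCRIPT = """
from sparkless.sql import SparkSession
spark = SparkSession("CrashCheck")
print(spark.createDataFrame([{"id": 1}, {"id": 2}]).count())
"""


def python_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_PYTHON[0] <= major_minor <= SUPPORTED_PYTHON[1]


def installed_version(dist):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # POSIX: subprocess reports death by signal N as -N, while a shell
        # reports 128 + N (so SIGSEGV, signal 11, shows up as 139).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name} (shell exit code {128 - returncode})"
    return f"exited with code {returncode}"


def run_isolated(code, timeout=120):
    """Run code in a fresh interpreter; return (verdict, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return describe_exit(proc.returncode), proc.stderr


if __name__ == "__main__":
    print("python", sys.version.split()[0], "supported:", python_supported())
    for dist in ("polars", "pyarrow"):
        print(dist, installed_version(dist) or "not installed")
    verdict, stderr = run_isolated(MINIMAL_SCRIPT)
    print("minimal script:", verdict)
    if stderr:
        print(stderr.strip())
```

If the verdict for this snippet is a signal such as SIGSEGV, the problem most likely lies in the native toolchain rather than in your tests. An ordinary non-zero exit code usually means a Python-level exception, whose traceback is printed from stderr.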
Pure-Python fallbacks
Some operations have pure-Python fallbacks when the Polars path fails or is unavailable:
Percentile / approximate percentile: If the Polars implementation is not available or raises, the engine may fall back to a Python implementation. Results should still be correct; performance may be lower.
Covariance / correlation: Similarly, covariance and correlation can use Python fallbacks when the native path is not used.
If you rely on these and see crashes, ensure you are on a supported Polars version and Python version; the fallbacks are intended to improve robustness rather than to work around broken native builds.
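The fallback behavior described in this section can be pictured as a try/except wrapper around the native call. This is an illustrative sketch only, not Sparkless's actual implementation: with_fallback, python_covariance, and python_correlation are hypothetical names, and the sample (n - 1) covariance and Pearson correlation are assumed definitions. It also shows why fallbacks cannot rescue a broken native build: a Python exception can be caught, but a segmentation fault terminates the process before any except clause runs.

```python
import math


def python_covariance(xs, ys):
    """Sample covariance (divides by n - 1), computed in pure Python."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def python_correlation(xs, ys):
    """Pearson correlation built on the covariance above."""
    std_x = math.sqrt(python_covariance(xs, xs))
    std_y = math.sqrt(python_covariance(ys, ys))
    if std_x == 0 or std_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return python_covariance(xs, ys) / (std_x * std_y)


def with_fallback(native, fallback):
    """Call native first; on any Python exception, use the fallback instead."""
    def wrapper(*args):
        try:
            return native(*args)
        except Exception:
            # Only Python-level errors reach this branch. A native crash
            # (e.g. SIGSEGV) ends the process and is never caught here.
            return fallback(*args)
    return wrapper


def broken_native(xs, ys):
    """Hypothetical stand-in for a native path that is unavailable."""
    raise RuntimeError("native path unavailable")


covariance = with_fallback(broken_native, python_covariance)
print(covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```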
Session or catalog errors
"No active SparkSession": Ensure you create a session (e.g. SparkSession("MyApp") or SparkSession.builder.appName("MyApp").getOrCreate()) before calling F.col, F.lit, or other session-dependent functions.
Catalog / database not found: Use spark.catalog.setCurrentDatabase("your_db") (or the builder/config equivalent) so that session-aware helpers and schema tracking see the correct database. See Configuration and the "Session-aware literals and schema tracking" section in the getting started guide.
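The ordering requirement for the "No active SparkSession" error can be sketched as follows. SparkSession("MyApp"), F.col, and F.lit come from the text above; the import path sparkless.sql and the PySpark-style createDataFrame, filter, and collect calls are assumptions, so check them against your installed version. The import is deferred into the function (a hypothetical helper named adults_example) so that the module still loads where Sparkless is not installed.

```python
def adults_example():
    """Create the session first, then build session-dependent expressions."""
    # Assumption: this import path matches your Sparkless installation.
    from sparkless.sql import SparkSession, functions as F

    # 1. Create the session before any F.col / F.lit call.
    spark = SparkSession("MyApp")

    # 2. Only now build column expressions and use them.
    df = spark.createDataFrame(
        [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 17}]
    )
    adults = df.filter(F.col("age") >= F.lit(18))
    return [row["name"] for row in adults.collect()]
```

Building F.col("age") at module import time, before any session exists, is the kind of pattern that may trigger the error; moving such expressions inside a function or fixture that runs after session creation is one way to avoid it.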
Tests failing only with PySpark or only with mock
Run with an explicit backend, MOCK_SPARK_TEST_BACKEND=mock or MOCK_SPARK_TEST_BACKEND=pyspark, so behavior is deterministic.
Some tests are skipped when PySpark or Delta is not installed; see test docstrings and tests/conftest.py for markers and skip conditions.
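One way to make the backend explicit is to set the variable for a single pytest run from a small wrapper. This is a sketch under stated assumptions: the variable name and its two values come from the text above, while backend_env, pytest_command, and run_tests are hypothetical helper names, and the sketch assumes pytest is installed in the same interpreter.

```python
import os
import subprocess
import sys

BACKEND_VAR = "MOCK_SPARK_TEST_BACKEND"  # variable name from this guide
VALID_BACKENDS = ("mock", "pyspark")


def backend_env(backend, base=None):
    """Return a copy of the environment with the backend set explicitly."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
    env = dict(os.environ if base is None else base)
    env[BACKEND_VAR] = backend
    return env


def pytest_command(*pytest_args):
    """Build a pytest invocation that uses the current interpreter."""
    return [sys.executable, "-m", "pytest", *pytest_args]


def run_tests(backend, *pytest_args):
    """Run pytest once with the chosen backend; return its exit code."""
    result = subprocess.run(pytest_command(*pytest_args), env=backend_env(backend))
    return result.returncode


# Example (not executed here): run_tests("mock", "tests", "--no-cov")
```

Running the same test path once per backend and comparing the two exit codes shows quickly whether a failure is specific to one backend.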
Getting help
Known issues: The Known issues page documents limitations (e.g. Delta schema evolution).
GitHub issues: github.com/eddiethedean/sparkless/issues for bugs and feature requests.