Data Types
Sparkless provides all PySpark-compatible data types.
Base Types
Mock data types and schema system for Sparkless.
This module provides comprehensive mock implementations of PySpark data types and schema structures that behave identically to the real PySpark types. It includes primitive types, complex types, schema definitions, and Row objects for complete type system compatibility.
Key Features:
- Complete PySpark data type hierarchy
- Primitive types (String, Integer, Long, Double, Boolean)
- Complex types (Array, Map, Struct)
- Schema definition with StructType and StructField
- Row objects with PySpark-compatible interface
- Type inference and conversion utilities
Example
>>> from sparkless.spark_types import StringType, IntegerType, StructType, StructField
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("age", IntegerType())
... ])
>>> df = spark.createDataFrame(data, schema)
- class sparkless.spark_types.DataType(nullable=True)
Bases: object
Base class for mock data types.
Provides the foundation for all data types in the Sparkless type system. Supports nullable/non-nullable semantics and PySpark-compatible type names. Inherits from PySpark DataType when available for compatibility.
- nullable
Whether the data type allows null values.
Example
>>> StringType()
StringType(nullable=True)
>>> IntegerType(nullable=False)
IntegerType(nullable=False)
- Parameters:
nullable (bool)
- simpleString()
Get PySpark-compatible simple string representation of the data type.
- Return type:
str
- Returns:
Simple string representation (e.g., “string”, “int”, “array<string>”).
Note
Fixed in version 3.23.0 (Issue #231): All DataType classes now implement simpleString() with PySpark-compatible string representations.
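The following is a minimal sketch of how a recursive simpleString() can produce strings such as “array<string>”. The classes SketchType and SketchArray are hypothetical stand-ins written for this illustration; they are not part of Sparkless and do not reflect its internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchType:
    """Hypothetical stand-in for a primitive type (not a Sparkless class)."""

    name: str

    def simpleString(self) -> str:
        # A primitive type renders as its short name.
        return self.name


@dataclass(frozen=True)
class SketchArray:
    """Hypothetical stand-in for an array type wrapping an element type."""

    element: object

    def simpleString(self) -> str:
        # A container type renders its element type recursively.
        return f"array<{self.element.simpleString()}>"


string_t = SketchType("string")
int_t = SketchType("int")
nested = SketchArray(SketchArray(string_t))

print(string_t.simpleString())            # string
print(SketchArray(int_t).simpleString())  # array<int>
print(nested.simpleString())              # array<array<string>>
```

Delegating to the element type keeps nested types consistent without special cases for each level of nesting.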
- class sparkless.spark_types.StringType(nullable=True)
Bases: DataType
Mock StringType.
Inherits from DataType, which inherits from PySpark DataType when available. This avoids the singleton issue while maintaining compatibility.
Initialize StringType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.IntegerType(nullable=True)
Bases: DataType
Mock IntegerType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize IntegerType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.LongType(nullable=True)
Bases: DataType
Mock LongType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize LongType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.DoubleType(nullable=True)
Bases: DataType
Mock DoubleType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize DoubleType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.BooleanType(nullable=True)
Bases: DataType
Mock BooleanType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize BooleanType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.DateType(nullable=True)
Bases: DataType
Mock DateType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize DateType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.TimestampType(nullable=True)
Bases: DataType
Mock TimestampType.
Inherits from DataType, which inherits from PySpark DataType when available.
Initialize TimestampType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.DecimalType(precision=10, scale=0, nullable=True)
Bases: DataType
Mock decimal type.
Initialize DecimalType.
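The entry above does not describe precision and scale. In PySpark's DecimalType, precision is the total number of digits and scale is the number of digits to the right of the decimal point. The sketch below illustrates that meaning with Python's standard decimal module. The helper fits_decimal is hypothetical and is not a Sparkless function, and rounding half up to the target scale is an assumption made for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def fits_decimal(value: Decimal, precision: int = 10, scale: int = 0) -> bool:
    """Return True if value fits in decimal(precision, scale) once rounded to scale.

    Hypothetical helper for illustration only.
    """
    # Smallest representable step for the scale, e.g. scale=2 -> Decimal("0.01").
    quantum = Decimal(1).scaleb(-scale)
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    # Count all digits of the rounded value, before and after the decimal point.
    return len(rounded.as_tuple().digits) <= precision


print(fits_decimal(Decimal("1234567890")))   # True: 10 digits, defaults (10, 0)
print(fits_decimal(Decimal("12345678901")))  # False: 11 digits
print(fits_decimal(Decimal("123.45"), 5, 2)) # True: 5 digits, 2 after the point
print(fits_decimal(Decimal("1234.5"), 5, 2)) # False: 1234.50 needs 6 digits
```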
- class sparkless.spark_types.ArrayType(element_type=None, elementType=None, nullable=True)
Bases: DataType
Mock array type.
Represents an array data type with PySpark-compatible initialization. Supports both PySpark’s camelCase keyword convention and backward-compatible snake_case naming.
Example
>>> # PySpark convention (camelCase)
>>> ArrayType(elementType=StringType())
>>> # Backward-compatible (snake_case)
>>> ArrayType(element_type=StringType())
>>> # Positional argument
>>> ArrayType(StringType())
Initialize ArrayType.
- Parameters:
Either element_type (positional/keyword) or elementType (keyword) must be provided.
- Raises:
TypeError – If both elementType and element_type are provided, or if neither is provided.
Note
This matches PySpark’s ArrayType API. Using the elementType keyword argument provides full PySpark compatibility (Issue #247).
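The argument rules described above (one of two spellings required, never both) can be sketched as follows. SketchArrayType is a hypothetical stand-in that shows only the validation logic; the error messages and the stored attribute name are assumptions, not the Sparkless implementation.

```python
from typing import Any, Optional


class SketchArrayType:
    """Hypothetical stand-in that mirrors the documented argument handling."""

    def __init__(
        self,
        element_type: Optional[Any] = None,
        elementType: Optional[Any] = None,
        nullable: bool = True,
    ) -> None:
        if element_type is not None and elementType is not None:
            raise TypeError("Pass element_type or elementType, not both")
        if element_type is None and elementType is None:
            raise TypeError("An element type is required")
        # Store whichever spelling was supplied under a single attribute.
        self.elementType = elementType if elementType is not None else element_type
        self.nullable = nullable


# All three call styles resolve to the same element type.
positional = SketchArrayType("string")
snake = SketchArrayType(element_type="string")
camel = SketchArrayType(elementType="string")
print(positional.elementType == snake.elementType == camel.elementType)  # True
```

Because element_type is the first positional parameter, the positional form and the snake_case keyword form share one code path.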
- class sparkless.spark_types.MapType(key_type, value_type, nullable=True)
Bases: DataType
Mock map type.
Initialize MapType.
- class sparkless.spark_types.BinaryType(nullable=True)
Bases: DataType
Mock BinaryType for binary data.
Initialize BinaryType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.NullType(nullable=True)
Bases: DataType
Mock NullType for null values.
Initialize NullType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.FloatType(nullable=True)
Bases: DataType
Mock FloatType for single precision floating point numbers.
Initialize FloatType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.ShortType(nullable=True)
Bases: DataType
Mock ShortType for short integers.
Initialize ShortType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.ByteType(nullable=True)
Bases: DataType
Mock ByteType for byte values.
Initialize ByteType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.CharType(length=1, nullable=True)
Bases: DataType
Mock CharType for fixed-length character strings.
- class sparkless.spark_types.VarcharType(length=255, nullable=True)
Bases: DataType
Mock VarcharType for variable-length character strings.
- class sparkless.spark_types.TimestampNTZType(nullable=True)
Bases: DataType
Mock TimestampNTZType for timestamp without timezone.
Initialize TimestampNTZType.
- Parameters:
nullable (bool) – Whether the type allows null values.
- class sparkless.spark_types.IntervalType(start_field='YEAR', end_field='MONTH', nullable=True)
Bases: DataType
Mock IntervalType for time intervals.
- class sparkless.spark_types.YearMonthIntervalType(start_field='YEAR', end_field='MONTH', nullable=True)
Bases: DataType
Mock YearMonthIntervalType for year-month intervals.
- class sparkless.spark_types.DayTimeIntervalType(start_field='DAY', end_field='SECOND', nullable=True)
Bases: DataType
Mock DayTimeIntervalType for day-time intervals.
- class sparkless.spark_types.StructField(name, dataType, nullable=True, metadata=None, default_value=None)
Bases: object
Mock StructField for schema definition.
Inherits from PySpark StructField when available for compatibility.
- class sparkless.spark_types.StructType(fields=None, nullable=True)
Bases: DataType
Mock StructType for schema definition.
Inherits from PySpark StructType when available for compatibility.
- Parameters:
fields (Optional[List[StructField]])
nullable (bool)
- fields: List[StructField]
- merge_with(other)
Merge this schema with another, adding new fields from other.
- Parameters:
other (StructType) – Schema to merge with
- Return type:
StructType
- Returns:
New schema with fields from both schemas
- has_same_columns(other)
Check if two schemas have the same column names.
- Parameters:
other (StructType) – Schema to compare with
- Return type:
bool
- Returns:
True if column names match, False otherwise
- add_field(field)
Add a field to the struct type.
- Parameters:
field (StructField)
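The merge and comparison semantics described for merge_with and has_same_columns can be sketched on plain (name, type) pairs. The helpers merge_fields and same_columns are hypothetical and are not Sparkless functions. Two behaviours are assumptions made for this illustration: on a name collision the field from the first schema is kept, and column-name comparison ignores order.

```python
from typing import List, Tuple

# (column name, type name): a simplified, hypothetical stand-in for StructField.
Field = Tuple[str, str]


def merge_fields(left: List[Field], right: List[Field]) -> List[Field]:
    """Return left's fields followed by fields from right whose names are new."""
    seen = {name for name, _ in left}
    merged = list(left)  # copy so neither input is modified
    for name, type_name in right:
        if name not in seen:
            merged.append((name, type_name))
            seen.add(name)
    return merged


def same_columns(left: List[Field], right: List[Field]) -> bool:
    """Compare column names only; types and order are ignored (assumption)."""
    return {name for name, _ in left} == {name for name, _ in right}


people = [("name", "string"), ("age", "int")]
extra = [("age", "bigint"), ("email", "string")]

print(merge_fields(people, extra))
# [('name', 'string'), ('age', 'int'), ('email', 'string')]
print(same_columns(people, extra))  # False
```

Returning a new list rather than mutating the input matches the documented behaviour of returning a new schema.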
- class sparkless.spark_types.MockDatabase(name, description=None, locationUri=None)
Bases: object
Mock database representation.
- class sparkless.spark_types.MockTable(name, database, tableType='MANAGED', isTemporary=False)
Bases: object
Mock table representation.
- sparkless.spark_types.convert_python_type_to_mock_type(python_type)
Convert Python type to DataType.
- sparkless.spark_types.infer_schema_from_data(data)
Infer schema from data.
- sparkless.spark_types.create_schema_from_columns(columns)
Create schema from column names (all StringType).
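The entries above do not document how types are inferred. As a rough sketch of the general idea, the code below maps Python values to Spark-style type names. The names PYTHON_TO_SPARK, infer_columns, and string_columns are hypothetical. The mapping follows PySpark's usual inference (int to bigint, float to double), and treating a column with only null values as “void” is an assumption; Sparkless may behave differently.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from Python types to Spark simple type names.
PYTHON_TO_SPARK = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_columns(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each column to a type name taken from its first non-null value."""
    inferred: Dict[str, Optional[str]] = {}
    for row in rows:
        for key, value in row.items():
            if inferred.get(key) is None:
                # type(value) is an exact lookup, so True maps to boolean, not bigint.
                inferred[key] = (
                    None if value is None
                    else PYTHON_TO_SPARK.get(type(value), "string")
                )
    # Columns that never had a non-null value fall back to "void" (assumption).
    return {key: type_name or "void" for key, type_name in inferred.items()}


def string_columns(columns: List[str]) -> Dict[str, str]:
    """Build an all-string schema from column names."""
    return {column: "string" for column in columns}


rows = [
    {"name": "Alice", "age": None, "active": True},
    {"name": "Bob", "age": 30, "active": False},
]
print(infer_columns(rows))
# {'name': 'string', 'age': 'bigint', 'active': 'boolean'}
print(string_columns(["id", "label"]))
# {'id': 'string', 'label': 'string'}
```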
- sparkless.spark_types.get_row_value(row, key, default=None)
Get value from Row or dict by key (PySpark-compatible; Row has no .get()).
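A helper of this kind is useful because code that accepts both dicts and row objects cannot rely on .get(). The sketch below shows one way such a lookup could work. KeyOnlyRow and value_or_default are hypothetical names for this illustration, and the set of exceptions treated as "missing key" is an assumption.

```python
from typing import Any


class KeyOnlyRow:
    """Hypothetical row-like object: supports row[key] but has no .get()."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)

    def __getitem__(self, key: str) -> Any:
        return self._fields[key]


def value_or_default(row: Any, key: str, default: Any = None) -> Any:
    """Look up key on a dict or a row-like object, returning default if absent."""
    if isinstance(row, dict):
        return row.get(key, default)
    try:
        return row[key]
    except (KeyError, IndexError, ValueError):
        # Row-like objects may signal a missing key with different exceptions.
        return default


row = KeyOnlyRow(name="Alice", age=25)
print(value_or_default(row, "name"))            # Alice
print(value_or_default(row, "email", "n/a"))    # n/a
print(value_or_default({"name": "Bob"}, "name"))  # Bob
```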
- class sparkless.spark_types.Row(data=None, schema=None, **kwargs)
Bases: object
Mock Row object providing PySpark-compatible row interface.
Represents a single row in a DataFrame with PySpark-compatible methods for accessing data by index, key, or attribute. Use row[key] or row.field_name (PySpark Row does not support .get()).
- data
Dictionary containing row data.
Example
>>> row = Row({"name": "Alice", "age": 25})
>>> row.name
'Alice'
>>> row["name"]
'Alice'
>>> row[0]
'Alice'
>>> row.asDict()
{'name': 'Alice', 'age': 25}
Initialize Row.
- Parameters:
data (Any) – Row data. Accepts dict, list of tuples, or sequence-like. If None and kwargs are provided, kwargs are used as data (PySpark-compatible).
schema (Optional[StructType]) – Optional schema providing ordered field names for index access.
**kwargs (Any) – Optional keyword arguments for kwargs-style initialization (PySpark-compatible). Example: Row(Column1="Value1", Column2=2)
Example
>>> row = Row({"name": "Alice", "age": 25})
>>> row = Row(name="Alice", age=25)  # kwargs-style
>>> row.name
'Alice'
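To illustrate how index, key, and attribute access can coexist on one object, here is a minimal sketch. SketchRow is a hypothetical stand-in, not the Sparkless Row; it ignores the schema argument and assumes insertion order determines index positions.

```python
from typing import Any, Dict, Optional, Union


class SketchRow:
    """Hypothetical stand-in showing index, key, and attribute access."""

    def __init__(self, data: Optional[Any] = None, **kwargs: Any) -> None:
        # A dict or a list of (name, value) tuples; otherwise fall back to kwargs.
        self._data: Dict[str, Any] = dict(data) if data is not None else dict(kwargs)

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, int):
            # Positional access follows insertion order.
            return list(self._data.values())[key]
        return self._data[key]

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        data = self.__dict__.get("_data", {})
        if name in data:
            return data[name]
        raise AttributeError(name)

    def asDict(self) -> Dict[str, Any]:
        return dict(self._data)


row = SketchRow({"name": "Alice", "age": 25})
kw_row = SketchRow(name="Alice", age=25)

print(row.name, row["name"], row[0])  # Alice Alice Alice
print(kw_row.asDict())                # {'name': 'Alice', 'age': 25}
```

Reading _data through self.__dict__ inside __getattr__ avoids infinite recursion if the attribute is requested before __init__ has run.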
Primitive Types
The following primitive types are available:
- StringType: String data type
- IntegerType: 32-bit integer
- LongType: 64-bit integer
- ShortType: 16-bit integer
- ByteType: 8-bit integer
- DoubleType: 64-bit floating point
- FloatType: 32-bit floating point
- BooleanType: Boolean type
- DateType: Date type
- TimestampType: Timestamp type
- BinaryType: Binary data type
- NullType: Null type
Complex Types
- ArrayType: Array of elements
- MapType: Map/dictionary type
- StructType: Struct/row type
- StructField: Field in a struct
- DecimalType: Decimal/precision numeric type
Usage Examples
from sparkless.sql.types import (
    StructType, StructField, StringType, IntegerType,
    ArrayType, MapType, DoubleType
)

# Simple schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), False)
])

# Complex schema with arrays and maps
complex_schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("tags", ArrayType(StringType()), True),
    StructField("metadata", MapType(StringType(), StringType()), True),
    StructField("score", DoubleType(), True)
])