Data Types

Sparkless provides all PySpark-compatible data types.

Base Types

Mock data types and schema system for Sparkless.

This module provides mock implementations of PySpark data types and schema structures that behave identically to the real PySpark types. It includes primitive types, complex types, schema definitions, and Row objects for complete type system compatibility.

Key Features:

Complete PySpark data type hierarchy
Primitive types (String, Integer, Long, Double, Boolean)
Complex types (Array, Map, Struct)
Schema definition with StructType and StructField
Row objects with PySpark-compatible interface
Type inference and conversion utilities

Example

>>> from sparkless.spark_types import StringType, IntegerType, StructType, StructField
>>> schema = StructType([
...     StructField("name", StringType()),
...     StructField("age", IntegerType())
... ])
>>> df = spark.createDataFrame(data, schema)  # spark (a session) and data (the rows) must already exist

class sparkless.spark_types.DataType(nullable=True)[source]

Bases: object

Base class for mock data types.

Provides the foundation for all data types in the Sparkless type system. Supports nullable/non-nullable semantics and PySpark-compatible type names. Inherits from PySpark DataType when available for compatibility.

nullable: Whether the data type allows null values.

Example

>>> StringType()
StringType(nullable=True)
>>> IntegerType(nullable=False)
IntegerType(nullable=False)

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Parameters:: nullable (bool)

__hash__()[source]

Hash method to make DataType hashable.

Return type:: int

typeName()[source]

Get PySpark-compatible type name.

Return type:: str

simpleString()[source]

Get PySpark-compatible simple string representation of the data type.

Return type:: str
Returns:: Simple string representation (e.g., “string”, “int”, “array<string>”).

Note

Fixed in version 3.23.0 (Issue #231): All DataType classes now implement simpleString() with PySpark-compatible string representations.
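The nested form mentioned above ("array<string>") follows PySpark's convention of composing simple strings recursively. The sketch below illustrates that convention only: `simple_string` and the tuple-based type descriptions are hypothetical stand-ins, not Sparkless code, and the map and struct formats shown are PySpark's, assumed to carry over.

```python
# Stand-in sketch: types are modeled as plain tuples, not Sparkless classes.
def simple_string(t):
    """Render a tuple-based type description in PySpark's simpleString style."""
    kind = t[0]
    if kind == "array":
        # ("array", element_type)
        return f"array<{simple_string(t[1])}>"
    if kind == "map":
        # ("map", key_type, value_type)
        return f"map<{simple_string(t[1])},{simple_string(t[2])}>"
    if kind == "struct":
        # ("struct", [(field_name, field_type), ...])
        inner = ",".join(f"{name}:{simple_string(ft)}" for name, ft in t[1])
        return f"struct<{inner}>"
    # Primitive names such as "string" or "int" are returned unchanged.
    return kind


print(simple_string(("array", ("string",))))
print(simple_string(("struct", [("name", ("string",)), ("age", ("int",))])))
```

Each container delegates to its children, so arbitrarily deep nesting needs no special handling.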

class sparkless.spark_types.StringType(nullable=True)[source]

Bases: DataType

Mock StringType.

Inherits from DataType which inherits from PySpark DataType when available. This avoids the singleton issue while maintaining compatibility.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize StringType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.IntegerType(nullable=True)[source]

Bases: DataType

Mock IntegerType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize IntegerType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.LongType(nullable=True)[source]

Bases: DataType

Mock LongType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize LongType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.DoubleType(nullable=True)[source]

Bases: DataType

Mock DoubleType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize DoubleType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.BooleanType(nullable=True)[source]

Bases: DataType

Mock BooleanType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize BooleanType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.DateType(nullable=True)[source]

Bases: DataType

Mock DateType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize DateType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.TimestampType(nullable=True)[source]

Bases: DataType

Mock TimestampType.

Inherits from DataType which inherits from PySpark DataType when available.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize TimestampType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.DecimalType(precision=10, scale=0, nullable=True)[source]

Bases: DataType

Mock decimal type.

Parameters:

precision (int)
scale (int)
nullable (bool)

__init__(precision=10, scale=0, nullable=True)[source]

Initialize DecimalType.

Parameters:

precision (int)
scale (int)
nullable (bool)

__repr__()[source]

String representation.

Return type:: str
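The precision and scale parameters of DecimalType are not described above. In PySpark, precision is the total number of digits and scale is the number of digits to the right of the decimal point; the sketch below assumes Sparkless uses the same meaning. The `fits` helper is hypothetical, uses only the standard library, and handles finite values only.

```python
from decimal import Decimal


def fits(value: Decimal, precision: int = 10, scale: int = 0) -> bool:
    """Check whether a finite Decimal is representable as decimal(precision, scale)
    without rounding. Hypothetical helper, not part of Sparkless."""
    _, digits, exponent = value.as_tuple()
    # Digits to the right of the decimal point.
    frac_digits = max(0, -exponent)
    # Digits to the left of the decimal point (zero needs none).
    int_digits = 0 if value == 0 else max(0, len(digits) + exponent)
    return frac_digits <= scale and int_digits <= precision - scale


print(fits(Decimal("123.45"), precision=5, scale=2))
print(fits(Decimal("1234.5"), precision=5, scale=2))
```

With the defaults shown in the signature above (precision=10, scale=0), only whole numbers of up to ten digits pass this check.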

class sparkless.spark_types.ArrayType(element_type=None, elementType=None, nullable=True)[source]

Bases: DataType

Mock array type.

Represents an array data type with PySpark-compatible initialization. Supports both PySpark’s camelCase keyword convention and backward-compatible snake_case naming.

Example

>>> # PySpark convention (camelCase)
>>> ArrayType(elementType=StringType())
>>> # Backward-compatible (snake_case)
>>> ArrayType(element_type=StringType())
>>> # Positional argument
>>> ArrayType(StringType())

Parameters:

element_type (Optional[DataType])
elementType (Optional[DataType])
nullable (bool)

__init__(element_type=None, elementType=None, nullable=True)[source]

Initialize ArrayType.

Parameters:

element_type (Optional[DataType]) – Element data type (positional or keyword with snake_case)
elementType (Optional[DataType]) – Element data type (keyword, PySpark convention - Issue #247)
nullable (bool) – Whether the array can contain null values

Either element_type (positional/keyword) or elementType (keyword) must be provided.

Raises:: TypeError – If both elementType and element_type are provided, or if neither is provided.

Note

This matches PySpark’s ArrayType API. Using elementType keyword argument provides full PySpark compatibility (Issue #247).
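The argument rule described for ArrayType (exactly one of element_type or elementType, otherwise TypeError) can be sketched as follows. `ArrayTypeSketch` is a hypothetical stand-in that mirrors only the documented validation; the attribute names and error messages are assumptions, not Sparkless internals.

```python
from typing import Any, Optional


class ArrayTypeSketch:
    """Hypothetical stand-in mirroring the documented argument rules only."""

    def __init__(
        self,
        element_type: Optional[Any] = None,
        elementType: Optional[Any] = None,
        nullable: bool = True,
    ) -> None:
        # Both spellings given: ambiguous, so reject.
        if element_type is not None and elementType is not None:
            raise TypeError("pass either element_type or elementType, not both")
        # Neither given: the element type is required.
        if element_type is None and elementType is None:
            raise TypeError("an element type is required")
        # Store whichever spelling the caller used under one attribute.
        self.element_type = element_type if element_type is not None else elementType
        self.nullable = nullable


# All three call styles from the example above resolve to the same attribute.
print(ArrayTypeSketch("string").element_type)
print(ArrayTypeSketch(element_type="string").element_type)
print(ArrayTypeSketch(elementType="string").element_type)
```

Keeping both keyword spellings in one signature lets positional calls and PySpark-style keyword calls share a single code path.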

__repr__()[source]

String representation.

Return type:: str

simpleString()[source]

Get PySpark-compatible simple string representation.

Return type:: str

class sparkless.spark_types.MapType(key_type, value_type, nullable=True)[source]

Bases: DataType

Mock map type.

Parameters:

key_type (DataType)
value_type (DataType)
nullable (bool)

__init__(key_type, value_type, nullable=True)[source]

Initialize MapType.

Parameters:

key_type (DataType)
value_type (DataType)
nullable (bool)

__repr__()[source]

String representation.

Return type:: str

simpleString()[source]

Get PySpark-compatible simple string representation.

Return type:: str

class sparkless.spark_types.BinaryType(nullable=True)[source]

Bases: DataType

Mock BinaryType for binary data.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize BinaryType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.NullType(nullable=True)[source]

Bases: DataType

Mock NullType for null values.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize NullType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.FloatType(nullable=True)[source]

Bases: DataType

Mock FloatType for single precision floating point numbers.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize FloatType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.ShortType(nullable=True)[source]

Bases: DataType

Mock ShortType for short integers.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize ShortType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.ByteType(nullable=True)[source]

Bases: DataType

Mock ByteType for byte values.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize ByteType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.CharType(length=1, nullable=True)[source]

Bases: DataType

Mock CharType for fixed-length character strings.

Parameters:

length (int)
nullable (bool)

__init__(length=1, nullable=True)[source]

Parameters:

length (int)
nullable (bool)

class sparkless.spark_types.VarcharType(length=255, nullable=True)[source]

Bases: DataType

Mock VarcharType for variable-length character strings.

Parameters:

length (int)
nullable (bool)

__init__(length=255, nullable=True)[source]

Parameters:

length (int)
nullable (bool)

class sparkless.spark_types.TimestampNTZType(nullable=True)[source]

Bases: DataType

Mock TimestampNTZType for timestamp without timezone.

Parameters:: nullable (bool)

__init__(nullable=True)[source]

Initialize TimestampNTZType.

Parameters:: nullable (bool) – Whether the type allows null values.

class sparkless.spark_types.IntervalType(start_field='YEAR', end_field='MONTH', nullable=True)[source]

Bases: DataType

Mock IntervalType for time intervals.

Parameters:

start_field (str)
end_field (str)
nullable (bool)

__init__(start_field='YEAR', end_field='MONTH', nullable=True)[source]

Parameters:

start_field (str)
end_field (str)
nullable (bool)

class sparkless.spark_types.YearMonthIntervalType(start_field='YEAR', end_field='MONTH', nullable=True)[source]

Bases: DataType

Mock YearMonthIntervalType for year-month intervals.

Parameters:

start_field (str)
end_field (str)
nullable (bool)

__init__(start_field='YEAR', end_field='MONTH', nullable=True)[source]

Parameters:

start_field (str)
end_field (str)
nullable (bool)

class sparkless.spark_types.DayTimeIntervalType(start_field='DAY', end_field='SECOND', nullable=True)[source]

Bases: DataType

Mock DayTimeIntervalType for day-time intervals.

Parameters:

start_field (str)
end_field (str)
nullable (bool)

__init__(start_field='DAY', end_field='SECOND', nullable=True)[source]

Parameters:

start_field (str)
end_field (str)
nullable (bool)

class sparkless.spark_types.StructField(name, dataType, nullable=True, metadata=None, default_value=None)[source]

Bases: object

Mock StructField for schema definition.

Inherits from PySpark StructField when available for compatibility.

Parameters:

name (str)
dataType (DataType)
nullable (bool)
metadata (Optional[Dict[str, Any]])
default_value (Optional[Any])

name: str

dataType: DataType

nullable: bool = True

metadata: Dict[str, Any] | None = None

default_value: Any | None = None

__init__(name, dataType, nullable=True, metadata=None, default_value=None)

Parameters:

name (str)
dataType (DataType)
nullable (bool)
metadata (Optional[Dict[str, Any]])
default_value (Optional[Any])

class sparkless.spark_types.StructType(fields=None, nullable=True)[source]

Bases: DataType

Mock StructType for schema definition.

Inherits from PySpark StructType when available for compatibility.

Parameters:

fields (Optional[List[StructField]])
nullable (bool)

__init__(fields=None, nullable=True)[source]

Parameters:

fields (Optional[List[StructField]])
nullable (bool)

fields: List[StructField]

simpleString()[source]

Get PySpark-compatible simple string representation.

Return type:: str

merge_with(other)[source]

Merge this schema with another, adding new fields from other.

Parameters:: other (StructType) – Schema to merge with
Return type:: StructType
Returns:: New schema with fields from both schemas
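One plausible reading of "adding new fields from other" is sketched below with plain (name, type) tuples instead of StructField objects. The `merge_fields` helper is hypothetical, and two details are assumptions rather than documented behavior: fields already present keep their original definition, and new fields are appended in the order they appear in the other schema.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (field name, type name); a stand-in for StructField


def merge_fields(left: List[Field], right: List[Field]) -> List[Field]:
    """Return left's fields followed by any right fields whose names are new."""
    seen = {name for name, _ in left}
    merged = list(left)  # copy so the input list is not mutated
    for name, dtype in right:
        if name not in seen:
            merged.append((name, dtype))
            seen.add(name)
    return merged


left = [("id", "int"), ("name", "string")]
right = [("name", "bigint"), ("age", "int")]
print(merge_fields(left, right))
```

Returning a new list matches the documented return value, a new schema, rather than an in-place update.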

has_same_columns(other)[source]

Check if two schemas have the same column names.

Parameters:: other (StructType) – Schema to compare with
Return type:: bool
Returns:: True if column names match, False otherwise

fieldNames()[source]

Get list of field names.

Return type:: List[str]

getFieldIndex(name)[source]

Get index of field by name.

Parameters:: name (str)
Return type:: int

contains(name)[source]

Check if field exists in schema.

Parameters:: name (str)
Return type:: bool

add_field(field)[source]

Add a field to the struct type.

Parameters:: field (StructField)
Return type:: None

get_field_by_name(name)[source]

Get field by name.

Parameters:: name (str)
Return type:: Optional[StructField]

has_field(name)[source]

Check if field exists in schema.

Parameters:: name (str)
Return type:: bool

class sparkless.spark_types.MockDatabase(name, description=None, locationUri=None)[source]

Bases: object

Mock database representation.

Parameters:

name (str)
description (Optional[str])
locationUri (Optional[str])

name: str

description: str | None = None

locationUri: str | None = None

__init__(name, description=None, locationUri=None)

Parameters:

name (str)
description (Optional[str])
locationUri (Optional[str])

class sparkless.spark_types.MockTable(name, database, tableType='MANAGED', isTemporary=False)[source]

Bases: object

Mock table representation.

Parameters:

name (str)
database (str)
tableType (str)
isTemporary (bool)

name: str

database: str

tableType: str = 'MANAGED'

isTemporary: bool = False

__init__(name, database, tableType='MANAGED', isTemporary=False)

Parameters:

name (str)
database (str)
tableType (str)
isTemporary (bool)

sparkless.spark_types.convert_python_type_to_mock_type(python_type)[source]

Convert Python type to DataType.

Parameters:: python_type (type)
Return type:: DataType

sparkless.spark_types.infer_schema_from_data(data)[source]

Infer schema from data.

Parameters:: data (List[Dict[str, Any]])
Return type:: StructType
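The inference rules themselves are not documented here. As a rough illustration, the sketch below maps Python values to type names the way PySpark's inference does (int to bigint, float to double); whether Sparkless follows the same rules is an assumption. `infer_field_types`, the mapping table, and the string fallback are all hypothetical.

```python
from typing import Any, Dict, List

# Assumed mapping, modeled on PySpark's inference rules.
# Exact type lookup is used so that bool is not mistaken for int.
_TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_field_types(data: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each key to a type name taken from its first non-None value."""
    inferred: Dict[str, str] = {}
    for record in data:
        for key, value in record.items():
            if key not in inferred and value is not None:
                # Unknown Python types fall back to "string" (an assumption).
                inferred[key] = _TYPE_NAMES.get(type(value), "string")
    return inferred


rows = [
    {"name": "Alice", "age": 25, "score": None},
    {"name": "Bob", "age": 30, "score": 1.5},
]
print(infer_field_types(rows))
```

Scanning past None values lets a column whose first row is null still receive a concrete type from a later row.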

sparkless.spark_types.create_schema_from_columns(columns)[source]

Create schema from column names (all StringType).

Parameters:: columns (List[str])
Return type:: StructType

sparkless.spark_types.get_row_value(row, key, default=None)[source]

Get value from Row or dict by key (PySpark-compatible; Row has no .get()).

Parameters:

row (Any)
key (str)
default (Any)

Return type:

Any
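Because Row has no .get(), a helper of this kind has to treat dicts and rows differently. The sketch below shows one way to do that; `get_value` and `FakeRow` are hypothetical stand-ins, not the Sparkless implementation, and the set of exceptions treated as "missing key" is an assumption.

```python
from typing import Any


def get_value(row: Any, key: str, default: Any = None) -> Any:
    """Return row[key], or default when the key is missing."""
    if isinstance(row, dict):
        return row.get(key, default)
    try:
        # Row-like objects support item access but not .get().
        return row[key]
    except (KeyError, ValueError, IndexError, TypeError):
        return default


class FakeRow:
    """Minimal row-like object used only for this demonstration."""

    def __getitem__(self, key: str) -> Any:
        if key == "name":
            return "Alice"
        raise ValueError(key)  # PySpark's Row raises ValueError for unknown keys


print(get_value({"name": "Bob"}, "age", default=0))
print(get_value(FakeRow(), "name"))
```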

class sparkless.spark_types.Row(data=None, schema=None, **kwargs)[source]

Bases: object

Mock Row object providing PySpark-compatible row interface.

Represents a single row in a DataFrame with PySpark-compatible methods for accessing data by index, key, or attribute. Use row[key] or row.field_name (PySpark Row does not support .get()).

data: Dictionary containing row data.

Example

>>> row = Row({"name": "Alice", "age": 25})
>>> row.name
'Alice'
>>> row["name"]
'Alice'
>>> row[0]
'Alice'
>>> row.asDict()
{'name': 'Alice', 'age': 25}

Parameters:

data (Any)
schema (Optional[StructType])
kwargs (Any)

__init__(data=None, schema=None, **kwargs)[source]

Initialize Row.

Parameters:

data (Any) – Row data. Accepts a dict, a list of tuples, or a sequence-like object. If None and kwargs are provided, kwargs are used as data (PySpark-compatible).
schema (Optional[StructType]) – Optional schema providing ordered field names for index access.
**kwargs (Any) – Optional keyword arguments for kwargs-style initialization (PySpark-compatible). Example: Row(Column1="Value1", Column2=2)

Example

>>> row = Row({"name": "Alice", "age": 25})
>>> row = Row(name="Alice", age=25)  # kwargs-style
>>> row.name
'Alice'

data: List[Tuple[str, Any]] | Dict[str, Any]

__getitem__(key)[source]

Get item by column name or index (PySpark-compatible).

Parameters:: key (Any)
Return type:: Any

__contains__(key)[source]

Check if key exists.

Parameters:: key (str)
Return type:: bool

values()[source]

Get values.

Return type:: ValuesView[Any]

items()[source]

Get items.

Return type:: ItemsView[str, Any]

__len__()[source]

Get length.

Return type:: int

__eq__(other)[source]

Compare with another row object.

Parameters:: other (Any)
Return type:: bool

asDict()[source]

Convert to dictionary (PySpark compatibility).

Return type:: Dict[str, Any]

__getattr__(name)[source]

Get value by attribute name (PySpark compatibility).

Parameters:: name (str)
Return type:: Any

__iter__()[source]

Iterate values in schema order if available, else dict order.

Return type:: Iterator[Any]

__repr__()[source]

String representation matching PySpark format.

Return type:: str
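The access paths documented for Row (index, key, attribute, iteration) can all be served from one ordered mapping. The sketch below shows the idea with a hypothetical `RowSketch` class; it is not the Sparkless implementation and ignores the schema argument. The repr format follows PySpark's Row.

```python
from typing import Any, Iterator


class RowSketch:
    """Hypothetical stand-in: one ordered dict backs every access style."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)  # dicts preserve insertion order

    def __getitem__(self, key: Any) -> Any:
        # Integer keys are positions; anything else is a column name.
        if isinstance(key, int):
            return list(self._fields.values())[key]
        return self._fields[key]

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        try:
            return self.__dict__["_fields"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __iter__(self) -> Iterator[Any]:
        return iter(self._fields.values())

    def __repr__(self) -> str:
        body = ", ".join(f"{k}={v!r}" for k, v in self._fields.items())
        return f"Row({body})"


row = RowSketch(name="Alice", age=25)
print(row.name, row["age"], row[0], list(row))
print(row)
```

Routing attribute access through `__getattr__` keeps real attributes and methods ahead of column names, since Python consults it only after normal lookup fails.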

Primitive Types

The following primitive types are available:

StringType - String data type
IntegerType - 32-bit integer
LongType - 64-bit integer
ShortType - 16-bit integer
ByteType - 8-bit integer
DoubleType - 64-bit floating point
FloatType - 32-bit floating point
BooleanType - Boolean type
DateType - Date type
TimestampType - Timestamp type
BinaryType - Binary data type
NullType - Null type

Complex Types

ArrayType - Array of elements
MapType - Map/dictionary type
StructType - Struct/row type
StructField - Field in a struct
DecimalType - Fixed-precision decimal type

Usage Examples

from sparkless.sql.types import (
    StructType, StructField, StringType, IntegerType,
    ArrayType, MapType, DoubleType
)

# Simple schema
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), False)
])

# Complex schema with arrays and maps
complex_schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("tags", ArrayType(StringType()), True),
    StructField("metadata", MapType(StringType(), StringType()), True),
    StructField("score", DoubleType(), True)
])