pandera.api.pyspark.components.Column
- class pandera.api.pyspark.components.Column(dtype=None, checks=None, nullable=False, coerce=False, required=True, name=None, regex=False, title=None, description=None, metadata=None)[source]
Validate types and properties of DataFrame columns.
Create column validator object.
- Parameters:
  - dtype (Union[str, int, float, bool, type, DataType, Type, BooleanType, StringType, IntegerType, DecimalType, FloatType, DateType, TimestampType, DoubleType, ShortType, ByteType, LongType, BinaryType]) – datatype of the column, used for type-checking the dataframe. If a string is specified, it is assumed to be one of the valid pyspark string values: https://spark.apache.org/docs/latest/sql-ref-datatypes.html
  - checks (Union[Check, List[Check], None]) – checks to verify the validity of the column
  - nullable (bool) – whether or not the column can contain null values.
  - coerce (bool) – if True, when schema.validate is called the column will be coerced into the specified dtype. This has no effect on columns where dtype=None.
  - required (bool) – whether or not the column is allowed to be missing
  - name (Union[str, Tuple[str, ...], None]) – column name in the dataframe to validate.
  - regex (bool) – whether the name attribute should be treated as a regex pattern to apply to multiple columns in a dataframe.
  - title (Optional[str]) – a human-readable label for the column.
  - description (Optional[str]) – an arbitrary textual description of the column.
  - metadata (Optional[dict]) – optional key-value metadata for the column.
- Raises:
SchemaInitError – if it is impossible to build a schema from the given parameters
- Example:
>>> import pyspark as ps
>>> from pyspark.sql import SparkSession
>>> import pandera.pyspark as pa
>>>
>>> schema = pa.DataFrameSchema({
...     "column": pa.Column(str)
... })
>>> spark = SparkSession.builder.getOrCreate()
>>> schema.validate(spark.createDataFrame([{"column": "foo"}, {"column": "bar"}])).show()
+------+
|column|
+------+
|   foo|
|   bar|
+------+
See the pandera documentation on pyspark DataFrame validation for more usage details.
Attributes
- BACKEND_REGISTRY
- dtype – Get the pyspark dtype.
- properties – Get column properties.
Methods
- __init__ – Create column validator object.
- get_regex_columns – Get matching column names based on regex column name pattern.
- set_name – Used to set or modify the name of a column object.
- validate – Validate a Column in a DataFrame object.
- __call__ – Alias for the validate method.