pandera.api.checks.Check

class pandera.api.checks.Check(check_fn, groups=None, groupby=None, ignore_na=True, element_wise=False, name=None, error=None, raise_warning=False, n_failure_cases=None, title=None, description=None, statistics=None, strategy=None, **check_kwargs)

Check a data object for certain properties.

Apply a validation function to a data object.

Parameters

check_fn (Callable) –
A function that checks a pandas data structure. For Column or SeriesSchema checks, if element_wise is False, this function should have the signature: Callable[[pd.Series], Union[pd.Series, bool]], where the output series is a boolean vector.

If element_wise is True, this function should have the signature: Callable[[Any], bool], where Any is an element in the column.

For DataFrameSchema checks, if element_wise is False, check_fn should have the signature: Callable[[pd.DataFrame], Union[pd.DataFrame, pd.Series, bool]], where the output dataframe or series contains booleans.

If element_wise is True, check_fn is applied to each row in the dataframe with the signature Callable[[pd.Series], bool], where the series input is a row in the dataframe.
groups (Union[str, List[str], None]) – Restricts the dict passed to check_fn to the group keys listed here. Used together with groupby.
groupby (Union[str, List[str], Callable, None]) –
If a string or list of strings is provided, these columns are used to group the Column series. If a callable is passed, the expected signature is: Callable[[pd.DataFrame], pd.core.groupby.DataFrameGroupBy]

In the case of Column checks, this function has access to the entire dataframe, but Column.name is selected from this DataFrameGroupBy object so that a SeriesGroupBy object is passed into check_fn.

Specifying the groupby argument changes the check_fn signature to:

Callable[[Dict[Union[str, Tuple[str]], pd.Series]], Union[bool, pd.Series]]

where the input is a dictionary mapping group keys to subsets of the column/dataframe.
ignore_na (bool) – If True, null values will be ignored when determining if a check passed or failed. For dataframes, ignores rows with any null value. New in version 0.4.0
element_wise (bool) – Whether or not to apply validator in an element-wise fashion. If bool, assumes that all checks should be applied to the column element-wise. If list, should be the same number of elements as checks.
name (Optional[str]) – optional name for the check.
error (Optional[str]) – custom error message if series fails validation check.
raise_warning (bool) – if True, a failing check raises a UserWarning instead of a SchemaError, so validation continues. Use this option carefully, and only where a failing check is informational and shouldn’t stop execution of the program.
n_failure_cases (Optional[int]) – report the first n unique failure cases. If None, report all failure cases.
title (Optional[str]) – A human-readable label for the check.
description (Optional[str]) – An arbitrary textual description of the check.
statistics (Optional[Dict[str, Any]]) – kwargs to pass into the check function. These values are serialized and represent the constraints of the checks.
strategy (Optional[SearchStrategy]) – A hypothesis strategy, used for implementing data synthesis strategies for this check.
check_kwargs – keyword arguments to pass into check_fn.

Example

>>> import pandas as pd
>>> import pandera as pa
>>>
>>>
>>> # column checks are vectorized by default
>>> check_positive = pa.Check(lambda s: s > 0)
>>>
>>> # define an element-wise check
>>> check_even = pa.Check(lambda x: x % 2 == 0, element_wise=True)
>>>
>>> # checks can be given human-readable metadata
>>> check_with_metadata = pa.Check(
...     lambda x: True,
...     title="Always passes",
...     description="This check always passes."
... )
>>>
>>> # specify assertions across categorical variables using `groupby`,
>>> # for example, make sure the mean measure for group "A" is always
>>> # larger than the mean measure for group "B"
>>> check_by_group = pa.Check(
...     lambda measures: measures["A"].mean() > measures["B"].mean(),
...     groupby=["group"],
... )
>>>
>>> # define a wide DataFrame-level check
>>> check_dataframe = pa.Check(
...     lambda df: df["measure_1"] > df["measure_2"])
>>>
>>> measure_checks = [check_positive, check_even, check_by_group]
>>>
>>> schema = pa.DataFrameSchema(
...     columns={
...         "measure_1": pa.Column(int, checks=measure_checks),
...         "measure_2": pa.Column(int, checks=measure_checks),
...         "group": pa.Column(str),
...     },
...     checks=check_dataframe
... )
>>>
>>> df = pd.DataFrame({
...     "measure_1": [10, 12, 14, 16],
...     "measure_2": [2, 4, 6, 8],
...     "group": ["B", "B", "A", "A"]
... })
>>>
>>> schema.validate(df)[["measure_1", "measure_2", "group"]]
    measure_1  measure_2 group
0         10          2     B
1         12          4     B
2         14          6     A
3         16          8     A

See here for more usage details.

Attributes

`BACKEND_REGISTRY`
`CHECK_FUNCTION_REGISTRY`
`REGISTERED_CUSTOM_CHECKS`

Methods

`__init__`	Apply a validation function to a data object.
`between`	Alias of `in_range()`
`eq`	Alias of `equal_to()`
`equal_to`	Ensure all elements of a data container equal a certain value.
`ge`	Alias of `greater_than_or_equal_to()`
`greater_than`	Ensure values of a data container are strictly greater than a minimum value.
`greater_than_or_equal_to`	Ensure all values are greater than or equal to a certain value.
`gt`	Alias of `greater_than()`
`in_range`	Ensure all values of a series are within an interval.
`isin`	Ensure only allowed values occur within a series.
`le`	Alias of `less_than_or_equal_to()`
`less_than`	Ensure values of a series are strictly below a maximum value.
`less_than_or_equal_to`	Ensure values of a series are less than or equal to a maximum value.
`lt`	Alias of `less_than()`
`ne`	Alias of `not_equal_to()`
`not_equal_to`	Ensure no elements of a data container equal a certain value.
`notin`	Ensure some defined values don't occur within a series.
`str_contains`	Ensure that a pattern can be found within each row.
`str_endswith`	Ensure that all values end with a certain string.
`str_length`	Ensure that the length of strings is within a specified range.
`str_matches`	Ensure that string values match a regular expression.
`str_startswith`	Ensure that all values start with a certain string.
`unique_values_eq`	Ensure that unique values in the data object contain all values.
`__call__`	Validate pandas DataFrame or Series.