Data Validation with GeoPandas
new in 0.9.0
GeoPandas is an extension of Pandas that adds support for geospatial data. You can use pandera to validate GeoDataFrame and GeoSeries objects directly. First, install pandera with the geopandas extra:
pip install pandera[geopandas]
Then you can use pandera schemas to validate GeoDataFrames. In the example below we'll use the object-based API to define a DataFrameSchema for validation.
import geopandas as gpd
import pandas as pd
import pandera as pa
from shapely.geometry import Polygon
geo_schema = pa.DataFrameSchema({
    "geometry": pa.Column("geometry"),
    "region": pa.Column(str),
})
geo_df = gpd.GeoDataFrame({
    "geometry": [
        Polygon(((0, 0), (0, 1), (1, 1), (1, 0))),
        Polygon(((0, 0), (0, -1), (-1, -1), (-1, 0))),
    ],
    "region": ["NA", "SA"],
})
print(geo_schema.validate(geo_df))
geometry region
0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1.... NA
1 POLYGON ((0.00000 0.00000, 0.00000 -1.00000, -... SA
You can also use the GeometryDtype data type in either instantiated or un-instantiated form:
geo_schema = pa.DataFrameSchema({
    # un-instantiated form
    "geometry": pa.Column(gpd.array.GeometryDtype),
    # or, equivalently, the instantiated form (use one entry, not both):
    # "geometry": pa.Column(gpd.array.GeometryDtype()),
})
If you want to validate on instantiation, you can use the GeoDataFrame generic type with a DataFrameModel such as the one defined below:
from pandera.typing import Series
from pandera.typing.geopandas import GeoDataFrame, GeoSeries
class Schema(pa.DataFrameModel):
    geometry: GeoSeries
    region: Series[str]

# create a geodataframe that's validated on object initialization
df = GeoDataFrame[Schema](
    {
        'geometry': [
            Polygon(((0, 0), (0, 1), (1, 1), (1, 0))),
            Polygon(((0, 0), (0, -1), (-1, -1), (-1, 0))),
        ],
        'region': ['NA', 'SA'],
    }
)
print(df)
geometry region
0 POLYGON ((0.00000 0.00000, 0.00000 1.00000, 1.... NA
1 POLYGON ((0.00000 0.00000, 0.00000 -1.00000, -... SA