pydantic

Current Version: 0.4

Data validation and settings management using python 3.6 type hinting.

Define how data should be in pure, canonical python; validate it with pydantic.

PEP 484 introduced type hinting in python 3.5; PEP 526 extended that with syntax for variable annotations in python 3.6.

pydantic uses those annotations to validate that untrusted data takes the form you want.

Example:

from datetime import datetime
from typing import List
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None
    friends: List[int] = []

external_data = {'id': '123', 'signup_ts': '2017-06-01 12:22', 'friends': [1, '2', b'3']}
user = User(**external_data)
print(user)
# > User id=123 name='John Doe' signup_ts=datetime.datetime(2017, 6, 1, 12, 22) friends=[1, 2, 3]
print(user.id)
# > 123

(This script is complete, it should run “as is”)

What’s going on here:

  • id is of type int; the annotation-only declaration tells pydantic that this field is required. Strings, bytes or floats will be coerced to ints if possible; otherwise an exception is raised.
  • name is inferred as a string from its default; it is not required because it has a default.
  • signup_ts is a datetime field which is not required (it is None if not supplied). pydantic will process either a unix timestamp int (e.g. 1496498400) or a string representing the date & time.
  • friends uses python’s typing system; it must be a list of integers. As with id, integer-like objects will be converted to integers.
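
The difference between an annotation-only field and a field with a default can be seen with the standard library alone. The sketch below is not pydantic's implementation; UserSpec is a hypothetical plain class used only to show what a class body exposes for introspection:

```python
from typing import get_type_hints


class UserSpec:  # hypothetical plain class, not a pydantic model
    id: int            # annotation only: no class attribute is created
    name = 'John Doe'  # default only: no annotation is recorded


hints = get_type_hints(UserSpec)
# annotated names without a class attribute have no default, i.e. are required
required = [n for n in hints if not hasattr(UserSpec, n)]
# un-annotated defaults still carry a type that can be inferred from the value
inferred = {'name': type(UserSpec.name)}

print(hints)     # > {'id': <class 'int'>}
print(required)  # > ['id']
print(inferred)  # > {'name': <class 'str'>}
```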

If validation fails, pydantic will raise an error with a breakdown of what was wrong:

from pydantic import ValidationError
try:
    User(signup_ts='broken', friends=[1, 2, 'not number'])
except ValidationError as e:
    print(e.json())

"""
{
  "friends": [
    {
      "error_msg": "invalid literal for int() with base 10: 'not number'",
      "error_type": "ValueError",
      "index": 2,
      "track": "int"
    }
  ],
  "id": {
    "error_msg": "field required",
    "error_type": "Missing",
    "index": null,
    "track": null
  },
  "signup_ts": {
    "error_msg": "Invalid datetime format",
    "error_type": "ValueError",
    "index": null,
    "track": "datetime"
  }
}
"""

Rationale

So pydantic uses some cool new language features, but why should I actually go and use it?

no brainfuck
no new schema definition micro-language to learn. If you know python (and perhaps skim read the type hinting docs) you know how to use pydantic.
plays nicely with your IDE/linter/brain
because pydantic data structures are just instances of classes you define; auto-completion, linting, mypy and your intuition should all work properly with your validated data.
dual use
pydantic’s BaseSettings class allows it to be used in both a “validate this request data” context and a “load my system settings” context. The main difference is that system settings can have defaults changed by environment variables, and more complex objects like DSNs and python objects are often required.
fast
In benchmarks pydantic is around twice as fast as trafaret. Other comparisons to cerberus, marshmallow, DRF, jsonmodels etc. to come.
validate complex structures
use of recursive pydantic models and typing’s List and Dict etc. allow complex data schemas to be clearly and easily defined.
extendible
pydantic allows custom data types to be defined or you can extend validation with the clean_* methods on a model.

Install

Just:

pip install pydantic

pydantic has no required dependencies except python 3.6+. If you’ve got python 3.6 and pip installed - you’re good to go.

If you want pydantic to parse msgpack you can add msgpack-python as an optional dependency, same goes for reading json faster with ujson:

pip install pydantic[msgpack]
# or
pip install pydantic[ujson]
# or just
pip install pydantic[msgpack,ujson]

Usage

PEP 484 Types

pydantic uses typing types to define more complex objects.

from typing import Dict, List, Optional, Union, Set

from pydantic import BaseModel


class Model(BaseModel):
    simple_list: list = None
    list_of_ints: List[int] = None

    simple_dict: dict = None
    dict_str_float: Dict[str, float] = None

    simple_set: set = None
    set_bytes: Set[bytes] = None

    str_or_bytes: Union[str, bytes] = None
    none_or_str: Optional[str] = None

    compound: Dict[Union[str, bytes], List[Set[int]]] = None

print(Model(simple_list=['1', '2', '3']).simple_list)  # > ['1', '2', '3']
print(Model(list_of_ints=['1', '2', '3']).list_of_ints)  # > [1, 2, 3]

print(Model(simple_dict={'a': 1, b'b': 2}).simple_dict)  # > {'a': 1, b'b': 2}
print(Model(dict_str_float={'a': 1, b'b': 2}).dict_str_float)  # > {'a': 1.0, 'b': 2.0}

(This script is complete, it should run “as is”)
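
To give a feel for how nested typing hints can drive coercion, here is a minimal standard-library sketch. It is an illustration only, not how pydantic is implemented; coerce is a hypothetical helper, and typing.get_origin and typing.get_args require python 3.8+:

```python
from typing import Dict, List, Set, Union, get_args, get_origin


def coerce(value, hint):
    """Hypothetical helper: recursively coerce value to match a typing hint."""
    origin = get_origin(hint)  # e.g. list for List[int], None for plain int
    args = get_args(hint)      # e.g. (int,) for List[int]
    if origin is list:
        return [coerce(v, args[0]) for v in value]
    if origin is set:
        return {coerce(v, args[0]) for v in value}
    if origin is dict:
        return {coerce(k, args[0]): coerce(v, args[1]) for k, v in value.items()}
    if origin is Union:
        # try each member of the union in order, keep the first that works
        for option in args:
            try:
                return coerce(value, option)
            except (TypeError, ValueError):
                continue
        raise ValueError(f'{value!r} does not match {hint}')
    if hint is str and isinstance(value, bytes):
        return value.decode()
    return hint(value)  # plain types such as int, float, list, dict


print(coerce(['1', '2', '3'], List[int]))
# > [1, 2, 3]
print(coerce({'a': 1, b'b': 2}, Dict[str, float]))
# > {'a': 1.0, 'b': 2.0}
```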

Choices

pydantic uses python’s standard enum classes to define choices.

from enum import Enum, IntEnum

from pydantic import BaseModel


class FruitEnum(str, Enum):
    pear = 'pear'
    banana = 'banana'


class ToolEnum(IntEnum):
    spanner = 1
    wrench = 2


class CookingModel(BaseModel):
    fruit: FruitEnum = FruitEnum.pear
    tool: ToolEnum = ToolEnum.spanner


print(CookingModel())
# > CookingModel fruit=<FruitEnum.pear: 'pear'> tool=<ToolEnum.spanner: 1>
print(CookingModel(tool=2, fruit='banana'))
# > CookingModel fruit=<FruitEnum.banana: 'banana'> tool=<ToolEnum.wrench: 2>
print(CookingModel(fruit='other'))
# will raise a validation error

(This script is complete, it should run “as is”)
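
The accept/reject behaviour above mirrors python's own enum lookup by value. The sketch below uses only the standard library; that pydantic performs the lookup in exactly this way is an assumption:

```python
from enum import Enum, IntEnum


class FruitEnum(str, Enum):
    pear = 'pear'
    banana = 'banana'


class ToolEnum(IntEnum):
    spanner = 1
    wrench = 2


# calling the enum class with a raw value returns the matching member
fruit = FruitEnum('banana')
tool = ToolEnum(2)
print(repr(fruit))
# > <FruitEnum.banana: 'banana'>
print(repr(tool))
# > <ToolEnum.wrench: 2>

# a value that matches no member raises ValueError
try:
    FruitEnum('other')
except ValueError as e:
    rejected = str(e)
    print(rejected)
```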

Recursive Models

More complex hierarchical data structures can be defined using models as types in annotations themselves.

The ellipsis ... just means “Required”, the same as the annotation-only declarations above.

from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    count: int = ...
    size: float = None

class Bar(BaseModel):
    apple = 'x'
    banana = 'y'

class Spam(BaseModel):
    foo: Foo = ...
    bars: List[Bar] = ...


m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
print(m)
# > Spam foo=<Foo count=4 size=None> bars=[<Bar apple='x1' banana='y'>, <Bar apple='x2' banana='y'>]
print(m.values())
# {'foo': {'count': 4, 'size': None}, 'bars': [{'apple': 'x1', 'banana': 'y'}, {'apple': 'x2', 'banana': 'y'}]}

(This script is complete, it should run “as is”)

Error Handling

from typing import List

from pydantic import BaseModel, ValidationError


class Location(BaseModel):
    lat = 0.1
    lng = 10.1

class Model(BaseModel):
    list_of_ints: List[int] = None
    a_float: float = None
    is_required: float = ...
    recursive_model: Location = None

try:
    Model(list_of_ints=['1', 2, 'bad'], a_float='not a float', recursive_model={'lat': 4.2, 'lng': 'New York'})
except ValidationError as e:
    print(e)

"""
4 errors validating input
list_of_ints:
  invalid literal for int() with base 10: 'bad' (error_type=ValueError track=int index=2)
a_float:
  could not convert string to float: 'not a float' (error_type=ValueError track=float)
is_required:
  field required (error_type=Missing)
recursive_model:
  error validating input (error_type=ValidationError track=Location)
    lng:
      could not convert string to float: 'New York' (error_type=ValueError track=float)
"""

try:
    Model(list_of_ints=1, a_float=None, recursive_model=[1, 2, 3])
except ValidationError as e:
    print(e.json())

"""
{
  "is_required": {
    "error_msg": "field required",
    "error_type": "Missing",
    "index": null,
    "track": null
  },
  "list_of_ints": {
    "error_msg": "'int' object is not iterable",
    "error_type": "TypeError",
    "index": null,
    "track": null
  },
  "recursive_model": {
    "error_msg": "cannot convert dictionary update sequence element #0 to a sequence",
    "error_type": "TypeError",
    "index": null,
    "track": "Location"
  }
}
"""

(This script is complete, it should run “as is”)
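
The key idea in the output above is that every field is checked and all failures are reported together, rather than stopping at the first problem. A minimal standard-library sketch of that pattern follows; validate is a hypothetical helper, not part of pydantic, and it only borrows the error_msg and error_type keys from the output shown above:

```python
import json


def validate(data, schema):
    """Hypothetical helper: convert each field, collecting every error."""
    values, errors = {}, {}
    for name, convert in schema.items():
        if name not in data:
            errors[name] = {'error_msg': 'field required', 'error_type': 'Missing'}
            continue
        try:
            values[name] = convert(data[name])
        except (TypeError, ValueError) as exc:
            # record the failure and carry on with the remaining fields
            errors[name] = {'error_msg': str(exc), 'error_type': type(exc).__name__}
    return values, errors


schema = {'a_float': float, 'is_required': float, 'count': int}
values, errors = validate({'a_float': 'not a float', 'count': '3'}, schema)
print(values)
print(json.dumps(errors, indent=2, sort_keys=True))
```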

Exotic Types

pydantic comes with a number of utilities for parsing or validating common objects.

from pathlib import Path

from pydantic import DSN, BaseModel, EmailStr, NameEmail, PyObject, conint, constr, PositiveInt, NegativeInt


class Model(BaseModel):
    cos_function: PyObject = None
    path_to_something: Path = None

    short_str: constr(min_length=2, max_length=10) = None
    regex_str: constr(regex='apple (pie|tart|sandwich)') = None

    big_int: conint(gt=1000, lt=1024) = None
    pos_int: PositiveInt = None
    neg_int: NegativeInt = None

    email_address: EmailStr = None
    email_and_name: NameEmail = None

    db_name = 'foobar'
    db_user = 'postgres'
    db_password: str = None
    db_host = 'localhost'
    db_port = '5432'
    db_driver = 'postgres'
    db_query: dict = None
    dsn: DSN = None

m = Model(
    cos_function='math.cos',
    path_to_something='/home',
    short_str='foo',
    regex_str='apple pie',
    big_int=1001,
    pos_int=1,
    neg_int=-1,
    email_address='Samuel Colvin <s@muelcolvin.com>',
    email_and_name='Samuel Colvin <s@muelcolvin.com>',
)
print(m.values())
"""
{
    'cos_function': <built-in function cos>,
    'path_to_something': PosixPath('/home'),
    'short_str': 'foo',
    'regex_str': 'apple pie',
    'big_int': 1001,
    'pos_int': 1,
    'neg_int': -1,
    'email_address': 's@muelcolvin.com',
    'email_and_name': <NameEmail("Samuel Colvin <s@muelcolvin.com>")>,
    ...
    'dsn': 'postgres://postgres@localhost:5432/foobar'
}
"""

(This script is complete, it should run “as is”)
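
Turning a dotted path such as 'math.cos' into the object it names, as the cos_function field does above, can be done with importlib. This is a minimal sketch under the assumption that the last dotted component is an attribute of the module before it; import_string is a hypothetical helper name, not necessarily what pydantic uses internally:

```python
import math
from importlib import import_module


def import_string(dotted_path):
    """Hypothetical helper: resolve 'package.module.attr' to the attribute."""
    module_path, _, attr = dotted_path.rpartition('.')
    if not module_path:
        raise ImportError(f'"{dotted_path}" does not look like a dotted path')
    return getattr(import_module(module_path), attr)


cos = import_string('math.cos')
print(cos is math.cos)
# > True
print(cos(0))
# > 1.0
```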

Helper Functions

Pydantic provides three classmethod helper functions on models for parsing data:

parse_obj: this is almost identical to the __init__ method of the model, except that if the object passed is not a dict a ValidationError will be raised (rather than python raising a TypeError).
parse_raw: takes a str or bytes, parses it as json, msgpack or pickle data and then passes the result to parse_obj. The data type is inferred from the content_type argument; otherwise json is assumed.
parse_file: reads a file and passes the contents to parse_raw. If content_type is omitted, it is inferred from the file’s extension.

import pickle
from pathlib import Path
from datetime import datetime
import msgpack
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

m = User.parse_obj({'id': 123, 'name': 'James'})
print(m)
# > User id=123 name='James' signup_ts=None

try:
    User.parse_obj(['not', 'a', 'dict'])
except ValidationError as e:
    print(e)
# > error validating input
# > User expected dict not list (error_type=TypeError)

m = User.parse_raw('{"id": 123, "name": "James"}')  # assumes json as no content type passed
print(m)
# > User id=123 name='James' signup_ts=None

msgpack_data = msgpack.packb({'id': 123, 'name': 'James', 'signup_ts': 1500000000})
m = User.parse_raw(msgpack_data, content_type='application/msgpack')
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 2, 40, tzinfo=datetime.timezone.utc)


pickle_data = pickle.dumps({'id': 123, 'name': 'James', 'signup_ts': datetime(2017, 7, 14)})
m = User.parse_raw(pickle_data, content_type='application/pickle', allow_pickle=True)
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 0, 0)


Path('/tmp/data.mp').write_bytes(msgpack_data)
# /tmp/data.mp now holds the msgpack data; no content_type is passed, so it is inferred from the file extension
m = User.parse_file('/tmp/data.mp')
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 2, 40, tzinfo=datetime.timezone.utc)

(This script is complete, it should run “as is” provided msgpack-python is installed)

Note

Since pickle allows complex objects to be encoded, to use it you need to explicitly pass allow_pickle to the parsing function.
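
The reason for the explicit opt-in is that unpickling untrusted data can execute arbitrary code. The following standard-library sketch shows the general shape of such a guard; load_raw is a hypothetical function that only handles json and pickle, and it is not pydantic's actual code:

```python
import json
import pickle


def load_raw(data, content_type=None, allow_pickle=False):
    """Hypothetical helper: decode raw data according to its content type."""
    if content_type is None or content_type.endswith('json'):
        # json is the default when no content type is given
        if isinstance(data, bytes):
            data = data.decode()
        return json.loads(data)
    if content_type.endswith('pickle'):
        if not allow_pickle:
            raise RuntimeError('pickle data refused, pass allow_pickle=True')
        return pickle.loads(data)
    raise TypeError(f'unknown content type: {content_type}')


print(load_raw('{"id": 123, "name": "James"}'))
# > {'id': 123, 'name': 'James'}

blob = pickle.dumps({'id': 123})
print(load_raw(blob, content_type='application/pickle', allow_pickle=True))
# > {'id': 123}
```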

Model Config

Behaviour of pydantic can be controlled via the Config class on a model.

Options:

min_anystr_length: min length for str & byte types (default: 0)
max_anystr_length: max length for str & byte types (default: 2 ** 16)
min_number_size: min size for numbers (default: -2 ** 64)
max_number_size: max size for numbers (default: 2 ** 64)
validate_all: whether or not to validate field defaults (default: False)
ignore_extra: whether to ignore any extra values in input data (default: True)
allow_extra: whether or not to allow (and include on the model) any extra values in input data (default: False)
allow_mutation: whether attributes can be changed after creation; set to False to make models faux-immutable, i.e. __setattr__ fails (default: True)
fields: extra information on each field, currently just “alias” is allowed (default: None)

from pydantic import BaseModel, ValidationError


class Model(BaseModel):
    v: str

    class Config:
        max_anystr_length = 10


try:
    Model(v='x' * 20)
except ValidationError as e:
    print(e)
"""
error validating input
v:
  length not in range 0 to 10 (error_type=ValueError track=str)
"""

(This script is complete, it should run “as is”)

Settings

One of pydantic’s most useful applications is to define default settings and allow them to be overridden by environment variables or keyword arguments (e.g. in unit tests).

This usage example comes last as it uses numerous concepts described above.

from pydantic import DSN, BaseSettings, PyObject


class Settings(BaseSettings):
    redis_host = 'localhost'
    redis_port = 6379
    redis_database = 0
    redis_password: str = None

    auth_key: str = ...

    invoicing_cls: PyObject = 'path.to.Invoice'

    db_name = 'foobar'
    db_user = 'postgres'
    db_password: str = None
    db_host = 'localhost'
    db_port = '5432'
    db_driver = 'postgres'
    db_query: dict = None
    dsn: DSN = None

    class Config:
        env_prefix = 'MY_PREFIX_'  # defaults to 'APP_'
        fields = {
            'auth_key': {
                'alias': 'my_api_key'
            }
        }

(This script is complete, it should run “as is”)

Here redis_port could be modified via export MY_PREFIX_REDIS_PORT=6380 or auth_key by export my_api_key=6380.
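
The override mechanism can be pictured with a short standard-library sketch. It assumes that the variable name is the prefix plus the upper-cased field name and that the string from the environment is coerced to the type of the default; load_settings is a hypothetical helper, not BaseSettings itself:

```python
import os

DEFAULTS = {'redis_host': 'localhost', 'redis_port': 6379}


def load_settings(defaults, env_prefix='MY_PREFIX_', environ=None):
    """Hypothetical helper: apply environment overrides to default settings."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in defaults.items():
        raw = environ.get(env_prefix + name.upper())
        # environment values are strings, so coerce to the default's type
        settings[name] = default if raw is None else type(default)(raw)
    return settings


# a plain dict stands in for os.environ to keep the example deterministic
print(load_settings(DEFAULTS, environ={'MY_PREFIX_REDIS_PORT': '6380'}))
# > {'redis_host': 'localhost', 'redis_port': 6380}
```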

Usage with mypy

Pydantic works with mypy provided you use the “annotation only” version of required variables:

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, NoneStr

class Model(BaseModel):
    age: int
    first_name = 'John'
    last_name: NoneStr = None
    signup_ts: Optional[datetime] = None
    list_of_ints: List[int]

m = Model(age=42, list_of_ints=[1, '2', b'3'])
print(m.age)
# > 42

Model()
# will raise a validation error for age and list_of_ints

This script is complete, it should run “as is”. You can also run it through mypy with:

mypy --ignore-missing-imports --follow-imports=skip --strict-optional pydantic_mypy_test.py

Strict Optional

For your code to pass with --strict-optional you need to use Optional[] or an alias of Optional[] for all fields with a None default; this is standard with mypy.

Pydantic provides a few useful optional or union types:

  • NoneStr aka. Optional[str]
  • NoneBytes aka. Optional[bytes]
  • StrBytes aka. Union[str, bytes]
  • NoneStrBytes aka. Optional[StrBytes]

If these aren’t sufficient you can of course define your own.
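
Defining your own is a one-line assignment using typing. The alias names below (NoneInt, IntFloat, NoneDictStrInt) are hypothetical and are not provided by pydantic:

```python
from typing import Dict, Optional, Union

# hypothetical aliases in the same style as NoneStr and StrBytes
NoneInt = Optional[int]
IntFloat = Union[int, float]
NoneDictStrInt = Optional[Dict[str, int]]

# Optional[X] is simply shorthand for Union[X, None]
print(NoneInt == Union[int, None])
# > True
```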

Required Fields and mypy

The ellipsis notation ... will not work with mypy; you need to use annotation-only fields as in the example above.

Warning

Be aware that using annotation only fields will alter the order of your fields in metadata and errors: annotation only fields will always come first, but still in the order they were defined.

To get round this you can use Required (via from pydantic import Required) as an alias for the ellipsis or an annotation-only declaration.

Faux Immutability

Models can be configured to be immutable via allow_mutation = False; this will prevent attributes of a model from being changed.

Warning

Immutability in python is never strict. If developers are determined/stupid they can always modify a so-called “immutable” object.

from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: dict

    class Config:
        allow_mutation = False


foobar = FooBarModel(a='hello', b={'apple': 'pear'})

try:
    foobar.a = 'different'
except TypeError as e:
    print(e)
    # > "FooBarModel" is immutable and does not support item assignment

print(foobar.a)
# > hello

print(foobar.b)
# > {'apple': 'pear'}

foobar.b['apple'] = 'grape'
print(foobar.b)
# > {'apple': 'grape'}

Trying to change a caused an error and it remains unchanged; however, the dict b is mutable and the immutability of foobar doesn’t stop b being changed.
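
Why this immutability is only “faux” is easy to show with a plain class. The sketch below is a standard-library illustration, not pydantic's implementation; Frozen is a hypothetical class that blocks assignment in __setattr__:

```python
class Frozen:  # hypothetical class, not a pydantic model
    def __init__(self, **values):
        # updating __dict__ directly does not go through __setattr__
        self.__dict__.update(values)

    def __setattr__(self, name, value):
        raise TypeError(f'"{type(self).__name__}" is immutable')


f = Frozen(a='hello', b={'apple': 'pear'})

try:
    f.a = 'different'
except TypeError as e:
    error = str(e)
    print(error)
    # > "Frozen" is immutable

# the attribute cannot be rebound, but the dict it refers to can still change
f.b['apple'] = 'grape'
print(f.b)
# > {'apple': 'grape'}

# a determined developer can always bypass the check
object.__setattr__(f, 'a', 'forced')
print(f.a)
# > forced
```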

Copy and Values

The values function returns a dict containing the attributes of a model; sub-models are recursively converted to dicts.

copy allows models to be duplicated, which is particularly useful for immutable models.

Both values and copy take the optional include and exclude keyword arguments to control which attributes are returned/copied. copy also accepts an extra keyword argument, update, allowing attributes to be modified as the model is duplicated.

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int


class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel


m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(m.values())
# > {'banana': 3.14, 'foo': 'hello', 'bar': {'whatever': 123}}

print(m.values(include={'foo', 'bar'}))
# > {'foo': 'hello', 'bar': {'whatever': 123}}

print(m.values(exclude={'foo', 'bar'}))
# > {'banana': 3.14}

print(m.copy())
# > FooBarModel banana=3.14 foo='hello' bar=<BarModel whatever=123>

print(m.copy(include={'foo', 'bar'}))
# > FooBarModel foo='hello' bar=<BarModel whatever=123>

print(m.copy(exclude={'foo', 'bar'}))
# > FooBarModel banana=3.14

print(m.copy(update={'banana': 0}))
# > FooBarModel banana=0 foo='hello' bar=<BarModel whatever=123>

Pickle

Using the same plumbing as copy(), pydantic models support efficient pickling and unpickling.

import pickle
from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: int


m = FooBarModel(a='hello', b=123)
print(m)
# > FooBarModel a='hello' b=123

data = pickle.dumps(m)
print(data)
# > b'\x80\x03c...'

m2 = pickle.loads(data)
print(m2)
# > FooBarModel a='hello' b=123

History

v0.4.0 (2017-07-08)

  • show length in string validation error
  • fix aliases in config during inheritance #55
  • simplify error display
  • use unicode ellipsis in truncate
  • add parse_obj, parse_raw and parse_file helper functions #58
  • switch annotation only fields to come first in fields list not last

v0.3.0 (2017-06-21)

  • immutable models via config.allow_mutation = False, associated cleanup and performance improvement #44
  • immutable helper methods construct() and copy() #53
  • allow pickling of models #53
  • setattr is removed as __setattr__ is now intelligent #44
  • raise_exception removed, Models now always raise exceptions #44
  • instance method validators removed
  • django-restful-framework benchmarks added #47
  • fix inheritance bug #49
  • make str type stricter so list, dict etc are not coerced to strings. #52
  • add StrictStr which only allows strings as input #52

v0.2.1 (2017-06-07)

  • pypi and travis together messed up the deploy of v0.2 this should fix it

v0.2.0 (2017-06-07)

  • breaking change: values() on a model is now a method not a property, takes include and exclude arguments
  • allow annotation only fields to support mypy
  • add pretty to_string(pretty=True) method for models

v0.1.0 (2017-06-03)

  • add docs
  • add history