pydantic

Current Version: v0.6.3

Data validation and settings management using python 3.6 type hinting.

Define how data should be in pure, canonical python; validate it with pydantic.

PEP 484 introduced type hinting into python 3.5; PEP 526 extended that with syntax for variable annotations in python 3.6.

pydantic uses those annotations to validate that untrusted data takes the form you want.

Example:

from datetime import datetime
from typing import List
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None
    friends: List[int] = []

external_data = {'id': '123', 'signup_ts': '2017-06-01 12:22', 'friends': [1, '2', b'3']}
user = User(**external_data)
print(user)
# > User id=123 name='John Doe' signup_ts=datetime.datetime(2017, 6, 1, 12, 22) friends=[1, 2, 3]
print(user.id)
# > 123

(This script is complete, it should run “as is”)

What’s going on here:

  • id is of type int; the annotation only declaration tells pydantic that this field is required. Strings, bytes or floats will be coerced to ints if possible; otherwise an exception will be raised.
  • name is inferred to be a string from its default; it is not required as it has a default.
  • signup_ts is a datetime field which is not required (it defaults to None if not supplied); pydantic will process either a unix timestamp int (e.g. 1496498400) or a string representing the date & time.
  • friends uses python’s typing system and is required to be a list of integers; as with id, integer-like objects will be converted to integers.

If validation fails, pydantic will raise an error with a breakdown of what was wrong:

from pydantic import ValidationError
try:
    User(signup_ts='broken', friends=[1, 2, 'not number'])
except ValidationError as e:
    print(e.json())

"""
{
  "friends": [
    {
      "error_msg": "invalid literal for int() with base 10: 'not number'",
      "error_type": "ValueError",
      "index": 2,
      "track": "int"
    }
  ],
  "id": {
    "error_msg": "field required",
    "error_type": "Missing"
  },
  "signup_ts": {
    "error_msg": "Invalid datetime format",
    "error_type": "ValueError",
    "track": "datetime"
  }
}
"""

Rationale

So pydantic uses some cool new language features, but why should I actually go and use it?

no brainfuck
no new schema definition micro-language to learn. If you know python (and perhaps skim read the type hinting docs) you know how to use pydantic.
plays nicely with your IDE/linter/brain
because pydantic data structures are just instances of classes you define; auto-completion, linting, mypy and your intuition should all work properly with your validated data.
dual use
pydantic’s BaseSettings class allows it to be used in both a “validate this request data” context and a “load my system settings” context. The main differences are that system settings can have their defaults overridden by environment variables, and that more complex objects like DSNs and python objects are often required.
fast
In benchmarks pydantic is faster than all other tested libraries.
validate complex structures
use of recursive pydantic models, typing’s List and Dict etc. and validators allows complex data schemas to be clearly and easily defined and then checked.
extendible
pydantic allows custom data types to be defined or you can extend validation with methods on a model decorated with the validator decorator.

Install

Just:

pip install pydantic

pydantic has no required dependencies except python 3.6+. If you’ve got python 3.6 and pip installed - you’re good to go.

If you want pydantic to parse msgpack you can add msgpack-python as an optional dependency; the same goes for reading json faster with ujson:

pip install pydantic[msgpack]
# or
pip install pydantic[ujson]
# or just
pip install pydantic[msgpack,ujson]

Usage

PEP 484 Types

pydantic uses typing types to define more complex objects.

from typing import Dict, List, Optional, Union, Set

from pydantic import BaseModel


class Model(BaseModel):
    simple_list: list = None
    list_of_ints: List[int] = None

    simple_dict: dict = None
    dict_str_float: Dict[str, float] = None

    simple_set: set = None
    set_bytes: Set[bytes] = None

    str_or_bytes: Union[str, bytes] = None
    none_or_str: Optional[str] = None

    compound: Dict[Union[str, bytes], List[Set[int]]] = None

print(Model(simple_list=['1', '2', '3']).simple_list)  # > ['1', '2', '3']
print(Model(list_of_ints=['1', '2', '3']).list_of_ints)  # > [1, 2, 3]

print(Model(simple_dict={'a': 1, b'b': 2}).simple_dict)  # > {'a': 1, b'b': 2}
print(Model(dict_str_float={'a': 1, b'b': 2}).dict_str_float)  # > {'a': 1.0, 'b': 2.0}

(This script is complete, it should run “as is”)

Choices

pydantic uses python’s standard enum classes to define choices.

from enum import Enum, IntEnum

from pydantic import BaseModel


class FruitEnum(str, Enum):
    pear = 'pear'
    banana = 'banana'


class ToolEnum(IntEnum):
    spanner = 1
    wrench = 2


class CookingModel(BaseModel):
    fruit: FruitEnum = FruitEnum.pear
    tool: ToolEnum = ToolEnum.spanner


print(CookingModel())
# > CookingModel fruit=<FruitEnum.pear: 'pear'> tool=<ToolEnum.spanner: 1>
print(CookingModel(tool=2, fruit='banana'))
# > CookingModel fruit=<FruitEnum.banana: 'banana'> tool=<ToolEnum.wrench: 2>
print(CookingModel(fruit='other'))
# will raise a validation error

(This script is complete, it should run “as is”)

Validators

Custom validation and complex relationships between objects can be achieved using the validator decorator.

from pydantic import BaseModel, ValidationError, validator


class UserModel(BaseModel):
    name: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v


print(UserModel(name='samuel colvin', password1='zxcvbn', password2='zxcvbn'))
# > UserModel name='Samuel Colvin' password1='zxcvbn' password2='zxcvbn'

try:
    UserModel(name='samuel', password1='zxcvbn', password2='zxcvbn2')
except ValidationError as e:
    print(e)
"""
2 errors validating input
name:
  must contain a space (error_type=ValueError track=str)
password2:
  passwords do not match (error_type=ValueError track=str)
"""

(This script is complete, it should run “as is”)

A few things to note on validators:

  • validators are “class methods”: the first value they receive here will be the UserModel class, not an instance of UserModel
  • their signature can be either (cls, value) or (cls, value, *, values, config, field) (see the sketch after this list)
  • validators should either return the new value or raise a ValueError or TypeError
  • where validators rely on other values, you should be aware that:
    • Validation is done in the order fields are defined, eg. here password2 has access to password1 (and name), but password1 does not have access to password2. You should heed the warning below regarding field order and required fields.
    • If validation fails on another field (or that field is missing) it will not be included in values, hence if 'password1' in values and ... in this example.
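
As a rough illustration of the longer signature, here is a minimal sketch (OrderModel and its fields are invented for this example; they are not part of pydantic):

from pydantic import BaseModel, validator


class OrderModel(BaseModel):
    quantity: int
    total: float

    @validator('total')
    def total_covers_quantity(cls, v, *, values, config, field):
        # values holds the already-validated fields (here: quantity),
        # field describes the field being validated, config is the model config
        if 'quantity' in values and v < values['quantity']:
            raise ValueError('total must be at least as large as quantity')
        return v


print(OrderModel(quantity=2, total=10))
# should print something like: OrderModel quantity=2 total=10.0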

Pre and Whole Validators

Validators can do a few more complex things:

import json
from typing import List, Set

from pydantic import BaseModel, ValidationError, validator


class DemoModel(BaseModel):
    numbers: List[int] = []
    people: List[str] = []

    @validator('people', 'numbers', pre=True, whole=True)
    def json_decode(cls, v):
        if isinstance(v, str):
            try:
                return json.loads(v)
            except ValueError:
                pass
        return v

    @validator('numbers')
    def check_numbers_low(cls, v):
        if v > 4:
            raise ValueError(f'number too large {v} > 4')
        return v

    @validator('numbers', whole=True)
    def check_sum_numbers_low(cls, v):
        if sum(v) > 8:
            raise ValueError('sum of numbers greater than 8')
        return v


print(DemoModel(numbers='[1, 1, 2, 2]'))
# > DemoModel numbers=[1, 1, 2, 2] people=[]

try:
    DemoModel(numbers='[1, 2, 5]')
except ValidationError as e:
    print(e)
"""
error validating input
numbers:
  number too large 5 > 4 (error_type=ValueError track=int index=2)
"""

try:
    DemoModel(numbers=[3, 3, 3])
except ValidationError as e:
    print(e)
"""
error validating input
numbers:
  sum of numbers greater than 8 (error_type=ValueError track=int)
"""

(This script is complete, it should run “as is”)

A few more things to note:

  • a single validator can apply to multiple fields
  • the keyword argument pre will cause validators to be called prior to other validation
  • the whole keyword argument will mean validators are applied to entire objects rather than individual values (applies for complex typing objects eg. List, Dict, Set)

Validate Always

For performance reasons, validators are not called by default for fields where no value is supplied. However, there are situations where it’s useful or required to always call the validator, e.g. to set a dynamic default value.

from datetime import datetime

from pydantic import BaseModel, validator


class DemoModel(BaseModel):
    ts: datetime = None

    @validator('ts', pre=True, always=True)
    def set_ts_now(cls, v):
        return v or datetime.now()


print(DemoModel())
# > DemoModel ts=datetime.datetime(2017, 11, 8, 13, 59, 11, 723629)

print(DemoModel(ts='2017-11-08T14:00'))
# > DemoModel ts=datetime.datetime(2017, 11, 8, 14, 0)

(This script is complete, it should run “as is”)

You’ll often want to use this together with pre, since otherwise with always=True pydantic would try to validate the default None, which would cause an error.

Recursive Models

More complex hierarchical data structures can be defined using models as types in annotations themselves.

The ellipsis ... just means “Required”, the same as the annotation only declarations above.

from typing import List
from pydantic import BaseModel

class Foo(BaseModel):
    count: int = ...
    size: float = None

class Bar(BaseModel):
    apple = 'x'
    banana = 'y'

class Spam(BaseModel):
    foo: Foo = ...
    bars: List[Bar] = ...


m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
print(m)
# > Spam foo=<Foo count=4 size=None> bars=[<Bar apple='x1' banana='y'>, <Bar apple='x2' banana='y'>]
print(m.dict())
# {'foo': {'count': 4, 'size': None}, 'bars': [{'apple': 'x1', 'banana': 'y'}, {'apple': 'x2', 'banana': 'y'}]}

(This script is complete, it should run “as is”)

Error Handling

from typing import List

from pydantic import BaseModel, ValidationError


class Location(BaseModel):
    lat = 0.1
    lng = 10.1

class Model(BaseModel):
    list_of_ints: List[int] = None
    a_float: float = None
    is_required: float = ...
    recursive_model: Location = None

try:
    Model(list_of_ints=['1', 2, 'bad'], a_float='not a float', recursive_model={'lat': 4.2, 'lng': 'New York'})
except ValidationError as e:
    print(e)

"""
4 errors validating input
list_of_ints:
  invalid literal for int() with base 10: 'bad' (error_type=ValueError track=int index=2)
a_float:
  could not convert string to float: 'not a float' (error_type=ValueError track=float)
is_required:
  field required (error_type=Missing)
recursive_model:
  error validating input (error_type=ValidationError track=Location)
    lng:
      could not convert string to float: 'New York' (error_type=ValueError track=float)
"""

try:
    Model(list_of_ints=1, a_float=None, recursive_model=[1, 2, 3])
except ValidationError as e:
    print(e.json())

"""
{
  "is_required": {
    "error_msg": "field required",
    "error_type": "Missing"
  },
  "list_of_ints": {
    "error_msg": "'int' object is not iterable",
    "error_type": "TypeError"
  },
  "recursive_model": {
    "error_msg": "cannot convert dictionary update sequence element #0 to a sequence",
    "error_type": "TypeError",
    "track": "Location"
  }
}
"""

(This script is complete, it should run “as is”)

Exotic Types

pydantic comes with a number of utilities for parsing or validating common objects.

from pathlib import Path
from uuid import UUID

from pydantic import (DSN, BaseModel, EmailStr, NameEmail, PyObject, conint,
                      constr, PositiveInt, NegativeInt)


class Model(BaseModel):
    cos_function: PyObject = None
    path_to_something: Path = None

    short_str: constr(min_length=2, max_length=10) = None
    regex_str: constr(regex='apple (pie|tart|sandwich)') = None

    big_int: conint(gt=1000, lt=1024) = None
    pos_int: PositiveInt = None
    neg_int: NegativeInt = None

    email_address: EmailStr = None
    email_and_name: NameEmail = None

    db_name = 'foobar'
    db_user = 'postgres'
    db_password: str = None
    db_host = 'localhost'
    db_port = '5432'
    db_driver = 'postgres'
    db_query: dict = None
    dsn: DSN = None
    uuid: UUID = None

m = Model(
    cos_function='math.cos',
    path_to_something='/home',
    short_str='foo',
    regex_str='apple pie',
    big_int=1001,
    pos_int=1,
    neg_int=-1,
    email_address='Samuel Colvin <s@muelcolvin.com>',
    email_and_name='Samuel Colvin <s@muelcolvin.com>',
    uuid='ebcdab58-6eb8-46fb-a190-d07a33e9eac8'
)
print(m.dict())
"""
{
    'cos_function': <built-in function cos>,
    'path_to_something': PosixPath('/home'),
    'short_str': 'foo', 'regex_str': 'apple pie',
    'big_int': 1001,
    'pos_int': 1,
    'neg_int': -1,
    'email_address': 's@muelcolvin.com',
    'email_and_name': <NameEmail("Samuel Colvin <s@muelcolvin.com>")>,
    ...
    'dsn': 'postgres://postgres@localhost:5432/foobar',
    'uuid': UUID('ebcdab58-6eb8-46fb-a190-d07a33e9eac8'),
}
"""

(This script is complete, it should run “as is”)

Helper Functions

Pydantic provides three classmethod helper functions on models for parsing data:

parse_obj: this is almost identical to the __init__ method of the model, except that if the object passed is not a dict a ValidationError will be raised (rather than python raising a TypeError).
parse_raw: takes a str or bytes, parses it as json, msgpack or pickle data and then passes the result to parse_obj. The data type is inferred from the content_type argument; otherwise json is assumed.
parse_file: reads a file and passes the contents to parse_raw; if content_type is omitted it is inferred from the file’s extension.

import pickle
from pathlib import Path
from datetime import datetime
import msgpack
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

m = User.parse_obj({'id': 123, 'name': 'James'})
print(m)
# > User id=123 name='James' signup_ts=None

try:
    User.parse_obj(['not', 'a', 'dict'])
except ValidationError as e:
    print(e)
# > error validating input
# > User expected dict not list (error_type=TypeError)

m = User.parse_raw('{"id": 123, "name": "James"}')  # assumes json as no content type passed
print(m)
# > User id=123 name='James' signup_ts=None

msgpack_data = msgpack.packb({'id': 123, 'name': 'James', 'signup_ts': 1500000000})
m = User.parse_raw(msgpack_data, content_type='application/msgpack')
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 2, 40, tzinfo=datetime.timezone.utc)


pickle_data = pickle.dumps({'id': 123, 'name': 'James', 'signup_ts': datetime(2017, 7, 14)})
m = User.parse_raw(pickle_data, content_type='application/pickle', allow_pickle=True)
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 0, 0)


Path('/tmp/data.mp').write_bytes(msgpack_data)
m = User.parse_file('/tmp/data.mp')
print(m)
# > User id=123 name='James' signup_ts=datetime.datetime(2017, 7, 14, 2, 40, tzinfo=datetime.timezone.utc)

(This script is complete, it should run “as is” provided msgpack-python is installed)

Note

Since pickle allows complex objects to be encoded, to use it you need to explicitly pass allow_pickle to the parsing function.

Model Config

Behaviour of pydantic can be controlled via the Config class on a model.

Options:

  • min_anystr_length: min length for str & byte types (default: 0)
  • max_anystr_length: max length for str & byte types (default: 2 ** 16)
  • min_number_size: min size for numbers (default: -2 ** 64)
  • max_number_size: max size for numbers (default: 2 ** 64)
  • validate_all: whether or not to validate field defaults (default: False)
  • ignore_extra: whether to ignore any extra values in input data (default: True)
  • allow_extra: whether or not to allow (and include on the model) any extra values in input data (default: False)
  • allow_mutation: whether or not models are faux-immutable, i.e. whether __setattr__ fails (default: True)
  • fields: extra information on each field, currently just “alias” is allowed (default: None)
  • validate_assignment: whether to perform validation on assignment to attributes (default: False)

from pydantic import BaseModel, ValidationError


class Model(BaseModel):
    v: str

    class Config:
        max_anystr_length = 10


try:
    Model(v='x' * 20)
except ValidationError as e:
    print(e)
"""
error validating input
v:
  length not in range 0 to 10 (error_type=ValueError track=str)
"""

(This script is complete, it should run “as is”)
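
The other options behave similarly; for instance, here is a minimal sketch of validate_assignment (assuming, as implied above, that assignment is validated and coerced like initialisation and raises ValidationError on failure):

from pydantic import BaseModel, ValidationError


class AssignModel(BaseModel):
    a: int = 0

    class Config:
        validate_assignment = True


m = AssignModel()
m.a = '123'  # should be coerced to 123, just as at initialisation (assumption)
print(m.a)

try:
    m.a = 'not an int'
except ValidationError as e:
    print(e)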

Settings

One of pydantic’s most useful applications is defining default settings and allowing them to be overridden by environment variables or keyword arguments (e.g. in unit tests).

This usage example comes last as it uses numerous concepts described above.

from typing import Set

from pydantic import BaseModel, DSN, BaseSettings, PyObject


class SubModel(BaseModel):
    foo = 'bar'
    apple = 1


class Settings(BaseSettings):
    redis_host = 'localhost'
    redis_port = 6379
    redis_database = 0
    redis_password: str = None

    auth_key: str = ...

    invoicing_cls: PyObject = 'path.to.Invoice'

    db_name = 'foobar'
    db_user = 'postgres'
    db_password: str = None
    db_host = 'localhost'
    db_port = '5432'
    db_driver = 'postgres'
    db_query: dict = None
    dsn: DSN = None

    # to override domains:
    # export MY_PREFIX_DOMAINS = '["foo.com", "bar.com"]'
    domains: Set[str] = set()

    # to override more_settings:
    # export MY_PREFIX_MORE_SETTINGS = '{"foo": "x", "apple": 1}'
    more_settings: SubModel = SubModel()

    class Config:
        env_prefix = 'MY_PREFIX_'  # defaults to 'APP_'
        fields = {
            'auth_key': {
                'alias': 'my_api_key'
            }
        }

(This script is complete, it should run “as is”)

Here redis_port could be modified via export MY_PREFIX_REDIS_PORT=6380 or auth_key by export my_api_key=6380.

Complex types like list, set, dict and submodels can be set by using JSON environment variables.
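
A minimal sketch of that behaviour (AppSettings and its fields are invented here, and it assumes environment variables are read when the settings class is instantiated):

import os
from typing import Set

from pydantic import BaseSettings


class AppSettings(BaseSettings):
    port = 8000
    domains: Set[str] = set()

    class Config:
        env_prefix = 'MY_PREFIX_'


# environment variables override the defaults; complex types are parsed as JSON
os.environ['MY_PREFIX_PORT'] = '8001'
os.environ['MY_PREFIX_DOMAINS'] = '["foo.com", "bar.com"]'

print(AppSettings().dict())
# should print something like: {'port': 8001, 'domains': {'foo.com', 'bar.com'}}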

Usage with mypy

Pydantic works with mypy provided you use the “annotation only” version of required variables:

from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, NoneStr

class Model(BaseModel):
    age: int
    first_name = 'John'
    last_name: NoneStr = None
    signup_ts: Optional[datetime] = None
    list_of_ints: List[int]

m = Model(age=42, list_of_ints=[1, '2', b'3'])
print(m.age)
# > 42

Model()
# will raise a validation error for age and list_of_ints

(This script is complete, it should run “as is”)

You can also run it through mypy with:

mypy --ignore-missing-imports --follow-imports=skip --strict-optional pydantic_mypy_test.py

Strict Optional

For your code to pass with --strict-optional you need to use Optional[] or an alias of Optional[] for all fields with a None default; this is standard with mypy.

Pydantic provides a few useful optional or union types:

  • NoneStr aka. Optional[str]
  • NoneBytes aka. Optional[bytes]
  • StrBytes aka. Union[str, bytes]
  • NoneStrBytes aka. Optional[StrBytes]

If these aren’t sufficient you can of course define your own.
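
For example (PayloadModel and its fields are invented here):

from pydantic import BaseModel, NoneBytes, StrBytes


class PayloadModel(BaseModel):
    payload: StrBytes           # equivalent to Union[str, bytes], required
    checksum: NoneBytes = None  # equivalent to Optional[bytes]


print(PayloadModel(payload='abc'))
# should print something like: PayloadModel payload='abc' checksum=None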

Required Fields and mypy

The ellipsis notation ... will not work with mypy; you need to use annotation only fields as in the example above.

Warning

Be aware that using annotation only fields will alter the order of your fields in metadata and errors: annotation only fields will always come first, but still in the order they were defined.

To get around this you can use the Required field (via from pydantic import Required) as an alias for the ellipsis or annotation only declarations, as sketched below.
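
A minimal sketch (assuming Required behaves exactly like the ellipsis):

from pydantic import BaseModel, Required


class Model(BaseModel):
    a: int = Required  # required, but keeps its place in the field order
    b = 'foo'


print(Model(a=1))
# should print something like: Model a=1 b='foo'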

Faux Immutability

Models can be configured to be immutable via allow_mutation = False; this will prevent changing the attributes of a model.

Warning

Immutability in python is never strict. If developers are determined/stupid they can always modify a so-called “immutable” object.

from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: dict

    class Config:
        allow_mutation = False


foobar = FooBarModel(a='hello', b={'apple': 'pear'})

try:
    foobar.a = 'different'
except TypeError as e:
    print(e)
    # > "FooBarModel" is immutable and does not support item assignment

print(foobar.a)
# > hello

print(foobar.b)
# > {'apple': 'pear'}

foobar.b['apple'] = 'grape'
print(foobar.b)
# > {'apple': 'grape'}

Trying to change a caused an error and a remains unchanged; however, the dict b is mutable and the immutability of foobar doesn’t stop b from being changed.

Copy and Values

The dict method (formerly values) returns a dict containing the attributes of a model; sub-models are recursively converted to dicts.

copy allows models to be duplicated, which is particularly useful for immutable models.

Both dict and copy take the optional include and exclude keyword arguments to control which attributes are returned/copied. copy also accepts an update keyword argument allowing attributes to be modified as the model is duplicated.

from pydantic import BaseModel


class BarModel(BaseModel):
    whatever: int


class FooBarModel(BaseModel):
    banana: float
    foo: str
    bar: BarModel


m = FooBarModel(banana=3.14, foo='hello', bar={'whatever': 123})

print(m.dict())
# > {'banana': 3.14, 'foo': 'hello', 'bar': {'whatever': 123}}

print(m.dict(include={'foo', 'bar'}))
# > {'foo': 'hello', 'bar': {'whatever': 123}}

print(m.dict(exclude={'foo', 'bar'}))
# > {'banana': 3.14}

print(m.copy())
# > FooBarModel banana=3.14 foo='hello' bar=<BarModel whatever=123>

print(m.copy(include={'foo', 'bar'}))
# > FooBarModel foo='hello' bar=<BarModel whatever=123>

print(m.copy(exclude={'foo', 'bar'}))
# > FooBarModel banana=3.14

print(m.copy(update={'banana': 0}))
# > FooBarModel banana=0 foo='hello' bar=<BarModel whatever=123>

Pickle

Using the same plumbing as copy(), pydantic models support efficient pickling and unpickling.

import pickle
from pydantic import BaseModel


class FooBarModel(BaseModel):
    a: str
    b: int


m = FooBarModel(a='hello', b=123)
print(m)
# > FooBarModel a='hello' b=123

data = pickle.dumps(m)
print(data)
# > b'\x80\x03c...'

m2 = pickle.loads(data)
print(m2)
# > FooBarModel a='hello' b=123

Benchmarks

Below are the results of crude benchmarks comparing pydantic to other validation libraries.

Package                    Mean deserialization time   std. dev.
pydantic                   25.5μs                      0.313μs
toasted-marshmallow        38.2μs                      0.153μs
marshmallow                47.3μs                      0.256μs
trafaret                   50.7μs                      0.201μs
django-restful-framework   207.5μs                     3.252μs

(See the benchmarks code for more details on the test case. Feel free to submit more benchmarks or improve an existing one.)

History

v0.6.3 (2017-11-26)

  • fix direct install without README.rst present

v0.6.2 (2017-11-13)

  • errors for invalid validator use
  • safer check for complex models in Settings

v0.6.1 (2017-11-08)

  • prevent duplicate validators, #101
  • add always kwarg to validators, #102

v0.6.0 (2017-11-07)

  • assignment validation #94, thanks petroswork!
  • JSON in environment variables for complex types, #96
  • add validator decorators for complex validation, #97
  • deprecate values(...) and replace with .dict(...), #99

v0.5.0 (2017-10-23)

  • add UUID validation #89
  • remove index and track from error object (json) if they’re null #90
  • improve the error text when a list is provided rather than a dict #90
  • add benchmarks table to docs #91

v0.4.0 (2017-07-08)

  • show length in string validation error
  • fix aliases in config during inheritance #55
  • simplify error display
  • use unicode ellipsis in truncate
  • add parse_obj, parse_raw and parse_file helper functions #58
  • switch annotation only fields to come first in fields list not last

v0.3.0 (2017-06-21)

  • immutable models via config.allow_mutation = False, associated cleanup and performance improvement #44
  • immutable helper methods construct() and copy() #53
  • allow pickling of models #53
  • setattr is removed as __setattr__ is now intelligent #44
  • raise_exception removed, Models now always raise exceptions #44
  • instance method validators removed
  • django-restful-framework benchmarks added #47
  • fix inheritance bug #49
  • make str type stricter so list, dict etc are not coerced to strings. #52
  • add StrictStr which only accepts strings as input #52

v0.2.1 (2017-06-07)

  • pypi and travis together messed up the deploy of v0.2; this should fix it

v0.2.0 (2017-06-07)

  • breaking change: values() on a model is now a method not a property, takes include and exclude arguments
  • allow annotation only fields to support mypy
  • add pretty to_string(pretty=True) method for models

v0.1.0 (2017-06-03)

  • add docs
  • add history