How to use DynamoDB with Python type hints

DynamoDB logo

Introduction

Python has had type hints since version 3.5, released way back in September 2015.

Since then, some pretty cool projects have spun up, like mypyc, which "uses standard Python type hints to generate fast code", and Microsoft's Pyright, a static type checker that plays nicely with their Visual Studio Code editor.

Back in 2019, Airbnb reported that around 38% of their bugs could have been prevented if they'd been using types in their codebase all along (Hacker News discussion). Although Airbnb was talking about JavaScript bugs that TypeScript would have caught, the general theme holds true for Python, too: types prevent errors by catching type issues before they surface at runtime.

Type Support

When reading AWS's Python Code Samples for Amazon DynamoDB or the boto3 documentation, it isn't clear how to use DynamoDB with Python types because neither source uses type hints. There are third-party projects that look to fill this gap, like boto3-stubs, part of the mypy_boto3_builder project.

boto3-stubs gives us quite a bit of type information and can help speed up development by providing auto-completion in your code editor, but it doesn't know exactly what we're putting into or getting out of a DynamoDB table.

Let's take a look at the default type for a DynamoDB item, taken here from the return value of table.get_item():

"Item": Dict[
    str,
    Union[
        bytes,
        bytearray,
        str,
        int,
        Decimal,
        bool,
        Set[int],
        Set[Decimal],
        Set[str],
        Set[bytes],
        Set[bytearray],
        Sequence[Any],
        Mapping[str, Any],
        None,
    ],
],

So a DynamoDB item is a dict with strings for keys and values that could be almost anything. Maybe it's an int, or maybe it's a Set of 'em, or maybe even some kind of map of str to int, or, if you can believe it, a sequence of Any! Why not?!
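
To see why that broad union is awkward to work with, here is a minimal sketch. `DynamoValue` is a hypothetical, trimmed-down alias standing in for the union above (it is not a name from boto3-stubs), and `title_length` is an illustrative helper:

```python
from decimal import Decimal
from typing import Any, Mapping, Sequence, Union

# Hypothetical, trimmed-down alias standing in for the union shown above.
DynamoValue = Union[
    bytes, str, int, Decimal, bool, Sequence[Any], Mapping[str, Any], None
]


def title_length(item: dict[str, DynamoValue]) -> int:
    title = item.get("title")
    # Without this check a type checker rejects len(title): the value
    # could just as well be an int, a Decimal, a bool, or None.
    if not isinstance(title, str):
        raise TypeError(f"expected str but received {type(title).__name__}")
    return len(title)


print(title_length({"title": "Hello"}))  # 5
```

Every field access needs a check like this one, which is the repetition the rest of this post tries to get rid of.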

So how can we type individual DynamoDB items? Read on, dear reader. Read on.

Typing a DynamoDB Item

In the spirit of schema-first design, we'll start by creating a type for our DynamoDB item. As we can see above, a DynamoDB item is returned as a dict, and to properly type that dict, we'll use (contain your surprise) a TypedDict.

A TypedDict "support[s] the use case where a dictionary object has a specific set of string keys, each with a value of a specific type", so we can describe both a dict's key names and its value types to the type checker. This allows us to benefit from typing without having to translate the dict into, say, a dataclass, then back to a dict.

As an example, here's a type for a news article, something you might find in an RSS feed:

# news/types.py
from typing import TypedDict


class NewsItem(TypedDict):
    PK: str
    SK: str
    title: str
    description: str | None
    published: str
    link: str

This gives us a type we can use in application code, but we still need a way to ensure that the type we're putting into and pulling out of a DynamoDB table actually matches that type.

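To ensure the DynamoDB item matches our more specific NewsItem type, we'll define some type guards to do type narrowing, in the spirit of PEP 647. Rather than returning a bool as PEP 647's TypeGuard functions do, each guard here returns the checked value or raises an exception, which ensures at runtime that the values contained in a DynamoDB item are the types we expect.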

# news/type_guards.py
from typing import Any


class TypeGuardException(Exception):
    pass


def guard_optional_string(value: Any) -> str | None:
    if value is None:
        return value
    if isinstance(value, str):
        return value
    raise TypeGuardException(
        f"Received unexpected type: "
        f"expected str | None but received value of type {type(value)}: {value}"
    )


def guard_string(value: Any) -> str:
    if isinstance(value, str):
        return value

    raise TypeGuardException(
        f"Received unexpected type: "
        f"expected str but received value of type {type(value)}: {value}"
    )

These functions are pretty simple: they take in a value that can be any type, and they either raise an exception or spit out that same value. The key difference is in the return type, like -> str: now our type checker knows exactly what type value is.

Let's see how these type guards work in practice:

>>> guard_optional_string('En garde!') == 'En garde!'
True
>>> guard_optional_string(None) is None
True
>>> guard_optional_string(100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 15, in guard_optional_string
    raise TypeGuardException(
news.type_guards.TypeGuardException: Received unexpected type: expected str | None but received value of type <class 'int'>: 100

Now that we can guarantee the types of individual fields, we can use these type guards to narrow the entire NewsItem dict:

# news/type_guards.py
from news.types import NewsItem


def guard_news_item(item: dict) -> NewsItem:
    return NewsItem(
        PK=guard_string(item.get("PK")),
        SK=guard_string(item.get("SK")),
        title=guard_string(item.get("title")),
        description=guard_optional_string(item.get("description")),
        published=guard_string(item.get("published")),
        link=guard_string(item.get("link")),
    )

The guard_string() and guard_optional_string() functions will raise if a value doesn't match, so guard_news_item() doesn't need to raise an exception of its own:

>>> from news.type_guards import guard_news_item
>>> guard_news_item({"PK": "News"})

Because we're trying to create a NewsItem without all the necessary fields, guard_news_item() raises an exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 34, in guard_news_item
    SK=guard_string(item.get("SK")),
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 25, in guard_string
    raise TypeGuardException(
news.type_guards.TypeGuardException: Received unexpected type: expected str but received value of type <class 'NoneType'>: None

We can see in the traceback that the error is a TypeGuardException, raised because SK is missing and item.get("SK") therefore returns None. That's a required field and needs to be a string, and we'd keep getting a TypeGuardException for invalid or missing values until the item we pass in matches NewsItem. Our NewsItem type is safe and sound!
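
One thing the traceback makes you dig for is which field failed: the message says a str was expected, but not where. A possible refinement, sketched here on the assumption that you're free to change the guard signatures, passes the field name along. `guard_string_field` and its `field` parameter are hypothetical additions, not part of the code above:

```python
from typing import Any, Mapping


class TypeGuardException(Exception):
    pass


def guard_string_field(item: Mapping[str, Any], field: str) -> str:
    # Look the value up here so the error message can name the field.
    value = item.get(field)
    if isinstance(value, str):
        return value
    raise TypeGuardException(
        f"Field {field!r}: expected str but received "
        f"value of type {type(value).__name__}: {value!r}"
    )


try:
    guard_string_field({"PK": "News"}, "SK")
except TypeGuardException as error:
    message = str(error)
    print(message)
    # Field 'SK': expected str but received value of type NoneType: None
```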

Wrap-up

Sure, Python is a dynamically typed, duck-typed language rather than a statically typed one, but with a bit of upfront work, type guards and TypedDicts can add type safety to your Python AWS project by checking that your DynamoDB inputs and outputs are the types you expect.

In other words, when it comes to typing, guards give a duck teeth:

Daffy Duck smiling in a mirror showing his teeth

Bonus: abstracting DynamoDB logic into a controller

As a bonus, here's an example of how I'm using this code in a project. The database logic is nicely contained in a controller class, and the class takes a DynamoDB Table instance in its constructor, so it's easy to test the controller logic using unit tests (by passing in a dynalite connection, say).

# news/controllers.py
from typing import Iterable, TypedDict

from boto3.dynamodb.conditions import Key
from mypy_boto3_dynamodb.service_resource import Table

from news.type_guards import guard_news_item
from news.types import NewsItem


class PutNewsItemsResponse(TypedDict):
    saved_item_count: int


class NewsController:
    def __init__(self, dynamo_table: Table):
        self.dynamo_table = dynamo_table

    def get_newest_news_item(self) -> NewsItem | None:
        # ScanIndexForward=False returns items in descending sort-key order,
        # so Limit=1 yields the item with the highest SK.
        newest_news_items = self.dynamo_table.query(
            KeyConditionExpression=Key("PK").eq("News"),
            ScanIndexForward=False,
            Limit=1,
        )["Items"]
        if not newest_news_items or not newest_news_items[0]:
            return None
        # Narrow the loosely typed DynamoDB item to a NewsItem.
        newest_item = guard_news_item(newest_news_items[0])
        return newest_item

    # Iterable (rather than Iterator) also accepts lists and tuples.
    def put_items(self, items: Iterable[NewsItem]) -> PutNewsItemsResponse:
        saved_item_count = 0
        with self.dynamo_table.batch_writer() as batch:
            for item in items:
                batch.put_item(Item=item)
                saved_item_count += 1
        return {"saved_item_count": saved_item_count}
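
If spinning up dynalite is more than a test needs, a small in-memory stand-in can exercise the same logic. The sketch below is an illustration built on assumptions, not the project's actual test setup: `FakeTable` and `FakeBatchWriter` are hypothetical classes that mimic only the `batch_writer()` and `put_item(Item=...)` calls used by put_items(), and `put_items` is reproduced as a standalone function so the snippet runs without boto3:

```python
from typing import Any, Iterable


class FakeBatchWriter:
    """Hypothetical stand-in for the object returned by batch_writer()."""

    def __init__(self, storage: list[dict[str, Any]]):
        self.storage = storage

    def __enter__(self) -> "FakeBatchWriter":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # The real batch writer sends any buffered writes on exit;
        # this fake stores items immediately, so there is nothing to do.
        return None

    def put_item(self, Item: dict[str, Any]) -> None:
        self.storage.append(Item)


class FakeTable:
    """Hypothetical in-memory table supporting only batch_writer()."""

    def __init__(self) -> None:
        self.items: list[dict[str, Any]] = []

    def batch_writer(self) -> FakeBatchWriter:
        return FakeBatchWriter(self.items)


def put_items(table: FakeTable, items: Iterable[dict[str, Any]]) -> dict[str, int]:
    # Same counting logic as NewsController.put_items() above.
    saved_item_count = 0
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            saved_item_count += 1
    return {"saved_item_count": saved_item_count}


table = FakeTable()
response = put_items(table, [{"PK": "News", "SK": "2024-01-01"}])
print(response)  # {'saved_item_count': 1}
```

A fake like this checks the controller's own bookkeeping only. It says nothing about how DynamoDB behaves, so a dynalite-backed test is still the better check for query logic.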