How to use DynamoDB with Python type hints
Introduction
Python has had type hints since version 3.5, released way back in September 2015.
Since then, some pretty cool projects have spun up, like mypyc, which "uses standard Python type hints to generate fast code", and Microsoft's Pyright, a static type checker that plays nicely with their Visual Studio Code editor.
Back in 2019, Airbnb reported that around 38% of their bugs could have been prevented if they'd been using types in their codebase all along (Hacker News discussion). Although Airbnb was talking about JavaScript bugs that TypeScript would have caught, the general theme holds true for Python, too: types prevent errors by catching type issues before they surface at runtime.
Type Support
When reading AWS's Python Code Samples for Amazon DynamoDB or the boto3 documentation, it isn't clear how to use DynamoDB with Python types because neither source uses type hints. There are third-party projects that look to fill this gap, like boto3-stubs, part of the mypy_boto3_builder project.
boto3-stubs gives us quite a bit of type information and can help speed up development by providing auto-completion in your code editor, but it doesn't know exactly what we're putting into or getting out of a DynamoDB table.
Let's take a look at the default type for a DynamoDB item, taken here from the return value of table.get_item():
"Item": Dict[
    str,
    Union[
        bytes,
        bytearray,
        str,
        int,
        Decimal,
        bool,
        Set[int],
        Set[Decimal],
        Set[str],
        Set[bytes],
        Set[bytearray],
        Sequence[Any],
        Mapping[str, Any],
        None,
    ],
],
So a DynamoDB item is a dict with strings for keys and values that could be almost anything. Maybe it's an int, or maybe it's a Set of 'em, or maybe even some kind of map of str to int, or, if you can believe it, a sequence of Any! Why not?!
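To make the cost of that broad type concrete, here's a minimal sketch. It assumes a simplified, hypothetical AttributeValue alias (a handful of the union members above) and a made-up shout_title() helper; neither comes from boto3-stubs. The point is that every read has to be narrowed by hand before we can do anything type-specific with it.

```python
from decimal import Decimal
from typing import Dict, Union

# Simplified, hypothetical stand-in for the attribute value union above.
AttributeValue = Union[str, int, Decimal, bool, None]


def shout_title(item: Dict[str, AttributeValue]) -> str:
    title = item.get("title")
    # Calling title.upper() right here would be rejected by a type checker,
    # because title might be an int, a Decimal, a bool or None.
    if isinstance(title, str):
        # Inside this branch the checker has narrowed title to str.
        return title.upper()
    raise TypeError(f"title should be a str, got {type(title).__name__}")


print(shout_title({"title": "breaking news"}))
```

Repeating that isinstance() dance for every field, in every function that touches an item, gets old fast.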
So how can we type individual DynamoDB items? Read on, dear reader. Read on.
Typing a DynamoDB Item
In the spirit of schema-first design, we'll start by creating a type for our DynamoDB item. As we can see above, a DynamoDB item is returned as a dict, and to properly type that dict, we'll use (contain your surprise) a TypedDict.
A TypedDict "support[s] the use case where a dictionary object has a specific set of string keys, each with a value of a specific type", so a type checker can verify both a dict's key names and its value types. This allows us to benefit from typing without having to translate the dict into, say, a Data Class, then back to a dict.
As an example, here's a type for a news article, something you might find in an RSS feed:
# news/types.py
from typing import TypedDict


class NewsItem(TypedDict):
    PK: str
    SK: str
    title: str
    description: str | None
    published: str
    link: str
This gives us a type we can use in application code, but we still need a way to ensure that the data we're putting into and pulling out of a DynamoDB table actually matches that type.
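Why isn't the TypedDict enough on its own? Because it's only a hint for the type checker; at runtime it does no validation at all. Here's a small sketch of that, using a cut-down, hypothetical Article type instead of the full NewsItem:

```python
from typing import Optional, TypedDict


class Article(TypedDict):
    # Hypothetical, cut-down stand-in for NewsItem.
    title: str
    description: Optional[str]


# At runtime a TypedDict is an ordinary dict, so it can be handed
# straight to code that expects one.
good = Article(title="Hello", description=None)
is_plain_dict = type(good) is dict

# Nothing is validated at runtime: this wrong value is accepted silently.
# A static type checker would flag the call, but data read back from a
# table never passes through the type checker.
bad = Article(title=123, description=None)  # type: ignore
bad_title_type = type(bad["title"]).__name__

print(is_plain_dict, bad_title_type)
```

So anything that arrives from outside our own code, like an item read from a table, needs an explicit runtime check before we can honestly call it an Article (or a NewsItem).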
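To ensure the DynamoDB item matches our more specific NewsItem type, we'll define some type guards to do type narrowing. These are in the spirit of PEP-647, though rather than returning a TypeGuard boolean, they return the narrowed value or raise. This will ensure that the values contained in a DynamoDB item are the types we expect.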
# news/type_guards.py
from typing import Any


class TypeGuardException(Exception):
    pass


def guard_optional_string(value: Any) -> str | None:
    if value is None:
        return value

    if isinstance(value, str):
        return value

    raise TypeGuardException(
        "Received unexpected type: "
        f"expected str | None but received value of type {type(value)}: {value}"
    )


def guard_string(value: Any) -> str:
    if isinstance(value, str):
        return value

    raise TypeGuardException(
        "Received unexpected type: "
        f"expected str but received value of type {type(value)}: {value}"
    )
These functions are pretty simple: they take in a value that can be any type, and they either raise an exception or spit out that same value. The key difference is in the return type, like -> str: now our type checker knows exactly what type value is.
Let's see how these type guards work in practice:
>>> guard_optional_string('En garde!') == 'En garde!'
True
>>> guard_optional_string(None) is None
True
>>> guard_optional_string(100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 15, in guard_optional_string
    raise TypeGuardException(
news.type_guards.TypeGuardException: Received unexpected type: expected str | None but received value of type <class 'int'>: 100
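If writing one guard per type starts to feel repetitive, the same pattern can be made generic. What follows is only a sketch, not part of the project's code: guard_instance() is a hypothetical helper, and TypeGuardException is redefined here so the snippet runs on its own.

```python
from decimal import Decimal
from typing import Any, Type, TypeVar

T = TypeVar("T")


class TypeGuardException(Exception):
    pass


def guard_instance(value: Any, expected: Type[T]) -> T:
    """Hypothetical generic guard: return value unchanged if it is an
    instance of expected, otherwise raise TypeGuardException."""
    # bool is a subclass of int, so refuse True/False unless bool was asked for.
    is_unwanted_bool = isinstance(value, bool) and expected is not bool
    if isinstance(value, expected) and not is_unwanted_bool:
        return value

    raise TypeGuardException(
        "Received unexpected type: "
        f"expected {expected.__name__} but received value of type {type(value)}: {value}"
    )


# The type checker infers str for the first call and Decimal for the second.
title = guard_instance("En garde!", str)
count = guard_instance(Decimal("3"), Decimal)
print(title, count)
```

The Decimal case matters for DynamoDB because the boto3 resource API hands numbers back as Decimal values. Optional fields would still need their own guard, like guard_optional_string() above.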
Now that we can guarantee the types of individual fields, we can use these type guards to narrow the entire NewsItem dict:
# news/type_guards.py
from news.types import NewsItem


def guard_news_item(item: dict) -> NewsItem:
    return NewsItem(
        PK=guard_string(item.get("PK")),
        SK=guard_string(item.get("SK")),
        title=guard_string(item.get("title")),
        description=guard_optional_string(item.get("description")),
        published=guard_string(item.get("published")),
        link=guard_string(item.get("link")),
    )
The guard_string() and guard_optional_string() functions will raise if there's an issue initializing a NewsItem, so we don't need to raise a separate exception:
>>> from news.type_guards import guard_news_item
>>> guard_news_item({"PK": "News"})
Because we're trying to create a NewsItem without all the necessary fields, guard_news_item() raises an exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 34, in guard_news_item
    SK=guard_string(item.get("SK")),
  File "/home/john/Code/blog-examples/dynamodb-python-type-hints/news/type_guards.py", line 25, in guard_string
    raise TypeGuardException(
news.type_guards.TypeGuardException: Received unexpected type: expected str but received value of type <class 'NoneType'>: None
We can see in the traceback that the error is a TypeGuardException raised because SK is None. That's a required field and needs to be a string, and we'd keep getting type errors for invalid or missing values until the item we created matches NewsItem. Our NewsItem type is safe and sound!
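To tie this back to table.get_item(), here's a rough sketch of narrowing a fetched item. Everything in it is an assumption made for illustration: StubTable is a hypothetical in-memory stand-in for a real Table, get_news_item() is a made-up helper, and NewsItem is trimmed to three fields so the snippet stays short and runs without AWS.

```python
from typing import Any, Dict, Optional, Tuple, TypedDict


class TypeGuardException(Exception):
    pass


class NewsItem(TypedDict):
    # Trimmed to three fields for brevity.
    PK: str
    SK: str
    title: str


def guard_string(value: Any) -> str:
    if isinstance(value, str):
        return value
    raise TypeGuardException(
        f"expected str but received value of type {type(value)}: {value}"
    )


def guard_news_item(item: Dict[str, Any]) -> NewsItem:
    return NewsItem(
        PK=guard_string(item.get("PK")),
        SK=guard_string(item.get("SK")),
        title=guard_string(item.get("title")),
    )


class StubTable:
    """Hypothetical stand-in for a Table; only get_item() is stubbed."""

    def __init__(self, rows: Dict[Tuple[str, str], Dict[str, Any]]) -> None:
        self.rows = rows

    def get_item(self, Key: Dict[str, str]) -> Dict[str, Any]:
        item = self.rows.get((Key["PK"], Key["SK"]))
        # Like DynamoDB, leave "Item" out of the response when nothing matches.
        return {"Item": item} if item is not None else {}


def get_news_item(table: Any, pk: str, sk: str) -> Optional[NewsItem]:
    response = table.get_item(Key={"PK": pk, "SK": sk})
    item = response.get("Item")
    if item is None:
        return None
    # From here on, callers get a NewsItem rather than a dict of "anything".
    return guard_news_item(item)


table = StubTable({("News", "2021-01-01"): {"PK": "News", "SK": "2021-01-01", "title": "Hi"}})
found = get_news_item(table, "News", "2021-01-01")
missing = get_news_item(table, "News", "1999-01-01")
print(found, missing)
```

A missing item comes back as None, a well-formed item comes back as a NewsItem, and a malformed one raises TypeGuardException at the boundary instead of somewhere deep in application code.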
Wrap-up
Sure, Python is a duck-typed language rather than a statically typed one, but with a bit of upfront work, type guards and TypedDicts can add type safety to your Python AWS project by guaranteeing that your DynamoDB inputs and outputs are the types you expect.
In other words, when it comes to typing, guards give a duck teeth.
Bonus: abstracting DynamoDB logic into a controller
As a bonus, here's an example of how I'm using this code in a project. The database logic is nicely contained in a controller class, and the class takes a DynamoDB Table instance in its constructor, so it's easy to test the controller logic using unit tests (by passing in a dynalite connection, say).
# news/controllers.py
from typing import Iterator, TypedDict

from boto3.dynamodb.conditions import Key
from mypy_boto3_dynamodb.service_resource import Table

from news.type_guards import guard_news_item
from news.types import NewsItem


class PutNewsItemsResponse(TypedDict):
    saved_item_count: int


class NewsController:
    def __init__(self, dynamo_table: Table):
        self.dynamo_table = dynamo_table

    def get_newest_news_item(self) -> NewsItem | None:
        newest_news_items = self.dynamo_table.query(
            KeyConditionExpression=Key("PK").eq("News"),
            ScanIndexForward=False,
            Limit=1,
        )["Items"]

        if not newest_news_items or not newest_news_items[0]:
            return None

        newest_item = guard_news_item(newest_news_items[0])
        return newest_item

    def put_items(self, items: Iterator[NewsItem]) -> PutNewsItemsResponse:
        saved_item_count = 0

        with self.dynamo_table.batch_writer() as batch:
            for item in items:
                batch.put_item(Item=item)
                saved_item_count += 1

        return {"saved_item_count": saved_item_count}
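Because the table is injected through the constructor, a unit test doesn't strictly need a running database. As a rough sketch of that idea, the snippet below swaps in a hand-rolled fake instead of dynalite. FakeTable and FakeBatchWriter are hypothetical helpers, and the controller is a trimmed copy with only put_items() (typed loosely with Any) so the example runs without boto3 installed.

```python
from typing import Any, Dict, Iterable, List, TypedDict


class PutNewsItemsResponse(TypedDict):
    saved_item_count: int


class FakeBatchWriter:
    """Hypothetical stand-in for the object returned by batch_writer()."""

    def __init__(self, store: List[Dict[str, Any]]) -> None:
        self.store = store

    def __enter__(self) -> "FakeBatchWriter":
        return self

    def __exit__(self, *exc_info: Any) -> None:
        # Nothing is buffered in the fake, so there is nothing to flush.
        return None

    def put_item(self, Item: Dict[str, Any]) -> None:
        self.store.append(dict(Item))


class FakeTable:
    """Hypothetical in-memory table that records every item written."""

    def __init__(self) -> None:
        self.items: List[Dict[str, Any]] = []

    def batch_writer(self) -> FakeBatchWriter:
        return FakeBatchWriter(self.items)


class NewsController:
    # Trimmed copy of the controller above: only put_items() is kept.
    def __init__(self, dynamo_table: Any) -> None:
        self.dynamo_table = dynamo_table

    def put_items(self, items: Iterable[Dict[str, Any]]) -> PutNewsItemsResponse:
        saved_item_count = 0
        with self.dynamo_table.batch_writer() as batch:
            for item in items:
                batch.put_item(Item=item)
                saved_item_count += 1
        return {"saved_item_count": saved_item_count}


def test_put_items_counts_saved_items() -> None:
    table = FakeTable()
    controller = NewsController(table)
    response = controller.put_items(
        iter([{"PK": "News", "SK": "a"}, {"PK": "News", "SK": "b"}])
    )
    assert response == {"saved_item_count": 2}
    assert [item["SK"] for item in table.items] == ["a", "b"]


test_put_items_counts_saved_items()
```

A fake like this only covers the calls it implements, so something closer to the real service (dynalite, say) is still the better fit for exercising query().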