The Django REST Framework (DRF) is a mature, feature-rich and robust framework that provides immense additions to Django's core functionality out of the box.
Its main game-changing assets are its View classes, since they provide a concrete way of implementing Django views and setting up API responses seamlessly, while following RESTful design principles.
This means that the already very fast process of building a web application back-end can be sped up even further!
However, as helpful as it may be, it can still cause "trouble in paradise" when the database schema gradually grows in complexity but the API design does not adapt to the change in a viable and efficient way.
The thing with DRF views
Let's inspect a minimal DRF view:
The ActionList class can provide a complete listing of all instances of the Action model, defined in our database schema, with just a few lines of code. The implicit functionality corresponds to an HTTP GET request, authorization is handled by the permission_classes class attribute, and the API response structure is taken care of by the associated serializer_class. Neat, right?
Well, yes and no!
This is an excellent solution when our models are relatively simple and only use OneToOne or ForeignKey relationships with other models. However, this is rarely the case, as most web applications use elaborate models with complex relationships.
This usually leads to implementing nested serializers, a path which, left unoptimized, leads to severe and unnecessary database load: the common N+1 selects problem.
Django ships with a solid ORM, which facilitates database queries for most use cases, without the explicit need for elaborate statement syntax.
However, let's consider the related models below:
Their corresponding serializers would be:
Suppose we need to get the total turnover value for all Action instances. At first glance, the snippet below seems pretty reasonable:
    from django.db.models import Sum

    actions = Action.objects.all()
    for a in actions:
        # A separate lookup is built for every single Action
        participations = SessionParticipation.objects.filter(session__action=a)
        if participations:
            action_turnover = participations.aggregate(Sum('turnover'))['turnover__sum']
        else:
            action_turnover = 0
But! An Action instance may be related to many Session instances. Each Session instance may then be associated with one or more SessionParticipation instances.
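The snippet above already issues one SELECT for the Action instances, plus up to two more per Action (one to evaluate the participations and one to aggregate them). Traversing the same relationships lazily, as nested serializers do, is even costlier:
- one SELECT to return all Action instances
- one SELECT per Action instance for the associated Sessions
- one SELECT per Session instance for the associated Session Participations
It's easy to identify the issue here: if we consider having X Actions * Y Sessions * Z Session Participations, the number of queries multiplies with the size of the data!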
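To put rough numbers on this, the query count of the lazy traversal listed above can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes every Action has the same number of Sessions, and lazy_query_count is a hypothetical helper, not something Django provides.

```python
def lazy_query_count(actions: int, sessions_per_action: int) -> int:
    """Estimate the SELECTs issued when each relation is fetched per parent row."""
    all_actions = 1                               # one SELECT for all Actions
    per_action = actions                          # one SELECT of Sessions per Action
    per_session = actions * sessions_per_action   # one SELECT of participations per Session
    return all_actions + per_action + per_session


# 100 Actions with 10 Sessions each: 1 + 100 + 1000 queries
estimate = lazy_query_count(100, 10)
print(estimate)  # 1101
```

Note that the number of Session Participations (Z) does not add queries in this model; it only increases the number of rows each of those queries has to return.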
Now, let's look at this:
    # Lookups follow the related_name values: Action.sessions, Session.participations
    actions = Action.objects.all().prefetch_related('sessions', 'sessions__participations')
    for a in actions:
        action_turnover = 0
        for s in a.sessions.all():
            for p in s.participations.all():
                # ugly, but faster than pydash.sum_by!
                action_turnover += p.turnover if p.turnover else 0
Since we have declared related_name values for our models, we used them to cache all the required data from our database, using the Django ORM's prefetch_related. We also got rid of the aggregation queries.
We then iterate over the cached collections of s.participations to sum up their values.
So, to sum up, we now query the database only three times (once per table) to get the same result, no matter how many Actions, Sessions and participations exist. For a dataset of any realistic size, the database load drops by orders of magnitude; the numbers are hardly comparable anymore!
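The difference can be reproduced without Django at all. The sketch below uses an in-memory SQLite database and counts the SELECTs sent by a lazy, per-parent traversal versus a batched one that fetches each table once with an IN clause and joins the rows in Python, which is the strategy prefetch_related follows. The table layout and the names CountingDB, build_db, turnover_lazy and turnover_prefetched are assumptions made for illustration; they are not the article's models or Django APIs.

```python
import sqlite3


class CountingDB:
    """In-memory SQLite database that counts the SELECTs sent through query()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(actions=5, sessions=3, participations=4):
    """Create and fill three tables mirroring Action > Session > participation."""
    db = CountingDB()
    db.conn.executescript(
        """
        CREATE TABLE action (id INTEGER PRIMARY KEY);
        CREATE TABLE session (id INTEGER PRIMARY KEY, action_id INTEGER);
        CREATE TABLE participation (
            id INTEGER PRIMARY KEY, session_id INTEGER, turnover REAL
        );
        """
    )
    for action_id in range(1, actions + 1):
        db.conn.execute("INSERT INTO action (id) VALUES (?)", (action_id,))
        for _ in range(sessions):
            session_id = db.conn.execute(
                "INSERT INTO session (action_id) VALUES (?)", (action_id,)
            ).lastrowid
            db.conn.executemany(
                "INSERT INTO participation (session_id, turnover) VALUES (?, ?)",
                [(session_id, 10.0)] * participations,
            )
    return db


def turnover_lazy(db):
    """Fetch each relation one parent at a time (the N+1 pattern)."""
    totals = {}
    for (action_id,) in db.query("SELECT id FROM action"):
        totals[action_id] = 0.0
        for (session_id,) in db.query(
            "SELECT id FROM session WHERE action_id = ?", (action_id,)
        ):
            rows = db.query(
                "SELECT turnover FROM participation WHERE session_id = ?",
                (session_id,),
            )
            totals[action_id] += sum(t or 0 for (t,) in rows)
    return totals


def turnover_prefetched(db):
    """Fetch each table once and join the rows in Python."""
    action_ids = [a for (a,) in db.query("SELECT id FROM action")]
    marks = ",".join("?" * len(action_ids))
    sessions = db.query(
        f"SELECT id, action_id FROM session WHERE action_id IN ({marks})",
        action_ids,
    )
    action_of = {session_id: action_id for session_id, action_id in sessions}
    marks = ",".join("?" * len(action_of))
    rows = db.query(
        f"SELECT session_id, turnover FROM participation WHERE session_id IN ({marks})",
        list(action_of),
    )
    totals = {action_id: 0.0 for action_id in action_ids}
    for session_id, turnover in rows:
        totals[action_of[session_id]] += turnover or 0
    return totals
```

With the default 5 Actions and 3 Sessions each, turnover_lazy sends 1 + 5 + 15 = 21 SELECTs, while turnover_prefetched sends 3 and returns the same totals. Only the first number grows with the data.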
Things can get way uglier when dealing with ManyToMany relationships, resolved by the RelatedManager using a pivot table and serialized using nested serializers. In this case, we end up with queries for every set of <PK, FK, FK> in the pivot, which makes the view extremely slow.
In order to handle the optimizations described in the previous section at an application-wide level, we need a generic, reusable implementation.
The EagerLoadingMixin class provides the eager_load class method, which handles queryset caching for all possible Django Model relationships.
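A minimal sketch of what such a mixin could look like follows. It is an illustration under assumptions, not a definitive implementation: prefetch_eager is the attribute name used in this article, whereas select_eager (for select_related lookups) is an assumed counterpart. RecordingQuerySet and ActionSerializerSketch are hypothetical stand-ins so that the example runs without Django; in a real project the mixin would be combined with a DRF serializer and given a real QuerySet.

```python
class EagerLoadingMixin:
    """Sketch: apply the eager-loading lookups declared on a serializer class."""

    @classmethod
    def eager_load(cls, queryset):
        # select_eager is an ASSUMED attribute name: lookups for select_related(),
        # suited to single-valued relations (ForeignKey, OneToOne).
        select = getattr(cls, "select_eager", None)
        if select:
            queryset = queryset.select_related(*select)
        # prefetch_eager: lookups for prefetch_related(), suited to
        # multi-valued relations (reverse ForeignKey, ManyToMany).
        prefetch = getattr(cls, "prefetch_eager", None)
        if prefetch:
            queryset = queryset.prefetch_related(*prefetch)
        return queryset


class RecordingQuerySet:
    """Hypothetical stand-in for a Django QuerySet that records chained calls."""

    def __init__(self, calls=()):
        self.calls = tuple(calls)

    def select_related(self, *lookups):
        return RecordingQuerySet(self.calls + (("select_related", lookups),))

    def prefetch_related(self, *lookups):
        return RecordingQuerySet(self.calls + (("prefetch_related", lookups),))


class ActionSerializerSketch(EagerLoadingMixin):
    # Hypothetical serializer; lookups follow the related_name values used earlier.
    prefetch_eager = ["sessions", "sessions__participations"]


queryset = ActionSerializerSketch.eager_load(RecordingQuerySet())
```

Keeping the lookups next to the serializer's fields means that whoever adds a nested field also sees, in the same class, which relations need to be loaded eagerly.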
Let's revisit our previous Serializer classes and use EagerLoadingMixin to optimize database calls:
By providing the prefetch_eager class attribute, we manipulate the Django ORM to cache all related data to the
If the view requires more detailed queryset setup (e.g. for sorting/filtering):
Et voilà! Pretty clean, isn't it?
It is evident that using the Django REST Framework and the Django ORM does not imply optimal database loads by default. Careful design and iterative improvements are required to ensure that server performance remains the best possible and that the database is not encumbered by a tsunami of redundant queries, so that the application may achieve consistency and scalability.