Introduction

The Django Rest Framework (DRF) is a mature, feature-rich and robust framework that substantially extends Django's core functionality out of the box.

Its main game-changing assets are its Serializer and View classes: they provide a concrete way to implement Django views and build API responses seamlessly, while following RESTful design principles.

This means that the already fast process of building a web application back-end can be sped up even further!

However, as helpful and resourceful as it may be, it can still cause "trouble in paradise" when the database schema gradually grows in complexity but the API design does not adapt to the change efficiently.

The thing with DRF views

Let's inspect a minimal DRF view:

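from rest_framework.generics import ListAPIView
from rest_framework.permissions import IsAuthenticated

from .models import Action
from .serializers import ActionSerializer


class ActionList(ListAPIView):
    """
    Handle collections of Actions
    """

    permission_classes = (IsAuthenticated,)
    serializer_class = ActionSerializer
    queryset = Action.objects.all()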
.../views.py

The ActionList class provides a complete listing of all instances of the Action model defined in our database schema, with just a few lines of code. It responds to HTTP GET requests, access control is handled by the permission_classes class attribute, and the structure of the API response is defined by the associated serializer_class. Neat, right?

Well, yes and no!

This is an excellent solution when our models are relatively simple and rely on OneToOne or ForeignKey relationships with other models. In practice this is rarely the case, as most web applications use elaborate models with complex relationships.

This usually leads to implementing nested serializers, a path that, without care, causes severe and unnecessary database load: the well-known N+1 selects problem, where fetching N parent rows triggers one additional query per row to load their related objects.

ORM behavior

Django ships with a solid ORM that handles database queries for most use cases without requiring hand-written SQL.

However, let's consider the related models below:

from django.db import models


class Action(models.Model):
    # fields list
    pass


class Session(models.Model):
    # fields list
    action = models.ForeignKey(Action, on_delete=models.CASCADE, related_name='sessions')


class SessionParticipation(models.Model):
    # fields list
    # SET_NULL requires the column to be nullable
    session = models.ForeignKey(Session, on_delete=models.SET_NULL, null=True, related_name='participations')
    turnover = models.DecimalField(max_digits=8, decimal_places=2)
.../models.py

Their corresponding serializers would be:

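from rest_framework import serializers
from .models import Action, Session, SessionParticipation


class SessionParticipationSerializer(serializers.ModelSerializer):
    class Meta:
        model = SessionParticipation
        fields = '__all__'  # or an explicit fields list


class SessionSerializer(serializers.ModelSerializer):
    participations = SessionParticipationSerializer(many=True)

    class Meta:
        model = Session
        fields = '__all__'  # or an explicit fields list


class ActionSerializer(serializers.ModelSerializer):
    sessions = SessionSerializer(many=True)

    class Meta:
        model = Action
        fields = '__all__'  # or an explicit fields list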
.../serializers.py

Suppose we need the total turnover of every Action instance. At first glance, the snippet below seems pretty reasonable:

from django.db.models import Sum

actions = Action.objects.all()

for a in actions:
    participations = SessionParticipation.objects.filter(session__action=a)
    if participations:
        action_turnover = participations.aggregate(Sum('turnover'))['turnover__sum']
    else:
        action_turnover = 0

But! An Action instance may be related to many Session instances, and each Session instance may in turn be associated with one or more SessionParticipation instances.

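The above code performs the following queries:

  • 1 SELECT to return all Action instances
  • 1 SELECT per Action instance (joining through Session) to evaluate if participations:
  • 1 SELECT SUM(`turnover`) per Action instance that has participations

It's easy to identify the issue here: for N Actions this adds up to as many as 1 + 2N queries, so the query count grows with every row. Nested serializers make things even worse, because each level of nesting multiplies the count: serializing X Actions with Y Sessions each through the serializers above issues 1 query for the Actions, X queries for their Sessions and X * Y queries for the Sessions' participations.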

Now, let's look at this:

actions = Action.objects.all().prefetch_related('sessions', 'sessions__participations')

for a in actions:
    action_turnover = 0
    for s in a.sessions.all():
        for p in s.participations.all():
            # ugly, but faster than pydash.sum_by!
            action_turnover += p.turnover if p.turnover else 0

Since we have declared related_name values on our foreign keys, we can use them as lookups for Django ORM's prefetch_related, which caches all the required related data in advance. We also got rid of the per-Action aggregation queries.

We then iterate over the cached collections a.sessions.all() and s.participations.all(), which are served from the prefetch cache instead of hitting the database again.

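So, to sum up, we now query the database exactly three times (once each for Actions, Sessions and SessionParticipations) to get the same result, no matter how many rows each table holds. The previous approach issued extra queries for every single Action, so the savings grow with the size of the data set and quickly become dramatic!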

Things can get way uglier when dealing with ManyToMany relationships, which the RelatedManager resolves through a pivot table and which are serialized using nested serializers. Without prefetching, every parent object triggers its own query joining through the pivot table, and each nested level repeats this for every child object, which makes the view extremely slow.

The solution

To apply the optimizations described in the previous section at an application-wide level, we need a generic, reusable implementation.

Enter Mixins:

from django.db.models import QuerySet


class EagerLoadingMixin:
    """
    Mixin Class that performs eager loading for serializers with O2O, O2M and M2M relationships
    """

    @classmethod
    def eager_load(cls, queryset: QuerySet) -> QuerySet:
        """
        Perform eager loading of model data for nested serializers

        :param queryset: the model queryset
        :return: the queryset containing a prefetch cache
        """

        if hasattr(cls, "select_eager"):
            queryset = queryset.select_related(*cls.select_eager)

        if hasattr(cls, "prefetch_eager"):
            queryset = queryset.prefetch_related(*cls.prefetch_eager)

        return queryset
.../mixins.py

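The EagerLoadingMixin class provides the eager_load class method, which handles eager loading for every kind of Django model relationship: lookups listed in select_eager are passed to select_related, which follows forward ForeignKey and OneToOne relations with SQL JOINs in the same query, while lookups listed in prefetch_eager are passed to prefetch_related, which loads reverse ForeignKey and ManyToMany relations with one extra query per relation.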

Let's revisit our previous Serializer classes and use EagerLoadingMixin to optimize database calls:

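from rest_framework import serializers

from .mixins import EagerLoadingMixin
from .models import Action, Session, SessionParticipation


class SessionParticipationSerializer(serializers.ModelSerializer):
    class Meta:
        model = SessionParticipation
        fields = '__all__'  # or an explicit fields list


class SessionSerializer(EagerLoadingMixin, serializers.ModelSerializer):
    prefetch_eager = [
        'participations'
    ]
    participations = SessionParticipationSerializer(many=True)

    class Meta:
        model = Session
        fields = '__all__'  # or an explicit fields list


class ActionSerializer(EagerLoadingMixin, serializers.ModelSerializer):
    # eager_load only reads this class's attributes, so the nested
    # participations must be listed here as well
    prefetch_eager = [
        'sessions',
        'sessions__participations'
    ]
    sessions = SessionSerializer(many=True)

    class Meta:
        model = Action
        fields = '__all__'  # or an explicit fields list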
.../serializers.py

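By providing the prefetch_eager class attribute, we instruct the Django ORM to cache all the related data the serializer needs alongside the queryset. Keep in mind that eager_load only reads the attributes of the class it is called on: the prefetch_eager lists of nested serializers are not consulted, so the outermost serializer has to spell out the full lookup path to every nested relation (e.g. 'sessions__participations').

Then, in the ActionList view: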

class ActionList(ListAPIView):
    """
    Handle collections of Actions
    """

    permission_classes = (IsAuthenticated,)
    serializer_class = ActionSerializer
    queryset = serializer_class.eager_load(Action.objects.all())
.../views.py

If the view requires a more detailed queryset setup (e.g. for sorting or filtering), the same call fits into get_queryset:

class ActionList(ListAPIView):
    """
    Handle collections of Actions
    """

    permission_classes = (IsAuthenticated,)
    serializer_class = ActionSerializer
    
    def get_queryset(self):
        queryset = self.get_serializer_class().eager_load(Action.objects.all())
        # do your work here
        ...
        return queryset
.../views.py

Et voilà! Pretty clean, isn't it?

Conclusion

Clearly, using the Django Rest Framework and the Django ORM does not guarantee optimal database load by default. Careful design and iterative improvements are required to keep server performance as high as possible and to keep the database from being flooded by a tsunami of redundant queries, so that the application can remain consistent and scalable.
