Queries are too slow; prefetch_related not solving the problem

django prefetch_related
django orm performance
django queryset
django orm optimization
django cache queryset
django filter slow
django performance testing
django filter queryset without hitting database

We are using Django 2.1 for Speedy Net. I have pages which display about 96 users per page, and for each user I want to display how many friends he has on Speedy Match, with an active email address. The query checks for each user if (self.email_addresses.filter(is_confirmed=True).exists()) is true:

def has_confirmed_email(self):
    return (self.email_addresses.filter(is_confirmed=True).exists())

For each user of 96 users, it checks all his friends and runs this query - more than hundreds of times per page. The query for fetching the users is User.objects.all().order_by(<...>), and then for each user it checks this query:

qs = self.friends.all().prefetch_related("from_user", "from_user__{}".format(SpeedyNetSiteProfile.RELATED_NAME), "from_user__{}".format(SpeedyMatchSiteProfile.RELATED_NAME), "from_user__email_addresses").distinct().order_by('-from_user__{}__last_visit'.format(SiteProfile.RELATED_NAME))

I added prefetch_related in the User's manager model:

def get_queryset(self):
    from speedy.net.accounts.models import SiteProfile as SpeedyNetSiteProfile
    from speedy.match.accounts.models import SiteProfile as SpeedyMatchSiteProfile
    return super().get_queryset().prefetch_related(SpeedyNetSiteProfile.RELATED_NAME, SpeedyMatchSiteProfile.RELATED_NAME, "email_addresses").distinct()

But adding "email_addresses" and "from_user__email_addresses" to prefetch_related doesn't make the page load faster - it takes about 16 seconds to load the page. When loading the page without checking if each friend has a confirmed email address it takes about 3 seconds to load the page. Is there a way I can load all the email addresses of the users once and not each time a user is checked? Actually I would also like the friends query to be loaded once and not 96 times per page (once for each user), but the page loads in 3 seconds so it doesn't matter that much. But if I could query the friends table once it would have been better.

The queries are caused by the following line (link):

if ((self.user.has_confirmed_email()) and (step >= self.activation_step)):

This is called by is_active_and_valid which is called by get_matching_rank, to check if the user is a match of the specific user. This is called by method get_friends in the model.

Update #1: If I change to return True in def has_confirmed_email(...) in the model, the page loads only 3 seconds faster (13 instead of 16) so there might be more performance-related issues in this page.

If I disable the functionality of get_matching_rank and replace it with a plain return 5, the page loads much faster. But of course we need the functionality of this function. Maybe we can just cache for a few minutes the results of this function when called for sets of two specific users?

Update #2: I want to add a boolean field to the user model, which will be true if the user has a confirmed email address. And this field will be updated each time an email address is saved or deleted. I know how to override the save method, but how do I update this field when an email address gets deleted? It may also be deleted by the admin.

I think I should use signals such as post_save and post_delete.

For the prefetch to have any effect you would have to be using it on the User model - it's hard to tell if you're doing that from what you've included.

Without prefetching friends for each user doing self.friends.all() is going to cause a query. To get around the query using prefetch you could do one of the following:

User.objects.prefetch_related('friends')

Or you could use a Prefetch object to further filter:

User.objects.prefetch_related(Prefetch(
    'friends',
    queryset=Friend.objects.filter(is_confirmed=True)
)

A Count annotation using the filter keyword argument is going to be much quicker.

from djang.db.models import Count, Q

qs = User.objects.annotate(
    friend_count=Count('friends', filter=Q(friends__is_confirmed=True)
)

Queries are too slow; prefetch_related not solving the , We are using Django 2.1 for Speedy Net. I have pages which display about 96 users per page, and for each user I want to display how many friends he has on� There is no such way of defining, what is the good or bad performance of a query. Lets benefit SQL query by optimizing the SQL Server and making its performance fast. I will share some quick tips to improve performance slow running queries in SQL Server. But first let us understand the possible reasons Why SQL Server running slow ?

But adding "email_addresses" and "from_user__email_addresses" to prefetch_related doesn't make the page load faster ...

That is because self.email_addresses.filter(is_confirmed=True).exists() doesn't use the prefetched QuerySet.

To use the prefetched self.email_addresses, filter in memory:

def has_confirmed_email(self):
    if self.email_addresses.all()._result_cache is not None:
        return any(email_address.is_confirmed for email_address in self.email_addresses.all())

    return (self.email_addresses.filter(is_confirmed=True).exists())

Note: If not prefetched, then the improved implementation still hits the database on every has_confirmed_email function call since the .filter still creates a new QuerySet. To handle this, make has_confirmed_email a Django @cached_property.

Explanation

From https://docs.djangoproject.com/en/3.0/ref/models/querysets/#prefetch-related:

Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query. ...

>>> pizzas = Pizza.objects.prefetch_related('toppings')
>>> [list(pizza.toppings.filter(spicy=True)) for pizza in pizzas]

... The prefetched cache can’t help here; in fact it hurts performance, since you have done a database query that you haven’t used. So use this feature with caution!

Solving Performance Problems in the Django ORM, Django is fast, but sometimes it allows you to unwittingly write slow code. the problem and provides select_related and prefetch_related to solve it. code and recording the queries it produces; Queries should not be in� Before you can profile slow queries, you need to find them. MySQL has a built-in slow query log. To use it, open the my.cnf file and set the slow_query_log variable to "On." Set long_query_time to the number of seconds that a query should take to be considered slow, say 0.2. Set slow_query_log_file to the path where you want to save the file

I added a field to the User model:

has_confirmed_email = models.BooleanField(default=False)

And the method:

def _update_has_confirmed_email_field(self):
    self.has_confirmed_email = (self.email_addresses.filter(is_confirmed=True).count() > 0)
    self.save_user_and_profile()

And:

@receiver(signal=models.signals.post_save, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_saving_email_address(sender, instance: UserEmailAddress, **kwargs):
    instance.user._update_has_confirmed_email_field()


@receiver(signal=models.signals.post_delete, sender=UserEmailAddress)
def update_user_has_confirmed_email_field_after_deleting_email_address(sender, instance: UserEmailAddress, **kwargs):
    instance.user._update_has_confirmed_email_field()

And in the User model:

def delete(self, *args, **kwargs):
    if ((self.is_staff) or (self.is_superuser)):
        warnings.warn('Can’t delete staff user.')
        return False
    else:
        self.email_addresses.all().delete() # This is necessary because of the signal above.
        return super().delete(*args, **kwargs)

I also removed the friends count from the admin view, and now the admin view pages load in about 1.5 seconds.

Optimizing slow Django REST Framework performance, Solve slow Django REST framework API performance problems by eager- loading data, instead of inefficient database querying due to nested serializers. Using DRF, it is hard not to make. It requires use of the underutilized select_related and prefetch_related methods on the Django ORM (and the� What to do when your query is too slow? First of all, you have to know why it is slow. What is the real cause of your problem. If the reason why is not known, suggesting to rewrite the query, or hinting the query, suggesting parallellization et cetera is not very productive. Once in a while you may get lucky.

#21760 (prefetch_related uses an inefficient query to get the related , The query I'd expect to be issued by the prefetch_related is: There are two problems with this approach: end up returning much more data than is needed because the related objects are duplicated many times over, and will be slower overall. If it's just a cleanup, only fix the if condition, and not the implementation. PBI has power query and power pivot as modelling tools. Unlike excel version, you cannot bring data directly in Power Pivot, so you have a query from a website. (google sheets in your case). If the refresh time is slow, then the query is slow. The refresh time is strictly related to what your query does, and the measures you wrote.

Optimizing Django REST Framework performance with django-auto , tldr: django-auto-prefetching is a library that automatically optimizes your This means we potentially make a lot of queries as DRF first goes to fetch Django has a built in solution to this problem, select_related and prefetch_related prefetch_related calls, not to mention keeping them updated as code� Hi Everyone, My "Apply query changes' is taking far too long. This wasn't the case when I was working on the same dataset (2m rows) of data. Recently I did some changes and removed some of the steps within the query that required Merged. And now when I apply changes to it, is taking forever an

Query count pitfalls and how to avoid them, If you notice that your Django application is running slowly, the first pip install pipenv # if pipenv is not already installed on your system I'm going to run through some of the most common scenarios and show you how to fix them. the count with prefetch_related . prefetch_related executes one query for�

Comments
  • Would it be better to implement some sort of cache or intermediate storage for this? You're asking alot of the db for this, and might be time to look at alternatives for this type of data
  • @Jason Yes I agree, an email address may be confirmed or deleted, these are the only times when the value of has_confirmed_email would change. There is no way to cause a confirmed email address to become unconfirmed.
  • The email address check has to be from code because it checks other things too, such as matching between users. But only the email addresses are retrieved from the database.
  • I want to retrieve all the email addresses of the relevant users once from the database without changing the code.
  • A friend can't be confirmed but each email address can be confirmed or unconfirmed.
  • @Uri hard to tell without seeing your models but I'm guessing a friend is linked to an email address? In that case in your Prefetch object you can filter the queryset with .filter(email_address__is_confirmed=True)
  • For each user I can query the email addresses in (self.email_addresses.filter(is_confirmed=True).exists()), but I want to fetch all the email addresses once and not once for each user. A friend is a user.
  • Yes, I understand. Thank you. But I already added has_confirmed_email as a field in the database. Please see my answer below.
  • Actually, return any(email_address.is_confirmed for email_address in self.email_addresses) can work anyway, whether or not the query is prefetched.
  • Yes, I saw your answer after I posted (I typed my answer, had to step away, and then posted hours later). This answer solves the original problem in the question (Queries are too slow; prefetch_related not solving the problem), while your answer achieves your end goal (make the page load faster) in a very different way.
  • Although return any(...) can work whether or not the query is prefetched, it incurs a performance cost — loading all email_addresses into memory — vs .exists().
  • There should not be more than 5 email addresses for a single user so I think it should take about the same time in both ways. But if the email addresses are prefetched for many users at once, it would be faster to use it.