https://github.com/twtrubiks/django_n_add_1_queries_problem_tutorial

Django N+1 Queries Problem
https://github.com/twtrubiks/django_n_add_1_queries_problem_tutorial
django prefetch prefetch-related query select-related
Last synced: 3 months ago
JSON representation
Django N+1 Queries Problem
Host: GitHub
URL: https://github.com/twtrubiks/django_n_add_1_queries_problem_tutorial
Owner: twtrubiks
License: mit
Created: 2023-02-04T11:41:45.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-07-25T05:47:49.000Z (10 months ago)
Last Synced: 2025-01-17T05:09:09.878Z (4 months ago)
Topics: django, prefetch, prefetch-related, query, select-related
Language: Python
Homepage:
Size: 18.6 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

        # django_N_add_1_queries_problem_tutorial

透過 django 介紹 N+1 Queries Problem

* [Youtube Tutorial - 透過 django 介紹 N+1 Queries Problem](https://youtu.be/trVzF-jBFTo)

## 環境安裝

```cmd

pip3 install -r requirements.txt

```

這裡面有使用到 [django-extensions](https://django-extensions.readthedocs.io/en/latest/), 主要是要看 raw SQL 做了什麼事情

## 介紹

[models.py](https://github.com/twtrubiks/django_N_add_1_queries_problem_tutorial/blob/main/books/models.py)

建立一些簡單的資料

Author

```sql

postgres=# SELECT * FROM public.author;

 id |   name   | country_id

----+----------+------------

  1 | author_1 |          1

  2 | author_2 |          2

  3 | author_3 |          3

(3 rows)

```

Book

```sql

postgres=# SELECT * FROM public.book;

 id |  name  | author_id

----+--------+-----------

  1 | book_1 |         1

  2 | book_2 |         2

  3 | book_3 |         3

(3 rows)

```

Country

```sql

postgres=# SELECT * FROM public.country;

 id |   name

----+-----------

  1 | country_1

  2 | country_2

  3 | country_3

(3 rows)

```

User

```sql

postgres=# SELECT * FROM public."user";

 id | name

----+-------

  1 | user1

  2 | user2

  3 | user3

(3 rows)

```

Tag

```sql

postgres=# SELECT * FROM public.tag;

 id | name  | created_by_id

----+-------+---------------

  1 | tag_1 |             1

  3 | tag_3 |             3

  2 | tag_2 |

(3 rows)

```

book_tags

(ManyToManyField 自動產出來的表, 紀錄 Book 和 Tag 的關係)

```sql

postgres=# SELECT * FROM public.book_tags;

 id | book_id | tag_id

----+---------+--------

  1 |       1 |      1

  2 |       1 |      2

  3 |       2 |      3

(3 rows)

```

先 migrate 後匯入測試資料

```cmd

python manage.py migrate

python manage.py loaddata db.json

```

也可以從後台建立(觀看),

```cmd

python3 manage.py runserver

```

進入 [http://0.0.0.0:8000/admin/](http://0.0.0.0:8000/admin/)

```text

帳號/密碼: admin/admin

```

進入 shell_plus 操作資料,

```cmd

python3 manage.py shell_plus --print-sql

```

### 情境一

N+1 Queries

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all() # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.000184s [Database: default]

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 1

 LIMIT 21

Execution time: 0.000178s [Database: default]

book_1 by author_1

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 2

 LIMIT 21

Execution time: 0.000191s [Database: default]

book_2 by author_2

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 3

 LIMIT 21

Execution time: 0.000202s [Database: default]

book_3 by author_3

```

book table 只有3筆資料, 但總共執行了 4 (N+1) 次 SQL,

1次是撈出全部的book, 剩下的3次, 是透過 author id 去撈作者的 name.

### 情境二

2N+1 Queries

假如今天我們 access 更多的 foreign key

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all() # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name, "from", book.author.country.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.000313s [Database: default]

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 1

 LIMIT 21

Execution time: 0.000086s [Database: default]

SELECT "country"."id",

       "country"."name"

  FROM "country"

 WHERE "country"."id" = 1

 LIMIT 21

Execution time: 0.000128s [Database: default]

book_1 by author_1 from country_1

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 2

 LIMIT 21

Execution time: 0.000112s [Database: default]

SELECT "country"."id",

       "country"."name"

  FROM "country"

 WHERE "country"."id" = 2

 LIMIT 21

Execution time: 0.000108s [Database: default]

book_2 by author_2 from country_2

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" = 3

 LIMIT 21

Execution time: 0.000254s [Database: default]

SELECT "country"."id",

       "country"."name"

  FROM "country"

 WHERE "country"."id" = 3

 LIMIT 21

Execution time: 0.000117s [Database: default]

book_3 by author_3 from country_3

```

book table 只有3筆資料, 但總共執行了 7 (2N+1) 次 SQL,

1次是撈出全部的book,

(1次是透過 author id 去撈作者的 name, 1次是透過 country id 去撈居住地點的 name.) * 2 次.

看了情境一(N+1 Queries), 以及情境二 (2N+1 Queries), 肯定會造成效能的影響 :weary:

這樣在 Django ORM 上, 應該怎麼解決呢 :question:

### 情境三

也來看一下 ManyToManyField 的情境

```cmd

from books.models import Country, Author, Book, Tag

>>> books = Book.objects.all()

>>> for book in books:

...     print(book.tags.all())

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.001057s [Database: default]

SELECT "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" = 1

 LIMIT 21

Execution time: 0.001094s [Database: default]

, ]>

SELECT "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" = 2

 LIMIT 21

Execution time: 0.000927s [Database: default]

]>

SELECT "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" = 3

 LIMIT 21

Execution time: 0.000973s [Database: default]

```

一次 SQL 取出全部的 book,

二次 SQL 直接 JOIN book_tags, 然後透過 WHERE 的方式, 將需要的資料拿出來,

因為 book 有三筆資料, 就分別 JOIN 了 3 次 (效能不好).

## 解決方法

### 方法一

`select_related()` [https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-related](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-related)

官網說明如下,

```text

Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.

```

#### 情境一

N+1 Queries

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all().select_related("author") # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id",

       "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "book"

 INNER JOIN "author"

    ON ("book"."author_id" = "author"."id")

Execution time: 0.000651s [Database: default]

book_1 by author_1

book_2 by author_2

book_3 by author_3

```

只使用了一次的 Query, 使用 Join 的方式解決了這個問題.

#### 情境二

2N+1 Queries

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all().select_related("author", "author__country") # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name, "from", book.author.country.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id",

       "author"."id",

       "author"."name",

       "author"."country_id",

       "country"."id",

       "country"."name"

  FROM "book"

 INNER JOIN "author"

    ON ("book"."author_id" = "author"."id")

 INNER JOIN "country"

    ON ("author"."country_id" = "country"."id")

Execution time: 0.000541s [Database: default]

book_1 by author_1 from country_1

book_2 by author_2 from country_2

book_3 by author_3 from country_3

```

只使用了一次的 Query, 使用 Join 兩個 table 的方式解決了這個問題.

### 方法二

`prefetch_related` [https://docs.djangoproject.com/en/5.0/ref/models/querysets/#prefetch-related](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#prefetch-related)

官網說明如下,

```text

Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.

prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.

```

#### 情境一

N+1 Queries

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all().prefetch_related("author") # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.000486s [Database: default]

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" IN (1, 2, 3)

Execution time: 0.000735s [Database: default]

book_1 by author_1

book_2 by author_2

book_3 by author_3

```

和 `select_related` 不同的是, 使用了兩次的 Query,

一次找全部的 book,

第二次透過作者 id (ids) 一次找到全部的名稱.

#### 情境二

2N+1 Queries

```cmd

>>> from books.models import Country, Author, Book

>>> books = Book.objects.all().prefetch_related("author", "author__country") # QuerySets are lazy # Doesn't hit the database.

>>> for book in books:

...     print(book.name, "by", book.author.name, "from", book.author.country.name)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.000108s [Database: default]

SELECT "author"."id",

       "author"."name",

       "author"."country_id"

  FROM "author"

 WHERE "author"."id" IN (1, 2, 3)

Execution time: 0.000112s [Database: default]

SELECT "country"."id",

       "country"."name"

  FROM "country"

WHERE "country"."id" IN (1, 2, 3)

```

和 `select_related` 不同的是, 使用了三次的 Query,

一次找全部的 book,

第二次透過作者 id (ids) 一次找到全部的名稱.

第三次透過居住地 id (ids) 一次找到全部的名稱.

#### 情境三

ManyToManyField 的情境

```cmd

>>> from books.models import Country, Author, Book, Tag

>>> books = Book.objects.all().prefetch_related("tags")

>>> for book in books:

...     print(book.tags.all())

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.001068s [Database: default]

SELECT ("book_tags"."book_id") AS "_prefetch_related_val_book_id",

       "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" IN (1, 2, 3)

Execution time: 0.000886s [Database: default]

, ]>

]>

```

一次 SQL 取出全部的 book,

二次 SQL 直接 JOIN book_tags, 然後透過 WHERE IN 的方式, 一口氣將需要的資料拿出來,

當在 loop 的時候, 其實也都是去快取(cache) 裡面取資料而已(不會 access db).

#### 更複雜的情境

假設今天我們的 ManyToManyField tags 裡面還有一個 ForeignKey created_by,

這樣應該如何一起把它 JOIN 起來呢 ❓ 這時候就必須透過 [Prefetch](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#django.db.models.Prefetch)

以下範例是只撈出 Tag 裡面有 created_by 值得內容,

```cmd

>>> from books.models import Country, Author, Book, Tag, User

>>> from django.db.models import Prefetch

>>> queryset = Tag.objects.select_related("created_by").filter(created_by__isnull=False)

>>> books = Book.objects.prefetch_related(Prefetch("tags", queryset=queryset, to_attr="tag_has_created_by"))

>>> for book in books:

...     for tag in book.tag_has_created_by:

...         print(tag.created_by)

...

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

Execution time: 0.001034s [Database: default]

SELECT ("book_tags"."book_id") AS "_prefetch_related_val_book_id",

       "tag"."id",

       "tag"."name",

       "tag"."created_by_id",

       "user"."id",

       "user"."name"

  FROM "tag"

 INNER JOIN "user"

    ON ("tag"."created_by_id" = "user"."id")

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE ("tag"."created_by_id" IS NOT NULL AND "book_tags"."book_id" IN (1, 2, 3))

Execution time: 0.001113s [Database: default]

user1

user3

```

你會發現 JOIN 了兩次, 一次是針對 book_tags, 另一次是針對 created_by (user).

#### prefetch_related_objects

如果今天你使用了 `model.all()`, 它不會有快取,

因為對 django 來說, 他是一份新的.

所以這時候可以透過 [prefetch_related_objects](https://docs.djangoproject.com/en/5.0/ref/models/querysets/#prefetch-related-objects) 快取起來.

`prefetch_related_objects(model_instances, *related_lookups)`

```cmd

>>> from books.models import Country, Author, Book, Tag, User

>>> book = Book.objects.first()

SELECT "book"."id",

       "book"."name",

       "book"."author_id"

  FROM "book"

 ORDER BY "book"."id" ASC

 LIMIT 1

Execution time: 0.000815s [Database: default]

>>> book.tags.all() # query db

SELECT "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" = 1

 LIMIT 21

Execution time: 0.001145s [Database: default]

, ]>

>>> book.tags.all() # query db

SELECT "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" = 1

 LIMIT 21

Execution time: 0.001164s [Database: default]

, ]>

>>> from django.db.models import prefetch_related_objects

>>> prefetch_related_objects([book], "tags")

SELECT ("book_tags"."book_id") AS "_prefetch_related_val_book_id",

       "tag"."id",

       "tag"."name",

       "tag"."created_by_id"

  FROM "tag"

 INNER JOIN "book_tags"

    ON ("tag"."id" = "book_tags"."tag_id")

 WHERE "book_tags"."book_id" IN (1)

Execution time: 0.001192s [Database: default]

>>> book.tags.all() # use cache

, ]>

>>> book.tags.all() # use cache

, ]>

>>> book.tags.all() # use cache

, ]>

>>> book.tags.all() # use cache

, ]>

```

可以發現當我們使用了 prefetch_related_objects, 就會直接從快取拿資料.

## 結論

`select_related` 可以只 Query 一次, 但 JOIN 還是需要一點成本.

通常使用在 OneToOne(一對一) 或 ForeignKey(多對一), 透過 join 降低查詢次數.

```text

select_related is limited to single-valued relationships - foreign key and one-to-one.

```

`prefetch_related` Query 多次(依照你要取的 ForeignKey 數量), 但執行比較不需要成本的 Query.

通常使用在 OneToMany(一對多) 或 ManyToMany(多對多), 避免載入大量資料.

## 執行環境

* Python 3.11

## Reference

* [Django](https://www.djangoproject.com/)

* [Django and the N+1 Queries Problem](https://scoutapm.com/blog/django-and-the-n1-queries-problem)

## Donation

文章都是我自己研究內化後原創，如果有幫助到您，也想鼓勵我的話，歡迎請我喝一杯咖啡:laughing:

![alt tag](https://i.imgur.com/LRct9xa.png)

[贊助者付款](https://payment.opay.tw/Broadcaster/Donate/9E47FDEF85ABE383A0F5FC6A218606F8)

## License

MIT license
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/twtrubiks/django_n_add_1_queries_problem_tutorial

Awesome Lists containing this project

README