https://github.com/peopledoc/django-chunkator
Chunk large QuerySets into small chunks, and iterate over them without killing your RAM.
- Host: GitHub
- URL: https://github.com/peopledoc/django-chunkator
- Owner: peopledoc
- License: mit
- Created: 2015-03-18T16:57:12.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2022-07-11T19:31:17.000Z (over 2 years ago)
- Last Synced: 2025-01-07T14:08:31.627Z (12 days ago)
- Topics: approved-public, ghec-mig-migrated
- Language: Python
- Homepage: https://pypi.python.org/pypi/django-chunkator
- Size: 94.7 KB
- Stars: 110
- Watchers: 14
- Forks: 11
- Open Issues: 5
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
README
================
django-chunkator
================

Chunk large QuerySets into small chunks, and iterate over them without killing
your RAM.

.. image:: https://travis-ci.org/peopledoc/django-chunkator.svg

Tested with all the combinations of:

* Python: 3.5, 3.6, 3.7, 3.8
* Django: 2, 2.1, 2.2, 3.0, master

.. note::

    Django 3.0 is incompatible with Python 3.5, see

Usage
=====

.. code:: python

    from chunkator import chunkator

    for item in chunkator(LargeModel.objects.all(), 200):
        do_something(item)

This tool is intended to work on Django querysets.
Your model **must** define a ``pk`` field (Django does this by default, but it
can be overridden), and this pk has to be unique. ``django-chunkator`` has been
tested with PostgreSQL and SQLite, using regular PKs and UUIDs as primary keys.

You can also use ``values()``:

.. code:: python

    from chunkator import chunkator

    for item in chunkator(LargeModel.objects.values('pk', 'name'), 200):
        do_something(item)

.. important::

    If you're using ``values()`` you **have** to add at least your "pk" field
    to the values, otherwise, the chunkator will throw a
    ``MissingPkFieldException``.

.. warning::

    This will not **accelerate** your process. Instead of having one BIG query,
    you'll have several small queries. This will save your RAM instead, because
    you'll not load a huge queryset result before looping on it.

If you want to manipulate the pages directly, you can use ``chunkator_page``:

.. code:: python

    from chunkator import chunkator_page

    queryset = LargeModel.objects.all().values('pk')
    for page in chunkator_page(queryset, 200):
        launch_some_task([item['pk'] for item in page])

FAQ
===

- How is django-chunkator different from Django's ``iterator()``?

  If you have server-side cursors (using Postgres or Oracle and not setting ``DISABLE_SERVER_SIDE_CURSORS``), then the main difference is that the cursor is in the hands of the application instead of the server. It really depends on your constraints, but sometimes server-side cursors can put too much strain on your DB.

  If you don't have server-side cursors, then chunkator will allow you to iterate over your queryset in batches, without relying on LIMIT/OFFSET. The problem with LIMIT/OFFSET is that computing a large offset (when you're near the end of your queryset) requires the DB to go through all the previous entries. With large tables this can be a huge issue.

- Will django-chunkator preserve the ordering on my querysets?

  No, it orders the queryset by pk. However, you could do the same thing as chunkator with another field, provided that it's unique and not nullable, see `here `_ for more details.

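
The paging approach described in the answers above (order by a unique,
non-null key and filter on the last key seen, instead of using OFFSET) can be
sketched with the standard library's ``sqlite3`` module. This is only an
illustration of the general technique, not django-chunkator's actual code; the
``large_model`` table and the ``iter_chunks`` helper are hypothetical names
chosen for this example.

.. code:: python

    import sqlite3


    def iter_chunks(conn, chunk_size):
        """Yield lists of (pk, name) rows, at most chunk_size rows each."""
        last_pk = None  # Highest pk seen so far; None before the first query.
        while True:
            if last_pk is None:
                rows = conn.execute(
                    "SELECT pk, name FROM large_model ORDER BY pk LIMIT ?",
                    (chunk_size,),
                ).fetchall()
            else:
                # Resume strictly after the last key instead of using OFFSET.
                rows = conn.execute(
                    "SELECT pk, name FROM large_model "
                    "WHERE pk > ? ORDER BY pk LIMIT ?",
                    (last_pk, chunk_size),
                ).fetchall()
            if not rows:
                return
            yield rows
            last_pk = rows[-1][0]


    # Hypothetical demo data: ten rows in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE large_model (pk INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO large_model (pk, name) VALUES (?, ?)",
        [(i, "item-%d" % i) for i in range(1, 11)],
    )

    chunks = list(iter_chunks(conn, 4))
    chunk_sizes = [len(chunk) for chunk in chunks]
    all_pks = [row[0] for chunk in chunks for row in chunk]

Because every query filters on the key, the database can typically use the
index on that column to find the starting point, rather than walking past all
the rows already consumed. The key must be unique and not nullable, otherwise
rows could be skipped or repeated between chunks.
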
License
=======

MIT License.