https://github.com/andreacrotti/postgres-migration
- Host: GitHub
- URL: https://github.com/andreacrotti/postgres-migration
- Owner: AndreaCrotti
- Created: 2016-07-19T07:52:56.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-07-28T13:16:43.000Z (over 9 years ago)
- Size: 1.59 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.org
README
#+AUTHOR: Andrea Crotti @andreacrotti
#+TITLE: Django: From MySQL to Postgres
#+OPTIONS: num:nil ^:nil toc:nil timestamp:nil reveal_single_file:t
#+REVEAL_TRANS: fade
#+REVEAL_SPEED: fast
#+REVEAL_PLUGINS: notes
#+EMAIL: andrea.crotti@iwoca.co.uk
* Why
[[./images/postgresql_versus_mysql.jpg]]
* Why, really
This sucks:
#+BEGIN_SRC sql
SELECT X.* FROM no_chain_samplemodel as X
JOIN (SELECT user_id, MAX(timestamp) AS timestamp
FROM no_chain_samplemodel
GROUP BY user_id) AS Y
ON (X.user_id = Y.user_id and X.timestamp = Y.timestamp)
WHERE X.staff_id = %s
#+END_SRC
This is great:
#+BEGIN_SRC sql
SELECT DISTINCT ON (user_id) * FROM no_chain_samplemodel
WHERE timestamp <= '2017-01-01' ORDER BY user_id ASC, timestamp DESC;
#+END_SRC
With the Django ORM:
#+BEGIN_SRC python
# Postgres only: the distinct() fields must match the leading order_by() fields
SampleModel.objects.filter(timestamp__lte=mydate).\
    order_by('user_id', '-timestamp').distinct('user_id')
#+END_SRC
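To make the semantics of ~DISTINCT ON~ concrete, here is a plain-Python sketch of what the Postgres query returns: filter, sort by ~user_id~ ascending and ~timestamp~ descending, then keep the first row of each ~user_id~ group. The ~Sample~ row type and the ~latest_per_user~ helper are hypothetical illustrations, not code from the project.

#+BEGIN_SRC python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    # Hypothetical row shape: only the columns the query looks at.
    id: int
    user_id: int
    timestamp: datetime


def latest_per_user(rows: Iterable[Sample], cutoff: datetime) -> List[Sample]:
    """Newest row per user_id at or before cutoff, like DISTINCT ON (user_id)."""
    # WHERE timestamp <= cutoff
    kept = [r for r in rows if r.timestamp <= cutoff]
    # ORDER BY user_id ASC, timestamp DESC: Python's sort is stable, so sort by the
    # secondary key first and by the primary key second.
    kept.sort(key=lambda r: r.timestamp, reverse=True)
    kept.sort(key=lambda r: r.user_id)
    result: List[Sample] = []
    for row in kept:
        # DISTINCT ON keeps only the first row of each user_id group.
        if not result or result[-1].user_id != row.user_id:
            result.append(row)
    return result
#+END_SRC

When two rows for the same user share a timestamp, Postgres may return either one; this sketch keeps whichever came first in the input, so add a unique tiebreaker to the ~ORDER BY~ if that matters.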
* How
*From this*
#+BEGIN_SRC python
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
...
}
}
#+END_SRC
*To this*
#+BEGIN_SRC python
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
...
}
}
#+END_SRC
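A hedged sketch of a settings fragment that lets the same code base, and CI, run against either engine while the switch is in progress. The ~DB_BACKEND~ variable and the other environment variable names are hypothetical. It uses the ~django.db.backends.postgresql~ engine name introduced in Django 1.9, where ~postgresql_psycopg2~ remained available as an alias.

#+BEGIN_SRC python
import os

# Hypothetical switch: DB_BACKEND must be 'mysql' or 'postgres' (the default).
# An unknown value raises KeyError at startup instead of silently picking an engine.
_ENGINES = {
    'mysql': 'django.db.backends.mysql',
    'postgres': 'django.db.backends.postgresql',
}

DB_BACKEND = os.environ.get('DB_BACKEND', 'postgres')

DATABASES = {
    'default': {
        'ENGINE': _ENGINES[DB_BACKEND],
        # Placeholder connection parameters read from the environment.
        'NAME': os.environ.get('DB_NAME', 'app'),
        'USER': os.environ.get('DB_USER', 'app'),
        'PASSWORD': os.environ.get('DB_PASSWORD', ''),
        'HOST': os.environ.get('DB_HOST', 'localhost'),
    }
}
#+END_SRC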
* Done?
[[./images/done_yet.png]]
* A few numbers
Massive fintech Django project:
- 190k lines of Python code
- over 100 apps
- 383 tables to migrate
- 3000 tests running in ~3 minutes
* What's the plan?
- adapt code
- migrate data
- profit!
* Data migration
** Pgloader
*Pgloader* to the rescue:
[[./images/pgloader.png]]
*No live replication* == *Downtime!!!*
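pgloader can be driven from the command line by passing a source and a target connection string. A minimal sketch of wrapping that call from Python, assuming ~pgloader~ is on the ~PATH~; the hosts, users and database names are placeholders, and pgloader also accepts a load command file when finer control is needed.

#+BEGIN_SRC python
import shlex
import subprocess
from typing import List


def pgloader_command(mysql_url: str, postgres_url: str) -> List[str]:
    """Build a one-shot pgloader call copying a MySQL database into Postgres."""
    # Source and target connection strings are passed as positional arguments.
    return ['pgloader', mysql_url, postgres_url]


def run_migration(mysql_url: str, postgres_url: str) -> None:
    # This is a single snapshot copy: writes reaching MySQL after it starts are not
    # carried over, so the application must stay down until the load finishes.
    cmd = pgloader_command(mysql_url, postgres_url)
    print('Running:', shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
#+END_SRC

Usage (placeholder URLs): ~run_migration('mysql://app@mysql-host/app', 'postgresql://app@pg-host/app')~.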
** Very big tables
- drop foreign keys (ForeignKey → IntegerField)
- adapt queries
- write a database router
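A database router lets a subset of models live in a different database. A minimal sketch, assuming the very big tables stay on MySQL under a second ~DATABASES~ alias called ~legacy~ while everything else moves to Postgres; the alias, the ~LEGACY_MODELS~ set and the app/model names are hypothetical. Because a real foreign key cannot point across two databases, ~allow_relation~ refuses mixed relations, which is why those ~ForeignKey~ fields become plain ~IntegerField~ columns.

#+BEGIN_SRC python
# Hypothetical: models too big to copy during the downtime window stay on MySQL.
LEGACY_DB = 'legacy'
LEGACY_MODELS = {('events', 'auditlog')}  # hypothetical (app_label, model_name) pairs


class BigTableRouter:
    """Send a fixed set of very large models to the legacy MySQL database."""

    def _is_legacy(self, model) -> bool:
        return (model._meta.app_label, model._meta.model_name) in LEGACY_MODELS

    def db_for_read(self, model, **hints):
        # None means "no opinion": Django falls back to the 'default' database.
        return LEGACY_DB if self._is_legacy(model) else None

    def db_for_write(self, model, **hints):
        return LEGACY_DB if self._is_legacy(model) else None

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database.
        return self._is_legacy(type(obj1)) == self._is_legacy(type(obj2))

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if model_name is None:
            return None  # e.g. RunPython/RunSQL without hints: no opinion
        if (app_label, model_name) in LEGACY_MODELS:
            return db == LEGACY_DB
        return db != LEGACY_DB
#+END_SRC

It would be enabled with ~DATABASE_ROUTERS = ['myproject.routers.BigTableRouter']~ in the settings (module path hypothetical).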
* Code changes
- get Postgres on CI (stable tests)
- search for untested raw queries
- manual testing on real data
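Finding raw SQL is mostly a grep exercise. A heuristic sketch that flags lines calling Django's raw-SQL entry points (~.raw()~, ~.extra()~, ~RawSQL~, ~cursor()~ and ~execute()~); the patterns are assumptions and will produce false positives, and telling which hits are untested still means cross-referencing a coverage report.

#+BEGIN_SRC python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Heuristic patterns for raw SQL entry points in a Django code base.
RAW_SQL_PATTERNS = re.compile(r"\.raw\(|\.extra\(|cursor\(\)|\.execute\(|RawSQL\(")


def find_raw_queries(source: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, stripped_line) for lines that look like raw SQL usage."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        if RAW_SQL_PATTERNS.search(line):
            yield lineno, line.strip()


def scan_project(root: str) -> None:
    # Walk every .py file under root and print candidates to review by hand.
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in find_raw_queries(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {line}")
#+END_SRC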
** Regression testing
[[./images/notebook.png]]
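The slide only shows a notebook screenshot. A hypothetical sketch of the kind of check such a notebook might run: execute the same query on both databases (for example with ~cursor.fetchall()~) and compare the result sets as multisets, ignoring row order. It assumes MySQL datetimes were stored as naive UTC, while Postgres ~timestamptz~ columns come back timezone-aware.

#+BEGIN_SRC python
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Iterable, Sequence, Tuple


def normalize(value: Any) -> Any:
    # Convert aware datetimes to naive UTC so they match MySQL's naive values.
    if isinstance(value, datetime) and value.tzinfo is not None:
        return value.astimezone(timezone.utc).replace(tzinfo=None)
    return value


def as_multiset(rows: Iterable[Sequence[Any]]) -> Counter:
    # Count each normalized row; row order is deliberately ignored.
    return Counter(tuple(normalize(v) for v in row) for row in rows)


def diff_rows(mysql_rows, postgres_rows) -> Tuple[Counter, Counter]:
    """Return (rows only in MySQL, rows only in Postgres), counting duplicates."""
    a, b = as_multiset(mysql_rows), as_multiset(postgres_rows)
    return a - b, b - a


def same_rows(mysql_rows, postgres_rows) -> bool:
    return as_multiset(mysql_rows) == as_multiset(postgres_rows)
#+END_SRC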
* Tips for switchers
- *use* migrations for everything
- *test* everything
- *NEVER* rely on implicit ordering
- make Django apps really independent
- split that monolith ASAP
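Why implicit ordering bites: when the ~ORDER BY~ does not fully determine the order, the SQL standard leaves the order of tied rows unspecified, so a MySQL query may happen to return rows in a convenient order that Postgres does not reproduce. A small sketch using the standard library's in-memory SQLite as a stand-in engine; the ~payment~ table is hypothetical.

#+BEGIN_SRC python
import sqlite3

# In-memory SQLite stands in for MySQL/Postgres: tied rows have no guaranteed order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO payment (id, user_id, created) VALUES (?, ?, ?)",
    [(1, 7, "2016-07-01"), (2, 7, "2016-07-01"), (3, 7, "2016-06-01")],
)

# Risky: rows 1 and 2 tie on created, so their relative order is up to the engine.
risky = conn.execute(
    "SELECT id FROM payment WHERE user_id = 7 ORDER BY created DESC"
).fetchall()

# Safe: a unique tiebreaker (the primary key) makes the ordering deterministic.
safe = conn.execute(
    "SELECT id FROM payment WHERE user_id = 7 ORDER BY created DESC, id DESC"
).fetchall()
print(safe)  # [(2,), (1,), (3,)]
#+END_SRC

In the Django ORM the same fix reads ~.order_by('-created', '-id')~.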
* Conclusions
#+BEGIN_QUOTE
Hofstadter's Database Migration Law:
Migrating from MySQL to Postgres always takes longer than you expect, even when you take into account Hofstadter's Law.
#+END_QUOTE
@andreacrotti https://www.iwoca.co.uk/