A demo on Database table indexes
- Host: GitHub
- URL: https://github.com/emahtab/database-table-indexes
- Owner: eMahtab
- Created: 2024-12-10T05:50:56.000Z
- Default Branch: main
- Last Pushed: 2024-12-11T15:51:05.000Z
- Topics: database-indexes, mysql
- Size: 729 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Database Table Indexes
## Database Setup :
The `test` database contains two tables, `users` and `messages`. The tables and their records are created by importing the [test_database.zip](https://github.com/eMahtab/mysql-test-dataset/blob/main/users-and-messages/test_database.zip) database dump.
It's a medium-sized database with 110 million records in total, so **importing the database dump will take a long time**.

### Dataset Size :
- **users table = 10 million records**
- **messages table = 100 million records**

#### Schema :
**The tables were originally created using the DDL statements below:**
The `users` table has `id` (the user id) as its primary key, and the `messages` table also has `id` (the message id) as its primary key.
The `messages` table has columns `sender_id` and `recipient_id`, which are foreign keys referencing the `id` column of the `users` table.
```sql
CREATE TABLE users (
    id BIGINT,
    name VARCHAR(50),
    username VARCHAR(30),
    PRIMARY KEY (id)
);

CREATE TABLE messages (
    id BIGINT,
    sender_id BIGINT,
    recipient_id BIGINT,
    message TEXT,
    created_at DATETIME NOT NULL,
    edited_at DATETIME DEFAULT NULL,
    deleted_at DATETIME DEFAULT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (sender_id) REFERENCES users(id),
    FOREIGN KEY (recipient_id) REFERENCES users(id)
);
```
## Indexes in users and messages tables:
Below is the output from **`SHOW INDEX FROM users`** and **`SHOW INDEX FROM messages`**

As we can see from the screenshot above, **the `users` table has one index, the primary index on the `id` column**,
and **the `messages` table has one primary index (on column `id`) and two secondary indexes (one on column `sender_id` and the other on column `recipient_id`)**.
## Primary and Secondary Index :
A primary index is always unique: it identifies each row in the table and enforces uniqueness on the primary key column(s). A table can have only one primary index.
A secondary index may or may not be unique. Secondary indexes are created to speed up queries that filter, join, or sort on columns other than the primary key. A table can have multiple secondary indexes.
If you look at the `CREATE TABLE` statement for the `messages` table above, we did not explicitly create any secondary indexes. MySQL (InnoDB) requires an index on foreign key columns and creates one automatically when none exists, so that referential integrity can be enforced **efficiently**: whenever a `users` row is deleted or its `id` changes, MySQL must quickly find the referencing rows in `messages`, either to reject the change (the default when no `ON DELETE`/`ON UPDATE` action is given) or to apply an action such as `ON DELETE CASCADE` or `ON UPDATE CASCADE`.
You can always check the indexes on a table, with more details, using the `SHOW INDEX FROM` command.
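As a sketch, the statements below show the same index information in two other forms, assuming the `test` schema from the setup above. `SHOW CREATE TABLE` prints the table definition as MySQL actually stores it, so the foreign key indexes that InnoDB added automatically appear as `KEY` entries even though the original DDL never declared them. The `information_schema.STATISTICS` query returns one row per indexed column.

```sql
-- Stored table definition, including indexes MySQL created implicitly
SHOW CREATE TABLE messages;

-- One row per (index, column) pair; NON_UNIQUE = 0 marks a unique index,
-- CARDINALITY is an estimate of the number of distinct values
SELECT INDEX_NAME,
       SEQ_IN_INDEX,
       COLUMN_NAME,
       NON_UNIQUE,
       CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'test'
  AND TABLE_NAME = 'messages'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```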
# Always Check your query : EXPLAIN ANALYZE
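`EXPLAIN ANALYZE` (available since MySQL 8.0.18) runs the statement and prints the chosen plan as a tree, with the optimizer's estimated cost next to the actual time and row count of each step. A minimal example, using the primary key query from the next section:

```sql
-- Executes the query and reports the plan with actual timings per step
EXPLAIN ANALYZE
SELECT * FROM messages WHERE id = 679802;
```

Because `EXPLAIN ANALYZE` really executes the query, it takes at least as long as the query itself.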
## Fast query : using the Primary index in messages table (index lookup)
```sql
select * from messages where id = 679802;
```
## Fast query : using the Secondary index in messages table (index lookup)
```sql
select count(*) from messages where sender_id = 6452;
```
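An InnoDB secondary index stores the indexed column together with the primary key, so a `count(*)` filtered only on `sender_id` can usually be answered from the `sender_id` index alone, without reading the full rows (a *covering index*). One way to check this, sketched with the traditional `EXPLAIN` output:

```sql
-- Expected (not guaranteed): type = ref, key = the sender_id index,
-- and Extra = "Using index", meaning the table rows were not read
EXPLAIN
SELECT count(*) FROM messages WHERE sender_id = 6452;
```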
## Fast query : using the Secondary index in messages table (index lookup and filter)
```sql
select * from messages
where sender_id = 12098 AND
created_at BETWEEN '2024-12-05 00:00:00' AND '2024-12-05 23:59:59';
```
## Fast query : using the Secondary index in messages table (index lookup and sort)
```sql
select * from messages where sender_id = 44887 order by created_at ASC;
```
## Fast query : using the Secondary index in messages table (index lookup and filter and sort)
```sql
select * from messages
where sender_id = 889000 AND
created_at BETWEEN '2024-12-01 00:00:00' AND '2024-12-07 23:59:59'
order by created_at ASC;
```
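The `sender_id` index narrows these queries down to one sender's messages, but MySQL still has to filter those rows on `created_at` and then sort them. A composite index whose second column is `created_at` would let MySQL read one sender's messages in date order as a range scan. The index name `idx_sender_created` below is hypothetical, and whether the index pays off depends on how many messages a typical sender has and on its extra write cost; comparing the `EXPLAIN ANALYZE` output before and after is the way to decide.

```sql
-- Hypothetical composite index: entries are ordered by sender_id, then created_at
CREATE INDEX idx_sender_created ON messages (sender_id, created_at);

-- With the composite index, the BETWEEN becomes a range on the index and
-- ORDER BY created_at can follow index order instead of a separate sort step
EXPLAIN ANALYZE
SELECT * FROM messages
WHERE sender_id = 889000
  AND created_at BETWEEN '2024-12-01 00:00:00' AND '2024-12-07 23:59:59'
ORDER BY created_at ASC;
```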
## Fast query : using multiple indexes in messages table
```sql
select * from messages where sender_id = 8 AND recipient_id = 859523;
```
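When a `WHERE` clause has equality conditions on two separately indexed columns, MySQL can use the *index merge intersection* access method: it looks up both indexes and intersects the matching primary keys. Whether it does this, or simply uses the more selective index and filters the remaining rows, is up to the optimizer, so it is worth checking the plan:

```sql
-- If index merge is chosen, type shows index_merge and Extra shows
-- "Using intersect(...)" with both index names; otherwise type is ref
EXPLAIN
SELECT * FROM messages WHERE sender_id = 8 AND recipient_id = 859523;
```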
## Slow/Very Slow query : Table Scan (queried column doesn't have index)
If you query on a column that doesn't have an index, MySQL has to scan the whole table, so the query will be slow, very slow, or take an unacceptable amount of time, depending on how many records the table has. A full table scan reads every row, which is costly on a large table and should be avoided.
One way to solve this issue is to create an index on the column we want to query.
```sql
select count(*) from messages
where created_at BETWEEN '2024-12-03 00:00:00' AND '2024-12-03 23:59:59';
```
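Before creating an index, plain `EXPLAIN` can confirm that the slow query is doing a full table scan. Unlike `EXPLAIN ANALYZE`, it only shows the planned access path and does not execute the query, so it returns immediately even when the query itself would take minutes.

```sql
-- type = ALL and key = NULL indicate a full table scan of messages
EXPLAIN
SELECT count(*) FROM messages
WHERE created_at BETWEEN '2024-12-03 00:00:00' AND '2024-12-03 23:59:59';
```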
# Table Indexes and Index Size
The three indexes on the `messages` table take up around 7.4 GB of space.
To get the total size of the indexes and of the actual data stored in a table, use the SQL query below:
```sql
SELECT TABLE_NAME,
DATA_LENGTH / (1024 * 1024 * 1024) AS DataSize_GB,
INDEX_LENGTH / (1024 * 1024 * 1024) AS IndexSize_GB,
(DATA_LENGTH + INDEX_LENGTH) / (1024 * 1024 * 1024) AS TotalSize_GB
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'messages';
```
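`INDEX_LENGTH` is a single total. For a per-index breakdown, InnoDB's persistent statistics table `mysql.innodb_index_stats` records each index's size in pages (`stat_name = 'size'`), which can be multiplied by the page size. For InnoDB the primary key is the clustered index, that is, the table data itself, so the `PRIMARY` row corresponds to `DATA_LENGTH`, while `INDEX_LENGTH` covers the secondary indexes. The figures are estimates, refreshed whenever statistics are recalculated (for example by `ANALYZE TABLE`).

```sql
-- Approximate size of each index on messages, in GB
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / (1024 * 1024 * 1024), 2) AS IndexSize_GB
FROM mysql.innodb_index_stats
WHERE database_name = 'test'
  AND table_name = 'messages'
  AND stat_name = 'size';
```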
# Creating index on created_at column
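A minimal sketch of the statement, assuming a hypothetical index name `idx_created_at`. InnoDB can add a secondary index as an online operation, and the `ALGORITHM = INPLACE, LOCK = NONE` clause asks MySQL to build it without blocking reads and writes on `messages` (the statement fails with an error rather than falling back to a blocking build). On 100 million rows the build itself still takes a while.

```sql
-- Hypothetical index name; builds a secondary index on created_at
ALTER TABLE messages
    ADD INDEX idx_created_at (created_at),
    ALGORITHM = INPLACE, LOCK = NONE;

-- Re-check the previously slow query; the plan should now use idx_created_at
EXPLAIN ANALYZE
SELECT count(*) FROM messages
WHERE created_at BETWEEN '2024-12-03 00:00:00' AND '2024-12-03 23:59:59';
```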

# Query performance improvement after creating index (from over 90 seconds to under 6 seconds)

# Indexing Overhead :
Indexing comes with a small cost: whenever a row is inserted or deleted, every index on the table needs to be updated to reflect the new state of the table.
Similarly, when an update changes an indexed column, that index needs to be updated too. So each additional index slows down write operations.
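Because every index adds to this write cost, it can be worth checking for indexes that no query uses. A sketch using the `sys.schema_unused_indexes` view from MySQL's `sys` schema, which relies on the Performance Schema and only reflects activity since the server last started:

```sql
-- Indexes in the test schema with no recorded reads since server startup
-- (the view already excludes PRIMARY keys)
SELECT object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'test';
```

An index listed here is not automatically safe to drop: the `sender_id` and `recipient_id` indexes, for example, back the foreign keys, and MySQL refuses to drop an index that a foreign key constraint needs.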