https://github.com/onpaws/babys-first-index

Baby's first database index from scratch

Host: GitHub
URL: https://github.com/onpaws/babys-first-index
Owner: onpaws
Created: 2023-02-27T05:32:29.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-02-27T18:36:23.000Z (over 1 year ago)
Last Synced: 2023-08-29T00:43:11.414Z (about 1 year ago)
Language: Python
Size: 97.7 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Indexing from scratch for fun
Having benefitted from `CREATE INDEX` over many years' worth of webapp building, I realized I was kind of conceptualizing it as free performance pixie dust and hadn't implemented an index from scratch before. So when I randomly saw this [comment](https://news.ycombinator.com/item?id=30604221)° on the Orange Site I made a note to actually try it on a weekend as a miniproject. That weekend finally arrived! 🎉

So basically [index-fun.py](./index-fun.py) is the result of me attempting to build baby's first index in Python, plus a binary search slightly modified to accommodate my ugly "index" 🤡 file.
The point was to take direct inspiration from how dead tree book indexes work - the index is literally just a text file with sorted lines of
`$word,$index` e.g. `foo,38`
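
For a rough idea of what that looks like, here's a minimal sketch. It is not the actual contents of `index-fun.py`; the function names (`build_index`, `load_index`, `binary_search`) are made up for illustration. It also assumes the source file has one word per line, that line numbers are 0-based, and that the whole index fits in memory.

```python
def build_index(words_path, index_path):
    """Write sorted `word,line_number` lines for a one-word-per-line file."""
    with open(words_path, encoding="utf-8") as src:
        # Assumption: one word per line; line numbers are 0-based.
        entries = [(line.strip(), lineno) for lineno, line in enumerate(src)]
    entries.sort()  # sort by word (then line number), like a book index
    with open(index_path, "w", encoding="utf-8") as out:
        for word, lineno in entries:
            out.write(f"{word},{lineno}\n")


def load_index(index_path):
    """Read the index file into a list of `word,line_number` strings."""
    with open(index_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def binary_search(index_lines, target):
    """Return the line number stored for `target`, or None if it is absent."""
    lo, hi = 0, len(index_lines) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        # Split on the last comma so the comparison only looks at the word.
        word, _, lineno = index_lines[mid].rpartition(",")
        if word == target:
            return int(lineno)
        if word < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

If a word appears more than once, this returns whichever occurrence the search lands on first, which is fine for a toy but is something a real index would have to decide on.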

Results:
```
tablescan found wiz @ 9971
tablescan runtime 0.0005388259887695312
binarySearch found wiz @ 9971
index runtime 0.0000209808
```

### TODO:
- [ ] Try something like this again but with btrees

° copy of the comment for convenience:

```
Take a large unordered text file and then create an ordered index in a separate file of word ==> line
write a brief binary search algo to search the index.
Compare searching the words with a "table scan" on the first file using grep, vs the binary search on the index.
You will find the table scan is O(n) and your binary search is roughly O(log n)
In 60 minutes you'll understand more about indexing than reading stack overflow.
```
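
To make the O(n) vs O(log n) bit a little more concrete without depending on wall-clock timings, here's a small sketch that counts steps instead. It works on in-memory lists rather than files and `grep`, and the helper names (`scan_steps`, `binary_search_steps`, `worst_case_probes`) are hypothetical, not from the repo.

```python
import math


def scan_steps(words, target):
    """'Table scan': count how many entries get looked at before a match."""
    for steps, word in enumerate(words, start=1):
        if word == target:
            return steps
    return len(words)  # not found: every entry was examined


def binary_search_steps(sorted_words, target):
    """Binary search on a sorted list: count how many midpoints get probed."""
    lo, hi, steps = 0, len(sorted_words) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_words[mid] == target:
            return steps
        if sorted_words[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


def worst_case_probes(n):
    """Upper bound on binary search probes for n >= 1 sorted entries."""
    return math.floor(math.log2(n)) + 1


def compare(n):
    """Return (scan steps, binary search steps, bound) for the last of n words."""
    words = [f"w{i:06d}" for i in range(n)]  # synthetic data, already sorted
    target = words[-1]  # worst case for the scan
    return (scan_steps(words, target),
            binary_search_steps(words, target),
            worst_case_probes(n))
```

Calling `compare` with growing `n` should show the scan count growing in step with `n` while the binary search count only creeps up by one or so each time `n` doubles.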