https://github.com/pushshift/binary_search
Example of a binary search implementation using real data (Reddit author info)
binarysearch python3 reddit search
- Host: GitHub
- URL: https://github.com/pushshift/binary_search
- Owner: pushshift
- Created: 2020-07-15T18:19:21.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-15T18:48:52.000Z (almost 6 years ago)
- Last Synced: 2025-03-01T06:44:27.929Z (over 1 year ago)
- Topics: binarysearch, python3, reddit, search
- Language: Python
- Homepage:
- Size: 1.95 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Binary Search implementation using Python
This repository is a bare-bones example of binary search in Python over sorted real-world data (Reddit authors). Each record is 44 bytes and consists of three fields: the author name, the author id, and the author creation time (epoch seconds).
- The first field is the author name and is 32 bytes in size. Names shorter than 32 bytes are null-padded.
- The second field is the author id, an 8-byte integer.
- The third field is the author creation time (in epoch seconds), a 4-byte integer.
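
The layout above maps naturally onto Python's struct module. The following is a minimal sketch, not the repository's code: the README does not state byte order, signedness, or text encoding, so little-endian unsigned integers and UTF-8 names are assumed here, and `pack_record` / `unpack_record` are hypothetical helper names.

```python
import struct

# Assumed layout: little-endian, unsigned integers.
# 32-byte name + 8-byte id + 4-byte creation time = 44 bytes.
RECORD = struct.Struct("<32sQI")


def pack_record(author: str, author_id: int, created_utc: int) -> bytes:
    """Encode one record; the "32s" field is null-padded by struct."""
    return RECORD.pack(author.encode("utf-8"), author_id, created_utc)


def unpack_record(raw: bytes):
    """Decode one 44-byte record and strip the null padding from the name."""
    name, author_id, created_utc = RECORD.unpack(raw)
    return name.rstrip(b"\x00").decode("utf-8"), author_id, created_utc
```

Because every record has the same size, record number `n` always starts at byte offset `n * 44`, which is what makes a binary search directly on the file possible.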
There are two methods in binary_search.py.
- The search_record method returns the record's position if a match is found, or False if no match is found.
- The fetch_record method fetches a record from authors.dat.
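
The two methods could be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in binary_search.py: the function signatures are guesses, the file is assumed to be sorted in ascending byte order of the author name, and the record layout is assumed to be little-endian with unsigned integers.

```python
import io
import struct

RECORD = struct.Struct("<32sQI")  # assumed byte order and signedness


def fetch_record(f, position):
    """Read the record at a zero-based position from an open binary file."""
    f.seek(position * RECORD.size)
    name, author_id, created_utc = RECORD.unpack(f.read(RECORD.size))
    return name.rstrip(b"\x00"), author_id, created_utc


def search_record(f, author):
    """Return the position of `author` (bytes), or False if it is absent."""
    f.seek(0, io.SEEK_END)
    low, high = 0, f.tell() // RECORD.size - 1
    while low <= high:
        mid = (low + high) // 2
        name = fetch_record(f, mid)[0]
        if name == author:
            return mid
        if name < author:
            low = mid + 1   # target sorts after the midpoint record
        else:
            high = mid - 1  # target sorts before the midpoint record
    return False
```

Since position 0 is a valid result and `0 == False` in Python, callers of a function with this return convention should test the result with `is False` rather than with a plain truthiness check.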
This code could be further optimized by including a cache for the search and fetch methods.
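
One way such a cache might look is shown below. Every binary search probes the same midpoint records first, so memoizing reads by position avoids repeated disk seeks for those records. This is only a sketch: `make_cached_fetch` is a hypothetical helper, and the record layout is again assumed to be little-endian with unsigned integers.

```python
import struct
from functools import lru_cache

RECORD = struct.Struct("<32sQI")  # assumed byte order and signedness


def make_cached_fetch(f, maxsize=1024):
    """Return a fetch function that memoizes raw records by position."""

    @lru_cache(maxsize=maxsize)
    def fetch(position):
        f.seek(position * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

    return fetch
```

The returned function exposes `cache_info()` from `functools.lru_cache`, which can be used to check how many reads were served from memory.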
To run the script, download the compressed authors file from https://files.pushshift.io/reddit/authors.dat.zst, decompress it to authors.dat, and place it in the same directory as the binary_search.py script.
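
After decompressing, a quick sanity check can confirm the file looks complete. The sketch below assumes the file contains only 44-byte records with no header or trailer, which the README implies but does not state outright; `count_records` is a hypothetical helper name.

```python
import os

RECORD_SIZE = 44  # bytes per record, as described above


def count_records(path):
    """Return the number of records, or raise if the size is not a multiple of 44."""
    size = os.path.getsize(path)
    if size % RECORD_SIZE:
        raise ValueError(
            f"{path}: {size} bytes is not a multiple of {RECORD_SIZE}"
        )
    return size // RECORD_SIZE
```

Calling `count_records("authors.dat")` from the script's directory would then give the upper bound for the binary search.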