https://github.com/mach-kernel/suggest
A stab at implementing a Google-style typeahead suggestion from a dump of previous queries.
https://github.com/mach-kernel/suggest
Last synced: 4 months ago
JSON representation
A stab at implementing a Google-style typeahead suggestion from a dump of previous queries.
- Host: GitHub
- URL: https://github.com/mach-kernel/suggest
- Owner: mach-kernel
- License: gpl-3.0
- Created: 2020-01-12T01:15:58.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-01-13T02:47:54.000Z (over 5 years ago)
- Last Synced: 2025-01-08T22:05:26.090Z (6 months ago)
- Language: Java
- Size: 32.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# suggest
A stab at implementing a Google-style typeahead suggestion from a dump of previous queries. Written in pure Java 8 with no external dependencies.
## Getting Started
```bash
wget http://www.cim.mcgill.ca/~dudek/206/Logs/AOL-user-ct-collection/aol-data.tar.gz
tar -xvf aol-data.tar.gz
gunzip AOL-user-ct-collection/user-ct-test-collection-01.txt.gz
# Skip the first line, and take the
tail -n +2 AOL-user-ct-collection/user-ct-test-collection-01.txt | cut -f2 > justquery.txt
gradle --console plain run
``````
suggest> loadfile justquery.txt
suggest> suggest food
347 - food network
218 - foodnetwork.com
198 - foodnetwork
115 - foodtv
69 - foodtv.com
68 - food network.com
61 - food jokes
55 - food tv
41 - food channel
39 - food stamps
```## Data
To keep things easy, loading data is limited to reading newline delimited text files. It is surprisingly hard to find these kinds of datasets. The list of data used to test this program are listed below:
- [Web Search Query Logs](https://jeffhuang.com/search_query_logs.html)
- The AOL mirror is the only live one## Methodology
- Google defaults to showing 10 suggestions, so we will do the same.
- The rank of the suggestion will be determined by the frequency of each unique query.