https://github.com/pdoup/atd
Task for the Advanced Topics in Databases course - DWS MSc Spring '22
databases nlp postgresql search-engine
- Host: GitHub
- URL: https://github.com/pdoup/atd
- Owner: pdoup
- Created: 2022-04-25T17:30:12.000Z
- Default Branch: master
- Last Pushed: 2022-06-13T10:13:27.000Z
- Last Synced: 2023-03-08T21:56:53.736Z
- Topics: databases, nlp, postgresql, search-engine
- Language: Python
- Homepage:
- Size: 3.08 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ATD
Task for the ATD course - DWS MSc Spring 2022
### TODO
- [X] Create a crawler to fetch articles and save them in CSV files
- [X] Add them to Postgres
- [X] Connect Postgres with Python using a connector (psycopg3)
- [X] Read credentials from config file
- [X] Create the output directory in `extract_body.py` if it does not exist
- [X] Fix `article_path.csv`
- [X] Add a relevance threshold for returned documents in `text_query.py`
- [X] Add an option to choose which columns to show in `text_query.py`
- [X] Show lines that have keywords (grep maybe?)
- [X] Add `requirements.txt`
- [X] Fix `text_query.py`:301: check whether the list is empty
- [X] Move `links.csv` to `csv_files`
- [X] ~~Add show vector in `text_query.py` output~~
- [X] Use a GIN index on the `docvec` column
- [X] Displaying `docvec` is troublesome in the terminal
- [X] Add comments
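
Several items above (reading credentials from a config file, connecting with psycopg3, the GIN index on `docvec`, and the relevance threshold) fit together roughly as in the sketch below. This is only an illustration under stated assumptions, not the code in `text_query.py`: the table name `articles`, the columns `id` and `title`, the `[postgresql]` config section, the `english` text search configuration, and the default threshold are all hypothetical. Only the `docvec` column name comes from the list above.

```python
import configparser

# Hypothetical table name; the README only confirms the `docvec` column.
TABLE = "articles"

# A GIN index speeds up `@@` matches against a tsvector column.
CREATE_INDEX_SQL = (
    f"CREATE INDEX IF NOT EXISTS {TABLE}_docvec_gin "
    f"ON {TABLE} USING GIN (docvec);"
)

# Rank matching rows and keep only those at or above a threshold.
# The three %s placeholders are: query text, threshold, row limit.
SEARCH_SQL = f"""
SELECT id, title, ts_rank(docvec, query) AS rank
FROM {TABLE}, plainto_tsquery('english', %s) AS query
WHERE docvec @@ query
  AND ts_rank(docvec, query) >= %s
ORDER BY rank DESC
LIMIT %s;
"""


def read_credentials(path, section="postgresql"):
    """Return the connection settings stored in an INI-style config file."""
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(f"config file not found: {path}")
    return dict(parser[section])


def search(credentials, text, threshold=0.05, limit=10):
    """Run a thresholded full-text query and return the matching rows."""
    # Imported here so the helpers above stay usable without psycopg installed.
    import psycopg  # psycopg 3

    with psycopg.connect(**credentials) as conn:
        conn.execute(CREATE_INDEX_SQL)
        return conn.execute(SEARCH_SQL, (text, threshold, limit)).fetchall()
```

Assuming a config file with `host`, `dbname`, `user`, and `password` keys under `[postgresql]`, a call such as `search(read_credentials("database.ini"), "climate policy")` would return `(id, title, rank)` tuples ordered by rank. Passing the query text and threshold as parameters rather than formatting them into the SQL string keeps user input out of the statement itself.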