Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vdoninav/habr_parsing_vacancies
Parsing vacancies from habr
bag-of-words bow habr natasha neural-linguistic-processing nlp parsing python python3 text-processing
- Host: GitHub
- URL: https://github.com/vdoninav/habr_parsing_vacancies
- Owner: vdoninav
- License: mit
- Created: 2022-10-15T13:02:57.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2022-11-22T23:03:12.000Z (about 2 years ago)
- Last Synced: 2024-01-14T12:04:39.464Z (12 months ago)
- Topics: bag-of-words, bow, habr, natasha, neural-linguistic-processing, nlp, parsing, python, python3, text-processing
- Language: Python
- Homepage:
- Size: 3.04 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# habr_parsing_vacancies
This project parses vacancies from Habr into a raw JSON file, then processes them with natural language processing (NLP) and a bag-of-words (BoW) model.
It applies the same processing to text entered by the user, compares the result with the stored data, and outputs the best-matching vacancies ranked by euclidean_similarity.
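The matching step described above can be sketched as follows. This is a minimal illustration, not the code from this repository: the helper names `bow_vector` and `best_matches`, the sample data, and the formula `1 / (1 + distance)` used for `euclidean_similarity` are all assumptions, since the README does not say how the similarity is computed.

```python
import math
from collections import Counter


def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; 1.0 means identical vectors.

    Assumption: similarity = 1 / (1 + distance). The project may use
    a different formula.
    """
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def best_matches(query_tokens, vacancies, vocab, top_n=3):
    """Return the ids of the top_n vacancies most similar to the query."""
    query_vec = bow_vector(query_tokens, vocab)
    scored = [
        (euclidean_similarity(query_vec, bow_vector(tokens, vocab)), vacancy_id)
        for vacancy_id, tokens in vacancies.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vacancy_id for _, vacancy_id in scored[:top_n]]


# Hypothetical, already-tokenized sample data.
vocab = ["python", "sql", "java", "docker"]
vacancies = {
    "1": ["python", "sql", "docker"],
    "2": ["java", "sql"],
    "3": ["python", "docker"],
}

print(best_matches(["python", "docker"], vacancies, vocab, top_n=2))
```

Raw counts favor vacancies whose length is close to the query's, so a real implementation might normalize the vectors first.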
# components: purpose and launch order
1. parser_writer.py - parses vacancies and writes the raw data to a JSON file
2. formatter.py - reformats the raw data into data_edited.json, shaped like {'1':{},'2':{},...}
3. nlp_handler.py - performs NLP processing; takes data_edited.json | outputs vocab.json and data_nlp.json
4. bow_handler.py - performs BoW processing; takes data_nlp.json and vocab.json | outputs data_bow.json
5. comparator.py - compares the entered text with the parsed vacancies; takes vocab.json, data_bow.json, data_edited.json | outputs simplified vacancy cards in the terminal
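The launch order above could be automated with a small runner. This is a sketch under assumptions: the README does not say whether the scripts take command-line arguments or which directory they expect to run from, so it assumes each one runs without arguments from the repository root. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Scripts in the launch order given in the README.
PIPELINE = [
    "parser_writer.py",
    "formatter.py",
    "nlp_handler.py",
    "bow_handler.py",
    "comparator.py",
]


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Build one command per script, preserving the launch order."""
    return [[python, script] for script in scripts]


def run_pipeline(commands):
    """Run each command in turn; stop at the first failing step."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on missing or stale JSON files.
        subprocess.run(command, check=True)
```

Calling `run_pipeline(build_commands())` from the repository root would run all five steps in order. Because each step reads the files written by the previous one, stopping on the first failure is the safer default.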