https://github.com/evilfreelancer/hh-dataset-generator
Vacancies/Keywords dataset generator built from vacancy details crawled from the HH.ru API
- Host: GitHub
- URL: https://github.com/evilfreelancer/hh-dataset-generator
- Owner: EvilFreelancer
- Created: 2022-11-10T19:13:25.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-11-12T07:07:58.000Z (about 3 years ago)
- Last Synced: 2025-02-09T15:15:41.113Z (11 months ago)
- Language: Python
- Size: 10.7 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Vacancies/Keywords DataSet generator from HH.ru
A collection of simple scripts that crawl vacancies from the HH.ru site
via its API and generate a CSV file from fields such as name,
description and key skills.
The generated CSV file has the following format:
```csv
"$name1 & $description1","key skills1"
"$name2 & $description2","key skills2"
"$name3 & $description3","key skills3"
...
```
The scripts were tested on Python 3.10 but should work on earlier versions too.
## How to use
Clone this repo.
Install all requirements:
```shell
pip install -r requirements.txt
```
### Get pages
Set the `text` variable in `download.py` to your own search query:
```python
text = 'NAME:Data science'
```
Then run the script:
```shell
python3 download.py
```
This script downloads the search results from the API and saves them to the
`./docs/pagination` folder in JSON format.
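The paging step can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `download.py`: the function names `fetch_page` and `download_pages`, the `<page>.json` file naming, the `max_pages` limit, and the assumption that the response carries a `pages` counter are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumed search endpoint of the HH.ru API.
API_URL = "https://api.hh.ru/vacancies"


def fetch_page(text, page, per_page=100):
    """Fetch one page of search results (performs a real HTTP request)."""
    query = urllib.parse.urlencode(
        {"text": text, "page": page, "per_page": per_page}
    )
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "hh-dataset-sketch"},  # descriptive client name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_pages(text, out_dir, fetch=fetch_page, max_pages=20):
    """Save every result page as <out_dir>/<page>.json and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    page = 0
    while page < max_pages:
        data = fetch(text, page)
        path = out_dir / f"{page}.json"
        path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
        saved.append(path)
        page += 1
        # Assumption: the response reports the total page count in "pages".
        if page >= data.get("pages", 0):
            break
    return saved


# Example call (hits the network):
# download_pages("NAME:Data science", "./docs/pagination")
```

Passing `fetch` as a parameter keeps the paging loop separate from the HTTP call, so the loop can be tried with a stub before sending real requests.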
### Get details about vacancies
Next, download the extended details for each vacancy:
```shell
python3 parse.py
```
The script calls the API and saves the responses to the `./docs/vacancies` folder.
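A rough sketch of this step, under the assumption that each saved page holds an `items` list whose entries have an `id`, and that details are available at `https://api.hh.ru/vacancies/<id>`. The helper names (`fetch_vacancy`, `collect_ids`, `download_vacancies`) and the `<id>.json` file naming are hypothetical and may differ from what `parse.py` does.

```python
import json
import time
import urllib.request
from pathlib import Path


def fetch_vacancy(vacancy_id):
    """Fetch the full record for one vacancy (performs a real HTTP request)."""
    request = urllib.request.Request(
        f"https://api.hh.ru/vacancies/{vacancy_id}",  # assumed detail endpoint
        headers={"User-Agent": "hh-dataset-sketch"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_ids(pagination_dir):
    """Read every saved result page and return the vacancy ids it lists."""
    ids = []
    for path in sorted(Path(pagination_dir).glob("*.json")):
        page = json.loads(path.read_text(encoding="utf-8"))
        ids.extend(item["id"] for item in page.get("items", []))
    return ids


def download_vacancies(pagination_dir, out_dir, fetch=fetch_vacancy, delay=0.0):
    """Save each vacancy as <out_dir>/<id>.json, skipping files that exist."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for vacancy_id in collect_ids(pagination_dir):
        path = out_dir / f"{vacancy_id}.json"
        if not path.exists():
            details = fetch(vacancy_id)
            path.write_text(
                json.dumps(details, ensure_ascii=False), encoding="utf-8"
            )
            time.sleep(delay)  # optional pause between requests
        saved.append(path)
    return saved


# Example call (hits the network):
# download_vacancies("./docs/pagination", "./docs/vacancies", delay=0.5)
```

Skipping files that already exist makes the step safe to re-run after an interruption without repeating requests.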
### Generate CSV
```shell
python3 generate.py
```
The result is saved to the `./docs/csv` folder.
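The conversion to CSV might look like the sketch below. It assumes each vacancy record has a `name`, an HTML `description`, and a `key_skills` list of objects with a `name`. The helpers `strip_html`, `vacancy_to_row` and `write_csv` are hypothetical, and joining skills with a comma is an assumption, since the separator used by `generate.py` is not stated here.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def vacancy_to_row(vacancy):
    """Build the two CSV columns: "name & description" and key skills."""
    description = strip_html(vacancy.get("description") or "")
    text = f'{vacancy["name"]} & {description}'
    # Assumption: skills are joined with ", ".
    skills = ", ".join(skill["name"] for skill in vacancy.get("key_skills", []))
    return [text, skills]


def write_csv(vacancies, stream):
    """Write one fully quoted row per vacancy to an open text stream."""
    writer = csv.writer(stream, quoting=csv.QUOTE_ALL)
    for vacancy in vacancies:
        writer.writerow(vacancy_to_row(vacancy))


sample = {
    "name": "Data Scientist",
    "description": "<p>Build <b>models</b></p><p>Deploy &amp; monitor</p>",
    "key_skills": [{"name": "Python"}, {"name": "SQL"}],
}
buffer = io.StringIO()
write_csv([sample], buffer)
print(buffer.getvalue(), end="")
```

`csv.QUOTE_ALL` matches the fully quoted layout shown in the format example, and the standard-library `HTMLParser` avoids an extra dependency for tag stripping.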
## Links
* https://office-menu.ru/python/96-api-hh
* https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python
* https://pythonspot.com/files-spreadsheets-csv/