{"id":15198938,"url":"https://github.com/alextkdev/resume_parsing","last_synced_at":"2026-02-20T01:01:43.088Z","repository":{"id":254996376,"uuid":"848208643","full_name":"AlexTkDev/resume_parsing","owner":"AlexTkDev","description":"Solution on Python that allows parsing and sorting of resumes from popular job websites.","archived":false,"fork":false,"pushed_at":"2024-09-18T11:19:29.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-11T18:22:20.821Z","etag":null,"topics":["beautifulsoup4","ci-cd","flake8","parser","parsing","python3","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlexTkDev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-27T10:39:39.000Z","updated_at":"2024-09-18T11:19:32.000Z","dependencies_parsed_at":"2024-09-05T16:53:32.545Z","dependency_job_id":"6ad3baf3-11be-4878-8cf9-af94a9c05c5f","html_url":"https://github.com/AlexTkDev/resume_parsing","commit_stats":{"total_commits":53,"total_committers":2,"mean_commits":26.5,"dds":0.05660377358490565,"last_synced_commit":"22e140142940afd540c04617f8866d8a7d799491"},"previous_names":["alextkdev/resume_parsing_and_telegram_bot","alextkdev/resume_parsing"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexTkDev%2Fresume_parsing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexTkDev%2Fresume_parsing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexTkDev%2Fresume_parsing/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlexTkDev%2Fresume_parsing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlexTkDev","download_url":"https://codeload.github.com/AlexTkDev/resume_parsing/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219859642,"owners_count":16556035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","ci-cd","flake8","parser","parsing","python3","selenium"],"created_at":"2024-09-28T02:00:23.270Z","updated_at":"2025-10-28T11:32:01.148Z","avatar_url":"https://github.com/AlexTkDev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## :technologist: Solution on Python that allows parsing and sorting of resumes from popular job websites.\n\n \n### :card_file_box: Installation\n1. **Clone the repository:**\n   ```bash\n   git clone https://github.com/AlexTkDev/resume_parsing.git\n   cd resume_parsing\n   ```\n\n2. **Create and activate a virtual environment:**\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # For Windows use `venv\\Scripts\\activate`\n   ```\n\n3. **Install dependencies:**\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n### 1. Script for Parsing Resumes from robota.ua\n**Description:**\nThis script parses resumes from the [robota.ua](https://robota.ua) website. Selenium is used to automate the Chrome browser, which runs in headless mode (without displaying the GUI). The script iterates through resume pages and saves the data into a JSON file.\n\n**Key Functions:**\n- `setup_selenium()`: Configures the Selenium WebDriver to work with Chrome in headless mode.\n- `fetch_resumes(url, driver)`: Opens the page at the given URL, locates the resume elements, and extracts information such as title, link, name, details, and publication time.\n- `save_to_json(data, filename)`: Saves the data into a JSON file. If the file already exists, the data is appended to it.\n- `main(pages, skill)`: The main function that manages the parsing process. It iterates through the specified pages, extracts resumes, and saves them to the `resumes_robota_ua.json` file.\n\n**Example Execution:**\n```bash\n  python robota_ua/get_resume.py --pages 2 --skill python\n```\n\n---\n\n### 2. Script for Parsing Resumes from work.ua\n**Description:**\nThis script is designed to parse resumes from the [work.ua](https://work.ua) website using the BeautifulSoup library for HTML parsing. The script extracts resume information and saves it into a JSON file.\n\n**Key Functions:**\n- `fetch_resumes(url)`: Sends an HTTP request to the resume page and extracts information such as title, link, name, details, and publication time.\n- `save_to_json(data, filename)`: Saves the data into a JSON file. If the file already exists, the data is appended to it.\n- `main(pages, skill)`: The main function that iterates through the specified pages, extracts resumes, and saves them to the `resumes_work_ua.json` file.\n\n**Example Execution:**\n```bash\n  python work_ua/get_resume.py --pages 2 --skill python\n```\n\n---\n\n### 3. Script for Fetching HTML Pages Using Selenium\n**Description:**\nThis script is used to load the HTML content of a web page using Selenium and save it to a file. It is suitable for situations where you need to obtain the full HTML of a page, including dynamically loaded elements.\n\n**Key Functions:**\n- `get_data_by_selenium(url)`: Opens the page at the given URL using Selenium WebDriver, waits for all elements to load, and returns the HTML content of the page.\n- `save_html_to_file(html_content, file_path)`: Saves the provided HTML content to the specified file.\n\n**Example Execution:**\n```python\n    url = \"https://robota.ua/candidates/all/ukraine\"\n    html_data = get_data_by_selenium(url)\n    save_html_to_file(html_data, \"page_content.html\")\n    print(\"HTML saved to 'page_content.html'\")\n```\n\n---\n\n### 4. Script for Parsing and Saving Resumes from work.ua by Links in JSON File\n**Description:**\nThis script extracts resume data from URLs listed in a JSON file, formats it, and saves it to \ntext files. It uses the `requests` library to perform HTTP requests and `BeautifulSoup` to parse \nthe HTML content of the resume pages. Each resume is saved as a `.txt` file in a designated \ndirectory.\n\n**Key Functions:**\n- `get_user_links(file)`: Extracts all values of the 'link' key from the provided JSON file.\n- `clean_text(text)`: Cleans the text by removing unnecessary spaces and newline characters.\n- `get_separate_resume(url)`: Sends an HTTP request to the resume page URL, extracts and formats the resume data such as title, name, and details.\n- `save_to_txt(data, filename)`: Saves the extracted resume data to a text file.\n- `main(file)`: The main function that processes each link extracted from the JSON file. It creates a directory for saving resumes (if it doesn't already exist) and saves each resume in a `.txt` file named using the user ID extracted from the URL.\n\n**Example of running the script:**\n```bash\n  python work_ua/get_separate_resume.py --file resumes_work_ua.json\n```\n\n---\n\n### 5. Script for Parsing Resumes from robota.ua Using Selenium\n**Description:**\nThis script extracts resume data from [robota.ua](https://robota.ua) using Selenium. \nIt reads candidate links and names from a JSON file, navigates to each resume page,\nand extracts information such as experience, skills, education, and languages. The extracted data\nis saved in a text file.\n\n**Key Functions:**\n- `setup_selenium()`: Configures the Selenium WebDriver to use Chrome in headless mode.\n- `get_user_data(file)`: Extracts candidate links and names from a JSON file.\n- `clean_text(text)`: Cleans text by removing unnecessary spaces and newline characters.\n- `get_separate_resume(driver, url)`: Extracts resume information from the given URL using Selenium, including experience, skills, education, and languages.\n- `save_to_txt(data, filename)`: Saves the extracted resume data to a text file.\n- `main(file)`: The main function that processes the JSON file with candidate data, extracts resumes, and saves them to text files.\n\n**Example of running the script:**\n```bash\n  python robota_ua/get_separate_resume.py --file resumes_robota_ua.json\n```\n\n**Notes:**\n- The `resumes_robota_ua.json` file should contain the candidate links and names.\n- The script saves the resumes in the `ready-made_resumes` directory, creating the directory \nif it doesn't exist.\n\n---\n\n### 6. Resume Scoring and Sorting Script\n**Description:**\nThis script scores and sorts resumes saved in text files based on their content. \nIt supports resumes from [robota.ua](https://robota.ua) and [work.ua](https://www.work.ua). \nThe scoring is based on resume completeness, keywords, work experience, education, \nand additional criteria. The script sorts resumes by score in descending order and saves \nthe results to a text file.\n\n**Key Functions:**\n- `score_resume(resume_text)`: Scores the resume based on its content, including resume sections, keywords, work experience, education, and additional criteria.\n- `extract_experience_years(resume_text)`: Extracts the total number of years of experience from the resume text.\n- `load_resumes(resume_folder)`: Loads all resumes from the specified folder.\n- `sort_candidates_by_relevance(resumes)`: Sorts resumes by score in descending order based on the evaluation.\n- `main(resume_folder)`: The main function that loads resumes, scores and sorts them, and then saves the results to `sorted_candidates.txt`.\n\n**Example of running the script:**\n```bash\n  python sorting_resume/sorting_resume.py --directory ready-made_resumes\n```\n\n**Notes:**\n- The script expects resumes to be located in the folder specified by the `--directory` argument.\n- Results will be saved in `sorted_candidates.txt`, containing the resume file name, score, and path to the resume.\n\n\n### 7. Collaborate\n- If you have any suggestions or improvements, please feel free to contribute.\n- Fork this repository.\n- Create a new branch with a meaningful name.\n- Open a pull request.\n- Your changes will be reviewed and merged.\n- Thank you for your contribution!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextkdev%2Fresume_parsing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falextkdev%2Fresume_parsing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falextkdev%2Fresume_parsing/lists"}