{"id":13677292,"url":"https://github.com/OmkarPathak/pyresparser","last_synced_at":"2025-04-29T11:30:31.652Z","repository":{"id":38713043,"uuid":"194691524","full_name":"OmkarPathak/pyresparser","owner":"OmkarPathak","description":"A simple resume parser used for extracting information from resumes","archived":false,"fork":false,"pushed_at":"2023-09-13T06:54:26.000Z","size":6045,"stargazers_count":811,"open_issues_count":46,"forks_count":412,"subscribers_count":17,"default_branch":"master","last_synced_at":"2024-11-05T05:33:21.863Z","etag":null,"topics":["extract","extracting-data","machine-learning","natural-language-processing","nlp","parser","parsers","pyresparser","python","python3","resume","resume-parser","resumes","skills"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OmkarPathak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":["pyresparser"]}},"created_at":"2019-07-01T14:43:02.000Z","updated_at":"2024-10-26T05:35:16.000Z","dependencies_parsed_at":"2023-02-06T06:46:25.993Z","dependency_job_id":"7ac8def5-e8f0-4adb-a838-f35fa7e77fa6","html_url":"https://github.com/OmkarPathak/pyresparser","commit_stats":{"total_commits":65,"total_committers":3,"mean_commits":"21.666666666666668","dds":0.0461538461538461,"last_synced_commit":"a66f25b583f2dd8dbd18f419321eed57b04a006e"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmkarPathak%2Fpyresparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/OmkarPathak%2Fpyresparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmkarPathak%2Fpyresparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmkarPathak%2Fpyresparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OmkarPathak","download_url":"https://codeload.github.com/OmkarPathak/pyresparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224163503,"owners_count":17266513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extract","extracting-data","machine-learning","natural-language-processing","nlp","parser","parsers","pyresparser","python","python3","resume","resume-parser","resumes","skills"],"created_at":"2024-08-02T13:00:39.969Z","updated_at":"2024-11-11T19:31:25.016Z","avatar_url":"https://github.com/OmkarPathak.png","language":"Python","funding_links":["https://github.com/sponsors/pyresparser","https://paypal.me/omkarpathak27"],"categories":["Python"],"sub_categories":[],"readme":"# pyresparser\n\n```\nA simple resume parser used for extracting information from resumes\n```\n\nBuilt with ❤︎ and :coffee: by  [Omkar Pathak](https://github.com/OmkarPathak)\n\n---\n\n[![GitHub 
stars](https://img.shields.io/github/stars/OmkarPathak/pyresparser.svg)](https://github.com/OmkarPathak/pyresparser/stargazers)\n[![PyPI](https://img.shields.io/pypi/v/pyresparser.svg)](https://pypi.org/project/pyresparser/)\n[![Downloads](https://pepy.tech/badge/pyresparser)](https://pepy.tech/project/pyresparser)\n[![GitHub](https://img.shields.io/github/license/omkarpathak/pyresparser.svg)](https://github.com/OmkarPathak/pyresparser/blob/master/LICENSE) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/Django.svg) [![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-:D-1EAEDB.svg)](https://saythanks.io/to/omkarpathak27@gmail.com)\n[![Build Status](https://travis-ci.com/OmkarPathak/pyresparser.svg?branch=master)](https://travis-ci.com/OmkarPathak/pyresparser)\n[![codecov](https://codecov.io/gh/OmkarPathak/pyresparser/branch/master/graph/badge.svg)](https://codecov.io/gh/OmkarPathak/pyresparser)\n\n# Features\n\n- Extract name\n- Extract email\n- Extract mobile numbers\n- Extract skills\n- Extract total experience\n- Extract college name\n- Extract degree\n- Extract designation\n- Extract company names\n\n# Installation\n\n- You can install this package using\n\n```bash\npip install pyresparser\n```\n\n- For NLP operations we use spacy and nltk. 
Install them using the commands below:\n\n```bash\n# spaCy\npython -m spacy download en_core_web_sm\n\n# nltk\npython -m nltk.downloader words\npython -m nltk.downloader stopwords\n```\n\n# Documentation\n\nOfficial documentation is available at: https://www.omkarpathak.in/pyresparser/\n\n# Supported File Formats\n\n- PDF and DOCX files are supported on all operating systems\n- If you want to extract DOC files, you can install [textract](https://textract.readthedocs.io/en/stable/installation.html) for your OS (Linux, macOS)\n- Note: You only need to install textract (and nothing else) for DOC files to be parsed\n\n# Usage\n\n- Import it in your Python project\n\n```python\nfrom pyresparser import ResumeParser\ndata = ResumeParser('/path/to/resume/file').get_extracted_data()\n```\n\n# CLI\n\nYou can also run the resume extractor using the provided `cli`\n\n```bash\nusage: pyresparser [-h] [-f FILE] [-d DIRECTORY] [-r REMOTEFILE]\n                   [-re CUSTOM_REGEX] [-sf SKILLSFILE] [-e EXPORT_FORMAT]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -f FILE, --file FILE  resume file to be extracted\n  -d DIRECTORY, --directory DIRECTORY\n                        directory containing all the resumes to be extracted\n  -r REMOTEFILE, --remotefile REMOTEFILE\n                        remote path for resume file to be extracted\n  -re CUSTOM_REGEX, --custom-regex CUSTOM_REGEX\n                        custom regex for parsing mobile numbers\n  -sf SKILLSFILE, --skillsfile SKILLSFILE\n                        custom skills CSV file against which skills are\n                        searched for\n  -e EXPORT_FORMAT, --export-format EXPORT_FORMAT\n                        the information export format (json)\n```\n\n# Notes:\n\n- If you are running the app on Windows, you can only extract .docx and .pdf files\n\n# Result\n\nThe module returns a list of dictionary objects, with results as follows:\n\n```\n[\n  {\n    
'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],\n    'company_names': None,\n    'degree': ['B.E. IN COMPUTER ENGINEERING'],\n    'designation': ['Manager',\n                    'TECHNICAL CONTENT WRITER',\n                    'DATA ENGINEER'],\n    'email': 'omkarpathak27@gmail.com',\n    'mobile_number': '8087996634',\n    'name': 'Omkar Pathak',\n    'no_of_pages': 3,\n    'skills': ['Operating systems',\n              'Linux',\n              'Github',\n              'Testing',\n              'Content',\n              'Automation',\n              'Python',\n              'Css',\n              'Website',\n              'Django',\n              'Opencv',\n              'Programming',\n              'C',\n              ...],\n    'total_experience': 1.83\n  }\n]\n```\n\n# References that helped me get here\n\n- Some of the core concepts behind the algorithm have been taken from [https://github.com/divapriya/Language_Processing](https://github.com/divapriya/Language_Processing), which is summarized in this blog post: [https://medium.com/@divalicious.priya/information-extraction-from-cv-acec216c3f48](https://medium.com/@divalicious.priya/information-extraction-from-cv-acec216c3f48). Thanks to Priya for sharing this concept\n\n- [https://www.kaggle.com/nirant/hitchhiker-s-guide-to-nlp-in-spacy](https://www.kaggle.com/nirant/hitchhiker-s-guide-to-nlp-in-spacy)\n\n- [https://www.analyticsvidhya.com/blog/2017/04/natural-language-processing-made-easy-using-spacy-%E2%80%8Bin-python/](https://www.analyticsvidhya.com/blog/2017/04/natural-language-processing-made-easy-using-spacy-%E2%80%8Bin-python/)\n\n- **Special thanks** to dataturks for their [annotated dataset](https://dataturks.com/blog/named-entity-recognition-in-resumes.php)\n\n# Donation\n\nIf you have found my software to be of any use to you, do consider helping me pay my internet bills. 
This would encourage me to create more such software :smile:\n\n| PayPal | \u003ca href=\"https://paypal.me/omkarpathak27\" target=\"_blank\"\u003e\u003cimg src=\"https://www.paypalobjects.com/webstatic/mktg/logo/AM_mc_vs_dc_ae.jpg\" alt=\"Donate via PayPal!\" title=\"Donate via PayPal!\" /\u003e\u003c/a\u003e |\n|:-------------------------------------------:|:-------------------------------------------------------------:|\n| ₹ (INR)  | \u003ca href=\"https://www.instamojo.com/@omkarpathak27/\" target=\"_blank\"\u003e\u003cimg src=\"https://www.soldermall.com/images/pic-online-payment.jpg\" alt=\"Donate via Instamojo\" title=\"Donate via instamojo\" /\u003e\u003c/a\u003e |\n\n# Stargazers over time\n[![Stargazers over time](https://starchart.cc/OmkarPathak/pyresparser.svg)](https://starchart.cc/OmkarPathak/pyresparser)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOmkarPathak%2Fpyresparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOmkarPathak%2Fpyresparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOmkarPathak%2Fpyresparser/lists"}