https://github.com/deepramazumder/resume-management-system
A Machine Learning project for classifying resumes into categories to streamline resume sorting and enhance recruitment processes
machine-learning naive-bayes nltk recruitment-automation resume-classification tf-idf
- Host: GitHub
- URL: https://github.com/deepramazumder/resume-management-system
- Owner: DeepraMazumder
- License: mit
- Created: 2024-01-30T12:45:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-06T07:27:48.000Z (10 months ago)
- Last Synced: 2025-02-23T08:46:20.301Z (4 months ago)
- Topics: machine-learning, naive-bayes, nltk, recruitment-automation, resume-classification, tf-idf
- Language: Python
- Homepage: https://resume-classification-deepra-mazumder.streamlit.app/
- Size: 1.11 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Resume Classification using Naive Bayes Classifier
## Overview
This repository documents a machine learning project that classifies resumes into categories using the `UpdatedResumeDataset.csv` dataset. The project follows a systematic pipeline: pre-processing, exploratory data analysis (EDA), text cleaning, stopword removal, feature extraction, and model evaluation.
## Steps
### 1. Read the UpdatedResumeDataset.csv dataset
- Loaded the dataset into a DataFrame using the pandas library.
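A minimal sketch of the loading step, assuming the CSV has a `Category` column and a `Resume` column (an assumption about `UpdatedResumeDataset.csv`, not confirmed by this README). To keep it self-contained, a tiny made-up CSV in memory stands in for the real file.

```python
import io

import pandas as pd

# Made-up stand-in for UpdatedResumeDataset.csv; the column names
# "Category" and "Resume" are assumptions about the real file.
SAMPLE_CSV = """Category,Resume
Data Science,"Skills: Python, pandas, machine learning"
HR,"Recruitment, onboarding and payroll experience"
"""

# With the real dataset this would be pd.read_csv("UpdatedResumeDataset.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)  # (rows, columns)
print(df.head())
```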
### 2. Displayed Categories and Counts
- Examined how resumes are distributed across categories using the `value_counts()` method, which gives an initial view of the dataset's class distribution.
### 3. Created a Count Plot
- Visualized the number of resumes in each category with a horizontal bar plot, which makes any class imbalance easy to spot.
### 4. Created a Pie Plot
- Generated a pie chart showing the percentage of resumes in each category, i.e. each category's share of the overall dataset.
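Steps 2 to 4 can be sketched together as below. This is an illustration on toy data, not the notebook's code: the README does not say whether the count plot comes from seaborn or plain matplotlib, so this version uses pandas' matplotlib plotting, and the output filename is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy data; the "Category" column name is an assumption about the dataset.
df = pd.DataFrame({"Category": ["Data Science", "HR", "Data Science",
                                "Testing", "HR", "Data Science"]})

# Step 2: number of resumes per category, most frequent first.
counts = df["Category"].value_counts()
print(counts)

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(12, 5))

# Step 3: horizontal bar plot; sorting ascending puts the largest bar on top.
counts.sort_values().plot.barh(ax=ax_bar)
ax_bar.set_xlabel("Number of resumes")
ax_bar.set_title("Resumes per category")

# Step 4: pie chart of each category's percentage share.
ax_pie.pie(counts.values, labels=list(counts.index), autopct="%1.1f%%")
ax_pie.set_title("Category distribution")

fig.tight_layout()
fig.savefig("category_distribution.png")  # hypothetical output path
```

A strongly skewed bar plot here is the signal to consider stratified splitting or class weighting later in the pipeline.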
### 5. Converted all resume text to lowercase
- Standardized the text data by converting all resume text to lowercase.
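In pandas, lowercasing a whole text column is a single vectorized call. The sketch below assumes the text lives in a `Resume` column and uses made-up rows.

```python
import pandas as pd

# Made-up rows; the "Resume" column name is an assumption.
df = pd.DataFrame({"Resume": ["Skills: Python, SQL", "Experienced HR Manager"]})

# Lowercase every resume so "Python" and "python" become the same token later.
df["Resume"] = df["Resume"].str.lower()
print(df["Resume"].tolist())
# ['skills: python, sql', 'experienced hr manager']
```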
### 6. Defined a function to clean the resume text
- Developed a cleaning function that removes special characters, URLs, `RT` tokens, punctuation, and extra whitespace.
- Stored the cleaned text in a new column for further analysis.
### 7. Used the nltk package to find the most common words and generate a word cloud
- Utilized the nltk library to tokenize the cleaned resume text and identify the most common words.
- Created a word cloud to visualize the most frequently occurring words in the resume text.
### 8. Converted the categorical variable Category to a numerical feature
- Encoded the categorical variable 'Category' into numerical values using label encoding.
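Steps 6 to 8 (cleaning, word frequencies, label encoding) can be sketched together as follows. `clean_resume`, the `cleaned_resume` column, and the exact regular expressions are hypothetical choices that follow the list of removals in step 6. For word counting the sketch uses `nltk.FreqDist` with plain whitespace tokenization, because `nltk.word_tokenize` and the nltk stopword list need extra data downloads. The word cloud is omitted.

```python
import re
import string

import pandas as pd
from nltk import FreqDist
from sklearn.preprocessing import LabelEncoder


def clean_resume(text: str) -> str:
    """Hypothetical cleaner: strips URLs, RT tokens, punctuation,
    non-ASCII special characters and extra whitespace."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)                 # URLs
    text = re.sub(r"\brt\b", " ", text, flags=re.IGNORECASE)      # RT tokens
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)  # punctuation
    text = re.sub(r"[^\x00-\x7f]", " ", text)                     # non-ASCII characters
    return re.sub(r"\s+", " ", text).strip()                      # collapse whitespace


# Made-up, already-lowercased resumes (see step 5).
df = pd.DataFrame({
    "Category": ["Data Science", "HR", "Data Science"],
    "Resume": [
        "python, pandas & machine learning! https://example.com",
        "rt recruitment   and payroll",
        "deep learning with python",
    ],
})

# Step 6: keep the cleaned text in a new column.
df["cleaned_resume"] = df["Resume"].apply(clean_resume)

# Step 7: frequency of every whitespace-separated word across all resumes.
freq = FreqDist(word for text in df["cleaned_resume"] for word in text.split())
print(freq.most_common(3))

# Step 8: map each category name to an integer (classes are sorted alphabetically).
encoder = LabelEncoder()
df["Category"] = encoder.fit_transform(df["Category"])
print(list(encoder.classes_))  # index in this list = encoded label
```

`encoder.inverse_transform` maps predicted integers back to category names, which is useful after the classifier in step 10 has made its predictions.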
### 9. Converted Text to Feature Vectors (TF-IDF)
- Used a TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer to convert the cleaned resume text into numerical feature vectors. This involves tokenizing the documents, learning the vocabulary, and computing inverse document frequency weights.
### 10. Applied a Naive Bayes classifier
- Split the data into training and testing sets.
- Trained a Multinomial Naive Bayes classifier (the `MultinomialNB` model) on the feature vectors and used it to make predictions.
- Evaluated the model's performance by computing its accuracy and a detailed classification report.
## Project Link
Explore the complete project, including the Jupyter notebook with code and visualizations, at [Project Link](https://colab.research.google.com/drive/1-Y7YM9YLctt2BdncASrJx9Vdr8-B22dT?usp=sharing).
Feel free to adapt and use the provided code for your own resume classification tasks. For any questions or suggestions, please open an issue or reach out. Happy exploring!
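For reference, steps 9 and 10 can be condensed into a short scikit-learn sketch. The texts and labels are made up, and `stop_words="english"`, `test_size=0.25`, stratification, and `random_state=42` are illustrative choices, not the notebook's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up cleaned resumes and integer labels standing in for steps 6 to 8.
texts = [
    "python pandas machine learning", "deep learning python tensorflow",
    "statistics python regression model", "neural network python keras",
    "recruitment payroll onboarding", "employee relations payroll hiring",
    "onboarding training hiring policy", "recruitment interviews employee benefits",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical: 0 = Data Science, 1 = HR

# Split first, keeping the class proportions equal in both sets.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# Learn the vocabulary and IDF weights from the training texts only,
# then apply the same transformation to the test texts.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Multinomial Naive Bayes on the TF-IDF feature vectors.
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```

One design note: the steps above vectorize the text before splitting. Fitting the vectorizer on the training split only, as in this sketch, keeps test-set vocabulary and document frequencies out of training and gives a slightly more honest accuracy estimate.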