https://github.com/deepramazumder/resume-management-system

A Machine Learning project for classifying resumes into categories to streamline resume sorting and enhance recruitment processes
https://github.com/deepramazumder/resume-management-system

machine-learning naive-bayes nltk recruitment-automation resume-classification tf-idf

Last synced: 4 months ago
JSON representation

A Machine Learning project for classifying resumes into categories to streamline resume sorting and enhance recruitment processes

Host: GitHub
URL: https://github.com/deepramazumder/resume-management-system
Owner: DeepraMazumder
License: mit
Created: 2024-01-30T12:45:07.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-09-06T07:27:48.000Z (10 months ago)
Last Synced: 2025-02-23T08:46:20.301Z (4 months ago)
Topics: machine-learning, naive-bayes, nltk, recruitment-automation, resume-classification, tf-idf
Language: Python
Homepage: https://resume-classification-deepra-mazumder.streamlit.app/
Size: 1.11 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # Resume Classification using Naive Bayes Classifier

## Overview

This repository documents a machine learning project for the classification of resumes into different categories using the UpdatedResumeDataset.csv dataset. The project follows a systematic approach, including pre-processing, exploratory data analysis (EDA), text cleaning, stopwords removal, feature extraction, and model evaluation.

## Steps

### 1. Read the UpdatedResumeDataset.csv dataset

- Loaded the dataset using the pandas library to initiate the project.

### 2. Displayed Categories and Counts

   

- Examined the distribution of resume categories within the dataset using the `value_counts()` method. This step provides an initial understanding of the dataset's class distribution.

### 3. Created a Count Plot

   

- Visualized the count of resumes for each category using a horizontal bar plot. This plot provides a clear representation of the number of resumes in each category, aiding in identifying any class imbalances.

### 4. Created a Pie Plot

- Generated a pie chart illustrating the percentage distribution of resumes across different categories. This visualization helps in grasping the proportional contribution of each category to the overall dataset.

### 5. Converted all the Resume text to lower case

- Standardized the text data by converting all resume text to lowercase.

### 6. Defined a function to clean the resume text

- Developed a cleaning function to remove special characters, URLs, RT, punctuations, and extra whitespace.

- Stored the cleaned text in a new column for further analysis.

### 7. Used nltk package to find the most common words and Generate Word Cloud

- Utilized the nltk library to tokenize the cleaned resume text and identify the most common words.

- Created a Word Cloud to visually represent the most frequently occurring words in the resume text.

### 8. Converted the categorical variable Category to a numerical feature

- Encoded the categorical variable 'Category' into numerical values using label encoding.

### 9. Converted Text to Feature Vectors (TF-IDF)

   

- Utilized the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer to convert the cleaned resume text into numerical feature vectors. This process involves tokenizing documents, learning vocabulary, and calculating inverse document frequency weightings.

### 10. Applied Naive Bayes Classifier

    

- Splitted the data into training and testing sets.

- Implemented a Naive Bayes Classifier, specifically the MultinomialNB model, to train on the feature vectors and make predictions.

- Evaluated the model's performance by calculating accuracy and providing a detailed classification report.

## Project Link

Explore the complete project, including the Jupyter notebook with code and visualizations, at [Project Link](https://colab.research.google.com/drive/1-Y7YM9YLctt2BdncASrJx9Vdr8-B22dT?usp=sharing).

Feel free to adapt and use the provided code for your own resume classification tasks. For any questions or suggestions, please open an issue or reach out. Happy exploring!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/deepramazumder/resume-management-system

Awesome Lists containing this project

README