https://github.com/djdhairya/resume-categorization

The Resume Categorization App helps automate the process of categorizing resumes into different job domains using machine learning. It provides a user-friendly interface for uploading resumes, categorizing them, and downloading the results.
https://github.com/djdhairya/resume-categorization

accuracy logistic-regression machine-learning matplotlib pandas pkl pypdf2 sickit-learn streamlit

Last synced: about 1 month ago
JSON representation

Host: GitHub
URL: https://github.com/djdhairya/resume-categorization
Owner: djdhairya
License: mit
Created: 2024-12-16T21:00:17.000Z (5 months ago)
Default Branch: main
Last Pushed: 2024-12-16T21:01:25.000Z (5 months ago)
Last Synced: 2025-04-08T13:21:59.516Z (about 1 month ago)
Topics: accuracy, logistic-regression, machine-learning, matplotlib, pandas, pkl, pypdf2, sickit-learn, streamlit
Language: Jupyter Notebook
Homepage:
Size: 209 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

README

# Resume Categorization App

## Overview

This is a **Resume Categorization** app built using **Streamlit** and **Machine Learning**. The app allows users to upload resumes in PDF format, which are then categorized into various job domains using a **Logistic Regression** model. The app stores categorized resumes in folders and generates a CSV file containing the filename and corresponding category.

## Project Structure

```
Resume-Categorization/
│
├── app.py
├── application.ipynb
├── model/
│ ├── model.pkl
│ └── tfidf.pkl
├── data/
│ └── ResumeDataSet.csv
```

## Requirements

To run this app, you will need to install the following dependencies:

- Python
- Streamlit
- scikit-learn
- PyPDF2
- pandas
- pickle

## How to Use

### 1. Launch the App
Run the Streamlit app by executing:

```bash
streamlit run app.py
```

This will start the Streamlit app and open it in your browser.

### 2. Upload Resumes
- Click the file uploader to select multiple PDF resumes for processing.
- The app will process the resumes and categorize them based on their content.

### 3. Categorize the Resumes
- After uploading the resumes, click the "Categorize Resumes" button.
- The app will display a table with the filename and its assigned category.
- The categorized resumes will be stored in folders named after the predicted categories.

### 4. Download Results
- After categorization, you can download the results as a CSV file that contains the filenames and their respective categories.
- The CSV file will be available for download directly from the app.

### 5. Directory Structure
- The categorized resumes will be stored in the output directory, organized into folders for each category.
- Example: `categorized_resumes/Java Developer/Resume1.pdf`

## File Descriptions

### `app.py`

This file contains the Streamlit application code. It handles:
- Uploading resumes
- Categorizing the resumes using the pre-trained model
- Saving the categorized resumes in respective folders
- Generating and providing a downloadable CSV file with the results

### `application.ipynb`

This Jupyter Notebook is used for:
- Training the machine learning model using a dataset of resumes (`ResumeDataSet.csv`).
- Evaluating the model's performance.
- Saving the trained model (`model.pkl`) and the TF-IDF vectorizer (`tfidf.pkl`).

### `model/model.pkl`

This is the trained **Logistic Regression** model used to categorize resumes. It is used by the `app.py` file to predict the category of the uploaded resume.

### `model/tfidf.pkl`

This file contains the **TF-IDF vectorizer** that is used to transform the text data (cleaned resume) into numerical features that can be fed into the model for categorization.

### `data/ResumeDataSet.csv`

This CSV file contains the dataset of resumes used to train the machine learning model. Each row represents a resume with its associated category.

## Category Mapping

The app categorizes resumes into the following job domains:

- **0**: Advocate
- **1**: Arts
- **2**: Automation Testing
- **3**: Blockchain
- **4**: Business Analyst
- **5**: Civil Engineer
- **6**: Data Science
- **7**: Database
- **8**: DevOps Engineer
- **9**: DotNet Developer
- **10**: ETL Developer
- **11**: Electrical Engineering
- **12**: HR
- **13**: Hadoop
- **14**: Health and Fitness
- **15**: Java Developer
- **16**: Mechanical Engineer
- **17**: Network Security Engineer
- **18**: Operations Manager
- **19**: PMO
- **20**: Python Developer
- **21**: SAP Developer
- **22**: Sales
- **23**: Testing
- **24**: Web Designing

## Conclusion

![Screenshot 2024-12-17 002247](https://github.com/user-attachments/assets/90e52558-981d-40c6-81db-4f52090ac739)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/djdhairya/resume-categorization

Awesome Lists containing this project

README