https://github.com/bazilsuhail/resume-dataset

Resume-Dataset is a Python-based project to generate PDF CVs from a LaTeX template and a CSV dataset, designed for creating a structured dataset for training LayoutLM, a model for document understanding.

Host: GitHub
URL: https://github.com/bazilsuhail/resume-dataset
Owner: BazilSuhail
License: mit
Created: 2025-08-09T16:43:59.000Z
Default Branch: main
Last Pushed: 2025-08-09T17:21:05.000Z
Last Synced: 2025-08-30T23:03:06.459Z
Topics: data-set, latex-document, latex-resume-template, resume-builder, resume-dataset, resume-template
Language: Jupyter Notebook
Size: 906 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Resume-Dataset

## Overview
**Resume-Dataset** is a Python project that generates PDF CVs from a LaTeX template and a CSV dataset. It is designed for building a structured dataset to train LayoutLM, a document-understanding model. The repository includes a LaTeX template (`cv_template.tex`) and a sample CSV dataset (`cv_data.csv`); the generated CVs can then be processed for text and layout extraction.

## Repository Contents
- `cv_template.tex`: LaTeX template with placeholders that are filled with CV data.
- `cv_data.csv`: Sample dataset with resume data for two individuals (Sourabh Bajaj, Jane Smith).
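
To illustrate how CSV values might be injected into the template, here is a minimal sketch. It assumes a hypothetical `<<column_name>>` placeholder syntax (the actual markers in `cv_template.tex` may differ), and the helper names `escape_latex` and `fill_template` are illustrative only. Escaping matters because resume text often contains characters such as `&`, `%`, or `_` that have special meaning in LaTeX and would otherwise break compilation.

```python
import re
from typing import Mapping

# Characters with special meaning in LaTeX, mapped to literal-text replacements.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_SPECIAL_RE = re.compile("|".join(re.escape(ch) for ch in _LATEX_SPECIALS))

# Hypothetical placeholder syntax: <<column_name>>. The real template may differ.
_PLACEHOLDER_RE = re.compile(r"<<([A-Za-z_][A-Za-z0-9_]*)>>")


def escape_latex(value: object) -> str:
    """Escape a CSV cell so LaTeX typesets it as literal text."""
    # One regex pass, so braces produced by \textbackslash{} are not escaped again.
    return _SPECIAL_RE.sub(lambda m: _LATEX_SPECIALS[m.group(0)], str(value))


def fill_template(template: str, row: Mapping[str, object]) -> str:
    """Replace every <<key>> placeholder with the escaped value of row[key]."""
    def substitute(match):
        key = match.group(1)
        if key not in row:
            raise KeyError("placeholder <<%s>> has no matching CSV column" % key)
        return escape_latex(row[key])

    return _PLACEHOLDER_RE.sub(substitute, template)
```

For example, `fill_template(r"\name{<<name>>}", {"name": "Jane Smith"})` yields `\name{Jane Smith}`. Raising `KeyError` on an unmatched placeholder surfaces template/CSV mismatches before `pdflatex` runs, rather than as a confusing compile error.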

## Approach
- Parse CSV data containing resume details (name, email, mobile, website, education, experience, projects, languages, technologies).
- Inject CSV data into LaTeX template placeholders.
- Compile LaTeX files to PDFs using `pdflatex`.
- Remove the auxiliary files (`.aux`, `.log`) and the intermediate `.tex` files after compilation.
- Use the resulting PDFs as input for text and bounding-box extraction when building a LayoutLM dataset.
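
The steps above could be wired together roughly as follows. This is a sketch, not the repository's `generate_cvs.py`: the function names are hypothetical, it assumes the CSV has a `name` column and that the template uses `<<column>>` placeholders, and it writes PDFs to an assumed `output/` directory. Values are substituted verbatim here, so cells containing LaTeX special characters such as `&` or `%` would need escaping first.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

import pandas as pd

# Files removed after a successful compile, per the cleanup step above.
CLEANUP_SUFFIXES = (".aux", ".log", ".tex")


def pdf_stem(name: str) -> str:
    """'Jane Smith' -> 'CV_Jane_Smith', matching the example file names."""
    return "CV_" + "_".join(name.split())


def cleanup(outdir: Path, stem: str, suffixes: Iterable[str] = CLEANUP_SUFFIXES) -> None:
    """Delete intermediate files that share the PDF's stem."""
    for suffix in suffixes:
        path = outdir / (stem + suffix)
        if path.exists():
            path.unlink()


def compile_tex(tex_source: str, stem: str, outdir: Path) -> Path:
    """Write <stem>.tex, compile it with pdflatex, and return the PDF path."""
    outdir.mkdir(parents=True, exist_ok=True)
    tex_path = outdir / (stem + ".tex")
    tex_path.write_text(tex_source, encoding="utf-8")
    # nonstopmode + halt-on-error: stop at the first error instead of prompting.
    # check=True raises CalledProcessError; the .log is then kept for debugging.
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
         "-output-directory=" + str(outdir), str(tex_path)],
        check=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    cleanup(outdir, stem)
    return outdir / (stem + ".pdf")


def generate_cvs(template_path: str, csv_path: str, outdir: str = "output") -> List[Path]:
    """Render one PDF per CSV row (assumes a 'name' column)."""
    template = Path(template_path).read_text(encoding="utf-8")
    out = Path(outdir)
    pdfs = []
    # dtype=str keeps values as text; fillna("") avoids "nan" in the output.
    for row in pd.read_csv(csv_path, dtype=str).fillna("").to_dict("records"):
        tex = template
        for column, value in row.items():
            tex = tex.replace("<<%s>>" % column, value)  # hypothetical placeholder syntax
        pdfs.append(compile_tex(tex, pdf_stem(row["name"]), out))
    return pdfs
```

With `cv_template.tex` and `cv_data.csv` in the working directory, `generate_cvs("cv_template.tex", "cv_data.csv")` would, if compilation succeeds, return paths such as `output/CV_Jane_Smith.pdf`.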

## Usage
1. Install dependencies: `texlive`, `pandas`.
2. Place `cv_template.tex` and `cv_data.csv` in the working directory.
3. Run `generate_cvs.py` to produce PDFs (e.g., `CV_Sourabh_Bajaj.pdf`, `CV_Jane_Smith.pdf`).
4. Use the PDFs to prepare a LayoutLM dataset (e.g., extract text and bounding boxes with `pdfplumber`).
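
For the extraction step, here is a sketch of word-level text and box extraction with `pdfplumber`. It assumes the LayoutLM convention of bounding boxes normalized to a 0 to 1000 grid with the origin at the top-left; `pdfplumber` reports each word's `x0`, `x1`, `top`, and `bottom` in PDF points measured from the top of the page, so only scaling is needed. The function names are hypothetical, and `pdfplumber` is not among the listed prerequisites, so it would need to be installed separately.

```python
from typing import List, Tuple


def _scale(value: float, size: float) -> int:
    """Map a coordinate in [0, size] to [0, 1000], clamping stray values."""
    return max(0, min(1000, int(1000 * value / size)))


def normalize_box(x0: float, top: float, x1: float, bottom: float,
                  width: float, height: float) -> List[int]:
    """Convert a PDF-point box to LayoutLM's [x0, y0, x1, y1] on a 0-1000 grid."""
    return [_scale(x0, width), _scale(top, height),
            _scale(x1, width), _scale(bottom, height)]


def extract_words_and_boxes(pdf_path: str) -> List[Tuple[List[str], List[List[int]]]]:
    """Return one (words, boxes) pair per page of the PDF."""
    import pdfplumber  # imported here so normalize_box works without pdfplumber installed

    pages = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            words, boxes = [], []
            for w in page.extract_words():
                words.append(w["text"])
                boxes.append(normalize_box(w["x0"], w["top"], w["x1"], w["bottom"],
                                           page.width, page.height))
            pages.append((words, boxes))
    return pages
```

Clamping guards against words whose reported coordinates fall slightly outside the page, which would otherwise produce boxes outside the 0 to 1000 range.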

## Prerequisites
- Python 3.6+
- LaTeX distribution (TeX Live/MiKTeX)
- `pandas` library

## Installation (Google Colab)
```python
!apt-get update
!apt-get install -y texlive texlive-latex-extra texlive-fonts-extra
!pip install pandas
```
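
After installation, a quick check like the following (for example, in a Colab cell) can confirm that `pdflatex` is on the `PATH` and that `pandas` is importable. This is a small sketch; the function name `missing_prerequisites` is hypothetical.

```python
import importlib.util
import shutil
from typing import List


def missing_prerequisites() -> List[str]:
    """Return the names of required tools that cannot be found."""
    missing = []
    if shutil.which("pdflatex") is None:  # provided by TeX Live or MiKTeX
        missing.append("pdflatex")
    if importlib.util.find_spec("pandas") is None:
        missing.append("pandas")
    return missing


print(missing_prerequisites() or "all prerequisites found")
```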
