# Course Data Dive 📚

This project extracts course data from PDF files of SUT (Suranaree University of Technology) and saves it in CSV and JSON formats.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Makefile Commands](#makefile-commands)

## Prerequisites

- Python 3.x
- Virtual environment (`venv`) module
- `make` (optional, but recommended: the setup and run steps below use it)

## Installation

1. Clone the repository:
```bash
git clone https://github.com/pandakn/course-data-dive.git
cd course-data-dive
```
2. Create the virtual environment and install the dependencies:
```bash
make setup
```
3. Activate the virtual environment:
- On macOS and Linux:
```bash
source venv/bin/activate
```
- On Windows:
```bash
.\venv\Scripts\activate
```

## Usage

1. To extract course data from a PDF:
```bash
make run
```
2. The extracted data is saved as `courses.csv` and `courses.json` in the same directory as the PDF file:

##### JSON

```json
[
  {
    "id": "ENG23 2031",
    "code": "ENG23 2031",
    "name": "โครงสร้างข้อมูลและขั้นตอนวิธี",
    "credit": "4(4-0-8)",
    "description": "การวิเคราะห์ขั้นตอนวิธีเบื้องต้น ความซับซ้อนของขั้นตอนวิธี ขั้นตอนวิธีในการเรียงลำดับและค้นหาข้อมูล โครงสร้างข้อมูลลิงค์ลิสต์ คิว สแตก ไบนารีทรี บีทรีและฮีพ กลยุทธ์ของขั้นตอนวิธี",
    "prerequisites": "ENG23 2001 การเขียนโปรแกรมคอมพิวเตอร์ 2",
    "prerequisites_list": ["ENG23 2001"]
  },
  {
    "id": "ENG23 2032",
    "code": "ENG23 2032",
    "name": "เทคโนโลยีเชิงวัตถุ",
    "credit": "4(3-3-9)",
    "description": "แนวคิดเชิงวัตถุ การเขียนโปรแกรมเชิงวัตถุ คลาสและวัตถุ การห่อหุ้ม การสืบทอด โพลีมอร์ฟิสซึม อินเตอร์เฟสสายอักขระ การจัดการข้อผิดพลาด",
    "prerequisites": "ENG23 2001 การเขียนโปรแกรมคอมพิวเตอร์ 2 และ ENG23 2003 การแก้ปัญหาด้วยการโปรแกรม",
    "prerequisites_list": ["ENG23 2001", "ENG23 2003"]
  }
]
```
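
In the sample above, `prerequisites_list` appears to contain just the course codes that occur in the free-text `prerequisites` field. The sketch below illustrates that relationship. It assumes every code has the same shape as `ENG23 2001` (three capital letters, two digits, a space, four digits). The `COURSE_CODE` pattern and the `extract_codes` helper are hypothetical names used for illustration, not the project's actual implementation.

```python
import re

# Assumed course-code shape, inferred only from the sample output:
# three capital letters, two digits, a space, four digits.
COURSE_CODE = re.compile(r"[A-Z]{3}[0-9]{2} [0-9]{4}")


def extract_codes(prerequisites: str) -> list[str]:
    """Return the course codes found in the text, in order, without duplicates."""
    # dict.fromkeys keeps first-seen order while dropping repeated codes.
    return list(dict.fromkeys(COURSE_CODE.findall(prerequisites)))


if __name__ == "__main__":
    # Placeholder course names stand in for the real free-text titles.
    text = "ENG23 2001 <course name> 2 and ENG23 2003 <course name>"
    print(extract_codes(text))
```

Restricting the pattern to ASCII digits (`[0-9]` rather than `\d`) keeps Thai numerals in course titles from being mistaken for part of a code.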

##### CSV

| id | code | name | credit | prerequisites | description |
| ---------- | ---------- | ----------------------------- | -------- | -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ENG23 2031 | ENG23 2031 | โครงสร้างข้อมูลและขั้นตอนวิธี | 4(4-0-8) | ENG23 2001 การเขียนโปรแกรมคอมพิวเตอร์ 2 | การวิเคราะห์ขั้นตอนวิธีเบื้องต้น ความซับซ้อนของขั้นตอนวิธี ขั้นตอนวิธีในการเรียงลำดับและค้นหาข้อมูล โครงสร้างข้อมูลลิงค์ลิสต์ คิว สแตก ไบนารีทรี บีทรีและฮีพ กลยุทธ์ของขั้นตอนวิธี |
| ENG23 2032 | ENG23 2032 | เทคโนโลยีเชิงวัตถุ | 4(3-3-9) | ENG23 2001 การเขียนโปรแกรมคอมพิวเตอร์ 2 และ ENG23 2003 การแก้ปัญหาด้วยการโปรแกรม | แนวคิดเชิงวัตถุ การเขียนโปรแกรมเชิงวัตถุ คลาสและวัตถุ การห่อหุ้ม การสืบทอด โพลีมอร์ฟิสซึม อินเตอร์เฟสสายอักขระ การจัดการข้อผิดพลาด |
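
Since the course names and descriptions are Thai text, it is safer to open both output files with an explicit encoding when reading them back. The sketch below uses only the standard library. It assumes the files are UTF-8 encoded, are named `courses.json` and `courses.csv` as described above, and that the CSV has a header row matching the columns shown. The `load_courses` helper and its `directory` argument are hypothetical and not part of the project.

```python
import csv
import json
from pathlib import Path


def load_courses(directory: str) -> tuple[list[dict], list[dict]]:
    """Load courses.json and courses.csv from the directory that holds the PDF."""
    base = Path(directory)

    # Assumption: both files are UTF-8; an explicit encoding avoids
    # platform-dependent defaults garbling the Thai text.
    with open(base / "courses.json", encoding="utf-8") as f:
        from_json = json.load(f)

    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(base / "courses.csv", encoding="utf-8", newline="") as f:
        from_csv = list(csv.DictReader(f))

    return from_json, from_csv


if __name__ == "__main__":
    # "." is a placeholder; point it at the directory containing the PDF.
    courses_json, courses_csv = load_courses(".")
    for course in courses_json:
        print(course["code"], course["credit"], course["prerequisites_list"])
```

Each CSV row comes back as a dictionary keyed by the header names, so fields such as `credit` can be read the same way from either format.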

## Makefile Commands

- `make setup` : Create virtual environment and install dependencies
- `make test` : Run all tests
- `make lint` : Run linter (Ruff)
- `make format` : Format code with Ruff
- `make check` : Run linter and formatter check without making changes
- `make clean` : Remove virtual environment and cache files
- `make run` : Run the main application