https://github.com/aida-ugent/occupation_coding_datasets
- Host: GitHub
- URL: https://github.com/aida-ugent/occupation_coding_datasets
- Owner: aida-ugent
- License: cc0-1.0
- Created: 2023-09-10T10:43:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-13T09:46:41.000Z (over 1 year ago)
- Last Synced: 2025-01-21T21:33:37.640Z (4 months ago)
- Size: 2 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Occupation coding datasets
1. GenEasy: 500 synthetic job listings generated with GPT-4 and labeled with a selection of ESCO occupation codes.
2. GenHard: Same as GenEasy, except that the job titles diverge from the textual descriptions of their respective codes.
3. Real_indeed: 100 real job listings sourced from Indeed and annotated manually.

Each dataset consists of columns for ID, job title, description, and label, plus any other supplementary data that is available.
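The README does not state the file format or the exact column headers. As a minimal sketch, assuming each dataset is a CSV file with the hypothetical headers `id`, `title`, `description`, and `label` (and treating any further columns as supplementary data), a dataset could be loaded and checked like this. The function `load_listings`, the `JobListing` class, and the toy sample row are illustrative only.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical column names: the README only says each dataset has columns
# for ID, job title, description, label, and optional supplementary data.
REQUIRED_COLUMNS = ("id", "title", "description", "label")


@dataclass
class JobListing:
    id: str
    title: str
    description: str
    label: str  # assumed to hold the occupation code assigned to the listing
    extra: dict = field(default_factory=dict)  # any supplementary columns


def load_listings(text: str) -> list[JobListing]:
    """Parse CSV text into JobListing records, keeping unknown columns in `extra`."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    listings = []
    for row in reader:
        # Everything outside the required columns is kept as supplementary data.
        extra = {k: v for k, v in row.items() if k not in REQUIRED_COLUMNS}
        listings.append(JobListing(
            id=row["id"],
            title=row["title"],
            description=row["description"],
            label=row["label"],
            extra=extra,
        ))
    return listings


if __name__ == "__main__":
    # Toy row with a made-up label, not taken from the actual datasets.
    sample = (
        "id,title,description,label,source\n"
        "1,Data analyst,Analyses sales data.,toy-code-1,synthetic\n"
    )
    rows = load_listings(sample)
    print(rows[0].title, rows[0].extra)  # Data analyst {'source': 'synthetic'}
```

Failing fast on missing required columns keeps the three datasets interchangeable for downstream code, while unknown columns are preserved rather than discarded.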