# Workshop Data Collection with LLMs in Social Sciences

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.16631539.svg)](https://doi.org/10.5281/zenodo.16631539)

This repository contains the code and slides for our workshop on data collection and inference with Large Language Models. The materials on this page are [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) licensed.

![cc](https://mirrors.creativecommons.org/presskit/icons/cc.svg) ![by](https://mirrors.creativecommons.org/presskit/icons/by.svg)

More information is available on the [workshop website](https://sodascience.github.io/workshop_llm_data_collection/).

## Tutorial Paper
Read and cite our tutorial paper (preprint):
- Fang, Q., Garcia-Bernardo, J., & van Kesteren, E. (2026, March 23). A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R. Retrieved from osf.io/preprints/socarxiv/v4eq6_v2
- [`Download`](https://osf.io/preprints/socarxiv/v4eq6_v2) from OSF

## Technical details
- No previous experience with LLMs is required.
- `R` or `Python` programming knowledge is helpful but not required.
- To interact with LLMs, we will use [`langchain`](https://python.langchain.com/docs/introduction/) in Python and [`ellmer`](https://ellmer.tidyverse.org/) in R.
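
As a rough illustration of the kind of text-annotation task the workshop covers, the sketch below builds a classification prompt and validates the model's reply. It uses only the Python standard library: `fake_llm` is a hypothetical stand-in for a real model call (in the workshop this would go through `langchain`), and the label set, prompt wording, and function names are assumptions, not taken from the workshop notebooks.

```python
from typing import Callable, List, Optional

# Hypothetical label set, chosen only for this illustration.
LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Return a zero-shot classification prompt for a single text."""
    options = ", ".join(LABELS)
    return (
        f"Classify the sentiment of the text as one of: {options}.\n"
        "Answer with the label only.\n\n"
        f"Text: {text}"
    )


def parse_label(reply: str) -> Optional[str]:
    """Normalise a model reply; return None if it is not a valid label."""
    cleaned = reply.strip().strip(".").lower()
    return cleaned if cleaned in LABELS else None


def annotate(texts: List[str], llm: Callable[[str], str]) -> List[Optional[str]]:
    """Annotate each text with the given model callable."""
    return [parse_label(llm(build_prompt(t))) for t in texts]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; reacts to a single keyword."""
    text = prompt.split("Text: ", 1)[1]
    return "Positive." if "great" in text.lower() else "not sure"


annotations = annotate(["The workshop was great", "It rained"], fake_llm)
```

Replies that do not match a known label come back as `None`, so they can be inspected or re-queried instead of silently entering the dataset.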

## Slides
- Full workshop slides (v2026.01.23): [`Download`](./slides/soda_llm_workshop_slides.pdf)
- ODISSEI 2025 workshop slides: [`Download`](./slides/soda_llm_workshop_odissei_25_slides.pdf)

## Full Workshop Schedule

| Time | Title | Resource |
| :---- | :----------------------------------- | :------------------------------------------------------------------------------------------- |
| 09:30 | LLM fundamentals for Social Sciences | |
| 11:00 | Coffee break | Coffee is provided! |
| 11:20 | Data collection/annotation with LLMs | [`python`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_data_collection_py.ipynb), [`R`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_data_collection_R.ipynb) |
| 12:30 | Break | Lunch is provided! |
| 13:15 | Inference with LLM annotations | [`python`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_inferential_regression_py.ipynb), [`R`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_inferential_regression_R.ipynb) |
| 14:30 | Conclusion & Q&A | |

Methods and software for inference with measurement error correction: [sodascience/social_science_inferences_with_llms](https://github.com/sodascience/social_science_inferences_with_llms).
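
To give a sense of why measurement error matters for inference, here is a minimal sketch of one classical correction, the Rogan-Gladen estimator. It adjusts the share of texts an LLM labels as positive using the annotator's sensitivity and specificity, which would typically be estimated on a human-coded validation sample. This is an illustration only and not necessarily the method implemented in the linked repository; the function name and all numbers are hypothetical.

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen correction of an observed positive rate.

    observed:    share of items the LLM labelled positive
    sensitivity: P(LLM says positive | truly positive)
    specificity: P(LLM says negative | truly negative)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed + specificity - 1.0) / denom
    # Clip to the valid range, since sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: the LLM labels 40% of texts as positive.
naive = 0.40
adjusted = corrected_prevalence(naive, sensitivity=0.90, specificity=0.80)
```

With these made-up values the adjusted share is about 0.29, noticeably lower than the naive 0.40, which shows how uncorrected LLM annotations can bias downstream estimates.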

## Contact

This project is developed and maintained by the [ODISSEI Social Data Science (SoDa)](https://odissei-soda.nl/) team.

Do you have questions, suggestions, or remarks? File an [issue](https://github.com/sodascience/workshop_llm_data_collection/issues) or feel free to contact [Qixiang Fang](https://github.com/fqixiang) or [Erik-Jan van Kesteren](https://github.com/vankesteren).