https://github.com/sodascience/workshop_llm_data_collection
This repository contains the code and slides for our workshop on data collection and inference with Large Language Models
data-collection inference llm python r workshop workshop-materials
- Host: GitHub
- URL: https://github.com/sodascience/workshop_llm_data_collection
- Owner: sodascience
- License: cc-by-4.0
- Created: 2025-03-19T10:41:54.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-03-24T11:31:33.000Z (3 months ago)
- Last Synced: 2026-03-25T14:39:01.046Z (3 months ago)
- Topics: data-collection, inference, llm, python, r, workshop, workshop-materials
- Language: Jupyter Notebook
- Homepage:
- Size: 12.2 MB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Workshop Data Collection with LLMs in Social Sciences
DOI: [10.5281/zenodo.16631539](https://doi.org/10.5281/zenodo.16631539)
This repository contains the code and slides for our workshop on data collection and inference with Large Language Models. The materials on this page are [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) licensed.
 
More information is available on the [workshop website](https://sodascience.github.io/workshop_llm_data_collection/).

## Tutorial Paper
Read and cite our tutorial paper (preprint):
- Fang, Q., Garcia-Bernardo, J., & van Kesteren, E. (2026, March 23). A Methodological Guide on Using Large Language Models for Text Annotation in the Social Sciences and Humanities with Python and R. Retrieved from osf.io/preprints/socarxiv/v4eq6_v2
- [`Download`](https://osf.io/preprints/socarxiv/v4eq6_v2) from OSF
## Technical details
- No previous experience with LLMs is required.
- Programming knowledge of `R` or `Python` is helpful but not required.
- In Python we will use [`langchain`](https://python.langchain.com/docs/introduction/); in R we will use [`ellmer`](https://ellmer.tidyverse.org/) to interact with LLMs.
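
To give a feel for what LLM-based annotation involves, here is a minimal sketch in plain Python. It is not taken from the workshop notebooks: the label set, the prompt wording, and the names `build_prompt`, `parse_label`, `annotate`, and `fake_llm` are all hypothetical. The model is passed in as an ordinary callable, so a real client (for example one created with `langchain`) could be wrapped and substituted for the stand-in used here.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical label set and prompt wording, chosen only for illustration.
LABELS = ("positive", "negative", "neutral")

PROMPT_TEMPLATE = (
    "Classify the sentiment of the text as one of: {labels}.\n"
    "Answer with the label only.\n\n"
    "Text: {text}"
)


def build_prompt(text: str) -> str:
    """Fill the annotation prompt for a single document."""
    return PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=text)


def parse_label(raw: str) -> Optional[str]:
    """Normalise a raw model reply to a known label, or None if it does not match."""
    cleaned = raw.strip().strip(".").lower()
    return cleaned if cleaned in LABELS else None


def annotate(texts: Iterable[str], llm: Callable[[str], str]) -> List[Optional[str]]:
    """Annotate each text with the given model; `llm` maps a prompt to a reply."""
    return [parse_label(llm(build_prompt(t))) for t in texts]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs without an API key."""
    text = prompt.rsplit("Text: ", 1)[1].lower()
    if "love" in text:
        return "Positive."
    if "hate" in text:
        return "negative"
    return "I am not sure"


annotations = annotate(
    ["I love this workshop", "I hate waiting", "It starts at 09:30"],
    fake_llm,
)
print(annotations)
```

Replies that do not match a known label are returned as `None` rather than guessed, so unparseable answers stay visible and can be re-prompted or checked by hand.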
## Slides
- Full workshop slides (v2026.01.23): [`Download`](./slides/soda_llm_workshop_slides.pdf)
- ODISSEI 2025 workshop slides: [`Download`](./slides/soda_llm_workshop_odissei_25_slides.pdf)
## Full Workshop Schedule
| Time | Title | Resource |
| :---- | :----------------------------------- | :------------------------------------------------------------------------------------------- |
| 09:30 | LLM fundamentals for Social Sciences | |
| 11:00 | Coffee break | Coffee is provided! |
| 11:20 | Data collection/annotation with LLMs | [`python`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_data_collection_py.ipynb), [`R`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_data_collection_R.ipynb) |
| 12:30 | Break | Lunch is provided! |
| 13:15 | Inference with LLM annotations | [`python`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_inferential_regression_py.ipynb), [`R`](https://colab.research.google.com/github/sodascience/workshop_llm_data_collection/blob/main/notebooks/llm_inferential_regression_R.ipynb) |
| 14:30 | Conclusion & Q&A | |
Methods and software for inference with measurement error correction: [sodascience/social_science_inferences_with_llms](https://github.com/sodascience/social_science_inferences_with_llms).
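
As a rough illustration of why measurement error correction matters, the sketch below applies a simple difference (bias-correction) estimator to a proportion. This is a generic technique and not necessarily what the linked repository implements. It assumes binary 0/1 labels and a human-coded validation subset drawn as a simple random sample of the documents. The function name `corrected_mean` and the toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def corrected_mean(
    llm_all: Sequence[float],
    llm_validated: Sequence[float],
    human_validated: Sequence[float],
) -> float:
    """Estimate a mean from LLM labels, corrected with a human-coded subset.

    The naive estimate is the mean of the LLM labels on all documents.
    The correction is the mean difference (human - LLM) on the validation
    subset, which estimates the bias introduced by LLM labelling errors.
    """
    if len(llm_validated) != len(human_validated):
        raise ValueError("validation label lists must have the same length")
    if not llm_all or not llm_validated:
        raise ValueError("label lists must not be empty")
    naive = mean(llm_all)
    bias = mean([h - l for h, l in zip(human_validated, llm_validated)])
    return naive + bias


# Toy data (made up): 10 documents labelled by the LLM, 4 of them also by a human.
llm_all = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
llm_validated = [1, 1, 0, 1]
human_validated = [1, 0, 0, 1]

naive_estimate = mean(llm_all)
corrected_estimate = corrected_mean(llm_all, llm_validated, human_validated)
print(f"naive: {naive_estimate:.2f}, corrected: {corrected_estimate:.2f}")
```

In this toy example the LLM over-labels the positive class on the validated documents, so the corrected estimate is lower than the naive one. A full analysis would also need standard errors that account for the size of the validation subset.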
## Contact
This project is developed and maintained by the [ODISSEI Social Data Science (SoDa)](https://odissei-soda.nl/) team.

Do you have questions, suggestions, or remarks? File an [issue](https://github.com/sodascience/workshop_llm_data_collection/issues) or feel free to contact [Qixiang Fang](https://github.com/fqixiang) or [Erik-Jan van Kesteren](https://github.com/vankesteren).