https://github.com/dimits-ts/synthetic_discussion_framework
Continuation of thesis project, Synthetic Discussion Framework v2.0
- Host: GitHub
- URL: https://github.com/dimits-ts/synthetic_discussion_framework
- Owner: dimits-ts
- Created: 2024-10-14T12:01:11.000Z (24 days ago)
- Default Branch: master
- Last Pushed: 2024-11-01T12:08:12.000Z (6 days ago)
- Last Synced: 2024-11-01T13:18:50.036Z (6 days ago)
- Language: Python
- Size: 691 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Synthetic Discussion Framework (SDF)
Continuation of the [sister thesis project](https://github.com/dimits-ts/llm_moderation_research). A lightweight, simple and specialized framework for creating, storing, annotating and analyzing synthetic online discussions between LLM-powered users. It serves as a proxy for experiments with human users when researching optimal LLM moderation techniques.

## Requirements
### Environment & Dependencies
The code has been tested on Linux only. An up-to-date copy of the platform-specific (Linux x86 / NVIDIA CUDA) conda environment used in this project is available [here](https://github.com/dimits-ts/conda_auto_backup/blob/master/llm.yml).
Run [`src/scripts/download_model.sh`](src/scripts/download_model.sh) to download the model used by the framework into the correct directory (about 5 GB of storage required).
## Use
### Setting up configurations
The framework is intended to be used with modular input files, which are combined in different ways to generate the final conversation inputs.
Default configurations are already provided. To modify or add configurations, edit or add files in the [`data/generated_discussion_input/modular_configurations`](data/generated_discussion_input/modular_configurations) directory.
To generate the final conversation inputs, run the [`src/scripts/generate_conversation_inputs.sh`](src/scripts/generate_conv_configs_personalized.sh) script.
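One way to picture the combination step is as a Cartesian product over the modular files: every option in one module is paired with every option in the others. The sketch below assumes each module is a list of interchangeable alternatives keyed by a module name. The module names, the `combine_modules` and `write_inputs` helpers, and the `conversation_NNN.json` file naming are hypothetical and are not taken from the repository's scripts.

```python
import itertools
import json
from pathlib import Path


def combine_modules(modules: dict[str, list]) -> list[dict]:
    """Return one conversation input per combination of module options.

    `modules` maps a (hypothetical) module name, e.g. "topic", to its
    alternative values; the result is the Cartesian product of all of them.
    """
    names = sorted(modules)  # fixed ordering so the output is deterministic
    return [
        dict(zip(names, choice))
        for choice in itertools.product(*(modules[name] for name in names))
    ]


def write_inputs(inputs: list[dict], output_dir: Path) -> list[Path]:
    """Serialize each combined input to its own JSON file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, conv_input in enumerate(inputs):
        path = output_dir / f"conversation_{i:03d}.json"
        path.write_text(json.dumps(conv_input, indent=2))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical modular configuration; the real files under
    # data/generated_discussion_input/modular_configurations may differ.
    modules = {
        "topic": ["immigration policy", "remote work"],
        "moderator_instructions": ["neutral", "strict"],
        "user_personas": [["persona_a", "persona_b"], ["persona_c", "persona_d"]],
    }
    inputs = combine_modules(modules)
    print(len(inputs))  # 2 * 2 * 2 = 8 combinations
```

Because the number of inputs is the product of the option counts, it grows multiplicatively: adding a third topic to the example above raises the total from 8 to 12.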
### Synthetic conversation creation
There are two ways to use the synthetic conversation framework:
1. (Preferred) Run [`src/scripts/conversation_personalized.sh`](src/scripts/conversation_personalized.sh), where `output_dir` is the directory containing the final conversation inputs (see the section above).
1. Create a new Python script that uses the framework library found in the `sdl` module.

## Structure
The project is structured as follows:
- `data`: SDF input configurations and output
- `src/models`: directory for local LLM instances
- `src/scripts`: automation scripts for batch processing of experiments and conversation input creation
- `src/sdl`: the Synthetic Discussion Library, containing the modules necessary for synthetic discussion creation and annotation

Notable files:
- [`src/sdf_create_conversations.py`](src/sdf_create_conversations.py): script that automatically loads a conversation's parameters, runs the synthetic dialogue using a local LLM, and serializes the output
- [`src/sdf_create_annotations.py`](src/sdf_create_annotations.py): script that loads a previously concluded conversation from serialized data, runs an annotation job using a local LLM, and serializes the output
- [`src/generate_conv_configs.py`](src/generate_conv_configs.py): a notebook containing notes on the experiments, implementation and design details, as well as example code for our framework

## Documentation
Since the project is still in early development and its API changes frequently, there is no separate, stable documentation. Up-to-date documentation is instead provided in the docstrings of the Python source files.