{"id":25926914,"url":"https://github.com/clchinkc/story_crowdsource_preference","last_synced_at":"2026-04-11T09:02:04.999Z","repository":{"id":280259926,"uuid":"938454408","full_name":"clchinkc/story_crowdsource_preference","owner":"clchinkc","description":"Personal project, Generative AI, Python, Streamlit, Supabase, PyTorch","archived":false,"fork":false,"pushed_at":"2025-03-02T10:11:00.000Z","size":2867,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-10T20:58:50.748Z","etag":null,"topics":["generative-ai","python","pytorch","streamlit","supabase"],"latest_commit_sha":null,"homepage":"https://storycrowdsourcepreference.streamlit.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clchinkc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-25T01:32:41.000Z","updated_at":"2025-03-02T10:32:37.000Z","dependencies_parsed_at":"2025-03-02T11:32:03.536Z","dependency_job_id":null,"html_url":"https://github.com/clchinkc/story_crowdsource_preference","commit_stats":null,"previous_names":["clchinkc/story_crowdsource_preference"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/clchinkc/story_crowdsource_preference","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clchinkc%2Fstory_crowdsource_preference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clchinkc%2Fstory_crowdsource_preference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/clchinkc%2Fstory_crowdsource_preference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clchinkc%2Fstory_crowdsource_preference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clchinkc","download_url":"https://codeload.github.com/clchinkc/story_crowdsource_preference/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clchinkc%2Fstory_crowdsource_preference/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31674624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-11T08:18:19.405Z","status":"ssl_error","status_checked_at":"2026-04-11T08:17:08.892Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generative-ai","python","pytorch","streamlit","supabase"],"created_at":"2025-03-03T20:04:05.861Z","updated_at":"2026-04-11T09:02:04.982Z","avatar_url":"https://github.com/clchinkc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Story Crowdsource Preference System\n\n🌟 **We're creating an open source dataset for story preferences! 
Star this repository to be notified when we release it.** 🌟\n\n[![X URL](https://img.shields.io/twitter/url/https/x.com/doc_editor_saas.svg?style=social\u0026label=Follow%20%40doc_editor_saas)](https://x.com/doc_editor_saas)\n[![Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://storycrowdsourcepreference.streamlit.app)\n\n\nA comprehensive system for collecting, analyzing, and learning from story preferences using AI models and human feedback. This project combines multiple AI models with human feedback to improve story generation and preference learning.\n\n## Overview\n\nThe system consists of several key components:\n\n1. **Story Generation**: Uses multiple AI models (GPT-4, Gemini, etc.) to generate story variations from prompts\n2. **Feedback Collection**: Web interface for collecting human preferences between story variations\n3. **Embedding Processing**: Generates and stores embeddings for story variations\n4. **Reward Model Training**: Trains a reward model based on collected preferences\n5. 
**Dataset Generation**: Creates preference datasets suitable for Direct Preference Optimization (DPO)\n\n## Key Highlights\n\n- **Open Source Dataset**: We will open source the dataset here in this repository, making it a valuable resource for both the tech and writer communities.\n- **Live Demo**: Try our [Story Preference Collection App](https://storycrowdsourcepreference.streamlit.app) to contribute your preferences\n- **Streamlit + Supabase**: The system is built using Streamlit for the web interface and Supabase for data management.\n- **Benchmarking Tool**: This dataset will be instrumental in benchmarking Large Language Models (LLMs) for creative writing.\n\n## Features\n\n- Multi-model story generation with configurable provider selection\n- Interactive web interface for story comparison and feedback collection\n- Automated embedding generation for story variations\n- Customizable reward model training with source weighting\n- Export capabilities for DPO-compatible datasets\n- Supabase integration for data storage and management\n\n## Participate and Contribute\n\n- **⭐ Star the Repository**: Your first step! Star this repository to show your support and be notified when we release the dataset.\n- **Input Your Preferences**: Visit our [Story Preference Collection App](https://storycrowdsourcepreference.streamlit.app) to input your story preferences and contribute to the dataset.\n\n## Prerequisites\n\n- Python 3.8+\n- PyTorch\n- CUDA-compatible GPU (optional, but recommended for training)\n- Supabase account and project\n- API keys for supported AI models (GPT-4, Gemini, etc.)\n\n## Installation\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/clchinkc/story_crowdsource_preference.git\ncd story_crowdsource_preference\n```\n\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n3. 
Set up environment variables:\nCreate a `.env` file with the following variables:\n```\nSUPABASE_URL=your_supabase_url\nSUPABASE_KEY=your_supabase_key\nGITHUB_TOKEN=your_github_token\nGEMINI_API_KEY=your_gemini_api_key\nAZURE_UST_SECONDARY_KEY=your_azure_key\n```\n\n## Components\n\n### Story Dataset Generator (`story_dataset_generator.py`)\n- Generates story prompts and variations using multiple AI models\n- Handles story comparison and evaluation\n- Integrates with Supabase for data storage\n\n### Feedback Collection App (`story_feedback_app.py`)\n- Streamlit-based web interface\n- Allows users to compare and rate story variations\n- Exports collected preferences in DPO format\n\n### Embedding Processor (`story_embedding_processor.py`)\n- Processes story variations to generate embeddings\n- Uses ModernBERT for embedding generation\n- Manages embedding storage in Supabase\n\n### Reward Model Training (`story_ranking_dataset.py`)\n- Implements preference learning from collected feedback\n- Supports weighted training based on feedback source\n- Includes model evaluation and testing capabilities\n\n## Usage\n\n1. Start the feedback collection app:\n```bash\nstreamlit run story_feedback_app.py\n```\n\n2. Generate story datasets:\n```bash\npython story_dataset_generator.py\n```\n\n3. Process embeddings:\n```bash\npython story_embedding_processor.py\n```\n\n4. 
Train the reward model:\n```bash\npython story_ranking_dataset.py\n```\n\n## Configuration\n\n### Provider Configuration\nThe system supports multiple AI providers with configurable weights and models:\n\n```python\nPROVIDERS_CONFIG = {\n    \"prompt\": {\n        \"provider\": \"github\",\n        \"model\": \"openai/gpt-4o-mini\"\n    },\n    \"variations\": [\n        {\n            \"provider\": \"gemini\",\n            \"model\": \"gemini/gemini-2.0-flash-exp\"\n        },\n        {\n            \"provider\": \"azure\",\n            \"model\": \"azure/gpt-4o-mini\"\n        }\n    ]\n}\n```\n\n### Training Configuration\nCustomize reward model training parameters:\n\n```python\nconfig = {\n    'batch_size': 4,\n    'num_epochs': 5,\n    'learning_rate': 3e-5,\n    'test_size': 0.2,\n    'source_weights': {\n        'model': 1.0,   # Full weight to reward model feedback\n        'llm': 0.5,     # Half weight to LLM feedback\n        'human': 2.0    # Double weight to human feedback\n    }\n}\n```\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nThis project is licensed under the MIT License. See the LICENSE file for details.\n\n## Acknowledgments\n\n- ModernBERT by AnswerDotAI\n- Streamlit for the web interface\n- Supabase for database services\n \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclchinkc%2Fstory_crowdsource_preference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclchinkc%2Fstory_crowdsource_preference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclchinkc%2Fstory_crowdsource_preference/lists"}