An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by bigscience-workshop

A curated list of projects in awesome lists by bigscience-workshop.

https://github.com/bigscience-workshop/promptsource

Toolkit for creating, sharing and using natural language prompts.

machine-learning natural-language-processing nlp

Last synced: 14 May 2025

https://github.com/bigscience-workshop/megatron-deepspeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Last synced: 15 May 2025

https://github.com/bigscience-workshop/Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Last synced: 27 Mar 2025

https://github.com/bigscience-workshop/bigscience

Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.

machine-learning models nlp training

Last synced: 16 May 2025

https://github.com/bigscience-workshop/biomedical

Tools for curating biomedical training data for large-scale language modeling

Last synced: 15 May 2025

https://github.com/bigscience-workshop/t-zero

Reproduce results and replicate training for T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

Last synced: 28 Oct 2025

https://github.com/bigscience-workshop/data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

dataset large-language-models multilingual

Last synced: 09 Apr 2025

https://github.com/bigscience-workshop/lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Last synced: 14 Jan 2026

https://github.com/bigscience-workshop/data_tooling

Tools for managing datasets for governance and training.

Last synced: 27 Jan 2026

https://github.com/bigscience-workshop/lam

Libraries, Archives and Museums (LAM)

Last synced: 26 Feb 2025

https://github.com/bigscience-workshop/multilingual-modeling

BLOOM+1: Adapting the BLOOM model to support a new, unseen language

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/evaluation

Code and Data for Evaluation WG

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/data_sourcing

This directory gathers the tools developed by the Data Sourcing Working Group

Last synced: 24 Oct 2025

https://github.com/bigscience-workshop/metadata

Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/carbon-footprint

A repository for `codecarbon` logs.

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/bloom-dechonk

A repo for running model shrinking experiments

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/catalogue_data

Scripts to prepare catalogue data

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/historical_texts

BigScience working group on language models for historical texts

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/pii_processing

PII Processing code to detect and remediate PII in BigScience datasets. Reference implementation for the PII Hackathon

Last synced: 26 Apr 2025

https://github.com/bigscience-workshop/bibliography

A list of BigScience publications

bibliography bibtex research

Last synced: 26 Feb 2025

https://github.com/bigscience-workshop/evaluation-robustness-consistency

Tools for evaluating model robustness and consistency

Last synced: 16 Jul 2025

https://github.com/bigscience-workshop/scaling-laws-tokenization

scaling-laws-tokenization

Last synced: 26 Feb 2025

https://github.com/bigscience-workshop/datasets_stats

Generate statistics over datasets used in the context of BigScience

Last synced: 23 Feb 2026

https://github.com/bigscience-workshop/shadesofbias

Evaluation for Shades of Bias in Text

Last synced: 26 Feb 2025