https://github.com/duoan/replicateai

Recreating every milestone in Machine Learning and Artificial Intelligence
ai ai-history bert deep-learning foundation-models llama llava llm machine-learning ml qwen reproduce reproducibility tokenizers transformer


# 🧠 ReplicateAI

> **Recreating every milestone in Machine Learning and Artificial Intelligence, from Transformers to Perceptrons.**

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)]()
[![Status](https://img.shields.io/badge/Project-Active-blue.svg)]()

---

## 🚀 Overview

**ReplicateAI** is an open initiative to **rebuild and verify every major paper in ML/AI history**,
starting from modern **foundation models (2023–2025)** and tracing backward to the origins of AI.

We believe that **understanding AI means rebuilding it, line by line, layer by layer.**

![quote](quote.png)

---

## 🧩 Project Vision

> “Because science means reproducibility.”

- 📜 **Goal**: Faithfully re-implement influential ML/AI papers with open code, datasets, and experiments
- 🧱 **Scope**: From *Qwen2.5 (2025)* to *Perceptron (1958)*
- 🧠 **Approach**: Reverse timeline: start with Foundation Models, then trace history backward
- 🧾 **Output**: Each paper becomes a self-contained, reproducible module with reports and experiments

## 🪐 Stage 1: Foundation & Multimodal Era (2023–2025)

> *The golden age of open-source foundation models.*

| Year | Paper / Model | Organization | Why It Matters | Replicate Goal | Status |
|----------|------------------------|------------------|--------------------------------------------|----------------------------------------------|------------|
| **2025** | **Qwen2.5** | Alibaba | Fully open multimodal model (text + image) | Rebuild text/image pipeline | 🧭 Planned |
| **2024** | **DeepSeek-V2** | DeepSeek | MoE + RLHF efficiency breakthrough | Replicate expert routing and reward pipeline | 🧭 Planned |
| **2024** | **Claude 3 Family** | Anthropic | Leading alignment via Constitutional AI | Explore rule-based alignment principles | 🧭 Planned |
| **2024** | **LLaMA 3** | Meta | Open foundation model standard | Implement scaled transformer + tokenizer | 🧭 Planned |
| **2024** | **Mixtral 8×7B** | Mistral AI | Sparse Mixture-of-Experts architecture | Implement routing + expert parallelism | 🧭 Planned |
| **2024** | **Phi-2 / Phi-3** | Microsoft | Small but high-quality model; data-centric | Rebuild synthetic data pipeline | 🧭 Planned |
| **2024** | **Gemini 1 / 1.5** | Google DeepMind | Vision + Text + Reasoning | Prototype multimodal reasoning pipeline | 🧭 Planned |
| **2023** | **Qwen-VL** | Alibaba | Vision-language alignment model | Replicate visual encoder + text fusion | 🧭 Planned |
| **2023** | **BLIP-2 / MiniGPT-4** | Salesforce / KAUST | Lightweight multimodal bridging | Implement pretrain connector | 🧭 Planned |
| **2023** | **LLaMA 1 / 2** | Meta | Open LLM baseline | Implement tokenizer + attention stack | 🧭 Planned |

---

## 🔁 Stage 2: Representation & Sequence Models (2013–2021)

| Year | Paper | Author | Goal | Status |
|------|--------------------------------------------------------------------|--------------------|---------------------------------------------------------------|----------------|
| 2021 | [CLIP](./stage2_representation/2021_CLIP) | Radford et al. | Align images and text in a shared embedding space using contrastive learning | 🔬 Replicating |
| 2020 | [ViT](./stage2_representation/2020_VisionTransformer) | Dosovitskiy et al. | Vision Transformer | ✅ Done |
| 2018 | BERT | Devlin et al. | Masked Language Modeling | 🔬 Replicating |
| 2017 | [Transformer](./stage2_representation/2017_AttentionIsAllYouNeed/) | Vaswani et al. | “Attention Is All You Need” | ✅ Done |
| 2015 | Bahdanau Attention | Bahdanau et al. | RNN + Attention | 🧭 Planned |
| 2014 | Seq2Seq | Sutskever et al. | Encoder-decoder translation | 🧭 Planned |
| 2013 | Word2Vec | Mikolov et al. | Learn word embeddings | 🧭 Planned |

---

## 🧩 Stage 3: Deep Learning Renaissance (2006–2015)

| Year | Paper | Author | Goal | Status |
|------|-----------|-------------------|------------------------|------------|
| 2015 | ResNet | He et al. | Residual learning | 🧭 Planned |
| 2014 | VGG | Simonyan & Zisserman | Deep CNN architectures | 🧭 Planned |
| 2012 | AlexNet | Krizhevsky et al. | GPU-based CNN | 🧭 Planned |
| 2006 | DBN / RBM | Hinton et al. | Layer-wise pretraining | 🧭 Planned |

---

## 📊 Stage 4: Statistical Learning Era (1990s–2000s)

| Year | Paper | Author | Goal | Status |
|------|----------------|-------------------|---------------------------|------------|
| 2001 | Random Forests | Breiman | Ensemble learning | 🧭 Planned |
| 1997 | AdaBoost | Freund & Schapire | Boosting algorithms | 🧭 Planned |
| 1995 | SVM | Cortes & Vapnik | Maximum margin classifier | 🧭 Planned |
| 1977 | EM Algorithm | Dempster et al. | Expectation-Maximization | 🧭 Planned |

---

## 🧬 Stage 5: Early Neural Foundations (1950s–1980s)

| Year | Paper | Author | Goal | Status |
|------|-------------------|------------------|-----------------------------|------------|
| 1986 | Backpropagation | Rumelhart et al. | Gradient-based learning | 🧭 Planned |
| 1985 | Boltzmann Machine | Ackley et al. | Generative stochastic model | 🧭 Planned |
| 1982 | Hopfield Network | Hopfield | Associative memory | 🧭 Planned |
| 1958 | Perceptron | Rosenblatt | Linear separability | 🧭 Planned |

---

## Lifecycle

```
🧭 Planned
↓
🔬 In Reproduction
↓
🧪 Under Evaluation
↓
📈 Verified
↓
🧾 Documented
↓
🧰 Extended (optional)
```
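
The lifecycle above is an ordered sequence, so it can be modelled as an ordered enumeration. The following is a minimal sketch only; the names `Stage`, `next_stage`, and `is_complete` are hypothetical and not part of the repository, and treating "Documented" as the completion point is an assumption based on "Extended" being marked optional.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Lifecycle stages in the order listed above (hypothetical names)."""
    PLANNED = 0
    IN_REPRODUCTION = 1
    UNDER_EVALUATION = 2
    VERIFIED = 3
    DOCUMENTED = 4
    EXTENDED = 5  # optional final stage


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the last one."""
    if stage is Stage.EXTENDED:
        return None
    return Stage(stage + 1)


def is_complete(stage: Stage) -> bool:
    """Assumption: a paper counts as complete once it is documented."""
    return stage >= Stage.DOCUMENTED
```

Using an `IntEnum` keeps the stages comparable, so checks such as "at least Verified" reduce to a single comparison.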

## 📁 Repository Structure

```
ReplicateAI/
├── stage1_foundation/
│   ├── 2025_Qwen2.5/
│   └── 2024_LLaMA3/
├── stage2_representation/
│   ├── 2021_CLIP/
│   ├── 2020_VisionTransformer/
│   ├── 2018_BERT/
│   ├── 2017_AttentionIsAllYouNeed/
│   └── 2013_Word2Vec/
├── stage3_deep_renaissance/
│   ├── 2015_ResNet/
│   ├── 2012_AlexNet/
│   └── 2006_DBN/
├── stage4_statistical/
│   ├── 2001_RandomForest/
│   └── 1995_SVM/
└── stage5_foundations/
    ├── 1986_Backprop/
    └── 1958_Perceptron/
```

Each paper module includes:

```
📄 README.md       - Paper summary & objective
📘 report.md       - Reproduction results & analysis
📓 notebook/       - Interactive demo
💻 src/            - Core implementation
🔗 references.bib  - Original citation
```
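
A contributor could check a module against this layout before opening a PR. The sketch below is illustrative and not a script that ships with the repository; `REQUIRED_ENTRIES` and `missing_entries` are hypothetical names, and the entry list is copied from the module layout above.

```python
from pathlib import Path

# Entries each paper module is expected to contain, per the layout above.
# Names ending in "/" are directories; the rest are regular files.
REQUIRED_ENTRIES = ("README.md", "report.md", "notebook/", "src/", "references.bib")


def missing_entries(module_dir):
    """Return the required entries that are absent from `module_dir`."""
    root = Path(module_dir)
    missing = []
    for entry in REQUIRED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories and files are checked differently so that a file
        # named "src" does not satisfy the "src/" requirement.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

For example, `missing_entries("stage2_representation/2017_AttentionIsAllYouNeed")` returns an empty list when the module follows the layout.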

---

## 🤝 Contributing

We welcome contributions from researchers, engineers, and students who believe in reproducibility.

1. Fork the repo
2. Pick a paper or model not yet implemented
3. Follow the [Paper Template](paper_template/README.md)
4. Submit a PR with your code and report

✅ **Please include**:

- clear code (PyTorch / JAX / NumPy)
- short experiment or visualization
- reproducibility notes or deviations

---

## 🧮 Progress Overview

| Stage | Era | Progress |
|---------------------------------|---------------------------|-------------------|
| 🪐 Foundation (2023–2025) | Modern LLM & Multimodal | ░░░░░░░░░░░░░░ 0% |
| 🔁 Representation (2013–2021) | Transformers & Embeddings | ░░░░░░░░░░░░░░ 0% |
| 🧩 Deep Renaissance (2006–2015) | CNN Era | ░░░░░░░░░░░░░░ 0% |
| 📊 Statistical (1990s–2000s) | Classical ML | ░░░░░░░░░░░░░░ 0% |
| 🧬 Foundations (1950s–1980s) | Neural Origins | ░░░░░░░░░░░░░░ 0% |
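
A bar in the style of the Progress column could be rendered with a small helper. This is a hedged sketch: the README does not say how progress is measured, so the function assumes it is the share of a stage's papers that are done. The name `progress_bar` and the filled character `█` are also assumptions; the 14-cell width matches the table.

```python
def progress_bar(done, total, width=14):
    """Render a text bar of `width` cells followed by a rounded percentage.

    Assumes progress = done / total, which is not stated in the README.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= done <= total:
        raise ValueError("done must be between 0 and total")
    fraction = done / total
    filled = round(fraction * width)
    return "█" * filled + "░" * (width - filled) + f" {round(fraction * 100)}%"
```

Calling `progress_bar(0, 10)` yields fourteen empty cells followed by `0%`, the same shape as the rows in the table.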

---

## 📚 Citation

If you use or reference this project, please cite:

```bibtex
@misc{replicateai2025,
  author = {ReplicateAI Contributors},
  title  = {ReplicateAI: Rebuilding the History of Machine Learning and Artificial Intelligence},
  year   = {2025},
  url    = {https://github.com/duoan/ReplicateAI}
}
```

---

## 💬 Motto

> “Replicate. Verify. Understand.”

---

⭐️ **Star this repo if you believe reproducibility is the foundation of true intelligence.**