https://github.com/duoan/replicateai
Recreating every milestone in Machine Learning and Artificial Intelligence
- Host: GitHub
- URL: https://github.com/duoan/replicateai
- Owner: duoan
- Created: 2025-10-18T21:56:22.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2025-10-28T04:52:50.000Z (8 months ago)
- Last Synced: 2025-10-28T06:21:01.031Z (8 months ago)
- Topics: ai, ai-history, bert, deep-learning, foundation-models, llama, llava, llm, machine-learning, ml, qwen, reproduce, reproducibility, tokenizers, transformer
- Language: Python
- Homepage:
- Size: 9.95 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🧠 ReplicateAI
> **Recreating every milestone in Machine Learning and Artificial Intelligence, from Transformers to Perceptrons.**
---
## Overview
**ReplicateAI** is an open initiative to **rebuild and verify every major paper in ML/AI history**,
starting from modern **foundation models (2023–2025)** and tracing backward to the origins of AI.
We believe that **understanding AI means rebuilding it: line by line, layer by layer.**

---
## 🧩 Project Vision
> "Because science means reproducibility."

- **Goal**: Faithfully re-implement influential ML/AI papers with open code, datasets, and experiments
- **Scope**: From *Qwen2.5 (2025)* to *Perceptron (1958)*
- **Approach**: Reverse timeline: start with foundation models, then trace history backward
- **Output**: Each paper becomes a self-contained, reproducible module with reports and experiments

## Stage 1: Foundation & Multimodal Era (2023–2025)
> *The golden age of open-source foundation models.*
| Year | Paper / Model | Organization | Why It Matters | Replicate Goal | Status |
|----------|------------------------|------------------|--------------------------------------------|----------------------------------------------|------------|
| **2025** | **Qwen2.5** | Alibaba | Fully open multimodal model (text + image) | Rebuild text/image pipeline | 🧭 Planned |
| **2024** | **DeepSeek-V2** | DeepSeek | MoE + RLHF efficiency breakthrough | Replicate expert routing and reward pipeline | 🧭 Planned |
| **2024** | **Claude 3 Family** | Anthropic | Leading alignment via Constitutional AI | Explore rule-based alignment principles | 🧭 Planned |
| **2024** | **LLaMA 3** | Meta | Open foundation model standard | Implement scaled transformer + tokenizer | 🧭 Planned |
| **2024** | **Mixtral 8×7B** | Mistral | Sparse Mixture-of-Experts architecture | Implement routing + expert parallelism | 🧭 Planned |
| **2024** | **Phi-2 / Phi-3** | Microsoft | Small but high-quality model; data-centric | Rebuild synthetic data pipeline | 🧭 Planned |
| **2024** | **Gemini 1 / 1.5** | Google DeepMind | Vision + Text + Reasoning | Prototype multimodal reasoning pipeline | 🧭 Planned |
| **2023** | **Qwen-VL** | Alibaba | Vision-language alignment model | Replicate visual encoder + text fusion | 🧭 Planned |
| **2023** | **BLIP-2 / MiniGPT-4** | Salesforce / KAUST | Lightweight multimodal bridging | Implement pretrain connector | 🧭 Planned |
| **2023** | **LLaMA 1 / 2** | Meta | Open LLM baseline | Implement tokenizer + attention stack | 🧭 Planned |
---
## Stage 2: Representation & Sequence Models (2013–2021)
| Year | Paper | Author | Goal | Status |
|------|--------------------------------------------------------------------|--------------------|---------------------------------------------------------------|----------------|
| 2021 | [CLIP](./stage2_representation/2021_CLIP) | Radford et al. | Align vision and language in a shared embedding space using contrastive learning | 🔬 Replicating |
| 2020 | [ViT](./stage2_representation/2020_VisionTransformer) | Dosovitskiy et al. | Vision Transformer | ✅ Done |
| 2018 | BERT | Devlin et al. | Masked Language Modeling | 🔬 Replicating |
| 2017 | [Transformer](./stage2_representation/2017_AttentionIsAllYouNeed/) | Vaswani et al. | "Attention Is All You Need" | ✅ Done |
| 2015 | Bahdanau Attention | Bahdanau et al. | RNN + Attention | 🧭 Planned |
| 2014 | Seq2Seq | Sutskever et al. | Encoder-decoder translation | 🧭 Planned |
| 2013 | Word2Vec | Mikolov et al. | Learn word embeddings | 🧭 Planned |
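The Transformer entry above is built around scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. The sketch below shows that core operation in NumPy. It is a minimal illustration written for this overview, not the code in `./stage2_representation/2017_AttentionIsAllYouNeed/`. The function name and the boolean-mask convention (`True` = may attend) are assumptions.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, as defined in Vaswani et al. (2017).

    q: (..., n_q, d_k), k: (..., n_k, d_k), v: (..., n_k, d_v).
    mask (assumed convention): boolean, broadcastable to (..., n_q, n_k);
    True means "may attend", False positions are suppressed.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))   # batch 2, 4 queries, d_k = 8
k = rng.standard_normal((2, 6, 8))   # 6 keys
v = rng.standard_normal((2, 6, 16))  # d_v = 16
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # (2, 4, 16) (2, 4, 6)
```

Passing a lower-triangular mask such as `np.tril(np.ones((n, n), dtype=bool))` yields the causal (decoder-style) variant.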
---
## Stage 3: Deep Learning Renaissance (2006–2014)
| Year | Paper | Author | Goal | Status |
|------|-----------|-------------------|------------------------|------------|
| 2015 | ResNet | He et al. | Residual learning | 🧭 Planned |
| 2014 | VGG | Simonyan et al. | Deep CNN architectures | 🧭 Planned |
| 2012 | AlexNet | Krizhevsky et al. | GPU-based CNN | 🧭 Planned |
| 2006 | DBN / RBM | Hinton | Layer-wise pretraining | 🧭 Planned |
---
## Stage 4: Statistical Learning Era (1990s–2000s)
| Year | Paper | Author | Goal | Status |
|------|----------------|-------------------|---------------------------|------------|
| 2001 | Random Forests | Breiman | Ensemble learning | 🧭 Planned |
| 1997 | AdaBoost | Freund & Schapire | Boosting algorithms | 🧭 Planned |
| 1995 | SVM | Cortes & Vapnik | Maximum margin classifier | 🧭 Planned |
| 1977 | EM Algorithm | Dempster et al. | Expectation-Maximization | 🧭 Planned |
---
## Stage 5: Early Neural Foundations (1950s–1980s)
| Year | Paper | Author | Goal | Status |
|------|-------------------|------------------|-----------------------------|------------|
| 1986 | Backpropagation | Rumelhart et al. | Gradient-based learning | 🧭 Planned |
| 1985 | Boltzmann Machine | Hinton et al. | Generative stochastic model | 🧭 Planned |
| 1982 | Hopfield Network | Hopfield | Associative memory | 🧭 Planned |
| 1958 | Perceptron | Rosenblatt | Linear separability | 🧭 Planned |
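The Perceptron row is the smallest module in the plan. The sketch below implements the perceptron learning rule in NumPy using the common ±1-label formulation, not Rosenblatt's original hardware setup. The function names and hyperparameters are illustrative assumptions. On linearly separable data the rule stops after finitely many updates (the perceptron convergence theorem). That is why logical AND trains to zero errors, while XOR, which is not linearly separable, never does.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Perceptron learning rule (hypothetical helper, ±1 labels).

    X: (n_samples, n_features); y: labels in {-1, +1}.
    Returns weights w and bias b for the classifier sign(X @ w + b).
    Converges in finitely many updates if the data are linearly separable.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # Update only on a mistake (or a point lying on the boundary).
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # a full pass without mistakes: converged
            break
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b > 0, 1, -1)


# Logical AND is linearly separable, so training converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))
```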
---
## Lifecycle
```
🧭 Planned
   ↓
🔬 In Reproduction
   ↓
🧪 Under Evaluation
   ↓
Verified
   ↓
🧾 Documented
   ↓
🧰 Extended (optional)
```
## Repository Structure
```
ReplicateAI/
├── stage1_foundation/
│   ├── 2025_Qwen2.5/
│   ├── 2024_LLaMA3/
│   └── 2023_CLIP/
├── stage2_representation/
│   ├── 2018_BERT/
│   ├── 2017_Transformer/
│   └── 2013_Word2Vec/
├── stage3_deep_renaissance/
│   ├── 2015_ResNet/
│   ├── 2012_AlexNet/
│   └── 2006_DBN/
├── stage4_statistical/
│   ├── 2001_RandomForest/
│   └── 1995_SVM/
└── stage5_foundations/
    ├── 1986_Backprop/
    └── 1958_Perceptron/
```
Each paper module includes:
```
README.md        Paper summary & objective
report.md        Reproduction results & analysis
notebook/        Interactive demo
src/             Core implementation
references.bib   Original citation
```
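To keep new modules consistent with the layout above, a contributor could generate the skeleton with a small script. The helper below is hypothetical and not part of the repository. The function name, the `<year>_<Name>` folder convention (taken from the repository tree) and the placeholder file contents are assumptions. Existing files are never overwritten, so it is safe to re-run.

```python
from pathlib import Path

# Placeholder contents for the files in the per-module layout (assumed).
MODULE_FILES = {
    "README.md": "# {title}\n\nPaper summary & objective.\n",
    "report.md": "# {title}: Reproduction Report\n\nResults & analysis.\n",
    "references.bib": "% Original citation for {title}\n",
}
MODULE_DIRS = ("notebook", "src")


def scaffold_paper_module(stage_dir, year, name, title=None):
    """Create <stage_dir>/<year>_<name>/ with the standard module layout."""
    module = Path(stage_dir) / f"{year}_{name}"
    for d in MODULE_DIRS:
        (module / d).mkdir(parents=True, exist_ok=True)
    for filename, template in MODULE_FILES.items():
        path = module / filename
        if not path.exists():  # leave contributors' edits untouched
            path.write_text(template.format(title=title or name), encoding="utf-8")
    return module


# Example: scaffold_paper_module("stage5_foundations", 1958, "Perceptron")
```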
---
## 🤝 Contributing
We welcome contributions from researchers, engineers, and students who believe in reproducibility.
1. Fork the repo
2. Pick a paper or model not yet implemented
3. Follow the [Paper Template](paper_template/README.md)
4. Submit a PR with your code and report
✅ **Please include**:
- clear code (PyTorch / JAX / NumPy)
- short experiment or visualization
- reproducibility notes or deviations
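A common first step toward the "reproducibility notes" item is fixing random seeds. The helper below is a minimal sketch; `seed_everything` is a hypothetical name, not something the repo is known to provide. It seeds Python's `random`, NumPy's global RNG and, if installed, PyTorch. JAX needs no global seeding because it passes explicit `jax.random.PRNGKey` values. Seeding alone does not guarantee bit-identical results across GPUs or library versions.

```python
import os
import random

import numpy as np


def seed_everything(seed: int = 0) -> None:
    """Seed the common RNGs so reruns of an experiment are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
    except ImportError:  # PyTorch is optional here
        return
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)
```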
---
## 🧮 Progress Overview
| Stage | Era | Progress |
|---------------------------------|---------------------------|----------|
| Foundation (2023–2025) | Modern LLM & Multimodal | 0% |
| Representation (2013–2021) | Transformers & Embeddings | 0% |
| Deep Renaissance (2006–2014) | CNN Era | 0% |
| Statistical (1990s–2000s) | Classical ML | 0% |
| Foundations (1950s–1980s) | Neural Origins | 0% |
---
## Citation
If you use or reference this project, please cite:
```bibtex
@misc{replicateai2025,
author = {ReplicateAI Contributors},
title = {ReplicateAI: Rebuilding the History of Machine Learning and Artificial Intelligence},
year = {2025},
url = {https://github.com/duoan/ReplicateAI}
}
```
---
## Motto
> "Replicate. Verify. Understand."
---
⭐️ **Star this repo if you believe reproducibility is the foundation of true intelligence.**