{"id":32691182,"url":"https://github.com/duoan/replicateai","last_synced_at":"2026-05-09T02:19:08.610Z","repository":{"id":320639212,"uuid":"1079002593","full_name":"duoan/ReplicateAI","owner":"duoan","description":"Recreating every milestone in Machine Learning and Artificial Intelligence","archived":false,"fork":false,"pushed_at":"2025-10-28T04:52:50.000Z","size":10435,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-28T06:21:01.031Z","etag":null,"topics":["ai","ai-history","bert","deep-learning","foundation-models","llama","llava","llm","machine-learning","ml","qwen","reproduce","reproducibility","tokenizers","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/duoan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-18T21:56:22.000Z","updated_at":"2025-10-28T04:52:53.000Z","dependencies_parsed_at":"2025-10-28T06:21:01.981Z","dependency_job_id":null,"html_url":"https://github.com/duoan/ReplicateAI","commit_stats":null,"previous_names":["duoan/reproduceai","duoan/replicateai"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/duoan/ReplicateAI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duoan%2FReplicateAI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duoan%2FReplicateAI/tags","releases_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duoan%2FReplicateAI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duoan%2FReplicateAI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/duoan","download_url":"https://codeload.github.com/duoan/ReplicateAI/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/duoan%2FReplicateAI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":282158195,"owners_count":26623960,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-01T02:00:06.759Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-history","bert","deep-learning","foundation-models","llama","llava","llm","machine-learning","ml","qwen","reproduce","reproducibility","tokenizers","transformer"],"created_at":"2025-11-01T15:00:54.150Z","updated_at":"2025-11-01T15:02:22.475Z","avatar_url":"https://github.com/duoan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧠 ReplicateAI\n\n\u003e **Recreating every milestone in Machine Learning and Artificial Intelligence — from Transformers to Perceptrons.**\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![Contributions 
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)]()\n[![Status](https://img.shields.io/badge/Project-Active-blue.svg)]()\n\n---\n\n## 🚀 Overview\n\n**ReplicateAI** is an open initiative to **rebuild and verify every major paper in ML/AI history**,  \nstarting from modern **foundation models (2023–2025)** and tracing backward to the origins of AI.\n\nWe believe that **understanding AI means rebuilding it — line by line, layer by layer.**\n\n![quote](quote.png)\n\n---\n\n## 🧩 Project Vision\n\n\u003e “Because science means reproducibility.”\n\n- 📜 **Goal**: Faithfully re-implement influential ML/AI papers with open code, datasets, and experiments\n- 🧱 **Scope**: From *Qwen2.5 (2025)* to *Perceptron (1958)*\n- 🧠 **Approach**: Reverse timeline — start with Foundation Models, then trace history backward\n- 🧾 **Output**: Each paper becomes a self-contained, reproducible module with reports and experiments\n\n## 🪐 Stage 1 — Foundation \u0026 Multimodal Era (2023–2025)\n\n\u003e *The golden age of open-source foundation models.*\n\n| Year     | Paper / Model          | Organization     | Why It Matters                             | Replicate Goal                               | Status     |\n|----------|------------------------|------------------|--------------------------------------------|----------------------------------------------|------------|\n| **2025** | **Qwen2.5**            | Alibaba          | Fully open multimodal model (text + image) | Rebuild text/image pipeline                  | 🧭 Planned |\n| **2025** | **DeepSeek-V2**        | DeepSeek         | MoE + RLHF efficiency breakthrough         | Replicate expert routing and reward pipeline | 🧭 Planned |\n| **2025** | **Claude 3 Family**    | Anthropic        | Leading alignment via Constitutional AI    | Explore rule-based alignment principles      | 🧭 Planned |\n| **2024** | **LLaMA 3**            | Meta             | Open foundation model standard             | Implement scaled 
transformer + tokenizer     | 🧭 Planned |\n| **2024** | **Mixtral 8×7B**       | Mistral          | Sparse Mixture-of-Experts architecture     | Implement routing + expert parallelism       | 🧭 Planned |\n| **2024** | **Phi-2 / Phi-3**      | Microsoft        | Small but high-quality model; data-centric | Rebuild synthetic data pipeline              | 🧭 Planned |\n| **2024** | **Gemini 1 / 1.5**     | Google DeepMind  | Vision + Text + Reasoning                  | Prototype multimodal reasoning pipeline      | 🧭 Planned |\n| **2023** | **Qwen-VL**            | Alibaba          | Vision-language alignment model            | Replicate visual encoder + text fusion       | 🧭 Planned |\n| **2023** | **BLIP-2 / MiniGPT-4** | Salesforce / HKU | Lightweight multimodal bridging            | Implement pretrain connector                 | 🧭 Planned |\n| **2023** | **LLaMA 1 / 2**        | Meta             | Open LLM baseline                          | Implement tokenizer + attention stack        | 🧭 Planned |\n\n---\n\n## 🔍 Stage 2 — Representation \u0026 Sequence Models (2013–2021)\n\n| Year | Paper                                                              | Author             | Goal                                                          | Status         |\n|------|--------------------------------------------------------------------|--------------------|---------------------------------------------------------------|----------------|\n| 2021 | [CLIP](./stage2_representation/2021_CLIP)                          | Radford, et al.    | Align Vision and NLP in same space using contrastive learning | 🔬 Replicating |\n| 2020 | [ViT](./stage2_representation/2020_VisionTransformer)              | Dosovitskiy et al. | Vision Transformer                                            | ✅ Done         |\n| 2018 | BERT                                                               | Devlin et al.      
| Masked Language Modeling                                      | 🔬 Replicating |\n| 2017 | [Transformer](./stage2_representation/2017_AttentionIsAllYouNeed/) | Vaswani et al.     | “Attention Is All You Need”                                   | ✅ Done         |\n| 2015 | Bahdanau Attention                                                 | Bahdanau et al.    | RNN + Attention                                               | 🧭 Planned     |\n| 2014 | Seq2Seq                                                            | Sutskever et al.   | Encoder-decoder translation                                   | 🧭 Planned     |\n| 2013 | Word2Vec                                                           | Mikolov et al.     | Learn word embeddings                                         | 🧭 Planned     |\n\n---\n\n## 🧩 Stage 3 — Deep Learning Renaissance (2006–2014)\n\n| Year | Paper     | Author            | Goal                   | Status     |\n|------|-----------|-------------------|------------------------|------------|\n| 2015 | ResNet    | He et al.         | Residual learning      | 🧭 Planned |\n| 2014 | VGG       | Simonyan et al.   | Deep CNN architectures | 🧭 Planned |\n| 2012 | AlexNet   | Krizhevsky et al. | GPU-based CNN          | 🧭 Planned |\n| 2006 | DBN / RBM | Hinton            | Layer-wise pretraining | 🧭 Planned |\n\n---\n\n## 📊 Stage 4 — Statistical Learning Era (1990s–2000s)\n\n| Year | Paper          | Author            | Goal                      | Status     |\n|------|----------------|-------------------|---------------------------|------------|\n| 2001 | Random Forests | Breiman           | Ensemble learning         | 🧭 Planned |\n| 1997 | AdaBoost       | Freund \u0026 Schapire | Boosting algorithms       | 🧭 Planned |\n| 1995 | SVM            | Vapnik            | Maximum margin classifier | 🧭 Planned |\n| 1977 | EM Algorithm   | Dempster et al.   
| Expectation-Maximization  | 🧭 Planned |\n\n---\n\n## 🧬 Stage 5 — Early Neural Foundations (1950s–1980s)\n\n| Year | Paper             | Author           | Goal                        | Status     |\n|------|-------------------|------------------|-----------------------------|------------|\n| 1986 | Backpropagation   | Rumelhart et al. | Gradient-based learning     | 🧭 Planned |\n| 1985 | Boltzmann Machine | Hinton et al.    | Generative stochastic model | 🧭 Planned |\n| 1982 | Hopfield Network  | Hopfield         | Associative memory          | 🧭 Planned |\n| 1958 | Perceptron        | Rosenblatt       | Linear separability         | 🧭 Planned |\n\n---\n\n## Lifecycle\n\n```\n🧭 Planned\n   ↓\n🔬 In Reproduction\n   ↓\n🧪 Under Evaluation\n   ↓\n📈 Verified\n   ↓\n🧾 Documented\n   ↓\n🧰 Extended (optional)\n```\n\n## 📁 Repository Structure\n\n```\n\nReplicateAI/\n├── stage1_foundation/\n│   ├── 2025_Qwen2.5/\n│   ├── 2024_LLaMA3/\n│   └── 2023_CLIP/\n├── stage2_representation/\n│   ├── 2018_BERT/\n│   ├── 2017_Transformer/\n│   └── 2013_Word2Vec/\n├── stage3_deep_renaissance/\n│   ├── 2015_ResNet/\n│   ├── 2012_AlexNet/\n│   └── 2006_DBN/\n├── stage4_statistical/\n│   ├── 2001_RandomForest/\n│   └── 1995_SVM/\n└── stage5_foundations/\n    ├── 1986_Backprop/\n    └── 1958_Perceptron/\n\n```\n\nEach paper module includes:\n\n```\n\n📄 README.md   — Paper summary \u0026 objective\n📘 report.md   — Reproduction results \u0026 analysis\n📓 notebook/   — Interactive demo\n💻 src/        — Core implementation\n🔗 references.bib — Original citation\n\n```\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions from researchers, engineers, and students who believe in reproducibility.\n\n1. Fork the repo\n2. Pick a paper or model not yet implemented\n3. Follow the [Paper Template](paper_template/README.md)\n4. 
Submit a PR with your code and report\n\n✅ **Please include**:\n\n- clear code (PyTorch / JAX / NumPy)\n- short experiment or visualization\n- reproducibility notes or deviations\n\n---\n\n## 🧮 Progress Overview\n\n| Stage                           | Era                       | Progress          |\n|---------------------------------|---------------------------|-------------------|\n| 🪐 Foundation (2023–2025)       | Modern LLM \u0026 Multimodal   | ░░░░░░░░░░░░░░ 0% |\n| 🔍 Representation (2013–2021)   | Transformers \u0026 Embeddings | ░░░░░░░░░░░░░░ 0% |\n| 🧩 Deep Renaissance (2006–2014) | CNN Era                   | ░░░░░░░░░░░░░░ 0% |\n| 📊 Statistical (1990s–2000s)    | Classical ML              | ░░░░░░░░░░░░░░ 0% |\n| 🧬 Foundations (1950s–1980s)    | Neural Origins            | ░░░░░░░░░░░░░░ 0% |\n\n---\n\n## 📚 Citation\n\nIf you use or reference this project, please cite:\n\n```bibtex\n@misc{replicateai2025,\n  author = {ReplicateAI Contributors},\n  title = {ReplicateAI: Rebuilding the History of Machine Learning and Artificial Intelligence},\n  year = {2025},\n  url = {https://github.com/duoan/ReplicateAI}\n}\n```\n\n---\n\n## 💬 Motto\n\n\u003e “Replicate. Verify. Understand.”\n\n---\n\n⭐️ **Star this repo if you believe reproducibility is the foundation of true intelligence.**\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduoan%2Freplicateai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduoan%2Freplicateai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduoan%2Freplicateai/lists"}