{"id":27298024,"url":"https://github.com/deep-learning-101/computer-vision-paper","last_synced_at":"2026-01-22T18:42:46.940Z","repository":{"id":178314563,"uuid":"492289307","full_name":"Deep-Learning-101/Computer-Vision-Paper","owner":"Deep-Learning-101","description":"https://deep-learning-101.github.io//Computer-Vision Computer vision (電腦視覺)","archived":false,"fork":false,"pushed_at":"2026-01-11T05:38:08.000Z","size":6662,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-11T06:17:04.291Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://www.twman.org/AI/CV","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Deep-Learning-101.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-05-14T18:00:47.000Z","updated_at":"2026-01-11T05:38:11.000Z","dependencies_parsed_at":"2025-02-28T04:38:47.237Z","dependency_job_id":"415a6263-d51c-40a1-ab69-8765b2bdc58d","html_url":"https://github.com/Deep-Learning-101/Computer-Vision-Paper","commit_stats":null,"previous_names":["deep-learning-101/computer-vision-paper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Deep-Learning-101/Computer-Vision-Paper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Learning-101%2FComputer-Vision-Paper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Learning-101%2F
Computer-Vision-Paper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Learning-101%2FComputer-Vision-Paper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Learning-101%2FComputer-Vision-Paper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Deep-Learning-101","download_url":"https://codeload.github.com/Deep-Learning-101/Computer-Vision-Paper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Deep-Learning-101%2FComputer-Vision-Paper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28668262,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T17:07:18.858Z","status":"ssl_error","status_checked_at":"2026-01-22T17:05:02.040Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-12T00:25:27.256Z","updated_at":"2026-01-22T18:42:46.934Z","avatar_url":"https://github.com/Deep-Learning-101.png","language":null,"funding_links":["https://www.buymeacoffee.com/DeepLearning101"],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eDeep Learning 101, Taiwan’s pioneering and highest deep learning meetup, launched on 2016/11/11 @ 83F, Taipei 101\u003c/strong\u003e 
 \n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  AI is a lonely journey full of anxiety and the unknown; flashy paid courses and events are never a shortcut to success.\u003cbr\u003e\n  Sincere thanks to the AI enthusiasts from many different organizations who shared their valuable experience under their real names at the time; if you would like any information removed, please let us know.\u003cbr\u003e\n  Initiated by \u003ca href=\"https://www.twman.org/\" target=\"_blank\"\u003eTonTon Huang Ph.D.\u003c/a\u003e, with the venue, drinks, and snacks sponsored free of charge by his employer at the time (台灣雪豹科技).\u003cbr\u003e\n\u003c/p\u003e  \n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://huggingface.co/spaces/DeepLearning101/Deep-Learning-101-FAQ\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://github.com/Deep-Learning-101/.github/blob/main/images/DeepLearning101.JPG?raw=true\" alt=\"Deep Learning 101\" width=\"180\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://www.buymeacoffee.com/DeepLearning101\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/v2/default-red.png\" alt=\"Buy Me A Coffee\" style=\"height: 100px !important;width: 180px !important;\" \u003e\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://www.youtube.com/@DeepLearning101\" target=\"_blank\"\u003eSubscribe on YouTube\u003c/a\u003e |\n  \u003ca href=\"https://www.facebook.com/groups/525579498272187/\" target=\"_blank\"\u003eFacebook\u003c/a\u003e |\n  \u003ca href=\"https://deep-learning-101.github.io/\"\u003eBack to GitHub Pages\u003c/a\u003e |\n  \u003ca href=\"https://github.com/Deep-Learning-101\" target=\"_blank\"\u003eStar us on GitHub\u003c/a\u003e |  \n  \u003ca href=\"https://www.twman.org/DeepLearning101\" target=\"_blank\"\u003eWebsite\u003c/a\u003e |\n  \u003ca href=\"https://huggingface.co/DeepLearning101\" target=\"_blank\"\u003eLike us on Hugging Face\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n\u003cdiv align=\"center\"\u003e\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://deep-learning-101.github.io/Large-Language-Model\"\u003e大語言模型\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://deep-learning-101.github.io/Speech-Processing\"\u003e語音處理\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://deep-learning-101.github.io/Natural-Language-Processing\"\u003e自然語言處理\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://deep-learning-101.github.io//Computer-Vision\"\u003e電腦視覺\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003ca href=\"https://github.com/Deep-Learning-101/Natural-Language-Processing-Paper?tab=readme-ov-file#llm\"\u003eLarge Language Model\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://github.com/Deep-Learning-101/Speech-Processing-Paper\"\u003eSpeech Processing\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://github.com/Deep-Learning-101/Natural-Language-Processing-Paper\"\u003eNatural Language Processing, NLP\u003c/a\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003ca href=\"https://github.com/Deep-Learning-101/Computer-Vision-Paper\"\u003eComputer Vision\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003c/div\u003e\n\n---\n\n\u003cdetails\u003e\n\u003csummary\u003eA Hands-On Tour of AI Pitfalls\u003c/summary\u003e\n\n\u003ch3\u003e\u003ca href=\"https://blog.twman.org/p/deeplearning101.html\" target=\"_blank\"\u003eA Hands-On Tour of AI Pitfalls\u003c/a\u003e: \u003ca href=\"https://www.twman.org/AI\" target=\"_blank\"\u003ehttps://www.twman.org/AI\u003c/a\u003e\u003c/h3\u003e\n\n\u003cul\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2025/03/AIAgent.html\" target=\"_blank\"\u003eAvoiding AI Agent Development Pitfalls: Common Problems, Challenges, and Solutions\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/agent\" target=\"_blank\"\u003eDiscusses experience and challenges in applying a range of AI agent tools, with practical lessons and tool recommendations.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/08/LLM.html\" target=\"_blank\"\u003eA Plain-Language, Hands-On Introduction to 
GenAI\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/GenAI\" target=\"_blank\"\u003eAn accessible introduction to the core concepts of generative AI, emphasizing the importance of hardware resources and data.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/09/LLM.html\" target=\"_blank\"\u003eDo Large Language Models Just Wrap It All Up?\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/1010LLM\" target=\"_blank\"\u003eLooks back on the journey of exploring the LLM field and discusses why hardware upgrades matter for AI development.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/07/RAG.html\" target=\"_blank\"\u003eRetrieval-Augmented Generation (RAG) Is Not a Silver Bullet: Optimization Challenges and Techniques\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/RAG\" target=\"_blank\"\u003eDiscusses RAG applications and challenges, with practical lessons and tool suggestions.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/02/LLM.html\" target=\"_blank\"\u003eA Complete Beginner's Guide to Large Language Models (LLMs): Principles, Applications, and Future\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/0204LLM\" target=\"_blank\"\u003eDiscusses applications and challenges of various LLM tools, emphasizing the importance of hardware resources.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2023/04/GPT.html\" target=\"_blank\"\u003eWhat Is a Large Language Model, and Do You Want One? (Large Language Model, LLM)\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/GPU\" target=\"_blank\"\u003eDiscusses the development and applications of LLMs, emphasizing the key role of hardware resources in development.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/11/diffusion.html\" target=\"_blank\"\u003eDiffusion Models Fully Explained: From Principles and Applications to Implementation (AI Image Generation)\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/diffusion\" target=\"_blank\"\u003eAn in-depth look at applications of image generation and segmentation, emphasizing the importance of hardware resources.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2024/02/asr-tts.html\" target=\"_blank\"\u003eASR/TTS Development Pitfall Guide: Common Challenges and Countermeasures in Speech Recognition and Synthesis\u003c/a\u003e\u003c/b\u003e: \u003ca 
href=\"https://deep-learning-101.github.io/asr-tts\" target=\"_blank\"\u003eDiscusses problems in applying ASR and TTS, emphasizing the importance of data quality.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2021/04/NLP.html\" target=\"_blank\"\u003eThe NLP Pitfalls We Fell Into\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/nlp\" target=\"_blank\"\u003eShares hands-on experience in NLP, emphasizing how data quality affects model performance.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2021/04/ASR.html\" target=\"_blank\"\u003eThe Speech Processing Pitfalls We Fell Into\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/speech\" target=\"_blank\"\u003eShares practical experience in speech processing, emphasizing how data quality affects model performance.\u003c/a\u003e\n  \u003c/li\u003e\n  \u003cli\u003e\n    \u003cb\u003e\u003ca href=\"https://blog.twman.org/2020/05/DeepLearning.html\" target=\"_blank\"\u003eHands-On Guide to Setting Up a Deep Learning Environment\u003c/a\u003e\u003c/b\u003e: \u003ca href=\"https://deep-learning-101.github.io/101\" target=\"_blank\"\u003eA step-by-step walkthrough of installing a deep learning environment on Ubuntu, drawn from hands-on experience.\u003c/a\u003e\n  \u003c/li\u003e\n\u003c/ul\u003e\n\n\u003c/details\u003e\n\n---\n\n# Computer Vision (CV, 電腦視覺)\n\n### **Contents**\n- [Anomaly Detection](#anomalydetection)\n- [Object Detection](#objectdetection)\n- [Segmentation](#segmentation)\n- [OCR](#ocr)\n- [Diffusion Model (擴散模型)](#diffusion-model)\n- [Digital Human (虛擬數字人)](#digital-human)\n- [Image Recognition](#image-recognition)\n- [Document AI](#document-ai)\n- [DeepFake Detection](#deepfake-detection)\n\n\n\n## AnomalyDetection\n**Anomaly Detection (異常檢測)**\n\n- 2025-09-24｜**FS-SAM2**\n  - Description: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2509.12105v1) | [📝 FS-SAM2 excels in both performance and efficiency](https://zread.ai/fornib/FS-SAM2)\n\n- 2025-09-20｜**MOCHA**\n  - Description: Multi-modal Objects-aware Cross-arcHitecture Alignment\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2509.14001v1) | [📝 Injected into YOLO, it sharply boosts few-shot detection performance](https://zhuanlan.zhihu.com/p/1952054591035281418)\n\n- 2025-07-16｜**CostFilter-AD**\n  - Description: Enhancing Anomaly Detection through Matching Cost Filtering\n  - 
Resources: [🐙 GitHub](https://github.com/ZHE-SAPI/CostFilter-AD) | [📝 Raising the ceiling of unsupervised anomaly detection](https://zhuanlan.zhihu.com/p/1928870223529882075)\n\n- 2025-06-13｜**One-to-Normal**\n  - Description: Anomaly Personalization (a new breakthrough in few-shot anomaly recognition)\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2502.01201) | [📝 Diffusion models enable precise detection](https://zhuanlan.zhihu.com/p/1916799842879018831)\n\n- 2025-06-06｜**DualAnoDiff (CVPR 2025)**\n  - Description: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2408.13509v3) | [📝 A new algorithm from Fudan and Tencent Youtu](https://www.qbitai.com/2025/06/291359.html)\n\n- 2025-05-15｜**AdaptCLIP**\n  - Description: Adapting CLIP for Universal Visual Anomaly Detection\n  - Resources: [🐙 GitHub](https://github.com/aiiu-lab/AdaptCLIP) | [📄 AlphaXiv](https://www.alphaxiv.org/overview/2407.15795) | [📝 Tencent open-sources a new SOTA](https://mp.weixin.qq.com/s/w5x6T18aSZt9jxqMIdf-Yg)\n\n- 2025-05-05｜**Multi-Modal LLM for AD**\n  - Description: Detect, Classify, Act: Categorizing Industrial Anomalies\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.02626) | [📚 DeepWiki](https://deepwiki.com/Sassanmtr/VELM) | [💾 MVTec Dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)\n\n- 2025-04-27｜**AnomalyCLIP**\n  - Description: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2310.18961) | [📚 DeepWiki](https://deepwiki.com/zqhang/AnomalyCLIP)\n\n- 2025-04-26｜**PaDiM**\n  - Description: A classic unsupervised anomaly detection method\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2011.08785) | [📚 DeepWiki](https://deepwiki.com/xiahaifeng1995/PaDiM-Anomaly-Detection-Localization-master)\n\n- 2025-04-12｜**AA-CLIP**\n  - Description: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2503.06661) | [📚 DeepWiki](https://deepwiki.com/Mwxinnn/AA-CLIP)\n\n- 2025-03-25｜**Dinomaly**\n  - Description: The Less Is More Philosophy in Multi-Class Unsupervised AD\n  - Resources: [🐙 GitHub](https://github.com/guojiajeremy/Dinomaly) | [📝 Unsupervised anomaly detection (UAD) explained](https://zhuanlan.zhihu.com/p/1886364053259146390)\n\n---\n\n## ObjectDetection\n**Object Detection (目標偵測)**\n\n- 2025｜**MCL (AAAI 2025)**\n  - Description: Multi-clue Consistency Learning (semi-supervised object detection for remote sensing)\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2407.05909) | [🐙 GitHub](https://github.com/facias914/sood-mcl) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/26788012528)\n\n- 2025-07-24｜**OV-DINO**\n  - Description: Open-source, industrial-grade open-vocabulary object detection\n  - Resources: [🐙 GitHub](https://github.com/wanghao9610/OV-DINO) | [📝 Chinese-language explainer](https://mp.weixin.qq.com/s/gLAVYFAH_39gT4XC0zWN0A)\n\n- 2025-06-18｜**CountVid**\n  - Description: Open-World Object Counting in Videos (counts whatever you point at in a video)\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2506.15368) | [📝 Open-sourced by the University of Oxford](https://mp.weixin.qq.com/s/hICrrfEgriyktoIxnbjPEQ)\n\n- 2025-06-15｜**GeoPix**\n  - Description: A pixel-level multimodal large model for remote sensing\n  - Resources: [🐙 GitHub](https://github.com/Norman-Ou/GeoPix) | [📝 Introduction from the Peking University lab](https://3slab.pku.edu.cn/info/1026/2121.htm)\n\n- 2025-05-23｜**VisionReasoner**\n  - Description: Unifies visual perception and reasoning with reinforcement learning (benchmarked against Qwen2.5-VL)\n  - Resources: [🐙 GitHub](https://github.com/dvlab-research/VisionReasoner) | [📝 Chinese-language explainer](https://mp.weixin.qq.com/s/vECz3i_-dzvlDr3BdRLPWQ)\n\n- 2025-03-14｜**Falcon**\n  - Description: A Remote Sensing Vision-Language Foundation Model\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2503.11070) | [📚 DeepWiki](https://deepwiki.com/TianHuiLab/Falcon)\n\n---\n\n## Segmentation\n**Segmentation (圖像分割)**\n\n- **Perceive Anything Model**\n  - Description: Recognize, Explain, Caption, and Segment Anything (benchmarked against SAM2 + LLM)\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2506.05302v1) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/1919709726209446971)\n\n- **RemoteSAM**\n  - Description: Towards Segment Anything for Earth Observation\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2505.18022v3) | [📚 DeepWiki](https://deepwiki.com/1e12Leon/RemoteSAM)\n\n- **InstructSAM**\n  - Description: Training-Free Framework for Remote Sensing\n  - Resources: [🌐 Project](https://voyagerxvoyagerx.github.io/InstructSAM/) | [📄 
AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.15818v1) | [📚 DeepWiki](https://deepwiki.com/VoyagerXvoyagerx/InstructSAM)\n\n- **RESAnything**\n  - Description: Attribute Prompting for Arbitrary Referring Segmentation\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2505.02867) | [🌐 Project](https://suikei-wang.github.io/RESAnything/)\n\n- **CVPR 2025 Highlights**\n  - **SegAnyMo**: [Segment Any Motion in Videos](https://www.alphaxiv.org/zh/overview/2503.22268) | [🐙 GitHub](https://github.com/nnanhuang/SegAnyMo)\n  - **Exact**: [Weakly supervised learning on remote sensing image time series](https://zhuanlan.zhihu.com/p/38754229963) | [🐙 GitHub](https://github.com/MiSsU-HH/Exact)\n\n- **MatAnyone**\n  - Description: Video matting: specify the target once and it is tracked throughout, with hair-level detail\n  - Resources: [🐙 GitHub](https://github.com/pq-yang/MatAnyone) | [📝 Explainer from Synced (機器之心)](https://www.jiqizhixin.com/articles/2025-04-17-27)\n\n- **SAM 2 \u0026 Variants (Segment Anything family)**\n  - **Meta SAM 2**: [Official website](https://ai.meta.com/sam2/) | [📝 Fine-tuning tutorial in 60 lines of code](https://mp.weixin.qq.com/s/YfgYCzvi0cXxOFIfQvE_9w)\n  - **CLIPSeg**: [HuggingFace Space](https://huggingface.co/spaces/taesiri/CLIPSeg) | [🐙 GitHub](https://github.com/timojl/clipseg)\n  - **SAMURAI**: [Project](https://yangchris11.github.io/samurai/) | [📝 Kalman filter + SAM2 to handle fast motion and self-occlusion](https://mp.weixin.qq.com/s/iU3Bk_uO01GWUxAtIBsrWQ)\n  - **Grounded SAM 2**: [🐙 GitHub](https://github.com/IDEA-Research/Grounded-SAM-2) | [🤗 Demo](https://huggingface.co/spaces/yizhangliu/Grounded-Segment-Anything)\n  - **SAM2Long**: [🐙 GitHub](https://github.com/Mark12Ding/SAM2Long) | [📝 CUHK tackles complex long-video segmentation](https://mp.weixin.qq.com/s/henvaxGoNgx24NLQV1Qj2w)\n  - **SAM2-Adapter**: [🐙 GitHub](https://github.com/tianrun-chen/SAM-Adapter-PyTorch) | [📝 Adapting SAM 2 to downstream tasks](https://mp.weixin.qq.com/s/3z-LshKAgbSzNCzyoLOuag)\n  - **SAM2Point**: [🐙 GitHub](https://github.com/ZiyuGuo99/SAM2Point) | [📝 A milestone in promptable 3D segmentation](https://mp.weixin.qq.com/s/TnTK5UE7O_hcrNzloxBmAw)\n\n---\n\n## OCR\n**Optical Character Recognition (光學文字識別)**\n**[Analysis and detection for object and scene images](https://www.twman.org/AI/CV)**\n\n- [Enhancing your OCR workflow with open models](https://huggingface.co/blog/zh/ocr-open-models)\n- [12 popular free and open-source OCR projects](https://mp.weixin.qq.com/s/7EuhnQedAX6injBL_Dg_sQ)\n\n- 2025-11-30｜**HunyuanOCR**\n  - Resources: [🐙 GitHub](https://github.com/Tencent-Hunyuan/HunyuanOCR) | [📝 Tencent Hunyuan's all-round 1B-class model](https://zhuanlan.zhihu.com/p/1977498008712131326)\n\n- 2025-10-21｜**Chandra OCR**\n  - Resources: [🐙 GitHub](https://github.com/datalab-to/chandra) | [📝 Beyond DeepSeek-OCR! A revolutionary breakthrough in OCR: Chandra OCR local deployment + real-world evaluation](https://zhuanlan.zhihu.com/p/1969019468937144099)\n\n- 2025-10-19｜**PaddleOCR-VL**\n  - Resources: [🤗 HuggingFace](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) | [📝 The pinnacle of image-to-text recognition](https://zhuanlan.zhihu.com/p/1964600336103745187)\n\n- 2025-08-18｜**DianJin-OCR-R1**\n  - Resources: [🐙 GitHub](https://github.com/aliyun/qwen-dianjin) | [📝 DianJin OCR-R1 handles blurry seals and cross-page tables](https://mp.weixin.qq.com/s/cOo0sqwDt3ARid70wBaYVA)\n\n- 2025-07-30｜**dots.ocr**\n  - Resources: [🤗 HuggingFace](https://huggingface.co/rednote-hilab/dots.ocr) | [📝 Deploying a powerful 1.7B OCR model locally](https://zhuanlan.zhihu.com/p/1935120171573413613)\n\n- 2025-06-16｜**OCRFlux**\n  - Description: LLM-based PDF parsing for complex layouts, with cross-page merging\n  - Resources: [🐙 GitHub](https://github.com/chatdoc-com/OCRFlux) | [🌐 Demo](https://ocrflux.pdfparser.io/#/)\n\n- 2025-06-05｜**MonkeyOCR**\n  - Resources: [📚 DeepWiki](https://deepwiki.com/Yuliang-Liu/MonkeyOCR) | [📄 AlphaXiv](https://www.alphaxiv.org/overview/2506.05218)\n\n- 2025-03-05｜**OpenOCR**\n  - Resources: [🐙 GitHub](https://github.com/Topdu/OpenOCR) | [📝 OpenOCR, a general-purpose OCR tool, goes open source](https://zhuanlan.zhihu.com/p/10259507246)\n\n- 2025-03-05｜**PP-DocBee**\n  - Resources: [🐙 GitHub](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/deploy/ppdocbee) | [📝 Baidu's document image understanding](https://zhuanlan.zhihu.com/p/28715553656)\n\n- 2025-03-03｜**olmocr**\n  - Resources: [🐙 GitHub](https://github.com/allenai/olmocr) | [📝 Deploy locally to extract PDF content accurately](https://www.aivi.fyi/llms/deploy-olmOCR)\n\n- 2025-02-05｜**MinerU**\n  - Resources: [🐙 GitHub](https://github.com/opendatalab/MinerU) | [📝 A go-to tool for converting PDF to Markdown](https://mp.weixin.qq.com/s/ci5wp6gICTCtaRZfn5yWUQ)\n\n- 2024-12-15｜**markitdown**\n  - 
Resources: [🐙 GitHub](https://github.com/microsoft/markitdown)\n\n- 2024-10-29｜**OmniParser**\n  - Resources: [🐙 GitHub](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/OmniParser) | [📝 From Alibaba: general-purpose extraction for complex document scenarios](https://mp.weixin.qq.com/s/_1Aatpna7poIVRhfYk4aAQ)\n\n- 2024-09-11｜**GOT-OCR-2.0**\n  - Resources: [📝 Open-source model introduction](https://mp.weixin.qq.com/s/rQL-Q0TGhT6e8Ti4zZalrg) | [📝 The OCR 2.0 era has arrived](https://mp.weixin.qq.com/s/W-Ult-F3pU6Wvx3fHEN8yA)\n\n- 2024-08-20｜**PDF-to-Markdown tools**\n  - Resources: [📝 Everything can be AI-powered! The open-source tool 12,000 people are watching](https://www.53ai.com/news/MultimodalLargeModel/2024082059736.html)\n\n- **Other useful tools and resources**\n  - **RapidOCR**: [🐙 GitHub](https://github.com/RapidAI/RapidOCR/blob/main/docs/README_zh.md)\n  - **TableStructureRec**: [🐙 GitHub](https://github.com/RapidAI/TableStructureRec) | [📝 An inference library for table structure recognition](https://zhuanlan.zhihu.com/p/668484933)\n  - **PaddleOCR tutorial**: [📝 Fine-tuning on medical certificates and receipts with PPOCRLabel](https://blog.twman.org/2023/07/wsl.html)\n\n---\n\n## Diffusion Model\n**Diffusion Model (擴散模型)**\n\n- 2025-05-28｜**Jodi**\n  - Description: A unified model for visual understanding \u0026 generation\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.19084) | [🌐 Project](https://vipl-genun.github.io/Project-Jodi/)\n\n- 2025-05-27｜**AnomalyAny (CVPR 2025)**\n  - Description: Stable Diffusion assists visual anomaly detection, with no training required\n  - Resources: [🌐 Project](https://hansunhayden.github.io/AnomalyAny.github.io/) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/1910284073231942689)\n\n- 2025-05-23｜**HivisionIDPhotos**\n  - Description: Smart ID photo generator (matting, background replacement, arbitrary sizes)\n  - Resources: [📚 DeepWiki](https://deepwiki.com/Zeyi-Lin/HivisionIDPhotos) | [📝 Tutorial](https://zhuanlan.zhihu.com/p/718725351)\n\n- 2025-05-19｜**Index-AniSora**\n  - Description: Bilibili's open-source SOTA anime video generation model\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2504.10044) | [📚 DeepWiki](https://deepwiki.com/bilibili/Index-anisora) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/1908150671540224717)\n\n- 2025-04-26｜**Insert Anything**\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2504.15009) | [📚 DeepWiki](https://deepwiki.com/song-wensong/insert-anything)\n\n- 
2025-04-24｜**Phantom**\n  - Description: ByteDance's 1280x720 video generation model, usable with 10 GB of VRAM\n  - Resources: [🐙 GitHub](https://github.com/Phantom-video/Phantom) | [📝 Hands-on test report](https://zhuanlan.zhihu.com/p/1898688574477545694)\n\n- 2025-04-22｜**MAGI-1**\n  - Description: Sand AI's large autoregressive video generation model, billed as the world's first\n  - Resources: [🐙 GitHub](https://github.com/SandAI-org/Magi-1) | [📝 Performance highlights](https://www.zhihu.com/question/1898030232184795448)\n\n- 2025-04-22｜**SkyReels V2**\n  - Description: Billed as the world's first infinite-length video generation model, with cinematic-level understanding\n  - Resources: [🐙 GitHub](https://github.com/SkyworkAI/SkyReels-V2) | [📝 Media coverage](https://www.qbitai.com/2025/04/275531.html)\n\n- 2025-04-14｜**FramePack**\n  - Description: ComfyUI plugin that runs a 13B model on 6 GB of VRAM and supports 1-minute videos\n  - Resources: [🐙 GitHub](https://github.com/kijai/ComfyUI-FramePackWrapper) | [📝 Cost-effectiveness analysis](https://zhuanlan.zhihu.com/p/1896487969470251546)\n\n- 2025-04-14｜**Fantasy-talking**\n  - Description: Audio-driven digital human built on Wan2.1\n  - Resources: [🌐 Project](https://fantasy-amap.github.io/fantasy-talking/) | [📝 Explainer](https://zhuanlan.zhihu.com/p/1892895916354148118)\n\n- 2025-04-05｜**SkyReels-A2**\n  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2504.02436) | [📚 DeepWiki](https://deepwiki.com/SkyworkAI/SkyReels-A2) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/1892709305301590652)\n\n- 2025-03-10｜**HunyuanVideo-I2V**\n  - Description: Tencent's open-source image-to-video model + LoRA training scripts\n  - Resources: [🐙 GitHub](https://github.com/Tencent/HunyuanVideo-I2V) | [📝 Hands-on tutorial](https://zhuanlan.zhihu.com/p/29110060025)\n\n- 2025-02-25｜**Wan-Video**\n  - Description: Alibaba open-sources its Wan (萬相) models, covering all modalities and sizes\n  - Resources: [🐙 GitHub](https://github.com/Wan-Video/Wan2.1) | [📝 Media coverage](https://finance.sina.com.cn/jjxw/2025-02-26/doc-inemukxr9127437.shtml)\n\n- 2025-02-14｜**FlashVideo**\n  - Description: ByteDance's video enhancement algorithm that generates 1080p video in 102 seconds\n  - Resources: [🐙 GitHub](https://github.com/FoundationVision/FlashVideo) | [📝 Explainer](https://zhuanlan.zhihu.com/p/23702953115)\n\n- 2025-01-28｜**Sana (ICLR 2025 Oral)**\n  - Description: Open-sourced by NVIDIA, MIT, and Tsinghua; claimed to be 100x faster than FLUX\n  - Resources: [🐙 GitHub](https://github.com/NVlabs/Sana) | [📝 Chinese-language explainer](https://zhuanlan.zhihu.com/p/19489214543)\n\n- **Flux \u0026 Ecosystem**\n  - **Flux Models**: [🤗 Black Forest 
Labs](https://huggingface.co/black-forest-labs)\n    - [Canny-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Canny-dev) | [Depth-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Depth-dev) | [Fill-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Fill-dev) | [Redux-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Redux-dev)\n  - **PuLID (2024-11-29)**: [🐙 GitHub](https://github.com/ToTheBeginning/PuLID) | [📝 ComfyUI tutorial](https://mp.weixin.qq.com/s/07BMFHaSasl7-PFtkN6_Zg)\n  - **Leffa (2024-12-17)**: [🐙 GitHub](https://github.com/franciszzj/Leffa) | [📝 Meta AI: preserving person features](https://juejin.cn/post/7449325873725276196)\n  - **MagicQuill (2024-11-26)**: [🐙 GitHub](https://github.com/magic-quill/MagicQuill) | [🤗 Space](https://huggingface.co/spaces/AI4Editing/MagicQuill) | [📝 An AI photo-editing tool](https://mp.weixin.qq.com/s/Pc3xRP8_9BxkVSRNznkplw)\n\n- **Practical Tools (ComfyUI \u0026 Others)**\n  - **OOTDiffusion**: [🐙 GitHub](https://github.com/levihsu/OOTDiffusion) | [📝 An AI virtual try-on tool](https://mp.weixin.qq.com/s/B2rNCjJLo8coYzoHGPnVaw)\n  - **ComfyUI Impact Pack**: [🐙 GitHub](https://github.com/ltdrdata/ComfyUI-Impact-Pack) | [📝 Top-tier face restoration](https://mp.weixin.qq.com/s/hNQ9BfdGbRQ_Osus-yMJWg)\n  - **OmniGen**: [🐙 GitHub](https://github.com/AIFSH/OmniGen-ComfyUI) | [📝 All-in-one image generation](https://mp.weixin.qq.com/s/msGK0FmNs3T3jbUBHfR9DA)\n\n---\n\n## Digital Human\n**Digital Human (虛擬數字人)**\n\n- **Open Avatar Chat**\n  - Resources: [📝 Project overview](https://zread.ai/HumanAIGC-Engineering/OpenAvatarChat) | [📝 A GitHub hit you can deploy locally, no strings attached](https://mp.weixin.qq.com/s/eNRbU4lZLgdpe_iNSNcfGA)\n\n- **HeyGem**\n  - Resources: [🐙 GitHub](https://github.com/GuijiAI/HeyGem.ai) | [📝 A digital human cloning tool](https://zhuanlan.zhihu.com/p/29274862393)\n\n- **Duix**\n  - Resources: [🐙 GitHub](https://github.com/GuijiAI/duix.ai) | [📝 Billed as the world's first open-source photorealistic digital human](https://zhuanlan.zhihu.com/p/716583514)\n\n- **Linly-Talker**\n  - Description: An intelligent interactive system that combines LLMs with vision models\n  - Resources: [🐙 GitHub](https://github.com/Kedreamix/Linly-Talker)\n\n- **CVPR 2025 / NeurIPS Resources**\n  - 
**EchoMimicV2 (CVPR 2025)**: [🐙 GitHub](https://github.com/antgroup/echomimic_v2) - Striking, Simplified Human Animation.\n  - **Hallo3 (CVPR 2025)**: [🐙 GitHub](https://github.com/fudan-generative-vision/hallo3) - Highly Dynamic Portrait Animation.\n  - **MimicTalk (NeurIPS 2024)**: [🐙 GitHub](https://github.com/yerfor/MimicTalk) - 3D talking face.\n\n- **Other Tools**\n  - **JoyGen**: [🐙 GitHub](https://github.com/JOY-MM/JoyGen) (Audio-Driven 3D Editing)\n  - **LatentSync**: [🐙 GitHub](https://github.com/bytedance/LatentSync)\n  - **MuseTalk**: [🐙 GitHub](https://github.com/TMElyralab/MuseTalk)\n\n---\n\n## Image Recognition\n**Image Recognition (圖像識別)**\n\n- **ViT (Vision Transformer)**\n  - Resources: [🐙 GitHub](https://github.com/google-research/vision_transformer) | [📝 Analysis](https://zhuanlan.zhihu.com/p/445122996) | [📝 Transfer performance analysis](https://zhuanlan.zhihu.com/p/463608959)\n\n- **Swin Transformer**\n  - Resources: [🐙 GitHub](https://github.com/microsoft/Swin-Transformer) | [📝 Beating CNNs with a CNN-style approach](https://zhuanlan.zhihu.com/p/362690149)\n\n- **EfficientNetV2**\n  - Resources: [🐙 GitHub](https://github.com/d-li14/efficientnetv2.pytorch) | [📝 Smaller models, faster training](https://zhuanlan.zhihu.com/p/361873583)\n\n---\n\n## Document AI\n**Document Understanding \u0026 OCR (文檔理解與文字識別)**\n\n- **Donut (2022)**: OCR-free Document Understanding Transformer. [📄 arXiv:2111.15664](./donut.md)\n- **LayoutParser (2021)**: Unified toolkit for Deep Learning Based Document Analysis. [📄 arXiv:2103.15348](./LayoutParser.md)\n- **TrOCR (2021)**: Transformer-based OCR with Pre-trained Models. [📄 arXiv:2109.10282](./TrOCR.md)\n- **DiT (2022)**: Self-supervised Pre-training for Document Image Transformer. [📄 arXiv:2203.02378](./DiT.md)\n- **Nougat (2023)**: Neural Optical Understanding for Academic Documents. 
[📄 arXiv:2308.13418](https://facebookresearch.github.io/nougat/)\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003e📚 LayoutLM Series (click to expand)\u003c/strong\u003e\u003c/summary\u003e\n\n- **LayoutLM (2020)**: Pre-training of Text and Layout. [📄 arXiv:1912.13318](./LayoutLM.md)\n- **LayoutLMv2 (2021)**: Multi-modal Pre-training. [📄 arXiv:2012.14740](./LayoutLMv2.md)\n- **LayoutXLM (2021)**: Multilingual Visually-rich Document Understanding. [📄 arXiv:2104.08836](./LayoutXLM.md)\n- **LayoutLMv3 (2022)**: Pre-training with Unified Text and Image Masking. [📄 arXiv:2204.08387](./LayoutLMv3.md)\n\u003c/details\u003e\n\n- **Scene Text Recognition**\n  - **ABINet (2021)**: Read Like Humans. [📄 arXiv:2103.06495](./ABINet.md)\n  - **ABINet++ (2022)**: Iterative Language Modeling for Text Spotting. [📄 arXiv:2211.10578](./ABINet%2B%2B.md)\n  - **ABCNet v2 (2021)**: Adaptive Bezier-Curve Network. [📄 arXiv:2105.03620](./ABCNet_v2.md)\n  - **SVTR (2022)**: Scene Text Recognition with a Single Visual Model. [📄 arXiv:2205.00159](./SVTR.md)\n\n---\n\n## DeepFake Detection\n**DeepFake Detection (深度偽造偵測)**\n\n- **Multi-attentional Deepfake Detection (CVPR 2021)**\n  - H. Zhao et al., Proceedings of the IEEE/CVF CVPR 2021.\n\n- **Geometric Features (CVPR 2021)**\n  - Improving Efficiency and Robustness through Precise Geometric Features. Zekun Sun et al.\n\n- **3D Decomposition (CVPR 2021)**\n  - Face Forgery Detection by 3D Decomposition. Xiangyu Zhu et al.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeep-learning-101%2Fcomputer-vision-paper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeep-learning-101%2Fcomputer-vision-paper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeep-learning-101%2Fcomputer-vision-paper/lists"}