https://github.com/deep-learning-101/computer-vision-paper

https://deep-learning-101.github.io//Computer-Vision Computer vision (電腦視覺)


Deep Learning 101, Taiwan’s pioneering and highest deep learning meetup, launched on 2016/11/11 @ 83F, Taipei 101



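AI is a lonely journey full of anxiety and the unknown; flashy paid courses or events are by no means a shortcut to success.

Heartfelt thanks to the AI enthusiasts from various organizations who shared their valuable experience under their real names at the time; if you would like any information removed, please let us know.

Initiated by TonTon Huang Ph.D., with the venue, tea, and snacks sponsored free of charge by his employer at the time (台灣雪豹科技).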







Subscribe on YouTube |
Facebook |
Back to GitHub Pages |
Star on GitHub |
Website |
Like on Hugging Face Space

---


Large Language Model
Speech Processing
Natural Language Processing, NLP
Computer Vision

---

Hands-on, step by step through the AI pitfalls together: https://www.twman.org/AI

---

# Computer Vision (CV, 電腦視覺)

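### **Table of Contents**
- [Anomaly Detection](#anomalydetection)
- [Object Detection](#objectdetection)
- [Segmentation](#segmentation)
- [OCR](#ocr)
- [Diffusion Model (擴散模型)](#diffusion-model)
- [Digital Human (虛擬數字人)](#digital-human)
- [Image Recognition (圖像識別)](#image-recognition)
- [Document AI](#document-ai)
- [DeepFake Detection (深度偽造偵測)](#deepfake-detection)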

## AnomalyDetection
**Anomaly Detection (異常檢測)**

- 2025-09-24|**FS-SAM2**
  - Description: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2509.12105v1) | [📝 FS-SAM2: Excelling in Both Performance and Efficiency](https://zread.ai/fornib/FS-SAM2)

- 2025-09-20|**MOCHA**
  - Description: Multi-modal Objects-aware Cross-arcHitecture Alignment
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2509.14001v1) | [📝 Injected into YOLO, Few-Shot Detection Performance Soars](https://zhuanlan.zhihu.com/p/1952054591035281418)

- 2025-07-16|**CostFilter-AD**
  - Description: Enhancing Anomaly Detection through Matching Cost Filtering
  - Resources: [🐙 GitHub](https://github.com/ZHE-SAPI/CostFilter-AD) | [📝 Raising the Ceiling of Unsupervised Anomaly Detection](https://zhuanlan.zhihu.com/p/1928870223529882075)

- 2025-06-13|**One-to-Normal**
  - Description: Anomaly Personalization (a new breakthrough in few-shot anomaly recognition)
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2502.01201) | [📝 Diffusion Models Enable Precise Detection](https://zhuanlan.zhihu.com/p/1916799842879018831)

- 2025-06-06|**DualAnoDiff (CVPR 2025)**
  - Description: Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2408.13509v3) | [📝 A New Algorithm from Fudan University and Tencent Youtu](https://www.qbitai.com/2025/06/291359.html)

- 2025-05-15|**AdaptCLIP**
  - Description: Adapting CLIP for Universal Visual Anomaly Detection
  - Resources: [🐙 GitHub](https://github.com/aiiu-lab/AdaptCLIP) | [📄 AlphaXiv](https://www.alphaxiv.org/overview/2407.15795) | [📝 Tencent Open-Sources a New SOTA](https://mp.weixin.qq.com/s/w5x6T18aSZt9jxqMIdf-Yg)

- 2025-05-05|**Multi-Modal LLM for AD**
  - Description: Detect, Classify, Act: Categorizing Industrial Anomalies
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.02626) | [📚 DeepWiki](https://deepwiki.com/Sassanmtr/VELM) | [💾 MVTec Dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)

- 2025-04-27|**AnomalyCLIP**
  - Description: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2310.18961) | [📚 DeepWiki](https://deepwiki.com/zqhang/AnomalyCLIP)

- 2025-04-26|**PaDiM**
  - Description: A classic unsupervised anomaly detection and localization method (Patch Distribution Modeling)
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2011.08785) | [📚 DeepWiki](https://deepwiki.com/xiahaifeng1995/PaDiM-Anomaly-Detection-Localization-master)

- 2025-04-12|**AA-CLIP**
  - Description: Enhancing Zero-shot Anomaly Detection via Anomaly-Aware CLIP
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2503.06661) | [📚 DeepWiki](https://deepwiki.com/Mwxinnn/AA-CLIP)

- 2025-03-25|**Dinomaly**
  - Description: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
  - Resources: [🐙 GitHub](https://github.com/guojiajeremy/Dinomaly) | [📝 Unsupervised Anomaly Detection (UAD) Explained](https://zhuanlan.zhihu.com/p/1886364053259146390)
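
PaDiM, listed above as a classic unsupervised method, models every patch position as a multivariate Gaussian fitted to the embeddings of normal training images, then scores a test patch by its Mahalanobis distance to that Gaussian. The sketch below shows only that statistical core, under stated assumptions: patch embeddings are passed in as a precomputed NumPy array (PaDiM itself takes them from a pretrained CNN and randomly subsamples channels, both omitted here), the features in the usage part are random stand-ins, and `fit_padim`/`score_padim` are hypothetical helper names, not the API of the linked repository.

```python
import numpy as np


def fit_padim(train_feats: np.ndarray, eps: float = 0.01):
    """Fit one Gaussian per patch position (sketch, not the reference code).

    train_feats: (N, H, W, D) patch embeddings of N normal images,
    assumed precomputed elsewhere.
    Returns per-position means (H*W, D), inverse covariances (H*W, D, D), and (H, W).
    """
    n, h, w, d = train_feats.shape
    if n < 2:
        raise ValueError("need at least two normal images to estimate a covariance")
    x = train_feats.reshape(n, h * w, d)
    means = x.mean(axis=0)
    inv_covs = np.empty((h * w, d, d))
    for p in range(h * w):
        # Sample covariance across the N images, plus eps*I so it stays invertible.
        cov = np.cov(x[:, p, :], rowvar=False) + eps * np.eye(d)
        inv_covs[p] = np.linalg.inv(cov)
    return means, inv_covs, (h, w)


def score_padim(test_feats: np.ndarray, means, inv_covs, hw):
    """Mahalanobis distance of each test patch to its position's Gaussian.

    test_feats: (H, W, D) embeddings of one test image -> (H, W) anomaly map.
    """
    h, w = hw
    diff = test_feats.reshape(h * w, -1) - means
    # For each position p: diff_p^T @ inv_cov_p @ diff_p
    sq = np.einsum("pd,pde,pe->p", diff, inv_covs, diff)
    return np.sqrt(np.maximum(sq, 0.0)).reshape(h, w)


# Usage with random stand-in features (hypothetical data, not real embeddings).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4, 4, 8))
means, inv_covs, hw = fit_padim(train)
test = rng.normal(size=(4, 4, 8))
test[1, 2] += 10.0  # inject one "anomalous" patch
anomaly_map = score_padim(test, means, inv_covs, hw)
print(np.unravel_index(anomaly_map.argmax(), anomaly_map.shape))
```

The `eps` term keeps each covariance invertible when there are few normal images relative to the embedding dimension. Upsampling and smoothing the patch-level map to image resolution for localization is left out of this sketch.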

---

## ObjectDetection
**Object Detection (目標偵測)**

- 2025|**MCL (AAAI 2025)**
  - Description: Multi-clue Consistency Learning (semi-supervised object detection in remote sensing)
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2407.05909) | [🐙 GitHub](https://github.com/facias914/sood-mcl) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/26788012528)

- 2025-07-24|**OV-DINO**
  - Description: Open-source, industrial-grade open-vocabulary object detection
  - Resources: [🐙 GitHub](https://github.com/wanghao9610/OV-DINO) | [📝 Chinese Explainer](https://mp.weixin.qq.com/s/gLAVYFAH_39gT4XC0zWN0A)

- 2025-06-18|**CountVid**
  - Description: Open-World Object Counting in Videos (counts whatever you point to in a video)
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2506.15368) | [📝 Open-Sourced by the University of Oxford](https://mp.weixin.qq.com/s/hICrrfEgriyktoIxnbjPEQ)

- 2025-06-15|**GeoPix**
  - Description: A pixel-level multimodal large model for remote sensing
  - Resources: [🐙 GitHub](https://github.com/Norman-Ou/GeoPix) | [📝 Introduction from the Peking University Lab](https://3slab.pku.edu.cn/info/1026/2121.htm)

- 2025-05-23|**VisionReasoner**
  - Description: Unifies visual perception and reasoning with reinforcement learning (benchmarked against Qwen2.5-VL)
  - Resources: [🐙 GitHub](https://github.com/dvlab-research/VisionReasoner) | [📝 Chinese Explainer](https://mp.weixin.qq.com/s/vECz3i_-dzvlDr3BdRLPWQ)

- 2025-03-14|**Falcon**
  - Description: A Remote Sensing Vision-Language Foundation Model
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2503.11070) | [📚 DeepWiki](https://deepwiki.com/TianHuiLab/Falcon)

---

## Segmentation
**Segmentation (圖像分割)**

- **Perceive Anything Model**
  - Description: Recognize, Explain, Caption, and Segment Anything (comparable to SAM 2 combined with an LLM)
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2506.05302v1) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/1919709726209446971)

- **RemoteSAM**
  - Description: Towards Segment Anything for Earth Observation
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2505.18022v3) | [📚 DeepWiki](https://deepwiki.com/1e12Leon/RemoteSAM)

- **InstructSAM**
  - Description: A Training-Free Framework for Remote Sensing
  - Resources: [🌐 Project](https://voyagerxvoyagerx.github.io/InstructSAM/) | [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.15818v1) | [📚 DeepWiki](https://deepwiki.com/VoyagerXvoyagerx/InstructSAM)

- **RESAnything**
  - Description: Attribute Prompting for Arbitrary Referring Segmentation
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/abs/2505.02867) | [🌐 Project](https://suikei-wang.github.io/RESAnything/)

- **CVPR 2025 Highlights**
  - **SegAnyMo**: [Segment Any Motion in Videos](https://www.alphaxiv.org/zh/overview/2503.22268) | [🐙 GitHub](https://github.com/nnanhuang/SegAnyMo)
  - **Exact**: [Weakly Supervised Learning for Remote Sensing Image Time Series](https://zhuanlan.zhihu.com/p/38754229963) | [🐙 GitHub](https://github.com/MiSsU-HH/Exact)

- **MatAnyone**
  - Description: Video matting; specify the target once and it is tracked through the whole clip, with hair-level detail restored
  - Resources: [🐙 GitHub](https://github.com/pq-yang/MatAnyone) | [📝 Jiqizhixin (機器之心) Explainer](https://www.jiqizhixin.com/articles/2025-04-17-27)

- **SAM 2 & Variants (Segment Anything family)**
  - **Meta SAM 2**: [Official Website](https://ai.meta.com/sam2/) | [📝 Fine-Tuning Tutorial in 60 Lines of Code](https://mp.weixin.qq.com/s/YfgYCzvi0cXxOFIfQvE_9w)
  - **CLIPSeg**: [HuggingFace Space](https://huggingface.co/spaces/taesiri/CLIPSeg) | [🐙 GitHub](https://github.com/timojl/clipseg)
  - **SAMURAI**: [Project](https://yangchris11.github.io/samurai/) | [📝 Kalman Filter + SAM 2 for Fast Motion and Self-Occlusion](https://mp.weixin.qq.com/s/iU3Bk_uO01GWUxAtIBsrWQ)
  - **Grounded SAM 2**: [🐙 GitHub](https://github.com/IDEA-Research/Grounded-SAM-2) | [🤗 Demo](https://huggingface.co/spaces/yizhangliu/Grounded-Segment-Anything)
  - **SAM2Long**: [🐙 GitHub](https://github.com/Mark12Ding/SAM2Long) | [📝 CUHK Proposes Segmentation for Complex Long Videos](https://mp.weixin.qq.com/s/henvaxGoNgx24NLQV1Qj2w)
  - **SAM2-Adapter**: [🐙 GitHub](https://github.com/tianrun-chen/SAM-Adapter-PyTorch) | [📝 Adapting SAM 2 to Downstream Tasks](https://mp.weixin.qq.com/s/3z-LshKAgbSzNCzyoLOuag)
  - **SAM2Point**: [🐙 GitHub](https://github.com/ZiyuGuo99/SAM2Point) | [📝 A Milestone in Promptable 3D Segmentation](https://mp.weixin.qq.com/s/TnTK5UE7O_hcrNzloxBmAw)
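
The SAMURAI entry above pairs SAM 2 with a Kalman filter (KF) to cope with fast motion and self-occlusion. The sketch below illustrates only the generic motion-model idea under stated assumptions: a constant-velocity Kalman filter tracks a box `(cx, cy, w, h)`, and among several candidate boxes for the next frame, the one with the highest IoU against the filter's prediction is kept. In SAMURAI the candidates come from SAM 2's mask predictions and the full method also draws on SAM 2's own mask scores, which is not reproduced here; the noise values `q`/`r` and the names `BoxKalman`, `iou`, and `select_candidate` are hypothetical, not SAMURAI's code.

```python
import numpy as np


class BoxKalman:
    """Constant-velocity Kalman filter over a box (cx, cy, w, h). Illustrative sketch.

    State: [cx, cy, w, h, vcx, vcy, vw, vh]; only the box itself is observed.
    q and r are assumed process/measurement noise scales, not tuned values.
    """

    def __init__(self, box, q: float = 1e-2, r: float = 1e-1):
        self.x = np.concatenate([np.asarray(box, dtype=float), np.zeros(4)])
        self.P = np.eye(8)
        self.F = np.eye(8)
        self.F[:4, 4:] = np.eye(4)  # each box coordinate advances by its velocity per frame
        self.H = np.hstack([np.eye(4), np.zeros((4, 4))])  # observe the box, not velocities
        self.Q = q * np.eye(8)
        self.R = r * np.eye(4)

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4].copy()

    def update(self, box) -> None:
        y = np.asarray(box, dtype=float) - self.H @ self.x  # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(8) - K @ self.H) @ self.P


def iou(a, b) -> float:
    """IoU of two boxes given as (cx, cy, w, h)."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def select_candidate(kf: BoxKalman, candidates):
    """Keep the candidate that best overlaps the motion prediction, then correct the filter."""
    predicted = kf.predict()
    best = max(candidates, key=lambda c: iou(predicted, c))
    kf.update(best)
    return best


# Hypothetical scene: the target moves +5 px/frame in x; a look-alike distractor sits 30 px below.
kf = BoxKalman((0.0, 0.0, 10.0, 10.0))
picks = []
for t in range(1, 11):
    target = (5.0 * t, 0.0, 10.0, 10.0)
    distractor = (5.0 * t, 30.0, 10.0, 10.0)
    picks.append(select_candidate(kf, [distractor, target]))
print(all(p[1] == 0.0 for p in picks), round(float(kf.x[4]), 2))
```

The filter needs no appearance information: after the first frame it has already learned a positive x-velocity, so its prediction keeps landing on the moving target rather than the distractor.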

---

## OCR
**Optical Character Recognition (光學文字識別)**
**[Analysis and Detection for Object or Scene Images](https://www.twman.org/AI/CV)**

- [Enhance Your OCR Workflow with Open-Source Models](https://huggingface.co/blog/zh/ocr-open-models)
- [12 Popular Free and Open-Source OCR Projects](https://mp.weixin.qq.com/s/7EuhnQedAX6injBL_Dg_sQ)

- 2025-11-30|**HunyuanOCR**
  - Resources: [🐙 GitHub](https://github.com/Tencent-Hunyuan/HunyuanOCR) | [📝 Tencent Hunyuan's 1B-Scale All-in-One Model](https://zhuanlan.zhihu.com/p/1977498008712131326)

- 2025-10-21|**Chandra OCR**
  - Resources: [🐙 GitHub](https://github.com/datalab-to/chandra) | [📝 Beyond DeepSeek-OCR! A Revolutionary Breakthrough in OCR: Chandra OCR Local Deployment + Real-World Evaluation](https://zhuanlan.zhihu.com/p/1969019468937144099)

- 2025-10-19|**PaddleOCR-VL**
  - Resources: [🤗 HuggingFace](https://huggingface.co/PaddlePaddle/PaddleOCR-VL) | [📝 The Pinnacle of Image-to-Text Recognition](https://zhuanlan.zhihu.com/p/1964600336103745187)

- 2025-08-18|**DianJin-OCR-R1**
  - Resources: [🐙 GitHub](https://github.com/aliyun/qwen-dianjin) | [📝 DianJin OCR-R1: Blurry Stamps and Cross-Page Tables, All Handled](https://mp.weixin.qq.com/s/cOo0sqwDt3ARid70wBaYVA)

- 2025-07-30|**dots.ocr**
  - Resources: [🤗 HuggingFace](https://huggingface.co/rednote-hilab/dots.ocr) | [📝 Deploying a Powerful 1.7B OCR Model Locally](https://zhuanlan.zhihu.com/p/1935120171573413613)

- 2025-06-16|**OCRFlux**
  - Description: LLM-based PDF parsing for complex layouts and cross-page merging
  - Resources: [🐙 GitHub](https://github.com/chatdoc-com/OCRFlux) | [🌐 Demo](https://ocrflux.pdfparser.io/#/)

- 2025-06-05|**MonkeyOCR**
  - Resources: [📚 DeepWiki](https://deepwiki.com/Yuliang-Liu/MonkeyOCR) | [📄 AlphaXiv](https://www.alphaxiv.org/overview/2506.05218)

- 2025-03-05|**OpenOCR**
  - Resources: [🐙 GitHub](https://github.com/Topdu/OpenOCR) | [📝 General-Purpose OCR Tool OpenOCR Goes Open Source](https://zhuanlan.zhihu.com/p/10259507246)

- 2025-03-05|**PP-DocBee**
  - Resources: [🐙 GitHub](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/deploy/ppdocbee) | [📝 Baidu Document Image Understanding](https://zhuanlan.zhihu.com/p/28715553656)

- 2025-03-03|**olmocr**
  - Resources: [🐙 GitHub](https://github.com/allenai/olmocr) | [📝 Local Deployment for Accurate PDF Extraction](https://www.aivi.fyi/llms/deploy-olmOCR)

- 2025-02-05|**MinerU**
  - Resources: [🐙 GitHub](https://github.com/opendatalab/MinerU) | [📝 A Go-To Tool for Converting PDF to Markdown](https://mp.weixin.qq.com/s/ci5wp6gICTCtaRZfn5yWUQ)

- 2024-12-15|**markitdown**
  - Resources: [🐙 GitHub](https://github.com/microsoft/markitdown)

- 2024-10-29|**OmniParser**
  - Resources: [🐙 GitHub](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/OmniParser) | [📝 From Alibaba: General Document Extraction in Complex Scenarios](https://mp.weixin.qq.com/s/_1Aatpna7poIVRhfYk4aAQ)

- 2024-09-11|**GOT-OCR-2.0**
  - Resources: [📝 Open-Source Model Introduction](https://mp.weixin.qq.com/s/rQL-Q0TGhT6e8Ti4zZalrg) | [📝 The OCR 2.0 Era Has Arrived](https://mp.weixin.qq.com/s/W-Ult-F3pU6Wvx3fHEN8yA)

- 2024-08-20|**PDF-to-Markdown Tools**
  - Resources: [📝 Everything Can Be AI-Powered! An Open-Source Tool 12,000 People Are Watching](https://www.53ai.com/news/MultimodalLargeModel/2024082059736.html)

- **Other Useful Tools and Resources**
  - **RapidOCR**: [🐙 GitHub](https://github.com/RapidAI/RapidOCR/blob/main/docs/README_zh.md)
  - **TableStructureRec**: [🐙 GitHub](https://github.com/RapidAI/TableStructureRec) | [📝 Table Structure Recognition Inference Library](https://zhuanlan.zhihu.com/p/668484933)
  - **PaddleOCR Tutorial**: [📝 Fine-Tuning on Medical Certificates and Receipts with PPOCRLabel](https://blog.twman.org/2023/07/wsl.html)

---

## Diffusion Model
**Diffusion Model (擴散模型)**

- 2025-05-28|**Jodi**
  - Description: A unified model for visual understanding and generation
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2505.19084) | [🌐 Project](https://vipl-genun.github.io/Project-Jodi/)

- 2025-05-27|**AnomalyAny (CVPR 2025)**
  - Description: Uses Stable Diffusion to assist visual anomaly detection, without training
  - Resources: [🌐 Project](https://hansunhayden.github.io/AnomalyAny.github.io/) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/1910284073231942689)

- 2025-05-23|**HivisionIDPhotos**
  - Description: A smart ID-photo generator (matting, background replacement, any photo size)
  - Resources: [📚 DeepWiki](https://deepwiki.com/Zeyi-Lin/HivisionIDPhotos) | [📝 Tutorial](https://zhuanlan.zhihu.com/p/718725351)

- 2025-05-19|**Index-AniSora**
  - Description: Bilibili's open-source SOTA anime video generation model
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/overview/2504.10044) | [📚 DeepWiki](https://deepwiki.com/bilibili/Index-anisora) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/1908150671540224717)

- 2025-04-26|**Insert Anything**
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2504.15009) | [📚 DeepWiki](https://deepwiki.com/song-wensong/insert-anything)

- 2025-04-24|**Phantom**
  - Description: ByteDance's 1280x720 video generation model; runs with 10 GB of VRAM
  - Resources: [🐙 GitHub](https://github.com/Phantom-video/Phantom) | [📝 Hands-On Test Report](https://zhuanlan.zhihu.com/p/1898688574477545694)

- 2025-04-22|**MAGI-1**
  - Description: Sand AI's large autoregressive video generation model, billed as the world's first
  - Resources: [🐙 GitHub](https://github.com/SandAI-org/Magi-1) | [📝 Performance Highlights Analyzed](https://www.zhihu.com/question/1898030232184795448)

- 2025-04-22|**SkyReels V2**
  - Description: Billed as the world's first infinite-length video generation model, with cinematic-level understanding
  - Resources: [🐙 GitHub](https://github.com/SkyworkAI/SkyReels-V2) | [📝 Media Coverage](https://www.qbitai.com/2025/04/275531.html)

- 2025-04-14|**FramePack**
  - Description: ComfyUI plugin; runs a 13B model on 6 GB of VRAM and supports 1-minute videos
  - Resources: [🐙 GitHub](https://github.com/kijai/ComfyUI-FramePackWrapper) | [📝 Cost-Effectiveness Analysis](https://zhuanlan.zhihu.com/p/1896487969470251546)

- 2025-04-14|**Fantasy-talking**
  - Description: Audio-driven digital human built on Wan2.1
  - Resources: [🌐 Project](https://fantasy-amap.github.io/fantasy-talking/) | [📝 Explainer](https://zhuanlan.zhihu.com/p/1892895916354148118)

- 2025-04-05|**SkyReels-A2**
  - Resources: [📄 AlphaXiv](https://www.alphaxiv.org/zh/overview/2504.02436) | [📚 DeepWiki](https://deepwiki.com/SkyworkAI/SkyReels-A2) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/1892709305301590652)

- 2025-03-10|**HunyuanVideo-I2V**
  - Description: Tencent's open-source image-to-video model, plus LoRA training scripts
  - Resources: [🐙 GitHub](https://github.com/Tencent/HunyuanVideo-I2V) | [📝 Hands-On Tutorial](https://zhuanlan.zhihu.com/p/29110060025)

- 2025-02-25|**Wan-Video**
  - Description: Alibaba open-sources its Wan (萬相) large models, across modalities and model sizes
  - Resources: [🐙 GitHub](https://github.com/Wan-Video/Wan2.1) | [📝 Media Coverage](https://finance.sina.com.cn/jjxw/2025-02-26/doc-inemukxr9127437.shtml)

- 2025-02-14|**FlashVideo**
  - Description: ByteDance video enhancement algorithm; generates 1080p video in 102 seconds
  - Resources: [🐙 GitHub](https://github.com/FoundationVision/FlashVideo) | [📝 Explainer](https://zhuanlan.zhihu.com/p/23702953115)

- 2025-01-28|**Sana (ICLR 2025 Oral)**
  - Description: Open-sourced by NVIDIA, MIT, and Tsinghua University; claimed to be 100x faster than FLUX
  - Resources: [🐙 GitHub](https://github.com/NVlabs/Sana) | [📝 Chinese Explainer](https://zhuanlan.zhihu.com/p/19489214543)

- **Flux & Ecosystem**
  - **Flux Models**: [🤗 Black Forest Labs](https://huggingface.co/black-forest-labs)
    - [Canny-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Canny-dev) | [Depth-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Depth-dev) | [Fill-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Fill-dev) | [Redux-dev](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Redux-dev)
  - **PuLID (2024-11-29)**: [🐙 GitHub](https://github.com/ToTheBeginning/PuLID) | [📝 ComfyUI Tutorial](https://mp.weixin.qq.com/s/07BMFHaSasl7-PFtkN6_Zg)
  - **Leffa (2024-12-17)**: [🐙 GitHub](https://github.com/franciszzj/Leffa) | [📝 Meta AI: Preserving Person Identity Features](https://juejin.cn/post/7449325873725276196)
  - **MagicQuill (2024-11-26)**: [🐙 GitHub](https://github.com/magic-quill/MagicQuill) | [🤗 Space](https://huggingface.co/spaces/AI4Editing/MagicQuill) | [📝 An AI Photo-Editing Tool](https://mp.weixin.qq.com/s/Pc3xRP8_9BxkVSRNznkplw)

- **Practical Tools (ComfyUI & Others)**
  - **OOTDiffusion**: [🐙 GitHub](https://github.com/levihsu/OOTDiffusion) | [📝 An AI Virtual Try-On Tool](https://mp.weixin.qq.com/s/B2rNCjJLo8coYzoHGPnVaw)
  - **ComfyUI Impact Pack**: [🐙 GitHub](https://github.com/ltdrdata/ComfyUI-Impact-Pack) | [📝 The Strongest Face Restoration](https://mp.weixin.qq.com/s/hNQ9BfdGbRQ_Osus-yMJWg)
  - **OmniGen**: [🐙 GitHub](https://github.com/AIFSH/OmniGen-ComfyUI) | [📝 All-in-One Image Generation](https://mp.weixin.qq.com/s/msGK0FmNs3T3jbUBHfR9DA)

---

## Digital Human
**Digital Human (虛擬數字人)**

- **Open Avatar Chat**
  - Resources: [📝 Project Overview](https://zread.ai/HumanAIGC-Engineering/OpenAvatarChat) | [📝 A Viral GitHub Tool: Local Deployment, No Strings Attached](https://mp.weixin.qq.com/s/eNRbU4lZLgdpe_iNSNcfGA)

- **HeyGem**
  - Resources: [🐙 GitHub](https://github.com/GuijiAI/HeyGem.ai) | [📝 A Digital Human Cloning Tool](https://zhuanlan.zhihu.com/p/29274862393)

- **Duix**
  - Resources: [🐙 GitHub](https://github.com/GuijiAI/duix.ai) | [📝 The World's First Open-Source Photorealistic Digital Human](https://zhuanlan.zhihu.com/p/716583514)

- **Linly-Talker**
  - Description: An intelligent interactive system combining LLMs with vision models
  - Resources: [🐙 GitHub](https://github.com/Kedreamix/Linly-Talker)

- **CVPR 2025 / NeurIPS Resources**
  - **EchoMimicV2 (CVPR 2025)**: [🐙 GitHub](https://github.com/antgroup/echomimic_v2) - Striking, Simplified Human Animation.
  - **Hallo3 (CVPR 2025)**: [🐙 GitHub](https://github.com/fudan-generative-vision/hallo3) - Highly Dynamic Portrait Animation.
  - **MimicTalk (NeurIPS 2024)**: [🐙 GitHub](https://github.com/yerfor/MimicTalk) - 3D talking face.

- **Other Tools**
  - **JoyGen**: [🐙 GitHub](https://github.com/JOY-MM/JoyGen) (Audio-Driven 3D Editing)
  - **LatentSync**: [🐙 GitHub](https://github.com/bytedance/LatentSync)
  - **MuseTalk**: [🐙 GitHub](https://github.com/TMElyralab/MuseTalk)

---

## Image Recognition
**Image Recognition (圖像識別)**

- **ViT (Vision Transformer)**
  - Resources: [🐙 GitHub](https://github.com/google-research/vision_transformer) | [📝 Analysis](https://zhuanlan.zhihu.com/p/445122996) | [📝 Transfer Performance Analysis](https://zhuanlan.zhihu.com/p/463608959)

- **Swin Transformer**
  - Resources: [🐙 GitHub](https://github.com/microsoft/Swin-Transformer) | [📝 Beating CNNs the CNN Way](https://zhuanlan.zhihu.com/p/362690149)

- **EfficientNetV2**
  - Resources: [🐙 GitHub](https://github.com/d-li14/efficientnetv2.pytorch) | [📝 Smaller Models, Faster Training](https://zhuanlan.zhihu.com/p/361873583)

---

## Document AI
**Document Understanding & OCR (文檔理解與文字識別)**

- **Donut (2022)**: OCR-free Document Understanding Transformer. [📄 arXiv:2111.15664](./donut.md)
- **LayoutParser (2021)**: Unified toolkit for Deep Learning Based Document Analysis. [📄 arXiv:2103.15348](./LayoutParser.md)
- **TrOCR (2021)**: Transformer-based OCR with Pre-trained Models. [📄 arXiv:2109.10282](./TrOCR.md)
- **DiT (2022)**: Self-supervised Pre-training for Document Image Transformer. [📄 arXiv:2203.02378](./DiT.md)
- **Nougat (2023)**: Neural Optical Understanding for Academic Documents. [📄 arXiv:2308.13418](https://facebookresearch.github.io/nougat/)

- **📚 LayoutLM Series**
  - **LayoutLM (2020)**: Pre-training of Text and Layout. [📄 arXiv:1912.13318](./LayoutLM.md)
  - **LayoutLMv2 (2021)**: Multi-modal Pre-training. [📄 arXiv:2012.14740](./LayoutLMv2.md)
  - **LayoutXLM (2021)**: Multilingual Visually-rich Document Understanding. [📄 arXiv:2104.08836](./LayoutXLM.md)
  - **LayoutLMv3 (2022)**: Pre-training with Unified Text and Image Masking. [📄 arXiv:2204.08387](./LayoutLMv3.md)

- **Scene Text Recognition**
  - **ABINet (2021)**: Read Like Humans. [📄 arXiv:2103.06495](./ABINet.md)
  - **ABINet++ (2022)**: Iterative Language Modeling for Text Spotting. [📄 arXiv:2211.10578](./ABINet%2B%2B.md)
  - **ABCNet v2 (2021)**: Adaptive Bezier-Curve Network. [📄 arXiv:2105.03620](./ABCNet_v2.md)
  - **SVTR (2022)**: Scene Text Recognition with a Single Visual Model. [📄 arXiv:2205.00159](./SVTR.md)

---

## DeepFake Detection
**DeepFake Detection (深度偽造偵測)**

- **Multi-attentional Deepfake Detection (CVPR 2021)**
  - H. Zhao et al., Proceedings of the IEEE/CVF CVPR 2021.

- **Geometric Features (CVPR 2021)**
  - Improving Efficiency and Robustness through Precise Geometric Features. Z. Sun et al.

- **3D Decomposition (CVPR 2021)**
  - Face Forgery Detection by 3D Decomposition. X. Zhu et al.