Projects in Awesome Lists by TencentARC

[CVPR 2025] Official code of "DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation"

Last synced: 25 Jun 2025

https://github.com/tencentarc/umt

UMT is a unified and flexible framework that handles different combinations of input modalities and outputs video moment retrieval and/or highlight detection results.

Last synced: 21 Jul 2025

https://github.com/tencentarc/vit-lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations

multimodal-learning

Last synced: 04 Apr 2025

https://github.com/tencentarc/mm-realsr

Code for "Metric Learning based Interactive Modulation for Real-World Super-Resolution"

Last synced: 05 Apr 2025

https://github.com/tencentarc/st-llm

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

large-language-models video-language-model video-understanding

Last synced: 08 Oct 2025

https://github.com/tencentarc/mcq

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Last synced: 05 Apr 2025

https://github.com/tencentarc/desra

Official code for DeSRA (ICML 2023)

Last synced: 05 Apr 2025

https://github.com/tencentarc/stereocrafter

A framework to convert any 2D videos to immersive stereoscopic 3D

Last synced: 25 Jun 2025

https://github.com/tencentarc/faig

Code for "Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution" (NeurIPS 2021, Spotlight)

Last synced: 05 Apr 2025

https://github.com/TencentARC/ArcNerf

NeRF and its extensions, all in one

3d deep-learning graphics instant-ngp nerf neural-rendering pytorch reconstruction rendering

Last synced: 02 May 2025

https://github.com/tencentarc/arcnerf

NeRF and its extensions, all in one

3d deep-learning graphics instant-ngp nerf neural-rendering pytorch reconstruction rendering

Last synced: 05 Apr 2025

https://github.com/tencentarc/moto

Latent Motion Token as the Bridging Language for Robot Manipulation

Last synced: 11 Oct 2025

https://github.com/tencentarc/mllm-npu

mllm-npu: training multimodal large language models on Ascend NPUs

Last synced: 17 Jun 2025

https://github.com/tencentarc/seed-bench-r1

Last synced: 14 Feb 2026

https://github.com/tencentarc/blobctrl

[arXiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

aigc image-editing

Last synced: 25 Jun 2025

https://github.com/tencentarc/surfelnerf

SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes

Last synced: 22 Jan 2026

https://github.com/tencentarc/repsr

Code for "RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization"

Last synced: 16 Mar 2026

https://github.com/tencentarc/di-pcg

Code release of our paper "DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation".

Last synced: 25 Jun 2025

https://github.com/tencentarc/hosnerf

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

Last synced: 05 Apr 2025

https://github.com/tencentarc/divot

Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)

Last synced: 25 Jun 2025

https://github.com/tencentarc/fastrealvsr

Code for "Mitigating Artifacts in Real-World Video Super-Resolution Models"

Last synced: 02 Feb 2026

https://github.com/tencentarc/conmim

Official code for ConMIM (ICLR 2023)

Last synced: 05 Apr 2025

https://github.com/tencentarc/gvt

Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".

Last synced: 05 Apr 2025

https://github.com/tencentarc/freesplatter

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Last synced: 25 Jun 2025

https://github.com/tencentarc/video-holmes

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Last synced: 25 Jun 2025

https://github.com/tencentarc/tvts

Turning to Video for Transcript Sorting

Last synced: 05 Apr 2025

https://github.com/tencentarc/bebr

Official code for "Binary embedding based retrieval at Tencent"

Last synced: 03 Sep 2025

https://github.com/tencentarc/mindomni

Last synced: 25 Jun 2025

https://github.com/tencentarc/visft

Last synced: 09 Mar 2026

https://github.com/tencentarc/pi-tuning

Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.

Last synced: 05 Apr 2025

https://github.com/tencentarc/flm

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

language-modeling vision-language-pretraining

Last synced: 05 Apr 2025

Last synced: 22 Jul 2025