An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by opendatalab

A curated list of projects in awesome lists by opendatalab.

https://github.com/opendatalab/mineru

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 06 Jan 2026

https://github.com/opendatalab/MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

ai4science document-analysis extract-data layout-analysis ocr parser pdf pdf-converter pdf-extractor-llm pdf-extractor-pretrain pdf-extractor-rag pdf-parser python

Last synced: 24 Mar 2025

https://github.com/opendatalab/pdf-extract-kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Last synced: 13 May 2025

https://github.com/opendatalab/PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Last synced: 06 May 2025

https://github.com/opendatalab/labelu

Data annotation toolbox that supports image, audio, and video data.

Last synced: 14 May 2025

https://github.com/opendatalab/labelU

Data annotation toolbox that supports image, audio, and video data.

Last synced: 21 Apr 2025

https://github.com/opendatalab/labelllm

The Open-Source Data Annotation Platform

Last synced: 15 May 2025

https://github.com/opendatalab/WanJuan1.0

WanJuan 1.0 multimodal corpus

Last synced: 20 Apr 2025

https://github.com/opendatalab/wanjuan1.0

WanJuan 1.0 multimodal corpus

Last synced: 10 Apr 2025

https://github.com/opendatalab/unimernet

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

Last synced: 15 May 2025

https://github.com/opendatalab/doclayout-yolo

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Last synced: 12 Apr 2025

https://github.com/opendatalab/DocLayout-YOLO

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Last synced: 18 Mar 2025

https://github.com/opendatalab/loki

[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

Last synced: 05 Apr 2025

https://github.com/opendatalab/opendatalab-datasets

datasets resource

Last synced: 12 Aug 2025

https://github.com/opendatalab/labelu-kit

Data annotation component library, provided as NPM packages

Last synced: 07 Oct 2025

https://github.com/opendatalab/vigc

AAAI 2024: Visual Instruction Generation and Correction

Last synced: 06 Apr 2025

https://github.com/opendatalab/ha-dpo

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Last synced: 10 Jun 2025

https://github.com/opendatalab/vhm

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Last synced: 18 Jul 2025

https://github.com/opendatalab/VHM

VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis

Last synced: 06 Apr 2025

https://github.com/opendatalab/clip-parrot-bias

[ECCV 2024] Parrot Captions Teach CLIP to Spot Text

Last synced: 10 Sep 2025

https://github.com/opendatalab/mls-brn

[CVPR 2024] 3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

Last synced: 06 Apr 2025

https://github.com/opendatalab/opendatalab-python-sdk

SDK of OpenDataLab - https://opendatalab.org.cn

Last synced: 20 Aug 2025

https://github.com/opendatalab/mineru-vl-utils

A Python package for interacting with the MinerU Vision-Language Model.

mineru utils vlm

Last synced: 24 Dec 2025

https://github.com/opendatalab/fakevlm

FakeVLM: Advancing Synthetic Image Detection through Explainable Multimodal Models and Fine-Grained Artifact Analysis

Last synced: 25 Jun 2025

https://github.com/opendatalab/dsdl-docs

Data Set Description Language Specification (DSDL, a next-generation description language for AI datasets)

Last synced: 01 Jul 2025

https://github.com/opendatalab/skydiffusion

The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”

Last synced: 13 Apr 2025

https://github.com/opendatalab/mllm-dataengine

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Last synced: 06 Apr 2025

https://github.com/opendatalab/charm

[ACL 2024 Main Conference] Chinese commonsense benchmark for LLMs

Last synced: 14 Oct 2025

https://github.com/opendatalab/provergen

[ICLR 2025] This is the official implementation for the paper: "Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation"

Last synced: 10 Jul 2025

https://github.com/opendatalab/miner-pdf-benchmark

MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.

Last synced: 06 Apr 2025

https://github.com/opendatalab/rest

Last synced: 28 Aug 2025

https://github.com/opendatalab/crossviewdiff

The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"

Last synced: 13 Apr 2025

https://github.com/opendatalab/wanjuan2.0-wanjuan-cc

WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.

Last synced: 10 Apr 2025

https://github.com/opendatalab/labelu-frontend

LabelU front-end library

Last synced: 06 Apr 2025

https://github.com/opendatalab/allz

A universal command line tool for compression and decompression

Last synced: 20 Aug 2025

https://github.com/opendatalab/vis3

OSS browser based on S3.

s3 s3-file-browser s3-visulization s3browser

Last synced: 24 Jun 2025

https://github.com/opendatalab/meta-rater

[ACL 2025] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"

Last synced: 24 Jun 2025

https://github.com/opendatalab/.github

Last synced: 08 Aug 2025