{"id":28722377,"url":"https://github.com/docling-project/docling-core","last_synced_at":"2026-04-07T14:03:01.788Z","repository":{"id":248441762,"uuid":"827882798","full_name":"docling-project/docling-core","owner":"docling-project","description":"Docling core data types and transformations","archived":false,"fork":false,"pushed_at":"2026-04-01T15:13:39.000Z","size":110467,"stargazers_count":240,"open_issues_count":74,"forks_count":148,"subscribers_count":8,"default_branch":"main","last_synced_at":"2026-04-01T16:24:29.564Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/docling-project.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":"MAINTAINERS.md","copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-07-12T15:29:22.000Z","updated_at":"2026-04-01T15:13:47.000Z","dependencies_parsed_at":"2024-09-17T16:34:01.883Z","dependency_job_id":"d1d1c538-de2b-43fa-9fe7-ad783122bc72","html_url":"https://github.com/docling-project/docling-core","commit_stats":null,"previous_names":["ds4sd/docling-core","docling-project/docling-core"],"tags_count":146,"template":false,"template_full_name":null,"purl":"pkg:github/docling-project/docling-core","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docling-project%2Fdocling-core","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docling-project%2Fdocling-core/tags","releases_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/docling-project%2Fdocling-core/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docling-project%2Fdocling-core/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/docling-project","download_url":"https://codeload.github.com/docling-project/docling-core/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docling-project%2Fdocling-core/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31515152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-15T08:09:09.881Z","updated_at":"2026-04-07T14:03:01.782Z","avatar_url":"https://github.com/docling-project.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Docling Core\n\n[![PyPI 
version](https://img.shields.io/pypi/v/docling-core)](https://pypi.org/project/docling-core/)\n![Python](https://img.shields.io/badge/python-3.10%20%7C%20%203.11%20%7C%203.12%20%7C%203.13%20%7C%203.14-blue)\n[![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![Checked with mypy](https://www.mypy-lang.org/static/mypy_badge.svg)](https://mypy-lang.org/)\n[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://pydantic.dev)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026logoColor=white)](https://github.com/pre-commit/pre-commit)\n[![License MIT](https://img.shields.io/github/license/docling-project/docling-core)](https://opensource.org/licenses/MIT)\n\nDocling Core is a library that defines core data types and transformations in [Docling](https://github.com/docling-project/docling).\n\n## Installation\n\nTo use Docling Core, simply install `docling-core` from your package manager, e.g. pip:\n```bash\npip install docling-core\n```\n\n### Development setup\n\nTo develop for Docling Core, you need Python 3.10 through 3.14 and the `uv` package. 
You can then install it from your local clone's root directory:\n```bash\nuv sync --all-extras\n```\n\nTo run the pytest suite, execute:\n```\nuv run pytest -s test\n```\n\n## Main features\n\nDocling Core provides the foundational DoclingDocument data model and API, as well as\nadditional APIs for tasks like serialization and chunking, which are key to developing\ngenerative AI applications using Docling.\n\n### DoclingDocument\n\nDocling Core defines the DoclingDocument as a Pydantic model, allowing for advanced\ndata model control, customizability, and interoperability.\n\nIn addition to specifying the schema, it provides a handy API for building documents,\nas well as for basic operations, e.g. exporting to various formats, like Markdown, HTML,\nand others.\n\n👉 More details:\n- [Architecture docs](https://docling-project.github.io/docling/concepts/architecture/)\n- [DoclingDocument docs](https://docling-project.github.io/docling/concepts/docling_document/)\n\n### Serialization\n\nDifferent users can have varying requirements when it comes to serialization.\nTo address this, the Serialization API introduces a design that allows easy extension,\nwhile providing feature-rich built-in implementations (on which the respective\nDoclingDocument helpers are actually based).\n\n👉 More details:\n- [Serialization docs](https://docling-project.github.io/docling/concepts/serialization/)\n- [Serialization example](https://docling-project.github.io/docling/examples/serialization/)\n\n### Chunking\n\nSimilarly to above, the Chunking API provides built-in chunking capabilities as well as\na design that enables easy extension, this way tackling customization requirements of\ndifferent use cases.\n\n👉 More details:\n- [Chunking docs](https://docling-project.github.io/docling/concepts/chunking/)\n- [Hybrid chunking example](https://docling-project.github.io/docling/examples/hybrid_chunking/)\n- [Advanced chunking and 
serialization](https://docling-project.github.io/docling/examples/advanced_chunking_and_serialization/)\n\n### Profiling\n\nThe Profiling API enables extraction of comprehensive statistics from DoclingDocument objects,\nboth for individual documents and collections. It provides metrics on document structure\n(pages, tables, pictures, text items) along with statistical distributions (deciles, histograms)\nand visualization capabilities for analyzing document collections at scale.\n\n👉 More details:\n- [Document profiling example](./examples/document_profiling.py)\n- [Collection statistics visualization](./examples/visualize_collection_stats.py)\n\n## Contributing\n\nPlease read [Contributing to Docling Core](./CONTRIBUTING.md) for details.\n\n## References\n\nIf you use Docling Core in your projects, please consider citing the following:\n\n```bib\n@techreport{Docling,\n  author = \"Deep Search Team\",\n  month = 8,\n  title = \"Docling Technical Report\",\n  url = \"https://arxiv.org/abs/2408.09869\",\n  eprint = \"2408.09869\",\n  doi = \"10.48550/arXiv.2408.09869\",\n  version = \"1.0.0\",\n  year = 2024\n}\n```\n\n## License\n\nThe Docling Core codebase is released under the MIT license.\nFor individual model usage, please refer to the model licenses found in the original packages.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocling-project%2Fdocling-core","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocling-project%2Fdocling-core","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocling-project%2Fdocling-core/lists"}