https://github.com/xiaoyao9184/docker-marker
Docker implementation of the Marker pdf to markdown
cuda-support docker-image markdown-export marker ocr pdf-extractor
- Host: GitHub
- URL: https://github.com/xiaoyao9184/docker-marker
- Owner: xiaoyao9184
- License: apache-2.0
- Created: 2024-11-25T12:16:36.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-12-25T12:50:56.000Z (about 1 year ago)
- Last Synced: 2025-01-03T15:38:53.550Z (about 1 year ago)
- Topics: cuda-support, docker-image, markdown-export, marker, ocr, pdf-extractor
- Language: Python
- Homepage:
- Size: 41 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Docker Marker
A Docker image for Marker, built with GitHub Actions and tagged by Git commit or release version.
- Commit-tagged image: [workflow](https://github.com/xiaoyao9184/docker-marker/actions/workflows/docker-image-tag-commit.yml), [Docker Hub](https://hub.docker.com/r/xiaoyao9184/marker)
- Version-tagged image: [workflow](https://github.com/xiaoyao9184/docker-marker/actions/workflows/docker-image-tag-version.yml), [Docker Hub](https://hub.docker.com/r/xiaoyao9184/marker)
- Model sync: [workflow](https://github.com/xiaoyao9184/docker-marker/actions/workflows/hf-model-sync.yml), [Hugging Face collection](https://huggingface.co/collections/xiaoyao9184/surya-and-marker-68635abc74f33ef5d5be792d)
- Space sync: [workflow](https://github.com/xiaoyao9184/docker-marker/actions/workflows/hf-space-sync.yml), [Hugging Face Space](https://huggingface.co/spaces/xiaoyao9184/marker)
# Why
A Docker image for Marker is hard to find.
The code on [GitHub](https://github.com/VikParuchuri/marker) does not provide a pre-built Docker image.
After reviewing the discussion on [linux.do](https://linux.do/t/topic/239082), I set up this project to build and publish images with GitHub Actions and Docker Hub,
aiming to keep the process as clean as possible without custom configuration files.
# Tags
The images of this project will be published to Docker Hub under the repository [xiaoyao9184/marker](https://hub.docker.com/r/xiaoyao9184/marker).
Since this project references the Marker project via a submodule, it cannot monitor push events on the Marker project, and therefore cannot automatically create an image for every commit.
The workaround is to trigger the action manually and tag the image with the commit id. For more details, see the article [set-dynamic-parameters-github-workflows-en](https://damienaicheh.github.io/github/actions/2022/01/20/set-dynamic-parameters-github-workflows-en.html).
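A manual trigger can also be sent through GitHub's REST API instead of the web UI. The sketch below only builds the `workflow_dispatch` request and does not send it. It assumes the workflow accepts a `commit_id` input (described later in this section), that `master` is the ref to run the workflow from, and that you supply your own token; the `build_dispatch_request` helper is hypothetical and not part of this project.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) a workflow_dispatch request for a workflow file."""
    url = (
        f"{API_ROOT}/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # "ref" is the branch or tag of THIS repository that holds the workflow file;
    # "inputs" carries the workflow's own input parameters.
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_dispatch_request(
    owner="xiaoyao9184",  # use your own account when working on a fork
    repo="docker-marker",
    workflow_file="docker-image-tag-commit.yml",
    ref="master",
    inputs={"commit_id": "master"},  # a branch name or a commit id of Marker
    token="<personal-access-token>",  # placeholder, replace before sending
)
print(request.get_method(), request.full_url)
# Sending is left to the caller: urllib.request.urlopen(request)
```

On a fork, the token needs permission to run Actions workflows in that repository.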
The default image name format is `${DOCKERHUB_USERNAME}/marker`.
When the [docker-image-tag-commit](./.github/workflows/docker-image-tag-commit.yml) job is triggered manually, the tag is taken from the input parameter `commit_id`, which can be either a branch name or a commit id.
If the job is triggered by a submodule update push, the default branch name `master` is used instead of the `commit_id` parameter.
The job also tags the image with the shortened commit id.
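The tag selection described above can be sketched as a small function. This is an illustration of the rules as stated here, not the workflow's actual code: the `image_tags` helper is hypothetical, the event names follow GitHub Actions conventions (`workflow_dispatch` for manual runs, `push` for pushes), and the 7-character short id is an assumption based on Git's usual abbreviation length.

```python
def image_tags(event_name, resolved_sha, commit_id=None,
               default_branch="master", short_length=7):
    """Return the tags an image would get under the rules described above."""
    # Manual runs use the commit_id input; push-triggered runs fall back
    # to the default branch name.
    if event_name == "workflow_dispatch" and commit_id:
        primary = commit_id
    else:
        primary = default_branch

    # The shortened commit id is published as an additional tag.
    short = resolved_sha[:short_length]
    tags = [primary]
    if short != primary:
        tags.append(short)
    return tags


sha = "0123456789abcdef0123456789abcdef01234567"  # made-up commit id
print(image_tags("push", sha))
print(image_tags("workflow_dispatch", sha, commit_id="dev"))  # "dev" is a made-up branch
```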
If the job [docker-image-tag-version](./.github/workflows/docker-image-tag-version.yml) is triggered with the `marker_version` parameter set to the PyPI Marker version number,
the Marker package published on PyPI will be installed for the build,
and `marker_version` will be used as the tag.
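For the version-tagged job, the relationship between `marker_version`, the installed package, and the image tag can be sketched as follows. The `version_build_plan` helper is hypothetical, `marker-pdf` is assumed to be the distribution name Marker uses on PyPI, and the version number is only an example value.

```python
def version_build_plan(dockerhub_username, marker_version):
    """Describe what a version-tagged build would install and publish."""
    return {
        # Pinned requirement that the build would install from PyPI.
        "pip_requirement": f"marker-pdf=={marker_version}",
        # Image reference: default image name plus the version as the tag.
        "image": f"{dockerhub_username}/marker:{marker_version}",
    }


plan = version_build_plan("xiaoyao9184", "1.0.0")  # example version only
print(plan["pip_requirement"])
print(plan["image"])
```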
Currently, only the `linux/amd64` platform is supported.
# Model
The models of this project will be synced to HuggingFace under the collection [xiaoyao9184/surya-and-marker](https://huggingface.co/collections/xiaoyao9184/surya-and-marker-68635abc74f33ef5d5be792d).
The Docker image does not include model files.
When running, the required models will be automatically downloaded.
If you need to run offline, you must pre-download the model files and enable offline mode.
See [cache/README.md](./cache/README.md) for detailed instructions.
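As a rough illustration of what an offline run might look like, the sketch below assembles a `docker run` command that mounts a pre-downloaded cache and sets an offline flag. Everything specific in it is an assumption: the `offline_run_command` helper is hypothetical, `HF_HUB_OFFLINE=1` is the Hugging Face Hub offline switch and may not be the one this image relies on, and the container cache path is a guess. The instructions in `cache/README.md` take precedence.

```python
import shlex


def offline_run_command(image, host_cache_dir,
                        container_cache_dir="/root/.cache/huggingface"):
    """Assemble (but do not run) a docker command for offline use."""
    return [
        "docker", "run", "--rm",
        # Assumed offline switch; check cache/README.md for the real one.
        "-e", "HF_HUB_OFFLINE=1",
        # Mount the pre-downloaded model files into the assumed cache path.
        "-v", f"{host_cache_dir}:{container_cache_dir}",
        image,
    ]


cmd = offline_run_command("xiaoyao9184/marker:master", "/data/marker-cache")
print(shlex.join(cmd))
```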
# Service
By default, the Docker container runs the Streamlit App, which comes from the original project.
However, this project also provides a Gradio App, a functional reimplementation of the Streamlit version.
The Gradio App supports both a UI and API interface, and can even serve as an MCP server,
so it is recommended as the preferred option.
The source code for the Gradio App is located in the [gradio](./gradio) directory of this project.
A demo of this project is also available and auto-synced on Hugging Face Spaces: [xiaoyao9184/marker](https://huggingface.co/spaces/xiaoyao9184/marker)
To run the Gradio App, override the Docker command. See the `up.gradio` sub-directory in the [docker](./docker) directory for details.
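Because the Gradio App exposes an API, its endpoints can be discovered from Python with the `gradio_client` package. The sketch below assumes the demo Space is running and that `gradio_client` is installed; the `describe_space_api` helper is hypothetical. It deliberately avoids naming any endpoint, since the endpoint names are not documented here.

```python
SPACE_ID = "xiaoyao9184/marker"  # the demo Space linked above


def describe_space_api(space_id=SPACE_ID):
    """Connect to a Gradio app and print the endpoints it exposes."""
    # Imported lazily so this module loads even without gradio_client installed.
    from gradio_client import Client

    # Client also accepts the URL of a local container,
    # for example "http://localhost:7860" if that is where the app listens.
    client = Client(space_id)
    client.view_api()  # prints endpoint names, parameters, and return types
    return client
```

Calling `describe_space_api()` requires network access. The returned client can then be used with `client.predict(...)` once the endpoint names and parameters are known from the printed listing.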
# Change
You can fork this project and build your own image. To do so, set the following variables in your fork: `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN`, `HF_USERNAME`, `HF_TOKEN`.
See the [docker/login-action documentation](https://github.com/docker/login-action#docker-hub) for details on the Docker Hub credentials.
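If you test a build locally before pushing to a fork, a quick check that all four values are present can save a failed run. This is a minimal sketch with a hypothetical `missing_variables` helper; it reads from whatever mapping you pass in, such as `os.environ`.

```python
import os

REQUIRED_VARIABLES = (
    "DOCKERHUB_USERNAME",
    "DOCKERHUB_TOKEN",
    "HF_USERNAME",
    "HF_TOKEN",
)


def missing_variables(env):
    """Return the required names that are unset or empty in env."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_variables(os.environ)
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All required variables are set.")
```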