{"id":27700972,"url":"https://github.com/mathpix/mathpix-on-prem","last_synced_at":"2026-01-24T03:31:14.308Z","repository":{"id":272827872,"uuid":"911312521","full_name":"Mathpix/mathpix-on-prem","owner":"Mathpix","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-25T14:42:43.000Z","size":51,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-25T19:14:49.087Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HCL","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mathpix.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-02T18:11:00.000Z","updated_at":"2025-04-25T14:42:47.000Z","dependencies_parsed_at":"2025-01-16T21:39:11.936Z","dependency_job_id":"4e9c66b0-7393-4261-8091-8222de962ff0","html_url":"https://github.com/Mathpix/mathpix-on-prem","commit_stats":null,"previous_names":["mathpix/mathpix-on-prem"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mathpix%2Fmathpix-on-prem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mathpix%2Fmathpix-on-prem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mathpix%2Fmathpix-on-prem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mathpix%2Fmathpix-on-prem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mathpix","download_url":"https://codeload.github
.com/Mathpix/mathpix-on-prem/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250878893,"owners_count":21501743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-25T19:14:49.658Z","updated_at":"2026-01-24T03:31:14.262Z","avatar_url":"https://github.com/Mathpix.png","language":"HCL","readme":"# Mathpix on-premise\n\n- [Prerequisites](#prerequisites)\n- [API](#api)\n  - [Updating files](#updating-files)\n    - [Adding your Mathpix on-prem license](#adding-your-mathpix-on-prem-license)\n    - [Setting up initial credentials](#setting-up-initial-credentials)\n    - [Replacing the docker images (if not using AWS ECR)](#replacing-the-docker-images)\n  - [Deploying Mathpix on-prem](#deploying-mathpix-on-prem)\n  - [How to](#how-to)\n    - [Update the docker images](#update-the-docker-images)\n    - [Update Mathpix on-prem license](#update-mathpix-on-prem-license)\n    - [Update API credentials](#update-api-credentials)\n    - [Scale the Mathpix API](#scale-the-mathpix-api)\n- [SCS](#scs)\n  - [Publisher](#publisher)\n  - [Consumer](#consumer)\n  - [GCP GKE Cluster Setup](#gcp-gke-cluster-setup)\n  - [RabbitMQ Broker](#rabbitmq-broker)\n\n## Prerequisites\n\n| Requirement | Google Kubernetes Engine (GKE) | AWS EKS |\n|-------------|-----------------------|---------|\n| A Kubernetes cluster | [Create a cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster) | [Create a cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-cluster.html) |\n| Installed `kubectl` version 1.30 or higher | [Install 
kubectl](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#generate_kubeconfig_entry) | [Install kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) |\n| Nodes with NVIDIA GPUs and drivers | [GPU Drivers](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/google-gke.html) | [GPU Drivers](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/amazon-eks.html) |\n\nAfter configuring `kubectl` to connect to your cluster, confirm that GPU nodes are available by running:\n\n```\nkubectl describe nodes\n```\n\nYou should see the `nvidia.com/gpu` resource on your GPU nodes, for example:\n\n```\nAllocated resources:\n  Resource           Requests       Limits\n  --------           --------       ------\n  ...\n  nvidia.com/gpu     1              1\n```\n\n## API\n\n### Updating files\n\n#### Adding your Mathpix on-prem license\n\nFirst, copy `kubernetes-manifests/api/mathpix/mathpix.env.example` to `kubernetes-manifests/api/mathpix/mathpix.env` and add your `MATHPIX_ON_PREM_LICENSE` to it:\n\n```\ncp kubernetes-manifests/api/mathpix/mathpix.env.example kubernetes-manifests/api/mathpix/mathpix.env\n# Now open kubernetes-manifests/api/mathpix/mathpix.env\n# and replace REPLACE_WITH_YOUR_LICENSE with your license:\n# MATHPIX_ON_PREM_LICENSE=REPLACE_WITH_YOUR_LICENSE\n```\n\n#### Setting up initial credentials\n\nUpdate the [kubernetes-manifests/api/jobs/update-credentials/credentials.json](kubernetes-manifests/api/jobs/update-credentials/credentials.json) file with the credentials you want to use to access the Mathpix on-prem OCR API. 
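\n\nIf `credentials.json` is malformed, the problem will probably only show up later as a failed `mathpix-update-credentials` job. As a minimal sketch, assuming `python3` is installed on your workstation (the `validate_json` helper is hypothetical, not part of this repository), you can check the file before deploying:\n\n```shell\n# Hypothetical helper: returns non-zero if the given file is not valid JSON.\nvalidate_json() {\n  python3 -m json.tool \"$1\" > /dev/null\n}\n\nvalidate_json kubernetes-manifests/api/jobs/update-credentials/credentials.json && echo 'credentials.json is valid JSON'\n```\n\n`python3 -m json.tool` is used here only because it ships with Python; `jq empty` would work equally well if you have `jq` installed.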
\n\n#### Replacing the docker images\n\nTo change the Docker images, update the image references in these files:\n\n- [kubernetes-manifests/api/mathpix/kustomization.yaml](kubernetes-manifests/api/mathpix/kustomization.yaml)\n- [kubernetes-manifests/api/jobs/kustomization.yaml](kubernetes-manifests/api/jobs/kustomization.yaml)\n- [kubernetes-manifests/api/jobs/update-credentials/kustomization.yaml](kubernetes-manifests/api/jobs/update-credentials/kustomization.yaml)\n\n**Note:** If your AWS account has not been granted access to pull images from our AWS ECR, update the images to point to a registry your cluster can access. If you're using Google Cloud Platform or another Kubernetes cluster without access to ECR, copy our images into your Google Artifact Registry (or another registry your cluster can access) and reference those images in the kustomization files.\n\n### Deploying Mathpix on-prem\n\nTo create the full Mathpix on-prem deployment, first create its dependencies, such as PostgreSQL, Redis, MinIO and RabbitMQ:\n\n```\nkubectl apply -k ./kubernetes-manifests/api/dependencies\n```\n\nOnce they are running, start the jobs that migrate the database, seed it with credentials and create the storage buckets:\n\n```\n# Wait until all the dependencies are ready\n# (if this reports NotFound, the pods have not been created yet; re-run it after a few seconds)\nkubectl wait --for=condition=Ready pod/minio-0 pod/postgres-0 pod/rabbitmq-0 pod/redis-0 --timeout=240s\n\n# Apply the jobs that migrate the database, seed it with credentials and create the storage buckets\nkubectl apply -k ./kubernetes-manifests/api/jobs\n```\n\nOnce the jobs have completed, deploy the Mathpix OCR API:\n\n```\n# Wait for the jobs to complete\nkubectl wait --for=condition=complete job/mathpix-migrate-schema job/mathpix-update-credentials job/minio-init-buckets --timeout=180s\n\n# Start the Mathpix OCR API\nkubectl apply -k ./kubernetes-manifests/api/mathpix\n```\n\nTo remove the on-prem deployment, run:\n\n```\nkubectl 
delete -k ./kubernetes-manifests/api/dependencies\nkubectl delete -k ./kubernetes-manifests/api/jobs\nkubectl delete -k ./kubernetes-manifests/api/mathpix\n```\n\nThe Mathpix API takes a few minutes to start up. You can check its status with:\n\n```\nkubectl get pods\n# or\nkubectl wait --for=condition=ready pod -l app=mathpix-api --timeout=600s\n```\n\nTo get the hostname of the load balancer in front of the Mathpix service, run:\n\n```\nkubectl get svc mathpix-loadbalancer -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'\n```\n\nOn AWS the load balancer address is reported as a hostname; on GKE it is reported as an IP address instead, so use `{.status.loadBalancer.ingress[0].ip}` in this and the following commands.\n\nTo verify that the Mathpix service is running, run:\n\n```\nAPI_URL=$(kubectl get svc mathpix-loadbalancer -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')\ncurl -s $API_URL/region-health\n```\n\nTo send OCR requests with the default Mathpix on-prem credentials (substitute your own `app_id` and `app_key` if you changed them in `credentials.json`), run:\n\n```\nAPI_URL=$(kubectl get svc mathpix-loadbalancer -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')\n\n# Image\ncurl -X POST $API_URL/v3/text \\\n     -H 'app_id: mathpix-test-app-1' -H 'app_key: replace-with-your-app-key-1' -H 'Content-Type: application/json' \\\n     --data '{\"src\": \"https://mathpix-ocr-examples.s3.amazonaws.com/cases_hw.jpg\", \"math_inline_delimiters\": [\"$\", \"$\"], \"rm_spaces\": true}'\n\n# PDF\ncurl -X POST $API_URL/v3/pdf \\\n     -H 'app_id: mathpix-test-app-1' -H 'app_key: replace-with-your-app-key-1' -H 'Content-Type: application/json' \\\n     --data '{ \"url\": \"http://cs229.stanford.edu/notes2020spring/cs229-notes1.pdf\", \"conversion_formats\": {\"docx\": true, \"tex.zip\": true}}'\n```\n\n## How to\n\n### Update the docker images\n\nWhen we release a new image, update the image tag in [kubernetes-manifests/api/mathpix/kustomization.yaml](kubernetes-manifests/api/mathpix/kustomization.yaml) and then update the deployment with:\n\n```\nkubectl apply -k ./kubernetes-manifests/api/mathpix\n```\n\n### Update Mathpix on-prem license\n\nTo update the Mathpix on-prem license, modify the file `
kubernetes-manifests/api/mathpix/mathpix.env` and run:\n\n```\nkubectl apply -k ./kubernetes-manifests/api/mathpix\n```\n\n### Update API credentials\n\nTo update your API credentials, modify the file `kubernetes-manifests/api/jobs/update-credentials/credentials.json` and run:\n\n```\nkubectl apply -k ./kubernetes-manifests/api/jobs/update-credentials\n```\n\nKubernetes does not re-run a Job that has already completed, so if the `mathpix-update-credentials` job from an earlier run still exists, you may need to delete it (`kubectl delete job mathpix-update-credentials`) before re-applying.\n\n### Scale the Mathpix API\n\nYou can scale the Mathpix OCR API with `kubectl`:\n\n```\nkubectl scale deploy mathpix-api --replicas 3\n```\n\nOr change the replica count in [kubernetes-manifests/api/mathpix/kustomization.yaml](kubernetes-manifests/api/mathpix/kustomization.yaml) and re-apply it:\n\n```\n# After updating the replica count in kubernetes-manifests/api/mathpix/kustomization.yaml\nkubectl apply -k ./kubernetes-manifests/api/mathpix\n```\n\n## SCS\n\nTo deploy the secure conversions service (SCS), you'll need to update a few files in the `kubernetes-manifests/scs` directory.\n\nStart by creating a new overlay: copy `kubernetes-manifests/scs/overlays/example` to a new directory:\n\n```\ncp -r kubernetes-manifests/scs/overlays/example kubernetes-manifests/scs/overlays/your-overlay-name\n```\n\nNext, update these files to point to an SCS image that your cluster can access:\n\n- `kubernetes-manifests/scs/overlays/your-overlay-name/consumer/kustomization.yaml`\n- `kubernetes-manifests/scs/overlays/your-overlay-name/publisher/kustomization.yaml`\n\n```\nimages:\n  - name: external_image\n    newName: REPLACE_WITH_REGISTRY_REPO_IMAGE\n    newTag: REPLACE_WITH_REGISTRY_REPO_IMAGE_TAG\n```\n\n### Publisher\n\nTo deploy a publisher job, update the `kubernetes-manifests/scs/overlays/your-overlay-name/publisher/secrets.yaml` file with the correct values for the following secrets:\n\n- `AMQP_URL`\n- `STORAGE_ENDPOINT_URL`\n- `ACCESS_KEY_ID`\n- `SECRET_ACCESS_KEY`\n\nThen update the 
`kubernetes-manifests/scs/overlays/your-overlay-name/publisher/configmap.yaml` file with the correct values for the following ConfigMap keys:\n\n- `NAME`\n- `JOB_ID`\n- `INPUT_BUCKET`\n- `INPUT_FOLDER`\n- `OUTPUT_BUCKET`\n- `OUTPUT_FOLDER`\n\nOnce you've updated these files, deploy the SCS publisher with:\n\n```\nkubectl apply -k ./kubernetes-manifests/scs/overlays/your-overlay-name/publisher\n```\n\n### Consumer\n\nTo deploy the secure conversions consumer, you'll need to update a few files in your overlay's consumer directory, `kubernetes-manifests/scs/overlays/your-overlay-name/consumer`.\n\nUpdate the `kubernetes-manifests/scs/overlays/your-overlay-name/consumer/secrets.yaml` file with the correct values for the following secrets:\n\n- `AMQP_URL` - The connection string for the RabbitMQ cluster (see [RabbitMQ Broker](#rabbitmq-broker) below)\n- `STORAGE_ENDPOINT_URL` - The S3-compatible storage endpoint URL\n- `ACCESS_KEY_ID` - The access key ID for the S3-compatible storage endpoint\n- `SECRET_ACCESS_KEY` - The secret access key for the S3-compatible storage endpoint\n\nThen update the `kubernetes-manifests/scs/overlays/your-overlay-name/consumer/configmap.yaml` file with the correct values for the following ConfigMap keys:\n\n- `NAME`\n- `JOB_ID`\n\nOnce you've updated these files, deploy the SCS consumer with:\n\n```\nkubectl apply -k ./kubernetes-manifests/scs/overlays/your-overlay-name/consumer\n```\n\nYou can also change the consumer replica count (1 by default) in your consumer overlay's `kustomization.yaml`.\n\nThe default output formats are set in the `conversion_options.json` file:\n\n```json\n{\n  \"md\": {},\n  \"html\": {},\n  \"tex.zip\": {},\n  \"docx\": {}\n}\n```\n\nRemoving a field disables that output format. For example, to produce only `.md` files, use:\n\n```json\n{\n  \"md\": {}\n}\n```\n\n**Note:** 
All output folders contain `.lines.json`, `.lines.mmd.json`, and `.mmd` files, because these are the base output formats from which the other formats are derived.\n\n## GCP GKE Cluster Setup\n\nTo create a GKE cluster with the GPU operator and GPU nodes from scratch, you can use Terraform. See the [README.md](kubernetes-cluster-setup/gke/README.md) in the [`kubernetes-cluster-setup/gke/`](kubernetes-cluster-setup/gke/) directory for details.\n\n## RabbitMQ Broker\n\nThe only external dependency of the SCS is the RabbitMQ broker. This can be any RabbitMQ broker that supports the AMQP 0.9.1 protocol. For processing large files, the heartbeat should be disabled and `consumer_timeout` set to 3600000 ms (1 hour).\n\nYou can also create a RabbitMQ broker cluster in Kubernetes using the `rabbitmq-operator`; instructions and configuration are in the [rabbitmq-broker-setup](./rabbitmq-broker-setup/) directory.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathpix%2Fmathpix-on-prem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmathpix%2Fmathpix-on-prem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmathpix%2Fmathpix-on-prem/lists"}