{"id":15659956,"url":"https://github.com/fdawgs/docsmith","last_synced_at":"2025-10-19T09:14:38.003Z","repository":{"id":37820077,"uuid":"329618667","full_name":"Fdawgs/docsmith","owner":"Fdawgs","description":"RESTful API for converting clinical documents and files","archived":false,"fork":false,"pushed_at":"2025-01-01T03:58:41.000Z","size":25902,"stargazers_count":20,"open_issues_count":10,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-01T04:26:04.536Z","etag":null,"topics":["doc","docker","documents","docx","fastify","nhs","nodejs","pdf","pm2","rest","restful","rtf"],"latest_commit_sha":null,"homepage":"","language":"Rich Text Format","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Fdawgs.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"Fdawgs"}},"created_at":"2021-01-14T13:08:25.000Z","updated_at":"2025-01-01T03:57:36.000Z","dependencies_parsed_at":"2023-12-01T04:28:14.906Z","dependency_job_id":"95e8dc07-3a0a-473b-8bc5-5608c7a42a37","html_url":"https://github.com/Fdawgs/docsmith","commit_stats":{"total_commits":2437,"total_committers":6,"mean_commits":406.1666666666667,"dds":0.5207221994255231,"last_synced_commit":"fd17e8538f95d856e7ed1ca06bc9066d52a2a27c"},"previous_names":[],"tags_count":111,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fdawgs%2Fdocsmith","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fdawgs%2Fdocsmith/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/Fdawgs%2Fdocsmith/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Fdawgs%2Fdocsmith/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Fdawgs","download_url":"https://codeload.github.com/Fdawgs/docsmith/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232765086,"owners_count":18573251,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doc","docker","documents","docx","fastify","nhs","nodejs","pdf","pm2","rest","restful","rtf"],"created_at":"2024-10-03T13:19:40.802Z","updated_at":"2025-10-19T09:14:32.976Z","avatar_url":"https://github.com/Fdawgs.png","language":"Rich Text Format","funding_links":["https://github.com/sponsors/Fdawgs"],"categories":[],"sub_categories":[],"readme":"\u003cimg alt=\"Docsmith logo\" src=\"https://raw.githubusercontent.com/Fdawgs/docsmith/main/docs/images/docsmith-logo.svg\" width=\"480\" height=\"auto\" /\u003e\n\n# Docsmith\n\n[![GitHub release](https://img.shields.io/github/release/Fdawgs/docsmith.svg)](https://github.com/Fdawgs/docsmith/releases/latest/)\n[![CI](https://github.com/Fdawgs/docsmith/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/Fdawgs/docsmith/actions/workflows/ci.yml)\n[![Coverage status](https://coveralls.io/repos/github/Fdawgs/docsmith/badge.svg?branch=main)](https://coveralls.io/github/Fdawgs/docsmith?branch=main)\n[![code style: 
Prettier](https://img.shields.io/badge/code_style-prettier-ff69b4.svg?style=flat)](https://github.com/prettier/prettier)\n\n\u003e RESTful API for converting clinical documents and files\n\n## Overview\n\nDocsmith is a RESTful API, built using Node.js and the [Fastify](https://fastify.dev/) web framework, that can convert a range of files:\n\n| Input | Output | Notes                                        |\n| ----- | ------ | -------------------------------------------- |\n| DOC   | TXT    | DOT file variant supported                   |\n| DOCX  | HTML   | DOCM, DOTM, and DOTX file variants supported |\n| DOCX  | TXT    | DOCM, DOTM, and DOTX file variants supported |\n| HTML  | TXT    | XHTML file variant supported                 |\n| PDF   | HTML   |                                              |\n| PDF   | TXT    | Scanned documents supported using OCR        |\n| RTF   | HTML   | Images are removed[^1]                       |\n| RTF   | TXT    |                                              |\n\n[^1]: The underlying UnRTF binary converts images and stores them using an incremental naming scheme of `pict001`, `pict002`, and so on. This poses a confidentiality and clinical risk, as concurrent requests will overwrite each other's images, which could result in a patient's image being placed in another patient's document. To mitigate this, Docsmith removes all images from the output HTML.\n\n### Why Docsmith?\n\nDocsmith was created in my spare time, outside of work, after I identified the need for an open-source document conversion service at Yeovil Hospital (run by [Somerset NHS Foundation Trust](https://www.somersetft.nhs.uk/)).\n\nBeing open-source, with the ability to be self-hosted, enables a data processor (e.g. an NHS trust) to confirm that a service is not storing or logging files with confidential patient identifiable data (PID) in them, which is essential for preventing potential GDPR breaches. 
This is something that the majority of existing closed-source document conversion services cannot offer. Docsmith was built to remedy this.\n\nBefore Docsmith, Yeovil Hospital was using expensive proprietary conversion tools that would regularly produce unreadable documents with issues such as text running off the page, paragraphs overlapping each other, and Windows-1252 to UTF-8 character encoding problems. GP surgeries in Somerset and Dorset would receive these corrupted documents through [MESH](https://digital.nhs.uk/services/message-exchange-for-social-care-and-health-mesh) and be unable to read them. This wasted time and money, as the documents then had to be posted or faxed again, which opened up the potential for further data breaches.\n\nDocsmith enables a data processor to use a comprehensive, GDPR-compliant, open-source document conversion service. Compared with equivalent services on the market today, it completes this vital task at a fraction of the cost (free!), whilst also ensuring a higher level of security and privacy for the data subjects.\n\n## Prerequisites\n\nThese are only required if running the API outside of Docker:\n\n- [Node.js](https://nodejs.org/en/) \u003e=20.0.0\n- Linux only: `poppler-data` \u003e=0.4.9\n- Linux only: `poppler-utils` \u003e=20.12.0\n- macOS only: `poppler` \u003e=20.12.0\n- Linux and macOS only: `unrtf` \u003e=0.19.3\n\n## Setup\n\nPerform the following steps before deployment:\n\n1. Download and extract the [latest release asset](https://github.com/Fdawgs/docsmith/releases/latest)\n2. Navigate to the extracted directory\n3. Make a copy of `.env.template` in the root directory and rename it to `.env`\n4. Configure the application using the environment variables in `.env`\n5. 
Place additional trained data into the `ocr_lang_data` directory (optional; see the [`ocr_lang_data` README](./ocr_lang_data/README.md))\n\n\u003e **Note**\n\u003e Set the following environment variables in `.env` to meet NHS England's recommendation to retain six months' worth of logs:\n\u003e\n\u003e - `LOG_ROTATION_DATE_FORMAT=\"YYYY-MM-DD\"`\n\u003e - `LOG_ROTATION_FREQUENCY=\"daily\"`\n\u003e - `LOG_ROTATION_MAX_LOGS=\"180d\"`\n\n## Deployment\n\n### Standard deployment\n\n1. Run `npm ci --ignore-scripts --omit=dev` to install dependencies\n2. Run `npm start`\n\nThe service should be up and running on the port set in the config. Output similar to the following should appear in stdout or in the log file specified using the `LOG_ROTATION_FILENAME` environment variable:\n\n```json\n{\n\t\"level\": \"info\",\n\t\"time\": \"2022-10-20T07:57:21.459Z\",\n\t\"pid\": 148,\n\t\"hostname\": \"MYCOMPUTER\",\n\t\"msg\": \"Server listening at http://127.0.0.1:51173\"\n}\n```\n\nTo test it, use [Yaak](https://yaak.app/) and import the example requests from `./test_resources/yaak.docsmith.json`.\n\n### Deploying using Docker\n\nThis requires [Docker](https://docker.com) to be installed.\n\n1. Run `docker compose up` (or `docker compose up -d` to run in the background)\n\n### Deploying using PM2\n\nIf Docker cannot be used in production, use a process manager such as [PM2](https://pm2.keymetrics.io/).\n\n1. Run `npm ci --ignore-scripts --omit=dev` to install dependencies\n2. Run `npm i -g pm2` to install PM2 globally\n3. Launch the application with `pm2 start .pm2.config.js`\n4. 
Check that the application has been deployed using `pm2 list` or `pm2 monit`\n\n#### To install as a Windows service\n\nIf using Microsoft Windows, use [pm2-installer](https://github.com/jessety/pm2-installer) to install PM2 as a Windows service.\n\n\u003e **Note**\n\u003e PM2 will automatically restart the application if `.env` is modified.\n\n## Usage\n\n### Accessing API documentation\n\nAPI documentation can be found at `/docs`:\n\n\u003cimg alt=\"Screenshot of Docsmith documentation page\" src=\"https://raw.githubusercontent.com/Fdawgs/docsmith/main/docs/images/api_documentation_screenshot.png\" width=\"720\"\u003e\n\nThe underlying OpenAPI definitions are found at `/docs/openapi`.\n\n## Contributing\n\nContributions are welcome, and any help is greatly appreciated!\n\nSee [the contributing guide](./CONTRIBUTING.md) for details on how to get started.\nPlease adhere to this project's [Code of Conduct](https://github.com/Fdawgs/.github/blob/main/CODE_OF_CONDUCT.md) when contributing.\n\n## License\n\n`docsmith` is licensed under the [MIT](./LICENSE) license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdawgs%2Fdocsmith","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffdawgs%2Fdocsmith","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdawgs%2Fdocsmith/lists"}