{"id":41392494,"url":"https://github.com/theovassiliou/doctrans","last_synced_at":"2026-01-23T12:47:00.607Z","repository":{"id":46292233,"uuid":"219697663","full_name":"theovassiliou/doctrans","owner":"theovassiliou","description":"The Document Transformation Application","archived":false,"fork":false,"pushed_at":"2025-08-19T23:29:58.000Z","size":1838,"stargazers_count":7,"open_issues_count":0,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-08-20T01:22:05.585Z","etag":null,"topics":["framework","golang","microservice"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/theovassiliou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-05T08:45:13.000Z","updated_at":"2025-08-19T23:30:02.000Z","dependencies_parsed_at":"2024-06-04T12:02:54.752Z","dependency_job_id":"00010577-7e14-4014-8758-06a8989bd442","html_url":"https://github.com/theovassiliou/doctrans","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/theovassiliou/doctrans","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theovassiliou%2Fdoctrans","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theovassiliou%2Fdoctrans/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theovassiliou%2Fdoctrans/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theovassiliou%2Fdoctrans/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/theovassiliou","download_url":"https://codeload.github.com/theovassiliou/doctrans/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/theovassiliou%2Fdoctrans/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28692009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T11:01:27.039Z","status":"ssl_error","status_checked_at":"2026-01-23T11:00:26.909Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["framework","golang","microservice"],"created_at":"2026-01-23T12:46:59.900Z","updated_at":"2026-01-23T12:47:00.595Z","avatar_url":"https://github.com/theovassiliou.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The Document Transformation Application\n\nThe Document Transformation Application (DTA) is a microservice-based web service application provided by the QDS research group at TU Berlin.\n\nThe DTA serves as a playground for various research topics and provides a collection of document transformation functions. A Document Transformation Function (DTF) transforms one document into another. Transforming is meant here in a very broad sense. 
Formally, a DTF takes one document, represented as a byte array, as its argument and returns another byte array.\n\n## The DTA Server protocol\n\nThe protocol between a [DTA user](#user-content-dtauser) and the [DTA Server](#user-content-dtaserver), which we call the [DTA Server Protocol](#user-content-thedtaserverprotocol), is defined using [gRPC](https://grpc.io/). A [DTA worker](#user-content-dtaworker) might also act as a gateway to other, potentially private, DTA servers.\n\nThe protocol consists of three operations:\n\n- TransformDocument\n- ListServices\n- TransformPipe\n\nWe built the protocol with [protobuf v3.9.1](https://github.com/protocolbuffers/protobuf/releases/tag/v3.9.1).\n\nIn addition, we have defined a compatible RESTful/JSON-based API, described in the [REST specification](swagger/index.html).\n\n## Architecture\n\nThe architecture is simple, as shown in the following figure:\n\n![Architecture](doc/Architecture.jpg \"Architecture\")\n\nA client communicates with a DTA server via gRPC or RESTful/JSON. The DTA server can communicate with other DTA servers and returns the result to the DTA client.\n\nIf a DTA server merely enables communication with other, potentially private, DTA servers, we call it a DTA gateway.\n\n## Implementations\n\nCurrently, the project provides implementations of the following elements:\n\n### Servers\n\n- [Gateway](gateway/README.md)\n  A simple, straightforward gateway implementation\n\n### Services\n\n- [Count](services/qds_count/README.md)\n  Counts lines, words, and bytes in a document\n\n- [Echo](services/qds_echo/README.md)\n  Echoes the provided document unchanged\n\n- [Html2text](services/qds_echo/README.md)\n  Extracts the text of an HTML document in Markdown form, preserving table structures. 
\n\n## Installation of Implementations\n\nInstallation is straightforward if you have a working Go environment.\n\n```shell\ngit clone https://github.com/theovassiliou/doctrans\ncd doctrans\ngo build ./...\ngo test ./...\n```\n\nIf the build and the tests succeed, you are ready to go.\n\n## 1st run\n\nTo test-run a client/server pair, first start the *echo* server:\n\n```shell\ngo run services/qds_echo/echo.go\n```\n\nThen, in another terminal on the same host, run the client:\n\n```shell\ngo run clients/client.go test/testDoc.txt\n```\n\n`client` sends the file `testDoc.txt` to the previously started *echo* server. The server address is hardcoded in the client's default parameters.\n\nNext, start an additional server in a third terminal on the same host:\n\n```shell\ngo run services/qds_count/count.go\n```\n\nThis starts the *count* server, listening on the next available port. To use this service, run\n\n```shell\ngo run clients/client.go -g :50052 test/testDoc.txt\n```\n\nFor detailed configuration options, consult the respective client and server READMEs.\n\n## Glossary\n\n- DTA - Document Transformation Application\n  - DTA client - Synonym for DTA user.\n  - DTA gateway - A DTA gateway offers access, via the DTA server API, to non-publicly available DTA servers.\n  - DTA server - The DTA server provides an API for document transformation. The DTA server might use [DTA workers](#user-content-dtaworker) or other means to perform the task. See also DTA gateway.\n  - DTA server protocol - The protocol between the DTA server and the DTA user.\n  - DTA user - An entity that uses the DTA server API to transform a document. Also called DTA client.\n  - DTA worker - A microservice providing *one* transformation function, potentially parametrised. A DTA worker is also a DTA server.\n  - DTF - Document Transformation Function, a function that transforms a document into another document. 
Simple examples include ECHO (the null transformation) and COUNT (counting lines, words, and/or characters), while more sophisticated functions might convert a PDF to a text document.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheovassiliou%2Fdoctrans","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftheovassiliou%2Fdoctrans","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftheovassiliou%2Fdoctrans/lists"}