{"id":41203276,"url":"https://github.com/thomas545/extractor-bot","last_synced_at":"2026-01-22T21:37:12.938Z","repository":{"id":241589217,"uuid":"804985437","full_name":"thomas545/extractor-bot","owner":"thomas545","description":null,"archived":false,"fork":false,"pushed_at":"2025-10-08T13:51:08.000Z","size":57,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-10-08T15:33:41.651Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomas545.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-05-23T16:51:42.000Z","updated_at":"2024-11-24T10:11:26.000Z","dependencies_parsed_at":"2024-05-29T06:50:11.546Z","dependency_job_id":"1706cbfa-45dd-44d9-859a-5ce8f53a884a","html_url":"https://github.com/thomas545/extractor-bot","commit_stats":null,"previous_names":["thomas545/extractor-bot"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/thomas545/extractor-bot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas545%2Fextractor-bot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas545%2Fextractor-bot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas545%2Fextractor-bot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas545%2Fext
ractor-bot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomas545","download_url":"https://codeload.github.com/thomas545/extractor-bot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomas545%2Fextractor-bot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28671719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-22T20:48:19.482Z","status":"ssl_error","status_checked_at":"2026-01-22T20:48:14.968Z","response_time":144,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-22T21:37:12.444Z","updated_at":"2026-01-22T21:37:12.923Z","avatar_url":"https://github.com/thomas545.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Extractor Bot\n\n### Instructions\n\n### Installation/Setup\n- Clone the repository\n- Install [MongoDB](https://www.mongodb.com/docs/manual/administration/install-community/) locally\n- Install Python 3.10+\n- Create a Python virtual environment\n  - python3 -m venv `env_name`\n  - source `env_name`/bin/activate\n  - pip install -r requirements.txt\n  - Add a `.env` file with the secret keys that are listed in `env_dev`\n\n- Run the project\n  - Run: `uvicorn main:app --host 0.0.0.0 --port 8000 --reload` or `fastapi run`\n\n- Run with Docker:\n  - Build: 
`docker build -t extractor-app .`\n  - Run: `docker run -p 8000:8000 extractor-app`\n\n\n## API Documentation\n- [Local Docs](http://127.0.0.1:8000/docs)\n\n## Tech Stack:\n- Python 3.10+\n- FastAPI\n- Langchain\n- OpenAI / Gemini\n- MongoDB\n- Milvus\n- uvicorn\n\n### Endpoints usage\n\n#### **File `Upload` Endpoint**\n- Path: `/upload`\n- Method: `POST` \n- Usage:\n  - Accepts one or more file uploads (limited to PDF, TIFF, PNG, and JPEG formats).\n  - Returns a list of file identifiers or signed URLs for the uploaded files.\n\n- **Request**:\n\n```\n  files: form-data\n```\n\n- **Response**:\n\n```\n{\n    \"data\": [\n        {\n            \"_id\": \"6654c225e8769fc30206f225\",\n            \"file_name\": \"東京都建築安全条例.json\",\n            \"url\": \"https://testingzone021.b-cdn.net/users_files/6651fbad0b03b201a830642a/1b38aa42-7a34-4bc9-b5fc-01c4e5f2c139.json\",\n            \"file_type\": \"json\"\n        }\n    ],\n    \"status\": \"success\",\n    \"status_code\": 201\n}\n```\n\n#### **`OCR` Endpoint**\n- Path: `/ocr`\n- Method: `POST` \n- Usage:\n  - Accepts an OCR file URL or the `_id` of an uploaded file.\n  - Processes the OCR results with embedding models, then uploads the embeddings to a vector DB for better search.\n  - Returns the file data to pass to the extractor.\n\n\n- **Request**:\n\n```\n  {\n    \"file_id\": \"6654c225e8769fc30206f225\"\n    // \"url\": \"https://testingzone021.b-cdn.net/users_files/6651fbad0b03b201a830642a/1b38aa42-7a34-4bc9-b5fc-01c4e5f2c139.json\"\n  }\n```\n\n- **Response**:\n\n```\n{\n    \"data\": {\n        \"file\": {\n            \"_id\": \"6654c225e8769fc30206f225\",\n            \"file_name\": \"東京都建築安全条例.json\",\n            \"url\": \"https://testingzone021.b-cdn.net/users_files/6651fbad0b03b201a830642a/1b38aa42-7a34-4bc9-b5fc-01c4e5f2c139.json\",\n            \"file_type\": \"json\"\n        },\n        \"msg\": \"Processing OCR File.\"\n    },\n    \"status\": \"success\",\n    \"status_code\": 200\n}\n```\n\n\n#### **`Extraction` 
Endpoint**\n\n- Path: `/extract`\n- Method: `POST` \n- Usage:\n  - Takes a query text and a `file_id` as input.\n  - Returns the AI model's response, based on the document data.\n\n\n\n- **Request**:\n\n```\n  {\n    \"file_id\": \"6654c225e8769fc30206f225\",\n    \"query\": \"道路状に造られた敷地の頂点の角の長さはどれくらいですか\"\n  }\n\n```\n\n- **Response**:\n\n```\n{\n    \"data\": {\n        \"response\": {\n            \"_id\": \"66575283f8f024a09219a037\",\n            \"user_id\": \"6651fbad0b03b201a830642a\",\n            \"file_id\": \"6654c225e8769fc30206f225\",\n            \"query\": \"道路状に造られた敷地の頂点の角の長さはどれくらいですか\",\n            \"response\": \"道路状に造られた敷地の頂点の角の長さは、長さニメートルの底辺を有する二等辺三角形の部分です。\",\n            \"created_at\": \"2024-05-29T16:03:37Z\",\n            \"updated_at\": \"2024-05-29T16:03:37Z\"\n        }\n    },\n    \"status\": \"success\",\n    \"status_code\": 200\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas545%2Fextractor-bot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomas545%2Fextractor-bot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomas545%2Fextractor-bot/lists"}