{"id":35226176,"url":"https://github.com/epam/statgpt-backend","last_synced_at":"2026-04-29T09:09:22.799Z","repository":{"id":316659340,"uuid":"1015421692","full_name":"epam/statgpt-backend","owner":"epam","description":"StatGPT Backend","archived":false,"fork":false,"pushed_at":"2026-04-22T13:23:23.000Z","size":1960,"stargazers_count":24,"open_issues_count":56,"forks_count":1,"subscribers_count":0,"default_branch":"development","last_synced_at":"2026-04-22T15:09:10.687Z","etag":null,"topics":["llm","sdmx","statgpt"],"latest_commit_sha":null,"homepage":"https://statgpt.dialx.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-07T13:29:09.000Z","updated_at":"2026-04-22T12:53:52.000Z","dependencies_parsed_at":"2025-12-29T13:00:11.982Z","dependency_job_id":null,"html_url":"https://github.com/epam/statgpt-backend","commit_stats":null,"previous_names":["epam/statgpt-backend"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/epam/statgpt-backend","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epam%2Fstatgpt-backend","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epam%2Fstatgpt-backend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epam%2Fstatgpt-backend/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/epam%2Fstatgpt-backend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epam","download_url":"https://codeload.github.com/epam/statgpt-backend/tar.gz/refs/heads/development","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epam%2Fstatgpt-backend/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32418301,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T06:29:02.080Z","status":"ssl_error","status_checked_at":"2026-04-29T06:29:00.631Z","response_time":110,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","sdmx","statgpt"],"created_at":"2025-12-30T01:09:25.291Z","updated_at":"2026-04-29T09:09:22.785Z","avatar_url":"https://github.com/epam.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StatGPT Backend\n\nThis repository contains code for StatGPT backend, which implements APIs and main logic of the StatGPT application.\n\nMore information about StatGPT and its architecture can be found in\nthe [documentation repository](https://github.com/epam/statgpt).\n\n## Technological stack\n\nApplication is written in Python 3.11 and uses the following main technologies:\n\n| Technology                                                   | Purpose                       
                           |\n|--------------------------------------------------------------|----------------------------------------------------------|\n| [AI DIAL SDK](https://github.com/epam/ai-dial-sdk)           | SDK for building applications on top of AI DIAL platform |\n| [FastAPI](https://fastapi.tiangolo.com/)                     | Web framework for API development                        |\n| [SQLAlchemy](https://www.sqlalchemy.org/)                    | ORM for database operations                              |\n| [LangChain](https://python.langchain.com/docs/introduction/) | LLM application framework                                |\n| [Pydantic](https://pydantic.dev/)                            | Data validation and settings                             |\n| [sdmx1](https://github.com/khaeru/sdmx)                      | SDMX data handling and provider connections              |\n\n## Project structure\n\n* `statgpt/admin` — backend of the administrator part which allows the user to add and update data.\n* `statgpt/common` — common code used in the `statgpt.admin` and `statgpt.app` applications.\n* `statgpt/app` — main application that generates response using LLMs and based on data prepared by `statgpt.admin`.\n* `statgpt/cli` — command-line interface (CLI) for managing various aspects of StatGPT.\n* `tests` - unit and integration tests.\n* `docker` - Dockerfiles for building docker images.\n\n## Environment variables\n\nThe applications are configured using environment variables. The environment variables are described in the following\nfiles:\n\n* [Common environment variables](statgpt/common/README.md#environment-variables) - used in both applications\n* [Admin Backend environment variables](statgpt/admin/README.md#environment-variables)\n* [Main App environment variables](statgpt/app/README.md#environment-variables)\n* [CLI environment variables](statgpt/cli/README.md#environment-variables)\n\n## Local Setup\n\n### Pre-requisites\n\n#### 1. 
Install [Make](https://www.gnu.org/software/make/)\n\n* MacOS - should already be installed\n* [Windows](https://gnuwin32.sourceforge.net/packages/make.htm)\n* [Windows, using Chocolatey](https://community.chocolatey.org/packages/make)\n* Make sure that `make` is in the PATH (run `which make`).\n\n#### 2. Install Python 3.11\n\nDirect installation:\n\n* [MacOS, using Homebrew](https://formulae.brew.sh/formula/python@3.11) - `brew install python@3.11`\n* [Windows or MacOS, using official repository](https://www.python.org/downloads/)\n* [Windows, using Chocolatey](https://community.chocolatey.org/packages/python311)\n* Make sure that `python3` or `python3.11` is in the PATH and works properly (run `python3.11 --version`).\n\nAlternative: use [pyenv](https://github.com/pyenv/pyenv?tab=readme-ov-file#installation):\n\n* `pyenv` lets you manage different Python versions on the same machine\n* execute the following from the repository root folder:\n  ```bash\n  pyenv install 3.11\n  pyenv local 3.11  # use Python 3.11 for the current project\n  ```\n\n#### 3. Install [Poetry](https://python-poetry.org/docs/#installation)\n\nRecommended way - system-wide, independent of any particular Python venv:\n\n* MacOS - the recommended way to install Poetry is to [use pipx](https://python-poetry.org/docs/#installing-with-pipx)\n* Windows - the recommended way to install Poetry is to\n  use the [official installer](https://python-poetry.org/docs/#installing-with-the-official-installer)\n* Make sure that `poetry` is in the PATH and works properly (run `poetry --version`).\n\n#### 4. Install Docker Engine and Docker Compose suitable for your OS\n\nSince Docker Desktop may require a paid license for commercial use, you can use one of the following alternatives:\n\n* [Docker Engine and Docker Compose on Linux](https://docs.docker.com/engine/install/)\n* [Rancher Desktop](https://rancherdesktop.io/) on Windows or MacOS\n\n#### 5. 
Install GNU gettext (for localization)\n\nRequired for localization commands (`make extract_messages`, `make update_messages`, `make compile_messages`):\n\n* MacOS - `brew install gettext`\n* Linux/WSL - `sudo apt install gettext`\n* Windows (native) - Install via [Chocolatey](https://community.chocolatey.org/packages/gettext): `choco install gettext`\n\nVerify installation: `which xgettext msgmerge msgfmt`\n\n---\n\n### Setup\n\n#### 1. Clone the repository\n\n#### 2. Create venv (python virtual environment)\n\nCreate python virtual environment, using poetry:\n\n```bash\nmake init_venv\n```\n\nIf you see the following error: `Skipping virtualenv creation, as specified in config file.`, it means venv was not\ncreated because poetry is configured not to create a new virtual environment. You can fix this:\n\n* Either by updating poetry config:\n    * `poetry config --local virtualenvs.create true` (local config)\n    * or `poetry config virtualenvs.create true` (global config)\n* or by creating venv manually: `python -m venv .venv`\n\n#### 3. Activate venv\n\nFor Mac / Linux:\n\n```bash\nsource .venv/bin/activate\n```\n\nFor Windows:\n\n```bash\n.venv/Scripts/Activate\n```\n\n#### 4. Install required python packages\n\nThe following will install basic and dev dependencies:\n\n```bash\nmake install_dev\n```\n\n#### 5. Create `.env` file in the root of the project\n\nYou can copy the template file and fill values for secrets manually:\n\n```bash\ncp .env.template .env\n```\n\nThe [Environment variables section](#environment-variables) provides links to pages with\ndetailed information about environment variables.\n\n#### 6. Create `dial/core/config/config.json` file by running python script\n\n_Not implemented yet, TODO: create a script that generates config based on .env variables_\n\n## Run StatGPT locally\n\n1. Run the DIAL using docker compose:\n\n    ```bash\n    docker compose up -d\n    ```\n\n2. 
Apply `alembic` migrations:\n\n   ```bash\n   make db_migrate\n   ```\n\n3. Run the Admin backend (if you want to initialize or update data):\n\n   ```bash\n   make statgpt_admin\n   ```\n\n4. Run the StatGPT application:\n\n   ```bash\n   make statgpt_app\n   ```\n\n5. Initialize sample content (optional):\n\n   ```bash\n   # Run CLI and initialize sample client\n   make statgpt_cli\n   ```\n\n   Then in the CLI:\n\n   ```\n   statgpt\u003e content init --client-id sample -y\n   statgpt\u003e channel reindex -c statgpt-sample --mode all\n   ```\n\n   Wait until reindexing has finished (check the status with the `channel status` command in the CLI), then run deduplication:\n\n   ```\n   statgpt\u003e channel deduplicate -c statgpt-sample\n   ```\n\n\n   See [CLI documentation](statgpt/cli/README.md) for more commands.\n\n## MCP\n\n### StatGPT MCP\n\nThe main application includes an MCP server that exposes the SupremeAgent's tools to external MCP clients\n(Claude Code, Cursor, etc.).\n\nSee [MCP Server setup instructions](statgpt/app/README.md#mcp-server) for details.\n\n\n### Admin MCP (Beta)\n\nThe Admin application includes an optional MCP (Model Context Protocol) server for dataset onboarding assistance.\nIt provides tools and prompts for coding agents such as Cursor and Claude Code.\n\n\u003e **Note:** This feature is optional and disabled by default. It requires installing additional dependencies\n\u003e and enabling it via an environment variable.\n\nSee [MCP setup instructions](statgpt/admin/mcp/README.md) for details.\n\n## Utils for Development\n\n### 1. Format the code\n\n ```bash\n make format\n ```\n\n### 2. Run linters\n\n ```bash\n make lint\n ```\n\n### 3. Pre-Commit Hooks\n\nTo automatically apply black and isort on each commit, enable the pre-commit hooks:\n\n```bash\nmake install_pre_commit_hooks\n```\n\nThis command sets up the Git hook scripts.\n\n### 4. 
Create a new `alembic` migration\n\n\u003e **(!)**\n\u003e It is critical to note that **autogenerate is not intended to be perfect**.\n\u003e It is *always* necessary to manually review and correct the **candidate migrations** that autogenerate produces.\n\n**(!)** After creating a new migration, it is necessary to update the `ALEMBIC_TARGET_VERSION` in the\n`statgpt/common/config/version.py` file to the new version.\n\n ```bash\n make db_autogenerate MESSAGE=\"Your message\"\n ```\n\nor:\n\n ```bash\n alembic -c alembic.ini revision --autogenerate -m \"Your message\"\n ```\n\n### 5. Undo the last `alembic` migration\n\n ```bash\n make db_downgrade\n ```\n\n### 6. Localization (i18n)\n\nThe project uses GNU gettext for internationalizing dataset formatters. Use these commands when working with translations:\n\n**Workflow:**\n\n1. **Extract translatable strings** - Run after adding/modifying strings marked with `_()` in formatter code:\n   ```bash\n   make extract_messages\n   ```\n   This creates/updates the `locales/dataset.pot` template file.\n\n2. Review changes to the `locales/dataset.pot` file (check `git diff`).\n   There should be no unexpected changes (removals, additions);\n   they sometimes happen on Windows platforms.\n\n3. **Update translation files** - Run to sync `.po` files with the new template:\n   ```bash\n   make update_messages\n   ```\n   This updates `en/LC_MESSAGES/dataset.po` and `uk/LC_MESSAGES/dataset.po` with new strings.\n\n4. Fill in missing translations in `.po` files, either manually or using a coding agent.\n\n5. 
**Compile translations** - Run after translating strings in `.po` files to generate binary `.mo` files:\n   ```bash\n   make compile_messages\n   ```\n   Or use the shorthand: `make locales`\n\n**Note:** All commands require GNU gettext to be installed (see [Prerequisites](#pre-requisites)).\n\n## Run Tests\n\n- Run all tests (unit and integration):\n\n    ```bash\n    make test\n    ```\n\n- Run only unit tests:\n\n    ```bash\n    make test_unit\n    ```\n\n- Run only integration tests:\n\n    ```bash\n    make test_integration\n    ```\n\n⚠️ **WARNING:** Integration tests require a database and an Elasticsearch instance.\nConsider using separate test instances instead of the ones from `docker-compose.yml`\nbecause tests truncate tables during execution, which may result in **DATA LOSS**.\nConfigure the `TEST_DATABASE_*` environment variables accordingly.\nSee [Common environment variables](statgpt/common/README.md#environment-variables) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepam%2Fstatgpt-backend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepam%2Fstatgpt-backend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepam%2Fstatgpt-backend/lists"}