{"id":30371335,"url":"https://github.com/gistrec/cleartranscriptbot","last_synced_at":"2025-10-13T20:07:16.094Z","repository":{"id":309074115,"uuid":"1035076253","full_name":"gistrec/ClearTranscriptBot","owner":"gistrec","description":"Get the text from your video/audio with a simple Telegram bot — fast and easy","archived":false,"fork":false,"pushed_at":"2025-09-17T16:10:48.000Z","size":106,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-13T05:12:51.465Z","etag":null,"topics":["bot","speech-recognition","speech-to-text","telegram"],"latest_commit_sha":null,"homepage":"https://t.me/ClearTranscriptBot","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gistrec.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-09T15:51:40.000Z","updated_at":"2025-09-22T22:42:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"0a5b07e8-36cd-47a8-b4b9-52feb07ee34d","html_url":"https://github.com/gistrec/ClearTranscriptBot","commit_stats":null,"previous_names":["gistrec/cleartranscriptbot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gistrec/ClearTranscriptBot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gistrec%2FClearTranscriptBot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gistrec%2FClearTranscriptBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gistrec%2FClearTranscriptBot/releases","manifests_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gistrec%2FClearTranscriptBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gistrec","download_url":"https://codeload.github.com/gistrec/ClearTranscriptBot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gistrec%2FClearTranscriptBot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279016930,"owners_count":26085905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","speech-recognition","speech-to-text","telegram"],"created_at":"2025-08-20T05:26:56.646Z","updated_at":"2025-10-13T20:07:16.087Z","avatar_url":"https://github.com/gistrec.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ClearTranscriptBot\n\n[👉 Try the bot on Telegram][1]\n\nTelegram bot for automatic audio/video transcription:\n1. Accepts files from a user\n2. Converts them to OGG (via ffmpeg)\n3. Uploads to Yandex Cloud S3\n4. Requests transcription from Yandex SpeechKit\n5. 
Sends the transcript back in Telegram\n\n## Features\n\n- 🎙 Supports audio and video files (up to 4 hours)\n- 📦 Stores files in Yandex Cloud S3\n- 💬 Transcription via Yandex SpeechKit\n- 💰 Balance and billing inside Telegram\n- 📜 Full request history\n- 🐞 Optional error reporting via Sentry\n\n## Project structure\n\n```\nClearTranscriptBot\n├── main.py              # Bot entry point\n├── scheduler.py         # Periodic task scheduler\n├── handlers/            # Telegram update handlers\n│   ├── balance.py\n│   ├── cancel_task.py\n│   ├── create_task.py\n│   ├── file.py\n│   ├── history.py\n│   ├── price.py\n│   └── text.py\n├── database/            # Data access layer\n│   ├── connection.py    # MySQL connection setup via SQLAlchemy\n│   ├── models.py        # SQLAlchemy models for application tables\n│   └── queries.py       # Helper functions for common database operations\n├── payment/             # Tinkoff acquiring API wrappers\n│   ├── init.py\n│   ├── get_state.py\n│   └── cancel.py\n├── utils/               # Helper utilities\n│   ├── ffmpeg.py        # Conversion to OGG using ffmpeg\n│   ├── marketing.py     # Advertising/tracking: send conversion goals to Yandex Metrica\n│   ├── s3.py            # Upload helper for Yandex Cloud S3 (S3-compatible)\n│   ├── sentry.py        # Sentry error reporting helpers\n│   ├── speechkit.py     # Request transcription from SpeechKit\n│   └── tg.py            # Telegram-specific helpers\n└── requirements.txt     # Python dependencies list\n```\n\n## Environment variables\n\n### Telegram\n\n| Variable             | Description                                                       |\n|----------------------|-------------------------------------------------------------------|\n| `TELEGRAM_BOT_TOKEN` | Token used to authenticate the bot                                |\n| `TELEGRAM_API_ID`    | Optional, required only when using a local Bot API server         |\n| `TELEGRAM_API_HASH`  | Optional, required only when using a 
local Bot API server         |\n| `USE_LOCAL_PTB`      | Any value → use a local Bot API server at `http://127.0.0.1:8081` |\n\n### MySQL\n\n| Variable         | Description       |\n|------------------|-------------------|\n| `MYSQL_USER`     | Database user     |\n| `MYSQL_PASSWORD` | Database password |\n| `MYSQL_HOST`     | Database host     |\n| `MYSQL_PORT`     | Database port     |\n| `MYSQL_DB`       | Database name     |\n\n### Yandex Cloud\n\n#### S3\n\n| Variable         | Description       |\n|------------------|-------------------|\n| `S3_ACCESS_KEY`  | Access key        |\n| `S3_SECRET_KEY`  | Secret key        |\n| `S3_ENDPOINT`    | S3-compatible URL |\n| `S3_BUCKET`      | Bucket name       |\n\n#### SpeechKit\n\n| Variable        | Description |\n|-----------------|-------------|\n| `YC_API_KEY`    | API key     |\n| `YC_FOLDER_ID`  | Folder ID   |\n\n\n### Sentry\n\n| Variable        | Description                                                |\n|-----------------|------------------------------------------------------------|\n| `ENABLE_SENTRY` | Set to `1` to enable Sentry error reporting                |\n| `SENTRY_DSN`    | DSN of your Sentry project. Required if `ENABLE_SENTRY=1`  |\n\n### Marketing (Yandex.Metrica)\n\n| Variable       | Description                                                         |\n|----------------|---------------------------------------------------------------------|\n| `COUNTER_ID`   | Yandex.Metrica counter ID                                           |\n| `MEAS_TOKEN`   | Measurement Protocol token (generated in Metrica counter settings)  |\n| `BOT_URL`      | Public URL of your bot (e.g. 
`https://t.me/ClearTranscriptBot`)     |\n\n### Tinkoff acquiring\n\n| Variable            | Description                                  |\n|---------------------|----------------------------------------------|\n| `TERMINAL_KEY`      | Terminal key from Tinkoff                    |\n| `TERMINAL_PASSWORD` | Terminal password from Tinkoff               |\n| `TERMINAL_ENV`      | Environment: `test` for sandbox or `prod`    |\n\n## Local Bot API server\n\nTo handle large files you can run a local copy of Telegram's Bot API server.\nExample using Docker:\n\n```bash\ndocker run \\\n    --detach \\\n    --name tg-bot-api \\\n    --publish 8081:8081 \\\n    --volume /var/lib/telegram-bot-api:/var/lib/telegram-bot-api \\\n    --env TELEGRAM_API_ID=$TELEGRAM_API_ID \\\n    --env TELEGRAM_API_HASH=$TELEGRAM_API_HASH \\\n    --env TELEGRAM_LOCAL=True \\\n    aiogram/telegram-bot-api:latest \\\n    --http-ip-address=0.0.0.0 \\\n    --dir=/var/lib/telegram-bot-api\n```\n\nRun this container and set `USE_LOCAL_PTB` so that the bot uses the local\nserver.\n\n## Database schema\n\n```sql\n-- Users table holds Telegram users interacting with the bot\nCREATE TABLE IF NOT EXISTS users (\n    telegram_id      BIGINT          PRIMARY KEY,\n    telegram_login   VARCHAR(32),\n    balance          DECIMAL(10,2)   NOT NULL DEFAULT 250.00,\n    registered_at    TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP\n);\n\n-- History of transcription requests made by users\nCREATE TABLE IF NOT EXISTS transcription_history (\n    id               BIGINT          PRIMARY KEY,\n    telegram_id      BIGINT          NOT NULL REFERENCES users(telegram_id),\n    status           VARCHAR(32)     NOT NULL,\n    audio_s3_path    TEXT            NOT NULL,\n    duration_seconds INTEGER,\n    price_rub        DECIMAL(10,2),\n    result_s3_path   TEXT,\n    result_json      TEXT,\n    operation_id     VARCHAR(128),\n    message_id       INTEGER,\n    chat_id          BIGINT,\n    created_at       TIMESTAMP   
    NOT NULL DEFAULT CURRENT_TIMESTAMP\n);\n\n-- Index to speed up lookups by user\nCREATE INDEX idx_transcription_history_telegram_id\n    ON transcription_history(telegram_id);\n\n-- Payments processed via Tinkoff acquiring\nCREATE TABLE IF NOT EXISTS payments (\n    id              BIGINT          PRIMARY KEY,\n    telegram_id     BIGINT          NOT NULL REFERENCES users(telegram_id),\n    order_id        VARCHAR(64)     NOT NULL UNIQUE,\n    payment_id      BIGINT          UNIQUE,\n    amount          DECIMAL(10,2)   NOT NULL,\n    status          VARCHAR(32)     NOT NULL,\n    created_at      TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP\n);\n\nCREATE INDEX idx_payments_telegram_id\n    ON payments(telegram_id);\n```\n\n## Installation\n\nBefore running, install Python dependencies with:\n```bash\npip3 install -r requirements.txt\n```\n\nYandex Cloud MySQL requires SSL. To connect securely, download the CA certificate:\n```bash\nmkdir -p ~/.mysql \u0026\u0026 \\\nwget \"https://storage.yandexcloud.net/cloud-certs/CA.pem\" \\\n     --output-document ~/.mysql/root.crt \u0026\u0026 \\\nchmod 0600 ~/.mysql/root.crt\n```\n\nThe certificate will be saved to `~/.mysql/root.crt` and will be used automatically by MySQL clients to establish a secure connection.\n\n## Known issues (mysqlclient)\n\n\u003cdetails\u003e\n\u003csummary\u003e⚠️ pkg-config: command not found\u003c/summary\u003e\n\n**On macOS and Linux you may hit an error when installing mysqlclient:**\n\n```bash\nCollecting mysqlclient (from -r requirements.txt (line 8))\n  Using cached mysqlclient-2.2.7.tar.gz (91 kB)\n  Installing build dependencies ... done\n  Getting requirements to build wheel ... 
error\n  error: subprocess-exited-with-error\n\n  x Getting requirements to build wheel did not run successfully.\n  │ exit code: 1\n  ╰─\u003e [35 lines of output]\n      /bin/sh: pkg-config: command not found\n      *********\n      Trying pkg-config --exists mysqlclient\n      Command 'pkg-config --exists mysqlclient' returned non-zero exit status 127.\n```\n\nIn that case, install pkg-config: `sudo apt install pkg-config` (Debian/Ubuntu) or `brew install pkg-config` (macOS)\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e⚠️ Specify MYSQLCLIENT_CFLAGS and MYSQLCLIENT_LDFLAGS env vars manually\u003c/summary\u003e\n\n**On macOS and Linux you may hit an error when installing mysqlclient:**\n\n```bash\nCollecting mysqlclient (from -r requirements.txt (line 8))\n  Using cached mysqlclient-2.2.7.tar.gz (91 kB)\n  Installing build dependencies ... done\n  Getting requirements to build wheel ... error\n  error: subprocess-exited-with-error\n\n  × Getting requirements to build wheel did not run successfully.\n  │ exit code: 1\n  ╰─\u003e [29 lines of output]\n      Trying pkg-config --exists mysqlclient\n      Command 'pkg-config --exists mysqlclient' returned non-zero exit status 1.\n      *********\n      Exception: Can not find valid pkg-config name.\n      Specify MYSQLCLIENT_CFLAGS and MYSQLCLIENT_LDFLAGS env vars manually\n      [end of output]\n```\n\nIn that case, install the MySQL client development files: `sudo apt install libmysqlclient-dev` (Debian/Ubuntu) or `brew install mysql-client` (macOS). With Homebrew you may also need to add the formula's `lib/pkgconfig` directory to `PKG_CONFIG_PATH` so that pkg-config can find it.\n\n**libmysqlclient-dev** is the Debian/Ubuntu package that provides the headers and libraries required to build applications that link against MySQL.\n\n\u003c/details\u003e\n\n## References\n\n- [Yandex Cloud SpeechKit docs][2]  \n- [Telegram Bot API][3]  \n\n[1]: https://t.me/ClearTranscriptBot\n[2]: https://cloud.yandex.com/docs/speechkit/\n[3]: 
https://core.telegram.org/bots/api","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgistrec%2Fcleartranscriptbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgistrec%2Fcleartranscriptbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgistrec%2Fcleartranscriptbot/lists"}