{"id":24430735,"url":"https://github.com/gusanmaz/echosight","last_synced_at":"2025-04-28T12:21:58.341Z","repository":{"id":215116765,"uuid":"738160754","full_name":"gusanmaz/echosight","owner":"gusanmaz","description":"EchoSight is a tool that helps visually impaired individuals by audibly describing images taken with a Raspberry Pi Camera or inputted via image path or URL across different operating systems.","archived":false,"fork":false,"pushed_at":"2024-01-03T11:43:45.000Z","size":218,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-30T09:31:33.158Z","etag":null,"topics":["cogvl","coqui-tts","llm","llms","raspberry-pi","replicate","replicate-api","seamlessm4t","visual-audio","visual-audio-navigation","vllm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gusanmaz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-02T15:13:53.000Z","updated_at":"2024-12-05T14:06:16.000Z","dependencies_parsed_at":"2025-03-13T19:20:19.538Z","dependency_job_id":"0ca64950-e8ec-4c51-9755-699d708feede","html_url":"https://github.com/gusanmaz/echosight","commit_stats":null,"previous_names":["gusanmaz/echosight"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gusanmaz%2Fechosight","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gusanmaz%2Fechosight/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/gusanmaz%2Fechosight/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gusanmaz%2Fechosight/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gusanmaz","download_url":"https://codeload.github.com/gusanmaz/echosight/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311422,"owners_count":21569029,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cogvl","coqui-tts","llm","llms","raspberry-pi","replicate","replicate-api","seamlessm4t","visual-audio","visual-audio-navigation","vllm"],"created_at":"2025-01-20T14:57:40.539Z","updated_at":"2025-04-28T12:21:58.323Z","avatar_url":"https://github.com/gusanmaz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EchoSight\n\nEchoSight is designed to assist visually impaired individuals by providing audible descriptions of images captured by a camera. 
It operates in two modes: one for capturing images using a Raspberry Pi Camera and listening to their voice descriptions, and another for inputting an image path or URL on various operating systems to hear voice descriptions.\n\n## Output Files\n\nThe project generates multiple outputs during operation:\n\n- **Image Files**: Captured or downloaded images are saved in the `output` directory.\n- **Text Descriptions**: Text descriptions of the images in both English and Turkish are saved as `.txt` files in the `output` directory.\n- **Audio Files**: The Turkish voice description of the image is saved as a `.wav` file in the `output` directory.\n- **Log Files**: Event logs and errors are recorded and saved in `events.log` files within the respective output subdirectories.\n\n## Configurable Parameters\n\n- **KEY_ACTION**: In `rpi.py`, this is set to 'KEY_S' by default. Modify the `KEY_ACTION` variable to change the key action.\n- **CAMERA_DELAY**: In `rpi.py`, the default camera delay is '0.1' seconds. Adjust the `CAMERA_DELAY` variable to change this setting.\n- **MAX_WIDTH**: In `image2speech.py`, the maximum image width for resizing is controlled by `MAX_WIDTH`. Alter this parameter as needed.\n\n## Pre-requisites (For Raspberry Pi Usage)\n\n- Ensure Raspberry Pi OS is installed.\n- Use [Raspberry Pi Imager](https://downloads.raspberrypi.org/imager/imager_latest.exe) to prepare your SD card.\n- Test your Raspberry Pi Camera: `libcamera-jpeg -o z.jpg`.\n\n## Installation\n\n- Obtain your Replicate.com API token:\n  - For Bash: `echo 'export REPLICATE_API_TOKEN=your_token_here' \u003e\u003e ~/.bashrc`.\n  - For Zsh: `echo 'export REPLICATE_API_TOKEN=your_token_here' \u003e\u003e ~/.zshrc`.\n- Set `keyboard_path` correctly if automatic detection fails. 
Refer to [this guide](https://chat.openai.com/share/bd2753d8-0ee3-4963-8e26-9569575470eb).\n- Clone and set up the EchoSight environment:\n  ```bash\n  git clone https://github.com/gusanmaz/echosight\n  cd echosight\n  python3 -m venv env\n  source env/bin/activate\n  pip install -r requirements.txt\n  ```\n\n\n### Usage\n**(Raspberry Pi) Capture an image with the Raspberry Pi Camera by pressing a keyboard key (default: S), then listen to a voice description of the captured image:**\n\n* `python3 rpi.py`\n\n**(All OSes) Provide an image path or URL to listen to a voice description of the image:**\n\n* `python3 url2speech.py image_path_or_url`\n\n\n### Models\n\nThis project uses models from https://replicate.com/ to generate voice descriptions of the images. The models used in this project are available at the links below.\n\n* **CogVLM**\n  * Replicate Model: https://replicate.com/cjwbw/cogvlm\n  * GitHub Repo: https://github.com/THUDM/CogVLM\n* **Seamless Communication**\n  * Replicate Model: https://replicate.com/cjwbw/seamless-communication\n  * GitHub Repo: https://github.com/facebookresearch/seamless_communication\n* **Coqui XTTS-v2**\n  * Replicate Model: https://replicate.com/cjwbw/coqui-xtts-v2\n  * GitHub Repo: https://github.com/coqui-ai/TTS\n\nFuture versions may incorporate different models, and the code could be adapted for easier experimentation with various models.\n\n### Cost\n\n* **Conservative Cost Estimate**: $0.2 per image\n* **Conservative Runtime Estimate**: 40 seconds per image to produce audio (excluding the time needed to start the models on Replicate.com)\n\n### License\nApache License 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgusanmaz%2Fechosight","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgusanmaz%2Fechosight","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgusanmaz%2Fechosight/lists"}