{"id":16556623,"url":"https://github.com/nuhmanpk/visionscriptbot","last_synced_at":"2025-03-21T10:32:26.328Z","repository":{"id":215910308,"uuid":"737186508","full_name":"nuhmanpk/VisionScriptBot","owner":"nuhmanpk","description":"A telegram bot that uses Google's Gemini Pro Vision API to convert image to text","archived":false,"fork":false,"pushed_at":"2024-07-30T03:48:26.000Z","size":2463,"stargazers_count":21,"open_issues_count":0,"forks_count":8,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-18T00:38:29.929Z","etag":null,"topics":["gemini-ai","gemini-api","gemini-vision-pro","google","google-ai-studio","google-api","gpt","pyrogram","python","telegram-bot","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nuhmanpk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"nuhmanpk","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2023-12-30T05:32:12.000Z","updated_at":"2025-03-01T00:18:45.000Z","dependencies_parsed_at":"2024-07-30T07:52:44.323Z","dependency_job_id":null,"html_url":"https://github.com/nuhmanpk/VisionScriptBot","commit_stats":null,"previous_names":["nuhmanpk/visionscriptbot"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuhmanpk%2FVisionScriptBot","tags_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuhmanpk%2FVisionScriptBot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuhmanpk%2FVisionScriptBot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nuhmanpk%2FVisionScriptBot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nuhmanpk","download_url":"https://codeload.github.com/nuhmanpk/VisionScriptBot/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244778034,"owners_count":20508841,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gemini-ai","gemini-api","gemini-vision-pro","google","google-ai-studio","google-api","gpt","pyrogram","python","telegram-bot","vision-transformer"],"created_at":"2024-10-11T20:05:15.489Z","updated_at":"2025-03-21T10:32:25.358Z","avatar_url":"https://github.com/nuhmanpk.png","language":"Python","funding_links":["https://github.com/sponsors/nuhmanpk"],"categories":[],"sub_categories":[],"readme":"# VisionScriptBot\nA telegram bot that uses Google's Gemini Pro Vision API , Take a demo [here](https://t.me/visionscriptbot). 
The new version supports prompts along with images: add your prompt as the image caption before uploading the image.\n\n### Gemini Pro Vision\n\nGemini Pro Vision is a Gemini large language vision model that takes text and visual input (image and video) and generates relevant text responses.\n\nGemini Pro Vision is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.\n\n## Gemini API\nVisionScriptBot uses Google's new [Gemini Pro Model](https://ai.google.dev/docs).\n\n[Gemini](https://deepmind.google/technologies/gemini/) is Google's latest family of [large language models](https://blog.google/technology/ai/google-gemini-ai/#performance).\n\n### API KEY\n\nYou need a Google API key 🔐 for Gemini to run this model.\nGet your API key from\nhttps://makersuite.google.com/app/apikey\n\n\nGoogle's Python SDK for the Gemini API is contained in the google-generativeai package. Install the dependency using pip:\n\n\n```bash\npip install -q -U google-generativeai\n```\n\nFor a complete guide, refer to the [Python quickstart](https://ai.google.dev/tutorials/python_quickstart).\n\n### Deploy\n\nDeployed on [Railway.app](https://railway.app?referralCode=O6FeyZ); check out their free hosting plans [here](https://railway.app?referralCode=O6FeyZ).\n\n#### Use cases\n\n1. Visual information seeking: Use external knowledge combined with information extracted from the input image or video to answer questions.\n\n1. Object recognition: Answer questions related to fine-grained identification of the objects in images and videos.\n\n1. Digital content understanding: Answer questions and extract information from visual content like infographics, charts, figures, tables, and web pages.\n\n1. 
Structured content generation: Generate responses based on multimodal inputs in formats like HTML and JSON.\n\n1. Captioning and description: Generate descriptions of images and videos with varying levels of detail.\n\n1. Reasoning: Compositionally infer new information without memorization or retrieval.\n\n\n## Demo\n\n![](https://github.com/nuhmanpk/VisionScriptBot/blob/main/demos/Screenshot_20231230-115838.png)\n\n## Support\n\nIf you find this project useful, please support me [here](https://github.com/sponsors/nuhmanpk).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuhmanpk%2Fvisionscriptbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnuhmanpk%2Fvisionscriptbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnuhmanpk%2Fvisionscriptbot/lists"}