{"id":24976079,"url":"https://github.com/anthonyjdella/summarize-text","last_synced_at":"2025-04-11T13:08:40.647Z","repository":{"id":64480154,"uuid":"529475529","full_name":"anthonyjdella/summarize-text","owner":"anthonyjdella","description":"📖 A Python app that uses text recognition on photos, then texts you a summary.","archived":false,"fork":false,"pushed_at":"2023-03-13T22:44:51.000Z","size":9423,"stargazers_count":7,"open_issues_count":5,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-25T09:21:28.546Z","etag":null,"topics":["openai","python","text-recognition","twilio","vision-api"],"latest_commit_sha":null,"homepage":"https://www.twilio.com/blog/summarize-text-from-images-using-ai-and-twilio","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anthonyjdella.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-27T03:56:57.000Z","updated_at":"2024-09-21T13:12:30.000Z","dependencies_parsed_at":"2023-01-03T20:33:39.737Z","dependency_job_id":null,"html_url":"https://github.com/anthonyjdella/summarize-text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonyjdella%2Fsummarize-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonyjdella%2Fsummarize-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonyjdella%2Fsummarize-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anthonyjdella%2Fsummarize-text/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/anthonyjdella","download_url":"https://codeload.github.com/anthonyjdella/summarize-text/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248404695,"owners_count":21097805,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["openai","python","text-recognition","twilio","vision-api"],"created_at":"2025-02-03T21:59:22.474Z","updated_at":"2025-04-11T13:08:40.619Z","avatar_url":"https://github.com/anthonyjdella.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Summarize Text from Photos Using AI and Twilio\n\n![](https://lh3.googleusercontent.com/6aiRz4c6rZwt5I-1j1pvjW7mMzygnfZ9dvy8nTeD8ZSXy0J04a7mRlLlllnTjgkD4RQgqG4zH-nk_zuckAl1_f14CSGp4HMV-9rXzkeAhdBPWfdvcTPFEV-Kr4GcLdBg2dlrmsbZopXzT8I9Io_7jDI)\n\nContent, content, content! Are you overwhelmed by the amount of content you’re asked to read on a daily basis? Don’t you wish you could quickly summarize large chunks of text? 
It’d be a huge timesaver, especially for college students who read a lot of content!\n\nIn this blog post, I will teach you how to build an app in Python that performs text recognition on photos, summarizes that text, and then sends you a summary via SMS.\n\n![](https://lh3.googleusercontent.com/FWQ2xKRF_002kQjcFdNlwEbI9dDUt4KU8jXoTH7aEWDaHx6uZA54o_JtaBbgAVTmPvsHUOzKSI1JgtTiDmy9gFyQvBf-wazbnguwFqJjICpOur9n9jUNz8YRg6olzmSj1q5iD96fQ3imHhNlyg3m9qs)\n\nHere’s a typical use case: you see a large wall of text that you don’t want to read, so you pull out your phone, take a picture of that text, and receive an SMS with a concise summary. Boom, time saved!\n\n![](https://lh4.googleusercontent.com/w2MnJHu7sDu6V8lQfsJ6ZlYrOMjY2lCcZj1-PaU249d2EOyIxZZ3ut9X56sO8zkitCDiDoKaIcuGSh-KXB9Gt7LjhndNOxJ1lVpv7sHaQUV1bkiTRYx9hf-HeibkbzJVC99NMSBz4k5YVAXUeeRWA0g)\n\n## Prerequisites\n\nBefore getting started, make sure you have the following:\n\n- [Python 3.7](https://www.python.org/downloads/) or higher installed on your machine.\n- A Twilio account. If you haven’t yet, [sign up for a free Twilio trial](https://www.twilio.com/try-twilio).\n- A Google Cloud account. You can [get started for free](https://console.cloud.google.com/freetrial).\n- An OpenAI account. You can [sign up for a free account](https://openai.com/join/).\n- ngrok installed on your machine. ngrok is a useful tool for connecting your local server to a public URL. You can [sign up for a free account](https://ngrok.com/) and [learn how to install ngrok](https://ngrok.com/download).\n- A phone with a [US or Canada](https://www.twilio.com/docs/sms/tutorials/sending-international-sms-guide#international-mms-messages) phone number.\n\n\n## Access the Code\n\nIf blog posts aren’t your thing and you’d prefer to just look at the code, it’s available in this [GitHub repository](https://github.com/anthonyjdella/summarize-text).\n\n\n## Table of Contents\n\nFor context, this blog post is structured as follows:\n\n1. 
**Setup Google Cloud Vision:** Set up our Google Cloud account and enable the Vision API\n2. **Setup OpenAI:** Set up our OpenAI account\n3. **Setup Local Environment:** Set up our local development environment\n4. **Cloud Vision API:** Using ML, detect words from images using the Google Cloud Vision API\n5. **OpenAI API:** Using AI, generate a summary of text from the OpenAI API\n6. **Twilio SMS API:** Send a text message (containing the summary) when the application is triggered\n\n\n## Setup Google Cloud Vision\n\nTo use the Google Cloud Vision API, we need to set it up by following the [quickstart guide](https://cloud.google.com/vision/docs/setup). This process does take some time, but don’t get discouraged. Just follow the quickstart guide step-by-step, or continue along here (if you don’t want to tab out).\n\nAssuming you already have a [Google Cloud account](https://console.cloud.google.com/freetrial), you’ll need to [create a new project](https://console.cloud.google.com/projectselector2/home/dashboard) within Google Cloud. Give it a Project name of **summarize-text** and click Create.\n\n![](https://lh5.googleusercontent.com/z8TYmB7ZyDWfnMChAIa6sTOUZViUzZi4n42ZPNaFNO7cfy1n4n7hYB1MDL1-VIkgo--xwpBb4t9eF0ncNzK6bTgcooKtepqG7SUN3vK99Q6d7cly52SjHuIa2eSO2l8F10C_NNzrZP2-r2FncRIxc5U)\n\nNext, enable billing for the project we just created. But don’t worry, you won’t be charged unless you exceed the [Cloud Vision monthly limits](https://cloud.google.com/vision/pricing). 
Learn how to [check if billing is enabled on a project](https://cloud.google.com/billing/docs/how-to/verify-billing-enabled).\n\n![](https://lh4.googleusercontent.com/tHg33DDCscEQHbtu0ZR-8lVv8CDESkflL8PXkSABScsFtAshmfn2sb61nNbUvLVbc9fq24vjEXm6Wk-81sYWocPFfct4iJGr9l0QoopWGch3u_5vFAmAKBshrlpy3c88gKhEJzRXuXz2Qj-TQfexi3g)\n\n\n[Enable the Vision API](https://console.cloud.google.com/flows/enableapi?apiid=vision.googleapis.com) for the project we created earlier, called **summarize-text**.\n\n![](https://lh6.googleusercontent.com/bQpoOg2PSBJg1CwNSXk9SxeBdTIXUtiEyYYnsGq4caWoPVtXGp7hNquz8GBi3D0iQN4Ww6rum_ypcGjhuLBs4H-TdUWtO871PTuCzSvDY7U38Wtpuw24ZG1T3_W5_Z4LFcGmlINdZY4C_lXn98ITqU0)\n\nNext, set up authentication with a service account. Go to [Create a service account](https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project) and select our project (**summarize-text**). In the **Service account name** field, enter a name of **summarize-text**. In the **Service account description** field, enter a description of **Service account for summarize-text**. Continue and then grant the role of Project \u003e **Owner** to your service account.\n\nAfter creating a service account, create a service account key by clicking on the email address of the service account: **summarize-text**. Click **Keys**, then **Create new key**. After doing this, a JSON key file will be downloaded to your computer. You’ll need to store this file in a location of your choice and then set an environment variable pointing to the path of this JSON file.\n\nFor example, on Linux or macOS, in .zshrc:\n\n```bash\nexport GOOGLE_APPLICATION_CREDENTIALS=\"/home/user/Downloads/service-account-file.json\"\n```\n\nFor example, on Windows with PowerShell:\n\n```powershell\n$env:GOOGLE_APPLICATION_CREDENTIALS=\"C:\\\\Users\\\\username\\\\Downloads\\\\service-account-file.json\"\n```\n\nNext, install the Google Cloud CLI. 
Since this is different for each operating system, follow the steps outlined in Google’s [gcloud CLI installation guide](https://cloud.google.com/sdk/docs/install#installation_instructions).\n\nFinally, install the Python client library with the following command:\n\n```bash\npip install --upgrade google-cloud-vision\n```\n\n## Setup OpenAI\n\nAssuming you already registered for an [account with OpenAI](https://openai.com/join/), you’ll need to create an API key in your user account settings, which will allow you to authenticate your application with OpenAI. Copy this key and don’t share it with anyone!\n\n![](https://lh4.googleusercontent.com/JPCucfBTomdVGhRwtGpFGln4Gj40l4N2L65UFARgV1UBR64BJAKJQpxyJCJYMsJzcmp-fJ0-_IVFOHmM223usmaZAiM1gz182APITEFdCvyJRqyKY2grLwqWTQlCJHKVi3P9Bu31nWzSGJ6DE3KzvbA)\n\nWe will securely store this API key in the following section.\n\n\n## Setup Local Environment\n\nCreate an empty project directory:\n\n```bash\nmkdir summarize_text\n```\n\nThen change into that directory as that’s where our code will be.\n\n```bash\ncd summarize_text\n```\n\nCreate a [virtual environment](https://www.twilio.com/docs/usage/tutorials/how-to-set-up-your-python-and-flask-development-environment#start-a-new-project-with-virtualenv):\n\n```bash\npython -m venv summarize\n```\n\nActivate our virtual environment:\n\n```bash\nsource summarize/bin/activate\n```\n\nInstall dependencies to our virtual environment:\n\n```bash\npip install python-dotenv twilio Flask requests google-cloud-vision openai\n```\n\nLet’s create a file called \\`.env\\` in the project’s root directory to store our API keys in [environment variables](https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html). \n\nWithin that file, we’ll create an environment variable called \\`OPENAI_API_KEY\\`. 
\n\n(Replace \\`PASTE_YOUR_API_KEY_HERE\\` with the API key that you copied earlier.)\n\n```bash\nOPENAI_API_KEY=PASTE_YOUR_API_KEY_HERE\n```\n\nFor example:\n\n```bash\nOPENAI_API_KEY=sk-1234567890abcdefg\n```\n\nSince we’ll also be working with our Twilio account, we’ll need to add more to this file. Log into your [Twilio console](https://console.twilio.com/), then scroll down to find your Account SID and Auth Token. Add two additional lines to the \\`.env\\` file, but change the values to equal your unique Account SID and Auth Token.\n\n```bash\nTWILIO_ACCOUNT_SID=PASTE_YOUR_ACCOUNT_SID_HERE\nTWILIO_AUTH_TOKEN=PASTE_YOUR_AUTH_TOKEN_HERE\n```\n\nFor example:\n\n```bash\nTWILIO_ACCOUNT_SID=ACxxxxxxxxxxxx\nTWILIO_AUTH_TOKEN=321321321321321\n```\n\nIf you’re pushing this project to a Git repository, please make sure to add the \\`.env\\` file to your \\`.gitignore\\` so that these credentials stay out of version control.\n\nWe’ll be working with local images, so in your project’s root directory, create a new directory called \\`resources\\`. For now, it will be an empty directory, but later this is where images will be stored.\n\n\n## Cloud Vision API\n\nSince we set it up already, you may be wondering what the Vision API is. It’s a Google API that offers powerful pre-trained machine learning models through REST. With the API, you can do things like [detect faces](https://cloud.google.com/vision/docs/face-tutorial), identify places, [recognize celebrities](https://cloud.google.com/vision/docs/celebrity-recognition), and much more. For this app, we will be using Optical Character Recognition (OCR) to [recognize text in images](https://cloud.google.com/vision/docs/ocr). 
\n\nCreate a file called \\`detect.py\\` in the project’s root directory and copy and paste the following code into the file:\n\n```python\nimport io\nimport os\nfrom google.cloud import vision\n\n\ndef detect_text():\n    client = vision.ImageAnnotatorClient()\n\n    file_name = os.path.abspath('resources/image.jpg')\n\n    with io.open(file_name, 'rb') as image_file:\n        content = image_file.read()\n\n    image = vision.Image(content=content)\n\n    response = client.text_detection(image=image)\n    texts = response.text_annotations\n    return(texts[0].description)\n```\n\n\nThe \\`detect_text\\` function will look at a local file from your computer–in this case an image from the \\`resources/\\` directory called \\`image.jpg\\`. Then, we will read the content from that image and use the \\`text_detection\\` function from the Vision API to detect text. Finally, we’ll return that text.\n\nIf you were to run the \\`detect_text\\` function as is, it wouldn’t work since we are reading an image called \\`image.jpg\\` from the \\`resources/\\` directory that doesn’t currently exist. But we’ll come back to this later.\n\nCreate a new file called \\`utilities.py\\` in the project’s root directory and paste the following code into the file:\n\n```python\nimport requests\n\n\ndef save_image(image_url):\n    img_data = requests.get(image_url).content\n    with open('resources/image.jpg', 'wb') as handler:\n        handler.write(img_data)\n```\n\nThe \\`save_image\\` function will take an image url and save it as a file called \\`image.jpg\\` within the \\`resources/\\` directory.\n\n\n## OpenAI API\n\nNow that we’ve written the code for interacting with the Cloud Vision API that allows us to perform text recognition on photos, we can use the OpenAI API to summarize that text. OpenAI is an AI company (surprise, surprise) that applies models on natural language for various tasks. 
You give the API a prompt, which is natural language that you input, and the AI will generate a response. For example, if you input a prompt “write a tagline for an ice cream shop” you may see a response like “we serve up smiles with every scoop!”\n\nIn the project’s root directory create a file called \\`summarize.py\\` and paste the following code into the file:\n\n```python\nimport os\nimport openai\nfrom dotenv import load_dotenv\nfrom detect import detect_text\nfrom utilities import save_image\n\n\nload_dotenv()\n\n\nopenai.api_key = os.getenv(\"OPENAI_API_KEY\")\n\n\ndef get_text_from_image(url):\n    save_image(url)\n    return detect_text()\n\n\ndef generate_prompt(url):\n    return f\"In one-sentence, summarize the following text: \\n {get_text_from_image(url)} \\n\"\n\n\ndef summarize_prompt(url):\n    response = openai.Completion.create(\n        model=\"text-davinci-002\",\n        prompt=generate_prompt(url),\n        temperature=0.8,\n        max_tokens=100,\n        top_p=1.0,\n        frequency_penalty=0.0,\n        presence_penalty=0.0\n    )\n    print(response.choices[0].text)\n    return(response.choices[0].text)\n```\n\nThe \\`summarize_prompt\\` function uses the OpenAI API \\`create\\` function to respond to a prompt that we give it (\\`generate_prompt\\`). The model we are specifying (_text-davinci-002_) was, at the time of writing, OpenAI’s most capable [GPT-3 model](https://en.wikipedia.org/wiki/GPT-3). The \\`max_tokens\\` parameter sets an upper bound on how many [tokens](https://beta.openai.com/tokenizer) the API will return, which limits how long our response can be. \\`generate_prompt\\` builds a prompt that asks for a one-sentence summary of the text. \\`get_text_from_image\\` calls the functions we created in the previous section.\n\n\n## Twilio SMS API\n\nNow, we’ll create the code in our application that will allow us to text our Twilio phone number and get back a response. This is called sending an Inbound SMS. 
Think of inbound as an inbound SMS to a Twilio phone number triggering your application. In this case, we will be sending a text to a Twilio phone number (our trigger), then having it respond by sending a reply containing a summary.\n\nCreate a new file (in the same directory) called \\`app.py\\`. Using [Flask](https://flask.palletsprojects.com/en/2.1.x/), a Python web framework, we will create an app that runs on a local server. Paste the following code into \\`app.py\\`:\n\n```python\nfrom flask import Flask, request\nfrom twilio.twiml.messaging_response import MessagingResponse\nfrom summarize import summarize_prompt\n\n\napp = Flask(__name__)\n\n\ndef respond(message):\n   response = MessagingResponse()\n   response.message(message)\n   return str(response)\n\n\n@app.route(\"/summary\", methods=['GET', 'POST'])\ndef incoming_sms():\n   user_input = request.form.get('NumMedia')\n   if user_input == '1':\n       pic_url = request.form.get('MediaUrl0')\n       summary = summarize_prompt(pic_url)\n       return respond(f\"{summary}\")\n   else:\n       return respond(\"Please send a picture containing text!\")\n\n\nif __name__ == \"__main__\":\n   app.run(host='localhost', debug=True, port=8080)\n```\n\nRun the application on your local server with this command in your console (from the root directory):\n\n```bash\npython app.py\n```\n\nYour application should be running on \u003chttp://localhost:8080\u003e. Output will look similar to this:\n\n```bash\n * Serving Flask app 'app' (lazy loading)\n * Environment: production\n   WARNING: This is a development server. Do not use it in a production deployment.\n   Use a production WSGI server instead.\n * Debug mode: on\n * Running on http://localhost:8080 (Press CTRL+C to quit)\n * Restarting with stat\n * Debugger is active!\n * Debugger PIN: 199-776-319\n```\n\nAs of now, our application is only running on a server within your computer. 
But we need a public-facing URL (not \u003chttp://localhost\u003e) to configure a [Webhook](https://www.twilio.com/docs/usage/webhooks/getting-started-twilio-webhooks) so Twilio can find it. By using a tool called ngrok, we will [“put localhost on the Internet”](https://ngrok.com/product) so we can configure our webhook.\n\nIn another console tab run the command:\n\n```bash\nngrok http 8080\n```\n\nThis will create a “tunnel” from the public Internet into port 8080 in our local machine, where the Flask app is listening for requests. You should see output similar to this:\n\n![](https://lh3.googleusercontent.com/k5z3v6P7sSfco3bj9EzA66t1qNWfqCKpA7zAfy1BUT79xgBaaAZuisZcISOTDCsesleptuR6GQrwLpqytFd5Ff5tPCur_19IwX3CVobt-r9A14kKNOAHJ0TiT6KD1jj6L9xUtWBJSpSml38L0Tc9kWw)\n\nTake note of the line that says “Forwarding”. In the image above, it shows: \n\n```bash\nhttps://5bad813c2718.ngrok.io -\u003e http://localhost:8080\n```\n\nThis means that our local application is running publicly on \n\n```bash\nhttps://5bad813c2718.ngrok.io/summary\n```\n\nWithin the [Console](https://console.twilio.com), enter the ngrok URL (including the \\`/summary\\` path) as a Webhook when “A Message Comes In”.\n\n![](https://lh5.googleusercontent.com/qIaL_tdyDl1s4buJkwmfWfdgTQieLd3n5DSU228mtzWGnFE9T1UkmghKILbgKjpPA80src6SoSQ1yJf7rU3ap5y8CC27ox2J3t1d6pOwOmwxL_nYLWmOFv6gTVQ_ibupknwXU99Od2M1JSTIEet_kS8)\n\nPlease be aware that unless you have a paid ngrok account, each time you run the ngrok command a new URL will be generated, so be sure to update the Webhook URL within the Twilio console.\n\nSince our application and ngrok are running, we can send a picture message to our Twilio phone number and it will respond with a summary of the text!\n\n![](https://lh4.googleusercontent.com/Hsvz7e0FdzS8rkblHCP8NQsQq0vDmc7hdQBLTY3vUebalyO6Tz0lPDL9RX8tAyVbWFkDMVTbdlv4GAKxMN2HfVh20kSEg9z1Ts67F9OpOOyjL0fhFvg9AZ_HcRCoLY-vYjgZECCO0h4s0AaVwihrRAM)\n\n\n## Show Me What You Build\n\nNow if there’s a big wall of text that you don’t want to read, pull out your 
phone, take a picture, and then text it to your Twilio number. You’ll get a response back with a short summary!\n\nThanks so much for reading! If you found this tutorial helpful, have any questions, or want to show me what you’ve built, let me know online. And if you want to learn more about me, check out [my intro blog post](https://www.twilio.com/blog/introducing-twilio-developer-evangelist-anthony-dellavecchia).\n\n![](https://lh5.googleusercontent.com/QqqYPg-hhp8oQKv4XEWLDNhjs5DrmgJbm_qEWZWJLzudWG9T46R7OIGWhVDRHjosLv7aM-I3xXxzORP6VhiUjbJvZIjiO1RZx-aLdIJXwZUMXTgwR8b1FRzWKra4KTQP2gljGhKXRG1fp83uWqkYbEk)\n\n\u003e _Anthony Dellavecchia is a Developer Evangelist at Twilio who writes code on stage in front of a crowd. He is an experienced software developer who teaches thousands of people how to change the world with code. His goal is to help you build deep experiences and connections with technology so that they stick with you forever._\n\n\u003e _Check him out online @anthonyjdella -- _[_Twitter_](https://twitter.com/anthonyjdella)_ • _[_Linkedin_](https://www.linkedin.com/in/anthonydellavecchia/)_ • _[_GitHub_](https://github.com/anthonyjdella)_ • _[_TikTok_](https://tiktok.com/@anthonyjdella)_ • _[_Medium_](https://medium.com/@anthonyjdella)_ • _[_Dev.to_](https://dev.to/anthonyjdella)_ • Email • _[_anthonydellavecchia.com_](https://anthonydellavecchia.com/)_ 👈_\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonyjdella%2Fsummarize-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanthonyjdella%2Fsummarize-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanthonyjdella%2Fsummarize-text/lists"}