{"id":23050873,"url":"https://github.com/codewithcharan/text-summarizer","last_synced_at":"2025-10-16T16:54:34.353Z","repository":{"id":253540214,"uuid":"841556650","full_name":"CodeWithCharan/Text-Summarizer","owner":"CodeWithCharan","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-05T10:44:10.000Z","size":86,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-08T17:44:48.875Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CodeWithCharan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-12T16:34:42.000Z","updated_at":"2024-09-05T10:44:14.000Z","dependencies_parsed_at":"2024-09-05T17:04:07.672Z","dependency_job_id":null,"html_url":"https://github.com/CodeWithCharan/Text-Summarizer","commit_stats":null,"previous_names":["codewithcharan/text-summarizer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeWithCharan%2FText-Summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeWithCharan%2FText-Summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeWithCharan%2FText-Summarizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeWithCharan%2FText-Summarizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CodeWith
Charan","download_url":"https://codeload.github.com/CodeWithCharan/Text-Summarizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246933354,"owners_count":20857052,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-15T23:39:03.098Z","updated_at":"2025-10-16T16:54:29.311Z","avatar_url":"https://github.com/CodeWithCharan.png","language":"Jupyter Notebook","readme":"# End-to-End-Text-Summarizer-Project\n\nIn this project, I built a text summarizer app that will summarize any text, dialogue, conversation or article. I utilized the Pegasus model and fine-tuned it with the SAMSum dataset.\n\n🔗 **Model**: [Pegasus](https://huggingface.co/docs/transformers/en/model_doc/pegasus)  \n🔗 **Dataset**: [SAMSum](https://huggingface.co/datasets/Samsung/samsum)\n\n### **Project Highlights**\nWhat makes this project unique is the integration of **MLOps** techniques:\n- **Pipelines**: Implemented Data Ingestion, Data Validation, Data Transformation, Model Trainer, Model Evaluation and Prediction.\n- **Docker**: Containerized the source code for easy deployment on AWS ECR and EC2.\n- **GitHub Actions**: Set up Continuous Integration and Continuous Deployment (CI/CD).\n- **Streamlit App**: Developed a user-friendly UI for interaction.\n\n### **Challenges \u0026 Solutions**\n1. **Model Fine-Tuning**: Finding the right hyperparameters for the SAMSum dataset required extensive experimentation, leading to a more robust summarization performance.\n2. 
**Deployment**: Containerizing and deploying the app on AWS was challenging, but leveraging Docker and GitHub Actions streamlined the process.\n\n### **Results**\n- The app successfully summarized a chat between my friend [@dheerajvoore](https://www.linkedin.com/in/dheerajvoore/) and me into a concise and clear summary.\n\n## Workflows\n\n1. Update config.yaml\n2. Update params.yaml\n3. Update entity\n4. Update the configuration manager in src config\n5. Update the components\n6. Update the pipeline\n7. Update the main.py\n8. Update the app.py\n\n## Pipelines\n\n1. Data Ingestion\n2. Data Validation\n3. Data Transformation\n4. Model Trainer\n5. Model Evaluation\n\n## STEPS\n\nClone the repository\n\n```bash\ngit clone https://github.com/CodeWithCharan/Text-Summarizer.git\n```\n### STEP 01: Create a conda environment after opening the repository\n\n```bash\nconda create -n envname python=3.8 -y\n```\n\n```bash\nconda activate envname\n```\n\n\n### STEP 02: Install the requirements\n```bash\npip install -r requirements.txt\n```\n\n### STEP 03: Run the Streamlit app\n```bash\nstreamlit run app.py\n```\n\n### STEP 04: After running the app, it will be available at:\n\n- `http://127.0.0.1:8080`\n- `http://localhost:8080`\n\n## AWS CI/CD Deployment with GitHub Actions\n\nDescription: About the deployment\n\n\t1. Build a Docker image of the source code\n\n\t2. Push your Docker image to ECR\n\n\t3. Launch your EC2 instance\n\n\t4. Pull your image from ECR in EC2\n\n\t5. Launch your Docker image in EC2\n## Steps:\n\n### 1. Log in to the AWS console.\n\n### 2. Create IAM user for deployment\n\n\t#with specific access\n\n\t1. EC2 access: a virtual machine\n\n\t2. ECR: Elastic Container Registry, to save your Docker image in AWS\n\n\t#Policy:\n\n\t1. AmazonEC2ContainerRegistryFullAccess\n\n\t2. AmazonEC2FullAccess\n\n\t\n### 3. Create ECR repo to store/save docker image\n    - Save the URI: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n\n### 4. Create EC2 machine (Ubuntu) \n\n### 5. 
Open EC2 and install Docker on the EC2 machine:\n\t\n\t#optional\n\n\tsudo apt-get update -y\n\n\tsudo apt-get upgrade\n\t\n\t#required\n\n\tcurl -fsSL https://get.docker.com -o get-docker.sh\n\n\tsudo sh get-docker.sh\n\n\tsudo usermod -aG docker ubuntu\n\n\tnewgrp docker\n\t\n### 6. Configure EC2 as self-hosted runner:\n    Settings \u003e Actions \u003e Runners \u003e New self-hosted runner \u003e choose OS \u003e then run the commands one by one\n\n\n### 7. Set up GitHub secrets:\n\n    AWS_ACCESS_KEY_ID=\n\n    AWS_SECRET_ACCESS_KEY=\n\n    AWS_REGION = ap-south-1\n\n    AWS_ECR_LOGIN_URI =\n\n    ECR_REPOSITORY_NAME = mlproj\n\n### **Pegasus Model Performance**:\n\n| Model   | ROUGE-1   | ROUGE-2 | ROUGE-L | ROUGE-Lsum |\n|---------|-----------|---------|---------|------------|\n| Pegasus | 0.02161   | 0.0     | 0.02131 | 0.02125    |\n\n*\"Continuously working on improving the model's performance for better summarization accuracy!\"*","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodewithcharan%2Ftext-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodewithcharan%2Ftext-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodewithcharan%2Ftext-summarizer/lists"}