{"id":19561887,"url":"https://github.com/kaustbh/text-summarizer","last_synced_at":"2026-05-06T04:31:39.804Z","repository":{"id":227108081,"uuid":"769846801","full_name":"Kaustbh/Text-Summarizer","owner":"Kaustbh","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-06T20:32:16.000Z","size":40064,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T08:45:12.649Z","etag":null,"topics":["api","cicd","deep-learning","docker","fastapi","huggingface-datasets","huggingface-transformers","mlops","mlops-pipeline","text-summarizer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kaustbh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-10T08:22:49.000Z","updated_at":"2024-10-01T03:49:54.000Z","dependencies_parsed_at":"2024-05-06T21:48:46.095Z","dependency_job_id":null,"html_url":"https://github.com/Kaustbh/Text-Summarizer","commit_stats":null,"previous_names":["kaustbh/text-summarizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kaustbh/Text-Summarizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaustbh%2FText-Summarizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaustbh%2FText-Summarizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaustbh%2FText-Summarizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/Kaustbh%2FText-Summarizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kaustbh","download_url":"https://codeload.github.com/Kaustbh/Text-Summarizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kaustbh%2FText-Summarizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32678584,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T02:33:58.958Z","status":"ssl_error","status_checked_at":"2026-05-06T02:33:39.611Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","cicd","deep-learning","docker","fastapi","huggingface-datasets","huggingface-transformers","mlops","mlops-pipeline","text-summarizer"],"created_at":"2024-11-11T05:12:51.266Z","updated_at":"2026-05-06T04:31:39.690Z","avatar_url":"https://github.com/Kaustbh.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text-Summarizer\n\n\n \n## Table of Contents\n\n- [Introduction](#introduction)\n- [Dataset Used](#dataset-used)\n- [Model Used](#model-used)\n- [Workflow of Project](#workflow-of-project)\n- [Pipeline](#pipeline)\n  - [Data Ingestion](#data-ingestion)\n  - [Data Validation](#data-validation)\n  - [Data Transformation](#data-transformation)\n  - [Model 
Training](#model-training)\n  - [Model Evaluation](#model-evaluation)\n- [Running the Project](#running-the-project)\n- [Contributing](#contributing)\n- [License](#license)\n\n## Introduction\n\n### Text-Summarizer:\n\nA text summarization tool designed to condense large bodies of text into concise summaries, preserving key information and insights.\n\n## Dataset Used\n\n- #### SAMSum Dataset\n\nThe SAMSum dataset contains about 16k messenger-like conversations with summaries.\nThe dataset is available on Hugging Face: https://huggingface.co/datasets/samsum?row=10\n\n## Model Used\n\n- #### Google Pegasus Model\n\nThe project uses the Google Pegasus model, a transformer-based model for text generation tasks, including summarization. Developed by Google Research, Pegasus stands for Pre-training with Extracted Gap-sentences for Abstractive SUmmarization of Texts. It is pre-trained by masking whole sentences in a document and learning to generate them from the remaining text, an objective that closely resembles abstractive summarization.\n\n## Workflow of Project\n\n1. Update config.yaml\n2. Update params.yaml\n3. Update entity\n4. Update the configuration manager in src config\n5. Update the components\n6. Update the Pipeline\n7. Update the main.py\n8. Update the app.py\n\n## Pipeline\n\n- #### Data Ingestion\n\nThe data ingestion phase involves downloading the dataset from Hugging Face and unzipping it into a designated directory.\n\n- #### Data Validation\n\nAfter ingestion, the dataset undergoes validation to ensure all required files are present and correctly formatted. This process checks for the presence of 'train', 'test', and 'validation' directories and logs the status.\n\n- #### Data Transformation\n\nIn the data transformation phase, the dataset is further processed to prepare it for model training. 
This includes tokenization using the Google Pegasus tokenizer (`google/pegasus-cnn_dailymail`).\n\n- #### Model Training\n\nThe model training phase involves training the Google Pegasus model on the transformed dataset.\n\n- #### Model Evaluation\n\nFinally, the trained model is evaluated on the same dataset used for training.\n\n## Running the Project\n\nTo clone and run the project, follow these steps:\n\n1. Clone the repository:\n\n```bash\ngit clone https://github.com/Kaustbh/Text-Summarizer.git\n```\n\n2. Navigate to the project directory:\n\n```bash\ncd Text-Summarizer\n```\n\n3. Create a Python virtual environment (optional but recommended):\n\n```bash\npython -m venv venv\n```\n\n4. Install the required packages:\n\n```bash\npip install -r requirements.txt\n```\n\n5. Run the Flask app:\n\n```bash\nflask --app app run --debug\n```\n\n## Contributing\n\nContributions to this project are welcome. If you encounter any issues or have suggestions for improvements, please submit a pull request or open an issue on the GitHub repository.\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaustbh%2Ftext-summarizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkaustbh%2Ftext-summarizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkaustbh%2Ftext-summarizer/lists"}