https://github.com/surbhi242singh/text_summarizer
- Host: GitHub
- URL: https://github.com/surbhi242singh/text_summarizer
- Owner: Surbhi242singh
- Created: 2024-08-23T12:15:51.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-08-23T16:45:10.000Z (2 months ago)
- Last Synced: 2024-10-09T11:42:49.643Z (27 days ago)
- Topics: machine-learning, nlp, spacy, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Text Summarization with spaCy
## Project Overview
This project generates a summary of a given text using Natural Language Processing (NLP) techniques with spaCy. The approach tokenizes the text, calculates word frequencies, scores each sentence by the frequencies of the words it contains, and selects the highest-scoring sentences to form a concise summary.
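The approach described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `summarize` and `spacy_sentences`, the `en_core_web_sm` model, lowercasing of tokens, truncating the 30% cutoff to a whole number (with at least one sentence kept), and returning the selected sentences in their original order are all assumptions made for the example.

```python
from collections import Counter
from heapq import nlargest


def summarize(sentences, ratio=0.3):
    """Pick the highest-scoring sentences.

    `sentences` is a list of (sentence_text, content_words) pairs in
    document order. Hypothetical helper, written for illustration only.
    """
    # Word frequencies over all content words in the text.
    freqs = Counter(word for _, words in sentences for word in words)

    # A sentence's score is the cumulative frequency of its words.
    scores = [sum(freqs[word] for word in words) for _, words in sentences]

    # Keep roughly the top `ratio` of sentences (assumption: truncate,
    # but always keep at least one).
    k = max(1, int(len(sentences) * ratio))
    top = nlargest(k, range(len(sentences)), key=scores.__getitem__)

    # Assumption: present the chosen sentences in their original order.
    return " ".join(sentences[i][0] for i in sorted(top))


def spacy_sentences(text, model="en_core_web_sm"):
    """Split `text` into (sentence_text, content_words) pairs with spaCy.

    The model name is an assumption; any pipeline that sets sentence
    boundaries should work.
    """
    import spacy  # imported here so `summarize` is usable without spaCy

    doc = spacy.load(model)(text)
    return [
        (
            sent.text.strip(),
            [
                tok.text.lower()
                for tok in sent
                # Exclude stop words, punctuation, and whitespace tokens.
                if not (tok.is_stop or tok.is_punct or tok.is_space)
            ],
        )
        for sent in doc.sents
    ]
```

With spaCy and a model installed, `summarize(spacy_sentences(text))` would return the summary string. Keeping the scoring separate from the spaCy step makes the selection logic easy to check with hand-built input.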
## Methodology
**Word Tokenization:**
- The text is divided into individual words (tokens) using spaCy's NLP pipeline.

**Word Frequency Calculation:**
- Stop words and punctuation are excluded, and the frequency of each word in the text is calculated.

**Sentence Tokenization:**
- The text is split into sentences.

**Sentence Scoring:**
- Each sentence is scored based on the cumulative frequency of the words it contains.

**Summary Generation:**
- The top 30% of sentences with the highest scores are selected to form the summary.

## Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.