https://github.com/surbhi242singh/text_summarizer

machine-learning nlp spacy tokenization


Host: GitHub
URL: https://github.com/surbhi242singh/text_summarizer
Owner: Surbhi242singh
Created: 2024-08-23T12:15:51.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-08-23T16:45:10.000Z (2 months ago)
Last Synced: 2024-10-09T11:42:49.643Z (27 days ago)
Topics: machine-learning, nlp, spacy, tokenization
Language: Jupyter Notebook
Homepage:
Size: 36.1 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Text Summarization with spaCy

## Project Overview

This project generates an extractive summary of a given text using Natural Language Processing (NLP) techniques with spaCy. It tokenizes the text, calculates word frequencies, scores each sentence by the frequencies of its words, and selects the highest-scoring sentences to form a concise summary.

## Methodology

**Word Tokenization:**
- The text is divided into individual words (tokens) using spaCy's NLP pipeline.
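
As a minimal sketch of this step, the snippet below tokenizes a short sample string. It assumes a blank English pipeline (`spacy.blank("en")`), which needs no model download; the project itself may load a trained model instead, and the sample text is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline is enough for tokenization.
# The project may load a trained model (for example en_core_web_sm) instead.
nlp = spacy.blank("en")

# Hypothetical sample text, not taken from the project.
text = "Cats chase the mouse. Dogs bark loudly!"
doc = nlp(text)

# Each token carries its text; punctuation becomes its own token.
tokens = [token.text for token in doc]
print(tokens)
```

Punctuation marks come out as separate tokens, which is what lets the next step filter them.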

**Word Frequency Calculation:**
- Stop words and punctuation are excluded, and the frequency of each word in the text is calculated.
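
The counting logic might look roughly like the sketch below. To stay dependency-free it uses a tiny hypothetical `STOP_WORDS` set and `string.punctuation` as stand-ins for spaCy's `token.is_stop` and `token.is_punct` flags; lowercasing words before counting is also an assumption.

```python
from collections import Counter
import string

# Hypothetical stand-in for spaCy's stop-word list (token.is_stop).
STOP_WORDS = {"the", "a", "and", "of"}

def word_frequencies(tokens):
    """Count lowercased tokens, skipping stop words and punctuation."""
    counts = Counter()
    for tok in tokens:
        word = tok.lower()
        # Stand-in for token.is_punct: the token consists only of punctuation.
        is_punct = all(ch in string.punctuation for ch in word)
        if word in STOP_WORDS or is_punct:
            continue
        counts[word] += 1
    return counts

tokens = ["Cats", "chase", "the", "mouse", ".", "The", "mouse", "hides", "."]
freqs = word_frequencies(tokens)
print(freqs.most_common())
```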

**Sentence Tokenization:**
- The text is split into sentences.
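
One way to split sentences without a trained model is spaCy's rule-based `sentencizer` component, sketched below. Whether the project uses this component or a model's own sentence boundaries is not stated, so treat the pipeline setup as an assumption.

```python
import spacy

# Assumption: rule-based sentence splitting on a blank English pipeline.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical sample text, not taken from the project.
doc = nlp("Cats chase the mouse. Dogs bark loudly!")

# doc.sents yields one Span per detected sentence.
sentences = [sent.text.strip() for sent in doc.sents]
print(sentences)
```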

**Sentence Scoring:**
- Each sentence is scored by summing the frequencies of the words it contains.
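
The scoring rule can be sketched in plain Python as follows. The function name `score_sentences`, the input shapes (sentences as token lists, frequencies as a dict), and the sample values are assumptions for illustration; words missing from the frequency table, such as stop words and punctuation, contribute zero.

```python
def score_sentences(sentences, freqs):
    """Map each sentence index to the sum of its words' frequencies."""
    scores = {}
    for idx, words in enumerate(sentences):
        # Words absent from freqs (stop words, punctuation) add nothing.
        scores[idx] = sum(freqs.get(w.lower(), 0) for w in words)
    return scores

# Hypothetical frequency table and tokenized sentences.
freqs = {"cats": 1, "chase": 1, "mouse": 2, "hides": 1}
sentences = [
    ["Cats", "chase", "the", "mouse", "."],
    ["The", "mouse", "hides", "."],
]
scores = score_sentences(sentences, freqs)
print(scores)
```

Because the score is a raw sum, longer sentences tend to score higher under this sketch.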

**Summary Generation:**
- The highest-scoring 30% of sentences are selected to form the summary.
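
A possible selection step is sketched below with `heapq.nlargest`. Several details are assumptions not stated above: the sentence count is rounded down, at least one sentence is always kept, and the chosen sentences are emitted in their original order. The name `select_summary` and the sample data are hypothetical.

```python
import heapq

def select_summary(sentences, scores, ratio=0.3):
    """Join the top `ratio` share of sentences, ranked by score."""
    # Assumption: round down, but always keep at least one sentence.
    k = max(1, int(len(sentences) * ratio))
    top = heapq.nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    # Assumption: restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(top))

# Hypothetical sentences and scores.
sentences = ["s0.", "s1.", "s2.", "s3.", "s4.", "s5.", "s6."]
scores = [3, 9, 1, 4, 8, 2, 5]
summary = select_summary(sentences, scores)
print(summary)
```

With seven sentences and a ratio of 0.3, two sentences are kept.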

## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.