https://github.com/rayan2162/extractive_text_summarization_using_tf-idf
Artificial Intelligence Laboratory (6th semester) course's project.
artificial-intelligence extractive-text-summarization nlp text-summarization tf-idf web-scraping
Last synced: 6 months ago
- Host: GitHub
- URL: https://github.com/rayan2162/extractive_text_summarization_using_tf-idf
- Owner: rayan2162
- License: mit
- Created: 2024-04-30T12:07:24.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-20T18:01:31.000Z (about 1 year ago)
- Last Synced: 2025-04-02T20:22:08.443Z (6 months ago)
- Topics: artificial-intelligence, extractive-text-summarization, nlp, text-summarization, tf-idf, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 6.99 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Extractive Text Summarization using TF-IDF
**Project for the 6th-semester Artificial Intelligence Laboratory course.**
This project focuses on creating an extractive text summarization model using Term Frequency-Inverse Document Frequency (TF-IDF) to generate concise summaries from large textual datasets.
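As a rough illustration of the idea, the sketch below scores each sentence by the TF-IDF weights of its words and keeps the highest-scoring ones. It is a minimal, standard-library-only example and is not taken from the notebooks in this repository: the function names (`split_sentences`, `tokenize`, `summarize`), the naive regex sentence splitter, and the choice to treat every sentence as its own "document" when computing IDF are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, n_sentences):
    """Return the n_sentences highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n_docs = len(sentences)

    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # TF = count / sentence length; IDF = log(N / df); score = sum of TF * IDF.
        scores.append(sum((count / len(tokens)) * math.log(n_docs / df[term])
                          for term, count in tf.items()))

    # Take the top-scoring sentence indices, then restore document order.
    top = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. Dogs bark loudly at night."
    for sentence in summarize(sample, 1):
        print("Sentence:", sentence)
```

Because TF is normalized by sentence length, the score works out to an average IDF per word, so long sentences are not favored merely for being long. A real pipeline would likely add stop-word removal and a more robust sentence splitter.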
## Project Structure
### 1. Code Folder
- `tf_idf_built_in_function.ipynb`: Implementation of the TF-IDF algorithm using built-in Python functions.
- `tf_idf_raw_code.ipynb`: Manual implementation of the TF-IDF algorithm from scratch.
### 2. Final Report Folder
- `Images Folder`:
  - `output.png`: The output of the summarization process.
  - `process_flow.png`: A visual representation of the process flow.
- `ai_lab_final_report`: The final report, available in `.docx`, `.pdf`, and `.zip` (for **LaTeX**) formats.
### 3. Preprocessing Folder
- `Generated Text Data Folder`:
  - `bangladesh_small.txt`: Sample text data used in the project.
- `text_pre_processing.ipynb`: Jupyter notebook for preprocessing text data.
- `web_scraper.ipynb`: Jupyter notebook for scraping text data from the web.
### 4. Presentation Folder
- `final_presentation.pptx`: The final presentation for the project.
### 5. Project Proposal Folder
- `ai_proposal_cover`: The cover page of the project proposal in `.docx` and `.pdf` formats.
- `ai_proposal_main`: The main content of the project proposal in `.docx` and `.pdf` formats.
### 6. Reference Materials
- `reference_pdf.zip`: A collection of research papers and other reference materials for convenience.
**Note:** The `reference_pdf.zip` file contains some research papers that were downloaded in PDF format for convenience and were used for this project.
- These PDFs and materials may be subject to copyright.
- I do not own these materials nor do I have permission to distribute them.
- They are provided solely for educational purposes, to facilitate access to reference papers.
- **Please cite these sources appropriately if you use them.**
## How to Use
1. **Code Execution:**
   - The code for the project is located in the `Code` folder.
   - Use `tf_idf_raw_code.ipynb` to explore the raw implementation.
   - Use `tf_idf_built_in_function.ipynb` for a version using built-in functions.
2. **Preprocessing:**
   - The `Preprocessing` folder contains the notebooks used to clean and preprocess the text data.
   - `text_pre_processing.ipynb` handles text data cleaning.
   - `web_scraper.ipynb` is used to scrape data from web sources.
3. **Final Report:**
   - The `Final Report` folder contains the final documentation of the project.
   - You can find `output.png` and `process_flow.png` in the `Images` folder.
   - The final report is available in `.docx`, `.pdf`, and `.zip` (for **LaTeX**) formats.
4. **Presentation:**
   - The `Presentation` folder includes `final_presentation.pptx`, which summarizes the project.
5. **Project Proposal:**
   - The `Project Proposal` folder contains the proposal documents in both `.docx` and `.pdf` formats.
## Example Usage
```text
Enter size of your summary: 3
3 lines sized summary:
Sentence: Russell's viper (Daboia russelii) is responsible for nearly half of snakebites in neighboring India, but in Bangladesh, where it’s known as chandra bora, it was thought to be an exceedingly rare species for more than a century.
Sentence: Hospitals in rural Bangladesh have reported an increase in people being bitten by snakes, especially by the Russell's viper, which is found in South Asia.
Sentence: A series of stories have been making rounds on social media, of people dying in different parts of Bangladesh from the bite of the Russell's viper, a venomous snake.
```
## Contributing
Feel free to fork this repository, create a new branch, and submit pull requests.
## License
This project is open-source and available under the MIT License.