https://github.com/bebofekry/octobot-smart-summarizer

# Octobot - Smart Summarizer

Scraping and summarizing text from PDF and plain-text files, web pages, LinkedIn posts, and YouTube videos.

**Try the project**: [octobot.streamlit.app](https://octobot.streamlit.app/)

## Key Points
- Scraping web pages to extract their text content.
- Scraping YouTube video links to retrieve their subtitles in Arabic or English.
- Extracting text content from uploaded files (PDF and plain text).
- Summarizing the extracted text with a chatbot that keeps message history, built on an LLM (Google Gemini) through LangChain; summaries focus on the important points and include a Q&A section.
- The chatbot can chat with users, summarize text messages, answer questions about the summarized content, and guide users step by step through using the web page.
- A user-friendly graphical interface built with Streamlit.
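The inputs listed above (web pages, YouTube links, PDF and text files) each need a different extractor, so the app has to decide which one applies. The sketch below shows one way to do that routing and to pull the video ID out of common YouTube URL shapes, since `youtube_transcript_api` looks transcripts up by video ID (the exact fetch method differs between package versions). The function names `youtube_video_id` and `classify_source` are hypothetical, and the sketch uses only `urllib.parse` rather than the `validators` package the project lists, so it is not necessarily how Octobot does it.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Hosts treated as YouTube (assumption: covers the common desktop, mobile, and short-link forms).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, else None (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment: youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        return parse_qs(parsed.query).get("v", [None])[0]
    if parsed.path.startswith(("/shorts/", "/embed/")):
        # /shorts/<id> and /embed/<id> split into ['', 'shorts', '<id>', ...]
        parts = parsed.path.split("/")
        return parts[2] if len(parts) > 2 and parts[2] else None
    return None


def classify_source(source: str) -> str:
    """Classify an input as 'youtube', 'web', 'pdf', or 'text' (hypothetical routing rule)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Any http(s) URL without a recognizable video ID is treated as a web page.
        return "youtube" if youtube_video_id(source) else "web"
    if source.lower().endswith(".pdf"):
        return "pdf"
    return "text"
```

For example, `classify_source("https://youtu.be/dQw4w9WgXcQ")` returns `"youtube"`, and the extracted ID would then be handed to the transcript fetcher with Arabic and English as the preferred languages.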
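The summarizing chatbot described above keeps message history and is asked to focus on the important points and add a Q&A section. A minimal sketch of that bookkeeping, assuming a bounded history and a hypothetical `SYSTEM_PROMPT` wording (the README does not publish Octobot's actual prompt), could look like this:

```python
from collections import deque
from typing import List, Tuple

# Hypothetical system instruction approximating the behaviour described in Key Points.
SYSTEM_PROMPT = (
    "You are Octobot, a summarization assistant. Summarize the provided content, "
    "focusing on the most important points, then add a short Q&A section. "
    "Answer follow-up questions using the summarized content."
)


class ChatHistory:
    """Keeps the most recent user/assistant exchanges (hypothetical helper)."""

    def __init__(self, max_turns: int = 5) -> None:
        # A deque with maxlen drops the oldest turn automatically once full,
        # which keeps the prompt from growing without bound.
        self.turns: deque = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def to_messages(self, new_user_message: str) -> List[Tuple[str, str]]:
        """Build (role, content) pairs: system prompt, past turns, then the new message."""
        messages = [("system", SYSTEM_PROMPT)]
        for user, assistant in self.turns:
            messages.append(("human", user))
            messages.append(("ai", assistant))
        messages.append(("human", new_user_message))
        return messages
```

LangChain chat models generally accept a list of `(role, content)` tuples like this in `invoke`, so the result could be passed to, for example, `ChatGoogleGenerativeAI(model="gemini-2.5-flash")` from the `langchain-google-genai` package; whether Octobot wires it up this way is an assumption.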

---

## 🧠 Tech Stack

- **Graphical Interface**: Streamlit
- **Backend**: Python
- **LLM**: Google Gemini (gemini-2.5-flash)
- **Other Libraries**: `langchain`, `sentence_transformers`, Beautiful Soup (`bs4`), `youtube_transcript_api`, `validators`.
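Beautiful Soup is listed for web scraping; its job here is to turn a fetched HTML page into plain text the LLM can summarize. To keep the example dependency-free, the sketch below swaps in the standard library's `html.parser` and skips `script`, `style`, and `noscript` content. The names `VisibleTextParser` and `html_to_text` are hypothetical, and fetching the page (e.g. with `urllib.request` or `requests`) is left out.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside <script>, <style>, or <noscript> (hypothetical helper)."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names are already lowercased by HTMLParser.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside the skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text run per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.chunks)
```

With Beautiful Soup itself, a comparable result is usually obtained by calling `decompose()` on the `script` and `style` tags and then `soup.get_text(separator="\n", strip=True)`.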

---

## 🚀 Getting Started

### 🔧 Prerequisites

Install required packages:
`pip install -r requirements.txt`

Run the app:
`streamlit run app.py`

---

## Demo
🎬 [Watch the Demo Video](https://drive.google.com/file/d/114VXBHbowapdFN8XHSlGwLOKbcWkFiJ9/view?usp=sharing)

---

## Contact

Developed by Abdallah Fekry

📧 abdallahfekry95@gmail.com

🌐 [LinkedIn](https://www.linkedin.com/in/abdallah-fekry) | [GitHub](https://github.com/BeboFekry?tab=repositories)