https://github.com/danielstankw/wordwatchsummary

Find, scrape, summarize (LLM), and deliver!

Topics: langchain, llm, ollama, python, scrapper, summarization

# "WordWatch Summary"
The goal of this Python application is to streamline the process of finding, summarizing, and sharing news articles that mention a specific word of interest. It works in four stages (a rough code sketch follows the list):

1. **Search and Scrape:** The application searches for the specified word on Google, identifying occurrences in various articles, and collects the corresponding URLs.
2. **Summarization:** Using a chosen large language model (LLM), the application generates concise summaries of the scraped articles.
3. **Editing:** The summaries are reviewed and edited for clarity and coherence.
4. **Email Delivery:** The edited summaries are embedded into an email and sent to the user via SMTP.
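
The four stages above could be wired together roughly as in the sketch below. It is purely structural: the function names are hypothetical placeholders, not the repository's actual API.

```py
# Structural sketch only -- these helpers are hypothetical, not the repo's real module layout.

def find_article_urls(keyword: str) -> list[str]:
    """Search Google for the keyword and return URLs of matching articles."""
    ...

def scrape_article(url: str) -> str:
    """Download an article and return its plain text."""
    ...

def summarize(text: str) -> str:
    """Ask the local LLM (e.g. gemma:2b via Ollama) for a short summary."""
    ...

def edit(summary: str) -> str:
    """Polish the raw summary for clarity and coherence."""
    ...

def send_email(summaries: dict[str, str]) -> None:
    """Embed the edited summaries in an HTML email and send it over SMTP."""
    ...

def run(keyword: str) -> None:
    urls = find_article_urls(keyword)
    summaries = {url: edit(summarize(scrape_article(url))) for url in urls}
    send_email(summaries)
```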

*An example of the generated email:*


This application is a proof of concept and can be easily expanded.

## Installation
**a. Ollama (Local LLM)** - for details, see this [video](https://www.youtube.com/watch?v=ZoxJcPkjirs&t=549s&ab_channel=MattWilliams)

Summarization is handled by the [Ollama](https://github.com/ollama/ollama) framework together with the `gemma:2b` model.

```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

To stop or start the container:
```bash
docker stop ollama
docker start ollama
```

To run the `gemma:2b` model (see the [list of models](https://github.com/ollama/ollama?tab=readme-ov-file#model-library) for other options):
```bash
docker exec ollama ollama run gemma:2b
```
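
Once the model is running, it is served over Ollama's REST API on the mapped port 11434. A quick sanity check from Python (the application itself may call the model through a different client, e.g. LangChain):

```py
import requests

article_text = "..."  # text scraped from one of the articles

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma:2b",
        "prompt": f"Summarize the following article in a few sentences:\n\n{article_text}",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated summary
```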

**b. Set up local SMTP (Gmail)**

Link to the [YouTube video](https://www.youtube.com/watch?v=g_j6ILT-X0k&t=592s&ab_channel=ThePyCoach)
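
With a Gmail app password in place, sending the HTML digest comes down to standard `smtplib` usage, roughly as below. The environment variable names are hypothetical -- use whatever `.env.example` defines.

```py
import os
import smtplib
from email.message import EmailMessage

# Hypothetical variable names -- check .env.example for the ones the project actually expects.
sender = os.environ["EMAIL_ADDRESS"]
app_password = os.environ["EMAIL_APP_PASSWORD"]  # Gmail app password, not the account password
recipient = os.environ["EMAIL_RECIPIENT"]

msg = EmailMessage()
msg["Subject"] = "WordWatch Summary"
msg["From"] = sender
msg["To"] = recipient
msg.set_content("Your mail client does not support HTML.")          # plain-text fallback
msg.add_alternative("<h2>Summaries</h2><p>...</p>", subtype="html")  # embedded summaries

with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
    smtp.login(sender, app_password)
    smtp.send_message(msg)
```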

## How to run?

Create a virtual environment and install the dependencies from `requirements.txt`.
Define the environment variables in `.env` (see `.env.example` for the required ones), then adjust `main.py` (e.g. the search settings shown below) and run the script.
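
If the project reads `.env` with `python-dotenv` (a common choice, not verified here), loading the variables looks like this; the variable name is hypothetical:

```py
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # copies the key=value pairs from .env into the process environment
sender = os.getenv("EMAIL_ADDRESS")  # hypothetical name -- see .env.example
```

The search is then configured with settings like the following: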

```py
# Find pages with:
# all these words:
search.set_all_words("")
# this exact word or phrase:
search.set_exact_phrase("Johny Bravo")
# any of these words:
search.set_any_words("")
# none of these words:
search.set_none_words("")
# numbers ranging from:
search.set_number_range("", "")
# Then narrow the results by:
# language
search.set_language("en")
# region
search.set_region("")
# last update
# d - day, w - week, m - month, y - year, all - anytime
search.set_last_update("w")
# site or domain
search.set_site_or_domain("")
# terms appearing
search.set_terms_appearing("")
# file type
search.set_file_type("")
# usage rights
search.set_usage_rights("")
```

The fields are the same as in Google's Advanced Search form.

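For reference, the fields translate into ordinary Google query operators roughly as sketched below. This is only an illustration of what the settings mean, not how the underlying library actually builds its requests.

```py
def build_query(exact_phrase="", none_words="", site="", file_type=""):
    """Illustrative only: map a few advanced-search fields onto plain query operators."""
    parts = []
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')                     # this exact word or phrase
    parts.extend(f"-{word}" for word in none_words.split())   # none of these words
    if site:
        parts.append(f"site:{site}")                          # site or domain
    if file_type:
        parts.append(f"filetype:{file_type}")                 # file type
    return " ".join(parts)

print(build_query(exact_phrase="Johny Bravo", site="example.com"))
# "Johny Bravo" site:example.com
```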