{"id":26105776,"url":"https://github.com/alex-ilgayev/secfeed","last_synced_at":"2025-04-12T19:21:50.183Z","repository":{"id":281343705,"uuid":"942036021","full_name":"alex-ilgayev/secfeed","owner":"alex-ilgayev","description":"AI-Powered Security Feed in Real Time","archived":false,"fork":false,"pushed_at":"2025-03-22T20:14:58.000Z","size":1982,"stargazers_count":19,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T21:25:09.805Z","etag":null,"topics":["ai","devsecops","golang","llm","ollama","openai","rss","security","security-feeds","threat-feeds","threat-intelligence"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alex-ilgayev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-03-03T13:20:42.000Z","updated_at":"2025-03-22T20:15:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"f15e5990-35d4-4a31-9d1d-198a2e99e495","html_url":"https://github.com/alex-ilgayev/secfeed","commit_stats":null,"previous_names":["alex-ilgayev/secfeed"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-ilgayev%2Fsecfeed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-ilgayev%2Fsecfeed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-ilgayev%2Fsecfeed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alex-ilgayev%2Fsecfeed/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alex-ilgayev","download_url":"https://codeload.github.com/alex-ilgayev/secfeed/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248618714,"owners_count":21134285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","devsecops","golang","llm","ollama","openai","rss","security","security-feeds","threat-feeds","threat-intelligence"],"created_at":"2025-03-09T21:53:58.622Z","updated_at":"2025-04-12T19:21:50.173Z","avatar_url":"https://github.com/alex-ilgayev.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SecFeed - AI-Powered Security Feed in Real Time\n\n![](./assets/secfeed-logo.png)\n\n\u003cp align=\"center\"\u003e\n \u003ca href=\"https://golang.org/doc/go1.24\" target=\"_blank\"\u003e\n \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/Go-1.24+-00ADD8.svg\"\u003e\u003c/a\u003e\n \u003ca href=\"https://opensource.org/license/apache-2-0\" target=\"_blank\"\u003e\n \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/License-Apache2-yellow.svg\"\u003e\u003c/a\u003e\n \u003ca href=\"https://github.com/alex-ilgayev/secfeed/actions/workflows/build_and_push.yml\"  target=\"_blank\"\u003e\n \u003cimg src=\"https://github.com/alex-ilgayev/secfeed/actions/workflows/build_and_push.yml/badge.svg\" alt=\"Static Badge\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nSecFeed consolidates cybersecurity news and updates from multiple feeds and then leverages cutting-edge language models to deliver 
concise, actionable summaries. Whether you're researching new vulnerabilities or staying ahead of the latest threat landscape, SecFeed provides the clarity and insights you need—fast.\n\n![demo](./assets/demo.gif)\n\n## Table of Contents\n\n- [Main Features](#main-features)\n- [Installation](#installation)\n  - [Prerequisites](#prerequisites)\n  - [From Source](#from-source)\n  - [Using Docker](#using-docker)\n- [Configuration](#configuration)\n  - [Configuration File Structure](#configuration-file-structure)\n  - [Environment Variables](#environment-variables)\n- [Usage](#usage)\n  - [Command Line Options](#command-line-options)\n  - [Example Configs](#example-configs)\n- [Architecture](#architecture)\n  - [Flow Diagram](#flow-diagram)\n  - [Core Components](#core-components)\n- [Cost Management](#cost-management)\n- [Contributing](#contributing)\n- [Roadmap](#roadmap)\n- [License](#license)\n\n## Main Features\n\n- **Intelligent filtering** of security articles using LLM-based categorization\n- **Configurable categories** to focus on specific security domains\n- Support for **multiple LLM backends** (OpenAI or Ollama)\n- **Concise summaries** of relevant security articles\n- **Extensible RSS feed system** with built-in content extraction\n- **Slack integration** for real-time notifications\n\n## Installation\n\n### Prerequisites\n\n- Go 1.24 or higher\n- OpenAI API key or local Ollama setup\n\n### From Source\n\n```bash\n# Clone the repository\ngit clone https://github.com/alex-ilgayev/secfeed.git\ncd secfeed\n\n# Build the binary\ngo build -o secfeed ./cmd/secfeed\n\n# Optional: Install to your $GOPATH/bin\ngo install ./cmd/secfeed\n```\n\n### Using Docker\n\n```bash\n# Build the Docker image\ndocker build -t secfeed .\n\n# Run with Docker\ndocker run -v $(pwd)/config.yml:/app/config.yml \\\n-e OPENAI_API_KEY \\\n-e SLACK_WEBHOOK_URL \\\nsecfeed\n\n# Run with Pre-built Docker Image\ndocker run -v $(pwd)/config.yml:/app/config.yml \\\n-e OPENAI_API_KEY \\\n-e 
SLACK_WEBHOOK_URL \\\nalexilgayev/secfeed\n```\n\n## Configuration\n\nSecFeed uses a YAML configuration file to define categories for filtering articles and RSS feeds to monitor.\n\n### Configuration File Structure\n\nThe basic configuration file and all values are found [here](./config.yml).\n\n```yaml\ninit_pull: 0\n...\n\nreporting:\n  slack: false\n  ...\n\nllm:\n  client: \"openai\"\n  ...\n\ncategories:\n- name: Software Supply Chain\n  description: \u003e\n    Articles covering software supply chain security, including best practices,\n    tools, processes, and real-world case studies. Content may discuss securing\n    dependencies, preventing supply chain attacks, and maintaining the integrity\n    of software throughout its lifecycle.\n    ...\n\nrss_feed:\n- url: https://feeds.feedburner.com/TheHackersNews\n  name: The Hacker News\n  ...\n```\n\n### Environment Variables\n\n- `OPENAI_API_KEY`: Your OpenAI API key. Required when using OpenAI.\n- `OLLAMA_BASE_URL`: Base URL for Ollama API. Required when using Ollama. 
defaults to http://localhost:11434.\n- `SLACK_WEBHOOK_URL`: Webhook URL for Slack notifications\n\n## Usage\n\n```bash\nsecfeed --config config.yml\n```\n\n### Command Line Options\n\n```\nUsage:\n  secfeed [flags]\n\nFlags:\n  -c, --config string     config file path (default \"config.yml\")\n  -d, --debug             debug mode (prints extra context with the summarized report)\n  -h, --help              help for secfeed\n  -l, --log-file string   log file path\n  -v, --verbose           verbose output\n```\n\n### Example Configs\n\n**Running all articles from the last 7 days**\n\n```yaml\ninit_pull: 7\n```\n\n**Running on OpenAI models**\n\n```yaml\nllm:\n  client: \"openai\"\n  classification:\n    engine: \"llm\"\n    model: \"gpt-4o-mini\"\n    threshold: 8\n  summary:\n    model: \"gpt-4o\"\n```\n\n**Running on local Ollama models**\n\n```yaml\nllm:\n  client: \"ollama\"\n  classification:\n    engine: \"llm\"\n    model: \"llama3.2\"\n    threshold: 7\n  summary:\n    model: \"llama3.2\"\n```\n\n**Running classification engine based on text embeddings**\n\nText embeddings-based classification is still **a work in progress**, and more optimization is needed to make the results more reliable.\n\n```yaml\nllm:\n  client: \"openai\"\n  classification:\n    engine: \"embeddings\"\n    model: \"text-embedding-3-large\"\n    threshold: 4.2\n  summary:\n    model: \"gpt-4o\"\n```\n\n## Architecture\n\nSecFeed is designed with modularity in mind, separating components into distinct packages:\n\n### Flow Diagram\n\n```\n\n    ┌─────────────┐   ┌────────────────┐   ┌──────────┐\n    │ RSS Sources ├──►│ Feed Fetcher   ├──►│ Enricher │\n    └─────────────┘   └────────────────┘   └────┬─────┘\n                                                │\n                                                ▼\n     ┌──────────┐       ┌────────────┐     ┌────────────────┐\n     │ Slack    │◄──────┤ Summary    │◄────┤ Classification │\n     └──────────┘       └──────┬─────┘     
└───────┬────────┘\n                               │                   │\n                               ▼                   ▼\n                        ┌───────────────────────────────────┐\n                        │ LLM Client (OpenAI / Ollama)      │\n                        └───────────────────────────────────┘\n\n```\n\n### Core Components\n\n1. **Feed Fetcher** ([`feed`](./pkg/feed/))\n\n   - Fetches articles from RSS feeds.\n   - Provides a stream of articles for processing.\n   - This can be extended to include additional sources, such as LinkedIn and Twitter.\n\n2. **Enricher** ([`feed.enrichArticleItem`](./pkg/feed/feed.go))\n\n   - Content is usually missing from RSS feeds.\n   - Smartly fetches the content from the blog.\n   - Adding browser-like headers to avoid being blocked.\n   - Extracts the relevant information from the HTML.\n   - Using [go-readability](https://github.com/go-shiori/go-readability) for the text cleaning task.\n\n3. **Classification Engine** ([`classification`](./pkg/classification))\n\n   - Analyzes articles for relevance using LLM or embeddings.\n   - Scores articles against user-defined categories.\n   - Filters out irrelevant content based on threshold.\n\n4. **LLM Client** ([`llm`](./pkg/llm/))\n\n   - Abstracts interaction with language model providers.\n   - Supports OpenAI API and Ollama.\n   - Handles prompt engineering, result analysis, and input chunking.\n   - Tracking costs only for OpenAI implementations.\n\n5. **Summary Engine** ([`llm.Summarize`](./pkg/llm/client.go))\n\n   - Provides a summary of the article and relevant action items.\n\n6. **Slack** ([`slack`](./pkg/slack/))\n\n   - Articles are formatted for Slack.\n   - Sends webhook notifications.\n\n### Classification Engine\n\nThere are currently two classification methods that can be configured through the `llm.classification.engine` config value.\n\n1. **LLM Classification**\n\n   - Evaluation of article relevance is based on direct LLM queries. 
Provide the categories and their descriptions to the prompt, and ask for a relevance score between 0 and 10.\n   - Provides detailed explanations for classifications.\n   - Higher accuracy but more token usage. See cost estimations [here](./README.md#cost-management).\n\n2. **Embeddings Classification**\n   - Uses vector embeddings to match articles to categories.\n   - Pre-encodes each category.\n   - Calculates the relation between each category and the article.\n   - More efficient in token usage.\n   - Results are not yet reliable. Still WIP.\n\n## Cost Management\n\n**Cost Considerations**\n\nThe project was designed with cost efficiency in mind, following these design principles:\n\n- Separate Models for Classification and Summarization: Since classification is more frequent, a less capable (cheaper) model is used for that task, while a more advanced model is reserved for summarization.\n- Strict Input Text Limits: This helps control token usage and prevent costs from skyrocketing.\n- Lightweight Infrastructure: The program can run on the smallest VM offered by most cloud providers, keeping infrastructure costs low.\n- Local Ollama Setup: Running locally avoids external service fees, such as those from OpenAI.\n\n**Cost Estimations**\n\nWhen run in verbose mode (`-v`), every OpenAI API call also logs its running total cost. This is implemented in [`openai/client.go`](./pkg/llm/openai/client.go) using a static map of per-model pricing (based on OpenAI’s rates) and token usage per call. An example log entry might look like:\n\n```bash\n2025-03-10 21:41:30 [debu] [model:gpt-4o-2024-08-06] [tokens:2004] [total_cost:0.01585945] OpenAI API CreateChatCompletion call\n```\n\nFrom a simple calculation, with 33 feeds, 8 categories, ~20 articles per day, and 7 being relevant, the daily cost comes to about $0.00296, or $2.13 per month. 
Reducing the number of categories or making their descriptions shorter can easily cut costs by an additional 30–40%.\n\n## Contributing\n\nContributions are welcome! See the [Roadmap](README.md#roadmap) in the README for planned features. Please feel free to submit pull requests or open issues for bugs and feature requests.\n\n## Roadmap\n\n- Currently we aren't verifying the config object created in [config package](./pkg/config/config.go). Need to add basic verification, and default values for optional fields.\n- Support Reddit feeds. Although each Reddit channel has a dedicated `.rss` endpoint that provides an RSS feed, most of the content is redirected. So it doesn't make sense to parse the content we get from the RSS feed, and currently our URL fetcher fails to get the content directly by querying Reddit through a GET request.\n- More tests for all the components.\n- Add option for a non-RSS feed based on LinkedIn or Twitter. Should receive hashtags in configuration, and query these for new content.\n- Add ability to extract a redirected URL within article contents that contains the main content. For example, Reddit feeds, or articles that just repost an existing article. Maybe can be implemented by providing the LLM a tool that, when it gets a URL that is important, queries it and gets the data from it.\n- Add special case for [tl;dr sec](https://tldrsec.com/). It supports an RSS feed, but contains many articles within a single item, so it should get special treatment.\n- Once we find the source URL from articles, we may do some caching for when several articles repost the same originating article.\n- Ability to bypass Cloudflare when fetching URLs - needed for some feeds. 
Example in [`config.yml`](./config.yml)\n- Mature Embedding offering\n    - Automatically generate similar words for improving the precision\n- Handle rate limiting when fetching feeds\n- Create a sanity CI flow to allow checking basic functionality in pull requests.\n- More RSS feeds\n\n## License\n\n[Apache-2.0 License](./LICENSE)\n\n---\n\nBuilt with ❤️ for the security community\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falex-ilgayev%2Fsecfeed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falex-ilgayev%2Fsecfeed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falex-ilgayev%2Fsecfeed/lists"}