{"id":49563380,"url":"https://github.com/devsenweb/ai-news-aggregator-app","last_synced_at":"2026-05-03T10:47:41.307Z","repository":{"id":293148655,"uuid":"982992752","full_name":"devsenweb/ai-news-aggregator-app","owner":"devsenweb","description":"A Python-powered news aggregation system that collects articles from RSS feeds, classifies them into topics using semantic similarity, summarizes content, deduplicates similar articles, and stores everything in Firebase Firestore with timeline organization.","archived":false,"fork":false,"pushed_at":"2025-05-13T22:10:14.000Z","size":19,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-03T10:47:33.915Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devsenweb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-13T17:59:01.000Z","updated_at":"2025-05-13T22:10:17.000Z","dependencies_parsed_at":"2025-05-14T00:09:02.806Z","dependency_job_id":"e79c2c51-b54b-4d67-94b8-f26b059c55ba","html_url":"https://github.com/devsenweb/ai-news-aggregator-app","commit_stats":null,"previous_names":["devsenweb/ai-news-aggregator-app"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/devsenweb/ai-news-aggregator-app","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devsenweb%2Fai-news-aggregator-app","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devsenweb%2Fai-news-aggregator-app/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devsenweb%2Fai-news-aggregator-app/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devsenweb%2Fai-news-aggregator-app/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devsenweb","download_url":"https://codeload.github.com/devsenweb/ai-news-aggregator-app/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devsenweb%2Fai-news-aggregator-app/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32566444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-03T06:36:36.687Z","status":"ssl_error","status_checked_at":"2026-05-03T06:36:09.306Z","response_time":103,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-03T10:47:40.471Z","updated_at":"2026-05-03T10:47:41.290Z","avatar_url":"https://github.com/devsenweb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI News Aggregator\n\nA Python-based news aggregation system that fetches, classifies, and organizes news articles into topic-based timelines, storing them in Firebase Firestore.\n\n## Features\n\n- Fetches news articles from multiple RSS feeds\n- Classifies articles into topics using semantic similarity\n- Removes duplicate or near-duplicate articles\n- Generates concise summaries of articles\n- Organizes articles into chronological timelines by topic\n- Stores results in Firebase Firestore with a structured schema\n- Command-line interface for easy execution\n\n## Installation\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/yourusername/ai-news-aggregator.git\n   cd ai-news-aggregator\n   ```\n\n2. Create and activate a virtual environment:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n\n3. Install the dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n4. Set up environment variables:\n   - Copy `.env.example` to `.env`\n   - Update the values in `.env` with your Firebase credentials and other settings\n\n## Firebase Setup\n\n1. Create a new Firebase project at [Firebase Console](https://console.firebase.google.com/)\n2. Enable Firestore Database\n3. Go to Project Settings \u003e Service Accounts\n4. Generate a new private key and save it as `firebase-credentials.json` in the project root\n5. Update the `FIREBASE_CREDENTIALS_PATH` in `.env` to point to this file\n6. Get your Firebase database URL from Project Settings \u003e General \u003e Your Apps \u003e Firebase SDK snippet\n\n## Usage\n\n### Basic Usage\n\n```bash\npython -m ai_news_aggregator.cli run \\\n  --rss-feeds \"https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml,http://feeds.bbci.co.uk/news/rss.xml\" \\\n  --firebase-credentials path/to/your/firebase-credentials.json \\\n  --firebase-db-url \"https://your-project-id.firebaseio.com\"\n```\n\n### Command Line Options\n\n```\nOptions:\n  --rss-feeds TEXT          Comma-separated list of RSS feed URLs  [required]\n  --firebase-credentials PATH\n                           Path to Firebase credentials JSON file  [required]\n  --firebase-db-url TEXT    Firebase database URL  [required]\n  --max-articles INTEGER    Maximum number of articles to process  [default: 50]\n  --similarity-threshold FLOAT\n                           Similarity threshold for topic classification (0.0 to 1.0)  [default: 0.75]\n  --dedupe-threshold FLOAT  Similarity threshold for deduplication (0.0 to 1.0)  [default: 0.85]\n  --summary-length INTEGER  Maximum length of article summaries  [default: 150]\n  --dry-run                 Process articles but do not upload to Firebase  [default: False]\n  --help                   Show this message and exit.\n```\n\n### Dry Run\n\nTo test the aggregator without uploading to Firebase:\n\n```bash\npython -m ai_news_aggregator.cli run \\\n  --rss-feeds \"https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml\" \\\n  --firebase-credentials path/to/your/firebase-credentials.json \\\n  --firebase-db-url \"https://your-project-id.firebaseio.com\" \\\n  --dry-run\n```\n\n## Project Structure\n\n```\nai-news-aggregator/\n├── ai_news_aggregator/\n│   ├── __init__.py\n│   ├── cli.py               # Command-line interface\n│   ├── deduplicator.py      # Article deduplication logic\n│   ├── firebase_service.py  # Firebase Firestore interactions\n│   ├── news_fetcher.py      # RSS feed fetching and parsing\n│   ├── summarizer.py        # Article summarization\n│   └── topic_classifier.py  # Topic classification\n├── tests/                   # Unit tests\n├── .env.example            # Example environment variables\n├── .gitignore\n├── README.md\n└── requirements.txt        # Python dependencies\n```\n\n## Configuration\n\nEdit the `.env` file to configure the application:\n\n- `FIREBASE_CREDENTIALS_PATH`: Path to your Firebase service account JSON file\n- `FIREBASE_DATABASE_URL`: Your Firebase database URL\n- `RSS_FEEDS`: Comma-separated list of RSS feed URLs\n- `SIMILARITY_THRESHOLD`: Threshold for topic classification (0.0 to 1.0)\n- `SUMMARY_MAX_LENGTH`: Maximum length of generated summaries\n\n## License\n\nMIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevsenweb%2Fai-news-aggregator-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevsenweb%2Fai-news-aggregator-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevsenweb%2Fai-news-aggregator-app/lists"}