{"id":33057159,"url":"https://github.com/boseongkang/newstrend","last_synced_at":"2026-05-31T06:01:57.895Z","repository":{"id":310415382,"uuid":"1039751205","full_name":"boseongkang/newstrend","owner":"boseongkang","description":"Self-correcting 5-pillar financial AI with continuous improvement cycle.  Production: boseongkang.github.io/newstrend","archived":false,"fork":false,"pushed_at":"2026-05-29T05:26:25.000Z","size":362003,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-29T06:24:22.623Z","etag":null,"topics":["financial-ai","finbert","quantitative-finance","self-correcting","sentiment-analysis","stock-price-prediction"],"latest_commit_sha":null,"homepage":"https://boseongkang.github.io/newstrend/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boseongkang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":"audit_report.md","citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-17T22:55:56.000Z","updated_at":"2026-05-29T04:57:51.000Z","dependencies_parsed_at":"2025-08-18T01:09:07.408Z","dependency_job_id":"59b207f4-ce9e-4e20-b28d-44b7ca347f1a","html_url":"https://github.com/boseongkang/newstrend","commit_stats":null,"previous_names":["boseongkang/newstrend"],"tags_count":237,"template":false,"template_full_name":null,"purl":"pkg:github/boseongkang/newstrend","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boseongkang%2Fnewstrend","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boseongkang%2Fnewstrend/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boseongkang%2Fnewstrend/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boseongkang%2Fnewstrend/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boseongkang","download_url":"https://codeload.github.com/boseongkang/newstrend/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boseongkang%2Fnewstrend/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33720897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["financial-ai","finbert","quantitative-finance","self-correcting","sentiment-analysis","stock-price-prediction"],"created_at":"2025-11-14T04:01:54.518Z","updated_at":"2026-05-31T06:01:57.859Z","avatar_url":"https://github.com/boseongkang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# news trend  \n\n\nA Python-first, package-style starter to *ingest daily US news, deduplicate them, and analyze trends*.  
\nIt now also includes **quick view \u0026 daily report generation**.\n\n---\n\n## Daily FinBERT routine (local, M3-MPS)\n\nGitHub Actions handles raw-news collection and the trend-site build, but FinBERT\nsentiment scoring runs locally: on an Apple M3 using the MPS (Metal Performance Shaders) backend it is ~52x faster\nthan on the free-tier CI CPU, and the 60-day backfill that would have timed out on CI finishes in ~25 min locally.\n\n```bash\n# wrapper (auto-runs daily at 09:00 via launchd):\nbin/finbert-daily                          # default 60-day window\nWINDOW_DAYS=90 bin/finbert-daily           # custom window\n\n# direct invocation:\npython scripts/sentiment_finbert_local.py --window-days 60 --commit\n```\n\nOn the first run only, pass `--setup` to create the `/tmp/newstrend-cache` worktree.\nOn subsequent runs, the script auto-detects which days are missing (typically\njust today) and scores only those, which usually takes a few seconds. The push lands on\nthe `data-cache` branch, from which `trend-site.yml` restores it on each CI run.\n\nThe launchd plist at `~/Library/LaunchAgents/com.newstrend.finbert.plist`\nruns `bin/finbert-daily` every day at 09:00; logs go to\n`~/Library/Logs/newstrend-finbert.log`. To trigger a run manually:\n`launchctl kickstart gui/$(id -u)/com.newstrend.finbert`.\n\n---\n\n## Quickstart\n\n```bash\n# Create a virtual environment \u0026 install dependencies\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e .\npip install pydantic\npip install timedelta\n\n# Copy the env template \u0026 set NEWSAPI_KEY (optional)\ncp .env.example .env\n# edit .env and add: NEWSAPI_KEY=your_api_key\n```\n\n## Ingest today's news (RSS, plus NewsAPI if a key is set)\n`newscli ingest --country us --rss --newsapi`\n\n## Deduplicate today's file into the silver dataset\n`newscli dedup --date today`\n\n### Example: inspect one day's raw NewsAPI data\n`python src/news_trend/quickview.py --date YYYY-MM-DD --indir data --kind raw_newsapi --top 30 --min-len 3`\n\nThis shows:\n\n- total articles\n- top publishers\n- top words \u0026 bigrams\n- sample articles\n\n### Example: generate a report from deduplicated (silver) data\n`python src/news_trend/report.py --date YYYY-MM-DD --indir data --kind silver_newsapi --outdir reports`\n\n### Cron example (daily automation)\nThe entry below runs the whole pipeline every day at 08:05 and appends all of its output to `logs.txt`:\n\n`5 8 * * * cd /path/to/news-trend-python-starter \u0026\u0026 { . .venv/bin/activate \u0026\u0026 newscli ingest --country us --rss --newsapi \u0026\u0026 newscli dedup --date today \u0026\u0026 python src/news_trend/report.py --date $(date +\\%F) --indir data --kind silver_newsapi --outdir reports; } \u003e\u003e logs.txt 2\u003e\u00261`\n\n## Structure\n- `src/news_trend/`: Python package (ingest, dedup, quickview, report, utils)\n- `data/raw/YYYY-MM-DD.jsonl`: raw ingested articles\n- `data/silver/YYYY-MM-DD.jsonl`: cleaned \u0026 deduplicated articles\n- `reports/YYYY-MM-DD.md`: generated daily reports\n\n## 1. Ingest\n`newscli ingest --country us --rss --newsapi`\n\n## 2. Deduplicate\n`newscli dedup --date today`\n\n## 3. Quickview on raw data\n`python src/news_trend/quickview.py --date today --indir data --kind raw_newsapi`\n\n## 4. Generate report from deduplicated (silver) data\n`python src/news_trend/report.py --date today --indir data --kind silver_newsapi --outdir reports`\n\n## 08/19 update\n- Time-sliced NewsAPI ingest\n- Daily HTML report\n\n## Commands (daily pipeline)\n\n```bash\n# 1) Ingest yesterday's news (time slicing is handled inside newscli or your own ingest script)\nnewscli ingest --newsapi --date yesterday\n\n# 2) Report\npython src/news_trend/report.py --date yesterday --indir data --kind raw --outdir reports --top 30\n# python src/news_trend/report.py --date yesterday --indir data --kind silver_newsapi --outdir reports --top 30\n```\n\n## 08/22 update\n### Continuous live collection using GitHub Actions (NewsAPI)\nThis repo includes a workflow that runs every 30 minutes, collects news articles, and commits them to the repo as newline-delimited JSON.\nFiles are written under `data/live_newsapi/` with names like `YYYY-MM-DDTHH-MMZ.jsonl`.\n\n### Verify\nIn the **Actions** tab, open the **collect-live** workflow and click the most recent run. The run log shows a line like:\n\n`[LIVE] NewsAPI -\u003e data/live_newsapi/2025-08-22T21-39Z.jsonl (n rows)`\n\n## 08/30 update\n### Word Trends (cumulative)\nGenerate “top words” and 14-day trends from the deduplicated warehouse:\n\n```bash\npython scripts/viz_words.py \\\n  --master data/warehouse/master.jsonl \\\n  --outdir reports/words \\\n  --top 30 \\\n  --days 14 \\\n  --min-len 3 \\\n  --drop-content \\\n  --extra-stop \"chars,nbsp,amp,apos,mdash,ndash,inc,com,report,reports,shares\"\n```\n\n### What the command does\n\n- **Input**: `data/warehouse/master.jsonl` (all deduplicated articles).\n- **Window**: keeps only the most recent `--days` days (default: 14).\n- **Text selection**: with `--drop-content`, only the title and description are used and the article body is ignored. This reduces boilerplate and noise that appear in article bodies and surfaces headline topics.\n- **Normalization**: lowercasing, basic cleaning, tokenization.\n- **Filtering**:\n  - `--min-len`: drop tokens shorter than N characters (e.g., 3).\n  - Stopwords: the built-in list plus `--extra-stop` (comma-separated, case-insensitive).\n- **Outputs** (written to `--outdir`, e.g., `reports/words/`):\n  - `top_words.png`: bar chart of the overall top N words.\n  - `top_words_trend.png`: line chart of daily counts for those words over the last N days.\n  - `top_words.csv`: total counts.\n  - `top_words_trend.csv`: daily counts per word.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboseongkang%2Fnewstrend","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboseongkang%2Fnewstrend","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboseongkang%2Fnewstrend/lists"}