https://github.com/harrydulaney/news-feed-scraper
Configurable and schedulable web scraping tool. Used to extract raw article content and metadata for aggregated news feeds.
content-extraction java-web-scraper news-feed news-feed-provider newsscraper scraper scraperapi web-automation webscraper
Last synced: over 1 year ago
- Host: GitHub
- URL: https://github.com/harrydulaney/news-feed-scraper
- Owner: HarryDulaney
- Created: 2022-12-26T20:00:09.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-01-02T02:55:33.000Z (over 3 years ago)
- Last Synced: 2025-01-04T20:47:40.526Z (over 1 year ago)
- Topics: content-extraction, java-web-scraper, news-feed, news-feed-provider, newsscraper, scraper, scraperapi, web-automation, webscraper
- Language: Java
- Homepage:
- Size: 6.57 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: ReadMe.md
README
# News-Feed-Scraper
### This is a web crawler, scraper, and extractor used internally at SessionApi for extracting financial news articles which are subsequently aggregated on some of our web products. See
### Configure what to crawl and what to extract
### Execute on Schedule
To schedule the crawler to run automatically, define a cron expression in your application.yml.
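
This README does not show the property name or the cron dialect the crawler expects. The `application.yml` file suggests a Spring Boot application, whose cron expressions have six fields (second, minute, hour, day of month, month, day of week), but that is an assumption. As a rough sketch of how such an expression reads, the hypothetical `CronSketch` class below checks whether a point in time matches a six-field expression. It is not part of this project and handles only `*`, `?`, `*/n`, ranges, and comma lists with numeric values (day of week 0-6, Sunday = 0).

```java
import java.time.LocalDateTime;

// Hypothetical helper for illustration only; not part of news-feed-scraper.
final class CronSketch {

    // Returns true if the six-field expression
    // (second minute hour day-of-month month day-of-week) matches the time.
    static boolean matches(String cron, LocalDateTime time) {
        String[] fields = cron.trim().split("\\s+");
        if (fields.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + cron);
        }
        // java.time uses Monday = 1 ... Sunday = 7; map to Sunday = 0 ... Saturday = 6.
        int dayOfWeek = time.getDayOfWeek().getValue() % 7;
        return fieldMatches(fields[0], time.getSecond(), 0)
            && fieldMatches(fields[1], time.getMinute(), 0)
            && fieldMatches(fields[2], time.getHour(), 0)
            && fieldMatches(fields[3], time.getDayOfMonth(), 1)
            && fieldMatches(fields[4], time.getMonthValue(), 1)
            && fieldMatches(fields[5], dayOfWeek, 0);
    }

    // Checks one field; min is the smallest legal value, used as the origin for */n steps.
    static boolean fieldMatches(String field, int value, int min) {
        for (String part : field.split(",")) {
            if (part.equals("*") || part.equals("?")) {
                return true;
            }
            if (part.startsWith("*/")) {
                int step = Integer.parseInt(part.substring(2));
                if ((value - min) % step == 0) {
                    return true;
                }
            } else if (part.contains("-")) {
                String[] range = part.split("-");
                int low = Integer.parseInt(range[0]);
                int high = Integer.parseInt(range[1]);
                if (value >= low && value <= high) {
                    return true;
                }
            } else if (Integer.parseInt(part) == value) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, `0 0 */6 * * *` would fire at 00:00, 06:00, 12:00, and 18:00 every day, and `0 30 9 * * 1-5` would fire at 09:30 on weekdays. If the project does use Spring scheduling, check the expression against Spring's own cron documentation rather than this sketch.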