https://github.com/mariosieg/krauler
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/mariosieg/krauler
- Owner: MarioSieg
- Created: 2021-04-19T13:45:06.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-12-22T14:14:52.000Z (over 4 years ago)
- Last Synced: 2025-06-14T22:43:49.037Z (about 1 year ago)
- Language: C#
- Size: 8.5 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Krauler
Web crawling for several websites using publicly available proxies.
## Required Drivers
- [Mozilla Geckodriver](https://github.com/mozilla/geckodriver/releases) in `C:\git\krauler\Krauler\bin\Debug\net5.0`
- [Google Chromedriver](https://chromedriver.chromium.org/downloads) in `C:\git\krauler\Krauler\bin\Debug\net5.0`
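
A quick way to confirm that both drivers are in place before starting a crawl is to check the application's output directory for them. The following is only a minimal sketch and not part of Krauler itself: the class name `DriverCheck` and the method `FindMissing` are hypothetical, and the file names `geckodriver.exe` and `chromedriver.exe` assume the Windows builds of the drivers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not taken from the Krauler sources.
public static class DriverCheck
{
    // Assumed executable names for the Windows builds of each driver.
    public static readonly string[] DriverFiles = { "geckodriver.exe", "chromedriver.exe" };

    // Returns the names of the drivers that are not present in the given directory.
    public static List<string> FindMissing(string directory)
    {
        var missing = new List<string>();
        foreach (var name in DriverFiles)
        {
            if (!File.Exists(Path.Combine(directory, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }

    public static void Main()
    {
        // AppContext.BaseDirectory is the folder the application runs from,
        // which for a Debug build is the bin\Debug\net5.0 directory.
        var missing = FindMissing(AppContext.BaseDirectory);
        Console.WriteLine(missing.Count == 0
            ? "All drivers found."
            : "Missing drivers: " + string.Join(", ", missing));
    }
}
```

Checking `AppContext.BaseDirectory` rather than a hardcoded path keeps the check valid when the repository is cloned somewhere other than `C:\git\krauler`.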
## Tested Crawlers
- YouTube
- Google
## Folder Structure
- `/Config`: Selenium config files per crawler
- `/Resources`: Hardcoded definitions for proxies and user agents
- `/Logs`: The logs are stored here
- `/Crawlers`: The crawlers are implemented here