Projects in Awesome Lists tagged with scrapy

https://github.com/crawlab-team/crawlab

Distributed web crawler admin platform for managing spiders, regardless of language or framework.

crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider

Last synced: 14 May 2025

https://github.com/lining0806/pythonspidernotes

The essentials of getting started with web crawlers in Python

captcha cookie python scrapy selenium wechat zhihu

Last synced: 14 May 2025

https://github.com/lining0806/PythonSpiderNotes

The essentials of getting started with web crawlers in Python

captcha cookie python scrapy selenium wechat zhihu

Last synced: 29 Apr 2025

https://github.com/Chyroc/WechatSogou

A crawler interface for WeChat official accounts, based on Sogou WeChat search

crawler pypi python scrapy sogou wechat

Last synced: 15 May 2025

https://github.com/chyroc/wechatsogou

A crawler interface for WeChat official accounts, based on Sogou WeChat search

crawler pypi python scrapy sogou wechat

Last synced: 13 May 2025

https://github.com/chyroc/WechatSogou

A crawler interface for WeChat official accounts, based on Sogou WeChat search

crawler pypi python scrapy sogou wechat

Last synced: 28 Mar 2025

https://github.com/rmax/scrapy-redis

Redis-based components for Scrapy.

crawler distributed redis scrapy

Last synced: 18 Dec 2025

https://github.com/SpiderClub/haipproxy

:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis

crawler distributed high-availability ipproxy redis scheduler scrapy spider

Last synced: 26 Mar 2025

https://github.com/spiderclub/haipproxy

:sparkling_heart: Highly available distributed IP proxy pool, powered by Scrapy and Redis

crawler distributed high-availability ipproxy redis scheduler scrapy spider

Last synced: 14 May 2025

https://github.com/dropsdevopsorg/ecommercecrawlers

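Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Baotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban movie reviews, Ctrip, Xiaomi App Store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project: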

alitask baidu baidu-tieba baotu boss crawler ctrip dazhong-spider douban-movie douban-music fofa lagou python3 quanjing scrapy sohu taobao-spider wechat xianyu zhilianzhaopin

Last synced: 18 Oct 2025

https://github.com/DropsDevopsOrg/ECommerceCrawlers

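Hands-on 🐍 crawlers for a variety of websites and e-commerce data 🕷. Includes 🕸: Taobao products, WeChat official accounts, Dianping, Qichacha, job sites, Xianyu, Ali tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, Baotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, FOFA asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directories, Toutiao, Douban movie reviews, Ctrip, Xiaomi App Store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project: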

alitask baidu baidu-tieba baotu boss crawler ctrip dazhong-spider douban-movie douban-music fofa lagou python3 quanjing scrapy sohu taobao-spider wechat xianyu zhilianzhaopin

Last synced: 14 Mar 2025

https://github.com/nghuyong/weibospider

An actively maintained Sina Weibo collection tool 🚀🚀🚀

python scrapy weibo weibospider

Last synced: 10 Apr 2025

https://github.com/nghuyong/WeiboSpider

An actively maintained Sina Weibo collection tool 🚀🚀🚀

python scrapy weibo weibospider

Last synced: 26 Mar 2025

https://github.com/gerapy/gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

dashboard distributed django docker gerapy scrapy scrapyd spider vue vuejs webspider

Last synced: 12 May 2025

https://github.com/Gerapy/Gerapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

dashboard distributed django docker gerapy scrapy scrapyd spider vue vuejs webspider

Last synced: 14 Mar 2025

https://github.com/my8100/scrapydweb

Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI. Docs :point_right:

dashboard log-analysis log-parsing scrapy scrapy-log-analysis scrapy-visualization scrapyd scrapyd-admin scrapyd-api scrapyd-cluster-management scrapyd-control scrapyd-keeper scrapyd-log-analysis scrapyd-manage scrapyd-monitor scrapyd-ui scrapyd-visualization spider

Last synced: 23 Apr 2025

https://github.com/boris-code/feapder

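🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It has four built-in spider types (AirSpider, Spider, TaskSpider, BatchSpider) for different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The feaplat crawler management system provides convenient deployment and scheduling for it.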

crawler feapder feaplat python scrapy spider

Last synced: 14 May 2025

https://github.com/scrapy-plugins/scrapy-splash

Scrapy+Splash for JavaScript integration

headless-browsers scrapy

Last synced: 17 Dec 2025

https://github.com/wkunzhi/python3-spider

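Hands-on Python crawlers: simulated logins for major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️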

crawl crawler dianping geek meituan pyppeteer python scrapy scrapy-crawler selenium spider splash taobao

Last synced: 15 May 2025

https://github.com/luckyzxl2016/movie_recommend

A Spark-based movie recommendation system, comprising a crawler project, a website, an admin back end, and the Spark recommender

hadoop hive mysql nginx scala scrapy spark-mllib spark-streaming ssm-maven

Last synced: 15 May 2025

https://github.com/LuckyZXL2016/Movie_Recommend

A Spark-based movie recommendation system, comprising a crawler project, a website, an admin back end, and the Spark recommender

hadoop hive mysql nginx scala scrapy spark-mllib spark-streaming ssm-maven

Last synced: 26 Mar 2025

https://github.com/dormymo/spiderkeeper

admin ui for scrapy/open source scrapinghub

dashboard scrapy scrapy-ui scrapyd scrapyd-dashboard scrapyd-ui spider

Last synced: 14 May 2025

https://github.com/DormyMo/SpiderKeeper

admin ui for scrapy/open source scrapinghub

dashboard scrapy scrapy-ui scrapyd scrapyd-dashboard scrapyd-ui spider

Last synced: 26 Mar 2025

https://github.com/wkunzhi/Python3-Spider

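Hands-on Python crawlers: simulated logins for major websites, including but not limited to slider CAPTCHAs, Pinduoduo, Meituan, Baidu, bilibili, Dianping, and Taobao. If you like it, please star ❤️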

crawl crawler dianping geek meituan pyppeteer python scrapy scrapy-crawler selenium spider splash taobao

Last synced: 14 May 2025

https://github.com/Boris-code/feapder

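🚀🚀🚀 feapder is an easy-to-use, powerful Python crawler framework. It has four built-in spider types (AirSpider, Spider, TaskSpider, BatchSpider) for different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The feaplat crawler management system provides convenient deployment and scheduling for it.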

crawler feapder feaplat python scrapy spider

Last synced: 28 Mar 2025

https://github.com/qianyantech/image-downloader

Download images from Google, Bing, and Baidu.

baidu bing google google-images image-downloader pyqt scrapy spider

Last synced: 15 May 2025

https://github.com/QianyanTech/Image-Downloader

Download images from Google, Bing, and Baidu.

baidu bing google google-images image-downloader pyqt scrapy spider

Last synced: 14 Apr 2025

https://github.com/librauee/reptile

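🏀 Hands-on Python 3 web crawlers (some with detailed tutorials): Maoyan, Tencent Video, Douban, Yanzhao (graduate admissions), Weibo, Biquge novels, Baidu hot topics, Bilibili, CSDN, NetEase Cloud Reading, Ali Literature, Baidu Stocks, Toutiao, WeChat official accounts, NetEase Cloud Music, Lagou, Youdao, Unsplash, Shixiseng, Autohome, League of Legends Box, Dianping, Lianjia, LPL schedule, typhoons, the Fantasy Westward Journey and Onmyoji Cangbaoge marketplaces, weather, Nowcoder, Baidu Wenku, bedtime stories, Zhihu, Wish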

python3 requests scrapy spider

Last synced: 15 May 2025

https://github.com/thewebscrapingclub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 08 Apr 2025

https://github.com/TheWebScrapingClub/webscraping-from-0-to-hero

The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

playwright python scrapy scrapy-spider scrapysplash webscraping

Last synced: 14 Mar 2025

https://github.com/kkoooqq/fakebrowser

🤖 Fake fingerprints to bypass anti-bot systems. Simulates mouse and keyboard operations to behave like a real person.

anti-bot-detection anti-fingerprinting automation bot browser-fingerprint cheat crawler fake headless puppeteer puppeteer-extra puppeteer-extra-plugin scrapy spoof stealth

Last synced: 13 Mar 2025

https://github.com/eliasdabbas/advertools

advertools - online marketing productivity and analysis tools

advertising adwords digital-marketing google-ads keywords log-analysis logfile-parser marketing online-marketing python robots-txt scrapy search-engine-marketing search-engine-optimization seo seo-crawler serp social-media twitter-api youtube

Last synced: 13 May 2025

https://github.com/istresearch/scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.

distributed kafka python redis scraping scrapy

Last synced: 14 May 2025

https://github.com/holgerd77/django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

django python scraper scraping scrapy spider webscraping

Last synced: 15 May 2025

https://github.com/juancarlospaco/faster-than-requests

Faster requests on Python 3

curl cython download-file faster-than-requests high-performance http-requests ndjson open-data python python-library python-requests python3 requests-toolbelt requests3 scrapy speed urllib urllib3 web-scraper web-scraping

Last synced: 14 May 2025

https://github.com/bytebuff/JSpider

JSpider adds the JS decryption method for at least one website every week. Stars are welcome. WeChat contact: 13298307816

javascript nodejs python3 scrapy spider

Last synced: 30 Mar 2025

https://github.com/bytebuff/jspider

JSpider adds the JS decryption method for at least one website every week. Stars are welcome. WeChat contact: 13298307816

javascript nodejs python3 scrapy spider

Last synced: 12 Apr 2025

https://github.com/moyada/stealer

A watermark-removal tool for videos from Douyin, Kuaishou, Huoshan, and Pipixia

bilibili bilibili-download douyin douyin-download kuaishou pipixia scrapy

Last synced: 04 Jul 2025

https://github.com/jonbakerfish/TweetScraper

TweetScraper is a simple crawler/spider for Twitter Search without using API

scrapy tweets twitter twitter-search

Last synced: 17 Apr 2025

https://github.com/vifreefly/kimuraframework

Kimurai is a modern web scraping framework written in Ruby that works out of the box with headless Chromium/Firefox, PhantomJS, or simple HTTP requests, and lets you scrape and interact with JavaScript-rendered websites

crawler headless-chrome kimurai scraper scrapy

Last synced: 14 May 2025

https://github.com/scrapy-plugins/scrapy-playwright

🎭 Playwright integration for Scrapy

chrome-headless firefox-headless hacktoberfest headless-browser javascript-renderer playwright playwright-python python python-asyncio python3 scrapy webkit-headless

Last synced: 14 May 2025

https://github.com/xingag/spider_python

Python crawlers

bs4 python python3 requests scrapy urllib xpath

Last synced: 27 Apr 2025

https://github.com/clemfromspace/scrapy-selenium

Scrapy middleware to handle javascript pages using selenium

crawling scrapy selenium

Last synced: 14 May 2025

https://github.com/mtianyan/FunpySpiderSearchEngine

Word2vec personalized search (tailored to each user) + Scrapy 2.3.0 (data crawling) + ElasticSearch 7.9.1 (data storage and an external RESTful API) + Django 3.1.1 search

django elasticsearch elasticsearch-analysis-ik lagou mysql python redis scrapy search-engine spider zhihu

Last synced: 14 Mar 2025

https://github.com/alanchn31/data-engineering-projects

Personal Data Engineering Projects

airflow aws-redshift cassandra data-engineering data-engineering-nanodegree data-lake data-modeling data-warehouse ingest-data mongodb postgres scrapy spark star-schema

Last synced: 12 Apr 2025

https://github.com/eracle/linkedin

Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy

bot chromium-browser docker docker-compose linkedin scraper scraping scrapy selenium-webdriver

Last synced: 15 May 2025

https://github.com/hellock/icrawler

A multi-thread crawler framework with many builtin image crawlers provided.

bing-image crawler flickr-api google-images python scrapy spider

Last synced: 14 May 2025

https://github.com/scrapinghub/scrapyrt

HTTP API for Scrapy spiders

crawler crawling hacktoberfest hacktoberfest2021 python scraper scrapy twisted webcrawler webcrawling

Last synced: 15 May 2025

https://github.com/alanchn31/Data-Engineering-Projects

Personal Data Engineering Projects

airflow aws-redshift cassandra data-engineering data-engineering-nanodegree data-lake data-modeling data-warehouse ingest-data mongodb postgres scrapy spark star-schema

Last synced: 16 Apr 2025

https://github.com/morvanzhou/easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

asyncio beautifulsoup crawler crawling distributed-scraper regex requests scraping scrapy urllib

Last synced: 16 May 2025

https://github.com/MorvanZhou/easy-scraping-tutorial

Simple but useful Python web scraping tutorial code.

asyncio beautifulsoup crawler crawling distributed-scraper regex requests scraping scrapy urllib

Last synced: 07 Sep 2025

https://github.com/kezhenxu94/house-renting

Possibly the best practice of Scrapy 🕷 and renting a house 🏡

docker python scrapy scrapy-crawler scrapy-spider scrapyd

Last synced: 11 Jul 2025

https://github.com/lb2281075105/python-spider

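Douban Movies Top 250; crawling Douyu JSON data and pictures of women; Taobao; Youyuan; using CrawlSpider to crawl basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; crawling Duodian; building an API with Django; crawling Youyuan profile information; simulated logins to Zhihu, GitHub, and Tuchong; crawling the entire Duodian store; crawling the article history of WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts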

beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath

Last synced: 12 Apr 2025

https://github.com/lb2281075105/Python-Spider

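Douban Movies Top 250; crawling Douyu JSON data and pictures of women; Taobao; Youyuan; using CrawlSpider to crawl basic profile information from the Hongniang matchmaking site, plus distributed crawling of Hongniang with Redis storage; small crawler demos; Selenium; crawling Duodian; building an API with Django; crawling Youyuan profile information; simulated logins to Zhihu, GitHub, and Tuchong; crawling the entire Duodian store; crawling the article history of WeChat official accounts; crawling articles shared in WeChat groups or by WeChat friends; using itchat to monitor articles shared by specified WeChat official accounts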

beautifulsoup4 crawlspider django itchat mongodb mysql pymysql python redis scrapy selenium spider weichat xpath

Last synced: 14 Mar 2025

https://github.com/TeamHG-Memex/scrapy-rotating-proxies

use multiple proxies with Scrapy

proxy scrapy

Last synced: 12 May 2025

https://github.com/teamhg-memex/scrapy-rotating-proxies

use multiple proxies with Scrapy

proxy scrapy

Last synced: 15 May 2025

https://github.com/tb0hdan/domains

World’s single largest Internet domains dataset

colly dataset internet-domains scrapy search-engines yacy

Last synced: 14 Apr 2025

https://github.com/ramsayleung/jd_spider

Two dumb distributed crawlers

docker graphite mongodb python3 scrapy

Last synced: 28 Sep 2025

https://github.com/alecxe/scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent

python scrapy web-scraping

Last synced: 15 May 2025

https://github.com/rugantio/fbcrawl

A Facebook crawler

crawl crawler facebook python scraper scrapy spider

Last synced: 07 Apr 2025

https://github.com/alltheplaces/alltheplaces

A set of spiders and scrapers to extract location information from places that post their location on the internet.

geojson hacktoberfest python scrapers scrapy spider

Last synced: 27 Jul 2025

https://github.com/turboway/spiderman

A general-purpose distributed crawler framework based on scrapy-redis

hbase hive kafka rdbm scapy-redis scrapy spiderman

Last synced: 04 Apr 2025

https://github.com/TurboWay/spiderman

A general-purpose distributed crawler framework based on scrapy-redis

hbase hive kafka rdbm scapy-redis scrapy spiderman

Last synced: 25 Mar 2025

https://github.com/mouday/spider-admin-pro

spider-admin-pro: a visual management tool for viewing Scrapy + Scrapyd crawler projects and scheduling crawler tasks; an upgraded version of SpiderAdmin

python3 scrapy scrapyd spider

Last synced: 14 May 2025

https://github.com/spekulatius/phpscraper

A universal web-util for PHP.

beautifulsoup chromium headless-chrome php php-crawler php-scraper php-spider php-spiders puppeteer pyppeteer scraper scraping scraping-websites scrapy web-scraper web-scraping

Last synced: 15 May 2025

https://github.com/spekulatius/PHPScraper

A universal web-util for PHP.

beautifulsoup chromium headless-chrome php php-crawler php-scraper php-spider php-spiders puppeteer pyppeteer scraper scraping scraping-websites scrapy web-scraper web-scraping

Last synced: 14 Mar 2025

https://github.com/hopetree/e-commerce-crawlers

:rocket: A collection of e-commerce website crawlers: Taobao, JD, Amazon, and more

pymongo pymysql python scrapy selenium

Last synced: 05 Apr 2025

https://github.com/abhisharma404/vault

swiss army knife for hackers

crawler fuzzing hacking hacking-tool information-gathering lfi networking offensive-security osint pentesting port-scanner python rfi scanner scrapy security sqlite ssl-inspection vault xss-vulnerability

Last synced: 02 Apr 2025

https://github.com/AlexMathew/scrapple

A framework for creating semi-automatic web content extractors

beautifulsoup crawler css-selector extractor lxml python scrapers scraping scrapy selector selector-expression tutorial web-scraper web-scraping xpath-expression

Last synced: 29 Mar 2025

https://github.com/TikHub/TikHub-API-Python-SDK

High-performance asynchronous API for Douyin, TikTok, Xiaohongshu, Kuaishou, Weibo, Instagram, YouTube, and Twitter (X), plus a captcha solver and temp mail.

api captcha-solver crawler data-api douyin douyin-tiktok-api instagram kuaishou netease-cloud-music private-api scrapy tiktok twitter weibo xiaohongshu xiguashipin

Last synced: 11 May 2025

https://github.com/sangaline/advanced-web-scraping-tutorial

The Zipru scraper developed in the Advanced Web Scraping Tutorial.

python scraper scrapy tutorial-code

Last synced: 07 Apr 2025

https://github.com/my8100/files

Docs and files for ScrapydWeb, Scrapyd, Scrapy, and other projects

scrapy scrapyd scrapydweb

Last synced: 16 May 2025

https://github.com/MarwanDebbiche/post-tuto-deployment

Build and deploy a machine learning app from scratch 🚀

api aws character-level-cnn deployment docker machine-learning pytorch scraping scrapy selenium

Last synced: 19 Jul 2025

https://github.com/marwandebbiche/post-tuto-deployment

Build and deploy a machine learning app from scratch 🚀

api aws character-level-cnn deployment docker machine-learning pytorch scraping scrapy selenium

Last synced: 05 Apr 2025

https://github.com/kingname/sourcecodeofbook

Companion source code for the book "Python Crawler Development: From Beginner to Practice" (《Python爬虫开发从入门到实战》).

python python3 requests scrapy webcrawler

Last synced: 05 Apr 2025

https://github.com/scrapy-plugins/scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy

crawler crawler-detection plugin proxy scraping scrapy

Last synced: 16 May 2025

https://github.com/lkuffo/web-scraping

More than 50 web scraping examples using: Requests | Scrapy | Selenium | LXML | BeautifulSoup

beautifulsoup beautifulsoup4 lxml-etree scraping scraping-python scraping-websites scrapping-python scrapy scrapy-crawler scrapy-spider selenium selenium-python selenium-webdriver web-scraping webscraping

Last synced: 07 Apr 2025

https://github.com/City-Bureau/city-scrapers

Scrape, standardize and share public meetings from local government websites

city-scrapers open-data python scrapy web-scraping

Last synced: 07 Apr 2025

https://github.com/yangjianxin1/qqmusicspider

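A Scrapy-based QQ Music spider that crawls song information, lyrics, top comments, and more, and shares a corpus of 490,000+ songs from the top 6,400 mainland China, Hong Kong, and Taiwan artists on QQ Music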

crawler music musicspider qqmusic scrapy

Last synced: 27 Oct 2025

https://github.com/windrises/dialogue.moe

dialogue django elasticsearch scrapy vue

Last synced: 22 Jul 2025

https://github.com/hellokaton/elves

🎊 Design and implementation of a lightweight crawler framework.

163news douban-movie elves scrapy spider

Last synced: 09 Apr 2025

https://github.com/chenjiandongx/github-spider

A crawler for analyzing GitHub repositories and users

crawler github scrapy

Last synced: 13 Apr 2025

https://github.com/zhupingqi/RuiJi.Net

crawler framework, distributed crawler extractor

crawler extractor headless-chrome netcore owin scraper scrapy

Last synced: 04 May 2025

https://github.com/chenjiandongx/Github-spider

A crawler for analyzing GitHub repositories and users

crawler github scrapy

Last synced: 30 Apr 2025

https://github.com/oldshensheep/v2ex_scrapy

scrapy for v2ex.com

dataset scrapy v2ex

Last synced: 04 Apr 2025

https://github.com/glaucocustodio/tanakai

Tanakai is a modern web scraping framework written in Ruby. A fork of Kimurai.

chrome-headless crawler kimurai scraper scrapy webscraping

Last synced: 28 Mar 2025

https://github.com/xyntax/filesensor

Crawler-based dynamic detection tool for sensitive files

crawler fuzzing pentesting scrapy

Last synced: 03 Sep 2025

https://github.com/crawlab-team/crawlab-lite

Lite version of Crawlab, the crawler management platform.

crawlab crawler crawler-management crawling-tasks platform scrapy scrapy-ui scrapyd scrapyd-ui spider web-crawler

Last synced: 11 Mar 2025

https://github.com/makelove/programer_log

Latest updates are here: [My Programmer Log]

docke docker python scrapy

Last synced: 13 Apr 2025

https://github.com/taoget/livetv_mining

Data collection from live-streaming websites

flask python python3 scrapy vue webpack

Last synced: 17 Jun 2025

https://github.com/ttonys/Scrapy-CVE-CNVD

Vulnerability monitoring based on Scrapy and scrapy-redis: fetches the latest CVE and CNVD vulnerabilities daily and sends email notifications

cnvd cve scrapy

Last synced: 11 Jul 2025

https://github.com/henryhaohao/wenshu_spider

:rainbow: Wenshu_Spider: crawls case data from China Judgments Online using the Scrapy framework (latest version as of 2019-1-9)

abuyun decrypt judgement proxy-server scrapy wenshu

Last synced: 20 Aug 2025

https://github.com/DiegoCaraballo/Email-extractor

The main functionality is to extract all the emails from one or several URLs.

email email-extractor email-marketing emails extraction python scraper scrapers scraping scraping-websites scrapper scrapping scrapy scrapy-spider spyder stractor

Last synced: 11 Jul 2025

https://github.com/mehmetozkaya/dotnetcrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping

Last synced: 11 May 2025

https://github.com/mehmetozkaya/DotnetCrawler

DotnetCrawler is a straightforward, lightweight web crawling/scraping library with Entity Framework Core output, based on .NET Core. It is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extensible for your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c

crawler crawling csharp ddd-architecture dotnetcore entity-framework-core htmlagilitypack scraping scrapy scrapy-crawler webcrawler webcrawler-htmlagilitypack webcrawling webscraper webscraping