Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-10 00:06:02 UTC
- JSON Representation
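The "systematically browses" part of the definition above can be illustrated with a minimal breadth-first sketch. This is an illustration under stated assumptions, not how any listed project works: the `crawl` function, its `get_links` callback, and the in-memory `SITE` map are all hypothetical and stand in for real HTTP fetching and HTML parsing.

```python
from collections import deque


def crawl(start, get_links, max_pages=100):
    """Breadth-first traversal: visit each page once, following its links."""
    seen = {start}          # pages already queued, so nothing is visited twice
    queue = deque([start])  # the crawl frontier
    order = []              # pages in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "web" standing in for real HTTP fetches.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}

print(crawl("/", lambda url: SITE.get(url, [])))
# prints ['/', '/a', '/b', '/c']
```

The `seen` set is what keeps the crawl from looping on cyclic links such as `/b` pointing back to `/`, and `max_pages` bounds the run.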
https://github.com/woorim960/nate.com-comments-crawler
nate.com-comments-crawler
chromedriver crawler python3 selenium
Last synced: 28 Dec 2024
https://github.com/amirzenoozi/aparat-videos-dataset
Some simple information about Aparat videos for data scientists
aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video
Last synced: 20 Nov 2024
https://github.com/lockblock-dev/crawlarr
Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.
Last synced: 24 Nov 2024
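The anchor-tag step described for Crawlarr can be sketched as follows. Crawlarr itself is written in Go; this Python version is only an illustration using the standard library, and the names `AnchorParser` and `extract_links` are hypothetical, not taken from that project. Concurrency is left out to keep the sketch short.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (for example named anchors).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = AnchorParser(base_url)
    parser.feed(html)
    return parser.links


page = (
    '<p><a href="/docs">Docs</a> '
    '<a href="https://example.org/x">X</a> '
    '<a name="top">no href</a></p>'
)
print(extract_links(page, "https://example.com/index.html"))
# prints ['https://example.com/docs', 'https://example.org/x']
```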
https://github.com/hctilg/taaghche-dl
Save books purchased from taaghche.com !
crawler downloader pillow-library python3 selenium taaghche
Last synced: 09 Jan 2025
https://github.com/eea/eea-crawler
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch (aka content hub).
airflow-dags crawler elasticsearch etl-pipeline indexing
Last synced: 24 Nov 2024
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 14 Nov 2024
https://github.com/projectx3193275578/prjctxx8264
A simple, open-source, easy to use, and free download manager for malware samples.
crawler downloader malware manager samples
Last synced: 05 Jan 2025
https://github.com/sonhm3029/crawl-data-bot
A base bot for crawling data from the web, including text and image data
crawler google medical vietnamese
Last synced: 16 Nov 2024
https://github.com/sammwyy/craw
a website-crawler library for nodejs
crawler crawlers html javascript library node nodejs nodejs-module npm npm-module parser spider website
Last synced: 16 Nov 2024
https://github.com/sefinek/niedlascamu.pl-tracker
Tracks changes on the niedlascamu.pl website.
crawl crawler niedlascamu tracker tracking
Last synced: 07 Dec 2024
https://github.com/dizys/weibo-crawler
A nodejs weibo crawler
crawler nodejs typescript weibo-spider
Last synced: 27 Dec 2024
https://github.com/citiususc/polypus
Polypus: a Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis
analytics bigdata crawler scraper sentiment-analysis twitter
Last synced: 02 Dec 2024
https://github.com/tcc0lin/magiccrawler
A collection of interesting crawler scripts that tackle anti-crawling measures
Last synced: 17 Nov 2024
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes a user-defined function on each of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 19 Dec 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 24 Dec 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 26 Nov 2024
https://github.com/maxiroellplenty/gs-robot
Node.js tool to scrape gelbe-seiten
axios cheerio crawler gelbe-seiten nodejs scraper yargs
Last synced: 23 Nov 2024
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 23 Dec 2024
https://github.com/myconsciousness/metis
Metis main repository.
application client crawler crawling crawlwebpage educatable gui lerning logging programming-language python scrape scraping scraping-websites tkinter tkinter-gui tkinter-python
Last synced: 08 Dec 2024
https://github.com/j-hoplin/naver_news_headtopic_news_scraper
Scrapes headline news from Naver News
Last synced: 11 Dec 2024
https://github.com/zhoudaxia233/unilogo
A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.
Last synced: 15 Dec 2024
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 19 Dec 2024
https://github.com/adamfisher/scrapyrt.client
A C# client to make calls to a scrapyrt (Scrapy real-time) HTTP endpoint.
crawler scraper scrapy scrapy-crawler scrapy-framework scrapy-spider
Last synced: 26 Nov 2024
https://github.com/raphaelalmeidamartins/python-tech-news
Python data science project developed at the end of Unit 35 (Computer Science Module) of Trybe's Web Development course
crawler crawler-python data-science pytest python
Last synced: 17 Nov 2024
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 11 Oct 2024
https://github.com/hoishing/selenium-crawler
a web crawler written in python, powered by Selenium and Tesseract OCR
Last synced: 17 Nov 2024
https://github.com/mc256/node-static-webpage-crawler
download entire website with its directory structure.
cache-server crawler nodejs static-site
Last synced: 24 Nov 2024
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 25 Dec 2024
https://github.com/soulyma/web_crawler
A focused web crawler to extract and structure Arabic content from web pages. Designed for researchers, data analysts, and developers working on Arabic language datasets.
beautifulsoup4 crawler csv data json python structured-data
Last synced: 13 Dec 2024
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes and sends a webhook when it changes.
Last synced: 06 Dec 2024
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Dec 2024
https://github.com/zhaotianff/qzone
Remembering that run under the setting sun; that was my lost youth
crawler crawling-sites csharp qzone qzone-photos qzone-spider wpf
Last synced: 15 Nov 2024
https://github.com/gnaneshkunal/book-miner
Web crawler for Book reviews (Goodreads)
Last synced: 16 Dec 2024
https://github.com/ghost---shadow/feature-extractor-from-codebase
Copies the target java file and all its dependencies recursively to another directory
Last synced: 16 Nov 2024
https://github.com/leomaurodesenv/smm-maker-profile
A package for fetching maker profiles from Super Mario Maker
crawler javascript json mario-maker nodejs
Last synced: 02 Nov 2024
https://github.com/hamidrabedi/digikala-crawler
A crawler for Digikala built with the Django framework, Selenium, and a REST API. It also scrapes data from the gathered URLs.
crawler digikala digikala-crawler django python scraper
Last synced: 14 Dec 2024
https://github.com/iarsham/scrapify
Scrapify is a golang library that automates the process of bypassing CAPTCHAs, enabling efficient web scraping and data acquisition.
403-bypass arkose cloudflare crawler golang http-client scraper
Last synced: 12 Dec 2024
https://github.com/maxmindlin/swarm
Go crawler that searches and aggregates information relevant to your interests. WIP for learning Go crawling.
Last synced: 06 Dec 2024
https://github.com/ilsonlasmar/inovamind
Inovamind Challenge - Crawler in Ruby on Rails with Sidekiq + Redis
Last synced: 10 Jan 2025
https://github.com/naveenaidu/google-crawler
Google Crawler - Curates the search results
Last synced: 17 Nov 2024
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 02 Jan 2025
https://github.com/kahsolt/allchan
An image crawler for xChan(4chan/8ch/...) image board.
4chan 4chan-downloader 8chan crawler image-crawler
Last synced: 03 Jan 2025
https://github.com/bitscoper/bitscoper_crawler
Crawls the titles of webpages in series by number and creates a list of the available links.
Last synced: 05 Dec 2024
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 16 Nov 2024
https://github.com/henkman/crawlers
Some crawlers and downloaders
Last synced: 15 Nov 2024
https://github.com/baerwang/sec_craw
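A crawler that makes it easy for security researchers to get daily security reports. It currently covers 90sec, the Kanxue forum, v2ex, the Jingyi forum, the 52pojie forum, and lab blogs, and is continuously updated.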
crawler security security-tools threat threat-intelligence
Last synced: 20 Nov 2024
https://github.com/princed/specht
Check links found in html or js files by pattern
cli crawler html javascript streams
Last synced: 19 Nov 2024
https://github.com/tufayellus/linkedin-cv-downloader
A Python based GUI automation software for downloading bulk LinkedIn CV / LinkedIn Resume from a list of profile links
crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine
Last synced: 23 Nov 2024
https://github.com/greatdrake/contributecounter
Crawl Wikipedia for contributors
Last synced: 14 Dec 2024
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 21 Nov 2024
https://github.com/opda0887/bahamut-crawler-to-gmail
Idea: use a Python crawler to fetch the latest forum posts from Bahamut boards and send them to myself via Gmail.
Last synced: 27 Nov 2024
https://github.com/mdazlaanzubair/amazon-scraper-api
A web scraper to crawl on amazon to extract products information and return in JSON format.
amazon crawler expressjs json-api nodejs webscraping
Last synced: 10 Jan 2025
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 08 Jan 2025
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 27 Nov 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 17 Dec 2024
https://github.com/fa7ad/aiub-notes-dl
Download all notes from AIUB's portal
Last synced: 24 Oct 2024
https://github.com/buren/site_health
Crawl a site and check various health indicators
Last synced: 28 Oct 2024
https://github.com/droiddevgeeks/nodelearning
This is a Node learning demo. It covers all the basics of Node.
crawler database ejs ejs-express mcv middleware-nodes mongodb node node-module nodejs nodemailer npm-package router sign
Last synced: 13 Nov 2024
https://github.com/saketh7382/smartcrawler
Package for crawling items from webpages and storing them as a JSON file
crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager
Last synced: 08 Dec 2024
https://github.com/khoinguyen2k/web-crawler
About crawling data
crawler jsoup-library scraper selenium-java
Last synced: 16 Nov 2024
https://github.com/srx-2000/swaiter
A program that waits until a Selenium element has loaded
crawler selenium selenium-python
Last synced: 21 Nov 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 28 Dec 2024
https://github.com/kangoo13/textbroker-author-article-picker
Bot that automatically locks an order into a Textbroker author account.
author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper
Last synced: 21 Nov 2024
https://github.com/hoanle396/py-iconnect
crawler flask flask-application image-processing python
Last synced: 14 Dec 2024
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 05 Jan 2025
https://github.com/amirsorouri00/dsl-se
This is an MVP based on the "Search Engine And Data Mining" course. The idea behind this project comes from the forked project, whose link is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 18 Nov 2024
https://github.com/snuzi/devblogs-aggregator
The backend aggregator project of DevBlogs.net
aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news
Last synced: 09 Nov 2024
https://github.com/bkdev98/ebooks-crawler
Ebooks crawler for personal use, built with ReactJS.
crawler material-ui nodejs reactjs
Last synced: 01 Jan 2025
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper
Last synced: 05 Jan 2025
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 21 Dec 2024
https://github.com/dimo414/pycrawl
Simple Python web crawler, primarily designed for inspecting and diagnosing your own website
Last synced: 18 Dec 2024
https://github.com/zephyrpersonal/github-trending-crawler
transform github-trending repos to json data
cheerio crawler fetch github node repository spider trending
Last synced: 28 Nov 2024
https://github.com/scrwdrv/siege-crawler
This CLI tool finds same-domain URLs in a web page and requests them to find even more URLs, until the server crashes or the benchmark ends. It is used to test the maximum capacity of a server or to find glitches that users might encounter.
benchmark cli crawler ddos debug siege tool
Last synced: 18 Dec 2024
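The "same domain urls" step in the description above could look roughly like the following. This is a sketch under stated assumptions rather than siege-crawler's actual code: the function name `same_domain_urls` is hypothetical, and "same domain" is taken here to mean an identical host (and port) after resolving relative links.

```python
from urllib.parse import urljoin, urlparse


def same_domain_urls(page_url, hrefs):
    """Resolve hrefs against page_url and keep only http(s) URLs on the same host."""
    host = urlparse(page_url).netloc
    result = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if (
            parsed.scheme in ("http", "https")
            and parsed.netloc == host
            and absolute not in result  # drop duplicates, keep first-seen order
        ):
            result.append(absolute)
    return result


hrefs = ["b", "/c", "https://other.test/d", "mailto:x@example.com", "b"]
print(same_domain_urls("https://example.com/a/", hrefs))
# prints ['https://example.com/a/b', 'https://example.com/c']
```

Such a tool should only be pointed at servers you own or are authorized to load-test.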
https://github.com/jackfsuia/chats-crawler
Crawls Discourse chat data and parses it on the fly into a form directly usable for (multimodal) LLM instruction fine-tuning. The data includes text, images, and links.
crawler fine-tuning finetune-llm gpt html-css-javascript instruction-tuning llm llm-training llms nlp nlp-parsing parser
Last synced: 14 Nov 2024
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Nov 2024
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 28 Nov 2024
https://github.com/tsaohucn/crawler_fb_group
A crawler that uses Selenium for Facebook groups
crawler facebook-groups rails ruby
Last synced: 19 Nov 2024
https://github.com/stevieflyer/quokka
An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.
crawler parallel web-automation
Last synced: 14 Dec 2024