Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-09 00:09:38 UTC
- JSON Representation
https://github.com/hoishing/selenium-crawler
a web crawler written in python, powered by Selenium and Tesseract OCR
Last synced: 17 Nov 2024
https://github.com/estroz/seekret
Seekret is a sensitive data crawler for GitHub repositories
Last synced: 25 Dec 2024
https://github.com/tsoliangwu0130/ptt-search
A simple Python script to fetch PTT post from the command line.
Last synced: 08 Jan 2025
https://github.com/opda0887/bahamut-crawler-to-gmail
發想:使用Python爬蟲取得巴哈姆特版面的最新論壇,並用gmail傳送這些訊息給自己。A thought: Use Python crawler to the latest forums in Bahamut, and use gmail to send these messages to myself.
Last synced: 27 Nov 2024
https://github.com/deptno/nsdi
㉿ nsdi downloader built on puppeteer
crawler downloader nsdi openapi puppeteer
Last synced: 31 Dec 2024
https://github.com/amirespahbodi/url_crawler
url crawler
crawler fastapi pydantic python3 sqlalchemy
Last synced: 02 Jan 2025
https://github.com/altescy/mincrawler
A minimal web crawler.
configurable crawler python scraping
Last synced: 27 Nov 2024
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 07 Dec 2024
https://github.com/pjullrich/link-crawler
Python Crawler that reports broken links on a given website and its sup-pages
asyncio breadth-first-search broken-links crawler python
Last synced: 22 Nov 2024
https://github.com/teal33t/base_crawler
Simple scaffold for selenium based crawler bots
crawler scaffold-template selenium selenium-python
Last synced: 23 Nov 2024
https://github.com/shiritai/wallpaper_master
My first individual project!
crawler file-explorer javafx-application maven-shade mini-system wallpaper wallpaper-master
Last synced: 01 Jan 2025
https://github.com/programming-with-love/skyeyesystem
天眼系统,每隔十分钟爬取各个平台的热搜数据并入库。包括原始热搜数据存入mysql。词频统计存入Redis。
crawler mysql redis skyeye skyeyewall springboot
Last synced: 16 Nov 2024
https://github.com/coghost/crawlers
crawlers in one
crawler python3 staticimg weibo
Last synced: 02 Jan 2025
https://github.com/cseas/shares-monitor
Web crawler to fetch and monitor shares details.
crawler python python3 scraper scraping-websites shares
Last synced: 27 Dec 2024
https://github.com/stevieflyer/quokka
An easy-to-use web crawler framework, supporting parallel crawling without a line of code and headless running.
crawler parallel web-automation
Last synced: 14 Dec 2024
https://github.com/sonhm3029/crawl-data-bot
This project making a base crawl data from web bot, include text data and images data
crawler google medical vietnamese
Last synced: 16 Nov 2024
https://github.com/moontai0724/auto-notify-pu-courses-quota
A small crawler to fetch remains quota of a list of courses in Providence University every 2 to 10 minutes, then send webhook when change.
Last synced: 06 Dec 2024
https://github.com/vindecodex/automated-crawler-wget
Using wget to crawl site
Last synced: 01 Jan 2025
https://github.com/buren/stupid_crawler
Stupid crawler that looks for URLs on a given site
Last synced: 12 Oct 2024
https://github.com/mkfsn/chronos
A light cron-like container service - create cron job easily.
Last synced: 21 Nov 2024
https://github.com/zephyrpersonal/github-trending-crawler
transform github-trending repos to json data
cheerio crawler fetch github node repository spider trending
Last synced: 28 Nov 2024
https://github.com/exasol/error-code-crawler-maven-plugin
Validator and crawler for exasol-error-codes in Java code
catalog crawler error-handling error-report error-reporting exasol exasol-integration java unification
Last synced: 14 Nov 2024
https://github.com/mc256/node-static-webpage-crawler
download entire website with its directory structure.
cache-server crawler nodejs static-site
Last synced: 24 Nov 2024
https://github.com/akashrajpurohit/node-crawler
Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain
crawler node-crawler nodejs url
Last synced: 25 Dec 2024
https://github.com/pxlrbt/website-diff
Utility tool that bundles a crawler and BackstopJS for visual regression testing.
backstopjs crawler visual-regression-testing
Last synced: 28 Nov 2024
https://github.com/srx-2000/swaiter
a programe to wait until the selenium element has loaded——selenium模拟器元素等待程序
crawler selenium selenium-python
Last synced: 21 Nov 2024
https://github.com/hoanle396/py-iconnect
crawler flask flask-application image-processing python
Last synced: 14 Dec 2024
https://github.com/projectx3193275578/prjctxx8264
A simple, open-source, easy to use, and free download manager for malware samples.
crawler downloader malware manager samples
Last synced: 05 Jan 2025
https://github.com/trixsec/zeuscrawler
The ultimate web crawling powerhouse, striking the web like lightning to harvest data with divine precision.
crawler cybersecurity information-gathering information-retrieval osint python scraper spider web-crawler web-scraper
Last synced: 21 Dec 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 24 Dec 2024
https://github.com/mahmoudgalalz/pupt
A starter for web crawling using Puppeteer
Last synced: 05 Jan 2025
https://github.com/lukasherz/22fs-sc-twitter-crawler
used for a research project in social computing @ uzh (fs22)
crawler crawling database twitter twitter-api-v2
Last synced: 25 Dec 2024
https://github.com/victorhuu/amazonmovieintegration
本仓库是同济大学数据仓库的第一个个人作业——利用爬虫与ETL工具整理Amazon的电影数据
crawler data-warehouse movies pandas scrapy xpath
Last synced: 28 Nov 2024
https://github.com/tubone24/askfm-qa-crawler
Crawl Ask.fm QA lists and create corpus for ML.
askfm chromedriver corpus-builder crawler selenium
Last synced: 25 Dec 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails in the websites
Last synced: 28 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different mehtods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 23 Dec 2024
https://github.com/alatiera/ellinofreneia-crawler
Crawler of ellinofreneianet.gr for offline content consumption
Last synced: 01 Jan 2025
https://github.com/kangoo13/textbroker-author-article-picker
Bot that automatically lock an order into a textbroker's author account.
author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper
Last synced: 21 Nov 2024
https://github.com/vitaee/laravelandcrawlers
php web crawler examples with oop concept and laravel project
Last synced: 26 Dec 2024
https://github.com/hantang/list-movies-top
豆瓣(douban.com)、IMDb(imdb.com)、时光网(mtime.com)、猫眼(maoyan.com)Top电影定时抓取
Last synced: 07 Jan 2025
https://github.com/microlinkhq/ua
A simple redis primitives to incr() and top() user agents
crawler redis user-agent user-agent-parser
Last synced: 12 Nov 2024
https://github.com/dangdungcntt/crawl-fb-v2
Simple script to detect email and phone from facebook comment.
Last synced: 17 Nov 2024
https://github.com/danielemoraschi/go-sitemap-common
Simple GO sitemap generator and crawler.
crawler golang sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/pnguyen215/instagram-crawler
Instagram Crawler is a Python script to download posts from a specified Instagram account.
crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler
Last synced: 12 Nov 2024
https://github.com/madis/flatcrawl
Clojure app for crawling apartment information from http://kv.ee
clojure crawler real-estate webapp
Last synced: 12 Nov 2024
https://github.com/phanikmr/linkcrawler
A LinkCrawler is a Python module that takes a url on the web (ex: http://python.org), fetches the web-page corresponding to that url, and parses all the links on that page into a repository of links. Next, it fetches the contents of any of the url from the repository just created, parses the links from this new content into the repository and continues this process for all links in the repository until stopped or after a given number of links are fetched.
async crawler linkcrawler parse python scrapy spider
Last synced: 29 Nov 2024
https://github.com/developerjosh/gogo-crawler
The tool kit for making an anime website with a database full of anime
crawler crawler-js gogoanime gogoanime-api gogoanime-scraper
Last synced: 16 Nov 2024
https://github.com/khilnani/spidey.py
Web spiders are usually disliked by websites, but useful for recursive API/page downloads for offline analysis.
cli crawler python scaper web-spider
Last synced: 02 Dec 2024
https://github.com/ccrashzer0/web_crawler
A python based web crawler
crawler internet python python3 webcrawler
Last synced: 28 Nov 2024
https://github.com/captain-woof/zhi-zhu
Zhi-Zhu is a multithreaded spidering script that recursively searches base webpages and all urls appearing in it, for specific (regex) words.
crawler crawler-python crawling-python python3
Last synced: 31 Dec 2024
https://github.com/feliz-szk/berserk
Berserk: Crawler to increase web traffic(based on tor and privoxy)
anonymizer anonymous-proxy command-line-tool crawler linux privoxy python scraping-websites tor webtraffic-increaser
Last synced: 13 Nov 2024
https://github.com/hudson-newey/user-web-crawler
The Archive.org Crawler works through volunteering users who install an extension on their browsers. When the user visits a webpage, the URL is anonymously added to the Archive.org database.
Last synced: 11 Nov 2024
https://github.com/afuntw/misc-crawler
some small crawler for specific website
Last synced: 13 Nov 2024
https://github.com/santhin/real-estate
Real estate crawler with ML on scraped data
crawler jupyter-notebook ml real-estate scrapy
Last synced: 24 Nov 2024
https://github.com/khoinguyen2k/web-crawler
about crawl data
crawler jsoup-library scraper selenium-java
Last synced: 16 Nov 2024
https://github.com/cryptoc1/earl
Earl is looking for URLs in your area.
crawler middleware nuget webscraping
Last synced: 28 Nov 2024
https://github.com/khadkarajesh/aptoide
Aptoide app crawler using beautifulsoup
beautifulsoup4 crawler flask python3 web-application
Last synced: 13 Nov 2024
https://github.com/donuts-are-good/araknnid
GO GO TINY SPIDER!
crawler hacktoberfest search-engine spider
Last synced: 28 Dec 2024
https://github.com/hctilg/taaghche-dl
Save books purchased from taaghche.com !
crawler downloader pillow-library python3 selenium taaghche
Last synced: 09 Jan 2025
https://github.com/hanifdwyputras/se-scraper
Search Engine scraper with PHP
crawler scraper seo seo-crawler
Last synced: 06 Dec 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 17 Dec 2024
https://github.com/orsinium-labs/gpcc
Python library and CLI tool to fetch information from GCP Browser (https://gpc-browser.gs1.org/)
Last synced: 16 Nov 2024
https://github.com/zabuzard/wslotter
WSlotter is a Selenium driven tool for assigning to events on 'https://www.gruppe-w.de'.
Last synced: 13 Nov 2024
https://github.com/zabuzard/songcrawler
Crawles all song files available on 'http://downloads.khinsider.com/'. Creates a list of direct download links for all such songs, intended for use with JDownloader or similar.
command-line-tool crawler download-musics downloadmanager jdownloader multithreading song-files songs web-crawler
Last synced: 13 Nov 2024
https://github.com/wondervictor/spiderman
2017 Software Course Project
crawler distribute-crawler zhihu-crawler
Last synced: 16 Nov 2024
https://github.com/deployment-helper/api-template-crawler
API interface to crawl the templates
api crawler deployment-helper gcp gcp-cloud-run golang rest
Last synced: 14 Nov 2024
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 19 Dec 2024
https://github.com/somnisomni/trawler-csharp
The successor of https://github.com/somnisomni/twitter-account-data-crawler, written in .NET C#
crawler crawling csharp dotnet follower-tracker selenium selenium-csharp twitter twitter-crawler twitter-crawling twitter-scraper
Last synced: 05 Jan 2025
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 17 Dec 2024
https://github.com/davideferre/covid19-data-crawler-ita
Covid 19 italian data crawler
coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python
Last synced: 12 Nov 2024
https://github.com/nelcifranmagalhaes/web_crawler
A web crawler for all Naruto characters
anime beautifulsoup characters crawler naruto python
Last synced: 03 Dec 2024
https://github.com/jonasrenault/cprex
Chemical Properties Relation Extraction
chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers
Last synced: 14 Oct 2024
https://github.com/flavien-hugs/scrapy-test
Manipulation de la librairie Scrapy. Mini script permet d'extraire l'ensemble des personnages de dessin animé sur Wikipedia.
crawler python scraping scrapy
Last synced: 09 Dec 2024
https://github.com/zhanymkanov/marketplace_parser
Products and Reviews Crawler
Last synced: 14 Nov 2024
https://github.com/amirsorouri00/dsl-se
This is a MVP provided based on the "Search Engine And Data Mining" Course. The idea behind this project is the forked project which its link provided is
container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine
Last synced: 18 Nov 2024
https://github.com/duaraghav8/larry-crawler
Kayako Twitter challenge
crawler fetch-tweets hashtag nodejs pagination tweets twitter-api
Last synced: 21 Nov 2024
https://github.com/tanja-4732/od-get
A Rust tool for recursively crawling & downloading data from open directories
cli crawler open-directory open-directory-downloader rust
Last synced: 14 Nov 2024
https://github.com/zhs007/lottery-crawler
基于jarvis-task的爬虫,主要用来爬取lottery数据。
Last synced: 03 Jan 2025
https://github.com/karantyagi/web-crawler
BFS and DFS implementations for a wikipedia crawler
Last synced: 13 Nov 2024
https://github.com/par7133/splash-bot-crawler
Splash Bot creates splash on the fly of your websites - GPL License 🔥
bot crawler gallery open-source opensource php splash
Last synced: 13 Nov 2024
https://github.com/suddi/fundscraper
Collection of web crawlers to scrape fund data using Scrapy
Last synced: 11 Oct 2024
https://github.com/liyun-li/meh-bot
Just a bot that clicks an image
bot crawler docker headless-firefox meh python python3 selenium twilio-sms-api
Last synced: 25 Nov 2024
https://github.com/pierlauro/mdbubing
From WARC records to MongoDB documents
bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving
Last synced: 09 Dec 2024
https://github.com/abdus/scrape-web
A simple web scrapper for Node.js
crawler web-scraping web-scrapper
Last synced: 03 Dec 2024
https://github.com/dingpingzhang/papermedia
A scrapy-based crawler for crawling paper media.
Last synced: 22 Dec 2024