Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-01-10 00:06:02 UTC
- JSON Representation
https://github.com/antoniowd/crawly
Un web crawler para explorar la web en busca de determinada informacion (email, telefonos, etc...)
crawler got jsdom nodejs webcrawler webscraping
Last synced: 12 Dec 2024
https://github.com/luickk/vulnerability-crawler
Small python program meant to analyze random sites found on google for any vulnerabilities!
Last synced: 28 Dec 2024
https://github.com/der3318/daily-pixiv
Integrated Flow - Line Notification of Top Ranked Pixiv Illustrations
crawler line-notify pixiv workflow
Last synced: 14 Nov 2024
https://github.com/dnlzrgz/excursionist
Scrapy-powered flight price crawler.
crawler crawlers crawling flight flights playwright scraper scraping-websites scrapy travel traveling
Last synced: 23 Dec 2024
https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
Last synced: 13 Dec 2024
https://github.com/moe131/webcrawler
Python web crawler designed to scrape websites
crawler crawling-python python python-crawler scraping simhash web-crawler
Last synced: 23 Dec 2024
https://github.com/r3c0ger/douban-movie-top250-crawler
Crawl the movie information of Douban Movie Top-250, including movie name, movie link, director, starring, release time, production country/region, type, rating, number of reviews and introduction.
beautifulsoup4 crawler lxml python3 spider
Last synced: 09 Jan 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 03 Jan 2025
https://github.com/tetreum/xupopter_chrome_extension
Extension to easily create crawling recipes
crawler scrapper scrapping webscraper
Last synced: 17 Dec 2024
https://github.com/abdymm/abtelegrambot-sample
sample using Telegram Bot
crawler football php scheduler telegram-bot webhook
Last synced: 10 Jan 2025
https://github.com/izh318/genie-music-artist-album-crawler
지니뮤직에 등록 되어 있는 특정 아티스트의 앨범 정보를 한 번에 크롤링 하는 Python Script 입니다.
Last synced: 28 Dec 2024
https://github.com/octcarp/sustech_cs209a-java2_f24_proj
(Spring Boot + Vue3) Stack Overflow data crawling and visualization: Our project of CS209A 2024 Fall: Computer System Design and Applications A (a.k.a. Java 2), SUSTech. Taught by Yida Tao @yidatao .
crawler spring-boot stackexchange sustech visualization
Last synced: 01 Jan 2025
https://github.com/keizerzilla/ssh-hunter
Script que caça por Raspberry Pis vulneráveis na internet (porta SSH aberta e senha padrão não modificada).
Last synced: 23 Dec 2024
https://github.com/keizerzilla/search4dwango9
My attempt to help solving the DWANGO9 wad mystery. More info: https://www.youtube.com/watch?v=RXGtCjdwwe8
Last synced: 23 Dec 2024
https://github.com/waived/google-drive-crawler
Proxy-based crawler to expose public (shared) Google Drive links
crawler crawler-python file-crawler google-drive-api shared-folders web-spider
Last synced: 05 Dec 2024
https://github.com/cristiangreco/gcrawler
A simple (not concurrent) web crawler written in Java.
Last synced: 23 Dec 2024
https://github.com/sanhphanvan96/php-training-crawler
Simple php crawler for training purpose
crawler docker docker-compose nginx php php-fpm
Last synced: 10 Jan 2025
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/kahsolt/tieba-dl
A simple image crawler/downloader for Baidu tieba.
baidu-tieba crawler image-crawler tieba
Last synced: 03 Jan 2025
https://github.com/d-w-arnold/local-news-data-collection
Web crawler for local news sites - Generates HTML files of each webpage visited and a list of links found on the webpage, as a TXT file 🌎
crawler data-collection python
Last synced: 14 Dec 2024
https://github.com/bradsec/gofindfiles
Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.
crawler osint osint-tool recon scraper web-scraper
Last synced: 07 Jan 2025
https://github.com/tinoco/ticapsoriginal_div2png
Ticapsoriginal programmatically div design to png generator of html code from url
beutifulsoup crawler data design div2png generated-art generator html2image parse programmatically-layout pycodestyle python requests ticapsoriginal url urllib
Last synced: 09 Jan 2025
https://github.com/tsaohucn/crawler_fb_page
This is crawler use selenium for facebook pages
crawler facebook-page rails ruby selenium
Last synced: 19 Nov 2024
https://github.com/tsaohucn/crawler_fb_user_group
This is crawler use selenium for facebook user groups
crawler facebook-user-groups rails ruby
Last synced: 19 Nov 2024
https://github.com/cls1991/gank.io-go
A simple crawler for fetching pictures from http://gank.io, implemented in golang.
crawler gankio goquery pictures
Last synced: 10 Jan 2025
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 05 Jan 2025
https://github.com/zhanziyuan/webdownloader
Download elements from the specified website.
crawler downloader image image-downloader python python-crawler web
Last synced: 08 Jan 2025
https://github.com/andresayac/cuevana3
Cuevana3 scraper is a content provider of the latest in the world of movies and tv show in Latin Spanish dub or subtitled.
Last synced: 18 Dec 2024
https://github.com/iamtonmoy0/sitemap-crawler
site map crawler with golang and goquery
Last synced: 05 Jan 2025
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 23 Oct 2024
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 18 Dec 2024
https://github.com/jefftriplett/pholcidae-demo
:spider: A Pholcidae demo for crawling/spidering a website
crawler csv pholcidae python scrapper scrapy-crawler spider toml
Last synced: 10 Jan 2025
https://github.com/wingkwong/daily_weather_temperature_in_hong_kong
Crawling daily weather temperature in Hong Kong
crawler hongkong python temperature
Last synced: 24 Dec 2024
https://github.com/raspi/scrapy-kuntavaalit2021-almamedia
Fetch Almamedia kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/estavadormir/scrappist
A web scrapper that takes an URL/URLs and converts into a PDF.
bun cli crawler pdf-generation
Last synced: 12 Nov 2024
https://github.com/mach1el/openproject-crawler
Scraping data on OpenProject
crawler golang golang-channel golang-crawling openproject-crawler python python-asyncio python-crawling
Last synced: 12 Nov 2024
https://github.com/erickj3/strike-api
this is a web scraping api with nestsj
api crawler flow nestjs scraping typescript
Last synced: 24 Nov 2024
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 09 Oct 2024
https://github.com/jauharibill/animeindo-crawler
this crawler is used for research only. the creator doesn't take any responsibility for any harmful usage
Last synced: 29 Dec 2024
https://github.com/limdongjin/bill-scraper
Python3 Scraper / Multiprocessing / ElasticSearch / BeautifulSoup :: 20대 국회 법안 크롤러
Last synced: 12 Nov 2024
https://github.com/raspi/scrapy-kuntavaalit2021-sanoma
Fetch Sanoma kuntavaalit 2021 data
crawler mirror python scrapy spider webcrawler
Last synced: 10 Nov 2024
https://github.com/luminovrym/crawler-tools-js
Crawler Tools Js adalah sebuah aplikasi yang digunakan untuk scrapping data pada sebuah web
crawler crawler-js data js web-scraping
Last synced: 02 Jan 2025
https://github.com/ilovebacteria/digikala-api
This python package requests to Digikala API and gets a product detail.
Last synced: 14 Nov 2024
https://github.com/huakunshen/cron-crawler-template
Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.
Last synced: 16 Nov 2024
https://github.com/amazingcoderpro/pythonup
玩转Python!for improving python skills
Last synced: 30 Nov 2024
https://github.com/k0nxt3d/web-scrapers
Web Scraping Scripts in PhP and Bash
bash bot clone cloning crawler curl curlphp download mirroring scraping scraping-websites seo seo-optimization shell-script spider wget
Last synced: 13 Nov 2024
https://github.com/sssshefer/web-crawler-http
Basic web crawler which represents the linking structure of the website
Last synced: 13 Nov 2024
https://github.com/wcygan/crawler
web crawler
crawler crawling tokio tokio-rs web-crawler
Last synced: 13 Nov 2024
https://github.com/reineimi/va2crawl
Website crawler, validator and SEO optimizer
crawler seo-optimization seotools validator website-crawler
Last synced: 12 Nov 2024
https://github.com/shivamsaraswat/webxcrawler
WebXCrawler is a fast static crawler to crawl a website and get all the links.
crawler crawling python scraping webcrawler webxcrawler
Last synced: 06 Nov 2024
https://github.com/willi-dev/dtcapp
dtcapp : distributed twitter crawler.
crawler distributed-systems hazelcast java twitter twitter-api
Last synced: 14 Nov 2024
https://github.com/shamsher31/crawler
Simple site crawler that extracts all the URL links from the given website
Last synced: 13 Nov 2024
https://github.com/jarircse16/bot_detection_firewall
Detects and Blocks generic crawlers from your website.
Last synced: 30 Dec 2024
https://github.com/jnbdz/xtamia-crawler
(!!!Still being built!!!) An open-source web crawler build on Electron for Windows, Mac OS X, and Linux
crawler electron foundation foundation-css javascript scraper vuejs xtamia
Last synced: 12 Nov 2024
https://github.com/rabattkarte/free-domain-scanner
crawler dns domain domain-name domain-names go golang scanner whois
Last synced: 16 Nov 2024
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 14 Nov 2024
https://github.com/yosh1/mio-crawler
A crawler that acquires data usage of iijmio .
Last synced: 13 Nov 2024
https://github.com/jul10l1r4/objetive
This software is a mini-crawler that aims to grab some text parts from some website or ip that responds http*
bigdata crawler data-science security-tools web
Last synced: 19 Nov 2024
https://github.com/leegeunhyeok/python-gongucrawler
파이썬3 공유마당 이미지 및 상세정보 크롤러
Last synced: 22 Dec 2024
https://github.com/earelin/jwraith
A Java clone of the Wraith website comparison tool.
crawler screenshots screenshots-comparison selenium webtest
Last synced: 19 Dec 2024
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 13 Nov 2024
https://github.com/emarifer/search-engine
A mini Google. Custom web crawler & indexer written in Golang.
crawler dashboard deep-first-search fiber-framework full-text-search golang gorm-orm htmx htmx-go hyperscript indexer inverted-index response-caching search-engine templ worker-pool
Last synced: 16 Nov 2024
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 08 Jan 2025
https://github.com/eghuro/crawlcheck
Extensible web crawler
configuration crawler http plugin python robots-txt sitemap
Last synced: 13 Nov 2024
https://github.com/murilobsd/icrop-csv
Icrop-csv para automatizar o processo do download dos relatórios.
Last synced: 28 Dec 2024
https://github.com/iomarmochtar/imagecrawler
Simple image crawler by follow the links recursively, no dependency needed, for python 2.7+
Last synced: 25 Dec 2024
https://github.com/zenoyang/webcrawler
一些爬虫代码
crawler scrapy spider web-crawler
Last synced: 16 Nov 2024
https://github.com/nextlevelshit/node-crawl
Webcrawler for nodejs
crawl crawler javascript nodejs
Last synced: 19 Nov 2024
https://github.com/nextlevelshit/adonis-crawler
A free web crawler on top of the incredibile AdonisJS Framework
adonisjs crawler javascript nodejs regex spider websocket
Last synced: 19 Nov 2024
https://github.com/viko16/hatcher
🐣[WIP] Provides APIs by simple configuration.
api api-server cli crawler koa-middleware nodejs spider
Last synced: 01 Oct 2024
https://github.com/zzzzer91/match_spider
某菠菜网站爬虫,该网站已倒闭:disappointed_relieved:
Last synced: 12 Nov 2024
https://github.com/rayspock/go-web-crawler
A web crawler to fetch all the links from a given website via go routines.
concurrency crawler golang goroutine
Last synced: 14 Nov 2024
https://github.com/hsiehbocheng/usa-tourist-recommend
crawler mongodb python tableau
Last synced: 14 Nov 2024
https://github.com/pourmand1376/crawler
Simple Crawler, Indexer and Search Engine Web Application
crawler csharp csharp-code dotnet mvc
Last synced: 14 Nov 2024