Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-15 00:06:28 UTC
https://github.com/bwh1270/allrecipes-scraper
crawler food-computing scraper scraping scrapy
Last synced: 24 Jan 2025
https://github.com/kyagara/lol-match-crawler
Very simple crawler for League of Legends matches.
crawler golang league-of-legends pgx postgres riot-games sql
Last synced: 29 Jan 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 21 Jan 2025
https://github.com/ma-pony/playwright-spider-utils
Playwright Spider Utils is a utility library for engineers using the Playwright framework to build web crawlers. This project provides common web scraping functions, simplifying the process of crawler development and enhancing productivity.
crawl crawler playwright python scrapy selenium spider spiderman
Last synced: 08 Feb 2025
https://github.com/cak/foot
Foot is a library that fetches a list of URLs and silly walks through each site to gather information.
Last synced: 14 Jan 2025
https://github.com/yosh1/mio-crawler
A crawler that retrieves data usage from IIJmio.
Last synced: 12 Jan 2025
https://github.com/artemnikitin/crawler
Example of web crawler implemented in Go
Last synced: 08 Jan 2025
https://github.com/murilobsd/icrop-csv
Icrop-csv automates the process of downloading reports.
Last synced: 28 Dec 2024
https://github.com/iomarmochtar/imagecrawler
Simple image crawler that follows links recursively; no dependencies needed, for Python 2.7+.
Last synced: 25 Dec 2024
https://github.com/luciopaiva/dicio-crawler
Node.js crawler for dicio.com.br.
Last synced: 11 Feb 2025
https://github.com/ymdarake/otenki-crawler
Yet another weather data scraper.
Last synced: 16 Jan 2025
https://github.com/waived/pastebin-ripper
Scrape all pastes from a Pastebin page and its sub-pages.
crawler mass-downloader pastebin-ripper pastebin-scraper python3 ripper scraper
Last synced: 29 Jan 2025
https://github.com/jurooravec/knwldg
Datasets, scrapers, pipelines
companies crawler data dataset non-profit-organizations scraper scrapy
Last synced: 12 Jan 2025
https://github.com/tssujt/async-crawler-sample
A simple crawler sample based on asyncio.
Last synced: 22 Jan 2025
https://github.com/orshahar91/crawler
Simple Web Crawler
crawler crawling-websites image-crawler java servlets webcrawler
Last synced: 28 Dec 2024
https://github.com/lesterrry/campfire
Shock-drop watching utility
crawler parser web-crawler web-parser
Last synced: 07 Jan 2025
https://github.com/agucova/needs-seeding
🌱 A script that downloads a list of .torrent files from a website, checks their health and lists the ones that need more seeding.
Last synced: 09 Jan 2025
https://github.com/juangesino/ah-bonus-crawler
React + Express application that crawls Albert Heijn's promotions.
crawler crawling express expressjs headless-chrome nodejs react reactjs
Last synced: 23 Jan 2025
https://github.com/ndoolan360/go-crawler
A simple web crawling program written in Go in an afternoon. 🕷️🕸️
afternoon-project crawler scraper
Last synced: 18 Jan 2025
https://github.com/phatpham9/scraper.fun
Building, using and sharing HTML scrapers is way more fun!
Last synced: 30 Jan 2025
https://github.com/zawlinnnaing/my-wiki-crawler
A simple program for crawling the Burmese Wikipedia using the MediaWiki API.
crawler myanmar-tools python wikipedia-api
Last synced: 25 Dec 2024
https://github.com/949886/pixiv-crawler
Crawler that saves Pixiv illustration info to a local MySQL database.
Last synced: 28 Dec 2024
https://github.com/seanghay/wpget
⚡️wpget - A tool for downloading all posts from a WordPress website via public JSON API
Last synced: 22 Nov 2024
https://github.com/spider-rs/spider-clients
Clients to use with the hosted spider service - spider.cloud
ai ai-agents ai-scraping crawler html-to-markdown llm-webcrawler scraper spider web-scraping
Last synced: 05 Nov 2024
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 02 Jan 2025
https://github.com/brianmacintosh/wikicrawler
Sandbox project for manipulating Wikimedia wikis
c-sharp crawler mediawiki-bot wikipedia-bot
Last synced: 30 Dec 2024
https://github.com/tatamiya/gas-new-books-crawler
Crawls new book information from 版元ドットコム (https://www.hanmoto.com/).
Last synced: 21 Jan 2025
https://github.com/muhfalihr/pyxdtelebot
PyXDTeleBot is a Telegram bot created using the Python programming language, specifically designed to facilitate the seamless sharing of media such as photos and videos from Twitter user posts.
crawler crawling crawling-python crontab python3 telegram-bot telegram-bot-api twitter twitter-api x
Last synced: 12 Feb 2025
https://github.com/viper373/xovideos
A personalized video download tool built for users.
crawler downloader githubactions m3u8 mongodb mp4 pornhub python
Last synced: 23 Jan 2025
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.
Last synced: 28 Dec 2024
https://github.com/rutopio/crawler-2020-taiwanese-election-results
Crawler for the 2020 Taiwan election results, using the at-large party-list vote as an example.
Last synced: 31 Jan 2025
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 02 Feb 2025
https://github.com/thejoin95/free-proxies.info
API service for getting anonymous and non-anonymous proxies, filterable by latency, country, update time and more.
api crawler http-proxy proxy proxy-list python scraper
Last synced: 06 Jan 2025
https://github.com/allancapistrano/anime-sheets
Crawler that fetches anime information and saves it to a spreadsheet.
anime crawler google-sheets google-sheets-api
Last synced: 23 Jan 2025
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 23 Jan 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 09 Feb 2025
https://github.com/kehiy/prawler
Pactus P2P Network Crawler
crawler crawling metrics networking p2p pactus
Last synced: 28 Dec 2024
https://github.com/ssv445/js-rendering-proxy-docker
JS rendering proxy API to handle JS websites in your crawler.
Last synced: 18 Jan 2025
https://github.com/erickj3/strike-api
A web scraping API built with NestJS.
api crawler flow nestjs scraping typescript
Last synced: 24 Jan 2025
https://github.com/seart-group/github-keyword-crawler
A simple and easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints.
api-mining crawler dockerized github-api miner mongodb-database python-script
Last synced: 07 Dec 2024
https://github.com/eneax/web-crawler
A web crawler built in Node.js
crawler javascript nodejs web-crawler
Last synced: 14 Feb 2025
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 25 Jan 2025
https://github.com/ronierisonmaciel/crawler
A crawler built with BeautifulSoup that aims to extract information from websites efficiently and in a structured way. BeautifulSoup is a Python library that simplifies parsing and extracting data from HTML and XML pages. The project lets you collect and organize relevant information.
beautifulsoup4 crawler crawling python python3
Last synced: 30 Jan 2025
https://github.com/timpletin/comming-soon
Coming Soon Page - Simple, clean design that is fully responsive on all screens. Counts down the days, hours, minutes and seconds to an upcoming event.
crawler css java javaweb nextjs nextjs-boilerplate nextjs-typescript nextjs14-typescript object-detection paypal python tailwindui tensorflow typescript
Last synced: 21 Jan 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 29 Dec 2024
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 13 Jan 2025
https://github.com/bytejoseph/osintgit
OSINT investigation tool for Github
crawler email github github-to-email hacking hacking-tool hacktoberfest hacktoberfest2024 latest open-source-intelligence osint osint-python osint-tool pentesting pentesting-tools python python3 script streamlit streamlit-webapp
Last synced: 26 Jan 2025
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf boards offers from different sites.
Last synced: 30 Jan 2025
https://github.com/yuchenq/comp90055-project
This is the latest version of my project for Comp90055.
couchdb crawler data-visualization python3 textblob tweepy
Last synced: 19 Jan 2025
https://github.com/gesiscss/github_traffic_crawler
Retrieves data from repositories (insights, usage, commits).
Last synced: 03 Jan 2025
https://github.com/joyceannie/moviespider
This project crawls movie data from IMDb. The Scrapy framework is used to extract relevant information such as movie title, datePublished, summary, genres and director.
crawler datascience python scrapy spider webscraper
Last synced: 29 Jan 2025
https://github.com/noarche/darknoisy
Same as my Noisy, but on the Tor network. Logs links. Crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 30 Jan 2025
https://github.com/khanof89/twitter_scraper
Scrape tweet details from user profile using selenium
crawler scraper selenium twitter twitter-bot
Last synced: 10 Jan 2025
https://github.com/hackthedev/botnet
Tool to find IPs on the web, check SSH availability, and brute-force logins with a wordlist. For educational use only!
botnet bruteforce crawler education educational ip malicious proof-of-concept ssh testing web
Last synced: 23 Jan 2025
https://github.com/josepedrodias/naivebot
An attempt to mimic Googlebot behaviour in Node.js with Nightmare.js.
crawler googlebot nightmarejs nodejs robots
Last synced: 21 Jan 2025
https://github.com/qqxs/usda_pomological_watercolors
Crawls data from the USDA Pomological Watercolors collection.
crawler koa2 nodejs watercolors
Last synced: 18 Jan 2025
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/beckkramer/puppeteer-traverse
Puppeteer utility to easily run a function you define per route on a set of routes.
crawler crawling nodejs puppeteer
Last synced: 19 Jan 2025
https://github.com/billy0402/python-application
A learning project from the book 'Python 技術者們'.
course crawler matplotlib opencv pandas python requests selenium sklearn
Last synced: 14 Jan 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links within a specified depth.
crawler flask-api flask-restful
Last synced: 08 Feb 2025
https://github.com/billy0402/tibame-python-data-analysis
A learning project from TibaMe Python data analysis course.
ai course crawler jupyter-notebook matplotlib pandas python requests
Last synced: 14 Jan 2025
https://github.com/mirusu400/berryz-dl
Batch download berryz webshare files recursively!
berryz berryz-webshare crawler downloader scraper
Last synced: 26 Dec 2024
https://github.com/truongdd03/searchengine
A search engine written in C++.
cpp crawler search search-engine
Last synced: 13 Feb 2025
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal.
crawler go golang web-crawler webcrawler
Last synced: 09 Jan 2025
https://github.com/jonasrenault/pubchem-api-crawler
Python client for PubChem's API to crawl compounds and their properties using a molecular formula search query.
chemistry crawler molecular-formula pubchem python
Last synced: 27 Jan 2025
https://github.com/jplitza/urlsearch
Index typical webserver directory listings and then search for arbitrary terms
Last synced: 24 Jan 2025
https://github.com/filipsedivy/tachometer-check
🚘 MDČR - odometer check
Last synced: 15 Feb 2025
https://github.com/tetreum/puppeteer-for-crawling
Everyday crawling methods for Puppeteer.
Last synced: 04 Feb 2025
https://github.com/tom-draper/wiki-crawl
A game of path finding through Wikipedia topics.
api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search
Last synced: 31 Dec 2024
https://github.com/fritz-c/itunes-stats
Fetch info on podcasts, etc. from iTunes RSS data
Last synced: 02 Jan 2025
https://github.com/hoan02/novel-crawler
Tool for scraping story data for doctruyen.io.vn.
Last synced: 20 Jan 2025