Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
![](https://explore-feed.github.com/topics/crawler/crawler.png)
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2025-02-11 00:06:38 UTC
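To make "systematically browses" concrete, here is a minimal sketch of a breadth-first crawl loop. It is an illustration only and is not taken from any project listed on this page. The `crawl` function, the `LinkExtractor` class, the `fetch` callback and the in-memory `PAGES` site are all hypothetical names introduced for the example. A real crawler would also need network fetching, politeness delays and robots.txt handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    fetch is a caller-supplied function (an assumption of this sketch)
    that maps a URL to its HTML text, or None if it cannot be retrieved.
    """
    seen = {start_url}          # URLs already queued, to avoid revisits
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.test/b": '<a href="/">home</a>',
    "https://example.test/c": "no links here",
}

order = crawl("https://example.test/", PAGES.get)
print(order)
```

Passing `fetch` as a parameter keeps the traversal logic separate from how pages are retrieved, so the same loop can be driven by an HTTP client or by test data.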
https://github.com/allancapistrano/steam.py
An API wrapper for Steam written in Python.
Last synced: 23 Jan 2025
https://github.com/bradsec/gomine
A Go CLI tool to quickly crawl and mine (download) specific file types from websites.
cli crawler golang terminal-based
Last synced: 22 Dec 2024
https://github.com/n3d1117/sisop17
Exercise for the Operating Systems exam - 2017
crawler html java parser semaphores synchronization thread-safety threading
Last synced: 19 Dec 2024
https://github.com/docongminh/vinbdi-crawler
Crawl data using Scrapy + bs4
bs4-requests crawler scrapy splash
Last synced: 28 Dec 2024
https://github.com/seart-group/github-keyword-crawler
A simple and easy-to-deploy script for mining mentions of keywords across various GitHub API endpoints
api-mining crawler dockerized github-api miner mongodb-database python-script
Last synced: 07 Dec 2024
https://github.com/luickk/vulnerability-crawler
Small Python program meant to analyze random sites found on Google for any vulnerabilities!
Last synced: 28 Dec 2024
https://github.com/matheusfelipeog/google-doodles
Map and download Google Doodles.
crawler google google-doodle python web-scraping
Last synced: 25 Jan 2025
https://github.com/bramtenhove/issue-crawler
Crawls Drupal issues and keeps stats
Last synced: 29 Dec 2024
https://github.com/tomfran/crawler
A web crawler written in Rust
bloom-filter crawler rust simhash
Last synced: 06 Jan 2025
https://github.com/tinoco/ticapsoriginal_div2png
Ticapsoriginal: programmatic div-design-to-PNG generator for HTML code from a URL
beutifulsoup crawler data design div2png generated-art generator html2image parse programmatically-layout pycodestyle python requests ticapsoriginal url urllib
Last synced: 09 Jan 2025
https://github.com/blarc/windsurf-crawler
A simple crawler that collects windsurf boards offers from different sites.
Last synced: 30 Jan 2025
https://github.com/gesiscss/github_traffic_crawler
Retrieve the data information from the repositories (insight, usage, commits)
Last synced: 03 Jan 2025
https://github.com/devindon/movie-crawler
Movie crawler for douban.com, pianku.tv, etc.
Last synced: 02 Feb 2025
https://github.com/noarche/darknoisy
Same as my Noisy but on TOR network. Logs links. Crawls onion sites.
crawler crawling onion-domains onion-services onion-sites onions-list python python-script python3 tor torsocks
Last synced: 30 Jan 2025
https://github.com/izh318/genie-music-artist-album-crawler
A Python script that crawls, in one pass, the album information of a specific artist registered on Genie Music.
Last synced: 28 Dec 2024
https://github.com/tiennhm/crawl-sanfoundry-mcqs
Sanfoundry MCQs Crawler
beautifulsoup4 bs4 crawler csv flask python
Last synced: 27 Jan 2025
https://github.com/datvodinh/laptop-price-prediction
An End to End Data Science Project about Laptop Price Prediction
crawler ensemble-learning scrapy selenium xgboost
Last synced: 17 Nov 2024
https://github.com/zigai/crawlwright
Web crawling framework powered by Playwright
crawler crawling playwright python scraping wrighter
Last synced: 02 Feb 2025
https://github.com/kasperomari/simplecrawlerapi
A simple RESTful API that takes a URL and returns all the links in a specific depth.
crawler flask-api flask-restful
Last synced: 08 Feb 2025
https://github.com/patrickschababerle/schabbi-webscraper
Small and easy to use NodeJS webcrawler project. Returns basic information about the crawled sites.
crawler puppeteer scraper scraping web-crawler
Last synced: 09 Feb 2025
https://github.com/apurvsikka/mediaverse
MediaVerse is a versatile search engine for various media types such as anime, books and drama
anime anime-api anime-api-free api-rest bun crawler extensions extensions-pack free-manga kdrama lightnovel manga manga-api manga-api-free manga-crawler manga-reader movies netflix ts tv
Last synced: 03 Feb 2025
https://github.com/dmarcosl/upshelf-technical-test
Technical test for Upshelf
crawler interview python scraping scrapy spider technical-test web-scraping
Last synced: 22 Dec 2024
https://github.com/rsheremeta/web-crawler
A tiny web crawler that looks for links, then extracts and prints them concurrently to the terminal output
crawler go golang web-crawler webcrawler
Last synced: 09 Jan 2025
https://github.com/basemax/crawler-news-currency-gold-coins
PHP crawler to get Persian news related to currency, coins, and gold.
crawler crawler-php crawler-testing currency currency-exchange-rates gold php php-crawler
Last synced: 09 Feb 2025
https://github.com/capturr/json-deep-equal
Check whether JSON objects contain the same values (ignoring array order).
array compare comparison crawler crawling deep equal equality equality-check equals javascript json object recursive scraper scraping spider test tree typescript
Last synced: 07 Jan 2025
https://github.com/andrefs/derzis
A path-aware distributed linked data crawler
Last synced: 08 Jan 2025
https://github.com/tisfeng/bing-dict
A Bing command-line dictionary that obtains query results from Bing Dictionary by crawling.
bing-dictionary command-line crawler nodejs
Last synced: 03 Jan 2025
https://github.com/tech-espm/misc-webbot
This project aims to create personal assistants that reply to messages about specific issues.
classification-model crawler nlp
Last synced: 11 Jan 2025
https://github.com/kofj/octopus
Octopus is open source software for collecting data from web pages.
Last synced: 27 Jan 2025
https://github.com/tryagi/firecrawl
Generated C# SDK based on official Firecrawl OpenAPI specification
ai crawler crawling dotnet firecrawl generated generator langchain langchain-dotnet net8 netframework netstandard openapi scrape scraping sdk
Last synced: 14 Oct 2024
https://github.com/bradsec/gofindfiles
Crawl websites attempting to find and download files with matching file types. For use as OSINT or RECON intelligence collection tool.
crawler osint osint-tool recon scraper web-scraper
Last synced: 07 Jan 2025
https://github.com/tom-draper/wiki-crawl
A game of path finding through Wikipedia topics.
api crawler crawlers crawling crawling-python game pathfinding python requests wiki wikipedia wikipedia-api wikipedia-search
Last synced: 31 Dec 2024
https://github.com/jofaval/open-graph-visualizer
Web scraping showcase of how crawlers retrieve a site's details through the Open Graph Protocol
crawler javascript opengraph scraping web web-scraping
Last synced: 04 Feb 2025
https://github.com/tigercosmos/web-crawler
Web Crawler in Java Maven Project
Last synced: 01 Feb 2025
https://github.com/namchee/hackerbits
Web crawler and clustering for the HackerNews website.
Last synced: 30 Jan 2025
https://github.com/thamindur/ir-project
Search Engine for Sri Lankan MPs
crawler elasticsearch python scraping search-engine
Last synced: 09 Feb 2025
https://github.com/ri0n/unboxer
MP4 crawler and extractor
crawler extractor mp4 object-oriented-design qt
Last synced: 13 Jan 2025
https://github.com/snwfdhmp/3gm-bot
Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.
3gm-bot crawler game-bot task-automation web-crawling
Last synced: 15 Jan 2025
https://github.com/ekojs/web-crawler
Web crawler for retrieving research titles from Google Scholar
Last synced: 08 Jan 2025
https://github.com/danielemoraschi/sitemap-common
Simple PHP Sitemap generator and crawler library.
crawler php php-library php-sitemap-generator sitemap
Last synced: 31 Dec 2024
https://github.com/danielemoraschi/go-sitemap-app
crawler golang sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/danielemoraschi/sitemap-app
Sitemap generator command line application using dmoraschi/sitemap-common library
crawler php php-library sitemap sitemap-generator
Last synced: 31 Dec 2024
https://github.com/jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
crawler crawling diceware go wikipedia wikipedia-crawler word-extraction
Last synced: 21 Jan 2025
https://github.com/arman-aminian/divar-text-exploring
The first practice of Dr. Asgari's NLP lesson - Data Exploration
crawler natural-language-processing nlp preprocessing scrapy
Last synced: 08 Jan 2025
https://github.com/sgeisler/fishbones2epub
fetches the fishbones novel and outputs an epub
Last synced: 27 Jan 2025
https://github.com/tsaohucn/crawler_fb_page
A crawler that uses Selenium for Facebook pages
crawler facebook-page rails ruby selenium
Last synced: 20 Jan 2025
https://github.com/berecat/selenium_facebook_scraper
A simple Python 3 script used to download a user's friend list from Facebook.
automation crawler facebook facebook-scraper webscraper
Last synced: 08 Jan 2025
https://github.com/vaenow/chromeless-coursera-caption
Chromeless crawler for Coursera video captions / subtitles
caption chromeless coursera crawler crx subtitle
Last synced: 06 Feb 2025
https://github.com/vaenow/crawler-chromeless
A Chromeless crawler for Coursera
chromeless coursera crawler puppeteer
Last synced: 06 Feb 2025
https://github.com/sbstjn/tatort
Query information for upcoming Tatort shows
Last synced: 05 Jan 2025
https://github.com/hvtuananh/twitter_crawler
Daemon to call and get tweets from Twitter Public Stream API
crawler java streaming-api tweets twitter twitter-crawler
Last synced: 23 Oct 2024
https://github.com/d7isme/pixiv-downloader-mod
Modded extension of the pixiv downloader on chrome webstore with premium feature unlocked.
chrome-extension crawler extension-chrome image pem pixiv pixiv-bot pixiv-crawler pixiv-downloader
Last synced: 09 Jan 2025
https://github.com/ashwantmanikoth/aipoweredwebcrawler
An AI-powered crawler that can search the web for information based on your input.
crawler deepseek groq-api hybrid-search llama llm pydantic python rag reranking retrieval-augmented-generation
Last synced: 10 Feb 2025
https://github.com/palpitate-xus/sge_data_insert
Uses GitHub Actions to automatically fetch SGE data and store it in a database
Last synced: 08 Feb 2025
https://github.com/ryu1kn/procedural-page-crawler
Page Crawler. Tell it where to go and what to look for.
Last synced: 03 Feb 2025
https://github.com/zfael/scrape-it-all
Modular web scraper for Node.JS
crawler scraper scraping scraping-websites web-scraping
Last synced: 23 Dec 2024
https://github.com/mohitk05/drstrange
A simple breadth-first search web crawler
Last synced: 01 Feb 2025
https://github.com/filipsedivy/tachometer-check
🚘 MDČR - odometer check
Last synced: 23 Dec 2024
https://github.com/engageintellect/scrapers
A repository of web scrapers using Python & Scrapy
Last synced: 06 Feb 2025
https://github.com/basemax/css-properties
The CSS Properties Repository is a comprehensive collection of CSS properties, categorized and detailed for web developers. It offers a structured overview of various CSS properties, including their names, categories, brief descriptions, and links to detailed references.
crawler css css-properties css-property css3
Last synced: 14 Jan 2025
https://github.com/splorg/sage
A scraper to get every quote from a book off of Goodreads.
books crawler datamining goodreads goodreads-data python scraper scrapy webcrawling webscraping
Last synced: 21 Jan 2025
https://github.com/allotmentandy/socialmedialinkextractor
PHP Laravel package that extracts social media links from an array of links; used as part of a spider for checking londinium.com website links
crawler extractor facebook laravel linked-list php social social-network spider twitter url youtube
Last synced: 23 Dec 2024
https://github.com/basemax/crawleryjc
This PHP crawler is designed to scrape news articles and categories from the YJC.ir news agency website. It provides a way to extract valuable data from the website for further analysis or any other purpose.
crawler crawler-php database database-news ir ir-yjc iran news news-database news-yjc php php-crawler yjc yjc-ir yjc-news
Last synced: 09 Feb 2025
https://github.com/tca166/ck2-history-extractor
A tool for creating an encyclopedia from your CK2 savefile
Last synced: 07 Feb 2025
https://github.com/dubniczky/webmap
Website mapping crawler implemented in python
crawler mapping mapping-tools package python scraping security
Last synced: 06 Feb 2025
https://github.com/dubniczky/bad-robot
This is a python crawler that disregards robots.txt rules and downloads disallowed resources
crawler osint-python osint-tool python robots-txt
Last synced: 06 Feb 2025
https://github.com/davelongdev/link-report-crawler
A web crawler using Node.js that crawls a site and returns a report showing all internal links.
crawler crawling javascript seo seo-tools
Last synced: 02 Jan 2025
https://github.com/huakunshen/cron-crawler-template
Web Crawler Cron Job Template running with GitHub Action. Capable of sending email notifications.
Last synced: 17 Jan 2025
https://github.com/yukihirai0505/streamcrawler
akka stream × crawler
akka-streams crawler elasticsearch instagram sbt scala
Last synced: 13 Jan 2025
https://github.com/fscotto/noahcrawler
A simple web crawler written in Java to support a database of Italian regions.
Last synced: 21 Jan 2025
https://github.com/athulmurali/flickr-api-docs-crawler
A Python-based crawler that extracts API documentation and writes it to a file as JSON. A documentation page can then be built from the JSON file using Docusaurus.
api beautifulsoup4 crawler documentation python3
Last synced: 09 Jan 2025
https://github.com/ecklf/reddit-clawler
A command-line tool written in Rust that crawls Reddit posts from a user or subreddit
cli crawler downloader downloader-for-reddit reddit
Last synced: 06 Feb 2025