Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A web crawler, also called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web. Crawlers are typically operated by search engines for the purpose of web indexing (web spidering).
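
The "systematic browsing" in this definition usually means a graph traversal: fetch a page, extract its links, and queue the ones not yet seen. The following is a minimal sketch of that loop in Python, not the method of any project listed here. The names `LinkExtractor`, `crawl`, `fetch`, and `PAGES` are hypothetical, and the pages are an in-memory stand-in under the made-up host `example.test` so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns URLs in visit order.

    `fetch` is a hypothetical callable that maps a URL to HTML text,
    or returns None when the page cannot be retrieved.
    """
    seen = {start_url}           # URLs already queued, to avoid revisits
    queue = deque([start_url])   # FIFO frontier gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Assumed in-memory pages standing in for the Web.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A again</a>',
}

print(crawl("http://example.test/", PAGES.get))
```

To crawl real sites, `fetch` could be replaced with a function that performs HTTP requests. A production crawler would typically also honor robots.txt and rate-limit its requests, which this sketch leaves out.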

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-01-10 00:06:02 UTC

https://github.com/juliocesarscheidt/stock-trader

aws-alb aws-ecs aws-xray crawler flask github-actions mongodb python rabbitmq terraform

Last synced: 24 Nov 2024

https://github.com/zaneh/ocw-crawler

Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.

crawler kimurai mit ocw opencourseware spider

Last synced: 15 Nov 2024

https://github.com/sajjadanwar0/booking.com-scraping

Scraping booking.com using Selenium and Beautiful Soup

crawler data python scraping selenium

Last synced: 14 Nov 2024

https://github.com/homuchen/instagram-crawler

Instagram crawler

crawler instagram nodejs-crawler

Last synced: 01 Dec 2024

https://github.com/joyceannie/moviespider

This project crawls movie data from IMDb. It uses the Scrapy framework to extract relevant information such as movie title, datePublished, summary, genres, and director.

crawler datascience python scrapy spider webscraper

Last synced: 01 Dec 2024

https://github.com/rcmilan/ex-web-scraping

Web scraping with F#

crawler f-sharp fsharp fsharp-data scraper web-scraping xplot

Last synced: 17 Nov 2024

https://github.com/abx123/coronachan

Simple lambda function to crawl MKN twitter account for daily Malaysia COVID-19 updates.

crawler lambda-functions python

Last synced: 07 Dec 2024

https://github.com/kevincolemaninc/mm-crawler

Scrapes meetme user profiles

crawler docker fake-data meetme ruby scraper sidekiq

Last synced: 01 Jan 2025

https://github.com/tssujt/async-crawler-sample

A simple crawler sample based on asyncio

aiohttp asyncio crawler

Last synced: 21 Nov 2024

https://github.com/abx123/crawler

Simple lambda function to crawl daily web novel updates.

crawler firebase-database golang lambda-functions

Last synced: 07 Dec 2024

https://github.com/johanbook/node-web-crawler

Node.js CLI for web crawling

cli crawler nodejs typescript

Last synced: 16 Nov 2024

https://github.com/madret/selenium_crawler

Selenium web crawler based on ChromeDriver.

chromedriver crawler human-like selenium selenium-webdriver webcrawler

Last synced: 15 Nov 2024

https://github.com/copha-project/copha

Open-Source Software For Managing Tasks

crawler framework nodejs puppeteer selenium

Last synced: 15 Nov 2024

https://github.com/anshiii/pixder

🤔 A spider for pixiv.net

crawler pixiv spider

Last synced: 22 Nov 2024

https://github.com/sedrubal/webcrawler

Crawl sites and search for security issues.

crawler script security website-auditing

Last synced: 24 Nov 2024

https://github.com/smikodanic/dex8-sdk

DEX8 SDK is a software development kit for the DEX8.com platform.

crawler crawler-engine data-extraction dex8 scraper scraping-websites spider

Last synced: 26 Dec 2024

https://github.com/bingxyz/btcethcrawler

Telegram broadcast channel for Bitcoin and Ethereum

bash bash-script crawler telegram-bot

Last synced: 21 Nov 2024

https://github.com/lucasfogliarini/minhaentradacrawler.consoleapp

A web crawler in C# that uses the AngleSharp library to extract event details from the site "https://minhaentrada.com.br". It parses the page's HTML and retrieves information such as the title, date, location, and links of the events.

anglesharp crawler minhaentrada

Last synced: 31 Dec 2024

https://github.com/jayzhan211/python-crawler-startups

Learning Python crawlers

crawler python

Last synced: 25 Nov 2024

https://github.com/zhaotianff/crawler-line

C# command-line crawler

command-line command-line-tool crawler csharp dotnet-core

Last synced: 15 Nov 2024

https://github.com/eivindarvesen/naive-spider

A minimal web crawler

crawler python spider

Last synced: 16 Nov 2024

https://github.com/kyagara/lol-match-crawler

Very simple crawler for League of Legends matches.

crawler league-of-legends pgx postgres riot-games sql

Last synced: 01 Dec 2024

https://github.com/vishaalpkumar/skysift

A distributed search engine from scratch

aws crawler css distributed-systems html java search-engine

Last synced: 22 Dec 2024

https://github.com/lysagxra/eromedownloader

Erome albums and profile downloader

bulk bulk-downloader concurrent-processing crawler downloader erome erome-downloader parallel-processing profile-downloader python python3

Last synced: 16 Nov 2024

https://github.com/snwfdhmp/3gm-bot

Bot for the online French indie game 3gm.fr, implemented in Ruby. Mostly website crawling and task automation.

3gm-bot crawler game-bot task-automation web-crawling

Last synced: 15 Nov 2024

https://github.com/joeri-abbo/python-credly-scraper

This project is a set of Python scripts designed to crawl and extract data from the Credly platform, focusing on skills, organizations, and badges. The scripts allow users to perform searches using command-line arguments, predefined search terms, or skills listed in a JSON file. The collected data is then saved to JSON files for further analysis an

badges crawler credly data-extraction json organizations python python3 requests-library skills web-crawling

Last synced: 15 Nov 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 High-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from CoinGecko API. Collects market data, social metrics, and blockchain details with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper