Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
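To make "systematically browses" concrete, here is a minimal sketch of a breadth-first crawl. It is an illustration under stated assumptions, not how any project listed below works. `get_links` is a hypothetical hook that stands in for "fetch the page and extract its links", and the toy `WEB` graph and example.org URLs are made up, so the sketch runs without network access. A real crawler would also need politeness rules (robots.txt, rate limiting), URL normalization and error handling.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting at `seed`; returns URLs in visit order.

    `get_links` is a hypothetical hook standing in for fetching a page and
    extracting the URLs it links to.
    """
    frontier = deque([seed])  # discovered but not yet visited (FIFO => breadth-first)
    seen = {seed}             # every URL ever enqueued, so no page is queued twice
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical toy link graph standing in for real pages.
WEB = {
    "https://example.org/": ["https://example.org/a", "https://example.org/b"],
    "https://example.org/a": ["https://example.org/c", "https://example.org/"],
    "https://example.org/b": ["https://example.org/c"],
    "https://example.org/c": [],
}

if __name__ == "__main__":
    print(crawl("https://example.org/", lambda u: WEB.get(u, [])))
```

Swapping the `deque` for a stack (`pop()` instead of `popleft()`) would turn this into a depth-first crawl; the `seen` set is what stops cycles such as `/a` linking back to `/` from looping forever.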

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-01-24 00:06:40 UTC

https://github.com/nava45/simplempcrawler

Simple multiprocessing crawler in Python.

crawler multiprocessing python

Last synced: 05 Jan 2025

https://github.com/maxbubblegum47/spotydump

Spotify scraper combined with a Genius scraper. Scrapes artists from a given time period or region of the world and dumps all their songs!

crawler dump genius lyrics python spotify unimore-informatica

Last synced: 29 Nov 2024

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 19 Dec 2024

https://github.com/genfuture/cryptocurrency-scraper

Cryptocurrency Data Crawler 🚀 Updates coin data every 12 hours. A high-performance Node.js crawler that fetches comprehensive data for 1500+ cryptocurrencies from the CoinGecko API. Collects market data and blockchain details, with built-in rate limiting and resume capability. Perfect for crypto analysis, research, and building market intelligence tools.

binance bitcoin bsc coingecko coingecko-api crawler crypto-bot cryptocurrencies cryptocurrency ethereum scraper

Last synced: 17 Jan 2025

https://github.com/restuwahyu13/node-scraper-content

Example Node.js scraper for all programming content, using Puppeteer.

crawler nodejs puppeter scrapper

Last synced: 03 Jan 2025

https://github.com/sangupta/shopify-burst-crawler

Simple crawler that downloads metadata for all stock photos on the Shopify Burst website.

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/nueip/curl

NUEiP Curl Lib

crawler php

Last synced: 24 Nov 2024

https://github.com/krishpranav/spider

A Ruby web-spidering tool that can spider a single site, multiple domains, specific links, or crawl indefinitely.

crawler ruby spider web-crawler web-scraping

Last synced: 06 Dec 2024

https://github.com/lockblock-dev/crawlarr

Crawlarr is a fast web crawler built in Go. It searches for anchor tags in the HTML pages and follows links. It leverages concurrency to improve speed.

crawler golang

Last synced: 24 Jan 2025

https://github.com/kluhan/kraken

Kraken is a generic, mid-scale web crawler specifically built to crawl vertical data sources, like YouTube or the Google Play Store.

celery crawler google-play-store python web-crawling

Last synced: 15 Dec 2024

https://github.com/ewertoncodes/mind-crawler

A simple API written in Rails to extract quotations from the Quotes to Scrape site.

crawler ruby ruby-on-rails

Last synced: 23 Jan 2025

https://github.com/jiannei/github-trending

A GitHub Trending crawler built on Lumen.

crawler github-trending lumen php

Last synced: 09 Nov 2024

https://github.com/eklem/browsercrawler

Crawls content from a site within the browser. A basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 19 Dec 2024

https://github.com/ging-dev/sitemap-crawler

Collects links through sitemap.xml or robots.txt.

crawler php php8 sitemap sitemap-crawler

Last synced: 18 Nov 2024

https://github.com/coghost/izen

Encapsulation of some useful features.

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/fernandod1/yahoo-finance-scraper

This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 12 Jan 2025

https://github.com/kokseen1/chii

A minimal marketplace bot maker.

auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction

Last synced: 13 Jan 2025

https://github.com/sean2077/leetcode_anki

Leetcode Anki card factory.

anki crawler leetcode leetcode-anki scrapy

Last synced: 11 Jan 2025

https://github.com/ysh329/stock-newspaper-crawler

[UNMAINTAINED] Crawls four kinds of financial newspaper corpora (from CCSTOCK.CN).

corpus crawled-data crawler database stock-newspaper-crawler

Last synced: 16 Dec 2024

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain, including all its internal links and pages.

crawler database filter natshah-crawler

Last synced: 14 Dec 2024

https://github.com/sebi75/lightweight-sitemapper

A lightweight sitemapper written in TypeScript, built on top of fast-xml-parser and relying on few dependencies.

crawler node-js sitemap

Last synced: 21 Dec 2024

https://github.com/ph-7/gettermails

GetterMails, Scraper

bot crawler email php python retrieve-web-page scrape scraper scraping scraping-websites scrapper webdriver

Last synced: 19 Jan 2025

https://github.com/z3ntl3/redeye

Crawls real, up-to-date user agents from the two largest databases.

crawler header ua user-agents useragents

Last synced: 16 Dec 2024

https://github.com/dhchenx/quick-crawler

A toolkit for quickly performing crawler functions

crawler crawler-python

Last synced: 01 Dec 2024

https://github.com/appliedsoul/crawlmatic

Static and dynamic website crawling library: a common promise-based wrapper around the node-crawler and HCCrawler libraries.

crawler scraper

Last synced: 30 Dec 2024

https://github.com/victorhuu/amazonmovieintegration

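This repository is the first individual assignment for the Data Warehouse course at Tongji University: using a crawler and ETL tools to organize Amazon movie data.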

crawler data-warehouse movies pandas scrapy xpath

Last synced: 28 Nov 2024

https://github.com/aleclarson/recrawl

Filesystem crawler

crawler fs nodejs

Last synced: 09 Jan 2025

https://github.com/songjiayang/china_repos

GitHub repo crawler.

china crawler statistics

Last synced: 11 Jan 2025

https://github.com/konradlinkowski/wikipediafinder

Finds words in a Wikipedia page.

crawler scraper wikipedia

Last synced: 28 Nov 2024

https://github.com/konradlinkowski/mailcrawler

Crawler that finds email addresses on websites.

crawler scraper

Last synced: 28 Nov 2024

https://github.com/weaming/simple-crawler

My simple crawler.

crawler

Last synced: 12 Jan 2025

https://github.com/dizys/weibo-crawler

A Node.js Weibo crawler.

crawler nodejs typescript weibo-spider

Last synced: 27 Dec 2024

https://github.com/shgopher/retuo

A distributed crawler

crawler go

Last synced: 31 Dec 2024

https://github.com/danielemoraschi/go-sitemap-common

Simple Go sitemap generator and crawler.

crawler golang sitemap sitemap-generator

Last synced: 31 Dec 2024

https://github.com/ryanchao2012/okbot

A conversation retrieval engine based on the PTT corpus.

chatbot crawler django ptt

Last synced: 12 Jan 2025

https://github.com/rxcai/python3-weibo-crawler

A small Weibo crawler implemented in Python 3.

crawler python python3 spider weibo

Last synced: 28 Nov 2024

https://github.com/khilnani/spidey.py

Web spiders are usually disliked by websites, but they are useful for recursively downloading API responses or pages for offline analysis.

cli crawler python scaper web-spider

Last synced: 02 Dec 2024

https://github.com/ccrashzer0/web_crawler

A Python-based web crawler.

crawler internet python python3 webcrawler

Last synced: 28 Nov 2024

https://github.com/1970mr/link-crawler

Web Link Crawler: A Python script to crawl websites and collect links based on a regex pattern. Efficient and customizable.

clawler crawler crawler-python link-crawler link-crawler-python link-scraper link-scraper-python links python scraper scraper-python website-crawler website-scraper

Last synced: 11 Nov 2024

https://github.com/davideferre/covid19-data-crawler-ita

COVID-19 Italian data crawler.

coronavirus covid19 crawler hacktoberfest hacktoberfest2021 python

Last synced: 11 Jan 2025

https://github.com/excaliburhan/littlenews

A news app built with Electron.

crawler electron rss-feed

Last synced: 29 Nov 2024

https://github.com/cryptoc1/earl

Earl is looking for URLs in your area.

crawler middleware nuget webscraping

Last synced: 28 Nov 2024

https://github.com/tungct/facebook-crawler

crawler facebook python

Last synced: 14 Jan 2025

https://github.com/jfcherng/wiki-cgroup-crawler

This script fetches Wikipedia's common conversion group (CGroup) dictionaries and saves the results to external files.

crawler php-71 wiki-cgroup-crawler wikipedia

Last synced: 22 Jan 2025

https://github.com/hoanle396/py-iconnect

crawler flask flask-application image-processing python

Last synced: 14 Dec 2024

https://github.com/tungct/golangcrawler

A crawler built with goroutines in Go.

crawler go

Last synced: 14 Jan 2025

https://github.com/sefinek/niedlascamu.pl-tracker

Tracks changes on the niedlascamu.pl website.

crawl crawler niedlascamu tracker tracking

Last synced: 07 Dec 2024

https://github.com/hanifdwyputras/se-scraper

Search Engine scraper with PHP

crawler scraper seo seo-crawler

Last synced: 06 Dec 2024

https://github.com/rbkgh/dailytext-crawler

Crawls jw.org to retrieve the daily text.

crawler dailytext java jsoup jw

Last synced: 15 Jan 2025

https://github.com/zabuzard/wslotter

WSlotter is a Selenium-driven tool for signing up for events on 'https://www.gruppe-w.de'.

bot crawler gruppe-w

Last synced: 12 Jan 2025

https://github.com/andmerk93/scrapy_parser_pep

A training project using Scrapy that parses PEPs and outputs them in two formats.

crawler scrapy

Last synced: 24 Jan 2025

https://github.com/saketh7382/smartcrawler

Package for crawling items from web pages and storing them as a JSON file.

crawler crawler-python open-source pip python3 scraper selenium selenium-webdriver webdriver-manager

Last synced: 08 Dec 2024

https://github.com/dangdungcntt/crawl-fb-v2

Simple script to detect email addresses and phone numbers in Facebook comments.

crawler facebook

Last synced: 18 Jan 2025

https://github.com/zhoudaxia233/unilogo

A visually striking assembly of the top 1000 universities' logos from ARWU, sorted by color into a vibrant spectrum.

crawler python visualization

Last synced: 15 Dec 2024

https://github.com/naveenaidu/google-crawler

Google Crawler - Curates the search results

beautifulsoup crawler scraper

Last synced: 18 Jan 2025

https://github.com/nelcifranmagalhaes/web_crawler

A web crawler for all Naruto characters

anime beautifulsoup characters crawler naruto python

Last synced: 03 Dec 2024

https://github.com/lykmapipo/producthunt-python-scrapy-scraper

Python Scrapy spiders that scrape data from producthunt.com.

crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper

Last synced: 21 Dec 2024

https://github.com/karantyagi/web-crawler

BFS and DFS implementations for a Wikipedia crawler.

beautifulsoup crawler

Last synced: 12 Jan 2025

https://github.com/par7133/splash-bot-crawler

Splash Bot creates splashes of your websites on the fly. GPL license 🔥

bot crawler gallery open-source opensource php splash

Last synced: 12 Jan 2025

https://github.com/comigor/balances

Your checking and savings accounts balances on banks and brokers.

balance bank broker crawler node

Last synced: 09 Dec 2024

https://github.com/hoishing/selenium-crawler

A web crawler written in Python, powered by Selenium and Tesseract OCR.

crawler python selenium

Last synced: 18 Jan 2025

https://github.com/flavien-hugs/scrapy-test

Hands-on practice with the Scrapy library. A mini script that extracts all cartoon characters listed on Wikipedia.

crawler python scraping scrapy

Last synced: 09 Dec 2024

https://github.com/mmqnym/pyppeteer-use-case

Shows how to crawl the web with pyppeteer.

crawl crawler pyppeteer python

Last synced: 18 Jan 2025

https://github.com/hedon954/go-crawler

A crawler system implemented in Go.

crawler go

Last synced: 21 Jan 2025

https://github.com/40uf411/sillybot

SillyBot is a wrapper for the Selenium library.

bot crawler python scraper selenium web wrapper

Last synced: 19 Dec 2024

https://github.com/piopi/behatcrawler

A Behat extension that crawls links on a website and executes a user-defined function on each of them.

behat behat-extension crawler php selenium-webdriver

Last synced: 19 Dec 2024

https://github.com/gnaneshkunal/book-miner

Web crawler for book reviews (Goodreads).

crawler goodreads typescript

Last synced: 16 Dec 2024

https://github.com/cothema/nlp-workers

crawler nlp

Last synced: 18 Jan 2025

https://github.com/leomaurodesenv/smm-maker-profile

A package for fetching a maker's profile from Super Mario Maker.

crawler javascript json mario-maker nodejs

Last synced: 02 Nov 2024

https://github.com/baerwang/sec_craw

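A crawler that makes it easy for security researchers to get a daily security digest. It currently crawls lab blogs and forums including 90sec, Kanxue Forum (看雪论坛), v2ex, Jingyi Forum (精易论坛), and 52pojie Forum (52破解论坛), with more being added.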

crawler security security-tools threat threat-intelligence

Last synced: 21 Jan 2025

https://github.com/cseas/shares-monitor

Web crawler to fetch and monitor share details.

crawler python python3 scraper scraping-websites shares

Last synced: 27 Dec 2024

https://github.com/pnguyen215/instagram-crawler

Instagram Crawler is a Python script to download posts from a specified Instagram account.

crawler crawling-python instagram instagram-crawler scraper scraping-python scraping-websites scrapper scrapy-crawler

Last synced: 12 Jan 2025

https://github.com/pierlauro/mdbubing

From WARC records to MongoDB documents

bubing crawler crawling warc warc-files warc-format warc-record webarchive webarchiving

Last synced: 09 Dec 2024

https://github.com/khoinguyen2k/web-crawler

About crawling data.

crawler jsoup-library scraper selenium-java

Last synced: 17 Jan 2025

https://github.com/princed/specht

Checks links found in HTML or JS files by pattern.

cli crawler html javascript streams

Last synced: 19 Jan 2025

https://github.com/xcrypt0r/xcrawler

✂️ A multithreaded crawling example for MapleStory, written in various languages.

crawler crawling multithreading parsing regexp

Last synced: 09 Jan 2025

https://github.com/orafaelfragoso/itunes-crawler

Retrieves information about an artist by crawling the iTunes API and iTunes pages.

api crawler itunes itunes-api

Last synced: 19 Dec 2024

https://github.com/sirius-mhlee/naver-cafe-crawler

NAVER Cafe Crawler using pandas, tqdm, Selenium, BeautifulSoup4

beautifulsoup4 crawler pandas selenium tqdm

Last synced: 14 Jan 2025

https://github.com/snuzi/devblogs-aggregator

The backend aggregator project of DevBlogs.net

aggregator blog crawler engineering engineering-blogs tech tech-blogs tech-companies tech-news

Last synced: 09 Nov 2024

https://github.com/panagiks/asset

ASynchronous Spidering Essential Tool (ASSET).

async asyncio crawler graph reporting spider

Last synced: 06 Dec 2024

https://github.com/moontai0724/auto-notify-pu-courses-quota

A small crawler that fetches the remaining quota for a list of courses at Providence University every 2 to 10 minutes, then sends a webhook when it changes.

crawler javascript nodejs

Last synced: 06 Dec 2024

https://github.com/enishant/cooking-perl

crawler data-extraction description-extraction keyword-extraction keywords perl perl-lwp perl5 search-bot title-extraction website-seo

Last synced: 19 Jan 2025

https://github.com/buren/stupid_crawler

Stupid crawler that looks for URLs on a given site

cli crawler ruby rubygem

Last synced: 12 Oct 2024

https://github.com/marcinrek/sauron

Basic page crawler written in Node.js

crawler json node-js nodejs requests

Last synced: 29 Nov 2024

https://github.com/jonasrenault/cprex

Chemical Properties Relation Extraction

chemistry crawler deep-learning information-extraction machine-learning named-entity-recognition nlp pubchem relation-extraction scientific-articles spacy transformers

Last synced: 14 Oct 2024

https://github.com/loggerhead/dianping_crawler

A Dianping (大众点评) crawler based on Scrapy (Python 3.5).

crawler python-3-5

Last synced: 24 Jan 2025

https://github.com/coghost/crawlers

Crawlers in one.

crawler python3 staticimg weibo

Last synced: 02 Jan 2025

https://github.com/amirsorouri00/dsl-se

This is an MVP built for the "Search Engine And Data Mining" course. The idea behind this project is the forked project, whose link is provided

container crawler distributed-systems docker docker-compose elasticsearch pagerank search-engine

Last synced: 19 Jan 2025

https://github.com/hwywl/mzitu-crawler

Crawls photos of girls from the mzitu website. Mind your nutrition.

crawler mzitu python

Last synced: 08 Jan 2025

https://github.com/aminehsan/crawler-divar.ir

Analyzing and Extracting Insights from Ads on 'divar.ir'

crawler data-mining data-science divar-ir scarping

Last synced: 04 Dec 2024

https://github.com/dean9703111/humandesign_nodejs

Uses a Node.js crawler to scrape information from Human Design (人類圖) web pages and save it to a Google Sheets spreadsheet in the cloud.

crawler googlesheetapi googlesheets nodejs

Last synced: 12 Jan 2025

https://github.com/dean9703111/shopee_find_mac

Finds, as quickly as possible, a cheap Mac that matches your required specs.

argparse crawler mac pip python python2 xlsxwriter

Last synced: 12 Jan 2025

https://github.com/birdroad1/server-pinger

Server pinger for Minecraft written in C++

cpp crawler make minecraft minecraft-scanner postgres scanner server

Last synced: 21 Jan 2025

https://github.com/amirzenoozi/aparat-videos-dataset

Some simple information about Aparat videos for data scientists.

aparat cli crawler data-science data-science-projects pandas python python3 sdk-python sqlite3 video

Last synced: 21 Jan 2025

https://github.com/bitscoper/bitscoper_crawler

Crawls the titles of webpages in series by number and creates a list of the available links.

crawler lister

Last synced: 05 Dec 2024

https://github.com/kangoo13/textbroker-author-article-picker

Bot that automatically locks an order into a Textbroker author account.

author-textbroker automation bot colly crawler go gocolly golang scrapper spider textbroker textbroker-author textbroker-order-picker textbroker-orders textbroker-scrapper

Last synced: 22 Jan 2025

https://github.com/alexzhangs/stockdb

Stock data collecting and analyzing

crawler django pandas scrapy stock tushare

Last synced: 08 Jan 2025

https://github.com/microlinkhq/ua

Simple Redis primitives to incr() and top() user agents.

crawler redis user-agent user-agent-parser

Last synced: 12 Jan 2025

https://github.com/suddi/fundscraper

Collection of web crawlers to scrape fund data using Scrapy

crawler funds scraper scrapy

Last synced: 11 Oct 2024

https://github.com/liebki/githubnet

This library retrieves data from GitHub, such as trending repositories, user profiles, users' repositories, and related information.

crawler crawling github github-trending htmlagilitypack microsoft

Last synced: 24 Jan 2025

Crawler Awesome Lists

awesome-crawler (101), awesome-python-primer (68), awesome-fingerprinting (48), awesome-digital-preservation (45)

Crawler Categories

2.6 Machine Learning (50), Research (31), Replay tools (18), Python (18), 1.1 Language Basics (16), Libraries & Projects (13), Fingerprinting Evasion (13), Sites (12), 2.4 Web Frontend (10), 2.1 Crawler Basics (9), 3. Databases (8), Java (7), Web archiving (7), 2.5 Data Analysis (7), Other digital objects (6), 4. Async IO (6), 2.2 Flask Framework (4), 2.3 Django Framework (4), Standards and specifications (4), Social Networks (4)