Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
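The "systematic browsing" in this definition usually comes down to a loop. A crawler takes a URL from a frontier queue, fetches the page, extracts its links, and enqueues the ones it has not seen before. Below is a minimal breadth-first sketch of that loop, with some assumptions. The `fetch` argument is a hypothetical caller-supplied function that returns a page's HTML, or `None` on failure; a real one might wrap `urllib.request.urlopen`. The crawl stays on the seed's host. The sketch omits robots.txt handling, politeness delays and retries. An in-memory dictionary stands in for a website so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch: Callable[[str], Optional[str]], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from `seed`, restricted to the seed's host.

    `fetch` is an assumed callable returning HTML or None.
    Returns the URLs in the order they were successfully visited.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])  # FIFO queue -> breadth-first order
    seen = {seed}             # every URL ever enqueued, so none is queued twice
    visited: list[str] = []

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # unreachable page: skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical three-page site used in place of real HTTP requests.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b#top">B</a> <a href="https://other.org/">X</a>',
    "https://example.com/b": '<a href="/">Home</a>',
}

print(crawl("https://example.com/", site.get))
# → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The `seen` set is what keeps the crawl from looping forever on pages that link back to each other, as `/b` does to `/` here. Swapping the `deque` for a priority queue is the usual way to turn this into a focused or ranked crawler.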

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-02-13 00:06:45 UTC

https://github.com/huzecong/film-spider

Spiders for crawling film listing websites.

crawler

Last synced: 11 Jan 2025

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 10 Feb 2025

https://github.com/testica/a3hrgo-sdk

a3HRgo SDK to automate your reports

a3hrgo crawler javascript puppeteer

Last synced: 10 Feb 2025

https://github.com/tokenmill/crawling-framework-example

Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 06 Jan 2025

https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

crawler

Last synced: 13 Dec 2024

https://github.com/zenrows/crawling-from-scratch

Repository with the final code for the "Mastering Web Scraping in Python: Crawling from Scratch" blog post.

crawler crawling python python3 scraping

Last synced: 16 Jan 2025

https://github.com/cls1991/gank.io

Scrapes image resources from Gank.io (干货集中营, http://gank.io)

crawler curl gankio picture

Last synced: 11 Nov 2024

https://github.com/elektrostudios/fhm-crawler-freehardmusic.com

Crawls album download URLs from the freehardmusic.com website

albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms

Last synced: 29 Jan 2025

https://github.com/igaozp/jobwitcher

JobWitcher: a collection of crawlers for recruitment websites

crawler python3 redis scrapy spider

Last synced: 27 Dec 2024

https://github.com/ribeirogab/technology-insights

Program that uses data from Stack Overflow Insights 2020 to generate informative graphs.

crawler python scraping typescript

Last synced: 19 Nov 2024

https://github.com/chenmozhijin/mediawikiextractor

A Python script for extracting data from a MediaWiki website and saving it as JSON.

crawler crawler-python crawling extractor json mediawiki python regex web-crawler

Last synced: 22 Jan 2025

https://github.com/a-x-/scian

Simple cian stat

cian crawler static-site

Last synced: 11 Jan 2025

https://github.com/mrmarble/mineseek

Minecraft server scanner

crawler minecraft minecraft-server scanner slp

Last synced: 17 Jan 2025

https://github.com/ktont/curlas

A Node.js spider tool

chrome-extension crawler spider

Last synced: 13 Jan 2025

https://github.com/achannarasappa/locust-cli

Developer tools to accelerate development of Locust jobs

cli crawler headless-chrome puppeteer scraper

Last synced: 19 Jan 2025

https://github.com/mikirasora/osuplayedbeatmapscrawler

A crawler that fetches and downloads the osu! beatmaps you have played

crawler osu

Last synced: 01 Jan 2025

https://github.com/ericz99/go-crawler

Simple, lightweight crawler that finds all endpoints on any website.

crawler golang

Last synced: 28 Jan 2025

https://github.com/v-braun/hero-scrape

Finds the hero (main) image of a URL

crawler fastimage hero hero-image opengraph webscraping

Last synced: 15 Jan 2025

https://github.com/hrvadl/goweekly

Application that queries top articles from https://golangweekly.com/, translates them into Ukrainian and sends them to the Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/tufayellus/linkedin-cv-downloader

Python-based GUI automation software for bulk-downloading LinkedIn CVs/resumes from a list of profile links

crawler digital-marketing email-marketing email-scraper leads linkedin-bot linkedin-cv linkedin-cv-downloader linkedin-download linkedin-downloader linkedin-resume linkedin-resume-downloader linkedin-scraper scrape-emails scrape-websites scraper scraper-engine

Last synced: 23 Jan 2025

https://github.com/marvnc/pixiv-dump

Pixiv Encyclopedia DB Dumps, updated daily

crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping

Last synced: 13 Feb 2025

https://github.com/stangirard/crawlycolly

Website crawler to extract all URLs

colly crawler discover golang sitemap

Last synced: 15 Jan 2025

https://github.com/nazanin1369/searchengine

A search engine implemented with Java, AngularJS and Elasticsearch

angularjs crawler elasticsearch java search-engine

Last synced: 07 Jan 2025

https://github.com/carloocchiena/python_url_crawler

A script that, starting from a web page, iterates through all its links and appends them to a list; a rough way to get all pages of a website

beautifulsoup crawler python python3

Last synced: 28 Nov 2024

https://github.com/first-coding/django-and-web

A Django web project with a separated front end and back end.

crawler django python

Last synced: 28 Dec 2024

https://github.com/z3ntl3/redeye

Crawls real and new user agents from the two largest databases.

crawler header ua user-agents useragents

Last synced: 09 Feb 2025

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 09 Feb 2025

https://github.com/epigos/newsbot

A news bot written in Go for Dialogflow and Facebook messenger

autocert chatbot crawler datastore dialogflow facebook-messenger-bot golang letsencrypt newsfeed

Last synced: 27 Jan 2025

https://github.com/nava45/simplempcrawler

Simple multiprocessing crawler in Python

crawler multiprocessing python

Last synced: 05 Jan 2025

https://github.com/gnujoow/crawl-repo

Crawls basic information about GitHub repositories

crawler github github-api python3

Last synced: 07 Feb 2025

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully managed by Google Cloud Platform

crawler docker gcp golang ruby

Last synced: 10 Feb 2025

https://github.com/erikmueller/jazmax

Crawls JAZ values for different heat pumps, depending on flow and return temperatures, from the JAZ calculator

crawler data-science efficiency green heatpump jaz

Last synced: 29 Jan 2025

https://github.com/eduardosbcabral/desafio-tecnico-mp

Challenge: a file generator in C# that uses a web crawler and buffers to write the file to disk.

crawler csharp dotnet

Last synced: 13 Jan 2025

https://github.com/codeforequity-at/botium-crawler

Botium Crawler - Like a Website Crawler, just for Conversation Flows

botium chatbots crawler

Last synced: 20 Oct 2024

https://github.com/becky-dai/flower-knowledge-graph-visualization

A full-stack knowledge graph visualization project

crawler css django echarts html js knowledge-graph neo4j python

Last synced: 21 Dec 2024

https://github.com/yidas/tw-stock-crawler-php

PHP crawler for Taiwan stock data

crawler stock taiwan taiwan-stock-information taiwan-stock-market

Last synced: 29 Oct 2024

https://github.com/qianbinbin/moebooru-crawler

Retrieves image links from Moebooru-based sites such as yande.re and konachan.com.

crawler moebooru shell

Last synced: 09 Feb 2025

https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse

[GeckoFX/Firefox]: Shows how to intercept requests, capture responses, customize GeckoPreferences, handle certificate errors, change the user agent, and more.

browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms

Last synced: 26 Jan 2025

https://github.com/spraakbanken/svt-crawler

Programme for crawling SVT's API for news articles and converting the data to XML.

corpus crawler

Last synced: 28 Jan 2025

https://github.com/aicore/app_info_extracter

Application for extracting information about apps from the internet

android appreview apps crawler googleplaystore

Last synced: 13 Nov 2024

https://github.com/nirjharlo/complete-google-seo-scan

WordPress plugin with a built-in SEO crawler

crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin

Last synced: 27 Oct 2024

https://github.com/harryandriyan/21scrap

Cinema XXI movie data scraper

crawler python scrapy

Last synced: 21 Jan 2025

https://github.com/santhoshse7en/alcoholics-anonymous

Research project analysing public knowledge about Alcoholics Anonymous

aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api

Last synced: 14 Jan 2025

https://github.com/kokseen1/chii

A minimal marketplace bot maker.

auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction

Last synced: 13 Jan 2025

https://github.com/skulltech/arachnid

Crawling Instagram for reasons.

crawler instagram instagram-scraper python3 scraper scrapy

Last synced: 01 Feb 2025

https://github.com/airtoxin/stackable-crawler

Middleware-based lightweight crawler framework

crawler javascript lightweight

Last synced: 24 Dec 2024

https://github.com/superreal/octopus

Recursive and multi-threaded broken link checker

broken checker crawler links

Last synced: 07 Jan 2025

https://github.com/roccomuso/is-twitter

Verify that a request is from Twitter crawlers using DNS verification steps

bot crawler dns ip js nodejs twitter verification

Last synced: 07 Jan 2025

https://github.com/tikazyq/colly-crawlers

Crawlers using Golang-based web crawling framework Colly

crawler

Last synced: 02 Jan 2025

https://github.com/joelkoen/wls

Easily crawl multiple sitemaps and list URLs

crawler sitemap url

Last synced: 07 Nov 2024

https://github.com/fabrix-app/spool-scraper

Spool: Webscraper

cheerio crawler fabrix nodejs scraping spools typescript webscraper

Last synced: 13 Jan 2025

https://github.com/omkarcloud/multiple-account-generation-template

🚀 This web scraping template provides a great starting point for creating multiple accounts on a website. 🤖

beautifulsoup crawler crawling crawling-framework crawling-python crawling-tool headless node-crawler python-crawler scraper scraping scraping-framework scraping-python scraping-tool selenium web-crawler web-crawling web-scraper web-scraping webscraping

Last synced: 02 Jan 2025

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 07 Feb 2025

https://github.com/foufou-exe/yspeed

Yspeed is a library that scrapes the Speedtest site

crawler python rich scraper scraping selenium selenium-python speedtest

Last synced: 08 Jan 2025

https://github.com/sangupta/shopify-burst-crawler

Simple crawler that downloads metadata for all stock photos on the Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 16 Jan 2025

https://github.com/coghost/izen

Encapsulation of some useful features

chaos crawler encrypt izen mqtt profig python3 utils

Last synced: 09 Nov 2024

https://github.com/jiannei/github-trending

GitHub Trending crawler based on Lumen.

crawler github-trending lumen php

Last synced: 09 Nov 2024

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 14 Nov 2024

https://github.com/linkspreed/twig

Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more

crawler engine search search-engine web5

Last synced: 03 Jan 2025

https://github.com/jofaval/webscraping

Web scraper providing tools to scrape many websites that share the same base

crawler e-commerce python scraper webscraper webscraping

Last synced: 04 Feb 2025

https://github.com/der3318/zijfhchat-crawler

Mobile game 紫禁繁花: chat room crawler and real-time lookup

crawler dashboard line-notify

Last synced: 13 Jan 2025

https://github.com/restuwahyu13/node-scraper-content

Example Node.js scraper for all programming content, using Puppeteer

crawler nodejs puppeter scrapper

Last synced: 03 Jan 2025

https://github.com/imthaghost/gocloneold

Website cloner: uses goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 11 Feb 2025

https://github.com/truethari/fcrawler

Python application for copying files of a given type from a directory.

copy copy-files crawl crawler crawler-python file files

Last synced: 07 Jan 2025

https://github.com/akagi201/spy

A lightweight distributed web crawler

crawler distributed lightweight nsq

Last synced: 08 Jan 2025

https://github.com/gill-singh-a/crawler

A program that crawls the web from a given page, following the internal links it finds and searching them for keywords

crawler multithreading osint python python3 requests scraper

Last synced: 09 Nov 2024

https://github.com/mushoffa/scrapy-tokopedia-python

crawler python scraping scrapy spider tokopedia

Last synced: 15 Jan 2025

https://github.com/jjlibra/bake-mediacrawler

NanmiCoder's self-media data crawling software

crawler learning

Last synced: 30 Nov 2024

https://github.com/maxbubblegum47/spotydump

Spotify scraper combined with a Genius scraper. Scrape artists from a given time period or region of the world and dump all their songs!

crawler dump genius lyrics python spotify unimore-informatica

Last synced: 28 Jan 2025

https://github.com/maraf/staticsitecrawler

A simple util for crawling links from root URL and saving HTML documents.

crawler static-site-generator

Last synced: 17 Jan 2025

https://github.com/sean2077/leetcode_anki

Leetcode Anki card factory.

anki crawler leetcode leetcode-anki scrapy

Last synced: 11 Jan 2025

https://github.com/panyanyany/vps_spider

VPS Spider powering https://findallvps.com

crawler spider vps

Last synced: 11 Jan 2025

https://github.com/raspi/scrapy-kuntavaalit2021-yle

Fetches YLE's 2021 municipal election (kuntavaalit) data

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/e73b025/simple-python-url-crawler

Super simple Python3 website URL scraper/crawler. Multi-threaded.

crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple

Last synced: 11 Nov 2024

https://github.com/r3c0ger/liscaps

An LSTM-based intelligent stock crawling, analysis and prediction system.

crawler lstm python pytorch stock streamlit

Last synced: 11 Nov 2024

https://github.com/zabuzard/mplogger

Uses a bot to save market prices for items, based on transactions, from the game 'http://www.freewar.de/' to a database, then processes the data and creates corresponding market price articles on 'http://www.fwwiki.de/'.

bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api

Last synced: 12 Feb 2025

https://github.com/tvrcgo/collect

Data collection

crawler scraper

Last synced: 12 Feb 2025

https://github.com/santhin/real-estate

Real estate crawler with ML on scraped data

crawler jupyter-notebook ml real-estate scrapy

Last synced: 24 Jan 2025

https://github.com/Juphex/SupremeBot

Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.

android chrome crawler kivy python3 webscraping windows

Last synced: 23 Oct 2024

https://github.com/ductnn/curls

Simple tool that crawls URLs from a domain

colly crawler domain golang scanning url

Last synced: 09 Feb 2025

https://github.com/marcbperez/python-webcrawler

Crawls HTML pages for prices and other pieces of data.

crawler docker gradle python

Last synced: 20 Jan 2025

https://github.com/nextlevelshit/fick

Fucking Incredible Command line King. Add CLI flavour to any website you like.

cli crawler

Last synced: 20 Jan 2025

https://github.com/brunojppb/airport-crawler

Simple and powerful CLI app to get worldwide airport information in JSON format

airport cli crawler ruby

Last synced: 14 Jan 2025

https://github.com/tsonglew/spidreat

Article Spider with Python & Node.js :beetle:

crawler

Last synced: 12 Feb 2025

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 13 Nov 2024

https://github.com/fbielejec/nagger

Nags reviewers of PRs

bot crawler github slack

Last synced: 09 Jan 2025

https://github.com/ozakboy/taiwan-news-crawlers

.NET-based crawlers for Taiwanese news, with the data mapped to objects for ease of use

crawler data-collection dataset-generation dotnet news taiwan webcrawlers

Last synced: 22 Jan 2025

https://github.com/vmandic/tris-web-crawler

Tris is a simple Node.js web crawler that collects links from the visited pages within a website's domain.

crawler data-tools nodejs scraping seo-tools web-scraper

Last synced: 12 Feb 2025

https://github.com/sc0vu/jspachong

JS crawler library.

crawler pachong

Last synced: 12 Feb 2025

https://github.com/wangyihang/acw-sc-v2-py

Python requests.HTTPAdapter for `acw_sc__v2`

acw-sc-v2 crawler waf

Last synced: 05 Jan 2025

https://github.com/nueip/curl

NUEiP Curl Lib

crawler php

Last synced: 24 Nov 2024

https://github.com/yuminn-k/crawling-tabelog

Crawls store information from Tabelog

crawler python3

Last synced: 18 Jan 2025

https://github.com/eklem/browsercrawler

Crawls content from a site within the browser; a basis for, e.g., a search solution for static sites.

crawler search-engine website-generation

Last synced: 12 Feb 2025

https://github.com/wangshouh/icourse163_script

A Python script for posting likes and comments on China University MOOC (icourse163).

crawler icourse163 python requests

Last synced: 02 Feb 2025

https://github.com/diogoazevedos/x-ray-build

A helper that builds an x-ray based on a schema

crawler schema scraper structure x-ray

Last synced: 31 Dec 2024

https://github.com/congcoi123/crawler-sheis

A small crawler for getting data from the website: https://sheis.vn

crawler webcrawler webcrawling webscraper webscraping

Last synced: 31 Dec 2024

https://github.com/wangshouh/qzone_api

Uses Python to call Qzone's public APIs to retrieve information

crawler python qzone requests

Last synced: 02 Feb 2025

https://github.com/darealfreak/figure-tracker

Application that watches wishlisted figures on multiple sites and notifies you about auctions, sales or sudden price drops

crawler figure-tracker monitoring

Last synced: 05 Feb 2025

https://github.com/eduardozepeda/go-web-crawler

A concurrent web crawler written in Go that looks for exposed .git and .env URIs.

crawler environment-variables git go pentesting security-audit

Last synced: 16 Jan 2025

Crawler Awesome Lists

awesome-crawler (101)
awesome-python-primer (68)
awesome-fingerprinting (48)
awesome-digital-preservation (45)
awesome-web-scraping (62)

Crawler Categories

Core Libraries (60)
2.6 Machine Learning (50)
Research (31)
Python (18)
Replay tools (18)
1.1 Language Basics (16)
Anti-Bot Solutions (15)
Libraries & Projects (13)
Fingerprinting Evasion (13)
Sites (12)
Specialized Tools (12)
2.4 Web Front End (10)
Browser Automation (10)
2.1 Crawler Basics (9)
Resources (9)
3. Databases (8)
2.5 Data Analysis (7)
Web archiving (7)
Data Processing (7)
Java (7)