Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).

GitHub: https://github.com/topics/crawler
Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
Last updated: 2025-01-10 00:06:02 UTC
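
The definition above reduces to a loop: take a URL from a queue, download the page, extract its links, and enqueue the ones not yet seen. Below is a minimal breadth-first sketch in Python. It is not drawn from any project listed here. The `fetch` callable (any function that returns a page's HTML or `None`), the `max_pages` cap, and the same-host restriction are illustrative assumptions. Passing a dictionary's `get` method as `fetch` lets the example run offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl limited to start_url's host; returns URLs in visit order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, so nothing is visited twice
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        html = fetch(url)       # assumed to return None when a page can't be fetched
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts so duplicates collapse.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Offline demo: a tiny in-memory "site" stands in for real HTTP requests.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="https://other.org/">x</a>',
    "https://example.com/b": '<a href="/">home</a>',
}
print(crawl("https://example.com/", site.get))
```

A real crawler would add politeness delays, `robots.txt` checks, and error handling around actual HTTP requests.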

https://github.com/surelle-ha/dogma

Dogma is a CLI tool that uses the GitHub API to search .env files for specified keywords. You can configure a GitHub token and use the crawler to search for keys in .env files across public repositories.

cli crawler github nodejs

Last synced: 10 Nov 2024

https://github.com/birkhofflee/blizzard_forum.js

An unofficial Node.js API for Blizzard Forums (working as of 2019).

api crawler web

Last synced: 18 Nov 2024

https://github.com/leelow/nightmare-screenshot-selector

👻 📷 A Nightmare plugin to easily take screenshots.

crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler

Last synced: 15 Nov 2024

https://github.com/zurdi15/nbz

Bot to automate internet browsing

automation bot browser-automation browsermob-proxy crawler selenium testing web

Last synced: 15 Oct 2024

https://github.com/chenmozhijin/mediawikiextractor

A Python script for extracting data from a MediaWiki website and saving it as JSON.

crawler crawler-python crawling extractor json mediawiki python regex web-crawler

Last synced: 09 Oct 2024

https://github.com/sieep-coding/web-crawler

A simple web crawler implemented in Go.

crawler go golang web-crawler

Last synced: 08 Nov 2024

https://github.com/roccomuso/is-duckduck

Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo

crawler duckduck duckduckbot duckduckgo ip js nodejs verify web

Last synced: 07 Jan 2025

https://github.com/thiiagoms/dict-crawler

A simple crawler for the UOL dictionary

beautifulsoup4 crawler dic python pythonic

Last synced: 15 Nov 2024

https://github.com/pyaesoneaungrgn/2d-crawler

2D crawler for set.or.th

2d 2d-crawler crawler myanmar php

Last synced: 09 Nov 2024

https://github.com/indatawetrust/reporter

Crawler queue creation tool for paging

crawler

Last synced: 13 Dec 2024

https://github.com/igeligel/teamfortressoutpostapi

🔁 An API wrapper for the TF2 Outpost platform, a platform for finding great deals on your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.

bot bot-framework crawler steam steam-api steambot teamfortress2

Last synced: 19 Nov 2024

https://github.com/sauerbraten/chef

Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.

crawler extinfo go sauerbraten spy stalker

Last synced: 14 Nov 2024

https://github.com/obaskly/kikfriender.com-bot

A multifunctional bot that increases your likes and hotness points and adds positive feedback. It can also flag an account of your choice as fake and add negative feedback. Moreover, it can check a given wordlist, print out the matching Kik usernames, and store them in a new text file.

ai artificial-intelligence bot checker chrome crawl crawler crawling kik proxies proxy scraper scraping selenium wordlist

Last synced: 08 Jan 2025

https://github.com/bitebait/curry

🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real exchange rate (USDxBRL) at several stores in Paraguay.

api brasil crawler currency-exchange-rates go golang paraguay webcrawler

Last synced: 14 Nov 2024

https://github.com/hctilg/pinterest-crawler

Downloads all images matching a search query

crawler pinterest

Last synced: 07 Nov 2024

https://github.com/leo9960/waimai_crawler

Scrapes merchant information from food-delivery platforms

crawler

Last synced: 10 Nov 2024

https://github.com/exp-codes/python-crawler-template

Python crawler development template

crawler programming template

Last synced: 16 Dec 2024

https://github.com/agmmnn/nis-scraper

Scrapy script to scrape nisanyansozluk.com

cli crawler python scraper

Last synced: 21 Dec 2024

https://github.com/ayusharma/rss-parser

A simple crawler in ReactJS

crawler reactjs rss-parser

Last synced: 18 Dec 2024

https://github.com/qin2dim/istockphoto-go

📸 Gracefully download datasets from iStockPhoto.

colly crawler istockphoto

Last synced: 28 Dec 2024

https://github.com/capturr/price-extract

A performant way to extract the price amount and metadata (currency, decimal and thousands separators) from any string.

amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript

Last synced: 07 Jan 2025
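
price-extract has to decide, for strings like `1.234,56` versus `1,234.56`, which separator marks the decimals. The Python sketch below illustrates one simple heuristic for that problem. It is not price-extract's algorithm or API (the library itself targets JavaScript/TypeScript). The function name `parse_price`, the small symbol table, and the rule that a lone separator followed by exactly three digits is a thousands separator are all assumptions.

```python
import re
from decimal import Decimal
from typing import Optional

# Illustrative symbol table (an assumption, far from complete).
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY"}

# A run of digits and separators that starts and ends with a digit.
NUMBER_RE = re.compile(r"\d(?:[\d.,]*\d)?")


def parse_price(text: str) -> Optional[dict]:
    """Heuristically extract an amount and its separators from free text."""
    match = NUMBER_RE.search(text)
    if match is None:
        return None
    raw = match.group()
    decimal_sep = thousands_sep = None
    last = max(raw.rfind("."), raw.rfind(","))
    if last != -1:
        sep = raw[last]
        other = "," if sep == "." else "."
        digits_after = len(raw) - last - 1
        if other in raw:
            # Both separators present: the rightmost one is the decimal mark.
            decimal_sep, thousands_sep = sep, other
        elif raw.count(sep) == 1 and digits_after != 3:
            # A single separator not followed by exactly 3 digits: "12.99", "12,5".
            decimal_sep = sep
        else:
            # "1,000" or "1.234.567": treated as thousands grouping (ambiguous case).
            thousands_sep = sep
    if decimal_sep and raw.count(decimal_sep) > 1:
        return None  # malformed, e.g. "1.2,3,4"
    normalized = raw
    if thousands_sep:
        normalized = normalized.replace(thousands_sep, "")
    if decimal_sep:
        normalized = normalized.replace(decimal_sep, ".")
    currency = next((code for sym, code in CURRENCY_SYMBOLS.items() if sym in text), None)
    return {
        "amount": Decimal(normalized),
        "currency": currency,
        "decimal_sep": decimal_sep,
        "thousands_sep": thousands_sep,
    }


print(parse_price("Now only 1.234,56 €"))
```

`Decimal` is used instead of `float` so that amounts such as `1299.99` are represented exactly.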

https://github.com/sergioburdisso/solidscraper

An easy-to-use, jQuery-like API for web scraping/crawling.

crawler crawling crawling-python jquery python scraper scraping tweets twitter web web-crawler web-scraping webscraping

Last synced: 23 Nov 2024

https://github.com/waynechang65/baha-crawler

baha-crawler is a web crawler module designed to scrape data from the Bahamut Forum.

bahamut crawler javascript nodejs scraper spider webcrawler

Last synced: 19 Oct 2024

https://github.com/reycn/china-drug-trials-crawler

A web crawler for Chinadrugtrials.org.cn, written in Python 3.6+.

china crawler drug python scraper

Last synced: 06 Dec 2024

https://github.com/vivekg13186/easy_web_crawler

A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages.

crawler spider web

Last synced: 09 Dec 2024

https://github.com/t-rekttt/tlu-schedule

chatfuel crawler nodejs vuejs

Last synced: 09 Dec 2024

https://github.com/fanyong920/crawlitem-puppeteer

An example of scraping products with Puppeteer

chromnium crawler javascript nodejs puppeteer scrapy

Last synced: 23 Dec 2024

https://github.com/ozansz/github-crawler

A basic utility for crawling GitHub users and their e-mail addresses

crawler github python python3

Last synced: 06 Dec 2024

https://github.com/rodyherrera/cdrake-se

✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and other resources using the engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).

bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube

Last synced: 25 Dec 2024

https://github.com/archan937/webhead

An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.

api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing

Last synced: 10 Nov 2024

https://github.com/rimiti/ping-urls

🏓 Ping URLs by batch.

cache crawler ping prerender prerendering seo

Last synced: 28 Dec 2024

https://github.com/hrvadl/goweekly

An application that queries the top articles from https://golangweekly.com/, translates them into Ukrainian, and sends them to a Telegram channel

article chatgpt crawler go golang openai-api telegram telegram-bot

Last synced: 13 Oct 2024

https://github.com/ribeirogab/technology-insights

A program that uses the data from Stack Overflow Insights 2020 to generate informative graphs.

crawler python scraping typescript

Last synced: 19 Nov 2024

https://github.com/huzecong/film-spider

Spiders for crawling film-listing websites.

crawler

Last synced: 12 Nov 2024

https://github.com/tokenmill/crawling-framework-example

A demonstration of how to use the Crawling Framework to set up a simple science-news crawler and store the results in Elasticsearch. Use this configuration to set up your own crawler.

crawler crawling-framework elasticsearch storm-crawler

Last synced: 06 Jan 2025

https://github.com/sntran/gen_spider

An Erlang/Elixir behaviour to define Spiders

behaviour crawler generic interface spider

Last synced: 23 Nov 2024

https://github.com/oxylabs/web-crawler

Web Crawler is a tool used to discover target URLs, select the relevant content, and have it delivered in bulk. It crawls websites in real-time and at scale to quickly deliver all content or only the data you need based on your chosen criteria.

api crawler github-python scraper web-crawler web-crawler-python web-scraping web-scraping-api webscraping

Last synced: 17 Nov 2024

https://github.com/bitlytwiser/tormonger

Recursive Tor network crawler

crawler go golang tor

Last synced: 05 Jan 2025

https://github.com/arthurc0102/ntub-bot

A course-evaluation bot for National Taipei University of Business (NTUB)

bot crawler ntub

Last synced: 29 Nov 2024

https://github.com/glutexo/onigumo

Parallel web scraping framework

crawler

Last synced: 05 Jan 2025

https://github.com/ruedigervoigt/salted

Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files

asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python

Last synced: 11 Oct 2024

https://github.com/thaddeusjiang/campcat

A bot that monitors campsite reservation information

bot crawler telegram

Last synced: 25 Oct 2024

https://github.com/testica/a3hrgo-sdk

An a3HRgo SDK to automate your reports

a3hrgo crawler javascript puppeteer

Last synced: 10 Oct 2024

https://github.com/ktont/curlas

A Node.js spider tool

chrome-extension crawler spider

Last synced: 19 Nov 2024

https://github.com/vaibhavpandeyvpz/cbse-scraper

This script scrapes information about schools affiliated with CBSE for a given state.

cbse crawler data schools scraper

Last synced: 09 Nov 2024

https://github.com/v-braun/hero-scrape

Find the hero (main) image of a URL

crawler fastimage hero hero-image opengraph webscraping

Last synced: 15 Nov 2024
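
A common signal for a page's hero image is its Open Graph `og:image` meta tag. The following Python sketch reads that tag with the standard library's `html.parser` and, as an assumed fallback, uses the first `<img>` on the page. It illustrates the idea only and is not hero-scrape's implementation; `find_hero_image` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class HeroImageParser(HTMLParser):
    """Records the first og:image meta value and the first <img> src."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None
        self.first_img: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_og = "og:image" in (attrs.get("property"), attrs.get("name"))
        if tag == "meta" and is_og and self.og_image is None:
            self.og_image = attrs.get("content")
        elif tag == "img" and self.first_img is None:
            self.first_img = attrs.get("src")


def find_hero_image(html: str, base_url: str) -> Optional[str]:
    """Prefer og:image, fall back to the first <img>; return an absolute URL or None."""
    parser = HeroImageParser()
    parser.feed(html)
    candidate = parser.og_image or parser.first_img
    return urljoin(base_url, candidate) if candidate else None


page = ('<html><head><meta property="og:image" content="/img/hero.jpg" /></head>'
        '<body><img src="logo.png"></body></html>')
print(find_hero_image(page, "https://example.com/post/1"))
```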

https://github.com/dnlzrgz/winzig

A tiny search engine for personal use.

async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3

Last synced: 05 Nov 2024

https://github.com/arshadkazmi42/scraplink

Scraplink: a library for scraping link and image URLs from a webpage

crawler mongdb nodejs scraplink url web

Last synced: 28 Oct 2024

https://github.com/arshadkazmi42/github-scanner-local

Locally scan all the repositories of a GitHub organization

bounty bug bug-bounty crawler github local no-api scanner

Last synced: 28 Oct 2024

https://github.com/gatenlp/wpextract

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Last synced: 13 Nov 2024

https://github.com/roccomuso/is-baidu

Verify that a request is from Baidu crawlers using DNS verification

baidu crawler dns ip js nodejs verification

Last synced: 07 Jan 2025
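
DNS-based crawler verification, as is-baidu describes, usually takes two steps. First, a reverse (PTR) lookup of the requesting IP must yield a hostname under the operator's domain. Second, a forward lookup of that hostname must resolve back to the same IP; without this step, anyone who controls their own reverse DNS could claim a crawler's name. The Python sketch below shows that technique. It is not is-baidu's code, and the accepted suffixes `.baidu.com` and `.baidu.jp` are an assumption to check against Baidu's current documentation. The resolver functions are injectable so the logic can be tested offline.

```python
import socket
from typing import Callable, Iterable, List

# Assumed hostname suffixes for Baidu's crawlers; confirm against Baidu's documentation.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def default_reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]           # PTR lookup


def default_forward(hostname: str) -> List[str]:
    return socket.gethostbyname_ex(hostname)[2]  # A records (IPv4 only)


def is_verified_crawler(ip: str,
                        suffixes: Iterable[str] = BAIDU_SUFFIXES,
                        reverse: Callable[[str], str] = default_reverse,
                        forward: Callable[[str], List[str]] = default_forward) -> bool:
    """Reverse-then-forward DNS check: the PTR name must be under an allowed
    domain AND resolve back to the same IP, so a spoofed PTR record fails."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.endswith(tuple(suffixes)):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

DNS lookups are slow, so a server would typically cache the result per IP rather than run this check on every request.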

https://github.com/mikirasora/osuplayedbeatmapscrawler

A crawler that fetches and downloads the osu! beatmaps you have played

crawler osu

Last synced: 01 Jan 2025

https://github.com/erikjiang/book_crawler

🦎 book_crawler

crawler douban golang

Last synced: 28 Nov 2024

https://github.com/igaozp/jobwitcher

JobWitcher: a collection of crawlers for job-recruitment websites

crawler python3 redis scrapy spider

Last synced: 27 Dec 2024

https://github.com/xdk78/grabbi

grabbi: a simple web scraper/crawler

crawler html scraper web-scraper

Last synced: 31 Dec 2024

https://github.com/chenyangguang/hundun

crawler go gocolly

Last synced: 14 Nov 2024

https://github.com/natlee/myanimelist-comment-crawler

Crawl all reviews and information about anime works on MyAnimeList. ;)

anime crawler data-analysis data-mining data-science kaggle kaggle-dataset myanimelist python requests scrapy-crawler sqlite

Last synced: 21 Nov 2024

https://github.com/spa5k/quick-scraper

An easy, lightweight scraper built using typescript for good developer experience.

crawler dx easy-to-use esbuild scraper typescript

Last synced: 13 Nov 2024

https://github.com/alexmili/reachable

Check if a URL exists and is reachable

crawler health-check monitoring reachability webscraping

Last synced: 10 Dec 2024

https://github.com/achannarasappa/locust-cli

Developer tools to accelerate development of Locust jobs

cli crawler headless-chrome puppeteer scraper

Last synced: 18 Nov 2024

https://github.com/jmkim/stock-crawler

Universal Stock Crawler

crawler stock stock-market yahoo-finance

Last synced: 27 Nov 2024

https://github.com/a-x-/scian

Simple cian stat

cian crawler static-site

Last synced: 11 Jan 2025

https://github.com/maicss/1024img

1024 image nodejs crawler

1024 crawler nodejs

Last synced: 31 Dec 2024

https://github.com/mmqnym/etherscan_tracker

Shows how to track a wallet on etherscan.io

crawler ethereum python

Last synced: 17 Nov 2024

https://github.com/xanke/nscan

Node.js web page scraper

crawler javascript nodejs

Last synced: 02 Dec 2024

https://github.com/cls1991/gank.io

Scrapes image resources from the Gank.io site (http://gank.io)

crawler curl gankio picture

Last synced: 11 Nov 2024

https://github.com/yakuza8/coronavirus-timeseries-predictor

A time-series analyzer for coronavirus data using a recurrent neural network

asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper

Last synced: 22 Dec 2024

https://github.com/developerdavi/meli-crawler

Basic web crawler API for getting products from MercadoLibre (BRL | MLB)

api crawler meli-crawler mercadolibre mercadolibre-sdk mercadolivre mercadolivre-sdk nextjs now products react zeit

Last synced: 25 Nov 2024

https://github.com/simoninithomas/news-crawler-parse-backend

This is a crawler made with Scrapy to crawl French news articles and send them to your Parse.com backend

crawler news parse scrapy

Last synced: 17 Nov 2024

https://github.com/georgea93/crawley

A Node.js web crawler

crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn

Last synced: 20 Nov 2024

https://github.com/oldkingcone/pbandj

Pastebin crawler that crawls the URL https://pastebin.com/archive

crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver

Last synced: 16 Nov 2024

https://github.com/fernandod1/yahoo-finance-scraper

This Python script scrapes the "Open" and "Previous Close" values for any company on Yahoo Finance and saves them to a local text file.

crawler python python3 scraper scraping scraping-websites scrapper scrapping spider yahoo yahoo-finance yahoo-finance-api

Last synced: 12 Nov 2024

https://github.com/ericz99/go-crawler

A simple, lightweight crawler that finds all endpoints on any website.

crawler golang

Last synced: 30 Nov 2024

https://github.com/spaceemotion/goodreads-browser

Custom crawler + interface to have better filtering and sorting of the goodreads database 📚🔍

books crawler goodreads

Last synced: 26 Dec 2024

https://github.com/natshah/natshah-crawler

Natshah Crawler crawls a selected domain, including all its internal links and internal pages.

crawler database filter natshah-crawler

Last synced: 14 Dec 2024

https://github.com/santhoshse7en/alcoholics-anonymous

A research project analysing public knowledge about Alcoholics Anonymous

aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api

Last synced: 14 Nov 2024

https://github.com/liuzl/newsmth

A Go crawler for newsmth.net

bigdata crawler newsmth nlp

Last synced: 25 Dec 2024

https://github.com/linkspreed/twig

Twig🔍 - the fastest and safest search engine📐 for the web🌐, images🤳, news📰 and much more

crawler engine search search-engine web5

Last synced: 03 Jan 2025

https://github.com/yuminn-k/crawling-tabelog

Crawls store information from Tabelog

crawler python3

Last synced: 17 Nov 2024

https://github.com/raspi/scrapy-kuntavaalit2021-yle

Fetches YLE data on the 2021 Finnish municipal elections (kuntavaalit 2021)

crawler mirror python scrapy spider webcrawler

Last synced: 10 Nov 2024

https://github.com/kapitanluffy/sunny-crawler

That moment when I tried learning things about "Big Data" and "Inverted Indexes"

big-data crawler inverted-index php search

Last synced: 14 Dec 2024
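
sunny-crawler's description mentions inverted indexes, the core data structure of a search engine. Instead of mapping each page to its words, an inverted index maps each word to the pages that contain it, so answering a query means intersecting a few sets rather than scanning every page. A minimal Python sketch follows (sunny-crawler itself is PHP). The tokenizer and the AND-only query semantics are simplifying assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric words; no stemming or stop-word removal (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of document IDs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """AND query: documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Intersect the smallest posting sets first to keep intermediate results small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    return set.intersection(*postings)


pages = {
    "p1": "Web crawlers index pages",
    "p2": "A crawler follows links",
    "p3": "Index the web",
}
idx = build_index(pages)
print(sorted(search(idx, "web index")))  # ['p1', 'p3']
```

Without stemming, "crawlers" and "crawler" are indexed as different terms, which is why real engines normalize words before indexing.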

https://github.com/maximiliancw/crawlio

Asynchronous web crawling and scraping with Python for minimalists

asyncio crawler fastapi framework picocss python scraper vuejs

Last synced: 13 Nov 2024

https://github.com/antoinegagne/treewalker

A web crawler in Erlang that respects `robots.txt`.

crawler erlang webcrawler

Last synced: 20 Dec 2024
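
Respecting `robots.txt`, as treewalker does in Erlang, means fetching a site's rules and checking each URL against them before requesting it. As an illustration in Python, the standard library provides `urllib.robotparser.RobotFileParser`. Here the rules are passed in as text via `parse()` so the example runs offline; a real crawler would call `set_url()` and `read()` instead. The user-agent string `ExampleCrawler/0.1` is a hypothetical placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2

User-agent: BadBot
Disallow: /
"""


def make_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = make_checker(ROBOTS_TXT)
agent = "ExampleCrawler/0.1"  # hypothetical user agent
print(rules.can_fetch(agent, "https://example.com/index.html"))      # True
print(rules.can_fetch(agent, "https://example.com/private/a.html"))  # False
print(rules.can_fetch("BadBot", "https://example.com/index.html"))   # False
print(rules.crawl_delay(agent))                                      # 2
```

Note that `RobotFileParser` applies the first matching rule in file order, whereas RFC 9309 specifies that the most specific (longest) match wins, so results can differ for files that mix `Allow` and `Disallow` lines.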

https://github.com/fbielejec/nagger

Nags reviewers of pull requests

bot crawler github slack

Last synced: 09 Jan 2025

https://github.com/panyanyany/vps_spider

VPS Spider powering https://findallvps.com

crawler spider vps

Last synced: 12 Nov 2024

https://github.com/litingyes/cobweb

Collect, store and distribute meaningful static data

apis bing-image bing-wallpapers crawler image random-image

Last synced: 05 Dec 2024

https://github.com/izumisy/scalable-crawler

Scalable crawler, fully managed by Google Cloud Platform

crawler docker gcp golang ruby

Last synced: 18 Dec 2024

https://github.com/norconex/committer-neo4j

Implementation of Norconex Committer for Neo4j.

crawler neo4j neo4j-committer norconex-committer

Last synced: 17 Dec 2024

https://github.com/imkrunalkanojiya/seo-checker

Resolve your SEO-related issues using the SEO Checker REST API

crawler nodejs rest-api seo seo-crawler seo-free seo-optimization seo-tools

Last synced: 03 Jan 2025

https://github.com/elektrostudios/fhm-crawler-freehardmusic.com

Crawls album download URLs from the freehardmusic.com website

albums crawl crawler crawling desktop-app desktop-application dotnet music web-crawler web-crawling web-scraper web-scraping webcrawler webcrawling webscraper webscraping windows windows-app windowsapp winforms

Last synced: 01 Dec 2024

https://github.com/fabrix-app/spool-scraper

Spool: Webscraper

cheerio crawler fabrix nodejs scraping spools typescript webscraper

Last synced: 14 Nov 2024

https://github.com/rudrakshi99/web_crawler

A Spider🕷 or search engine bot that downloads and indexes content from all over the Internet.

crawler python spider

Last synced: 22 Nov 2024

https://github.com/imthaghost/gocloneold

Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.

colly crawler go scraper

Last synced: 19 Dec 2024

https://github.com/sangupta/shopify-burst-crawler

Simple crawler to download meta information for all stock pics from Shopify Burst website

burst crawler java shopify stock-photos

Last synced: 08 Nov 2024

https://github.com/zhifengle/js-hook

Parses the JavaScript AST and adds custom hooks

crawler js-reverse

Last synced: 14 Nov 2024

https://github.com/travorlzh/temperature-analyzer

A Python crawler that fetches temperatures for Beijing, China

crawler homework python variance

Last synced: 16 Nov 2024

https://github.com/nakabonne/netsurfer

netsurfer is a very lightweight scraping framework

crawler go library scraping

Last synced: 14 Dec 2024

https://github.com/sc0vu/jspachong

Js crawler library.

crawler pachong

Last synced: 19 Dec 2024

https://github.com/sean2077/leetcode_anki

Leetcode Anki card factory.

anki crawler leetcode leetcode-anki scrapy

Last synced: 12 Nov 2024

Crawler Awesome Lists

awesome-crawler (101)
awesome-python-primer (68)
awesome-digital-preservation (45)
awesome-fingerprinting (48)

Crawler Categories

2.6 Machine Learning (50)
Research (31)
Python (18)
Replay tools (18)
1.1 Language Basics (14)
Libraries & Projects (13)
Fingerprinting Evasion (13)
Sites (12)
2.4 Web Front End (10)
2.1 Crawler Basics (9)
3. Databases (8)
Java (7)
2.5 Data Analysis (7)
Web archiving (7)
4. Asynchronous I/O (6)
Other digital objects (6)
Standards and specifications (4)
Social Networks (4)
2.2 Flask Framework (4)
2.3 Django Framework (4)