Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-16 00:05:55 UTC
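The definition above ("systematically browses the World Wide Web") can be made concrete with a small breadth-first crawler. This is a minimal sketch, not taken from any project listed here, and it rests on a few assumptions. Pages come from a caller-supplied `fetch` function, a hypothetical hook that keeps the example offline; a real crawler might use `urllib.request.urlopen` there. The crawl stays on the start URL's host and stops after `max_pages` pages. It does not check `robots.txt`, which a polite crawler should do, for example with `urllib.robotparser`.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> list[str]:
    """Breadth-first crawl limited to start_url's host; returns URLs in visit order.

    `fetch` is a hypothetical hook that maps a URL to its HTML text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])  # frontier of URLs waiting to be visited
    seen = {start_url}          # every URL ever queued, so none is queued twice
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" standing in for real HTTP responses.
    site = {
        "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
        "https://example.com/a": '<a href="/b">B</a> <a href="https://other.org/">ext</a>',
        "https://example.com/b": '<a href="/">home</a>',
    }
    print(crawl("https://example.com/", site.__getitem__))
```

The queue gives breadth-first order, and the `seen` set prevents revisiting pages. The off-host link to `other.org` is ignored. Swapping `deque.popleft` for `pop` would turn the sketch into a depth-first crawl.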
https://github.com/stopka/fedicrawl
Collect feeds to follow on Fediverse nodes.
crawler docker fediverse nodejs prisma typescript
Last synced: 05 Nov 2024
https://github.com/cr0hn/feed-to-exporter
Fetches an RSS feed and exports it as a WordPress post
Last synced: 07 Nov 2024
https://github.com/moehmeni/ezweb
Easy-to-use web page analyzer
analyzer crawler scraper text-analysis text-classification text-mining webcrawler webcrawling webpage webscraper webscraping www
Last synced: 05 Nov 2024
https://github.com/alishahbazi81/jobcrawler
A job crawler bot that finds jobs on job board platforms such as LinkedIn, Glassdoor, and Indeed based on their posting time and sends them to a Telegram channel
asp-net-core crawler jobs jobsearch telegram telegram-bot
Last synced: 11 Nov 2024
https://github.com/kernelerr/pixivsync
Pixiv image download and sync tool
crawler pixiv pixiv-crawler python
Last synced: 12 Oct 2024
https://github.com/aprilnea/xjtlu
This is how to get all the network resources of XJTLU.
crawler gateway http-auth python spider web-crawler xjtlu
Last synced: 15 Nov 2024
https://github.com/ivan-alone/instastories-saver-cpp
Program for saving Instagram Stories - rewritten in C++
api backup crawler grambler gramblr insta instagram instagram-stories instastories-saver instastory stories
Last synced: 31 Oct 2024
https://github.com/juliandavidmr/raptor
Lightweight tool for scanning websites that works as a spider. Once executed, it starts scanning pages for websites to visit, with automatic indexing.
Last synced: 09 Nov 2024
https://github.com/sayakie/pixiv-crawler
Crawls images from Pixiv 🚀
crawler nodejs pixiv typescript
Last synced: 28 Oct 2024
https://github.com/itszeeshan/crawlinit
A web crawler written in Python 3
appsec bugbounty bugbounty-tool bugbountytips crawler crawler-python enumeration infosec python recon reconnaissance scanner url web
Last synced: 12 Oct 2024
https://github.com/danielmorell/se_bot_checker
Validate search engine user agents and IP addresses.
crawler googlebot python search-engine spider
Last synced: 15 Oct 2024
https://github.com/leelow/nightmare-screenshot-selector
👻 📷 A Nightmare plugin to easily take screenshots.
crawler headless-browsers javascript js nightmare nightmarejs nodejs plugin webcrawler
Last synced: 15 Nov 2024
https://github.com/spencerlepine/readme-crawler
A Node.js web crawler that downloads README files and follows the links they contain. Fetches repositories from a valid GitHub URL.
crawler javascript node nodejs readme scraper web-crawler webcrawer
Last synced: 13 Nov 2024
https://github.com/foolin/scrago
A simple, fast, extensible page-crawling framework for Go (golang)
Last synced: 09 Nov 2024
https://github.com/haxzie-xx/crode.js-node-web-crawler
Node.js crawler built to collect movie links from open FTP sites.
Last synced: 01 Nov 2024
https://github.com/mcstreetguy/crawler
An advanced web-crawler written in PHP.
composer composer-library crawler crawler-engine guzzle http-requests php php-7 php-library web-crawler webcrawler
Last synced: 12 Oct 2024
https://github.com/huzecong/film-spider
Spiders for crawling film listing websites.
Last synced: 12 Nov 2024
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 11 Oct 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 07 Nov 2024
https://github.com/tokenmill/crawling-framework-example
Demonstration of how to use the Crawling Framework to set up a simple science news crawler and store the results in Elasticsearch. Use this configuration to set up your own crawler.
crawler crawling-framework elasticsearch storm-crawler
Last synced: 10 Nov 2024
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a GitHub organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/roccomuso/is-duckduck
Verify that a request is from DuckDuckBot, the Web crawler for DuckDuckGo
crawler duckduck duckduckbot duckduckgo ip js nodejs verify web
Last synced: 17 Oct 2024
https://github.com/archan937/webhead
An easy-to-use Node web crawler storing cookies, following redirects, traversing pages and submitting forms.
api cookies crawler fetch file-uploads forms headless json node redirects scraper spider traversing
Last synced: 10 Nov 2024
https://github.com/roccomuso/is-baidu
Verify that a request is from Baidu crawlers using DNS verification
baidu crawler dns ip js nodejs verification
Last synced: 17 Oct 2024
https://github.com/testica/a3hrgo-sdk
a3HRgo SDK to automate your reports
a3hrgo crawler javascript puppeteer
Last synced: 10 Oct 2024
https://github.com/oldkingcone/pbandj
PasteBin Crawler, crawls the url https://pastebin.com/archive
crawler headless headless-chrome python python-crawler selenium-python selenium-webdriver
Last synced: 16 Nov 2024
https://github.com/v-braun/hero-scrape
Find the hero (main) image of a URL
crawler fastimage hero hero-image opengraph webscraping
Last synced: 15 Nov 2024
https://github.com/1uc1f3r616/dark-net-websites-dataset
Dataset of Onion Websites
crawler darknet data-analysis dataset onion search-engine website
Last synced: 11 Nov 2024
https://github.com/gatenlp/wpextract
Create datasets from WordPress sites for research or archiving
corpus crawler nlp text-extraction text-mining web-scraping wordpress
Last synced: 13 Nov 2024
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 04 Nov 2024
https://github.com/vivekg13186/easy_web_crawler
Web crawler built around Puppeteer to crawl AJAX/JavaScript-enabled pages.
Last synced: 28 Oct 2024
https://github.com/sieep-coding/web-crawler
A simple web crawler implemented in Go.
Last synced: 08 Nov 2024
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/spa5k/quick-scraper
An easy, lightweight scraper built with TypeScript for a good developer experience.
crawler dx easy-to-use esbuild scraper typescript
Last synced: 13 Nov 2024
https://github.com/pyaesoneaungrgn/2d-crawler
2D crawler for set.or.th
2d 2d-crawler crawler myanmar php
Last synced: 09 Nov 2024
https://github.com/igeligel/TeamFortressOutpostApi
🔁 An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 13 Nov 2024
https://github.com/restuwahyu13/node-scraper-content
Example Node.js scraper that collects programming content using Puppeteer
crawler nodejs puppeter scrapper
Last synced: 09 Nov 2024
https://github.com/ozansz/github-crawler
A basic utility for crawling users and their e-mail addresses
Last synced: 16 Oct 2024
https://github.com/chenmozhijin/mediawikiextractor
A Python script for extracting data from a MediaWiki website and saving it as JSON.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 09 Oct 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scrape data from the Bahamut forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/capturr/price-extract
A performant way to extract a price amount and its metadata (currency, decimal and thousands separators) from any string.
amount crawler crawling currencies currency extract extractor javascript nodejs parser parsing price scraper scraping spider typescript
Last synced: 10 Nov 2024
https://github.com/thiiagoms/dict-crawler
Simple crawler for the UOL dictionary
beautifulsoup4 crawler dic python pythonic
Last synced: 15 Nov 2024
https://github.com/hrvadl/goweekly
Application for querying top articles from https://golangweekly.com/, translating them into Ukrainian, and sending them to a Telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download dataset from iStockPhoto.
Last synced: 31 Oct 2024
https://github.com/xdk78/grabbi
grabbi: a simple web scraper/crawler
crawler html scraper web-scraper
Last synced: 23 Oct 2024
https://github.com/hctilg/pinterest-crawler
Downloads all images matching a search
Last synced: 07 Nov 2024
https://github.com/bitebait/curry
🍛 Curry is a web crawler written in Golang that checks the US dollar to Brazilian real exchange rate (USDxBRL) at some stores in Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 14 Nov 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 14 Nov 2024
https://github.com/yakuza8/coronavirus-timeseries-predictor
Time-series analyzer for coronavirus data using a recurrent neural network
asyncio beautifulsoup4 corona coronavirus coronavirus-analysis coronavirus-crawler coronavirus-dataset covid covid-19 covid19-data crawler python-3-6 python3 python36 rnn web-scrapper
Last synced: 12 Oct 2024
https://github.com/jmkim/stock-crawler
Universal Stock Crawler
crawler stock stock-market yahoo-finance
Last synced: 13 Oct 2024
https://github.com/rodyherrera/cdrake-se
✨ Search the internet for free and without limits, with no APIs involved. Find videos, images, sites, books, and more using the different engines provided by the library, such as Bing, Google, Yahoo, Wikipedia, YouTube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 06 Nov 2024
https://github.com/fanyong920/crawlitem-puppeteer
Example of scraping product listings with Puppeteer
chromnium crawler javascript nodejs puppeteer scrapy
Last synced: 05 Nov 2024
https://github.com/akagi201/spy
A lightweight distributed web crawler
crawler distributed lightweight nsq
Last synced: 11 Nov 2024
https://github.com/aicore/app_info_extracter
An application for extracting information about apps from the internet
android appreview apps crawler googleplaystore
Last synced: 13 Nov 2024
https://github.com/krishpranav/spider
A Ruby web spidering tool that can spider a single site, multiple domains, or specific links, or crawl indefinitely
crawler ruby spider web-crawler web-scraping
Last synced: 15 Oct 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Oct 2024
https://github.com/gill-singh-a/crawler
A program that crawls the web starting from a given page, following the internal links it finds and searching them for keywords
crawler multithreading osint python python3 requests scraper
Last synced: 09 Nov 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 31 Oct 2024
https://github.com/galaxiat/galaxiat.serve.seo
Node.js package to serve a React app and prerender paths (cron)
crawler cron puppeteer seo seo-optimization ssr
Last synced: 05 Nov 2024
https://github.com/jofaval/webscraping
Web scraper providing tools to scrape many websites that share the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 21 Oct 2024
https://github.com/truethari/fcrawler
Python application that can be used to copy files of a given file type from a folder directory.
copy copy-files crawl crawler crawler-python file files
Last synced: 10 Nov 2024
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS, and Elasticsearch
angularjs crawler elasticsearch java search-engine
Last synced: 10 Nov 2024
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 14 Nov 2024
https://github.com/kokseen1/chii
A minimal marketplace bot maker.
auction automation bidding bot carousell crawler ecommerce marketplace python python-telegram-bot scraper telegram telegram-bot web-scraping yahoo yahoo-auction
Last synced: 13 Nov 2024
https://github.com/tbarnes94/fortnite-weapons-bot
A bot that returns Fortnite weapon statistics based on input from Discord users. Written in TypeScript.
crawler discord discord-bot discord-js typescript2
Last synced: 15 Oct 2024
https://github.com/erikmueller/jazmax
Crawls JAZ values for different heat pumps, depending on flow and return temperatures, from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 14 Oct 2024
https://github.com/sean2077/leetcode_anki
Leetcode Anki card factory.
anki crawler leetcode leetcode-anki scrapy
Last synced: 12 Nov 2024
https://github.com/xiantang/mini_scrapy
A lightweight crawler framework modeled on Scrapy
crawler python3 requets scrapy
Last synced: 15 Oct 2024
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/natshah/natshah-crawler
Natshah Crawler crawls a selected domain with all its internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 26 Oct 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Utilizes powerful goroutines to clone websites to your computer within seconds.
Last synced: 31 Oct 2024
https://github.com/manojahi/is-there-any-song-reference-in-article
Tells you whether an article on a website contains any song references.
crawler lyrics-search python webscraping
Last synced: 08 Nov 2024