Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Crawler
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
- GitHub: https://github.com/topics/crawler
- Wikipedia: https://en.wikipedia.org/wiki/Web_crawler
- Last updated: 2024-11-08 00:06:12 UTC
- JSON Representation
https://github.com/agmmnn/nis-scraper
Scrapy script to scrape nisanyansozluk.com
Last synced: 04 Nov 2024
https://github.com/vivekg13186/easy_web_crawler
Web crawler around puppeteer to crawler ajax/java script enabled pages.
Last synced: 28 Oct 2024
https://github.com/chenmozhijin/mediawikiextractor
一个用于从 MediaWiki 网站中提取数据并保存为json的 Python 脚本。|A Python script for extracting data from a MediaWiki website and saving it as json.
crawler crawler-python crawling extractor json mediawiki python regex web-crawler
Last synced: 09 Oct 2024
https://github.com/hrvadl/goweekly
Application for querying top articles from https://golangweekly.com/, translating them to Ukrainian and sending to the telegram channel
article chatgpt crawler go golang openai-api telegram telegram-bot
Last synced: 13 Oct 2024
https://github.com/igeligel/TeamFortressOutpostApi
:repeat: An API wrapper for the TF2 Outpost platform. A platform to find great deals for your Team Fortress 2, Counter-Strike: Global Offensive and Dota 2 items with zero hassle.
bot bot-framework crawler steam steam-api steambot teamfortress2
Last synced: 02 Aug 2024
https://github.com/dnlzrgz/winzig
A tiny search engine for personal use.
async cli crawler feeds lofi python python3 rss-feed rss-reader sqlalchemy sqlite sqlite3
Last synced: 05 Nov 2024
https://github.com/qin2dim/istockphoto-go
📸 Gracefully download dataset from iStockPhoto.
Last synced: 31 Oct 2024
https://github.com/waynechang65/baha-crawler
baha-crawler is a web crawler module designed to scarp data from Bahamut Forum.
bahamut crawler javascript nodejs scraper spider webcrawler
Last synced: 19 Oct 2024
https://github.com/rimiti/ping-urls
🏓 Ping URLs by batch.
cache crawler ping prerender prerendering seo
Last synced: 07 Nov 2024
https://github.com/rodyherrera/cdrake-se
✨ Search through the internet for free and unlimited without APIs involved. Find videos, images, sites, books, among more resources using the different engines provided by the library such as Bing, Google Yahoo, Wikipedia, Youtube... Browse safely and privately with the CodexDrake Search Engine =).
bing crawler engine google images javascript metasearch metasearch-engine news nodejs privacy search-engine searx videos webscraping websearch websearchengine whoogle wikipedia youtube
Last synced: 06 Nov 2024
https://github.com/ruedigervoigt/salted
Smart, Asynchronous Link Tester with Database backend: works with HTML, Markdown and TeX files
asyncio crawler html-files hyperlinks latex linkchecker markdown pandoc python
Last synced: 11 Oct 2024
https://github.com/arshadkazmi42/github-scanner-local
Locally scan all the repositories of a github organization
bounty bug bug-bounty crawler github local no-api scanner
Last synced: 28 Oct 2024
https://github.com/sauerbraten/chef
Cube 2: Sauerbraten spy bot: collects IP-name combinations from extinfo and provides a web interface to search them.
crawler extinfo go sauerbraten spy stalker
Last synced: 02 Aug 2024
https://github.com/bitebait/curry
🍛 Curry é um WebCrawler escrito em Golang com finalidade de verificar o valor do câmbio de Dólar para Real (USDxBRL) em algumas lojas no Paraguay.
api brasil crawler currency-exchange-rates go golang paraguay webcrawler
Last synced: 02 Aug 2024
https://github.com/tikazyq/colly-crawlers
Crawlers using Golang-based web crawling framework Colly
Last synced: 11 Oct 2024
https://github.com/denrydu/baiduimagecrawler
自己写的两个用来爬取百度图片的脚本,方便CV研究者制作数据集。Two ways to download images from baidu, useful tool for making cv datasets!
Last synced: 07 Nov 2024
https://github.com/andreoliwa/scrapy-tegenaria
🕷🕸 Spiders to crawl ads of houses and apartments. 🏠 🏢
crawler flask postgresql python python3 scrapy
Last synced: 31 Oct 2024
https://github.com/imthaghost/gocloneold
Website Cloner - Utilizes powerful go routines to clone websites to your computer within seconds.
Last synced: 31 Oct 2024
https://github.com/galaxiat/galaxiat.serve.seo
Node.JS package to serve React app and prerender path (cron)
crawler cron puppeteer seo seo-optimization ssr
Last synced: 05 Nov 2024
https://github.com/tsonglew/spidreat
Article Spider with Python & Node.js :beetle:
Last synced: 31 Oct 2024
https://github.com/ysh329/stock-newspaper-crawler
[UNMAINTAINED]Crawl 4 kinds of finance newspaper corpus (from CCSTOCK.CN).
corpus crawled-data crawler database stock-newspaper-crawler
Last synced: 29 Oct 2024
https://github.com/qianbinbin/moebooru-crawler
Retrieve links of images from moebooru-based sites, like yande.re and konachan.com .
Last synced: 29 Oct 2024
https://github.com/norconex/committer-neo4j
Implementation of Norconex Committer for Neo4j.
crawler neo4j neo4j-committer norconex-committer
Last synced: 10 Oct 2024
https://github.com/marvnc/pixiv-dump
Pixiv Encyclopedia DB Dumps, updated daily
crawler database dump encyclopedia japanese pixiv pixiv-crawler pixiv-database scraping
Last synced: 26 Oct 2024
https://github.com/Juphex/SupremeBot
Demonstrates automated purchasing of the clothing brand "Supreme". This was a fun project and had no further application.
android chrome crawler kivy python3 webscraping windows
Last synced: 23 Oct 2024
https://github.com/zabuzard/mplogger
Saves marketprices for items, based on transactions, from the game 'http://www.freewar.de/' in a database by using a bot. Then processes the data and creates corresponding market price articles in 'http://www.fwwiki.de/'.
bot crawler database game mediawiki-api mmorpg mmorpg-freewar php saves-marketprices web-crawler wikipedia-api
Last synced: 31 Oct 2024
https://github.com/santhoshse7en/alcoholics-anonymous
Research Project to analyse the knowledge about Alcoholics Anonymous in public
aa-meetings alcoholics alcoholics-anonymous anonymous bs4 crawler data-extraction-and-pre-processing google-search-using-python news-crawler newspaper3k python the-hindu web-scraping without-api
Last synced: 11 Oct 2024
https://github.com/brunojppb/airport-crawler
Simple and powerful CLI app to get worldwide airport information in JSON format
Last synced: 11 Oct 2024
https://github.com/nazanin1369/searchengine
Implementing a search engine using Java, AngularJS and Elastic search
angularjs crawler elasticsearch java search-engine
Last synced: 11 Oct 2024
https://github.com/runnin-n-gunnin/geckofxinterceptrequestcaptureresponse
[GeckoFX/Firefox]: Shows how to Intercept request(s), capture response(s), customize GeckoPreferences, handle certificate errors, change useragent++.
browser cefsharp controls crawler crawling firefox gecko geckofx geckofx60 scraping webbrowser windows windowsforms winforms
Last synced: 13 Oct 2024
https://github.com/roccomuso/is-twitter
Verify that a request is from Twitter crawlers using DNS verification steps
bot crawler dns ip js nodejs twitter verification
Last synced: 17 Oct 2024
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Oct 2024
https://github.com/nakabonne/staticcollector
Application to analyze static files of competing sites
Last synced: 27 Oct 2024
https://github.com/nakabonne/netsurfer
netsurfer is a very lightweight scraping framework
Last synced: 27 Oct 2024
https://github.com/erikmueller/jazmax
Crawl JAZ for different heat pumps depending on flow and return temperatures from the JAZ calculator
crawler data-science efficiency green heatpump jaz
Last synced: 14 Oct 2024
https://github.com/eklem/browsercrawler
Crawling content from a site within the browser. A basis for i.e. a search solution for static sites.
crawler search-engine website-generation
Last synced: 15 Oct 2024
https://github.com/krishpranav/spider
A ruby web spidering tool that can spider a site, multiple domains, certain links or infinitely
crawler ruby spider web-crawler web-scraping
Last synced: 15 Oct 2024
https://github.com/tbarnes94/fortnite-weapons-bot
A bot that returns fortnite weapon statistics based on input from Discord users. Written in TypeScript.
crawler discord discord-bot discord-js typescript2
Last synced: 15 Oct 2024
https://github.com/xiantang/mini_scrapy
模仿scrapy的轻量级爬虫框架
crawler python3 requets scrapy
Last synced: 15 Oct 2024
https://github.com/congcoi123/crawler-sheis
A small crawler for getting data from the website: https://sheis.vn
crawler webcrawler webcrawling webscraper webscraping
Last synced: 20 Oct 2024
https://github.com/codeforequity-at/botium-crawler
Botium Crawler - Like a Website Crawler, just for Conversation Flows
Last synced: 20 Oct 2024
https://github.com/yidas/tw-stock-crawler-php
PHP Crawler for Taiwan Stock Data (台股資料爬蟲)
crawler stock taiwan taiwan-stock-information taiwan-stock-market
Last synced: 29 Oct 2024
https://github.com/gabrielrf/bsbdf
Telegram Public Channel
crawler python telegram telegram-channel telegraph
Last synced: 15 Oct 2024
https://github.com/jofaval/webscraping
WebScraper providing tools to scrape tons of websites with the same base
crawler e-commerce python scraper webscraper webscraping
Last synced: 21 Oct 2024
https://github.com/natshah/natshah-crawler
Natshah Crawler works to crawl a selected domain with all it's internal links and internal pages.
crawler database filter natshah-crawler
Last synced: 26 Oct 2024
https://github.com/zain-ul-din/lgu-crawler
LGU timetable Crawler
crawler lahore-garrison-university lahore-garrison-university-timetable
Last synced: 26 Oct 2024
https://github.com/gnujoow/crawl-repo
crawling github's repositories basic info
crawler github github-api python3
Last synced: 27 Oct 2024
https://github.com/nirjharlo/complete-google-seo-scan
WordPress Plugin with inbuilt SEO crawler
crawl-pages crawler seotools web-crawler web-spider wordpress wordpress-plugin
Last synced: 27 Oct 2024
https://github.com/marabesi/social-crawler
Easy way to find emails from social networks
crawler emails php social-crawler social-network
Last synced: 11 Oct 2024
https://github.com/polakosz/smf-scraper
You know, just for backup :smile: - The only so the best Simple Machines Forum C# scraper on GitHub :cat:
crawler csharp forum machines php scraper simple simplemachines smf
Last synced: 30 Oct 2024
https://github.com/sebi75/lightweight-sitemapper
A lightweight sitemapper written in typescript, built on top of fast-xml-parser and relying on few dependencies
Last synced: 27 Oct 2024
https://github.com/airtoxin/stackable-crawler
middleware based lightweight crawler framework
crawler javascript lightweight
Last synced: 06 Nov 2024
https://github.com/benderpan/fakeagent.net
Fake Agent for .Net Standard.
agent crawler fake-agent http-headers
Last synced: 05 Nov 2024
https://github.com/joelkoen/wls
Easily crawl multiple sitemaps and list URLs
Last synced: 07 Nov 2024
https://github.com/hoanle396/py-iconnect
crawler flask flask-application image-processing python
Last synced: 27 Oct 2024
https://github.com/e73b025/simple-python-url-crawler
Super simple Python3 website URL scraper/crawler. Multi-threaded.
crawler googlebot lightweight link-collection multi-threaded python python3 scraper simple
Last synced: 11 Oct 2024
https://github.com/mohabmes/matool
A collection of various custom tools. { Antesh, CITerm, INetSC, KADManga, Tomado }
cli codeigniter-terminal crawler mangareader markd markdown markdown-to-html parser readme scan-tool scanner-web
Last synced: 11 Oct 2024
https://github.com/spider-rs/web-crawling-guides
How to guides on web-crawling or scraping
agents ai-agents ai-scraping clean-markdown crawler fast-webcrawler html-to-markdown llm-webcrawler scraper web-scraping
Last synced: 05 Nov 2024
https://github.com/uzsoftic/ecommerce-web-crawler
WebCrawler for ecommerce sites
bot crawler crawler-php ecommerce laravel parser php php8
Last synced: 06 Nov 2024
https://github.com/marzzzello/gplaycrawler
(mirror) Discover apps by different mehtods. Mass download app packages and metadata.
crawler google-play google-play-store googleplay googleplaystore playstore playstoreapi scraper
Last synced: 05 Nov 2024
https://github.com/maxgio92/package-crawler
A package crawler for most known Linux distros
Last synced: 13 Oct 2024
https://github.com/gnuns/raspa
data mining stuff
crawler robot scraper web-scraper web-scraping web-spider
Last synced: 30 Oct 2024
https://github.com/piopi/behatcrawler
A Behat extension that crawls links on a website and executes user-defined function on each one of them.
behat behat-extension crawler php selenium-webdriver
Last synced: 01 Nov 2024
https://github.com/sinkaroid/webnovelcrawler
Simple PHPcurl and getRequest to grab Light Novel and WebNovel, then create parser with DOMpdf.
Last synced: 05 Nov 2024
https://github.com/geoffreybauduin/website-checker
Performs useful checks against a website, such as 404 errors reporting, structured data validation...
crawler seo structured-data web-spider website
Last synced: 06 Nov 2024
https://github.com/skylightqp/namu2csv
A namuwiki crawler that converts header to csv file for kartrider wiki
Last synced: 19 Oct 2024
https://github.com/dylanhogg/cloud-products
A package for getting cloud products and product descriptions from a cloud provider website.
aws cloud-products crawler data text-processing
Last synced: 27 Oct 2024
https://github.com/akashrajpurohit/node-crawler
Nodejs Crawler which scrapes a website on live domain and crawls to find all URL of the domain
crawler node-crawler nodejs url
Last synced: 06 Nov 2024
https://github.com/myconsciousness/metis
Metis main repository.
application client crawler crawling crawlwebpage educatable gui lerning logging programming-language python scrape scraping scraping-websites tkinter tkinter-gui tkinter-python
Last synced: 19 Oct 2024
https://github.com/coverified/spider
A microservice with web-crawler/spider capabilities which only follows and indexes urls of the provided host domain(s)
akka crawler graphql hacktoberfest microservice spider
Last synced: 06 Nov 2024
https://github.com/konradlinkowski/mailcrawler
Crawler to find emails in the websites
Last synced: 14 Oct 2024
https://github.com/h4r5h1t/crawlytics
A Python-based web crawling tool for data extraction and security analysis that supports various arguments for efficient crawling and outputs results in JSON format.
appsec crawler crawler-python mechanicalsoup security security-tools webcrawler
Last synced: 07 Nov 2024
https://github.com/jorgeparavicini/medalytik-python
Python crawlers for a job mediation firm
Last synced: 17 Oct 2024
https://github.com/orafaelfragoso/itunes-crawler
Retrieves information about an artist by crawling the iTunes API and iTunes Page
Last synced: 01 Nov 2024
https://github.com/idlesign/gallerycrawler
Generic crawling for galleries
crawler gallery images python3
Last synced: 30 Oct 2024
https://github.com/lykmapipo/producthunt-python-scrapy-scraper
Python Scrapy spiders that scrapes data from producthunt.com
crawler featured launch lykmapipo product producthunt python scraper scrapy spider webscraper
Last synced: 04 Nov 2024
https://github.com/joaopauloaramuni/python
Repo Python
crawler python scraping scrapy
Last synced: 04 Nov 2024
https://github.com/yordadev/fenrisjs
A NodeJS application that scrapes any links from a given input and outputs the results nicely into one of two files, external or internal file for further analysis.
analysis crawler link-collection link-crawler nodejs nodejs-application
Last synced: 11 Oct 2024