Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hrbrmstr/spiderbar
Lightweight R wrapper around rep-cpp for robots.txt (Robots Exclusion Protocol) parsing and path testing
r r-cyber robots-exclusion-protocol robots-txt rstats
Last synced: 21 Jun 2024
![](https://github.com/hrbrmstr.png)
https://github.com/PhrozenByte/pico-robots
This is Pico's official robots plugin to add a robots.txt and sitemap.xml to your website. Pico is a stupidly simple, blazing fast, flat file CMS.
pico pico-robots picocms picocms-plugin robots robots-txt sitemap sitemap-xml
Last synced: 07 Jun 2024
![](https://github.com/PhrozenByte.png)
https://github.com/kyr0/astro-launchpad
An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading preconfigured
astro authentication bootstrap i18next microservices preact robots-txt scaffold sitemap-xml template
Last synced: 07 Jun 2024
![](https://github.com/kyr0.png)
https://github.com/ameygawade/streamlit-robots_txt_generator
This Streamlit app lets users generate and customize a robots.txt file by selecting user-agents, specifying disallowed paths, enabling a crawl delay, and providing a sitemap URL.
config data-science front generative generator google robots-txt search-algorithm search-engine seo seo-optimization stream streamlit txt-files web webapp webapplication
Last synced: 02 Jun 2024
![](https://github.com/ameygawade.png)
https://github.com/LuXDAmore/nuxt-humans-txt
🧑🏻👩🏻 "We are people, not machines" - An initiative to know the creators of a website. Contains information about the humans behind the building of the site - A Nuxt module to statically integrate and generate a humans.txt author file - Based on the HumansTxt Project.
author humans humans-txt modules nuxt nuxt-module nuxtjs robots robots-txt static vuejs
Last synced: 01 Jun 2024
![](https://github.com/LuXDAmore.png)
https://github.com/liameno/librengine
Privacy Web Search Engine (not meta, own crawler)
cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine
Last synced: 21 May 2024
![](https://github.com/liameno.png)
https://github.com/mdreizin/gatsby-plugin-robots-txt
Gatsby plugin that automatically creates robots.txt for your site
gatsby gatsby-plugin robots-txt
Last synced: 11 May 2024
![](https://github.com/mdreizin.png)
https://github.com/beb7/gflare-tk
Open-source, Python-based SEO web crawler
crawler python robots-txt scraper seo seo-crawler tkinter
Last synced: 10 May 2024
![](https://github.com/beb7.png)
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 07 May 2024
![](https://github.com/adileo.png)
https://github.com/TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 05 May 2024
![](https://github.com/TurnerSoftware.png)
https://github.com/emacs-php/robots-txt-mode
Emacs major mode for editing robots.txt
emacs major-mode melpa robots-txt
Last synced: 13 Apr 2024
![](https://github.com/emacs-php.png)
https://github.com/LexiestLeszek/scrapeGPT
ScrapeGPT is a Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot uses Retrieval Augmented Generation and web scraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 11 Apr 2024
![](https://github.com/LexiestLeszek.png)
https://github.com/stovv/next-strapi-sitemap
Generate a sitemap and robots.txt for Next.js using a webhook from Strapi
nextjs robots-txt sitemap strapi
Last synced: 09 Apr 2024
![](https://github.com/stovv.png)
https://github.com/PuerkitoBio/fetchbot
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Last synced: 27 Mar 2024
![](https://github.com/PuerkitoBio.png)
https://github.com/PuerkitoBio/gocrawl
Polite, slim and concurrent web crawler.
Last synced: 27 Mar 2024
![](https://github.com/PuerkitoBio.png)
https://github.com/php-middleware/block-robots
Middleware to avoid search engine indexing with PSR-7 using robots.txt and X-Robots-Tag
google middleware psr-15 psr-7 robots-txt seo
Last synced: 25 Mar 2024
![](https://github.com/php-middleware.png)
https://github.com/cyb3r3x3r/chanakya
Scan websites for multiple things, such as honeypots, whois records, and open ports.
honeypot nmap portscan robots-txt scan-tool webscanner website whois whois-lookup
Last synced: 23 Mar 2024
![](https://github.com/Cyb3r3x3r.png)
https://github.com/eliasdabbas/advertools
advertools - online marketing productivity and analysis tools
advertising adwords digital-marketing google-ads keywords log-analysis logfile-parser marketing online-marketing python robots-txt scrapy search-engine-marketing search-engine-optimization seo seo-crawler serp social-media twitter-api youtube
Last synced: 17 Mar 2024
![](https://github.com/eliasdabbas.png)
https://github.com/spatie/robots-txt
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Last synced: 16 Mar 2024
![](https://github.com/spatie.png)
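
Several of the projects listed above test whether a path may be crawled under a site's robots.txt rules. As a rough illustration of that idea, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not taken from any of the listed repositories; the robots.txt content, the `ExampleBot` user-agent, and the `example.com` URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user-agent; it falls under the "*" rule group.
allowed_public = parser.can_fetch("ExampleBot", "https://example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "https://example.com/private/data")
delay = parser.crawl_delay("ExampleBot")
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print(allowed_public, allowed_private, delay, sitemaps)
```

A polite crawler would typically run a check like `can_fetch` before each request and wait at least the reported crawl delay between requests to the same host.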