Projects in Awesome Lists tagged with robots-txt
A curated list of projects in awesome lists tagged with robots-txt.
https://github.com/PuerkitoBio/gocrawl
Polite, slim and concurrent web crawler.
Last synced: 25 Mar 2025
https://github.com/eliasdabbas/advertools
advertools - online marketing productivity and analysis tools
advertising adwords digital-marketing google-ads keywords log-analysis logfile-parser marketing online-marketing python robots-txt scrapy search-engine-marketing search-engine-optimization seo seo-crawler serp social-media twitter-api youtube
Last synced: 13 May 2025
https://github.com/puerkitobio/fetchbot
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Last synced: 26 Jun 2025
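Honoring robots.txt policies and crawl delays, as fetchbot describes, can be sketched in Python with the standard library's `urllib.robotparser`. This is a minimal illustration, not fetchbot's (Go) implementation; the robots.txt content, the `examplebot` user agent and the `fetch` callback are hypothetical. Note that `Crawl-delay` is a widely used but nonstandard directive that RFC 9309 does not define.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content; a real crawler would download it from
# the site's /robots.txt before crawling.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def polite_schedule(urls, user_agent="examplebot", default_delay=1.0):
    """Return (url, wait_seconds) pairs for the URLs robots.txt allows.

    wait_seconds is the Crawl-delay that applies to user_agent, or
    default_delay when the file sets none. Disallowed URLs are dropped.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    delay = parser.crawl_delay(user_agent) or default_delay
    return [(url, float(delay)) for url in urls if parser.can_fetch(user_agent, url)]


def crawl(urls, fetch, user_agent="examplebot"):
    """Fetch allowed URLs sequentially, pausing for the crawl delay after each one."""
    for url, wait in polite_schedule(urls, user_agent):
        fetch(url)  # fetch is a caller-supplied function (hypothetical)
        time.sleep(wait)
```

For example, `polite_schedule(["https://example.com/", "https://example.com/private/a"])` keeps only the first URL, paired with a 2-second delay.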
https://github.com/nuxt-modules/robots
Tame the robots crawling and indexing your Nuxt site.
nuxt nuxt-module robots-txt ssr vuejs
Last synced: 13 Apr 2025
https://github.com/temoto/robotstxt
The robots.txt exclusion protocol implementation for Go language
go go-library golang golang-library production-ready robots-txt status-active web
Last synced: 15 May 2025
https://github.com/turnersoftware/infinitycrawler
A simple but powerful web crawler library for .NET
crawler robots-txt spider web-crawler web-crawling
Last synced: 21 Jun 2025
https://github.com/spatie/robots-txt
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Last synced: 14 May 2025
https://github.com/gatenlp/ultimate-sitemap-parser
Ultimate Website Sitemap Parser
python python-3 python3 robots-txt sitemap sitemap-xml xml-sitemap xml-sitemap-parser
Last synced: 15 May 2025
https://github.com/alexjc/weboptout
Opt-out tool to check copyright reservations in a way that even machines can understand.
command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping
Last synced: 07 Apr 2025
https://github.com/beb7/gflare-tk
Open-Source Python Based SEO Web Crawler
crawler python robots-txt scraper seo seo-crawler tkinter
Last synced: 07 May 2025
https://github.com/healsdata/ai-training-opt-out
Known tags and settings suggested to opt out of having your content used for AI training.
Last synced: 25 Nov 2024
https://github.com/alextim/astro-lib
Makes it easy to add robots.txt, sitemap and web app manifest during build to your Astro app.
astro robots-txt robotstxt seo sitemap sitemap-xml webmanifest
Last synced: 06 Apr 2025
https://github.com/jimsmart/grobotstxt
grobotstxt is a native Go port of Google's robots.txt parser and matcher library.
go robots-exclusion-protocol robots-txt
Last synced: 19 Apr 2025
https://github.com/mdreizin/gatsby-plugin-robots-txt
Gatsby plugin that automatically creates robots.txt for your site
gatsby gatsby-plugin robots-txt
Last synced: 04 Apr 2025
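A robots.txt generator such as the Gatsby plugin above essentially serializes a list of per-user-agent policies into text. The Python sketch below shows that idea under stated assumptions: the `Policy` class and `build_robots_txt` function are hypothetical and do not mirror the plugin's configuration options.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Policy:
    """One group of rules for a user agent (hypothetical structure)."""
    user_agent: str
    allow: List[str] = field(default_factory=list)
    disallow: List[str] = field(default_factory=list)


def build_robots_txt(policies: List[Policy], sitemap: Optional[str] = None) -> str:
    """Serialize policies into robots.txt text, one group per policy."""
    lines: List[str] = []
    for policy in policies:
        lines.append(f"User-agent: {policy.user_agent}")
        lines += [f"Allow: {path}" for path in policy.allow]
        lines += [f"Disallow: {path}" for path in policy.disallow]
        if not policy.allow and not policy.disallow:
            # An empty Disallow conventionally means "everything is allowed".
            lines.append("Disallow:")
        lines.append("")  # a blank line separates groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).rstrip("\n") + "\n"
```

For instance, `build_robots_txt([Policy("*", disallow=["/admin/"])], sitemap="https://example.com/sitemap.xml")` yields a single `User-agent: *` group followed by a `Sitemap:` line.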
https://github.com/samber/the-great-gpt-firewall
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
agent anthropic blocklist censorship crawler firewall genai generative-ai gpt gpt-4 llm openai robots-txt user-agent
Last synced: 07 Apr 2025
https://github.com/LexiestLeszek/scrapeGPT
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.
crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper
Last synced: 07 Apr 2025
https://github.com/t1gor/robots.txt-parser-class
PHP class for parsing robots.txt
google parser php robots-txt w3c yandex
Last synced: 05 Apr 2025
https://github.com/liameno/librengine
Privacy Web Search Engine (not meta, own crawler)
cpp crawler encryption frontend privacy robots-txt rsa search-engine self-hosted spider websearch websearchengine
Last synced: 28 Apr 2025
https://github.com/ekalinin/robots.js
robots.txt parser for Node.js
javascript nodejs parser robots robots-txt
Last synced: 24 Apr 2025
https://github.com/scrapy/protego
A pure-Python robots.txt parser with support for modern conventions.
hacktoberfest python robots-parser robots-txt
Last synced: 16 May 2025
https://github.com/itgalaxy/generate-robotstxt
robots.txt generator for Node.js
cli generator-robots robot robots robots-generator robots-txt robotstxt
Last synced: 09 Apr 2025
https://github.com/kyr0/astro-launchpad
An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading preconfigured
astro authentication bootstrap i18next microservices preact robots-txt scaffold sitemap-xml template
Last synced: 22 Jun 2025
https://github.com/mhmdiaa/waybackrobots
Enumerate old versions of robots.txt paths using Wayback Machine for content discovery
content-discovery recon robots-txt wayback-machine
Last synced: 19 Apr 2025
https://github.com/itgalaxy/robotstxt-webpack-plugin
A webpack plugin to generate a robots.txt file
robots-txt robotstxt webpack webpack-plugin
Last synced: 05 May 2025
https://github.com/turnersoftware/robotsexclusiontools
A "robots.txt" parsing and querying library for .NET
norobots-rfc parse parser robots-txt user-agent
Last synced: 21 Jun 2025
https://github.com/LuXDAmore/nuxt-humans-txt
🧑🏻👩🏻 "We are people, not machines" - An initiative to get to know the creators of a website. Contains information about the people who built the site - A Nuxt Module to statically integrate and generate a humans.txt author file - Based on the HumansTxt Project.
author humans humans-txt modules nuxt nuxt-module nuxtjs robots robots-txt static vuejs
Last synced: 30 Mar 2025
https://github.com/engincanv/seohelper
Helps you easily add meta tags, sitemap.xml and robots.txt to your project.
dotnet dotnet-core nuget-package robots-txt seo sitemap-generator
Last synced: 12 Apr 2025
https://github.com/abdellahrk/seobundle
A complete SEO solution for Symfony projects. This bundle handles meta tags, Open Graph, Twitter Cards, canonical URLs, sitemaps, and more, helping your app stay search-engine friendly and socially shareable out of the box.
canonical-urls meta-tags meta-tags-management open-graph robots-txt search-engine-optimization seo seo-bundle sitemaps social-sharing symfony-seo twitter-cards webmaster-tools
Last synced: 15 Jun 2025
https://github.com/p0dalirius/robotsvalidator
A python script to check if URLs are allowed or disallowed by a robots.txt file.
allow bugbounty bypass check disallow robots-txt web
Last synced: 30 Dec 2024
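Checking whether URLs are allowed or disallowed by a robots.txt file, as robotsvalidator does, can be sketched with Python's standard `urllib.robotparser`. This is an illustrative sketch, not robotsvalidator's code; `check_urls` is a hypothetical helper and the rules in the usage note are made up.

```python
import urllib.robotparser


def check_urls(robots_lines, urls, user_agent="*"):
    """Map each URL to True (allowed) or False (disallowed) for user_agent.

    robots_lines is the robots.txt content as a list of lines.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return {url: parser.can_fetch(user_agent, url) for url in urls}
```

With the rules `["User-agent: *", "Disallow: /admin/", "Allow: /"]`, `https://example.com/` is allowed and `https://example.com/admin/login` is disallowed. Note that `urllib.robotparser` applies the first matching rule in file order and does not expand `*` or `$` wildcards in paths, so for overlapping Allow/Disallow rules its verdict can differ from an RFC 9309 longest-match parser.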
https://github.com/bnomei/kirby3-robots-txt
Manage the robots.txt from the Kirby config file
kirby-cms kirby-plugin kirby3 kirby4 kirby5 robots-exclusion-protocol robots-txt
Last synced: 21 Jun 2025
https://github.com/stormid/robotify-netcore
Provides robots.txt middleware for .NET Core
Last synced: 18 Mar 2025
https://github.com/mguinea/laravel-robots
Laravel package to manage robots
laravel package robots robots-txt seo seotools
Last synced: 29 Apr 2025
https://github.com/fooock/robots.txt
🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API
antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot
Last synced: 18 Mar 2025
https://github.com/momenbasel/pyrobots
A tool that gets all paths listed in robots.txt and opens them in the browser.
bugbounty penetration-testing pentesting python python3 robots-txt
Last synced: 13 Feb 2025
https://github.com/acp-code/astro-robots
A reliable robots.txt generator for Astro projects, offering zero-config setup and Verified Bots support.
astro-integration robots-txt verified-bots
Last synced: 08 Apr 2025
https://github.com/tractorcow/silverstripe-robots
Simple robots generation module for Silverstripe (SS 4 and above)
robots robots-txt silverstripe silverstripe-4
Last synced: 14 Apr 2025
https://github.com/progressplanner/eco-friendly-robots-txt
Optimizes your site's robots.txt to reduce server load and CO2 footprint by blocking unnecessary crawlers while allowing major search engines and specific tools.
robots-txt wordpress wordpress-plugin
Last synced: 20 Mar 2025
https://github.com/crwlrsoft/robots-txt
Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping
hacktoberfest robots-exclusion-protocol robots-exclusion-standard robots-txt robots-txt-parser web-crawling web-scraping
Last synced: 13 May 2025
https://github.com/hrbrmstr/spiderbar
Lightweight R wrapper around rep-cpp for robots.txt (Robots Exclusion Protocol) parsing and path testing in R
r r-cyber robots-exclusion-protocol robots-txt rstats
Last synced: 16 Mar 2025
https://github.com/Lexxrt/Blue
Information gathering tool
clickjacking dns-lookup geolocation haveibeenpwned http http-grabber info information-gathering link-grabber nmap port-scanner python robots-txt traceroute whois-lookup
Last synced: 30 Mar 2025
https://github.com/sobak/scrawler
Declarative, scriptable web robot (crawler) and scraper
crawler crawler-engine robots-txt scraper scraping-websites
Last synced: 25 Mar 2025
https://github.com/aleksandrhovhannisyan/eleventy-plugin-robotstxt
Generate a robots.txt file for your Eleventy site
11ty eleventy eleventy-plugin robots-txt
Last synced: 09 Feb 2025
https://github.com/phrozenbyte/pico-robots
This is Pico's official robots plugin to add a robots.txt and sitemap.xml to your website. Pico is a stupidly simple, blazing fast, flat file CMS.
pico pico-robots picocms picocms-plugin robots robots-txt sitemap sitemap-xml
Last synced: 07 May 2025
https://github.com/ecnepsnai/robots.txt-block-ai
A robots.txt that asks AI crawlers not to steal your content
Last synced: 11 Mar 2025
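A robots.txt that asks AI crawlers to stay away is just one `Disallow: /` group per crawler token plus a permissive default group. The Python sketch below builds such a file and checks it with `urllib.robotparser`. The agent list is an illustrative, non-exhaustive assumption (GPTBot, CCBot, Google-Extended and ClaudeBot are published crawler tokens), not the list shipped by this project, and robots.txt remains a request that crawlers may ignore.

```python
import urllib.robotparser

# Illustrative, non-exhaustive selection of AI crawler tokens (assumption).
AI_AGENTS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def ai_block_robots_txt(agents=AI_AGENTS):
    """Build a robots.txt that disallows the given agents and allows everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def agent_may_fetch(robots_txt, agent, url="https://example.com/article"):
    """Check access for a crawler's product token (e.g. "GPTBot").

    urllib.robotparser matches on the text before the first "/", so pass the
    bare token rather than a full "Mozilla/5.0 (...)" User-Agent header.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With the generated file, `agent_may_fetch(txt, "GPTBot")` is `False`, while an unlisted crawler such as `Googlebot` falls through to the `*` group and is allowed.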
https://github.com/adileo/MicroFrontier
A lightweight crawler frontier implementation in TypeScript using Redis.
crawler frontier microservice redis robots-txt spider
Last synced: 07 May 2025
https://github.com/callumbwhyte/friendly-robots
A friendly tool for creating dynamic robots.txt files in Umbraco
Last synced: 12 Apr 2025
https://github.com/b4dnewz/robots-parse
A lightweight and simple robots.txt parser for Node.js
osint parser robots-parser robots-txt
Last synced: 04 May 2025
https://github.com/a3onn/mapptth
A simple-to-use multi-threaded web crawler written in C with libcURL and Lexbor.
c cmake gplv3 graphviz lexbor libcurl multi-threading robots-txt sitemap web-crawler
Last synced: 12 Apr 2025
https://github.com/glyn/nginx_robot_access
NGINX robot access module
hacktoberfest nginx robots-txt
Last synced: 13 Apr 2025
https://github.com/stovv/next-strapi-sitemap
Generates a sitemap and robots.txt for Next.js using a webhook from Strapi
nextjs robots-txt sitemap strapi
Last synced: 15 Mar 2025
https://github.com/cyb3r3x3r/chanakya
Scans websites for multiple things, such as honeypots, WHOIS records and open ports.
honeypot nmap portscan robots-txt scan-tool webscanner website whois whois-lookup
Last synced: 12 May 2025
https://github.com/php-middleware/block-robots
Middleware to avoid search engine indexing with PSR-7 using robots.txt and X-Robots-Tag
google middleware psr-15 psr-7 robots-txt seo
Last synced: 13 Mar 2025
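The idea behind this middleware (serve a deny-all robots.txt and mark every response with `X-Robots-Tag`) can be sketched as a Python WSGI wrapper. This is an analogous illustration, not the project's PSR-7/PSR-15 PHP code; `deny_all_robots` is a hypothetical name and the `noindex, nofollow` header value is an assumed choice.

```python
def deny_all_robots(app):
    """Wrap a WSGI app so crawlers are asked not to crawl or index anything."""
    body = b"User-agent: *\nDisallow: /\n"

    def middleware(environ, start_response):
        # Answer /robots.txt directly with a deny-all file.
        if environ.get("PATH_INFO") == "/robots.txt":
            start_response("200 OK", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("X-Robots-Tag", "noindex, nofollow"),
            ])
            return [body]

        # For all other paths, append X-Robots-Tag to the wrapped app's headers.
        def start_with_header(status, headers, exc_info=None):
            headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return middleware
```

The robots.txt path is handled before the wrapped application runs, so the deny-all file is served even if the application has no such route; the header covers crawlers that fetch pages anyway.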
https://github.com/larevanchedessites/google-robotstxt-ruby
🤖 Ruby gem wrapper around Google Robotstxt Parser C++ library
c-plus-plus cpp gem google robots-parser robots-txt ruby ruby-gem rubygem rubygems seo
Last synced: 29 Jan 2025
https://github.com/austinsonger/sitemapsandrobotsaroundtheweb
Sitemaps and Robots.txt for websites around the world.
bug-bounty bugbounty ethical-hacking footprinting hacking information-gathering osint penetration-testing reconnaissance robots robots-txt scanning search searching security security-research sitemap sitemap-xml sitemaps webpentest
Last synced: 15 Mar 2025
https://github.com/emacs-php/robots-txt-mode
Emacs major mode for editing robots.txt
emacs major-mode melpa robots-txt
Last synced: 20 Mar 2025
https://github.com/amandeepmittal/robotize
Generates a robots.txt file
javascript nodejs npm npm-package robots robots-generator robots-txt
Last synced: 12 Apr 2025
https://github.com/eliasdabbas/robotstxt_app
Visual App for Testing URLs and User-agents blocked by robots.txt Files
dashboard plotly-dash python robots-parser robots-txt
Last synced: 14 Apr 2025
https://github.com/florianwendelborn/robogen
🤖 Robots.txt generator done right.
npm-package robots-generator robots-txt
Last synced: 22 Feb 2025
https://github.com/rimiti/robotstxt
Robots.txt parser and generator - Work in progress
golang-package robots-parser robots-txt
Last synced: 19 Feb 2025
https://github.com/muratgozel/robotstxt-util
RFC 9309 spec compliant robots.txt builder and parser. 🦾 No dependencies, fully typed.
rfc-5234 robots-builder robots-exclusion-protocol robots-generator robots-parser robots-txt
Last synced: 03 Dec 2024
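The core of RFC 9309 rule evaluation is "most specific match wins": among matching Allow/Disallow rules, the one with the longest path pattern (in octets) applies, an Allow wins a tie with an equally long Disallow, and no match means access is allowed. The Python sketch below implements only that step under stated assumptions: user-agent group selection and percent-encoding normalization are omitted, and `rfc9309_allowed` is a hypothetical helper, not this package's API.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a path pattern ('*' wildcard, trailing '$' end anchor) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def rfc9309_allowed(rules, path):
    """Decide access for one group's rules, given as (directive, pattern) pairs."""
    best_len, best_allow = -1, True  # no matching rule -> allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty rule path matches nothing
        if _pattern_to_regex(pattern).match(path):  # match() anchors at the start
            allow = directive.lower() == "allow"
            length = len(pattern.encode("utf-8"))  # specificity in octets
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow
```

Given `[("Disallow", "/private"), ("Allow", "/private/public"), ("Disallow", "/*.pdf$")]`, the path `/private/public/page` is allowed (the longer Allow wins), `/private/secret` is disallowed, and `/docs/file.pdf` is disallowed only when nothing follows `.pdf`.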
https://github.com/antoinegagne/robots
A parser for robots.txt with support for wildcards. See also RFC 9309.
crawling erlang erlang-library parser parsing parsing-library rfc-9309 robots-exclusion-standard robots-parser robots-txt
Last synced: 17 Jun 2025
https://github.com/hrbrmstr/robotify
🤖 Browser extension to check for and preview a site's robots.txt in a new tab (if it exists)
browser-extension r-cyber robots-txt
Last synced: 05 Mar 2025
https://github.com/josecarneiro/mr-roboto
🤖 Handle and parse a site's robots.txt file and extract actionable information
extract parser parsing robots robots-txt
Last synced: 15 Mar 2025
https://github.com/simonw/datasette-block-robots
Datasette plugin that blocks robots and crawlers using robots.txt
datasette datasette-io datasette-plugin robots-txt
Last synced: 19 Apr 2025
https://github.com/georgea93/crawley
Node.js web crawler
crawler depth es6 javascript node nodejs nodejs-web-crawler npm npm-module npm-package robots-txt sitemap web yarn
Last synced: 14 Mar 2025
https://github.com/enishant/domain-for-sale
A ready-to-use template for quickly starting to sell a domain with minimal setup.
domain lead-gathering lead-generation lead-generator leads robots-txt sell-domain seo-friendly seo-optimization seo-ready simple-website sitemap sitemap-xml website website-template
Last synced: 12 Mar 2025
https://github.com/apchavan/infopuller
Helpful CLI application to fetch useful details about website domains or local machine, using the core Windows OS functions.
autostart c cli-app cpp file-api ipv4 ipv6 mac-address malware-research operating-system persistence registry-hacks robots-txt threat win32-cpp windows windows-registry winsock2
Last synced: 01 Mar 2025
https://github.com/steeinru/php-robots
robots.txt generator
laravel-package php php7 robots robots-generator robots-txt steein-robots
Last synced: 02 Dec 2024
https://github.com/becklyn/robots-txt
A package for generating a robots.txt programmatically.
Last synced: 10 Apr 2025
https://github.com/spences10/robots-txt-syntax-highlighting
robots.txt syntax highlighting for VS Code
highlighting robots-txt syntax vscode vscode-extension
Last synced: 09 Apr 2025
https://github.com/thefrosty/wp-block-ai-scrapers
Block all known AI Data Scrapers.
ai-bots htaccess-rule nginx-conf php81 robots-txt wordpress wordpress-plugin
Last synced: 13 Mar 2025
https://github.com/infinityloop-dev/robots
🔧 Robots.txt generator component for Nette framework.
component nette php robots-txt
Last synced: 03 Dec 2024
https://github.com/vxern/robots_txt
⚙️ A quality `robots.txt` ruleset parser to ensure your application follows the standard specification for the file.
complete dart documented fast parser robots robots-txt robots-txt-parser robotstxt simple tiny
Last synced: 10 Apr 2025
https://github.com/r3k4t/pyrobotstxt
A simple Python program that finds any website's robots.txt file.
Last synced: 11 Mar 2025
https://github.com/ptsochantaris/can-proceed
A small, tested, no-frills parser of robots.txt files in Swift.
robots-parser robots-txt server-side-swift swift web-clients
Last synced: 19 Feb 2025
https://github.com/raminf/robonope-nginx
Take control of your own content. Enforce access to disallowed web URLs.
ai artificial-intelligence crawling machine-learning nginx robots robots-txt server spider web
Last synced: 21 Mar 2025
https://github.com/schnti/kirby-robots
Kirby 3 CMS plugin that adds a route for robots.txt
cms getkirby kirby3-cms kirby3-plugin robots-txt
Last synced: 17 Mar 2025
https://github.com/advanced-astro/rocketbase
🚀 This Astro template offers more than 'Just the Basics', providing a superior option for starting your next project with best practices and a set of essential integrations already built in.
astro astro-build astro-template astro-theme jamstack jamstack-theme robots-txt sitemap-xml starter-kit static-site-generator template
Last synced: 09 Feb 2025
https://github.com/beardedfish/vscode-robots-dot-txt-support
An extension for Visual Studio Code that enables support for robots.txt files. 🤖
commands extension intellisense language-server language-server-client language-server-protocol lsp mocha robots-txt snippets syntax-highlighting visual-studio-code vscode
Last synced: 14 May 2025
https://github.com/zvdy/parsero-go
Parsero is a free script written in Go that reads the robots.txt file of a web server and looks at the Disallow entries. The Disallow entries tell search engine crawlers which directories or files hosted on the web server they must not crawl.
cybersecurity golang http robots-txt
Last synced: 30 Mar 2025
https://github.com/rimiti/robotizer
Robots.txt parser / generator
generator parser robots-parser robots-txt robotstxt
Last synced: 19 Feb 2025
https://github.com/james-see/random-robots-txt
Generates a random robots.txt deny list to throw script kiddies off the scent.
bot-blocker python3 robots-txt security-tools web-security
Last synced: 31 Mar 2025
https://github.com/thedaviddias/llms-txt-hub
🤖 The central hub for AI-ready documentation and tools implementing the llms.txt standard.
directory llms llms-txt llmstxt nextjs robots-txt supabase supabase-auth taiwindcss
Last synced: 28 Feb 2025
https://github.com/rix4uni/robotxt
Extract endpoints marked as Allow and Disallow in robots.txt
bug-bounty bugbounty bugbountytips hacking infosec osint osint-resources osint-tool penetration-testing pentest-tool pentesting recon reconnaissance robots-txt security security-tools threat-intelligence
Last synced: 09 May 2025
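Extracting the endpoints marked as Allow and Disallow, as robotxt does, amounts to a line-by-line scan of the file. A minimal Python sketch, assuming a hypothetical `extract_endpoints` helper (not robotxt's code): comments are stripped, empty values are skipped, duplicates are dropped, and wildcard characters are left in the URLs unchanged.

```python
from urllib.parse import urljoin


def extract_endpoints(robots_txt, base_url):
    """Return {"allow": [...], "disallow": [...]} as absolute URLs, in file order."""
    found = {"allow": [], "disallow": []}
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name in found and value:  # skip other fields and empty rules
            url = urljoin(base_url, value)
            if url not in found[name]:
                found[name].append(url)
    return found
```

For example, a file containing `Disallow: /admin/` and `Allow: /admin/help` with base `https://example.com` yields `https://example.com/admin/` under `disallow` and `https://example.com/admin/help` under `allow`.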
https://github.com/honzahommer/request-robots
An Express.js middleware for handling noisy robots.txt
express express-middleware middleware nodejs npm request robots-txt
Last synced: 09 Apr 2025
https://github.com/camocatx/camocatx.github.io
The source code for Delete the Matrix blog - Exploiting, Experimenting, and Exploring the Universe
anybrowser gemfile gemfiles github-io-page github-pages hacker-blog hacker-blogs hacker-theme haxxor jekyll jekyll-blog jekyll-hacker jekyll-site jekyll-website matrix-blog matrix-theme personal-blog robots-txt seo-optimized the-matrix
Last synced: 13 Feb 2025
https://github.com/ameygawade/streamlit-robots_txt_generator
This Streamlit app allows users to generate and customize a robots.txt file by selecting user-agents, specifying disallowed paths, enabling crawler delay, and providing a sitemap URL.
config data-science front generative generator google robots-txt search-algorithm search-engine seo seo-optimization stream streamlit txt-files web webapp webapplication
Last synced: 04 Dec 2024
https://github.com/zorger27/3dconfigurator
👑 A 3D configurator is an innovative online technology that enables users to interact with 3D product models in real-time. 💎 It’s a powerful tool for businesses that allows your customers to customize products to their preferences. 🌈
css3 favicon flexbox-css git github google-analytics html5 i18n javascript markdown nodejs open-graph-protocol robots-txt scss search-console sitemap-xml threejs vuejs vuex webpack
Last synced: 02 Mar 2025
https://github.com/s-thom/create-robots-txt-action
An action to create a robots.txt file from different sources
action actions gh-action gh-actions github-action github-actions robots-txt robotstxt
Last synced: 02 Mar 2025
https://github.com/kevinbenabdelhak/wp-robots-txt-editor
WP Robots txt Editor is a WordPress plugin for editing the robots.txt file from a simple options page. Generate a default robots.txt and access many features, such as selecting posts, categories, parent/child pages, and more.
php robots-txt seo-optimization wordpress-plugin
Last synced: 03 Mar 2025
https://github.com/nickserv/crediblock
Open source robots.txt denylist for uncredited AI crawlers
Last synced: 18 Mar 2025
https://github.com/maximeguinard/robots.txt-viewer
🌐 A browser extension that displays the contents of a website's robots.txt and sitemap.xml files
extension extension-chrome extension-firefox extension-methods extension-pack extensions robots-txt robotstxt sitemap sitemap-xml sitemaps website website-builder website-design website-template websites
Last synced: 20 Mar 2025
https://github.com/zorger27/weather
☀️ Custom-built weather forecasting web app that delivers real-time data from OpenWeather for any city worldwide. 🌈 Whether you're a tech enthusiast or just curious about the weather, this app has something for everyone! ⛄️
axios favicon flexbox-css git google-analytics grid-css html5 i18n javascript markdown open-graph-protocol robots-txt scss search-console sitemap-xml threejs typescript vuejs vuex webpack
Last synced: 20 Mar 2025