An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with robots-txt

A curated list of projects in awesome lists tagged with robots-txt.

https://github.com/PuerkitoBio/gocrawl

Polite, slim and concurrent web crawler.

crawler robots-txt

Last synced: 25 Mar 2025

https://github.com/puerkitobio/fetchbot

A simple and flexible web crawler that follows the robots.txt policies and crawl delays.

crawler robots-txt

Last synced: 26 Jun 2025

https://github.com/nuxt-modules/robots

Tame the robots crawling and indexing your Nuxt site.

nuxt nuxt-module robots-txt ssr vuejs

Last synced: 13 Apr 2025

https://github.com/temoto/robotstxt

The robots.txt exclusion protocol implementation for the Go language

go go-library golang golang-library production-ready robots-txt status-active web

Last synced: 15 May 2025

https://github.com/turnersoftware/infinitycrawler

A simple but powerful web crawler library for .NET

crawler robots-txt spider web-crawler web-crawling

Last synced: 21 Jun 2025

https://github.com/spatie/robots-txt

Determine if a page may be crawled from robots.txt, robots meta tags and robot headers

crawler php robots-txt

Last synced: 14 May 2025

https://github.com/alexjc/weboptout

Opt-Out tool to check Copyright reservations in a way that even machines can understand.

command-line-tool copyright data-ops ml-pipeline opt-out robots-txt terms-of-service webscraping

Last synced: 07 Apr 2025

https://github.com/beb7/gflare-tk

Open-Source Python Based SEO Web Crawler

crawler python robots-txt scraper seo seo-crawler tkinter

Last synced: 07 May 2025

https://github.com/healsdata/ai-training-opt-out

Known tags and settings suggested to opt out of having your content used for AI training.

ai meta opt-out robots-txt

Last synced: 25 Nov 2024

https://github.com/alextim/astro-lib

Makes it easy to add robots.txt, sitemap and web app manifest during build to your Astro app.

astro robots-txt robotstxt seo sitemap sitemap-xml webmanifest

Last synced: 06 Apr 2025

https://github.com/jimsmart/grobotstxt

grobotstxt is a native Go port of Google's robots.txt parser and matcher library.

go robots-exclusion-protocol robots-txt

Last synced: 19 Apr 2025

https://github.com/mdreizin/gatsby-plugin-robots-txt

Gatsby plugin that automatically creates robots.txt for your site

gatsby gatsby-plugin robots-txt

Last synced: 04 Apr 2025

https://github.com/samber/the-great-gpt-firewall

🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs

agent anthropic blocklist censorship crawler firewall genai generative-ai gpt gpt-4 llm openai robots-txt user-agent

Last synced: 07 Apr 2025

https://github.com/LexiestLeszek/scrapeGPT

ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to return natural language answers to the user's queries.

crawler huggingface large-language-models llm ollama proxy rag retrieval-augmented-generation robots-txt scraper telegram-bot website-scraper

Last synced: 07 Apr 2025

https://github.com/t1gor/robots.txt-parser-class

PHP class for parsing robots.txt

google parser php robots-txt w3c yandex

Last synced: 05 Apr 2025

https://github.com/ekalinin/robots.js

Parser for robots.txt for node.js

javascript nodejs parser robots robots-txt

Last synced: 24 Apr 2025

https://github.com/scrapy/protego

A pure-Python robots.txt parser with support for modern conventions.

hacktoberfest python robots-parser robots-txt

Last synced: 16 May 2025

https://github.com/kyr0/astro-launchpad

An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading preconfigured

astro authentication bootstrap i18next microservices preact robots-txt scaffold sitemap-xml template

Last synced: 22 Jun 2025

https://github.com/mhmdiaa/waybackrobots

Enumerate old versions of robots.txt paths using Wayback Machine for content discovery

content-discovery recon robots-txt wayback-machine

Last synced: 19 Apr 2025

https://github.com/itgalaxy/robotstxt-webpack-plugin

A webpack plugin to generate a robots.txt file

robots-txt robotstxt webpack webpack-plugin

Last synced: 05 May 2025

https://github.com/turnersoftware/robotsexclusiontools

A "robots.txt" parsing and querying library for .NET

norobots-rfc parse parser robots-txt user-agent

Last synced: 21 Jun 2025

https://github.com/LuXDAmore/nuxt-humans-txt

🧑🏻👩🏻 "We are people, not machines" - An initiative to know the creators of a website. Contains information about the humans behind the building of the website - A Nuxt Module to statically integrate and generate a humans.txt author file - Based on the HumansTxt Project.

author humans humans-txt modules nuxt nuxt-module nuxtjs robots robots-txt static vuejs

Last synced: 30 Mar 2025

https://github.com/engincanv/seohelper

This package helps you to add meta-tags, sitemap.xml and robots.txt into your project easily.

dotnet dotnet-core nuget-package robots-txt seo sitemap-generator

Last synced: 12 Apr 2025

https://github.com/abdellahrk/seobundle

A complete SEO solution for Symfony projects. This bundle handles meta tags, Open Graph, Twitter Cards, canonical URLs, sitemaps, and more—helping your app stay search-engine friendly and socially shareable out of the box.

canonical-urls meta-tags meta-tags-management open-graph robots-txt search-engine-optimization seo seo-bundle sitemaps social-sharing symfony-seo twitter-cards webmaster-tools

Last synced: 15 Jun 2025

https://github.com/p0dalirius/robotsvalidator

A Python script to check if URLs are allowed or disallowed by a robots.txt file.

allow bugbounty bypass check disallow robots-txt web

Last synced: 30 Dec 2024

https://github.com/bnomei/kirby3-robots-txt

Manage the robots.txt from the Kirby config file

kirby-cms kirby-plugin kirby3 kirby4 kirby5 robots-exclusion-protocol robots-txt

Last synced: 21 Jun 2025

https://github.com/stormid/robotify-netcore

Provides robots.txt middleware for .NET Core

netcore robots-txt

Last synced: 18 Mar 2025

https://github.com/mguinea/laravel-robots

Laravel package to manage robots

laravel package robots robots-txt seo seotools

Last synced: 29 Apr 2025

https://github.com/fooock/robots.txt

🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

antlr4 api crawler crawler-engine docker docker-compose gradle java kotlin makefile postgresql redis redis-stream redis-streams robots-parser robots-txt spiders spring-boot

Last synced: 18 Mar 2025

https://github.com/momenbasel/pyrobots

A tool that gets all paths listed in robots.txt and opens them in the browser.

bugbounty penetration-testing pentesting python python3 robots-txt

Last synced: 13 Feb 2025

https://github.com/acp-code/astro-robots

A reliable robots.txt generator for Astro projects, offering zero-config setup and Verified Bots support.

astro-integration robots-txt verified-bots

Last synced: 08 Apr 2025

https://github.com/tractorcow/silverstripe-robots

Simple robots generation module for Silverstripe (SS 4 and above)

robots robots-txt silverstripe silverstripe-4

Last synced: 14 Apr 2025

https://github.com/progressplanner/eco-friendly-robots-txt

Optimizes your site's robots.txt to reduce server load and CO2 footprint by blocking unnecessary crawlers while allowing major search engines and specific tools.

robots-txt wordpress wordpress-plugin

Last synced: 20 Mar 2025

https://github.com/hrbrmstr/spiderbar

Lightweight R wrapper around rep-cpp for robots.txt (Robots Exclusion Protocol) parsing and path testing in R

r r-cyber robots-exclusion-protocol robots-txt rstats

Last synced: 16 Mar 2025

https://github.com/Lexxrt/Blue

🕵️‍♂️ɪɴғᴏʀᴍᴀᴛɪᴏɴ ɢᴀᴛʜᴇʀɪɴɢ ᴛᴏᴏʟ🕵️‍♂️

clickjacking dns-lookup geolocation haveibeenpwned http http-grabber info information-gathering link-grabber nmap port-scanner python robots-txt traceroute whois-lookup

Last synced: 30 Mar 2025

https://github.com/sobak/scrawler

Declarative, scriptable web robot (crawler) and scraper

crawler crawler-engine robots-txt scraper scraping-websites

Last synced: 25 Mar 2025

https://github.com/aleksandrhovhannisyan/eleventy-plugin-robotstxt

Generate a robots.txt file for your Eleventy site

11ty eleventy eleventy-plugin robots-txt

Last synced: 09 Feb 2025

https://github.com/phrozenbyte/pico-robots

This is Pico's official robots plugin to add a robots.txt and sitemap.xml to your website. Pico is a stupidly simple, blazing fast, flat file CMS.

pico pico-robots picocms picocms-plugin robots robots-txt sitemap sitemap-xml

Last synced: 07 May 2025

https://github.com/ecnepsnai/robots.txt-block-ai

A robots.txt that asks AI crawlers not to steal your content

against-ai robots-txt

Last synced: 11 Mar 2025

https://github.com/adileo/MicroFrontier

A lightweight crawler frontier implementation in TypeScript using Redis.

crawler frontier microservice redis robots-txt spider

Last synced: 07 May 2025

https://github.com/callumbwhyte/friendly-robots

A friendly tool for creating dynamic robots.txt files in Umbraco

dotnet robots-txt seo umbraco

Last synced: 12 Apr 2025

https://github.com/b4dnewz/robots-parse

A lightweight and simple robots.txt parser in node

osint parser robots-parser robots-txt

Last synced: 04 May 2025

https://github.com/a3onn/mapptth

A simple to use multi-threaded web-crawler written in C with libcURL and Lexbor.

c cmake gplv3 graphviz lexbor libcurl multi-threading robots-txt sitemap web-crawler

Last synced: 12 Apr 2025

https://github.com/glyn/nginx_robot_access

NGINX robot access module

hacktoberfest nginx robots-txt

Last synced: 13 Apr 2025

https://github.com/stovv/next-strapi-sitemap

Generate a sitemap and robots.txt for Next.js using a webhook from Strapi

nextjs robots-txt sitemap strapi

Last synced: 15 Mar 2025

https://github.com/cyb3r3x3r/chanakya

Scan websites for multiple things such as honeypots, whois records, open ports, etc.

honeypot nmap portscan robots-txt scan-tool webscanner website whois whois-lookup

Last synced: 12 May 2025

https://github.com/php-middleware/block-robots

Middleware to avoid search engine indexing with PSR-7 using robots.txt and X-Robots-Tag

google middleware psr-15 psr-7 robots-txt seo

Last synced: 13 Mar 2025

https://github.com/larevanchedessites/google-robotstxt-ruby

🤖 Ruby gem wrapper around Google Robotstxt Parser C++ library

c-plus-plus cpp gem google robots-parser robots-txt ruby ruby-gem rubygem rubygems seo

Last synced: 29 Jan 2025

https://github.com/emacs-php/robots-txt-mode

Emacs major mode for editing robots.txt

emacs major-mode melpa robots-txt

Last synced: 20 Mar 2025

https://github.com/eliasdabbas/robotstxt_app

Visual App for Testing URLs and User-agents blocked by robots.txt Files

dashboard plotly-dash python robots-parser robots-txt

Last synced: 14 Apr 2025

https://github.com/florianwendelborn/robogen

🤖 Robots.txt generator done right.

npm-package robots-generator robots-txt

Last synced: 22 Feb 2025

https://github.com/rimiti/robotstxt

Robots.txt parser and generator - Work in progress

golang-package robots-parser robots-txt

Last synced: 19 Feb 2025

https://github.com/muratgozel/robotstxt-util

RFC 9309 spec compliant robots.txt builder and parser. 🦾 No dependencies, fully typed.

rfc-5234 robots-builder robots-exclusion-protocol robots-generator robots-parser robots-txt

Last synced: 03 Dec 2024

https://github.com/antoinegagne/robots

A parser for robots.txt with support for wildcards. See also RFC 9309.

crawling erlang erlang-library parser parsing parsing-library rfc-9309 robots-exclusion-standard robots-parser robots-txt

Last synced: 17 Jun 2025

https://github.com/hrbrmstr/robotify

🤖 Browser extension to check for and preview a site's robots.txt in a new tab (if it exists)

browser-extension r-cyber robots-txt

Last synced: 05 Mar 2025

https://github.com/josecarneiro/mr-roboto

🤖 Handle and parse a site's robots.txt file and extract actionable information

extract parser parsing robots robots-txt

Last synced: 15 Mar 2025

https://github.com/simonw/datasette-block-robots

Datasette plugin that blocks robots and crawlers using robots.txt

datasette datasette-io datasette-plugin robots-txt

Last synced: 19 Apr 2025

https://github.com/apchavan/infopuller

Helpful CLI application to fetch useful details about website domains or local machine, using the core Windows OS functions.

autostart c cli-app cpp file-api ipv4 ipv6 mac-address malware-research operating-system persistence registry-hacks robots-txt threat win32-cpp windows windows-registry winsock2

Last synced: 01 Mar 2025

https://github.com/becklyn/robots-txt

A package for generating a robots.txt programmatically.

library php robots-txt

Last synced: 10 Apr 2025

https://github.com/spences10/robots-txt-syntax-highlighting

robots.txt syntax highlighting for VS Code

highlighting robots-txt syntax vscode vscode-extension

Last synced: 09 Apr 2025

https://github.com/infinityloop-dev/robots

🔧 Robots.txt generator component for Nette framework.

component nette php robots-txt

Last synced: 03 Dec 2024

https://github.com/vxern/robots_txt

⚙️ A quality `robots.txt` ruleset parser to ensure your application follows the standard specification for the file.

complete dart documented fast parser robots robots-txt robots-txt-parser robotstxt simple tiny

Last synced: 10 Apr 2025

https://github.com/r3k4t/pyrobotstxt

A simple Python program that finds the robots.txt file of any website.

robots-txt

Last synced: 11 Mar 2025

https://github.com/ptsochantaris/can-proceed

A small, tested, no-frills parser of robots.txt files in Swift.

robots-parser robots-txt server-side-swift swift web-clients

Last synced: 19 Feb 2025

https://github.com/raminf/robonope-nginx

Take control of your own content. Enforce access to disallowed web URLs.

ai artificial-intelligence crawling machine-learning nginx robots robots-txt server spider web

Last synced: 21 Mar 2025

https://github.com/schnti/kirby-robots

Kirby 3 CMS plugin that adds a route for robots.txt

cms getkirby kirby3-cms kirby3-plugin robots-txt

Last synced: 17 Mar 2025

https://github.com/advanced-astro/rocketbase

🚀 This Astro template offers more than 'Just the Basics', providing a superior option for starting your next project with best practices and a set of essential integrations already built-in.

astro astro-build astro-template astro-theme jamstack jamstack-theme robots-txt sitemap-xml starter-kit static-site-generator template

Last synced: 09 Feb 2025

https://github.com/zvdy/parsero-go

Parsero is a free script written in Golang which reads the Robots.txt file of a web server and looks at the Disallow entries. The Disallow entries tell the search engines what directories or files hosted on a web server mustn't be indexed.

cybersecurity golang http robots-txt

Last synced: 30 Mar 2025

https://github.com/rimiti/robotizer

Robots.txt parser / generator

generator parser robots-parser robots-txt robotstxt

Last synced: 19 Feb 2025

https://github.com/james-see/random-robots-txt

Generates a random robots.txt deny list to throw script kiddies off the scent.

bot-blocker python3 robots-txt security-tools web-security

Last synced: 31 Mar 2025

https://github.com/thedaviddias/llms-txt-hub

🤖 The central hub for AI-ready documentation and tools implementing the llms.txt standard.

directory llms llms-txt llmstxt nextjs robots-txt supabase supabase-auth taiwindcss

Last synced: 28 Feb 2025

https://github.com/honzahommer/request-robots

An express.js middleware for handling noisy robots.txt

express express-middleware middleware nodejs npm request robots-txt

Last synced: 09 Apr 2025

https://github.com/ameygawade/streamlit-robots_txt_generator

This Streamlit app allows users to generate and customize a robots.txt file by selecting user-agents, specifying disallowed paths, enabling crawler delay, and providing a sitemap URL.

config data-science front generative generator google robots-txt search-algorithm search-engine seo seo-optimization stream streamlit txt-files web webapp webapplication

Last synced: 04 Dec 2024

https://github.com/zorger27/3dconfigurator

👑 A 3D configurator is an innovative online technology that enables users to interact with 3D product models in real-time. 💎 It’s a powerful tool for businesses that allows your customers to customize products to their preferences. 🌈

css3 favicon flexbox-css git github google-analytics html5 i18n javascript markdown nodejs open-graph-protocol robots-txt scss search-console sitemap-xml threejs vuejs vuex webpack

Last synced: 02 Mar 2025

https://github.com/s-thom/create-robots-txt-action

An action to create a robots.txt file from different sources

action actions gh-action gh-actions github-action github-actions robots-txt robotstxt

Last synced: 02 Mar 2025

https://github.com/kevinbenabdelhak/wp-robots-txt-editor

WP Robots txt Editor is a WordPress plugin for editing the robots.txt file from a simple options page. Generate a default robots.txt and access many features, such as selecting posts, categories, parent/child pages, and more.

php robots-txt seo-optimization wordpress-plugin

Last synced: 03 Mar 2025

https://github.com/nickserv/crediblock

Open source robots.txt denylist for uncredited AI crawlers

ai robots-txt

Last synced: 18 Mar 2025

https://github.com/zorger27/weather

☀️ Custom-built weather forecasting web app that delivers real-time data from OpenWeather for any city worldwide. 🌈 Whether you're a tech enthusiast or just curious about the weather, this app has something for everyone! ⛄️

axios favicon flexbox-css git google-analytics grid-css html5 i18n javascript markdown open-graph-protocol robots-txt scss search-console sitemap-xml threejs typescript vuejs vuex webpack

Last synced: 20 Mar 2025