{"id":49050631,"url":"https://github.com/PhialsBasement/LibreCrawl","last_synced_at":"2026-05-06T00:02:03.916Z","repository":{"id":316893686,"uuid":"1065249566","full_name":"PhialsBasement/LibreCrawl","owner":"PhialsBasement","description":"Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!","archived":false,"fork":false,"pushed_at":"2026-04-25T12:06:51.000Z","size":343,"stargazers_count":583,"open_issues_count":15,"forks_count":106,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-04-25T14:12:13.708Z","etag":null,"topics":["desktop-app","flask","free","open-source","python","seo","seo-analysis","web-crawler","website-auditing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PhialsBasement.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-27T10:46:19.000Z","updated_at":"2026-04-25T12:06:54.000Z","dependencies_parsed_at":"2025-09-27T12:27:40.524Z","dependency_job_id":"ca3cb876-9ef5-446f-a911-decee6ed3b21","html_url":"https://github.com/PhialsBasement/LibreCrawl","commit_stats":null,"previous_names":["phialsbasement/librecrawl"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/PhialsBasement/LibreCrawl","repository_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/PhialsBasement%2FLibreCrawl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhialsBasement%2FLibreCrawl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhialsBasement%2FLibreCrawl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhialsBasement%2FLibreCrawl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PhialsBasement","download_url":"https://codeload.github.com/PhialsBasement/LibreCrawl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PhialsBasement%2FLibreCrawl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32672682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-05T11:29:49.557Z","status":"ssl_error","status_checked_at":"2026-05-05T11:29:48.587Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["desktop-app","flask","free","open-source","python","seo","seo-analysis","web-crawler","website-auditing"],"created_at":"2026-04-19T20:00:33.807Z","updated_at":"2026-05-06T00:02:03.909Z","avatar_url":"https://github.com/PhialsBasement.png","language":"Python","funding_links":["https://www.paypal.com/donate/?business=7H9HFA3385JS8\u0026no_recurring=0\u0026item_name=Continue+the+development+of+LibreCrawl\u0026currency_code=AUD"],"categories":[":hammer_and_wrench: Standalone Open Source SEO Tools"],"sub_categories":[":gear: OpenClaw SEO Skills"],"readme":"# LibreCrawl\n\nA web-based multi-tenant crawler for SEO analysis and website auditing.\n\n🌐 **Website**: [librecrawl.com](https://librecrawl.com)\n\n**The demo is no longer available because people mistook it for a production environment. It isn't; it was only a demo to give you a taste before installing.**\n\n**API Documentation:** [https://librecrawl.com/api/docs/](https://librecrawl.com/api/docs/)\n\nLibreCrawl will ***always*** be free and open source. If it's replacing your $259/year Screaming Frog license, DeepCrawl license, or Sitebulb license, [buy me a coffee](https://www.paypal.com/donate/?business=7H9HFA3385JS8\u0026no_recurring=0\u0026item_name=Continue+the+development+of+LibreCrawl\u0026currency_code=AUD).\n\n## What it does\n\nLibreCrawl crawls websites and gives you detailed information about pages, links, SEO elements, and performance. 
It's built as a web application using Python Flask with a modern web interface supporting multiple concurrent users.\n\n## Features\n\n- 🚀 **Multi-tenancy** - Multiple users can crawl simultaneously with isolated sessions\n- 🎨 **Custom CSS styling** - Personalize the UI with your own CSS themes\n- 💾 **Browser localStorage persistence** - Settings saved per browser\n- 🔄 **JavaScript rendering** for dynamic content (React, Vue, Angular, etc.)\n- 📊 **SEO analysis** - Extract titles, meta descriptions, headings, etc.\n- 🔗 **Link analysis** - Track internal and external links with detailed relationship mapping\n- 📈 **PageSpeed Insights integration** - Analyze Core Web Vitals\n- 💾 **Multiple export formats** - CSV, JSON, or XML\n- 🔍 **Issue detection** - Automated SEO issue identification\n- ⚡ **Real-time crawling progress** with live statistics\n\n## Getting started\n\n### Quick Start (Automatic Installation)\n\n**The easiest way to run LibreCrawl** - just run the startup script and it handles everything:\n\n**Windows:**\n```batch\nstart-librecrawl.bat\n```\n\n**Linux/Mac:**\n```bash\nchmod +x start-librecrawl.sh\n./start-librecrawl.sh\n```\n\n**What it does automatically:**\n1. Checks for Docker - if found, runs LibreCrawl in a container (recommended)\n2. If no Docker, checks for Python - if not found, downloads and installs it (Windows only; *temporarily disabled because it causes issues with the batch script*)\n3. Installs all dependencies automatically (`pip install -r requirements.txt`)\n4. Installs Playwright browsers for JavaScript rendering\n5. Starts LibreCrawl in local mode (no authentication)\n6. 
Opens your browser to `http://localhost:5000`\n\n### Manual Installation\n\nIf you prefer to install manually or want more control:\n\n#### Option 1: Docker (Recommended)\n\n**Requirements:**\n- Docker and Docker Compose\n\n**Steps:**\n```bash\n# Clone the repository\ngit clone https://github.com/PhialsBasement/LibreCrawl.git\ncd LibreCrawl\n\n# Copy environment file\ncp .env.example .env\n\n# Start LibreCrawl\ndocker compose up -d\n\n# Open browser to http://localhost:5000\n```\n\nBy default, LibreCrawl runs in local mode for easy personal use. The `.env` file controls this:\n\n```bash\n# .env file\nLOCAL_MODE=true\nHOST_BINDING=127.0.0.1\nREGISTRATION_DISABLED=false\n```\n\nFor production deployment with user authentication, edit your `.env` file:\n\n```bash\n# .env file\nLOCAL_MODE=false\nHOST_BINDING=0.0.0.0\nREGISTRATION_DISABLED=false\n```\n\n\n#### Option 2: Python\n\n**Requirements:**\n- Python 3.8 or later\n- Modern web browser (Chrome, Firefox, Safari, Edge)\n\n**Steps:**\n\n1. Clone or download this repository\n\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n3. For JavaScript rendering support (optional):\n```bash\nplaywright install chromium\n```\n\n4. Run the application:\n```bash\n# Standard mode (with authentication and tier system)\npython main.py\n\n# Local mode (all users get admin tier, no rate limits)\npython main.py --local\n# or\npython main.py -l\n```\n\n5. Open your browser and navigate to:\n   - Local: `http://localhost:5000`\n   - Network: `http://\u003cyour-ip\u003e:5000`\n\n\n## LibreCrawl Plugins\n\nDrop your custom plugin files in `/web/static/plugins/`! Each `.js` file will automatically create a new tab in LibreCrawl.\n\n### 🔌 Quick Start\n\n1. Create a new `.js` file in this folder (e.g., `my-plugin.js`)\n2. Register your plugin using the LibreCrawl Plugin API\n3. 
Refresh the app - your new tab appears automatically!\n\n### 📝 Example Plugin Structure\n\n```javascript\nLibreCrawlPlugin.register({\n  // Required: Unique ID (used for tab identification)\n  id: 'my-plugin',\n\n  // Required: Display name\n  name: 'My Plugin',\n\n  // Required: Tab configuration\n  tab: {\n    label: 'My Tab',\n    icon: '🔥', // Optional emoji\n  },\n\n  // Called when your tab is activated\n  onTabActivate(container, data) {\n    // data contains: { urls, links, issues, stats }\n    container.innerHTML = `\n      \u003cdiv class=\"plugin-content\" style=\"padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);\"\u003e\n        \u003ch2\u003eMy Custom Analysis\u003c/h2\u003e\n        \u003cp\u003eFound ${data.urls.length} URLs!\u003c/p\u003e\n      \u003c/div\u003e\n    `;\n  },\n\n  // Optional: Called during live crawls when data updates\n  onDataUpdate(data) {\n    if (this.isActive) {\n      // Update your UI\n    }\n  }\n});\n```\n\n### 🎯 Available Data\n\nYour plugin receives the same data as built-in tabs:\n\n- **`urls`** - Array of all crawled URLs with full metadata\n- **`links`** - All discovered links (internal/external)\n- **`issues`** - Detected SEO issues\n- **`stats`** - Crawl statistics (discovered, crawled, depth, speed)\n\n### 📚 Full API Reference\n\n#### Plugin Configuration\n\n```javascript\n{\n  id: string,              // Unique identifier\n  name: string,            // Display name\n  version: string,         // Optional version\n  author: string,          // Optional author\n  description: string,     // Optional description\n\n  tab: {\n    label: string,         // Tab button text\n    icon: string,          // Optional emoji/icon\n    position: number       // Optional position (default: append to end)\n  }\n}\n```\n\n#### Lifecycle Hooks\n\n- `onLoad()` - Called when plugin loads\n- `onTabActivate(container, data)` - Called when tab becomes active\n- `onTabDeactivate()` - Called when user switches away\n- 
`onDataUpdate(data)` - Called during live crawls\n- `onCrawlComplete(data)` - Called when crawl finishes\n\n#### Utilities\n\nAccess built-in utilities via `this.utils`:\n\n```javascript\nthis.utils.showNotification(message, type) // 'success', 'error', 'info'\nthis.utils.formatUrl(url)\nthis.utils.escapeHtml(text)\n```\n\n#### 🎨 Styling\n\nUse these CSS classes to match LibreCrawl's design:\n\n- `.plugin-content` - Main container\n- `.plugin-header` - Header section\n- `.data-table` - Tables (auto-styled)\n- `.stat-card` - Statistic cards\n- `.score-good` / `.score-needs-improvement` / `.score-poor` - Score indicators\n\n**Important:** Always add these styles to your main plugin container for proper scrolling:\n\n```javascript\ncontainer.innerHTML = `\n  \u003cdiv class=\"plugin-content\" style=\"padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);\"\u003e\n    \u003c!-- Your content here --\u003e\n  \u003c/div\u003e\n`;\n```\n\nThe `max-height: calc(100vh - 280px)` ensures your content scrolls properly within the tab pane.\n\n#### Example Plugins\n\nCheck out these example plugins to get started:\n\n- `_example-plugin.js` - Basic template (ignored by loader)\n- `e-e-a-t.js` - E-E-A-T analyzer example\n\n\n### Running Modes\n\n**Standard Mode** (default):\n- Full authentication system with login/register\n- Tier-based access control (Guest, User, Extra, Admin)\n- Guest users limited to 3 crawls per 24 hours (IP-based)\n- Ideal for public-facing demos or shared hosting\n\n**Local Mode** (`--local` or `-l`):\n- All users automatically get admin tier access\n- No rate limits or tier restrictions\n- Perfect for personal use or single-user self-hosting\n- Recommended for local development and testing\n\n## Configuration\n\nClick \"Settings\" to configure:\n\n- **Crawler settings**: depth (up to 5M URLs), delays, external links\n- **Request settings**: user agent, timeouts, proxy, robots.txt\n- **JavaScript rendering**: browser engine, wait times, viewport 
size\n- **Filters**: file types and URL patterns to include/exclude\n- **Export options**: formats and fields to export\n- **Custom CSS**: personalize the UI appearance with custom styles\n- **Issue exclusion**: patterns to exclude from SEO issue detection\n\nFor PageSpeed analysis, add a Google API key in Settings \u003e Requests for higher rate limits (25k/day vs limited).\n\n## Export formats\n\n- **CSV**: Spreadsheet-friendly format\n- **JSON**: Structured data with all details\n- **XML**: Markup format for other tools\n\n## Multi-tenancy\n\nLibreCrawl supports multiple concurrent users with isolated sessions:\n\n- Each browser session gets its own crawler instance and data\n- Settings are stored in browser localStorage (persistent across restarts)\n- Custom CSS themes are per-browser\n- Sessions expire after 1 hour of inactivity\n- Crawl data is isolated between users\n\n## Known limitations\n\n- PageSpeed API has rate limits (works better with API key)\n- Large sites may take time to crawl completely\n- JavaScript rendering is slower than HTTP-only crawling\n- Settings stored in localStorage (cleared if browser data is cleared)\n\n## Files\n\n- `main.py` - Main application and Flask server\n- `src/crawler.py` - Core crawling engine\n- `src/settings_manager.py` - Configuration management\n- `web/` - Frontend interface files\n\n## License\n\nMIT License - see LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPhialsBasement%2FLibreCrawl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPhialsBasement%2FLibreCrawl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPhialsBasement%2FLibreCrawl/lists"}