{"id":48147572,"url":"https://github.com/ihuzaifashoukat/seo-analyzer","last_synced_at":"2026-04-04T17:01:45.329Z","repository":{"id":295356675,"uuid":"989888360","full_name":"ihuzaifashoukat/seo-analyzer","owner":"ihuzaifashoukat","description":"Powerful SEO Analyzer for comprehensive on-page, technical, and content audits. Features CLI \u0026 Flask API. Ideal for developers \u0026 SEOs. #Python #SEO","archived":false,"fork":false,"pushed_at":"2025-09-04T08:32:45.000Z","size":117,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-04T10:32:41.307Z","etag":null,"topics":["api","cli","content-analysis","digital-marketing","flask","on-page-seo","open-source","python","seo","seo-analysis","seo-analyzer","seo-audit","seo-checklist","seo-free","seo-optimization","seo-report","seo-tools","technical-seo","website-analyzer","website-optimization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ihuzaifashoukat.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-25T03:38:35.000Z","updated_at":"2025-09-04T08:32:48.000Z","dependencies_parsed_at":"2025-09-04T10:14:37.602Z","dependency_job_id":"ea5ff71d-d92d-47a3-8929-448ee7401eb3","html_url":"https://github.com/ihuzaifashoukat/seo-analyzer","commit_stats":null,"previous_names":["ihuzaifashoukat/seo-analyzer"],"tags_count":0,"te
mplate":false,"template_full_name":null,"purl":"pkg:github/ihuzaifashoukat/seo-analyzer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fseo-analyzer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fseo-analyzer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fseo-analyzer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fseo-analyzer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ihuzaifashoukat","download_url":"https://codeload.github.com/ihuzaifashoukat/seo-analyzer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ihuzaifashoukat%2Fseo-analyzer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31407391,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","cli","content-analysis","digital-marketing","flask","on-page-seo","open-source","python","seo","seo-analysis","seo-analyzer","seo-audit","seo-checklist","seo-free","seo-optimization","seo-report","seo-tools","technical-seo","website-analyzer","website-optimization"],"created_at":"2026-04-04T17:01:44.084Z","updated_at":"2026-04-04T17:01:45.250Z","avatar_url":"https://github.com/ihuzaifashoukat.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Advanced SEO Analyzer\n\nA modern, modular SEO analysis toolkit for Python. Run focused page-level audits or full site crawls, capture technical/content issues with clear severities, and export structured data for reporting. 
Built with extensibility in mind and designed for practical, actionable insights.\n\n## Highlights\n\n- On-page, Technical, and Content analyzers with unified scoring\n- Full Site Audit (Ahrefs-style) with concurrency, filtering, and exports\n- LLM/AI directives checklist (llms.txt / ai.txt)\n- Optional Lighthouse/CrUX metrics via PageSpeed Insights API\n- Duplicate detection across titles, descriptions, and visible text\n- Link graph, redirect chains/loops, status distribution, and internal link suggestions\n- REST API (Flask) and rich CLI with mobile-first and JS rendering options\n\n## Contents\n\n- Overview\n- Features\n- Project Structure\n- Quick Start\n- CLI Usage\n- API Usage\n- Configuration\n- Output \u0026 Exports\n- Optional Dependencies\n- Roadmap\n- Contributing \u0026 License\n\n---\n\n## Overview\n\nThe analyzer is split into focused subpackages: `on_page`, `technical`, `content`, `scoring`, and `site_audit`. Each module exposes a small, well-defined surface and can be extended independently. The CLI supports both single-page analysis and site-wide crawling with concurrency and filters. 
Results are JSON-first with optional CSV exports for pages, issues, and link edges.\n\n## Features\n\n- On-Page Analysis\n  - Title/meta description presence and lengths, duplication hints\n  - Heading structure (H1–H6), multiple H1 detection\n  - Image alt, responsive patterns, basic layout red flags\n  - Link audit (internal/external, broken links, rel, unsafe cross-origin)\n  - Content stats (word count, paragraphs, lorem ipsum)\n  - URL structure (length, depth, case), deprecated tags, inline CSS\n  - Social tags (Open Graph, Twitter Cards), favicon\n\n- Technical SEO\n  - Crawlability/Indexability: doctype, charset, viewport, AMP, language, hreflang, canonical, robots meta, structured data (JSON-LD/Microdata)\n  - Network \u0026 Headers: HTTP version, HSTS, server signature, cache headers, CDN hints\n  - Performance: DOM size, gzip, TTFB, optional PSI (Lighthouse/CrUX)\n  - Security: HTTPS usage, mixed content, plaintext emails, meta refresh\n  - Site-level: redirect chain trace, custom 404, directory browsing, SPF, ads.txt\n  - Assets: caching headers for CSS/JS/images; minification heuristics for CSS/JS\n  - LLMs: `llms.txt` / `ai.txt` detection and checklist with recommendations\n\n- Content Analysis\n  - Keyword extraction and target keyword usage\n  - Readability (Flesch Reading Ease)\n  - Text-to-HTML ratio\n  - Spellcheck (optional dependency)\n\n- Scoring\n  - Category scores (On-Page, Technical, Content) and overall score\n  - Configurable weights and category emphasis\n\n- Full Site Audit\n  - Crawler that respects robots.txt, with include/exclude filters, subdomain toggle, depth/page caps, rate limiting, and optional JS rendering for discovery\n  - Concurrency for per-page analysis\n  - Issues with severity (error/warning/notice) across HTTP, redirects, sitemap, canonical, indexing, content/meta, links, international, performance, and security\n  - Status distribution, redirect loops, duplicate titles/meta/visible text, internal link graph (in/out 
degree) and heuristic internal link suggestions\n  - Optional exports: `pages.csv`, `issues.csv`, `edges.csv`\n\n## Project Structure\n\n```\nseo-analyzer/\n├── app.py                      # CLI \u0026 API entrypoint\n├── requirements.txt\n├── modules/\n│   ├── __init__.py\n│   ├── base_module.py          # Base with session, retries, headers\n│   ├── on_page/\n│   │   ├── __init__.py\n│   │   ├── analyzer.py         # On-page orchestrator\n│   │   ├── text_utils.py\n│   │   ├── title_meta.py\n│   │   ├── headings_links_images.py\n│   │   └── social_misc.py\n│   ├── technical/\n│   │   ├── __init__.py\n│   │   ├── analyzer.py         # Technical orchestrator\n│   │   ├── network.py\n│   │   ├── html_core.py\n│   │   ├── metrics.py\n│   │   ├── site_checks.py\n│   │   ├── assets.py\n│   │   ├── llms_txt.py         # LLMs/AI directives checklist\n│   │   └── performance_api.py  # PageSpeed Insights (optional)\n│   ├── content/\n│   │   ├── __init__.py\n│   │   ├── analyzer.py\n│   │   ├── text_utils.py\n│   │   ├── keywords.py\n│   │   ├── readability.py\n│   │   ├── ratio.py\n│   │   └── spellcheck.py\n│   ├── scoring/\n│   │   ├── __init__.py\n│   │   ├── analyzer.py\n│   │   ├── weights.py\n│   │   ├── util.py\n│   │   ├── on_page.py\n│   │   ├── technical.py\n│   │   └── content.py\n│   └── site_audit/\n│       ├── __init__.py\n│       ├── crawler.py          # Discovery crawler\n│       ├── render.py           # Optional Playwright renderer\n│       ├── audit.py            # Crawl + analyze + aggregate\n│       ├── issues.py           # Issue model \u0026 derivation\n│       ├── duplication.py      # Duplicate grouping helpers\n│       ├── sitemap.py          # Sitemap parsing \u0026 bucketing\n│       ├── export.py           # CSV exporters\n│       └── compare.py          # Diff between audit reports\n└── README.md\n```\n\n## Quick Start\n\n1) Python env\n- Python 3.8+\n- Optional: `python -m venv venv \u0026\u0026 source venv/bin/activate`\n\n2) Install\n- 
`pip install -r requirements.txt`\n- Optional dependencies:\n  - Playwright (JS rendering): `pip install playwright \u0026\u0026 playwright install`\n  - PSI (Lighthouse/CrUX): needs a Google API key (config below)\n\n3) Single-Page Audit (CLI)\n- `python app.py https://www.example.com`\n- Saves report to `reports/seo_report_\u003cdomain\u003e_\u003ctimestamp\u003e.json`\n\n4) Full Site Audit (CLI)\n- Example (mobile UA, filters, concurrency, exports):\n```\npython app.py https://www.example.com \\\n  --full-audit --max-pages 200 --max-depth 3 \\\n  --respect-robots --rate-limit 1.5 --workers 6 --mobile \\\n  --export-csv reports/example_audit \\\n  --include-path /blog --exclude-path re:^/admin --render-js\n```\n- Output:\n  - JSON report at `reports/site_audit_\u003cdomain\u003e_\u003ctimestamp\u003e.json`\n  - If `--export-csv` provided: `pages.csv`, `issues.csv`, `edges.csv`\n\n## CLI Usage\n\n- Single page:\n  - `python app.py \u003cURL\u003e [--keywords ...] [--config path.json] [--output json|txt]`\n- Full site audit:\n  - `--full-audit`: enable crawl + multi-page analysis\n  - `--max-pages`, `--max-depth`, `--rate-limit`\n  - `--include-subdomains`, `--same-domain-only`, `--respect-robots`/`--ignore-robots`\n  - `--include-path`, `--exclude-path` (prefix or `re:\u003cpattern\u003e`; repeatable)\n  - `--workers` (concurrent analysis), `--mobile` (mobile UA), `--render-js` (Playwright)\n  - `--auth-user`, `--auth-pass` for basic auth\n  - `--export-csv \u003cdir\u003e` for CSVs\n  - `--compare-report \u003cfile\u003e` to diff two site audit JSONs\n\n## API Usage (Flask)\n\nRun without a URL to start the API:\n- `python app.py`\n- POST/GET `http://127.0.0.1:5000/analyze?url=https://www.example.com`\n- Optional `keywords` (CSV or JSON array)\n- Response mirrors the single-page JSON structure.\n\n## Configuration\n\nConfig may be supplied via `--config path.json` or edited in `app.py`’s `DEFAULT_CONFIG`.\n\nExample snippet:\n```json\n{\n  \"OnPageAnalyzer\": {\n 
   \"title_min_length\": 20,\n    \"title_max_length\": 70,\n    \"desc_min_length\": 70,\n    \"desc_max_length\": 160\n  },\n  \"TechnicalSEOAnalyzer\": {\n    \"enable_pagespeed_insights\": true,\n    \"psi_api_key\": \"YOUR_GOOGLE_API_KEY\",\n    \"psi_strategy\": \"mobile\",\n    \"max_inline_js_to_check_minification\": 3,\n    \"max_js_to_check_minification\": 10\n  },\n  \"ContentAnalyzer\": {\n    \"top_n_keywords_count\": 10,\n    \"spellcheck_language\": \"en\"\n  },\n  \"ScoringModule\": {\n    \"weights\": {},\n    \"category_weights\": { \"OnPage\": 0.40, \"Technical\": 0.35, \"Content\": 0.25 }\n  },\n  \"FullSiteAudit\": {\n    \"max_pages\": 150,\n    \"max_depth\": 3,\n    \"respect_robots\": true,\n    \"same_domain_only\": true,\n    \"include_subdomains\": false,\n    \"rate_limit_rps\": 1.5,\n    \"workers\": 6,\n    \"include_paths\": [\"/blog\"],\n    \"exclude_paths\": [\"re:^/admin\"],\n    \"render_js\": true\n  },\n  \"Global\": {\n    \"request_timeout\": 12,\n    \"user_agent\": \"Mozilla/5.0 ...\",\n    \"accept_language\": \"en-US,en;q=0.8\",\n    \"http_retries_total\": 2,\n    \"http_backoff_factor\": 0.2,\n    \"http_status_forcelist\": [429,500,502,503,504],\n    \"http_allowed_retry_methods\": [\"HEAD\",\"GET\",\"OPTIONS\"]\n  }\n}\n```\n\n## Output \u0026 Exports\n\n- Single-page JSON (top-level):\n  - `seo_attributes.OnPageAnalyzer` (title/meta, headings, links, images, content stats, URL checks)\n  - `seo_attributes.TechnicalSEOAnalyzer` (headers, protocol, indexability, structured data, assets, PSI if enabled, llms.txt, redirects, robots/sitemap, SPF, ads.txt)\n  - `seo_attributes.ContentAnalyzer` (keywords, readability, ratio, spelling)\n  - `seo_attributes.ScoringModule` (category and overall scores)\n\n- Site audit JSON:\n  - `site_audit.summary`: status distribution, redirect loops, health score, duplicate groups, link graph metrics, sitemap summary, aggregate scores\n  - `site_audit.pages`: list of per-URL page results 
(same structure as single-page attributes)\n  - `site_audit.issues`: flattened issues with `url`, `code`, `title`, `severity`, `category`, `details`\n  - `site_audit.config_used`: crawl and worker config used; optional `exports` with CSV paths\n\n- CSVs (if `--export-csv`):\n  - `pages.csv`: URL, scores, HTTP status, TTFB, canonical/sitemap flags, schema, word count, H1 count, links, title, meta description\n  - `issues.csv`: URL, code, title, severity, category, details\n  - `edges.csv`: source, target, rel (internal link graph)\n\n## Optional Dependencies\n\n- `pyspellchecker`: content spell checks\n- `dnspython`: SPF lookup\n- `Pillow`: optional image-related utilities\n- `flask`: API mode\n- `playwright`: optional JS rendering for discovery (`--render-js`)\n- PageSpeed Insights: requires Google API key (`enable_pagespeed_insights`)\n\n## Roadmap\n\n- Rendered HTML analysis (Playwright) for per-page analyzers and JS error capture\n- Deeper structured data validation (rule-based, 190+ checks)\n- Expanded issue catalog and weighting\n- PSI/CrUX integration into health scoring/outlier detection\n- GSC/GA integrations and IndexNow submissions\n- Segmented crawling and richer URL detail panels\n\n## Contributing \u0026 License\n\n- Contributions welcome! Please open issues/PRs for features and fixes.\n- MIT License. See `LICENSE`.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihuzaifashoukat%2Fseo-analyzer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fihuzaifashoukat%2Fseo-analyzer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fihuzaifashoukat%2Fseo-analyzer/lists"}