{"id":50333061,"url":"https://github.com/kernix13/github-readme-seo-analysis","last_synced_at":"2026-05-29T11:01:31.207Z","repository":{"id":352473207,"uuid":"1215255546","full_name":"Kernix13/github-readme-seo-analysis","owner":"Kernix13","description":"A Jupyter Notebook GitHub README and Repo SEO Analysis to determine what makes a repo rank in the SERPS","archived":false,"fork":false,"pushed_at":"2026-05-11T17:13:21.000Z","size":362,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-11T19:15:47.622Z","etag":null,"topics":["accessibility","data-analysis","readme","seo","seo-analysis"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Kernix13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-19T17:23:33.000Z","updated_at":"2026-05-11T17:14:28.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Kernix13/github-readme-seo-analysis","commit_stats":null,"previous_names":["kernix13/github-readme-seo-analysis"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Kernix13/github-readme-seo-analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kernix13%2Fgithub-readme-seo-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kernix13%2Fgithub-r
eadme-seo-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kernix13%2Fgithub-readme-seo-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kernix13%2Fgithub-readme-seo-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Kernix13","download_url":"https://codeload.github.com/Kernix13/github-readme-seo-analysis/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Kernix13%2Fgithub-readme-seo-analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33648534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["accessibility","data-analysis","readme","seo","seo-analysis"],"created_at":"2026-05-29T11:01:30.251Z","updated_at":"2026-05-29T11:01:31.201Z","avatar_url":"https://github.com/Kernix13.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GitHub README SEO: Data Analysis of What Makes Repos Rank\n\n\u003c!-- slug: github-readme-seo-analysis --\u003e\n\u003c!-- Title chars = 57 chars --\u003e\n\n \u003c!-- LATER: Badges (Python, Pandas, Numpy?, Matplotlib, Seaborn?) 
--\u003e\n \u003c!-- Google: \"sample readme for a jupyter notebook data analysis project\" for good boilerplate --\u003e\n\nThis project performs a GitHub README \u003cabbr title=\"Search Engine Optimization\"\u003eSEO\u003c/abbr\u003e Analysis using Jupyter Notebook and data from GitHub Explore \u0026 Google to determine the metrics needed to rank in the \u003cabbr title=\"Search Engine Results Pages\"\u003eSERPs\u003c/abbr\u003e.\n\n\u003e [!NOTE]\n\u003e I am new to Data Analysis and Jupyter Notebook. I am in the early stages of this analysis and it will take me a long time to finish unless I get help.\n\n\u003c!-- Intro paragraph = 171 chars --\u003e\n\n\u003c!--\n   ✅ = Section done\n   📌 = Section not done\n --\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Overview\n\n\u003c!-- ✅ --\u003e\n\nThe goal of this project is to understand why certain GitHub repositories rank in Google \u0026 GitHub Explore search results while others do not.\n\nI collected a dataset of repositories using 46 search phrases and recorded both Google rankings and GitHub Explore rankings. For each repository, I also gathered metrics related to README content, repository activity, and available SEO data like titles and meta descriptions.\n\n\u003c!-- The analysis focuses on identifying patterns between these factors and ranking performance. In particular, it looks at whether common practices, such as having a clear README structure, or a descriptive introduction is associated with higher visibility. --\u003e\n\nThe end goal is to turn these findings into practical insights that can be applied to improve repository discoverability. This includes refining my own repositories as well as sharing useful patterns, the dataset, and the results with other developers.\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n\u003ch2 id=\"back-to-top\"\u003eTable of Contents\u003c/h2\u003e\n\n1. [Key Questions](#key-questions)\n1. 
[Key Findings](#key-findings)\n1. [Data Sources](#data-sources)\n1. [Methodology](#methodology)\n1. [Visualizations](#visualizations)\n1. [Data Dictionary](#data-dictionary)\n1. [Project Structure](#project-structure)\n1. [Tech Stack](#tech-stack)\n1. [Installation](#installation)\n1. [Usage](#usage)\n1. [Future Improvements](#future-improvements)\n1. [AI Usage](#ai-usage)\n1. [Acknowledgments](#acknowledgments)\n1. [Contributing](#contributing)\n1. [License](#license)\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Key Questions\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction (Too many questions?)\n\n- Which factors are associated with a repository appearing in Google search results (SERPs)?\n- Which factors are associated with higher rankings within Google SERPs and GitHub Explore?\n- How closely do GitHub Explore rankings align with Google search rankings?\n- Is there a relationship between README structure (e.g., H1 usage, table of contents, introduction) and ranking?\n- Does the presence of a clear, descriptive introduction impact visibility or ranking?\n- Do content characteristics (e.g., word count, links, images) correlate with ranking performance?\n- Do broken or low-quality links (e.g., `http://localhost`) correlate with lower rankings?\n- Which repository features within a developer's control appear most associated with higher rankings?\n- How often do repositories use default titles (e.g., username/repository) versus descriptive titles? What causes that difference?\n- Is GitHub \"About\" text reused in Google SEO titles or meta descriptions?\n\n### Specific questions (remove this section later maybe)\n\n1. Does about_text get reused in:\n   - seo_title\n   - meta_description\n2. Do SERP fields reuse leading substrings of:\n   - about text\n   - README title\n   - Intro paragraphs (I need intro text)\n3. Does Google fall back to:\n   - username/repo for SEO Title when no usable text exists?\n4. 
Does having a good repo name with `-` as a separator result in a higher rank on average?\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Key Findings\n\n\u003c!-- 📌 --\u003e\n\nSummarize the most important insights discovered during the analysis.\n\n\u003e 🚧 Section under construction\n\n\u003c!-- Example:\n\n- Homes within 1 mile of downtown were 35% more expensive.\n- Temperature had a strong correlation with bike rentals.\n- Product category X generated 48% of revenue. --\u003e\n\n### \u003cspan aria-hidden=\"true\"\u003e🔍\u003c/span\u003e Google SERP Insights\n\n- 🚧 Nothing yet (this sub-section may not be needed)\n\n### \u003cspan aria-hidden=\"true\"\u003e🔍\u003c/span\u003e GitHub Explore Insights\n\n- 🚧 Nothing yet (this sub-section may not be needed)\n\n\u003c!-- ### Key Differences --\u003e\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Data Sources\n\n\u003c!-- ✅ --\u003e\n\nThe dataset for this project was created manually using a combination of Google search results and GitHub Explore.\n\nTwo primary data files were generated:\n\n- `data/all_metrics.csv`\n  Contains repository-level data, including README structure and content metrics (e.g., headings, word count, links, images), repository metadata (stars, forks, contributors), and SEO-related fields where available.\n- `data/search_ranks.csv`\n  Contains ranking data for each repository across multiple search phrases.\n\nThese datasets are joined using the `user_reponame` field to enable combined analysis of repository features and ranking performance (see `merged_data.csv`).\n\n### \u003cspan aria-hidden=\"true\"\u003e🗃️\u003c/span\u003e \u003cabbr title=\"Application Programming 
Interfaces\"\u003eAPIs\u003c/abbr\u003e Used\n\n- GitHub API: Used in `github_api.py` to collect repository metadata. This significantly reduced the need for manual data collection and improved consistency across records. More code could be added to get additional repo and README metrics.\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Methodology\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\n- Data collection: search phrases on Google and GitHub Explore\n- Data Processing and Transformation: ???\n- Data Analysis: ???\n\n\u003e Repositories without a README file were excluded from content-based analysis where applicable, as key metrics (e.g., word count, structure, and links) could not be derived.\n\n### \u003cspan aria-hidden=\"true\"\u003e🗃️\u003c/span\u003e Data Collection\n\n\u003c!-- ✅ Use the sub-sections below only if they improve clarity --\u003e\n\u003c!-- ✅ --\u003e\n\nData was collected from both Google search results (SERPs) and GitHub Explore using a set of 46 targeted search phrases. 
A custom script (`github_api.py`) was used to retrieve repository metadata via the GitHub API.\n\nFor each search phrase:\n\n- The top 10 results from GitHub Explore were recorded.\n- The top results from Google search were collected (cutoff at 50), including variations where the term \"github\" was appended to the query.\n\nThis process resulted in:\n\n- 335 unique repositories\n- 455 total ranking records across all search phrases\n\n### \u003cspan aria-hidden=\"true\"\u003e🔧\u003c/span\u003e Data Processing and Transformation\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\n- Processing: organizing / filtering / restructuring, selecting columns, grouping/sorting, \"prepare for analysis\"\n- Transformation: creating new variables/features, aggregations, encoding / scaling, \"change the data into new forms\"\n\n### \u003cspan aria-hidden=\"true\"\u003e📊\u003c/span\u003e Data Analysis\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\n\u003e Visualize relationships between fields and rank positions to derive insights\n\nThe analysis focused on identifying relationships between repository and README features and their ranking positions in both Google and GitHub search results.\n\n\u003c!-- Comparisons were made between top-ranking and lower-ranking repositories to identify patterns and potential ranking factors. 
--\u003e\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Visualizations\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\nShow key charts or plots.\n\nInclude screenshots of graphs from the notebook.\n\nExplain what each chart demonstrates.\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n## Data Dictionary\n\n\u003c!-- ✅ --\u003e\n\nHere are all the fields in `merged_data.csv`:\n\n\u003c!-- Should it be labeled Dataset instead? Should I add all the fields? --\u003e\n\u003c!-- Should I explain why I have fields like has_blog? --\u003e\n\n\u003cdetails\u003e\n  \u003csummary\u003e\u003cstrong\u003eData Dictionary fields\u003c/strong\u003e\u003c/summary\u003e\n  \u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n   \u003ctable\u003e\n      \u003cthead\u003e\n         \u003ctr\u003e\n            \u003cth\u003eField Name\u003c/th\u003e\n            \u003cth\u003eData Type\u003c/th\u003e\n            \u003cth\u003eDescription\u003c/th\u003e\n         \u003c/tr\u003e\n      \u003c/thead\u003e\n      \u003ctbody\u003e\n         \u003ctr\u003e\n            \u003ctd\u003euser_reponame\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe repo: \u003ccode\u003euser_name/repo_name\u003c/code\u003e\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003esearch_phrase\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe search phrase used\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eexplore_rank\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            
\u003ctd\u003ePosition in GitHub Explore results\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003egoogle_rank\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003ePosition in Google SERPs\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003esource\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cli\u003eGoogle SERPs\u003c/li\u003e\n                  \u003cli\u003eGoogle SERPs with \"github\" appended to search phrase\u003c/li\u003e\n                  \u003cli\u003eGitHub Explore results\u003c/li\u003e\n               \u003c/ul\u003e\n            \u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003e1st_el\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003e1st text element in README\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003e2nd_el\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003e2nd text element in README\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003e3rd_el\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003e3rd text element in README\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eh1_ct\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of H1 elements\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eh2_ct\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of H2 elements\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eh3_ct\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n          
  \u003ctd\u003e# of H3 elements\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003etoc\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cli\u003e0 = No table of contents\u003c/li\u003e\n                  \u003cli\u003e1 = Table of contents present\u003c/li\u003e\n               \u003c/ul\u003e\n            \u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eimages\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of images in README\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003ealt_text_ct\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of images with alt text\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003ecode_blocks\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of code blocks in README\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003einternal_links\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of links to repo files\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eexternal_links\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of links to external sites or repos\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003elive_link\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cli\u003e0 = No link to live deploy\u003c/li\u003e\n                  \u003cli\u003e1 = Link to live deploy in sidebar\u003c/li\u003e\n               \u003c/ul\u003e\n            
\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003ewatchers\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of repo watchers\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003econtributors\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of repo contributors\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003erank\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cp\u003eMy opinion on the quality of the README\u003c/p\u003e\n                  \u003cli\u003e1 = Bad\u003c/li\u003e\n                  \u003cli\u003e2 = Good/okay\u003c/li\u003e\n                  \u003cli\u003e3 = Very Good\u003c/li\u003e\n               \u003c/ul\u003e\n            \u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003etype\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eMy main classification of the repo\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003etype2\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eMy sub-class for the repo\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eword_count\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003eREADME word count\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eforks\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of forks for repo\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003estars\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            
\u003ctd\u003e# of stars for repo\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003etopics\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of topics in sidebar\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eabout_text\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe About (description) text for the repo\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eseo_title\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe title from the Google SERPs\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003emeta_desc\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe description from the Google SERPs\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003etitle_text\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eThe About text from the repo sidebar\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eintro_len\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003eThe length of the intro text if good_intro = 1\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003egood_intro\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cp\u003eMy judgement based on the text elements, and the quality \u0026 length of the text at the top of the repo\u003c/p\u003e\n                  \u003cli\u003e0 = No\u003c/li\u003e\n                  \u003cli\u003e1 = Yes\u003c/li\u003e\n               \u003c/ul\u003e\n            \u003c/td\u003e\n         \u003c/tr\u003e\n         
\u003ctr\u003e\n            \u003ctd\u003eprimary_lang\u003c/td\u003e\n            \u003ctd\u003estr\u003c/td\u003e\n            \u003ctd\u003eLanguage used in search phrase\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003eyr\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of years since last update\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003emo\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of months since last update\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003ewk\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e# of weeks since last update\u003c/td\u003e\n         \u003c/tr\u003e\n         \u003ctr\u003e\n            \u003ctd\u003ehas_blog\u003c/td\u003e\n            \u003ctd\u003eint64\u003c/td\u003e\n            \u003ctd\u003e\n               \u003cul\u003e\n                  \u003cli\u003e0 = Repo owner has no blog/website\u003c/li\u003e\n                  \u003cli\u003e1 = Repo owner has blog/website\u003c/li\u003e\n                  \u003cli\u003e2 = Repo owner has posts on Hashnode, Medium, YouTube, etc.\u003c/li\u003e\n               \u003c/ul\u003e\n            \u003c/td\u003e\n         \u003c/tr\u003e\n      \u003c/tbody\u003e\n   \u003c/table\u003e\n\u003c/details\u003e\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Project Structure\n\n\u003c!-- 📌 --\u003e\n\n\u003e Current structure as of 4-19-2026\n\n```py\n github-readme-seo-analysis/\n│\n├── .github/                     # Issue \u0026 PR templates\n│\n├── data/                        # All datasets used in analysis\n│   ├── all_metrics.csv          # Repo \u0026 README metrics\n│ 
  ├── merged_data.csv          # The 2 csv files merged\n│   └── search_ranks.csv         # Google and GitHub Explore ranks + search phrases\n│\n├── notebooks/                   # Jupyter notebooks for analysis\n│   ├── 01-eda_overview.ipynb\n│   ├── 02-google_rank.ipynb\n│   └── 03-explore_rank.ipynb\n│\n├── src/                         # Python scripts (data collection, processing)\n│   └── github_api.py\n│\n├── venv/                        # ???\n│\n├── visuals/                     # Charts/images for README (optional but recommended)\n│\n├── .env                         # API keys\n├── .env.example\n├── .gitignore\n├── CONTRIBUTING.md\n├── CODE_OF_CONDUCT.md\n├── LICENSE                      # Add later\n├── README.md                    # Project overview (SEO target)\n└── requirements.txt\n```\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Tech Stack\n\n\u003c!-- ✅ --\u003e\n\n| Tool                                     | Version |\n| :--------------------------------------- | :------ |\n| [Python](https://www.python.org/)        | 3.14.0  |\n| [Jupyter Notebook](https://jupyter.org/) | 7.4.5   |\n| [Pandas](https://pandas.pydata.org/)     | 3.0.1   |\n| [Matplotlib](https://matplotlib.org/)    | 3.10.6  |\n| [Seaborn](https://seaborn.pydata.org/)   | 0.13.2  |\n| [NumPy](https://numpy.org/)              | 2.4.3   |\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Installation\n\n\u003c!-- ✅ --\u003e\n\nFollow these steps to set up the project locally.\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/Kernix13/github-readme-seo-analysis\n   cd github-readme-seo-analysis\n   ```\n\n2. 
Create a Virtual Environment\n\n   ```bash\n   # Linux/Mac Command\n   python3 -m venv venv\n\n   # GitBash Command (Windows)\n   python -m venv venv\n   ```\n\n3. Activate the virtual environment\n\n   ```bash\n   # Linux/Mac Command\n   source venv/bin/activate\n\n   # GitBash Command (Windows)\n   source venv/Scripts/activate\n   ```\n\n4. Install dependencies\n\n   ```bash\n   pip install -r requirements.txt\n\n   # register kernel (one-time)\n   python -m ipykernel install --user --name=venv --display-name \"Python (venv)\"\n   ```\n\n### ⚡ Quick Start (Windows)\n\n```sh\ngit clone https://github.com/yourusername/github-readme-seo-analysis.git\ncd github-readme-seo-analysis\npython -m venv venv\nsource venv/Scripts/activate\npip install -r requirements.txt\n```\n\n### ⚡ Quick Start (Linux / macOS)\n\n```sh\ngit clone https://github.com/yourusername/github-readme-seo-analysis.git\ncd github-readme-seo-analysis\npython3 -m venv venv\nsource venv/bin/activate\npip install -r requirements.txt\n```\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Usage\n\n\u003c!-- ✅ --\u003e\n\n\u003c!-- Do I include notes about using `github_api.py` here? And what about mentioning Anaconda or is that bad? --\u003e\n\n1. Start Jupyter:\n\n   ```sh\n   # recommended:\n   jupyter lab\n   # or, if you prefer the classic interface:\n   jupyter notebook\n   ```\n\n2. Open the `.ipynb` notebook files in the browser interface and run the cells\n3. Deactivate the virtual environment when done\n\n   ```sh\n   deactivate\n   ```\n\n**Note**: If you are using Anaconda or another environment manager, you can open the notebook using your preferred tool (e.g., Anaconda Navigator or jupyter lab) after installing the required dependencies.\n\nRunning `jupyter notebook` does not work. 
To get `jupyter lab` to run, I have to run `python -m ipykernel install --user --name=venv --display-name \"Python (venv)\"` - ChatGPT sucks! How do I stop the kernel from the browser, or do I just run `deactivate`?\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Future Improvements\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\n- I need more repos/examples and more contributors (only 337 repos)\n- Maybe a related Web Dev project that converts your README to HTML, then runs an SEO analysis and accessibility check, with output that shows what you need and/or suggestions? Or run it through Lighthouse\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## \u003cabbr title=\"Artificial Intelligence\"\u003eAI\u003c/abbr\u003e Usage\n\n\u003c!-- 📌 --\u003e\n\n\u003e 🚧 Section under construction\n\nI am in the early stages of learning Python, so I used ChatGPT to write the code in `src/github_api.py` to speed up the process of collecting metrics for the repos. 
There is a list where you enter the username/reponame, and the returned values are:\n\n- README word count\n- Number of repo forks\n- Number of repo stars\n- Number of repo topics\n- About text\n\nThere are other fields I may be able to get, but for now I collect the rest of the metrics by visiting each repo.\n\nRepo-level metrics I should also get using the GitHub API are:\n\n- Whether there is a live link in the sidebar\n- The number of watchers\n- The primary language IF it is part of the search query\n- The number of years, months, weeks, or days since the last update\n\nREADME-level metrics I should also get using the GitHub API are:\n\n- The \"title\" text (some READMEs do not have an H1 or H2 as the 1st heading)\n- The number of internal links\n- The number of external links\n- The number of images (both `![]()` and `\u003cimg\u003e`)\n- The number of images with alt text\n- The count of H1, H2, and H3 elements (both `#` and `\u003ch1\u003e`)\n- Whether there is a Table of Contents\n\nI also need the first text elements in the README, ideally an H1 followed by a paragraph followed by an H2, ignoring images. That would be hard to program since I have seen other elements at the top of READMEs, plus there are other issues, so I am doing all of that manually.\n\nI am also counting the number of code blocks, which may or may not be useful. 
There are other metrics I am collecting that are subjective and would be difficult to add to a function.\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## Acknowledgments\n\n\u003c!-- ✅ --\u003e\n\n\u003c!-- Credit datasets, tutorials, or inspiration (maybe these) --\u003e\n\n- [5 tips for making your GitHub profile page accessible](https://github.blog/developer-skills/github/5-tips-for-making-your-github-profile-page-accessible/): The article that got me thinking about repo SEO\n- [Awesome SEO tools](https://github.com/serpapi/awesome-seo-tools): decent list of tools\n- [GitHub Search Engine Optimization (SEO): how to rank your repository in GitHub search](https://www.markepear.dev/blog/github-search-engine-optimization): Good article on specifics for GitHub Explore rank\n- [GitHub SEO: Rank your repo and get adoption in 2026](https://nakora.ai/blog/github-seo): excellent tips\n- [GitHub Pages SEO Analyzer](https://www.jekyllpad.com/tools/github-pages-seo-analyzer): Enter your GitHub page URL to get a report\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n\u003c!-- ## Resources\n\n- List items here (Do I need this section?)\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e --\u003e\n\n## Contributing\n\n\u003c!-- ✅ --\u003e\n\nContributions are welcome! If you'd like to help improve this project, please read our [contribution guidelines](./CONTRIBUTING.md) on how to get started, our workflow, and code style expectations.\n\n\u003c!-- Should I add a stars button here like I've seen in other repos? 
--\u003e\n\n\u003cspan aria-hidden=\"true\"\u003e\u003cbr\u003e\u003c/span\u003e\n\n## License\n\n\u003c!-- ✅ --\u003e\n\nThis project is licensed under the \u003cabbr title=\"Massachusetts Institute of Technology\"\u003eMIT\u003c/abbr\u003e License (coming soon).\n\n\u003c!-- This project is licensed under the \u003ca href=\"./LICENSE\"\u003e \u003cabbr title=\"Massachusetts Institute of Technology\"\u003eMIT\u003c/abbr\u003e License\u003c/a\u003e. --\u003e\n\n\u003cdiv align=\"right\"\u003e\u0026#8673; \u003ca href=\"#back-to-top\"\u003eBack to Top\u003c/a\u003e\u003c/div\u003e\n\n\u003c!--\n📌 How to add a license:\n  - https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository\n --\u003e\n\n \u003c!-- \n OTHER IMPORTANT LINKS\n\n  📌 Accessible Markdown:\n  - https://github.blog/developer-skills/github/5-tips-for-making-your-github-profile-page-accessible/\n\n  📌 Create a PR Template:\n  - https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository\n  - https://axolo.co/blog/p/part-3-github-pull-request-template\n  - https://github.com/Kernix13/github-actions-dotfiles/blob/main/dotfiles.md#dot-github-folder\n\n  📌 Create an issues template\n  - https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository\n  - https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/about-issue-and-pull-request-templates\n\n  ✅ Jupyter Markdown: \n  - https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet\n  - https://sqlbak.com/blog/wp-content/uploads/2020/12/Jupyter-Notebook-Markdown-Cheatsheet2.pdf\n  - https://www.kaggle.com/code/cuecacuela/2025-the-ultimate-markdown-cheat-sheet\n\n--\u003e\n\n\u003c!--\n\n1. 
✅ SEO Title: 50-60 chars ideally, up to 70 is okay, 50-55 for mobile\n2. ✅ SEO/Meta Description: 120-158 characters, closer to 120 for mobile\n3. ✅ GitHub Explore Description: max 143 characters\n\n\nNOTE: df = pd.read_csv('data/repo_metrics.csv', index_col='user_reponame')\n\npython src/github_api.py\n\n ⭐⭐⭐ Every section/heading should earn its place\n --\u003e\n\n\u003c!--\n\n  DATA HEADINGS SUMMARY:\n  - Data Sources - 29 ✅\n  - Data Dictionary - 10 ✅\n  - (Data) Visualizations ~ 8 ✅\n  - Data Summary - 4\n  - Data Cleaning + [others] ~ 4\n  - Data Processing and Data Transformation - 3\n  - Gathering the data - 3\n  - Data Analysis - 2\n\n  ✅ OTHER COMMON HEADINGS:\n  - Project Summary/Summary - 3\n  - Conclusion - 1\n  - Research - 1\n\n  - Project Challenges - 1\n  - Streamlit - 1\n  - Consider: bullet point Data Set summary with datatype counts\n  - Features / Project Features - 4+ (is this Code:You capstone features?)\n\n --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkernix13%2Fgithub-readme-seo-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkernix13%2Fgithub-readme-seo-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkernix13%2Fgithub-readme-seo-analysis/lists"}