{"id":19593030,"url":"https://github.com/laudebugs/kenya-web-project","last_synced_at":"2026-06-09T22:31:05.939Z","repository":{"id":42761632,"uuid":"276806409","full_name":"laudebugs/kenya-web-project","owner":"laudebugs","description":"Provided a dataset of the top 500 Kenyan websites according to Alexa on July 2020 as well as performed some analysis on the dataset","archived":false,"fork":false,"pushed_at":"2023-08-04T08:35:37.000Z","size":41986,"stargazers_count":0,"open_issues_count":4,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-26T14:16:26.833Z","etag":null,"topics":["dataset","kenya","lighthouse-audits","matplotlib","multisite-lighthouse"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/laudebugs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-07-03T04:23:43.000Z","updated_at":"2023-05-27T13:25:05.000Z","dependencies_parsed_at":"2024-11-11T08:50:32.084Z","dependency_job_id":null,"html_url":"https://github.com/laudebugs/kenya-web-project","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/laudebugs/kenya-web-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laudebugs%2Fkenya-web-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laudebugs%2Fkenya-web-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laudebugs%2Fkenya-web-project/releases","manifes
ts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laudebugs%2Fkenya-web-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/laudebugs","download_url":"https://codeload.github.com/laudebugs/kenya-web-project/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laudebugs%2Fkenya-web-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34129072,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-09T02:00:06.510Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","kenya","lighthouse-audits","matplotlib","multisite-lighthouse"],"created_at":"2024-11-11T08:37:53.285Z","updated_at":"2026-06-09T22:31:05.920Z","avatar_url":"https://github.com/laudebugs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Kenya Web Project\n\nFor a more complete picture of the process and a richer guide to the project, [see this write-up](https://laudebugs.me/laudebugs.me/#/experiments/kenya-web-project).\n\nFor this simple project, I wanted to answer the question: are the websites that Kenyans access accessible to a populace that is mindful of its data consumption? I used, on average, 100MB of data and could not afford to access websites that would use even 5MB on a single page. 
And therefore, the websites that used the least data, but that were also fast, were the ones I tended to access from day to day. WhatsApp is a good example of a situation where I would get the most value for the data I used. If I turned off automatic video and image downloads, I could spend just 10MB each day communicating with friends as opposed to using direct text messages that were more expensive (a bundle of 200 text messages from [Safaricom](https://niabusiness.com/buy-safaricom-sms-bundles/) would cost Kes 10, each with a limit of ~160 characters). It brings back memories of how I started to use short-forms like _imy_ and _gn_, among others, to use every single character of a text message.\n\n## The Dataset\nThe dataset consists of the top 499 websites in Kenya according to Alexa, obtained on 4th July 2020. For each website, I generated a Lighthouse report and added two key-value pairs to the JSON describing the website, i.e. the performance score and the size of the page downloaded. This dataset is found at [data/combinedWebData.json](data/combinedWebsiteData.json).\n```json\n{\n    \"daily pageviews per visitor\": \"10.00\",\n    \"% of Traffic From Search\": \"28.30%\",\n    \"site\": \"Kabarak.ac.ke\",\n    \"rank\": 257,\n    \"Size of page downloaded\": \"45,606KB\",\n    \"daily time on site\": \"23:23\",\n    \"performance\": 0.04,\n    \"total Sites Linking In\": \"139\"\n  }\n```\nFor each website, the full Lighthouse report is also available in the data/complete-reports folder.\n\u003chr/\u003e\n\n## Project Stages and Guide on DIY steps\n\n1. 
Data Collection and Preparation\n   I gathered data from Alexa on the 500 most popular websites in Kenya [here](rawWebsiteData.txt).\u003cbr/\u003e\n   To parse the data, I ran a simple [Java program](Split.java) to transform the dataset into JSON format [here](websiteData.json).\u003cbr/\u003e\n   ```bash\n   # compile the java program\n   javac Script.java\n   # run the java file\n   java Script\n   ```\n   Now we are ready to use the dataset to generate reports.\n2. Generating reports. Borrowing heavily from this [multisite-lighthouse](https://github.com/sahava/multisite-lighthouse) repository, I modified the code to accept a JSON file that contained the websites. I describe the process of generating reports in far more detail in [this blog post](). Since my personal PC didn't have the raw power to generate all the reports at once, I ran the script `runLightHouse.js` on an AWS server and generated this [file](websiteData-n-reports.json). I also stored the full Lighthouse reports for all the websites [here](complete-reports).\u003cbr/\u003e\n   To generate the reports locally and append the performance score and download size to the JSON entry for each site:\n\n   ```bash\n   # install dependencies\n   npm i\n   # run script - the sampleRunLightHouse.js file will run lighthouse reports on only 5 websites\n   node scripts/sampleRunLightHouse.js\n\n   # To generate reports for all the websites (only if you have a very powerful machine)\n   node scripts/runLightHouse.js\n   ```\n\n3. Examining the data\nSeveral scripts are available to generate visual representations of the dataset.\nAll the scripts require Python's matplotlib library.\n   ```bash\n   pip3 install matplotlib\n   ```\nThe scripts are made available in the analysis folder. Each script should be run from the project's root directory. 
For instance:\n   ```bash\n   # To run the time spent on website graph script:\n   python3 analysis/timeSpent.py\n   ```\n   ![time spent on site](analysis/graphs/TimeSpentOnWebsite.png)\n  \u003cbr/\u003e \nWe can see that, on average, Kenyans spend 7.2682 minutes on a website, with the most common time spent on a site being 3 minutes, according to the graph above⬆️.\n   ![Size of web page downloaded](analysis/graphs/Size%20of%20the%20web%20page%20downloaded.png)\nThe optimal size of a web page is 0 - 1 MB downloaded once a user logs onto a site. This is, of course, not taking into account cached resources that might reduce the size of the page downloaded.\n### PostScript\nThe dataset made available still needs work. For instance, comparing the time spent on one website with the time spent on another doesn't take into account the fact that different websites serve different functions. A person logging into the Kenya Revenue Authority website would perhaps use the site for a specific predetermined use case, while a person using YouTube might not have a goal in mind while using the site. Therefore, comparing how, for instance, the size of the page correlates with the amount of time spent makes a lot of assumptions, for example about the function of each site. \u003cbr/\u003e\n\nHowever, I am glad to make the dataset available, free to use and for more research to be done. 
Especially at a time when the internet is crucial to keep systems moving during Covid-19, we need to examine more closely how Kenyans use the internet.\n\n## Future Work\n- [ ] Label each website according to category such as entertainment, utility, education to be able to examine the dataset in more detail.\n- [ ] Analyze how sites in a certain group or category compare with each other.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaudebugs%2Fkenya-web-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaudebugs%2Fkenya-web-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaudebugs%2Fkenya-web-project/lists"}