{"id":22312330,"url":"https://github.com/coderham/data-512-final-project","last_synced_at":"2025-08-23T06:36:35.749Z","repository":{"id":93309294,"uuid":"158330676","full_name":"CoderHam/data-512-final-project","owner":"CoderHam","description":"Final Project for DATA512 - Human-Centered Data Science @ University of Washington","archived":false,"fork":false,"pushed_at":"2018-12-09T22:27:35.000Z","size":4527,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-05T22:34:41.452Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CoderHam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-11-20T04:26:18.000Z","updated_at":"2018-12-09T22:27:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"f4baee22-503c-4fca-a1cc-09ce2a23d203","html_url":"https://github.com/CoderHam/data-512-final-project","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CoderHam/data-512-final-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderHam%2Fdata-512-final-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderHam%2Fdata-512-final-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderHam%2Fdata-512-final-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderHam%
2Fdata-512-final-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CoderHam","download_url":"https://codeload.github.com/CoderHam/data-512-final-project/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CoderHam%2Fdata-512-final-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271745679,"owners_count":24813521,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T21:45:48.578Z","updated_at":"2025-08-23T06:36:35.701Z","avatar_url":"https://github.com/CoderHam.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Final Project Plan - Education Statistics\n\nFinal Project for DATA512 - Human-Centered Data Science @ University of Washington\n\nAuthor: Hemant Jain\n\n## Abstract and Motivation\n\nPersonal motivation: In most developing countries, a large portion of the youth still does not have access to primary and secondary education.\n\n**\"In 40 out of 93 countries, fewer than 50% of the poorest children have completed primary school\"**.[6]\n\nWithout proper education, it is hard for individuals from poor families to find employment that pays well and rise above poverty. 
This, in turn, affects their quality of life and that of their future generations. It was this passion for education, and for its access to all, that led me to join an NGO, [Make a Difference](https://makeadiff.in/) (MAD), during my undergraduate studies. As a part of MAD, I taught orphaned and underprivileged children and facilitated their education.\n\nIn this project, I intend to analyze and study the Education Statistics data from the World Bank. The data contains several indicators of the access to and quality of education. Apart from recorded data, it also contains projected values for these indicators. As someone passionate about education, I wish to perform this exploratory research with an open mind in order to find patterns and explanations for them.\n\nAccess to education has long been an important factor in improving the quality of life as well as the economy of a country. While many countries have taken steps to ensure access to education, I wish to investigate the success and effects of these steps. Is there any bias in the data that suggests unfair practices towards people of a certain race, ethnicity, gender or region?\n\nApart from improving the lifestyle of individuals, education helps people find their purpose in life, encourages inquisitiveness and introspection, motivates people to be better, broadens perspectives and improves reasoning skills and creativity.\n\n## Research Questions\n\nWhile I intend to keep an open mind during the study, there are a few **Research Questions** I wish to answer:\n\n1. **Education and Income**\n    1. Is there a correlation between the number of youth attending school and the level of income? My focus was drawn to this by the following statement:\n\n    **\"Children living in a low income countries are twice as likely to be out of school than those children in high income countries. Additionally, children from the wealthiest 20% of the population are 4 times more likely to be in school than the poorest 20%.\"**[5]\n\n2. **Gender**\n    1. 
Does gender impact the enrollment, level and quality of education received?\n\n    This article, [Why girls in India are still missing out on the education they need](https://www.theguardian.com/education/2013/mar/11/indian-children-education-opportunities), discusses the problem in India.\n\n    **\"India is no longer considered a poor country and yet many children do not receive a good education.\" - Rachel Williams**\n\n    I intend to test whether this holds for other countries with stronger or weaker economies.\n\n3. **Government's role in Education**\n    1. Do governments change their investment in education over time? Is this related to their annual GDP?\n    2. In which components of education does the government invest more, and in which does it invest less?\n\n4. **Quality of Education**\n    1. Are the learning outcomes of different subjects negatively correlated, i.e. does performing well in one subject correlate with performing badly in another subject?\n    2. Which subjects have seen improvements in scores over time? Has the academic curriculum become harder in recent years?\n\n## Human-Centered Design Considerations\n\nI have decided to include the following Human-Centered Design considerations:\n\n1. Knowledge of current affairs impacting education\n2. Qualitative analysis based on additional data about perceived attitudes towards education\n3. The manner in which the Research Questions of the study are framed\n4. Incorporating visualizations and qualitative analysis when possible\n\n## Data (Source and Schema)\n\nThe [Education Statistics](https://datacatalog.worldbank.org/dataset/education-statistics) dataset used in this assignment was downloaded from the [World Bank](http://www.worldbank.org/). The dataset sources its data from:\n\n1. UIS ([UNESCO Institute for Statistics](http://uis.unesco.org/)) - Administrative country data\n2. Several international and regional learning assessments\n3. 
[World Bank Education Projects Database](http://datatopics.worldbank.org/education/wQueries/qprojects) - activities, components and sub-sectors of WB Education projects since 1998\n4. [World Bank Education Expenditures Database](http://datatopics.worldbank.org/education/wQueries/qexpenditures) - Education expenditure data\n\nThe datasets have been downloaded and added to the [`data`](https://github.com/CoderHam/data-512-final-project/tree/master/data) directory. They consist of 5 parts, described below, plus one additional dataset for income groups:\n\n1. [**EdStatsCountry.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/EdStatsCountry.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n| Country Code | text | a unique three-letter code representing a country |\n| Short Name | text | short name for the country |\n| Table Name | text | the name of the country used in this table |\n| Long Name | text | full name of the country |\n| 2-alpha code | text | alphanumeric code of length 2 for the country |\n| Currency Unit | text | currency of the country |\n| Special Notes | text | additional notes about the country |\n| Region | text | region where the country is located |\n| Income Group | text | which income group the country belongs to |\n| WB-2 code | text | World Bank 2 letter code |\n| National accounts base year | numeric | base year used for the country's national accounts |\n| National accounts reference year| numeric | reference year for the country's national accounts |\n| SNA price valuation| text | System of National Accounts (SNA) price basis used for value added (basic or producer prices) |\n| Lending category | text | the World Bank lending category (IDA, IBRD or Blend) of the country |\n| Other groups | text | other World Bank/UN groups that the country belongs to |\n| System of National Accounts| text | the year of the SNA methodology the country uses |\n| Alternative conversion factor| text | the DEC alternative conversion factor or the official exchange rate reported by the IMF's 
International Financial Statistics (IFS) |\n| PPP survey year| numeric | survey year for the nation's purchasing power parity (PPP) data |\n| Balance of Payments Manual in use| text | the IMF balance of payments manual used by the country |\n| External debt Reporting status| text | the reporting status of the nation's external debt data |\n| System of trade| text | the type of trade system the nation uses |\n| Government Accounting concept| text | the type of accounting the central government uses |\n| IMF data dissemination standard| text | the IMF data dissemination standard the nation follows |\n| Latest population census| numeric | the last year the nation's population census was reported |\n| Latest household survey| text | the last year the nation's household survey was reported |\n| Source of most recent Income and expenditure data | text | the source and last year the nation's income and expenditure data was reported |\n| Vital registration complete | categorical | whether the nation's vital registration is complete |\n| Latest agricultural census| numeric | the last year the nation's agricultural data was reported |\n| Latest industrial data | numeric | the last year the nation's industrial data was reported |\n| Latest trade data| numeric | the last year the nation's trade data was reported |\n| Latest water withdrawal data| numeric | the last year the nation's water withdrawal data was reported |\n\n2. [**EdStatsCountry-Series.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/EdStatsCountry-Series.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n| CountryCode | text | a unique three-letter code representing a country |\n| SeriesCode | text | the code for the data series |\n| Description | text | source of data / estimates |\n\n3. 
[**EdStatsData.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/EdStatsData.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n|Country Name| text | the name of the country |\n|Country Code| text | a unique three-letter code representing a country |\n|Indicator Name| text | the name of the indicator being listed |\n|Indicator Code| text | the indicator code for the indicator being listed |\n|{Value for the Year} (1970 to 2017, with projections in 5-year bins until 2100)| numeric | the value of the indicator in that year |\n\n4. [**EdStatsSeries.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/EdStatsSeries.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n|Series Code| text | the code for the data series |\n|Topic| text | topic category the series belongs to |\n|Indicator Name| text | name of the indicator |\n|Short definition| text | a short definition of the indicator |\n|Long definition| text | a detailed definition of the indicator |\n|Unit of measure| text | the unit of measure for the indicator |\n|Periodicity| numeric | time between successive recordings of the indicator |\n|Base Period| numeric | the year used as reference for the base value of the indicator |\n|Other notes| text | additional details about the indicator |\n|Aggregation method| text | how the indicator is aggregated across countries (e.g. weighted average) |\n|Limitations and exceptions| text | details about the limitations and exceptions for the indicator |\n|Notes from original source| text | notes from the original source of the indicator |\n|General comments| text | comments about citations or use of this data |\n|Source | text | source of the indicator data |\n|Statistical concept and methodology| text | the method or study used to find the value of the indicator |\n|Development relevance| text | additional information about the collection of the indicator |\n|Related source links| text | link to source of the data |\n|Other web links| text | 
additional links for the data |\n|Related indicators| text | related indicators in the dataset |\n|License Type| text | license type for the indicator|\n\n5. [**EdStatsFootNote.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/EdStatsFootNote.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n| CountryCode| text | a unique three-letter code representing a country |\n| SeriesCode| text | the code for the data series |\n| Year| numeric | the year for which the data is collected |\n| Description | text | the method of collection/estimation of the indicator |\n\nI added data for the current income levels, as classified by the World Bank [7].\n\n6. [**income_group.csv**](https://github.com/CoderHam/data-512-final-project/tree/master/data/income_group.csv)\n\n| Column | Datatype | Description |\n|---|---|---|\n|Economy| text | the name of the country |\n| Code| text | a unique three-letter code representing a country |\n| Region | text | region where the country is located |\n| Income group | text | which income group the country belongs to |\n| Lending category | text | the World Bank lending category (IDA, IBRD or Blend) of the country |\n| Other | text | other World Bank/UN groups that the country belongs to |\n\n\n## Licenses\n\nThe dataset is classified **Public** and is licensed under [**CC-BY 4.0**](https://datacatalog.worldbank.org/public-licenses#cc-by). In short, under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), individuals are free to copy and redistribute the data in any medium or format, and to modify and build upon it for any purpose, including commercially, as long as appropriate credit is given.\n\n\n## Data Pre-processing and Preparation\n\nThe data is distributed across 5 CSV files and has many missing values, extraneous details and incorrect formatting. 
There is no exhaustive description of each column or of the domain-specific terminology; these must be inferred by referring to the relevant literature and to the UIS and World Bank websites.\n\n## Considerations for Gender\n\nThe dataset treats gender and sex as the same, i.e. as either male or female. While I do not agree that one should be forced to identify as either of the two, I was unable to find relevant education data for non-binary genders.\n\nNoting this as a limitation of the dataset, I will proceed with this data and look for other sources to enrich it.\n\n## Data Limitations\n\nThe data is reported to the World Bank Group and the UN by multiple agencies, which may introduce inconsistencies, and it contains several missing values. There is also a gender limitation, in that the dataset assumes gender is binary. Projections are absent in many cases, and those that are present may not be accurate.\n\n## Conclusions\n\nThere are several claims made online regarding inequality in education, bias against female education [11] and the relation between income and education. 
I wish to investigate these claims and find other such trends in the dataset.\n\n## References\n\n[1] World Bank Education Statistics Dataset - [https://datacatalog.worldbank.org/dataset/education-statistics](https://datacatalog.worldbank.org/dataset/education-statistics)\n\n[2] UNESCO Institute for Statistics - [http://uis.unesco.org/](http://uis.unesco.org/)\n\n[3] Education Statistics (EdStats) - [http://datatopics.worldbank.org/education/](http://datatopics.worldbank.org/education/)\n\n[4] Education Data Release - [http://uis.unesco.org/en/news/education-data-release-one-every-five-children-adolescents-and-youth-out-school](http://uis.unesco.org/en/news/education-data-release-one-every-five-children-adolescents-and-youth-out-school)\n\n[5] 11 Facts About Education Around the World - [https://www.dosomething.org/us/facts/11-facts-about-education-around-world](https://www.dosomething.org/us/facts/11-facts-about-education-around-world)\n\n[6] World Inequality Database on Education - [https://www.education-inequalities.org/](https://www.education-inequalities.org/)\n\n[7] World Bank Country and Lending Groups - [https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups](https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups)\n\n[8] What is the DEC conversion factor? 
- [https://datahelpdesk.worldbank.org/knowledgebase/articles/77935-what-is-the-dec-conversion-factor](https://datahelpdesk.worldbank.org/knowledgebase/articles/77935-what-is-the-dec-conversion-factor)\n\n[9] World Bank Knowledge Base - [https://datahelpdesk.worldbank.org/knowledgebase](https://datahelpdesk.worldbank.org/knowledgebase)\n\n[10] Gender Parity Index - [https://unstats.un.org/unsd/mdg/Metadata.aspx?IndicatorId=9](https://unstats.un.org/unsd/mdg/Metadata.aspx?IndicatorId=9)\n\n[11] Guardian article on 'Why girls in India are still missing out on the education they need' - [https://www.theguardian.com/education/2013/mar/11/indian-children-education-opportunities](https://www.theguardian.com/education/2013/mar/11/indian-children-education-opportunities)\n\n[12] Why education matters for economic development - [http://blogs.worldbank.org/education/why-education-matters-economic-development](http://blogs.worldbank.org/education/why-education-matters-economic-development)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderham%2Fdata-512-final-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcoderham%2Fdata-512-final-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcoderham%2Fdata-512-final-project/lists"}