{"id":15195501,"url":"https://github.com/grindelfp/datasets-analysis","last_synced_at":"2026-03-05T03:32:22.559Z","repository":{"id":222121977,"uuid":"756323275","full_name":"GrindelfP/datasets-analysis","owner":"GrindelfP","description":"The Machine Learning and Data Analysis course task dedicated to training skills of data normalizing and preprocessing.","archived":false,"fork":false,"pushed_at":"2024-02-23T19:45:17.000Z","size":3294,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-12T17:09:00.910Z","etag":null,"topics":["data-analysis","datasets","ipynb","mlda"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GrindelfP.png","metadata":{"files":{"readme":"README.adoc","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-12T12:49:49.000Z","updated_at":"2024-03-11T12:35:10.000Z","dependencies_parsed_at":"2024-02-23T20:44:59.893Z","dependency_job_id":null,"html_url":"https://github.com/GrindelfP/datasets-analysis","commit_stats":null,"previous_names":["grindelfp/datasets-analysis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GrindelfP%2Fdatasets-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GrindelfP%2Fdatasets-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GrindelfP%2Fdatasets-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/GrindelfP%2Fdatasets-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GrindelfP","download_url":"https://codeload.github.com/GrindelfP/datasets-analysis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241459074,"owners_count":19966509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","datasets","ipynb","mlda"],"created_at":"2024-09-27T23:40:20.782Z","updated_at":"2026-03-05T03:32:22.523Z","avatar_url":"https://github.com/GrindelfP.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"= Datasets analysis practice =\n\n== Task ==\n\n1. Pick 3 datasets from kaggle.com\n2. Describe the type of each feature in each dataset\n3. Analyze the data from the dataset that has the most distinct feature types\n4. Formalize and normalize the data\n5. Write conclusions about the work done\n\nThe work should be completed as a Jupyter notebook and submitted as a PDF file.\n\n== Datasets ==\n\n1. Chess Game Dataset (Lichess) - https://www.kaggle.com/datasnaek/chess\n2. Google Play Store Apps - https://www.kaggle.com/lava18/google-play-store-apps\n3. The books of Skyrim - https://www.kaggle.com/datasets/aadamg/skyrim-books-from-uesp\n\n== Formalization ways ==\nThe data can be formalized in multiple ways, depending on the type of each feature in the dataset.\n\n=== Binary features ===\nDefine which one of the two present values is 0 and which is 1. 
Then, replace the values with 0 and 1.\n\n=== Nominal features ===\nCreate a column for each unique value of the feature. Then, in each row, put 1 in the column that matches the original value and 0 in all the others. \nExample: \nEye color: blue, green, brown, grey.\n\n[options=\"header\"]\n.Table 1. Representation of the nominal feature as columns of binary values\n|================\n| blue | green | brown | grey\n| 1   | 0 | 0 | 0\n| 0   | 1 | 0 | 0\n| 0   | 0 | 1 | 0\n| 0   | 0 | 0 | 1\n| ...   | ... | ... | ...\n|================\n\n\n=== Ranged features ===\nDivide each value by the greatest value of the feature, which scales the values into the range from 0 to 1 (assuming the values are non-negative).\n\n=== Features of quantity ===\nApply the following algorithm:\n\n1. Round the values to the nearest integer\n2. Sort the values in ascending order\n3. Divide the sorted values into two groups: the lower 50% of values and the upper 50%\n4. Assign 0 to the first group and 1 to the second group\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrindelfp%2Fdatasets-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrindelfp%2Fdatasets-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrindelfp%2Fdatasets-analysis/lists"}