{"id":22211619,"url":"https://github.com/kwokhing/visualizing-datasets-with-facets","last_synced_at":"2026-04-18T14:02:38.445Z","repository":{"id":189295803,"uuid":"113724491","full_name":"KwokHing/Visualizing-Datasets-with-Facets","owner":"KwokHing","description":"Demo on using Facets: An Open Source Visualization Tool for Machine Learning Training Data developed by Google's PAIR Initiative","archived":false,"fork":false,"pushed_at":"2017-12-10T05:50:30.000Z","size":2628,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-10-26T02:41:46.684Z","etag":null,"topics":["anaconda","data-analysis","data-visualization","facets","jupyter-notebook","missing-data","open-source","python","skewness","unbalanced-data","visualisation","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KwokHing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-12-10T05:02:33.000Z","updated_at":"2018-01-08T16:08:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"3d04e26b-5bb7-4dad-9789-b85b25daf26f","html_url":"https://github.com/KwokHing/Visualizing-Datasets-with-Facets","commit_stats":null,"previous_names":["kwokhing/visualizing-datasets-with-facets"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/KwokHing/Visualizing-Datasets-with-Facets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KwokHing%2FVisualizing-Datasets-with-Facets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KwokHing%2FVi
sualizing-Datasets-with-Facets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KwokHing%2FVisualizing-Datasets-with-Facets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KwokHing%2FVisualizing-Datasets-with-Facets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KwokHing","download_url":"https://codeload.github.com/KwokHing/Visualizing-Datasets-with-Facets/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KwokHing%2FVisualizing-Datasets-with-Facets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31971490,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anaconda","data-analysis","data-visualization","facets","jupyter-notebook","missing-data","open-source","python","skewness","unbalanced-data","visualisation","visualization"],"created_at":"2024-12-02T20:35:40.956Z","updated_at":"2026-04-18T14:02:33.435Z","avatar_url":"https://github.com/KwokHing.png","language":"Jupyter Notebook","readme":"### Visualizing Machine Learning Datasets using Anaconda \u0026 Facets\nFacets allows for easy visualization. 
To use Facets, first clone the Git repository:\n\n\u003e git clone https://github.com/PAIR-code/facets.git\n\nTo use the visualization capabilities, you will have to add an nbextension. To do so, find the path to the facets-dist directory in the cloned repository and run the following command:\n\n\u003e jupyter nbextension install facets-dist/ --user\n\nHere, 'facets-dist' is the path to that directory. \n\nIf the visualizations still do not appear in the notebook after running the above command, copy the file facets-jupyter.html from the 'facets/facets-dist' folder to your local Anaconda path _'[anaconda_path]/share/jupyter/nbextensions/'_. This is a known issue: https://github.com/PAIR-code/facets/issues/41\n\nYou might need to restart Jupyter after this before proceeding with the visualization. For a more detailed installation guide and updates, have a look at:\n\n\u003e https://github.com/PAIR-code/facets\n\nAlso install the protobuf package:\n\n\u003e conda install protobuf\n\n\n```python\n# Add the facets overview python code to the python path\nimport sys\n# FACETS_PATH is the path to the directory containing the Facets Overview Python code in the cloned GitHub repo of Facets.\n# It should look similar to this: \".../facets/facets_overview/python\"\n# If you have cloned the facets repo to your current working directory, you can proceed.\n# If you have chosen another location, just add it here.\n\nFACETS_PATH = 'facets-master/facets_overview/python'\nsys.path.append(FACETS_PATH)\n```\n\n\n```python\nimport pandas as pd\n\ntrain_data = pd.read_csv(\n    \"train.csv\",\n    #sep=r'\\s*,\\s*',\n    engine='python',\n    na_values=\"?\")\n\ntest_data = pd.read_csv(\n    \"test.csv\",\n    #sep=r'\\s*,\\s*',\n    engine='python',\n    na_values=\"?\")\n\ntest_salaries = pd.read_csv(\n    \"test_salaries.csv\",\n    #sep=r'\\s*,\\s*',\n    engine='python',\n    na_values=\"?\")\n\ntest_data = pd.concat([test_salaries, test_data], axis=1)\n```\n\n\n```python\n# Calculate the feature 
statistics proto from the datasets and stringify it for use in \n# facets overview\nfrom generic_feature_statistics_generator import GenericFeatureStatisticsGenerator\nimport base64\n\ngfsg = GenericFeatureStatisticsGenerator()\nproto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train_data},\n                                  {'name': 'test', 'table': test_data}])\nprotostr = base64.b64encode(proto.SerializeToString()).decode(\"utf-8\")\n```\n\n\n```python\n# Display the facets overview visualization for this data\n# (display and HTML are imported from the public IPython.display module)\nfrom IPython.display import display, HTML\n\nHTML_TEMPLATE = \"\"\"\u003clink rel=\"import\" href=\"/nbextensions/facets-dist/facets-jupyter.html\" \u003e\n        \u003cfacets-overview id=\"elem\"\u003e\u003c/facets-overview\u003e\n        \u003cscript\u003e\n          document.querySelector(\"#elem\").protoInput = \"{protostr}\";\n        \u003c/script\u003e\"\"\"\n\nhtml = HTML_TEMPLATE.format(protostr=protostr)\ndisplay(HTML(html))\n```\n\n![png](images/facets-overview-numeric.png)\n\n![png](images/facets-overview-categorical.png)\n\nFacets Overview provides a quick understanding of the distribution of values across the features of one or more datasets. Multiple datasets, such as a training set and a test set, can also be compared on the same visualization. 
\n\nCommon data issues that can hamper machine learning are pushed to the forefront, such as unexpected feature values, features with high percentages of missing values, features with unbalanced distributions, and feature distribution skew between datasets.\n\n### Known Issues ###\n\nThe Facets visualizations currently work only in Chrome browsers.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwokhing%2Fvisualizing-datasets-with-facets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkwokhing%2Fvisualizing-datasets-with-facets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwokhing%2Fvisualizing-datasets-with-facets/lists"}