https://github.com/nathan-lindstedt/census_income
Model Development w/ Kepler Mapper
kepler-mapper machine-learning scikit-tda shape-of-data topological-data-analysis uci-ml-repository xgboost
- Host: GitHub
- URL: https://github.com/nathan-lindstedt/census_income
- Owner: nathan-lindstedt
- License: mit
- Created: 2024-09-28T15:50:48.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-12-06T02:08:53.000Z (about 1 month ago)
- Last Synced: 2024-12-06T03:18:48.749Z (about 1 month ago)
- Topics: kepler-mapper, machine-learning, scikit-tda, shape-of-data, topological-data-analysis, uci-ml-repository, xgboost
- Language: Python
- Homepage:
- Size: 328 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# census_income
THE SHAPE OF 1994 US CENSUS BUREAU INCOMES FROM THE UC-IRVINE MACHINE LEARNING REPOSITORY

Abstract
Topological data analysis (TDA) is a group of methods and techniques that can be used for both exploratory and explanatory data analysis. Here the Kepler Mapper algorithm is applied to the training set and its predictions, as output by an XGBoost classifier evaluated on a validation set, to investigate how well the baseline model performs on the target of interest (i.e., earning <= $50K or > $50K). The output is also used to visualize the areas where the model might misclassify out-of-sample results, before any testing set data is used to make such determinations. The goal is for the analyst to have an *a priori* understanding of the salient discriminative variables and the potential confounding variables to guide model development. Other suggested improvements to the baseline model include categorical feature embeddings, probability calibration, and decision threshold tuning.
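To make the idea above concrete, here is a minimal sketch of a Mapper-style construction in plain Python. It is not the Kepler Mapper library and not this repository's code. It assumes a one-dimensional lens (here, hypothetical predicted probabilities of earning > $50K), covers it with overlapping intervals, clusters within each interval by a simple gap rule, and links clusters that share samples. A full Mapper pipeline would instead cluster in the original feature space, for example with DBSCAN. The function names, parameters, and data are all illustrative assumptions.

```python
from itertools import combinations


def mapper_graph(lens, n_intervals=3, overlap=0.5, gap=0.2):
    """Build a toy 1-D Mapper graph from one lens value per sample.

    Returns (nodes, edges): nodes maps a node id (interval, cluster) to a
    sorted list of sample indices; edges holds pairs of node ids that
    share at least one sample.
    """
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2  # widen each interval so neighbors overlap
    nodes = {}
    for i in range(n_intervals):
        start = lo + i * width - pad
        stop = lo + (i + 1) * width + pad
        # Samples whose lens value falls in this interval, ordered by lens.
        members = sorted(
            (j for j, v in enumerate(lens) if start <= v <= stop),
            key=lambda j: lens[j],
        )
        # Simplified clustering (assumption): start a new cluster whenever
        # two consecutive lens values are more than `gap` apart.
        cluster, k = [], 0
        for j in members:
            if cluster and lens[j] - lens[cluster[-1]] > gap:
                nodes[(i, k)] = sorted(cluster)
                cluster, k = [], k + 1
            cluster.append(j)
        if cluster:
            nodes[(i, k)] = sorted(cluster)
    # Two nodes are connected when they have a sample in common.
    edges = {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if set(nodes[a]) & set(nodes[b])
    }
    return nodes, edges


def node_error_rates(nodes, y_true, y_pred):
    """Fraction of misclassified samples in each node."""
    return {
        node: sum(y_true[j] != y_pred[j] for j in idx) / len(idx)
        for node, idx in nodes.items()
    }


# Hypothetical data: predicted probabilities and true labels (1 means > $50K).
lens = [0.05, 0.10, 0.12, 0.40, 0.45, 0.55, 0.60, 0.90, 0.95]
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1]
y_pred = [int(p >= 0.5) for p in lens]

nodes, edges = mapper_graph(lens)
rates = node_error_rates(nodes, y_true, y_pred)
worst_node = max(rates, key=rates.get)
print("nodes:", nodes)
print("edges:", sorted(edges))
print("highest error rate at node", worst_node, "=", rates[worst_node])
```

Coloring each node by its error rate is one way to surface regions where a classifier tends to misclassify, which is the kind of inspection the abstract describes.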
![scikit_tda_img](./census_income/images/scikit_tda_img.png)
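The abstract lists decision threshold tuning among the suggested improvements. A minimal sketch of that idea follows, assuming the threshold is chosen on validation data to maximize the F1 score. The choice of F1, the candidate grid, and the sample values are illustrative assumptions, not details taken from the repository.

```python
def f1_at_threshold(labels, proba, threshold):
    """F1 score when predicting the positive class for proba >= threshold."""
    tp = fp = fn = 0
    for y, p in zip(labels, proba):
        pred = int(p >= threshold)
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(labels, proba, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(labels, proba, t))


# Hypothetical validation-set probabilities and labels (1 means > $50K).
val_proba = [0.05, 0.10, 0.12, 0.40, 0.45, 0.55, 0.60, 0.90, 0.95]
val_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]

best = tune_threshold(val_labels, val_proba, [0.3, 0.4, 0.5, 0.6, 0.7])
print("best threshold:", best)
print("F1 at 0.5:", f1_at_threshold(val_labels, val_proba, 0.5))
print("F1 at best:", f1_at_threshold(val_labels, val_proba, best))
```

Tuning on the validation set rather than the testing set keeps the final evaluation untouched, in line with the abstract's aim of deferring any use of testing data.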
Dataset Citation:
Becker, B. and R. Kohavi. "Census Income," UCI Machine Learning Repository, 1996. [Online]. Available: https://doi.org/10.24432/C5GP7S.