{"id":13553454,"url":"https://github.com/uber/manifold","last_synced_at":"2025-05-13T23:06:52.500Z","repository":{"id":35106487,"uuid":"169798869","full_name":"uber/manifold","owner":"uber","description":"A model-agnostic visual debugging tool for machine learning","archived":false,"fork":false,"pushed_at":"2025-02-05T23:02:02.000Z","size":7130,"stargazers_count":1660,"open_issues_count":83,"forks_count":118,"subscribers_count":51,"default_branch":"master","last_synced_at":"2025-05-13T08:43:41.927Z","etag":null,"topics":["incubation","machine-learning","visualization"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/uber.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-02-08T21:05:30.000Z","updated_at":"2025-05-05T12:44:19.000Z","dependencies_parsed_at":"2023-01-15T14:00:40.176Z","dependency_job_id":"0bc898ff-dbd3-40e7-aed3-7345f9e81bac","html_url":"https://github.com/uber/manifold","commit_stats":{"total_commits":161,"total_committers":14,"mean_commits":11.5,"dds":"0.34782608695652173","last_synced_commit":"c87df910d05da8da36130386634e52cc2ce64af3"},"previous_names":[],"tags_count":48,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2Fmanifold","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2Fmanifold/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2Fmanifold/releases","manifests
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/uber%2Fmanifold/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/uber","download_url":"https://codeload.github.com/uber/manifold/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254040991,"owners_count":22004642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["incubation","machine-learning","visualization"],"created_at":"2024-08-01T12:02:25.299Z","updated_at":"2025-05-13T23:06:47.490Z","avatar_url":"https://github.com/uber.png","language":"JavaScript","readme":"[![Gitpod Ready-to-Code](https://img.shields.io/badge/Gitpod-Ready--to--Code-blue?logo=gitpod)](https://gitpod.io/#https://github.com/uber/manifold) \n[![Build Status](https://travis-ci.com/uber/manifold.svg?token=SZsMuk4iZZDLKwRXzyxu\u0026branch=master)](https://travis-ci.com/uber/manifold)\n[![CII Best Practices](https://bestpractices.coreinfrastructure.org/projects/3062/badge)](https://bestpractices.coreinfrastructure.org/projects/3062)\n\n# Manifold\n\n_This project is stable and being incubated for long-term support._\n\n[\u003cimg alt=\"Manifold\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/Manifold_Header.jpg\" width=\"600\"\u003e](https://uber.github.io/manifold/)\n\nManifold is a model-agnostic visual debugging tool for machine learning.\n\nUnderstanding ML model performance and behavior is a non-trivial process, given the intrinsic opacity of ML algorithms.\n
Performance summary statistics such as AUC, RMSE, and others are not instructive enough for identifying what went wrong with a model or how to improve it.\n\nAs a visual analytics tool, Manifold allows ML practitioners to look beyond overall summary metrics to detect which subset of data a model is inaccurately predicting. Manifold also explains the potential cause of poor model performance by surfacing the feature distribution difference between better and worse-performing subsets of data.\n\n## Table of contents\n\n- [Prepare your data](#prepare-your-data)\n- [Interpret visualizations](#interpret-visualizations)\n- [Using the demo app](#using-the-demo-app)\n- [Using the component](#using-the-component)\n- [Contributing](#contributing)\n- [Versioning](#versioning)\n- [License](#license)\n\n## Prepare Your Data\n\nThere are 2 ways to input data into Manifold:\n\n- [csv upload](#upload-csv-to-demo-app) if you use the Manifold demo app, or\n- [convert data programmatically](#load-and-convert-data) if you use the Manifold component in your own app.\n\nIn either case, data that's directly input into Manifold should follow this format:\n\n```js\nconst data = {\n  x:     [...],         // feature data\n  yPred: [[...], ...],  // prediction data\n  yTrue: [...],         // ground truth data\n};\n```\n\nEach element in these arrays represents one data point in your evaluation dataset, and the order of data instances in `x`, `yPred` and `yTrue` should all match.\nThe recommended instance count for each of these datasets is 10,000 - 15,000. If you have a larger dataset that you want to analyze, a random subset of your data generally suffices to reveal the important patterns in it.\n\n##### `x`: {Object[]}\n\nA list of instances with features.\n
Example (2 data instances):\n\n```js\n[{feature_0: 21, feature_1: 'B'}, {feature_0: 36, feature_1: 'A'}];\n```\n\n##### `yPred`: {Object[][]}\n\nA list of lists, where each child list is a prediction array from one model for each data instance. Example (3 models, 2 data instances, 2 classes `['false', 'true']`):\n\n```js\n[\n  [{false: 0.1, true: 0.9}, {false: 0.8, true: 0.2}],\n  [{false: 0.3, true: 0.7}, {false: 0.9, true: 0.1}],\n  [{false: 0.6, true: 0.4}, {false: 0.4, true: 0.6}],\n];\n```\n\n##### `yTrue`: {Number[] | String[]}\n\nA list of ground truth values, one for each data instance. Values must be numbers for regression models, and strings that match object keys in `yPred` for classification models. Example (2 data instances, 2 classes `['false', 'true']`):\n\n```js\n['true', 'false'];\n```\n\n## Interpret visualizations\n\nThis guide explains how to interpret Manifold visualizations.\n\nManifold consists of:\n\n- [Performance Comparison View](#performance-comparison-view) which compares\n  prediction performance across models, across data subsets\n- [Feature Attribution View](#feature-attribution-view) which visualizes feature\n  distributions of data subsets with various performance levels\n\n### Performance Comparison View\n\nThis visualization is an overview of performance of your model(s) across\ndifferent segments of your data. It helps you identify under-performing data subsets for further inspection.\n\n#### Reading the chart\n\n\u003cimg alt=\"performance comparison view\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/performance_comparison_1.png\" width=\"600\"\u003e\n\n1. **X axis:** performance metric. Could be log-loss, squared-error, or raw prediction.\n2. **Segments:** your dataset is automatically divided into segments based on performance similarity between instances, across models.\n3. **Colors:** represent different models.\n\n\u003cimg alt=\"performance comparison view unit\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/performance_comparison_2.png\" width=\"600\"\u003e\n\n1. **Curve:** performance distribution (of one model, for one segment).\n2. **Y axis:** data count/density.\n3. **Cross:** the left end, center line, and right end are the 25th, 50th and 75th percentiles of the distribution.\n\n#### Explanation\n\nManifold uses a clustering algorithm (k-Means) to break prediction data into N segments\nbased on performance similarity.\n\nThe input to k-Means is the per-instance performance scores. By default, that is the log-loss value for classification models and the squared-error value for regression models. Models with a lower log-loss/squared-error perform better than models with a higher log-loss/squared-error.\n\nIf you're analyzing multiple models, all model performance metrics will be included in the input.\n\n#### Usage\n\n- Look for segments of data where the error is higher (plotted to the right). These are areas you should analyze and try to improve.\n\n- If you're comparing models, look for segments where the log-loss is different for each model. If two models perform differently on the same set of data, consider using the better-performing model for that part of the data to boost performance.\n\n- After you notice any performance patterns/issues in the segments, slice the data to compare feature distribution for the data subset(s) of interest.\n
You can create two segment groups to compare (colored pink and blue), and each group can have 1 or more segments.\n\n**Example**\n\n\u003cimg alt=\"performance comparison view example\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/performance_comparison_3.png\" width=\"600\"\u003e\n\nData in Segment 0 has a lower log-loss prediction error compared to Segments 1 and 2, since the curves in Segment 0 are closer to the left side.\n\nIn Segments 1 and 2, the XGBoost model performs better than the DeepLearning model, but DeepLearning outperforms XGBoost in Segment 0.\n\n\u003cbr/\u003e\n\n### Feature Attribution View\n\nThis visualization shows feature values of your data, aggregated by user-defined segments. It helps you identify any input feature distribution that might correlate with inaccurate prediction output.\n\n#### Reading the chart\n\n\u003cimg alt=\"feature attribution view\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/feature_attribution_1.png\" width=\"600\"\u003e\n\n1. **Histogram / heatmap:** distribution of data from each data slice, shown in the corresponding color.\n2. **Segment groups:** indicates data slices you choose to compare against each other.\n3. **Ranking:** features are ranked by distribution difference between slices.\n\n\u003cimg alt=\"feature attribution view unit\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/feature_attribution_2.png\" width=\"600\"\u003e\n\n1. **X axis:** feature value.\n2. **Y axis:** data count/density.\n3. **Divergence score:** measure of difference in distributions between slices.\n\n#### Explanation\n\nAfter you slice the data to create segment groups, feature distribution histograms/heatmaps from the two segment groups are shown in this view.\n\nDepending on the feature type, features can be shown as heatmaps on a map for geo features, as distribution curves for numerical features, or as distribution bar charts for categorical features.\n
(In bar charts, categories on the x-axis are sorted by instance count difference. Look for differences between the two distributions in each feature.)\n\nFeatures are ranked by their KL-divergence, a measure of _difference_ between the two contrasting distributions. The higher the divergence is, the more likely this feature is correlated with the factor that differentiates the two Segment Groups.\n\n#### Usage\n\n- Look for the differences between the two distributions (pink and blue) in each feature. They represent the difference in data from the two segment groups you selected in the Performance Comparison View.\n\n**Example**\n\n\u003cimg alt=\"feature attribution view example\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/feature_attribution_3.png\" width=\"600\"\u003e\n\nData in Groups 0 and 1 have obvious differences in Features 0, 1, 2 and 3, but are not so different in Features 4 and 5.\n\nIf Data Groups 0 and 1 correspond to data instances with low and high prediction error respectively, this means that data with higher errors tend to have _lower_ feature values in Features 0 and 1, since the peak of the pink curve is to the left of the blue curve.\n\n\u003cbr/\u003e\n\n### Geo Feature View\n\nIf there are geospatial features in your dataset, they will be displayed on a map. Lat-lng coordinates and [h3](https://github.com/uber/h3-js) hexagon ids are currently supported geo feature types.\n\n#### Reading the chart\n\n\u003cimg alt=\"geo feature view lat-lng\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/geo_feature_1.png\" width=\"600\"\u003e\n\n1. **Feature name:** when multiple geo features exist, you can choose which one to display on the map.\n2. **Color-by:** if a lat-lng feature is chosen, datapoints are colored by group ids.\n3. **Map:** Manifold defaults to displaying the location and density of these datapoints using a heatmap.\n\n\u003cimg alt=\"geo feature view hex id\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/geo_feature_2.png\" width=\"600\"\u003e\n\n1. **Feature name:** when choosing a hex-id feature to display, datapoints with the same hex-id are displayed in aggregate.\n2. **Color-by:** you can color the hexagons by average model performance, percentage of segment group 0, or total count per hexagon.\n3. **Map:** all metrics that are used for coloring are also shown in tooltips, on the hexagon level.\n\n#### Usage\n\n- Look for the differences in geo location between the two segment groups (pink and grey). They represent the spatial distribution difference between the two subsets you previously selected.\n\n**Example**\n\nIn the first map above, Group 0 has a more obvious tendency to be concentrated in the downtown San Francisco area.\n\n\u003c!-- images in this doc are created from https://docs.google.com/presentation/d/1EqvjMyBLNX7wfEQPFKAoaE39bW0pXbBa8WIznQN49vE/edit?usp=sharing --\u003e\n\n## Using the Demo App\n\nTo do a one-off evaluation using static outputs of your ML models, use the demo app.\nOtherwise, if you have a system that programmatically generates ML model outputs, you might consider [using the Manifold component](#using-the-component) directly.\n\n### Running Demo App Locally\n\nRun the following commands to set up your environment and run the demo:\n\n```bash\n# install all dependencies in the root directory\nyarn\n# demo app is in examples/manifold directory\ncd examples/manifold\n# install dependencies for the demo app\nyarn\n# run the app\nyarn start\n```\n\nNow you should see the demo app running at `localhost:8080`.\n\n### Upload CSV to Demo App\n\n\u003cimg alt=\"csv upload interface\" src=\"https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/file_upload.png\" width=\"500\"\u003e\n\nOnce the app starts running, you will see the interface above\n
asking you to upload **\"feature\"**, **\"prediction\"** and **\"ground truth\"** datasets to Manifold.\nThey correspond to `x`, `yPred`, and `yTrue` in the \"[prepare your data](#prepare-your-data)\" section, and you should prepare your CSV files accordingly, as illustrated below:\n\n|           Field            |   **`x`** (feature)    | **`yPred`** (prediction)  | **`yTrue`** (ground truth)  |\n| :------------------------: | :--------------------: | :-----------------------: | :-------------------------: |\n|      Number of CSVs        |           1            |         multiple          |              1              |\n| Illustration of CSV format | ![][feature csv image] | ![][prediction csv image] | ![][ground truth csv image] |\n\nNote: the index columns should be excluded from the input file(s).\nOnce the datasets are uploaded, you will see visualizations generated from these datasets.\n\n## Using the Component\n\nEmbedding the Manifold component in your app allows you to programmatically generate ML model data and visualize it.\nOtherwise, if you have static output from some models and want to do a one-off evaluation, you might consider [using the demo app](#using-the-demo-app) directly.\n\nHere are the basic steps to import Manifold into your app and load data for visualizing. You can also take a look at the examples folder.\n\n### Install Manifold\n\n```bash\n$ npm install @mlvis/manifold styled-components styletron-engine-atomic styletron-react\n```\n\n### Load and Convert Data\n\nTo load your data files into Manifold, use the `loadLocalData` action. You can also reshape your data into the required Manifold format using `dataTransformer`.\n\n```js\nimport {loadLocalData} from '@mlvis/manifold/actions';\n\n// create the following action and pass to dispatch\nloadLocalData({\n  fileList,\n  dataTransformer,\n});\n```\n\n##### `fileList`: {Object[]}\n\nOne or more datasets, in CSV format.\n
Could be ones that your backend returns.\n\n##### `dataTransformer`: {Function}\n\nA function that transforms `fileList` into the [Manifold input data format](#prepare-your-data). Default:\n\n```js\nconst defaultDataTransformer = fileList =\u003e ({\n  x: [],\n  yPred: [],\n  yTrue: [],\n});\n```\n\n### Mount reducer\n\nManifold uses Redux to manage its internal state. You need to register `manifoldReducer` with the main reducer of your app:\n\n```js\nimport manifoldReducer from '@mlvis/manifold/reducers';\nimport {combineReducers, createStore} from 'redux';\n\nconst initialState = {};\nconst reducers = combineReducers({\n  // mount manifold reducer in your app\n  manifold: manifoldReducer,\n\n  // Your other reducers here\n  app: appReducer,\n});\n\n// using createStore\nexport default createStore(reducers, initialState);\n```\n\n### Mount Component\n\nIf you mounted `manifoldReducer` at a path other than `manifold` in the step above, you need to specify that path when you mount the component, using the `getState` prop. `width` and `height` both need to be set explicitly. If you have geospatial features and need to see them on a map, you also need a [Mapbox token](https://docs.mapbox.com/help/how-mapbox-works/access-tokens/).\n\n```js\nimport Manifold from '@mlvis/manifold';\nconst manifoldGetState = state =\u003e state.pathTo.manifold;\nconst yourMapboxToken = ...;\n\nconst Main = props =\u003e (\n  \u003cManifold\n    getState={manifoldGetState}\n    width={width}\n    height={height}\n    mapboxToken={yourMapboxToken}\n  /\u003e\n);\n```\n\n### Styling\n\nManifold uses baseui, which uses Styletron as a styling engine. If you don't already use Styletron in other parts of your app, make sure to wrap Manifold with the [styletron provider](https://baseweb.design/getting-started/setup/#adding-base-web-to-your-application).\n\nManifold uses the baseui [theming API](https://baseweb.design/guides/theming/).\n
The default theme used by Manifold is exported as `THEME`. You can customize the styling by extending `THEME` and passing it as a `theme` prop of the `Manifold` component.\n\n```js\nimport Manifold, {THEME} from '@mlvis/manifold';\nimport {Client as Styletron} from 'styletron-engine-atomic';\nimport {Provider as StyletronProvider} from 'styletron-react';\n\nconst engine = new Styletron();\nconst myTheme = {\n  ...THEME,\n  colors: {\n    ...THEME.colors,\n    primary: '#ff0000',\n  },\n}\n\nconst Main = props =\u003e (\n  \u003cStyletronProvider value={engine}\u003e\n    \u003cManifold\n      getState={manifoldGetState}\n      theme={myTheme}\n    /\u003e\n  \u003c/StyletronProvider\u003e\n);\n```\n\n## Built With\n- [TensorFlow.js](https://js.tensorflow.org/)\n- [React](https://reactjs.org/)\n- [Redux](https://redux.js.org/)\n\n## Contributing\nPlease read our [code of conduct](CODE_OF_CONDUCT.md) before you contribute! You can find details for submitting pull requests in the [CONTRIBUTING.md](CONTRIBUTING.md) file. 
Refer to the issue [template](https://help.github.com/articles/about-issue-and-pull-request-templates/).\n\n## Versioning\nWe document versions and changes in our changelog - see the [CHANGELOG.md](CHANGELOG.md) file for details.\n\n## License\nApache 2.0 License\n\n[feature csv image]: https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/x.png\n[prediction csv image]: https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/yPred.png\n[ground truth csv image]: https://d1a3f4spazzrp4.cloudfront.net/manifold/docs/yTrue.png\n\n","funding_links":[],"categories":["🛠️ General ML Testing Frameworks","其他_机器学习与深度学习","JavaScript","Visualization","Training","Technical Resources","LLM SECURITY / AI SECURITY","Visual Analysis and Debugging"],"sub_categories":["Visualization","Open Source/Access Responsible AI Software Packages","AI/ML Model Analysis"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber%2Fmanifold","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuber%2Fmanifold","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuber%2Fmanifold/lists"}