{"id":13639913,"url":"https://github.com/anodot/MLWatcher","last_synced_at":"2025-04-20T01:32:39.043Z","repository":{"id":100829419,"uuid":"187784217","full_name":"anodot/MLWatcher","owner":"anodot","description":null,"archived":false,"fork":false,"pushed_at":"2020-03-26T22:02:15.000Z","size":284,"stargazers_count":96,"open_issues_count":1,"forks_count":16,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-11-09T10:38:33.363Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/anodot.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-21T07:26:05.000Z","updated_at":"2024-08-22T11:47:12.000Z","dependencies_parsed_at":"2023-06-10T07:15:24.277Z","dependency_job_id":null,"html_url":"https://github.com/anodot/MLWatcher","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anodot%2FMLWatcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anodot%2FMLWatcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anodot%2FMLWatcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/anodot%2FMLWatcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/anodot","download_url":"https://codeload.github.com/anodot/MLWatcher/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":249838129,"owners_count":21332561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T01:01:06.057Z","updated_at":"2025-04-20T01:32:39.036Z","avatar_url":"https://github.com/anodot.png","language":"Python","readme":"# MLWatcher\r\n\r\nMLWatcher is a Python agent that records a large variety of time-series metrics of your running ML classification algorithm.  \r\nIt enables you to monitor in real time:\r\n- the predictions: monitor the distribution of classes, the distribution of the `predict_proba_matrix` values, and anomalies in your predictions\r\n- the features: monitor concept drift and anomalies in your data\r\n- the labels: monitor the accuracy, precision, recall, and F1 of your predictions vs. the labels, if applicable\r\n\r\nThe statistics derived from the data are:\r\n- range, mean, std, median, q25, q50, q75, iqr for any continuous values (probabilities, features)\r\n- count, frequency for any discrete values (labels, classes)\r\n\r\nSome additional data are derived from the `predict_proba_matrix` and monitored as continuous values:\r\n- pred1: the maximum prediction for each row in the `predict_proba_matrix`\r\n- spread12: the spread between the maximum prediction (pred1) and the second-highest prediction for each row in the `predict_proba_matrix`.\r\nDrops in the pred1 and spread12 time series can indicate a decrease in the algorithm's average degree of certainty in its predictions.\r\n\r\nMLWatcher's minimal input is the `predict_proba_matrix` of your algorithm (for each row in the batch of data, the probabilities of 
each class). \r\n`label_matrix` and `input_matrix` are optional inputs to monitor.\r\nFor binary classification with only 2 classes, a threshold value can be supplied to monitor the label-related and prediction-related metrics.\r\n\r\n## MLWatcher use cases\r\n\r\nMonitoring your Machine Learning metrics can serve multiple goals: \r\n  - **alert on concept drift**: the production data can become significantly different from the training data as time passes. Analyze the distribution of the production features through time (mean, std, median, iqr, etc.).  \r\n    \r\n  *Example of concept drift on the MNIST dataset where the input pixel values are suddenly inverted. An anomaly in the distribution of the features is raised*:  \r\n    \r\n  ![Alt text](./IMAGES/concept_drift.png?raw=true \"Concept drift\")\r\n    \r\n  - **analyze the performance of the model**: the model may no longer be accurate with respect to the production data. Some unlabeled metrics can reveal this staleness: pred1, spread12, class_frequency. Drops in the pred1 and spread12 time series can indicate a decrease in the algorithm's average degree of certainty in its predictions.  \r\n    \r\n  *Example of how the model prediction metrics change when a new set of input data comes into production*:   \r\n    \r\n  ![Alt text](./IMAGES/unlabeled_monitoring.png?raw=true \"Model performance of predictions\")\r\n    \r\n  - **check that your model is numerically stable**: analyze the stability of pred1 and spread12, as well as the stability of each class frequency.  
\r\n    \r\n  *Example of putting into production a weakly trained model (trained on a highly unbalanced training set) and how this affects the stability of the prediction distribution in production*:  \r\n      \r\n  ![Alt text](./IMAGES/class_distribution_anomaly.png?raw=true \"Model numerical stability\")\r\n    \r\n  - **canary-process new models**: monitor multiple ML models with the production inputs and compare the metrics of the production algorithm vs. the tested ones, analyze the stability of each model through time, etc.   \r\n    \r\n  *Example of monitoring the accuracy metric for multiple concurrent algorithms*:  \r\n      \r\n  ![Alt text](./IMAGES/canary.png?raw=true \"Canary process multiple models\")\r\n    \r\n  - if labels are available, analyze the evolution of the classic ML metrics and correlate them with other time series (features, predictions). \r\n  \r\nThe size of each data buffer is also monitored, so it is important to correlate the computed metrics with the sample size (i.e., **the sample size is not always statistically significant**). \r\n\r\n\r\n## Getting Started\r\n\r\n0- Install the libraries in requirements.txt:\r\n```\r\npython -m pip install -r /path/to/requirements.txt\r\n```\r\n1- Add the MLWatcher folder to the same folder as your algorithm script.\r\n\r\n2- Personalize the technical parameters in conf.py (rotating log specs, filenames, token if applicable, etc.).\r\n\r\n3- Load the MLWatcher libs in your import lines:\r\n```\r\nfrom MLWatcher.agent import MonitoringAgent\r\n```\r\n4- Instantiate a MonitoringAgent object and run the agent-server side:\r\n```\r\nagent = MonitoringAgent(frequency=5, max_buffer_size=500, n_classes=10, agent_id='1', server_IP='127.0.0.1', server_port=8000)\r\n```\r\n`frequency`: (int) Monitoring period, in minutes, over which data is collected  \r\n`max_buffer_size`: (int) Upper limit on the number of inputs in the buffer. 
Incoming data is sampled if the limit is reached  \r\n`n_classes`: (int) Number of classes for classification. Must equal the number of columns of your predict_proba matrix  \r\n`agent_id`: (string) Agent ID, used when running multiple monitoring agents (default '1')  \r\n`server_IP`: (string) IP address of the server ('127.0.0.1' for a local server)  \r\n`server_port`: (int) Port of the server (default 8000)  \r\n\r\nFor a LOCAL server: the local server listens on the previously defined port, on the localhost interface (127.0.0.1).\r\n```\r\nagent.run_local_server()\r\n```\r\nFor a REMOTE server: the hosted server listens on a defined port, on the localhost interface (--listen localhost) or on all interfaces (--listen all). Recommended:\r\n```\r\npython /path/to/server.py --listen all --port 8000 --n_sockets 5\r\n```\r\nSee --help for server.py options.\r\n\r\n5- Monitor the running ML process for each batch of data:\r\n```\r\nagent.collect_data(\r\n    predict_proba_matrix = \u003cyour pred_proba matrix\u003e,  ## mandatory\r\n    input_matrix = \u003cyour feature matrix\u003e,  ## optional\r\n    label_matrix = \u003cyour label matrix\u003e  ## optional\r\n)\r\n```\r\n6- If `TOKEN`=None is set in conf.py, you can analyze your data, stored locally in the PROD folder, with the provided Jupyter notebook (ANALYTICS folder).\r\n\r\n7- **For advanced analytics of the metrics and anomaly detection in your data**, the agent output is compatible with the Anodot REST API when a valid `TOKEN` is used.\r\n\r\nYou can use the Anodot API script as follows:\r\n\r\n- Asynchronously, from JSON files written on disk:\r\n```\r\npython anodot_api.py --input \u003cpath/to/PROD/XXX_MLmetrics.json\u003e --token \u003cTOKEN\u003e\r\n```\r\n- Synchronously, without storing any production file: set `TOKEN`='123ad..dfg' instead of None in conf.py.\r\nIf the data cannot be sent correctly to Anodot (connection problems), the agent will start writing the metrics directly to disk in the PROD folder.\r\nPlease contact Anodot 
to get a TOKEN through a trial session.\r\n\r\n### Prerequisites\r\n\r\nThe agent is fully written in Python 3.x. It was tested with Python \u003e= 3.5.\r\n\r\nThe input formats for the agent collector are:  \r\npredictions (mandatory): `predict_proba_matrix` of size (batch_size x n_classes)  \r\nlabels (optional): `label_matrix`, a binary matrix of shape (batch_size x n_classes) or an int matrix of shape (batch_size x 1)  \r\nfeatures (optional): `input_matrix` of size (batch_size x n_features)  \r\nn_classes must be \u003e= 2\r\n\r\n### Installing\r\n\r\nSee the Getting Started section.\r\nYou can also have a look at, and run, the example given with the MNIST dataset in the EXAMPLE folder (requirement: tensorflow).\r\n\r\n## Deployment and technical features\r\n\r\nThe agent structure is as follows:\r\n  - a light agent-client side that collects the algorithm's data and sends it to the agent-server running in the background (don't forget to launch agent.run_local_server() or use server.py)\r\n  - an agent-server side that **asynchronously** handles the received data to compute a wide variety of time series metrics.\r\n \r\n\r\nIf a problem is encountered, the agent skips the data and records logs accordingly. \r\nThe agent is a lightweight collector that stores up to `max_buffer_size` datapoints every period. \r\nAbove this limit, sampling is done using a 'Reservoir sampling' algorithm so that the sampled data remains statistically significant. \r\n \r\nTo tackle bottlenecks, you can adjust the number of threads the server runs in parallel to match the volume of batches you want to monitor synchronously. \r\nYou can also adjust the `max_buffer_size` and `frequency` parameters according to your data volume. \r\nFor Anodot usage, the Anodot API enforces a limit of 2000 metric-datapoints per second. Please make sure your volume stays below this limit, otherwise some monitored data would be lost (in the no-storage case). 
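The 'Reservoir sampling' step mentioned above can be sketched as follows. This is an illustrative, self-contained sketch of the classic Algorithm R, not the agent's actual implementation; the function name `reservoir_sample` is hypothetical:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of at most k items from a stream
    of unknown length (classic 'Algorithm R' reservoir sampling)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the buffer until the size limit is reached
            reservoir.append(item)
        else:
            # Keep each later item with probability k / (i + 1),
            # evicting a uniformly chosen earlier item
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Every item in the stream ends up in the reservoir with equal probability, so per-period statistics computed on the sample stay unbiased even when the batch volume exceeds `max_buffer_size`.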
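For reference, the pred1 and spread12 series described earlier can be derived from a `predict_proba_matrix` in a few lines of Python. This is an illustrative sketch only; the helper name `certainty_metrics` is not part of the MLWatcher API:

```python
def certainty_metrics(predict_proba_matrix):
    """For each row of class probabilities, return pred1 (the top
    probability) and spread12 (the gap between the top two)."""
    pred1, spread12 = [], []
    for row in predict_proba_matrix:
        # n_classes >= 2, so every row has at least two probabilities
        top, second = sorted(row, reverse=True)[:2]
        pred1.append(top)
        spread12.append(top - second)
    return pred1, spread12
```

Monitoring the distribution of these two series per period is what surfaces drops in the model's average certainty.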
\r\nBefore going to production, a phase of **tests** for integrating the agent and server with your production algorithm is **highly recommended**. \r\n\r\n\r\n## Contributing\r\n\r\nThis agent was developed by Anodot to help the data science community monitor, in real time, the performance, anomalies, and lifecycle of running ML algorithms.  \r\nPlease also refer to Google's paper 'The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction' for a global view of good practices in production ML algorithm design and monitoring.\r\n\r\n## Versioning\r\n\r\nv1.0\r\n\r\n## Authors\r\n\r\n* **Anodot**\r\n* **Garry B** - *ITC Project for Anodot*\r\n\r\n\r\n## License\r\n\r\nMIT License\r\n\r\nCopyright (c) 2019 Anodot\r\n\r\nPermission is hereby granted, free of charge, to any person obtaining a copy\r\nof this software and associated documentation files (the \"Software\"), to deal\r\nin the Software without restriction, including without limitation the rights\r\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\ncopies of the Software, and to permit persons to whom the Software is\r\nfurnished to do so, subject to the following conditions:\r\n\r\nThe above copyright notice and this permission notice shall be included in all\r\ncopies or substantial portions of the Software.\r\n\r\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\r\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\nSOFTWARE.\r\n\r\n## Acknowledgments\r\n\r\nAnodot Team  \r\nGlenda  \r\n","funding_links":[],"categories":["Model and Data Versioning","Model Serving and Monitoring"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanodot%2FMLWatcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanodot%2FMLWatcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanodot%2FMLWatcher/lists"}