{"id":13476973,"url":"https://github.com/A3Data/hermione","last_synced_at":"2025-03-27T04:31:52.353Z","repository":{"id":37202648,"uuid":"261773048","full_name":"A3Data/hermione","owner":"A3Data","description":"ML made simple","archived":false,"fork":false,"pushed_at":"2023-05-01T13:46:33.000Z","size":1784,"stargazers_count":210,"open_issues_count":11,"forks_count":40,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-03-03T00:04:20.847Z","etag":null,"topics":["data-science","hermione","machine-learning","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/A3Data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-05-06T13:51:07.000Z","updated_at":"2025-02-07T22:01:25.000Z","dependencies_parsed_at":"2024-01-15T20:54:32.753Z","dependency_job_id":"a1bdcdff-8087-4117-bce8-d9ccb8cefee2","html_url":"https://github.com/A3Data/hermione","commit_stats":{"total_commits":186,"total_committers":15,"mean_commits":12.4,"dds":0.8333333333333334,"last_synced_commit":"c913f9235579a165505562d04afd828395081ec9"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/A3Data%2Fhermione","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/A3Data%2Fhermione/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/A3Data%2Fhermione/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/A3Data%2Fhermione/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/A3Data","download_url":"https://codeload.github.com/A3Data/hermione/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245785192,"owners_count":20671621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","hermione","machine-learning","python"],"created_at":"2024-07-31T16:01:36.793Z","updated_at":"2025-03-27T04:31:49.144Z","avatar_url":"https://github.com/A3Data.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"![hermione](images/vertical_logo.png)\n\n\n[![PyPI version fury.io](https://badge.fury.io/py/hermione-ml.svg)](https://pypi.python.org/pypi/hermione-ml/)\n![Hermione](https://github.com/A3Data/hermione/workflows/hermione/badge.svg)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![GitHub issues](https://img.shields.io/github/issues/a3data/hermione.svg)](https://GitHub.com/a3data/hermione/issues/)\n[![GitHub issues-closed](https://img.shields.io/github/issues-closed/a3data/hermione.svg)](https://GitHub.com/a3data/hermione/issues?q=is%3Aissue+is%3Aclosed)\n[![PyPI status](https://img.shields.io/pypi/status/hermione-ml.svg)](https://pypi.python.org/pypi/hermione-ml/)\n[![PyPI pyversions](https://img.shields.io/pypi/pyversions/hermione-ml.svg)](https://pypi.python.org/pypi/hermione-ml/)\n[![PyPi 
downloads](https://pypip.in/d/hermione-ml/badge.png)](https://crate.io/packages/hermione-ml/)\n\n\n\n\n[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)\n\nA Data Science Project structure in cookiecutter style.\n\nDeveloped with ❤️ by \u003ca href=\"http://www.a3data.com.br/\" target=\"_blank\"\u003eA3Data\u003c/a\u003e\n\n  \n\n## What is Hermione?\n\n  \n\nHermione is an **open source** library that helps Data Scientists set up better-organized code more quickly and simply. It also includes classes that assist with daily tasks such as column normalization and denormalization, data visualization, text vectorization, etc. With Hermione, all you need to do is execute a method and the rest is up to her, just like magic.\n\n### Why Hermione?\nTo share a little of **A3Data**'s experience: we work in Data Science teams inside several client companies, and the excellence of notebooks as a data exploration tool is undeniable. Nevertheless, when it comes to data science products, where models need to be consumed, monitored and periodically maintained, putting them into production inside a Jupyter Notebook is not the best choice (not to mention memory and CPU performance). And that’s where **Hermione comes in**!\nWe named this framework after the brilliant, empowered and awesome witch of the Harry Potter saga!\n\nThis is also our way of reinforcing our position that women should be taking more leading roles in the technology field. **#CodeLikeAGirl**\n\n## Installing\n\n\n### Dependencies\n\n- Python (\u003e= 3.8)\n- **docker**\n\nHermione no longer depends on conda to build and manage virtual environments. 
It uses `venv` instead.\n\n\n### Install\n\n```bash\npip install -U hermione-ml\n```\n\n### Enabling autocompletion (unix users):\n\nFor bash:\n\n```bash\necho 'eval \"$(_HERMIONE_COMPLETE=source_bash hermione)\"' \u003e\u003e ~/.bashrc\n```\n\nFor Zsh:\n\n```bash\necho 'eval \"$(_HERMIONE_COMPLETE=source_zsh hermione)\"' \u003e\u003e ~/.zshrc\n```\n\n## How do I use Hermione?\nAfter installing Hermione:\n1.  Create your new project:\n\n```\nhermione project new project_hermione\n```\n\n2. Hit Enter if you want to start with the example code:\n\n```\nPlease select one of the following templates \n\t(0) starter \n\t(1) barebones \n\t(2) sagemaker \nOption [0]: \n\n```\n\n3. Hermione creates a virtual environment for the project automatically. On Windows, activate it with:\n\n```cmd\n\u003cproject_name\u003e_env\\Scripts\\activate\n```\n\nOn Linux and macOS, run:\n\n```bash\nsource \u003cproject_name\u003e_env/bin/activate\n```\n\n\n4. After activating the environment, install the libraries you need. The `requirements.txt` file lists a few suggestions:\n\n```\npip install -r requirements.txt\n```\n\n5. If you selected the starter version, you can now train some models from the example using MLflow ❤. To do so, inside the project directory, just type `hermione run train`. This command searches for a `train.py` file and executes it. In the example, models and metrics are already tracked via MLflow.\n\n![](https://cdn-images-1.medium.com/max/800/1*MmVcmAYspxWdzbd5r00W5g.png)\n\n6. After that, an MLflow experiment is created. To inspect the experiment in MLflow, type `mlflow ui`. The application will start.\n\n```\nmlflow ui\n```\n\n    [2020-10-19 23:23:12 -0300] [15676] [INFO] Starting gunicorn 19.10.0\n    [2020-10-19 23:23:12 -0300] [15676] [INFO] Listening at: http://127.0.0.1:5000 (15676)\n    [2020-10-19 23:23:12 -0300] [15676] [INFO] Using worker: sync\n    [2020-10-19 23:23:12 -0300] [15678] [INFO] Booting worker with pid: 15678\n\n
To access the experiment, open the address shown in the `mlflow ui` output (http://127.0.0.1:5000 in this example) in your preferred browser. You can then check the trained models and their metrics.\n\n![](https://cdn-images-1.medium.com/max/800/1*c_rDEqERZR6r8JVI3TMTcQ.png)\n\n7. To make batch predictions using your `predict.py` file, type `hermione run predict`. The default implementation prints some predictions in the terminal.\n\n```\nhermione run predict\n```\n\n8.  In the Titanic example, we also provide a step-by-step notebook. To view it, run `jupyter notebook` inside the `notebooks` directory.\n\n![](https://i.imgur.com/tKDrjc6.png)\n\n\n9. If you selected the SageMaker version, click [here](hermione/module_templates/__IMPLEMENTED_SAGEMAKER__/README.tpl.md) to check a tutorial.\n\nDo you want to create your **project from scratch**? Then click [here](tutorial_base.md) to check a tutorial.\n\n\n## Docker\n\nHermione comes with a default `Dockerfile` which implements a FastAPI application that serves your ML model. You should take a look at the `api/app.py` module and rewrite the `predict_new()` function as you see fit.  \n\nAlso, in the newest version, Hermione brings two CLI commands that abstract away some of the complexity of Docker commands. To build an image (remember that Docker must be installed), you should be in the project's root directory. 
Then run:\n\n```bash\nhermione run build \u003cIMAGE_NAME\u003e\n```\n\nAfter you have built your Docker image, run it with:\n\n```bash\nhermione run container \u003cIMAGE_NAME\u003e\n```\n\n    [2020-10-20 02:13:20 +0000] [1] [INFO] Starting gunicorn 20.0.4\n    [2020-10-20 02:13:20 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)\n    [2020-10-20 02:13:20 +0000] [1] [INFO] Using worker: sync\n    [2020-10-20 02:13:20 +0000] [7] [INFO] Booting worker with pid: 7\n    [2020-10-20 02:13:20 +0000] [8] [INFO] Booting worker with pid: 8\n    [2020-10-20 02:13:20 +0000] [16] [INFO] Booting worker with pid: 16\n\n**THAT IS IT!** You have a live model up and running. To test your API, Hermione provides an `api/myrequests.py` module. *This is not part of the project*; it is \"ready to go\" code for making requests to the API. Help yourself!\n\n```bash\ncd src/api\npython myrequests.py\n```\n\n    Sending request for model...\n    Data: {\"Pclass\": [3, 2, 1], \"Sex\": [\"male\", \"female\", \"male\"], \"Age\": [4, 22, 28]}\n    Response: \"[0.24630952 0.996      0.50678968]\"\n\nPlay a little with the 'fake' data and see how far the predictions can go.\n\n\n## Documentation\nThis is the class structure diagram that Hermione relies on:\n\n![](images/class_diagram.png)\n\nHere we briefly describe what each class does:\n\n### Data Source\n-   **DataBase** - should be used when retrieving data requires a connection to a database. Contains methods for opening and closing a connection.\n-   **Spreadsheet**  - should be used when the data is stored in spreadsheets/text files. 
All aggregation of the datasets needed to generate a \"flat table\" should be performed in this class.\n-   **DataSource**  - abstract class which DataBase and Spreadsheet inherit from.\n\n\n### Preprocessing\n\n-   **Preprocessing**  - concentrates all preprocessing steps that must be performed on the data before the model is trained.\n-   **Normalization** - applies normalization and denormalization to the specified columns. This class contains the following normalization algorithms already implemented: StandardScaler and MinMaxScaler.\n-   **TextVectorizer**  - transforms text into vectors. Implemented methods: Bag of words, TF_IDF, Embedding: mean, median and indexing.\n-   **DataQuality**  - concentrates all data validation steps that must be performed on the data to ensure its quality.\n\n### Visualization\n\n-   **Visualization** - methods for data visualization. There are methods to make static and interactive plots.\n-   **App Streamlit** - Streamlit example consuming the Titanic dataset, including pandas profiling.\n\n### Model\n\n-   **Trainer**  - module that centralizes training algorithm classes. Algorithms from the `scikit-learn` library, for instance, can be easily used with the provided TrainerSklearn class.\n-   **Wrapper** - centralizes the trained model with its metrics. This class has built-in integration with MLflow.\n-   **Metrics** - contains key metrics that are calculated when models are trained. 
Classification, regression and clustering metrics are already implemented.\n\n### Tests\n-   **test_project** - module for unit testing.\n  \n\n## Contributing\n\nHave a look at our [contributing guide](CONTRIBUTING.md).\n\nMake a pull request with your implementation.\n\nFor suggestions, contact us: hermione@a3data.com.br\n\n## License\nHermione is open source and released under the Apache 2.0 License: [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FA3Data%2Fhermione","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FA3Data%2Fhermione","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FA3Data%2Fhermione/lists"}