https://github.com/zooniverse/hamlet
Produces subject set and classification exports suitable for AutoML
automl django machine-learning python
- Host: GitHub
- URL: https://github.com/zooniverse/hamlet
- Owner: zooniverse
- License: apache-2.0
- Created: 2019-04-30T15:46:40.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-07-24T10:10:03.000Z (almost 2 years ago)
- Last Synced: 2024-07-24T11:28:48.746Z (almost 2 years ago)
- Topics: automl, django, machine-learning, python
- Language: Python
- Homepage:
- Size: 1.02 MB
- Stars: 5
- Watchers: 11
- Forks: 0
- Open Issues: 17
Metadata Files:
- Readme: README.md
- License: LICENSE
# Hosted AutoML Export Transformer (Hamlet)
Hamlet is a website that connects Zooniverse data from Panoptes with external data services, e.g. by sending a camera trap project's animal-filled photos to an animal-identifying machine-learning system.
## Auto ML Export
## Subject Assistant
Hamlet has an export feature that ties into the Zooniverse Machine Learning Subject Assistant ([app](https://subject-assistant.zooniverse.org/), [source](https://github.com/zooniverse/zoo-ml-subject-assistant)), which lets project owners/researchers submit their camera trap photos to an external Machine Learning (ML) service that finds animals in those images.
### User Story
The user story is as follows:
- Users start at the Subject Assistant app.
- Users are directed to Hamlet, where they choose a Subject Set to export to the external ML Service.
- Hamlet performs the export, and provides users with a link back to the Subject Assistant that includes an "ML Task ID" - e.g. `https://subject-assistant.zooniverse.org/#/tasks/6378`
- Users click that link, and process the ML-tagged photos on the Subject Assistant app.
### External Dependencies
The Subject Assistant requires the following external systems:
- Machine Learning Service - in this case, powered by Microsoft.
- an Azure Storage Container - works in conjunction with the ML Service, which requires "subject manifest" files to be stored on Azure.
As of late 2022, these services are maintained by the Zooniverse team.
### Environment Variables
The Subject Assistant feature requires the following environment variables to be defined:
- `SUBJECT_ASSISTANT_AZURE_ACCOUNT_NAME`
- `SUBJECT_ASSISTANT_AZURE_ACCOUNT_KEY`
- `SUBJECT_ASSISTANT_AZURE_CONTAINER_NAME`
- `SUBJECT_ASSISTANT_ML_SERVICE_CALLER_ID` - provided by our friends in Microsoft who run the ML Service.
- `SUBJECT_ASSISTANT_ML_SERVICE_URL` - ditto
Optionally, the following ENV variables can be defined:
- `SUBJECT_ASSISTANT_EXTERNAL_URL` - defaults to `http://subject-assistant.zooniverse.org/#/tasks/`
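
As a rough illustration of how these variables might be read, here is a minimal Python sketch. The variable names and the default URL come from the lists above; the helper `load_subject_assistant_config` and its error handling are hypothetical and not necessarily how Hamlet's settings module does it.

```python
import os

# Names taken from the lists above.
REQUIRED_VARS = (
    "SUBJECT_ASSISTANT_AZURE_ACCOUNT_NAME",
    "SUBJECT_ASSISTANT_AZURE_ACCOUNT_KEY",
    "SUBJECT_ASSISTANT_AZURE_CONTAINER_NAME",
    "SUBJECT_ASSISTANT_ML_SERVICE_CALLER_ID",
    "SUBJECT_ASSISTANT_ML_SERVICE_URL",
)
DEFAULT_EXTERNAL_URL = "http://subject-assistant.zooniverse.org/#/tasks/"


def load_subject_assistant_config(environ=None):
    """Hypothetical helper: collect the settings, failing early if any are missing."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: environ[name] for name in REQUIRED_VARS}
    # The only optional variable falls back to the documented default.
    config["SUBJECT_ASSISTANT_EXTERNAL_URL"] = environ.get(
        "SUBJECT_ASSISTANT_EXTERNAL_URL", DEFAULT_EXTERNAL_URL
    )
    return config
```

Failing early with the full list of missing names tends to be easier to debug than a later error from the Azure or ML Service calls.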
### Mechanics: Django Pages/Views
The ML Subject Assistant feature in Hamlet has two views:
- `GET /subject-assistant//` - lists all the Subject Sets for a Project, along with their "ML export" status and (if the export is successful) a link back to the Subject Assistant app.
- `POST /subject-assistant//subject-sets//` - performs the ML Export action for a given Subject Set, then redirects users back to the listing page.
### Mechanics: Database Model
The `MLSubjectAssistantExport` table has the following fields:
- subject_set_id - the ID of the Zooniverse Subject Set that was exported to the external ML Service
- json - the "subject manifest" file, in JSON format, created from all the Subjects of the Subject Set. The format is specific to the ML Service.
- azure_url - the URL of the "subject manifest" file that was uploaded to an external Azure storage container. (See Mechanics: ML Export Action for why)
- ml_task_uuid - the task request ID or "job ID" for the ML Export action. This is generated by the external ML Service.
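
For orientation, the four fields can be pictured as a plain Python record. This is not the Django model itself: the class name `MLSubjectAssistantExportRecord`, the field types, and the `is_submitted` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MLSubjectAssistantExportRecord:
    """Illustrative stand-in for one row of the table (types are assumed)."""

    subject_set_id: int                 # ID of the exported Zooniverse Subject Set
    json: str                           # the "subject manifest", serialised as JSON
    azure_url: Optional[str] = None     # filled in once the manifest is uploaded
    ml_task_uuid: Optional[str] = None  # filled in once the ML Service returns a job ID

    def is_submitted(self) -> bool:
        """True once the manifest has been uploaded and the ML Service has replied."""
        return self.azure_url is not None and self.ml_task_uuid is not None
```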
### Mechanics: ML Export Action
Mechanically, the ML Subject Assistant's "export to Microsoft" action performs the following:
1. Get all the Subjects for a given Subject Set (pulling from Panoptes).
2. Create a JSON file - the "subject manifest" - that describes the Subjects to be exported, in a format specified by the external ML Service.
3. Upload the JSON file to an external Azure storage container (reason: the current external ML Service only reads subject manifest files from Azure), then create a "shareable URL" to that JSON file. (Clarification: Azure uses Shared Access Signature (SAS) tokens to create shareable URLs with limited lifespans.)
4. Submit the shareable URL to the ML Service, and get the "job ID" it returns.
The Job ID plus the known Subject Assistant app URL is all that's required to construct a "return URL" for the user.
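
The four steps and the return URL can be sketched in Python as below. This is only an outline under stated assumptions: `run_ml_export` and the three callables passed into it are hypothetical stand-ins for the Panoptes, Azure and ML Service calls, and the manifest layout is a placeholder, since the real format is dictated by the ML Service.

```python
import json
from typing import Callable, Iterable, Mapping


def run_ml_export(
    subject_set_id: int,
    fetch_subjects: Callable[[int], Iterable[Mapping]],
    upload_manifest: Callable[[str], str],
    submit_to_ml_service: Callable[[str], str],
    external_url: str = "http://subject-assistant.zooniverse.org/#/tasks/",
) -> dict:
    """Hypothetical outline of the export; all I/O is injected by the caller."""
    # Step 1: get all the Subjects for the Subject Set (from Panoptes).
    subjects = list(fetch_subjects(subject_set_id))

    # Step 2: build the subject manifest. Placeholder layout, NOT the real format.
    manifest = json.dumps([{"id": s["id"], "url": s["url"]} for s in subjects])

    # Step 3: upload the manifest and receive a shareable (SAS-signed) URL.
    shareable_url = upload_manifest(manifest)

    # Step 4: submit the shareable URL and keep the job ID that comes back.
    job_id = submit_to_ml_service(shareable_url)

    # The return URL is the Subject Assistant base URL plus the job ID.
    return {
        "subject_set_id": subject_set_id,
        "json": manifest,
        "azure_url": shareable_url,
        "ml_task_uuid": job_id,
        "return_url": external_url + str(job_id),
    }
```

Passing the three operations in as callables keeps the sketch runnable without network access and makes each step easy to replace with a fake in tests.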
## Development
Use docker & docker-compose to set up a development environment.
1. Run `docker-compose build` to build the app container.
2. Run the tests: `docker-compose run -T --rm app pytest --cov=hamlet`
Alternatively, you can use docker & docker-compose to run an interactive bash shell for development and testing:
1. Run `docker-compose run --service-ports --rm app bash` to start the containers
2. Run `pytest --cov=hamlet` to run the test suite in that shell (sadly this system has no tests :sadpanda:)
3. Or `./start_server.sh` to run the server (see Pipfile)
### Troubleshooting
**I can't login on local development**
Problem:
- You're able to run `docker-compose build ; docker-compose up`, and you can view Hamlet on local development on `http://localhost:8080`
- However, when you click on the "Login with Zooniverse" button and provide your details on the Panoptes login page, you
Analysis:
- It's likely that your instance of Hamlet is missing the `PANOPTES_APPLICATION_ID` and `PANOPTES_SECRET` environment variables.
- These env vars are required to tell Panoptes _which OAuth application_ you're logging into.
Solution:
- Go to [Panoptes's OAuth applications list](https://panoptes.zooniverse.org/oauth/applications), find the Hamlet app, and copy the Application ID and Secret.
- Add these to your local development Docker's environment variables, as `PANOPTES_APPLICATION_ID` and `PANOPTES_SECRET`
- This can be done easily by creating a `.env` file in the root folder of your `hamlet` repo.
Related issue: [479](https://github.com/zooniverse/hamlet/issues/479)
**The database won't start on local development**
Problem:
- When you run `docker-compose build ; docker-compose up`, you notice that the PostgreSQL database isn't running.
- There are probably a few error messages in the console: `app_1` will continuously complain that it's trying (and failing) to find the PostgreSQL database, while `postgres_1` might say something about "can't initialise due to incompatible database".
Analysis:
- It's possible that your existing local PostgreSQL database (i.e. the `/postgres_data` folder) was built on an older version of PostgreSQL, and recent updates to Hamlet have upgraded the PostgreSQL that Hamlet uses, causing an incompatibility.
Solution:
- Check if you have an existing `/postgres_data` folder in your local `hamlet` repo.
- If yes, delete it. The next time you start Hamlet, the database will be rebuilt with the latest version.
### Useful application scripts
- console: `python manage.py shell`
- create_local_db: `createdb -U hamlet -O hamlet hamlet`
- drop_local_db: `dropdb -U hamlet hamlet`
- makemigrations: `python manage.py makemigrations`
- migrate: `python manage.py migrate`
- server: `bash -e ./start_server.sh`
- tests: `pytest --cov=hamlet`
- tree: `bash -c 'find . | grep -v git | grep -v cache'`
- worker: `bash -c ./start_worker.sh`
### Updating a package with Poetry
- `poetry update django`
See the [Poetry docs](https://python-poetry.org/docs/basic-usage/#installing-dependencies) for more details.