{"id":13578605,"url":"https://github.com/falcony-io/ml-annotate","last_synced_at":"2025-04-05T19:33:20.315Z","repository":{"id":70400831,"uuid":"97089860","full_name":"falcony-io/ml-annotate","owner":"falcony-io","description":"Use ML-Annotate to label data for machine learning purposes","archived":true,"fork":false,"pushed_at":"2020-07-30T21:41:31.000Z","size":1173,"stargazers_count":102,"open_issues_count":6,"forks_count":25,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-05-18T23:18:54.730Z","etag":null,"topics":["annotation","labeling","machine-learning","python","tagging"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/falcony-io.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-13T06:51:17.000Z","updated_at":"2024-07-24T00:46:33.904Z","dependencies_parsed_at":"2023-04-26T15:32:13.181Z","dependency_job_id":null,"html_url":"https://github.com/falcony-io/ml-annotate","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/falcony-io%2Fml-annotate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/falcony-io%2Fml-annotate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/falcony-io%2Fml-annotate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/falcony-io%2Fml-annotate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/falcony-io","download_url":"https://codeload.github.com/falcony-io/ml-annotate/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247393094,"owners_count":20931804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["annotation","labeling","machine-learning","python","tagging"],"created_at":"2024-08-01T15:01:32.158Z","updated_at":"2025-04-05T19:33:19.458Z","avatar_url":"https://github.com/falcony-io.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"ML-Annotate\n===============\n\nYou can use ML-Annotate to label text data for machine learning purposes. ML-Annotate supports binary, multi-label and multi-class labeling.\n\n.. image:: http://i.imgur.com/JMVU6Ym.png\n\nRunning locally\n---------------\n\nML-Annotate requires Python 3.5 or later.\n\n1. Create a virtualenv for ML-Annotate and install all packages::\n\n    virtualenv --python python3 .virtualenv\n    source .virtualenv/bin/activate\n    pip install -r requirements.txt\n\n2. Set up .env with all necessary environment variables::\n\n    echo \"source .virtualenv/bin/activate\" \u003e\u003e .env\n    echo \"export FLASK_APP=annotator/app.py\" \u003e\u003e .env\n    echo \"export DATABASE_URL=postgres://localhost/annotator\" \u003e\u003e .env\n    echo \"export FLASK_DEBUG=1\" \u003e\u003e .env\n    source .env\n\n3. Create the database. 
This requires PostgreSQL to be installed, so that command-line tools such as createdb are available::\n\n    .virtualenv/bin/flask resetdb\n    .virtualenv/bin/flask add_user admin password\n\n4. Normally you would import your own data at this point. We have included a test script that generates some fake data for testing purposes::\n\n    .virtualenv/bin/flask import_fake_data\n\n5. Run the app::\n\n    .virtualenv/bin/flask run\n\n\nAdding data\n-----------\n\nML-Annotate includes an IPython shell for inserting data. Start by running::\n\n    flask shell\n\nThen you will have access to the application shell. Here's an example of how to add data from Project Gutenberg::\n\n    import requests\n    response = requests.get('https://www.gutenberg.org/files/1342/1342-0.txt')\n    # Keep the longest '***'-delimited chunk, which should be the book body.\n    text_contents = max(response.text.split('***'), key=len)\n    # Split the text into non-empty paragraphs.\n    paragraphs = [\n        x.strip() for x in text_contents.replace('\\r', '').split('\\n\\n')\n        if x.strip()\n    ]\n    new_problem = Problem(\n        name='Example',\n        labels=[ProblemLabel(label='Example', order_index=1)],\n        # supported types: binary, multi-label, multi-class\n        # add more labels when using multi-label or multi-class.\n        classification_type='binary'\n    )\n    for i, paragraph in enumerate(paragraphs):\n        db.session.add(Dataset(\n            table_name='gutenberg.pride_and_prejudice_by_jane_austen',\n            entity_id='paragraph%i' % i,\n            problem=new_problem,\n            free_text=paragraph\n        ))\n    db.session.commit()\n\n\nDeploying to Heroku\n-------------------\n\nThis guide expects that you are deploying ML-Annotate to Heroku.\n\n1. Create a new Heroku application.\n2. Set up the Heroku application Git remotes and push the application to production::\n\n    git remote add production git@heroku.com:APP_NAME_HERE.git\n    git push production\n\n3. 
Set up the configuration::\n\n    heroku addons:create heroku-postgresql:hobby-dev --app APP_NAME_HERE\n    heroku config:set SECRET_KEY=$(python3 -c 'import binascii, os; print(binascii.hexlify(os.urandom(24)).decode())') --app APP_NAME_HERE\n    heroku config:set FLASK_APP=annotator/app.py --app APP_NAME_HERE\n    heroku buildpacks:add --index 1 heroku/nodejs --app APP_NAME_HERE\n    heroku buildpacks:add --index 2 https://github.com/philippkueng/heroku-buildpack-sassc.git --app APP_NAME_HERE\n    heroku buildpacks:add --index 3 heroku/python --app APP_NAME_HERE\n\n4. Then create the tables and the admin user::\n\n    heroku run \"flask createtables\" --app APP_NAME_HERE\n    heroku run \"flask add_user admin password\" --app APP_NAME_HERE\n\n5. You should now be able to access your instance of ML-Annotate at *APP_NAME_HERE.herokuapp.com*. The username is *admin* and the password is the one you set in step 4.\n\n\nUsers\n-----------\n\nYou can add admin users with the command::\n\n    flask add_user username password\n\nIf you need to add more specific permissions, you can use **flask shell**::\n\n    flask shell\n    u = User(username='username', password='password')\n    db.session.add(u)\n    db.session.add(UserProblem(user=u, problem=Problem.query.get('PROBLEM_ID')))\n    db.session.commit()\n\n\nMaking modifications\n--------------------\n\nIt's very likely that this application does not fit your needs perfectly and that you will need to make some modifications. If you need to extend any models, do so and then generate a migration with the following command::\n\n    alembic revision --autogenerate -m 'Add column'\n\nThen you can run the migration locally with ``alembic upgrade head``. 
The migration is run automatically on Heroku when you deploy.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffalcony-io%2Fml-annotate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffalcony-io%2Fml-annotate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffalcony-io%2Fml-annotate/lists"}