{"id":37668541,"url":"https://github.com/agentmorris/speciesnet-taxonomy-mapper","last_synced_at":"2026-01-16T12:01:04.810Z","repository":{"id":325511182,"uuid":"1101459659","full_name":"agentmorris/speciesnet-taxonomy-mapper","owner":"agentmorris","description":"Web-based app to map taxon lists to the SpeciesNet taxonmy","archived":false,"fork":false,"pushed_at":"2025-11-21T18:24:03.000Z","size":162,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-21T20:22:15.706Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/agentmorris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-21T17:49:03.000Z","updated_at":"2025-11-21T18:24:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/agentmorris/speciesnet-taxonomy-mapper","commit_stats":null,"previous_names":["agentmorris/speciesnet-taxonomy-mapper"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/agentmorris/speciesnet-taxonomy-mapper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fspeciesnet-taxonomy-mapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fspeciesnet-taxonomy-mapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmo
rris%2Fspeciesnet-taxonomy-mapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fspeciesnet-taxonomy-mapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/agentmorris","download_url":"https://codeload.github.com/agentmorris/speciesnet-taxonomy-mapper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/agentmorris%2Fspeciesnet-taxonomy-mapper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T11:59:17.896Z","status":"ssl_error","status_checked_at":"2026-01-16T11:55:55.838Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-16T12:01:03.670Z","updated_at":"2026-01-16T12:01:04.737Z","avatar_url":"https://github.com/agentmorris.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SpeciesNet Taxonomy Mapper\n\nA web-based tool to assist ecologists in mapping user-defined species lists to the [SpeciesNet](https://github.com/google/cameratrapai) taxonomy. 
It uses exact matching, heuristic parsing, and Google Gemini (LLM) for soft matching of unknown terms.\n\n## Features\n\n*   **Input Handling**: Supports single names or \"Common, Latin\" / \"Latin, Common\" pairs.\n*   **Interactive \u0026 Iterative Workflow**:\n    *   View mapped results in an interactive preview panel.\n    *   Each output row is editable, allowing for manual corrections.\n    *   **Row Locking**: Lock correct mappings to prevent them from being changed.\n    *   **Partial Reprocessing**: When you re-run the tool, only unlocked rows are sent for processing, saving time and API calls.\n*   **Gemini Integration**:\n    *   Uses Google's Gemini models to suggest mappings for ambiguous or unknown terms.\n    *   Accepts a server-default API key or a **custom, user-provided key** for a specific session.\n*   **Location Context**: Accepts a study area (e.g., \"Alberta, Canada\") to improve LLM disambiguation.\n\n## Workflow\n\nThe UI is designed for an iterative workflow where you can refine your results efficiently.\n\n1.  **Initial Processing**: Paste your species list into the left-hand input panel and click \"Process Input\".\n2.  **Review \u0026 Lock**: The right-hand panel will populate with the mapped results. Review each line. If a line is correct, click the **Unlock icon** (\u003ci class=\"bi bi-unlock\"\u003e\u003c/i\u003e) next to it. The icon will change to a **Lock icon** (\u003ci class=\"bi bi-lock-fill\"\u003e\u003c/i\u003e), and the row will be protected from future changes.\n3.  **Correct \u0026 Edit**: For any incorrect mappings, go back to the left-hand input panel and edit the corresponding line to be more specific. You can also directly edit the text in the unlocked output rows on the right.\n4.  **Reprocess**: Click \"Process Input\" again. Only the information for the unlocked rows will be sent to the backend. Your locked rows will remain untouched.\n5.  
**Download**: Once you are satisfied with all the mappings, click \"Download CSV\".\n\n## Prerequisites\n\n*   Python 3.9+\n*   A SpeciesNet taxonomy file (e.g., `taxonomy_release.txt`) - **Required**\n*   (Optional) Google Gemini API Key\n\n## Setup \u0026 Local Execution\n\n1.  **Install Dependencies**:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n2.  **Place Taxonomy File**:\n    Copy or symlink your `taxonomy_release.txt` file to the project root directory. This file is **required** - the app will not start without it.\n    *(Alternatively, set the `TAXONOMY_PATH` environment variable to point to the file location)*.\n\n3.  **Configure API Key (Server Default)**:\n    Create a file named `gemini-key.txt` in the project root and paste your Google Gemini API key into it. This will be the default key.\n    *(Alternatively, set the `GOOGLE_API_KEY` environment variable)*.\n\n4.  **Run the Application**:\n    ```bash\n    python app.py\n    ```\n    Access the app at [http://127.0.0.1:5000](http://127.0.0.1:5000).\n\n## Docker Deployment\n\nThe application is containerized for easy deployment on Linux servers.\n\n1.  **Prepare Configuration**:\n    *   Place `taxonomy_release.txt` in the project root directory (**required**).\n    *   Ensure `gemini-key.txt` is present in the project root.\n\n2.  **Build and Run (test)**:\n    ```bash\n    docker-compose down\n    docker-compose up --build\n    ```\n\n3.  **Build and Run (as a service)**:\n    ```bash\n    docker-compose down\n    docker-compose up --build -d\n    ```\n\nThe `docker-compose.yml` is pre-configured to mount both files from the current directory. 
If you need to use a different location for the taxonomy file, you can either modify the volume mount in `docker-compose.yml` or set the `TAXONOMY_PATH` environment variable in the Docker configuration.\n\n## Command-Line Testing Interface\n\nThe `matcher.py` script can be run directly from the command line to test individual species mappings:\n\n```bash\n# Single query\npython matcher.py --query \"brown creeper\"\n\n# Multiple queries (semicolon-delimited)\npython matcher.py --query \"brown creeper; american three-toed woodpecker; weasel\"\n\n# With location context\npython matcher.py --query \"deer; elk; moose\" --location \"British Columbia\"\n\n# Verbose mode for detailed debugging\npython matcher.py --query \"brown creeper\" --location \"British Columbia\" --verbose\n```\n\n**Verbose mode** shows:\n- How the input was parsed\n- What Gemini suggested (including full taxonomic hierarchy)\n- Which taxonomic level matched (species/genus/family/order/class)\n- Why matches might fail\n\nThis is useful for debugging mapping issues and understanding how species are being matched to the taxonomy.\n\n## Hierarchical Matching\n\nThe app supports hierarchical taxonomic matching when a species is not in the SpeciesNet taxonomy but a higher-level taxon is available:\n\n1. **Gemini provides full taxonomy**: For each candidate match, Gemini returns the complete taxonomic hierarchy (class, order, family, genus, species)\n\n2. **Hierarchical search**: If the species isn't found, the app tries matching at genus level, then family, order, and class\n\n3. 
**Uniqueness checking**: After processing all inputs:\n   - If only one input matches a higher-level taxon (e.g., \"Picoides\") → the match is kept\n   - If multiple inputs match the same taxon → all are marked as failed (ambiguous)\n\nFor example, \"american three-toed woodpecker\" might not be in SpeciesNet as a species, but if the genus \"Picoides\" is present and no other input also maps to \"Picoides\", the mapping will succeed with \"picoides\" in the Latin column.\n\n## Debugging Gemini Models\n\nIf you encounter errors regarding the Gemini model version (e.g., \"404 model not found\"):\n\n1.  Ensure your API key is correct in `gemini-key.txt`.\n2.  Run the debug script to list valid models for your account:\n    ```bash\n    python list_models.py\n    ```\n3.  Update the model name in `matcher.py` if necessary.\n\n## TODO\n\n1. Consider replacing calls to Gemini with an in-browser, open-weights LLM, e.g. via transformers.js.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentmorris%2Fspeciesnet-taxonomy-mapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagentmorris%2Fspeciesnet-taxonomy-mapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagentmorris%2Fspeciesnet-taxonomy-mapper/lists"}