https://github.com/neotomadb/taxoninsertion
A package to bulk upload taxonomic data (beetles in this case) from Family/genus/species info.
- Host: GitHub
- URL: https://github.com/neotomadb/taxoninsertion
- Owner: NeotomaDB
- License: mit
- Created: 2025-02-26T23:00:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-21T21:31:23.000Z (about 1 year ago)
- Last Synced: 2025-05-21T21:36:47.000Z (about 1 year ago)
- Language: Python
- Size: 38.1 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: code_of_conduct.md
[NSF Award 2410965](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2410965&HistoricalAwards=false)
[NSF Award 2410961](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2410961&HistoricalAwards=false)
[DOI: 10.5281/zenodo.15485153](https://doi.org/10.5281/zenodo.15485153)
# Taxon Insertion in Neotoma
This package supports both bulk and individual taxon insertion in Neotoma.
What the package does:
1. Connects to the Neotoma Paleoecology Database (both main and tank).
2. Takes a taxon name and checks it against GBIF and NCBI.
3. Finds the appropriate insertion point for the taxon within Neotoma.
4. Returns the set of rows that must be inserted into Neotoma to support the new taxon name.
5. Inserts new rows into Neotoma.
6. Supports rollback and commit features.
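As an illustration of step 3, the following is a minimal sketch of how an insertion point could be located. It is not the package's actual implementation: `find_insertion_point`, the `KNOWN_TAXA` dictionary, and the taxon IDs are all hypothetical, and a plain dictionary stands in for a query against the Neotoma taxa table.

```python
# Hypothetical in-memory stand-in for Neotoma's taxa: name -> taxon ID.
# A real lookup would query the database instead; these IDs are made up.
KNOWN_TAXA = {"Coleoptera": 1, "Carabidae": 10, "Bembidion": 20}


def find_insertion_point(lineage, known):
    """Walk a lineage ordered from highest rank to lowest rank.

    Returns (parent_id, missing): parent_id is the ID of the lowest rank
    that is already present (None if nothing is present), and missing
    lists the names that still need new rows, in insertion order.
    """
    parent_id = None
    missing = []
    for name in lineage:
        # Once one rank is missing, every rank below it needs a new row too.
        if name in known and not missing:
            parent_id = known[name]
        else:
            missing.append(name)
    return parent_id, missing


parent, to_insert = find_insertion_point(
    ["Carabidae", "Bembidion", "Bembidion lampros"], KNOWN_TAXA
)
print(parent, to_insert)
```

In this sketch the family and genus already exist, so only the species would need a new row, attached beneath the genus.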
## Contributors
* [Simon Goring](http://goring.org): University of Wisconsin - Madison [ORCID: 0000-0002-2700-4605](https://orcid.org/0000-0002-2700-4605)
## Using this Repository
Secrets to connect to the database are in an "ignored" file named `.env`. It should have the format:
```bash
DBAUTH={"host":"localhost","port":5432,"user":"postgres","password":"postgres","dbname":"neotoma"}
```
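Because the value of `DBAUTH` is a JSON object, it can be turned into a dictionary of connection settings with the standard library. The sketch below assumes the `.env` file has already been loaded into the process environment (for example by a tool such as `python-dotenv`); how this repository actually reads the file is not shown here, and `load_dbauth` and `REQUIRED_KEYS` are hypothetical names.

```python
import json
import os

# Keys taken from the example .env line above.
REQUIRED_KEYS = {"host", "port", "user", "password", "dbname"}


def load_dbauth(environ=os.environ):
    """Parse the DBAUTH JSON string into a dict of connection settings."""
    raw = environ.get("DBAUTH")
    if raw is None:
        raise KeyError("DBAUTH is not set; check the .env file")
    settings = json.loads(raw)
    # Fail early with a readable message if the JSON is incomplete.
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise ValueError(f"DBAUTH is missing keys: {sorted(missing)}")
    return settings
```

Passing a plain dictionary as `environ` makes the function easy to exercise without touching the real environment.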
The repository uses the `uv` package manager to manage dependencies and provide a stable development environment. To use it, first clone the repository locally, then run `uv sync` to install all necessary packages. Once the packages are installed, run the script `testing_beetles.py` with `uv run testing_beetles.py`.
The script may be slow to run because it must interact with the Neotoma database (and can optionally query GBIF and NCBI as well).
## Current Expected Input and Output
The script in this repository is designed to work with a CSV file provided by the [BUGSCep](https://bugscep.com/) research data group. That CSV has the format:
```csv
CODE,FAMILY,GENUS,SPECIES,AUTHORITY,epithet,neotoma_path,gbifid,ncbi_id,taxonRank,taxonomicStatus,ncbi_lineage_names
```
The `testing_beetles.py` script parses this file row by row and returns a similar CSV with three additional columns, `familyid`, `genusid`, and `speciesid`, which hold the associated Neotoma taxon IDs.
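The row-by-row transformation can be sketched with the standard `csv` module. This is an illustration of the input and output shape only, not the code in `testing_beetles.py`: `add_taxon_ids` and its `lookup` argument are hypothetical, and building the species name from `GENUS` plus `SPECIES` is an assumption.

```python
import csv
import io

ID_COLUMNS = ["familyid", "genusid", "speciesid"]


def add_taxon_ids(source, lookup):
    """Return CSV text with the three ID columns appended to every row.

    `source` is any open text file or file-like object. `lookup` is a
    hypothetical callable mapping a taxon name to a Neotoma taxon ID,
    or to None when the name is not found (written as an empty cell).
    """
    reader = csv.DictReader(source)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(reader.fieldnames) + ID_COLUMNS,
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Assumption: the full species name is "<GENUS> <SPECIES>".
        species = f"{row['GENUS']} {row['SPECIES']}"
        row["familyid"] = lookup(row["FAMILY"])
        row["genusid"] = lookup(row["GENUS"])
        row["speciesid"] = lookup(species)
        writer.writerow(row)
    return out.getvalue()
```

A dictionary's `get` method works as a stand-in `lookup` for trying this out, for example `add_taxon_ids(open_file, {"Carabidae": 10}.get)`, where the ID is made up.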