{"id":19630075,"url":"https://github.com/omsai/inbase","last_synced_at":"2025-10-04T06:42:20.569Z","repository":{"id":145739631,"uuid":"119761515","full_name":"omsai/inbase","owner":"omsai","description":"Cache inteins.com InBase database locally as Pandas DataFrame.","archived":false,"fork":false,"pushed_at":"2018-02-15T02:47:13.000Z","size":349,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-26T20:42:54.812Z","etag":null,"topics":["biopython","genomics","inbase","inteins"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/omsai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-01T00:33:59.000Z","updated_at":"2018-02-13T04:47:09.000Z","dependencies_parsed_at":"2023-03-28T00:19:34.991Z","dependency_job_id":null,"html_url":"https://github.com/omsai/inbase","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/omsai/inbase","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omsai%2Finbase","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omsai%2Finbase/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omsai%2Finbase/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omsai%2Finbase/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/omsai","download_url":"https://codeload.github.com/omsa
i/inbase/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/omsai%2Finbase/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259868580,"owners_count":22924236,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["biopython","genomics","inbase","inteins"],"created_at":"2024-11-11T12:00:34.776Z","updated_at":"2025-10-04T06:42:15.514Z","avatar_url":"https://github.com/omsai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InBase\n\n[![Build Status](https://travis-ci.org/omsai/inbase.svg?branch=master)](https://travis-ci.org/omsai/inbase)\n[![Coverage](https://codecov.io/gh/omsai/inbase/graphs/badge.svg)](https://codecov.io/gh/omsai/inbase)\n[![License: CC0-1.0](https://img.shields.io/badge/License-CC0%201.0-lightgrey.svg)](http://creativecommons.org/publicdomain/zero/1.0/)\n\nInBase provides a convenient pandas DataFrame of the 585 inteins in\nthe unmaintained [inteins.com](http://inteins.com) InBase database.\nThe protein sequences are available as biopython SeqRecord objects,\nbut otherwise nothing is changed from the inteins.com metadata.\n\nInBase was collected using [scrapy](https://scrapy.org) and can be\nupdated as detailed in the \"update database\" section below.\n\n# Installation\n\n    pip install --user git+https://github.com/omsai/inbase\n\n# Usage\n\n``` python\nfrom inbase import INBASE\n\n# See the first few rows of all inteins.\nINBASE.head()\n# See the first intein (`.ix` was removed from pandas; `.iloc` is positional).\nINBASE.iloc[0]\n# Access biopython seq record information of first 
intein.\nINBASE.iloc[0]['Intein aa Sequence']\n# Count archaeal inteins.\nINBASE['Domain of Life'].unique()\n(INBASE['Domain of Life'] == 'Archaea').sum()\n# Count all inteins.\nlen(INBASE)\n```\n\n# Development Environment\n\nVirtual environments and tests are orchestrated using `tox`.  Install\n`tox` using `pip`:\n\n    pip install --user tox\n\nMake sure that `~/.local/bin` or similar is in your `PATH` per\n[PEP 370](https://www.python.org/dev/peps/pep-0370/).\n\nCreate the environment without running tests:\n\n    tox --notest -e py27\n\n# Update Database\n\nUnfortunately `scrapy` does not provide an update function to check\nagainst the existing JSON data.  One has to redownload the database,\nwhich only takes a few seconds.  First, you will need to clone\nthis repository and create a \"development environment\" as described in\nthe section above.  Then initialize the data environment with the\n`scrapy` extras package:\n\n    tox --notest -e data\n\nCheck the current number of inbase records:\n\n    cat data/inbase.json | wc -l | xargs expr -2 +\n\nRedownload the data:\n\n    rm data/inbase.json\n    .tox/data/bin/scrapy runspider -o data/inbase.json inbase/update.py\n\nCheck the new number of records:\n\n    cat data/inbase.json | wc -l | xargs expr -2 +\n\nIf there are indeed more records, update your Manifest checksums,\nre-run the data tests, commit to your git repository, and submit a pull\nrequest:\n\n    version=$(date +%Y%m%d.1)\n    sed -i -E \"s#(version=').*('.+)#\\1${version}\\2#\" setup.py\n    .tox/data/bin/gemato create --hashes \"MD5 SHA1 SHA256\" data/\n    tox -e data\n    git commit setup.py data/* -m \"MAINT: Update inbase database on $(date -I)\"\n    git push\n\n# Tests\n\nRun all non-data tests using:\n\n    tox\n\nDebug failing tests:\n\n    tox --pdb\n\nIf you add dependencies and get import errors, you need to recreate\nthe tox environment:\n\n    tox --recreate\n\nWhen you edit the files, you're likely going to create lots of linter\nerrors caught by the tox 
unit tests if your text editor doesn't have\ninteractive error reporting.  If you use Emacs, you can configure it\nfor Python development by installing\n[elpy](https://github.com/jorgenschaefer/elpy).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomsai%2Finbase","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fomsai%2Finbase","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomsai%2Finbase/lists"}