{"id":20648573,"url":"https://github.com/edikedik/lxtractor","last_synced_at":"2026-02-14T20:02:36.607Z","repository":{"id":55836699,"uuid":"440547430","full_name":"edikedik/lXtractor","owner":"edikedik","description":"Library for analysing protein structures and sequences","archived":false,"fork":false,"pushed_at":"2025-04-06T08:32:52.000Z","size":4080,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-23T21:36:13.383Z","etag":null,"topics":["bioinfomatics","computational-biology","data-analysis","data-mining","feature-extraction","python","structural-biology"],"latest_commit_sha":null,"homepage":"https://lxtractor.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/edikedik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-12-21T14:36:50.000Z","updated_at":"2025-04-06T08:32:55.000Z","dependencies_parsed_at":"2023-02-12T15:45:56.375Z","dependency_job_id":"c91143ec-e4b4-443b-926b-f3c0903df436","html_url":"https://github.com/edikedik/lXtractor","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/edikedik/lXtractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edikedik%2FlXtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edikedik%2FlXtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edikedik%2FlXtractor/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edikedik%2FlXtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/edikedik","download_url":"https://codeload.github.com/edikedik/lXtractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edikedik%2FlXtractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29454688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-14T15:52:44.973Z","status":"ssl_error","status_checked_at":"2026-02-14T15:52:11.208Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinfomatics","computational-biology","data-analysis","data-mining","feature-extraction","python","structural-biology"],"created_at":"2024-11-16T17:09:22.367Z","updated_at":"2026-02-14T20:02:36.588Z","avatar_url":"https://github.com/edikedik.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lXtractor\n\n[![Coverage Status](https://coveralls.io/repos/github/edikedik/lXtractor/badge.svg?branch=master)](https://coveralls.io/github/edikedik/lXtractor?branch=master)\n[![Documentation status](https://readthedocs.org/projects/lxtractor/badge/?version=latest)](https://lxtractor.readthedocs.io/en/latest/?badge=latest)\n[![PyPi 
status](https://img.shields.io/pypi/v/lXtractor.svg)](https://pypi.org/project/lXtractor)\n[![Python version](https://img.shields.io/pypi/pyversions/lXtractor.svg)](https://pypi.org/project/lXtractor)\n[![Hatch project](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://github.com/pypa/hatch)\n\n\u003cimg src=\"./fig/lXt_diagram.png\" alt=\"lXt_diagram\" width=\"300\"/\u003e\n\n## Introduction\n\n`lXtractor` is a toolbox devoted to feature extraction from macromolecular\nsequences and structures.\nIt's tailored towards creating shareable local data collections anchored to\na reference sequence-based object: a single sequence, an MSA, or an HMM.\nCurrently, it doesn't define any unique algorithms, aiming at simplicity and\ntransparency.\nIt simply provides a (hopefully) convenient interface simplifying mundane tasks,\nsuch as fetching the data, extracting domains, mapping sequences, and computing\nsequential and structural variables.\nSequences and structures anchored to a single reference object have the benefit\nof interpretability in downstream applications, such as fitting interpretable\nML models.\n\n## Installation\n\n`lXtractor` requires Python\u003e=3.10 installed on a Unix system and is\ninstallable via pip:\n\n```bash\npip install lXtractor\n```\n\nWe encourage users to first create a virtual environment via `conda` or `mamba`.\n\n## Usage\n\n`lXtractor` is designed to be flexible, and its usage is defined by the initial\nhypothesis or a reference object that one wants to extrapolate towards the\nexisting sequences or structures.\nBelow, we'll provide a very abstract description of what this package is\nintended for.\n\nIn creating data collections, one could define the following steps:\n\n1. Assemble the data.\n2. Map the reference object to the assembled entries' sequences.\n3. Filter hits.\n4. Define and calculate variables -- sequence or structure descriptors.\n5. 
Save the data for later usage or modifications.\n\n`lXtractor` defines objects and routines helpful throughout this process.\nNamely, `PDB`, `SIFTS`, `AlphaFold`, and `fetch_uniprot()`\ncan aid in the first step.\nThen, `Alignment` and `PyHMMer` can facilitate step 2.\nAt the end of step 2, one will get a collection of `Chain*`-type objects.\nIf working with sequence-only collections, these are going to be\n`ChainSequence` objects.\nFor structure-only data, these are going to be ``ChainStructure`` containers,\nembedding `ChainSequence` and `GenericStructure` objects.\nFinally, dealing with mappings between a canonical sequence and\na group of associated structures will result in ``Chain`` objects.\n\n`ChainList` wraps `Chain*`-type objects into a list-like collection with\nuseful operations that allow one to quickly filter and bulk-modify `Chain*`-type\nobjects.\nThus, filtering typically comes down to using the ``ChainList.filter()`` method, which\naccepts a `Callable[Chain*, bool]` and returns a filtered `ChainList`.\nOne can save/load the collected objects using `ChainIO` and proceed\nwith the feature extraction.\n\n`lXtractor` defines various sequence and structure variables.\nVariable-related operations are handled by the `GenericCalculator` and\n`Manager` classes. The former defines the calculation strategy and how\nthe calculations are parallelized, while the latter handles the calculations\nand aggregates the results into a pandas `DataFrame`.\n\nAs a result, one is left with a collection of `Chain*`-type objects and a\ntable with calculated variables. In addition, one can store the calculated\nvariables within the objects themselves, although we currently do not encourage\nthis practice.\n\n`lXtractor` is in the experimental stage and under active development.\nThus, objects' interfaces may change.\n\nFor the time being, one can check the examples of\n1. 
[finding sequence determinants](https://eboruta.readthedocs.io/en/latest/notebooks/sequence_determinants_tutorial.html)\nof tyrosine and serine-threonine kinases and\n2. [a protocol](https://github.com/edikedik/kinactive/blob/abae9c8a1fca0754d02e3f117dee210b587e666b/kinactive/db.py#L142)\nto build a complete structural collection of protein kinase domains.\n\nMore examples are to come in the future, so stay tuned. If you know a good use case for `lXtractor`, feel free to raise an issue or reach out at [ivan.reveguk@gmail.com](mailto:ivan.reveguk@gmail.com).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedikedik%2Flxtractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fedikedik%2Flxtractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedikedik%2Flxtractor/lists"}