Top Life Sciences open-source software
https://github.com/servierhub/top-life-sciences


[![Servier Contributed](https://raw.githubusercontent.com/servierhub/.github/main/badges/contributed.svg)](https://github.com/ServierHub/)
# Top life sciences open source software
This is an automatically generated[^1] **ranked list** of [open source](https://opensource.org/osd) software from
[pharmaceutical companies](https://en.wikipedia.org/wiki/List_of_pharmaceutical_companies) and cross-company organizations,
[biotechnology companies](https://en.wikipedia.org/wiki/Category:Biotechnology_companies),
research institutes,
open source communities, and individuals,
plus some life sciences software from technology companies.

It is built from a **curated** list of [GitHub accounts](Results/SOURCES.md) and is periodically refreshed from the repositories of these sources.

You can also see [what they have updated lately](Results/NEW.md)
and [which topics are covered](Results/TOPICS.md) by this software.

## Ranked by starred repositories
> [!NOTE]
> Each entry shows the following, when available:
>
> - **stars** - number of people who have starred the repository to show their appreciation
> - **forks** - number of people who have forked (copied) the repository in order to modify it
> - **watchers** - number of people who are monitoring changes in the repository
> - **main programming language**
> - **license**
> - **last update date & time**

|Rank|Software|
|---|:---|
|1|[**google-deepmind/alphafold**](https://github.com/google-deepmind/alphafold)<br>Open source code for AlphaFold.<br>11987 2135 226 Python Apache-2.0 license 2023-04-05 09:45:53|
|2|[**deepchem/deepchem**](https://github.com/deepchem/deepchem)<br>Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology<br>`biology`, `deep-learning`, `drug-discovery`, `hacktoberfest`, `materials-science`, `quantum-chemistry`<br>5220 1626 Python MIT License 2024-06-08 13:03:11|
|3|[**biopython/biopython**](https://github.com/biopython/biopython)<br>Official git repository for Biopython (originally converted from CVS)<br>`bioinformatics`, `biopython`, `dna`, `genomics`, `phylogenetics`, `protein`, `protein-structure`, `python`, `sequence-alignment`<br>4213 1728 168 Python Unknown LICENSE|
|4|[**google/deepvariant**](https://github.com/google/deepvariant)<br>DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.<br>`bioinformatics`, `deep-learning`, `deep-neural-network`, `deepvariant`, `dna`, `genome`, `genomics`, `machine-learning`, `ngs`, `science`, `sequencing`, `tensorflow`<br>3100 698 159 Python BSD-3-Clause license 2024-03-19 19:20:10|
|5|[**facebookresearch/esm**](https://github.com/facebookresearch/esm)<br>Evolutionary Scale Modeling (esm): Pretrained language models for proteins<br>2917 577 63 Python MIT license 2022-10-18 13:38:47|
|6|[**aqlaboratory/openfold**](https://github.com/aqlaboratory/openfold)<br>Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2<br>`alphafold2`, `protein-structure`, `pytorch`<br>2572 466 Python Apache License 2.0 2024-06-04 08:33:28|
|7|[**rdkit/rdkit**](https://github.com/rdkit/rdkit)<br>The official sources for the RDKit library<br>`c-plus-plus`, `cheminformatics`, `python`, `rdkit`<br>2483 845 HTML BSD 3-Clause "New" or "Revised" License 2024-06-08 03:18:22|
|8|[**AstraZeneca/awesome-explainable-graph-reasoning**](https://github.com/AstraZeneca/awesome-explainable-graph-reasoning)<br>A collection of research papers and software related to explainability in graph machine learning.<br>`awesome-list`, `deep-learning`, `explainable-ai`, `explainable-ml`, `graph`, `graph-algorithms`, `graphml`<br>1941 129 Apache License 2.0 2022-04-04 14:54:08|
|9|[**OpenGene/fastp**](https://github.com/OpenGene/fastp)<br>An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)<br>`adapter`, `bioinformatics`, `duplication`, `fastq`, `filter`, `filtering`, `illumina`, `merging`, `ngs`, `overlap`, `polyg`, `preprocessing`, `qc`, `quality`, `quality-control`, `sequencing`, `splitting`, `trimming`, `umi`<br>1803 333 C++ MIT License 2024-04-07 08:16:11|
|10|[**scverse/scanpy**](https://github.com/scverse/scanpy)<br>Single-cell analysis in Python. Scales to >1M cells.<br>`anndata`, `bioinformatics`, `data-science`, `machine-learning`, `python`, `scanpy`, `scverse`, `transcriptomics`, `visualize-data`<br>1789 579 Python BSD 3-Clause "New" or "Revised" License 2024-06-07 08:43:34|
|11|[**lh3/minimap2**](https://github.com/lh3/minimap2)<br>A versatile pairwise aligner for genomic and spliced nucleotide sequences<br>`bioinformatics`, `genomics`, `sequence-alignment`, `spliced-alignment`<br>1708 396 C Other 2024-05-22 19:58:33|
|12|[**allenai/scispacy**](https://github.com/allenai/scispacy)<br>A full spaCy pipeline and models for scientific/biomedical documents.<br>`bioinformatics`, `biomedical`, `custom-pipes`, `nlp`, `scientific-documents`, `spacy`<br>1629 221 52 Python Apache-2.0 license 2024-03-08 05:57:56|
|13|[**broadinstitute/gatk**](https://github.com/broadinstitute/gatk)<br>Official code repository for GATK versions 4 and up<br>`bioinformatics`, `dna`, `gatk`, `genome`, `genomics`, `ngs`, `science`, `sequencing`, `spark`<br>1621 577 156 Java specific 2023-12-13 22:53:56|
|14|[**bioconda/bioconda-recipes**](https://github.com/bioconda/bioconda-recipes)<br>Conda recipes for the bioconda channel.<br>`bioinformatics`, `conda`, `hacktoberfest`, `package-management`<br>1595 3089 96 Shell MIT license|
|15|[**samtools/samtools**](https://github.com/samtools/samtools)<br>Tools (written in C using htslib) for manipulating next-generation sequencing data<br>1572 572 C Other 2024-06-07 09:32:59|
|16|[**Slicer/Slicer**](https://github.com/Slicer/Slicer)<br>Multi-platform, free open source software for visualization and image computing.<br>`3d-printing`, `3d-slicer`, `c-plus-plus`, `computed-tomography`, `image-guided-therapy`, `image-processing`, `itk`, `kitware`, `medical-image-computing`, `medical-imaging`, `national-institutes-of-health`, `neuroimaging`, `nih`, `python`, `qt`, `registration`, `segmentation`, `tcia-dac`, `tractography`, `vtk`<br>1521 520 38 C++ specific|
|17|[**lh3/bwa**](https://github.com/lh3/bwa)<br>Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)<br>`bioinformatics`, `fm-index`, `genomics`, `sequence-alignment`<br>1468 547 C GNU General Public License v3.0 2024-04-15 02:54:32|
|18|[**DeepGraphLearning/torchdrug**](https://github.com/DeepGraphLearning/torchdrug)<br>A powerful and flexible machine learning platform for drug discovery<br>`deep-learning`, `drug-discovery`, `graph-neural-networks`, `pytorch`<br>1407 194 31 Python Apache-2.0 license 2023-07-16 22:37:17|
|19|[**lh3/seqtk**](https://github.com/lh3/seqtk)<br>Toolkit for processing sequences in FASTA/Q formats<br>`bioinformatics`, `sequence-analysis`<br>1332 310 C MIT License 2023-10-24 15:01:39|
|20|[**galaxyproject/galaxy**](https://github.com/galaxyproject/galaxy)<br>Data intensive science for everyone.<br>`bioinformatics`, `dna`, `genomics`, `hacktoberfest`, `ngs`, `pipeline`, `science`, `sequencing`, `usegalaxy`, `workflow`, `workflow-engine`<br>1329 967 69 Python specific 2024-05-07 13:56:26|
|21|[**schrodinger/fixed-data-table-2**](https://github.com/schrodinger/fixed-data-table-2)<br>A React table component designed to allow presenting millions of rows of data.<br>1290 289 JavaScript Other 2024-05-23 05:13:10|
|22|[**soedinglab/MMseqs2**](https://github.com/soedinglab/MMseqs2)<br>MMseqs2: ultra fast and sensitive search and clustering suite<br>`alignment`, `bioinformatics`, `blast`, `linclust`, `metagenomics`, `mmseqs`, `profile-search`, `sequence-clustering`, `sequence-search`, `taxonomy`<br>1281 181 C GNU General Public License v3.0 2024-05-23 07:07:21|
|23|[**facebookresearch/fastMRI**](https://github.com/facebookresearch/fastMRI)<br>A large-scale dataset of both raw MRI measurements and clinical MRI images.<br>`convolutional-neural-networks`, `deep-learning`, `fastmri`, `fastmri-challenge`, `fastmri-dataset`, `medical-imaging`, `mri`, `mri-reconstruction`, `pytorch`<br>1259 370 74 Python MIT license 2023-06-26 17:17:06|
|24|[**greenelab/deep-review**](https://github.com/greenelab/deep-review)<br>A collaboratively written review paper on deep learning, genomics, and precision medicine<br>`deep-learning`, `genomics`, `manubot`, `manuscript`, `neural-networks`, `review`<br>1235 271 129 HTML Unknown LICENSE.md 2018-03-12 15:06:48|
|25|[**shenwei356/seqkit**](https://github.com/shenwei356/seqkit)<br>A cross-platform and ultrafast toolkit for FASTA/Q file manipulation<br>`bioinformatics`, `cross-platform`, `fasta`, `fastq`, `golang`, `manipulation`, `sequence`, `tool`, `toolkit`<br>1226 157 26 Go MIT license 2024-05-17 15:59:35|
|26|[**MultiQC/MultiQC**](https://github.com/MultiQC/MultiQC)<br>Aggregate results from bioinformatics analyses across many samples into a single report.<br>`analysis`, `bioconda`, `bioinformatics`, `data-visualization`, `multiqc`, `pypi`, `python`, `quality-control`, `reporting`, `seqera`, `vizualisation`<br>1185 582 37 JavaScript GPL-3.0 license 2024-05-31 18:30:12|
|27|[**dcm4che/dcm4che**](https://github.com/dcm4che/dcm4che)<br>DICOM Implementation in JAVA<br>1165 637 119 Java specific 2024-04-22 10:59:11|
|28|[**scverse/scvi-tools**](https://github.com/scverse/scvi-tools)<br>Deep probabilistic analysis of single-cell and spatial omics data<br>`cite-seq`, `deep-generative-model`, `deep-learning`, `human-cell-atlas`, `scrna-seq`, `scverse`, `single-cell-genomics`, `single-cell-rna-seq`, `variational-autoencoder`, `variational-bayes`<br>1149 342 Python BSD 3-Clause "New" or "Revised" License 2024-06-05 17:01:13|
|29|[**vgteam/vg**](https://github.com/vgteam/vg)<br>tools for working with genome variation graphs<br>`dna`, `genome-graph`, `genomics`, `graph`, `variation-graph`<br>1072 191 48 C++ specific 2024-05-20 18:50:28|
|30|[**schrodinger/pymol-open-source**](https://github.com/schrodinger/pymol-open-source)<br>Open-source foundation of the user-sponsored PyMOL molecular visualization system.<br>1071 260 C Other 2024-06-06 19:36:48|
|31|[**scipipe/scipipe**](https://github.com/scipipe/scipipe)<br>Robust, flexible and resource-efficient pipelines using Go and the commandline<br>`bioinformatics`, `bioinformatics-pipeline`, `cheminformatics`, `dataflow`, `fbp`, `go`, `golang`, `pipeline`, `scientific-workflows`, `scipipe`, `workflow`, `workflow-engine`<br>1055 72 38 Go MIT license 2021-10-14 09:11:34|
|32|[**shenwei356/csvtk**](https://github.com/shenwei356/csvtk)<br>A cross-platform, efficient and practical CSV/TSV toolkit in Golang<br>`bioinformatics`, `command-line`, `cross-platform`, `csv`, `golang`, `tool`, `toolkit`, `tsv`<br>972 85 25 Go MIT license 2024-05-29 15:30:38|
|33|[**bigdatagenomics/adam**](https://github.com/bigdatagenomics/adam)<br>ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.<br>`avro`, `big-data`, `bioinformatics`, `genomics`, `java`, `parquet`, `python`, `r`, `scala`, `spark`<br>967 304 Scala Apache License 2.0 2024-03-23 13:27:52|
|34|[**broadinstitute/cromwell**](https://github.com/broadinstitute/cromwell)<br>Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments<br>`application`, `bioinformatics`, `cloud`, `containers`, `docker`, `executor`, `ga4gh`, `hpc`, `scala`, `wdl`, `workflow`, `workflow-description-language`, `workflow-execution`<br>965 351 112 Scala BSD-3-Clause LICENSE.txt 2024-05-07 17:47:13|
|35|[**hail-is/hail**](https://github.com/hail-is/hail)<br>Cloud-native genomic dataframes and batch computing<br>`bioinformatics`, `genetics`, `genomics`, `gwas`, `hail`, `python`, `software`, `vcf`<br>946 238 55 Python MIT license 2024-06-05 17:48:05|
|36|[**broadinstitute/picard**](https://github.com/broadinstitute/picard)<br>A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.<br>944 365 160 Java MIT license 2023-11-14 22:01:18|
|37|[**aqlaboratory/proteinnet**](https://github.com/aqlaboratory/proteinnet)<br>Standardized data set for machine learning of protein structure<br>`dataset`, `deep-learning`, `machine-learning`, `protein-sequence`, `protein-structure`, `proteins`<br>849 130 Python MIT License 2020-11-18 23:43:32|
|38|[**shenwei356/rush**](https://github.com/shenwei356/rush)<br>A cross-platform command-line tool for executing jobs in parallel<br>`bioinformatics`, `command`, `cross-platform`, `execute`, `golang`, `parallel`, `pipeline`, `shell`, `windows`<br>834 63 20 Go MIT license 2023-11-13 17:53:58|
|39|[**evo-design/evo**](https://github.com/evo-design/evo)<br>DNA foundation modeling from molecular to genome scale<br>832 97 Jupyter Notebook Apache License 2.0 2024-04-30 22:35:34|
|40|[**PaddlePaddle/PaddleHelix**](https://github.com/PaddlePaddle/PaddleHelix)<br>Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning; "螺旋桨" (PaddleHelix) bio-computing toolkit<br>`biocomputing`, `ddi`, `deeplearning`, `dti`, `graph-networks`, `machine-learning`, `molecule-design`, `ppi`, `protein-design`, `protein-docking`, `protein-folding`, `protein-structure-prediction`, `representation-learning`, `rna-structure-prediction`, `self-supervised-learning`<br>799 188 25 Python Apache-2.0 license 2023-08-01 09:31:36|
|41|[**samtools/htslib**](https://github.com/samtools/htslib)<br>C library for high-throughput sequencing data formats<br>`bam`, `bcf`, `bioinformatics`, `cram`, `htslib`, `ngs`, `sam`, `vcf`<br>779 448 C Other 2024-06-06 15:40:15|
|42|[**google/nucleus**](https://github.com/google/nucleus)<br>Python and C++ code for reading and writing genomics data.<br>`bioinformatics`, `dna`, `genomics`, `tensorflow`<br>777 126 53 C++ specific 2021-08-31 23:19:33|
|43|[**nroduit/Weasis**](https://github.com/nroduit/Weasis)<br>Weasis is a DICOM viewer available as a desktop application or as a web-based application.<br>`dicom`, `dicom-image`, `dicom-image-viewer`, `dicom-images`, `dicom-pr`, `dicom-rt`, `dicom-seg`, `dicom-viewer`, `dicom-web-viewer`, `dicomweb`, `ecg`, `export-dicom`, `medical`, `medical-imaging`, `multiplanar-reconstruction`, `viewer`, `volume-rendering`, `weasis`<br>763 281 49 Java specific 2024-05-06 18:42:54|
|44|[**baidu-research/NCRF**](https://github.com/baidu-research/NCRF)<br>Cancer metastasis detection with neural conditional random field (NCRF)<br>`camelyon16`, `conditional-random-fields`, `deep-learning`, `pathology`, `whole-slide-imaging`<br>749 184 37 Python Apache-2.0 license 2018-06-17 18:22:34|
|45|[**AstraZeneca/chemicalx**](https://github.com/AstraZeneca/chemicalx)<br>A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022)<br>`biology`, `chemistry`, `deep-chemistry`, `deep-learning`, `drug`, `drug-discovery`, `drug-interaction`, `drug-pair`, `geometric-deep-learning`, `geometry`, `graph-neural-network`, `machine-learning`, `pharma`, `polypharmacy`, `pytorch`, `smiles`, `smiles-strings`, `torch`, `torchdrug`<br>701 89 Python Apache License 2.0 2023-09-11 08:01:43|
|46|[**samtools/hts-specs**](https://github.com/samtools/hts-specs)<br>Specifications of SAM/BAM and related high-throughput sequencing file formats<br>627 173 TeX 2024-06-06 06:50:26|
|47|[**samtools/bcftools**](https://github.com/samtools/bcftools)<br>This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html<br>626 241 C Other 2024-06-07 13:13:17|
|48|[**insilicomedicine/GENTRL**](https://github.com/insilicomedicine/GENTRL)<br>Generative Tensorial Reinforcement Learning (GENTRL) model<br>596 216 Python 2020-04-28 11:58:05|
|49|[**shenwei356/awesome**](https://github.com/shenwei356/awesome)<br>Awesome resources on Bioinformatics, data science, machine learning, programming language (Python, Golang, R, Perl) and miscellaneous stuff.<br>`awesome`, `data-science`, `git`, `golang`, `linux`, `perl`, `programing-language`, `python`<br>593 163 35 MIT license 2023-09-25 02:09:01|
|50|[**chanzuckerberg/cellxgene**](https://github.com/chanzuckerberg/cellxgene)<br>An interactive explorer for single-cell transcriptomics data<br>`dataviz`, `scientific`, `scrna-seq`, `transcriptomics`, `visualization`<br>591 111 33 JavaScript MIT license 2023-12-19 22:19:07|
|51|[**invesalius/invesalius3**](https://github.com/invesalius/invesalius3)<br>3D medical imaging reconstruction software<br>584 277 37 Python GPL-2.0 license 2022-04-14 02:28:31|
|52|[**lh3/bioawk**](https://github.com/lh3/bioawk)<br>BWK awk modified for biological data<br>`bioinformatics`, `sequence-analysis`<br>582 121 C 2022-08-11 01:06:45|
|53|[**MolecularAI/aizynthfinder**](https://github.com/MolecularAI/aizynthfinder)<br>A tool for retrosynthetic planning<br>`astrazeneca`, `chemical-reactions`, `cheminformatics`, `monte-carlo-tree-search`, `neural-networks`, `reaction-informatics`<br>548 125 Python MIT License 2024-06-03 13:34:33|
|54|[**owkin/PyDESeq2**](https://github.com/owkin/PyDESeq2)<br>A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.<br>`bioinformatics`, `differential-expression`, `python`, `rna-seq`, `transcriptomics`<br>533 58 Python MIT License 2024-06-06 01:43:52|
|55|[**broadinstitute/infercnv**](https://github.com/broadinstitute/infercnv)<br>Inferring CNV from Single-Cell RNA-Seq<br>520 159 42 R specific 2020-02-07 20:29:28|
|56|[**scverse/anndata**](https://github.com/scverse/anndata)<br>Annotated data.<br>`anndata`, `bioinformatics`, `data-science`, `machine-learning`, `scanpy`, `scverse`, `transcriptomics`<br>511 148 Python BSD 3-Clause "New" or "Revised" License 2024-06-07 16:03:50|
|57|[**soedinglab/hh-suite**](https://github.com/soedinglab/hh-suite)<br>Remote protein homology detection suite.<br>`alignment`, `bioinformatics`, `cpp`, `hh-suite`, `hhblits`, `hhpred`, `hhsearch`, `opensource`, `profile-profile-search`, `profile-search`, `protein-structure`, `sequence-search`, `simd`, `viterbi`<br>509 128 C GNU General Public License v3.0 2023-08-13 08:44:05|
|58|[**chhylp123/hifiasm**](https://github.com/chhylp123/hifiasm)<br>Hifiasm: a haplotype-resolved assembler for accurate Hifi reads<br>`bioinformatics`, `denovo-assembly`, `genomics`, `hifi-read`, `pacbio`<br>490 84 28 C++ MIT license 2024-05-06 14:29:45|
|59|[**insitro/redun**](https://github.com/insitro/redun)<br>Yet another redundant workflow engine<br>`aws`, `bioinformatics`, `data-engineering`, `data-science`, `docker`, `etl`, `gcp`, `ml`, `python`, `workflow-engine`<br>489 40 Python Apache License 2.0 2024-06-06 18:52:56|
|60|[**biosustain/potion**](https://github.com/biosustain/potion)<br>Flask-Potion is a RESTful API framework for Flask and SQLAlchemy, Peewee or MongoEngine<br>`flask`, `flask-extensions`, `mongoengine`, `peewee`, `sqlalchemy`<br>488 51 Python Other 2019-04-23 17:00:39|
|61|[**google-deepmind/alphamissense**](https://github.com/google-deepmind/alphamissense)<br>461 58 25 Python Apache-2.0 license|
|62|[**scverse/squidpy**](https://github.com/scverse/squidpy)<br>Spatial Single Cell Analysis in Python<br>`data-visualization`, `image-analysis`, `single-cell-genomics`, `single-cell-rna-seq`, `spatial-analysis`, `spatial-transcriptomics`, `squidpy`<br>399 71 Python BSD 3-Clause "New" or "Revised" License 2024-06-08 21:22:47|
|63|[**lh3/minigraph**](https://github.com/lh3/minigraph)<br>Sequence-to-graph mapper and graph generator<br>`bioinformatics`, `genome-graph`, `genomics`, `pan-genome`, `sequence-alignment`<br>394 38 C MIT License 2024-05-22 00:59:12|
|64|[**benevolentAI/guacamol**](https://github.com/benevolentAI/guacamol)<br>Benchmarks for generative chemistry<br>383 82 Python MIT License 2024-02-11 08:59:38|
|65|[**calico/basenji**](https://github.com/calico/basenji)<br>Sequential regulatory activity predictions with deep convolutional neural networks.<br>373 119 Python Apache License 2.0 2024-05-28 20:08:23|
|66|[**ome/bioformats**](https://github.com/ome/bioformats)<br>Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.<br>`bio-formats`, `format-converter`, `format-reader`, `image`, `java`, `life-sciences-image`, `lightsheet`, `metadata`, `whole-slide-imaging`, `wsi`<br>367 239 Java GNU General Public License v2.0 2024-06-07 19:34:33|
|67|[**MolecularAI/GraphINVENT**](https://github.com/MolecularAI/GraphINVENT)<br>Graph neural networks for molecular design.<br>356 74 Python MIT License 2023-03-11 11:55:32|
|67|[**chembl/chembl_webresource_client**](https://github.com/chembl/chembl_webresource_client)<br>Official Python client for accessing ChEMBL API<br>`chembl`, `cheminformatics`, `chemistry`, `chemoinformatics`, `python`, `rest`, `rest-client`<br>356 95 Python Other 2024-02-26 15:44:57|
|68|[**shenwei356/taxonkit**](https://github.com/shenwei356/taxonkit)<br>A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV<br>`bioinformatics`, `cross-platform`, `lca`, `lineage`, `taxdump`, `taxid`, `taxonkit`, `taxonomy`<br>342 29 10 Go MIT license 2024-04-25 17:15:34|
|69|[**deepchem/DeepLearningLifeSciences**](https://github.com/deepchem/DeepLearningLifeSciences)<br>Example code from the book "Deep Learning for the Life Sciences"<br>338 150 Jupyter Notebook MIT License 2021-09-17 05:10:37|
|70|[**MolecularAI/Reinvent**](https://github.com/MolecularAI/Reinvent)<br>`astrazeneca`, `cheminformatics`, `denovo-design`, `neural-networks`, `reinforcement-learning`, `transfer-learning`<br>332 108 Python Apache License 2.0 2023-10-19 05:26:16|
|71|[**aqlaboratory/rgn**](https://github.com/aqlaboratory/rgn)<br>Recurrent Geometric Networks for end-to-end differentiable learning of protein structure<br>`deep-learning`, `deep-neural-networks`, `protein-structure`, `protein-structure-prediction`<br>326 89 Python MIT License 2019-08-01 14:17:59|
|72|[**tencent-ailab/grover**](https://github.com/tencent-ailab/grover)<br>This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data<br>313 68 7 Python specific 2021-01-18 09:06:32|
|73|[**lh3/miniprot**](https://github.com/lh3/miniprot)<br>Align proteins to genomes with splicing and frameshift<br>`bioinformatics`, `sequence-alignment`<br>305 16 C MIT License 2024-04-12 21:01:25|
|74|[**Roche/pyreadstat**](https://github.com/Roche/pyreadstat)<br>Python package to read sas, spss and stata files into pandas data frames. It is a wrapper for the C library readstat.<br>`conversion`, `pandas-dataframe`, `python`, `readstat`, `sas7bdat`, `spss`, `stata-files`<br>303 55 C Other 2024-06-04 09:55:07|
|75|[**lh3/miniasm**](https://github.com/lh3/miniasm)<br>Ultrafast de novo assembly for long noisy reads (though having no consensus step)<br>`bioinformatics`, `denovo-assembly`, `genomics`<br>293 68 TeX MIT License 2023-12-13 01:35:58|
|76|[**chanzuckerberg/MedMentions**](https://github.com/chanzuckerberg/MedMentions)<br>A corpus of Biomedical papers annotated with mentions of UMLS entities.<br>291 31 25|
|77|[**AstraZeneca/rexmex**](https://github.com/AstraZeneca/rexmex)<br>A general purpose recommender metrics library for fair evaluation.<br>`coverage`, `deep-learning`, `evaluation`, `machine-learning`, `metric`, `metrics`, `mrr`, `personalization`, `precision`, `rank`, `ranking`, `recall`, `recommender`, `recommender-system`, `recsys`, `rsquared`<br>275 25 Python 2023-08-22 09:22:20|
|78|[**samtools/htsjdk**](https://github.com/samtools/htsjdk)<br>A Java API for high-throughput sequencing data (HTS) formats.<br>`bam`, `cram`, `dna`, `fasta`, `genomics`, `java`, `java-api`, `ngs`, `sam`, `sequencing`, `vcf`<br>274 244 Java 2024-06-04 18:40:43|
|79|[**shenwei356/brename**](https://github.com/shenwei356/brename)<br>A practical cross-platform command-line tool for safely batch renaming files/directories via regular expression<br>`batch`, `batch-rename`, `batch-rename-files`, `batch-renamer`, `go`, `golang`, `rename`, `safe`, `windows`<br>254 21 6 Go MIT license 2024-04-14 08:22:45|
|80|[**lh3/wgsim**](https://github.com/lh3/wgsim)<br>Reads simulator<br>`bioinformatics`, `genomics`<br>252 90 C 2021-09-03 14:58:22|
|81|[**Acellera/htmd**](https://github.com/Acellera/htmd)<br>HTMD: Programming Environment for Molecular Discovery<br>`automate`, `drug-discovery`, `htmd`, `molecular-simulations`<br>250 58 Rich Text Format Other 2024-06-07 15:24:26|
|82|[**DeepGraphLearning/GearNet**](https://github.com/DeepGraphLearning/GearNet)<br>GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125)<br>`graph-neural-networks`, `pre-training`, `protein-representation-learning`<br>249 26 10 Python MIT license|
|83|[**MolecularAI/REINVENT4**](https://github.com/MolecularAI/REINVENT4)<br>AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.<br>`ai`, `astrazeneca`, `cheminformatics`, `chemistry`, `deep-learning`, `denovo-design`, `drug-design`, `drug-discovery`, `generative-ai`, `ml`, `molecule-generation`, `neural-networks`, `reinforcement-learning`, `transfer-learning`<br>247 57 Python Apache License 2.0 2024-04-27 11:00:08|
|84|[**rdkit/rdkit-tutorials**](https://github.com/rdkit/rdkit-tutorials)<br>Tutorials to learn how to work with the RDKit<br>239 71 Jupyter Notebook Other 2023-03-19 13:36:55|
|85|[**insightsengineering/rtables**](https://github.com/insightsengineering/rtables)<br>Reporting tables with R<br>`pharmaceuticals`, `r`, `tables`<br>213 49 R Other 2024-06-07 21:27:39|
|86|[**Bayer-Group/cloudformation-template-generator**](https://github.com/Bayer-Group/cloudformation-template-generator)<br>A type-safe Scala DSL for generating CloudFormation templates<br>211 71 Scala BSD 3-Clause "New" or "Revised" License 2022-07-29 11:32:04|
|87|[**pharmaverse/admiral**](https://github.com/pharmaverse/admiral)<br>ADaM in R Asset Library<br>`cdisc`, `clinical-trials`, `open-source`, `r`<br>207 53 R Apache License 2.0 2024-06-07 18:23:44|
|87|[**OpenGene/awesome-bio-datasets**](https://github.com/OpenGene/awesome-bio-datasets)<br>awesome-bio-datasets<br>207 42 MIT License 2017-10-28 12:32:15|
|88|[**OpenGene/AfterQC**](https://github.com/OpenGene/AfterQC)<br>Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data<br>`adapter-trimming`, `bioinformatics`, `error`, `fastq`, `filtering`, `ngs`, `overlap`, `qc`, `quality-control`, `sequencing`, `trimming`<br>203 50 Python MIT License 2020-05-14 07:15:54|
|89|[**Bayer-Group/etcd-aws-cluster**](https://github.com/Bayer-Group/etcd-aws-cluster)<br>A container to assist in managing an etcd2 cluster from an Amazon auto scaling group<br>202 102 Shell BSD 3-Clause "New" or "Revised" License 2017-02-01 01:09:05|
|89|[**modernatx/seqlike**](https://github.com/modernatx/seqlike)<br>Unified biological sequence manipulation in Python<br>`biological-sequences`, `biopython`, `machine-learning`, `sequence`<br>202 18 Python Apache License 2.0 2024-02-16 13:13:05|
|89|[**scverse/scirpy**](https://github.com/scverse/scirpy)<br>A scanpy extension to analyse single-cell TCR and BCR data.<br>202 31 Python BSD 3-Clause "New" or "Revised" License 2024-06-06 06:21:35|
|90|[**lh3/gfatools**](https://github.com/lh3/gfatools)<br>Tools for manipulating sequence graphs in the GFA and rGFA formats<br>`bioinformatics`, `genome-graph`, `genomics`<br>201 18 C 2024-02-20 15:29:14|
|90|[**scverse/muon**](https://github.com/scverse/muon)<br>muon is a multimodal omics Python framework<br>`anndata`, `cite-seq`, `mudata`, `multi-omics`, `multimodal-data`, `multimodal-omics-analysis`, `muon`, `scanpy`, `scatac-seq`, `scrna-seq`, `scverse`<br>201 28 Python BSD 3-Clause "New" or "Revised" License 2024-05-30 21:21:35|
|91|[**aws-samples/aws-batch-genomics**](https://github.com/aws-samples/aws-batch-genomics)<br>Software sets up and runs a genome sequencing analysis workflow using AWS Batch and AWS Step Functions.<br>199 75 39 Python Apache-2.0 license 2018-11-29 18:40:42|
|92|[**rdkit/mmpdb**](https://github.com/rdkit/mmpdb)<br>A package to identify matched molecular pairs and use them to predict property changes.<br>195 53 Python Other 2024-04-30 10:55:30|
|93|[**Acellera/moleculekit**](https://github.com/Acellera/moleculekit)<br>MoleculeKit: Your favorite molecule manipulation kit<br>`drug-discovery`, `machine-learning`, `molecular-modeling`, `molecular-simulation`, `molecule`, `proteins`<br>193 35 Python Other 2024-06-04 13:53:30|
|94|[**bioinform/somaticseq**](https://github.com/bioinform/somaticseq)<br>An ensemble approach to accurately detect somatic mutations using SomaticSeq<br>`cancer-genomics`, `somatic-variants`<br>189 53 Python BSD 2-Clause "Simplified" License 2024-05-30 07:55:34|
|95|[**MolecularAI/Chemformer**](https://github.com/MolecularAI/Chemformer)<br>188 34 Python Apache License 2.0 2024-05-29 14:43:33|
|96|[**owkin/FLamby**](https://github.com/owkin/FLamby)<br>Cross-silo Federated Learning playground in Python. Discover 7 real-world federated datasets to test your new FL strategies and try to beat the leaderboard.<br>`dataset`, `deep-learning`, `differential-privacy`, `federated-learning`, `healthcare`, `machine-learning`, `python`<br>187 22 Python MIT License 2024-06-03 12:18:27|
|96|[**ome/openmicroscopy**](https://github.com/ome/openmicroscopy)<br>OME (Open Microscopy Environment) develops open-source software and data format standards for the storage and manipulation of biological light microscopy data. A joint project between universities, research establishments and industry in Europe and the USA, OME has over 20 active researchers with strong links to the microscopy community. Funded …<br>`database`, `image`, `java`, `omero`, `python`, `server`<br>187 100 Java GNU General Public License v2.0 2024-06-08 00:39:30|
|97|[**AstraZeneca-NGS/VarDict**](https://github.com/AstraZeneca-NGS/VarDict)<br>VarDict<br>186 60 Perl MIT License 2024-01-05 14:06:13|
|97|[**scverse/spatialdata**](https://github.com/scverse/spatialdata)<br>An open and interoperable data framework for spatial omics data<br>186 34 Python BSD 3-Clause "New" or "Revised" License 2024-06-08 00:23:48|
|98|[**haowenz/chromap**](https://github.com/haowenz/chromap)<br>Fast alignment and preprocessing of chromatin profiles<br>`bioinformatics`, `chromatin-profiles`, `genomics`, `sequence-analysis`<br>184 18 7 C++ MIT license 2024-02-06 15:29:20|
|99|[**chao1224/MoleculeSTM**](https://github.com/chao1224/MoleculeSTM)<br>Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)<br>`clip`, `computation-chemistry`, `drug-discovery`, `editing`, `foundation-model`, `molecule-editing`, `moleculeclip`, `moleculestm`, `pretraining`, `retrieval`<br>182 17 4 Python specific 2024-04-19 05:25:24|
|100|[**openpharma/visR**](https://github.com/openpharma/visR)<br>A package to wrap functionality for plots, tables and diagrams adhering to graphical principles.<br>179 32 R Other 2024-06-04 13:48:59|
|100|[**chembl/ChEMBL_Structure_Pipeline**](https://github.com/chembl/ChEMBL_Structure_Pipeline)<br>ChEMBL database structure pipelines<br>179 38 Python MIT License 2023-10-25 15:20:47|
|101|[**AstraZeneca/awesome-drug-discovery-knowledge-graphs**](https://github.com/AstraZeneca/awesome-drug-discovery-knowledge-graphs)<br>A collection of research papers, datasets and software related to knowledge graphs for drug discovery. Accompanies the paper "A review of biomedical datasets relating to drug discovery: a knowledge graph perspective" (Briefings in Bioinformatics, 2022)<br>`awesome-list`, `drug-discovery`, `drug-discovery-knowledge-graph`, `knowledge-graph`<br>177 19 Apache License 2.0 2023-09-10 16:33:40|
|102|[**lh3/biofast**](https://github.com/lh3/biofast)<br>Benchmarking programming languages/implementations for common tasks in Bioinformatics<br>`bioinformatics`<br>175 26 C 2021-12-09 14:10:44|
|103|[**shenwei356/kmcp**](https://github.com/shenwei356/kmcp)<br>Accurate metagenomic profiling && Fast large-scale sequence/genome searching<br>`bigsi`, `cobs`, `fracminhash`, `kmer`, `metagenomics`, `scaled-minhash`, `searching`, `sketch`, `sketching`, `syncmers`, `taxonomic-classification`, `taxonomic-profiling`, `virome`<br>173 13 6 Go MIT license 2023-09-22 04:09:54|
|104|[**rgcgithub/regenie**](https://github.com/rgcgithub/regenie)<br>regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.<br>172 49 C++ Other 2024-04-03 13:52:31|
|105|[**soedinglab/metaeuk**](https://github.com/soedinglab/metaeuk)<br>MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics<br>`bioinformatics`, `eukaryotes`, `gene-discovery`, `gene-prediction`, `metagenomics`<br>171 24 C GNU General Public License v3.0 2024-05-30 09:04:06|
|106|[**recursionpharma/gflownet**](https://github.com/recursionpharma/gflownet)<br>GFlowNet library specialized for graph & molecular data<br>`deep-learning`, `gflownet`, `graph-neural-network`, `pytorch`<br>168 34 Python MIT License 2024-06-06 13:29:06|
|106|[**scverse/scanpy-tutorials**](https://github.com/scverse/scanpy-tutorials)<br>Scanpy Tutorials.<br>168 113 Jupyter Notebook 2024-06-03 19:42:01|
|107|[**bioinform/neusomatic**](https://github.com/bioinform/neusomatic)<br>NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection<br>`convolutional-neural-networks`, `deep-learning`, `genomics`, `somatic-variants`<br>167 50 Python Other 2021-12-23 10:41:50|
|108|[**lh3/readfq**](https://github.com/lh3/readfq)<br>Fast multi-line FASTA/Q reader in several programming languages<br>`bioinformatics`, `sequence-analysis`<br>166 60 C 2021-06-06 07:27:15|
|109|[**insightsengineering/teal**](https://github.com/insightsengineering/teal)
Exploratory Web Apps for Analyzing Clinical Trial Data
`clinical-trials`, `nest`, `r`, `shiny`, `webapp`
164 29 R Other 2024-06-07 12:49:26 |
|110|[**lh3/cgranges**](https://github.com/lh3/cgranges)
A C/C++ library for fast interval overlap queries (with a "bedtools coverage" example)
`algorithm`, `bioinformatics`, `genomics`
161 18 C MIT License 2024-05-28 21:47:37 |
|110|[**lh3/kmer-cnt**](https://github.com/lh3/kmer-cnt)
Code examples of fast and simple k-mer counters for tutorial purposes
`bioinformatics`, `genomics`, `k-mer-counting`
161 13 C++ MIT License 2020-03-10 16:24:06 |
|111|[**greenelab/tybalt**](https://github.com/greenelab/tybalt)
Training and evaluating a variational autoencoder for pan-cancer gene expression data
`analysis`, `autoencoder`, `cancer`, `cancer-genomics`, `deep-learning`, `gene-expression`, `script`, `tool`, `unsupervised-learning`, `variational-autoencoder`, `variational-autoencoders`
159 62 10 HTML BSD-3-Clause license 2017-11-13 13:38:42 |
|112|[**aqlaboratory/genie**](https://github.com/aqlaboratory/genie)
De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds
`diffusion-models`, `protein-design`
154 18 Python Apache License 2.0 2024-04-21 13:48:25 |
|113|[**DeepGraphLearning/ConfGF**](https://github.com/DeepGraphLearning/ConfGF)
Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021).
153 34 10 Python MIT license |
|114|[**benevolentAI/DeeplyTough**](https://github.com/benevolentAI/DeeplyTough)
DeeplyTough: Learning Structural Comparison of Protein Binding Sites
`3d-models`, `deep-learning`, `drug-discovery`, `metric-learning`, `protein-structure`
151 39 Python Other 2023-04-07 09:33:44 |
|115|[**chao1224/GraphMVP**](https://github.com/chao1224/GraphMVP)
Pre-training Molecular Graph Representation with 3D Geometry, ICLR'22 (https://openreview.net/forum?id=xQUe1pOKPam)
`contrastive-learning`, `generative-model`, `geometry`, `graph`, `molecule`, `pretraining`, `self-supervised`, `self-supervised-learning`
150 20 5 Python MIT license 2022-09-20 14:29:48 |
|116|[**OpenGene/MutScan**](https://github.com/OpenGene/MutScan)
Detect and visualize target mutations by scanning FastQ files directly
`bioinformatics`, `cancer`, `detection`, `fastq`, `mutation`, `ngs`, `somatic`, `validation`, `variant`, `visualization`
146 38 C MIT License 2022-02-10 01:52:44 |
|117|[**MolecularAI/ReinventCommunity**](https://github.com/MolecularAI/ReinventCommunity)
`astrazeneca`, `cheminformatics`, `denovo-design`, `jupyter-notebook`, `neural-networks`, `reinforcement-learning`, `transfer-learning`
145 57 Jupyter Notebook MIT License 2022-04-22 16:44:35 |
|117|[**lh3/psmc**](https://github.com/lh3/psmc)
Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model
`bioinformatics`, `genomics`, `population-genetics`
145 60 C Other 2022-11-21 04:39:31 |
|117|[**tencent-ailab/DrugOOD**](https://github.com/tencent-ailab/DrugOOD)
OOD Dataset Curator and Benchmark for AI-aided Drug Discovery
145 19 6 Python specific |
|118|[**ome/ome-zarr-py**](https://github.com/ome/ome-zarr-py)
Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
`ngff`, `ome`, `ome-zarr`, `zarr`
143 51 Python Other 2024-06-06 12:51:57 |
|119|[**Novartis/tidymodules**](https://github.com/Novartis/tidymodules)
An Object-Oriented approach to Shiny modules
`communication`, `inheritance`, `oop`, `r`, `shiny`, `shiny-modules`, `tidy-operators`
141 11 R Other 2023-02-23 15:04:31 |
|120|[**aws-samples/aws-genomics-workflows**](https://github.com/aws-samples/aws-genomics-workflows)
Genomics Workflows on AWS
`aws`, `batch`, `genomics`, `step-functions`, `workflows`
140 106 19 Shell MIT-0 license 2022-03-30 21:38:09 |
|121|[**MolecularAI/deep-molecular-optimization**](https://github.com/MolecularAI/deep-molecular-optimization)
Molecular optimization by capturing chemist’s intuition using the Seq2Seq with attention and the Transformer
`molecular-optimization`, `multi-property-optimization`, `seq2seq`, `transformer`
139 36 Python Apache License 2.0 2023-03-16 07:05:06 |
|122|[**AstraZeneca/SubTab**](https://github.com/AstraZeneca/SubTab)
The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning"
`contrastive-learning`, `multi-view-learning`, `representation-learning`, `self-supervised-learning`, `tabular-data`
138 20 Python Apache License 2.0 2022-07-01 09:03:38 |
|122|[**johnsonandjohnson/Bodiless-JS**](https://github.com/johnsonandjohnson/Bodiless-JS)
Framework for building editable websites on the JAMStack
138 59 TypeScript Apache License 2.0 2024-01-24 03:00:32 |
|123|[**Benson-Genomics-Lab/TRF**](https://github.com/Benson-Genomics-Lab/TRF)
Tandem Repeats Finder: a program to analyze DNA sequences
137 24 C GNU Affero General Public License v3.0 2023-01-16 20:44:26 |
|124|[**lh3/pangene**](https://github.com/lh3/pangene)
Constructing a pangenome gene graph
`bioinformatics`, `pangenome`
136 7 C 2024-05-29 00:13:01 |
|125|[**owkin/HistoSSLscaling**](https://github.com/owkin/HistoSSLscaling)
Code associated with the publication: Scaling self-supervised learning for histopathology with masked image modeling, A. Filiot et al., MedRxiv (2023). We publicly release Phikon 🚀
`computational-pathology`
135 11 Jupyter Notebook Other 2024-01-29 22:35:32 |
|126|[**AstraZeneca/awesome-shapley-value**](https://github.com/AstraZeneca/awesome-shapley-value)
Reading list for "The Shapley Value in Machine Learning" (IJCAI 2022)
`artificial-intelligence`, `data-science`, `deep-learning`, `explainability`, `explainable`, `explainable-ai`, `explainable-artificial-intelligence`, `explainable-ml`, `lime`, `machine-learning`, `owen-value`, `shap`, `shapley`, `shapley-additive-explanations`, `shapley-decomposition`, `shapley-q-value`, `shapley-value`, `xai`
134 10 Apache License 2.0 2022-08-08 08:53:10 |
|127|[**lh3/bedtk**](https://github.com/lh3/bedtk)
A simple toolset for BED files (warning: CLI may change before bedtk becomes stable)
`bioinformatics`
132 15 C MIT License 2024-05-28 21:48:28 |
|128|[**Bioconductor/Contributions**](https://github.com/Bioconductor/Contributions)
Contribute Packages to Bioconductor
`bioconductor`
131 33 2023-09-12 18:32:10 |
|129|[**Merck/BioPhi**](https://github.com/Merck/BioPhi)
BioPhi is an open-source antibody design platform. It features methods for automated antibody humanization (Sapiens), humanness evaluation (OASis) and an interface for computer-assisted antibody sequence design.
`antibody`, `humanization`, `humanness`, `oasis`, `sapiens`
129 44 Python MIT License 2024-06-03 07:17:18 |
|129|[**soedinglab/plass**](https://github.com/soedinglab/plass)
Sensitive and precise assembly of short sequencing reads
`bioinformatics`, `metagenomics`, `metatranscriptomics`, `opensource`, `proteins`, `proteomics`, `sequence-assembler`
129 14 C GNU General Public License v3.0 2024-04-16 20:44:12 |
|130|[**benevolentAI/guacamol_baselines**](https://github.com/benevolentAI/guacamol_baselines)
Baseline models for GuacaMol benchmarks
128 33 Python MIT License 2024-02-16 09:40:42 |
|131|[**AstraZeneca-NGS/VarDictJava**](https://github.com/AstraZeneca-NGS/VarDictJava)
VarDict Java port
127 52 Java MIT License 2024-01-05 14:03:51 |
|132|[**lh3/ksw2**](https://github.com/lh3/ksw2)
Global alignment and alignment extension
`bioinformatics`, `sequence-alignment`
124 24 C Other 2023-06-27 17:21:12 |
|132|[**chao1224/ChatDrug**](https://github.com/chao1224/ChatDrug)
LLM for Drug Editing, ICLR 2024
`chatgpt`, `chatgpt3`, `conversation`, `domain-feedback`, `drug`, `drug-discovery`, `drug-editing`, `editing`, `llm`, `molecule`, `motif`, `peptide`, `protein`, `retrieval`, `secondary-structure`, `small-molecule`, `structure`
124 8 3 Python 2024-05-28 19:44:44 |
|133|[**rdkit/rdkit-js**](https://github.com/rdkit/rdkit-js)
A powerful cheminformatics and molecule rendering toolbelt for JavaScript, powered by RDKit.
`cheminformatics`, `drug-discovery`, `javascript`, `molecule`, `molecule-viewer`, `molecule-visualization`, `node-js`, `npm`, `rdkit`, `react`, `wasm`
123 35 Dockerfile BSD 3-Clause "New" or "Revised" License 2024-06-01 09:54:52 |
|133|[**blazerye/DrugAssist**](https://github.com/blazerye/DrugAssist)
DrugAssist: A Large Language Model for Molecule Optimization
`ai-for-science`, `drug-discovery`, `instruction-datasets`, `instruction-tuning`, `large-language-models`, `molecule-generation`, `molecule-optimization`
123 10 3 Python |
|134|[**bigdatagenomics/mango**](https://github.com/bigdatagenomics/mango)
A scalable genome browser. Apache 2 licensed.
122 30 Scala Apache License 2.0 2022-12-02 22:21:57 |
|135|[**OpenGene/repaq**](https://github.com/OpenGene/repaq)
A fast lossless FASTQ compressor with ultra-high compression ratio
120 20 C MIT License 2023-09-22 02:48:34 |
|136|[**Bioconductor/BiocStickers**](https://github.com/Bioconductor/BiocStickers)
Stickers for some Bioconductor packages - feel free to contribute and/or modify.
`bioconductor`, `stickers`
119 86 R Other 2024-05-10 05:58:21 |
|136|[**greenelab/pancancer**](https://github.com/greenelab/pancancer)
Building classifiers using cancer transcriptomes across 33 different cancer-types
`analysis`, `cancer`, `classifier`, `gene-expression`, `machine-learning`, `methodology`, `pancancer`, `tcga`, `tool`, `transcriptome`
119 58 10 Jupyter Notebook BSD-3-Clause license 2018-03-01 15:38:33 |
|137|[**Roche/BalancedLossNLP**](https://github.com/Roche/BalancedLossNLP)
118 23 Jupyter Notebook Other 2023-06-12 21:51:15 |
|138|[**Merck/deepbgc**](https://github.com/Merck/deepbgc)
BGC Detection and Classification Using Deep Learning
`bidirectional-lstm`, `biosynthetic-gene-clusters`, `deep-learning`, `deepbgc`, `natural-products`, `pfam2vec`, `python`, `synthetic-biology`
117 26 Jupyter Notebook MIT License 2023-11-11 12:48:56 |
|138|[**benevolentAI/MolBERT**](https://github.com/benevolentAI/MolBERT)
117 35 Python MIT License 2021-06-06 10:28:35 |
|139|[**genentech/equifold**](https://github.com/genentech/equifold)
Official code repository for EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
`machine-learning`, `proteins`, `structural-biology`, `structure-prediction`
116 15 Python Apache License 2.0 2023-01-08 19:51:30 |
|140|[**OpenGene/GeneFuse**](https://github.com/OpenGene/GeneFuse)
Gene fusion detection and visualization
`alk`, `bioinformatics`, `cancer`, `cosmic`, `eml4`, `fusion`, `gene`, `ret`, `ros1`
114 62 C MIT License 2022-02-21 08:07:06 |
|141|[**biosustain/cameo**](https://github.com/biosustain/cameo)
cameo - computer aided metabolic engineering & optimization
113 42 Python Apache License 2.0 2022-11-07 14:54:19 |
|142|[**EBI-Metagenomics/emg-viral-pipeline**](https://github.com/EBI-Metagenomics/emg-viral-pipeline)
VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
`cwl`, `nextflow`, `pipeline`, `viruses`, `workflow`
109 13 Python Apache License 2.0 2024-05-08 20:10:03 |
|142|[**OpenGene/gencore**](https://github.com/OpenGene/gencore)
Generate duplex/single consensus reads to reduce sequencing noise and remove duplicates
`bioinformatics`, `consensus`, `deduplication`, `deep-sequencing`, `duplex`, `duplex-sequencing`, `duplication`, `ngs`, `sequencing`, `sequencing-error`, `sequencing-noise`, `somatic`
109 32 C++ MIT License 2023-10-27 06:19:21 |
|142|[**OpenGene/fastv**](https://github.com/OpenGene/fastv)
An ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data. This tool can be used to detect viral infectious diseases, like COVID-19.
`2019-ncov`, `bioinformatics`, `coronavirus`, `covid`, `covid-19`, `hcov`, `meta-genomics`, `microbial-sequences`, `mngs`, `ngs`, `sars-cov-2`, `sequencing`, `viral`, `viral-infectious-diseases`, `virus`, `visualization`
109 24 C++ MIT License 2023-10-27 06:16:38 |
|143|[**lh3/yak**](https://github.com/lh3/yak)
Yet another k-mer analyzer
`bioinformatics`, `k-mer`
108 8 C MIT License 2024-04-01 21:39:44 |
|143|[**lh3/fermikit**](https://github.com/lh3/fermikit)
De novo assembly based variant calling pipeline for Illumina short reads
`bioinformatics`, `denovo-assembly`, `genomics`, `variant-calling`
108 23 TeX Other 2020-11-30 22:57:56 |
|144|[**Merck/Halyard**](https://github.com/Merck/Halyard)
Halyard is an extremely horizontally scalable Triplestore with support for Named Graphs, designed for integration of extremely large Semantic Data Models, and for storage and SPARQL 1.1 querying of snapshots of the whole Linked Data universe.
107 17 Java Apache License 2.0 2023-01-23 16:59:32 |
|144|[**ome/ngff**](https://github.com/ome/ngff)
Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
`bioimaging`, `cloud`, `data-science`, `file-formats`, `spec`
107 38 Bikeshed Other 2024-06-02 06:26:47 |
|144|[**soedinglab/CCMpred**](https://github.com/soedinglab/CCMpred)
Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
107 25 C GNU Affero General Public License v3.0 2023-11-08 07:51:35 |
|145|[**lh3/minimap**](https://github.com/lh3/minimap)
This repo is DEPRECATED. Please use minimap2, the successor of minimap.
106 29 C MIT License 2017-09-20 14:15:02 |
|146|[**chao1224/Geom3D**](https://github.com/chao1224/Geom3D)
Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023
`3d`, `3d-structures`, `ai4science`, `biology`, `chemistry`, `crystals`, `drugs`, `equivariance`, `geometry`, `group`, `invariance`, `material`, `molecules`, `physics`, `proteins`, `symmetry`
105 9 2 Python MIT license 2024-06-05 03:18:58 |
|147|[**phuse-org/phuse-scripts**](https://github.com/phuse-org/phuse-scripts)
Deliver standard industry analyses, built upon CDISC standards for analysis data
104 88 SAS MIT License 2023-08-01 15:21:20 |
|147|[**chembl/FPSim2**](https://github.com/chembl/FPSim2)
Simple package for fast molecular similarity searches
`cheminformatics`, `chemistry`, `gpu`, `python`, `similarity-search`
104 17 Python MIT License 2024-02-15 11:13:05 |
|148|[**bayer-science-for-a-better-life/Img2Mol**](https://github.com/bayer-science-for-a-better-life/Img2Mol)
103 41 Jupyter Notebook Apache License 2.0 2023-03-24 18:07:41 |
|149|[**Biogen-Inc/tidyCDISC**](https://github.com/Biogen-Inc/tidyCDISC)
Demo the app here: https://bit.ly/tidyCDISC_app
`pharma`, `r`, `rinpharma`, `rstats`
102 38 R GNU Affero General Public License v3.0 2023-09-22 15:18:20 |
|150|[**openpharma/mmrm**](https://github.com/openpharma/mmrm)
Mixed Models for Repeated Measures (MMRM) in R.
100 17 R Other 2024-06-03 18:02:15 |
|150|[**MolecularAI/DockStream**](https://github.com/MolecularAI/DockStream)
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design
`astrazeneca`, `chemoinformatics`, `denovo-design`, `jupyter-notebook`, `molecular-docking`, `reinforcement-learning`
100 30 Python Apache License 2.0 2023-03-16 07:07:10 |
|150|[**Bayer-Group/paquo**](https://github.com/Bayer-Group/paquo)
PAthological QUpath Obsession - QuPath and Python conversations
`digital-pathology`, `python`, `qupath`
100 16 Python GNU General Public License v3.0 2024-06-02 18:21:27 |
|151|[**genentech/gReLU**](https://github.com/genentech/gReLU)
gReLU is a Python library to train, interpret, and apply deep learning models to DNA sequences.
99 5 Python MIT License 2024-06-07 20:29:13 |
|152|[**lh3/hickit**](https://github.com/lh3/hickit)
TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C
`bioinformatics`, `genomics`, `hi-c`
98 11 C 2021-02-04 01:47:43 |
|153|[**aqlaboratory/rgn2**](https://github.com/aqlaboratory/rgn2)
97 28 Python 2023-11-28 17:16:23 |
|154|[**lh3/bgt**](https://github.com/lh3/bgt)
Flexible genotype queries among 30,000+ whole-genome samples
`bioinformatics`, `genomics`
96 10 C MIT License 2019-09-04 19:43:27 |
|154|[**scverse/rapids_singlecell**](https://github.com/scverse/rapids_singlecell)
Rapids_singlecell: A GPU-accelerated tool for scRNA analysis. Offers seamless scverse compatibility for efficient single-cell data processing and analysis.
`anndata`, `bioinformatics`, `gpu`, `scverse`, `single-cell`
96 18 Python MIT License 2024-06-03 18:07:06 |
|154|[**shenwei356/bio_scripts**](https://github.com/shenwei356/bio_scripts)
Practical, reusable scripts for bioinformatics
`bioinformatics`, `perl`, `python`, `reusable`, `script`
96 65 Perl MIT License 2019-02-12 13:21:47 |
|155|[**EBISPOT/OLS**](https://github.com/EBISPOT/OLS)
Ontology Lookup Service from SPOT at EBI
`java`, `neo4j`, `obofoundry`, `owl`, `owl-api`
95 40 JavaScript Apache License 2.0 2023-04-28 20:09:19 |
|156|[**Sanofi-Public/CodonBERT**](https://github.com/Sanofi-Public/CodonBERT)
Repository for mRNA Paper and CodonBERT publication.
94 14 Python Other 2024-05-03 19:24:06 |
|156|[**OpenGene/scrnapip**](https://github.com/OpenGene/scrnapip)
A Systematic and Dynamic Pipeline for Single-Cell RNA Sequencing Analysis
94 14 HTML 2023-10-16 01:24:06 |
|157|[**EBI-Metagenomics/genomes-catalogue-pipeline**](https://github.com/EBI-Metagenomics/genomes-catalogue-pipeline)
MGnify genome analysis pipeline
93 21 Python Other 2024-06-06 09:44:21 |
|158|[**samtools/tabix**](https://github.com/samtools/tabix)
Note: tabix and bgzip binaries are now part of the HTSlib project.
92 40 C 2021-08-03 14:29:38 |
|158|[**shenwei356/BlackheartedHospital**](https://github.com/shenwei356/BlackheartedHospital) (forked from: [open-power-workgroup/Hospital](https://github.com/open-power-workgroup/Hospital))
A list of Putian-affiliated hospitals circulated online; updates welcome
92 15 2016-05-03 07:06:09 |
|159|[**AbSciBio/unlocking-de-novo-antibody-design**](https://github.com/AbSciBio/unlocking-de-novo-antibody-design)
91 14 Other 2024-01-09 17:36:19 |
|159|[**schrodinger/gpusimilarity**](https://github.com/schrodinger/gpusimilarity)
A Cuda/Thrust implementation of fingerprint similarity searching
`cheminformatics`, `chemistry`, `gpu`, `similarity-analysis`
91 26 C++ BSD 3-Clause "New" or "Revised" License 2024-01-24 19:08:08 |
|159|[**lh3/dipcall**](https://github.com/lh3/dipcall)
Reference-based variant calling pipeline for a pair of phased haplotype assemblies
91 9 JavaScript MIT License 2021-06-06 20:36:10 |
|160|[**Bioconductor/CSAMA**](https://github.com/Bioconductor/CSAMA)
Course material for CSAMA: Statistical Data Analysis for Genome Scale Biology
89 45 HTML 2024-06-06 12:04:08 |
|160|[**AstraZeneca/onto_merger**](https://github.com/AstraZeneca/onto_merger)
OntoMerger is an ontology alignment library for deduplicating knowledge graph nodes that represent the same domain.
`algorithm`, `alignment`, `biological-networks`, `biology`, `graph`, `kg`, `knowledge`, `knowledge-graph`, `mapping`, `ontology`, `ontology-alignment`
89 5 HTML Apache License 2.0 2024-01-11 19:22:08 |
|160|[**hoelzer-lab/rnaflow**](https://github.com/hoelzer-lab/rnaflow)
A simple RNA-Seq differential gene expression pipeline using Nextflow
89 19 HTML GNU General Public License v3.0 2024-02-26 20:45:37 |
|160|[**shenwei356/perfect-bioinformatic-tools**](https://github.com/shenwei356/perfect-bioinformatic-tools)
What should perfect bioinformatic tools be like?
`bioinformatics`, `cli`, `usability`
89 1 Creative Commons Zero v1.0 Universal 2024-03-19 10:22:54 |
|161|[**Sanofi-IADC/whispr**](https://github.com/Sanofi-IADC/whispr)
Open source event, comment and alert processing hub created by Sanofi IADC
88 8 TypeScript MIT License 2024-06-04 12:01:03 |
|161|[**calico/scBasset**](https://github.com/calico/scBasset)
Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks.
88 11 Jupyter Notebook Apache License 2.0 2024-02-08 19:20:16 |
|161|[**shenwei356/bio**](https://github.com/shenwei356/bio)
A lightweight and high-performance bioinformatics package in Golang
`bioinformatics`, `golang`, `minimizer`, `package`, `scaled-minhash`, `sequence`, `syncmer`, `taxdump`, `taxonomy`
88 9 7 Go MIT license 2024-03-11 09:41:44 |
|162|[**owkin/HE2RNA_code**](https://github.com/owkin/HE2RNA_code)
Train a model to predict gene expression from histology slides.
87 39 Python GNU General Public License v3.0 2022-07-06 20:53:24 |
|162|[**scverse/pertpy**](https://github.com/scverse/pertpy)
Perturbation Analysis in the scverse ecosystem.
`perturbation`, `scverse`, `single-cell`
87 19 Python MIT License 2024-06-08 08:07:34 |

[Next page](Results/README-2.md)

[^1]: This page was generated with the [topgh](https://github.com/HubTou/topgh) open source software on 2024-06-09