https://github.com/Sanofi-Public/indication-finding

Repository for "Comparing Methods for Identifying Novel Indications of Drugs using Electronic Health Records" publication
https://github.com/Sanofi-Public/indication-finding

Last synced: 4 months ago
JSON representation

Repository for "Comparing Methods for Identifying Novel Indications of Drugs using Electronic Health Records" publication

Host: GitHub
URL: https://github.com/Sanofi-Public/indication-finding
Owner: Sanofi-Public
License: other
Created: 2024-01-26T16:20:24.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-26T20:59:59.000Z (over 1 year ago)
Last Synced: 2025-01-11T04:35:57.540Z (4 months ago)
Language: Python
Homepage:
Size: 36.1 KB
Stars: 3
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

top-pharma-fr - **Sanofi-Public/indication-finding** - 01-26 20:59:59 | (Ranked by starred repositories)

README

# Comparing Methods for Identifying Novel Indications of Drugs using Electronic Health Records

Computational drug re-purposing has received a lot of attention in the past decade. However, methods developed to date focused on established compounds for which information on both, successfully treated patients and chemical and genomic impact, were known. Such information does not always exist for first-in-class drugs under development. To identify indications (diseases) for drugs under development we extended and tested several unsupervised computational methods that utilize Electronic Health Record (EHR) data. We tested the methods on known drugs with multiple indications and show that a variant of matrix factorization leads to the best performance for first-in-line drugs.

# Install
```
python3 -m venv indication_example
source indication_example/bin/activate
python3 -m pip install -e .
```

# Login
```
source indication_example/bin/activate
```

# Training matrix factorization
```
spark-submit --master [local[2],or other config] src/train_models/mf_factory.py --input_data s3a://path/to-your-data
```

# Input data format
The data is assumed to be stored on s3 in parquet files with the following schema: person_id (int), feature_name (string).
The feature names are the events that happened to each person in their medical records.

# Training config
The training config is set (and can be changed) in config/training_mf.yaml

# Output data format
The output is saved in the output_location, and contains embeddings for all features, including drug indications.

## Contacts
For any inquiries please raise a git issue and we will try to follow-up in a timely manner.

# License
License Notice: Permission is hereby granted, free of charge, for academic research purposes only and for non-commercial uses only, to any person from academic research or non-profit organizations obtaining a copy of this software and associated documentation files (the "Software"), to use, copy, modify, or merge the Software, subject to the following conditions: this permission notice shall be included in all copies of the Software or of substantial portions of the Software. For purposes of this license, “non-commercial use” excludes uses foreseeably resulting in a commercial benefit. All other rights are reserved. The Software is provided “as is”, without warranty of any kind, express or implied, including the warranties of noninfringement.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/Sanofi-Public/indication-finding

Awesome Lists containing this project

README