Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jhylin/ml1-1_small_mols_in_chembl
Polars dataframe library and logistic regression in scikit-learn (update)
logistic-regression machine-learning parquet-files polars-dataframe scikit-learn
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/jhylin/ml1-1_small_mols_in_chembl
- Owner: jhylin
- License: mit
- Created: 2024-08-31T09:13:59.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-11-01T22:03:38.000Z (2 months ago)
- Last Synced: 2024-11-21T16:15:03.774Z (about 1 month ago)
- Topics: logistic-regression, machine-learning, parquet-files, polars-dataframe, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 325 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
This repository stores a series of Quarto markdown files (.qmd) for an updated version of the old post "Small molecules in ChEMBL database - Series 1.1 - Polars dataframe library and machine learning in scikit-learn". The old post will be updated and split into four smaller posts for ease of reading. Each post also has its own Jupyter notebook version saved in the folder named "Jupyter_notebooks". The current ETA for posting is around mid-October to mid-November (one post each week).
A quick overview of what each post will be about:
Post 1 - storing the small molecules data from ChEMBL in a compressed parquet file using the Polars dataframe library, as the original .csv file is about 660 MB for ChEMBL version 31
Post 2 - preprocessing the data with the Polars dataframe library prior to building a machine learning model
Post 3 - building a logistic regression model using scikit-learn and the Polars dataframe library
Post 4 - evaluating the logistic regression model using various metrics in scikit-learn