https://github.com/brsynth/icfree-ml
Design of experiments (DoE) and machine learning packages for the iCFree project
cell-free design-of-experiments latin-hypercube-sampling machine-learning
- Host: GitHub
- URL: https://github.com/brsynth/icfree-ml
- Owner: brsynth
- License: mit
- Created: 2022-01-13T17:39:28.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-02-20T11:41:32.000Z (over 1 year ago)
- Last Synced: 2025-09-09T20:14:08.672Z (9 months ago)
- Topics: cell-free, design-of-experiments, latin-hypercube-sampling, machine-learning
- Language: Python
- Homepage:
- Size: 5.48 MB
- Stars: 5
- Watchers: 0
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# iCFree
iCFree is a Python program that automates generating and running a Snakemake workflow for sampling and for preparing laboratory experiment instructions. It includes components for generating samples, designing plates, and writing the instructions for handling those plates.
## Table of Contents
- [iCFree](#icfree)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
  - [Usage](#usage)
    - [Basic Command](#basic-command)
    - [Components](#components)
      - [Sampler](#sampler)
        - [Usage](#usage-1)
        - [Arguments](#arguments)
      - [Plate Designer](#plate-designer)
        - [Usage](#usage-2)
        - [Options](#options)
      - [Instructor](#instructor)
        - [Usage](#usage-3)
        - [Options](#options-1)
      - [Learner](#learner)
        - [Usage](#usage-4)
        - [Options](#options-2)
    - [Example](#example)
  - [License](#license)
  - [Authors](#authors)
## Installation
1. **Install Conda:**
   - Download the installer for your operating system from the [Conda Installation page](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
   - Follow the instructions on that page to install Conda. On Windows, download the installer and run it. On macOS and Linux, run the downloaded installer script; for example, on Linux:
     ```bash
     bash ~/Downloads/Miniconda3-latest-Linux-x86_64.sh
     ```
   - Follow the installer prompts to complete the installation.
2. **Install iCFree from conda-forge:**
   ```bash
   conda install -c conda-forge icfree
   ```
## Usage
The main entry point of the program is the `__main__.py` file. You can run the program from the command line by providing the necessary arguments for each step of the workflow.
### Basic Command
```bash
python -m icfree --sampler_input_filename --sampler_nb_samples --sampler_seed --sampler_output_filename --plate_designer_input_filename --plate_designer_sample_volume --plate_designer_default_dead_volume --plate_designer_num_replicates --plate_designer_well_capacity --plate_designer_start_well_src_plt --plate_designer_start_well_dst_plt --plate_generat...
```
### Components
#### Sampler
The sampler.py script generates Latin Hypercube Samples (LHS) for given components.
##### Usage
```bash
python icfree/sampler.py <input_file> <output_file> <num_samples> [--step <step>] [--seed <seed>]
```
##### Arguments
- input_file: Input file path with components and their max values.
- output_file: Output CSV file path for the samples.
- num_samples: Number of samples to generate.
- --step: Step size for creating discrete ranges (default: 2.5).
- --seed: Seed for random number generation for reproducibility (optional).
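To make the idea behind these arguments concrete, here is a minimal, standard-library-only sketch of Latin hypercube sampling followed by rounding to a step size. It is not the code in `sampler.py`: the function name `latin_hypercube`, the dictionary layout of `max_values`, the component names, and the rule of snapping each value to the nearest multiple of `step` are all assumptions made for illustration.

```python
import random


def latin_hypercube(max_values, num_samples, step=2.5, seed=None):
    """Draw Latin hypercube samples, then snap each value to a multiple of `step`.

    max_values: dict mapping component name -> maximum value (assumed layout).
    """
    rng = random.Random(seed)
    columns = {}
    for name, max_value in max_values.items():
        # One stratum per sample; shuffling decides which sample gets which stratum.
        strata = list(range(num_samples))
        rng.shuffle(strata)
        column = []
        for stratum in strata:
            # Uniform draw inside the stratum, as a fraction of the unit interval.
            u = (stratum + rng.random()) / num_samples
            # Scale to [0, max_value], then round to the nearest step (assumption).
            snapped = round(u * max_value / step) * step
            column.append(min(snapped, max_value))
        columns[name] = column
    # Transpose the per-component columns into one dict per sample.
    return [
        {name: columns[name][i] for name in max_values}
        for i in range(num_samples)
    ]


# Hypothetical component names and maximum values.
samples = latin_hypercube(
    {"component_a": 10.0, "component_b": 50.0}, num_samples=4, seed=42
)
for row in samples:
    print(row)
```

Passing the same `seed` reproduces the same samples. Note that rounding to a coarse `step` can map two strata to the same discrete value, so the one-sample-per-stratum property holds only before snapping.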
#### Plate Designer
The plate_designer.py script generates plates based on the sampled data.
##### Usage
```bash
python icfree/plate_designer.py [options]
```
##### Options
- --default_dead_volume: Default dead volume.
- --dead_volumes: Dead volumes for specific wells.
- --num_replicates: Number of replicates.
- --well_capacity: Well capacity.
- --start_well_src_plt: Starting well for the source plate.
- --start_well_dst_plt: Starting well for the destination plate.
- --extra_wells: Extra wells to add to the plate.
- --output_folder: Folder to save the output files.
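As a rough illustration of how dead volume, well capacity, and replicates interact, the sketch below estimates how many source wells a single component would need. The helper `source_wells_needed` and its rule (usable volume per well equals capacity minus dead volume) are assumptions for illustration, not the logic of `plate_designer.py`, and the volume unit is whatever unit the inputs share.

```python
import math


def source_wells_needed(sample_volumes, num_replicates, well_capacity, dead_volume):
    """Estimate the number of source wells one component needs (illustrative rule).

    sample_volumes: volume of this component required by each sample.
    """
    usable = well_capacity - dead_volume
    if usable <= 0:
        raise ValueError("dead volume must be smaller than well capacity")
    # Every sample is dispensed once per replicate.
    total = sum(sample_volumes) * num_replicates
    # Each well contributes only its usable volume; round up to whole wells.
    return math.ceil(total / usable)


# Five samples needing 100 units each, 3 replicates, 200-unit wells, 2 units dead volume.
print(source_wells_needed([100] * 5, num_replicates=3, well_capacity=200, dead_volume=2))
```

Under this rule, a larger dead volume reduces the usable volume per well and therefore increases the number of source wells required.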
#### Instructor
The instructor.py script generates instructions for handling the generated plates.
##### Usage
```bash
python icfree/instructor.py [options]
```
##### Options
- --max_transfer_volume: Maximum transfer volume.
- --split_threshold: Threshold for splitting components.
- --source_plate_type: Type of the source plate.
- --split_components: Components to split.
- --dispense_order: Comma-separated list of component names specifying the dispensing order.
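Two of these options lend themselves to a short sketch: capping each transfer at a maximum volume and applying a comma-separated dispensing order. The helpers `split_transfer` and `order_components` below are hypothetical and only show one plausible interpretation; the actual behavior of `instructor.py`, including how `--split_threshold` and `--split_components` are applied, may differ.

```python
def split_transfer(volume, max_transfer_volume):
    """Split one transfer into chunks no larger than max_transfer_volume."""
    if max_transfer_volume <= 0:
        raise ValueError("max_transfer_volume must be positive")
    chunks = []
    remaining = volume
    # Emit full-size chunks while more than one chunk's worth remains.
    while remaining > max_transfer_volume:
        chunks.append(max_transfer_volume)
        remaining -= max_transfer_volume
    # The last chunk carries whatever is left, if anything.
    if remaining > 0:
        chunks.append(remaining)
    return chunks


def order_components(components, dispense_order):
    """Put components named in dispense_order first; keep the rest in their original order."""
    preferred = [name.strip() for name in dispense_order.split(",") if name.strip()]
    listed = [name for name in preferred if name in components]
    rest = [name for name in components if name not in listed]
    return listed + rest


print(split_transfer(2500, 1000))                     # [1000, 1000, 500]
print(order_components(["a", "b", "c"], "c, a"))      # ['c', 'a', 'b']
```

Integer volumes are used here to avoid floating-point remainders; with fractional volumes, a real implementation would need to decide how to round the final chunk.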
#### Learner
The Learner module carries out an active learning process to both train the model and explore the space of possible cell-free combinations.
##### Usage
```bash
python -m icfree.learner [options]
```
##### Options
- --name_list: Comma-separated list of the column names that contain the labels (y). These columns are separated from the remaining columns, which are used as features (X). (Default: Yield1,Yield2,Yield3,Yield4,Yield5)
- --test: Flag that enables model validation. It is not required inside the active learning loop; if it is not set, the validation step is skipped.
- --nb_rep NB_REP: Number of test repetitions used to validate the model behavior. The data is split randomly: 80% for training and 20% for testing. (Default: 100)
- --flatten: Flag that controls how Y data from repeated measurements is handled. If set, each repetition of an experiment is treated as an independent data point, so identical X values with different y outputs are all modeled. Otherwise, y is averaged across repetitions and the model is trained on the averages only.
- --seed SEED: Random seed used for reproducibility in random operations. (Default: 85)
- --nb_new_data_predict: Number of new data points sampled from all possible combinations. (Default: 1000)
- --nb_new_data: Number of new data points selected from the sampled ones. These are the data points to be labeled after the active learning loops. `nb_new_data_predict` must be greater than `nb_new_data` to be meaningful. (Default: 50)
- --parameter_step: Step size used to decrement the maximum predefined concentration sequentially. For example, if the maximum concentration is `max`, the sequence of concentrations is `max - 1 * parameter_step`, `max - 2 * parameter_step`, `max - 3 * parameter_step`, and so on. Each concentration is a candidate for experimental testing. Smaller steps result in more possible combinations to sample. (Default: 10)
- --n_group: Parameter of the cluster margin algorithm: the number of groups into which the generated data is clustered. (Default: 15)
- --km: Parameter of the cluster margin algorithm: the number of data points kept in the first selection. Ensure `nb_new_data_predict > km > ks`, which is consistent with the defaults. (Default: 50)
- --ks: Parameter of the cluster margin algorithm: the number of data points kept in the second selection. It plays a role similar to `nb_new_data`. (Default: 20)
- --plot: Flag that enables generation of all analysis plots.
- --save_plot: Flag that saves all generated plots.
- --verbose: Flag that prints all messages to the console.
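The `--parameter_step` and `--flatten` descriptions above can be sketched in a few lines. Both helpers are hypothetical and are not taken from the Learner module. In particular, stopping the concentration sequence at zero and representing the data as `(x, [y1, y2, ...])` pairs are assumptions made for illustration.

```python
from statistics import mean


def candidate_concentrations(max_concentration, parameter_step):
    """Return max - 1*step, max - 2*step, ... down to zero (lower bound is an assumption)."""
    if parameter_step <= 0:
        raise ValueError("parameter_step must be positive")
    values = []
    k = 1
    while max_concentration - k * parameter_step >= 0:
        values.append(max_concentration - k * parameter_step)
        k += 1
    return values


def prepare_targets(rows, flatten):
    """rows: list of (x, [y1, y2, ...]) pairs, one list entry per repetition."""
    if flatten:
        # Each repetition becomes its own (x, y) training point.
        return [(x, y) for x, ys in rows for y in ys]
    # Otherwise keep one point per experiment, using the mean over repetitions.
    return [(x, mean(ys)) for x, ys in rows]


print(candidate_concentrations(50, 10))                    # [40, 30, 20, 10, 0]
print(prepare_targets([((1, 2), [1.0, 3.0])], flatten=True))
print(prepare_targets([((1, 2), [1.0, 3.0])], flatten=False))
```

With flattening, the model sees the spread between repetitions; with averaging, it sees one smoothed target per experiment.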
### Example
Here is an example of how to run the program with sample data:
```bash
python -m icfree --sampler_input_filename data/components.csv --sampler_nb_samples 100 --sampler_seed 42 --sampler_output_filename results/samples.csv --plate_designer_input_filename results/samples.csv --plate_designer_sample_volume 10 --plate_designer_default_dead_volume 2 --plate_designer_num_replicates 3 --plate_designer_well_capacity 200 --plate_designer_start_well_src_plt A1 --plate_designer_start_well_dst_plt B1 --plate_designer_output_folder results/plates --instructor_max_transfer_volume...
```
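Because the command line is long, it can be convenient to assemble it from a dictionary. The helper `build_command` below is a hypothetical convenience wrapper, not part of iCFree. It uses only the first few flags visible in the example above, so a real run would also need the remaining options.

```python
import sys


def build_command(options):
    """Turn {'flag_name': value} into an argument list for `python -m icfree`."""
    argv = [sys.executable, "-m", "icfree"]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


# Only a subset of the flags from the example; the rest are omitted here.
options = {
    "sampler_input_filename": "data/components.csv",
    "sampler_nb_samples": 100,
    "sampler_seed": 42,
    "sampler_output_filename": "results/samples.csv",
}
print(" ".join(build_command(options)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` once all required options are present.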
## License
This project is licensed under the MIT License. See the LICENSE file for details.
## Authors
ChatGPT, OpenAI