https://github.com/ZhangYiqun018/GENOME

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/ZhangYiqun018/GENOME
Owner: ZhangYiqun018
License: mit
Created: 2025-01-31T02:21:51.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-03T10:14:28.000Z (over 1 year ago)
Last Synced: 2025-03-03T11:26:57.949Z (over 1 year ago)
Language: Python
Size: 558 KB
Stars: 2
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

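StarryDivineSky - ZhangYiqun018/GENOME - …off-target effects of the Cas9 system. The project is implemented in PyTorch and provides a user-friendly interface for researchers and biotechnologists. The GENOME model is distinguished by its efficient prediction capability and its understanding of complex genomic sequences. It works by using a Transformer model to learn the complex relationship between CRISPR-Cas9 guide RNAs and genomic sequences, thereby predicting potential off-target sites. The project provides pre-trained models, and users can also fine-tune them on their own data. The GENOME project includes code for training, evaluation and prediction, along with detailed documentation and examples. It offers a powerful tool for assessing the safety of genome editing, helping to reduce off-target effects and improve editing precision. The project address is ZhangYiqun018/GENOME. (Genes / Resource transfer & download)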

README

# GENOME(+)

> **GENOME(+)** is a framework for population-based evolution of Large Language Models (LLMs), inspired by natural evolution. Starting with a population of parent LLMs, the framework enables model evolution through five key operations:

### Key Operations

- **Crossover**: Merges weights of different parent LLMs to create offspring

- **Mutation**: Introduces controlled random changes to foster diversity

- **Selection**: Prioritizes high-performing models

- **Succession**: Transfers learned experience from parents to offspring

- **Ensemble**: Combines the strengths of multiple evolved models for robust predictions
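To make the operations above concrete, here is a toy sketch on plain float vectors. It is not the GENOME(+) implementation: it assumes each model is a flat list of floats standing in for its (e.g. LoRA) weights, and it swaps in simple textbook choices (linear interpolation for crossover, Gaussian noise for mutation, top-k truncation for selection, majority voting for ensembling). Succession is left out because the README does not describe its mechanism.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Weights = List[float]  # toy stand-in for a model's flattened weights (assumption)


def crossover(a: Sequence[float], b: Sequence[float], alpha: float = 0.5) -> Weights:
    """Offspring weights as a linear interpolation of two parents."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def mutate(w: Sequence[float], rng: random.Random, rate: float = 0.1, scale: float = 0.01) -> Weights:
    """Add Gaussian noise to a random subset (about `rate`) of the coordinates."""
    return [x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in w]


def select(population: Sequence[Weights], fitness: Callable[[Weights], float], k: int) -> List[Weights]:
    """Keep the k individuals with the highest fitness."""
    return sorted(population, key=fitness, reverse=True)[:k]


def ensemble(predictions: Sequence[str]) -> str:
    """Majority vote over the answers produced by several models."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)

    def fitness(w: Weights) -> float:  # hypothetical toy objective: stay close to 0.5
        return -sum((x - 0.5) ** 2 for x in w)

    population = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    for _ in range(20):  # one generation per iteration
        parents = select(population, fitness, k=4)
        children = [mutate(crossover(*rng.sample(parents, 2)), rng) for _ in range(4)]
        population = parents + children  # elitism: parents survive alongside offspring
    print(max(fitness(w) for w in population))
```

Keeping the selected parents in the next generation means the best fitness never decreases, and no gradients are needed at any step: only fitness evaluations.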

---

🌟 **Key Features**:

- **Rapid adaptation** with only 200 samples per new task

- **No gradients** required for evolution

- **Up to 54.8% accuracy gains** over the initial population (on the DROP dataset)

- **Effective scaling** with populations up to 40 LLMs

- **Zero-shot generalization** to unseen tasks

- **Runs on a single RTX 4090 GPU** (24 GB memory)

![GENOME+ Architecture](assets/genome.png)

## 📦 Installation

1. Clone the repository:

```bash
git clone https://github.com/ZhangYiqun018/GENOME.git
cd GENOME
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

## 🚀 Usage

### GENOME

```bash
python run_genome.py \
    --tasks mmlu gsm8k arc_c \
    --task_weights 0.4 0.3 0.3 \
    --model_path meta-llama/Meta-Llama-3-8B-Instruct \
    --lora_dir lora_adapters \
    --combine_method ties \
    --population_size 30 \
    --max_iter 50
```
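`--combine_method ties` presumably selects TIES-Merging for combining adapter weights; the README does not say how the repository applies it. As a hedged illustration of the published procedure (trim each task vector to its largest-magnitude entries, elect a sign per parameter, then average only the values that agree with that sign), here it is on plain float lists. The names `density` and `scale` are illustrative parameters, not flags of `run_genome.py`.

```python
from typing import List, Sequence


def trim(tau: Sequence[float], density: float) -> List[float]:
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, int(round(density * len(tau))))
    ranked = sorted(range(len(tau)), key=lambda i: abs(tau[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(tau)]


def ties_merge(task_vectors: Sequence[Sequence[float]], density: float = 0.2) -> List[float]:
    """TIES-style merge of task vectors (fine-tuned weights minus base weights)."""
    trimmed = [trim(t, density) for t in task_vectors]
    merged: List[float] = []
    for values in zip(*trimmed):  # one parameter position at a time
        total = sum(values)
        if total == 0.0:  # no elected sign: leave this parameter unchanged
            merged.append(0.0)
            continue
        sign = 1.0 if total > 0 else -1.0  # sign with the larger total magnitude
        agreeing = [v for v in values if v * sign > 0]  # drop zeros and conflicting signs
        merged.append(sum(agreeing) / len(agreeing))  # disjoint mean
    return merged


def apply_merge(base: Sequence[float], merged: Sequence[float], scale: float = 1.0) -> List[float]:
    """Add the scaled merged task vector back onto the base weights."""
    return [b + scale * m for b, m in zip(base, merged)]
```

Averaging only the sign-agreeing entries is what distinguishes TIES from a plain weight average: a parameter pushed in opposite directions by two parents is not cancelled toward zero by the minority side.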

## 📁 Project Structure

```
GENOME/
├── src/                   # Source code
│   ├── genome/            # Genome optimization algorithms
│   ├── evaluate/          # Task evaluators (MMLU, GSM8K, etc.)
│   ├── base/              # Base classes and configurations
│   └── analysis/          # Analysis and visualization tools
├── scripts/               # Utility scripts
├── config/                # Configuration files
├── datas/                 # Datasets
└── run_genome.py          # Genome algorithm entry point
```

## 🔧 Extension Guide

### Adding New Evaluators

1. Create a new evaluator class in `src/evaluate/new_task_evaluator.py` (the module imported in step 2):

```python
from typing import TYPE_CHECKING

from src.evaluate.eval import Evaluator, Method, Split

if TYPE_CHECKING:  # imported only for type hints
    from openai import OpenAI
    from vllm import LLM


class NewTaskEvaluator(Evaluator):
    def __init__(self):
        super().__init__()
        self.data = {}  # split name -> list of examples

    def load_data(self, split: str):
        """Load dataset for specific split

        Args:
            split: One of 'train', 'valid', 'test', 'full'
        """
        data_path = f"datas/new_task/{split}.jsonl"
        self.data[split] = self.load_jsonl(data_path)

    def api_evaluate(self, client: "OpenAI", **kwargs) -> float:
        """Evaluate using OpenAI API interface

        Args:
            client: OpenAI client instance
            **kwargs: Additional parameters

        Returns:
            float: Evaluation score
        """
        # Implement API-based evaluation and compute `score`
        score = 0.0  # placeholder until the evaluation logic is implemented
        return score

    def local_evaluate(self, model: "LLM", **kwargs) -> float:
        """Evaluate using local vLLM model

        Args:
            model: vLLM model instance
            **kwargs: Additional parameters

        Returns:
            float: Evaluation score
        """
        # Implement local model evaluation and compute `score`
        score = 0.0  # placeholder until the evaluation logic is implemented
        return score
```

2. Register the new evaluator in `src/evaluate/factory.py`:

```python
from enum import Enum

from .new_task_evaluator import NewTaskEvaluator


class Benchmark(Enum):
    # ... existing benchmarks ...
    NEW_TASK = "new_task"  # Add new benchmark


class EvaluatorFactory:
    def get_evaluator(self, task: str):
        if isinstance(task, str):
            task = Benchmark(task.lower())

        if not isinstance(task, Benchmark):
            raise TypeError(f"Task must be a string or Benchmark enum, got {type(task)}")

        # ... existing evaluators ...
        elif task == Benchmark.NEW_TASK:
            return NewTaskEvaluator()
        else:
            raise ValueError(f"Evaluator for task {task} not found.")
```
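The evaluator skeleton leaves the scoring itself open: both `api_evaluate` and `local_evaluate` only need to return a float. For QA-style tasks, one common choice is exact-match accuracy. The helper below is a hypothetical, framework-independent sketch that either hook could call, with `predict` wrapping the OpenAI client or the vLLM model; the `question`/`answer` field names are assumptions about the JSONL schema, not something the README specifies.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lower-case and strip surrounding whitespace before comparison."""
    return text.strip().lower()


def exact_match_accuracy(
    records: List[Dict],
    predict: Callable[[str], str],
    question_key: str = "question",  # assumed JSONL field names
    answer_key: str = "answer",
) -> float:
    """Fraction of records whose predicted answer equals the reference answer."""
    if not records:
        return 0.0
    correct = sum(
        normalize(predict(r[question_key])) == normalize(str(r[answer_key]))
        for r in records
    )
    return correct / len(records)
```

Tasks with free-form reasoning (such as GSM8K) usually need an answer-extraction step before comparison; that step would go inside `predict` or `normalize`.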

### Adding New Methods

1. Create a new method configuration class in `src/base`:

```python
from src.base.base_config import BaseConfig


class NewMethodConfig(BaseConfig):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Add method-specific configuration parameters

    def validate(self):
        """Validate configuration parameters"""
        super().validate()
        # Add method-specific validation
```

2. Create a new method class in `src` (e.g. `src/new_method.py`, the module the run script in step 3 imports):

```python
from src.base.base_method import BaseMethod

# Hypothetical module path: import NewMethodConfig from wherever step 1 defined it
from src.base.new_method_config import NewMethodConfig


class NewMethod(BaseMethod):
    def __init__(self, config: NewMethodConfig):
        self.config = config
        self.config.validate()
        # Initialize method-specific properties

    def search(self):
        """Implement search logic"""
        # Implement core optimization method logic
```

3. Create a run script `run_new_method.py`:

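```python
import argparse

from src.new_method import NewMethod, NewMethodConfig


def parse_args():
    parser = argparse.ArgumentParser()
    # Add command line arguments; these match the usage example below
    parser.add_argument("--model_path", type=str, required=True)
    parser.add_argument("--lora_dir", type=str, required=True)
    parser.add_argument("--task", type=str, required=True)
    # ... method-specific arguments ...
    return parser.parse_args()


def main():
    args = parse_args()
    config = NewMethodConfig(**vars(args))
    method = NewMethod(config)
    method.search()


if __name__ == "__main__":
    main()
```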
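As a hypothetical end-to-end illustration of the three steps (config with `validate`, method class with `search`, driver), the self-contained sketch below implements a deliberately simple gradient-free method: a (1+1) hill climber over a weight vector. It drops `BaseConfig`/`BaseMethod`, whose interfaces the README does not show, and its `search` takes the starting weights as an argument; all class and parameter names here are assumptions, and this is not GENOME(+)'s algorithm.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HillClimbConfig:
    """Hypothetical config, mirroring NewMethodConfig without BaseConfig."""
    max_iter: int = 50
    scale: float = 0.1
    seed: int = 0

    def validate(self) -> None:
        if self.max_iter < 0:
            raise ValueError("max_iter must be non-negative")
        if self.scale <= 0:
            raise ValueError("scale must be positive")


class HillClimb:
    """Hypothetical method: keep a perturbed candidate only if it scores higher."""

    def __init__(self, config: HillClimbConfig, fitness: Callable[[List[float]], float]):
        self.config = config
        self.config.validate()
        self.fitness = fitness  # e.g. a weighted score from the task evaluators

    def search(self, init: List[float]) -> Tuple[List[float], float]:
        rng = random.Random(self.config.seed)
        best, best_score = list(init), self.fitness(init)
        for _ in range(self.config.max_iter):
            # Perturb every coordinate with Gaussian noise
            candidate = [x + rng.gauss(0.0, self.config.scale) for x in best]
            score = self.fitness(candidate)
            if score > best_score:  # accept only strict improvements
                best, best_score = candidate, score
        return best, best_score


if __name__ == "__main__":
    def toy_fitness(w: List[float]) -> float:  # hypothetical objective
        return -sum((x - 1.0) ** 2 for x in w)

    best, score = HillClimb(HillClimbConfig(max_iter=200), toy_fitness).search([0.0, 0.0])
    print(best, score)
```

Validating in the constructor, as the README's `NewMethod` does, makes a bad configuration fail before any expensive model evaluation starts.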

### Using New Components

1. Using the new evaluator:

```bash
python run_genome.py \
    --tasks new_task \
    --task_weights 1.0 \
    # ... other parameters
```

2. Using the new optimization method:

```bash
python run_new_method.py \
    --model_path meta-llama/Meta-Llama-3-8B-Instruct \
    --lora_dir lora_adapters \
    --task mmlu \
    # ... method-specific parameters
```
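`--tasks` and `--task_weights` are passed as parallel lists. A plausible reading, assumed here rather than taken from the repository's code, is that per-task scores are combined into a single fitness value by a weighted sum; with one task and weight `1.0`, as in the `new_task` example, the fitness is simply that task's score.

```python
from typing import Dict, Sequence


def weighted_fitness(scores: Dict[str, float], tasks: Sequence[str], weights: Sequence[float]) -> float:
    """Combine per-task scores into one fitness value as a weighted sum (assumed scheme)."""
    if len(tasks) != len(weights):
        raise ValueError("need exactly one weight per task")
    return sum(w * scores[t] for t, w in zip(tasks, weights))
```

For the earlier GENOME example, `weighted_fitness(scores, ["mmlu", "gsm8k", "arc_c"], [0.4, 0.3, 0.3])` would weight MMLU slightly more than the other two tasks.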

## 📝 Notes

- Ensure sufficient GPU resources are available for model deployment

- vLLM is recommended for efficient inference

- Performance can be further improved by tuning parameters

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
