{"id":28423173,"url":"https://github.com/neuralmagic/research","last_synced_at":"2025-06-24T23:31:20.220Z","repository":{"id":278992138,"uuid":"840431677","full_name":"neuralmagic/research","owner":"neuralmagic","description":"Repository to enable research flows","archived":false,"fork":false,"pushed_at":"2025-05-31T01:48:47.000Z","size":428,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-31T06:43:17.856Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/neuralmagic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-09T17:33:11.000Z","updated_at":"2025-05-31T01:48:51.000Z","dependencies_parsed_at":"2025-02-23T03:22:58.387Z","dependency_job_id":"74974f00-28d6-437a-867a-1e1cea1252d6","html_url":"https://github.com/neuralmagic/research","commit_stats":null,"previous_names":["neuralmagic/research"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/neuralmagic/research","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fresearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fresearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fresearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fresearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/neuralmagic","download_url":"https://codeload.github.com/neuralmagic/research/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/neuralmagic%2Fresearch/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261774696,"owners_count":23207782,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-05T08:35:35.369Z","updated_at":"2025-06-24T23:31:20.213Z","avatar_url":"https://github.com/neuralmagic.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Research Automation  \n\nThis repository provides a Python interface for creating, managing, and executing ClearML tasks and pipelines using Neural Magic's queueing system. It includes:  \n\n- General-purpose classes for task and pipeline management.  
\n- Specialized classes for common research workflows, such as:  \n  - **llm-compressor** for quantization.\n  - **LMEval** for evaluation.\n  - **GuideLLM** for benchmarking.\n\n\n## Repository Structure\n```\n/ (root)\n  │── docs/                   # Documentation\n  │── examples/               # Example scripts\n  └── src/\n      │\n      └── automation/         # Main source code\n          │ \n          ├── tasks/          # Base task class and specialized tasks\n          │   └── scripts/    # Core scripts executed in tasks\n          │   └── callbacks/  # Callback functions that can be optionally executed within core scripts\n          │\n          ├── pipelines/      # Base pipeline class\n          │\n          ├── hpo/            # Base hyperparameter optimization class\n          │   └── callbacks/  # Callback functions that can be optionally executed within optimization\n          │\n          └── standards/      # Config files for standardized tasks \u0026 pipelines for research team\n```\n\n\n## Design Principles  \n\n- Use **lightweight wrappers** around ClearML’s existing classes and interfaces.\n- Leverage ClearML's `Task.create()` interface to separate task creation and management from task execution.\n  - Tasks can be instantiated anywhere, but only core scripts are executed in the target environment (a remote server or the local machine).\n- Take care to introduce specialized dependencies only in the core scripts, not in class definitions.\nFor instance, the LMEvalTask class manages evaluation task objects, but it does not depend on the `lm_eval` library.\nThe underlying script `lm_eval_script.py` introduces that dependency, so `lm_eval` needs to be installed only on the machine that runs the task.\n- Tasks and pipelines can be instantiated via `yaml` config files.\nThis allows creating standard tasks and pipelines by adding config files to the `standards/` folder.\n\n\n## Tasks \u0026 Core Scripts\n\n- The **`BaseTask`** class offers light wrapping around 
ClearML's `Task` class.\n  - `BaseTask` allows separation between task creation and execution.\n  - This separation is achieved by using **`Task.create()` instead of `Task.init()`**.  \n  - This allows task objects to be instantiated, created in the ClearML backend, and manipulated locally in Python scripts or Jupyter notebooks, even if execution happens remotely.\n  - This separation simplifies pipeline construction and prevents outdated task environments by ensuring fresh, up-to-date task creation.\n- **Specialized task classes**, such as `LLMCompressorTask`, inherit from `BaseTask`.\nSpecialized task classes are responsible for:\n  - Implementing how arguments are parsed and connected as parameters (`get_paramters()` method) or configurations (`get_configurations()` method) to the underlying ClearML Task.\n  - Implementing how to parse an optional `yaml` config file to define arguments.\n  - Specifying the core script that will execute on the target hardware.\n- **Core scripts** implement the execution side of tasks.\n  - **Core scripts** are executed only in the target environment (e.g., remote server).\n  - These scripts access parameters **exclusively** via `task.get_parameters()` (or `task.get_parameters_as_dict()`) and `task.get_configuration_object` (or `task.get_configuration_object_as_dict`).\n- `BaseTask` implements two execution methods: `execute_remotely()` and `execute_locally()`.\n  - This allows the same script to be deployed seamlessly locally or remotely.\n  - `execute_locally()` is built on top of `Task.init()`, so it doesn't support separate task creation and execution and must be used with caution.\n\n\n## Pipelines  \n\nPipelines are **specialized tasks** that consist of multiple subtasks executed in a **Directed Acyclic Graph (DAG)**.  
\n\n- The **`BasePipeline`** class inherits from `BaseTask`, allowing a user to instantiate and create a pipeline similarly to a regular task.\n  - `pipeline_script.py` contains the logic that actually creates a **PipelineController** ClearML object.\n- **Specialized pipeline classes**, such as `LLMCompressorLMEvalPipeline`, inherit from `BasePipeline`.\nSimilarly to tasks, specialized pipelines are responsible for:\n  - Implementing how arguments are parsed.\n  - Implementing how to parse an optional `yaml` config file to define arguments.\n  - Specifying which steps and paramters are part of the pipeline\n\n⚠ **Note:** ClearML introduced `PipelineController.create()` in version 1.17, which is **not currently supported on our servers**.\nThis means that in ClearML 1.17 or newer `BasePipeline` may wrap the `PipelineController` class directly.\nTo be investigated when we upgrade ClearML.\n\n\n## Hyperparameter optimization\n\nClearML natively supports hyperparameter optimization via specialized tasks.\nIn the classes implemented here we mimic this logic by defining a **`BaseHPO`** class that inherits from `BaseTask`.\nThe script `hpo_script.py` that is executed remotely is responsible for instantiating ClearML's `HyperParameterOptimizer` class, which orchestrates the optimization process.\n\n\n## Standards  \n\nThe `standards/` folder contains **`yaml` config files** that control the behvior of specialized tasks or pipelines.\nThese config files **enforce standardized** execution of key research processes.  \n\n- **Example:**  \n  - `tasks/LMEvalTask`: General-purpose evaluation with the LMEval harness.  \n  - `standards/openllm.yaml`: Specifies configurations for LMEvalTask to evaluate the OpenLLM benchmark.\n\nBy using `standards/`, researchers can ensure consistency and best practices across projects. 
\n\n\n## Docs\n\nDocumentation on how to contribute to the repo by constructing new specialized classes or config files.\n\n\n## Examples\n\nExample scripts on how to use different task classes, pipelines and standards.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fresearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuralmagic%2Fresearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuralmagic%2Fresearch/lists"}