{"id":31041282,"url":"https://github.com/codelion/icm","last_synced_at":"2026-02-27T08:04:56.664Z","repository":{"id":310356233,"uuid":"1002477908","full_name":"codelion/icm","owner":"codelion","description":"Internal Coherence Maximization (ICM): A Label-Free, Unsupervised Training Framework for LLMs","archived":false,"fork":false,"pushed_at":"2025-08-31T00:43:41.000Z","size":113,"stargazers_count":16,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-31T02:36:25.049Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codelion.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-15T15:04:19.000Z","updated_at":"2025-08-31T01:37:17.000Z","dependencies_parsed_at":"2025-08-21T14:18:07.605Z","dependency_job_id":null,"html_url":"https://github.com/codelion/icm","commit_stats":null,"previous_names":["codelion/icm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/codelion/icm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelion%2Ficm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelion%2Ficm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelion%2Ficm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelion%2Ficm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codelion","download_url":"
https://codeload.github.com/codelion/icm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codelion%2Ficm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275088381,"owners_count":25403373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-14T09:53:18.390Z","updated_at":"2026-02-27T08:04:51.639Z","avatar_url":"https://github.com/codelion.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Internal Coherence Maximization (ICM)\n\n**ICM** (Internal Coherence Maximization) is a Python tool for unsupervised elicitation of language models. 
Based on the paper [\"Unsupervised Elicitation of Language Models\"](https://arxiv.org/abs/2506.10139), ICM fine-tunes pretrained language models on their own generated labels without external supervision.\n\n## Key Features\n\n- **Unsupervised Learning**: Generate high-quality labeled datasets without human supervision\n- **Mutual Predictability**: Find labels that are logically consistent and mutually predictable\n- **Multiple Task Types**: Support for classification, comparison, mathematical reasoning, and more\n- **Flexible Export**: Export to various formats (DPO, CSV, JSON) and push to Hugging Face\n\n## Installation\n\n### From Source\n```bash\ngit clone https://github.com/codelion/icm.git\ncd icm\npip install -e .\n```\n\n### Dependencies\n```bash\npip install -r requirements.txt\n```\n\n## Quick Start\n\n### Basic Usage\n\nGenerate a labeled dataset using ICM:\n\n```bash\nicm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa --max-examples 100\n```\n\n### Export to Training Format\n\n```bash\nicm export --input-path icm_results/truthfulqa_dialoGPT_20240115_143022.jsonl --output-path truthfulqa_dpo.jsonl --format dpo\n```\n\n### Push to Hugging Face\n\n```bash\nicm push --input-path truthfulqa_dpo.jsonl --hf-repo-id your-username/icm-truthfulqa-dataset\n```\n\n## Try Now\n\n| Use Case | Dataset | Link |\n|----------|----------|-------|\n| Fine-tuning the model | dpo dataset | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1iJFjnTAjPPxjBi0PC3qQSLFIMFsANRUO?usp=sharing)|\n\n## Algorithm Overview\n\nICM uses two key components:\n\n1. **Mutual Predictability**: Measures how well the model can predict each label given all other labels\n2. 
**Logical Consistency**: Enforces simple logical constraints to prevent degenerate solutions\n\nThe algorithm uses simulated annealing to search for optimal label assignments that maximize:\n\n```\nU(D) = α × P_θ(D) - I(D)\n```\n\nWhere:\n- `P_θ(D)` is the mutual predictability score\n- `I(D)` is the inconsistency penalty  \n- `α` balances the two terms\n\n## Supported Tasks\n\n### TruthfulQA (Truthfulness)\n```bash\n# Fully automatic - detects config='multiple_choice' and split='validation'\nicm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa\n\n# Or explicitly specify parameters\nicm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa\n```\n\n### GSM8K (Mathematical Reasoning)\n```bash\n# Fully automatic - detects config='main'\nicm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k\n\n# Or explicitly specify parameters\nicm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k\n```\n\n### Custom Datasets\n```bash\nicm run --model google/gemma-3-1b-it --dataset path/to/dataset.jsonl --task-type classification\n```\n\n## Synthetic Datasets\n\nICM can generate synthetic datasets for testing and experimentation. 
These are perfect for:\n- **Testing ICM**: Validate the algorithm on simple, verifiable tasks\n- **Quick experiments**: Generate datasets instantly without external dependencies\n- **Educational purposes**: Understand how ICM works with clear logical relationships\n\n### Available Synthetic Types\n\n#### **Math Dataset** (`--synthetic math`)\nGenerates **simple addition problems** with both correct and incorrect solutions:\n\n**Example Output:**\n```\nQuestion: What is 42 + 17?\nClaim: 42 + 17 = 59\nI think this Claim is [True/False]\n```\n\n**How it works:**\n- Random numbers between 1 and 100\n- Creates correct solutions (True labels)\n- Creates incorrect solutions with random errors (False labels)  \n- **Double the requested size**: `--synthetic-size 500` creates 1000 examples (500 correct + 500 incorrect)\n- **Perfectly balanced**: 50% True, 50% False labels\n\n#### **Comparison Dataset** (`--synthetic comparison`)\nGenerates **number comparison tasks**:\n\n**Example Output:**\n```\nQuery: Which number is larger?\nResponse A: 73\nResponse B: 45\nClaim: Response A is larger than Response B\nI think this Claim is [True/False]\n```\n\n**How it works:**\n- Random pairs of numbers\n- True/False based on actual comparison\n- Single example per iteration (not doubled)\n\n### Usage Examples\n\n```bash\n# Math problems - creates 1000 examples (500 pairs)\nicm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500\n\n# Number comparisons - creates 300 examples  \nicm run --model google/gemma-3-1b-it --synthetic comparison --synthetic-size 300\n\n# Quick test with the default --synthetic-size of 100 (math doubles this to 200 examples)\nicm run --model google/gemma-3-1b-it --synthetic math\n```\n\n### Why Use Synthetic Datasets?\n\n1. **Instant generation**: No need to download or configure external datasets\n2. **Verifiable ground truth**: Clear logical relationships for validation\n3. **Reproducible**: Consistent results with the same seed\n4. 
**Perfect for testing**: Simple tasks ideal for algorithm validation\n5. **No dependencies**: Works offline without internet connection\n\n### Dataset Format\n\nAll synthetic examples follow the standard ICM format:\n```json\n{\n  \"input\": \"Question: What is 42 + 17?\\nClaim: 42 + 17 = 59\\nI think this Claim is [True/False]\",\n  \"metadata\": {\n    \"gold_label\": \"True\",\n    \"task\": \"math\"\n  }\n}\n```\n\n## Command Reference\n\n### `icm run`\n\nRun ICM on a dataset to generate labeled examples.\n\n**Required Arguments:**\n- `--model`: Model name or path (e.g., `google/gemma-3-1b-it`)\n\n**Dataset Arguments:**\n- `--dataset`: Dataset name or path\n- `--task-type`: Task type (`auto`, `classification`, `comparison`, `truthfulqa`, `gsm8k`)\n- `--split`: Dataset split (default: `train`)\n- `--max-examples`: Maximum examples to process\n\n**Synthetic Dataset Options:**\n- `--synthetic`: Create synthetic dataset (`math`, `comparison`)\n- `--synthetic-size`: Number of synthetic examples to generate (default: 100)\n\n**ICM Algorithm Parameters:**\n- `--alpha`: Weight for mutual predictability vs consistency (default: 100.0)\n- `--initial-temperature`: Starting temperature for simulated annealing (default: 3.0)\n- `--final-temperature`: Ending temperature (default: 0.001)\n- `--cooling-rate`: Temperature cooling rate (default: 0.98)\n- `--initial-examples`: Number of initial random examples (default: 20)\n- `--max-iterations`: Maximum search iterations (default: 1000)\n\n**Generation Parameters:**\n- `--generation-temperature`: Temperature for text generation (default: 0.2)\n- `--generation-top-p`: Top-p for nucleus sampling (default: 0.9)\n- `--generation-max-tokens`: Maximum tokens to generate (default: 512)\n\n**System Parameters:**\n- `--device`: Computation device (`cuda`, `cpu`, `auto`)\n- `--seed`: Random seed for reproducibility (default: 42)\n- `--log-level`: Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)\n\n### `icm export`\n\nExport ICM results 
to various formats.\n\n**Required Arguments:**\n- `--input-path`: Path to ICM result file\n- `--output-path`: Output file path\n- `--format`: Export format (`json`, `dpo`, `csv`, `analysis`)\n\n**Optional Arguments:**\n- `--include-stats`: Include statistics in JSON export\n- `--create-pairs`: Create chosen/rejected pairs for DPO format\n- `--hf-push`: Push to Hugging Face after export\n- `--hf-repo-id`: Hugging Face repository ID\n- `--private`: Make Hugging Face repository private\n\n### `icm push`\n\nPush files to Hugging Face Hub.\n\n**Required Arguments:**\n- `--input-path`: Local file path to upload\n- `--hf-repo-id`: Hugging Face repository ID (e.g., `username/dataset-name`)\n\n**Optional Arguments:**\n- `--file-name`: Custom filename in repository\n- `--private`: Make repository private\n\n### `icm list`\n\nList all saved ICM results.\n\n```bash\nicm list --results-dir icm_results\n```\n\n### `icm analyze`\n\nAnalyze ICM results and show statistics.\n\n```bash\n# Analyze all results\nicm analyze\n\n# Analyze specific result file\nicm analyze --result-file icm_results/truthfulqa_gpt2_20240115_143022.jsonl\n```\n\n### `icm clean`\n\nClean old result files, keeping only the latest N results.\n\n```bash\nicm clean --keep-latest 10\n```\n\n## Configuration\n\n### Using Configuration Files\n\nCreate a `config.json` file:\n\n```json\n{\n  \"search_params\": {\n    \"alpha\": 30.0,\n    \"initial_temperature\": 15.0,\n    \"final_temperature\": 0.005,\n    \"max_iterations\": 2000\n  },\n  \"model_params\": {\n    \"generation_temperature\": 0.8,\n    \"generation_top_p\": 0.95\n  },\n  \"system_params\": {\n    \"device\": \"cuda\",\n    \"seed\": 123\n  }\n}\n```\n\n### Environment Variables\n\nSet common parameters via environment variables:\n\n```bash\nexport ICM_MODEL=\"google/gemma-3-1b-it\"\nexport ICM_DEVICE=\"cuda\"\nexport ICM_LOG_LEVEL=\"INFO\"\n```\n\n## Python API\n\n### Basic Usage\n\n```python\nfrom icm import ICMSearcher, load_icm_dataset\n\n# Load 
dataset\ndataset = load_icm_dataset(\"truthful_qa\", task_type=\"truthfulqa\")\n\n# Create searcher\nsearcher = ICMSearcher(\n    model_name=\"google/gemma-3-1b-it\",\n    alpha=50.0,\n    max_iterations=1000\n)\n\n# Run ICM search\nresult = searcher.search(dataset, max_examples=100)\n\n# Access results\nprint(f\"Generated {len(result.labeled_examples)} labeled examples\")\nprint(f\"Final score: {result.score:.4f}\")\n```\n\n### Advanced Usage\n\n```python\nfrom icm import ICMSearcher, ICMDataset, ICMExample\nfrom icm.consistency import LogicalConsistencyChecker, MathConsistencyRule\n\n# Create custom dataset\nexamples = [\n    ICMExample(\"What is 2+2?\", {\"category\": \"math\"}),\n    ICMExample(\"What is 3+3?\", {\"category\": \"math\"})\n]\ndataset = ICMDataset(examples)\n\n# Custom consistency checker\nchecker = LogicalConsistencyChecker([MathConsistencyRule()])\n\n# Advanced searcher\nsearcher = ICMSearcher(\n    model_name=\"google/gemma-3-1b-it\",\n    alpha=30.0,\n    initial_temperature=20.0,\n    consistency_checker=checker,\n    seed=42\n)\n\nresult = searcher.search(dataset)\n```\n\n### Storage and Export\n\n```python\nfrom icm.storage import ICMStorage\nfrom icm.exporters import ICMExporter\n\n# Save results\nstorage = ICMStorage(\"my_results\")\nstorage.save_result(result, \"experiment_1\")\n\n# Export to DPO format\nexporter = ICMExporter(storage)\nexporter.export_to_dpo_format(\n    result.labeled_examples,\n    \"training_data.jsonl\"\n)\n\n# Push to Hugging Face\nexporter.export_to_huggingface(\n    result.labeled_examples,\n    repo_id=\"username/my-icm-dataset\",\n    task_type=\"classification\",\n    model_name=\"google/gemma-3-1b-it\"\n)\n```\n\n## Examples\n\n### Generate Math Dataset\n\n```bash\n# Create synthetic math dataset\nicm run --model google/gemma-3-1b-it --synthetic math --synthetic-size 500 --max-iterations 500\n\n# Use real GSM8K dataset  \nicm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k --max-examples 
200\n```\n\n### Comparison Tasks\n\n```bash\n# Generate preference dataset\nicm run --model google/gemma-3-1b-it --dataset anthropic/hh-rlhf --task-type comparison --alpha 30.0\n```\n\n### Export and Use\n\n```bash\n# Export to DPO format for training\nicm export --input-path results.jsonl --output-path dpo_data.jsonl --format dpo --create-pairs\n\n# Export analysis report\nicm export --input-path results.jsonl --output-path analysis.json --format analysis --include-examples\n```\n\n## Troubleshooting\n\n### Common Issues\n\n**CUDA Out of Memory:**\n```bash\n# Use a smaller model, MPS (Apple Silicon), or CPU\nicm run --model google/gemma-3-1b-it --device cpu\n# or on Apple Silicon:\nicm run --model google/gemma-3-1b-it --device mps\n```\n\n**Model Loading Errors:**\n```bash\n# Verify model name and check internet connection\nicm run --model google/gemma-3-1b-it --log-level DEBUG\n```\n\n**Poor Quality Results:**\n```bash\n# Increase alpha or iterations relative to your current settings (100.0 is the default alpha)\nicm run --model your-model --alpha 100.0 --max-iterations 2000\n```\n\n**Dataset Configuration Errors:**\n```bash\n# ICM now auto-detects both config and split for known datasets\n# TruthfulQA: automatically uses config='multiple_choice' and split='validation'\n# GSM8K: automatically uses config='main' and split='train'\n\n# Your commands should work automatically:\nicm run --model google/gemma-3-1b-it --dataset truthful_qa --task-type truthfulqa\nicm run --model google/gemma-3-1b-it --dataset gsm8k --task-type gsm8k\n\n# Or specify manually if needed:\nicm run --model google/gemma-3-1b-it --dataset truthful_qa --config multiple_choice --split validation --task-type truthfulqa\nicm run --model google/gemma-3-1b-it --dataset gsm8k --config main --task-type gsm8k\n```\n\n**Memory Usage Issues:**\n```bash\n# ICM uses memory-efficient sampling to handle large datasets\n# If you still encounter memory issues, reduce the dataset size:\nicm run --model google/gemma-3-1b-it --dataset large-dataset --max-examples 50\n\n# Or 
use a smaller model:\nicm run --model distilgpt2 --dataset your-dataset --max-examples 100\n```\n\n### Debug Mode\n\nEnable detailed logging:\n\n```bash\nicm run --model google/gemma-3-1b-it --dataset your-data --log-level DEBUG --log-file debug.log\n```\n\n### Development Setup\n\n```bash\ngit clone https://github.com/codelion/icm.git\ncd icm\npip install -e \".[dev]\"\n```\n\n### Running Tests\n\n```bash\npytest tests/\n```\n\n## Citation\n\nIf you use ICM in your research, please cite:\n\n```bibtex\n@software{icm,\n  title = {ICM: Internal Coherence Maximization},\n  author = {Asankhaya Sharma},\n  year = {2025},\n  publisher = {GitHub},\n  url = {https://github.com/codelion/icm}\n}\n```\n\n## Related Work\n\n- **Eliciting Fine-Tuned Transformer Capabilities**: [Paper](https://arxiv.org/abs/2506.08060)\n- **Weak-to-Strong Generalization**: [Paper](https://arxiv.org/abs/2312.09390)\n- **Constitutional AI**: [Paper](https://arxiv.org/abs/2212.08073) \n- **Discovering Latent Knowledge**: [Paper](https://arxiv.org/abs/2212.03827)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelion%2Ficm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodelion%2Ficm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodelion%2Ficm/lists"}