{"id":31078316,"url":"https://github.com/ericflo/icm","last_synced_at":"2025-09-16T08:51:17.632Z","repository":{"id":299555322,"uuid":"1002584709","full_name":"ericflo/icm","owner":"ericflo","description":"Implementation of Internal Coherence Maximization (ICM) algorithm","archived":false,"fork":false,"pushed_at":"2025-06-17T06:23:49.000Z","size":44,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-17T06:26:27.557Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ericflo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-15T19:20:26.000Z","updated_at":"2025-06-17T05:57:28.000Z","dependencies_parsed_at":"2025-06-17T06:37:16.977Z","dependency_job_id":null,"html_url":"https://github.com/ericflo/icm","commit_stats":null,"previous_names":["ericflo/icm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ericflo/icm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Ficm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Ficm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Ficm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Ficm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ericflo","download_url":"https://codeload.github.com/ericflo/icm/tar.gz/refs/hea
ds/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericflo%2Ficm/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275391313,"owners_count":25456316,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-16T02:00:10.229Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-16T08:51:16.499Z","updated_at":"2025-09-16T08:51:17.619Z","avatar_url":"https://github.com/ericflo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Internal Coherence Maximization (ICM) Implementation\n\nA PyTorch implementation of the Internal Coherence Maximization algorithm from the paper \"Unsupervised Elicitation of Language Models\" by Wen et al. (2025). This implementation supports both vLLM and transformers backends for efficient inference.\n\n## Overview\n\nICM is an unsupervised algorithm that fine-tunes pretrained language models on their own generated labels without external supervision. It works by:\n\n1. **Mutual Predictability**: Finding labels where the model can infer each label from all others\n2. **Logical Consistency**: Enforcing task-specific consistency constraints\n3. 
**Simulated Annealing**: Iteratively improving the label set using temperature-based acceptance\n\n## Features\n\n- 🚀 **Dual Backend Support**: Optimized vLLM backend for production, transformers for compatibility\n- 🔧 **Modular Design**: Easily extensible components for different tasks\n- 📊 **Built-in Tasks**: Support for truthfulness, math correctness, and comparison tasks\n- 🧪 **Comprehensive Testing**: Unit tests and integration tests included\n- 📈 **Performance Tracking**: Detailed metrics and experiment logging\n- 🌍 **Real Data Support**: Run experiments on actual datasets (TruthfulQA, GSM8K, HH-RLHF)\n- 🤖 **Unsupervised Learning**: No labels needed - ICM discovers patterns automatically\n\n## Installation\n\n### Requirements\n\n- Python 3.9+\n- PyTorch 2.0+\n- CUDA-capable GPU (recommended)\n- uv (for package management)\n\n### Installing uv\n\nFirst, install uv if you haven't already:\n\n```bash\n# On macOS and Linux\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# On Windows\npowershell -c \"irm https://astral.sh/uv/install.ps1 | iex\"\n\n# Or with pip\npip install uv\n```\n\n### Basic Installation\n\n```bash\n# Create and activate a virtual environment\nuv venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n\n# Install the package in editable mode with core dependencies\nuv pip install -e .\n\n# For vLLM backend (recommended for performance)\nuv pip install -e \".[vllm]\"\n\n# For all dependencies including development tools\nuv pip install -e \".[all]\"\n```\n\n### Docker Installation (Recommended)\n\n```dockerfile\nFROM pytorch/pytorch:2.5.1-cuda12.1-cudnn9-devel\n\n# Quote the version specifier so the shell does not treat \"\u003e\" as a redirect\nRUN pip install --upgrade pip \u0026\u0026 \\\n    pip install vllm==0.7.0 \"transformers\u003e=4.51.0\" \\\n    tqdm numpy pandas psutil\n```\n\n## Quick Start\n\n### 1. 
Basic Usage\n\n```python\n# No need to activate venv when using uv run!\n# Save this as quick_test.py and run with: uv run quick_test.py\n\nfrom icm_implementation import ICM, ICMConfig, create_truthfulness_dataset\n\n# Create dataset\ndata = [\n    (\"Is the Earth round?\", \"Yes, the Earth is spherical\", None),\n    (\"Is the Earth flat?\", \"No, the Earth is round\", None),\n    (\"Is 2+2=4?\", \"Yes, 2+2 equals 4\", None),\n    (\"Is 2+2=5?\", \"No, 2+2 equals 4, not 5\", None),\n]\n\ndataset = create_truthfulness_dataset(data)\n\n# Configure ICM\nconfig = ICMConfig(\n    model_name=\"Qwen/Qwen3-4B\",  # or any HF model\n    backend=\"auto\",  # uses vLLM if available\n    initial_examples=2,\n    alpha=50.0\n)\n\n# Run ICM\nicm = ICM(config)\nlabeled_data = icm.run(dataset)\n\n# Check results\nfor data_point, label in labeled_data:\n    print(f\"Input: {data_point.input_text}\")\n    print(f\"Label: {config.label_names[label]}\\n\")\n```\n\n### 2. Running Experiments\n\n```bash\n# Run all tasks with default model\nuv run icm_examples.py --task all\n\n# Run specific task with custom model\nuv run icm_examples.py --task math --model meta-llama/Llama-3.2-1B\n\n# Quick test with small dataset\nuv run icm_examples.py --task truthfulness --small\n\n# Compare backends\nuv run icm_examples.py --compare-backends\n```\n\n### 3. 
Custom Tasks\n\n```python\nfrom icm_implementation import ICM, ICMConfig, DataPoint, LogicalConsistency\n\nclass CustomConsistency(LogicalConsistency):\n    def check_consistency(self, x_i, y_i, x_j, y_j):\n        # Implement your consistency logic\n        return True  # or False based on your constraints\n\n# Create custom dataset\ndataset = [\n    DataPoint(\n        id=i,\n        input_text=\"Your task-specific input\",\n        metadata={\"custom_field\": value}\n    )\n    for i, value in enumerate(your_data)\n]\n\n# Run with custom consistency\nconfig = ICMConfig(num_labels=3, label_names=[\"A\", \"B\", \"C\"])\nicm = ICM(config)\nicm.consistency_checker = CustomConsistency()\nresults = icm.run(dataset)\n```\n\n## Architecture\n\n### Core Components\n\n1. **ModelBackend**: Abstract interface for model inference\n   - `VLLMBackend`: High-performance batch inference\n   - `TransformersBackend`: Compatible with any HuggingFace model\n\n2. **LogicalConsistency**: Handles task-specific consistency checking\n   - General consistency (default)\n   - Asymmetry consistency (for comparisons)\n   - Math correctness consistency\n\n3. 
**ICM Algorithm**: Main algorithm implementation\n   - Simulated annealing with temperature scheduling\n   - Consistency fixing subroutine\n   - Score calculation and tracking\n\n### Configuration Options\n\n```python\n@dataclass\nclass ICMConfig:\n    # Model settings\n    model_name: str = \"Qwen/Qwen3-4B\"\n    backend: str = \"auto\"  # \"vllm\", \"transformers\", or \"auto\"\n    \n    # Algorithm parameters\n    initial_examples: int = 8        # K in the paper\n    initial_temperature: float = 10.0  # T_0\n    final_temperature: float = 0.01    # T_min\n    cooling_rate: float = 0.99         # β\n    alpha: float = 50.0                # Mutual predictability weight\n    \n    # Inference settings\n    max_context_length: int = 32768\n    max_new_tokens: int = 64\n    temperature: float = 0.1\n    top_p: float = 0.95\n```\n\n## Supported Tasks\n\n### 1. Truthfulness (TruthfulQA-style)\n```python\ndataset = create_truthfulness_dataset([\n    (question, claim, is_true),  # is_true can be None\n    ...\n])\n```\n\n### 2. Mathematical Correctness (GSM8K-style)\n```python\ndataset = create_math_correctness_dataset([\n    (problem, solution, answer, is_correct),\n    ...\n])\n```\n\n### 3. 
Comparison (Alpaca-style)\n```python\ndataset = create_comparison_dataset([\n    (query, response_a, response_b, a_is_better),\n    ...\n])\n```\n\n## Performance Optimization\n\n### Memory Management\n- Use smaller `max_context_length` for limited GPU memory\n- Adjust `initial_examples` based on dataset size\n- Use `backend=\"transformers\"` with CPU for testing\n\n### Speed Optimization\n- Use vLLM backend for 5-10x speedup\n- Batch size is automatically optimized\n- Reduce `max_iterations` for faster results\n\n### Model Selection\n- Qwen3-4B: Best balance of performance and efficiency\n- Qwen3-1.7B: For resource-constrained environments\n- Llama-3.2-1B: Alternative lightweight option\n\n## Testing\n\n```bash\n# Run all tests\nuv run icm_test_suite.py\n\n# Run specific test class\nuv run python -m unittest icm_test_suite.TestLogicalConsistency\n\n# Run with verbose output\nuv run icm_test_suite.py -v\n\n# Or use pytest if you have dev dependencies installed\nuv run pytest icm_test_suite.py -v\n```\n\n## Running Experiments on Real Data\n\nICM includes a powerful experiment runner that works with real datasets from Hugging Face. You can evaluate ICM's unsupervised learning capabilities on actual benchmarks without any labeled data.\n\n### Available Tasks\n\n1. **Truthfulness (TruthfulQA)** - Evaluate factual accuracy of claims\n2. **Math Correctness (GSM8K)** - Verify mathematical problem solutions  \n3. 
**Comparison (HH-RLHF)** - Learn preferences between responses\n\n### Basic Usage\n\n```bash\n# Run on a single task\nuv run run_experiments.py --task truthfulness\n\n# Run on all tasks\nuv run run_experiments.py --task all\n\n# Customize model and sample size\nuv run run_experiments.py --task math --model Qwen/Qwen3-4B --sample-size 100\n\n# Control iterations\nuv run run_experiments.py --task comparison --max-iterations 200\n```\n\n### Example Commands\n\n```bash\n# Quick test with small model\nuv run run_experiments.py --task math --model Qwen/Qwen2.5-0.5B --sample-size 20\n\n# Full experiment with Qwen3-4B\nuv run run_experiments.py --task all --model Qwen/Qwen3-4B --sample-size 50\n\n# Large-scale truthfulness evaluation\nuv run run_experiments.py --task truthfulness --sample-size 200 --max-iterations 400\n```\n\n### How It Works\n\nThe experiment runner:\n1. **Loads real data** from Hugging Face datasets (TruthfulQA, GSM8K, HH-RLHF)\n2. **Formats data** into question-claim pairs suitable for ICM\n3. **Runs ICM algorithm** to label data without supervision\n4. **Enforces consistency** using task-specific logical constraints\n5. 
**Saves detailed results** including metrics, labels, and score history\n\n### Output\n\nResults are saved to `icm_results/` with filenames like:\n```\nREAL_truthfulness_Qwen_Qwen3-4B_20250615_120000.json\n```\n\nEach result file contains:\n- Full configuration used\n- Final metrics (score, mutual predictability, inconsistencies)\n- All labeled examples with model's predictions\n- Score history for analysis\n- Runtime and acceptance rate statistics\n\n### Task-Specific Details\n\n**Truthfulness (TruthfulQA)**\n- Tests ability to distinguish true/false claims\n- Uses questions from TruthfulQA validation set\n- No specific consistency constraints\n\n**Math Correctness (GSM8K)**\n- Verifies correct vs incorrect math solutions\n- Enforces mathematical consistency: same problem can't have different correct answers\n- Creates deliberate wrong answers for contrastive learning\n\n**Comparison (HH-RLHF)**\n- Learns preferences between helpful/harmless responses\n- Uses Anthropic's HH-RLHF dataset\n- Enforces asymmetry: if A\u003eB then B cannot be \u003eA\n\n## Experiment Tracking\n\nResults are automatically saved to `icm_results/` with:\n- Detailed JSON logs for each experiment\n- Summary CSV with key metrics  \n- Score history and acceptance rates\n- Full labeled datasets for analysis\n\n## Limitations\n\n1. **Context Length**: Limited by model's context window for in-context examples\n2. **Concept Salience**: Only works for concepts the model already understands\n3. **Compute Requirements**: Requires multiple forward passes per label\n\n## Citation\n\nIf you use this implementation, please cite the original paper:\n\n```bibtex\n@article{wen2025unsupervised,\n  title={Unsupervised Elicitation of Language Models},\n  author={Wen, Jiaxin and others},\n  journal={arXiv preprint arXiv:2505.15134},\n  year={2025}\n}\n```\n\n## Troubleshooting\n\n### Common Issues\n\n1. 
**CUDA Out of Memory**\n   ```python\n   config.max_context_length = 4096  # Reduce context\n   config.backend = \"transformers\"   # Switch from vLLM to the transformers backend\n   ```\n\n2. **vLLM Import Error**\n   ```bash\n   # Install against a specific CUDA version; PyPI stays the primary index\n   # (vllm is published there) and the PyTorch wheel index is added as an extra\n   uv pip install vllm --extra-index-url https://download.pytorch.org/whl/cu121\n   ```\n\n3. **Slow Performance**\n   - Ensure vLLM backend is being used\n   - Check GPU utilization with `nvidia-smi`\n   - Reduce dataset size or `max_iterations`\n\n## Contributing\n\nContributions are welcome! Areas for improvement:\n- Additional consistency types\n- Support for more model architectures\n- Multi-GPU support\n- Additional evaluation metrics\n\n## License\n\nThis implementation is provided for research purposes. Please ensure you comply with the licenses of the models you use (Qwen3, Llama, etc.).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericflo%2Ficm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericflo%2Ficm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericflo%2Ficm/lists"}