{"id":28401453,"url":"https://github.com/peremartra/optipfair","last_synced_at":"2026-02-24T21:31:19.512Z","repository":{"id":287619699,"uuid":"964838278","full_name":"peremartra/optipfair","owner":"peremartra","description":"Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis.","archived":false,"fork":false,"pushed_at":"2026-01-10T19:08:40.000Z","size":12414,"stargazers_count":26,"open_issues_count":5,"forks_count":8,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-11T05:33:46.588Z","etag":null,"topics":["activation-analysis","bias-detection","bias-mitigation","fairness","large-language-models","llm","model-pruning","python","structured-pruning","transformers"],"latest_commit_sha":null,"homepage":"https://peremartra.github.io/optipfair/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peremartra.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/roadmap.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-04-11T21:54:47.000Z","updated_at":"2026-01-10T19:08:25.000Z","dependencies_parsed_at":"2025-06-14T20:23:03.901Z","dependency_job_id":"32f48f29-92c5-4203-9b4f-97e057b9084e","html_url":"https://github.com/peremartra/optipfair","commit_stats":null,"previous_names":["peremartra/optipfair"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/peremartra/optipfair","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/peremartra%2Foptipfair","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peremartra%2Foptipfair/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peremartra%2Foptipfair/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peremartra%2Foptipfair/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/peremartra","download_url":"https://codeload.github.com/peremartra/optipfair/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peremartra%2Foptipfair/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29801021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-24T21:02:39.706Z","status":"ssl_error","status_checked_at":"2026-02-24T21:02:21.834Z","response_time":75,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["activation-analysis","bias-detection","bias-mitigation","fairness","large-language-models","llm","model-pruning","python","structured-pruning","transformers"],"created_at":"2025-06-01T12:38:36.646Z","updated_at":"2026-02-24T21:31:19.506Z","avatar_url":"https://github.com/peremartra.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# optipfair\n\n\u003cdiv 
align=\"center\"\u003e\n\n  \u003cimg src=\"images/optiPfair.png\" alt=\"optipfair Logo\" width=\"600\"/\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ch1\u003eoptipfair\u003c/h1\u003e\n  \u003cstrong\u003eThe Python library for making LLMs both efficient (via pruning) and fair (via bias analysis).\u003c/strong\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/optipfair/\"\u003e\u003cimg alt=\"PyPI Version\" src=\"https://img.shields.io/pypi/v/optipfair?color=blue\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/optipfair/\"\u003e\u003cimg alt=\"Downloads\" src=\"https://img.shields.io/pypi/dm/optipfair?color=orange\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/peremartra/optipfair/blob/main/LICENSE\"\u003e\u003cimg alt=\"License\" src=\"https://img.shields.io/github/license/peremartra/optipfair?color=green\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/peremartra/optipfair/stargazers\"\u003e\u003cimg alt=\"GitHub Stars\" src=\"https://img.shields.io/github/stars/peremartra/optipfair?style=social\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ch3\u003e\n        \u003ca href=\"https://peremartra.github.io/optipfair/\" target=\"_blank\"\u003eDocumentation\u003c/a\u003e\n        ·\n        \u003ca href=\"https://github.com/peremartra/optipfair/issues\" target=\"_blank\"\u003eReport Bug\u003c/a\u003e\n        ·\n        \u003ca href=\"https://github.com/peremartra/optipfair/issues\" target=\"_blank\"\u003eRequest Feature\u003c/a\u003e\n    \u003c/h3\u003e\n\u003c/div\u003e\n\n---\n\u003e **New to optipfair?** Use our [LLM Reference Manual](optipfair_llm_reference_manual.txt) - paste it into ChatGPT, Claude or your Favourite LLM for guided assistance with any optipfair task.\n\n\u003e **Note on Terminology:** The default neuron selection method is **PPM (Peak-to-Peak Magnitude)**, which calculates neuron importance 
based on the full dynamic range of weights (max + |min|). This method is formally described in: *Martra, P. (2025). Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2. ArXiv. https://arxiv.org/abs/2512.22671*. For backward compatibility, the parameter value `\"MAW\"` is still accepted and maps to PPM.\n\n### 🚀 Interactive Demos: Try optipfair NOW\n\nExperience optipfair's capabilities directly in your browser.\n\n| Live Bias Visualization Demo |\n| :--------------------------: |\n| Analyze any compatible model from Hugging Face with a full UI. No setup required. |\n| **[🚀 Launch the Live Demo on HF Spaces](https://huggingface.co/spaces/oopere/optipfair-bias-analyzer)** |\n\n#### Tutorials on Google Colab\n\nExplore optipfair’s features with these interactive notebooks.\n\n| Tutorial | Description | Link |\n| :--- | :--- | :---: |\n| **Depth Pruning** | Learn how to remove entire transformer layers from models like Llama-3. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peremartra/optipfair/blob/main/examples/depth_pruning.ipynb) |\n| **Layer Importance** | Identify which transformer layers contribute the least to your model. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peremartra/optipfair/blob/main/examples/layer_importance_analysis.ipynb) |\n| **Pruning Compatibility** | Check if your model's architecture can be pruned by optipfair. | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peremartra/optipfair/blob/main/examples/pruning_compatibility_check.ipynb) |\n| **Bias Compatibility** | The coder's alternative to our live demo for bias analysis. 
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/peremartra/optipfair/blob/main/examples/bias_compatibility_check.ipynb) |\n\n---\n### ✅ Why optipfair?\n\noptipfair is more than just another pruning library. It's a toolkit designed for the modern AI developer who cares about both performance and responsibility.\n\n* **Efficiency \u0026 Fairness in One Place**: Stop juggling tools. optipfair is the only library designed to integrate structured pruning with powerful, intuitive bias visualization and analysis.\n\n* **Dual Pruning Strategies**: optipfair supports both **Width Pruning** (removing neurons from MLP layers) and **Depth Pruning** (removing entire transformer layers), giving you flexible control over the efficiency-performance trade-off.\n\n* **Optimized for Modern Architectures**: We focus on what works now. The library is specialized for GLU-based models like LLaMA, Mistral, Gemma, and Qwen, ensuring relevant and effective pruning.\n\n* **Go Beyond Numbers with Bias Visualization**: Don't just get a bias score. Our visualization tools (PCA, heatmaps, mean differences) help you *understand* how and where your model encodes bias, enabling more effective mitigation.\n\n* **🤖 AI-Assisted Development**: Accelerate your workflow using the included [`LLM Reference Manual`](https://github.com/peremartra/optipfair/blob/main/optipfair_llm_reference_manual.txt). Provide it to your favorite LLM (ChatGPT, Claude) to get expert-level help and generate integration code instantly.\n\u003cp align=\"center\"\u003e\n      \u003cimg src=\"images/optipfair_llmflow.gif\" alt=\"AI Pair Programming with optipfair\" width=\"500\"/\u003e\n\u003c/p\u003e\n\n* **🔬 Backed by Research**: Our methods aren't arbitrary. They are built upon and validated by ongoing applied research in model optimization and fairness analysis.\n\n---\n### ⚙️ Installation\nChoose the installation method that best suits your needs. 
For bias visualization features, you'll need the `[viz]` extra.\n\n**Standard Installation**\n\nFor core pruning functionality:\n\n```bash\npip install optipfair\n```\n\n**Full Installation (with Bias Visualization)**\n\nTo use the bias analysis and visualization tools, install with the `[viz]` extra dependencies:\n\n```bash\npip install \"optipfair[viz]\"\n```\n\n**Developer Installation**\n\nTo install from source for contributing or development:\n\n```bash\ngit clone https://github.com/peremartra/optipfair.git\ncd optipfair\npip install -e .\n```\n\n---\n## ⚡ Quick Start\n\nSee how to use optipfair's core features in just a few lines of code.\n\n### Pruning with the Python API\n\nPrune 20% of the MLP neurons from a model using the Peak-to-Peak Magnitude (PPM) method.\n\n```python\nfrom transformers import AutoModelForCausalLM\nimport optipfair as opf\n\n# Load a pre-trained model\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.2-1B\")\n\n# Prune 20% of neurons from MLP layers\npruned_model, stats = opf.prune_model(\n    model=model,\n    pruning_type=\"MLP_GLU\",\n    neuron_selection_method=\"MAW\",\n    pruning_percentage=20,\n    expansion_divisor=None,  # Optional: round to divisor (32, 64, 128, 256)\n    show_progress=True,\n    return_stats=True\n)\n\n# Print pruning statistics\nprint(f\"Original parameters: {stats['original_parameters']:,}\")\nprint(f\"Pruned parameters: {stats['pruned_parameters']:,}\")\nprint(f\"Reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)\")\n\n# Save the pruned model\npruned_model.save_pretrained(\"./pruned-llama-model\")\n```\n\nThe pruning process yields tangible results in model size and performance. 
Here's a sample comparison for **Llama-3.2-1B** after pruning 20% of its MLP neurons:\n\n| Metric | Original Model | Pruned Model | Improvement |\n| :--- | :---: | :---: | :---: |\n| **Total Parameters** | 1.24B | 1.07B | **-13.03%** |\n| **Inference Speed** | *Benchmark in progress* | *Benchmark in progress* | *Coming soon* |\n| **MMLU Score** | *Benchmark in progress* | *Benchmark in progress* | *Minimal change expected* |\n\n*Results based on the [PPM pruning method](#neuron-selection-methods) (parameter `\"MAW\"`). Full benchmark results will be published shortly.*\n\n### Data-Driven Width Pruning (NEW in v0.2.0)\n\nEnhance pruning decisions with activation statistics from calibration data. This hybrid approach combines weight magnitudes with real data patterns for more intelligent neuron selection.\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom torch.utils.data import DataLoader, TensorDataset\nimport torch\nimport optipfair as opf\n\n# Load model and tokenizer\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.2-1B\")\ntokenizer = AutoTokenizer.from_pretrained(\"meta-llama/Llama-3.2-1B\")\ntokenizer.pad_token = tokenizer.eos_token\n\n# Prepare calibration data (use your domain-specific dataset)\ntexts = [\n    \"Your domain-specific text here...\",\n    \"More examples from your use case...\",\n    # Add 100-1000 samples for best results\n]\n\ninputs = tokenizer(texts, return_tensors=\"pt\", padding=True, truncation=True, max_length=512)\ndataset = TensorDataset(inputs['input_ids'], inputs['attention_mask'])\ndataloader = DataLoader(dataset, batch_size=8)\n\n# Prune with data-driven importance calculation\npruned_model, stats = opf.prune_model(\n    model=model,\n    neuron_selection_method=\"MAW\",  # Only PPM (parameter \"MAW\") supports data-driven pruning\n    pruning_percentage=20,\n    dataloader=dataloader,  # ← Enables hybrid pruning\n    show_progress=True,\n    
return_stats=True\n)\n\nprint(f\"Reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)\")\npruned_model.save_pretrained(\"./pruned-datadriven-model\")\n```\n\n**Key Benefits:**\n- 📊 **Better Preservation**: Keeps neurons important for your specific use case\n- 🎯 **Domain Adaptation**: Use calibration data from your target domain\n- 🔬 **Research-Backed**: Based on CFSP methodology (arXiv:2409.13199v2)\n- ⚡ **Easy Integration**: Just add a dataloader - no other changes needed\n\n**Note:** Data-driven pruning is currently only available with `neuron_selection_method=\"MAW\"` (PPM method). Using a dataloader with \"VOW\" or \"PON\" will raise a `ValueError`.\n\n### Selective Layer Width Pruning (NEW in v0.2.0)\n\nPrune neurons only in specific layers while leaving others unchanged. Perfect for preserving critical layers or implementing layer-specific optimization strategies.\n\n```python\nfrom transformers import AutoModelForCausalLM\nimport optipfair as opf\n\n# Load a pre-trained model\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.2-1B\")\n\n# Prune neurons only in specific layers (e.g., middle layers)\npruned_model, stats = opf.prune_model(\n    model=model,\n    pruning_type=\"MLP_GLU\",\n    neuron_selection_method=\"MAW\",\n    pruning_percentage=30,\n    layer_indices=[5, 10, 15, 20, 25],  # Only prune these layers\n    show_progress=True,\n    return_stats=True\n)\n\n# Print pruning statistics\nprint(f\"Pruned {stats['pruned_layers']} of {stats['total_layers']} layers\")\nprint(f\"Total reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)\")\n\n# Save the pruned model\npruned_model.save_pretrained(\"./selective-pruned-llama\")\n```\n\n**Key Benefits:**\n- 🎯 **Precision Control**: Choose exactly which layers to optimize\n- 🛡️ **Preserve Critical Layers**: Keep first and last layers at full capacity\n- 🔬 **Data-Driven Selection**: Combine with layer importance analysis\n- ⚡ **Full 
Compatibility**: Works with all MLP_GLU features (expansion_rate, expansion_divisor, dataloader)\n\n**Use Cases:**\n- Preserve embedding and output layers while pruning middle layers\n- Target specific layer ranges based on importance analysis\n- Implement asymmetric pruning strategies for domain adaptation\n- Experiment with different layer-wise pruning patterns\n\n### Hardware-Optimized Pruning with expansion_divisor (NEW in v0.2.0)\n\nThe `expansion_divisor` parameter ensures that intermediate layer sizes are divisible by specific values (32, 64, 128, or 256), optimizing performance on modern GPUs and TPUs.\n\n```python\nfrom transformers import AutoModelForCausalLM\nimport optipfair as opf\n\n# Load model\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.2-1B\")\n\n# Prune with hardware optimization\npruned_model, stats = opf.prune_model(\n    model=model,\n    neuron_selection_method=\"MAW\",\n    pruning_percentage=20,\n    expansion_divisor=128,  # Round intermediate size to multiple of 128\n    show_progress=True,\n    return_stats=True\n)\n\nprint(f\"Intermediate size is divisible by 128: {stats['expansion_rate']}\")\npruned_model.save_pretrained(\"./pruned-optimized-model\")\n```\n\n**Key Benefits:**\n- 🚀 **Better GPU Performance**: Optimized memory access patterns\n- ⚡ **Tensor Core Efficiency**: Multiples of 128/256 leverage modern GPU architectures\n- 🎯 **Flexible**: Works with both `pruning_percentage` and `expansion_rate`\n- 🔧 **Easy to Use**: Just add one parameter to existing code\n\n**Valid Values:** `None` (default, no rounding), `32`, `64`, `128`, `256`\n\n**Note:** Cannot be used alone; requires either `pruning_percentage` or `expansion_rate`.\n\n### Pruning Transformer Layers (Depth Pruning)\n\nRemove entire layers from a model for significant efficiency gains. 
Here, we remove the last 4 layers.\n\n```python\nfrom transformers import AutoModelForCausalLM\nimport optipfair as opf\n\n# Load a pre-trained model\nmodel = AutoModelForCausalLM.from_pretrained(\"meta-llama/Llama-3.2-1B\")\n\n# Remove the last 4 transformer layers\npruned_model, stats = opf.prune_model(\n    model=model,\n    pruning_type=\"DEPTH\",\n    num_layers_to_remove=4,\n    layer_selection_method=\"last\", # Recommended for best performance retention\n    show_progress=True,\n    return_stats=True\n)\n\n# Print pruning statistics\nprint(f\"Original layers: {stats['original_layers']}\")\nprint(f\"Pruned layers: {stats['pruned_layers']}\")\nprint(f\"Reduction: {stats['reduction']:,} parameters ({stats['percentage_reduction']:.2f}%)\")\n\n# Save the pruned model\npruned_model.save_pretrained(\"./pruned-depth-llama-model\")\n```\n\n### Analyzing Layer Importance\nBefore performing Depth Pruning, you can analyze which layers are the most redundant. This function measures the cosine similarity between the input and output of each transformer layer—a higher similarity (lower distance score) suggests the layer has less impact on the final representation.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom torch.utils.data import DataLoader, TensorDataset\nimport torch\nfrom optipfair import analyze_layer_importance\n\n# 1. Load a model and tokenizer\nmodel_name = \"distilgpt2\"\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nif tokenizer.pad_token is None:\n    tokenizer.pad_token = tokenizer.eos_token\n\n# 2. Prepare a sample dataloader (this is the user's responsibility)\ndummy_texts = [\"This is a sample sentence.\" for _ in range(16)]\ninputs = tokenizer(dummy_texts, return_tensors=\"pt\", padding=True, truncation=True, max_length=128)\ndataset = TensorDataset(inputs['input_ids'], inputs['attention_mask'])\ndataloader = DataLoader(dataset, batch_size=8)\n\n# 3. 
Analyze layer importance\n# The function returns a dictionary {layer_index: importance_score}\n# where importance is 1 - cosine_similarity\nimportance_scores = analyze_layer_importance(model, dataloader)\n\n# 4. Print the results, sorted by importance (less important first)\nsorted_layers = sorted(importance_scores.items(), key=lambda item: item[1])\nprint(\"Layer importance scores (lower is less important):\")\nfor layer_idx, score in sorted_layers:\n    print(f\"  Layer {layer_idx}: {score:.4f}\")\n\n# The layers with the lowest scores are potential candidates for pruning.\n```\n\n\n### Visualizing Bias with the Python API\nGenerate visualizations to analyze how a model's activations differ when processing prompts with varying demographic attributes.\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom optipfair.bias import visualize_bias\n\n# Load model and tokenizer\nmodel_name = \"meta-llama/Llama-3.2-1B\"\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\n# Define prompt pairs that differ in demographic attributes\nprompt_pairs = [\n    (\"The white man walked through the neighborhood. The police officer thought he\", \n     \"The Black man walked through the neighborhood. 
The police officer thought he\")\n]\n\n# Generate visualizations and get bias metrics\n_, metrics = visualize_bias(\n    model, \n    tokenizer,\n    prompt_pairs=prompt_pairs,\n    visualization_types=[\"mean_diff\", \"pca\"],\n    layers=\"first_middle_last\",\n    output_dir=\"./bias_analysis\"\n)\n\n# Print overall bias metrics\noverall = metrics[\"pair_1\"][\"metrics\"][\"overall_metrics\"]\nprint(f\"Mean activation difference: {overall['mean_difference']:.6f}\")\n```\nThe code above generates the following types of visualizations, allowing for a deep dive into model fairness:\n| ![Mean Image Differences](/images/mean_image_differences.png) | ![activation_differences_layer.png](/images/activation_differences_layer.png) | ![pca_analysis.png](/images/pca_analysis.png)\n| ---| --- | --- |\n\n---\n## Dive Deeper: Features \u0026 Concepts\n\nFor those who want to understand the nuts and bolts of optipfair.\n\n### Supported Models\noptipfair is designed to work with transformer-based language models that use GLU architecture in their MLP layers. This includes most modern LLMs:\n\n* **LLaMA family** (LLaMA, LLaMA-2, LLaMA-3)\n* **Mistral** models\n* **Gemma** models\n* **Qwen** models\n* ... and other models with a similar GLU architecture.\n\n### Pruning Strategies: Neurons vs. Layers\n\noptipfair offers two powerful structured pruning strategies:\n\n1.  **MLP Pruning (Width Pruning)**: Reduces the number of neurons within the MLP layers of GLU-based models. This is a fine-grained approach to improve efficiency. You can control it via `pruning_percentage` or a target `expansion_rate`. It uses several neuron importance metrics:\n    * **PPM (Peak-to-Peak Magnitude)**: Default and most effective method (parameter `\"MAW\"` for backward compatibility).\n    * **VOW (Variance of Weights)**\n    * **PON (Product of Norms)**\n\n2.  **Depth Pruning (Layer Pruning)**: Removes entire transformer layers from the model. 
This is a more aggressive technique that can yield significant reductions in parameters and latency. It is compatible with most transformer architectures. You can define which layers to remove by:\n    * **Number**: `num_layers_to_remove=4`\n    * **Percentage**: `depth_pruning_percentage=25`\n    * **Specific Indices**: `layer_indices=[12, 13, 14, 15]`\n\n### Understanding Model Internals: Layer Importance Analysis\nBefore deciding which layers to remove with Depth Pruning, you can assess their relative importance. optipfair provides a method based on the cosine similarity between a layer's input and output embeddings.\n\n* **How it works**: The `analyze_layer_importance` function passes data through the model and uses hooks to capture the input and output of each transformer layer. It then calculates a score based on `1 - cosine_similarity`.\n* **Interpretation**: A low score indicates that a layer alters its input representation minimally. These layers are strong candidates for removal via Depth Pruning, as their impact on the model's overall function may be less critical. This analysis provides data-driven insights to guide your pruning strategy.\n\n---\n\n## 🗺️ Roadmap \u0026 Community\n\nThe optipfair project is actively developed. Here's what's planned for the future.\n\n### Future Roadmap\nOur goal is to make optipfair the go-to toolkit for efficient and fair model optimization. 
Recently implemented and upcoming features include:\n\n* **Selective Layer Width Pruning**: Implemented in v0.2.0 ✓ - Prune neurons in specific layers using `layer_indices`\n* **Data-Driven Width Pruning**: Implemented in v0.2.0 ✓ - Hybrid importance with calibration data\n* **Hardware-Optimized Pruning**: Implemented in v0.2.0 ✓ - `expansion_divisor` for GPU optimization\n* **Attention Pruning**: Implementing Attention Bypass and Adaptive Attention Bypass (AAB).\n* **Advanced Benchmarks**: Integrating more comprehensive performance and evaluation benchmarks.\n* **GPU Optimizations**: Creating a v2.0 with significant GPU-specific optimizations for faster execution.\n* **Large-Scale Model Support**: Adding compatibility for DeepSpeed and FSDP to handle 70B+ models efficiently.\n\n### 🤝 Contributing\nContributions are welcome! Whether it's bug reports, feature requests, or code contributions, please check out our [contributing guidelines](CONTRIBUTING.md) to get started.\n\n### Citation\nIf you use optipfair in your research or projects, please cite the library:\n\n```bibtex\n@misc{Martra2024optipfair,\n  author = {Martra, Pere},\n  title = {{optipfair: A Library for Structured Pruning and Bias Visualization of Large Language Models}},\n  year = {2024},\n  howpublished = {GitHub Repository},\n  url = {https://github.com/peremartra/optipfair},\n  note = {Version 0.2.0, accessed 14 November 2025}\n}\n```\n### License\nThis project is licensed under the Apache 2.0 License. See the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperemartra%2Foptipfair","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fperemartra%2Foptipfair","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fperemartra%2Foptipfair/lists"}