https://github.com/abdellatif-laghjaj/lcm
- Host: GitHub
- URL: https://github.com/abdellatif-laghjaj/lcm
- Owner: abdellatif-laghjaj
- License: MIT
- Created: 2025-05-03T10:31:18.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-05-06T13:39:41.000Z (6 months ago)
- Last Synced: 2025-05-11T18:16:37.882Z (6 months ago)
- Language: Python
- Size: 179 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Large Concept Model (LCM) - Practical Implementation
This repository contains the implementation accompanying our technical report, "Towards Practical Concept-Based Language Models: An Efficiency-Focused Implementation". Our work demonstrates significant efficiency gains in language processing through concept-based approaches.
## Key Features
- 🚀 3.8× faster inference through sentence-level processing
- 📉 Linear memory scaling (O(n)) for long sequences
- 🌍 Multilingual support with minimal performance drop
- 💡 Adaptive concept quantization
- 🔄 Hybrid attention mechanism
- 📊 Geometric regularization for semantic fidelity
- ⚡ Diffusion-based concept refinement (NEW)
- 🏗️ Two-Tower architecture for improved performance (NEW)
## Installation
```bash
# Clone the repository
git clone https://github.com/arimanyus/large-concept-model
cd large-concept-model
# Create a virtual environment
python -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
## Quick Start
```python
from model import LCM, DiffusionLCM, TwoTowerDiffusionLCM
# Initialize base model
base_model = LCM.from_pretrained('lcm-base')
# Or use the more advanced diffusion model
diffusion_model = DiffusionLCM(diffusion_steps=10)
# Process text (with whichever model you initialized)
concepts = base_model.extract_concepts("Your input text here")
output = base_model.generate(concepts)
```
## Model Architectures
This repository implements three different Large Concept Model architectures:
### 1. Base-LCM
Our original implementation, which processes concepts in a transformer decoder. It works in three steps (sketched in code below):
- Converting text to sentence embeddings (concepts)
- Processing these concepts with a transformer
- Projecting concepts back to text space
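A minimal sketch of these three steps, assuming PyTorch and sentence-transformers. The class names and the linear projection head are illustrative, not the repository's actual API, and an encoder stack stands in for the decoder-style stack described above:
```python
# Minimal sketch of the Base-LCM pipeline; names and the projection head
# are illustrative assumptions, not the repository's actual API.
import torch
from sentence_transformers import SentenceTransformer

# 1. Concept formation: one embedding per sentence.
encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
sentences = ["The cat sat on the mat.", "It purred quietly."]
concepts = torch.tensor(encoder.encode(sentences))       # (num_sentences, 768)

# 2. Concept processing: a small transformer over the concept sequence.
layer = torch.nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
concept_model = torch.nn.TransformerEncoder(layer, num_layers=4)
processed = concept_model(concepts.unsqueeze(0))         # (1, num_sentences, 768)

# 3. Projection back toward text space (a linear head as a stand-in).
projection = torch.nn.Linear(768, 768)
output_concepts = projection(processed)
```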
### 2. Diffusion LCM (One-Tower)
Based on Meta's One-Tower architecture, this diffusion-based model refines concept predictions iteratively:
- Starts with a noisy concept representation
- Uses diffusion process to iteratively denoise the concept
- Achieves better performance by considering multiple plausible outputs
```bash
python train.py --model_type diffusion --diffusion_steps 10
```
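In code, the iterative denoising idea looks roughly like the following sketch; the denoiser, conditioning, and update rule are placeholders, not the repository's exact implementation:
```python
# Schematic of diffusion-based concept refinement; the denoiser and the
# update rule are placeholders, not the repository's exact implementation.
import torch

def refine_concept(denoiser, context, dim=768, diffusion_steps=10):
    """Start from pure noise and iteratively denoise toward a concept."""
    x = torch.randn(1, dim)                      # noisy initial concept
    for step in reversed(range(diffusion_steps)):
        t = torch.full((1,), step, dtype=torch.long)
        # The denoiser predicts a cleaner concept from the noisy one,
        # conditioned on the timestep and the encoded input context.
        x = denoiser(x, t, context)
    return x

# Dummy denoiser so the sketch runs end to end: pull x toward the context.
dummy = lambda x, t, ctx: 0.9 * x + 0.1 * ctx
concept = refine_concept(dummy, context=torch.zeros(1, 768))
```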
### 3. Two-Tower Diffusion LCM
Based on Meta's Two-Tower architecture, this model uses separate transformers for input and output:
- First tower processes input concepts
- Second tower processes output concepts
- Cross-attention mechanism connects the two towers
- Better separation of encoder and decoder functionality
```bash
python train.py --model_type two_tower --diffusion_steps 10
```
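A rough PyTorch sketch of the two-tower layout, omitting the diffusion loop for clarity. The class and its dimensions are illustrative assumptions, not the repository's `TwoTowerDiffusionLCM`:
```python
# Sketch of the two-tower layout with cross-attention; names and
# hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn

class TwoTowerSketch(nn.Module):
    def __init__(self, dim=768, nhead=8, layers=4):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        # Tower 1: contextualizes the input concepts.
        self.input_tower = nn.TransformerEncoder(enc_layer, num_layers=layers)
        dec_layer = nn.TransformerDecoderLayer(dim, nhead, batch_first=True)
        # Tower 2: processes output concepts, cross-attending to tower 1.
        self.output_tower = nn.TransformerDecoder(dec_layer, num_layers=layers)

    def forward(self, input_concepts, output_concepts):
        memory = self.input_tower(input_concepts)
        return self.output_tower(output_concepts, memory)

model = TwoTowerSketch()
src = torch.randn(1, 5, 768)   # 5 input sentence concepts
tgt = torch.randn(1, 3, 768)   # 3 output concepts being refined
out = model(src, tgt)          # (1, 3, 768)
```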
## Training
To train your own model:
```bash
# --model_type options: base, diffusion, two_tower
python train.py \
--model_type base \
--batch_size 32 \
--learning_rate 5e-5 \
--epochs 5 \
--encoder_model sentence-transformers/all-mpnet-base-v2 \
--dataset cnn_dailymail \
--dataset_version 3.0.0
```
For diffusion-based models, you can also specify:
```bash
# --model_type: diffusion or two_tower
# --diffusion_steps sets the number of denoising steps
python train.py \
--model_type diffusion \
--diffusion_steps 10 \
--batch_size 16  # smaller batch size due to increased memory usage
```
## Generation
Generate summaries using the trained models:
```bash
# --model_path: path to a trained checkpoint
# --model_type options: base, diffusion, two_tower
# --method options: retrieval, generation
python generate.py \
--model_path base_model.pt \
--model_type base \
--method retrieval \
--temperature 1.0  # controls retrieval sampling
```
For diffusion-based generation:
```bash
# --model_type: diffusion or two_tower
python generate.py \
--model_type diffusion \
--method generation \
--diffusion_steps 10  # more steps = higher quality, slower generation
```
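For the `retrieval` method, a plausible mechanism is nearest-neighbour sampling over a pre-embedded sentence bank, with `--temperature` controlling how peaked the sampling distribution is. The sketch below is an assumption about how such a mode could work, not the repository's exact code:
```python
# Sketch of retrieval-based decoding: map a predicted concept back to text
# by sampling from a pre-embedded sentence bank (assumed mechanism).
import torch
import torch.nn.functional as F

def retrieve_sentence(concept, bank_embeddings, bank_sentences, temperature=1.0):
    # Cosine similarity between the predicted concept and every bank entry.
    sims = F.cosine_similarity(concept.unsqueeze(0), bank_embeddings)  # (N,)
    # Lower temperature -> sharper distribution -> closer to argmax.
    probs = F.softmax(sims / temperature, dim=0)
    idx = torch.multinomial(probs, num_samples=1).item()
    return bank_sentences[idx]

# Example with a toy bank of three sentences.
bank = torch.randn(3, 768)
sentences = ["First option.", "Second option.", "Third option."]
print(retrieve_sentence(torch.randn(768), bank, sentences, temperature=0.5))
```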
## Evaluation
Run evaluation on standard benchmarks:
```bash
python evaluate.py \
--model_path path/to/model \
--dataset cnn_dailymail
```
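For a standalone sanity check of ROUGE-L outside `evaluate.py`, the `rouge-score` package (`pip install rouge-score`) works as below; note that `evaluate.py` may use a different scorer internally:
```python
# Standalone ROUGE-L check using the rouge-score package; this may differ
# from whatever scorer evaluate.py uses internally.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "The quick brown fox jumps over the lazy dog."
summary = "A quick fox jumps over a lazy dog."
score = scorer.score(reference, summary)["rougeL"]
print(f"ROUGE-L F1: {score.fmeasure:.3f}")
```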
## Architecture Components
Our implementation consists of three main components:
1. **Concept Formation**: Converts text to compressed concept embeddings
2. **Concept Processing**:
- Base-LCM: 4-layer transformer
- Diffusion LCM: 6-layer transformer with diffusion process
- Two-Tower: Separate encoder and decoder transformers
3. **Hybrid Generation**: Combines concept- and token-level processing (see the sketch below)
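At its simplest, the hybrid combination can be pictured as a fixed-weight blend with weight α (0.7 in the experiments below). Where exactly the blend sits in the network is an implementation detail this sketch glosses over:
```python
# The hybrid idea at its simplest: a fixed-weight blend of concept-level
# and token-level features (alpha = 0.7 in the experiments). The blending
# site within the network is an assumption glossed over here.
import torch

def hybrid_combine(concept_repr, token_repr, alpha=0.7):
    return alpha * concept_repr + (1 - alpha) * token_repr

mixed = hybrid_combine(torch.randn(1, 10, 768), torch.randn(1, 10, 768))
```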
## Hyperparameters
Key hyperparameters used in our experiments:
| Parameter | Base | Diffusion | Two-Tower |
| -------------------- | ---- | --------- | --------- |
| Learning Rate | 5e-5 | 5e-5 | 5e-5 |
| Batch Size | 32 | 16 | 16 |
| Transformer Layers | 4 | 6 | 4+4 |
| Attention Heads | 8 | 8 | 8 |
| Diffusion Steps | - | 10 | 10 |
| α (Hybrid Attention) | 0.7 | 0.7 | 0.7 |
## Results
Our models achieve:
- Base-LCM: retains 82% of BART's ROUGE-L score
- Diffusion LCM: retains 87%
- Two-Tower LCM: retains 89%
- All models show strong multilingual performance, with only a 4-8% average performance drop
## Visualization
Generate concept space visualizations:
```bash
python visualize.py --embedding_dir path/to/embeddings
```
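As an illustration of what such a visualization involves, here is a minimal t-SNE projection with scikit-learn and matplotlib; `visualize.py` may use a different projection, and the embedding file layout shown is an assumption:
```python
# Minimal t-SNE projection of saved concept embeddings; visualize.py may
# use a different method, and the .npy layout here is an assumption.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

embeddings = np.load("path/to/embeddings/concepts.npy")   # (N, 768) assumed
coords = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], s=8)
plt.title("Concept space (t-SNE)")
plt.savefig("concept_space.png", dpi=150)
```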
## Citation
If you use this code in your research, please cite our paper:
```bibtex
@article{tiwari2024towards,
  title={Towards Practical Concept-Based Language Models: An Efficiency-Focused Implementation},
  author={Tiwari, Vivek K.},
  journal={arXiv preprint arXiv:2024.6154975},
  year={2024}
}
```
## Acknowledgments
- The authors of the original LCM paper from Meta AI
- IBM for technical guidance
- The open-source NLP community
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
We welcome contributions! Please check our [contributing guidelines](CONTRIBUTING.md) for details.
## Contact
- Vivek K. Tiwari - vivek.tiwari4@ibm.com / vivek3312@gmail.com
- Project Link: https://github.com/arimanyus/large-concept-model