# Network Configuration Parser AI Trainer

**\*** Cursor was used to generate the READMEs and to structure the code.

🤖 An AI-powered system that learns from Ansible collection parsers to generate regex-based configuration parsers for network devices.

## 🎯 What It Does

- **Learns from existing parsers**: Trains on your Ansible collection's resource module templates
- **Generates new parsers**: Creates regex patterns and Jinja2 templates for new configurations
- **Validates patterns**: Suggests improvements for existing parsers
- **Handles multiple vendors**: Works with Ansible network collections that follow the resource module layout (`argspec` and `rm_templates` directories)

## 🚀 Quick Start

### 1. Set Up the Environment

```bash
# Activate your ML environment first (if using pyenv with the pyenv-virtualenv plugin;
# "ai-test" is an example environment name)
pyenv activate ai-test

# Install dependencies into the active environment
pip install -r requirements.txt
```

### 2. Configure Collections

Edit `collection_config.yaml` to point to your Ansible collections:

```yaml
collection_base_path: '/path/to/your/ansible_collections'

collections:
  - name: 'cisco_ios'
    argspec_path: 'cisco/ios/plugins/module_utils/network/ios/argspec'
    rm_template_path: 'cisco/ios/plugins/module_utils/network/ios/rm_templates'
    enabled: true
```

### 3. Train the Model

```bash
# Dry run to verify the setup
python train_from_collections.py --dry-run

# Train the model
python train_from_collections.py
```

### 4. Use the Trained Model

```python
from src.collection_trainer import CollectionBasedTrainer

trainer = CollectionBasedTrainer()
parser = trainer.load_and_test_model(
    "./trained_models/multi_vendor_parser_model",  # model saved in step 3
    "bgp additional-paths install receive",        # config line to parse
    your_argspec_dict,                             # argspec dict for the target resource module
)
```

## 📁 Project Structure

```
rm_model_trainer/
├── src/                          # Core source code
│   ├── trainer.py                # Main AI trainer class
│   ├── collection_trainer.py     # Collection-specific trainer
│   └── data_prep_utils.py        # Data preparation utilities
├── examples/                     # Usage examples
│   ├── use_trained_model.py      # Simple model usage
│   ├── api_usage_example.py      # Advanced API examples
│   └── example_usage.py          # Detailed examples
├── trained_models/               # Saved models (created after training)
├── collection_config.yaml        # Collection configuration
├── train_from_collections.py     # Main training script
└── requirements.txt              # Python dependencies
```

## 🔧 Configuration

### Collection Configuration (`collection_config.yaml`)

```yaml
# Base path to your Ansible collections
collection_base_path: '/home/user/ansible_collections'

# Model settings
model:
  name: 'multi_vendor_parser_model'
  storage_path: './trained_models'

# Collections to train on
collections:
  - name: 'cisco_ios'
    argspec_path: 'cisco/ios/plugins/module_utils/network/ios/argspec'
    rm_template_path: 'cisco/ios/plugins/module_utils/network/ios/rm_templates'
    enabled: true

  - name: 'cisco_nxos'
    argspec_path: 'cisco/nxos/plugins/module_utils/network/nxos/argspec'
    rm_template_path: 'cisco/nxos/plugins/module_utils/network/nxos/rm_templates'
    enabled: false # Disable for now

# Training parameters
training:
  batch_size: 4
  epochs: 10
  validation_split: 0.2
```

## 💡 Usage Examples

### Basic Usage

```python
from src.collection_trainer import CollectionBasedTrainer

# Initialize trainer
trainer = CollectionBasedTrainer()

# Generate parser for new config
config = "interface GigabitEthernet0/1 ip address 192.168.1.1 255.255.255.0"
argspec = {
    "interface": {
        "type": "dict",
        "options": {
            "name": {"type": "str"},
            "ip_address": {"type": "str"}
        }
    }
}

parser = trainer.load_and_test_model(
    "./trained_models/multi_vendor_parser_model",
    config,
    argspec
)
```

### Advanced API Usage

```python
from src.trainer import NetworkConfigParserAI

# Direct API access
parser_ai = NetworkConfigParserAI()
parser_ai.load_model("./trained_models/multi_vendor_parser_model")

# Generate a parser (`config` and `argspec` as defined in Basic Usage)
suggested_parser = parser_ai.generate_parser([config], argspec)

# Get improvement suggestions for a parser you already have
# (`existing_parser` and `config_lines` are your own objects)
suggestions = parser_ai.suggest_parser_improvements(existing_parser, config_lines)
```

## 🛠 Training Process

1. **Data Loading**: Extracts argspecs and parser templates from Ansible collections
2. **Data Preparation**: Creates training examples from existing parsers
3. **Model Training**: Fine-tunes a CodeBERT-based model on your data
4. **Model Saving**: Saves the trained model for future use

### Training Features

- ✅ **No Ansible Dependencies**: Parses collection files directly without importing Ansible
- ✅ **Automatic Path Detection**: Finds argspecs and templates in the collection structure
- ✅ **Multiple Collections**: Train on multiple vendor collections simultaneously
- ✅ **Progress Tracking**: Real-time training progress and metrics
- ✅ **Model Versioning**: Saves training metadata and model checkpoints

## 🎯 Use Cases

### 1. **New Device Support**

When adding support for a new network device, generate initial parsers:

```python
config = "spanning-tree vlan 100 priority 4096"
# AI suggests regex patterns and Jinja2 templates
# (`parser_ai` and `argspec` as in the Usage Examples)
suggested_parser = parser_ai.generate_parser([config], argspec)
```

### 2. **Parser Validation**

Check whether existing parsers handle new configuration variations:

```python
parser_ai.suggest_parser_improvements(existing_parser, new_config_examples)
```

### 3. **Configuration Analysis**

Understand the structure of unfamiliar network configurations:

```python
# Feed unfamiliar config lines to the model and review the structured parsing suggestions
```

## 🔍 Troubleshooting

### Common Issues

**Path Not Found Errors**

- Verify `collection_base_path` in `collection_config.yaml`
- Ensure the collections are properly installed
- Check that the argspec and rm_template paths are correct

**Import Errors**

- Run commands from the project root directory
- Ensure the project root (the directory that contains `src/`) is on the Python path, since the examples import `src.*` modules
- Check that all dependencies are installed

**Training Failures**

- Verify your PyTorch and Transformers versions (see Requirements)
- Check available GPU/CPU memory
- Reduce `training.batch_size` in `collection_config.yaml` if you run out of memory

### Debugging

```bash
# Verify collection paths
python train_from_collections.py --dry-run

# Check what data is loaded
python -c "from src.collection_trainer import CollectionDataLoader; loader = CollectionDataLoader(); print(loader.load_argspec_from_path('your/path'))"
```

## 📋 Requirements

- Python 3.8+
- PyTorch 2.0+
- Transformers 4.20+
- scikit-learn
- pandas
- PyYAML

See `requirements.txt` for the complete list.

## 🚫 No External Data Sync - Privacy First

**Training runs locally and sends none of your configuration data or training metrics to external services.** Network access is still needed to install dependencies and, typically, to download the CodeBERT base model from the Hugging Face Hub the first time, unless it is already cached.

- ✅ **No wandb** - All training metrics stay local
- ✅ **No tensorboard remote sync** - Only local files
- ✅ **No cloud uploads** - Everything is saved locally
- ✅ **Privacy focused** - Your data never leaves your machine

### Local Logging Only

Training logs are saved locally to:

- Console output for real-time progress
- `./trained_models/[model_name]/logs/` for detailed logs
- `./trained_models/[model_name]/training_metadata.json` for training info

### Extra Privacy Assurance

If you want to be extra sure wandb is disabled:

```bash
# Optional: Run this before training for extra assurance
python disable_wandb.py

# Then train normally
python train_from_collections.py
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests and examples
5. Submit a pull request

## 📄 License

[Add your license here]

## 🙏 Acknowledgments

- Built on Hugging Face Transformers
- Uses Microsoft CodeBERT as base model
- Designed for Ansible network collections
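## 🧩 Sketch: Applying an rm_template-Style Parser

The trainer learns from, and generates, parsers built from a regex plus a Jinja2 template. The sketch below is an illustration only, not code from this repository. It assumes the `getval`/`setval` layout used by parser entries in Ansible `rm_templates` files (real entries also carry keys such as `result` and are rendered with Jinja2). The parser name and the `parse_line`, `render_line`, and `find_unmatched` helpers are hypothetical. It uses only the Python standard library, with a minimal `{{ var }}` substitution standing in for Jinja2. It shows how such a parser turns the `spanning-tree vlan 100 priority 4096` line from the use cases into structured data, renders it back, and flags config variations it does not cover.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical parser entry modeled on the Ansible rm_templates layout:
# "getval" is the regex that matches a device config line, and "setval" is
# the Jinja2-style template that renders the line back from structured data.
SPANNING_TREE_PRIORITY: Dict[str, Any] = {
    "name": "spanning_tree_vlan_priority",  # hypothetical parser name
    "getval": re.compile(
        r"""
        ^spanning-tree\svlan\s(?P<vlan>\d+)
        \spriority\s(?P<priority>\d+)$
        """,
        re.VERBOSE,
    ),
    "setval": "spanning-tree vlan {{ vlan }} priority {{ priority }}",
}


def parse_line(parser: Dict[str, Any], line: str) -> Optional[Dict[str, int]]:
    """Apply the parser's regex; return typed fields, or None if it does not match."""
    match = parser["getval"].match(line.strip())
    if match is None:
        return None
    # Both named groups are numeric here, so convert them to int, as an
    # argspec declaring these options with type "int" would expect.
    return {key: int(value) for key, value in match.groupdict().items()}


def render_line(parser: Dict[str, Any], data: Dict[str, Any]) -> str:
    """Minimal stand-in for Jinja2: fills in plain {{ var }} placeholders only."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(data[m.group(1)]),
        parser["setval"],
    )


def find_unmatched(parser: Dict[str, Any], config_lines: List[str]) -> List[str]:
    """Return the config lines the parser's regex fails to match."""
    return [line for line in config_lines if parse_line(parser, line) is None]


if __name__ == "__main__":
    parsed = parse_line(SPANNING_TREE_PRIORITY, "spanning-tree vlan 100 priority 4096")
    print(parsed)  # {'vlan': 100, 'priority': 4096}
    assert parsed is not None
    print(render_line(SPANNING_TREE_PRIORITY, parsed))  # spanning-tree vlan 100 priority 4096

    # A VLAN list is a variation the single-VLAN regex above does not cover.
    print(find_unmatched(SPANNING_TREE_PRIORITY, [
        "spanning-tree vlan 100 priority 4096",
        "spanning-tree vlan 10,20 priority 8192",
    ]))  # ['spanning-tree vlan 10,20 priority 8192']
```

Checking regex coverage against real config lines like this is one simple way to surface the kind of gap that the Parser Validation use case describes.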