{"id":30348881,"url":"https://github.com/ricoledan/digimon-knowledge-graph","last_synced_at":"2025-08-18T19:14:31.819Z","repository":{"id":308619938,"uuid":"1033382314","full_name":"Ricoledan/digimon-knowledge-graph","owner":"Ricoledan","description":"👾 A comprehensive knowledge graph built from digimon.net/reference to analyze relationships between Digimon based on their characteristics, evolution patterns, and shared attributes.","archived":false,"fork":false,"pushed_at":"2025-08-10T22:48:56.000Z","size":4222,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-11T00:25:02.850Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ricoledan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-06T18:17:33.000Z","updated_at":"2025-08-10T22:48:59.000Z","dependencies_parsed_at":"2025-08-11T00:25:06.707Z","dependency_job_id":null,"html_url":"https://github.com/Ricoledan/digimon-knowledge-graph","commit_stats":null,"previous_names":["ricoledan/project-yggdrasil","ricoledan/digimon-knowledge-graph"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Ricoledan/digimon-knowledge-graph","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fdigimon-knowledge-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fdigimon-knowledge-graph/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fdigimon-knowledge-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fdigimon-knowledge-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ricoledan","download_url":"https://codeload.github.com/Ricoledan/digimon-knowledge-graph/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ricoledan%2Fdigimon-knowledge-graph/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271044663,"owners_count":24690001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-18T02:00:08.743Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-18T19:14:30.865Z","updated_at":"2025-08-18T19:14:31.799Z","avatar_url":"https://github.com/Ricoledan.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Digimon Knowledge Graph Project\n\nA comprehensive knowledge graph built from digimon.net/reference to analyze relationships between Digimon based on their characteristics, evolution patterns, and shared attributes.\n\n## Project Overview\n\n### What It Does\nThis project creates a searchable, analyzable network of all Digimon and their relationships by:\n1. 
**Collecting** - Scraping comprehensive data from the official Japanese Digimon reference\n2. **Translating** - Converting Japanese content to English for accessibility\n3. **Structuring** - Parsing unstructured HTML into organized data\n4. **Connecting** - Building a graph database of relationships\n5. **Analyzing** - Discovering patterns and insights through network analysis\n\n### Goals\n- **Comprehensive Data Collection**: Capture all 1,249+ Digimon with their complete profiles\n- **Relationship Mapping**: Identify evolution chains, type similarities, and shared attributes\n- **Pattern Discovery**: Uncover hidden connections and clustering in the Digimon universe\n- **Research Platform**: Provide a queryable database for fans and researchers\n- **Technical Demonstration**: Showcase modern data engineering practices\n\n### Expected Outcomes\n- **Complete Digimon Database**: Neo4j graph with all Digimon as nodes\n- **Relationship Network**: Edges representing evolutions, shared types, attributes, and moves\n- **Analytical Insights**: Statistics on type distributions, evolution patterns, and network centrality\n- **Visual Reports**: Network visualizations and analysis charts\n- **Query Interface**: Cypher queries for exploring specific relationships\n\n## Documentation\n\n### Analysis Documentation\n- **[Analysis Specification](docs/analysis-specification.md)**: Comprehensive specification for the 8-notebook analysis suite\n- **[Methodology Guide](docs/methodology.md)**: Detailed statistical methods, algorithms, and ML approaches\n- **[Visualization Guide](docs/visualization-guide.md)**: Complete specifications for 30+ visualizations\n- **[Insights Summary](docs/insights-summary.md)**: Expected findings, metrics, and practical applications\n\n### Analysis Notebooks Overview\n1. **Data Exploration \u0026 Profiling**: Dataset statistics and quality assessment\n2. **Evolution Network Analysis**: Evolution chains and branching patterns\n3. 
**Type-Attribute Correlation**: Statistical relationships and pattern mining\n4. **Move Network Analysis**: Move-based connections and clustering\n5. **Community Detection**: Graph clustering and natural groupings\n6. **Centrality \u0026 Influence**: Network importance metrics\n7. **Machine Learning**: Predictive models with 85%+ accuracy\n8. **Recommendation System**: Similarity metrics and team optimization\n\n## Architecture\n\n### System Architecture\n\n#### Overall Architecture\nThe system follows a modular pipeline architecture where each component has a specific responsibility in the data processing flow.\n\n```\n┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐\n│                 │     │                 │     │                 │\n│  digimon.net    │────▶│   Scraper       │────▶│  Raw HTML       │\n│  (Data Source)  │     │   (Async)       │     │  Storage        │\n│                 │     │                 │     │                 │\n└─────────────────┘     └─────────────────┘     └─────────────────┘\n                                                          │\n                                                          ▼\n┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐\n│                 │     │                 │     │                 │\n│  Translation    │◀────│   Parser        │◀────│  Structured     │\n│  (Google API)   │     │   (BS4)         │     │  JSON Data      │\n│                 │     │                 │     │                 │\n└─────────────────┘     └─────────────────┘     └─────────────────┘\n                                │\n                                ▼\n┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐\n│                 │     │                 │     │                 │\n│  Neo4j Graph    │◀────│   Loader        │     │   Analysis      │\n│  Database       │     │   (py2neo)      │────▶│   (NetworkX)    │\n│                 │     │                 │     │                 
│\n└─────────────────┘     └─────────────────┘     └─────────────────┘\n```\n\n#### Data Flow Pipeline\nThis diagram shows how data flows through the system from source to analysis, including all intermediate storage layers.\n\n```mermaid\nflowchart LR\n    subgraph DS[\"Data Sources\"]\n        A[digimon.net/reference]\n    end\n    \n    subgraph DP[\"Data Pipeline\"]\n        B[Scraper\u003cbr/\u003eBeautifulSoup4]\n        C[Parser\u003cbr/\u003eHTML → JSON]\n        D[Translator\u003cbr/\u003eJP → EN]\n        E[Loader\u003cbr/\u003eJSON → Neo4j]\n    end\n    \n    subgraph ST[\"Storage\"]\n        F[(Raw HTML\u003cbr/\u003eFiles)]\n        G[(Parsed JSON\u003cbr/\u003eFiles)]\n        H[(Translated\u003cbr/\u003eJSON)]\n        I[(Neo4j\u003cbr/\u003eGraph DB)]\n    end\n    \n    subgraph AN[\"Analysis\"]\n        J[NetworkX\u003cbr/\u003eAnalyzer]\n        K[Notebooks\u003cbr/\u003e\u0026 Visualizations]\n    end\n    \n    A --\u003e|HTTP Requests| B\n    B --\u003e|Save| F\n    F --\u003e|Read| C\n    C --\u003e|Save| G\n    G --\u003e|Read| D\n    D --\u003e|Cache| H\n    H --\u003e|Read| E\n    E --\u003e|Import| I\n    I --\u003e|Query| J\n    J --\u003e|Generate| K\n    \n    style DS fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style DP fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style ST fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style AN fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style A fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style B fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style C fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style D fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style E fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style F fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style G fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style H fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style I 
fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style J fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style K fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n```\n\n#### System Components\nThis diagram illustrates the modular architecture showing how the CLI interface connects to core modules and infrastructure.\n\n```mermaid\ngraph TB\n    subgraph CI[\"CLI Interface\"]\n        CLI[ygg CLI\u003cbr/\u003eClick Framework]\n    end\n    \n    subgraph CM[\"Core Modules\"]\n        SCR[Scraper Module\u003cbr/\u003e• Rate Limiting\u003cbr/\u003e• Async Support\u003cbr/\u003e• Error Handling]\n        PRS[Parser Module\u003cbr/\u003e• BeautifulSoup4\u003cbr/\u003e• CSS Selectors\u003cbr/\u003e• Data Extraction]\n        TRN[Translator Module\u003cbr/\u003e• Google Translate\u003cbr/\u003e• Caching System\u003cbr/\u003e• Batch Processing]\n        LDR[Loader Module\u003cbr/\u003e• Neo4j Driver\u003cbr/\u003e• Schema Creation\u003cbr/\u003e• Relationship Building]\n        ANL[Analyzer Module\u003cbr/\u003e• NetworkX\u003cbr/\u003e• Graph Algorithms\u003cbr/\u003e• Statistics]\n    end\n    \n    subgraph IN[\"Infrastructure\"]\n        NEO[Neo4j Database\u003cbr/\u003eCommunity Edition]\n        FS[File System\u003cbr/\u003e• HTML Storage\u003cbr/\u003e• JSON Storage\u003cbr/\u003e• Cache Files]\n    end\n    \n    CLI --\u003e SCR\n    CLI --\u003e PRS\n    CLI --\u003e TRN\n    CLI --\u003e LDR\n    CLI --\u003e ANL\n    \n    SCR --\u003e FS\n    PRS --\u003e FS\n    TRN --\u003e FS\n    LDR --\u003e NEO\n    ANL --\u003e NEO\n    \n    style CI fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style CM fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style IN fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style CLI fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style SCR fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style PRS fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style TRN 
fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style LDR fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style ANL fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style NEO fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style FS fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n```\n\n### Data Flow\n\n1. **Data Collection Phase**\n   - API fetcher retrieves the list of all Digimon URLs\n   - Async scraper downloads HTML pages with rate limiting\n   - Raw HTML and images are stored locally\n\n2. **Processing Phase**\n   - Parser extracts structured data from HTML\n   - Identifies Japanese/English names, types, attributes, and moves\n   - Saves the results as JSON with a consistent schema\n\n3. **Translation Phase**\n   - Translates Japanese profile text to English\n   - Uses caching to avoid duplicate API calls\n   - Preserves the original Japanese for reference\n\n4. **Graph Construction Phase**\n   - Creates nodes for Digimon, Levels, Types, Attributes, and Moves\n   - Establishes relationships between entities\n   - Creates indexes for efficient querying\n\n5. **Analysis Phase**\n   - Network analysis identifies central Digimon\n   - Community detection finds clusters\n   - Evolution chain analysis traces evolution paths\n   - Statistical report generation\n\n### Key Components\n\n- **Scraper** (`src/scraper/`): Async web scraping with robots.txt compliance\n- **Parser** (`src/parser/`): BeautifulSoup-based HTML parsing\n- **Translator** (`src/processor/`): Google Translate API integration with caching\n- **Graph Loader** (`src/graph/`): Neo4j database population\n- **Analyzer** (`src/analysis/`): NetworkX-based graph analysis\n- **CLI** (`yggdrasil_cli.py`): Unified command-line interface\n\n## Quick Start\n\n```bash\n# Clone the repository\ngit clone https://github.com/Ricoledan/digimon-knowledge-graph.git project-yggdrasil\ncd project-yggdrasil\n\n# Enter Nix development environment\nnix develop\n\n# Install the CLI\npip install -e .\n\n# Start Neo4j and run the full pipeline\nygg start\nygg run\n```\n\nThat's it! 
These commands start Neo4j and run the entire pipeline.\n\n## Prerequisites\n\n- Docker \u0026 Docker Compose\n- Python 3.11+\n- One of: Nix (recommended), Poetry, or standard pip/venv\n\n## Environment Setup\n\n### Option 1: Nix (Recommended)\n```bash\n# Install Nix if you haven't already\ncurl -L https://nixos.org/nix/install | sh\n\n# Enable flakes by appending this setting to ~/.config/nix/nix.conf\nmkdir -p ~/.config/nix\necho \"experimental-features = nix-command flakes\" \u003e\u003e ~/.config/nix/nix.conf\n\n# Enter development shell\nnix develop\n\n# Or with direnv\ndirenv allow\n```\n\n### Option 2: Poetry\n```bash\n# Install Poetry\ncurl -sSL https://install.python-poetry.org | python3 -\n\n# Install dependencies\npoetry install\n\n# Activate shell\npoetry shell\n```\n\n### Option 3: pyenv + virtualenv\n```bash\n# Install Python 3.11 with pyenv\npyenv install 3.11.8\npyenv local 3.11.8\n\n# Create virtual environment\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n### Option 4: Standard virtualenv\n```bash\n# Create virtual environment\npython3.11 -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n## Complete Pipeline\n\n```bash\n# Run everything at once\nygg run\n\n# Or run individual steps\nygg scrape       # Scrape data\nygg parse        # Parse HTML to JSON\nygg translate    # Translate to English\nygg load         # Load into Neo4j\nygg analyze      # Run analysis\n```\n\n## Typical Workflows\n\n### First Time Setup\n```bash\n# 1. Clone and enter the project\ngit clone https://github.com/Ricoledan/digimon-knowledge-graph.git project-yggdrasil\ncd project-yggdrasil\n\n# 2. Enter Nix environment (installs Python, dependencies, etc.)\nnix develop\n\n# 3. Install the CLI tool\npip install -e .\n\n# 4. Start Neo4j\nygg start\n\n# 5. Run the full pipeline\nygg run\n```\n\n### Returning to the Project\n```bash\n# 1. 
Enter project and Nix environment\ncd project-yggdrasil\nnix develop  # or use direnv\n\n# 2. Check current status\nygg status\n\n# 3. Start Neo4j if needed\nygg start\n\n# 4. Continue where you left off\nygg run  # or a specific step, e.g. 'ygg translate'\n```\n\n### Common Scenarios\n\n#### Scenario: Scraping failed midway\n```bash\n# Check what was scraped\nygg status\n\n# Clean up partial data\nygg prune --keep-cache\n\n# Restart scraping\nygg scrape --fetch-api\n```\n\n#### Scenario: Want to test with a small dataset\n```bash\n# Scrape just a few pages for testing\npython -m src.scraper.main --limit 10\n\n# Then run the rest of the pipeline\nygg parse\nygg translate\nygg load\nygg analyze\n```\n\n#### Scenario: Need to restart from scratch\n```bash\n# Stop Neo4j\nygg stop\n\n# Clean everything, including the Neo4j database\nygg prune --include-neo4j\n\n# Start fresh\nygg start\nygg run\n```\n\n#### Scenario: Just want to explore the data\n```bash\n# Make sure Neo4j is running\nygg start\n\n# Open the Neo4j Browser\n# Go to: http://localhost:7474\n# Login: neo4j / digimon123\n\n# Example queries:\n# - MATCH (d:Digimon) RETURN d LIMIT 25\n# - MATCH (d:Digimon {name_en: \"Agumon\"})-[r]-\u003e(other) RETURN d, r, other\n```\n\n### Troubleshooting\n\n**Issue: \"command not found: ygg\"**\n```bash\n# Make sure you're in the Nix environment\nnix develop\n\n# Reinstall the CLI\npip install -e .\n```\n\n**Issue: Scraping shows \"success=0\"**\n```bash\n# The installed package may predate the save_html fix; force a reinstall\npip install -e . 
--force-reinstall --no-deps\n\n# Clean and restart\nygg prune --keep-cache\nygg scrape --fetch-api\n```\n\n**Issue: Neo4j won't start**\n```bash\n# Check if Docker is running\ndocker ps\n\n# Check logs\nygg logs\n\n# Try starting the container manually\ndocker-compose up -d\n```\n\n**Issue: Translation is taking too long**\n```bash\n# Translation uses caching, so you can safely interrupt (Ctrl+C)\n# and resume later; cached items are not retranslated\nygg translate\n```\n\n### Time Estimates\n- **Scraping**: ~40-50 minutes for all 1,249 Digimon\n- **Parsing**: ~5 minutes\n- **Translation**: ~60-90 minutes on the first run (much faster once the cache is populated)\n- **Loading**: ~5 minutes\n- **Analysis**: ~1 minute\n- **Total**: ~2-3 hours for the complete pipeline\n\n## Analysis Methodology\n\n### Statistical Methods\n- **Chi-Square Tests**: Testing whether type and attribute are independent\n- **Cramér's V**: Measuring association strength between categorical variables\n- **Markov Chains**: Modeling evolution transition probabilities\n- **Permutation Tests**: Validating network properties against random models\n\n### Network Analysis Algorithms\n- **Centrality Measures**: Degree, Betweenness, Closeness, Eigenvector, PageRank\n- **Community Detection**: Louvain, Label Propagation, Spectral Clustering\n- **Path Analysis**: Shortest paths, evolution chains, cycle detection\n- **Graph Embeddings**: Node2Vec, DeepWalk for similarity computation\n\n### Machine Learning Approaches\n- **Classification**: Random Forest, XGBoost, Neural Networks for type/attribute prediction\n- **Link Prediction**: Graph Neural Networks for evolution prediction\n- **Feature Engineering**: Graph features, text embeddings, move similarity\n- **Model Validation**: Cross-validation, learning curves, SHAP interpretability\n\n### Expected Insights\n- **Network Properties**: Small-world network with diameter 6-10 and a scale-free degree distribution\n- **Evolution Patterns**: 2-4 evolution paths per Digimon, 72% type stability through evolution\n- 
**Community Structure**: 8-12 natural communities aligned with thematic groups\n- **Predictive Power**: 85%+ accuracy in type prediction using graph features\n\n## Project Structure\n\n```\nproject-yggdrasil/\n├── src/                    # Source code\n│   ├── scraper/           # Web scraping \u0026 API integration\n│   │   ├── fetcher.py     # Async HTML scraper\n│   │   ├── api_fetcher.py # API endpoint discovery\n│   │   └── robots_checker.py # Robots.txt compliance\n│   ├── parser/            # HTML parsing \u0026 data extraction\n│   │   ├── html_parser.py # BeautifulSoup parser\n│   │   └── main.py        # Parser orchestration\n│   ├── processor/         # Data processing \u0026 translation\n│   │   ├── translator.py  # Google Translate integration\n│   │   └── main.py        # Processing pipeline\n│   ├── graph/             # Neo4j database layer\n│   │   ├── loader.py      # Graph construction\n│   │   └── main.py        # Database operations\n│   ├── analysis/          # Network analysis \u0026 insights\n│   │   └── main.py        # NetworkX analysis\n│   └── utils/             # Shared utilities\n│       ├── config.py      # Configuration management\n│       ├── cache.py       # Translation caching\n│       └── logger.py      # Logging setup\n│\n├── data/                  # Data storage\n│   ├── raw/              # Original scraped content\n│   │   ├── html/         # HTML pages\n│   │   └── images/       # Digimon images\n│   ├── processed/        # Parsed JSON data\n│   ├── translated/       # English translations\n│   └── cache/            # Translation cache\n│\n├── notebooks/            # Analysis notebooks\n│   ├── 01_data_exploration.ipynb\n│   ├── 02_evolution_analysis.ipynb\n│   ├── 03_type_correlation.ipynb\n│   ├── 04_move_network.ipynb\n│   ├── 05_community_detection.ipynb\n│   ├── 06_centrality_analysis.ipynb\n│   ├── 07_machine_learning.ipynb\n│   └── 08_recommendations.ipynb\n│\n├── docs/                 # Documentation\n│   ├── 
analysis-specification.md\n│   ├── methodology.md\n│   ├── visualization-guide.md\n│   └── insights-summary.md\n│\n├── yggdrasil_cli.py      # CLI interface (ygg command)\n├── docker-compose.yml    # Neo4j container setup\n├── config.yaml           # Application configuration\n├── requirements.txt      # Python dependencies\n├── pyproject.toml        # Poetry/packaging config\n└── flake.nix             # Nix development environment\n```\n\n## Configuration\n\nEdit the `.env` file:\n```env\n# Scraping settings\nSCRAPE_DELAY=1.0  # Be respectful!\nMAX_RETRIES=3\n\n# Neo4j connection\nNEO4J_URI=bolt://localhost:7687\nNEO4J_USER=neo4j\nNEO4J_PASSWORD=digimon123\n```\n\n## Data Model\n\n### Neo4j Graph Schema\n\n```mermaid\ngraph TD\n    subgraph NT[\"Node Types\"]\n        D[Digimon\u003cbr/\u003e• name_jp\u003cbr/\u003e• name_en\u003cbr/\u003e• profile\u003cbr/\u003e• image_url]\n        L[Level\u003cbr/\u003e• name\u003cbr/\u003e• order]\n        T[Type\u003cbr/\u003e• name]\n        A[Attribute\u003cbr/\u003e• name]\n        M[Move\u003cbr/\u003e• name\u003cbr/\u003e• description]\n    end\n    \n    D --\u003e|HAS_LEVEL| L\n    D --\u003e|HAS_TYPE| T\n    D --\u003e|HAS_ATTRIBUTE| A\n    D --\u003e|CAN_USE| M\n    D --\u003e|RELATED_TO| D\n    \n    subgraph SR[\"Similarity Relationships\"]\n        D2[Digimon] -.-\u003e|SHARES_TYPE| D3[Digimon]\n        D2 -.-\u003e|SHARES_LEVEL| D3\n        D2 -.-\u003e|SHARES_ATTRIBUTE| D3\n        D2 -.-\u003e|SHARES_MOVE| D3\n    end\n    \n    style NT fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style SR fill:#666,stroke:#333,stroke-width:2px,color:#fff\n    style D fill:#2a2a2a,stroke:#888,stroke-width:2px,color:#fff\n    style L fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style T fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style A fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style M fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style D2 
fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n    style D3 fill:#444,stroke:#666,stroke-width:1px,color:#ccc\n```\n\n### Graph Schema Details\n```\nNodes:\n├── Digimon (Primary Entity)\n│   ├── name_jp: Japanese name\n│   ├── name_en: English name\n│   ├── profile_jp: Original description\n│   ├── profile_en: Translated description\n│   └── image_url: Character image\n│\n├── Level (Evolution Stage)\n│   └── name: Baby, Rookie, Champion, Ultimate, Mega, etc.\n│\n├── Type (Species Classification)\n│   └── name: Dragon, Machine, Beast, Angel, Demon, etc.\n│\n├── Attribute (Alignment)\n│   └── name: Vaccine, Virus, Data, Free, Variable\n│\n└── Move (Special Attacks)\n    └── name: Attack/technique name\n\nRelationships:\n├── (Digimon)-[:HAS_LEVEL]-\u003e(Level)\n├── (Digimon)-[:HAS_TYPE]-\u003e(Type)\n├── (Digimon)-[:HAS_ATTRIBUTE]-\u003e(Attribute)\n├── (Digimon)-[:CAN_USE]-\u003e(Move)\n├── (Digimon)-[:EVOLVES_FROM]-\u003e(Digimon)\n├── (Digimon)-[:RELATED_TO]-\u003e(Digimon)\n├── (Digimon)-[:SHARES_TYPE]-\u003e(Digimon)\n├── (Digimon)-[:SHARES_LEVEL]-\u003e(Digimon)\n├── (Digimon)-[:SHARES_ATTRIBUTE]-\u003e(Digimon)\n└── (Digimon)-[:SHARES_MOVE]-\u003e(Digimon)\n```\n\n## Example Insights \u0026 Queries\n\n### Network Analysis Results\nAfter analyzing the complete graph, the system discovers:\n\n1. **Most Connected Digimon** - Network hubs that share many relationships\n2. **Evolution Chains** - Complete paths from Baby to Mega level\n3. **Type Clusters** - Groups of similar Digimon based on shared characteristics\n4. **Rare Combinations** - Unique type/attribute pairings\n5. 
**Move Popularity** - Most common special attacks across species\n\n### Sample Neo4j Queries\n\n```cypher\n// Find all Dragon-type Mega level Digimon\nMATCH (d:Digimon)-[:HAS_TYPE]-\u003e(t:Type {name: \"Dragon Type\"})\nMATCH (d)-[:HAS_LEVEL]-\u003e(l:Level {name: \"Mega\"})\nRETURN d.name_en, d.name_jp\nORDER BY d.name_en;\n\n// Discover evolution paths leading to a specific Digimon\n// (follows EVOLVES_FROM back through its pre-evolutions)\nMATCH path = (target:Digimon {name_en: \"Omegamon\"})-[:EVOLVES_FROM*]-\u003e(ancestor:Digimon)\nRETURN path;\n\n// Find Digimon that share the most moves with Agumon\nMATCH (agumon:Digimon {name_en: \"Agumon\"})-[:CAN_USE]-\u003e(m:Move)\nMATCH (other:Digimon)-[:CAN_USE]-\u003e(m)\nWHERE other \u003c\u003e agumon\nRETURN other.name_en, COUNT(m) as shared_moves\nORDER BY shared_moves DESC\nLIMIT 10;\n\n// Identify type distribution by level\nMATCH (d:Digimon)-[:HAS_LEVEL]-\u003e(l:Level)\nMATCH (d)-[:HAS_TYPE]-\u003e(t:Type)\nRETURN l.name as Level, t.name as Type, COUNT(d) as Count\nORDER BY Level, Count DESC;\n\n// Find the shortest path between two Digimon\nMATCH path = shortestPath(\n  (d1:Digimon {name_en: \"Agumon\"})-[*]-(d2:Digimon {name_en: \"Gabumon\"})\n)\nRETURN path;\n```\n\n## Development\n\n### CLI Commands\n```bash\nygg start        # Start Neo4j database\nygg stop         # Stop Neo4j database\nygg status       # Check pipeline progress\nygg run          # Run the complete pipeline\nygg prune        # Clean up data files\nygg prune --include-neo4j  # Clean data AND Neo4j\nygg --help       # Show all commands\n```\n\n### Run Tests\n```bash\npytest tests/\n```\n\n### Code Formatting\n```bash\nblack src/\nruff check src/\n```\n\n### Type Checking\n```bash\nmypy src/\n```\n\n### Jupyter Notebooks\n```bash\n# Run locally after activating your Python environment\njupyter notebook\n# Or with JupyterLab\njupyter lab\n```\n\n## Docker Services\n\n- **Neo4j**: Graph database (ports 7474, 7687)\n- **Neo4j Browser**: Web UI at http://localhost:7474\n\n## Environment Variables\n\n| Variable | 
Description | Default |\n|----------|-------------|---------|\n| `NEO4J_URI` | Neo4j connection string | `bolt://localhost:7687` |\n| `SCRAPE_DELAY` | Seconds between requests | `1.0` |\n| `LOG_LEVEL` | Logging verbosity | `INFO` |\n| `DEBUG` | Enable debug mode | `false` |\n\n## Quick Reference\n\n### Essential Commands\n```bash\nygg start        # Start Neo4j\nygg stop         # Stop Neo4j\nygg status       # Check progress\nygg run          # Run full pipeline\nygg prune        # Clean data files\n```\n\n### Pipeline Steps (in order)\n```bash\nygg scrape --fetch-api  # 1. Scrape (40-50 min)\nygg parse               # 2. Parse (5 min)\nygg translate           # 3. Translate (60-90 min)\nygg load                # 4. Load to Neo4j (5 min)\nygg analyze             # 5. Analyze (1 min)\n```\n\n### Maintenance\n```bash\nygg prune               # Clean all data files\nygg prune --keep-cache  # Keep translations\nygg prune --include-neo4j  # Clean everything\nygg logs                # View Neo4j logs\nygg db-status           # Check database\n```\n\n### Key Files\n- `.env` - Configuration\n- `data/raw/html/` - Scraped HTML\n- `data/processed/` - Parsed JSON\n- `data/translated/` - English data\n- `data/cache/translations.json` - Translation cache\n\n## License\n\nMIT License - see LICENSE file\n\n## Author\n\nRicardo Ledan \u003cricardoledan@proton.me\u003e\n\n## Acknowledgments\n\n- Data source: [digimon.net/reference](https://digimon.net/reference/)\n- Built with Claude, Neo4j, Python, and Cafe Bustelo coffee","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fricoledan%2Fdigimon-knowledge-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fricoledan%2Fdigimon-knowledge-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fricoledan%2Fdigimon-knowledge-graph/lists"}