{"id":30860539,"url":"https://github.com/rhoadesscholar/rose","last_synced_at":"2025-09-07T16:06:04.819Z","repository":{"id":307422275,"uuid":"1028604778","full_name":"rhoadesScholar/RoSE","owner":"rhoadesScholar","description":"PyTorch implementation of Rotary Spatial Embeddings","archived":false,"fork":false,"pushed_at":"2025-08-26T20:10:04.000Z","size":100,"stargazers_count":2,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-04T18:44:34.605Z","etag":null,"topics":["attention","positional-encoding","pytorch","rotary-position-embedding","rotary-position-encoding","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rhoadesScholar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-29T19:25:57.000Z","updated_at":"2025-08-26T20:07:05.000Z","dependencies_parsed_at":"2025-07-31T07:10:04.467Z","dependency_job_id":"732c6538-7a6e-4b6b-8011-45e602a00a66","html_url":"https://github.com/rhoadesScholar/RoSE","commit_stats":null,"previous_names":["rhoadesscholar/rose"],"tags_count":14,"template":false,"template_full_name":null,"purl":"pkg:github/rhoadesScholar/RoSE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhoadesScholar%2FRoSE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhoadesScholar%2FRoSE/tags","releases_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/rhoadesScholar%2FRoSE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhoadesScholar%2FRoSE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rhoadesScholar","download_url":"https://codeload.github.com/rhoadesScholar/RoSE/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rhoadesScholar%2FRoSE/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274058956,"owners_count":25215201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","positional-encoding","pytorch","rotary-position-embedding","rotary-position-encoding","transformer"],"created_at":"2025-09-07T16:06:02.834Z","updated_at":"2025-09-07T16:06:04.809Z","avatar_url":"https://github.com/rhoadesScholar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RoSE N-dimensional Rotary Spatial Embeddings\n\n## Original implementation of Rotary Spatial Embeddings (in PyTorch)\n\n![GitHub - License](https://img.shields.io/github/license/rhoadesScholar/RoSE)\n[![CI/CD 
Pipeline](https://github.com/rhoadesScholar/RoSE/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/rhoadesScholar/RoSE/actions/workflows/ci-cd.yml)\n[![codecov](https://codecov.io/github/rhoadesScholar/RoSE/graph/badge.svg?token=PPT4ZNZZCJ)](https://codecov.io/github/rhoadesScholar/RoSE)\n![PyPI - Version](https://img.shields.io/pypi/v/rotary-spatial-embeddings)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/rotary-spatial-embeddings)\n\n\nRotary Spatial Embeddings (RoSE) extends [2D Rotary Position Embeddings (RoPE)](https://arxiv.org/abs/2403.13298) and the original [1D RoPE](https://arxiv.org/pdf/2104.09864) to incorporate spatial information, expressed as N-dimensional real-world coordinates, into the embeddings. This is particularly useful for tasks that require understanding of spatial relationships across different scales, such as in microscopy.\n\n## Explanation\n\n### 1 Relative phase in 1-D RoPE\n\nWrite the 1-D RoPE positional factor for token $t$ as a per-token complex phase\n\n```math\n\\phi(t)=e^{\\,i\\,t\\theta},\\qquad t\\in\\mathbb Z .\n```\n\nAfter you attach that phase to query $q_t$ and key $k_t$,\n\n```math\n\\tilde q_t = q_t\\;\\phi(t),\\qquad\n\\tilde k_t = k_t\\;\\phi(t)^{*},\n```\n\nwhere $^*$ denotes complex conjugation, their dot-product inside attention becomes\n\n```math\n\\tilde q_n\\,\\tilde k_m^{}\n\\;=\\; q_n\\,k_m^{}\\,\n\\underbrace{\\phi(n)\\,\\phi(m)^{*}}_{=\\,e^{\\,i\\,(n-m)\\theta}} .\n```\n\n---\n\n### 2 Extending to N dimensions\n\nGive every token a coordinate vector\n$\\mathbf{p}=(x,y,z,\\dots)\\in\\mathbb R^{N}.$\n\nDefine its phase as\n\n```math\n\\phi(\\mathbf{p}) \\;=\\;e^{\\,i\\,\\langle\\mathbf{p},\\,\\boldsymbol\\theta\\rangle},\n\\qquad\n\\langle\\mathbf{p},\\boldsymbol\\theta\\rangle\n=\\sum_{a=1}^{N} p_a\\,\\theta_a 
.\n```\n\nThen\n\n```math\n\\phi(\\mathbf{p}_n)\\,\\phi(\\mathbf{p}_m)^{*}\n\\;=\\;\ne^{\\,i\\,\\langle\\mathbf{p}_n-\\mathbf{p}_m,\\;\\boldsymbol\\theta\\rangle},\n```\n\nwhich is the ND generalisation of the 1-D $e^{\\,i\\,(n-m)\\theta}$.\nYou still get\n\n```math\nA_{nm}\\;=\\;\\mathrm{Re}\n\\bigl[q_n k_m^{*}\\;e^{\\,i\\,\\langle\\mathbf{p}_n-\\mathbf{p}_m,\n\\boldsymbol\\theta\\rangle}\\bigr],\n```\n\nwhile keeping the per-token encoding cost $O(LD)$.\n\n**Partial Rotation**: RoSE also supports partial rotation via the `rotary_ratio` parameter, where only a fraction of the embedding dimensions are rotated while the rest are passed through unchanged. This provides a balance between spatial awareness and computational efficiency.\n\n---\n\n### 3 Embedding real-world coordinates\n\nIn many applications, such as microscopy or 3D point clouds, the coordinates are not just indices but represent real-world positions that may contain useful spatial information. RoSE allows for injecting these coordinates directly into the rotary embeddings by simply multiplying the coordinate vectors by the coordinate spacing (i.e. 
voxel size) before applying the rotary embedding.\n\n---\n\n## Installation\n\n### From PyPI\n\n```bash\npip install rotary-spatial-embeddings\n```\n\n### From source\n\n```bash\npip install git+https://github.com/rhoadesScholar/RoSE.git\n```\n\n## Usage\n\n### Basic Usage - Multi-Head Attention with Spatial Embeddings\n\n```python\nimport torch\nfrom RoSE import RoSEMultiHeadCrossAttention\n\n# Create RoSE multi-head attention layer\nlayer = RoSEMultiHeadCrossAttention(\n    dim=128,\n    num_heads=8,\n    spatial_dims=3,\n    learnable=True,\n    base_theta=1e4,\n    rotary_ratio=1.0  # Apply rotation to all dimensions (default)\n)\n\nbatch_size, seq_len = 2, 1000\nq = torch.randn(batch_size, seq_len, 128)\nk = torch.randn(batch_size, seq_len, 128)\n\n# Define spatial grid properties\ngrid_shape = (10, 10, 10)  # 3D grid dimensions\nspacing = (1.0, 1.0, 1.0)  # Physical size of each voxel\n\n# Compute attention scores with spatial embeddings\nattn_scores = layer(q, k, spacing, grid_shape)  # Shape: (batch_size, num_heads, seq_len, seq_len)\n```\n\n### Partial Rotation with `rotary_ratio`\n\nThe `rotary_ratio` parameter allows you to apply rotary embeddings to only a fraction of the embedding dimensions, which can be beneficial for performance and model capacity:\n\n```python\nimport torch\nfrom RoSE import RotarySpatialEmbedding\n\n# Apply rotation to only 50% of the embedding dimensions\nembedding = RotarySpatialEmbedding(\n    dim=128,\n    num_heads=8,\n    spatial_dims=2,\n    rotary_ratio=0.5,  # Only rotate first 50% of dimensions per head\n    learnable=False\n)\n\nbatch_size, seq_len = 2, 100\nx = torch.randn(batch_size, seq_len, 128)\n\n# In total, 64 of the 128 dimensions (50%) are rotated, split per head as noted above\n# The remaining 64 dimensions are passed through unchanged\nx_embedded = embedding(x, spacing=(0.5, 0.5), grid_shape=(10, 10))\n```\n\n**Key benefits of partial rotation:**\n\n- **Performance**: Reduces computational cost for large embeddings\n- **Flexibility**: 
Allows some dimensions to encode non-spatial information\n- **Stability**: Can improve training stability in some scenarios\n- **Memory**: Lower memory usage for frequency parameters\n\n### Using Just the Embedding Layer\n\n```python\nimport torch\nfrom RoSE import RotarySpatialEmbedding\n\n# Create just the rotary spatial embedding layer\nembedding = RotarySpatialEmbedding(\n    dim=128,\n    num_heads=8,\n    spatial_dims=2,\n    learnable=False,\n    frequency_scaling=\"sqrt\",\n    rotary_ratio=1.0  # Apply rotation to all dimensions (default)\n)\n\nbatch_size, seq_len = 2, 100\nx = torch.randn(batch_size, seq_len, 128)\n\n# Define 2D grid\ngrid_shape = (10, 10)\nspacing = (0.5, 0.5)\n\n# Apply rotary spatial embeddings\nx_embedded = embedding(x, spacing, grid_shape)  # Shape: (batch_size, seq_len, 128)\n```\n\n## Parameters\n\n### Core Parameters\n\n- **`dim`**: Total embedding dimension (must be even and divisible by `num_heads`)\n- **`num_heads`**: Number of attention heads\n- **`spatial_dims`**: Number of spatial dimensions (2 for 2D, 3 for 3D, etc.)\n- **`rotary_ratio`**: Fraction of embedding dimensions to apply rotation to (0.0 to 1.0, default: 1.0)\n  - `1.0`: Apply rotation to all dimensions (full rotation)\n  - `0.5`: Apply rotation to 50% of dimensions per head\n  - `0.0`: No rotation applied (passthrough)\n\n### Advanced Parameters\n\n- **`base_theta`**: Base frequency for rotary embeddings (default: 10000.0)\n- **`learnable`**: Whether frequencies should be learnable parameters (default: True)\n- **`init_jitter_std`**: Standard deviation for frequency initialization jitter (default: 0.02)\n- **`frequency_scaling`**: Scaling strategy for frequencies (default: \"sqrt\")\n  - `\"none\"`: No frequency scaling\n  - `\"linear\"`: Linear scaling with spatial dimensions\n  - `\"sqrt\"`: Square root scaling with spatial dimensions\n  - `\"adaptive\"`: Adaptive scaling based on spatial dims and embedding dim\n\n## Advanced Examples\n\n### Working with 3D 
Medical Imaging Data\n\n```python\nimport torch\nfrom RoSE import RotarySpatialEmbedding\n\n# Example: 3D CT scan with anisotropic voxel spacing\nbatch_size, seq_len = 1, 8000  # 20x20x20 volume flattened\nembedding_dim = 256\nnum_heads = 8\n\n# Create embedding layer for 3D medical data\nembedding = RotarySpatialEmbedding(\n    dim=embedding_dim,\n    num_heads=num_heads,\n    spatial_dims=3,\n    learnable=True,\n    rotary_ratio=0.75,  # Rotate 75% of dimensions\n    frequency_scaling=\"adaptive\"\n)\n\n# Define anisotropic voxel spacing (common in medical imaging)\ngrid_shape = (20, 20, 20)\nvoxel_spacing = (0.5, 0.5, 2.0)  # 0.5mm x 0.5mm x 2mm\n\n# Your input features (e.g., from a CNN backbone)\nx = torch.randn(batch_size, seq_len, embedding_dim)\n\n# Apply spatial embeddings\nx_with_spatial = embedding(x, voxel_spacing, grid_shape)\nprint(f\"Input shape: {x.shape}\")\nprint(f\"Output shape: {x_with_spatial.shape}\")\n```\n\n### Multi-Scale Microscopy Analysis\n\n```python\nimport torch\nfrom RoSE import RoSEMultiHeadCrossAttention\n\n# Example: Multi-scale microscopy with different zoom levels\ndef create_multiscale_attention():\n    return RoSEMultiHeadCrossAttention(\n        dim=512,\n        num_heads=16,\n        spatial_dims=2,\n        learnable=True,\n        base_theta=1e4,\n        rotary_ratio=1.0  # Full rotation for spatial awareness\n    )\n\n# Different scales: 10x, 40x, 100x magnification\nscales_and_spacings = [\n    ((100, 100), (1.0, 1.0)),      # 10x: 1μm/pixel\n    ((200, 200), (0.25, 0.25)),    # 40x: 0.25μm/pixel\n    ((400, 400), (0.1, 0.1)),      # 100x: 0.1μm/pixel\n]\n\nattention_layer = create_multiscale_attention()\n\nfor i, (grid_shape, spacing) in enumerate(scales_and_spacings):\n    seq_len = grid_shape[0] * grid_shape[1]\n\n    # Simulate features from different magnifications\n    q = torch.randn(1, seq_len, 512)\n    k = torch.randn(1, seq_len, 512)\n\n    # Compute attention with spatial awareness\n    attn_scores = 
attention_layer(q, k, spacing, grid_shape)\n\n    print(f\"Scale {i+1}: {grid_shape} grid, {spacing} spacing\")\n    print(f\"Attention shape: {attn_scores.shape}\\n\")\n```\n\n### Custom Coordinate Systems\n\n```python\nimport torch\nfrom RoSE import RotarySpatialEmbedding\n\n# Example: Geographic coordinate system (lat/lon/elevation)\nclass GeospatialEmbedding(torch.nn.Module):\n    def __init__(self, dim, num_heads):\n        super().__init__()\n        self.spatial_embedding = RotarySpatialEmbedding(\n            dim=dim,\n            num_heads=num_heads,\n            spatial_dims=3,  # lat, lon, elevation\n            learnable=True,\n            frequency_scaling=\"adaptive\"\n        )\n\n    def forward(self, x, coordinates):\n        \"\"\"\n        Args:\n            x: Features [B, N, D]\n            coordinates: [B, N, 3] tensor with [lat, lon, elevation]\n        \"\"\"\n        # Normalize coordinates to reasonable scales\n        # (shown for illustration only; the grid-based call below does not use them)\n        lat_scale, lon_scale, elev_scale = 1/90, 1/180, 1/1000\n        normalized_coords = coordinates * torch.tensor([lat_scale, lon_scale, elev_scale])\n\n        # Convert to grid format (this is a simplified example)\n        # In practice, you'd need proper coordinate-to-grid mapping\n        batch_size, seq_len, _ = coordinates.shape\n        # Round the cube root: float error makes 1000 ** (1/3) slightly less than 10\n        grid_size = round(seq_len ** (1/3))\n        if grid_size ** 3 != seq_len:\n            grid_size = 10  # Fallback for sequence lengths that are not perfect cubes\n        grid_shape = (grid_size, grid_size, grid_size)\n        spacing = (lat_scale, lon_scale, elev_scale)\n\n        return self.spatial_embedding(x, spacing, grid_shape)\n\n# Usage\ngeo_embedding = GeospatialEmbedding(dim=256, num_heads=8)\nfeatures = torch.randn(2, 1000, 256)\ncoordinates = torch.randn(2, 1000, 3)  # Random lat/lon/elevation\nresult = geo_embedding(features, coordinates)\n```\n\n### Integration with Transformers\n\n```python\nimport torch\nimport torch.nn as nn\nfrom RoSE import RotarySpatialEmbedding\n\nclass SpatialTransformerBlock(nn.Module):\n    \"\"\"Transformer block with 
spatial awareness via RoSE.\"\"\"\n\n    def __init__(self, dim, num_heads, spatial_dims=2):\n        super().__init__()\n        self.spatial_embedding = RotarySpatialEmbedding(\n            dim=dim,\n            num_heads=num_heads,\n            spatial_dims=spatial_dims,\n            learnable=True\n        )\n\n        self.attention = nn.MultiheadAttention(\n            embed_dim=dim,\n            num_heads=num_heads,\n            batch_first=True\n        )\n\n        self.norm1 = nn.LayerNorm(dim)\n        self.norm2 = nn.LayerNorm(dim)\n\n        self.mlp = nn.Sequential(\n            nn.Linear(dim, 4 * dim),\n            nn.GELU(),\n            nn.Linear(4 * dim, dim)\n        )\n\n    def forward(self, x, spacing, grid_shape):\n        # Apply spatial embeddings\n        x_spatial = self.spatial_embedding(x, spacing, grid_shape)\n\n        # Self-attention with spatial embeddings\n        attn_out, _ = self.attention(x_spatial, x_spatial, x_spatial)\n        x = self.norm1(x + attn_out)\n\n        # MLP\n        mlp_out = self.mlp(x)\n        x = self.norm2(x + mlp_out)\n\n        return x\n\n# Example usage\ntransformer = SpatialTransformerBlock(dim=256, num_heads=8, spatial_dims=2)\nx = torch.randn(4, 100, 256)  # Batch of sequences\nresult = transformer(x, spacing=(1.0, 1.0), grid_shape=(10, 10))\nprint(f\"Transformer output shape: {result.shape}\")\n```\n\n## Tips and Best Practices\n\n1. **Voxel Spacing**: Always provide real-world spacing when available - it significantly improves spatial understanding\n2. **Rotary Ratio**: Start with `rotary_ratio=1.0` for maximum spatial awareness, then experiment with lower values for efficiency\n3. **Learnable Frequencies**: Set `learnable=True` for fine-tuning on your specific spatial domain\n4. **Frequency Scaling**: Use `\"adaptive\"` scaling for most applications, `\"sqrt\"` for simpler cases\n5. 
**Grid Shape**: Ensure your sequence length matches `prod(grid_shape)` for proper spatial mapping\n\n## License\n\nBSD 3-Clause License. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhoadesscholar%2Frose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frhoadesscholar%2Frose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frhoadesscholar%2Frose/lists"}