https://github.com/notedance/note-documentation

Machine learning library, Distributed training, Deep learning, Reinforcement learning, Models, TensorFlow, PyTorch
Host: GitHub
URL: https://github.com/notedance/note-documentation
Owner: NoteDance
Created: 2022-03-27T13:14:41.000Z (over 4 years ago)
Default Branch: layer-7.0
Last Pushed: 2025-08-20T08:05:45.000Z (11 months ago)
Last Synced: 2025-08-20T10:08:44.259Z (11 months ago)
Topics: deep-learning, machine-learning, machine-learning-library, reinforcement-learning, tensorflow
Homepage: https://github.com/NoteDance/Note
Size: 2.23 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# adaptive_avg_pooling1d

The `adaptive_avg_pooling1d` class implements a 1D adaptive average pooling layer. This layer reduces the input tensor along the specified dimension to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features. This can be an integer or a list/tuple of a single integer.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, steps, features)` while `channels_first` corresponds to inputs with shape `(batch, features, steps)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive average pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive average pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_avg_pooling1d

pooling_layer = nn.adaptive_avg_pooling1d(output_size=4)

# Generate some sample data

data = tf.random.normal((32, 16, 8))  # Batch of 32 samples, 16 timesteps, 8 features

# Apply adaptive average pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 8) for 'channels_last' data format

```
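
To make the "adaptive" part concrete, the following NumPy sketch computes the same kind of result by hand. It is not Note's implementation: the function name `adaptive_avg_pool1d_ref` is hypothetical, and the bin boundaries follow the PyTorch convention, which is an assumption here. When the input length is divisible by `output_size`, as in the example above, the bins are simply equal-sized chunks.

```python
import numpy as np

def adaptive_avg_pool1d_ref(x, output_size):
    """Reference adaptive average pooling over axis 1 of a (batch, steps, features) array.

    Assumption: PyTorch-style bin boundaries; Note's own implementation may differ.
    """
    steps = x.shape[1]
    bins = []
    for i in range(output_size):
        start = (i * steps) // output_size                          # floor(i * steps / out)
        end = ((i + 1) * steps + output_size - 1) // output_size    # ceil((i + 1) * steps / out)
        bins.append(x[:, start:end, :].mean(axis=1))
    return np.stack(bins, axis=1)

x = np.arange(2 * 16 * 3, dtype=np.float32).reshape(2, 16, 3)
y = adaptive_avg_pool1d_ref(x, output_size=4)
print(y.shape)  # (2, 4, 3)
```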

# adaptive_avg_pooling2d

The `adaptive_avg_pooling2d` class implements a 2D adaptive average pooling layer. This layer reduces the input tensor along the specified dimensions to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features as (pooled_rows, pooled_cols). This can be an integer or a list/tuple of two integers.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, height, width, channels)` while `channels_first` corresponds to inputs with shape `(batch, channels, height, width)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive average pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive average pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_avg_pooling2d

pooling_layer = nn.adaptive_avg_pooling2d(output_size=(4, 4))

# Generate some sample data

data = tf.random.normal((32, 16, 16, 8))  # Batch of 32 samples, 16x16 spatial dimensions, 8 channels

# Apply adaptive average pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 4, 8) for 'channels_last' data format

```

# adaptive_avg_pooling3d

The `adaptive_avg_pooling3d` class implements a 3D adaptive average pooling layer. This layer reduces the input tensor along the specified dimensions to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features as (pooled_dim1, pooled_dim2, pooled_dim3). This can be an integer or a list/tuple of three integers.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels)` while `channels_first` corresponds to inputs with shape `(batch, channels, spatial_dim1, spatial_dim2, spatial_dim3)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive average pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive average pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_avg_pooling3d

pooling_layer = nn.adaptive_avg_pooling3d(output_size=(4, 4, 4))

# Generate some sample data

data = tf.random.normal((32, 16, 16, 16, 8))  # Batch of 32 samples, 16x16x16 spatial dimensions, 8 channels

# Apply adaptive average pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 4, 4, 8) for 'channels_last' data format

```

# adaptive_max_pooling1d

The `adaptive_max_pooling1d` class implements a 1D adaptive max pooling layer. This layer reduces the input tensor along the specified dimension to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features as a single integer. This can be an integer or a list/tuple containing a single integer.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, steps, features)` while `channels_first` corresponds to inputs with shape `(batch, features, steps)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive max pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive max pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_max_pooling1d

pooling_layer = nn.adaptive_max_pooling1d(output_size=4)

# Generate some sample data

data = tf.random.normal((32, 16, 8))  # Batch of 32 samples, 16 steps, 8 features

# Apply adaptive max pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 8) for 'channels_last' data format

```

# adaptive_max_pooling2d

The `adaptive_max_pooling2d` class implements a 2D adaptive max pooling layer. This layer reduces the input tensor along the specified dimensions to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features as two integers, representing the number of pooled rows and columns. This can be an integer or a list/tuple containing two integers.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, height, width, channels)` while `channels_first` corresponds to inputs with shape `(batch, channels, height, width)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive max pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive max pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_max_pooling2d

pooling_layer = nn.adaptive_max_pooling2d(output_size=(4, 4))

# Generate some sample data

data = tf.random.normal((32, 16, 16, 8))  # Batch of 32 samples, 16x16 spatial dimensions, 8 channels

# Apply adaptive max pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 4, 8) for 'channels_last' data format

```
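
The sketch below shows how adaptive max pooling can handle spatial sizes that are not a multiple of `output_size`. It is an illustration in plain NumPy, not Note's code: the helper names are hypothetical and the overlapping-bin rule is the PyTorch convention, assumed here for the sake of the example.

```python
import numpy as np

def _bins(size, out):
    # (start, end) index pairs that split `size` positions into `out` bins.
    # Assumption: floor/ceil boundaries as in PyTorch; bins may overlap by one element.
    return [((i * size) // out, ((i + 1) * size + out - 1) // out) for i in range(out)]

def adaptive_max_pool2d_ref(x, output_size):
    """Reference adaptive max pooling for a (batch, height, width, channels) array."""
    out_h, out_w = output_size
    rows = []
    for h0, h1 in _bins(x.shape[1], out_h):
        cols = [x[:, h0:h1, w0:w1, :].max(axis=(1, 2)) for w0, w1 in _bins(x.shape[2], out_w)]
        rows.append(np.stack(cols, axis=1))   # (batch, out_w, channels)
    return np.stack(rows, axis=1)             # (batch, out_h, out_w, channels)

x = np.random.default_rng(0).normal(size=(2, 7, 10, 3))
y = adaptive_max_pool2d_ref(x, (4, 4))
print(y.shape)  # (2, 4, 4, 3)
```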

# adaptive_max_pooling3d

The `adaptive_max_pooling3d` class implements a 3D adaptive max pooling layer. This layer reduces the input tensor along the specified dimensions to a new size defined by `output_size`.

**Initialization Parameters**

- **`output_size`** (int or iterable of int): Specifies the desired size of the output features as three integers, representing the number of pooled dimensions. This can be an integer or a list/tuple containing three integers.

- **`data_format`** (str, default='channels_last'): Specifies the ordering of the dimensions in the input data. `channels_last` corresponds to inputs with shape `(batch, spatial_dim1, spatial_dim2, spatial_dim3, channels)` while `channels_first` corresponds to inputs with shape `(batch, channels, spatial_dim1, spatial_dim2, spatial_dim3)`.

**Methods**

- **`__call__(self, data)`**: Applies the adaptive max pooling operation to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor to be pooled.

  

  - **Returns**:

    - **`out_vect`** (tensor): Output tensor after adaptive max pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of adaptive_max_pooling3d

pooling_layer = nn.adaptive_max_pooling3d(output_size=(4, 4, 4))

# Generate some sample data

data = tf.random.normal((32, 16, 16, 16, 8))  # Batch of 32 samples, 16x16x16 spatial dimensions, 8 channels

# Apply adaptive max pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 4, 4, 4, 8) for 'channels_last' data format

```

# FastAdaptiveAvgPool

The `FastAdaptiveAvgPool` class implements fast adaptive average pooling for 2D inputs. It computes the average of the input tensor along the spatial dimensions.

**Initialization Parameters**

- **`flatten`** (bool, optional): If `True`, flattens the output. Default is `False`.

- **`input_fmt`** (str, optional): Specifies the format of the input tensor (`'NHWC'` or `'NCHW'`). Default is `'NHWC'`.

**Methods**

- **`__call__(self, x)`**: Applies adaptive average pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of FastAdaptiveAvgPool

avg_pool = nn.FastAdaptiveAvgPool(flatten=True)

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply average pooling

output = avg_pool(data)

```
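
As a rough picture of what global average pooling produces, here is a NumPy sketch. The function name is hypothetical, and keeping the spatial axes as size-1 dimensions when `flatten=False` is an assumption borrowed from the timm layer of the same name; check Note's source for the actual behaviour.

```python
import numpy as np

def global_avg_pool_ref(x, flatten=False, input_fmt="NHWC"):
    """Average over the spatial axes.

    Assumption: spatial axes are kept as size-1 axes unless flatten=True.
    """
    spatial_axes = (1, 2) if input_fmt == "NHWC" else (2, 3)
    return x.mean(axis=spatial_axes, keepdims=not flatten)

x = np.ones((2, 8, 8, 3), dtype=np.float32)
print(global_avg_pool_ref(x, flatten=True).shape)   # (2, 3)
print(global_avg_pool_ref(x, flatten=False).shape)  # (2, 1, 1, 3)
```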

# FastAdaptiveMaxPool

The `FastAdaptiveMaxPool` class implements fast adaptive max pooling for 2D inputs. It computes the maximum value of the input tensor along the spatial dimensions.

**Initialization Parameters**

- **`flatten`** (bool, optional): If `True`, flattens the output. Default is `False`.

- **`input_fmt`** (str, optional): Specifies the format of the input tensor (`'NHWC'` or `'NCHW'`). Default is `'NHWC'`.

**Methods**

- **`__call__(self, x)`**: Applies adaptive max pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of FastAdaptiveMaxPool

max_pool = nn.FastAdaptiveMaxPool(flatten=True)

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply max pooling

output = max_pool(data)

```

# FastAdaptiveAvgMaxPool

The `FastAdaptiveAvgMaxPool` class combines both average and max pooling for 2D inputs. It computes the average and maximum of the input tensor along the spatial dimensions and returns their mean.

**Initialization Parameters**

- **`flatten`** (bool, optional): If `True`, flattens the output. Default is `False`.

- **`input_fmt`** (str, optional): Specifies the format of the input tensor (`'NHWC'` or `'NCHW'`). Default is `'NHWC'`.

**Methods**

- **`__call__(self, x)`**: Applies combined adaptive average and max pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of FastAdaptiveAvgMaxPool

avg_max_pool = nn.FastAdaptiveAvgMaxPool(flatten=True)

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply average and max pooling

output = avg_max_pool(data)

```

# FastAdaptiveCatAvgMaxPool

The `FastAdaptiveCatAvgMaxPool` class concatenates the results of both average and max pooling for 2D inputs. It computes the average and maximum of the input tensor along the spatial dimensions and concatenates the results.

**Initialization Parameters**

- **`flatten`** (bool, optional): If `True`, flattens the output. Default is `False`.

- **`input_fmt`** (str, optional): Specifies the format of the input tensor (`'NHWC'` or `'NCHW'`). Default is `'NHWC'`.

**Methods**

- **`__call__(self, x)`**: Applies concatenated adaptive average and max pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Concatenated pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of FastAdaptiveCatAvgMaxPool

cat_avg_max_pool = nn.FastAdaptiveCatAvgMaxPool(flatten=True)

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply concatenated average and max pooling

output = cat_avg_max_pool(data)

```
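
The difference between `FastAdaptiveAvgMaxPool` and `FastAdaptiveCatAvgMaxPool` is easiest to see in the output shapes. The NumPy sketch below is illustrative only: the function names are hypothetical, and the concatenation order (average first, then max) is an assumption.

```python
import numpy as np

def avg_max_ref(x):
    # Mean of the global average and the global max over H and W (NHWC input).
    return 0.5 * (x.mean(axis=(1, 2)) + x.max(axis=(1, 2)))

def cat_avg_max_ref(x):
    # Global average and global max concatenated along the channel axis.
    # Assumption: average comes first; Note's ordering may differ.
    return np.concatenate([x.mean(axis=(1, 2)), x.max(axis=(1, 2))], axis=-1)

x = np.random.default_rng(0).normal(size=(2, 8, 8, 3))
print(avg_max_ref(x).shape)      # (2, 3)
print(cat_avg_max_ref(x).shape)  # (2, 6)
```

Concatenation doubles the channel count, so any layer that follows it needs twice as many input features.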

# AdaptiveAvgMaxPool2d

The `AdaptiveAvgMaxPool2d` class implements adaptive pooling that combines average and max pooling for 2D inputs.

**Initialization Parameters**

- **`output_size`** (int or tuple of int, optional): Specifies the output size. Default is `1`.

**Methods**

- **`__call__(self, x)`**: Applies adaptive average and max pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of AdaptiveAvgMaxPool2d

adaptive_avg_max_pool = nn.AdaptiveAvgMaxPool2d(output_size=(2, 2))

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply adaptive average and max pooling

output = adaptive_avg_max_pool(data)

```

# AdaptiveCatAvgMaxPool2d

The `AdaptiveCatAvgMaxPool2d` class implements adaptive pooling that concatenates the results of average and max pooling for 2D inputs.

**Initialization Parameters**

- **`output_size`** (int or tuple of int, optional): Specifies the output size. Default is `1`.

**Methods**

- **`__call__(self, x)`**: Applies adaptive concatenated average and max pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Concatenated pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of AdaptiveCatAvgMaxPool2d

adaptive_cat_avg_max_pool = nn.AdaptiveCatAvgMaxPool2d(output_size=(2, 2))

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply adaptive concatenated average and max pooling

output = adaptive_cat_avg_max_pool(data)

```

# SelectAdaptivePool2d

The `SelectAdaptivePool2d` class provides a selectable global pooling layer with dynamic input kernel size.

**Initialization Parameters**

- **`output_size`** (int or tuple of int, optional): Specifies the output size. Default is `1`.

- **`pool_type`** (str, optional): Specifies the type of pooling (`'fast'`, `'avgmax'`, `'catavgmax'`, `'max'`, or `'avg'`). Default is `'fast'`.

- **`flatten`** (bool, optional): If `True`, flattens the output. Default is `False`.

- **`input_fmt`** (str, optional): Specifies the format of the input tensor (`'NHWC'` or `'NCHW'`). Default is `'NHWC'`.

**Methods**

- **`is_identity(self)`**: Checks if the pool type is an identity (no pooling).

- **`__call__(self, x)`**: Applies the selected pooling method to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Pooled output tensor.

- **`feat_mult(self)`**: Returns the feature multiplier for the selected pooling method.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of SelectAdaptivePool2d

select_pool = nn.SelectAdaptivePool2d(pool_type='avgmax', flatten=True)

# Generate some sample data

data = tf.random.normal((2, 8, 8, 3))

# Apply selected pooling

output = select_pool(data)

```
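
The feature multiplier matters when sizing the layer that follows the pool. The pure-Python sketch below assumes the usual convention that only the concatenating pool type changes the channel count; `feat_mult_ref` and `backbone_channels` are hypothetical names, and the rule itself is an assumption rather than something confirmed from Note's source.

```python
def feat_mult_ref(pool_type):
    # Assumption: concatenating average and max pooling doubles the channel count,
    # while every other pool type leaves it unchanged.
    return 2 if pool_type == "catavgmax" else 1

backbone_channels = 512  # hypothetical number of channels entering the pool
for pool_type in ("avg", "max", "avgmax", "catavgmax"):
    print(pool_type, backbone_channels * feat_mult_ref(pool_type))
```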

# additive_attention

The `additive_attention` class implements an additive attention mechanism, which calculates attention scores as a nonlinear sum of query and key tensors.

**Initialization Parameters**

- **`input_size`** (int, optional): The size of the input tensor. If provided, it is used to initialize the scale parameter.

- **`use_scale`** (bool, default=True): Whether to use a learnable scale parameter.

- **`dtype`** (str, default='float32'): The data type of the input and scale parameter.

**Methods**

- **`__call__(self, query, key)`**: Computes the attention scores.

  - **Parameters**:

    - **`query`** (tensor): Query tensor of shape `[batch_size, Tq, dim]`.

    - **`key`** (tensor): Key tensor of shape `[batch_size, Tv, dim]`.

  

  - **Returns**:

    - **`Tensor`**: Attention scores of shape `[batch_size, Tq, Tv]`.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of additive_attention

attention_layer = nn.additive_attention(input_size=128)

# Generate some sample data

query = tf.random.normal((32, 10, 128))  # Batch of 32 samples, 10 query steps, 128 dimensions

key = tf.random.normal((32, 20, 128))    # Batch of 32 samples, 20 key steps, 128 dimensions

# Apply attention

output = attention_layer(query, key)

print(output.shape)  # Output shape will be (32, 10, 20)

```
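
For readers who want to see the "nonlinear sum" spelled out, here is a NumPy sketch of Bahdanau-style additive scoring, `scores[b, i, j] = sum_d scale[d] * tanh(query[b, i, d] + key[b, j, d])`. That Note uses exactly this formula is an assumption, and `additive_scores_ref` is a hypothetical name.

```python
import numpy as np

def additive_scores_ref(query, key, scale=None):
    """Additive attention scores (assumed formula, not taken from Note's source)."""
    q = query[:, :, None, :]     # (batch, Tq, 1, dim)
    k = key[:, None, :, :]       # (batch, 1, Tv, dim)
    hidden = np.tanh(q + k)      # (batch, Tq, Tv, dim)
    if scale is not None:
        hidden = hidden * scale  # per-dimension scale, broadcast over the last axis
    return hidden.sum(axis=-1)   # (batch, Tq, Tv)

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 10, 16))
key = rng.normal(size=(4, 20, 16))
scores = additive_scores_ref(query, key, scale=np.ones(16))
print(scores.shape)  # (4, 10, 20)
```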

# avg_pool1d

The `avg_pool1d` class performs 1D average pooling on the input tensor.

**Initialization Parameters**

- **`kernel_size`** (int): Size of the window for each dimension of the input tensor.

- **`strides`** (int): Stride of the sliding window for each dimension of the input tensor. Default is `None`. If `None`, it will default to `kernel_size`.

- **`padding`** (str, int, list, tuple): Padding applied to the input: either a padding string such as `'SAME'`, or an int/list/tuple giving the implicit zero padding on both sides. Default is `0`.

- **`count_include_pad`** (bool): Whether to include the zero padding in the average calculation. Default is `True`.

**Methods**

- **`__call__(self, data)`**: Applies 1D average pooling to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor of shape `[batch_size, length, channels]`.

  

  - **Returns**:

    - **`Tensor`**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of avg_pool1d

pooling_layer = nn.avg_pool1d(kernel_size=2, strides=2, padding='SAME')

# Generate some sample data

data = tf.random.normal((32, 100, 64))  # Batch of 32 samples, 100 steps, 64 channels

# Apply 1D average pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 50, 64)

```
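
The effect of `count_include_pad` is easiest to see on a tiny sequence. The NumPy sketch below is a hand-rolled reference, not Note's implementation; `avg_pool1d_ref` and its `pad` argument are hypothetical, with `pad` meaning the number of zeros added on each side.

```python
import numpy as np

def avg_pool1d_ref(x, kernel_size, stride, pad, count_include_pad=True):
    """Average-pool a 1-D sequence with `pad` zeros added on each side."""
    n = len(x)
    padded = np.pad(np.asarray(x, dtype=np.float64), pad)
    valid = np.pad(np.ones(n), pad)  # 1 where real data, 0 where padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window_sum = padded[start:start + kernel_size].sum()
        # Divide by the full window, or only by the number of real elements in it.
        count = kernel_size if count_include_pad else valid[start:start + kernel_size].sum()
        out.append(window_sum / count)
    return np.array(out)

x = [2.0, 4.0, 6.0, 8.0]
print(avg_pool1d_ref(x, kernel_size=2, stride=2, pad=1, count_include_pad=True))   # [1. 5. 4.]
print(avg_pool1d_ref(x, kernel_size=2, stride=2, pad=1, count_include_pad=False))  # [2. 5. 8.]
```

Only the border windows differ: with `count_include_pad=False` the padded zeros no longer pull the edge averages toward zero.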

# avg_pool2d

The `avg_pool2d` class performs 2D average pooling on the input tensor.

**Initialization Parameters**

- **`kernel_size`** (int or tuple of 2 ints): Size of the window for each dimension of the input tensor.

- **`strides`** (int or tuple of 2 ints): Stride of the sliding window for each dimension of the input tensor. Default is `None`. If `None`, it will default to `kernel_size`.

- **`padding`** (str, int, list, tuple): Padding applied to the input: either a padding string such as `'SAME'`, or an int/list/tuple giving the implicit zero padding on both sides. Default is `0`.

- **`count_include_pad`** (bool): Whether to include the zero padding in the average calculation. Default is `True`.

**Methods**

- **`__call__(self, data)`**: Applies 2D average pooling to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor of shape `[batch_size, height, width, channels]`.

  

  - **Returns**:

    - **`Tensor`**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of avg_pool2d

pooling_layer = nn.avg_pool2d(kernel_size=(2, 2), strides=(2, 2), padding='SAME')

# Generate some sample data

data = tf.random.normal((32, 64, 64, 3))  # Batch of 32 samples, 64x64 spatial dimensions, 3 channels

# Apply 2D average pooling

output = pooling_layer(data)

print(output.shape)  # Output shape will be (32, 32, 32, 3)

```

# avg_pool3d

The `avg_pool3d` class performs 3D average pooling on the input tensor.

**Initialization Parameters**

- **`kernel_size`** (int or tuple of 3 ints): Size of the window for each dimension of the input tensor.

- **`strides`** (int or tuple of 3 ints): Stride of the sliding window for each dimension of the input tensor. Default is `None`. If `None`, it will default to `kernel_size`.

- **`padding`** (str, int, list, tuple): Padding applied to the input: either a padding string such as `'SAME'`, or an int/list/tuple giving the implicit zero padding on both sides. Default is `0`.

- **`count_include_pad`** (bool): Whether to include the zero padding in the average calculation. Default is `True`.

**Methods**

- **`__call__(self, data)`**: Applies 3D average pooling to the input data.

  - **Parameters**:

    - **`data`** (tensor): Input tensor of shape `[batch_size, depth, height, width, channels]`.

  

  - **Returns**:

    - **`Tensor`**: Pooled output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of avg_pool3d

pooling_layer = nn.avg_pool3d(kernel_size=(2, 2, 2), strides=(2, 2, 2), padding='SAME')

# Generate some sample data

data = tf.random.normal((16, 32, 32, 32, 3))  # Batch of 16 samples, 32x32x32 spatial dimensions, 3 channels

# Apply 3D average pooling

output = pooling_layer(data)

print(output.shape)  # Expected (16, 16, 16, 16, 3) with kernel_size=2, strides=2, and 'SAME' padding

```
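
When a string padding mode is used, the pooled size along each spatial axis can be worked out ahead of time. The helper below is a sketch assuming TensorFlow's `'SAME'`/`'VALID'` conventions apply to these layers; `pooled_length` is a hypothetical name and is not part of Note.

```python
import math

def pooled_length(size, kernel_size, stride, padding):
    """Output length along one axis (assumes TensorFlow's 'SAME'/'VALID' rules)."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return (size - kernel_size) // stride + 1
    raise ValueError(f"unsupported padding: {padding!r}")

spatial = (32, 32, 32)
print(tuple(pooled_length(s, 2, 2, "SAME") for s in spatial))   # (16, 16, 16)
print(tuple(pooled_length(s, 3, 2, "VALID") for s in spatial))  # (15, 15, 15)
```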

# axial_positional_encoding

The `axial_positional_encoding` class generates axial positional encodings for Reformer models.

**Initialization Parameters**

- **`d_model`** (int): The dimension of the model embeddings.

- **`axial_shape`** (tuple of int): The shape of the input sequence, such as `(batch_size, seq_length)`.

- **`initializer`** (str): The initializer to use for the positional encoding weights (default is 'Xavier').

- **`trainable`** (bool): Whether the positional encodings are trainable (default is `True`).

- **`dtype`** (str): The data type of the positional encodings (default is 'float32').

**Methods**

- **`__call__(self, data)`**: Generates the axial positional encoding for the input tensor.

  - **Parameters**:

    - **`data`** (tensor): Input tensor of shape `[batch_size, seq_length, d_model]`.

  

  - **Returns**:

    - **`Tensor`**: Output tensor with axial positional encoding added.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of axial_positional_encoding

axial_pe = nn.axial_positional_encoding(d_model=512, axial_shape=(32, 128))

# Generate some sample data

data = tf.random.normal((32, 128, 512))  # Batch of 32 samples, 128 sequence length, 512 dimensions

# Apply axial positional encoding

output = axial_pe(data)

print(output.shape)  # Output shape will be (32, 128, 512)

```

# attention

The `attention` class implements an attention mechanism for neural networks, supporting both dot-product and concatenation-based attention scoring methods. It also allows for optional scaling of attention scores.

**Initialization Parameters**

- **`use_scale`** (bool): If `True`, scales the attention scores. Default is `False`.

- **`score_mode`** (str): The method to calculate attention scores. Options are `"dot"` (default) and `"concat"`.

- **`dtype`** (str): The data type for computations. Default is `'float32'`.

**Methods**

- **`__call__(self, query, value, key=None)`**: Applies the attention mechanism to the provided tensors.

  - **Parameters**:

    - **`query`** (Tensor): The query tensor.

    - **`value`** (Tensor): The value tensor.

    - **`key`** (Tensor, optional): The key tensor. If not provided, `value` is used as the key.

  - **Returns**:

    - **`Tensor`**: The result of the attention mechanism applied to the input tensors.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the attention class

att = nn.attention(use_scale=True, score_mode="dot", dtype='float32')

# Define sample query and value tensors

query = tf.random.normal(shape=(2, 5, 10))  # (batch_size, query_length, dim)

value = tf.random.normal(shape=(2, 6, 10))  # (batch_size, value_length, dim)

# Compute attention output

output = att(query, value)

print(output.shape)  # Should be (2, 5, 10)

```
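
The `"dot"` score mode can be pictured as standard dot-product attention. The NumPy sketch below is a reference under that assumption, not Note's code; `dot_attention_ref` and its scalar `scale` argument are hypothetical stand-ins for the layer and its learnable scale.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_attention_ref(query, value, key=None, scale=None):
    key = value if key is None else key         # value doubles as key when none is given
    scores = query @ np.swapaxes(key, -1, -2)   # (batch, Tq, Tv)
    if scale is not None:
        scores = scores * scale
    weights = softmax(scores, axis=-1)          # each query's weights sum to 1
    return weights @ value                      # (batch, Tq, dim)

rng = np.random.default_rng(0)
query = rng.normal(size=(2, 5, 10))
value = rng.normal(size=(2, 6, 10))
out = dot_attention_ref(query, value)
print(out.shape)  # (2, 5, 10)
```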

# AttentionPoolLatent

The `AttentionPoolLatent` class implements attention pooling with latent queries.

**Initialization Parameters**

- **`in_features`** (int): Number of input features.

- **`out_features`** (int, optional): Number of output features. Default is `in_features`.

- **`embed_dim`** (int, optional): Dimension of the embedding. Default is `in_features`.

- **`num_heads`** (int): Number of attention heads. Default is `8`.

- **`feat_size`** (int, optional): Size of the feature map.

- **`mlp_ratio`** (float): Ratio for the MLP hidden layer. Default is `4.0`.

- **`qkv_bias`** (bool): Whether to use bias in QKV projections. Default is `True`.

- **`qk_norm`** (bool): Whether to normalize Q and K. Default is `False`.

- **`latent_len`** (int): Length of the latent sequence. Default is `1`.

- **`latent_dim`** (int, optional): Dimension of the latent vector. Default is `embed_dim`.

- **`pos_embed`** (str): Type of positional embedding. Default is `''`.

- **`pool_type`** (str): Type of pooling ('token' or 'avg'). Default is `'token'`.

- **`norm_layer`** (callable, optional): Normalization layer.

- **`drop`** (float): Dropout rate. Default is `0.0`.

- **`use_fused_attn`** (bool): Whether to use fused attention. Default is `True`.

**Methods**

- **`__call__(self, x)`**: Applies attention pooling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor of shape `[B, N, C]`.

  - **Returns**: Output tensor after applying attention pooling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the AttentionPoolLatent layer

attn_pool = nn.AttentionPoolLatent(in_features=128)

# Generate some sample data

data = tf.random.normal((2, 100, 128))

# Apply attention pooling

output = attn_pool(data)

```

# RotAttentionPool2d

The `RotAttentionPool2d` class implements a multi-head attention-based 2D feature pooling with rotary (relative) positional embedding. It serves as a replacement for spatial average pooling in neural network architectures, adapted from the AttentionPool2d in CLIP by OpenAI.

**Initialization Parameters**

- **`in_features`** (int): Number of input features.

- **`out_features`** (int, optional): Number of output features. Defaults to `in_features`.

- **`ref_feat_size`** (int or tuple): Reference feature size. Default is `7`.

- **`embed_dim`** (int, optional): Dimension of the embedding. Defaults to `in_features`.

- **`head_dim`** (int, optional): Dimension of each attention head. Default is `64`.

- **`num_heads`** (int, optional): Number of attention heads. Automatically calculated if not provided.

- **`qkv_bias`** (bool): Whether to use bias in QKV projections. Default is `True`.

- **`qkv_separate`** (bool): Whether to use separate Q, K, V projections. Default is `False`.

- **`pool_type`** (str): Type of pooling. Must be 'token' or ''. Default is 'token'.

- **`class_token`** (bool): Whether to use a class token. Default is `False`.

- **`drop_rate`** (float): Dropout rate. Default is `0.0`.

- **`use_fused_attn`** (bool): Whether to use fused attention. Default is `True`.

**Methods**

- **`init_weights(self, zero_init_last=False)`**: Initializes the weights of the layer.

- **`reset(self, num_classes=None, pool_type=None)`**: Resets the projection layer and pool type.

  - **Parameters**:

    - **`num_classes`** (int, optional): Number of output classes. Defaults to `None`.

    - **`pool_type`** (str, optional): Type of pooling. Must be 'token' or ''. Defaults to `None`.

- **`__call__(self, x, pre_logits=False)`**: Applies the attention pooling to the input tensor.

  - **Parameters**:

    - **`x`**: Input tensor.

    - **`pre_logits`** (bool): Whether to return pre-logits output. Default is `False`.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the RotAttentionPool2d layer

attn_pool = nn.RotAttentionPool2d(in_features=64, embed_dim=128, num_heads=8)

# Generate some sample data

data = tf.random.normal((2, 16, 16, 64))

# Apply attention pooling

output = attn_pool(data)

```

# AttentionPool2d

The `AttentionPool2d` class implements a multi-head attention-based 2D feature pooling with learned (absolute) positional embedding. It is a replacement for spatial average pooling in neural network architectures, based on the implementation in CLIP by OpenAI.

**Initialization Parameters**

- **`in_features`** (int): Number of input features.

- **`feat_size`** (int or tuple): Feature size. Default is `7`.

- **`out_features`** (int, optional): Number of output features. Defaults to `in_features`.

- **`embed_dim`** (int, optional): Dimension of the embedding. Defaults to `in_features`.

- **`head_dim`** (int, optional): Dimension of each attention head. Default is `64`.

- **`num_heads`** (int, optional): Number of attention heads. Automatically calculated if not provided.

- **`qkv_bias`** (bool): Whether to use bias in QKV projections. Default is `True`.

- **`qkv_separate`** (bool): Whether to use separate Q, K, V projections. Default is `False`.

- **`pool_type`** (str): Type of pooling. Must be 'token' or ''. Default is 'token'.

- **`class_token`** (bool): Whether to use a class token. Default is `False`.

- **`drop_rate`** (float): Dropout rate. Default is `0.0`.

- **`use_fused_attn`** (bool): Whether to use fused attention. Default is `True`.

**Methods**

- **`init_weights(self, zero_init_last=False)`**: Initializes the weights of the layer.

- **`reset(self, num_classes=None, pool_type=None)`**: Resets the projection layer and pool type.

  - **Parameters**:

    - **`num_classes`** (int, optional): Number of output classes. Defaults to `None`.

    - **`pool_type`** (str, optional): Type of pooling. Must be 'token' or ''. Defaults to `None`.

- **`__call__(self, x, pre_logits=False)`**: Applies the attention pooling to the input tensor.

  - **Parameters**:

    - **`x`**: Input tensor.

    - **`pre_logits`** (bool): Whether to return pre-logits output. Default is `False`.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the AttentionPool2d layer

attn_pool = nn.AttentionPool2d(in_features=64, feat_size=16, embed_dim=128, num_heads=8)

# Generate some sample data

data = tf.random.normal((2, 16, 16, 64))

# Apply attention pooling

output = attn_pool(data)

```

# MultiQueryAttentionV2

The `MultiQueryAttentionV2` class implements a fast multi-query attention mechanism optimized for Transformer decoding.

**Initialization Parameters**

- **`dim`** (int): Dimension of the input.

- **`dim_out`** (int, optional): Dimension of the output. Default is `dim`.

- **`num_heads`** (int): Number of attention heads. Default is `8`.

- **`key_dim`** (int): Dimension of the keys. Default is `64`.

- **`value_dim`** (int): Dimension of the values. Default is `64`.

- **`attn_drop`** (float): Dropout rate for attention. Default is `0.0`.

- **`proj_drop`** (float): Dropout rate for projection. Default is `0.0`.

**Methods**

- **`__call__(self, x, m=None)`**: Applies multi-query attention to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

    - **`m`**: Memory tensor (optional). Default is `x`.

  - **Returns**: Output tensor after applying multi-query attention.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the MultiQueryAttentionV2 layer

attn = nn.MultiQueryAttentionV2(dim=128)

# Generate some sample data

data = tf.random.normal((2, 10, 128))

# Apply multi-query attention

output = attn(data)

```
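
The "multi-query" idea is that every head keeps its own query projection while all heads share one key projection and one value projection. The sketch below only counts key/value projection weights under that general formulation; it is not taken from the Note source, and `kv_projection_weights` is a hypothetical helper.

```python
def kv_projection_weights(dim, num_heads, key_dim, value_dim, shared_kv):
    """Count key/value projection weights (biases ignored).

    shared_kv=True models multi-query attention (one K/V head for all
    query heads); shared_kv=False models ordinary multi-head attention.
    """
    kv_heads = 1 if shared_kv else num_heads
    return dim * kv_heads * key_dim + dim * kv_heads * value_dim


# Same sizes as the documented defaults: num_heads=8, key_dim=64, value_dim=64
multi_head = kv_projection_weights(128, 8, 64, 64, shared_kv=False)
multi_query = kv_projection_weights(128, 8, 64, 64, shared_kv=True)

print(multi_head, multi_query, multi_head // multi_query)
```

Under these assumptions the shared key/value projections are `num_heads` times smaller, which also shrinks the key/value tensors that must be read at every decoding step.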

# Attention2d

The `Attention2d` class implements multi-head attention for 2D NHWC tensors.

**Initialization Parameters**

- **`dim`** (int): Dimension of the input.

- **`dim_out`** (int, optional): Dimension of the output. Default is `dim`.

- **`num_heads`** (int): Number of attention heads. Default is `32`.

- **`bias`** (bool): Whether to use bias in the convolutions. Default is `True`.

- **`expand_first`** (bool): Whether to expand dimension before attention. Default is `False`.

- **`head_first`** (bool): Whether to process heads first. Default is `False`.

- **`attn_drop`** (float): Dropout rate for attention. Default is `0.0`.

- **`proj_drop`** (float): Dropout rate for projection. Default is `0.0`.

- **`use_fused_attn`** (bool): Whether to use fused attention. Default is `True`.

**Methods**

- **`__call__(self, x, attn_mask=None)`**: Applies multi-head attention to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor of shape `[B, H, W, C]`.

    - **`attn_mask`**: Attention mask (optional).

  - **Returns**: Output tensor after applying multi-head attention.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the Attention2d layer

attn2d = nn.Attention2d(dim=64)

# Generate some sample data

data = tf.random.normal((2, 16, 16, 64))

# Apply 2D attention

output = attn2d(data)

```

# batch_norm

The `batch_norm` class implements batch normalization, which stabilizes and accelerates training by normalizing a layer's inputs and then scaling and shifting the result.

**Initialization Parameters**

- **`input_size`** (int, optional): Size of the input.

- **`axis`** (int): Axis along which to normalize. Default is `-1`.

- **`momentum`** (float): Momentum for the moving average. Default is `0.99`.

- **`epsilon`** (float): Small constant to avoid division by zero. Default is `0.001`.

- **`center`** (bool): If `True`, add offset of `beta` to the normalized tensor. Default is `True`.

- **`scale`** (bool): If `True`, multiply by `gamma`. Default is `True`.

- **`beta_initializer`** (str, list, tuple): Initializer for the beta weight. Default is `'zeros'`.

- **`gamma_initializer`** (str, list, tuple): Initializer for the gamma weight. Default is `'ones'`.

- **`moving_mean_initializer`** (str, list, tuple): Initializer for the moving mean. Default is `'zeros'`.

- **`moving_variance_initializer`** (str, list, tuple): Initializer for the moving variance. Default is `'ones'`.

- **`synchronized`** (bool): If `True`, synchronize the moments across replicas. Default is `False`.

- **`trainable`** (bool): If `True`, add variables to the trainable variables collection. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data, training=None, mask=None)`**: Applies batch normalization to the input `data`.

  - **Parameters**:

    - **`data`**: Input tensor.

    - **`training`** (bool, optional): Specifies whether the layer is in training mode.

    - **`mask`** (tensor, optional): Mask tensor for weighted moments calculation.

  - **Returns**: Normalized output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the batch normalization layer

bn = nn.batch_norm(input_size=10)

# Generate some sample data

data = tf.random.normal((2, 5, 10))

# Apply batch normalization

output = bn(data)

```
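
To make the roles of `momentum` and `epsilon` concrete, here is a minimal pure-Python sketch of one training-mode step over a single feature. It assumes the Keras-style update rule `moving = momentum * moving + (1 - momentum) * batch_stat`; whether Note uses exactly this rule is an assumption, and `batch_norm_step` is a hypothetical helper.

```python
import math


def batch_norm_step(batch, moving_mean, moving_var,
                    gamma=1.0, beta=0.0, momentum=0.99, epsilon=0.001):
    """Normalize one feature over a batch and update the moving statistics."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance

    # Normalize, then scale by gamma and shift by beta
    normalized = [gamma * (x - mean) / math.sqrt(var + epsilon) + beta
                  for x in batch]

    # Exponential moving averages, used later at inference time
    new_mean = momentum * moving_mean + (1.0 - momentum) * mean
    new_var = momentum * moving_var + (1.0 - momentum) * var
    return normalized, new_mean, new_var


out, moving_mean, moving_var = batch_norm_step(
    [1.0, 2.0, 3.0, 4.0], moving_mean=0.0, moving_var=1.0)
print(out, moving_mean, moving_var)
```

With `momentum=0.99` the moving statistics change slowly, so they need many steps before they reflect the data distribution.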

# BigBird_attention

The `BigBird_attention` class implements BigBird, a sparse attention mechanism that reduces the cost of attention from quadratic to linear in the sequence length. This implementation is based on the paper "Big Bird: Transformers for Longer Sequences" (https://arxiv.org/abs/2007.14062).

**Initialization Parameters**

- **`n_head`** (int): Number of attention heads.

- **`key_dim`** (int): Size of each attention head for query and key.

- **`input_size`** (int, optional): Size of the input.

- **`num_rand_blocks`** (int): Number of random blocks. Default is `3`.

- **`from_block_size`** (int): Block size of the query. Default is `64`.

- **`to_block_size`** (int): Block size of the key. Default is `64`.

- **`max_rand_mask_length`** (int): Maximum length for the random mask. Default is `MAX_SEQ_LEN`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weights. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias. Default is `'zeros'`.

- **`use_bias`** (bool): If `True`, adds a bias term to the attention computation. Default is `True`.

- **`seed`** (int, optional): Seed for random number generation.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, query, value, key=None, attention_mask=None)`**: Applies BigBird sparse attention to the inputs `query`, `value`, and (optionally) `key`.

  - **Parameters**:

    - **`query`**: Query tensor.

    - **`value`**: Value tensor.

    - **`key`** (optional): Key tensor. If not provided, `value` is used as the key.

    - **`attention_mask`** (optional): Mask tensor for attention computation.

  - **Returns**: Attention output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of BigBird_attention

bigbird_attn = nn.BigBird_attention(

    n_head=8,

    key_dim=64,

    input_size=128,

    num_rand_blocks=3,

    from_block_size=64,

    to_block_size=64,

    max_rand_mask_length=512,

    weight_initializer='Xavier',

    bias_initializer='zeros',

    use_bias=True,

    dtype='float32'

)

# Generate some sample data

query = tf.random.normal((2, 128, 128))

value = tf.random.normal((2, 128, 128))

# Apply BigBird attention

output = bigbird_attn(query, value)

```
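
The linear scaling comes from each query block attending to a fixed number of key blocks. The sketch below gives a rough count, assuming the block pattern described in the paper (3 sliding-window blocks, 2 global blocks, plus `num_rand_blocks` random blocks per query block) and ignoring the special handling of the first and last blocks. `attention_score_counts` and its `window_blocks` and `global_blocks` arguments are hypothetical, not part of the Note API.

```python
def attention_score_counts(seq_len, block_size=64, num_rand_blocks=3,
                           window_blocks=3, global_blocks=2):
    """Return (full, sparse) numbers of query-key scores for one head."""
    full = seq_len * seq_len
    keys_per_query = (window_blocks + global_blocks + num_rand_blocks) * block_size
    # A short sequence cannot attend to more keys than it contains
    sparse = seq_len * min(keys_per_query, seq_len)
    return full, sparse


for n in (1024, 4096, 16384):
    full, sparse = attention_score_counts(n)
    print(n, full, sparse, full / sparse)
```

The sparse count grows proportionally to `seq_len`, while the full count grows with its square, so the saving increases with sequence length.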

# BigBird_masks

The `BigBird_masks` class creates attention masks for the BigBird attention mechanism, which are used to efficiently handle long sequences by reducing the complexity of the attention computation.

**Initialization Parameters**

- **`block_size`** (int): Size of the blocks used in the BigBird attention mechanism.

**Methods**

- **`__call__(self, data, mask)`**: Generates the attention masks required for BigBird attention.

  - **Parameters**:

    - **`data`**: Input tensor.

    - **`mask`**: Mask tensor indicating which elements should be attended to.

  - **Returns**: A list of masks `[band_mask, encoder_from_mask, encoder_to_mask, blocked_encoder_mask]` used in the BigBird attention mechanism.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the BigBird_masks class

bigbird_masks = nn.BigBird_masks(block_size=64)

# Generate some sample data and mask

data = tf.random.normal((2, 128, 128))

mask = tf.cast(tf.random.uniform((2, 128), maxval=2, dtype=tf.int32), tf.float32)

# Generate the BigBird attention masks

masks = bigbird_masks(data, mask)

```
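
As a rough illustration of what "blocked" masks mean, the following pure-Python sketch splits one sequence's padding mask into blocks and builds the mask between a query block and a key block. The real masks are batched tensors with extra dimensions; the helpers `block_mask` and `block_pair_mask` are hypothetical and only show the idea.

```python
def block_mask(mask, block_size):
    """Split a flat 0/1 padding mask into consecutive blocks."""
    if len(mask) % block_size != 0:
        raise ValueError("sequence length must be a multiple of block_size")
    return [mask[i:i + block_size] for i in range(0, len(mask), block_size)]


def block_pair_mask(from_block, to_block):
    """Entry [i][j] is 1 only if query i and key j are both real tokens."""
    return [[q * k for k in to_block] for q in from_block]


mask = [1, 1, 1, 1, 1, 1, 0, 0]          # last two positions are padding
blocks = block_mask(mask, block_size=4)   # two blocks of four positions
pair = block_pair_mask(blocks[0], blocks[1])

print(blocks)
print(pair)
```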

# BlurPool2d

The `BlurPool2d` class creates a module that applies blurring and downsampling to a given feature map, as described in the paper "Making Convolutional Networks Shift-Invariant Again" (https://arxiv.org/abs/1904.11486).

**Initialization Parameters**

- **`channels`** (int, optional): Number of input channels. If not provided, it will be inferred from the input tensor.

- **`filt_size`** (int): Size of the binomial filter for blurring. Supported values are `3` (default) and `5`.

- **`stride`** (int): Stride for the downsampling filter. Default is `2`.

- **`pad_mode`** (str): Padding mode to use. Default is `'REFLECT'`.

**Methods**

- **`__call__(self, x)`**: Applies blurring and downsampling to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Transformed tensor after applying blurring and downsampling.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the BlurPool2d layer

blur_pool = nn.BlurPool2d(channels=64, filt_size=3, stride=2)

# Generate some sample data

data = tf.random.normal((2, 32, 32, 64))

# Apply blurring and downsampling

output = blur_pool(data)

```

**Notes**

The `BlurPool2d` class uses a binomial filter for blurring, followed by downsampling the input feature map. This operation helps to maintain the shift-invariance of convolutional neural networks by reducing aliasing effects that occur during downsampling.

The padding applied to the input tensor ensures that the dimensions are consistent during the convolution operation. The filter coefficients are computed using a binomial distribution, which provides a smoothing effect on the input feature map before downsampling.

The code for the `BlurPool2d` class is designed to be flexible and efficient, allowing for easy integration into various neural network architectures that require blurring and downsampling operations.
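
The binomial filter mentioned above can be sketched in a few lines. This example assumes the 2D kernel is the outer product of a 1D row of binomial coefficients, normalized to sum to 1, as in the referenced paper; `binomial_kernel_2d` is a hypothetical helper rather than a function from Note.

```python
from math import comb


def binomial_kernel_2d(filt_size=3):
    """Build a normalized 2D blur kernel from binomial coefficients."""
    row = [comb(filt_size - 1, k) for k in range(filt_size)]  # e.g. [1, 2, 1]
    total = sum(row) ** 2                                      # sum of the outer product
    return [[a * b / total for b in row] for a in row]


kernel = binomial_kernel_2d(3)
for r in kernel:
    print(r)
```

For `filt_size=3` the row is `[1, 2, 1]`, so the centre weight is 4/16 and the corner weights are 1/16.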

# BottleneckAttn

The `BottleneckAttn` class implements bottleneck attention for visual recognition.

**Initialization Parameters**

- **`dim`** (int): Input dimension.

- **`dim_out`** (int, optional): Output dimension. Default is `dim`.

- **`feat_size`** (tuple): Size of the feature map (height, width).

- **`stride`** (int): Output stride. Default is `1`.

- **`num_heads`** (int): Number of attention heads. Default is `4`.

- **`dim_head`** (int, optional): Dimension of the query and key heads.

- **`qk_ratio`** (float): Ratio of query and key dimensions to output dimension. Default is `1.0`.

- **`qkv_bias`** (bool): Whether to use bias in QKV projections.

- **`scale_pos_embed`** (bool): Whether to scale the position embedding as well as Q @ K.

**Methods**

- **`__call__(self, x)`**: Applies bottleneck attention to the input `x`.

  - **Parameters**:

    - **`x`**: Input tensor of shape `[B, H, W, C]`.

  - **Returns**: Output tensor after applying bottleneck attention.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the BottleneckAttn layer

bottleneck_attn = nn.BottleneckAttn(dim=64, feat_size=(16, 16))

# Generate some sample data

data = tf.random.normal((2, 16, 16, 64))

# Apply bottleneck attention

output = bottleneck_attn(data)

```

# cached_attention

The `cached_attention` class implements an attention mechanism with caching, primarily used for autoregressive decoding.

**Initialization Parameters**

- **`n_head`** (int): Number of attention heads.

- **`key_dim`** (int): Dimension of the keys.

- **`value_dim`** (int, optional): Dimension of the values. Defaults to the same as `key_dim` if not specified.

- **`input_size`** (int, optional): Size of the input. If not specified, it will be inferred from the input data.

- **`attention_axes`** (list or tuple of ints, optional): Axes along which to apply attention. Defaults to the last axis.

- **`dropout_rate`** (float, optional): Dropout rate to apply to the attention scores. Defaults to 0.0.

- **`weight_initializer`** (str, list, tuple): Initializer for the weights. Defaults to "Xavier".

- **`bias_initializer`** (str, list, tuple): Initializer for the biases. Defaults to "zeros".

- **`use_bias`** (bool, optional): Whether to use bias in the dense layers. Defaults to True.

- **`dtype`** (str, optional): Data type of the layer. Defaults to 'float32'.

**Methods**

- **`build(self)`**: Builds the internal dense layers if `input_size` was not provided during initialization.

- **`_masked_softmax(self, attention_scores, attention_mask=None)`**: Applies a softmax operation to the attention scores with an optional mask.

  - **Parameters**:

    - **`attention_scores`** (tensor): Raw attention scores.

    - **`attention_mask`** (tensor, optional): Mask to apply to the attention scores.

  - **Returns**:

    - **`Tensor`**: Normalized attention scores.

- **`_update_cache(self, key, value, cache, decode_loop_step)`**: Updates the cache with new keys and values during decoding.

  - **Parameters**:

    - **`key`** (tensor): New keys.

    - **`value`** (tensor): New values.

    - **`cache`** (dict): Cache containing previous keys and values.

    - **`decode_loop_step`** (int, optional): Current step in the decoding loop.

  - **Returns**:

    - **`Tensor`**: Updated keys.

    - **`Tensor`**: Updated values.

- **`__call__(self, query, value, key=None, attention_mask=None, cache=None, decode_loop_step=None, return_attention_scores=False)`**: Computes the attention output and optionally returns the attention scores and updated cache.

  - **Parameters**:

    - **`query`** (tensor): Query tensor.

    - **`value`** (tensor): Value tensor.

    - **`key`** (tensor, optional): Key tensor. If not provided, the value tensor will be used.

    - **`attention_mask`** (tensor, optional): Mask to apply to the attention scores.

    - **`cache`** (dict, optional): Cache for storing previous keys and values.

    - **`decode_loop_step`** (int, optional): Current step in the decoding loop.

    - **`return_attention_scores`** (bool, optional): Whether to return the attention scores.

  - **Returns**:

    - **`Tensor`**: Attention output.

    - **`dict`**: Updated cache.

    - **`Tensor`** (optional): Attention scores if `return_attention_scores` is True.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of cached_attention

attention_layer = nn.cached_attention(

    n_head=8,

    key_dim=64,

    input_size=128,

    attention_axes=[1],

    dropout_rate=0.1

)

# Generate some sample data

query = tf.random.normal((32, 10, 128))  # Batch of 32 samples, 10 sequence length, 128 input size

value = tf.random.normal((32, 10, 128))

# Apply the cached attention layer

output, cache = attention_layer(query, value)

print(output.shape)  # Output shape will be (32, 10, 128)

```
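
The role of `cache` and `decode_loop_step` can be pictured with a small pure-Python sketch in which the sequence axis is a plain list. It assumes two modes: appending new keys and values when `decode_loop_step` is `None`, and overwriting a fixed slot otherwise. This is an illustration of the idea, not the Note implementation, and `update_cache` here is a hypothetical stand-in for `_update_cache`.

```python
def update_cache(key, value, cache, decode_loop_step=None):
    """Merge the current step's keys/values into the cache and return them."""
    if decode_loop_step is None:
        # Growing cache: append the new step to everything seen so far
        keys = cache["key"] + key
        values = cache["value"] + value
    else:
        # Fixed-size cache: overwrite the slot for the current step
        keys = list(cache["key"])
        values = list(cache["value"])
        keys[decode_loop_step] = key[0]
        values[decode_loop_step] = value[0]
    cache["key"], cache["value"] = keys, values
    return keys, values


# Growing cache over three decoding steps
cache = {"key": [], "value": []}
for token in ["a", "b", "c"]:
    keys, values = update_cache([f"k_{token}"], [f"v_{token}"], cache)

# Pre-allocated cache written at an explicit step
fixed = {"key": [None] * 3, "value": [None] * 3}
update_cache(["k_b"], ["v_b"], fixed, decode_loop_step=1)

print(cache["key"], fixed["key"])
```

Either way, each new query attends to all cached keys and values, so earlier positions are not recomputed.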

# capsule

This class implements a Capsule layer for neural networks, supporting both fully connected (FC) and convolutional (CONV) capsule layers with routing mechanisms.

**Initialization Parameters**

- **`num_outputs`** (int): The number of output capsules in this layer.

- **`vec_len`** (int): The length of the output vector of a capsule.

- **`input_shape`** (tuple, optional): The shape of the input tensor. Required for layer building.

- **`kernel_size`** (int, optional): The kernel size for convolutional capsule layers.

- **`stride`** (int, optional): The stride for convolutional capsule layers.

- **`with_routing`** (bool): Whether this capsule layer uses routing with the lower-level capsules. Default is `True`.

- **`layer_type`** (str): The type of capsule layer. Options are `'FC'` for fully connected or `'CONV'` for convolutional. Default is `'FC'`.

- **`iter_routing`** (int): The number of routing iterations. Default is `3`.

- **`steddev`** (float): The standard deviation for initializing the weights. Default is `0.01`.

**Methods**

- **`build(self)`**: Builds the layer based on the type and input shape. Should be called if `input_shape` is provided during initialization.

- **`__call__(self, data)`**: Applies the capsule layer to the provided input tensor.

  - **Parameters**:

    - **`data`** (Tensor): The input tensor.

  - **Returns**:

    - **`Tensor`**: The output tensor after applying the capsule layer.

- **`routing(self, input, b_IJ, num_outputs=10, num_dims=16)`**: The routing algorithm for the capsule layer.

  - **Parameters**:

    - **`input`** (Tensor): Input tensor with shape `[batch_size, num_caps_l, 1, length(u_i), 1]`.

    - **`b_IJ`** (Tensor): Initial logits for routing.

    - **`num_outputs`** (int): Number of output capsules.

    - **`num_dims`** (int): Number of dimensions for output capsule.

  - **Returns**:

    - **`Tensor`**: The output tensor after applying routing.

- **`squash(self, vector)`**: Squashing function to ensure that the length of the output vector is between 0 and 1.

  - **Parameters**:

    - **`vector`** (Tensor): Input tensor to be squashed.

  - **Returns**:

    - **`Tensor`**: Squashed output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Define input tensor

input_tensor = tf.random.normal(shape=(32, 28, 28, 256))  # Example shape

# Instantiate the capsule class

caps_layer = nn.capsule(num_outputs=10, vec_len=16, input_shape=input_tensor.shape, layer_type='FC', with_routing=True)

# Apply the capsule layer to the input tensor

output = caps_layer(input_tensor)

print(output.shape)  # Should be (32, 10, 16, 1) if the num_outputs is 10 and vec_len is 16

```
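
The squashing step can be sketched on a single vector. This example assumes the formula from "Dynamic Routing Between Capsules", `v = (|s|^2 / (1 + |s|^2)) * s / |s|`; the `epsilon` term is an added assumption for numerical safety and may differ from what Note does.

```python
import math


def squash(vector, epsilon=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in vector)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + epsilon)
    return [scale * x for x in vector]


def length(vector):
    return math.sqrt(sum(x * x for x in vector))


short_vec = squash([0.1, 0.0])    # short input -> length close to 0
long_vec = squash([30.0, 40.0])   # long input  -> length close to 1

print(length(short_vec), length(long_vec))
```

Short vectors shrink towards zero length and long vectors approach length 1, so the length can be read as the probability that the entity represented by the capsule is present.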

# CbamModule

The `CbamModule` class implements the standard CBAM module with the original channel and spatial attention mechanisms.

**Initialization Parameters**

- **`channels`** (int): Number of input channels.

- **`rd_ratio`** (float, optional): Reduction ratio for channel attention. Default is `1/16`.

- **`rd_channels`** (int, optional): Fixed number of reduced channels. Default is `None`.

- **`rd_divisor`** (int, optional): Ensures reduced channels are divisible by this value. Default is `1`.

- **`spatial_kernel_size`** (int, optional): Kernel size for spatial attention convolution. Default is `7`.

- **`act_layer`** (callable, optional): Activation function for intermediate layers. Default is `tf.nn.relu`.

- **`gate_layer`** (callable, optional): Activation function for gating. Default is `tf.nn.sigmoid`.

- **`mlp_bias`** (bool, optional): If `True`, use biases in MLP layers. Default is `False`.

**Methods**

- **`__call__(self, x)`**: Applies CBAM attention to the input tensor.

  - **Parameters**:

    - **`x`**: Input tensor of shape `(batch_size, height, width, channels)`.

  - **Returns**: Tensor of the same shape as the input, with attention applied.

# LightCbamModule

The `LightCbamModule` class is a lightweight variant of `CbamModule` that simplifies the attention computations.

**Initialization Parameters**

Same as `CbamModule`, but uses reduced operations for channel and spatial attention.

**Methods**

- **`__call__(self, x)`**: Applies lightweight CBAM attention to the input tensor.

  - **Parameters**:

    - **`x`**: Input tensor of shape `(batch_size, height, width, channels)`.

  - **Returns**: Tensor of the same shape as the input, with lightweight attention applied.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the CBAM module

cbam = nn.CbamModule(channels=64, rd_ratio=1./16, spatial_kernel_size=7)

# Generate some sample data

data = tf.random.normal((2, 32, 32, 64))  # Batch size 2, spatial dimensions 32x32, 64 channels

# Apply CBAM

output = cbam(data)

# Create an instance of the lightweight CBAM module

light_cbam = nn.LightCbamModule(channels=64, rd_ratio=1./16, spatial_kernel_size=7)

# Apply lightweight CBAM

light_output = light_cbam(data)

```
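
The three reduction parameters (`rd_ratio`, `rd_channels`, `rd_divisor`) jointly determine the hidden width of the channel-attention MLP. The sketch below shows one plausible way they interact, modelled on the "make divisible" convention used in similar libraries. It is an assumption rather than Note's confirmed behaviour, and `resolve_rd_channels` is a hypothetical helper.

```python
def resolve_rd_channels(channels, rd_ratio=1.0 / 16, rd_channels=None, rd_divisor=1):
    """Pick the reduced channel count for the channel-attention MLP."""
    if rd_channels is not None:
        # An explicit value takes precedence over the ratio
        return rd_channels
    target = channels * rd_ratio
    # Round to the nearest multiple of rd_divisor, never below rd_divisor
    rounded = int(target + rd_divisor / 2) // rd_divisor * rd_divisor
    return max(rd_divisor, rounded)


print(resolve_rd_channels(64))                  # ratio only
print(resolve_rd_channels(64, rd_divisor=8))    # rounded up to a multiple of 8
print(resolve_rd_channels(64, rd_channels=12))  # explicit override
```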

# ClassifierHead

This class implements a classifier head with configurable global pooling and dropout options.

**Initialization Parameters**

- **`in_features`** (int): The number of input features.

- **`num_classes`** (int): The number of classes for the final classifier layer (output).

- **`pool_type`** (str, optional): Type of global pooling. Options are `'avg'`, `'max'`, or `''` (no pooling). Default is `'avg'`.

- **`drop_rate`** (float, optional): Dropout rate before the classifier. Default is `0.`.

- **`use_conv`** (bool, optional): Whether to use convolution for the classifier. Default is `False`.

- **`input_fmt`** (str, optional): The input format. Options are `'NHWC'` or `'NCHW'`. Default is `'NHWC'`.

**Methods**

- **`reset(self, num_classes, pool_type=None)`**: Resets the classifier head with a new number of classes and optionally a new pooling type.

  - **Parameters**:

    - **`num_classes`** (int): New number of output classes.

    - **`pool_type`** (str, optional): New pooling type.

- **`__call__(self, x, pre_logits=False)`**: Applies the classifier head to the provided input tensor.

  - **Parameters**:

    - **`x`** (Tensor): Input tensor.

    - **`pre_logits`** (bool, optional): Whether to return the features before the final logits layer. Default is `False`.

  - **Returns**:

    - **`Tensor`**: The output tensor after applying the classifier head.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Define input tensor

input_tensor = tf.random.normal(shape=(32, 7, 7, 2048))  # Example shape

# Instantiate the classifier head

classifier_head = nn.ClassifierHead(in_features=2048, num_classes=1000, pool_type='avg', drop_rate=0.5)

# Apply the classifier head to the input tensor

output = classifier_head(input_tensor)

print(output.shape)  # Should be (32, 1000) if num_classes is 1000

```

# NormMlpClassifierHead

This class implements a classifier head with normalization, configurable MLP, and global pooling options.

**Initialization Parameters**

- **`in_features`** (int): The number of input features.

- **`num_classes`** (int): The number of classes for the final classifier layer (output).

- **`hidden_size`** (int, optional): The hidden size of the MLP (pre-logits FC layer). Default is `None`.

- **`pool_type`** (str, optional): Type of global pooling. Options are `'avg'`, `'max'`, or `''` (no pooling). Default is `'avg'`.

- **`drop_rate`** (float, optional): Dropout rate before the classifier. Default is `0.`.

- **`norm_layer`** (Callable, optional): Normalization layer type. Default is `layer_norm`.

- **`act_layer`** (Callable, optional): Activation layer type. Default is `tf.nn.tanh`.

**Methods**

- **`reset(self, num_classes, pool_type=None)`**: Resets the classifier head with a new number of classes and optionally a new pooling type.

  - **Parameters**:

    - **`num_classes`** (int): New number of output classes.

    - **`pool_type`** (str, optional): New pooling type.

- **`__call__(self, x, pre_logits=False)`**: Applies the normalized MLP classifier head to the provided input tensor.

  - **Parameters**:

    - **`x`** (Tensor): Input tensor.

    - **`pre_logits`** (bool, optional): Whether to return the features before the final logits layer. Default is `False`.

  - **Returns**:

    - **`Tensor`**: The output tensor after applying the classifier head.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Define input tensor

input_tensor = tf.random.normal(shape=(32, 7, 7, 2048))  # Example shape

# Instantiate the normalized MLP classifier head

norm_mlp_head = nn.NormMlpClassifierHead(in_features=2048, num_classes=1000, hidden_size=512, pool_type='avg', drop_rate=0.5)

# Apply the normalized MLP classifier head to the input tensor

output = norm_mlp_head(input_tensor)

print(output.shape)  # Should be (32, 1000) if num_classes is 1000

```

# ConvNormAct

The `ConvNormAct` class implements a combination of convolution, normalization, and activation layers, with optional anti-aliasing and dropout. It provides a modular and flexible way to define these operations for deep learning models.

**Initialization Parameters**

- **`in_channels`** (int): Number of input channels.

- **`out_channels`** (int): Number of output channels.

- **`kernel_size`** (int): Size of the convolution kernel. Default is `1`.

- **`stride`** (int): Stride of the convolution. Default is `1`.

- **`dilation`** (int): Dilation rate for convolution. Default is `1`.

- **`groups`** (int): Number of groups for group convolution. Default is `1`.

- **`bias`** (bool): Whether to include a bias term in the convolution. Default is `False`.

- **`apply_norm`** (bool): Whether to apply normalization. Default is `True`.

- **`apply_act`** (bool): Whether to apply activation. Default is `True`.

- **`norm_layer`** (callable): Normalization layer. Default is `nn.batch_norm`.

- **`act_layer`** (callable): Activation function. Default is `tf.nn.relu`.

- **`aa_layer`** (callable or str): Anti-aliasing layer. Supports predefined layers like `'avg'` or `'blur'`. Default is `None`.

- **`drop_layer`** (callable): Dropout layer. Default is `None`.

- **`drop_rate`** (float): Dropout rate. Default is `0.`.

**Methods**

- **`__call__(self, x)`**: Applies the convolution, normalization, activation, and optional anti-aliasing and dropout to the input.

  - **Parameters**:

    - **`x`**: Input tensor.

  - **Returns**: Processed tensor after applying the specified operations.

**Attributes**

- **`in_channels`** (int): Number of input channels.

- **`out_channels`** (int): Number of output channels.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create a ConvNormAct instance

conv_layer = nn.ConvNormAct(

    in_channels=32, 

    out_channels=64, 

    kernel_size=3, 

    stride=2, 

    norm_layer=nn.batch_norm, 

    act_layer=tf.nn.relu, 

    aa_layer='blur', 

    drop_layer=nn.dropout, 

    drop_rate=0.1

)

# Input tensor

x = tf.random.normal((2, 128, 128, 32))

# Apply ConvNormAct

output = conv_layer(x)

```

# conv1d

The `conv1d` class implements a 1D convolutional layer, which is commonly used in processing sequential data such as time series or audio.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the convolution.

- **`kernel_size`** (int or list of int): Size of the convolutional kernel.

- **`input_size`** (int, optional): Size of the input channels.

- **`strides`** (int or list of int): Stride size for the convolution. Default is `[1]`.

- **`padding`** (str or list of int): Padding type or size. Default is `'VALID'`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NWC'` or `'NCW'`. Default is `'NWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated convolution. Default is `None`.

- **`groups`** (int): Number of groups for grouped convolution. Default is `1`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 1D convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 1D convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv1d layer

conv_layer = nn.conv1d(filters=32, kernel_size=3, input_size=64, strides=1, padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 100, 64))

# Apply the convolutional layer

output = conv_layer(data)

print(output.shape)  # Output shape will be (10, 100, 32) if padding is 'SAME'

```
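
The output length for the two string padding modes can be worked out by hand. The sketch below assumes TensorFlow's `'SAME'` and `'VALID'` conventions and does not cover explicit integer padding. `conv_output_length` is a hypothetical helper, not part of Note.

```python
import math


def conv_output_length(length, kernel_size, stride=1, padding="VALID", dilation=1):
    """Output length of a 1D convolution under TensorFlow-style padding."""
    if padding == "SAME":
        # Input is padded so that only the stride reduces the length
        return math.ceil(length / stride)
    # 'VALID': no padding, the (dilated) kernel must fit entirely inside the input
    effective_kernel = (kernel_size - 1) * dilation + 1
    return math.ceil((length - effective_kernel + 1) / stride)


print(conv_output_length(100, 3, stride=1, padding="SAME"))   # matches the example above
print(conv_output_length(100, 3, stride=1, padding="VALID"))
print(conv_output_length(100, 3, stride=2, padding="VALID"))
```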

# conv1d_transpose

The `conv1d_transpose` class implements a 1D transposed convolutional layer, often used for tasks like upsampling in sequence data.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the transposed convolution.

- **`kernel_size`** (int or list of int): Size of the convolutional kernel.

- **`input_size`** (int, optional): Size of the input channels.

- **`strides`** (int or list of int): Stride size for the transposed convolution. Default is `[1]`.

- **`padding`** (str): Padding type. Default is `'VALID'`.

- **`output_padding`** (int, optional): Additional size added to the output shape.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NWC'` or `'NCW'`. Default is `'NWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 1D transposed convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 1D transposed convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv1d_transpose layer

conv_transpose_layer = nn.conv1d_transpose(filters=32, kernel_size=3, input_size=64, strides=[1], padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 100, 64))

# Apply the transposed convolutional layer

output = conv_transpose_layer(data)

print(output.shape)  # Output shape will depend on strides and padding

```
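
For the transposed case, the output length grows with the stride. This sketch assumes the Keras convention for transposed convolutions, including how `output_padding` enters the formula; whether Note follows it exactly is an assumption. `deconv_output_length` is a hypothetical helper and ignores dilation.

```python
def deconv_output_length(length, kernel_size, stride=1, padding="VALID",
                         output_padding=None):
    """Output length of a 1D transposed convolution (Keras-style convention)."""
    if output_padding is None:
        if padding == "SAME":
            return length * stride
        # 'VALID': the kernel may extend past the last strided position
        return length * stride + max(kernel_size - stride, 0)
    # Explicit output_padding: derive the length from the forward-conv relation
    pad = kernel_size // 2 if padding == "SAME" else 0
    return (length - 1) * stride + kernel_size - 2 * pad + output_padding


print(deconv_output_length(100, 3, stride=1, padding="SAME"))
print(deconv_output_length(100, 3, stride=1, padding="VALID"))
print(deconv_output_length(100, 3, stride=2, padding="SAME"))
```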

# conv2d

The `conv2d` class implements a 2D convolutional layer, which is commonly used in image processing tasks.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the convolution.

- **`kernel_size`** (int or list of int): Size of the convolutional kernel. If a single integer is provided, it is used for both dimensions.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride size for the convolution. Default is `[1, 1]`.

- **`padding`** (str or list of int): Padding type or size. Default is `'VALID'`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NHWC'` or `'NCHW'`. Default is `'NHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated convolution. Default is `None`.

- **`groups`** (int): Number of groups for group convolution. Default is `1`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 2D convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 2D convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv2d layer

conv_layer = nn.conv2d(filters=32, kernel_size=3, input_size=64, strides=2, padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 128, 128, 64))  # Batch of 10 images, 128x128 pixels, 64 channels

# Apply the convolutional layer

output = conv_layer(data)

print(output.shape)  # Output shape will depend on strides and padding

```
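
To see how `groups` and `use_bias` affect layer size, the following sketch counts trainable parameters. It assumes the TensorFlow kernel layout `[kernel_h, kernel_w, input_size // groups, filters]`; this layout is an assumption about Note, and `conv2d_param_count` is a hypothetical helper.

```python
def conv2d_param_count(input_size, filters, kernel_size, groups=1, use_bias=True):
    """Count weights (and biases) of a possibly grouped 2D convolution."""
    if input_size % groups or filters % groups:
        raise ValueError("input_size and filters must both be divisible by groups")
    if isinstance(kernel_size, int):
        kernel_h = kernel_w = kernel_size
    else:
        kernel_h, kernel_w = kernel_size
    # Each filter only sees input_size // groups input channels
    weights = kernel_h * kernel_w * (input_size // groups) * filters
    return weights + (filters if use_bias else 0)


print(conv2d_param_count(64, 32, 3))            # settings from the example above
print(conv2d_param_count(64, 32, 3, groups=4))  # grouped variant
```

Under this layout, increasing `groups` divides the weight count by `groups` while leaving the bias count unchanged.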

# conv2d_transpose

The `conv2d_transpose` class implements a 2D transposed convolutional layer, which is commonly used for upsampling in image processing tasks.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the transposed convolution.

- **`kernel_size`** (int or list of int): Size of the transposed convolutional kernel. If a single integer is provided, it is used for both dimensions.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride size for the transposed convolution. Default is `[1, 1]`.

- **`padding`** (str): Padding type, either `'VALID'` or `'SAME'`. Default is `'VALID'`.

- **`output_padding`** (int or list of int, optional): Additional size added to one side of each dimension in the output shape. Default is `None`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NHWC'` or `'NCHW'`. Default is `'NHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated transposed convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 2D transposed convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 2D transposed convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv2d_transpose layer

conv_transpose_layer = nn.conv2d_transpose(filters=32, kernel_size=3, input_size=64, strides=2, padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 64, 64, 64))  # Batch of 10 images, 64x64 pixels, 64 channels

# Apply the transposed convolutional layer

output = conv_transpose_layer(data)

print(output.shape)  # Output shape will depend on strides and padding

```
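
A transposed convolution grows the spatial size instead of shrinking it. The sketch below uses the rule Keras applies when `output_padding` is `None`; whether Note infers the output size in exactly this way is an assumption, and `transpose_output_length` is a hypothetical helper, not part of the library.

```python
def transpose_output_length(input_length, kernel_size, stride=1, padding='VALID'):
    """Output length of a transposed convolution when output_padding is None (Keras-style rule)."""
    if padding == 'SAME':
        # 'SAME' scales the input length by the stride.
        return input_length * stride
    if padding == 'VALID':
        # 'VALID' also keeps the part of the kernel that overhangs the last stride.
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError(f"Unsupported padding: {padding!r}")

# Settings from the example: 64x64 input, kernel 3, stride 2.
print(transpose_output_length(64, 3, stride=2, padding='SAME'))   # 128
print(transpose_output_length(64, 3, stride=2, padding='VALID'))  # 129
```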

# conv3d

The `conv3d` class implements a 3D convolutional layer, which is commonly used for volumetric data such as videos or 3D medical images.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the convolution.

- **`kernel_size`** (int or list of int): Size of the convolutional kernel. If a single integer is provided, it is used for all three dimensions.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride size for the convolution. Default is `[1, 1, 1]`.

- **`padding`** (str): Padding type, either `'VALID'` or `'SAME'`. Default is `'VALID'`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NDHWC'` or `'NCDHW'`. Default is `'NDHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 3D convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 3D convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv3d layer

conv_layer = nn.conv3d(filters=32, kernel_size=3, input_size=64, strides=2, padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 64, 64, 64, 64))  # Batch of 10 volumetric data, 64x64x64 voxels, 64 channels

# Apply the convolutional layer

output = conv_layer(data)

print(output.shape)  # Expected (10, 32, 32, 32, 32): 'SAME' padding with stride 2 halves each spatial dimension

```

# conv3d_transpose

The `conv3d_transpose` class implements a 3D transposed convolutional (also known as deconvolutional) layer, which is commonly used for upsampling volumetric data such as videos or 3D medical images.

**Initialization Parameters**

- **`filters`** (int): Number of output filters in the transposed convolution.

- **`kernel_size`** (int or list of int): Size of the transposed convolutional kernel. If a single integer is provided, it is used for all three dimensions.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride size for the transposed convolution. Default is `[1, 1, 1]`.

- **`padding`** (str): Padding type, either `'VALID'` or `'SAME'`. Default is `'VALID'`.

- **`output_padding`** (int or list of int, optional): Additional size added to each dimension of the output shape. Default is `None`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format, either `'NDHWC'` or `'NCDHW'`. Default is `'NDHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for dilated transposed convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the 3D transposed convolution to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 3D transposed convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the conv3d_transpose layer

conv_layer = nn.conv3d_transpose(filters=32, kernel_size=3, input_size=64, strides=2, padding='SAME', activation='relu')

# Generate some sample data

data = tf.random.normal((10, 16, 16, 16, 64))  # Batch of 10 volumetric data, 16x16x16 voxels, 64 channels

# Apply the transposed convolutional layer

output = conv_layer(data)

print(output.shape)  # Output shape will depend on strides and padding

```

# cropping1d

This class implements 1D cropping for tensors.

**Initialization Parameters**

- **`cropping`** (int or list): Number of elements to crop from the start and end of the cropped dimension. An int crops the same amount from both ends; a list of two ints gives the start and end amounts separately.

**Methods**

- **`__call__(self, data)`**: Applies 1D cropping to the input tensor.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 1D cropping.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the cropping1d layer

layer = nn.cropping1d(cropping=2)

# Define a sample input tensor

input_tensor = tf.random.normal(shape=(32, 100, 64))  # (batch_size, length, channels)

# Compute the cropped output

output = layer(input_tensor)

print(output.shape)  # Should be (32, 96, 64)

```

# cropping2d

This class implements 2D cropping for tensors.

**Initialization Parameters**

- **`cropping`** (int or list): The amount to crop from the height and width dimensions. Can be an int, a list of two ints (symmetric height and width cropping, as in the example below), or a list of four ints.

**Methods**

- **`__call__(self, data)`**: Applies 2D cropping to the input tensor.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 2D cropping.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the cropping2d layer

layer = nn.cropping2d(cropping=[2, 3])

# Define a sample input tensor

input_tensor = tf.random.normal(shape=(32, 100, 100, 64))  # (batch_size, height, width, channels)

# Compute the cropped output

output = layer(input_tensor)

print(output.shape)  # Should be (32, 96, 94, 64)

```
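
The expected shape in the example follows from plain slicing. The NumPy sketch below assumes symmetric cropping of the height and width axes in a channels-last layout, which matches the shape quoted above; `crop2d` is a hypothetical helper, not Note's implementation.

```python
import numpy as np

def crop2d(x, cropping):
    """Crop an NHWC array symmetrically; cropping = (crop_h, crop_w)."""
    crop_h, crop_w = cropping
    h, w = x.shape[1], x.shape[2]
    # Explicit end indices keep the full axis when a crop amount is 0.
    return x[:, crop_h:h - crop_h, crop_w:w - crop_w, :]

x = np.zeros((4, 100, 100, 8), dtype=np.float32)
cropped = crop2d(x, (2, 3))
print(cropped.shape)  # (4, 96, 94, 8)
```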

# cropping3d

This class implements 3D cropping for tensors.

**Initialization Parameters**

- **`cropping`** (int or list): The amount to crop from the depth, height, and width dimensions. Can be an int, a list of three ints (symmetric cropping per dimension, as in the example below), or a list of six ints.

**Methods**

- **`__call__(self, data)`**: Applies 3D cropping to the input tensor.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the 3D cropping.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the cropping3d layer

layer = nn.cropping3d(cropping=[2, 3, 4])

# Define a sample input tensor

input_tensor = tf.random.normal(shape=(32, 50, 50, 50, 64))  # (batch_size, depth, height, width, channels)

# Compute the cropped output

output = layer(input_tensor)

print(output.shape)  # Should be (32, 46, 44, 42, 64)

```

# dense

The `dense` class implements a fully connected layer, which is a core component of many neural networks. This layer is used to perform a linear transformation on the input data, optionally followed by an activation function.

**Initialization Parameters**

- **`output_size`** (int): Number of output units (neurons) in the dense layer.

- **`input_size`** (int, optional): Number of input units (neurons) in the dense layer. If not provided, it will be inferred from the input data.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight matrix. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

- **`name`** (str, optional): Name for the layer. Default is `None`.

**Methods**

- **`__call__(self, data)`**: Applies the dense layer to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the linear transformation and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the dense layer

dense_layer = nn.dense(output_size=64, input_size=128, activation='relu')

# Generate some sample data

data = tf.random.normal((10, 128))  # Batch of 10 samples, each with 128 features

# Apply the dense layer

output = dense_layer(data)

print(output.shape)  # Output shape will be (10, 64)

```
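
As a rough illustration of what the layer computes, the NumPy sketch below evaluates `relu(x @ W + b)` for the shapes used above. The Glorot-uniform draw stands in for the `'Xavier'` initializer and is an assumption; the document does not say whether Note uses the uniform or the normal variant.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, output_size = 128, 64

x = rng.normal(size=(10, input_size)).astype(np.float32)

# Glorot/Xavier uniform initialization (assumed variant).
limit = np.sqrt(6.0 / (input_size + output_size))
W = rng.uniform(-limit, limit, size=(input_size, output_size)).astype(np.float32)
b = np.zeros(output_size, dtype=np.float32)

# Linear transformation followed by the ReLU activation.
y = np.maximum(x @ W + b, 0.0)
print(y.shape)  # (10, 64)
```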

# depthwise_conv1d

The `depthwise_conv1d` class implements a depthwise 1D convolutional layer, which applies a single convolutional filter per input channel (channel-wise convolution), followed by an optional activation function.

**Initialization Parameters**

- **`kernel_size`** (int): Size of the convolutional kernel.

- **`depth_multiplier`** (int): Multiplier for the depth of the output tensor. Default is `1`.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride of the convolution. Default is `1`.

- **`padding`** (str): Padding algorithm to use. Default is `'VALID'`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format of the input and output data. Default is `'NHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for the convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the depthwise 1D convolutional layer to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the depthwise convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the depthwise_conv1d layer

depthwise_layer = nn.depthwise_conv1d(kernel_size=3, input_size=128, depth_multiplier=2, activation='relu')

# Generate some sample data

data = tf.random.normal((10, 64, 128))  # Batch of 10 sequences, 64 steps each, 128 channels

# Apply the depthwise 1D convolutional layer

output = depthwise_layer(data)

print(output.shape)  # Output shape will depend on the stride, padding, and depth_multiplier

```

# depthwise_conv2d

The `depthwise_conv2d` class implements a depthwise 2D convolutional layer, which applies a single convolutional filter per input channel (channel-wise convolution), followed by an optional activation function.

**Initialization Parameters**

- **`kernel_size`** (int or list of int): Size of the convolutional kernel. If an integer is provided, the same value will be used for both height and width.

- **`depth_multiplier`** (int): Multiplier for the depth of the output tensor. Default is `1`.

- **`input_size`** (int, optional): Number of input channels. If not provided, it will be inferred from the input data.

- **`strides`** (int or list of int): Stride of the convolution. Default is `1`.

- **`padding`** (str or list of int): Padding algorithm to use. Can be a string (`'VALID'` or `'SAME'`) or a list of integers for custom padding. Default is `'VALID'`.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight tensor. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`activation`** (str, optional): Activation function to use. Default is `None`.

- **`data_format`** (str): Data format of the input and output data. Default is `'NHWC'`.

- **`dilations`** (int or list of int, optional): Dilation rate for the convolution. Default is `None`.

- **`use_bias`** (bool): Whether to use a bias vector. Default is `True`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the layer. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the depthwise 2D convolutional layer to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the depthwise convolution and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the depthwise_conv2d layer

depthwise_layer = nn.depthwise_conv2d(kernel_size=3, input_size=128, depth_multiplier=2, activation='relu')

# Generate some sample data

data = tf.random.normal((10, 64, 64, 128))  # Batch of 10 samples, each with 64x64 size and 128 channels

# Apply the depthwise 2D convolutional layer

output = depthwise_layer(data)

print(output.shape)  # Output shape will depend on the stride, padding, and depth_multiplier

```
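
To make the role of `depth_multiplier` concrete, the NumPy sketch below implements a depthwise convolution with stride 1 and `'VALID'` padding. It is an illustrative reference, not Note's implementation, and `depthwise_conv2d_valid` is a hypothetical name. Each input channel is filtered independently by `depth_multiplier` kernels, so the output has `channels * depth_multiplier` channels.

```python
import numpy as np

def depthwise_conv2d_valid(x, kernel):
    """x: (N, H, W, C); kernel: (kh, kw, C, M). Stride 1, 'VALID' padding."""
    n, h, w, c = x.shape
    kh, kw, kc, m = kernel.shape
    assert kc == c, "kernel must have one set of filters per input channel"
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((n, out_h, out_w, c, m), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + out_h, j:j + out_w, :]   # (N, out_h, out_w, C)
            # Channels are never mixed: channel c only meets kernel[i, j, c, :].
            out += patch[..., None] * kernel[i, j]      # (N, out_h, out_w, C, M)
    # Merge (C, M) into C * M output channels.
    return out.reshape(n, out_h, out_w, c * m)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 4))
kernel = rng.normal(size=(3, 3, 4, 2))   # depth_multiplier = 2
y = depthwise_conv2d_valid(x, kernel)
print(y.shape)  # (2, 6, 6, 8)
```

By the same rule, the example above (64x64 input, 128 channels, `kernel_size=3`, `depth_multiplier=2`, default stride and `'VALID'` padding) would be expected to yield 62x62 feature maps with 256 channels.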

# dropout

The `dropout` class applies dropout to the input data, randomly dropping elements with a specified probability during training.

**Initialization Parameters**

- **`rate`** (float): The fraction of input units to drop, between 0 and 1.

- **`noise_shape`** (tensor, optional): A 1-D tensor representing the shape of the binary dropout mask that will be multiplied with the input. If `None`, the mask will have the same shape as the input.

- **`seed`** (int, optional): A random seed to ensure reproducibility.

**Methods**

- **`__call__(self, data, training=None)`**: Applies dropout to the input `data`.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

    - **`training`** (bool, optional): If `True`, dropout is applied; if `False`, the input is returned unchanged. If `None`, the layer uses its internal `train_flag` attribute. Default is `None`.

  - **Returns**: The tensor after applying dropout during training or the original tensor during inference.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the dropout layer

dropout_layer = nn.dropout(rate=0.5)

# Generate some sample data

data = tf.random.normal((10, 64))  # Batch of 10 samples, each with 64 features

# Apply the dropout layer

output = dropout_layer(data, training=True)

print(output.shape)  # Output shape will be the same as the input shape

```
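
The NumPy sketch below shows the usual "inverted dropout" scheme, in which kept elements are scaled by `1 / (1 - rate)` so the expected value is unchanged (this is what `tf.nn.dropout` does). That Note's layer behaves the same way is an assumption, and `inverted_dropout` is a hypothetical helper.

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # Inference: the input passes through unchanged.
        return x
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 64))
train_out = inverted_dropout(x, rate=0.5, training=True, rng=rng)
eval_out = inverted_dropout(x, rate=0.5, training=False, rng=rng)
print(train_out.shape, np.array_equal(eval_out, x))  # (10, 64) True
```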

# EcaModule

The `EcaModule` class implements an Efficient Channel Attention (ECA) module, which enhances channel attention without using dimensionality reduction, improving the efficiency and performance of the model.

**Initialization Parameters**

- **`channels`** (int, optional): Number of input feature map channels. If provided, kernel size will be adaptively determined. Default is `None`.

- **`kernel_size`** (int): Kernel size for the convolution. Default is `3`.

- **`gamma`** (int): Parameter used in adaptive kernel size calculation. Default is `2`.

- **`beta`** (int): Parameter used in adaptive kernel size calculation. Default is `1`.

- **`act_layer`** (optional): Activation layer to use after convolution. Default is `None`.

- **`gate_layer`** (callable): Gating function to use. Default is `tf.nn.sigmoid`.

- **`rd_ratio`** (float): Reduction ratio for MLP mode. Default is `1/8`.

- **`rd_channels`** (int, optional): Number of reduced channels for MLP mode. Default is `None`.

- **`rd_divisor`** (int): Divisor for channel reduction. Default is `8`.

- **`use_mlp`** (bool): Flag to use MLP mode. Default is `False`.

**Methods**

- **`__call__(self, x)`**: Applies the ECA module to the input `x`.

  - **Parameters**:

    - **`x`** (Tensor): Input tensor.

  

  - **Returns**: Output tensor with enhanced channel attention.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the EcaModule

eca = nn.EcaModule(channels=128)

# Generate some sample data

data = tf.random.normal((2, 128, 32, 32))

# Apply the ECA module

output = eca(data)

```
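
When `channels` is given, the kernel size is derived from it using `gamma` and `beta`. The sketch below uses the formula from the ECA-Net paper and its reference implementations; that Note applies exactly this rule is an assumption, and `eca_kernel_size` is a hypothetical helper.

```python
import math

def eca_kernel_size(channels, gamma=2, beta=1):
    """Adaptive 1D kernel size for ECA: nearest odd value of (log2(C) + beta) / gamma, at least 3."""
    t = int(abs(math.log2(channels) + beta) / gamma)
    # Force an odd kernel size so the convolution stays centered.
    return max(t if t % 2 else t + 1, 3)

for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
# 64 3
# 128 5
# 256 5
# 512 5
```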

# CecaModule

The `CecaModule` class implements a Circular Efficient Channel Attention (CECA) module, which applies circular padding in the convolution process, improving connectivity and potentially enhancing performance metrics.

**Initialization Parameters**

- **`channels`** (int, optional): Number of input feature map channels. If provided, kernel size will be adaptively determined. Default is `None`.

- **`kernel_size`** (int): Kernel size for the convolution. Default is `3`.

- **`gamma`** (int): Parameter used in adaptive kernel size calculation. Default is `2`.

- **`beta`** (int): Parameter used in adaptive kernel size calculation. Default is `1`.

- **`act_layer`** (optional): Activation layer to use after convolution. Default is `None`.

- **`gate_layer`** (callable): Gating function to use. Default is `tf.nn.sigmoid`.

**Methods**

- **`__call__(self, x)`**: Applies the CECA module to the input `x`.

  - **Parameters**:

    - **`x`** (Tensor): Input tensor.

  

  - **Returns**: Output tensor with enhanced channel attention.

- **`circular_pad(self, x, pad)`**: Applies circular padding to the input `x`.

  - **Parameters**:

    - **`x`** (Tensor): Input tensor.

    - **`pad`** (int): Padding size.

  

  - **Returns**: Padded tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the CecaModule

ceca = nn.CecaModule(channels=128)

# Generate some sample data

data = tf.random.normal((2, 128, 32, 32))

# Apply the CECA module

output = ceca(data)

```
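
Circular padding wraps values from one end of an axis around to the other. The NumPy sketch below pads the last axis only; which axis `circular_pad` operates on in Note is not stated here, so treat the axis choice and the helper name `circular_pad_last_axis` as assumptions.

```python
import numpy as np

def circular_pad_last_axis(x, pad):
    """Pad the last axis with `pad` wrapped-around elements on each side."""
    if pad == 0:
        return x
    # The tail is copied in front of the data and the head is appended after it.
    return np.concatenate([x[..., -pad:], x, x[..., :pad]], axis=-1)

x = np.array([[1, 2, 3, 4]])
padded = circular_pad_last_axis(x, 1)
print(padded)  # [[4 1 2 3 4 1]]
```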

# einsum_dense

This class implements a dense layer using `tf.einsum` for computations. It allows for flexible einsum operations on tensors of arbitrary dimensionality.

**Initialization Parameters**

- **`equation`** (str): An einsum equation string, e.g., `ab,bc->ac`.

- **`output_shape`** (int or list): The expected shape of the output tensor.

- **`input_shape`** (list, optional): Shape of the input tensor.

- **`activation`** (str, optional): Activation function to use.

- **`bias_axes`** (str, optional): Axes to apply bias on.

- **`weight_initializer`** (str, list, tuple): Initializer for the weight matrix. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): Initializer for the bias vector. Default is `'zeros'`.

- **`trainable`** (bool): Whether the layer's variables should be trainable. Default is `True`.

- **`dtype`** (str): Data type for the computations. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the einsum operation to the provided input tensor.

  - **Parameters**:

    - **`data`** (tensor): Input tensor.

  - **Returns**: Output tensor after applying the einsum operation and activation function (if specified).

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the einsum_dense layer

layer = nn.einsum_dense(equation='ab,bc->ac', output_shape=64, activation='relu')

# Define a sample input tensor

input_tensor = tf.random.normal(shape=(32, 128))  # (batch_size, input_dim)

# Compute the layer output

output = layer(input_tensor)

print(output.shape)  # Should be (32, 64)

```
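
For readers less familiar with einsum notation, the NumPy sketch below shows what the equation in the example computes and how the same notation extends to higher-rank kernels. The kernel shapes are illustrative assumptions; how Note derives its kernel shape from `equation` and `output_shape` is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# 'ab,bc->ac' is an ordinary matrix product: (batch, in) x (in, out) -> (batch, out).
x = rng.normal(size=(32, 128))
kernel = rng.normal(size=(128, 64))
y = np.einsum('ab,bc->ac', x, kernel)
print(y.shape, np.allclose(y, x @ kernel))  # (32, 64) True

# 'abc,cde->abde' projects the last axis onto two output axes,
# for example (batch, seq, dim) -> (batch, seq, heads, head_dim).
x3 = rng.normal(size=(2, 10, 16))
kernel3 = rng.normal(size=(16, 4, 8))
y3 = np.einsum('abc,cde->abde', x3, kernel3)
print(y3.shape)  # (2, 10, 4, 8)
```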

# embedding

This class implements an embedding layer, which transforms input indices into dense vectors.

**Initialization Parameters**

- **`output_size`** (int): The size of the output embedding vectors.

- **`input_size`** (int, optional): The size of the input vocabulary. Default is `None`.

- **`initializer`** (str, list, tuple): The initializer for the embedding weights. Default is `'normal'`.

- **`sparse`** (bool): If `True`, supports sparse input tensors. Default is `False`.

- **`use_one_hot_matmul`** (bool): If `True`, uses one-hot matrix multiplication. Default is `False`.

- **`trainable`** (bool): If `True`, the embedding weights are trainable. Default is `True`.

- **`dtype`** (str): The data type for the embedding weights. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the embedding layer to the input indices.

  - **Parameters**:

    - **`data`** (tensor): The input tensor containing indices to be embedded.

  - **Returns**: The embedded output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the embedding class

embedding_layer = nn.embedding(output_size=64, input_size=1000)

# Define sample input data

input_data = tf.constant([1, 2, 3, 4, 5])

# Compute embeddings

output = embedding_layer(input_data)

print(output.shape)  # Should be (5, 64)

```
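
An embedding lookup and a one-hot matrix multiplication give the same result, which is presumably what `use_one_hot_matmul` switches between. The NumPy sketch below illustrates the equivalence; it is not Note's implementation, and the table is random stand-in data.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 64
table = rng.normal(size=(vocab_size, embed_dim))
ids = np.array([1, 2, 3, 4, 5])

# Direct lookup: pick one row of the table per index.
gathered = table[ids]

# One-hot route: each index becomes a row with a single 1, then a matmul selects the row.
one_hot = np.eye(vocab_size)[ids]
via_matmul = one_hot @ table

print(gathered.shape, np.allclose(gathered, via_matmul))  # (5, 64) True
```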

# FAVOR_attention

This class implements the Fast Attention via Positive Orthogonal Random Features (FAVOR) mechanism.

**Initialization Parameters**

- **`key_dim`** (int): The dimensionality of the keys.

- **`orthonormal`** (bool): If `True`, uses orthonormal random features. Default is `True`.

- **`causal`** (bool): If `True`, applies causal attention. Default is `False`.

- **`m`** (int): The number of random features. Default is `128`.

- **`redraw`** (bool): If `True`, redraws the random features at each call. Default is `True`.

- **`h`** (function, optional): A scaling function for the random features. Default is `None`.

- **`f`** (list): A list of activation functions to apply to the random features. Default is `[tf.nn.relu]`.

- **`randomizer`** (function): The function to generate random features. Default is `tf.random.normal`.

- **`eps`** (float): A small constant for numerical stability. Default is `0.0`.

- **`kernel_eps`** (float): A small constant added to the kernel features. Default is `0.001`.

- **`dtype`** (str): The data type for computations. Default is `'float32'`.

**Methods**

- **`__call__(self, keys, values, queries)`**: Applies the FAVOR attention mechanism to the provided keys, values, and queries.

  - **Parameters**:

    - **`keys`** (Tensor): The key tensor.

    - **`values`** (Tensor): The value tensor.

    - **`queries`** (Tensor): The query tensor.

  - **Returns**:

    - **`Tensor`**: The result of the FAVOR attention mechanism.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the FAVOR attention class

attention_layer = nn.FAVOR_attention(key_dim=64)

# Define sample keys, values, and queries

keys = tf.random.normal(shape=(2, 10, 64))

values = tf.random.normal(shape=(2, 10, 64))

queries = tf.random.normal(shape=(2, 5, 64))

# Compute attention output

output = attention_layer(keys, values, queries)

print(output.shape)  # Should be (2, 5, 64)

```
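
The core idea behind FAVOR is to map queries and keys through random features and then reorder the matrix products so the cost is linear in sequence length. The NumPy sketch below covers only the non-causal case with ReLU features (mirroring the documented defaults `f=[tf.nn.relu]` and `kernel_eps=0.001`). It omits orthogonalization, redrawing, the scaling function `h`, and the causal path, and all helper names are hypothetical.

```python
import numpy as np

def relu_features(x, projection, kernel_eps=1e-3):
    """Map (batch, length, dim) inputs to positive random features of size m."""
    return np.maximum(x @ projection.T, 0.0) + kernel_eps

def linear_attention(queries, keys, values, projection):
    q = relu_features(queries, projection)            # (B, Lq, m)
    k = relu_features(keys, projection)               # (B, Lk, m)
    # Summarize keys and values once: no (Lq x Lk) attention matrix is formed.
    kv = np.einsum('blm,bld->bmd', k, values)         # (B, m, d)
    normalizer = q @ k.sum(axis=1)[..., None]         # (B, Lq, 1)
    return (q @ kv) / normalizer

rng = np.random.default_rng(0)
keys = rng.normal(size=(2, 10, 64))
values = rng.normal(size=(2, 10, 64))
queries = rng.normal(size=(2, 5, 64))
projection = rng.normal(size=(128, 64))               # m = 128 random features

out = linear_attention(queries, keys, values, projection)

# Reference: the same kernel evaluated the quadratic way.
q_feat = relu_features(queries, projection)
k_feat = relu_features(keys, projection)
scores = q_feat @ k_feat.transpose(0, 2, 1)           # (B, Lq, Lk)
reference = (scores / scores.sum(axis=-1, keepdims=True)) @ values

print(out.shape, np.allclose(out, reference))  # (2, 5, 64) True
```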

# feed_forward_experts

This class implements a feed-forward layer with multiple experts, allowing for independent feed-forward blocks.

**Initialization Parameters**

- **`num_experts`** (int): The number of experts.

- **`d_ff`** (int): The dimension of the feed-forward layer of each expert.

- **`input_shape`** (tuple, optional): The shape of the input tensor. Default is `None`.

- **`inner_dropout`** (float): Dropout probability after intermediate activations. Default is `0.0`.

- **`output_dropout`** (float): Dropout probability after the output layer. Default is `0.0`.

- **`activation`** (function): The activation function. Default is `tf.nn.gelu`.

- **`kernel_initializer`** (str, list, tuple): The initializer for the kernel weights. Default is `'Xavier'`.

- **`bias_initializer`** (str, list, tuple): The initializer for the bias weights. Default is `'zeros'`.

**Methods**

- **`__call__(self, data, train_flag=True)`**: Applies the feed-forward experts layer to the input data.

  - **Parameters**:

    - **`data`** (Tensor): The input tensor.

    - **`train_flag`** (bool): If `True`, applies dropout during training.

  - **Returns**: The transformed input tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the feed-forward experts class

ff_experts_layer = nn.feed_forward_experts(num_experts=4, d_ff=128, input_shape=(None, 4, 32, 64))

# Define sample input data

input_data = tf.random.normal(shape=(8, 4, 32, 64))

# Compute output

output = ff_experts_layer(input_data, train_flag=True)

print(output.shape)  # Should be (8, 4, 32, 64)

```

# filter_response_norm

This class implements the Filter Response Normalization (FRN) layer, which normalizes per-channel activations.

**Initialization Parameters**

- **`input_shape`** (tuple, optional): The shape of the input tensor. Default is `None`.

- **`epsilon`** (float): Small constant added to variance to avoid division by zero. Default is `1e-6`.

- **`axis`** (list): List of axes that should be normalized. Default is `[1, 2]`.

- **`beta_initializer`** (str, list, tuple): Initializer for the beta weights. Default is `'zeros'`.

- **`gamma_initializer`** (str, list, tuple): Initializer for the gamma weights. Default is `'ones'`.

- **`learned_epsilon`** (bool): If `True`, adds a learnable epsilon parameter. Default is `False`.

- **`dtype`** (str): The data type for computations. Default is `'float32'`.

**Methods**

- **`__call__(self, data)`**: Applies the FRN layer to the input data.

  - **Parameters**:

    - **`data`** (Tensor): The input tensor.

  - **Returns**: The normalized output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the FRN class

frn_layer = nn.filter_response_norm(input_shape=(None, 32, 32, 64))

# Define sample input data

input_data = tf.random.normal(shape=(8, 32, 32, 64))

# Compute output

output = frn_layer(input_data)

print(output.shape)  # Should be (8, 32, 32, 64)

```
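
The normalization itself is short enough to write out. The NumPy sketch below divides each channel by the root of its mean squared activation over the spatial axes, then applies the learned scale and offset. It follows the FRN formulation with a fixed `epsilon` and leaves out the learnable epsilon; `frn` is a hypothetical helper, not Note's code.

```python
import numpy as np

def frn(x, gamma, beta, epsilon=1e-6, axis=(1, 2)):
    """Filter Response Normalization over the given (spatial) axes."""
    # nu2: mean of squared activations per sample and channel (no mean subtraction).
    nu2 = np.mean(np.square(x), axis=axis, keepdims=True)
    return gamma * x / np.sqrt(nu2 + epsilon) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 4))
gamma = np.ones(4)    # matches gamma_initializer='ones'
beta = np.zeros(4)    # matches beta_initializer='zeros'

y = frn(x, gamma, beta)
print(y.shape)  # (2, 8, 8, 4)
```

With the default initial values of `gamma` and `beta`, the mean squared activation of every output channel is approximately 1.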

# flatten

This class implements a flatten layer, which reshapes the input to a 2D tensor by keeping the batch dimension and merging all remaining dimensions into one.

**Methods**

- **`__call__(self, data)`**: Applies the flatten layer to the input data.

  - **Parameters**:

    - **`data`** (Tensor): The input tensor.

  - **Returns**: The reshaped output tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Instantiate the flatten class

flatten_layer = nn.flatten()

# Define sample input data

input_data = tf.random.normal(shape=(8, 32, 32, 64))

# Compute output

output = flatten_layer(input_data)

print(output.shape)  # Should be (8, 65536)

```

# GatherExcite

The `GatherExcite` class implements the Gather-Excite attention mechanism, designed to enhance convolutional networks by leveraging spatial and channel-wise feature context. It generalizes Squeeze-and-Excitation (SE) with additional flexibility through spatial extent and optional extra parameters.

**Initialization Parameters**

- **`channels`** (int): Number of input and output channels.

- **`feat_size`** (int, optional): Spatial feature size for global extent with extra parameters. Required when `extent=0` and `extra_params=True`.

- **`extra_params`** (bool): If `True`, includes convolutional layers for the gather step. Default is `False`.

- **`extent`** (int): Controls the spatial extent for feature gathering. `0` indicates global extent. Must be even if greater than `0`. Default is `0`.

- **`use_mlp`** (bool): Whether to use a multi-layer perceptron for feature transformation. Default is `True`.

- **`rd_ratio`** (float): Reduction ratio for the number of channels in the MLP. Default is `1/16`.

- **`rd_channels`** (int, optional): Explicit number of reduced channels in the MLP. Overrides `rd_ratio` if provided.

- **`rd_divisor`** (int): Divisor used when computing reduced channels. Default is `1`.

- **`add_maxpool`** (bool): If `True`, adds max pooling to the gather step for enhanced feature aggregation. Default is `False`.

- **`act_layer`** (callable): Activation function for intermediate layers. Default is `tf.nn.relu`.

- **`norm_layer`** (callable): Normalization layer to use after convolutions. Default is `nn.batch_norm`.

- **`gate_layer`** (callable): Gating function to modulate features. Default is `tf.nn.sigmoid`.

**Methods**

- **`__call__(self, x)`**: Applies the Gather-Excite attention mechanism to the input tensor.

  - **Parameters**:

    - **`x`** (tf.Tensor): Input tensor of shape `(batch_size, height, width, channels)`.

  - **Returns**: Tensor of the same shape as the input, modulated by the attention mechanism.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the GatherExcite attention module

ge = nn.GatherExcite(channels=64, extent=2, use_mlp=True)

# Generate some sample data

data = tf.random.normal((2, 32, 32, 64))

# Apply the GatherExcite mechanism

output = ge(data)

```

# gaussian_dropout

This class applies multiplicative 1-centered Gaussian noise, useful for regularization during training.

**Initialization Parameters**

- **`rate`** (float): Drop probability. The noise will have standard deviation `sqrt(rate / (1 - rate))`.

- **`seed`** (int, optional): Random seed for deterministic behavior. Default is `7`.

**Methods**

- **`__call__(self, data, training=True)`**: Applies the Gaussian Dropout to the input tensor during training.

  - **Parameters**:

    - **`data`** (Tensor): Input tensor of any rank.

    - **`training`** (bool): If `True`, applies dropout. If `False`, returns the input tensor as is.

  - **Returns**: The output tensor with Gaussian Dropout applied during training.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

dropout_layer = nn.gaussian_dropout(rate=0.5, seed=7)

data = tf.random.normal(shape=(3, 4))

output = dropout_layer(data, True)  # second positional argument is the training flag

print(output.shape)  # Same shape as input

```
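
The noise model described above (multiplicative, 1-centered, with standard deviation `sqrt(rate / (1 - rate))`) can be checked outside the library. The following is a minimal NumPy sketch, not Note's implementation; the function name `gaussian_dropout_reference` and its `rng` argument are hypothetical and exist only for this illustration.

```python
import numpy as np

def gaussian_dropout_reference(data, rate, rng, train_flag=True):
    """Multiply `data` by Gaussian noise with mean 1 and std sqrt(rate / (1 - rate))."""
    if not train_flag or rate <= 0.0:
        return data  # pass-through outside training, or when no dropout is requested
    stddev = np.sqrt(rate / (1.0 - rate))
    noise = rng.normal(loc=1.0, scale=stddev, size=data.shape)
    return data * noise

rng = np.random.default_rng(7)
data = np.ones((1000, 100))
out = gaussian_dropout_reference(data, rate=0.5, rng=rng)

print(out.shape)   # (1000, 100)
print(out.mean())  # close to 1: the noise is 1-centered, so the expected value is preserved
print(out.std())   # close to 1, since sqrt(0.5 / (1 - 0.5)) = 1
```

Because the noise has mean 1, no rescaling is needed at inference time, which is why the input is returned unchanged when the training flag is off.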

# gaussian_noise

This class applies additive zero-centered Gaussian noise, useful for regularization and data augmentation during training.

**Initialization Parameters**

- **`stddev`** (float): Standard deviation of the noise distribution.

- **`seed`** (int, optional): Random seed for deterministic behavior. Default is `7`.

**Methods**

**`__call__(self, data, train_flag=True)`**: Applies Gaussian noise to the input tensor during training.

  - **Parameters**:

    - **`data`** (Tensor): Input tensor of any rank.

    - **`train_flag`** (bool): If `True`, adds noise. If `False`, returns the input tensor as is.

  - **Returns**: The output tensor with Gaussian noise added during training.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

noise_layer = nn.gaussian_noise(stddev=0.1, seed=7)

data = tf.random.normal(shape=(3, 4))

output = noise_layer(data, train_flag=True)

print(output.shape)  # Same shape as input

```
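
For comparison with the multiplicative case, additive zero-centered noise can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the library's code; `gaussian_noise_reference` and its `rng` argument are hypothetical names.

```python
import numpy as np

def gaussian_noise_reference(data, stddev, rng, train_flag=True):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    if not train_flag:
        return data  # inference: the input is returned as is
    return data + rng.normal(loc=0.0, scale=stddev, size=data.shape)

rng = np.random.default_rng(7)
data = np.zeros((1000, 100))

noisy = gaussian_noise_reference(data, stddev=0.1, rng=rng)
print(noisy.std())  # close to 0.1

same = gaussian_noise_reference(data, stddev=0.1, rng=rng, train_flag=False)
print(same is data)  # True
```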

# GCN

This class implements a multi-layer Graph Convolutional Network (GCN).

**Initialization Parameters**

- **`x_dim`** (int): Dimension of input features.

- **`h_dim`** (int): Dimension of hidden layers.

- **`out_dim`** (int): Dimension of output features.

- **`nb_layers`** (int): Number of GCN layers. Default is `2`.

- **`dropout_rate`** (float): Dropout rate for regularization. Default is `0.5`.

- **`bias`** (bool): If `True`, adds a learnable bias to the output. Default is `True`.

**Methods**

**`__call__(self, x, adj)`**: Applies the multi-layer GCN to the input tensor and adjacency matrix.

  - **Parameters**:

    - **`x`** (Tensor): Input feature tensor.

    - **`adj`** (Tensor): Adjacency matrix tensor.

  - **Returns**: The output tensor after applying the GCN layers.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

gcn = nn.GCN(x_dim=10, h_dim=20, out_dim=5, nb_layers=2, dropout_rate=0.5)

x = tf.random.normal(shape=(5, 10))

adj = tf.eye(5)

output = gcn(x, adj)

print(output.shape)  # Should be (5, 5)

```
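
The documentation above does not state which propagation rule `GCN` uses or whether it expects a pre-normalized adjacency matrix. As a point of reference only, the sketch below assumes the widely used Kipf and Welling rule, `H' = act(Â H W)` with `Â = D^{-1/2} (A + I) D^{-1/2}`, and omits dropout. The helper names `normalize_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} (symmetric normalization with self-loops)."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1)          # always >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, adj_norm, weight, bias=None, activation=None):
    """One graph convolution: aggregate neighbor features, then project them."""
    out = adj_norm @ x @ weight
    if bias is not None:
        out = out + bias
    return activation(out) if activation is not None else out

rng = np.random.default_rng(0)

# Path graph 0-1-2-3-4 as an undirected adjacency matrix.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
adj_norm = normalize_adjacency(adj)

x = rng.normal(size=(5, 10))    # 5 nodes, x_dim = 10
w1 = rng.normal(size=(10, 20))  # x_dim -> h_dim
w2 = rng.normal(size=(20, 5))   # h_dim -> out_dim

h = gcn_layer(x, adj_norm, w1, activation=lambda z: np.maximum(z, 0.0))
out = gcn_layer(h, adj_norm, w2)
print(out.shape)  # (5, 5)
```

The output has one row per node and `out_dim` columns, which matches the `(5, 5)` shape in the example above.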

# global_avg_pool1d

This class implements global average pooling for 1D data, averaging each feature over the steps dimension.

**Initialization Parameters**

- **`keepdims`** (bool): If `True`, retains reduced dimensions with length 1. Default is `False`.

**Methods**

**`__call__(self, data)`**: Applies global average pooling to the input tensor.

  - **Parameters**:

    - **`data`** (Tensor): Input tensor of shape `(batch_size, steps, features)`.

  - **Returns**: The pooled tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

pool1d = nn.global_avg_pool1d(keepdims=False)

data = tf.random.normal(shape=(3, 4, 5))

output = pool1d(data)

print(output.shape)  # Should be (3, 5)

```
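
To make the effect of `keepdims` concrete, the same reduction can be written in NumPy. This sketch assumes the pooling is a mean over axis 1 of a `(batch, steps, features)` tensor, which is consistent with the `(3, 4, 5)` to `(3, 5)` shapes in the example above; it is not the library's code.

```python
import numpy as np

# (batch, steps, features) = (2, 3, 4)
data = np.arange(24, dtype=np.float32).reshape(2, 3, 4)

pooled = data.mean(axis=1)                      # keepdims=False
pooled_keep = data.mean(axis=1, keepdims=True)  # keepdims=True

print(pooled.shape)       # (2, 4)
print(pooled_keep.shape)  # (2, 1, 4)
print(pooled[0])          # [4. 5. 6. 7.]
```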

# global_avg_pool2d

This class implements global average pooling for 2D data, averaging each channel over the height and width dimensions.

**Initialization Parameters**

- **`keepdims`** (bool): If `True`, retains reduced dimensions with length 1. Default is `False`.

**Methods**

**`__call__(self, data)`**: Applies global average pooling to the input tensor.

  - **Parameters**:

    - **`data`** (Tensor): Input tensor of shape `(batch_size, height, width, channels)`.

  - **Returns**: The pooled tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

pool2d = nn.global_avg_pool2d(keepdims=False)

data = tf.random.normal(shape=(3, 4, 5, 6))

output = pool2d(data)

print(output.shape)  # Should be (3, 6)

```
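
One reason to set `keepdims=True` is that the pooled tensor then broadcasts against the input. The NumPy sketch below illustrates this by centering each channel on its spatial mean; it assumes a channels-last layout with the mean taken over axes 1 and 2, as the shapes in the example above suggest, and is not the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 4, 5, 6))  # (batch, height, width, channels)

pooled = data.mean(axis=(1, 2))                      # shape (3, 6)
pooled_keep = data.mean(axis=(1, 2), keepdims=True)  # shape (3, 1, 1, 6)

# With the singleton axes kept, the mean broadcasts over height and width.
centered = data - pooled_keep

print(pooled.shape, pooled_keep.shape)
print(np.allclose(centered.mean(axis=(1, 2)), 0.0))  # True
```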

# global_avg_pool3d

This class implements global average pooling for 3D data, averaging each channel over the three spatial dimensions.

**Initialization Parameters**

- **`keepdims`** (bool): If `True`, retains reduced dimensions with length 1. Default is `False`.

**Methods**

**`__call__(self, data)`**: Applies global average pooling to the input tensor.

  - **Parameters**:

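    - **`data`** (Tensor): Rank-5 input tensor with three spatial dimensions between the batch and channel axes, e.g. `(batch_size, depth, height, width, channels)`.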

  - **Returns**: The pooled tensor.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

pool3d = nn.global_avg_pool3d(keepdims=False)

data = tf.random.normal(shape=(3, 4, 5, 6, 7))

output = pool3d(data)

print(output.shape)  # Should be (3, 7)

```
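
Global pooling over several spatial axes is equivalent to flattening those axes and averaging once. The NumPy sketch below checks this for a rank-5 tensor, assuming a channels-last layout as in the example above; it illustrates the operation and is not the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 4, 5, 6, 7))  # (batch, d1, d2, d3, channels)

# Average over the three spatial axes in one step.
pooled = data.mean(axis=(1, 2, 3))

# Same result: collapse the spatial axes into one, then average over it.
flat = data.reshape(3, -1, 7)  # (3, 4 * 5 * 6, 7)

print(pooled.shape)                            # (3, 7)
print(np.allclose(pooled, flat.mean(axis=1)))  # True
```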

# global_max_pool1d

The `global_max_pool1d` class performs global max pooling on 1D input data, reducing each feature map to its maximum value.

**Initialization Parameters**

- **`keepdims`** (bool, optional): If `True`, retains reduced dimensions with length 1. Default is `False`.

**Methods**

- **`__call__(self, data)`**: Applies global max pooling to the input `data`.

  - **Parameters**:

    - **`data`**: Input tensor of shape `[batch_size, sequence_length, features]`.

  

  - **Returns**: Tensor of reduced shape depending on `keepdims`.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the global max pooling 1D layer

gmp1d = nn.global_max_pool1d(keepdims=True)

# Generate some sample data

data = tf.random.normal((2, 10, 8))

# Apply global max pooling 1D

output = gmp1d(data)

```
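
The example above does not print its result. Given the `keepdims` description, the output for input `(2, 10, 8)` with `keepdims=True` would have shape `(2, 1, 8)`. The NumPy sketch below shows the same reduction on a small tensor whose maxima can be read off by eye; it assumes the maximum is taken over the sequence axis and is not the library's code.

```python
import numpy as np

# (batch, sequence_length, features) = (1, 3, 2)
data = np.array([[[1.0, -2.0],
                  [3.0,  0.5],
                  [-1.0, 4.0]]])

out = data.max(axis=1, keepdims=True)

print(out.shape)  # (1, 1, 2)
print(out)        # [[[3. 4.]]]
```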

# global_max_pool2d

The `global_max_pool2d` class performs global max pooling on 2D input data, reducing each feature map to its maximum value.

**Initialization Parameters**

- **`keepdims`** (bool, optional): If `True`, retains reduced dimensions with length 1. Default is `False`.

**Methods**

- **`__call__(self, data)`**: Applies global max pooling to the input `data`.

  - **Parameters**:

    - **`data`**: Input tensor of shape `[batch_size, height, width, channels]`.

  

  - **Returns**: Tensor of reduced shape depending on `keepdims`.

**Example Usage**

```python

import tensorflow as tf

from Note import nn

# Create an instance of the global max pooling 2D layer

gmp2d = nn.global_max_pool2d(keepdims=True)

```