https://github.com/vortezwohl/deeplotx
An out-of-the-box long-text NLP framework.
autoregressive classification deep-learning deeplearning gender-recognition multilingual ner nlp sequence-models
- Host: GitHub
- URL: https://github.com/vortezwohl/deeplotx
- Owner: vortezwohl
- License: gpl-3.0
- Created: 2025-04-27T09:39:19.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-08-28T10:11:39.000Z (8 months ago)
- Last Synced: 2025-08-28T17:28:42.213Z (8 months ago)
- Topics: autoregressive, classification, deep-learning, deeplearning, gender-recognition, multilingual, ner, nlp, sequence-models
- Language: Python
- Homepage: https://deepwiki.com/vortezwohl/DeepLoTX
- Size: 4.71 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# *Deep Long Text Learning*
*An out-of-the-box long-text NLP framework.*
> Author: [vortezwohl](https://github.com/vortezwohl)
## Citation
If you use the `DeepLoTX` framework in your research, please cite it properly to acknowledge its contribution to your work.
```bibtex
@software{Wu_DeepLoTX_2025,
author = {Wu, Zihao},
license = {GPL-3.0},
month = aug,
title = {{DeepLoTX}},
url = {https://github.com/vortezwohl/DeepLoTX},
version = {0.9.5},
year = {2025}
}
```
## Installation
- **With pip**
```
pip install -U deeplotx
```
- **With uv (recommended)**
```
uv add -U deeplotx
```
- **Get the latest features from GitHub**
```
pip install -U git+https://github.com/vortezwohl/DeepLoTX.git
```
## Quick start
- ### Named entity recognition
> *Multilingual input is supported.*
> *Gender recognition is supported.*
Import dependencies and create the recognizer:
```python
from deeplotx import BertNER
ner = BertNER()
```
```python
ner('你好, 我的名字是吴子豪, 来自福建福州.')
```
stdout:
```
[NamedPerson(text='吴子豪', type='PER', base_probability=0.9995428418719051, gender=, gender_probability=0.9970703125),
NamedEntity(text='福建', type='LOC', base_probability=0.9986373782157898),
NamedEntity(text='福州', type='LOC', base_probability=0.9993632435798645)]
```
```python
ner("Hi, i'm Vortez Wohl, author of DeeploTX.")
```
stdout:
```
[NamedPerson(text='Vortez Wohl', type='PER', base_probability=0.9991965342072855, gender=, gender_probability=0.87255859375)]
```
- ### Gender recognition
> *Multilingual input is supported.*
> *Integrated from [Name2Gender](https://github.com/vortezwohl/Name2Gender)*
Import dependencies and create the recognizer:
```python
from deeplotx import Name2Gender
n2g = Name2Gender()
```
Recognize the gender of "Elon Musk":
```python
n2g('Elon Musk')
```
stdout:
```
```
Recognize the gender of "Anne Hathaway":
```python
n2g('Anne Hathaway')
```
stdout:
```
```
Recognize the gender of "吴彦祖" and also return the probability:
```python
n2g('吴彦祖', return_probability=True)
```
stdout:
```
(, 1.0)
```
- ### Apply LoRA to a model
Import dependencies
```python
from deeplotx import LoRA
```
Assuming the `model` has already been loaded:
```python
model = ...  # e.g. an LLM or any other deep neural network
lora_model = LoRA.apply_to(model, target_modules=['q_proj'], rank=16, alpha=32, dropout_rate=.05)
```
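For context, the standard LoRA formulation keeps the original weight matrix frozen and learns a low-rank correction scaled by `alpha / rank`. The plain-Python sketch below illustrates only that arithmetic. It does not show how `LoRA.apply_to` is implemented; the helper names (`matmul`, `lora_effective_weight`) are hypothetical, and the zero initialization of `B` is an assumption taken from the usual LoRA recipe rather than from DeepLoTX.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, rank, alpha):
    """Return W + (alpha / rank) * (B @ A), the weight a LoRA-adapted layer behaves like."""
    scale = alpha / rank
    delta = matmul(b, a)  # (out x rank) @ (rank x in) -> (out x in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


in_dim, out_dim, rank, alpha = 8, 8, 2, 4
random.seed(0)
w = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]  # frozen weight
a = [[random.gauss(0, 0.01) for _ in range(in_dim)] for _ in range(rank)]  # trainable, small random init
b = [[0.0] * rank for _ in range(out_dim)]                                 # trainable, zero init (assumed)

w_eff = lora_effective_weight(w, a, b, rank, alpha)
print(w_eff == w)  # with B at zero, the adapted layer starts out identical to the original

trainable = rank * (in_dim + out_dim)  # parameters in A and B
print(trainable, in_dim * out_dim)     # versus parameters in the full weight matrix
```

Under this formulation, `rank=16` and `alpha=32` as in the call above would give a scaling factor of `32 / 16 = 2`.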
- ### Long text embedding
- **BERT based long text embedding**
```python
from deeplotx import LongTextEncoder
encoder = LongTextEncoder(
    chunk_size=448,
    overlapping=32
)
encoder.encode('我是吴子豪, 这是一个测试文本.', flatten=False)
```
stdout:
```
tensor([ 2.2316e-01, 2.0300e-01, ..., 1.5578e-01, -6.6735e-02])
```
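The `chunk_size` and `overlapping` parameters suggest a sliding window over the tokenized text. A minimal sketch of that idea follows, assuming `chunk_size` is the window length in tokens and `overlapping` is the number of tokens shared by adjacent windows. `chunk_with_overlap` is a hypothetical helper; how `LongTextEncoder` actually splits the text and combines the chunk embeddings is not shown here.

```python
def chunk_with_overlap(tokens, chunk_size, overlapping):
    """Split a token list into windows of chunk_size that share `overlapping` tokens."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive.')
    if not 0 <= overlapping < chunk_size:
        raise ValueError('overlapping must be in [0, chunk_size).')
    step = chunk_size - overlapping  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


tokens = list(range(1000))  # stand-in for 1000 token ids
chunks = chunk_with_overlap(tokens, chunk_size=448, overlapping=32)
for chunk in chunks:
    print(chunk[0], chunk[-1], len(chunk))
```

The overlap means a sentence cut at a window boundary still appears with some surrounding context in the next window, at the cost of encoding those tokens twice.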
- **Longformer based long text embedding**
```python
from deeplotx import LongformerEncoder
encoder = LongformerEncoder()
encoder.encode('Thank you for using DeepLoTX.')
```
stdout:
```
tensor([-2.7490e-02, 6.6503e-02, ..., -6.5937e-02, 6.7802e-03])
```
- ### Similarity calculation
- **Vector based**
```python
import deeplotx.similarity as sim
vector_0, vector_1 = [1, 2, 3, 4], [4, 3, 2, 1]
distance_0 = sim.euclidean_similarity(vector_0, vector_1)
print(distance_0)
distance_1 = sim.cosine_similarity(vector_0, vector_1)
print(distance_1)
distance_2 = sim.chebyshev_similarity(vector_0, vector_1)
print(distance_2)
```
stdout:
```
4.47213595499958
0.33333333333333337
3
```
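The printed values behave like distances rather than similarities: they match the Euclidean distance, one minus the cosine similarity, and the Chebyshev distance of the two vectors. The plain-Python sketch below reproduces them. This reading is inferred from the output above, not from the library source, and the function names are hypothetical.

```python
import math


def euclidean_distance(u, v):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def chebyshev_distance(u, v):
    """Largest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(u, v))


vector_0, vector_1 = [1, 2, 3, 4], [4, 3, 2, 1]
print(euclidean_distance(vector_0, vector_1))
print(cosine_distance(vector_0, vector_1))
print(chebyshev_distance(vector_0, vector_1))
```

For these vectors the cosine similarity itself is 20 / 30 = 2/3, so a value near 1/3 corresponds to `1 - cosine`.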
- **Set based**
```python
import deeplotx.similarity as sim
set_0, set_1 = {1, 2, 3, 4}, {4, 5, 6, 7}
distance_0 = sim.jaccard_similarity(set_0, set_1)
print(distance_0)
distance_1 = sim.ochiai_similarity(set_0, set_1)
print(distance_1)
distance_2 = sim.dice_coefficient(set_0, set_1)
print(distance_2)
distance_3 = sim.overlap_coefficient(set_0, set_1)
print(distance_3)
```
stdout:
```
0.1428571428572653
0.2500000000001875
0.25000000000009376
0.2500000000001875
```
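These outputs agree, to about twelve decimal places, with the textbook definitions of the four coefficients. A plain-Python sketch of those definitions is given below; the function names are hypothetical, and the tiny difference in the trailing digits presumably comes from a numerical-stability term inside the library, which is a guess.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b)


def ochiai(a, b):
    """|A ∩ B| / sqrt(|A| * |B|)"""
    return len(a & b) / math.sqrt(len(a) * len(b))


def dice(a, b):
    """2 * |A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    return len(a & b) / min(len(a), len(b))


set_0, set_1 = {1, 2, 3, 4}, {4, 5, 6, 7}
print(jaccard(set_0, set_1))  # one shared element out of seven distinct ones
print(ochiai(set_0, set_1))
print(dice(set_0, set_1))
print(overlap(set_0, set_1))
```

This sketch divides by zero on empty sets, so real code would need to handle that case explicitly.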
- **Distribution based**
```python
import deeplotx.similarity as sim
dist_0, dist_1 = [0.3, 0.2, 0.1, 0.4], [0.2, 0.1, 0.3, 0.4]
distance_0 = sim.cross_entropy(dist_0, dist_1)
print(distance_0)
distance_1 = sim.kl_divergence(dist_0, dist_1)
print(distance_1)
distance_2 = sim.js_divergence(dist_0, dist_1)
print(distance_2)
distance_3 = sim.hellinger_distance(dist_0, dist_1)
print(distance_3)
```
stdout:
```
0.3575654913778237
0.15040773967762736
0.03969123741566945
0.20105866986400994
```
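The four values above can be reproduced with natural-log definitions, with one detail worth noting: the cross-entropy figure matches the mean over the four components rather than their sum. The sketch below is a plain-Python reconstruction under that reading; it is inferred from the printed numbers, not taken from the library source, and the function names simply mirror the ones used above.

```python
import math


def cross_entropy(p, q):
    """Mean of -p_i * ln(q_i) over the components (the mean is an inferred detail)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q)) / len(p)


def kl_divergence(p, q):
    """Sum of p_i * ln(p_i / q_i); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetrized KL divergence against the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Euclidean distance between the square-rooted distributions, divided by sqrt(2)."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q))) / math.sqrt(2)


dist_0, dist_1 = [0.3, 0.2, 0.1, 0.4], [0.2, 0.1, 0.3, 0.4]
print(cross_entropy(dist_0, dist_1))
print(kl_divergence(dist_0, dist_1))
print(js_divergence(dist_0, dist_1))
print(hellinger_distance(dist_0, dist_1))
```

KL divergence is asymmetric, so swapping the arguments generally changes the result, while the JS divergence and the Hellinger distance are symmetric.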
- ### Pre-defined neural networks
```python
from deeplotx import (
    FeedForward,
    MultiHeadFeedForward,
    LinearRegression,
    LogisticRegression,
    SoftmaxRegression,
    RecursiveSequential,
    LongContextRecursiveSequential,
    RoPE,
    LoRA,
    Attention,
    MultiHeadAttention,
    RoFormerEncoder,
    AutoRegression,
    LongContextAutoRegression
)
```
The fundamental feed-forward network (FFN, i.e. an MLP block):
```python
from typing_extensions import override

import torch
from torch import nn

from deeplotx.nn.base_neural_network import BaseNeuralNetwork


class FeedForwardUnit(BaseNeuralNetwork):
    def __init__(self, feature_dim: int, expansion_factor: int | float = 2,
                 bias: bool = True, dropout_rate: float = 0.05, model_name: str | None = None,
                 device: str | None = None, dtype: torch.dtype | None = None):
        super().__init__(in_features=feature_dim, out_features=feature_dim, model_name=model_name, device=device, dtype=dtype)
        self._dropout_rate = dropout_rate
        self.up_proj = nn.Linear(in_features=feature_dim, out_features=int(feature_dim * expansion_factor),
                                 bias=bias, device=self.device, dtype=self.dtype)
        self.down_proj = nn.Linear(in_features=int(feature_dim * expansion_factor), out_features=feature_dim,
                                   bias=bias, device=self.device, dtype=self.dtype)
        self.parametric_relu = nn.PReLU(num_parameters=1, init=5e-3,
                                        device=self.device, dtype=self.dtype)
        self.layer_norm = nn.LayerNorm(normalized_shape=self.up_proj.in_features, eps=1e-9,
                                       device=self.device, dtype=self.dtype)

    @override
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.ensure_device_and_dtype(x, device=self.device, dtype=self.dtype)
        residual = x
        x = self.layer_norm(x)
        x = self.up_proj(x)
        x = self.parametric_relu(x)
        if self._dropout_rate > .0:
            x = torch.dropout(x, p=self._dropout_rate, train=self.training)
        return self.down_proj(x) + residual


class FeedForward(BaseNeuralNetwork):
    def __init__(self, feature_dim: int, num_layers: int = 1, expansion_factor: int | float = 2,
                 bias: bool = True, dropout_rate: float = 0.05, model_name: str | None = None,
                 device: str | None = None, dtype: torch.dtype | None = None):
        if num_layers < 1:
            raise ValueError('num_layers cannot be less than 1.')
        super().__init__(in_features=feature_dim, out_features=feature_dim, model_name=model_name, device=device, dtype=dtype)
        self.ffn_layers = nn.ModuleList([FeedForwardUnit(feature_dim=feature_dim,
                                                         expansion_factor=expansion_factor, bias=bias,
                                                         dropout_rate=dropout_rate,
                                                         device=self.device, dtype=self.dtype) for _ in range(num_layers)])

    @override
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.ensure_device_and_dtype(x, device=self.device, dtype=self.dtype)
        for ffn in self.ffn_layers:
            x = ffn(x)
        return x
```
Attention:
```python
from typing_extensions import override

import torch

from deeplotx.nn.base_neural_network import BaseNeuralNetwork
from deeplotx.nn.feed_forward import FeedForward
from deeplotx.nn.rope import RoPE, DEFAULT_THETA


class Attention(BaseNeuralNetwork):
    def __init__(self, feature_dim: int, bias: bool = True, positional: bool = True,
                 proj_layers: int = 1, proj_expansion_factor: int | float = 1.5, dropout_rate: float = 0.02,
                 model_name: str | None = None, device: str | None = None, dtype: torch.dtype | None = None,
                 **kwargs):
        super().__init__(in_features=feature_dim, out_features=feature_dim, model_name=model_name,
                         device=device, dtype=dtype)
        self._positional = positional
        self._feature_dim = feature_dim
        self.q_proj = FeedForward(feature_dim=self._feature_dim, num_layers=proj_layers,
                                  expansion_factor=proj_expansion_factor,
                                  bias=bias, dropout_rate=dropout_rate, device=self.device, dtype=self.dtype)
        self.k_proj = FeedForward(feature_dim=self._feature_dim, num_layers=proj_layers,
                                  expansion_factor=proj_expansion_factor,
                                  bias=bias, dropout_rate=dropout_rate, device=self.device, dtype=self.dtype)
        self.v_proj = FeedForward(feature_dim=self._feature_dim, num_layers=proj_layers,
                                  expansion_factor=proj_expansion_factor,
                                  bias=bias, dropout_rate=dropout_rate, device=self.device, dtype=self.dtype)
        if self._positional:
            self.rope = RoPE(feature_dim=self._feature_dim, theta=kwargs.get('theta', DEFAULT_THETA),
                             device=self.device, dtype=self.dtype)

    def _attention(self, x: torch.Tensor, y: torch.Tensor, mask: torch.Tensor | None = None) -> torch.Tensor:
        q, k = self.q_proj(x), self.k_proj(y)
        if self._positional:
            q, k = self.rope(q), self.rope(k)
        attn = torch.matmul(q, k.transpose(-2, -1))
        attn = attn / (self._feature_dim ** 0.5)
        attn = attn.masked_fill(mask == 0, -1e9) if mask is not None else attn
        return torch.softmax(attn, dtype=self.dtype, dim=-1)

    @override
    def forward(self, x: torch.Tensor, y: torch.Tensor | None = None, mask: torch.Tensor | None = None) -> torch.Tensor:
        x = self.ensure_device_and_dtype(x, device=self.device, dtype=self.dtype)
        y = x if y is None else self.ensure_device_and_dtype(y, device=self.device, dtype=self.dtype)
        if mask is not None:
            mask = self.ensure_device_and_dtype(mask, device=self.device, dtype=self.dtype)
        v = self.v_proj(y)
        return torch.matmul(self._attention(x, y, mask), v)
```
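Stripped of the learned projections and RoPE, the `_attention` and `forward` methods above compute scaled dot-product attention: scores are `q · k / sqrt(feature_dim)`, masked positions are filled with `-1e9`, and the softmax weights are applied to the values. The dependency-free sketch below walks through that arithmetic on plain lists; `softmax` and `scaled_dot_product_attention` are hypothetical helpers written for illustration, not part of DeepLoTX.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(q @ k^T / sqrt(d)) @ v, with mask entries equal to 0 blocked out."""
    d = len(q[0])
    outputs = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) for k_j in k]
        if mask is not None:
            # Same convention as the class above: positions where mask == 0 get -1e9
            scores = [s if mask[i][j] != 0 else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                        for c in range(len(v[0]))])
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # three positions, feature_dim = 2
causal_mask = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]    # each position sees itself and earlier ones
out = scaled_dot_product_attention(x, x, x, mask=causal_mask)
for row in out:
    print(row)
```

With the causal mask, the first position can attend only to itself, so its output is just its own value vector.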