https://github.com/heikowagner/generalized-semantic-regression




Host: GitHub
URL: https://github.com/heikowagner/generalized-semantic-regression
Owner: heikowagner
License: apache-2.0
Created: 2023-05-11T21:12:57.000Z
Default Branch: main
Last Pushed: 2024-03-03T18:17:08.000Z
Last Synced: 2024-03-03T19:28:47.215Z
Topics: glm, insurance, llm, pytorch, risk
Language: Python
Homepage: https://www.thebigdatablog.com/generalized-semantic-regression-using-contextual-embeddings/
Size: 6.5 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# generalized-semantic-regression

RiskBERT makes it easy to incorporate text fragments, alongside the usual covariates, into GLM-based models such as insurance claim frequency and severity models. Feel free to explore and use RiskBERT for your text analysis needs.

To learn more about the RiskBERT implementation, read this article: https://www.thebigdatablog.com/generalized-semantic-regression-using-contextual-embeddings/
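The idea of feeding text into a GLM can be illustrated with a plain Poisson claim frequency model with a log link, where a text embedding simply enters the linear predictor as extra regressors: `log(λ) = log(exposure) + β0 + x·β + e·γ`. This is a minimal sketch of that idea, not RiskBERT's code: the embedding `e`, the coefficients, and the helper names are hypothetical, made-up values for illustration. In RiskBERT the text representation comes from a pretrained BERT-type model (the example below uses `distilbert-base-uncased`).

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def expected_claim_count(
    exposure: float,
    intercept: float,
    covariates: Sequence[float],
    beta: Sequence[float],
    embedding: Sequence[float],
    gamma: Sequence[float],
) -> float:
    """Hypothetical Poisson GLM with log link: the text embedding enters the
    linear predictor as additional regressors next to the classic covariates."""
    eta = intercept + dot(covariates, beta) + dot(embedding, gamma)
    # Log link: lambda = exposure * exp(eta), so log(exposure) acts as an offset
    return exposure * math.exp(eta)


# Made-up numbers for illustration only
lam = expected_claim_count(
    exposure=1.0,
    intercept=-2.0,
    covariates=[1.0, 5.0],
    beta=[0.1, 0.05],
    embedding=[0.3, -0.2, 0.8],  # stand-in for a (much longer) BERT embedding
    gamma=[0.5, 0.2, -0.1],
)
print(round(lam, 4))
```

A severity model could reuse the same linear predictor, for instance with a Gamma distribution and log link.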

Example:

Install the package with `pip install RiskBERT`, then run:

```python
from transformers import AutoTokenizer
import torch

from RiskBERT import glmModel, RiskBertModel
from RiskBERT import trainer, evaluate_model
from RiskBERT.simulation.data_functions import Data
from RiskBERT.utils import DataConstructor

# Use the GPU if one is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Simulate a training dataset
model_dataset = Data(20000, scores=torch.tensor([[0.2], [0.4]]), weigth=5)

# Initialize the model on top of a pretrained DistilBERT
pre_model = "distilbert-base-uncased"
model = RiskBertModel(model=pre_model, input_dim=2, dropout=0.4, freeze_bert=True, mode="CLS")
tokenizer = AutoTokenizer.from_pretrained(pre_model)

# Train the model
model, Total_Loss, Validation_Loss, Test_Loss = trainer(
    model=model,
    model_dataset=model_dataset,
    epochs=100,
    batch_size=1000,
    evaluate_fkt=evaluate_model,
    tokenizer=tokenizer,
    optimizer=torch.optim.SGD(model.parameters(), lr=0.001),
    device=device,
)

# Predict from the model: each observation has a list of text fragments
# and a covariate vector of length input_dim
my_data = DataConstructor(
    sentences=[["This is a test"], ["Hello world", "RiskBERT is the best"]],
    covariates=[[1, 5], [2, 6]],
    tokenizer=tokenizer,
).prepare_for_model()

model.eval()  # inference mode: disables dropout
with torch.no_grad():
    my_prediction = model(**my_data)
```
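In the prediction call above, `sentences` and `covariates` appear to be aligned by observation: the i-th entry of `sentences` is the list of text fragments for observation i (the second observation has two), and the i-th entry of `covariates` is that observation's covariate vector, whose length matches `input_dim=2`. The helper `check_risk_inputs` below is hypothetical and not part of RiskBERT; it is a minimal sketch, assuming this layout, of a pre-flight check you might run before calling `DataConstructor`.

```python
from typing import List, Sequence


def check_risk_inputs(
    sentences: Sequence[Sequence[str]],
    covariates: Sequence[Sequence[float]],
    input_dim: int,
) -> List[int]:
    """Hypothetical helper, not part of RiskBERT.

    Checks that text fragments and covariate rows line up observation by
    observation and that every covariate row has `input_dim` entries.
    Returns the number of text fragments per observation.
    """
    if len(sentences) != len(covariates):
        raise ValueError(
            f"{len(sentences)} sentence groups but {len(covariates)} covariate rows"
        )
    counts = []
    for i, (texts, row) in enumerate(zip(sentences, covariates)):
        if len(row) != input_dim:
            raise ValueError(
                f"observation {i} has {len(row)} covariates, expected {input_dim}"
            )
        counts.append(len(texts))
    return counts


# Same layout as the prediction example: two observations, the second with two fragments
fragments_per_obs = check_risk_inputs(
    sentences=[["This is a test"], ["Hello world", "RiskBERT is the best"]],
    covariates=[[1, 5], [2, 6]],
    input_dim=2,
)
print(fragments_per_obs)
```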

# Upload to PyPI

```shell
python -m pip install build twine
python -m build
twine check dist/*
twine upload dist/*
```
