{"id":48201405,"url":"https://github.com/heikowagner/generalized-semantic-regression","last_synced_at":"2026-04-04T18:26:03.876Z","repository":{"id":174211447,"uuid":"639621442","full_name":"heikowagner/generalized-semantic-regression","owner":"heikowagner","description":"RiskBERT is a significant step forward, making it easier than ever to incorporate text fragments into various applications, such as insurance frequency and severity models, or other GLM-based models. Feel free to explore and utilize RiskBERT for your text analysis needs.","archived":false,"fork":false,"pushed_at":"2024-03-03T18:17:08.000Z","size":6818,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-03-03T19:28:47.215Z","etag":null,"topics":["glm","insurance","llm","pytorch","risk"],"latest_commit_sha":null,"homepage":"https://www.thebigdatablog.com/generalized-semantic-regression-using-contextual-embeddings/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/heikowagner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-05-11T21:12:57.000Z","updated_at":"2024-03-03T19:28:48.819Z","dependencies_parsed_at":null,"dependency_job_id":"fcf747b6-a777-4010-82bc-337981618be0","html_url":"https://github.com/heikowagner/generalized-semantic-regression","commit_stats":{"total_commits":33,"total_committers":2,"mean_commits":16.5,"dds":0.06060606060606055,"last_synced_commit":"632a8e81f9feb38a7ee52435e37dd187a4a43fa2"},"previous_names":["heikowagner/generalized-semantic-regression"],"tags_count":8,"templat
e":false,"template_full_name":null,"purl":"pkg:github/heikowagner/generalized-semantic-regression","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heikowagner%2Fgeneralized-semantic-regression","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heikowagner%2Fgeneralized-semantic-regression/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heikowagner%2Fgeneralized-semantic-regression/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heikowagner%2Fgeneralized-semantic-regression/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/heikowagner","download_url":"https://codeload.github.com/heikowagner/generalized-semantic-regression/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/heikowagner%2Fgeneralized-semantic-regression/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31408367,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["glm","insurance","llm","pytorch","risk"],"created_at":"2026-04-04T18:26:03.256Z","updated_at":"2026-04-04T18:26:03.852Z","avatar_url":"https://github.com/heikowagner.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# generalized-semantic-regression\r\nRiskBERT is a significant step forward, making it easier than ever to incorporate text fragments into various applications, such as insurance frequency and severity models, or other GLM-based models. Feel free to explore and utilize RiskBERT for your text analysis needs.\r\n\r\nTo learn more about the RiskBERT implementation read this article: https://www.thebigdatablog.com/generalized-semantic-regression-using-contextual-embeddings/\r\n\r\nExample: \r\n`pip install RiskBERT`\r\n\r\n```\r\nfrom transformers import AutoTokenizer\r\nimport torch\r\nfrom RiskBERT import glmModel, RiskBertModel\r\nfrom RiskBERT import trainer, evaluate_model\r\nfrom RiskBERT.simulation.data_functions import Data\r\nfrom RiskBERT.utils import DataConstructor\r\n\r\n# Set device to gpu if available\r\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\r\n\r\n# Init the model\r\nmodel_dataset = Data(20000, scores=torch.tensor([[0.2],[0.4]]), weigth=5)\r\npre_model= \"distilbert-base-uncased\"\r\nmodel = RiskBertModel(model=pre_model, input_dim=2, dropout=0.4, freeze_bert=True, mode=\"CLS\")\r\ntokenizer = AutoTokenizer.from_pretrained(pre_model)\r\n# Train the model\r\nmodel, Total_Loss, 
Validation_Loss, Test_Loss = trainer(model=model,\r\n        model_dataset=model_dataset,\r\n        epochs=100,\r\n        batch_size=1000,\r\n        evaluate_fkt=evaluate_model,\r\n        tokenizer=tokenizer,\r\n        optimizer=torch.optim.SGD(model.parameters(), lr=0.001),\r\n        device=device\r\n        )\r\n\r\n# Predict from the model\r\nmy_data = DataConstructor(\r\n    sentences=[[\"Dies ist ein Test\"],[\"Hallo Welt\", \"RiskBERT ist das Beste\"]],\r\n    covariates=[[1,5],[2,6]],\r\n    tokenizer=tokenizer).prepare_for_model()\r\nmy_prediction = model(**my_data)\r\n\r\n```\r\n\r\n# Upload to PyPI\r\n```\r\npython -m pip install build twine\r\npython -m build\r\ntwine check dist/*\r\ntwine upload dist/*\r\n```\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheikowagner%2Fgeneralized-semantic-regression","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheikowagner%2Fgeneralized-semantic-regression","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheikowagner%2Fgeneralized-semantic-regression/lists"}