{"id":13423445,"url":"https://github.com/bnosac/crfsuite","last_synced_at":"2026-03-15T09:36:25.909Z","repository":{"id":56934395,"uuid":"145138765","full_name":"bnosac/crfsuite","owner":"bnosac","description":"Labelling Sequential Data in Natural Language Processing with R - using CRFsuite","archived":false,"fork":false,"pushed_at":"2023-09-18T07:30:37.000Z","size":911,"stargazers_count":62,"open_issues_count":8,"forks_count":12,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-12-14T00:32:43.557Z","etag":null,"topics":["chunking","conditional-random-fields","crf","crfsuite","data-science","intent-classification","natural-language-processing","ner","nlp","r","r-package"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bnosac.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-08-17T15:43:42.000Z","updated_at":"2024-02-29T08:49:56.000Z","dependencies_parsed_at":"2024-01-31T07:57:43.406Z","dependency_job_id":null,"html_url":"https://github.com/bnosac/crfsuite","commit_stats":{"total_commits":136,"total_committers":2,"mean_commits":68.0,"dds":0.007352941176470562,"last_synced_commit":"2638e8e42722142c90f406b36f9b81c9ae43865f"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnosac%2Fcrfsuite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnosac%2Fcrfsuite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnosac%2Fcrfsuite/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bnosac%2Fcrfsuite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bnosac","download_url":"https://codeload.github.com/bnosac/crfsuite/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":231468021,"owners_count":18381174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chunking","conditional-random-fields","crf","crfsuite","data-science","intent-classification","natural-language-processing","ner","nlp","r","r-package"],"created_at":"2024-07-31T00:00:34.749Z","updated_at":"2026-03-15T09:36:20.857Z","avatar_url":"https://github.com/bnosac.png","language":"C","funding_links":[],"categories":["C"],"sub_categories":[],"readme":"# Labelling Sequential Data in Natural Language Processing\n\nThis repository contains an R package which wraps the CRFsuite C/C++ library (https://github.com/chokkan/crfsuite), allowing the following:\n\n- Fit a **Conditional Random Field** model (1st-order linear-chain Markov) \n- Use the model to get predictions on new data\n- The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for **named entity recognition, text chunking, part of speech tagging, intent recognition or classification** of any category you have in mind.\n\nIf you are unfamiliar with Conditional Random Field (CRF) models, you can read this excellent tutorial 
https://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf\n\n## Installation\n\n- The package is on CRAN, so just install it with the command `install.packages(\"crfsuite\")`\n- For installing the development version of this package: `devtools::install_github(\"bnosac/crfsuite\", build_vignettes = TRUE)`\n\n## Model building and tagging\n\nFor detailed documentation on how to build your own CRF tagger for NER / chunking, see the vignette.\n\n```r\nlibrary(crfsuite)\nvignette(\"crfsuite-nlp\", package = \"crfsuite\")\n```\n\n#### Short example\n\n```r\nlibrary(crfsuite)\n\n## Get example training data + enrich with token and part of speech 2 words before/after each token\nx \u003c- ner_download_modeldata(\"conll2002-nl\")\nx \u003c- crf_cbind_attributes(x, \n                          terms = c(\"token\", \"pos\"), by = c(\"doc_id\", \"sentence_id\"), \n                          from = -2, to = 2, ngram_max = 3, sep = \"-\")\n\n## Split in train/test set\ncrf_train \u003c- subset(x, data == \"ned.train\")\ncrf_test \u003c- subset(x, data == \"testa\")\n\n## Build the crf model\nattributes \u003c- grep(\"token|pos\", colnames(x), value=TRUE)\nmodel \u003c- crf(y = crf_train$label, \n             x = crf_train[, attributes], \n             group = crf_train$doc_id, \n             method = \"lbfgs\", options = list(max_iterations = 25, feature.minfreq = 5, c1 = 0, c2 = 1)) \nmodel\n\n## Use the model to score on existing tokenised data\nscores \u003c- predict(model, newdata = crf_test[, attributes], group = crf_test$doc_id)\n\ntable(scores$label)\n B-LOC B-MISC  B-ORG  B-PER  I-LOC I-MISC  I-ORG  I-PER      O \n   261    211    182    693     24    205    209    605  35297 \n```\n\n\n## Build custom CRFsuite models\n\nThe package itself does not contain any models to do NER or Chunking. 
It's a package which facilitates creating **your own CRF model** for doing Named Entity Recognition or Chunking **on your own data** with your **own categories**.\n\nTo help you create training data from your own text, this R package includes a Shiny app which allows you to easily tag your own chunks of text, using your own categories. \nFor more details about how to launch the app, which data is needed for building a model, and how to build and use your model, read the vignette *in detail*: `vignette(\"crfsuite-nlp\", package = \"crfsuite\")`.\n\n![](vignettes/app-screenshot.png)\n\n\n## Support in text mining\n\nNeed support in text mining?\nContact BNOSAC: http://www.bnosac.be\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbnosac%2Fcrfsuite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbnosac%2Fcrfsuite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbnosac%2Fcrfsuite/lists"}