https://github.com/bnosac/crfsuite
Labelling Sequential Data in Natural Language Processing with R - using CRFsuite
- Host: GitHub
- URL: https://github.com/bnosac/crfsuite
- Owner: bnosac
- License: other
- Created: 2018-08-17T15:43:42.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-09-18T07:30:37.000Z (about 1 year ago)
- Last Synced: 2024-10-07T19:37:09.244Z (about 1 month ago)
- Topics: chunking, conditional-random-fields, crf, crfsuite, data-science, intent-classification, natural-language-processing, ner, nlp, r, r-package
- Language: C
- Homepage:
- Size: 890 KB
- Stars: 62
- Watchers: 8
- Forks: 11
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# Labelling Sequential Data in Natural Language Processing
This repository contains an R package which wraps the CRFsuite C/C++ library (https://github.com/chokkan/crfsuite), allowing the following:
- Fit a **Conditional Random Field** model (1st-order linear-chain Markov)
- Use the fitted model to get predictions on new data
- The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for **named entity recognition, text chunking, part of speech tagging, intent recognition or classification** of any category you have in mind.

For users unfamiliar with Conditional Random Field (CRF) models, you can read this excellent tutorial https://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf
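
For orientation, a first-order linear-chain CRF is commonly written as below. This follows the standard textbook formulation (as in the tutorial linked above), not this package's documentation. The symbols are assumptions of the sketch: $\mathbf{x}$ is the token sequence of length $T$, $\mathbf{y}$ the label sequence, $f_k$ the feature functions and $\theta_k$ their learned weights.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \prod_{t=1}^{T} \exp\left\{ \sum_{k=1}^{K} \theta_k \, f_k(y_t, y_{t-1}, \mathbf{x}_t) \right\},
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \prod_{t=1}^{T}
    \exp\left\{ \sum_{k=1}^{K} \theta_k \, f_k(y'_t, y'_{t-1}, \mathbf{x}_t) \right\}
```

Each label depends only on the previous label and the observed attributes at that position, which is what "1st-order" refers to in the list above.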
## Installation
- The package is on CRAN, so just install it with the command `install.packages("crfsuite")`
- To install the development version of this package: `devtools::install_github("bnosac/crfsuite", build_vignettes = TRUE)`

## Model building and tagging
For detailed documentation on how to build your own CRF tagger for NER or chunking, see the vignette.
```r
library(crfsuite)
vignette("crfsuite-nlp", package = "crfsuite")
```

#### Short example
```r
library(crfsuite)

## Get example training data + enrich with token and part of speech 2 words before/after each token
x <- ner_download_modeldata("conll2002-nl")
x <- crf_cbind_attributes(x,
                          terms = c("token", "pos"), by = c("doc_id", "sentence_id"),
                          from = -2, to = 2, ngram_max = 3, sep = "-")

## Split in train/test set
crf_train <- subset(x, data == "ned.train")
crf_test  <- subset(x, data == "testa")

## Build the crf model
attributes <- grep("token|pos", colnames(x), value = TRUE)
model <- crf(y = crf_train$label,
             x = crf_train[, attributes],
             group = crf_train$doc_id,
             method = "lbfgs", options = list(max_iterations = 25, feature.minfreq = 5, c1 = 0, c2 = 1))
model

## Use the model to score on existing tokenised data
scores <- predict(model, newdata = crf_test[, attributes], group = crf_test$doc_id)

## Count how often each label was predicted
table(scores$label)
##  B-LOC B-MISC  B-ORG  B-PER  I-LOC I-MISC  I-ORG  I-PER      O
##    261    211    182    693     24    205    209    605  35297
```

## Build custom CRFsuite models
The package itself does not contain any models to do NER or Chunking. It's a package which facilitates creating **your own CRF model** for doing Named Entity Recognition or Chunking **on your own data** with your **own categories**.
To help you create training data from your own text, this R package includes a Shiny app which lets you easily tag your own chunks of text, using your own categories.
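
As a rough illustration of token-level training data, the sketch below builds a toy data frame with the columns the short example relies on (`doc_id`, `sentence_id`, `token`, `pos`, `label`) and labels in the B-/I-/O style seen in its output. It uses base R only. The sentences, the part-of-speech tags, and the `bio_is_valid` helper are hypothetical and are not part of the package.

```r
## Hypothetical toy data: one row per token, one label per token
toy <- data.frame(
  doc_id      = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  sentence_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  token       = c("Jan", "Peeters", "woont", "in", "Brussel",
                  "BNOSAC", "werkt", "in", "Gent"),
  pos         = c("N", "N", "V", "Prep", "N",
                  "N", "V", "Prep", "N"),      # invented tags, for illustration only
  label       = c("B-PER", "I-PER", "O", "O", "B-LOC",
                  "B-ORG", "O", "O", "B-LOC"),
  stringsAsFactors = FALSE
)

## Hypothetical sanity check: every I-<type> label must directly follow
## a B-<type> or I-<type> label within the same group (document)
bio_is_valid <- function(label, group) {
  prev <- c("O", head(label, -1))
  prev[!duplicated(group)] <- "O"          # first token of a group has no predecessor
  inside    <- startsWith(label, "I-")
  type      <- sub("^[BI]-", "", label)
  prev_type <- sub("^[BI]-", "", prev)
  all(!inside | (prev != "O" & prev_type == type))
}

labels_ok <- bio_is_valid(toy$label, toy$doc_id)
```

A check like this can catch annotation slips, such as an `I-LOC` directly after an `O`, before any model is trained.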
For more details on how to launch the app, which data is needed for building a model, and how to build and use your model, read the vignette *in detail*: `vignette("crfsuite-nlp", package = "crfsuite")`.

![](vignettes/app-screenshot.png)
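
Once a custom model has been trained, it is worth checking its predictions against held-out labels. The sketch below is one simple way to do that in base R, computing token-level accuracy and per-label precision and recall. The `gold` and `predicted` vectors and the `label_metrics` helper are hypothetical. With the short example above, `gold` would presumably be `crf_test$label` and `predicted` would be `scores$label`.

```r
## Hypothetical gold and predicted labels for six held-out tokens
gold      <- c("B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG")
predicted <- c("B-PER", "O",     "O", "B-LOC", "O", "B-LOC")

## Token-level precision and recall per label (NA when undefined)
label_metrics <- function(gold, predicted) {
  stopifnot(length(gold) == length(predicted))
  labels <- sort(union(gold, predicted))
  out <- lapply(labels, function(l) {
    tp <- sum(gold == l & predicted == l)   # correctly predicted as l
    fp <- sum(gold != l & predicted == l)   # wrongly predicted as l
    fn <- sum(gold == l & predicted != l)   # l missed by the model
    data.frame(
      label     = l,
      precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
      recall    = if (tp + fn > 0) tp / (tp + fn) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)
}

metrics  <- label_metrics(gold, predicted)
accuracy <- mean(gold == predicted)
print(metrics)
```

These are per-token figures. Entity-level scoring, where a multi-token entity counts only if all its tokens are correct, is stricter and is not shown here.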
## Support in text mining
Need support in text mining?
Contact BNOSAC: http://www.bnosac.be