https://github.com/bnosac/crfsuite

Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

chunking conditional-random-fields crf crfsuite data-science intent-classification natural-language-processing ner nlp r r-package



Host: GitHub
URL: https://github.com/bnosac/crfsuite
Owner: bnosac
License: other
Created: 2018-08-17T15:43:42.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2023-09-18T07:30:37.000Z (over 1 year ago)
Last Synced: 2024-12-14T00:32:43.557Z (4 months ago)
Topics: chunking, conditional-random-fields, crf, crfsuite, data-science, intent-classification, natural-language-processing, ner, nlp, r, r-package
Language: C
Homepage:
Size: 890 KB
Stars: 62
Watchers: 8
Forks: 12
Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE


# Labelling Sequential Data in Natural Language Processing

This repository contains an R package which wraps the CRFsuite C/C++ library (https://github.com/chokkan/crfsuite), allowing the following:

- Fit a **Conditional Random Field** model (1st-order linear-chain Markov) 

- Use the fitted model to predict labels on new data, alongside the model's probabilities for those labels

- The focus of the implementation is Natural Language Processing: this R package lets you easily build and apply models for **named entity recognition, text chunking, part of speech tagging, intent recognition or classification** into any categories you have in mind.
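For readers who want the model in formula form: a first-order linear-chain CRF, written in the notation of Sutton and McCallum's CRF tutorial, defines the conditional probability of a label sequence $\mathbf{y} = (y_1, \dots, y_T)$ given the token attributes $\mathbf{x}$ as follows. The feature functions $f_k$ pair attributes of token $t$ (for example its word form or part-of-speech tag) with the current label $y_t$ or with the label transition $(y_{t-1}, y_t)$, and the weights $\theta_k$ are what training estimates. This is the textbook definition, not a transcription of CRFsuite's source code.

$$
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \exp\Big( \sum_{k=1}^{K} \theta_k \, f_k(y_t, y_{t-1}, \mathbf{x}_t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{t=1}^{T} \exp\Big( \sum_{k=1}^{K} \theta_k \, f_k(y'_t, y'_{t-1}, \mathbf{x}_t) \Big)
$$

Because each factor only links $y_{t-1}$ and $y_t$, both $Z(\mathbf{x})$ and the most likely label sequence can be computed efficiently by dynamic programming (forward-backward and Viterbi). With CRFsuite's `lbfgs` training method, the `c1` and `c2` options used in the short example are the L1 and L2 regularization coefficients applied to $\theta$.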

If you are unfamiliar with Conditional Random Field (CRF) models, you can read this excellent tutorial: https://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf

## Installation

- The package is on CRAN, so just install it with the command `install.packages("crfsuite")`

- For installing the development version of this package: `devtools::install_github("bnosac/crfsuite", build_vignettes = TRUE)`

## Model building and tagging

For detailed documentation on how to build your own CRF tagger for NER or chunking, see the vignette:

```r
library(crfsuite)
vignette("crfsuite-nlp", package = "crfsuite")
```

### Short example

```r
library(crfsuite)

## Get example training data + enrich each token with the token and
## part of speech of the 2 words before/after it
x <- ner_download_modeldata("conll2002-nl")
x <- crf_cbind_attributes(x,
                          terms = c("token", "pos"), by = c("doc_id", "sentence_id"),
                          from = -2, to = 2, ngram_max = 3, sep = "-")

## Split in train/test set
crf_train <- subset(x, data == "ned.train")
crf_test <- subset(x, data == "testa")

## Build the crf model
attributes <- grep("token|pos", colnames(x), value = TRUE)
model <- crf(y = crf_train$label,
             x = crf_train[, attributes],
             group = crf_train$doc_id,
             method = "lbfgs",
             options = list(max_iterations = 25, feature.minfreq = 5, c1 = 0, c2 = 1))
model

## Use the model to score on existing tokenised data
scores <- predict(model, newdata = crf_test[, attributes], group = crf_test$doc_id)
table(scores$label)
##  B-LOC B-MISC  B-ORG  B-PER  I-LOC I-MISC  I-ORG  I-PER      O
##    261    211    182    693     24    205    209    605  35297
```
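The comment in the short example says each token is enriched with the token and part-of-speech tag of the 2 words before and after it. To make that idea concrete, here is a minimal base-R sketch, assuming that `by = c("doc_id", "sentence_id")` confines the window to a single sentence. The helper `shift_within()` and the column names below are hypothetical illustrations, not the internals or output format of `crf_cbind_attributes()`, and the n-gram combinations controlled by `ngram_max` are reduced here to a single bigram column.

```r
# Hypothetical helper, NOT part of crfsuite: copy the value found `k` positions
# away within the same group (k < 0 looks back, k > 0 looks ahead),
# leaving NA where the window runs past the start or end of the group.
shift_within <- function(v, group, k) {
  out <- rep(NA_character_, length(v))
  for (idx in split(seq_along(v), group)) {
    n <- length(idx)
    src <- seq_len(n) + k            # source position inside this group
    ok <- src >= 1 & src <= n        # positions whose neighbour exists
    out[idx[ok]] <- as.character(v[idx[src[ok]]])
  }
  out
}

# Toy tokenised data: one row per token, two sentences
tokens <- data.frame(
  sentence_id = c(1, 1, 1, 1, 2, 2),
  token = c("Jan", "woont", "in", "Gent", "Hij", "lacht"),
  stringsAsFactors = FALSE
)

# Previous and next token, never crossing a sentence boundary
tokens$token_prev1 <- shift_within(tokens$token, tokens$sentence_id, -1)
tokens$token_next1 <- shift_within(tokens$token, tokens$sentence_id, 1)

# One n-gram style attribute: previous token and current token joined with "-"
tokens$token_prev1_token <- ifelse(is.na(tokens$token_prev1), NA,
                                   paste(tokens$token_prev1, tokens$token, sep = "-"))
print(tokens)
```

Running `colnames(x)` after the `crf_cbind_attributes()` call in the short example shows the attribute names the package actually generates.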

## Build custom CRFsuite models

The package itself does not contain any models for NER or chunking. It is a package which facilitates creating **your own CRF model** for Named Entity Recognition or chunking **on your own data** with your **own categories**.

To help you create training data from your own text, this R package includes a Shiny app which lets you easily tag your own chunks of text with your own categories.

For details on how to launch the app, which data is needed to build a model, and how to build and use your model, read the vignette: `vignette("crfsuite-nlp", package = "crfsuite")`.

![](vignettes/app-screenshot.png)
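The labels in the short example follow the BIO scheme: `B-` marks the first token of a chunk, `I-` a token inside that chunk, and `O` a token outside any chunk. When inspecting your own tagged training data or the labels returned by `predict()`, it can help to collapse these token-level labels back into chunks. The sketch below does that in base R. `bio_to_spans()` is a hypothetical helper, not a crfsuite function, and starting a new chunk whenever an `I-` label does not continue a chunk of the same type is an assumption (a common lenient convention).

```r
# Hypothetical helper, NOT part of crfsuite: collapse BIO-coded token labels
# into chunks. An "I-X" label that does not follow a B-X/I-X token is treated
# as the start of a new chunk of type X (lenient convention, an assumption).
bio_to_spans <- function(tokens, labels) {
  type <- sub("^[BI]-", "", labels)   # chunk type without the B-/I- prefix
  type[labels == "O"] <- NA           # tokens outside any chunk
  prefix <- substr(labels, 1, 1)
  starts <- integer(0); ends <- integer(0); types <- character(0)
  for (i in seq_along(labels)) {
    if (is.na(type[i])) next
    continues <- prefix[i] == "I" && i > 1 &&
      !is.na(type[i - 1]) && type[i - 1] == type[i]
    if (continues) {
      ends[length(ends)] <- i         # extend the current chunk
    } else {
      starts <- c(starts, i); ends <- c(ends, i); types <- c(types, type[i])
    }
  }
  data.frame(
    entity = types,
    text = vapply(seq_along(starts),
                  function(j) paste(tokens[starts[j]:ends[j]], collapse = " "),
                  character(1)),
    start = starts,
    end = ends,
    stringsAsFactors = FALSE
  )
}

# Toy example: one sentence with a person and a location
tokens <- c("Jan", "Peeters", "woont", "in", "Gent", "sinds", "2010")
labels <- c("B-PER", "I-PER", "O", "O", "B-LOC", "O", "O")
spans <- bio_to_spans(tokens, labels)
print(spans)
```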

## Support in text mining

Need support in text mining?

Contact BNOSAC: http://www.bnosac.be
