Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shixzie/nlp
[UNMANTEINED] Extract values from strings and fill your structs with nlp.
https://github.com/shixzie/nlp
go golang natural-language-processing nlp parse text text-extraction
Last synced: 3 months ago
JSON representation
[UNMANTEINED] Extract values from strings and fill your structs with nlp.
- Host: GitHub
- URL: https://github.com/shixzie/nlp
- Owner: shixzie
- License: mit
- Archived: true
- Created: 2017-01-25T07:19:03.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-09-18T14:32:30.000Z (over 7 years ago)
- Last Synced: 2024-08-01T21:27:29.128Z (6 months ago)
- Topics: go, golang, natural-language-processing, nlp, parse, text, text-extraction
- Language: Go
- Homepage:
- Size: 50.8 KB
- Stars: 389
- Watchers: 22
- Forks: 34
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go-extra - nlp - 01-25T07:19:03Z|2017-09-18T14:32:30Z| (Bot Building / Morphological Analyzers)
README
[![GoDoc](https://godoc.org/github.com/shixzie/nlp?status.svg)](https://godoc.org/github.com/shixzie/nlp)
[![Go Report Card](https://goreportcard.com/badge/github.com/shixzie/nlp)](https://goreportcard.com/report/github.com/shixzie/nlp)
[![Build Status](https://travis-ci.org/shixzie/nlp.svg?branch=master)](https://travis-ci.org/shixzie/nlp)
[![codecov](https://codecov.io/gh/shixzie/nlp/branch/master/graph/badge.svg)](https://codecov.io/gh/shixzie/nlp)# nlp
> `nlp` is a general purpose any-lang Natural Language Processor that parses the data inside a text and returns a filled model
## Supported types
```go
int int8 int16 int32 int64
uint uint8 uint16 uint32 uint64
float32 float64
string
time.Time
time.Duration
```## Installation
```
// go1.8+ is required
go get -u github.com/shixzie/nlp
```**Feel free to create PR's and open Issues :)**
## How it works
You will always begin by creating a NL type calling nlp.New(), the NL type is a
Natural Language Processor that owns 3 funcs, RegisterModel(), Learn() and P().### RegisterModel(i interface{}, samples []string, ops ...ModelOption) error
RegisterModel takes 3 parameters, an empty struct, a set of samples and some options for the model.
The empty struct lets nlp know all possible values inside the text, for example:
```go
type Song struct {
Name string // fields must be exported
Artist string
ReleasedAt time.Time
}
err := nl.RegisterModel(Song{}, someSamples, nlp.WithTimeFormat("2006"))
if err != nil {
panic(err)
}
// ...
```tells nlp that inside the text may be a Song.Name, a Song.Artist and a Song.ReleasedAt.
The samples are the key part about nlp, not just because they set the *limits*
between *keywords* but also because they will be used to choose which model
use to handle an expression.Samples must have a special syntax to set those *limits* and *keywords*.
```go
songSamples := []string{
"play {Name} by {Artist}",
"play {Name} from {Artist}",
"play {Name}",
"from {Artist} play {Name}",
"play something from {ReleasedAt}",
}
```In the example below, you can see we're reffering to the Name and Artist fields
of the `Song` type declared above, both `{Name}` and `{Artist}` are our *keywords*
and yes! you guessed it! Everything between `play` and `by` will be treated as a
`{Name}`, and everything that's after `by` will be treated as an `{Artist}` meaning
that `play` and `by` are our *limits*.
```
limits
┌─────┴─────┐
┌┴─┐ ┌┴┐
play {Name} by {Artist}
└─┬──┘ └───┬──┘
└──────┬─────┘
keywords
```Any character can be a *limit*, a `,` for example can be used as a limit.
*keywords* as well as *limits* are `CaseSensitive` so be sure to type them right.
**Note that putting 2 *keywords* together will cause that only 1 or none of them will be detected**
> *limits are important* - Me :3
### Learn() error
Learn maps all models samples to their respective models using the NaiveBayes
algorithm based on those samples. `Learn()` also trains all registered models
so they're able to fit expressions in the future.```go
// must call after all models are registrated and before calling nl.P()
err := nl.Learn()
if err != nil {
panic(err)
}
// ...
```Once the algorithm has finished learning, we're now ready to start Processing
those texts.**Note that you must call NL.Learn() after all models are registrated and before calling NL.P()**
### P(expr string) interface{}
P first asks the trained algorithm which model should be used, once we get
the right *and already trained* model, we just make it fit the expression.**Note that everything in the expression must be separated by a _space_ or _tab_**
When processing an expression, nlp searches for the *limits* inside that
expression and evaluates which sample fits better the expression, it doesn't
matter if the text has `trash`. In this example:
```
limits
┌─────┴─────┐
┌┴─┐ ┌┴┐
play {Name} by {Artist}
└─┬──┘ └───┬──┘
└──────┬─────┘
keywords
```we have 2 *limits*, `play` and `by`, it doesn't matter if we had an expression
*hello sir can you pleeeeeease play King by Lauren Aquilina*, since:
```
limits
trash ┌────┴────┐
┌─────────────┴─────────────┐ ┌┴─┐ ┌┴┐
hello sir can you pleeeeeease play King by Lauren Aquilina
└┬─┘ └─────┬───────┘
{Name} {Artist}
└─┬──┘ └───┬──┘
└──────┬───────┘
keywords
````{Name}` would be replaced with `King`,
`{Artist}` would be replaced with `Lauren Aquilina`,
`trash` would be ignored as well as the *limits* `play` and `by`,
and then **a pointer to a filled struct with the type used to register the model** (`Song`)
( `Song.Name` being `{Name}` and `Song.Artist` beign `{Artist}` )
**will be returned**.## Usage
```go
type Song struct {
Name string
Artist string
ReleasedAt time.Time
}songSamples := []string{
"play {Name} by {Artist}",
"play {Name} from {Artist}",
"play {Name}",
"from {Artist} play {Name}",
"play something from {ReleasedAt}",
}nl := nlp.New()
err := nl.RegisterModel(Song{}, songSamples, nlp.WithTimeFormat("2006"))
if err != nil {
panic(err)
}err = nl.Learn() // you must call Learn after all models are registered and before calling P
if err != nil {
panic(err)
}// after learning you can call P the times you want
s := nl.P("hello sir can you pleeeeeease play King by Lauren Aquilina")
if song, ok := s.(*Song); ok {
fmt.Println("Success")
fmt.Printf("%#v\n", song)
} else {
fmt.Println("Failed")
}// Prints
//
// Success
// &main.Song{Name: "King", Artist: "Lauren Aquilina"}
```