https://github.com/paceaux/isidore
Isidore: A grammar-parsing utility for the internet
https://github.com/paceaux/isidore
Last synced: 5 months ago
JSON representation
Isidore: A grammar-parsing utility for the internet
- Host: GitHub
- URL: https://github.com/paceaux/isidore
- Owner: paceaux
- License: mit
- Created: 2018-10-09T16:26:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-07-02T01:30:07.000Z (almost 7 years ago)
- Last Synced: 2025-09-25T13:58:13.279Z (9 months ago)
- Language: JavaScript
- Homepage:
- Size: 188 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
Awesome Lists containing this project
README
# Isidore: A grammar utility for the internet
An experimental, potentially multilingual, highly configurable part-of-speech tagger.
This is still somewhat experimental.. Verb identification, in particular, is still limited. Please monitor the [issuer tracker](https://github.com/paceaux/isidore/issues) for progress on feature development.
[![dev dependency status][1]][2]
[![License][license-image]][license-url]
[![Downloads][downloads-image]][downloads-url]
## API
### `PartsOfSpeech`
There is a class per part of speech. All parts of speech inherit from the `word` class:
#### `Word` Class
| Member | Type | Description |
|-------| -----| -------------|
| `partOfSpeech` | string | one of the eight parts of speech|
| `word` | string | the word (token) |
| `type` | string | a high-level classification for this partOfSpeech |
| `types` | array | all possible types for this `partOfSpeech` |
#### Parts of Speech:
* `Noun`
* `Pronoun`
* `Verb`
* `Adjective`
* `Adverb`
* `Conjunction`
* `Interjection`
* `Preposition`
These may all have additional members.
### Dictionaries
There is a dictionary class per part-of-speech. All dictionaries inherit from the `Dictionary` class.
Any rules about how words can vary ([inflections](https://en.wikipedia.org/wiki/Inflection)) are stored in the `Dictionary` for that part of speech (e.g. `NounDictionary` has rules about how to recognize plurals and possessives);
This is so that Isidore could have potential to scale into other languages without requiring massive rewrites.
#### `Dictionary`
| Member | Type | Description |
|-------| -----| -------------|
| `list` | array | sorted list of words |
| `language` | string | Language for the dictionary |
| `findWord()` | method | finds a word in the dictionary |
| `partOfSpeech` | string | the part of speech that dictionary has |
| `GrammarModel` | class | The type that the dictionary contains |
#### `NounDictionary` class:
| Member | Type | Description |
|-------| -----| -------------|
| `list`| array | Nouns with types and wordCategories|
| `language` | string | two-letter abbreviation of language|
| `inflections` | object | inflections that can be applied to all nouns |
| `findWord(word)` | method | Searches for word in dictionary (returns `Noun` if successful) |
| `getInflections(word)` | method | word (`string`), returns all possible inflections for the word |
| `guessInflection(word)` | method | (`string`), returns a single inflection (`object`) |
| `removeInflection(word, inflection)` | method | word(`string`), inflection (`object`). returns `string` |
### Languages
All languages within the utility.
Right now we just have `En`, which is a `Language`;
#### `Language` class
| Member | Type | Description |
|-------| -----| -------------|
|`grammarDictionaries` | Object | `NounDictionary`, `PronounDictionary`, `VerbDictionary`, `AdjectiveDictionary`, `AdverbDictionary`, `ConjunctionDictionary`, `InterjectionDictionary`, `PrepositionDictionary`
| `language` | string | two-letter description of language|
| `findWord()` | method | accepts a string, returns an array containing `partOfSpeech` or a `word` if no word is found
### `Sentence`
The sentence is where the parsing magic starts.
#### `Sentence` Class
| Member | Type | Description |
|-------| -----| -------------|
| `text`| string | raw text of the sentence |
|`type` | string | declarative, interrogative, imperative, exclamatory|
| `language` | string | Language (`En`) is default|
| `rawWordList`| array | only the words in the sentence |
| `wordList`| array | each word in the sentence classified as either a or word |
| `types` | array | the possible types that a sentence could have |
| `getSentenceType()` | method | returns string, the type of sentence. Guesses what the sentence type is based on punctuation |
## Example
const { Sentence } = isidore
const mySentence = new Sentence('He gives her a car.');
const { wordList } = mySentence;
console.log(mySentence);
/*
Sentence {
text: 'He gives him a car.',
language: 'En',
rawWordList: [ 'he', 'gives', 'him', 'a', 'car' ],
wordList:
[
Pronoun {
partOfSpeech: 'pronoun',
word: 'he',
referent: 'animate',
gender: 'masculine',
type: 'subject',
person: 3,
quantity: 'singular'
},
Verb {
partOfSpeech: 'verb',
word: 'give',
type: 'transitive',
valence: 2
},
Pronoun {
partOfSpeech: 'pronoun',
word: 'him',
referent: 'animate',
gender: 'masculine',
type: 'object',
person: 3,
quantity: 'singular',
},
Adjective {
partOfSpeech: 'adjective',
word: 'a',
type: 'article',
degree: undefined
},
Noun {
partOfSpeech: 'noun',
word: 'car',
type: 'entityClass',
subType: 'common',
inflection: undefined
}
],
type: 'declarative'
}
*/
[1]: https://david-dm.org/paceaux/isidore/dev-status.svg
[2]: https://david-dm.org/paceaux/isidore#info=devDependencies
[license-image]: http://img.shields.io/npm/l/isidore.svg
[license-url]: LICENSE
[downloads-image]: http://img.shields.io/npm/dm/isidore.svg
[downloads-url]: http://npm-stat.com/charts.html?package=isidore