https://github.com/entitizer/concepts-data-js
Data for Concept Extraction
- Host: GitHub
- URL: https://github.com/entitizer/concepts-data-js
- Owner: entitizer
- Created: 2015-09-15T20:55:35.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2022-12-06T20:25:00.000Z (over 3 years ago)
- Last Synced: 2025-10-22T13:18:50.785Z (8 months ago)
- Topics: concept, concepts, entitizer, entity
- Language: TypeScript
- Size: 340 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 10
Metadata Files:
- Readme: README.md
# concepts-data [DEPRECATED]
Data used by [concepts-parser](https://github.com/entitizer/concepts-parser-js).
## Data types/names
- **connect_words** - words that (may) connect concepts: *for*, *of*, etc.;
- **invalid_concepts** (*accentless*) - known invalid words/concepts: *Brown*, *all*, etc.;
- **invalid_prefixes** (*accentless*) - words that cannot start a concept: in *In London*, **In** is an invalid prefix;
- **known_concepts** - irregular known concepts: *Dancing with the stars*;
- **partial_concepts** (*accentless*) - words/concepts that are invalid alone: *Barack*, *Vladimir*, etc.;
- **split_words** - words that (can) split concepts: *and*, *-*, etc.;
- **valid_prefixes** - valid concept prefixes;
- **valid_suffixes** - valid concept suffixes: Mumbai City **district**, *island*;
- **firstnames** (*accentless*) - popular first names.
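
Several of the lists above are marked *accentless*, which suggests that words should be compared with their diacritics removed. The sketch below shows one way to do that, assuming Unicode NFD decomposition is an acceptable way to strip accents. `toAccentless`, `isInvalidConcept`, and the sample list are hypothetical names used for illustration and are not part of this package.

```javascript
// Hypothetical helper, not part of concepts-data.
// Decomposes characters (NFD) and removes the combining marks,
// so that "ã" becomes "a".
function toAccentless(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Sample data for illustration only; the real lists ship with the package.
const sampleInvalidConcepts = ['brown', 'all'];

// Assumption: matching is case-insensitive as well as accent-insensitive.
function isInvalidConcept(word) {
  return sampleInvalidConcepts.includes(toAccentless(word).toLowerCase());
}

console.log(toAccentless('São Paulo')); // Sao Paulo
console.log(isInvalidConcept('Brown')); // true
```

Letters that have no canonical decomposition (for example *ł* or *ø*) are left unchanged by this approach.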
## Usage
```
const data = require('concepts-data');
// get split words for English:
const rules = data.getSplitWords('en');
```
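
To illustrate what split words are for, here is a minimal, self-contained sketch that combines a list of split words into a single regular expression and uses it to split a phrase. It does not call the package: the sample words are the ones named in the data types list, and `escapeRegExp`, `buildSplitRegExp`, and `splitConcepts` are hypothetical helpers, not the package's implementation.

```javascript
// Hypothetical helpers: a sketch, not the package's implementation.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Combine a list of split words into one RegExp that matches any of them
// when surrounded by whitespace.
function buildSplitRegExp(words) {
  const alternatives = words.map(escapeRegExp).join('|');
  return new RegExp(`\\s+(?:${alternatives})\\s+`, 'i');
}

// Sample data for illustration; real data would come from the package.
const sampleSplitWords = ['and', '-'];
const splitRegExp = buildSplitRegExp(sampleSplitWords);

// Split a phrase into candidate concepts at every split word.
function splitConcepts(phrase) {
  return phrase.split(splitRegExp).filter(Boolean);
}

console.log(splitConcepts('London and Paris')); // [ 'London', 'Paris' ]
```

Requiring whitespace around each split word keeps words such as *Sandy* from being split on the embedded *and*.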
## Changelog
#### v0.4.2 - May 3, 2018
- new firstnames by country
#### v0.4.1 - May 2, 2018
- added `firstnames`
- script `build-firstnames`
#### v0.4.0 - April 19, 2018
- removed data `rename_concepts`
- data values can be `string[]` or `RegExp[]`
- `ava` tests
- node v4
#### v0.3.2 - March 26, 2018
- added stopwords to `invalid_concepts`
#### v0.3.0 - March 9, 2017
- TypeScript code
#### v0.2.1 - August 20, 2016
- fix empty data file issue
#### v0.2.0 - August 9, 2016
- engine >= node4
- es6 syntax
#### v0.1.2 - December 15, 2015
- build a single RegExp from a list of data items, for better performance
- fix small errors
#### v0.1.0 - November 28, 2015
- renamed: **concept-data** to **concepts-data**;
- fix concept split bug.
#### v0.0.3 - October 4, 2015
- keep data files in txt format;
- added **rename_concepts** - set a correct/known name for a concept;
- get data by *lang* and **country** codes.