https://github.com/gatenlp/gateplugin-corpusstats
A plugin for the GATE language technology framework for creating and storing corpus statistics like tf, df.
- Host: GitHub
- URL: https://github.com/gatenlp/gateplugin-corpusstats
- Owner: GateNLP
- License: lgpl-3.0
- Created: 2016-10-07T13:12:54.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-02-10T13:32:37.000Z (almost 2 years ago)
- Last Synced: 2024-04-16T07:59:24.238Z (9 months ago)
- Language: Java
- Homepage: https://gatenlp.github.io/gateplugin-CorpusStats/
- Size: 4.14 MB
- Stars: 0
- Watchers: 12
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# gateplugin-CorpusStats
A plugin for the GATE language technology framework for calculating various term and term pair
statistics over a corpus.

The plugin implements the following processing resources (PRs):
* [CorpusStatsTfIdfPR](https://gatenlp.github.io/gateplugin-CorpusStats/doc-CorpusStatsTfIdfPR) for processing
a whole corpus and creating files that contain corpus statistics such as document frequency, term frequency,
and the total number of documents.
* [AssignStatsTfIdfPR](https://gatenlp.github.io/gateplugin-CorpusStats/doc-AssignStatsTfIdfPR) for processing
a corpus and using the corpus statistics file created with the CorpusStatsTfIdfPR to add features to terms
in each document of the corpus. This can be used to create features for scores like `tf` (term frequency),
`wtf` (weighted term frequency), `ltfidf` (logarithmic term frequency times inverse document frequency), and others.
* [CorpusStatsCollocationsPR](https://gatenlp.github.io/gateplugin-CorpusStats/doc-CorpusStatsCollocationsPR) for processing a
corpus and creating TSV files that contain corpus statistics like PMI, chi-squared and others
for all pairs of terms.

More documentation:
* [User Documentation](https://gatenlp.github.io/gateplugin-CorpusStats/)
* [Developer Documentation](https://github.com/GateNLP/gateplugin-CorpusStats/wiki)
* [JavaDoc](https://gatenlp.github.io/gateplugin-CorpusStats/apidocs/)
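
For readers unfamiliar with the statistics named above, the following is a minimal, standalone Java sketch of the textbook definitions of term frequency, document frequency, tf-idf and pointwise mutual information (PMI). It does not use the plugin or the GATE API. The class name `CorpusStatsSketch`, its method names, and the exact formulas (natural-log idf, document-level co-occurrence for PMI) are assumptions made for illustration; the formulas the plugin actually applies are described in the user documentation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the plugin.
class CorpusStatsSketch {

    /** Document frequency: number of documents containing each term at least once. */
    public static Map<String, Integer> documentFrequency(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            // A set ensures each term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Term frequency: raw number of occurrences of each term in one document. */
    public static Map<String, Integer> termFrequency(List<String> doc) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    /** Textbook tf-idf: tf * ln(nDocs / df). Assumed formula, not taken from the plugin. */
    public static double tfIdf(int tf, int df, int nDocs) {
        return tf * Math.log((double) nDocs / df);
    }

    /**
     * Textbook PMI in bits: log2(p(a,b) / (p(a) * p(b))).
     * Probabilities are estimated from document-level counts, which is
     * one possible choice and an assumption of this sketch.
     */
    public static double pmi(int dfA, int dfB, int dfBoth, int nDocs) {
        double pA = (double) dfA / nDocs;
        double pB = (double) dfB / nDocs;
        double pBoth = (double) dfBoth / nDocs;
        return Math.log(pBoth / (pA * pB)) / Math.log(2);
    }

    public static void main(String[] args) {
        // Toy corpus: each document is a list of already-tokenized terms.
        List<List<String>> corpus = List.of(
                List.of("gate", "plugin", "gate"),
                List.of("corpus", "statistics"),
                List.of("gate", "corpus"));

        Map<String, Integer> df = documentFrequency(corpus);
        Map<String, Integer> tf = termFrequency(corpus.get(0));

        System.out.println("df(gate) = " + df.get("gate"));
        System.out.println("tf(gate) in document 0 = " + tf.get("gate"));
        System.out.println("tf-idf(gate, document 0) = "
                + tfIdf(tf.get("gate"), df.get("gate"), corpus.size()));
        System.out.println("PMI(gate, corpus) = "
                + pmi(df.get("gate"), df.get("corpus"), 1, corpus.size()));
    }
}
```

In the toy corpus, `gate` occurs in two of the three documents and twice in the first document, and `gate` and `corpus` co-occur in one document, so their PMI under these assumptions is negative (they co-occur less often than independence would predict).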