
RAKE-Java
=====================

A Java 8 implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley & Sons.
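
For intuition, RAKE splits the text into candidate phrases at stop words and phrase delimiters, scores each content word as deg(w)/freq(w) over the whole document, and ranks a candidate phrase by the sum of its word scores. The following is a minimal, self-contained sketch of that scoring step only; it is not part of this library's API, and all class and variable names in it are illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of RAKE's scoring step (not this library's API).
// Each candidate phrase is a list of content words extracted between stop words;
// a word scores deg(w) / freq(w) and a phrase scores the sum of its word scores.
public class RakeScoringSketch {

    public static Map<String, Double> scorePhrases(List<List<String>> candidatePhrases) {
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : candidatePhrases) {
            int coOccurrences = phrase.size() - 1; // co-occurring words inside this phrase
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                degree.merge(word, coOccurrences, Integer::sum);
            }
        }
        Map<String, Double> phraseScores = new HashMap<>();
        for (List<String> phrase : candidatePhrases) {
            double score = 0.0;
            for (String word : phrase) {
                // deg(w) counts the word itself plus its co-occurrences: degree(w) + freq(w)
                score += (degree.get(word) + freq.get(word)) / (double) freq.get(word);
            }
            phraseScores.put(String.join(" ", phrase), score);
        }
        return phraseScores;
    }

    public static void main(String[] args) {
        List<List<String>> phrases = Arrays.asList(
                Arrays.asList("rapid", "automatic", "keyword", "extraction"),
                Arrays.asList("keyword", "extraction"),
                Arrays.asList("text", "mining"));
        scorePhrases(phrases).forEach((phrase, score) ->
                System.out.println(phrase + " -> " + score));
    }
}
```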

The implementation is based on the Python one from https://github.com/aneesha/RAKE (with some changes).
The source code is released under the GPL v3 license.

Add this repository to your pom.xml if you want to use it with Maven:
````xml
<repositories>
    <repository>
        <id>galan-maven-repo</id>
        <name>galan-maven-repo-releases</name>
        <url>http://galan.ehu.es/artifactory/ext-release-local</url>
    </repository>
</repositories>
````

This implementation requires a POS tagger in order to work. For example, the Illinois POS tagger can be used for English:

http://cogcomp.cs.illinois.edu/page/software_view/POS

For Spanish or other languages:

FreeLing --> http://nlp.lsi.upc.edu/freeling/

or the Stanford POS Tagger --> http://nlp.stanford.edu/software/tagger.shtml (a sketch of adapting its output is shown below)
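
By way of illustration only (this code is not part of the project), the sketch below shows how output from the Stanford POS tagger might be adapted into the sentence and token lists that RAKE-Java consumes. The model path and input text are placeholders, and leaving the chunk field null is an assumption; the Illinois-based example further down fills that field in.

```java
import edu.ehu.galan.cvalue.model.Token;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical adapter from the Stanford POS tagger to the lists RAKE-Java expects.
// Model path and input text are placeholders; the chunk field is left null here,
// whereas the Illinois-based example below provides it.
MaxentTagger tagger = new MaxentTagger("models/english-left3words-distsim.tagger");
String text = "Compatibility of systems of linear constraints over the set of natural numbers.";

List<String> sentenceList = new ArrayList<>();
List<LinkedList<Token>> tokenizedSentenceList = new ArrayList<>();

for (List<HasWord> sentence : MaxentTagger.tokenizeText(new StringReader(text))) {
    LinkedList<Token> tokenList = new LinkedList<>();
    StringBuilder rebuilt = new StringBuilder();
    for (TaggedWord tagged : tagger.tagSentence(sentence)) {
        tokenList.add(new Token(tagged.word(), tagged.tag(), null, null));
        rebuilt.append(" ").append(tagged.word());
    }
    tokenizedSentenceList.add(tokenList);
    sentenceList.add(rebuilt.toString());
}
```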

The implementation is currently in beta.

TODO:

- More testing

Below is an example parser for English that provides the required data (using the Illinois POS Tagger):

```java
import LBJ2.nlp.SentenceSplitter;
import LBJ2.nlp.WordSplitter;
import LBJ2.nlp.seg.PlainToTokenParser;
import LBJ2.parse.Parser;
import edu.illinois.cs.cogcomp.lbj.chunk.Chunker;
import edu.illinois.cs.cogcomp.lbj.pos.POSTagger;
import edu.ehu.galan.cvalue.model.Token;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
// ......

List<LinkedList<Token>> tokenizedSentenceList = new ArrayList<>();
List<String> sentenceList = new ArrayList<>();
POSTagger tagger = new POSTagger();
Chunker chunker = new Chunker();
boolean first = true;
// pFile is the path of the plain-text file to process
Parser parser = new PlainToTokenParser(new WordSplitter(new SentenceSplitter(pFile)));
String sentence = "";
LinkedList<Token> tokenList = null;
for (LBJ2.nlp.seg.Token word = (LBJ2.nlp.seg.Token) parser.next(); word != null;
        word = (LBJ2.nlp.seg.Token) parser.next()) {
    String chunked = chunker.discreteValue(word);
    tagger.discreteValue(word); // tags the word; the tag is stored in word.partOfSpeech
    if (first) {
        // start a new token list for each new sentence
        tokenList = new LinkedList<>();
        tokenizedSentenceList.add(tokenList);
        first = false;
    }
    tokenList.add(new Token(word.form, word.partOfSpeech, null, chunked));
    sentence = sentence + " " + word.form;
    if (word.next == null) {
        // end of the current sentence
        sentenceList.add(sentence);
        first = true;
        sentence = "";
    }
}
parser.reset();
```

Then RAKE can be run:

```java
// Build the document from the sentence and token lists produced by the parser above
Document doc = new Document("full_path", "name"); // document path and name (placeholders)
doc.setSentenceList(sentenceList);
doc.setTokenList(tokenizedSentenceList);

// Load the stop-word and punctuation stop lists, then run RAKE
RakeAlgorithm ex = new RakeAlgorithm();
ex.loadStopWordsList("resources/lite/stopWordLists/RakeStopLists/SmartStopListEn");
ex.loadPunctStopWord("resources/lite/stopWordLists/RakeStopLists/RakePunctDefaultStopList");
ex.init(doc);
ex.runAlgorithm();
doc.getTermList(); // extracted keywords

// Alternatively, the bundled English reader (backed by the Illinois POS tagger)
// can be used to read a plain-text source directly:
PlainTextDocumentReaderLBJEn parser = new PlainTextDocumentReaderLBJEn();
parser.readSource("testCorpus/textAstronomy");
```
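
After `runAlgorithm()` finishes, `doc.getTermList()` holds the keyword candidates extracted from the document.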