{"id":26942372,"url":"https://github.com/neuw84/rake-java","last_synced_at":"2025-04-02T16:48:54.609Z","repository":{"id":18104112,"uuid":"21172823","full_name":"Neuw84/RAKE-Java","owner":"Neuw84","description":"A Java implementation of the Rapid Automatic Keyword Extraction Framework ( RAKE )","archived":false,"fork":false,"pushed_at":"2018-02-08T09:23:04.000Z","size":47,"stargazers_count":28,"open_issues_count":0,"forks_count":14,"subscribers_count":8,"default_branch":"master","last_synced_at":"2023-08-09T07:31:17.741Z","etag":null,"topics":["freeling","illinois-pos-tagger","java","keyword-extraction","nlp","pos-tagger"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Neuw84.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-06-24T16:43:50.000Z","updated_at":"2022-11-02T04:07:03.000Z","dependencies_parsed_at":"2022-09-13T06:50:21.216Z","dependency_job_id":null,"html_url":"https://github.com/Neuw84/RAKE-Java","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neuw84%2FRAKE-Java","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neuw84%2FRAKE-Java/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neuw84%2FRAKE-Java/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Neuw84%2FRAKE-Java/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Neuw84","download_url":"https://codeload.github.com/Neuw84/RAKE-Java/tar.gz/refs/heads/maste
r","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246855280,"owners_count":20844907,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["freeling","illinois-pos-tagger","java","keyword-extraction","nlp","pos-tagger"],"created_at":"2025-04-02T16:48:53.964Z","updated_at":"2025-04-02T16:48:54.604Z","avatar_url":"https://github.com/Neuw84.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"RAKE-Java\n=====================\n\nA Java 8 implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., \u0026 Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry \u0026 J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley \u0026 Sons.\n\nThe implementation is based on the Python one from https://github.com/aneesha/RAKE, with some changes.\n\nThe source code is released under the GPL v3 license.\n\nAdd this repository to your pom.xml if you want to use it with Maven:\n\n````xml\n \u003crepository\u003e\n        \u003cid\u003egalan-maven-repo\u003c/id\u003e\n        \u003cname\u003egalan-maven-repo-releases\u003c/name\u003e\n        \u003curl\u003ehttp://galan.ehu.es/artifactory/ext-release-local\u003c/url\u003e\n \u003c/repository\u003e\n\n````\n\nThis implementation requires a POS tagger in order to work. 
For example, the Illinois POS tagger can be used for English:\n\nhttp://cogcomp.cs.illinois.edu/page/software_view/POS\n\nFor Spanish or other languages:\n\nFreeLing --\u003e http://nlp.lsi.upc.edu/freeling/\n\nor the Stanford POS tagger --\u003e http://nlp.stanford.edu/software/tagger.shtml\n\n\nThe implementation is in beta state.\n\nTODO:\n\n     - More testing\n\n\nThe following example parser for English provides the required data (using the Illinois POS Tagger):\n\n\n```java\n\n    import java.util.ArrayList;\n    import java.util.LinkedList;\n    import java.util.List;\n    import LBJ2.nlp.SentenceSplitter;\n    import LBJ2.nlp.WordSplitter;\n    import LBJ2.nlp.seg.PlainToTokenParser;\n    import LBJ2.parse.Parser;\n    import edu.illinois.cs.cogcomp.lbj.chunk.Chunker;\n    import edu.illinois.cs.cogcomp.lbj.pos.POSTagger;\n    import edu.ehu.galan.cvalue.model.Token;\n     ......\n\n     // One token list per sentence, plus the raw sentence strings\n     List\u003cLinkedList\u003cToken\u003e\u003e tokenizedSentenceList = new ArrayList\u003c\u003e();\n     List\u003cString\u003e sentenceList = new ArrayList\u003c\u003e();\n     POSTagger tagger = new POSTagger();\n     Chunker chunker = new Chunker();\n     boolean first = true;\n     // pFile is assumed to hold the path of the plain-text file to parse\n     Parser parser = new PlainToTokenParser(new WordSplitter(new SentenceSplitter(pFile)));\n     String sentence = \"\";\n     LinkedList\u003cToken\u003e tokenList = null;\n     for (LBJ2.nlp.seg.Token word = (LBJ2.nlp.seg.Token) parser.next(); word != null;\n            word = (LBJ2.nlp.seg.Token) parser.next()) {\n            String chunked = chunker.discreteValue(word);\n            tagger.discreteValue(word);\n            // Start a new token list at the first word of each sentence\n            if (first) {\n                tokenList = new LinkedList\u003c\u003e();\n                tokenizedSentenceList.add(tokenList);\n                first = false;\n            }\n            tokenList.add(new Token(word.form, word.partOfSpeech, null, chunked));\n            sentence = sentence + \" \" + (word.form);\n            // word.next == null marks the last word of the sentence\n            if (word.next == null) {\n                sentenceList.add(sentence);\n                first = true;\n                sentence = \"\";\n            }\n     }\n     parser.reset();\n     \n```\n\nThen RAKE can 
be run on the resulting document:\n\n\n```java\n\n    // full_path, name, sentences and tokenized_sentences come from the parsing step\n    Document doc = new Document(full_path, name);\n    doc.setSentenceList(sentences);\n    doc.setTokenList(tokenized_sentences);\n    RakeAlgorithm ex = new RakeAlgorithm();\n    ex.loadStopWordsList(\"resources/lite/stopWordLists/RakeStopLists/SmartStopListEn\");\n    ex.loadPunctStopWord(\"resources/lite/stopWordLists/RakeStopLists/RakePunctDefaultStopList\");\n    PlainTextDocumentReaderLBJEn parser = new PlainTextDocumentReaderLBJEn();\n    parser.readSource(\"testCorpus/textAstronomy\");\n    // doc already holds the sentence and token lists set above\n    ex.init(doc);\n    ex.runAlgorithm();\n    doc.getTermList();\n```\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuw84%2Frake-java","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fneuw84%2Frake-java","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fneuw84%2Frake-java/lists"}