Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nlpir-team/nlpir-analysis-cn-ictclas
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, so you can change the Lucene/Solr version to build a plugin compatible with that version.
chinese-word-segmentation ictclas lucene lucene-analyzer nlpir solr
Last synced: 2 days ago
- Host: GitHub
- URL: https://github.com/nlpir-team/nlpir-analysis-cn-ictclas
- Owner: NLPIR-team
- License: apache-2.0
- Created: 2017-08-13T14:07:48.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-03-17T04:06:18.000Z (over 4 years ago)
- Last Synced: 2023-03-01T17:01:30.083Z (over 1 year ago)
- Topics: chinese-word-segmentation, ictclas, lucene, lucene-analyzer, nlpir, solr
- Language: Java
- Homepage:
- Size: 29.9 MB
- Stars: 71
- Watchers: 11
- Forks: 27
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# Now NLPIR/ICTCLAS for Lucene/Solr plugin V2.2
# Lucene-analyzers-nlpir-ictclas-6.6.0
NLPIR/ICTCLAS analyzer plugin for Lucene/Solr 6.6.0. Supports macOS, Linux x86/64, and Windows x86/64.
The project's resources folder is a source folder. It contains the dynamic libraries for all platforms and places them on the classpath, so that JNA can load the right library on each platform.
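As a rough illustration of how a per-platform library can be located on the classpath, the sketch below maps the JVM's `os.name` and `os.arch` values to a JNA-style resource directory name. The directory names follow the convention used by older JNA releases (newer releases use names such as `darwin-x86-64`), and whether this project lays out its resources folder this way is an assumption. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of the plugin: shows one way a
// platform-specific resource directory could be chosen.
class NativeResourcePrefix {

    /** Maps os.name / os.arch values to a JNA-style resource directory name. */
    static String prefixFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        boolean is64;
        if (arch.equals("x86_64") || arch.equals("amd64")) {
            is64 = true;
        } else if (arch.equals("x86") || arch.equals("i386") || arch.equals("i686")) {
            is64 = false;
        } else {
            // The README only lists x86/64 platforms, so anything else is rejected.
            throw new IllegalArgumentException("Unsupported architecture: " + osArch);
        }

        if (os.startsWith("windows")) {
            return is64 ? "win32-x86-64" : "win32-x86";
        }
        if (os.startsWith("linux")) {
            return is64 ? "linux-x86-64" : "linux-x86";
        }
        if (os.startsWith("mac")) {
            return "darwin"; // assumption: older JNA convention for macOS
        }
        throw new IllegalArgumentException("Unsupported OS: " + osName);
    }

    public static void main(String[] args) {
        // Prints the directory name for the JVM this is run on.
        System.out.println(prefixFor(System.getProperty("os.name"),
                                     System.getProperty("os.arch")));
    }
}
```

If library loading fails at runtime, comparing the printed name against the folders actually packaged in the jar is a quick first check.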
# Building Lucene-analyzers-nlpir-ictclas
Lucene-analyzers-nlpir-ictclas is built with Maven. To build it, run:
```bash
mvn clean package -DskipTests
```
If you use an IDE such as Eclipse, you can run the same Maven build from there.
# How to use in your projects
You can use NLPIRTokenizerAnalyzer for Chinese word segmentation:
* NLPIRTokenizerAnalyzer DEMO
```java
String text="我是中国人";
NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
TokenStream ts = nta.tokenStream("word", text);
ts.reset();
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);
while(ts.incrementToken()){
System.out.println(term.toString());
}
ts.end();
ts.close();
nta.close();
```
You can also use it in Lucene:
* Lucene DEMO

This sample shows how to index text and search it using NLPIRTokenizerAnalyzer.
```java
//For indexing
NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
IndexWriterConfig inconf=new IndexWriterConfig(nta);
inconf.setOpenMode(OpenMode.CREATE_OR_APPEND);
IndexWriter index=new IndexWriter(FSDirectory.open(Paths.get("index/")),inconf);
Document doc = new Document();
doc.add(new TextField("contents", "特朗普表示,很高兴汉堡会晤后再次同习近平主席通话。我同习主席就重大问题保持沟通和协调、两国加强各层级和各领域交往十分重要。当前,美中关系发展态势良好,我相信可以发展得更好。我期待着对中国进行国事访问。",Field.Store.YES));
index.addDocument(doc);
index.flush();
index.close();
//for searching
String field = "contents";
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index/")));
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(field, nta);
Query query = parser.parse("特朗普习近平");
TopDocs top=searcher.search(query, 100);
ScoreDoc[] hits = top.scoreDocs;
for (int i = 0; i < hits.length; i++) {
    // Loop body is truncated in the source; each hits[i] is one matching document.
}
```
Warning: Make sure the plugin jar can find the nlpir.properties file. You can put the file in solr_home/server/, and its data setting must point to the NLPIR/ICTCLAS Data path.
* Solr Managed-schema

4. Dependency jar for the native library (DLL): jna.jar. Add it to your Solr lib directory.
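
Because a missing or misconfigured nlpir.properties file is an easy mistake to make, a small check before deploying can help. The following is a minimal sketch using only `java.util.Properties`. It assumes the Data path is stored under a key named `data` and may be wrapped in double quotes. Both the key name and the quoting are assumptions, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical pre-deployment check, not part of the plugin.
class NlpirConfigCheck {

    /** Parses properties text in the standard java.util.Properties format. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader);
        }
        return props;
    }

    /** Returns the configured Data path, without surrounding quotes; empty if unset. */
    static String dataPath(Properties props) {
        // Assumption: the key is called "data".
        String raw = props.getProperty("data", "").trim();
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            raw = raw.substring(1, raw.length() - 1);
        }
        return raw;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "nlpir.properties");
        if (!Files.isRegularFile(file)) {
            System.out.println("Not found: " + file.toAbsolutePath());
            return;
        }
        String data = dataPath(parse(new String(Files.readAllBytes(file), "UTF-8")));
        if (data.isEmpty()) {
            System.out.println("No data path configured in " + file);
        } else {
            System.out.println("data = " + data
                    + (Files.isDirectory(Paths.get(data)) ? " (directory exists)" : " (directory missing)"));
        }
    }
}
```

Run it with the path to the properties file as the only argument, for example the copy placed in solr_home/server/.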
# Tokenizer
* v2.*
```
//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"
//Finer Segment
class="org.nlpir.lucene.cn.ictclas.finersegmet.FinerTokenizer"
```
* v1.*
```
//Standard Tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"
```
# Solr Show
![Alt text](https://github.com/NLPIR-team/nlpir-analysis-cn-ictclas/blob/master/solr.png)