https://github.com/nlpir-team/nlpir-analysis-cn-ictclas

Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, so you can change the Lucene/Solr version in the build to match the version you need.

Host: GitHub
URL: https://github.com/nlpir-team/nlpir-analysis-cn-ictclas
Owner: NLPIR-team
License: apache-2.0
Created: 2017-08-13T14:07:48.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2020-03-17T04:06:18.000Z (about 5 years ago)
Last Synced: 2025-03-25T09:12:10.435Z (about 1 month ago)
Topics: chinese-word-segmentation, ictclas, lucene, lucene-analyzer, nlpir, solr
Language: Java
Homepage:
Size: 29.9 MB
Stars: 73
Watchers: 10
Forks: 28
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE


# Now NLPIR/ICTCLAS for Lucene/Solr plugin V2.2

# Lucene-analyzers-nlpir-ictclas-6.6.0

NLPIR/ICTCLAS analyzer plugin for Lucene/Solr 6.6.0. Supports macOS, Linux x86/64, and Windows x86/64.

The project's resources folder is a source folder. It contains the dynamic libraries for all platforms and deploys them to the classpath automatically, so that JNA can load them.
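To illustrate how a per-platform library folder on the classpath might be chosen, here is a minimal sketch. The class `NativeLibraryFolder`, the method `resourcePrefix`, and the folder names are assumptions modeled on JNA's resource-prefix convention; the actual folder layout in this project may differ.

```java
import java.util.Locale;

/** Hypothetical helper: maps os.name/os.arch to a JNA-style resource folder name. */
final class NativeLibraryFolder {

    static String resourcePrefix(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        // Normalize common 64-bit and 32-bit architecture aliases.
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            arch = "x86-64";
        } else if (arch.equals("i386") || arch.equals("i686")) {
            arch = "x86";
        }
        if (os.startsWith("windows")) {
            return "win32-" + arch;
        } else if (os.startsWith("linux")) {
            return "linux-" + arch;
        } else if (os.startsWith("mac") || os.startsWith("darwin")) {
            // Assumed single folder for macOS, regardless of architecture.
            return "darwin";
        }
        throw new IllegalArgumentException("Unsupported platform: " + osName + "/" + osArch);
    }

    public static void main(String[] args) {
        // Prints the folder name for the JVM this runs on.
        System.out.println(resourcePrefix(System.getProperty("os.name"),
                                          System.getProperty("os.arch")));
    }
}
```

With these rules, `resourcePrefix("Linux", "amd64")` returns `linux-x86-64`. JNA performs this kind of lookup itself, so the sketch is only useful for checking which folder should contain the library on a given machine.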

# Building Lucene-analyzers-nlpir-ictclas

Lucene-analyzers-nlpir-ictclas is built with Maven. To build it, run:

```bash
mvn clean package -DskipTests
```

If you use an IDE such as Eclipse, you can run the same Maven build from there.

# How to use in your projects

You can use NLPIRTokenizerAnalyzer to perform Chinese word segmentation:

* NLPIRTokenizerAnalyzer DEMO

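```java
// Lucene imports used by this snippet; NLPIRTokenizerAnalyzer comes from this plugin.
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

String text = "我是中国人"; // "I am Chinese"

NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
TokenStream ts = nta.tokenStream("word", text);

// The stream must be reset before the first call to incrementToken().
ts.reset();
CharTermAttribute term = ts.getAttribute(CharTermAttribute.class);

// Print one segmented word per line.
while (ts.incrementToken()) {
    System.out.println(term.toString());
}

// Finish the stream, then release the stream and the analyzer.
ts.end();
ts.close();
nta.close();
```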

You can also use it in Lucene:

* Lucene DEMO

This sample shows how to index text and search it using NLPIRTokenizerAnalyzer.

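```java
// Imports used by this snippet; NLPIRTokenizerAnalyzer comes from this plugin.
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

// For indexing
NLPIRTokenizerAnalyzer nta = new NLPIRTokenizerAnalyzer("", 1, "", "", false);
IndexWriterConfig inconf = new IndexWriterConfig(nta);
inconf.setOpenMode(OpenMode.CREATE_OR_APPEND);
IndexWriter index = new IndexWriter(FSDirectory.open(Paths.get("index/")), inconf);

// Store a sample Chinese news paragraph in the analyzed "contents" field.
Document doc = new Document();
doc.add(new TextField("contents", "特朗普表示，很高兴汉堡会晤后再次同习近平主席通话。我同习主席就重大问题保持沟通和协调、两国加强各层级和各领域交往十分重要。当前，美中关系发展态势良好，我相信可以发展得更好。我期待着对中国进行国事访问。", Field.Store.YES));
index.addDocument(doc);
index.flush();
index.close();

// For searching
String field = "contents";
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index/")));
IndexSearcher searcher = new IndexSearcher(reader);

// Parse the query with the same analyzer that was used for indexing.
QueryParser parser = new QueryParser(field, nta);
Query query = parser.parse("特朗普习近平");
TopDocs top = searcher.search(query, 100);
ScoreDoc[] hits = top.scoreDocs;

// The source is cut off after "for(int i=0;i"; this loop is an assumed
// completion that prints the score and stored text of each hit.
for (int i = 0; i < hits.length; i++) {
    Document hit = searcher.doc(hits[i].doc);
    System.out.println(hits[i].score + "\t" + hit.get(field));
}
```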

Warning: Make sure the plugin jar can find the nlpir.properties file. You can put the file in solr_home/server/, and its data setting must point to the path of the NLPIR/ICTCLAS Data directory.
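A quick way to catch a missing or empty data path before starting Solr is to load the properties file and check the entry. The sketch below is only an illustration: the class `NlpirConfigCheck`, the key name `data`, and the sample path `/opt/nlpir/Data` are assumptions, so check the key name against your own nlpir.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical check for the data path configured in nlpir.properties. */
final class NlpirConfigCheck {

    /** Returns the configured data path, or throws if the assumed "data" key is missing or blank. */
    static String dataPath(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String data = props.getProperty("data");
        if (data == null || data.trim().isEmpty()) {
            throw new IllegalStateException("nlpir.properties has no 'data' entry");
        }
        return data.trim();
    }

    public static void main(String[] args) {
        // Inline sample content; a real check would read solr_home/server/nlpir.properties.
        String sample = "data=/opt/nlpir/Data\n";
        System.out.println(dataPath(new StringReader(sample)));
    }
}
```

Note that `java.util.Properties` treats a backslash as an escape character, so a Windows path in the file needs forward slashes or doubled backslashes.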

* Solr Managed-schema

```
<!-- The managed-schema XML for this step is missing from this copy. -->
```

4. Dependency jar for loading the native libraries: jna.jar. Add it to your Solr lib directory.

# Tokenizer

* v2.*

```
// Standard tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"

// Finer segmentation
class="org.nlpir.lucene.cn.ictclas.finersegmet.FinerTokenizer"
```

* v1.*

```
// Standard tokenizer
class="org.nlpir.lucene.cn.ictclas.NLPIRTokenizer"
```
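The version rules above can be summarized in a small lookup. This is a sketch only: the class `TokenizerClassNames` and the method `forPlugin` are hypothetical helpers, and only the two class name strings come from the lists above.

```java
/** Hypothetical lookup of the tokenizer class name for a given plugin version. */
final class TokenizerClassNames {

    static final String STANDARD = "org.nlpir.lucene.cn.ictclas.NLPIRTokenizer";
    static final String FINER = "org.nlpir.lucene.cn.ictclas.finersegmet.FinerTokenizer";

    static String forPlugin(int majorVersion, boolean finerSegmentation) {
        // Only v1.* and v2.* are documented above.
        if (majorVersion != 1 && majorVersion != 2) {
            throw new IllegalArgumentException("Unknown plugin version: " + majorVersion);
        }
        if (!finerSegmentation) {
            return STANDARD;
        }
        // The finer tokenizer is listed for v2.* only.
        if (majorVersion == 2) {
            return FINER;
        }
        throw new IllegalArgumentException("Finer segmentation is only listed for v2.*");
    }

    public static void main(String[] args) {
        System.out.println(forPlugin(2, true));
    }
}
```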

# Solr Show

![Alt text](https://github.com/NLPIR-team/nlpir-analysis-cn-ictclas/blob/master/solr.png)
