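word (word分词) is a Chinese word segmentation component implemented in Java. It provides several dictionary-based segmentation algorithms and uses an n-gram model to resolve ambiguity.
It can accurately recognize English words, numbers, and quantity expressions such as dates and times, and it can identify out-of-vocabulary words such as person names, place names, and organization names.
Plugins for Lucene, Solr, and ElasticSearch are also provided.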
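To illustrate what a dictionary-based segmentation algorithm does, here is a minimal sketch of forward maximum matching, one common technique in this family. It is not taken from the word library: the class `MaxMatchSketch`, the method `segment`, the tiny dictionary, and the `maxLen` parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: a hypothetical forward maximum matching segmenter,
// not the implementation used by the word library.
class MaxMatchSketch {

    // Scans left to right, always taking the longest dictionary word
    // (up to maxLen characters) that starts at the current position.
    // Falls back to a single character when nothing matches.
    static List<String> segment(String text, Set<String> dict, int maxLen) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            // Shrink the window until it is a dictionary word or one character long.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary for the demonstration.
        Set<String> dict = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(segment("研究生命起源", dict, 3));
    }
}
```

With this toy dictionary the greedy match yields 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源. Ambiguities of this kind are what a statistical model, such as the n-gram model mentioned above, is meant to resolve by scoring the alternative segmentations.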
License | GPL 3.0 |
---|---|
Organization | APDPlat |
HomePage | https://github.com/ysc/word |
Date | Aug 28, 2015 |
Files | pom (11 KB), jar (0 bytes) |
Repositories | Central, Sonatype, Spring Lib M, Spring Lib Release, Spring Plugins |
Used By | 8 artifacts |
Compile Dependencies (9)
Category | License | Group / Artifact | Version | Updates |
---|---|---|---|---|
| BSD | com.belerweb » pinyin4j | 2.5.0 | 2.5.1 |
Full-Text Indexing | Apache 2.0 | org.apache.lucene » lucene-core | 5.2.1 | 8.8.2 |
| Apache 2.0 | org.apache.lucene » lucene-queryparser | 5.2.1 | 8.8.2 |
| Apache 2.0 | org.apache.lucene » lucene-analyzers-common | 5.2.1 | 8.8.2 |
| Apache 2.0 | org.apache.lucene » lucene-backward-codecs | 5.2.1 | 8.8.2 |
| Apache 2.0 | org.apache.lucene » lucene-suggest | 5.2.1 | 8.8.2 |
ElasticSearch Client | | org.elasticsearch » elasticsearch (optional) | 2.0.0-beta1 | 7.12.0 |
Logging | MIT | org.slf4j » slf4j-api | 1.6.4 | 1.7.30 |
Redis Client | MIT | redis.clients » jedis | 2.5.1 | 3.5.2 |
Runtime Dependencies (1)
Category | License | Group / Artifact | Version | Updates |
---|---|---|---|---|
Logging | EPL 1.0, LGPL 2.1 | ch.qos.logback » logback-classic | 0.9.28 | 1.2.3 |
Test Dependencies (5)
Category | License | Group / Artifact | Version | Updates |
---|---|---|---|---|
Testing | Apache 2.0 | com.carrotsearch.randomizedtesting » randomizedtesting-runner | 2.1.11 | 2.7.8 |
Testing | EPL 2.0 | junit » junit | 4.11 | 5.7.1 |
| Apache 2.0 | org.apache.lucene » lucene-test-framework | 5.2.1 | 8.8.2 |
ElasticSearch Client | | org.elasticsearch » elasticsearch | 2.0.0-beta1 | 7.12.0 |
Testing | BSD 3-clause | org.hamcrest » hamcrest-library | 1.3 | 2.2 |
Licenses
License | URL |
---|---|
GNU GENERAL PUBLIC LICENSE, Version 3 | http://www.gnu.org/licenses/gpl.html |
Developers
Name | Email | Dev Id | Roles | Organization |
---|---|---|---|---|
杨尚川 | ysc<at>apdplat.org | | | |