Elasticsearch pinyin + IK analysis, and pinyin analysis with Spring Data Elasticsearch

2023-03-09

Custom analyzers in Elasticsearch

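Install the pinyin and IK analysis plugins

  Pinyin analysis plugin: https://github.com/medcl/elasticsearch-analysis-pinyin/releases

  IK analysis plugin: https://github.com/medcl/elasticsearch-analysis-ik/releases

  If you download the source code, you need to build the package with Maven.

  If you download a prebuilt archive, unzip it directly into the plugins folder under the Elasticsearch installation directory; the extracted folder can be renamed. The plugin version must match the Elasticsearch version, and Elasticsearch has to be restarted before it loads a new plugin.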

1. Configure the analyzer in Elasticsearch

Create the index and add the analysis settings:

PUT myindex
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": "pinyin_filter"
          }
        },
        "filter": {
          "pinyin_filter": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 10,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
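Once the index exists, one way to check what ik_pinyin_analyzer actually emits is the _analyze API. The following is a minimal sketch, not part of the original setup: it only builds the request with the JDK's java.net.http classes and does not send it. The class name AnalyzeRequestSketch, the helper names, the base URL http://localhost:9200 and the sample text are assumptions made for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds an _analyze request for a custom analyzer.
class AnalyzeRequestSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body of the _analyze call: which analyzer to run and on which text.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":" + quote(analyzer) + ",\"text\":" + quote(text) + "}";
    }

    // POST <baseUrl>/<index>/_analyze with a JSON body.
    static HttpRequest analyzeRequest(String baseUrl, String index, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Assumed host and sample text; adjust to your cluster.
        String body = analyzeBody("ik_pinyin_analyzer", "刘德华");
        HttpRequest request = analyzeRequest("http://localhost:9200", "myindex", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

The response lists the tokens produced by the analyzer, which makes it easy to see the effect of options such as keep_full_pinyin before any documents are indexed.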

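Add the field properties by setting the mapping. The request below uses the mapping type users in the URL, which applies to Elasticsearch versions that still support mapping types (they were deprecated in 7.0 and removed in 8.0):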

PUT myindex/_mapping/users
{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}
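With this mapping, uname is analyzed by ik_smart and the sub-field uname.my_pinyin by ik_pinyin_analyzer, so a search can target both at once. The sketch below builds the body of a multi_match query for the _search endpoint; the class name PinyinSearchSketch, its helpers and the sample input are assumptions for illustration, and whether a given input matches depends on how the analyzers tokenize it.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a multi_match query over the Chinese and pinyin fields.
class PinyinSearchSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces {"query":{"multi_match":{"query":...,"fields":[...]}}}.
    static String multiMatchBody(String userInput, List<String> fields) {
        String fieldList = fields.stream()
                .map(PinyinSearchSketch::quote)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"multi_match\":{\"query\":" + quote(userInput)
                + ",\"fields\":[" + fieldList + "]}}}";
    }

    public static void main(String[] args) {
        // Sample input typed in pinyin; the body can be sent to myindex/_search.
        System.out.println(multiMatchBody("liudehua", List.of("uname", "uname.my_pinyin")));
    }
}
```

Querying both fields lets the same search box accept Chinese characters and pinyin input, at the cost of scoring across two differently analyzed fields.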

2. Configure the analyzer in Spring Data Elasticsearch

Create the entity class:

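import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

@Mapping(mappingPath = "elasticsearch_mapping.json") // load the mapping from this file
@Setting(settingPath = "elasticsearch_setting.json") // load the index settings from this file
@Document(indexName = "myindex", type = "users")
public class User {
    @Id
    private Integer id;
    // Setting the analyzer through @Field had no effect:
    // @Field(type = FieldType.keyword, analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
    private String name1;
    @Field(type = FieldType.keyword)
    private String userName;
    @Field(type = FieldType.Nested)
    private List<Product> products; // the Product class is not shown in this post
}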
Create the file elasticsearch_mapping.json under resources:
{
  "properties": {
    "uname": {
      "type": "text",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "fields": {
        "my_pinyin": {
          "type": "text",
          "analyzer": "ik_pinyin_analyzer",
          "search_analyzer": "ik_pinyin_analyzer"
        }
      }
    },
    "age": {
      "type": "integer"
    }
  }
}
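Create the file elasticsearch_setting.json under resources. The // lines are explanatory comments, which strict JSON does not allow; remove them if the file is rejected: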

{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": "pinyin_filter"
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          // true: keep the first-letter form
          "keep_first_letter": true,
          // false: do not emit each first letter as a separate term
          "keep_separate_first_letter": false,
          // true: keep the full pinyin of each character
          "keep_full_pinyin": true,
          "keep_original": false,
          // maximum length of the first-letter result
          "limit_first_letter_length": 10,
          // lowercase non-Chinese letters
          "lowercase": true,
          // remove duplicated terms
          "remove_duplicated_term": true
        }
      }
    }
  }
}
 

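ik_max_word: splits the text at the finest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」, exhausting every possible combination.
ik_smart: splits the text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.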

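After the application starts, the analyzers have still not been applied to the index.

Once the entity has been created, add the following call so that the created index uses the analyzers: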

elasticsearchTemplate.putMapping(User.class);
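The order of operations matters here: analysis settings are applied when an index is created, and a mapping can only reference ik_pinyin_analyzer once the index defines it. The following is a minimal sketch of that ordering, not the author's code. IndexOps is a hypothetical stand-in interface whose three methods mirror the indexExists, createIndex and putMapping calls on ElasticsearchTemplate, so the example runs without Spring; it also assumes that createIndex picks up the @Setting file of the entity.

```java
import java.util.ArrayList;
import java.util.List;

class IndexSetupSketch {

    // Hypothetical stand-in for the ElasticsearchTemplate calls used below.
    interface IndexOps {
        boolean indexExists(Class<?> entity);
        boolean createIndex(Class<?> entity);
        boolean putMapping(Class<?> entity);
    }

    // Creates the index if it is missing, then pushes the mapping.
    static void ensureIndex(IndexOps ops, Class<?> entity) {
        if (!ops.indexExists(entity)) {
            ops.createIndex(entity); // assumed to apply the analysis settings
        }
        ops.putMapping(entity);
    }

    public static void main(String[] args) {
        // A recording fake that shows the call order when the index is missing.
        List<String> calls = new ArrayList<>();
        IndexOps recording = new IndexOps() {
            public boolean indexExists(Class<?> e) { calls.add("indexExists"); return false; }
            public boolean createIndex(Class<?> e) { calls.add("createIndex"); return true; }
            public boolean putMapping(Class<?> e) { calls.add("putMapping"); return true; }
        };
        ensureIndex(recording, Object.class);
        System.out.println(calls); // prints [indexExists, createIndex, putMapping]
    }
}
```

In a Spring application the same sequence could be run once at startup with the real template and User.class. If the index already exists without the analysis settings, it would need to be recreated before the mapping can be applied.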
