Version: 7.17.2
Request
{
"field": "cn_name",
"text": "山崎 12"
}
Response
{
"tokens" : [
{
"token" : "山崎",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "12",
"start_offset" : 2,
"end_offset" : 4,
"type" : "ARABIC",
"position" : 1
}
]
}
Request
{
"profile": true,
"explain": true,
"query": {
"bool": {
"must": [
{
"match": {
"cn_name": {
"query": "山崎 12"
}
}
}
]
}
},
"from": 0,
"size": 10
}
Response
....
"_explanation" : {
"value" : 9.302625,
"description" : "sum of:",
"details" : [
{
"value" : 9.302625,
"description" : "weight(cn_name:12 in 13135) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 9.302625,
"description" : "score(freq=1.0), computed as boost * idf * tf from:",
Only the number 12 is matched; 山崎 is not, even though the profile output shows that the query contains both 山崎 and 12.
"profile" : {
"shards" : [
{
"id" : "[x-x][_tables][0]",
"searches" : [
{
"query" : [
{
"type" : "BooleanQuery",
"description" : "cn_name:山崎 cn_name:12",
Request
{
"profile": true,
"explain": true,
"query": {
"bool": {
"must": [
{
"match": {
"cn_name": {
"query": "山崎 12",
"operator": "and"
}
}
}
]
}
},
"from": 0,
"size": 10
}
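I added the operator parameter as a test, but then nothing matches at all. Searching for 山崎 12 年 does match. What other tests should I run to verify this, and where should I start looking for the problem?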
1
bxb100 2022-07-25 11:11:33 +08:00
Check whether the search analyzer is standard
|
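A minimal sketch of what this check amounts to, assuming you have already fetched the field's mapping (for example via GET <index_name>/_mapping) as a dict. The helper name `effective_analyzers` and the sample mapping are hypothetical. The fallback order is simplified: it ignores a per-query analyzer override and any index-level default analyzer settings.

```python
def effective_analyzers(field_mapping, index_default="standard"):
    """Return (index_analyzer, search_analyzer) for a text field mapping.

    Simplified model: search_analyzer falls back to analyzer, and
    analyzer falls back to the built-in "standard" analyzer.
    """
    index_analyzer = field_mapping.get("analyzer", index_default)
    search_analyzer = field_mapping.get("search_analyzer", index_analyzer)
    return index_analyzer, search_analyzer


# Hypothetical mapping for cn_name; replace with the real one from _mapping.
cn_name = {"type": "text", "analyzer": "ik_smart"}
print(effective_analyzers(cn_name))
```

If the two values differ, the query text and the stored text are tokenized differently, which is one common reason a match query misses.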
2
Morriaty 2022-07-25 11:15:01 +08:00
GET <index_name>/_validate/query?rewrite=true shows how the query is split into terms
|
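For scripting this check, here is a rough sketch that only builds the request using the Python standard library. The host `http://localhost:9200` and the index name `my_index` are placeholders, and `build_validate_request` is a hypothetical helper. It uses POST, which the validate endpoint accepts alongside GET, because sending a body with POST is the more conventional route in urllib.

```python
import json
import urllib.parse
import urllib.request


def build_validate_request(base_url, index, query):
    """Build (but do not send) a _validate/query?rewrite=true request."""
    url = (
        f"{base_url.rstrip('/')}/{urllib.parse.quote(index)}"
        "/_validate/query?rewrite=true"
    )
    body = json.dumps({"query": query}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_validate_request(
    "http://localhost:9200",  # assumed local node
    "my_index",               # placeholder index name
    {"match": {"cn_name": {"query": "山崎 12"}}},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

The "explanation" string in the response is the rewritten Lucene query, so it shows the terms produced by the search-time analyzer.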
4
fengci OP @Morriaty
```
Request
{
  "query": {
    "match": {
      "cn_name": { "query": "山崎 12" }
    }
  }
}
Response
{
  "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
  "valid" : true,
  "explanations" : [
    {
      "index" : "whiskey_depot_1658716667",
      "valid" : true,
      "explanation" : "cn_name:山崎 cn_name:12"
    }
  ]
}
```
|
5
fengci OP 山崎 is in a custom dictionary. The field only uses ik_smart tokenization, with no other filters; both search and indexing use ik_smart.
|
6
misaka19000 2022-07-25 11:34:43 +08:00
Could it be that the index is wrong?
|
7
fengci OP @misaka19000 It looks like the original content was indexed without a 12 token; 12 年 was emitted as a single token instead. raw: 山崎 12 年 金花标 单一麦芽威士忌
{
  "tokens" : [
    { "token" : "山崎", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "12 年", "start_offset" : 3, "end_offset" : 6, "type" : "TYPE_CQUAN", "position" : 1 },
    { "token" : "金花", "start_offset" : 7, "end_offset" : 9, "type" : "CN_WORD", "position" : 2 },
    { "token" : "标", "start_offset" : 9, "end_offset" : 10, "type" : "CN_CHAR", "position" : 3 },
    { "token" : "单一麦芽", "start_offset" : 11, "end_offset" : 15, "type" : "CN_WORD", "position" : 4 },
    { "token" : "威士忌", "start_offset" : 15, "end_offset" : 18, "type" : "CN_WORD", "position" : 5 }
  ]
}
However, I had previously created a separate dictionary for 12, and only deleted that number dictionary just now while testing.
|
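The comparison being made by eye here can also be done mechanically. A small sketch, assuming you have the two analysis responses (one for the query text, one for the stored raw text) as dicts; `missing_query_tokens` is a hypothetical helper, and the token lists are copied from the outputs in this thread with the other fields dropped.

```python
def missing_query_tokens(query_analysis, document_analysis):
    """Return query tokens that do not appear among the document's tokens."""
    indexed = {t["token"] for t in document_analysis["tokens"]}
    return [
        t["token"] for t in query_analysis["tokens"] if t["token"] not in indexed
    ]


# Search-time tokens for the query text "山崎 12".
query_analysis = {"tokens": [{"token": "山崎"}, {"token": "12"}]}

# Index-time tokens for the raw document text, as posted above.
document_analysis = {
    "tokens": [
        {"token": "山崎"}, {"token": "12 年"}, {"token": "金花"},
        {"token": "标"}, {"token": "单一麦芽"}, {"token": "威士忌"},
    ]
}

print(missing_query_tokens(query_analysis, document_analysis))  # prints ['12']
```

Any token in the result is a term the query looks for but the document never indexed.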
8
novolunt 2022-07-25 13:05:24 +08:00
{
  "tokens" : [
    { "token" : "山崎", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "12", "start_offset" : 3, "end_offset" : 5, "type" : "ARABIC", "position" : 1 },
    { "token" : "年", "start_offset" : 6, "end_offset" : 7, "type" : "CN_CHAR", "position" : 2 },
    { "token" : "金花", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 3 },
    { "token" : "标", "start_offset" : 10, "end_offset" : 11, "type" : "CN_CHAR", "position" : 4 },
    { "token" : "单一", "start_offset" : 12, "end_offset" : 14, "type" : "CN_WORD", "position" : 5 },
    { "token" : "麦芽", "start_offset" : 14, "end_offset" : 16, "type" : "CN_WORD", "position" : 6 },
    { "token" : "威士忌", "start_offset" : 16, "end_offset" : 19, "type" : "CN_WORD", "position" : 7 },
    { "token" : "1990", "start_offset" : 19, "end_offset" : 23, "type" : "ARABIC", "position" : 8 }
  ]
}
|
9
novolunt 2022-07-25 13:07:22 +08:00
bin/elasticsearch-plugin -v install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.17.2/elasticsearch-analysis-ik-7.17.2.zip
|
10
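fengci OP @novolunt Thanks, I found the problem. I had taken an extra step earlier and added 12 年 to the dictionary, so when the original content was indexed it was tokenized into 12 年, with no separate 12 token.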
|
11
WhereverYouGo 2022-07-25 14:46:03 +08:00
Try a terms query
|
12
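fengci OP @WhereverYouGo raw: 山崎 12 年 金花标 单一麦芽威士忌. My own custom dictionary had a 数字+年 (number + 年) entry, so when the document was indexed it only contained the 12 年 token, with no 12 token. That is why the search found nothing. Thanks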
|
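To make the resolution concrete, here is a toy model of how the operator parameter interacts with the tokens. It is only an illustration of the or/and semantics over term sets, not how Elasticsearch scores or executes the query; `match` is a hypothetical function and the document tokens are the index-time tokens posted in the thread.

```python
def match(doc_tokens, query_tokens, operator="or"):
    """Toy match query: "or" needs any query term, "and" needs all of them."""
    hits = [term in doc_tokens for term in query_tokens]
    return all(hits) if operator == "and" else any(hits)


# Index-time tokens produced while 12 年 was in the custom dictionary.
doc = {"山崎", "12 年", "金花", "标", "单一麦芽", "威士忌"}

print(match(doc, ["山崎", "12"], "or"))       # True: 山崎 alone is enough
print(match(doc, ["山崎", "12"], "and"))      # False: the document has no 12 term
print(match(doc, ["山崎", "12 年"], "and"))   # True: both terms were indexed
```

After changing the dictionary, existing documents keep their old tokens until they are reindexed, so the query-side fix alone would not be expected to help.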