A Summary of Common Elasticsearch Operations

2017-09-05

1. Creating a new table (index)

curl -XPUT 'host:port/table_name?pretty'

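table_name: the table name. It must not contain uppercase letters and must not start with an underscore.

pretty: returns the response as pretty-printed, human-readable JSON.

There are two ways to define a custom schema, that is, a mapping:

1. Together with the table, at creation time: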

host:port/table_name

{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  },
  "mappings": {
    "dishlist": {
      "_all": { "enabled": false },
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "dish_name": { "type": "string", "index": "not_analyzed" },
        "dish_name_analyzed": { "type": "string", "index": "analyzed", "analyzer": "ik_max_word" },
        "dish_id": { "type": "string", "index": "not_analyzed" },
        "poi_id": { "type": "long", "index": "not_analyzed" },
        "poi_name": { "type": "string", "store": "true" }
      }
    }
  }
}

2. When creating the type:

host:port/table_name/type_name/_mapping?pretty

{
  "type_name": {
    "_all": { "enabled": false },
    "properties": {
      "id": { "type": "string", "index": "not_analyzed" },
      "dish_name": { "type": "string", "index": "not_analyzed" },
      "dish_name_analyzed": { "type": "string", "index": "analyzed", "analyzer": "ik_max_word" },
      "dish_id": { "type": "string", "index": "not_analyzed" },
      "poi_id": { "type": "long", "index": "not_analyzed" },
      "poi_name": { "type": "string", "index": "not_analyzed" }
    }
  }
}
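When the same kind of mapping has to be written for several tables, generating the request body can be less error-prone than hand-editing JSON. The Python sketch below assembles a body of the shape shown above from a set of fields. The helper names string_field and build_create_body are illustrative only and not part of any Elasticsearch client, and the field layout assumes the "string" / "not_analyzed" mapping syntax used in this post.

```python
import json


def string_field(analyzer=None):
    """Return a mapping for a string field; analyzed only if an analyzer is given."""
    if analyzer is None:
        return {"type": "string", "index": "not_analyzed"}
    return {"type": "string", "index": "analyzed", "analyzer": analyzer}


def build_create_body(type_name, properties, shards=4, replicas=1):
    """Assemble the settings + mappings body for a create-table request."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            type_name: {
                "_all": {"enabled": False},
                "properties": properties,
            }
        },
    }


body = build_create_body(
    "dishlist",
    {
        "id": string_field(),
        "dish_name": string_field(),
        "dish_name_analyzed": string_field("ik_max_word"),
        "poi_id": {"type": "long"},
    },
)
# The printed JSON can be used as the body of the create request.
print(json.dumps(body, indent=2))
```

The output can be saved to a file and passed to curl with -d @file.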

1. Inserting a single document

1. Real-time import (with ID)

curl -XPUT 'host:port/table_name/type/id?pretty' -d ' { "field": "value" }'

type: the type, which can be thought of as a second-level table name.

id: the unique ID of this document.

2. Real-time import (without ID)

curl -XPOST 'host:port/table_name/type?pretty' -d ' { "field": "value" }'

Without an ID, use "POST"; the system then generates a random unique ID automatically.
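The choice between the two forms depends only on whether an ID is supplied. A small Python sketch of that rule follows; the function name index_request is hypothetical, and it only builds the method, URL and body without sending anything.

```python
import json


def index_request(base_url, table_name, type_name, doc, doc_id=None):
    """Return (method, url, body) for indexing one document."""
    if doc_id is None:
        # No ID: POST to the type and let Elasticsearch generate one.
        method, url = "POST", f"{base_url}/{table_name}/{type_name}?pretty"
    else:
        # Explicit ID: PUT to the document's own URL.
        method, url = "PUT", f"{base_url}/{table_name}/{type_name}/{doc_id}?pretty"
    return method, url, json.dumps(doc)


print(index_request("host:port", "table_name", "type", {"field": "value"}, doc_id="1"))
print(index_request("host:port", "table_name", "type", {"field": "value"}))
```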

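To update a document that already exists during import:

curl -XPOST 'host:port/table_name/type/id/_update?pretty' -d '
{ "doc": { "field": "value" }, "detect_noop": false }'

"detect_noop": false merges the doc into the existing document regardless of whether anything has changed.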
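The request body has to be strict JSON: straight quotes, commas between members, and no comments. Generating it avoids those slips. A minimal Python sketch, assuming the partial-update form of the Update API (a doc object plus the detect_noop flag); the function name update_body is illustrative.

```python
import json


def update_body(changes, detect_noop=False):
    """Serialize a partial-update body: fields to merge plus the detect_noop flag."""
    return json.dumps({"doc": changes, "detect_noop": detect_noop})


body = update_body({"field": "value"})
print(body)  # {"doc": {"field": "value"}, "detect_noop": false}
```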

Bulk import:

The import file must have the following format:

{ "index": { "_index": "table_name", "_type": "type_name", "_id": "id1" }}
{ "field1": "value1", "field2": "value2", ... }
{ "index": { "_index": "table_name", "_type": "type_name", "_id": "id2" }}
{ "field1": "value1", "field2": "value2", ... }

The _id can be taken from the value of a specific field, as below (the values must be unique):

{ "index": { "_index": "table_name", "_type": "type_name", "_id": "value1" }}
{ "field1": "value1", "field2": "value2", ... }

Then import the data through the following endpoint:

curl -XPOST 'host:port/_bulk' --data-binary @import.json

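import.json is the name of the import file.

The table name or type name can also be given in the URL, in which case they do not need to be specified in the import file: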

curl -XPOST 'host:port/table_name/type_name/_bulk' --data-binary @import.json

Note that each import.json file should not be too large, ideally around 10 MB. Large files can be split into smaller ones and imported in parallel.
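The file format and the size advice above can be combined in a small generator. The Python sketch below writes action/source line pairs and starts a new file whenever the next pair would push the current one past a size limit. The function names, the import_N.json naming scheme and the id_field parameter are assumptions made for illustration; the 10 MB default simply mirrors the recommendation above.

```python
import json


def bulk_lines(docs, table_name, type_name, id_field):
    """Yield one action+source pair (two newline-terminated lines) per document."""
    for doc in docs:
        action = {"index": {"_index": table_name, "_type": type_name, "_id": doc[id_field]}}
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        yield (json.dumps(action, ensure_ascii=False) + "\n"
               + json.dumps(doc, ensure_ascii=False) + "\n")


def chunk_by_size(pairs, max_bytes):
    """Group pairs into chunks whose UTF-8 size stays at or under max_bytes.

    A pair is never split; a single pair larger than max_bytes gets its own chunk.
    """
    chunk, size = [], 0
    for pair in pairs:
        pair_size = len(pair.encode("utf-8"))
        if chunk and size + pair_size > max_bytes:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(pair)
        size += pair_size
    if chunk:
        yield "".join(chunk)


def write_bulk_files(docs, table_name, type_name, id_field,
                     prefix="import", max_bytes=10 * 1024 * 1024):
    """Write import_0.json, import_1.json, ... and return the file names."""
    names = []
    pairs = bulk_lines(docs, table_name, type_name, id_field)
    for i, chunk in enumerate(chunk_by_size(pairs, max_bytes)):
        name = f"{prefix}_{i}.json"
        with open(name, "w", encoding="utf-8") as f:
            f.write(chunk)
        names.append(name)
    return names
```

Each resulting file can then be posted with the curl --data-binary command shown above, one process per file if the import is to run in parallel.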

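2. Creating custom analyzers and a synonym configuration

1: In the elasticsearch-x.x.x/config directory, create the synonym file synonym.txt. It must be UTF-8 encoded, and it is best to start with it empty. Note that the synonyms_path used in step 2, analysis/synonym.txt, is resolved relative to the config directory, so the file needs to sit at config/analysis/synonym.txt.

2: Create the index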

host:port/dishtag

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": { "type": "custom", "tokenizer": "ik_smart", "filter": [ "by_sfr" ] },
          "by_max_word": { "type": "custom", "tokenizer": "ik_max_word", "filter": [ "by_sfr" ] }
        },
        "filter": {
          "by_sfr": { "type": "synonym", "synonyms_path": "analysis/synonym.txt" }
        }
      }
    }
  },
  "mappings": {
    "dishtag": {
      "_all": { "enabled": false },
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "tag_id": { "type": "string", "index": "not_analyzed" },
        "tag_name": { "type": "string", "index": "not_analyzed", "search_analyzer": "by_max_word", "analyzer": "by_smart" }
      }
    }
  }
}

3: Add synonyms
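This step is listed without an example of the entries themselves. As a hedged sketch: by default the synonym token filter reads the Solr synonym format, in which each line is either a comma-separated group of equivalent words or an explicit mapping written with =>. The Python below formats such lines and appends them to a UTF-8 file. The helper names and the example words are invented for illustration and do not come from the original post.

```python
def synonym_line(words, target=None):
    """Format one Solr-style synonym rule.

    words alone -> "a, b, c" (all equivalent);
    with target -> "a, b => c" (a and b are replaced by c).
    """
    left = ", ".join(words)
    return left if target is None else f"{left} => {target}"


def append_synonyms(path, rules):
    """Append rules to the synonym file, one per line, encoded as UTF-8."""
    with open(path, "a", encoding="utf-8") as f:
        for words, target in rules:
            f.write(synonym_line(words, target) + "\n")


# Hypothetical entries: an equivalence group and an explicit mapping.
print(synonym_line(["西红柿", "番茄"]))
print(synonym_line(["土豆", "洋芋"], "马铃薯"))
```

A call such as append_synonyms("synonym.txt", [(["西红柿", "番茄"], None)]) would add the first rule to the file.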

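4: Test whether the synonyms take effect after analysis

host:port/dishtag/_analyze?analyzer=by_max_word&pretty&text=鱼

If the returned tokens include the synonyms of the input text, the synonym configuration has taken effect.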
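The check can also be done programmatically on the JSON that _analyze returns. The Python sketch below assumes the response has a top-level "tokens" list whose entries carry a "token" string; the sample response is hand-written in that shape and is not real server output, and the function names are illustrative.

```python
import json


def tokens_from_analyze(response_text):
    """Extract the token strings from an _analyze JSON response."""
    return [t["token"] for t in json.loads(response_text).get("tokens", [])]


def synonyms_in_effect(response_text, expected):
    """True if every expected synonym appears among the returned tokens."""
    return set(expected) <= set(tokens_from_analyze(response_text))


# Hand-written sample in the shape of an _analyze response (hypothetical words).
sample = json.dumps({"tokens": [
    {"token": "西红柿", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0},
    {"token": "番茄", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 0},
]})
print(synonyms_in_effect(sample, ["西红柿", "番茄"]))  # True
```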

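Summary

  • Whenever the synonym dictionary or the IK user-defined dictionary is updated, Elasticsearch must be restarted for the change to take effect.
  • Every word in a synonym pair must be one that the tokenizer emits as a complete token.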
