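I won't go into setting up and using KKFileView here; the official KKFileView site covers most of it. A few specific problems with copying text are covered in my other post, "Notes on first using KKFileView online preview", which mainly deals with restrictions such as text not being copyable.
Using Elasticsearch in Java
Preface
Our company used to preview online documents with Flash and ran full-text search on ES 2. The project is now being overhauled: Flash can no longer be used, so preview is moving to KKFileView. ES also needs to be upgraded, with the IK Chinese analyzer added. I wrote this post as a summary and for the record.
Below I have pasted the utility class I wrote, for later reuse.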
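File processing
Install the text extraction plugin
Plain-text files can be uploaded as they are, but Word, PDF and other formats need preprocessing first. So we start by setting up a pipeline. If you are curious why a pipeline is needed, look into how ES handles PUT requests.
## Run the following command from the ES installation directory to install the plugin
./bin/elasticsearch-plugin install ingest-attachment
Define the text extraction pipeline
Run the snippet below in Kibana; if the response says true, it worked.
[Remember to restart ES after installing the plugin, otherwise defining the pipeline will fail. In a cluster, every node has to be restarted.]
If you don't know what Kibana is, it is worth reading up on.
For installing and running Kibana, see my other posts:
Linux installation and startup
Mac installation and startup
Both posts also cover the IK analyzer.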
PUT /_ingest/pipeline/attachment
{
"description": "Extract attachment information",
"processors": [
{
"attachment": {
"field": "content",
"ignore_missing": true
}
},
{
"remove": {
"field": "content"
}
}
]
}
Create the index
PUT /<index name>
The properties here can be adjusted to match your actual fields.
PUT /fileindex
{
"mappings": {
"properties": {
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"sfName":{
"type": "text",
"analyzer": "ik_max_word"
},
"createBy":{
"type": "text",
"analyzer": "ik_max_word"
},
"type":{
"type": "keyword"
},
"attachment": {
"properties": {
"content":{
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
}
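If both steps above succeeded, we can run a test.
Because Elasticsearch is a JSON-based document store, an attachment must be Base64-encoded before it is inserted. First convert a PDF file to Base64 text using this site: PDF to base64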
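If you would rather do the conversion locally than through a website, the same step can be sketched in plain Java. This is a minimal illustration, not part of the original setup: the class name `Base64FileEncoder` and the helper `encodeFile` are hypothetical, and a small temp file stands in for a real PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class Base64FileEncoder {

    /** Hypothetical helper: reads the whole file and returns its content as Base64 text. */
    public static String encodeFile(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for a real PDF here (assumption for the demo)
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "hello".getBytes(StandardCharsets.UTF_8));

        // The resulting string is what goes into the "content" field
        System.out.println(encodeFile(sample)); // prints aGVsbG8=

        Files.delete(sample);
    }
}
```

`Files.readAllBytes` loads the whole file into memory, which is fine for typical office documents but worth reconsidering for very large files.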
POST /docwrite/_doc?pipeline=attachment
{
"name":"进口红酒",
"type":"pdf",
"content":"put your converted Base64 string here"
}
Then we can use GET to check whether the document we just uploaded was indexed.
GET /docwrite/_search
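If all goes well, you should see the parsed content in the result; I won't paste a screenshot here.
If you leave out the pipeline, ES will not parse the file, and what comes back is the raw Base64 string rather than readable Chinese, haha.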
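To make the role of the `pipeline` parameter concrete, here is a rough sketch of the same test request built with the JDK's own HTTP client (Java 11+). It only builds the request and prints it. The class name `PipelineIndexRequest`, the `build` helper and the sample body are assumptions for illustration; the host and index are the ones used in the test above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PipelineIndexRequest {

    /** Hypothetical helper: builds (but does not send) an index request routed through an ingest pipeline. */
    public static HttpRequest build(String baseUrl, String index, String pipeline, String jsonBody) {
        // Without the ?pipeline=... query parameter the document is stored as-is
        URI uri = URI.create(baseUrl + "/" + index + "/_doc?pipeline=" + pipeline);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Sample body; "content" must hold the Base64 text of the file
        String body = "{\"name\":\"sample\",\"type\":\"pdf\",\"content\":\"aGVsbG8=\"}";
        HttpRequest request = build("http://127.0.0.1:9200", "docwrite", "attachment", body);
        System.out.println(request.method() + " " + request.uri());
        // To actually send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping the build step separate from sending makes it easy to check the URL and method before pointing the code at a real cluster.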
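Entity class for the required information
Maven dependencies (groupId:artifactId:version):
org.elasticsearch.client:elasticsearch-rest-high-level-client:7.13.4
org.elasticsearch:elasticsearch:7.13.4
org.apache.httpcomponents:httpclient:4.5.8
org.apache.httpcomponents:httpcore:4.4.9
com.alibaba:fastjson:1.2.7
// Shape this entity class around your own business needs, but keep the content field
public class fileMessage {
    String id;      // file id
    String name;    // file name
    String type;    // file type: pdf, word or txt
    String content; // the full file content, Base64-encoded
    // Getters and setters are omitted here. updateESFile calls setId/setName/setContent,
    // and fastjson needs getters (or public fields) to serialize these values.
}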
Upload code
The updateESFile method below performs the upload. Adapt the business logic and the entity class to your own application.
private void updateESFile(String filePath) {
    File file = new File(filePath);
    if (!file.exists()) {
        System.out.println("File not found: " + filePath);
        return; // nothing to upload
    }
    fileMessage fileM = new fileMessage();
    try {
        byte[] bytes = getContent(file);
        // java.util.Base64: the attachment processor expects Base64 text
        String base64 = Base64.getEncoder().encodeToString(bytes);
        fileM.setId("1");
        fileM.setName(file.getName());
        fileM.setContent(base64);
        IndexRequest indexRequest = new IndexRequest("fileindex");
        // Run the document through the attachment pipeline while indexing so the file text is extracted
        indexRequest.source(JSON.toJSONString(fileM), XContentType.JSON);
        indexRequest.setPipeline("attachment");
        IndexResponse indexResponse = EsUtil.client.index(indexRequest, RequestOptions.DEFAULT);
        logger.info("send to eSearch:" + file.getName());
        logger.info("send to eSearch results:" + indexResponse);
    } catch (IOException e) {
        // Nothing in this block throws SAXException or TikaException, so only IOException is caught
        e.printStackTrace();
    }
}
private byte[] getContent(File file) throws IOException {
    long fileSize = file.length();
    if (fileSize > Integer.MAX_VALUE) {
        // A Java array cannot hold more than Integer.MAX_VALUE bytes; fail instead of returning null
        throw new IOException("File too big: " + file.getName());
    }
    byte[] buffer = new byte[(int) fileSize];
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fi = new FileInputStream(file)) {
        int offset = 0;
        int numRead;
        while (offset < buffer.length
                && (numRead = fi.read(buffer, offset, buffer.length - offset)) >= 0) {
            offset += numRead;
        }
        // Make sure all data was read
        if (offset != buffer.length) {
            throw new IOException("Could not completely read file " + file.getName());
        }
    }
    return buffer;
}
Java EsUtil utility class
package util;
import com.alibaba.fastjson.JSON;
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
public class EsUtil {
public static RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("127.0.0.1", 9200, "http")));
public static void shutdown() {
if (client != null) {
try {
client.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static final String DEFAULT_TYPE = "_doc";
public static final String SET_METHOD_PREFIX = "set";
public static final String RESPONSE_STATUS_CREATED = "CREATED";
public static final String RESPONSE_STATUS_OK = "OK";
public static final String RESPONSE_STATUS_NOT_FOUND = "NOT_FOUND";
public static final String[] IGNORE_KEY = {"@timestamp", "@version", "type"};
public static final TimeValue TIME_VALUE_SECONDS = TimeValue.timeValueSeconds(1);
public static final String PATCH_OP_TYPE_INSERT = "insert";
public static final String PATCH_OP_TYPE_DELETE = "delete";
public static final String PATCH_OP_TYPE_UPDATE = "update";
//========================================== Data helpers (utilities, no calls to ES) =================================================
public static void ignoreSource(Map<String, Object> map) {
    for (String key : IGNORE_KEY) {
        map.remove(key);
    }
}
public static <T> T dealObject(Map<String, Object> sourceAsMap, Class<T> clazz) {
    try {
        ignoreSource(sourceAsMap);
        Iterator<String> keyIterator = sourceAsMap.keySet().iterator();
        T t = clazz.getDeclaredConstructor().newInstance();
        while (keyIterator.hasNext()) {
            String key = keyIterator.next();
            // Capitalize the first letter to build the setter name, e.g. name -> setName
            String replaceKey = key.substring(0, 1).toUpperCase() + key.substring(1);
            Method method = null;
            try {
                method = clazz.getMethod(SET_METHOD_PREFIX + replaceKey, sourceAsMap.get(key).getClass());
            } catch (NoSuchMethodException e) {
                // No matching setter on the target class: skip this key
                continue;
            }
            method.invoke(t, sourceAsMap.get(key));
        }
        return t;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
//========================================== Index operations =================================================
public static boolean insertIndex(String index) {
//Build the create-index request
CreateIndexRequest request = new CreateIndexRequest(index);
//Execute it through the IndicesClient and get the response
try {
CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
return response != null;
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
public static boolean isExitsIndex(String index) {
GetIndexRequest request = new GetIndexRequest(index);
try {
return client.indices().exists(request, RequestOptions.DEFAULT);
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
public static boolean deleteIndex(String index) {
DeleteIndexRequest request = new DeleteIndexRequest(index);
try {
AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
return response.isAcknowledged();
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
//========================================== Document operations (create, delete, update) =================================================
public static boolean insertOrUpdatedocument(String index, String id, Object data) {
try {
IndexRequest request = new IndexRequest(index);
request.timeout(TIME_VALUE_SECONDS);
if (id != null && id.length() > 0) {
request.id(id);
}
request.source(JSON.toJSONString(data), XContentType.JSON);
IndexResponse response = client.index(request, RequestOptions.DEFAULT);
String status = response.status().toString();
if (RESPONSE_STATUS_CREATED.equals(status) || RESPONSE_STATUS_OK.equals(status)) {
return true;
}
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
public static boolean updatedocument(String index, String id, Object data) {
try {
UpdateRequest request = new UpdateRequest(index, id);
request.doc(JSON.toJSONString(data), XContentType.JSON);
UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
String status = response.status().toString();
if (RESPONSE_STATUS_OK.equals(status)) {
return true;
}
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
public static boolean deletedocument(String index, String id) {
try {
DeleteRequest request = new DeleteRequest(index, id);
DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
String status = response.status().toString();
if (RESPONSE_STATUS_OK.equals(status)) {
return true;
}
} catch (Exception e) {
e.printStackTrace();
}
return false;
}
public static boolean simplePatchInsert(String index, List
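Java ES full-text search
public List
Wrap-up
That completes a simple walkthrough of uploading documents to ES from Java and running full-text search on them. Best of luck to everyone.
✿✿ヽ(°▽°)ノ✿ If this helped, leave a like before you go.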



