Can jq perform aggregation across files?

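As suggested in one of the comments, I ended up using SQL to export the JSON in the format I needed. Another thread helped a great deal. In the end I chose to write each SQL table to its own JSON file rather than combining them, because the combined file size became unmanageable. The code is structured so that, for each record, it generates both the command line for the Bulk API and the JSON data line: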

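-- Emit two lines per record: the Bulk API action line, then the data line.
create or replace function format_data_line(command text, data_str text)
returns setof text language plpgsql as $$
begin
    return next command;
    return next
        replace(
            -- turn "YYYY-MM-DDT..." timestamps into "YYYY-MM-DD ..."
            regexp_replace(data_str, '(\d\d\d\d-\d\d-\d\d)T', '\1 ', 'g'),
            -- strip the line breaks json_agg puts between array elements
            e' \n ', '');
end $$;

COPY (
    with f_1 as (
        SELECT id, json_agg(fileX.*) AS tag
        FROM forum.file3 AS fileX  -- alias is an assumption, added so that fileX.* resolves
        GROUP BY id
    )
    SELECT
        format_data_line(
            format('{"update":{"_index":"forum2","_type":"subject","_id":%s}}', a.id),
            format('{"doc":{"id":%s,"fileX":%s}}', a.id, a.tag))
    FROM f_1 a
) TO '/path/to/json/fileX.json';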
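The export should produce a file whose lines alternate between a Bulk API action line and its data line. Below is a minimal sketch for checking that pattern before importing. The helper name `check_bulk_pairs` and the sample records are hypothetical and only serve as illustration.

```shell
#!/bin/bash
# Hypothetical helper: verify that a bulk file alternates between
# an {"update": ...} action line and a {"doc": ...} data line.
check_bulk_pairs() {
    awk '
        NR % 2 == 1 && index($0, "{\"update\":") != 1 {
            print "line " NR ": expected an update action"; bad = 1
        }
        NR % 2 == 0 && index($0, "{\"doc\":") != 1 {
            print "line " NR ": expected a doc line"; bad = 1
        }
        END {
            if (NR % 2 == 1) { print "odd number of lines: " NR; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Build a tiny sample shaped like the export (values are made up).
sample=$(mktemp)
printf '%s\n' \
    '{"update":{"_index":"forum2","_type":"subject","_id":1}}' \
    '{"doc":{"id":1,"fileX":[{"id":1}]}}' > "$sample"

if check_bulk_pairs "$sample"; then
    echo "bulk file looks well-formed"
fi
rm -f "$sample"
```

Running the same function against the real export (for example `check_bulk_pairs /path/to/json/fileX.json`) would flag any record whose data line was broken across several lines.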

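Importing the larger files with the Bulk API was also a problem (Java out-of-memory errors), so a script was needed to send only a subset of the data to curl at a time (for indexing in Elasticsearch). The basic structure of the script is: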

#!/bin/bash
# Usage: ./ESscript fileX.json
FILE=$1
INC=100  # lines per request; keep it even so action/data pairs stay together
numline=$(wc -l < "$FILE" | awk '{print $1}')
rm -f "output/$FILE.txt"
for i in $(seq 1 "$INC" "$numline"); do
    TIME=$(date +%H:%M:%S)
    echo "[$TIME] Processing lines from $i to $((i + INC - 1))"
    chunk="intermediates/interm_file_$i.json"
    # extract the next INC lines into an intermediate file
    sed -n "$i,$((i + INC - 1))p" "$FILE" > "$chunk"
    # send the chunk to the Bulk API and log the response
    curl -s -XPOST localhost:9200/_bulk --data-binary "@$chunk" >> "output/$FILE.txt"
done

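An "intermediates" directory should be created under the script's directory (and an "output" directory as well, since the script writes the responses there). The script can be saved as "ESscript" and run from the command line with: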

./ESscript fileX.json
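As an alternative to the repeated `sed` calls, the chunking step can also be sketched with `split`, which is a different technique from the one used in the script above. The sketch below is a dry run on made-up sample data: the `CHUNK` size, the `chunk_` prefix, and the temporary directory are assumptions, and the `curl` call is left commented out.

```shell
#!/bin/bash
# Sketch: chunk a bulk file with split(1) instead of repeated sed calls.
# CHUNK must be even so an action line is never separated from its data line.
CHUNK=4
workdir=$(mktemp -d)

# Made-up sample: 5 action/data pairs, i.e. 10 lines.
for id in 1 2 3 4 5; do
    printf '{"update":{"_index":"forum2","_type":"subject","_id":%s}}\n' "$id"
    printf '{"doc":{"id":%s}}\n' "$id"
done > "$workdir/sample.json"

# split writes chunk_aa, chunk_ab, ... each holding at most CHUNK lines.
split -l "$CHUNK" "$workdir/sample.json" "$workdir/chunk_"

for part in "$workdir"/chunk_*; do
    echo "$(basename "$part"): $(wc -l < "$part" | tr -d ' ') lines"
    # Each part could then be posted as in the script above, for example:
    # curl -s -XPOST localhost:9200/_bulk --data-binary "@$part"
done
# Remove "$workdir" once the chunks are no longer needed.
```

With 10 sample lines and a chunk size of 4, this yields three parts of 4, 4, and 2 lines, each starting on an action line.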
