Iceberg Series (1): Storage in Detail - First Look, Part 1

Iceberg is one of the popular data lake components, and this series takes a closer look at it.
We start with Iceberg's underlying storage.

1. Start Spark locally

./bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark3-runtime:0.12.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

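Create one table in each of the two formats, v1 and v2.
Create the table "table" with format-version 1 (the default in this Iceberg version, so the statement sets no table property):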

CREATE TABLE local.db.table (id bigint, data string) USING iceberg;

Open the table directory. Its structure is as follows:

(base) ➜ table ll -R
total 0
drwxr-xr-x  6 liliwei  staff   192B Jan  2 21:22 metadata

./metadata:
total 16
-rw-r--r--@ 1 liliwei  staff   1.2K Jan  2 21:22 v1.metadata.json
-rw-r--r--@ 1 liliwei  staff     1B Jan  2 21:22 version-hint.text
(base) ➜ table

The contents of v1.metadata.json are as follows:

{
  "format-version" : 1,
  "table-uuid" : "0dc08d49-ed4d-49bb-8ddf-006e37c65372",
  "location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/table",
  "last-updated-ms" : 1641129739691,
  "last-column-id" : 2,
  "schema" : {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  },
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  } ],
  "partition-spec" : [ ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "liliwei"
  },
  "current-snapshot-id" : -1,
  "snapshots" : [ ],
  "snapshot-log" : [ ],
  "metadata-log" : [ ]
}
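
To make the relationship between "current-schema-id" and the "schemas" list concrete, here is a minimal Python sketch that parses a trimmed copy of the metadata above and looks up the current schema. The helper names current_schema and describe are hypothetical and are not part of any Iceberg API. The sketch also treats a "current-snapshot-id" of -1 together with an empty "snapshots" list as "no snapshot yet", which is consistent with a freshly created, empty table.

```python
import json

# Trimmed copy of the v1.metadata.json shown above (only the fields used here).
METADATA_JSON = """
{
  "format-version": 1,
  "current-schema-id": 0,
  "schemas": [
    {"type": "struct", "schema-id": 0, "fields": [
      {"id": 1, "name": "id", "required": false, "type": "long"},
      {"id": 2, "name": "data", "required": false, "type": "string"}
    ]}
  ],
  "current-snapshot-id": -1,
  "snapshots": []
}
"""


def current_schema(metadata: dict) -> dict:
    """Return the entry of "schemas" whose schema-id equals current-schema-id."""
    wanted = metadata["current-schema-id"]
    for schema in metadata["schemas"]:
        if schema["schema-id"] == wanted:
            return schema
    raise KeyError(f"no schema with schema-id {wanted}")


def describe(metadata: dict) -> list:
    """List (name, type, required) for each column of the current schema."""
    fields = current_schema(metadata)["fields"]
    return [(f["name"], f["type"], f["required"]) for f in fields]


metadata = json.loads(METADATA_JSON)
columns = describe(metadata)
# Assumption of this sketch: -1 plus an empty list means no snapshot exists yet.
has_snapshot = metadata["current-snapshot-id"] != -1 and bool(metadata["snapshots"])

print(columns)
print("has snapshot:", has_snapshot)
```

Both columns come back as optional ("required" is false), matching the CREATE TABLE statement, which declares no NOT NULL constraints.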

The contents of version-hint.text are as follows:

1
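
The hint file contains 1 and the only metadata file is v1.metadata.json, which suggests that the hint names the current metadata version. The following Python sketch resolves the current metadata file on that assumption. The naming rule v<N>.metadata.json is inferred from this listing, and current_metadata_path is a hypothetical helper, not an Iceberg function. The demo builds a throwaway directory that mimics the layout instead of touching a real warehouse.

```python
import tempfile
from pathlib import Path


def current_metadata_path(table_dir: Path) -> Path:
    """Resolve v<N>.metadata.json, where N is read from version-hint.text."""
    metadata_dir = table_dir / "metadata"
    version = int((metadata_dir / "version-hint.text").read_text().strip())
    return metadata_dir / f"v{version}.metadata.json"


# Demo on a temporary directory that mimics the layout listed above.
with tempfile.TemporaryDirectory() as tmp:
    table_dir = Path(tmp) / "table"
    (table_dir / "metadata").mkdir(parents=True)
    (table_dir / "metadata" / "version-hint.text").write_text("1")
    (table_dir / "metadata" / "v1.metadata.json").write_text("{}")

    resolved = current_metadata_path(table_dir)
    resolved_name = resolved.name
    resolved_exists = resolved.exists()

print(resolved_name, resolved_exists)
```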

Create the table "tableV2" with format-version 2:

CREATE TABLE local.db.tableV2 (id bigint, data string) 
USING iceberg
TBLPROPERTIES ('format-version'='2'); 

The directory structure of tableV2 is as follows:

(base) ➜ tableV2 cd metadata
(base) ➜ metadata ll
total 16
-rw-r--r--  1 liliwei  staff   936B Jan  2 21:38 v1.metadata.json
-rw-r--r--  1 liliwei  staff     1B Jan  2 21:38 version-hint.text
(base) ➜ metadata

The contents of v1.metadata.json are as follows:

{
  "format-version" : 2,
  "table-uuid" : "67b54789-070c-4600-b2ff-3b9a0a774e4a",
  "location" : "/Users/liliwei/plat/spark-3.1.2-bin-hadoop3.2/warehouse/db/tableV2",
  "last-sequence-number" : 0,
  "last-updated-ms" : 1641130714999,
  "last-column-id" : 2,
  "current-schema-id" : 0,
  "schemas" : [ {
    "type" : "struct",
    "schema-id" : 0,
    "fields" : [ {
      "id" : 1,
      "name" : "id",
      "required" : false,
      "type" : "long"
    }, {
      "id" : 2,
      "name" : "data",
      "required" : false,
      "type" : "string"
    } ]
  } ],
  "default-spec-id" : 0,
  "partition-specs" : [ {
    "spec-id" : 0,
    "fields" : [ ]
  } ],
  "last-partition-id" : 999,
  "default-sort-order-id" : 0,
  "sort-orders" : [ {
    "order-id" : 0,
    "fields" : [ ]
  } ],
  "properties" : {
    "owner" : "liliwei"
  },
  "current-snapshot-id" : -1,
  "snapshots" : [ ],
  "snapshot-log" : [ ],
  "metadata-log" : [ ]
}
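
The two metadata files differ in a few top-level fields, which also explains the smaller file size for tableV2 (936B versus 1.2K). The short Python sketch below compares the top-level key sets copied from the two listings above. It only reports what these two files contain and makes no claim about which fields the format specification requires.

```python
# Top-level keys copied from the two v1.metadata.json listings above.
V1_KEYS = {
    "format-version", "table-uuid", "location", "last-updated-ms",
    "last-column-id", "schema", "current-schema-id", "schemas",
    "partition-spec", "default-spec-id", "partition-specs",
    "last-partition-id", "default-sort-order-id", "sort-orders",
    "properties", "current-snapshot-id", "snapshots", "snapshot-log",
    "metadata-log",
}
V2_KEYS = {
    "format-version", "table-uuid", "location", "last-sequence-number",
    "last-updated-ms", "last-column-id", "current-schema-id", "schemas",
    "default-spec-id", "partition-specs", "last-partition-id",
    "default-sort-order-id", "sort-orders", "properties",
    "current-snapshot-id", "snapshots", "snapshot-log", "metadata-log",
}

# Set difference in each direction shows what one file has and the other lacks.
only_in_v1 = sorted(V1_KEYS - V2_KEYS)
only_in_v2 = sorted(V2_KEYS - V1_KEYS)

print("only in format-version 1 file:", only_in_v1)
print("only in format-version 2 file:", only_in_v2)
```

In these files, the format-version 1 metadata carries the singular "schema" and "partition-spec" fields alongside the "schemas" and "partition-specs" lists, while the format-version 2 metadata drops them and adds "last-sequence-number".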

The contents of version-hint.text are as follows:

1

Next, we will insert data into the tables and see what changes.
Continue to: Iceberg Series (1): Storage in Detail - First Look, Part 2
