The constructors of ParquetWriter are deprecated (as of 1.8.1), but not ParquetWriter itself; you can still create a ParquetWriter by extending the abstract Builder subclass inside it.
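The reason the abstract Builder declares a `self()` method is the self-typed ("curiously recurring") builder pattern: setters inherited from the base class can return the concrete subclass type, so fluent chaining keeps working. A minimal, self-contained sketch of that pattern (all names here are illustrative, not Parquet's actual API):

```java
// Self-typed builder sketch: SELF is the concrete subclass type,
// so inherited setters can return it and chaining is preserved.
abstract class AbstractBuilder<T, SELF extends AbstractBuilder<T, SELF>> {
    protected String path;

    protected AbstractBuilder(String path) { this.path = path; }

    // Subclasses return "this"; this is what Parquet's Builder.self() does too.
    protected abstract SELF self();
    protected abstract T build();

    // An inherited setter that still returns the subclass type.
    public SELF withPath(String path) {
        this.path = path;
        return self();
    }
}

class ExampleBuilder extends AbstractBuilder<String, ExampleBuilder> {
    private String schema = "default";

    ExampleBuilder(String path) { super(path); }

    ExampleBuilder withSchema(String schema) { this.schema = schema; return this; }

    @Override protected ExampleBuilder self() { return this; }
    @Override protected String build() { return path + ":" + schema; }
}

class BuilderDemo {
    public static void main(String[] args) {
        // withPath() is inherited yet returns ExampleBuilder,
        // so withSchema() can still be chained after it.
        String result = new ExampleBuilder("/tmp/out")
                .withPath("/tmp/out2")
                .withSchema("mySchema")
                .build();
        System.out.println(result);  // /tmp/out2:mySchema
    }
}
```

Without the `SELF` type parameter and `self()`, calling an inherited setter would return the base type and break the fluent chain, which is why Parquet's `ExampleParquetWriter.Builder` below overrides `self()` to return `this`.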
Here is an example from Parquet's own ExampleParquetWriter:
public static class Builder extends ParquetWriter.Builder&lt;Group, Builder&gt; {
    private MessageType type = null;
    private Map&lt;String, String&gt; extraMetaData = new HashMap&lt;String, String&gt;();

    private Builder(Path file) {
        super(file);
    }

    public Builder withType(MessageType type) {
        this.type = type;
        return this;
    }

    public Builder withExtraMetaData(Map&lt;String, String&gt; extraMetaData) {
        this.extraMetaData = extraMetaData;
        return this;
    }

    @Override
    protected Builder self() {
        return this;
    }

    @Override
    protected WriteSupport&lt;Group&gt; getWriteSupport(Configuration conf) {
        return new GroupWriteSupport(type, extraMetaData);
    }
}

If you don't want to use Group and GroupWriteSupport (bundled in Parquet, but meant only as an example of a data-model implementation), you can go with the Avro, Protocol Buffers, or Thrift in-memory data models. Here is an example of writing Parquet using Avro:
try (ParquetWriter&lt;GenericData.Record&gt; writer = AvroParquetWriter
        .&lt;GenericData.Record&gt;builder(fileToWrite)
        .withSchema(schema)
        .withConf(new Configuration())
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build()) {
    for (GenericData.Record record : recordsToWrite) {
        writer.write(record);
    }
}

You will need the following dependencies:
&lt;dependency&gt;
    &lt;groupId&gt;org.apache.parquet&lt;/groupId&gt;
    &lt;artifactId&gt;parquet-avro&lt;/artifactId&gt;
    &lt;version&gt;1.8.1&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
    &lt;groupId&gt;org.apache.parquet&lt;/groupId&gt;
    &lt;artifactId&gt;parquet-hadoop&lt;/artifactId&gt;
    &lt;version&gt;1.8.1&lt;/version&gt;
&lt;/dependency&gt;
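If your build uses Gradle instead of Maven, the equivalent dependency declarations (same coordinates, just translated to Gradle's notation) would look roughly like this:

```groovy
// Gradle equivalent of the Maven dependencies above
dependencies {
    compile 'org.apache.parquet:parquet-avro:1.8.1'
    compile 'org.apache.parquet:parquet-hadoop:1.8.1'
}
```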
The full example is here.



