You should use the s3fs module, as yjk21 suggested. However, calling ParquetDataset gives you a pyarrow.parquet.ParquetDataset object, not a DataFrame. To get a pandas DataFrame, apply .read_pandas().to_pandas() to it:
import pyarrow.parquet as pq
import s3fs

s3 = s3fs.S3FileSystem()
pandas_dataframe = pq.ParquetDataset('s3://your-bucket/', filesystem=s3).read_pandas().to_pandas()


