
Generic pivot table with deep sorting in Google BigQuery



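Q. What if we run the aggregation and get 10 million result rows? Unless we apply limits and the like in BigQuery, a very large amount of data would have to be transferred.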


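Let's clarify the challenge here:

Typically, you would run something like the following on the back end and load the result into the visualization tool (front end) for further manipulation such as sorting, limiting, and pivoting.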

    #standardSQL
    SELECT
      Studio,
      Title,
      TerritoryID,
      Type,
      SUM(Price) AS Price,
      COUNT(1) AS Volume
    FROM yourTable
    GROUP BY Studio, Title, TerritoryID, Type

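As you mentioned, the result in this case can easily run to 10 million rows or more, and you want to reduce its size without losing the ability to render the final data in the front-end pivot/visualization.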
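To see why the row count grows so quickly, note that a GROUP BY over several dimensions can return at most one row per distinct combination of values. The sketch below, in Python rather than BigQuery SQL, computes that upper bound. The cardinalities and the helper name `max_group_count` are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the table in the question.

```python
from math import prod


def max_group_count(cardinalities, table_rows):
    """Upper bound on the rows returned by GROUP BY over the given dimensions.

    There is at most one group per distinct combination of values, and never
    more groups than there are rows in the table.
    """
    return min(prod(cardinalities.values()), table_rows)


# Hypothetical distinct-value counts per dimension (illustration only).
dims = {"Studio": 50, "Title": 20_000, "TerritoryID": 100, "Type": 2}

print(max_group_count(dims, table_rows=1_000_000_000))  # 200000000
```

Real data is usually far sparser than this bound, but the product shows how a handful of dimensions is enough to reach tens of millions of groups.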


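Recommendation / solution

The following shows how to do this by applying sorting and limits on the back end (which greatly reduces the result size) without losing the ability to pivot and still show totals, and so on.

Let's start with a simplified version of the final query.

  • Initial query (skeleton)

Assume that, based on known criteria, we know in advance which studios, titles, territories, and types should be selected.
In that case, the query below returns the required data.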

    #standardSQL
    WITH Studios AS (
      -- Hard-coded selections for the skeleton (Studios is not joined below)
      SELECT 'Fox' AS Studio
      UNION ALL SELECT 'Paramount'
    ),
    Titles AS (
      SELECT 'Fox' AS Studio, 'Best Laid Plans' AS Title
      UNION ALL SELECT 'Fox', 'Homecoming'
      UNION ALL SELECT 'Paramount', 'Titanic'
      UNION ALL SELECT 'Paramount', 'Homecoming'
    ),
    Territories AS (
      SELECT 'US' AS TerritoryID
      UNION ALL SELECT 'GB'
    ),
    Totals AS (
      -- Rows matching no selected title or territory are labeled 'Other',
      -- so they still count toward the totals
      SELECT
        IFNULL(b.Studio, 'Other') AS Studio,
        IFNULL(b.Title, 'Other') AS Title,
        IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
        Type,
        ROUND(SUM(Price), 2) AS Price,
        COUNT(1) AS Volume
      FROM yourTable AS a
      LEFT JOIN Titles AS b ON a.Studio = b.Studio AND a.Title = b.Title
      LEFT JOIN Territories AS c ON a.TerritoryID = c.TerritoryID
      GROUP BY Studio, Title, TerritoryID, Type
    )
    SELECT * FROM Totals
    ORDER BY Studio, Title, TerritoryID, Type

The output will look like this:

    Studio      Title            TerritoryID  Type         Price    Volume
    Fox         Best Laid Plans  GB           Movie          87.32      18
    Fox         Best Laid Plans  GB           TV Episode     50.17      23
    Fox         Best Laid Plans  Other        TV Episode   1131.0        2
    Fox         Best Laid Plans  US           Movie         120.82      18
    Fox         Best Laid Plans  US           TV Episode     53.76      24
    Fox         Homecoming       GB           TV Episode     60.22      28
    Fox         Homecoming       Other        TV Episode   2262.0        4
    Fox         Homecoming       US           TV Episode    128.45      58
    Other       Other            GB           Movie         142.71      29
    Other       Other            GB           TV Episode     84.8       40
    Other       Other            Other        Movie        3292.0        4
    Other       Other            Other        TV Episode   3282.0       16
    Other       Other            US           Movie          52.92       8
    Other       Other            US           TV Episode    233.05     101
    Paramount   Homecoming       GB           Movie          18.96       4
    Paramount   Homecoming       US           Movie         124.84      16
    Paramount   Titanic          GB           Movie          41.92       8
    Paramount   Titanic          Other        Movie          12.0        4
    Paramount   Titanic          US           Movie         139.84      16

You can easily feed this back to the UI and visualize it in whatever way is needed.
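The skeleton relies on one idea: values that are not in the selected lists are folded into an 'Other' bucket instead of being dropped, so the grand totals survive the size reduction. The following is a minimal sketch of that bucketing in plain Python, not in BigQuery SQL. The function name `bucket_totals` and the sample rows are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def bucket_totals(rows, keep_titles, keep_territories):
    """Aggregate (studio, title, territory, type, price) rows.

    A (studio, title) pair outside keep_titles becomes ('Other', 'Other') and a
    territory outside keep_territories becomes 'Other', which mirrors the
    LEFT JOIN plus IFNULL pattern of the skeleton query.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [price sum, volume]
    for studio, title, territory, kind, price in rows:
        if (studio, title) not in keep_titles:
            studio, title = "Other", "Other"
        if territory not in keep_territories:
            territory = "Other"
        cell = totals[(studio, title, territory, kind)]
        cell[0] += price
        cell[1] += 1
    return {key: (round(p, 2), v) for key, (p, v) in sorted(totals.items())}


# Hypothetical sample rows (illustration only).
rows = [
    ("Fox", "Homecoming", "US", "TV Episode", 2.0),
    ("Fox", "Homecoming", "US", "TV Episode", 3.0),
    ("Fox", "Homecoming", "FR", "TV Episode", 4.0),
    ("Sony", "Some Title", "GB", "Movie", 5.0),
]
result = bucket_totals(rows, {("Fox", "Homecoming")}, {"US", "GB"})

for key, (price, volume) in result.items():
    print(key, price, volume)

# The grand total is unchanged by the bucketing.
print(sum(price for price, _ in result.values()))  # 14.0
```

Because nothing is discarded, the front end can still show correct column and grand totals even though it only receives the selected values plus the 'Other' rows.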

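  • "Final" query

Now, instead of using hard-coded values for the dimensions involved, let's implement actual criteria for each dimension.
The only change in the query below (relative to the skeleton query) is in the following CTEs: Studios, Titles, and Territories.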

    #standardSQL
    WITH Studios AS (
      -- First 3 studios in alphabetical order (Studios is not joined below)
      SELECT DISTINCT Studio
      FROM yourTable
      ORDER BY Studio LIMIT 3
    ),
    Titles AS (
      -- Top 4 titles per studio by total price
      SELECT Studio, Title
      FROM (
        SELECT Studio, Title,
          ROW_NUMBER() OVER(PARTITION BY Studio ORDER BY Price DESC) AS pos
        FROM (
          SELECT Studio, Title, SUM(Price) AS Price
          FROM yourTable
          GROUP BY Studio, Title
        )
      )
      WHERE pos <= 4
    ),
    Territories AS (
      -- Top 2 territories by row count for Paramount
      SELECT TerritoryID
      FROM yourTable
      WHERE Studio = 'Paramount'
      GROUP BY TerritoryID
      ORDER BY COUNT(1) DESC LIMIT 2
    ),
    Totals AS (
      SELECT
        IFNULL(b.Studio, 'Other') AS Studio,
        IFNULL(b.Title, 'Other') AS Title,
        IFNULL(c.TerritoryID, 'Other') AS TerritoryID,
        Type,
        ROUND(SUM(Price), 2) AS Price,
        COUNT(1) AS Volume
      FROM yourTable AS a
      LEFT JOIN Titles AS b ON a.Studio = b.Studio AND a.Title = b.Title
      LEFT JOIN Territories AS c ON a.TerritoryID = c.TerritoryID
      GROUP BY Studio, Title, TerritoryID, Type
    )
    SELECT * FROM Totals
    WHERE NOT 'Other' IN (TerritoryID)
    ORDER BY Studio, TerritoryID DESC, Type, Price DESC, Title

The result is:

    Studio      Title                  TerritoryID  Type         Price   Volume
    Fox         Best Laid Plans        US           Movie        120.82      18
    Fox         Titanic                US           Movie         52.92       8
    Fox         1:00 P.M. - 2:00 P.M.  US           TV Episode   187.25      81
    Fox         Homecoming             US           TV Episode   128.45      58
    Fox         Best Laid Plans        US           TV Episode    53.76      24
    Fox         Best Laid Plans        GB           Movie         87.32      18
    Fox         Titanic                GB           Movie         78.84      16
    Fox         1:00 P.M. - 2:00 P.M.  GB           TV Episode    61.42      28
    Fox         Homecoming             GB           TV Episode    60.22      28
    Fox         Best Laid Plans        GB           TV Episode    50.17      23
    Paramount   Titanic                US           Movie        139.84      16
    Paramount   Homecoming             US           Movie        124.84      16
    Paramount   Titanic                GB           Movie         41.92       8
    Paramount   Homecoming             GB           Movie         18.96       4
    Sony        Best Laid Plans        US           TV Episode    22.9       10
    Sony        Homecoming             US           TV Episode    22.9       10
    Sony        Best Laid Plans        GB           Movie         63.87      13
    Sony        Homecoming             GB           TV Episode    18.81       9
    Sony        Best Laid Plans        GB           TV Episode     4.57       3
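The Titles CTE in the final query is a top-N-per-group selection: rank the titles within each studio by total price and keep the first four. The logic can be sketched in plain Python as follows. The function name `top_n_per_group` and the sample totals are hypothetical. One assumption differs from the SQL: ties are broken by title here to keep the output deterministic, whereas ROW_NUMBER() with only ORDER BY Price DESC leaves the order of ties unspecified.

```python
from collections import defaultdict


def top_n_per_group(totals, n):
    """Return the n highest-revenue (studio, title) pairs within each studio.

    totals maps (studio, title) to total revenue, i.e. the output of the inner
    GROUP BY Studio, Title subquery.
    """
    by_studio = defaultdict(list)
    for (studio, title), revenue in totals.items():
        by_studio[studio].append((revenue, title))

    selected = set()
    for studio, items in by_studio.items():
        # Revenue descending; ties broken by title (an assumption of this sketch).
        items.sort(key=lambda item: (-item[0], item[1]))
        selected.update((studio, title) for _, title in items[:n])
    return selected


# Hypothetical per-title totals (illustration only).
totals = {
    ("Fox", "A"): 10.0,
    ("Fox", "B"): 30.0,
    ("Fox", "C"): 20.0,
    ("Sony", "D"): 5.0,
}
print(sorted(top_n_per_group(totals, 2)))
# [('Fox', 'B'), ('Fox', 'C'), ('Sony', 'D')]
```

The selected pairs play the role of the Titles lookup: anything outside the set is what the LEFT JOIN in Totals labels as 'Other'.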

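The point here is this: although BigQuery is extremely efficient at analyzing billions of rows and extracting the information you need, it is highly ineffective to use BigQuery to tailor the result data to reflect how it will actually be rendered in the presentation layer of the client UI. Instead, you should pass that data to the UI and process it with visualization code.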
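As a rough illustration of that division of labor, the sketch below pivots the compact, long-format result on the client side. It is plain Python, and the `pivot` helper and the "Total" column label are assumptions of this sketch rather than part of any visualization library. The sample rows are the four Paramount rows from the result above.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Pivot long-format dict rows into {row: {col: sum}} with a 'Total' column.

    Assumes no real column value is literally named 'Total'.
    """
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
        table[row[row_key]]["Total"] += row[value_key]
    return {
        r: {c: round(v, 2) for c, v in cols.items()}
        for r, cols in table.items()
    }


# The Paramount rows of the final query result.
rows = [
    {"Studio": "Paramount", "Title": "Titanic", "TerritoryID": "US", "Price": 139.84},
    {"Studio": "Paramount", "Title": "Homecoming", "TerritoryID": "US", "Price": 124.84},
    {"Studio": "Paramount", "Title": "Titanic", "TerritoryID": "GB", "Price": 41.92},
    {"Studio": "Paramount", "Title": "Homecoming", "TerritoryID": "GB", "Price": 18.96},
]

by_territory = pivot(rows, "Studio", "TerritoryID", "Price")
print(by_territory["Paramount"]["US"])     # 264.68
print(by_territory["Paramount"]["GB"])     # 60.88
print(by_territory["Paramount"]["Total"])  # 325.56
```

The same rows can be pivoted by Title or by any other dimension without another round trip to BigQuery, which is the flexibility the back-end query is meant to preserve.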


