
Error when parsing a JSON file with the jsonlite package


Another update

You can use the ndjson package to process ndjson / streaming JSON data. It is faster than jsonlite::stream_in() and always produces a completely "flat" data frame:

system.time(bitly01 <- ndjson::stream_in("usagov_bitly_data2013-05-17-1368832207.gz"))
##    user  system elapsed
##   0.146   0.004   0.154

system.time(bitly02 <- jsonlite::stream_in(file("usagov_bitly_data2013-05-17-1368832207.gz"), verbose=FALSE, pagesize=10000))
##    user  system elapsed
##   0.419   0.008   0.427

If we examine the two resulting data frames, you'll see that ndjson expands ll into ll.0 and ll.1, whereas jsonlite gives you a list column that you have to deal with later.
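Dealing with that list column later might look like the following. This is a minimal base R sketch on a toy data frame, not the real bitly data; the helper name pick and the object bitly_toy are hypothetical, and it assumes every non-missing ll entry is a length-2 numeric vector.

```r
# Toy stand-in for the jsonlite result: `ll` is a list column holding
# either a length-2 numeric vector or NULL (as in the glimpse output).
bitly_toy <- data.frame(c = c("US", NA, "AU"), stringsAsFactors = FALSE)
bitly_toy$ll <- list(c(33.8161, -117.9794), NULL, c(-33.8615, 151.2055))

# Hypothetical helper: take element `i` of each entry, NA when it is missing.
pick <- function(x, i) {
  vapply(
    x,
    function(v) if (length(v) >= i) as.numeric(v[[i]]) else NA_real_,
    numeric(1)
  )
}

# Mirror the ndjson column names, then drop the original list column.
bitly_toy$ll.0 <- pick(bitly_toy$ll, 1)
bitly_toy$ll.1 <- pick(bitly_toy$ll, 2)
bitly_toy$ll <- NULL

str(bitly_toy)
```

The NULL entries become NA in both new columns, which matches how ndjson reports the missing coordinates.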

ndjson

dplyr::glimpse(bitly01)
## Observations: 3,959
## Variables: 19
## $ a           <chr> "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; HTC_PN071 Build/JZO54K) AppleWebKit/534.30 ...
## $ al          <chr> "en-US", "en-us", "en-US,en;q=0.5", "en-US", "en", "en-US", "en-US,en;q=0.5", "en-us", "e...
## $ c           <chr> "US", NA, "US", "US", NA, "US", "US", NA, "AU", NA, "US", "US", "US", "US", "US", "US", "...
## $ cy          <chr> "Anaheim", NA, "Fort Huachuca", "Houston", NA, "Mishawaka", "Hammond", NA, "Sydney", NA, ...
## $ g           <chr> "15r91", "ifIpBW", "10DaxOu", "TysVFU", "10IGW7m", "13GrCeP", "YmtpnZ", "13oM0hV", "15r91...
## $ gr          <chr> "CA", NA, "AZ", "TX", NA, "IN", "WI", NA, "02", NA, "OH", "MD", "KY", "OR", "IL", "TX", "...
## $ h           <chr> "10OBm3W", "ifIpBW", "10DaxOt", "TChsoQ", "10IGW7l", "13GrCeP", "YmtpnZ", "15PUeH0", "10O...
## $ hc          <dbl> 1365701422, 1302189369, 1368814585, 1354719206, 1368738258, 1368130510, 1363711958, 13687...
## $ hh          <chr> "j.mp", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "go...
## $ l           <chr> "pontifier", "bitly", "jaxstrong", "o_5004fs3lvd", "peacecorps", "bitly", "bitly", "nasat...
## $ ll.0        <dbl> 33.8161, NA, 31.5273, 29.7633, NA, 41.6123, 45.0070, NA, -33.8615, NA, 39.5151, 39.1317, ...
## $ ll.1        <dbl> -117.9794, NA, -110.3607, -95.3633, NA, -86.1381, -92.4591, NA, 151.2055, NA, -84.3983, -...
## $ nk          <dbl> 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ r           <chr> "direct", "http://www.usa.gov/", "http://www.facebook.com/l.php?u=http%3A%2F%2F1.usa.gov%...
## $ t           <dbl> 1368832205, 1368832207, 1368832209, 1368832209, 1368832208, 1368832209, 1368832210, 13688...
## $ tz          <chr> "America/Los_Angeles", "", "America/Phoenix", "America/Chicago", "", "America/Indianapoli...
## $ u           <chr> "http://www.nsa.gov/", "http://answers.usa.gov/system/selfservice.controller?CONFIGURATIO...
## $ _heartbeat_ <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ kw          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

jsonlite

dplyr::glimpse(bitly02)
## Observations: 3,959
## Variables: 18
## $ a           <chr> "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; HTC_PN071 Build/JZO54K) AppleWebKit/534.30 ...
## $ c           <chr> "US", NA, "US", "US", NA, "US", "US", NA, "AU", NA, "US", "US", "US", "US", "US", "US", "...
## $ nk          <int> 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ tz          <chr> "America/Los_Angeles", "", "America/Phoenix", "America/Chicago", "", "America/Indianapoli...
## $ gr          <chr> "CA", NA, "AZ", "TX", NA, "IN", "WI", NA, "02", NA, "OH", "MD", "KY", "OR", "IL", "TX", "...
## $ g           <chr> "15r91", "ifIpBW", "10DaxOu", "TysVFU", "10IGW7m", "13GrCeP", "YmtpnZ", "13oM0hV", "15r91...
## $ h           <chr> "10OBm3W", "ifIpBW", "10DaxOt", "TChsoQ", "10IGW7l", "13GrCeP", "YmtpnZ", "15PUeH0", "10O...
## $ l           <chr> "pontifier", "bitly", "jaxstrong", "o_5004fs3lvd", "peacecorps", "bitly", "bitly", "nasat...
## $ al          <chr> "en-US", "en-us", "en-US,en;q=0.5", "en-US", "en", "en-US", "en-US,en;q=0.5", "en-us", "e...
## $ hh          <chr> "j.mp", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "1.usa.gov", "go...
## $ r           <chr> "direct", "http://www.usa.gov/", "http://www.facebook.com/l.php?u=http%3A%2F%2F1.usa.gov%...
## $ u           <chr> "http://www.nsa.gov/", "http://answers.usa.gov/system/selfservice.controller?CONFIGURATIO...
## $ t           <int> 1368832205, 1368832207, 1368832209, 1368832209, 1368832208, 1368832209, 1368832210, 13688...
## $ hc          <int> 1365701422, 1302189369, 1368814585, 1354719206, 1368738258, 1368130510, 1363711958, 13687...
## $ cy          <chr> "Anaheim", NA, "Fort Huachuca", "Houston", NA, "Mishawaka", "Hammond", NA, "Sydney", NA, ...
## $ ll          <list> [<33.8161, -117.9794>, NULL, <31.5273, -110.3607>, <29.7633, -95.3633>, NULL, <41.6123, ...
## $ _heartbeat_ <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ kw          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

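Update

The latest version of the jsonlite package supports streaming JSON (which is really what this file is). You can now read it in one line, like this: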

json_file <- stream_in(file("usagov_bitly_data2013-05-17-1368832207"))

See also Jeroen's answer below for parsing the stream directly over http.


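Old answer

It turns out this is a "pseudo-JSON" file. I run into these in many of the naive API systems I work with. Each line is valid JSON, but the individual objects are not inside a JSON array. You need to read the file with readLines, build your own valid JSON array from the lines, and pass that to fromJSON: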

library(jsonlite)

# read in individual JSON lines
json_file <- "usagov_bitly_data2013-05-17-1368832207"

# turn it into a proper array by separating each object with a "," and
# wrapping that up in an array with "[]"'s.
dat <- fromJSON(sprintf("[%s]", paste(readLines(json_file), collapse=",")))

dim(dat)
## [1] 3959   18

str(dat)
## 'data.frame': 3959 obs. of  18 variables:
##  $ a          : chr  "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; HTC_PN071 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile "| __truncated__ "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.4"| __truncated__ "Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20100101 Firefox/21.0" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; SGH-T889 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile S"| __truncated__ ...
##  $ c          : chr  "US" NA "US" "US" ...
##  $ nk         : int  0 0 1 1 0 0 1 0 0 0 ...
##  $ tz         : chr  "America/Los_Angeles" "" "America/Phoenix" "America/Chicago" ...
##  $ gr         : chr  "CA" NA "AZ" "TX" ...
##  $ g          : chr  "15r91" "ifIpBW" "10DaxOu" "TysVFU" ...
##  $ h          : chr  "10OBm3W" "ifIpBW" "10DaxOt" "TChsoQ" ...
##  $ l          : chr  "pontifier" "bitly" "jaxstrong" "o_5004fs3lvd" ...
##  $ al         : chr  "en-US" "en-us" "en-US,en;q=0.5" "en-US" ...
##  $ hh         : chr  "j.mp" "1.usa.gov" "1.usa.gov" "1.usa.gov" ...
## ... (goes on for a while, many columns)

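I combined the readLines call with the paste / sprintf calls because the object.size of the resulting (temporary) object is 2,025,656 bytes (~2MB), and I didn't want to have to rm a separate temporary variable.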
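To make the array-building step concrete, here is a small sketch on in-memory data. The two records and the names json_lines and json_array are made up for illustration. The blank-line filter is an added precaution, not part of the answer above: an empty line would otherwise leave a stray "," in the array. The parse step only runs if jsonlite is installed.

```r
# Hypothetical ndjson input, standing in for what readLines() would return.
# The empty string simulates a blank line in the file.
json_lines <- c('{"c":"US","nk":0}', '', '{"c":"AU","nk":1}')

# Drop blank lines so they do not produce ",," in the joined array.
json_lines <- json_lines[nzchar(json_lines)]

# Join the objects with "," and wrap them in "[...]" to form one JSON array.
json_array <- sprintf("[%s]", paste(json_lines, collapse = ","))
cat(json_array, "\n")

# The string building above is base R; parsing needs jsonlite.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  dat <- jsonlite::fromJSON(json_array)
  str(dat)
}
```

Building the string inline, as the answer does, avoids keeping the intermediate character vector around. Splitting the steps as here is easier to debug on small inputs.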


