You can easily generate normally distributed data with R by following these steps:

    # Read the data into a data frame
    library(data.table)
    data = fread("data.csv", sep = ",", select = c("latitude", "longitude"))

    # Remove duplicate and null values
    df = data.frame("Lat" = data$latitude, "Lon" = data$longitude)
    df1 = unique(df[1:2])
    df2 <- na.omit(df1)

    # Determine the mean and standard deviation of latitude and longitude values
    meanLat = mean(df2$Lat)
    meanLon = mean(df2$Lon)
    sdLat = sd(df2$Lat)
    sdLon = sd(df2$Lon)

    # Use the normal distribution to generate 1 million new records.
    # The sum of 12 uniform draws minus 6 approximates a standard normal value,
    # which is then scaled by the standard deviation and shifted by the mean.
    newData = data.frame(
      Lat = sapply(rep(0, 1000000), function(x) (sum(runif(12)) - 6) * sdLat + meanLat),
      Lon = sapply(rep(0, 1000000), function(x) (sum(runif(12)) - 6) * sdLon + meanLon)
    )
    finalData = rbind(df2, newData)

finalData now contains both the old records and the new records. Write the finalData data frame to a CSV file, and you can then read it from Scala or Python.
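
As a rough sketch of the generation and CSV-writing steps, the following uses base R's rnorm() to draw normal values directly and write.csv() to save them. The summary statistics, the small sample size, and the file name finalData.csv are assumptions chosen for illustration; in practice you would use the values computed from your own data.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical summary statistics standing in for meanLat, sdLat, meanLon, sdLon
meanLat <- 40.7
sdLat <- 0.1
meanLon <- -74.0
sdLon <- 0.1

# rnorm() draws from a normal distribution and is vectorized over n
n <- 1000  # small n for illustration
newData <- data.frame(
  Lat = rnorm(n, mean = meanLat, sd = sdLat),
  Lon = rnorm(n, mean = meanLon, sd = sdLon)
)

# Write without row names so the file holds only the Lat and Lon columns
outFile <- file.path(tempdir(), "finalData.csv")  # assumed file name
write.csv(newData, outFile, row.names = FALSE)

# Read the file back to confirm the round trip
check <- read.csv(outFile)
```

The resulting file is a plain CSV with a header row, so it should load in Python with pandas.read_csv or in any Scala CSV reader that honors headers.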



