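The agrep function (part of base R), which does approximate string matching using the generalized Levenshtein edit distance, is probably worth trying. Without knowing what your data looks like, I can't really suggest a working solution, but here is a suggestion. It records the matches in a separate list (if there are several equally good matches, those are recorded as well). Let's say that your data.frame is called df: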

    l <- vector('list', nrow(df))
    matches <- list(mother = l, father = l)

    for (i in seq_len(nrow(df))) {
      ## first look for an exact match
      father_id <- with(df, which(student_name[i] == father_name))
      if (length(father_id) == 1) {
        matches[['father']][[i]] <- father_id
      } else {
        ## otherwise try to find the closest approximate match
        old_father_id <- NULL
        for (m in 10:1) {  ## m is the maximum distance
          father_id <- with(df, agrep(student_name[i], father_name, max.distance = m))
          if (length(father_id) == 1 || (m == 1 && length(father_id) > 0)) {
            ## a unique match, or the last round with candidates left: record and stop
            matches[['father']][[i]] <- father_id
            break
          } else if (length(father_id) == 0 && length(old_father_id) > 0) {
            ## if we can't do better than multiple matches, then record them anyway
            matches[['father']][[i]] <- old_father_id
            break
          } else if (length(father_id) == 0 && length(old_father_id) == 0) {
            ## if the nearest match is more than 10 different from the current pattern, then stop
            break
          }
          ## remember this round's candidates before tightening the distance
          old_father_id <- father_id
        }
      }
    }

The code for mother_name would be basically the same. You could even combine the two into a single loop, but this example is just for the purpose of illustration.
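
If you do want to handle both columns in one go, one possible way is to wrap the search in a function and apply it to each column. This is only a sketch: the helper name best_match, its max_start argument, and the toy data are all made up for illustration, and it assumes the name columns are character vectors.

```r
# Hypothetical helper (not part of base R): returns the indices of the
# entries in `candidates` that are closest to `pattern`.
best_match <- function(pattern, candidates, max_start = 10) {
  # An unambiguous exact match wins immediately.
  exact <- which(candidates == pattern)
  if (length(exact) == 1) return(exact)

  previous <- integer(0)
  for (m in max_start:1) {  # m is the maximum edit distance
    current <- agrep(pattern, candidates, max.distance = m)
    if (length(current) == 1) return(current)   # unique match
    if (length(current) == 0) return(previous)  # tightened too far, keep last set
    previous <- current
  }
  previous  # still ambiguous at distance 1: return all remaining candidates
}

# Toy data, invented purely for illustration.
df <- data.frame(
  student_name = c("John Smith", "Maria Garcia"),
  father_name  = c("Jon Smith", "Pedro Alvarez"),
  mother_name  = c("Ana Lopez", "Maria Garcia"),
  stringsAsFactors = FALSE
)

# One list of matches per parent column, one element per student.
matches <- lapply(
  df[c("father_name", "mother_name")],
  function(col) lapply(df$student_name, best_match, candidates = col)
)
names(matches) <- c("father", "mother")
```

Here matches$mother[[2]] is 2, because the second student name agrees exactly with the second mother_name. An element of length zero means nothing was found within max_start edits, and an element with several indices means the candidates could not be separated.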



