You can call apply on the column and pass it the function you want executed for every row, like this:
    In [9]:
    from geopy.geocoders import Nominatim

    # Note: newer geopy releases require a user_agent argument here.
    geolocator = Nominatim()
    df['city_coord'] = df['state_name'].apply(geolocator.geocode)
    df

    Out[9]:
        city_name state_name       county_name  \
    0  WASHINGTON         DC  DIST OF COLUMBIA
    1  WASHINGTON         DC  DIST OF COLUMBIA

                                              city_coord
    0  (District of Columbia, United States of Americ...
    1  (District of Columbia, United States of Americ...
You can then access the latitude and longitude attributes:
    In [16]:
    df['city_coord'] = df['city_coord'].apply(lambda x: (x.latitude, x.longitude))
    df

    Out[16]:
        city_name state_name       county_name                       city_coord
    0  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)
    1  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)
Or do it as a one-liner by calling apply twice:
    In [17]:
    df['city_coord'] = df['state_name'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
    df

    Out[17]:
        city_name state_name       county_name                       city_coord
    0  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)
    1  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)
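One caveat with the lambda used above: geocode returns None when it finds no match, so accessing .latitude on such a row would raise AttributeError. Below is a minimal sketch of a None-safe extractor. The names to_coords and FakeLocation are hypothetical, introduced here only so the example runs without network access; FakeLocation merely mimics the latitude and longitude attributes of a geopy result.

```python
from collections import namedtuple

# Hypothetical stand-in for a geopy Location (has .latitude and .longitude).
FakeLocation = namedtuple("FakeLocation", ["address", "latitude", "longitude"])


def to_coords(location):
    """Return (latitude, longitude), or None if geocoding found no match."""
    if location is None:
        return None
    return (location.latitude, location.longitude)


# One successful lookup and one miss, as geocode might return them.
results = [
    FakeLocation("District of Columbia", 38.8937154, -76.9877934586326),
    None,
]
coords = [to_coords(loc) for loc in results]
print(coords)
```

With the DataFrame from this answer, the same helper could replace the lambda: df['state_name'].apply(geolocator.geocode).apply(to_coords).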
Also, your attempt geolocator.geocode(lambda row: 'state_name'(row)) did nothing useful, which is why you ended up with a column full of None values.
EDIT
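@leb makes an interesting point here: if you have many duplicate values, it is more performant to geocode each unique value once and then map the results back onto the DataFrame: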
    In [38]:
    states = df['state_name'].unique()
    d = dict(zip(states, pd.Series(states).apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))))
    d

    Out[38]:
    {'DC': (38.8937154, -76.9877934586326)}

    In [40]:
    df['city_coord'] = df['state_name'].map(d)
    df

    Out[40]:
        city_name state_name       county_name                       city_coord
    0  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)
    1  WASHINGTON         DC  DIST OF COLUMBIA  (38.8937154, -76.9877934586326)

So the above gets all the unique values using unique, constructs a dict from them, and then calls map to perform the lookup and add the coordinates. This is more efficient than geocoding row by row, because each distinct value is sent to the geocoder only once.
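To make the saving concrete, here is a rough sketch of the same unique-then-lookup idea in plain Python, with a fake geocoder that records how often it is called. Everything named here (build_lookup, fake_geocode, FakeLocation, and the 'XX' value) is hypothetical and stands in for geolocator.geocode and real data, so the example runs offline.

```python
from collections import namedtuple

# Hypothetical stand-in for a geopy Location.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])

calls = []  # records every query sent to the fake geocoder


def fake_geocode(query):
    """Pretend geocoder: knows 'DC' only, returns None for anything else."""
    calls.append(query)
    known = {"DC": FakeLocation(38.8937154, -76.9877934586326)}
    return known.get(query)


def build_lookup(values, geocode):
    """Geocode each distinct value once; return {value: (lat, lon) or None}."""
    lookup = {}
    for value in dict.fromkeys(values):  # de-duplicates, keeps first-seen order
        location = geocode(value)
        if location is None:
            lookup[value] = None
        else:
            lookup[value] = (location.latitude, location.longitude)
    return lookup


state_names = ["DC", "DC", "XX", "DC"]
lookup = build_lookup(state_names, fake_geocode)
city_coord = [lookup[name] for name in state_names]

print(len(state_names), "rows,", len(calls), "geocoder calls")
```

With pandas, the last step corresponds to df['state_name'].map(lookup); the number of geocoder requests then depends on the number of distinct values rather than the number of rows.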



