
Python data preprocessing: writing related data to a CSV file in preparation for plotting

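The target format is a table with one row per pair of related authors and the columns id, name, follow_id and follow_name.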

Test code:


from pandas import DataFrame

# Hard-coded sample rows for testing; the real data is read from
# clean_data.txt in the scripts below
authors = []

# Each dict is one output row: an author and one related author (follow_*)
author_name = {}
author_name['id'] = 123456
author_name['name'] = 'xiaotang'
author_name['follow_id'] = 1234567
author_name['follow_name'] = 'xiaotangtang'
authors.append(author_name)

author_name = {}
author_name['id'] = 12345678
author_name['name'] = 'xiaotang123'
author_name['follow_id'] = 123456789
author_name['follow_name'] = 'xiaotangtang456'
authors.append(author_name)

print(authors)
data_frame = DataFrame(data=authors)
data_frame.to_csv("a.csv")
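The target format can be checked without opening a file by turning the sample rows into CSV text in memory. This is a minimal sketch. It assumes the column order id, name, follow_id, follow_name, which matches the keys used above. COLUMNS and csv_text are illustrative names. Passing index=False is a choice the original code does not make.

```python
from pandas import DataFrame

# Column order of the target CSV, taken from the keys used in the test code
COLUMNS = ['id', 'name', 'follow_id', 'follow_name']

rows = [
    {'id': 123456, 'name': 'xiaotang',
     'follow_id': 1234567, 'follow_name': 'xiaotangtang'},
    {'id': 12345678, 'name': 'xiaotang123',
     'follow_id': 123456789, 'follow_name': 'xiaotangtang456'},
]

data_frame = DataFrame(rows, columns=COLUMNS)

# With no path argument, to_csv returns the CSV text instead of writing a file;
# index=False drops the unnamed row-number column pandas writes by default
csv_text = data_frame.to_csv(index=False)
print(csv_text)
```

With index=False, the first line is the header id,name,follow_id,follow_name. Without it, pandas prepends an unnamed column of row numbers, which is what the original to_csv call writes to a.csv.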

Extracting author pairs from the real dataset:


from pandas import DataFrame
import json

# clean_data.txt holds a JSON list of papers, each with an 'authors' list
with open('clean_data.txt', encoding='UTF-8') as file:
    dic = json.load(file)

authors = []   # output rows: (author, related author) pairs
empty = []     # indices of papers whose 'authors' field is null

# Outer loop: iterate over every paper (range(len(dic)), not len(dic)-1,
# so the last paper is not skipped)
for leng_dic in range(len(dic)):
    author_dic = dic[leng_dic].get('authors', [])
    if author_dic is None:
        empty.append(leng_dic)
    else:
        author_name_now = []
        author_id_now = []

        # Collect the id and name of every author of this paper; a missing
        # field is recorded as None in its own list so both lists stay aligned
        for i in range(len(author_dic)):
            try:
                author_id_now.append(author_dic[i]['_id'])
            except (KeyError, TypeError):
                author_id_now.append(None)

            try:
                author_name_now.append(author_dic[i]['name'])
            except (KeyError, TypeError):
                author_name_now.append(None)

        # One row for every ordered pair of distinct authors of the paper
        for j in range(len(author_name_now)):
            for t in range(len(author_name_now)):
                if j != t:
                    author_name = {}
                    author_name['id'] = author_id_now[j]
                    author_name['name'] = author_name_now[j]
                    author_name['follow_id'] = author_id_now[t]
                    author_name['follow_name'] = author_name_now[t]
                    authors.append(author_name)

data_frame = DataFrame(data=authors)
data_frame.to_csv("tu1.csv")
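The nested j/t loops with the j != t check enumerate every ordered pair of distinct authors in a paper. The same pairing can be written with itertools.permutations, which yields those pairs in the same order. This is a sketch, and author_pairs is a hypothetical helper. It assumes each author entry is a dict and fills a missing '_id' or 'name' with None.

```python
from itertools import permutations


def author_pairs(paper_authors):
    """Return one row per ordered pair of distinct authors of a single paper.

    Hypothetical helper: each entry of paper_authors is assumed to be a dict;
    a missing '_id' or 'name' is filled with None.
    """
    rows = []
    # permutations(..., 2) yields (a, b) for every pair of different positions,
    # in the same order as the nested j/t loops with j != t
    for a, b in permutations(paper_authors, 2):
        rows.append({
            'id': a.get('_id'),
            'name': a.get('name'),
            'follow_id': b.get('_id'),
            'follow_name': b.get('name'),
        })
    return rows


rows = author_pairs([
    {'_id': 1, 'name': 'a'},
    {'_id': 2, 'name': 'b'},
    {'_id': 3},
])
print(len(rows))
```

A paper with n authors yields n*(n-1) rows, so each co-author relation appears twice (A to B and B to A). A paper with a single author yields no rows.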




Some author entries in the dataset lack one of the keys (_id or name), so simply drop those entries:


from pandas import DataFrame
import json

# clean_data.txt holds a JSON list of papers, each with an 'authors' list
with open('clean_data.txt', encoding='UTF-8') as file:
    dic = json.load(file)

authors = []   # output rows: (author, related author) pairs
empty = []     # indices of papers whose 'authors' field is null

# Outer loop: iterate over every paper
for leng_dic in range(len(dic)):
    author_dic = dic[leng_dic].get('authors', [])
    if author_dic is None:
        empty.append(leng_dic)
    else:
        author_name_now = []
        author_id_now = []

        # Read both keys before appending anything: an author missing either
        # key is skipped entirely, so ids and names stay aligned
        for i in range(len(author_dic)):
            try:
                author_id = author_dic[i]['_id']
                name = author_dic[i]['name']
            except (KeyError, TypeError):
                continue
            author_id_now.append(author_id)
            author_name_now.append(name)

        # One row for every ordered pair of distinct authors of the paper
        for j in range(len(author_name_now)):
            for t in range(len(author_name_now)):
                if j != t:
                    author_name = {}
                    author_name['id'] = author_id_now[j]
                    author_name['name'] = author_name_now[j]
                    author_name['follow_id'] = author_id_now[t]
                    author_name['follow_name'] = author_name_now[t]
                    authors.append(author_name)

data_frame = DataFrame(data=authors)
data_frame.to_csv("tu1.csv")
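The drop-missing-keys step can also be packaged as a function that takes the already parsed JSON, which makes it easy to try on a few hand-written records. This is a minimal sketch. It assumes clean_data.txt is a JSON list of paper dicts whose 'authors' value is a list of dicts (or missing/null). pairs_dataframe is a hypothetical name.

```python
from itertools import permutations

from pandas import DataFrame

COLUMNS = ['id', 'name', 'follow_id', 'follow_name']


def pairs_dataframe(papers):
    """Build the author-pair table, dropping authors that lack '_id' or 'name'.

    Hypothetical helper: papers is assumed to be the parsed JSON list,
    where each paper's 'authors' value is a list of dicts, None, or absent.
    """
    rows = []
    for paper in papers:
        # Papers with no 'authors' key or a null value contribute no rows
        paper_authors = paper.get('authors') or []
        # Keep only authors that have both keys, so ids and names stay aligned
        complete = [a for a in paper_authors
                    if isinstance(a, dict) and '_id' in a and 'name' in a]
        # One row per ordered pair of distinct complete authors
        for a, b in permutations(complete, 2):
            rows.append({'id': a['_id'], 'name': a['name'],
                         'follow_id': b['_id'], 'follow_name': b['name']})
    # Explicit columns keep the header even when no rows are produced
    return DataFrame(rows, columns=COLUMNS)
```

With the real file, this would be called as pairs_dataframe(json.load(f)) inside a with open('clean_data.txt', encoding='UTF-8') as f: block, followed by .to_csv("tu1.csv"). Filtering the authors before pairing, rather than appending inside a try block, keeps every id next to its own name even when only one of the two keys is missing.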




Source: https://www.mshxw.com/it/529944.html