
Converting a Pandas DataFrame to JSON with multiple nesting levels


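I am working with a DataFrame (imported from a .csv) that I want to convert to nested JSON, but I cannot create the additional nesting level. I will try to explain with an example. At the end of the job, the result is imported into MongoDB via pymongo.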


The example data looks like this:

worker_id | gender | employer_id | year | job_type
----------+--------+-------------+------+---------
WORK_1    | M      | EMPL_2      | 1990 | Att
WORK_1    | M      | EMPL_1      | 1991 | Mis
WORK_1    | M      | EMPL_1      | 1993 | Att
WORK_2    | F      | EMPL_3      | 1995 | Att
WORK_2    | F      | EMPL_3      | 1992 | Mis
WORK_2    | F      | EMPL_3      | 1994 | Att

import pandas as pd

# Same six rows as the table above (every column must have the same length)
df = pd.DataFrame({
    'worker_id': ['WORK_1', 'WORK_1', 'WORK_1', 'WORK_2', 'WORK_2', 'WORK_2'],
    'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
    'employer_id': ['EMPL_2', 'EMPL_1', 'EMPL_1', 'EMPL_3', 'EMPL_3', 'EMPL_3'],
    'year': [1990, 1991, 1993, 1995, 1992, 1994],
    'job_type': ['Att', 'Mis', 'Att', 'Att', 'Mis', 'Att'],
})

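Thanks to some useful discussions here on Stack Overflow, I was able to nest an object (of "array" type) for each specific work contract (each row of the object should represent one work contract, followed by several other variables). However, I also want to distinguish between the types of work ("Mis" and "Att" in the example) and create another nesting level for them.

The JSON I want to obtain should follow this structure: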

[
    {
        "worker_id": "WORK_1",
        "gender": "M",
        "job_type": [
            {
                "Att": [
                    {"employer_id": "EMPL_2", "year": 1990},
                    {"employer_id": "EMPL_1", "year": 1993}
                ]
            },
            {
                "Mis": [
                    {"employer_id": "EMPL_1", "year": 1991}
                ]
            }
        ]
    },
    {
        "worker_id": "WORK_2",
        "gender": "F",
        "job_type": [
            {
                "Att": [
                    {"employer_id": "EMPL_3", "year": 1994},
                    {"employer_id": "EMPL_3", "year": 1995}
                ]
            },
            {
                "Mis": [
                    {"employer_id": "EMPL_3", "year": 1992}
                ]
            }
        ]
    }
]

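The code I previously used to nest the work contracts per worker is the following:

finalList = []
finalDict = {}
grouped = df.groupby(['worker_id', 'gender'])
for key, value in grouped:
    dictionary = {}
    # Rows of the current (worker_id, gender) group, re-indexed from 0
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['worker_id'] = j.at[0, 'worker_id']
    dictionary['gender'] = j.at[0, 'gender']
    dictList = []
    anotherDict = {}
    # One entry per work contract; job_type stays a plain field here
    for i in j.index:
        anotherDict['employer_id'] = j.at[i, 'employer_id']
        anotherDict['year'] = j.at[i, 'year']
        anotherDict['job_type'] = j.at[i, 'job_type']
        dictList.append(anotherDict.copy())
    dictionary['job_type'] = dictList
    finalList.append(dictionary)

I hope someone can help me. Thank you in advance!

Update

I tried to improve my code with the script below (I followed another thread). Unfortunately, I still do not get what I want. This is what I have so far: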

import json
import pandas as pd

# Flag each row with its kind of 'job_type' (df is the DataFrame defined above)
df['att'] = ['Att' if x == 'Att' else None for x in df['job_type']]
df['mis'] = ['Mis' if x == 'Mis' else None for x in df['job_type']]

# Aggregate the rows with 'job_type' == 'Att'
# ('records' is spelled out; pandas 2.x no longer accepts the short alias 'r')
df_att = df.dropna(subset=['att']).drop(columns=['mis'])
att = (df_att.groupby(['worker_id', 'gender'])
       .apply(lambda x: x[['employer_id', 'year', 'job_type']].to_dict('records'))
       .reset_index()
       .rename(columns={0: 'Att'}))

# Aggregate the rows with 'job_type' == 'Mis'
df_mis = df.dropna(subset=['mis']).drop(columns=['att'])
mis = (df_mis.groupby(['worker_id', 'gender'])
       .apply(lambda x: x[['employer_id', 'year', 'job_type']].to_dict('records'))
       .reset_index()
       .rename(columns={0: 'Mis'}))

# Append (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
df_all = pd.concat([att, mis])

# Aggregate by 'worker_id' and 'gender'
j = (df_all.groupby(['worker_id', 'gender'])
     .apply(lambda x: x[['Att', 'Mis']].to_dict('records'))
     .reset_index()
     .rename(columns={0: 'job_type'})
     .to_json(orient='records'))
print(json.dumps(json.loads(j), indent=4, sort_keys=True))


Here is a solution that iterates over the unique worker_id values and builds a list of dictionaries for each worker_id:

import json
import pandas as pd

df = pd.DataFrame({
    'worker_id': ['WORK_1', 'WORK_1', 'WORK_1', 'WORK_2', 'WORK_2', 'WORK_2'],
    'gender': ['M', 'M', 'M', 'F', 'F', 'F'],
    'employer_id': ['EMPL_2', 'EMPL_1', 'EMPL_1', 'EMPL_3', 'EMPL_3', 'EMPL_3'],
    'year': [1990, 1991, 1993, 1995, 1992, 1994],
    'job_type': ['Att', 'Mis', 'Att', 'Att', 'Mis', 'Att'],
})

# One row per worker
df_G = df[['worker_id', 'gender']].drop_duplicates()
all_dicts = []
for indx, vals in df_G.iterrows():
    this_dict = vals.to_dict()
    # Map each job_type of this worker to its list of {employer_id, year} records
    job_dict = (df[df.worker_id == vals['worker_id']]
                .groupby('job_type')
                .apply(lambda x: x[['employer_id', 'year']].to_dict('records'))
                .to_dict())
    # Wrap every job_type in its own single-key object
    this_dict['job_type'] = []
    for key, val in job_dict.items():
        this_dict['job_type'].append({key: val})
    all_dicts.append(this_dict)
print(json.dumps(all_dicts, indent=4, sort_keys=True))

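This prints (the order of the entries inside job_type may differ depending on the Python and pandas versions):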

[
    {
        "gender": "M",
        "job_type": [
            {
                "Mis": [
                    {
                        "employer_id": "EMPL_1",
                        "year": 1991
                    }
                ]
            },
            {
                "Att": [
                    {
                        "employer_id": "EMPL_2",
                        "year": 1990
                    },
                    {
                        "employer_id": "EMPL_1",
                        "year": 1993
                    }
                ]
            }
        ],
        "worker_id": "WORK_1"
    },
    {
        "gender": "F",
        "job_type": [
            {
                "Mis": [
                    {
                        "employer_id": "EMPL_3",
                        "year": 1992
                    }
                ]
            },
            {
                "Att": [
                    {
                        "employer_id": "EMPL_3",
                        "year": 1995
                    },
                    {
                        "employer_id": "EMPL_3",
                        "year": 1994
                    }
                ]
            }
        ],
        "worker_id": "WORK_2"
    }
]

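It may not be the most efficient or Pythonic approach, but it works. If I remember pymongo correctly, you can pass it the list of dictionaries to insert (insert_many() on a collection accepts such a list).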
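As a possible alternative, the same nesting can be sketched in a single pass over the rows, without calling groupby().apply() once per worker. The helper name nest_records and the hard-coded records list are illustrative assumptions, not part of the original answer; with pandas the rows would presumably come from df.to_dict('records').

```python
import json


def nest_records(records):
    """Group flat row dicts into one document per (worker_id, gender).

    Hypothetical helper: each document gets a 'job_type' list holding one
    single-key object per job type, as in the structure asked for above.
    """
    workers = {}  # (worker_id, gender) -> {job_type: [contract, ...]}
    for row in records:
        jobs = workers.setdefault((row["worker_id"], row["gender"]), {})
        jobs.setdefault(row["job_type"], []).append(
            {"employer_id": row["employer_id"], "year": row["year"]}
        )
    return [
        {
            "worker_id": worker_id,
            "gender": gender,
            "job_type": [{job: contracts} for job, contracts in jobs.items()],
        }
        for (worker_id, gender), jobs in workers.items()
    ]


# Same rows as the example table; with pandas: records = df.to_dict("records")
records = [
    {"worker_id": "WORK_1", "gender": "M", "employer_id": "EMPL_2", "year": 1990, "job_type": "Att"},
    {"worker_id": "WORK_1", "gender": "M", "employer_id": "EMPL_1", "year": 1991, "job_type": "Mis"},
    {"worker_id": "WORK_1", "gender": "M", "employer_id": "EMPL_1", "year": 1993, "job_type": "Att"},
    {"worker_id": "WORK_2", "gender": "F", "employer_id": "EMPL_3", "year": 1995, "job_type": "Att"},
    {"worker_id": "WORK_2", "gender": "F", "employer_id": "EMPL_3", "year": 1992, "job_type": "Mis"},
    {"worker_id": "WORK_2", "gender": "F", "employer_id": "EMPL_3", "year": 1994, "job_type": "Att"},
]

docs = nest_records(records)
print(json.dumps(docs, indent=4))
```

Because Python dictionaries keep insertion order (3.7 and later), workers and job types appear in the order they are first seen in the data, so sorting the rows beforehand controls the order of the output.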

