Author: 转身-说离别2013 | Source: Internet | 2023-09-06 10:20
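Long story short: I am using Spark code in Scala IDE to convert JSON to CSV. I have no background in Spark, since I have only worked on RDBMSs such as Oracle, TD and DB2. What I was given is code that converts JSON data to CSV, plus an explanation of how to pass arguments to retrieve data from the schema.
So far, I can extract data nested inside structs and arrays with the following: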
val val1 = df.select(explode($"data.business").as("ID")).select($"ID.amountTO")
val1.repartition(1).write.format("com.databricks.spark.csv").option("header","true").save(args(2) + "\\Result" + "\\" + timeForpath + "\\val1")
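What I don't know is how to export columns that are not inside a struct and sit directly at the root of the schema, such as QAYONOutCome, QA1PartiesComments, and so on.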
root
|-- QAYONOutCome: string (nullable = true)
|-- QA1PartiesComments: string (nullable = true)
|-- QA1PartiesQID: string (nullable = true)
|-- QA1PartiesResponse: string (nullable = true)
|-- QAHolderTypeComments: string (nullable = true)
|-- QAHolderTypeQID: string (nullable = true)
|-- QAHolderTypeResponse: string (nullable = true)
|-- QAhighRiskComments: string (nullable = true)
|-- QAhighRiskQID: string (nullable = true)
|-- QAhighRiskResponse: string (nullable = true)
|-- QA2ClassComments: string (nullable = true)
|-- QA2ClassQID: string (nullable = true)
|-- QA2ClassResponse: string (nullable = true)
|-- QAoutcomeComments: string (nullable = true)
|-- QAoutcomeQID: string (nullable = true)
|-- QAoutcomeResponse: string (nullable = true)
|-- data: struct (nullable = true)
| |-- business: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- amountTO: string (nullable = true)
| | | |-- ID: string (nullable = true)
| | | |-- Registration: struct (nullable = true)
| | | | |-- country: string (nullable = true)
| | | | |-- id: long (nullable = true)
| | | | |-- line1: string (nullable = true)
| | | | |-- line2: string (nullable = true)
| | | | |-- postCode: string (nullable = true)
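Thanks for your help, and sorry if my question sounds silly :(. Please let me know if more information is needed to provide a solution or if anything needs clarifying. Thank you.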
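For reference, here is a minimal sketch of how the root-level columns might be exported, assuming `df` is the same DataFrame whose schema is printed above. The object name `RootColumnExport`, the helpers `selectRootColumns` and `writeCsv`, and the choice of four columns are hypothetical and only for illustration; the writer options mirror the ones already used in the question. Top-level string columns can be selected by name directly, with no `explode` needed.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object; the names here are not from the original code.
object RootColumnExport {

  // A few root-level column names copied from the printed schema.
  // Extend this list with whichever other root columns are needed.
  val rootColumns: Seq[String] = Seq(
    "QAYONOutCome",
    "QA1PartiesComments",
    "QA1PartiesQID",
    "QA1PartiesResponse"
  )

  // Root-level columns are plain strings, so they are selected by name;
  // explode is only required for array columns such as data.business.
  def selectRootColumns(df: DataFrame): DataFrame =
    df.select(rootColumns.map(name => col(name)): _*)

  // Writes a single CSV part file with a header row, using the same
  // format and option as the val1 export in the question.
  // outputDir is an assumed parameter, e.g. a path built from args(2).
  def writeCsv(df: DataFrame, outputDir: String): Unit =
    df.repartition(1)
      .write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save(outputDir)
}
```

A call such as `RootColumnExport.writeCsv(RootColumnExport.selectRootColumns(df), outputDir)` would then produce one CSV for those columns. If the root columns are needed next to the exploded business rows, they can also be listed in the same select, for example `df.select(col("QAYONOutCome"), explode(col("data.business")).as("ID"))`, which repeats the root value on every exploded row.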