Spark SQL: Loading a txt File (Part 02)

Author: ho世英雄 | Source: Internet | 2023-09-01 19:12


Loading and mapping

    # __future__ imports must be the first statement in the file.
    from __future__ import print_function

    # Method 2: copy these three lines (findspark locates the local Spark installation).
    import findspark
    findspark.init()
    import pyspark

    # $example on:init_session$
    from pyspark.sql import SparkSession
    # $example off:init_session$

    # $example on:schema_inferring$
    from pyspark.sql import Row
    # $example off:schema_inferring$

    # $example on:programmatic_schema$
    # Import data types
    from pyspark.sql.types import *
    # $example off:programmatic_schema$

    import os

    if __name__ == "__main__":
        # $example on:init_session$
        spark = SparkSession \
            .builder \
            .appName("Python Spark SQL basic example") \
            .config("spark.some.config.option", "some-value") \
            .getOrCreate()

        # Programmatically specifying the schema
        # When a dictionary of kwargs cannot be defined ahead of time (for example,
        # the structure of records is encoded in a string, or a text dataset will be
        # parsed and fields will be projected differently for different users),
        # a DataFrame can be created programmatically with three steps:
        # 1. Create an RDD of tuples or lists from the original RDD.
        # 2. Create the schema represented by a StructType matching the structure
        #    of tuples or lists in the RDD created in step 1.
        # 3. Apply the schema to the RDD via the createDataFrame method provided
        #    by SparkSession.
        sc = spark.sparkContext

        # Load a text file and convert each line to a Row.
        lines = sc.textFile("C:/file/spark_package/spark-2.4.4-bin-hadoop2.7/examples/src/main/resources/people.txt")
        parts = lines.map(lambda l: l.split(","))
        # Each line is converted to a tuple.
        people = parts.map(lambda p: (p[0], p[1].strip()))

        # The schema is encoded in a string.
        schemaString = "name age"

        fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
        schema = StructType(fields)

        # Apply the schema to the RDD.
        schemaPeople = spark.createDataFrame(people, schema)

        # Creates a temporary view using the DataFrame
        schemaPeople.createOrReplaceTempView("people")

        # SQL can be run over DataFrames that have been registered as a table.
        results = spark.sql("SELECT name FROM people")

        results.show()
        # +-------+
        # |   name|
        # +-------+
        # |Michael|
        # |   Andy|
        # | Justin|
        # +-------+
        # $example off:programmatic_schema$
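The parsing step can be checked without a Spark cluster. The sketch below applies the same split and strip logic to a plain Python list. The three sample lines are an assumption that mirrors the people.txt file shipped with the Spark examples, and `parse_line` is a hypothetical helper name.

```python
# Sample lines assumed to mirror examples/src/main/resources/people.txt.
sample_lines = ["Michael, 29", "Andy, 30", "Justin, 19"]


def parse_line(line):
    """Split one comma-separated line into a (name, age) tuple of strings."""
    parts = line.split(",")
    # strip() removes the space that follows the comma in the file.
    return (parts[0], parts[1].strip())


# Same field names as schemaString.split() in the listing above.
field_names = "name age".split()

people = [parse_line(line) for line in sample_lines]
for person in people:
    # Pair each field name with its value, as the schema does for every row.
    print(dict(zip(field_names, person)))
```

Both values in each tuple are still strings, which matches the StringType fields declared in the schema.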

Official documentation

    # Data types used to build the schema
    from pyspark.sql.types import StructField, StringType, StructType


    def programmatic_schema_example(spark):
        # $example on:programmatic_schema$
        sc = spark.sparkContext

        # Load a text file and convert each line to a Row.
        lines = sc.textFile("examples/src/main/resources/people.txt")
        parts = lines.map(lambda l: l.split(","))
        # Each line is converted to a tuple.
        people = parts.map(lambda p: (p[0], p[1].strip()))

        # The schema is encoded in a string.
        schemaString = "name age"

        fields = [StructField(field_name, StringType(), True) for field_name in schemaString.split()]
        schema = StructType(fields)

        # Apply the schema to the RDD.
        schemaPeople = spark.createDataFrame(people, schema)

        # Creates a temporary view using the DataFrame
        schemaPeople.createOrReplaceTempView("people")

        # SQL can be run over DataFrames that have been registered as a table.
        results = spark.sql("SELECT name FROM people")

        results.show()
        # +-------+
        # |   name|
        # +-------+
        # |Michael|
        # |   Andy|
        # | Justin|
        # +-------+
        # $example off:programmatic_schema$
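Because every field in the example is declared as StringType, age is stored as a string. If numeric comparisons are needed, one option is to convert the value while parsing and declare the column as IntegerType. The following is a minimal sketch under that assumption. `parse_person` and `typed_schema_example` are hypothetical names, the `age >= 20` filter is only an illustration, and the pyspark import sits inside the function so that the parser can be exercised without a Spark installation.

```python
def parse_person(line):
    """Turn a line such as 'Michael, 29' into a (name, age) tuple with an int age."""
    name, age = line.split(",")
    return (name.strip(), int(age.strip()))


def typed_schema_example(spark, path):
    """Load a people.txt-style file at `path` using an explicit, typed schema."""
    # Imported here so parse_person can be used without pyspark installed.
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    # Each line becomes a (str, int) tuple that matches the schema above.
    people = spark.sparkContext.textFile(path).map(parse_person)
    df = spark.createDataFrame(people, schema)
    df.createOrReplaceTempView("people")

    # age is an integer column here, so a numeric predicate needs no cast.
    return spark.sql("SELECT name FROM people WHERE age >= 20")
```

With a session created as in the first listing, calling `typed_schema_example(spark, "examples/src/main/resources/people.txt").show()` would run the query against the same file.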
