Install Greenplum OSS on Ubuntu

About Greenplum Database

Greenplum Database is an MPP (massively parallel processing) SQL database based on PostgreSQL. It is used in production by hundreds of large corporations and government agencies around the world and, counting open source installations, has thousands of deployments globally.

Greenplum Database scales to multi-petabyte data sizes with ease and allows a cluster of powerful servers to work together to provide a single SQL interface to the data.

In addition to using SQL for analyzing structured data, Greenplum provides modules and extensions on top of the PostgreSQL abstractions for in-database machine learning and AI, geospatial analytics, text search (with Apache Solr), and text analytics with Python and Java, as well as the ability to create user-defined functions in Python, R, Java, Perl, C, or C++.

Greenplum Database Ubuntu Distribution

Greenplum Database is the only open source product in its category that has a large install base. With the release of Greenplum Database 5.3, ready-to-install binaries are now hosted for the Ubuntu operating system to make installation and deployment easy.
Ubuntu is a popular operating system in cloud-native environments and is based on the well-respected Debian Linux distribution.

In this article, I will demonstrate how to install the Open Source Greenplum Database binaries on the Ubuntu Operating System.

 

Greenplum Database binaries for Ubuntu are hosted on the Personal Package Archive (PPA) system, which lets the community publish packages that can be installed from any internet-connected system.

So let’s get right to it!

Greenplum OSS on Ubuntu Installation Instructions

First, ensure you have a supported Ubuntu version. At the time of this writing, the Ubuntu builds of Greenplum target the 16.04 LTS (long-term support) release. Check the PPA page for current information about which OS versions are available.
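The version check above can be scripted. This is a minimal sketch, assuming the release is recorded as VERSION_ID in /etc/os-release; the helper name release_of and the variable TARGET_RELEASE are made up for this example, and 16.04 is simply the value mentioned in this article, so confirm it against the PPA page.

```shell
#!/bin/sh
# Sketch: warn if this machine is not running the Ubuntu release the article targets.
# TARGET_RELEASE is an assumption taken from the article text; the PPA page is authoritative.
TARGET_RELEASE="16.04"

# release_of [FILE]: print VERSION_ID from an os-release style file
# (hypothetical helper; defaults to /etc/os-release).
release_of() {
    file="${1:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Lines look like VERSION_ID="16.04"; the quotes are optional.
    sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}

current="$(release_of || true)"
if [ "$current" = "$TARGET_RELEASE" ]; then
    echo "Ubuntu $current matches the release the packages were built for."
else
    echo "Found release '${current:-unknown}', expected $TARGET_RELEASE; check the PPA page before installing."
fi
```

The script only reports a mismatch; it does not stop you from continuing, since the PPA may publish builds for other releases later.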

Add the Greenplum PPA repository to your Ubuntu System, like this:

sudo add-apt-repository ppa:greenplum/db

Update your Ubuntu system to retrieve information from the recently added repository, like this:

sudo apt-get update

 

Install the Greenplum Database software, like this:

sudo apt-get install greenplum-db-oss

The above command will install the Greenplum Database software and any required dependencies on the system automatically and put the resulting software in /opt/gpdb.
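To confirm the install landed in /opt/gpdb, a quick check like the following can help. It is a sketch only: gpdb_installed is a hypothetical helper, and it looks for just the two files this article refers to (greenplum_path.sh and bin/gpssh), not the full package contents.

```shell
#!/bin/sh
# Sketch: confirm the package landed where the article says it does.
# gpdb_installed is a hypothetical helper; /opt/gpdb is the prefix named in the article.
gpdb_installed() {
    prefix="${1:-/opt/gpdb}"
    # greenplum_path.sh and bin/gpssh are the two files the article itself mentions.
    [ -f "$prefix/greenplum_path.sh" ] && [ -x "$prefix/bin/gpssh" ]
}

if gpdb_installed; then
    echo "Greenplum files found under /opt/gpdb."
else
    echo "Greenplum files not found under /opt/gpdb; re-run the apt-get install step."
fi
```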

Load Greenplum Database software into your environment with the following command:

$ . /opt/gpdb/greenplum_path.sh
$ which gpssh
/opt/gpdb/bin/gpssh

The output of the which command above confirms that the software is on your path. Now copy a Greenplum cluster configuration file template into your local directory for editing, like this:

cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_singlenode .

Edit gpinitsystem Configuration File

The following edits are enough for the simplest cluster configuration, running locally.

Leave this line as it is, then create the file it names and put only your hostname into it:
MACHINE_LIST_FILE=./hostlist_singlenode

Update this line to list the directories you want to use for the primary segments. The template default is:
declare -a DATA_DIRECTORY=(/gpdata1 /gpdata2)
For example, I changed it to:
declare -a DATA_DIRECTORY=(/home/inovick/primary /home/inovick/primary)
Make sure the directory you name exists.

Update this line to the hostname of your machine. The template default is:
MASTER_HOSTNAME=hostname_of_machine
In my case the hostname is 'ubuntu', so the line becomes:
MASTER_HOSTNAME=ubuntu

Update the master data directory entry in the file, and make sure the directory exists by creating it if necessary:
MASTER_DIRECTORY=/home/inovick/master
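The supporting files and directories for these settings can be created in one step. The function below is a sketch, not part of Greenplum: prepare_single_node is a hypothetical helper that writes the host list, creates a primary and a master directory under a base directory you pass in, and prints the four lines to copy into gpinitsystem_singlenode. It uses uname -n, which prints the same node name as the hostname command.

```shell
#!/bin/sh
# prepare_single_node BASE_DIR   (hypothetical helper, not a Greenplum utility)
# Writes ./hostlist_singlenode, creates BASE_DIR/primary and BASE_DIR/master,
# then prints the settings to copy into gpinitsystem_singlenode.
prepare_single_node() {
    base="$1"
    [ -n "$base" ] || { echo "usage: prepare_single_node BASE_DIR" >&2; return 1; }

    # The host list contains only this machine's hostname.
    host="$(uname -n)"
    printf '%s\n' "$host" > ./hostlist_singlenode

    # One directory for the primary segments and one for the master.
    mkdir -p "$base/primary" "$base/master" || return 1

    cat <<EOF
MACHINE_LIST_FILE=./hostlist_singlenode
declare -a DATA_DIRECTORY=($base/primary $base/primary)
MASTER_HOSTNAME=$host
MASTER_DIRECTORY=$base/master
EOF
}
```

Calling prepare_single_node "$HOME" from the directory that holds gpinitsystem_singlenode would reproduce the layout used in this article, with your own home directory in place of /home/inovick.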

That’s enough to get the database initialized and up and running, so close the file and let’s initialize the cluster. With this configuration we will have one master instance and two primary segment instances. In more advanced setups you would configure a standby master and segment mirrors on additional hosts, and the data would automatically be both sharded (distributed) across the primary segments and mirrored from primaries to mirrors.

Run gpinitsystem

First, let’s make sure ssh keys are exchanged by running the following command:

gpssh-exchkeys -f hostlist_singlenode


Now let’s initialize and start the cluster. Run the following command:

gpinitsystem -c gpinitsystem_singlenode

The utility will print out what it is going to do and then ask you to confirm before proceeding.

Once it finishes, you are good to go: you can create a database, log in, and start running queries and inserting data.
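A first session might look like the sketch below. The database name demo and the table visits are made up for illustration. The script assumes greenplum_path.sh has been sourced and the cluster is running; if psql or createdb is not on the path it only prints a hint.

```shell
#!/bin/sh
# Sketch: smoke-test a freshly initialized cluster.
# DEMO_DB and the visits table are hypothetical names chosen for this example.
DEMO_DB="demo"

# Print a few statements: create a distributed table, insert rows,
# and count how many rows landed on each segment.
smoke_test_sql() {
    cat <<'EOF'
CREATE TABLE visits (id int, note text) DISTRIBUTED BY (id);
INSERT INTO visits VALUES (1, 'first'), (2, 'second');
SELECT gp_segment_id, count(*) FROM visits GROUP BY gp_segment_id;
EOF
}

if command -v psql >/dev/null 2>&1 && command -v createdb >/dev/null 2>&1; then
    # createdb and psql ship with Greenplum, as they do with PostgreSQL.
    createdb "$DEMO_DB" && smoke_test_sql | psql -d "$DEMO_DB"
else
    echo "psql/createdb not found; source /opt/gpdb/greenplum_path.sh first." >&2
fi
```

The final query groups by gp_segment_id, so with two primary segments you can see how the rows were distributed between them.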


To really get the full benefit, you will want to do some of the following things:

  • Allocate enough hardware to process large amounts of data in your cluster
  • Check the official Greenplum Database documentation
  • Watch some of the Greenplum videos on YouTube
  • Load a lot of data using the high speed parallel load of gpload or external tables with gpfdist, PXF, or S3

That’s it for this tutorial, enjoy Greenplum OSS on Ubuntu.

 

 

PPA description

Installation into /opt/gpdb
---------------------------

sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:greenplum/db
sudo apt-get update
sudo apt-get install -y greenplum-db-oss

Initialize the cluster
----------------------

1. Install Greenplum on all the nodes you will include in your cluster, as described at https://gpdb.docs.pivotal.io/latest/install_guide/prep_os_install_gpdb.html

2. On all nodes, create unix user "gpadmin" as described at https://gpdb.docs.pivotal.io/latest/admin_guide/roles_privs.html . A convenient script for this is in Greenplum source at https://raw.githubusercontent.com/greenplum-db/gpdb/master/concourse/scripts/setup_gpadmin_user.bash

3. Change ownership of gpdb installed files to the gpadmin user, and do all the following initialization as that gpadmin user:

     chown -R gpadmin:gpadmin /opt/gpdb
     su - gpadmin
     source /opt/gpdb/greenplum_path.sh

4. Follow instructions to initialize cluster at https://gpdb.docs.pivotal.io/latest/install_guide/init_gpdb.html

Adding this PPA to your system

You can update your system with unsupported packages from this untrusted PPA by adding ppa:greenplum/db to your system's Software Sources.

sudo add-apt-repository ppa:greenplum/db
sudo apt-get update
