Pivotal HAWQ

PRODUCT DOCUMENTATION Pivotal HAWQ ® Version 1.3.1 Pivotal HAWQ User's Guide Rev: 04 2015 Pivotal Software, Inc.


  • PRODUCT DOCUMENTATION

    Pivotal HAWQ Version 1.3.1

    Pivotal HAWQ User's Guide

    Rev: 04

    2015 Pivotal Software, Inc.

  • Copyright


    Notice

    Copyright

    Copyright 2015 Pivotal Software, Inc. All rights reserved.

    Pivotal Software, Inc. believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." PIVOTAL SOFTWARE, INC. ("Pivotal") MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    Use, copying, and distribution of any Pivotal software described in this publication requires an applicable software license.

    All trademarks used herein are the property of Pivotal or their respective owners.

    Use of Open Source

    This product may be distributed with open source code, licensed to you in accordance with the applicable open source license. If you would like a copy of any such source code, Pivotal will provide a copy of the source code that is required to be made available in accordance with the applicable open source license. Pivotal may charge reasonable shipping and handling charges for such distribution.

    About Pivotal Software, Inc.

    Greenplum transitioned to a new corporate identity (Pivotal, Inc.) in 2013. As a result of this transition, there will be some legacy instances of our former corporate identity (Greenplum) appearing in our products and documentation. If you have any questions or concerns, please do not hesitate to contact us through our web site: http://support.pivotal.io.

    Published June 2015

    http://support.pivotal.io

  • Contents


    Contents

    About the Pivotal HAWQ Documentation

    Chapter 1: Supported Configurations and System Requirements

    Chapter 2: Pivotal HAWQ 1.3.1.x Release Notes
        Pivotal HAWQ 1.3.1.0 Release Notes
            About the Pivotal HAWQ Components
            New Features and Changes
            Supported Platforms
            Installation Options
            Upgrade Paths
            Resolved Issues
            Known Issues
            HAWQ and Pivotal HD Documentation

    Chapter 3: Getting Started
        About HAWQ and PXF
        Installing HAWQ and PXF
            Prepare Host Machines for HAWQ and PXF
            Prepare Local Repository
            (Optional) Configure Kerberos Security Principals
            Install and Patch HDP 2.2.6 and Ambari 1.7.0
            Install HAWQ and PXF with Ambari
            (Optional) Installing HAWQ from the Command Line
            Installing Additional HAWQ Components
        Upgrading HAWQ Components
            Step 1: Prepare to Upgrade HAWQ
            Step 2: Download and Prepare the Latest HAWQ Distribution
            Step 3: Upgrade HAWQ Nodes
            Step 4: Upgrade PXF Nodes
            Step 5: Upgrade Ambari Plug-in
        Considerations for Upgrading HDFS

    Chapter 4: HAWQ Administration
        HAWQ System Overview
            About the HAWQ Architecture
            HAWQ Master
            Redundancy and Failover in HAWQ
            HAWQ Query Processing
            Using HAWQ to Query Data
            Using Procedural Languages
        Starting and Stopping HAWQ
            Starting HAWQ
            Restarting HAWQ
            Reloading Configuration File Changes Only
            Starting the Master in Maintenance Mode
            Stopping HAWQ
        Accessing the Database
            Establishing a Database Session
            Supported Client Applications
            HAWQ Client Applications
            Connecting with psql
            pgAdmin III for HAWQ
            Database Application Interfaces
            Troubleshooting Connection Problems
        Managing HAWQ Access
            Configuring Client Authentication
            Managing Roles and Privileges
        HAWQ InputFormat for MapReduce
            Supported Data Types
            HAWQ InputFormat Example
            Accessing HAWQ Data
        HAWQ Filespaces and High Availability Enabled HDFS
            Enabling the HDFS NameNode HA feature
        Working with Databases
            Defining Database Objects
            Managing Data
            Loading and Unloading Data
            Using PXF to Query External System Data
            Querying Data
        Managing Performance
            Defining Database Performance
            Common Causes of Performance Issues
            Workload Management with Resource Queues
            Investigating a Performance Problem
        Configuring the HAWQ System
            About HAWQ Master and Local Parameters
            Setting Configuration Parameters
            Viewing Server Configuration Parameter Settings
            Configuration Parameter Categories
        Enabling High Availability Features
            Overview of HAWQ High Availability
            Enabling Master Mirroring
            Checking the Log Files for Failed Segments
            Recovering a Failed Segment
            Recovering a Failed Master
        Backing Up and Restoring HAWQ Databases
            About gpfdist and PXF
            About pg_dump and pg_restore
            About Backing Up Raw Data
            Selecting a Backup Strategy/Utility
            Estimating Space Requirements
            Using gpfdist
            Using PXF
        Monitoring a HAWQ System
            Monitoring Database Activity and Performance
            Monitoring System State
            Viewing the Database Server Log Files
            Using hawq_toolkit
            HAWQ Error Codes
        Routine System Maintenance Tasks
            Managing HAWQ Log Files
        Recommended Monitoring and Maintenance Tasks
            Database State Monitoring Activities
            Hardware and Operating System Monitoring
            Data Maintenance
            Database Maintenance
            Patching and Upgrading

    Chapter 5: HAWQ Reference
        HDFS Client Configuration Parameters
        Environment Variables
            Required Environment Variables
            Optional Environment Variables
        Character Set Support Reference
            Setting the Character Set
            Character Set Conversion Between Server and Client
        Data Types
            Time Zones
            Date and Time Configuration Files
        SQL Commands
            ABORT
            ALTER AGGREGATE
            ALTER FUNCTION
            ALTER OPERATOR
            ALTER OPERATOR CLASS
            ALTER ROLE
            ALTER TABLE
            ALTER TABLESPACE
            ALTER TYPE
            ALTER USER
            ANALYZE
            BEGIN
            CHECKPOINT
            CLOSE
            COMMIT
            COPY
            CREATE AGGREGATE
            CREATE DATABASE
            CREATE EXTERNAL TABLE
            CREATE FUNCTION
            CREATE GROUP
            CREATE LANGUAGE
            CREATE OPERATOR
            CREATE OPERATOR CLASS
            CREATE RESOURCE QUEUE
            CREATE ROLE
            CREATE SCHEMA
            CREATE SEQUENCE
            CREATE TABLE
            CREATE TABLE AS
            CREATE TABLESPACE
            CREATE TYPE
            CREATE USER
            CREATE VIEW
            DEALLOCATE
            DECLARE
            DROP AGGREGATE
            DROP DATABASE
            DROP EXTERNAL TABLE
            DROP FILESPACE
            DROP FUNCTION
            DROP GROUP
            DROP OPERATOR
            DROP OPERATOR CLASS
            DROP OWNED
            DROP RESOURCE QUEUE
            DROP ROLE
            DROP SCHEMA
            DROP SEQUENCE
            DROP TABLE
            DROP TABLESPACE
            DROP TYPE
            DROP USER
            DROP VIEW
            END
            EXECUTE
            EXPLAIN
            FETCH
            GRANT
            INSERT
            PREPARE
            REASSIGN OWNED
            RELEASE SAVEPOINT
            RESET
            REVOKE
            ROLLBACK
            ROLLBACK TO SAVEPOINT
            SAVEPOINT
            SELECT
            SELECT INTO
            SET
            SET ROLE
            SET SESSION AUTHORIZATION
            SHOW
            TRUNCATE
            VACUUM
        System Catalog Reference
            System Tables
            System Views
            System Catalogs Definitions
        The hawq_toolkit Administrative Schema
            Checking for Tables that Need Routine Maintenance
            Viewing HAWQ Server Log Files
            Checking Query Disk Spill Space Usage
            Checking Database Object Sizes and Disk Space
        PXF External Tables and API
            Creating an External Table
            About the Java Class Services and Formats
            About Custom Profiles
            About Query Filter Push-Down
            Reference
            Credentials for Remote Services
        Management Utility Reference
            Backend Server Programs
            analyzedb
            gpactivatestandby
            gpcheckperf
            gpconfig
            gpexpand
            gpextract
            gpfdist
            gpfilespace
            gpinitstandby
            gpinitsystem
            gpload
            gplogfilter
            gppkg
            gpscp
            gpssh
            gpssh-exkeys
            gpstart
            gpstate
            gpstop
        Client Utility Reference
            createdb
            createlang
            createuser
            dropdb
            dropuser
            pg_dump
            pg_dumpall
            pg_restore
            psql
            reindexdb
            vacuumdb

    Chapter 6: Server Configuration Parameters
        Parameter Types and Values
        Setting Parameters
        Parameter Categories
            Connection and Authentication Parameters
            System Resource Consumption Parameters
            Pivotal Query Optimizer Parameters
            Query Tuning Parameters
            Error Reporting and Logging Parameters
            Runtime Statistics Collection Parameters
            Automatic Statistics Collection Parameters
            Client Connection Default Parameters
            Lock Management Parameters
            Workload Management Parameters
            External Table Parameters
            Database Table Parameters
            Database and Tablespace/Filespace Parameters
            Past PostgreSQL Version Compatibility Parameters
            HAWQ Array Configuration Parameters
            HAWQ Extension Parameters
        Configuration Parameters
            add_missing_from

            application_name
            array_nulls
            authentication_timeout
            backslash_quote
            block_size
            bonjour_name
            check_function_bodies
            client_encoding
            client_min_messages
            cpu_index_tuple_cost
            cpu_operator_cost
            cpu_tuple_cost
            cursor_tuple_fraction
            custom_variable_classes
            DateStyle
            db_user_namespace
            deadlock_timeout
            debug_assertions
            debug_pretty_print
            debug_print_parse
            debug_print_plan
            debug_print_prelim_plan
            debug_print_rewritten
            debug_print_slice_table
            default_statistics_target
            default_tablespace
            default_transaction_isolation
            default_transaction_read_only
            dynamic_library_path
            effective_cache_size
            enable_bitmapscan
            enable_groupagg
            enable_hashagg
            enable_hashjoin
            enable_indexscan
            enable_mergejoin
            enable_nestloop
            enable_seqscan
            enable_sort
            enable_tidscan
            escape_string_warning
            explain_pretty_print
            extra_float_digits
            from_collapse_limit
            gp_adjust_selectivity_for_outerjoins
            gp_analyze_relative_error
            gp_autostats_mode
            gp_backup_directIO
            gp_backup_directIO_read_chunk_mb
            gp_cached_segworkers_threshold
            gp_command_count
            gp_connectemc_mode
            gp_connection_send_timeout
            gp_connections_per_thread
            gp_content
            gp_create_table_random_default_distribution

            gp_dbid
            gp_debug_linger
            gp_dynamic_partition_pruning
            gp_enable_adaptive_nestloop
            gp_enable_agg_distinct
            gp_enable_agg_distinct_pruning
            gp_enable_direct_dispatch
            gp_enable_fallback_plan
            gp_enable_fast_sri
            gp_enable_groupext_distinct_gather
            gp_enable_groupext_distinct_pruning
            gp_enable_multiphase_agg
            gp_enable_predicate_propagation
            gp_enable_preunique
            gp_enable_sequential_window_plans
            gp_enable_sort_distinct
            gp_enable_sort_limit
            gp_external_enable_exec
            gp_external_grant_privileges
            gp_external_max_segs
            gp_filerep_tcp_keepalives_count
            gp_filerep_tcp_keepalives_idle
            gp_filerep_tcp_keepalives_interval
            gp_fts_probe_interval
            gp_fts_probe_retries
            gp_fts_probe_threadcount
            gp_fts_probe_timeout
            gp_log_fts
            gp_hadoop_home
            gp_hadoop_target_version
            gp_hashjoin_tuples_per_bucket
            gp_idf_deduplicate
            gp_interconnect_fc_method
            gp_interconnect_hash_multiplier
            gp_interconnect_queue_depth
            gp_interconnect_setup_timeout
            gp_interconnect_snd_queue_depth
            gp_interconnect_type
            gp_log_format
            gp_max_csv_line_length
            gp_max_databases
            gp_max_filespaces
            gp_max_local_distributed_cache
            gp_max_packet_size
            gp_max_plan_size
            gp_max_tablespaces
            gp_motion_cost_per_row
            gp_num_contents_in_cluster
            gp_reject_percent_threshold
            gp_reraise_signal
            gp_resqueue_memory_policy
            gp_resqueue_priority
            gp_resqueue_priority_cpucores_per_segment
            gp_resqueue_priority_sweeper_interval
            gp_role
            gp_safefswritesize

            gp_segment_connect_timeout
            gp_segments_for_planner
            gp_session_id
            gp_set_proc_affinity
            gp_set_read_only
            gp_statistics_pullup_from_child_partition
            gp_statistics_use_fkeys
            gp_vmem_idle_resource_timeout
            gp_vmem_protect_limit
            gp_vmem_protect_segworker_cache_limit
            gp_workfile_checksumming
            gp_workfile_compress_algorithm
            gp_workfile_limit_files_per_query
            gp_workfile_limit_per_query
            gp_workfile_limit_per_segment
            integer_datetimes
            IntervalStyle
            join_collapse_limit
            krb_caseins_users
            krb_server_keyfile
            krb_srvname
            lc_collate
            lc_ctype
            lc_messages
            lc_monetary
            lc_numeric
            lc_time
            listen_addresses
            local_preload_libraries
            log_autostats
            log_connections
            log_disconnections
            log_dispatch_stats
            log_duration
            log_error_verbosity
            log_executor_stats
            log_hostname
            log_min_duration_statement
            log_min_error_statement
            log_min_messages
            log_parser_stats
            log_planner_stats
            log_rotation_age
            log_rotation_size
            log_statement
            log_statement_stats
            log_timezone
            log_truncate_on_rotation
            max_appendonly_tables
            max_connections
            max_files_per_process
            max_fsm_pages
            max_fsm_relations
            max_function_args
            max_identifier_length
            max_index_keys

            max_locks_per_transaction
            max_prepared_transactions
            max_resource_portals_per_transaction
            max_resource_queues
            max_stack_depth
            max_statement_mem
            optimizer
            optimizer_analyze_root_partition
            optimizer_control
            optimizer_minidump
            pgcrypto.fips
            password_encryption
            pgstat_track_activity_query_size
            pljava_classpath
            pljava_statement_cache_size
            pljava_release_lingering_savepoints
            pljava_vmoptions
            port
            random_page_cost
            regex_flavor
            resource_cleanup_gangs_on_wait
            resource_select_only
            search_path
            seq_page_cost
            server_encoding
            server_version
            server_version_num
            shared_buffers
            shared_preload_libraries
            ssl
            ssl_ciphers
            standard_conforming_strings
            statement_mem
            statement_timeout
            stats_queue_level
            superuser_reserved_connections
            tcp_keepalives_count
            tcp_keepalives_idle
            tcp_keepalives_interval
            temp_buffers
            TimeZone
            timezone_abbreviations
            track_activities
            track_counts
            transaction_isolation
            transaction_read_only
            transform_null_equals
            unix_socket_directory
            unix_socket_group
            unix_socket_permissions
            update_process_title
            vacuum_cost_delay
            vacuum_cost_limit
            vacuum_cost_page_dirty
            vacuum_cost_page_hit
            vacuum_cost_page_miss

  • Contents

    12

    vacuum_freeze_min_age......................................................................................................754xid_stop_limit........................................................................................................................ 754xid_warn_limit....................................................................................................................... 754

  • About the Pivotal HAWQ Documentation

    Revised October 6, 2015.

    This documentation provides step-by-step procedures for installing, configuring, and managing Pivotal HAWQ. The guide also provides a complete reference for the HAWQ tools and configuration.

    Intended Audience

    This documentation is intended for administrators who want to install or deploy HAWQ, as well as developers who want to program applications that access a HAWQ system. It assumes that you have knowledge of Linux/UNIX system administration, database management systems, database administration, and Structured Query Language (SQL). Because HAWQ is based on PostgreSQL 8.2.15, the sections assume that you have some familiarity with PostgreSQL.

  • Supported Configurations and System Requirements

    Chapter 1

    Supported Configurations and System Requirements

    Check that you meet the following system requirements before you install HAWQ:

    Hadoop and Ambari

    - Pivotal HD 3.0.1 with Ambari 1.7.1, or
    - Hortonworks Data Platform 2.2.6 with Ambari 1.7 or 2.0.x

    Note: You must set the HDFS parameter dfs.block.access.token.enable to the correct value depending on whether you are running a secure or unsecured HDFS cluster:

    - Set dfs.block.access.token.enable to false for unsecured HDFS clusters.
    - Set dfs.block.access.token.enable to true for secured clusters.

    This property can be set within Ambari via Services > HDFS > Configs > Advanced hdfs-site > dfs.block.access.token.enable. After modifying this parameter, you must restart HDFS. See the Pivotal HAWQ 1.3.1.0 Release Notes for more information.
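
    As a rough illustration of the rule in the note above, the following Python sketch reads the text of an hdfs-site.xml file and reports whether dfs.block.access.token.enable matches the security mode of the cluster. The function names and the sample XML are hypothetical and are not part of HAWQ or Ambari. The sketch assumes only the usual Hadoop configuration layout of property elements with name and value children.

```python
import xml.etree.ElementTree as ET

PROPERTY = "dfs.block.access.token.enable"


def read_hdfs_property(xml_text, name):
    """Return the value of a property from hdfs-site.xml text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def token_setting_is_correct(xml_text, secure_cluster):
    """True when the property is 'true' on secured clusters and 'false' otherwise."""
    value = read_hdfs_property(xml_text, PROPERTY)
    if value is None:
        return False  # assumption: a missing property is reported as needing attention
    return value.lower() == ("true" if secure_cluster else "false")


# Hypothetical sample content, shaped like a minimal hdfs-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(token_setting_is_correct(SAMPLE, secure_cluster=False))
```

    A check like this only inspects the file. The parameter change still has to be made in Ambari, followed by an HDFS restart, as described above.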

    PXF

    HAWQ 1.3.1 requires PXF version 2.5.1. The required version of PXF is included in the HAWQ distribution.

    Operating System

    - Red Hat Enterprise Linux (RHEL) 6.4+, 64 bit*
    - CentOS 6.4+, 64 bit*
    - SUSE Linux Enterprise Server (SLES) 11 SP3

    *Red Hat and CentOS version 6 are supported, starting at minor version 6.4. Major versions higher than 6, such as Red Hat 7 and CentOS 7, are not yet supported.

    Minimum CPU

    Intel 64 compatible (Nehalem and above). For a production cluster, the recommended number of CPUs is 2 (with at least 8 physical cores each).

    Minimum Memory

    16 GB RAM per server. Recommended memory on a production cluster is 128 GB.

    Disk Requirements

    - 2 GB per host for HAWQ installation.
    - Approximately 300 MB per segment instance for metadata.
    - Appropriate free space for data: disks should have at least 30% free space (no more than 70% capacity).
    - High-speed, local storage.
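
    The free-space guideline above (at least 30% free, no more than 70% used) can be checked with a few lines of Python. This is a minimal sketch, not a HAWQ utility: the function names are hypothetical, and the 0.30 threshold is taken directly from the requirement above.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Fraction of the disk that is still free, from 0.0 to 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def meets_free_space_rule(total_bytes, free_bytes, min_free=0.30):
    """True when at least min_free of the disk capacity is unused."""
    return free_fraction(total_bytes, free_bytes) >= min_free


def check_path(path):
    """Apply the rule to the filesystem that holds the given path."""
    usage = shutil.disk_usage(path)
    return meets_free_space_rule(usage.total, usage.free)


if __name__ == "__main__":
    print(check_path("."))
```

    In practice the check would be run against each data directory on each host, since the guideline applies per disk.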

    Network Requirements

    - Gigabit Ethernet within the array. For a production cluster, 10 Gigabit Ethernet recommended.
    - Dedicated, non-blocking switch.

    Software and Utilities

    - Python 2.7 or higher
    - Java 1.6 or higher
    - httpd service
    - createrepo
    - bash shell
    - GNU tar
    - GNU zip

  • Pivotal HAWQ 1.3.1.x Release Notes

    Chapter 2

    Pivotal HAWQ 1.3.1.x Release Notes

    This section contains release notes for all Pivotal HAWQ 1.3.x releases.

    Pivotal HAWQ 1.3.1.0 Release Notes

    Rev: A02

    Published: September 16, 2015

    Contents

    - About the Pivotal HAWQ Components
    - New Features and Changes
    - Supported Platforms
    - Installation Options
    - Upgrade Paths
    - Resolved Issues
    - Known Issues
    - HAWQ and Pivotal HD Documentation

    About the Pivotal HAWQ Components

    Pivotal HAWQ comprises the following components:

    - HAWQ Parallel SQL Query Engine version 1.3.1.0
    - HAWQ/PXF Ambari Plug-in version 1.3
    - Pivotal Extension Framework (PXF) version 2.5.1.0
    - MADlib

    HAWQ Parallel SQL Query Engine

    Pivotal HAWQ supports low-latency analytic SQL queries, coupled with massively parallel machine learning capabilities, to shorten data-driven innovation cycles for the enterprise. HAWQ enables discovery-based analysis of large data sets and rapid, iterative development of data analytics applications that apply deep machine learning. It reads data from and writes data to HDFS natively. Using HAWQ functionality, you can interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. Leveraging Pivotal's parallel database technology, it consistently performs tens to hundreds of times faster than all Hadoop query engines in the market.

    HAWQ/PXF Ambari Plug-in version 1.3

    The Ambari plug-in enables installation of the HAWQ and PXF services on either a Pivotal HD 3.0.1 cluster or a Hortonworks Data Platform (HDP) 2.2.6 cluster. In previous releases, HAWQ installation was managed by Pivotal Command Center (PCC) and Pivotal ICM. In HAWQ 1.3, installation and configuration are managed by the Ambari plug-in. See Installing HAWQ and PXF for more information.

    The plug-in enables:

    - Configuring operating system and HDFS parameters that are required for HAWQ*
    - Installing HAWQ and PXF as a service
    - Configuring the HAWQ service
    - Initializing HAWQ for the first time
    - Starting and stopping the HAWQ and PXF services
    - Monitoring the health of services
    - Failing over from a primary HAWQ master service to a standby service

    The plug-in supports Ambari version 2.0.x and Ambari 1.7.x.

    Note: When installing to HDP 2.2.6, certain patches must be applied to Ambari 1.7.0, and certain HDFS parameters must be manually configured. See Install and Patch HDP 2.2.6 and Ambari 1.7.0.

    This version of the Ambari plug-in does not support changing the number of HAWQ segments on a host after installation; using the Ambari HAWQ configuration page to change segments has no effect.

    This version of the Ambari plug-in does not support expanding or shrinking HAWQ services via the Ambari UI.

    Pivotal Extension Framework (PXF)

    PXF enables SQL querying on data in Hadoop components such as HBase and Hive, and on any other distributed data file types. These queries execute in a single, zero-materialization, fully parallel workflow. PXF also uses the HAWQ advanced query optimizer and executor to run analytics on these external data sources. PXF connects Hadoop-based components to facilitate data joins, such as between HAWQ tables and HBase tables. Additionally, the framework is designed for extensibility, so that user-defined connectors can provide parallel access to other data storage mechanisms and file types.

    PXF Interoperability

    PXF operates as an integral part of HAWQ, and as a light add-on to Pivotal HD. On the database side, PXF leverages the external table custom protocol system. The PXF component physically lives on the Namenode and on all or some of the Datanodes. It operates mostly as a separate service and does not interfere with the internals of Hadoop components.

    Note: In order for PXF to interoperate with HBase, you must manually add the PXF HBase JAR file to the HBase classpath after installation. See Post-Install Procedure for Hive and HBase on HDP.

    MADlib

    MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. MADlib combines the efforts used in commercial practice, academic research, and open-source development. You can find more information at http://madlib.net.

    New Features and Changes

    HAWQ 1.3.1 includes the following new features and changes in behavior.

    Improved PXF Support for Avro Complex Types

    In previous releases, PXF supported Avro complex types by storing each component of the complex type in a separate column of the external table. This made it difficult to re-use the data in HAWQ, because the number of table columns mapped by PXF did not match the number of top-level fields in the Avro data. In this release of HAWQ, the PXF Avro profile supports Avro complex types by storing the data in a single TEXT column, using delimiters to separate the component entries, key-value pairs, and records. This enables you to process components of the complex type programmatically. See Working with Avro Files for more information and examples.
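
    To give a feel for what processing a delimited TEXT value programmatically can look like, here is a small Python sketch. Everything specific in it is an assumption made for illustration: the comma and colon delimiters, the optional enclosing brackets, the function names, and the sample values. Check Working with Avro Files for the delimiters that your external table actually uses.

```python
def split_collection(text, collection_delim=","):
    """Split an array-style TEXT value into its component entries.

    Assumption: entries are separated by collection_delim and the value
    may be wrapped in brackets or braces, which are stripped first.
    """
    inner = text.strip().strip("[]{}")
    return [item for item in inner.split(collection_delim) if item != ""]


def split_map(text, collection_delim=",", mapkey_delim=":"):
    """Split a map-style TEXT value into a dict of key and value strings."""
    result = {}
    for entry in split_collection(text, collection_delim):
        key, _, value = entry.partition(mapkey_delim)
        result[key] = value
    return result


if __name__ == "__main__":
    # Made-up sample values, not output captured from PXF.
    print(split_collection("[red,green,blue]"))
    print(split_map("{zip:94304,city:Palo Alto}"))
```

    This simple splitting does not handle values that themselves contain the delimiter characters; real data may need a more careful parser.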

    Support for Hive Tables in Parquet Format

    In this release of HAWQ, the PXF Hive profile now supports Hive tables that are stored in Parquet format. Support is provided for both non-partitioned and partitioned Hive tables. See Accessing Hive Tables in Parquet Format for more information.

    EXPLAIN Shows Optimizer Name

    The EXPLAIN command displays the setting of the server configuration parameter OPTIMIZER for a query and whether the Pivotal Query Optimizer or the legacy query optimizer generated the explain plan.

    When the Pivotal Query Optimizer generates the query plan, the setting optimizer=on and the Pivotal Query Optimizer version are displayed at the end of the query plan. For example:

        Settings: optimizer=on
        Optimizer status: PQO version 1.584

    When HAWQ falls back to the legacy optimizer to generate the plan, the setting optimizer=on and legacy query planner are displayed at the end of the query plan. For example:

        Settings: optimizer=on
        Optimizer status: legacy query planner

    When the server configuration parameter OPTIMIZER is off, these lines are displayed at the end of a query plan:

        Settings: optimizer=off
        Optimizer status: legacy query planner

    See also Determining the Query Optimizer that is Used.
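
    If you collect EXPLAIN output in scripts, the trailer text described above can be inspected mechanically. The Python sketch below is one hedged way to do that. The function names are hypothetical, and the regular expressions assume that the trailer contains the literal text optimizer=on or optimizer=off and a status entry beginning with "Optimizer status:", as in the examples above.

```python
import re


def optimizer_used(explain_output):
    """Return (optimizer setting, optimizer status) found in EXPLAIN output text."""
    setting = re.search(r"\boptimizer=(\w+)", explain_output)
    status = re.search(r"Optimizer status:\s*(.+)", explain_output)
    return (
        setting.group(1) if setting else None,
        status.group(1).strip() if status else None,
    )


def plan_source(explain_output):
    """Describe which optimizer produced the plan, based on the trailer text."""
    setting, status = optimizer_used(explain_output)
    if status is None:
        return "unknown"
    if status.startswith("PQO"):
        return "Pivotal Query Optimizer"
    if setting == "on":
        return "legacy planner (fallback)"
    return "legacy planner"


if __name__ == "__main__":
    sample = "Settings: optimizer=on\nOptimizer status: PQO version 1.584"
    print(plan_source(sample))
```

    Distinguishing the fallback case from optimizer=off can be useful when investigating why a particular query did not use the Pivotal Query Optimizer.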

    Configuring Multiple HAWQ Segments

    This release of HAWQ no longer uses the hawq.segments.per.node property to configure the number of HAWQ segments. To create multiple HAWQ segments for additional concurrency, specify multiple directory entries for the hawq.data.directory property. See Install HAWQ and PXF with Ambari.
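
    As a small illustration of how a multi-entry hawq.data.directory value relates to the number of segments, the Python sketch below splits a property value into its directory entries and counts them. It rests on two assumptions: that entries are separated by whitespace (the resolved issue 95579450 in these notes mentions space-delimited directory locations), and that each entry corresponds to one segment. The function names and sample paths are hypothetical.

```python
def segment_directories(property_value):
    """Split a hawq.data.directory value into its directory entries.

    Assumption: entries are separated by whitespace.
    """
    return property_value.split()


def segment_count(property_value):
    """Number of segments implied by the value, assuming one per entry."""
    return len(segment_directories(property_value))


def duplicate_entries(property_value):
    """Return directory entries that appear more than once, in first-seen order."""
    seen, dupes = set(), []
    for entry in segment_directories(property_value):
        if entry in seen and entry not in dupes:
            dupes.append(entry)
        seen.add(entry)
    return dupes


if __name__ == "__main__":
    value = "/data1/hawq/segments /data2/hawq/segments"  # hypothetical paths
    print(segment_count(value), duplicate_entries(value))
```

    A duplicate check of this kind is only a convenience for reviewing a configuration before installation; it does not change how HAWQ interprets the property.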

    Plug-in Support for Ambari 2.0.x

    The HAWQ/PXF Ambari Plug-in included in this release now supports Ambari version 2.0.x in addition to Ambari 1.7.x. See Supported Configurations and System Requirements for additional information about Hadoop and Ambari support.

    Plug-in Support for Automatic Kerberos Configuration

    The HAWQ/PXF Ambari Plug-in included in this release now deploys a kerberos.json file, which helps to automate the security configuration for HAWQ and PXF when using Ambari 2.0.

    Documentation Corrections

    This release of the HAWQ documentation includes these documentation corrections:

    - Mapping Hive Complex Types: Previous versions of the documentation incorrectly stated that PXF maps each component of a Hive complex data type to a separate column of the PXF external table. The documentation was corrected to explain that Hive complex types map to TEXT columns, inserting delimiters between the component parts. See also Table 18: Additional PXF Options.
    - The HAWQ version 1.3.0 release notes incorrectly stated that issue GPSQL-3116 had been resolved. GPSQL-3116 has been resolved in the current release.
    - Two known issues, GPSQL-2386 and GPSQL-2491, were resolved in HAWQ version 1.3.0 but were not mentioned in the release notes. Resolved Issues in the HAWQ 1.3.0 release notes (http://hawq.docs.pivotal.io/130/docs-hawq/topics/HAWQ130ReleaseNotes.html#resolvedissues) now describes these fixes.
    Supported Platforms

    See Supported Configurations and System Requirements for a complete description of supported platforms, including Hadoop and Ambari support.

    Installation Options

    HAWQ 1.3.1 supports interactive installation using Ambari, or manual installation via the Linux command line. See Installing HAWQ and PXF.

    Upgrade Paths

    Pivotal supports a manual upgrade process for migrating an existing HAWQ 1.3.0.x deployment to version 1.3.1. See Installing HAWQ and PXF.

    Note: Upgrading from previous versions of HAWQ (1.2.x) is not yet supported. Pivotal is working to provide an upgrade path in an updated version of the product.

    Resolved Issues

    HAWQ 1.3.1 Resolved Issues

    ID | Category | Description

    GPSQL-3116 | | ERROR: permission denied for relation t_id (seg2 slice1 hdw4.shins.dom:40000 pid=6860)

    GPSQL-3167 | | After starting and successfully canceling (Ctrl-C) a large copy operation (over 100 GB) from a table to a text file, retrying the same copy operation and cancel request would hang or result in a segmentation fault (SIGSEGV).

    MPP-18646, MPP-25052 | Query Execution | For some SQL queries that generated a query plan with Shared Scan nodes above a Materialize node, memory for the Shared Scan node was freed twice. In some cases, this caused a segmentation fault (SEGV).

    MPP-19965 | Query Execution | Incorrect results could be returned for a query in rare cases where, during query processing:

    - the result of an operation was shared between multiple parts of a query through a Shared Scan operator, and
    - the readers of the Shared Scan appeared on both sides of a Join operator, and
    - at least two of the readers appeared in the same plan slice.

    When all of the above conditions were met, one of the readers could potentially read zero tuples from the Shared Scan operator. This problem has been fixed in this release. The fix may increase memory consumption for queries in which the above conditions hold.

    MPP-22413 | Query Planner | Incorrect results could be returned for a query that combined a median function with other aggregates when the GROUP BY columns were a subset of the table's distribution columns.

    MPP-25643 | Query Optimizer | For some queries that contained a computed column in a GROUP BY clause, HAWQ generated an execution plan that incorrectly pulled the computed column above the GROUP BY operation. This caused a HAWQ PANIC. This issue has been resolved.

    MPP-25697 | Query Optimizer | For queries that contain a UNION or UNION ALL over multiple subexpressions, Pivotal Query Optimizer generated an execution plan with cascaded UNION or UNION ALL operators.

    Now, a more efficient plan is generated with a single n-ary UNION or UNION ALL operator.

    MPP-24438 | Query Optimizer | For queries that contain a GROUP BY clause that was used to group the results of a UNION or UNION ALL clause over more than two subexpressions, Pivotal Query Optimizer generated an execution plan that contained cascaded UNION or UNION ALL operators.

    Now, the GROUP BY operation is pushed below the UNION or UNION ALL operator.

    MPP-25743 | Query Optimizer | The analyzedb utility returned an error if the utility encountered a table or schema that contained uppercase and lowercase characters. This issue has been resolved.

    MPP-25722 | Query Optimizer | For some queries that contained computed columns in a subquery that were not used in the main query, Pivotal Query Optimizer generated an execution plan that contained the unused computed columns.

    Now, Pivotal Query Optimizer generates a more efficient plan that does not contain unused computed columns.

    MPP-25700 | Query Optimizer | If the WHERE clause of a query compares a literal to a non-integer distribution column, and if the data type of the literal differs from that of the distribution column, the Pivotal Query Optimizer might have generated a query execution plan that dispatched to an incorrect segment. This dispatching problem caused incorrect results to be returned. This issue has been resolved.

    MPP-22682 | Query Execution | In some cases where a query plan contained two Shared Scan operators that consumed results from a Materialize operator, and the Shared Scan operators were executed in different slices, the query could hang indefinitely with a deadlock. This issue has been resolved.

    #96244280 | Query Optimizer | Some queries on partitioned tables caused a PANIC if the query contained both of the following features:

    - An IN predicate that includes the table partitioning key
    - A subquery whose output column is the same partitioning key from the outer query

    This issue has been resolved.

    PXF 2.x.x Resolved Issues

    ID | Category | Description

    GPSQL-3178 | Issue | If the Kerberos ticket lifetime was configured to expire before the default of 12 hours, then PXF writable external tables would not renew the Kerberos ticket after it expired. This issue has been resolved.

    93163536 | | Previous versions of PXF did not support Hive tables that were configured with a Hive default partition. If you mapped a PXF external table to a Hive table that used a default partition, then PXF mapped all partition values to regular fields and expected all partitions to return actual values. Querying such a table resulted in an error similar to:

    ERROR: remote component error (500) from ':': type Exception report message java.lang.Exception: java.lang.IllegalArgumentException description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.Exception: java.lang.IllegalArgumentException (libchurl.c:852) (seg0 slice1 : pid=40893) (cdbdisp.c:1572)
    DETAIL: External table

    The code was modified to provide PXF support for Hive tables that use default partitions. See Using PXF with Hive Default Partitions to learn about differences in query results between Hive and PXF queries when Hive tables use a default partition.

    HAWQ/PXF Ambari Plugin 1.3 Resolved Issues

    ID | Description
    1913692 | Added support for Ambari 2.0
    100442654 | Enhancement to automate configuring security for HAWQ and PXF with Ambari 2.0
    94218980 | Included changes to allow HAWQ installation on a single-node environment/VM. See documentation for setup instructions.
    95638816 | Enhanced ActivateStandby service action for HAWQ to include error handling
    98492306 | Enhancement to suggest setting dfs.allow.truncate in hdfs-site.xml to True if not specified
    98659238 | Changed HAWQ HDFS blocksize default to 128 MB in hdfs-client.xml deployed by Ambari HAWQ plugin
    98511506 | Implemented a Service Check to discover Active HAWQ Master and redirect start / stop commands to it appropriately via Ambari
    99270494 | Modified Ambari HAWQ-plugin to ssh as gpadmin user instead of root
    97746494 | Implemented a check to ensure PXF instance directory permissions and ownerships are set up correctly
    100060298 | Updated Ambari hawq-plugin to use enhanced PXF service status/stop/start
    99554510 | Removed PXF installation dependency on Datanode due to Ambari Stack Advisor limitations
    88202194 | Included snappy jars in PXF classpath to allow reading AVRO binary when snappy compression is enabled
    98596572 | Fixed an issue causing PXF installation errors in secure clusters via Ambari
    96580230 | Fixed an issue for PXF to call configure() from start() method so that configuration can be updated during service restart, allowing enabling security during initialization.
    96603728 | Ability to add custom property to pxf-site via Ambari by managing pxf-site.xml as a config file
    98062156 | Included a change so Ambari calls pxf-service init only during initial installation
    99267742 | Increased init timeout in metainfo.xml to 4000 seconds for larger clusters
    95579450 | Fixed an issue to allow defining multiple space-delimited directory locations for HAWQ segment data directory via Ambari
    95675652 | Fixed an issue where HAWQ Standby Master doesn't start after HAWQ Master starts up. Added a dependency for HAWQ Master to require Standby Master to start first.
    97345280 | Removed the hard-coded gpadmin home directory; it is now derived dynamically
    97844242 | Fixed an issue where HAWQ Restart service action fails if HAWQ is already stopped
    97860242 | Fixed an issue with HAWQ service check failure in some cases
    97140182 | Fixed an issue with yum installation to ensure local repo location is added when running setup_repo.sh
    96201608 | Fixed an issue where Ambari HAWQ plugin upgrade from version 1.0 to 1.2 sets up repo file incorrectly
    96580172 | Fixed an issue where restart of HAWQ service through Ambari creates multiple entries in limits.conf and sysctl.conf
    96631766 | Fixed an issue with HAWQ hdfs.headless.keytab path to be user-defined and not hard-coded
    96632416 | Fixed an issue with HAWQ data directory to be user-defined and not hard-coded
    96744678 | Fixed an issue with HAWQ service Restart not working in Ambari due to port conflict with Standby
    97135638 | Fixed an issue with not working in Ambari 2.0
    100265516 | Fixed duplicate entries being appended to ~gpadmin/.bashrc on HAWQ service start
    97771440 | Added a change so sysctl and limits files in HAWQ are modified only if a user had changed a parameter
    95764984 | Enhanced plugin to automatically configure hawq database security parameters

    Known Issues

    HAWQ 1.3.1 Known Issues

    Issue | Description

    GPSQL-3163 | For queries involving distinct aggregates expressed as window functions, the query may return wrong results because the DISTINCT qualifier is dropped in the window operator.

    88199038 | When the HEADER formatting option is specified for an external table, PXF ignores the first line of each file fragment read from each segment, rather than just the first line from each file. For this reason, only use the PXF HEADER formatting option with the HdfsTextMulti profile or another non-splittable profile. For CSV files or other files that include a header line, use an error table instead of the HEADER formatting option.

    90665806 | If you use Ambari to install additional codecs (for example, Hadoop-LZO) then Ambari adds the codecs to /etc/hadoop/conf/core-site.xml. However, Ambari does not add those codecs to the PXF classpath. When codecs are not included in the PXF classpath, queries will yield an exception similar to:

    pxf=# select * from some_table ;
    ERROR: remote component error (500) from '10.32.36.88:51200': type Exception report message java.lang.NoClassDefFoundError: Could not initialize class com.pivotal.pxf.plugins.hdfs.utilities.HdfsUtilities description The server encountered an internal error that prevented it from fulfilling this request. exception javax.servlet.ServletException: java.lang.NoClassDefFoundError: Could not initialize class com.pivotal.pxf.plugins.hdfs.utilities.HdfsUtilities (libchurl.c:852)

    To resolve this issue:

    1. Add the path to each codec JAR (for example, /usr/phd/3.0.0.0-210/hadoop/lib/hadoop-lzo-0.6.0.3.0.0.0-210.jar) to the PXF classpath.

    2. Restart PXF to use the updated classpath.

    91314486 | You must set the HDFS parameter dfs.block.access.token.enable to the correct value depending on whether you are running a secure or unsecured HDFS cluster:

    - Set dfs.block.access.token.enable to false for unsecured HDFS clusters.
    - Set dfs.block.access.token.enable to true for secured clusters.

    This property can be set within Ambari via Services > HDFS > Configs > Advanced hdfs-site > dfs.block.access.token.enable. After modifying this parameter, you must restart HDFS.

    If you run an unsecured cluster but the parameter is set to true, then under certain high I/O workloads HAWQ may encounter the error:

    HdfsIOException: InputStreamImpl: cannot open file:
    Caused by: HdfsIOException: InputStreamImpl: failed to get block visible length for Block

    Conversely, if you run a secure HDFS cluster but the parameter is set to false, you may receive the error:

    -ERROR: Append-Only Storage Read could not open segment file 'hdfs://' for relation ''-DETAIL: HdfsIOException: InputStreamImpl: c