
Cloud Stream Service

User Guide

Issue 15

Date 2018-07-02

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2018. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base

Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://www.huawei.com

Email: [email protected]

Issue 15 (2018-07-02) Huawei Proprietary and Confidential. Copyright © Huawei Technologies Co., Ltd.


Contents

1 Introduction
1.1 CS
1.2 Application Scenarios
1.3 Functions
1.4 Related Services

2 Getting Started

3 Operation Guide
3.1 Logging In to the CS Management Console
3.2 Applying for CS
3.3 Creating an Agency for Permission Granting
3.4 Viewing System Summary
3.5 Preparing the Data Source and Output Channel
3.6 Job Management
3.6.1 Introduction
3.6.2 Creating a Flink Streaming SQL Job
3.6.3 Creating a Flink Streaming SQL Edge Job
3.6.4 Creating a User-Defined Flink Job
3.6.5 Creating a User-Defined Spark Job
3.6.6 Debugging a Job
3.6.7 Visual Editor
3.6.8 Performing Operations on a Job
3.6.9 Monitoring a Job
3.7 Template Management
3.8 Cluster Management
3.9 User Quota Management
3.10 VPC Peering Connection
3.11 Audit Log

4 SQL Syntax Reference
4.1 Syntax Constraints
4.2 Data Type
4.3 Operator
4.4 Function


4.5 Geographical Functions
4.6 DDL Statement
4.7 DML Statement
4.7.1 SQL Syntax Definition
4.7.2 SELECT
4.7.3 Condition Expression
4.7.4 Window
4.7.5 JOIN Between Stream Data and Table Data
4.8 Configuring Time Models
4.9 CEP Based on Pattern Matching
4.10 Reserved Keywords

5 FAQ
5.1 What Is CS?
5.2 What Are the Features and Advantages of CS?
5.3 What Are the Application Scenarios of CS?
5.4 Which Data Sources Does CS Support?
5.5 Where Can the Job Results Be Exported?
5.6 Which Data Formats Does CS Support?
5.7 What Kind of Code-based Jobs Does CS Support?
5.8 What Is the SPU?
5.9 How Is Job Concurrency Implemented?
5.10 How Can I Check Job Output Results?
5.11 What Should I Do If the OBS Bucket Selected for a Job Is Not Authorized?

A Change History


1 Introduction

1.1 CS

Cloud Stream Service (CS) is a real-time big data stream analysis service running on the public cloud. Computing clusters are fully managed by CS, allowing you to focus on Stream SQL services. CS is fully compatible with Apache Flink 1.4.2 and Apache Spark 2.2.1 APIs.

Promoted by Huawei in the IT field, CS is a distributed real-time stream computing system featuring low latency (millisecond-level), high throughput, and high reliability. Powered by Flink, CS integrates Huawei-enhanced features and security hardening, and supports both stream processing and batch processing. It provides the essential Stream SQL features for data processing, and will add machine learning and graph computing algorithms to Stream SQL in the future.

1.2 Application Scenarios

CS focuses on Internet and IoT service scenarios that require timeliness and high throughput. CS provides Internet of Vehicles (IoV) services, online log analysis, online machine learning, online graph computing, and online algorithm-based recommendation for multiple industries, such as small- and medium-sized Internet enterprises, IoT, IoV, and financial anti-fraud.

- Real-time stream analysis
  Purpose: to analyze big data in real time.
  Feature: Complex stream analysis methods, such as Window, CEP, and Join, can be performed on stream data with millisecond-level latency.
  Application scenarios: real-time log analysis, network traffic monitoring, real-time risk control, real-time data statistics, and real-time data Extract-Transform-Load (ETL).

- IoT
  Purpose: to analyze online IoT data.
  Feature: IoT services call the APIs of CS. CS then reads sensor data in real time and executes users' analysis logic. Analysis results are sent to services such as Data Ingestion Service (DIS) and Relational Database Service (RDS) for data persistence, alarm or report display, or visual display of results.
  Application scenarios: elevator IoT, industrial IoT, shared bicycles, IoV, and smart home.

1.3 Functions

CS provides the following functions:

- Distributed real-time computing
  Large-scale cluster computing and auto scaling of clusters greatly reduce costs.

- Fully hosted clusters
  CS provides visualized information on running jobs.

- Pay-as-you-go
  The pricing unit is the stream processing unit (SPU); an SPU contains one core and 4 GB of memory. You are charged based on the running duration of the specified SPUs, accurate to the second.

- Secure isolation
  Security protection mechanisms for tenants ensure secure job running. Tenants' computing clusters are physically isolated from each other and protected by independent security configurations.

- High throughput and low latency
  CS reads data from DIS and enables real-time computing services with millisecond-level latency. It also supports natural backpressure handling while sustaining high throughput.

- Stream SQL online analysis
  Aggregation functions, such as Window and Join, are supported. SQL is used to express business logic, facilitating service implementation.

- Online SQL job testing
  Job debugging helps you check whether the SQL statement logic is correct. After sample data is input manually or from Object Storage Service (OBS) buckets, correct SQL statement logic will export the expected results.

- Support for Flink streaming SQL edge jobs
  When a large amount of data is generated on edge devices, the data sometimes needs to be analyzed and processed near where it is generated, which reduces the amount of data migrated to the cloud and improves real-time processing. By combining CS with IEF, stream computing applications are deployed on edge nodes to realize real-time data computing at the edge rather than on the cloud. CS edits and delivers the stream processing job to edge nodes for execution, helping you quickly and accurately analyze and process streaming data at the edge in real time.

- Exclusive cluster creation and resource quota allocation for jobs
  Tenants can create exclusive clusters, which are physically isolated from shared clusters and other tenants' clusters and are not affected by other jobs. Tenants can also configure the maximum SPU quota for their exclusive clusters and allocate available clusters and SPU quotas to sub-users.

- Customized Flink jobs
  You can submit customized Flink jobs in exclusive clusters.

- Support for Spark streaming and structured streaming
  You can submit customized Spark streaming jobs in exclusive clusters.


- Interconnection with SMN
  CS can connect to Simple Message Notification (SMN), enabling transmission of the alarms generated in real-time data analysis to users' mobile phones in IoT scenarios.

- Interconnection with Kafka
  CS can connect to Kafka clusters, enabling you to use SQL statements to read data from Kafka and write data into Kafka.

- Interconnection with CloudTable
  CS can connect to CloudTable so that stream data can be stored in tables.

- Interconnection with Cloud Search Service
  After CS interconnects with Cloud Search Service, you can use the fully compatible open-source Elasticsearch to implement multi-condition retrieval, statistics, and reporting of structured and unstructured text.

- Interconnection with DCS
  DCS provides Redis-compatible, secure, reliable, out-of-the-box, distributed cache capabilities with elastic scaling and convenient management. CS can interconnect with DCS to meet users' requirements for high concurrency and fast data access.

1.4 Related Services

This section describes the services that CS can work with. For details about the cloud service and open-source ecosystems of CS and the ecosystem development guide, see the Cloud Stream Service Stream Ecosystem Development Guide.

- Data Ingestion Service (DIS)
  By default, DIS serves as a data source of CS and stores the outputs of CS jobs.
  – Data source: DIS accesses user data, and CS reads data from the DIS channel as input data for jobs.
  – Data output: CS writes job outputs into DIS.

- Object Storage Service (OBS)
  OBS serves as a data source and backs up checkpoint data for CS.
  – Data source: CS reads data stored by users in OBS as input data for jobs.
  – Checkpoint data backup and job log saving: If the checkpoint or job log saving function is enabled, CS stores job snapshots and logs in OBS. In the event of an exception, CS can recover the job from the checkpoint backup and query the job logs to locate the fault.

- Relational Database Service (RDS)
  RDS stores CS job output results.

- Simple Message Notification (SMN)
  SMN provides reliable and flexible large-scale message notification services to CS. It significantly simplifies system coupling and pushes messages to subscription endpoints as required.

- Cloud Table Service (CloudTable)
  CloudTable is a distributed, scalable, fully hosted key-value data storage service based on Apache HBase. It provides CS with high-performance random read and write capabilities, which are helpful when applications need to store and query massive amounts of structured, semi-structured, and time series data.


- Identity and Access Management (IAM)
  IAM authenticates access to CS.

- Cloud Trace Service (CTS)
  CTS provides users with records of operations on CS resources, facilitating query, audit, and backtracking.

- Cloud Eye
  Cloud Eye helps monitor job metrics for CS, delivering status information in a concise and efficient manner.

- Elastic Cloud Server (ECS)
  ECS provides CS with computing servers consisting of CPUs, memory, images, and Elastic Volume Service (EVS) disks, and allows on-demand allocation and elastic scaling.

- Virtual Private Cloud (VPC)
  VPC enables you to provision logically isolated, configurable, and manageable virtual networks for ECSs, improving the security of your cloud resources and simplifying network deployment. VPC provides VPC peering connections to CS.

- Intelligent EdgeFabric (IEF)
  IEF works together with CS to provide on-cloud management, stream processing on edge devices, and real-time stream processing. This satisfies your requirements for remote control of edge computing resources, data processing, analysis and decision-making, and intelligent applications. You can deploy edge stream processing applications in a few clicks, and develop and test edge stream processing jobs based on Stream SQL on the cloud. On-cloud stream processing capabilities are thus delivered to the edge, while management, provisioning, and monitoring of edge stream processing tasks are performed on the cloud.

- Cloud Search Service
  Cloud Search Service provides hosted, distributed search engine services for CS. It is fully compatible with open-source Elasticsearch and supports multi-condition retrieval, statistics, and reporting of structured and unstructured text.

- Distributed Cache Service (DCS)
  DCS provides Redis-compatible, secure, reliable, out-of-the-box, distributed cache capabilities with elastic scaling and convenient management. It meets users' requirements for high concurrency and fast data access.


2 Getting Started

This section describes how to use CS. The general procedure for using CS is as follows:

1. Preparing Data Sources and Data Output Channels
2. Saving Output Data
3. Applying for CS
4. Creating and Submitting a Job
5. Sending Data to DIS
6. Viewing Job Information and Job Execution Result
7. Viewing Other Relevant Documents
8. Deleting a Job

This document uses an example of Flink streaming SQL job management to help you understand how to use CS. For details about Flink streaming SQL edge jobs, user-defined Flink jobs, and user-defined Spark jobs, see Job Management.

In the example, vehicle information is recorded in real time, and information about Audi vehicles whose prices are lower than CNY 300,000 is exported.

In this example, you need to create a job that has one source stream and one sink stream. The source stream records vehicle information in real time. The sink stream exports information about the Audi vehicles whose prices are lower than CNY 300,000.

Prerequisites

You have registered a CS account with the public cloud.

Preparing Data Sources and Data Output Channels

CS supports other services as data sources and data output channels. For details, see Preparing the Data Source and Output Channel.

In this example, DIS serves as the data source and data output channel. Therefore, you need to deploy DIS for job JobSample.

For example, create the following DIS streams for the JobSample job. For details, see Creating a DIS Stream in the Data Ingestion Service User Guide.

- DIS stream as the input stream


  Stream Name: csinput
  Stream Type: Common
  Partitions: 1
  Source Data Type: BLOB
  Data Retention Period (days): 1
  Data Dumping: Off

- DIS stream as the output stream

  Stream Name: csoutput
  Stream Type: Common
  Partitions: 1
  Source Data Type: BLOB
  Data Retention Period (days): 1
  Data Dumping: Off

NOTE

Retain the default values of Stream Type, Source Data Type, and Data Retention Period (days).

Saving Output Data

In this example, you need to enable OBS for job JobSample to provide CS with the checkpointing, job log saving, and test data debugging functions.

For example, create the following OBS bucket for the JobSample job. For details, see Creating a Bucket in the Object Storage Service Console Operation Guide.

Information about the created OBS bucket is as follows:

Region: CN North-Beijing1

Bucket Name: smoke-test

Storage Class: Standard

Bucket Policy: Private

Advanced Settings: Do not configure

NOTE

Retain the default settings for Bucket Policy and Advanced Settings.

Applying for CS

You can log in to the CS management console through a browser and apply for CS.

Step 1 Log in to the CS management console.

If you have not registered with the public cloud, click Free Registration to register an account with the public cloud as prompted.

Step 2 The Apply for Cloud Stream Service page is displayed.


Figure 2-1 Applying for CS

Step 3 Select I have read and agree to the HUAWEI CLOUD User Agreement and click Apply.

Step 4 After the application succeeds, the system automatically switches to the Overview page. See the following figure.

Figure 2-2 Overview

Step 5 In the CS Service Agency window that is automatically displayed, click Go to authorization.

Figure 2-3 Creating an agency


Step 6 On the Cloud Resource Access Authorization page that is displayed, click Agree to authorize.

Figure 2-4 Cloud resource access authorization

----End

Creating and Submitting a Job

To use CS, you need to create a job first, for example, JobSample.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management.

Figure 2-5 Job Management

Step 2 On the Job Management page, click Create. On the displayed Create Job page, set parameters as required. See the following figure.


Figure 2-6 Creating a job

Step 3 Click OK to switch to the Edit page, as shown in the following figure. In the SQL statement editing area, SQL statements in the template selected in Step 2 are displayed.

Figure 2-7 Editing a job

Step 4 Edit SQL statements as required.

In this example, SQL statements in the template selected in Step 2 are used. The SQL statement details are as follows:

/**
 * This example uses a general stream analysis template. DIS serves as both the
 * source stream and sink stream. Therefore, you need to enable DIS and create
 * the related source and sink streams.
 * >>>>>>>>> Ensure that you have created the desired DIS streams with your account. <<<<<<<<<<
 *
 * >>>>> Sample input <<<<<
 * Stream name: car_infos (car_id, car_owner, car_brand, car_price):
 * 1,lilei,bmw320i,28
 * 2,hanmeimei,audia4,27
 * >>>>> Sample output <<<<<
 * Stream name: audi_cheaper_than_30w (car_id, car_owner, car_brand, car_price):
 * 2,hanmeimei,audia4,27


**/

/** Obtain data from DIS stream csinput and create an input stream.
 *
 * Reconfigure the following options according to actual conditions:
 * channel: name of the stream where the data is located.
 * partition_count: number of partitions of the stream.
 * encode: data encoding format, which can be csv or json.
 * field_delimiter: separator between attributes when the CSV encoding format is used.
 **/
CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
);
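The csv encoding and field_delimiter options above mean that each DIS record is a comma-separated line that maps onto the four declared fields. The following is a minimal Python sketch of that decoding, for illustration only; it is not CS code, and the helper name parse_car_info is invented here:

```python
def parse_car_info(line, field_delimiter=","):
    """Split one CSV-encoded record into the fields declared for car_infos."""
    car_id, car_owner, car_brand, car_price = line.split(field_delimiter)
    # car_price is declared INT in the stream definition, so cast it.
    return {"car_id": car_id, "car_owner": car_owner,
            "car_brand": car_brand, "car_price": int(car_price)}

# First sample row from the template comments.
record = parse_car_info("1,lilei,bmw320i,28")
```

If the declared fields do not match this record layout (wrong count, wrong delimiter, or a non-numeric car_price), parsing fails, which is why the fields defined for the stream must match the format of the input data.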

/** Create an output stream and export the data to DIS stream csoutput.
 *
 * Reconfigure the following options according to actual conditions:
 * channel: name of the stream where the data is located.
 * partition_key: primary key used to distribute data when the stream has multiple partitions.
 * encode: result encoding format, which can be csv or json.
 * field_delimiter: separator between attributes when the CSV encoding format is used.
 **/
CREATE SINK STREAM audi_cheaper_than_30w (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csoutput",
  partition_key = "car_owner",
  encode = "csv",
  field_delimiter = ","
);

/** Output information about the Audi vehicles whose prices are lower than CNY 300,000. **/
INSERT INTO audi_cheaper_than_30w
SELECT *
FROM car_infos
WHERE car_brand LIKE "audi%" AND car_price < 30;
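In the sample data, car_price appears to be expressed in units of CNY 10,000, so car_price < 30 selects vehicles cheaper than CNY 300,000. The filter's effect on the two sample rows can be sketched in plain Python, as an illustration of the logic rather than CS code:

```python
# Sample rows from the template comments: (car_id, car_owner, car_brand, car_price)
car_infos = [
    ("1", "lilei", "bmw320i", 28),
    ("2", "hanmeimei", "audia4", 27),
]

# Equivalent of: WHERE car_brand LIKE "audi%" AND car_price < 30
audi_cheaper_than_30w = [
    row for row in car_infos
    if row[2].startswith("audi") and row[3] < 30
]
# Only the audia4 row passes; the bmw320i row fails the LIKE "audi%" condition.
```

This matches the sample output given in the template comments: only the row 2,hanmeimei,audia4,27 reaches the sink stream.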

/************************** Insert test data into source stream csinput **************************/
CREATE SINK STREAM car_info_data (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_price INT
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_key = "car_owner",
  encode = "csv",
  field_delimiter = ",\n"
);

INSERT INTO car_info_data SELECT "1", "lilei", "bmw320i", 28;
INSERT INTO car_info_data SELECT "2", "hanmeimei", "audia4", 27;
/************************** Stop the process of inserting test data into the source stream. **************************/

NOTE

The SQL statement consists of three parts:

- Creating the input stream: Ensure that the fields defined for the stream are in the same format as the data source of the input stream. Parameters in the WITH clause defined for the stream must be the same as those of the input DIS stream.

l Creating an output stream: Ensure that parameters in the WITH field defined for the stream must bethe same as those of the output DIS stream.

l Business logic: Compile SQL statements according to the scenario and insert the result data into theoutput stream.

Step 5 Click Check Semantics.

- You can perform Debug, Submit, and Start operations on a job only after semantic verification succeeds.

- If verification is successful, the message "The SQL semantic verification is complete. No error." is displayed.

- If verification fails, a red "X" mark is displayed in front of each erroneous SQL statement. Move the mouse pointer to the "X" mark to view error details and modify the SQL statement as prompted.

Step 6 On the Running Parameter page in the right pane of the Edit page, set the parameters as follows:

- Retain the default settings of SPUs and Parallelism.

- Select Save Job Log and set OBS Bucket to smoke-test. If the selected OBS bucket has not been authorized, click Authorize OBS.

- Retain the default setting Cluster Shared for Job Cluster. Alternatively, you can select a user-defined exclusive cluster. For details about how to create a user-defined exclusive cluster, see Creating a Cluster.

Cloud Stream Service User Guide - 2 Getting Started

Issue 15 (2018-07-02) - Huawei Proprietary and Confidential - Copyright © Huawei Technologies Co., Ltd.

Figure 2-8 Running parameters

NOTE

CS provides a debugging function for you to verify the business logic of jobs by using test data. For details, see Debugging a Job.

Step 7 Click Save.

Step 8 Click Submit. On the displayed Job Bill page, click OK to submit and start the job.

After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can view the Status column to query the job status. After a job is successfully submitted, its Status changes from Submitting to Running.


Figure 2-9 Job status

If Status of a job is Submission failed or Running exception, the job fails to be submitted or fails to run. In this case, click the job name JobSample in the job list and click Running Log to query the job's run logs. Rectify faults based on the logs and submit the job again.

----End

Sending Data to DIS

Use the obtained DIS stream as the data source. After the job is submitted, you can continuously upload data to the DIS stream to provide real-time streaming data sources for CS jobs.

In this example, local data is uploaded to DIS through the csinput stream. For detailed operations, see Sending Data to DIS in the Data Ingestion Service User Guide.

In the following example, each record of vehicle information includes four fields: license plate number, vehicle owner name, vehicle brand, and vehicle price.

1,lilei,bmw320i,28
2,hanmeimei,audia4,27
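These two test records can be checked against the job's filter (car_brand like "audi%" and car_price < 30) before looking at the DIS output. The following Python sketch mirrors that logic; it is an illustration only, since the job itself evaluates the condition as Flink SQL:

```python
# Test records sent to the csinput stream: (car_id, car_owner, car_brand, car_price).
records = [
    ("1", "lilei", "bmw320i", 28),
    ("2", "hanmeimei", "audia4", 27),
]

# Mirror of the job's filter: car_brand like "audi%" and car_price < 30.
matches = [r for r in records if r[2].startswith("audi") and r[3] < 30]

# Only the second record satisfies both conditions, so the csoutput
# stream should carry a single CSV line.
for car_id, owner, brand, price in matches:
    print(f"{car_id},{owner},{brand},{price}")  # prints "2,hanmeimei,audia4,27"
```

So after sending the test data, exactly one record is expected on the sink stream.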

Viewing Job Information and Job Execution Result

After a job is started, you can view the job running status by performing the following steps:

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the job list, click job name JobSample to view job details.

Figure 2-10 Job monitoring

For details, see Monitoring a Job.

To view the output result of a job, log in to the DIS management console and view data on the DIS stream that serves as the sink stream. For details, see Retrieving Data from DIS in the Data Ingestion Service User Guide.

----End


Viewing Other Relevant Documents

After performing the preceding steps, you can refer to the following documents to learn more about CS:

- Cloud Stream Service User Guide: This document provides concepts of jobs, templates, and clusters as well as details about related operations.

- Cloud Stream Service API Reference: This document provides instructions for using APIs of CS to perform operations on jobs.

- Cloud Stream Service Stream Ecosystem Development Guide: This document introduces the cloud service ecosystems and open-source ecosystems related to Cloud Stream Service and provides guidance on how to use stream ecosystems for development.

- Cloud Stream Service SDK Reference: This document describes how to install and configure the development environment and how to perform secondary development by invoking API functions provided by the CS SDK.

Deleting a Job

You can delete unwanted running example jobs to avoid resource waste or quota occupation.

NOTE

Deleted jobs cannot be restored. Exercise caution when performing this operation.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 On the row where job JobSample is located on the Job Management page, choose More > Delete.

Step 3 In the displayed dialog box, click OK.

----End


3 Operation Guide

3.1 Logging In to the CS Management Console

This section describes how to log in to the CS management console and use CS.

Prerequisites

You have registered an account with the management console.

Procedure

You can log in to the CS management console using a web browser.

Step 1 Log in to the public cloud management console.

If you have not registered with the public cloud, click Free Registration to register an account with the public cloud as prompted.

Step 2 From the menu on top of the public cloud management console, choose Service List.

Step 3 Click Cloud Stream Service under EI Enterprise Intelligence.

----End

3.2 Applying for CS

Prerequisites

You have registered an account with the management console.

Applying for CS

You can log in to the CS management console through a browser and apply for CS.

Step 1 Log in to the CS management console.

If you have not registered with the public cloud, click Free Registration to register an account with the public cloud as prompted.


Step 2 The Apply for Cloud Stream Service page is displayed.

Figure 3-1 Applying for CS

Step 3 Select I have read and agree to the HUAWEI CLOUD User Agreement and click Apply.

Step 4 After the application succeeds, the system automatically switches to the Overview page, as shown in the following figure.

Figure 3-2 Overview

Step 5 In the CS Service Agency window that is automatically displayed, click Go to authorization.


Figure 3-3 Creating an agency

Step 6 On the Cloud Resource Access Authorization page that is displayed, click Agree to authorize.

Figure 3-4 Cloud resource access authorization

----End

3.3 Creating an Agency for Permission Granting

When applying for CS, create an agency used to grant CS the permissions required to properly use related services.

NOTICE

- To use CS, you need to create an agency first. Otherwise, related services, such as DIS, SMN, OBS, and CloudTable, will become unavailable.

- Only the tenant account can create the agency. For details about public cloud accounts, see the Identity and Access Management User Guide.


Prerequisites

You have applied for CS. For details, see Applying for CS.

Procedure

Step 1 After you log in to the CS management console, a dialog box shown in Figure 3-5 is displayed if the agency has not been created. In this case, click Go to authorization.

Figure 3-5 Creating an agency

Step 2 In the Cloud Resource Access Authorization dialog box that is displayed (see Figure 3-6), click Agree to authorize.

Figure 3-6 Cloud resource access authorization

Step 3 If the message "Successfully authorized. You have successfully created the CS Service Default Agency." is displayed, the default agency has been created.

After the agency is created, you can view the agency information on the Agency page of the IAM management console. See Figure 3-7.


Figure 3-7 Viewing the agency

The following code illustrates permissions granted to CS:

{
  "Version": "1.0",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "OBS:Bucket:*",
        "OBS:Object:*"
      ]
    }
  ]
},
{
  "Version": "1.0",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "Cloudtable:Cloudtable:*"
      ]
    }
  ],
  "Depends": [
    {
      "catalog": "BASE",
      "display_name": "Tenant Guest"
    },
    {
      "catalog": "BASE",
      "display_name": "Server Administrator"
    }
  ]
},
{
  "Version": "1.0",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "DIS:DIS:*"
      ]
    }
  ],
  "Depends": [
    {
      "catalog": "BASE",
      "display_name": "Tenant Guest"
    },
    {
      "catalog": "BASE",
      "display_name": "Server Administrator"
    }
  ]
},
{
  "Version": "1.0",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "SMN:Topic:*",
        "SMN:Sms:*",
        "SMN:Email:*"
      ]
    }
  ]
},
{
  "Version": "1.0",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "*:*:*"
      ]
    },
    {
      "Effect": "Deny",
      "Action": [
        "identity:*:*"
      ]
    }
  ]
}
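The agency policy fragments above are plain JSON and can be inspected programmatically. As a quick sanity check (illustrative only, not part of any CS or IAM tooling), the final fragment allows all actions while explicitly denying IAM operations:

```python
import json

# Last policy fragment from the agency above: allow everything, deny identity.
policy = json.loads("""
{
  "Version": "1.0",
  "Statement": [
    { "Effect": "Allow", "Action": ["*:*:*"] },
    { "Effect": "Deny",  "Action": ["identity:*:*"] }
  ]
}
""")

# Map each effect to the actions it applies to.
effects = {s["Effect"]: s["Action"] for s in policy["Statement"]}
print(effects)  # prints "{'Allow': ['*:*:*'], 'Deny': ['identity:*:*']}"
```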

----End

3.4 Viewing System Summary

After you log in to the CS management console, the Overview page is displayed. Alternatively, you can click Overview in the left navigation pane to switch to this page.

View the following information on the Overview page.

l Check the Job Overview area.

– indicates the number of running jobs.

– indicates the number of finished jobs.

– indicates the number of abnormal jobs.

– indicates the number of jobs in other status.

l Check the Cluster Overview area.

– indicates the number of running clusters.

– indicates the number of abnormal clusters.

– indicates the number of clusters in other status.

l Check the Price Overview area.

– Job price


Table 3-1 Job-related parameters in the Price Overview area

Parameter Description

Job Price Indicates the total expense of all running jobs. Unit: CNY

Total Unit Price of Running Jobs Unit: CNY/hour

Total SPUs of Running Jobs Unit: PCS

Total Billing Duration of Jobs Unit: hour

– Cluster price

Table 3-2 Cluster-related parameters in the Price Overview area

Parameter Description

Cluster Price Indicates the total expense of all running clusters. Unit: CNY

Total Unit Price of Running Clusters Unit: CNY/hour

Total SPUs of Running Clusters Unit: PCS

Total Billing Duration of Clusters Unit: hour

3.5 Preparing the Data Source and Output Channel

To use a service as the input stream or output channel, you need to apply for the service first.

CS supports the following data sources and output channels:

- DIS as the data source and output channel

  To use DIS as the data source and output channel for CS, you need to enable DIS first. For details about how to create a DIS stream, see Creating a DIS Stream in the Data Ingestion Service User Guide. After applying for a DIS stream, you can upload local data to DIS to provide data sources for CS in real time. For detailed operations, see Sending Data to DIS in the Data Ingestion Service User Guide. An example is provided as follows:

  1,lilei,bmw320i,28
  2,hanmeimei,audia4,27

- OBS as the data source

  To use OBS as the data source, you need to enable OBS first. For details about how to enable OBS, see Enabling OBS in the Object Storage Service Console Operation Guide. After you enable OBS, upload local files to OBS over the Internet. For detailed operations, see Uploading a File in the Object Storage Service Console Operation Guide.


l RDS as the output channel

To use RDS as the output channel, you need to apply for RDS and complete data migration. For details, see the Relational Database Service Quick Start.

l SMN as the output channel

To use SMN as the output channel, you need to create an SMN topic to obtain the URN resource ID, and then add a topic subscription. For detailed operations, see the Simple Message Notification Quick Start.

l Kafka as the data source and output channel

If Kafka serves as both the source and sink streams, you need to create a VPC peering connection between CS and Kafka. For details, see VPC Peering Connection.

If the Kafka server listens on the port using a hostname, you need to add the mapping between the hostname and IP address of the Kafka Broker node to the CS cluster. For details, see Adding an IP-Domain Mapping.

l CloudTable as the data source and output channel

To use CloudTable as the data source and output channel, you need to create a cluster in CloudTable and obtain the cluster ID. For detailed operations, see Getting Started with CloudTable in the CloudTable Service User Guide.

l Cloud Search Service as the Output Channel

To use Cloud Search Service as the data source and output channel, you need to create a cluster in Cloud Search Service and obtain the cluster's private network address. For detailed operations, see Getting Started in the Cloud Search Service User Guide.

l DCS as the output channel

To use DCS as the output channel, you need to create a Redis cache instance in DCS and obtain the address used for CS to connect to the Redis instance. For detailed operations, see Getting Started in the Distributed Cache Service User Guide.

3.6 Job Management

3.6.1 Introduction

A job refers to a task run from a compiled Java JAR file in a distributed system. A job consists of three parts: source stream, Stream SQL data processing, and sink stream. On the Job Management page, you can create and manage jobs. Information about all created jobs is displayed in the job list on the Job Management page. If a large number of jobs have been created, you can turn pages to view them.

Job Management

The job list displays all created jobs. By default, jobs are sorted by creation time, with the latest job displayed at the top. Table 3-3 describes the parameters involved in the job list.

Table 3-3 Parameters involved in the job list

Parameter Description

ID Indicates the job ID, which is unique globally.

Name Indicates the job name, which is unique globally.

Type Indicates the type of a job. The following types are supported:
- Flink SQL
- Flink Jar
- Flink Edge SQL
- Spark Jar

Status Indicates the status of a job. Values include the following:
- Draft
- Submitting
- Submission failed
- Running
- Running exception
- Idle
- Stopping
- Stopped
- Stop failed
- Stopped due to arrears
- Restoring (recharged jobs)
- Completed

Description Indicates the description of a job.

Creation Time Indicates the time when a job is created.

Start Time Indicates the start time of the job execution.

Duration Indicates the running duration of a job.

Operation
- Edit: Click Edit to edit a created job.
- Start: Click Start to start and run a job.
- Stop: Click Stop to stop a job in the Submitting or Running status.
- Delete: Click Delete to delete a job.
  NOTE: A deleted job cannot be restored. Therefore, exercise caution when deleting a job.


Table 3-4 Button description

Button Description

Select a certain job status from the drop-down list to display jobs in that status.

In the search box, enter the job name and click to search for the job.

Click to manually refresh the job list.

3.6.2 Creating a Flink Streaming SQL Job

This section describes how to create a Flink streaming SQL job. Flink SQL provides users a method for compiling jobs according to their logic requirements. SQL-based business logic expression facilitates service implementation. Currently, CS supports compiling Flink SQL statements by using the SQL editor and the visual editor. This section describes how to use the SQL editor to compile Flink streaming SQL jobs.

For details about the visual editor, see Visual Editor.

Prerequisites

You have prepared the data source and data output channel. For details, see Preparing the Data Source and Output Channel.

Procedure

Step 1 You can create a Flink streaming SQL job on any of the three pages: Job Management, Edit, and Template Management.

- Create a job on the Job Management page

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. On the displayed Job Management page, click Create to open the Create Job dialog box.

- Create a job on the Edit page

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. On the row where a created Flink streaming SQL job is located, click Edit under Operation to enter the Edit page.

c. Click Save As. The Job Save As dialog box is displayed.

- Create a job on the Template Management page

a. In the navigation tree on the left pane of the CS management console, click Template Management to switch to the Template Management page.

b. On the row where the desired template is located, click Create Job under Operation.


Step 2 Specify job parameters as required.

Table 3-5 Parameters related to job creation

Parameter Description

Type Set Type to Flink Streaming SQL Job. In this case, you start jobs by compiling SQL statements.

Name Indicates the name of a job, which contains 1 to 57 characters and only letters, digits, hyphens (-), and underscores (_).
NOTE: The job name must be unique.

Description Indicates the description of a job. It contains 0 to 512 characters.

Editor SQL Editor and Visual Editor are available. By default, SQL Editor is used.

Template This parameter is valid only when Editor is set to SQL Editor. You can select a sample template or a customized job template. For details about templates, see Template Management.

Step 3 Click OK to enter the Edit page.

Step 4 Edit a job.

In the SQL statement editing area, enter SQL statements to implement business logic. For details about how to compile SQL statements, see SQL Syntax Reference.

Step 5 Click Check Semantics.

- You can perform Debug, Submit, and Start operations on a job only after semantic verification succeeds.

- If verification is successful, the message "The SQL semantic verification is complete. No error." is displayed.

- If verification fails, a red "X" mark is displayed in front of each erroneous SQL statement. Move the mouse pointer to the "X" mark to view error details and modify the SQL statement as prompted.

Step 6 Set job running parameters.

Table 3-6 Job running parameter description

Parameter Description

SPUs The stream processing unit (SPU) is the pricing unit of CS. An SPU includes one core and 4 GB memory.

Parallelism Parallelism refers to the number of tasks that a CS job can run simultaneously.
NOTE: The value of Parallelism must not exceed four times (Number of SPUs - 1).

Enable Checkpoint Indicates whether to enable the job snapshot function. After this function is enabled, jobs can be restored from checkpoints. The following two parameters are valid after Enable Checkpoint is selected:
- Checkpoint Interval (s) refers to the checkpoint interval, in seconds. The value ranges from 1 to 999999, and the default value is 10.
- Checkpoint Mode can be set to either of the following values:
  - AtLeastOnce: indicates that events are processed at least once.
  - ExactlyOnce: indicates that events are processed exactly once.

Save Job Log Indicates whether to save the job running logs to OBS.
NOTE: If both Enable Checkpoint and Save Job Log are selected, OBS authorization needs to be performed only once.

OBS Bucket This parameter is valid only when Enable Checkpoint or Save Job Log is selected. Select an OBS bucket to store checkpoints and job logs. If the selected OBS bucket is not authorized, click OBS Authorization.

Open Job Abnormality Alarm Indicates whether to send job exceptions, for example, abnormal job running or exceptions due to arrears, to users via SMN.

Topic Name This parameter is valid only when Open Job Abnormality Alarm is selected. Select a user-defined SMN topic. For details about how to customize SMN topics, see Creating a Topic in the Simple Message Notification User Guide.

Job Cluster Retain the default setting Cluster Shared. Alternatively, you can select a user-defined exclusive cluster. For details about how to create a user-defined exclusive cluster, see Creating a Cluster.
NOTE: During job creation, a sub-user can only select a cluster that has been allocated to that user. For details about how to allocate a cluster to a sub-user, see Modifying a Sub-User.
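The Parallelism constraint from the running parameters can be expressed as a small calculation. The helper below is illustrative only (the function name is not part of any CS API); it assumes only the rule stated above, Parallelism not exceeding 4 x (Number of SPUs - 1):

```python
def max_parallelism(spus: int) -> int:
    """Upper bound on Parallelism for a CS job, per the rule
    Parallelism <= 4 * (Number of SPUs - 1). Illustrative helper only."""
    return 4 * (spus - 1)

# For example, a job with 2 SPUs may run up to 4 parallel tasks,
# and a job with 3 SPUs up to 8.
print(max_parallelism(2), max_parallelism(3))  # prints "4 8"
```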

Step 7 Click Save.

Step 8 Click Submit. On the displayed Job Bill page, click OK to submit and start the job.

After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can view the Status column to query the job status. After a job is successfully submitted, its Status changes from Submitting to Running.


If Status of a job is Submission failed or Running exception, the job fails to be submitted or fails to run. You can click the job name in the job list, switch to the Running Log page, rectify the fault based on the log information, and submit the job again.

NOTE

Other buttons are described as follows:

- Debug: performs job debugging. For details, see Debugging a Job.

- Save As: saves the created job as a new job.

- Set as Template: sets the created job as a job template.

- : modifies the name or description of a job.

- : modifies the SQL statements to the normal format. After clicking this button, you need to edit the SQL statements again.

- : sets the theme-related parameters, including Font Size, Wrap, and Page Style.

- : opens the help center, which provides product documents to help users understand the products and their usage.

----End

3.6.3 Creating a Flink Streaming SQL Edge Job

This section describes how to create a Flink streaming SQL edge job. When a large amount of data is generated on edge devices, a Flink streaming SQL edge job analyzes and processes the data near where it is generated, which reduces the amount of data to be migrated to the cloud and improves real-time data processing.

Such a job is a combination of CS and IEF. Stream computing applications are deployed on edge nodes to realize real-time data computing at the edge, not on the cloud. CS edits the stream processing job and delivers it to edge nodes for execution. This helps you quickly and accurately analyze and process streaming data at the edge in real time.

Prerequisites

- IEF has been enabled.

- An ECS node has been created. The recommended configuration is 4 cores and 8 GB memory or higher. For details about how to create an ECS node, see Purchasing and Logging In to a Linux ECS in the Elastic Cloud Server Quick Start.

- Edge computing groups have been created and edge nodes are successfully managed. For details, see sections "Creating an Edge Computing Group" and "Managing Edge Nodes" in the Intelligent EdgeFabric Quick Start.

- An agency has been created for IEF. For details, see section "Creating an IEF Agency" in the Intelligent EdgeFabric Quick Start.

- An edge stream computing application edge-cs has been deployed. For details, see section "Deploying Applications" in the Intelligent EdgeFabric Quick Start.

NOTE

If you deploy an application using a system template, ensure that the container specification is not less than the default value. Otherwise, the instance deployment fails.


Creating a Flink Streaming SQL Edge Job

Step 1 You can create a Flink streaming SQL edge job on either of the two pages: Job Management and Edit.

- Create a job on the Job Management page

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. On the Job Management page, click Create to open the Create Job dialog box.

- Create a job on the Edit page

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. On the row where a created Flink streaming SQL edge job is located, click Edit under Operation to enter the Edit page.

c. Click Save As. The Job Save As dialog box is displayed.

Step 2 Specify job parameters as required.

Table 3-7 Parameters related to job creation

Parameter Description

Type Set Type to Flink Streaming SQL Job. In this case, you start jobs by compiling SQL statements.

Name Indicates the name of a job, which contains 1 to 57 characters and only letters, digits, hyphens (-), and underscores (_).
NOTE: The job name must be unique.

Description Indicates the description of a job. It contains 0 to 512 characters.

Template You can select a sample template or a customized job template. For details about templates, see Template Management.

Step 3 Click OK to enter the Edit page.

Step 4 Edit a job.

Edit the Flink streaming SQL edge job as required to process data generated on edge devices. Currently, type can be set to edgehub and encode can be set to json or csv. For details about the SQL syntax, see SQL Syntax Reference.

Example: Export the names and scores of students whose scores are greater than or equal to 80.

create source stream student_scores (name string, score int) with (
  type = "edgehub",
  topic = "abc",
  encode = "json",
  json_config = "score = student.score; name = student.name"
);

create sink stream excellent_students (name string, score int) with (
  type = "edgehub",
  topic = "abcd",
  encode = "csv",
  field_delimiter = ","
);

insert into excellent_students
select name, score
from student_scores
where score >= 80;

Step 5 Click Check Semantics.

- You can perform Debug, Submit, and Start operations on a job only after semantic verification succeeds.

- If verification is successful, the message "The SQL semantic verification is complete. No error." is displayed.

- If verification fails, a red "X" mark is displayed in front of each erroneous SQL statement. Move the mouse pointer to the "X" mark to view error details and modify the SQL statement as prompted.

Step 6 Set job running parameters.

Table 3-8 Job running parameter description

Parameter Description

Parallelism Parallelism refers to the number of tasks that a CS job can run simultaneously.
NOTE: The value of Parallelism must not exceed four times (Number of SPUs - 1).

Job Edge Computing Group Select the edge computing group to which the desired job belongs.
- Defined in IEF, an edge computing group consists of an edge node and the edge devices that work with it locally. Each group defines a complete edge computing environment.
- CS jobs can be deployed on multiple edge computing groups to implement cooperation between CS and IEF.

Step 7 Click Save.

Step 8 Click Submit. On the displayed Job Bill page, click OK to submit and start the job.

After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can check the Status column to query the job status. After a job is successfully submitted, its Status changes from Submitting to Running.

If the Status of a job is Submission failed or Running exception, the job fails to be submitted or fails to run. You can click the job name in the job list, switch to the Running Log page, rectify the fault based on the log information, and submit the job again.


NOTE

Other buttons are described as follows:

- Debug: debugs the job. For details, see Debugging a Job.
- Save As: saves the created job as a new job.
- Set as Template: saves the created job as a job template.
- The modify icon: modifies the name or description of a job.
- The format icon: formats the SQL statements. After clicking this button, you need to edit the SQL statements again.
- The theme icon: sets theme-related parameters, including Font Size, Wrap, and Page Style.
- The help icon: opens the help center, which provides product documents to help you understand the product and its usage.

----End

Verifying Job Running

Step 1 On IEF, log in to any node that interworks with the edge nodes and install Mosquitto.

To download Mosquitto, visit https://mosquitto.org/download/.

Step 2 In this example, the following command is used to send data to the edge node:

mosquitto_pub -h <edge node IP address> -t abc -m '{"student":{"score":90,"name":"1bc2"}}'

In the command, abc refers to the topic name of the source stream defined in the job.

Step 3 Open a new window and run the following command to monitor the output, that is, the names and scores of students whose scores are greater than or equal to 80:

mosquitto_sub -h <edge node IP address> -t abcd

In the command, abcd refers to the topic name of the sink stream defined in the job.

----End

3.6.4 Creating a User-Defined Flink Job

This section describes how to create a user-defined Flink job. You can perform secondary development based on Flink APIs, build your own JAR file, and submit the file to CS clusters. CS is fully compatible with open-source community APIs. To create a user-defined Flink job, you need to compile and build an application JAR file. Therefore, this approach requires a certain understanding of Flink secondary development and is suited to scenarios with complex stream computing requirements.

Prerequisites

- You have built the secondary development application code into a JAR file and stored the JAR file on your local PC or uploaded it to the created OBS bucket.
- The Flink dependency packages have been integrated into the CS server, and security hardening has been performed based on the open-source community version. Therefore, you need to exclude related Flink dependencies when building an application JAR file. To achieve this, use Maven or SBT to set scope to provided.
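For example, with Maven a Flink dependency would be declared with scope provided, so it is available at compile time but excluded from the application JAR. The artifact ID and version below are illustrative; match them to the Flink version your CS cluster runs.

```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.4.2</version>
    <!-- provided: used for compilation, not packaged into the job JAR -->
    <scope>provided</scope>
</dependency>
```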

Procedure

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 On the displayed Job Management page, click Create to switch to the Create Job dialog box.

Step 3 Specify job parameters as required.

Table 3-9 Parameters related to job creation

Type: Select Flink Streaming Jar Job.

Name: Indicates the name of a job, which contains 1 to 57 characters and consists of only letters, digits, hyphens (-), and underscores (_).
  NOTE: The job name must be unique.

Description: Indicates the description of a job. It contains 0 to 512 characters.

Step 4 Click OK to enter the Edit page.

Step 5 Upload the JAR file.

Table 3-10 Description of JAR file upload parameters

Upload a File: There are three methods of uploading the JAR file:
  - Local Upload: uploads the JAR file saved on your local PC to the CS server.
    NOTE: The size of a local JAR file cannot exceed 8 MB. To upload a JAR file larger than 8 MB, upload it to OBS and then reference it from OBS.
  - Upload from OBS: selects a JAR file stored in an OBS bucket. CS then obtains the file from OBS.
  - Sample Program: selects an existing sample program from the public OBS bucket as required.

Main Class: Indicates the main class of the JAR package to be uploaded, for example, KafkaMessageStreaming. If this parameter is not specified, the main class is determined from the Manifest file in the JAR package.

Main Class Arguments: Indicates the list of parameters passed to the main class. Parameters are separated by spaces.
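If Main Class is left empty, the entry point is read from the JAR's manifest, so the manifest must contain a Main-Class entry. A minimal sketch (the class name reuses the example from the table):

```
META-INF/MANIFEST.MF:

Main-Class: KafkaMessageStreaming
```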

Step 6 Perform basic configurations.

Table 3-11 Parameter description

Job Cluster: For user-defined jobs, you must select a cluster created by the tenant and then bind the cluster. If the target cluster does not exist in the list, use the tenant account to grant permissions and allocate the SPU quota to the sub-user on the User Quota Management page. For details, see Modifying a Sub-User.

SPUs: An SPU includes one core and 4 GB of memory. The number of SPUs ranges from 2 to 400.

Job Manager SPUs: Set the number of SPUs used by the Job Manager. By default, one SPU is configured. You can select one to four SPUs for the Job Manager.

Parallelism: Set the parallelism for each operator of a job.
  NOTE:
  - The Parallelism value cannot be greater than four times the number of SPUs used by the Task Manager.
  - You are advised to set this parameter to a value greater than the parallelism configured in the code. Otherwise, job submission may fail.

Save Job Log: Indicates whether to save job logs. If this function is enabled, select an authorized OBS bucket. If the selected OBS bucket is not authorized, click OBS Authorization.
  NOTE: For details about operations related to OBS, see Getting Started in the Object Storage Service Console Operation Guide.

Open Job Abnormality Alarm: Indicates whether to send job exceptions, for example, abnormal job running or exceptions due to arrears, to users via SMN.

Topic Name: This parameter is valid only when Open Job Abnormality Alarm is selected. Select a user-defined SMN topic. For details about how to customize SMN topics, see Creating a Topic in the Simple Message Notification User Guide.

Step 7 (Optional) After parameter configurations are complete, click Save.

Step 8 Click Submit. On the displayed Job Bill page, click OK to submit and start the job.


After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can check the Status column to query the job status. After a job is successfully submitted, its Status changes from Submitting to Running.

If the Status of a job is Submission failed or Running exception, the job fails to be submitted or fails to run. You can click the job name in the job list, switch to the Running Log page, rectify the fault based on the log information, and submit the job again.

NOTE

Other buttons are described as follows:

- The modify icon: modifies the name or description of a job.
- The help icon: opens the help center, which provides product documents to help you understand the product and its usage.

----End

3.6.5 Creating a User-Defined Spark Job

This section describes how to create a user-defined Spark job. You can perform secondary development based on Spark APIs, build your own JAR file, and submit the file to CS clusters. CS is fully compatible with open-source community APIs. To create a user-defined Spark job, you need to compile and build an application JAR file. Therefore, this approach requires a certain understanding of Spark secondary development and is suited to scenarios with complex stream computing requirements.

Prerequisites

- You have built the secondary development application code into a JAR file and stored the JAR file on your local PC or uploaded it to the created OBS bucket.
- The Spark dependency packages have been integrated into the CS server, and security hardening has been performed based on the open-source community version. Therefore, you need to exclude related Spark dependencies when building an application JAR file. To achieve this, use Maven or SBT to set scope to provided.
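In SBT the equivalent exclusion marks the Spark dependency as provided. The module and version below are illustrative; match them to the Spark version your CS cluster runs.

```scala
// build.sbt: compile against Spark, but do not package it into the job JAR
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0" % "provided"
```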

Procedure

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 On the displayed Job Management page, click Create to switch to the Create Job dialog box.

Step 3 Specify job parameters as required.

Table 3-12 Parameters related to job creation

Parameter Description

Type Select Spark Streaming Jar Job.


Name: Indicates the name of a job, which contains 1 to 57 characters and consists of only letters, digits, hyphens (-), and underscores (_).
  NOTE: The job name must be unique.

Description: Indicates the description of a job. It contains 0 to 512 characters.

Step 4 Click OK to enter the Edit page.

Step 5 Upload the JAR file.

Table 3-13 Description of JAR file upload parameters

Upload a File: There are three methods of uploading the JAR file:
  - Local Upload: uploads the JAR file saved on your local PC to the CS server.
    NOTE: The size of a local JAR file cannot exceed 8 MB. To upload a JAR file larger than 8 MB, upload it to OBS and then reference it from OBS.
  - Upload from OBS: selects a JAR file stored in an OBS bucket. CS then obtains the file from OBS.
  - Sample Program: selects an existing sample program from the public OBS bucket as required.

Main Class: Indicates the main class of the JAR package to be uploaded, for example, KafkaMessageStreaming. If this parameter is not specified, the main class is determined from the Manifest file in the JAR package.

Main Class Arguments: Indicates the list of parameters passed to the main class. Parameters are separated by spaces.

Step 6 Upload the configuration files.

A configuration file can be in .xml or .conf format. If there are multiple configuration files, compress them into a .zip package and upload the package.

There are two methods of uploading the configuration files:

- Local Upload: uploads the file saved on your local PC to the CS server.
- OBS Upload: selects a file stored in an OBS bucket. CS then obtains the file from OBS.

Step 7 Perform basic configurations.


Table 3-14 Parameter description

Job Cluster: For user-defined jobs, you must select a cluster created by the tenant and then bind the cluster. If the target cluster does not exist in the list, use the tenant account to grant permissions and allocate the SPU quota to the sub-user on the User Quota Management page. For details, see Modifying a Sub-User.

SPUs: An SPU includes one core and 4 GB of memory. Displays the total number of SPUs configured for a user-defined Spark job, including the SPUs configured for the management unit and all Executor nodes.

Driver SPUs: Set the number of SPUs used by the Driver. By default, one SPU is configured. You can select one to four SPUs for the Driver.

Executor SPUs: Set the number of SPUs used by each Executor node. By default, one SPU is configured. You can select one to four SPUs for each Executor node.

Executor Number: Indicates the number of Executor nodes. The value ranges from 1 to 100. The default value is 1.

Save Job Log: Indicates whether to save job logs. If this function is enabled, select an authorized OBS bucket. If the selected OBS bucket is not authorized, click OBS Authorization.
  NOTE: For details about operations related to OBS, see Getting Started in the Object Storage Service Console Operation Guide.

Open Job Abnormality Alarm: Indicates whether to send job exceptions, for example, abnormal job running or exceptions due to arrears, to users via SMN.

Topic Name: This parameter is valid only when Open Job Abnormality Alarm is selected. Select a user-defined SMN topic. For details about how to customize SMN topics, see Creating a Topic in the Simple Message Notification User Guide.
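The total SPU figure shown for a Spark job follows directly from the driver and executor settings above. A small sketch of the arithmetic, with the defaults from the table (the function name is illustrative):

```python
def total_spus(driver_spus=1, executor_spus=1, executor_number=1):
    """Total SPUs for a user-defined Spark job: driver plus all executors."""
    return driver_spus + executor_spus * executor_number

print(total_spus())          # 2 with all defaults (1 driver + 1 executor)
print(total_spus(2, 2, 10))  # 22: 2 driver SPUs + 10 executors * 2 SPUs each
```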

Step 8 (Optional) After parameter configurations are complete, click Save.

Step 9 Click Submit. On the displayed Job Bill page, click OK to submit and start the job.

After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can check the Status column to query the job status. After a job is successfully submitted, its Status changes from Submitting to Running.

If the Status of a job is Submission failed or Running exception, the job fails to be submitted or fails to run. You can click the job name in the job list, switch to the Running Log page, rectify the fault based on the log information, and submit the job again.


NOTE

Other buttons are described as follows:

- The modify icon: modifies the name or description of a job.
- The help icon: opens the help center, which provides product documents to help you understand the product and its usage.

----End

3.6.6 Debugging a Job

The debugging function checks the business logic of your compiled SQL statements before the jobs are executed. It helps prevent unnecessary fees generated when you run Flink streaming SQL jobs. This function supports only jobs of the Flink Streaming SQL Job type.

Procedure

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 On the Job Management page, locate the row where the target job resides and click Edit under Operation to switch to the Edit page.

For a job that is being created, you can debug the job on the Edit page.

Step 3 Click Debug to parse the compiled SQL statements. The Debugging Parameter page is displayed in the right pane of the Edit page.

Step 4 Set debugging parameters.

- Data Storage Address: Select an OBS bucket to save debugging logs. If you select an unauthorized OBS bucket, click OBS Authorization.
- Data Input Mode: The following two options are available:
  - OBS(CSV): If you select OBS(CSV), prepare OBS data before using CS. For details, see Preparing the Data Source and Output Channel. OBS data is stored in CSV format, where records are separated by line breaks and the fields within a record are separated by commas (,).
  - Manual typing: If Manual typing is selected, you need to compile SQL statements to configure an input stream as the data source. In manual typing mode, you need to enter the value of each field in a single record.
- Set STUDENT_SCORES:
  - If OBS is selected, select an OBS object as the input stream data.
  - If Manual typing is selected, specify attribute parameters as prompted. Only one record is allowed for an input stream. See Figure 3-8.
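For the OBS(CSV) input mode, debug data for the student_scores stream from the earlier example would look like the following: one record per line, fields separated by commas. The values are made up for illustration.

```
1bc2,90
2de3,60
5xy6,85
```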


Figure 3-8 Debugging parameters

Step 5 Click Start Debugging. After debugging is complete, the Debugging Result page appears.

- If the debugging result meets the expectation, the job runs properly.
- If the debugging result does not meet the expectation, business logic errors may exist. In this case, modify the SQL statements and debug again.

Figure 3-9 Debugging result

----End

3.6.7 Visual Editor

CS provides a visual editor (also called the visual SQL editor) for users who are not familiar with SQL development. The visual editor encapsulates the upstream and downstream services (such as DIS and CloudTable) and internal logic operators (such as Filter and Window) that need to be interconnected with CS into drag-and-drop components. It allows you to easily create a job topology by dragging required elements onto the canvas and then connecting them. By clicking each element on the canvas, you can set related parameters. The visual editor consists of three areas:

- Drag-and-Drop Element area: includes a variety of source elements, operator elements, and sink elements. More element types will be added to satisfy your requirements in various scenarios.


  - Source Element: includes DIS, OBS, and CloudTable.
  - Operator Element: includes Union, Filter, Window, and Select.
  - Sink Element: includes DIS, CloudTable, SMN, and RDS.

- Canvas area
- Element parameter setting area

Procedure

The following procedure describes how to create a Flink streaming SQL job by using the visual editor in the DIS-CS (Window)-DIS scenario.

Step 1 Log in to the CS management console.

Step 2 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 3 On the Job Management page, click Create to switch to the Create Job dialog box.

Step 4 Specify job parameters as required.

Table 3-15 Parameters related to job creation

Type: Select Flink Streaming SQL Job.
  NOTE: The visual editor supports only the Flink Streaming SQL Job type.

Name: Indicates the name of a job, which contains 1 to 57 characters and consists of only letters, digits, hyphens (-), and underscores (_).
  NOTE: The job name must be unique.

Description: Indicates the description of a job. It contains 0 to 512 characters.

Editor: Select Visual Editor. Options SQL Editor and Visual Editor are available.

Step 5 Click OK to enter the Edit page.

Step 6 Drag desired elements, such as DIS, Window, and DIS, to the canvas area.


Figure 3-10 Dragging elements to the canvas area

NOTE

You can double-click an element to delete it.

Step 7 Connect the elements according to the job logic.

Starting from the egress port of an element, drag the connection on the canvas to the ingress port of another element. You cannot directly connect the egress port of a source element to the ingress port of a sink element. When a connection is valid, the ingress port of the target element turns green; otherwise, it remains unchanged.

NOTE

You can double-click a connection to delete it.

Step 8 Configure the element parameters in the canvas area.

1. Click the source element, for example, source_dis_1. In the displayed area at the right side, configure parameters related to the element, including the parameters in Data Stream Attribute Settings and Element Parameter Settings.


Table 3-16 Parameters to be configured when DIS serves as the source element

Data Stream Attribute Settings:
  Click Add Attribute, and specify Attribute Name and Attribute Type.
  Attribute Name starts with a letter and consists of only letters, digits, and underscores (_). A maximum of 20 characters are allowed.
  Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP.
  Click Insert Test Data to insert the test data of an attribute. Click Delete Test Data to delete the test data of an attribute.
  In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings:

  Type: Indicates the element type. The options are as follows, depending on the source element:
    - source-dis
    - source-obs
    - source-cloudtable

  Region: Indicates the region where the user resides.

  DIS Stream: This parameter is valid only when DIS is selected under Source Element. Select a DIS stream.

  Partitions: This parameter is valid only when DIS is selected under Source Element. Partitions are the base throughput unit of a DIS stream. Each partition supports a read speed of up to 2 MB/s and a write speed of up to 1,000 records/s and 1 MB/s.

  Encoding: This parameter is valid only when DIS is selected under Source Element. Indicates the data encoding mode, which can be CSV or JSON.

  Field Delimiter: This parameter is valid only when DIS is selected under Source Element and Encoding is set to CSV, or when OBS is selected under Source Element. Indicates the delimiter between attributes. The default value is a comma (,).

  JSON Config: This parameter is valid only when DIS is selected under Source Element and Encoding is set to JSON. Configure the mapping between the JSON fields and the stream definition fields, for example, attr1=student.name;attr2=student.age;.
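The JSON config string pairs each stream attribute with a path into the JSON record. As a sketch of how such a string maps to an attribute-to-path dictionary (the format is taken from the example above; the parser itself is illustrative, not the CS implementation):

```python
def parse_json_config(config):
    """Parse 'attr1=student.name;attr2=student.age;' into {attribute: path}."""
    mapping = {}
    for pair in config.strip(";").split(";"):
        attr, path = pair.split("=", 1)  # split on the first '=' only
        mapping[attr.strip()] = path.strip()
    return mapping

print(parse_json_config("attr1=student.name;attr2=student.age;"))
# {'attr1': 'student.name', 'attr2': 'student.age'}
```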


Table 3-17 Parameters to be configured when OBS serves as the source element

Data Stream Attribute Settings:
  Click Add Attribute, and specify Attribute Name and Attribute Type.
  Attribute Name starts with a letter and consists of only letters, digits, and underscores (_). A maximum of 20 characters are allowed.
  Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP.
  Click Insert Test Data to insert the test data of an attribute. Click Delete Test Data to delete the test data of an attribute.
  In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings:

  Type: Indicates the element type. The options are as follows, depending on the source element:
    - source-dis
    - source-obs
    - source-cloudtable

  Region: Indicates the region where the user resides.

  OBS Bucket: This parameter is valid only when OBS is selected under Source Element. Select the OBS bucket where the source data is located.

  Object Name: This parameter is valid only when OBS is selected under Source Element. Indicates the name of the object, stored in the OBS bucket, where the source data is located.

  Row Delimiter: This parameter is valid only when OBS is selected under Source Element. Indicates the delimiter between rows, for example, "\n".


Table 3-18 Parameters to be configured when CloudTable serves as the source element

Data Stream Attribute Settings:
  Click Add Attribute, and specify Attribute Name and Attribute Type.
  Attribute Name starts with a letter and consists of only letters, digits, and underscores (_). A maximum of 20 characters are allowed.
  Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP.
  Click Insert Test Data to insert the test data of an attribute. Click Delete Test Data to delete the test data of an attribute.
  In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings:

  Type: Indicates the element type. The options are as follows, depending on the source element:
    - source-dis
    - source-obs
    - source-cloudtable

  Region: Indicates the region where the user resides.

  Table Name: This parameter is valid only when CloudTable is selected under Source Element. Indicates the name of the data table to be read.

  Cluster ID: This parameter is valid only when CloudTable is selected under Source Element. Indicates the ID of the cluster to which the data table to be read belongs.

  Table Columns: This parameter is valid only when CloudTable is selected under Source Element. The value is in the format of "rowKey,f1:c1,f1:c2,f2:c1". Ensure that the column quantity is the same as the number of attributes added in Data Stream Attribute Settings.

2. Click an operator element, for example, operator_window_1. In the displayed area at the right side, configure parameters related to the element.

The Window operator supports two time types: Event Time and Processing Time. For each time type, three window types are supported: tumbling window (TUMBLE), sliding window (HOP), and session window (SESSION). You can calculate the data in the window, for example, summing or averaging it.


Table 3-19 Window operator element parameter configuration

Source Attributes: Displays the data source, attribute names, and types specified in Source Element.

Window Aggregation Parameter Configuration:

  Time Type: The Window operator supports two time types: Event Time and Processing Time.

  Time Attribute: If Time Type is set to Event Time, this parameter indicates the user-provided event time, that is, an attribute of type TIMESTAMP in Source Attributes. If Time Type is set to Processing Time, this parameter indicates the local system time (proctime) when events are handled.

  WaterMark: This parameter is valid only when Time Type is set to Event Time, and you must specify it in that case. User data is usually disordered; if a watermark is not configured to properly delay user data, the aggregation result may differ greatly from the expected one. This parameter can be set to By time period or By number of events.

  Delay Period: Indicates the maximum delay time. The default value is 20 seconds.

  Send Period: This parameter is valid only when WaterMark is set to By time period. Indicates the watermark sending interval. The default value is 10 seconds.

  Event Number: This parameter is valid only when WaterMark is set to By number of events. Indicates the number of data records after which a watermark is sent. The default value is 10.

GroupBy:

  Window Type: For each time type, three window types are supported:
    - Tumbling window
    - Sliding window
    - Session window

  Window Period: The Window operator assigns each element to a window of a specified size; that size is the window period. The default window period is 1 day.

  Group Attribute: This parameter is optional. Grouping can be performed by time window or by attribute. This parameter indicates an attribute specified in Source Element. Multiple attributes can be selected.

  Sliding Period: This parameter is valid only when Window Type is set to Hop Window. The sliding window has two parameters: size and slide.
    - The size parameter, indicated by Window Period, refers to the window size.
    - The slide parameter, indicated by Sliding Period, refers to each slide step.
    If the slide value is smaller than the size value, sliding windows overlap. In this case, elements are assigned to multiple windows.
    If the slide value is equal to the size value, the window is equivalent to a tumbling window.
    If the slide value is greater than the size value, the window is a jump window. In this case, windows do not overlap, and there are gaps between windows.

  Select Attribute: Click Add Select Attribute, and specify Function Type and Type. Function Type can be set to Window, Aggregate, or No Function. Various window functions are available depending on your Window Type setting:
    - If Window Type is set to Tumble Window, window functions TUMBLE_START and TUMBLE_END are available.
    - If Window Type is set to Hop Window, window functions HOP_START and HOP_END are available.
    - If Window Type is set to Session Window, window functions SESSION_START and SESSION_END are available.
    The following aggregate functions are supported: COUNT, AVG, SUM, MAX, and MIN.
    Type can be set to STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, or TIMESTAMP.
    Click the icon to display the function parameter setting area and set parameters as required. Click Delete to delete the corresponding function type.
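As a sketch of the SQL that such a Window configuration produces, a tumbling-window aggregation over the earlier student_scores stream could combine TUMBLE_START with AVG as follows. The stream names reuse the earlier example, and the sink stream and interval are illustrative.

```sql
insert into avg_scores
select
  TUMBLE_START(proctime, INTERVAL '1' MINUTE) as window_start,
  avg(score) as avg_score
from student_scores
group by TUMBLE(proctime, INTERVAL '1' MINUTE);
```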

Cloud Stream Service User Guide 3 Operation Guide


The Select operator corresponds to the SQL statement SELECT, which is used for selecting data from data streams. Attribute Name in Source Attributes must be an existing attribute name specified in the source element connected to the Select operator.

Table 3-20 Select operator element parameter configuration

Parameter Description

Source Attributes Displays the data source, attribute name, and type that are specified in Source Element.

Output Attributes Click Add Attribute, and specify Select Field and Type. The Select operator is used to select the sink stream. Each output attribute can be:
– An input attribute of the data source
– A logical combination of data source attributes, such as addition or subtraction of attributes
– A function calculation on a source attribute
– Others
Type can be set to one of the following: STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP. Click Delete to delete the corresponding attribute.
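In the generated SQL, the Select operator corresponds to a SELECT clause. As an illustrative sketch (the stream and attribute names here are hypothetical):

```sql
-- Each output attribute is a pass-through, an arithmetic combination,
-- or a function applied to a source attribute.
INSERT INTO car_summary
SELECT
  car_id,                    -- input attribute of the data source
  car_price - car_discount,  -- arithmetic combination of source attributes
  UPPER(car_owner)           -- function calculation on a source attribute
FROM car_infos;
```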

The Filter operator corresponds to the SQL clause WHERE, which is used for filtering data from data streams. Filter rules support arithmetic operators, relational operators, and logical operators.

Table 3-21 Filter operator element parameter configuration

Parameter Description

Source Attributes Displays the data source, attribute name, and type that are specified in Source Element.

Filter Rules Click Add Rule to specify a filter rule. You can add multiple rules. Click Delete to delete the corresponding filter rule.

Output Attributes Displays the attribute name and type.
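A filter rule in the generated SQL becomes a WHERE condition. The following hedged sketch (with hypothetical stream and attribute names) combines the three supported operator classes:

```sql
-- Arithmetic (*), relational (>, <>), and logical (AND, OR) operators in one rule.
INSERT INTO speeding_cars
SELECT car_id, car_speed
FROM car_infos
WHERE car_speed > 120
  AND (car_owner <> 'test' OR car_speed * 2 > 300);
```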

The Union operator is used to combine multiple streams. Ensure that the streams have the same attributes, including the attribute type and attribute sequence. Specifically, the attributes in the same row of each source element must have the same Type setting.


Table 3-22 Union operator element parameter configuration

Parameter Description

Output Attributes Displays the attribute name and type.
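In SQL form, combining streams is typically expressed with UNION ALL (the non-deduplicating union commonly supported in streaming SQL). This is a sketch with hypothetical stream names; both inputs must expose the same attributes in the same order with the same types, as required above:

```sql
INSERT INTO all_cars
SELECT car_id, car_speed FROM cars_beijing
UNION ALL
SELECT car_id, car_speed FROM cars_shanghai;
```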

3. Click a sink element, for example, sink_dis_1. In the area displayed on the right side, configure parameters related to the element, including those involved in Data Stream Attribute Settings and Element Parameter Settings.

Table 3-23 Parameters to be configured when DIS serves as the sink element

Parameter Description

Data Stream Attribute Settings

Click Add Attribute, and specify Attribute Name and Attribute Type. Attribute Name must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 20 characters are allowed. Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP. In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings

Type Indicates the sink element type. The options are as follows, depending on the sink element:
– sink-dis
– sink-cloudtable
– sink-smn
– sink-rds

Region Indicates the region where a user resides.

DIS Stream This parameter is valid only when DIS is selected under Sink Element. Select a DIS stream.

Partition Key This parameter is valid only when DIS is selected under Sink Element. Indicates the key used for data grouping when the DIS sink stream has multiple partitions. Multiple keys are separated by commas (,).

Encoding This parameter is valid only when DIS is selected under Sink Element. Indicates the data encoding mode, which can be CSV or JSON.


Parameter Description

Field Delimiter This parameter is valid only when DIS is selected under Sink Element. Indicates the delimiter between attributes. The default value is a comma (,).
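When converted to SQL (see Step 9), a DIS sink roughly corresponds to a CREATE SINK STREAM statement. The following is a hedged sketch only: the stream name, channel, region, and the exact WITH parameter keys are illustrative assumptions and should be verified against the SQL Syntax Reference.

```sql
CREATE SINK STREAM car_infos_out (
  car_id    STRING,
  car_owner STRING,
  car_speed INT
) WITH (
  type            = "dis",
  region          = "cn-north-1",
  channel         = "csoutput",   -- DIS Stream
  partition_key   = "car_owner",  -- Partition Key (comma-separated for multiple keys)
  encode          = "csv",        -- Encoding: CSV or JSON
  field_delimiter = ","           -- Field Delimiter (CSV encoding only)
);
```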

Table 3-24 Parameters to be configured when CloudTable serves as the sink element

Parameter Description

Data Stream Attribute Settings

Click Add Attribute, and specify Attribute Name and Attribute Type. Attribute Name must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 20 characters are allowed. Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP. In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings

Type Indicates the sink element type. The options are as follows, depending on the sink element:
– sink-dis
– sink-cloudtable
– sink-smn
– sink-rds

Region Indicates the region where a user resides.

Table Name This parameter is valid only when CloudTable is selected under Sink Element. Indicates the name of the data table to be written.

Cluster ID This parameter is valid only when CloudTable is selected under Sink Element. Indicates the ID of the cluster to which the target data table belongs.

Table Columns This parameter is valid only when CloudTable is selected under Sink Element. The format is rowKey,f1:c1,f1:c2,f2:c1. The number of columns must be the same as the number of attributes specified in the source element.


Parameter Description

Abnormal Table This parameter is valid only when CloudTable is selected under Sink Element. Indicates the table for dumping abnormal data, that is, data that cannot be written into HBase according to the specified configuration. If this field is specified, abnormal data is written into the specified table; if it is left unspecified, abnormal data is discarded.

Empty Table This parameter is valid only when CloudTable is selected under Sink Element. Indicates whether to create a table if the target table or column family to which data is to be written does not exist. The default value is FALSE.

Data Records This parameter is valid only when CloudTable is selected under Sink Element. Indicates the amount of data to be written in a batch. The value must be a positive integer. The upper limit is 100. The default value is 10.
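In generated SQL, a CloudTable (HBase) sink might look like the sketch below. All identifiers, and the exact WITH parameter keys, are illustrative assumptions; confirm them in the SQL Syntax Reference.

```sql
CREATE SINK STREAM qualified_cars (
  car_id    STRING,
  car_speed INT
) WITH (
  type                  = "cloudtable",
  region                = "cn-north-1",
  cluster_id            = "example-cluster-id",  -- Cluster ID (placeholder)
  table_name            = "car_infos",           -- Table Name
  table_columns         = "rowKey,info:speed",   -- rowKey plus family:qualifier columns
  illegal_data_table    = "error_info",          -- Abnormal Table
  create_if_not_exist   = "true",                -- Empty Table
  batch_insert_data_num = "20"                   -- Data Records (max 100, default 10)
);
```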

Table 3-25 Parameters to be configured when SMN serves as the sink element

Parameter Description

Data Stream Attribute Settings

Click Add Attribute, and specify Attribute Name and Attribute Type. Attribute Name must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 20 characters are allowed. Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP. In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings

Type Indicates the sink element type. The options are as follows, depending on the sink element:
– sink-dis
– sink-cloudtable
– sink-smn
– sink-rds

Region Indicates the region where a user resides.


Parameter Description

Topic URN This parameter is valid only when SMN is selected under Sink Element. Indicates the topic URN.

Message Subject This parameter is valid only when SMN is selected under Sink Element. Indicates the subject of the message sent to SMN.

Column Name This parameter is valid only when SMN is selected under Sink Element. Indicates the name of the output stream column whose value is used as the message content.
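An SMN sink in generated SQL might take the following shape. The topic URN, column names, and WITH parameter keys are placeholders and assumptions, not guaranteed syntax; check the SQL Syntax Reference.

```sql
CREATE SINK STREAM over_speed_warning (
  over_speed_message STRING
) WITH (
  type            = "smn",
  region          = "cn-north-1",
  topic_urn       = "urn:smn:cn-north-1:example:warning",  -- Topic URN (placeholder)
  message_subject = "Over-speed warning",                  -- Message Subject
  message_column  = "over_speed_message"                   -- Column Name
);
```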

Table 3-26 Parameters to be configured when RDS serves as the sink element

Parameter Description

Data Stream Attribute Settings

Click Add Attribute, and specify Attribute Name and Attribute Type. Attribute Name must start with a letter and can contain only letters, digits, and underscores (_). A maximum of 20 characters are allowed. Supported attribute types include STRING, INT, BIGINT, BOOLEAN, DOUBLE, FLOAT, and TIMESTAMP. In the attribute list, click Delete in the row where the attribute you want to delete resides.

Element Parameter Settings

Type Indicates the sink element type. The options are as follows, depending on the sink element:
– sink-dis
– sink-cloudtable
– sink-smn
– sink-rds

Region Indicates the region where a user resides.

Username This parameter is valid only when RDS is selected under Sink Element. Indicates the username (for example, root) specified when the RDS database instance was created.

Password This parameter is valid only when RDS is selected under Sink Element. Indicates the password specified during RDS database instance creation.


Parameter Description

DB URL This parameter is valid only when RDS is selected under Sink Element. The value combines the private network IP address, port number, and database name of the node where the database is located, in the following format: mysql://192.168.0.12:8635/dbName

Table Name This parameter is valid only when RDS is selected under Sink Element. Indicates the name of the table created in the target database.
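Putting the RDS parameters together, the generated SQL might resemble the following sketch. Credentials, addresses, and WITH parameter keys are placeholders and assumptions; the db_url format follows the description above.

```sql
CREATE SINK STREAM audi_cheaper_than_30w (
  car_id    STRING,
  car_owner STRING,
  car_price INT
) WITH (
  type       = "rds",
  username   = "root",
  password   = "********",                          -- placeholder
  db_url     = "mysql://192.168.0.12:8635/dbName",  -- private IP:port/DB name
  table_name = "audi_cheaper_than_30w"              -- Table Name
);
```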

Step 9 (Optional) Click SQL Editor to convert the information in the visual editor into SQL statements.

Step 10 (Optional) Click Save to save the job parameter settings.

Step 11 If you need to run the job, click Submit.

----End

3.6.8 Performing Operations on a Job

After a job is created, you can perform operations on the job as required.

l Editing a Job
l Starting Jobs
l Job Configuration List
l Stopping Jobs
l Deleting Jobs

Editing a Job

You can edit a created job, for example, by modifying the SQL statement, job name, job description, or job configurations.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the row where the job you want to edit is located, click Edit in the Operation column to switch to the Edit page.

Step 3 Edit the job as required.

For details about the Edit page for Flink streaming SQL jobs, see Creating a Flink Streaming SQL Job.

For details about the Edit page for Flink streaming SQL edge jobs, see Creating a Flink Streaming SQL Edge Job.

For details about the Edit page for user-defined Flink jobs, see Creating a User-Defined Flink Job.


For details about the Edit page for user-defined Spark jobs, see Creating a User-Defined Spark Job.

----End

Starting Jobs

To start created jobs or jobs that have been stopped, perform the following steps:

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 Use either of the following methods to start jobs:
l Starting a single job
Select a job and click Start in the Operation column. Alternatively, you can select the job, click Start in the upper left area, and click OK in the displayed Start Job dialog box.
l Starting multiple jobs in batches
Select multiple jobs and click Start in the upper left corner of the job list.

Step 3 On the Job Configuration List page, click OK.

After a job is started, its status is displayed in the Status column on the Job Management page.

----End

Job Configuration List

Upon submitting or starting a job, you need to confirm the job costs.

Step 1 You will enter the Job Configuration List page after submitting a job on the Edit page or after starting a job on the Job Management page.
l Submit a job on the Edit page
a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
b. In the row where the job you want to edit is located, click Edit in the Operation column to switch to the Edit page.
c. Edit SQL statements and configure parameters on the Running Parameters page.
d. Click Submit to switch to the Job Configuration List page.
l Start a job on the Job Management page
a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
b. In the row where the job you want to start is located, click Start in the Operation column to switch to the Job Configuration List page.

On the Job Configuration List page, you can click Price Details to view the product price details.

Step 2 (Optional) Click Price Calculator in the lower right corner.

1. You can view the price details on the Product Price Details page.


2. You can learn the fee calculation methods on the Price Calculator page.

On the Price Calculator page, you can work out the optimal product configuration through multiple attempts.

Step 3 (Optional) Click Cancel to cancel the operation of running a job.

Step 4 After confirming the configuration fee, click OK to submit the job.

----End

Stopping Jobs

You can stop jobs that are in the Running or Submitting status.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 Stop jobs.

Use either of the following methods to stop jobs:
l Stopping a single job
In the Operation column of the job that you want to stop, choose More > Stop. Alternatively, you can select the job and click Stop above the job list.
l Stopping jobs in batches
Select the jobs that you want to stop and click Stop above the job list.

Step 3 In the displayed dialog box, click OK.

While jobs are being stopped, the following Status settings may appear:
l If Status is Stopping, the job is being stopped.
l If Status is Stopped, the job has been stopped.
l If Status is Stop failed, the job failed to be stopped.

----End

Deleting Jobs

A deleted job cannot be restored. Therefore, exercise caution when deleting a job.

Step 1 In the left navigation pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 Use either of the following methods to delete jobs:
l Deleting a single job
In the Operation column of the job that you want to delete, choose More > Delete. Alternatively, you can select the job and click Delete above the job list.
l Deleting jobs in batches
Select the jobs that you want to delete and click Delete above the job list.

Step 3 Click OK.

----End


3.6.9 Monitoring a Job

After a job is created, you can view the job details through the following operations:

l Viewing Job Details
l Checking the Dashboard
l Viewing the Job Execution Plan
l Viewing the Task List of a Job
l Querying Job Audit Logs
l Viewing Job Running Logs

Viewing Job Details

This section describes how to view job details. After you create and run a job, you can view job details, including SQL statements and parameter settings. For a user-defined job, you can only view its parameter settings.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column, click the job name to switch to the Job Details page.

On the Job Details page, you can view the SQL statements, Parameter List, and total cost of the job.

Table 3-27 Parameters

Parameter Description

Type Indicates the type of an SQL job.

ID Indicates the job ID.

Status Indicates the status of a job.

Running Mode If you create a job in a shared cluster, this parameter is Shared. If you create a job in a user-defined cluster, this parameter is Exclusively.

Job Cluster If you create a job in a shared cluster, this parameter is Cluster Shared. If you create a job in a user-defined cluster, the specific cluster name is displayed.

Creation Time Indicates the time when a job is created.

Start Time Indicates the start time of a job.

Total Billing Time Indicates the total running duration of a job for charging.

SPU Indicates the number of SPUs for a job.

Parallelism Indicates the number of tasks that a CS job runs simultaneously.


Parameter Description

Enable Checkpoint Select Enable Checkpoint to save intermediate job running results to OBS, thereby preventing data loss in the event of exceptions.

Checkpoint Interval (s) This parameter is valid only when Enable Checkpoint is set to true. Indicates the interval at which intermediate job running results are stored to OBS.

Checkpoint Mode This parameter is valid only when Enable Checkpoint is set to true. Indicates the checkpoint mode. Values include:
l AtLeastOnce: indicates that events are processed at least once.
l ExactlyOnce: indicates that events are processed only once.

Save Job Log Select Save Job Log to save job run logs to OBS so that you can locate faults by using the run logs in the event of failures.

Data Storage Address Indicates the name of the OBS bucket where data is dumped.

----End

Checking the Dashboard

You can view details about job data input and output through the dashboard.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name. On the displayed page, click Job Monitoring.

The following table describes monitoring metrics related to Spark jobs.

Table 3-28 Monitoring metrics related to Spark jobs

Metric Description

Input Size Provides the number of input records for a Spark job.

Processing Time Provides the processing time distribution chart of all mini-batch tasks.

Scheduling Delay Provides the scheduling delay distribution chart of all mini-batch tasks.

Total Delay Provides the total scheduling delay of all mini-batch tasks.

The following table describes monitoring metrics related to Flink jobs.

Cloud Stream ServiceUser Guide 3 Operation Guide

Issue 15 (2018-07-02) Huawei Proprietary and ConfidentialCopyright © Huawei Technologies Co., Ltd.

54

Page 59: User Guide - developer-res-cbc-cn.obs.cn-north-1 ... · data is input manually or using Object Storage Service (OBS) buckets, the correct SQL statement logic will export results as

Table 3-29 Monitoring metrics related to Flink jobs

Metric Description

Data Input Rate Provides the data input rate of a Flink job. Unit: Data records/s

Total Input Records Provides the total number of input data records of a Flink job. Unit: Data records

Total Input Bytes Provides the total input bytes of a Flink job. Unit: Byte

Data Output Rate Provides the data output rate of a Flink job. Unit: Data records/s

Total Output Records Provides the total number of output data records of a Flink job. Unit: Data records

Total Output Bytes Provides the total output bytes of a Flink job. Unit: Byte

Job Manager Memory Provides the memory occupied by Job Manager of a Flink job.

Thread Count Provides the total number of Job Manager JVM threads of a Flink job.

Task Statistics Provides the number of task slots in a Flink job.

Step 3 (Optional) Click Real-Time Refresh to refresh the metrics of running jobs in real time. The charts are updated every 10 seconds.

Step 4 Click the add icon. In the displayed Add Chart dialog box, set the parameters as required.

Step 5 (Optional) Click the delete icon to delete a metric.

Step 6 Click Add to add a metric. You can view the added chart on the Job Monitoring page.

----End

Viewing the Job Execution Plan

You can view the execution plan to learn about the operator stream information of a running job.

NOTE

Execution plans of Spark jobs cannot be viewed.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name. On the displayed page, click Execution Plan.

l Scroll the mouse wheel or click the zoom buttons to zoom in or out on the stream diagram.


l The stream diagram displays the operator stream information of the running job in real time.

----End

Viewing the Task List of a Job

You can view details about each task running in a job, including the task start time, the number of received and transmitted bytes, and the running duration.

NOTE

Task lists of Spark jobs cannot be viewed.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name. On the displayed page, click Task List.

1. View the operator task list.

Table 3-30 Parameter description

Parameter Description

Name Indicates the name of an operator.

Duration Indicates the running duration of an operator.

Parallelism Indicates the number of parallel tasks in an operator.

Task The numbers of operator tasks are color-coded as follows:
– The number in red indicates the number of failed tasks.
– The number in light gray indicates the number of canceled tasks.
– The number in yellow indicates the number of tasks that are being canceled.
– The number in green indicates the number of finished tasks.
– The number in blue indicates the number of running tasks.
– The number in sky blue indicates the number of tasks that are being deployed.
– The number in dark gray indicates the number of queued tasks.

Status Indicates the status of an operator task.

Sent Records Indicates the number of data records sent by an operator.

Sent Bytes Indicates the number of bytes sent by an operator.

Received Bytes Indicates the number of bytes received by an operator.

Received Records Indicates the number of data records received by an operator.

Start Time Indicates the time when an operator starts running.


Parameter Description

End Time Indicates the time when an operator stops running.

2. Click the expand icon next to an operator to view its task list.

Table 3-31 Parameter description

Parameter Description

Start Time Indicates the time when a task starts running.

End Time Indicates the time when a task stops running.

Duration Indicates the task running duration.

Received Bytes Indicates the number of bytes received by a task.

Received Records Indicates the number of data records received by a task.

Sent Bytes Indicates the number of bytes sent by a task.

Sent Records Indicates the number of data records sent by a task.

Attempts Indicates the number of retry attempts after a task is suspended.

Host Indicates the host IP address of the operator.

----End

Querying Job Audit Logs

You can view job operation records, such as job creation, submission, running, and stopping, in audit logs.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name to switch to the Job Details page.

Step 3 Click Audit Log to view audit logs of the job.

A maximum of 50 logs can be displayed. For more audit logs, query them in CTS. For details about how to view audit logs in CTS, see section "Querying Real-Time Traces" in the Cloud Trace Service Quick Start.

NOTE

If no information is displayed on the Audit Log page, you need to enable CTS.

1. Click Enable to switch to the CTS Authorization page.

2. Click OK.

You can also log in to the CTS management console to enable CTS. For details, see Enabling CTS.


Table 3-32 Parameters related to audit logs

Parameter Description

Event Name Indicates the name of an event.

Resource Name Indicates the name of a running job.

Resource ID Indicates the ID of a running job.

Type Indicates the operation type.

Level Indicates the event level. Values include the following:
l incident
l warning
l normal

Operator Indicates the account used to run a job.

Generated Indicates the time when an event occurs.

Source IP Address Indicates the IP address of the operator.

Operation Result Indicates the operation result.

----End

Viewing Job Running Logs

You can view the run logs to locate the faults occurring during job running.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name. On the displayed page, click Running Log.

On the displayed page, you can view information on the JobManager and TaskManager pages for running jobs.

Information on the JobManager and TaskManager pages is updated every minute. Only the latest running logs are displayed. For more running logs, download them from the corresponding OBS bucket.

If the job is not running, information on the TaskManager page cannot be viewed.

----End

3.7 Template Management

When you use SQL jobs, the system provides SQL job templates. To create an SQL job, you can modify the SQL statements in an existing template as required, which saves the time needed to compile SQL statements. Alternatively, you can customize a template as required, facilitating future modification. You can create and manage templates on the Template Management page. Information about all created templates is displayed in the template list


on the Template Management page. If a large number of templates are created, you can turn pages to view them.

This section describes the following:

l Template list
l Creating a Template
l Creating a Job Based on a Template
l Viewing Template Details
l Modifying a Template
l Deleting Templates

Template list

All custom templates are displayed in the custom template list on the Template Management page. Table 3-33 describes the parameters involved in the custom template list.

Table 3-33 Parameters involved in the custom template list

Parameter Description

Name Indicates the name of a template. The name contains 1 to 64 characters and can contain only letters, digits, hyphens (-), and underscores (_).

Description Indicates the description of a template. It contains 0 to 512 characters.

Creation Time Indicates the time when a template is created.

Updated Time Indicates the latest time when a template is modified.

Operation l You can click Edit to modify a created template.
l You can click Create Job to create a job directly by using the template. After the job is created, the system switches to the Edit page under Job Management.
l You can click Delete to delete a created template.

Table 3-34 Button description

Button Description

Create Click Create to create a custom template.

Delete Click Delete to delete one or more custom templates.

In the search box, enter the template name and click the search icon to search for the template.

Click the refresh icon to manually refresh the template list.


Creating a Template

You can create a template using any of the following methods:

l Creating a template on the Template Management page

a. In the left navigation pane of the CS management console, choose Template Management > Custom Template.

b. Click Create to switch to the Create Template dialog box.

c. Specify Name and Description.

Table 3-35 Parameters related to the template configuration

Parameter Description

Name Indicates the name of a template. The name contains 1 to 64 characters and can contain only letters, digits, hyphens (-), and underscores (_).
NOTE
The template name must be unique.

Description Indicates the description of a template. It contains 0 to 512 characters.

d. Click OK to enter the Edit page.

The following table lists operations allowed on the Edit page.

Table 3-36 Operations allowed on the Edit page

Name Description

SQL statement editing area In this area, you can enter detailed SQL statements to implement business logic. For details about how to compile SQL statements, see SQL Syntax Reference.

Save Save the compiled SQL statements.

Save As Save a created template as a new template. This function is optional.

Modify the template name and description. This function is optional.

Format SQL statements. This function is optional. After SQL statements are formatted, you need to compile SQL statements again.

Set the font size, line wrap, and page style. This function is optional.


Name Description

Provides product documents to help users understand the product and its usage. This function is optional.

e. In the SQL statement editing area, enter SQL statements to implement business logic. For details about how to compile SQL statements, see SQL Syntax Reference.

f. After the SQL statements are compiled, click Save.
After a template is created successfully, it is displayed in the custom template list. You can click Create Job in the Operation column of the template you have created to create a job based on the template. For details about how to create a job, see Creating a Flink Streaming SQL Job.

l Creating a template based on an existing template

a. In the left navigation pane of the CS management console, choose Template Management > Custom Template.

b. On the row where the desired template is located in the custom template list, click Edit under Operation to enter the Edit page.

c. Click Save As.
d. In the Template Save As dialog box that is displayed, specify Name and Description, and click OK.

l Creating a template using a created job

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. On the Job Management page, click Create to switch to the Create Job dialog box.

c. Specify parameters as required.
d. Click OK to enter the Edit page.
e. After the SQL statements are compiled, click Set to Template.
f. In the Set to Template dialog box that is displayed, specify Name and Description, and click OK.

l Creating a template based on an existing job

a. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

b. In the job list, locate the row where the job that you want to set as a template resides, and click Edit in the Operation column.

c. Click Set to Template.
d. In the Set to Template dialog box that is displayed, specify Name and Description, and click OK.

Creating a Job Based on a Template

You can create jobs based on sample templates or custom templates.


Step 1 In the navigation tree on the left pane of the CS management console, click Template Management to switch to the Template Management page.

Step 2 On the row where the desired template is located, click Create Job under Operation. For details, see Creating a Flink Streaming SQL Job.

----End

Viewing Template Details

Step 1 In the navigation tree on the left pane of the CS management console, click Template Management to switch to the Template Management page.

Step 2 In the Name column of the sample template list or custom template list, click the name of the template you want to view.

The template description and SQL statements involved in the current template are displayed.

Step 3 (Optional, only available for custom templates) Click Audit Log to view the operation logs of the current custom template.

For details about audit logs of a template, see Querying Template Audit Logs.

Step 4 (Optional, only available for custom templates) Click Edit in the upper right corner. You can edit the custom template on the Edit page.

----End

Modifying a Template

You can modify created custom templates as required. Sample templates cannot be modified.

Step 1 In the left navigation pane of the CS management console, choose Template Management > Custom Template.

Step 2 In the row where the template you want to modify is located in the custom template list, click Edit in the Operation column to enter the Edit page.

Step 3 In the SQL statement editing area, modify the SQL statements as required.

Step 4 (Optional) Click to modify the template name and description.

Step 5 Click Save.

NOTE

You can access the Edit page through the Template Details page. The procedure is as follows:

1. In the custom template list, click the name of the template you want to modify to switch to the Template Details page.

2. Click Edit to enter the Edit page.

----End

Deleting Templates

You can delete custom templates as required. Sample templates cannot be deleted. Deleted templates cannot be restored. Exercise caution when performing this operation.


Step 1 In the left navigation pane of the CS management console, choose Template Management > Custom Template.

Step 2 In the custom template list, select the templates you want to delete and click Delete.

Step 3 In the displayed dialog box, click OK.

----End

3.8 Cluster Management

Cluster management provides users with exclusive clusters that are physically isolated and not affected by other jobs. User-defined jobs can run only on exclusive clusters. To use user-defined jobs, you must create an exclusive cluster.

This section describes the following:

l Cluster List
l Creating a Cluster
l Viewing Cluster Information
l Adding an IP-Domain Mapping
l Modifying a Cluster
l Job Management
l Stopping a Cluster
l Restarting a Cluster
l Deleting a Cluster

Cluster List

All clusters are listed in the cluster list on the Cluster Management page. Table 3-37 describes parameters involved in the cluster list.

Table 3-37 Parameters involved in the cluster list

Parameter Description

ID Indicates the ID of a cluster, which is automatically allocated during cluster creation.

Name Indicates the name of a cluster. The name contains 1 to 64 characters and can contain only English letters, digits, hyphens (-), and underscores (_).


Parameter Description

Status Indicates the cluster status. Possible values are as follows:
l Creating
l ECS creation failed
l Ready to start
l Starting
l Running
l Stopping
l Stop failed
l Stopped
l Restarting
l Deleting
l Deleted
l Delete failed
l Arrears, Stopping
l Arrears, Stop failed
l Arrears stopped
l Restoring (recharged cluster)
l Thaw recovery failed

Description Indicates the description of a cluster. It contains 0 to 512 characters.

SPU Usage (Used SPUs/SPU Quota) Displays the SPU usage of a cluster. For example, in 2/12, 2 indicates the number of SPUs used by the cluster and 12 indicates the SPU quota of the cluster.

Creation Time Indicates the time when a cluster is created.

Operation
l You can click Job Management to perform operations on all jobs in the cluster.
l You can click Delete to delete a created cluster.
l Choose More > Stop to stop the target cluster.
l Choose More > Start to start the target cluster.

Creating a Cluster

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 On the Tenant Cluster page, click Create Cluster.

Step 3 In the displayed dialog box, specify parameters as required.


Table 3-38 Parameters related to cluster configuration

Parameter Description

Name Indicates the name of a cluster. The name contains 1 to 100 characters and can contain only letters, digits, hyphens (-), and underscores (_).
NOTE

The cluster name must be unique.

Description Indicates the description of a cluster. It contains 0 to 512 characters.

SPU Quota Indicates the number of available SPUs (excluding basic resource consumption of the cluster) for a job. The value ranges from 1 to 400. The default value is 12.

Advanced Settings On this page, you can customize the network segment of the VPC to which the cluster belongs. Currently, the following network segments are supported: 10.0.0.0/8~24, 172.16.0.0/12~24, and 192.168.0.0/16~24.
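The supported network segments can be checked before cluster creation with a short sketch. The notation 10.0.0.0/8~24 is read here as "a network inside 10.0.0.0/8 with a prefix length from /8 to /24"; the candidate CIDRs below are illustrative, not part of this guide:

```python
import ipaddress

# Supported VPC segments: a candidate CIDR must lie inside one of these
# supernets and use a prefix length within the stated range.
ALLOWED = [
    (ipaddress.ip_network("10.0.0.0/8"), range(8, 25)),
    (ipaddress.ip_network("172.16.0.0/12"), range(12, 25)),
    (ipaddress.ip_network("192.168.0.0/16"), range(16, 25)),
]

def is_supported_segment(cidr: str) -> bool:
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(supernet) and net.prefixlen in lengths
               for supernet, lengths in ALLOWED)

print(is_supported_segment("192.168.1.0/24"))  # True
print(is_supported_segment("10.0.0.0/30"))     # False: prefix longer than /24
```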

Step 4 Click OK to enter the Cluster Configuration List page.

Step 5 After confirming the cluster configuration, click OK.

The system automatically switches to the Cluster Management page, and Status of the created cluster is Creating. A cluster is successfully created only when Status is Running.

It takes about 1 to 3 minutes to create a cluster.

----End

Viewing Cluster Information

You can view the detailed information about created clusters.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 On the row where the cluster you want to view is located, click the cluster name in the Name column to switch to the Cluster Details page.

On the Cluster Details page, you can view detailed information about the current cluster.

Click Test address connectivity. In the dialog box that is displayed, enter the address to be tested and click OK to test whether the connection between the current cluster and the specified address is normal. The address can be a domain name or IP address plus a specified port.

Step 3 Click VPC Peering to display information about the VPC peering connection of the current cluster.

For details about the VPC peering connection, see VPC Peering Connection.

Step 4 Click IP Domain Mapping to display information about the IP-domain mapping of the current cluster.

For details about how to add an IP-domain mapping, see Adding an IP-Domain Mapping.


Step 5 Click Audit Log to display operation logs of the current cluster.

For details about cluster audit logs, see Querying Cluster Audit Logs.

----End

Adding an IP-Domain Mapping

After creating a cluster, you can add an IP-domain mapping for the cluster to connect to other services.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 On the row where the target cluster is located, click the cluster name in the Name column to switch to the Cluster Details page.

Step 3 Click IP Domain Mapping to display information about the IP-domain mapping of the current cluster.

Step 4 To create an IP-domain mapping, click Create IP Domain Mapping. In the displayed dialog box, specify Domain and IP and click OK.

After an IP-domain mapping is created successfully, the current CS cluster can interconnect with the mapped IP address.

NOTE

l The domain name can contain only letters, digits, hyphens (-), and dots (.), and must start and end with a letter or digit. It contains a maximum of 67 characters.

l To edit an IP-domain mapping in the mapping list, locate the row where the IP-domain mapping is located and click Edit in the Operation column.

l To delete an IP-domain mapping in the mapping list, locate the row where the IP-domain mapping is located and click Delete in the Operation column.
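The domain naming rule in the note above can be approximated with a regular expression. This is an illustrative sketch of the stated constraints, not the service's actual validation logic, and the sample domain names are assumptions:

```python
import re

# Approximation of the stated rule: letters, digits, hyphens, and dots;
# starts and ends with a letter or digit; at most 67 characters total.
DOMAIN_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9.-]{0,65}[A-Za-z0-9])?$")

def is_valid_mapping_domain(domain: str) -> bool:
    return bool(DOMAIN_RE.fullmatch(domain))

print(is_valid_mapping_domain("kafka-01.example.com"))  # True
print(is_valid_mapping_domain("-bad.example.com"))      # False: starts with "-"
```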

----End

Modifying a Cluster

You can modify the cluster name, description, and SPU quota of a cluster in Running status.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 On the row where the cluster you want to modify is located, click the cluster name in the Name column to switch to the Cluster Details page.

Step 3 Click next to Name to change the cluster name.

Step 4 Click next to Description to modify the description of the cluster.

Step 5 Click next to the SPU Quota to modify the SPU quota of the cluster.

NOTE

If the used SPUs reach the cluster's SPU quota, increase the SPU quota. Otherwise, you cannot create jobs in the cluster.

----End


Job Management

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 Locate the row where the target cluster resides, click Job Management in the Operation column to switch to the Job Management page of the cluster.

NOTE

l Only the jobs for which the cluster is selected during job creation are displayed.

l On the Job Management page in Cluster Management, you can only view, start, stop, and delete jobs in the cluster. For details, see Performing Operations on a Job.

----End

Stopping a Cluster

To stop a cluster, perform the following operations:

After a cluster is stopped, all jobs in the cluster are stopped. Exercise caution when performing this operation.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 Locate the row where the cluster you want to stop is located, click More > Stop in the Operation column to switch to the Stop Cluster page.

Step 3 Click OK.

----End

Restarting a Cluster

To restart a stopped cluster, perform the following operations:

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 Locate the row where the cluster you want to restart is located, click More > Restart in theOperation column.

It takes 1 to 3 minutes to restart the cluster.

----End

Deleting a Cluster

If you do not need to use a cluster, perform the following operations to delete it:

A deleted cluster cannot be restored and all jobs in the deleted cluster will be stopped. Exercise caution when performing this operation.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 Locate the row where the cluster you want to delete is located, click Delete in the Operation column to switch to the Delete Cluster dialog box.


Step 3 Click OK.

----End

3.9 User Quota Management

You can manage sub-users on the User Quota Management page, such as creating sub-users, allocating SPU quotas for sub-users, binding sub-users to clusters, and unbinding sub-users from clusters.

This section is organized into the following parts:

l Sub-user List
l Creating a Sub-user
l Modifying a Sub-User

Sub-user List

All sub-users of a tenant are displayed in the sub-user list on the User Quota Management page. Table 3-39 describes parameters involved in the sub-user list.

Table 3-39 Parameters involved in the sub-user list

Parameter Description

Name Indicates the username of a sub-user.

User ID Indicates the ID of a sub-user, which is automatically allocated by thesystem during sub-user creation.

Used SPU Indicates the number of SPUs used by a sub-user.

SPU Quota Indicates the total number of SPUs that can be used by a sub-user based on allocated clusters. The value ranges from 1 to 1000.

Cluster List Indicates the cluster that is allocated for a sub-user. Users can create jobs in the allocated clusters.

Operation You can click Save Configuration to save the configuration of the SPU quota and bound cluster for a sub-user.

Creating a Sub-user

Users can create up to 50 sub-users in a tenant.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 Switch to the User Quota Management page. Click Create Sub-User, and the system automatically switches to the User page of IAM.

Step 3 Click Create User. The Create User page is displayed.


You can create sub-users as required. For details, see Creating Users in the Identity and Access Management User Guide.

NOTE

User groups are not involved in CS. Therefore, you do not need to select a user group when creating a user in IAM.

You can view the created sub-user in the sub-user list on the User Quota Management page under Cluster Management in the CS management console.

Step 4 (Optional) Click Refresh User Information to refresh the sub-user list in real time.

----End

Modifying a Sub-User

After a sub-user is created, you can reallocate the SPU quota and cluster list for the sub-user as required.

Step 1 In the left navigation pane of the CS management console, click Cluster Management. On the displayed Cluster Management page, click User Quota Management.

Step 2 In the sub-user list, locate the row where the target sub-user resides, and reconfigure the value in the SPU Quota column.

The SPU quota of a sub-user ranges from 1 to 1000.

Step 3 In the Cluster List column, select the clusters that are allocated to the sub-user. One or more clusters are allowed.

Step 4 Click Save Settings. In the displayed dialog box, click OK.

----End

3.10 VPC Peering Connection

A VPC peering connection is a network connection between two VPCs. Users in two VPCs can use private IP addresses to communicate with each other as if the two VPCs were on the same network. To enable two VPCs to communicate with each other, you can create a VPC peering connection between them. CS allows users to create VPC peering connections between the VPCs where exclusive CS clusters are created and other VPCs. If you have created an ECS instance when using CS, you can click VPC Peering to connect the created CS clusters to the ECS instance.

For more information about VPC peering connections, see VPC Peering Connection in the Virtual Private Cloud User Guide.

Prerequisites

You have created a tenant cluster.

Procedure

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.


Step 2 On the row of the cluster you want to query, click the cluster name in the Name column to switch to the Cluster Details page. Click VPC Peering.

Step 3 On the displayed page, click Create VPC Peering Connection. In the displayed dialog box, specify Name and Peer VPC, and click OK.

Step 4 Click Accept Request.

Step 5 After the status of the VPC peering connection becomes Accepted, click Add Route. In the displayed dialog box, specify parameters in Local Route and Peer Route, and click OK.

Figure 3-11 Adding a route

NOTE

l Parameters Destination in Local Route and Peer Route have been automatically set by the system. Generally, retain the default values. If there are custom requirements, modify them as required.

l You can click View Peer VPC or View Local VPC to show information about the peer or local VPC.

l After a VPC peering connection is created, you can run the job used for accessing ECSs in the peer VPC in the current cluster. However, ECS security groups may have different configurations and you may not be allowed to access ports on the peer end. In this case, configure the security group rule of the corresponding ECS and add rules on corresponding ports in the inbound and outbound directions. For details about how to configure the security group rule for an ECS, see Configuring a Security Group Rule in the Elastic Cloud Server User Guide.

l CIDRs must not overlap at both ends of a VPC peering connection. During cluster creation, you can configure the VPC network segment where the cluster resides. Ensure that the configured network segment does not conflict with that of the peer end.
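The CIDR non-overlap rule in the note above can be verified before creating the connection. The following sketch uses Python's standard ipaddress module; the cluster and peer CIDRs are illustrative values, not defaults of the service:

```python
import ipaddress

def cidrs_overlap(local_cidr: str, peer_cidr: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    local = ipaddress.ip_network(local_cidr)
    peer = ipaddress.ip_network(peer_cidr)
    return local.overlaps(peer)

# Example: a cluster VPC on 192.168.0.0/20 and a peer VPC on 192.168.16.0/20
print(cidrs_overlap("192.168.0.0/20", "192.168.16.0/20"))  # False: safe to peer
print(cidrs_overlap("192.168.0.0/16", "192.168.16.0/20"))  # True: conflict
```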

Step 6 (Optional) If the VPC peering connection is not required, click Delete.


After the VPC peering connection is deleted, communication between CS clusters and the peer end will be interrupted. Therefore, exercise caution when deleting a VPC peering connection.

----End

3.11 Audit Log

You can use CTS to record key operation events related to CS. The events can be used in various scenarios such as security analysis, compliance audit, resource tracing, and problem locating. This section is organized as follows:

l Enabling CTS
l Disabling the Audit Log Function
l Key Operations
l Querying Job Audit Logs
l Querying Cluster Audit Logs
l Querying Template Audit Logs

Enabling CTS

A tracker will be automatically created after CTS is enabled. All traces recorded by CTS are associated with a tracker. Currently, only one tracker can be created for each account.

Step 1 On the CS management console, choose Service List > Management & Deployment > Cloud Trace Service. The CTS management console is displayed.

Step 2 In the navigation pane on the left, click Tracker.

Step 3 Click Enable CTS.

Step 4 On the Enable CTS page that is displayed, click Enable.

If you enable Apply to All Regions, the tracker is created in all regions of the current site to improve the completeness and accuracy of the current tenant's audit logs.

After CTS is enabled, the system automatically assigns a tracker. You can view details about the created tracker on the Tracker page.

----End

Disabling the Audit Log Function

If you want to disable the audit log function, disable the tracker in CTS.

Step 1 On the CS management console, choose Service List > Management & Deployment > Cloud Trace Service. The CTS management console is displayed.

Step 2 In the navigation pane on the left, click Tracker.

Step 3 In the tracker list, click Disable in the Operation column.

Step 4 In the displayed dialog box, click OK to disable the tracker.


After the tracker is disabled, the Disable button in the Operation column changes to Enable. To enable the tracker again, click Enable and then click OK. The system will start recording operations again.

After the tracker is disabled, the system will stop recording operations, but you can still view existing operation records.

----End

Key Operations

Table 3-40 describes the CS operations that can be recorded by CTS.

Table 3-40 CS operations that can be recorded by CTS

Operation Resource Type Event Name

Creating a job job createNewJob

Editing a job job editJob

Deleting a job job deleteJob

Starting a job job startJob

Stopping a job job stopJob

Deleting jobs in batches job deleteJobInBatch

Creating a template template createTemplate

Updating a template template updateTemplate

Deleting a template template deleteTemplate

Stopping jobs of an overdue account job stopArrearageJob

Restoring jobs of an overdue account job recoverArrearageJob

Deleting jobs of an overdue account job deleteArrearageJob

Creating a cluster cluster createCluster

Deleting a cluster cluster deleteCluster

Adding nodes to a cluster cluster scalaUpCluster

Downsizing a cluster cluster scalaDownCluster

Expanding or downsizing a cluster cluster scalaCluster

Creating a tenant cluster cluster createReservedCluster

Updating a tenant cluster cluster updateReservedCluster


Operation Resource Type Event Name

Deleting a tenant cluster cluster deleteReservedCluster

Updating the user quota cluster updateUserQuota

Querying Job Audit Logs

You can view the job operation records in audit logs, such as job creation, submission, running, and stopping.

Step 1 In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.

Step 2 In the Name column on the Job Management page, click the desired job name to switch to the Job Details page.

Step 3 Click Audit Log to view audit logs of the job.

A maximum of 50 logs can be displayed. For more audit logs, query them in CTS. For details about how to view audit logs in CTS, see section "Querying Real-Time Traces" in the Cloud Trace Service Quick Start.

NOTE

If no information is displayed on the Audit Log page, you need to enable CTS.

1. Click Enable to switch to the CTS Authorization page.

2. Click OK.

You can also log in to the CTS management console to enable CTS. For details, see Enabling CTS.

Table 3-41 Parameters related to audit logs

Parameter Parameter description

Event Name Indicates the name of an event.

Resource Name Indicates the name of a running job.

Resource ID Indicates the ID of a running job.

Type Indicates the operation type.

Level Indicates the event level. Values include the following:
l incident
l warning
l normal

Operator Indicates the account used to run a job.

Generated Indicates the time when an event occurs.

Source IP Address Indicates the IP address of the operator.

Operation Result Indicates the operation result.


----End

Querying Cluster Audit Logs

Cluster management allows you to view audit logs of a cluster.

Step 1 In the navigation tree on the left pane of the CS management console, click Cluster Management to switch to the Cluster Management page.

Step 2 In the Name column on the Cluster Management page, click the desired cluster name to switch to the Cluster Details page.

Step 3 Click Audit Log to view audit logs of the cluster.

A maximum of 50 logs can be displayed. For more audit logs, query them in CTS. For details about how to view audit logs in CTS, see section "Querying Real-Time Traces" in the Cloud Trace Service Quick Start.

NOTE

l If no information is displayed on the Audit Log page, you need to enable CTS.

1. Click Enable to switch to the CTS Authorization page.

2. Click OK.

You can also log in to the CTS management console to enable CTS. For details, see Enabling CTS.

l If CTS has been enabled for Audit Log under Job Management, you do not need to enable it for Audit Log under Cluster Management.

Table 3-42 Parameters related to audit logs

Parameter Parameter description

Event Name Indicates the name of an event.

Resource Name Indicates the name of a running cluster.

Resource ID Indicates the ID of a running cluster.

Type Indicates the cluster operation type.

Level Indicates the event level. Values include the following:
l incident
l warning
l normal

Operator Indicates the account used to run a cluster.

Generated Indicates the time when an event occurs.

Source IP Address Indicates the IP address of the operator.

Operation Result Indicates the operation result.

----End


Querying Template Audit Logs

You can view audit logs of a custom template by performing operations on the Custom Template page.

Step 1 In the left navigation pane of the CS management console, choose Template Management > Custom Template.

Step 2 In the Name column, click the name of a template whose audit logs you want to view to switch to the Template Details page.

Step 3 Click Audit Log to view audit logs of the template.

A maximum of 50 logs can be displayed. For more audit logs, query them in CTS. For details about how to view audit logs in CTS, see section "Querying Real-Time Traces" in the Cloud Trace Service Quick Start.

NOTE

If no information is displayed on the Audit Log page, you need to enable CTS.

1. Click Enable to switch to the CTS Authorization page.

2. Click OK.

You can also log in to the CTS management console to enable CTS. For details, see Enabling CTS.

Table 3-43 Parameters related to audit logs

Parameter Parameter description

Event Name Indicates the name of an event.

Resource Name Indicates the template name.

Resource ID Indicates the ID of a template.

Type Indicates the template operation type.

Level Indicates the event level. Values include the following:
l incident
l warning
l normal

Operator Indicates the account used to operate a template.

Generated Indicates the time when an event occurs.

Source IP Address Indicates the IP address of the operator.

Operation Result Indicates the operation result.

----End


4 SQL Syntax Reference

4.1 Syntax Constraints

- Currently, Stream SQL supports only SELECT, FROM, WHERE, UNION, aggregation, and JOIN syntax based on stream tables.
- Data cannot be added into the source stream.
- The sink stream cannot be used to perform query operations.
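The constraints above can be illustrated with a minimal sketch; the stream names temp, OrderA, and OrderB here are hypothetical:

```sql
-- Hypothetical streams: OrderA and OrderB are sources, temp is the sink.
-- Only SELECT/FROM/WHERE/JOIN on stream tables; the sink is written to, never queried.
INSERT INTO temp
SELECT a.order_id, b.price
FROM OrderA a JOIN OrderB b ON a.order_id = b.order_id
WHERE b.price > 10;
```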

Data Types Supported by Syntax

- Basic data types: VARCHAR, STRING, BOOLEAN, TINYINT, SMALLINT, INTEGER/INT, BIGINT, REAL/FLOAT, DOUBLE, DECIMAL, DATE, TIME, and TIMESTAMP
- Array: Square brackets ([]) are used to quote fields. For example:
  insert into temp select CARDINALITY(ARRAY[1,2,3]) FROM OrderA;

4.2 Data Type

Overview

Data type is a basic attribute of data and is used to distinguish different types of data. Data of different types occupies different storage space and supports different operations. Data is stored in data tables in the database. Each column of a data table defines a data type. During storage, data must be stored according to its data type.

Similar to the open source community, Stream SQL of the Huawei big data platform supports both native data types and complex data types.

Native Data Types

Table 4-1 lists the native data types supported by Stream SQL.


Table 4-1 Native data types

Data Type | Description | Storage Space | Value Range
VARCHAR | Character with a variable length | - | -
BOOLEAN | Boolean | - | TRUE/FALSE
TINYINT | Signed integer | 1 byte | -128 to 127
SMALLINT | Signed integer | 2 bytes | -32768 to 32767
INT | Signed integer | 4 bytes | -2147483648 to 2147483647
INTEGER | Signed integer | 4 bytes | -2147483648 to 2147483647
BIGINT | Signed integer | 8 bytes | -9223372036854775808 to 9223372036854775807
REAL | Single-precision floating point | 4 bytes | -
FLOAT | Single-precision floating point | 4 bytes | -
DOUBLE | Double-precision floating point | 8 bytes | -
DECIMAL | Data type with fixed significant digits and decimal places | - | -
DATE | Date type in the format of yyyy-MM-dd, for example, 2014-05-29 | - | DATE does not contain time information. Its value ranges from 0000-01-01 to 9999-12-31.
TIME | Time type in the format of HH:MM:SS, for example, 20:17:40 | - | -
TIMESTAMP(3) | Timestamp of date and time, for example, 1969-07-20 20:17:40 | - | -
INTERVAL timeUnit [TO timeUnit] | Time interval, for example, INTERVAL '1:5' YEAR TO MONTH or INTERVAL '45' DAY | - | -


Complex Data Types

Table 4-2 lists complex data types supported by Stream SQL.

Table 4-2 Complex data types

Data Type | Description
ARRAY | A group of ordered fields that must be of the same data type.
MAP | A group of unordered key/value pairs. Keys must be of the native data type, whereas values can be of the native data type or a complex data type. All keys or values in a map must be of the same data type.

4.3 Operator

Relational Operators

All data types can be compared by using relational operators, and the result is returned as a Boolean value.

Relational operators are binary operators. Types of the compared data must be the same, or the types must support implicit conversion.

Table 4-3 lists all relational operators supported by Stream SQL.

Table 4-3 Relational operators

Operator | Returned Data Type | Description
A = B | BOOLEAN | If A equals B, TRUE is returned; otherwise, FALSE is returned. This operator is used for value assignment.
A <> B | BOOLEAN | If A is not equal to B, TRUE is returned; otherwise, FALSE is returned. If A or B is NULL, NULL is returned. This operator follows the standard SQL syntax.
A < B | BOOLEAN | If A is less than B, TRUE is returned; otherwise, FALSE is returned. If A or B is NULL, NULL is returned.
A <= B | BOOLEAN | If A is less than or equal to B, TRUE is returned; otherwise, FALSE is returned. If A or B is NULL, NULL is returned.
A > B | BOOLEAN | If A is greater than B, TRUE is returned; otherwise, FALSE is returned. If A or B is NULL, NULL is returned.
A >= B | BOOLEAN | If A is greater than or equal to B, TRUE is returned; otherwise, FALSE is returned. If A or B is NULL, NULL is returned.


A IS NULL | BOOLEAN | If A is NULL, TRUE is returned; otherwise, FALSE is returned.
A IS NOT NULL | BOOLEAN | If A is not NULL, TRUE is returned; otherwise, FALSE is returned.
A IS DISTINCT FROM B | BOOLEAN | If A is not equal to B, TRUE is returned. NULL values are considered equal to each other.
A IS NOT DISTINCT FROM B | BOOLEAN | If A is equal to B, TRUE is returned. NULL values are considered equal to each other.
A BETWEEN [ASYMMETRIC | SYMMETRIC] B AND C | BOOLEAN | If A is greater than or equal to B but less than or equal to C, TRUE is returned.
- ASYMMETRIC: indicates that B and C are position-related. For example, "A BETWEEN ASYMMETRIC B AND C" is equivalent to "A BETWEEN B AND C".
- SYMMETRIC: indicates that B and C are not position-related. For example, "A BETWEEN SYMMETRIC B AND C" is equivalent to "(A BETWEEN B AND C) OR (A BETWEEN C AND B)".
A NOT BETWEEN B AND C | BOOLEAN | If A is less than B or greater than C, TRUE is returned.
A LIKE B [ ESCAPE C ] | BOOLEAN | If A matches pattern B, TRUE is returned. The escape character C can be defined as required.
A NOT LIKE B [ ESCAPE C ] | BOOLEAN | If A does not match pattern B, TRUE is returned. The escape character C can be defined as required.
A SIMILAR TO B [ ESCAPE C ] | BOOLEAN | If A matches regular expression B, TRUE is returned. The escape character C can be defined as required.
A NOT SIMILAR TO B [ ESCAPE C ] | BOOLEAN | If A does not match regular expression B, TRUE is returned. The escape character C can be defined as required.
value IN (value [, value]*) | BOOLEAN | If the value is equal to any value in the list, TRUE is returned.


value NOT IN (value [, value]*) | BOOLEAN | If the value is not equal to any value in the list, TRUE is returned.

Precautions

- Values of the double, real, and float types may differ in precision. The equal sign (=) is not recommended for comparing two values of the double type. You are advised to subtract the two values and take the absolute value; if the absolute value is small enough, the two double values are regarded as equal. For example:
  abs(0.9999999999 - 1.0000000000) < 0.000000001 // 0.9999999999 and 1.0000000000 have a precision of 10 decimal places, while 0.000000001 has a precision of 9 decimal places. Therefore, 0.9999999999 can be regarded as equal to 1.0000000000.
- Comparison between data of the numeric type and character strings is allowed. During comparison using relational operators (>, <, <=, and >=), data of the string type is converted to the numeric type by default. The string must not contain characters other than numeric characters.
- Character strings can be compared using relational operators.
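The tolerance-based comparison described above can be written directly in a filter. This is a sketch only; the stream and field names (temp, source_stream, price_a, price_b) are hypothetical:

```sql
-- Hypothetical streams/fields: temp, source_stream, order_id, price_a, price_b.
-- Treat two double values as equal when their difference is small enough,
-- instead of comparing them with =.
INSERT INTO temp
SELECT order_id
FROM source_stream
WHERE ABS(price_a - price_b) < 0.000000001;
```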

Logical Operators

Common logical operators are AND, OR, and NOT. Their priority order is NOT > AND > OR.

Table 4-4 lists the calculation rules. A and B indicate logical expressions.

Table 4-4 Logical operators

Operator | Returned Data Type | Description
A OR B | BOOLEAN | If A or B is TRUE, TRUE is returned. Three-valued logic is supported.
A AND B | BOOLEAN | If both A and B are TRUE, TRUE is returned. Three-valued logic is supported.
NOT A | BOOLEAN | If A is FALSE, TRUE is returned. If A is UNKNOWN, UNKNOWN is returned.
A IS FALSE | BOOLEAN | If A is FALSE, TRUE is returned. If A is UNKNOWN, FALSE is returned.
A IS NOT FALSE | BOOLEAN | If A is not FALSE, TRUE is returned. If A is UNKNOWN, TRUE is returned.
A IS TRUE | BOOLEAN | If A is TRUE, TRUE is returned. If A is UNKNOWN, FALSE is returned.


A IS NOT TRUE | BOOLEAN | If A is not TRUE, TRUE is returned. If A is UNKNOWN, TRUE is returned.
A IS UNKNOWN | BOOLEAN | If A is UNKNOWN, TRUE is returned.
A IS NOT UNKNOWN | BOOLEAN | If A is not UNKNOWN, TRUE is returned.

Precautions

Only data of the Boolean type can be used for calculation with logical operators. Implicit type conversion is not supported.
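Three-valued logic can matter in filters: a comparison involving NULL evaluates to UNKNOWN rather than FALSE. A sketch with hypothetical stream and field names:

```sql
-- Hypothetical streams/fields: temp, source_stream, name, score.
-- Rows where score is NULL make (score > 60) UNKNOWN;
-- IS UNKNOWN selects exactly those rows.
INSERT INTO temp
SELECT name
FROM source_stream
WHERE (score > 60) IS UNKNOWN;
```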

Arithmetical Operators

Arithmetic operators include binary operators and unary operators; for all of them, the returned results are of the numeric type. Table 4-5 lists the arithmetic operators supported by Stream SQL.

Table 4-5 Arithmetical operators

Operator | Returned Data Type | Description
+numeric | All numeric types | Return the number itself.
-numeric | All numeric types | Return the negative number.
A + B | All numeric types | A plus B. The result data type depends on the operand data types. For example, if a floating-point number is added to an integer, a floating-point number is returned.
A - B | All numeric types | A minus B. The result data type depends on the operand data types.
A * B | All numeric types | Multiply A and B. The result data type depends on the operand data types.
A / B | All numeric types | Divide A by B. The result is a number of the double (double-precision) data type.


POWER(A, B) | All numeric types | Return the value of A raised to the power of B.
ABS(numeric) | All numeric types | Return the absolute value of a specified value.
MOD(A, B) | All numeric types | Return the remainder (modulus) of A divided by B. A negative value is returned only when A is negative.
SQRT(A) | All numeric types | Return the square root of A.
LN(A) | All numeric types | Return the natural logarithm of A (base e).
LOG10(A) | All numeric types | Return the base-10 logarithm of A.
EXP(A) | All numeric types | Return the value of e raised to the power of A.
CEIL(A), CEILING(A) | All numeric types | Return the smallest integer that is greater than or equal to A. For example: ceil(21.2) = 22.
FLOOR(A) | All numeric types | Return the largest integer that is less than or equal to A. For example: floor(21.2) = 21.
SIN(A) | All numeric types | Return the sine value of A.
COS(A) | All numeric types | Return the cosine value of A.
TAN(A) | All numeric types | Return the tangent value of A.
COT(A) | All numeric types | Return the cotangent value of A.
ASIN(A) | All numeric types | Return the arc sine value of A.
ACOS(A) | All numeric types | Return the arc cosine value of A.
ATAN(A) | All numeric types | Return the arc tangent value of A.


DEGREES(A) | All numeric types | Convert the value A from radians to degrees.
RADIANS(A) | All numeric types | Convert the value A from degrees to radians.
SIGN(A) | All numeric types | Return the sign of A: 1 if A is positive, -1 if A is negative, and 0 otherwise.
ROUND(A, d) | All numeric types | Round A to d places to the right of the decimal point. d is of the int type. For example: round(21.263, 2) = 21.26.
PI() | All numeric types | Return the value of pi.

Precautions

Data of the string type is not allowed in arithmetic operations.
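Several of the arithmetic functions above can be combined in one projection. This is a sketch only; the stream and field names (temp, source_stream, amount) are hypothetical:

```sql
-- Hypothetical streams/fields: temp, source_stream, amount.
-- Apply a rate, round to 2 decimal places, and also emit the sign
-- and the floored value of the original amount.
INSERT INTO temp
SELECT ROUND(amount * 1.08, 2), SIGN(amount), FLOOR(amount)
FROM source_stream;
```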

String Operators

Table 4-6 lists common operation rules for string operators. A and B indicate string expressions.

Table 4-6 String operators

Operator | Returned Data Type | Description
A || B | STRING | Return the string concatenated from A and B.
CHAR_LENGTH(A) | INT | Return the number of characters in string A.
CHARACTER_LENGTH(A) | INT | Return the number of characters in string A.
UPPER(A) | STRING | Return string A in uppercase.
LOWER(A) | STRING | Return string A in lowercase.
POSITION(A IN B) | INT | Return the position where A first appears in B.
TRIM({ BOTH | LEADING | TRAILING } A FROM B) | STRING | Remove string A from the start position, the end position, or both positions of B. By default, A is removed from both the start and end positions.


OVERLAY(A PLACING B FROM integer [ FOR integer ]) | STRING | Replace part of A with B, starting from the given position.
SUBSTRING(A FROM integer) | STRING | Return the substring of A starting from the given position.
SUBSTRING(A FROM integer FOR integer) | STRING | Return the substring of A starting from the given position and with the given length.
INITCAP(A) | STRING | Return the string with the first letter of each word in uppercase and the other letters in lowercase. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
MD5(String expr) | STRING | Return the MD5 value of a string.
SHA1(String expr) | STRING | Return the SHA1 value of a string.
SHA256(String expr) | STRING | Return the SHA256 value of a string.
replace(String expr, String toreplace, String replace) | STRING | String replacement function, which replaces all occurrences of "toreplace" in the expr string with "replace".
hash_code(String expr) | INT | Return the hash value. In addition to string, the parameter can be of the int, bigint, float, or double type.
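A few of the string operators above combined in one projection. This is a sketch only; the stream and field names (temp, source_stream, name, city) are hypothetical:

```sql
-- Hypothetical streams/fields: temp, source_stream, name, city.
-- Concatenate the uppercased name with the first three characters of the city,
-- and also emit the character count of the name.
INSERT INTO temp
SELECT UPPER(name) || '_' || SUBSTRING(city FROM 1 FOR 3), CHAR_LENGTH(name)
FROM source_stream;
```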

4.4 Function

Type Conversion Function

Syntax

CAST(value AS type)

Description

This function is used to forcibly convert types.

Precautions

If the input is NULL, NULL is returned.

Example

Convert amount into a character string. The specified length of the string is invalid after the conversion.

insert into temp select cast(amount as VARCHAR(10)) from source_stream;


Time Function

Table 4-7 lists the time functions supported by Stream SQL.

Table 4-7 Time functions

Function | Returned Data Type | Description
DATE string | DATE | Parse the date string (yyyy-MM-dd) to a SQL date.
TIME string | TIME | Parse the time string (HH:mm:ss) to a SQL time.
TIMESTAMP string | TIMESTAMP | Convert the time string into a timestamp. The time string format is yyyy-MM-dd HH:mm:ss.fff.
INTERVAL string range | INTERVAL | There are two types of intervals: yyyy-MM and dd HH:mm:ss.fff. The range of yyyy-MM can be YEAR or YEAR TO MONTH, with a precision of month. The range of dd HH:mm:ss.fff can be DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, or DAY TO MILLISECONDS, with a precision of up to millisecond. For example, if the range is DAY TO SECOND, the day, hour, minute, and second are all valid and the precision is second; DAY TO MINUTE indicates that the precision is minute. Examples: INTERVAL '10 00:00:00.004' DAY TO MILLISECONDS indicates an interval of 10 days and 4 milliseconds; INTERVAL '10' DAY indicates an interval of 10 days; INTERVAL '2-10' YEAR TO MONTH indicates an interval of 2 years and 10 months.
CURRENT_DATE | DATE | Return the SQL date of the UTC time zone.
CURRENT_TIME | TIME | Return the SQL time of the UTC time zone.
CURRENT_TIMESTAMP | TIMESTAMP | Return the SQL timestamp of the UTC time zone.
LOCALTIME | TIME | Return the SQL time of the current time zone.
LOCALTIMESTAMP | TIMESTAMP | Return the SQL timestamp of the current time zone.
EXTRACT(timeintervalunit FROM temporal) | INT | Extract part of a time point or interval and return it as an int. For example, extracting the day from the date 2006-06-05 returns 5.


FLOOR(timepoint TO timeintervalunit) | TIME | Round a time point down to the given unit. For example, FLOOR(TIME '12:44:31' TO MINUTE) returns 12:44:00.
CEIL(timepoint TO timeintervalunit) | TIME | Round a time point up to the given unit. For example, CEIL(TIME '12:44:31' TO MINUTE) returns 12:45:00.
QUARTER(date) | INT | Return the quarter from a SQL date.
(timepoint, temporal) OVERLAPS (timepoint, temporal) | BOOLEAN | Check whether two intervals overlap. The time points and durations are converted into time ranges with a start point and an end point. The function evaluates leftEnd >= rightStart AND rightEnd >= leftStart; if both conditions hold, true is returned, otherwise false is returned. For example:
- If leftStart is 2:55:00, leftEnd is 3:55:00 (2:55:00 + 1:00:00), rightStart is 3:30:00, and rightEnd is 5:30:00 (3:30:00 + 2:00:00), true is returned. Specifically, (TIME '2:55:00', INTERVAL '1' HOUR) OVERLAPS (TIME '3:30:00', INTERVAL '2' HOUR) returns true.
- If leftStart is 9:00:00, leftEnd is 10:00:00, rightStart is 10:15:00, and rightEnd is 13:15:00 (10:15:00 + 3:00:00), false is returned. Specifically, (TIME '9:00:00', TIME '10:00:00') OVERLAPS (TIME '10:15:00', INTERVAL '3' HOUR) returns false.
to_localtimestamp(long expr) | TIMESTAMP | Convert the time to a local timestamp.

Precautions

None

Example

insert into temp SELECT Date '2015-10-11' FROM OrderA; // Date is returned
insert into temp1 SELECT Time '12:14:50' FROM OrderA; // Time is returned
insert into temp2 SELECT Timestamp '2015-10-11 12:14:50' FROM OrderA; // Timestamp is returned


Aggregate Functions

An aggregate function performs a calculation on a set of input values and returns a single value. For example, the COUNT function counts the number of rows retrieved by an SQL statement. Table 4-8 lists the aggregate functions.

Table 4-8 Aggregate functions

Function | Returned Data Type | Description
COUNT(value [, value]*) | BIGINT | Return the count of values that are not null.
COUNT(*) | BIGINT | Return the count of tuples.
AVG(numeric) | DOUBLE | Return the average (arithmetic mean) of all input values.
SUM(numeric) | DOUBLE | Return the sum of all input numeric values.
MAX(value) | DOUBLE | Return the maximum of all input values.
MIN(value) | DOUBLE | Return the minimum of all input values.
STDDEV_POP(value) | DOUBLE | Return the population standard deviation of the numeric fields of all input values.
STDDEV_SAMP(value) | DOUBLE | Return the sample standard deviation of the numeric fields of all input values.
VAR_POP(value) | DOUBLE | Return the population variance (square of the population standard deviation) of the numeric fields of all input values.
VAR_SAMP(value) | DOUBLE | Return the sample variance (square of the sample standard deviation) of the numeric fields of all input values.

Precautions

None

Example

None
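Although no example is given here, aggregate functions in Stream SQL are typically used together with a window. This sketch uses hypothetical stream and field names and assumes a processing-time tumbling window:

```sql
-- Hypothetical streams/fields: temp, student_scores, score.
-- Hourly count and average over a tumbling processing-time window.
INSERT INTO temp
SELECT COUNT(score), AVG(score)
FROM student_scores
GROUP BY TUMBLE(proctime, INTERVAL '1' HOUR);
```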


Array Functions

Table 4-9 Array functions

Function | Returned Data Type | Description
CARDINALITY(ARRAY) | INT | Return the element count of an array.
ELEMENT(ARRAY) | - | Return the sole element of an array with a single element. If the array contains no elements, null is returned. If the array contains multiple elements, an exception is reported.

Precautions

None

Example

The element count of the following array is 3.

insert into temp select CARDINALITY(ARRAY[TRUE, TRUE, FALSE]) from source_stream;

HELLO WORLD is returned.

insert into temp select ELEMENT(ARRAY['HELLO WORLD']) from source_stream;

Attribute Access Functions

Table 4-10 Attribute access functions

Function | Returned Data Type | Description
tableName.compositeType.field | - | Select a single field; use the name to access a field of an Apache Flink composite type, such as Tuple or POJO, and return its value.
tableName.compositeType.* | - | Select all fields, converting an Apache Flink composite type, such as Tuple or POJO, and all of its direct subtypes into a simple table. Each subtype becomes a separate field.

Precautions

None

Example

None
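As a sketch of field access on a composite type (the stream car_infos, the sink temp, and the position field are hypothetical):

```sql
-- Hypothetical stream: car_infos; position is assumed to be a composite (e.g. POJO) field.
-- Access one nested field of the composite type by name.
INSERT INTO temp
SELECT car_infos.position.latitude
FROM car_infos;
```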


4.5 Geographical Functions

Description

Table 4-11 describes the basic geospatial geometric elements.

Table 4-11 Basic geospatial geometric elements

Geospatial Geometric Element | Description | Example
ST_POINT(latitude, longitude) | Indicates a geographical point, including the latitude and longitude. | ST_POINT(1.12012, 1.23401)
ST_LINE(array[point1, ..., pointN]) | Indicates a geographical line formed by connecting multiple geographical points (ST_POINT) in sequence. The line can be a polygonal line or a straight line. | ST_LINE(ARRAY[ST_POINT(1.12, 2.23), ST_POINT(1.13, 2.44), ST_POINT(1.13, 2.44)])
ST_POLYGON(array[point1, ..., point1]) | Indicates a geographical polygon, which is a closed polygon area formed by connecting multiple geographical points (ST_POINT) with the same start and end points in sequence. | ST_POLYGON(ARRAY[ST_POINT(1.0, 1.0), ST_POINT(2.0, 1.0), ST_POINT(2.0, 2.0), ST_POINT(1.0, 1.0)])
ST_CIRCLE(point, radius) | Indicates a geographical circle that consists of an ST_POINT and a radius. | ST_CIRCLE(ST_POINT(1.0, 1.0), 1.234)

You can build complex geospatial geometries from the basic geospatial geometric elements. Table 4-12 describes the related transformation methods.

Table 4-12 Transformation methods for building complex geometric elements from basic geospatial geometric elements

Transformation Method | Description | Example
ST_BUFFER(geometry, distance) | Creates a polygon that surrounds the geospatial geometric element at a given distance. Generally, this function is used to build a road area of a certain width for yaw detection. | ST_BUFFER(ST_LINE(ARRAY[ST_POINT(1.12, 2.23), ST_POINT(1.13, 2.44), ST_POINT(1.13, 2.44)]), 1.0)


ST_INTERSECTION(geometry, geometry) | Creates a polygon that delimits the overlapping area of two given geospatial geometric elements. | ST_INTERSECTION(ST_CIRCLE(ST_POINT(1.0, 1.0), 2.0), ST_CIRCLE(ST_POINT(3.0, 1.0), 1.234))
ST_ENVELOPE(geometry) | Creates the minimal rectangular polygon containing the given geospatial geometric element. | ST_ENVELOPE(ST_CIRCLE(ST_POINT(1.0, 1.0), 2.0))

CS provides multiple functions for performing operations on, and determining the locations of, geospatial geometric elements. Table 4-13 describes the SQL scalar functions.

Table 4-13 SQL scalar functions

Function | Returned Data Type | Description
ST_DISTANCE(point1, point2) | DOUBLE | Calculates the distance between two geographic locations. Example: SELECT ST_DISTANCE(ST_POINT(x1, y1), ST_POINT(x2, y2)) FROM input
ST_PERIMETER(polygon) | DOUBLE | Calculates the perimeter of a polygon. Example: SELECT ST_PERIMETER(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)])) FROM input
ST_AREA(polygon) | DOUBLE | Calculates the area of a polygon. Example: SELECT ST_AREA(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)])) FROM input


ST_OVERLAPS(polygon1, polygon2) | BOOLEAN | Checks whether one polygon overlaps with another. Example: SELECT ST_OVERLAPS(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)]), ST_POLYGON(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23), ST_POINT(x21, y21)])) FROM input
ST_INTERSECT | BOOLEAN | Checks whether two line segments, rather than the two straight lines on which they are located, intersect each other. Example: SELECT ST_INTERSECT(ST_LINE(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12)]), ST_LINE(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23)])) FROM input
ST_WITHIN | BOOLEAN | Checks whether one point is contained inside a geometry (polygon or circle). Example: SELECT ST_WITHIN(ST_POINT(x11, y11), ST_POLYGON(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23), ST_POINT(x21, y21)])) FROM input
ST_CONTAINS | BOOLEAN | Checks whether the first geometry contains the second geometry. Example: SELECT ST_CONTAINS(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)]), ST_POLYGON(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23), ST_POINT(x21, y21)])) FROM input


ST_COVERS | BOOLEAN | Checks whether the first geometry covers the second geometry. This function is similar to ST_CONTAINS except when judging the relationship between a polygon and its boundary line, for which ST_COVERS returns TRUE and ST_CONTAINS returns FALSE. Example: SELECT ST_COVERS(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)]), ST_POLYGON(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23), ST_POINT(x21, y21)])) FROM input
ST_DISJOINT | BOOLEAN | Checks whether one polygon is disjoint from (does not overlap) the other polygon. Example: SELECT ST_DISJOINT(ST_POLYGON(ARRAY[ST_POINT(x11, y11), ST_POINT(x12, y12), ST_POINT(x11, y11)]), ST_POLYGON(ARRAY[ST_POINT(x21, y21), ST_POINT(x22, y22), ST_POINT(x23, y23), ST_POINT(x21, y21)])) FROM input

CS also provides window-based SQL geographic aggregation functions for scenarios where the SQL logic involves windows and aggregation. For details about the functions, see Table 4-14.

Table 4-14 Time-related SQL geographical aggregation functions

Function | Description | Example
AGG_DISTANCE(point) | Distance aggregation function, which calculates the total distance of all adjacent geographical points in the window. | SELECT AGG_DISTANCE(ST_POINT(x, y)) FROM input GROUP BY HOP(rowtime, INTERVAL '1' HOUR, INTERVAL '1' DAY)
AVG_SPEED(point) | Average speed aggregation function, which calculates the average speed of the moving track formed by all geographic points in a window. | SELECT AVG_SPEED(ST_POINT(x, y)) FROM input GROUP BY TUMBLE(proctime, INTERVAL '1' DAY)


Precautions

None

Example

Example of yaw detection:

INSERT INTO yaw_warning
SELECT "The car is yawing"
FROM driver_behavior
WHERE NOT ST_WITHIN(ST_POINT(cast(Longitude as DOUBLE), cast(Latitude as DOUBLE)), ST_BUFFER(ST_LINE(ARRAY[ST_POINT(34.585555,105.725221), ST_POINT(34.586729,105.735974), ST_POINT(34.586492,105.740538), ST_POINT(34.586388,105.741651), ST_POINT(34.586135,105.748712), ST_POINT(34.588691,105.74997)]), 0.001));

4.6 DDL Statement

DIS as Source Data

Create a source stream to ingest data from DIS. For details about DIS, see Related Services.

Syntax

CREATE SOURCE STREAM stream_id (attr_name attr_type (',' attr_name attr_type)*)
WITH (
  type = "dis",
  region = "",
  channel = "",
  partition_count = "",
  encode = "",
  field_delimiter = "",
  offset = ""
)
(TIMESTAMP BY timeindicator (',' timeindicator)?);

timeindicator: PROCTIME '.' PROCTIME | ID '.' ROWTIME

Description

Table 4-15 Parameter description

Parameter | Mandatory | Description
type | Yes | Indicates the data source type. dis indicates that the data source is DIS.
region | Yes | Indicates the region where the DIS that stores the data is located.
channel | Yes | Indicates the name of the DIS stream where the data is located.
partition_count | Yes | Indicates the number of partitions of the DIS stream where the data is located.


encode | Yes | Indicates the data encoding format. The value can be csv or json.
NOTE:
- If the encoding format is csv, you need to configure field delimiters.
- If the encoding format is json, you need to configure the mapping between JSON fields and stream-defined fields. For details, see the examples.

field_delimiter | Yes | Indicates the separator between every two attributes.
- This parameter must be configured if the CSV encoding format is used. It can be user-defined, for example, a comma (,).
- This parameter is not required if the JSON encoding format is used.

offset | No |
- If data is imported to the DIS stream after the job is started, this parameter is invalid.
- If the job is started after data is imported to the DIS stream, you can set this parameter as required. For example, if offset is set to 100, CS starts reading from the 100th data record in the DIS stream.

timeindicator | No | Indicates the timestamp added in the source stream. The value can be processing time or event time.
NOTE:
- If this parameter is set to processing time, the format is proctime.proctime. An attribute named proctime is then appended to the original attributes; if the original attribute field has three attributes, four attributes are exported. The number of attributes remains unchanged if the rowtime attribute is specified.
- If this parameter is set to event time, you can select an attribute in the stream as the timestamp. The format is attr_name.rowtime.
- This parameter can be set to both processing time and event time simultaneously.
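Because the syntax allows two timeindicator entries, processing time and event time can be declared in the same statement. A minimal sketch (the stream and attribute names below are illustrative, not from this guide):

```sql
CREATE SOURCE STREAM vehicle_events (plate string, speed int, event_ts long)
WITH (
  type = "dis",
  region = "southchina",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
)
TIMESTAMP BY proctime.proctime, event_ts.rowtime;  -- both timestamps declared
```

Here event_ts, of type long, carries the event time, while the appended proctime attribute carries the processing time.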

Precautions

The attribute type used as the timestamp must be long or timestamp.

Example

- CSV: reads data from the DIS stream and parses it in CSV format; fields are separated by commas (,).

CREATE SOURCE STREAM student_scores (attr1 string, attr2 int, attr3 long)
WITH (
  type = "dis",
  region = "southchina",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
)
TIMESTAMP BY attr3.rowtime;

- JSON: reads data from the DIS stream and parses it in JSON format, for example: {"student":{"name":"coco", "age":15}}.

CREATE SOURCE STREAM student_scores (attr1 string, attr2 int, attr3 long)
WITH (
  type = "dis",
  region = "southchina",
  channel = "csinput",
  partition_count = "1",
  encode = "json",
  json_config = "attr1=student.name;attr2=student.age;"
)
TIMESTAMP BY attr3.rowtime;

OBS as Source Data

Data used for creating a source stream is obtained from OBS. For details about OBS, see Related Services.

Syntax

CREATE SOURCE STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "obs",
  region = "",
  bucket = "",
  object_name = "",
  row_delimiter = "\n",
  field_delimiter = '',
  version_id = ""
)
(TIMESTAMP BY timeindicator (',' timeindicator)?);

timeindicator: PROCTIME '.' PROCTIME | ID '.' ROWTIME

Description

Table 4-16 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the data source type. obs indicates that the data source is OBS.
region | Yes | Indicates the region to which OBS belongs.
bucket | Yes | Indicates the name of the OBS bucket where the data is located.
object_name | Yes | Indicates the name of the object stored in the OBS bucket where the data is located.
row_delimiter | Yes | Indicates the separator between every two lines.


field_delimiter | Yes | Indicates the separator between every two attributes.
- This parameter must be configured if the CSV encoding format is used. It can be user-defined, for example, a comma (,).
- This parameter is not required if the JSON encoding format is used.

version_id | No | Indicates the version number. This parameter is required only when the OBS bucket or object has versioning enabled.

timeindicator | No | Indicates the timestamp added in the source stream. The value can be processing time or event time.
NOTE:
- If this parameter is set to processing time, the format is proctime.proctime. An attribute named proctime is then appended to the original attributes; if the original attribute field has three attributes, four attributes are exported. The number of attributes remains unchanged if the rowtime attribute is specified.
- If this parameter is set to event time, you can select an attribute in the stream as the timestamp. The format is attr_name.rowtime.
- This parameter can be set to both processing time and event time simultaneously.

Precautions

The attribute type used as the timestamp must be long or timestamp.

Example

The test1 file is read from the OBS bucket. Rows are separated by '\n' and columns are separated by ','.

CREATE SOURCE STREAM student_scores (attr1 string, attr2 int, attr3 long)
WITH (
  type = "obs",
  region = "southchina",
  bucket = "obssource",
  object_name = "test1",
  row_delimiter = "\n",
  field_delimiter = ',',
  version_id = "1"
)
TIMESTAMP BY attr3.rowtime;


Kafka as Source Data

Create a source stream to obtain data from Kafka. When using offline Kafka clusters, use VPC peering to connect CS to Kafka. For details about VPC, see Related Services.

Syntax

CREATE SOURCE STREAM kafka_source (name STRING, age int)
WITH (
  type = "kafka",
  kafka_bootstrap_servers = "",
  kafka_group_id = "",
  kafka_topic = "",
  encode = "json"
)
(TIMESTAMP BY timeindicator (',' timeindicator)?);

timeindicator: PROCTIME '.' PROCTIME | ID '.' ROWTIME

Description

Table 4-17 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the data source type. kafka indicates that the data source is Kafka.
kafka_bootstrap_servers | Yes | Indicates the address and port used to connect CS to Kafka. Use VPC peering to connect the CS clusters with the Kafka clusters.
kafka_group_id | Yes | Indicates the consumer group ID.
kafka_topic | Yes | Indicates the Kafka topic to be read.
encode | Yes | Indicates the decoding format. Only the JSON format is supported.
timeindicator | No | Indicates the timestamp added in the source stream. The value can be processing time or event time.
NOTE:
- If this parameter is set to processing time, the format is proctime.proctime. An attribute named proctime is then appended to the original attributes; if the original attribute field has three attributes, four attributes are exported. The number of attributes remains unchanged if the rowtime attribute is specified.
- If this parameter is set to event time, you can select an attribute in the stream as the timestamp. The format is attr_name.rowtime.
- This parameter can be set to both processing time and event time simultaneously.

Precautions


- The attribute type used as the timestamp must be long or timestamp.
- If the Kafka server listens on the port using a hostname, you need to add the mapping between the hostname and IP address of the Kafka broker node to the CS cluster. For details, see Adding an IP-Domain Mapping.

Example

Read Kafka topic test.

CREATE SOURCE STREAM kafka_source (name STRING, age int)
WITH (
  type = "kafka",
  kafka_bootstrap_servers = "ip1:port1,ip2:port2",
  kafka_group_id = "sourcegroup1",
  kafka_topic = "test",
  encode = "json"
);

CloudTable as Source Data

Create a source stream to obtain data from CloudTable. For details about CloudTable, see Related Services.

Syntax

CREATE SOURCE STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "cloudtable",
  region = "",
  cluster_id = "",
  table_name = "",
  table_columns = ""
)
(TIMESTAMP BY timeindicator (',' timeindicator)?);

timeindicator: PROCTIME '.' PROCTIME | ID '.' ROWTIME

Description

Table 4-18 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the data source type. cloudtable indicates that the data source is CloudTable.
region | Yes | Indicates the region to which CloudTable belongs.
cluster_id | Yes | Indicates the ID of the cluster to which the data table to be read belongs.
table_name | Yes | Indicates the name of the data table to be read.
table_columns | Yes | Indicates the columns to be read. The format is rowKey,f1:c1,f1:c2,f2:c1. The number of columns must be the same as the number of attributes specified in the source stream.


timeindicator | No | Indicates the timestamp added in the source stream. The value can be processing time or event time.
NOTE:
- If this parameter is set to processing time, the format is proctime.proctime. An attribute named proctime is then appended to the original attributes; if the original attribute field has three attributes, four attributes are exported. The number of attributes remains unchanged if the rowtime attribute is specified.
- If this parameter is set to event time, you can select an attribute in the stream as the timestamp. The format is attr_name.rowtime.
- This parameter can be set to both processing time and event time simultaneously.

Precautions

The attribute type used as the timestamp must be long or timestamp.

Example

Read the student table from CloudTable.

CREATE SOURCE STREAM student_scores (attr1 string, attr2 int, attr3 long)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  table_name = "student",
  table_columns = "rowKey,info:name,info:age,course:math,course:science"
)
TIMESTAMP BY attr3.rowtime;

DIS as Sink Data Storage

Data is stored to DIS. For details about DIS, see Related Services.

Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "dis",
  region = "",
  channel = "",
  partition_key = "",
  encode = "",
  field_delimiter = ""
);

Description


Table 4-19 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value dis indicates that data is stored to DIS.
region | Yes | Indicates the region where DIS for storing the data is located.
channel | Yes | Indicates the DIS stream.
partition_key | Yes | Indicates the grouping primary key. Multiple primary keys are separated by commas (,).
encode | Yes | Indicates the data encoding format. The value can be csv or json.
NOTE:
- If the encoding format is csv, you need to configure field delimiters.
- If the encoding format is json, you need to configure whether to generate empty fields. For details, see the examples.
field_delimiter | Yes | Indicates the separator between every two attributes.
- This parameter must be configured if the CSV encoding format is used. It can be user-defined, for example, a comma (,).
- This parameter is not required if the JSON encoding format is used.

Precautions

None

Example

- CSV: data is written to the DIS stream in CSV format, with fields separated by commas (,). If there are multiple partitions, attr1 is used as the key to distribute data to different partitions. For example: xiaohong,12,22.

CREATE SINK STREAM stream_a (attr1 string, attr2 int, attr3 int)
WITH (
  type = "dis",
  region = "southchina",
  channel = "csoutput",
  partition_key = "attr1",
  encode = "csv",
  field_delimiter = ","
);

- JSON: data is written to the DIS stream in JSON format. If there are multiple partitions, attr1 and attr2 are used as the keys to distribute data to different partitions. If enable_output_null is set to true, empty fields (with value null) are generated; if it is set to false, no empty fields are generated. For example: "attr1":"xiaohong", "attr2":12, "attr3":22.

CREATE SINK STREAM stream_b (attr1 string, attr2 int, attr3 int)
WITH (
  type = "dis",
  region = "southchina",
  channel = "csoutput",
  partition_key = "attr1,attr2",
  encode = "json",
  enable_output_null = "false"
);

RDS as Sink Data Storage

Data is stored to RDS. For details about RDS, see Related Services.

Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "rds",
  username = "",
  password = "",
  db_url = "",
  table_name = ""
);

Description

Table 4-20 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value rds indicates that data is stored to RDS.
username | Yes | Indicates the username of the database.
password | Yes | Indicates the password of the database.
db_url | Yes | Indicates the database connection address, in the format {database_type}://ip:port/database. Currently, two database types are supported: MySQL and PostgreSQL.
- MySQL: 'mysql://ip:port/database'
- PostgreSQL: 'postgresql://ip:port/database'
table_name | Yes | Indicates the name of the table where data will be added.


db_columns | No | Indicates the mapping between attributes in the output stream and columns in the database table. This parameter must be configured in the order of the attributes in the output stream.
Example:
create sink stream a3 (student_name string, student_age int)
with (
  type = "rds",
  username = "root",
  password = "xxxxxxxx",
  db_url = "mysql://192.168.0.102:8635/test1",
  db_columns = "name,age",
  table_name = "t1"
);
In this example, student_name corresponds to the name column in the database, and student_age corresponds to the age column.
NOTE: If db_columns is not configured, the number of attributes in the output stream may be less than the number of columns in the database table, provided that the extra columns are all nullable or have default values.

primary_key | No | To update data in the table in real time by using a primary key, add the primary_key configuration item (c_timeminute in the following example) when creating the sink stream. During data writes, if the specified primary_key value exists, the row is updated; otherwise, it is inserted.
Example:
CREATE SINK STREAM test (c_timeminute LONG, c_cnt LONG)
WITH (
  type = "rds",
  username = "root",
  password = "xxxxxxxx",
  db_url = "mysql://192.168.0.12:8635/test",
  table_name = "test",
  primary_key = "c_timeminute"
);

Precautions

The stream format defined by stream_id must be the same as the table format.

Example


Data in streamA is exported to the OrderA table in the test database.

CREATE SINK STREAM streamA (attr1 string, attr2 int, attr3 int)
WITH (
  type = "rds",
  username = "root",
  password = "xxxxxxxx",
  db_url = "mysql://localhost:3306/test",
  table_name = "OrderA"
);

SMN as Sink Data Storage

Data is stored to SMN. For details about SMN, see Related Services.

Syntax

CREATE SINK STREAM stream_id xxx
WITH (
  type = "smn",
  region = "",
  topic_urn = "",
  message_subject = "",
  message_column = ""
)

Description

Table 4-21 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value smn indicates that data is stored to SMN.
region | Yes | Indicates the region to which SMN belongs.
topic_urn | Yes | Indicates the URN of an SMN topic. The SMN topic serves as the destination for short message notification and needs to be created in SMN.
message_subject | Yes | Indicates the message subject sent to SMN. This parameter can be user-defined.
message_column | Yes | Indicates a column name in the sink stream. The contents of that column serve as the message contents, which are user-defined. Currently, only text messages (default) are supported.

Precautions

None

Example

over_speed_warning data is exported to SMN.

CREATE SINK STREAM over_speed_warning (
  over_speed_message STRING /* over speed message */
)
WITH (
  type = "smn",
  region = "southchina",
  topic_urn = "urn:smn:cn-north-1:38834633fd6f4bae813031b5985dbdea:ddd",
  message_subject = "message title",
  message_column = "over_speed_message"
);

Kafka as Sink Data Storage

Output data to Kafka. When using offline Kafka clusters, use VPC peering to connect CS to Kafka. For details about VPC, see Related Services.

Syntax

CREATE SINK STREAM kafka_sink (name STRING)
WITH (
  type = "kafka",
  kafka_bootstrap_servers = "",
  kafka_topic = "",
  encode = "json"
)

Description

Table 4-22 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. kafka indicates that data is stored to Kafka.
kafka_bootstrap_servers | Yes | Indicates the address and port used to connect CS to Kafka. Use VPC peering to connect the CS clusters with the Kafka clusters.
kafka_topic | Yes | Indicates the Kafka topic into which CS writes data.
encode | Yes | Indicates the encoding format. Only the JSON format is supported.

Precautions

If the Kafka server listens on the port using a hostname, you need to add the mapping between the hostname and IP address of the Kafka broker node to the CS cluster. For details, see Adding an IP-Domain Mapping.

Example

Output the data in the kafka_sink stream to Kafka.

CREATE SINK STREAM kafka_sink (name STRING)
WITH (
  type = "kafka",
  kafka_bootstrap_servers = "ip1:port1,ip2:port2",
  kafka_topic = "testsink",
  encode = "json"
);

CloudTable (HBase) as Sink Data Storage

Output data to CloudTable (HBase). For details about CloudTable, see Related Services.


Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "cloudtable",
  region = "",
  cluster_id = "",
  table_name = "",
  table_columns = "",
  create_if_not_exist = ""
)

Description

Table 4-23 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value cloudtable indicates that data is stored to CloudTable (HBase).
region | Yes | Indicates the region to which CloudTable belongs.
cluster_id | Yes | Indicates the ID of the cluster to which the data table to be written belongs.
table_name | Yes | Indicates the name of the data table to be written.
table_columns | Yes | Indicates the columns to be written. The format is rowKey,f1:c1,f1:c2,f2:c1. The number of columns must be the same as the number of attributes specified in the sink stream.
create_if_not_exist | No | Indicates whether to create the table or column into which the data is written when it does not exist. The value can be true or false; false is used by default.

Precautions

None

Example

Output data of stream studentB to CloudTable (HBase).

CREATE SINK STREAM studentB (attr1 string, attr2 string, attr3 int)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  table_name = "student_pass_exam",
  table_columns = "rowKey,info:name,info:age,course:math,course:science",
  create_if_not_exist = "true"
)

CloudTable (OpenTSDB) as Sink Data Storage

Output data to CloudTable (OpenTSDB). For details about CloudTable, see Related Services.


Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "opentsdb",
  region = "",
  cluster_id = "",
  tsdb_metrics = "",
  tsdb_timestamps = "",
  tsdb_values = "",
  tsdb_tags = "",
  batch_insert_data_num = ""
)

Description

Table 4-24 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value opentsdb indicates that data is stored to CloudTable (OpenTSDB).
region | Yes | Indicates the region to which CloudTable belongs.
cluster_id | Yes | Indicates the ID of the cluster into which data is to be inserted.
tsdb_metrics | Yes | Indicates the metric of a data point, which can be specified through parameter configurations.
tsdb_timestamps | Yes | Indicates the timestamp of a data point. The data type can be TIMESTAMP, LONG, or INT. Only dynamic columns are supported.
tsdb_values | Yes | Indicates the value of a data point. The data type can be SHORT, INT, LONG, FLOAT, DOUBLE, or STRING. Dynamic columns or constant values are supported.
tsdb_tags | Yes | Indicates the tags of a data point. Each data point carries at least one and at most eight tags, which can be specified through parameter configurations.
batch_insert_data_num | No | Indicates the amount of data to be written in a batch. The value must be a positive integer. The upper limit is 100. The default value is 8.

Precautions

None

Example

Output data of stream weather_out to CloudTable (OpenTSDB).

CREATE SINK STREAM weather_out (
  timestamp_value LONG, /* Time */
  temperature FLOAT,    /* Temperature value */
  humidity FLOAT,       /* Humidity */
  location STRING       /* Location */
)
WITH (
  type = "opentsdb",
  region = "$region_placeholder",
  cluster_id = "e05649d6-00e2-44b4-b0ff-7194adaeab3f",
  tsdb_metrics = "weather",
  tsdb_timestamps = "${timestamp_value}",
  tsdb_values = "${temperature}; ${humidity}",
  tsdb_tags = "location:${location},signify:temperature; location:${location},signify:humidity",
  batch_insert_data_num = "10"
);

Cloud Search Service as Sink Data Storage

Data is exported to Cloud Search Service. For details about Cloud Search Service, see Related Services.

Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "es",
  region = "",
  cluster_address = "",
  es_index = "",
  es_type = "",
  es_fields = "",
  batch_insert_data_num = ""
);

Description

Table 4-25 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value es indicates that data is stored to Cloud Search Service.
region | Yes | Indicates the region where Cloud Search Service is located.
cluster_address | Yes | Indicates the private network address of the Cloud Search Service cluster, for example: x.x.x.x:x.
es_index | Yes | Indicates the index storing the data to be inserted.
es_type | Yes | Indicates the document type of the data to be inserted.
es_fields | Yes | Indicates the keys of the data fields to be inserted, in the format "id,f1,f2,f3,f4". Ensure that the value maps one-to-one to the data columns in the sink stream. If no document ID is used, remove the id keyword, that is, use the format "f1,f2,f3,f4,f5".


batch_insert_data_num | Yes | Indicates the amount of data to be written in a batch. The value must be a positive integer. The upper limit is 100. The default value is 10.

Precautions

1. Ensure that you have created a cluster on Cloud Search Service using your account. For details about how to create a cluster on Cloud Search Service, see the Cloud Search Service User Guide.

2. In this scenario, jobs must run on an exclusive CS cluster, so CS must interconnect with the VPC that has been connected with Cloud Search Service. You can also set security group rules as required.
For details about how to set up a VPC peering connection, see VPC Peering Connection.
For details about how to configure security group rules, see Security Group in the Virtual Private Cloud User Guide.

Example

Data is exported to the cluster on Cloud Search Service.

CREATE SINK STREAM stream_a (attr1 string, attr2 int, attr3 int, attr4 int, attr5 int)
WITH (
  type = "es",
  region = "$region_placeholder",
  cluster_address = "192.168.0.212:9200",
  es_index = "school",
  es_type = "student",
  es_fields = "id,name,age,math,science",
  batch_insert_data_num = "10"
);

Redis as Sink Data Storage

Data is exported to DCS Redis. For details about DCS, see Related Services.

Syntax

CREATE SINK STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "dcs_redis",
  region = "",
  cluster_address = "",
  password = "",
  value_type = "",
  key_value = ""
);

Description

Table 4-26 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value dcs_redis indicates that data is exported to DCS Redis.


region | Yes | Indicates the region where DCS for storing the data is located.
cluster_address | Yes | Indicates the Redis instance connection address.
password | No | Indicates the Redis instance connection password. This parameter is not required if password-free access is used.
value_type | Yes | Indicates the data types. Multiple data types are separated by semicolons (;). Supported data types include string, list, hash, set, and zset.
key_value | Yes | Indicates the keys and values. The number of key_value pairs must be the same as the number of types specified by value_type, and key_value pairs are separated by semicolons (;). Both keys and values can be specified through parameter configurations. A dynamic column name is represented by ${column name}.

Precautions

1. Ensure that you have created a Redis cache instance on DCS using your account. For details about how to create a Redis cache instance, see the Distributed Cache Service User Guide.

2. In this scenario, jobs must run on an exclusive CS cluster, so CS must interconnect with the VPC that has been connected with the DCS clusters. You can also set security group rules as required.
For details about how to set up a VPC peering connection, see VPC Peering Connection.
For details about how to configure security group rules, see Security Group in the Virtual Private Cloud User Guide.

Example

The data is exported to the DCS Redis cache instance.

CREATE SINK STREAM stream_a (attr1 string, attr2 int, attr3 int)
WITH (
  type = "dcs_redis",
  region = "$region_placeholder",
  cluster_address = "192.168.0.34:6379",
  password = "xxxxxxxx",
  value_type = "string; list; hash; set; zset",
  key_value = "${student_number}_str: ${student_name}; name_list: ${student_name}; ${student_number}_hash: {name:${student_name}, age: ${student_age}}; name_set: ${student_name}; math_zset: {${student_name}:${math_score}}"
);


Create a Redis Table for Connecting to the Input Stream

Data is exported to DCS Redis. For details about DCS, see Related Services.

Syntax

CREATE TABLE table_id (attr_name attr_type (',' attr_name attr_type)* )
WITH (
  type = "dcs_redis",
  cluster_address = "",
  password = "",
  value_type = "",
  key_column = "",
  hash_key_column = ""
);

Description

Table 4-27 Parameter description

Parameter | Mandatory or Not | Description
type | Yes | Indicates the output channel type. The value dcs_redis indicates that data is exported to DCS Redis.
cluster_address | Yes | Indicates the Redis instance connection address.
password | No | Indicates the Redis instance connection password. This parameter is not required if password-free access is used.
value_type | Yes | Indicates the field data type. Supported data types include string, list, hash, set, and zset.
key_column | Yes | Indicates the column name of the key attribute.
hash_key_column | No | If value_type is set to hash, this parameter must be specified as the column name of the level-2 key attribute.

Precautions

1. Ensure that you have created a Redis cache instance on DCS using your account. For details about how to create a Redis cache instance, see the Distributed Cache Service User Guide.

2. In this scenario, jobs must run on an exclusive CS cluster, so CS must interconnect with the VPC that has been connected with the DCS clusters. You can also set security group rules as required.
For details about how to set up a VPC peering connection, see VPC Peering Connection.
For details about how to configure security group rules, see Security Group in the Virtual Private Cloud User Guide.

Example

The Redis table is used to connect to the input stream.

Cloud Stream ServiceUser Guide 4 SQL Syntax Reference


CREATE TABLE table_a (attr1 string, attr2 string, attr3 string)
WITH (
  type = "dcs_redis",
  value_type = "hash",
  key_column = "attr1",
  hash_key_column = "attr2",
  cluster_address = "192.168.1.238:6379",
  password = "xxxxxxxx"
);

Temporary Stream

The temporary stream is used to simplify SQL logic. If the SQL logic is complex, you can write SQL statements concatenated by temporary streams. The temporary stream is just a logical concept and does not generate any data.

Syntax

CREATE TEMP STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )

Description

None

Precautions

None

Example

create temp stream a2(attr1 int, attr2 string);
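To illustrate the intended usage, the following sketch chains two INSERT statements through a temporary stream. The stream names and fields here are hypothetical, and the source stream Orders and sink stream result_stream are assumed to be defined elsewhere:

create temp stream filtered_orders(name string, amount int);

insert into filtered_orders SELECT name, amount FROM Orders WHERE amount > 0;

insert into result_stream SELECT name, sum(amount) FROM filtered_orders GROUP BY name;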

4.7 DML Statement

4.7.1 SQL Syntax Definition

INSERT INTO stream_name query;

query:
  values
  | { select | selectWithoutFrom | query UNION [ ALL ] query }

orderItem: expression [ ASC | DESC ]

select: SELECT { * | projectItem [, projectItem ]* } FROM tableExpression [ JOIN tableExpression ] [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ]

selectWithoutFrom: SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* }

projectItem: expression [ [ AS ] columnAlias ] | tableAlias . *


tableExpression: tableReference

tableReference: tablePrimary [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]

tablePrimary: [ TABLE ] [ [ catalogName . ] schemaName . ] tableName | LATERAL TABLE '(' functionName '(' expression [, expression ]* ')' ')' | UNNEST '(' expression ')'

values: VALUES expression [, expression ]*

groupItem: expression | '(' ')' | '(' expression [, expression ]* ')' | CUBE '(' expression [, expression ]* ')' | ROLLUP '(' expression [, expression ]* ')' | GROUPING SETS '(' groupItem [, groupItem ]* ')'
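The groupItem production above also admits grouping-set forms such as ROLLUP and CUBE. The following sketch is purely illustrative (the table and columns are invented) and assumes these forms are supported by your CS version:

insert into temp SELECT attr1, attr2, count(attr2) FROM some_table GROUP BY ROLLUP (attr1, attr2);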

4.7.2 SELECT

SELECT

Syntax

SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ]

Description

This statement is used to select data from a table.

Precautions

- The to-be-queried table must exist.
- WHERE is used to specify the filtering condition, which can be an arithmetic operator, relational operator, or logical operator.
- GROUP BY is used to specify the grouping field, which can be one or more fields.

Example

Select the orders that contain more than 3 pieces of data.

insert into temp SELECT * FROM Orders WHERE units > 3;

WHERE Filtering Clause

Syntax

SELECT { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ]

Description

This statement is used to filter the query results using the WHERE clause.


Precautions

- The to-be-queried table must exist.
- WHERE filters the records that do not meet the requirements.

Example

Filter the orders that contain more than 3 and fewer than 10 pieces of data.

insert into temp SELECT * FROM Orders WHERE units > 3 and units < 10;

HAVING Filtering Clause

Function

This statement is used to filter the query results using the HAVING clause.

Syntax

SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ]

Description

Generally, HAVING and GROUP BY are used together. GROUP BY applies first for grouping and HAVING then applies for filtering. The arithmetic operation and aggregate function are supported by the HAVING clause.

Precautions

If the filtering condition is subject to the query results of GROUP BY, the HAVING clause, rather than the WHERE clause, must be used for filtering.

Example

Group the student table according to the name field and filter the records in which the maximum score is higher than 95 based on groups.

insert into temp SELECT name, max(score) FROM student GROUP BY name HAVING max(score) > 95;

Column-Based GROUP BY

Function

This statement is used to group a table based on columns.

Syntax

SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ]

Description

Column-based GROUP BY can be categorized into single-column GROUP BY and multi-column GROUP BY.

- Single-column GROUP BY indicates that the GROUP BY clause contains only one column.


- Multi-column GROUP BY indicates that the GROUP BY clause contains multiple columns. The table will be grouped according to all fields in the GROUP BY clause. The records whose fields are the same are grouped into one group.

Precautions

None

Example

Group the student table according to the score and name fields and return the grouping results.

insert into temp SELECT name, score, max(score) FROM student GROUP BY name, score;

Expression-Based GROUP BY

Function

This statement is used to group a table according to expressions.

Syntax

SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ]

Description

groupItem can have one or more fields. The fields can be called by string functions, but cannot be called by aggregate functions.

Precautions

None

Example

Use the substring function to obtain the character string from the name field, group the student table according to the obtained character string, and return each substring and the number of records.

insert into temp SELECT substring(name,6),count(name) FROM student GROUP BY substring(name,6);

GROUP BY Using HAVING

Function

This statement filters a table after grouping it using the HAVING clause.

Syntax

SELECT [ ALL | DISTINCT ] { * | projectItem [, projectItem ]* } FROM tableExpression [ WHERE booleanExpression ] [ GROUP BY { groupItem [, groupItem ]* } ] [ HAVING booleanExpression ]

Description

Generally, HAVING and GROUP BY are used together. GROUP BY applies first for grouping and HAVING then applies for filtering.


Precautions

- If the filtering condition is subject to the query results of GROUP BY, the HAVING clause, rather than the WHERE clause, must be used for filtering.
- Fields used in HAVING, except for those used in aggregate functions, must exist in GROUP BY.
- The arithmetic operation and aggregate function are supported by the HAVING clause.

Example

Group the transactions according to num, use the HAVING clause to filter the records in which the maximum value derived from multiplying price with amount is higher than 5000, and return the filtered results.

insert into temp SELECT num, max(price*amount) FROM transactions WHERE time > '2016-06-01' GROUP BY num HAVING max(price*amount)>5000;

UNION

Syntax

query UNION [ ALL ] query

Description

This statement is used to return the union set of multiple query results.

Precautions

- Set operation is to join tables from head to tail under certain conditions. The quantity of columns returned by each SELECT statement must be the same. Column types must be the same. Column names can be different.
- By default, the repeated records returned by UNION are removed. The repeated records returned by UNION ALL are not removed.

Example

Output the union set of Orders1 and Orders2 without duplicate records.

insert into temp SELECT * FROM Orders1 UNION SELECT * FROM Orders2;
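To keep duplicate records instead, UNION ALL can be used; a minimal variant of the example above, assuming the same Orders1 and Orders2 streams:

insert into temp SELECT * FROM Orders1 UNION ALL SELECT * FROM Orders2;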

4.7.3 Condition Expression

CASE Expression

Syntax

CASE value WHEN value1 [, value11 ]* THEN result1 [ WHEN valueN [, valueN1 ]* THEN resultN ]* [ ELSE resultZ ] END

or

CASE WHEN condition1 THEN result1 [ WHEN conditionN THEN resultN ]* [ ELSE resultZ ] END

Description


- If the value of value is value1, result1 is returned. If the value is not any of the values listed in the clause, resultZ is returned. If no else statement is specified, null is returned.

- If the value of condition1 is true, result1 is returned. If the value does not match any condition listed in the clause, resultZ is returned. If no else statement is specified, null is returned.

Precautions

- All results must be of the same type.
- All conditions must be of the Boolean type.
- If the value does not match any condition, the value of ELSE is returned when the else statement is specified, and null is returned when no else statement is specified.

Example

If the value of units equals 5, 1 is returned. Otherwise, 0 is returned.

Example 1

insert into temp SELECT CASE units WHEN 5 THEN 1 ELSE 0 END FROM Orders;

Example 2

insert into temp SELECT CASE WHEN units = 5 THEN 1 ELSE 0 END FROM Orders;

NULLIF Expression

Syntax

NULLIF(value, value)

Description

If the values are the same, NULL is returned. For example, NULL is returned from NULLIF(5, 5) and 5 is returned from NULLIF(5, 0).

Precautions

None

Example

If the value of units equals 3, null is returned. Otherwise, the value of units is returned.

insert into temp SELECT NULLIF(units, 3) FROM Orders;

COALESCE Expression

Syntax

COALESCE(value, value [, value ]* )

Description

Return the first value that is not NULL, counting from left to right.

Precautions

All values must be of the same type.

Example


5 is returned from the following example:

insert into temp SELECT COALESCE(NULL, 5) FROM Orders;

4.7.4 Window

GROUP WINDOW

Description

Group Window is defined in GROUP BY. One record is generated from each group. Group Window involves the following functions:

- Array functions

Table 4-28 Array functions

Function Name Description

TUMBLE(time_attr, interval) Indicates the tumble window. time_attr can be set to processing-time or event-time. interval specifies the window period.

HOP(time_attr, interval, interval) Indicates the extended tumble window (similar to the DataStream sliding window). You can set the output triggering cycle and window period.

SESSION(time_attr, interval) Indicates the session window. A session window will be closed if no response is returned within a duration specified by interval.

- Window functions

Table 4-29 Window functions

Function Name Description

TUMBLE_START(time_attr, interval) Returns the start time of the tumble window.

TUMBLE_END(time_attr, interval) Returns the end time of the tumble window.

HOP_START(time_attr, interval, interval) Returns the start time of the extended tumble window.

HOP_END(time_attr, interval, interval) Returns the end time of the extended tumble window.

SESSION_START(time_attr, interval) Returns the start time of the session window.


Function Name Description

SESSION_END(time_attr, interval) Returns the end time of the session window.

Example

//Calculate the SUM every day (event time).
insert into temp SELECT name, TUMBLE_START(ts, INTERVAL '1' DAY) as wStart, SUM(amount) FROM Orders GROUP BY TUMBLE(ts, INTERVAL '1' DAY), name;

//Calculate the SUM every day (processing time).
insert into temp SELECT name, SUM(amount) FROM Orders GROUP BY TUMBLE(proctime, INTERVAL '1' DAY), name;

//Calculate the SUM over the recent 24 hours every hour (event time).
insert into temp SELECT product, SUM(amount) FROM Orders GROUP BY HOP(ts, INTERVAL '1' HOUR, INTERVAL '1' DAY), product;

//Calculate the SUM of each session with an inactive interval of 12 hours (event time).
insert into temp SELECT name, SESSION_START(ts, INTERVAL '12' HOUR) AS sStart, SESSION_END(ts, INTERVAL '12' HOUR) AS sEnd, SUM(amount) FROM Orders GROUP BY SESSION(ts, INTERVAL '12' HOUR), name;

Over Window

The difference between Over Window and Group Window is that one record is generated from one row in Over Window.

Syntax

OVER (
  [PARTITION BY partition_name]
  ORDER BY proctime|rowtime
  (ROWS number PRECEDING)
  | (RANGE (BETWEEN INTERVAL '1' SECOND PRECEDING AND CURRENT ROW | UNBOUNDED preceding))
)

Description

Table 4-30 Parameter description

Parameter Description

PARTITION BY Indicates the primary key of the specified group. Each group separately performs calculation.

ORDER BY Indicates the processing time or event time as the timestamp for data.


ROWS Indicates the count window.

RANGE Indicates the time window.

Precautions

- In the same SELECT statement, windows defined by aggregate functions must be the same.
- Currently, Over Window only supports forward calculation (preceding).
- The value of ORDER BY must be specified as processing time or event time.
- Constants do not support aggregation, such as sum(2).

Example

//Calculate the count and total number from the time when the job starts to now (in proctime).
insert into temp SELECT name,
  count(amount) OVER (PARTITION BY name ORDER BY proctime RANGE UNBOUNDED preceding) as cnt1,
  sum(amount) OVER (PARTITION BY name ORDER BY proctime RANGE UNBOUNDED preceding) as cnt2
FROM Orders;

//Calculate the count and total number of the recent four records (in proctime).
insert into temp SELECT name,
  count(amount) OVER (PARTITION BY name ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as cnt1,
  sum(amount) OVER (PARTITION BY name ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) as cnt2
FROM Orders;

//Calculate the count and total number of the last 60s (in eventtime). Process the events based on event time, which is the timeattr field in Orders.
insert into temp SELECT name,
  count(amount) OVER (PARTITION BY name ORDER BY timeattr RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW) as cnt1,
  sum(amount) OVER (PARTITION BY name ORDER BY timeattr RANGE BETWEEN INTERVAL '60' SECOND PRECEDING AND CURRENT ROW) as cnt2
FROM Orders;

4.7.5 JOIN Between Stream Data and Table Data

The JOIN operation allows you to query data from a table and write the query result to the sink stream.

Syntax

FROM tableExpression JOIN tableExpression ON value11 = value21 [ AND value12 = value22 ]

Description


The ON keyword only supports equivalent query of table attributes. If level-2 keys exist (specifically, the Redis value type is HASH), the AND keyword needs to be used to express the equivalent query between Key and Hash Key.

Precautions

None

Example

Perform equivalent JOIN between the vehicle information source stream and the vehicle price table, get the vehicle price data, and write the price data into the vehicle information sink stream.

CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_detail_type STRING
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
);

/** Create a data dimension table to connect to the source stream to fulfill field backfill.
 *
 * Reconfigure the following options according to actual conditions:
 *   value_type: indicates the value type of the Redis key value. The value can be STRING, HASH, SET, ZSET, or LIST. For the HASH type, you need to specify hash_key_column as the layer-2 primary key. For the SET type, you need to concatenate all queried values using commas (,).
 *   key_column: indicates the column name corresponding to the primary key of the dimension table.
 *   hash_key_column: indicates the column name corresponding to the KEY of the HASHMAP when value_type is HASH. If value_type is not HASH, you do not need to set this option.
 *   cluster_address: indicates the DCS Redis cluster address.
 *   password: indicates the DCS Redis cluster password.
 **/
CREATE TABLE car_price_table (
  car_brand STRING,
  car_detail_type STRING,
  car_price STRING
)
WITH (
  type = "dcs_redis",
  value_type = "hash",
  key_column = "car_brand",
  hash_key_column = "car_detail_type",
  cluster_address = "192.168.1.238:6379",
  password = "xxxxxxxx"
);

CREATE SINK STREAM audi_car_owner_info (
  car_id STRING,
  car_owner STRING,
  car_brand STRING,
  car_detail_type STRING,
  car_price STRING
)
WITH (


  type = "dis",
  region = "cn-north-1",
  channel = "csoutput",
  partition_key = "car_owner",
  encode = "csv",
  field_delimiter = ","
);

INSERT INTO audi_car_owner_info
SELECT t1.car_id, t1.car_owner, t2.car_brand, t1.car_detail_type, t2.car_price
FROM car_infos as t1 join car_price_table as t2
ON t2.car_brand = t1.car_brand and t2.car_detail_type = t1.car_detail_type
WHERE t1.car_brand = "audi";

4.8 Configuring Time Models

Flink provides two time models: processing time and event time.

CS allows you to specify the time model during creation of the source stream and temporary stream.

Configuring Processing Time

Processing time refers to the system time, which is irrelevant to the data timestamp.

Syntax

CREATE SOURCE STREAM stream_name(...) WITH (...)

TIMESTAMP BY proctime.proctime;

CREATE TEMP STREAM stream_name(...)

TIMESTAMP BY proctime.proctime;

Description

To set the processing time, you only need to add proctime.proctime following TIMESTAMP BY. You can directly use the proctime field later.

Precautions

None

Example

CREATE SOURCE STREAM student_scores (
  student_number STRING, /* Student ID */
  student_name STRING, /* Name */
  subject STRING, /* Subject */
  score INT /* Score */
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
)
TIMESTAMP BY proctime.proctime;

INSERT INTO score_greate_90
SELECT student_name, sum(score) over (order by proctime RANGE UNBOUNDED PRECEDING) FROM student_scores;


Configuring Event Time

Event time refers to the time when an event is generated, that is, the timestamp generated during data generation.

Syntax

CREATE SOURCE STREAM stream_name(...) WITH (...)

TIMESTAMP BY {attr_name}.rowtime

SET WATERMARK (RANGE {time_interval} | ROWS {literal}, {time_interval});

Description

To set the event time, you need to select a certain attribute in the stream as the timestamp and set the watermark policy.

Out-of-order events or late events may occur due to network faults. The watermark must be configured to trigger the window for calculation after waiting for a certain period of time. Watermarks are mainly used to process out-of-order data before generated events are sent to CS during stream processing.

The following two watermark policies are available:

- By time interval
  SET WATERMARK(range interval {time_unit}, interval {time_unit})

- By event quantity
  SET WATERMARK(rows literal, interval {time_unit})

NOTE

Parameters are separated by commas (,). The first parameter indicates the watermark sending interval and the second indicates the maximum event delay.

Precautions

None

Example

- Send the watermark every 10s with the maximum event delay of 20s.

CREATE SOURCE STREAM student_scores (
  student_number STRING, /* Student ID */
  student_name STRING, /* Name */
  subject STRING, /* Subject */
  score INT, /* Score */
  time2 BIGINT
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
)
TIMESTAMP BY time2.rowtime
SET WATERMARK (RANGE interval 10 second, interval 20 second);

INSERT INTO score_greate_90
SELECT student_name, sum(score) over (order by time2 RANGE UNBOUNDED PRECEDING) FROM student_scores;


- Send the watermark each time 10 pieces of data are received, with the maximum event delay of 20s.

CREATE SOURCE STREAM student_scores (
  student_number STRING, /* Student ID */
  student_name STRING, /* Name */
  subject STRING, /* Subject */
  score INT, /* Score */
  time2 BIGINT
)
WITH (
  type = "dis",
  region = "cn-north-1",
  channel = "csinput",
  partition_count = "1",
  encode = "csv",
  field_delimiter = ","
)
TIMESTAMP BY time2.rowtime
SET WATERMARK (ROWS 10, interval 20 second);

INSERT INTO score_greate_90
SELECT student_name, sum(score) over (order by time2 RANGE UNBOUNDED PRECEDING) FROM student_scores;

4.9 CEP Based on Pattern Matching

Complex event processing (CEP) is used to detect complex patterns in endless data streams so as to identify and search for patterns in data rows. Pattern matching is a powerful aid to complex event handling.

CEP is used in a collection of event-driven business processes, such as abnormal behavior detection in security applications and the pattern of searching for prices, transaction volume, and other behavior in financial applications. It also applies to fraud detection and sensor data analysis.

Syntax

MATCH_RECOGNIZE (
  [ PARTITION BY expression [, expression ]* ]
  [ ORDER BY orderItem [, orderItem ]* ]
  [ MEASURES measureColumn [, measureColumn ]* ]
  [ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
  [ AFTER MATCH
    ( SKIP TO NEXT ROW
    | SKIP PAST LAST ROW
    | SKIP TO FIRST variable
    | SKIP TO LAST variable
    | SKIP TO variable )
  ]
  PATTERN ( pattern )
  [ WITHIN intervalLiteral ]
  [ SUBSET subsetItem [, subsetItem ]* ]
  DEFINE variable AS condition [, variable AS condition ]*
) MR


NOTE

Pattern matching in SQL is performed using the MATCH_RECOGNIZE clause. MATCH_RECOGNIZE enables you to do the following tasks:

- Logically partition and order the data that is used in the MATCH_RECOGNIZE clause with its PARTITION BY and ORDER BY clauses.

- Define patterns of rows to seek using the PATTERN clause of the MATCH_RECOGNIZE clause. These patterns use regular expression syntax.

- Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause.

- Define measures, which are expressions usable in other parts of the SQL query, in the MEASURES clause.

Syntax Description

Table 4-31 Parameter description

Parameter Mandatory or Not Description

PARTITION BY No Logically divides the rows into groups.

ORDER BY No Logically orders the rows in a partition.

[ONE ROW | ALL ROWS] PER MATCH No Chooses summaries or details for each match.
- ONE ROW PER MATCH: Each match produces one summary row.
- ALL ROWS PER MATCH: A match spanning multiple rows will produce one output row for each row in the match.
An example is provided as follows:
SELECT * FROM MyTable MATCH_RECOGNIZE (
  MEASURES AVG(B.id) as Bid
  ALL ROWS PER MATCH
  PATTERN (A B C)
  DEFINE
    A AS A.name = 'a',
    B AS B.name = 'b',
    C as C.name = 'c'
) MR
Example description
Assume that the format of MyTable is (id, name) and there are three data records: (1, a), (2, b), and (3, c).
ONE ROW PER MATCH outputs the average value 2 of B.
ALL ROWS PER MATCH outputs each record and the average value of B, specifically, (1,a, null), (2,b,2), (3,c,2).

MEASURES No Defines calculations for export from the pattern matching.


PATTERN Yes Defines the row pattern that will be matched.
- PATTERN (A B C) indicates to detect concatenated events A, B, and C.
- PATTERN (A | B) indicates to detect A or B.
- {- A -} is valid only in ALL ROWS PER MATCH. If you specify to exclude a pattern variable from the output, then the rows matched by that pattern variable will not be output. For example:
SELECT * FROM MyTable MATCH_RECOGNIZE (
  ALL ROWS PER MATCH
  PATTERN (A {- B -} C)
  DEFINE
    A AS A.name = 'a',
    B AS B.name = 'b',
    C as C.name = 'c'
) MR
Example description
Assume that the format of MyTable is (id, name) and there are three data records: (1, a), (2, b), and (3, c). Pattern B is excluded. Patterns A, B, and C are detected, but only the following records are output: (1,a, null), (3,c,2).
- Modifiers
  - *: 0 or more iterations. For example, A* indicates to match A for 0 or more times.
  - +: 1 or more iterations. For example, A+ indicates to match A for 1 or more times.
  - ?: 0 or 1 iteration. For example, A? indicates to match A for 0 times or once.
  - {n}: n iterations (n > 0). For example, A{5} indicates to match A for five times.
  - {n,}: n or more iterations (n ≥ 0). For example, A{5,} indicates to match A for five or more times.
  - {n, m}: between n and m (inclusive) iterations (0 ≤ n ≤ m, 0 < m). For example, A{3,6} indicates to match A for 3 to 6 times.
  - {, m}: between 0 and m (inclusive) iterations (m > 0). For example, A{,4} indicates to match A for 0 to 4 times.


SUBSET No Defines union row pattern variables.
In the following example, E is a combination of B and C. The average value of E.id is the average value of the subset BC.
SELECT * FROM MyTable MATCH_RECOGNIZE (
  MEASURES AVG(E.id) as eid
  ONE ROW PER MATCH
  PATTERN (A B C A)
  SUBSET E = (B,C)
  DEFINE
    A AS A.name = 'a',
    B AS B.name = 'b',
    C as C.name = 'c'
) MR

DEFINE Yes Defines primary pattern variables.

AFTER MATCH SKIP

No Defines where to restart the matching process after a match is found.
- SKIP TO NEXT ROW: Resumes pattern matching at the row after the first row of the current match.
- SKIP PAST LAST ROW: Resumes pattern matching at the next row after the last row of the current match.
- SKIP TO FIRST variable: Resumes pattern matching at the first row that is mapped to the pattern variable.
- SKIP TO LAST variable: Resumes pattern matching at the last row that is mapped to the pattern variable.
- SKIP TO variable: Same as SKIP TO LAST variable.
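As a compact illustration of the pattern modifiers and AFTER MATCH options described above, the following sketch reuses the hypothetical MyTable format (id, name) from the examples in this table. It matches one a row followed by one to three b rows, then resumes matching past the last matched row; support for quantified patterns depends on your CS version:

SELECT * FROM MyTable MATCH_RECOGNIZE (
  MEASURES COUNT(B.id) as bCnt
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B{1,3})
  DEFINE
    A AS A.name = 'a',
    B AS B.name = 'b'
) MR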

Functions Supported by CEP

Table 4-32 Function description

Function Description

MATCH_NUMBER() Finds which rows are in which match. It can be used in the MEASURES and DEFINE clauses.

CLASSIFIER() Finds which pattern variable applies to which rows. It can be used in the MEASURES and DEFINE clauses.


FIRST()/LAST() FIRST returns the value of an expression evaluated in the first row of the group of rows mapped to a pattern variable. LAST returns the value of an expression evaluated in the last row of the group of rows mapped to a pattern variable. In PATTERN (A B+ C), FIRST(B.id) indicates the ID of the first B in the match, and LAST(B.id) indicates the ID of the last B in the match.

NEXT()/PREV() Relative offset, which can be used in DEFINE. For example, PATTERN (A B+) DEFINE B AS B.price > PREV(B.price).

RUNNING/FINAL RUNNING indicates to match the middle value, while FINAL indicates to match the final result value. Generally, RUNNING/FINAL is valid only in ALL ROWS PER MATCH. For example, if there are three records (a, 2), (b, 6), and (c, 10), then the values of RUNNING AVG(A.price) and FINAL AVG(A.price) are (2,6), (4,6), (6,6).

Aggregate functions (COUNT, SUM, AVG, MAX, MIN) Aggregation operations. These functions can be used in the MEASURES and DEFINE clauses. For details about aggregate functions, see Aggregate Functions.

Example

- Fake plate vehicle detection

CEP conducts pattern matching based on license plate switchover features on the data of vehicles collected by cameras installed on urban roads or high-speed roads in different areas within 5 minutes.

INSERT INTO fake_licensed_car
SELECT * FROM camera_license_data MATCH_RECOGNIZE(
    PARTITION BY car_license_number
    ORDER BY proctime
    MEASURES A.car_license_number AS car_license_number,
             A.camera_zone_number AS first_zone,
             B.camera_zone_number AS second_zone
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO LAST C
    PATTERN (A B+ C) WITHIN interval '5' minute
    DEFINE B AS B.camera_zone_number <> A.camera_zone_number,
           C AS C.camera_zone_number = A.camera_zone_number
) MR;

According to this rule, if a vehicle with a given license plate number drives from area A to area B, but another vehicle with the same license plate number is detected in area A within 5 minutes, the vehicle in area A is considered to carry a fake license plate.

Input data:

Zhejiang B88888, zone_A

Zhejiang AZ626M, zone_A

Zhejiang B88888, zone_A


Zhejiang AZ626M, zone_A

Zhejiang AZ626M, zone_A

Zhejiang B88888, zone_B

Zhejiang B88888, zone_B

Zhejiang AZ626M, zone_B

Zhejiang AZ626M, zone_B

Zhejiang AZ626M, zone_C

Zhejiang B88888, zone_A

Zhejiang B88888, zone_A

The output is as follows:

Zhejiang B88888, zone_A, zone_B
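The matching logic of this example can be sketched in Python. This is a simplified, hypothetical re-implementation for illustration only: it ignores the 5-minute WITHIN window and processing time, and the function and variable names are our own, not part of CS:

```python
def detect_fake_plates(events):
    """events: iterable of (plate, zone) in arrival order.

    Mimics PATTERN (A B+ C): an anchor sighting in one zone (A), one or
    more sightings in a different zone (B), then a sighting back in the
    anchor zone (C) flags the plate.
    """
    state = {}    # plate -> (anchor_zone, latest_b_zone or None)
    matches = []
    for plate, zone in events:
        anchor, b = state.get(plate, (None, None))
        if anchor is None or (b is None and zone == anchor):
            state[plate] = (zone, None)          # (re)start the anchor A
        elif b is None:
            state[plate] = (anchor, zone)        # entered a different zone: B
        elif zone == anchor:
            matches.append((plate, anchor, b))   # back in A's zone: C matches
            state[plate] = (zone, None)          # ~ AFTER MATCH SKIP TO LAST C
        else:
            state[plate] = (anchor, zone)        # still away from A's zone
    return matches

events = [
    ("Zhejiang B88888", "zone_A"), ("Zhejiang AZ626M", "zone_A"),
    ("Zhejiang B88888", "zone_A"), ("Zhejiang AZ626M", "zone_A"),
    ("Zhejiang AZ626M", "zone_A"), ("Zhejiang B88888", "zone_B"),
    ("Zhejiang B88888", "zone_B"), ("Zhejiang AZ626M", "zone_B"),
    ("Zhejiang AZ626M", "zone_B"), ("Zhejiang AZ626M", "zone_C"),
    ("Zhejiang B88888", "zone_A"), ("Zhejiang B88888", "zone_A"),
]
print(detect_fake_plates(events))  # → [('Zhejiang B88888', 'zone_A', 'zone_B')]
```

Only the plate that returns to its anchor zone is flagged; the plate that keeps moving to new zones (zone_A, zone_B, zone_C) never matches.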

- Alarm suppression. If event A is reported multiple times in a row, it is reported only once.

INSERT INTO inhibition
SELECT * FROM event MATCH_RECOGNIZE(
    MEASURES FIRST(B.event_name) AS Bname
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (B+?)
    DEFINE B AS B.event_name <> PREV(B.event_name)
                OR PREV(B.event_name) IS NULL
) MR;

This statement reports an event only when it differs from the previous one; detected duplicate events are not reported.

Input data:

1,A

2,A

3,A

4,B

5,B

6,C

7,D

8,D

Output data:

A

B

C

D
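The suppression rule above (emit an event only when its name differs from the immediately preceding one, mirroring DEFINE B AS B.event_name <> PREV(B.event_name) OR PREV(B.event_name) IS NULL) can be sketched with itertools.groupby. This is an illustration only; the names are our own:

```python
from itertools import groupby

def suppress(events):
    """events: iterable of (id, name); return the name of each run once."""
    return [name for name, _ in groupby(events, key=lambda e: e[1])]

stream = [(1, "A"), (2, "A"), (3, "A"), (4, "B"),
          (5, "B"), (6, "C"), (7, "D"), (8, "D")]
print(suppress(stream))  # → ['A', 'B', 'C', 'D']
```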


4.10 Reserved Keywords

Stream SQL reserves certain strings as keywords. To use any of the following strings as a field name, enclose it in backquotes, for example, `value` and `count`.
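As a hypothetical illustration of this escaping rule (the RESERVED set below is deliberately tiny; the full keyword list follows in the tables, and the helper name is our own, not part of CS):

```python
# A few entries from the reserved-keyword tables, for illustration only.
RESERVED = {"VALUE", "COUNT", "SELECT", "TABLE"}

def quote_field(name):
    """Wrap a field name in backquotes if it collides with a keyword."""
    return f"`{name}`" if name.upper() in RESERVED else name

print(quote_field("value"))   # → `value`
print(quote_field("car_id"))  # → car_id
```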

A

Table 4-33 Keywords starting with A

Reserved Keywords

A ABS ABSOLUTE

ACTION

ADA ADD ADMIN AFTER ALL ALLOCATE

ALLOW

ALTER ALWAYS

AND ANY ARE ARRAY AS ASC ASENSITIVE

ASSERTION

ASSIGNMENT

ASYMMETRIC

AT ATOMIC

ATTRIBUTE

ATTRIBUTES

AUTHORIZATION

AVG

B

Table 4-34 Keywords starting with B

Reserved Keywords

BEFORE

BEGIN BERNOULLI

BETWEEN

BIGINT BINARY

BIT BLOB BOOLEAN

BOTH

BREADTH

BY

C

Table 4-35 Keywords starting with C

Reserved Keywords

C CALL CALLED

CARDINALITY

CASCADE

CASCADED

CASE CAST CATALOG

CATALOG_NAME



CEIL CEILING

CENTURY

CHAIN CHAR CHARACTER

CHARACTERISTICTS

CHARACTERS

CHARACTER_LENGTH

CHARACTER_SET_CATALOG

CHARACTER_SET_NAME

CHARACTER_SET_SCHEMA

CHAR_LENGTH

CHECK CLASS_ORIGIN

CLOB CLOSE COALESCE

COBOL COLLATE

COLLATION

COLLATION_CATALOG

COLLATION_NAME

COLLATION_SCHEMA

COLLECT

COLUMN

COLUMN_NAME

COMMAND_FUNCTION

COMMAND_FUNCTION_CODE

COMMIT

COMMITTED

CONDITION

CONDITION_NUMBER

CONNECT

CONNECTION

CONNECTION_NAME

CONSTRAINT

CONSTRAINTS

CONSTRAINT_CATALOG

CONSTRAINT_NAME

CONSTRAINT_SCHEMA

CONSTRUCTOR

CONTAINS

CONTINUE

CONVERT

CORR CORRESPONDING

COUNT COVAR_POP

COVAR_SAMP

CREATE

CROSS CUBE CUME_DIST

CURRENT

CURRENT_CATALOG

CURRENT_DATE

CURRENT_DEFAULT_TRANSFORM_GROUP

CURRENT_PATH

CURRENT_ROLE

CURRENT_SCHEMA

CURRENT_TIME

CURRENT_TIMESTAMP

CURRENT_TRANSFORM_GROUP_FOR_TYPE

CURRENT_USER

CURSOR

CURSOR_NAME

CYCLE


D

Table 4-36 Keywords starting with D

Reserved Keywords

DATA DATABASE

DATE DATETIME_INTERVAL_CODE

DATETIME_INTERVAL_PRECISION

DAY DEALLOCATE

DEC DECADE

DECIMAL

DECLARE

DEFAULT

DEFAULTS

DEFERRABLE

DEFERRED

DEFINED

DEFINER

DEGREE

DELETE

DENSE_RANK

DEPTH DEREF DERIVED

DESC DESCRIBE

DESCRIPTION

DESCRIPTOR

DETERMINISTIC

DIAGNOSTICS

DISALLOW

DISCONNECT

DISPATCH

DISTINCT

DOMAIN

DOUBLE

DOW DOY DROP DYNAMIC

DYNAMIC_FUNCTION

DYNAMIC_FUNCTION_CODE


E

Table 4-37 Keywords starting with E

Reserved Keywords

EXCEPT

EACH ELEMENT

ELSE END END-EXEC

EPOCH EQUALS

ESCAPE

EVERY

EXTERNAL

EXCEPTION

EXCLUDE

EXCLUDING

EXEC EXECUTE

EXISTS EXP EXPLAIN

EXTEND

EXTRACT



F

Table 4-38 Keywords starting with F

Reserved Keywords

FOLLOWING

FOR FALSE FETCH FILTER FINAL FIRST FIRST_VALUE

FLOAT FLOOR

FUSION

FOREIGN

FORTRAN

FOUND FRAC_SECOND

FREE FROM FULL FUNCTION


G

Table 4-39 Keywords starting with G

Reserved Keywords

G GENERAL

GENERATED

GET GLOBAL

GO GOTO GRANT GRANTED

GROUP

GROUPING


H

Table 4-40 Keywords starting with H

Reserved Keywords

HAVING

HIERARCHY

HOLD HOUR

I

Table 4-41 Keywords starting with I

Reserved Keywords

IDENTITY

IMMEDIATE

IMPLEMENTATION

IMPORT

IN INCLUDING

INCREMENT

INDICATOR

INITIALLY

INNER

INOUT INPUT INSENSITIVE

INSERT INSTANCE

INSTANTIABLE

INT INTEGER

INTERSECT

INTERSECTION



INTERVAL

INTO INVOKER

IS ISOLATION


J

Table 4-42 Keywords starting with J

Reserved Keywords

JAVA JOIN

K

Table 4-43 Keywords starting with K

Reserved Keywords

K KEY KEY_MEMBER

KEY_TYPE


L

Table 4-44 Keywords starting with L

Reserved Keywords

LABEL LANGUAGE

LARGE LAST LAST_VALUE

LATERAL

LEADING

LEFT LENGTH

LEVEL

LIBRARY

LIKE LIMIT

M

Table 4-45 Keywords starting with M

Reserved Keywords

M MAP MATCH MATCHED

MAX MAXVALUE

MEMBER

MERGE MESSAGE_LENGTH

MESSAGE_OCTET_LENGTH



MESSAGE_TEXT

METHOD

MICROSECOND

MILLENNIUM

MIN MINUTE

MINVALUE

MOD MODIFIES

MODULE

MONTH

MORE MULTISET

MUMPS


N

Table 4-46 Keywords starting with N

Reserved Keywords

NAME NAMES NATIONAL

NATURAL

NCHAR NCLOB NESTING

NEW NEXT NO

NONE NORMALIZE

NORMALIZED

NOT NULL NULLABLE

NULLIF NULLS NUMBER

NUMERIC

O

Table 4-47 Keywords starting with O

Reserved Keywords

OBJECT

OCTETS

OCTET_LENGTH

OF OFFSET

OLD ON ONLY OPEN OPTION

OPTIONS

OR ORDER ORDERING

ORDINALITY

OTHERS

OUT OUTER OUTPUT

OVER

OVERLAPS

OVERLAY

OVERRIDING



P

Table 4-48 Keywords starting with P

Reserved Keywords

PAD PARAMETER

PARAMETER_MODE

PARAMETER_NAME

PARAMETER_ORDINAL_POSITION

PARAMETER_SPECIFIC_CATALOG

PARAMETER_SPECIFIC_NAME

PARAMETER_SPECIFIC_SCHEMA

PARTIAL

PARTITION

PASCAL

PASSTHROUGH

PATH PERCENTILE_CONT

PERCENTILE_DISC

PERCENT_RANK

PLACING

PLAN PLI POSITION

POWER PRECEDING

PRECISION

PREPARE

PRESERVE

PRIMARY

PRIOR PRIVILEGES

PROCEDURE

PUBLIC

Q

Table 4-49 Keywords starting with Q

Reserved Keywords

QUARTER


R

Table 4-50 Keywords starting with R

Reserved Keywords

RANGE RANK READ READS REAL RECURSIVE

REF REFERENCES

REFERENCING

REGR_AVGX

REGR_AVGY

REGR_COUNT

REGR_INTERCEPT

REGR_R2

REGR_SLOPE

REGR_SXX

REGR_SXY

REGR_SYY

RELATIVE

RELEASE

REPEATABLE

RESET RESTART

RESTRICT

RESULT

RETURN

RETURNED_CARDINALITY

RETURNED_LENGTH

RETURNED_OCTET_LENGTH

RETURNED_SQLSTATE



RETURNS

REVOKE

RIGHT ROLE ROLLBACK

ROLLUP

ROUTINE

ROUTINE_CATALOG

ROUTINE_NAME

ROUTINE_SCHEMA

ROW ROWS ROW_COUNT

ROW_NUMBER


S

Table 4-51 Keywords starting with S

Reserved Keywords

SAVEPOINT

SCALE SCHEMA

SCHEMA_NAME

SCOPE SCOPE_CATALOGS

SCOPE_NAME

SCOPE_SCHEMA

SCROLL

SEARCH

SECOND

SECTION

SECURITY

SELECT

SELF SENSITIVE

SEQUENCE

SERIALIZABLE

SERVER

SERVER_NAME

SESSION

SESSION_USER

SET SETS SIMILAR

SIMPLE SIZE SMALLINT

SOME SOURCE

SPACE SPECIFIC

SPECIFICTYPE

SPECIFIC_NAME

SQL SQLEXCEPTION

SQLSTATE

SQLWARNING

SQL_TSI_DAY

SQL_TSI_FRAC_SECOND

SQL_TSI_HOUR

SQL_TSI_MICROSECOND

SQL_TSI_MINUTE

SQL_TSI_MONTH

SQL_TSI_QUARTER

SQL_TSI_SECOND

SQL_TSI_WEEK

SQL_TSI_YEAR

SQRT START

STATE STATEMENT

STATIC STDDEV_POP

STDDEV_SAMP

STREAM

STRUCTURE

STYLE SUBCLASS_ORIGIN

SUBMULTISET

SUBSTITUTE

SUBSTRING

SUM SYMMETRIC

SYSTEM

SYSTEM_USER



T

Table 4-52 Keywords starting with T

Reserved Keywords

TABLE TABLESAMPLE

TABLE_NAME

TEMPORARY

THEN TIES TIME TIMESTAMP

TIMESTAMPADD

TIMESTAMPDIFF

TIMEZONE_HOUR

TIMEZONE_MINUTE

TINYINT

TO TOP_LEVEL_COUNT

TRAILING

TRANSACTION

TRANSACTIONS_ACTIVE

TRANSACTIONS_COMMITTED

TRANSACTIONS_ROLLED_BACK

TRANSFORM

TRANSFORMS

TRANSLATE

TRANSLATION

TREAT TRIGGER

TRIGGER_CATALOG

TRIGGER_NAME

TRIGGER_SCHEMA

TRIM

TRUE TYPE

U

Table 4-53 Keywords starting with U

Reserved Keywords

UESCAPE

UNBOUNDED

UNCOMMITTED

UNDER UNION UNIQUE

UNKNOWN

UNNAMED

UNNEST

UPDATE

UPPER UPSERT

USAGE USER USER_DEFINED_TYPE_CATALOG

USER_DEFINED_TYPE_CODE

USER_DEFINED_TYPE_NAME

USER_DEFINED_TYPE_SCHEMA

USING

V

Table 4-54 Keywords starting with V

Reserved Keywords

VALUE VALUES

VARBINARY

VARCHAR

VARYING

VAR_POP

VAR_SAMP

VERSION

VIEW


W

Table 4-55 Keywords starting with W

Reserved Keywords

WEEK WHEN WHENEVER

WHERE WIDTH_BUCKET

WINDOW

WITH WITHIN

WITHOUT

WORK

WRAPPER

WRITE

X

Table 4-56 Keywords starting with X

Reserved Keywords

XML

Y

Table 4-57 Keywords starting with Y

Reserved Keywords

YEAR

Z

Table 4-58 Keywords starting with Z

Reserved Keywords

ZONE


5 FAQ

5.1 What Is CS?

Cloud Stream Service (CS) is a real-time big data stream analysis service running on the public cloud. Computing clusters are fully managed by CS, enabling you to focus on Stream SQL services. CS is compatible with Apache Flink APIs, and CS jobs run in real time.

Promoted by Huawei in the IT field, CS is a distributed real-time stream computing system featuring low latency (millisecond-level), high throughput, and high reliability. Powered by Flink, CS integrates Huawei's enhanced features and security capabilities, and supports both stream processing and batch processing. It provides the essential Stream SQL features for data processing, and will add machine learning and graph computing algorithms to Stream SQL in the future.

5.2 What Are the Features and Advantages of CS?

CS has the following features and advantages:

- Distributed real-time computing
  Large-scale cluster computing and auto scaling of clusters greatly reduce costs.

- Fully hosted clusters
  CS provides visualized information on running jobs.

- Pay-as-you-go
  The pricing unit is the stream processing unit (SPU); an SPU contains one core and 4 GB memory. You are charged based on the running duration of the specified SPUs, accurate to the second.

- Secure isolation
  Triple security protection mechanisms for tenants ensure secure job running. Tenants' computing clusters are physically isolated from each other and protected by independent security configurations.

- High throughput and low latency
  CS reads data from DIS and enables real-time computing services with millisecond-level latency. It also supports natural backpressure and high throughput.


- Stream SQL online analysis
  Aggregation, window, and join operations are supported. SQL is used to express business logic, facilitating service implementation.

- Online SQL job testing
  Job debugging helps you check whether the SQL statement logic is correct. After sample data is input manually or from OBS buckets, correct SQL statement logic exports results as expected.

- Support for Flink streaming SQL edge jobs
  When edge devices generate large amounts of data, analyzing and processing the data near where it is generated reduces the volume of data migrated to the cloud and improves real-time processing. By combining CS with IEF, stream computing applications are deployed on edge nodes, enabling real-time data computing at the edge rather than on the cloud. CS edits and delivers the stream processing job to edge nodes for execution, helping you analyze and process streaming data at the edge quickly and accurately in real time.

- Exclusive cluster creation and resource quota allocation for jobs
  Tenants can create exclusive clusters, which are physically isolated from shared clusters and other tenants' clusters and are not affected by other jobs. Tenants can also configure the maximum SPU quota for their exclusive clusters and allocate available clusters and SPU quotas to sub-users.

- Customized Flink jobs
  You can submit customized Flink jobs in exclusive clusters.

- Support for Spark streaming and structured streaming
  You can submit customized Spark streaming jobs in exclusive clusters.

- Interconnection with SMN
  CS can connect to SMN, enabling alarms generated during real-time data analysis to be sent to users' mobile phones in IoT scenarios.

- Interconnection with Kafka
  CS can connect to Kafka clusters, enabling you to use SQL statements to read data from and write data into Kafka.

- Interconnection with CloudTable
  CS can connect to CloudTable so that stream data can be stored in tables.

- Interconnection with Cloud Search Service
  After CS interconnects with Cloud Search Service, you can use the fully compatible open-source Elasticsearch to implement multi-condition retrieval, statistics, and reporting of structured and unstructured text.

- Interconnection with DCS
  DCS provides Redis-compatible, secure, reliable, out-of-the-box, distributed cache capabilities that allow elastic scaling and convenient management. CS can interconnect with DCS to meet users' requirements for high concurrency and fast data access.

5.3 What Are the Application Scenarios of CS?

CS focuses on Internet and IoT service scenarios that require timeliness and high throughput. CS provides IoV services, online log analysis, online machine learning, online graph computing, and online algorithm-based recommendation for multiple industries,


such as small- and medium-sized enterprises in the Internet industry, IoT, IoV, and financial anti-fraud.

- Real-time stream analysis
  Purpose: to analyze big data in real time
  Feature: complex stream analysis operations, such as window, CEP, and join, can be performed on stream data with millisecond-level latency.
  Application scenarios: real-time log analysis, network traffic monitoring, real-time risk control, real-time data statistics, and real-time data ETL

- IoT
  Purpose: to analyze online IoT data
  Feature: IoT services call the APIs of CS. CS then reads sensor data in real time and executes the user's analysis logic. Analysis results are sent to services such as DIS and RDS for data persistence, alarms, report display, or visualization of results.
  Application scenarios: elevator IoT, industrial IoT, shared bicycles, IoV, and smart home

5.4 Which Data Sources Does CS Support?

CS can analyze data from DIS, Kafka clusters, CloudTable, and OBS.

5.5 Where Can the Job Results Be Exported?

CS job result data can be exported to the following services:

- DIS
- RDS
- SMN
- Kafka cluster
- CloudTable
- Cloud Search Service
- DCS Redis instance

5.6 Which Data Formats Does CS Support?

- CS can read and store data in CSV or JSON format from and on DIS.
- CS only reads and stores data in CSV format from and on OBS.
- CS can send data in text format to SMN.
- CS can read data from Kafka and write data in JSON format into Kafka.
- CS can read data from CloudTable and store data in tables in CloudTable.
- CS can send JSON data to Cloud Search Service.
- CS can send data in key-value format to DCS Redis.

5.7 What Kinds of Code-based Jobs Does CS Support?

CS supports jobs developed using SQL statements and user-defined jobs using JAR files.


5.8 What Is the SPU?

The SPU is the charging unit of CS. In the standard configuration, an SPU includes one core and 4 GB memory. Multiple SPUs can be configured for a job.
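The arithmetic is simple: a job with n SPUs gets n cores and 4n GB of memory. A trivial sketch (the function name is our own, purely for illustration):

```python
def spu_resources(spus):
    """Return (cores, memory_gb) for a job configured with `spus` SPUs."""
    return spus, 4 * spus

print(spu_resources(3))  # → (3, 12)
```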

5.9 How Is Job Concurrency Implemented?

In CS, job concurrency means starting multiple concurrent tasks in a job. An SQL-statement-based job does not support setting the concurrency of an individual operator.

5.10 How Can I Check Job Output Results?

- CS can output job results to DIS, so you can view the results in DIS. For detailed operations, see Retrieving Data from DIS in the Data Ingestion Service User Guide.

- CS can output job results to RDS, so you can view the results in RDS. For detailed operations, see the Relational Database Service Quick Start.

- CS can output job results to SMN, and SMN sends the results to the user's terminal. For detailed operations, see the Simple Message Notification Quick Start.

- CS can output job results to Kafka, so you can view the results in Kafka clusters. For detailed operations, visit https://kafka.apache.org/0101/documentation.html.

- CS can output job results to CloudTable, so you can view the results in CloudTable. For detailed operations, see Getting Started with CloudTable in the CloudTable Service User Guide.

- CS can output job results to IEF, so you can view the results in IEF. For detailed operations, see the Intelligent EdgeFabric Quick Start.

- CS can export job result data to Cloud Search Service, so you can view the results in Cloud Search Service. For detailed operations, see Getting Started with Cloud Search Service in the Cloud Search Service User Guide.

- CS can export job result data to DCS, so you can view the results in DCS. For detailed operations, see Getting Started in the Distributed Cache Service User Guide.

5.11 What Should I Do If the OBS Bucket Selected for a Job Is Not Authorized?

If the OBS bucket selected for a job is not authorized, perform the following steps:

Step 1 On the CS management console, click Job Management.

Step 2 On the row where the target job is located, click Edit under Operation to switch to the Edit page.

Step 3 Configure parameters under Running Parameter on the Edit page.

1. Select Enable Checkpoint or Save Job Log.

2. Select OBS Bucket.


3. Select Authorize OBS.

----End


A Change History

Date What's New

2018-07-02 This issue is the fifteenth official release.
Modified the following topics:
- Related Services
- Operation Guide

2018-06-11 This issue is the fourteenth official release.
Modified the following sections:
- Getting Started
- Operation Guide
Added the following section:
- JOIN Between Stream Data and Table Data

2018-05-25 This issue is the thirteenth official release.
Modified the following sections:
- Related Services
- SQL Syntax Reference

2018-05-17 This issue is the twelfth official release.
Added the following content:
- VPC Peering Connection: added information related to VPC peering connections.
Modified the following section:
- Visual Editor



2018-05-11 This issue is the eleventh official release.
Added the following sections:
- Creating a Flink Streaming SQL Edge Job
- Debugging a Job
- Visual Editor
- DDL Statement
- CEP Based on Pattern Matching
Modified the following sections:
- Functions
- Related Services
- Preparing the Data Source and Output Channel
- Creating a Flink Streaming SQL Job
- Creating a User-Defined Flink Job
- Creating a User-Defined Spark Job
- Geographical Functions
- DDL Statement

2018-04-25 This issue is the tenth official release.
Modified the following section:
- Getting Started

2018-03-30 This issue is the ninth official release.
Added the following section:
- Creating an Agency for Permission Granting
Modified the following section:
- Applying for CS



2018-03-07 This issue is the eighth official release.
Modified the following sections:
- Functions
- Related Services
- Applying for CS
- Introduction
- Creating a User-Defined Flink Job
- Creating a User-Defined Spark Job
- Monitoring a Job
- Operator
- Function
- DDL Statement
- Configuring Time Models

2018-02-02 This issue is the seventh official release.
Added the following topics:
- Related Services
- Preparing the Data Source and Output Channel
- DDL Statement

2018-01-12 This issue is the sixth official release.
Modified the following section:
- Getting Started

2018-01-05 This issue is the fifth official release.
Modified the following sections:
- Related Services
- Getting Started
- DDL Statement
- Where Can the Job Results Be Exported?
- Which Data Formats Does CS Support?
- How Can I Check Job Output Results?



2017-12-12 This issue is the fourth official release.
Added the following section:
- Creating a User-Defined Spark Job
Modified the following sections:
- Getting Started
- Creating a Flink Streaming SQL Job
- Cluster Management

2017-12-06 This issue is the third official release.
Added the following sections:
- Viewing Job Running Logs
- Added the debugging function in Creating a Flink Streaming SQL Job.
- Creating a User-Defined Flink Job
- Added the relationship between CS and ECS and that between CS and SMN in Related Services.
- Cluster Management
- Added SMN preparation in Preparing the Data Source and Output Channel.
- SMN as Sink Data Storage
Modified the following sections:
- Getting Started
- Creating a Flink Streaming SQL Job
- Performing Operations on a Job
- Monitoring a Job
- Template Management
- Modified the hierarchical directories in SQL Syntax Reference.

2017-09-30 This issue is the second official release.
Added the following topics:
- Preparing Data Sources and Data Output Channels in Getting Started
- Job Configuration List
Modified the following sections:
- Steps 9 and 10 in Creating a Flink Streaming SQL Job
- Step 4 in Querying Job Audit Logs

2017-08-18 This issue is the first official release.
