teradata architecture

Teradata Architecture

AUTHOR: JAYAKRISHNAN.V

E-MAIL : [email protected]

Upload: sivakrishna

Post on 24-Oct-2014


Page 1: Teradata Architecture

Teradata Architecture


Page 2: Teradata Architecture

Teradata Corporation

• Teradata Corporation is a vendor specializing in data warehousing and analytic applications.

• Its products are commonly used by companies to manage data warehouses for analytical and business intelligence purposes.

• Teradata was formerly a division of NCR Corporation.

Page 3: Teradata Architecture

Features

• Teradata is a Relational Database Management System (RDBMS)

– Used for data warehousing

– Executes on Unix, Windows NT, or Windows 2000 operating systems

[Figure: the Teradata database surrounded by Win 98, Win NT, IBM Mainframe, and UNIX.]

Page 4: Teradata Architecture

• Compliant with ANSI standards

• Capable of running in single node or in multiple nodes

• Unlimited, Proven Scalability - amount of data and number of users; allows for an enterprise wide model of the data.

• Unlimited Parallelism - Parallel access, sorts, and aggregations.

• Mature Optimizer - Handles complex queries, up to 64 joins per query, ad-hoc processing.

Page 5: Teradata Architecture

• Models the Business - 3NF, robust view processing, & provides star schema capabilities

• Low TCO (Total Cost of Ownership) - ease of setup, maintenance, & administration; no re-orgs, lowest disk to data ratio, and robust expansion utility (reconfig).

• High Availability - no single point of failure.

• Parallel Load utilities - robust, parallel, and scalable load utilities such as FastLoad, MultiLoad, and TPump.

Page 6: Teradata Architecture

Teradata Components & Architecture

[Figure: two client paths into Teradata. Network attached system: Client Req. → CLI → MTDP → MOSI → Ethernet Adapter → Gateway S/W. Channel attached system: Client Req → CLI → TDP → Channel Adapter → Bus Adapter. Both paths reach the Parser Engines, which connect through the Message Passing Layer to four AMPs, each with its own Vdisk.]

Page 7: Teradata Architecture

Teradata RDBMS Components

There are two ways to connect to the Teradata RDBMS:

• 1) Network attached system - through Windows, Linux, UNIX

• 2) Channel attached system - through a mainframe

Network attached system

• Components

a) CLI – Call Level Interface (lowest level interface)

> It creates sessions, allocates requests and responses, and fetches responses

b) MTDP – Micro Teradata Director Program

> Controls the session related issues

Page 8: Teradata Architecture

c) MOSI – Micro Operating System Interface

> It provides an OS-independent interface

• To connect to the Teradata RDBMS, a network attached system needs an Ethernet adapter

Channel attached system

• CLI has the same function as in the network attached system

• TDP – Teradata Director Program

• Manages session traffic between CLI and the Teradata database, session initiation and termination, logging, verification, session balancing, etc.

• To connect to the Teradata RDBMS, a channel attached system needs a host channel adapter and a bus adapter in the Teradata RDBMS

Page 9: Teradata Architecture

Teradata RDBMS components

• 4 main components

• 1) PE – Parsing Engine

• 2) MPL – Message Passing Layer

• 3) AMP – Access Module Processor

• 4) PDE – Parallel Database Extension

Page 10: Teradata Architecture

The Parsing Engine

[Figure: an SQL request enters the Parsing Engine (Parser, Optimizer, Dispatcher), which passes work through the Message Passing Layer to four AMPs; the answer set response returns to the client.]

• The Parsing Engine is responsible for:

– Managing individual sessions (up to 120)

– Dispatching the optimized plan to the AMPs

– Input conversion (EBCDIC / ASCII) - if necessary

– Sending the answer set response back to the requesting client

Page 11: Teradata Architecture

• The PE has 3 main components

a) Parser and Resolver

• Checks for syntax errors

• Checks the access permission for the requested database object

• Checks that the requested object exists, and returns an error message if the object or the appropriate access right is missing

b) Optimizer

• Prepares the access plan and the Explain plan, which shows how the query will be resolved; it restructures the query so that it runs more efficiently

c) Generator

• Takes the plan created by the optimizer and converts it into a form the database understands, called AMP steps
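The three PE stages can be pictured as a small pipeline. The Python sketch below is illustrative only: the `ACCESS_RIGHTS` dictionary, the function names, and the one-statement toy grammar are assumptions made for this example and are not Teradata internals.

```python
# Hypothetical data dictionary: table name -> users allowed to SELECT from it.
ACCESS_RIGHTS = {"employee": {"alice"}, "department": {"alice", "bob"}}


def parse_and_resolve(sql: str, user: str) -> str:
    """Check syntax, object existence, and access rights; return the table name."""
    words = sql.strip().rstrip(";").split()
    # Toy grammar: only "SELECT <columns> FROM <table>" is accepted.
    if len(words) != 4 or words[0].upper() != "SELECT" or words[2].upper() != "FROM":
        raise SyntaxError(f"unsupported statement: {sql!r}")
    table = words[3].lower()
    if table not in ACCESS_RIGHTS:
        raise LookupError(f"object {table!r} does not exist")
    if user not in ACCESS_RIGHTS[table]:
        raise PermissionError(f"{user!r} has no SELECT right on {table!r}")
    return table


def optimize(table: str) -> dict:
    """Choose an access plan; this toy version always picks an all-AMP scan."""
    return {"table": table, "access": "all-AMP scan"}


def generate_steps(plan: dict) -> list:
    """Turn the plan into an ordered list of step descriptions for the AMPs."""
    return [
        f"lock {plan['table']} for read",
        f"{plan['access']} on {plan['table']}",
        "return answer set",
    ]


def handle_request(sql: str, user: str) -> list:
    """Run the three stages in order: resolve, optimize, generate."""
    return generate_steps(optimize(parse_and_resolve(sql, user)))


if __name__ == "__main__":
    for step in handle_request("SELECT * FROM employee;", "alice"):
        print(step)
```

A request from a user without rights on the table stops at the first stage with an error, so no plan or AMP steps are produced for it.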

Page 12: Teradata Architecture

2) MPL – Message passing layer

-> Handles the internal communication of the Teradata DBMS

-> AMP steps are distributed to corresponding AMP based on the Hashing algorithm over the Message Passing Layer (BYNET)
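Hash-based routing can be sketched in a few lines of Python. This is a minimal illustration, assuming a SHA-256 digest as a stand-in hash and a plain modulo to pick the AMP; Teradata's actual hashing function and hash-to-AMP mapping are not described in these slides, and the names `row_hash` and `target_amp` are hypothetical.

```python
import hashlib


def row_hash(primary_index_value) -> int:
    """Stand-in 32-bit hash of a value (NOT Teradata's real hashing function)."""
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def target_amp(primary_index_value, num_amps: int) -> int:
    """Pick an AMP number in [0, num_amps) from the hash (assumed modulo mapping)."""
    return row_hash(primary_index_value) % num_amps


if __name__ == "__main__":
    for value in (2, 32, 67, 12):
        print(value, "-> AMP", target_amp(value, num_amps=4))
```

The property that matters is that the same value always hashes to the same AMP, so a later lookup on that value can go straight to the AMP that holds it.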

Page 13: Teradata Architecture

3) AMP – Access module processor

-> Virtual processors running under a multitasking environment

-> BYNET interface

-> Manages the database

-> Interface to disk subsystem

Page 14: Teradata Architecture

• The AMPs are responsible for:

– Finding the rows requested

– Lock management

– Sorting rows

– Aggregating columns

– Join processing

– Output conversion and formatting

– Creating answer set for client

– Disk space management

– Accounting

– Special utility protocols

– Recovery processing

The Access Module Processor (AMP)

AMPs store and retrieve rows to and from disk

[Figure: an SQL request enters the Parsing Engine and travels through the Message Passing Layer (PDE and BYNET) to four AMPs; the answer set response returns along the same path.]

Page 15: Teradata Architecture

4) PDE – Parallel database extension

This component is an interface layer on top of the operating system. Its functions include executing vprocs (virtual processors), providing a parallel environment, scheduling sessions, debugging, etc.

Page 16: Teradata Architecture

Teradata Storage Architecture

[Figure: records arrive from the client in random sequence (2 32 67 12 90 6 54 75 18 25 80 41); the Parsing Engine(s) pass them to the Message Passing Layer, which spreads them across AMP 1 to AMP 4.]

Page 17: Teradata Architecture

• The Parsing Engine dispatches a request to insert a row.

• The Message Passing Layer ensures that a row gets to the appropriate AMP (Access Module Processor).

• The AMP stores the row on its associated (logical) disk.

• An AMP manages a logical disk which is mapped to multiple physical disks in a disk array.
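The insert path above can be mimicked with a toy model. In this sketch the `Amp` and `MessagePassingLayer` classes, the `insert_rows` helper, and the SHA-256 based routing are all assumptions made for illustration; only the twelve record values come from the slide.

```python
import hashlib


class Amp:
    """Toy AMP that keeps its rows in a list standing in for its logical disk."""

    def __init__(self, amp_id: int):
        self.amp_id = amp_id
        self.logical_disk = []

    def store(self, row) -> None:
        self.logical_disk.append(row)


class MessagePassingLayer:
    """Routes each row to exactly one AMP (stand-in hash, not Teradata's)."""

    def __init__(self, amps):
        self.amps = amps

    def deliver(self, row) -> int:
        digest = hashlib.sha256(str(row).encode("utf-8")).digest()
        amp = self.amps[int.from_bytes(digest[:4], "big") % len(self.amps)]
        amp.store(row)
        return amp.amp_id


def insert_rows(rows, num_amps: int = 4):
    """Play the Parsing Engine's part: dispatch one insert request per row."""
    amps = [Amp(i + 1) for i in range(num_amps)]
    layer = MessagePassingLayer(amps)
    for row in rows:
        layer.deliver(row)
    return amps


if __name__ == "__main__":
    records = [2, 32, 67, 12, 90, 6, 54, 75, 18, 25, 80, 41]  # sequence from the slide
    for amp in insert_rows(records):
        print(f"AMP {amp.amp_id}: {amp.logical_disk}")
```

Each record lands on exactly one AMP, and the arrival order of the records has no effect on which AMP receives them.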

Page 18: Teradata Architecture

Teradata Retrieval Architecture

[Figure: rows retrieved from the table (2 32 67 12 90 6 54 75 18 25 80 41) are read from AMP 1 to AMP 4 and returned through the Message Passing Layer to the Parsing Engine(s).]

Page 19: Teradata Architecture

• The Parsing Engine dispatches a request to retrieve one or more rows.

• The Message Passing Layer ensures that the appropriate AMP(s) are activated.

• The AMP(s) locate and retrieve the desired row(s) in parallel.

• The Message Passing Layer returns the retrieved rows to the PE.

• The PE returns the rows to the requesting client application.
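The retrieval flow can be imitated with a thread pool in which each worker plays one AMP. This is only a sketch: the row placement in `AMP_DISKS` is hypothetical (the slide's per-AMP layout is not recoverable), and `scan_amp` and `retrieve` are names invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of the twelve rows on four AMPs.
AMP_DISKS = {
    1: [2, 12, 18],
    2: [25, 32, 41],
    3: [54, 67, 75],
    4: [6, 80, 90],
}


def scan_amp(rows, predicate):
    """One AMP scans only its own rows and returns the matches."""
    return [row for row in rows if predicate(row)]


def retrieve(amp_disks, predicate):
    """Activate all AMPs in parallel, then merge their partial answer sets."""
    with ThreadPoolExecutor(max_workers=len(amp_disks)) as pool:
        partials = pool.map(lambda rows: scan_amp(rows, predicate), amp_disks.values())
        return sorted(row for partial in partials for row in partial)


if __name__ == "__main__":
    print(retrieve(AMP_DISKS, lambda row: row > 50))
```

Each worker touches only its own list, which mirrors the idea that an AMP reads only the rows it owns before the partial results are merged for the client.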

Page 20: Teradata Architecture

Teradata Parallelism

• Each PE can handle up to 120 sessions in parallel.

• Each session can handle multiple requests.

• The Message Passing Layer can handle all message activity in parallel.

• Each AMP can perform up to 80 tasks in parallel.

• All AMPs can work together in parallel to service any request.

• Each AMP can work on several requests in parallel.
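The per-PE and per-AMP limits translate into simple system-wide ceilings. The short Python calculation below uses the two figures from the slide (120 sessions per PE, 80 tasks per AMP); the function names and the example configuration of 2 PEs and 4 AMPs are assumptions chosen only to show the arithmetic.

```python
SESSIONS_PER_PE = 120  # limit stated on the slide
TASKS_PER_AMP = 80     # limit stated on the slide


def max_sessions(num_pes: int) -> int:
    """Upper bound on concurrent sessions for a given number of PEs."""
    return num_pes * SESSIONS_PER_PE


def max_amp_tasks(num_amps: int) -> int:
    """Upper bound on concurrent AMP tasks for a given number of AMPs."""
    return num_amps * TASKS_PER_AMP


if __name__ == "__main__":
    # Hypothetical configuration: 2 PEs and 4 AMPs.
    print("sessions:", max_sessions(2))    # 2 * 120 = 240
    print("AMP tasks:", max_amp_tasks(4))  # 4 * 80 = 320
```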

Page 21: Teradata Architecture

Multiple Tables on Multiple AMPs

[Figure: the EMPLOYEE, DEPARTMENT, and JOB tables pass through the Parsing Engine and the Message Passing Layer; each of AMP 1 to AMP 4 holds EMPLOYEE rows, DEPARTMENT rows, and JOB rows.]

• Some rows from each table may be found on each AMP.

• Each AMP may have rows from all tables.

• Ideally, each AMP will hold roughly the same amount of data.

Page 22: Teradata Architecture

• Multiple nodes may be configured to provide a Massively Parallel Processing (MPP) system.

• A physical message passing layer called the BYNET is needed to interconnect multiple nodes.

• Teradata is a linearly expandable RDBMS - as your database grows, additional nodes may be added.

[Figure: Multi-Node MPP System. Four SMP nodes and their DACs interconnected by the BYNET.]

Page 23: Teradata Architecture

• BYNET Features:

– Enables multiple SMP nodes to communicate.

– Automatic load balancing of message traffic.

– Automatic reconfiguration after fault detection.

– Fully operational dual BYNETs provide fault tolerance.

– Scalable bandwidth as nodes are added.

– Even though there are two physical BYNETs to provide redundancy and bandwidth, Teradata and TCP/IP software only see a single network.

BYNET (for MPP)

The BYNET is a dual redundant, bi-directional interconnect network.

All SMPs are connected to both BYNETs.

[Figure: eight SMP nodes, each connected to both BYNET 0 and BYNET 1.]

Page 24: Teradata Architecture

Teradata file system

• Is a layer between the Teradata RDBMS and PDE

• Also provides a set of service calls that allow the Teradata RDBMS to store and retrieve data efficiently

Page 25: Teradata Architecture

Disk Arrays

• A disk array contains drive groups

• Drive groups contain a set of drives

• A LUN (logical unit) contains a portion of every drive

• A pdisk is a slice of a LUN

• The group of pdisks assigned to an AMP is called a vdisk
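The pdisk and vdisk relationship can be written down as a small data model. The Python sketch below is a hypothetical illustration: the class names, the `slice_lun` helper, the equal-sized slices, and all sizes are assumptions, not figures from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Pdisk:
    """A slice of one LUN."""
    lun_id: int
    size_gb: float


@dataclass
class Vdisk:
    """The group of pdisks assigned to one AMP."""
    amp_id: int
    pdisks: list = field(default_factory=list)

    @property
    def size_gb(self) -> float:
        return sum(pdisk.size_gb for pdisk in self.pdisks)


def slice_lun(lun_id: int, lun_size_gb: float, slices: int) -> list:
    """Cut a LUN into equally sized pdisks (equal sizing is an assumption)."""
    return [Pdisk(lun_id, lun_size_gb / slices) for _ in range(slices)]


if __name__ == "__main__":
    pdisks = slice_lun(lun_id=0, lun_size_gb=100.0, slices=4)
    vdisk = Vdisk(amp_id=1, pdisks=pdisks[:2])
    print(f"AMP {vdisk.amp_id} vdisk: {len(vdisk.pdisks)} pdisks, {vdisk.size_gb} GB")
```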

Page 26: Teradata Architecture

Data Distribution

• According to the primary index selected, the rows are distributed randomly among all the AMPs.

• The more unique the primary index, the more even the distribution will be.
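The effect of primary index uniqueness on distribution can be seen in a quick simulation. The sketch below assumes a SHA-256 stand-in hash with a modulo mapping onto 4 AMPs (not Teradata's real hashing), and the two sample indexes are hypothetical: one with 1000 distinct values and one with only two distinct values.

```python
import hashlib
from collections import Counter


def amp_for(value, num_amps: int) -> int:
    """Stand-in hash routing (NOT Teradata's real hashing function)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(index_values, num_amps: int = 4) -> list:
    """Count how many rows each AMP would receive for the given index values."""
    counts = Counter(amp_for(value, num_amps) for value in index_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    unique_index = list(range(1000))                                # every value distinct
    two_value_index = ["M" if i % 2 else "F" for i in range(1000)]  # two distinct values
    print("unique index:   ", rows_per_amp(unique_index))
    print("two-value index:", rows_per_amp(two_value_index))
```

With the unique index every AMP receives a share of the rows, while the two-value index can occupy at most two AMPs and leaves the others idle.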

Page 27: Teradata Architecture

THANKS