Schema Registry - Set Your Data Free

Schema Registry
Satish Duggana, Hortonworks
DataWorks Summit 2017, Munich
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Post on 13-Apr-2017



Page 2: Schema Registry - Set you Data Free

Introduction

What is Schema Registry?
• A shared repository of schemas that allows applications to flexibly interact with each other

What value does Schema Registry provide?
– Data Governance
  • Provide reusable schemas
  • Define relationships between schemas
  • Enable generic format conversion and generic routing
– Operational Efficiency
  • Avoid attaching the schema to every piece of data
  • Producers and consumers can evolve at different rates

Example use
– Register schemas for Kafka topics so that consumers of those topics (e.g. NiFi, StreamLine) can use them

Page 3: Schema Registry - Set you Data Free

Schema Registry Concepts

• Schema Group: a logical grouping/container for schemas of a similar type, or based on any criteria the customer has for managing schemas

• Schema Metadata: metadata associated with a named schema

• Schema Version: the actual versioned schema associated with a schema metadata definition

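Diagram: a Schema Group contains Schema Metadata entries, and each Schema Metadata entry has Schema Versions 1, 2, 3.
– Schema Group fields: Group Name
– Schema Metadata fields: Schema Name, Schema Type, Description, Compatibility Policy, Serializers, Deserializers
– Schema Version fields: version, text, fingerprint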
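The three concepts above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names, the `add_version` helper, and the choice of SHA-256 over a canonical JSON rendering for the fingerprint are hypothetical and are not taken from the registry's code. Serializer and deserializer references are left out for brevity.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SchemaVersion:
    version: int
    text: str  # the schema definition itself

    @property
    def fingerprint(self) -> str:
        # Assumption: fingerprint = hash of a whitespace-insensitive JSON rendering.
        canonical = json.dumps(json.loads(self.text), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class SchemaMetadata:
    name: str
    schema_type: str
    group: str                      # name of the Schema Group this schema belongs to
    description: str = ""
    compatibility: str = "BACKWARD"
    versions: list = field(default_factory=list)

    def add_version(self, text: str) -> SchemaVersion:
        """Add text as a new version unless an identical schema is already registered."""
        candidate = SchemaVersion(len(self.versions) + 1, text)
        for existing in self.versions:
            if existing.fingerprint == candidate.fingerprint:
                return existing
        self.versions.append(candidate)
        return candidate
```

With this model, registering the same schema text twice (even with different whitespace) yields the existing version rather than a new one, which is one plausible use of a per-version fingerprint.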

Page 4: Schema Registry - Set you Data Free

Schema Registry Component Architecture

Diagram components:
– Schema Registry Client
  • Java Client
– Integrations
  • NiFi Processors
  • Kafka Ser/Des
  • StreamLine
– SR Web Server
  • Schema Registry Web App
  • REST API
– Pluggable Storage
  • Schema Storage: MySQL, Postgres, In-Memory
  • Serializer/Deserializer Jar Storage: Local File System, HDFS

Page 5: Schema Registry - Set you Data Free

Writer/Reader schemas

Writer schema
– Senders/producers serialize payloads according to this schema, i.e. the writer's schema

Reader/projection schema
– Receivers use this schema to project a received payload that was written with a writer schema

Diagram: the Sender writes with the Writer Schema; the Receiver reads using the Writer Schema together with a Projection Schema.
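A minimal sketch of what projecting a payload onto a reader schema means, assuming records are plain dictionaries and fields are matched by name only. The `project` function and the example schemas are hypothetical; real Avro schema resolution also handles type promotion, aliases and nested types, which this sketch ignores.

```python
def project(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    written = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for f in reader_schema["fields"]:
        name = f["name"]
        if name in written:
            result[name] = record[name]       # field exists in the written data
        elif "default" in f:
            result[name] = f["default"]       # fall back to the reader's default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return result                             # writer-only fields are dropped


# Hypothetical schemas for illustration.
writer = {"type": "record", "name": "book", "fields": [
    {"name": "id", "type": "string"},
    {"name": "color", "type": "string"},
    {"name": "pages", "type": "int"},
]}
reader = {"type": "record", "name": "book", "fields": [
    {"name": "id", "type": "string"},
    {"name": "title", "type": "string", "default": ""},
]}

projected = project({"id": "b-1", "color": "blue", "pages": 120}, writer, reader)
print(projected)  # {'id': 'b-1', 'title': ''}
```

The receiver only sees the fields its projection schema asks for, regardless of what else the sender wrote.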

Page 6: Schema Registry - Set you Data Free

Schema evolution

Diagram: producers (v2, v1, v4, v1) and consumers (v2, v5, v7) running at different schema versions.

Page 7: Schema Registry - Set you Data Free

Schema Compatibility Policies

What is a Compatibility Policy?
– Defines the rules for how schemas can evolve
– Subsequent version updates have to honor the schema's original compatibility policy

Policies supported
– Backward
– Forward
– Both
– None

Page 8: Schema Registry - Set you Data Free

Backward compatibility

A new version of a schema is compatible with earlier versions of that schema. Data written with an earlier version of the schema can be read with the new version of the schema.

V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}

V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int", "default": -1 }
  ]
}

Page 9: Schema Registry - Set you Data Free

Forward compatibility

The existing schema is compatible with future versions of the schema. That means data written with a new version of the schema can still be read with the old version of the schema.

V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}

V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int" }
  ]
}

Page 10: Schema Registry - Set you Data Free

Both/Full compatibility

The new version of the schema provides both backward and forward compatibility.

V1
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" }
  ]
}

V2
{
  "type": "record",
  "name": "book",
  "namespace": "registry.example",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "color", "type": "string", "default": "blue" },
    { "name": "pages", "type": "int", "default": -1 },
    { "name": "title", "type": "string", "default": "" }
  ]
}

Page 11: Schema Registry - Set you Data Free

Schema composition

Schemas can be shared and reused within existing schemas.
Inbuilt support in the default serializer/deserializer to build effective schemas.

{
  "name": "account",
  "namespace": "com.hortonworks.example.types",
  "includeSchemas": [
    { "name": "utils" }
  ],
  "type": "record",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "id", "type": "com.hortonworks.datatypes.uuid" }
  ]
}

{
  "name": "uuid",
  "type": "record",
  "namespace": "com.hortonworks.datatypes",
  "doc": "A Universally Unique Identifier, in canonical form in lowercase. This is generated from java.util.UUID Example: de305d54-75b4-431b-adb2-eb6b9e546014",
  "fields": [
    { "name": "value", "type": "string", "default": "" }
  ]
}
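One way to picture how an effective schema could be assembled from `includeSchemas` is a depth-first walk that emits included schemas before the schema that uses them. This is a sketch under assumptions: the in-memory `store` keyed by registered schema name, the `effective_schemas` function, and the idea that the `uuid` record is registered under the name `utils` are all illustrative and not confirmed by the slides.

```python
def effective_schemas(name, store):
    """Return (name, schema) pairs needed to parse `name`, dependencies first."""
    ordered, visiting = [], set()

    def visit(schema_name):
        if any(done == schema_name for done, _ in ordered):
            return                                  # already resolved
        if schema_name in visiting:
            raise ValueError(f"cyclic include: {schema_name}")
        visiting.add(schema_name)
        schema = dict(store[schema_name])           # shallow copy, store stays untouched
        for include in schema.pop("includeSchemas", []):
            visit(include["name"])                  # resolve dependencies first
        visiting.discard(schema_name)
        ordered.append((schema_name, schema))

    visit(name)
    return ordered


# Assumed registry content: the uuid record is registered under the name "utils".
store = {
    "utils": {"name": "uuid", "type": "record", "namespace": "com.hortonworks.datatypes",
              "fields": [{"name": "value", "type": "string", "default": ""}]},
    "account": {"name": "account", "namespace": "com.hortonworks.example.types",
                "includeSchemas": [{"name": "utils"}], "type": "record",
                "fields": [{"name": "name", "type": "string"},
                           {"name": "id", "type": "com.hortonworks.datatypes.uuid"}]},
}

resolved = effective_schemas("account", store)
print([n for n, _ in resolved])  # ['utils', 'account']
```

A parser fed the schemas in this order would already know `com.hortonworks.datatypes.uuid` by the time it reaches the `account` record.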

Page 12: Schema Registry - Set you Data Free

Sender/Receiver flow

Diagram: Sender → Serializer → Message Store → Deserializer → Receiver.
– The Serializer and the Deserializer each use a Schema Registry Client with a local schema/serdes cache
– Each message in the Message Store carries a version along with the payload
– The Schema Registry Clients talk to a set of SchemaRegistry instances, which are backed by Schema Storage and SerDes Storage
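The flow in the diagram can be sketched end to end: the sender ships only a schema version with each payload, and both sides resolve that version through a client that caches lookups locally. Everything here is a stand-in: `CachingSchemaClient`, `send` and `receive` are hypothetical names, the registry is a dictionary, and JSON arrays stand in for the real binary encoding.

```python
import json


class CachingSchemaClient:
    def __init__(self, fetch):
        self._fetch = fetch        # callable: version -> schema (e.g. a REST call)
        self._cache = {}           # local schema cache
        self.remote_calls = 0

    def schema_for(self, version):
        if version not in self._cache:
            self.remote_calls += 1
            self._cache[version] = self._fetch(version)
        return self._cache[version]


def send(record, version, client):
    """Serialize field values in schema order; field names are not sent."""
    names = [f["name"] for f in client.schema_for(version)["fields"]]
    return (version, json.dumps([record[n] for n in names]))


def receive(message, client):
    """Look up the schema by the version in the message and rebuild the record."""
    version, payload = message
    names = [f["name"] for f in client.schema_for(version)["fields"]]
    return dict(zip(names, json.loads(payload)))


registry = {1: {"fields": [{"name": "id"}, {"name": "color"}]}}  # stand-in for the server
sender_client = CachingSchemaClient(registry.__getitem__)
receiver_client = CachingSchemaClient(registry.__getitem__)

message_store = [send({"id": f"b-{i}", "color": "blue"}, 1, sender_client) for i in range(3)]
received = [receive(m, receiver_client) for m in message_store]
print(received[0], receiver_client.remote_calls)  # {'id': 'b-0', 'color': 'blue'} 1
```

Three messages are exchanged, yet each side contacts the registry only once, which is the point of the local cache in the diagram.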

Page 13: Schema Registry - Set you Data Free

Serializers/Deserializers

Snapshot based serializer/deserializer
– Serializes the complete payload
– Deserializes the payload to the respective type

Pull based serializer/deserializer
– Serialize whatever elements are required and ignore other elements
– Pull out whatever elements are required to build the desired object

Push based deserializer
– Gives callbacks to receive parsing events for the respective fields in the schema
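The difference between the three deserializer styles can be illustrated with a toy example. JSON stands in for the actual wire format, and `snapshot`, `pull` and `push` are hypothetical function names rather than the registry's serializer interfaces.

```python
import json


def snapshot(payload):
    """Snapshot style: materialize the complete payload."""
    return json.loads(payload)


def pull(payload, wanted):
    """Pull style: extract only the fields the caller asks for."""
    data = json.loads(payload)
    return {name: data[name] for name in wanted if name in data}


def push(payload, schema, on_field):
    """Push style: call back once per schema field, in schema order."""
    data = json.loads(payload)
    for f in schema["fields"]:
        if f["name"] in data:
            on_field(f["name"], data[f["name"]])


schema = {"fields": [{"name": "id"}, {"name": "color"}, {"name": "pages"}]}
payload = json.dumps({"id": "b-1", "color": "blue", "pages": 120})

events = []
push(payload, schema, lambda name, value: events.append((name, value)))

print(snapshot(payload))
print(pull(payload, ["id", "pages"]))
print(events)
```

The snapshot call returns everything, the pull call returns only `id` and `pages`, and the push call never builds a full object in user code; the handler just sees one event per field.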

Page 14: Schema Registry - Set you Data Free

Schema registry client

REST based client

Caching
– Metadata
– Schema versions
– Ser/des libs and class loaders

URL selectors
– Round robin
– Failover
– Custom
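The round robin and failover selection strategies listed above could look roughly like this. The class names, the `select`/`report_failure` methods and the example URLs are assumptions made for the sketch; they are not the client's actual selector interface.

```python
class RoundRobinSelector:
    """Rotate through all registry URLs on every request."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._next = 0

    def select(self):
        url = self._urls[self._next % len(self._urls)]
        self._next += 1
        return url

    def report_failure(self, url):
        pass  # rotation continues regardless of failures


class FailoverSelector:
    """Stick to one URL and move to the next only after it fails."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._current = 0

    def select(self):
        return self._urls[self._current]

    def report_failure(self, url):
        if url == self._urls[self._current]:
            self._current = (self._current + 1) % len(self._urls)


# Hypothetical registry endpoints.
urls = ["http://sr1:9090", "http://sr2:9090", "http://sr3:9090"]

rr = RoundRobinSelector(urls)
print([rr.select() for _ in range(4)])   # sr1, sr2, sr3, sr1

fo = FailoverSelector(urls)
first = fo.select()
fo.report_failure(first)
print(first, fo.select())                # sr1, then sr2
```

A custom selector would implement the same two methods with its own policy, for example preferring instances in the same data center.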

Page 15: Schema Registry - Set you Data Free

HA

Storage provider
– Depends on transactional support of the underlying SQL stores
– Spin up the required schema registry instances

Supports HA at SchemaRegistry
– Using ZK/Curator
– Automatic failover of master
– Master gets all writes
– Slaves receive only reads

Diagram: two deployments, each with several SchemaRegistry instances sharing one storage layer.

Page 16: Schema Registry - Set you Data Free

Integration of Schema Registry

Kafka
– Using producer/consumer API for serializer/deserializer

NiFi processors for Schema Registry
– Fetch schema
– Serialize/deserialize with schema

StreamLine
– Look up the schema of a Kafka topic

Page 17: Schema Registry - Set you Data Free

Kafka integration

Diagram: Producer → KafkaAvroSerializer → Kafka → KafkaAvroDeserializer → Consumer.
– KafkaAvroSerializer and KafkaAvroDeserializer each use a Schema Registry Client with a local schema/serdes cache
– Each Kafka message carries a version along with the payload
– The Schema Registry Clients talk to a set of SchemaRegistry instances

Page 18: Schema Registry - Set you Data Free

Kafka Avro ser/des protocol

Ser/des can be implemented with different protocols.
The default ser/des send protocol/schema versions as part of the binary payload of Kafka messages
– Can be enhanced to use headers/metadata instead of the message payload
– Custom ser/des can be registered for schemas
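Carrying the protocol and schema version inside the binary payload amounts to prefixing each message with a small header. The exact byte layout is not given in the slides, so the one below (a 1-byte protocol id followed by a 4-byte big-endian schema version id) is purely an assumed layout for illustration, as are the `frame` and `unframe` names.

```python
import struct

# Assumed header layout, not the documented wire format:
# 1-byte protocol id, then 4-byte big-endian schema version id.
HEADER = struct.Struct(">BI")
PROTOCOL_ID = 1


def frame(version_id: int, body: bytes) -> bytes:
    """Prefix the serialized body with protocol id and schema version id."""
    return HEADER.pack(PROTOCOL_ID, version_id) + body


def unframe(message: bytes):
    """Split a message into (schema version id, serialized body)."""
    protocol_id, version_id = HEADER.unpack_from(message)
    if protocol_id != PROTOCOL_ID:
        raise ValueError(f"unsupported protocol id {protocol_id}")
    return version_id, message[HEADER.size:]


message = frame(3, b"abc")
print(message)           # b'\x01\x00\x00\x00\x03abc'
print(unframe(message))  # (3, b'abc')
```

The deserializer reads the header first, fetches the matching schema version through its client, and only then decodes the body. Moving the same two values into message headers or metadata would leave the body untouched.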

Page 19: Schema Registry - Set you Data Free

NiFi integration

NiFi Controller Service

NiFi processors
– Transforms
  • Avro – CSV
  • Avro – Json
  • Json – CSV
– Extracting Avro fields

Page 20: Schema Registry - Set you Data Free

Schema Registry UI

Page 21: Schema Registry - Set you Data Free

WIP/Future enhancements

Security
– Kerberos support
– Default authorizers and Apache Ranger support

Archiving schemas

Notifications
– New versions
– Archiving

Converters

Page 22: Schema Registry - Set you Data Free

Try it out!

https://github.com/hortonworks/registry
https://groups.google.com/forum/#!forum/registry
Open sourced under Apache license
Apache incubation soon
Contributions are welcome

Page 23: Schema Registry - Set you Data Free

Q & A