Real-Time Data Masking - Percona

Pandikrishnan Gurusamy


Indonesia, Malaysia, Philippines, Russia, Singapore, Thailand, Vietnam, Hong Kong


What is data masking?

The process of hiding original data by replacing it with random characters or other substitute data.

9/26/17 3© 2017 Lazada Group. All Rights Reserved.

Original data:

Last_Name  SSN         Salary
John       7896590245  40000
Peter      5789072345  70000

Masked data:

Last_Name  SSN         Salary
Sam        8666778934  xxxxxx
Fedric     9668996889  xxxxxx


Why data masking?

To protect confidential data in test environments when the data is used by developers or third-party vendors. Companies share data from their production applications with other users for a variety of business needs:

• Analytics
• Market research
• Development
• Testing
• PCI DSS
• Many others


What is sensitive data?

• Personal information (first name, last name, email, phone number, address)

• Credit card numbers
• Bank account details
• Varies based on business areas


Data masking methods for MySQL

• pyrepl
• Proxy
– MariaDB MaxScale
– ProxySQL
• Commercially available third-party tools
• Triggers


Triggers on SBR

With statement-based replication (SBR), the binary log contains SQL statements. The slave servers re-execute those statements, so triggers defined on a slave are fired there.


Triggers on SBR

CREATE TRIGGER email_ins
BEFORE INSERT ON email
FOR EACH ROW
BEGIN
  SET NEW.mail = concat(md5(NEW.mail),'@ma.sk');
END;


Triggers on RBR

• With row-based replication (RBR), the binary log contains row changes rather than statements, so triggers do not run on the slave servers.

• Example: pt-osc (pt-online-schema-change)
– If you drop the triggers created by pt-osc on the slaves while the alter is in progress, the table is still altered correctly on the slaves. The triggers are never fired on the slave servers; the changes are applied from the row events in the binary log.


Triggers on RBR

CREATE TRIGGER email_ins
BEFORE INSERT ON email
FOR EACH ROW
BEGIN
  SET NEW.mail = concat(md5(NEW.mail),'@ma.sk');
END;


Running triggers for row-based events

• slave_run_triggers_for_rbr (MariaDB 10.1.1 onwards)


It’s simple!

If you are masking only a few columns across the database:

• Create triggers for insert and update events on the slave server.
• Set slave_run_triggers_for_rbr = 1 on the slave.
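The interaction between the replication format and the slave-side triggers can be pictured with a toy model. The Python sketch below is not MariaDB code: the function names and the run_triggers_for_rbr flag are illustrative assumptions that only mimic the behaviour described in these slides.

```python
import hashlib


def mask_email_trigger(row):
    """Stand-in for a BEFORE INSERT masking trigger defined on the slave."""
    masked = dict(row)
    digest = hashlib.md5(row["mail"].encode("utf-8")).hexdigest()
    masked["mail"] = digest + "@ma.sk"
    return masked


def apply_insert_on_slave(row, binlog_format, run_triggers_for_rbr=False):
    """Return the row as it would be stored on the slave (toy model)."""
    if binlog_format == "STATEMENT":
        # The INSERT statement is re-executed, so the slave's trigger fires.
        return mask_email_trigger(row)
    if binlog_format == "ROW":
        # The row image is applied directly; the trigger fires only when
        # the slave is told to run triggers for row-based events.
        return mask_email_trigger(row) if run_triggers_for_rbr else dict(row)
    raise ValueError(f"unsupported binlog format: {binlog_format!r}")


if __name__ == "__main__":
    row = {"mail": "john@example.com"}
    print(apply_insert_on_slave(row, "STATEMENT"))
    print(apply_insert_on_slave(row, "ROW"))
    print(apply_insert_on_slave(row, "ROW", run_triggers_for_rbr=True))
```

Only the second call leaves the address unmasked, which is the case the slave_run_triggers_for_rbr setting is meant to close.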


• Insert trigger

CREATE TRIGGER customer_db.customer_before_ins
BEFORE INSERT ON customer
FOR EACH ROW
BEGIN
  SET NEW.first_name = md5(NEW.first_name),
      NEW.last_name = md5(NEW.last_name),
      NEW.email = concat(md5(NEW.email),'@ma.sk'),
      NEW.phone = concat(65, FLOOR(RAND() * 78585850));
END;

• Update trigger

CREATE TRIGGER customer_db.customer_before_upt
BEFORE UPDATE ON customer
FOR EACH ROW
BEGIN
  SET NEW.first_name = md5(NEW.first_name),
      NEW.last_name = md5(NEW.last_name),
      NEW.email = concat(md5(NEW.email),'@ma.sk'),
      NEW.phone = concat(65, FLOOR(RAND() * 78585850));
END;
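To see what these triggers do to a row, the same transformations can be emulated outside the database. This is a rough Python sketch, assuming UTF-8 text; mask_customer_row and md5_hex are hypothetical helper names, not part of the talk's scripts.

```python
import hashlib
import random


def md5_hex(value):
    """32-character hex digest, like SQL md5() on UTF-8 text (assumption)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def mask_customer_row(row, rng=random):
    """Apply the same masking as the customer triggers to a dict row."""
    masked = dict(row)
    masked["first_name"] = md5_hex(row["first_name"])
    masked["last_name"] = md5_hex(row["last_name"])
    masked["email"] = md5_hex(row["email"]) + "@ma.sk"
    # concat(65, FLOOR(RAND() * 78585850)): the prefix "65" followed by
    # a random integer in the range [0, 78585850).
    masked["phone"] = "65" + str(rng.randrange(78585850))
    return masked


if __name__ == "__main__":
    customer = {
        "first_name": "John",
        "last_name": "Doe",
        "email": "john@example.com",
        "phone": "6512345678",
    }
    print(mask_customer_row(customer))
```

The hashed values are always 32 characters long, so the masked columns on the slave need to be wide enough to hold them. The phone number is random, so unlike the hashed columns it changes every time the row is written.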


Setup

• We are masking more than 400 columns across 110 tables.
• The masking process is fully automated using Python scripts.
• Adding new tables for masking is very simple.


Configuration-based masking

• masking_method Database_name.Table_name.Column_name
• hash_column customer_db.customer.first_name
• hash_phone customer_db.customer.phone
• hash_email customer_db.customer.email
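A configuration in this format is easy to read from a script. The following Python sketch shows one possible parser; MaskRule, parse_masking_config, group_by_table, and the "#" comment syntax are assumptions made for illustration and are not taken from the author's scripts.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class MaskRule(NamedTuple):
    method: str
    database: str
    table: str
    column: str


def parse_masking_config(text: str) -> List[MaskRule]:
    """Parse lines of the form 'masking_method database.table.column'."""
    rules = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments (comment syntax assumed)
        parts = line.split()
        if len(parts) != 2 or parts[1].count(".") != 2:
            raise ValueError(
                f"line {line_no}: expected 'method db.table.column', got {raw!r}"
            )
        database, table, column = parts[1].split(".")
        rules.append(MaskRule(parts[0], database, table, column))
    return rules


def group_by_table(rules: List[MaskRule]) -> Dict[Tuple[str, str], List[MaskRule]]:
    """Group rules per (database, table), since triggers are created per table."""
    grouped = defaultdict(list)
    for rule in rules:
        grouped[(rule.database, rule.table)].append(rule)
    return dict(grouped)


CONFIG = """\
hash_column customer_db.customer.first_name
hash_phone customer_db.customer.phone
hash_email customer_db.customer.email
"""

if __name__ == "__main__":
    for key, table_rules in group_by_table(parse_masking_config(CONFIG)).items():
        print(key, [(r.method, r.column) for r in table_rules])
```

Grouping by table matters because one insert trigger and one update trigger per table have to cover all of that table's masked columns.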


We use three masking methods

Masking method  Masking implementation of the column
hash_column     md5(column_name)
hash_phone      concat(65, FLOOR(RAND() * 78585850))
hash_email      concat(md5(new.email),'@ma.sk')
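The three methods can be kept in a small lookup table that turns a method name and a column into a SQL expression. This Python sketch is one way to do that; MASKING_TEMPLATES and masking_expression are hypothetical names, and the in_trigger switch is an assumption added so the same table can serve both triggers (NEW.column) and plain UPDATE statements (column).

```python
# SQL expression templates for the three masking methods in the table above.
# "{ref}" is replaced by the column reference; hash_phone does not use it.
MASKING_TEMPLATES = {
    "hash_column": "md5({ref})",
    "hash_phone": "concat(65, FLOOR(RAND() * 78585850))",
    "hash_email": "concat(md5({ref}),'@ma.sk')",
}


def masking_expression(method, column, in_trigger=True):
    """Return the SQL expression that masks `column` using `method`."""
    try:
        template = MASKING_TEMPLATES[method]
    except KeyError:
        raise ValueError(f"unknown masking method: {method!r}") from None
    # Inside a trigger body the incoming value is NEW.<column>;
    # in a plain UPDATE it is just <column>.
    ref = f"NEW.{column}" if in_trigger else column
    return template.format(ref=ref)


if __name__ == "__main__":
    print(masking_expression("hash_email", "email"))
    print(masking_expression("hash_column", "first_name", in_trigger=False))
    print(masking_expression("hash_phone", "phone"))
```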


Data masking architecture


Challenges

• Building a new data masking server
• pt-online-schema-change


Building a new data masking server

• Take a backup from the master.
• Update all the tables that need to have columns masked.
• Deploy the triggers.
• Enable slave_run_triggers_for_rbr.
• Set up the replication.

• python create_trigger_update.py
– It creates three SQL files: one with the update statements, and one each with the triggers for insert and update operations.
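A generator for the three SQL files could look roughly like the Python sketch below. It is not the author's create_trigger_update.py: the function names, the file names, and the "{col}" template convention are all assumptions. The generated triggers use the single-statement form (FOR EACH ROW SET ...), which avoids needing a DELIMITER change when the file is fed to the mysql client.

```python
from pathlib import Path


def build_masking_sql(database, table, templates):
    """Build the three scripts for one table.

    `templates` maps column name -> SQL expression with a "{col}" placeholder,
    for example {"email": "concat(md5({col}),'@ma.sk')"}.
    """
    update_set = ", ".join(
        f"{col} = {tpl.format(col=col)}" for col, tpl in templates.items()
    )
    trigger_set = ", ".join(
        f"NEW.{col} = {tpl.format(col='NEW.' + col)}"
        for col, tpl in templates.items()
    )

    def trigger(suffix, event):
        return (
            f"CREATE TRIGGER {database}.{table}_before_{suffix} "
            f"BEFORE {event} ON {database}.{table} "
            f"FOR EACH ROW SET {trigger_set};"
        )

    return {
        # File names are hypothetical.
        "update.sql": f"UPDATE {database}.{table} SET {update_set};",
        "insert_triggers.sql": trigger("ins", "INSERT"),
        "update_triggers.sql": trigger("upt", "UPDATE"),
    }


def write_sql_files(directory, scripts):
    """Write each script to <directory>/<name> and return the paths."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, sql in scripts.items():
        path = out / name
        path.write_text(sql + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    scripts = build_masking_sql(
        "customer_db",
        "customer",
        {"first_name": "md5({col})", "email": "concat(md5({col}),'@ma.sk')"},
    )
    for name, sql in scripts.items():
        print(f"-- {name}\n{sql}\n")
```

The update script should be run before the triggers are created, as in the steps above: once a BEFORE UPDATE masking trigger exists, a local UPDATE would fire it and mask the already masked values a second time.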


pt-online-schema-change

pt-osc breaks replication on the slave regardless of whether any columns of the altered table are masked.


How pt-osc works on RBR

[Diagram: original table sb_test, the pt-osc triggers, and the new table _sbtest_new]

• With RBR, the triggers created by pt-osc have no effect on the slaves.
• The tables are replicated to the slave through row events.
• The old table is then dropped and the new table is renamed.

Scenario 1

• With RBR, the triggers created by pt-osc normally have no effect on the slaves.
• But we have enabled slave_run_triggers_for_rbr to fire triggers on row-based events.
• When the slave tries to fire the pt-osc triggers for changes that were already executed on the master, replication stops with an error (shown in a screenshot on the original slide).
• To fix it, drop the triggers created by pt-osc on the slave, and then start the slave.
• Afterwards, the table is altered successfully.
• python slave_trigger_new.py
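The Scenario 1 fix (drop the pt-osc triggers on the slave, then start the slave) can be sketched in Python as below. This is not the author's slave_trigger_new.py. It assumes that pt-osc trigger names start with "pt_osc_", and that the (schema, trigger name) pairs come from running the lookup query on the slave; the helper names are hypothetical.

```python
# Query to find candidate triggers on the slave. The "pt_osc_" prefix is an
# assumption about how pt-online-schema-change names its triggers.
PT_OSC_TRIGGER_QUERY = (
    "SELECT TRIGGER_SCHEMA, TRIGGER_NAME "
    "FROM information_schema.TRIGGERS "
    "WHERE TRIGGER_NAME LIKE 'pt\\_osc\\_%'"
)


def quote_ident(name):
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def drop_pt_osc_triggers_sql(triggers):
    """Build the statements to run on the slave.

    `triggers` is an iterable of (schema, trigger_name) pairs, for example
    the rows returned by PT_OSC_TRIGGER_QUERY.
    """
    statements = [
        f"DROP TRIGGER IF EXISTS {quote_ident(schema)}.{quote_ident(name)};"
        for schema, name in triggers
    ]
    # Replication is resumed only after the offending triggers are gone.
    statements.append("START SLAVE;")
    return statements


if __name__ == "__main__":
    found = [
        ("sbtest", "pt_osc_sbtest_sb_test_ins"),
        ("sbtest", "pt_osc_sbtest_sb_test_upd"),
        ("sbtest", "pt_osc_sbtest_sb_test_del"),
    ]
    print(PT_OSC_TRIGGER_QUERY)
    for statement in drop_pt_osc_triggers_sql(found):
        print(statement)
```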


Scenario 2

• If we need to alter a table that has masked columns (triggers):
– We get the same error as in Scenario 1 and fix it the same way.
– As a result, pt-osc succeeds.

• But pt-osc will have dropped the old table and the objects associated with it (the triggers).
– The newly altered table will contain the sensitive information.
– We need to follow the steps below:
  • Stop the slave.
  • Update the table to mask the information.
  • Create triggers to do data masking for that table.
  • Start the replication.

• The above steps are taken care of by our scripts.


How pt-osc works on the slave (RBR)

[Diagram: sb_test (masked data) with the pt-osc triggers, and _sbtest_new (unmasked data)]

• The data on the slave was masked by the triggers that we deployed earlier.
• But the temporary table _sbtest_new, which was created by pt-osc, will contain sensitive information.

How pt-osc works on the slave (RBR)

[Diagram: sb_test (masked data) is dropped by pt-osc; _sbtest_new (unmasked data) is renamed to sb_test]

• pt-osc drops the old table and renames the temporary table.
• The objects associated with the old table (the triggers) are dropped as well.
• So sensitive information is now available on the slaves.
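Because the masking triggers disappear silently with the old table, it can help to check which tables that should be masked no longer have them. The Python sketch below shows one possible check. It assumes the trigger list is read from information_schema.TRIGGERS (columns EVENT_OBJECT_SCHEMA, EVENT_OBJECT_TABLE, EVENT_MANIPULATION) and that any insert or update trigger on a masked table is a masking trigger; tables_missing_masking is a hypothetical name.

```python
REQUIRED_EVENTS = {"INSERT", "UPDATE"}


def tables_missing_masking(expected_tables, existing_triggers):
    """Return the (schema, table) pairs that lack an insert or update trigger.

    expected_tables:   iterable of (schema, table) that must be masked.
    existing_triggers: iterable of (schema, table, event) rows describing
                       the triggers currently defined on the slave.
    """
    present = {}
    for schema, table, event in existing_triggers:
        present.setdefault((schema, table), set()).add(event.upper())
    return sorted(
        key
        for key in expected_tables
        if not REQUIRED_EVENTS <= present.get(key, set())
    )


if __name__ == "__main__":
    expected = [("sbtest", "sb_test"), ("customer_db", "customer")]
    triggers = [
        ("customer_db", "customer", "INSERT"),
        ("customer_db", "customer", "UPDATE"),
    ]
    # sb_test lost its triggers when pt-osc dropped the old table.
    print(tables_missing_masking(expected, triggers))
```

Any table reported by such a check holds unmasked data and needs the re-masking steps from Scenario 2.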

– STOP SLAVE;

– UPDATE sb_test SET email = md5(email), updated_time = updated_time;
  (setting updated_time to itself presumably keeps an auto-updating timestamp column unchanged)

– CREATE TRIGGER sb_test_ins BEFORE INSERT ON sb_test FOR EACH ROW BEGIN SET NEW.email = md5(NEW.email); END;

– CREATE TRIGGER sb_test_upd BEFORE UPDATE ON sb_test FOR EACH ROW BEGIN SET NEW.email = md5(NEW.email); END;

– START SLAVE;

– python data_masking.py


• https://github.com/pandikirushnan/Real_time_data_masking


Thank you!