
UNIT V INFORMATION LIFECYCLE MANAGEMENT

Data retention policies; Confidential and Sensitive data handling, lifecycle management costs. Archive data using Hadoop; Testing and delivering big data applications for performance and functionality; Challenges with data administration;

Data retention policies

A data retention policy, or records retention policy, is an organization's established protocol for retaining information for operational or regulatory compliance needs. When writing a data retention policy, you need to determine how to:

Organize information so it can be searched and accessed at a later date
Dispose of information that is no longer needed

Some organizations find it helpful to use a data retention policy template that provides a framework to follow when crafting the policy.
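
To make this concrete, the sketch below (in Python) shows one way a retention schedule might be expressed as data and checked programmatically; the record types and retention periods are illustrative placeholders, not recommendations.

```python
from datetime import date, timedelta

# Illustrative retention schedule: record type -> retention period in days.
# The categories and periods are examples only; a real schedule must be derived
# from the organization's own legal and operational requirements.
RETENTION_SCHEDULE = {
    "invoice": 7 * 365,
    "web_log": 180,
    "employee_record": 10 * 365,
}

def is_due_for_disposal(record_type, created_on, today=None):
    """Return True if a record has exceeded its assigned retention period."""
    today = today or date.today()
    period = RETENTION_SCHEDULE.get(record_type)
    if period is None:
        # Unknown record types are retained until classified, never silently deleted.
        return False
    return today - created_on > timedelta(days=period)

if __name__ == "__main__":
    print(is_due_for_disposal("web_log", date(2015, 1, 1)))  # True: well past 180 days
```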

Confidential and Sensitive data handling

The following steps and ensuing sub-items are intended to provide a general roadmap. Institutions will be at varying stages of progress. Some will start with the need to establish actions in the areas of policies, processes, or technology. Some will be ready to implement, and some will be able to revise and fine-tune their processes. You will also need to prioritize your actions to mitigate risks because of the comprehensive nature of the recommendations. We've attempted to organize these in a sequence that allows you to logically follow through each step. Although each item is recommended as an effective practice, we recognize that state/local legal requirements, institutional policy, or campus culture might cause each institution to approach this differently.

Definition - "Confidential Data"

Confidential data includes both sensitive personal information and sensitive institutional information. Unauthorized access to personal confidential data may result in a significant invasion of privacy, or may expose individuals to significant financial risk. Unauthorized access to or modification of institutional confidential data may result in direct, materially negative impacts on the finances, operations, or reputation of the institution. Examples of personal confidential data include information protected under privacy laws, information concerning the pay and benefits of employees, personal identification information, medical/health information held by the institution pertaining to those affiliated with it, and data collected in the course of research on human subjects. Institutional confidential data, for example, includes institutional financial and planning information, legally privileged information, invention disclosures, and other information concerning pending patent applications.


Definition - "Policy", "Standard", "Procedure"

This document makes use of all three terms, and they are not synonymous. For the purposes of this document, a policy is a relatively abstract statement of a goal. A standard is a group of specific technical settings, protocols, vendors, or benchmarks. A procedure is a set of specific operating instructions. Standards and procedures are frequently developed to support policies, but can also stand alone in some situations.

For example, an institution might have a policy that states that all confidential data must be encrypted. It might also have a standard that states that file encryption must use the AES cipher with a key length of 256 bits, that network encryption must use the TLS/SSL protocol with the AES or Blowfish ciphers, or simply that software "XYZ" (which is available correctly configured) must be used to encrypt data. Finally, a procedure would give specific instructions for ensuring that the encryption policy and standards are being implemented correctly: "To encrypt a file, run this program from the server, choose the filename, choose a password, and click 'OK'," or "To ensure that your network traffic is encrypted, look at the bottom of the browser for an icon of a padlock. If it is not there, call the Help Desk at 555-1212 to report a problem."
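
To illustrate how such a standard might be met in practice, here is a minimal sketch of file encryption with AES-256, assuming the third-party Python 'cryptography' package; the file name is a placeholder and key management is deliberately out of scope.

```python
# A minimal sketch of file encryption meeting an "AES, 256-bit key" standard.
# Assumes the third-party 'cryptography' package (pip install cryptography);
# key storage, rotation, and escrow are out of scope here.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_file(path, key):
    """Encrypt a file with AES-256-GCM, writing nonce + ciphertext to path + '.enc'."""
    aesgcm = AESGCM(key)          # key must be 32 bytes for AES-256
    nonce = os.urandom(12)        # 96-bit nonce, unique per encryption
    with open(path, "rb") as f:
        plaintext = f.read()
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)  # no associated data
    with open(path + ".enc", "wb") as f:
        f.write(nonce + ciphertext)

if __name__ == "__main__":
    # In practice the key comes from a managed key store, not generated ad hoc.
    key = AESGCM.generate_key(bit_length=256)
    encrypt_file("salary_report.csv", key)  # placeholder file name
```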

Steps

#Step 1: Create a security risk-aware culture that includes an information security risk management program
#Step 2: Define institutional data types
#Step 3: Clarify responsibilities and accountability for safeguarding confidential data
#Step 4: Reduce access to confidential data not absolutely essential to institutional processes
#Step 5: Establish and implement stricter controls for safeguarding confidential data
#Step 6: Provide awareness and training
#Step 7: Verify compliance routinely with your policies and procedures

Step 1: Create a security risk-aware culture that includes an information security risk management program

Statistics consistently show that the lack of security awareness and management attention to security are frequent causes of data exposure incidents. Subsequent steps in the Blueprint should significantly lower the risk of such incidents. The risk reduction, however, will be short-lived unless everyone in the institution assumes responsibility for protection of institutional data and executives and managers fully appreciate and act on the fact that strong security programs have a direct positive impact on such critical institutional issues as:

a. Meeting organizational goals
b. Maintaining efficient, uninterrupted operational processes
c. Fostering a positive public image
d. Complying with legal statutes, regulations, contractual obligations


Step 2: Define institutional data types.

A thorough understanding of the different information resources maintained and used by an institution is a key aspect of determining the requirements for protection of confidential data. Typically, data types are grouped so that controls can then be applied in a manner commensurate with the financial, legal, and reputational risks they pose to the institution.

Step 3: Clarify responsibilities and accountability for safeguarding confidential data

Along with understanding and grouping information resources, assigning responsibility for those resources is a critical component to ensure that they are properly handled.

Consider a file of employee salary information as an example. This file could be the responsibility of the central Human Resources Department, which maintains master files. It could also be the responsibility of Information Technology, which maintains the human resources information system, or of departmental staff who maintain local human resources files. Responsibility might also be split between these entities.

Step 4: Reduce access to confidential data not absolutely essential to institutional processes

One of the most effective ways to avoid accidentally exposing confidential data is simply to reduce access to it. This includes not collecting such data if not absolutely necessary, not showing it on printed reports or computer screens, and eliminating cases where it has historically been stored in locations, both electronic and paper, where it is no longer required.

Step 5: Establish and implement stricter controls for safeguarding confidential data

Controls provide enforcement of higher level policy via either detailed procedure(s), technological mechanism(s), or both. A high level policy or assignment of responsibility does not actually protect confidential data; detailed controls provide the protection.

Step 6: Provide awareness and training

Assure that training requirements for each law, regulation, or control are being met. Build a list of training requirements that shows who must be trained based on what topics.

Step 7: Regularly verify compliance with your policies, standards, and procedures

Assure that requirements for each data element or system are being managed. Build a list of compliance requirements that indicates which data elements (or systems) the requirements apply to and which data steward (or person) is responsible for each requirement.


Information Lifecycle Management

ILM is the practice of applying certain policies to effective information management. This practice has been used by Records and Information Management (RIM) professionals for over three decades and had its basis in the management of information in paper or other physical forms (microfilm, negatives, photographs, audio or video recordings, and other assets).

ILM includes every phase of a "record" from its beginning to its end. And while it is generally applied to information that rises to the classic definition of a record (records management), it applies to any and all informational assets. During its existence, information can become a record by being identified as documenting a business transaction or as satisfying a business need. In this sense, ILM has been part of the overall approach of ECM (Enterprise Content Management).

However, in a more general perspective the term "business" must be taken in a broad sense, and not necessarily tied to direct commercial or enterprise contexts. While most records are thought of as having a relationship to enterprise business, not all do. Much recorded information serves to document an event or a critical point in history. Examples of these are birth, death, medical/health, and educational records. e-Science, for example, is an emerging area where ILM has become relevant.

In the year 2004, the Storage Networking Industry Association, on behalf of the information technology (IT) and information storage industries, attempted to assign a new broader definition to Information Lifecycle Management (ILM). The oft-quoted definition that it released that October at the Storage Networking World conference in Orlando, Florida, stated that "ILM consists of the policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost effective IT infrastructure from the time information is conceived through its final disposition."[1] In this view, information is aligned with business processes through management policies and service levels associated with applications, metadata, information, and data.

Policy

ILM policy consists of the overarching storage and information policies that drive management processes. Policies are dictated by business goals and drivers. Therefore, policies generally tie into a framework of overall IT governance and management; change control processes; requirements for system availability and recovery times; and service level agreements (SLAs).


Operational

Operational aspects of ILM include backup and data protection; disaster recovery, restore, and restart; archiving and long-term retention; data replication; and the day-to-day processes and procedures necessary to manage a storage architecture.

Infrastructure

Infrastructure facets of ILM include the logical and physical architectures; the applications dependent upon the storage platforms; security of storage; and data center constraints. Within the application realm, the relationship between applications and the production, test, and development requirements is generally most relevant for ILM.

Functionality

For the purposes of business records, there are five phases identified as being part of the lifecycle continuum along with one exception. These are:

Creation and Receipt

Distribution

Use

Maintenance

Disposition

Creation and Receipt deals with records from their point of origination. This could include their creation by a member of an organization at varying levels or receipt of information from an external source. It includes correspondence, forms, reports, drawings, computer input/output, or other sources.

Distribution is the process of managing the information once it has been created or received. This includes both internal and external distribution, as information that leaves an organization becomes a record of a transaction with others.

Use takes place after information is distributed internally, and can generate business decisions, document further actions, or serve other purposes.

Maintenance is the management of information. This can include processes such as filing, retrieval, and transfers. While the connotation of 'filing' presumes the placing of information in a prescribed container and leaving it there, there is much more involved. Filing is actually the process of arranging information in a predetermined sequence and creating a system to manage it for its useful existence within an organization. Failure to establish a sound method for filing information makes its retrieval and use nearly impossible. Transferring information refers to the process of responding to requests, retrieving information from files, and providing access to users authorized by the organization to have access to the information. While removed from the files, the information is tracked by the use of various processes to ensure it is returned and/or available to others who may need access to it.

Disposition is the practice of handling information that is less frequently accessed or has met its assigned retention period. Less frequently accessed records may be considered for relocation to an 'inactive records facility' until they have met their assigned retention period. "Although a small percentage of organizational information never loses its value, the value of most information tends to decline over time until it has no further value to anyone for any purpose. The value of nearly all business information is greatest soon after it is created and generally remains active for only a short time -- one to three years or so -- after which its importance and usage declines. The record then makes its life cycle transition to a semi-active and finally to an inactive state." [2] Retention periods are based on an organization-specific retention schedule, developed from research into the regulatory, statutory, and legal requirements for management of information in the industry in which the organization operates. Additional items to consider when establishing a retention period are any business needs that may exceed those requirements and the potential historic, intrinsic, or enduring value of the information. If the information has met all of these needs and is no longer considered to be valuable, it should be disposed of by means appropriate for the content. This may include ensuring that others cannot obtain access to outdated or obsolete information, as well as measures for protecting privacy and confidentiality.

Long-term records are those that are identified as having a continuing value to an organization. Based on the period assigned in the retention schedule, these may be held for periods of 25 years or longer, or may even be assigned a retention period of "indefinite" or "permanent". The term "permanent" is used much less frequently outside of the Federal Government, as it is not feasible to establish a requirement for such a retention period. Records of continuing value must be managed using methods that ensure they remain persistently accessible for the length of time they are retained. While this is relatively easy to accomplish with paper- or microfilm-based records by providing appropriate environmental conditions and adequate protection from potential hazards, it is less simple for records in electronic formats. There are unique concerns related to ensuring that the format they are generated or captured in remains viable and that the media they are stored on remains accessible. Media is subject to both degradation and obsolescence over its lifespan, and therefore policies and procedures must be established for the periodic conversion and migration of information stored electronically to ensure it remains accessible for its required retention periods.

Exceptions occur with non-recurring issues outside the normal day-to-day operations. One example is a legal hold, litigation hold, or legal freeze requested by an attorney; in that case the records manager places a legal hold inside the records management application, which stops the files from being queued for disposition.


Archive data using Hadoop

The low cost of Hadoop storage, plus the ability to query Hadoop data with SQL, makes Hadoop a prime destination for archival data. This use case has a low impact on your organization because you can start building your Hadoop skill set on data that's not stored on performance-mission-critical systems.

What’s more, you don’t have to work hard to get at the data. (Since archived data normally is stored on systems that have low usage, it’s easier to get at than data that’s in “the limelight” on performance-mission-critical systems, like data warehouses.) If you’re already using Hadoop as a landing zone, you have the foundation for your archive! You simply keep what you want to archive and delete what you don’t.

If you think about Hadoop's landing zone, the queryable archive shown in the figure extends the value of Hadoop and starts to integrate pieces that likely already exist in your enterprise. It's a great example of finding economies of scale and cost take-out opportunities using Hadoop.

Here, the archive component connects the landing zone and the data warehouse. The data being archived originates in the warehouse and is then stored in the Hadoop cluster, which is also provisioning the landing zone. In short, you can use the same Hadoop cluster to archive data and act as your landing zone.


The key Hadoop technology you would use to perform the archiving is Sqoop, which can move the data to be archived from the data warehouse into Hadoop. You will need to consider what form you want the data to take in your Hadoop cluster. In general, compressed Hive files are a good choice.
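
As a rough illustration, the sketch below drives a Sqoop import from a Python script; the JDBC URL, credentials, and table and database names are placeholders, and the mapper count and compression codec would be adjusted to your cluster's standards.

```python
# A minimal sketch of archiving a warehouse table into Hadoop with Sqoop.
# The JDBC URL, credentials, and table/database names are placeholders.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@warehouse-host:1521/DWH",  # placeholder JDBC URL
    "--username", "archive_user",
    "--password-file", "/user/archive/.pw",      # keep secrets off the command line
    "--table", "SALES_HISTORY",
    "--hive-import",                             # land the data as a Hive table
    "--hive-database", "archive",
    "--hive-table", "sales_history",
    "--compress",                                # compressed Hive files, as suggested above
    "--compression-codec", "org.apache.hadoop.io.compress.SnappyCodec",
    "--num-mappers", "4",
]

subprocess.run(sqoop_cmd, check=True)  # raises if the import fails
```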

You can, of course, transform the data from the warehouse structures into some other form (for example, a normalized form to reduce redundancy), but this is generally not a good idea. Keeping the data in the same structure as what’s in the warehouse will make it much easier to perform a full data set query across the archived data in Hadoop and the active data that’s in the warehouse.

The concept of querying both the active and archived data sets brings up another consideration: how much data should you archive? There are really two common choices: archive everything as data is added and changed in the data warehouse, or only archive the data you deem to be cold.

Archiving everything has the benefit of enabling you to easily issue queries from one single interface across the entire data set — without a full archive, you’ll need to figure out a federated query solution where you would have to union the results from the archive and the active data warehouse.

But the downside here is that regular updates of your data warehouse’s hot data would cause headaches for the Hadoop-based archive. This is because any changes to data in individual rows and columns would require wholesale deletion and re-cataloging of existing data sets.

Now that archival data is stored in your Hadoop-based landing zone (assuming you’re using an option like the compressed Hive files mentioned previously), you can query it. This is where the SQL on Hadoop solutions can become interesting.
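
For example, assuming a HiveServer2 endpoint and the third-party PyHive package, a query against the archived data might look like the following sketch; the host, database, and table names are placeholders.

```python
# A minimal sketch of querying archived data with SQL on Hadoop.
# Assumes a HiveServer2 endpoint and PyHive (pip install 'pyhive[hive]').
from pyhive import hive

conn = hive.Connection(host="hadoop-edge-node", port=10000, database="archive")
cursor = conn.cursor()

# Same schema as the warehouse, so warehouse-style SQL runs largely unchanged.
cursor.execute(
    "SELECT region, SUM(amount) AS total_amount "
    "FROM sales_history WHERE sale_year = 2012 GROUP BY region"
)
for region, total_amount in cursor.fetchall():
    print(region, total_amount)

cursor.close()
conn.close()
```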

An excellent example of what’s possible is for the analysis tools (on the right in the figure) to directly run reports or analysis on the archived data stored in Hadoop. This is not to replace the data warehouse — after all, Hadoop would not be able to match the warehouse’s performance characteristics for supporting hundreds or more concurrent users asking complex questions.

The point here is that you can use reporting tools against Hadoop to experiment and come up with new questions to answer in a dedicated warehouse or mart.

Big Data Testing: Functional & Performance

What is Big Data?

Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Testing these datasets involves various tools, techniques, and frameworks. Big data relates to data creation, storage, retrieval, and analysis that is remarkable in terms of volume, variety, and velocity.


Big Data Testing Strategy
Testing Steps in verifying Big Data Applications
Step 1: Data Staging Validation
Step 2: "MapReduce" Validation
Step 3: Output Validation Phase
Architecture Testing
Performance Testing
Performance Testing Approach
Parameters for Performance Testing
Test Environment Needs
Big Data Testing vs. Traditional Database Testing
Tools used in Big Data Scenarios
Challenges in Big Data Testing

Big Data Testing Strategy

Testing a Big Data application is more a matter of verifying its data processing than of testing the individual features of the software product. When it comes to Big Data testing, performance and functional testing are the keys.

In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components. It demands a high level of testing skill, as the processing is very fast. Processing may be of three types: batch, real-time, and interactive.

Along with this, data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data, and this should be considered part of database testing. It involves checking various characteristics like conformity, accuracy, duplication, consistency, validity, data completeness, etc.
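
As a simple illustration, the sketch below runs a few such quality checks with pandas (an assumed dependency); the file path, column names, and validity rule are placeholders for whatever the real data set requires.

```python
# A minimal sketch of pre-test data quality checks: duplication, completeness,
# and validity. The file path, column names, and the validity rule are placeholders.
import pandas as pd

df = pd.read_csv("customers_sample.csv")

checks = {
    "no duplicate ids": not df["customer_id"].duplicated().any(),
    "no missing emails": df["email"].notna().all(),
    "valid age range": df["age"].between(0, 120).all(),
}

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```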


Testing Steps in verifying Big Data Applications

The following figure gives a high level overview of phases in Testing Big Data Applications

Big Data Testing can be broadly divided into three steps

Step 1: Data Staging Validation

The first step of Big Data testing, also referred to as the pre-Hadoop stage, involves process validation.

Data from various sources like RDBMS, weblogs, social media, etc. should be validated to make sure that the correct data is pulled into the system

Compare source data with the data pushed into the Hadoop system to make sure they match

Verify the right data is extracted and loaded into the correct HDFS location

Tools like Talend and Datameer can be used for data staging validation; a minimal source-versus-HDFS check is sketched below
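
A minimal sketch of such a comparison, using row counts and a checksum, follows; the file paths are placeholders, and the standard 'hdfs dfs -cat' command is used so no extra client library is assumed.

```python
# A minimal sketch of staging validation: compare the row count and a content
# checksum of a source extract against the copy landed in HDFS.
import hashlib
import subprocess

def checksum_and_count(data):
    """Return (md5 hex digest, number of non-empty lines) for a byte payload."""
    lines = [ln for ln in data.splitlines() if ln.strip()]
    return hashlib.md5(data).hexdigest(), len(lines)

# Source-side extract (e.g. dumped from the RDBMS before ingestion) -- placeholder path.
with open("/staging/customers_extract.csv", "rb") as f:
    src_sum, src_rows = checksum_and_count(f.read())

# Target-side copy, read back out of HDFS -- placeholder path.
hdfs_bytes = subprocess.run(
    ["hdfs", "dfs", "-cat", "/data/landing/customers/customers_extract.csv"],
    check=True, capture_output=True,
).stdout
tgt_sum, tgt_rows = checksum_and_count(hdfs_bytes)

assert src_rows == tgt_rows, f"row count mismatch: {src_rows} vs {tgt_rows}"
assert src_sum == tgt_sum, "content checksum mismatch between source and HDFS"
print("Staging validation passed:", src_rows, "rows")
```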


Step 2: "MapReduce" Validation

The second step is validation of "MapReduce". In this stage, the tester verifies the business logic on every single node and then validates it after running against multiple nodes, ensuring that:

The MapReduce process works correctly
Data aggregation or segregation rules are implemented on the data
Key-value pairs are generated
The data is validated after the MapReduce process
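
One simple way to validate the business logic is to re-compute the expected aggregation for a small, hand-checked sample in plain Python and compare it with the key-value pairs the job produced, as in the sketch below; the records and the aggregation rule are illustrative.

```python
# A minimal sketch of MapReduce validation: re-compute the expected aggregation
# for a small input sample and compare it with the job's key/value output.
from collections import defaultdict

def expected_totals(records):
    """Reference implementation of the aggregation rule: sum amount per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

# Small, hand-checked input sample fed to the MapReduce job under test.
sample_input = [("north", 10.0), ("south", 5.5), ("north", 2.5)]

# Key/value pairs read back from the job's output for the same sample
# (for example, parsed from part-r-00000 files).
job_output = {"north": 12.5, "south": 5.5}

assert expected_totals(sample_input) == job_output, "aggregation rule mismatch"
print("MapReduce business-logic validation passed")
```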

Step 3: Output Validation Phase

The final or third stage of Big Data testing is the output validation process. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system based on the requirement.

Activities in the third stage include:

Checking that the transformation rules are correctly applied
Checking data integrity and successful data load into the target system
Checking that there is no data corruption by comparing the target data with the HDFS file system data
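
The sketch below illustrates the idea for a single, made-up transformation rule (gross = net + tax): it checks that the rule was applied and that no rows were lost between the HDFS output and the target load; the field names and values are illustrative.

```python
# A minimal sketch of output validation: check a transformation rule and row
# counts between the HDFS output and the rows loaded into the target system.
hdfs_rows = [
    {"customer_id": 1, "net": 100.0, "tax": 18.0},
    {"customer_id": 2, "net": 50.0,  "tax": 9.0},
]
target_rows = [
    {"customer_id": 1, "gross": 118.0},
    {"customer_id": 2, "gross": 59.0},
]

# Rule under test (illustrative): gross = net + tax, rounded to 2 decimals.
expected = {r["customer_id"]: round(r["net"] + r["tax"], 2) for r in hdfs_rows}
loaded = {r["customer_id"]: r["gross"] for r in target_rows}

assert len(expected) == len(loaded), "row count mismatch between HDFS and target"
assert expected == loaded, "transformation rule not applied correctly"
print("Output validation passed for", len(loaded), "rows")
```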

Architecture Testing

Hadoop processes very large volumes of data and is highly resource intensive. Hence, architectural testing is crucial to ensure the success of your Big Data project. A poorly or improperly designed system may lead to performance degradation, and the system could fail to meet the requirements. At a minimum, Performance and Failover test services should be done in a Hadoop environment.

Performance testing includes testing of job completion time, memory utilization, data throughput, and similar system metrics, while the motive of the Failover test service is to verify that data processing occurs seamlessly in case of failure of data nodes.

Performance Testing

Performance testing for Big Data includes the following main actions

Data ingestion and throughput: In this stage, the tester verifies how fast the system can consume data from various data sources. Testing involves identifying the number of messages the queue can process in a given time frame. It also includes how quickly data can be inserted into the underlying data store, for example the insertion rate into a MongoDB or Cassandra database (a simple timing sketch follows after this list).

Data Processing: This involves verifying the speed with which queries or MapReduce jobs are executed. It also includes testing the data processing in isolation when the underlying data store is populated with the data sets, for example by running MapReduce jobs on the underlying HDFS.


Sub-Component Performance: These systems are made up of multiple components, and it is essential to test each of these components in isolation; for example, how quickly a message is indexed and consumed, MapReduce jobs, query performance, search, and so on.
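
As an illustration of the data ingestion check, the sketch below times how many records a stand-in insert function can push in a fixed window; the insert_batch function is a placeholder for the real client call (for example, a MongoDB insert_many or a Cassandra batch statement).

```python
# A minimal sketch of measuring data-ingestion throughput with a stand-in
# insert function; replace insert_batch with the real data-store client call.
import time

def insert_batch(batch):
    """Placeholder for the real data-store insert; simulated with a short sleep."""
    time.sleep(0.001)

def measure_ingestion(records, batch_size=100, duration_s=5.0):
    """Return records/second achieved while inserting for roughly duration_s."""
    inserted, start = 0, time.monotonic()
    while time.monotonic() - start < duration_s and inserted < len(records):
        batch = records[inserted:inserted + batch_size]
        insert_batch(batch)
        inserted += len(batch)
    elapsed = time.monotonic() - start
    return inserted / elapsed

if __name__ == "__main__":
    data = [{"id": i, "payload": "x" * 100} for i in range(100_000)]
    print(f"throughput: {measure_ingestion(data):.0f} records/sec")
```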

Performance Testing Approach

Performance testing for a Big Data application involves testing huge volumes of structured and unstructured data, and it requires a specific testing approach to handle such massive data.

Performance Testing is executed in this order

1. The process begins with setting up the Big Data cluster that is to be tested for performance
2. Identify and design the corresponding workloads
3. Prepare individual clients (custom scripts are created)
4. Execute the test and analyze the results (if objectives are not met, tune the component and re-execute)
5. Determine the optimum configuration

Parameters for Performance Testing

Various parameters to be verified for performance testing are

Data Storage: How data is stored in different nodes
Commit logs: How large the commit log is allowed to grow
Concurrency: How many threads can perform write and read operations
Caching: Tuning the cache settings "row cache" and "key cache"
Timeouts: Values for connection timeout, query timeout, etc.
JVM Parameters: Heap size, GC collection algorithms, etc.
MapReduce performance: Sorts, merge, etc.
Message queue: Message rate, size, etc.
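
One practical habit is to record the parameter values under test in a single structure so each run is reproducible; the sketch below is an illustrative example only, and none of the values are recommended settings.

```python
# A minimal sketch of capturing the parameters under test in one place so a
# performance run is reproducible. All values are illustrative placeholders.
import json

PERF_TEST_PARAMETERS = {
    "storage": {"replication_factor": 3, "block_size_mb": 128},
    "commit_log": {"max_size_mb": 1024},
    "concurrency": {"read_threads": 32, "write_threads": 32},
    "caching": {"row_cache_mb": 256, "key_cache_mb": 128},
    "timeouts_ms": {"connection": 5000, "query": 10000},
    "jvm": {"heap_gb": 8, "gc": "G1GC"},
    "mapreduce": {"sort_buffer_mb": 256, "merge_factor": 10},
    "message_queue": {"max_message_kb": 64, "target_rate_per_s": 50000},
}

if __name__ == "__main__":
    # Print (or archive) the configuration alongside the test results.
    print(json.dumps(PERF_TEST_PARAMETERS, indent=2))
```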


Test Environment Needs

Test environment needs depend on the type of application you are testing. For Big Data testing, the test environment should encompass:

Enough space to store and process a large amount of data
A cluster with distributed nodes and data
Minimal CPU and memory utilization, to keep performance high

Big Data Testing vs. Traditional Database Testing

Data
Traditional database testing: The tester works with structured data.
Big Data testing: The tester works with both structured and unstructured data.

Traditional database testing: The testing approach is well defined and time-tested.
Big Data testing: The testing approach requires focused R&D efforts.

Traditional database testing: The tester has the option of a "Sampling" strategy done manually or an "Exhaustive Verification" strategy using an automation tool.
Big Data testing: A "Sampling" strategy is a challenge.

Infrastructure
Traditional database testing: It does not require a special test environment, as the file size is limited.
Big Data testing: It requires a special test environment due to the large data size and files (HDFS).

Validation Tools
Traditional database testing: The tester uses either Excel-based macros or UI-based automation tools.
Big Data testing: There are no defined tools; the range is vast, from programming tools like MapReduce to HiveQL.

Traditional database testing: Testing tools can be used with basic operating knowledge and little training.
Big Data testing: Operating the testing tools requires a specific set of skills and training. Also, the tools are in their nascent stage and over time may come up with new features.

Tools used in Big Data Scenarios

NoSQL Databases: CouchDB, MongoDB, Cassandra, Redis, ZooKeeper, HBase
MapReduce: Hadoop, Hive, Pig, Cascading, Oozie, Kafka, S4, MapR, Flume
Storage: S3, HDFS (Hadoop Distributed File System)
Servers: Elastic, Heroku, Google App Engine, EC2
Processing: R, Yahoo! Pipes, Mechanical Turk, BigSheets, Datameer


Challenges in Big Data Testing

Automation

Automation testing for Big Data requires someone with technical expertise. Also, automated tools are not equipped to handle unexpected problems that arise during testing.

Virtualization

It is one of the integral phases of testing. Virtual machine latency creates timing problems in real-time Big Data testing. Managing images in Big Data is also a hassle.

Large Dataset
o Need to verify more data and need to do it faster
o Need to automate the testing effort
o Need to be able to test across different platforms

Performance testing challenges

Diverse set of technologies: Each sub-component belongs to a different technology and requires testing in isolation

Unavailability of specific tools: No single tool can perform the end-to-end testing; for example, NoSQL might not be a good fit for message queues

Test Scripting: A high degree of scripting is needed to design test scenarios and test cases

Test environment: A special test environment is needed due to the large data size
Monitoring Solution: Limited solutions exist that can monitor the entire environment
Diagnostic Solution: A custom solution is required to drill down into the performance bottleneck areas

Challenges with data administration

Databases are the heart of applications and a vital component of business operations, and database administrators (DBAs) are under intense pressure to ensure high performance and minimal downtime while facing myriad challenges.

Increasing workloads and database complexity are among the challenges that DBAs deal with on a daily basis, as evidenced by IT’s shifting to an application-centric approach, the introduction of multiple database platforms, and the need to manage data not only on-premises, but also in the cloud. But as the IT landscape evolves, these challenges must, and can, be overcome.

The Shift to an Application-centric Approach

At the core of nearly every application is a database. This means that when an application performance or availability problem arises, there's a good chance it's associated with the underlying database's performance. Database performance impacts customers and end users, who have little patience for application issues (a recent SolarWinds survey found that 67 percent of end users say they expect IT to resolve such issues within an hour or less). In addition, in recent research by Gleanster, 88 percent of IT professionals reported the database as the most common challenge or issue with application performance. Furthermore, the cloud, DevOps, and other shifts in technology are making the entire IT department more application-focused. In the end, applications are what matters to the business and to end users. This means DBAs are being held accountable for application performance, not only database performance.

To deliver better application performance, DBAs should consider the following tips:

Be proactive and align behind end-user experience as a shared objective across the entire IT organization by looking at application performance and the impact that the database has on it continuously, not only when it becomes a major problem.

Measure performance based not on an infrastructure resources perspective, but on end-user wait times. Wait-time analysis gives DBAs a view into what end-users are waiting for and what the database is waiting for, providing clear visibility into bottlenecks. 

Implement monitoring tools that provide visibility across the entire application stack, including all the infrastructure that supports the database – virtualization layers, database servers, hosts, storage systems, networks, etc. 

Establish historic baselines of application and database performance that look at how applications performed at the same time on the same day last week, and the week before that, to detect any anomalies before they become larger problems.
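
A minimal sketch of the baseline idea follows: it compares a current metric against the same hour in previous weeks and flags large deviations; the metric values are illustrative and would in practice come from a monitoring tool.

```python
# A minimal sketch of historic-baseline anomaly detection: compare a current
# metric with the same hour on the same weekday in previous weeks.
from statistics import mean, stdev

def is_anomalous(current, history, threshold=3.0):
    """Flag the current value if it is more than `threshold` standard
    deviations above the mean of the historical baseline values."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and (current - mu) / sigma > threshold

# Average query response time (ms) at 09:00 on the same weekday, past 4 weeks.
baseline = [120.0, 115.0, 130.0, 125.0]
print(is_anomalous(480.0, baseline))   # True  -- worth investigating
print(is_anomalous(128.0, baseline))   # False -- within the normal band
```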

 The Introduction of Multiple Database Platforms

According to a 2015 report, most DBAs are responsible for multiple database technologies from several vendors, most commonly Oracle, SQL Server, and MySQL. In fact, over a quarter (28 percent, to be precise) of DBAs manage 26 to 100 databases at any given time. This push toward database diversity and DBA efficiency leads to an increasingly complex role for the DBA, who must learn to adapt to unfamiliar database platforms as more responsibility for business success is added to their shoulders.

The following best practices can help DBAs better manage multiple database platforms within a single IT environment:

o Have a common set of goals, metrics and SLAs across all databases, ideally based on application response times, not only uptime.

o Use tools that provide a single dashboard of performance and the ability to drill down across database technologies and deployment methods, including cloud.


o Document a consistent set of processes for ensuring integrity and security: backup and restore processes, encryption at rest and in transit, detection of anomalies and potential security events in logs, to name a few.

o Establish a strategy, roadmap, and guidelines for moving to the cloud (or not) and for reducing workload costs by moving databases to lower-license-cost versions or open-source alternatives.

o Make sure team members can escape firefighting mode and spend enough time proactively optimizing performance of the databases and taking care of important maintenance tasks, which can result in significant cost savings and prevent problems in the future.

Managing Data On-Premises and in the Cloud

With the allure of cost savings, greater flexibility, and more agility, many organizations are eyeing the cloud as an alternative for deploying new applications, including those with high database performance requirements. In fact, technology research firm TechNavio predicts a 62 percent annual growth rate for cloud-based databases through 2018. However, this transition creates new complexities and challenges for DBAs, especially because DBAs are ultimately responsible for both database performance and data security regardless of whether the data lives on-premises or in the cloud.

Here are several pieces of advice for managing data in the cloud that DBAs should keep in mind:

o When considering which databases to move to the cloud, take into account data transfer process and latency, and how to maintain databases in sync if required, especially if applications need to integrate with others that are not in the same cloud deployment.

o Poor on-premise database performance will be poor in the cloud too. Moving the problem does not solve it. Scaling in the cloud to compensate for bad performance is the wrong approach and gets expensive quickly.

o Understand the service provider’s services and capabilities, know its SLAs, review its recommended architecture, and be very aware of scheduled maintenance.

o Think through, plan, and manage backup and recovery to ensure important data is not lost in the event of a vendor failure or outage.

o Stay on top of security, realizing that encryption is only the tip of the iceberg. Consider who will monitor database access for malicious or unauthorized access and plan for the worst. Have a documented course of action in case of a security breach or data loss.



o If it is important to monitor and optimize on-premises deployments, it's even more important in the cloud, given its dynamic nature. A consistent set of tools to do so across both sides of hybrid IT environments is ideal.

 Shifting to an application-centric approach to database management, managing multiple databases, and managing data on-premise and in the cloud are all challenges that DBAs face in today’s evolving database landscape. But these challenges are not insurmountable. By heeding best practices, DBAs can overcome these challenges and ensure success.