
TCS Solution – InfoAnalytics End to End Operational Analytics using IBM SmartCloud Analytics

Abstract: This white paper describes the TCS InfoAnalytics solution, an end-to-end operational analytics solution built on the IBM SmartCloud Analytics product. It consumes system logs in real time and is designed for monitoring and troubleshooting, enabling rapid problem isolation and resolution. We discuss three use cases in this white paper where log analytics can be a boon, an accelerator. This is a must-read for audiences who are building a big data hybrid data warehouse (DWh) strategy or migration plans, or who work in operations. The use cases are:

• Analytics to drive Data Reconciliation
• Analytics driving Technology Migrations
• Analytics to drive DWh Operations

I. INTRODUCTION

IT services are becoming significant in almost all lines of business. As organizations grow their business, IT expands exponentially, and IT systems are becoming genuinely complex. In this white paper we consider three problems / use cases where the TCS InfoAnalytics solution can reduce maintenance cost by more than 20%:

• Data Reconciliation
• Technology Migrations
• DWh Operations

Reconciliation: Business and customer data travels through multiple layers (virtual application layers, technology layers, processing layers), undergoing a variety of operations, e.g. cleansing, profiling, transformation, content validation, standardization, integration, slicing & dicing, and aggregation. Hence, monitoring transactional data end to end to ensure no data is lost is a big challenge.

Technology Migrations: With advancements in technology and expansion of business, customers are undergoing enterprise-level technology migrations involving anywhere from 1,000 to 15,000 migrated objects. Performing comparison testing at that scale is a big challenge. It also becomes a big problem to validate whether issues arising post-migration are pre-existing issues or originated as an outcome of the migration or a compatibility problem.

DWh Operations: Typically, a data warehouse project is executed as part of the customer's vision to build an enterprise data warehouse. There are many data sources: online applications, transactional systems, legacy systems, files from vendors, etc. The data from these sources is loaded into a data repository called the data warehouse, where it undergoes transformations and aggregations via an ETL process. To facilitate reporting requirements, the data is further processed and stored in data marts for the respective functional areas. An analytics layer on top of these data marts serves reports, dashboards, and other kinds of analytics such as what-if and predictive analysis. A typical DWh project may go on for a year or two, or even longer, and operational personnel face many day-to-day issues: data not appearing in reports, frequent errors, data from a source system being rejected, revenue numbers not matching. The challenge is that when an issue originates in an integrated environment, it is hard to trace the problem across the enterprise and understand which piece is actually at fault.

Nita Khare
IBM SW – Information Management & Business Analytics Lead
Tata Consultancy Services Ltd
[email protected]

Paramjot Kochar
IBM SW – Solutions Lead
Tata Consultancy Services Ltd
[email protected]

Vinay G Rajagopal
WW GSI Cloud & Analytics Leader
IBM Cloud & Smarter Infrastructure Labs, SWG Group
[email protected]


II. PAIN AREAS

Due to the complexity of systems and the multiplicity of technologies, it is very difficult to maintain issue-free end-to-end operations. The added challenge is identifying the exact issue in complex ecosystems and performing impact analysis. Huge investments are made in operations maintenance and infrastructure support to ensure everything is running fine. The main pain areas of the client are depicted in the image below:

Fig. 1. Pain Areas of Operations

III. OUTPUT

With the TCS InfoAnalytics solution, we can address the above pain areas with a solution that can reduce maintenance cost by more than 20% and help streamline end-to-end operations. We discuss each of the use cases in turn.

A. Data Reconciliation - The solution facilitates a variety of dashboards and reports, including:

• Reconciliation Reports
• Transaction Trend Analysis

Transactional Trend Analysis - Users can drive trend analysis for daily, weekly, and monthly transactions. This creates opportunities to track data growth trends, data growth vs. sales forecasting, and so on; transaction trends in peak season are one example. There are also opportunities to generate alerts or mail notifications if the data in the log files crosses defined baselines. For example, in Figure 2 we observe a peak in the trend line for one day. The peak may be genuine, reflecting increased sales on a holiday or festival, but there is also a chance that duplicate transactions were loaded.
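To make the baseline idea concrete, here is a minimal Python sketch of such a rule; the daily counts and the 2x-median threshold are illustrative assumptions, not the product's built-in logic:

from statistics import median

# Illustrative daily insert counts, as they might be extracted from ETL logs.
daily_inserts = {
    "2015-02-08": 10200,
    "2015-02-09": 9800,
    "2015-02-10": 10500,
    "2015-02-11": 10100,
    "2015-02-12": 24900,  # suspicious peak: holiday sales or duplicates?
}

baseline = median(daily_inserts.values())  # median is robust to the peak itself
for day, count in sorted(daily_inserts.items()):
    if count > 2 * baseline:  # illustrative threshold rule
        print(f"ALERT {day}: {count} inserts exceed 2x baseline ({baseline}); "
              "verify genuine holiday volume vs. duplicate transactions")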

Fig. 2. Transactional Trend Analysis – Daily Inserts to Target

Reconciliation Reports - Users can drive reconciliation reports that help track business and customer transactions end to end across technology and application interfaces. The reports can help detect data loss or misses at any level, so that no information is lost. Additionally, users can record transaction counts processed successfully as well as rejected, so that stakeholders can be informed in a timely manner if some transactions do not comply with business rules or are rejected at any level.

Hence the dashboards bring visibility into transactions (quantity and quality), which is otherwise very tedious to maintain at the flow or table level. At the same time, the alert and notification mechanism supports anomaly and fault detection, so that alerts and notifications reach stakeholders in time.

Fig. 3. Reconciliation Report – Source & Target (Match Record Count)

• Records read from source, records loaded to target, records rejected, and records deleted can be captured, along with any reconciliation issues
• Alerts/email notifications can be configured to intimate data anomalies
• Data miss issues can be caught by tracking mismatches in the records processed

In Figure 3, the records read from source and loaded to target are matching; a minimal sketch of this count check is shown below.
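The following Python sketch illustrates such a count check; the log line format, the field names such as RECORDS_READ, and the sample values are assumptions for illustration, since a real deployment relies on the patterns defined in the Insight Pack:

import re

# Illustrative ETL run log; the layout is hypothetical.
sample_log = """\
2015-02-11 02:10:05 JOB=LOAD_SALES RECORDS_READ=120000
2015-02-11 02:14:32 JOB=LOAD_SALES RECORDS_LOADED=119950
2015-02-11 02:14:32 JOB=LOAD_SALES RECORDS_REJECTED=30
"""

counts = dict(re.findall(r"(RECORDS_\w+)=(\d+)", sample_log))
read = int(counts.get("RECORDS_READ", 0))
loaded = int(counts.get("RECORDS_LOADED", 0))
rejected = int(counts.get("RECORDS_REJECTED", 0))

missing = read - loaded - rejected
if missing != 0:
    print(f"RECONCILIATION ISSUE: {missing} records unaccounted for "
          f"(read={read}, loaded={loaded}, rejected={rejected})")
else:
    print("Source and target record counts reconcile")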


B. Technology Migration - For this use case we have considered a technology migration of the ETL tool, IBM DataStage. More than 8,000 DataStage jobs and 3,000+ Unix scripts were in the inventory migrated as part of the DataStage migration. Comparison testing of such a huge inventory is difficult. At the same time, if a job fails post-migration, it is difficult to track whether the issue was already present in the legacy environment or originated from the migration or a compatibility problem. As part of the solution, users can see the component run status in both the legacy and the new environment:

Case 1: Jobs fail in both the legacy and the migrated environment, implying pre-existing issues.
Case 2: Jobs fail in the migrated environment but run in legacy, implying the issue originated from the migration.

As shown in Figure 4, we have a scenario captured for Case 2: a job fails in the migrated DataStage 9.1 environment whereas the same job runs fine in the legacy environment. At the next level of analysis, users can dig into the detailed logs for the failed job in DataStage 9.1 and check for warnings and fatal errors, which helps them analyze the issues faced and perform root-cause analysis.
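The following Python sketch illustrates the Case 1 / Case 2 classification by comparing run status across the two environments; the job names and status strings are assumptions for illustration, as real statuses would be parsed from the DataStage logs:

# Illustrative run statuses per job in each environment.
legacy_status = {"JOB_A": "FINISHED", "JOB_B": "ABORTED", "JOB_C": "FINISHED"}
migrated_status = {"JOB_A": "FINISHED", "JOB_B": "ABORTED", "JOB_C": "ABORTED"}

for job in sorted(legacy_status):
    legacy_ok = legacy_status[job] == "FINISHED"
    migrated_ok = migrated_status.get(job) == "FINISHED"
    if not legacy_ok and not migrated_ok:
        print(f"{job}: fails in both environments -> pre-existing issue (Case 1)")
    elif legacy_ok and not migrated_ok:
        print(f"{job}: fails only after migration -> migration issue (Case 2)")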

Fig. 4. Comparison Testing & Issue Detection – ETL Migration Issues

Fig. 5. Hyperlink Dashboard – Issue Troubleshoot Recommendations

There are opportunities to seek expert advice for the issues faced. The tool facilitates hyperlink dashboards: based on the fatal errors captured, users are routed to best-practice and standard product documentation sites, where they can find the recommended solutions. The tool facilitates:

• Comparison testing and comparative trend analysis across both versions
• A single dashboard, which avoids time spent browsing all environments
• Detailed error and fatal messages, which help in troubleshooting
• Quick expert advice for rapid resolution/fix


C. DWh Operations - We consider a use case where the business reported no data for Feb 11 in a DWh environment, and business users are breathing down operations' neck to understand where the data went missing. We took logs from the reporting server, application server, database server, ETL application, etc. Figure 6 depicts the functioning of all the respective servers across all days: in a single view, the business can check the health of the entire system. In the current problem it shows all environments running fine for the day, so no issue at the environment level is anticipated.

Fig. 6. DWh Summary Dashboard and Issue Troubleshoot

Further, when the issue was scrutinized at the application level, it was observed that all the ETL jobs had also finished successfully. But when the detailed report on each run was examined, the issue was diagnosed: the file received from the source systems for Feb 11 contained zero data. Hence, tracking where an issue occurred in an integrated environment becomes much easier with the dashboards created. Furthermore, proactive alerts can be configured to inform stakeholders that a file with zero data was received, as part of preventive maintenance. This facilitates root-cause analysis as well as proactive notification to users, avoiding potential data miss issues, especially when the transactions involved are financial or business-critical.
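As an illustration of such a preventive check, here is a minimal Python sketch; the landing-zone path and the header convention are hypothetical assumptions:

import os

def check_source_file(path, has_header=True):
    """Count data rows in a landed source file; alert if there are none."""
    with open(path) as f:
        rows = sum(1 for _ in f)
    data_rows = rows - 1 if (has_header and rows > 0) else rows
    if data_rows <= 0:
        print(f"ALERT: {os.path.basename(path)} arrived with 0 data rows; "
              "downstream reports for this date will be empty")
    return data_rows

# Hypothetical usage against a landing-zone file:
# check_source_file("/landing/sales_feed_20150211.csv")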


IV. PROCESS FLOW

The process consists of the phases below:

1. Application/System/Product Logs Collection
2. Data Sources Declaration
3. Insight Pack & Data Ingestion
4. Alerts/Notification/Creating Dashboards

Log Collection: Logs are gathered for the chosen analysis. They can be application, system, middleware product, network, or storage related; which logs we consider depends on the use case. Logs can be exported manually, and there are also opportunities to automate log push or pull. If real-time operational analytics is needed, one can leverage the IBM InfoSphere Streams solution for real-time data/log movement.

Data Sources Declaration: Once logs are pushed or pulled into SCALA, the next step is to declare them as valid data sources. Data sources should be unique; we can consume a log multiple times by giving it a unique name each time. The declared data sources are what drive analysis via SCALA.

Insight Pack & Data Ingestion: The next step is to create an Insight Pack. This consists of writing AQL scripts corresponding to the required attributes; SCALA uses AQL-based scripts at the back end to extract the required attributes from the unstructured logs. An Insight Pack is created for each product; for the use cases discussed, we created Insight Packs for DataStage, Cognos, WAS, and DB2 respectively. The AQL defines the criteria for consuming the data from the sources and converting it into a useful format. The final step is to ingest the data using the respective Insight Packs; the ingested data is then available for further analysis and reports.
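For illustration only, the Python sketch below mimics the kind of attribute extraction that an Insight Pack's AQL performs; the log format and field names are assumptions, and the actual Insight Packs use AQL scripts rather than Python:

import re

# Hypothetical log layout: timestamp, severity, free-text message.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>INFO|WARNING|FATAL)\s+"
    r"(?P<message>.*)"
)

line = "2015-02-11 02:14:32 FATAL Job LOAD_SALES aborted: link constraint violated"
match = LOG_PATTERN.match(line)
if match:
    record = match.groupdict()  # structured attributes, ready for ingestion
    print(record)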

The Insight Packs created, together with the out-of-the-box ones, used as part of the solution are:

• Cognos Insight Pack
• DataStage Insight Pack
• DB2 Insight Pack
• WAS Insight Pack

Alerts/Notification/Creating Dashboards: The ingested data is available for creating reports and dashboards. Dashboards are created by querying the data against the patterns defined in the Insight Packs, and one can select the attributes of interest to interpret the useful information. Further, with the help of Python-based scripting, there are opportunities to create baselines and alert rules, so that once data is ingested, alerts or notifications are sent to stakeholders based on the business and data quality rules; email integrations are available to leverage. Within dashboards there are multiple options to design 2D or 3D views with bar, line, bubble, and other graph types, which help in producing quality reporting on top of the analyzed data.
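As a minimal sketch of such an email-based alert rule, assuming a local SMTP relay and using illustrative addresses and thresholds:

import smtplib
from email.message import EmailMessage

def send_alert(subject, body,
               sender="scala-alerts@example.com",
               recipient="dwh-ops@example.com"):
    """Send a plain-text alert mail via a local SMTP relay (assumed)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

rejected_count = 30      # would come from the ingested log attributes
if rejected_count > 0:   # illustrative data quality rule
    send_alert("Rejected transactions detected",
               f"{rejected_count} records were rejected; see the dashboard.")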

V. SOLUTION TECHNOLOGY STACK

For the use cases discussed in this white paper, we used IBM SmartCloud Analytics together with the Insight Packs listed in Section IV. IBM SmartCloud Analytics can be implemented on top of any existing data warehouse stack, or for non-data-warehouse requirements, including non-IBM stacks.


VI. TARGET AUDIENCE

The audience of this document is expected to have basic knowledge of technology operations, preferably with knowledge of data warehousing, technology integrations, and infrastructure support.

VII. BENEFITS

• Ease of maintenance - reduced ongoing maintenance effort (30% of maintenance cost)
• A robust, flexible, and scalable operational monitoring process in line with industry best practices
• Reduced risk of non-compliance
• Technical effectiveness and efficiency of the data warehouse solution
• Effective operations management

VIII. SECURITY

The tool runs only over the logs supplied via push or pull mechanisms; it does not interfere with the ongoing operations of the applications or databases. The solution is adaptable to pull or push mechanisms, manual or automated, real-time or cold logs. The solution need not run in the same environment where the applications are running; all it needs is the application logs. SCALA users and OS users can be created as trusted users so that reports and dashboards are available to trusted users only.

IX. SUMMARY

The solution is applicable for end-to-end operational analytics, covering data warehouse as well as non-DWh requirements wherever operational monitoring and rapid problem isolation and resolution are needed, for example: business process monitoring, portal analytics, integration platform analytics, and integration with third-party products or data sources like ITSM and job scheduling. The solution can also be hosted on cloud environments as a service.

X. CONTACT AUTHORS

Vinay G Rajagopal
WW GSI Cloud & Analytics Leader
IBM Cloud & Smarter Infrastructure Labs, SWG Group
Email: [email protected]
Twitter: @winzyz

Nita Khare
IBM SW – Information Management & Business Analytics Lead
Tata Consultancy Services Ltd
[email protected]

Paramjot Kochar
IBM SW – Solutions Lead
Tata Consultancy Services Ltd
[email protected]