connor big data



Page 1: Connor big data

copyright Earthsongs Holistic LLC 2014

Big Data: Strategies and Synergies

Melinda H. Connor, D.D., Ph.D., AMP, FAM

Adjunct Professor, Akamai University

Page 2: Connor big data

Melinda H. Connor, D.D., Ph.D., AMP, FAM

Adjunct Professor, Akamai University, Hilo, Hawaii

Science Advisor, Spirituals for the 21st Century, Georgia and Nolan Payton Archive of Sacred Music, California State University Dominguez Hills

CEO, National Foundation for Energy Healing

Dr. Connor is the former team lead, level 3 support, for IBM’s Business Intelligence Technical Support Group.

[email protected]

Page 3: Connor big data

What are the “Big Issues” around “Big Data”?

Page 4: Connor big data

Challenges:

• Quality of programming skills of the computer programmers.
• Level of problem definition.
• Level of actual problem understanding in the specific area.
• Correct hardware to solve the issue.
• Correct software to solve the issue.

Page 5: Connor big data

Challenges, cont’d:

• Intersection and compatibility of the hardware and software.
• Intersection and compatibility of the software on multiple platforms.
• Understanding of the end user's needs.
• Production of the reports in a format that the end user can understand.

Page 6: Connor big data

Client Quote

I don’t care how your software works. I don’t want to spend time with your software. I just want the data I need to run my business!

Page 7: Connor big data

Flip Side:

• Poorly trained user community wanting turnkey solutions.
• The wrong people making the purchasing decisions.
• Poorly defined understanding of what the “real” problem is that they are trying to solve.
• Poor quality problem reports.

Page 8: Connor big data

Where to start...

Page 9: Connor big data

How can you utilize the terabytes per hour that you are receiving?

• Define the needs as closely as possible to match the needs of the business or situation.
• Do data mining! There will be more that you can use.
• Select the correct platform to do the processing at speed.
• Understand all of the tools that are available – do not limit yourself to one company's tools, but do write in clauses that the software must work together or no one gets paid.

Page 10: Connor big data

What is the most effective management of this “big data”?

• Play both ends against the middle!
  – One end is the problem you are trying to solve.
  – The other end is the report the end user needs.
• Build fast platforms that are correctly sized for the load.
• Limit the bottlenecks in the hardware.
• Have the correct people do the purchasing and use industry specialists.

Page 11: Connor big data

SPEED, CORRECT PLATFORM, CORRECT FORM OF DATABASE, CORRECT TOOLS for ANALYSIS and the CORRECT FORM OF THE REPORT

Page 12: Connor big data

What are the most effective ways of understanding the ecological landscape of the data you are receiving?

• Start by understanding the types of data you are collecting.
• Then understand the tools available.
• For example: object oriented vs. relational databases. Which do you use, and when do you use one or the other?

Page 13: Connor big data

How do you determine new corporate strategic direction based on the data when the shape of the data itself is not clear?

By defining the problem that you are trying to solve very tightly. Then you get the data which answers the questions.

Page 14: Connor big data

How long do you keep the raw data?

• How much storage space do you have available and how fast are you getting the data?
• What are your storage processing speeds and how fast can you process the data that is available?
• Know where the bottlenecks are in the physical limitations of your hardware:
• For example: do you have a slow I/O handler?
• Know the limitations in the way your database is designed:
• File vs. table vs. row/column locking!
• What about threading?
• When is the OS software going to start thrashing?
• What about speed of allocation of memory space?
• What are the legal requirements?
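
The first two questions on this slide come down to arithmetic: usable storage divided by ingest rate gives the longest time raw data can be kept, and processing throughput has to be at least the ingest rate or a backlog builds. A minimal Python sketch of that calculation follows. The function names, the 20% reserve, and the example figures (500 TB of storage, 2 TB per hour) are illustrative assumptions, not numbers from the talk.

```python
def max_retention_hours(storage_tb: float, ingest_tb_per_hour: float,
                        reserve_fraction: float = 0.2) -> float:
    """Hours of raw data that fit in storage while keeping a reserve free.

    reserve_fraction is a hypothetical safety margin for indexes,
    backups, and temporary processing space.
    """
    if ingest_tb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_tb = storage_tb * (1 - reserve_fraction)
    return usable_tb / ingest_tb_per_hour


def keeps_up(processing_tb_per_hour: float, ingest_tb_per_hour: float) -> bool:
    """True when processing throughput is at least the ingest rate."""
    return processing_tb_per_hour >= ingest_tb_per_hour


# Hypothetical figures, for illustration only.
hours = max_retention_hours(storage_tb=500, ingest_tb_per_hour=2)
print(f"Raw data can be kept for about {hours / 24:.1f} days")
print("Processing keeps up:", keeps_up(1.5, 2))
```

Legal retention requirements sit outside this calculation: if the law requires a longer window than the storage allows, the storage has to grow or the raw data has to be archived elsewhere.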

Page 15: Connor big data

Real World Example:

• Internet broadcast of a science experiment:
• 8k users logged on a system designed for 2400 users with different businesses.
• RESULT
• Crashed every server in the system.

Page 16: Connor big data

And what data will you dump?

• Everything you can! You will be getting more!
• Life/data runs in cycles. You will not hear or see the information only once. There are ways to back up the raw data and keep it for a number of years, but do you REALLY need that data?

Page 17: Connor big data

What about the limitations of the hardware of the various platforms and the network structure itself?

• Problem definition skills of decision makers.
• They do not define the needs of the business closely enough because they are not using the actual data.
• They do not understand how to size the volume of data properly so that the correct processing platform is selected.
• They do not understand what shape the final product needs to be in to be useful to the team.

Page 18: Connor big data

Real World Example:

• Hospital System (50 hospitals)
  – Wanted to have end users on PCs, so selected a PC based system which could not handle the processing load.
  – Decided on centralized servers without tiered support.
  – Did not purchase enough servers.
  – Did not distribute network load effectively.
  – Did not provide enough training on the software to medical personnel.

Page 19: Connor big data

Programmer Training

• Issues with the training of the programmers:
  – Many do not understand how to write the software to use the hardware most effectively.
  – AND they do not understand the stacking.
  – AND they do not understand how to optimize the code to make the best use of the compilers.

Page 20: Connor big data

Use an industry specialist!

Page 21: Connor big data

What are the most effective ways of data-mining?

• Specialized software for the platform.
• Build the algorithms to determine if there are any random correspondences.
• Know what data you want to review.
• Build meta-data platforms whenever possible.
• Have the people doing the design and builds understand the shape of the data before they start!
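
One way to check whether a correspondence between two data series could be random is a permutation test: shuffle one series many times and count how often the shuffled data correlates as strongly as the real data. The sketch below is one possible approach, not a method named in the talk. The function names, the trial count, and the sample data are all illustrative assumptions.

```python
import random
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data correlates as strongly as the real data.

    A small value suggests the correspondence is unlikely to be random.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(correlation(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (trials + 1)


# Hypothetical data: ys depends directly on xs, so the p-value is small.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print("p-value:", permutation_p_value(xs, ys))
```

A large p-value does not prove the series are unrelated. It only says this data cannot distinguish the observed correspondence from chance.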

Page 22: Connor big data

Real World Example:

• Soft Drink Company in 122 countries:
• Needed to understand peak load days for manufacture and distribution.
• The problem they were trying to address was concurrence: when one country would have to support the overload of another.
• Meta-data was critical to understanding and defining the shape of the data.
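
The concurrence problem in this example can be sketched as a per-day comparison of load against capacity: find each country that is overloaded on a given day, then list the countries with enough spare capacity that day to absorb the excess. The Python below is a simplified illustration under assumed data. The country labels, load figures, and capacities are hypothetical, and a real system would also account for shipping time and cost.

```python
def overload_cover(daily_load, capacity):
    """Map each (day, overloaded country) to the countries able to cover it.

    daily_load: {country: [units per day, ...]}, all lists the same length
    capacity:   {country: units per day that country can produce}
    An empty helper list means no single country can absorb the excess.
    """
    days = len(next(iter(daily_load.values())))
    plan = {}
    for day in range(days):
        # Spare capacity per country on this day (negative means overloaded).
        spare = {c: capacity[c] - load[day] for c, load in daily_load.items()}
        for country, margin in spare.items():
            if margin >= 0:
                continue
            excess = -margin
            plan[(day, country)] = sorted(
                c for c, s in spare.items() if s >= excess
            )
    return plan


# Hypothetical loads for three countries over three days.
daily_load = {"A": [80, 120, 130], "B": [60, 70, 110], "C": [50, 55, 95]}
capacity = {"A": 100, "B": 100, "C": 100}

for (day, country), helpers in overload_cover(daily_load, capacity).items():
    print(f"day {day}: {country} overloaded, can be covered by {helpers}")
```

Days where several countries peak together show up as entries with an empty helper list, which is the situation the meta-data has to expose before capacity can be planned.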

Page 23: Connor big data

What about cross platform portability of the final product?

Wolf Geiger (1992) - Data is only as good as the format in which it is presented to the person who has to use it. If it is not in a format that they can use there is no point in spending the time to do any of the processing.

Page 24: Connor big data

Real World Example:

• Asked the end user to write down exactly what they wanted in the report.
• Asked the manager to write down exactly what they wanted in the report.
• Asked the computer programmer to write down exactly what the clients wanted in the report.
• Two of three matched. Which one did not?

Page 25: Connor big data

Cell Phone Data: How should it be parsed?

• Has to be done on supercomputers to start, based on the volume of the data, but it has to end in PC formats!
• Object oriented db with full variable length fields.
• Needs multi-dimensional processing:
  – Computational linguistics.
    • Analysis of word stressors.
    • Analysis of grammatical syntax.
  – Cognitive focus (topic basis).
  – Recognized vocal stress vs. topic.
  – Risk factor assignment.
  – Background noise assessment.
  – Probability analysis of each of the factors to determine further review.
• Data presentation tools have to be in a format that is currently used, so that everyone understands where to look to find the important information.
• Cross platform portability!!!!
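
The slide does not say how the per-factor probabilities are combined to decide on further review. As one hedged illustration, the sketch below uses a noisy-OR rule, which treats the factors as independent and asks how likely it is that at least one of them warrants review. The rule itself, the factor names, the probabilities, and the 0.5 threshold are all assumptions made for this example.

```python
from math import prod


def review_probability(factor_probs):
    """Combine per-factor probabilities with a noisy-OR rule.

    factor_probs: {factor name: probability that the factor alone
    warrants review}. Assumes the factors are independent.
    """
    for name, p in factor_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability must be in [0, 1]")
    # Probability that no factor triggers, subtracted from 1.
    return 1.0 - prod(1.0 - p for p in factor_probs.values())


def needs_review(factor_probs, threshold=0.5):
    """True when the combined probability reaches the review threshold."""
    return review_probability(factor_probs) >= threshold


# Hypothetical factor scores for a single call.
call = {"vocal_stress": 0.3, "topic": 0.2, "background_noise": 0.1}
print(f"combined probability: {review_probability(call):.3f}")
print("flag for further review:", needs_review(call))
```

If the factors are correlated, as vocal stress and topic may well be, the independence assumption overstates the combined probability and a different model would be needed.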

Page 26: Connor big data

Questions?

Page 27: Connor big data

Thank you!