DINAMIC
Data Analytics for Big Data
Vandana P. Janeja
Information Systems Department, University of Maryland, Baltimore County, MD, USA
Big Data
• What is Big Data?
• Recently much good science, whether physical, biological, or social, has been forced to confront - and has often benefited from - the Big Data phenomenon.
• Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology. (p. 115)
Diebold, F.X. (2003), "Big Data Dynamic Factor Models for Macroeconomic Measurement and Forecasting: A Discussion of the Papers by Reichlin and Watson," in M. Dewatripont, L.P. Hansen and S. Turnovsky (eds.), Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress of the Econometric Society, Cambridge University Press, 115-122.
Big data spans four dimensions:
Volume, Velocity, Variety, and Veracity
• Volume: Enterprises are awash with ever-growing data of all types: terabytes, petabytes, even exabytes of information.
– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
– Convert 350 billion annual meter readings to better predict power consumption
• Velocity: Sometimes 2 minutes is too late.
– For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
– Scrutinize 5 million trade events created each day to identify potential fraud
– Analyze 500 million daily call detail records in real-time to predict customer churn faster
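To make "used as it streams" concrete, here is a minimal sketch of scoring events one at a time instead of in a nightly batch. It is an illustration only: the function name `flag_outliers`, the z-score rule, the thresholds, and the sample trade amounts are all assumptions, not a method taken from the slides. It uses Welford's online algorithm so each event is seen once and memory stays constant.

```python
import math
from typing import Iterable, Iterator, Tuple


def flag_outliers(amounts: Iterable[float], z_threshold: float = 3.0,
                  warmup: int = 5) -> Iterator[Tuple[int, float]]:
    """Yield (index, amount) for events far from the running mean.

    Each event is scored on arrival against statistics of the events
    seen so far (Welford's online mean/variance), then folded in.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(amounts):
        # Score only after a short warm-up so the statistics are meaningful.
        if count >= warmup:
            std = math.sqrt(m2 / (count - 1))
            if std > 0 and abs(x - mean) / std > z_threshold:
                yield i, x
        # Update the running statistics after scoring the event.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)


# Hypothetical trade amounts arriving in order.
trades = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 5000.0, 103.0]
suspicious = list(flag_outliers(trades))
print(suspicious)
```

Because `flag_outliers` is a generator, a suspicious event can be acted on the moment it arrives rather than after the whole day's data has been collected. A real fraud system would use far richer features than a single amount.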
• Variety: Big data is any type of data, structured and unstructured: text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
– Monitor hundreds of live video feeds from surveillance cameras to target points of interest
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
• Veracity: 1 in 3 business leaders don’t trust the information they use to make decisions.
– How can you act upon information if you don’t trust it?
– Establishing trust in big data presents a huge challenge as the variety and number of sources grow.
Analytics
Is it all about algorithms?
Will it make a difference if some of this data is from France and some from Maryland?
Will it make a difference if some of this data is from LA and some from Baltimore?
Will it make a difference if some of this data is from Maryland and some from D.C.?
Will it make a difference if some of this data is from Howard County, MD and some from Montgomery County, MD?
US HIGHWAYS
• 42,000 Americans Are Killed On Highways Each Year
• Nearly one-third of all fatal crashes each year are caused by substandard road conditions and roadside hazards.
• Motor vehicle crashes cost the United States $231 billion annually, including $21 billion from Federal and State tax revenue.
• Americans Waste $67 Billion Each Year Due To Congestion
Ref: http://www.house.gov/transportation/press/press2005/release9.html
According to the 2001 statistics, NJ ranks 12th in intersection fatalities, which account for 32.1% of all state highway fatalities, and 12th in pedestrian fatalities, which account for 17.7% of all state highway fatalities (USDOT).
LA Times 4/27/09 12pm
Dr. William Schaffner, chairman of Preventive Medicine at Vanderbilt University Medical Center in Nashville, Tenn., said doctors like him have been advised by the CDC and state health department to set up a system that would test patients with flu-like symptoms and help define how widespread this outbreak is. He said the severity of the virus is hard to gauge because of the wide discrepancy in how it has affected Mexicans and Americans, and because it is occurring in places that are warm, which is very unusual. "The genetic makeup of this virus has influenza experts scratching their heads," he said. "One of the things that has us worried is that could this be a virus that could continue to make mischief during the warmest parts of the year. That would be a big thing. For a respiratory virus to be active during the summer months" would be very unique.
CDC Officials Confirm Swine Flu Cases Up to 40; Outbreak May Worsen : ABC News 2/27/09 1pm
April 21, 2023 Data Mining: Concepts and Techniques
Knowledge Discovery (KDD) Process
– Data mining: core of the knowledge discovery process
[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
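The KDD stages named above can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not the textbook's implementation: the two "stores", the record layout, the function names, and the choice of pair counting with a support threshold as the mining step are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "databases": two sources of purchase records.
store_a = [{"id": 1, "items": "milk, bread"},
           {"id": 2, "items": "Milk,eggs"},
           {"id": 3, "items": None}]
store_b = [{"id": 4, "items": "bread,milk,eggs"},
           {"id": 5, "items": "bread, milk"}]


def clean(records):
    """Data cleaning: drop records with missing items, normalize item names."""
    cleaned = []
    for r in records:
        if r["items"]:
            items = {s.strip().lower() for s in r["items"].split(",") if s.strip()}
            cleaned.append({"id": r["id"], "items": items})
    return cleaned


def integrate(*sources):
    """Data integration: merge cleaned sources into one 'warehouse' list."""
    return [r for src in sources for r in src]


def select(warehouse, min_items=2):
    """Selection: keep task-relevant records (baskets with >= min_items)."""
    return [r["items"] for r in warehouse if len(r["items"]) >= min_items]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}


warehouse = integrate(clean(store_a), clean(store_b))
baskets = select(warehouse)
patterns = evaluate(mine(baskets), len(baskets))
print(patterns)
```

Only `mine` is the data mining step proper. The surrounding functions do the cleaning, integration, selection, and evaluation that the slide places around it.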
Big Data Framework
• Automatic Parallelization
• Run-time
– Data partitioning
– Task scheduling
– Handling machine failures
– Managing inter-machine communication
• Completely transparent to the programmer/analyst/user
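The following is a minimal sketch of what "transparent to the programmer" can look like, assuming a MapReduce-style division of labor (the slides do not name a specific framework). The toy runtime `run_job` is hypothetical: threads stand in for machines, a simple retry loop stands in for failure handling, and collecting results from the pool stands in for inter-machine communication. The analyst supplies only `count_words` and `merge`.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Data partitioning: split the input into n_parts interleaved chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def run_job(records, map_fn, reduce_fn, n_parts=4, max_retries=2):
    """Toy runtime: partition, schedule tasks, retry failures, merge results."""
    def task(chunk):
        # Failure handling: re-run a failed task up to max_retries times.
        for attempt in range(max_retries + 1):
            try:
                return map_fn(chunk)
            except Exception:
                if attempt == max_retries:
                    raise

    chunks = partition(records, n_parts)
    # Task scheduling: the pool assigns chunks to workers (threads here).
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(task, chunks))

    # Combine the partial results from all workers.
    result = partials[0]
    for partial in partials[1:]:
        result = reduce_fn(result, partial)
    return result


# Everything below is all the analyst writes.
def count_words(lines):
    return Counter(word for line in lines for word in line.split())


def merge(a, b):
    return a + b


lines = ["big data", "data analytics", "big data analytics", "data"]
totals = run_job(lines, count_words, merge, n_parts=2)
print(totals)
```

The analyst's two functions contain no partitioning, scheduling, or retry logic. Changing `n_parts` changes how the work is split without touching them.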
Relevant IS Courses
• IS 410 Introduction to Database Design
• IS 420 Database Application Development
• IS 427 Introduction to Artificial Intelligence: Concepts and Applications
• IS 428 Data Mining Techniques and Applications
• IS 498 Special Topics
• Independent studies