![Page 1: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/1.jpg)
Data-Intensive and High Performance Computing on Cloud Environments
Gagan Agrawal
1
![Page 2: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/2.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Data-intensive computing Middleware System
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
2
![Page 3: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/3.jpg)
3
Utilities: Things We Can’t Live without
![Page 4: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/4.jpg)
4
Utility Costs Depend on Usage
Utility Providers Consumers
Resource on Demand
![Page 5: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/5.jpg)
5
Utility Costs Depend on Usage
Utility Providers Consumers
Pay Per Usage
$
$
$
![Page 6: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/6.jpg)
Utilities of Today Haven’t Always Been Utilities
6
Hand-pump
A Horse Cart: You purchase and ‘maintain’ the source of power for your transportation
![Page 7: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/7.jpg)
7
How Do We Currently Do Computing?
Resources are co-located on site
Computing Resources
Support Personnel
Computing Consumer
![Page 8: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/8.jpg)
8
How Do We Currently Do Computing?
Resources are co-located on site
Computing Resources
Support Personnel
Computing Consumer
![Page 9: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/9.jpg)
9
Computing as a Utility
Cloud “Utility” Providers: Amazon AWS, Azure, Cloudera, Google App Engine
Consumers: Companies, labs, schools, et al.
![Page 10: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/10.jpg)
10
Computing as a Utility
Algorithms & Data
Processed Results
Cloud “Utility” Providers: Amazon AWS, Azure, Cloudera, Google App Engine
Consumers: Companies, labs, schools, et al.
![Page 11: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/11.jpg)
11
Computing as a Utility
Algorithms & Data
Processed Results
Cloud “Utility” Providers: Amazon AWS, Azure, Cloudera, Google App Engine
Consumers: Companies, labs, schools, et al.
![Page 12: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/12.jpg)
12
Why Now?
• It has finally become cost-effective to offer computing as a service
• Large companies, e.g., Amazon, Microsoft, Google, Yahoo!
‣ Already have the computing personnel, infrastructure in place
‣ Decreasing costs of hardware
‣ Virtualization advancements
![Page 13: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/13.jpg)
13
Example of Cost Effectiveness at the Provider
![Page 14: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/14.jpg)
14
Why Now?
• This creates a win-win situation
• For the provider:
‣ They get paid to fully utilize otherwise idle hardware
• For the user:
‣ They save on costs
‣ Example: Amazon’s Cloud is $0.10 per machine-hour
![Page 15: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/15.jpg)
15
Promises of Cloud Computing
• Cost Associativity
‣ Running 1 machine for 10 hours = running 10 machines for 1 hour
• Elasticity
‣ Cloud applications can stretch and contract their resource requirements
• “Infinite resources”
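Cost associativity can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a flat per-machine-hour rate (the $0.10 figure quoted earlier in the talk) and ignores billing granularity, data transfer, and storage charges. The function name `cost_cents` is hypothetical.

```python
def cost_cents(machines: int, hours: int, rate_cents_per_machine_hour: int = 10) -> int:
    """Total cost in cents under a flat pay-per-use rate (assumed: 10 cents per machine-hour)."""
    return machines * hours * rate_cents_per_machine_hour


# One machine for ten hours costs the same as ten machines for one hour.
one_machine = cost_cents(machines=1, hours=10)
ten_machines = cost_cents(machines=10, hours=1)
print(one_machine, ten_machines)
```

Under this assumption the user pays the same amount either way but gets the answer ten times sooner with ten machines, provided the application parallelizes well.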
![Page 16: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/16.jpg)
Research Challenges
• How do we exploit cost associativity and elasticity of the cloud for various applications?
• How do the cloud providers provide adequate QoS to various applications and users?
‣ Maximize their revenue, lower their costs
• How do we develop effective services to support applications on cloud providers?
• How can we combine the use of cloud and traditional resources for various applications?
‣ (HPC) Cloud Bursting
• How do we effectively manage large scale data on the cloud?
16
![Page 17: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/17.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Data-intensive computing Middleware Systems
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
17
![Page 18: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/18.jpg)
Motivation
• Growing need for analysis of large scale data
‣ Scientific
‣ Commercial
• Data-intensive Supercomputing (DISC)
• Map-Reduce has received a lot of attention
‣ Database and Datamining communities
‣ High performance computing community
• Closely coupled with interest in cloud computing
April 20, 2023
18
![Page 19: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/19.jpg)
Map-Reduce: Positives and Questions
• Positives:
‣ Simple API
- Functional language based
- Very easy to learn
‣ Support for fault-tolerance
- Important for very large-scale clusters
• Questions
‣ Performance?
- Comparison with other approaches
‣ Suitability for different classes of applications?
19
![Page 20: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/20.jpg)
Class of Data-Intensive Applications
• Many different types of applications
‣ Data-center kind of applications
- Data scans, sorting, indexing
‣ More “compute-intensive” data-intensive applications
- Machine learning, data mining, NLP
- Map-reduce / Hadoop being widely used for this class
‣ Standard Database Operations
- SIGMOD 2009 paper compares Hadoop with Databases and OLAP systems
• What is Map-reduce suitable for?
• What are the alternatives?
‣ MPI/OpenMP/Pthreads: too low level?
20
![Page 21: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/21.jpg)
Our Work
• Proposes MATE (a Map-Reduce system with an AlternaTE API) based on Generalized Reduction
‣ Phoenix implemented Map-Reduce in shared-memory systems
‣ MATE adopted Generalized Reduction, first proposed in FREERIDE that was developed at Ohio State 2001-2003
‣ Observed API similarities and subtle differences between MapReduce and Generalized Reduction
• Comparison for
‣ Data Mining Applications
‣ Compare performance and API
‣ Understand performance overheads
• Will an alternative API be better for “Map-Reduce”?
21
![Page 22: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/22.jpg)
Comparing Processing Structures
22
• Reduction Object represents the intermediate state of the execution
• Reduce function is commutative and associative
• Sorting, grouping, and similar overheads are eliminated with the reduction function/object
![Page 23: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/23.jpg)
Observations on Processing Structure
• Map-Reduce is based on a functional idea
‣ Do not maintain state
• This can lead to overheads of managing intermediate results between map and reduce
‣ Map could generate intermediate results of very large size
• MATE API is based on a programmer managed reduction object
‣ Not as ‘clean’
‣ But, avoids sorting of intermediate results
‣ Can also help shared memory parallelization
‣ Helps better fault-recovery
23
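The contrast between the two processing structures can be sketched on one k-means iteration over 1-D points. This is an illustrative toy only, not the MATE, Phoenix, or Hadoop API: all function names are hypothetical, "workers" are simulated sequentially, and the reduction object is assumed to be a per-cluster `[sum, count]` pair.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the closest centroid (1-D distance)."""
    return min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))


def kmeans_step_mapreduce(points, centroids):
    """Map-Reduce style: emit (key, value) pairs, sort/group them, then reduce."""
    pairs = [(nearest(p, centroids), p) for p in points]   # map: one pair per point
    pairs.sort(key=lambda kv: kv[0])                        # shuffle: sort by key
    groups = defaultdict(list)
    for key, value in pairs:                                # group values per key
        groups[key].append(value)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}  # reduce


def kmeans_step_reduction_object(points, centroids, num_workers=2):
    """Generalized-reduction style: each worker updates a private reduction object
    in place; the objects are then merged. No intermediate pairs, no sorting."""
    local_objects = []
    for w in range(num_workers):
        robj = [[0.0, 0] for _ in centroids]                # [sum, count] per cluster
        for p in points[w::num_workers]:                    # this worker's share of the data
            c = nearest(p, centroids)
            robj[c][0] += p
            robj[c][1] += 1
        local_objects.append(robj)
    merged = [[0.0, 0] for _ in centroids]                  # merge is commutative and associative
    for robj in local_objects:
        for c, (s, n) in enumerate(robj):
            merged[c][0] += s
            merged[c][1] += n
    return {c: s / n for c, (s, n) in enumerate(merged) if n > 0}


points = [1.0, 2.0, 9.0, 11.0]
centroids = [0.0, 10.0]
print(kmeans_step_mapreduce(points, centroids))
print(kmeans_step_reduction_object(points, centroids))
```

Both functions compute the same new centroids; the second never materializes or sorts the list of intermediate pairs, which is the overhead the slide refers to.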
![Page 24: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/24.jpg)
24
Results: Data Mining (I)
• K-Means: 400MB dataset, 3-dim points, k = 100 on one WCI node with 8 cores
[Figure: average time per iteration (sec) versus number of threads (1, 2, 4, 8) for Phoenix, MATE, and Hadoop]
![Page 25: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/25.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Data-intensive computing Middleware Systems
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
25
![Page 26: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/26.jpg)
26
Resource Provisioning Motivation: Adaptive Applications
Earthquake modeling
Coastline forecasting
Medical systems
• Time-Critical Event Processing
- Compute-intensive
- Time constraints
- Application-specific flexibility
- Application Quality of Service (QoS)
![Page 27: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/27.jpg)
27
Adaptive Applications (Cont’d)
Adaptive Applications that perform time-critical event processing
• Application-specific flexibility: parameter adaptation
• Trade-off between application QoS and execution time
HPC Applications (compute-intensive)
• Aim to maximize performance
• Do not consider adaptation
Deadline-driven Scheduling
• Not very compute-intensive
![Page 28: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/28.jpg)
28
Challenges
-- Resource Budget Constraints
•Elastic Cloud Computing
- Pay-as-you-go model
•Satisfy the Application QoS with the Minimum Resource Cost
•Dynamic Resource Provisioning
- Dynamically varying application workloads
- Resource budget
![Page 29: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/29.jpg)
29
Background: Pricing Model
•Charged Fees
‣Base price
‣Transfer fee
•Linear Pricing Model
•Exponential Pricing Model
Base price charged for the smallest amount of CPU cycles
Transfer fee for each CPU allocation change
CPU cycle at the ith allocation
Time duration at the ith allocation
Number of CPU cycle allocations
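The pricing formulas themselves did not survive in this transcript, only the legend of their terms. As a rough illustration of how those terms could combine in a linear model, the sketch below assumes that each allocation is charged in proportion to its CPU cycles and duration, and that a transfer fee is paid for every change between consecutive allocations. This is an assumed form, not the formula from the slides, and the exponential model is not attempted. All names are hypothetical.

```python
def linear_cost(allocations, base_price, transfer_fee):
    """Assumed linear pricing: sum of base_price * cycles_i * duration_i over all
    allocations, plus transfer_fee for each change between consecutive allocations.

    allocations: list of (cpu_cycles, duration) tuples, one per allocation.
    """
    usage = sum(base_price * cycles * duration for cycles, duration in allocations)
    changes = max(len(allocations) - 1, 0)   # assumption: the first allocation is not a "change"
    return usage + transfer_fee * changes


# Two allocations: 2 units for 3 time steps, then 4 units for 1 time step.
print(linear_cost([(2, 3), (4, 1)], base_price=0.5, transfer_fee=1.0))
```

Even under this simplified form, the transfer fee makes frequent re-allocation costly, which is why the resource model described later tries to avoid frequent resource changes.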
![Page 30: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/30.jpg)
30
Problem Description
• Adaptive Applications
‣ Adaptive parameters
‣ Benefit
‣ Time constraint
• Cloud Computing Environment
‣ Resource budget
‣ Overprovisioning/Underprovisioning
• Goal
‣ Maximize the application benefit while satisfying the time constraints and resource budget
![Page 31: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/31.jpg)
31
Approach Overview
Dynamic Resource Provisioning (feedback control)
Resource Model (with optimization)
• Resource Provisioning Controller
‣ Multi-input-multi-output (MIMO) feedback control model
‣ Modeling between adaptive parameters and performance metrics
‣ Control policy: reinforcement learning
• Resource Model
‣ Map change of parameters to change in CPU/memory allocations
‣ Optimization: avoid frequent resource changes
change to the adaptive parameters
change to CPU/memory allocations
![Page 32: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/32.jpg)
32
Resource Provisioning Controller
Performance Metrics
• Satisfy time constraints and resource budget
Multi-Input-Multi-Output Model
• Relationship between adaptive parameters and performance metrics
Control Policy
• Decide how to change values of the adaptive parameters
![Page 33: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/33.jpg)
33
Control Model Formulation -- Performance Metrics
•Performance Metrics
‣Processing progress: ratio between the currently obtained application benefit and the elapsed execution time
‣Performance/cost ratio: ratio between the currently obtained application benefit and the cost of the resources that have been assigned
•Notation
Application benefit obtained at time step k
Elapsed execution time at time step k
Resource cost at time step k
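The two metrics defined above are plain ratios, which the following sketch computes for a single time step. The symbols used on the original slide are not recoverable from this transcript, so the parameter names here are hypothetical.

```python
def processing_progress(benefit_k: float, elapsed_time_k: float) -> float:
    """Application benefit obtained so far divided by elapsed execution time."""
    return benefit_k / elapsed_time_k


def performance_cost_ratio(benefit_k: float, resource_cost_k: float) -> float:
    """Application benefit obtained so far divided by the cost of assigned resources."""
    return benefit_k / resource_cost_k


# Example time step: benefit 30 after 10 time units, having spent 6 cost units.
print(processing_progress(30.0, 10.0), performance_cost_ratio(30.0, 6.0))
```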
![Page 34: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/34.jpg)
34
Control Model Formulation -- Multi-Input-Multi-Output Model
• Auto-Regressive-Moving-Average with Exogenous Inputs (ARMAX)
‣Second-order model
‣ is ith adaptive parameter at time step k
‣ are updated at the end of every interval
Previous observed performance metrics
Previous and current values of adaptive parameters
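The model equation is missing from this transcript. As a hedged illustration of the general shape described here (a metric predicted from its two previous values and from current and previous adaptive-parameter values), the sketch below uses a generic textbook second-order form and omits the moving-average noise term. The coefficient layout and all names are assumptions, and only one output metric is shown; a MIMO model would apply the same form to each metric.

```python
def predict_metric(y_prev, u_hist, a, b):
    """Predict y(k) for one performance metric.

    y_prev: [y(k-1), y(k-2)], the two previously observed metric values.
    u_hist: [u(k), u(k-1)], each a list of adaptive-parameter values.
    a:      [a1, a2], autoregressive coefficients.
    b:      b[lag][i] is the coefficient for parameter i at the given lag.
    """
    autoregressive = a[0] * y_prev[0] + a[1] * y_prev[1]
    exogenous = sum(
        b[lag][i] * u_hist[lag][i]
        for lag in range(len(u_hist))
        for i in range(len(u_hist[lag]))
    )
    return autoregressive + exogenous


y_k = predict_metric(
    y_prev=[2.0, 1.0],
    u_hist=[[1.0, 2.0], [0.0, 4.0]],
    a=[0.5, 0.25],
    b=[[1.0, 0.5], [2.0, 0.25]],
)
print(y_k)
```

In a controller of this kind the coefficients would be re-estimated from observations at the end of every interval, as the slide notes; the estimation step is not shown here.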
![Page 35: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/35.jpg)
35
Framework Design
Application
Virtualization Management (Eucalyptus, OpenNebula...)
Xen Hypervisor: VM VM ...
Xen Hypervisor: VM VM ...
Xen Hypervisor: VM VM ...
Service Deployment
Service Wrapper
Resource Provisioning Controller
Application Controller
Resource Model
Model Optimizer
Performance Manager
Priority Assignment
Status Query
Performance Analysis
![Page 36: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/36.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Data-intensive computing Middleware Systems
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
36
![Page 37: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/37.jpg)
37
Workflow Consolidation: Motivation
• Another Critical Issue in Cloud Environment: Power Management
‣HPC servers consume a lot of energy
‣Significant adverse impact on the environment
• To Reduce Resource and Energy Costs
‣Server consolidation
‣Minimize the total power consumption and resource costs without a substantial degradation in performance
![Page 38: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/38.jpg)
38
Problem Description
• Our Target Applications
‣ Workflows with DAG structure
‣ Multiple processing stages
‣ Opportunities for consolidation
• Research Problems
‣ Combine parameter adaptation, budget constraints and resource allocation with consolidation and power optimization
‣ Challenge: consolidation without parameter adaptation
‣ Support power-aware parameter adaptation -- future work
![Page 39: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/39.jpg)
39
Contributions
• A power-aware consolidation framework, pSciMapper, based on hierarchical clustering and an optimization search method
• pSciMapper is able to reduce the total power consumption by up to 56% with at most a 15% slowdown for the workflow
• pSciMapper incurs low overhead and is thus suitable for large-scale scientific workflows
![Page 40: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/40.jpg)
40
The pSciMapper Framework Design
Offline Analysis
Online Consolidation
Scientific Workflows
Resource Usage Generation
Temporal Feature Extraction
Feature Reduction and Modeling
Time Series
Knowledge base
Temporal Signatures
model
Hierarchical Clustering
Optimization Search Algorithm
Time Varying Resource Provisioning
Consolidated Workloads
![Page 41: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/41.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Data-intensive computing Middleware Systems
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
41
![Page 42: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/42.jpg)
Motivation: Data-Intensive Services on Clouds
• Cloud can provide flexible storage
• Data-intensive services can be executed on clouds
• Caching is an age-old idea to accelerate services
‣ On clouds, can we exploit elasticity?
• A cost-sensitive elastic cache for clouds!
42
![Page 43: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/43.jpg)
43
Problem: Query Intensive Circumstances
![Page 44: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/44.jpg)
44
Scaling up to Handle Load
Derived Data Cache (Cloud Nodes): proxies 0, 1, 2
HaitiMap
invoke: haitimap(29)
Which proxy has the page? h(k) = (k mod num_proxies)
h(29) = (29 mod 3) = 2
HIT! reply: data(29)
![Page 45: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/45.jpg)
46
Scaling up to Handle Load
Derived Data Cache (Cloud Nodes): proxies 0, 1, 2, 3
Service Infrastructure
HaitiMap
invoke: haitimap(29)
Which proxy has the page? h(k) = (k mod num_proxies)
h(29) = (29 mod 4) = 1
MISS
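The miss in this example is a general property of modulo placement: when the number of proxies changes, most keys map to a different proxy and their cached entries are no longer found. The sketch below reproduces the slide's `h(k) = k mod num_proxies` and measures how many keys move when growing from 3 to 4 proxies. The helper names and the key range are hypothetical.

```python
def proxy_for(key: int, num_proxies: int) -> int:
    """h(k) = k mod num_proxies, as on the slide."""
    return key % num_proxies


def fraction_remapped(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose proxy changes when the cache grows from old_n to new_n nodes."""
    keys = list(keys)
    moved = sum(1 for k in keys if proxy_for(k, old_n) != proxy_for(k, new_n))
    return moved / len(keys)


print(proxy_for(29, 3))   # proxy 2 holds the page with 3 nodes
print(proxy_for(29, 4))   # with 4 nodes the lookup goes to proxy 1, which does not have it
print(fraction_remapped(range(1200), 3, 4))
```

For uniformly spread integer keys, three out of every four keys land on a different proxy after adding the fourth node, so an elastic cache needs a placement scheme that moves far fewer entries when it scales.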
![Page 46: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/46.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
‣ Data-intensive computing Middleware System
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
47
![Page 47: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/47.jpg)
Heterogeneous High Performance Computing
• Heterogeneous architectures are commonplace
‣ E.g., today’s desktops & notebooks
‣ Multi-core CPU + Graphics card on PCI-E
• A Recent HPC system
‣ E.g., Tianhe-1 [5th fastest supercomputer, Nov 2009]
‣ Uses multi-core CPUs and GPU (ATI Radeon HD 4870) on each node
• Multi-core CPU and GPU usage still divided
‣ Resources may be under-utilized
• Can Multi-core CPU and GPU be used simultaneously for computation?
48
![Page 48: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/48.jpg)
Overall System Design
49
User Input: Simple C code with annotations
Application Developer
Multi-core Middleware API
GPU Code for CUDA
Compilation Phase
Code Generator
Run-time System
Worker Thread Creation and Management
Map Computation to CPU and GPU
Dynamic Work Distribution
Key Components
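Dynamic work distribution between devices of different speeds is commonly done with a shared pool of chunks that each worker pulls from as it becomes free. The sketch below shows that general idea only; it is not the run-time system described in the talk. The "CPU" and "GPU" workers are stand-in Python functions with artificial delays, and all names are hypothetical.

```python
import queue
import threading
import time


def run_dynamic(num_chunks, workers):
    """Distribute chunk ids 0..num_chunks-1 dynamically.

    workers: dict mapping a worker name to a function(chunk_id).
    Returns a dict mapping each worker name to the list of chunks it processed.
    """
    work = queue.Queue()
    for chunk_id in range(num_chunks):
        work.put(chunk_id)
    done = {name: [] for name in workers}

    def loop(name, process):
        while True:
            try:
                chunk_id = work.get_nowait()   # grab the next chunk as soon as this worker is free
            except queue.Empty:
                return                          # no work left
            process(chunk_id)
            done[name].append(chunk_id)         # each thread appends only to its own list

    threads = [threading.Thread(target=loop, args=(n, f)) for n, f in workers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


def slow_cpu_worker(chunk_id):
    time.sleep(0.004)   # stand-in for a slower device


def fast_gpu_worker(chunk_id):
    time.sleep(0.001)   # stand-in for a faster device


result = run_dynamic(20, {"cpu": slow_cpu_worker, "gpu": fast_gpu_worker})
print({name: len(chunks) for name, chunks in result.items()})
```

Because each worker takes a new chunk only when it finishes the previous one, the faster worker tends to process more chunks without any static estimate of relative device speed. The exact split varies from run to run.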
![Page 49: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/49.jpg)
Performance of K-Means (Heterogeneous - NUCS)
50
60%
![Page 50: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/50.jpg)
Outline
• Introduction to Cloud Computing
• Ongoing Projects in Cloud Computing
‣ Resource Provisioning with Budget and Time Constraints
‣ Workflow consolidation with power constraints
‣ An Elastic Cache on the Amazon Cloud
‣ Data-intensive computing Middleware System
• Other Research Projects
‣ Heterogeneous High-Performance Computing
‣ Deep web Integration and Mining
‣ Scientific Data Management
51
![Page 51: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/51.jpg)
52
The Deep Web
• The definition of “the deep web” from Wikipedia
The deep Web refers to World Wide Web content that is not part of the surface web, which is indexed by standard search engines.
• Some Examples: Expedia, Priceline
![Page 52: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/52.jpg)
53
The Deep Web is Huge and Informative
• 500 times larger than the surface web
• 7500 terabytes of information (19 terabytes in the surface web)
• 550 billion documents (1 billion in the surface web)
• More than 200,000 deep web sites
• Relevant to every domain: scientific, e-commerce, market
• 95 percent of the deep web is publicly accessible (with access limitations)
![Page 53: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/53.jpg)
54
How to Access Deep Web Data
1. A user issues query through input interfaces of deep web data sources
2. Query is translated into SQL style query
3. Trigger search on backend database
4. Answers returned through network
Select price
From Expedia
Where depart=CMH and arrive=SEA and dedate=“7/13/10” and redate=“7/16/10”
![Page 54: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/54.jpg)
55
System Overview
Deep Web
Schema Mining

| Source | Input | Output | Constraint |
|--------|-------|--------|------------|
| S1 | A1 | B1, B2 | C2 |
| S2 | A1 | B2, B3 | C1 |

D1, D2, D3, D4
Data Source Model
Data Source Dependency Model
System Models
Query Planning
Query Optimization
Fault Tolerance
Complex Structured Query
Approximate Query Answering
Aggregation/Low Selectivity Query
Exploring Part of SEEDEEP
‣ Hidden schema discovery
‣ Data source integration
Querying Part of SEEDEEP
‣ Structured SQL query
‣ Sampling the deep web
‣ Online aggregation
‣ Low selectivity query
![Page 55: Data-Intensive and High Performance Computing on Cloud Environments Gagan Agrawal 1](https://reader036.vdocuments.site/reader036/viewer/2022062517/56649ef15503460f94c02c8a/html5/thumbnails/55.jpg)
Summary
• Research in Cloud, High Performance Computing and Data-Intensive Computing (including data mining and web mining)
• Currently working with 10 PhD students and 5 MS students
• 10 PhDs completed in last 6 years
• To get involved
‣ Join 888 in Winter 2011
56