![Page 1: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/1.jpg)
Hadoop Open Platform-as-a-Service (Hops)
Academics: Jim Dowling, Seif Haridi
PostDocs: Gautier Berthou (SICS)
PhDs: Salman Niazi, Mahmoud Ismail, Kamal Hakimzadeh, Ali Gholami
R/Engineers: Stig Viaene (SICS), Steffen Grohschmeidt
MSc Students: Theofilos Kakantousis, Nikolaos Stangios, “Sri” Srijeyanthan, Vangelos Savvidis, Seçkin Savaşçı.
![Page 2: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/2.jpg)
What is Systems Research?*
•Systems research is the scientific study, analysis, modeling and engineering of effective software platforms.
•Its challenge is to provide dependable, powerful, performant, secure and scalable solutions within an increasingly complex IT environment.
*Druschel et al, “Fostering Systems Research in Europe”, A White Paper by EuroSys, 2006
![Page 3: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/3.jpg)
Why is Big Data Important?
•In a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research.
“More data trumps better algorithms”*
*“The Unreasonable Effectiveness of Data” [Halevy, Norvig et al 09]
![Page 4: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/4.jpg)
Bill Gates’ biggest product regret*
*http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/
![Page 5: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/5.jpg)
Windows Future Storage (WinFS*)
•“WinFS was an attempt to bring the benefits of schema and relational databases to the Windows file system. …The WinFS effort was started around 1999 as the successor to the planned storage layer of Cairo and died in 2006 after consuming many thousands of hours of efforts from really smart engineers.”
- [Brian Welcker]*
*http://blogs.msdn.com/b/bwelcker/archive/2013/02/11/the-vision-thing.aspx
![Page 6: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/6.jpg)
Background: Hadoop Filesystem and MapReduce
![Page 7: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/7.jpg)
HDFS: Hadoop Filesystem
[Diagram: write “/crawler/bot/jd.io/1” arrives at the Name node; blocks 1-6 are replicated across the Data nodes. Labels: Heartbeats, Rebalance, Under-replicated blocks, Re-replicate blocks.]
![Page 8: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/8.jpg)
Big Data Processing with No Data Locality
[Diagram: Job(“/genomes/jim.bam”) is submitted to the Workflow Manager; blocks 1-6 are moved to the Compute Grid Nodes that run the Job.]
This doesn’t scale. Bandwidth is the bottleneck.
![Page 9: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/9.jpg)
MapReduce – Data Locality
[Diagram: Job(“/genomes/jim.bam”) is submitted to the Job Tracker, which runs the Job on six Task Trackers, each drawn with a DataNode (DN) holding some of blocks 1-6. R = resultFile(s).]
![Page 10: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/10.jpg)
MapReduce Programming Model
Batch Sequential Processing
Scan Sort
With Fault Tolerance
filter
join
![Page 11: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/11.jpg)
The NameNode
![Page 12: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/12.jpg)
HDFS NameNode
•Stores Mappings: path_component -> inode inode -> {block} block -> {replica1,replica2,replica3}
•External API to HDFS Clients
- Internal API to DataNodes
•Monitors Datanodes for failures, corrupted data
•Manages Leases, Quotas, (re-)replication
•Must do all this in a single JVM
- Spotify have a 90GB Heap storing references to 300m files
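The three mappings above can be pictured as nested in-memory dictionaries. The following is a minimal sketch only, not HDFS code: `Inode`, `ToyNameNode`, and the DataNode names are hypothetical, and real NameNode structures are far more compact.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    inode_id: int
    name: str
    children: dict = field(default_factory=dict)  # path_component -> Inode
    blocks: list = field(default_factory=list)    # inode -> {block}


class ToyNameNode:
    """Hypothetical stand-in that keeps all metadata on one heap."""

    def __init__(self):
        self.root = Inode(0, "")
        self.replicas = {}   # block -> {replica1, replica2, replica3}
        self._next_id = 1

    def create(self, path, block_ids, datanodes):
        node = self.root
        for part in path.strip("/").split("/"):
            if part not in node.children:
                node.children[part] = Inode(self._next_id, part)
                self._next_id += 1
            node = node.children[part]
        node.blocks = list(block_ids)
        for block in block_ids:
            self.replicas[block] = set(datanodes)
        return node

    def resolve(self, path):
        # Walk the path one component at a time.
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children[part]
        return node

    def locate(self, path):
        return {b: sorted(self.replicas[b]) for b in self.resolve(path).blocks}


nn = ToyNameNode()
nn.create("/crawler/bot/jd.io/1", [1, 2], ["dn1", "dn2", "dn3"])
locations = nn.locate("/crawler/bot/jd.io/1")
```

Every file adds an inode plus one entry per block, which is why a single JVM heap becomes the limit.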
![Page 13: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/13.jpg)
High Availability for the NameNode HDFS 2.x
[Diagram labels: DN (x4), NN Active, NN Standby, NN Checkpt, JN (x3), ZK (x3)]
•Shared NN log stored in quorum of journal nodes
•Master-Slave Replication of NN State
•Agreement on the Active Master
•Faster Recovery, Cut Journal Log
DOESN’T SCALE OUT!
![Page 14: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/14.jpg)
The Evolution of the NameNode
•HDFS (2006)
- In-memory metadata
•HDFS 0.07 (2006)
- WAL (EditLog)
- FSImage
•HDFS 0.21 (2009)
- Weaken Global Lock
•HDFS 2.0 (2011)
- Eventually Consistent Replication: HA-NameNode
They reinvented
the Database
for the NameNode!
![Page 15: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/15.jpg)
Databases had these features long ago
•Oracle v6 (1988)
- Redo and Undo Logs
- Rollback Segments
•Oracle V7.1 (1994)
- Symmetric Replication
•Oracle 9i RAC (2001)
- Shared State Replication
and have continued to evolve…..
![Page 16: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/16.jpg)
The end of the One-size-fits-All Database
•Columnar Databases
- Vertica, Hana
•NewSQL Databases
- MySQL Cluster, VoltDB, Memstore, AtlasDB, FoundationDB
•Graph Databases
- Neo4J
•RDBMSes
- MySQL, Postgres, DB2, Oracle, SQLServer
•In-Memory Stores
- Memcached, Redis
•Key-Value Stores
- Dynamo, Cassandra, MongoDB, Riak
•Petabyte Databases
- BigQuery (Google), RedShift (Amazon), Impala (Cloudera)
Stonebraker et al, “One Size Fits All: An Idea Whose Time Has Come and Gone”, 2005
![Page 17: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/17.jpg)
MySQL Cluster (NDB) – Shared Nothing DB
•Distributed, In-memory
•2-Phase Commit
- Replicate DB, not the Log!
•Real-time
- Low TransactionInactive timeouts
•Commodity Hardware
•Scales out
- Millions of transactions/sec
- TB-sized datasets (48 nodes)
•Split-Brain solved with Arbitrator Pattern
•SQL and Native Blocking/Non-Blocking APIs
[Diagram labels: SQL API, NDB API]
30+ million update transactions/second on a 30-node cluster
![Page 18: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/18.jpg)
HopsFS
![Page 19: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/19.jpg)
HopsFS
• Customizable and Scalable Metadata
• High throughput for read and write operations
• NameNode failover time ≈ 5 seconds (vs ~1 minute for HDFS)
![Page 20: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/20.jpg)
Request Handling (Apache HDFS vs HopsFS)
Apache HDFS NameNode Request Handling
HopsFS NameNode Request Handling
![Page 21: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/21.jpg)
Fine-Grained Locking, Transactional Updates
• NDB gives us only the READ_COMMITTED isolation level, which is not strong enough.
• We implemented Serializability for FS operations using implicit locking in the DAG and row-level locking in NDB.
[Hakimzadeh, Peiro, Dowling, “Scaling HDFS with a Strongly Consistent Relational Model for Metadata”, DAIS 2014]
![Page 22: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/22.jpg)
Preventing Deadlocks and Starvation
[Diagram: concurrent mv, read, and block_report operations on /user/jdowling/dna.bam]
• Solution: all request threads for inode operations traverse the FS hierarchy in the same order, acquiring locks in the same order.
• Block-level operations have to follow the same order.
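A small sketch of why a shared acquisition order rules out deadlock. It is illustrative only: it uses exclusive in-process locks and lexicographic path order (which always visits a parent before its children), whereas the slides describe row-level locks in NDB. `get_lock`, `lock_order`, and `with_locks` are hypothetical helper names.

```python
import threading

_locks = {}                    # one lock per inode path (hypothetical registry)
_guard = threading.Lock()      # protects the registry itself


def get_lock(path):
    with _guard:
        return _locks.setdefault(path, threading.Lock())


def path_prefixes(path):
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def lock_order(*paths):
    """Every inode an operation touches, in one global (sorted) order."""
    return sorted({p for path in paths for p in path_prefixes(path)})


def with_locks(paths, action):
    held = [get_lock(p) for p in lock_order(*paths)]
    for lock in held:
        lock.acquire()
    try:
        return action()
    finally:
        for lock in reversed(held):
            lock.release()


done = []


def mv(src, dst):
    with_locks([src, dst], lambda: done.append((src, dst)))


def worker(src, dst):
    for _ in range(200):
        mv(src, dst)


# Two threads move in opposite directions; both still lock in the same order.
t1 = threading.Thread(target=worker, args=("/user/jdowling/dna.bam", "/tmp/dna.bam"))
t2 = threading.Thread(target=worker, args=("/tmp/dna.bam", "/user/jdowling/dna.bam"))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
```

If each thread instead locked its source path first and its destination second, the two workers could each hold one lock while waiting for the other.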
![Page 23: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/23.jpg)
Per Transaction Cache
•Experimentation revealed many roundtrips to the database per transaction.
•Cache intermediate transaction results at NameNodes.
•We also use Memcached at each NameNode to cache mappings of: path->{inode/blocks/replicas}
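The effect of a per-transaction cache can be sketched with a fake database that counts round trips. `FakeDatabase` and `TransactionCache` are hypothetical names for illustration; they are not the HopsFS classes.

```python
class FakeDatabase:
    """Stand-in for the metadata database; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.rows.get(key)


class TransactionCache:
    """Remembers rows already read inside one transaction."""

    def __init__(self, db):
        self.db = db
        self._cache = {}

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self.db.read(key)   # only the first read hits the DB
        return self._cache[key]


db = FakeDatabase({"/user": 1, "/user/jdowling": 2})
tx = TransactionCache(db)
for _ in range(3):
    tx.read("/user")
    tx.read("/user/jdowling")
```

Six logical reads cost two round trips here; the cache is discarded when the transaction ends, so it cannot serve stale rows to later transactions.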
![Page 24: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/24.jpg)
Sometimes, Transactions Just ain’t Enough
Subtree Operations: 4-phase Protocol
• Large Subtree Operations with millions of Inodes can’t be executed in a single Transaction, due to the low timeouts for Transactions (real-time).
• Sacrifices Atomicity, but keeps Isolation and Consistency.
• Batch operations and multithreading for performance.
• Failed NameNodes handled transparently.
• Leases used to handle failed clients.
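The slide does not spell out the four phases, so the sketch below shows only the batching idea: a large subtree operation is split into many short transactions that each stay under the timeout. `delete_subtree`, `commit_batch`, and the batch size are assumptions made for illustration.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_subtree(inode_ids, commit_batch, batch_size=1000):
    """Apply one short transaction per batch instead of one huge transaction."""
    committed = 0
    for batch in batches(inode_ids, batch_size):
        commit_batch(batch)          # each call commits independently
        committed += len(batch)
    return committed


store = set(range(2500))             # hypothetical inode ids of the subtree
tx_sizes = []


def commit_batch(batch):
    tx_sizes.append(len(batch))
    store.difference_update(batch)


total = delete_subtree(sorted(store), commit_batch)
```

Because each batch commits on its own, a crash halfway leaves some batches applied, which is the atomicity that the protocol gives up.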
![Page 25: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/25.jpg)
Leader Election using the Database (NDB)
•We need a leader NameNode to coordinate replication and lease management
•Use NDB as shared memory for Leader Election.
•No more Zookeeper, yay!
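One way to use a database as shared memory for leader election is a heartbeat table: every node bumps its own counter each round, and the live node with the smallest id leads. This is a sketch under that assumption, not necessarily the Hops protocol; `SharedTable` and `Node` are hypothetical, and a real version would run each step as a database transaction.

```python
class SharedTable:
    """Stand-in for a database table visible to all NameNodes."""

    def __init__(self):
        self.rows = {}   # node_id -> heartbeat counter


class Node:
    def __init__(self, node_id, table):
        self.node_id = node_id
        self.table = table
        self.last_seen = {}   # counters observed in the previous round

    def heartbeat(self):
        self.table.rows[self.node_id] = self.table.rows.get(self.node_id, 0) + 1

    def elect(self):
        current = dict(self.table.rows)
        # A peer is alive if its counter advanced since we last looked.
        alive = [n for n, c in current.items()
                 if n == self.node_id or c > self.last_seen.get(n, -1)]
        self.last_seen = current
        return min(alive)


table = SharedTable()
nodes = [Node(i, table) for i in (1, 2, 3)]

for n in nodes:
    n.heartbeat()
round1 = [n.elect() for n in nodes]

# Node 1 fails: it stops heartbeating, so the survivors pick the next id.
for n in nodes[1:]:
    n.heartbeat()
round2 = [n.elect() for n in nodes[1:]]
```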
![Page 26: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/26.jpg)
HopsFS Internal Protocol Scalability
•On 100PB+ clusters, internal protocols make up most of the network traffic for HDFS
•Block Reporting and Exiting Safe Mode
- Batching and work stealing.
![Page 27: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/27.jpg)
HopsFS Write Performance
1 Gbit Network, Nodes: 12-core Xeon X560 @ 2.8 GHz. 2-Node NDB Cluster.
![Page 28: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/28.jpg)
HopsFS Read Performance
1 Gbit Network, Nodes: 12-core Xeon X560 @ 2.8 GHz. 2-Node NDB Cluster.
![Page 29: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/29.jpg)
HopsFS Erasure Coding
HDFS 2.x Triple Replication (300%)
2x Replication + XOR (220%)
Reed-Solomon (140%)
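The percentages above are storage footprints relative to the raw data. The arithmetic can be reproduced as follows; the stripe parameters (10 data blocks, 1 XOR parity block, 4 Reed-Solomon parity blocks) are assumptions chosen because they yield the slide's figures, and `footprint_percent` is a hypothetical helper.

```python
from fractions import Fraction


def footprint_percent(data_blocks, parity_blocks, data_replicas, parity_replicas):
    """Stored blocks per stripe as a percentage of the data blocks."""
    stored = data_blocks * data_replicas + parity_blocks * parity_replicas
    return Fraction(stored * 100, data_blocks)


triple = footprint_percent(10, 0, 3, 1)   # three copies of every data block
xor = footprint_percent(10, 1, 2, 2)      # two copies of data and of one parity block
rs = footprint_percent(10, 4, 1, 1)       # one copy of data plus four parity blocks
```

Under these assumptions the three schemes come out at 300%, 220%, and 140% respectively.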
![Page 30: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/30.jpg)
HopsFS Erasure Coding
[Figures: Data durability with Triple Replication; Data durability with Reed-Solomon]
![Page 31: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/31.jpg)
Comparison with HDFS-RAID
![Page 32: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/32.jpg)
HopsFS Snapshots
•Read-Only Root-Level Single Snapshot
- Support rollback on unsuccessful software upgrades
- Prototype developed, ongoing work on integration
- Snapshot rollback order-of-growth is O(N)
![Page 33: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/33.jpg)
We did the same for YARN…
![Page 34: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/34.jpg)
Apache Hadoop Yarn HA/Scaleout Limitations
[Diagram labels: Clients, Primary RM, Standby RM, Zookeeper, NM (x5)]
• The Resource Manager (RM) is a bottleneck.
• Zookeeper throughput not high enough to persist all RM state
• Standby resource manager can only recover partial state
• All running jobs must be restarted.
• RM state not queryable.
![Page 35: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/35.jpg)
Hops Yarn
[Diagram labels: Client, RM (x2), NDB (x3), NM (x5)]
• The RM is a State-Machine. Almost no session state to manage.
• Transparent failover working.
![Page 36: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/36.jpg)
Hops Yarn
•FIFO Scheduler
•Capacity Scheduler
•Fair Scheduler
•Distributed Resource Tracker Service (ongoing)
•Make YARN more interactive (ongoing)
- Reduce NodeManager Heartbeat Time
![Page 37: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/37.jpg)
Hops-Hadoop
[Diagram: three NameNodes (NN) and three ResourceManagers (RM) over four NDB nodes, with twelve DN/NM pairs. Labels: HDFS, YARN.]
Exabyte-Scale Hadoop
![Page 38: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/38.jpg)
The Hops Stack Continued
![Page 39: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/39.jpg)
Bringing Data People Together
•Data Owners
- Metadata, Ingestion
- Non-programmers
•Data Scientists
- Data analysts
- Programmers
[Diagram labels: Hops-HDFS, Hops-YARN, HopsHub, Karamel/PaaS, Spark, Flink, Adam, Cuneiform]
![Page 40: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/40.jpg)
Perimeter Security and Multi-Tenancy
•HopsHub
- Project-level RBAC
• Hadoop trusted proxy
- Analytics Plugin Framework
• Adam, Cuneiform, Spark, Flink, MR
- REST APIs
Related Hadoop Security Projects
Knox, Sentry, Rhino
Network Isolation
LIMS
LDAP
Kerberos
![Page 41: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/41.jpg)
HopsHub Two-Factor Authentication
![Page 42: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/42.jpg)
Projects for Multi-Tenancy; Activity Trails
Global Activity Trail
Project
![Page 43: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/43.jpg)
Project Membership
![Page 44: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/44.jpg)
File Browser (Iceberg)
HDFS Files
![Page 45: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/45.jpg)
Upload Data
Apache Flume
Overcome 3 GB browser upload limit
Automated Ingestion of Data
![Page 46: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/46.jpg)
Run Cuneiform Workflows on YARN
[Diagram: workflow stages Align, Variant Calling, Annotate; data: FastQ files, BAM file, VCF file, Results; sizes shown: ~250 GB, ~10 GB, ~5 MB]
![Page 47: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/47.jpg)
Ongoing MSc Projects
•Realizing the meta-data dream of WinFS
- Vangelis
•Optimizing YARN’s Resource Tracker Service (interactive YARN)
- Sri
•Interactive Data Analytics (Zeppelin-EE)
- Seckin
![Page 48: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/48.jpg)
PaaS support with Chef/Karamel
Support for EC2, Vagrant, Bare Metal.
![Page 49: Hadoop Open Platform-as-a-Service (Hops)/J.Dowling.pdf · Hadoop Open Platform-as-a-Service (Hops) Academics: ... • Zookeeper throughput not high enough to persist all RM state](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a741e6d7f8b9a0d558b91f8/html5/thumbnails/49.jpg)
Conclusions
•Hops will be the first European distribution of Hadoop when released.
- First beta release coming in Q1 2015
•Lots of ideas for future work
- Tighter Spark, Flink integration
- BiobankCloud support
•NGS Hadoop Workshop Feb 19-20, Stockholm
- Signup at www.biobankcloud.com