fiware fwd big_data_all_in_1_v1


FIWARE Developer’s Week: Big Data GE

francisco.romerobueno@telefonica.com

2

10,000 feet view: Big Data in the FIWARE architecture

[Architecture diagram: IoT agents feed real-time, last-value context data into the Orion Context Broker, where the application reads it. Cygnus persists the historical context data into HDFS and other storages, where MapReduce, Hive and Tidoop process it.]

4

Big Data: What is it and how much data is there?

What is big data?

5

> small data

What is big data?

6

> big data

http://commons.wikimedia.org/wiki/File:Interior_view_of_Stockholm_Public_Library.jpg

How much data is there?

7

Data growing forecast

8

Global users (billions): 2.3 in 2012, 3.6 in 2017

Global networked devices (billions): 12 in 2012, 19 in 2017

Global broadband speed (Mbps): 11.3 in 2012, 39 in 2017

Global traffic (zettabytes): 0.5 in 2012, 1.4 in 2017

http://www.cisco.com/en/US/netsol/ns827/networking_solutions_sub_solution.html#~forecast

1 zettabyte = 10^21 bytes = 1,000,000,000,000,000,000,000 bytes

9

How to deal with it: Distributed storage and computing

What happens if one shelf is not enough?

10

You buy more shelves…

… then you create an index

11

“The Avengers”, 1-100, shelf 1
“The Avengers”, 101-125, shelf 2
“Superman”, 1-50, shelf 2
“X-Men”, 1-100, shelf 3
“X-Men”, 101-200, shelf 4
“X-Men”, 201-225, shelf 5

[Diagram: the indexed volumes of “The Avengers”, “Superman” and “X-Men” spread across several shelves. Distributed storage!]

… and you call your friends to read everything!

12

Distributed computing!

13

Distributed storage: The Hadoop reference

Hadoop Distributed File System (HDFS)

14

• Based on the Google File System
• Large files are stored across multiple machines (Datanodes) by splitting them into blocks that are distributed
• Metadata is managed by the Namenode
• Scalable by simply adding more Datanodes
• Fault-tolerant, since HDFS replicates each block (3 replicas by default)
• Security based on authentication (Kerberos) and authorization (permissions, HACLs)
• It is managed like a Unix-like file system

HDFS architecture

15

[Diagram: a Namenode holding the metadata table below, and Datanode0, Datanode1, …, DatanodeN spread across Rack 1 and Rack 2, each holding replicated blocks.]

Path | Replicas | Block IDs
/user/user1/data/file0 | 2 | 1, 3
/user/user1/data/file1 | 3 | 2, 4, 5
… | … | …

16

Managing HDFS: File System Shell and HTTP REST API

File System Shell

17

• The File System Shell includes various shell-like commands that directly interact with HDFS
• The FS shell is invoked by any of these scripts:
  – bin/hadoop fs
  – bin/hdfs dfs
• All FS Shell commands take URI paths as arguments:
  – scheme://authority/path
  – Available schemes: file (local FS), hdfs (HDFS)
  – If nothing is specified, hdfs is assumed
• It is necessary to connect to the cluster via SSH
• Full commands reference:
  – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html

File System Shell examples

18

$ hadoop fs -cat webinar/abriefhistoryoftime_page1
CHAPTER 1
OUR PICTURE OF THE UNIVERSE
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast
$ hadoop fs -mkdir webinar/afolder
$ hadoop fs -ls webinar
Found 4 items
-rw-r--r--   3 frb cosmos 3431 2014-12-10 14:00 /user/frb/webinar/abriefhistoryoftime_page1
-rw-r--r--   3 frb cosmos 1604 2014-12-10 14:00 /user/frb/webinar/abriefhistoryoftime_page2
-rw-r--r--   3 frb cosmos 5257 2014-12-10 14:00 /user/frb/webinar/abriefhistoryoftime_page3
drwxr-xr-x   - frb cosmos    0 2015-03-10 11:09 /user/frb/webinar/afolder
$ hadoop fs -rmr webinar/afolder
Deleted hdfs://cosmosmaster-gi/user/frb/webinar/afolder

HTTP REST API

19

• The HTTP REST API supports the complete File System interface for HDFS
• It relies on the webhdfs scheme for URIs:
  – webhdfs://<HOST>:<HTTP_PORT>/<PATH>
• HTTP URLs are built as:
  – http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=…
• Full API specification:
  – http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

HTTP REST API examples

20

$ curl -X GET "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar/abriefhistoryoftime_page1?op=open&user.name=frb"
CHAPTER 1
OUR PICTURE OF THE UNIVERSE
A well-known scientist (some say it was Bertrand Russell) once gave a public lecture on astronomy. He described how the earth orbits around the sun and how the sun, in turn, orbits around the center of a vast
$ curl -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar/afolder?op=mkdirs&user.name=frb"
{"boolean":true}
$ curl -X GET "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar?op=liststatus&user.name=frb"
{"FileStatuses":{"FileStatus":[{"pathSuffix":"abriefhistoryoftime_page1","type":"FILE","length":3431,"owner":"frb","group":"cosmos","permission":"644","accessTime":1425995831489,"modificationTime":1418216412441,"blockSize":67108864,"replication":3},{"pathSuffix":"abriefhistoryoftime_page2","type":"FILE","length":1604,"owner":"frb","group":"cosmos","permission":"644","accessTime":1418216412460,"modificationTime":1418216412500,"blockSize":67108864,"replication":3},{"pathSuffix":"abriefhistoryoftime_page3","type":"FILE","length":5257,"owner":"frb","group":"cosmos","permission":"644","accessTime":1418216412515,"modificationTime":1418216412551,"blockSize":67108864,"replication":3},{"pathSuffix":"afolder","type":"DIRECTORY","length":0,"owner":"frb","group":"cosmos","permission":"755","accessTime":0,"modificationTime":1425995941361,"blockSize":0,"replication":0}]}}
$ curl -X DELETE "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar/afolder?op=delete&user.name=frb"
{"boolean":true}
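The API can also be used to upload content. A minimal sketch, assuming the endpoint follows the standard two-step WebHDFS create flow (the first request returns a Location header telling where the data must actually be sent; localfile.txt and newfile are illustrative names only):

$ curl -i -X PUT "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar/newfile?op=create&user.name=frb"
# the response carries a Location header; repeat the PUT against that URL, this time sending the data
$ curl -i -X PUT -T localfile.txt -H "Content-Type: application/octet-stream" "<URL_from_the_Location_header>"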

21

Feeding HDFS:Cygnus

Cygnus FAQ

22

• What is it for?
  – Cygnus is a connector in charge of persisting Orion context data in certain configured third-party storages, creating a historical view of such data. In other words, Orion only stores the last value of an entity's attributes; if older values are required, they have to be persisted in another storage, value by value, using Cygnus.
• How does it receive context data from the Orion Context Broker?
  – Cygnus uses the subscription/notification feature of Orion. A subscription is made in Orion on behalf of Cygnus, detailing which entities we want to be notified about when an update occurs on any of their attributes (a sketch of such a subscription is shown below).
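A minimal sketch of such a subscription, assuming an NGSI v1 Orion instance at orion.example.com:1026 and Cygnus listening on /notify at port 5050 (the hosts and the Room1/temperature entity are illustrative only, not part of the original slides):

$ curl http://orion.example.com:1026/v1/subscribeContext -s -S -H "Content-Type: application/json" -H "Accept: application/json" -d '
{
    "entities": [
        { "type": "Room", "isPattern": "false", "id": "Room1" }
    ],
    "attributes": [ "temperature" ],
    "reference": "http://cygnus.example.com:5050/notify",
    "duration": "P1M",
    "notifyConditions": [
        { "type": "ONCHANGE", "condValues": [ "temperature" ] }
    ],
    "throttling": "PT5S"
}'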

Cygnus FAQ

23

• Which storages is it able to integrate?
  – The current stable release is able to persist Orion context data in:
    • HDFS, the Hadoop distributed file system
    • MySQL, the well-known relational database manager
    • CKAN, an Open Data platform
    • STH, the Short-Term Historic
• Which is its architecture?
  – Internally, Cygnus is based on Apache Flume. In fact, Cygnus is a Flume agent, which is basically composed of a source in charge of receiving the data, a channel where the source puts the data once it has been transformed into a Flume event, and a sink, which takes Flume events from the channel in order to persist the data within its body into a third-party storage.

Basic Cygnus agent

24

Configure a basic Cygnus agent

25

• Edit /usr/cygnus/conf/agent_<id>.conf
• List of sources, channels and sinks:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
• Channels configuration:
cygnusagent.channels.hdfs-channel.type = memory
cygnusagent.channels.hdfs-channel.capacity = 1000
cygnusagent.channels.hdfs-channel.transactionCapacity = 100

Configure a basic Cygnus agent

26

• Sources configuration:
cygnusagent.sources.http-source.channels = hdfs-channel
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnusagent.sources.http-source.port = 5050
cygnusagent.sources.http-source.handler = es.tid.fiware.fiwareconnectors.cygnus.handlers.OrionRestHandler
cygnusagent.sources.http-source.handler.notification_target = /notify
cygnusagent.sources.http-source.handler.default_service = def_serv
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
cygnusagent.sources.http-source.handler.events_ttl = 10
cygnusagent.sources.http-source.interceptors = ts de
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
cygnusagent.sources.http-source.interceptors.de.type = es.tid.fiware.fiwareconnectors.cygnus.interceptors.DestinationExtractor$Builder
cygnusagent.sources.http-source.interceptors.de.matching_table = /usr/cygnus/conf/matching_table.conf

Configure a basic Cygnus agent

27

• Sinks configuration:
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
cygnusagent.sinks.hdfs-sink.type = es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink
cygnusagent.sinks.hdfs-sink.cosmos_host = cosmos.lab.fi-ware.org
cygnusagent.sinks.hdfs-sink.cosmos_port = 14000
cygnusagent.sinks.hdfs-sink.cosmos_default_username = cosmos_username
cygnusagent.sinks.hdfs-sink.cosmos_default_password = xxxxxxxxxxxxx
cygnusagent.sinks.hdfs-sink.hdfs_api = httpfs
cygnusagent.sinks.hdfs-sink.attr_persistence = column
cygnusagent.sinks.hdfs-sink.hive_host = cosmos.lab.fi-ware.org
cygnusagent.sinks.hdfs-sink.hive_port = 10000
cygnusagent.sinks.hdfs-sink.krb5_auth = false

HDFS details regarding Cygnus persistence

28

• By default, for each entity Cygnus stores the data at:
  – /user/<your_user>/<service>/<service-path>/<entity-id>-<entity-type>/<entity-id>-<entity-type>.txt
• Within each HDFS file, the data format may be json-row or json-column:
  – json-row
{"recvTimeTs":"13453464536", "recvTime":"2014-02-27T14:46:21", "entityId":"Room1", "entityType":"Room", "attrName":"temperature", "attrType":"centigrade", "attrValue":"26.5", "attrMd":[ … ]}
  – json-column
{"recvTime":"2014-02-27T14:46:21", "temperature":"26.5", "temperature_md":[ … ], "pressure":"90", "pressure_md":[ … ]}

Advanced features (version 0.7.1)

29

• Round-Robin channel selection
• Pattern-based context data grouping
• Kerberos authentication
• Management interface (roadmap)
• Multi-tenancy support (roadmap)
• Entity model-based persistence (roadmap)

Round Robin channel selection

30

• It is possible to configure more than one channel-sink pair for each storage, in order to increase performance
• A custom ChannelSelector is needed
• https://github.com/telefonicaid/fiware-connectors/blob/master/flume/doc/operation/performance_tuning_tips.md

RoundRobinChannelSelector configuration

31

cygnusagent.sources = mysource
cygnusagent.sinks = mysink1 mysink2 mysink3
cygnusagent.channels = mychannel1 mychannel2 mychannel3

cygnusagent.sources.mysource.type = ...
cygnusagent.sources.mysource.channels = mychannel1 mychannel2 mychannel3
cygnusagent.sources.mysource.selector.type = es.tid.fiware.fiwareconnectors.cygnus.channelselectors.RoundRobinChannelSelector
cygnusagent.sources.mysource.selector.storages = N
cygnusagent.sources.mysource.selector.storages.storage1 = <subset_of_cygnusagent.sources.mysource.channels>
...
cygnusagent.sources.mysource.selector.storages.storageN = <subset_of_cygnusagent.sources.mysource.channels>

Pattern-based Context Data Grouping

32

• The default destination (HDFS file, MySQL table or CKAN resource) is obtained as a concatenation:
  – destination=<entity_id>-<entityType>
• It is possible to group different context data thanks to this regex-based feature, implemented as a Flume interceptor:
cygnusagent.sources.http-source.interceptors = ts de
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
cygnusagent.sources.http-source.interceptors.de.type = es.tid.fiware.fiwareconnectors.cygnus.interceptors.DestinationExtractor$Builder
cygnusagent.sources.http-source.interceptors.de.matching_table = /usr/cygnus/conf/matching_table.conf

Matching table for pattern-based grouping

33

• CSV file ('|' field separator) containing rules:
  – <id>|<comma-separated_fields>|<regex>|<destination>|<destination_dataset>
• For instance:
1|entityId,entityType|Room\.(\d*)Room|numeric_rooms|rooms
2|entityId,entityType|Room\.(\D*)Room|character_rooms|rooms
3|entityType,entityId|RoomRoom\.(\D*)|character_rooms|rooms
4|entityType|Room|other_rooms|rooms

• https://github.com/telefonicaid/fiware-connectors/blob/master/flume/doc/design/interceptors.md#destinationextractor-interceptor

Kerberos authentication

34

• HDFS may be secured with Kerberos for authentication purposes
• Cygnus is able to persist on a kerberized HDFS if the configured HDFS user has a registered Kerberos principal and this configuration is added:
cygnusagent.sinks.hdfs-sink.krb5_auth = true
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_user = krb5_username
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_password = xxxxxxxxxxxx
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_login_file = /usr/cygnus/conf/krb5_login.conf
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_conf_file = /usr/cygnus/conf/krb5.conf

• https://github.com/telefonicaid/fiware-connectors/blob/master/flume/doc/operation/hdfs_kerberos_authentication.md

35

Distributed computing: The Hadoop reference

Hadoop was created by Doug Cutting at Yahoo!...

36

… based on the MapReduce patent by Google

Well, MapReduce was really invented by Julius Caesar

37

Divide et impera*

* Divide and conquer

An example (slides 38-42)

How many pages are written in Latin among the books in the Ancient Library of Alexandria?

[Animated diagram: the mappers read the book records in parallel, e.g. LATIN REF1 P45, GREEK REF2 P128, EGYPTIAN REF3 P12, LATIN REF4 P73, LATIN REF5 P34, EGYPTIAN REF6 P10, GREEK REF7 P20, GREEK REF8 P230. Each mapper emits only the Latin page counts to the reducer, which accumulates them while the remaining mappers keep reading and finally go idle: 45 (ref 1) + 73 (ref 4) + 34 (ref 5) = 152 TOTAL.]

MapReduce applications

43

• MapReduce applications are commonly written in Java
  – They can be written in other languages through Hadoop Streaming
• They are executed in the command line:
$ hadoop jar <jar-file> <main-class> <input-dir> <output-dir>
• A MapReduce job consists of:
  – A driver, a piece of software where inputs, outputs, formats, etc. are defined, and the entry point for launching the job
  – A set of Mappers, given by a piece of software defining their behaviour
  – A set of Reducers, given by a piece of software defining their behaviour
• There are 2 APIs:
  – org.apache.hadoop.mapred (the old one)
  – org.apache.hadoop.mapreduce (the new one)
• Hadoop is distributed with MapReduce examples (a sample run is sketched below):
  – [HADOOP_HOME]/hadoop-examples.jar
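For instance, the bundled examples jar can be launched like any other job. A minimal sketch, assuming the wordcount example and illustrative HDFS paths:

$ hadoop jar $HADOOP_HOME/hadoop-examples.jar wordcount /user/frb/webinar /user/frb/webinar_wordcount
# once the job finishes, the word counts are left in the output directory
$ hadoop fs -cat /user/frb/webinar_wordcount/part-*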

Implementing the example

44

• The input will be a single big file containing:
symbolae botanicae,latin,230
mathematica,greek,95
physica,greek,109
ptolomaics,egyptian,120
terra,latin,541
iustitia est vincit,latin,134
• The mappers will receive pieces of the above file, which will be read line by line
  – Each line will be represented by a (key,value) pair, i.e. the offset within the file and the real data within the line, respectively
  – For each input pair, a (key,value) pair will be output, i.e. a common "num_pages" key and the third field in the line
• The reducers will receive arrays of pairs produced by the mappers, all having the same key ("num_pages")
  – For each array of pairs, the sum of the values will be output as a (key,value) pair, in this case a "total_pages" key and the sum as value

Implementing the example: JCMapper.class

45

public static class JCMapper extends
        Mapper<Object, Text, Text, IntWritable> {

    private final Text globalKey = new Text("num_pages");
    private final IntWritable bookPages = new IntWritable();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        System.out.println("Processing " + fields[0]);

        if (fields[1].equals("latin")) {
            bookPages.set(Integer.parseInt(fields[2]));
            context.write(globalKey, bookPages);
        } // if
    } // map

} // JCMapper

Implementing the example: JCReducer.class

46

public static class JCReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable totalPages = new IntWritable();

    @Override
    public void reduce(Text globalKey, Iterable<IntWritable> bookPages, Context context)
            throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable val : bookPages) {
            sum += val.get();
        } // for

        totalPages.set(sum);
        context.write(globalKey, totalPages);
    } // reduce

} // JCReducer

Implementing the example: JC.class

47

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new JC(), args);
    System.exit(res);
} // main

@Override
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();
    Job job = Job.getInstance(conf, "julius caesar");
    job.setJarByClass(JC.class);
    job.setMapperClass(JCMapper.class);
    job.setCombinerClass(JCReducer.class);
    job.setReducerClass(JCReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
} // run
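Once packaged, the job is launched like any other MapReduce application. A minimal sketch, assuming a hypothetical jc.jar containing the classes above and illustrative HDFS paths:

$ hadoop jar jc.jar JC /user/frb/library/books.csv /user/frb/library_out
$ hadoop fs -cat /user/frb/library_out/part-*
# with the sample input of slide 44, the Latin page counts sum to 230 + 541 + 134 = 905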

48

Querying tools: Hive

Querying tools

49

• The MapReduce paradigm may be hard to understand and, even worse, to use
• Indeed, many data analysts just need to query the data
  – If possible, by using already well-known languages
• Because of that, some querying tools appeared in the Hadoop ecosystem:
  – Hive and its HiveQL language, quite similar to SQL
  – Pig and its Pig Latin language, a new language

Hive and HiveQL

50

• HiveQL reference:
  – https://cwiki.apache.org/confluence/display/Hive/LanguageManual
• All the data is loaded into Hive tables
  – Not real tables (they don't contain the real data) but metadata pointing to the real data at HDFS
• The best thing is that Hive uses pre-defined MapReduce jobs behind the scenes!
  • Column selection
  • Fields grouping
  • Table joining
  • Values filtering
  • …
• Important remark: since MapReduce is used by Hive, the queries may take some time to produce a result

Hive CLI

51

$ hive
hive history file=/tmp/myuser/hive_job_log_opendata_XXX_XXX.txt
hive> select column1,column2,otherColumns from mytable where column1='whatever' and columns2 like '%whatever%';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Starting Job = job_201308280930_0953, Tracking URL = http://cosmosmaster-gi:50030/jobdetails.jsp?jobid=job_201308280930_0953
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=cosmosmaster-gi:8021 -kill job_201308280930_0953
2013-10-03 09:15:34,519 Stage-1 map = 0%, reduce = 0%
2013-10-03 09:15:36,545 Stage-1 map = 67%, reduce = 0%
2013-10-03 09:15:37,554 Stage-1 map = 100%, reduce = 0%
2013-10-03 09:15:44,609 Stage-1 map = 100%, reduce = 33%

Hive Java API

52

• The Hive CLI is OK for human-driven testing purposes
• But it is not usable by remote applications
• Hive has no REST API
• Hive has several drivers and libraries:
  • JDBC for Java
  • Python
  • PHP
  • ODBC for C/C++
  • Thrift for Java and C++
  • https://cwiki.apache.org/confluence/display/Hive/HiveClient
• A remote Hive client usually performs:
  • A connection to the Hive server (TCP/10000)
  • The query execution

Hive Java API: get a connection

53

private static Connection getConnection(String ip, String port, String user, String password) {
    try {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    } catch (ClassNotFoundException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch

    try {
        return DriverManager.getConnection("jdbc:hive://" + ip + ":" + port
            + "/default?user=" + user + "&password=" + password);
    } catch (SQLException e) {
        System.out.println(e.getMessage());
        return null;
    } // try catch
} // getConnection

Hive Java API: do the query

54

private static void doQuery() {
    try {
        Statement stmt = con.createStatement();
        ResultSet res = stmt.executeQuery("select column1,column2,"
            + "otherColumns from mytable where "
            + "column1='whatever' and "
            + "columns2 like '%whatever%'");

        while (res.next()) {
            String column1 = res.getString(1);
            int column2 = res.getInt(2);
        } // while

        res.close();
        stmt.close();
        con.close();
    } catch (SQLException e) {
        System.exit(0);
    } // try catch
} // doQuery

Hive tables creation

55

• Both locally using the CLI, or remotely using the Java API, use this command:
hive> create external table...
• CSV-like HDFS files:
hive> create external table <table_name> (<field1_name> <field1_type>, ..., <fieldN_name> <fieldN_type>) row format delimited fields terminated by '<separator>' location '/user/<username>/<path>/<to>/<the>/<data>';
• JSON-like HDFS files:
hive> create external table <table_name> (<field1_name> <field1_type>, ..., <fieldN_name> <fieldN_type>) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/<username>/<path>/<to>/<the>/<data>';
• Cygnus automatically creates the Hive tables for the entities it is notified about! (A sketch of querying such a table is shown below.)
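A minimal sketch of a concrete session, assuming a json-row file persisted by Cygnus under an illustrative path (the table name, location and fields are hypothetical, mirroring the json-row format of slide 28):

hive> create external table room1_room (recvTimeTs string, recvTime string, entityId string, entityType string, attrName string, attrType string, attrValue string) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/frb/def_serv/def_servpath/Room1-Room';
hive> select recvTime, attrValue from room1_room where attrName = 'temperature';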

56

Tidoop: Extensions for Hadoop

Non HDFS inputs for MapReduce

57

• TIDOOP comprises all the developments related to Hadoop made by Telefónica Investigación y Desarrollo:
  – https://github.com/telefonicaid/fiware-tidoop
• Part of this repo are the extensions allowing the use of generic, non-HDFS data:
  – https://github.com/telefonicaid/fiware-tidoop/tree/develop/tidoop-hadoop-ext
• Non-HDFS sources:
  – CKAN (beta available)
  – MongoDB-based Short-Term Historic (roadmap)
• Other specific public repos UNDER STUDY… suggestions accepted

Non HDFS inputs for MapReduce: driver

58

@Override
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();
    Job job = Job.getInstance(conf, "ckan mr");
    job.setJarByClass(CKANMapReduceExample.class);
    job.setMapperClass(CKANMapper.class);
    job.setCombinerClass(CKANReducer.class);
    job.setReducerClass(CKANReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // start of the relevant part
    job.setInputFormatClass(CKANInputFormat.class);
    CKANInputFormat.addCKANInput(job, ckanInputs);
    CKANInputFormat.setCKANEnvironmnet(job, ckanHost, ckanPort, sslEnabled, ckanAPIKey);
    CKANInputFormat.setCKANSplitsLength(job, splitsLength);
    // end of the relevant part
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
} // run

59

Tidoop: MapReduce Library

MapReduce Library

60

• Under Tidoop, there is a library of predefined, general-purpose MapReduce jobs:
  – https://github.com/telefonicaid/fiware-tidoop/tree/develop/tidoop-mr-lib
• Jobs within the library may be used as any other MapReduce job, but they can be used remotely through a REST API as well:
  – https://github.com/telefonicaid/fiware-tidoop/tree/develop/tidoop-mr-lib-api
• Available jobs:
  – Filter by regex
  – MapOnly (mapping function as a parameter)
• Currently working on adding more and more jobs

61

Big Data services at FIWARE LAB

Cosmos

Cosmos storage services

62

• Currently:
  – Hadoop cluster combining HDFS storage and MapReduce computing in the same nodes
  – 10 virtual nodes
  – 1 TB capacity (333 GB real capacity, replicas=3)
  – OAuth2-based secure access to WebHDFS/HttpFS (a request sketch is shown after this list)
• Upcoming:
  – Specific HDFS cluster for storage
  – 6 physical nodes (+ another 6 planned)
  – 20 TB capacity (6.7 TB real capacity, replicas=3)
  – OAuth2-based secure access to WebHDFS/HttpFS
  – Support for CKAN for Open Big Data
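A minimal sketch of how such a secured request could look; note that passing the FIWARE Lab OAuth2 token in an X-Auth-Token header is an assumption here (the usual FIWARE PEP proxy convention), not something stated in these slides:

$ curl -X GET "http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/frb/webinar?op=liststatus&user.name=frb" -H "X-Auth-Token: <your_oauth2_token>"
# <your_oauth2_token> is a placeholder for the token obtained with your FIWARE Lab credentials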

Cosmos computing services

63

• Currently:
  – Hadoop cluster combining HDFS storage and MapReduce computing in the same nodes
  – 10 virtual nodes
  – 100 GB capacity
  – MapReduce and Hive available
  – No queues, no resource reservation mechanisms
• Upcoming:
  – Specific HDFS cluster for computing
  – 8 physical nodes (+ another 8 planned)
  – 256 GB capacity
  – MapReduce, Hive and Tidoop available
  – Reservation mechanisms through YARN queues

Cosmos portal

64

• Currently:
  – http://cosmos.lab.fi-ware.org/cosmos-gui/
  – FIWARE Lab credentials are directly requested
  – An HDFS userspace is created in the shared Hadoop cluster
• Upcoming:
  – Real delegated authentication
  – An HDFS userspace is created in the storage cluster
  – Features to deploy and run MapReduce jobs

Thanks!
