Heuritech: Apache Spark REX (experience report)
TRANSCRIPT
ABOUT ME
Didier Marin
PhD in Computer Science (UPMC), Machine Learning, Reinforcement Learning & Robotics
Co-founder of Heuritech
Likes functional programming and distributed computing
We develop tools to make sense of raw text data
Customer insight from the text of visited web pages
WHY SPARK?
Performance, in particular when batch size < total RAM in the cluster
More general than MapReduce, with a high-level API
Extensions (ML, streaming) and connectors (Cassandra)
Growing community
PARSING LOGS
def parseLine(line: String): Either[ParsingError, LogData] = ???
val logs = sc.textFile("logfile").map(parseLine(_))
val validLogs = logs.flatMap(_.right.toOption)
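A minimal sketch of what parseLine could look like, assuming a simple space-separated log format; the ParsingError and LogData definitions here are hypothetical stand-ins for the talk's actual types.

```scala
// Hypothetical types: placeholders for the real log schema.
case class ParsingError(line: String, reason: String)
case class LogData(ip: String, path: String, status: Int)

def parseLine(line: String): Either[ParsingError, LogData] =
  line.split(" ") match {
    // Expect exactly "ip path status", with a numeric status code
    case Array(ip, path, status) if status.nonEmpty && status.forall(_.isDigit) =>
      Right(LogData(ip, path, status.toInt))
    case _ =>
      Left(ParsingError(line, "unexpected format"))
  }
```

Returning Either instead of throwing keeps a single malformed line from killing the whole job; the flatMap(_.right.toOption) step above then silently drops the failures.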
CLUSTER CONFIGURATION
LXC + Salt
N containers: 1 master/executor + (N-1) executors
One Cassandra node co-located with each Spark executor
Using an "uber"-JAR to submit jobs
Sharing data through NFS
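An uber-JAR bundles the application and all its non-Spark dependencies into a single artifact, so only one file has to reach the cluster. A hedged sketch of the submit command; the class name, master URL and JAR name are illustrative placeholders, not the ones from the talk.

```shell
# Build the uber-JAR (e.g. with the sbt-assembly plugin), then:
spark-submit \
  --class com.example.LogAnalysis \
  --master spark://master:7077 \
  log-analysis-assembly.jar
```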
MANAGING SPARK'S MEMORY
Default: 40% working memory, 60% cache
20% of the cache is used to unroll blocks
Explicit caching for huge RDDs we reuse:validLogs.persist(StorageLevel.MEMORY_AND_DISK)
Partition tuning may be necessary (spark.default.parallelism)
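The knobs above can be set on the SparkConf before building the context; the values here are illustrative, not recommendations. The two storage fractions correspond to the 60% cache / 20% unroll defaults mentioned above (Spark's legacy memory model).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Illustrative settings, to be tuned per cluster.
val conf = new SparkConf()
  .setAppName("log-analysis")
  .set("spark.default.parallelism", "96")       // e.g. ~2-3 tasks per CPU core
  .set("spark.storage.memoryFraction", "0.6")   // fraction of heap for the cache
  .set("spark.storage.unrollFraction", "0.2")   // fraction of cache for unrolling blocks
val sc = new SparkContext(conf)

// Spill to disk rather than recompute when a huge RDD does not fit in memory:
// validLogs.persist(StorageLevel.MEMORY_AND_DISK)
```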
AGGREGATION
val words = sc.parallelize(List("a","b","a","c"))
words.groupBy(x=>x).mapValues(_.size).collect
// Array((a,2), (b,1), (c,1))
words.map(x=>(x,1)).reduceByKey(_+_).collect
// Array((a,2), (b,1), (c,1))
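Both versions produce the same counts, but reduceByKey combines values within each partition before the shuffle, so only small partial sums cross the network, whereas groupBy ships every element. A plain-Scala sketch of that two-phase semantics, with partitions simulated as nested lists:

```scala
// Simulate reduceByKey: reduce inside each "partition" first,
// then merge the (much smaller) partial results after the "shuffle".
def reduceByKeySim[K](partitions: List[List[(K, Int)]]): Map[K, Int] = {
  // Map-side combine: one partial sum per key per partition
  val partials = partitions.map(_.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) })
  // Final merge of the partial sums
  partials.flatten.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }
}
```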
USEFUL LINKS
Databricks knowledge base: github.com/databricks/spark-knowledgebase
Spark users mailing list: apache-spark-user-list.1001560.n3.nabble.com
Parsing Apache logs with Spark (Scala): alvinalexander.com/scala/analyzing-apache-access-logs-files-spark-scala