Big Data, Hadoop and NoSQL: (almost) everything you have ever wanted to know
Yoann Pitarch
Agenda
1. Big Data
   1. Definition
   2. Applications
2. Hadoop / MapReduce Computing Paradigm
   1. Overview
   2. How it works
3. NoSQL databases
Introduction
J. Manyika et al., Big data: the next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011
Some Definitions
• A buzzword
• « Data whose size is too large and complex to be treated (harvested, stored, analysed, disseminated) by usual systems »
• « Data can be considered big when traditional analysis methods fail in dealing with it »
• « A collection of data from traditional and digital sources inside and outside a company that represents a source for ongoing discovery and analysis »
The 3, 4 or 5 “Vs”
• Volume, Velocity and Variety; possibly Veracity and/or Variability as well. Source: http://www.datasciencecentral.com/
Digital Information and usage: what has changed?

Digital Data (chart): the world's stored information grew from 3 exabytes in 1986 to 16 (1993), 54 (2000) and 295 exabytes (2007). Over the same period the digital share rose from 1% to 3%, then 25%, and finally 94%, while the analog share fell from 99% to 6%.

NOTE: Numbers may not sum due to rounding. SOURCE: Hilbert and Lopez, « The world’s technological capacity to store, communicate, and compute information », Science, 2011; J. Manyika et al., Big data: the next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011.
Storage and Computing

(Chart): the world's general-purpose computing capacity grew from less than 0.001 in 1986 to 0.004 (1993), 0.289 (2000) and 6.379 (2007), measured in 10^12 million instructions per second.

NOTE: Numbers may not sum due to rounding. SOURCE: Hilbert and Lopez, « The world’s technological capacity to store, communicate, and compute information », Science, 2011; J. Manyika et al., Big data: the next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011.
Data penetration by industry (chart): for each data type (video, image, audio, text/numbers), penetration is rated low, medium or high across sectors such as insurance, banking, communication and media, construction, education, government and health care.

SOURCE: McKinsey Global Institute analysis; J. Manyika et al., Big data: the next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011.
Big Data from Internet/Web 2.0
R. Kalakota, 2012
Internet Users
World: 2,405,518,376 internet users in June 2012 (population: 7,017,846,922). France: 52,228,905 internet users in June 2012. Indexed Web: more than 8.15 billion pages (Sept. 2012).
Internet World Stats – www.internetworldstats.com/stats.htm
Internet users vs. population by region (chart): Africa, Asia, Europe, Middle East, North America, Latin America, Oceania.
Social Network Penetration
Source: http://fr.slideshare.net/stevenvanbelleghem/social-media-around-the-world-2011

                             North   West   South   East   Europe
Know at least one SN          98%    97%     99%    99%     98%
Member of at least one SN     75%    66%     77%    79%     73%
Avg number of SNs             1.8    1.5     2.2    1.9     1.9
Social Networks
• Facebook users
  – 835,525,280 (March 2012) – 50% via mobile – 25,000,000 in France (penetration rate 38%)
• Twitter users
  – 140,000,000 active users – 340,000,000 tweets per day
• Some stats (http://fr.slideshare.net/stevenvanbelleghem)
  – 400 million users daily (Facebook) – Session: Facebook 37 min; Twitter 23 min – 50% follow a brand – 36% write about a company or a brand – 19% write about the company they work for
Big Data from Science
• Astronomy
  – THéMIS solar telescope: 20 TB of raw reduced data
  – European Solar Telescope: ½ PB/day (2018)
  – CDS: database of all astronomical objects, related papers, …
• Fluid analysis
  – Particle Image Velocimetry: 1 min; 1,000 images/second; 1 TB (16-bit encoding)
  – Navier–Stokes equations: grid of 10^8 nodes; 10^5 time steps; 40 TB of data per variable
• Medicine
  – Radiology, videos of surgeries, reports
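The Navier–Stokes data volume above is consistent with storing one 4-byte value per grid node per time step; a quick sanity check (the 4-byte single-precision assumption is ours, not stated on the slide):

```python
# Sanity check for the Navier-Stokes data-volume estimate above.
nodes = 10**8        # grid nodes
steps = 10**5        # time steps
bytes_per_value = 4  # single-precision float (assumed)

total_bytes = nodes * steps * bytes_per_value
terabytes = total_bytes / 10**12  # using decimal terabytes (10^12 bytes)
print(terabytes)  # 40.0 TB per variable, matching the slide
```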
Harvested by Smart Sensors
• Car industry • Meters: water, power, … • Networks: communications (emails, telephone, …)

Submarine Cable Map
• Prism • Who?
Data Centers
• « Within four years, two-thirds of all data center traffic across the world — as well as workloads — will be cloud based » Cisco – 2012
• Global data center traffic will grow fourfold between 2011 and 2016, reaching a total of 6.6 zettabytes annually
• Global cloud traffic, the fastest-growing component of data center traffic, will grow sixfold – a 44% compound annual growth rate (CAGR) – from 683 exabytes of annual traffic in 2011 to 4.3 zettabytes by 2016

Data Centers
http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns1175/Cloud_Index_White_Paper.html
Global Data Center IP Traffic Growth (chart)

Data Centers
http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns1175/Cloud_Index_White_Paper.html
Workload Distribution: 2011-2016 (chart)
Applications
documentation.tice.ac-orleans-tours.fr

Applications
• Science monitoring & technology watch
  – Analysis of competitors – Trend detection
• Market segmentation / customer micro-segmentation
• Social network analysis
  – Opinion mining – Digital identity
• Reaction to failures
• Automatic and smart regulation
  – Water, electricity, …
• Automatic monitoring of domestic equipment

Technologies and Methods
– CS: Collect, Store, Treat, Spread
– Aggregate, Analyse, Visualize
– Multidisciplinary: CS, mathematics, economics, sociology
– Methods
  • Multidimensional analysis and visualization
  • Machine learning
  • Data mining
Methods
– Not necessarily fully new methods, but
– Volume implies new treatment principles
  • Large processing and storage capacity => cloud computing
  • High-performance and parallel computing => Apache Mahout: scalable machine learning and data mining
  • Data reduction / dimensionality reduction
  • Distributed data & treatments => NoSQL; MapReduce
Apache™ Hadoop®
Cloud Computing
4+ billion phones by 2010 [Source: Nokia]
Web 2.0-enabled PCs, TVs, etc.
Businesses, from startups to enterprises
An emerging computing paradigm where data and services reside in massively scalable data centers and can be ubiquitously accessed from any connected device over the internet.
Credit: IBM Corp. [F. Desprez, LIP ENS Lyon / INRIA]

Cloud Computing
• Two key concepts
  – Processing 1000x more data does not have to be 1000x harder
  – Cycles and bytes, not hardware, are the new commodity
• Cloud computing is
  – Providing services on virtual machines allocated on top of a large physical machine pool
  – A method to address scalability and availability concerns for large-scale applications
  – Democratized distributed computing
[F. Desprez, LIP ENS Lyon / INRIA]
Amazon Elastic Compute Cloud
A set of APIs and business models which give developer-level access to Amazon’s infrastructure and content:
• Data As A Service: Amazon E-Commerce Service, Amazon Historical Pricing
• Search As A Service: Alexa Web Information Service, Alexa Top Sites, Alexa Site Thumbnail, Alexa Web Search Platform
• Infrastructure As A Service: Amazon Simple Queue Service, Amazon Simple Storage Service, Amazon Elastic Compute Cloud
• People As A Service: Amazon Mechanical Turk
Compute => Elastic Compute Cloud; Store => Simple Storage Service; Message => Simple Queue Service
Credits: Jeff Barr, Amazon
Before Analysis: HARVESTING
World Digital Library
http://www.wdl.org/en/

Harvesting
• Technical issues
  – Different APIs to query – Different formats
• Other issues
  – Selecting sources – Defining queries – Need for homogenization

Issues and Challenges
• Data heterogeneity
  – Formats (pages vs tweets vs videos) – Reliability: quality, validity, privacy – Accessibility: open data /
• Techniques and technologies
  – Hardware (capacity, security) – Software
• Organizational
  – Various skills and competences (computer science, mathematics, economics, social science, …)
Agenda
1. Big Data
   1. Definition
   2. Applications
2. Hadoop / MapReduce Computing Paradigm
   1. Overview
   2. How it works
3. NoSQL databases
Large-Scale Data Analytics
• MapReduce computing paradigm (e.g., Hadoop) vs. traditional database systems
• Many enterprises are turning to Hadoop
  – Especially applications generating big data
  – Web applications, social networks, scientific applications

Why Is Hadoop Able to Compete?
Hadoop:
– Scalability (petabytes of data, thousands of machines)
– Flexibility in accepting all data formats (no schema)
– Commodity, inexpensive hardware
– Efficient and simple fault-tolerance mechanism
Database:
– Performance (tons of indexing, tuning and data organization techniques)
– Features: provenance tracking, annotation management, …
What is Hadoop (1)
• Hadoop is a software framework for distributed processing of large datasets across large clusters of computers
  – Large datasets → terabytes or petabytes of data
  – Large clusters → hundreds or thousands of nodes
• Hadoop is an open-source implementation of Google's MapReduce
• Hadoop is based on a simple programming model called MapReduce
• Hadoop is based on a simple data model; any data will fit

What is Hadoop (2)
• The Hadoop framework consists of two main layers
  – Distributed file system (HDFS)
  – Execution engine (MapReduce)

Hadoop Master/Slave Architecture
• Hadoop is designed as a master-slave, shared-nothing architecture
  – Master node (single node)
  – Many slave nodes
Design Principles of Hadoop
• Need to process big data
• Need to parallelize computation across thousands of nodes
• Commodity hardware
  – Large number of low-end, cheap machines working in parallel to solve a computing problem
• This is in contrast to parallel DBs
  – Small number of high-end, expensive machines

Design Principles of Hadoop
• Automatic parallelization & distribution
  – Hidden from the end-user
• Fault tolerance and automatic recovery
  – Nodes/tasks will fail and will recover automatically
• Clean and simple programming abstraction
  – Users only provide two functions, “map” and “reduce”
Who Uses MapReduce/Hadoop?
• Google: inventors of the MapReduce computing paradigm
• Yahoo: developing Hadoop, the open-source implementation of MapReduce
• IBM, Microsoft, Oracle
• Facebook, Amazon, AOL, Netflix
• Many others + universities and research labs
Hadoop: How it Works

Hadoop Architecture
• Master node (single node); many slave nodes
• Distributed file system (HDFS)
• Execution engine (MapReduce)

Hadoop Distributed File System (HDFS)
• Centralized namenode: maintains metadata info about files
• Many datanodes (1000s): store the actual data; files are divided into blocks; each block is replicated N times (default N = 3)
• Example: a file F is split into blocks 1-5 of 64 MB each
Main Properties of HDFS
• Large: an HDFS instance may consist of thousands of server machines, each storing part of the file system’s data
• Replication: each data block is replicated many times (default is 3)
• Failure: failure is the norm rather than the exception
• Fault tolerance: detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS
  – The namenode constantly checks the datanodes
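The splitting-and-replication scheme above can be sketched in a few lines. This is a toy model, with hypothetical datanode names and naive round-robin placement; real HDFS uses a rack-aware placement policy:

```python
# Toy sketch of HDFS-style block splitting and replication.
# Datanode names and round-robin placement are illustrative only.

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size cited above
REPLICATION = 3                # default replication factor

def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_id: [datanodes holding a replica]} for one file."""
    n_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(n_blocks):
        # Choose `replication` distinct nodes, round-robin over the cluster.
        placement[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
layout = place_blocks(300 * 1024 * 1024, nodes)  # a 300 MB file -> 5 blocks
print(len(layout))   # 5 blocks
print(layout[0])     # ['dn1', 'dn2', 'dn3']
```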
Map-Reduce Execution Engine (Example: Color Count)
• Input blocks on HDFS are read by Map tasks, which parse each record and produce (k, v) pairs, e.g. (color, 1)
• Shuffle & sorting based on k groups the pairs by key
• Reduce tasks consume (k, [v]) pairs, e.g. (color, [1, 1, 1, 1, 1, 1, …]), and produce (k’, v’) pairs, e.g. (color, 100)
• Users only provide the “Map” and “Reduce” functions
Properties of MapReduce Engine
• The Job Tracker is the master node (runs with the namenode)
  – Receives the user’s job
  – Decides how many tasks will run (number of mappers)
  – Decides where to run each mapper (concept of locality)
• Example: a file with 5 blocks → run 5 map tasks
  – Where to run the task reading block “1”? Try to run it on Node 1 or Node 3

Properties of MapReduce Engine (Cont’d)
• The Task Tracker is the slave daemon (runs on each datanode)
  – Receives tasks from the Job Tracker
  – Runs each task until completion (either map or reduce task)
  – Always in communication with the Job Tracker, reporting progress
• In this example, 1 map-reduce job consists of 4 map tasks and 3 reduce tasks
Key-Value Pairs
• Mappers and Reducers are users’ code (provided functions)
• They just need to obey the key-value pair interface
• Mappers:
  – Consume <key, value> pairs
  – Produce <key, value> pairs
• Reducers:
  – Consume <key, <list of values>>
  – Produce <key, value>
• Shuffling and Sorting:
  – Hidden phase between mappers and reducers
  – Groups all similar keys from all mappers, sorts them, and passes them to a certain reducer in the form of <key, <list of values>>
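The shuffle-and-sort phase described above can be sketched in a few lines: a minimal in-memory stand-in for what the framework does across the network:

```python
# Minimal in-memory sketch of the shuffle & sort phase:
# group all (key, value) pairs emitted by the mappers by key,
# sorted, so each reducer receives <key, [values]>.
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_outputs):
    """mapper_outputs: list of (key, value) pairs from all mappers."""
    pairs = sorted(mapper_outputs, key=itemgetter(0))
    return [(k, [v for _, v in group]) for k, group in groupby(pairs, key=itemgetter(0))]

emitted = [("blue", 1), ("green", 1), ("blue", 1), ("red", 1), ("blue", 1)]
print(shuffle_and_sort(emitted))
# [('blue', [1, 1, 1]), ('green', [1]), ('red', [1])]
```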
MapReduce Phases
• Deciding what will be the key and what will be the value is the developer’s responsibility
Example 1: Word Count
• Job: count the occurrences of each word in a data set
• Map tasks emit (word, 1) pairs; reduce tasks sum the counts per word
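A self-contained sketch of the word-count job: the two user-provided functions plus a tiny local stand-in for the framework's map/shuffle/reduce machinery:

```python
# Word count expressed as the two user-provided MapReduce functions,
# driven by a small local simulation of the framework.
from collections import defaultdict

def mapper(line):
    """Consume one input record, produce (word, 1) pairs."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Consume (word, [1, 1, ...]), produce (word, total)."""
    return (word, sum(counts))

def run_job(lines):
    # Map phase, with the shuffle folded in: group values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase; keys visited in sorted order, as after shuffle & sort.
    return [reducer(k, groups[k]) for k in sorted(groups)]

data = ["the quick brown fox", "the lazy dog", "the fox"]
print(run_job(data))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```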
Example 2: Color Count
• Job: count the number of occurrences of each color in a data set
• Input blocks on HDFS → Map tasks produce (k, v) pairs, e.g. (color, 1) → shuffle & sorting based on k → Reduce tasks consume (k, [v]), e.g. (color, [1, 1, 1, 1, 1, 1, …]), and produce (k’, v’), e.g. (color, 100)
• The output file has 3 parts (Part0001, Part0002, Part0003), probably on 3 different machines (one per reduce task)
Example 3: Color Filter
• Job: select only the blue and the green colors
• Input blocks on HDFS → Map tasks produce (k, v) pairs, e.g. (color, 1), and write them directly to HDFS
• Each map task selects only the blue or green colors
• No need for a reduce phase
• The output file has 4 parts (Part0001-Part0004), probably on 4 different machines (one per map task)
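The map-only filter job above can be sketched the same way: with no reduce phase, each map task writes its own output part (the part names here mirror the slide and are illustrative):

```python
# Map-only job: filter records, one output part per map task.
# No shuffle and no reduce phase are needed.

WANTED = {"blue", "green"}

def filter_mapper(block):
    """Keep only the wanted colors from one input block."""
    return [color for color in block if color in WANTED]

# Four input blocks -> four map tasks -> four output parts.
blocks = [
    ["red", "blue", "green"],
    ["green", "yellow"],
    ["blue", "blue", "red"],
    ["yellow"],
]
parts = {f"Part{i:04d}": filter_mapper(block) for i, block in enumerate(blocks, start=1)}
print(parts)
# {'Part0001': ['blue', 'green'], 'Part0002': ['green'],
#  'Part0003': ['blue', 'blue'], 'Part0004': []}
```

Note that a part can be empty (Part0004 here): each map task writes whatever survives its filter, even nothing.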
Bigger Picture: Hadoop vs. Other Systems

                     Distributed Databases                      Hadoop
Computing model      Notion of transactions; the transaction    Notion of jobs; the job is the unit
                     is the unit of work; ACID properties,      of work; no concurrency control
                     concurrency control
Data model           Structured data with known schema;         Any data will fit in any format,
                     read/write mode                            (un)(semi)structured; read-only mode
Cost model           Expensive servers                          Cheap commodity machines
Fault tolerance      Failures are rare; recovery mechanisms     Failures are common over thousands of
                                                                machines; simple yet efficient fault
                                                                tolerance
Key characteristics  Efficiency, optimizations, fine-tuning     Scalability, flexibility, fault tolerance
Agenda
1. Big Data
   1. Definition
   2. Applications
2. Hadoop / MapReduce Computing Paradigm
   1. Overview
   2. How it works
3. NoSQL databases
Introduction
• Relaxing ACID properties
• NoSQL databases
  – Categories of NoSQL databases
  – Typical NoSQL API
  – Representatives of NoSQL databases
• Conclusions
Relaxing ACID Properties
• Atomicity, Consistency, Isolation, Durability
• Cloud computing: ACID is hard to achieve; moreover, it is not always required, e.g. for blogs, status updates, product listings, etc.
• Availability
  – Traditionally thought of as the server/process being available 99.999% of the time
  – For a large-scale node system, there is a high probability that a node is either down or that there is a network partitioning
• Partition tolerance
  – Ensures that write and read operations are redirected to available replicas when segments of the network become disconnected
Eventual Consistency
• Eventual consistency
  – When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
  – For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
• BASE (Basically Available, Soft state, Eventual consistency) properties, as opposed to ACID
  – Soft state: copies of a data item may be inconsistent
  – Eventually consistent: copies become consistent at some later time if there are no more updates to that data item
  – Basically available: possibility of faults, but not a fault of the whole system
CAP Theorem
• Suppose three properties of a system
  – Consistency (all copies have the same value)
  – Availability (system can run even if parts have failed)
  – Partition tolerance (network can break into two or more parts, each with active systems that cannot influence the other parts)
• Brewer’s CAP “Theorem”: for any system sharing data, it is impossible to guarantee all three of these properties simultaneously
• Very large systems will partition at some point
  – It is necessary to decide between C and A
  – Traditional DBMSs prefer C over A and P
  – Most Web applications choose A (except in specific applications such as order processing)

CAP Theorem
• Drop A or C of ACID
  – Relaxing C makes replication easy and facilitates fault tolerance
  – Relaxing A reduces (or eliminates) the need for distributed concurrency control
NoSQL Databases
• The name stands for Not Only SQL
• Common features:
  – Non-relational
  – Usually do not require a fixed table schema
  – Horizontally scalable
  – Mostly open source
• More characteristics
  – Relax one or more of the ACID properties (see CAP theorem)
  – Replication support
  – Easy API (if SQL, then only a very restricted variant of it)
• Do not fully support relational features
  – No join operations (except within partitions)
  – No referential integrity constraints across partitions
Categories of NoSQL Databases
• Key-value stores
• Column NoSQL databases
• Document-based
• XML databases (myXMLDB, Tamino, Sedna)
• Graph databases (neo4j, InfoGrid)
Key-Value Data Stores
• Example: SimpleDB
  – Based on Amazon’s Simple Storage Service (S3)
  – Items (representing objects) have one or more (name, value) pairs, where name denotes an attribute
  – An attribute can have multiple values
  – Items are combined into domains
Column-Oriented*
• Store data in column order
• Allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
  – Data model: families of attributes defined in a schema; new attributes can be added
  – Storing principle: big hashed distributed tables
  – Properties: partitioning (horizontal and/or vertical), high availability, etc., completely transparent to the application
* Better: extensible records
Column-Oriented
• Example: BigTable
  – Indexed by row key, column key and timestamp, i.e. (row: string, column: string, time: int64) → string
  – Rows are ordered in lexicographic order by row key
  – The row range for a table is dynamically partitioned; each row range is called a tablet
  – Columns: syntax is family:qualifier
  – Example row “mff.ksi.www”: column “contents:” holds <html> versions at t3, t5 and t6; column “anchor:cnnsi.com” holds “MFF” at t9; column “anchor:my.look.ca” holds “MFF.cz” at t8 (“anchor” is a column family)

A table representation of a row in BigTable

Row key       Time stamp   Column name   Column family “Grandchildren”
http://ksi....  t1          "Jack"        "Claire": 7
                t2          "Jack"        "Claire": 7, "Barbara": 6
                t3          "Jack"        "Claire": 7, "Barbara": 6, "Magda": 3
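The versioned-cell model described above can be captured by a toy in-memory class. This is a minimal sketch of the (row, column, timestamp) → value idea, not the real BigTable API:

```python
# Toy model of BigTable's data model: (row, column, timestamp) -> value,
# with reads returning the most recent version by default.
from collections import defaultdict

class ToyBigTable:
    def __init__(self):
        # row -> column ("family:qualifier") -> list of (timestamp, value)
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        cell = self._rows[row][column]
        cell.append((timestamp, value))
        cell.sort()  # keep versions ordered by timestamp

    def get(self, row, column, timestamp=None):
        """Return the latest value at or before `timestamp` (latest overall if None)."""
        versions = self._rows[row][column]
        if timestamp is not None:
            versions = [(t, v) for t, v in versions if t <= timestamp]
        return versions[-1][1] if versions else None

t = ToyBigTable()
t.put("mff.ksi.www", "anchor:cnnsi.com", 9, "MFF")
t.put("mff.ksi.www", "contents:", 3, "<html>v1")
t.put("mff.ksi.www", "contents:", 6, "<html>v3")
print(t.get("mff.ksi.www", "contents:"))     # '<html>v3'
print(t.get("mff.ksi.www", "contents:", 4))  # '<html>v1'
```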
Column-Oriented
• Example: Cassandra
  – Keyspace: usually the name of the application, e.g. 'Twitter', 'Wordpress'
  – Column family: structure containing an unlimited number of rows
  – Column: a tuple with name, value and timestamp
  – Key: name of a record
  – Super column: contains more columns
Document-Based
• Based on the JSON format: a data model which supports lists, maps, dates and Booleans, with nesting
• Really: indexed semistructured documents
• Example: MongoDB
  – { Name: "Jaroslav",
      Address: "Malostranske nám. 25, 118 00 Praha 1",
      Grandchildren: { Claire: 7, Barbara: 6, Magda: 3, Kirsten: 1, Otis: 3, Richard: 1 } }
Typical NoSQL API
• Basic API access:
  – get(key): extract the value given a key
  – put(key, value): create or update the value given its key
  – delete(key): remove the key and its associated value
  – execute(key, operation, parameters): invoke an operation on the value (given its key), which is a special data structure (e.g. list, set, map, etc.)
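The four calls above can be modelled by a minimal in-memory store. The operation names mirror the slide; nothing here is a real product's client library:

```python
# Minimal in-memory key-value store exposing the four calls above.
class ToyKVStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        """Extract the value given a key (None if absent)."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value given its key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its associated value."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke an operation on a structured value (e.g. a list, set or map)."""
        return getattr(self._data[key], operation)(*parameters)

store = ToyKVStore()
store.put("followers:alice", [])
store.execute("followers:alice", "append", "bob")  # list operation on the value
print(store.get("followers:alice"))                # ['bob']
store.delete("followers:alice")
print(store.get("followers:alice"))                # None
```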
Representatives of NoSQL Databases: Key-Value

Name       Producer              Data model                                    Querying
SimpleDB   Amazon                Set of (key, {attribute}) couples, where an   Restricted SQL: select, delete,
                                 attribute is a (name, value) couple           GetAttributes and PutAttributes
                                                                               operations
Redis      Salvatore Sanfilippo  Set of (key, value) couples, where value is   Primitive operations for each
                                 a simple typed value, a list, an ordered      value type
                                 (by ranking) or unordered set, or a hash
Dynamo     Amazon                Like SimpleDB                                 Simple get operation and put in
                                                                               a context
Voldemort  LinkedIn              Like SimpleDB                                 Similar to Dynamo
Representatives of NoSQL Databases: Column-Oriented

Name        Producer                      Data model                              Querying
BigTable    Google                        Set of (key, {value}) couples           Selection (by combination of row,
                                                                                  column and timestamp ranges)
HBase       Apache                        Groups of columns (a BigTable clone)    JRuby IRB-based shell (similar to SQL)
Hypertable  Hypertable                    Like BigTable                           HQL (Hypertext Query Language)
Cassandra   Apache (originally Facebook)  Columns, groups of columns              Simple selections on key, range
                                          corresponding to a key (supercolumns)   queries, column or column ranges
PNUTS       Yahoo                         (Hashed or ordered) tables, typed       Selection and projection from a single
                                          arrays, flexible schema                 table (retrieve an arbitrary single
                                                                                  record by primary key, range queries,
                                                                                  complex predicates, ordering, top-k)
Representatives of NoSQL Databases: Document-Based

Name       Producer    Data model                                  Querying
MongoDB    10gen       Object-structured documents stored in       Manipulations with objects in
                       collections; each object has a primary      collections (find objects via simple
                       key called ObjectId                         selections and logical expressions,
                                                                   delete, update)
Couchbase  Couchbase¹  Document as a list of named (structured)    By key and key range, views via
                       items (a JSON document)                     JavaScript and MapReduce
¹ After merging Membase and CouchOne

DATAKON 2011, J. Pokorný
Conclusions of the NoSQL Section
• NoSQL databases cover only a part of data-intensive cloud applications (mainly Web applications)
• Problems with cloud computing:
  – SaaS applications require enterprise-level functionality, including ACID transactions, security, and other features associated with commercial RDBMS technology, i.e. NoSQL should not be the only option in the cloud
  – Hybrid solutions:
    • Voldemort with MySQL as one of its storage backends
    • Deal with NoSQL data as semistructured data ⇒ integrating RDBMS and NoSQL via SQL/XML
Conclusions of the NoSQL Section
• The next generation of highly scalable and elastic RDBMSs: NewSQL databases (from April 2011)
  – They are designed to scale out horizontally on shared-nothing machines
  – They still provide ACID guarantees
  – Applications interact with the database primarily using SQL
  – The system employs a lock-free concurrency control scheme to avoid user shutdown
  – The system provides higher performance than available from traditional systems
• Examples: MySQL Cluster (most mature solution), VoltDB, Clustrix, ScalArc, …
Conclusions of the NoSQL Section
• New buzzword: SPRAIN – 6 key factors for alternative data management:
  – Scalability – Performance – Relaxed consistency – Agility – Intricacy – Necessity
• Conclusion in conclusions: these are exciting times for Data Management research and development
References
• J. Manyika et al., Big data: the next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011.
• S. Déjean, J. Mothe, Visual clustering for data analysis and graphical user interfaces, Handbook of Cluster Analysis, 2012.
• D. Oelke et al., Visual opinion analysis of customer feedback data, IEEE Symposium on Visual Analytics in Science and Technology, 2009.
• D. Oelke et al., Visual readability analysis: how to make your writings easier to read, IEEE Trans. on Visualization and Computer Graphics, 18(5):662-674, 2012.
• A. Ynnerman, Visualizing the medical data explosion, TED, 2011.
• N. Kejzar, S. Korenjak-Cerne, and V. Batagelj, Network analysis of works on clustering and classification from Web of Science, in Proceedings of IFCS’09, pages 525-536, Springer, 2010.
• S. Déjean, P. Martin, A. Baccini, and P. Besse, Clustering time series gene expression data using smoothing spline derivatives, EURASIP Journal on Bioinformatics and Systems Biology, 2007, article ID 70561.
• B. Dousset, Tetralogie: interactivity for competitive intelligence, 2012. http://atlas.irit.fr/PIE/Outils/Tetralogie.html
• A. Baccini, S. Déjean, L. Lafage, and J. Mothe, How many performance measures to evaluate information retrieval systems?, Knowledge and Information Systems, pages 693-713, 2011.