JAX 2017 - Big Data – H. Erb: Do you know what k-Means? Running cluster analyses and implementing them in real time. Harald Erb, Oracle Business Analytics & Big Data

Upload: harald-erb

Post on 22-Jan-2018

Page 1: Do you know what k-Means? Cluster-Analysen

JAX 2017 - Big Data– H. Erb

Do you know what k-Means? Running cluster analyses and implementing them in real time

Harald Erb

Oracle Business Analytics & Big Data

Page 2: Do you know what k-Means? Cluster-Analysen

Speaker Bio

Harald Erb, Sales Engineer, Information Architect, Business Analytics & Big Data

+49 (0)6103 397-403

Timeline (1998, 2009, 2011, 2017):

• Architect, Project Lead; Requirements Analysis; DWH/BI Development
• Solution Architect; Region Western Europe
• Information Architect; Region EMEA, DE/CH Cluster

Technologies and topics: Database, SQL*Plus, Intermedia-Text, Warehouse Builder, PLM Analytics, In-Database Analytics, Information Discovery, Big Data Discovery, BI In-Memory Machine, HDFS, Hive, Impala, Big Data Spatial & Graph, E-Commerce/Intranet Solutions, Cross Industry Engagements

Page 3: Do you know what k-Means? Cluster-Analysen

Agenda

[Figure: reference architecture used as the agenda. Labels recovered from the diagram: (real-time) data stream; enterprise data (ERP, CRM, operational data); external structured data; Event Engine; Data Reservoir; Data Factory; Enterprise Information Store; BI & Analysis; Data Lab (Discovery, Innovation); Line of Governance; "day-to-day business"; output: events & data; action-relevant information; actionable insights; intelligent processes]

Source: Oracle White Paper - Information Management & Big Data, A Reference Architecture

Page 4: Do you know what k-Means? Cluster-Analysen

Cluster analyses with k-Means: how and what for?

Page 5: Do you know what k-Means? Cluster-Analysen

Cluster analyses with k-Means

• Cluster analysis:

Umbrella term for multivariate methods that try to find structures (clusters) in mass data

Most methods are based on computing the distance between observations in the multidimensional data space

• Typical questions:

Finding customer segments or similar text documents

Finding abnormal data points within a large data set or mass data

A good entry point into data analysis before classification or regression methods are used

• k-Means clustering:

Belongs to the exchange (relocation) methods and is a partitioning clustering algorithm

Distinction: Lloyd's algorithm ("the" k-Means algorithm) vs. MacQueen (who coined the term "k-Means" but introduced a different algorithm under that name)

• Advantages and disadvantages:

Simple and fast to implement, good run times

Weak point: the solution found depends strongly on the chosen starting points

• Several k-Means variations, among others:

k-Medians: uses, among others, the so-called "Manhattan distance" instead of the Euclidean distance to compute distances

k-Means++ algorithm: chooses the initial cluster centroids not purely at random but according to a seeding rule

k-Medoids (PAM, Partitioning Around Medoids): minimizes the distances (instead of the sum of variances as in k-Means)
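The k-Means++ seeding rule mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the talk: the function names `sq_dist` and `kmeans_pp_init` and the toy points are assumptions, and squared Euclidean distance is assumed as the weighting.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k starting centers: the first uniformly at random, each further
    one with probability proportional to its squared distance from the
    nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical toy data: two tight groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
centers = kmeans_pp_init(points, k=2)
```

Points far from every existing center get a high weight, so the starting centers tend to be spread out, which addresses the sensitivity to starting points noted above.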

Page 6: Do you know what k-Means? Cluster-Analysen

Principle/flow of a k-Means cluster analysis

1. Initialization 2. Assignment (classification) 3. Compute cluster centers (mean) 4. Iteration

n passes, until the shift of the cluster centers is sufficiently small or approaches 0
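The four steps can be sketched as follows. This is a minimal plain-Python version of Lloyd's algorithm under stated assumptions: random initialization from the data, Euclidean distance, and hypothetical names (`nearest`, `lloyd_kmeans`, `tol`, `max_iter`) that do not come from the slides.

```python
import math
import random


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def lloyd_kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                     # 1. initialization
    for _ in range(max_iter):                           # 4. iteration
        labels = [nearest(p, centers) for p in points]  # 2. assignment
        new_centers = []
        for j in range(k):                              # 3. recompute means
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])          # keep center of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:                                 # centers have stopped moving
            break
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd_kmeans(points, k=2)
```

The loop stops as soon as the largest center shift falls below `tol`, which corresponds to the stopping criterion on the slide.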

Page 7: Do you know what k-Means? Cluster-Analysen

k-Means cluster analyses: the ambitious business user

Page 8: Do you know what k-Means? Cluster-Analysen

Business Analyst (ambitious business user)

Challenge:

Know-how in selecting, applying and parameterizing suitable algorithms

The existing toolset usually helps only to a limited extent (data volume, functionality)

Wants to use mathematical methods to gain more and better insights:

Can use them to find structures in mass data when other starting points are missing

Page 9: Do you know what k-Means? Cluster-Analysen

Case study: "The second mainstay"

Wine season 2015@Oracle – an almost fictitious scenario

[Map with the locations Hamburg, Bremen, Hannover, Frankfurt, Heidelberg, Stuttgart, Munich, Geneva, Zurich, Copenhagen, Stockholm, Gothenburg, Oslo, Helsinki]

Page 10: Do you know what k-Means? Cluster-Analysen

Case study: "The second mainstay"

Goal: detect possible ordering patterns by forming groups (clusters) from a set of similar objects

Starting point: orders vs. wine offers

Classification of historical order data

[Two charts, each with the axes "wine offers" and "number of orders"]

Page 11: Do you know what k-Means? Cluster-Analysen

Cluster analysis "by hand": MS Excel + Solver

1. Start: which deals were closed and which were not?

Matrix with 32 offers per customer

Deal closed? 0 = no, 1 = yes

Tool: pivot table wizard

2. Choose the number of clusters

Do not start too large, e.g. k=4 clusters, because not too many different newsletters should have to be written...

3. Prepare the clustering (for k=4 clusters)

Distance between a cluster center and another point: Euclidean norm

Excel: array formula {=SQRT(SUM((.... .....)^2))}

Determine the smallest distance to the cluster centers

Assignment to a cluster with the function MATCH()

4. Determine the cluster centers (k=4 clusters)

Optimization goal: find the points for which the sum of the distances between the customers and their assigned cluster centers is smallest

Tool: Solver with the evolutionary algorithm (= combination of random search and "breeding"), Solver parameters:

• Objective = minimize the sum of the distances between customers and cluster centers

• Decision variables: ..

• Constraints: cluster centers should have values between 0 and 1

5. Interpretation of the results (k=4 clusters)

Examine the members (= offers) and top sales of a cluster with the function SUMIF()

Results not yet clear-cut, clusters partly overlap. Change the number k of clusters?
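Steps 3 and 4 of the spreadsheet approach translate directly into code. The sketch below is an assumption-laden illustration, not the workbook from the talk: the customer matrix is a made-up toy with 4 offers instead of 32, the centers are hand-picked rather than found by an optimizer, and the names `assign` and `total_distance` are hypothetical.

```python
import math

# Hypothetical toy data: rows are customers, columns are offers (1 = deal closed).
customers = {
    "A": [1, 1, 0, 0],
    "B": [1, 0, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 0, 1],
}


def assign(vector, centers):
    """Return (cluster index, distance) of the nearest center.
    Plays the role of the array formula plus MATCH() in the spreadsheet."""
    distances = [math.dist(vector, c) for c in centers]
    best = min(distances)
    return distances.index(best), best


def total_distance(centers):
    """Objective to minimize: sum of each customer's distance to its nearest center."""
    return sum(assign(v, centers)[1] for v in customers.values())


# Hand-picked candidate centers; every coordinate respects the 0..1 constraint.
centers = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 1.0]]
clusters = {name: assign(v, centers)[0] for name, v in customers.items()}
objective = total_distance(centers)
```

An optimizer such as the Solver would vary the entries of `centers` within the bounds and keep the candidate with the smallest `objective`.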

Page 12: Do you know what k-Means? Cluster-Analysen

Page 13: Do you know what k-Means? Cluster-Analysen

Cluster analysis "by hand": MS Excel + Solver

6. Compute the cluster rating (silhouette)

First build a distance matrix between all customers as an aid; Excel functions such as OFFSET() help here

Silhouette: average distance to the members of the nearest neighboring cluster vs. average distance to the members of the own cluster

Silhouette score: values range from -1 to +1 (perfect)

Helpful Excel functions: AVERAGEIF(), SMALL(), INDEX(), IF()

7. Result (k=4 clusters)

Silhouette = 0.149

Decision: change the number k of clusters

8. Choose a new number k of clusters

Now compute with k=5 clusters

9. Determine the cluster centers (k=5 clusters)

...(as before)

10. Interpretation of the results (k=5 clusters)

Silhouette = 0.134

Decision: change the algorithm and reconsider the distance calculation

11. Use the k-Medians algorithm (k=5 clusters)

Change made: compute with binary data in the Solver (customer bought = 1, did not buy = 0)

12. Choose a suitable distance metric

Goal of the analysis: "Why was something bought?" A binary 1 (bought) is worth more than a 0 (not bought)

New distance metric: instead of the "Manhattan distance", choose an asymmetric distance calculation for the 0-1 data: cosine distance (cosine similarity)

13. Clustering with cosine distance (k=5 clusters)

...(as before, only with a binary Solver constraint)
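The silhouette rating from step 6 and the cosine distance from step 12 can be sketched together. This is a minimal illustration under stated assumptions: the purchase matrix and cluster labels are made up, the function names are hypothetical, and singleton clusters are scored 0 by convention.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity. Shared 1s pull two vectors together,
    shared 0s contribute nothing, which suits 0/1 purchase data."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def silhouette(points, labels, dist):
    """Mean of (b - a) / max(a, b) over all points, where a is the mean
    distance to the own cluster and b the mean distance to the nearest
    other cluster."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:  # singleton cluster or only one cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 0/1 purchase vectors (customers x offers) and cluster labels.
purchases = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
labels = [0, 0, 1, 1]
score = silhouette(purchases, labels, cosine_distance)
```

Because `dist` is a parameter, the same `silhouette` function can compare a Euclidean, Manhattan or cosine-based clustering of the same data.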

Page 14: Do you know what k-Means? Cluster-Analysen

Page 15: Do you know what k-Means? Cluster-Analysen

k-Means clustering for everyone: Data Viz Tools

Easy with 1-click functions, but meaningful?

Page 16: Do you know what k-Means? Cluster-Analysen

Interpreting k-Means cluster results: this is where the analysis really begins!

quora.com/How-do-you-interpret-k-means-clustering-results

Page 17: Do you know what k-Means? Cluster-Analysen

Page 18: Do you know what k-Means? Cluster-Analysen

k-Means clustering for everyone: Data Viz Tools

Syntax information for clustering (technically implemented with a wrapper via R integration)

Available methods for k-Means: MacQueen, Lloyd, Hartigan-Wong, Forgy, ...

More methods, precise parameterization: still easy?

Page 19: Do you know what k-Means? Cluster-Analysen

k-Means cluster analyses: team sport in the Data Lab

Page 20: Do you know what k-Means? Cluster-Analysen

1876: Edison’s Invention Factory, Menlo Park, NJ

Page 21: Do you know what k-Means? Cluster-Analysen

Characteristics of Digital Business Leaders

They ‘Reframe’ Challenges Looking at them from new perspectives and multiple angles

They Sprint They work at pace - researching, testing and evaluating current ideas while generating new ones

They Appreciate That Failure Can Be Good and are not afraid of new ideas

They Convert Data Into Value

They invest heavily in analyzing their own data and data from external sources to establish patterns and un-noticed opportunities

Source: Oracle EMEA, Digital Group

Page 22: Do you know what k-Means? Cluster-Analysen

Design Thinking, Experimentation, Agility

Page 23: Do you know what k-Means? Cluster-Analysen

Analytical Competency Center

Collaboration Model and Services

» offers a tiered service model, which can be flexibly adapted to the analytics competencies of the specialist departments

» serves as an incubator for specialist departments without Analytics competence and enables rapid statements (i.e. within 10 days) on the potential and feasibility of use cases

» supports knowledge building and exchange in the area of Advanced Analytics within the company

Department – has no Analytics competencies of its own

Department – has domain-specific Analytics competencies

Department – runs its own Analytics infrastructure

“Analytics as a Service” – including platform services and project execution

Providing platform, methodological expertise and comprehensive data knowledge

ACC ACC ACC

Analysis in the backend and on mass data

Domain knowledge

Domain-specific analysis

Domain-specific analysis on specialized infrastructure

Page 24: Do you know what k-Means? Cluster-Analysen

Data Science – Types

Source: Cloudera

Page 25: Do you know what k-Means? Cluster-Analysen

Data Science – Process Model

Source: Klaas Bollhoefer, Chief Data Scientist, *um

Page 26: Do you know what k-Means? Cluster-Analysen

Data Lab

Data Management Architecture Sandboxes

Page 27: Do you know what k-Means? Cluster-Analysen

Data Lab and overall Data Management

[Figure: data management architecture. Labels recovered from the diagram (grouping approximate):]

• Sources: Structured Data (Master Data, Applications, Channels, Data Stores); Non-structured Sources (Logs, Social Media, External Data, Interactions); Adhoc Files or Data Sets
• Data Ingestion: Batch Integration, Real-Time Integration, Data Streaming
• Data Lake: Raw Data Sets, Data Processing, Data Enrichment, Data Aggregation, Curated & Transformed Data Sets
• Data Lab: Sandboxes, Data Catalog, Data Discovery Tools, Data Wrangling, Transformations, Prototyping, Analytic Tools
• Enterprise Information Store, Operational Data Store
• Data Federation & Virtualization Layer: Common SQL Access to ALL Data
• Consumers: Reporting / Business Intelligence, Data Driven Applications, Advanced Analytics
• Cross-cutting: Line of Governance; Orchestration, Scheduling & Monitoring; Metadata Management

Page 28: Do you know what k-Means? Cluster-Analysen

Data Lab – Key Requirements

• Based on Raw Data
• Full Access to Data Sources (Select only)
• Complete Sandbox Environment
• Agile Experimentation
• “Fail Fast”

Page 29: Do you know what k-Means? Cluster-Analysen

Data Lab - Team Sport

[Diagram labels: DWH / OLTP Databases; Hadoop; Data Engineer; Business Analyst; Data Scientist; Data Science Discovery Output: New KPI, Report Requirement; New Data Set (cleaned / enriched)]

Page 30: Do you know what k-Means? Cluster-Analysen

Data Lab

Exploratory Analysis Big Data Discovery

Page 31: Do you know what k-Means? Cluster-Analysen

Scenario: Investigation of Retail Sales

Demo Part 1

• Historical Retail Sales data
• Additional data (Store locations and other attributes)
• New Data Set in Data Catalog

Files:

• train.csv - historical data including Sales
• test.csv - historical data excluding Sales
• store.csv - supplemental information about Retail Stores

Page 32: Do you know what k-Means? Cluster-Analysen

Data Catalog: Create/maintain Catalog, Identify Data Sets, Define Data Project, Grant Access

Page 33: Do you know what k-Means? Cluster-Analysen

Data Project: Inspect & Annotate Data Sets inside the Project

Page 34: Do you know what k-Means? Cluster-Analysen

Data Project: Verifying Data Quality and Information Potential, Experimenting with a Scratchpad

Page 35: Do you know what k-Means? Cluster-Analysen

Data Project: Link multiple Data Sets and fine-tune attributes (Aggregation, Refinements)

Page 36: Do you know what k-Means? Cluster-Analysen

Data Project: Create Dashboards from scratch, configure/modify and extend as needed

Page 37: Do you know what k-Means? Cluster-Analysen

Data Project: From interactive Summaries...

Page 38: Do you know what k-Means? Cluster-Analysen

Data Project: ...down to the lowest Detail Level

Page 39: Do you know what k-Means? Cluster-Analysen

Data Project: ...Timeline view, side-by-side comparisons

Page 40: Do you know what k-Means? Cluster-Analysen

Data Project: Document analysis steps and findings, and share them with peers

Page 41: Do you know what k-Means? Cluster-Analysen

Data Lab

Data Science Tools, Python ML, Spark Jupyter Notebook

Page 42: Do you know what k-Means? Cluster-Analysen

Data Lab - Team Sport (Continued): the Data Scientist takes over and applies Statistical Methods

[Diagram labels: DWH / OLTP Databases; Hadoop; Data Engineer; Business Analyst; Data Scientist; Data Science Discovery Output: New KPI, Report Requirement; New Data Set (cleaned / enriched)]

Page 43: Do you know what k-Means? Cluster-Analysen

Sampling and Transformation Pushdown

Page 44: Do you know what k-Means? Cluster-Analysen

Shaping a Data Set for further processing: Handling of sparse Data / NULL values

Page 45: Do you know what k-Means? Cluster-Analysen

Shaping a Data Set for further processing: Aggregation

• Roll up low-level data to higher grains, i.e. Store Level

• Intuitive UI helps analysts find the right grains

• Execute at full scale using

• Results can be sampled or indexed in full
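A roll-up from daily records to store level can be sketched in plain Python. The rows, the field order (store id, date, sales) and the function name `roll_up` are assumptions for illustration; the slides do not show the actual column layout of train.csv or how BDD executes the aggregation.

```python
from collections import defaultdict

# Hypothetical daily records: (store id, date, sales).
rows = [
    (1, "2015-07-01", 5263.0),
    (1, "2015-07-02", 5020.0),
    (2, "2015-07-01", 6064.0),
    (2, "2015-07-02", 5567.0),
    (2, "2015-07-03", 6402.0),
]


def roll_up(rows):
    """Aggregate daily records to one record per store."""
    totals = defaultdict(lambda: {"days": 0, "sales": 0.0})
    for store, _date, sales in rows:
        totals[store]["days"] += 1
        totals[store]["sales"] += sales
    # Derive the average per day once the totals are complete.
    return {
        store: {**agg, "avg_sales": agg["sales"] / agg["days"]}
        for store, agg in totals.items()
    }


store_level = roll_up(rows)
```

The result has one entry per store, which is the grain a store-level cluster analysis would work on.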

Page 46: Do you know what k-Means? Cluster-Analysen

Shaping a Data Set for further processing: Joining multiple Data Sets

Blend huge Data Sets in BDD

• UI to support experimentation, preview

• Execute at scale with

Page 47: Do you know what k-Means? Cluster-Analysen

Data Analysis with Python: (Re-)use data from Oracle Big Data Discovery while working with the BDD Shell

List of Oracle Big Data Discovery Data Sets

Converting an Oracle Big Data Discovery Data Set into an Apache Spark DataFrame

Import of additional Libraries for Data Analysis & Machine Learning

• BDD Shell is an interactive tool designed to work with BDD without using Studio's front-end

• Provides a way to explore and manipulate the internals of BDD and interact with Hadoop

• Python-based shell

• Exposes all BDD data objects

• Easy-to-use Python Wrappers for BDD APIs and Python Utilities

• Use of Third-party Libraries, e.g., Pandas and NumPy

Data Science

Page 48: Do you know what k-Means? Cluster-Analysen

Notebooks for a better user experience

Easiest way to use the BDD-Shell

– Visual appeal, ease of use, collaboration features of an integrated platform

– Power and flexibility of custom code

– Pick up BDD’s datasets and leverage Machine Learning algorithms to infer new insight

Data Science

Page 49: Do you know what k-Means? Cluster-Analysen

Demo Part 2

Re-using a Data Set for Machine Learning

Shaping a new Data Set in Big Data Discovery Tool

Machine Learning with Python ML & Spark in Jupyter Notebook

Page 50: Do you know what k-Means? Cluster-Analysen

Demo Part 2

Using Jupyter Notebook, Python ML & Spark

Page 51: Do you know what k-Means? Cluster-Analysen

Exploratory Analysis

Decomposition of the original time series data set to check for trend, seasonal effects

Checking for new features, patterns, outliers etc.

Page 52: Do you know what k-Means? Cluster-Analysen

Prediction and Validation of Results

Extraction of non-linear features using Gradient Boosting Decision Trees

Result of Random Forest decision trees showing how much each explanatory variable affects the model

Prediction using a simple Linear Regression model

Page 53: Do you know what k-Means? Cluster-Analysen

k-Means cluster analyses: Stream Analytics example

Page 54: Do you know what k-Means? Cluster-Analysen

Data Science for the Enterprise: Discovery and monetising steps have different requirements

Discovery:
• Unbounded discovery • Self-Service sandbox • Wide toolset • Agile methods

Line of Governance

Monetising:
• Commercial exploitation • Narrower toolset • Integration to operations • Non-functional requirements • Code standardisation & governance

Adapted from Cloudera

Page 55: Do you know what k-Means? Cluster-Analysen

Oracle Big Data Analytics Platform

1) Data extract, transform, load
2) Automatic catalog of all data sets in Hadoop
3) Subject Matter Expert explores and discovers
4) Collaboration with Data Scientists by sharing snapshots
5) Data Scientist analyses the data by predictive tools
6) Data Scientist accesses all data in HDFS, NoSQL and relational databases via Oracle Big Data SQL
7) Data Scientist develops R code and deploys the R code for parallel server-side execution in the database or Hadoop
8) Data Scientist loads model configuration as PMML for stream analytics execution
9) Stream analytics

[Diagram labels: Plant automation; CX/CRM; ERP, Asset management; Devices; Oracle Data Integrator/GoldenGate; Kafka; Big Data - Spark, NoSQL, HDFS; Oracle Database; Relational database option; Oracle Big Data SQL; Oracle Big Data Discovery; Oracle Machine Learning; Oracle R Enterprise; Oracle R Hadoop; PMML; Oracle Stream Analytics; Business Activity Monitor; λ]

Page 56: Do you know what k-Means? Cluster-Analysen

Predictive Model Markup Language PMML / JPMML

Source: en.wikipedia.org/wiki/Predictive_Model_Markup_Language

PMML

» is an XML-based predictive model interchange format

» provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms

PMML components

» Header: contains general information about the PMML document

» Data Dictionary: contains definitions for all possible fields used by the model.

» Data Transformations: transformations allow for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations (Normalization, Discretization, Value mapping, custom and built-in Functions, Aggregation)

» Model: contains the definition of the data mining model.

» Mining Schema: a list of all fields used in the model.

» Targets: allows for post-processing of the predicted value in the format of scaling if the output of the model is continuous.

» Output: this element can be used to name all the desired output fields expected from the model.
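To make the component list concrete, the sketch below builds a simplified PMML-style document for a center-based clustering model with Python's standard XML library. It is an illustration only: the namespace declaration and optional elements are omitted, the output is not validated against the PMML schema, and the field names (`voltage`, `ripple`), the center values and the function name `clustering_pmml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def clustering_pmml(fields, centers):
    """Build a simplified PMML document for a center-based clustering model."""
    pmml = ET.Element("PMML", version="4.3")
    ET.SubElement(pmml, "Header", description="k-Means sketch")

    # Data Dictionary: one entry per field the model uses.
    dictionary = ET.SubElement(pmml, "DataDictionary", numberOfFields=str(len(fields)))
    for name in fields:
        ET.SubElement(dictionary, "DataField", name=name,
                      optype="continuous", dataType="double")

    # Model: mining schema, distance measure and one Cluster per center.
    model = ET.SubElement(pmml, "ClusteringModel", functionName="clustering",
                          modelClass="centerBased",
                          numberOfClusters=str(len(centers)))
    schema = ET.SubElement(model, "MiningSchema")
    for name in fields:
        ET.SubElement(schema, "MiningField", name=name)
    measure = ET.SubElement(model, "ComparisonMeasure", kind="distance")
    ET.SubElement(measure, "squaredEuclidean")
    for name in fields:
        ET.SubElement(model, "ClusteringField", field=name)
    for i, center in enumerate(centers):
        cluster = ET.SubElement(model, "Cluster", name=str(i))
        array = ET.SubElement(cluster, "Array", n=str(len(center)), type="real")
        array.text = " ".join(str(v) for v in center)
    return ET.tostring(pmml, encoding="unicode")


xml_text = clustering_pmml(["voltage", "ripple"], [(1.0, 1.0), (4.0, 4.0)])
```

A consumer of such a file only needs the cluster centers and the distance measure to score new records, which is what makes the format suitable for handing a trained model to another engine.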

Page 57: Do you know what k-Means? Cluster-Analysen

Batch Analytics: Custom k-Means Code using PySpark (Spark Python API)

Page 58: Do you know what k-Means? Cluster-Analysen

Page 59: Do you know what k-Means? Cluster-Analysen

Stream Analytics

k-Means Machine Learning Pattern

Principle of anomaly detection (in 2-dimensional space)

Example: manufacturing of power supply units. A 2-dimensional space is defined by 2 variables (features); the measurement points usually fall into 2 groups. A conspicuous device ( x ) lies within the tolerance range but outside both groups
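The principle can be sketched as a distance check against previously learned cluster centers. The centers, the `radius` threshold, the sample readings and the function name `is_anomaly` are assumptions for illustration and are not taken from the demo.

```python
import math

# Hypothetical centers of the two usual groups of measurements
# (two features per power supply unit), e.g. found by k-Means beforehand.
centers = [(1.0, 1.0), (4.0, 4.0)]
radius = 0.8  # assumed maximum distance that still counts as "inside a group"


def is_anomaly(point, centers, radius):
    """Flag a measurement whose nearest cluster center is farther away than radius."""
    return min(math.dist(point, c) for c in centers) > radius


# Two ordinary readings and one that sits between the groups.
readings = [(1.1, 0.9), (3.8, 4.2), (2.5, 2.5)]
flags = [is_anomaly(p, centers, radius) for p in readings]
```

The third reading could pass a per-feature tolerance check and still be flagged, because it belongs to neither group.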

Demo video

Page 60: Do you know what k-Means? Cluster-Analysen

Stream Analytics

Assign event stream

Live Output

Topology view

Demo video

Page 61: Do you know what k-Means? Cluster-Analysen

Takeaway Message & Bonus

Page 62: Do you know what k-Means? Cluster-Analysen

Takeaway message: analyze smartly, with a process model, knowledge and an experimentation environment

The data analytics cycle is an iterative process, failures included! The focus is always on the business task or the goal of the analysis

Despite convenient analysis tools, there is no way around reading material like this, because...

...k-Means clustering (and the application of other algorithms) is not a free lunch: varianceexplained.org/r/kmeans-free-lunch

Page 63: Do you know what k-Means? Cluster-Analysen

github.com/oracle

developer.oracle.com

Page 64: Do you know what k-Means? Cluster-Analysen

Bonus: Analytics with python-cx_Oracle

oracle.com/technetwork github.com/oracle

Page 65: Do you know what k-Means? Cluster-Analysen

THANK YOU