j&j thesis presentation july 2016

Machine Learning Techniques in Categorical Time Series Analysis of Manufacturing Process

Haris Michailidis, Isidora Tourni

National Technical University of Athens
School of Electrical and Computer Engineering

Professor: Nectarios Koziris
J&J Responsible: Michalis Avgoulis
Presentation Date: 26/07/2016

Contents

● Problem Motivation

● Visualization

● Machine Learning

● Results

● Future Work

Introduction

Problem Motivation

In cooperation with Johnson & Johnson Hellas

Goals:

● Visualization of Mixing Process
● Quantification of Procedures
● Classification & Clustering of processes

Further Goals:

● Optimization of the Mixing Process
● Comparison with Golden Standard (Evaluation)
● Comparison between different batches of the same Product

Process Description

Example Vessel Actions:

● Heating
● Agitation
● Addition of Materials
● Pressure adjustment

PLC logging

● Output to CSV

Process flow: Raw Materials → Mixing → Bottling

Product Categories

● Product Cleaning Group (example shown: Emulsion)
● Product Group (examples shown: Picsou C, Apple Cream)

Data-Set Description

● ~130,000 rows/year
● 45 message code sets (values, set-points)
● Categorical data

Visualization

Visualization Tool

Goals:

● Visualization of Mixing Process
● Selective representation of variables
● Overview with flexible timeframe
● Accessible from multiple terminals (web interface)

The human brain processes visuals 60,000 times faster than text. *

* Forrester CSO Insights 2012

Visualization Tool (interface)

Initial page of the Visualization Tool

4-day overview

1-day overview

Detailed box in complex visualization

Machine Learning

Goal: Explore how Machine Learning can be applied in the manufacturing domain to produce useful insights into the process.

● Classification
● Clustering

Challenges: representing each batch as an object in an N-dimensional space

● Representation of each batch | Object Creation
● Data Cleansing / Creation of training set | Labelling
● “Distance” between batches | Distance Calculation

Inspiration:

● DNA sequence analysis → Markov Models

Machine Learning Introduction

Unit of analysis: batch

1. Data cleansing

Value/set-point flattening, typo correction

2. Labelling of batches

Using two files:

● Log file: manual entries made by operators
● Mapping table: information about each product
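The labelling step joins the two files. A minimal sketch, assuming the log file maps each batch id to a product code typed by an operator and the mapping table maps each product code to its Product Group and Product Cleaning Group; the function name, field names and example codes are hypothetical, since the slides do not show either file's format.

```python
from typing import Dict, Optional

# Hypothetical formats (not shown on the slides):
#   log_file:      batch id -> product code entered manually by an operator
#   mapping_table: product code -> product attributes used as labels


def label_batches(
    log_file: Dict[str, str],
    mapping_table: Dict[str, Dict[str, str]],
) -> Dict[str, Optional[Dict[str, str]]]:
    """Attach Product Group / Product Cleaning Group labels to each batch.

    Batches whose product code is not found in the mapping table (for example
    because of a typo in a manual entry) get None, so they can be reviewed or
    left out of the training set.
    """
    labels: Dict[str, Optional[Dict[str, str]]] = {}
    for batch_id, product_code in log_file.items():
        # Normalise the manual entry before the lookup (assumed cleaning rule).
        key = product_code.strip().upper()
        labels[batch_id] = mapping_table.get(key)
    return labels


if __name__ == "__main__":
    log = {"batch-01": "p001", "batch-02": "P002 ", "batch-03": "P999"}
    mapping = {
        "P001": {"product_group": "GROUP_A", "cleaning_group": "CLEAN_1"},
        "P002": {"product_group": "GROUP_B", "cleaning_group": "CLEAN_1"},
    }
    for batch, label in label_batches(log, mapping).items():
        print(batch, label)
```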

3. Time-series splitting to batches

Based on business rules derived from experience and observation; only production chunks are kept.
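The actual business rules are not listed, but the mechanics of cutting a message stream into chunks can be sketched. This assumes each CSV row reduces to a (timestamp, message code) pair, that a hypothetical `BATCH_START_CODE` marks the start of a new batch, and that a caller-supplied predicate decides whether a chunk counts as production; all of these names are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, int]  # (timestamp, message code); assumed layout of a PLC row

# Hypothetical code marking the start of a new batch; the real rules come from
# operational experience and are not given in the presentation.
BATCH_START_CODE = 1


def split_into_batches(
    records: Sequence[Record],
    is_production: Callable[[List[Record]], bool],
    start_code: int = BATCH_START_CODE,
) -> List[List[Record]]:
    """Cut a time-ordered message stream at each start code and keep production chunks only."""
    chunks: List[List[Record]] = []
    current: List[Record] = []
    for record in records:
        _, code = record
        if code == start_code and current:
            # A new batch begins: close the chunk collected so far.
            chunks.append(current)
            current = []
        current.append(record)
    if current:
        chunks.append(current)
    return [chunk for chunk in chunks if is_production(chunk)]
```

For example, `split_into_batches(rows, lambda chunk: len(chunk) >= 3)` would use a (hypothetical) minimum-length rule to drop short, non-production fragments.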

Solving the Challenges

4. Feature selection

Message Number

5. Representation of each batch

6. Unequal length time-series comparison

Message Mapping Table

Transition Matrix Concept

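Sequence 1:

B-B-C-A-B-C-A-C-A-B-C

Sequence 2:

A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C

Rows give the current state and columns the next state; each cell is the fraction of transitions out of the row state that go to the column state. For example, in Sequence 1, A is followed by B twice and by C once, giving 0.67 and 0.33.

Transition Matrix 1:

      A     B     C
A  0.00  0.67  0.33
B  0.00  0.25  0.75
C  1.00  0.00  0.00

Transition Matrix 2:

      A     B     C
A  0.17  0.67  0.17
B  0.14  0.43  0.43
C  0.75  0.00  0.25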
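The transition-matrix construction can be sketched in a few lines. This is a minimal sketch assuming a first-order Markov model with row-normalized transition counts, which matches the two example matrices; the function name `transition_matrix` is hypothetical, and giving states that never occur an all-zero row is an assumption, since the slides do not say how such rows were handled.

```python
from typing import Dict, List, Sequence

import numpy as np


def transition_matrix(sequence: Sequence[str], states: List[str]) -> np.ndarray:
    """First-order transition matrix: cell (i, j) = P(next = states[j] | current = states[i])."""
    index: Dict[str, int] = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    # Count every consecutive (current, next) pair in the sequence.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current], index[nxt]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row to sum to 1; rows of states that never occur stay all zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


seq1 = "B-B-C-A-B-C-A-C-A-B-C".split("-")
seq2 = "A-A-B-B-A-B-B-C-C-A-B-B-C-A-B-C-A-C".split("-")
print(np.round(transition_matrix(seq1, ["A", "B", "C"]), 2))
print(np.round(transition_matrix(seq2, ["A", "B", "C"]), 2))
```

Running it on the two example sequences reproduces both example matrices to two decimals (2/3 rounds to 0.67). Note that the sequences have different lengths (11 and 18 symbols) but yield matrices of the same shape.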


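Chunk Object (representation of each batch), containing:
● Transition Matrix (fixed size 45x45)
● Labels

Because the matrix size is fixed by the 45 message code sets, batches of unequal length can be compared directly.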
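A chunk object could be modelled as a small dataclass. This sketch assumes the Message Mapping Table is a dictionary from raw PLC message codes to indices 0 to 44; the class name `Chunk`, the method `from_messages` and the label keys are hypothetical. It illustrates that message sequences of any length map to a transition matrix of the same 45x45 shape.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence

import numpy as np

N_MESSAGES = 45  # number of message code sets stated in the data-set description


@dataclass
class Chunk:
    """One batch: a fixed-size transition matrix plus its labels."""

    transition: np.ndarray
    labels: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_messages(
        cls,
        messages: Sequence[int],
        mapping_table: Dict[int, int],
        labels: Dict[str, str],
    ) -> "Chunk":
        # mapping_table (hypothetical format): raw message code -> row/column index 0..44
        idx = [mapping_table[m] for m in messages]
        counts = np.zeros((N_MESSAGES, N_MESSAGES))
        for a, b in zip(idx, idx[1:]):
            counts[a, b] += 1
        rows = counts.sum(axis=1, keepdims=True)
        # Row-normalize; messages that never occur keep an all-zero row.
        matrix = np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
        return cls(transition=matrix, labels=labels)
```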

7. Distance calculation method

A broad and active research area

(Figure: Message Mapping Table and Transition Matrix)

Distance Evaluation

● Distance between two batches → distance between their 2D Transition Matrices

Problems:

1. Choosing the proper Vector Distance Metric
2. Converting 2D Transition Matrix → Vector

Solutions:

1. Distance between Vectors:
● Euclidean Distance
● Cosine Distance
● Kullback-Leibler Divergence
● Kolmogorov-Smirnov Test
● Infinity Norm
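The five vector distances could be implemented with NumPy as below. This is a sketch, not the thesis code: the smoothing constant `EPS`, used to keep the KL divergence finite when a vector contains zeros, is an assumption, and `ks_statistic` returns only the Kolmogorov-Smirnov statistic between two discrete distributions on the same ordered support (the largest gap between their cumulative sums), which may differ from how the test was applied in the thesis. Cosine distance and the KS statistic assume non-zero vectors.

```python
import numpy as np

EPS = 1e-10  # assumed smoothing constant; transition matrices are often sparse


def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    # 1 - cosine similarity; undefined if either vector is all zeros
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    # D_KL(P || Q) after smoothing and renormalizing both vectors; not symmetric
    p = (p + EPS) / (p + EPS).sum()
    q = (q + EPS) / (q + EPS).sum()
    return float(np.sum(p * np.log(p / q)))


def ks_statistic(p: np.ndarray, q: np.ndarray) -> float:
    # Largest absolute gap between the two cumulative distributions
    return float(np.max(np.abs(np.cumsum(p / p.sum()) - np.cumsum(q / q.sum()))))


def infinity_norm(u: np.ndarray, v: np.ndarray) -> float:
    # Chebyshev distance: largest absolute element-wise difference
    return float(np.max(np.abs(u - v)))
```

Of these, only Euclidean, cosine and the infinity norm are symmetric metrics in the usual sense; KL divergence depends on argument order, which matters when it is used to compare batches.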

Distance Evaluation

2. 2D Matrix → Vector: *

A. Append each row to the first
B. Append each row from the diagonal matrix to the first
C. Average of distances between corresponding rows

* Space-filling curves were not used, because neighboring cells of a transition matrix have no meaningful spatial relationship.
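Options A and C can be sketched as wrappers around any vector distance. Option A is row-major flattening (`ndarray.ravel`); option C compares corresponding rows and averages the results. Option B is not sketched because the slide does not define the "diagonal matrix" precisely. The function names are hypothetical.

```python
from typing import Callable

import numpy as np

Distance = Callable[[np.ndarray, np.ndarray], float]


def distance_flattened(m1: np.ndarray, m2: np.ndarray, dist: Distance) -> float:
    """Option A: append every row to the first (row-major flattening), then compare."""
    return dist(m1.ravel(), m2.ravel())


def distance_row_average(m1: np.ndarray, m2: np.ndarray, dist: Distance) -> float:
    """Option C: distance between each pair of corresponding rows, averaged."""
    return float(np.mean([dist(r1, r2) for r1, r2 in zip(m1, m2)]))


def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))
```

Option C treats every row (every current message) with equal weight, whereas option A lets rows with larger differences dominate; with cosine or KS distances, option C also requires handling all-zero rows of messages that never occur.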

Classification (supervised)

The process of assigning objects to predefined classes according to shared attributes, learned from labelled examples.

Algorithms used:

● Nearest Centroid
● k-Nearest Neighbors
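Both classifiers only need a distance between two transition matrices, so they can be sketched directly on 2D arrays. Assumptions: Euclidean distance on the whole matrix as the default (any of the distances listed under Distance Evaluation could be passed instead), the centroid of a class taken as the element-wise mean of its matrices, and plain majority voting for k-NN. The function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

import numpy as np

Distance = Callable[[np.ndarray, np.ndarray], float]


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))


def nearest_centroid_predict(train_x: Sequence[np.ndarray], train_y: Sequence[str],
                             x: np.ndarray, dist: Distance = euclidean) -> str:
    """Assign x to the class whose mean transition matrix is closest."""
    centroids: Dict[str, np.ndarray] = {
        label: np.mean([m for m, y in zip(train_x, train_y) if y == label], axis=0)
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: dist(x, centroids[label]))


def knn_predict(train_x: Sequence[np.ndarray], train_y: Sequence[str],
                x: np.ndarray, k: int = 3, dist: Distance = euclidean) -> str:
    """Majority vote among the k training matrices closest to x."""
    order = np.argsort([dist(x, m) for m in train_x])[:k]
    # Counter keeps insertion order, so ties go to the label met first (nearest first).
    votes = Counter(train_y[i] for i in order)
    return votes.most_common(1)[0][0]
```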

Evaluation Methods:

● Accuracy
● Cohen’s Kappa (Kappa coefficient)
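Both evaluation measures are short to compute by hand. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e the agreement expected by chance from the label frequencies of the true and predicted labels. The function names below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement if truth and prediction were independent with these frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined (division by zero) when p_e == 1, e.g. a single label everywhere.
    return (p_o - p_e) / (1 - p_e)
```

A classifier that always predicts the majority class gets kappa = 0 even when its accuracy is high, which is why kappa is a useful complement to accuracy on skewed class distributions.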

Clustering (unsupervised)

The task of grouping objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups.

Algorithms used:

● k-Means
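A plain Lloyd's k-means on flattened transition matrices could look like this. Assumptions: Euclidean distance and mean updates (the standard k-means formulation; the slides do not say how the other distances were combined with k-means), and initial centroids passed in explicitly, since the appendix studies how the choice of initial centroids affects the result. The function name `k_means` is hypothetical.

```python
import numpy as np


def k_means(points: np.ndarray, initial_centroids: np.ndarray, max_iter: int = 100):
    """Lloyd's k-means on row vectors (e.g. flattened transition matrices).

    Returns (labels, centroids).
    """
    centroids = initial_centroids.astype(float).copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: mean of the assigned points; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

With 45x45 matrices, `points` would be an (n_batches, 2025) array built by flattening each matrix, as in option A of the matrix-to-vector conversion.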

Evaluation Methods:

● V-Measure
● Rand Index
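Both clustering scores compare the cluster assignment with known labels (for example Product Group). V-measure is the harmonic mean of homogeneity, 1 - H(C|K)/H(C), and completeness, 1 - H(K|C)/H(K); the (unadjusted) Rand index is the fraction of object pairs on which the two labelings agree. This sketch uses only the standard library; the function names are hypothetical, and scoring homogeneity or completeness as 1 when the corresponding entropy is zero is an assumed convention.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Sequence


def _entropy(labels: Sequence) -> float:
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a: Sequence, b: Sequence) -> float:
    """H(A | B) from the joint counts of the two labelings."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((c / n) * math.log(c / b_counts[bj]) for (_, bj), c in joint.items())


def v_measure(classes: Sequence, clusters: Sequence) -> float:
    """Harmonic mean of homogeneity and completeness (beta = 1)."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def rand_index(classes: Sequence, clusters: Sequence) -> float:
    """Fraction of pairs placed consistently (together in both, or apart in both)."""
    pairs = list(combinations(range(len(classes)), 2))
    agree = sum((classes[i] == classes[j]) == (clusters[i] == clusters[j]) for i, j in pairs)
    return agree / len(pairs)
```

Both scores ignore the actual cluster ids, so a clustering that matches the labels up to renaming scores 1.0.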

Classification Results

Distance Comparison | Classification

Nearest Centroid Classifier

Train - Test Split Evaluation [1/2]

Classification Baseline (ZeroR):
● Product Cleaning Group Accuracy: 0.520
● Product Group Accuracy: 0.377
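ZeroR ignores the input and always predicts the most frequent class of the training set, so its accuracy equals the share of test batches that belong to that class. A minimal sketch (the function name is hypothetical):

```python
from collections import Counter
from typing import Sequence


def zero_r_accuracy(train_labels: Sequence[str], test_labels: Sequence[str]) -> float:
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(label == majority for label in test_labels) / len(test_labels)
```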

k-Nearest Neighbors Classifier

Train - Test Split Evaluation [2/2]

Classification Baseline (ZeroR):
● Product Cleaning Group Accuracy: 0.520
● Product Group Accuracy: 0.377

Clustering Results

Distance Comparison | Clustering

Conclusions

1) Visualization
   a) Visual Production Overview
   b) Enabling Comparison between batches

2) Machine Learning
   a) Valid Representation of Categorical Time-Series
   b) Quantification of Production Processes
   c) Application of Machine Learning Techniques

Future Work | Academic

● Research on 2D-specific Distance Metrics
● Clustering Algorithms based on Markov Models
● Classification using Transition Matrices of different Dimensions (Markov-0, 2, ..., N)
● Different Feature Selection (temperature, pressure, etc.)

Future Work | Business

● Data Gathering Automation
● Creation of Golden Standard for each Product
● Scoring of Production Process
● Distribution of Batches compared to the Average Batch
● Clustering into more efficient clusters based on the process

Thank you!

Questions?

Appendix

Distance Comparison | Classification

Setup:
● Algorithm:
  ○ Nearest Centroid Classifier
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split:
  ○ 80% training set, 20% test set
● Distances:
  ○ All

Determining k in k-Nearest Neighbors

Setup:
● Algorithm:
  ○ k-Nearest Neighbors
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split:
  ○ 80% training set, 20% test set
● Distances (Average of):
  ○ Euclidean total
  ○ Cosine vector
  ○ KL-Divergence diagonal

Train - Test Split Evaluation

Setup:
● Algorithms:
  ○ Nearest Centroid Classifier
  ○ k-Nearest Neighbors
● Attributes:
  ○ Product Cleaning Group
  ○ Product Group
● Split (train-test):
  ○ 80% - 20%
  ○ 65% - 35%
  ○ 50% - 50%
● Distances (Average of):
  ○ Euclidean total
  ○ Cosine vector
  ○ KL-Divergence diagonal
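A shuffled split at the three listed ratios can be sketched with the standard library. Whether the thesis split randomly or stratified by label is not stated; this sketch is a plain, non-stratified shuffle with a fixed seed, and the names `train_test_split` and `SPLITS` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], train_fraction: float,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and cut it into a training and a test part."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


SPLITS = [0.80, 0.65, 0.50]  # training fractions for the 80/20, 65/35 and 50/50 splits
```

With small product groups, a stratified split would keep rare classes represented in both parts; that is a design choice the slides leave open.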

Distance Comparison | Clustering

Setup:
● Algorithms:
  ○ Baseline
  ○ k-Means
● Initial Centroid Sets Type:
  ○ All centroids of each set belong to different clusters (Alldiff), average of 20 sets
  ○ All centroids of each set belong to the same cluster (Allsame), average of 20 sets
● Distances:
  ○ All

Impact of Initial Centroids

Setup:
● Algorithms:
  ○ Baseline
  ○ k-Means
● Initial Centroid Sets Type:
  ○ All centroids of each set belong to different clusters (Alldiff), average of 100 sets
  ○ All centroids of each set belong to the same cluster (Allsame), average of 100 sets
  ○ Each centroid of each set belongs to a random cluster (Allrandom), average of 100 sets
● Distances (Average of):
  ○ Euclidean Total
  ○ Euclidean Row
  ○ Euclidean Column
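The three initial-centroid strategies can be sketched as a function that picks which labelled batches seed k-means. Assumptions: "cluster" here means the known label (for example Product Group), initial centroids are drawn from the data set itself, and the function name and mode strings are hypothetical. Each set of indices would then be passed to k-means and the scores averaged over the 20 or 100 sets.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def initial_centroid_indices(labels: Sequence[str], k: int, mode: str,
                             rng: random.Random) -> List[int]:
    """Pick indices of k objects to serve as initial k-means centroids.

    mode="alldiff":   each centroid comes from a different true cluster (needs k <= #clusters)
    mode="allsame":   all centroids come from one randomly chosen true cluster
    mode="allrandom": centroids are drawn uniformly, ignoring the true clusters
    """
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    if mode == "alldiff":
        chosen_labels = rng.sample(sorted(by_label), k)
        return [rng.choice(by_label[label]) for label in chosen_labels]
    if mode == "allsame":
        # Only clusters with at least k members can supply k distinct centroids.
        candidates = [lab for lab in sorted(by_label) if len(by_label[lab]) >= k]
        return rng.sample(by_label[rng.choice(candidates)], k)
    if mode == "allrandom":
        return rng.sample(range(len(labels)), k)
    raise ValueError(f"unknown mode: {mode}")
```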

Determining k in k-Nearest Neighbors [1/2]

● Accuracy: average 0.727, deviation <1%
● Kappa: average 0.531, deviation ~2%

Determining k in k-Nearest Neighbors [2/2]

● Accuracy: average 0.560, deviation <1%
● Kappa: average 0.391, deviation ~1%

Distance Comparison | Classification [2/2]

Distance Comparison | Clustering [2/2]

Impact of Initial Centroids [2/2]

Challenges in ML

Labelling
1. Data cleansing
2. Labelling of batches

Object Creation
3. Time-series splitting to batches
4. Representation of each batch (chunk)
5. Feature selection
6. Unequal length time-series comparison

Distance Calculation
7. Distance calculation method

Impact of Initial Centroids
