
Machine-Learning Methods for Classification of Semiconductor Defects

Shing Chiang Tan
Multimedia University, Malaysia
sctan@mmu.edu.my

Presented at the Graduate School of Information, Production and Systems, Waseda University, 12 Nov. 2012

Acknowledgement

This work is an outcome of a research collaboration with Prof. Watada of Waseda University, and with Prof. Marzuki Khalid and Dr. Zuwarie of Universiti Teknologi Malaysia. All researchers are grateful to Intel Malaysia for providing the real data without which this work would never have commenced.

Outline

• Introduction: Semiconductor Manufacturing & Wafer Defect Detection

• Imbalanced Data: Problem, Issues, and Classification Metrics

• Machine-Learning Methods and Results

• Summary

A. Introduction: Semiconductor Manufacturing

Operation – how a wafer is produced and tested:

Fabrication → Sort → Assembly → Class test

Goals: quality, cost savings, and yield.

Voluminous Data from the Production Process

Correlation analysis:
• conducted manually
• time consuming
• complicated

Alternative – machine-learning methods:
• learn information from data
• automatic defect detection

(Between-Class) Imbalanced Data

A small number of records (defective cases) – the minority concept – stands against a large number of records (non-defective cases) – the majority concept. The result is unfavorable accuracies on the minority class.

Nature of the Imbalanced Data Problem

Data space:
• Intrinsic (directly related to the data space)
• Extrinsic (indirectly related), e.g., the time interval for data acquisition

Data quantity:
• Imbalance due to rare instances (absolute rarity)
• Relative imbalance

Imbalanced Data Complexity

• Overlapping data from different classes
• Lack of representative data – rare instances are limited
• Small disjuncts of data – either noise/outliers or useful information

[Figure adapted from He and Garcia (2009).]

Data-Driven Wafer Defect Detection: Encrypted Dataset from Intel

An overview of the encrypted semiconductor dataset:

Dataset                              Attribute 1     Attribute 2      Attribute 3      Output
                                     (categorical)   (numerical)      (numerical)      (numerical)
Training set                         1A to 215A      174610–261970    174650–278140    0/1
Test set                             1A to 215A      180800–299370    176670–301770    0/1
Interpolate rate of test data (%)    100             44.66            93.06            100
Extrapolate rate of test data (%)    0               55.34            6.94             0

[Figure: encrypted dataset distribution.]

Data Classification: Confusion Matrix

                            Predicted Positive (Label=0)   Predicted Negative (Label=1)
Actual Positive (Label=0)   True Positive (TP)             False Negative (FN)
Actual Negative (Label=1)   False Positive (FP)            True Negative (TN)

Classification metrics:

Correctly classified rate:  CCR = (TP + TN) / (TP + FP + FN + TN)
True positive rate:         TPR = TP / (TP + FN)
True negative rate:         TNR = TN / (TN + FP)
Geometric mean:             G-mean = sqrt(TPR × TNR)
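These definitions translate directly into code. Below is a minimal Python sketch (not from the talk) that computes all four metrics from confusion-matrix counts; the example counts are invented to show how a majority-biased classifier can score a high CCR yet a poor G-mean.

```python
import math

def classification_metrics(tp, fn, fp, tn):
    """Compute the four metrics defined above from confusion-matrix counts."""
    ccr = (tp + tn) / (tp + fp + fn + tn)   # correctly classified rate
    tpr = tp / (tp + fn)                    # true positive rate (sensitivity)
    tnr = tn / (tn + fp)                    # true negative rate (specificity)
    g_mean = math.sqrt(tpr * tnr)           # balances both class accuracies
    return ccr, tpr, tnr, g_mean

# Invented counts: mostly majority (non-defective) records.
# CCR is 0.955, but TPR is only 0.2, so G-mean drops to about 0.45.
print(classification_metrics(tp=10, fn=40, fp=5, tn=945))
```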

Machine-Learning Methods for Imbalanced Data Classification

• Algorithm-Level Approach: modified learning algorithms; kernel-based learning
• Data-Level Approach: sampling methods (over- and under-sampling); synthetic sampling (SMOTE – Synthetic Minority Oversampling TEchnique; see the sketch after this list)
• Cost-Sensitive Approach: bootstrap sampling; a cost-sensitive function in learning

• Additional Method: applied at the testing stage of classification
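As a concrete illustration of the data-level approach, here is a hedged Python sketch of the core SMOTE idea: new minority samples are interpolated between a minority point and one of its k nearest minority neighbours. It follows the published algorithm only in outline; the function name and parameters are my own.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between each
    minority point and one of its k nearest minority neighbours (the core
    idea of SMOTE; the published algorithm adds more bookkeeping)."""
    rng = np.random.default_rng() if rng is None else rng
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        d = np.linalg.norm(X_min - x, axis=1)   # distances to minority points
        neighbours = np.argsort(d)[1:k + 1]     # skip the point itself
        x_nn = X_min[rng.choice(neighbours)]
        gap = rng.random()                      # random point on the segment
        synthetic.append(x + gap * (x_nn - x))
    return np.array(synthetic)

# Oversample 3 minority (defective) records into 10 synthetic ones.
X_min = np.array([[0.1, 0.2], [0.15, 0.25], [0.3, 0.1]])
print(smote_like_oversample(X_min, n_new=10, k=2).shape)  # (10, 2)
```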

Imbalanced Learning: Algorithm-Level

Evolutionary FAM and evolutionary FAMDDA (Methods A and B):
• Fuzzy ARTMAP with Dynamic Decay Adjustment (FAMDDA) combined with hybrid genetic algorithms
• Fuzzy ARTMAP (FAM) combined with hybrid genetic algorithms

And, for comparison:
• EPNet: a Multilayer Perceptron (MLP) evolved with evolutionary programming

Fuzzy ARTMAP (FAM) (Carpenter et al., 1992)

[Architecture figure: two fuzzy ART modules, ARTa and ARTb, joined by a map field. The input vector a and the target vector b are complement-coded as A = (a, 1 – a) and B = (b, 1 – b) before entering ARTa and ARTb; each module has its own vigilance parameter, a third vigilance governs the map field, and the weights wa, wb, and w^ab store the category prototypes and the ARTa-to-ARTb associations.]

Flow Chart of FAM Learning Process

1. Initialize the FAM weights.
2. Complement-code the input patterns, a.
3. Compute the choice function, T, to select a winner category.
4. Apply the ART vigilance test (on failure, search for another winner).
5. Apply the map field vigilance test (on failure, resume the search).
6. On passing both tests (resonance), learn by updating the weights w^ab.

Process: category selection, test, and search until resonance.
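To make the flow above concrete, here is a simplified Python sketch of one FAM training step: complement coding, the choice function T_j = |A ∧ w_j| / (α + |w_j|), the ART vigilance test, and learning of the winning category. It collapses ARTa/ARTb/map field into a single label check and omits match tracking, so it illustrates the control flow rather than the complete algorithm.

```python
import numpy as np

def complement_code(a):
    """Complement coding: A = (a, 1 - a), for a scaled to [0, 1]."""
    return np.concatenate([a, 1.0 - a])

def fam_learn_step(A, weights, labels, target, rho=0.75, alpha=0.001, beta=1.0):
    """One training step: choice function, vigilance test, label (map-field)
    check, then learning; commits a new category if no match is found."""
    if weights:
        W = np.array(weights)
        T = np.minimum(A, W).sum(axis=1) / (alpha + W.sum(axis=1))  # choice
        for j in np.argsort(T)[::-1]:                 # search in order of T
            match = np.minimum(A, W[j]).sum() / A.sum()   # ART vigilance
            if match >= rho and labels[j] == target:      # map-field check
                weights[j] = beta * np.minimum(A, W[j]) + (1 - beta) * W[j]
                return j
    weights.append(A.copy())                          # commit a new category
    labels.append(target)
    return len(weights) - 1

weights, labels = [], []
fam_learn_step(complement_code(np.array([0.2, 0.8])), weights, labels, target=1)
fam_learn_step(complement_code(np.array([0.25, 0.75])), weights, labels, target=1)
print(len(weights))  # 1: the second pattern resonated with the first category
```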

Fuzzy ARTMAP with Dynamic Decay Adjustment Algorithm (FAMDDA)

• Cover: include a new training pattern in an existing prototype of FAM.
• Commit: introduce a new prototype.
• Shrink: if a new pattern is incorrectly classified by an existing prototype of a different class, reduce the width of that prototype to resolve the conflict.

Flow Chart of FAMDDA Learning Process

The flow mirrors FAM: initialization of the FAMDDA weights, complement coding of the input patterns a, the choice function T, the ART vigilance test, and the map field vigilance test. The difference lies in the learning step, which also adjusts the prototypes' widths. The process is again category selection, test, and search until resonance.
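The cover/commit/shrink cycle can be sketched in a few lines. The snippet below is an assumption-laden illustration, not FAMDDA itself: it uses a simple radial activation over plain (centre, width, label) prototypes, whereas FAMDDA adjusts the widths of FAM category prototypes.

```python
import numpy as np

def dda_train_step(x, target, prototypes, theta_plus=0.4):
    """One DDA-style step over prototypes [(centre, width, label), ...]:
    cover if a same-class prototype fires strongly enough, otherwise
    commit a new prototype; then shrink conflicting other-class prototypes."""
    def activation(centre, width, x):
        # Simple radial activation; FAMDDA uses its own prototype geometry.
        return np.exp(-np.linalg.norm(x - centre) ** 2 / width ** 2)

    covered = any(label == target and activation(c, w, x) >= theta_plus
                  for c, w, label in prototypes)
    if not covered:
        prototypes.append((x.copy(), 1.0, target))          # commit

    for i, (c, w, label) in enumerate(prototypes):
        if label != target and np.linalg.norm(x - c) < w:   # conflict
            prototypes[i] = (c, max(np.linalg.norm(x - c), 1e-6), label)  # shrink
    return prototypes

protos = []
dda_train_step(np.array([0.2, 0.8]), 1, protos)
dda_train_step(np.array([0.25, 0.75]), 0, protos)   # forces a shrink
print(len(protos))  # 2
```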

Evolutionary FAMDDA/FAM with Hybrid Genetic Algorithms (GAs) (Baskar et al., 2001)

• Phase I – GA search: search for near-optimum feasible solutions with a GA.
• Phase II – local search: fine-tune the feasible solution selected in Phase I, using a direct search algorithm to reduce the size of the search region.

The network environment (FAMDDA/FAM) is embedded in an evolutionary environment of GA search followed by local search.
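A minimal sketch of the two-phase idea, assuming a generic real-coded GA and a simple hill-climbing direct search (the specific operators in Baskar et al. (2001) differ):

```python
import random

def hybrid_ga(fitness, dim, pop_size=20, gens=50, step=0.05, local_iters=200):
    """Phase I: GA search for a near-optimum solution.
    Phase II: direct local search over a shrinking neighbourhood."""
    # --- Phase I: GA search ---
    pop = [[random.uniform(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]                      # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 + random.gauss(0, 0.05)   # crossover + mutation
                     for x, y in zip(a, b)]
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)

    # --- Phase II: local direct search with a shrinking step size ---
    for _ in range(local_iters):
        trial = [x + random.uniform(-step, step) for x in best]
        if fitness(trial) < fitness(best):
            best = trial
        else:
            step *= 0.99                                   # reduce search region
    return best

# Example: minimise a simple quadratic.
print(hybrid_ga(lambda v: sum((x - 0.3) ** 2 for x in v), dim=3))
```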

EPNet (Yao, 1999)

• An evolving feedforward neural network that performs adaptation in terms of both learning and evolution.
• Evolves the architecture and connection weights using evolutionary programming.
• Uses 5 mutation operators.

The Construction of EPNet
[Figure adapted from Yao (1999).]
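A distinctive feature of EPNet is that its mutations are tried in order of increasing structural disruption, and the first one that yields a better offspring is kept. The toy sketch below mirrors only that control flow; the "network" (a weight list), error function, and operators are invented stand-ins, not Yao's actual implementation.

```python
import random

def epnet_style_mutate(net, evaluate, operators):
    """Apply the operators in order (least to most disruptive) and keep the
    first offspring that improves on the parent; otherwise keep the parent."""
    parent_err = evaluate(net)
    for name, op in operators:
        child = op(list(net))                  # mutate a copy
        if evaluate(child) < parent_err:
            return child, name                 # prefer simpler mutations
    return net, None                           # no improvement: keep parent

# Toy stand-ins: a "network" is a weight list; error is distance to a target.
target = [0.5, -0.2, 0.1]
evaluate = lambda net: sum((w - t) ** 2 for w, t in zip(net, target))
operators = [
    ("train",             lambda n: [w + random.gauss(0, 0.05) for w in n]),
    ("delete connection", lambda n: n[:-1] + [0.0]),
    ("add connection",    lambda n: n[:-1] + [random.gauss(0, 0.5)]),
]
print(epnet_style_mutate([0.0, 0.0, 0.0], evaluate, operators))
```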

Algorithm-Level Classification: Results

Performance comparison with other classification methods (%):

Model         CCR     TPR     TNR     G-mean
KNN           91      95      43      64
SVM-RBF       75      74      83      79
EPNet         80.34   80.18   82.30   81.84
FAM-HGA       87.16   87.79   79.42   83.50
FAMDDA-HGA    88.69   89.75   75.73   82.44

Additional Method: Testing Stage of Classification

Rule-based classifiers:
• FAMDDA-FIM and FAM-FIM
• Rectangular Basis Function Network (RecBFN)
• NEFCLASS

Note: FIM – Fuzzy Inference Mechanism

FAMDDA-FIM and FAM-FIM

• Knowledge base formation from learning with FAMDDA/FAM.
• Knowledge base extraction from a trained network: a rule set (w_j^a, c_j), j = 1, 2, …, L, pairing each category's weight vector with its consequent class.
• Reasoning process.

Reasoning Process in a Trained FAMDDA/FAM

1. Matching degree: calculate the firing strength of the antecedent of each rule, which is associated directly with its consequent class.
2. Aggregation: the activation levels of the rules from different classes are aggregated by an additive combination.
3. Output determination: the output is determined by a weighted sum of all rules' firing strengths.
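A compact sketch of the three steps, assuming rules of the form (centre, width, class label) with triangular membership functions; the real FIM operates on the weights extracted from a trained FAMDDA/FAM.

```python
import numpy as np

def fim_reasoning(x, rules):
    """Three-step reasoning over rules (centre, width, class_label):
    (1) matching degree of each rule's antecedent, (2) additive
    aggregation per class, (3) class with the largest aggregate."""
    n_classes = max(label for _, _, label in rules) + 1
    strength = np.zeros(n_classes)
    for centre, width, label in rules:
        # Step 1: firing strength of the antecedent (triangular membership).
        mu = np.clip(1.0 - np.abs(x - centre) / width, 0.0, 1.0).min()
        # Step 2: additive combination of rules from the same class.
        strength[label] += mu
    # Step 3: output determined by the aggregated strengths.
    return int(np.argmax(strength)), strength

rules = [(np.array([0.2, 0.8]), 0.3, 0), (np.array([0.7, 0.3]), 0.3, 1)]
print(fim_reasoning(np.array([0.25, 0.75]), rules))  # class 0 wins
```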

Rectangular Basis Function Network (RecBFN) (Huber and Berthold, 1995)

• A method to learn hyper-rectangles (rules) directly from data.
• Applies a constructive learning algorithm (the Dynamic Decay Adjustment algorithm).
• Hyper-rectangles translate directly into rules.
• The output layer computes a weighted sum of the activations of the RecBF units.

[Figure adapted from Huber and Berthold (1995).]
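Since RecBFN's rules are just hyper-rectangles, translating one into a readable if-then rule is a one-liner. The sketch below uses invented attribute names and bounds purely for illustration.

```python
def rectangle_to_rule(lows, highs, class_label, names):
    """Translate one RecBF hyper-rectangle (its core region) into a
    readable if-then rule, per the rule-extraction idea above."""
    conditions = " and ".join(
        f"{lo:.2f} <= {name} <= {hi:.2f}"
        for name, lo, hi in zip(names, lows, highs))
    return f"IF {conditions} THEN class = {class_label}"

# A hypothetical rectangle over two wafer attributes:
print(rectangle_to_rule([0.1, 0.4], [0.3, 0.9], "defective",
                        ["attr2", "attr3"]))
```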

Neuro-Fuzzy CLASSification (NEFCLASS) (Nauck and Kruse, 1997)

• A neuro-fuzzy classifier: a 3-layer (input/rule/output) fuzzy perceptron combined with a backpropagation algorithm.
• Learns the shapes of the membership functions (fuzzy sets).
• Can be trained with prior knowledge or from scratch with data.
• Rule form: if x1 is µ1 and x2 is µ2 and … and xn is µn, then the pattern (x1, x2, …, xn) belongs to class ci.

Testing Stage of Classification: Results

Model         CCR     TPR     TNR     G-mean
KNN           91      95      43      64
SVM-RBF       75      74      83      79
RecBFN        8.90    86.83   93.85   1.30
NEFCLASS      5.14    4.12    0.81    44.60
FAM-FIM       89.08   90.13   76.20   82.87
FAMDDA-FIM    87.18   87.78   79.78   83.69

[Bar chart: CCR, TPR, TNR, and G-mean (%, 0–100) for the models above.]

Summary

• Current work: machine-learning methods based on artificial neural networks, rule-based systems, and evolutionary algorithms.
• Further improvements in classification performance are possible with other machine-learning methods (the algorithm-level approach).
• Future work: combine a boosting algorithm with a machine-learning model.

References

• Baskar, S., Subraraj, P., and Rao, M. V. C. (2001). Performance of hybrid real coded genetic algorithms. International Journal of Computational Engineering Science, 2, 583-601.

• Berthold, M. R., and Diamond, J. (1998). Constructive training of probabilistic neural network. Neurocomputing, 19, 167-183.

• Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analogue multidimensional maps. IEEE Transactions on Neural Networks, 3, 698-713.

• He, H., and Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263-1284.

• Huber, K. -P., and Berthold, M. R. (1995). Building precise classifiers with automatic rule extraction. Proceedings of the IEEE International Conference on Neural Networks, 3, 1263-1268.

• Nauck, D., and Kruse, R. (1997). A neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets and Systems, 89, 277-288.

• Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423-1447.
