
International Journal of Production Research, Vol. 44, No. 13, 1 July 2006, 2605–2623

Modelling and analysis of waviness reduction in soft-pad grinding of wire-sawn silicon wafers by support vector regression

JUDONG SHEN†, Z. J. PEI†, E. S. LEE*† and G. R. FISHER‡

†Department of Industrial and Manufacturing Systems Engineering, Kansas State University, Manhattan, KS 66506, USA

‡MEMC Electronic Materials, Inc., 501 Pearl Dr., St. Peters, MO 63376, USA

(Revision received December 2005)

The manufacturing of silicon wafers forms the most important step in the construction of integrated circuit (IC) chips. One of the difficulties in this manufacturing process is the removal of waviness from the resulting wafers. In this paper, mathematical modelling and analysis of this removal process are carried out using the support vector regression (SVR) algorithm. The results show that SVR is ideally suited to the modelling of this complicated process. Furthermore, through the learning ability of SVR, the model can be continuously improved as more data become available. Based on the resulting model, the influences of the various factors on the rate of removal and on the ease of control of the removal process are also discussed.

Keywords: Manufacturing; Modelling; Silicon wafer; Waviness removal; Statistical learning theory; Support vector regression

1. Introduction

Integrated circuit (IC) chips are usually built on silicon substrates, which are in the form of wafers. After the silicon ingot is obtained by crystal growth, a sequence of processes is required to manufacture high-quality silicon wafers, including slicing, flattening, etching, and other finishing steps (Van Zant 2000, Pei et al. 2004). Recently, the preferred slicing approach has been wire sawing. An undesirable phenomenon associated with wire sawing is waviness. The waviness has to be removed by subsequent lapping, grinding, or other processes, since it adversely affects the flatness of the wafers. Several approaches have been proposed to reduce or eliminate the waviness induced by wire sawing, such as wafer grinding followed by lapping (Vandamme et al. 2000), the use of a soft pad (Kassir and Walsh 1999), etc. It has been shown that soft-pad grinding is the most promising approach, since it is very effective in reducing the waviness and is easily implemented in conventional grinding environments (Xin et al. 2004).

The removal of the waviness resulting from wire-sawn slicing is a complicated and delicate operation. To understand this removal process, mathematical modelling and analysis should form an integral part of the investigation. However, very few investigations have appeared in the literature. Pei and co-workers (Xin et al. 2002,


2004, Pei et al. 2003, J. Wu et al. 2003, Sun et al. 2004) have investigated this removal operation by the use of finite element analysis. Jiao et al. (2003) investigated this waviness removal process through the use of a learning-based technique, namely the fuzzy adaptive network (FAN).

The learning-based approaches appear to be ideally suited to the modelling of this complicated process. Learning can improve the initially assumed approximate model: as more data become available, the approximate model can be modified and updated by learning. However, Jiao et al. (2003) found that the FAN is not very easy to use. The basic problem in applying the FAN is slow convergence; another problem is the presence of local optima.

In this paper, a recently developed learning technique, namely the support vector machine (SVM) (Vapnik 1995, 1999, Kecman 2001), is used to model this removal process. The FAN (Cheng 2000, Cheng and Lee 2001), which combines the linguistic representation ability of fuzzy logic with the learning ability of the neural network, has several drawbacks, such as the need for a fairly large amount of representative training data, a slow rate of convergence, the problem of local optima, and poor generalization performance as a result of over-fitting or under-fitting (Cheng 2000, Cheng and Lee 2001, Kecman 2001). Most of these drawbacks are due to the fact that these earlier learning machines are based on the empirical risk minimization (ERM) principle (Vapnik 1999, Kecman 2001). SVM, a learning approach based on the structural risk minimization (SRM) principle, overcomes most of these drawbacks. Because it is based on statistical learning theory, SVM improves the generalization performance and works well with a reasonably small amount of training data. Furthermore, because training generally reduces to the solution of a quadratic program, the learning process is fast and can be used online. Thus, SVM is a much more powerful learning algorithm and has been applied to both the classification problem, generally known as support vector classification (SVC), and the prediction and regression problem, generally known as support vector regression (SVR). Furthermore, SVM works well with problems that have a large number of attributes but only a small number of available data.

Because of its many attractive features, SVM should be a very useful approach for the modelling and analysis of the waviness removal process. The purpose of this paper is to apply SVR to the modelling and analysis of this complicated manufacturing process. Since this removal process was investigated by the use of the FAN in an earlier paper (Jiao et al. 2003), a second purpose is to compare these two learning-based techniques and to show that SVR is a more effective approach than the FAN.

The remainder of this paper is organized as follows. Section 2 summarizes the SVR approach and section 3 applies the SVR method to model the waviness reduction of soft-pad grinding. Sections 4 and 5 present the results and compare them with those obtained by the FAN (Jiao et al. 2003).

2. Support vector machines

Only the essential concepts and equations are summarized in the following. For a more detailed presentation, the reader is referred to the literature (Vapnik 1995, 1999, Kecman 2001).


Both SVC and SVR are based on the statistical learning theory developed by Vapnik and Chervonenkis (Vapnik 1995, 1999, Kecman 2001). SVM is a system for efficiently training linear learning machines in kernel-induced feature spaces by exploiting optimization theory while respecting the insights of generalization theory. SVM has been shown to achieve good generalization performance for both classification and regression on high-dimensional data sets. SVR consists mainly of three components: statistical learning theory, the regression/optimization problem and the use of kernels. These three components are summarized briefly in the following.

2.1 Statistical learning theory

Suppose l training data pairs are given as (x₁, y₁), …, (x_l, y_l), x ∈ ℝⁿ, y ∈ ℝ. They are obtained from an unknown distribution P(x, y). The expected risk of learning the mapping x_i ↦ y_i from this set of data is given by the risk function

R(\alpha) = \int L(y, f(x, \alpha)) \, dP(x, y),  (1)

where L(·) is a loss function. The absolute error and the square error are the most popular loss functions for traditional regression problems. For example, if the loss function is the absolute error, then the risk function takes the form

R(\alpha) = \int |y - f(x, \alpha)| \, dP(x, y).  (2)

The task of the machine is to find the best mapping x_i ↦ f*(x, α₀) out of a set of possible mappings x_i ↦ f(x, α), where the function f*(x, α₀) minimizes the risk function R(α) over the class of functions f(x, α), α ∈ Λ, under the conditions that the joint probability distribution P(x, y) is unknown and the only available information is contained in the l training data pairs. Since the joint distribution P(x, y) is unknown, the risk function R(α) cannot be calculated exactly. In practice, the expected risk function R(α) is frequently replaced by the empirical risk function

R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} L(y_i, f(x_i, \alpha)).  (3)

For example, if the loss function is the absolute error (L1 norm), the empirical risk function takes the form

R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} |y_i - f(x_i, \alpha)|.  (4)

The empirical risk function is used by most traditional regression methods and learning algorithms and is known as the empirical risk minimization (ERM) approach. The generalization performance of ERM-based methods is usually not very good, since the empirical risk function R_emp(α) covers only the information from the given training data, with no consideration of the general population or the


unseen data. To overcome this difficulty, the following bound was established for the expected risk, which holds with a probability of at least 1 − η:

R(\alpha) \le R_{emp}(\alpha) + \Phi(l, h, \eta),  (5)

where l is the number of examples, h is a non-negative integer known as the VC (Vapnik–Chervonenkis) dimension of f(x, α), and η is a number with 0 ≤ η ≤ 1. The second term on the right-hand side is called the VC confidence. A typical uniform VC bound for SVC and SVR has the following form:

R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h(\log(2l/h) + 1) - \log(\eta/4)}{l}}.  (6)

The VC dimension h in (5) and (6) is a controlling parameter for minimizing the generalization bound R(α): it is a measure of the capacity of the learning machine and defines the complexity of the model. It is usually, but not always, related to l.

The ERM-based approach cannot handle systems well when the available training data set is small. In contrast, structural risk minimization (SRM) can handle systems with a relatively small training data set; SVM is an approximate implementation of the SRM principle through the use of the generalization bound (5). Thus, unlike most other machine learning methods, SVM allows one to control the expected risk, not simply the empirical risk. This distinct property, combined with superior computational performance, makes SVM a very interesting and powerful approach.
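To make the bound in equation (6) concrete, the following minimal Python sketch (written for this discussion; it is not code from the paper) evaluates the VC confidence term for given l, h and η, and shows how it shrinks as the number of examples grows relative to the capacity h:

```python
import math

def vc_confidence(l, h, eta):
    """VC confidence term of equation (6):
    sqrt((h * (log(2l/h) + 1) - log(eta/4)) / l)."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

def srm_bound(r_emp, l, h, eta):
    """Bound on the expected risk R(alpha) from equations (5)/(6),
    holding with probability at least 1 - eta."""
    return r_emp + vc_confidence(l, h, eta)

# The confidence term dominates when l is small relative to h,
# and shrinks as more training data become available:
print(vc_confidence(l=100, h=10, eta=0.05))    # ~0.67
print(vc_confidence(l=10000, h=10, eta=0.05))  # ~0.10
```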

2.2 Regression and optimization

Given l training data pairs with unknown joint distribution P(x, y), the goal of the general regression-learning problem is to find a function which, based on the given data, can predict the actual response y as closely as possible in the sense of statistical learning theory. Both linear and non-linear regression can be solved. For non-linear regression, SVR first uses a non-linear mapping function φ to map the original training data into the feature space F and then constructs a hyperplane that is as close as possible to as many points as possible in the feature space. The hyperplane in the feature space takes the following form:

f(x) = \sum_{i=1}^{l} w_i \varphi_i(x) + b.  (7)

SVR, based on the SRM principle, minimizes the expected risk by minimizing both the empirical risk and the VC dimension. To measure the empirical risk, a specific loss function is needed. Commonly used loss functions include the quadratic loss function, the linear loss function and the Huber loss function. SVR uses a loss function called the ε-insensitive loss function, proposed by Vapnik. It appears that, for finite-sample regression problems, Vapnik's ε-insensitive loss function with a properly chosen ε-parameter actually yields better generalization than other loss functions. The ε-insensitive loss function is defined as

L_\varepsilon(y, f(x, w)) = \begin{cases} 0 & \text{if } |y - f(x, w)| \le \varepsilon, \\ |y - f(x, w)| - \varepsilon & \text{otherwise}, \end{cases}  (8)


where the error is penalized only if the point falls outside the ε-tube. The errors (loss values) of the training examples above and below the ε-tube are represented by two non-negative slack variables, ξ and ξ*, respectively. In this way, a linear function in the feature space replaces the original non-linear function in the input space.
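The ε-insensitive loss itself is simple to state in code. The following is a small illustrative sketch (not from the paper) of equation (8):

```python
import numpy as np

def eps_insensitive_loss(y, f_x, eps=0.1):
    """Vapnik's epsilon-insensitive loss of equation (8): zero inside
    the eps-tube, |y - f(x)| - eps outside it."""
    residual = np.abs(np.asarray(y) - np.asarray(f_x))
    return np.maximum(residual - eps, 0.0)

# Points whose residual is within eps contribute no loss at all:
print(eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.08, 1.50], eps=0.1))
# -> [0.  0.  0.4]
```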

In the feature space, SVR minimizes both the empirical risk (the ε-insensitive loss) and the model complexity (model capacity) simultaneously. The former is minimized by minimizing Σᵢ₌₁ˡ (ξᵢ + ξᵢ*) and the latter by minimizing ‖w‖². In other words, SVR tries to find a function f(x) in the feature space that has at most ε deviation from the actually obtained targets yᵢ over all the training examples and, at the same time, is as flat as possible. Thus, the regression problem is reduced to the following quadratic programming problem:

\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)
\text{s.t.} \quad f(x_i, w) - y_i \le \varepsilon + \xi_i,
\quad y_i - f(x_i, w) \le \varepsilon + \xi_i^*,
\quad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, l,  (9)

where C is a positive, predefined constant which controls the trade-off between the approximation error (empirical risk) and the flatness of the model f (model complexity, or the weight-vector norm ‖w‖), and ε is a user-defined precision parameter which determines the size of the ε-tube. Thus, the regression problem is reduced to the solution of a convex programming problem, which always has a unique solution (Bazaraa and Shetty 1993).
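As an illustration of equation (9), the sketch below sets up the primal ε-insensitive SVR problem for a linear model f(x) = w·x + b directly in the input space, using the cvxpy modelling package and synthetic data. This is a toy formulation for exposition only; the paper does not state which solver was used, and practical SVR implementations solve the dual problem with kernels:

```python
import cvxpy as cp
import numpy as np

# Synthetic data for illustration: l examples with n features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -0.5, 0.2, 0.0, 0.3]) + 0.05 * rng.normal(size=20)

l, n = X.shape
C, eps = 100.0, 0.1

w = cp.Variable(n)
b = cp.Variable()
xi = cp.Variable(l, nonneg=True)        # slack above the eps-tube
xi_star = cp.Variable(l, nonneg=True)   # slack below the eps-tube

# Objective and constraints of equation (9), linear case.
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star))
constraints = [X @ w + b - y <= eps + xi,
               y - (X @ w + b) <= eps + xi_star]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```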

The minimization problem (equation (9)) can be solved by using the Karush–Kuhn–Tucker (KKT) theory (Bazaraa and Shetty 1993). The desired optimal regression hyperplane for non-linear regression is

f(x, w) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \, \varphi(x_i) \cdot \varphi(x) + b^*,  (10)

where α_i and α_i^* are the Lagrange multipliers associated with the two constraint sets of equation (9).

2.3 The kernel function

To map an input non-linear x-space into a high-dimensional linear feature z-space, two basic problems need to be considered: how to choose the mapping function φ(x), and the 'curse of dimensionality'. The second problem was originally introduced by Bellman in connection with the exponential increase in storage space or computational effort as the dimensionality of the problem grows. If the dimensionality of the feature space is very large, the time required to compute the scalar product can be excessively long. This explosion in dimensionality can be avoided by using kernel functions. The training algorithm depends on the data only through the scalar products φ(x_i) · φ(x_j) in the optimal regression hyperplane (equation (10)). Thus, we can define the kernel function in the feature space as

K(x_i, x_j) = z_i \cdot z_j = \varphi(x_i) \cdot \varphi(x_j).  (11)


The basic advantage of using a kernel function is that the required scalar product in the feature space, φ(x_i) · φ(x_j), is calculated directly by computing the kernel K(x_i, x_j) for the given training data vectors in the input space. In this way, computing in an extremely high-dimensional feature space F is avoided. Furthermore, if there is such a kernel function, one need not explicitly know what the actual mapping φ(x) is. Another advantage of the kernel function is that one can solve the problem in roughly the same amount of time it would take to train SVR on the unmapped data, since the only change is to replace x_i · x_j by K(x_i, x_j) in the training algorithm.

Any symmetric positive semi-definite function K(x, y) in the input space which satisfies Mercer's conditions can represent a scalar product in the feature space. Investigators have used various kernel functions for SVC and SVR. Several of the popular ones are listed in table 1.

Table 1. Four popular kernel functions.

  Linear:                 K(x_i, x_j) = x_i · x_j                   (no parameters)
  Polynomial:             K(x_i, x_j) = [(x_i · x_j) + 1]^d         (d: polynomial degree)
  Gaussian RBF:           K(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²)    (σ: width constant)
  Multilayer perceptron:  K(x_i, x_j) = tanh(κ(x_i · x_j) − θ)      (κ, θ: MLP constants; a valid kernel only for some values of κ and θ)
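Under an RBF kernel, the Gram matrix of equation (11) can be computed directly from the input vectors, without ever forming φ(x). A minimal numpy sketch (illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)),
    the Gaussian RBF kernel of table 1."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Three two-dimensional points (illustrative values only):
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(rbf_kernel_matrix(X, sigma=1.0))  # 3 x 3, symmetric, ones on the diagonal
```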

3. Modelling waviness reduction in soft-pad grinding of wire-sawn silicon wafers by SVR

The wafer grinding process is illustrated in figure 1. The grinding wheels are diamond cup wheels. The workpiece (wafer) is held on a porous ceramic chuck by means of a vacuum. The rotation axis of the grinding wheel is offset by a distance of the wheel radius relative to the rotation axis of the wafer. During grinding, the grinding wheel and the wafer rotate about their own rotation axes simultaneously, while the wheel is fed toward the wafer along its axis. Figure 2 explains why soft-pad grinding can effectively remove waviness. When grinding the first side of a wire-sawn wafer, a perforated resilient pad is inserted between the wafer and the ceramic chuck. The soft pad accommodates and supports the wavy surface of the wafer and holds the wafer in a less deformed condition. Thus, the waviness of the top surface is removed by grinding. This ground surface then serves as the flat reference plane for grinding the other side of the wafer on a conventional ceramic chuck (J. Wu et al. 2003).

3.1 Data preparation

In the soft-pad grinding of wire-sawn wafers, the pad's material properties (e.g. elastic modulus and Poisson's ratio), geometry (e.g. thickness) and other grinding parameters (e.g. grinding force and waviness wavelength of the silicon wafers) will affect waviness removal and process control. Five parameters are studied in this paper, namely the waviness wavelength (W) of the silicon wafer, the thickness of the soft pad (T), the elastic modulus of the pad (E), Poisson's ratio of the pad (ν), and the grinding force (F).

Two output responses or variables are considered: (1) relative peak displacement (RPD) and (2) valley displacement (VD). RPD is the average displacement of the waviness peaks relative to the waviness valleys, along the grinding force direction, while VD is the average displacement of the waviness valleys, also along the grinding force direction. Figure 3 illustrates both quantities. It is clear that the smaller the RPD, the better the top wafer surface will retain its original wavy shape, and therefore the more effective the waviness removal process will be. In principle, VD has no effect on waviness removal, but a large valley displacement will make it difficult to control the position of the wafer, causing problems for the grinding operation (Pei et al. 2003, J. Wu et al. 2003, Sun et al. 2004).

The two output variables RPD and VD are independent of each other. In SVR, multiple-output problems can usually be reduced to a set of single-output problems. Thus, the process has five inputs from which it is desired to predict a single output. This process was modelled by the FAN in an earlier publication (Jiao et al. 2003). The same training sets as those used in Jiao et al. (2003) are used here; in this way, a comparison between the FAN and SVR can be carried out. The test data sets for SVR are listed in table 2.

Figure 1. The wafer grinding process (side and top views): the wafer, held on the rotating chuck, and the grinding wheel rotate about their own axes while the wheel is fed toward the wafer; material is removed in the active grinding zone.

Table 2. Testing data set.

  Test  Waviness         Pad elastic    Pad Poisson's  Pad thickness  Grinding   Relative peak      Valley
  No.   wavelength (mm)  modulus (MPa)  ratio          (mm)           force (N)  displacement (μm)  displacement (μm)
  1     10               30             0.27           1.4            40         2.827              24.072
  2     10               55             0.35           1              25         2.063              9.665
  3     10               60             0.37           1              25         2.168              8.66
  4     16.6667          50             0.25           1.2            40         15.089             21.869
  5     16.6667          70             0.13           0.6            40         17.616             5.34
  6     16.6667          75             0.15           1              30         14.913             7.1
  7     16.6667          90             0.17           1              30         15.227             6.13

Figure 2. Illustration of soft-pad grinding: a wavy wire-sawn wafer is placed on a ceramic chuck with a soft pad in between; the first side is ground flat while the wafer keeps its original wavy shape due to the soft pad; the wafer is then flipped over and the second side is ground flat on the ceramic chuck, yielding a flat wafer.

Figure 3. Illustration of relative peak displacement (RPD) and valley displacement (VD): the original and during-grinding positions and shapes of the wafer are compared along the Z (position) axis.


3.2 Selection of kernel function and parameters

Optimal parameter selection for the SVR model remains an important issue. The SVR generalization performance, or estimation accuracy, has been shown to depend on a good setting of the parameters C and ε and of the kernel parameters (C. Wu et al. 2003, Cherkassky and Ma 2004). Parameter C determines the trade-off between the model complexity (flatness) and the degree to which deviations larger than ε are tolerated in the optimization formulation (Cherkassky and Ma 2004). Parameter ε controls the width of the ε-insensitive zone. The value of ε can also affect the number of support vectors used to construct the regression function: the larger the value of ε, the fewer support vectors are selected. On the other hand, larger ε values result in 'flatter' estimates. Hence, both the C and ε values affect model complexity. Selecting a particular kernel type and its parameters is usually based on application-domain knowledge, and they should reflect the distribution of the input (x) values of the training data. In addition, validation techniques such as bootstrapping and cross-validation can be used to determine a good kernel (Cherkassky and Ma 2004). For instance, the width parameter of an RBF kernel in SVR should reflect the distribution/range of the x values of the training data.

Choosing the parameters can be time consuming. Thus, in practice, an approximate set of parameters can be chosen first, based on the user's knowledge or simply by guessing. Cross-validation is then performed on the training data, and the parameters are refined based on the evaluation of the model.

Several different combinations of the three parameters were tried. The results show that the RBF kernel performs better than the other kernels, so the RBF kernel function is used in this work. Furthermore, the width parameter σ = 1.0 for the RBF kernel leads to the best results. It was found that if the other two parameters are set as C = 100 and ε = 0.1, a reasonably good estimate can be obtained. The dimensionality of the input space is five. The SVR model based on the above parameters was obtained by carrying out separate runs for each response variable.
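As a concrete illustration, such a model could be fitted with the scikit-learn implementation of SVR; this is an assumption of this sketch (the paper does not state which software was used), and the training data below are placeholders. Note that scikit-learn parameterizes the RBF kernel by gamma = 1/(2σ²), so the width σ = 1.0 corresponds to gamma = 0.5; one model is fitted per response variable, as in the paper:

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder data: l x 5 inputs (W, T, E, nu, F) and two responses.
# The paper's actual training sets are those of Jiao et al. (2003).
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(30, 5))
y_rpd = rng.uniform(size=30)
y_vd = rng.uniform(size=30)

# RBF kernel with sigma = 1.0 -> gamma = 1 / (2 * sigma**2) = 0.5,
# together with C = 100 and epsilon = 0.1.
rpd_model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=0.5).fit(X_train, y_rpd)
vd_model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=0.5).fit(X_train, y_vd)

X_test = rng.uniform(size=(7, 5))
print(rpd_model.predict(X_test))
print(vd_model.predict(X_test))
```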

3.3 Performance criterion

In order to compare the predicted performance with that obtained from the FAN in Jiao et al. (2003), an evaluation criterion is needed. In this paper, the mean absolute error (MAE) is used to measure the deviation between the actual and the predicted values. MAE is defined as

MAE = \frac{1}{k} \sum_{i=1}^{k} |p_i - a_i|,  (12)

where p_i is the predicted value for the ith test case, a_i is the actual response value for the ith test case and k is the number of test data sets.
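Equation (12) amounts to averaging the absolute errors. As a check (a small sketch written for this discussion), averaging the SVR RPD absolute errors listed later in table 3 reproduces the MAE value reported in section 4:

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error of equation (12)."""
    return np.mean(np.abs(np.asarray(predicted) - np.asarray(actual)))

# Absolute errors |p_i - a_i| of the SVR RPD predictions, tests 1-7
# (from table 3); their mean is the MAE.
ae_svr_rpd = [0.0902, 0.1708, 0.23, 0.0278, 0.4776, 0.3091, 1.0646]
print(np.mean(ae_svr_rpd))  # 0.338586..., as reported in section 4
```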

4. Numerical results and comparison with the FAN

Using the adopted model, the SVR model was trained on the given training data sets. After training, the model was tested on the testing data sets, and the results were compared with those obtained with the FAN. The predicted values of RPD and VD are shown in figures 4 and 5, respectively: figure 4 plots the predicted RPD values against the experimental ones, and figure 5 is a similar plot for the VD values. These two figures show the results of both SVR and the FAN (Jiao et al. 2003). It can be seen that the results produced by SVR are much closer to the experimental data than those produced by the FAN. Thus, the SVR method outperforms the FAN method.

To analyse the results further, the absolute errors (AEs) for both the SVR and the FAN approaches are listed in table 3. It can be seen that all the AEs produced by SVR are smaller than those produced by the FAN. Using the MAE performance criterion defined by equation (12), the SVR MAE values of 0.338586 and 0.9444 for RPD and VD, respectively, are smaller than the corresponding FAN MAE values of 0.907286 and 1.971. These results indicate that the SRM-based SVR method possesses better generalization performance than the FAN method.

Figure 4. Comparison of relative peak displacement (RPD) by SVR and FAN.

It should be noted that, given limited training and testing data sets and unlike the neural-network-based FAN approach, the SVR method seldom encounters convergence problems. Compared with the FAN, SVR always converges rapidly and avoids local optima, owing to the global optimality of the solution of the quadratic programming problem.

Figure 5. Comparison of valley displacement (VD) by SVR and FAN.

Table 3. Predicted results (absolute errors, AE) using the SVR and FAN methods.

  Test No.  AE of FAN RPD  AE of SVR RPD  AE of FAN VD  AE of SVR VD
  1         0.296          0.0902         3.445         1.6583
  2         0.589          0.1708         2.178         0.5769
  3         0.71           0.23           2.998         2.2988
  4         0.847          0.0278         1.727         0.207
  5         0.768          0.4776         2.356         1.2676
  6         0.932          0.3091         0.684         0.3363
  7         2.209          1.0646         0.409         0.2659
  MAE       0.907286       0.338586       1.971         0.9444


5. Predicted influences

By using the results obtained from the training of the SVR model, the influences of the various input variables on the outputs can be predicted. These influences on the RPD and the VD are shown in figures 6–10. The simulated data, indicated by circles and squares, are also shown in the figures. As mentioned in section 3.1, a smaller RPD is preferred for waviness removal. For VD, although it does not influence waviness removal directly, a larger VD is not desirable for process control, since it causes difficulty in controlling the target wafer thickness of the grinding process (Xin et al. 2004). By fixing any four of the five parameters, the non-linear relationship between the one remaining input parameter and any one of the response variables can be obtained.

Figure 6. Influences of waviness wavelength on RPD and VD.

Figure 7. Influences of elastic modulus on RPD and VD.
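The one-factor-at-a-time curves of figures 6–10 can be generated from a trained model by exactly this fix-four-vary-one procedure. The sketch below assumes the hypothetical scikit-learn model from the section 3.2 sketch, with placeholder training data and illustrative fixed values; only the structure of the sweep reflects the paper:

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder training data (input order assumed: W, T, E, nu, F).
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(30, 5))
y_rpd = rng.uniform(size=30)
rpd_model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma=0.5).fit(X_train, y_rpd)

# Fix four of the five inputs at illustrative values and sweep the
# waviness wavelength W over the range spanned by table 2.
T, E, nu, F = 1.0, 55.0, 0.25, 30.0
W_grid = np.linspace(10.0, 16.6667, 50)
X_sweep = np.column_stack([W_grid,
                           np.full_like(W_grid, T),
                           np.full_like(W_grid, E),
                           np.full_like(W_grid, nu),
                           np.full_like(W_grid, F)])
rpd_curve = rpd_model.predict(X_sweep)  # predicted RPD as a function of W alone
```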

Figure 8. Influences of Poisson's ratio on RPD and VD.

5.1 Effects of waviness wavelength

Figure 6 illustrates the effects of waviness wavelength on RPD and VD. It can be observed that RPD increases rapidly with the waviness wavelength, while VD decreases gradually as the waviness wavelength increases. A shorter waviness wavelength is therefore more effective in eliminating the waviness in the grinding process, but increases the difficulty of process control. Consequently, in order to reduce the waviness, the wire-sawing operation should produce a shorter waviness wavelength. In addition, the effect of waviness wavelength on waviness reduction is stronger when the pad's elastic modulus is larger.

Figure 9. Influences of pad thickness on RPD and VD.

Figure 10. Influences of grinding force on RPD and VD.

5.2 Effects of the pad’s elastic modulus

The influence of the pad’s elastic modulus on RPD and VD is shown in figure 7.Figure 7(a) shows that RPD gradually increases with the pad’s elastic modulus,and thus it is easier to remove the waviness if a smaller elastic modulus is used.Figure 7(b) indicates that VD decreases gradually with the increase of elasticmodulus when the elastic modulus is larger than 20 MPa, but decreases rapidly whenit is less than 20 MPa; a larger elastic modulus is desired for process control.

5.3 Effects of the pad’s Poisson’s ratio

It can be seen from figure 8 that, with an increase in the pad's Poisson's ratio, RPD increases slowly while VD decreases rapidly. A pad with a smaller Poisson's ratio should therefore be used from the standpoint of removing waviness. However, a smaller Poisson's ratio will also make it more difficult to control the target wafer thickness in the grinding process. Consequently, an optimal pad Poisson's ratio is needed in order to obtain both a small RPD and a controllable VD.

5.4 Effects of pad thickness

RPD decreases gradually as the pad thickness increases, which leads to improved waviness reduction (see figure 9(a)). But VD increases fairly rapidly as the pad becomes thicker (see figure 9(b)). To balance this conflict, an optimal pad thickness should be determined.

5.5 Effects of grinding force

Figure 10 demonstrates that, at a high waviness wavelength, both RPD and VD increase fairly rapidly with the grinding force, whereas the rate of increase is fairly slow at a low waviness wavelength. Therefore, the grinding force should be lowered as much as possible in order to remove the waviness and to make control of the process easier. In addition, the effect of grinding force on waviness reduction is much stronger when the waviness wavelength is large.

6. Discussion

The results in this paper show that SVR is an effective approach for the modelling and analysis of the waviness removal process, even with very limited data. The same process with the same data was investigated using the FAN in an earlier paper (Jiao et al. 2003). One problem in that earlier investigation was slow convergence; another was the presence of different local optima. The SVR approach has no convergence or local-optimum problems, owing to the use of quadratic programming to solve a convex programming problem. Furthermore, as the results show, SVR achieves higher accuracy than the FAN. Thus, as expected, SVR has better generalization performance than the FAN.


Support vector regression achieves better performance than earlier learning approaches such as the FAN and the various neural network algorithms. This is because fundamental theories, namely statistical learning theory and non-linear programming theory, are embodied in the method. The embodiment of non-linear (quadratic) programming in the method is made possible by the clever use of the kernel function.

Acknowledgements

This study was supported by the National Science Foundation through Career Award DMI-0348290 and Grant DMI-0218237, and by the Advanced Manufacturing Institute of Kansas State University.

References

Bazaraa, M. and Shetty, C., Nonlinear Programming: Theory and Algorithms, 1993 (Wiley: New York).

Cheng, C.B., Nonparametric fuzzy regression and its applications. PhD dissertation, Kansas State University, Manhattan, Kansas, USA, 2000.

Cheng, C.B. and Lee, E.S., Switching regression analysis by fuzzy adaptive network. Eur. J. Oper. Res., 2001, 128, 647–663.

Cherkassky, V. and Ma, Y., Practical selection of SVM parameters and noise estimation for SVM regression. Neur. Net., 2004, 17, 113–126.

Jiao, Y., Wu, J., Lei, S.T., Pei, Z.J. and Lee, E.S., Application of fuzzy adaptive networks in manufacturing: waviness removal in grinding of wire-sawn silicon wafers. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition (IMECE), Washington, DC, 16–21 November 2003.

Kassir, S.M. and Walsh, T.A., Grinding process and apparatus for planarizing sawed wafers. US Patent 5,964,646, 1999.

Kecman, V., Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models, 2001 (MIT Press: Cambridge, MA).

Pei, Z.J., Kassir, S., Bhagavat, M. and Fisher, G.R., An experimental investigation into soft-pad grinding of wire-sawn silicon wafers. Int. J. Mach. Tools Manuf., 2004, 44, 297–304.

Pei, Z.J., Xin, X.J. and Liu, W.J., Finite element analysis for grinding of wire-sawn silicon wafers: a designed experiment. Int. J. Mach. Tools Manuf., 2003, 43, 7–16.

Sun, X.K., Pei, Z.J., Xin, X.J. and Fouts, M., Waviness removal in grinding of wire-sawn silicon wafers: 3D finite element analysis with designed experiments. Int. J. Mach. Tools Manuf., 2004, 44, 1–19.

Van Zant, P., Microchip Fabrication: A Practical Guide to Semiconductor Processing, 4th ed., 2000 (McGraw-Hill: New York).

Vandamme, R., Xin, Y. and Pei, Z.J., Method of processing semiconductor wafers. US Patent 6,114,245, 2000.

Vapnik, V.N., The Nature of Statistical Learning Theory, 1995 (Springer: New York).

Vapnik, V.N., An overview of statistical learning theory. IEEE Trans. Neur. Net., 1999, 10, 988–999.

Wu, C., Wei, C., Su, D. and Chang, M., Travel time prediction with support vector regression. IEEE Intell. Transport. Sys., 2003, 2, 1438–1442.

Wu, J., Sun, X.K., Pei, Z.J. and Xin, X.J., Soft-pad grinding of wire-sawn silicon wafers: finite element analysis using 2^5 factorial design. In Proceedings of the 12th Annual Industrial Engineering Research Conference (IERC-2003), Portland, OR, 18–20 May 2003 (CD-ROM).

Xin, X.J., Liu, W.J. and Pei, Z.J., Modeling of waviness reduction in silicon wafer grinding by finite element method. In Proceedings of the International Conference on Modeling and Analysis of Semiconductor Manufacturing (MASM 2002), Tempe, AZ, 10–12 April 2002, pp. 24–29.

Xin, X.J., Pei, Z.J. and Liu, W.J., Finite element analysis on soft-pad grinding of wire-sawn silicon wafers. J. Electron. Packag., 2004, 126, 177–185.
