Defect Prediction With Neural Networks

Robert L. Stites, Bryan Ward*, and Robert V. Walters**

*IKONIX Inc. and **qSTR Corp.

The industrial and scientific world abounds with problems that are poorly understood or for which apparently anomalous conditions exist. Artificial Neural Networks are utilized with conventional techniques to extract salient features and relationships which are non-linear in nature. Defect causality in a large continuous flow chemical process is investigated. Significant gains in the prediction of defects over traditional statistical methods are achieved.

INTRODUCTION

A chemical processing plant experienced unexplained fluctuations in defect measurements for a multi-stage continuous liquid flow system with significant mixing time lags. Repeated attempts to utilize traditional statistical methods were largely unsuccessful in the determination of any causal relationships.

The ultimate system objective is the control of quality in order to maximize the economic potential of the plant process. A precursor to the development of any control system is the development of a model of the plant operating characteristics. The focus of the effort described herein is twofold:

- develop a model capable of more accurate prediction of defect measurements, and
- identification of causal relationships within the process.

A model can be developed from either known first principles or derived from empirical data. In this particular case, the latter method was required. At a later time and with an adequate model, corrective actions may be taken and a better control system for the process can be developed.

The process is continuous flow: raw materials are entered into the system, material flows through several stages, each with high capacitance, and exits as finished product on a continuous basis. The total lag through the system is on the order of a day. This system operates around the clock, three hundred sixty-five days per year.

Data are available from a data acquisition system, quality control records, and handwritten maintenance records. Lapses in recording of data and modification of the data prior to recording are known to be problems.

A description of a number of aspects of the chemical process and the recording techniques is provided. Minimal detailed information on the process itself has been provided to the neural network development team. These descriptions are critical to the understanding of the difficulties in developing any method of prediction.

The central effort is to ascertain if it is feasible for an artificial neural network to learn to associate a given defect rate with only the observed conditions in the plant.

A neural network is developed which produces better than an order of magnitude improvement in prediction of the defect rates over traditional statistical methods. Resulting predictions do not precisely match the actual statistics but are capable of explaining a significant portion of the defects.

The neural network provides a view into a unique combination of the use of production factors, physical sensors, and human factors in the prediction of defect rates.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1991 ACM 0-89791-432-5/91/0005-0199 $1.50

The process includes a number of stages involving large mixing and processing at each stage, each with large capacitances. In a continuous flow process, raw materials continuously enter into the system and the finished product exits on a continuous basis. No batches are involved. This system operates twenty-four hours a day, three hundred sixty-five days per year.

Figure 1 - Generalized schematic of the continuous flow chemical process. [Raw materials enter, pass through mixing Stages 1 through N, and exit as finished product.]

Due to the proprietary nature of the process involved, only minimal information about the process is made available to the ANN team. This essentially prohibited the use of known relationships in the design of the network. Available data recorded by the acquisition portion of the control system, records of quality control measurements, and various operating parameters for the plant are provided.

Quality Control

Quality control measurements are based upon a number of factors with one measurement selected as most representative of overall quality for this effort. This quantitative laboratory measurement requires an extended period of time to evaluate and is sampled at regular intervals of two time units. All samples are time tagged for alignment with sensor data.

Sensors

Sensors for the existing control system are located throughout the process. Over four hundred sensors are provided for all standard types of measurements:

- temperature,
- humidity,
- pressure, etc.

Redundancy and computational constraints are used to reduce the number of sensors to just over 100. Approximate time offsets of the sensors relative to the quality control sample location are estimated based upon limited measurements. Due to the continuous mixing of the product during the various stages, actual arrival times of the product at the quality control sample point follow some undetermined distribution. This can best be illustrated by considering a situation in which a large number of colored balls are simultaneously added to the process at the same point or location in the process. Depending on the number of mixing stages involved, they will become spread out over the remaining length of the process. If the time of arrival of the balls is noted at the quality control station, it can be seen that all of the balls will not arrive at the same time but will follow some distribution. Since no information was available about the distribution of the lag times of the product through the process, singular arrival times are utilized for alignment.

Data Acquisition and Recording System

The acquisition system acquires data at a rate many times the basic recording rate for control purposes. The data are averaged over the interval of the recording time unit with the average value recorded with a time tag associated with the end of the interval. An immediate effect of this averaging over the recording interval is that the Nyquist frequency is a variable between two and four recording time units. In addition, the non-linear filter effect of the average operation leaves uncertain the power levels associated with the higher frequencies.

Loss of data from the acquisition system occurs frequently. At times the loss is for all data for a particular time interval and at others it may be only for individual sensors. Time alignment of the sensor data to the quality control defect data indicates that a substantial number of the data do not have a complete set of sensor data to be utilized as input.

Maintenance Operations

Routine maintenance operations are performed regularly with no start-up or shutdown sequences involved during the recorded data acquisition period. Time of maintenance occurrence is poorly, if at all, recorded. Most of the maintenance functions are estimated to be of short duration, on the order of one quarter to one half recording time unit. It is considered unlikely that the impact of maintenance is reflected directly in any of the sensor data.

Human Factors

Information about team organization, shift assignments, and quality control assignments is provided. Initial information from the traditional statistical efforts indicates that these human factors have been eliminated as having any causal effect on the defect rate. A major finding of the study is the critical importance of these human factors.




Data are recorded over a long period onto magnetic media and in written form. The sensor data is time aligned to the defect quality measurements and the handwritten data is added. Data dropouts are noted with special out-of-bounds values.

Due to the extremely large number of data dropouts, it is determined that all data should be treated as discrete data samples rather than as a time series in training the network. Standard statistical values are ascertained on all of the sensors and the defect data.

The sensor and quality data are evaluated utilizing:

- auto and cross covariance,
- Fourier analysis, and
- spectral correlation.

A number of the sensors are eliminated as redundant information, and some specific filtering of the data is indicated.

Figure 2 - Power spectral estimate. Note the peak at the 24.98 unit frequency.

Similarities exist between sensors and the quality data in the frequency domain. Spectral evaluation of both the defect data and the input data reveals that a number of periodicities exist in the data. In addition, three dominant frequency groupings permeate all of the sensor and quality data, in differing combinations. These frequencies of 227, 24.98, and 11.77 time units are not precise but rather fuzzy in nature, centered at these frequencies. The most dominant, 24.98 time units, is illustrated in figure 2. The power level of this period permeates almost all of the sensors and the defect measurements. The cepstrum also contains a dominant periodicity at the same interval. Interestingly, the relative significance of these frequencies is diminished or enhanced if different segments of the data set are used.

Filtering

The sensor and the defect data are low pass filtered, using an eight pole Butterworth filter, to eliminate the high frequency content. Prior to filtering, missing data is represented by the interpolated value between the two adjacent known samples. While some ringing is observed in the output of the filter at these missing portions of the data, they are re-marked as missing data and the remaining data is minimally impacted.

Weigend, et al. [90], among others, note that the network will first attempt to fit the strongest features and then later attempt to fit the noise. The filtering of the data should hopefully increase the rate of convergence to a solution by the network.
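The interpolate/filter/re-mark pipeline above can be sketched as follows. This is a minimal sketch assuming SciPy is available; the cutoff frequency, the sentinel value, and the test signal are illustrative assumptions, not parameters from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

MISSING = -9999.0  # illustrative out-of-bounds dropout marker

def lowpass_with_gaps(x, cutoff=0.1, order=8):
    """Interpolate dropouts, apply an 8-pole Butterworth low-pass
    (zero-phase), then re-mark the interpolated samples as missing."""
    x = np.asarray(x, dtype=float)
    gaps = x == MISSING
    idx = np.arange(len(x))
    filled = x.copy()
    # linear interpolation between the two adjacent known samples
    filled[gaps] = np.interp(idx[gaps], idx[~gaps], x[~gaps])
    sos = butter(order, cutoff, output="sos")  # cutoff relative to Nyquist
    y = sosfiltfilt(sos, filled)
    y[gaps] = MISSING                          # re-mark as missing
    return y

t = np.arange(400)
sig = np.sin(2 * np.pi * t / 50) + 0.3 * np.sin(2 * np.pi * t / 3)
sig[100:103] = MISSING                         # a short dropout
out = lowpass_with_gaps(sig)
```

Second-order-sections form is used here because a direct 8-pole transfer function can be numerically fragile; `sosfiltfilt` also gives zero phase, so the filtered data stays time aligned with the QC samples.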

NEURAL NETWORK DESIGN

Time and funding constraints in conjunction with the complexity of the problem suggest that the use of a back error propagation approach is the best candidate artificial neural network paradigm to apply to the problem. Knowledge in an artificial neural network is represented in two basic ways, structure and interconnection strength.

Feed Forward Network

Based on experience with other problems with similar complexities, a network of the general structure depicted in figure 3 is utilized. The network with two hidden layers is not fully interconnected. Grouping of the network nodes into subnetworks is based upon known spatial, temporal, and functional relationships within the network elements. Functional relationships are based primarily upon what little is known about the stage in the process and the units of particular sensors. Additional process factors such as shift and quality control personnel are also included and grouped into subnetworks.

The neuronal, or processing, element utilized in this network is one in which the inputs are the product of the output of some other neuronal element multiplied by a weight, w_ij, which is unique for each connection. These inputs are summed and the sum processed by a function. The output of a neuronal element can be expressed as

o_pi = f_i( Σ_j w_ij o_pj )    (1)

where f_i is a sigmoidal function, in this case the logistic function,



Figure 3 - Example artificial neural network illustrating clusters of input units connected to multiple hidden layers. [Groups of inputs and individual inputs feed Hidden Layer 1, then Hidden Layer 2, then the output layer.]

Table I - Number of neurons in network.

    Neural Network Structure
    Input Units       113
    Hidden Layer 1     22
    Hidden Layer 2      8
    Output Layer        1

f(a) = 1 / (1 + e^(-a))    (2)

Connections to a bias, or constant, unit and the weight for that connection are not explicitly shown. Bias unit connections are treated in the same manner as a connection to any other unit.
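Equations (1) and (2), together with the layer sizes from Table I, imply a forward pass of the following shape. This sketch assumes full interconnection for brevity, whereas the paper's network is only partially connected with inputs grouped into subnetworks; the random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [113, 22, 8, 1]               # layer sizes from Table I

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))   # equation (2)

# One (weights, bias) pair per layer; the bias unit is treated like any
# other connection, here folded into a separate vector for convenience.
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    o = np.asarray(x, dtype=float)
    for W, b in layers:
        o = logistic(W @ o + b)       # equation (1) at each layer
    return o

y = forward(rng.normal(size=113))     # a single defect-rate prediction
```

The logistic output confines the prediction to (0, 1), so in practice the defect measurement would be scaled into that range before training.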

NETWORK TRAINING

The data are divided into two segments, the first utilized for training of the network and the second for testing. The data are presented to the network and a comparison of the defect prediction from the network is made with the observed quality control measurement. The difference is then used to propagate the error back through the various elements and layers of the network.

An iterative improvement in the network structure is planned through evaluation of the performance of the network and some reconfiguration or elimination of processing elements and interconnections.

Interconnection strengths are modified during the learning process in order to minimize the sum of the squared error during the learning period. Weights are modified after each pattern presentation rather than at the completion of one complete epoch. Experience also suggests that as the network appears to approach its convergence limit, some random perturbation in the training sequence may allow a further reduction in the error.

Back Error Propagation

Perhaps the most widely utilized neural network paradigm is that of Back Error Propagation, first used by Werbos [74] and later published by Le Cun [85], Parker [85], and Rumelhart, et al. [86]. For the purposes of this effort, the nomenclature of Rumelhart [86] is used. The basic objective is to minimize the sum of the squared error (target, t_pi, less the output, o_pi) over a series of pattern, p, presentations.

E = Σ_p Σ_i (t_pi - o_pi)²    (3)

The error is then distributed across the network in a manner which allows the interconnection weights, w_ij, to be modified according to the following rule at time n+1:

Δw_ij(n+1) = α Δw_ij(n) + η δ_pi o_pj    (4)

where, using the function in equation 2,

δ_pi = (t_pi - o_pi) o_pi (1 - o_pi)    (5)

represents the amount of error attributable for an output unit; equation (6) gives the corresponding term for any hidden layer units.
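One per-pattern update under equations (4) and (5) can be sketched as below, assuming a single logistic output unit fed directly by the inputs; the four-element pattern and target are illustrative, and α and η take the starting values quoted in equations (7) and (8).

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, eta = 0.7, 0.5                    # equations (7) and (8)

W = rng.normal(scale=0.1, size=(1, 4))   # one output unit, four inputs
dW_prev = np.zeros_like(W)               # previous update, for momentum

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))      # equation (2)

x = rng.normal(size=4)                   # one input pattern
t = np.array([0.8])                      # scaled target defect rate

o = logistic(W @ x)                      # forward pass, equation (1)
delta = (t - o) * o * (1.0 - o)          # output-unit error, equation (5)
dW = alpha * dW_prev + eta * np.outer(delta, x)   # equation (4)
W = W + dW                               # weights change after each pattern
```

As the text notes, the weights are updated after every pattern presentation rather than once per epoch, with the momentum term smoothing the sequence of per-pattern steps.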



Figure 4 - Basic neuron element illustrating the dependence on the weighted sum of the output from a prior layer.

For any hidden layer units,

δ_pi = o_pi (1 - o_pi) Σ_k δ_pk w_ki    (6)

The proportion of variance, used below as a performance metric, is defined as

POV = coefficient of determination × 100    (9)

The momentum term, α, provides a smoothing of the weight modification over the entire epoch. Note that the weights are actually modified after the presentation of each pattern, p. The learning rate, η, is used to determine the amount that each weight is modified. For this problem, it was determined that

α = 0.7    (7)

and

η = 0.5    (8)

were reasonable starting values. The momentum term and learning rate were both reduced as the squared error appeared to approach a minimum.

Metric

Several metrics are useful in the evaluation of the performance of the network during training and testing. The most frequently utilized metric for backpropagation is the sum of the mean squared error (eq. 3) and some variance function. It has been noted by many researchers that, in some cases, the reduction of the mean square error is not sufficient to adequately control the learning of the neural network. In this particular case the coefficient of determination (or, expressed as a percentage, the proportion of the variance accounted for, POV) is found to be the most useful.

coefficient of determination = |correlation coefficient|²    (10)

As the term proportion of variance would suggest, this metric is a measure of that portion of the variance from the mean of the target signal that is explained by the output of the neural network. This metric is also available, through the correlation coefficient, for the results of the traditional statistical approach pursued previously.

Pattern Presentation

The order of presentation of the patterns can sometimes result in the network learning the order. In this case, however, this problem did not occur. A sequential order was used, except that as the squared error appeared to approach an asymptote, one or two random updates would be used. This perturbation to the network would briefly increase the squared error but would allow the network to rapidly settle to a lower squared error than that initially observed.

Convergence Criteria

Convergence of the network is monitored in two terms. The mean squared error is monitored to determine that no change, above a given threshold, has occurred during an epoch. The second term monitored is the Proportion of Variance. It is desirable to ensure that this term does not begin to decrease steadily over several epochs, while at the same time the mean square error is still decreasing.

Training Duration

The training period for the network was 19,600 presentations, each with the training set consisting of 1250 examples. At the completion of each epoch, the predictions and targets are used to develop the Proportion of Variance value for the network at that point.
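The POV metric of equations (9) and (10) reduces to a few lines; the synthetic target and prediction below are illustrative stand-ins for the epoch-end predictions and targets.

```python
import numpy as np

def pov(target, prediction):
    """Proportion of variance: squared correlation coefficient between
    target and prediction, expressed as a percentage (eqs. 9 and 10)."""
    r = np.corrcoef(target, prediction)[0, 1]
    return 100.0 * r * r

rng = np.random.default_rng(2)
target = rng.normal(size=200)
partial = target + rng.normal(size=200)  # explains only part of the variance
```

A perfect prediction gives POV = 100, and a prediction uncorrelated with the target gives POV near 0, which is what makes it a natural complement to the mean squared error when monitoring convergence.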

NETWORK VARIATION USING RECENT OBSERVED DEFECTS

Lapedes and Farber [87] and Weigend, et al. [90] have demonstrated that a neural network is capable of extraction of complex relationships from data in a noisy environment



Figure 5 - Neural Network Prediction of Defect Rate for Test Data #2, POV = 27.2%

Figure 6 - Neural Network Target Defect Rate for Test Data #2.

Figure 7 - Neural Network Prediction of Defect Rate with One Prior Observation, POV = 81.2%

Figure 8 - Neural Network Target Defect Rate for Training Data.

through the use of past observations. Although much of the noise had already been removed by the filter discussed in an earlier section, a decision was made to observe the performance of the network when prior observations of the defect rate were utilized as input. These past observations were in addition to those inputs already in use by the network. Trials with from one to five of the most recent observations of defect rates were utilized as inputs. The highest POV values were obtained by conducting the training in two stages. First, the training of the network described in the body of the paper is completed. The outputs of this network are then utilized, along with the prior observed defects, as inputs to a smaller network. The results of this network variation are included in Table III for completeness with some discussion of their implication in the following paragraphs.



RESULTS

The neural network produced predictions of the defect rate for the chemical process that are ten to twenty times better than traditional techniques. This estimate is based upon the metric of the POV or the coefficient of determination. Independent estimates of the POV by traditional techniques had only been performed for the training data set and were not performed for the test data set. The known and developed results are portrayed in Table II.

Table II - Results

                               Proportion of Variance
                               Training Data   Test Data
    Traditional Statistical    2.0%            -
    Artificial Neural Network  47.6%           43.2%, 27.2%

Generalization in the network is apparent from the high POV values attained on the test data. The test data first evaluated yielded the higher of the values listed in the table. This result was so high that another section of data was selected for testing. That data is the lower listed.

The results with the network utilizing the prior observed predictions were better than expected. The use of a single prior observation as input should yield a higher POV value, as this value, in the absence of noise and any other inputs, will usually be closer to the target value than the mean.

It is interesting to note that, while the networks with one through four prior defect observations yielded POVs in the eighty percent range, the POV for the network which utilized five prior observations achieved a dramatic additional ten percent increase. This would seem to indicate that some relationship had been discovered by the network with the addition of the fifth prior observation. The strength of the weights associated with that input would seem to bear out this conclusion. Further investigation of this result is anticipated.

CAUSAL INFERENCE

The predictive results are significant but perhaps even more profound is the result of the efforts to determine causality in the process. A review of the relative weights in the

Table III - Results of using prior observed defects as input.

    Proportion of Variance Using Prior Observations
                               Training Data
    One Prior Observation      81.2%
    Two Prior Observations     82.1%
    Three Prior Observations   83.6%
    Four Prior Observations    86.7%
    Five Prior Observations    97.6%

interconnections between the processing elements allows some evaluation of the relative importance of particular inputs. This work was performed by manual inspection of the weight structure at given output levels from the network. Although this is tedious and subject to error, it was immediately apparent that approximately seventy-five percent of the impact on the prediction was the result of about one quarter of the inputs.

Keeping in mind that only about twenty-five to fifty percent of the variance from the mean is explained in the artificial neural network model, the dominant portion of cause is attributed to human factors, i.e. personnel, shift, day of shift, etc.

Considerations

Why do traditional techniques not achieve the same results? Several factors are apparent in the data. The first is the high degree of non-linearity that exists within the system. Examination of the weight structure for the bias and input connections, over the input range, suggests that the non-linear range of the sigmoidal function is utilized. The second is the discontinuous nature of the solution space as a function of many of the factors involved. This can be observed in the case of a shift team scheduled in proximity to another specific team, but not to any other. This is, in essence, analogous to the exclusive OR problem.

Can the results with the artificial neural network be improved? There are a number of improvements that can be noted immediately by improving data quality and integrity.
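The manual weight inspection used for the causal inference might be roughly approximated programmatically. The path-strength heuristic below is an assumption of ours, not the paper's procedure (which was manual); the random weight matrices simply reuse the Table I layer sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
W1 = rng.normal(size=(22, 113))   # input -> hidden layer 1
W2 = rng.normal(size=(8, 22))     # hidden layer 1 -> hidden layer 2
W3 = rng.normal(size=(1, 8))      # hidden layer 2 -> output

# For each input, sum the absolute weights along all paths to the output:
# |W3| · |W2| · |W1| yields one non-negative score per input unit.
importance = (np.abs(W3) @ np.abs(W2) @ np.abs(W1)).ravel()

top = np.argsort(importance)[::-1][:28]    # roughly one quarter of inputs
share = importance[top].sum() / importance.sum()
```

On a trained network, `share` would quantify the observation that about a quarter of the inputs carried roughly three quarters of the impact on the prediction.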



First, the sampling of the input data can be improved. An increased sampling rate, improved reliability of recording, and replacement of the averaging process will all improve the quality of the input data. If a filter is to be used on the sensor data, it should be carefully selected. Calibration of sensors and quality control procedures may also eliminate many of the fluctuations which occur between various personnel combinations.

The design of the neural network architecture can be improved by the use of demonstrated relationships to reduce the components (neurons, connections, and weights) in the structure. The improved data quality may allow the use of the data without filtering the source data. Techniques such as those which utilize prior defect observations may provide additional benefits.

CONCLUSION

Artificial Neural Networks can be a powerful tool in the identification of salient features in processes where traditional techniques do not perform well on their own. The artificial neural network is not a rejection of those techniques but rather an enhancement of the tool set available to the investigator, whether in the industrial or the biotechnology arena.

REFERENCES

Bavarian, Behnam, Introduction to Neural Networks for Automatic Control, IEEE Control Systems Magazine, April 1988.

Guez, Allen, Eilbert, and Kam, Neural Network Architecture for Control, IEEE Control Systems Magazine, April 1988.

Lapedes, A. S., and Farber, R. M., Nonlinear signal processing using neural networks: prediction and system modeling, TR LA-UR-87-2662, Los Alamos National Laboratory, 1987.

Le Cun, Y., Une procédure d'apprentissage pour réseau à seuil asymétrique, in Proc. Cognitiva 85, Paris, June 1985, pp. 599-604.

Parker, D. B., Learning-logic, MIT Center for Computational Research in Economics and Management Science, Cambridge, MA, TR-47, 1985.

Rumelhart, D., Hinton, G., and Williams, R., Learning internal representations by error propagation, in Rumelhart, D. and McClelland, J., eds., Parallel Distributed Processing, Vol. 1, Cambridge, MA: MIT Press, 1986.

Weigend, A. S., Huberman, B. A., and Rumelhart, D. E., Predicting the Future: A Connectionist Approach, submitted to the International Journal of Neural Systems, April 1990.

Werbos, P. J., Beyond regression: New tools for prediction and analysis in the behavioral sciences, Ph.D. dissertation, Comm. on Appl. Math., Harvard University, Cambridge, MA, Nov. 1974.
