Parallel Simulated Annealing for Stochastic Reservoir Modeling



SPE 26418

Parallel Simulated Annealing for Stochastic Reservoir Modeling

M.N. Panda and L.W. Lake, U. of Texas, SPE Members

Copyright 1993, Society of Petroleum Engineers, Inc. This paper was prepared for presentation at the 68th Annual Technical Conference and Exhibition of the Society of Petroleum Engineers held in Houston, Texas, 3-6 October 1993.

ABSTRACT

Simulated annealing (SA) techniques have shown great potential to generate geologically realistic permeability fields by combining data from many sources, such as well logs, cores, and tracer tests. However, the application of SA in reservoir description and simulation is limited owing to its prohibitively large computational time requirement, even on modern supercomputers.

This paper introduces an implementation of a parallel SA algorithm for stochastic reservoir modeling on a Hypercube processor network (Intel iPSC 860). The corresponding sequential code, which incorporates univariate and bivariate statistics to generate the permeability field, is optimized to gain maximum advantage from the parallel application. In this particular parallel implementation, each processor runs the same source code asynchronously using the single-instruction-multiple-data (SIMD) approach, with synchronization linked to an optimality test. By porting the SA algorithm to a parallel computer we can generate permeability fields that represent non-Gaussian bivariate statistics and complex flow geometry at a finer scale than was previously feasible.

INTRODUCTION

Inverse modeling, a technology which includes the simulated annealing (SA) algorithms, is a systematic procedure to estimate the constitutive properties of a physical system based on a few experimental observations.

A physical system can be thought of as a portion of the universe delineated by physical or mathematical boundaries, such as the earth for a geophysicist, an underground reservoir for a petroleum engineer, or a quantum particle for a quantum physicist. The set of physical parameters describing such systems depends on the specific models used. For instance, a geophysicist characterizing the earth's mantle might use the elastic properties of the solids as the parameters, whereas a petroleum engineer requires permeability and porosity distributions to characterize the reservoir rocks. To investigate such physical systems with inverse modeling we use experimental observations to infer the actual values of the model parameters. The most general way to accomplish this is by assigning probabilities to all the possible values of the model parameters. It follows then that the measurement of data, the a priori information, and the physical correlation between experimental observations and model parameters can be described by using probability densities. Simulated annealing algorithms offer solutions to large, often complex, inverse problems by determining such probability densities in a fully nonlinear way.

Simulated annealing (SA) algorithms, which belong to a subclass of the general inverse problems, have seen a number of engineering applications: statistical mechanics, image analysis,1 artificial intelligence,2 optimization in groundwater management,3 seismic inversion,4,5 and reservoir modeling.6-10 SA techniques have shown great promise in obtaining integrated reservoir descriptions because they can combine data from several sources, such as cores, well logs, seismic traces, and interwell tracer flows.

(References and illustrations at end of paper.)


The advantage of SA over traditional stochastic techniques is its ability to incorporate effective properties derived from integrated measures. The disadvantage of SA lies in the large computational (CPU) time required for a reasonable convergence on single-processor machines. However, since SA is principally a variation of the Monte Carlo simulation technique, it is usually well suited to parallelization.

In this paper we describe the parallelization of the so-called heat bath algorithm4 on the Intel iPSC 860 Hypercube computer. Our results show that, when the problem size is large, a considerable degree of speed-up is gained by using multiple processors. The efficiency of using multiple processors also increases with the size of the problem because of the reduction in the ratio of communication to computation time.

SIMULATED ANNEALING - APPLICATION TO INVERSE PROBLEMS

Consider the generation of a permeability field on a specific grid as an example of inverse problems. This permeability field should match experimental observations, such as core data, variograms, tracer flow history, etc.

Traditional methods used to solve such problems in petroleum engineering include type-curve matching,7 numerical simulation,8 and spectral conditioning and matrix decomposition methods.9 These methods suffer from numerous drawbacks: the type-curve matching techniques are based on very simple mathematical models and hence yield non-unique results. The stochastic simulation methods often assume that the distribution of permeability is stationary and Gaussian. In addition, generating a stochastic field that matches observations from a tracer test or a pressure transient test requires a large number of random realizations, sampled exhaustively, until a desired match is obtained. An exhaustive search of all the possible realizations is computationally redundant and prohibitive.

However, unlike the traditional methods, which tend to exhaustively search through all possible realizations (also called the configuration space, denoted by E), SA searches through only a portion of it (see Fig. 1). In addition, unlike the traditional methods, where the selection of any realization (also called a state or a configuration) is an independent event, in SA it depends on its immediate previous neighbor. Thus, SA can be thought of as a one-step Markov process in the configuration space. The principle of SA involves moving between neighboring states within E. At each step, when a state is visited, an objective function, the weighted sum of squared differences between the experimental and computed attributes, is evaluated. Mathematically, the objective function is a mapping from E onto the real line, e: E -> R, and the sequential SA on E generates a random sequence (Xn) in E of configurations that marches toward the desired convergence as the number of selections n -> infinity. This is illustrated in Fig. 1. The transition probability between any two states, which is also termed the Gibbs probability function, depends strongly on the differences between the values of the objective functions of these states.

The transition probability between the states Xn and Xn+1 can be expressed as

$$p_n(\Delta e) = \exp\!\left[-\frac{e_{n+1} - e_n}{T_n}\right] \qquad (1)$$

where Tn is a temperature-like function and the series (Tn) is a set of monotonically decreasing positive numbers called the cooling schedule. Following Eq. (1) it is easy to show that the sequence (Xn) attains global convergence as Tn -> 0. This is because (Xn) is a Markov process and the transition probability pn between neighboring configurations in (Xn) forms a Gaussian random variable ranging between 0 and 1 with a mean value of 0.5. Now, if we equate Eq. (1) to 0.5, then, in an average sense, the ratio (en+1 - en)/Tn will always be less than unity. As a result, (en+1 - en) will approach 0 as Tn -> 0, forcing (Xn) toward the desirable convergence.

There are two broad classes of SA algorithms that are commonly used for reservoir characterization:6,10,11 (1) the Metropolis algorithm,12 and (2) the heat bath algorithm.4 The Metropolis method for SA, although simple and effective, can be computationally prohibitive for large-scale reservoir engineering problems because of the huge number of rejection moves it makes, especially at low temperature. For this reason, in this paper we focus our attention on the alternative method, the heat bath algorithm (HBA), which is better suited for reservoir description. Ouenes et al.11 and Datta Gupta6 describe the application of the Metropolis algorithm for reservoir description, and Azencott13 gives an extensive review of parallel Metropolis algorithms.
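To make Eq. (1) and the role of the cooling schedule concrete, here is a minimal sketch (ours, not the paper's code; the 0.8 decay factor is taken from the example cooling schedule quoted in the algorithm steps below):

```python
import math

def transition_probability(e_old, e_new, temperature):
    """Eq. (1): Gibbs transition probability between two configurations."""
    return math.exp(-(e_new - e_old) / temperature)

# Geometric cooling schedule, Tn = (0.8)**n * T0; T0 is problem-dependent.
T0 = 1.0
for n in range(6):
    Tn = (0.8 ** n) * T0
    # A fixed uphill move (e_{n+1} - e_n = 0.1) becomes ever less likely as
    # Tn decreases, which is what drives (Xn) toward convergence as Tn -> 0.
    print(n, round(Tn, 3), round(transition_probability(0.0, 0.1, Tn), 3))
```

In practice the probability is capped at one for downhill moves (the Metropolis rule) or normalized over all candidate values (the heat bath rule described next).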

In the next section we present a brief description of the sequential HBA, followed by two parallelization schemes. We also compare the efficiency of these schemes.

There are five important components in any SA algorithm:4
- A concise representation/data structure.
- A scalar objective function, e, which expresses the objectives of the optimization as a single number and also assigns weights among multiple objectives.
- A procedure for generating random changes.
- A control parameter T and an annealing schedule, (Tn).
- A convergence criterion.

What makes the HBA more attractive than the Metropolis algorithm, especially for reservoir characterization, is that annealing all the sites in every iteration tends to obtain the most general solution with the minimum number of moves or perturbations.

Algorithm - For the generation of stochastic fields using the HBA, the various steps can be summarized as follows (a code sketch of these steps appears after the Objective Function discussion below):

1. Select an initial state. Usually this is a distribution of permeability, ki, on a desired grid, sampled randomly from a specified distribution based on experimental data.



2. Begin annealing. Start at the first gridblock and let the permeability of this block assume M possible values of ki, where i = 0, ..., M-1, randomly selected from the experimental distribution. Here M is a number conveniently chosen for computational efficiency. Calculate the corresponding objective functions, e(ki). Choose a new value of permeability for the block by drawing at random from the following distribution:

$$P(k_i) = \frac{\exp\left(-e(k_i)/T\right)}{\sum_{j=0}^{M-1} \exp\left(-e(k_j)/T\right)} \qquad (2)$$

Permeability values that reduce the objective function e(ki) are generally chosen because they give rise to large probabilities in Eq. (2).

3. Visit sequentially all other gridblocks and update the permeability as in step 2. An iteration is complete when all the gridblocks have been visited.

4. On completion of an iteration, lower the temperature T according to a specified cooling schedule, for example, T = (0.8)^n T0, where n is the iteration number and T0 is the initial temperature.

5. Return to step 2 and continue until a suitably defined convergence criterion is satisfied.

Objective Function - Stochastic permeability fields can be generated using the HBA by posing this task as an optimization problem. We assume that permeability is a spatially related random variable. The average properties of such a field are generally obtained from core and log data, and the autocorrelation structure is defined by variograms. We assume that the variogram, γ(h), depends on the separation distance h only,

$$2\gamma(h) = E\left[\left\{z(x) - z(x+h)\right\}^2\right],$$

where z(x) is a spatially related random variable, such as permeability here, and E is the expectation operator. In order to generate a random permeability field with a specified correlation structure, we minimize the error between the variogram computed from the generated field and the experimental variogram. Thus the objective function can be written as

$$\text{Minimize} \sum_{\text{all } i} \omega_i \left(\gamma_c(h_i) - \gamma_e(h_i)\right)^2 \qquad (3)$$

where the subscripts c and e stand for computed and experimental attributes, respectively, and the ωi are weighting factors that sum to one. We use equal weights for all lags in Eq. (3).

The choice of the ωi depends strongly on the experience and the judgment of the user. For instance, in Eq. (3) we would like to assign relatively large values to the ωi that correspond to small lag distances, to preserve the spatial covariance structure of the generated permeability field, while relatively smaller values are usually assigned to those corresponding to large lag distances. This is because the permeability value of any gridblock is influenced only by the blocks in its near vicinity.
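The algorithm steps and the variogram objective above combine into a compact serial sketch. The following is a minimal illustration under simplifying assumptions (a 1-D field, a single experimental variogram, equal lag weights as in Eq. (3)); the function names and synthetic data are ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def variogram(field, lags):
    """Matheron-type estimator: gamma(h) = 0.5 * mean[(z(x) - z(x+h))**2]."""
    return np.array([0.5 * np.mean((field[:-h] - field[h:]) ** 2) for h in lags])

def objective(field, lags, gamma_exp, weights):
    """Eq. (3): weighted squared mismatch of computed vs. experimental variograms."""
    return np.sum(weights * (variogram(field, lags) - gamma_exp) ** 2)

def heat_bath(field, candidates, lags, gamma_exp, weights, T0=1.0, n_iter=20):
    """Sequential HBA, steps 1-5: anneal every gridblock in every iteration."""
    field = field.copy()
    for n in range(n_iter):
        T = (0.8 ** n) * T0                     # step 4: cooling schedule
        for block in range(field.size):         # steps 2-3: visit each block
            errs = np.empty(candidates.size)
            for i, k in enumerate(candidates):  # try M candidate permeabilities
                field[block] = k
                errs[i] = objective(field, lags, gamma_exp, weights)
            # Eq. (2); subtracting errs.min() rescales numerator and denominator
            # alike, so the distribution is unchanged but numerically stable.
            p = np.exp(-(errs - errs.min()) / T)
            field[block] = rng.choice(candidates, p=p / p.sum())
    return field

# Hypothetical usage: 50 blocks, 8 candidate values, 5 equally weighted lags.
lags = np.arange(1, 6)
target = rng.lognormal(mean=6.0, sigma=0.8, size=50)  # stand-in "experimental" field
gamma_exp = variogram(target, lags)
candidates = np.quantile(target, np.linspace(0.05, 0.95, 8))
initial = rng.choice(candidates, size=50)
model = heat_bath(initial, candidates, lags, gamma_exp, np.full(lags.size, 1 / lags.size))
```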

PARALLEL SIMULATED ANNEALING

Simulated annealing is a computationally very intense algorithm.

Even though SA samples only a fraction of the entire configuration space, the absolute number of moves or perturbations often becomes prohibitively large even for modern supercomputers. In addition, computational time also increases as the number of terms in the objective function increases. For example, addition of tracer flow data to Eq. (3) may increase the computation time sharply. However, being primarily a Monte Carlo method, SA is well suited for application on parallel computers. Particularly for large spatially correlated problems, like reservoir modeling, the HBA is more suitable for parallel application. This is because, unlike the Metropolis algorithm, in the HBA there is no rejection of moves at low temperatures; this is a particularly attractive feature for using computer resources efficiently. In this paper we focus our attention on parallelizing the HBA for reservoir characterization. Azencott13 gives a thorough review of the parallel Metropolis algorithms, which are suitable for generating uncorrelated random fields.

Parallelization Schemes - One of the challenges of parallel computing techniques is the assignment of computational sub-domains to individual processors such that the required communication between them is minimal. Even though a large number of ad hoc approaches can be found in the literature, an efficient parallelization scheme appears to be problem-dependent. In this paper we describe two schemes of parallel HBA, called schemes 1 and 2, and discuss their relative advantages and disadvantages.

Scheme 1. This scheme divides the computational grid into equal-sized subgrids, the number of which is equal to the number of processors being used. This is also called the systolic or the domain-division scheme.13 Each processor is assigned to a particular subgrid. The processors apply SA to their respective subgrids asynchronously. Synchronization is linked to an optimality test. Figure 2 presents an illustration of this scheme. Consider generating a permeability field on nbl gridblocks using n processors. Scheme 1 divides this grid into n equal-sized subgrids and assigns them to the processors. Figure 2 shows a schematic of an initial assignment of four processors on an arbitrary grid. To complete one iteration, the processors apply the HBA to all the gridblocks in their subgrids sequentially. After the optimal permeability values are determined on all the processors following a perturbation, these values are shared among the processors by a global send operation to update the permeability field. Since the update comes after the optimal values are determined, the optimality condition lags the computation by one iteration. The effect of this lag on the optimal condition, however, does not become critical as long as the size of the subgrids is larger than the measure of the autocorrelation of the field. In a later section we demonstrate the effect of integral measures on the performance of this scheme through an example.



The time complexity per iteration, which is the distribution of CPU time among various operations for arithmetic computation and communication between the processors, can be expressed for this scheme as:

Communication: nbl (tgsend + (n-1) trecv)
Computation: (nbl/n)(terr fn.)(ndiv) + (nbl/n)(topt perm.)   (4)

where
n = number of processors
nbl = total number of gridblocks
ndiv = number of divisions between the minimum and the maximum permeability
tgsend = CPU time taken by a global send operation
trecv = CPU time taken by a receive operation between two processors
terr fn. = CPU time required for computing the error function, Eq. (3)
topt perm. = CPU time required to determine the optimal permeability from the Gibbs probability distribution

In a typical scheme 1 application the service processor or node (usually processor 0) reads the input data and sends the information to all other processors by a global send operation. As the processors sequentially visit the gridblocks in their respective subgrids, they evaluate the optimal permeability following a procedure that is identical to the sequential HBA. Even though each processor performs annealing within its subgrid, it uses the permeability values of the entire grid mesh to evaluate the objective function. Hence, after each gridblock is annealed, the new permeability value is shared by all the processors through a global exchange operation in order to update the permeability field. After an iteration is completed, the temperature is lowered, and annealing is continued until a pre-specified convergence criterion is satisfied.

Scheme 2. This scheme assigns all the processors to a single gridblock at a time, starting with gridblock 1. This scheme works like the master-slave scheme.13 Each processor is assigned a permeability between the maximum and the minimum values, for which it computes the Gibbs probability function. We illustrate this scheme in Fig. 3. In scheme 2 one of the processors, often called the master, does all the bookkeeping and the others, called the slaves, carry out the computations and send the results to the master processor. The master determines the optimal permeability value for the gridblock from the computed Gibbs probability function and sends the optimal value to the slaves by a global send operation. Then the processors march to a neighboring gridblock. When all the gridblocks have been sequentially visited, one iteration is complete. At this time the master lowers the temperature and repeats the entire operation. The time complexity of this scheme is:

Communication: nbl ((n-1) tsend + tgsend + 2(n-1) trecv)
Computation: (nbl/n)(terr fn.)(ndiv) + nbl (topt perm.)   (5)

where tsend is the CPU time spent to send a message between two processors.

Scheme 2 includes the following steps (see the code sketch below for the communication pattern):
1. Define the initial distribution of permeability on a grid mesh as in the sequential HBA.
2. Randomly select a specific number of permeabilities, ndiv, from the experimental permeability distribution.
3. Designate the master processor (usually processor 0). Read the input data on the master processor and send the information to all other processors by a global send operation.
4. Start with gridblock 1. Assign one permeability from the ndiv values to each processor. These processors compute the Gibbs probability (Eq. 2) and send it to the master. After all ndiv permeability values have been assigned, the master determines the optimal permeability from the entire Gibbs probability function and sends it to all the other processors by a global send operation.
5. Sequentially visit all the gridblocks to complete one iteration.
6. Lower the temperature and repeat steps 4 and 5.
7. Repeat steps 4 to 6 until a desired stopping criterion is satisfied.

Comparing Eqs. (4) and (5) we observe that in scheme 2 the communication overhead is larger than in scheme 1. Yet, for reservoir modeling applications, where permeability is often spatially correlated, scheme 2 is a more attractive option, since the accuracy of this algorithm does not depend on the spatial structure of the permeability field. In the next section we present the results of an application of the parallel HBA to generate stochastic permeability fields that shows the validity of the above claim. We also compare the merits of the two schemes there.
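A minimal sketch of the scheme 2 communication pattern follows. It is written with mpi4py as a stand-in for the iPSC 860 message-passing primitives named above (the global send becomes a broadcast, and the slaves' results are collected with a gather); the objective function is a placeholder and all names are ours:

```python
# Minimal scheme 2 (master-slave) sketch. Run with: mpiexec -n 4 python scheme2.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = np.random.default_rng(42)  # identical seed, so setup data agrees on all ranks

def objective(field):
    """Stand-in for the variogram mismatch of Eq. (3)."""
    return float(np.var(field))

nbl, ndiv, T0 = 30, size, 1.0                 # one candidate value per processor
candidates = rng.lognormal(6.0, 0.8, ndiv)    # step 2: ndiv candidate permeabilities
field = rng.choice(candidates, nbl)           # step 1: initial distribution

for n in range(10):
    T = (0.8 ** n) * T0                       # step 6: lower the temperature
    for block in range(nbl):                  # steps 4-5: visit every gridblock
        trial = field.copy()
        trial[block] = candidates[rank]       # each slave evaluates one candidate
        errs = comm.gather(objective(trial), root=0)  # slaves report to the master
        if rank == 0:
            errs = np.asarray(errs)
            p = np.exp(-(errs - errs.min()) / T)      # Gibbs probabilities, Eq. (2)
            k_opt = rng.choice(candidates, p=p / p.sum())
        else:
            k_opt = None
        field[block] = comm.bcast(k_opt, root=0)      # the paper's "global send"
```

The per-block gather/broadcast pair is exactly why Eq. (5) carries a larger communication term than Eq. (4).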

RESULTS AND DISCUSSION

Combinatorial optimization techniques, such as SA, are superior to the traditional stochastic simulation methods in generating stochastic permeability fields because (1) these methods are capable of incorporating information from a number of sources, and (2) the results obtained by these methods are robust because of the non-linear solution procedures. Parallel SA also has an additional advantage in terms of saving CPU time.

In this section we apply the parallel HBA to simulate the distribution of permeability on a slab obtained from an actual sample, by assuming that permeability is a spatially random variable with known variograms.



We also assume that permeability is log-normally distributed with a second-order stationarity structure in space. We compare the efficiency of both parallelization schemes in terms of their CPU time requirement and accuracy. We extend the application to study the effect of problem size on the efficiency of the algorithms. Our application also demonstrates that the choice of parallelizing scheme depends on the autocorrelation structure of the random field.

The Antolini Sandstone is an eolian outcrop from northern Arizona.14 A rectangular sample measuring 38 x 13 x 2 cm was characterized by minipermeameter measurements on each square centimeter and by a miscible tracer flow. Our study focuses on one of the faces of the slab, denoted as Face B.

Figure 4(a) shows a contour map of the air permeability distribution on Face B. The permeability varies between 10 and 1480 md with an arithmetic average value of 477 md and a standard deviation of 314 md. We present Matheron estimates15 of the average vertical and horizontal variograms in Fig. 4(b). To apply the parallel HBA we define the objective function as the weighted sum of the squared differences between the computed and experimental average horizontal and vertical variograms. All the simulation runs are carried out on an Intel iPSC 860 Hypercube at The University of Texas at Austin. The results are presented below.

Scheme 1. Figure 5 shows the behavior of the objective function (i.e., the squared error in Eq. (3)) as a function of the number of iterations. The number of processors is varied parametrically from 1 to 8. When the number of processors is increased from 4 to 8 there is significant interference between the processors that forces the permeability configuration to a local minimum after 20 iterations. The interference is because of the large spatial correlation of the permeability field.

The cause of interference between the processors depends on the integral scales of the permeability field. The larger the integral scale, the stronger is the interference. For instance, consider the variograms of the Antolini sample. Its integral scales are 15 cm and 3 cm in the horizontal and vertical directions, respectively. This means that a perturbation in any gridblock permeability influences the permeability of all the gridblocks that fall within the elliptic area whose half major axis is the horizontal integral scale and whose half minor axis is the vertical integral scale. In scheme 1, when the number of processors is increased from 4 to 8, the horizontal distance between two consecutive processors becomes less than the horizontal integral scale, and this gives rise to processor interference.

Figure 6 compares four realizations of Antolini core permeability generated using 1, 2, 4 and 8 processors. The local minimum that occurs when 8 processors are used is very apparent from the last figure.

In this section we also study the effect of the autocorrelation structure of a permeability field on the performance of scheme 1. To accomplish this we use a power-law (fractal) variogram model in the horizontal direction. Figure 7(a) presents five cases with varying Hurst coefficient, H (Yang9).

Figure 7(b) presents the variation in the objective functions corresponding to these five cases. From these figures we infer that scheme 1 is less sensitive to fields that have very little spatial correlation, which corresponds to small values of H (<= 0.1). As H increases, the degree of interference between the processors increases, forcing the solution to a local minimum.

Scheme 2. Figure 8 shows the behavior of the objective function when scheme 2 is used. This figure varies the number of processors from 1 to 16. We observe that the objective function decreases monotonically with the number of iterations. Unlike scheme 1, the behavior of the objective function does not depend on the number of processors. The accuracy of the results is also apparent from Fig. 9, which presents four realizations of Antolini core permeability obtained using 1, 2, 8, and 16 processors.

Scheme 2 of the parallel HBA is better suited for application to spatially correlated problems. This scheme is robust in terms of the number of processors and the autocorrelation structure of the permeability field. However, comparing Figs. 6 and 9 we find that the communication overhead in scheme 2 is larger than in scheme 1, which indicates that a better method would be a combination of these two schemes. In this paper, however, we restrict ourselves to the use of scheme 2 for parallelization of the HBA.

To verify the accuracy of the generated permeability fields in Fig. 9, we present results of a tracer flow across a two-dimensional vertical cross section using the chemical flooding simulator UTCHEM.16 Figure 10 compares the history of the tracer effluent concentration obtained from the simulation with experimental data. The match between the simulation and the experimental data shows the validity of scheme 2 in generating spatially correlated permeability fields.

We have so far studied the application of the parallel SA algorithm to generate spatially correlated permeability fields without regard to the effect of various parameters on the efficiency of the algorithm. Some of the important parameters of SA that affect the efficiency are the size of the problem, the cooling schedule, and the stopping criterion. The cooling schedule and the stopping criterion have identical effects on the sequential and the parallel SA algorithms.

In this section we study the effect of problem size, i.e., the number of gridblocks, on the performance of the parallel SA algorithm. The objective here is to determine the robustness of the parallel algorithm, in terms of computational time and efficiency, as the problem size increases. The algorithm is robust and linearly scalable if the CPU time required to satisfy a convergence criterion increases linearly with the size of the problem. In the case of a linearly scalable algorithm, the number of arithmetic operations increases linearly with the size of the problem. Therefore, the number of floating point operations per unit CPU time remains approximately constant as the size of the problem increases. We summarize the sensitivity of the parallel heat bath algorithm below.



(i) Computation time vs. problem size - Figure 11 is a plot of the CPU time required to obtain a suitable convergence of the HBA for various numbers of processors. In this figure, the size of the permeability field is the parameter. The computation time for a given problem size decreases as the number of processors increases. The reduction of computation time is largest for large problems because of the increase in the computation to communication ratio. In other words, small problems incur a comparatively large communication overhead on multiprocessor configurations and, hence, are better suited for sequential operation compared to large problems. Figure 11 also indicates that there is a considerable decrease in CPU time as the number of processors is increased. A ten-fold speed-up is obtained in generating permeability fields on a 75 x 25 grid using 16 processors.

(ii) Efficiency vs. problem size - Figure 12 presents a plot of efficiency vs. the problem size for various numbers of processors, varying from 1 to 16. The efficiency of an algorithm on n processors is

$$\eta = \frac{t_1}{n\,t_n} \times 100 \qquad (6)$$

where
η = efficiency, %
t1, tn = CPU time taken to solve a problem on 1 and n processors, respectively
nbl = problem size (number of gridblocks here)
n = number of processors

The efficiency of multiprocessor systems is always less than 100% because of the loss of CPU time in communication among the processors. Figure 12 also shows that there is a general trend of increasing efficiency of multiprocessor configurations for large problems. This is caused by the reduction in the ratio of communication to computation time. The average efficiency of scheme 2 in our applications is 85% when 16 processors are used.

(iii) Operation count vs. problem size - The number of floating point operations (FLOP) executed by an algorithm is another indication of its efficiency. Monte Carlo simulation algorithms, and especially SA algorithms, typically achieve large operation counts. Figure 13 presents the operation count for various problem sizes. A typical operation count is about 10 million floating point operations per second (MFLOPS) per processor when the number of gridblocks is larger than 500. This means that a large fraction of the CPU time is invested in arithmetic computations. A large operation count also means that only a small fraction of CPU time is used during communications among the processors. Since the parallel HBA is only slightly sensitive to the problem size, it can easily be used for solving large problems without losing a large fraction of the computational efficiency.
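As a worked example of Eq. (6) with hypothetical timings (ours, not measured values from the paper): t1 = 8000 s on one processor and t16 = 590 s on 16 processors give η = 8000/(16 x 590) x 100 ≈ 85%, matching the typical scheme 2 efficiency reported above.

```python
def efficiency(t1, tn, n):
    """Eq. (6): parallel efficiency in percent."""
    return 100.0 * t1 / (n * tn)

# Hypothetical timings, chosen only to illustrate the arithmetic.
print(efficiency(t1=8000.0, tn=590.0, n=16))   # ~84.7 %
```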

CONCLUSIONS

This paper presents an application of simulated annealing (SA) on parallel computers to generate spatially correlated data. Generation of stochastic permeability fields is only a particular example. We consider two schemes of parallelization of the heat bath algorithm (HBA) and discuss their relative merits, with particular relevance to the stochastic generation of permeability fields. Generation of uncorrelated data on parallel computers is relatively straightforward, where a parallel Metropolis algorithm13 can easily be used. What makes the generation of correlated data on parallel machines particularly difficult is that the computations on individual gridblocks are not independent. Thus, a change in any gridblock influences the values of other gridblocks, affecting the accuracy of the results. This paper presents a new scheme to parallelize the HBA that can generate spatially correlated data without affecting the accuracy of the results. These generated fields honor the experimental variograms. In our applications, we also vary the size of the permeability fields generated and the number of processors used to study the sensitivity of these parameters.

We arrive at the following conclusions from the analysis of the parallel application of the heat bath algorithm:

(1) The parallel heat bath algorithm, scheme 2, is well suited for stochastic generation of strongly spatially correlated data like permeability fields.

(2) A master-slave type approach to parallelizing the HBA yields more accurate and robust results compared to a domain-division type approach.

(3) Since relatively little synchronized communication is necessary, the application of the HBA on multiprocessors yields relatively large computational efficiency. A typical efficiency in this study is 85%.

(4) A typical operation count of SA algorithms on an Intel iPSC 860 on a 75 x 25 grid mesh is 10 MFLOPS per processor when 16 processors are used.

(5) In generating permeability fields on a 75 x 25 grid using 16 processors, we obtained a ten-fold speed-up.

ACKNOWLEDGMENTS

This work is supported by the Enhanced Oil and Gas Recovery Research Program of the Center for Petroleum and Geosystems Engineering at The University of Texas at Austin. Larry W. Lake held a Shell Distinguished Chair during the period of this work and currently holds the Moncrief Centennial Chair in Petroleum Engineering at The University of Texas at Austin.

NOMENCLATURE
en = Energy or objective function at nth move
e(ki) = Energy or objective function for ki
h = Lag distance
k = Permeability of a gridblock, L2



k̄ = Average permeability, L2
nbl = Total number of gridblocks
ndiv = Number of divisions between the minimum and the maximum permeability
Pn = Transition probability from the nth to the (n+1)th configuration
T = Temperature or control parameter for simulated annealing
t1 = Time taken to solve a problem on a single processor in Eq. (6)
tn = Time taken to solve a problem on n processors in Eq. (6)
tsend = CPU time spent to send a message between two processors
tgsend = CPU time taken by a global send operation
trecv = CPU time taken by a message receive operation between two processors
terr fn. = CPU time required for computing the error function
topt perm. = CPU time required to determine the optimal permeability from the Gibbs probability distribution

Greek Symbols
γ = Variogram
ω = Weighting factor, used to provide selective preference to data
η = Efficiency of a parallel algorithm as in Eq. (6)

REFERENCES

1. Geman, S. and Geman, D.: "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-6 (1984) 721-741.
2. Hinton, G. and Sejnowski, T.: "Optimal Perceptual Inference," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1983) 448-453.
3. Dougherty, D.E. and Marryott, R.A.: "Markov Chain Length Effects on Optimization in Groundwater Management by Simulated Annealing," Computational Methods in Geosciences, W.E. Fitzgibbon and M.F. Wheeler (eds.), SIAM, Philadelphia (1992) 53-65.
4. Rothman, D.H.: "Nonlinear Inversion, Statistical Mechanics, and Residual Statics Estimation," Geophysics (1985) 50, 12, 2784-2796.
5. Sen, M.K. and Stoffa, P.L.: "Nonlinear One-Dimensional Seismic Waveform Inversion Using Simulated Annealing," Geophysics (1991) 56, 1624-1638.
6. Datta Gupta, A.: "Stochastic Heterogeneity, Dispersion and Field Tracer Response," Ph.D. dissertation, The U. of Texas, Austin (1992).
7. Lee, W.J.: Well Testing, SPE Textbook Series, 1, SPE of AIME, New York (1982).
8. Allison, S.B., Pope, G.A. and Sepehrnoori, K.: "Analysis of Field Tracers for Reservoir Description," J. Pet. Sci. Eng. (1991) 5, 2, 173-186.
9. Yang, A.P.: "Stochastic Heterogeneity and Dispersion," Ph.D. dissertation, The U. of Texas, Austin (1990).
10. Farmer, C.L.: "The Mathematical Generation of Reservoir Geology," Numerical Rocks, Joint IMA/SPE European Conference on the Mathematics of Oil Recovery, Robinson College, Cambridge University (July 1989).
11. Ouenes, A., Bahralolom, I., Gutjahr, A., and Lee, R.: "Conditioning Permeability Fields by Simulated Annealing," paper presented at the Third European Conference on the Mathematics of Oil Recovery, Delft, Netherlands, June 17-19, 1992.
12. Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E.: "Equation of State Calculations by Fast Computing Machines," Journal of Chemical Physics (1953) 21, 6, 1087-1092.
13. Simulated Annealing: Parallelization Techniques, R. Azencott (ed.), John Wiley and Sons, Inc., New York (1992).
14. Ganapathy, S., Wreath, D.G., Lim, M.T., Rouse, B.A., Pope, G.A., and Sepehrnoori, K.: "Simulation of Heterogeneous Sandstone Experiments Characterized Using CT Scanning," paper SPE 21757 presented at the Western Regional Meeting, Long Beach, California, March 20-22, 1991.
15. Matheron, G.: Traité de Géostatistique Appliquée, Tome 1, Mémoires du Bureau de Recherches Géologiques et Minières, No. 14, Editions Technip, Paris (1962).
16. Datta Gupta, A., Pope, G.A., Sepehrnoori, K., and Thrasher, R.: "A Symmetric Positive Definite Formulation of a Three-Dimensional Micellar/Polymer Simulator," SPE Reservoir Engineering (Nov. 1986) 622-632.


Figure 1. Schematic of the simulated annealing algorithm, which samples only a fraction of the entire model space (a random walk through a Markov sub-space toward the converged, minimum-energy solution). Each move works like a Markov process.

Figure 3. Scheme 2 for the parallel HBA (for gridblock m, the processors evaluate candidate permeabilities k(1), k(2), ..., selected from the experimental distribution between k(minimum) and k(maximum)). All the processors are assigned to the gridblocks sequentially. This scheme yields stable solutions.

Figure 2. Scheme 1 for the parallel HBA (the grid is split into sub-domains, one per processor, with marching in the x and y directions). The processors visit the gridblocks in their respective sub-domains sequentially. The optimal condition lags the computation by one iteration.

Figure 4(a). Permeability distribution (md) on Face B of the Antolini slab (38 cm horizontal).

Figure 4(b). Average horizontal and vertical experimental semivariograms computed from Face B Antolini Sandstone permeability measurements (lag distance in cm).


Figure 5. Change of the objective function with the number of iterations for parallel HBA, scheme 1. The results converge to a local minimum when the number of processors used is larger than 4.

Figure 6. Stochastic permeability distribution (md) on Face B of the Antolini slab. Permeability fields generated using scheme 1 of the parallel HBA on an iPSC 860 Hypercube on a 38 x 13 mesh.

Figure 7(a). Power-law (fractal) variograms (H = 0.1, 0.4, 0.5, 0.6, 0.98; lag distance in cm) used to study the sensitivity of scheme 1 to the autocorrelation structure of the permeability field. The vertical variogram is the average experimental variogram computed from Face B Antolini Sandstone permeability measurements.

Figure 7(b). Effect of autocorrelation structure on the performance of scheme 1. For small Hurst coefficients scheme 1 is stable; however, when H becomes large the solution is forced towards local minima.

Figure 8. Change of the objective function with the number of iterations for parallel HBA, scheme 2. This plot shows that scheme 2 is a stable algorithm.


Figure 9. Stochastic permeability distribution (md) on Face B of the Antolini slab. Permeability fields generated using scheme 2 of the parallel HBA on an iPSC 860 Hypercube on a 38 x 13 mesh. (Panel labels give the number of processors and CPU time; the legible ones read 4 processors, 2015 sec; 8 processors, 1209 sec; and 16 processors, 826 sec.)

Figure 10. Comparison of experimental effluent tracer concentration with the results from tracer flow simulation (tracer concentration vs. pore volumes injected, 0 to 2.0) across a vertical cross section of the Antolini slab generated using parallel HBA, scheme 2.

Figure 11. Computation time required for multiprocessor configurations for various problem sizes. CPU time reduces with the number of processors. All runs are made on an Intel iPSC 860 Hypercube.

Figure 12. Computational efficiency of various multiprocessor configurations (1, 2, 4, 8, and 16 processors) vs. problem size in number of gridblocks. Efficiency increases with the problem size because of the reduction in communication overhead.

Figure 13. Operation count for various problem sizes (up to 2000 gridblocks).