![Page 1: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/1.jpg)
IceCube simulation with PPC on GPUs
Dmitry Chirkin, UW Madison
PPC: photon propagation code
GPU: graphics processing unit
![Page 2: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/2.jpg)
IceCube simulation with PPC
Photonics: 2000 - now
Photon propagation code (PPC): 2009 - now
![Page 3: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/3.jpg)
Photonics: conventional on CPU
• First, run photonics to fill space with photons, tabulate the result
• Create such tables for nominal light sources: cascade and uniform half-muon
• Simulate photon propagation by looking up photon density in tabulated distributions
• Table generation is slow
• Simulation suffers from a wide range of binning artifacts
• Simulation is also slow! (most time is spent loading the tables)
![Page 4: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/4.jpg)
Direct photon tracking with PPC
• simulating flasher/standard candle photons
• same code for muon/cascade simulation
• using precise scattering function: linear combination of HG+SAM
• using tabulated (in 10 m depth slices) layered ice structure
• employing 6-parameter ice model to extrapolate in wavelength
• tilt in the ice layer structure is properly taken into account
• transparent folding of acceptance and efficiencies
• precise tracking through layers of ice, no interpolation needed
• precise simulation of the longitudinal development of cascades and of the angular distribution of particles emitting Cherenkov photons
photon propagation code
![Page 5: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/5.jpg)
Approximation to Mie scattering
[Formulas and plot: Simplified Liu, Henyey-Greenstein, and Mie scattering functions; mixture parameter fSL]
Describes scattering on acid, mineral, salt, and soot with concentrations and radii at SP
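The linear combination of Henyey-Greenstein (HG) and Simplified Liu (SL) named on these slides can be sketched in Python. The formulas themselves did not survive in this transcript, so the sketch assumes the commonly quoted forms: HG proportional to (1-g^2)/(1+g^2-2g cos θ)^(3/2), and SL proportional to (1+cos θ)^α with α = 2g/(1-g). The function names are illustrative and are not taken from ppc.

```python
import random

def sample_hg(g, rng):
    """Draw cos(theta) from the Henyey-Greenstein form (assumed, g != 0)."""
    xi = rng.random()
    s = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi)
    return (1.0 + g * g - s * s) / (2.0 * g)

def sample_sl(g, rng):
    """Draw cos(theta) from the simplified Liu form (1 + cos)^alpha (assumed)."""
    alpha = 2.0 * g / (1.0 - g)
    xi = rng.random()
    # Inverse of the CDF ((1 + cos) / 2)^(alpha + 1)
    return 2.0 * xi ** (1.0 / (alpha + 1.0)) - 1.0

def sample_cos_theta(g, f_sl, rng):
    """Mixture: SL with probability f_sl, HG otherwise."""
    if rng.random() < f_sl:
        return sample_sl(g, rng)
    return sample_hg(g, rng)

rng = random.Random(1)
n = 100000
mean_cos = sum(sample_cos_theta(0.9, 0.5, rng) for _ in range(n)) / n
print(mean_cos)
```

Under these assumed forms both components have mean cosine g, so changing fSL alters the shape of the angular distribution while keeping g=<cos(θ)> fixed.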
![Page 6: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/6.jpg)
Dependence on g=<cos(θ)> and fSL
g=<cos(θ)>   fSL
0.8          0
0.9          0
0.95         0
0.9          0.3
0.9          0.5
0.9          1.0
flashing 63-50 64-50
![Page 7: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/7.jpg)
Ice tilt in ppc
Measured with dust loggers (Ryan Bay)
![Page 8: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/8.jpg)
Photon angular profile
from thesis of Christopher Wiebusch
![Page 9: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/9.jpg)
PPC simulation on GPU
execution threads
propagation steps (between scatterings)
photon absorbed
new photon created (taken from the pool)
threads complete their execution (no more photons)
Running on an NVidia GTX 295 CUDA-capable card, ppc is configured with:
• 448 threads in 30 blocks (total of 13440 threads)
• average of ~1024 photons per thread (total of 1.38 × 10^7 photons per call)
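The totals quoted for this configuration follow from simple multiplication. A minimal Python check, using only the numbers on the slide (the variable names are illustrative):

```python
threads_per_block = 448
blocks = 30
photons_per_thread = 1024  # average, as quoted on the slide

total_threads = threads_per_block * blocks
photons_per_call = total_threads * photons_per_thread

print(total_threads)             # threads per kernel call
print(photons_per_call)          # photons per kernel call
print(f"{photons_per_call:.3g}")  # same number to 3 significant digits
```

The exact product, 13762560, is the photon count that appears in the ppc log lines later in this deck.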
![Page 10: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/10.jpg)
Photon Propagation Code: PPC
There are 5 versions of the ppc:
• original c++
• "fast" c++
• in Assembly
• for CUDA GPU
• icetray module
All versions verified to produce identical results
comparison with i3mcml
http://icecube.wisc.edu/~dima/work/WISC/ppc/
![Page 11: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/11.jpg)
ppc icetray module
• at http://code.icecube.wisc.edu/svn/projects/ppc/trunk/
• uses a wrapper: private/ppc/i3ppc.cxx, compiled by cmake system into the libppc.so
• additional library libxppc.so is compiled by cmake
  Set GPU_XPPC:BOOL=ON or OFF
• or can also be compiled by running make in private/ppc/gpu:
  “make glib” compiles gpu-accelerated version (needs cuda tools)
  “make clib” compiles cpu version (from the same sources!)
![Page 12: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/12.jpg)
ppc example script run.py
if(len(sys.argv)!=6):
    print "Use: run.py [corsika/nugen/flasher] [gpu] [seed] [infile/num of flasher events] [outfile]"
    sys.exit()
…
det = "ic86"
detector = False
…
os.putenv("PPCTABLESDIR", expandvars("$I3_BUILD/ppc/resources/ice/mie"))
…
if(mode == "flasher"):
    …
    str=63
    dom=20
    nph=8.e9
    tray.AddModule("I3PhotoFlash", "photoflash")(…)
    os.putenv("WFLA", "405") # flasher wavelength; set to 337 for standard candles
    os.putenv("FLDR", "-1") # direction of the first flasher LED
    …
    # Set FLDR=x+(n-1)*360, where 0<=x<360 and n>0 to simulate n LEDs in a
    # symmetrical n-fold pattern, with first LED centered in the direction x.
    # Negative or unset FLDR simulates a symmetric in azimuth pattern of light.
    tray.AddModule("i3ppc", "ppc")(
        ("gpu", gpu),
        ("bad", bad),
        ("nph", nph*0.1315/25), # corrected for efficiency and DOM oversize factor; eff(337)=0.0354
        ("fla", OMKey(str, dom)), # set str=-str for tilted flashers, str=0 and dom=1,2 for SC1 and 2
        )
else:
![Page 13: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/13.jpg)
ppc-pick and ppc-eff
ppc-pick: restrict to primaries below MaxEpri
load("libppc-pick")
tray.AddModule("I3IcePickModule<I3EpriFilt>","emax")(
    ("DiscardEvents", True),
    ("MaxEpri", 1.e9*I3Units.GeV)
    )
ppc-eff: reduce efficiency from 1.0 to eff
load("libppc-eff")
tray.AddModule("AdjEff", "eff")(
    ("eff", eff)
    )
![Page 14: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/14.jpg)
ppc homepage
http://icecube.wisc.edu/~dima/work/WISC/ppc
![Page 15: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/15.jpg)
GPU scaling
Original:  1/2.08  1/2.70
CPU c++:   1.00    1.00
Assembly:  1.25    1.37
GTX 295:   147     157
GTX/Ori:   307     424
C1060:     104     112
C2050:     157     150
GTX 480:   210     204
On GTX 295: 1.296 GHz
Running on 30 MPs x 448 threads
Kernel uses: l=0 r=35 s=8176 c=62400

On GTX 480: 1.401 GHz
Running on 15 MPs x 768 threads
Kernel uses: l=0 r=40 s=3960 c=62400

On C1060: 1.296 GHz
Running on 30 MPs x 448 threads
Kernel uses: l=0 r=35 s=3992 c=62400

On C2050: 1.147 GHz
Running on 14 MPs x 768 threads
Kernel uses: l=0 r=41 s=3960 c=62400

Uses cudaGetDeviceProperties() to get the number of multiprocessors,
uses cudaFuncGetAttributes() to get the maximum number of threads
![Page 16: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/16.jpg)
GTX 480 vs. GTX 295
GTX 295:
• has 2 GPUs
• 240 cores in 30 MPs per GPU
• 8 cores per MP
• 2 single-precision sFPUs per MP: 60 sFPUs per GPU
• 480 cores per card, 120 sFPUs per card
• shared memory: 16 KB per MP, 960 KB per card

GTX 480:
• has 1 GPU
• 480 cores in 15 MPs
• 32 cores per MP
• 4 single-precision sFPUs per MP: 60 sFPUs per GPU
• 480 cores per card, 60 sFPUs per card (!)
• shared memory: up to 48 KB per MP, up to 720 KB per card
Why is ppc not a factor 2 faster on GTX 480 GPU than on GTX 295 GPU?
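The per-card totals on this slide can be reproduced from the per-multiprocessor figures. The sketch below reads the slide as 30 multiprocessors (MPs) of 8 cores per GTX 295 GPU and 15 MPs of 32 cores on the GTX 480. That reading is an interpretation of the slide's labels, and the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Card:
    gpus: int              # GPUs per card
    mps: int               # multiprocessors per GPU
    cores_per_mp: int
    sfpus_per_mp: int
    shared_kb_per_mp: int

    def per_gpu(self):
        return {"cores": self.mps * self.cores_per_mp,
                "sfpus": self.mps * self.sfpus_per_mp}

    def per_card(self):
        n = self.gpus * self.mps
        return {"cores": n * self.cores_per_mp,
                "sfpus": n * self.sfpus_per_mp,
                "shared_kb": n * self.shared_kb_per_mp}

gtx295 = Card(gpus=2, mps=30, cores_per_mp=8, sfpus_per_mp=2, shared_kb_per_mp=16)
gtx480 = Card(gpus=1, mps=15, cores_per_mp=32, sfpus_per_mp=4, shared_kb_per_mp=48)

for name, card in (("GTX 295", gtx295), ("GTX 480", gtx480)):
    print(name, card.per_gpu(), card.per_card())
```

Per GPU, the core count doubles from 240 to 480 while the sFPU count stays at 60, which may be part of the answer to the question the slide poses.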
![Page 17: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/17.jpg)
Kernel time calculation
Run 3232 (corsika) IC86 processing on cuda002 (per file):
GTX 295: Device time: 1123741.1 (in-kernel: 1115487.9...1122539.1) [ms]
GTX 480: Device time: 693447.8 (in-kernel: 691775.9...693586.2) [ms]
If more than 1 thread is running using the same GPU:
Device time: 1417203.1 (in-kernel: 1072643.6...1079405.0) [ms]
3 counters:
1. time difference before/after kernel launch in host code
2. in-kernel, using cycle counter: min thread time
3. max thread time
Also, real/user/sys times of top:
gpus 6
cpus 1
cores 8
files 693
Real 749m4.693s
User 3456m10.888s
sys 39m50.369s
Device time: 245312940.1 216887330.9 218253017.2 [ms]
files: 693 real: 64.8553 user: 37.8357 gpu: 58.9978 kernel: 52.4899 [seconds]
81%-91% GPU utilization
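The per-file numbers and the utilization range can be reproduced from the totals on this slide. The normalization used below (user+sys divided by the number of cores, device times divided by the number of GPUs) is inferred from the figures rather than stated on the slide, and the variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    return 60.0 * minutes + seconds

files, gpus, cores = 693, 6, 8
real = to_seconds(749, 4.693)
user = to_seconds(3456, 10.888)
sys_time = to_seconds(39, 50.369)
device_ms = 245312940.1      # host-side device time, summed over GPUs
kernel_max_ms = 218253017.2  # in-kernel max thread time, summed over GPUs

per_file = {
    "real": real / files,
    "user": (user + sys_time) / (files * cores),      # inferred normalization
    "gpu": device_ms / 1000.0 / (files * gpus),       # inferred normalization
    "kernel": kernel_max_ms / 1000.0 / (files * gpus),
}
utilization = (per_file["kernel"] / per_file["real"],
               per_file["gpu"] / per_file["real"])
print(per_file)
print(utilization)
```

With this reading, the lower utilization bound comes from the in-kernel time and the upper bound from the host-side device time.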
![Page 18: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/18.jpg)
Concurrent execution
[Diagram: timelines of alternating CPU and GPU phases for Thread 1, Thread 2, and for one thread]
Create track segments
Copy track segments to GPU
Process photon hits
Copy photon hits from GPU
Need 2 buffers for track segments and photon hits
However: have 2 buffers: 1 on host and 1 on GPU!
Just need to synchronize before the buffers are re-used
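The overlap of CPU and GPU work within one thread can be illustrated with a small Python sketch. It is not ppc code: the "device" is a single worker thread standing in for an asynchronous kernel launch, and all function names are hypothetical. The point is the ordering: prepare the next batch on the CPU while the previous one is still on the device, then synchronize before re-using the buffer.

```python
from concurrent.futures import ThreadPoolExecutor

def create_track_segments(event):
    # CPU stage (hypothetical stand-in for track segment creation)
    return [event * 10 + k for k in range(3)]

def propagate_on_device(segments):
    # "GPU" stage (hypothetical stand-in for the photon propagation kernel)
    return sum(segments)

def run_pipelined(events):
    results = []
    pending = None  # handle of the batch currently on the device
    with ThreadPoolExecutor(max_workers=1) as device:
        for event in events:
            # CPU work for this event overlaps with the pending device call
            segments = create_track_segments(event)
            if pending is not None:
                # Synchronize: collect hits before the buffer is re-used
                results.append(pending.result())
            pending = device.submit(propagate_on_device, segments)
        if pending is not None:
            results.append(pending.result())
    return results

print(run_pipelined([1, 2, 3]))
```

The pipelined loop returns the same hits as running the two stages strictly one after the other; only the wall-clock overlap differs.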
![Page 19: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/19.jpg)
Typical run times
corsika: run 3232: 10493 files of 10.0345 sec each
ic86/spx/3232 on cuda00[123] (53.4 seconds per job)
1.2 days of real detector time in 6.5 days

nugen: run 2972: 9993 200000-event files; E^-2 weighted
ic86/spx/2972 on cudatest (25.1 seconds per job)
entire 10k set of files in 2.9 days
this is enough for an atmnu/diffuse analysis!
Considerations:
• Maximize GPU utilization by running only mmc+ppc parts on the GPU nodes
• still, IC40 mmc+ppc+detector was run with ~80% GPU utilization
• run with 100% DOM efficiency, save all ppc events with at least 1 MC hit
• apply a range of allowed efficiencies (70-100%) later with ppc-eff module
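Applying a lower efficiency after the fact amounts to discarding a fraction of the hits that were saved at 100% efficiency. A minimal Python sketch of that idea follows. It assumes independent random thinning of hits; how the ppc-eff module (AdjEff) actually does this is not shown in these slides, and the function name is hypothetical.

```python
import random

def thin_hits(hits, eff, rng):
    """Keep each hit independently with probability eff (0 <= eff <= 1)."""
    if not 0.0 <= eff <= 1.0:
        raise ValueError("eff must be between 0 and 1")
    return [h for h in hits if rng.random() < eff]

rng = random.Random(7)
hits = list(range(100000))  # stand-in for hits simulated at eff = 1.0
kept = {eff: len(thin_hits(hits, eff, rng)) for eff in (0.7, 0.85, 1.0)}
print(kept)
```

One simulation set run at full efficiency can then serve the whole 70-100% range without repeating the photon propagation.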
![Page 20: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/20.jpg)
Use in analysis
PPC run on GPUs has already been used in several analyses, published or in progress.
The ease of changing the ice parameters facilitated propagation of ice uncertainties through the analysis, as all “systematics” simulation sets are simulated in roughly the same amount of time, with no extra overhead.
A similar quality uncertainty analysis based on photonics simulation would have taken much longer because of the large CPU cost of the initial table generation.
![Page 21: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/21.jpg)
OSG Summer School ‘10
DAG (Directed Acyclic Graph)-based simulation
• Separate simulation segments into tasks
• Assign each task to a node in the DAG
![Page 22: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/22.jpg)
Dedicated GPU cluster
GPU-based simulation
• We have recently begun experimenting with GPU-based implementation of portions of IceCube simulation.
• DAG assigns separate tasks to different compute nodes
• Execution of photon propagation simulation on dedicated GPU nodes.
• For many simulations the GPU segment of the chain is much faster than the rest of the simulation.
• A small number of GPU-enabled machines can consume the data from a large pool of CPU cores.
[Diagram labels: generator (x4), PPC, Detector (x2)]
![Page 23: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/23.jpg)
GPU-based simulation
• The optimal DAG differs depending on the specific simulation
![Page 24: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/24.jpg)
Current Status of GPU-based Simulation Production
• NuGen and CORSIKA simulation currently running on Madison cluster: NPX3+CUDA
• Testing optimal DAG configurations to take advantage of GPUs
• Current Condor queue has option for selecting machines with GPUs
• There are multiple cores and multiple GPUs on each machine
• Condor assigns environment variable ${_CONDOR_SLOT}, which is used as a parameter to select a GPU for PPC.
• SGE, PBS IceProd plugins are being written to support DAGs in order to incorporate other non-Condor sites:
• NERSC Dirac and Tesla, and clusters in Dortmund, DESY, Alberta
• Other IceCube sites report having GPUs available and will be incorporated into production.
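Selecting a GPU from the Condor slot variable could look roughly like the Python sketch below. The slides do not say how the slot value is turned into a device number, so the details here are assumptions: that the value contains a slot number such as "slot3", that slots are counted from 1, and that the number is wrapped onto the available GPUs. The function name is hypothetical.

```python
import os
import re

def gpu_from_condor_slot(n_gpus, env=None):
    """Map the _CONDOR_SLOT value to a GPU index, or None if it is unset."""
    env = os.environ if env is None else env
    slot = env.get("_CONDOR_SLOT")
    if slot is None:
        return None
    match = re.search(r"\d+", slot)  # first number in e.g. "slot3" (assumed format)
    if match is None:
        return None
    return (int(match.group(0)) - 1) % n_gpus  # assumed mapping

print(gpu_from_condor_slot(6, {"_CONDOR_SLOT": "slot3"}))
```

The resulting index would then be passed on as the "gpu" parameter of the i3ppc module, as in the run.py example.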
![Page 25: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/25.jpg)
GPU/PPC production: coincident neutrino-CR muons
![Page 26: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/26.jpg)
Our initial GPU cluster
4 computers:
• 1 cudatest
• 3 cuda nodes (cuda001-3)
Each has
• 4-core CPU
• 3 GPU cards, each with 2 GPUs (i.e. 6 GPUs per computer)
Each computer is ~ $3500
![Page 27: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/27.jpg)
Our initial cluster
cudatest:
Found 6 devices, driver 2030, runtime 2030
0(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l1 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
1(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
2(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
3(1.3): GeForce GTX 295 1.296 GHz G(938803200) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
4(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
5(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
3 GTX 295 cards, each with 2 GPUs
PSU
0 and 1
4 and 5
2 and 3
nvidia-smi -lsa
GPU 0:
    Product Name : GeForce GTX 295
    Serial : 1803836293359
    PCI ID : 5eb10de
    Temperature : 87 C
GPU 1:
    Product Name : GeForce GTX 295
    Serial : 2497590956570
    PCI ID : 5eb10de
    Temperature : 90 C
GPU 2:
    Product Name : GeForce GTX 295
    Serial : 1247671583504
    PCI ID : 5eb10de
    Temperature : 100 C
GPU 3:
    Product Name : GeForce GTX 295
    Serial : 2353575330598
    PCI ID : 5eb10de
    Temperature : 105 C
GPU 4:
    Product Name : GeForce GTX 295
    Serial : 1939228426794
    PCI ID : 5eb10de
    Temperature : 100 C
GPU 5:
    Product Name : GeForce GTX 295
    Serial : 2347233542940
    PCI ID : 5eb10de
    Temperature : 103 C
![Page 28: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/28.jpg)
BAD multiprocessors (MPs)
clist
cudatest 0 1 2 3 4 5
cuda001 0 1 2 3 4 5
cuda002 0 1 2 3 4 5
cuda003 0 1 2 3 4 5

#badmps
cuda001 3 22
cuda002 2 20
cuda002 4 10

Disable 3 bad GPUs out of 24: 12.5%
Disable 3 bad MPs out of 720: 0.4%!
Configured: xR=5 eff=0.95 sf=0.2 g=0.943
Loaded 12 angsens coefficients
Loaded 6x170 dust layer points
Loaded 16028 random multipliers
Loaded 42 wavelenth points
Loaded 171 ice layers
Loaded 3540 DOMs (19x19)
Processing f2k muons from stdin on device 2
Total GPU memory usage: 83053520
photons: 13762560 hits: 991
Error: TOT was a nan or an inf 1 times! Bad MP #20
photons: 13762560 hits: 393
photons: 13762560 hits: 570
photons: 13762560 hits: 501
photons: 13762560 hits: 832
photons: 13762560 hits: 717
CUDA Error: unspecified launch failure

Total GPU memory usage: 83053520
photons: 13762560 hits: 938
Error: TOT was a nan or an inf 9 times! Bad MP #20 #20 #20 #20
photons: 13762560 hits: 442
photons: 13762560 hits: 627
CUDA Error: unspecified launch failure

[dima@cuda002 gpu]$ cat mmc.1.f2k | BADMP=20 ./ppc 2 > /dev/null
Configured: xR=5 eff=0.95 sf=0.2 g=0.943
Loaded 12 angsens coefficients
Loaded 6x170 dust layer points
Loaded 16028 random multipliers
Loaded 42 wavelenth points
Loaded 171 ice layers
Loaded 3540 DOMs (19x19)
Processing f2k muons from stdin on device 2
Not using MP #20
Total GPU memory usage: 83053520
photons: 13762560 hits: 871
…
photons: 1813560 hits: 114
Device time: 31970.7 (in-kernel: 31725.6...31954.8) [ms]
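The #badmps table and the BADMP=20 command line above suggest a simple lookup from (host, device) to the multiprocessor to skip. The Python sketch below shows one way a job wrapper might build that environment and reproduces the capacity-loss percentages. The dictionary layout and function name are hypothetical; only the BADMP variable and the table values come from the slide.

```python
# (host, device) -> bad multiprocessor number, from the #badmps table
BAD_MPS = {
    ("cuda001", 3): 22,
    ("cuda002", 2): 20,
    ("cuda002", 4): 10,
}

def ppc_env(host, device, base_env=None):
    """Return an environment with BADMP set if this device has a bad MP."""
    env = dict(base_env or {})
    bad = BAD_MPS.get((host, device))
    if bad is not None:
        env["BADMP"] = str(bad)
    return env

computers, gpus_per_computer, mps_per_gpu = 4, 6, 30
total_gpus = computers * gpus_per_computer
total_mps = total_gpus * mps_per_gpu
gpu_loss = 100.0 * len(BAD_MPS) / total_gpus  # disabling whole GPUs
mp_loss = 100.0 * len(BAD_MPS) / total_mps    # disabling single MPs

print(ppc_env("cuda002", 2), gpu_loss, round(mp_loss, 1))
```

Disabling individual MPs instead of whole GPUs is what brings the lost capacity down from 12.5% to about 0.4%.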
Failure rates:
![Page 29: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/29.jpg)
The IceWave Cluster
• 3 x DELL PowerEdge C410x
  – 48 Tesla M2070 GPGPU
  – 21504 GPU cores
  – 48 TFlops single precision
  – 24 TFlops double precision
• 6 x DELL PowerEdge C6145
  – 24 AMD Opteron™ 6100 Processors
  – 288 CPU cores
  – 7 TFlops double precision
• QDR Infiniband Interconnect
  – Allows high speed MPI applications
• ~ $200,000
![Page 30: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/30.jpg)
Basic Elements
• DELL PowerEdge C410x
  – 16 PCIe Expansion chassis
  – Use of C2050 TESLA GPUs
  – Flexible assignment of GPUs to servers
  – Allows 1-4 GPUs per server
![Page 31: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/31.jpg)
Basic Elements
• DELL PowerEdge C6145
  – 2 x 4 CPU Servers
  – AMD Opteron™ 6100 (Magny-Cours)
  – 12 cores per processor
  – 48-96 cores per 2U server
  – 192 GB per 2U server
![Page 32: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/32.jpg)
Concluding remarks
• PPC (photon propagation code) is used by IceCube for photon tracking
• Precise treatment of the photon propagation, includes all known effects (longitudinal development of particle cascades, ice tilt, etc.)
• PPC can be run on CPUs or GPUs; running on a GPU is 100s of times faster than on a CPU core
• We use DAG to run PPC routinely on GPUs for mass production of simulated data
• GPU computers can be assembled with NVidia or AMD video cards
  however, some problems exist in consumer video cards:
  bad MPs can be worked around in PPC
  computing-grade hardware can be used instead
![Page 33: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/33.jpg)
Extra slides
![Page 34: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/34.jpg)
Oversize DOM treatment
Oversized DOM treatment (designed for minimum bias compared to oversize=1):
• oversize only in direction perpendicular to the photon
• time needed to reach the nominal (non-oversized) DOM surface is added
• re-use the photon after it hits a DOM and ensure the causality in the flasher simulation
The oversize model was chosen carefully to produce the best possible agreement with the nominal x1 case (see next slide).
[Diagram: photon, nominal DOM, and oversized DOM (oversized ~5 times)]
This is a crucial optimization, however:
Some bias is unavoidable since DOMs occupy larger space:
• x1: diameter of 33 cm
• x5: 1.65 m
• x16: 5.3 m
This could lead to ~5-10% variation of the individual DOM simulated rates.
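The diameters quoted above are the nominal 33 cm scaled by the oversize factor. The Python sketch below reproduces them and also shows the matching reduction in the number of photons to simulate. The 1/xR^2 photon scaling is inferred from the nph*0.1315/25 line in the run.py flasher example (xR=5); it is not stated on this slide, and the function names are illustrative.

```python
NOMINAL_DIAMETER_M = 0.33

def oversized_diameter(x_r):
    """Effective DOM diameter in meters for oversize factor x_r."""
    return NOMINAL_DIAMETER_M * x_r

def photon_scale(x_r):
    """Fraction of photons to simulate; assumes cross-section grows as x_r**2."""
    return 1.0 / x_r ** 2

for x_r in (1, 5, 16):
    print(x_r, oversized_diameter(x_r), photon_scale(x_r))

# Flasher example from run.py: nph=8.e9, efficiency 0.1315, oversize xR=5
nph_simulated = 8.0e9 * 0.1315 * photon_scale(5)
print(nph_simulated)
```

The saving in photons is what makes oversizing attractive, and the larger occupied volume is where the bias mentioned above comes from.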
![Page 35: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/35.jpg)
Timing of oversized DOM MC
• xR=1
• default
• do not track back to detected DOM
• do not track after detection
• no oversize delta correction!
• do not check causality
• del=(sqrtf(b*b+(1/(e.zR*e.zR-1)*c)-D)*e.zR-h
• del=e.R-OMR
[Plots: flashing 63-50; labels 63-48, 64-48, 64-52; curves xR=1 and default]
![Page 36: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/36.jpg)
ice density: 0.9216 mwe
handbook of chemistry and physics
T.Gow's data of density near the surface
T=221.5-0.00045319*d+5.822e-6*d^2-273.15 (fit to AMANDA data)
Fit to (1-p1*exp(-p2*d))*f(T(d))*(1+0.94e-12*9.8*917*d)
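The temperature fit and the compression term of the density formula can be evaluated directly. The Python sketch below reads "d2" as d squared, takes d as depth in meters, and takes T in degrees Celsius (because of the -273.15 offset); all three are assumptions. The parameters p1 and p2 and the function f(T) are not given on the slide, so they are left out.

```python
def ice_temperature_c(depth_m):
    """Temperature fit from the slide, in deg C (depth in meters assumed)."""
    return 221.5 - 0.00045319 * depth_m + 5.822e-6 * depth_m ** 2 - 273.15

def compression_factor(depth_m):
    """Last factor of the density fit: 1 + 0.94e-12 * 9.8 * 917 * d."""
    return 1.0 + 0.94e-12 * 9.8 * 917.0 * depth_m

for depth in (0.0, 1000.0, 2000.0):
    print(depth, ice_temperature_c(depth), compression_factor(depth))
```

The compression factor stays within a few parts in 100000 of unity over these depths, so most of the depth dependence must come from the other two factors.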
![Page 37: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/37.jpg)
Device enumeration
cuda002:
Found 5 devices, driver 3010, runtime 3010
0(2.0): GeForce GTX 480 1.401 GHz G(1610285056) S(49152) C(65536) R(32768) W(32)
    l0 o1 c0 h1 i0 m15 a512 M(2147483647) T(1024: 1024,1024,64) G(65535,65535,1)
1(1.3): GeForce GTX 295 1.242 GHz G(938803200) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(2147483647) T(512: 512,512,64) G(65535,65535,1)
2(1.3): GeForce GTX 295 1.242 GHz G(939327488) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(2147483647) T(512: 512,512,64) G(65535,65535,1)
3(1.3): GeForce GTX 295 1.242 GHz G(939327488) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(2147483647) T(512: 512,512,64) G(65535,65535,1)
4(1.3): GeForce GTX 295 1.242 GHz G(939327488) S(16384) C(65536) R(16384) W(32)
    l0 o1 c0 h1 i0 m30 a256 M(2147483647) T(512: 512,512,64) G(65535,65535,1)
2 GTX 295 cards, 1 GTX 480 card
PSU
1 and 2
3 and 4
0
nvidia-smi -a
GPU 0:
    Product Name : GeForce GTX 295
    PCI ID : 5eb10de
    Temperature : 68 C
GPU 1:
    Product Name : GeForce GTX 295
    PCI ID : 5eb10de
    Temperature : 73 C
GPU 2:
    Product Name : GeForce GTX 480
    PCI ID : 6c010de
    Temperature : 106 C
GPU 3:
    Product Name : GeForce GTX 295
    PCI ID : 5eb10de
    Temperature : 90 C
GPU 4:
    Product Name : GeForce GTX 295
    PCI ID : 5eb10de
    Temperature : 91 C
0 and 1
3 and 4
2
![Page 38: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/38.jpg)
Fermi vs. Tesla
cudatest: Found 6 devices, driver 2030, runtime 2030
0(1.3): GeForce GTX 295 1.296 GHz G(939261952) S(16384) C(65536) R(16384) W(32)
    l1 o1 c0 h1 i0 m30 a256 M(262144) T(512: 512,512,64) G(65535,65535,1)
tesla: Found 1 devices, driver 3000, runtime 3000
0(1.3): Tesla C1060 1.296 GHz G(4294770688) S(16384) C(65536) R(16384) W(32)
    l1 o1 c0 h1 i0 m30 a256 M(2147483647) T(512: 512,512,64) G(65535,65535,1)
fermi: Found 1 devices, driver 3000, runtime 3000
0(2.0): Tesla C2050 1.147 GHz G(2817982464) S(49152) C(65536) R(32768) W(32)
    l1 o1 c0 h1 i0 m14 a512 M(2147483647) T(1024: 1024,1024,64) G(65535,65535,1)
beta: Found 1 devices, driver 3010, runtime 3010
0(2.0): Tesla C2050 1.147 GHz G(2817982464) S(49152) C(65536) R(32768) W(32)
    l1 o1 c0 h1 i0 m14 a512 M(2147483647) T(1024: 1024,1024,64) G(65535,65535,1)
11: arch=11 make gpu
12: arch=12 make gpu (default/best)
1x: arch=12 make gpu with -ftz=true -prec-div=false -prec-sqrt=false
20: arch=20 make gpu
2x: arch=20 make gpu with -ftz=true -prec-div=false -prec-sqrt=false

Flasher    f2kmuon
35587.8    13797.0
34246.7    10990.8

40989.2    13563.2
40423.2    11514.9

29114.5    11361.4
27343.8    8760.0
27346.6    8755.5

29024.8    11429.2
27631.5    8833.2
27630.9    8833.3
28950.1    9079.5
28955.1    9073.2
![Page 39: IceCube simulation with PPC on GPUs Dmitry Chirkin, UW Madison photon propagation code graphics processing unit](https://reader036.vdocuments.site/reader036/viewer/2022062717/56649e4c5503460f94b4183c/html5/thumbnails/39.jpg)
Other
Consider:
• building production computers with only 2 cards, leaving a space in between
• using 6-core CPUs if paired with 3 GPU cards
• 4-way Tesla GPU-only servers are a possible solution
• consumer GTX cards are much faster than Tesla/Fermi cards
GTX 295 was so far found to be a better choice than GTX 480
• but: no longer available!
Reliability:
• 0.4% loss of advertised capacity in GTX 295 cards
• however: 2 of 3 affected cards were “refurbished”
• do cards deteriorate over time? The failed MPs did not change in ~3 months