
HRL PROPRIETARY
June 18, 2010. Work performed by HRL under DARPA contract HRL0011-09-C-001.

Large Scale Simulations

HRL Shared Software Framework; GPU Computing Cluster

Narayan Srinivasa, Aleksey Nogin


Shared Software Infrastructure

• Infrastructure overview – three aspects:
  – Legal agreement: a limited, LGPL-like agreement
    • The General Public License (GPL) does not permit incorporating HRL code into proprietary programs. Since the HRL code is a subroutine library, it is more useful to permit linking proprietary applications (our partners' code) with the library, which the LGPL allows.
  – Subversion server for sharing code
  – The API and the software itself
• Summary of the latest status:
  – The legal agreement is "stuck" on some technicalities and will take time to resolve
    • In the meantime, we will rely on existing subcontracts for HRL ↔ Sub sharing
  – The Subversion server "ExRep" is fully operational
    • It already contains the HRL Shared Infrastructure code
  – The GPU cluster is fully operational
  – We have ported our infrastructure to GPUs (full 1 ms updates!)
  – Most of the multi-GPU/multi-node code is written
    • Some refactoring of the initialization and "glue" code is still needed


HRL Shared Source Agreement

• Terms (reminder):
  – LGPL-style, but limited to "SyNAPSE Team Members" and "SyNAPSE purposes" only
  – "Shared Source" code can be modified and redistributed to any "SyNAPSE Team Members"
  – Object code has to be accompanied by source, or the source can be placed in the Subversion repository
  – Code for separate pieces that only use the "shared source" infrastructure through its APIs does not have to become part of "shared source"
    • You do not have to release your models to "shared source"
• Currently "stuck" on export-restriction technicalities
  – Will take time to resolve
    • Unfortunately, our legal turnaround is very slow
  – For now, we will rely on existing subcontracts for two-way HRL ↔ Sub sharing
    • Sub ↔ Sub sharing not covered by subcontracts is disabled; instead:
      – We provide a shared area with read-only access for non-HRL people
      – Separate areas for those who want to share with HRL


Subversion Repository

• The Subversion server "ExRep" is fully operational
  – It already contains the HRL Shared Infrastructure code
• You have to agree to the ExRep Terms and Conditions to get access
  – This is not SyNAPSE-specific; it is separate from the subcontracts and the Shared Source Agreement
  – The agreement binds you as an ExRep user, not your institution
    • E.g., you promise not to share your account credentials with others
  – Aleksey emailed all prospective users a copy of the agreement
    • You need to send Aleksey an email stating that you agree
• SSH public keys are used to grant access
  – Aleksey has emailed all prospective users instructions
  – You need to email Aleksey a copy of your public key
• ExRep is capable of sending email notifications for all commits
  – We are waiting on IT to allow outgoing emails to non-HRL accounts


GPU-Based High Performance Computing Cluster

• HRL has purchased a high-performance computing cluster at no cost to DARPA
  – The SyNAPSE project will be the primary user
  – Head node:
    • 2 of: NVIDIA Tesla C1060 GPUs, each with:
      – 933 GFLOPS peak performance
      – 4 GB of GDDR3 memory at 102 GB/s
      – PCIe 2.0 x16 interconnect (16 GB/s)
    • 48 GB RAM
    • 2 of: 4-core Nehalem 2.66 GHz CPUs (64-bit)
    • 11 TB of HDDs (RAID configuration, 8.5 TB usable)
  – 91 compute nodes, each with:
    • 2 of: NVIDIA Tesla M1060 GPUs
    • 12 GB RAM
    • 2 of: 4-core Nehalem 2.26 GHz CPUs (64-bit)
  – High-speed 20 Gbps InfiniBand interconnect
  – 1 Gbps Ethernet switch
• The cluster is now fully operational (rough aggregate numbers below)
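For a rough sense of scale, the aggregate GPU count and peak throughput work out as follows (my arithmetic, assuming the Tesla M1060s in the compute nodes match the C1060's 933 GFLOPS peak, which the slide does not state):

\[
(2 + 91 \times 2)\ \text{GPUs} \times 933\ \text{GFLOPS/GPU} = 184 \times 933\ \text{GFLOPS} \approx 172\ \text{TFLOPS (peak)}
\]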


GPU Cluster – InfiniBand Fabric

[Diagram: 96-port fast InfiniBand fabric – six 36-port switches, each serving 16 compute nodes at 20 Gbps, with 4 x 40 Gbps links between switches.]

• Switches run at 40 Gbps
• Interface cards run at 20 Gbps
• Every two switches are connected at 160 Gbps (4 x 40 Gbps links)


GPU and multi-GPU code

• We have ported our infrastructure to GPUs
  – Full 1 ms updates; we do not have to rely on UCI's 1 s batching
    • A closer match to CPU simulations and to hardware
  – Axonal delays are not implemented
  – An artificial "80%/20%" uniformly connected network:
    • 10^5 neurons, 10^7 synapses @ 10 Hz – runs in real time (see the arithmetic below)
  – A 2D, 2-layer random Gaussian connectivity network:
    • 0.3x10^5 neurons, 0.8x10^7 synapses @ 10 Hz – 3.2x faster than real time
  – Generic experiment code runs the same on CPU or GPU, based on a compilation flag in a configuration file
• We have mostly implemented an MPI-based framework:
  – Running on multiple GPUs, multiple CPUs, or even a mix of the two
  – The initialization code needs to be rewritten to work with MPI
  – The API for specifying the experiments needs to be updated to work with the new code
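To put the "runs in real time" figure in perspective, here is a back-of-the-envelope event count for the 80%/20% network (my arithmetic, not from the slide):

\[
10^{5}\ \text{neurons} \times 10\ \text{Hz} = 10^{6}\ \text{spikes/s},
\qquad
10^{6}\ \text{spikes/s} \times \frac{10^{7}\ \text{synapses}}{10^{5}\ \text{neurons}} = 10^{8}\ \text{synaptic events/s}
\]

So a real-time run at 1 ms resolution handles roughly 10^5 synaptic events plus 10^5 neuron-state updates per simulation step.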


Shared Simulation & Experimentation Infrastructure

For each experiment, a custom binary is compiled from 4 components:

  Network
    • Code: creating a description of the neural network to be simulated (connectivity, parameters, etc.)
    • Glue: PyNN-style C++ API; translation code
  Inputs
    • Code: generating the input signals for the network, or taking the input signals from the virtual environment
    • Glue: C++ API
  Computation
    • Code: simulating the spiking neural network on a CPU, GPU, or a cluster; may have experiment-specific compilation options
    • Glue: C++ API; build scripts
  Analysis
    • Code: printing experiment-specific and generic statistics during the simulation; saving synaptic weights and/or spike trains for off-line analysis
    • Glue: C++ APIs (on-line and off-line)

• Portions of the code will be experiment-specific
• Portions of the code will be provided by the shared infrastructure


Neural Networks – Levels of Flexibility

Currently we support three different levels of flexibility (sketched in the code below):
  – Per-simulation: compile-time switches and compile-time global constants defined in build scripts (including "experiment definition files"). Fastest and most efficient, but least flexible.
  – Per-neuron: including defining properties of synapses as properties of the pre- or post-synaptic neurons.
  – Per-synapse: memory-intensive; we would like to avoid it.

In general, we prefer the least flexibility that we can get away with.

The simulator may support features that are not (yet?) expected to be included in hardware, but we have to be careful.
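As an illustration of the memory trade-off behind these three levels, here is a minimal, self-contained C++ sketch; it is not the HRL framework's code, and every name in it is hypothetical.

#include <cstdint>
#include <vector>

// Per-simulation: a compile-time constant (e.g. injected by the build scripts
// via -DSTDP_TAU_PLUS=20.0f). Costs no per-element storage and lets the
// compiler fold the value directly into the code paths.
#ifndef STDP_TAU_PLUS
#define STDP_TAU_PLUS 20.0f
#endif

// Per-neuron: one copy of each parameter per neuron.
struct NeuronParams {
    float a, b, c, d;   // Izhikevich parameters
    bool  inhibitory;   // synapse property stored on the pre-synaptic neuron
};

// Per-synapse: one copy per synapse -- the memory-intensive case the slide
// would like to avoid.
struct SynapseState {
    float    weight;
    uint32_t post;      // index of the post-synaptic neuron
};

int main() {
    // Sizes from the GPU benchmark slide: 1e5 neurons, 1e7 synapses.
    std::vector<NeuronParams> neurons(100000);     // ~2 MB
    std::vector<SynapseState> synapses(10000000);  // ~80 MB -- dominates memory
    return 0;
}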


Neural Model Flexibility

Per-simulation vs. per-neuron vs. per-synapse settings:

  Neuron model
    • Per-simulation: LIF or Izhikevich
    • Per-neuron: Izhikevich a, b, c, d (see the equations below)
  Synapse model
    • Per-simulation:
      – Enabled or not: inhibitory STDP, short-term plasticity, weighted STDP
      – Output: instantaneous or exponential decay
      – Parameters: STDP (A+, A-, t+, t-), max weight, STP, inhibitory STDP, etc.
    • Per-neuron:
      – Whether or not plastic (post & pre; plastic when both say "yes")
      – Whether or not inhibitory (pre)
    • Per-synapse:
      – Parameters will be made per-neuron as needed
  Axonal delays
    • (May decide not to support)
  External inputs
    • Spike trains ("dummy neurons")
    • Current injection
  Off-line data (what to collect)
    • Spike trains (new)
    • Synaptic weights
  Outputs
    • Spike trains
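For reference, the a, b, c, d parameters and the A+, A-, t+, t- STDP parameters above presumably refer to the standard Izhikevich neuron model and the standard pair-based STDP window; the slide does not spell these out, so the equations below are the textbook forms:

\[
\dot{v} = 0.04v^{2} + 5v + 140 - u + I, \qquad \dot{u} = a(bv - u),
\qquad \text{if } v \ge 30\ \text{mV: } v \leftarrow c,\ u \leftarrow u + d
\]

\[
\Delta w(\Delta t) =
\begin{cases}
A_{+}\,e^{-\Delta t/\tau_{+}}, & \Delta t > 0 \ \text{(pre before post)}\\
-A_{-}\,e^{\Delta t/\tau_{-}}, & \Delta t < 0
\end{cases}
\qquad \Delta t = t_{\text{post}} - t_{\text{pre}}
\]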


API Overview

[Diagram: API/control dependencies between the components (not data flow); the Virtual Environment is optional.]

• BuildNetwork – incremental construction of neural networks; driven by the user's code for constructing a network
• Network – immutable portion of the network state (connectivity, parameters)
• State – mutable portion of the network state (weights, statistics)
• Compute – executes the simulation steps, with CPU and CUDA back-ends
• InputGen – call-back functions to fill in input spike trains and/or currents (users extend, if needed; see the sketch below)
• Statistics – at regular intervals, saves data for future analysis and prints basic stats (users extend, if needed)
• Experiment – controls the computation
• Main – the top-level driver, optionally connected to a Virtual Environment
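The InputGen box above is a call-back interface; as a concrete but hypothetical illustration of that pattern, here is a small, self-contained C++ sketch in which a stand-in for the Experiment/Compute loop calls user code at every 1 ms step to fill in input currents. None of the names below are the actual HRL API.

#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical callback type: given the current step, fill in the injected
// current for each input neuron (the real API may use spike trains instead).
using InputCallback = std::function<void(int step, std::vector<float>& current)>;

// Stand-in for the Experiment/Compute loop: calls the user callback at each
// 1 ms step, then simply reports the total injected current.
void RunSteps(int num_steps, int num_inputs, const InputCallback& generate) {
    std::vector<float> current(num_inputs, 0.0f);
    for (int step = 0; step < num_steps; ++step) {
        generate(step, current);
        float total = 0.0f;
        for (float c : current) total += c;
        std::printf("step %d: total input current %.1f\n", step, total);
    }
}

int main() {
    // User-provided input generator: a brief current pulse to input neuron 0.
    RunSteps(5, 10, [](int step, std::vector<float>& current) {
        current.assign(current.size(), 0.0f);
        if (step < 2) current[0] = 5.0f;
    });
    return 0;
}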


Building Networks Incrementally: API Fragments (Simplified)

struct NeuronKind {
    NeuronKind SetInhibitory(bool inhibitory = true);
    NumberGen a, b, c, d;  // Izhikevich parameters – constant, or probability distribution parameters
};

class BuildNetwork {
public:
    // Add a new set of neurons to the network
    Population NewPopulation(int size, const NeuronKind& neuron);
};

struct SynapseKind {
    NumberGen weight;  // Initial weight
    NumberGen delay;   // Axonal delay
};

class Population {
public:
    // New synapses to a different population. Each call returns the number of synapses created.
    int ConnectFull(Population& to, const SynapseKind& synapse);
    int Connect1to1(Population& to, const SynapseKind& synapse);
    int ConnectRandom(Population& to, float probability, const SynapseKind& synapse);
    int ConnectGauss(Population& to, float max_probability, float expected_inputs, const SynapseKind& synapse);
    int ConnectFixedPreNum(Population& to, float n, const SynapseKind& synapse);
};


Building Networks Incrementally: Example

BuildNetwork build;

Population excitatory = build.NewPopulation(800, NeuronKind());
Population inhibitory = build.NewPopulation(200, NeuronKind().SetInhibitory());

// 20% connection probability, default synapse parameters
excitatory.ConnectRandom(excitatory, 0.2, SynapseKind());  // E -> E
excitatory.ConnectRandom(inhibitory, 0.2, SynapseKind());  // E -> I
inhibitory.ConnectRandom(excitatory, 0.2, SynapseKind());  // I -> E
inhibitory.ConnectRandom(inhibitory, 0.2, SynapseKind());  // I -> I
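A possible variant of the same example with explicit synapse parameters, using the ConnectGauss call from the API fragment above; the slides do not show NumberGen's constructors, so the assumption that it can be assigned a plain constant is mine:

// Hypothetical extension: explicit initial weight and axonal delay.
// Assumes NumberGen accepts assignment from a constant (not shown on the API slide).
SynapseKind exc_syn;
exc_syn.weight = 6.0f;  // initial weight
exc_syn.delay  = 1.0f;  // axonal delay (ms)

// Gaussian connectivity: at most 30% connection probability,
// ~100 expected inputs per target neuron.
excitatory.ConnectGauss(inhibitory, 0.3f, 100.0f, exc_syn);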


Overview of the Shared Framework Code

• Major components:
  – The simulator and related glue code (meant to be immutable)
  – The mk/config file, which selects the main parts:
    • Which experiment to run (some experiments have variants)
    • Which computation engine to use (cpu or cuda)
    • Which communication engine to use (null or mpi)
  – Experiment definition files (roughly one per experiment):
    • Define the per-simulation parameters
    • Specify which files contain the experiment code modules
  – Experiment code:
    • May be split into several files
    • Pieces of experiment code can be reused in different experiments
  – Analyzers for off-line data analysis
    • Generic and experiment-specific
• The code in ExRep contains the complete simulator, plus several sample experiments and analyzers.


ExRep Directory Structure

• ExRep repository root: svn+ssh://[email protected]/exrep/
  – Note: you do not have access to the ExRep root, only to some particular subdirectories
    • Many versions of Subversion have a problem with this; make sure to use svn version 1.6.11 – this latest version fixes some bugs related to this scenario
  – SyNAPSE area in ExRep: …/CRAD/SyNAPSE/
    • The SyNAPSE Shared Area is a subdirectory of …/SyNAPSE: …/Code/Shared
      – It is mentioned by name in the Shared Source Agreement
      – Right now you will only get read access – and only to this subdirectory
    • Other subdirectories of the SyNAPSE directory will be created as needed


SyNAPSE Shared Framework Directory Structure

Under svn+ssh://[email protected]/exrep/CRAD/SyNAPSE/Code/Shared/:
  – The OMake subdirectory contains some generic build scripts for the OMake build tool – CUDA, MPI, etc.
  – The Sim subdirectory contains the framework itself, with subdirectories:
    • hrlsim – core framework, C++ code and headers
      – hrlsim/config.h – generated by the build process; summarizes all the per-simulation parameters (with comments) – more on the next slide
    • mk – core framework, build scripts
      – mk/config – global configuration file for the build (not in ExRep; created on the first invocation of the build tool)
      – mk/compute-consts.om – default simulation parameters
    • sample_exp – sample simulation experiments and helper/template code
      – …/mk/*.exp – experiment definition files
      – …/src/ – C++ source files for experiments: setting up a network, generating inputs, printing extra statistics in on-line mode
      – …/analyzers/ – off-line analysis templates and samples (C++)
      – …/scripts/ – shell/Python scripts for follow-up analysis and visualization
    • Data – directory for temporary off-line data (weights, spikes, etc.)


Running an Existing Experiment

• Once:
  – Download the OMake build tool from http://omake.metaprl.org/
    • We will probably need to release an updated version soon
  – Go to the Sim directory
  – Run "omake" – this creates a default mk/config file
• Edit the mk/config file
  – It has several configuration variables, each fully commented
    • Which experiment, which computation engine, initial RNG seed, etc.
  – The file is re-created by OMake on every run
    • Only value changes for existing variables are allowed/preserved
  – The list of valid experiments is generated from the experiment definition files (sample_exp/mk/*.exp)
• Run "omake" to build the custom simulator
  – This generates the ./sim or ./sim-cuda binary
  – It generates hrlsim/config.h in the process
    • A useful summary of the per-simulation parameters
  – It also builds all applicable analyzers
• Run the custom simulator: "./sim N" (or "./sim-cuda N")
  – Where "N" is the simulation duration in virtual seconds
  – "N" can be omitted when the experiment definition file gives a default duration


Defining a New Experiment

• Create a Sim/private_exp directory
  – With subdirectories following the structure of sample_exp
• Create a new experiment definition file
  – It needs to go into Sim/private_exp/mk/
  – With a .exp extension
  – Use an existing sample file as a template
• Create the C++ code
  – It needs to go into Sim/private_exp/src/
  – The experiment definition file should list all the .cpp files you are using – from either private_exp/src or sample_exp/src
• Proceed as described on the previous slide
  – After you create your new experiment definition file and run "omake" for the first time, the list of available experiments in mk/config will include your new experiment