FPGA Serial Channels

For the general FPGA usage on Noctua visit the main documentation page.

All FPGA boards on Noctua offer 4 point-to-point connections to other FPGA boards when the node is configured with a p520_max_sg280l BSP. In the OpenCL environment, these connections are used as external serial channels. A status register value of 0xfff11ff1 in the diagnose output indicates an active connection. The topology of these connections is fully configurable for each job allocation. You can either select one of several predefined topologies with a shorthand notation like --fpgalink="pair", or provide a series of individual connection descriptions like --fpgalink="n00:acl0:ch0-n01:acl0:ch0".

Editor with GUI

The required notation to configure custom point-to-point connections can be generated with the FPGA-Link GUI.

Custom topologies

The notation nXX:aclY:chZ describes a unique serial channel endpoint within a job allocation according to the following pattern:

  • nXX, e.g. n02, specifies the node ID within your allocation. IDs start with n00 for the first node, so n02 specifies the third node of your allocation. Node IDs must be lower than the number of nodes requested for the allocation. At allocation time, the node ID is translated to a concrete node name, e.g. fpga-0008.
  • aclY, i.e. acl0 or acl1, describes the first or second FPGA board within each node.
  • chZ, i.e. ch0, ch1, ch2 or ch3, describes one of the 4 external channel connections of each board.

By specifying one unique pair of serial channel endpoints per --fpgalink argument, an arbitrary topology can be created within a job allocation. When the task starts, the topology will be summarized and for each fpgalink, an environment variable will be exported.
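As a minimal sketch of the notation above, the following Python helper checks endpoint pairs and formats one --fpgalink argument per pair. The function name fpgalink_args and the assumption that node IDs always have exactly two digits are illustrative and not part of the Noctua tooling.

```python
import re

# Endpoint pattern from the text: nXX:aclY:chZ with 2 boards and 4 channels.
# Assumption: node IDs are always written with two digits (n00, n01, ...).
ENDPOINT = re.compile(r"n(\d{2}):acl([01]):ch([0-3])")

def fpgalink_args(pairs, num_nodes):
    """Return one --fpgalink argument per endpoint pair, after basic checks."""
    seen = set()
    args = []
    for a, b in pairs:
        for ep in (a, b):
            m = ENDPOINT.fullmatch(ep)
            if m is None:
                raise ValueError(f"malformed endpoint: {ep}")
            if int(m.group(1)) >= num_nodes:
                raise ValueError(f"node ID outside the allocation: {ep}")
            if ep in seen:
                raise ValueError(f"endpoint used twice: {ep}")
            seen.add(ep)
        args.append(f'--fpgalink="{a}-{b}"')
    return args

# One node, acl0:chN connected to acl1:chN for all four channels.
pairs = [(f"n00:acl0:ch{c}", f"n00:acl1:ch{c}") for c in range(4)]
print(" ".join(fpgalink_args(pairs, num_nodes=1)))
```

The printed arguments can be pasted into an srun or sbatch command line. The checks only mirror the rules stated above; the scheduler remains the authority on what is accepted.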

The following example uses one node n00 and connects all four channels from the first FPGA board acl0 to the four channels of the second FPGA board acl1 (see figure). The custom topology example can be directly used in the FPGA-Link GUI using this link.

Custom.svg
srun -A pc2-mitarbeiter --constraint=19.2.0_max -N 1 --fpgalink="n00:acl0:ch0-n00:acl1:ch0" --fpgalink="n00:acl0:ch1-n00:acl1:ch1" --fpgalink="n00:acl0:ch2-n00:acl1:ch2" --fpgalink="n00:acl0:ch3-n00:acl1:ch3" -p fpga --pty bash
...
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
fpga-0004
Generated connections
FPGALINK0=fpga-0004:acl0:ch0-fpga-0004:acl1:ch0
FPGALINK1=fpga-0004:acl0:ch1-fpga-0004:acl1:ch1
FPGALINK2=fpga-0004:acl0:ch2-fpga-0004:acl1:ch2
FPGALINK3=fpga-0004:acl0:ch3-fpga-0004:acl1:ch3
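A host program may want to read these variables back. The following Python sketch collects all FPGALINKn variables into a dictionary; the helper name read_fpgalinks is hypothetical, and the value format is assumed to be exactly the one shown in the output above.

```python
import os
import re

def read_fpgalinks(env=None):
    """Collect FPGALINK<n> variables as {n: ((host, acl, ch), (host, acl, ch))}."""
    env = os.environ if env is None else env
    links = {}
    for name, value in env.items():
        m = re.fullmatch(r"FPGALINK(\d+)", name)
        if m is None:
            continue
        # Host names contain a hyphen (fpga-0004), so split on the hyphen
        # that directly follows a channel name, not on the first hyphen.
        left, right = re.split(r"(?<=:ch\d)-", value, maxsplit=1)
        links[int(m.group(1))] = (tuple(left.split(":")), tuple(right.split(":")))
    return links

# Sample value copied from the output above; pass no argument to read os.environ.
sample = {"FPGALINK0": "fpga-0004:acl0:ch0-fpga-0004:acl1:ch0"}
print(read_fpgalinks(sample))
```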

We recommend using srun or sbatch, because this information is not shown automatically when using salloc (the configuration itself still works). When using salloc, you can still recover the information and set up your environment variables by invoking

source /opt/cray/slurm/default/etc/scripts/SAllocTopologyInfo.sh

Predefined topologies

As it can be tedious and error-prone to define each connection manually, we also provide a set of predefined topologies that can be requested. The following table summarizes the available options.

Topology type  Invocation            Min-max number of nodes  Brief description
pair           --fpgalink="pair"     1-N                      Pairwise connect the 2 FPGAs within each node
clique         --fpgalink="clique"   2                        All-to-all connection for 2 nodes, 4 FPGAs
ring           --fpgalink="ringO"    1-N                      Ring with two links per direction, acl0 down, acl1 up
ring           --fpgalink="ringN"    1-N                      Ring with two links per direction, acl0 down, acl1 down
ring           --fpgalink="ringZ"    1-N                      Ring with two links per direction, acl0 and acl1 neighbors
torus          --fpgalink="torus2"   1-N                      Torus with 2 FPGAs per row
torus          --fpgalink="torus3"   2-N                      Torus with 3 FPGAs per row
torus          --fpgalink="torus4"   2-N                      Torus with 4 FPGAs per row
torus          --fpgalink="torus5"   3-N                      Torus with 5 FPGAs per row
torus          --fpgalink="torus6"   3-N                      Torus with 6 FPGAs per row

Pair topology

Within each node, all channels of one FPGA board are connected to the respective channel of the other FPGA board. No connections between nodes are made.

The following example uses three nodes n00-n02 and connects within each node all four channels from the first FPGA board acl0 to the four channels of the second FPGA board acl1 (see figure). The pair topology example can be directly used in the FPGA-Link GUI using this link.

Pair.svg
srun -p fpga -A pc2-mitarbeiter --constraint=19.2.0_max -N 3 --fpgalink=pair --pty bash
...
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
fpga-0001
fpga-0002
fpga-0003
Pair topology
Generated connections
FPGALINK0=fpga-0001:acl0:ch0-fpga-0001:acl1:ch0
FPGALINK1=fpga-0001:acl0:ch1-fpga-0001:acl1:ch1
FPGALINK2=fpga-0001:acl0:ch2-fpga-0001:acl1:ch2
FPGALINK3=fpga-0001:acl0:ch3-fpga-0001:acl1:ch3
FPGALINK4=fpga-0002:acl0:ch0-fpga-0002:acl1:ch0
FPGALINK5=fpga-0002:acl0:ch1-fpga-0002:acl1:ch1
FPGALINK6=fpga-0002:acl0:ch2-fpga-0002:acl1:ch2
FPGALINK7=fpga-0002:acl0:ch3-fpga-0002:acl1:ch3
FPGALINK8=fpga-0003:acl0:ch0-fpga-0003:acl1:ch0
FPGALINK9=fpga-0003:acl0:ch1-fpga-0003:acl1:ch1
FPGALINK10=fpga-0003:acl0:ch2-fpga-0003:acl1:ch2
FPGALINK11=fpga-0003:acl0:ch3-fpga-0003:acl1:ch3
Topology configuration request accepted after 0.297791957855s
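The pair rule is simple enough to reproduce in a few lines. The following Python sketch generates the same connection list as the output above from a host list; pair_links is an illustrative name, not a function of the Noctua tooling.

```python
def pair_links(hosts):
    """Connect acl0:chN to acl1:chN within every node, as described for 'pair'."""
    return [
        f"{h}:acl0:ch{c}-{h}:acl1:ch{c}"
        for h in hosts
        for c in range(4)
    ]

for i, link in enumerate(pair_links(["fpga-0001", "fpga-0002", "fpga-0003"])):
    print(f"FPGALINK{i}={link}")
```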

Clique topology

Within a pair of 2 nodes, each of the 4 FPGAs is connected to all 3 other FPGAs. Channel 0 connects to the same FPGA in the other node, channel 1 to the other FPGA in the same node, and channel 2 to the other FPGA in the other node. In the example output below, channel 3 duplicates the channel 2 connection.

The following example uses two nodes n00-n01 and connects their four FPGA boards all-to-all (see figure). The clique topology example can be directly used in the FPGA-Link GUI using this link.

Clique.svg
srun -p fpga -A pc2-mitarbeiter --constraint=19.2.0_max -N 2 --fpgalink=clique --pty bash
...
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
fpga-0013
fpga-0014
Clique topology
Generated connections
FPGALINK0=fpga-0013:acl0:ch0-fpga-0014:acl0:ch0
FPGALINK1=fpga-0013:acl1:ch0-fpga-0014:acl1:ch0
FPGALINK2=fpga-0013:acl0:ch1-fpga-0013:acl1:ch1
FPGALINK3=fpga-0014:acl0:ch1-fpga-0014:acl1:ch1
FPGALINK4=fpga-0013:acl0:ch2-fpga-0014:acl1:ch2
FPGALINK5=fpga-0013:acl1:ch2-fpga-0014:acl0:ch2
FPGALINK6=fpga-0013:acl0:ch3-fpga-0014:acl1:ch3
FPGALINK7=fpga-0013:acl1:ch3-fpga-0014:acl0:ch3
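The channel assignment of the clique can be sketched in Python as follows. The function name clique_links is hypothetical, and the handling of channel 3 is inferred from the output above rather than from a specification.

```python
def clique_links(a, b):
    """All-to-all links between the 4 FPGAs of nodes a and b."""
    links = []
    # ch0: same board index in the other node
    links += [f"{a}:acl{i}:ch0-{b}:acl{i}:ch0" for i in (0, 1)]
    # ch1: other board in the same node
    links += [f"{h}:acl0:ch1-{h}:acl1:ch1" for h in (a, b)]
    # ch2 and ch3: other board in the other node (ch3 as seen in the output)
    for ch in (2, 3):
        links += [f"{a}:acl{i}:ch{ch}-{b}:acl{1 - i}:ch{ch}" for i in (0, 1)]
    return links

for i, link in enumerate(clique_links("fpga-0013", "fpga-0014")):
    print(f"FPGALINK{i}={link}")
```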

Ring topology

This setup puts all FPGAs in a ring topology that defines for each FPGA the neighbor FPGAs "north" and "south". It connects each FPGA's channels 0 and 2 to the "north" direction and channels 1 and 3 to the "south" direction. Thus, the local perspective for each FPGA within the topology is:

// local view from FPGA "local" to neighbors "north" and "south"
// ch0 and ch2 connect to neighbor "north"
local:ch0 <-> north:ch1
local:ch2 <-> north:ch3
// ch1 and ch3 connect to neighbor "south"
local:ch1 <-> south:ch0
local:ch3 <-> south:ch2

Three different variants define how the FPGAs are arranged into the ring

// --fpgalink="ringO"
// ringO, going down in acl0 column and back up in acl1 column
// Column from north to south, end connected back to start
fpga-0001:acl0
fpga-0002:acl0
fpga-0003:acl0
fpga-0004:acl0
fpga-0004:acl1
fpga-0003:acl1
fpga-0002:acl1
fpga-0001:acl1
// --fpgalink="ringN"
// ringN, going down in acl0 column then down in acl1 column
// Column from north to south, end connected back to start
fpga-0001:acl0
fpga-0002:acl0
fpga-0003:acl0
fpga-0004:acl0
fpga-0001:acl1
fpga-0002:acl1
fpga-0003:acl1
fpga-0004:acl1
// --fpgalink="ringZ"
// ringZ, going down through nodes, zigzagging between acl0 and acl1
// Column from north to south, end connected back to start
fpga-0001:acl0
fpga-0001:acl1
fpga-0002:acl0
fpga-0002:acl1
fpga-0003:acl0
fpga-0003:acl1
fpga-0004:acl0
fpga-0004:acl1
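The three orderings differ only in how the acl0 and acl1 columns are combined. A minimal Python sketch, with the illustrative function name ring_order, reproduces the listings above from a host list:

```python
def ring_order(hosts, variant):
    """Return the FPGAs in ring order (north to south) for one ring variant."""
    acl0 = [f"{h}:acl0" for h in hosts]
    acl1 = [f"{h}:acl1" for h in hosts]
    if variant == "ringO":
        # down the acl0 column, back up the acl1 column
        return acl0 + acl1[::-1]
    if variant == "ringN":
        # down the acl0 column, then down the acl1 column
        return acl0 + acl1
    if variant == "ringZ":
        # alternate between acl0 and acl1 while going down through the nodes
        return [fpga for pair in zip(acl0, acl1) for fpga in pair]
    raise ValueError(f"unknown ring variant: {variant}")

hosts = ["fpga-0001", "fpga-0002", "fpga-0003", "fpga-0004"]
for variant in ("ringO", "ringN", "ringZ"):
    print(variant, ring_order(hosts, variant))
```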

Full example for a ringO with 4 nodes. See this example in the FPGA-Link GUI using this link.

RingO.svg
srun -p fpga -A pc2-mitarbeiter --constraint=19.2.0_max -N 4 --fpgalink=ringO --pty bash
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
fpga-0009
fpga-0010
fpga-0011
fpga-0012
Ring topology information: column from north to south, end connected back to start
fpga-0009:acl0
fpga-0010:acl0
fpga-0011:acl0
fpga-0012:acl0
fpga-0012:acl1
fpga-0011:acl1
fpga-0010:acl1
fpga-0009:acl1
Generated connections
FPGALINK0=fpga-0009:acl0:ch1-fpga-0010:acl0:ch0
FPGALINK1=fpga-0009:acl0:ch3-fpga-0010:acl0:ch2
FPGALINK2=fpga-0010:acl0:ch1-fpga-0011:acl0:ch0
FPGALINK3=fpga-0010:acl0:ch3-fpga-0011:acl0:ch2
FPGALINK4=fpga-0011:acl0:ch1-fpga-0012:acl0:ch0
FPGALINK5=fpga-0011:acl0:ch3-fpga-0012:acl0:ch2
FPGALINK6=fpga-0012:acl0:ch1-fpga-0012:acl1:ch0
FPGALINK7=fpga-0012:acl0:ch3-fpga-0012:acl1:ch2
FPGALINK8=fpga-0012:acl1:ch1-fpga-0011:acl1:ch0
FPGALINK9=fpga-0012:acl1:ch3-fpga-0011:acl1:ch2
FPGALINK10=fpga-0011:acl1:ch1-fpga-0010:acl1:ch0
FPGALINK11=fpga-0011:acl1:ch3-fpga-0010:acl1:ch2
FPGALINK12=fpga-0010:acl1:ch1-fpga-0009:acl1:ch0
FPGALINK13=fpga-0010:acl1:ch3-fpga-0009:acl1:ch2
FPGALINK14=fpga-0009:acl1:ch1-fpga-0009:acl0:ch0
FPGALINK15=fpga-0009:acl1:ch3-fpga-0009:acl0:ch2
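Given a ring ordering, the connection list follows from the local view: each FPGA links its channels 1 and 3 to channels 0 and 2 of its "south" neighbor, and the last FPGA wraps around to the first. The following Python sketch, with the hypothetical function name ring_links, reproduces the ringO output above under that reading.

```python
def ring_links(order):
    """Generate ring connections for FPGAs listed from north to south."""
    links = []
    for i, fpga in enumerate(order):
        south = order[(i + 1) % len(order)]  # last entry wraps to the first
        links.append(f"{fpga}:ch1-{south}:ch0")
        links.append(f"{fpga}:ch3-{south}:ch2")
    return links

hosts = ["fpga-0009", "fpga-0010", "fpga-0011", "fpga-0012"]
# ringO ordering: down the acl0 column, back up the acl1 column
order = [f"{h}:acl0" for h in hosts] + [f"{h}:acl1" for h in reversed(hosts)]
for i, link in enumerate(ring_links(order)):
    print(f"FPGALINK{i}={link}")
```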

Torus topology

This setup puts all FPGAs in a torus topology that defines for each FPGA the neighbor FPGAs "north", "south", "west", "east". It connects each FPGA's channel 0 to the "north" direction, channel 1 to the "south" direction, channel 2 to the "west" direction and channel 3 to the "east" direction. Thus, the local perspective for each FPGA within the topology is:

// local view from FPGA "local" to neighbors "north", "south", "west", "east"
// ch0 connects to neighbor "north"
local:ch0 <-> north:ch1
// ch1 connects to neighbor "south"
local:ch1 <-> south:ch0
// ch2 connects to neighbor "west"
local:ch2 <-> west:ch3
// ch3 connects to neighbor "east"
local:ch3 <-> east:ch2

The torus topology can be instantiated with a configurable width, that is, the number of FPGAs connected in the "west-east" direction. With an odd width, FPGAs in the same node can belong to consecutive rows of the torus. The number of FPGAs used is rounded down to the largest full torus for the given width, so any remaining FPGAs are left out of the torus. The following block illustrates 3 different torus topologies on nodes fpga-[0001-0005]:

// --fpgalink="torus2"
// Torus with width 2 and height 5
// Columns from north to south, rows from west to east, end connected back to start
fpga-0001:acl0 - fpga-0001:acl1
fpga-0002:acl0 - fpga-0002:acl1
fpga-0003:acl0 - fpga-0003:acl1
fpga-0004:acl0 - fpga-0004:acl1
fpga-0005:acl0 - fpga-0005:acl1
 
// --fpgalink="torus3"
// Torus with width 3 and height 3
// Columns from north to south, rows from west to east, end connected back to start
fpga-0001:acl0 - fpga-0001:acl1 - fpga-0002:acl0
fpga-0002:acl1 - fpga-0003:acl0 - fpga-0003:acl1
fpga-0004:acl0 - fpga-0004:acl1 - fpga-0005:acl0
 
// --fpgalink="torus4"
// Torus with width 4 and height 2
// Columns from north to south, rows from west to east, end connected back to start
fpga-0001:acl0 - fpga-0001:acl1 - fpga-0002:acl0 - fpga-0002:acl1
fpga-0003:acl0 - fpga-0003:acl1 - fpga-0004:acl0 - fpga-0004:acl1
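The layouts above can be reproduced by listing all FPGAs in node order and cutting the list into rows of the requested width, dropping any incomplete last row. This is a Python sketch of that reading; torus_grid is an illustrative name.

```python
def torus_grid(hosts, width):
    """Arrange FPGAs into rows of `width`, keeping only complete rows."""
    fpgas = [f"{h}:acl{i}" for h in hosts for i in (0, 1)]
    height = len(fpgas) // width  # rounds down to the largest full torus
    return [fpgas[r * width:(r + 1) * width] for r in range(height)]

hosts = [f"fpga-{n:04d}" for n in range(1, 6)]
for width in (2, 3, 4):
    print(f"torus{width}:")
    for row in torus_grid(hosts, width):
        print("  " + " - ".join(row))
```

With 5 nodes (10 FPGAs), width 3 uses 9 FPGAs and width 4 uses 8.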

Full example for a torus4 with 8 nodes. See this example in the FPGA-Link GUI using this link.

Torus4.svg
srun -p fpga -A pc2-mitarbeiter --constraint=19.2.0_max -N 8 --fpgalink=torus4 --pty bash
...
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
fpga-0001
fpga-0002
fpga-0003
fpga-0004
fpga-0005
fpga-0006
fpga-0007
fpga-0008
Torus topology with width 4 and height 4
Torus topology information: columns from north to south, rows from west to east, end connected back to start
fpga-0001:acl0 - fpga-0001:acl1 - fpga-0002:acl0 - fpga-0002:acl1
fpga-0003:acl0 - fpga-0003:acl1 - fpga-0004:acl0 - fpga-0004:acl1
fpga-0005:acl0 - fpga-0005:acl1 - fpga-0006:acl0 - fpga-0006:acl1
fpga-0007:acl0 - fpga-0007:acl1 - fpga-0008:acl0 - fpga-0008:acl1
Generated connections
FPGALINK0=fpga-0001:acl0:ch1-fpga-0003:acl0:ch0
FPGALINK1=fpga-0001:acl0:ch3-fpga-0001:acl1:ch2
FPGALINK2=fpga-0001:acl1:ch1-fpga-0003:acl1:ch0
FPGALINK3=fpga-0001:acl1:ch3-fpga-0002:acl0:ch2
FPGALINK4=fpga-0002:acl0:ch1-fpga-0004:acl0:ch0
FPGALINK5=fpga-0002:acl0:ch3-fpga-0002:acl1:ch2
FPGALINK6=fpga-0002:acl1:ch1-fpga-0004:acl1:ch0
FPGALINK7=fpga-0002:acl1:ch3-fpga-0001:acl0:ch2
FPGALINK8=fpga-0003:acl0:ch1-fpga-0005:acl0:ch0
FPGALINK9=fpga-0003:acl0:ch3-fpga-0003:acl1:ch2
FPGALINK10=fpga-0003:acl1:ch1-fpga-0005:acl1:ch0
FPGALINK11=fpga-0003:acl1:ch3-fpga-0004:acl0:ch2
FPGALINK12=fpga-0004:acl0:ch1-fpga-0006:acl0:ch0
FPGALINK13=fpga-0004:acl0:ch3-fpga-0004:acl1:ch2
FPGALINK14=fpga-0004:acl1:ch1-fpga-0006:acl1:ch0
FPGALINK15=fpga-0004:acl1:ch3-fpga-0003:acl0:ch2
FPGALINK16=fpga-0005:acl0:ch1-fpga-0007:acl0:ch0
FPGALINK17=fpga-0005:acl0:ch3-fpga-0005:acl1:ch2
FPGALINK18=fpga-0005:acl1:ch1-fpga-0007:acl1:ch0
FPGALINK19=fpga-0005:acl1:ch3-fpga-0006:acl0:ch2
FPGALINK20=fpga-0006:acl0:ch1-fpga-0008:acl0:ch0
FPGALINK21=fpga-0006:acl0:ch3-fpga-0006:acl1:ch2
FPGALINK22=fpga-0006:acl1:ch1-fpga-0008:acl1:ch0
FPGALINK23=fpga-0006:acl1:ch3-fpga-0005:acl0:ch2
FPGALINK24=fpga-0007:acl0:ch1-fpga-0001:acl0:ch0
FPGALINK25=fpga-0007:acl0:ch3-fpga-0007:acl1:ch2
FPGALINK26=fpga-0007:acl1:ch1-fpga-0001:acl1:ch0
FPGALINK27=fpga-0007:acl1:ch3-fpga-0008:acl0:ch2
FPGALINK28=fpga-0008:acl0:ch1-fpga-0002:acl0:ch0
FPGALINK29=fpga-0008:acl0:ch3-fpga-0008:acl1:ch2
FPGALINK30=fpga-0008:acl1:ch1-fpga-0002:acl1:ch0
FPGALINK31=fpga-0008:acl1:ch3-fpga-0007:acl0:ch2
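The connection list above can be derived from the grid and the local view: every FPGA links channel 1 to channel 0 of its "south" neighbor and channel 3 to channel 2 of its "east" neighbor, with wrap-around in both directions. The following Python sketch uses the hypothetical function name torus_links and builds the torus4 grid for 8 nodes inline.

```python
def torus_links(grid):
    """Generate torus connections for a grid of FPGAs (rows west to east)."""
    height, width = len(grid), len(grid[0])
    links = []
    for r in range(height):
        for c in range(width):
            fpga = grid[r][c]
            south = grid[(r + 1) % height][c]  # wraps from last row to first
            east = grid[r][(c + 1) % width]    # wraps from last column to first
            links.append(f"{fpga}:ch1-{south}:ch0")
            links.append(f"{fpga}:ch3-{east}:ch2")
    return links

hosts = [f"fpga-{n:04d}" for n in range(1, 9)]
fpgas = [f"{h}:acl{i}" for h in hosts for i in (0, 1)]
grid = [fpgas[r * 4:(r + 1) * 4] for r in range(4)]  # width 4, height 4
for i, link in enumerate(torus_links(grid)):
    print(f"FPGALINK{i}={link}")
```

Generating only the "south" and "east" links is sufficient, because each "north" and "west" link is the same connection seen from the neighbor.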

Legacy topology setup

The following setup has been completely replaced by the user-configurable, job-specific topologies. It is kept here only as a reference for earlier measurements.

Some FPGA boards are connected with direct point-to-point connections that are abstracted in the OpenCL environment as serial channels. A status register value of 0xfff11ff1 in the diagnose output indicates an active connection. The available connections are documented as

<nodename>:<devicename>:<channelname> 
with 
<nodename>    in fpga-0001 -- fpga-0016
<devicename>  in acl0, acl1
<channelname> in ch0, ch1, ch2, ch3

Islands with 4 FPGAs

fpga-0010 + fpga-0011

// four FPGAs with all-to-all connections
// ch0 realizes vertical connections
fpga-0010:acl0:ch0 <-> fpga-0011:acl0:ch0
fpga-0010:acl1:ch0 <-> fpga-0011:acl1:ch0
// ch1 realizes horizontal connections
fpga-0010:acl0:ch1 <-> fpga-0010:acl1:ch1
fpga-0011:acl0:ch1 <-> fpga-0011:acl1:ch1
// ch2 realizes diagonal connections
fpga-0010:acl0:ch2 <-> fpga-0011:acl1:ch2
fpga-0010:acl1:ch2 <-> fpga-0011:acl0:ch2

fpga-0015 + fpga-0016

// four FPGAs with all-to-all connections
// ch0 realizes vertical connections
fpga-0015:acl0:ch0 <-> fpga-0016:acl0:ch0
fpga-0015:acl1:ch0 <-> fpga-0016:acl1:ch0
// ch1 realizes horizontal connections
fpga-0015:acl0:ch1 <-> fpga-0015:acl1:ch1
fpga-0016:acl0:ch1 <-> fpga-0016:acl1:ch1
// ch2 realizes diagonal connections
fpga-0015:acl0:ch2 <-> fpga-0016:acl1:ch2
fpga-0015:acl1:ch2 <-> fpga-0016:acl0:ch2

Nodes with internal connections of 2 FPGAs

fpga-0012

// four connections from one board to the other
fpga-0012:acl0:ch0 <-> fpga-0012:acl1:ch0
fpga-0012:acl0:ch1 <-> fpga-0012:acl1:ch1
fpga-0012:acl0:ch2 <-> fpga-0012:acl1:ch2
fpga-0012:acl0:ch3 <-> fpga-0012:acl1:ch3

fpga-0013

// four connections from one board to the other
fpga-0013:acl0:ch0 <-> fpga-0013:acl1:ch0
fpga-0013:acl0:ch1 <-> fpga-0013:acl1:ch1
fpga-0013:acl0:ch2 <-> fpga-0013:acl1:ch2
fpga-0013:acl0:ch3 <-> fpga-0013:acl1:ch3

fpga-0014

// two connections from one board to the other
fpga-0014:acl0:ch0 <-> fpga-0014:acl1:ch0
fpga-0014:acl0:ch3 <-> fpga-0014:acl1:ch3


A torus connecting 5 nodes and 10 FPGAs

14 August 2019: Torus updated from 4 to 5 nodes

The topology forms 2 columns that span all nodes and 5 rows that connect the FPGAs within each node.

fpga-0005:acl0 - fpga-0005:acl1
fpga-0006:acl0 - fpga-0006:acl1
fpga-0007:acl0 - fpga-0007:acl1
fpga-0008:acl0 - fpga-0008:acl1
fpga-0009:acl0 - fpga-0009:acl1
// e.g. 
// the "north" neighbor of fpga-0005:acl0 is fpga-0009:acl0 (wrap around)
// the "south" neighbor of fpga-0005:acl0 is fpga-0006:acl0
// the "west" neighbor of fpga-0005:acl0 is fpga-0005:acl1 (wrap around)
// the "east" neighbor of fpga-0005:acl0 is fpga-0005:acl1

The local view of connections as seen from a local FPGA is as follows.

// local view from FPGA "local" to neighbors "north", "south", "west", "east"
// ch0 connects to neighbor "north"
local:ch0 <-> north:ch1
// ch1 connects to neighbor "south"
local:ch1 <-> south:ch0
// ch2 connects to neighbor "west"
local:ch2 <-> west:ch3
// ch3 connects to neighbor "east"
local:ch3 <-> east:ch2

The complete set of connections is as follows

fpga-0005:acl0:ch1 <-> fpga-0006:acl0:ch0
fpga-0005:acl0:ch3 <-> fpga-0005:acl1:ch2
fpga-0005:acl1:ch1 <-> fpga-0006:acl1:ch0
fpga-0005:acl1:ch3 <-> fpga-0005:acl0:ch2
fpga-0006:acl0:ch1 <-> fpga-0007:acl0:ch0
fpga-0006:acl0:ch3 <-> fpga-0006:acl1:ch2
fpga-0006:acl1:ch1 <-> fpga-0007:acl1:ch0
fpga-0006:acl1:ch3 <-> fpga-0006:acl0:ch2
fpga-0007:acl0:ch1 <-> fpga-0008:acl0:ch0
fpga-0007:acl0:ch3 <-> fpga-0007:acl1:ch2
fpga-0007:acl1:ch1 <-> fpga-0008:acl1:ch0
fpga-0007:acl1:ch3 <-> fpga-0007:acl0:ch2
fpga-0008:acl0:ch1 <-> fpga-0009:acl0:ch0
fpga-0008:acl0:ch3 <-> fpga-0008:acl1:ch2
fpga-0008:acl1:ch1 <-> fpga-0009:acl1:ch0
fpga-0008:acl1:ch3 <-> fpga-0008:acl0:ch2
fpga-0009:acl0:ch1 <-> fpga-0005:acl0:ch0
fpga-0009:acl0:ch3 <-> fpga-0009:acl1:ch2
fpga-0009:acl1:ch1 <-> fpga-0005:acl1:ch0
fpga-0009:acl1:ch3 <-> fpga-0009:acl0:ch2
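Because every channel endpoint is a physical port, it can appear in at most one connection. Hand-maintained lists like the one above can be checked for that property with a short script. The following Python sketch uses the hypothetical helper name duplicate_endpoints; the third sample line is a deliberately conflicting entry made up for illustration.

```python
from collections import Counter

def duplicate_endpoints(lines):
    """Return endpoints that occur in more than one 'a <-> b' connection line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        a, b = (part.strip() for part in line.split("<->"))
        counts.update((a, b))
    return sorted(ep for ep, n in counts.items() if n > 1)

sample = [
    "fpga-0005:acl0:ch1 <-> fpga-0006:acl0:ch0",
    "fpga-0005:acl0:ch3 <-> fpga-0005:acl1:ch2",
    # hypothetical conflicting entry: fpga-0006:acl0:ch0 is already in use
    "fpga-0007:acl0:ch1 <-> fpga-0006:acl0:ch0",
]
print(duplicate_endpoints(sample))
```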