Free Republic
Browse · Search
General/Chat
Topics · Post Article

Infiniband for the Masses
ClusterMonkey ^ | 27 January 2008 | Jeff Layton

Posted on 01/30/2008 5:09:48 AM PST by ShadowAce

The Linux cluster world is moving towards InfiniBand for many reasons: bandwidth, latency, message rate, N/2, price/performance, and other factors that affect performance and price. But the focus is usually on larger systems, often from more than 64 nodes up to several thousand nodes. The reasons for moving to InfiniBand, particularly performance, are just as valid for smaller clusters, but the economics are not. Basically InfiniBand has been just too expensive for smaller systems and usually has not made sense from a price/performance perspective. But that has just changed...

The Rise of InfiniBand

InfiniBand has made a remarkable rise in performance since its inception. Just a few years ago, Single Data Rate (SDR) InfiniBand was the standard. SDR has a 10Gbit/s signaling rate and about an 8Gbit/s data rate (recall that GigE is 1Gbit/s for signaling and data). Coupled with this high bandwidth were much lower latency and CPU overhead. That performance attracted cluster people to InfiniBand like moths to a flame.

The very first InfiniBand products were pricey. Shortly thereafter, the price started to drop to the point where you could get SDR InfiniBand for less than $1,500 a node (including the HCA or IB card, cable, and switch port costs). Sometimes you could get it for less than $1,000 a node. In short order it became a popular interconnect for clusters.

Not long after SDR was out, Double-Data Rate (DDR) InfiniBand came out. DDR InfiniBand has a 20Gbit/s signaling rate and about a 16Gbit/s data rate. Basically you had twice the bandwidth of SDR. Along with the bandwidth increase came a drop in latency. Initially DDR was priced just a bit above SDR, but it quickly came down to the same price. So now you could get twice the bandwidth and lower latency compared to SDR for less than $1,200 a node. Consequently, SDR all but disappeared.

Recently Mellanox announced that Quad-Data Rate (QDR) InfiniBand silicon for HCAs was available and that silicon for QDR switches would be available soon. QDR InfiniBand has a signaling rate of 40Gbit/s and a data rate of about 32Gbit/s. You should start to see QDR HCAs and switches for purchase in late Q3 or Q4 of this year.

Overall InfiniBand provides performance benefits to many applications, including those that use MPI and also those used in traditional data centers, such as Oracle, VMware, and financial applications. The ever-growing demand for compute capability in those applications drives the growth of InfiniBand.

A Quick Network Comparison

As you are probably aware, the network can have a big impact on code performance, particularly if you are running parallel codes that use MPI (or God help you - PVM). Table One below lists some common publicly reported interconnect characteristics for GigE, low-latency GigE, 10GigE, SDR InfiniBand (two flavors), and DDR InfiniBand (two flavors).

Table One - Common Network Characteristics

Network | Latency (microseconds) | Bandwidth (MBps) | N/2 (bytes)
GigE | ~29-120 | ~125 | ~8,000
Low Latency GigE: GAMMA | ~9.5 (MPI) | ~125 | ~7,600
10 GigE: Chelsio (Copper) | 9.6 | ~862 | ~100,000+
InfiniBand: Mellanox SDR InfiniHost (PCI-X) | 4.1 | 760 | 512
InfiniBand: Mellanox InfiniHost III EX SDR | 2.6 | 938 | 480
InfiniBand: Mellanox InfiniHost III EX DDR | 2.25 | 1502 | 480
InfiniBand: Mellanox ConnectX DDR PCIe Gen2 | 1 | 1880 | 256

I don't want to cover the details of these characteristics in this article (here's an article that might help despite its age). You can see from the table that SDR InfiniBand still has much lower latency and a far smaller N/2 than GigE, low-latency GigE, or even 10GigE, with bandwidth well above GigE and comparable to 10GigE.
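To get a rough feel for what the Table One numbers mean for a single message, here is a minimal sketch that assumes the simplest possible cost model: time equals a fixed latency plus message size divided by bandwidth. That model is an assumption of this example, not something the article or the benchmarks behind the table claim, and real interconnects deviate from it (the N/2 column is measured, not derived this way). The latency and bandwidth figures are copied from Table One, using the low end of the GigE latency range.

```python
# (latency in microseconds, bandwidth in MB/s) taken from Table One.
NETWORKS = {
    "GigE (best case)": (29.0, 125.0),
    "10GigE Chelsio": (9.6, 862.0),
    "IB InfiniHost III EX SDR": (2.6, 938.0),
    "IB ConnectX DDR": (1.0, 1880.0),
}


def transfer_time_us(latency_us, bandwidth_mbps, message_bytes):
    """Estimate one-way message time: fixed latency plus size / bandwidth."""
    # 1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) is microseconds.
    return latency_us + message_bytes / bandwidth_mbps


for name, (latency, bandwidth) in NETWORKS.items():
    small = transfer_time_us(latency, bandwidth, 64)
    large = transfer_time_us(latency, bandwidth, 1_000_000)
    print(f"{name:26s} 64 B: {small:7.2f} us   1 MB: {large:9.1f} us")
```

Under this model small messages are dominated by latency and large ones by bandwidth, which is why both columns matter for MPI codes.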

The Rise of SDR InfiniBand

IB is expensive for smaller clusters because the HCAs are fairly expensive and, most of the time, the smallest switch you could buy had 24 ports. So if you only had, let's say, 4 to 8 nodes, then the per node cost for the switch was just too high (a factor of 3-4 compared to 24 nodes). But on the application performance side, smaller clusters could use InfiniBand, particularly as the number of cores per node increases. The smaller clusters don't necessarily need the huge bandwidth that DDR InfiniBand offers and many times don't need the extremely low latency of DDR InfiniBand. The bandwidth and latency of SDR InfiniBand will greatly help the applications. But InfiniBand has always been considered too expensive. Until now.
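The switch penalty for a small cluster is easy to put numbers on. The sketch below assumes the only option is a single 24-port switch whose full price is split across however many nodes you actually plug in; the function name is made up for illustration. The switch price cancels out, so the factor relative to a full switch is just 24 divided by the node count.

```python
def switch_cost_factor(nodes, switch_ports=24):
    """Per-node switch cost relative to a fully populated switch."""
    if not 1 <= nodes <= switch_ports:
        raise ValueError("nodes must be between 1 and switch_ports")
    # The switch price cancels out: (price / nodes) / (price / ports).
    return switch_ports / nodes


for nodes in (4, 6, 8, 24):
    factor = switch_cost_factor(nodes)
    print(f"{nodes:2d} nodes: {factor:.1f}x the per-node switch cost")
```

This gives a factor of 3 at 8 nodes and 4 at 6 nodes, in line with the range quoted above; at 4 nodes the same arithmetic gives 6.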

Mellanox and Colfax International have teamed up to bring back SDR, but at a price point that makes it extremely attractive for small clusters. At this point you're saying "Shut up and tell me the prices!" As I tell my children, "Just relax," but I usually end up with something thrown in my general direction. Since I don't want anyone to throw things at me, let's go over the prices. BTW - the website with all of the prices is here.

Note: The HCA listed in Table Two does not seem to have recent public benchmark data available. Therefore, actual performance may differ from that shown in Table One.



Table Two - SDR Infiniband Pricing from Colfax

Product | Price ($) without shipping | Colfax Product Description/Part Number
SDR HCA NIC PCI-Express x4 | $125 | MHES14-XTC InfiniHost III Lx, Single Port 4X InfiniBand / PCI-Express x4, Low Profile HCA Card, Memory Free, RoHS (R5) Compliant, (Tiger)
8-port 4X SDR switch | $750 | Flextronics ODM model F-X430066, 8 Port 4X SDR InfiniBand switch
24-port 4X 1U SDR InfiniBand switch (Unmanaged) | $2,400 | Flextronics ODM, 4X SDR InfiniBand switch model F-X430060, 24-port 4X SDR w/ Media Adapter Support, one power supply
0.5 meter SDR cable | $35 | MCC4L30-00A 4x microGiGaCN latch, 30 AWG, 0.5 meter
1 meter SDR cable | $39 | MCC4L30-001 4x microGiGaCN latch, 30 AWG, 1 meter
2 meter SDR cable | $46 | MCC4L30-002 4x microGiGaCN latch, 30 AWG, 2 meters
3 meter SDR cable | $52 | MCC4L30-003 4x microGiGaCN latch, 30 AWG, 3 meters
4 meter SDR cable | $58 | MCC4L28-004 4x microGiGaCN latch, 28 AWG, 4 meters
5 meter SDR cable | $65 | MCC4L28-005 4x microGiGaCN latch, 28 AWG, 5 meters
6 meter SDR cable | $86 | MCC4L24-006 4x microGiGaCN latch, 24 AWG, 6 meters
7 meter SDR cable | $93 | MCC4L24-007 4x microGiGaCN latch, 24 AWG, 7 meters
8 meter SDR cable | $99 | MCC4L24-008 4x microGiGaCN latch, 24 AWG, 8 meters



So let's do a little math. Table Three below has the InfiniBand prices for 8 nodes.

Table Three - 8 nodes with SDR InfiniBand

Item | Price ($) without shipping
HCAs (8 of them) | $1,000
8-port SDR switch | $750
0.5 meter CX-4 cables (8 of them, $35 each) | $280
Total | $2,030
Price Per Node | $253.75



So if you buy SDR InfiniBand for 8 nodes you will pay less than $255 a node! (without shipping of course).

Let's do the same thing for a 24-node SDR cluster.

Table Four - 24 nodes with SDR InfiniBand

Item | Price ($) without shipping
HCAs (24 of them) | $3,000
24-port SDR switch | $2,400
0.5 meter CX-4 cables (24 of them, $35 each) | $840
Total | $6,240
Price Per Node | $260.00



The price per node is slightly higher than for 8 nodes because of the switch cost. I'm not sure about you, but I think this is a fantastic price, and it is moving down in the general direction of GigE! (Well, not quite, but it's getting there!)
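If you want to redo the math for your own node count or cable length, the arithmetic behind Tables Three and Four can be sketched as follows. Prices come from Table Two; the function and constant names are made up for this example. Note that the cable lines in both tables work out to $35 per cable, which is the Table Two price of the 0.5 meter cable; with the $39 1 meter cable the totals come out slightly higher.

```python
HCA_PRICE = 125                      # SDR HCA, from Table Two
SWITCH_PRICE = {8: 750, 24: 2400}    # switch ports -> price, from Table Two
CABLE_PRICE = {0.5: 35, 1.0: 39}     # cable length in meters -> price


def cluster_cost(nodes, switch_ports, cable_m):
    """Return (total, per_node) interconnect cost in dollars, without shipping."""
    if nodes > switch_ports:
        raise ValueError("more nodes than switch ports")
    total = nodes * HCA_PRICE + SWITCH_PRICE[switch_ports] + nodes * CABLE_PRICE[cable_m]
    return total, total / nodes


for nodes, ports in ((8, 8), (24, 24)):
    for cable_m in (0.5, 1.0):
        total, per_node = cluster_cost(nodes, ports, cable_m)
        print(f"{nodes:2d} nodes, {cable_m} m cables: ${total:,} total, ${per_node:.2f} per node")
```

With 0.5 meter cables this reproduces the $2,030 and $6,240 totals from the tables.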

How do I Get Me Some of That?

Ordering SDR InfiniBand at these prices is easy. Colfax International has set up a webpage that allows you to order on-line! Just go to the page and place your order. If you need large quantities or special arrangements please send an email to sales( you know what to put here) colfaxdirect.com.

Please Note: Neither ClusterMonkey nor any of its authors has any financial interest in Colfax International. We just like cheap hardware.

To Infinity and Beyond!

I hate to end on a Buzz Lightyear quote, but it seems somewhat appropriate. For smaller clusters you usually had to rely on GigE as the interconnect. Now you can afford to add SDR InfiniBand to these systems without it being too expensive. So this means we now get a big boost in performance on these smaller systems (including the one in my basement! Woo! Hoo!). Now we can truly begin to think outside the box, or more like outside the server room.

We can start thinking about adding a parallel file system to these smaller clusters or even think about exporting NFS over native IB protocols from the master node. Also don't forget that you can run TCP over IB using IPoIB. (See the OpenFabrics Alliance for the complete software stack.) Even with SDR InfiniBand you will get much faster TCP performance over IB than over GigE. So you can start thinking about applications or places where GigE limits performance (anyone want to play multi-player games using IPoIB?).
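The practical upside of IPoIB is that ordinary TCP socket code does not need to change; it only needs to use the IP address assigned to the InfiniBand interface. The sketch below is a plain TCP echo round trip, nothing InfiniBand-specific. It runs over loopback so it works anywhere; on a cluster you would pass the address configured on the IPoIB interface (often named ib0 on Linux) instead. The function names are made up for illustration.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back whatever arrives."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(4096)
        conn.sendall(data)


def tcp_round_trip(bind_addr, payload):
    """Send a small payload to a local echo server on bind_addr; return the reply."""
    with socket.create_server((bind_addr, 0)) as server:
        port = server.getsockname()[1]  # port 0 lets the OS pick a free port
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection((bind_addr, port)) as client:
            client.sendall(payload)
            reply = client.recv(4096)
        worker.join()
    return reply


# Loopback keeps the sketch runnable anywhere; on a cluster, bind_addr would
# be the IP address assigned to the IPoIB interface.
print(tcp_round_trip("127.0.0.1", b"hello over TCP"))
```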


Jeff Layton is having way too much fun writing this article, proving that it's hard to keep a good geek down. When he's not creating havoc in his household, he can be found hanging out at the Fry's coffee shop (never during working hours) and admiring the shiny new CPUs that come in, and cringing when someone buys Microsoft Vista.


TOPICS: Computers/Internet
KEYWORDS: cluster; infiniband

1 posted on 01/30/2008 5:09:50 AM PST by ShadowAce

To: rdb3; Calvinist_Dark_Lord; GodGunsandGuts; CyberCowboy777; Salo; Bobsat; JosephW; ...

Another thing I did not see mentioned in this article is that the IB cables are changing. Intel is creating some Fibre-Optic IB cables that are supposed to be much better than the standard copper cables.

2 posted on 01/30/2008 5:11:05 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: ShadowAce

Intel’s new strategic roadmap for 2008 - 2009 has them moving into some new realms on chipset architecture as well. Specifically they’re doing away with the FSB on mobos and going to a universal interconnect at 32 GT/s.

The Infiniband stuff is pretty remarkable. The last company I was with had a 10 node implementation that was just unreal. It handled universal system communications worldwide. This is good news, for sure.


3 posted on 01/30/2008 6:01:54 AM PST by rarestia ("One man with a gun can control 100 without one." - Lenin / Molwn Labe!)

To: rarestia
The Infiniband stuff is pretty remarkable.

It is, but the current copper IB is a PITA to work with. I've installed/replaced IB cables on systems from 40 nodes to over 1200 nodes. That stuff, while blazingly fast, is a pain as the curve radius allowed is quite large.

4 posted on 01/30/2008 6:07:14 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: ShadowAce

Wow. I don’t speak geek. What does this mean for mr. average joe computer guy?


5 posted on 01/30/2008 6:05:31 PM PST by Big Giant Head (I should change my tagline to "Big Giant penguin on my Head")

To: Big Giant Head

To be honest, not a whole lot. But if you decide to network two or more together and want *really* fast connection speeds (like you’re putting together a cluster), then this is the way to go.


6 posted on 01/31/2008 5:14:13 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson