
Why Supercomputing Matters
ServerWatch ^ | 28 November 2011 | Amy Newman

Posted on 11/30/2011 5:40:32 AM PST by ShadowAce

To your typical IT organization, the Top500 supercomputing list released twice a year -- while interesting -- has little bearing on today's operations. Grand proclamations and goals, such as reaching exaflop performance by 2018, also have little impact on the day-to-day goings-on in most data centers. (As quick background: FLOPS stands for FLoating-point Operations performed Per Second; an exaflop is 10^18, or 1,000,000,000,000,000,000, FLOPS.)
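
For a concrete sense of those prefixes, here is the ladder from megaflops to exaflops written out (a quick illustrative sketch in Python; the prefix values are standard SI, not from the article):

    scales = [
        ("megaflop (10^6)",  1e6),
        ("gigaflop (10^9)",  1e9),
        ("teraflop (10^12)", 1e12),
        ("petaflop (10^15)", 1e15),
        ("exaflop (10^18)",  1e18),
    ]
    for name, ops in scales:
        print(f"1 {name} = {ops:,.0f} floating-point operations per second")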

While they may not affect you today, such developments are important, because progress at the high end impacts the low end and midrange. It wasn't that long ago that a supercomputer's capabilities equalled those of today's smartphone. Consider Knights Corner, which Intel showed off at Supercomputing (SC) 11 in Seattle, Washington, earlier this month. Initially announced in June, the chip has now made it to silicon and was demoed at the show, although an official general-availability date has not yet been set. A single 22 nm chip delivers 1 teraflop of sustained double-precision performance -- 1 trillion calculations per second. (If you're looking for a relative sense of size and speed, this infographic offers a clear sense of scale.)

This is not the first time Intel has delivered a 1-teraflop system. Back in 1997, it debuted ASCI Red at Sandia National Laboratories -- 9,298 Pentium II Xeon processors in servers spanning 72 cabinets and consuming 800 kW of power. In other words, your workstation (or even your smartphone) is as powerful as a typical supercomputer was 15 years ago.
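
The efficiency gap between those two 1-teraflop systems is easy to put in numbers. A back-of-the-envelope in Python (the article gives no power figure for the Knights Corner chip; 300 W is an illustrative guess):

    # ASCI Red (1997): ~1 teraflop sustained while drawing 800 kW.
    ascir_flops_per_watt = 1e12 / 800e3
    print(ascir_flops_per_watt)     # 1,250,000 -> about 1.25 megaflops per watt

    # Knights Corner (2011): 1 teraflop from one chip. Power draw is assumed.
    knc_flops_per_watt = 1e12 / 300
    print(knc_flops_per_watt / ascir_flops_per_watt)   # ~2,700x more work per watt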

Times have indeed changed.

At the other extreme is Nvidia. In his keynote address on Tuesday, cofounder, president and CEO Jen-Hsun Huang described how Nvidia was able to harness the capabilities of supercomputing by clustering workstations and commodity servers to get the compute power required to deliver the graphical capabilities it needed. This "disruption at the low end of the market" has upped the ante on the mainstream and made it possible to essentially put a supercomputer into a workstation.

Most enterprises, of course, fall somewhere in between Intel's high end and Nvidia's massively clustered pizza boxes. What impact will supercomputing have on them?

Power Remains the No. 1 Problem

You can purchase the fastest, highest-throughput systems on the market; you can ensure that your servers have high availability and your software is configured just right. But it's all meaningless if your power bill exceeds that of a large city.

As performance scales upward, supercomputers are increasingly power-constrained, which to some degree determines where they can be hosted. Although these limitations are felt most acutely in HPC, they pose a universal dilemma for any company whose core business hangs on managing the data in its data warehouse -- i.e., those dependent on big data and the power to bring it to life.

Facebook may be one of the best examples of this. The company doesn't exactly spring to mind when you think of supercomputing, but its core business revolves around big data that must be accessible to users and available for mining. About two and a half years ago, it became apparent that the industry-standard servers the company relied on were not meeting its needs. Amir Michael, a server and data center engineer at Facebook, began leading an effort to customize the servers going into Facebook's newer data centers.

The servers are built from industry-standard components but are designed, from their form factor up, to be more energy efficient. It was easier to build to a custom design than to fix what was already there, Michael explained. The servers Facebook built have larger heat syncs and thus are taller than a typical 1U server (because of this they also sit in a custom-built chassis and rack). They also contain only the necessary components: plastic bezels and other aesthetic additions, including the "face plate," were dropped. This enables air to flow in and out more freely, so the fans can operate more efficiently because less air needs to be moved. This reduces total energy per server by as much as 10 percent to 20 percent, Michael explained. The motherboard has also been tweaked for 92 percent efficiency.
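
Those gains compound multiplicatively rather than adding up. A rough Python sketch of how component-level changes could land in the quoted 10-20 percent range (the stock-supply efficiency and the fan savings are illustrative assumptions; only the 92 percent figure and the 10-20 percent total come from the article):

    # Wall power scales with 1/efficiency for a fixed IT load.
    baseline_watts = 300.0            # assumed draw of a stock 1U server

    psu_factor = 0.85 / 0.92          # 92%-efficient supply vs. an assumed 85% stock unit
    fan_factor = 0.95                 # assume freer airflow trims ~5% of total draw

    custom_watts = baseline_watts * psu_factor * fan_factor
    print(1 - custom_watts / baseline_watts)   # ~0.12 -> ~12% savings per server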

These changes at the server level, as well as modifications at the data center level (e.g., relying on renewable energy or on the climate conditions of data center locations), have helped keep Facebook's energy costs from spiraling out of control.

Facebook is not the only company dealing with these issues. Google, Amazon and other companies for which big data is key to the business are contending with similar challenges. Enter the Open Compute Project, an open source hardware initiative Facebook launched in April, whose members to date include Intel, ASUS, Dell, Mellanox, Red Hat and Cloudera.

Facebook's latest endeavor, a customized storage device, is being built through the open source project. Michael described it as a "general-purpose box that has some unique attributes."

Google and Amazon, while not part of the project, also make use of customized hardware. However, unlike Facebook, they have chosen not to make their specs public.

As big data becomes ever more critical for both HPC and social media, hardware suited to the computing needs of these companies will become ever more important. At this point, it is too soon to tell whether customized hardware will become the norm or whether the OEMs will adapt to the changing needs.

The OEMs are hardly in the dark when it comes to power management. Faced with the realization that power and capacity will hit the wall by 2016, HP developed low-energy server technology to cut energy consumption and space requirements, said Glenn Keels, director of marketing for the Hyperscale Business Unit in HP's Industry Standard Servers and Software group. Project Moonshot, born in early November, aims to increase computing power while reining in energy costs for companies that deliver web services, social media and simple content. HP is seeking partnerships for Project Moonshot via the HP Pathfinder Program.

The initial round of products will be powered by Calxeda's EnergyCore ARM server processor and will fall under HP's Enterprise Server, Storage and Networking line.

The first offering, the Redstone Server Development Platform, scheduled for release in the first half of 2012, will pack more than 2,800 of the Calxeda-based servers into a single rack. According to HP, these servers will consume 89 percent less energy and occupy 94 percent less space, and they will be priced 63 percent lower. Keels said he believes the offering will be particularly relevant to the social media space.
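
Taken at face value, those percentages imply dramatic consolidation. A small worked example in Python (the 400 kW, 40-rack, $3.2M baseline is an assumption for illustration; only the percentage cuts are HP's):

    baseline    = {"kw": 400.0, "racks": 40.0, "price_musd": 3.2}
    claimed_cut = {"kw": 0.89,  "racks": 0.94, "price_musd": 0.63}   # HP's stated reductions

    redstone = {k: v * (1 - claimed_cut[k]) for k, v in baseline.items()}
    print(redstone)   # {'kw': 44.0, 'racks': 2.4, 'price_musd': ~1.18}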

Companies will have the opportunity to experiment, test and benchmark applications on the Redstone Server Development Platform, other extreme low-energy platforms and traditional servers when the HP Discovery Lab launches in January 2012, Keels said.

HPC Nexus Is Shifting Out of North America

The Supercomputing Conference has traditionally been a technical and research-oriented show; only in recent years have commercial entities used it as a platform to showcase their wares. The show has also always had a global feel, with a friendly rivalry over which nation holds the most spots on the list. Since 1993, the centerpiece has been the unveiling of the list of the 500 fastest supercomputing installs. In November 2011, 263 of them, or 53 percent, were in the United States. While this is a slight increase over the 255 back in June, it is a far cry from the 305 in November 2005.

The Asia-Pacific region -- China, in particular -- is on the upswing. China's expanded presence on the list in recent years is the most noteworthy change. On the most recent list it had 74 supercomputer installs -- 15 percent of the total. This is remarkable given that in June it had 61 systems on the list, a year ago it had 41, and in November 2009 it had just 21.
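
Those counts make the trajectory easy to quantify (all figures are the ones cited above):

    counts = {"Nov 2009": 21, "Nov 2010": 41, "Jun 2011": 61, "Nov 2011": 74}
    print(counts["Nov 2011"] / 500)                 # 0.148 -> the ~15 percent share cited
    print(counts["Nov 2011"] / counts["Nov 2009"])  # ~3.5x more systems in two years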

No other nation has seen such rapid growth. Eastern Asia as a whole performed strongly, with a 22 percent system share to North America's 54.4 percent.

Interestingly, the Top 10 systems broke down similarly: two were from Japan, two from China, five from the United States and one from France. Also interesting and revealing is the fact that "this is the first time since we began publishing the list back in 1993 that the top 10 systems showed no turnover," as Top500 editor Erich Strohmaier noted on the Top500 website.

The Top 10 supercomputers ranked in the same order as in June 2011. In all cases, however, they performed faster, confirming that the bar continues to rise ever higher.

While the list in and of itself is purely academic, save for vendor marketing, it is a barometer of where innovation is coming from. China's rapid ascent is certainly notable, but more important is China's announcement that it is building a supercomputer capable of petaflop performance (10^15 FLOPS) from the ground up using completely domestic parts, including its homegrown SW1600 chips.

Supercomputing is the apex of cutting-edge computing. Development efforts that go into supercomputing trickle down to mainstream businesses -- and they do so faster with each year. Supercomputing also makes innovation possible on other fronts. If the United States is to remain at the forefront of innovation, similar developmental partnerships and funding must occur here as well.


TOPICS: Computers/Internet
KEYWORDS: exaflops; supercomputing

1 posted on 11/30/2011 5:40:40 AM PST by ShadowAce

To: rdb3; Calvinist_Dark_Lord; GodGunsandGuts; CyberCowboy777; Salo; Bobsat; JosephW; ...

2 posted on 11/30/2011 5:41:29 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: ShadowAce

The bottleneck in computer data management has never really been about processor speed, but about ‘retrieval and storage’ speed.

You can have the fastest processor in the world, but if your data stream runs at the speed of a 386, then the processor is going to be waiting a long time before it can even begin any calculations or data handling.

Even ‘solid state’ drives aren’t adequate enough, in terms of speed, and there useful life span and replacement cost aren’t much of a help either.

They need to develop a better way to access the data.


3 posted on 11/30/2011 6:00:50 AM PST by Bigh4u2 (Denial is the first requirement to be a liberal)

To: Bigh4u2

there = their.

:(


4 posted on 11/30/2011 6:01:37 AM PST by Bigh4u2 (Denial is the first requirement to be a liberal)
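
A back-of-the-envelope way to frame the retrieval-versus-processing argument above: a job's runtime is bounded below by the larger of its compute time and its data-movement time. A minimal Python sketch (all rates are illustrative assumptions):

    # A job is I/O-bound when moving its data takes longer than computing on it.
    def job_time(flops_needed, bytes_moved, flops_per_sec, bytes_per_sec):
        # Crude lower bound assuming compute and I/O overlap perfectly.
        return max(flops_needed / flops_per_sec, bytes_moved / bytes_per_sec)

    # Scanning 1 TB at 100 floating-point operations per byte on a 1-teraflop chip:
    print(job_time(1e14, 1e12, 1e12, 5e8))   # 500 MB/s disk: 2,000 s -- I/O-bound
    print(job_time(1e14, 1e12, 1e12, 2e9))   # 2 GB/s SSD:      500 s -- still I/O-bound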

To: ShadowAce

Note that the equivalence between ‘exa’ and ‘trillion’ is based on this article probably being written in the U.K. The British use the “long scale” where a ‘trillion’ is a million billion. In the U.S. we use the “short scale” where a ‘trillion’ is a thousand billion.

When I saw this in the article I had to look it up on Wikipedia. I was unaware of this difference in scale usage in various areas of the world.


5 posted on 11/30/2011 6:09:59 AM PST by the_Watchman

To: ShadowAce

The article is quite correct in that “Power Remains the No. 1 Problem”. I just filed a patent for a change to the scheduling algorithms for an HPC system which is designed to reduce peak power consumption. Just controlling the way user jobs are launched on these platforms can make a big difference in the power consumed and the heat generated.

My company has bragging rights on an installation in France which is in the top 10 of the Top500 Supercomputing list. I visited the site last year when it was installed. We build the hardware and supply the infrastructure software.


6 posted on 11/30/2011 6:17:34 AM PST by the_Watchman

To: the_Watchman
That's pretty cool.

I was on a team that installed #15 (at the time--it's been several years now) for the Army. I've also travelled nationally and to Europe several times working on these machines. They're pretty impressive to work on and watch as they power on. :)

7 posted on 11/30/2011 6:20:52 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: the_Watchman
The article is quite correct in that “Power Remains the No. 1 Problem”.

Yup. While not a Top500 machine, I installed a cluster for the Navy a few years ago that had some power issues. When I powered it on for the first time (with my VP standing by, watching), I brought four separate buildings down. :)

8 posted on 11/30/2011 6:23:18 AM PST by ShadowAce (Linux -- The Ultimate Windows Service Pack)

To: Bigh4u2

When you move into the HPC arena, storage speed can be less important than interconnect speed. Typical algorithms are distributed across tens of thousands of compute nodes, and each subdivided part of the algorithm has to share intermediate results with other parts running on many other machines. The message rates can be horrendous, and the network latencies can become the bottleneck.

And actually, processor speed was the bottleneck in many applications. During the ’70s and ’80s, when we were a mainframe vendor (one of the “seven dwarfs”), our performance department always couched its reports in terms of processor cycles. I constantly tried to get them to pay more attention to I/O (input/output), to no avail. They once produced an “I/O performance” report, but when I read the details, the figure of merit was the number of machine instructions per I/O!


9 posted on 11/30/2011 6:30:33 AM PST by the_Watchman
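
The latency point can be made concrete with the standard alpha-beta message-cost model (the latency and bandwidth figures below are illustrative assumptions, not vendor numbers):

    # Alpha-beta model: cost of one message = latency + size / bandwidth.
    def msg_time(size_bytes, latency_s=1.5e-6, bw_bytes_per_s=4e9):
        return latency_s + size_bytes / bw_bytes_per_s

    # 10,000 small 1 KB halo-exchange messages per timestep:
    print(10_000 * msg_time(1024))   # ~0.018 s per step, and ~85% of it is pure latency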

To: ShadowAce

So when are they coming out with a pure optical chip that uses NO electric power?

(Reverse engineered from alien craft at Area 51, like all “discoveries” of course.)


10 posted on 11/30/2011 6:46:31 AM PST by UnbelievingScumOnTheOtherSide (REPEAL WASHINGTON! -- Islam Delenda Est! -- I Want Constantinople Back. -- Rumble thee forth.)

To: ShadowAce
What was rated as supercomputer power only a couple of decades ago can now be had for a pittance. And you can hold it in your hand....

The ASUS Eee Pad Transformer Prime—let's call it the Transformer Prime, for short—can now be pre-ordered online, report several sources. It may not be the most eagerly awaited gadget of all time, and it may not be the most elegantly named, but it marks a new era: the Transformer Prime is the first tablet that comes with a quad-core processor built in.

For the spec-hungry: The tablet ships with the Android 3.2 Honeycomb operating system, though it can be upgraded to Ice Cream Sandwich (Android 4.0). It has a 10.1-inch screen, in 1280 x 800 resolution, protected by Gorilla Glass. RAM is 1 GB, and the device packs two cameras: an eight-megapixel affair in the rear and a 1.2-megapixel one in the front. A fully charged battery should get you 12 hours, and the whole thing will set you back either $500 or $600, depending on whether you want 32 GB or 64 GB of flash storage.

But again, the main event here is the processor: the Tegra 3, a quad-core CPU of a class heretofore available only in desktops and the occasional high-end laptop. The chip, codenamed Kal-El, debuted in February. Per the Verge, Tegra 3 supposedly quintuples the performance of Tegra 2. By juggling processing demands among its four cores, as well as a fifth "companion core," the device can handle heavy-duty assignments while still keeping a reasonable battery life.

The cool-running Tegra 3 quad core

Plug in an HDMI cable and hook it to your big HDTV and use a bluetooth keyboard. :-)

11 posted on 11/30/2011 7:02:13 AM PST by Bobalu (even Jesus knew the poor would always be with us)

To: Bobalu
Oh, yes. More info about the Tegra 3 processor.

We met with two senior Nvidia representatives who gave us a technical demonstration of the prototype Nvidia Tegra 3 quad-core processor. This unit was amazing, featuring 12 GPU cores! The resolution for this new processor does take it to the next level with 1440p! This is two times 720p and 1.5X better than the latest 1080p. This is indeed the next level of processing power and is absolutely a very adequate progression from Tegra 2, which featured only two cores.

Not only will this new Nvidia Tegra 3 processor be able to handle superior resolution, but the lead engineer told us that it easily handles 3D. In general, 3D requires double the frame rate (120 frames per second) and double the pixels.

12 posted on 11/30/2011 7:12:04 AM PST by Bobalu (even Jesus knew the poor would always be with us)

To: ShadowAce

Thanks for the ping.


13 posted on 11/30/2011 9:22:14 AM PST by GOPJ (Better is a dinner of herbs where love is, Than a fatted calf with hatred - Proverbs 15)

To: ShadowAce
The servers Facebook built have larger heat syncs...

Poor Amy probably doesn't have a clue what a heat sink is.

14 posted on 11/30/2011 9:46:44 AM PST by Moltke (Always retaliate first.)

To: Bigh4u2

What you’re really describing is “message rate.” The fastest message rates are obtained with highly proprietary interconnects from the likes of Cray or IBM (with their Blue Gene systems) in “capability”-level systems; the commercial off-the-shelf solution would clearly be InfiniBand (QDR, or quad data rate: 40 Gb/second). That’s pretty smokin’.

Disclaimer: This is my business, my field... and has been for years. I currently work with one of only two remaining InfiniBand suppliers in the world.

The author of this piece clearly isn’t an HPC type, but that’s ok. I was at SC11 in Seattle (I go to SC every year). Great show as usual, sucky weather. Frankly, there really were no show-stopping announcements at the show this year. Still, it’s always a lot of fun.


15 posted on 11/30/2011 11:28:15 AM PST by RightOnline
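
One note on the 40 Gb/second figure: that is the raw signaling rate of a 4x QDR link, and InfiniBand's 8b/10b line encoding reduces the payload rate. The arithmetic, in Python:

    # QDR InfiniBand, 4x link: 4 lanes at 10 Gb/s signaling each.
    raw_gbps  = 4 * 10           # 40 Gb/s, the figure quoted above
    data_gbps = raw_gbps * 0.8   # 8b/10b encoding carries 8 data bits per 10 line bits
    print(data_gbps)             # 32 Gb/s of payload
    print(data_gbps / 8)         # 4.0 -> about 4 GB/s per direction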
