HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Tensors Come of Age: Why the AI Revolution Will Help HPC

Mon, 11/13/2017 - 07:00

A Quick Retrospect

Thirty years ago, parallel computing was coming of age. A bitter battle began between stalwart vector computing supporters and advocates of various approaches to parallel computing. IBM skeptic Alan Karp, reacting to announcements of nCUBE’s 1024-microprocessor system and Thinking Machines’ 65,536-element array, made a public $100 wager that no one could get a parallel speedup of over 200 on real HPC workloads. Gordon Bell softened that to an annual award for the best speedup, what we now know as the Gordon Bell Prize.

John Gustafson

This year also marks the 30th Supercomputing Conference. At the first SC in 1988, Seymour Cray gave the keynote, and said he might consider combining up to 16 processors. Just weeks before that event, Sandia researchers had managed to get thousand-fold speedups on the 1024-processor nCUBE for several DOE workloads, but those results were awaiting publication.

The magazine Supercomputing Review was following the battle with interest, publishing a piece by a defender of the old way of doing things, Jack Worlton, titled “The Parallel Processing Bandwagon.” It declared parallelism a nutty idea that would never be the right way to build a supercomputer. Amdahl’s law and all that. A rebuttal by Gustafson titled “The Vector Gravy Train” was to appear in the next issue… but there was no next issue of Supercomputing Review. SR had made the bold step of turning into the first online magazine, back in 1987, with a new name.

Lenore Mullin

Happy 30th Anniversary, HPCwire!

What better occasion than to write about another technology that is coming of age, one we will look back on as a watershed? That technology is tensor computing: Optimized multidimensional array processing using novel arithmetic[1].

Thank you, AI

You can hardly throw a tchotchke on the trade show floor of SC17 without hitting a vendor talking about artificial intelligence (AI), deep learning, and neural nets. Google recently open-sourced its TensorFlow AI library and Tensor Processing Unit. Intel bought Nervana. Micron, AMD, ARM, Nvidia, and a raft of startups are suddenly pursuing an AI strategy. Two key ideas keep appearing:

  • An architecture optimized for tensors
  • Departure from 32-bit and 64-bit IEEE 754 floating-point arithmetic

What’s going on? And is this relevant to HPC, or is it unrelated? Why are we seeing convergent evolution to the use of tensor processors, optimized tensor algebras in languages, and nontraditional arithmetic formats?

What’s going on is that computing is bandwidth-bound, so we need to make much better use of the bits we slosh around a system. Tensor architectures place data closer to where it is needed. New arithmetic represents the needed numerical values using fewer bits. This AI-driven revolution will have a huge benefit for HPC workloads. Even if Moore’s law stopped dead in its tracks, these approaches increase computing speed and cut space and energy consumption.

Tensor languages have actually been around for years. Remember APL and Fortran 90, all you old-timers? However, now we are within reach of techniques that can automatically optimize arbitrary tensor operations on tensor architectures, using an augmented compilation environment that minimizes clunky indexing and unnecessary scratch storage[2]. That’s crucial for portability.

Portability suffers, temporarily, as we break free from standard numerical formats. You can turn float precision down to 16-bit, but then the shortcomings of IEEE format really become apparent, like wasting over 2,000 possible bit patterns on “Not a Number” instead of using them for numerical values. AI is providing the impetus to ask what comes after floats, which are awfully long in the tooth and have never followed algebraic laws. HPC people will someday be grateful that AI researchers helped fix this long-standing problem.
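The NaN waste is easy to quantify by brute force. This short sketch (using NumPy's binary16 implementation) enumerates all 65,536 possible 16-bit patterns and counts how many decode to "Not a Number":

```python
import numpy as np

# Every possible 16-bit pattern, reinterpreted as an IEEE 754 binary16 value.
# NaN encodings are those with all exponent bits set and a nonzero significand.
all_patterns = np.arange(65536, dtype=np.uint16)
values = all_patterns.view(np.float16)
nan_count = int(np.count_nonzero(np.isnan(values)))
print(nan_count)  # 2046 of 65536 patterns spent on "Not a Number"
```

That is 2 sign bits times (2^10 − 1) nonzero significands: the "over 2,000" patterns mentioned above.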

The Most Over-Discovered Trick in HPC

As early as the 1950s, according to the late numerical analyst Herb Keller, programmers discovered they could make linear algebra go faster by blocking the data to fit the architecture. Matrix-matrix operations in particular run best when the matrices are tiled into submatrices, and even sub-submatrices. That was the beginning of dimension lifting, an approach that seems to get re-discovered by every generation of HPC programmers. It’s time for a “grand unification” of the technique.
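A minimal sketch of the blocking idea (the tile size and matrices here are illustrative, and NumPy is used only to verify the result against an ordinary product):

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Multiply square matrices by blocking them into tile x tile submatrices,
    so each block update touches data that fits a fast level of the memory
    hierarchy before moving on."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                # One submatrix update: C-block += A-block @ B-block
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
assert np.allclose(tiled_matmul(A, B), A @ B)  # same answer, block by block
```

Nesting the same trick one level deeper (tiles of tiles) is exactly the dimension lifting described above.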

Level N BLAS

The BLAS developers started in the 1970s with loops over lists (level 1), then realized that doubly nested loops were needed (level 2), then triply nested ones (level 3); later, LAPACK and ScaLAPACK introduced blocking to better fit computer architectures. In other words, we’ve been computing with tensors for a long time, but not admitting it! Kudos to Google for naming their TPU the way they did. What we need now is “level N BLAS.”
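The three classical BLAS levels correspond directly to loop-nest depth. A schematic sketch (scalar loops for clarity; real BLAS implementations are, of course, heavily optimized):

```python
import numpy as np

def axpy(alpha, x, y):            # level 1: one loop over a list, y += alpha*x
    for i in range(len(x)):
        y[i] += alpha * x[i]

def gemv(A, x, y):                # level 2: doubly nested loop, y += A x
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            y[i] += A[i, j] * x[j]

def gemm(A, B, C):                # level 3: triply nested loop, C += A B
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            for k in range(A.shape[1]):
                C[i, j] += A[i, k] * B[k, j]
```

"Level N" generalizes this progression: arbitrarily deep nests, with the blocking chosen by the tools rather than by hand.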

Consider this abstract way of thinking about a dot product of four-element vectors:

Notice the vector components are not numbered; think of them as a set, not a list, because that allows us to rearrange them to fit any memory architecture. Each component is used once in this case; the components are multiplied pairwise and summed to some level (here, all the way down to a single number). The multiplications can be completely parallel if the hardware allows, and the summation can be as parallel as binary sum reduction allows.
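A minimal sketch of that evaluation pattern, with sequential Python standing in for parallel hardware:

```python
def tree_sum(values):
    """Binary sum reduction: pair up terms level by level (log2(n) levels).
    Every addition within a level is independent, so a parallel machine
    can perform each level in one step."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:            # odd count: pad with zero to pair up
            vals.append(0.0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

def dot(x, y):
    products = [a * b for a, b in zip(x, y)]   # all independent, fully parallel
    return tree_sum(products)                  # summed all the way to one number

print(dot([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```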

Now consider the same inputs, but used for 2-by-2 matrix-matrix multiplication:

Each input is used twice, either by a broadcast method or re-use, depending on what the hardware supports. The summation is only one level deep this time.

Finally, use the sets for an outer product, where each input is used four times to create 16 parallel multiplications, which are not summed at all.

All these operations can be captured in a single unified framework, and that is what we mean by “Level N BLAS.” The sets of numbers are best organized as tensors that fit the target architecture and its cost functions. A matrix isn’t inherently two-dimensional; that view is a human convenience, and the semantics can treat it as such. An algebra exists for index manipulation that can be built into the compiler, freeing the programmer from worrying about details like “Is this row-major or column-major order?”[4] Tensors free you from imposing a linear ordering that isn’t required by the algorithm and that impedes optimal data placement.
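NumPy’s einsum notation is one existing embodiment of such a unified framework: the dot product, matrix product, and outer product above differ only in which indices are reused and how deep the summation goes, and the index expression says so explicitly.

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
y = np.array([5., 6., 7., 8.])

dot   = np.einsum('i,i->',   x, y)     # each input used once, summed all the way down
outer = np.einsum('i,j->ij', x, y)     # each input reused, 16 products, no summation

A = x.reshape(2, 2)
B = y.reshape(2, 2)
matmul = np.einsum('ik,kj->ij', A, B)  # inputs reused, summation one level deep (over k)

print(dot)           # 70.0
print(outer.shape)   # (4, 4)
```

The library is free to evaluate each contraction in whatever order suits the machine, because the index expression, not a loop ordering, defines the result.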

Besides linear algebra, tensors are what you need for Fast Fourier Transforms (FFTs), convolutions for signal and image processing, and yes, neural networks. Knowledge representation models use tensor decompositions like PARAFAC and CANDECOMP. Most people aren’t taught tensors in college math, and tensors admittedly look pretty scary with all those subscripts. One of Einstein’s best inventions was a shorthand notation that gets rid of a lot of the subscripts (because General Relativity requires tensor math), but it still takes a lot of practice to get a “feel” for how tensors work. The good news is, computer users don’t have to learn that skill, and only a few computer programmers have to. There now exists a theory[4], and many prototypes[5], for handling tensors automatically. We just need a few programmers to make use of the existing theory of array indexing to build and maintain those tools for distribution to all[6]. Imagine being able to automatically generate an FFT without having to worry about the indexing! That’s already been prototyped[7].

Which leads us to another HPC trend that we need for architecture portability…

The Rise of the Installer Program

In the old days, code development meant edit, compile, link, and load. Nowadays, people rarely talk about “linkers” and “loaders,” but we certainly talk about precompilers, makefiles, and installer programs. We’ve also seen the rise of just-in-time compilation in languages like Java, where portable byte code is compiled on the target system to get both portability and, sometimes, surprisingly high performance. The nature of who-does-what has changed quite a bit over the last few decades. Now, for example, HPC software vendors cannot ship a binary for a cluster supercomputer because they cannot know which MPI library is in use; the installer links that in.

The compiler, or preprocessor, doesn’t have to guess what the target architecture is; it can instead specify what needs to be done, but not how, stopping at an intermediate language level. The installer knows what the costs are of all the data motions in the example diagrams above, and can predict precisely what the cost of a particular memory layout is. What you can predict, you can optimize. The installer takes care of the how.

James Demmel has often described the terrible challenge of building a ScaLAPACK-like library that gets high performance for all possible situations. Call it “The Demmel Dilemma.” It appears we are about to resolve that dilemma. With tensor-friendly architectures, and proper division of labor between the human programmer and the preprocessor, compiler, and installer, we can look forward to a day when we don’t need 50 pages of compiler flag documentation, or endless trial-and-error experimentation with ways to lay out arrays in storage that is hierarchical, parallel, and complicated. Automation is feasible, and essential.

The Return of the Exact Dot Product

There is one thing we’ve left out though, and it is one of the most exciting developments that will enable all this to work. You’ve probably never heard of it. It’s the exact dot product approach invented by Ulrich Kulisch, back in the late 1960s, but made eminently practical by some folks at Berkeley just this year[8].

With floats, because of rounding errors, you will typically get a different result when you change the way a sum is grouped. Floats disobey the associative law: (a + b) + c, rounded, is not the same as a + (b + c). That’s particularly hazardous when accumulating a lot of small quantities into a single sum, like when doing Monte Carlo methods, or a dot product. Just think of how often a scientific code needs to do the sum of products, even if it doesn’t do linear algebra. Graphics codes are full of three-dimensional and two-dimensional dot products. Suppose you could calculate sums of products exactly, rounding only when converting back to the working real number format?
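Two lines of Python are enough to watch the associative law fail for IEEE doubles:

```python
a, b, c = 1e16, -1e16, 1.0

left  = (a + b) + c   # 1.0: the huge terms cancel first, then c survives
right = a + (b + c)   # 0.0: c is absorbed into -1e16 and lost to rounding

print(left, right, left == right)  # 1.0 0.0 False
```

The same effect, spread over millions of terms of varying magnitude, is what makes parallel sums irreproducible.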

You might think that would take a huge arbitrary-precision library. It doesn’t. Kulisch noticed that for floating-point numbers, a fixed-size register with a few hundred bits suffices as scratch space for perfectly accurate results, even for vectors that are billions of floats long. You might think it would run too slowly, because of the usual speed-accuracy tradeoff. Surprise: it runs 3–6 times faster than a dot product that rounds after every multiply-add. Berkeley hardware engineers discovered this and published their result just this summer. In fact, the exact dot product is an excellent way to get over 90 percent of the peak multiply-add speed of a system, because the operations pipeline.
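The behavior (though not the performance) is easy to emulate in software. This sketch uses Python’s exact rationals as a stand-in for the fixed-size Kulisch register: every float is an exact binary rational, so products and partial sums incur no rounding at all, and only the final conversion back to float rounds, once.

```python
from fractions import Fraction
import random

def exact_dot(x, y):
    """Emulated exact dot product: accumulate products of floats with no
    intermediate rounding, then round once at the end. (A real Kulisch
    accumulator does this in a fixed-size register, not with rationals.)"""
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)   # both conversions and products are exact
    return float(acc)                      # the only rounding step

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(10_000)]
y = [random.uniform(-1, 1) for _ in range(10_000)]

fwd = exact_dot(x, y)
rev = exact_dot(x[::-1], y[::-1])
assert fwd == rev   # bitwise identical, regardless of summation order
```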

Unfortunately, the exact dot product idea has been repeatedly and firmly rejected by the IEEE 754 committee that defines how floats work. Fortunately, it is an absolute requirement in posit arithmetic[9] and can greatly reduce the need for double precision quantities in HPC programs. Imagine doing a structural analysis program with 32-bit variables throughout, yet getting 7 correct decimals of accuracy in the result, guaranteed. That’s effectively like doubling bandwidth and storage compared to the 64-bits-everywhere approach typically used for structural analysis.

A Scary-Looking Math Example

If you don’t like formulas, just skip this. Suppose you’re using a conjugate gradient solver, and you want to evaluate its kernel as fast as possible:

A theory exists to mechanically transform these formulas to a “normal form” that looks like this:

That, plus hardware-specific information, allows automatic data layout that minimizes indexing and temporary storage, and maximizes locality of access for any architecture. And with novel arithmetic like posits that supports the exact dot product, you get a bitwise identical result no matter how the task is organized to run in parallel, and at near-peak speed. Programmers won’t have to wrestle with data placement, nor will they have to waste hours trying to figure out if the parallel answer is different because of a bug or because of rounding errors.
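Whatever normal form the transformation produces, the kernel itself is built from dot products and a matrix-vector product. As a concrete reference point, here is the textbook conjugate gradient iteration (the solver and test matrix are illustrative, not the article’s normal form); every line marked with a product is a candidate for exact accumulation and automatic layout:

```python
import numpy as np

def conjugate_gradient(A, b, iters=50):
    """Textbook CG for symmetric positive-definite A. Each iteration is a
    matrix-vector product plus a handful of dot products -- exactly the
    sums of products an exact accumulator would make reproducible."""
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    p = r.copy()
    rs = r @ r                         # dot product
    for _ in range(iters):
        Ap = A @ p                     # matrix-vector product
        alpha = rs / (p @ Ap)          # dot product in the denominator
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r                 # dot product
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small, well-conditioned SPD test system
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)
b = rng.standard_normal(20)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # tiny residual: converged
```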

What People Will Remember, 30 Years from Now

By 2047, people may look back on the era of IEEE floating-point arithmetic the way we now regard the EBCDIC character set used on IBM mainframes (which many readers may never have heard of, but it predates ASCII). They’ll wonder how people ever tolerated the lack of repeatability and portability and the rounding errors that were indistinguishable from programming bugs, and they may reminisce about how people wasted 15-decimal accuracy on every variable as insurance, when they only needed four decimals in the result. Not unlike the way some of us old-timers remember “vectorizing” code in 1987 to get it to run faster, or “unrolling” loops to help out the compiler.

Thirty years from now, the burden of code tuning and portability for arrays will be back where it belongs: on the computer itself. Programmers will have long forgotten how to tile matrices into submatrices because the compiler-installer combination will do that for tensors for any architecture, and will produce bitwise-identical results on all systems.
The big changes that are permitting this watershed are all happening now. This year. These are exciting times! □

[1] A. Acar et al., “Tensor Computing for Internet of Things,” Dagstuhl Reports, Vol. 6, No. 4, 2016, pp. 57–79, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, doi:10.4230/DagRep.6.4.57, http://drops.dagstuhl.de/opus/volltexte/2016/6691.

[2] Rosencrantz et al., “On Minimizing Materializations of Array-Valued Temporaries,” ACM Trans. Program. Lang. Syst., Vol. 28, No. 6, 2006, pp. 1145–1177, http://doi.acm.org/10.1145/118663.

[3] L. Mullin and S. Thibault, “Reduction Semantics for Array Expressions: The Psi Compiler,” Technical Report, University of Missouri-Rolla Computer Science Dept., 1994.

[4] K. Berkling, Arrays and the Lambda Calculus, SU0CIS-90-22, CASE Center and School of CIS, Syracuse University, May 1990.

[5] S. Thibault et al., “Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers,” Technical Report CSC-94-08, University of Missouri-Rolla, 1994.

[6] L. Mullin and J. Raynolds, Conformal Computing: Algebraically Connecting the Hardware/Software Boundary using a Uniform Approach to High-Performance Computation for Software and Hardware Applications, arXiv:0803.2386, 2008.

[7] H. Hunt et al., “A Transformation-Based Approach for the Design of Parallel/Distributed Scientific Software: The FFT,” CoRR, 2008, http://dblp.uni-trier.de/rec/bib/journals/corr/abs-0811-2535.

[8] http://arith24.arithsymposium.org/slides/s7-koenig.pdf.

[9] http://www.posithub.org.

About the Authors

John L. Gustafson
john.gustafson@nus.edu.sg

John L. Gustafson, Ph.D., is currently Visiting Scientist at A*STAR and Professor of Computer Science at the National University of Singapore. He is a former Senior Fellow and Chief Product Architect at AMD, and a former Director at Intel Labs. His work showing practical speedups for distributed-memory parallel computing in 1988 led to his receipt of the inaugural Gordon Bell Prize, and his formulation of the underlying principle of “weak scaling” is now referred to as Gustafson’s law. His 2015 book, “The End of Error: Unum Computing,” has been an Amazon best-seller in its category. He is a Golden Core member of IEEE. He is also an “SC Perennial” who has been to every Supercomputing conference since the first one in 1988. He is an honors graduate of Caltech and received his MS and PhD from Iowa State University.

Lenore Mullin
lenore@albany.edu

Lenore M. Mullin, Ph.D., is an Emeritus Professor of Computer Science at the University at Albany, SUNY, a Research Software Consultant to REX Computing, Inc., and Senior Computational Mathematician at Etaphase, Inc. Dr. Mullin invented a new theory of n-dimensional tensors/arrays in her 1988 dissertation, A Mathematics of Arrays (MoA), which includes an indexing calculus, the Psi Calculus. This theory built on her tenure at IBM Research working with Turing Award winner Kenneth Iverson. She has built numerous software and hardware prototypes illustrating both the power and mechanization of MoA and the Psi Calculus. MoA was recognized by NSF with a 1992 Presidential Faculty Fellowship, entitled “Intermediate Languages for Enhanced Parallel Performance,” awarded to only 30 researchers nationally. Her binary transpose was accepted and incorporated into Fortran 90. On sabbatical at MIT Lincoln Laboratory, she worked to improve the standard missile software through MoA design. As an IPA, she ran the Algorithms, Numerical and Symbolic Computation program in NSF’s CISE CCF division. While on another leave, she was a Program Director in DOE’s ASCR program. She lives in Arlington, Va.

The post Tensors Come of Age: Why the AI Revolution Will Help HPC appeared first on HPCwire.

CoolIT Systems Launches Rack DCLC AHx2 Heat Exchange Module

Sat, 11/11/2017 - 21:26

CALGARY, AB, November 10, 2017 – CoolIT Systems (CoolIT), world leader in energy efficient liquid cooling solutions for HPC, Cloud, and Enterprise markets, has expanded its Rack DCLC product line with the release of the AHx2 Heat Exchange Module. This compact Liquid-to-Air heat exchanger makes it possible for Direct Contact Liquid Cooling (DCLC) enabled servers to be thermally tested during the factory burn-in process, without additional liquid cooling infrastructure. CoolIT will officially launch the AHx2 at the Supercomputing Conference 2017 (SC17) in Denver, Colorado.

The AHx2 is a vital addition to CoolIT’s broad range of liquid cooling products. It is a compact, easy-to-transport air heat exchanger designed to enable factory server burn-in when liquid is not present in the facility. As a Liquid-to-Air heat exchanger, the AHx2 dissipates heat from the coolant in the server loop to the ambient environment. The AHx2 provides direct liquid cooling to four DCLC enabled servers, along with 2kW of heat load management. Its design and size allow the unit to safely sit on top of, or adjacent to, a server chassis during manufacturing.

“The Rack DCLC AHx2 Module is the ideal way for OEMs and System Integrators to conduct thermal testing during the factory burn-in process,” said Patrick McGinn, VP of Product Marketing, CoolIT Systems. “Our customers will appreciate having access to such robust testing potential in such a compact design, without needing to invest in supplementary liquid cooling infrastructure.”

The AHx2 Heat Exchange Module is a product designed to meet a critical customer need, and as such, is an important part of CoolIT’s modular product array. SC17 attendees can learn more about the solution by visiting CoolIT at booth 1601 from November 13-16. To set up an appointment, contact Lauren Macready at lauren.macready@coolitsystems.com

About CoolIT Systems.

CoolIT Systems, Inc. is the world leader in energy efficient liquid cooling technology for the Data Center, Server and Desktop markets. CoolIT’s Rack DCLC platform is a modular, rack-based, advanced cooling solution that allows for dramatic increases in rack densities, component performance, and power efficiencies. The technology can be deployed with any server and in any rack making it a truly flexible solution. For more information about CoolIT Systems and its technology, visit www.coolitsystems.com.

About Supercomputing Conference (SC17)

Established in 1988, the annual SC conference continues to grow steadily in size and impact each year. Approximately 5,000 people participate in the technical program, with about 11,000 people overall. SC has built a diverse community of participants including researchers, scientists, application developers, computing center staff and management, computing industry staff, agency program managers, journalists, and congressional staffers. This diversity is one of the conference’s main strengths, making it a yearly “must attend” forum for stakeholders throughout the technical computing community. For more information, visit https://sc17.supercomputing.org/.

Source: CoolIT Systems, Inc.

The post CoolIT Systems Launches Rack DCLC AHx2 Heat Exchange Module appeared first on HPCwire.

IBM Announces Advances to IBM Quantum Systems & Ecosystem

Sat, 11/11/2017 - 15:33

YORKTOWN HEIGHTS, N.Y., Nov. 11, 2017 — IBM announced two significant quantum processor upgrades for its IBM Q early-access commercial systems. These upgrades represent rapid advances in quantum hardware as IBM continues to drive progress across the entire quantum computing technology stack, with focus on systems, software, applications and enablement.

  • The first IBM Q systems available online to clients will have a 20 qubit processor, featuring improvements in superconducting qubit design, connectivity and packaging. Coherence times (the amount of time available to perform quantum computations) lead the field with an average value of 90 microseconds, and allow high-fidelity quantum operations.
  • IBM has also successfully built and measured an operational prototype 50 qubit processor with similar performance metrics. This new processor expands upon the 20 qubit architecture and will be made available in the next generation IBM Q systems.

Clients will have online access to the computing power of the first IBM Q systems by the end of 2017, with a series of planned upgrades during 2018. IBM is focused on making available advanced, scalable universal quantum computing systems to clients to explore practical applications. The latest hardware advances are a result of three generations of development since IBM first launched a working quantum computer online for anyone to freely access in May 2016. Within 18 months, IBM has brought online 5- and 16-qubit systems for public access through the IBM Q experience and developed the world’s most advanced public quantum computing ecosystem.

An IBM cryostat wired for a prototype 50 qubit system. (PRNewsfoto/IBM)

“We are, and always have been, focused on building technology with the potential to create value for our clients and the world,” said Dario Gil, vice president of AI and IBM Q, IBM Research. “The ability to reliably operate several working quantum systems and putting them online was not possible just a few years ago. Now, we can scale IBM processors up to 50 qubits due to tremendous feats of science and engineering. These latest advances show that we are quickly making quantum systems and tools available that could offer an advantage for tackling problems outside the realm of classical machines.”

Over the next year, IBM Q scientists will continue to work to improve its devices including the quality of qubits, circuit connectivity, and error rates of operations to increase the depth for running quantum algorithms. For example, within six months, the IBM team was able to extend the coherence times for the 20 qubit processor to be twice that of the publicly available 5 and 16 qubit systems on the IBM Q experience.

In addition to building working systems, IBM continues to grow its robust quantum computing ecosystem, including open-source software tools, applications for near-term systems, and educational and enablement materials for the quantum community. Through the IBM Q experience, over 60,000 users have run over 1.7M quantum experiments and generated over 35 third-party research publications. Users have registered from over 1500 universities, 300 high schools, and 300 private institutions worldwide, many of whom are accessing the IBM Q experience as part of their formal education. This form of open access and open research is critical for accelerated learning and implementation of quantum computing.

“I use the IBM Q experience and QISKit as an integral part of my classroom teaching on quantum computing, and I cannot emphasize enough how important it is. In prior years, the course was interesting theoretically, but felt like it described some far off future,” said Andrew Houck, professor of electrical engineering, Princeton University. “Thanks to this incredible resource that IBM offers, I have students run actual quantum algorithms on a real quantum computer as part of their assignments! This drives home the point that this is a real technology, not just a pipe dream.  What once seemed like an impossible future is now something they can use from their dorm rooms. Now, our enrollments are skyrocketing, drawing excitement from top students from a very wide range of disciplines.”

To augment this ecosystem of quantum researchers and application development, IBM rolled out earlier this year its QISKit (www.qiskit.org) project, an open-source software developer kit to program and run quantum computers. IBM Q scientists have now expanded QISKit to enable users to create quantum computing programs and execute them on one of IBM’s real quantum processors or quantum simulators available online. Recent additions to QISKit also include new functionality and visualization tools for studying the state of the quantum system, integration of QISKit with the IBM Data Science Experience, a compiler that maps desired experiments onto the available hardware, and worked examples of quantum applications.

“Being able to work on IBM’s quantum hardware and have access through an open source platform like QISKit has been crucial in helping us to understand what algorithms–and real-world use cases–might be viable to run on near-term processors,” said Matt Johnson, CEO, QC Ware. “Simulators don’t currently capture the nuances of the actual quantum hardware platforms, and nothing is more convincing for a proof-of-concept than results obtained from an actual quantum processor.”

Quantum computing promises to be able to solve certain problems – such as chemical simulations and types of optimization – that will forever be beyond the practical reach of classical machines. In a recent Nature paper, the IBM Q team pioneered a new way to look at chemistry problems using quantum hardware that could one day transform the way new drugs and materials are discovered. A Jupyter notebook that can be used to repeat the experiments that led to this quantum chemistry breakthrough is available in the QISKit tutorials. Similar tutorials are also provided that detail implementation of optimization problems such as MaxCut and Traveling Salesman on IBM’s quantum hardware.

This ground-breaking work demonstrates it is possible to solve interesting problems using near term devices and that it will be possible to find a quantum advantage over classical computers. IBM has made significant strides tackling problems on small scale universal quantum computing systems. Improvements to error mitigation and to the quality of qubits are our focus for making quantum computing systems useful for practical applications in the near future. As well, IBM has industrial partners exploring practical quantum applications through the IBM Research Frontiers Institute, a consortium that develops and shares a portfolio of ground-breaking computing technologies and evaluates their business implications. Founding members include Samsung, JSR, Honda, Hitachi Metals, Canon, and Nagase.

These quantum advances are being presented today at the IEEE Industry Summit on the Future Of Computing as part of IEEE Rebooting Computing Week.

IBM Q is an industry-first initiative to build commercially available universal quantum computing systems for business and science applications. For more information about IBM’s quantum computing efforts, please visit www.ibm.com/ibmq.

Source: IBM

The post IBM Announces Advances to IBM Quantum Systems & Ecosystem appeared first on HPCwire.

Early Cluster Comp Betting Odds Favor China, Taiwan, and Poland

Sat, 11/11/2017 - 10:15

So far the early action in the betting pool favors Taiwan’s NTHU, China’s Tsinghua, and, surprisingly, Poland’s University of Warsaw. Other notables include Team Texas at 9 to 1, the German juggernaut FAU/TUC team at 12 to 1, and the University of Illinois at 13 to 1.

There are several teams that haven’t seen any action yet, including last year’s winner USTC, 2016 third-place finisher Team Peking, and up-and-comer Nanyang University.

I’m also not seeing any betting love for perennial favorite Team Chowder (Boston).

If you want to find out more about the teams before laying down your (virtual) money, you can see our exhaustive profiles of each team here. That should give you enough info to start laying down some money on the win line.

The betting window will be open until this coming Tuesday, so get in and get paid. Here’s a link to the betting pool.

The post Early Cluster Comp Betting Odds Favor China, Taiwan, and Poland appeared first on HPCwire.

Indiana University Showcases SC17 Activities

Sat, 11/11/2017 - 09:41

DENVER, Colo., Nov. 11 — Computing and networking experts from Indiana University will gather in the Mile High City next week for SC17, the International Conference for High Performance Computing, Networking, Storage and Analysis taking place November 12-17 in Denver. SC17 is one of the world’s foremost tech events, annually attracting thousands of scientists, researchers, and IT experts from across the world.

IU’s Pervasive Technology Institute, Global Research Network Operations Center, and School of Informatics, Computing and Engineering (SICE) will team up to host a research-oriented booth (#601) in the exhibition portion of the conference, showcasing current research and educational initiatives.

With the theme “We put the ‘super’ in computing,” the IU booth will showcase staff and faculty members and projects that are pushing the boundaries of what’s possible in computing and networking. Although they may not sport capes, the IU team devotes its considerable abilities to harnessing the cloud, achieving maximum throughput, engineering intelligent systems, and thwarting real-life cybervillains.

“SC17 marks the 20th anniversary of IU’s first display at the Supercomputing Conference, a milestone that underscores our deep commitment to leveraging high performance computing and networking to benefit the IU community, the state of Indiana, and the world,” said Brad Wheeler, IU vice president for IT and chief information officer. “In that time span, our researchers, scientists, and technologists have not only put IU on the map in the world of HPC, but their talents and discoveries have made IU a true leader in this increasingly important realm.”

One highlight of IU’s participation in SC17 is Judy Qiu’s invited talk, “Harp-DAAL: A Next Generation Platform for High Performance Machine Learning on HPC-Cloud.” Qiu is an associate professor in the intelligent systems engineering department in SICE. She will discuss growth in HPC and machine learning for big data with cloud infrastructure, and introduce Harp-DAAL, a high performance machine learning framework.

“The Supercomputing Conference is always a fantastic opportunity to showcase the work that is being conducted at SICE and provides a spotlight for our wonderful faculty,” said Raj Acharya, dean of SICE. “The conference itself is so valuable because it brings together the greatest minds in supercomputing in an atmosphere of collaboration that is as inspiring as it is informative. We’re always thrilled to be a part of it.”

This year, the IU team continues its leadership role in organizing the conference. Matt Link, associate vice president and director of systems for IU Research Technologies, serves as a member of the SC Steering Committee. Scott Michael, manager of research analytics, is vice chair of the Students@SC committee, and Jenett Tillotson, senior system administrator for high performance systems, is a member of the Student Cluster Competition committee.

Additionally, IU network engineers will continue a decades-long tradition of helping to operate SCinet, one of the most powerful and advanced networks in the world. Created each year for the conference, SCinet is a high-capacity network to support the applications and experiments that are the hallmark of the SC conference. Laura Pettit, SICE director of intelligent systems engineering research operations, is the SCinet volunteer services co-chair, and ISE doctoral students Lucas Brasilino and Jeremy Musser are also volunteering with SCinet.

This year, the IU booth will include a range of presentations and demonstrations:

  • Current Trends and Future Challenges in HPC by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory.
  • Special event: Jetstream and OpenStack by Dave Hancock and partners. OpenStack is the emerging standard for deploying cloud computing capabilities, and cloud-based infrastructure is increasingly able to handle HPC workloads. During this special event, members of the Jetstream team and the OpenStack Foundation Scientific Working Group will discuss how they use OpenStack to serve HPC customers.
  • Science Gateways with Apache Airavata by Marlon Pierce, Eroma Abeysinghe and Suresh Marru. Science gateways are user interfaces and user-supporting services that simplify access to advanced resources for novice users and provide new modes of usage for power users. Apache Airavata is open source cyberinfrastructure software for building science gateways. During this demonstration, the presenters provide an overview of recent developments.
  • Big Data Toolkit Spanning HPC, Grid, Edge and Cloud Computing by Geoffrey Fox. This demonstration looks at big data programming environments such as Hadoop, Spark, Flink, Heron, Pregel; HPC concepts such as MPI and asynchronous many-task runtimes; and cloud/grid/edge ideas such as event-driven computing, serverless computing, workflow and services.
  • Cybersecurity for Science by Von Welch. The Center for Applied Cybersecurity Research, affiliated with the Pervasive Technology Institute at Indiana University, specializes in cybersecurity for R&D. In this scope, the center works with science communities across the country, including leading the National Science Foundation’s Cybersecurity Center of Excellence. This talk will provide an overview of what cybersecurity means in the context of science and how it can enable productive, trusted scientific research.
  • Enabling High-Speed Networking for Researchers by Chris Robb. With data networking becoming increasingly complex and opaque, researchers are often unsure how to address poor performance between their endpoints. This talk will introduce the IRNC NOC Performance Engagement Team (PET) and show how it can help researchers determine the best approach to achieving their maximum bandwidth potential.
  • Scientific Workflow Integrity for Pegasus by Von Welch and partners. The Pegasus Workflow Management System is a popular system for orchestrating complex scientific workflows. In this talk, the PIs of the NSF-funded Scientific Workflow Integrity for Pegasus project will talk about scientific data integrity challenges and their work to add greater assurances to Pegasus for data integrity.
  • Macroscopes from the “Places & Spaces: Mapping Science” Exhibition by Katy Börner. See up to 100 large-format maps that showcase effective visualization techniques to communicate science to the general public. These interactive visualizations, called macroscopes, help people see patterns in data that are too large or complex to view unaided.
  • Proteus: A Configurable FPGA Cluster for High Performance Networking by Martin Swany. Proteus is a new HPC cluster and research testbed that will enable investigation of novel and advanced architectures in HPC. By using FPGAs to optimize the performance of common parallel operations, it serves as a model for hardware-accelerated network “microservices.”
  • International Networks at IU by Jennifer Schopf. International Networks at IU is a multi-million dollar NSF-funded program that supports the use of international links between the United States, Europe, Asia and Africa. Demos will review our currently supported links, as well as the measurement and monitoring services deployed on the links.

About the IU School of Informatics, Computing, and Engineering
The School of Informatics, Computing, and Engineering’s rare combination of programs—including informatics, computer science, library science, information science and intelligent systems engineering—makes SICE one of the largest, broadest and most accomplished of its kind. The extensive programs are united by a focus on information and technology.

About the Pervasive Technology Institute
The Pervasive Technology Institute (PTI) at Indiana University is a world-class organization dedicated to the development and delivery of innovative information technology to advance research, education, industry and society. Since 2000, PTI has received more than $50 million from the National Science Foundation to advance the nation’s research cyberinfrastructure.

About the Global Research Network Operations Center
The Global Research Network Operations Center (GlobalNOC) supports advanced international, national, regional and local high-performance research and education networks. GlobalNOC plays a major role in transforming the face of digital science, research and education in Indiana, the United States, and the world by providing unparalleled network operations and engineering needed for reliable and cost-effective access to specialized facilities for research and education.

Source: Indiana University

The post Indiana University Showcases SC17 Activities appeared first on HPCwire.

Exhaustive Profile of SC17 Cluster Competition Teams – Let’s go DEEP…

Sat, 11/11/2017 - 08:24

Like a fat man at a Vegas buffet, we’re now ready to delve deeply into the SC17 cluster teams. In this article, we’re going to take our initial personal look at the teams… their hopes, their dreams, and even, in some cases, their favorite songs.

First up, the teams from the United States……

Georgia Institute of Technology: The GIT team, or Team Swarm as they call themselves, is a new entrant into the world of high stakes student clustering. The term ‘swarm’ is a reference to their university mascot, the yellow jacket – which is a pretty nasty stinging bug. The team has a wide range of experience including GPU acceleration, IC fabrication, and data analytics.

They believe their unique competitive advantage lies in automation. No, they’re not using The Clapper (although it would be pretty cool to have a “clap on” “clap off” cluster). Team Swarm has assembled a tool stack that automates their system environment and allows them to focus on optimizing their apps rather than managing their system. The biggest thing they’re looking forward to? Crushing LINPACK and HPCG, plus meeting the other teams. Their favorite song? “Never Gonna Give You Up” by Rick Astley. Damn, just typing that song title has now lodged it in my head.

Northeastern University: They’ve dubbed themselves “Team HACKs”, which stands for “Huskies with Accelerated Computing Kernels” and references NEU’s Husky mascot. This isn’t their first cluster rodeo; they’ve been here before. Last year, they were the victims of a shipping error and had to run competition tasks on a hoopty hodgepodge of components.

To prep for the competition this year, they’ve been working very closely with their vendor partner AMD, plus tapping the brains of the NEU grad students in the research lab. NEU is looking to make a big comeback in this year’s competition, judging by their theme song “Don’t Stop Believing” by Journey…damn, another song stuck in my brain. What’s worse is that I also have the mental picture of the kids from Glee performing it. I’m going to go hit myself with a hammer.

San Diego State University/Southwest Oklahoma State University: This is a melded team with the longest abbreviation in competition history: SDSUSWOKSU. Hmm… not as ungainly as I thought; you can actually pronounce it. To make things easier, they’ve dubbed their team “Thread Weavers”, which refers to the fact that modern computers use threads. They need to take this nickname back into the lab and come up with something better – or just let me give them a nickname.

This is the first major competition for the Oklahoma side of the roster, while the San Diego side has a couple of returning veterans from last year’s competition. The team seems well organized and has been meeting regularly during the summer and fall in preparation for entering the crucible that is the SC17 Student Cluster Competition. They’ve been using Zoom and Slack to facilitate their meet ups and have become a close-knit group.

I listened to their theme song, “Tonight” by Kleeer, and while it’s funky enough, it isn’t rousing. How are you going to rally your cluster troops into a frenzy with a funkadelic smooth groove? But I’m old, what the hell do I know about music?

MGHPCC: In the first line of their team profile they say “Our team is called the MGHPCC Green Team. This name was chosen some years ago because all of the competitors came from the universities that founded the Mass Green HPC Center.” No, they’re wrong. Their name is Team Chowder (or Chowdah) or Team Boston, or Team “So you think you’re better than me?” and always has been. I laid those nicknames on them at their first competition and I’m not giving them up.

The team is keeping their secret sauce a secret, maybe even from themselves, but they’ve certainly been putting in the time, meeting for the last nine months in preparation for the competition. They had a very interesting answer to the question about how long it would take them to reach Denver from their base in Boston…

“At an average walking speed of 3.1 miles per hour and an 8 hour break per day, we expect to arrive in 39 days. Walking in parallel (side by side) will not speed up our journey, but walking in a single file reduces air resistance and saves time.”

Nicely done, Team Chowder. I also want to give them some props for their theme song, Queen and David Bowie’s “Under Pressure” – very appropriate for the competition. Welcome back, Boston.
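For the curious, their back-of-the-envelope math roughly checks out. A minimal sketch, assuming about 1,950 walking miles between Boston and Denver (my figure, not theirs):

```python
# Sanity check of Team Chowder's 39-day Boston-to-Denver walking estimate.
# The ~1,950-mile distance is an assumption; speed and rest schedule are theirs.
distance_miles = 1950
speed_mph = 3.1
walking_hours_per_day = 24 - 8        # an 8-hour break per day

miles_per_day = speed_mph * walking_hours_per_day   # 49.6 miles/day
days = distance_miles / miles_per_day
print(f"About {days:.0f} days on foot")             # ~39 days, matching their claim
```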

Chicago Fusion Team: This is a pretty complicated team. Some members are from the Illinois Institute of Technology, others are from Maine South High School, and still others are representing Adlai Stevenson High School – all located in or near Chicago. They’re being sponsored by an alliance of heavy hitters including Intel, Argonne National Laboratory, Calyos, NVIDIA, and the National Science Foundation.

Since they didn’t complete the team profile paperwork, I don’t have a lot of details about what they do for fun and their favorite song. However, they did submit their final architectural proposal which has all sorts of details about their cluster. We’ll be covering their configuration in more detail as we get into the competition, but they’re bringing a LOT of hardware – enough to consume 8,200 watts if it were all fired up without throttling. That’s nearly three times the 3,000 watt hard cap, so there will be significant throttling and probably even some agonizing reappraisal when it comes to their configuration.

One other interesting point that caught my attention is that they’re going to be running two-phase liquid cooling on at least some of their components in order to reduce power usage and, hopefully, run in turbo mode as much as possible. We’ll report more details about the team as they become available.

University of Texas/Texas State University: This is the second time this mixed team has competed at a SC Student Cluster Competition. They’ve dubbed themselves “Team Longcats” in a nod to their respective school mascots. This is not to be confused with the term “Long Pork”, which is how cannibals refer to humans.

The team has been working together since last April to prepare for the SC17 cluster competition marathon. They’re backed by the combined might of long time sponsors Dell and the Texas Advanced Computing Center.

This is a team with a gaudy history. The Texas Longhorn team took the SC Student Cluster Competition crown (although there is no actual crown) three times in a row (SC12, SC13, and SC14) – a feat that has yet to be duplicated. They’re eager to drink deeply from the Student Cluster Competition chalice of victory yet again.

The Longcats have wide ranging interests that include fencing, music, walking around outside, electrical engineering, and even designing musical shoes.

University of Illinois: This is the second time that the U of I has entered the Student Cluster Competition arena. One of the unique things about this team is that in addition to the normal things a team does to prepare for a competition, like researching the applications, practicing setting up their machine, etc., they’ve also been working out with the staff of hardcore financial services experts at Jump Trading. It’s an unorthodox training method, but those guys definitely know how to get performance out of a system.

The U of I team is looking forward to networking with others in the HPC industry at the show, and hopefully expanding their skill set at the same time. Their profile also made me laugh when they said “Some of our team members attended the “Dinner with interesting people” event at SC16, but ultimately decided to leave and have dinner with some less interesting people.”

Their profile also revealed a strange and horrible coincidence: their team song is “Never Gonna Give You Up” by the highly regarded Rick Astley – the exact same song as the Georgia Institute of Technology team. Yikes.

University of Utah: This is the second time we’re seeing the SupercompUtes from Utah in a SC competition. Last year, at SC16 in Salt Lake City, the team turned in an unprecedented performance, finishing second overall. That’s a huge achievement for a first-time competitor and makes them a team to be reckoned with. Four veterans from that team will be returning this year.

The team believes that their secret sauce is that they’ve trained at altitude for the cluster competition. Salt Lake City sits at 4,327 feet above sea level – even higher if your lab is on the second or third story of a building. Denver, with an average altitude of 5,280 feet, isn’t all that much higher than Salt Lake, so the Utes should be well accustomed to high-altitude clustering – a point in their favor.

They’ve also picked an inspiring song to drive their team: “Warriors” by Imagine Dragons…good choice.

The Utes are excited to meet the other teams and, like the other teams, to explore possible HPC careers. So if you’re an employer looking to nab high-performance employees or interns, swing by the student cluster competition area at SC17 and talk to the students. They’re highly motivated, highly skilled, and have the drive and initiative that every employer values.

William Henry Harrison High School: They wrote the tersest team profile in the competition, so they’re not giving me a lot to work with. First fact is that they’ve dubbed their team “The Sudo Wrestlers” which is a nice play on a Linux term. They believe that their edge in the competition is that they’re younger than the other competitors – which is absolutely true, given that they are the first all-high school team in the big iron division of the cluster competition.

They’re led by Lev Gorenstein, a veteran coach of several past teams, which is definitely an advantage for the plucky team of high schoolers. What isn’t an advantage is their team song: “Careless Whisper” by George Michael. Not exactly the song you’d pick to drive top performance, right? What happened? Was “Wake Me Up Before You Go Go” already taken by someone?

The SC Student Cluster Competitions are international affairs and this year is no exception. Denver is hosting seven teams from outside the US; let’s take an up close and personal look at those teams, starting with the teams from Europe….

Friedrich Alexander University/Technical University of Munich: These teams wrote a pun-tastic team profile, chock full of, as they put it, “p(h)uns” and fun. Unfortunately for them, I can’t stand puns – they’re the lowest form of humor, just above limericks.

What isn’t funny (or even phunny, as they’d put it) is the skill and expertise these two teams are bringing to the competition. FAU is coming off of a Highest LINPACK win at the ISC17 competition, and TUM finished in the upper echelon of teams at last year’s SC16 competition. Together, this combined team could really make some waves at SC17.

I’ve had a glimpse of their proposed hardware for this competition and, damn, they’re packing some power. It should be a favorite for the Highest LINPACK award and a solid competitor for the Overall Championship as well. We’ll see what happens.

University of Warsaw: This team is actually an amalgamation of students from Lodz University of Technology, University of Warsaw, and Warsaw University of Technology – but they’ll always be Team Warsaw to me. Team Warsaw burst onto the big league cluster scene at ASC17 in Wuxi, China. They shocked the cluster world by coming out of nowhere to nab second place in the Highest LINPACK portion of the competition.

Based on my observations in China, this is a happy team that works together well. They have a finely honed sense of humor and an optimistic outlook. When it comes to this year’s competition, the team says “we want to hear our cluster screeching while running the HPL benchmark.” They’re also looking forward to renewing friendships with other teams from ASC17 as well as making new friendships with other teams.

When they’re not clustering, team members enjoy walking up and down hills and rocks, being underwater, and reading things.

This year’s SC17 competition has a large slate of teams hailing from Asia. Let’s get to know them a bit better….

Nanyang Technological University: This will be the sixth appearance in a major competition for the ‘Pride of Singapore’ NTU. They notched a win in the Highest LINPACK at ASC’15 in Taiyuan, China, but have been shut out of the other major awards. I think this is a team that’s ready to make the move to the next level. They have the experience and are highly motivated. They’ve even named their team “Supernova” with the thought that SC17 could be their time to shine.

They think their edge at SC17 will be the work they’ve done on application optimization, an effort that they didn’t put much time into at SC16, although they took first place on the code optimization task at that competition.

This year they’re going DEEP on the applications, talking to domain experts, combing the web, and actually reading physical books (gasp!). They believe this work will give them unique and comprehensive knowledge of the applications which will translate into a win at SC17. Nice having you back, Nanyang, good luck.

National Tsing Hua University: NTHU is a frequent entrant in major league cluster competitions. Over the years they’ve participated in an amazing 12 Student Cluster Competitions, taking down the Overall Championship or LINPACK Award four times.

This edition of the team has dubbed themselves “./confizure”, which is, I think, a play on the configure command and Azure. They’re the first team to use Arch Linux in a major cluster competition, which could be an advantage – or maybe a disadvantage if things go sideways. When it comes to SC17, they’re looking forward to seeing how the other teams deal with the promised power shutoff event – that should be highly interesting.

When it comes to having fun, this team most enjoys making fun of each other – which almost automatically makes them my favorite team, right? In another humorous twist, their team song is a national health exercise they all had to perform every day in elementary school. Here’s a link, it’s hilarious.  I’m looking forward to making them perform this same exercise every morning before I give them their keyboards.

Peking University: This is another team that didn’t waste any words when filling out their team profile form. Their nickname: Team Peking. Their secret sauce: excellent advisors, solid vendor support, and active team members with varied backgrounds.

They’re obviously holding their cards close to their collective vests, not wanting to give anything away. However, they did let us know that their team song is “He’s a Pirate” from the Pirates of the Caribbean movies. They also let it drop that one of their major activities is debating which text editor is best.

This is the second time we’ve seen Team Peking at a SC cluster competition. Last year, as a newbie team, they managed to land third place for the Overall Championship award, which is quite a feat. They are running new hardware this year, so we’ll see what happens, but this is definitely a team to keep an eye on.

Tsinghua University: Team Tsinghua, or team THU-PACMAN, as they’ve dubbed themselves, is an intensely focused team. This isn’t a surprise when you consider that there is more at stake for them than for perhaps any other team. If Tsinghua can win the Overall Championship at SC17, then they will have completed a record-shattering second Student Cluster Competition Grand Slam. This means they would have won all three major competitions (ASC, ISC, and SC) in a single year. The 2015 Tsinghua team is the only other team to have done this in cluster competition history.

There isn’t a whole lot of detail in the Team Tsinghua profile. They like playing online games together during their off time. Their team song is the song that plays during Pac-Man, if you can call a bunch of “waca waca waca” noises a song. But more than anything else, they seem to like winning Student Cluster Competition championships. We’ll see if they can make their 2017 Grand Slam dream come true next week.

University of Science & Technology of China: With each competition they’ve entered, the USTC team’s abilities have grown, to the point where they took home all the marbles in 2016; they return in 2017 to defend their crown (although there isn’t an actual crown).

Most of the team this year is new, so this will be their first time competing in the mind-twisting marathon that is the SC17 Student Cluster Competition. The team points to ‘hard work’ as their secret sauce this year. They’re also the only team to have specified a spirit animal for the competition: the “Swan Goose”, which is lauded in Chinese literature for its perseverance and bravery. As they put it in their profile, they intend to soar like a swan goose. Good thing the convention center ceilings are 40 feet high in most places.

Ok, so if you’re still reading, you now have the personal rundown on each team. If a team has captured your fancy, you can lay your (virtual) money on them in our annual betting pool: just solve the captcha on the betting pool page and you’ll have a chance to lay down a virtual $1,000 on any team (or teams) of your choice.

In upcoming articles we’re going to take a look at the applications the students will face during SC17, the configurations of each team, plus video interviews of each team. Stay tuned to HPCwire for more….

The post Exhaustive Profile of SC17 Cluster Competition Teams – Let’s go DEEP… appeared first on HPCwire.

Intel, AMD Moves Rattle GPU Market

Fri, 11/10/2017 - 17:02

Intel Corp. has lured away the former head of AMD’s graphics business as the world’s largest chipmaker forms a high-end graphics unit to compete with GPU market leader Nvidia.

Intel rattled tech markets this week by hiring AMD’s Raja Koduri to head its new Core and Visual Computing Group. The hiring came days after the chip rivals announced a graphics partnership.

Signaling its strategy of taking on Nvidia in the high-flying GPU market, Intel’s chief engineering officer, Murthy Renduchintala, said the hiring of Koduri underscored Intel’s “plans to aggressively expand our computing and graphics capabilities and build on our very strong and broad differentiated IP foundation.”

Koduri previously served as senior vice president and chief architect of AMD’s Radeon Technologies Group. There, he oversaw AMD’s graphics development. Koduri, 49, was Apple’s director of graphics architecture before joining AMD. At Apple, he led the company’s transition to Retina laptop displays.

Intel said Koduri would assume his new graphics duties in early December.

Raja Koduri

Koduri’s hiring sent AMD’s shares plummeting on the Nasdaq exchange, although they were beginning to recover on Friday (Nov. 10). Likewise, Nvidia’s shares sank on Thursday after Intel’s announcement but were up sharply by the end of the week after the company announced record quarterly revenues.

Nvidia has been touting its accelerated GPU platforms with thousands of cores as the next step in computing as Moore’s Law runs out of steam. The reference to Intel co-founder Gordon Moore is seen as a shot across Intel’s bow by the GPU leader as the chipmakers mass their forces to compete in the nascent AI chip and algorithm markets.

“Being the world’s AI platform is our focus,” Greg Estes, Nvidia’s vice president of developer programs, stressed during a recent company event in Washington, DC.

Intel’s announcement of Koduri’s hiring came days after it unveiled a partnership with AMD to compete with Nvidia in the GPU market. “Our collaboration with Intel expands the installed base for AMD Radeon GPUs and brings to market a differentiated solution for high-performance graphics,” Scott Herkelman, vice president and general manager, AMD Radeon Technologies Group, noted in the press release.

AMD’s announcement of the Intel deal, reported elsewhere, was pulled from its website after the Koduri hiring was disclosed.

Intel said its new Core processor initially aimed at the gaming market would combine a high-performance CPU with AMD’s Radeon graphics components.

Hence, the high end of the graphics market is shaping up as a battle between Nvidia’s many-core accelerated GPUs, which emphasize parallelism, and Intel’s hybrid CPU-discrete graphics approach. While Intel emphasizes hardware horsepower through advances in high-bandwidth memory and new chip designs combined with discrete graphics, Nvidia is pairing its many-core processors with big data to tackle emerging deep learning problems such as inference.

The key battleground will be the AI market where algorithms and APIs rather than traditional coding will help determine winners and losers, Nvidia’s Estes argued.

The post Intel, AMD Moves Rattle GPU Market appeared first on HPCwire.

Moonshot Research and Providentia Worldwide Collaborate on HPC and Big Data Services for Industry

Fri, 11/10/2017 - 11:06

Nov. 10, 2017 — Moonshot Research LLC (Champaign IL) and Providentia Worldwide, LLC (Washington D.C.) have agreed to jointly offer business and technical services and consulting in the areas of high-performance computing and big data services. The Moonshot/Providentia team brings expertise in driving ROI in enterprise computing by focusing on best practices in HPC, cloud and enterprise IT.

Merle Giles, CEO of Moonshot Research said, “I am absolutely delighted to work with the world’s experts in using HPC to achieve real-time analytics. Speed has become the ultimate competitive advantage in the world of technology-enabled products and services. The impact of utilizing our customer-first approach to industrial innovation and ROI is substantial.”

Ryan Quick and Arno Kolster of Providentia Worldwide are pioneers in adopting a hybrid approach to analytics, using techniques and software typically deployed independently in cloud, enterprise and HPC workflows. Their adoption of HPC solutions for real-time fraud detection at PayPal was unconventional, yet proved to be the perfect solution for achieving extreme data ingestion rates and rapid machine-driven decision making.

Giles brings a business sense to this mix of technology integration after proving the impact of his customer-first approach at NCSA’s Private Sector Program at the University of Illinois. Together, the Moonshot/Providentia team of experts offers independent, vendor-agnostic solutions that result in reduced time-to-solution, greater scale, and increased certainty at lower cost.

Giles, Quick and Kolster have each earned awards and recognition from HPCwire. All three are members of Hyperion Research’s HPC User Forum steering committee and have been invited speakers in numerous countries around the world. Giles was co-editor of a 2015 book entitled Industrial Applications of High-Performance Computing: Best Global Practices published by CRC Press.

Source: Moonshot Research

The post Moonshot Research and Providentia Worldwide Collaborate on HPC and Big Data Services for Industry appeared first on HPCwire.

Sugon Announces 1st OPA-Based Torus Switch

Fri, 11/10/2017 - 08:38

Nov. 10, 2017 — As supercomputers achieve petascale and reach toward exascale, efficient communication among thousands of nodes becomes a critical challenge. One pioneering solution is the Silicon Switch (an OPA-based Torus topology switch) by Sugon, China’s high-performance computing leader. A demo of the switch was exhibited at SC17 in Denver.

“Large-scale supercomputers, especially quasi-exascale or exascale systems, face severe challenges in terms of system scale, scalability, cost, energy consumption, reliability, etc. The Silicon Switch released by Sugon adopts the Torus architecture and state-of-the-art OPA technology, and thus offers more competitive features, including advanced performance, nearly unlimited scalability, and excellent fault tolerance. It is a wise choice for exascale supercomputers,” said Dr. Li Bin, General Manager of Business Department for HPC Product of Sugon.

Compared with the traditional fat-tree network topology, a Torus direct network, which emphasizes neighboring interconnection, has clear advantages in scalability and cost/performance, since network cost grows only linearly with system scale. In addition, its rich redundant data paths and dynamic routing give it inherent fault tolerance. These features meet the requirements of exascale supercomputers well and point to a new direction in high-speed network technology.

Dr. Li Bin further remarked that Sugon had implemented a 3D-Torus network in 2015 as the interconnect for its Earth System Numerical Simulator. Recently, Sugon’s research into Torus network technology has produced further breakthroughs. The dimension of the Torus network has evolved from 3D to 6D, which effectively reduces the maximum hop count of large-scale systems. At the software level, deadlock-free dynamic routing algorithms supporting 6D-Torus have been verified and tested in a real environment. At the hardware level, the Silicon Switch released this time is an important sample of the hardware implementation.
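To see why moving from 3D to 6D shrinks the worst-case hop count, consider a k-ary n-dimensional torus: with wraparound links, its diameter is n × floor(k/2), so spreading a fixed node count across more dimensions reduces k faster than n grows. A rough sketch with illustrative node counts (these are not Sugon’s actual configurations):

```python
def torus_diameter(k, dims):
    """Worst-case hop count of a k-ary torus with wraparound links:
    each dimension contributes at most floor(k/2) hops."""
    return dims * (k // 2)

# Compare a 3D and a 6D torus holding the same 4,096 nodes.
# 16**3 == 4**6 == 4096; these node counts are illustrative only.
print(torus_diameter(16, 3))   # 24 hops worst case in 3D
print(torus_diameter(4, 6))    # 12 hops worst case in 6D
```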

The “Silicon” mentioned above refers to a unit in a high-dimensional Torus direct network. With a 3D-Torus topology adopted within each silicon unit, multiple silicon units can be combined into a higher-dimensional 4D/5D/6D-Torus direct network. Integrating a 3D-Torus silicon unit into a modular switch brings many benefits: it greatly improves the integration and density of the system, simplifies network cabling, and reduces deployment complexity and cost. The released Silicon Switch can support up to 192 ports (100Gb each), and different Silicon Switches can be connected through a dedicated 400Gb interface.

The integrated Silicon Switch could also broaden the adoption of Torus high-speed network technology: because almost no changes are required on the compute-node side, small and medium-scale high-performance computing systems can adopt the Torus topology smoothly.

It is worth mentioning that the Silicon Switch released by Sugon also supports cold-plate direct liquid cooling, marking the extension of Sugon’s liquid cooling technology from computing devices to the network system. Liquid cooling plays a key role in improving the integration and reliability of large-scale network systems while reducing their energy consumption.

The flourishing development of high-performance computing and artificial intelligence relies not only on powerful computing components but also on efficient communication. Sugon aims to blaze new trails in computing, storage, networking and other core technologies.

Source: Sugon

The post Sugon Announces 1st OPA-Based Torus Switch appeared first on HPCwire.

NVIDIA Announces Financial Results for Third Quarter Fiscal 2018

Fri, 11/10/2017 - 08:16

Nov. 10, 2017 — NVIDIA has reported record revenue for the third quarter ended October 29, 2017, of $2.64 billion, up 32 percent from $2.00 billion a year earlier, and up 18 percent from $2.23 billion in the previous quarter, with growth across all its platforms.

GAAP earnings per diluted share for the quarter were a record $1.33, up 60 percent from $0.83 a year ago and up 45 percent from $0.92 in the previous quarter. Non-GAAP earnings per diluted share were $1.33, also a record, up 41 percent from $0.94 a year earlier and up 32 percent from $1.01 in the previous quarter.

“We had a great quarter across all of our growth drivers,” said Jensen Huang, founder and chief executive officer of NVIDIA. “Industries across the world are accelerating their adoption of AI.

“Our Volta GPU has been embraced by every major internet and cloud service provider and computer maker. Our new TensorRT inference acceleration platform opens us to growth in hyperscale datacenters. GeForce and Nintendo Switch are tapped into the strongest growth dynamics of gaming. And our new DRIVE PX Pegasus for robotaxis has been adopted by companies around the world. We are well positioned for continued growth,” he said.

Capital Return

During the first nine months of fiscal 2018, NVIDIA returned to shareholders $909 million in share repurchases and $250 million in cash dividends. As a result, the company returned an aggregate of $1.16 billion to shareholders in the first nine months of the fiscal year. The company intends to return $1.25 billion to shareholders in fiscal 2018.

For fiscal 2019, NVIDIA intends to return $1.25 billion to shareholders through ongoing quarterly cash dividends and share repurchases. The company announced a 7 percent increase in its quarterly cash dividend to $0.15 per share from $0.14 per share, to be paid with its next quarterly cash dividend on December 15, 2017, to all shareholders of record on November 24, 2017.

Q3 FY2018 Summary

GAAP ($ in millions except earnings per share)

                                 Q3 FY18    Q2 FY18    Q3 FY17    Q/Q          Y/Y
  Revenue                        $2,636     $2,230     $2,004     Up 18%       Up 32%
  Gross margin                   59.5%      58.4%      59.0%      Up 110 bps   Up 50 bps
  Operating expenses             $674       $614       $544       Up 10%       Up 24%
  Operating income               $895       $688       $639       Up 30%       Up 40%
  Net income                     $838       $583       $542       Up 44%       Up 55%
  Diluted earnings per share     $1.33      $0.92      $0.83      Up 45%       Up 60%

 

Non-GAAP ($ in millions except earnings per share)

                                 Q3 FY18    Q2 FY18    Q3 FY17    Q/Q          Y/Y
  Revenue                        $2,636     $2,230     $2,004     Up 18%       Up 32%
  Gross margin                   59.7%      58.6%      59.2%      Up 110 bps   Up 50 bps
  Operating expenses             $570       $533       $478       Up 7%        Up 19%
  Operating income               $1,005     $773       $708       Up 30%       Up 42%
  Net income                     $833       $638       $570       Up 31%       Up 46%
  Diluted earnings per share     $1.33      $1.01      $0.94      Up 32%       Up 41%

NVIDIA’s outlook for the fourth quarter of fiscal 2018 is as follows:

  • Revenue is expected to be $2.65 billion, plus or minus two percent.
  • GAAP and non-GAAP gross margins are expected to be 59.7 percent and 60.0 percent, respectively, plus or minus 50 basis points.
  • GAAP and non-GAAP operating expenses are expected to be approximately $722 million and $600 million, respectively.
  • GAAP and non-GAAP other income and expense are both expected to be nominal.
  • GAAP and non-GAAP tax rates are both expected to be 17.5 percent, plus or minus one percent, excluding any discrete items. GAAP discrete items include excess tax benefits or deficiencies related to stock-based compensation, which the company expects to generate variability on a quarter by quarter basis.

Third Quarter Fiscal 2018 Highlights

During the third quarter, NVIDIA achieved progress in these areas:

Datacenter

Gaming

Professional Visualization

Automotive

  • Announced NVIDIA DRIVE PX Pegasus, the world’s first auto-grade AI computer designed to enable a new class of driverless robotaxis without steering wheels, pedals or mirrors.

Autonomous Machines/AI Edge Computing

Source: NVIDIA

The post NVIDIA Announces Financial Results for Third Quarter Fiscal 2018 appeared first on HPCwire.

Caringo Introduces Caringo Drive for Swarm Scale-Out Hybrid Storage

Fri, 11/10/2017 - 08:07

AUSTIN, Tex., Nov. 10, 2017 — Today, Caringo announced their latest product, Caringo Drive, a virtual drive for Swarm Scale-Out Hybrid Storage, which they will demo at SC17 Booth 1001 in Denver, Colorado, November 13–16, 2017, along with their complete product line. Once Caringo Drive is installed on macOS and Windows systems, customers have convenient access and can easily drag and drop files to Swarm with background parallel transfer. This speeds content uploads and provides simple drive-based access to Swarm from applications.
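The “background parallel transfer” idea, fanning a batch of dropped files out across concurrent upload workers, can be sketched as follows. This is an illustrative pattern only; `upload_to_swarm` is a hypothetical stand-in, not Caringo’s actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_to_swarm(path):
    # Hypothetical placeholder for a single-object transfer to the cluster.
    return f"stored:{path}"

def parallel_transfer(paths, workers=8):
    # Spread the file list across a thread pool so transfers overlap,
    # which is what makes batch drag-and-drop uploads fast.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(upload_to_swarm, paths))

print(parallel_transfer(["a.dat", "b.dat", "c.dat"]))
```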

Caringo’s flagship product Swarm eliminates storage silos by turning standard server hardware into a limitless pool of data resources delivering continuous protection, multi-tenancy and metering for chargebacks. HPC customers can offload data from primary storage and enable collaboration while reducing storage TCO by 75% and scaling to hundreds of petabytes.

VP of Product Tony Barbagallo said, “Many organizations like Argonne National Laboratories and Texas Tech University trust their storage infrastructure to Caringo Swarm to provide infinite expansion across multi-vendor, local, and cloud-based storage. They use Swarm to store, preserve, and protect data generated in dispersed locations to facilitate in-depth research, drive technological breakthroughs, and support thousands of staff, researchers, and students around the world. With Caringo Drive, we expand our toolset to empower our customers to easily manage their Swarm cluster.”

In addition to showcasing their complete product line at SC17, Caringo will offer no-cost, full-featured 100TB licenses of Caringo Swarm Scale-Out Hybrid Storage for qualified High-Performance Computing (HPC) customers. SC17 attendees are also invited to join the Caringo team at their widely anticipated Happy Hour at 2 pm, Tuesday and Wednesday, in the Caringo booth. For more information, see https://www.caringo.com/sc17/.

The 100TB license promotion and integration consultation is available now to qualified HPC and Education organizations. Interested parties can visit https://www.caringo.com/solutions/hpc/ for more information.

About Caringo

Founded in 2005, Caringo is committed to helping customers unlock the value of their data and solve issues associated with data protection, management, organization, and search at massive scale.

Source: Caringo

The post Caringo Introduces Caringo Drive for Swarm Scale-Out Hybrid Storage appeared first on HPCwire.

SIGHPC Education and IHPCTC Join Forces to Promote HPC Education and Training

Fri, 11/10/2017 - 07:57

Nov. 10, 2017 — The SIGHPC Education Chapter (SIGHPCEDU) and the International High Performance Computing Training Consortium (IHPCTC) have announced an integration of their efforts to build a combined collaborative community focused on the development, dissemination, and assessment of HPC training and education materials.  The goals of the collaboration include the promotion of HPC training activities, avoidance of duplication of efforts in creating such materials, and the assessment of the impacts of that training.   

The combined organization has begun work on a number of short- and long-term activities aimed at those goals. Those activities will be discussed at the SIGHPC Education Chapter BoF at SC17 (November 16, 12:15 PM, Room 205-207). They include the preparation of a master list of existing training materials, webinars, blogs and discussion forums, and outlets for publishing training and education experiences. The outcomes of the discussion at SC17 will be posted on the SIGHPCEDU website (https://sighpceducation.acm.org/) following the conference. Those interested in volunteering to assist with these efforts should contact the SIGHPC Education Chapter Officers (Richard Coffey, Fernanda Foertter, Steve Gordon, Dana Brunson, and Holly Hirst) at SIGHPCEDUC-OFFICERS@listserv.acm.org.

The SIGHPC Education Chapter is the first virtual chapter of the ACM.  Its objectives are to:

  • Promote an increased knowledge of, and greater interest in, the educational and scientific aspects of HPC and their applications.
  • Provide a means of communication among individuals having an interest in education and career building activities relating to HPC.
  • Promote and collate education activities and programs through formal and informal education activities.
  • Provide guidance to the community on the competencies required for effective application of computational modeling, simulation, data analysis, and visualization techniques.
  • Provide information on quality educational programs and materials, as well as facilitate experience-building access to existing HPC resources.

Membership is only $10 per year for professionals and $5 for students.

The International High Performance Computing Training Consortium is an ad hoc group of training professionals formed in response to several training workshops held at the annual SC meetings.  Its membership includes professional staff from 18 countries.  The group has been organizing HPC training workshops at SC for the past four years.  We welcome you to join us for the Fourth SC Workshop on Best Practices for HPC Training on Sunday from 2-5:30 pm in room 601 of the Convention Center.

Source: SIGHPC

The post SIGHPC Education and IHPCTC Join Forces to Promote HPC Education and Training appeared first on HPCwire.

Fujitsu to Build PRIMERGY Supercomputer for the Institute of Fluid Science at Tohoku University

Thu, 11/09/2017 - 22:22

Nov. 10 — Fujitsu today announced that it has received an order for “The Supercomputer System” from the Institute of Fluid Science at Tohoku University.

The Supercomputer System will consist of multiple computational systems using the latest Fujitsu Server PRIMERGY x86 servers, and is planned to deliver a peak theoretical performance in excess of 2.7 petaflops.

The Supercomputer System will be deployed to the Advanced Fluid Information Research Center in the Institute of Fluid Science, Tohoku University in Sendai, Miyagi Prefecture, with plans to begin operations in fiscal 2018. Through deployment and operations of this system, Fujitsu will support the Tohoku University Institute of Fluid Science in the advancement of its research into the phenomena of fluids in a variety of fields, including biology, energy, aerospace and semiconductors.

Background

The Institute of Fluid Science at Tohoku University has contributed to the development of fluid science in a variety of fields, including clarifying the flow of blood through the body, and controlling plasma flow in semiconductor manufacturing, using a next-generation integrated research method that unites creative experimental research with supercomputer-based computational research.

Now, the institute is upgrading and significantly improving the performance of its core equipment, The Supercomputer System, in order to further enhance its fluid science research in fields such as health, welfare and medicine, the environment and energy, aerospace and manufacturing.

Fujitsu received the order for this system based on a proposal that combined software-based virtualization technology with a large-scale computational system that utilizes the technology Fujitsu has cultivated through HPC development.

Details of the New System

The Supercomputer System comprises a core supercomputer with three computation systems: two shared-memory parallel computation systems, which can use a large-capacity memory space, and one distributed-memory parallel computation system, which can execute large-scale parallel programs. It also has a login server and an application and remote graphics server, as well as software and a variety of subsystems for tasks such as visualization and storage. The three computation systems in the core supercomputer will consist of Fujitsu’s latest PRIMERGY x86 servers, with the distributed-memory parallel computation system planned to deliver a theoretical peak performance in excess of 2.7 petaflops. In addition, by employing a water-cooling model, the system will also offer high energy efficiency.

Related Websites

Fujitsu Server PRIMERGY x86 Servers

About Fujitsu

Fujitsu is the leading Japanese information and communication technology (ICT) company offering a full range of technology products, solutions and services. Approximately 155,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.5 trillion yen (US$40 billion) for the fiscal year ended March 31, 2017. For more information, please see http://www.fujitsu.com.

Source: Fujitsu

The post Fujitsu to Build PRIMERGY Supercomputer for the Institute of Fluid Science at Tohoku University appeared first on HPCwire.

SC17: Legion Seeks to Elevate HPC Programming

Thu, 11/09/2017 - 22:15

As modern HPC architectures become ever more complex, so too does the task of programming these machines. In the quest for the trifecta of better performance, portability and programmability, new HPC programming systems are being developed. The Legion programming system, a data-centric parallel programming system for writing portable high performance programs, is one such effort that is being developed at Stanford University in collaboration with Nvidia and several U.S. Department of Energy labs.

In this Q&A, Stanford University Computer Science Chair Alex Aiken and Nvidia Chief Scientist Bill Dally provide an overview of Legion, its goals and its relevance for exascale computing. Aiken will hold a tutorial on the Legion programming model this Sunday at SC17 in Denver from 1:30-5pm MT.

HPCwire: Let’s start with a basic, but important question: why does HPC need new programming models?

Alex Aiken, professor and the chair of computer science at Stanford

Alex Aiken and Bill Dally: New programming models are needed to raise the level of programming to enhance portability across types and generations of high-performance computers. Today programmers specify low-level details, like how much parallelism to exploit, and how to stage data through levels of memory. These low-level details tie an application to the performance of a specific machine, and the effort required to modify the code to target future machines is becoming a major obstacle to actually doing high performance computing. By elevating the level of programming, these target-dependent decisions can be made by the programming system, making it easier to write performant codes, and making the codes themselves performance portable.

HPCwire: What is the Legion programming system? What are the main goals of the project?

Aiken and Dally: Legion is a new programming model for modern supercomputing systems that aims to provide excellent performance, portability, and scalability of application codes across a wide range of hardware. A Legion application is composed of tasks written in the language of the programmer’s choice, such as C++, CUDA, Fortran, or OpenACC. Legion tasks specify which “regions” of data they will access as well as what kinds of accesses will be performed. Knowledge of the data used by each task allows Legion to confer many benefits to application developers:

Bill Dally, Nvidia chief scientist & Stanford professor

First, a Legion programming system can analyze the tasks and their data usage to automatically and safely infer parallelism and perform the scheduling transformations necessary to fill an exascale machine, even if the code was written in an apparently-sequential style.

Second, the programming system’s knowledge of which data will be accessed by each task allows Legion to automatically insert the necessary data movement for a complex memory hierarchy, greatly simplifying application code and reducing (or often eliminating) idle cycles on processors waiting for necessary data to arrive.

Finally, Legion’s machine-agnostic description of an application in terms of tasks and regions decouples the process of specifying an application from the determination of how it is mapped to a target machine. This allows the porting and tuning of an application to be done independently from its development and facilitates tuning by machine experts or even a machine learning algorithm. This makes Legion programs inherently performance portable.
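The first of these benefits, inferring parallelism from declared data usage, can be illustrated with a toy scheduler. This is a conceptual sketch, not the Legion runtime API: each task names the regions it reads and writes, and the system derives which tasks may run concurrently from apparently sequential code.

```python
def schedule_waves(tasks):
    # tasks: list of (name, reads, writes) in program order.
    # Two tasks conflict if one writes a region the other touches;
    # a task runs in the earliest "wave" after all conflicting predecessors.
    wave = {}
    for i, (name, reads, writes) in enumerate(tasks):
        w = 0
        for prev_name, prev_reads, prev_writes in tasks[:i]:
            if writes & (prev_reads | prev_writes) or reads & prev_writes:
                w = max(w, wave[prev_name] + 1)
        wave[name] = w
    return wave

program = [
    ("init_x", set(), {"x"}),        # fill region x
    ("init_y", set(), {"y"}),        # fill region y (independent of init_x)
    ("saxpy",  {"x", "y"}, {"y"}),   # y = a*x + y, depends on both inits
    ("norm",   {"y"}, {"n"}),        # depends on saxpy's write to y
]
print(schedule_waves(program))
# The two init tasks land in the same wave: parallelism was inferred, not annotated.
```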

HPCwire: The DOE is investing in Legion development as part of its exascale program. How is Legion positioned to address the challenges of exascale?

Aiken and Dally: Legion is designed for exascale computation. Legion guarantees that parallel execution has the same result as sequential execution, which is a huge advantage for debugging at scale. Legion also provides rich capabilities for describing how a Legion program uses its data. Since managing and moving data is the limiter in many current petascale and future exascale applications, these features give Legion the information it needs to do a much better job of managing data placement and movement than current programming systems. Legion is also highly asynchronous, avoiding the global synchronization constructs which only become more expensive on larger machines. Finally, under the hood, the Legion implementation exploits the extra information it has about a program’s data and its asynchronous capabilities to the hilt, performing much more sophisticated static and dynamic analysis of programs than is possible in current systems to support Legion’s higher level of abstraction while providing scalable and portable performance.

HPCwire: Why is Nvidia involved in Legion? How does Legion fit into Nvidia’s vision for computing?

Dally: Nvidia wants to make it easy for people to develop production application codes that can scale to exascale machines and easily be ported between supercomputers with different GPU generations, numbers of GPUs, and different sized memory hierarchies. By letting programmers specify target-independent codes at a high level, leaving the mapping decisions to the programming system, Legion accomplishes these goals.

Nvidia is also very excited to collaborate with leading researchers from Stanford University and Los Alamos National Lab to move this technology forward.

HPCwire: One of the stated goals/features of Legion is performance portability; at a high-level, how does it achieve this?

Aiken and Dally: Performance portability is achieved in Legion through a strict separation of concerns: we aim to completely decouple the description of the computation from how it is mapped to the target machine. This approach manifests itself explicitly in the programming model: all Legion programs consist of two parts: a machine-independent specification that describes the computation abstractly without any machine details, and one or more application- and/or machine-specific mappers that make policy decisions about how the application should be executed on the target machine. Machine-independent applications can therefore be written once and easily migrated to new machines only by changing the mapping decisions. Importantly, mapping decisions can only impact the performance of the code and never the correctness as the programming system uses program analysis to determine if any data movement and synchronization is necessary to satisfy the mapping decisions.
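This separation of concerns can be mimicked in a few lines. The following is a conceptual sketch, not Legion’s actual mapper interface: the task list stands in for the machine-independent specification, and a swappable mapper object only decides placement, so changing the mapper can never change the computed result.

```python
class RoundRobinMapper:
    def place(self, task_id, n_procs):
        return task_id % n_procs          # spread tasks evenly

class AllOnProcZeroMapper:
    def place(self, task_id, n_procs):
        return 0                          # e.g. a debugging policy

def run(tasks, mapper, n_procs=4):
    # "Execute" each task; the mapper chooses where, never what.
    placements = [mapper.place(i, n_procs) for i in range(len(tasks))]
    results = [task() for task in tasks]  # results independent of placement
    return results, placements

spec = [lambda: 1 + 1, lambda: 2 * 3, lambda: 7 - 5]   # machine-independent part
r1, p1 = run(spec, RoundRobinMapper())
r2, p2 = run(spec, AllOnProcZeroMapper())
assert r1 == r2          # mapping affects performance, never correctness
print(p1, p2)
```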

HPCwire: Alex, what will you be covering in your SC17 tutorial on Sunday and who should attend?

Aiken: The tutorial will cover the major features of the Legion programming system and will be hands-on; participants will be writing programs almost from the start and every concept will be illustrated with a small programming exercise. Anyone who is interested in learning something about the benefits and state of the art of task-based programming models, and of Legion specifically, should find the tutorial useful.

HPCwire: What is the most challenging part of developing a new HPC programming model?

Aiken and Dally: The most challenging part is managing expectations. It is easy to forget that it took MPI more than 15 years from the time the initial prototypes were proposed to when really solid implementations were available for use. Many users expect new HPC programming models such as Legion to mature much faster than that. We’ve been lucky to collaborate with groups like Jackie Chen’s combustion group at Sandia National Lab, the FleCSI team at Los Alamos National Lab, and the LCLS-II software team at SLAC that are willing to work with us on real applications that push us through our growing pains and ensure the end result will be broadly useful in the HPC programming ecosystem.

HPCwire: How hard is it for an HPC programmer with a legacy application to migrate that application to Legion?

Aiken and Dally: Legion is designed to facilitate the incremental migration of an MPI-based application. Legion interoperates with MPI, allowing a porting effort to focus on moving the performance-critical sections (e.g., the main time-stepping loop or a key solver) to Legion tasks while leaving other parts of the application such as initialization or file I/O in their original MPI-based form. And since Legion operates at the granularity of tasks, the compute heavy “inner loops” from the original optimized application code can often be used directly as the body of newly-created Legion tasks.

As an example, the combustion simulation application S3D, developed at Sandia National Labs, consists of over 200,000 lines of Fortran+MPI code, but only two engineer-months of effort were required to port the main integration loop to Legion. The integration loop comprises only 15 percent of the overall code base, but consumes 97 percent of the cycles during execution. Although still contained in the original Fortran shell, the use of the Legion version of the integration loop allows S3D to run more than 4x faster than the original Fortran version, and over 2x faster than other GPU-accelerated versions of the code.
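These figures hang together under Amdahl’s law. As a back-of-the-envelope check (illustrative arithmetic, not from the source): with 97 percent of cycles in the ported loop, the overall speedup from accelerating that loop by a factor s is 1/((1−0.97) + 0.97/s), which exceeds 4x once s reaches roughly 4.5, and is capped near 33x even with an infinitely fast loop.

```python
def amdahl(p, s):
    # Overall speedup when fraction p of the runtime is sped up by factor s.
    return 1.0 / ((1.0 - p) + p / s)

print(round(amdahl(0.97, 4.5), 2))   # ~4.07x overall, consistent with ">4x faster"
print(round(1.0 / (1.0 - 0.97), 1))  # ~33.3x ceiling from the 3% left in Fortran
```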

The above figure shows the architecture of the Legion programming system. Applications targeting Legion have the option of either being written in the Regent programming language or written directly to the Legion C++ runtime interface. Applications written in Regent are compiled to LLVM (and call a C wrapper for the C++ runtime API). Additional info.

The post SC17: Legion Seeks to Elevate HPC Programming appeared first on HPCwire.

Ahead of SC17, Mellanox Launches Scalable 200G Switch Platforms

Thu, 11/09/2017 - 15:54

In the run-up to the annual supercomputing conference SC17 next week in Denver, Mellanox made a series of announcements today, including a scalable switch platform based on its HDR 200G InfiniBand technology and the first deployment of a 100Gb/s Linux kernel-based Ethernet switch.

The company touts its HDR (High Data Rate) 200G InfiniBand Quantum, which offers up to 800 ports of 200Gb/s or 1,600 ports 100Gb/s in one chassis, as the most scalable switch platform available.

The platform family includes:

  • Quantum QM8700: 40-port 200Gb/s or 80-port 100Gb/s
  • Quantum CS8510: modular 200-port 200Gb/s or 400-port 100Gb/s
  • Quantum CS8500: modular 800-port 200Gb/s or 1,600-port 100Gb/s

Mellanox said the Quantum product line’s switch density will enable space and power consumption optimization, reducing network equipment cost by 4X, electricity costs by 2X and improving data transfer time by 2X.

A departmental-scale deployment of the Quantum QM8700 switch connects 80 servers, which the company said is 1.7 times more than competitive products. At enterprise scale, a 2-layer Quantum switch topology connects 3,200 servers, 2.8 times more. At hyperscale, a 3-layer Quantum switch topology connects 128,000 servers, 4.6 times more than competitive products.
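The three server counts follow directly from the switch radix. As an illustrative check using standard folded-Clos (fat-tree) arithmetic, not a Mellanox formula: an r-port switch run at 100Gb/s supports r servers standalone, and a non-blocking t-tier fat tree built from such switches supports 2·(r/2)^t servers.

```python
def fat_tree_servers(radix, tiers):
    # Non-blocking folded Clos: each switch splits its ports half down,
    # half up, except the single-switch case, which uses all ports down.
    if tiers == 1:
        return radix
    return 2 * (radix // 2) ** tiers

for t in (1, 2, 3):
    print(t, fat_tree_servers(80, t))
# 1 80      -> one switch in 80-port 100Gb/s mode
# 2 3200    -> the enterprise-scale figure
# 3 128000  -> the hyperscale figure
```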

“The HDR 200G Quantum switch platforms will enable the highest scalability, all the while dramatically reducing data center capital and operational expenses,” said Gilad Shainer, vice president of marketing at Mellanox. “Quantum will enable the next generation of high-performance computing, deep learning, big data, cloud and storage platforms to deliver the highest performance and set a clear path to exascale computing.”

“Mellanox Quantum more than doubles the number of compute nodes per InfiniBand leaf switch, which supports the industry-leading physical density of the Penguin Tundra ES platform,” said Jussi Kukkonen, vice president, Advanced Solutions, Penguin Computing, Inc. “In early 2018, Penguin will bring to market the first systems featuring a true PCI-Express generation 4 I/O subsystem, unlocking the full 200Gbps performance potential of the Mellanox Quantum and InfiniBand HDR.”

Mellanox also announced that Atos, a European services provider focused on digital transformation, big data, cybersecurity and HPC, will incorporate the HDR (200G) and HDR100 (100G) InfiniBand solutions in Atos’s BullSequana X1000 open server supercomputer platform.

Mellanox said the new Quantum switches will start shipping in the first half of 2018 and they will be demonstrated at SC17.

Mellanox also announced the first major production deployment of a 100Gb/s Ethernet Spectrum switch based on the Linux Switchdev driver to support the content distribution network service of NGENIX, a subsidiary of Rostelecom, a leading Russian telecom provider. Mellanox said this is the first major deployment of an open Ethernet switch based on the Switchdev (a common API for swapping Ethernet switches in and out of networks) driver that has been accepted and is available as open source as part of the Linux kernel.

The Switchdev driver runs as part of the standard kernel, and thus enables downstream Linux OS distributions and off-the-shelf Linux-based applications to operate the switch, the company said. The driver abstracts proprietary ASIC application programming interfaces (APIs) with standard Linux APIs for the switch data plane configuration. The key advantage of Switchdev for network administrators and software developers is an open source driver that doesn’t rely on any vendor-specific binary packages, with a well-known, well-documented and open data plane abstraction that is native to Linux.

Mellanox said the combination of Spectrum switch systems running an open, standard Linux distribution provides NGENIX with unified Linux interfaces across datacenter entities, servers and switches, with no compromise on performance.

“We were looking for a truly open solution to power our next generation 100Gb Ethernet network,” said Dmitry Krikov, CTO at NGENIX. “The choice was clear. Not only was the Mellanox Spectrum-based switch the only truly open, Linux kernel-based solution, but also allows us to use a single infrastructure to manage, authorize and monitor our entire network. In addition, it’s proving to be very cost-effective in terms of price-performance.”

Mellanox said there is strong market demand among web companies and network operators for a common API to swap Ethernet switches in and out of networks as easily as a new server. “As a pioneer in network disaggregation, Mellanox Technologies has been a major contributor to enabling the infrastructure of the open source Switchdev model,” the company said in a prepared statement.

Mellanox said the Linux kernel Switchdev driver is available for all Mellanox Spectrum SN2000 switch systems, as well as Mellanox Spectrum switch ASIC. The SN2000 portfolio is available in a variety of port and speed configurations (10/25/40/50/100GbE), including the SN2100, a high-density, half-width 16-port non-blocking 100GbE switch. The driver will also be available for Spectrum-2, the next generation 6.4Tb/s switch ASIC and the SN3000 switch systems using it. Both are expected to be available in 2018, the company said.

The post Ahead of SC17, Mellanox Launches Scalable 200G Switch Platforms appeared first on HPCwire.

HPC Cloud Startup XTREME Design Gets Series A Funding, Expands to US

Thu, 11/09/2017 - 13:49

TOKYO, Nov. 9, 2017 — Japanese HPC Cloud Startup XTREME Design Inc. (XTREME-D) today announced the completion of a $2.75M financing agreement in Series-A round funding led by World Innovation Lab (WiL). The total funding amount is now $4M, including the previous Pre-A round, and WiL is now the lead investor. XTREME-D will be exhibiting at Supercomputing in Denver, Colorado, November 13–16, where the company will be launching and pre-announcing several new products.

XTREME-D is well known in Japan for architecting technical computing in the cloud. The company develops and sells XTREME DNA, a cloud-based, virtual, supercomputing-on-demand service that provides unattended services ranging from construction of a high-speed analysis system on the cloud to optimized operational monitoring, and tears the system down after use. XTREME DNA was launched to cut costs and make it easy to build HPC clusters in just 10 minutes. Cost savings extend beyond equipment, as engineers with the specialized skills required to construct complex clusters are no longer needed.

XTREME-D’s new funding will be utilized for R&D, market launch in the US, expanding the feature set of the company’s current products, and developing new solutions for launch in H1 next year. Details about upcoming products can be shared in private meetings at XTREME-D’s booth at Supercomputing, where demos of XTREME DNA and the launch of their new Computer Aided Engineering (CAE) template for ChainerMN (the distributed cloud version of the advanced parallel deep learning framework Chainer) can also be viewed.

“The market is expanding rapidly as demand for high-speed analytical systems for the Internet of Things (IoT), Artificial Intelligence (AI), Deep Learning, and Machine Learning grows,” said Masataka Matsumoto, General Partner of WiL. “These systems are needed not just within traditional HPC but also across broader fields, such as smart cities and bioinformatics. XTREME Design provides access to supercomputing resources for companies that didn’t previously have it, and has tremendous growth potential right now within High Performance Technical Computing and beyond.”

XTREME-D is capitalizing on this exciting market environment with overseas business expansion to both North America and EMEA. Vitec Electronics Americas of San Diego, California, is XTREME-D’s first US reseller, utilizing XTREME-D to provide a range of products, including access to the world’s fastest-class GPU instances from SkyScale® (provided by One Stop Systems) running on Microsoft Azure. Partnership with WiL, which has offices in both Japan and the United States, will help XTREME-D make a full-scale entry into the North American market.

German-based ViMOS Technologies is XTREME-D’s first European reseller, and the company is interested in signing up additional ISVs and systems integrators across the US and EMEA. “We have a very strong offering that allows resellers to provide turnkey cloud-based virtual supercomputers to their customers, including software, middleware, and system configuration,” said Naoki Shibata, Founder and CEO of XTREME Design. “Being able to access a virtual supercomputer for minimal budget and no need of a specialized skill set to configure the system is a compelling sales pitch.”

Visit XTREME Design at booth 1485 at SC17 in Denver, Colorado from November 13–16 for an eyes-only sneak peek at next-generation products for the democratization of HPC.

About XTREME Design

XTREME Design Inc. was established in 2015 and is headquartered in Shinagawa-ku, Tokyo. The company has one goal — the democratization of supercomputing. Its cloud-based, virtual, supercomputing-on-demand service XTREME DNA makes HPC resources available to everyone, delivering an easy-to-use customer experience through a robust UI/UX and cloud management features. XTREME DNA delivers high-end compute capabilities supporting private, public, and hybrid cloud, featuring the latest CPUs, GPUs, and interconnect options. Applications include CAE, machine learning, deep learning, high performance data analysis, and IoT. For more information visit http://xd-lab.net/en.

About World Innovation Labs

World Innovation Labs LLC (WiL) connects entrepreneurs with corporate resources to build global businesses. WiL Fund II, LP is a pooled venture investment development fund managed by WiL and headquartered in Palo Alto, California. The fund specializes in seed investments in technology, media, telecom, and technical services in the United States and Japan, and is engaged in fostering new business through collaboration with large companies. Through this collaboration WiL seeks to develop activities that accelerate open innovation and disseminate entrepreneurial spirit. For more information visit www.wilab.com.

Source: XTREME Design

The post HPC Cloud Startup XTREME Design Gets Series A Funding, Expands to US appeared first on HPCwire.

FileCatalyst to Attend SuperComputing 2017 in Denver Colorado

Thu, 11/09/2017 - 12:13

OTTAWA, Ontario, Nov. 9, 2017 — FileCatalyst, an Emmy award-winning pioneer in managed file transfers and a world-leading accelerated file transfer solution, will exhibit at SuperComputing 2017 (SC17) in booth 2255 from November 13-16 at the Colorado Convention Center, Denver, Colorado. FileCatalyst will be showcasing all of the latest advancements made to their suite of accelerated file transfer solutions including:

  • Newly updated Graphical User Interfaces (GUIs) that work with any modern web browser.
  • New consumption-based billing, which will be available in per-hour and per-GB-transferred models.
  • The FileCatalyst TransferAgent client can now run on Linux, Windows, and OSX as a service, with the addition of a two-way file transfer pane.
  • FileCatalyst Direct Server now has extended and improved web-based administration for HotFolder client and Server.
  • FileCatalyst Central now allows users to configure personalized map views of their deployment (either through geographical or functional maps), real-time and historical transfer data for all nodes, node-to-node-transfers, and TransferAgent client support.
  • FileCatalyst Workflow has integrated TransferAgent for file areas, a video file preview feature, and embeddable upload forms.

“FileCatalyst is entering an exciting period,” says Chris Bailey, Co-Founder and CEO of FileCatalyst. “Our portfolio has not only seen an update to the GUIs, but we are happy to include some new features that our customers have requested. This year has also seen growth within our ISV ecosystem, as well as our channel partner portfolio. We are thrilled that people are seeking out FileCatalyst, and we are excited to showcase all of our offerings at SC17.”

FileCatalyst has also developed some ISV partnerships that include:

  • Acembly and FileCatalyst have partnered to create Acembly File Accelerator (AFA), Powered by FileCatalyst, which accelerates file transfers to and from the cloud. The solution also includes a new consumption (per GB) based model.
  • FileCatalyst has integrated with Caringo Swarm to accelerate the transmission of digital assets, allowing Caringo to deliver an even better end-user experience with reduced complexity and costs. Caringo and FileCatalyst will be doing a draw for a $300 Amazon gift card during the conference. Visit the Caringo booth (1001) and FileCatalyst (2255) for a chance to win.
  • NICE Software, a pioneer in technical and engineering cloud solutions, will be providing demos of their HPC Portal, EnginFrame, as well as FileCatalyst Direct running on AWS in booth 2117.

For those attending SC17 that want to learn more, FileCatalyst will be in Booth 2255 from November 13-16. They will be showcasing their entire suite of products, as well as giving live demos on the tradeshow floor.

About FileCatalyst

Located in Ottawa, Canada, FileCatalyst is a pioneer in managed file transfers and an Emmy award-winning leader in accelerated file transfer solutions. The company, founded in 2000, has more than one thousand customers in media & entertainment, energy & mining, gaming, and printing, including many Fortune 500 companies as well as military and government organizations. FileCatalyst is a software platform designed to accelerate and manage file transfers securely and reliably. FileCatalyst is immune to the effects that latency and packet loss have on traditional file transfer methods like FTP, HTTP, or CIFS. Global organizations use FileCatalyst to solve issues related to file transfer, including content distribution, file sharing, and offsite backups.

Source: FileCatalyst

The post FileCatalyst to Attend SuperComputing 2017 in Denver Colorado appeared first on HPCwire.

The Hair-Raising Potential of Exascale Animation

Thu, 11/09/2017 - 12:05

Nov. 9, 2017 — There is no questioning the power of a full head of shiny, buoyant hair. Not in real life, not in commercials, and, it turns out, not in computer-generated (CG) animation. Just as more expensive brands of shampoos provide volume, luster, and flow to a human head of hair, so too does more expensive computational power provide the waggle of a prince’s mane or raise the hackles of an evil yak.

Hair proves to be one of the most complex assets in animation, as each strand is composed of a near-infinite number of individual particles, affecting the way every other strand behaves. With the 2016 release of their feature Trolls, DreamWorks Animation had an entire ensemble of characters with hair as a primary feature. The studio will raise the bar again with the film’s sequel, slated for 2020.
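Production hair solvers are vastly more sophisticated than anything that fits here, but the coupling the article describes can be sketched with a toy mass-spring strand: a chain of point masses joined by springs, where every particle's motion feeds back into its neighbours. All names and parameters below are invented for illustration; this is not DreamWorks' method.

```python
# Toy mass-spring "hair strand" (illustrative only). Particle 0 is the
# pinned root; each spring couples two adjacent particles, so motion
# propagates along the strand.

DT = 0.005         # timestep in seconds
GRAVITY = -9.8     # m/s^2 on the y axis
STIFFNESS = 200.0  # spring constant (N/m), chosen for stability at DT
REST_LEN = 0.1     # rest length between adjacent particles (m)
MASS = 0.01        # per-particle mass (kg)
DAMPING = 0.98     # crude per-step velocity damping

def step(pos, vel):
    """Advance one semi-implicit Euler step; particle 0 stays pinned."""
    n = len(pos)
    force = [[0.0, MASS * GRAVITY] for _ in range(n)]  # gravity on all
    for i in range(n - 1):  # spring between particle i and i+1
        dx = pos[i + 1][0] - pos[i][0]
        dy = pos[i + 1][1] - pos[i][1]
        dist = (dx * dx + dy * dy) ** 0.5 or 1e-9
        f = STIFFNESS * (dist - REST_LEN)  # Hooke's law along the segment
        fx, fy = f * dx / dist, f * dy / dist
        force[i][0] += fx
        force[i][1] += fy
        force[i + 1][0] -= fx
        force[i + 1][1] -= fy
    for i in range(1, n):  # skip the pinned root
        vel[i][0] = (vel[i][0] + DT * force[i][0] / MASS) * DAMPING
        vel[i][1] = (vel[i][1] + DT * force[i][1] / MASS) * DAMPING
        pos[i][0] += DT * vel[i][0]
        pos[i][1] += DT * vel[i][1]
    return pos, vel

# A five-particle strand starting horizontal, rooted at the origin.
strand = [[i * REST_LEN, 0.0] for i in range(5)]
velocity = [[0.0, 0.0] for _ in range(5)]
for _ in range(200):  # simulate one second of damped swinging
    strand, velocity = step(strand, velocity)
# By now the free end has sagged below the pinned root.
```

Even in this sketch the cost grows with the number of springs per strand; a feature film multiplies that by millions of strands, self-collision, and tens of thousands of frames, which is where the appetite for ever-larger machines comes from.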

The history of DreamWorks Animation is, in many ways, the history of technical advances in computing over the last three decades. Those milestones are evidenced by that flow of hair—or lack thereof—the ripple in a dragon’s leathery wing, or the texture and number of environments in any given film.

Exascale computing will push the Media and Entertainment industry beyond today’s technical barriers.

As the development and accessibility of high-performance computers explode beyond current limits, so too will the creative possibilities of CG animation expand.

Jeff Wike, Chief Technology Officer (CTO) of DreamWorks Animation, has seen many of the company’s innovations come and go, and fully appreciates both the obstacles and the potential of technological advances on his industry.

“Even today, technology limits what our artists can create,” says Wike. “They always want to up the game, and with the massive amount of technology that we throw at these films, the stakes are enormous.”

Along with his duties as CTO, Wike is a member of the U.S. Department of Energy’s Exascale Computing Project (ECP) Industry Council. The advisory council is comprised of an eclectic group of industry leaders reliant on and looking to the future of high-performance computing, now hurtling toward the exascale frontier.

The ability to perform a billion billion operations per second changes the manufacturing and services landscape for many types of industries. And, as Wike will tell you, strip away the creative process and those in the animation industry are manufacturers of digital products.

“This is bigger than any one company or any one industry,” he says. “As a member of the ECP’s Industry Council, we share a common interest and goal with companies representing a diverse group of U.S. industries anxiously anticipating the era of exascale computing.”

Such capability could open a speed-of-light gap between DreamWorks’ current 3D animation and the studio’s origins, 23 years ago, as a 2D animation company producing computer-aided hand-drawn images.

Growing CG animation

Wike’s role has certainly evolved since he joined DreamWorks in 1997, with the distinctive job title of technical gunslinger, a position in which he served, he says, as part inventor, part MacGyver, and part tech support.

When Chris deFaria joined DreamWorks Animation as president in March 2017, he instantly identified an untapped opportunity that could only be pursued at a studio where storytellers and technology innovators work in close proximity. He created a collaboration between these two areas in which the artists’ infinite imaginations drive cutting-edge technology innovations which, in turn, drive the engineers to imagine even bigger. In essence, a perpetual motion machine of innovation and efficiency.

Under this new reign, Wike distills his broader role into three simple goals: make sure employees have what they need, reduce the cost and production time of films, and continue to innovate in those areas that are transformational.

High-Performance Computing Is Key to Innovation

For DreamWorks—and other large industry players like Disney and Pixar—the transformation of the animated landscape is, and has been, driven by innovations in computer software and hardware.

Much of the CG animation industry was built on the backs of what were, in the late 1990s, fairly high-performance graphics-enabled processors. But computer technology advanced so quickly that DreamWorks was challenged to keep up with the latest and greatest.

“Some of the animators had home computers that were faster than what we had at work,” Wike recalls.

By the time Shrek appeared in 2001, after the early successes of DreamWorks’ first fully CG animated feature, Antz, and Pixar’s Toy Story, it was clear to the fledgling industry, and the movie industry as a whole, that CG animation was the next big wave. Audiences, too, already were expecting higher quality, more complexity and greater diversification with each succeeding film.

To meet mounting expectations, the industry needed a computational overhaul to afford it more power and greater consistency. As the early graphics processors faced more competition, the industry banded together to agree on common requirements, such as commodity hardware, open source libraries, and codes. This developed into an approved list that is easier for vendors to support.

Today, DreamWorks’ artists are using high-end dual processor, 32-core workstations with network-attached storage and HPE Gen9 servers utilizing 22,000 cores in the company’s data center. That number is expected to nearly double soon, as the company has now ramped up for production of How to Train Your Dragon 3.

It’s still a long way from exascale. It’s still a long way from petascale, for that matter, compared with current petascale computers that comprise upwards of 750,000 cores. But the industry continues to push the envelope of what’s possible and what is available. Continuous upgrades in hardware, along with retooling and development of software, create ever-more astounding visuals and further prepare the industry for the next massive leap in computing power.

“I’d be naïve to say that we’re ready for exascale, but we’re certainly mindful of it,” says Wike. “That’s one reason we are so interested in what the ECP is doing.  The interaction with the technology stakeholders from a wide variety of industries is invaluable as we try to understand the full implications and benefits of exascale as an innovation driver for our own industry.”

To read more, follow this link: https://www.exascaleproject.org/hair-raising-potential-exascale-animation/

Source: Exascale Computing Project

The post The Hair-Raising Potential of Exascale Animation appeared first on HPCwire.

Gidel Launches New High Performance Line of Acceleration Boards Based on Intel’s Stratix 10 FPGA

Thu, 11/09/2017 - 09:49

SANTA CLARA, California, and Or-Akiva, Israel, Nov. 9, 2017 – Gidel, a technology leader in high-performance accelerators utilizing FPGAs, has launched their latest product line, the Proc10S. The Proc10S is part of the Proc family of high-performance, scalable compute acceleration boards, but is based on the Stratix 10 FPGA, which Intel released in late 2016. The Stratix 10 delivers twice the performance of the Arria 10, with 30% lower power consumption per TFLOP.

The Proc10S pushes data processing power to new heights with peak single precision performance of up to 10 TFLOPS per device, based on 25 MB of L1 cache at up to 94 TB/s peak bandwidth. The board features an Intel Stratix 10SG 2800/2100/1100 FPGA with 16-lane PCI-Express Gen 3.0 and an 18+ GB multi-level memory structure consisting of three banks of DDR4 memory on board and on DIMMs (up to 260 GB of DDR4).

With up to 2.8 million logic elements, the Proc10S gives designers incredible performance potential. It also features flexible high-speed communication ports — dual SFP+ and dual QSFP+ support at 26 Gb/s per channel — and a PHS connector for a high speed daughter board that features eight channels of full duplex Tx/Rx and up to 139 Gb/s total.

Gidel’s newest acceleration board was designed with high density Big Data and HPC applications in mind. “The Proc10S is a heavy-duty FPGA and thus opens new markets in HPC for Gidel, such as Deep Learning and Big Data analytics,” says Ofer Pravda, VP Marketing and Sales at Gidel. “Gidel’s long history in algorithm acceleration utilizing FPGA technology has resulted in an enormous wealth of product knowledge that provides us with an advantage in certain HPC and Vision arenas.”

Artificial Intelligence and Deep Learning are ideal markets for the Proc10S because features need to be extracted from data in order to solve predictive problems, such as image classification and detection, image recognition and tagging, network intrusion detection, and fraud/face detection. Other applications include compute-intensive algorithm processing, network analytics, communications, cyber security, storage, big data analytics, and cloud computing.

The Proc10S is supported by the ProcDeveloper’s Kit™, Gidel’s proprietary tools that make developing on FPGA fast and easy, and allow for simultaneous acceleration of multiple applications or processes, unmatched HDL design productivity (VHDL or Verilog), and simple integration with software applications. Gidel’s tools make developing on FPGA accessible to software engineers by automatically generating an Application Support Package (ASP) and an API that maps the relevant user’s variables directly into the FPGA design. The tools offer a solution that is unique in the market, and together with Intel’s HLS and OpenCL allow unmatched development efficiency and effectiveness.

The Proc10S 2100/2800 will be available in Q1 2018; additional Proc10S accelerators will be released later next year.

Visit Gidel in booth 1242 at SC17 in Denver, Colorado (Nov 13-16) to explore the Proc10S board and view demos on acceleration applications.

About Gidel

For 25 years, Gidel has been a technology leader in high-performance, innovative, FPGA-based accelerators. Gidel’s reconfigurable platforms and development tools have been used for optimal application tailoring and for reducing the time and cost of project development. Gidel’s dedicated support and its products’ performance, ease-of-use, and long life cycles have been well appreciated by satisfied customers in diverse markets who continuously use Gidel’s products, generation after generation. For more information visit www.gidel.com.

Source: Gidel

The post Gidel Launches New High Performance Line of Acceleration Boards Based on Intel’s Stratix 10 FPGA appeared first on HPCwire.

The HPC Storage Cocktail, Both Shaking and Stirring SC17

Thu, 11/09/2017 - 09:35

I’ve no doubt that familiar themes will be circulating the halls of Supercomputing in Denver, echoes of last year’s show – how to survive in the post-Moore’s Law era, the race to exascale, how to access quantum computing. But this year I think there will be another overarching theme added to coffee queue chat: how to cope with the new norm, the HPC storage cocktail.

I’m referring to a practice that more and more people are considering: mixing different environments as well as on-prem and cloud platforms to make storage spend go as far as possible. As the new architecture from ARM gains traction and more and more people look to cloud platforms to boost their on-premise clusters, there’s no doubt that the question of how to make these systems work together effectively will be on people’s lips.

Mixing storage systems can throw up real problems, or uncover problems that until that point have been hidden. For example, moving to a new environment can expose I/O problems that weren’t there before, including bad I/O patterns such as small reads and writes that can look like CPU activity until the I/O is profiled. An organisation won’t be able to feel the benefit that investment in a new storage system should bring unless the bridge between the existing system and the new one can be fully understood.
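As a rough illustration of the kind of profiling described above, transfer sizes can be pulled out of syscall-trace output and flagged when they fall below a page. The log lines and the 4 KiB threshold below are hypothetical examples; real profiling tools (including Ellexus') operate at far larger scale and with far more nuance.

```python
# Sketch: flag "small I/O" by extracting byte counts from strace-style
# read()/write() lines. The return value at the end of each line is the
# number of bytes actually transferred.
import re

SMALL_IO_BYTES = 4096  # treat transfers under one page as "small"

def transfer_sizes(strace_lines):
    """Return the byte counts from read/write lines of an strace log."""
    sizes = []
    for line in strace_lines:
        m = re.match(r'(read|write)\(\d+,.*\)\s*=\s*(\d+)', line)
        if m:
            sizes.append(int(m.group(2)))
    return sizes

# Hypothetical log fragment: one healthy 128 KiB read, two tiny transfers.
sample_log = [
    'read(3, "..."..., 131072) = 131072',
    'read(3, "..."..., 64) = 64',
    'write(4, "..."..., 16) = 16',
    'close(3) = 0',
]
sizes = transfer_sizes(sample_log)
small = [s for s in sizes if s < SMALL_IO_BYTES]
print(f"{len(small)}/{len(sizes)} transfers under {SMALL_IO_BYTES} bytes")
```

A workload dominated by such tiny transfers can burn time in syscall overhead and latency while looking CPU-bound from the outside, which is exactly why profiling before and after a storage migration matters.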

At SC, this issue will certainly be addressed and there will be the usual rainbow of storage solutions and add-on technologies to help. Our team are looking forward to learning about new solutions that are emerging to help organisations manage mixed systems. It’s still early days, and few organisations have fully adopted this type of environment, but we’ve already spoken to a lot of people who are testing the water with hybrid cloud environments.

At this stage most organisations we work with are selecting specific projects to migrate to the cloud and thinking about new storage architectures that they can exploit with that move. Object storage has a set-up cost, but with potentially good long-term cost savings I expect that a lot of vendors will be pushing that for on-prem deployments as well.

Containerization is another flavour to add to the mix. Most people are looking at Docker or Singularity as the two main options sitting on top of various platforms such as OpenStack or Kubernetes. While Singularity is little known outside the HPC community, from a high level it seems to better support some of the data demands of HPC applications, but obviously doesn’t have such a developed ecosystem around it as Docker. This year’s SC might be the year that more make the leap to deploy it in production and see how it measures up.

Another trend I believe will be that we will see far more people treading the halls of SC who might not have been there in previous years. Big data and the growth of AI mean that more and more industries are looking to what has been considered HPC storage to provide the big compute that they need to run their applications and programs.

These trends all feed into each other. The presence of these newcomers with their different views on hardware and software is no doubt speeding up the growth of cloud platforms in the traditional HPC storage market, which is no bad thing. We could all do with having our viewpoints shaken up.

In general, we are heading into an era with more variety, more competitive platforms, serving a greater and more diverse range of customers. This could well be the most exciting SC yet as just a few of the opportunities that this cocktail presents start to become apparent.

About the Author

Dr. Rosemary Francis is CEO and founder of Ellexus, the I/O profiling company. Ellexus makes application profiling and monitoring tools that can be run on a live compute cluster to protect from rogue jobs and noisy neighbors, make cloud migration easy and allow a cluster to be scaled rapidly. The system- and storage-agnostic tools provide end-to-end visibility into exactly what applications and users are up to.

The post The HPC Storage Cocktail, Both Shaking and Stirring SC17 appeared first on HPCwire.

Pages