HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Fujitsu Develops WAN Acceleration Technology Utilizing FPGA Accelerators

Mon, 12/11/2017 - 10:37

TOKYO, Dec. 11, 2017 — Fujitsu Laboratories Ltd. today announced the development of WAN acceleration technology that can deliver transfer speeds up to 40Gbps for migration of large volumes of data between clouds, using servers equipped with field-programmable gate arrays (FPGAs).

Connections in wide area networks (WANs) between clouds are moving from 1Gbps lines to 10Gbps lines, but with the recent advance of digital technology, including IoT and AI, there is even greater demand for high-speed data transfers as huge volumes of data are collected in the cloud. Until now, the effective transfer speed of WAN connections has been raised using techniques that reduce the volume of data, such as compression and deduplication. With 10Gbps WAN lines, however, the volume of data to be processed is enormous, and existing WAN acceleration technologies usable in cloud servers have not been able to sufficiently raise the effective transfer rate.

Fujitsu Laboratories has now developed WAN acceleration technology capable of real-time operation even at speeds of 10Gbps or higher. The technology works by mounting dedicated computational units, each specialized for a processing task such as feature value calculation or compression, onto an FPGA installed in a server, and by running those units in a highly parallel fashion, supplying data to each at the appropriate time based on the predicted completion of each computation.

In a test environment where this technology was deployed on FPGA-equipped servers connected by 10Gbps lines, Fujitsu Laboratories confirmed effective transfer rates of up to 40Gbps, the highest performance in the industry. The technology makes it possible to transfer data at high speed between clouds, for uses such as data sharing and backups, enabling the creation of next-generation cloud services that share and utilize large volumes of data across a variety of companies and locations.
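The 40Gbps figure is an effective rate, not a raw line rate: if compression and deduplication together shrink the transferred data by roughly 4:1 (a ratio implied by the numbers above, not stated by Fujitsu), a 10Gbps physical line carries the equivalent of 40Gbps of original data. A back-of-the-envelope sketch:

```python
# Effective WAN transfer rate from data reduction (illustrative numbers).
line_rate_gbps = 10        # physical WAN line speed
reduction_ratio = 4        # assumed combined compression + dedup ratio (not from the source)
effective_gbps = line_rate_gbps * reduction_ratio
# effective_gbps matches the "up to 40Gbps" headline figure
```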

Fujitsu Laboratories aims to deploy this technology, capable of use in cloud environments, as an application loaded on an FPGA-equipped server. It is continuing evaluations in practical environments with the goal of commercializing this technology during fiscal 2018.

Fujitsu Laboratories will announce details of this technology at the 2017 International Conference on Field-Programmable Technology (FPT 2017), an international conference to be held in Melbourne, Australia on December 11-13.

Development Background

As the cloud has grown in recent years, there has been a movement to increase data and server management and maintenance efficiency by migrating data (i.e., internal documents, design data, and email) that had been managed on internal servers to the cloud. In addition, as shown by the spread in the use of digital technology such as IoT and AI, there are high expectations for the ways that work and business will be transformed by the analysis and use of large volumes of data, including camera images from factories and other on-site locations, and log data from devices. Given this, there has been explosive growth in the volume of data passing through WAN lines between clouds, spurring a need for next-generation WAN acceleration technology capable of huge data transfers at high speed between clouds.


WAN acceleration technologies improve effective transfer speeds by reducing the volume of data through compression or deduplication of the data to be transferred. When transferring data at even higher speeds over 10Gbps network lines, however, the volume of data to be processed is so great that the server's compression and deduplication processing speed becomes the bottleneck. Achieving real-time operation therefore requires either CPUs that operate at higher speeds or WAN acceleration technology with faster processing.
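The compression-plus-deduplication approach described above can be illustrated with a minimal software sketch. The fixed 4KB chunk size, SHA-256 fingerprints, and zlib compression here are illustrative choices, not Fujitsu's implementation:

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # illustrative fixed-size chunks

def deduplicate_and_compress(data: bytes, seen: dict) -> list:
    """Split data into chunks; emit a short reference for chunks already
    sent, and a compressed payload for chunks seen for the first time."""
    out = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            out.append(("ref", digest))                    # duplicate: hash only
        else:
            seen[digest] = chunk
            out.append(("data", digest, zlib.compress(chunk)))
    return out

seen: dict = {}
payload = b"A" * 8192 + b"B" * 4096
msgs = deduplicate_and_compress(payload, seen)    # second A-chunk becomes a ref
msgs2 = deduplicate_and_compress(payload, seen)   # a repeat transfer is all refs
```

A repeated backup of mostly unchanged data, as in the tests described below, is the best case for this scheme: nearly everything collapses to references.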

About the Newly Developed Technology

Fujitsu Laboratories has now developed WAN acceleration technology that achieves real-time operation usable in the cloud even at speeds of 10Gbps or more, using server-mounted FPGAs as accelerators. Efficient operation is achieved by offloading to the FPGA those portions of the compression and deduplication processing that are computationally heavy and difficult to speed up on the CPU, and by efficiently connecting the CPU with the FPGA accelerator. Details of the technology are as follows.

1. FPGA parallelization technology using highly parallel dedicated computational units

Fujitsu Laboratories has developed FPGA parallelization technology that significantly reduces the processing time required for data compression and deduplication. It deploys dedicated computational units, specialized for data partitioning, feature value calculation, and lossless compression, in a highly parallel configuration within an FPGA, and keeps those units operating in parallel by delivering data to each at the appropriate time based on predictions of when each calculation will complete.
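The "data partitioning and feature value calculation" step likely resembles content-defined chunking, where a running feature value over the byte stream decides where chunk boundaries fall. The sketch below is a minimal software illustration with invented parameters (mask, window, hash), not Fujitsu's algorithm:

```python
def rolling_chunks(data: bytes, mask: int = 0x3FF, window: int = 16):
    """Content-defined chunking: cut wherever a simple running hash of the
    current chunk matches a boundary pattern. Because the hash restarts at
    each cut, boundaries downstream of an edit can realign, which is what
    makes duplicate chunks detectable across versions of a file."""
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = (h * 31 + b) & 0xFFFFFFFF          # feature value over current chunk
        if i - start >= window and (h & mask) == mask:
            yield data[start:i + 1]            # boundary hit: emit chunk
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]                     # trailing partial chunk
```

On an FPGA, many such units would scan different regions of the stream simultaneously; in software the loop is inherently serial, which is exactly the bottleneck the hardware design removes.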

2. Technology to optimize the flow of processing between CPU and FPGA

Previously, in determining whether to apply lossless compression to data based on the identification of duplication in that data, it was necessary to read the data twice, both before and after the duplication identification was executed on the FPGA, increasing overhead and preventing the system from delivering sufficient performance. Now the processing handoff is consolidated onto the FPGA: both the preprocessing for duplication identification and the compression processing run on the FPGA, and a processing sequence controls how the compression results are reflected on the CPU based on the results of the duplication identification. This reduces the overhead between the CPU and FPGA from reloading input data and exchanging control messages, cutting the waiting time caused by the handoff of data and control and delivering efficient coordinated operation of the CPU and the FPGA accelerator.
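As a software analogue of this coordination, the host can stream each block to the accelerator exactly once and consume results asynchronously, rather than reading the data again after duplicate detection. In the sketch below a thread stands in for the FPGA, and the single-pass design and all names are illustrative assumptions, not Fujitsu's implementation:

```python
import hashlib
import queue
import threading
import zlib

def accelerator(in_q, out_q):
    """Stand-in for the FPGA: performs duplicate detection AND compression
    in one pass, so the host never has to re-read the input data."""
    seen = set()
    while True:
        block = in_q.get()
        if block is None:                      # end-of-stream sentinel
            out_q.put(None)
            return
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            out_q.put(("ref", digest))         # duplicate: reference only
        else:
            seen.add(digest)
            out_q.put(("data", digest, zlib.compress(block)))

in_q, out_q = queue.Queue(maxsize=4), queue.Queue()
t = threading.Thread(target=accelerator, args=(in_q, out_q))
t.start()
for block in (b"x" * 1024, b"x" * 1024, b"y" * 1024):
    in_q.put(block)                            # host streams each block once...
in_q.put(None)
results = []
while (r := out_q.get()) is not None:
    results.append(r)                          # ...and consumes results as they arrive
t.join()
```

The bounded input queue is the double-buffering analogue: the host keeps the accelerator fed while it works, instead of alternating full round trips of data and control.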


Fujitsu Laboratories deployed this newly developed technology in servers installed with FPGAs, confirming acceleration approximately thirty times the performance of CPU processing alone. Fujitsu Laboratories evaluated the transfer speed for a high volume of data in a test environment where the servers were connected with 10Gbps connections, and in a test simulating the regular backup of data, including documents and video, confirmed that this technology achieved transfer speeds up to 40Gbps, an industry record. This technology has significantly improved data transfer efficiency over WAN connections, enabling high-speed data transfers between clouds, such as data sharing and backups, making possible the creation of next-generation cloud services that share and use large volumes of data between a variety of companies and locations.

Future Plans

Fujitsu Laboratories will continue to evaluate this technology in practical environments, deploying this technology in virtual appliances that can be used in cloud environments. Fujitsu Laboratories aims to make this technology available as a product of Fujitsu Limited during fiscal 2018.

About Fujitsu Laboratories

Founded in 1968 as a wholly owned subsidiary of Fujitsu Limited, Fujitsu Laboratories Ltd. is one of the premier research centers in the world. With a global network of laboratories in Japan, China, the United States and Europe, the organization conducts a wide range of basic and applied research in the areas of Next-generation Services, Computer Servers, Networks, Electronic Devices and Advanced Materials. For more information, please see: http://www.fujitsu.com/jp/group/labs/en/.

About Fujitsu Ltd

Fujitsu is a leading Japanese information and communication technology (ICT) company, offering a full range of technology products, solutions, and services. Approximately 155,000 Fujitsu people support customers in more than 100 countries. We use our experience and the power of ICT to shape the future of society with our customers. Fujitsu Limited (TSE: 6702) reported consolidated revenues of 4.5 trillion yen (US$40 billion) for the fiscal year ended March 31, 2017. For more information, please see http://www.fujitsu.com.

Source: Fujitsu Ltd

The post Fujitsu Develops WAN Acceleration Technology Utilizing FPGA Accelerators appeared first on HPCwire.

HPC Iron, Soft, Data, People – It Takes an Ecosystem!

Mon, 12/11/2017 - 09:53

Cutting edge advanced computing hardware (aka big iron) does not stand by itself. These computers are the pinnacle of a myriad of technologies that must be carefully woven together by people to create the computational capabilities that are used to deliver insights into the behaviors of complex systems. This collection of technologies and people has been called the High Performance Computing (HPC) ecosystem. This is an appropriate metaphor because it evokes the complicated nature of the interdependent elements needed to deliver first of a kind computing systems.

The idea of the HPC ecosystem has been around for years and most recently appeared in one of the objectives for the National Strategic Computing Initiative (NSCI). The 4th objective calls for “Increasing the capacity and capability of an enduring national HPC ecosystem.” This leads to the questions: what makes up the HPC ecosystem, and why is it so important? Perhaps the more important question is why the United States needs to be careful about letting its HPC ecosystem diminish.

The heart of the HPC ecosystem is clearly the “big humming boxes” that contain the advanced computing hardware. The rows upon rows of cabinets are the focal point of the electronic components, operating software, and application programs that provide the capabilities that produce the results used to create new scientific and engineering insights that are the real purpose of the HPC ecosystem. However, it is misleading to think that any one computer at any one time is sufficient to make up an ecosystem. Rather, the HPC ecosystem requires a continuous pipeline of computer hardware and software. It is that continuous flow of developing technologies that keeps HPC progressing on the cutting edge.

The hardware element of the pipeline includes systems and components that are under development, but are not currently available. This includes the basic research that will create the scientific discoveries that enable new approaches to computer designs. The ongoing demand for “cutting edge” systems is important to keep system and component designers pushing the performance envelope. The pipeline also includes the currently installed highest performance systems. These are the systems that are being tested and optimized. Every time a system like this is installed, technology surprises are found that must be identified and accommodated. The hardware pipeline also includes systems on the trailing edge. At this point, the computer hardware is quite stable and allows a focus on developing and optimizing modeling and simulation applications.

One of the greatest challenges of maintaining the HPC ecosystem is recognizing that there are significant financial commitments needed to keep the pipeline filled. There are many examples of organizations that believed that buying a single big computer would make them part of the ecosystem. In those cases, they were right, but only temporarily. Being part of the HPC ecosystem requires being committed to buying the next cutting-edge system based on the lessons learned from the last system.

Another critical element of the HPC ecosystem is software. This generally falls into two categories – software needed to operate the computer (also called middleware or the “stack”) and software that provides insights into end user questions (called applications). Middleware plays the critical role of managing the operations of the hardware systems and enabling the execution of applications software. Middleware includes computer operating systems, file systems and network controllers. This type of software also includes compilers that translate application programs into the machine language that will be executed on hardware. There are quite a number of other pieces of middleware software that include libraries of commonly needed functions, programming tools, performance monitors, and debuggers.

Applications software spans a wide range and is as varied as the problems users want to address through computation. Some applications are quick “throwaway” (prototype) attempts to explore potential ways in which computers may be used to address a problem. Other applications software is written, sometimes with different solution methods, to simulate the physical behaviors of complex systems. This software will sometimes last for decades and be progressively improved. An important aspect of these types of applications is the experimental validation data that provide confidence that the results can be trusted. For this type of applications software, setting up the problem, which can include finite element mesh generation, populating that mesh with material properties, and launching the execution, is an important part of the ecosystem. Other elements of usability of application software include the computers, software, and displays that allow users to visualize and explore simulation results.

Data is yet another essential element of the HPC ecosystem: it is the lifeblood that flows through the system to keep it doing useful things. The HPC ecosystem includes systems that hold data and move it from one element to another. Hardware aspects of the data system include memory, storage devices, and networking; software device drivers and file systems are also needed to keep track of the data. With the growing trend to add machine learning and artificial intelligence to the HPC ecosystem, its ability to process and productively use data is becoming increasingly significant.

Finally, and most importantly, trained and highly skilled people are an essential part of the HPC ecosystem. Just like computing systems, these people make up a “pipeline” that starts in elementary school and continues through undergraduate and then advanced degrees. Attracting and educating these people in computing technologies is critical. Another important part of the people pipeline of the HPC ecosystem is the jobs offered by academia, national labs, government, and industry. These professional experiences provide the opportunities needed to practice and hone HPC skills.

The origins of the United States’ HPC ecosystem date back to the decision by the U.S. Army Research Lab to procure an electronic computer to calculate ballistic tables for its artillery during World War II (i.e., ENIAC). That event led to finding and training the people, who in many cases were women, to program and operate the computer. The ENIAC was just the start of the nation’s significant investment in hardware, middleware software, and applications. However, just because the United States was first does not mean that it was alone. Europe and Japan have also had robust HPC ecosystems for years, and most recently China has determinedly set out to create one of its own.

The United States and other countries made the necessary investments in their HPC ecosystems because they understood the strategic advantages that staying at the cutting edge of computing provides. These well-documented advantages apply to many areas, including national security, discovery science, economic competitiveness, energy security, and curing diseases.

The challenge of maintaining the HPC ecosystem is that, just like a natural ecosystem, the HPC version can be threatened by becoming too narrow and lacking diversity. This applies to the hardware, middleware, and applications software. Betting on just a few types of technologies can be disastrous if one approach fails. Diversity also means having and using a healthy range of systems that covers the highest performance cutting edge systems to wide deployment of mid and low-end production systems. Another aspect of diversity is the range of applications that can productively run on advanced computing resources.

Perhaps the greatest challenge to an ecosystem is complacency and assuming that it, and the necessary people, will always be there. This can take the form of an attitude that it is good enough to become an HPC technology follower and acceptable to purchase HPC systems and services from other nations. Once an HPC ecosystem has been lost, it is not clear whether it can be regained. A robust HPC ecosystem can last for decades, through many “half lives” of hardware. A healthy ecosystem puts countries in a leadership position, giving them the ability to influence HPC technologies in ways that best serve their strategic goals. Happily, the 4th NSCI objective signals that the United States understands these challenges and the importance of maintaining a healthy HPC ecosystem.

About the Author

Alex Larzelere is a senior fellow at the U.S. Council on Competitiveness, the president of Larzelere & Associates Consulting and HPCwire’s policy editor. He is currently a technologist, speaker and author on a number of disruptive technologies that include: advanced modeling and simulation; high performance computing; artificial intelligence; the Internet of Things; and additive manufacturing. Alex’s career has included time in federal service (working closely with DOE national labs), private industry, and as founder of a small business. Throughout that time, he led programs that implemented the use of cutting edge advanced computing technologies to enable high resolution, multi-physics simulations of complex physical systems. Alex is the author of “Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI).”


MareNostrum 4 Chosen as ‘Most Beautiful Data Center’

Mon, 12/11/2017 - 09:28

BARCELONA, Dec. 11, 2017 — The MareNostrum 4 supercomputer has been chosen as the winner of the Most Beautiful Data Center in the World prize, awarded by Datacenter Dynamics (DCD).

There are 15 prizes in different categories, besides the prize for the most beautiful data center, which is decided by popular vote. MareNostrum 4 competed with such impressive facilities as the Switch Pyramid in Michigan, the Bahnhof Pionen in Stockholm and the Norwegian Green Mountain. The BSC supercomputer prevailed thanks to its singular location, inside the chapel of Torre Girona on the North Campus of the Universitat Politècnica de Catalunya (UPC).

The awards ceremony took place on December 7th in London and both Mateo Valero, BSC Director, and Sergi Girona, Operations department Director, received the prize.

About MareNostrum 4

MareNostrum is the generic name used by BSC to refer to the different upgrades of its most emblematic supercomputer, the most powerful in Spain. The first version was installed in 2005, and the fourth version is currently in operation.

MareNostrum 4 began operations last July and, according to the latest edition of the Top500 list, ranks 16th among the highest performing supercomputers. Currently, MareNostrum provides 11.1 petaflops of processing power – that is, the capacity to perform 11.1 x 10^15 operations per second – to scientific production and innovation. This capacity will soon be increased with the installation of new clusters featuring emerging technologies currently being developed in the USA and Japan.

Aside from being the most beautiful, MareNostrum has been dubbed the most interesting supercomputer in the world due to the heterogeneity of the architecture it will include once installation is complete. Its total speed will be 13.7 petaflops. It has 390 terabytes of main memory and the capacity to store 14 petabytes (14 million gigabytes) of data. A high-speed network connects all the components of the supercomputer to one another.

MareNostrum 4 has been funded by the Spanish Government’s Ministry of Economy, Industry and Competitiveness and was awarded by public tender to IBM, which integrated into a single machine its own technologies together with those developed by Lenovo, Intel and Fujitsu.

About Barcelona Supercomputing Center

Barcelona Supercomputing Center (BSC) is the national supercomputing centre in Spain. BSC specialises in High Performance Computing (HPC) and its mission is two-fold: to provide infrastructure and supercomputing services to European scientists, and to generate knowledge and technology to transfer to business and society.

Source: Barcelona Supercomputing Center


PSSC Labs Launches PowerWulf HPC Clusters with Pre-Configured Intel Data Center Blocks

Mon, 12/11/2017 - 08:54

LAKE FOREST, Calif., Dec. 11, 2017 – PSSC Labs, a developer of custom HPC and Big Data computing solutions, today announced its PowerWulf HPC clusters are now available with Intel’s new Xeon Scalable Processors and Intel’s Omni-Path HPC Fabric to deliver the performance needed to tackle cutting edge computing tasks including real-time analytics, virtualized infrastructure and high-performance computing.

PowerWulf clusters are built with Intel’s Data Center Blocks to ensure a truly turnkey solution that addresses customer integration challenges. Today’s customer datacenters require unique server solutions that run complex, business-critical workloads. Intel Data Center Blocks configurations are purpose-built with all-Intel technology, optimized to address the needs of specific market segments. These fully validated blocks deliver the performance, reliability and quality of solutions customers want and can trust to handle their demanding cloud, HPC and business-critical workloads.

PSSC Labs PowerWulf HPC Clusters are available as config-to-order (CTO) to meet the specific needs of a customer. Key features of these solutions include:

  • Pre-configured and fully validated blocks with the latest Intel HPC technology
  • Powered by the Intel Xeon processor Scalable family, delivering an overall performance increase of up to 1.65x compared to the previous generation, and up to 5x on Online Transaction Processing warehouse workloads versus the current install base
  • Operating system options: Red Hat, SUSE, and CentOS Linux
  • Multiple models with different support options
  • Intel Fabric Suite 10.5.1, Lustre 2.10
  • Intel Omni-Path Host Fabric Interface (Intel OP HFI) Adapter 100 Series and FDR/EDR InfiniBand Fabric
  • Intel Datacenter SATA and NVMe Solid State Drives (SSD)

“Intel’s integrated and fully-validated Data Center Blocks enable PSSC Labs to deliver a more efficient, turnkey approach and to reduce time to market, complexity and the costs of system design, validation and integration,” said Alex Lesser, EVP of PSSC Labs. “Partnering with Intel allows us to offer our customers the latest hardware options in our line of custom turn-key PowerWulf HPC clusters for a variety of applications across government, academic and commercial environments.”

PowerWulf HPC clusters also feature PSSC Labs CBeST Cluster Management Toolkit (Complete Beowulf Software Toolkit) to deliver a preconfigured solution with all the necessary hardware, network settings and cluster management software prior to shipping. With its component structure, CBeST is the most flexible cluster management software package available.

Every PowerWulf HPC Cluster includes a three-year unlimited phone/email support package (additional years of support are available), with all support provided by PSSC Labs’ US-based team of engineers. PSSC Labs is an Intel HPC Data Center Specialist and has been a Platinum Provider with Intel since 2009. For more information see http://www.pssclabs.com/solutions/hpc-cluster/


Source: PSSC Labs


Intel® Omni-Path Architecture and Intel® Xeon® Scalable Processor Family Enable Breakthrough Science on 13.7 petaFLOPS MareNostrum 4

Mon, 12/11/2017 - 08:49

In publicly and privately funded computational science research, dollars (or Euros in this case) follow FLOPS. And when you’re one of the leading computing centers in Europe with a reputation around the world of highly reliable, leading edge technology resources, you look for the best in supercomputing in order to continue supporting breakthrough research. Thus, Barcelona Supercomputing Center (BSC) is driven to build leading supercomputing clusters for its research clients in the public and private sectors.

MareNostrum 4 is nestled within the Torre Girona chapel

“We have the privilege of users coming back to us each year to run their projects,” said Sergi Girona, BSC’s Operations Department Director. “They return because we reliably provide the technology and services they need year after year, and because our systems are of the highest level.” Supported by the Spanish and Catalan governments and funded by the Ministry of Economy and Competitiveness with €34 million in 2015, BSC sought to take its MareNostrum 3 system to the next generation of computing capabilities. It specified multiple clusters, both for the general computational needs of ongoing research and for the development of next-generation codes based on emerging supercomputing technologies and tools for the Exascale computing era. It fell to IBM, which partnered with Fujitsu and Lenovo, to design and build MareNostrum 4.

MareNostrum 4 is a multi-cluster system whose main cluster and data storage are interconnected by the Intel® Omni-Path Architecture (Intel® OPA) fabric. A general-purpose compute cluster with 3,456 nodes of the Intel® Xeon® Scalable processor family will provide up to 11.1 petaFLOPS of computational capacity. A smaller cluster delivering up to 0.5 petaFLOPS is built on the Intel® Xeon Phi™ processor 7250. A third small cluster of up to 1.5 petaFLOPS will include POWER9* processors and Nvidia GPUs, and a fourth, made of ARM v8 processors, will provide another 0.5 petaFLOPS of performance. An IBM storage array rounds out the system, and all clusters are interconnected with the storage subsystem. MareNostrum 4 is designed to be twelve times faster than its predecessor.

Spain’s 13.7 petaFLOPS supercomputer contributes to the Partnership for Advanced Computing in Europe (PRACE) and supports the Spanish Supercomputing Network (RES).

“From my point of view,” stated Girona, “Intel had, at the time of the procurement, the best processor for general purpose systems. Intel is very good on specific domains, and they continue to innovate in other domains. That is why we chose Intel processors for the general-purpose cluster and Intel Xeon Phi Processor for one of the emerging technology clusters, on which we can explore new code development.” The system was in production by July 2017 and placed at number 13 in the June 2017 Top500 list and number 16 on the November 2017 list.

“The Barcelona Supercomputing Center team is committed to maximizing MareNostrum in any way we can,” concluded Girona. “But MareNostrum is not about us. Our purpose at BSC is to help others. We are successful when the scientists and engineers using MareNostrum’s computing power get all the data they need to further their discoveries. It is always rewarding to know we help others to further cutting-edge scientific exploration.”

Learn more about Intel HPC resources >


TACC Works with C-DAC, India to Organize Workshop on Software Challenges in Supercomputing

Fri, 12/08/2017 - 11:17

Dec. 8, 2017 — The Texas Advanced Computing Center (TACC) in the U.S. – a world leader in supercomputing – is collaborating with the Centre for Development of Advanced Computing (C-DAC) in India to host a workshop on the “Software Challenges to Exascale Computing (SCEC17)” on December 17th, 2017, from 9 AM to 7 PM at the Hotel Royal Orchid, in Jaipur. The main goal of this workshop is to foster international collaborations in the area of software for the current and next generation supercomputing systems.

At the workshop, exciting talks on advanced software engineering and supercomputing will be delivered by world leaders from the National Science Foundation in the U.S. (https://nsf.gov/), leading academic institutions in India, Japan and the U.S., R&D organizations, and industry. In line with the 2015 “National Strategic Computing Initiative (NSCI)” of the U.S. government, and the “Skill India” campaign of the Government of India, the workshop includes training on using supercomputing resources to solve problems of high societal impact, like earthquake simulation studies and drug discovery efforts. (Additional details on the workshop can be found at: https://scecforum.github.io/)

“I am delighted to collaborate with our colleagues at C-DAC and contribute towards developing a skilled workforce and a strong community in the area of high-level software tools for supercomputing platforms,” said Dr. Ritu Arora, the SCEC17 workshop chair and a scientist at TACC. “Without a concerted effort in this area, it will be hard to lower the adoption barriers to supercomputing and to make it accessible to the masses, especially the non-traditional users of the supercomputers.”

Intel and Nvidia, two key industry players in the supercomputing sector, are generously supporting the workshop. The workshop will provide a forum through which hardware vendors and software developers can communicate with each other and influence the architecture of the next-generation supercomputing systems and the supporting software stack. By fostering cross-disciplinary associations, the workshop will serve as a stepping-stone towards innovations in the future.

About TACC: The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is one of the leading advanced computing research centers in the world. TACC provides comprehensive advanced computing resources and support services to researchers across the USA. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

About C-DAC: Centre for Development of Advanced Computing (C-DAC) is the premier R&D organization of the Ministry of Electronics and Information Technology (MeitY) for carrying out R&D in IT, electronics and associated areas. C-DAC’s different areas originated at different times, many of them emerging as new opportunities were identified.

Source: TACC


NVIDIA Introduces TITAN V

Fri, 12/08/2017 - 11:03

LONG BEACH, Calif., Dec. 8, 2017 — NVIDIA today introduced TITAN V, a powerful GPU for the PC, driven by the NVIDIA Volta GPU architecture.

Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.

“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”

NVIDIA Supercomputing GPU Architecture, Now for the PC

TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.

New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.

Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customized for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilization.

Free AI Software on NVIDIA GPU Cloud

TITAN V’s power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.

Users of TITAN V can gain immediate access to the latest GPU-optimized AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimized deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualization tools and the NVIDIA TensorRT inferencing optimizer.

Immediate Availability

TITAN V is available to purchase today for $2,999 from the NVIDIA store in participating countries.


About NVIDIA

NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots and self-driving cars that can perceive and understand the world. More information at http://nvidianews.nvidia.com/.

Source: NVIDIA

The post NVIDIA Introduces TITAN V GPU appeared first on HPCwire.

Data Vortex Technologies Partners with Providentia Worldwide to Develop Novel Solutions

Thu, 12/07/2017 - 16:18

AUSTIN, December 6 – This November, proprietary network company Data Vortex Technologies formalized a partnership with Providentia Worldwide, LLC. Providentia is a technologies and solutions consulting venture which bridges the gap between traditional HPC and enterprise computing. The company works with Data Vortex and potential partners to develop novel solutions for Data Vortex technologies and to assist with systems integration into new markets. This partnership will leverage the deep experience in enterprise and hyperscale environments of Providentia Worldwide founders, Ryan Quick and Arno Kolster, and merge the unique performance characteristics of the Data Vortex with traditional systems.

“Providentia Worldwide is excited to see the Data Vortex network in new areas which can benefit from their fine-grained, egalitarian efficiencies. Messaging middleware, data and compute intensive appliances, real-time predictive analytics and anomaly detection, and parallel application computing are promising focus areas for Data Vortex Technologies,” says Quick. “Disrupting entrenched enterprise deployment designs is never easy, but when the gains are large enough, the effort is well worth it. Providentia Worldwide sees the potential to dramatically improve performance and capabilities in these areas, causing a sea-change in how fine-grained network problems are solved going forward.”

The senior technical teams of Data Vortex and Providentia are working on demonstrating the capabilities and performance of popular open source applications on the proprietary Data Vortex Network. The goal is to bring unprecedented performance increases to spaces that are often unaffected by traditional advancements in supercomputing. “This is a necessary step for us – the Providentia partnership is adding breadth to the Data Vortex effort,” says Data Vortex President, Carolyn Coke Reed Devany. “Up to this point we have deployed HPC systems, built with commodity servers connected with Data Vortex switches and VICs [Vortex Interconnect Cards], to federal and academic customers. The company is now offering a network solution that will allow customers to connect an array of different devices to address their most challenging data movement needs.”

Source: Data Vortex Technologies; Providentia Worldwide

The post Data Vortex Technologies Partners with Providentia Worldwide to Develop Novel Solutions appeared first on HPCwire.

The Stanford Living Heart Project Wins Prestigious HPC Awards During SC17

Thu, 12/07/2017 - 16:15

Dec. 7 — During SC17, the 30th International Conference for High Performance Computing, Networking, Storage and Analysis, held in Denver in November, UberCloud – on behalf of the Stanford Living Heart Project – received three HPC awards. It started at the preceding Intel HPC Developer Conference, when the Living Heart Project (LHP), presented by Burak Yenier of UberCloud, won a best paper award. During SC17 on Monday, the Stanford LHP team received the HPCwire Editors’ Choice Award for Best Use of HPC in the Cloud. Finally, on Tuesday, the team won the Hyperion (formerly IDC) Award for Innovation Excellence, selected by the Steering Committee of the HPC User Forum.

The Stanford LHP project dealt with simulating cardiac arrhythmia, which can be an undesirable and potentially lethal side effect of drugs. During this condition, the electrical activity of the heart turns chaotic, degrading its pumping function and diminishing the circulation of blood through the body. Some kinds of cardiac arrhythmia, if not treated with a defibrillator, cause death within minutes.

Before a new drug reaches the market, pharmaceutical companies need to check for the risk of inducing arrhythmias. Currently, this process takes years and involves costly animal and human studies. In this project, the Living Matter Laboratory of Stanford University developed a new software tool enabling drug developers to quickly assess the viability of a new compound. This means better and safer drugs reaching the market to improve patients’ lives.

Figure 1: Evolution of the electrical activity for the baseline case (no drug) and after application of the drug Quinidine. The electrical propagation turns chaotic after the drug is applied, showing the high risk of Quinidine to produce arrhythmias.

“The Living Heart Project team, led by researchers from the Living Matter Laboratory at Stanford University, is proud and humbled to have been selected by HPCwire’s editors for the Best Use of HPC in the Cloud, and by the 29 renowned members of the HPC User Forum Steering Committee for the 2017 Hyperion Innovation Excellence Award,” said Wolfgang Gentzsch from The UberCloud. “And we are deeply grateful for all the support from Hewlett Packard Enterprise and Intel (the sponsors), Dassault Systemes SIMULIA (for Abaqus 2017), Advania (providing HPC cloud resources), and the UberCloud tech team for containerizing Abaqus and integrating all software and hardware components into one seamless solution stack.”

Figure 2: Electrocardiograms: tracing for a healthy, baseline case, versus the arrhythmic development after applying the drug Sotalol.

A computational model that can assess the response of new drug compounds rapidly and inexpensively is of great interest to pharmaceutical companies, doctors, and patients. Such a tool will increase the number of successful drugs that reach the market, while decreasing the cost and time to develop them, and thus help hundreds of thousands of patients in the future. However, creating a suitable model requires a multiscale approach that is computationally expensive: the electrical activity of cells is modeled in high detail and resolved simultaneously across the entire heart. Due to the fast dynamics of this problem, the required spatial and temporal resolutions are highly demanding. For more details about the Stanford Living Heart Project, please read the previous HPCwire article HERE.
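To make the multiscale idea concrete, here is a minimal sketch — not the Living Heart Project's actual model, which couples detailed ionic cell models within Abaqus. A simple FitzHugh-Nagumo cell model is solved at every point of a 1-D strand of tissue, coupled by diffusion of the membrane voltage, so that a wave of excitation propagates from a stimulated end. All parameter values are illustrative:

```python
import numpy as np

# 1-D excitable "tissue strand": a FitzHugh-Nagumo cell model at every
# grid point, coupled by diffusion of the membrane voltage v.
# All parameter values are illustrative, not physiological.
N, dx, dt = 200, 0.5, 0.01     # grid points, spacing, time step
D, a, eps = 1.0, 0.1, 0.01     # diffusion, excitation threshold, recovery rate

v = np.zeros(N)                # membrane voltage
w = np.zeros(N)                # slow recovery variable
v[:5] = 1.0                    # stimulate one end of the strand
fired = False

for _ in range(20000):
    lap = (np.roll(v, 1) - 2 * v + np.roll(v, -1)) / dx**2
    lap[0] = lap[-1] = 0.0     # crude no-flux boundaries
    v_new = v + dt * (D * lap + v * (1 - v) * (v - a) - w)
    w += dt * eps * (v - 0.5 * w)
    v = v_new
    if v[N // 2] > 0.5:        # did the wave reach mid-tissue?
        fired = True

print("mid-tissue cell excited:", fired)
```

Scaling this from a 200-point strand to a full 3-D heart at sub-millimeter resolution, with far stiffer ionic models, is what pushes the problem into supercomputing territory.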

About UberCloud

UberCloud is the online Community, Marketplace, and Software Container Factory where engineers, scientists, and their service providers discover, try, and buy ubiquitous high-performance computing power and Software-as-a-Service from cloud resource providers and application software vendors around the world. UberCloud’s unique high-performance software container technology simplifies software packageability and portability, enables ease of access and instant use of engineering SaaS solutions, and maintains scalability across multiple compute nodes. Please visit www.TheUberCloud.com or contact us at www.TheUberCloud.com/help/.

Source: UberCloud

The post The Stanford Living Heart Project Wins Prestigious HPC Awards During SC17 appeared first on HPCwire.

Supermicro Announces Scale-Up SuperServer Certified for SAP HANA

Thu, 12/07/2017 - 12:31

SAN JOSE, Calif., Dec. 7, 2017 — Super Micro Computer, Inc. (NASDAQ: SMCI), a global leader in enterprise computing, storage, networking solutions and green computing technology, and an SAP global technology partner, today announced that its latest 2U 4-socket SuperServer (2049U-TR4), supporting the highest-performance Intel Xeon Scalable processors, maximum memory and all-flash SSD storage, has been certified for operating the SAP HANA platform. SuperServer 2049U-TR4 for SAP HANA supports customers by offering a unique scale-up single-node system based on a well-defined hardware specification designed to meet the most demanding performance requirements of SAP HANA in-memory technology.

“Combining our capabilities in delivering high-performance, high-efficiency server technology, innovation, end-to-end green computing solutions to the data center, and cloud computing with the in-memory computing capabilities of SAP HANA, Supermicro SuperServer 2049U-TR4 for SAP HANA offers customers a pre-assembled, pre-installed, pre-configured, standardized and highly optimized solution for mission-critical database and applications running on SAP HANA,” said Charles Liang, President and CEO of Supermicro. “The SAP HANA certification is a vital addition to our solution portfolio further enabling Supermicro to provision and service innovative new mission-critical solutions for the most demanding enterprise customer requirements.”

Supermicro is collaborating with SAP to bring its rich portfolio of open cloud-scale computing solutions to enterprise customers looking to transition from traditional high-cost proprietary systems to open, cost-optimized, software-defined architectures. To support this collaboration, Supermicro has recently joined the SAP global technology partner program.

SAP HANA combines database, data processing, and application platform capabilities in-memory. The platform provides libraries for predictive, planning, text processing, spatial and business analytics. By providing advanced capabilities, such as predictive text analytics, spatial processing and data virtualization on the same architecture, it further simplifies application development and processing across big-data sources and structures. This makes SAP HANA a highly suitable platform for building and deploying next-generation, real-time applications and analytics.

The new SAP-certified solution complements existing solutions from Supermicro for SAP NetWeaver technology platform and helps support customers’ transition to SAP HANA and SAP S/4HANA. In fact, Supermicro has certified its complete portfolio of server and storage solutions to support the SAP NetWeaver technology platform running on Linux. Designed for enterprises that require the highest operational efficiency and maximum performance, all these Supermicro SuperServer solutions are ready for SAP applications based on the NetWeaver technology platform such as SAP ECC, SAP BW and SAP CRM, either as application or database server in a two- or three-tier SAP configuration.

Supermicro plans to continue expanding its portfolio of SAP HANA certified systems including an 8-socket scale-up solution based on the SuperServer 7089P-TR4 and a 4-socket solution based on its SuperBlade in the first half of 2018.

About Super Micro Computer, Inc. (NASDAQ: SMCI)

Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology is a premier provider of advanced Server Building Block Solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded Systems worldwide.

Source: Super Micro Computer, Inc.

The post Supermicro Announces Scale-Up SuperServer Certified for SAP HANA appeared first on HPCwire.

CMU, PSC and Pitt to Build Brain Data Repository

Thu, 12/07/2017 - 11:12

Dec. 7, 2017 — Researchers with Carnegie Mellon’s Molecular Biosensor and Imaging Center (MBIC), the Pittsburgh Supercomputing Center (PSC) and the University of Pittsburgh’s Center for Biological Imaging (CBI) will help to usher in an era of open data research in neuroscience by building a confocal fluorescence microscopy data repository. The data archive will give researchers easy, searchable access to petabytes of existing data.

The project is funded by a $5 million, five-year grant from the National Institutes of Health’s (NIH’s) National Institute of Mental Health (MH114793) and is part of the federal BRAIN initiative.

“This grant is a testament to the fact that Pittsburgh is a leader in the fields of neuroscience, imaging and computer science,” said Marcel Bruchez, MBIC director, professor of biological sciences and chemistry at Carnegie Mellon and co-principal investigator of the grant. “By merging these disciplines, we will create a tool that helps the entire neuroscience community advance our understanding of the brain at an even faster pace.”

New imaging tools and technologies, like large-volume confocal fluorescence microscopy, have greatly accelerated neuroscience research in the past five years by allowing researchers to image large regions of the brain at such a high level of resolution that they can zoom in to the level of a single neuron or synapse, or zoom out to the level of the whole brain. These images, however, contain such a large amount of data that only a small part of one brain’s worth of data can be accessed at a time using a standard desktop computer. Additionally, images are often collected in different ways — at different resolutions, using different methodologies and different orientations. Comparing and combining data from multiple whole brains and datasets requires the power of supercomputing.

“PSC has a long experience with handling massive datasets for its users, as well as a deep skillset in processing microscopic images with high-performance computing,” said Alex Ropelewski, director of PSC’s Biomedical Applications Group and a co-principal investigator in the NIH grant. “This partnership with MBIC and CBI was a natural step in the ongoing collaborations between the institutions.”

The Pittsburgh-based team will bring together MBIC and CBI’s expertise in cell imaging and microscopy and pair it with the PSC’s long history of experience in biomedical supercomputing to create a system called the Brain Imaging Archive. Researchers will be able to submit their whole brain images, along with metadata about the images, to the archive. There the data will be indexed into a searchable system that can be accessed using the internet. Researchers can search the system to find existing data that will help them narrow down their research targets, making research much more efficient.
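As a toy illustration of the kind of searchable metadata index described above (the field names here are hypothetical, not the Brain Imaging Archive's actual schema):

```python
from dataclasses import dataclass

# Hypothetical metadata records for submitted whole-brain image datasets;
# field names are illustrative, not the Brain Imaging Archive schema.
@dataclass
class ImageRecord:
    dataset_id: str
    species: str
    modality: str
    resolution_um: float   # voxel size in micrometers

archive = [
    ImageRecord("ds-001", "mouse", "confocal", 0.5),
    ImageRecord("ds-002", "mouse", "confocal", 2.0),
    ImageRecord("ds-003", "marmoset", "two-photon", 1.0),
]

def search(records, **criteria):
    """Return records whose fields match every given criterion."""
    return [r for r in records
            if all(getattr(r, k) == v for k, v in criteria.items())]

hits = search(archive, species="mouse", modality="confocal")
print([r.dataset_id for r in hits])  # ['ds-001', 'ds-002']
```

A production archive would back this with a real database and full-text search, but the workflow — submit images plus metadata, index, then query to narrow research targets — is the same.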

About PSC

The Pittsburgh Supercomputing Center is a joint effort of Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC is supported by several federal agencies, the Commonwealth of Pennsylvania and private industry, and is a leading partner in XSEDE (Extreme Science and Engineering Discovery Environment), the National Science Foundation cyberinfrastructure program.

Source: PSC

The post CMU, PSC and Pitt to Build Brain Data Repository appeared first on HPCwire.

System Fabric Works and ThinkParQ Partner for Parallel File System

Thu, 12/07/2017 - 09:59

AUSTIN, Tex. and KAISERSLAUTERN, Germany, Dec. 7, 2017 — Today System Fabric Works (SFW) announced its support and integration of the BeeGFS file system with the latest NetApp E-Series all-flash and HDD storage systems. This makes BeeGFS available on the family of NetApp E-Series hyperscale storage products as part of SFW's converged infrastructure solutions for high-performance enterprise computing, data analytics and machine learning.

“We are pleased to announce our Gold Partner relationship with ThinkParQ,” said Kevin Moran, President and CEO, System Fabric Works. “Together, SFW and ThinkParQ can deliver, worldwide, a highly converged, scalable computing solution based on BeeGFS, engineered with NetApp E-Series, a choice of InfiniBand, Omni-Path, RDMA over Ethernet and NVMe over Fabrics for targeted performance and 99.9999 reliability utilizing customer-chosen clustered servers and clients and SFW’s services for architecture, integration, acceptance and on-going support services.”

SFW’s solutions can utilize each of these networking topologies for optimal BeeGFS performance and 99.9999 percent reliability as full turnkey deployments, adapted to customer-chosen clustered servers and clients. SFW provides services for architecture, integration, acceptance and ongoing support.

BeeGFS, delivered by ThinkParQ, is a leading parallel cluster file system designed specifically for I/O-intensive workloads in performance-critical environments. With a strong focus on performance and high flexibility, including converged environments where storage servers are also used for computing, BeeGFS helps customers worldwide increase their productivity by delivering results faster and by enabling analysis methods that were not possible without its specific advantages.

Designed for very easy installation and management, BeeGFS transparently spreads user data across multiple servers. Users can therefore scale performance and capacity to the desired level simply by increasing the number of servers and disks in the system, seamlessly from small clusters up to enterprise-class systems with thousands of nodes. BeeGFS, which is available as open source, powers the storage of hundreds of scientific and industrial customer sites worldwide.
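The striping idea can be sketched as follows; the chunk size and server names below are illustrative, not BeeGFS's actual defaults (parallel file systems typically stripe in chunks of hundreds of kilobytes):

```python
# Conceptual sketch of how a parallel file system spreads one file's data
# across storage servers via round-robin striping. Chunk size and server
# names are illustrative, not BeeGFS's actual configuration.
CHUNK_SIZE = 4                       # bytes per chunk, for demonstration
SERVERS = ["stor01", "stor02", "stor03"]

def stripe(data: bytes, servers, chunk_size):
    """Assign consecutive chunks of `data` to servers round-robin."""
    placement = {s: [] for s in servers}
    for i in range(0, len(data), chunk_size):
        server = servers[(i // chunk_size) % len(servers)]
        placement[server].append(data[i:i + chunk_size])
    return placement

layout = stripe(b"ABCDEFGHIJKL", SERVERS, CHUNK_SIZE)
print(layout)  # {'stor01': [b'ABCD'], 'stor02': [b'EFGH'], 'stor03': [b'IJKL']}
```

Because consecutive chunks land on different servers, a large read or write engages all servers at once, which is why adding servers scales aggregate throughput.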

Sven Breuner, CEO of ThinkParQ, stated, “The long experience and solid track record of System Fabric Works in the field of enterprise storage makes us proud of this new partnership. Together, we can now deliver perfectly tailored solutions that meet and exceed customer expectations, no matter whether the customer needs a traditional spinning disk system for high capacity, an all-flash system for maximum performance or a cost-effective hybrid solution with pools of spinning disks and flash drives together in the same file system.”

Due to its performance-tuned design and various optimized features, BeeGFS is ideal for the demanding, high-performance, high-throughput workloads found in technical computing for modeling and simulation, product engineering, life sciences, deep learning, predictive analytics, media, financial services, and many other fields.

With the new storage pools feature in BeeGFS v7, users can now have their current project pinned to the latest NetApp E-Series all-flash SSD pool to get the full performance of an all-flash system, while the rest of the data resides on spinning disks, where it can also be accessed directly – all within the same namespace and thus completely transparent to applications.

SFW BeeGFS solutions can be based on x86_64 and ARM64 ISAs, support multiple networks with dynamic failover, and provide fault tolerance with built-in replication, along with additional file system integrity and storage reliability features. Another compelling part of the solution offerings is BeeOND (BeeGFS On Demand), which allows on-the-fly creation of temporary parallel file system instances on the internal SSDs of compute nodes on a per-job basis for burst buffering. Graphical monitoring and an additional command line interface provide easy management for any kind of environment.

SFW BeeGFS high-performance storage solutions, with architectural design, implementation and ongoing support services, are immediately available from System Fabric Works.

About ThinkParQ

ThinkParQ was founded as a spin-off from the Fraunhofer Center for High Performance Computing by the key people behind BeeGFS to bring fast, robust, scalable storage to market. ThinkParQ is responsible for support, provides consulting, organizes and attends events, and works together with system integrators to create turn-key solutions. ThinkParQ and Fraunhofer internally cooperate closely to deliver high quality support services and to drive further development and optimization of BeeGFS for tomorrow’s performance-critical systems. Visit www.thinkparq.com to learn more about the company.

About System Fabric Works

System Fabric Works (“SFW”), based in Austin, TX, specializes in delivering engineering, integration and strategic consulting services to organizations that seek to implement high performance computing and storage systems, low latency fabrics and the necessary related software. Derived from its 15 years of experience, SFW also offers custom integration and deployment of commodity servers and storage systems at many levels of performance, scale and cost effectiveness that are not available from mainstream suppliers. SFW personnel are widely recognized experts in the fields of high performance computing, networking and storage systems particularly with respect to OpenFabrics Software, InfiniBand, Ethernet and energy saving, efficient computing technologies such as RDMA. Detailed information describing SFW’s areas of expertise and corporate capabilities can be found at www.systemfabricworks.com.

Source: System Fabric Works

The post System Fabric Works and ThinkParQ Partner for Parallel File System appeared first on HPCwire.

Call for Sessions and Registration Now Open for 14th Annual OpenFabrics Alliance Workshop

Thu, 12/07/2017 - 09:06

BEAVERTON, Ore., Dec. 6, 2017 — The OpenFabrics Alliance (OFA) has published a Call for Sessions for its 14th annual OFA Workshop, taking place April 9-13, 2018, in Boulder, CO. The OFA Workshop is a premier means of fostering collaboration among those who develop fabrics, deploy fabrics and create applications that rely on fabrics. It is the only event of its kind where fabric developers and users can discuss emerging fabric technologies, collaborate on future industry requirements, and address problems that exist today. In support of advancing open networking communities, the OFA is proud to announce that Promoter Member Los Alamos National Laboratory, a strong supporter of collaborative development of fabric technologies, will underwrite a portion of the Workshop. For more information about the OFA Workshop and to find support opportunities, visit the event website.

Call for Sessions

The OFA Workshop 2018 Call for Sessions encourages industry experts and thought leaders to help shape this year’s discussions by presenting or leading discussions on critical high performance networking issues. Sessions are designed to educate attendees on current development opportunities, troubleshooting techniques, and disruptive technologies affecting the deployment of high performance computing environments. The OFA Workshop places a high value on collaboration and exchanges among participants. In keeping with the theme of collaboration, proposals for Birds of a Feather sessions and panels are particularly encouraged.

The deadline to submit session proposals is February 16, 2018, at 5:00 p.m. PST. For a list of recommended session topics, formats and submission instructions download the official OFA Workshop 2018 Call for Sessions flyer.


Early bird registration is now open for all participants of the OFA Workshop 2018. For more information on event registration and lodging, visit the OFA Workshop 2018 Registration webpage.

Dates: April 9-13, 2018

Location: Embassy Suites by Hilton Boulder, CO

Registration Site: http://bit.ly/OFA2018REG

Registration Fee: $695 (Early Bird through March 19, 2018), $815 (Regular)

Lodging: Embassy Suites room discounts available until 6:00 p.m. MDT on Monday, March 19, 2018, or until room block is filled.

About the OpenFabrics Alliance

The OpenFabrics Alliance (OFA) is a 501(c)(6) non-profit company that develops, tests, licenses and distributes the OpenFabrics Software (OFS) – multi-platform, high-performance, low-latency and energy-efficient open-source RDMA software. OpenFabrics Software is used in business, operational, research and scientific infrastructures that require fast fabrics/networks, efficient storage and low-latency computing. OFS is free and is included in major Linux distributions, as well as Microsoft Windows Server 2012. In addition to developing and supporting this RDMA software, the Alliance delivers training, workshops and interoperability testing to ensure all releases meet multivendor enterprise requirements for security, reliability and efficiency. For more information about the OFA, visit www.openfabrics.org.

Source: OpenFabrics Alliance

The post Call for Sessions and Registration Now Open for 14th Annual OpenFabrics Alliance Workshop appeared first on HPCwire.

Cray and NERSC Partner to Drive Advanced AI Development at Scale

Wed, 12/06/2017 - 16:54

SEATTLE, December 6, 2017 – Global supercomputer leader Cray Inc. today announced the company has joined the Big Data Center (BDC) at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC). The collaboration between the two organizations is representative of Cray’s commitment to leverage its supercomputing expertise, technologies, and best practices to advance the adoption of Artificial Intelligence (AI), deep learning, and data-intensive computing.

The BDC at NERSC was established with a goal of addressing the Department of Energy’s leading data-intensive science problems, harnessing the performance and scale of the Cray XC40 “Cori” supercomputer at NERSC. The collaboration is focused on three fundamental areas that are key to unlocking the capabilities required for the most challenging data-intensive workflows:

·       Advancing the state-of-the-art in scalable, deep learning training algorithms, which is critical to the ability to train models as quickly as possible in an environment of ever-increasing data sizes and complexity;

·       Developing a framework for automated hyper-parameter tuning, which provides optimized training of deep learning models and maximizes a model’s predictive accuracy;

·       Exploring the use of deep learning techniques and applications against a diverse set of important scientific use cases, such as genomics and climate change, which broadens the range of scientific disciplines where advanced AI can have an impact.
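As a minimal illustration of the second focus area, here is hyper-parameter tuning via random search over learning rate and batch size. The objective below is a hypothetical stand-in for a model's validation accuracy; the BDC's actual framework is not described in the announcement:

```python
import math
import random

random.seed(42)

def validation_score(lr, batch_size):
    # Hypothetical objective standing in for a model's validation
    # accuracy; it peaks near lr = 0.01 and batch_size = 64.
    return (math.exp(-(math.log10(lr) + 2) ** 2)
            * math.exp(-((batch_size - 64) / 64) ** 2))

# Random search: sample configurations, keep the best-scoring one.
best = None
for _ in range(100):
    trial = {"lr": 10 ** random.uniform(-5, 0),
             "batch_size": random.choice([16, 32, 64, 128, 256])}
    score = validation_score(**trial)
    if best is None or score > best[0]:
        best = (score, trial)

print("best score %.3f with %s" % (best[0], best[1]))
```

Real tuning frameworks replace each `validation_score` call with a full (and expensive) training run, which is why automating and parallelizing the search on a supercomputer pays off.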

“We are really excited to have Cray join the Big Data Center,” said Prabhat, Director of the Big Data Center, and Group Lead for Data and Analytics Services at NERSC. “Cray’s deep expertise in systems, software, and scaling is critical in working towards the BDC mission of enabling capability applications for data-intensive science on Cori. Cray and NERSC, working together with Intel and our IPCC academic partners, are well positioned to tackle performance and scaling challenges of Deep Learning.”

“Deep learning is increasingly dependent on high performance computing, and as the leader in supercomputing, Cray is focused on collaborating with the innovators in AI to address present and future challenges for our customers,” said Per Nyberg, Cray’s Senior Director of Artificial Intelligence and Analytics. “Joining the Big Data Center at NERSC is an important step forward in fostering the advancement of deep learning for science and enterprise, and is another example of our continued R&D investments in AI.”

About the Big Data Center at NERSC

The Big Data Center is a collaboration between the Department of Energy’s National Energy Research Scientific Computing Center (NERSC), Intel, and five Intel Parallel Computing Centers (IPCCs). The five IPCCs that are part of the Big Data Center program include the University of California-Berkeley, the University of California-Davis, New York University (NYU), Oxford University, and the University of Liverpool.  The Big Data Center program was established in August 2017.

About Cray Inc.

Global supercomputing leader Cray Inc. (Nasdaq: CRAY) provides innovative systems and solutions enabling scientists and engineers in industry, academia and government to meet existing and future simulation and analytics challenges. Leveraging more than 40 years of experience in developing and servicing the world’s most advanced supercomputers, Cray offers a comprehensive portfolio of supercomputers and big data storage and analytics solutions delivering unrivaled performance, efficiency and scalability. Cray’s Adaptive Supercomputing vision is focused on delivering innovative next-generation products that integrate diverse processing technologies into a unified architecture, allowing customers to meet the market’s continued demand for realized performance. Go to www.cray.com for more information.

Source: Cray Inc.

The post Cray and NERSC Partner to Drive Advanced AI Development at Scale appeared first on HPCwire.

IBM Begins Power9 Rollout with Backing from DOE, Google

Wed, 12/06/2017 - 15:37

After over a year of buildup, IBM is unveiling its first Power9 system based on the same architecture as the Department of Energy CORAL supercomputers Summit and Sierra. The new AC922 server pairs two Power9 CPUs with between four and six Nvidia Tesla V100 NVLink GPUs. IBM is positioning the Power9 architecture as “a game-changing powerhouse for AI and cognitive workloads.”

The AC922 extends many of the design elements introduced in Power8 “Minsky” boxes with a focus on enabling connectivity to a range of accelerators – Nvidia GPUs, ASICs, FPGAs, and PCIe-connected devices – using an array of interfaces. In addition to being the first servers to incorporate PCIe Gen4, the new systems support the NVLink 2.0 and OpenCAPI protocols, which offer nearly 10x the maximum bandwidth of PCIe Gen3-based x86 systems, according to IBM.

IBM AC922 rendering

“We designed Power9 with the notion that it will work as a peer computer or a peer processor to other processors,” said Sumit Gupta, vice president of AI and HPC within IBM’s Cognitive Systems business unit, ahead of the launch. “Whether it’s GPU accelerators or FPGAs or other accelerators that are in the market, our aim was to provide the links and the hooks to give all these accelerators equal footing in the server.”

In the coming months and years there will be additional Power9-based servers to follow from IBM and its ecosystem partners but this launch is all about the flagship AC922 platform and specifically its benefits to AI and cognitive computing – something Ken King, general manager of OpenPOWER for IBM Systems Group, shared with HPCwire when we sat down with him at SC17 in Denver.

“We didn’t build this system just for doing traditional HPC workloads,” King said. “When you look at what Power9 has with NVLink 2.0, we’re going from 80 gigabytes per second of throughput [in NVLink 1.0] to over 150 gigabytes per second of throughput. PCIe Gen3 only has 16. That GPU-to-CPU I/O is critical for a lot of the deep learning and machine learning workloads.”
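Some back-of-envelope arithmetic with those figures (taken as gigabytes per second: roughly 150 GB/s for NVLink 2.0 versus about 16 GB/s for a PCIe Gen3 x16 link) shows what the link speed means for moving accelerator-sized data:

```python
# Transfer-time comparison using the bandwidth figures quoted above,
# treated as GB/s. The 32 GB payload is an arbitrary example of a
# model/activation set larger than a single GPU's memory.
nvlink2_gbps = 150.0
pcie_gen3_gbps = 16.0
payload_gb = 32.0

t_nvlink = payload_gb / nvlink2_gbps
t_pcie = payload_gb / pcie_gen3_gbps
print(f"NVLink 2.0: {t_nvlink:.2f} s, PCIe Gen3: {t_pcie:.2f} s "
      f"({t_pcie / t_nvlink:.1f}x faster over NVLink)")
```

For training loops that shuttle data between CPU and GPU memory every iteration, that per-transfer gap compounds into a large end-to-end difference.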

Coherency, which Power9 introduces via both CAPI and NVLink 2.0, is another key enabler. As AI models grow, they can easily outgrow GPU memory capacity, but the AC922 addresses this by allowing accelerated applications to leverage system memory as GPU memory. This reduces latency and simplifies programming by eliminating explicit data movement and locality management.

The AC922 server can be configured with either four or six Nvidia Volta V100 GPUs. According to IBM, a four GPU air-cooled version will be available December 22 and both four- and six-GPU water-cooled options are expected to follow in the second quarter of 2018.

While the new Power9 boxes have gone by a couple different codenames (“Witherspoon” and “Newell”), we’ve also heard folks at IBM refer to them informally as their Summit servers and indeed there is great visibility in being the manufacturer for what is widely expected to be the United States’ next fastest supercomputer. Thousands of the AC922 nodes are being connected together along with storage and networking to drive approximately 200 petaflops at Oak Ridge and 120 petaflops at Lawrence Livermore.
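
A rough sanity check on those figures: using the public peak rating of about 7.8 double-precision teraflops per Tesla V100 and assuming six GPUs plus a modest CPU contribution per node (our assumptions for illustration, not IBM-confirmed node specs), a ~200-petaflop machine does indeed work out to thousands of AC922 nodes:

```python
# Back-of-the-envelope node count for a ~200 PF system built from AC922 nodes.
# All per-node figures are assumptions based on public peak ratings.

V100_TFLOPS = 7.8     # FP64 peak per V100 GPU (assumed)
GPUS_PER_NODE = 6     # six-GPU AC922 configuration
CPU_TFLOPS = 1.0      # rough combined FP64 estimate for two Power9 CPUs (assumed)

node_tflops = GPUS_PER_NODE * V100_TFLOPS + CPU_TFLOPS  # ~47.8 TF per node
target_pflops = 200.0
nodes_needed = target_pflops * 1000.0 / node_tflops

print(f"~{nodes_needed:.0f} nodes for {target_pflops:.0f} PF peak")
```

The estimate lands a little above 4,000 nodes, consistent with "thousands of AC922 nodes" for the Oak Ridge system.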

As King pointed out in our interview, only one of the original CORAL contractors is fulfilling its mission to deliver a “pre-exascale” supercomputer to the collaboration of US labs.

IBM has also been tapped by Google, which with partner Rackspace is building a server with Power9 processors called Zaius. In a prepared statement, Bart Sano, vice president of Google Platforms, praised “IBM’s progress in the development of the latest POWER technology” and said “the POWER9 OpenCAPI Bus and large memory capabilities allow for further opportunities for innovation in Google data centers.”

IBM sees the hyperscale market as “a good volume opportunity” but is obviously aware of the impact that volume pricing has had on the traditional server market. “We do see strong pull from them, but we have many other elements in play,” said Gupta. “We have solutions that go after the very fast-growing AI space, we have solutions that go after the open source databases, the NoSQL datacenters. We have announced a partnership with Nutanix to go after the hyperconverged space. So if you look at it, we have lots of different elements that drive the volume and opportunity around our Linux on Power servers, including of course SAP HANA.”

IBM will also be selling Power9 chips through its OpenPower ecosystem, which now encompasses 300 members. IBM says it’s committed to deploying three versions of the Power9 chip, one this year, one in 2018 and another in 2019. The scale-out variant is the one it is delivering with CORAL and with the AC922 server. “Then there will be a scale-up processor, which is the traditional chip targeted towards the AIX and the high-end space and then there’s another one that will be more of an accelerated offering with enhanced memory and other features built into it; we’re working with other memory providers to do that,” said King.

He added that there might be another version developed outside of IBM, leveraging OpenPower, which gives other organizations the opportunity to utilize IBM’s intellectual property to build their own differentiated chips and servers.

King is confident that the demand for IBM’s latest platform is there. “I think we are going to see strong out of the chute opportunities for Power9 in 2018. We’re hoping to see some growth this quarter with the solution that we’re bringing out with CORAL but that will be more around the ESP customers. Next year is when we’re expecting that pent up demand to start showing positive return overall for our business results.”

A lot is riding on the success of Power9 after Power8 failed to generate the profits IBM had hoped for. There was growth in the first year, King said, but after that Power8 sales began declining. He added that offerings built on top of the platform, such as the Nutanix partnership, PowerAI and other software-based solutions, have led to a bit of a rebound. “It’s still negative but it’s low negative,” he said, “but it’s sequentially grown quarter to quarter in the last three quarters since Bob Picciano [SVP of IBM Cognitive Systems] came on.”

Several IBM reps we spoke with acknowledged that pricing or at least pricing perception was a problem for Power8.

“For our traditional market I think pricing was competitive; for some of the new markets that we’re trying to get into like the hyperscaler datacenters I think we’ve got some work to do,” said King. “It’s really a TCO and a price-performance competitiveness versus price only. And we think we’re going to have a much better price performance competitiveness with Power9 in the hyperscalers and some of the low-end Linux spaces that are really the new markets.”

“We know what we need to do for Power9 and we’re very confident with a lot of the workload capabilities that we’ve built on top of this architecture that we’re going to see a lot more growth, positive growth on Power9, with PowerAI with Nutanix with some of the other workloads we’ve put in there and it’s not going to be a hardware only reason,” King continued. “It’s going to be a lot of the software capabilities that we’ve built on top of the platform, and supporting more of the newer workloads that are out there. If you look at the IDC studies of the growth curve of cognitive infrastructure it goes from about $1.6 billion to $4.5 billion over the next two or three years – it’s a huge hockey stick – and we have built and designed Power9 for that market, specifically and primarily for that market.”

The post IBM Begins Power9 Rollout with Backing from DOE, Google appeared first on HPCwire.

University of Oregon Uses Supercomputer to Power New Research Advanced Computing Facility

Wed, 12/06/2017 - 13:31

Dec. 6, 2017 — A supercomputer that can perform more than 250 trillion calculations per second is powering the UO’s leap into data science as the heart of a new $2.2 million Research Advanced Computing Services facility.

Known as Talapas, the powerhouse machine is one of the fastest academic supercomputers in the Northwest. Its computing horsepower will aid researchers doing everything from statistical studies to genomic assemblies to quantum chemistry.

“It’s already had a profound impact on my research,” said Eric Corwin, an associate professor in the Department of Physics who employed the center’s supercomputer to examine a physical process known as “jamming.” “Computational tasks that would otherwise have taken a year running on a lab computer can be finished in just a few days, which means we can be much more exploratory in our approach, which has already led to several unexpected discoveries.”

The new center, which opens officially Dec. 6, is already available to faculty members who register as principal investigators or to members of registered research teams. The center, known by the acronym RACS, offers access to large-scale computing and will soon add high-speed data transfer capabilities, support for data sharing and other services.

In addition to boosting the university’s capacity for big data, the new center opens new doors of discovery for faculty across the spectrum of disciplines, schools, colleges and departments. Director Nick Maggio says the center will also help train students for the careers of tomorrow and make the UO more competitive in recruiting new faculty and securing research funding.

“It allows our researchers to evaluate novel technologies and explore new paradigms of computing that weren’t available to them before,” Maggio said. “We’re here to lower every barrier possible so that research computing can flourish at the University of Oregon.”

Talapas is 10 times more powerful than its aging predecessor, ACISS. In just the first few months of testing, the center has helped faculty members performing molecular dynamics simulations, image analysis, machine learning, deep learning and other types of projects.

Bill Cresko, a professor in the Department of Biology who serves as an associate vice president for research, directs the UO’s Presidential Initiative in Data Science. He points to the high-performance computing center as a crucial element of the initiative.

The center will bring together existing faculty and recruit new faculty across the UO’s schools and colleges to create new research and education programs. The center and the initiative are funded through the $50 million Presidential Fund for Excellence announced earlier this year by UO President Michael Schill.

“Research is becoming more and more data-intensive every day, and it’s crucial that we have the capacity to perform the kinds of larger and larger simulations that the high-performance computing center enables,” Cresko said. “The center will play a key role in our continued success as a research institution and our commitment to discovery and innovation.”

The Research Advanced Computing Services center has a staff of four that includes Maggio, a computational scientist and two system administrators. Among other things, the team has been tasked with transitioning users off of the old system and bringing researchers up to speed with the powerful new technology.

The center is one of nine research core facilities supported by the UO’s Office of the Vice President of Research and Innovation.

“It’s a real jewel in the midst of our growing research enterprise,” said David Conover, UO’s vice president for research and innovation. “It goes a long way toward our goal of advancing transformative excellence in research, innovation and graduate education, and it’s exciting to think about all of the new discoveries and new collaborations that will grow out of the facility.”

Although the machine is physically located in the basement of the Allen Hall Data Center, researchers from any department can sign up to access its services from their desktops. Increasingly, Maggio said, researchers who previously didn’t use such computational approaches are becoming computational researchers after encountering new research projects that quickly overwhelm the limits of their local resources.

“With the data explosion that’s occurred over the last 10 years, new opportunities for computational research exist in every field,” Maggio said. “There is no such thing as a non-computational discipline anymore.”

Maggio credits Schill and the UO Board of Trustees with seeing the importance of high-performance computing and prioritizing the funding and creation of the new center in under two years. Joe Sventek, head of the Department of Computer and Information Science, led a faculty committee that developed plans for acquiring the computational hardware, oversaw the hiring of key staff such as Maggio and helped launch RACS — all in record time.

“The fact that Joe and the committee completed this task so quickly is simply amazing,” Conover said.

Looking to the future, Maggio envisions more and more researchers accessing the new facility.  Already, more than 300 different lab members from nearly 80 labs have requested access, and high-performance computing will likely play an increasing role in powering new research initiatives, such as the Phil and Penny Knight Campus for Accelerating Scientific Impact.

“This is the fastest and largest computing asset that the University of Oregon has ever had and it’s still growing,” Maggio said. “This is an incredibly exciting time to be engaged in computational research at the University of Oregon.”

To request access to large-scale computing resources, contact the Research Advanced Computing Services center at racs@uoregon.edu.

Source: University of Oregon

The post University of Oregon Uses Supercomputer to Power New Research Advanced Computing Facility appeared first on HPCwire.

PEZY President Arrested, Charged with Fraud

Wed, 12/06/2017 - 10:19

The head of Japanese supercomputing firm PEZY Computing was arrested Tuesday on suspicion of defrauding a government institution of 431 million yen (~$3.8 million). According to reports in the Japanese press, PEZY founder, president and CEO Motoaki Saito and another PEZY employee, Daisuke Suzuki, are charged with profiting from padded claims they submitted to the New Energy and Industrial Technology Development Organization (NEDO).

PEZY, which stands for Peta, Exa, Zetta, Yotta, designed the manycore processors for “Gyoukou,” one of the world’s fastest and most energy-efficient supercomputers. Installed at the Japan Agency for Marine-Earth Science and Technology, Gyoukou achieved a fourth-place ranking on the November 2017 Top500 list with 19.14 petaflops of Linpack performance. Four of the top five systems on the current Green500 list are PEZY-based, including Gyoukou in fifth position and the number one machine, Shoubu system B, operated by RIKEN.

Saito is also the founder and CEO of ExaScaler, which manufactures the PEZY systems using its immersion liquid-cooling technology, and Ultra Memory, Inc., a startup working on 3D multi-layer memory technology.

All three companies (PEZY, ExaScaler, and Ultra Memory) are collaborating to develop an exascale supercomputer in the 2019 timeframe.

PEZY Computing was founded in January 2010 and introduced its first generation manycore microprocessor PEZY-1 in 2012; PEZY-SC followed in 2014. The third-generation chip, PEZY-SC2, was released in early 2017. The company has an estimated market cap of 940 million yen ($8.4 million).

NEDO is one of the largest public R&D management organizations in Japan, promoting the development and introduction of industrial and energy technologies.

The post PEZY President Arrested, Charged with Fraud appeared first on HPCwire.

Survey from HSA Foundation Highlights Importance, Benefits of Heterogeneous Systems

Wed, 12/06/2017 - 09:07

BEAVERTON, Ore., Dec. 6, 2017 — The Heterogeneous System Architecture (HSA) Foundation today released key findings from a second comprehensive members survey. The survey reinforced why heterogeneous architectures are becoming integral for future electronic systems.

HSA is a standardized platform design supported by more than 70 technology companies and universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources—including CPUs, GPUs, DSPs, FPGAs, fabrics and fixed function accelerators—in today’s complex systems-on-chip (SoCs).

Some of the survey questions – and results:

Will the system have HSA features? 

Last year, 58.82% of the respondents answered affirmatively; this year, 100%!

Will it be HSA-compliant?

In 2016, 69.23% said it would; 2017 figures rose to 80%.

What is the top challenge in implementing heterogeneous systems?

27.27% responded in 2016 that it was a lack of standards for software programming models; the 2017 survey also identified this as the most important issue, but the numbers decreased to 7.69%.

What is the top challenge in implementing heterogeneous systems?

Half of the respondents last year said it was a lack of developer ecosystem momentum.  Once again this was identified as the key issue.

Some remarks that further accentuate key survey findings:

“Many HSA Foundation members are currently designing, programming or delivering a wide range of heterogeneous systems – including those based on HSA,” said HSA Foundation President Dr. John Glossner. “Our 2017 survey provides additional insight into key issues and trends affecting these systems that power the electronic devices across every aspect of our lives.”

Greg Stoner, HSA Foundation Chairman and Managing Director, said that “the Foundation is developing resources and ecosystems conducive to its members’ various focuses on different application areas, including machine learning, artificial intelligence, datacenter, embedded IoT, and high-performance computing. The Foundation has also been making progress in support of these ecosystems, getting closer to taking normal C++ code and compiling to an HSA system.”

Stoner added that “ROCm 7 by AMD will port HSA for Caffe and TensorFlow; GPT, in the meantime, is releasing an open-sourced HSAIL-based Caffe library, with the first version already up and running – this permits early access for developers.”

Dr. Xiaodong Zhang, from Huaxia General Processor Technologies, who serves as chairman of the China Regional Committee (CRC; established by the HSA Foundation to enhance global awareness of heterogeneous computing), said that “China’s semiconductor industry is rapidly developing, and the CRC is building an ecosystem in the region to include technology, talent, and markets together with an open approach to take advantage of synergies among industry, academia, research, and applications.”

About the HSA Foundation

The HSA (Heterogeneous System Architecture) Foundation is a non-profit consortium of SoC IP vendors, OEMs, Academia, SoC vendors, OSVs and ISVs, whose goal is making programming for parallel computing easy and pervasive. HSA members are building a heterogeneous computing ecosystem, rooted in industry standards, which combines scalar processing on the CPU with parallel processing on the GPU, while enabling high bandwidth access to memory and high application performance with low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed function devices, while supporting a diverse set of high-level programming languages, and creating the foundation for next-generation, general-purpose computing.

Source: HSA Foundation

The post Survey from HSA Foundation Highlights Importance, Benefits of Heterogeneous Systems appeared first on HPCwire.

Raytheon Developing Superconducting Computing Technology for Intelligence Community

Tue, 12/05/2017 - 12:21

CAMBRIDGE, Mass., Dec. 5, 2017 — A Raytheon BBN Technologies-led team is developing prototype cryogenic memory arrays and a scalable control architecture under an award from the Intelligence Advanced Research Projects Activity Cryogenic Computing Complexity program.

The team recently demonstrated an energy-efficient superconducting/ferromagnetic memory cell—the first integration of a superconducting switch controlling a cryogenic memory element.

“This research could generate a new approach to supercomputing that is more efficient, faster, less expensive, and requires a smaller footprint,” said Zachary Dutton, Ph.D. and manager of the quantum technologies division at Raytheon BBN Technologies.

Raytheon BBN is the prime contractor leading a team that includes:

  • Massachusetts Institute of Technology
  • New York University
  • Cornell University
  • University of Rochester
  • University of Stellenbosch
  • HYPRES, Inc.
  • Canon U.S.A., Inc.
  • Spin Transfer Technologies, Inc.

Raytheon BBN Technologies is a wholly owned subsidiary of Raytheon Company (NYSE: RTN).

About Raytheon 

Raytheon Company, with 2016 sales of $24 billion and 63,000 employees, is a technology and innovation leader specializing in defense, civil government and cybersecurity solutions. With a history of innovation spanning 95 years, Raytheon provides state-of-the-art electronics, mission systems integration, C5I™ products and services, sensing, effects, and mission support for customers in more than 80 countries. Raytheon is headquartered in Waltham, Massachusetts.

Source: Raytheon

The post Raytheon Developing Superconducting Computing Technology for Intelligence Community appeared first on HPCwire.

Cavium Partners with IBM for Next Generation Platforms by Joining OpenCAPI

Tue, 12/05/2017 - 12:12

SAN JOSE, Calif., Dec. 5, 2017 — Cavium, Inc. (NASDAQ: CAVM), a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking, is partnering with IBM for next-generation platforms by joining OpenCAPI, an initiative founded by IBM, Google, AMD and others. OpenCAPI provides a high-bandwidth, low-latency interface optimized to connect accelerators, I/O devices and memory to CPUs. With this announcement, Cavium plans to bring its leadership in server I/O and security offloads to next-generation platforms that support the OpenCAPI interface.

Traditional system architectures are becoming a bottleneck for new classes of data-centric applications that require faster access to peripheral resources like memory, I/O and accelerators. For such applications to deploy efficiently and succeed, it is imperative to put the compute power closer to the data. OpenCAPI, a mature and complete specification, enables such a server design, one that can increase datacenter server performance several times over, enabling corporate and cloud data centers to speed up big data, machine learning, analytics, and other emerging workloads. Capable of a 25 Gbit/s data rate, OpenCAPI delivers best-in-class performance, enabling maximum utilization of high-speed I/O devices like Cavium Fibre Channel adapters, low-latency Ethernet NICs, programmable SmartNICs and security solutions.
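
The arithmetic behind that 25 Gbit/s figure is easy to sketch. The lane count and line-encoding overhead below are illustrative assumptions (the announcement cites only the per-second signaling rate, which public OpenCAPI material describes as per-lane), not figures from Cavium or IBM:

```python
# Rough usable throughput for a multi-lane serial link signaling at 25 Gbit/s
# per lane. Lane count and encoding efficiency are illustrative assumptions.

LANE_GBITS = 25.0              # per-lane signaling rate cited in the article
LANES = 8                      # assumed link width (x8)
ENCODING_EFFICIENCY = 64 / 66  # assumed 64b/66b-style line encoding overhead

gbytes_per_sec = LANE_GBITS * LANES * ENCODING_EFFICIENCY / 8  # bits -> bytes

print(f"~{gbytes_per_sec:.1f} GB/s usable per x8 link")
```

Under these assumptions an x8 link delivers on the order of 24 GB/s in each direction, which is the class of bandwidth that makes coherent attachment of NICs and accelerators attractive.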

Cavium delivers the industry’s most comprehensive family of I/O adapters and network accelerators, which have the potential to be seamlessly integrated into OpenCAPI-based systems. Cavium’s portfolio includes FastLinQ® Ethernet Adapters, Converged Networking Adapters, LiquidIO SmartNICs, Fibre Channel Adapters and NITROX® Security Accelerators, covering the entire spectrum of data-centric application connectivity, offload and acceleration requirements.

“We welcome Cavium to the OpenCAPI consortium to fuel innovation for today’s data-intensive cognitive workloads,” said Bob Picciano, Senior Vice President, IBM Cognitive Systems. “Together, we will tap into Cavium’s next-generation technology, including networking and accelerators, and work in tandem with other partners’ systems technology to unleash high-performance capabilities for our clients’ data center workloads.”

“We are excited to be a part of the OpenCAPI consortium. As our partnership with IBM continues to grow, we see more synergies in high speed communication and Artificial Intelligence applications,” said Syed Ali, founder and CEO of Cavium.  “We look forward to working with IBM to enable exponential performance gains for these applications.”

About Cavium

Cavium, Inc. (NASDAQ: CAVM), offers a broad portfolio of infrastructure solutions for compute, security, storage, switching, connectivity and baseband processing. Cavium’s highly integrated multi-core SoC products deliver software compatible solutions across low to high performance points enabling secure and intelligent functionality in Enterprise, Data Center and Service Provider Equipment. Cavium processors and solutions are supported by an extensive ecosystem of operating systems, tools, application stacks, hardware reference designs and other products. Cavium is headquartered in San Jose, CA with design centers in California, Massachusetts, India, Israel, China and Taiwan.

Source: Cavium

The post Cavium Partners with IBM for Next Generation Platforms by Joining OpenCAPI appeared first on HPCwire.