Feed aggregator

Mines Residence Life staff win major conference awards

Colorado School of Mines - Mon, 11/13/2017 - 11:37

Colorado School of Mines residence life staff took home multiple major awards from the Intermountain Affiliate of College and University Residence Halls Regional Leadership Conference, held Nov. 3-6 in Albuquerque, New Mexico.

Mines received the Program of the Year award for bringing members of the U.S. Paralympic goalball team to campus to teach students how to play the game and then organizing a campus tournament. Goalball, designed for athletes with impaired vision, has teams competing to throw a ball with bells inside into their opponents’ goal.

Chase Schumacher, an engineering physics major and a second-year resident assistant in Weaver Towers, was named Student Staff Member of the Year.

Mary F. Elliott, Mines’ director of housing and residence life, was named Advisor of the Year.

Mines students were also honored for presenting two of the top 12 programs at the conference. Brandon Bakka, a chemical engineering student, was recognized for “How LGBTQ+ People Navigate the Jungle of College Campuses.” Schumacher and Keenan Urmann, a mechanical engineering student, were recognized for “Miracle Gro Fer(Tea)lizer,” a weekly Tuesday Tea program in the residence halls.

CONTACT
Mark Ramirez, Managing Editor, Communications and Marketing | 303-273-3088 | ramirez@mines.edu
Emilie Rusch, Public Information Specialist, Communications and Marketing | 303-273-3361 | erusch@mines.edu


Netlist, Nyriad and TYAN to Accelerate the Adoption of NVDIMMs and GPUs for Storage

HPC Wire - Mon, 11/13/2017 - 10:49

DENVER, Nov. 13, 2017 — Netlist, Inc. (NASDAQ: NLST), Nyriad and TYAN today announced a solution to support Netlist NVvault non-volatile memory for cache acceleration in Nyriad’s graphics processing unit (GPU)-accelerated storage platform, NSULATE on a TYAN Thunder server.

By adopting Netlist’s NVvault DDR4 NVDIMM-N non-volatile memory, Nyriad NSULATE-based storage systems can be configured to achieve millions of IOPS, sustaining high throughput while also enabling levels of storage resilience and integrity that are impossible with traditional solutions based on central processing units (CPUs) or redundant arrays of independent disks (RAID).

The Netlist and Nyriad technologies will be showcased on a TYAN Thunder HX FT77D-B7109 dual root complex 4U 8GPU server configured with Netlist’s NVvault at the SuperComputing 2017 Conference Exhibition taking place in Denver, CO from November 13-16. Additional information on the demonstration will be available at Netlist’s booth #2069 and TYAN’s booth #1269.

C.K. Hong, Netlist Chief Executive Officer, said, “NVvault, which is part of our storage-class memory family of solutions, is vital to Nyriad’s NSULATE accelerated and resilient storage-processing architecture. When combined with TYAN’s latest server targeted at big data and high-performance computing applications, we have created a game changing platform to drive improved IOPS (input/output operations per second), security, scale, performance and total storage array cost per terabyte. The solution enables NVvault to bring substantial performance benefits to end user applications such as big-data analytics by storing data in a way that is directly accessible to high performance GPUs.”

Nyriad Chief Executive Officer Matthew Simmons stated, “Processing and storing large volumes of data has become so I/O (input/output) intensive that traditional storage and network fabrics can’t cope with the volume of information that needs to be processed and stored in real-time. However, GPUs have become the dominant solution for modern high-performance computing, big-data and machine learning applications.  Our collaboration with Netlist and TYAN has broken this bottleneck and will enable major leaps in exascale storage performance and efficiency.”

Danny Hsu, Vice President of MiTAC Computing Technology Corporation’s TYAN Business Unit stated, “For many years, TYAN has met the ongoing challenge to provide efficient and powerful products that can support demanding applications in many areas, including the storage and high-performance computer space. Towards this goal, we are working with Netlist and Nyriad to define a new kind of computing solution to address vastly larger data sets and analytics, offering huge performance gains for customers worldwide.”

Netlist’s NVvault DDR4 is an NVDIMM-N that provides data acceleration and protection in a JEDEC standard DDR4 interface. It is designed to be integrated into industry standard server or storage solutions.  NVvault is a persistent memory technology that has been widely adopted by industry standard servers and storage systems.  By combining the high performance of DDR4 DRAM with the non-volatility of NAND Flash, NVvault improves the performance and data preservation found in storage virtualization, RAID, cache protection, and data logging applications requiring high-throughput.

Nyriad’s NSULATE addresses these I/O bottlenecks by replacing RAID controllers with GPUs for all Linux storage applications. This enables the GPUs to perform double duty as both I/O controllers and compute accelerators in the same integrated solution. The combination of Netlist NV Memory with NSULATE produces the best of both worlds: low-latency IOPS combined with maximum data resilience, security, throughput and efficiency in the same architecture.

The first next-generation solutions based on the Netlist and Nyriad technology are expected to appear in the market from leading industry partners early next year.

About Netlist

Netlist is a leading provider of high-performance modular memory subsystems serving customers in diverse industries that require superior memory performance to empower critical business decisions. Flagship products NVvault and EXPRESSvault enable customers to accelerate data running through their servers and storage and reliably protect enterprise-level cache, metadata and log data by providing near instantaneous recovery in the event of a system failure or power outage. HybriDIMM, Netlist’s next-generation storage class memory product, addresses the growing need for real-time analytics in Big Data applications and in-memory databases. Netlist holds a portfolio of patents, many seminal, in the areas of hybrid memory, storage class memory, rank multiplication and load reduction. Netlist is part of the Russell Microcap Index.  To learn more, visit www.netlist.com.

About Nyriad

Nyriad is a New Zealand-based exascale computing company specializing in advanced data storage solutions for big data and high-performance computing. Born out of its consulting work on the Square Kilometre Array Project, the company was forced to rethink the relationship between storage, processing and bandwidth to achieve a breakthrough in system stability and performance capable of processing and storing over 160Tb/s of radio antennae data in real-time, within a power budget impossible with any modern IT solutions.

About TYAN

TYAN, as a leading server brand of MiTAC Computing Technology Corporation under the MiTAC Group (TSE:3706), designs, manufactures and markets advanced x86 and x86-64 server/workstation board technology, platforms and server solution products. Its products are sold to OEMs, VARs, System Integrators and Resellers worldwide for a wide range of applications. TYAN enables its customers to be technology leaders by providing scalable, highly-integrated, and reliable products for a wide range of applications such as server appliances and solutions for HPC, hyper-scale/data center, server storage and security appliance markets. For more information, visit MiTAC’s website at http://www.mic-holdings.com or TYAN’s website at http://www.tyan.com.

Source: Netlist

The post Netlist, Nyriad and TYAN to Accelerate the Adoption of NVDIMMs and GPUs for Storage appeared first on HPCwire.

Red Hat Introduces Arm Server Support for Red Hat Enterprise Linux

HPC Wire - Mon, 11/13/2017 - 10:31

Nov. 13, 2017 — Today marks a milestone for Red Hat Enterprise Linux with the addition of a new architecture to its list of fully supported platforms. Red Hat Enterprise Linux for ARM is a part of its multi-architecture strategy and the culmination of a multi-year collaboration with the upstream community and its silicon and hardware partners.

The Arm ecosystem has emerged over the last several years with server-optimized SoC (system on chip) products that are designed for cloud and hyperscale, telco and edge computing, as well as high-performance computing applications. Arm SoC designs take advantage of advances in CPU technology, system-level hardware, and packaging to offer additional choices to customers looking for tightly integrated hardware solutions.

Red Hat took a pragmatic approach to Arm servers by helping to drive open standards and develop communities of customers, partners and a broad ecosystem. Its goal was to develop a single operating platform across multiple 64-bit ARMv8-A server-class SoCs from various suppliers, built from the same sources, with a consistent feature set that enables customers to deploy across a range of server implementations while maintaining application compatibility.

In 2015, Red Hat introduced a Development Preview of the operating system to silicon partners, such as Cavium and Qualcomm, and OEM partners, like HPE, that designed and built systems based on a 64-bit Arm architecture. A great example of this collaboration was the advanced technology demonstration by HPE, Cavium, and Red Hat at the International Supercomputing conference in June 2017. That prototype solution became part of HPE’s Apollo 70 system, announced today. If you are attending SuperComputing17 this week, stop by Red Hat’s booth (#1763) to learn more about this new system.

Red Hat’s focus is to provide software support for multiple architectures powered by a single operating platform, Red Hat Enterprise Linux, driven by open innovation. Red Hat Enterprise Linux 7.4 for ARM, the first commercial release for this architecture, gives customers who have been planning to run their workloads on Arm, as well as software and hardware partners that require a stable operating environment for continued development, a proven and more secure enterprise-grade platform. Red Hat plans to continue working with the ecosystem to expand the reach of Red Hat Enterprise Linux 7.4 for ARM.

In addition to Red Hat Enterprise Linux, Red Hat is also shipping Red Hat Software Collections 3, Red Hat Developer Toolset 7 and single-host KVM virtualization (as an unsupported Development Preview) for this architecture.

To learn more about Red Hat Enterprise Linux 7.4 for ARM, see the release notes at https://access.redhat.com/articles/3158541

Source: Red Hat

The post Red Hat Introduces Arm Server Support for Red Hat Enterprise Linux appeared first on HPCwire.

Penguin Computing Announces Intel Xeon Scalable Processor Availability for On-Demand HPC Cloud

HPC Wire - Mon, 11/13/2017 - 09:04

FREMONT, Calif., Nov. 13, 2017 — Penguin Computing, provider of high performance computing, enterprise data center and cloud solutions, today announced that more than 11,500 cores of the latest Intel Xeon Scalable processor (codenamed: Skylake-SP) will be available in December 2017 on Penguin Computing On-Demand (POD) HPC cloud. The new POD HPC cloud compute resources use Intel Xeon Gold 6148 processors, a cluster-wide Intel Omni-Path Architecture low-latency fabric and are integrated with Penguin Computing Scyld Cloud Workstation for web-based, remote desktop access into the public HPC cloud service.

“As an HPC cloud provider, we know that it is critical to provide our customers with the latest processor technologies,” said Victor Gregorio, Senior Vice President, Cloud Services, Penguin Computing. “The latest Intel Xeon Scalable processor expansion will provide an ideal compute environment for MPI workloads that can leverage thousands of cores for computation. We have significant customer demand for POD HPC cloud in applicable areas like high-resolution weather forecasting and computational fluid dynamics, including solutions from software partners like ANSYS, Flow Science and CD-adapco.”

“Intel offers a balanced portfolio of HPC optimized components like the Intel Xeon Scalable processor and Intel Omni-Path Architecture, which provides the foundation for researchers and innovators to drive new discoveries and build new products faster than ever before,” said Trish Damkroger, Vice President of technical computing at Intel. “Penguin Computing On-Demand provides an easy and flexible path to access the latest technology so more users can realize the benefits of HPC.”

Scientists and engineers at every company are trying to innovate faster while holding down costs. Modeling and simulation are the backbone of these efforts. Customers may wish to run simulations at scale, or many different permutations simultaneously, but may require more computing resources than are readily available in-house. The POD HPC cloud offers organizations a flexible, cost-effective approach to meeting these requirements.

The Intel Xeon Scalable processor provides increased performance, a unified stack optimized for key workloads including data analytics, and integrated technologies including networking, acceleration and storage. The processor’s increased performance is realized through innovations including Intel® AVX-512 extensions that can deliver up to 2x FLOPS per clock cycle, which is especially important for HPC, data analytics and hardware-enhanced security/cryptography workloads. Along with numerous acceleration refinements, the new processor offers integrated 100 Gb/s Intel® Omni-Path Architecture fabric options. With these improvements, the Intel Xeon Scalable Platinum 8180 processor yielded an increase of up to 8.2x more double precision GFLOPS/sec when compared to Intel Xeon processor E5-2690 (codenamed Sandy Bridge) common in the server installed-base, and a 2.27x increase over the previous-generation Intel Xeon processor E5-2699 v4 (codenamed Broadwell)1.

The doubling of cores in the publicly available POD HPC cloud resources in 2017 was preceded by a 50 percent increase in capacity in 2016. As customer demand continues to increase, POD HPC cloud will continue to grow using the most current technologies to deliver the actionable insights that organizations require.

Visit Penguin Computing at Booth 1801 during SC17 in Denver.

About Penguin Computing

Penguin Computing is one of the largest private suppliers of enterprise and high-performance computing solutions in North America and has built and operates the leading specialized public HPC cloud service, Penguin Computing On-Demand (POD). Penguin Computing pioneers the design, engineering, integration and delivery of solutions that are based on open architectures and comprise non-proprietary components from a variety of vendors. Penguin Computing is also one of a limited number of authorized Open Compute Project (OCP) solution providers leveraging this Facebook-led initiative to bring the most efficient open data center solutions to a broader market, and has announced the Tundra product line which applies the benefits of OCP to high performance computing. Penguin Computing has systems installed with more than 2,500 customers in 40 countries across eight major vertical markets.

Source: Penguin Computing

The post Penguin Computing Announces Intel Xeon Scalable Processor Availability for On-Demand HPC Cloud appeared first on HPCwire.

Cavium and Leading Partners to Showcase ThunderX2 Arm-Based Server Platforms and FastLinQ Ethernet Adapters for HPC at SC17

HPC Wire - Mon, 11/13/2017 - 08:52

SAN JOSE, Calif. and DENVER, Nov. 13, 2017 — Cavium, Inc. (NASDAQ: CAVM), a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking, will showcase various ThunderX2 Arm-based server platforms for high performance computing at this year’s Supercomputing (SC17) conference taking place in the Colorado Convention Center in Denver, Colorado from November 13th to 16th.

The ThunderX2 server SoC integrates fully out-of-order, high-performance custom cores supporting single- and dual-socket configurations. ThunderX2 is optimized to drive high computational performance, delivering outstanding memory bandwidth and memory capacity. The new line of ThunderX2 processors includes multiple SKUs for both scale-up and scale-out applications and is fully compliant with the Armv8-A architecture specifications as well as the Arm Server Base System Architecture and Arm Server Base Boot Requirements standards.

The ThunderX2 SoC family is supported by a comprehensive software ecosystem ranging from platform-level systems management and firmware to commercial operating systems, development environments and applications. Cavium has actively engaged in server industry standards groups such as UEFI and delivered numerous reference platforms to a broad array of community and corporate partners. Cavium has also demonstrated its leadership role in the open source software community, driving upstream kernel enablement and toolchain optimization, actively contributing to Linaro’s Enterprise and Networking Groups, investing in key Linux Foundation projects such as DPDK, OpenHPC, OPNFV and Xen, and sponsoring the FreeBSD Foundation’s Armv8 server implementation.

SC17 Show Highlights and Product Demonstrations

Cavium’s executive leaders and technology experts will be available to discuss the company’s ThunderX2 processor technology, platforms, roadmap and HPC target solutions while demonstrating a range of platforms and configurations. Many of Cavium’s key partners will also be present with demonstrations that include system implementation, system software, tools and applications.  In addition to the ThunderX2 based ODM and OEM platforms and Cavium’s FastLinQ Ethernet Adapters, the following product demonstrations will be on display on the show floor and at Cavium’s booth #349.

  • Cavium ThunderX2 – 64-bit Armv8-based SoC family that significantly increases performance, memory bandwidth and memory capacity. We will be demonstrating various applications running on ThunderX2 in both single- and dual-socket configurations. Cavium’s systems partners Bull/Atos (Booth #1925), Cray (Booth #625), Gigabyte (Booth #2151), HPE (Booth #925), and Penguin (Booth #1801) will be showcasing HPC platforms based on ThunderX2. Cavium’s software partners will be demonstrating a variety of software tools and applications optimized for ThunderX2. In addition, there will be a full rack of ThunderX2-based systems showcased in HPE’s Comanche collaboration booth #494.
  • Cavium FastLinQ – 10/25/40/50/100Gb Ethernet adapters that enable the highest level of application performance with the industry’s only Universal RDMA capability that supports RoCE v1, RoCE v2 and iWARP concurrently. With the explosion of data there is a critical need for fast and intelligent I/O throughout the data center. Cavium FastLinQ products enable machine learning, data analytics and NVMe over Fabrics storage while maximizing system performance.

The following additional presentations by Cavium will cover ThunderX2 updates, Arm Ecosystem, and End User Optimizations focused on HPC.

  • On Monday, November 13, 2017 at 3:30 pm, Surya Hotha, Director of Product Marketing for Cavium’s Datacenter Processor Group, will present ThunderX2 in HPC applications at the third annual Arm SC HPC User Forum.
  • On Tuesday, November 14, 2017 at 10:30 am, Giri Chukkapalli, Distinguished Engineer, will present a ThunderX2 technology overview at the Red Hat Theater.
  • On Tuesday, November 14, 2017 at 2:30 pm, Varun Shah, Product Marketing Manager for Cavium’s Datacenter Processor Group, will present ThunderX2 advantages for the HPC market at the Exhibitor Forum.
  • On Tuesday, November 14, 2017 at 2:30 pm, Giri Chukkapalli, Distinguished Engineer, will present a ThunderX2 technology overview at the HPE Theater.
  • On Wednesday, November 15, 2017 at 2:30 pm, Cavium experts will present at the SUSE booth.

To schedule a meeting at SC17, please send an email to sales@cavium.com and enter SC17 Meeting Request in the subject line.

About Cavium

Cavium, Inc. (NASDAQ: CAVM), offers a broad portfolio of infrastructure solutions for compute, security, storage, switching, connectivity and baseband processing. Cavium’s highly integrated multi-core SoC products deliver software compatible solutions across low to high performance points enabling secure and intelligent functionality in Enterprise, Data Center and Service Provider Equipment. Cavium processors and solutions are supported by an extensive ecosystem of operating systems, tools, application stacks, hardware-reference designs and other products. Cavium is headquartered in San Jose, CA with design centers in California, Massachusetts, India, Israel, China and Taiwan. For more information, please visit: http://www.cavium.com.

Source: Cavium

The post Cavium and Leading Partners to Showcase ThunderX2 Arm-Based Server Platforms and FastLinQ Ethernet Adapters for HPC at SC17 appeared first on HPCwire.

Oak Ridge National Laboratory Acquires Atos Quantum Learning Machine

HPC Wire - Mon, 11/13/2017 - 08:31

PARIS and IRVING, Tex., Nov. 13, 2017 — Atos, a global leader in digital transformation, today announces a new contract with US-based Oak Ridge National Laboratory (ORNL) for a 30-Qubit Atos Quantum Learning Machine (QLM), the world’s highest-performing quantum simulator.

Designed by the ‘Atos Quantum’ laboratory, the first major quantum industry program in Europe, the Atos QLM combines an ultra-compact machine with a universal programming language. The appliance enables researchers and engineers to develop and test today the quantum applications and algorithms of tomorrow.

As the Department of Energy’s largest multi-program science and energy laboratory, ORNL employs almost 5,000 people, including scientists and engineers in more than 100 disciplines. The Atos QLM-30 installed at ORNL, which processes up to 30 quantum bits (qubits) in memory, was operational within hours thanks to Atos’ fast-start process. Set up as a stand-alone appliance, the Atos QLM can run on premises, ensuring the confidentiality of clients’ research programs and data.

ORNL’s Quantum Computing Institute Director, Dr. Travis Humble says:

“At ORNL, we are preparing for the next-generation of high-performance computing by investigating unique technologies such as quantum computing.

We are researching how quantum computing can provide new methods for advancing scientific applications important to the Department of Energy.

Our researchers focus on applications in the physical sciences, such as chemistry, materials science, and biology, as well as the applied and data sciences. Numerical simulation helps to guide development of these scientific applications and support understanding program correctness. The Atos Quantum Learning Machine provides a unique platform for testing new quantum programming ideas.”

Thierry Breton, CEO and Chairman of Atos, adds:

“We are glad to accompany Oak Ridge National Laboratory from the outset in what is likely to be the next major technological evolution. Thanks to our Atos Quantum Learning Machine, designed by our quantum lab supported by an internationally renowned Scientific Council, researchers from the Department of Energy will benefit from a simulation environment which will enable them to develop quantum algorithms to prepare for the major accelerations to come.”

In the coming years, quantum computing should be able to tackle the explosion of data brought about by Big Data and the Internet of Things. Thanks to its innovative targeted computing acceleration capabilities, based in particular on the exascale-class Bull Sequana supercomputer, quantum computing should also foster developments in deep learning, algorithms and artificial intelligence for domains as varied as pharmaceuticals and new materials. To move forward on these issues, Atos plans to set up several partnerships with research centers and universities around the world.

About Atos

Atos is a global leader in digital transformation with approximately 100,000 employees in 72 countries and annual revenue of around € 12 billion. European number one in Big Data, Cybersecurity, High Performance Computing and Digital Workplace, the Group provides Cloud services, Infrastructure & Data Management, Business & Platform solutions, as well as transactional services through Worldline, the European leader in the payment industry. With its cutting-edge technologies, digital expertise and industry knowledge, Atos supports the digital transformation of its clients across various business sectors: Defense, Financial Services, Health, Manufacturing, Media, Energy & Utilities, Public sector, Retail, Telecommunications and Transportation. The Group is the Worldwide Information Technology Partner for the Olympic & Paralympic Games and operates under the brands Atos, Atos Consulting, Atos Worldgrid, Bull, Canopy, Unify and Worldline. Atos SE (Societas Europaea) is listed on the CAC40 Paris stock index.

Source: Atos

The post Oak Ridge National Laboratory Acquires Atos Quantum Learning Machine appeared first on HPCwire.

DDN Announces New Solutions and Next Generation Monitoring Tools

HPC Wire - Mon, 11/13/2017 - 08:26

DENVER and SANTA CLARA, Calif., Nov. 13, 2017 — DataDirect Networks (DDN) today announced new high-performance computing (HPC) storage solutions and capabilities, which it will feature this week at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC17) in Denver, Colorado. The new solutions include an entry-level burst buffer appliance (IME140) for cost-effective I/O acceleration, a next generation monitoring software (DDN Insight), and the company’s new declustered RAID solution (SFA Declustered RAID “DCR”) for increased data protection in massive storage pools. DDN also announced recent HPC customer wins in some of the world’s largest supercomputing centers.

“Modern HPC workflows require new levels of performance, flexibility and reliability to turn data and ideas into value,” said John Abbott, founder and research VP, 451 Research. “With its long-standing HPC storage heritage, DDN is strongly positioned with closely integrated components that can deliver extreme I/O performance, comprehensive monitoring at scale and new levels of data protection.”

New DDN Solutions and Next Generation Monitoring Tools

HPC and data-intensive enterprise environments are facing new pressures that stem from higher application diversity and sophistication along with steep growth in the volume of active datasets. These trends present a tough challenge to today’s filesystems in delivering the performance and economics to match business needs and compute capability. In addition, as rotational drive capacities grow, the risk of data loss increases due to longer drive rebuild times. DDN’s latest technology innovations deliver the enhanced performance, flexibility and management simplicity needed to solve these challenges and to accelerate large-scale workflows for greater operational efficiency and ROI.

  • DDN IME140 
    DDN has expanded its IME product line with the new IME140 that makes IME scale-out flash accessible to more organizations at lower cost. The IME140 supports extreme file performance in a small 1U flash data appliance. Each appliance can deliver more than 11GB/s write and 20GB/s read throughputs and more than 1M file IOPS (read and write). Starting with a resilient solution as small as 4 units, the IME140 allows organizations to cost-effectively scale performance independent of the amount of capacity required. Traditional parallel file systems often cannot keep pace with the mixed I/O requirements of modern workloads and fail to deliver the potential of flash. The IME software implements a faster, leaner data path that delivers to applications the low latencies and high throughputs of NVMe. The IME140 1U building block allows organizations to intelligently apply fast flash where it is needed, while maintaining cost-effective capacity on HDD within the file system.
  • DDN Insight 
    DDN Insight is DDN’s next-generation monitoring software.  Easy to deploy, DDN Insight allows customers to monitor the most challenging environments at scale, across multiple file systems and storage appliances. With DDN Insight customers can quickly identify and address hot spots, bottlenecks and misbehaving applications. Tightly integrated with SFA, EXAScaler, GRIDScaler and IME, DDN Insight delivers an intuitive way for customers to comprehensively monitor their complete DDN-based ecosystem.

Availability

The DDN IME140 will ship in volume in the first quarter of 2018. The SFA DCR is shipping today with the SFA14KX, and DDN Insight monitoring software is integrated and shipping today with DDN’s SFA, EXAScaler, GRIDScaler and IME solutions.

About DDN

DataDirect Networks (DDN) is a leading big data storage supplier to data-intensive, global organizations. For almost 20 years, DDN has designed, developed, deployed and optimized systems, software and storage solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premise and in the cloud. Organizations leverage the power of DDN storage technology and the deep technical expertise of its team to capture, store, process, analyze, collaborate and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include many of the world’s leading financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers. For more information, go to www.ddn.com or call 1-800-837-2298.

Source: DDN

The post DDN Announces New Solutions and Next Generation Monitoring Tools appeared first on HPCwire.

CoolIT Systems Showcases Newest Datacenter Liquid Cooling Innovations for OEM and Enterprise Customers at SC17

HPC Wire - Mon, 11/13/2017 - 08:04

DENVER, Nov. 13, 2017 — CoolIT Systems (CoolIT), a global leader in energy efficient liquid cooling solutions for HPC, Cloud and Hyperscale markets, returns to the highly-anticipated Supercomputing Conference 2017 (SC17) in Denver, Colorado for the sixth consecutive year with its latest Rack DCLC and Closed-Loop DCLC innovations for data centers and servers.

As the most popular integration partner for OEM server manufacturers, CoolIT will showcase liquid-enabled servers from Intel, Dell EMC, HPE, and Huawei. Combined with the broadest range of heat exchangers and supporting liquid infrastructure, CoolIT and their OEM partners are delivering the most complete and robust liquid cooling solutions to the HPC market. CoolIT OEM solutions being shown at booth 1601 include:

  • Intel Buchanan Pass – CoolIT is pleased to announce the liquid-enabled Buchanan Pass server with coldplates managing heat from the processor, voltage regulator, and memory.
  • Dell EMC PowerEdge C6420 – this liquid-enabled server will be on display at the Dell EMC booth (#913) within a fully populated rack, including stainless steel Manifold Modules and the best-in-class CHx80 Heat Exchange Module. With factory-installed liquid cooling, this server is purpose-built for high performance and hyperscale workloads.
  • HPE Apollo 2000 Gen9 System – optimized with Rack DCLC to significantly enhance overall data center performance and efficiency.
  • HPE Apollo Trade and Match Server Solution – optimized with Closed-Loop DCLC to increase density, decrease TCO and take advantage of enhanced performance to capitalize on High Frequency Trading trends.
  • STULZ Micro Data Center – combining CoolIT’s Rack DCLC with STULZ’ world-renowned mission critical air cooling products to create a single enclosed solution for managing high-density compute requirements.

Debuting at SC17 are two industry-first Heat Exchange Modules:

  • Rack DCLC AHx10, CoolIT’s new Liquid-to-Air CDU that delivers the benefits of rack level liquid cooling without the requirement for facility water. The standard 5U system manages 7kW at 25°C ambient air temperature and is expandable to 6U or 7U configurations (via the available expansion kit) to scale capacity up to 10kW of heat load.
  • Rack DCLC AHx2, CoolIT’s new Liquid-to-Air heat exchanger tool for OEMs and System Integrators for DCLC enabled servers to be thermally tested during the factory burn-in process, without liquid cooling infrastructure.

CoolIT will also showcase its Liquid-to-Liquid heat exchangers, including the stand-alone Rack DCLC CHx650, and the 4U Rack DCLC CHx80 that provides 80-100kW cooling capacity with N+1 reliability to manage the most challenging, high-density HPC racks.

For the first time, CoolIT will showcase its advanced Rack DCLC Command2 Control System for Heat Exchange Modules. Attendees can experience the plug-and-play functionality of Command2, including built-in autonomous controls and sophisticated safety features.

The latest CPU and GPU coldplate assemblies to support CoolIT’s passive Rack DCLC platform will be displayed, including the RX1 for Intel Xeon Scalable Processor Family (Skylake), the GP1 for NVIDIA Tesla P100 and GP2 for NVIDIA Tesla V100. Additionally, CoolIT’s full coverage MX1, MX2 and MX3 memory cooling coldplates will be featured.

CoolIT will highlight customer installations including:

  • Canadian Hydrogen Intensity Mapping Experiment (CHIME), the world’s largest low-frequency radio telescope. Deployed inside a containerized environment, CoolIT’s liquid cooled system consists of 256 rack-mounted General Technics GT0180 custom 4U servers housed in 26 racks managed by Rack DCLC CHx40 Heat Exchange Modules. Featuring liquid cooled Intel Xeon Processor E5-2620 v3 and dual AMD FirePro S9300x2, CoolIT significantly lowers operating temperatures and improves performance and power-efficiencies.
  • Poznan Supercomputing and Networking Center (PSNC). The PSNC “Eagle” cluster uses 1,232 liquid cooled Huawei CH121 servers to increase density and reduce energy consumption. PSNC was able to deploy this new cluster within their existing data center without having to invest in additional air cooling infrastructure. The heated liquid is also being reused for local heating needs.

In partnership with STULZ, CoolIT will host an SC17 Exhibitor Forum presentation on high-density Chip-to-Atmosphere data center cooling solutions on Thursday, Nov. 16 at 11:00 am. CoolIT encourages all attendees to join the Chip-to-Atmosphere: Providing Safe and Effective Cooling for High-Density, High-Performance Data Center Environments presentation in room 503-504. During the session, David Meadows, Director of Industry, Standards and Technology at STULZ Air Technology Systems, Inc., and Geoff Lyon, CEO and CTO at CoolIT Systems, will discuss the efficiency gains and performance enhancements made possible by liquid cooling solutions.

“Liquid cooling in the data center continues to grow in adoption and delivers more compelling ROIs. Our collaboration with OEM partners such as Dell EMC, HPE, Intel and STULZ provides further evidence that the future of the data center is destined for liquid cooling,” said Geoff Lyon, CEO and CTO at CoolIT Systems.

To learn more about how CoolIT’s products and solutions maximize data center performance and efficiency, visit booth 1601 at SC17. Executives and technical staff will be on site to guide attendees through new product showcases and live demos. To set up an appointment, contact Lauren Macready at marketing@coolitsystems.com.

About CoolIT Systems

CoolIT Systems, Inc. is a world leader in energy efficient liquid cooling technology for the Data Center, Server and Desktop markets. CoolIT’s Rack DCLC platform is a modular, rack-based, advanced cooling solution that allows for dramatic increases in rack densities, component performance, and power efficiencies. The technology can be deployed with any server and in any rack making it a truly flexible solution. For more information about CoolIT Systems and its technology, visit www.coolitsystems.com.

About Supercomputing Conference (SC17) 

Established in 1988, the annual SC conference continues to grow steadily in size and impact each year. Approximately 5,000 people participate in the technical program, with about 11,000 people overall. SC has built a diverse community of participants including researchers, scientists, application developers, computing center staff and management, computing industry staff, agency program managers, journalists, and congressional staffers. This diversity is one of the conference’s main strengths, making it a yearly “must attend” forum for stakeholders throughout the technical computing community. For more information, visit http://sc17.supercomputing.org/.

Source: CoolIT Systems

The post CoolIT Systems Showcases Newest Datacenter Liquid Cooling Innovations for OEM and Enterprise Customers at SC17 appeared first on HPCwire.

Flipping the Flops and Reading the Top500 Tea Leaves

HPC Wire - Mon, 11/13/2017 - 07:58

The 50th edition of the Top500 list, the biannual publication of the world’s fastest supercomputers based on public Linpack benchmarking results, was released from SC17 in Denver, Colorado, this morning and once again China is in the spotlight, having taken what is on the surface at least a definitive lead in multiple dimensions. China now claims the most systems, biggest flops share and the number one machine for 10 consecutive lists. It’s a coup-level achievement to pull off in five years, disrupting 20 years of US dominance on the Top500, but reading deeper into the Top500 tea leaves reveals a more nuanced analysis that has as much to do with China’s benchmarking chops as it does its supercomputing flops.

PEZY-SC2 chip at ISC 2017

Before we thread that needle, let’s take a moment to review the movement at the top of the list. There are no new list entrants in the top ten and no change in the top three, but the upgraded ZettaScaler-2.2 “Gyoukou” stuck its landing for a fourth place ranking. Vaulting 65 spots, the supersized Gyoukou combines Xeons and PEZY-SC2 accelerators to achieve 19.14 petaflops, up from 1.68 petaflops on the previous list. The Top500 authors point out that the system’s 19,860,000 cores represent the highest level of concurrency ever recorded on the Top500 rankings.

Gyoukou also had the honor of being the fifth greenest supercomputer. Fellow ZettaScaler systems Shoubu system B, Suiren2 and Sakura placed first, second and third, respectively (see perf-per-watt numbers below). Nvidia’s DGX SaturnV Volta system, installed at Nvidia headquarters in San Jose, Calif., was the fourth greenest supercomputer.

Nov. 2017 Green500 top five; Nov. 2017 Top500 top 10

Another upgraded machine, Trinity, moved up three positions to seventh place thanks to a recent infusion of Intel Knights Landing Xeon Phi processors that raised its Linpack score from 8.10 petaflops to 14.14 petaflops. Trinity is a Cray XC40 supercomputer operated by Los Alamos National Laboratory and Sandia National Laboratories.

China still has a firm grip on the top of the list with the 93-petaflops Sunway TaihuLight and 33.86-petaflops Tianhe-2, the number one and number two systems respectively, which together provide the new list with 15 percent of its flops. Piz Daint, the Cray XC50 system installed at the Swiss National Supercomputing Centre (CSCS), remains the third fastest system with 19.6 petaflops. With Gyoukou in fourth position, the fastest US system, Titan, slips another notch to fifth place, leaving the United States without a claim to any of the top four rankings. Benchmarked at 17.59 petaflops, the five-year-old Cray XK7 system installed at the Department of Energy’s Oak Ridge National Laboratory captured the top spot for one list iteration before being knocked off its perch in June 2013 by China’s Tianhe-2. This is the first time in the list’s 24-year history that the US has not held at least a number four ranking.

Although China has enjoyed number one bragging rights for nearly four years, this is the first list it also dominates by both installed system count and aggregate performance share. China has the most installed systems: 202, compared to 159 on the last list, while the US is in second place with 144, down from 169 six months ago (Japan ranks third with 35, followed by Germany with 20, France with 18, and the UK with 15). Aggregate performance is similar: China holds 35.3 percent of list flops, and the US is second with 29.8 percent (then Japan with 10.8 percent, Germany with 4.5 percent, the UK with 3.8 percent and France with 3.6 percent).

Based on these metrics, undoubtedly some publications will proclaim China’s supercomputing supremacy, but that would be premature. When China expanded its Top500 toehold by a factor of three at SC15, Intersect360 Research CEO Addison Snell remarked that it wasn’t so much that China discovered supercomputing as it discovered the Top500 list. This observation continues to hold water.

An examination of the new systems China is adding to the list indicates concerted efforts by Chinese vendors Inspur, Lenovo, Sugon and more recently Huawei to benchmark loosely coupled Web/cloud systems, which are not true HPC machines. To wit, 68 out of the 96 systems that China introduced onto the latest list utilize 10G networking and none are deployed at research sites. The benchmarking of Internet and telecom systems for Top500 glory is not new. You can see similar fingerprints on the list (current and historical) from HPE and IBM, but China has doubled down. For comparison’s sake, the US put 19 new systems on the list and eight of those rely on 10G networking.

Top500 development over time: countries by performance share. US is red; China is dark blue.

Not only has the Linpacking of non-HPC systems inflated China’s list presence, it’s changed the networking demographics as the number of Ethernet-based machines climbs steadily. As the Top500 authors note, Gigabit Ethernet now connects 228 systems with 204 systems using 10G interfaces. InfiniBand technology is now found on 163 systems, down from 178 systems six months ago, and is the second most-used internal system interconnect technology.

Snell provided additional perspective: “What we’re seeing is a concerted effort to list systems in China, particularly from China-based system vendors. The submission rules allow for what is essentially benchmarking by proxy. If Linpack is run and verified on one system, the result can be assumed for other systems of the same (or greater) configuration, so it’s possible to put together concerted efforts to list more systems, whether out of a desire to show apparent market share, or simply for national pride.”

Discussions of list purity and benchmarking by proxy aside, the High Performance Linpack or any one-dimensional metric has limited usefulness across today’s broad mix of HPC applications. This truth, well understood in HPC circles, is not always appreciated outside the community or among government stakeholders who want “something to show” for public investment.

“Actual system effectiveness is getting more difficult to compare, as the industry swings back toward specialized hardware,” Snell commented. “Just because one architecture outperforms another on one benchmark doesn’t make it the best choice for all workloads. This is particularly challenging for mixed-workload research environments trying to serve multiple domains. 88 percent of all HPC users say they will need to support multiple architectures for the next few years, running applications on the most appropriate systems for their requirements.”

Chip technology (Source: Top500)

There has been stagnation on the list for several iterations and turnover is historically low. Neither Summit nor Sierra (the US CORAL machines, projected to achieve ~180 petaflops) nor the upgraded Tianhe-2A (projected 94.97 petaflops peak) made the cut for the 50th list, as had been speculated. While HPC is seeing a time of increased architectural diversity at the system and processor level, the current list is less diverse by some measures. To wit, of the 136 new systems on the list, Intel is foundational to all of them (36 of these utilize accelerators*). So no new Power, no new AMD (it’s still early for EPYC) and nothing from ARM yet. In total 471 systems, or 94.2 percent, are now using Intel processors, up a notch from 92.8 percent six months ago. The share of IBM Power processors is at 14 systems, down from 21 systems in June. There are five AMD-based systems remaining on the list, down from seven one year ago.

Nvidia’s new SaturnV Volta system.

In the US, IBM Power9 systems Summit and Sierra are on track for 2018 installation at Oak Ridge and Livermore labs, respectively, and multiple other exascale-focused systems are in play in China, Europe and Japan, showcasing a new wave of architectural diversity. We expect there will be more exciting supercomputing trends to report on from ISC 2018 in Frankfurt.

*Breakdown of the 36 new accelerated systems: 29 have P100s (one with NVLink, an HPE SGI system at number 292 (Japan)), one internal Nvidia V100 Volta system (#149, SaturnV Volta); one K80-based system (#267, Lenovo); two Sugon-built P40 systems (#161, #300), and three PEZY systems (#260, #277, #308). Further, out of the 36, only the internal Nvidia machine is US-based. 30 are Chinese (by Lenovo, Inspur, Sugon); the remaining five are Japanese (by NTT, HPE, PEZY).

The post Flipping the Flops and Reading the Top500 Tea Leaves appeared first on HPCwire.

Ellexus Releases I/O Profiling Tool Suites Based on the Arm Architecture

HPC Wire - Mon, 11/13/2017 - 07:38

CAMBRIDGE, England, Nov. 13, 2017 — Ellexus, the I/O profiling company, has released versions of its flagship products Breeze, Healthcheck and Mistral, all based on the Armv8-A architecture. The move comes as part of the company’s strategy to provide cross-platform support that gives engineers a uniform tooling experience across different hardware platforms.

Accompanying the release, Ellexus is also announcing that its tools will be integrated with Arm Forge and Arm Performance Reports, market-leading tools for debugging, profiling and optimizing high performance applications, previously known as Allinea.

The integration takes advantage of a custom metrics API in the Arm tools, allowing third parties to plug into them and enable contextual analysis of more targeted performance metrics. The integration with Arm tools will provide an even more comprehensive suite of I/O profiling tools at a time when optimization has never been so important.

Unlike other profiling tools, Ellexus’ technology can be run continuously at scale. The reports generated give enough information to make every engineer an I/O expert. These tools will help organizations to deploy an I/O profiling solution as part of software qualification, as a live monitoring tool, or as a way to understand and prevent I/O problems from returning.

Ellexus Mistral is designed to run in real time on a cluster, identifying rogue jobs before they can cause a problem. In contrast, Ellexus Breeze provides an extremely detailed profile of a job or application, providing dependency analysis that makes cloud migration or migration to a different architecture easy. Ellexus’ latest tool, Healthcheck, produces a simple I/O report that tells the user what their application is doing wrong and why, giving all users the power to optimise I/O for the cluster.

Ellexus Mistral, Breeze and Healthcheck add a comprehensive layer of I/O profiling information to what is already on offer from the Arm tool suite, and can drill down to which files have been accessed. They provide additional monitoring for IT managers and dev ops engineers, in particular those who run continuous integration and testing frameworks.

Tim Whitfield, vice president and general manager, Technology Services Group, Arm, said: “Arm is always looking for ways to further optimize our high-performance application estate, and as we continue to scale up and out this has never been more important. Arm and Ellexus are continuing a deep collaboration in this space to provide a comprehensive tools suite for HPC.”

On the decision to release versions based on Arm, Dr Rosemary Francis, CEO of Ellexus, said, “As the high-performance computing industry targets new compute architectures and cloud infrastructures, it’s never been more important to optimise the way programs access large data sets. Bad I/O patterns can harm shared storage and will limit application performance, wasting millions in lost engineering time.

“We are extremely excited to announce the integration of our tools with the Arm tool suite. Together we will be able to help more organisations to get the most out of their compute clusters.”

About Ellexus

Ellexus is an I/O profiling company. From a detailed analysis of one application or workflow pipeline to whole-cluster, lightweight monitoring and reporting, it provides solutions that solve all I/O profiling needs.

Source: Ellexus

The post Ellexus Releases I/O Profiling Tool Suites Based on the Arm Architecture appeared first on HPCwire.

Tensors Come of Age: Why the AI Revolution Will Help HPC

HPC Wire - Mon, 11/13/2017 - 07:00

A Quick Retrospect

Thirty years ago, parallel computing was coming of age. A bitter battle began between stalwart vector computing supporters and advocates of various approaches to parallel computing. IBM skeptic Alan Karp, reacting to announcements of nCUBE’s 1024-microprocessor system and Thinking Machines’ 65,536-element array, made a public $100 wager that no one could get a parallel speedup of over 200 on real HPC workloads. Gordon Bell softened that to an annual award for the best speedup, what we now know as the Gordon Bell Prize.

John Gustafson

This year also marks the 30th Supercomputing Conference. At the first SC in 1988, Seymour Cray gave the keynote, and said he might consider combining up to 16 processors. Just weeks before that event, Sandia researchers had managed to get thousand-fold speedups on the 1024-processor nCUBE for several DOE workloads, but those results were awaiting publication.

The magazine Supercomputing Review was following the battle with interest, publishing a piece by a defender of the old way of doing things, Jack Worlton, titled “The Parallel Processing Bandwagon.” It declared parallelism a nutty idea that would never be the right way to build a supercomputer. Amdahl’s law and all that. A rebuttal by Gustafson titled “The Vector Gravy Train” was to appear in the next issue… but there was no next issue of Supercomputing Review. SR had made the bold step of turning into the first online magazine, back in 1987, with a new name.

Lenore Mullin

Happy 30th Anniversary, HPCwire!

What better occasion than to write about another technology that is coming of age, one we will look back on as a watershed? That technology is tensor computing: Optimized multidimensional array processing using novel arithmetic[1].

Thank you, AI

You can hardly throw a tchotchke on the trade show floor of SC17 without hitting a vendor talking about artificial intelligence (AI), deep learning, and neural nets. Google recently open-sourced its TensorFlow AI library and Tensor Processing Unit. Intel bought Nervana. Micron, AMD, ARM, Nvidia, and a raft of startups are suddenly pursuing an AI strategy. Two key ideas keep appearing:

  • An architecture optimized for tensors
  • Departure from 32-bit and 64-bit IEEE 754 floating-point arithmetic

What’s going on? And is this relevant to HPC, or is it unrelated? Why are we seeing convergent evolution to the use of tensor processors, optimized tensor algebras in languages, and nontraditional arithmetic formats?

What’s going on is that computing is bandwidth-bound, so we need to make much better use of the bits we slosh around a system. Tensor architectures place data closer to where it is needed. New arithmetic represents the needed numerical values using fewer bits. This AI-driven revolution will have a huge benefit for HPC workloads. Even if Moore’s law stopped dead in its tracks, these approaches increase computing speed and cut space and energy consumption.

Tensor languages have actually been around for years. Remember APL and Fortran 90, all you old-timers? However, now we are within reach of techniques that can automatically optimize arbitrary tensor operations on tensor architectures, using an augmented compilation environment that minimizes clunky indexing and unnecessary scratch storage[2]. That’s crucial for portability.

Portability suffers, temporarily, as we break free from standard numerical formats. You can turn float precision down to 16-bit, but then the shortcomings of IEEE format really become apparent, like wasting over 2,000 possible bit patterns on “Not a Number” instead of using them for numerical values. AI is providing the impetus to ask what comes after floats, which are awfully long in the tooth and have never followed algebraic laws. HPC people will someday be grateful that AI researchers helped fix this long-standing problem.
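
As a quick check on that figure, here is a minimal Python sketch (assuming NumPy is available) that enumerates every 16-bit pattern and counts how many decode to NaN under IEEE 754 binary16:

```python
import numpy as np

# Enumerate all 65,536 possible 16-bit patterns and count how many
# decode to "Not a Number" in IEEE 754 binary16 (half precision).
bits = np.arange(2**16, dtype=np.uint16)
values = bits.view(np.float16)        # reinterpret the raw bits as float16
print(int(np.isnan(values).sum()))    # 2046 patterns are NaN
```

That is roughly three percent of the entire half-precision encoding space spent on a single non-value.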

The Most Over-Discovered Trick in HPC

As early as the 1950s, according to the late numerical analyst Herb Keller, programmers discovered they could make linear algebra go faster by blocking the data to fit the architecture. Matrix-matrix operations in particular run best when the matrices are tiled into submatrices, and even sub-submatrices. That was the beginning of dimension lifting, an approach that seems to get re-discovered by every generation of HPC programmers. It’s time for a “grand unification” of the technique.
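
A minimal sketch of that blocking idea, in Python with NumPy and with simplifying assumptions (square matrices whose dimension divides evenly by the tile size, which is itself just a tunable guess):

```python
import numpy as np

def blocked_matmul(A, B, tile=64):
    """Tiled matrix-matrix multiply: work on tile x tile submatrices so each
    block stays resident in fast memory while it is being reused."""
    n = A.shape[0]                       # assumes square matrices, n % tile == 0
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C
```

The same trick nests: each tile-by-tile block product can itself be tiled for the next level of the memory hierarchy, which is the dimension lifting described above.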

Level N BLAS

The BLAS developers started in the 1970s with loops on lists (level 1), then realized that doubly nested loops were needed (level 2), then triply nested loops (level 3); LAPACK and ScaLAPACK later introduced blocking to better fit computer architectures. In other words, we’ve been computing with tensors for a long time, but not admitting it! Kudos to Google for naming their TPU the way they did. What we need now is “level N BLAS.”
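
The nesting progression reads directly in code; a rough Python sketch, with function names chosen here only to echo the BLAS routines they imitate:

```python
def dot(x, y):                     # level 1: a single loop over vectors
    s = 0.0
    for i in range(len(x)):
        s += x[i] * y[i]
    return s

def gemv(A, x):                    # level 2: doubly nested, matrix times vector
    return [dot(row, x) for row in A]

def gemm(A, B):                    # level 3: triply nested, matrix times matrix
    cols = list(zip(*B))           # columns of B
    return [[dot(row, col) for col in cols] for row in A]
```

Blocked and tiled variants then lift these flat loops onto submatrices, as sketched earlier.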

Consider this abstract way of thinking about a dot product of four-element vectors:

Notice the vector components are not numbered; think of them as a set, not a list, because that allows us to rearrange them to fit any memory architecture. The components are used once in this case, multiplied, and summed to some level (in this case, all the way down to a single number). Multiplications can be completely parallel if the hardware allows, and summation can be as parallel as binary sum reduction allows.
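
A binary sum reduction is easy to sketch; assuming the additions at each level can run in parallel, summing n values takes only about log2(n) sequential steps (a hypothetical helper, not taken from any library):

```python
def tree_sum(values):
    """Binary sum reduction: combine partial sums pairwise, level by level."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:                 # odd count: pad so everything pairs up
            vals.append(0.0)
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

print(tree_sum([1.0, 2.0, 3.0, 4.0]))     # 10.0, computed in two pairwise levels
```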

Now consider the same inputs, but used for 2-by-2 matrix-matrix multiplication:

Each input is used twice, either by a broadcast method or re-use, depending on what the hardware supports. The summation is only one level deep this time.

Finally, use the sets for an outer product, where each input is used four times to create 16 parallel multiplications, which are not summed at all.

All these operations can be captured in a single unified framework, and that is what we mean by “Level N BLAS.” The sets of numbers are best organized as tensors that fit the target architecture and its cost functions. A matrix really isn’t two-dimensional in concept; that’s just for human convenience, and semantics treat it that way. An algebra exists for index manipulation that can be part of the compiler smarts, freeing the programmer from having to worry about details like “Is this row-major or column-major order[4]?” Tensors free you from imposing linear ordering that isn’t required by the algorithm and that impedes optimal data placement.
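
NumPy’s einsum notation is a convenient stand-in for that kind of index algebra (it is not the Level N BLAS framework itself, just an existing tool that makes the point): the same eight inputs serve all three operations sketched above, and only the index expression changes.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])       # first set of four inputs
b = np.array([5.0, 6.0, 7.0, 8.0])       # second set of four inputs

# Dot product: each input used once, summed all the way down to a scalar.
print(np.einsum('i,i->', a, b))           # 70.0

# 2-by-2 matrix-matrix product: each input used twice, one level of summation.
print(np.einsum('ik,kj->ij', a.reshape(2, 2), b.reshape(2, 2)))

# Outer product: each input used four times, 16 multiplications, no summation.
print(np.einsum('i,j->ij', a, b))
```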

Besides linear algebra, tensors are what you need for Fast Fourier Transforms (FFTs), convolutions for signal and image processing, and yes, neural networks. Knowledge representation models like PARAFAC or CANDECOMP use tensors. Most people aren’t taught tensors in college math, and tensors admittedly look pretty scary with all those subscripts. One of Einstein’s best inventions was a shorthand notation that gets rid of a lot of the subscripts (because General Relativity requires tensor math), but it still takes a lot of practice to get a “feel” for how tensors work. The good news is, computer users don’t have to learn that skill, and only a few computer programmers have to. There now exists a theory[4], and many prototypes[5], for handling tensors automatically. We just need a few programmers to make use of the existing theory of array indexing to build and maintain those tools for distribution to all[6]. Imagine being able to automatically generate an FFT without having to worry about the indexing! That’s already been prototyped[7].

Which leads us to another HPC trend that we need for architecture portability…

The Rise of the Installer Program

In the old days, code development meant edit, compile, link, and load. Nowadays, people rarely talk about “linkers” and “loaders,” but we certainly talk about precompilers, makefiles and installer programs. We’ve also seen the rise of just-in-time compilation in languages like Java, where portable byte codes are translated on the fly into machine code for the specific system, yielding both portability and, sometimes, surprisingly high performance. The nature of who-does-what has changed quite a bit over the last few decades. Now, for example, HPC software vendors cannot ship a binary for a cluster supercomputer because they cannot know which MPI library is in use; the installer links that in.

The compiler, or preprocessor, doesn’t have to guess what the target architecture is; it can instead specify what needs to be done, but not how, stopping at an intermediate-language level. The installer knows the costs of all the data motions in the example diagrams above and can predict precisely what a particular memory layout will cost. What you can predict, you can optimize. The installer takes care of the how.
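
A toy sketch of that division of labor (entirely our own; the array size and the two candidate layouts are arbitrary choices): instead of guessing, an installer can measure the cost of each layout on the machine it lands on and bake the winner into the installed kernels.

    import time
    import numpy as np

    def traversal_cost(order):
        # Time a row-wise sweep over a matrix stored in the given layout
        # ('C' = row-major, 'F' = column-major).
        a = np.ones((2000, 2000), order=order)
        t0 = time.perf_counter()
        total = 0.0
        for row in range(a.shape[0]):
            total += a[row, :].sum()   # cheap when rows are contiguous, strided otherwise
        return time.perf_counter() - t0

    costs = {order: traversal_cost(order) for order in ('C', 'F')}
    best = min(costs, key=costs.get)
    print(f"measured costs: {costs}; installing kernels with order='{best}'")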

James Demmel has often described the terrible challenge of building a ScaLAPACK-like library that gets high performance for all possible situations. Call it “The Demmel Dilemma.” It appears we are about to resolve that dilemma. With tensor-friendly architectures, and proper division of labor between the human programmer and the preprocessor, compiler, and installer, we can look forward to a day when we don’t need 50 pages of compiler flag documentation, or endless trial-and-error experimentation with ways to lay out arrays in storage that is hierarchical, parallel, and complicated. Automation is feasible, and essential.

The Return of the Exact Dot Product

There is one thing we’ve left out though, and it is one of the most exciting developments that will enable all this to work. You’ve probably never heard of it. It’s the exact dot product approach invented by Ulrich Kulisch, back in the late 1960s, but made eminently practical by some folks at Berkeley just this year[8].

With floats, because of rounding errors, you will typically get a different result when you change the way a sum is grouped. Floats disobey the associative law: (a + b) + c, rounded, is not the same as a + (b + c). That’s particularly hazardous when accumulating a lot of small quantities into a single sum, like when doing Monte Carlo methods, or a dot product. Just think of how often a scientific code needs to do the sum of products, even if it doesn’t do linear algebra. Graphics codes are full of three-dimensional and two-dimensional dot products. Suppose you could calculate sums of products exactly, rounding only when converting back to the working real number format?
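
A three-line illustration (the values are ours, chosen to make the effect obvious):

    a, b, c = 1.0e16, -1.0e16, 1.0
    print((a + b) + c)   # 1.0
    print(a + (b + c))   # 0.0 -- the 1.0 is absorbed before the cancellation happens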

You might think that would take a huge, arbitrary precision library. It doesn’t. Kulisch noticed that for floating-point numbers, a fixed-size register with a few hundred bits suffices as scratch space for perfect accuracy results even for vectors that are billions of floats long. You might think it would run too slowly, because of the usual speed-accuracy tradeoff. Surprise: It runs 3–6 times faster than a dot product with rounding after every multiply-add. Berkeley hardware engineers discovered this and published their result just this summer. In fact, the exact dot product is an excellent way to get over 90 percent of the peak multiply-add speed of a system, because the operations pipeline.
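
The idea can be modeled in a few lines of Python using exact rational arithmetic (a didactic stand-in for Kulisch’s fixed-point accumulator, not the hardware scheme, and far slower than it): accumulate every product exactly and round once at the end.

    from fractions import Fraction
    import random

    def exact_dot(a, b):
        # Every float converts to an exact rational, so the accumulation is exact;
        # the only rounding happens in the final conversion back to a float.
        acc = Fraction(0)
        for x, y in zip(a, b):
            acc += Fraction(x) * Fraction(y)
        return float(acc)

    random.seed(1)
    a = [random.uniform(-1, 1) * 10.0 ** random.randint(-8, 8) for _ in range(10000)]
    b = [random.uniform(-1, 1) * 10.0 ** random.randint(-8, 8) for _ in range(10000)]

    naive = sum(x * y for x, y in zip(a, b))   # rounds after every multiply-add
    print(naive, exact_dot(a, b))              # the low digits typically disagree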

Unfortunately, the exact dot product idea has been repeatedly and firmly rejected by the IEEE 754 committee that defines how floats work. Fortunately, it is an absolute requirement in posit arithmetic[9] and can greatly reduce the need for double precision quantities in HPC programs. Imagine doing a structural analysis program with 32-bit variables throughout, yet getting 7 correct decimals of accuracy in the result, guaranteed. That’s effectively like doubling bandwidth and storage compared to the 64-bits-everywhere approach typically used for structural analysis.

A Scary-Looking Math Example

If you don’t like formulas, just skip this. Suppose you’re using a conjugate gradient solver, and you want to evaluate its kernel as fast as possible:

A theory exists to mechanically transform these formulas to a “normal form” that looks like this:

That, plus hardware-specific information, allows automatic data layout that minimizes indexing and temporary storage, and maximizes locality of access for any architecture. And with novel arithmetic like posits that supports the exact dot product, you get a bitwise identical result no matter how the task is organized to run in parallel, and at near-peak speed. Programmers won’t have to wrestle with data placement, nor will they have to waste hours trying to figure out if the parallel answer is different because of a bug or because of rounding errors.

What People Will Remember, 30 Years from Now

By 2047, people may look back on the era of IEEE floating-point arithmetic the way we now regard the EBCDIC character set used on IBM mainframes (which many readers may never have heard of, but it predates ASCII). They’ll wonder how people ever tolerated the lack of repeatability and portability and the rounding errors that were indistinguishable from programming bugs, and they may reminisce about how people wasted 15-decimal accuracy on every variable as insurance, when they only needed four decimals in the result. Not unlike the way some of us old-timers remember “vectorizing” code in 1987 to get it to run faster, or “unrolling” loops to help out the compiler.

Thirty years from now, the burden of code tuning and portability for arrays will be back where it belongs: on the computer itself. Programmers will have long forgotten how to tile matrices into submatrices because the compiler-installer combination will do that for tensors for any architecture, and will produce bitwise-identical results on all systems.
The big changes that are permitting this watershed are all happening now. This year. These are exciting times!

[1] A. Acar et al., “Tensor Computing for Internet of Things,” Dagstuhl Reports, Vol. 6, No. 4, 2016, Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, doi:10.4230/DagRep.6.4.57, http://drops.dagstuhl.de/opus/volltexte/2016/6691 pp. 57–79.

[2] Rosencrantz et al., “On Minimizing Materializations of Array-Valued Temporaries,” ACM Trans. Program. Lang. Syst., Vol. 28, No. 6, 2006, http://doi.acm.org/10.1145/118663, pp.1145–1177.

[3] L. Mullin and S. Thibault, “Reduction Semantics for Array Expressions: The Psi Compiler,” Technical Report, University of Missouri-Rolla Computer Science Dept., 1994.

[4] K. Berkling, Arrays and the Lambda Calculus, SU-CIS-90-22, CASE Center and School of CIS, Syracuse University, May 1990.

[5] S. Thibault et al., “Generating Indexing Functions of Regularly Sparse Arrays for Array Compilers,” Technical Report CSC-94-08, University of Missouri-Rolla, 1994.

[6] L. Mullin and J. Raynolds, Conformal Computing: Algebraically Connecting the Hardware/Software Boundary using a Uniform Approach to High-Performance Computation for Software and Hardware Applications, arXiv:0803.2386, 2008.

[7] H. Hunt et al., “A Transformation-Based Approach for the Design of Parallel/Distributed Scientific Software: The FFT,” CoRR, 2008, http://dblp.uni-trier.de/rec/bib/journals/corr/abs-0811-2535.

[8] http://arith24.arithsymposium.org/slides/s7-koenig.pdf.

[9] http://www.posithub.org.

About the Authors

John L. Gustafson
john.gustafson@nus.edu.sg

John L. Gustafson, Ph.D., is currently Visiting Scientist at A*STAR and Professor of Computer Science at the National University of Singapore. He is a former Senior Fellow and Chief Product Architect at AMD, and a former Director at Intel Labs. His work showing practical speedups for distributed-memory parallel computing in 1988 led to his receipt of the inaugural Gordon Bell Prize, and his formulation of the underlying principle of “weak scaling” is now referred to as Gustafson’s law. His 2015 book, “The End of Error: Unum Computing,” has been an Amazon best-seller in its category. He is a Golden Core member of IEEE. He is also an “SC Perennial” who has been to every Supercomputing conference since the first one in 1988. He is an honors graduate of Caltech and received his MS and PhD from Iowa State University.

Lenore Mullin
lenore@albany.edu

Lenore M. Mullin, Ph.D., is an Emeritus Professor of Computer Science at the University at Albany, SUNY, a Research Software Consultant to REX Computing, Inc., and Senior Computational Mathematician at Etaphase, Inc. Dr. Mullin invented a new theory of n-dimensional tensors/arrays in her 1988 dissertation, A Mathematics of Arrays (MoA), which includes an indexing calculus, the Psi Calculus. This theory built on her tenure at IBM Research, working with Turing Award winner Kenneth Iverson. She has built numerous software and hardware prototypes illustrating both the power and mechanization of MoA and the Psi Calculus. MoA was recognized by NSF with the 1992 Presidential Faculty Fellowship, entitled “Intermediate Languages for Enhanced Parallel Performance,” awarded to only 30 nationally. Her binary transpose was accepted and incorporated into Fortran 90. On sabbatical at MIT Lincoln Laboratory, she worked to improve the standard missile software through MoA design. As an IPA, she ran the Algorithms, Numerical and Symbolic Computation program in NSF’s CISE CCF division. While on another leave, she was a Program Director in DOE’s ASCR program. She lives in Arlington, Va.

The post Tensors Come of Age: Why the AI Revolution Will Help HPC appeared first on HPCwire.

CoolIT Systems Launches Rack DCLC AHx2 Heat Exchange Module

HPC Wire - Sat, 11/11/2017 - 21:26

CALGARY, AB, November 10, 2017 – CoolIT Systems (CoolIT), world leader in energy efficient liquid cooling solutions for HPC, Cloud, and Enterprise markets, has expanded its Rack DCLC product line with the release of the AHx2 Heat Exchange Module. This compact Liquid-to-Air heat exchanger makes it possible for Direct Contact Liquid Cooling (DCLC) enabled servers to be thermally tested during the factory burn-in process, without additional liquid cooling infrastructure. CoolIT will officially launch the AHx2 at the Supercomputing Conference 2017 (SC17) in Denver, Colorado.

The AHx2 is a vital addition to CoolIT’s broad range of liquid cooling products. It is a compact, easy-to-transport air heat exchanger designed to enable factory server burn-in when liquid is not present in the facility. As a Liquid-to-Air heat exchanger, the AHx2 dissipates heat from the coolant in the server loop to the ambient environment. It provides direct liquid cooling to four DCLC enabled servers and manages up to 2kW of heat load. The design and size allow the unit to safely sit on top of or adjacent to a server chassis during manufacturing.

“The Rack DCLC AHx2 Module is the ideal way for OEMs and System Integrators to conduct thermal testing during the factory burn-in process,” said Patrick McGinn, VP of Product Marketing, CoolIT Systems. “Our customers will appreciate having access to such robust testing potential in such a compact design, without needing to invest in supplementary liquid cooling infrastructure.”

The AHx2 Heat Exchange Module is a product designed to meet a critical customer need, and as such, is an important part of CoolIT’s modular product array. SC17 attendees can learn more about the solution by visiting CoolIT at booth 1601 from November 13-16. To set up an appointment, contact Lauren Macready at lauren.macready@coolitsystems.com

About CoolIT Systems.

CoolIT Systems, Inc. is the world leader in energy efficient liquid cooling technology for the Data Center, Server and Desktop markets. CoolIT’s Rack DCLC platform is a modular, rack-based, advanced cooling solution that allows for dramatic increases in rack densities, component performance, and power efficiencies. The technology can be deployed with any server and in any rack making it a truly flexible solution. For more information about CoolIT Systems and its technology, visit www.coolitsystems.com.

About Supercomputing Conference (SC17)

Established in 1988, the annual SC conference continues to grow steadily in size and impact each year. Approximately 5,000 people participate in the technical program, with about 11,000 people overall. SC has built a diverse community of participants including researchers, scientists, application developers, computing center staff and management, computing industry staff, agency program managers, journalists, and congressional staffers. This diversity is one of the conference’s main strengths, making it a yearly “must attend” forum for stakeholders throughout the technical computing community. For more information, visit https://sc17.supercomputing.org/.

Source: CoolIT Systems, Inc.

The post CoolIT Systems Launches Rack DCLC AHx2 Heat Exchange Module appeared first on HPCwire.

IBM Announces Advances to IBM Quantum Systems & Ecosystem

HPC Wire - Sat, 11/11/2017 - 15:33

YORKTOWN HEIGHTS, N.Y., Nov. 11, 2017 — IBM announced two significant quantum processor upgrades for its IBM Q early-access commercial systems. These upgrades represent rapid advances in quantum hardware as IBM continues to drive progress across the entire quantum computing technology stack, with focus on systems, software, applications and enablement.

  • The first IBM Q systems available online to clients will have a 20 qubit processor, featuring improvements in superconducting qubit design, connectivity and packaging. Coherence times (the amount of time available to perform quantum computations) lead the field with an average value of 90 microseconds, and allow high-fidelity quantum operations.
  • IBM has also successfully built and measured an operational prototype 50 qubit processor with similar performance metrics. This new processor expands upon the 20 qubit architecture and will be made available in the next generation IBM Q systems.

Clients will have online access to the computing power of the first IBM Q systems by the end of 2017, with a series of planned upgrades during 2018. IBM is focused on making available advanced, scalable universal quantum computing systems to clients to explore practical applications. The latest hardware advances are a result of three generations of development since IBM first launched a working quantum computer online for anyone to freely access in May 2016. Within 18 months, IBM has brought online a 5 and 16 qubit system for public access through the IBM Q experience and developed the world’s most advanced public quantum computing ecosystem.

An IBM cryostat wired for a prototype 50 qubit system. (PRNewsfoto/IBM)

“We are, and always have been, focused on building technology with the potential to create value for our clients and the world,” said Dario Gil, vice president of AI and IBM Q, IBM Research. “The ability to reliably operate several working quantum systems and putting them online was not possible just a few years ago. Now, we can scale IBM processors up to 50 qubits due to tremendous feats of science and engineering. These latest advances show that we are quickly making quantum systems and tools available that could offer an advantage for tackling problems outside the realm of classical machines.”

Over the next year, IBM Q scientists will continue to work to improve its devices including the quality of qubits, circuit connectivity, and error rates of operations to increase the depth for running quantum algorithms. For example, within six months, the IBM team was able to extend the coherence times for the 20 qubit processor to be twice that of the publicly available 5 and 16 qubit systems on the IBM Q experience.

In addition to building working systems, IBM continues to grow its robust quantum computing ecosystem, including open-source software tools, applications for near-term systems, and educational and enablement materials for the quantum community. Through the IBM Q experience, over 60,000 users have run over 1.7M quantum experiments and generated over 35 third-party research publications. Users have registered from over 1500 universities, 300 high schools, and 300 private institutions worldwide, many of whom are accessing the IBM Q experience as part of their formal education. This form of open access and open research is critical for accelerated learning and implementation of quantum computing.

“I use the IBM Q experience and QISKit as an integral part of my classroom teaching on quantum computing, and I cannot emphasize enough how important it is. In prior years, the course was interesting theoretically, but felt like it described some far off future,” said Andrew Houck, professor of electrical engineering, Princeton University. “Thanks to this incredible resource that IBM offers, I have students run actual quantum algorithms on a real quantum computer as part of their assignments! This drives home the point that this is a real technology, not just a pipe dream.  What once seemed like an impossible future is now something they can use from their dorm rooms. Now, our enrollments are skyrocketing, drawing excitement from top students from a very wide range of disciplines.”

To augment this ecosystem of quantum researchers and application development, IBM rolled out earlier this year its QISKit (www.qiskit.org) project, an open-source software developer kit to program and run quantum computers. IBM Q scientists have now expanded QISKit to enable users to create quantum computing programs and execute them on one of IBM’s real quantum processors or quantum simulators available online. Recent additions to QISKit also include new functionality and visualization tools for studying the state of the quantum system, integration of QISKit with the IBM Data Science Experience, a compiler that maps desired experiments onto the available hardware, and worked examples of quantum applications.
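
For readers who have not seen it, circuit construction in QISKit looks roughly like the sketch below. This is a minimal example of ours, not taken from IBM’s announcement; the import path and method names reflect the open-source package as we know it, and because submitting a circuit to a simulator or a real backend goes through provider interfaces that have changed across releases, only the circuit-building step is shown.

    from qiskit import QuantumCircuit

    qc = QuantumCircuit(2, 2)     # two qubits, two classical bits
    qc.h(0)                       # put qubit 0 into superposition
    qc.cx(0, 1)                   # entangle qubit 1 with qubit 0 (a Bell pair)
    qc.measure([0, 1], [0, 1])    # read both qubits out
    print(qc.draw())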

“Being able to work on IBM’s quantum hardware and have access through an open source platform like QISKit has been crucial in helping us to understand what algorithms–and real-world use cases–might be viable to run on near-term processors,” said Matt Johnson, CEO, QC Ware. “Simulators don’t currently capture the nuances of the actual quantum hardware platforms, and nothing is more convincing for a proof-of-concept than results obtained from an actual quantum processor.”

Quantum computing promises to be able to solve certain problems – such as chemical simulations and types of optimization – that will forever be beyond the practical reach of classical machines. In a recent Nature paper, the IBM Q team pioneered a new way to look at chemistry problems using quantum hardware that could one day transform the way new drugs and materials are discovered. A Jupyter notebook that can be used to repeat the experiments that led to this quantum chemistry breakthrough is available in the QISKit tutorials. Similar tutorials are also provided that detail implementation of optimization problems such as MaxCut and Traveling Salesman on IBM’s quantum hardware.

This ground-breaking work demonstrates that it is possible to solve interesting problems using near-term devices and that it will be possible to find a quantum advantage over classical computers. IBM has made significant strides tackling problems on small-scale universal quantum computing systems. Improvements to error mitigation and to the quality of qubits are our focus for making quantum computing systems useful for practical applications in the near future. IBM also has industrial partners exploring practical quantum applications through the IBM Research Frontiers Institute, a consortium that develops and shares a portfolio of ground-breaking computing technologies and evaluates their business implications. Founding members include Samsung, JSR, Honda, Hitachi Metals, Canon, and Nagase.

These quantum advances are being presented today at the IEEE Industry Summit on the Future Of Computing as part of IEEE Rebooting Computing Week.

IBM Q is an industry-first initiative to build commercially available universal quantum computing systems for business and science applications. For more information about IBM’s quantum computing efforts, please visit www.ibm.com/ibmq.

Source: IBM

The post IBM Announces Advances to IBM Quantum Systems & Ecosystem appeared first on HPCwire.

Early Cluster Comp Betting Odds Favor China, Taiwan, and Poland

HPC Wire - Sat, 11/11/2017 - 10:15

So far the early action in the betting pool favors Taiwan’s NTHU, China’s Tsinghua, and, surprisingly, Poland’s University of Warsaw. Other notables include Team Texas at 9 to 1, the German juggernaut FAU/TUC team at 12 to 1, and the University of Illinois at 13 to 1.

There are several teams that haven’t seen any action yet, including last year’s winner USTC, third-place 2016 finisher Team Peking, and up-and-comer Nanyang Technological University.

I’m also not seeing any betting love for perennial favorite Team Chowder (Boston).

If you want to find out more about the teams before laying down your (virtual) money, you can see our exhaustive profiles of each team here. That should give you enough info to start laying down some money on the win line.

The betting window will be open until this coming Tuesday, so get in and get paid. Here’s a link to the betting pool.

The post Early Cluster Comp Betting Odds Favor China, Taiwan, and Poland appeared first on HPCwire.

Indiana University Showcases SC17 Activities

HPC Wire - Sat, 11/11/2017 - 09:41

DENVER, Colo., Nov. 11 — Computing and networking experts from Indiana University will gather in the Mile High City next week for SC17, the International Conference for High Performance Computing, Networking, Storage and Analysis taking place November 12-17 in Denver. SC17 is one of the world’s foremost tech events, annually attracting thousands of scientists, researchers, and IT experts from across the world.

IU’s Pervasive Technology Institute, Global Research Network Operations Center, and School of Informatics, Computing and Engineering (SICE) will team up to host a research-oriented booth (#601) in the exhibition portion of the conference, showcasing current research and educational initiatives.

With the theme “We put the ‘super’ in computing,” the IU booth will showcase staff and faculty members and projects that are pushing the boundaries of what’s possible in computing and networking. Although they may not sport capes, the IU team devotes its considerable abilities to harnessing the cloud, achieving maximum throughput, engineering intelligent systems, and thwarting real-life cybervillains.

“SC17 marks the 20th anniversary of IU’s first display at the Supercomputing Conference, a milestone that underscores our deep commitment to leveraging high performance computing and networking to benefit the IU community, the state of Indiana, and the world,” said Brad Wheeler, IU vice president for IT and chief information officer. “In that time span, our researchers, scientists, and technologists have not only put IU on the map in the world of HPC, but their talents and discoveries have made IU a true leader in this increasingly important realm.”

One highlight of IU’s participation in SC17 is Judy Qiu’s invited talk, “Harp-DAAL: A Next Generation Platform for High Performance Machine Learning on HPC-Cloud.” Qiu is an associate professor in the intelligent systems engineering department in SICE. She will discuss growth in HPC and machine learning for big data with cloud infrastructure, and introduce Harp-DAAL, a high performance machine learning framework.

“The Supercomputing Conference is always a fantastic opportunity to showcase the work that is being conducted at SICE and provides a spotlight for our wonderful faculty,” said Raj Acharya, dean of SICE. “The conference itself is so valuable because it brings together the greatest minds in supercomputing in an atmosphere of collaboration that is as inspiring as it is informative. We’re always thrilled to be a part of it.”

This year, the IU team continues its leadership role in organizing the conference. Matt Link, associate vice president and director of systems for IU Research Technologies, serves as a member of the SC Steering Committee. Scott Michael, manager of research analytics, is vice chair of the Students@SC committee, and Jenett Tillotson, senior system administrator for high performance systems, is a member of the Student Cluster Competition committee.

Additionally, IU network engineers will continue a decades-long tradition of helping to operate SCinet, one of the most powerful and advanced networks in the world. Created each year for the conference, SCinet is a high-capacity network to support the applications and experiments that are the hallmark of the SC conference. Laura Pettit, SICE director of intelligent systems engineering research operations, is the SCinet volunteer services co-chair, and ISE doctoral students Lucas Brasilino and Jeremy Musser are also volunteering with SCinet.

This year, the IU booth will include a range of presentations and demonstrations:

  • Current Trends and Future Challenges in HPC by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory.
  • Special event: Jetstream and OpenStack by Dave Hancock and partners. OpenStack is the emerging standard for deploying cloud computing capabilities, and cloud-based infrastructure is increasingly able to handle HPC workloads. During this special event, members of the Jetstream team and the OpenStack Foundation Scientific Working Group will discuss how they use OpenStack to serve HPC customers.
  • Science Gateways with Apache Airavata by Marlon Pierce, Eroma Abeysinghe and Suresh Marru. Science gateways are user interfaces and user-supporting services that simplify access to advanced resources for novice users and provide new modes of usage for power users. Apache Airavata is open source cyberinfrastructure software for building science gateways. During this demonstration, the presenters provide an overview of recent developments.
  • Big Data Toolkit Spanning HPC, Grid, Edge and Cloud Computing by Geoffrey Fox. This demonstration looks at big data programming environments such as Hadoop, Spark, Flink, Heron, Pregel; HPC concepts such as MPI and asynchronous many-task runtimes; and cloud/grid/edge ideas such as event-driven computing, serverless computing, workflow and services.
  • Cybersecurity for Science by Von Welch. The Center for Applied Cybersecurity Research, affiliated with the Pervasive Technology Institute at Indiana University, specializes in cybersecurity for R&D. In this scope, the center works with science communities across the country, including leading the National Science Foundation’s Cybersecurity Center of Excellence. This talk will provide an overview of what cybersecurity means in the context of science and how it can enable productive, trusted scientific research.
  • Enabling High-Speed Networking for Researchers by Chris Robb. With data networking becoming increasingly complex and opaque, researchers are often unsure how to address poor performance between their endpoints. This talk will introduce the IRNC NOC Performance Engagement Team (PET) and show how it can help researchers determine the best approach to achieving their maximum bandwidth potential.
  • Scientific Workflow Integrity for Pegasus by Von Welch and partners. The Pegasus Workflow Management System is a popular system for orchestrating complex scientific workflows. In this talk, the PIs of the NSF-funded Scientific Workflow Integrity for Pegasus project will talk about scientific data integrity challenges and their work to add greater assurances to Pegasus for data integrity.
  • Macroscopes from the “Places & Spaces: Mapping Science” Exhibition by Katy Börner. See up to 100 large-format maps that showcase effective visualization techniques to communicate science to the general public. These interactive visualizations, called macroscopes, help people see patterns in data that are too large or complex to view unaided.
  • Proteus: A Configurable FPGA Cluster for High Performance Networking by Martin Swany. Proteus is a new HPC cluster and research testbed that will enable investigation of novel and advanced architectures in HPC. Using FPGAs to optimize the performance of common parallel operations, it serves as a model for hardware-accelerated network “microservices.”
  • International Networks at IU by Jennifer Schopf. International Networks at IU is a multi-million dollar NSF-funded program that supports the use of international links between the United States, Europe, Asia and Africa. Demos will review our currently supported links, as well as the measurement and monitoring services deployed on the links.

About the IU School of Informatics, Computing, and Engineering
The School of Informatics, Computing, and Engineering’s rare combination of programs—including informatics, computer science, library science, information science and intelligent systems engineering—makes SICE one of the largest, broadest and most accomplished of its kind. The extensive programs are united by a focus on information and technology.

About the Pervasive Technology Institute
The Pervasive Technology Institute (PTI) at Indiana University is a world-class organization dedicated to the development and delivery of innovative information technology to advance research, education, industry and society. Since 2000, PTI has received more than $50 million from the National Science Foundation to advance the nation’s research cyberinfrastructure.

About the Global Research Network Operations Center
The Global Research Network Operations Center (GlobalNOC) supports advanced international, national, regional and local high-performance research and education networks. GlobalNOC plays a major role in transforming the face of digital science, research and education in Indiana, the United States, and the world by providing unparalleled network operations and engineering needed for reliable and cost-effective access to specialized facilities for research and education.

Source: Indiana University

The post Indiana University Showcases SC17 Activities appeared first on HPCwire.

Exhaustive Profile of SC17 Cluster Competition Teams – Let’s go DEEP…

HPC Wire - Sat, 11/11/2017 - 08:24

Like a fat man at a Vegas buffet, we’re now ready to delve deeply into the SC17 cluster teams. In this article, we’re going to take our initial personal look at the teams… their hopes, their dreams, and even, in some cases, their favorite songs.

First up, the teams from the United States……

Georgia Institute of Technology: The GIT team, or Team Swarm as they call themselves, is a new entrant into the world of high stakes student clustering. The term ‘swarm’ is a reference to their university mascot, the yellow jacket – which is a pretty nasty stinging bug. The team has a wide range of experience including GPU acceleration, IC fabrication, and data analytics.

They believe their unique competitive advantage lies in automation. No, they’re not using The Clapper (although it would be pretty cool to have a “clap on” “clap off” cluster). Team Swarm has assembled a tool stack that automates their system environment and allows them to focus on optimizing their apps rather than managing their system. The biggest thing they’re looking forward to? Crushing LINPACK and HPCG, plus meeting the other teams. Their favorite song? “Never Gonna Give You Up” by Rick Astley. Damn, just typing that song title has now lodged it in my head.

Northeastern University: They’ve dubbed themselves “Team HACKs,” which stands for “Huskies with Accelerated Computing Kernels” and references NEU’s Husky mascot. This isn’t their first cluster rodeo; they’ve been here before. Last year, they were the victim of a shipping error and had to run competition tasks on a hoopty hodgepodge of components.

To prep for the competition this year, they’ve been working very closely with their vendor partner AMD, plus tapping the brains of the NEU grad students in the research lab. NEU is looking to make a big comeback in this year’s competition, judging by their theme song “Don’t Stop Believing” by Journey…damn, another song stuck in my brain. What’s worse is that I also have the mental picture of the kids from Glee performing it. I’m going to go hit myself with a hammer.

San Diego State University/Southwest Oklahoma State University: This is a melded team that has the longest abbreviation in competition history: SDSUSWOKSU. Hmm….not as ungainly as I thought, you can actually pronounce it. To make things easier, they’ve dubbed their team “Thread Weavers”, which refers to the fact that modern computers use threads. They need to take this nickname back into the lab and come up with something better – or just let me give them a nickname.

This is the first major competition for the Oklahoma side of the roster, while the San Diego side has a couple of returning veterans from last year’s competition. The team seems well organized and has been meeting regularly during the summer and fall in preparation for entering the crucible that is the SC17 Student Cluster Competition. They’ve been using Zoom and Slack to facilitate their meet ups and have become a close-knit group.

I listened to their theme song, “Tonight” by Kleeer, and while it’s funky enough, it isn’t rousing. How are you going to rally your cluster troops into frenzy with a funkadelic smooth groove? But I’m old, what the hell do I know about music?

MGHPCC: In the first line of their team profile they say “Our team is called the MGHPCC Green Team. This name was chosen some years ago because all of the competitors came from the universities that founded the Mass Green HPC Center.” No, they’re wrong. Their name is Team Chowder (or Chowdah), or Team Boston, or Team “So you think you’re better than me?” and always has been. I laid those nicknames on them at their first competition and I’m not giving them up.

The team is keeping their secret sauce a secret, maybe even from themselves, but they’ve certainly been putting in the time, meeting for the last nine months in preparation for the competition. They had a very interesting answer to the question about how long it would take them to reach Denver from their base in Boston…

“At an average walking speed of 3.1 miles per hour and an 8 hour break per day, we expect to arrive in 39 days. Walking in parallel (side by side) will not speed up our journey, but walking in a single file reduces air resistance and saves time.”

Nicely done, Team Chowder. I also want to give them some props for their theme song, David Bowie’s “Under Pressure”, very appropriate for the competition. Welcome back, Boston.

Chicago Fusion Team: This is a pretty complicated team. Some members are from the Illinois Institute of Technology, others are from Maine South High School, and still others are representing Adlai Stevenson High School – all located in or near Chicago. They’re being sponsored by an alliance of heavy hitters including Intel, Argonne National Laboratory, Calyos, NVIDIA, and the National Science Foundation.

Since they didn’t complete the team profile paperwork, I don’t have a lot of details about what they do for fun and their favorite song. However, they did submit their final architectural proposal which has all sorts of details about their cluster. We’ll be covering their configuration in more detail as we get into the competition, but they’re bringing a LOT of hardware – enough to consume 8,200 watts if it were all fired up without throttling. That’s nearly three times the 3,000 watt hard cap, so there will be significant throttling and probably even some agonizing reappraisal when it comes to their configuration.

One other interesting point that caught my attention is that they’re going to be running two-phase liquid cooling on at least some of their components in order to reduce power usage and, hopefully, run in turbo mode as much as possible. We’ll report more details about the team as they become available.

University of Texas/Texas State University: This is the second time this mixed team has competed at a SC Student Cluster Competition. They’ve dubbed themselves “Team Longcats” in a nod to their respective school mascots. This is not to be confused with the term “Long Pork”, which is how cannibals refer to humans.

The team has been working together since last April to prepare for the SC17 cluster competition marathon. They’re backed by the combined might of long time sponsors Dell and the Texas Advanced Computing Center.

This is a team with a gaudy history. The Texas Long Horn team took the SC Student Cluster Competition crown (although there is no actual crown) three times in a row (SC12, SC13, and SC14) – a feat that has not been duplicated. They’re anxious to drink deeply from the Student Cluster Competition chalice of victory yet again.

The Longcats have wide ranging interests that include fencing, music, walking around outside, electrical engineering, and even designing musical shoes.

University of Illinois: This is the second time that the U of I has entered the Student Cluster Competition arena. One of the unique things about this team is that in addition to the normal things a team does to prepare for a competition, like researching the applications, practicing setting up their machine, etc., they’ve also been working out with the staff of hardcore financial services experts at Jump Trading. It’s an unorthodox training method, but those guys definitely know how to get performance out of a system.

The U of I team is looking forward to networking with others in the HPC industry at the show, and hopefully expanding their skill set at the same time. Their profile also made me laugh when they said “Some of our team members attended the “Dinner with interesting people” event at SC16, but ultimately decided to leave and have dinner with some less interesting people.”

Their profile also revealed a strange and horrible coincidence: their team song is “Never Gonna Give You Up” by the highly regarded Rick Astley – the exact same song as the Georgia Institute of Technology team. Yikes.

University of Utah: This is the second time we’re seeing the SupercompUtes from Utah in a SC competition. Last year, at SC16 in Salt Lake City, the team turned in an unprecedented performance, finishing second overall. This is a huge achievement for a first time competitor and makes them a team to be contended with. Four veterans from that team will be returning this year.

The team believes that their secret sauce is that they’ve trained at altitude for the cluster competition. Salt Lake City sits at 4,327 feet above sea level – even higher if your lab is on the second or third story of a building. Denver, with an average altitude of 5,280 feet, isn’t all that much higher than Salt Lake, so the Utes should be well accustomed to high altitude clustering – a point in their favor.

They’ve also picked an inspiring song to drive their team: “Warriors” by Imagine Dragons…good choice.

The Utes are excited to meet the other teams and also, like the other teams, wanting to explore possible HPC careers. So if you’re an employer looking to nab high-performance employees or interns, swing by the student cluster competition area at SC17 and talk to the students. They’re highly motivated, highly skilled, and have the drive and initiative that every employer values.

William Henry Harrison High School: They wrote the tersest team profile in the competition, so they’re not giving me a lot to work with. First fact is that they’ve dubbed their team “The Sudo Wrestlers” which is a nice play on a Linux term. They believe that their edge in the competition is that they’re younger than the other competitors – which is absolutely true, given that they are the first all-high school team in the big iron division of the cluster competition.

They’re led by Lev Gorenstein, a veteran coach who has led several teams in the past, which is definitely an advantage for the plucky team of high schoolers. What isn’t an advantage is their team song: “Careless Whisper” by George Michael. Not exactly the song you’d pick to drive top performance, right? What happened? Was “Wake Me Up Before You Go Go” already taken by someone?

The SC Student Cluster Competitions are international affairs and this year is no exception. Denver is hosting seven teams from non-US countries, let’s take an up close and personal look at those teams, starting with the teams from Europe….

Friedrich Alexander University/Technical University of Munich: These teams wrote a pun-tastic team profile, chock full of, as they put it, “p(h)uns” and fun. Unfortunately for them, I can’t stand puns – they’re the lowest form of humor, just above limericks.

What isn’t funny (or even phunny, as they’d put it) is the skill and expertise these two teams are bringing to the competition. FAU is coming off of a Highest LINPACK win at the ISC17 competition and TUC finished in the upper echelon of teams at last year’s SC16 competition. Coupled together, this team could really make some waves at SC17.

I’ve had a glimpse of their proposed hardware for this competition and, damn, they’re packing some power. It should be a favorite for the Highest LINPACK award and a solid competitor for the Overall Championship as well. We’ll see what happens.

University of Warsaw: This team is actually an amalgamation of students from Lodz University of Technology, University of Warsaw, and Warsaw University of Technology – but they’ll always be Team Warsaw to me. Team Warsaw burst onto the big league cluster scene at ASC17 in Wuxi, China. They shocked the cluster world by coming out of nowhere to nab second place in the Highest LINPACK portion of the competition.

Based on my observations in China, this is a happy team that works together well. They have a finely honed sense of humor and an optimistic outlook. When it comes to this year’s competition, the team says “we want to hear our cluster screeching while running the HPL benchmark.” They’re also looking forward to renewing friendships with other teams from ASC17 as well as making new friendships with other teams.

When they’re not clustering, team members enjoy walking up and down hills and rocks, being underwater, and reading things.

This year’s SC17 competition has a large slate of teams hailing from Asia. Let’s get to know them a bit better….

Nanyang Technological University: This will be the sixth appearance in a major competition for the ‘Pride of Singapore’ NTU. They notched a win in the Highest LINPACK at ASC’15 in Taiyuan, China, but have been shut out of the other major awards. I think this is a team that’s ready to make the move to the next level. They have the experience and are highly motivated. They’ve even named their team “Supernova” with the thought that SC17 could be their time to shine.

They think their edge at SC17 will be the work they’ve done on application optimization, an effort that they didn’t put much time into at SC16, although they took first place on the code optimization task at that competition.

This year they’re going DEEP on the applications, talking to domain experts, combing the web, and actually reading physical books (gasp!). They believe this work will give them unique and comprehensive knowledge of the applications which will translate into a win at SC17. Nice having you back, Nanyang, good luck.

National Tsing Hua University: NTHU is a frequent entrant in major league cluster competitions. Over the years they’ve participated in an amazing 12 Student Cluster Competitions, taking down the Overall Championship or LINPACK Award four times.

This edition of the team has dubbed themselves “./confizure” which is, I think, a play on the configure command and Azure. They’re the first team to use Arch Linux in a major cluster competition, which could be an advantage or maybe a disadvantage if things go sideways. When it comes to SC17, they’re looking forward to seeing how the other teams deal with the promised power shutoff event – that should be highly interesting.

When it comes to having fun, this team most enjoys making fun of each other – which almost automatically makes them my favorite team, right? In another humorous twist, their team song is a national health exercise they all had to perform every day in elementary school. Here’s a link, it’s hilarious.  I’m looking forward to making them perform this same exercise every morning before I give them their keyboards.

Peking University: This is another team that didn’t waste any words when filling out their team profile form. Their nickname is Team Peking. Their secret sauce is their excellent advisors, solid vendor support, and active team members with different backgrounds.

They’re obviously holding their cards close to their collective vests, not wanting to give anything away. However they did let us know that their team song is “He’s a Pirate” from the Pirates of the Caribbean movie. They also let it drop that one of the major activities is to debate which is the best text editor.

This is the second time we’ve seen Team Peking at a SC cluster competition. Last year, as a newbie team, they managed to land third place for the Overall Championship award, which is quite a feat. They are running new hardware this year, so we’ll see what happens, but this is definitely a team to keep an eye on.

Tsinghua University: Team Tsinghua, or team THU-PACMAN, as they’ve dubbed themselves, is an intensely focused team. This isn’t a surprise when you consider that there is more at stake for them than for perhaps any other team. If Tsinghua can win the Overall Championship at SC17, then they will have completed a record shattering second Student Cluster Competition Grand Slam. This means they would have won all three major competitions (ASC, ISC, and SC) in a single year. The 2015 Tsinghua team is the only other team to have done this in cluster competition history.

There isn’t a whole lot of detail in the Team Tsinghua profile. They like playing online games together during their off time. Their team song is the song that plays during Pac-Man, if you can call a bunch of “waca waca waca” noises a song. But more than anything else, they seem to like winning Student Cluster Competition championships. We’ll see if they can make their 2017 Grand Slam dream come true next week.

University of Science & Technology of China: As the USTC team participated in more competitions, their abilities grew to the point where they took home all the marbles in 2016 and are returning in 2017 to defend their crown (although there isn’t an actual crown).

Most of the team this year is new, so this will be their first time competing in the mind-twisting marathon that is the SC17 Student Cluster Competition. The team points to ‘hard work’ as their secret sauce in the competition this year. They’re also the only team to have specified a spirit animal for the competition. For USTC, their spirit animal is the “Swan Goose” which is lauded in Chinese literature for its perseverance and bravery. As they put it in their profile, they intend to soar like a swan goose. Good thing the convention center ceilings are 40 feet high in most places.

Ok, so if you’re still reading, you now have the personal rundown on each team. If a team has captured your fancy, you should lay your (virtual) money on them in our annual betting pool. You can find the betting pool page here, just solve the captcha and you’ll have a chance to lay down a virtual $1,000 on any team (or teams) of your choice. Here’s the link to the pool.

In upcoming articles we’re going to take a look at the applications the students will face during SC17, the configurations of each team, plus video interviews of each team. Stay tuned to HPCwire for more….

The post Exhaustive Profile of SC17 Cluster Competition Teams – Let’s go DEEP… appeared first on HPCwire.

Intel, AMD Moves Rattle GPU Market

HPC Wire - Fri, 11/10/2017 - 17:02

Intel Corp. has lured away the former head of AMD’s graphics business as the world’s largest chipmaker forms a high-end graphics unit to compete with GPU market leader Nvidia.

Intel rattled tech markets this week by hiring AMD’s Raja Koduri to head its new Core and Visual Computing Group. The hiring came days after the chip rivals announced a graphics partnership.

Signaling its strategy of taking on Nvidia in the high-flying GPU market, Intel’s chief engineering officer, Murthy Renduchintala, said the hiring of Koduri underscored Intel’s “plans to aggressively expand our computing and graphics capabilities and build on our very strong and broad differentiated IP foundation.”

Koduri previously served as senior vice president and chief architect of AMD’s Radeon Technologies Group. There, he oversaw AMD’s graphics development. Koduri, 49, was Apple’s director of graphics architecture before joining AMD. At Apple, he led the company’s transition to Retina laptop displays.

Intel said Koduri would assume his new graphics duties in early December.

Raja Koduri

Koduri’s hiring sent AMD’s shares plummeting on the Nasdaq exchange, although they were beginning to recover on Friday (Nov. 10). Likewise, Nvidia’s shares sank on Thursday after Intel’s announcement but were up sharply by the end of the week after announcing record quarterly revenues.

Nvidia has been touting its accelerated GPU platforms with thousands of cores as the next step in computing as Moore’s Law runs out of steam. The reference to Intel co-founder Gordon Moore is seen as a shot across Intel’s bow by the GPU leader as the chipmakers mass their forces to compete in the nascent AI chip and algorithm markets.

“Being the world’s AI platform is our focus,” Greg Estes, Nvidia’s vice president of developer programs, stressed during a recent company event in Washington, DC.

Intel’s announcement of Koduri’s hiring came days after it unveiled a partnership with AMD to compete with Nvidia in the GPU market. “Our collaboration with Intel expands the installed base for AMD Radeon GPUs and brings to market a differentiated solution for high-performance graphics,” Scott Herkelman, vice president and general manager, AMD Radeon Technologies Group, noted in the press release.

AMD’s announcement of the Intel deal, reported elsewhere, was pulled from its website after the Koduri hiring was disclosed.

Intel said its new Core processor initially aimed at the gaming market would combine a high-performance CPU with AMD’s Radeon graphics components.

Hence, the high end of the graphics market is shaping up as a battle between Nvidia’s many-core accelerated GPUs, which emphasize parallelism, and Intel’s hybrid CPU-discrete graphics approach. While Intel emphasizes hardware horsepower through advances in high-bandwidth memory and new chip designs combined with discrete graphics, Nvidia is combining lots of many-core processors and big data to tackle emerging deep learning problems such as inference.

The key battleground will be the AI market where algorithms and APIs rather than traditional coding will help determine winners and losers, Nvidia’s Estes argued.

The post Intel, AMD Moves Rattle GPU Market appeared first on HPCwire.

Camp contributes to national report on CS enrollment surge

Colorado School of Mines - Fri, 11/10/2017 - 11:12

A Colorado School of Mines professor served on a National Academies of Sciences, Engineering and Medicine committee that recently released a report urging action to address the current surge in undergraduate computer science enrollments.

Tracy Camp, professor and head of the Computer Science Department, was one of 15 members on the national ad hoc committee tasked with examining the growing popularity of computer science courses at four-year institutions. 

According to the report, the number of bachelor’s degrees awarded in computer and information science across the U.S. has increased by 74 percent at not-for-profit institutions since 2009, versus a 16 percent increase in bachelor’s degrees overall. 

Institutions are struggling to keep up with the rising demand, with many reporting having too few faculty and instructors and insufficient classroom space and administrative support. More than half of new PhDs in computer science have taken jobs in the private sector in recent years, posing additional challenges to faculty recruitment, according to the report. 

The study was sponsored by the National Science Foundation. For more information, go to http://sites.nationalacademies.org/CSTB/CurrentProjects/CSTB_171607.

CONTACT
Emilie Rusch, Public Information Specialist, Communications and Marketing | 303-273-3361 | erusch@mines.edu
Mark Ramirez, Managing Editor, Communications and Marketing | 303-273-3088 | ramirez@mines.edu

Categories: Partner News

Moonshot Research and Providentia Worldwide Collaborate on HPC and Big Data Services for Industry

HPC Wire - Fri, 11/10/2017 - 11:06

Nov. 10, 2017 — Moonshot Research LLC (Champaign IL) and Providentia Worldwide, LLC (Washington D.C.) have agreed to jointly offer business and technical services and consulting in the areas of high-performance computing and big data services. The Moonshot/Providentia team brings expertise in driving ROI in enterprise computing by focusing on best practices in HPC, cloud and enterprise IT.

Merle Giles, CEO of Moonshot Research said, “I am absolutely delighted to work with the world’s experts in using HPC to achieve real-time analytics. Speed has become the ultimate competitive advantage in the world of technology-enabled products and services. The impact of utilizing our customer-first approach to industrial innovation and ROI are substantial.”

Ryan Quick and Arno Kolster of Providentia Worldwide are pioneers in adopting a hybrid approach to analytics, using techniques and software typically deployed independently in cloud, enterprise and HPC workflows. Their adoption of HPC solutions for real-time fraud detection at PayPal was unconventional, yet proved to be the perfect solution for achieving extreme data ingestion rates and rapid machine-driven decision making.

Giles brings a business sense to this mix of technology integration after proving the impact of his customer-first approach at NCSA’s Private Sector Program at the University of Illinois. Together, the Moonshot/Providentia team of experts offers independent, vendor-agnostic solutions that result in reduced time-to-solution, scale and increased certainty at less cost.

Giles, Quick and Kolster have each earned awards and recognition from HPCwire. All three are members of Hyperion Research’s HPC User Forum steering committee and have been invited speakers in numerous countries around the world. Giles was co-editor of a 2015 book entitled Industrial Applications of High-Performance Computing: Best Global Practices published by CRC Press.

Source: Moonshot Research

The post Moonshot Research and Providentia Worldwide Collaborate on HPC and Big Data Services for Industry appeared first on HPCwire.
