HPC Wire

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

‘Negative Capacitance’ Could Bring More Efficient Transistors

Mon, 12/18/2017 - 12:34

WEST LAFAYETTE, Ind., Dec. 18, 2017 — Researchers have experimentally demonstrated how to harness a property called negative capacitance for a new type of transistor that could reduce power consumption, validating a theory proposed in 2008 by a team at Purdue University.

A new type of transistor (a) harnesses a property called negative capacitance. The device structure is shown with a transmission electron microscopy image (b) and in a detailed “energy dispersive X-ray spectrometry” mapping (c). (Purdue University photo/Mengwei Si)

The researchers used an extremely thin, or 2-D, layer of the semiconductor molybdenum disulfide to make a channel adjacent to a critical part of transistors called the gate. Then they used a “ferroelectric material” called hafnium zirconium oxide to create a key component in the newly designed gate called a negative capacitor.

Capacitance, or the storage of electrical charge, normally has a positive value. However, using the ferroelectric material in a transistor’s gate allows for negative capacitance, which could result in far lower power consumption to operate a transistor. Such an innovation could bring more efficient devices that run longer on a battery charge.

Hafnium oxide is now widely used as the dielectric, or insulating material, in the gates of today’s transistors. The new design replaces the hafnium oxide with hafnium zirconium oxide, in work led by Peide Ye, Purdue’s Richard J. and Mary Jo Schwartz Professor of Electrical and Computer Engineering.

“The overarching goal is to make more efficient transistors that consume less power, especially for power-constrained applications such as mobile phones, distributed sensors, and emerging components for the internet of things,” Ye said.

Findings are detailed in a research paper published on Dec. 18 in the journal Nature Nanotechnology.

The original theory for the concept was proposed in 2008 by Supriyo Datta, the Thomas Duncan Distinguished Professor of Electrical and Computer Engineering, and Sayeef Salahuddin, who was a Purdue doctoral student at the time and is now a professor of electrical engineering and computer sciences at the University of California, Berkeley.

The paper’s lead author was Purdue electrical and computer engineering doctoral student Mengwei Si. Among the paper’s co-authors are Ye; Ali Shakouri, the Mary Jo and Robert L. Kirk Director of Purdue’s Birck Nanotechnology Center and a professor of electrical and computer engineering; and Muhammad A. Alam, the Jai N. Gupta Professor of Electrical and Computer Engineering, who made critical and wide-ranging contributions to the theory describing the physics behind negative capacitance devices.

Transistors are tiny switches that rapidly turn on and off, enabling computers to process information in binary code. Properly switching off is of special importance to ensure that no electricity “leaks” through. This switching normally requires a minimum of 60 millivolts for every tenfold increase in current, a requirement called the thermionic limit. However, transistors that harness negative capacitance might break this fundamental limit, switching at far lower voltages and resulting in less power consumption.
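The 60-millivolt figure follows from the thermal voltage at room temperature. As a rough illustration, not taken from the paper, the sketch below computes the thermionic limit as (kT/q) times ln 10 and then scales it by the body factor 1 + C_s/C_ins that is commonly used to explain why a negative gate capacitance could push the swing below that limit. The capacitance values and function names are hypothetical placeholders chosen only to show the effect of the sign.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant (exact SI value)
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge (exact SI value)


def thermionic_limit_mv_per_decade(temperature_k: float = 300.0) -> float:
    """Minimum gate-voltage swing per tenfold change in current, in millivolts."""
    thermal_voltage_v = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return thermal_voltage_v * math.log(10) * 1000.0


def subthreshold_swing_mv_per_decade(
    c_semiconductor: float, c_insulator: float, temperature_k: float = 300.0
) -> float:
    """Swing scaled by the body factor 1 + C_s / C_ins (same units for both)."""
    body_factor = 1.0 + c_semiconductor / c_insulator
    return body_factor * thermionic_limit_mv_per_decade(temperature_k)


if __name__ == "__main__":
    print(f"Thermionic limit at 300 K: {thermionic_limit_mv_per_decade():.1f} mV/decade")
    # Hypothetical capacitance ratios, for illustration only.
    print(f"Positive gate capacitance: {subthreshold_swing_mv_per_decade(1.0, 2.0):.1f} mV/decade")
    print(f"Negative gate capacitance: {subthreshold_swing_mv_per_decade(1.0, -2.0):.1f} mV/decade")
```

With a positive insulator capacitance the body factor is always above 1, so the swing stays above the limit; a negative value can bring the factor below 1, which is the effect the new gate design aims to exploit.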

New findings demonstrate that the ferroelectric material and negative capacitance in the gate result in good switching in both the on and off states. The new design also meets another requirement: to switch on and off properly, the transistors must not exhibit a harmful electronic property called hysteresis.

The negative capacitance was created with a process called atomic layer deposition, which is commonly used in industry, making the approach potentially practical for manufacturing.

The research is ongoing, and future work will explore whether the devices switch on and off fast enough to be practical for ultra-high speed commercial applications.

“However, even without ultrafast switching, the device could still have a transformative impact in a broad range of devices that may operate at lower frequency and must operate with low power levels,” Ye said.

Portions of the research were based at the Birck Nanotechnology Center in Purdue’s Discovery Park. The work was funded by the U.S. Air Force Office of Scientific Research, National Science Foundation, Army Research Office and Semiconductor Research Corporation.

The work was performed by researchers from Purdue, the National Nano Device Laboratories in Taiwan, and National Laboratory for Information Science and Technology of Tsinghua University in Beijing. A complete listing of the paper’s co-authors is contained in the abstract.

Source: Purdue University

The post ‘Negative Capacitance’ Could Bring More Efficient Transistors appeared first on HPCwire.

Intel Unveils Industry’s First FPGA Integrated with High Bandwidth Memory Built for Acceleration

Mon, 12/18/2017 - 11:13

Dec. 18, 2017 — Intel today announced the availability of the Intel Stratix 10 MX FPGA, the industry’s first field programmable gate array (FPGA) with integrated High Bandwidth Memory DRAM (HBM2). By integrating the FPGA and the HBM2, Intel Stratix 10 MX FPGAs offer up to 10 times the memory bandwidth when compared with standalone DDR memory solutions. These bandwidth capabilities make Intel Stratix 10 MX FPGAs the essential multi-function accelerators for high-performance computing (HPC), data centers, network functions virtualization (NFV), and broadcast applications that require hardware accelerators to speed up mass data movements and stream data pipeline frameworks.

In HPC environments, the ability to compress and decompress data before or after mass data movements is paramount. HBM2-based FPGAs can compress and accelerate larger data movements compared with stand-alone FPGAs. With High Performance Data Analytics (HPDA) environments, streaming data pipeline frameworks like Apache Kafka and Apache Spark Streaming require real-time hardware acceleration. Intel Stratix 10 MX FPGAs can simultaneously read/write data and encrypt/decrypt data in real-time without burdening the host CPU resources.

“To efficiently accelerate these workloads, memory bandwidth needs to keep pace with the explosion in data,” said Reynette Au, vice president of marketing, Intel Programmable Solutions Group. “We designed the Intel Stratix 10 MX family to provide a new class of FPGA-based multi-function data accelerators for HPC and HPDA markets.”

The Intel Stratix 10 MX FPGA family provides a maximum memory bandwidth of 512 gigabytes per second with the integrated HBM2. HBM2 vertically stacks DRAM layers using through-silicon via (TSV) technology. These DRAM layers sit on a base layer that connects to the FPGA using high density micro bumps. The Intel Stratix 10 MX FPGA family utilizes Intel’s Embedded Multi-Die Interconnect Bridge (EMIB) that speeds communication between FPGA fabric and the DRAM. EMIB works to efficiently integrate HBM2 with a high-performance monolithic FPGA fabric, solving the memory bandwidth bottleneck in a power-efficient manner.
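The release does not break down the 512 GB/s figure. One plausible reading, sketched below, assumes two HBM2 stacks that each expose a 1,024-bit interface running at 2 Gb/s per pin. Those per-stack numbers are typical HBM2 figures and are assumptions here, not statements from Intel's announcement.

```python
def hbm2_bandwidth_gb_per_s(
    stacks: int,
    bus_width_bits: int = 1024,       # assumed interface width per HBM2 stack
    gbit_per_s_per_pin: float = 2.0,  # assumed per-pin data rate
) -> float:
    """Peak bandwidth in gigabytes per second for a number of memory stacks."""
    per_stack_gb_per_s = bus_width_bits * gbit_per_s_per_pin / 8.0
    return stacks * per_stack_gb_per_s


if __name__ == "__main__":
    for stacks in (1, 2):
        print(f"{stacks} stack(s): {hbm2_bandwidth_gb_per_s(stacks):.0f} GB/s")
```

Under these assumptions a single stack yields 256 GB/s and two stacks reach the quoted 512 GB/s.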

Intel is shipping several Intel Stratix 10 FPGA family variants, including the Intel Stratix 10 GX FPGAs (with 28G transceivers) and the Intel Stratix 10 SX FPGAs (with embedded quad-core ARM processor). The Intel Stratix 10 FPGA family utilizes Intel’s 14 nm FinFET manufacturing process and incorporates state-of-the-art packaging technology, including EMIB.

Source: Intel


Equus Compute Solutions Qualifies as 2017 Intel Platinum Technology Provider

Mon, 12/18/2017 - 10:33

Dec. 18, 2017 — Equus Compute Solutions announced it has qualified as a 2017 Intel Platinum Technology Provider in both the HPC Data Center Specialist and Cloud Data Center Specialist categories. To receive these designations, Equus demonstrated commitment and excellence in deploying Intel-based data center solutions. Equus technical staff successfully completed a set of rigorous HPC and cloud data center-focused training courses designed to build enhanced proficiency in delivering these leading technologies.

As an Intel Platinum Technology Provider, Equus has access to a number of value-added benefits. Access to Intel trainings and resources ensures Equus customers can gain market leading insights into the latest technologies and solutions. Collaboration with Intel cloud experts helps Equus deliver the right configuration, tailored specifically to customer requirements. The ability to leverage Intel test tools means Equus can accelerate solution schedules, ensure high quality, and offer customers the lowest total cost of ownership.

“Working closely with Intel at this Platinum level means Equus can help our customers deploy the most advanced software defined infrastructure solutions,” said Steve Grady, VP Customer Solutions. “We look forward to combining our Technology Provider program expertise with the Intel Builders Programs: Cloud, Storage and Network to create custom cost-effective solutions.”

More information on the Intel-powered Equus software defined infrastructure solutions is available at http://www.equuscs.com/servers.

About Equus Compute Solutions

Equus Compute Solutions customizes white box servers and storage solutions to enable flexible software-defined infrastructures. Delivering low-cost solutions for the enterprise, software appliance vendors, and cloud providers, Equus is one of the leading white-box systems and solutions integrators. Over the last 28 years, we have delivered more than 3.5 million custom-configured servers, software appliances, desktops, and notebooks throughout the world. Our advanced systems support software-defined storage, networking, and virtualization that enable a generation of hyper-converged scale-out applications and solutions. From components to complete servers purchased online through ServersDirect.com, to fully customized fixed-configurations, white box is our DNA. Custom cost-optimized compute solutions is what we do, and driving successful customer business outcomes is what we deliver. Find out how to enable your software-defined world with us at www.equuscs.com.

Source: Equus Compute Solutions


Australia Commits $70 Million for Next-Generation NCI Supercomputer

Mon, 12/18/2017 - 10:23

Dec. 18, 2017 — The Board of Australia’s National Computational Infrastructure (NCI), based at The Australian National University (ANU), welcomes the Australian Government’s announcement that it will invest $70 million to replace Australia’s highest performance research supercomputer, Raijin, which is rapidly nearing the end of its service life.

The funding, through the Department of Education and Training, will be provided as $69.2 million in 2017-18 and $800,000 in 2018-19.

Chair of the NCI Board, Emeritus Professor Michael Barber, said NCI was crucial to Australia’s future research needs.

“This announcement is very welcome. NCI plays a pivotal role in the national research landscape, and the supercomputer is the centrepiece of NCI’s renowned and tightly integrated, high-performance computing and data environment,” he said.

“The Government’s announcement is incredibly important for the national research endeavour.

“It means NCI can continue to provide Australian researchers with a world-class advanced computing environment that is a fusion of powerful computing, high-performance ‘big data’, and world-leading expertise that enables cutting-edge Australian research and innovation.

“The NCI supercomputer is one of the most important pieces of research infrastructure in Australia.  It is critical to the competitiveness of Australian research and development in every field of scientific and technological endeavour, spanning the national science and research priorities.”

ANU Vice-Chancellor Professor Brian Schmidt said the funding would ensure NCI remains at the centre of Australia’s research needs.

“The new NCI supercomputer will be a valuable tool for Australian researchers and industry, and will be central to scientific developments in medical research, climate and weather, engineering and all fields that require analysis of so-called big data, including, of course, astronomy,” Professor Schmidt said.

Australia’s Chief Scientist Dr Alan Finkel said high-performance computing is a national priority.

“Throughout our consultations to develop the 2016 National Research Infrastructure Roadmap the critical importance of Australia’s two high performance computers was manifestly clear,” Dr Finkel said.

“Our scientific community will be overwhelmingly delighted by the Australian Government’s decision today to support the modernisation of the NCI computer hosted at ANU.”

The announcement of funding ensures researchers in 35 universities, five national science agencies, three medical research institutes, and industry will benefit from a boost in computational horsepower, enabling new research that is more ambitious and more innovative than ever before once the new supercomputer is commissioned in early 2019.

NCI anticipates the resulting supercomputer will be ranked in the top 25 internationally.

The Australian Government’s 2016 National Research Infrastructure Roadmap specifically recognised the critical importance of such a resource, and the need for an urgent upgrade.

The new supercomputer will ensure NCI can continue to provide essential support for research funded and sustained by the national research councils (the Australian Research Council and the National Health and Medical Research Council), and the national science agencies—notably CSIRO, the Bureau of Meteorology and Geoscience Australia.

This research will drive innovation that is critical to Australia’s future economic development and the wellbeing of Australians.

About NCI

NCI, Australia’s national high-end research computing service, is in the vanguard of international advanced computing, delivering solutions that encompass computationally intensive modelling and simulation and address the needs of big data—a requirement recognised in the Australian Government’s 2016 National Research Infrastructure Roadmap (released May 2017).

Source: NCI


Carnegie Mellon Reveals Inner Workings of Victorious AI

Mon, 12/18/2017 - 10:15

PITTSBURGH, Pa., Dec. 18, 2017 — Libratus, an artificial intelligence that defeated four top professional poker players in no-limit Texas Hold’em earlier this year, uses a three-pronged approach to master a game with more decision points than atoms in the universe, researchers at Carnegie Mellon University report.

In a paper published online Sunday by the journal Science, Tuomas Sandholm, professor of computer science, and Noam Brown, a Ph.D. student in the Computer Science Department, detail how their AI was able to achieve superhuman performance by breaking the game into computationally manageable parts. They also explain how, based on its opponents’ game play, Libratus fixed potential weaknesses in its strategy during the competition.

AI programs have defeated top humans in checkers, chess and Go — all challenging games, but ones in which both players know the exact state of the game at all times. Poker players, by contrast, contend with hidden information — what cards their opponents hold and whether an opponent is bluffing.

In a 20-day competition involving 120,000 hands at Rivers Casino in Pittsburgh during January 2017, Libratus became the first AI to defeat top human players at heads-up no-limit Texas Hold’em, the primary benchmark and long-standing challenge problem for imperfect-information game-solving by AIs.

Libratus beat each of the players individually in the two-player game and collectively amassed more than $1.8 million in chips. Measured in milli-big blinds per hand (mbb/hand), a standard used by imperfect-information game AI researchers, Libratus decisively defeated the humans by 147 mbb/hand. In poker lingo, this is 14.7 big blinds per 100 hands.
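The unit conversion behind these figures is simple arithmetic: one milli-big blind is a thousandth of a big blind. The short sketch below, with illustrative function names, converts a win rate in mbb/hand to big blinds per 100 hands and to a total over the 120,000 hands played.

```python
def mbb_per_hand_to_bb_per_100(mbb_per_hand: float) -> float:
    """Convert milli-big blinds per hand to big blinds per 100 hands."""
    big_blinds_per_hand = mbb_per_hand / 1000.0
    return big_blinds_per_hand * 100.0


def total_big_blinds(mbb_per_hand: float, hands: int) -> float:
    """Total big blinds won over a number of hands at a given win rate."""
    return mbb_per_hand / 1000.0 * hands


if __name__ == "__main__":
    print(f"{mbb_per_hand_to_bb_per_100(147):.1f} big blinds per 100 hands")
    print(f"{total_big_blinds(147, 120_000):,.0f} big blinds over 120,000 hands")
```

A rate of 147 mbb/hand is therefore 0.147 big blinds per hand, or 14.7 big blinds per 100 hands.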

“The techniques in Libratus do not use expert domain knowledge or human data and are not specific to poker,” Sandholm and Brown said in the paper. “Thus they apply to a host of imperfect-information games.” Such hidden information is ubiquitous in real-world strategic interactions, they noted, including business negotiation, cybersecurity, finance, strategic pricing and military applications.

Libratus includes three main modules, the first of which computes an abstraction of the game that is smaller and easier to solve than by considering all 10^161 (the number 1 followed by 161 zeroes) possible decision points in the game. It then creates its own detailed strategy for the early rounds of Texas Hold’em and a coarse strategy for the later rounds. This strategy is called the blueprint strategy.

One example of these abstractions in poker is grouping similar hands together and treating them identically.

“Intuitively, there is little difference between a King-high flush and a Queen-high flush,” Brown said. “Treating those hands as identical reduces the complexity of the game and thus makes it computationally easier.” In the same vein, similar bet sizes also can be grouped together.
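The two kinds of grouping described above can be pictured with a toy sketch. It is not Libratus's actual abstraction algorithm: the hand-strength scores, bucket count and bet sizes are hypothetical values chosen for illustration.

```python
from typing import Sequence


def bucket_hand(strength: float, n_buckets: int) -> int:
    """Map a hand-strength score in [0, 1] to one of n_buckets groups."""
    return min(int(strength * n_buckets), n_buckets - 1)


def nearest_abstract_bet(bet_fraction_of_pot: float, abstract_sizes: Sequence[float]) -> float:
    """Treat an observed bet as the closest bet size kept in the abstraction."""
    return min(abstract_sizes, key=lambda size: abs(size - bet_fraction_of_pot))


if __name__ == "__main__":
    # Hypothetical strength scores for two similar hands.
    king_high_flush, queen_high_flush = 0.97, 0.96
    print(bucket_hand(king_high_flush, 10), bucket_hand(queen_high_flush, 10))

    # Hypothetical abstraction with half-pot, pot and double-pot bets.
    print(nearest_abstract_bet(0.6, [0.5, 1.0, 2.0]))
```

Both hands land in the same bucket and are then treated identically, and a bet of 0.6 times the pot is handled as if it were a half-pot bet, which shrinks the number of distinct situations the solver must consider.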

But in the final rounds of the game, a second module constructs a new, finer-grained abstraction based on the state of play. It also computes a strategy for this subgame in real-time that balances strategies across different subgames using the blueprint strategy for guidance — something that needs to be done to achieve safe subgame solving. During the January competition, Libratus performed this computation using the Pittsburgh Supercomputing Center’s Bridges computer.

Whenever an opponent makes a move that is not in the abstraction, the module computes a solution to this subgame that includes the opponent’s move. Sandholm and Brown call this nested subgame solving.

DeepStack, an AI created by the University of Alberta to play heads-up, no-limit Texas Hold’em, also includes a similar algorithm, called continual re-solving; DeepStack has yet to be tested against top professional players, however.

The third module is designed to improve the blueprint strategy as competition proceeds. Typically, Sandholm said, AIs use machine learning to find mistakes in the opponent’s strategy and exploit them. But that also opens the AI to exploitation if the opponent shifts strategy.

Instead, Libratus’ self-improver module analyzes opponents’ bet sizes to detect potential holes in Libratus’ blueprint strategy. Libratus then adds these missing decision branches, computes strategies for them, and adds them to the blueprint.
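As a loose illustration of the idea of finding holes from opponents' bet sizes, the sketch below ranks observed bets that fall between the sizes already in an abstraction. The scoring rule (frequency times distance to the nearest existing size) is a hypothetical stand-in, not the rule used by Libratus, and all values are invented for the example.

```python
from collections import Counter
from typing import List, Sequence


def candidate_bet_sizes(
    observed_bets: Sequence[float],
    abstract_sizes: Sequence[float],
    top_k: int = 1,
) -> List[float]:
    """Rank off-abstraction bet sizes by how often and how far off they are."""
    counts = Counter(round(bet, 1) for bet in observed_bets)

    def gap(size: float) -> float:
        # Distance from this bet size to the closest size already covered.
        return min(abs(size - existing) for existing in abstract_sizes)

    scores = {size: count * gap(size) for size, count in counts.items() if gap(size) > 0}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    blueprint_sizes = [0.5, 1.0, 2.0]               # hypothetical abstraction
    seen = [0.7, 0.7, 0.7, 0.5, 1.0, 1.6]           # hypothetical opponent bets
    additions = candidate_bet_sizes(seen, blueprint_sizes, top_k=1)
    print(sorted(blueprint_sizes + additions))
```

In the real system, a strategy would then have to be computed for each added branch before it could be used in play.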

In addition to beating the human pros, Libratus was evaluated against the best prior poker AIs. These included Baby Tartanian8, a bot developed by Sandholm and Brown that won the 2016 Annual Computer Poker Competition held in conjunction with the Association for the Advancement of Artificial Intelligence Annual Conference.

Whereas Baby Tartanian8 beat the next two strongest AIs in the competition by 12 (plus/minus 10) mbb/hand and 24 (plus/minus 20) mbb/hand, Libratus bested Baby Tartanian8 by 63 (plus/minus 28) mbb/hand. DeepStack has not been tested against other AIs, the authors noted.

“The techniques that we developed are largely domain independent and can thus be applied to other strategic imperfect-information interactions, including non-recreational applications,” Sandholm and Brown concluded. “Due to the ubiquity of hidden information in real-world strategic interactions, we believe the paradigm introduced in Libratus will be critical to the future growth and widespread application of AI.”

The technology has been exclusively licensed to Strategic Machine, Inc., a company founded by Sandholm to apply strategic reasoning technologies to many different applications.

A paper by Brown and Sandholm regarding nested subgame solving recently won a Best Paper award at the Neural Information Processing Systems (NIPS 2017) conference. Libratus received the HPCwire Reader’s Choice Award for Best Use of AI at the 2017 International Conference for High Performance Computing, Networking, Storage and Analysis (SC17).

The National Science Foundation and the Army Research Office supported this research.

About Carnegie Mellon University

Carnegie Mellon (www.cmu.edu) is a private, internationally ranked research university with programs in areas ranging from science, technology and business, to public policy, the humanities and the arts. More than 13,000 students in the university’s seven schools and colleges benefit from a small student-to-faculty ratio and an education characterized by its focus on creating and implementing solutions for real problems, interdisciplinary collaboration and innovation.

Source: Carnegie Mellon University


World Record: Quantum Computer with 46 Qubits Simulated

Mon, 12/18/2017 - 10:04

Researchers from Jülich Supercomputing Centre, Wuhan University, and the University of Groningen, reported last week they successfully simulated a quantum computer with 46 quantum bits (qubits) for the first time. The researchers used the Jülich supercomputer JUQUEEN as well as the world’s fastest supercomputer Sunway TaihuLight at China’s National Supercomputing Center in Wuxi. Simulations such as this are critical for development of quantum programming expertise and software tools.

“There are only a few supercomputers in the world that currently have such a large amount of memory, an adequate number of compute nodes, and sufficiently fast network connections to even simulate a system of 45 qubits – that was the former world record,” explains Kristel Michielsen from the Jülich Supercomputing Centre (JSC). “And it’s just as important to get software up and running efficiently on the highly parallel architectures of state-of-the-art supercomputers.” A brief account of the work is posted on the Jülich web site.

Implementing quantum computing software and algorithms on classical systems is an ongoing challenge. Many codes lose efficiency if calculated in parallel on a large number of compute nodes. However, the software which Michielsen has been developing together with her partners for over ten years scales almost perfectly according to the report: “It shows hardly any loss in performance even if several million compute nodes are applied at the same time, as is the case with the Chinese supercomputer Sunway TaihuLight.”

Prof. Dr. Kristel Michielsen in front of the Jülich supercomputer JUQUEEN
Copyright: Forschungszentrum Jülich / Ralf-Uwe Limbach

Michielsen has already set a number of benchmarks in the past. In 2010, she became the first person to simulate a quantum computer with 42 qubits on the former Jülich supercomputer JUGENE. She then surpassed that world record in 2012 with the simulation of a 43-qubit system on JUQUEEN, the successor to JUGENE. Most recently, Michielsen simulated a 45-qubit quantum system together with partners from universities in Groningen, Tokyo, and Wuhan, thus equalling a record set in spring 2017. For that more than 500,000 gigabytes, or 0.5 petabytes, of memory were needed.

The latest breakthrough, which involved the simulation of a quantum computer with 46 qubits, was achieved following an adjustment of the simulation code. The representation of a quantum state now only requires 2 bytes instead of 16 bytes, without the accuracy of the results being significantly reduced. Other users can benefit from this simplification, which equates to a reduction in required memory by a factor of eight. The new version of the simulation software now enables a quantum computer with 32 qubits to be simulated on a notebook with 16 gigabytes of memory.
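The memory figures quoted in this article can be checked with a few lines of arithmetic, since a full state vector for n qubits holds 2^n amplitudes. The sketch below is a back-of-the-envelope check, not the Jülich simulation code; the function names are illustrative and the 16 GB notebook is taken as 16 x 10^9 bytes.

```python
def state_vector_bytes(n_qubits: int, bytes_per_amplitude: int) -> int:
    """Memory needed to store all 2**n amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude


def max_qubits(memory_bytes: int, bytes_per_amplitude: int) -> int:
    """Largest qubit count whose state vector fits in the given memory."""
    n = 0
    while state_vector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n


if __name__ == "__main__":
    pib = 2 ** 50
    print(f"45 qubits at 16 bytes: {state_vector_bytes(45, 16) / pib:.3f} PiB")
    print(f"46 qubits at 2 bytes:  {state_vector_bytes(46, 2) / pib:.3f} PiB")
    print(f"16 GB at 2 bytes:  up to {max_qubits(16 * 10**9, 2)} qubits")
    print(f"16 GB at 16 bytes: up to {max_qubits(16 * 10**9, 16)} qubits")
```

Each additional qubit doubles the memory, so cutting the storage per amplitude by a factor of eight is worth three extra qubits on the same machine.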

Link to full release: http://www.fz-juelich.de/SharedDocs/Pressemitteilungen/UK/EN/2017/2017-12-15-world-record-juelich-researchers-simulate-quantum-computer.html


Researchers Advance User-Level Container Solution for HPC

Mon, 12/18/2017 - 09:51

Most scientific computing facilities, such as HPC or grid infrastructures, are shared among different research disciplines, and thus the system software environment needs to be generic enough to accommodate different user and application profiles; they are multi-user environments.

Because of managerial and technical constraints, such infrastructures cannot afford to offer every research project a tailored environment on their machines. The interest in exploring the applicability of container technology on such systems is therefore evident from the end-user point of view.

Researchers then need to customize their application software to fit the computing center environment at the level of system software and batch system. Containers provide a way to package and deploy software together with all its dependencies so that it can be executed seamlessly, independently of the underlying Linux operating system and environment. The main benefit of integrating container execution in HPC systems would then be to provide a way to execute applications homogeneously across different resource centers.

The flagship container software, Docker, cannot be used in a satisfactory way on HPC systems, grids and in general multi-user oriented infrastructures. Deploying Docker on such facilities presents a number of problems related to the fact that within the container, processes are executed with the root id. This raises security concerns among system managers, as the Docker root might be able to gain access to root privileges in the host machine. Also, when executed as root, the processes escape from the usual managerial limits on resource consumption or accounting, imposed on regular users at shared facilities.

User-level tools

The user-level tool udocker provides a layer for users to execute Docker containers that, by design, does not require the intervention of system administrators. udocker combines the pulling, extraction and execution of Docker containers without requiring privileges. The Docker image is extracted into a user-space filesystem area, and from there it is executed in a chroot-like environment.

udocker provides a command line interface that mimics Docker, offering a subset of its commands for pulling and extracting Docker images and executing containers “à la Docker”.

Processes are run without privileges under the regular user id, under the same process tree, thus facilitating the enforcement of the managerial limits imposed on regular users in HPC or grid resource centers.
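To make the pull, extract and execute sequence concrete, the sketch below builds the corresponding command lines without running them, so it works even where udocker is not installed. The pull, create and run subcommands are part of the udocker command line; the image name, container name and helper function are hypothetical examples.

```python
import shlex
from typing import List, Sequence


def udocker_workflow(image: str, container_name: str, command: Sequence[str]) -> List[List[str]]:
    """Return the three udocker invocations for pull, extract and execute."""
    return [
        ["udocker", "pull", image],                                # download the image layers
        ["udocker", "create", f"--name={container_name}", image],  # extract into user space
        ["udocker", "run", container_name, *command],              # execute without privileges
    ]


if __name__ == "__main__":
    # Hypothetical image and container names.
    for argv in udocker_workflow("fedora:25", "myfedora", ["cat", "/etc/redhat-release"]):
        print(shlex.join(argv))
```

Passing each list to subprocess.run on a machine with udocker on the PATH would carry out the steps; all of them run under the regular user id.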

udocker provides several ways, depending on the application and host environment, to execute containerized applications. It is also possible to access specialized hardware like Infiniband for MPI jobs, or GPGPUs, making it adequate to execute containers in batch systems and HPC infrastructures.

udocker enables the execution of Docker containers with different engines based on intercepting system calls. Depending on the application requirements, the user may choose to run in one execution mode or another. For instance, CPU-intensive applications may use udocker in the ptrace execution mode, to intercept and modify pathnames; if the application is I/O intensive, intercepting system calls via library pre-loading in the Fakechroot execution mode is a more suitable way to run the container. All the tools and libraries required by udocker and its execution modes are provided with udocker itself.

The udocker execution mode RunC employs the technology of user namespaces to run the containers in rootless mode. This feature can be used with modern Linux distributions with kernels from 3.9 on. However, most HPC systems are conservative environments, and it will take some time until they are able to support this execution mode.
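The choice between execution modes could be scripted along the following lines. This is a sketch under stated assumptions: the mode identifiers P1 (ptrace), F3 (Fakechroot) and R1 (RunC) and the setup --execmode option are recalled from the udocker manual and should be checked against the installed version, while the workload labels and helper names are hypothetical.

```python
import platform
import re
from typing import List, Tuple

# Assumed udocker mode identifiers; verify against the installed version.
EXEC_MODE_BY_WORKLOAD = {
    "cpu": "P1",       # ptrace-based pathname interception
    "io": "F3",        # Fakechroot library pre-loading
    "rootless": "R1",  # RunC with user namespaces
}


def kernel_at_least(release: str, minimum: Tuple[int, int]) -> bool:
    """Check a kernel release string such as '3.10.0-693.el7' against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def setup_command(container_name: str, workload: str) -> List[str]:
    """Build the udocker command that selects an execution mode for a container."""
    mode = EXEC_MODE_BY_WORKLOAD[workload]
    return ["udocker", "setup", f"--execmode={mode}", container_name]


if __name__ == "__main__":
    # The 3.9 threshold is the one quoted in the article for the RunC mode.
    workload = "rootless" if kernel_at_least(platform.release(), (3, 9)) else "cpu"
    print(" ".join(setup_command("myfedora", workload)))
```

On an older kernel the sketch falls back to the ptrace mode, which needs no kernel support beyond what regular users already have.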

Regarding impact in performance, in the figure presented below we have plotted the weak scaling performance of openQCD, a comprehensive software package to run Lattice QCD simulations (a CPU-intensive application) from 8 to 256 cores.

As we see, the performance of the containerized version of openQCD is slightly higher than the one on the host itself. This is especially so when the execution takes place within a single node (the test machine has 24-core nodes).

This behavior has been reported consistently by container users across different hardware and system software settings, and it is related to the better libraries available in the more advanced versions of the operating systems inside the container. Clearly this feature opens the door to container exploitation in HPC mainframes since there the software system is by necessity very conservative.

Figure Caption: Weak Scaling performance of openQCD with a local lattice of Volume=32^4. The tests have been performed on the Finisterrae-II HPC system at CESGA (Spain).
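For readers unfamiliar with the metric: in a weak scaling test the work per core is held fixed (here, the local lattice volume), so the ideal run time stays constant as cores are added. The sketch below shows how an efficiency could be derived from timings; the numbers are illustrative placeholders, not measurements from Finisterrae-II.

```python
from typing import Dict


def weak_scaling_efficiency(seconds_by_cores: Dict[int, float]) -> Dict[int, float]:
    """Efficiency relative to the smallest core count; 1.0 means ideal scaling."""
    base_cores = min(seconds_by_cores)
    base_seconds = seconds_by_cores[base_cores]
    return {
        cores: base_seconds / seconds
        for cores, seconds in sorted(seconds_by_cores.items())
    }


if __name__ == "__main__":
    # Placeholder timings in seconds per iteration, for illustration only.
    timings = {8: 100.0, 16: 100.0, 64: 110.0, 256: 125.0}
    for cores, efficiency in weak_scaling_efficiency(timings).items():
        print(f"{cores:>4} cores: {efficiency:.2f}")
```

Comparing the efficiency curves of the host and containerized runs at the same core counts gives the kind of comparison discussed above.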

Since its first release in June 2016, udocker has spread quickly in the open source community. It is used in large international collaborations such as MasterCode, a leading particle physics phenomenology collaboration, which uses udocker to handle the library complexity of the set of codes included in MasterCode.

It has also been adopted by a number of software projects to complement Docker, among them openmole, bioconda, Common Workflow Language and SCAR.

System Administration level

Beyond the user level, several solutions have been developed in recent times to support system administrators in deploying customized containers for their users. These solutions rely on the installation of system software by the system administrator, who is also in charge of preparing the containers that the users are authorized to run on the system. The most popular of these tools is Singularity.

Singularity can be downloaded and installed from source or binaries, and must be installed by root for the software to have all the functionalities. Singularity binaries are therefore installed with SUID and need to be deployed in a filesystem that allows SUID. Given the security concerns on network filesystems regarding SUID, Singularity is normally installed in a directory locally accessible to the users (i.e., not network-mounted).
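Whether a given binary carries the SUID bit can be checked from its file mode. The sketch below uses only the Python standard library; the helper names are illustrative and the path in the usage example is a hypothetical install location, not one taken from the Singularity documentation.

```python
import os
import stat


def mode_has_setuid(mode: int) -> bool:
    """Return True if the set-user-ID bit is present in a file mode."""
    return bool(mode & stat.S_ISUID)


def file_has_setuid(path: str) -> bool:
    """Return True if the file at path is installed with the SUID bit."""
    return mode_has_setuid(os.stat(path).st_mode)


if __name__ == "__main__":
    print(mode_has_setuid(0o4755))  # mode of a typical SUID root binary
    print(mode_has_setuid(0o0755))  # mode of an ordinary executable
    candidate = "/usr/local/libexec/singularity/bin/action-suid"  # hypothetical path
    if os.path.exists(candidate):
        print(file_has_setuid(candidate))
```

Note that the bit alone is not sufficient: a filesystem mounted with the nosuid option ignores it at execution time, which is one reason for the local install described above.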

Singularity offers its own containers registry, the Singularity Hub, and its own specification to create containers, the Singularity Recipe (i.e., the Singularity equivalent of the Dockerfile specification).

The default container format is squashfs, which is a compressed read-only Linux file system, where the images need to be created by root.

It also supports a sandbox format, in which the container is deployed inside a standard Unix directory, much like udocker. In particular, executing udocker in Singularity execution mode will cause the container to be executed via Singularity if installed in the system. In order to do this udocker exploits the sandbox mode.

The container building environment of Singularity belongs to root. Containers may be built from a Singularity recipe, from a previous container coming from the Singularity Hub, or by importing a container from the Docker repository. Notice that the Singularity format for containers is not compatible with Docker; therefore, in the latter case the container needs to be converted to the Singularity format.

Once the container exists, it can be executed by a regular user in a way analogous to Docker. These containers can also be checked at the binary level, at the level of sensitive content of the filesystem for example, or even for particular features defined by the system administrator.

The comparison of the most popular tools, udocker and Singularity, shows that they have a completely different scope, and the selection of one solution or another depends on the priorities at the user level and the computing center management policies.

Singularity is a system administration level tool, to be installed at this level, giving the managers of the infrastructure full control over which containers are run on the system. Udocker, however, is a user tool that acts as a layer over different execution methods, enabling regular users to run containers in their own user space, much in the philosophy of jailed systems.

About the Authors

Jorge Gomes is a computing researcher at the Laboratory of Instrumentation and Experimental Particle Physics (LIP). He worked on the development of advanced data acquisition systems at CERN, and participated in pioneering projects in the domain of digital satellite data communications, IP over ATM, and advanced videoconferencing over IP networks. Since 2001 he has participated in numerous projects regarding distributed computing, networks and security in Europe and Latin America. He is the head of the LIP Advanced Computing and Digital Infrastructures Group and technical coordinator of the Portuguese National Grid Infrastructure, representative of Portugal in the Council of the European Grid Infrastructure (EGI) and responsible for the Portuguese participation in IBERGRID, which joins Portuguese and Spanish distributed computing infrastructures.

Isabel Campos is a physics researcher at the Spanish National Research Council (CSIC). She holds a PhD in the area of Lattice QCD simulations, and has held research associate positions at DESY-Hamburg, Brookhaven National Lab, and Leibniz Supercomputing Center in Munich. Since 2005 she has participated in numerous projects aimed at developing software and deploying distributed computing infrastructures in Europe. She is the head of the e-Science and Computing group at IFCA-CSIC, coordinator of the Spanish National Grid Infrastructure, representative of Spain in the Council of the European Grid Infrastructure (EGI) and responsible for the Spanish participation in IBERGRID, which joins the Spanish and Portuguese distributed computing infrastructures.

The post Researchers Advance User-Level Container Solution for HPC appeared first on HPCwire.

Meituan.com Selects Mellanox Interconnect Solutions to Accelerate its Artificial Intelligence, Big Data and Cloud Datacenters

Mon, 12/18/2017 - 07:33

SUNNYVALE, Calif. & YOKNEAM, Israel, Dec. 18, 2017 — Mellanox Technologies, Ltd. (NASDAQ: MLNX), a leading supplier of high-performance, end-to-end smart interconnect solutions for data center servers and storage systems, today announced that Meituan.com has selected Mellanox Spectrum Ethernet switches, ConnectX adapters and LinkX cables to accelerate its multi-thousand servers for their artificial intelligence, big data analytics and cloud data centers. Meituan.com is a leading online and on-demand delivery platform, supporting 280 million mobile users and 5 million merchants across 2,180 cities in China, and processing up to 21 million orders a day during peak times. Utilizing Mellanox 25 Gigabit and 100 Gigabit smart interconnect solutions and RDMA technology, Meituan.com can better analyze and match user needs to merchant online offers, faster and more accurately, while lowering data center operational costs.

“We have selected Mellanox smart 25 Gigabit and 100 Gigabit Ethernet adapters, switches and cables solutions to accelerate our artificial intelligence, big data and cloud data center, due to their leading performance, scalability, and RDMA technology,” said Hu XiangTao, Director of Meituan Cloud Operation at Meituan.com. “We are excited to collaborate with Mellanox to integrate its world-leading interconnect technology into our data centers, resulting in our ability to offer better services to our users, and reduce our expenses.”

“We are excited to collaborate with Meituan.com, enabling them to utilize our interconnect technology to accelerate business decisions and to offer better service to its users,” said Amir Prescher, senior vice president of business development at Mellanox Technologies. “Mellanox RDMA and our leading 25 Gigabit and 100 Gigabit Ethernet adapters and switches provide the needed data capacity and the ability to analyze growing amounts of data in real time, resulting in faster and more accurate business decisions, and better online services. We look forward to continuing to work with Meituan.com as it plans to scale its data center infrastructure.”

About Mellanox

Mellanox Technologies (NASDAQ: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet smart interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at: www.mellanox.com.

Source: Mellanox

The post Meituan.com Selects Mellanox Interconnect Solutions to Accelerate its Artificial Intelligence, Big Data and Cloud Datacenters appeared first on HPCwire.

Leveraging Singularity to Unleash the Power of Hybrid HPC Clouds

Mon, 12/18/2017 - 01:01

Today’s HPC centers demand the flexibility of hybrid computing environments. CFOs and CIOs understand the potential cost benefits of cloud computing and do not wish to be limited by on-premise compute investments. A challenge administrators face when extending their computing capacity outside of their datacenter is delivering a cloud resource in a way that is easy for users to adopt. They need a straightforward extension of computing power into on-demand cloud services.

HPC users have a myriad of operating system and application choices available to them in the open source community. Whether due to personal preference or scientific need, it is impossible to fit all application workflows into a single bucket.  This requires that HPC centers provide users with the tools needed to build custom application stacks in their environment of choice.

While virtualization technologies provide the capability to compartmentalize software into portable virtual machines, HPC applications face unacceptable performance and scalability limitations when running in a virtual environment.

The rise of Linux containers and the rapid success of frameworks like Docker quickly became of interest to the HPC community, as containers enable the encapsulation of custom applications and their runtime environments without the performance penalty of virtualization. Containers gave hope for the long-awaited, bare-metal answer to software portability, which could streamline the adoption of hybrid cloud computing. HPC administrators quickly realized that although Linux containers could facilitate hybrid computing models, the available tools were designed for enterprise microservices and could not be easily or securely integrated into HPC clusters.

Singularity, founded by Gregory M. Kurtzer and supported by Lawrence Berkeley National Lab, was launched in April of 2016 as the definitive framework for HPC Linux containers. The Singularity team addressed all of the HPC challenges presented by enterprise container solutions, including native support for MPI, a single flat-file image for hosting containers on parallel filesystems, restricted privilege escalation inside the container for security, and a user-space execution architecture that allowed HPC schedulers to retain control of resource management and job scheduling.

Additionally, Singularity is compatible with popular container repositories such as DockerHub.  This provides users the option to leverage the growing public depots of pre-packaged, containerized applications.  With two commands, a user can create a ready to run Singularity container from a Docker registry.

“I spoke with numerous scientists, and the problem to solve became clear. We need to support application mobility via an easily verifiable reproducible software stack that the scientists control and a runtime that fits a usage paradigm that fits a traditional HPC system architecture. This was the birth of the Singularity project.”  — Gregory M. Kurtzer.

Penguin Computing is a leader in open computing solutions, and delivers various hybrid computing options through Penguin Computing On-Demand (POD), a bare-metal HPC cloud service.  Coupled with Singularity, POD users are able to compartmentalize their software, and migrate workloads to and from the cloud without any changes to their operating system or application binaries.  Their HPC application stack is consistent and reproducible whether on-premise or in the cloud, and is not restrained by the underlying operating system HPC administrators choose to provision their clusters.

Administrators are able to drop Singularity into their on-premise clusters without any changes to their provisioning or scheduling environment. The simple installation of Singularity encourages and empowers users to adopt the hybrid computing options their organization wishes to embrace. An HPC cluster can easily be enabled to allow users to create Linux containers that extend into POD for seamless integration into the cloud.

Penguin Computing’s POD is proud to be part of the community of HPC centers embracing Singularity, such as the National Institutes of Health, the Texas Advanced Computing Center, and Lawrence Berkeley National Lab.

Register for Penguin Computing’s upcoming whitepaper on enabling hybrid HPC computing in the cloud through Singularity.

The post Leveraging Singularity to Unleash the Power of Hybrid HPC Clouds appeared first on HPCwire.

IBM Releases New Compilers to Exploit POWER9 Technology

Fri, 12/15/2017 - 10:04

Dec. 15, 2017 — On Dec 15, IBM released new compilers, XL C/C++ for Linux V13.1.6 and XL Fortran for Linux V15.1.6, to support the latest Power Systems server AC922 and NVIDIA GPU Volta. 

Image courtesy of IBM.

“The new C/C++ and Fortran compilers provide full exploitation of POWER9 technology for industry-leading performance and optimize HPC and Cognitive workloads through GPU acceleration, and is ideal for HPC clients, scientists, and AI leads,” says IBM. 

The POWER9 exploitation features, including a number of new POWER9 built-in functions and high-performance libraries tuned for POWER9, allow the development of optimized applications that utilize the latest POWER9 technology.

The IBM XL compilers’ support for OpenMP 4.5 is enhanced in this new release – new SIMD directives are added and functionality for existing directives is expanded to provide further exploitation and effective programming on GPU.

CUDA Fortran support is also improved to provide better performance for kernels, more functions and customized GPU configurations.

“With the best overall optimization for both CPU & GPU, XL Compilers are positioned as the performance-driven compiler brand on Power Systems to unlock HPC & Cognitive workloads. XL compilers are the ultimate choice to solve massive, complex computing tasks,” said the offering manager of IBM XL compilers.

Other key features of XL C/C++ for Linux V13.1.6 and XL Fortran for Linux V15.1.6 include:

  • Adoption of Clang V4.0 front-end technology – XL C/C++ for Linux adopts Clang V4.0 front-end technology, which provides a large degree of compatibility with GCC. Migration to XL C/C++ for Linux is now easy and seamless. More C++14 language features are supported, such as binary integer literals, digit separators, relaxing constraints on constexpr functions, return type deduction for normal functions, etc.
  • OpenMP interoperability with CUDA C/C++ and CUDA Fortran – Development of more portable applications is enabled by calling kernels written in CUDA C/C++ or CUDA Fortran in OpenMP programs from the host.
  • Support for CUDA Toolkit 9.0 and 9.1 – Support for the CUDA Toolkit has been upgraded to CUDA Toolkit version 9.0 and 9.1. The sm_70 and compute_70 GPU architectures are supported as defined by the CUDA Toolkit.
  • Specification of GPU architectures for the generated code – The new -qtgtarch option allows specifying the real or virtual GPU architectures where the code can run, overriding the default GPU architecture. The compiler can take maximum advantage of the capabilities and machine instructions which are specific to a GPU architecture, or common to a virtual architecture.
  • Support for the cuda-memcheck tool – A new environment variable is provided to control whether to disable the check for pinned memory in the runtime and allow the program to be executed under the cuda-memcheck tool from the NVIDIA CUDA Toolkit.
  • Pass LLVM IR bitcode libraries to llvm2ptx – LLVM IR bitcode libraries, which have a suffix of .bc, can be specified on the command line, to pass the LLVM IR bitcode libraries to llvm2ptx, the NVVM-IR to PTX translator.
  • GPU runtime inlining support for inlining calls made to the OpenMP GPU runtime libraries – This enhancement reduces overhead and significantly improves performance of OpenMP target regions that are offloaded to the accelerator.
  • Improved GPU code generation – This enhancement applies to several OpenMP directives when contained in an OpenMP target region, most notably parallel loops and reductions.

For a complete list of new XL compilers features, navigate to “What’s new” from XL C/C++ for Linux V13.1.6 documentation and XL Fortran for Linux V15.1.6 documentation on IBM Knowledge Center.

Download no-charge Community Edition and get started

The no-charge XL C/C++ for Linux and XL Fortran for Linux Community Editions are refreshed with all the new functionality, and allow for unlimited production use. They can be downloaded from the XL C/C++ for Linux and XL Fortran for Linux Marketplace website. The XL C/C++ for Linux and XL Fortran for Linux documentation on IBM Knowledge Center can guide new users through installation and basic compilation tasks. Although no official support is offered with the Community Edition, IBM compiler experts respond to user feedback raised at the XL on POWER Fortran Community Edition forum (ibm.biz/xl-power-compilers) to help users solve problems.

Source: IBM

The post IBM Releases New Compilers to Exploit POWER9 Technology appeared first on HPCwire.

BP Supercomputer Now World’s Most Powerful for Commercial Research

Fri, 12/15/2017 - 09:49

HOUSTON, Dec. 15, 2017 — BP announced today that it has more than doubled the total computing power of its Center for High-Performance Computing (CHPC) in Houston, making it the most powerful supercomputer in the world for commercial research.

Increased computing power, speed and storage reduce the time needed to analyze large amounts of seismic data to support exploration, appraisal and development plans as well as other research and technology developments throughout BP.

“Our investment in supercomputing is another example of BP leading the way in digital technologies that deliver improved safety, reliability and efficiency across our operations and give us a clear competitive advantage,” said Ahmed Hashmi, BP’s head of upstream technology.

The Center for High-Performance Computing provides critical support to BP’s upstream business segment, where it serves as the worldwide hub for research computing. BP’s computer scientists and mathematicians at the CHPC have enabled industry breakthroughs in advanced seismic imaging and rock physics research to help with reservoir modelling.

BP’s downstream business is also using the supercomputer for fluid dynamics research to study hydrocarbon flows in refineries and pipelines to improve operational safety.

Working with Hewlett Packard Enterprise and Intel, and using HPE’s Apollo System and Intel’s Knights Landing processors, BP has boosted the processing speed of its supercomputer from four petaflops to nine petaflops in the recent upgrade. A petaflop of processing speed is one thousand trillion floating point operations, or “flops,” per second.

The supercomputer has a total memory of 1,140 terabytes (1.14 petabytes) and 30 petabytes of storage, the equivalent of over 500,000 iPhones.

“With the expansion and new systems in place, BP will be able to further bolster its capabilities to accurately process and manage vast amounts of seismic data to identify new business opportunities and improve operational efficiency,” said Alain Andreoli, senior vice president and general manager, Data Center Infrastructure Group, Hewlett Packard Enterprise.

Since the CHPC opened in 2013, BP has quadrupled its computing power and doubled its storage capacity and plans to continue expanding its computing capability in 2018.

About BP

BP is a global producer of oil and gas with operations in over 70 countries. Over the past 10 years, BP has invested $90 billion in the U.S. – more than any other energy company. BP employs about 14,000 people across the U.S. and supports more than 106,000 additional jobs through all its business activities. For more information on BP in the U.S., visit www.bp.com/us.

Source: BP

The post BP Supercomputer Now World’s Most Powerful for Commercial Research appeared first on HPCwire.

Mont-Blanc 2020 Project Looks to Pave the Way for New European Exascale Processors

Fri, 12/15/2017 - 09:36

LES CLAYES, France, Dec. 15, 2017 — Following on from the three successive Mont-Blanc projects since 2011, the three core partners Arm, Barcelona Supercomputing Center and Bull (Atos Group) have united again to trigger the development of the next generation of industrial processor for Big Data and High Performance Computing. The Mont-Blanc 2020 consortium also includes CEA, Forschungszentrum Jülich, Kalray, and SemiDynamics.

The Mont-Blanc 2020 project has a budget of 10.1 million Euros, funded by the European Commission under the Horizon2020 program. It was launched on 11th December at the Atos site in Les Clayes (France), with a kick-off meeting that gathered representatives of all partners.

Image courtesy of Mont-Blanc 2020.

The Mont-Blanc 2020 project intends to pave the way to the future low-power European processor for Exascale. To improve the economic sustainability of the processor generations that will result from the Mont-Blanc 2020 effort, the project includes the analysis of the requirements of other markets. The project’s strategy based on modular packaging would make it possible to create a family of SoCs targeting different markets, such as “embedded HPC” for autonomous driving. The project’s actual objectives are to:

  • define a low-power System-on-Chip architecture targeting Exascale;
  • implement new critical building blocks (IPs) and provide a blueprint for its first generation implementation;
  • deliver initial proof-of-concept demonstration of its critical components on real life applications;
  • explore the reuse of the building blocks to serve other markets than HPC, with methodologies enabling a better time-predictability, especially for mixed-critical applications where guaranteed execution & response times are crucial.

The project will have to tackle three key challenges to achieve the desired performance with the targeted power consumption:

  1. understand the trade-offs between vector length, NoC bandwidth and memory bandwidth to maximize processing unit efficiency;
  2. an innovative on-die interconnect that can deliver enough bandwidth to the processing units, with minimum energy consumption;
  3. a high-bandwidth and low power memory solution with enough capacity and bandwidth for Exascale applications.

“The ambition of the consortium is to quickly industrialize our research. This is why we decided to rely on the Arm instruction set architecture (ISA), which is backed by a strong software ecosystem. By leveraging the current efforts, including the Mont-Blanc ecosystem and other international projects, we will benefit from the system software and applications required for successful usage,” explained Said Derradji, Atos, coordinator of the Mont-Blanc 2020 project.

About the Mont-Blanc 2020 project

Mont-Blanc 2020’s goal is to initiate a family of processors that will be the basis for European Big Data / High Performance Computing exascale systems, and that will achieve market adoption and economic sustainability.

The Mont-Blanc 2020 project is run by a European consortium that includes:

  • Atos / Bull, the European number one in Big Data and High Performance Computing (coordinator, France);
  • Arm, the world’s leading semiconductor IP company (United Kingdom);
  • Barcelona Supercomputing Centre, the national supercomputing centre in Spain;
  • CEA, the French Alternative Energies and Atomic Energy Commission;
  • Forschungszentrum Jülich, one of the largest interdisciplinary research institutions in Europe (Germany);
  • Kalray, a leading innovator with its supercomputing on a chip MPPA solutions (France);
  • SemiDynamics, a specialist in microprocessor architecture, front-end design and verification services (Spain).

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 779877.

Source: Mont-Blanc 2020

The post Mont-Blanc 2020 Project Looks to Pave the Way for New European Exascale Processors appeared first on HPCwire.

IBM Launches Commercial Quantum Network with Samsung, ORNL

Thu, 12/14/2017 - 10:31

In the race to commercialize quantum computing, IBM is one of several companies leading the pack. Today, IBM announced it had signed JPMorgan Chase, Daimler AG, Samsung and a number of other corporations to its IBM Q Network, which provides online access to IBM’s experimental quantum computing systems. IBM is also establishing regional research hubs at IBM Research in New York, Oak Ridge National Lab in Tennessee, Keio University in Japan, Oxford University in the United Kingdom, and the University of Melbourne in Australia.

IBM Q system control panel (photo: IBM)

Twelve organizations in total will be using the IBM prototype quantum computer via the company’s cloud service to accelerate quantum development as they explore a broad set of industrial and scientific applications. Other partners include JSR Corporation, Barclays, Hitachi Metals, Honda, and Nagase.

Partners currently have access to the 20 qubit IBM Q system, which IBM announced last month, but Big Blue is also building an operational prototype 50 qubit processor, which will be made available in next generation IBM Q systems. The partners will specifically be looking to identify applications that will elicit a quantum advantage, such that they perform better or faster on a quantum machine than a classical one.

IBM leadership believes we are at the dawn of the commercial quantum era. “The IBM Q Network will serve as a vehicle to make quantum computing more accessible to businesses and organizations through access to the most advanced IBM Q systems and quantum ecosystem,” said Dario Gil, vice president of AI and IBM Q, IBM Research in a statement. “Working closely with our clients, together we can begin to explore the ways big and small quantum computing can address previously unsolvable problems applicable to industries such as financial services, automotive or chemistry. There will be a shared focus on discovering areas of quantum advantage that may lead to commercial, intellectual and societal benefit in the future.”

Experts from the newly formed IBM Q Consulting will be able to provide support and offer customized roadmaps to help clients become quantum-ready, says IBM.

With IBM Q, IBM seeks to be the first tech company to deliver commercial universal quantum computing systems for and in tandem with industry and research users. Although today marks the start of its commercial network, IBM has been providing scientists, researchers, and developers with free access to IBM Q processors since May 2016 via the IBM Q Experience. According to the company, 60,000 registered users have collectively run more than 1.7 million experiments and generated over 35 third-party research publications.

To see some really cool photos of IBM’s quantum computing technology, check out their flickr stream here (it’s really not to be missed).

The post IBM Launches Commercial Quantum Network with Samsung, ORNL appeared first on HPCwire.

Lenovo and Intel to Deliver Next-Generation Supercomputer to Leibniz Supercomputing Center

Thu, 12/14/2017 - 09:21

RESEARCH TRIANGLE PARK, N.C., Dec. 14, 2017 — Lenovo (SEHK:0992) (Pink Sheets:LNVGY) Data Center Group and Intel will deliver a next-generation supercomputer to Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences in Munich, Germany. One of the foremost European computing centers for professionals in the scientific, research and academic communities, LRZ is tasked with managing not only exponential amounts of big data, but processing and analyzing that data quickly to accelerate research initiatives around the world. For example, the LRZ recently completed the world’s largest simulation of earthquakes and resulting tsunamis, such as the Sumatra-Andaman Earthquake. This research enables real-time scenario planning that can help predict aftershocks and other seismic hazards.

Upon its completion in late 2018, the new supercomputer (called SuperMUC-NG) will support LRZ in its groundbreaking research across a variety of complex scientific disciplines, such as astrophysics, fluid dynamics and life sciences, by offering highly available, secure and energy-efficient high-performance computing (HPC) services that leverage industry-leading technology optimized to address a broad range of scientific computing applications. The LRZ installation will also feature the 20-millionth server shipped by Lenovo, a significant milestone in the company’s data center history.

“Lenovo is committed to providing research institutions like LRZ with not only sheer computing power, but a true, end-to-end solution that can help effectively and efficiently solve critical humanitarian challenges. We’re pleased to be working on this next-generation project in partnership with Intel,” said Scott Tease, Executive Director, HPC and AI, Lenovo Data Center Group. “The new SuperMUC-NG installation will provide LRZ with greater compute power in a smaller data center footprint with drastically reduced energy usage through innovative water-cooling technology, offering researchers a comprehensive supercomputing solution that packs more performance than ever to accelerate critical research projects.”

The SuperMUC-NG will deliver a staggering 26.7 petaflop compute capacity powered by nearly 6,500 nodes of Lenovo’s recently-announced, next-generation ThinkSystem SD650 servers, featuring Intel Xeon Platinum processors with Intel Advanced Vector Extensions (Intel AVX 512), and interconnected with Intel Omni-Path Architecture. The new system will also include the integration of Lenovo Intelligent Computing Orchestrator (LiCO), a powerful management suite with an intuitive GUI that helps accelerate development of HPC and AI applications, as well as cloud-based components to empower LRZ researchers with the freedom to virtualize, process the vast amount of data sets and expediently share results with colleagues.

To address the often-astronomical operational expenses generated by high-performance computing (HPC) infrastructure, the new SuperMUC-NG supercomputer will benefit from Intel technical optimizations and also feature cutting-edge water cooling technology from Lenovo. In combination with Lenovo Energy Aware Run-Time (EAR) software, a technology that dynamically controls system infrastructure power while applications are still running, Lenovo’s comprehensive water-cooling technology delivers 45 percent greater electricity savings to LRZ as compared to a similar, standard air-cooled system. Together, these energy efficiency innovations will help further reduce the research center’s carbon footprint and total cost of ownership.

“Global research leaders like LRZ are driving insights that address not only some of the most complex problems we face, but that also make meaningful improvements in all of our lives,” said Trish Damkroger, Vice President of Technical Computing at Intel. “Intel offers the technical foundation that, when combined with the solution expertise of Lenovo, delivers the efficient performance and ease of programming to help LRZ’s researchers drive more discoveries with deeper analytics than have ever been possible before.”

Once operational, the LRZ SuperMUC-NG system is expected to place on the industry-wide TOP500 list.

About Lenovo

Lenovo (SEHK:0992) (Pink Sheets:LNVGY) is a US$43 billion global Fortune 500 company and a leader in providing innovative consumer, commercial, and data center technology. Our portfolio of high-quality, secure products and services covers PCs (including the legendary Think and multimode Yoga brands), workstations, servers, storage, networking, software (including ThinkSystem and ThinkAgile solutions), smart TVs and a family of mobile products like smartphones (including the Motorola brand), tablets and apps. Join us on LinkedIn, follow us on Facebook or Twitter (@Lenovo) or visit us at http://www.lenovo.com/.

Source: Lenovo

The post Lenovo and Intel to Deliver Next-Generation Supercomputer to Leibniz Supercomputing Center appeared first on HPCwire.

Second Industry-Wide, Multi-Vendor Plugfest Focused on NVMe Over Fibre Channel Fabric Completed by Fibre Channel Industry Association

Wed, 12/13/2017 - 13:48

MINNEAPOLIS, Dec. 13, 2017 — The Fibre Channel Industry Association (FCIA) today announced the completion of its second industry-wide multi-vendor plugfest focused on Non-Volatile Memory Express  (NVMe) over Fibre Channel (FC) Fabric and the first validation of the newly completed INCITS T11 FC-NVMe standard.

FCIA’s FC-NVMe plugfest was held during the NVM Express organization’s management interface (NVMe-MI) plugfest, the week of October 30, 2017, at the University of New Hampshire InterOperability Lab (UNH-IOL).  An independent provider of broad-based testing and standards conformance services for the networking industry, UNH-IOL has conducted more than 38 plugfests with FCIA over 18 years to test the continued development of FC technologies.

“The completion of this second FC-NVMe plugfest comes at a critical junction,” said FC-NVMe plugfest participant Mark Jones, president and chairman of the board, FCIA, and director, Technical Marketing and Performance, Broadcom Limited. “Major operating system vendors are just releasing support for NVMe over Fabrics and the FC-NVMe INCITS T11 standard is now complete and has been forwarded to INCITS. The FC-NVMe technology is on track to becoming as interoperable and reliable as previous generations of FC, while vastly improving performance for the next generation all-flash datacenters.”

With nine companies participating, the FCIA’s FC-NVMe plugfest featured conformance, error injection, multi-hop, and interoperability testing of FC-NVMe concurrently with Gen 6 32GFC and previous FC generation fabric switches and directors, utilizing datacenter-proven test tools and test methods.

Key accomplishments from this second FCIA-sponsored plugfest of FC-NVMe include:

  • Multiple vendor FC-NVMe initiator, switch, and target conformance and interoperability
  • Gen 6 16 and 32GFC fabric connectivity to a variety of market available NVMe drives
  • Data integrity validation over multi-vendor direct-connect and switched multi-hop fabric topologies
  • Error injection tests to validate correct FC-NVMe and FC recovery and data integrity
  • Concurrent NVMe and legacy SCSI traffic through the same FC fabric ports
  • FC-NVMe and FC over 32GFC long wave 10km single mode fiber inter-switch trunked ports
  • FC-NVMe packet inspection conformance analysis using advanced trace capture and analysis tools
  • Cross fabric inline trace based relative performance comparisons of FCP-SCSI and FC-NVMe
  • Multi-vendor high availability multi-speed concurrent FC-NVMe and FC fabric conformance and interoperability
  • Trials of the UNH-IOL’s NVMe over Fabrics conformance test suite of products for inclusion on the NVMe Integrator’s List

“It is an honor to lead and participate in plugfests with outstanding engineers from across the FC industry who share a single purpose of driving FC-NVMe technology to be interoperable at the highest level of dependability expected by the FC community,” said Barry Maskas, plugfest chair and Technical Staff consultant at Hewlett Packard Enterprise. “In comparison to the first FC-NVMe focused plugfest, results from this second event showed continued maturation demonstrated through tested use case configurations. Plugfest participants refined the validation, certification, and performance characterization foundation which has proven successful by previous FC technology innovations for the benefit of our collective customers.”

The nine companies participating in FCIA’s FC-NVMe plugfest were:

  • Amphenol Corporation
  • Brocade Communications Systems, Inc.
  • Broadcom Limited
  • Cisco Systems
  • Hewlett Packard Enterprise
  • QLogic Corporation, a Cavium, Inc. company
  • SANBlaze Technology, Inc.
  • Teledyne Technologies; LeCroy Corporation
  • Viavi Solutions Inc.

“The UNH-IOL has worked with the FCIA and its member companies for over 20 years on Fibre Channel’s latest technological enhancements,” said Timothy Sheehan, manager, Datacenter Technologies, UNH-IOL. “The IOL is focused on supporting Non-Volatile Memory Express (NVMe) Fabrics conformance and interoperability testing and has brought this experience to the FCIA member companies. The last two FCIA plugfest events have focused on FC-NVMe testing and have shown great progress.”

About FCIA

The Fibre Channel Industry Association (FCIA) is a non-profit international organization whose sole purpose is to act as the independent technology and marketing voice of the Fibre Channel industry. We are committed to helping member organizations promote and position Fibre Channel, and to provide a focal point for Fibre Channel information, standards advocacy, and education. FCIA members include manufacturers, system integrators, developers, vendors, industry professionals, and end users. Our member-led working groups and committees focus on creating and championing the Fibre Channel technology roadmaps, targeting applications that include data storage, video, networking, and storage area network (SAN) management. For more info, go to http://www.fibrechannel.org.

Source: FCIA

NVM Express, Inc. Debuts NVMe Over Fabrics Compliance Testing

Wed, 12/13/2017 - 12:58

BEAVERTON, Ore., Dec. 13, 2017 — NVM Express, Inc., the organization that developed the NVM Express (NVMe) and NVMe Management Interface (NVMe-MI) specifications for accessing solid-state drives (SSDs) on a PCI Express (PCIe) bus as well as over Fabrics, hosted its eighth NVMe Plugfest at the University of New Hampshire Interoperability Laboratory (UNH-IOL) in Durham, N.H., during the week of October 30 to November 2. The event offered the first official NVMe Over Fabrics (NVMe-oF) compliance and interoperability transport layer testing for Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE) and Fibre Channel.

The testing performed by the UNH-IOL, an independent testing provider of standards conformance solutions and multi-vendor interoperability, generated 14 new certified products for the base NVMe Integrators List and one for the NVMe-MI Integrators List. Eight inaugural products were also approved for the newly launched NVMe-oF Integrators List, which accepts RoCE initiators and targets, Ethernet switches, as well as Fibre Channel initiators, targets and switches, and software.

“Since 2013, the UNH-IOL has certified over 112 NVMe-based products at the NVM Express Plugfests,” David Woolf, senior engineer, Datacenter Technologies at the UNH-IOL, said. “By continuing to prioritize specification compliance and interoperability testing, companies can ensure faster time to market and seamless interactions with other NVMe-based devices and solutions.”

Attendance at the NVMe Plugfest included 63 engineers from 19 different companies focused on enterprise, client, cloud storage and test equipment manufacturing. Participating NVM Express member companies included Broadcom, Brocade, Cavium, Cisco, Intel, Lite-On, Mellanox, Microsemi, Oakgate, SANBlaze, Seagate, SerialTek, SK Hynix, Starblaze, Teledyne-LeCroy, Via Technologies, Viavi, Toshiba and Western Digital.

“The growth and success of NVMe Plugfests demonstrate maturing NVM Express technology and a readiness for Fabrics,” Ryan Holmquist, Chair of the NVMe Interoperability and Compliance Committee (ICC), said. “Our testing events have expanded as existing standards evolve, offering a diverse and multiplying set of transports to test NVMe technologies, from PCIe architecture to over SAN, from RoCE or Fibre Channel. Beyond technology growth, the Plugfests foster a collaborative environment for device analyzer, test equipment manufacturers and industry experts to discuss issues and exchange ideas.”

The next NVMe Plugfest will be held in spring 2018 at the UNH-IOL in Durham, N.H.

About NVM Express, Inc.

With more than 100 members, NVM Express, Inc. is a non-profit organization focused on enabling broad ecosystem adoption of high performance and low latency non-volatile memory (NVM) storage through a standards-based approach. The organization offers an open collection of NVM Express (NVMe) specifications and information to fully expose the benefits of non-volatile memory in all types of computing environments from mobile to data center. NVMe-based specifications are designed from the ground up to deliver high bandwidth and low latency storage access for current and future NVM technologies. For more information, visit http://www.nvmexpress.org. The NVM Express Promoter Group is comprised of the following member companies: Cisco, Dell EMC, Facebook, Intel, Micron, Microsemi, Microsoft, NetApp, Oracle, Samsung, Seagate, Toshiba, and Western Digital.

Source: NVM Express, Inc.

TACC Researchers Test AI Traffic Monitoring Tool in Austin

Wed, 12/13/2017 - 10:47

Traffic jams and mishaps are often painful and sometimes dangerous facts of life. At this week’s IEEE International Conference on Big Data being held in Boston, researchers from TACC and colleagues will present a new deep learning tool that uses raw traffic camera footage from City of Austin cameras to recognize objects – people, cars, buses, trucks, bicycles, motorcycles and traffic lights – and characterize how those objects move and interact.

The researchers from Texas Advanced Computing Center (TACC), the University of Texas Center for Transportation Research and the City of Austin have been collaborating to develop tools that allow sophisticated, searchable traffic analyses using deep learning and data mining. An account of the work (Artificial Intelligence and Supercomputers to Help Alleviate Urban Traffic Problems), written by Aaron Dubrow, was posted this week on the TACC website.

Their work is being tested in parts of Austin where cameras on signal lights automatically counted vehicles in a 10-minute video clip, and preliminary results showed that their tool was 95 percent accurate overall.

“We are hoping to develop a flexible and efficient system to aid traffic researchers and decision-makers for dynamic, real-life analysis needs,” said Weijia Xu, a research scientist who leads the Data Mining & Statistics Group at TACC. “We don’t want to build a turn-key solution for a single, specific problem. We want to explore means that may be helpful for a number of analytical needs, even those that may pop up in the future.” The algorithm they developed for traffic analysis automatically labels all potential objects from the raw data, tracks objects by comparing them with other previously recognized objects and compares the outputs from each frame to uncover relationships among the objects.

The team used the open-source YOLO library and neural network developed by University of Washington and Facebook researchers for real-time object detection. According to the team, this is the first time YOLO has been applied to traffic data. For the data analysis and query component, they incorporated HiveQL, a query language maintained by the Apache Software Foundation that lets individuals search and compare data in the system.

Once researchers had developed a system capable of labeling, tracking and analyzing traffic, they applied it to two practical examples: counting how many moving vehicles traveled down a road and identifying close encounters between vehicles and pedestrians.
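The article does not publish the team's code, so the following is only a minimal Python sketch of the label, track and analyze steps described above. It assumes detections (a label plus a bounding box) have already been produced by an object detector such as YOLO. The greedy overlap matching, the pixel thresholds and every function name here are illustrative assumptions, not the TACC implementation.

```python
from dataclasses import dataclass
from math import hypot

VEHICLES = {"car", "bus", "truck", "motorcycle"}  # assumed label names


@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def track(frames, min_iou=0.3):
    """Give each detection a track id by greedy IoU matching against
    the same-label detections of the previous frame (hypothetical rule)."""
    next_id, previous, tracked = 0, [], []
    for detections in frames:
        current, used = [], set()
        for det in detections:
            best_id, best_score = None, min_iou
            for tid, prev in previous:
                if tid in used or prev.label != det.label:
                    continue
                score = iou(prev.box, det.box)
                if score >= best_score:
                    best_id, best_score = tid, score
            if best_id is None:  # nothing overlapped enough: new object
                best_id, next_id = next_id, next_id + 1
            used.add(best_id)
            current.append((best_id, det))
        tracked.append(current)
        previous = current
    return tracked


def count_moving_vehicles(tracked, min_travel=20.0):
    """Count vehicle tracks whose center moved at least min_travel pixels."""
    first, last, labels = {}, {}, {}
    for frame in tracked:
        for tid, det in frame:
            first.setdefault(tid, center(det.box))
            last[tid] = center(det.box)
            labels[tid] = det.label
    return sum(
        1
        for tid in first
        if labels[tid] in VEHICLES
        and hypot(last[tid][0] - first[tid][0], last[tid][1] - first[tid][1]) >= min_travel
    )


def close_encounters(tracked, max_distance=50.0):
    """List (frame_index, vehicle_id, person_id) where centers are near."""
    events = []
    for i, frame in enumerate(tracked):
        for vt, vd in frame:
            if vd.label not in VEHICLES:
                continue
            for pt, pd in frame:
                if pd.label != "person":
                    continue
                (vx, vy), (px, py) = center(vd.box), center(pd.box)
                if hypot(vx - px, vy - py) <= max_distance:
                    events.append((i, vt, pt))
    return events
```

Calling track() on a list of per-frame detection lists and passing the result to count_moving_vehicles() or close_encounters() mirrors the two practical examples in the article. A production system would need a more robust matcher that handles occlusion and missed detections.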

“Current practice often relies on the use of expensive sensors for continuous data collection or on traffic studies that sample traffic volumes for a few days during selected time periods,” said Natalia Ruiz Juri, a research associate and director of the Network Modeling Center at UT’s Center for Transportation Research. “The use of artificial intelligence to automatically generate traffic volumes from existing cameras would provide a much broader spatial and temporal coverage of the transportation network, facilitating the generation of valuable datasets to support innovative research and to understand the impact of traffic management and operation decisions.”

Whether autonomous vehicles will mitigate the problem is an ongoing debate and Juri notes, “The highly anticipated introduction of self-driving and connected cars may lead to significant changes in the behavior of vehicles and pedestrians and on the performance of roadways. Video data will play a key role in understanding such changes, and artificial intelligence may be central to enabling comprehensive large-scale studies that truly capture the impact of the new technologies.”

Link to full article: https://www.tacc.utexas.edu/-/artificial-intelligence-and-supercomputers-to-help-alleviate-urban-traffic-problems

Link to video on the work: http://soda.tacc.utexas.edu

Images: TACC

Supermicro Announces Receipt of Extension from Nasdaq

Wed, 12/13/2017 - 09:14

SAN JOSE, Calif., Dec. 13, 2017 — Super Micro Computer, Inc. (NASDAQ:SMCI), a global leader in high-performance, high-efficiency server, storage technology and green computing, today announced that on December 11, 2017 it had received a letter from the Nasdaq Stock Market (“Nasdaq”) confirming that the Company has been granted an exception to enable the Company to regain compliance with the Nasdaq continued listing requirements. Pursuant to the terms of the exception, on or before March 13, 2018, the Company must file its Annual Report on Form 10-K for the fiscal year ended June 30, 2017 as well as its Quarterly Reports on Form 10-Q for the quarters ended September 30, 2017 and December 31, 2017.

Pursuant to Nasdaq rules, Super Micro’s securities will remain listed on the Nasdaq Global Select Market pending satisfaction of the terms of the exception. In the event the Company does not make the filings within the time period required, Nasdaq will provide written notification that the Company’s securities will be delisted. At that time, the Company may appeal Nasdaq’s determination to a Hearings Panel. Super Micro intends to take all necessary steps to achieve compliance with the Nasdaq continued listing requirements as soon as practicable.

About Super Micro Computer, Inc.

Supermicro, a global leader in high-performance, high-efficiency server technology and innovation, is a premier provider of end-to-end green computing solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and Embedded Systems worldwide. Supermicro’s advanced Server Building Block Solutions® offer a vast array of components for building energy-efficient, application-optimized, computing solutions. Architecture innovations include Twin, TwinPro, FatTwin, Ultra Series, MicroCloud, MicroBlade, SuperBlade, Double-sided Storage, Battery Backup Power (BBP) modules and WIO/UIO.

Source: Super Micro Computer, Inc.

AMD Wins Another: Baidu to Deploy EPYC on Single Socket Servers

Wed, 12/13/2017 - 06:30

When AMD introduced its EPYC chip line in June, the company said a portion of the line was specifically designed to re-invigorate a single socket segment in what has become an overwhelmingly two-socket landscape in the data center. Today, AMD and Baidu announced that China’s giant internet provider would offer AI, big data, and cloud computing services on EPYC-based single socket solutions.

This deal follows last week’s announcement that Microsoft Azure would offer EPYC-based instances (see HPCwire article, Azure Debuts AMD EPYC Instances for Storage Optimized Workloads). The EPYC line’s high memory bandwidth and I/O capacity make it well suited for many areas but especially for storage servers. AMD is working to ensure EPYC doesn’t become stereotyped by this perception.

“You have probably seen in the industry a fair number of single socket platforms from us but they have tended to be more on the storage optimized or GPU optimized,” said Scott Aylor, AMD corporate vice president and general manager of Enterprise Solutions. For example, HPE introduced a storage optimized server, CL3150, using a single socket EPYC design. “Given the variety of services that Baidu deploys, including storage but also others, I want people to know this is really a compute oriented platform,” said Aylor.

It’s clear AMD is targeting price-performance points that it hopes Intel will find difficult to match and that will help AMD reclaim chunks of the x86 data center market after a lengthy absence. The single socket gambit is an important part of the strategy as was made clear by Aylor at the June launch.

“We can build a no compromise one-socket offering that will allow us to cover up to 50 percent of the two-socket market that is today held by the [Intel Broadwell] E5-2650 and below.

“In our one socket offering we have come up with a clever way to maintain all of the I/O capabilities that you would get in a two socket as well as the full complement of eight memory channels. Today people buy two socket, sometimes because they need to, but more often than not because they have to. There are many examples in which I/O rich [workloads] like storage, like GPU compute, and some vertical workloads where people don’t necessarily need two sockets from a CPU performance perspective,” said Aylor.

AMD contends the EPYC processor will deliver 2.6X the I/O density of competitive[i] solutions and enable Baidu to achieve a level of scale and efficiency unrivaled in high-performance x86. “The combination of performance from the EPYC processor cores, and compute and I/O density packaged in a single-socket configuration, provides the ideal platform for Baidu’s next generation cloud services,” according to AMD.

“By offering outstanding performance in single-processor systems, the AMD EPYC platform provides flexibility and high-performance in our datacenter, which allows Baidu to deliver more efficient services to our customers,” said Liu Chao, senior director, Baidu System Technologies Department in the official release.

Again, from the EPYC launch in June, Aylor said, “We’ve selectively optimized a couple of SKUs for one socket only. So these are SKUs that are one socket capable only.” As an example of how the one socket and two socket offerings are distinguished, he cited the on-package interconnect, “The infinity fabric that would normally connect the two sockets in a two socket system, we repurpose that interconnect into more I/O lanes and that’s how you have in a two socket solution 128 lanes of PCIe and in a one socket solution you still keep the same level of connectivity.”

Today’s announcement punctuates what has been a heady year for AMD. Adoption of the single socket solution by Baidu is another demonstration of market traction and according to AMD, Baidu expects to expand its use of EPYC processors across its global datacenters beginning in the first quarter of 2018.

“This announcement with Baidu and the fact that it is AI, big data, and cloud; those are all computing oriented workloads. So think about the point we raised when we first launched [which] is we now can take what has been part of the mainstream of the market and everything that historically has been the [Intel] E5-2650 and below, and really, looking at the [Skylake] Silver and Gold today from [Intel], we can really address that now with a single socket platform,” said Aylor.

It will be interesting to watch how big a swath AMD’s single socket initiative can cut in the competitive data center market. Aylor said more and more varied single socket EPYC-based offerings are coming, but didn’t specify from whom or when.

[i] Information supplied by AMD: AMD EPYC™ processor supports up to 128 PCIe Gen 3 I/O lanes (in both 1 and 2-socket configuration), versus the Intel Xeon SP Series processor supporting a maximum of 48 lanes PCIe Gen 3 per CPU, plus 20 lanes in the I/O chip (max of 68 lanes on 1 socket and 96 lanes on 2 socket). NAP-56
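The lane counts in the footnote can be compared directly. The short Python sketch below only divides the numbers AMD supplied. The footnote does not say which comparison produces the 2.6X figure, so the groupings and names used here are assumptions made for illustration; the 48-lane per-CPU case comes closest, at roughly 2.67.

```python
# PCIe Gen 3 lane counts as quoted in AMD's footnote.
EPYC_LANES = 128          # 1-socket or 2-socket EPYC
XEON_CPU_LANES = 48       # per Xeon SP CPU
XEON_1S_TOTAL = 68        # 48 CPU lanes + 20 I/O chip lanes
XEON_2S_TOTAL = 96        # 2-socket maximum given in the footnote


def lane_ratio(epyc_lanes, other_lanes):
    """How many times more lanes EPYC offers than the other platform."""
    return epyc_lanes / other_lanes


ratios = {
    "vs. one Xeon SP CPU": lane_ratio(EPYC_LANES, XEON_CPU_LANES),
    "vs. 1-socket Xeon SP with I/O chip": lane_ratio(EPYC_LANES, XEON_1S_TOTAL),
    "vs. 2-socket Xeon SP": lane_ratio(EPYC_LANES, XEON_2S_TOTAL),
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```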

Microsoft Wants to Speed Quantum Development

Tue, 12/12/2017 - 16:13

Quantum computing continues to make headlines in what remains of 2017 as tech giants jockey to establish a pole position in the race toward commercialization of quantum. This week, Microsoft took the next step in advancing its vision for the future of computing that it says will spur major advances in artificial intelligence and address humanity’s biggest challenges such as world hunger and climate change.

On Monday, Microsoft unveiled its custom Q# (Q-sharp) programming language as part of its effort to build an end-to-end topological quantum computing system suitable for commercial purposes. Along with a simulator for debugging and testing quantum code, Q# is included in Microsoft’s Quantum Development Kit, first announced by the company in September.

“Designed ground up for quantum, Q# is the most approachable high-level programming language with a native type system for qubits, operators, and other abstractions,” says Microsoft. “It is fully integrated with Visual Studio, enabling a complete professional enterprise-grade development tooling system for the fastest path to quantum programming efficiency.”

Using the local quantum simulator on a standard laptop, developers will be able to simulate up to 30 logical qubits, according to Microsoft. For developers who want to go beyond that, Microsoft is offering an Azure-based simulator that supports simulations above 40 logical qubits.
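One plausible reason for the gap between the laptop and Azure limits is memory. A full state-vector simulator stores 2^n complex amplitudes for n qubits, so every added qubit doubles the footprint. The Python sketch below assumes that representation with 16-byte double-precision complex amplitudes. The article does not describe how Microsoft's simulators are implemented, so treat the numbers as a rough estimate rather than a description of the product.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to hold every amplitude of an n-qubit state.

    Assumes one complex number (two 8-byte floats) per basis state.
    """
    return (2 ** num_qubits) * bytes_per_amplitude


def human_readable(num_bytes):
    """Format a byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.0f} {unit}"
        value /= 1024


for n in (20, 30, 40):
    print(n, "qubits:", human_readable(state_vector_bytes(n)))
```

Under these assumptions, 30 qubits need about 16 GiB, which is within reach of a well-equipped laptop, while 40 qubits need about 16 TiB, which is cluster territory.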

The preview version of the development kit is available at no charge and comes with documentation, libraries and sample programs. Microsoft said that the kit will “give people the background they need to start playing around with aspects of computing that are unique to quantum systems, such as quantum teleportation.”

According to the company, programs created for the simulator will be transferable to a real topological machine, which Microsoft is in the process of developing. Microsoft’s approach to building a universal quantum computer is centered on the topological qubit, purported to be more stable than other qubit implementations. Most approaches to quantum computing require massive amounts of error correction such that a useful device could require 10 physical qubits to achieve one logical qubit, potentially pushing up the number of physical qubits into the tens of thousands. Researchers propose that the topological qubit naturally resists decoherence and therefore requires less error correction. Conceivably this would make it possible to build a quantum machine with fewer physical qubits.

In the video below, Krysta Svore, principal researcher at Microsoft, demonstrates the new Microsoft Quantum Development Kit.

Lots of good info to get started here — https://docs.microsoft.com/en-us/quantum/index?view=qsharp-preview
