DENVER, Colo., Feb. 23 — Applications are now being accepted for the Student Volunteers program at the SC17 conference to be held Nov. 12-17 in Denver. Both undergraduate and graduate students are encouraged to apply.
Students will be required to work a minimum number of hours during the conference, giving them time to engage in important education and career-advancing activities such as tutorials, technical talks, panels, poster sessions and workshops. Student Volunteers help with the administration of the conference and have the opportunity to participate in student-oriented activities, including professional development workshops, technical talks by famous researchers and industry leaders, exploring the exhibits and developing lasting peer connections.
The Student Volunteers program will accept a large number of students, both domestic and international, with the goal of transitioning students into the main conference by way of the Technical Program, Doctoral Showcase and Early Career professional development sessions.
Being a Student Volunteer can be transformative, from helping to find internships to deciding to pursue graduate school. Read about how Ather Sharif’s Student Volunteer experience inspired him to enroll in a Ph.D. program.
The deadline to apply is June 15.
The post Applications Now Open for Student Volunteers at SC17 Conference appeared first on HPCwire.
FRANKFURT, Germany, Feb. 23 — The organizers of the ISC High Performance conference are very pleased to introduce data scientist, Prof. Dr. Jennifer Tour Chayes, the managing director and co-founder of Microsoft Research New England and Microsoft Research New York City, as the ISC 2017 conference keynote speaker. Her talk will be titled “Network Science: From the Massive Online Networks to Cancer Genomics.”
She will be speaking at 9 am, which is right after the opening session on Monday, June 19. This year’s ISC High Performance conference will be held at Messe Frankfurt from June 18 – 22, and will be attended by over 3,000 HPC community members, including researchers, scientists and business leaders.
In her keynote abstract, Chayes sets up her topic as follows:
“Everywhere we turn these days, we find massive data sets that are appropriately described as networks. In the high tech world, we see the Internet, the World Wide Web, mobile phone networks, a variety of online social networks like Facebook and LinkedIn, and massive online networks of users and products like Netflix and Amazon. In economics, we are increasingly experiencing both the positive and negative effects of a global networked economy. In epidemiology, we find disease spreading over our ever growing social networks, complicated by mutation of the disease agents. In biomedical research, we are beginning to understand the structure of gene regulatory networks, with the prospect of using this understanding to manage many human diseases.”
Chayes is one of the inventors of the field of graphons, which are graph functions now widely used for machine learning of massive networks. She will briefly introduce some of the models she and her collaborator are using to describe these networks, the processes they are studying on them, the algorithms they have devised for them, and finally, the methods they use to indirectly infer latent network structure from measured data and to derive insights from those networks.
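As a rough illustration of the idea, a graphon can be viewed as a symmetric function W(x, y) on the unit square that gives the probability of an edge between two nodes with latent values x and y. The Python sketch below samples a random graph that way. The graphon W(x, y) = x * y and the function names are assumptions made up for this example; they are not models taken from Chayes's work.

```python
import random

def example_graphon(x, y):
    # Hypothetical graphon chosen only for illustration: W(x, y) = x * y.
    return x * y

def sample_graph(n, graphon, seed=0):
    """Sample an n-node undirected graph from a graphon.

    Each node i gets a latent value drawn uniformly from [0, 1); each
    pair (i, j) is then joined independently with probability
    graphon(latent[i], latent[j]).
    """
    rng = random.Random(seed)
    latent = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(latent[i], latent[j]):
                edges.add((i, j))
    return latent, edges

latent, edges = sample_graph(200, example_graphon)
density = len(edges) / (200 * 199 / 2)
print(f"{len(edges)} edges, density {density:.3f}")
```

Swapping in a different symmetric function for example_graphon changes the kind of network produced, which is what makes graphons a convenient way to describe families of large graphs.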
“I’ll discuss in some detail two particular applications: the very efficient machine learning algorithms for doing collaborative filtering on massive sparse networks of users and products, like the Netflix network; and the inference algorithms on cancer genomic data to suggest possible drug targets for certain kinds of cancer,” explains Chayes.
She joined Microsoft Research in 1997, when she co-founded the Theory Group. She is the co-author of over 135 scientific papers and the co-inventor of more than 30 patents. Her research areas include phase transitions in discrete mathematics and computer science, structural and dynamical properties of self-engineered networks, graph theory, graph algorithms, algorithmic game theory, and computational biology.
Chayes holds a BA in biology and physics from Wesleyan University, where she graduated first in her class, and a PhD in mathematical physics from Princeton. She did postdoctoral work in the Mathematics and Physics Departments at Harvard and Cornell. She is the recipient of the NSF Postdoctoral Fellowship, the Sloan Fellowship, the UCLA Distinguished Teaching Award, and the ABI Women of Leadership Vision Award. She has twice been a member of the IAS in Princeton. Chayes is a Fellow of the American Association for the Advancement of Science, the Fields Institute, the Association for Computing Machinery, and the American Mathematical Society, and an elected member of the American Academy of Arts and Sciences. She is the winner of the 2015 John von Neumann Award, the highest honor of the Society for Industrial and Applied Mathematics. In 2016, Chayes received an Honorary Doctorate from Leiden University.
2017 Conference Registration Opens March 1
The organizers are also pleased to announce that the early-bird registration for this year’s conference and exhibition will open March 1. By registering early, attendees will be able to save money and secure their choice of hotels. For ISC 2017 partner hotels and special rates, please look at Frankfurt Hotels under Travel & Stay.
About ISC High Performance
First held in 1986, ISC High Performance is the world’s oldest and Europe’s most important conference and networking event for the HPC community. It offers a strong five-day technical program focusing on HPC technological development and its application in scientific fields, as well as its adoption in commercial environments.
Over 400 hand-picked expert speakers and 150 exhibitors, consisting of leading research centers and vendors, will greet attendees at ISC High Performance. A number of events complement the Monday – Wednesday keynotes, including the Distinguished Speaker Series, the Industry Track, the Machine Learning Track, Tutorials, Workshops, the Research Paper Sessions, Birds-of-a-Feather (BoF) Sessions, Research Poster, the PhD Forum, Project Poster Sessions and Exhibitor Forums.
Source: ISC High Performance
The post Microsoft Researcher Tapped for Opening Keynote at ISC 2017 appeared first on HPCwire.
Feb. 23 — CIARA, a global technology provider specializing in the design, development, manufacturing, integration and support of cutting-edge products and services, announced today that it is an official Intel Technology Provider, HPC Data Center Specialist. This recognition certifies CIARA as an expert in delivering innovative high performance computing technology, built on the latest Intel processors, to customers.
“We are proud to have achieved Intel Technology Provider HPC Data Center Specialist status,” said Shannon Shragie, Director of Product Management at CIARA. “Our collaboration with Intel HPC experts enables us to leverage Intel test tools, reducing our research and development costs, ensuring the highest quality, and offering customers the lowest total cost of ownership.”
This certification recognizes CIARA’s history of technology excellence in the design and deployment of HPC solutions, including high performance computing clusters, performance-optimized servers, storage solutions, GPU platforms and large-scale data center solutions for businesses worldwide. CIARA’s total end-to-end data center solution includes design, rack and stack, on-site rack deployment, hardware support and recycling.
CIARA works with Intel to develop solutions using the latest technologies and architectures. Achieving HPC Data Center Specialist status means Intel has validated CIARA as a trusted partner.
Founded in 1984, CIARA is a global technology provider that specializes in the design, engineering, manufacturing, integration, deployment, support and recycling of cutting-edge IT products. With its vast range of products and services including desktops, workstations, servers and storage, HPC products, high frequency servers, OEM services, deployment services, colocation services and IT asset disposition services, CIARA is considered to be one of the largest system manufacturers in North America and the only provider capable of offering a total hardware lifecycle management solution. The company’s products are employed worldwide by organizations small to large in the sectors of public cloud, content delivery, finance, aerospace, engineering, transportation, energy, government, education and defense.
The post CIARA Achieves Platinum Intel Technology Provider, HPC Data Center Specialist Status appeared first on HPCwire.
Feb. 23 — As with many fields, computing is changing how geologists conduct their research. One example: the emergence of digital rock physics, where tiny fragments of rock are scanned at high resolution, their 3-D structures are reconstructed, and this data is used as the basis for virtual simulations and experiments.
Digital rock physics complements the laboratory and field work that geologists, petroleum engineers, hydrologists, environmental scientists, and others traditionally rely on. In specific cases, it provides important insights into the interaction of porous rocks and the fluids that flow through them that would be impossible to glean in the lab.
In 2015, the National Science Foundation (NSF) awarded a team of researchers from The University of Texas at Austin and the Texas Advanced Computing Center (TACC) a two-year, $600,000 grant to build the Digital Rocks Portal where researchers can store, share, organize and analyze the structures of porous media, using the latest technologies in data management and computation.
“The project lets researchers organize and preserve images and related experimental measurements of different porous materials,” said Maša Prodanović, associate professor of petroleum and geosystems engineering at The University of Texas at Austin (UT Austin). “It improves access to them for a wider geosciences and engineering community and thus enables scientific inquiry and engineering decisions founded on a data-driven basis.”
The grant is a part of EarthCube, a large NSF-supported initiative that aims to create an infrastructure for all available Earth system data to make the data easily accessible and useable.
Small pores, big impacts
The small-scale material properties of rocks play a major role in their large-scale behavior – whether it is how the Earth retains water after a storm or where oil might be discovered and how best to get it out of the ground.
As an example, Prodanović points to the limestone rock above the Edwards Aquifer, which underlies central Texas and provides water for the region. Fractures occupy about five percent of the aquifer rock volume, but these fractures tend to dominate the flow of water through the rock.
“All of the rain goes through the fractures without accessing the rest of the rock. Consequently, there’s a lot of flooding and the water doesn’t get stored,” she explained. “That’s a problem in water management.”
Digital rocks physicists typically perform computed tomography (CT) scans of rock samples and then reconstruct the material’s internal structure using computer software. Alternatively, a branch of the field creates synthetic, virtual rocks to test theories of how porous rock structures might impact fluid flow.
In both cases, the three-dimensional datasets that are created are quite large — frequently several gigabytes in size. This leads to significant challenges when researchers seek to store, share and analyze their data. Even when data sets are made available, they typically only live online for a matter of months before they are erased due to space issues. This impedes scientific cross-validation.
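To make the scale concrete, the Python sketch below shows two simple calculations on a reconstructed rock volume: the porosity (the fraction of voxels that are pore space) and the uncompressed size of a scan. It assumes the volume has already been segmented into 1 for pore and 0 for solid grain, and the 1024-cubed, 2-bytes-per-voxel scan is an illustrative assumption rather than a figure from the Digital Rocks Portal. The function names are hypothetical.

```python
def porosity(volume):
    """Fraction of voxels marked as pore space (1) in a segmented 3-D volume.

    `volume` is a nested list indexed as volume[z][y][x], holding 1 for
    pore and 0 for solid grain.
    """
    total = 0
    pores = 0
    for plane in volume:
        for row in plane:
            total += len(row)
            pores += sum(row)
    return pores / total

def raw_size_gib(nx, ny, nz, bytes_per_voxel):
    """Uncompressed size of an nx * ny * nz scan in GiB."""
    return nx * ny * nz * bytes_per_voxel / 2**30

# Toy 2 x 2 x 2 volume with 2 pore voxels out of 8.
toy = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
]
print(porosity(toy))                      # 0.25
print(raw_size_gib(1024, 1024, 1024, 2))  # 2.0
```

Under these assumptions, even a single modest scan occupies a couple of gigabytes before any derived data or simulation output is added.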
Furthermore, scientists often want to conduct studies that span multiple length scales — connecting what occurs at the micrometer scale (a millionth of a meter: the size of individual pores and grains making up a rock) to the kilometer scale (the level of a petroleum reservoir, geological basin or aquifer), but cannot do so without available data.
The Digital Rocks Portal helps solve many of these problems.
James McClure, a computational scientist at Virginia Tech, uses the Digital Rocks Portal to access the data he needs to perform large-scale fluid flow simulations and to share data directly with collaborators.
“The Digital Rocks Portal is essential to share and curate experimentally-generated data, both of which are essential to allow for re-analyses and reproducibility,” said McClure. “It also provides a mechanism to enable analyses that span multiple data sets, which researchers cannot perform individually.”
The Portal is still young, but its creators hope that, over time, material studies at all scales can be linked together and results can be confirmed by multiple studies.
“When you have a lot of research revolving around a five-millimeter cube, how do I really say what the properties of this are on a kilometer scale?” Prodanović said. “There’s a big gap in scales and bridging that gap is where we want to go.”
A framework for knowledge sharing
When the research team was preparing the Portal, they visited the labs of numerous research teams to better understand the types of data researchers collected and how they naturally organized their work.
Though there was no domain-wide standard, there were enough commonalities to enable them to develop a framework that researchers could use to input their data and make it accessible to others.
“We developed a data model that ended up being quite intuitive for the end-user,” said Maria Esteva, a digital archivist at TACC. “It captures features that illustrate the individual projects but also provides an organizational schema for the data.”
The entire article can be found here.
Source: Aaron Dubrow, TACC
The post Supercomputer-Powered Portal Provides Data, Simulations to Geology and Engineering Community appeared first on HPCwire.
OAK RIDGE, Tenn., Feb. 23 — The Department of Energy’s Oak Ridge National Laboratory has announced the latest release of its Adaptable I/O System (ADIOS), a middleware that speeds up scientific simulations on parallel computing resources such as the laboratory’s Titan supercomputer by making input/output operations more efficient.
While ADIOS has long been used by researchers to streamline file reading and writing in their applications, the production of data in scientific computing is growing faster than I/O can handle. Reducing data “on the fly” is critical to keep I/O up to speed with today’s largest scientific simulations and realize the full potential of resources such as Titan to make real-world scientific breakthroughs. And it’s also a key feature in the latest ADIOS release.
“As we approach the exascale, there are many challenges for ADIOS and I/O in general,” said Scott Klasky, scientific data group leader in ORNL’s Computer Science and Mathematics Division. “We must reduce the amount of data being processed and program for new architectures. We also must make our I/O frameworks interoperable with one another, and version 1.11 is the first step in that direction.”
The upgrade boasts a number of improvements aimed at ensuring these challenges are met, including:
- a simplified write application programming interface (API) that reduces complexity via introduction of a novel buffering technique;
- lossy compression with ZFP, a software from Peter Lindstrom at Lawrence Livermore National Laboratory, that reduces the size of data on storage;
- a query API with multiple indexing/query methods, from John Wu at Lawrence Berkeley National Laboratory and Nagiza Samatova of North Carolina State University;
- a “bprecover” utility for resilience that exploits the ADIOS file format’s multiple copies of metadata;
- in-memory time aggregation for file-based output, allowing for efficient I/O with difficult write patterns;
- novel Titan-scale-supported staging from Manish Parashar at Rutgers University; and
- a laundry list of various other performance improvements.
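The idea behind in-memory time aggregation can be sketched in a few lines of Python: rather than touching the file system on every timestep, the writer holds several steps in memory and writes them out together. The class below is a hypothetical illustration of that general technique only. It does not use or reproduce the ADIOS API, and the class name, its parameters and the buffer size are assumptions.

```python
import os
import tempfile

class TimeAggregatingWriter:
    """Hypothetical writer that buffers several timesteps before one file write."""

    def __init__(self, path, steps_per_flush):
        self.path = path
        self.steps_per_flush = steps_per_flush
        self.buffer = []       # timesteps currently held in memory
        self.write_calls = 0   # number of times the file was actually written

    def put(self, step_bytes):
        # Keep the step in memory; only write once enough steps accumulate.
        self.buffer.append(step_bytes)
        if len(self.buffer) >= self.steps_per_flush:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "ab") as f:
            f.write(b"".join(self.buffer))
        self.write_calls += 1
        self.buffer.clear()

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        writer = TimeAggregatingWriter(path, steps_per_flush=4)
        for step in range(10):
            writer.put(step.to_bytes(8, "little"))  # one tiny "timestep"
        writer.flush()  # write any steps still buffered
        return writer.write_calls, os.path.getsize(path)
    finally:
        os.remove(path)

calls, size = demo()
print(calls, size)  # 3 80
```

Here ten small timesteps reach the file in three writes instead of ten, which is the kind of saving that matters when each write is expensive on a parallel file system.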
These modifications represent the latest evolution in ADIOS’s journey from research to production, as version 1.11 now makes it easier to move data from one code to another. ADIOS’s user base has gone from just a single code to hundreds of parallel applications spread across dozens of domain areas.
“ADIOS has been a vital part of our large-scale XGC fusion code,” said Choong-Seock Chang, head of the Center for Edge Physics Simulation at Princeton Plasma Physics Laboratory. “With the continuous version updates, the performance of XGC keeps getting better; during one of our most recent ITER runs, we were able to further accelerate the I/O, which enabled new insights into our scientific results.”
ADIOS’s success in the scientific community has led to its adoption among several industrial applications seeking more efficient I/O. Demand for ADIOS has grown enough that the development team is now partnering with Kitware, a world leader in data visualization infrastructure, to construct a data framework for the scientific community that will help tackle the data location and reduction problems plaguing parallel scientific computing and likely further grow ADIOS’s user base.
Throughout its evolution, ADIOS’s development team has ensured that the middleware remains fast, concurrent, scalable, portable, and perhaps most of all, resilient (the bprecover utility in 1.11 allows for the recovery of uncorrupted data). According to Klasky, being part of the DOE national lab system was critical to ensuring the scalability of the ever-growing platform, an asset that will remain critical as ORNL moves towards the exascale.
Because exascale hardware is widely expected to be disruptive, particularly in terms of incredibly fast nodes that will make it difficult for networks and I/O to keep up, researchers are preparing now for the daunting I/O challenge to come.
ADIOS was one of four ORNL-led software development projects to receive funding from the Exascale Computing Project, a collaborative effort between the DOE’s Office of Science and the National Nuclear Security Administration to develop a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures, and workforce to meet the scientific and national security mission needs of DOE in the mid-2020s timeframe.
The award is a testament to ADIOS’s ability in making newer technologies sustainable, usable, fast, and interoperable – so that they will all be able to read from and possibly write to other important file formats.
As the journey to exascale continues, ADIOS’s unique I/O capabilities will be necessary to ensure that the world’s most powerful computers, and the applications they host, can continue to facilitate scientific breakthroughs impossible through experimentation alone.
“With ADIOS we saw a 20-fold increase in I/O performance compared to our best previous solution,” said Michael Bussmann, a junior group leader in computational radiation physics at Helmholtz-Zentrum Dresden-Rossendorf. “This made it possible to take full snapshots of the simulation, enabling us to study our laser-driven particle accelerator from the single-particle level to the full system. It is a game changer, going from 20 minutes to below one minute for a snapshot.”
The Titan supercomputer is part of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.
ORNL is managed by UT-Battelle for DOE’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.
The post ADIOS Version 1.11 Moves I/O Framework from Research to Production appeared first on HPCwire.
The performance of trade and match servers can be a critical differentiator for financial trading houses. Latency is often the most problematic bottleneck affecting an institution’s ability to quickly match and complete trades. Earlier this month, HPE’s ProLiant XL170r Gen9 Trade & Match Servers demonstrated the lowest max latency of any systems tested using the STAC-N1 test, according to STAC.
Compared to all other public STAC-N1 reports of Ethernet-based SUTs, this SUT (stack under test) demonstrated:
- The lowest max latency at both the base rate (100K messages per second) and the highest rate tested (1M messages per second). Max at 1 million messages per second was 18 microseconds vs. the previous best 51 microseconds (SUT ID SFC141110).
- The lowest mean latency at both the base rate and the highest rate tested.
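For a sense of scale, the reported max latencies (18 microseconds versus the previous best of 51) can be turned into an improvement factor with a short calculation. The Python sketch below does that, and also shows how mean and max are derived from a set of samples. The helper names and the sample values are made up for illustration and are not STAC-N1 data.

```python
def improvement(previous_us, current_us):
    """Return (improvement factor, percent reduction) between two latencies."""
    factor = previous_us / current_us
    reduction = 100.0 * (previous_us - current_us) / previous_us
    return factor, reduction

def summarize(samples_us):
    """Mean and max of a list of latency samples in microseconds."""
    return sum(samples_us) / len(samples_us), max(samples_us)

factor, reduction = improvement(previous_us=51, current_us=18)
print(f"{factor:.2f}x lower max latency, a {reduction:.1f}% reduction")

# Made-up samples: one slow outlier sets the max while the mean stays lower.
mean_us, max_us = summarize([5, 5, 6, 5, 18])
print(mean_us, max_us)  # 7.8 18
```

Max latency is reported separately from the mean because a single slow message, as in the made-up samples, can matter a great deal to a trading system even when typical latency is low.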
STAC notes, “Because STAC-N1 is not tied to a particular network API, it can be used to compare stacks using different APIs (for example, UDP/Ethernet vs. RDMA/Infiniband). However, STAC-N1 is often used to compare different stacks using the same API (for example, UDP with one vendor’s NIC and driver vs. UDP with another vendor’s NIC and driver). When making the latter type of comparison, it is essential that the SUTs you are comparing used the same STAC-N1 binding.”
In this instance, the stack under test consisted of UDP over 10GbE using OpenOnload on RHEL 6.6 with Solarflare SFN 8522-PLUS Adapters on HPE ProLiant XL170r Gen9 Trade & Match Servers.
Test details: “The STAC-N benchmark was performed on two of the 4 HPE ProLiant XL170r Gen9 Servers in a 2U HPE Apollo r2600 chassis, a component of the HPE Apollo 2000 System. The HPE Apollo Trade and Match Server Solution is designed to minimize system latency, utilizing the HPE Apollo 2000 system that has been optimized for applications performing best at maximum frequency and with lower core count. This solution utilizes custom tools to enable over-clocked processors for improved performance specifically for high-frequency trading operations. The HPE Trade and Match Server Solution building block is based on the HPE Apollo 2000 System. Each chassis can accommodate up to four HPE ProLiant XL170r Gen9 Server trays, supporting one Intel Xeon E5-1680v3 processor each.
“The STAC N1 Benchmark exercised two of four HPE ProLiant XL170r Gen9 Servers in a 2U HPE Apollo r2600 Chassis, with each server configured with one Intel E5-1680v3 processor and eight 32 GiB DIMMs. The chassis was configured with two 1400W Power supplies.”
In this recurring feature, we’ll provide you with financial highlights from companies in the HPC industry. Check back in regularly for an updated list with the most pertinent fiscal information.
Cray (NASDAQ: CRAY)
Cray has reported fourth quarter and full year 2016 financial results. Total revenue for the year was $629.8 million, a decrease of nearly $95 million from the year prior ($724.7 million). Net income was $10.6 million for the year, or $0.26 per diluted share.
For the fourth quarter, sales reached $346.6 million, a substantial increase from the same quarter of 2015 ($267.5 million). Net income for the quarter was $51.8 million, or $1.27 per diluted share.
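The year-over-year changes implied by the reported Cray revenue figures can be checked with a few lines of arithmetic. The Python sketch below uses only the dollar amounts quoted in this section (in millions); the helper name is hypothetical.

```python
def yoy_change(current, prior):
    """Return (absolute change, percent change) from prior to current."""
    delta = current - prior
    return delta, 100.0 * delta / prior

full_year = yoy_change(629.8, 724.7)       # 2016 vs. 2015 revenue, $M
fourth_quarter = yoy_change(346.6, 267.5)  # Q4 2016 vs. Q4 2015 revenue, $M

print(f"Full year: {full_year[0]:+.1f}M ({full_year[1]:+.1f}%)")            # Full year: -94.9M (-13.1%)
print(f"Q4: {fourth_quarter[0]:+.1f}M ({fourth_quarter[1]:+.1f}%)")         # Q4: +79.1M (+29.6%)
```

The full-year figure is lower than the prior year even though the fourth quarter grew strongly.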
“While 2016 wasn’t nearly as strong as we originally targeted, we finished the year well, with the largest revenue quarter in our history and solid cash balances, as well as delivering profitability for the year,” said Peter Ungaro, president and CEO of Cray. “We completed numerous large system installations around the world in the fourth quarter, providing our customers with the most scalable, highest performance supercomputing, storage and analytics solutions in the market. We continue to lead the industry at the high-end and, despite an ongoing downturn in the market, we’re in excellent position to continue to deliver for our customers and drive long-term growth.”
Super Micro Computer (NASDAQ: SMCI)
Supermicro has announced second quarter 2017 financial results. The company reported quarterly net sales of $652 million, which was an increase of 23.3% from the first quarter of the year and up 2% from the same quarter of 2016. GAAP net income was $22 million, up 62.5% from the first quarter and down 36.6% from the same quarter of 2016. Server offerings accounted for 68.1% of the total revenue.
For the third quarter of 2017, Supermicro expects $570-$630 million in net sales and GAAP earnings per diluted share to sit between $0.34 and $0.42. For more information, click here.
“We are pleased to report record second quarter revenues of $652.0 million that exceeded our guidance and outpaced a strong compare with last year. Contributing to this strong growth was our Twin family product line including our FatTwin, Storage, HPC, MicroBlade, and strong growth from enterprise cloud and Asia Pacific, particularly China. Component shortages and pricing, product and geographic mix adversely impacted gross margins while improved leverage allowed us to deliver stronger operating margins from last quarter,” said Charles Liang, chairman and CEO. “We expect to continue the growth of last quarter and be reflected in the year-over-year revenue growth in the March quarter based on an increasing number of sizable customer engagements demanding the performance and advantages of our leading product lines. In addition, we are well positioned to benefit from technology transitions in 2017 and have upgraded our product lines to optimize these new technologies.”
Mellanox Technologies (NASDAQ: MLNX)
Mellanox Technologies has reported fourth quarter and full year 2016 results. For the year, total revenue was $857.5 million, GAAP operating income was $30.6 million, and GAAP net income was $18.5 million ($0.37 per diluted share). For the fourth quarter, revenue was $221.7 million, GAAP operating income was $13.4 million, and GAAP net income was $9 million ($0.18 per diluted share).
For the first quarter of 2017, the company predicts revenue to range between $200 million and $210 million. For more information, click here.
“During the fourth quarter we saw continued sequential growth in our InfiniBand business, driven by robust customer adoption of our 100 Gigabit EDR solutions into artificial intelligence, machine learning, high-performance computing, storage, database and more. Our quarterly, and full-year 2016 results, highlight InfiniBand’s continued leadership in high-performance interconnects,” said Eyal Waldman, president and CEO of Mellanox. “Customer adoption of our 25, 50, and 100 gigabit Ethernet solutions continued to grow in the fourth quarter. Adoption of Spectrum Ethernet switches by customers worldwide generated positive momentum exiting 2016. Our fourth quarter and full-year 2016 results demonstrate Mellanox’s diversification, and leadership in both Ethernet and InfiniBand. We anticipate growth in 2017 from all Mellanox product lines.”
Hewlett Packard Enterprise (NYSE: HPE)
HPE has announced full year and fourth quarter financial results for 2016. The company brought in $50.1 billion for the year, down 4% from the prior year period. For the fourth quarter, HPE’s net revenue was $12.5 billion, a decrease of 7% from the fourth quarter of 2015. HPE reported GAAP diluted net earnings per share of $1.82 for the year and $0.18 for the quarter.
For the first quarter of 2017, HPE predicts GAAP diluted net earnings per share to sit between $0.03 and $0.07. For the year, the company expects it to range between $0.72 and $0.82. For more information, click here.
“FY16 was a historic year for Hewlett Packard Enterprise,” said Meg Whitman, president and CEO of Hewlett Packard Enterprise. “During our first year as a standalone company, HPE delivered the business performance we promised, fulfilled our commitment to introduce groundbreaking innovation, and began to transform the company through strategic changes designed to enable even better financial performance.”
NVIDIA (NASDAQ: NVDA)
NVIDIA has reported results for the fourth quarter and fiscal 2017. Total sales for the year reached $6.91 billion, an increase of 38% from the year prior. GAAP earnings per diluted share were $1.13, up 117% from the previous year ($0.52). For the quarter, revenue was $2.17 billion, up 55% from the same quarter of 2016. GAAP earnings per diluted share reached $0.99, an increase of 19% from the third quarter.
For the first quarter of 2018, NVIDIA expects sales to sit around $1.90 billion. For more information, click here.
“We had a great finish to a record year, with continued strong growth across all our businesses,” said Jen-Hsun Huang, founder and CEO of NVIDIA. “Our GPU computing platform is enjoying rapid adoption in artificial intelligence, cloud computing, gaming, and autonomous vehicles. Deep learning on NVIDIA GPUs, a breakthrough approach to AI, is helping to tackle challenges such as self-driving cars, early cancer detection and weather prediction. We can now see that GPU-based deep learning will revolutionize major industries, from consumer internet and transportation to health care and manufacturing. The era of AI is upon us.”
IBM (NYSE: IBM)
IBM has reported 2016 fourth quarter and full year financial results. For the year, IBM announced $11.9 billion in net income from continuing operations, down 11% from the previous year ($13.4 billion). Diluted earnings per share were $12.39, also down (9%) from the year before. For the fourth quarter, the company reported net income of $4.5 billion from continuing operations, up 1% from the same quarter a year prior.
For 2017, IBM predicts GAAP diluted earnings per share to be at least $11.95. For more information, click here.
“In 2016, our strategic imperatives grew to represent more than 40 percent of our total revenue and we have established ourselves as the industry’s leading cognitive solutions and cloud platform company,” said Ginni Rometty, IBM chairman, president and CEO. “IBM Watson is the world’s leading AI platform for business, and emerging solutions such as IBM Blockchain are enabling new levels of trust in transactions of every kind. More and more clients are choosing the IBM Cloud because of its differentiated capabilities, which are helping to transform industries, such as financial services, airlines and retail.”
AMD (NASDAQ: AMD)
AMD has reported 2016 fourth quarter and full year financial results. For the year, the company announced revenue of $4.27 billion, an increase of 7% from 2015 ($3.99 billion). Total revenue for the quarter was $1.11 billion, up 15% year-over-year ($958 million).
For the first quarter of 2017, AMD predicts revenue to decrease 11%, plus or minus 3%. For more information, click here.
“We met our strategic objectives in 2016, successfully executing our product roadmaps, regaining share in key markets, strengthening our financial foundation, and delivering annual revenue growth,” said Dr. Lisa Su, AMD president and CEO. “As we enter 2017, we are well positioned and on-track to deliver our strongest set of high-performance computing and graphics products in more than a decade.”
Fujitsu (OTC: FJTSY)
Fujitsu has announced 2016 third quarter results. Consolidated revenue for the quarter was 1,115.4 billion yen, down 51.4 billion yen from the same quarter of 2015. The company also reported an operating profit of 37.3 billion yen, up 23.2 billion yen from the year prior. Net financial income was 5.5 billion yen, an improvement of 2.9 billion yen from the same period of 2015.
For the full year of 2016, Fujitsu expects revenue to reach 4,500 billion yen with an operating profit of 120 billion yen. For more information, click here.
Seagate Technology (NASDAQ: STX)
Seagate has reported second quarter 2017 financial results. The company announced revenue of $2.9 billion, net income of $297 million, and diluted earnings per share of $1.00. For more information, click here.
“The Company’s product execution, operational performance, and financial results improved every quarter throughout 2016. In the December quarter we achieved near record results in gross margin, cash flow, and profitability. Seagate’s employees are to be congratulated for their incredible effort,” said Steve Luczo, Seagate’s chairman and CEO. “Looking ahead, we are optimistic about the long-term opportunities for Seagate’s business as enterprises and consumers embrace and benefit from the shift of storage to cloud and mobile applications. Seagate is well positioned to work with the leaders in this digital transformation with a broad market-leading storage solution portfolio.”
Just what constitutes HPC, and how best to support it, is currently a topic of keen debate. A new paper posted last week on arXiv.org – Rethinking HPC Platforms: Challenges, Opportunities and Recommendations – by researchers from the University of Edinburgh and University of St. Andrews suggests that the emergence of “second generation” HPC applications (and users) requires a new approach to supporting infrastructure, one that draws on container-like technology and services.
In the paper they describe a set of services, which they call ‘cHPC’ (container HPC), to accommodate these emerging HPC application requirements and indicate they plan to benchmark key applications as a next step. “Many of the emerging second generation HPC applications move beyond tightly-coupled, compute-centric methods and algorithms and embrace more heterogeneous, multi-component workflows, dynamic and ad-hoc computation and data-centric methodologies,” write authors Ole Weidner, Rosa Filgueira Vicente, Malcolm Atkinson, and Adam Barker.
“While diverging from the traditional HPC application profile, many of these applications still rely on the large number of tightly coupled cores, cutting-edge hardware and advanced interconnect topologies provided only by HPC clusters. Consequently, HPC platform providers often find themselves faced with requirements and requests that are so diverse and dynamic that they become increasingly difficult to fulfill efficiently within the current operational policies and platform models.”
It’s best to read the paper in full, as it examines the challenges and potential solutions in some detail. The authors single out three application areas and report that, as a group, they have deep experience working with them:
- Data Intensive Applications. Data-intensive applications require large volumes of data and devote a large fraction of their execution time to I/O and manipulation of data. Careful attention to data handling is necessary to achieve acceptable performance or completion. “They are frequently sensitive to local storage for intermediate results and reference data. It is also sensitive to the data-intensive frameworks and workflow systems available on the platform and to the proximity of data it uses.” Examples of large-scale, data-intensive HPC applications are seismic noise cross-correlation and misfit calculation as encountered, e.g. in the VERCE project.
- Dynamic Applications. These fall into two broad categories: “(i) applications for which we do not have full understanding of the runtime behavior and resource requirements prior to execution and (ii) applications which can change their runtime behavior and resource requirements during execution.” Two examples cited are (a) applications that use ensemble Kalman filters for data assimilation in forecasting and (b) simulations that use adaptive mesh refinement (AMR) to refine the accuracy of their solutions.
- Federated applications. “Based on the idea that federation fosters collaboration and allows scalability beyond a single platform, policies and funding schemes explicitly supporting the development of concepts and technology for HPC federations have been put into place. Larger federations of HPC platforms are XSEDE in the US, and the PRACE in the EU. Both platforms provide access to several TOP-500 ranked HPC clusters and an array of smaller and experimental platforms.”
“To explore the implementation options for our new platform model, we have developed cHPC, a set of operating-system level services and APIs that can run alongside and integrate with existing job via Linux containers (LXC) to provide isolated, user-deployed application environment containers, application introspection and resource throttling via the cgroups kernel extension. The LXC runtime and software-defined networking are provided by Docker and run as OS services on the compute nodes,” say the authors (see Figure 2 of the paper).
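As a rough illustration of the kind of container-level resource throttling the authors describe, the sketch below assembles a `docker run` command line with CPU and memory limits, which Docker enforces through cgroups. This is not the cHPC API, which the article does not show. The image name, the command and the helper function `build_docker_run` are hypothetical, and the sketch only builds the command rather than executing it.

```python
import shlex


def build_docker_run(image, command, cpus, memory, name=None):
    """Build a `docker run` argument list with cgroup-backed limits.

    image   -- container image to start (hypothetical in the example below)
    command -- list of arguments to run inside the container
    cpus    -- CPU quota, e.g. 2.5 means at most two and a half cores
    memory  -- memory limit string understood by Docker, e.g. "4g"
    """
    args = [
        "docker", "run", "--rm",
        f"--cpus={cpus}",      # CPU throttling (enforced via cgroups)
        f"--memory={memory}",  # memory ceiling (enforced via cgroups)
    ]
    if name:
        args.append(f"--name={name}")
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    # Hypothetical user-deployed application environment.
    argv = build_docker_run(
        image="example/solver:latest",
        command=["python", "run_workflow.py"],
        cpus=2.5,
        memory="4g",
        name="user-job-1",
    )
    # Print a shell-safe rendering; a scheduler hook could pass argv
    # to subprocess.run() instead.
    print(" ".join(shlex.quote(a) for a in argv))
```

Building the argument list separately from running it keeps the limits easy to inspect or log before anything is launched on a compute node.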
The authors note prominently in their discussion that many traditional HPC applications are still best served by traditional HPC environments for which they have been carefully coupled.
“It would be false to claim that current production HPC platforms fail to meet the requirements of their application communities. It would be equally wrong to claim that the existing platform model is a pervasive problem that generally stalls the innovation and productivity of HPC applications…[There are] significant classes of applications, often from the monolithic, tightly-coupled parallel realm, [that] have few concerns regarding the issues outlined in this paper…They are the original tenants and drivers of HPC and have an effective social and technical symbiosis with their platform environments.
“However, it is equally important to understand that other classes of applications (that we call second generation applications) and their respective user communities share a less rosy perspective. These second generation applications are typically non-monolithic, dynamic in terms of their runtime behavior and resource requirements, or based on higher-level tools and frameworks that manage compute, data and communication. Some of them actively explore new compute and data handling paradigms, and operate in a larger, federated context that spans multiple, distributed HPC clusters.”
To qualify and quantify their assumptions, the authors report they are in the process of designing a survey that will be sent out to platform providers and application groups to verify current issues on a broader and larger scale. They write, “The main focus of our work will be on the further evaluation of our prototype system. We are working on a ‘bare metal’ deployment on HPC cluster hardware at EPCC. This will allow us to carry out detailed measurements and benchmarks to analyze the overhead and scalability of our approach. We will also engage with computational science groups working on second generation applications to explore their real-life application in the context of cHPC.”
The post Rethinking HPC Platforms for ‘Second Gen’ Applications appeared first on HPCwire.
SANTA CLARA, Calif., Feb. 22 — DataDirect Networks (DDN) today announced that it was – once again – the top storage provider among HPC sites surveyed by Intersect360. For the third consecutive year, DDN posted the largest share of installed systems at HPC sites and held its solid lead over other storage providers at HPC sites surveyed in Intersect360 Research’s “Top of All Things in HPC” survey. This report caps off a year of strong recognition of DDN as the performance storage leader that included awards ranging from best HPC storage product/technology company, best big data innovator, best storage company and best enterprise NAS to leadership recognition in IDC’s MarketScape report.
As illustrated in the table below, DDN had the largest share of installed systems at HPC sites (14.8 percent), gaining almost a full percentage point over the previous year. DDN’s closest competitors follow at 12.7 and 11.0 percent, and all other suppliers had less than 10 percent share of reported storage systems. DDN’s continued strong showing is a testament to the success of the company’s focus on solving the toughest data access and management challenges to deliver consistent, cost-effective performance at scale.
Intersect360 Research forecasts storage to be the fastest growing hardware sector in HPC, and according to a recent DDN survey, end users in the world’s most data-intense environments, like those in many general IT environments, are increasing their use of cloud. However, unlike general IT environments, the HPC sector is overwhelmingly opting for private and hybrid clouds instead of the public cloud. More than 90 percent of HPC sites surveyed are modernizing their data centers with flash, with the largest cited use cases being flash acceleration of parallel file system metadata, specific application data and specific end-user data. Survey responses show that I/O performance and rapid data growth remain the biggest issues for HPC organizations – a circumstance that favors continuing strong demand for DDN technologies that are leading the market in solving these challenges.
“High-performance sites are incredibly challenging IT environments with massive data requirements across very diverse application and user types,” said Laura Shepard, senior director of product marketing, DDN. “Because we are a leader in this space, we have the expertise to provide the optimal solutions for traditional and commercial high-performance customers to ensure they are maximizing their compute investment with the right storage infrastructure.”
DataDirect Networks (DDN) is the world’s leading big data storage supplier to data-intensive, global organizations. For more than 18 years, DDN has designed, developed, deployed and optimized systems, software and storage solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premise and in the cloud. Organizations leverage the power of DDN storage technology and the deep technical expertise of its team to capture, store, process, analyze, collaborate and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include many of the world’s leading financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers. For more information, go to www.ddn.com or call 1-800-837-2298.
The post DDN Named Top Storage Provider Among HPC Sites by Intersect360 appeared first on HPCwire.
UNIVERSITY PARK, Pa., Feb. 22 — The Penn State Cyber-Laboratory for Astronomy, Materials, and Physics (CyberLAMP) is acquiring a high-performance computer cluster that will facilitate interdisciplinary research and training in cyberscience and is funded by a grant from the National Science Foundation. The hybrid computer cluster will combine general purpose central processing unit (CPU) cores with specialized hardware accelerators, including the latest generation of NVIDIA graphics processing units (GPUs) and Intel Xeon Phi processors.
“This state-of-the-art computer cluster will provide Penn State researchers with over 3200 CPU and Phi cores, as well as 101 GPUs, a significant increase in the computing power available at Penn State,” said Yuexing Li, assistant professor of astronomy and astrophysics and the principal investigator of the project.
Astronomers and physicists at Penn State will use this computer cluster to improve the analysis of the massive observational datasets generated by cutting-edge surveys and instruments. They will be able to broaden the search for Earth-like planets by the Habitable Zone Planet Finder, sharpen the sensitivity of the Laser Interferometer Gravitational-Wave Observatory (LIGO) to the cataclysmic merger of ultra-massive astrophysical objects like black holes and neutron stars, and dramatically enhance the ability of the IceCube experiment to detect and reconstruct elusive cosmological and atmospheric neutrinos.
“The order-of-magnitude improvement in processing power provided by CyberLAMP GPUs will revolutionize the way the IceCube experiment analyzes its data, enabling it to extract many more neutrinos, with much finer detail, than ever before,” said co-principal investigator Doug Cowen, professor of physics and astronomy and astrophysics.
“The CyberLAMP team performs sophisticated simulations to study the formation of planetary systems and the universe,” said co-principal investigator Eric Ford, professor of astronomy and astrophysics and deputy director of the Center for Exoplanets and Habitable Worlds. “The CyberLAMP cluster will enable simulations with greater realism to investigate mysteries such as how Earth-like planets form, and to probe the nature of dark energy.”
“Researchers from Penn State’s Materials Research Institute (MRI) will perform realistic, atomistic-scale simulations to guide the design and development of next-generation complex materials,” said co-principal investigator Adri van Duin, professor of mechanical and nuclear engineering and director of the Materials Computation Center.
Co-principal investigator Mahmut Kandemir, professor of computer science and engineering, said, “Computer scientists will work with other scientists to analyze the performance of their calculations when using new hardware accelerators so as to increase the efficiency of their simulations and to inform the design of future computer architectures.”
“Penn State’s Institute for CyberScience (ICS) is excited by this opportunity to rapidly expand the access of Penn State researchers and students to the new generation of hardware accelerators that will be critical to meet the growing computational needs of ‘Big Data’ and ‘Big Simulation’ research,” said Jenni Evans, professor of meteorology and interim director of the Institute for CyberScience.
“This grant will enable Penn State to shed new light on high-priority topics in U.S. national strategic plans,” said Andrew Stephenson, distinguished professor of biology and associate dean for research and innovation of Penn State’s Eberly College of Science, “such as the National Research Council’s 2010 Decadal Survey for astronomy and astrophysics to search for habitable planets and to understand the fundamental physics of the cosmos, as well as the White House’s Materials Genome Initiative to expedite development of new materials.”
The new system will support research in five broad research groups, including 29 Penn State faculty members across seven departments, three colleges and two institutes at Penn State’s University Park campus, as well as four faculty members from three Commonwealth Campuses, and numerous graduate students. The CyberLAMP system will be installed in Penn State’s new Tower Road Data Center and will be accessible to faculty and students across the Commonwealth.
“The grant will also provide access to the CyberLAMP system to support a wide range of outreach programs at regional and national levels, including the training of students and young researchers nationwide, educational programs for K-12 students and teachers, broadening participation of women and underrepresented minority students in cyberscience, and partnering with industry on materials research and the design of next-generation high-performance computer architectures,” said Chris Palma, outreach director of CyberLAMP.
The 3-year project, titled “MRI: Acquisition of High Performance Hybrid Computing Cluster to Advance Cyber-Enabled Science and Education at Penn State,” is led by Li and co-principal investigators Ford, Cowen, Kandemir and van Duin, in partnership with the Institute for CyberScience’s (ICS) Advanced CyberInfrastructure group, led by Chuck Gilbert, chief architect of ICS, and Wayne Figurelle, assistant director of ICS. The ICS is a University-wide institute whose mission is to promote interdisciplinary research. ICS was established in 2012 to develop a strategic and coherent vision for cyberscience at Penn State.
Source: Penn State
Feb. 22 — Asetek announced today the signing of a development agreement with a major player in the data center space.
The end-goal of the development agreement is to have products in the market before year-end and resulting revenue to have significant impact on Asetek’s future data center business. The name of the partner will be disclosed at a later date.
“This development agreement is the direct result of several years of collaboration and I am very pleased that we have come this far with our partner. I expect this is the major breakthrough we have been waiting for,” said André Sloth Eriksen, CEO and founder of Asetek.
Current data center OEM customers include Fujitsu, Penguin and Cray. Asetek’s RackCDU D2C liquid cooling is used in nine installations in the TOP500 list of the fastest supercomputers in the world, and in nine installations in the Green500 list of the world’s most energy efficient supercomputers.
Asetek (ASETEK.OL) is the global leader in liquid cooling solutions for data centers, servers and PCs. Asetek’s server products enable OEMs to offer cost effective, high performance liquid cooling data center solutions. Its PC products are targeted at the gaming and high performance desktop PC segments. With over 3.9 million liquid cooling units deployed, Asetek’s patented technology is being adopted by a growing portfolio of OEMs and channel partners. Founded in 2000, Asetek is headquartered in Denmark and has operations in California, Texas, China and Taiwan. For more information, visit www.asetek.com.
The post Asetek Signs Datacenter Product Development Agreement appeared first on HPCwire.
RALEIGH, N.C., Feb. 22 — As part of our commitment to delivering open technologies across many computing architectures, Red Hat has joined the OpenPOWER Foundation, an open development community based on the POWER microprocessor architecture, at the Platinum level. While we already do build and support open technologies for the POWER architecture, the OpenPOWER Foundation is committed to an open, community-driven technology-creation process – something that we feel is critical to the continued growth of open collaboration around POWER.
As a participant in the OpenPOWER community and a member of the Board of Directors (where we are currently represented by Scott Herold), we plan to focus on helping to create open source software for POWER-based architectures, offering more choice, control and flexibility to developers working on hyperscale and cloud-based data centers. Additionally, we’re excited to work with other technology leaders on advanced server, networking, storage and I/O acceleration technologies, all built on a set of common, open standards.
We feel that open standards, like those being utilized by OpenPOWER, are critical to enterprise IT innovation, offering a common set of guidelines for the integration, implementation and security of new technologies. Modern standards bodies such as OpenPOWER and others seek to democratize guidelines across a broad, inclusive community, focusing on agility and providing a common ground for emerging technology. Red Hat is a strong proponent of open standards across the technology stack, participating in groups that cover the emerging software (OCI, CNCF) as well as hardware (CCIX, GenZ) stacks.
The development efforts of the OpenPOWER Foundation benefit many partners that we already work with, and we look forward to increased collaboration in an open, transparent environment. We’re also looking to support many other emerging technical areas of the community. These include machine learning and artificial intelligence, data platforms and analytics, as well as cloud and container deployments.
We’re pleased to be a part of OpenPOWER, and look forward to helping craft community-driven collaborative designs that broaden customer technology choices across the breadth of enterprise IT.
“As the technology stack becomes increasingly more complex, deploying virtual machines, cloud services and bare metal technologies must all interact simultaneously. It’s critical that we have a foundational set of standards that seamlessly work across hardware architectures. The OpenPOWER Foundation helps to set these standards for POWER systems, and Red Hat is an excellent addition to the Foundation’s leadership, both as a partner and for their extensive work in developing community-driven standards,” said Scot Schultz, Director, HPC and Technical Computing, Mellanox.
“The development model of the OpenPOWER Foundation is one that elicits collaboration and represents a new way in exploiting and innovating around processor technology. POWER architecture is well tailored for many traditional and new applications, enabling OpenPOWER Foundation members like Red Hat to add their own innovations on top of the hardware technologies or create new solutions that capitalize on emerging workloads such as cognitive applications like AI and deep learning,” said Ken King, general manager, OpenPOWER, IBM.
Source: The Red Hat Multi-Architecture Team
Feb. 22 — Registration is now open for the first annual PEARC conference! PEARC17 is open to professionals and students in advanced research computing. The conference will take place July 9-13 at the Hyatt Regency New Orleans (601 Loyola Ave., New Orleans). Registrants can book their conference registration and hotel room at pearc.org.
The PEARC (Practice & Experience in Advanced Research Computing) conference series is being ushered in with support from many organizations, and will build upon earlier conferences’ success and core audiences to serve the broader community. In addition to XSEDE, organizations supporting the new conference include the Advancing Research Computing on Campuses: Best Practices Workshop (ARCC), the Science Gateways Community Institute (SGCI), the Campus Research Computing Consortium (CaRC), the ACI-REF consortium, the Blue Waters project, ESnet, Open Science Grid, Compute Canada, the EGI Foundation, the Coalition for Academic Scientific Computation (CASC), and Internet2.
Registration costs are as listed below:
Regular Registration: $500 (Tues – Thurs)
Late Registration Fee: $600 (as of 5 p.m. ET 5/31/17)
Student Registration: $300 Note: must provide ID upon check-in for the conference
Late Student Registration: $360 (as of 5 p.m. ET 5/31/17) Note: must provide ID upon check-in for the conference
Tutorial Fee: $125 (Monday only)
Late Tutorial Fee: $150 (as of 5 p.m. ET 5/31/17)
Student Tutorial Fee: $80 (Monday only)
Late Student Tutorial Fee: $95 (as of 5 p.m. ET 5/31/17)
One Day Registration: $200
Late One Day: $240 (as of 5 p.m. ET 5/31/17)
Two Day Registration: $400
Late Two Day: $480 (as of 5 p.m. ET 5/31/17)
The Call for Participation is also open and accepting submissions from now until March 6 for Technical Papers and Tutorials. External Program and Workshop proposals are due March 31. Poster, Visualization Showcase and Birds-of-a-Feather submissions are due May 1.
Source: PEARC Conference Series
Feb. 22, 2017 — U.S. Department of Energy (DOE) high-performance computer sites have selected a dynamic fusion code, led by physicist C.S. Chang of the DOE’s Princeton Plasma Physics Laboratory (PPPL), for optimization on three powerful new supercomputers. The PPPL-led code was one of only three codes out of more than 30 science and engineering programs selected to participate in Early Science programs on all three new supercomputers, which will serve as forerunners for even more powerful exascale machines that are to begin operating in the United States in the early 2020s.
The PPPL code, called XGC, simulates behavior of the ions, electrons and neutral atoms in the transport barrier region— or “pedestal” — between the ultra-hot core of the plasma that fuels fusion reactions and the cooler and turbulent outer edge of the plasma. The pedestal must be high and wide enough to prevent damage to the divertor plate that exhausts heat in doughnut-shaped tokamaks that house the fusion reactions. “How to create a high edge pedestal without damaging the divertor wall is the key question to be answered,” said Chang. “That is a prerequisite for achieving steady state fusion.”
Among the team of nationwide experts developing this program are PPPL physicists Seung-Ho Ku, Robert Hager and Stephane Ethier.
Selection of the PPPL code could help ready it for exascale development. “Computer architecture is evolving rapidly and these new pre-exascale computers have features that are quite different from some of the earlier petascale supercomputers,” said Amitava Bhattacharjee, head of the Theory Department at PPPL. Petascale machines operate in petaflops, or one million billion (10^15) floating point operations per second.
Bhattacharjee heads a PPPL-led Exascale Computing Project that will integrate the XGC code with GENE, a code developed at the University of California, Los Angeles, to create the first simulation of a complete fusion plasma. Exascale supercomputers will perform exaflops, or a billion billion (10^18) floating point operations per second.
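To make the scale difference concrete, the following minimal sketch converts between the two units using only the definitions given above (one petaflops is 10^15 and one exaflops is 10^18 floating point operations per second). The function name `petaflops_to_exaflops` is illustrative only.

```python
PETA = 10**15  # floating point operations per second in one petaflops
EXA = 10**18   # floating point operations per second in one exaflops


def petaflops_to_exaflops(petaflops):
    """Express a speed given in petaflops as a fraction of an exaflops."""
    return petaflops * PETA / EXA


if __name__ == "__main__":
    # One exaflops equals this many petaflops.
    print(EXA // PETA)
    # A 200-petaflops machine expressed in exaflops.
    print(petaflops_to_exaflops(200))
```

By these definitions an exascale machine is a thousand times faster than a one-petaflops machine, so a 200-petaflops system sits at one fifth of an exaflops.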
The three new pre-exascale supercomputers:
Cori, now fully installed at the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory. Cori, named for biochemist Gerty Cori, the first American woman to win a Nobel Prize in science, has a theoretical peak speed of 30 petaflops on scientific applications using Intel Xeon “Haswell” and Xeon Phi “Knights Landing” processor nodes.
Also selected to participate in Cori’s NERSC Exascale Science Applications Program (NESAP) is the PPPL-led M3D-C1, an extended magnetohydrodynamics (MHD) code focused on simulation of plasma disruptions, led by physicist Stephen Jardin with support from physicists Joshua Breslau, Nate Ferraro and Jin Chen.
Two more PPPL-led codes, in addition to the 20 previously selected that included XGC and M3D-C1, will participate in the Cori NERSC program. These are the GTC-P and GTS codes, which model plasma turbulence in the plasma core and are headed by physicists William Tang and Stephane Ethier. Principal developer of the GTS code is PPPL physicist Weixing Wang. The GTC-P code is PPPL’s version of the GTC code led by the University of California, Irvine.
Summit is to be operational at the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory in 2018. Summit features a hybrid architecture consisting of IBM Power 9 processors and multiple NVIDIA Volta graphics processing units and will be capable of performing at least 200 petaflops for a wide range of applications. The facility’s Center for Accelerated Application Readiness (CAAR) program has selected 13 projects that will participate in the program to optimize their application codes and demonstrate the effectiveness of their applications on Summit.
Aurora, scheduled to be deployed in 2018 at the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory, will be built from third generation Intel Xeon Phi “Knights Hill” many-core processors and is expected to be capable of performing up to 200 petaflops on a wide range of scientific applications. Ten projects have been selected for the ALCF Early Science Program.
PPPL, on Princeton University’s Forrestal Campus in Plainsboro, N.J., is devoted to creating new knowledge about the physics of plasmas — ultra-hot, charged gases — and to developing practical solutions for the creation of fusion energy. The Laboratory is managed by the University for the U.S. Department of Energy’s Office of Science, which is the largest single supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, visit science.energy.gov.
Source: John Greenwald, PPPL
The post PPPL Fusion Code Selected for Optimization on Three DOE Supercomputers appeared first on HPCwire.
Feb. 22 — Atos, a global leader in digital services, today announces record results in 2016 and the over-achievement of all its 2016 financial objectives.
Revenue was € 11,717 million, up +9.7% year-on-year, +12.8% at constant exchange rates, and +1.8% organically. Revenue grew by +1.9% organically in the fourth quarter, materializing the good sales momentum and the continued revenue trend improvement. This dynamism was particularly led by the Atos Digital Transformation Factory answering the strong demand of large organizations in their digital transformation.
Operating margin was € 1,104 million, representing 9.4% of revenue, compared to 8.3% in 2015 at constant scope and exchange rates. This improvement of 110 basis points resulted notably from more cloud based business and the continuous execution of the Tier One efficiency program through industrialization, global delivery from offshore locations, and continuous optimization of SG&A. In addition, operating margin benefitted from ongoing cost synergies including the integration of Unify.
The commercial dynamism of the Group was particularly strong in 2016 with record order entry reaching € 13.0 billion, +16.2% compared to € 11.2 billion statutory in 2015. It represented a book to bill ratio of 111% in 2016, including 119% during the fourth quarter of 2016. Full backlog increased by +11.9% year-on-year to € 21.4 billion at the end of 2016, representing 1.8 years of revenue. The full qualified pipeline represented 6.4 months of revenue at € 6.5 billion, compared to € 6.2 billion published at the end of 2015.
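The headline ratios above can be checked from the reported figures. The sketch below assumes the usual definitions, which the release does not spell out: operating margin rate as operating margin divided by revenue, and book to bill as order entry divided by revenue. It uses the rounded amounts quoted in the text, so results are only expected to agree after rounding.

```python
# Reported 2016 figures, in millions of euros (rounded as in the release).
REVENUE = 11_717
OPERATING_MARGIN = 1_104
ORDER_ENTRY = 13_000          # "€ 13.0 billion"
MARGIN_RATE_2015_PCT = 8.3    # at constant scope and exchange rates


def percent(numerator, denominator):
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


# Operating margin as a share of revenue, rounded to one decimal place.
margin_rate_pct = round(percent(OPERATING_MARGIN, REVENUE), 1)

# One percentage point equals 100 basis points.
improvement_bps = round((margin_rate_pct - MARGIN_RATE_2015_PCT) * 100)

# Book to bill: orders booked relative to revenue billed, whole percent.
book_to_bill_pct = round(percent(ORDER_ENTRY, REVENUE))

if __name__ == "__main__":
    print(margin_rate_pct, improvement_bps, book_to_bill_pct)
```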
Net income was € 620 million, +41.9% year-on-year and net income Group share reached € 567 million, +39.6%. Basic EPS Group share was € 5.47, +36.1% compared to € 4.01 in 2015 and diluted EPS Group share was € 5.44, +36.5% compared to € 3.98 during 2015.
Free cash flow reached € 579 million in 2016, +47.3% compared to € 393 million in 2015, reflecting a strong improvement in the conversion rate of operating margin to free cash flow, which reached 52.5% in 2016 compared to 43% in 2015, in line with the circa 65% 2019 objective. Net cash position was € 481 million at the end of 2016.
Thierry Breton, Chairman and CEO, said, “In 2016, we achieved an excellent performance by overreaching all our financial commitments. Atos delivered revenue growth across all sectors, as well as record margin improvement and free cash flow conversion. Accelerating innovation in cybersecurity, automation, and analytics, mirroring the booming demand from our customers, combined with a rigorous execution of our strategy were the key factors of this success. Our very solid financial performance materialized the alignment of our comprehensive Digital Transformation Factory with rising client needs.
“With this record performance, Atos’ teams have built a unique foundation to deliver our new 3-year plan “2019 Ambition”, matching new expectations of our clients, gaining new market shares, driving more profitable growth and cash generation, while continuing to enhance value creation for our shareholders.
“Indeed, year after year, Atos Board of Directors has carefully designed a Group able to embrace the global digital transformation while offering stronger visibility and resilience in a less predictable environment. We can count on the now tier-one technological profile of Atos, on its very solid balance sheet, and on the quality and dedication of our 100,000 digital technologists to strengthen our leadership in digital transformation and to deliver stronger financials in 2017, the first year of the new 3-year plan.”
Researchers from Baidu’s Silicon Valley AI Lab (SVAIL) have adapted a well-known HPC communication technique to boost the speed and scale of their neural network training and now they are sharing their implementation with the larger deep learning community.
The technique, a modified version of the OpenMPI algorithm “ring all-reduce,” is being used at Baidu to parallelize the training of their speech recognition model, Deep Speech 2, across many GPU nodes. The two pieces of software Baidu is announcing today are the baidu-allreduce C library, as well as a patch for TensorFlow, which allows people who have already modeled in TensorFlow to compile this new version and use it for parallelizing across many devices. The codes are available on GitHub.
[Figure: Ring all-reduce – all GPUs send data simultaneously]
Baidu’s SVAIL team developed the approach about two years ago for their internal deep learning framework, named Gene and Majel (in tribute to the famous Star Trek creator and the actress who voiced the onboard computer interfaces for the series). The technique is commonplace in HPC circles, but underused within artificial intelligence and deep learning, according to Baidu.
Many of the researchers in the SVAIL group had come from the high performance computing space and recognized the competitive edge it offered.
“The algorithm is actually part of OpenMPI, but the OpenMPI implementation is not as fast,” comments Baidu Research Scientist Shubho Sengupta. “So the way we stumbled upon it was we started using OpenMPI for doing training and we realized it was not scaling to the extent that we want it to scale. I started digging through the OpenMPI source, found the algorithm, saw that it’s not very efficient, and reimplemented it.”
The SVAIL researchers wrote their own implementation of the ring algorithm for higher performance and better stability. The key distinction from the OpenMPI version is that the SVAIL implementation avoids extraneous copies between the CPU and GPU.
Explains Sengupta, “Once OpenMPI does the communication of these matrices, if the matrices are in GPU memory, it actually copies to CPU memory to do the reduction part of it – that’s actually quite wasteful. You don’t really need to do a copy, you could just write a small kernel that does the reduction in GPU memory space itself. And this especially helps when you are doing all-reduce within a node and all the GPUs are within a PCI root complex, then it doesn’t do any of the copies actually – it can just do everything in GPU memory space. This very simple idea of eliminating this copy resulted in this speedup in scaling over OpenMPI’s own implementation.”
Employing this algorithm along with SVAIL’s focus on fast networking (InfiniBand) and careful hardware-software codesign has enabled the team to get linear GPU scaling up to 128 GPUs, an achievement that was detailed in their December 2015 paper, “Deep Speech 2: End-to-End Speech Recognition in English and Mandarin.”
With their internal implementation of ring all-reduce, the team achieves a speedup of between 2.3X and 21.4X over OpenMPI (version 1.8.5), depending on the number of GPUs.
Sengupta notes that their implementation is fastest for a small number of GPUs. “At 8 GPUs it’s about 20x faster, then as you increase the number of GPUs, it drops because now you actually have to copy data to the CPU to send across the network. But for the internal framework, we can scale all the way up to 128 GPUs and get linear scaling.”
[Figure: Comparison of two different all-reduce implementations. All times are in seconds. Performance gain is the ratio of OpenMPI all-reduce time to SVAIL’s all-reduce time. (Source: Deep Speech 2 paper)]
Sengupta’s teammate Baidu Research Scientist Andrew Gibiansky says similar benefits can now be seen with TensorFlow: “In terms of the TensorFlow implementation, we get the same linear scaling path past eight. In terms of a comparison with running on a single GPU, it ends up being about 31x faster at 40 GPUs.”
After the Deep Speech 2 paper was published, the SVAIL team began getting requests from the community who wanted to know more about the implementation. Given that the algorithm is pretty tightly coupled to SVAIL’s proprietary deep learning framework, they needed to come up with a different way to release it, so they created two new implementations, one specifically for TensorFlow and one that is more general.
Gibiansky, who led the work on the TensorFlow patch, describes their multi-pronged approach to disseminating the information. “You can read the blog post [for a thorough technical explanation] and figure it out. If you’re using TensorFlow, you can use our modification to train your own models with this. And if you’re a deep learning author, you can look at our C library and integrate that. The goal is really to take this idea we’ve found to be really successful internally and try to start spreading it so that other people can also take advantage of it.”
Sengupta shares an interesting perspective on the opportunities to be mined for deep learning within HPC.
“With MPI – people [in deep learning] think that it is this old technology, that it is not relevant, but I think because of our work we have shown that you can build very fast collectives using MPI and that allows you to do synchronous gradient descent which converges faster, gives you deterministic results and you don’t need to do asynchronous gradient descent with parameter servers which was the dominant way of doing this when we first started,” says Sengupta.
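The synchronous approach Sengupta describes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SVAIL's training code: the one-parameter model `y = w * x`, the helper names, and the in-process `allreduce_mean` stand-in for a real collective are all hypothetical. Each worker computes a gradient on its own data shard, the gradients are averaged with an all-reduce, and every replica applies the identical update, so the replicas never diverge and repeated runs give the same result.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) training pairs held by one worker


def local_gradient(w: float, shard: Shard) -> float:
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> List[float]:
    """Stand-in for an all-reduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def synchronous_sgd(shards: List[Shard], steps: int = 50,
                    lr: float = 0.05, w0: float = 0.0) -> List[float]:
    """Run synchronous SGD and return the final weight of every replica."""
    weights = [w0] * len(shards)  # one model replica per worker
    for _ in range(steps):
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        averaged = allreduce_mean(grads)  # synchronization point
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights
```

With a parameter server and asynchronous updates, by contrast, workers may apply gradients computed from stale weights, so the order of arrival affects the result.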
As for the reduced-copy approach propagating back to MPI, Gibiansky notes that if you look at some of the other MPI implementations, they’re slowly moving their collectives to GPU versions. “MVAPICH recently introduced an all-gather that doesn’t end up copying to CPU – so OpenMPI will probably get there, it just might take a while. Potentially giving this a little more visibility, we can spur that on.”
“There’s a lot of interest now in collectives and one thing we also realized is the all-reduce operation used in traditional HPC setups, it actually transfers data that’s actually not very large,” Sengupta adds. “What it usually does, when I talk to HPC people, it’s trying to figure out the status of something across a bunch of machines – while in deep learning we are transferring these large matrices – like 2048×2048, essentially 4 million 32-bit floating points. For the traditional HPC community, this is a very atypical input for all-reduce. The traditional HPC community does not actually use all-reduce with really large data sizes. I think with deep learning, more and more people are realizing that collective operations for really large matrices is also very important.”
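The sizes Sengupta quotes are easy to check with a little arithmetic. The sketch below uses hypothetical helper names. It also applies the standard cost model for a generic ring all-reduce, in which each worker sends 2(N-1)/N times the buffer size regardless of N. That formula is a general property of the ring algorithm, not a figure taken from Baidu's measurements.

```python
def matrix_bytes(rows: int, cols: int, bytes_per_element: int = 4) -> int:
    """Size in bytes of a dense matrix (default: 32-bit floats)."""
    return rows * cols * bytes_per_element


def ring_bytes_sent_per_worker(total_bytes: int, workers: int) -> float:
    """Bytes each worker sends in a ring all-reduce of one buffer.

    Each of the 2 * (workers - 1) steps moves one chunk of
    total_bytes / workers bytes to the next worker in the ring.
    """
    return 2 * (workers - 1) * total_bytes / workers


elements = 2048 * 2048                 # 4,194,304 values ("4 million")
size = matrix_bytes(2048, 2048)        # 16,777,216 bytes = 16 MiB
per_worker_8 = ring_bytes_sent_per_worker(size, 8)
print(elements, size, per_worker_8)
```

A single 2048×2048 gradient matrix is therefore 16 MiB, far larger than the small status values that all-reduce typically carries in traditional HPC codes, which is why a bandwidth-oriented algorithm matters here.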
A detailed explanation of ring all-reduce and Baidu’s GPU implementation is covered in this technical blog post, published today by Baidu Research. A variant of the technique is also used to provide high-performance node-local scaling for PaddlePaddle, the company’s open source deep learning framework.
So the exascale race is on. And lots of organizations are in the pack. Government announcements from the US, China, India, Japan, and the EU indicate that they are working hard to make it happen – some sooner, some later. Meanwhile commercial concerns, either working in conjunction with a government partner, or going it alone, are also looking at similar development agendas. So it’s a pretty safe bet that someone will stand up an exascale system in the next few years. Some will celebrate while others will gnash their teeth. So while we can all agree that there is an exascale race, based on a look at the various visions for such a system, most seem to have their own idea of where the finish line is.
In the past when the HPC community went through its periodic assault on computer performance expressed in scientific notation (gflops, tflops, pflops, etc.), HPC systems had relatively narrow use cases, and computation capability – the ability to churn through floating point operations – was the undisputed king of performance metrics. That’s what made benchmarks like the Livermore Loops and later the more enduring LINPACK metric – and its related TOP500 list – widely adopted, frequently executed, and frankly overquoted. But at least for most of these earlier generations, the benchmarks were grounded in a broad base of typical HPC workloads. And while the race may not have always gone to the swift, everyone was at least running in the same direction.
Bob Sorensen, IDC
For the exascale race, that is simply no longer the case. The profusion of new and evolving HPC use cases, applications, and related architectures that are all being collectively jammed under the exascale umbrella makes that impossible. A few examples here should suffice:
High performance data analytics is a legitimate and growing segment of the HPC universe, but there the rapid growth of data sets, the addition of new unstructured data such as voice, video, and IoT input, and the need for real-time analytics clearly trump pure processor speed. Performance metrics for HPDA applications abound, but the most relevant simply do not pin their hopes on floating point rates. In addition, an interesting force behind the evolution of HPDA development will be growing legions of traditional – and decidedly non-HPC – business analytics users who are being pushed into the HPDA realm due to the spate of new business opportunities in this space. This group has little interest in the future of HPC as a technological driver and simply wants whatever solves the problem.
Likewise, deep learning is one of the latest and most promising HPC use cases, but there, training applications typically involve long but relatively straightforward computations – indeed often using only 16-bit floating point – to extract insights from data. These systems rely on simpler but high-core-count processors with less rigorous capabilities in memory and bandwidth. As such, deep learning systems are well suited to win their version of the exascale race, but it is clear that such systems will not become the sine qua non for all exascale applications.
Large servers that sit in hyperscale data centers likely will also soon be able to lay claim to being exascale systems in that they possess the aggregate processing power and necessary network capability to meet the definition. This is an undeniable effect of the use of mass clustered COTS-like systems to achieve high computational capability, but in most cases it is safe to conclude that these exascale systems will not be used as a single user asset but instead be routinely partitioned across very many users: they may benchmark as exascale, but they will not be used as exascale.
Even within the traditional HPC modeling and simulation sector, users are increasingly turning to more sophisticated metrics for what they want out of an exascale HPC, such as efficiency (flops/watt), or data center space requirements (flops/rack). Likewise, more and more exascale plans cite the need for new machines to achieve not peak exascale, but sustained exascale on typical workloads, and even specific improvements in time to solution for an existing suite of applications. The intent of many of these new progressive requirements is to push the emphasis of exascale architectures away from pure computational performance and to instead highlight the need for a more comprehensive hardware and software ecosystem that can underwrite an effective exascale workflow. In such environments, the exascale goal means more about overall system solution value and utility than any single hardware or software metric.
Because there are so many paths to an exascale system, the field will over the next few years be filled with announcements of a first, second, and eventually an nth exascale rollout. Careful examination of what finish line each machine crossed can only help to provide insights as to what the true value of that system is and how effectively it adds to the body of knowledge within the HPC world. Each new system will no doubt serve a valuable function in its own right, but it is clear that the sector has moved beyond a one size fits all mentality, and that decision makers – both within the commercial and government sectors – need to remember this when making plans about new HPC developments, at least until the next triple in scientific notation comes along. That would be zettaflops.
HUNTSVILLE, Ala., Feb. 21 — Abaco Systems today announced the innovative SWE540 6U OpenVPX 40 Gigabit Ethernet switch. It provides high speed Ethernet connectivity for Abaco’s latest generation of high performance computing solutions such as the SBC627 single board computer and the recently-announced DSP282A digital signal processor. This enables the company to deliver significant performance upgrades for its growing range of rapidly-deployable, complete, pre-integrated mission ready systems. These solutions target high performance embedded computing (HPEC) applications requiring the transfer of large amounts of data with the lowest possible latency such as radar, surveillance, situational awareness and imaging.
The rugged SWE540 is uniquely powerful and flexible, representing the only 6U OpenVPX 40GigE switch currently available that supports full Layer 2/3 features including hardware Layer 3 forwarding at fabric speed rates. Layer 3 switching and routing is a critical requirement for advanced security and complex networks. It provides dynamic routing over standard routing protocols, enabling a flexible range of network/fabric configurations and applications.
Superior patent-pending cooling technology limits system thermal load while still enabling the SWE540 to run at peak performance, further enhancing its robust reliability.
The SWE540 supports multiple OpenVPX profiles and uses the latest high performance switch silicon technology to support 40 Gigabit Ethernet performance across 20 ports, allowing configuration within demanding HPEC systems and a broad range of Abaco’s mission ready systems.
It also provides a straightforward, cost-effective upgrade for existing users of Abaco’s GBX460 switch, maximizing the long term value of customers’ investments while enabling a significant performance increase for those looking to achieve faster transfers of large amounts of data with lower latencies.
The switch features Abaco’s latest OpenWare switch management software, allowing it to be easily customized for specific customer requirements. OpenWare provides support for a wide range of network protocols and MIBs (management information bases) with extensive capabilities for multicast, Quality of Service, VLANs, and differentiated services. The OpenWare management interface may be accessed via serial console, SNMP, Telnet, SSH or web interface.
The combination of the SWE540’s hardware and the OpenWare switch management software delivers comprehensive security capabilities. Designed for deployment in security-sensitive mission critical applications, SWE540 features include denial of service attack prevention, user password mechanisms with multiple levels of security and military level authorization schemes including 802.1X and sanitization to allow the overwrite of non-volatile storage if a system is compromised. Survivability is also enhanced by ECC protection on the management processor memory which offers higher reliability in harsh environments.
“The SWE540 is representative of over 20 years of Abaco’s leadership in network switch design and network engineering for mission critical applications and will enable us to deliver a step-change in performance for Abaco’s mission ready systems and customer-built systems alike,” said Mrinal Iyengar, VP, Product Management, Abaco Systems. “Other 40 Gigabit Ethernet solutions offer limited routing capability – either by not offering L3 routing at all, or by limiting it to static routing. This provides far less flexibility, limiting the use to simpler, fixed networks. The SWE540 provides L3 forwarding in the switch fabric, supported by routing protocols such as RIPv3, OSPF and so on – making it ideally suited for networks that are more complex, security sensitive, or that will be required to scale and be upgraded over time.”
The SWE540 is available in both air-cooled and conduction-cooled versions, and can optionally support four QSFP+ and two 1000BaseT ports on the front panel. Rear transition modules are available to enable access to 40 Gigabit and one Gigabit ports off the backplane.
About Abaco Systems
With more than 30 years of experience, Abaco Systems is the global leader in open architecture rugged embedded mission ready systems. We deliver market-leading commercial off-the-shelf and custom products, together with best in class program lifecycle management. This, together with our 800+ professionals’ unwavering focus on our customers’ success, reduces program cost and risk, allows technology insertion with affordable readiness and enables platforms to successfully reach deployment sooner and with a lower total cost of ownership. With an active presence in hundreds of national asset platforms on land, sea and in the air, Abaco Systems is trusted where it matters most. www.abaco.com
Source: Abaco Systems
Feb. 21 — Xcelerit, a leading provider of acceleration solutions for Quantitative Finance, engineering and research, has added another new architecture to its expanding portfolio of processor support. The new Nvidia Tesla P100 GPU accelerator delivers 5 teraflops of double precision arithmetic – an unprecedented level of computing power that will enable new applications in machine learning, quant finance and the supercomputing community in general. Accessing this power normally requires specialist programming to handle data transfers, threads and synchronisation, memory access, and register usage. The Xcelerit SDK has been making it easier for programmers to access this power over a succession of GPU architectures as well as more mainstream systems like many-core CPUs. Hicham Lahlou, Xcelerit’s CEO, is enthusiastic about the P100: “Our customers always want the latest, fastest hardware and for those that have used Xcelerit since the beginning, they can now painlessly move their code from CPU to the P100 GPU and back with no changes required.”
The Xcelerit SDK works by allowing users to quickly identify the compute intensive parts of their C++ code and augment them with some simple programming model to reveal the hidden potential for parallelism. Once this is done, the SDK automatically takes care of mapping the code to any of the supported architectures and looks after scheduling the tasks to take maximum advantage of the underlying hardware. “The SDK is really quite adaptable,” says Lahlou, “we have been able to tailor it to very many processor architectures, instruction sets, memory configurations and inter-connects to squeeze every drop of performance out of the underlying hardware.” Lahlou feels that this capability may become even more essential over the coming years. “We are seeing a healthy diversity in processor designs coming from manufacturers such as Intel, Nvidia, Qualcomm and others – making performance code work super-efficiently across all of those platforms will keep us on our toes over the coming 12-18 months,” he said.
Xcelerit is a leading provider of acceleration solutions for Quantitative Finance, engineering, and research. Our portfolio of solutions addresses a range of acceleration challenges from algorithmic optimisations to software acceleration.
Xcelerit has received recognition as a finalist in the Red Herring Europe Top 100 award, the Red Herring Top 100 Global award, and a two-time winner of HPCwire’s “Best use of High Performance Computing in Financial Services” award. Our satisfied customers include the leading firms in investment banking, asset management, and insurance. For more information, please visit www.xcelerit.com.
The post Xcelerit Adds New Architecture to Portfolio of Processor Support appeared first on HPCwire.
The partnership will see Hammer adding Spectra Logic’s high-capacity workflow, tape and disk-based products to its portfolio and will allow Spectra Logic to strengthen its position in key backup and archive markets. By partnering with Hammer, Spectra will grow its enterprise-level reseller customer base.
Jason Beeson, Hammer’s Commercial Director, said: “This is an excellent opportunity to increase our high-performance computing offering to our partners and customers. By adding Spectra Logic’s bespoke data workflow storage solutions we can reach a whole new genre of highly data-dependent users who are seeking a complete data workflow, from input and day-to-day use right through to deep storage and archiving.”
Spectra Logic’s integrated tape and disk products have become the prevailing standard for those sectors challenged with storing, managing and accessing massive amounts of data and where high-performance computing is mission critical; sectors such as media and entertainment, scientific research, healthcare and financial services.
As Spectra Logic’s object-based storage systems link directly to the public cloud, this distribution agreement will also enable Hammer to enhance its current cloud portfolio.
At the centre of Spectra Logic’s hybrid storage ecosystem lies the Spectra BlackPearl Deep Storage Gateway, which enables users to easily store large data sets forever at virtually no cost. It provides a single interface into deep storage using cloud protocols. The Spectra Logic product family delivers the industry’s best combination of high-density, scalable storage, designed for superior performance and capacity, and includes tape libraries, such as Spectra TFinity and the Spectra T950, as well as disk products, featuring the Spectra Verde and Spectra ArcticBlue Disk Solutions.
Brian Grainger, Chief Sales Officer at Spectra Logic, said: “We’ve seen a major increase in demand for high-capacity, deep storage solutions, which have become essential to a wide range of businesses grappling with the challenges of storing, managing and accessing data while dealing with the rapidly evolving mandates in Europe that are driving changes in storage requirements.
“With Hammer’s help, we aim to reach more businesses that can benefit from our product range. We are keen to work with Hammer because it is among the most specialised storage distributors in Europe and has a range of products which are complementary to our own.”
Gerard Marlow, General Manager – OEM & Whitebox Storage at Hammer, said: “For 25 years, Hammer has added value at every opportunity to ensure our channel of resellers meet and exceed their customer expectations. To achieve this we constantly seek to add leading and innovative products to our portfolio and the range of deep storage products from Spectra Logic will meet this objective.”
Spectra Logic is the third company this year to join Hammer’s portfolio of world-class vendors, following Samsung Semiconductors and Huawei.
Source: Spectra Logic
The post Spectra Logic, Hammer Announce EMEA-Wide Distribution Deal appeared first on HPCwire.