Schrödinger Transforming Drug Discovery with GPU-Powered Platform 

The pharmaceutical industry has grown accustomed to investing billions of dollars to bring drugs to market, only to watch about 90 percent of drug candidates fail in clinical trials.

The problem is, and always has been, that there's simply not enough compute power in the world to accurately assess the properties of all possible molecules, or to support the extensive experimental efforts needed in drug discovery.

“There may be more potential drug compounds than there are atoms in the universe,” said Patrick Lorton, chief technology officer at Schrödinger, the New York-based developer of a physics-based software platform designed to model and compute the properties of novel molecules for the pharma and materials industries.

“If you look at a billion molecules and you say there’s no good drug here, it’s the same as looking at a drop of water in the ocean and saying fish don’t exist,” he said.

Fresh off a successful IPO earlier this year, Schrödinger has devoted decades to refining computational algorithms to accurately compute important properties of molecules. The company uses NVIDIA GPUs to generate and evaluate petabytes of data to accelerate drug discovery, a dramatic improvement over the slow and expensive lab work of the traditional process.

The company works with all 20 of the biggest biopharma companies in the world, several of which have standardized on Schrödinger’s platform as a key component of preclinical research.

The COVID-19 pandemic highlights the need for a more efficient and effective drug discovery process. To that end, the company has joined the global COVID R&D alliance to offer resources and collaborate. Recently, Google Cloud has also thrown its weight behind this alliance, donating over 16 million hours of NVIDIA GPU time to hunt for a cure.

“We hope to develop an antiviral therapeutic for SARS-CoV-2, the virus that causes COVID-19, in time to have treatments available for future waves of the pandemic,” Lorton said.

Advanced Simulation Software

The pharmaceutical industry has long depended on manually intensive physical processes to find new therapeutics. This allowed it to develop many important remedies over the last 50 years, but only through a laborious trial-and-error approach, Lorton said.

He makes the comparison to airplane manufacturers, which formerly carved airplane designs out of balsa wood and tested their drag coefficient in wind tunnels. They now rely on advanced simulation software that reduces the time and resources needed to test designs.

With the pharmaceutical industry traditionally using the equivalent of balsa, Schrödinger’s drug discovery platform has become a game changer.

“We’re trying to make preclinical drug discovery more efficient,” said Lorton. “This will enable the industry to treat more diseases and help more conditions.”

Exploring New Space

For more than a decade, every major pharmaceutical company has been using Schrödinger's software, which can perform physics simulations down to the atomic level. For each potential drug candidate, Schrödinger uses recently developed physics-based computational approaches to calculate the properties of as many as 3,000 possible compounds, which requires up to 12,000 GPU hours on high-performance computers.

Once the physics-based calculations are completed for the original set of randomly selected compounds, a layer of active learning is applied, making projections on the probable efficacy of a billion molecules.

Lorton said it currently takes four or five iterations to get a machine-learning algorithm accurate enough to be predictive, though even these projections are always double-checked with the physics-based methods before synthesizing any molecules in the lab.
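
What follows is a minimal, hypothetical sketch of that kind of physics-plus-active-learning loop, not Schrödinger's actual workflow or code: compounds are reduced to plain feature vectors, physics_score stands in for an expensive GPU-based property calculation, and an off-the-shelf random forest plays the role of the learned surrogate.

```python
# Hypothetical sketch of an active-learning screen; not Schrödinger's code.
# physics_score represents a costly, GPU-hours-scale calculation supplied by
# the caller, and compounds are plain feature vectors for illustration.
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def active_learning_screen(library, physics_score, batch_size=3000, rounds=5):
    labeled = {}                                            # compound index -> physics score
    batch = random.sample(range(len(library)), batch_size)  # round 1: random selection
    for _ in range(rounds):
        labeled.update({i: physics_score(library[i]) for i in batch})  # expensive step
        X = np.array([library[i] for i in labeled])
        y = np.array([labeled[i] for i in labeled])
        surrogate = RandomForestRegressor().fit(X, y)       # cheap learned surrogate
        ranked = np.argsort(surrogate.predict(np.array(library)))      # project whole library
        batch = [i for i in ranked if i not in labeled][:batch_size]   # most promising unlabeled
    # top candidates would be re-verified with physics before any synthesis
    return sorted(labeled, key=labeled.get)
```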

This software-based approach yields much faster results, but that’s only part of the value. It also greatly expands the scope of analysis, evaluating data that human beings never would have had time to address.

“The thing that is most compelling is exploring new space,” said Lorton. “It’s not just being cheaper. It’s being cheaper and finding things you would have otherwise not explored.”

For that reason, Schrödinger’s work focuses on modeling and simulation, and using the latest high performance computing resources to expand its discovery capabilities.

Bayer Proving Platform’s Value

One customer that’s been putting Schrödinger’s technology to use is Bayer AG. Schrödinger software has been helping Bayer scientists find lead structures for several drug discovery projects, ultimately contributing to clinical development candidates.

Recently, the two companies agreed to co-develop a novel drug discovery platform to accelerate the process of estimating the binding affinity, synthesizability and other properties of small molecules.

Bayer can’t yet share any specific results that the platform has delivered, but Dr. Alexander Hillisch, the company’s head of computational drug design, said it’s had an impact on several active projects.

Dr. Hillisch said that the software is expected to speed up work and effectively widen Bayer’s drug-discovery capabilities. As a result, he believes it’s time for NVIDIA GPUs to get a lot more recognition within the industry.

In a typical drug discovery project, Bayer evaluates binding affinities and other properties of molecules such as absorption and metabolic stability. With Schrödinger software and NVIDIA GPUs, “we’re enumerating millions to billions of virtual compounds and are thus scanning the chemical space much more broadly than we did before, in order to identify novel lead compounds with favorable properties,” he said.

Dr. Hillisch also suggested that the impact of holistic digital drug discovery approaches can soon be judged. “We expect to know how substantial the impact of this scientific approach will be in the near future,” he said.

The drug design platform also will be part of Bayer’s work on COVID-19. The company spun off its antiviral research into a separate company in 2006, but it recently joined a European coronavirus research initiative to help identify novel compounds that could provide future treatment.

Tailor-Made Task for GPUs

Given the scope of Schrödinger’s task, Lorton made it clear that NVIDIA’s advances in developing a full-stack computing platform for HPC and AI that pushes the boundaries of performance have been as important to his company’s accomplishments as its painstaking algorithmic and scientific work.

“It could take thousands, or tens of thousands, or in some crazy case, even hundreds of thousands of dollars to synthesize and get the binding affinity of a drug molecule,” he said. “We can do it for a few dollars of compute costs on a GPU.”

Lorton said that if the company had started one of its physics calculations on a single CPU when it was founded in 1990, it would have taken until today to reach the conclusions that a single GPU can now deliver in less than an hour.

Even with the many breakthroughs in compute speed on NVIDIA GPUs, Schrödinger’s discovery projects require thousands of NVIDIA T4 and V100 Tensor Core GPUs every day, both on premises and on the Google Cloud Platform. It’s this next level of compute, combined with continued investment in the underlying science, that the company hopes will change the way all drug discovery is done.


Making Spark Fly: NVIDIA Accelerates World’s Most Popular Data Analytics Platform

The world’s most popular data analytics application, Apache Spark, now offers revolutionary GPU acceleration to its more than half a million users through the general availability release of Spark 3.0.

Databricks provides the leading cloud-based enterprise Spark platform, run on over a million virtual machines every day. At the Spark + AI Summit today, Databricks announced that Databricks Runtime 7.0 for Machine Learning features GPU-accelerator-aware scheduling with Spark 3.0, developed in collaboration with NVIDIA and other community members.

Google Cloud recently announced the availability of a Spark 3.0 preview on Dataproc image version 2.0, noting the powerful NVIDIA GPU acceleration that’s now possible thanks to the collaboration of the open source community. We’ll be hosting a webinar with Google Cloud on July 16 to dive into these exciting new capabilities for data scientists.

In addition, the new open source RAPIDS Accelerator for Apache Spark is now available to accelerate ETL (extract, transform, load) and data transfers to boost analytics performance from end to end, without any code changes.

Faster performance on Spark not only means faster insights, but also reduced costs since enterprises can complete workloads using less infrastructure.

Accelerated Data Analytics: Scientific Computing Makes Sense of AI

Spark is increasingly in the news for good reason.

Data is essential to helping organizations navigate shifting opportunities and possible threats. But to do so, they need to decipher the critical clues hidden in their data.

Organizations add to their heaps of information every time a customer clicks on a website, hosts a call with customer support or generates a daily sales report. With the rise of AI, data analytics has become critical to helping companies spot trends and stay ahead of changing markets.

Until recently, data analytics relied on small datasets to glean insights from historical data. That data was highly structured, stored in traditional data warehouses and analyzed through ETL.

ETL often becomes a bottleneck for data scientists working on AI-based predictions and recommendations. Estimated to take up 70-90 percent of a data scientist’s time, ETL slows down workflows and ties up sought-after talent on the most mundane part of their work.

When a data scientist is waiting for ETL, they’re not retraining their models to gain better business intelligence. Traditional CPU infrastructure can’t scale efficiently to accommodate these workloads, which often causes costs to balloon.

With GPU-accelerated Spark, ETL no longer spells trouble. Industries such as healthcare, entertainment, energy, finance, retail and many others can now cost-effectively accelerate their data analytics insights.

The Power of Parallel Processing for Data Analytics

GPU parallel processing allows computers to work on multiple operations at a time. In a data center, these capabilities scale out massively to support complex data analytics projects. With more organizations leveraging AI and machine learning tools, parallel processing has become critical for accelerating data-heavy analytics and the ETL pipelines that drive these workloads.
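
As a toy illustration of that data-parallel idea (the numbers are made up, and CuPy is chosen simply because it exposes a NumPy-like API on the GPU), a single operation below is applied to 100 million values at once by the GPU's many cores:

```python
# Toy data-parallel example with CuPy: one operation is applied across
# 100 million elements at once on the GPU. Values are made up for illustration.
import cupy as cp

prices = cp.random.rand(100_000_000)      # 100M simulated prices in GPU memory
discounted = prices * 0.9                 # a single kernel updates every element in parallel
print(float(discounted.mean()))           # reduce on the GPU, copy one number back
```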

Consider a retailer seeking to predict what to stock for next season. It would need to examine recent sales as well as last year’s data. A savvy data scientist might add weather models to this analysis to see what impact a wet or dry season would have on the results. They may also integrate sentiment analysis data to assess what trends are most popular this year.

With so many sources of data to analyze, speed is critical to modeling the impact that different variables might have on sales. This is where analytics moves into machine learning, and where GPUs become essential.

RAPIDS Accelerator Supercharges Apache Spark 3.0

As data scientists shift from using traditional analytics to AI applications that better model complex market demands, CPU-based processing can’t keep up without compromising either speed or cost. The growing adoption of AI in analytics has created the need for a new framework to process data quickly and cost-efficiently with GPUs.

The new RAPIDS Accelerator for Apache Spark connects the Spark distributed computing framework to the powerful RAPIDS cuDF library to enable GPU acceleration of Spark DataFrame and Spark SQL operations. The RAPIDS Accelerator also speeds up Spark Shuffle operations by finding the fastest path to move data between Spark nodes.
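
As a rough sketch of what running "without any code changes" looks like in practice, enabling the plugin is mostly Spark configuration. The plugin class and config keys below come from the RAPIDS Accelerator project; the jar paths, resource amounts and sample query are placeholders to adapt to a real cluster.

```python
# Rough sketch of enabling the RAPIDS Accelerator from PySpark. Jar paths,
# resource amounts and the sample query are placeholders; some cluster
# managers also need a GPU discovery script configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rapids-etl-sketch")
    .config("spark.jars", "/opt/rapids/rapids-4-spark.jar,/opt/rapids/cudf.jar")  # placeholder paths
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")       # route SQL/DataFrame ops to the GPU
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")           # Spark 3.0 GPU-aware scheduling
    .config("spark.task.resource.gpu.amount", "0.25")            # four concurrent tasks share a GPU
    .getOrCreate()
)

# Unmodified Spark code; supported operators run on the GPU transparently.
spark.read.parquet("s3://bucket/events").groupBy("user_id").count().show()
```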

Visit the GitHub page to access the RAPIDS Accelerator for Apache Spark.

Watch Spark 3.0 sprint on GPUs in this video demo:

To learn more about the Spark 3.0 release, visit the Apache Software Foundation.

Data scientists can learn more about Spark 3.0 in our free Spark 3.0 e-book.


Quantum of Solace: Research Seeks Atomic Keys to Lock Down COVID-19

Anshuman Kumar is sharpening a digital pencil to penetrate the secrets of the coronavirus.

He and colleagues at the University of California at Riverside want to calculate atomic interactions at a scale never before attempted for the virus. If they succeed, they’ll get a glimpse into how a molecule on the virus binds to a molecule of a drug, preventing it from infecting healthy cells.

Kumar is part of a team at UCR taking work in the tiny world of quantum mechanics to a new level. They aim to calculate a so-called barrier height, the energy required to interact with a viral protein that consists of about 5,000 atoms.

That’s more than 10x the state of the art in the field, which to date has calculated forces of molecules with up to a few hundred atoms.

Accelerating Anti-COVID Drug Discovery

Data on how quantum forces determine the likelihood a virus will bind with a neutralizing molecule, called a ligand, could speed work at pharmaceutical companies seeking drugs to prevent COVID-19.

“At the atomic level, Newtonian forces become irrelevant, so you have to use quantum mechanics because that’s the way nature works,” said Bryan Wong, an associate professor of chemical engineering, materials science and physics at UCR who oversees the project. “We aim to make these calculations fast and efficient with NVIDIA GPUs in Microsoft’s Azure cloud to narrow down our path to a solution.”

Researchers started their work in late April using a protein on the coronavirus believed to play a strong role in rapidly infecting healthy cells. They’re now finishing up a series of preliminary calculations that take up to 10 days each.

The next step, discovering the barrier height, involves even more complex and time-consuming calculations. They could take as long as five weeks for a single protein/ligand pair.

Calling on GPUs in the Azure Cloud

To accelerate time to results, the team got a grant from Microsoft’s AI for Health program through the COVID-19 High Performance Computing Consortium. It included high performance computing on Microsoft’s Azure cloud and assistance from NVIDIA.

Kumar implemented a GPU-accelerated version of the scientific program that handles the quantum calculations. It already runs on the university’s NVIDIA GPU-powered cluster on premises, but the team wanted to move it to the cloud where it could run on V100 Tensor Core GPUs.

In less than a day, Kumar was able to migrate the program to Azure with help from NVIDIA solutions architect Scott McMillan using HPC Container Maker, an open source tool created and maintained by NVIDIA. The tool lets users define a container with a few clicks that identify a program and its key components such as a runtime environment and other dependencies.

Anshuman Kumar used an open source program developed by NVIDIA to move UCR’s software to the latest GPUs in the Microsoft Azure cloud.
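
For a flavor of what such a container definition looks like, here is a minimal, hypothetical HPC Container Maker recipe. The building blocks (baseimage, gnu, cmake, openmpi) are real HPCCM components, but the base image, versions and the quantum chemistry code's build steps are placeholders rather than UCR's actual recipe.

```python
# recipe.py -- a minimal, hypothetical HPC Container Maker recipe.
# Stage0 is provided by the hpccm tool when it processes this file; the base
# image, versions and application build steps are illustrative placeholders.
Stage0 += baseimage(image='nvidia/cuda:10.2-devel-ubuntu18.04')   # CUDA toolchain
Stage0 += gnu()                                                   # GNU compilers
Stage0 += cmake(eula=True)                                        # build system
Stage0 += openmpi(cuda=True, infiniband=True)                     # CUDA- and InfiniBand-aware MPI
Stage0 += copy(src='quantum_code/', dest='/opt/quantum_code')     # placeholder application source
Stage0 += shell(commands=['cd /opt/quantum_code && make GPU=1'])  # placeholder build command
```

Rendering the recipe into a Dockerfile or Singularity definition is then a single command, for example: hpccm --recipe recipe.py --format singularity.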

It was a big move given the researchers had never used containers or cloud services before.

“The process is very smooth once you identify the correct libraries and dependencies — you just write a script and build the code image,” said Kumar. “After doing this, we got 2-10x speedups on GPUs on Azure compared to our local system,” he added.

NVIDIA helped fine-tune the performance by making sure the code used the latest versions of CUDA and the Magma math library. One specialist dug deep in the stack to update a routine that enabled multi-GPU scaling.

New Teammates and a Mascot

The team got some unexpected help recently when it discovered that a separate computational biology lab at UCR had also won a grant from the HPC consortium to work on COVID. The lab observes the binding process using statistical sampling techniques that make otherwise rare binding events occur more often.

“I reached out to them because pairing up makes for a better project,” said Wong. “They can use the GPU code Anshuman implemented for their enhanced sampling work,” he added.

“I’m really proud to be part of this work because it could help the whole world,” said Kumar.

The team also recently got a mascot. A large squirrel, dubbed Billy, now sits daily outside the window of Wong’s home office, a good symbol for the group’s aim to be fast and agile.

Pictured above: Colorful ribbons represent the Mpro protein believed to have an important role in the replication of the coronavirus. Red strands represent a biological molecule that binds to a ligand. (Image courtesy of UCR.)


Green Light! TOP500 Speeds Up, Saves Energy with NVIDIA

The new ranking of the TOP500 supercomputers paints a picture of modern scientific computing, expanded with AI and data analytics, and accelerated with NVIDIA technologies.

Eight of the world’s top 10 supercomputers now use NVIDIA GPUs, InfiniBand networking or both. They include the most powerful systems in the U.S., Europe and China.

NVIDIA, now combined with Mellanox, powers two-thirds (333) of the overall TOP500 systems on the latest list, up dramatically from less than half (203) for the two separate companies combined on the June 2017 list.

Nearly three-quarters (73 percent) of the new InfiniBand systems on the list adopted NVIDIA Mellanox HDR 200G InfiniBand, demonstrating the rapid embrace of the latest data rates for smart interconnects.

The number of TOP500 systems using HDR InfiniBand nearly doubled since the November 2019 list. Overall, InfiniBand appears in 141 supercomputers on the list, up 12 percent since June 2019.

A rising number of TOP500 systems are adopting NVIDIA GPUs, its Mellanox networking or both.

NVIDIA Mellanox InfiniBand and Ethernet networks connect 305 of the TOP500 supercomputers (61 percent), including all 141 InfiniBand systems on the list and 164 of the systems using Ethernet (63 percent of the Ethernet total).

In energy efficiency, the systems using NVIDIA GPUs are pulling away from the pack. On average, they’re now 2.8x more power-efficient than systems without NVIDIA GPUs, measured in gigaflops/watt.

That’s one reason why NVIDIA GPUs are now used by 20 of the top 25 supercomputers on the TOP500 list.

The best example of this power efficiency is Selene (pictured above), the latest addition to NVIDIA’s internal research cluster. The system was No. 2 on the latest Green500 list and No. 7 on the overall TOP500 at 27.5 petaflops on the Linpack benchmark.

At 20.5 gigaflops/watt, Selene is within a fraction of a point of the top spot on the Green500 list, claimed by a much smaller system that ranked No. 394 by performance.
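
As a back-of-the-envelope check using only the two figures above (not an official power measurement), 27.5 petaflops at 20.5 gigaflops per watt works out to roughly 1.3 megawatts for the Linpack run:

```python
# Rough arithmetic from the figures quoted above; not an official measurement.
linpack_gigaflops = 27.5e6           # 27.5 petaflops expressed in gigaflops
efficiency = 20.5                    # gigaflops per watt

power_megawatts = linpack_gigaflops / efficiency / 1e6
print(f"~{power_megawatts:.2f} MW")  # ~1.34 MW
```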

Selene is the only top 100 system to crack the 20 gigaflops/watt barrier. It’s also the second most powerful industrial supercomputer in the world behind the No. 6 system from energy giant Eni S.p.A. of Italy, which also uses NVIDIA GPUs.

NVIDIA GPUs are powering gains in energy efficiency for the TOP500 supercomputers.

In energy use, Selene is 6.8x more efficient than the average TOP500 system not using NVIDIA GPUs. Selene’s performance and energy efficiency are thanks to third-generation Tensor Cores in NVIDIA A100 GPUs that speed up both traditional 64-bit math for simulations and lower precision work for AI.

Selene’s rankings are an impressive feat for a system that took less than four weeks to build. Engineers were able to assemble Selene quickly because they used NVIDIA’s modular reference architecture.

The guide defines what NVIDIA calls a DGX SuperPOD. It’s based on a powerful, yet flexible building block for modern data centers: the NVIDIA DGX A100 system.

The DGX A100 is an agile system, available today, that packs eight A100 GPUs in a 6U server with NVIDIA Mellanox HDR InfiniBand networking. It was created to accelerate a rich mix of high performance computing, data analytics and AI jobs — including training and inference — and to be fast to deploy.

Scaling from Systems to SuperPODs

With the reference design, any organization can quickly set up a world-class computing cluster. It shows how 20 DGX A100 systems can be linked in Lego-like fashion using high-performance NVIDIA Mellanox InfiniBand switches.

InfiniBand now accelerates seven of the top 10 supercomputers, including the most powerful systems in China, Europe and the U.S.

Four operators can rack a 20-system DGX A100 cluster in as little as an hour, creating a 2-petaflops system powerful enough to appear on the TOP500 list. Such systems are designed to run comfortably within the power and thermal capabilities of standard data centers.

By adding an additional layer of NVIDIA Mellanox InfiniBand switches, engineers linked 14 of these 20-system units to create Selene, which sports:

  • 280 DGX A100 systems
  • 2,240 NVIDIA A100 GPUs
  • 494 NVIDIA Mellanox Quantum 200G InfiniBand switches
  • 56 TB/s network fabric
  • 7PB of high-performance all-flash storage

One of Selene's most significant specs is that it can deliver more than 1 exaflops of AI performance. Another is that Selene, using just 16 of its DGX A100 systems, set a new record on a key data analytics benchmark, TPCx-BB, delivering 20x greater performance than any other system.

These results are critical at a time when AI and analytics are becoming part of the new requirements for scientific computing.

Around the world, researchers are using deep learning and data analytics to predict the most fruitful areas for conducting experiments. The approach reduces the number of costly and time-consuming experiments researchers require, accelerating scientific results.

For example, six systems not yet on the TOP500 list are being built today with the A100 GPUs NVIDIA launched last month. They’ll accelerate a blend of HPC and AI that’s defining a new era in science.

TOP500 Expands Canvas for Scientific Computing

One of those systems is at Argonne National Laboratory, where researchers will use a cluster of 24 NVIDIA DGX A100 systems to scan billions of drugs in the search for treatments for COVID-19.

“Much of this work is hard to simulate on a computer, so we use AI to intelligently guide where and when we will sample next,” said Arvind Ramanathan, a computational biologist at Argonne, in a report on the first users of A100 GPUs.

AI, data analytics and edge streaming are redefining scientific computing.

For its part, NERSC (the U.S. National Energy Research Scientific Computing Center) is embracing AI for several projects targeting Perlmutter, its pre-exascale system packing 6,200 A100 GPUs.

For example, one project will use reinforcement learning to control light source experiments, and one will apply generative models to reproduce expensive simulations at high-energy physics detectors.

Researchers in Munich are training natural-language models on 6,000 GPUs on the Summit supercomputer to speed the analysis of coronavirus proteins. It's another sign that leading TOP500 systems are extending beyond traditional simulations run with double-precision math.

As scientists expand into deep learning and analytics, they're also tapping into cloud computing services and even streaming data from remote instruments at the edge of the network. Together these elements form the four pillars of modern scientific computing that NVIDIA accelerates: simulation, AI, data analytics and edge streaming.

It’s part of a broader trend where both researchers and enterprises are seeking acceleration for AI and analytics from the cloud to the network’s edge. That’s why the world’s largest cloud service providers along with the world’s top OEMs are adopting NVIDIA GPUs.

In this way, the latest TOP500 list reflects NVIDIA’s efforts to democratize AI and HPC. Any company that wants to build leadership computing capabilities can access NVIDIA technologies such as DGX systems that power the world’s most powerful systems.

Finally, NVIDIA congratulates the engineers behind the Fugaku supercomputer in Japan for taking the No. 1 spot, showing that Arm has become a viable option in high performance computing. That's one reason why NVIDIA announced a year ago that it's making its CUDA accelerated computing software available on the Arm processor architecture.


Fighting COVID-19 in New Era of Scientific Computing

Scientists and researchers around the world are racing to find a cure for COVID-19.

That’s made the work of all those digitally gathered for this week’s high performance computing conference, ISC 2020 Digital, more vital than ever.

And the work of these researchers is broadening to encompass a wider range of approaches than ever.

The NVIDIA scientific computing platform plays a vital role, accelerating progress across this entire spectrum of approaches — from data analytics to simulation and visualization to AI to edge processing.

Some highlights:

  • In genomics, Oxford Nanopore Technologies was able to sequence the virus genome in just 7 hours using our GPUs.
  • In infection analysis and prediction, the NVIDIA RAPIDS team has GPU-accelerated Plotly’s Dash, a data visualization tool, enabling clearer insights into real-time infection rate analysis.
  • In structural biology, the U.S. National Institutes of Health and the University of Texas at Austin are using the GPU-accelerated software cryoSPARC to reconstruct the first 3D structure of the virus protein using cryogenic electron microscopy.
  • In treatment, NVIDIA worked with the National Institutes of Health and built an AI to accurately classify COVID-19 infection based on lung scans so efficient treatment plans can be devised.
  • In drug discovery, Oak Ridge National Laboratory ran the Scripps Research Institute's AutoDock on the GPU-accelerated Summit supercomputer to screen a billion potential drug combinations in just 12 hours.

  • In robotics, startup Kiwi is building robots to deliver medical supplies autonomously.
  • And in edge detection, Whiteboard Coordinator Inc. built an AI system to automatically measure and screen elevated body temperatures, screening well over 2,000 healthcare workers per hour.

It's truly inspirational to wake up every day and see the amazing efforts going on around the world, and the role NVIDIA's scientific computing platform plays in helping researchers understand the virus and discover testing and treatment options to fight the COVID-19 pandemic.

The reason we’re able to play a role in so many efforts, across so many areas, is because of our strong focus on providing end-to-end workflows for the scientific computing community.

We’re able to provide these workflows because of our approach to full-stack innovation to accelerate all key application areas.

For data analytics, we accelerate key frameworks such as Spark 3.0, RAPIDS and Dask. This acceleration is built using our domain-specific CUDA-X libraries for data analytics, such as cuDF, cuML and cuGraph, along with I/O acceleration technologies from Magnum IO.

These libraries contain millions of lines of code and provide seamless acceleration to developers and users, whether they’re creating applications on the desktops accelerated with our GPUs or running them in data centers, in edge computers, in supercomputers, or in the cloud.
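
For a small, self-contained taste of what those libraries look like to a developer, the snippet below runs a pandas-style aggregation on the GPU with cuDF; the data is made up purely for illustration:

```python
# Minimal cuDF illustration with made-up data; needs a CUDA-capable GPU and
# the RAPIDS cuDF package installed.
import cudf

df = cudf.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "region":     ["north", "south", "north", "east", "south", "east"],
    "viral_load": [3.2, 7.9, 4.1, 6.5, 5.0, 2.8],
})

# Familiar pandas-style API, executed on the GPU.
summary = df.groupby("region")["viral_load"].mean().sort_values(ascending=False)
print(summary)
```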

Similarly, we accelerate over 700 HPC applications, including all the most widely used scientific applications.

NVIDIA accelerates all frameworks for AI, which has become crucial for tasks where the information is incomplete — where there are no first principles to work with or the first principle-based approaches are too slow.

And, thanks to our roots in visual computing, NVIDIA provides accelerated visualization solutions, so terabytes of data can be visualized.

NASA, for instance, used our acceleration stack to visualize the landing of the first manned mission to Mars, in what is the world’s largest real-time, interactive volumetric visualization (150TB).

Our deep domain libraries also provide a seamless performance boost to scientific computing users on their applications across the different generations of our architecture. Going from Volta to Ampere, for instance.

NVIDIA's also making all our new and improved GPU-optimized scientific computing applications available through NGC for researchers to accelerate their time to insight.

Together, all of these pillars of scientific computing — simulation, AI and data analytics, edge streaming and visualization workflows — are key to tackling the challenges of today, and tomorrow.


NVIDIA Shatters Big Data Analytics Benchmark

NVIDIA just beat the record for the standard big data analytics benchmark, known as TPCx-BB, by nearly 20x.

Using the RAPIDS suite of open-source data science software libraries powered by 16 NVIDIA DGX A100 systems, NVIDIA ran the benchmark in just 14.5 minutes, versus the current leading result of 4.7 hours on a CPU system. The DGX A100 systems had a total of 128 NVIDIA A100 GPUs and used NVIDIA Mellanox networking.

TPCx-BB benchmark results across 30 queries. Running on 16 DGX A100 systems, RAPIDS delivers the above relative performance gains per query for 10TB testing.

Software and Hardware Align for Full-Throttle Results

Today, leading organizations use AI to gain insights. The TPCx-BB benchmark features queries that combine SQL with machine learning on structured data, with natural language processing and unstructured data, reflecting the diversity found in modern data analytics workflows.

These unofficial results point to a new standard, and the breakthroughs behind it are available through the NVIDIA software and hardware ecosystem.

To run the benchmark, NVIDIA used RAPIDS for data processing and machine learning, Dask for horizontal scaling and UCX open source libraries for ultra fast communication, all supercharged on DGX A100.

DGX A100 systems can effectively power analytics, AI training and inference on a single, software-defined platform. DGX A100 unites the NVIDIA Ampere architecture-based NVIDIA A100 Tensor Core GPUs and NVIDIA Mellanox networking in a turnkey system that scales with ease.

Parallel Processing for Unparalleled Performance

TPCx-BB is a big data benchmark for enterprises representing real-world ETL (extract, transform, load) and machine learning workflows. The benchmark’s 30 queries include big data analytics use cases like inventory management, price analysis, sales analysis, recommendation systems, customer segmentation and sentiment analysis.

Despite steady improvements in distributed computing systems, such big data workloads are bottlenecked when running on CPUs. The RAPIDS results on DGX A100 showcase the breakthrough potential for TPCx-BB benchmarks powered by GPUs, a measurement historically run on CPU-only systems.

In this benchmark, the RAPIDS software ecosystem and DGX A100 systems accelerate compute, communication, networking and storage infrastructure. This integration sets a new bar for running data science workloads at scale.

Efficient Benchmarking at Big Data Scale

At the SF10000 TPCx-BB scale, the NVIDIA testing represents results for a workload with more than 10 terabytes of data.

At this scale, query complexity can quickly drive up execution time, which increases data center expenses like space, server equipment, power, cooling and IT expertise. The elastic DGX A100 architecture addresses these challenges.

And with new NVIDIA A100 Tensor Core GPU systems coming from NVIDIA hardware partners, data scientists will have even more options to accelerate their workloads with the performance of A100.

Open Source Acceleration and Collaboration

The RAPIDS TPCx-BB benchmark is an active project with many partners and open source communities.

The TPCx-BB queries were implemented as a series of Python scripts utilizing the RAPIDS dataframe library, cuDF; the RAPIDS machine learning library, cuML; and CuPy, BlazingSQL and Dask as the primary libraries. Numba was used to implement custom logic in user-defined functions, with spaCy for Named Entity Recognition.
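
To give a flavor of how those pieces fit together, here is an illustrative multi-GPU sketch in the same spirit, not the actual benchmark code: Dask spreads a cuDF dataframe across the local GPUs, and the file path and column names are placeholders.

```python
# Illustrative multi-GPU sketch in the spirit of the benchmark implementation;
# not the actual TPCx-BB query code. Paths and columns are placeholders.
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

cluster = LocalCUDACluster()          # one Dask worker per local GPU
client = Client(cluster)

# Read a partitioned dataset into GPU memory across all workers.
sales = dask_cudf.read_parquet("/data/web_sales/*.parquet")   # placeholder path

# A typical ETL-style aggregation, executed on GPUs and combined by Dask.
top_items = (
    sales.groupby("item_id")["net_paid"].sum()
         .nlargest(10)
         .compute()                   # triggers distributed execution
)
print(top_items)
```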

These results would not be possible without the RAPIDS and broader PyData ecosystem.

To dive deeper into the RAPIDS benchmarking results, read the RAPIDS blog. For more information on RAPIDS, visit rapids.ai.


NVIDIA Powers World’s Leading Weather Forecasters’ Supercomputers​

Think of weather forecasting and satellite images on the local news probably come to mind. But another technology has transformed the task of forecasting and simulating the weather: supercomputing.

Weather and climate models are both compute and data intensive. Forecast quality depends on model complexity and high resolution. Resolution depends on the performance of supercomputers. And supercomputer performance depends on interconnect technology to move data quickly, effectively and in a scalable manner across compute resources.

That’s why many of the world’s leading meteorological services have chosen NVIDIA Mellanox InfiniBand networking to accelerate their supercomputing platforms, including the Spanish Meteorological Agency, the China Meteorological Administration, the Finnish Meteorological Institute, NASA and the Royal Netherlands Meteorological Institute​.

The technological advantages of InfiniBand have made it the de facto standard for climate research and weather forecasting applications, delivering higher performance, scalability and resiliency versus any other interconnect technologies.

The Beijing Meteorological Service has selected 200 Gigabit HDR InfiniBand interconnect technology to accelerate its new supercomputing platform, which will be used for enhancing weather forecasting, improving climate and environmental research, and serving the weather forecasting information needs of the 2022 Winter Olympics in Beijing.

Meteo France, the French national meteorological service, has selected HDR InfiniBand to accelerate its two new large-scale supercomputers. The agency provides weather forecasting services for companies in transport, agriculture, energy and many other industries, as well as for a large number of media channels and worldwide sporting and cultural events. One of the systems debuted on the TOP500 list, out this month.

“We have been using InfiniBand for many years to connect our supercomputing platforms in the most efficient and scalable way, enabling us to conduct high-performance weather research and forecasting simulations,” said Alain Beuraud, HPC project manager at Meteo France. “We are excited to leverage the HDR InfiniBand technology advantages, its In-Network Computing acceleration engines, extremely low latency, and advanced routing capabilities to power our next supercomputing platforms.”

HDR InfiniBand will also accelerate the new supercomputer for the European Centre for Medium-Range Weather Forecasts (ECMWF). Being deployed this year, the system will support weather forecasting and prediction researchers from over 30 countries across Europe. It will increase the center's weather and climate research compute power by 5x, making it one of the world's most powerful meteorological supercomputers.

The new platform will enable running nearly twice as many higher-resolution probabilistic weather forecasts in less than an hour, improving the ability to monitor and predict increasingly severe weather phenomena and enabling European countries to better protect lives and property.

“We require the best supercomputing power and the best technologies available for our numerical weather prediction activities,” said Florence Rabier, director general at ECMWF. “With our new supercomputing capabilities, we will be able to run higher resolution forecasts in under an hour and enable much improved weather forecasts.”

“As governments and society continue to grapple with the impacts of increasingly severe weather, we are also proud to be relying on a supercomputer designed to maximize energy efficiency,” she added.

The NVIDIA Mellanox networking technology team has also been working with the German Climate Computing Centre on optimizing performance of the ICON application, the first project in a multi-phase collaboration. ICON is a unified weather forecasting and climate model, jointly developed by the Max-Planck-Institut für Meteorologie and Deutscher Wetterdienst (DWD), the German National Meteorological Service.

By optimizing the application’s data exchange modules to take advantage of InfiniBand, the team has demonstrated a nearly 20 percent increase in overall application performance.
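
The specifics of ICON's communication layer aren't spelled out here, but the pattern at stake is the classic halo exchange between neighboring model subdomains, where interconnect latency and bandwidth dominate. The toy mpi4py sketch below is purely illustrative of that pattern and is not ICON code:

```python
# Toy halo exchange between neighboring ranks with mpi4py; purely illustrative
# of the communication pattern, not ICON's actual data exchange module.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.random.rand(1_000_000)          # this rank's slice of the model grid
halo_from_left = np.empty(1000)            # receive buffers for boundary data
halo_from_right = np.empty(1000)

# Non-blocking sends and receives overlap communication with computation,
# which is where a low-latency interconnect like InfiniBand pays off.
reqs = [
    comm.Isend(local[:1000],  dest=left),
    comm.Isend(local[-1000:], dest=right),
    comm.Irecv(halo_from_left,  source=left),
    comm.Irecv(halo_from_right, source=right),
]
MPI.Request.Waitall(reqs)
```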

The design of InfiniBand rests on four fundamentals: a smart endpoint design that can run all network engines; a software-defined switch network designed for scale; centralized management that lets the network be controlled and operated from a single place; and standard technology, ensuring forward and backward compatibility, with support for open source technology and open APIs.

It’s these fundamentals that help InfiniBand provide the highest network performance, extremely low latency and high message rate. As the only 200Gb/s high-speed interconnect in the market today, InfiniBand delivers the highest network efficiency with advanced end-to-end adaptive routing, congestion control and quality of service.

The forecast calls for more world-leading weather and climate agencies to announce their new supercomputing platforms this year using HDR InfiniBand. In the meantime, learn more about NVIDIA Mellanox InfiniBand HPC technology.


Intel Announces Unmatched AI and Analytics Platform with New Processor, Memory, Storage and FPGA Solutions

What’s New: Intel today introduced its 3rd Gen Intel® Xeon® Scalable processors and additions to its hardware and software AI portfolio, enabling customers to accelerate the development and use of artificial intelligence (AI) and analytics workloads running in data center, network and intelligent-edge environments. As the industry’s first mainstream server processor with built-in bfloat16 support, Intel’s new 3rd Gen Xeon Scalable processors make AI inference and training more widely deployable on general-purpose CPUs for applications that include image classification, recommendation engines, speech recognition and language modeling.

“The ability to rapidly deploy AI and data analytics is essential for today’s businesses. We remain committed to enhancing built-in AI acceleration and software optimizations within the processor that powers the world’s data center and edge solutions, as well as delivering an unmatched silicon foundation to unleash insight from data.”
–Lisa Spelman, Intel corporate vice president and general manager, Xeon and Memory Group

Why It’s Important: AI and analytics open new opportunities for customers across a broad range of industries, including finance, healthcare, industrial, telecom and transportation. IDC predicts that by 2021, 75% of commercial enterprise apps will use AI1. And by 2025, IDC estimates that roughly a quarter of all data generated will be created in real time, with various internet of things (IoT) devices creating 95% of that volume growth2.


Unequaled Portfolio Breadth and Ecosystem Support for AI and Analytics: Intel’s new data platforms, coupled with a thriving ecosystem of partners using Intel AI technologies, are optimized for businesses to monetize their data through the deployment of intelligent AI and analytics services.

  • New 3rd Gen Intel Xeon Scalable Processors: Intel is further extending its investment in built-in AI acceleration in the new 3rd Gen Intel Xeon Scalable processors through the integration of bfloat16 support into the processor’s unique Intel DL Boost® technology. Bfloat16 is a compact numeric format that uses half the bits of today’s FP32 format but achieves comparable model accuracy with minimal — if any — software changes required (see the short bfloat16 illustration after this list). The addition of bfloat16 support accelerates both AI training and inference performance in the CPU. Intel-optimized distributions for leading deep learning frameworks (including TensorFlow and PyTorch) support bfloat16 and are available through the Intel AI Analytics toolkit. Intel also delivers bfloat16 optimizations into its OpenVINO® toolkit and the ONNX Runtime environment to ease inference deployments. The 3rd Gen Intel Xeon Scalable processors (code-named “Cooper Lake”) evolve Intel’s 4- and 8-socket processor offering. The processor is designed for deep learning, virtual machine (VM) density, in-memory database, mission-critical applications and analytics-intensive workloads. Customers refreshing aging infrastructure can expect an average estimated gain of 1.9 times on popular workloads3 and up to 2.2 times more VMs4 compared with 5-year-old 4-socket platform equivalents.
  • New Intel Optane persistent memory: As part of the 3rd Gen Intel Xeon Scalable platform, the company also announced the Intel Optane™ persistent memory 200 series, providing customers up to 4.5TB of memory per socket to manage data intensive workloads, such as in-memory databases, dense virtualization, analytics and high-powered computing.
  • New Intel 3D NAND SSDs: For systems that store data in all-flash arrays, Intel announced the availability of its next-generation high-capacity Intel 3D NAND SSDs, the Intel SSD D7-P5500 and P5600. These 3D NAND SSDs are built with Intel’s latest triple-level cell (TLC) 3D NAND technology, an all-new low-latency PCIe controller to meet the intense IO requirements of AI and analytics workloads, and advanced features to improve IT efficiency and data security.
  • First Intel AI-optimized FPGA: Intel disclosed its upcoming Intel Stratix® 10 NX FPGAs, Intel’s first AI-optimized FPGAs targeted for high-bandwidth, low-latency AI acceleration. These FPGAs will offer customers customizable, reconfigurable and scalable AI acceleration for compute-demanding applications such as natural language processing and fraud detection. Intel Stratix 10 NX FPGAs include integrated high-bandwidth memory (HBM), high-performance networking capabilities and new AI-optimized arithmetic blocks called AI Tensor Blocks, which contain dense arrays of lower-precision multipliers typically used for AI model arithmetic.
  • OneAPI cross-architecture development for ongoing AI innovation: As Intel expands its advanced AI product portfolio to meet diverse customer needs, it is also paving the way to simplify heterogeneous programming for developers with its oneAPI cross-architecture tools portfolio to accelerate performance and increase productivity. With these advanced tools, developers can accelerate AI workloads across Intel CPUs, GPUs and FPGAs, and future-proof their code for today’s and the next generations of Intel processors and accelerators.
  • Enhanced Intel Select Solutions portfolio addresses IT’s top requirements: Intel has enhanced its Select Solutions portfolio to accelerate deployment of IT’s most urgent requirements, highlighting the value of pre-verified solution delivery in today’s rapidly evolving business climate. Announced today are three new and five enhanced Intel Select Solutions focused on analytics, AI and hyper-converged infrastructure. The enhanced Intel Select Solution for Genomics Analytics is being used around the world to find a vaccine for COVID-19, and the new Intel Select Solution for VMware Horizon VDI on vSAN is being used to enhance remote learning.
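
As a quick, framework-agnostic illustration of the bfloat16 trade-off described in the first bullet above (using PyTorch's generic bfloat16 tensor support, not any Intel-specific API), the format keeps FP32's dynamic range while giving up mantissa precision:

```python
# Illustrating the bfloat16 trade-off with PyTorch tensors; this is a generic
# numeric demo, not Intel DL Boost-specific code.
import torch

x = torch.tensor([3.14159265, 1.0e-38, 6.02e23], dtype=torch.float32)
bf16 = x.to(torch.bfloat16)

print(bf16)                      # FP32's dynamic range is preserved
print(bf16.to(torch.float32))    # but only ~3 significant decimal digits survive
```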

When Products are Available: The 3rd Gen Intel Xeon Scalable processors and Intel Optane persistent memory 200 series are shipping to customers today. In May, Facebook announced that 3rd Gen Intel Xeon Scalable processors are the foundation for its newest Open Compute Platform (OCP) servers, and other leading CSPs, including Alibaba, Baidu and Tencent, have announced they are adopting the next-generation processors. General OEM systems availability is expected in the second half of 2020. The Intel SSD D7-P5500 and P5600 3D NAND SSDs are available today. And the Intel Stratix 10 NX FPGA is expected to be available in the second half of 2020.

More Context: Accelerating AI and Analytics – Intel Processors, FPGAs, Memory & Storage, and Software (Press Kit)

3rd Gen Xeon SKU Table

The Small Print:

1 IDC, Worldwide Storage for Cognitive/AI Workloads Forecast, 2018–2022, #US43707918e, April 2018

2 IDC, Edge computing, 5G and AI — the perfect storm for government systems, Shawn McCarthy, April 8, 2020

Configurations:

3 1.9x average estimated gain on popular workloads with the new 3rd Gen Intel® Xeon® Platinum 8380H processor vs. 5-year old platform. See claim 11 www.intel.com/3rd-gen-xeon-configs

4 Up to 2.2x more Virtual Machines with the new 3rd Gen Intel® Xeon® Scalable platform and Intel® SSD Data Center Family vs. 5-year old 4-socket platform. See claim 4 www.intel.com/3rd-gen-xeon-configs


Intel Announces 2020 US Partner of the Year Awards for Excellence in Accelerating Innovation

What’s New: Today Intel recognized the outstanding achievements of 30 partners with the distinction of Partner of the Year at its Intel Partner Connect 2020 virtual conference. The Partner of the Year awards honor Intel partners demonstrating excellence in technology innovation, go-to-market strategizing, sales growth and marketing.

“We appreciate each of our partners for their continued collaboration to bring new technologies to life for our customers. The shared results from 2019 demonstrate our strong partnerships and collective mission to bring innovative solutions to businesses and organizations across the world.”
– Greg Ernst, Intel vice president in the Sales and Marketing Group and general manager of U.S. Sales

Why It’s Important: The title of Partner of the Year is awarded to companies achieving the highest standards of design, development, integration and technology deployment to accelerate innovation, growth and go-to-market strategies. They represent great examples of what’s possible when we, as an ecosystem, work together.

Partner Program Winners:

Global Innovation

  • Accenture – Global Innovation Partner – Globally deploying innovative solutions across artificial intelligence (AI), analytics, blockchain and device-as-a-service leveraging Intel technologies: Intel® OpenVINO™, Intel® Arria® 10 FPGA, Intel® Movidius™ Myriad™ X VPU, Intel® Connected Logistics platform and the Intel vPro® platform.

LOEM

  • AIS – Growth – Continuously grew integration of Intel® NUC product to enhance video collaboration solutions in enterprise and education.
  • Colfax International – Go-to-Market – For successfully deploying Intel® Optane™ persistent memory DIMMs at launch and strategizing a cohesive pricing model.
  • Crystal Group – Growth – Delivered innovative, ruggedized systems tailored to specific customer needs for oil and gas, and power substation/micro grid market-ready solutions for Intel’s common substation platform.
  • Eluktronics – Innovation – Leading channel whitebook GTM strategy with SPG and executing TTM launch of Queen’s County, selling 1.2ku in first quarter.
  • IBuyPower – Go-to-Market – Set its sights on TAM expansion with Intel technology through a bold, creative and unique partnership with Toyota Racing Development. With this program, it unveiled a state-of-the-art gaming and training zone at Toyota Performance Center, remastered its Pro Series of workstation PCs for professionals, launched a series of TRD-approved systems for gamers, and broadcast a video series designed to award winning college students with gaming room makeovers, all with Intel branding and powered by Intel-based PCs.
  • Penguin Computing – Innovation – An innovative Linux solution for high-performance computing on-premise and in the cloud with Penguin Computing professional and managed services.
  • Razer – Growth – Razer saw exceptional growth in 2019, in part by bringing the latest Intel technologies to market, including Intel® Core™ i7 processors, Intel® Iris® Plus graphics, Thunderbolt™ 3 and Wi-Fi 6, to deliver high-performance thin-and-light gaming laptops.
  • Simply NUC – Go-to-Market – Dedicated to expanding the use cases of mini PCs into new growth segments such as digital signage, academic collaboration and AI. Simply NUC is your one-stop shop for systems, solutions and accessories.
  • Vast Data – Innovation – For close collaboration and partnership in creating Intel Optane technology-based storage solutions for new applications, such as analytics and AI, machine learning and deep learning. Uniquely integrated key Intel technologies to simplify the data center stack, eliminate storage complexity and tiers, and enable all-flash performance with archive economics.

National

  • CDW – Growth – Expanding Intel client, data center, storage and networking infrastructure solutions across over 150 countries.
  • Connection – Go-to-Market – For their dedication to selling devices consistently across SMB, public sector and enterprise segments.
  • Insight – Innovation – For simplifying complex solutions in emerging technologies like the Internet of Things (IoT) and machine learning – including the Connected Platform for Detection and Prevention of the spread of viruses – to accelerate our clients’ time to value, drive efficiency in their workplaces and create positive customer experiences through partnerships and solution aggregation at scale.
  • Logicalis – Go-to-Market – Intel pre-validated and pre-integrated IoT solutions across markets, such as asset management in healthcare, machine vision in industrial, and smart city applications.
  • Pivot – Go-to-Market – Edge secure connectivity, computing and collaboration solutions that continue to advance and scale Smart Edge’s software. Pivot signed a three-year preferred partnership agreement with Intel to continue investing in and drive the Intel® Smart Edge/Edge solution (branded Pivot Intelligent Edge) to market and support its future growth. Pivot has integrated Smart Edge to be a foundational component of Pivot’s Intelligent Edge Solution and Services that provide best-in-class secure connectivity across multiple wireless protocols (CBRS, LTE, Wifi, Lora, Zigbee, etc.).
  • Presidio – Growth – Deployed Intel-based solutions around HCI, SDS and Hybrid Cloud across its middle market, enterprise and government clients.
  • SHI International Corp. – Innovation – Leads the way with its cutting-edge Zero Touch, which streamlines configuration, deployment and management of Intel processor-based Win 10 client devices.
  • World Wide Technology – Growth – Designed, built, and deployed transformational solutions for multicloud, AI/analytics, IoT and 5G, supporting our largest enterprise, public sector, and service provider customers.
  • Zones – Innovation – For its leadership in solution development and deployment of the Intel Unite® solution.

ISA

  • BCM – Highest IoT Growth at Associate/Affiliate Level – Provided medical equipment OEMs with a viable IoT data collection and aggregation device using Intel Core technology. Understands multiple vertical markets and embedded life cycle management, and reduces time to market with a quality product.
  • Crestron – Most Engaged Co-Marketing – The Crestron Collaboration solution is an Intel® IoT Market Ready Solution built on Intel technologies (Intel Core i7, Movidius and Intel Arria FPGA). Crestron engaged in a multifaceted Intel IoT Solutions Alliance co-marketing campaign (event, collateral, demos, digital), the insight.tech content marketing platform and the Intel® Solutions Marketplace to develop leads, accelerate its business and drive revenue and deployments.
  • Dell OEM – Largest IoT Co-sell Partner + Biggest IoT Growth Partner – Dell Technologies Original Equipment Manufacturer (OEM) Embedded & Edge Solutions delivers customized infrastructure, services and a secure supply chain designed for your vision and business goals – all from one trusted, sustainable and secure vendor. Dell OEM offers solutions for IoT, communications, medical, retail and more than 40 additional verticals.
  • Noodle.ai – Most Impactful MRS – Noodle.ai is a mature startup software company with deep heritage and expertise in AI/ML analytics for factory/industrial environments. With support from Intel and Dell, Noodle.ai will continue to pioneer the Smart Factories initiative, as part of the Industry 4.0 rollout.

Distributor

  • Synnex Corporation – Data Center Group Distributor of the Year – Grew its overall data center business with a companywide focus on growing this segment, which resulted in overall data center growth along with Intel adjacencies and Intel® Data Center Blocks.
  • Ingram Micro – Client Computing Group Distributor of the Year – Grew its client computing business through a sales and marketing strategy focused on growth areas, like solutions based on Intel NUC products.
  • Arrow – Internet of Things Group Distributor of the Year – Drove an overall IoT silicon, systems and solutions strategy that led to expanding its overall business and evolving its IoT go-to-market strategy.
  • ASI – Non-Volatile Memory Solutions Group Distributor of the Year – Exceptional growth year-over-year through a very focused effort across the entire company.
  • Tech Data – Branded Systems Distributor of the Year – Strong growth results on both end-point products and data center through a variety of companywide initiatives.
  • Tech Data – Partner Enablement Distributor of the Year – Delivered innovative solutions to help its Intel partners grow their Intel business through Tech Data’s Propel ITP program.
  • Computech International – Channel Innovation Award – Brought Intel nonvolatile memory solutions to new markets, expanding Intel’s channel presence and customer base.

More Context: Intel’s Partner Program Page

The Small Print: Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.


What’s a DPU?


Of course, you’re probably already familiar with the Central Processing Unit or CPU. Flexible and responsive, for many years CPUs were the sole programmable element in most computers.

More recently, GPUs, or graphics processing units, have taken a central role. Originally used to deliver rich, real-time graphics, their parallel processing capabilities make them ideal for accelerated computing tasks of all kinds.

That’s made them the key to artificial intelligence, deep learning, and big data analytics applications.

Over the past decade, however, computing has broken out of the boxy confines of PCs and servers — with CPUs and GPUs powering sprawling new hyperscale data centers.

These data centers are knit together with a powerful new category of processors. The DPU, or data processing unit, has become the third member of the data-centric accelerated computing model.

“This is going to represent one of the three major pillars of computing going forward,” NVIDIA CEO Jensen Huang said during a talk earlier this month.

“The CPU is for general purpose computing, the GPU is for accelerated computing and the DPU, which moves data around the data center, does data processing.”

So What Makes a DPU Different?

A DPU is a new class of programmable processor, a system on a chip (SoC) that combines three key elements:

  • An industry-standard, high-performance, software-programmable, multi-core CPU, typically based on the widely used Arm architecture, tightly coupled to the other SoC components
  • A high-performance network interface capable of parsing, processing and efficiently transferring data at line rate, or the speed of the rest of the network, to GPUs and CPUs
  • A rich set of flexible and programmable acceleration engines that offload and improve application performance for AI and machine learning, security, telecommunications and storage, among others

All these DPU capabilities are critical to enabling the isolated, bare-metal, cloud-native computing that will define the next generation of cloud-scale computing.

DPUs: Incorporated into SmartNICs

The DPU can be used as a stand-alone embedded processor, but it's more often incorporated into a SmartNIC, a network interface controller that's used as a key component in a next generation server.

Other devices that claim to be DPUs miss significant elements of these three critical capabilities, which are fundamental to answering the question: What is a DPU?

For example, some vendors use proprietary processors that don’t benefit from the rich development and application infrastructure offered by the broad Arm CPU ecosystem.

Others claim to have DPUs but make the mistake of focusing solely on the embedded CPU to perform data path processing.

DPUs: A Focus on Data Processing

This isn’t competitive and doesn’t scale, because trying to beat the traditional x86 CPU with a brute force performance attack is a losing battle. If 100 Gigabit/sec packet processing brings an x86 to its knees, why would an embedded CPU perform better?

Instead the network interface needs to be powerful and flexible enough to handle all network data path processing. The embedded CPU should be used for control path initialization and exception processing, nothing more.

At a minimum, there are 10 capabilities the network data path acceleration engines need to be able to deliver:

  • Data packet parsing, matching, and manipulation to implement an open virtual switch (OVS)
  • RDMA data transport acceleration for Zero Touch RoCE
  • GPU-Direct accelerators to bypass the CPU and feed networked data directly to GPUs (both from storage and from other GPUs)
  • TCP acceleration including RSS, LRO, checksum, etc
  • Network virtualization for VXLAN and Geneve overlays and VTEP offload
  • Traffic shaping “packet pacing” accelerator to enable multi-media streaming, content distribution networks, and the new 4K/8K Video over IP (RiverMax for ST 2110)
  • Precision timing accelerators for telco Cloud RAN such as 5T for 5G capabilities
  • Crypto acceleration for IPsec and TLS performed inline so all other accelerations are still operational
  • Virtualization support for SR-IOV, VirtIO and para-virtualization
  • Secure Isolation: root of trust, secure boot, secure firmware upgrades, and authenticated containers and application life cycle management

These are just 10 of the acceleration and hardware capabilities that are critical to being able to answer yes to the question: “What is a DPU?”

So what is a DPU? It's a processor that combines all of the capabilities above in a single system on a chip.

Many so-called DPUs focus solely on delivering one or two of these functions.

The worst try to offload the datapath in proprietary processors.

While good for prototyping, this is a fool's errand, because of the scale, scope and breadth of the data center.

