NVIDIA Sets Six Records in AI Performance

NVIDIA has set six AI performance records with today’s release of the industry’s first broad set of AI benchmarks.

Backed by Google, Intel, Baidu, NVIDIA and dozens more technology leaders, the new MLPerf benchmark suite measures a wide range of deep learning workloads. Aiming to serve as the industry’s first objective AI benchmark suite, it covers such areas as computer vision, language translation, personalized recommendations and reinforcement learning tasks.

NVIDIA achieved the best performance in all six MLPerf benchmarks for which it submitted results. These cover a variety of workloads and infrastructure scales, ranging from 16 GPUs on one node to 640 GPUs across 80 nodes.

The six categories are image classification, object instance segmentation, object detection, non-recurrent translation, recurrent translation and recommendation systems. NVIDIA did not submit results for the seventh category, reinforcement learning, which does not yet take advantage of GPU acceleration.

A key benchmark on which NVIDIA technology performed particularly well was language translation, where it trained the Transformer neural network in just 6.2 minutes. More details on all six submissions are available on the NVIDIA Developer news center.

NVIDIA engineers achieved their results on NVIDIA DGX systems, including NVIDIA DGX-2, the world’s most powerful AI system, featuring 16 fully connected V100 Tensor Core GPUs.

NVIDIA is the only company to have entered as many as six benchmarks, demonstrating the versatility of V100 Tensor Core GPUs for the wide variety of AI workloads deployed today.

“The new MLPerf benchmarks demonstrate the unmatched performance and versatility of NVIDIA’s Tensor Core GPUs,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Exceptionally affordable and available in every geography from every cloud service provider and every computer maker, our Tensor Core GPUs are helping developers around the world advance AI at every stage of development.”

State-of-the-Art AI Computing Requires Full Stack Innovation

Performance on complex and diverse computing workloads takes more than great chips. Accelerated computing is about more than an accelerator. It takes the full stack.

NVIDIA’s stack includes NVIDIA Tensor Cores, NVLink, NVSwitch, DGX systems, CUDA, cuDNN, NCCL, optimized deep learning framework containers and NVIDIA software development kits.

NVIDIA’s AI platform is also the most accessible and affordable. Tensor Core GPUs are available on every cloud and from every computer maker and in every geography.

The same power of Tensor Core GPUs is also available on the desktop, with the most powerful desktop GPU, NVIDIA TITAN RTX, costing only $2,500. When amortized over three years, this translates to just a few cents per hour.
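The amortization is easy to check. The sketch below is a back-of-the-envelope calculation, not an official pricing model: it assumes the card runs around the clock for the full three years and ignores electricity and the rest of the system. The function name `hourly_cost` is ours, used only for illustration.

```python
# Spread a one-time hardware price evenly over a period of continuous use.
HOURS_PER_YEAR = 365 * 24  # ignores leap days


def hourly_cost(price_usd: float, years: float) -> float:
    """Return the purchase price divided by every hour in the period."""
    return price_usd / (years * HOURS_PER_YEAR)


cost = hourly_cost(2500, 3)
print(f"${cost:.3f} per hour")  # prints "$0.095 per hour"
```

Under these assumptions the figure comes out just under 10 cents per hour. Lighter usage raises the effective hourly cost proportionally.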

And the software acceleration stacks are kept up to date on the NVIDIA GPU Cloud (NGC) container registry.

NVIDIA’s Record-Setting Platform Available Now on NGC

The software innovations and optimizations used to achieve NVIDIA’s industry-leading MLPerf performance are available free of charge in our latest NGC deep learning containers. Download them from the NGC container registry.

The containers include the complete software stack and the top AI frameworks, optimized by NVIDIA. Our 18.11 release of the NGC deep learning containers includes the exact software used to achieve our MLPerf results.

Developers can use them everywhere, at every stage of development:

  • For data scientists on desktops, the containers enable cutting-edge research with NVIDIA TITAN RTX GPUs.
  • For workgroups, the same containers run on NVIDIA DGX Station.
  • For enterprises, the containers accelerate the application of AI to their data in the cloud with NVIDIA GPU-accelerated instances from Alibaba Cloud, AWS, Baidu Cloud, Google Cloud Platform, IBM Cloud, Microsoft Azure, Oracle Cloud Infrastructure and Tencent Cloud.
  • For organizations building on-premises AI infrastructure, NVIDIA DGX systems and NGC-Ready systems from Atos, Cisco, Cray, Dell EMC, HP, HPE, Inspur, Lenovo, Sugon and Supermicro put AI to work.

To get started on your AI project, or to run your own MLPerf benchmark, download containers from the NGC container registry.

The post NVIDIA Sets Six Records in AI Performance appeared first on The Official NVIDIA Blog.

NVIDIA Extends PhysX for High Fidelity Simulations, Goes Open Source

NVIDIA PhysX, the most popular physics simulation engine on the planet, is going open source.

We’re doing this because physics simulation — long key to immersive games and entertainment — turns out to be more important than we ever thought.

Physics simulation dovetails with AI, robotics, computer vision, self-driving vehicles and high-performance computing.

It’s foundational for so many different things that we’ve decided to provide it to the world as open source.

Meanwhile, we’re building on more than a decade of continuous investment in this area to simulate the world with ever greater fidelity, with ongoing research and development to meet the needs of those working in robotics and with autonomous vehicles.

Free, Open-Source, GPU-Accelerated

PhysX will now be the only free, open-source physics solution that takes advantage of GPU acceleration and can handle large virtual environments.

It will be available as open source starting Monday, Dec. 3, under the simple BSD-3 license.

PhysX solves some serious challenges.

  • In AI, researchers need synthetic data — artificial representations of the real world — to train data-hungry neural networks.
  • In robotics, researchers need to train robotic minds in environments that work like the real one.
  • For self-driving cars, PhysX allows vehicles to drive for millions of miles in simulators that duplicate real-world conditions.
  • In game development, canned animation doesn’t look organic and is time consuming to produce at a polished level.
  • In high-performance computing, physics simulations are being done on ever more powerful machines with ever greater levels of fidelity.

The list goes on.

PhysX SDK addresses these challenges with scalable, stable and accurate simulations. It’s widely compatible, and it’s now open source.

NVIDIA PhysX scales to large numbers of interacting bodies.

PhysX SDK is a scalable multi-platform game physics solution supporting a wide range of devices, from smartphones to high-end multicore CPUs and GPUs.

It’s already integrated into some of the most popular game engines, including Unreal Engine (versions 3 and 4) and Unity3D.

You can also find the full source code on GitHub. Dig in.

Getting Answers Faster: NVIDIA and Open-Source Ecosystem Come Together to Accelerate Data Science

No matter the industry, data science has become a universal toolkit for businesses. Data analytics and machine learning give organizations insights and answers that shape their day-to-day actions and future plans. Being data-driven has become essential to lead any industry.

While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. For this reason, scientific computing and deep learning have turned to NVIDIA GPU acceleration. Data analytics and machine learning haven’t yet tapped into the GPU as systematically. That’s changing.

RAPIDS, launched today at GTC Europe, gives data scientists for the first time a robust platform for GPU-accelerated data science: analytics, machine learning and, soon, data visualization. And what’s more, the libraries are open-source, built with the support of open-source contributors and available immediately at www.RAPIDS.ai.

Initial benchmarks show game-changing 50x speedups with RAPIDS running on the NVIDIA DGX-2 AI supercomputer, compared with CPU-only systems, reducing experiment iteration from hours to minutes.
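To see what a 50x speedup means for iteration time, here is a small arithmetic sketch. The CPU baseline durations are hypothetical values chosen for illustration, not measured benchmark results, and `accelerated_minutes` is our own helper name.

```python
def accelerated_minutes(baseline_hours: float, speedup: float) -> float:
    """Convert a CPU-only runtime in hours to minutes after a given speedup."""
    return baseline_hours * 60 / speedup


# Hypothetical CPU-only experiment durations, in hours.
for hours in (2, 5, 10):
    minutes = accelerated_minutes(hours, 50)
    print(f"{hours} h on CPU -> {minutes:.1f} min at 50x")
```

With these example inputs, a five-hour experiment would drop to six minutes, which is the "hours to minutes" shift described above.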

By the Community, for the Community

With a suite of CUDA-integrated software tools, RAPIDS gives developers new plumbing under the foundation of their data science workflows.

To make this happen, NVIDIA engineers and open-source Python contributors collaborated for two years. Building on key open-source projects including Apache Arrow, Pandas and scikit-learn, RAPIDS connects the data science ecosystem by bringing together popular capabilities from multiple libraries and adding the power of GPU acceleration.

RAPIDS will also integrate with Apache Spark, the leading open-source data science framework for data centers, used by more than 1,000 organizations.

A data science workshop following the GTC Europe keynote will feature a panel with luminaries of the open-source community — Travis Oliphant and Peter Wang, co-founders of Anaconda, as well as Wes McKinney, founder and creator of Apache Arrow and the Pandas software library, and a contributor to RAPIDS.

These pioneers will discuss the potential for RAPIDS for GPU-accelerated data science before an audience of developers, researchers and business leaders. At the workshop, Databricks, a company founded by the creators of Spark, will present on unifying data management and machine learning tools using GPUs.

It was a natural step for NVIDIA, as the creator of CUDA, to develop the first complete solution that integrates Python data science libraries with CUDA at the kernel level. By keeping it open source, we welcome further growth and contributions from other developers in the ecosystem.

This community is vast: the core data science libraries are downloaded tens of millions of times a year via the Conda package manager. Open-source development makes it easier for data scientists to rapidly adopt RAPIDS and maintain the flexibility to modify and customize tools for their applications.

NVIDIA in recent years has made diverse contributions to the AI open-source community with libraries like the Material Definition Language SDK, the NCCL software module for communication between GPUs and the NVIDIA DIGITS deep learning application.

There are 120 repositories on our GitHub page, including research algorithms, the CUTLASS library for matrix multiplication in CUDA and NVcaffe, our fork of the Caffe deep learning framework. And we’ll continue to contribute to RAPIDS alongside the open-source community, supporting data scientists as they conduct efficient, granular analysis.

Delivering Rapid Answers to Data Science Questions

Data scientists, and the insights they extract, are in high demand. But when relying on CPU systems, there’s always been a limit on how fast they can crunch data.

Depending on the size of their datasets, scientists may have a long wait for results from their machine learning models. And some may aggregate or simplify their data, sacrificing granularity for faster results.

With the adoption of RAPIDS and GPUs, data scientists can ramp up iteration and testing, providing more accurate predictions that improve business outcomes. Typical training times can shrink from days to hours, or from hours to minutes.

In the retail industry, it could allow grocery chains to estimate the optimal amount of fresh fruit to stock in each store location. For banks, GPU-accelerated insights could alert lenders about which homeowners are at risk of defaulting on a mortgage.

Access to the RAPIDS open-source suite of libraries is immediately available at www.RAPIDS.ai, where the code is being released under the Apache license. Containerized versions of RAPIDS will be available this week on the NVIDIA GPU Cloud container registry.

For more updates about RAPIDS, follow @rapidsai.

Attention, Developers: We’re Turning Up the Power of NVIDIA SDKs for Turing

Developers can get off to a running start with Turing, our new GPU architecture, using our latest software tools.

Unveiled last month, Turing is one of the biggest leaps in computer graphics in 20 years. As the first Turing-based GPUs hit the shelves, we’re delivering software innovations that help developers take advantage of this powerful architecture and boost computing performance.

Here’s a look at new software optimized for Turing and geared to advance the next generation of research, AI services and graphics.

Diving into Deep Learning SDK


CUDA

The CUDA parallel computing platform and programming model allows developers to harness the power of GPUs to accelerate compute-intensive applications. Each new generation of GPU architecture comes with a major CUDA update.

CUDA 10 includes support for Turing GPUs, performance optimized libraries, a new asynchronous task-graph programming model, enhanced CUDA & graphics API interoperability, and new developer tools.
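The idea behind a task-graph programming model can be sketched in plain Python. This is a conceptual illustration only, not the CUDA graph API: the task names are hypothetical, and the standard library's `graphlib` stands in for the runtime's scheduler. The point is that dependencies are declared once, an execution order is prepared up front, and the prepared graph is then launched repeatedly.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "copy_input": set(),
    "kernel_a": {"copy_input"},
    "kernel_b": {"copy_input"},
    "combine": {"kernel_a", "kernel_b"},
    "copy_output": {"combine"},
}

# "Build" the graph once: resolve a valid execution order ahead of time.
order = tuple(TopologicalSorter(dependencies).static_order())


def launch(prepared_order, log):
    """Replay the prepared order; a real task would do work here."""
    for task in prepared_order:
        log.append(task)


log = []
for _ in range(3):  # define once, launch many times
    launch(order, log)

print(order)
```

In this sketch `kernel_a` and `kernel_b` have no dependency on each other, so a real runtime would be free to execute them concurrently. The serial replay here only shows the ordering constraint.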

Learn more and download CUDA 10.


cuDNN

A software library built on the CUDA platform, cuDNN accelerates deep learning training. Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning.

Deep learning frameworks using cuDNN 7.3 can leverage new features and performance of the Turing architecture to deliver faster training performance. cuDNN 7.3 highlights include improved grouped convolution and dilated convolution performance for popular deep learning applications in computer vision and speech.
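For readers unfamiliar with the two convolution variants mentioned here, the following pure-Python sketch shows what they compute. It is a reference illustration under our own naming, not cuDNN code: `dilated_conv1d` spaces the kernel taps apart by a dilation factor, and `grouped_conv_weights` counts the weights of a 1D grouped convolution, where each output channel sees only its group's share of the input channels.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D cross-correlation with a dilated kernel.

    Deep learning "convolution" is usually cross-correlation, as here.
    """
    span = dilation * (len(kernel) - 1)  # distance covered by the kernel taps
    out_len = len(signal) - span
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(out_len)
    ]


def grouped_conv_weights(c_in, c_out, kernel_size, groups=1):
    """Weight count for a 1D convolution split into channel groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return c_out * (c_in // groups) * kernel_size


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)
wide = dilated_conv1d(signal, kernel, dilation=2)
print(dense)  # [-2, -2, -2, -2, -2, -2]
print(wide)   # [-4, -4, -4, -4]

print(grouped_conv_weights(64, 128, 3, groups=1))  # 24576
print(grouped_conv_weights(64, 128, 3, groups=4))  # 6144
```

Dilation widens the receptive field without adding weights, while grouping cuts the weight count by the number of groups. Both change memory access patterns, which is why library-level tuning matters for them.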

Learn more and download cuDNN.


TensorRT

Turing GPUs are loaded with Tensor Cores that accelerate deep learning inference, which is when neural networks are deployed in the field. These cores accelerate applications such as natural language processing, recommender apps and neural machine translation.

With new optimizations and APIs, TensorRT 5 delivers up to 40x faster inference performance over CPUs, helping developers run AI services in real time. In addition to Linux, TensorRT 5 introduces support for new operating systems: Windows and CentOS.

Learn more and download TensorRT.


NCCL

NCCL enables fast communication between GPUs across various network interfaces, allowing deep learning frameworks to deliver efficient multi-node, multi-GPU training at scale. Using NCCL 2.3 and later, developers can leverage new features of Turing architecture and benefit from improved low-latency algorithms.
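A typical collective in multi-GPU training is all-reduce, where every GPU ends up with the sum of all GPUs' gradients. The sketch below simulates one well-known way to do this, a ring all-reduce, using Python lists in place of GPU buffers. It is an illustration of the communication pattern under simplifying assumptions (one value per chunk, as many chunks as ranks) and makes no claim about how NCCL 2.3 implements it internally.

```python
def ring_all_reduce(buffers):
    """Sum equal-length buffers across ranks arranged in a ring.

    Each rank holds n values (one chunk per rank). Every rank sends to its
    right-hand neighbor only, so per-step traffic is one chunk per rank.
    """
    n = len(buffers)
    if any(len(b) != n for b in buffers):
        raise ValueError("this sketch needs one chunk per rank")
    data = [list(b) for b in buffers]  # work on copies

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the full
    # sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot all sends first so the exchange is simultaneous.
        sends = [((r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] += value

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, data[r][(r + 1 - step) % n])
                 for r in range(n)]
        for r, (chunk, value) in enumerate(sends):
            data[(r + 1) % n][chunk] = value

    return data


gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
result = ring_all_reduce(gradients)
print(result)  # [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

The appeal of the ring pattern is that each rank's traffic per step stays constant as ranks are added, which is one reason collective libraries matter for training at scale.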

Learn more and download NCCL.

Kicking Graphics into High Gear


NGX SDK

The NVIDIA NGX technology stack provides pre-trained networks and AI-based features that enhance graphics, accelerate video processing and improve image quality. These features rely on Tensor Cores found in RTX GPUs to maximize their efficiency and performance.

With the new RT Cores and advanced shaders on Turing GPUs, the NGX SDK can boost game performance and enhance digital content creation pipelines. The NGX SDK will be available in the next few weeks.

Learn more and register to be notified.

VRWorks Graphics SDK

Turing enables real-time ray tracing, AI and advanced shading techniques to bring virtual reality experiences to a level of realism far beyond the capabilities of traditional VR rendering.

The new VRWorks Graphics SDK improves performance and quality for VR applications by taking advantage of Turing’s variable rate shading abilities to concentrate rendering power where the eye is focused. It will also accelerate the next generation of ultra-wide field of view headsets with multi-view rendering.

Learn more and download NVIDIA VRWorks.

Giving Developers More Tools

Nsight Suite

The Nsight tool suite equips developers with powerful debugging and profiling tools to optimize performance, analyze bottlenecks and observe system activities.

Nsight Systems helps developers identify bottlenecks across their CPUs and GPUs, providing the insights needed to optimize their software. The new version includes CUDA 10 support and other enhancements.

Nsight Compute is an interactive CUDA API debugging and kernel profiling tool. The current version offers fast data collection of detailed performance metrics and API debugging via a user interface and command line tool.

Nsight Visual Studio Edition is an application development environment that allows developers to build, debug, profile and trace GPU applications. The new version includes graphics debugging with ray-tracing support and enhanced compute debugging and analysis with CUDA 10 support.

Nsight Graphics supports debugging, profiling and exporting frames built with popular graphics APIs. The new version makes GPU Trace publicly available and adds support for Vulkan ray tracing extensions.

Learn more and download NVIDIA Nsight.

Learn more about the SDK updates on our Developer News Center.

NVIDIA Clara Platform to Usher in Next Generation of Medical Instruments

NVIDIA today unveiled the NVIDIA Clara platform, a combination of hardware and software that brings AI to the next generation of medical instruments as a powerful new tool for early detection, diagnosis and treatment of diseases.

At the heart of the platform are NVIDIA Clara AGX — a revolutionary computing architecture based on the NVIDIA Xavier AI computing module and NVIDIA Turing GPUs — and the Clara software development kit for developers to create a wide range of AI-powered applications for processing data from existing systems.

The Clara platform addresses the great challenge of medical instruments: processing the massive sea of data — tens to thousands of gigabytes worth — generated each second so it can be interpreted by doctors and scientists. Achieving this level of supercomputing has traditionally required three computing architectures: FPGAs, CPUs and GPUs.

Clara AGX

Clara AGX simplifies this to a single, GPU-based architecture that delivers the world’s fastest AI inferencing on NVIDIA Tensor Cores; acceleration through CUDA, the world’s most widely adopted accelerated computing platform; and state-of-the-art NVIDIA RTX graphics. Its flexible design enables it to scale from entry-level devices to the most demanding 3D instruments.

Developers interested in Clara AGX can get started today with the NVIDIA Jetson AGX Xavier developer kit.

Clara SDK

Clara also addresses a fundamental disconnect between legacy medical instruments — which typically have a lifespan of over 10 years — and their ability to run modern applications, which benefit from the 1,000x acceleration of GPU computing over the past decade.

It achieves this by enabling the installed base of instruments to connect to the latest NVIDIA GPU servers through its ability to process raw instrument data. The most advanced imaging applications — like iterative reconstruction for CT and X-ray, beamforming for ultrasound and compressed sensing for MRI — can run on 10-year-old instruments.

The Clara SDK provides medical-application developers with a set of GPU-accelerated libraries for computing, graphics and AI; example applications for reconstruction, image processing and rendering; and computational workflows for CT, MRI and ultrasound. These all leverage containers and Kubernetes to virtualize and scale medical instrument applications for any instrument.

Support from Medical Imaging Developers

Medical imaging developers around the world are discovering numerous ways to use AI to automate workflows, make instruments run faster and improve image quality, in addition to assisting doctors in detecting and diagnosing disease. More than 400 AI healthcare startups have launched in the past five years, and the Clara platform will be able to help them harness AI to transform healthcare workloads.

For instance, Subtle Medical, a member of the NVIDIA Inception virtual accelerator program, is working on MRI applications that acquire images in one-quarter the time while requiring just one-tenth the contrast dosage to patients. Subtle Medical developers got their application running in a few hours, with an immediate speedup of 10x for AI inferencing.

“We are using AI to improve workflow for MRI and PET exams,” said Enhao Gong, founder of Subtle Medical. “NVIDIA’s Clara platform will enable us to seamlessly scale our technology to reduce risks from contrast and radiation, taking imaging efficiency and safety to the next level.”

ImFusion, also an Inception member, can create 3D ultrasound from a traditional 2D acquisition, and then visualize the ultrasound fused with CT. ImFusion developers ported their application to Clara in less than two days and take advantage of Clara’s inferencing, cinematic rendering engine and virtualization capability.

“We specialize in accelerated medical image computing and guided surgery,” said Wolfgang Wein, founder and CEO of ImFusion. “NVIDIA’s Clara platform gives us the ability to turn 2D medical images into 3D and deploy our technology virtually.”

The NVIDIA Clara platform is available now to early access partners, with a targeted beta planned for the second quarter of 2019.

Learn more about the NVIDIA Clara platform.

NVIDIA GPU Cloud Adds Support for Microsoft Azure

Thousands more developers, data scientists and researchers can now jumpstart their GPU computing projects, following today’s announcement that Microsoft Azure is a supported platform with NVIDIA GPU Cloud (NGC).

Ready-to-run containers from NGC with Azure give developers access to on-demand GPU computing that scales to their needs and eliminates the complexity of software integration and testing.

Getting AI and HPC Projects Up and Running Faster

Building and testing reliable software stacks to run popular deep learning software — such as TensorFlow, Microsoft Cognitive Toolkit, PyTorch and NVIDIA TensorRT — is challenging and time consuming. There are dependencies at the operating system level and with drivers, libraries and runtimes. And many packages recommend differing versions of the supporting components.

To make matters worse, the frameworks and applications are updated frequently, meaning this work has to be redone every time a new version rolls out. Ideally, you’d test the new version to ensure it provides the same or better performance as before. And all of this is before you can even get started with a project.

For HPC, the difficulty is how to deploy the latest software to clusters of systems. In addition to finding and installing the correct dependencies, testing and so forth, you have to do this in a multi-tenant environment and across many systems.

NGC removes this complexity by providing pre-configured containers with GPU-accelerated software. Its deep learning containers benefit from NVIDIA’s ongoing R&D investment to make sure the containers take advantage of the latest GPU features. And we test, tune and optimize the complete software stack in the deep learning containers with monthly updates to ensure the best possible performance.

NVIDIA also works closely with the community and framework developers, and contributes back to open source projects. We made more than 800 contributions in 2017 alone. And we work with the developers of the other containers available on NGC to optimize their applications, and we test them for performance and compatibility.

NGC with Microsoft Azure

You can access 35 GPU-accelerated containers for deep learning software, HPC applications, HPC visualization tools and a variety of partner applications from the NGC container registry and run them on Microsoft Azure instance types with NVIDIA GPUs.

The same NGC containers work across Azure instance types, even with different types or quantities of GPUs.

Using NGC containers with Azure is simple.

Just go to the Microsoft Azure Marketplace and find the NVIDIA GPU Cloud Image for Deep Learning and HPC (this is a pre-configured Azure virtual machine image with everything needed to run NGC containers). Launch a compatible NVIDIA GPU instance on Azure. Then, pull the containers you want from the NGC registry into your running instance. (You’ll need to sign up for a free NGC account first.) Detailed information is in the Using NGC with Microsoft Azure documentation.
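The final step, pulling a container into a running instance, comes down to two Docker commands. The sketch below only builds and prints them rather than running them. The registry host `nvcr.io` is the NGC registry; the image name and tag passed in are illustrative assumptions, so check the registry for the containers and versions you actually need.

```python
REGISTRY = "nvcr.io"  # NGC container registry host


def ngc_pull_commands(image: str, tag: str) -> list[list[str]]:
    """Build the docker commands to log in to NGC and pull one container."""
    return [
        ["docker", "login", REGISTRY],  # prompts for your NGC credentials
        ["docker", "pull", f"{REGISTRY}/{image}:{tag}"],
    ]


# Image and tag are example values, not a recommendation.
commands = ngc_pull_commands("nvidia/tensorflow", "18.08-py3")
for cmd in commands:
    print(" ".join(cmd))
```

On a real instance you would run the printed commands in a shell, or pass each list to `subprocess.run`.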

In addition to using NVIDIA published images on Azure Marketplace to run these NGC containers, Azure Batch AI can also be used to download and run these containers from NGC on Azure NCv2, NCv3 and ND virtual machines. Follow these simple GitHub instructions to start with Batch AI with NGC containers.

With NGC support for Azure, we are making it even easier for everyone to start with AI or HPC in the cloud. See how easy it is for yourself.

Sign up now for our upcoming webinar on October 2 at 9am PT to learn more, and get started with NGC today.

Visualize Large-Scale, Unstructured Data in Real Time for Faster Scientific Discoveries

To help develop better pacemakers, researchers at the Barcelona Supercomputing Center recently developed the world’s first comprehensive heart model.

It’s an amazing achievement, mimicking blood flow and muscle reaction based on the heart’s electrical signals. Nearly as daunting: visualizing and analyzing their huge 54 million tetrahedral-cell model.

When running simulations at scale, supercomputers generate petabytes of data. For scientists, visualizing an entire dataset with high fidelity and interactivity is key to gathering insights. But datasets have grown so vast that this has become difficult.

IndeX heart visualization

Tackling Scientific Visualization in the HPC Era

NVIDIA IndeX packs the performance needed to visualize these compute-heavy jobs. It works on large-scale datasets by distributing workloads across multiple nodes in a GPU-accelerated cluster.

With IndeX, there’s no need to throttle back frame rates to visualize and analyze volumetric data. And there’s no need for workarounds like batch rendering that lose interactivity and show data in 2D.

IndeX lets users view their simulation’s entire datasets in real time.

ParaView Users Can Take Advantage of NVIDIA IndeX

Even better, users of ParaView, a popular HPC visualization and data analysis tool, can now take advantage of NVIDIA IndeX through the latest plug-in. ParaView is the “go to” tool for analyzing a wide variety of simulation-based data. It’s supported at all major supercomputing sites.

With IndeX on ParaView, scientists can interact with volume visualizations that scale with structured and unstructured data. This lets them analyze entire datasets in real time.

“High-interactivity visualization of our full dataset is key to gathering meaningful findings,” said Mariano Vazquez, high-performance computational mechanics group manager at the Barcelona Supercomputing Center. “The IndeX plug-in enabled us to visualize our 54 million tetrahedral-cell model in real time. Best of all, it fits right inside our existing ParaView workflow.”

Integrating IndeX as a plug-in inside ParaView allows users to take advantage of the powerful features of IndeX without learning a new tool. In addition, the workflow remains unchanged so users can focus on their research.

NVIDIA IndeX for ParaView Key Features

  • Renders structured and unstructured volume data
  • Depth-correct mixing with ParaView primitives
  • High interactivity for large datasets
  • Time-series visualization
  • Scales across multi-GPU, multi-node clusters
  • Open source plug-in for custom versions of ParaView
  • Supports ParaView data formats

IndeX Plug-in for Workstations and HPC Clusters

There are two versions of the plug-in. For use on a workstation or single server node, the plug-in is available at no cost. For performance at scale in a GPU-accelerated multi-node system, the Cluster edition of the plug-in is available at no cost to academic users and with a license for commercial users.

Get your plug-in now at nvidia.com/index.

Stop by the NVIDIA booth, H-730, at ISC this week and check out our HPC visualization demo showing real-time interactivity on a large, unstructured volume dataset.

How GPUs Can Kick 3D Printing Industry Into High Gear

Three-dimensional printing has opened up new possibilities in fields like manufacturing, architecture, engineering and construction.

But when the objects to be printed become complex, limitations kick in. Challenges in 3D printing include multiple colors, differing densities and the use of a mix of materials.

At last month’s GPU Technology Conference, HP Labs and NVIDIA described how they’ve worked together to overcome these challenges using NVIDIA’s new GVDB Voxels open source software development kit.

Jun Zeng, principal scientist for HP Labs, and Rama Hoetzlein, lead architect for GVDB Voxels, presented a statue of a human figure with wings that combined these challenging elements.

Put simply, their goal was to 3D print the statue while adjusting the density of materials to account for external forces. That increases structural integrity where it’s needed, while minimizing the amount and weight of material required to produce it.

GVDB Voxels printed a 3D statue (L) of a complex image (R) with structural support and minimal material.

Zeng told a roomful of GTC attendees that HP Labs had started using GPUs to more quickly process 3D printing voxels (volumetric pixels — essentially pixels in 3D space). He anticipates that printing technology and scale will rapidly increase computing demands in the future.

NVIDIA’s GVDB Voxels SDK has eased the complexity of 3D printing workflows by offering a platform for large-scale voxel simulation and high-quality ray-traced visualizations. And it allows for continuous data manipulation throughout the process.

“Iteration can happen during infilling, or while analyzing and determining stress,” said Hoetzlein.

Hoetzlein said the SDK is designed for simple, efficient computation, simulation and rendering, even when there’s sparse volumetric data. It includes a compute API that generates high-resolution data with a minimal memory footprint, and a rendering API that supports development of CUDA and NVIDIA OptiX pathways, allowing users to write custom rendering kernels.

The researchers’ effort started with a polygonal statue, which was subjected to a stress simulation before GVDB Voxels took over. The object is converted into a model made of small voxel cubes. Then the software optimizes the in-filling structure, varying the density based on the results of the stress simulation.

They found that combining GVDB Voxels with the latest Pascal architecture GPUs generated results 50 percent faster than the previous generation of GPUs, and up to 10x faster than CPU techniques. The SDK makes this possible by storing data only at the surface of the object. That reduces memory requirements without sacrificing resolution.
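The memory saving from surface-only storage can be illustrated with a toy example. The sketch below is not the GVDB Voxels API: it uses a plain Python set of integer coordinates as a stand-in for a sparse voxel structure, and a sphere of radius 16 voxels as a hypothetical object. It compares a dense grid, the solid interior, and the surface shell alone.

```python
def voxelize_sphere(radius: int):
    """Return the set of integer voxel coordinates inside a sphere."""
    r2 = radius * radius
    axis = range(-radius, radius + 1)
    return {
        (x, y, z)
        for x in axis for y in axis for z in axis
        if x * x + y * y + z * z <= r2
    }


# Offsets to the six face-adjacent neighbors of a voxel.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def surface_voxels(solid):
    """Keep only voxels with at least one empty face neighbor."""
    return {
        (x, y, z)
        for (x, y, z) in solid
        if any((x + dx, y + dy, z + dz) not in solid for dx, dy, dz in NEIGHBORS)
    }


RADIUS = 16  # hypothetical object size, in voxels
solid = voxelize_sphere(RADIUS)
shell = surface_voxels(solid)
dense_cells = (2 * RADIUS + 1) ** 3  # bounding-box grid a dense array would need

print(f"dense grid: {dense_cells} cells")
print(f"solid:      {len(solid)} voxels")
print(f"surface:    {len(shell)} voxels")
```

Because the shell grows with the square of the radius while the volume grows with its cube, the gap widens as resolution increases, which is the effect the surface-only approach exploits.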

Zeng said that oftentimes the limitations of 3D printing devices dictate what designers can do. With the NVIDIA GVDB Voxels SDK, designers gain new flexibility.

More information is available at http://developer.nvidia.com/gvdb.

NVIDIA Delivers New Deep Learning Software Tools for Developers

To help developers meet the growing complexity of deep learning, NVIDIA today announced better and faster tools for our software development community. Among them is a significant update to the NVIDIA SDK, which includes software libraries and tools for developers building AI-powered applications.

With each new generation of GPU architecture, we’ve continually improved the NVIDIA SDK. In keeping with that heritage, our software is Volta-ready.

Guided by developers’ requests, we’ve built tools, libraries and enhancements to the CUDA programming model to help developers accelerate and build the next generation of AI and HPC applications.

Chart of GPU ecosystem growth: the level of interest in GPU computing has exploded, fueled by advancements in AI.

The latest SDK updates introduce new capabilities and performance optimizations for GPU-accelerated applications:

  • New CUDA 9 speeds up HPC and deep learning applications with support for Volta GPUs, up to 5x faster performance for libraries, a new programming model for thread management, and updates to debugging and profiling tools.
  • Developers of end-user applications such as AI-powered web services and embedded edge devices benefit from 3.5x faster deep learning inference with the new TensorRT 3. With built-in support for optimizing both Caffe and TensorFlow models, developers can take trained neural networks to production faster than ever.
  • Engineers and data scientists can benefit from 2.5x faster deep learning training using Volta optimizations for frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow.

Here’s a detailed look at each of the software updates and the benefits they bring to developers and end users:


CUDA

CUDA is the fastest software development platform for creating GPU-accelerated applications. Every new generation of GPU is accompanied by a major update of CUDA, and version 9 includes support for Volta GPUs, major updates to libraries, a new programming model, and updates to debugging and profiling tools.

Learn more about CUDA 9.

NVIDIA Deep Learning SDK

With the updated Deep Learning SDK optimized for Volta, developers have access to the libraries and tools that ensure seamless development and deployment of deep neural networks on all NVIDIA platforms, from the cloud or data center to the desktop to embedded edge devices. Deep learning frameworks using the latest updates deliver up to 2.5x faster training of CNNs, 3x faster training of RNNs and 3.5x faster inference on Volta GPUs compared to Pascal GPUs.

We’ve also worked with our partners and the communities so the Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow deep learning frameworks will be updated to take advantage of the latest Deep Learning SDK and Volta.

This update brings performance improvements and new features to:


cuDNN

NVIDIA cuDNN provides high-performance building blocks for deep learning and is used by all the leading deep learning frameworks.

cuDNN 7 delivers 2.5x faster training of Microsoft’s ResNet50 neural network on the Volta-optimized Caffe2 deep learning framework. Apache MXNet delivers 3x faster training of OpenNMT language translation LSTM RNNs.

The cuDNN 7 release will be available in July as a free download for members of the NVIDIA Developer Program. Learn more at the cuDNN website.


NCCL

Deep learning frameworks rely on NCCL to deliver multi-GPU scaling of deep learning workloads. NCCL 2 introduces multi-node scaling of deep learning training on up to eight GPU-accelerated servers. With the time required to train a neural network reduced from days to hours, developers can iterate and develop their products faster.
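
As a rough picture of what a multi-GPU collective does, the sketch below simulates a ring all-reduce in plain Python, summing one gradient vector per worker so that every worker ends up with the same total. It is a single-process toy under stated assumptions, not NCCL's API or its actual implementation. The function name `ring_allreduce` and the sample data are hypothetical.

```python
def ring_allreduce(worker_data):
    """Sum equal-length vectors across workers with a simulated ring all-reduce.

    worker_data: one list of numbers per worker; the length must be divisible
    by the number of workers so each vector splits into one chunk per worker.
    Returns a new list of lists in which every worker holds the element-wise sum.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    if any(len(v) != length for v in worker_data) or length % n != 0:
        raise ValueError("vectors must have equal length divisible by worker count")
    size = length // n
    data = [list(v) for v in worker_data]  # copy so the inputs stay untouched

    def chunk(i):
        return slice((i % n) * size, (i % n + 1) * size)

    # Phase 1, scatter-reduce: after n - 1 steps, worker w holds the complete
    # sum for chunk (w + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all sends in a step are simultaneous.
        sends = [(w, w - step, data[w][chunk(w - step)]) for w in range(n)]
        for w, c, payload in sends:
            dst = (w + 1) % n
            sl = chunk(c)
            data[dst][sl] = [a + b for a, b in zip(data[dst][sl], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring.
    for step in range(n - 1):
        sends = [(w, w + 1 - step, data[w][chunk(w + 1 - step)]) for w in range(n)]
        for w, c, payload in sends:
            data[(w + 1) % n][chunk(c)] = payload

    return data


# Hypothetical gradients from four workers, eight values each.
grads = [[float(w * 10 + k) for k in range(8)] for w in range(4)]
reduced = ring_allreduce(grads)
```

Each worker only ever talks to its neighbor in the ring and sends one chunk per step, which is why this pattern is a common choice when scaling training across many GPUs.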

Developers of HPC applications and deep learning frameworks will have access to NCCL 2 in July. It will be available as a free download for members of the NVIDIA Developer Program. Learn more at the NCCL website.


TensorRT

Delivering AI services in real time poses stringent latency requirements for deep learning inference. With NVIDIA TensorRT 3, developers can now deliver 3.5x faster inference performance, with under 7 ms real-time latency.

Developers can optimize models trained in TensorFlow or Caffe deep learning frameworks and deploy fast AI services to platforms running Linux, Microsoft Windows, BlackBerry QNX or Android operating systems.

TensorRT 3 will be available in July as a free download for members of the NVIDIA Developer Program. Learn more at the TensorRT website.


DIGITS

DIGITS introduces support for the TensorFlow deep learning framework. Engineers and data scientists can improve productivity by designing TensorFlow models within DIGITS and using its interactive workflow to manage datasets and training and to monitor model accuracy in real time. To decrease training time and improve accuracy, the update also provides three new pre-trained models in the DIGITS Model Store: Oxford’s VGG-16 and Microsoft’s ResNet50 for image classification tasks, and NVIDIA DetectNet for object detection tasks.

The DIGITS update with TensorFlow support and the new models will be available for the desktop and the cloud in July as a free download for members of the NVIDIA Developer Program. Learn more at the DIGITS website.

Deep Learning Frameworks

The NVIDIA Deep Learning SDK accelerates widely used deep learning frameworks such as Caffe, Microsoft Cognitive Toolkit, TensorFlow, Theano and Torch as well as many other deep learning applications. NVIDIA is working closely with the maintainers of leading deep learning frameworks at Amazon, Facebook, Google, Microsoft, University of Oxford and others to integrate the latest NVIDIA Deep Learning SDK libraries and immediately take advantage of the power of Volta.


Caffe2

The Caffe2 team announced on its blog an update to the framework that brings 16-bit floating point (FP16) training to Volta, developed in collaboration with NVIDIA:

“We are working closely with NVIDIA on Caffe2 to utilize the features in NVIDIA’s upcoming Tesla V100, based on the next-generation Volta architecture. Caffe2 is excited to be one of the first frameworks that is designed from the ground up to take full advantage of Volta by integrating the latest NVIDIA Deep Learning SDK libraries — NCCL and cuDNN.”
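
For readers unfamiliar with FP16, the short sketch below shows what rounding to 16-bit floating point does to ordinary values. It uses only Python's standard `struct` module and says nothing about how Caffe2 or Volta implement FP16 training. The helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


one = to_fp16(1.0)          # exactly representable
tenth = to_fp16(0.1)        # close to, but not exactly, 0.1
big = to_fp16(2049.0)       # above 2048, odd integers can no longer be stored
largest = to_fp16(65504.0)  # the largest finite half-precision value
```

Half precision halves the storage per value compared with 32-bit floats, at the cost of a narrower range and coarser rounding.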


MXNet

Amazon announced how it is working with NVIDIA to bring high-performance deep learning to AWS. As part of the announcement, it described the work we’ve done together on bringing Volta support to MXNet.

“In collaboration with NVIDIA, AWS engineers and researchers have pre-optimized neural machine translation (NMT) algorithms on Apache MXNet allowing developers to train the fastest on Volta-based platforms,” wrote Joseph Spisak, manager of Product Management at Amazon AI.


TensorFlow

Google shared the latest TensorFlow benchmarks on DGX-1 on their developers blog:

“We’d like to thank NVIDIA for sharing a DGX-1 for benchmark testing and for their technical assistance. We’re looking forward to NVIDIA’s upcoming Volta architecture, and to working closely with them to optimize TensorFlow’s performance there, and to expand support for FP16.”

At NVIDIA, we’re also working closely with Microsoft to optimize Microsoft Cognitive Toolkit, and with Facebook AI Research (FAIR) to optimize PyTorch, on Volta.

NVIDIA GPU Cloud Deep Learning Stack

We also announced today NVIDIA GPU Cloud (NGC), a GPU-accelerated cloud platform optimized for deep learning.

NGC is designed for developers of deep learning-powered applications who don’t want to assemble and maintain the latest deep learning software and GPUs. It comes with the NGC Deep Learning Stack, a complete development environment that will run on PC, DGX and the cloud, and is powered by the latest deep learning frameworks, the NVIDIA Deep Learning SDK and CUDA. The stack is fully managed by NVIDIA, so developers and data scientists can start with a single GPU on a PC and scale up to additional compute resources in the cloud.

Updates to NVIDIA VRWorks and DesignWorks

Learn more about the latest updates to some of our other SDKs:

DesignWorks GTC 2017 release

VRWorks Audio and 360 Video SDKs released at GTC

The post NVIDIA Delivers New Deep Learning Software Tools for Developers appeared first on The Official NVIDIA Blog.

NVIDIA and SAP Partner to Create a New Wave of AI Business Applications

Businesses collect mountains of data daily. Now it’s time to make those mountains move.

NVIDIA CEO and founder Jensen Huang announced today at our GPU Technology Conference that SAP and NVIDIA are working together to help businesses use AI in ways that will change the world’s view of business applications.

Together, we’re combining the advantages of NVIDIA’s AI computing platform with SAP’s leadership in enterprise software.

“With strong partners like NVIDIA at our side, the possibilities are limitless,” wrote SAP Chief Innovation Officer Juergen Mueller in a blog post published today. “New applications, unprecedented value in existing applications, and easy access to machine learning services will allow you to make your own enterprise intelligent.”

SAP is leveraging advancements NVIDIA has made from GPUs to systems to software. Our Tesla GPU computing platform represents a $2 billion investment. The NVIDIA DGX-1 — announced just over a year ago and incorporating eight GPUs — is an integrated hardware and software supercomputer that’s the result of work by over a dozen engineering teams.

DGX-1, in turn, brings together an integrated suite of software, including the leading deep learning frameworks optimized with our NVIDIA Deep Learning SDK.

Here are three examples of our collaboration with SAP that we’re demonstrating at GTC:

Measuring the ROI of Brand Impact

Many brands rely on sponsoring televised events, yet it’s very difficult to track the impact of those ads. With the current manual process, it takes the industry up to six weeks to report brand impact return on investment, and an entire quarter to adjust brand marketing expenditures.

SAP Brand Impact, powered by NVIDIA deep learning, measures brand attributes such as logos in near real time and with superhuman accuracy, because AI is not limited by human constraints. This is made possible by deep neural networks trained on NVIDIA DGX-1, with TensorRT used for video inference analysis.

Results are immediate, accurate and auditable. Delivered in a day.

As a long-term SAP customer, Audi got early access to the latest SAP solution powered by NVIDIA deep learning, explains Global Head of Audi Sports Marketing Thomas Glas.

“Audi’s sponsorship team found the SAP Brand Impact solution a very useful tool. It can help Audi to evaluate its sponsorship exposure at high levels of operational excellence and transparency,” Glas said. “We were impressed by the capabilities and results of the first proof-of-concepts based on video footage from Audi FIS Alpine Ski World Cup. We’re strongly considering possibilities to combine SAP Brand Impact with our media analysis workflow for the upcoming Audi Cup and Audi FC Bayern Summer Tour.”

SAP Brand Impact screenshot: capturing brand logo placement in near real time. (Source: SAP)

The Future of Accounts Payable

Talk about a paper trail. A typical large manufacturing company processes 8 million invoices a year. Companies around the world still receive paper invoices that need to be processed manually. These manual processes are costly, time-consuming, repetitive and error-prone.

SAP used deep learning to train its Accounts Payable application, which automates the extraction and classification of relevant information from invoices without human intervention. A recurrent neural network is trained on NVIDIA GPUs to create this customized solution.

Records are processed in under a second. Cash flow is sped up. Errors are reduced.

SAP Accounts Payable: automatically loading accounts payable vendor details. (Source: SAP)

The Ticket to Customer Satisfaction

Eighty-one percent of companies recognize customer experience as a competitive differentiator, so why do just 13 percent rate their customer service at 9/10 or better? Companies struggle to keep up with their customers’ complaints and support issues with limited resources.

Using natural language processing and deep learning techniques on the NVIDIA GPU platform, the SAP Service Ticketing application helps companies analyze unstructured data and create automated rules to categorize and route service tickets to the right person.

The result: a faster response and an improved customer experience.
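
The categorize-and-route idea can be pictured with a deliberately simple sketch. It swaps the deep learning model for hand-written keyword rules, so it is a stand-in for illustration and not SAP's method. The categories, keywords, team names and the function `route_ticket` are all hypothetical.

```python
import re

# Hypothetical rules: category -> (keywords, team that handles it).
# A trained text classifier would replace this keyword matching.
RULES = {
    "billing": ({"invoice", "refund", "charge", "payment"}, "finance-team"),
    "login": ({"password", "login", "locked", "account"}, "identity-team"),
}


def route_ticket(text, default_team="general-support"):
    """Return (category, team) for a free-text service ticket."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_category, best_team, best_hits = "other", default_team, 0
    for category, (keywords, team) in RULES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_category, best_team, best_hits = category, team, hits
    return best_category, best_team


billing = route_ticket("I was billed twice. Please refund my last invoice!")
locked = route_ticket("My account is locked and the password reset fails.")
unknown = route_ticket("Hello, I have a general question.")
```

Tickets that match no rule fall through to a default queue, so nothing goes unrouted.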

SAP Service Ticketing: automatically tagging service tickets with the right category. (Source: SAP)

See More at GTC, SAPPHIRE and on the Web

To learn more about the SAP demos at GTC, join us at booth 118. We’ll also be at SAP SAPPHIRE, in Orlando next week, to showcase five more applications.

If you can’t make it to either show, join us for our live webinar on how we’re bringing AI to the enterprise, on June 14 at 9 am Pacific.

The post NVIDIA and SAP Partner to Create a New Wave of AI Business Applications appeared first on The Official NVIDIA Blog.