Backed by Google, Intel, Baidu, NVIDIA and dozens more technology leaders, the new MLPerf benchmark suite measures a wide range of deep learning workloads. Aiming to serve as the industry’s first objective AI benchmark suite, it covers such areas as computer vision, language translation, personalized recommendations and reinforcement learning tasks.
NVIDIA achieved the best performance in each of the six MLPerf benchmark categories it entered. These cover a variety of workloads and infrastructure scales, ranging from 16 GPUs on a single node to 640 GPUs across 80 nodes.
The six categories include image classification, object instance segmentation, object detection, non-recurrent translation, recurrent translation and recommendation systems. NVIDIA did not submit results for the seventh category, reinforcement learning, which does not yet take advantage of GPU acceleration.
A key benchmark on which NVIDIA technology performed particularly well was language translation, training the Transformer neural network in just 6.2 minutes. More details on all six submissions are available on the NVIDIA Developer news center.
NVIDIA is the only company to have entered as many as six benchmarks, demonstrating the versatility of V100 Tensor Core GPUs for the wide variety of AI workloads deployed today.
“The new MLPerf benchmarks demonstrate the unmatched performance and versatility of NVIDIA’s Tensor Core GPUs,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Exceptionally affordable and available in every geography from every cloud service provider and every computer maker, our Tensor Core GPUs are helping developers around the world advance AI at every stage of development.”
State-of-the-Art AI Computing Requires Full Stack Innovation
Performance on complex and diverse computing workloads takes more than great chips. Accelerated computing is about more than an accelerator. It takes the full stack.
NVIDIA’s AI platform is also the most accessible and affordable. Tensor Core GPUs are available on every cloud, from every computer maker, in every geography.
The same power of Tensor Core GPUs is also available on the desktop, with the most powerful desktop GPU, NVIDIA TITAN RTX, costing just $2,500. Amortized over three years of continuous use, that works out to less than a dime per hour.
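The amortization is easy to check; this back-of-envelope sketch assumes round-the-clock use over the full three years:

```python
# Back-of-envelope amortization of a $2,500 desktop GPU over three years
# of continuous (24/7) use.
PRICE_USD = 2500
YEARS = 3
HOURS = YEARS * 365 * 24  # 26,280 hours

cost_per_hour = PRICE_USD / HOURS
print(f"${cost_per_hour:.3f} per hour")  # just under ten cents
```

Lighter duty cycles raise the effective cost per hour of use proportionally.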
NVIDIA’s Record-Setting Platform Available Now on NGC
The software innovations and optimizations used to achieve NVIDIA’s industry-leading MLPerf performance are available free of charge in our latest NGC deep learning containers. Download them from the NGC container registry.
The containers include the complete software stack and the top AI frameworks, optimized by NVIDIA. Our 18.11 release of the NGC deep learning containers includes the exact software used to achieve our MLPerf results.
Developers can use them everywhere, at every stage of development:
For enterprises, the containers accelerate the application of AI to their data in the cloud with NVIDIA GPU-accelerated instances from Alibaba Cloud, AWS, Baidu Cloud, Google Cloud Platform, IBM Cloud, Microsoft Azure, Oracle Cloud Infrastructure and Tencent Cloud.
For organizations building on-premise AI infrastructure, NVIDIA DGX systems and NGC-Ready systems from Atos, Cisco, Cray, Dell EMC, HP, HPE, Inspur, Lenovo, Sugon and Supermicro put AI to work.
To get started on your AI project, or to run your own MLPerf benchmark, download containers from the NGC container registry.
NVIDIA PhysX, the most popular physics simulation engine on the planet, is going open source.
We’re doing this because physics simulation — long key to immersive games and entertainment — turns out to be more important than we ever thought.
Physics simulation dovetails with AI, robotics and computer vision, self-driving vehicles, and high-performance computing.
It’s foundational for so many different things that we’ve decided to provide it to the world as open source.
Meanwhile, we’re building on more than a decade of continuous investment in this area to simulate the world with ever greater fidelity, with on-going research and development to meet the needs of those working in robotics and with autonomous vehicles.
Free, Open-Source, GPU-Accelerated
PhysX will now be the only free, open-source physics solution that takes advantage of GPU acceleration and can handle large virtual environments.
It will be available as open source starting Monday, Dec. 3, under the simple BSD-3 license.
PhysX solves some serious challenges.
In AI, researchers need synthetic data — artificial representations of the real world — to train data-hungry neural networks.
In robotics, researchers need to train robotic minds in environments that work like the real one.
For self-driving cars, PhysX allows vehicles to drive for millions of miles in simulators that duplicate real-world conditions.
In game development, canned animation doesn’t look organic and is time consuming to produce at a polished level.
In high-performance computing, physics simulations are being done on ever more powerful machines with ever greater levels of fidelity.
The list goes on.
PhysX SDK addresses these challenges with scalable, stable and accurate simulations. It’s widely compatible, and it’s now open source.
PhysX SDK is a scalable multi-platform game physics solution supporting a wide range of devices, from smartphones to high-end multicore CPUs and GPUs.
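At its core, what such an engine does each frame is advance body states through time. The sketch below is plain Python, not the PhysX API; it shows the semi-implicit Euler update common to most real-time physics engines:

```python
# Minimal semi-implicit Euler integration for one falling body.
# Illustrative only -- not the PhysX API.
GRAVITY = -9.81   # m/s^2
DT = 1.0 / 60.0   # 60 Hz simulation step

def step(position, velocity, dt=DT):
    """Advance one timestep: update velocity first, then position."""
    velocity += GRAVITY * dt
    position += velocity * dt
    return position, velocity

# Drop a body from 10 m with zero initial velocity, simulate one second.
pos, vel = 10.0, 0.0
for _ in range(60):
    pos, vel = step(pos, vel)
print(f"after 1 s: position {pos:.2f} m, velocity {vel:.2f} m/s")
```

Updating velocity before position keeps the integrator better behaved at game framerates than the fully explicit variant, one reason this form is a common default.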
It’s already integrated into some of the most popular game engines, including Unreal Engine (versions 3 and 4) and Unity3D.
No matter the industry, data science has become a universal toolkit for businesses. Data analytics and machine learning give organizations insights and answers that shape their day-to-day actions and future plans. Being data-driven has become essential to lead any industry.
While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. For this reason, scientific computing and deep learning have turned to NVIDIA GPU acceleration. Data analytics and machine learning haven’t yet tapped into the GPU as systematically. That’s changing.
RAPIDS, launched today at GTC Europe, gives data scientists, for the first time, a robust platform for GPU-accelerated data science: analytics, machine learning and, soon, data visualization. What’s more, the libraries are open source, built with the support of open-source contributors and available immediately at www.RAPIDS.ai.
Initial benchmarks show game-changing 50x speedups with RAPIDS running on the NVIDIA DGX-2 AI supercomputer, compared with CPU-only systems, reducing experiment iteration from hours to minutes.
By the Community, for the Community
With a suite of CUDA-integrated software tools, RAPIDS gives developers new plumbing under the foundation of their data science workflows.
To make this happen, NVIDIA engineers and open-source Python contributors collaborated for two years. Building on key open-source projects including Apache Arrow, Pandas and scikit-learn, RAPIDS connects the data science ecosystem by bringing together popular capabilities from multiple libraries and adding the power of GPU acceleration.
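The columnar layout Arrow standardizes is key to that kind of hand-off between libraries. As a toy illustration in plain Python (not Arrow’s actual implementation), compare a row layout with typed, contiguous column buffers:

```python
from array import array

# Row-oriented: each record is a tuple; scanning one field walks every record.
rows = [(1, 9.99), (2, 4.50), (3, 12.00)]
total_rows = sum(price for _, price in rows)

# Column-oriented (the idea behind Apache Arrow): each field is one
# contiguous, typed buffer that any library -- or the GPU -- can scan
# directly, without per-record unpacking or conversion.
ids = array("q", [1, 2, 3])               # 64-bit integers
prices = array("d", [9.99, 4.50, 12.00])  # 64-bit floats
total_cols = sum(prices)

assert total_rows == total_cols
```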
RAPIDS will also integrate with Apache Spark, the leading open-source data science framework for data centers, used by more than 1,000 organizations.
A data science workshop following the GTC Europe keynote will feature a panel with luminaries of the open-source community — Travis Oliphant and Peter Wang, co-founders of Anaconda, as well as Wes McKinney, founder and creator of Apache Arrow and the Pandas software library, and a contributor to RAPIDS.
These pioneers will discuss the potential for RAPIDS for GPU-accelerated data science before an audience of developers, researchers and business leaders. At the workshop, Databricks, a company founded by the creators of Spark, will present on unifying data management and machine learning tools using GPUs.
It was a natural step for NVIDIA, as the creator of CUDA, to develop the first complete solution that integrates Python data science libraries with CUDA at the kernel level. By keeping it open source, we welcome further growth and contributions from other developers in the ecosystem.
This community is vast: the core data science libraries are downloaded tens of millions of times each year via the Conda package manager. Open-source development makes it easier for data scientists to rapidly adopt RAPIDS and maintain the flexibility to modify and customize tools for their applications.
There are 120 repositories on our GitHub page, including research algorithms, the CUTLASS library for matrix multiplication in CUDA and NVcaffe, our fork of the Caffe deep learning framework. And we’ll continue to contribute to RAPIDS alongside the open-source community, supporting data scientists as they conduct efficient, granular analysis.
Delivering Rapid Answers to Data Science Questions
Data scientists, and the insights they extract, are in high demand. But when relying on CPU systems, there’s always been a limit on how fast they can crunch data.
Depending on the size of their datasets, scientists may have a long wait for results from their machine learning models. And some may aggregate or simplify their data, sacrificing granularity for faster results.
With the adoption of RAPIDS and GPUs, data scientists can ramp up iteration and testing, providing more accurate predictions that improve business outcomes. Typical training times can shrink from days to hours, or from hours to minutes.
In the retail industry, it could allow grocery chains to estimate the optimal amount of fresh fruit to stock in each store location. For banks, GPU-accelerated insights could alert lenders about which homeowners are at risk of defaulting on a mortgage.
Access to the RAPIDS open-source suite of libraries is immediately available at www.RAPIDS.ai, where the code is being released under the Apache license. Containerized versions of RAPIDS will be available this week on the NVIDIA GPU Cloud container registry.
Unveiled last month, Turing is one of the biggest leaps in computer graphics in 20 years. As the first Turing-based GPUs hit the shelves, we’re delivering software innovations that help developers take advantage of this powerful architecture and boost computing performance.
Here’s a look at new software optimized for Turing and geared to advance the next generation of research, AI services and graphics.
A software library built on the CUDA platform, cuDNN accelerates deep learning training. Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning.
Deep learning frameworks using cuDNN 7.3 can leverage new features and performance of the Turing architecture to deliver faster training performance. cuDNN 7.3 highlights include improved grouped convolution and dilated convolution performance for popular deep learning applications in computer vision and speech.
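For readers unfamiliar with the term, a dilated convolution simply spaces the kernel taps apart, widening the receptive field without adding parameters. A minimal pure-Python 1-D sketch, illustrative only (not cuDNN’s implementation):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated convolution with valid padding (illustrative only).

    A dilation of d spaces the kernel taps d elements apart, so the same
    three-tap kernel covers a wider stretch of the input.
    """
    span = (len(kernel) - 1) * dilation  # total reach of the kernel taps
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

x = [1, 2, 3, 4, 5, 6]
k = [1, 0, -1]
print(dilated_conv1d(x, k, dilation=1))  # [-2, -2, -2, -2]
print(dilated_conv1d(x, k, dilation=2))  # [-4, -4]
```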
Turing GPUs are loaded with Tensor Cores that accelerate deep learning inference, which is when neural networks are deployed in the field. These cores accelerate applications such as natural language processing, recommender apps and neural machine translation.
With new optimizations and APIs, TensorRT 5 delivers up to 40x faster inference performance over CPUs, helping developers run AI services in real time. In addition to Linux, TensorRT 5 introduces support for new operating systems: Windows and CentOS.
NCCL enables fast communication between GPUs across various network interfaces, allowing deep learning frameworks to deliver efficient multi-node, multi-GPU training at scale. Using NCCL 2.3 and later, developers can leverage new features of Turing architecture and benefit from improved low latency algorithms.
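The best-known of these collective algorithms is the ring all-reduce, in which each worker repeatedly exchanges chunks with its neighbors until every worker holds the full sum. The sketch below simulates it in plain Python for three workers (a simplification; real NCCL pipelines these transfers over NVLink and the network):

```python
def ring_allreduce(vectors):
    """All-reduce by summation over a ring of workers.

    Simplified simulation of the ring algorithm NCCL uses for multi-GPU
    training. One chunk per worker: assumes len(vector) == number of workers.
    """
    n = len(vectors)
    bufs = [list(v) for v in vectors]

    # Phase 1 -- reduce-scatter: each step, worker i sends one chunk to
    # worker (i+1) % n, which accumulates it. After n-1 steps, worker i
    # holds the fully reduced chunk (i+1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, bufs[i][(i - step) % n]) for i in range(n)]
        for i, c, val in sends:
            bufs[(i + 1) % n][c] += val

    # Phase 2 -- all-gather: circulate the reduced chunks around the ring
    # until every worker has every chunk.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, bufs[i][(i + 1 - step) % n]) for i in range(n)]
        for i, c, val in sends:
            bufs[(i + 1) % n][c] = val

    return bufs

# Three "GPUs", each holding a local gradient vector; all end with the sum.
grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(grads))
```

Each worker sends and receives only 2(n-1) chunks regardless of ring size, which is why the pattern scales well across nodes.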
The NVIDIA NGX technology stack provides pre-trained networks and AI-based features that enhance graphics, accelerate video processing and improve image quality. These features rely on Tensor Cores found in RTX GPUs to maximize their efficiency and performance.
With the new RT Cores and advanced shaders on Turing GPUs, the NGX SDK can boost game performance and enhance digital content creation pipelines. The NGX SDK will be available in the next few weeks.
Turing enables real-time ray tracing, AI and advanced shading techniques to bring virtual reality experiences to a level of realism far beyond the capabilities of traditional VR rendering.
The new VRWorks Graphics SDK improves performance and quality for VR applications by taking advantage of Turing’s variable rate shading abilities to concentrate rendering power where the eye is focused. It will also accelerate the next generation of ultra-wide field of view headsets with multi-view rendering.
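Variable rate shading can be pictured as a shading-rate map driven by the gaze point: full-resolution shading at the fovea, coarser blocks toward the periphery. A toy sketch with hypothetical thresholds (not the VRWorks API):

```python
def shading_rate(px, py, gaze_x, gaze_y, inner=200.0, outer=500.0):
    """Toy foveated shading-rate map (hypothetical thresholds, in pixels).

    Returns the block size covered by a single shading sample:
    1 = every pixel, 2 = one sample per 2x2 block, 4 = one per 4x4 block.
    """
    dist = ((px - gaze_x) ** 2 + (py - gaze_y) ** 2) ** 0.5
    if dist < inner:
        return 1
    if dist < outer:
        return 2
    return 4

# Gaze at the center of a 1920x1080 eye buffer.
print(shading_rate(960, 540, 960, 540))  # 1: full rate at the fovea
print(shading_rate(0, 0, 960, 540))      # 4: coarse rate in the far periphery
```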
The Nsight tool suite equips developers with powerful debugging and profiling tools to optimize performance, analyze bottlenecks and observe system activities.
Nsight Systems helps developers identify bottlenecks across their CPUs and GPUs, providing the insights needed to optimize their software. The new version includes CUDA 10 support and other enhancements.
Nsight Compute is an interactive CUDA API debugging and kernel profiling tool. The current version offers fast data collection of detailed performance metrics and API debugging via a user interface and command line tool.
Nsight Visual Studio Edition is an application development environment that allows developers to build, debug, profile and trace GPU applications. The new version includes graphics debugging with ray-tracing support and enhanced compute debugging and analysis with CUDA 10 support.
Nsight Graphics supports debugging, profiling and exporting frames built with popular graphics APIs. The new version makes GPU Trace publicly available and adds support for Vulkan ray tracing extensions.
NVIDIA today unveiled the NVIDIA Clara platform, a combination of hardware and software that brings AI to the next generation of medical instruments as a powerful new tool for early detection, diagnosis and treatment of diseases.
At the heart of the platform are NVIDIA Clara AGX — a revolutionary computing architecture based on the NVIDIA Xavier AI computing module and NVIDIA Turing GPUs — and the Clara software development kit for developers to create a wide range of AI-powered applications for processing data from existing systems.
The Clara platform addresses the great challenge of medical instruments: processing the massive sea of data — tens to thousands of gigabytes worth — generated each second so it can be interpreted by doctors and scientists. Achieving this level of supercomputing has traditionally required three computing architectures: FPGAs, CPUs and GPUs.
Clara AGX simplifies this to a single, GPU-based architecture that delivers the world’s fastest AI inferencing on NVIDIA Tensor Cores; acceleration through CUDA, the world’s most widely adopted accelerated computing platform; and state-of-the-art NVIDIA RTX graphics. Its flexible design enables it to scale from entry-level devices to the most demanding 3D instruments.
Clara also addresses a fundamental disconnect between legacy medical instruments — which typically have a lifespan of over 10 years — and their ability to run modern applications, which benefit from the 1,000x acceleration of GPU computing over the past decade.
It achieves this by enabling the installed base of instruments to connect to the latest NVIDIA GPU servers through its ability to process raw instrument data. The most advanced imaging applications — like iterative reconstruction for CT and X-ray, beamforming for ultrasound and compressed sensing for MRI — can run on 10-year-old instruments.
The Clara SDK provides medical-application developers with a set of GPU-accelerated libraries for computing, graphics and AI; example applications for reconstruction, image processing and rendering; and computational workflows for CT, MRI and ultrasound. These all leverage containers and Kubernetes to virtualize and scale medical instrument applications for any instrument.
Support from Medical Imaging Developers
Medical imaging developers around the world are discovering numerous ways to use AI to automate workflows, make instruments run faster and improve image quality, in addition to assisting doctors in detecting and diagnosing disease. More than 400 AI healthcare startups have launched in the past five years, and the Clara platform will be able to help them harness AI to transform healthcare workloads.
For instance, Subtle Medical, a member of the NVIDIA Inception virtual accelerator program, is working on MRI applications that acquire images in one-quarter the time while requiring just one-tenth the contrast dosage to patients. Subtle Medical developers got their application running in a few hours, with an immediate speedup of 10x for AI inferencing.
“We are using AI to improve workflow for MRI and PET exams,” said Enhao Gong, founder of Subtle Medical. “NVIDIA’s Clara platform will enable us to seamlessly scale our technology to reduce risks from contrast and radiation, taking imaging efficiency and safety to the next level.”
ImFusion, also an Inception member, can create 3D ultrasound from a traditional 2D acquisition, and then visualize the ultrasound fused with CT. ImFusion developers ported their application to Clara in less than two days and take advantage of Clara’s inferencing, cinematic rendering engine and virtualization capability.
“We specialize in accelerated medical image computing and guided surgery,” said Wolfgang Wein, founder and CEO of ImFusion. “NVIDIA’s Clara platform gives us the ability to turn 2D medical images into 3D and deploy our technology virtually.”
The NVIDIA Clara platform is available now to early access partners, with a targeted beta planned for the second quarter of 2019.
Thousands more developers, data scientists and researchers can now jumpstart their GPU computing projects, following today’s announcement that Microsoft Azure is a supported platform with NVIDIA GPU Cloud (NGC).
Ready-to-run containers from NGC with Azure give developers access to on-demand GPU computing that scales to their needs, while eliminating the complexity of software integration and testing.
Getting AI and HPC Projects Up and Running Faster
Building and testing reliable software stacks to run popular deep learning software — such as TensorFlow, Microsoft Cognitive Toolkit, PyTorch and NVIDIA TensorRT — is challenging and time consuming. There are dependencies at the operating system level and with drivers, libraries and runtimes. And many packages recommend differing versions of the supporting components.
To make matters worse, the frameworks and applications are updated frequently, meaning this work has to be redone every time a new version rolls out. Ideally, you’d test the new version to ensure it provides the same or better performance as before. And all of this is before you can even get started with a project.
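The conflict is easy to picture. In this toy check, two hypothetical frameworks (the names and version pins are made up) require different versions of a shared component, exactly the situation a pre-integrated container sidesteps:

```python
# Toy illustration of the dependency problem containers solve.
# Package names and version pins here are hypothetical.
required = {
    "framework-a": {"cuda-runtime": "9.0", "dnn-lib": "7.0"},
    "framework-b": {"cuda-runtime": "9.0", "dnn-lib": "7.1"},
}

def find_conflicts(required):
    """Return components that two frameworks pin to different versions."""
    pins = {}       # component -> (first framework seen, its pinned version)
    conflicts = []
    for framework, deps in sorted(required.items()):
        for component, version in deps.items():
            if component in pins and pins[component][1] != version:
                conflicts.append((component, pins[component], (framework, version)))
            else:
                pins.setdefault(component, (framework, version))
    return conflicts

for component, first, second in find_conflicts(required):
    print(f"{component}: {first[0]} wants {first[1]}, {second[0]} wants {second[1]}")
```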
For HPC, the difficulty is how to deploy the latest software to clusters of systems. In addition to finding and installing the correct dependencies, testing and so forth, you have to do this in a multi-tenant environment and across many systems.
NGC removes this complexity by providing pre-configured containers with GPU-accelerated software. Its deep learning containers benefit from NVIDIA’s ongoing R&D investment to make sure the containers take advantage of the latest GPU features. And we test, tune and optimize the complete software stack in the deep learning containers with monthly updates to ensure the best possible performance.
NVIDIA also works closely with the community and framework developers, and contributes back to open source projects. We made more than 800 contributions in 2017 alone. And we work with the developers of the other containers available on NGC to optimize their applications, and we test them for performance and compatibility.
NGC with Microsoft Azure
You can access 35 GPU-accelerated containers for deep learning software, HPC applications, HPC visualization tools and a variety of partner applications from the NGC container registry and run them on Microsoft Azure instance types with NVIDIA GPUs, including the NCv2, NCv3 and ND virtual machines.
In addition to using NVIDIA published images on Azure Marketplace to run these NGC containers, you can use Azure Batch AI to download and run them from NGC on Azure NCv2, NCv3 and ND virtual machines. Follow the simple instructions on GitHub to get started with Batch AI and NGC containers.
With NGC support for Azure, we’re making it even easier for everyone to get started with AI or HPC in the cloud. See how easy it is for yourself.
To help develop better pacemakers, researchers at the Barcelona Supercomputing Center recently developed the world’s first comprehensive heart model.
It’s an amazing achievement, mimicking blood flow and muscle reaction based on the heart’s electrical signals. Nearly as daunting: visualizing and analyzing their huge 54 million tetrahedral-cell model.
When running simulations at scale, supercomputers generate petabytes of data. For scientists, visualizing an entire dataset with high fidelity and interactivity is key to gathering insights. But datasets have grown so vast that this has become difficult.
Tackling Scientific Visualization in the HPC Era
NVIDIA IndeX packs the performance needed to visualize these compute-heavy jobs. It works on large-scale datasets by distributing workloads across multiple nodes in a GPU-accelerated cluster.
With IndeX, there’s no need to throttle back frame rates to visualize and analyze volumetric data. And there’s no need for workarounds like batch rendering that lose interactivity and show data in 2D.
IndeX lets users view their simulation’s entire datasets in real time.
ParaView Users Can Take Advantage of NVIDIA IndeX
Even better, users of ParaView, a popular HPC visualization and data analysis tool, can now take advantage of NVIDIA IndeX through the latest plug-in. ParaView is the “go to” tool for analyzing a wide variety of simulation-based data. It’s supported at all major supercomputing sites.
With IndeX on ParaView, scientists can interact with volume visualizations that scale with structured and unstructured data. This lets them analyze entire datasets in real time.
“High-interactivity visualization of our full dataset is key to gathering meaningful findings,” said Mariano Vazquez, high-performance computational mechanics group manager at the Barcelona Supercomputing Center. “The IndeX plug-in enabled us to visualize our 54 million tetrahedral-cell model in real time. Best of all, it fits right inside our existing ParaView workflow.”
Integrating IndeX as a plug-in inside ParaView allows users to take advantage of the powerful features of IndeX without learning a new tool. In addition, the workflow remains unchanged so users can focus on their research.
NVIDIA IndeX for ParaView Key Features
Renders structured and unstructured volume data
Depth-correct mixing with ParaView primitives
Delivers high interactivity for large datasets
Scales across multi-GPU, multi-node clusters
Open-source plug-in for custom versions of ParaView
Supports ParaView data formats
IndeX Plugin for Workstations and HPC Clusters
There are two versions of the plug-in. For usage in a workstation, or single server node, the plug-in is available at no cost. For performance at scale in a GPU-accelerated multi-node system, the Cluster edition of the plug-in is available at no cost to academic users and with a license for commercial users.
Jun Zeng, principal scientist for HP Labs, and Rama Hoetzlein, lead architect for GVDB Voxels, presented a statue of a human figure with wings that combined these challenging elements.
Simplified, their goal was to be able to 3D print the statue while adjusting the density of materials to account for external forces. That increased structural integrity where it was needed, while minimizing the amount and weight of material required to produce it.
Zeng told a roomful of GTC attendees that HP Labs had started using GPUs to more quickly process 3D printing voxels (volumetric pixels — essentially pixels in 3D space). He anticipates that printing technology and scale will rapidly increase computing demands in the future.
NVIDIA’s GVDB Voxels SDK has eased the complexity of 3D printing workflows by offering a platform for large-scale voxel simulation and high-quality ray-traced visualizations. And it allows for continuous data manipulation throughout the process.
“Iteration can happen during infilling, or while analyzing and determining stress,” said Hoetzlein.
Hoetzlein said the SDK is designed for simple, efficient computation, simulation and rendering, even when volumetric data is sparse. It includes a compute API that generates high-resolution data with a minimal memory footprint, and a rendering API that supports development of CUDA and NVIDIA OptiX pathways, allowing users to write custom rendering kernels.
The researchers’ effort started with a polygonal statue, which was subjected to a stress simulation before GVDB Voxels took over. The object was converted into a model made of small voxel cubes. Then the software optimized the in-filling structure, varying its density based on the results of the stress simulation.
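The stress-to-density mapping can be pictured as a simple normalization. A toy sketch (not the GVDB Voxels API; the real optimizer is considerably more sophisticated):

```python
def infill_density(stress, s_min, s_max, d_min=0.1, d_max=1.0):
    """Map a voxel's simulated stress to an in-fill density.

    Toy version of the idea only (not the GVDB Voxels API): highly
    stressed voxels get dense in-fill for strength; lightly stressed
    voxels get sparse in-fill to save material and weight.
    """
    if s_max == s_min:
        return d_max
    t = (stress - s_min) / (s_max - s_min)  # normalize stress to [0, 1]
    return d_min + t * (d_max - d_min)

# Per-voxel stress values from a hypothetical simulation, spanning 0-100.
stresses = [0.0, 25.0, 100.0]
print([round(infill_density(s, 0.0, 100.0), 3) for s in stresses])  # [0.1, 0.325, 1.0]
```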
What they found was that combining GVDB Voxels with the latest Pascal architecture GPUs generated results 50 percent faster than the previous generation of GPUs, and up to 10x faster than CPU techniques. The SDK makes this possible by storing data only at the surface of the object. That reduces memory requirements without sacrificing resolution.
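The memory savings follow from simple geometry: a dense grid grows with the cube of the resolution, while data confined to the surface grows roughly with its square. A back-of-envelope comparison (illustrative numbers, not GVDB’s actual data structure):

```python
# Rough arithmetic behind surface-only voxel storage. A dense grid stores
# N^3 voxels; a model whose data lives only near the surface needs on the
# order of the surface area -- about 6*N^2 voxels for a cube-like shape.
N = 1024                 # voxels per axis
dense = N ** 3
surface = 6 * N ** 2     # order-of-magnitude surface estimate

print(f"dense: {dense:,} voxels")
print(f"surface only: {surface:,} voxels ({dense / surface:.0f}x fewer)")
```

The gap widens with resolution, which is why sparse storage matters most for the largest prints.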
Zeng said that oftentimes the limitations of 3D printing devices dictate what designers can do. With the NVIDIA GVDB Voxels SDK, designers gain new flexibility.
To help developers meet the growing complexity of deep learning, NVIDIA today announced better and faster tools for our software development community. This includes a significant update to the NVIDIA SDK, which includes software libraries and tools for developers building AI-powered applications.
With each new generation of GPU architecture, we’ve continually improved the NVIDIA SDK. In keeping with that heritage, our software is Volta-ready.
Aided by developers’ requests, we’ve built tools, libraries and enhancements to the CUDA programming model to help developers accelerate and build the next generation of AI and HPC applications.
The latest SDK updates introduce new capabilities and performance optimizations for GPU-accelerated applications:
New CUDA 9 speeds up HPC and deep learning applications with support for Volta GPUs, up to 5x faster performance for libraries, a new programming model for thread management, and updates to debugging and profiling tools.
Developers of end-user applications such as AI-powered web services and embedded edge devices benefit from 3.5x faster deep learning inference with the new TensorRT 3. With built-in support for optimizing both Caffe and TensorFlow models, developers can take trained neural networks to production faster than ever.
Engineers and data scientists can benefit from 2.5x faster deep learning training using Volta optimizations for frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow.
Here’s a detailed look at each of the software updates and the benefits they bring to developers and end users:
CUDA is the fastest software development platform for creating GPU-accelerated applications. Every new generation of GPU is accompanied by a major update of CUDA, and version 9 includes support for Volta GPUs, major updates to libraries, a new programming model, and updates to debugging and profiling tools.
With the updated Deep Learning SDK optimized for Volta, developers have access to the libraries and tools that ensure seamless development and deployment of deep neural networks on all NVIDIA platforms, from the cloud or data center to the desktop to embedded edge devices. Deep learning frameworks using the latest updates deliver up to 2.5x faster training of CNNs, 3x faster training of RNNs and 3.5x faster inference on Volta GPUs compared to Pascal GPUs.
We’ve also worked with our partners and the communities so the Caffe2, Microsoft Cognitive Toolkit, MXNet, PyTorch and TensorFlow deep learning frameworks will be updated to take advantage of the latest Deep Learning SDK and Volta.
This update brings performance improvements and new features to:
NVIDIA cuDNN provides high-performance building blocks for deep learning and is used by all the leading deep learning frameworks.
cuDNN 7 delivers 2.5x faster training of Microsoft’s ResNet50 neural network on the Volta-optimized Caffe2 deep learning framework. Apache MXNet delivers 3x faster training of OpenNMT language translation LSTM RNNs.
Deep learning frameworks rely on NCCL to deliver multi-GPU scaling of deep learning workloads. NCCL 2 introduces multi-node scaling of deep learning training on up to eight GPU-accelerated servers. With the time required to train a neural network reduced from days to hours, developers can iterate and develop their products faster.
Developers of HPC applications and deep learning frameworks will have access to NCCL 2 in July. It will be available as a free download for members of the NVIDIA Developer Program. Learn more at the NCCL website.
Delivering AI services in real time poses stringent latency requirements for deep learning inference. With NVIDIA TensorRT 3, developers can now deliver 3.5x faster inference performance — under 7 ms real-time latency.
Developers can optimize models trained in TensorFlow or Caffe deep learning frameworks and deploy fast AI services to platforms running Linux, Microsoft Windows, BlackBerry QNX or Android operating systems.
TensorRT 3 will be available in July as a free download for members of the NVIDIA Developer Program. Learn more at the TensorRT website.
DIGITS introduces support for the TensorFlow deep learning framework. Engineers and data scientists can improve productivity by designing TensorFlow models within DIGITS, using its interactive workflow to manage datasets and training, and monitoring model accuracy in real time. To decrease training time and improve accuracy, the update also provides three new pre-trained models in the DIGITS Model Store: Oxford’s VGG-16 and Microsoft’s ResNet50 for image classification tasks, and NVIDIA DetectNet for object detection tasks.
The DIGITS update, with TensorFlow support and the new models, will be available for the desktop and the cloud in July as a free download for members of the NVIDIA Developer Program. Learn more at the DIGITS website.
Deep Learning Frameworks
The NVIDIA Deep Learning SDK accelerates widely used deep learning frameworks such as Caffe, Microsoft Cognitive Toolkit, TensorFlow, Theano and Torch, as well as many other deep learning applications. NVIDIA is working closely with leading deep learning framework maintainers at Amazon, Facebook, Google, Microsoft, the University of Oxford and others to integrate the latest NVIDIA Deep Learning SDK libraries and immediately take advantage of the power of Volta.
Caffe2 announced on their blog an update to the framework that brings 16-bit floating point (FP16) training to Volta, developed in collaboration with NVIDIA:
“We are working closely with NVIDIA on Caffe2 to utilize the features in NVIDIA’s upcoming Tesla V100, based on the next-generation Volta architecture. Caffe2 is excited to be one of the first frameworks that is designed from the ground up to take full advantage of Volta by integrating the latest NVIDIA Deep Learning SDK libraries — NCCL and cuDNN.”
Amazon announced how they are working together with NVIDIA to bring high-performance deep learning to AWS. As part of the announcement, they spoke about the work we’ve done together on bringing Volta support to MXNet.
“In collaboration with NVIDIA, AWS engineers and researchers have pre-optimized neural machine translation (NMT) algorithms on Apache MXNet allowing developers to train the fastest on Volta-based platforms,” wrote Joseph Spisak, manager of Product Management at Amazon AI.
Google shared the latest TensorFlow benchmarks on DGX-1 on their developers blog:
“We’d like to thank NVIDIA for sharing a DGX-1 for benchmark testing and for their technical assistance. We’re looking forward to NVIDIA’s upcoming Volta architecture, and to working closely with them to optimize TensorFlow’s performance there, and to expand support for FP16.”
At NVIDIA, we’re also working closely with Microsoft to optimize Microsoft Cognitive Toolkit, and with Facebook AI Research (FAIR) to optimize PyTorch, for Volta.
NVIDIA GPU Cloud Deep Learning Stack
We also announced today NVIDIA GPU Cloud (NGC), a GPU-accelerated cloud platform optimized for deep learning.
NGC is designed for developers of deep learning-powered applications who don’t want to assemble and maintain the latest deep learning software and GPUs themselves. It includes the NGC Deep Learning Stack, a complete development environment that runs on PCs, DGX systems and the cloud, powered by the latest deep learning frameworks, the NVIDIA Deep Learning SDK and CUDA. The stack is fully managed by NVIDIA, so developers and data scientists can start with a single GPU on a PC and scale up to additional compute resources in the cloud.
Updates to NVIDIA VRWorks and DesignWorks
Learn more about the latest updates to some of our other SDKs:
Businesses collect mountains of data daily. Now it’s time to make those mountains move.
NVIDIA CEO and founder Jensen Huang announced today at our GPU Technology Conference that SAP and NVIDIA are working together to help businesses use AI in ways that will change the world’s view of business applications.
Together, we’re combining the advantages of NVIDIA’s AI computing platform with SAP’s leadership in enterprise software.
“With strong partners like NVIDIA at our side, the possibilities are limitless,” wrote SAP Chief Innovation Officer Juergen Mueller in a blog post published today. “New applications, unprecedented value in existing applications, and easy access to machine learning services will allow you to make your own enterprise intelligent.”
SAP is leveraging advancements NVIDIA has made from GPUs to systems to software. Our Tesla GPU computing platform represents a $2 billion investment. The NVIDIA DGX-1 — announced just over a year ago and incorporating eight GPUs — is an integrated hardware and software supercomputer that’s the result of work by over a dozen engineering teams.
Here are three examples of our collaboration with SAP that we’re demonstrating at GTC:
Measuring the ROI of Brand Impact
Many brands rely on sponsorship of televised events, yet it’s very difficult to track the impact of those ads. With the current manual process, it takes the industry up to six weeks to report brand-impact return on investment, and an entire quarter to adjust brand marketing expenditures.
SAP Brand Impact, powered by NVIDIA deep learning, measures brand attributes such as logo appearances in near real time and with superhuman accuracy, because AI is not limited by human constraints. This is made possible by deep neural networks trained on the NVIDIA DGX-1, with TensorRT providing accelerated video inference.
Results are immediate, accurate and auditable. Delivered in a day.
As a longtime SAP customer, Audi got early access to the latest SAP solution powered by NVIDIA deep learning, explains Thomas Glas, global head of Audi Sports Marketing.
“Audi’s sponsorship team found the SAP Brand Impact solution a very useful tool. It can help Audi to evaluate its sponsorship exposure at high levels of operational excellence and transparency,” Glas said. “We were impressed by the capabilities and results of the first proof-of-concepts based on video footage from Audi FIS Alpine Ski World Cup. We’re strongly considering possibilities to combine SAP Brand Impact with our media analysis workflow for the upcoming Audi Cup and Audi FC Bayern Summer Tour.”
The Future of Accounts Payable
Talk about a paper trail. A typical large manufacturing company processes 8 million invoices a year. Companies around the world still receive paper invoices that need to be processed manually. These manual processes are costly, time-consuming, repetitive and error-prone.
SAP used deep learning to train its Accounts Payable application, which automates the extraction and classification of relevant information from invoices without human intervention. A recurrent neural network is trained on NVIDIA GPUs to create this customized solution.
Records are processed in sub-seconds. Cash flow is sped up. Errors are reduced.
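The recurrent approach behind this kind of extraction can be sketched simply: the tokens of an invoice field are fed through a recurrent cell one at a time, and the final hidden state is classified. Below is a toy NumPy illustration with random (untrained) weights and made-up classes, purely to show the architecture; SAP's actual model, classes and training data are not public:

```python
import numpy as np

rng = np.random.default_rng(42)

VOCAB, HIDDEN, CLASSES = 128, 16, 3  # hypothetical classes: amount / date / vendor

# Randomly initialized parameters; a real model is trained on GPUs.
Wx = rng.standard_normal((VOCAB, HIDDEN)) * 0.1    # input-to-hidden
Wh = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1   # hidden-to-hidden
Wy = rng.standard_normal((HIDDEN, CLASSES)) * 0.1  # hidden-to-output

def classify(token_ids):
    """Run a simple Elman RNN over a token sequence, classify the final state."""
    h = np.zeros(HIDDEN)
    for t in token_ids:
        x = np.zeros(VOCAB)
        x[t] = 1.0                    # one-hot encode the token
        h = np.tanh(x @ Wx + h @ Wh)  # recurrent state update
    logits = h @ Wy
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax over classes
    return int(np.argmax(probs)), probs

# Character-level tokenization of one invoice field (illustrative)
label, probs = classify([ord(c) % VOCAB for c in "EUR 1,234.56"])
print(f"predicted class {label}, probabilities {np.round(probs, 3)}")
```

Because the hidden state is carried across every token, the classifier can use the whole field, not just isolated characters, which is what makes recurrent networks a natural fit for free-form invoice text.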
The Ticket to Customer Satisfaction
Eighty-one percent of companies recognize customer experience as a competitive differentiator, so why do just 13 percent rate their customer service at 9/10 or better? Companies struggle to keep up with their customers’ complaints and support issues with limited resources.
Using natural language processing and deep learning techniques on the NVIDIA GPU platform, the SAP Service Ticketing application helps companies analyze unstructured data and create automated rules to categorize and route service tickets to the right person.
The result: a faster response and an improved customer experience.
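Ticket routing of this kind reduces to scoring each incoming ticket against per-queue rules and dispatching to the best match. The sketch below uses a hand-written keyword table as a stand-in for the trained model; the queue names and keywords are invented for illustration and have nothing to do with SAP's implementation:

```python
# Hypothetical per-queue keyword rules; in production these would be
# learned from historical tickets rather than written by hand.
ROUTES = {
    "billing":   {"invoice", "charge", "refund", "payment"},
    "shipping":  {"delivery", "package", "tracking", "delayed"},
    "technical": {"error", "crash", "login", "password"},
}

def route_ticket(text):
    """Score each queue by keyword overlap and return the best match."""
    words = set(text.lower().replace(",", " ").replace(".", " ").split())
    scores = {queue: len(words & kws) for queue, kws in ROUTES.items()}
    best = max(scores, key=scores.get)
    # No keyword hit at all: fall back to a human triage queue.
    return best if scores[best] > 0 else "triage"

print(route_ticket("My invoice shows a duplicate charge, need a refund"))
```

A deployed system would replace the keyword table with a trained text classifier, but the routing logic, score every queue and dispatch to the winner with a human fallback, stays the same.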
See More at GTC, SAPPHIRE and on the Web
To learn more about the SAP demos at GTC, join us at booth 118. We’ll also be at SAP SAPPHIRE in Orlando next week to showcase five more applications.
If you can’t make it to either show, join us for our live webinar on how we’re bringing AI to the enterprise, on June 14 at 9 am Pacific.