NGC is a catalog of software that is optimized to run on NVIDIA GPU cloud instances, such as the Amazon EC2 P4d instance featuring the record-breaking performance of NVIDIA A100 Tensor Core GPUs. AWS customers can deploy this software free of charge to accelerate their AI deployments.
We first began providing GPU-optimized software through the NVIDIA NGC catalog in 2017. Since then, industry demand for these resources has skyrocketed. More than 250,000 unique users have now downloaded the AI containers, pretrained models, application frameworks, Helm charts and other machine learning resources available in the catalog more than 1 million times.
Teaming Up for Another First in the Cloud
AWS is the first cloud service provider to offer the NGC catalog on their marketplace. Many organizations look to the cloud first for new deployments, so having NGC software at the fingertips of data scientists and developers can help enterprises hit the ground running. With NGC, they can easily get started on new AI projects without having to leave the AWS ecosystem.
“AWS and NVIDIA have been working together to accelerate computing for more than a decade, and we are delighted to offer the NVIDIA NGC catalog in AWS Marketplace,” said Chris Grusz, director of AWS Marketplace at Amazon Web Services. “With NVIDIA NGC software now available directly in AWS Marketplace, customers will be able to simplify and speed up their AI deployment pipeline by accessing and deploying these specialized software resources directly on AWS.”
NGC AI Containers Debuting Today in AWS Marketplace
To help data scientists and developers build and deploy AI-powered solutions, the NGC catalog offers hundreds of NVIDIA GPU-accelerated machine learning frameworks and industry-specific software development kits. Today’s launch of NGC on AWS Marketplace features many of NVIDIA’s most popular GPU-accelerated AI software offerings in healthcare, recommender systems, conversational AI, computer vision, HPC, robotics, data science and machine learning, including:
NVIDIA Clara Imaging: NVIDIA’s domain-optimized application framework that accelerates deep learning training and inference for medical imaging use cases.
NVIDIA DeepStream SDK: A multiplatform scalable video analytics framework to deploy on the edge and connect to any cloud.
NVIDIA HPC SDK: A suite of compilers, libraries and software tools for high performance computing.
NVIDIA Isaac Sim ML Training: A toolkit to help robotics machine learning engineers use Isaac Sim to generate synthetic images to train an object detection deep neural network.
NVIDIA Merlin: An open beta framework for building large-scale deep learning recommender systems.
NVIDIA NeMo: An open-source Python toolkit for developing state-of-the-art conversational AI models.
RAPIDS: A suite of open-source data science software libraries.
Instant Access to Performance-Optimized AI Software
NGC software in AWS Marketplace provides a number of benefits to help data scientists and developers build the foundations for success in AI.
Faster software discovery: Through the AWS Marketplace, developers and data scientists can access the latest versions of NVIDIA’s AI software with a single click.
The latest NVIDIA software: The NGC software in AWS Marketplace is federated, giving AWS users access to the latest versions as soon as they’re available in the NGC catalog. The software is constantly optimized, and the monthly releases give users access to the latest features and performance improvements.
Simplified software deployment: Users of Amazon EC2, Amazon SageMaker, Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS) can quickly subscribe, pull and run NGC software on NVIDIA GPU instances, all within the AWS console. Additionally, SageMaker users can simplify their workflows by eliminating the need to first store a container in Amazon Elastic Container Registry (ECR).
Continuous integration and development: NGC Helm charts are also available in AWS Marketplace to help DevOps teams quickly and consistently deploy their services.
What’s New: Today, Intel unveiled ControlFlag – a machine programming research system that can autonomously detect errors in code. Even in its infancy, this novel, self-supervised system shows promise as a powerful productivity tool to assist software developers with the labor-intensive task of debugging. In preliminary tests, ControlFlag trained on more than 1 billion unlabeled lines of production-quality code and learned to detect novel defects.
“We think ControlFlag is a powerful new tool that could dramatically reduce the time and money required to evaluate and debug code. According to studies, software developers spend approximately 50% of the time debugging. With ControlFlag, and systems like it, I imagine a world where programmers spend notably less time debugging and more time on what I believe human programmers do best — expressing creative, new ideas to machines.”
–Justin Gottschlich, principal scientist and director/founder of Machine Programming Research at Intel Labs
Why It Matters: In a world increasingly run by software, developers continue to spend a disproportionate amount of time fixing bugs rather than coding. It’s estimated that of the $1.25 trillion that software development costs the IT industry every year, 50 percent is spent debugging code.
Debugging is expected to take an even bigger toll on developers and the industry at large. As we progress into an era of heterogeneous architectures — one defined by a mix of purpose-built processors to manage the massive sea of data available today — the software required to manage these systems becomes increasingly complex, creating a higher likelihood of bugs. In addition, it is becoming difficult to find software programmers who have the expertise to correctly, efficiently and securely program across diverse hardware, which introduces another opportunity for new and harder-to-spot errors in code.
When fully realized, ControlFlag could help alleviate this challenge by automating the tedious parts of software development, such as testing, monitoring and debugging. This would not only enable developers to do their jobs more efficiently and free up more time for creativity, but it would also address one of the biggest price tags in software development today.
How It Works: ControlFlag’s bug detection capabilities are enabled by machine programming, a fusion of machine learning, formal methods, programming languages, compilers and computer systems.
ControlFlag specifically operates through a capability known as anomaly detection. As humans existing in the natural world, there are certain patterns we learn to consider “normal” through observation. Similarly, ControlFlag learns from verified examples to detect normal coding patterns, identifying anomalies in code that are likely to cause a bug. Moreover, ControlFlag can detect these anomalies regardless of programming language.
A key benefit of ControlFlag’s unsupervised approach to pattern recognition is that it can intrinsically learn to adapt to a developer’s style. With limited inputs for the control tools that the program should be evaluating, ControlFlag can identify stylistic variations in programming language, similar to the way readers recognize the difference between full words and contractions in English.
The tool learns to identify and tag these stylistic choices and can customize error identification and solution recommendations based on its insights, minimizing the chance that ControlFlag flags as erroneous code that is merely a stylistic difference between two developer teams.
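The frequency-based idea behind this kind of anomaly detection can be sketched in a few lines. Everything below is a hypothetical stand-in, not Intel’s implementation: it abstracts if-conditions into patterns, counts them over a trusted corpus, and flags patterns seen too rarely — which is how `if (x = 0)` (an assignment, likely a typo for `==`) stands out against many occurrences of `if (x == 0)`.

```python
import re
from collections import Counter

def extract_patterns(source):
    """Hypothetical stand-in for ControlFlag's pattern mining: pull out each
    if-condition and abstract identifiers and literals away."""
    patterns = []
    for cond in re.findall(r"if\s*\(([^)]*)\)", source):
        abstracted = re.sub(r"\b[A-Za-z_]\w*\b", "VAR", cond)  # names -> VAR
        abstracted = re.sub(r"\b\d+\b", "NUM", abstracted)     # literals -> NUM
        patterns.append(abstracted.replace(" ", ""))
    return patterns

def flag_anomalies(corpus, candidate, min_support=2):
    """Count pattern frequencies over a trusted corpus; candidate patterns
    seen fewer than `min_support` times are flagged as anomalies."""
    freq = Counter()
    for source in corpus:
        freq.update(extract_patterns(source))
    return [p for p in extract_patterns(candidate) if freq[p] < min_support]

corpus = [
    "if (x == 0) { }", "if (y == 1) { }", "if (count == 10) { }",
]
# The single-equals condition yields a rare pattern and gets flagged:
suspicious = flag_anomalies(corpus, "if (x = 0) { }")
```

The real system mines parse trees rather than regex matches, but the principle is the same: frequency over verified code defines "normal," and rarity signals a likely bug.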
Intel has even started evaluating ControlFlag internally to identify bugs in its own software and firmware product development. It is a key element of Intel’s Rapid Analysis for Developers project, which aims to accelerate developer velocity by providing expert assistance.
The pharmaceutical industry has grown accustomed to investing billions of dollars to bring drugs to market, only to watch 90 percent of them fail even before clinical trials.
The problem is, and always has been, there’s simply not enough compute power in the world to accurately assess the properties of all possible molecules, nor to support the extensive experimental efforts needed in drug discovery.
“There may be more potential drug compounds than there are atoms in the universe,” said Patrick Lorton, chief technology officer at Schrödinger, the New York-based developer of a physics-based software platform designed to model and compute the properties of novel molecules for the pharma and materials industries.
“If you look at a billion molecules and you say there’s no good drug here, it’s the same as looking at a drop of water in the ocean and saying fish don’t exist,” he said.
Fresh off a successful IPO earlier this year, Schrödinger has devoted decades to refining computational algorithms to accurately compute important properties of molecules. The company uses NVIDIA GPUs to generate and evaluate petabytes of data to accelerate drug discovery, a dramatic improvement over the traditional process of slow and expensive lab work.
The company works with all 20 of the biggest biopharma companies in the world, several of which have standardized on Schrödinger’s platform as a key component of preclinical research.
The COVID-19 pandemic highlights the need for a more efficient and effective drug discovery process. To that end, the company has joined the global COVID R&D alliance to offer resources and collaborate. Recently, Google Cloud also threw its weight behind this alliance, donating more than 16 million hours of NVIDIA GPU time to the hunt for a cure.
“We hope to develop an antiviral therapeutic for SARS-CoV-2, the virus that causes COVID-19, in time to have treatments available for future waves of the pandemic,” Lorton said.
Advanced Simulation Software
The pharmaceutical industry has long depended on manually intensive physical processes to find new therapeutics. This allowed it to develop many important remedies over the last 50 years, but only through a laborious trial-and-error approach, Lorton said.
He makes the comparison to airplane manufacturers, which formerly carved airplane designs out of balsa wood and tested their drag coefficient in wind tunnels. They now rely on advanced simulation software that reduces the time and resources needed to test designs.
With the pharmaceutical industry traditionally using the equivalent of balsa, Schrödinger’s drug discovery platform has become a game changer.
“We’re trying to make preclinical drug discovery more efficient,” said Lorton. “This will enable the industry to treat more diseases and help more conditions.”
Exploring New Space
For more than a decade, every major pharmaceutical company has been using Schrödinger’s software, which can perform physics simulations down to the atomic level. For each potential drug candidate, Schrödinger uses recently developed physics-based computational approaches to calculate the properties of as many as 3,000 possible compounds, which can require up to 12,000 GPU hours on high-performance computers.
Once the physics-based calculations are completed for the original set of randomly selected compounds, a layer of active learning is applied, making projections on the probable efficacy of a billion molecules.
Lorton said it currently takes four or five iterations to get a machine-learning algorithm accurate enough to be predictive, though even these projections are always double-checked with the physics-based methods before synthesizing any molecules in the lab.
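The loop described above (score a random subset with physics, fit a cheap surrogate, let the surrogate nominate the next batch for expensive scoring) can be sketched in plain Python. The toy `physics_score` landscape and nearest-neighbor surrogate are illustrative stand-ins, not Schrödinger’s actual physics engines or ML models:

```python
import random

def physics_score(x):
    """Stand-in for an expensive physics-based affinity calculation
    (the real computation can take GPU-hours per compound)."""
    return -(x - 0.3) ** 2  # toy landscape; higher is better

def surrogate_predict(labeled, x):
    """Toy ML surrogate: predict from the nearest already-scored compound."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

random.seed(0)
library = [random.random() for _ in range(1000)]          # the "compounds"
labeled = [(x, physics_score(x)) for x in library[:20]]   # initial random set

for _ in range(4):  # a handful of active-learning iterations
    # Rank the whole library with the cheap surrogate...
    ranked = sorted(library, reverse=True,
                    key=lambda x: surrogate_predict(labeled, x))
    # ...then spend the expensive physics budget only on the top picks.
    for x in ranked[:10]:
        labeled.append((x, physics_score(x)))

best = max(labeled, key=lambda pair: pair[1])
```

The design point mirrors the article: the physics oracle is only ever called on a tiny, surrogate-selected fraction of the library, while the surrogate ranks everything, and surrogate picks are always re-checked with physics before anything would be synthesized.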
This software-based approach yields much faster results, but that’s only part of the value. It also greatly expands the scope of analysis, evaluating data that human beings never would have had time to address.
“The thing that is most compelling is exploring new space,” said Lorton. “It’s not just being cheaper. It’s being cheaper and finding things you would have otherwise not explored.”
For that reason, Schrödinger’s work focuses on modeling and simulation, and using the latest high performance computing resources to expand its discovery capabilities.
Bayer Proving Platform’s Value
One customer that’s been putting Schrödinger’s technology to use is Bayer AG. Schrödinger software has been helping Bayer scientists find lead structures for several drug discovery projects, ultimately contributing to clinical development candidates.
Recently, both companies agreed to co-develop a novel drug discovery platform to accelerate the estimation of the binding affinity, synthesizability and other properties of small molecules.
Bayer can’t yet share any specific results that the platform has delivered, but Dr. Alexander Hillisch, the company’s head of computational drug design, said it’s had an impact on several active projects.
Dr. Hillisch said that the software is expected to speed up work and effectively widen Bayer’s drug-discovery capabilities. As a result, he believes it’s time for NVIDIA GPUs to get a lot more recognition within the industry.
In a typical drug discovery project, Bayer evaluates binding affinities and other properties of molecules such as absorption and metabolic stability. With Schrödinger software and NVIDIA GPUs, “we’re enumerating millions to billions of virtual compounds and are thus scanning the chemical space much more broadly than we did before, in order to identify novel lead compounds with favorable properties,” he said.
Dr. Hillisch also suggested that the impact of holistic digital drug discovery approaches can soon be judged. “We expect to know how substantial the impact of this scientific approach will be in the near future,” he said.
The drug design platform also will be part of Bayer’s work on COVID-19. The company spun off its antiviral research into a separate company in 2006, but it recently joined a European coronavirus research initiative to help identify novel compounds that could provide future treatment.
Tailor-Made Task for GPUs
Given the scope of Schrödinger’s task, Lorton made it clear that NVIDIA’s advances in developing a full-stack computing platform for HPC and AI that pushes the boundaries of performance have been as important to his company’s accomplishments as its painstaking algorithmic and scientific work.
“It could take thousands, or tens of thousands, or in some crazy case, even hundreds of thousands of dollars to synthesize and get the binding affinity of a drug molecule,” he said. “We can do it for a few dollars of compute costs on a GPU.”
Lorton said that if the company had started one of its physics calculations on a single CPU when it was founded in 1990, it would have taken until today to reach the conclusions that a single GPU can now deliver in less than an hour.
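That claim implies an enormous speedup factor; a back-of-the-envelope check using the stated figures (one CPU running continuously for 30 years versus one GPU for under an hour):

```python
# Rough bound on the CPU-to-GPU speedup implied by Lorton's claim.
HOURS_PER_YEAR = 365 * 24          # ignoring leap days
cpu_hours = 30 * HOURS_PER_YEAR    # one CPU running from 1990 to 2020
gpu_hours = 1                      # "less than an hour" on a single GPU
speedup = cpu_hours / gpu_hours    # lower bound: roughly 260,000x
```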
Even with the many breakthroughs in compute speed on NVIDIA GPUs, Schrödinger’s discovery projects require thousands of NVIDIA T4 and V100 Tensor Core GPUs every day, both on premises and on the Google Cloud Platform. It’s this next level of compute, combined with continued investment in the underlying science, that the company hopes will change the way all drug discovery is done.
Scientists and researchers around the world are racing to find a cure for COVID-19.
That’s made the work of all those digitally gathered for this week’s high performance computing conference, ISC 2020 Digital, more vital than ever.
And the work of these researchers is broadening to encompass a wider range of approaches than ever.
The NVIDIA scientific computing platform plays a vital role, accelerating progress across this entire spectrum of approaches — from data analytics to simulation and visualization to AI to edge processing.
In genomics, Oxford Nanopore Technologies was able to sequence the virus genome in just 7 hours using our GPUs.
In infection analysis and prediction, the NVIDIA RAPIDS team has GPU-accelerated Plotly’s Dash, a data visualization tool, enabling clearer insights into real-time infection rate analysis.
In structural biology, the U.S. National Institutes of Health and the University of Texas at Austin are using the GPU-accelerated software CryoSPARC to reconstruct the first 3D structure of the virus protein using cryogenic electron microscopy.
In drug discovery, Oak Ridge National Laboratory ran the Scripps Research Institute’s AutoDock on the GPU accelerated Summit Supercomputer to screen a billion potential drug combinations in just 12 hours.
And in edge detection, Whiteboard Coordinator Inc. built an AI system to automatically measure and screen elevated body temperatures, screening well over 2,000 healthcare workers per hour.
It’s truly inspirational to wake up every day and see the amazing effort going on around the world and the role NVIDIA’s scientific computing platform plays in helping understand the virus and discovering testing and treatment options to fight the COVID-19 pandemic.
The reason we’re able to play a role in so many efforts, across so many areas, is because of our strong focus on providing end-to-end workflows for the scientific computing community.
We’re able to provide these workflows because of our approach to full-stack innovation to accelerate all key application areas.
For data analytics, we accelerate the key frameworks like Spark 3.0, RAPIDS and Dask. This acceleration is built using our domain-specific CUDA-X libraries for data analytics such as cuDF, cuML and cuGraph, along with I/O acceleration technologies from Magnum IO.
These libraries contain millions of lines of code and provide seamless acceleration to developers and users, whether they’re creating applications on the desktops accelerated with our GPUs or running them in data centers, in edge computers, in supercomputers, or in the cloud.
Similarly, we accelerate over 700 HPC applications, including all the most widely used scientific applications.
NVIDIA accelerates all frameworks for AI, which has become crucial for tasks where the information is incomplete — where there are no first principles to work with or the first principle-based approaches are too slow.
And, thanks to our roots in visual computing, NVIDIA provides accelerated visualization solutions, so terabytes of data can be visualized.
NASA, for instance, used our acceleration stack to visualize a simulated landing of the first manned mission to Mars, in what is the world’s largest real-time, interactive volumetric visualization (150TB).
Our deep domain libraries also provide a seamless performance boost to scientific computing users on their applications across the different generations of our architecture. Going from Volta to Ampere, for instance.
NVIDIA’s also making all our new and improved GPU-optimized scientific computing applications available through NGC for researchers to accelerate their time to insight.
Together, all of these pillars of scientific computing — simulation, AI, data analytics, edge streaming and visualization workflows — are key to tackling the challenges of today, and tomorrow.
From rapidly evolving technologies to stiff competition, there’s nothing simple about the wireless industry.
Take 5G. Whether deciding where to locate a complex web of new infrastructure or analyzing performance and service levels, the common element in all of the challenges the telecom industry faces is data — petabytes of it.
Data flows within the industry are orders of magnitude greater than just a few years ago. And systems have become faster. This puts a premium on smart, quick decision-making.
Collecting and normalizing huge network datasets is just the start of the process. The data also has to be analyzed. To address these issues, wireless carriers like Verizon and data providers like Skyhook are turning to data analytics accelerated by the OmniSci platform and NVIDIA GPUs.
OmniSci, based in San Francisco, pioneered the concept of using the incredible parallel processing power of GPUs to interactively query and visualize massive datasets in milliseconds.
Composed of a lightning-fast SQL engine along with rendering and visualization systems, the OmniSci accelerated analytics platform allows users to run SQL queries, filter the results and chart them over a map almost instantaneously. In the time it takes for traditional analytics tools to respond to a single query, the OmniSci platform allows users to get answers to questions as fast as they can formulate them.
The extreme parallel processing speed of NVIDIA GPUs allows entire datasets to be explored — without indexing or pre-aggregation. Analysts can create dashboards composed of geo-point maps, geo-heat maps, choropleths and scatter plots, in addition to conventional line, bar and pie charts.
Even non-technical users can query and visualize millions of polygons, based on billions of rows of data, at their own pace. Enhancing the interface, the RAPIDS machine learning framework enables users to create predictive models based on existing data.
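A dashboard panel of the kind described above ultimately issues ordinary SQL; OmniSci’s contribution is executing it on GPUs over billions of rows without indexes or pre-aggregation. As a rough illustration of the query shape, here is a sketch using Python’s standard-library sqlite3 as a stand-in engine, with a hypothetical `signal_logs` table of per-tower network metrics:

```python
import sqlite3

# Stand-in for OmniSciDB: a tiny in-memory table of per-tower metrics.
# Table and column names here are hypothetical, for illustration only.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE signal_logs (
    region TEXT, tower_id INTEGER, dropped_calls INTEGER)""")
con.executemany(
    "INSERT INTO signal_logs VALUES (?, ?, ?)",
    [("northeast", 1, 42), ("northeast", 2, 7),
     ("midwest", 3, 3), ("midwest", 4, 5)],
)

# The shape of a typical dashboard query: aggregate an anomaly metric
# per map region, worst-affected first.
rows = con.execute("""
    SELECT region, SUM(dropped_calls) AS total_dropped
    FROM signal_logs
    GROUP BY region
    ORDER BY total_dropped DESC
""").fetchall()
# rows[0] is the worst-affected region
```

The same GROUP BY over a geographic column is what drives the geo-heat maps and choropleths; on the GPU platform it runs over the full dataset rather than a pre-aggregated extract.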
Ensuring the rollout of even coverage for wireless customers requires coordinating a huge number of new cellular base stations — in everything from cell towers to homes and businesses — as well as new distributed antenna systems for major indoor and outdoor facilities.
Wireless providers must also continually monitor and analyze network performance; surges and anomalies have to be identified and quickly addressed; and equipment must be constantly optimized to meet customer demands. Additionally, cybersecurity defenses require a never-ending cycle of management, reviews and upgrades.
Accelerated analytics helps wireless carriers solve many of these difficult operational problems. For network planning, GPUs offer much deeper analysis of market utilization and can spot gaps in daypart or geographic coverage. Log queries can be reviewed within moments, instead of hours, and help predict usage in specific geographic areas to better inform engineering or utilization planning decisions.
To ensure optimal service levels for customers, engineers are using GPU-accelerated analytics to better understand network demand by parameters such as daypart, service, and type of data. They can review these metrics in any combination and at any level — nationwide, regionally or even at street level — with results plotted in fractions of a second.
On the business side, marketing and customer service personnel require improved ways to attract new customers and reduce subscriber churn. Where prepaid wireless is the norm, it’s vital to introduce services that generate incremental revenue while reducing turnover.
With OmniSci, these teams can review mobile and application data to identify opportunities for promotions or upselling and to reduce customer churn. Location, activity and hardware profiles can all be taken into account to improve ad targeting and campaign measurement.
Verizon, America’s largest telecom with over 150 million subscribers, uses the OmniSci platform to improve its network reliability.
Anomalies are identified in just moments, versus old methods that would take 45 to 60 minutes, leading to faster problem resolution. Verizon also uses OmniSci to uncover long-term trends and to facilitate the expansion of its Wi-Fi services into new venues such as sports stadiums.
Skyhook, a mobile positioning and location provider, uses OmniSci to cross-reference Wi-Fi, cellular and sensor data to provide precise information about users and devices. Retailers, to cite one example, use this intelligence to analyze store visits and shopper behavior patterns. The data also helps with customer acquisition, site selection and various investment opportunities.
Skyhook’s insights further aid in the creation of location-based experiences such as customized storytelling and venue orientation. When disasters strike, the company’s real-time knowledge base helps first responders understand complex damage scenarios and to move quickly to locations where help is needed most.
Rather than mustering a little more performance out of legacy systems, new challenges require new solutions. OmniSci and NVIDIA are helping telcos answer the call.
Massive change across every industry is being driven by the rising adoption of IoT sensors, including cameras for seeing, microphones for hearing, and a range of other smart devices that help enterprises perceive and understand what’s happening in the physical world.
The amount of data being generated at the edge is growing exponentially. The only way to process this vast data in real time is by placing servers near the point of action and by harnessing the immense computational power of GPUs.
The enterprise data center of the future won’t have 10,000 servers in one location, but one or more servers across 10,000 different locations. They’ll be in office buildings, factories, warehouses, cell towers, schools, stores and banks. They’ll detect traffic jams and forest fires, route traffic safely and prevent crime.
By placing a network of distributed servers where data is being streamed from hundreds of sensors, enterprises can use networks of data centers at the edge to drive immediate action with AI. Additionally, by processing data at the edge, privacy concerns are mitigated and data sovereignty concerns are put to rest.
Edge servers lack the physical security infrastructure that enterprise IT takes for granted. And companies lack the budget to invest in roaming IT personnel to manage these remote systems. So edge servers need to be designed to be self-secure and easy to update, manage and deploy from afar.
Plus, AI systems need to be running all the time, with zero downtime.
We’ve built the NVIDIA EGX Edge AI platform to ensure security and resiliency on a global scale. By simplifying deployment and management, NVIDIA EGX allows always-on AI applications to automate the critical infrastructure of the future. The platform is a Kubernetes and container-native software platform that brings GPU-accelerated AI to everything from dual-socket x86 servers to Arm-based NVIDIA Jetson SoCs.
To date, over 20 server vendors are building EGX-powered edge and micro-edge servers, including ADLINK, Advantech, Atos, AVerMedia, Cisco, Connect Tech, Dell Technologies, Diamond Systems, Fujitsu, Gigabyte, Hewlett Packard Enterprise, Inspur, Lenovo, Quanta Technologies and Supermicro. The NVIDIA edge ecosystem also includes dozens of hybrid-cloud and network security partners, such as Canonical, Check Point, Excelero, Guardicore, IBM, Nutanix, Palo Alto Networks, Rancher, Red Hat, VMware, Weka and Wind River.
There are also hundreds of AI applications and integrated solutions vendors building on NVIDIA EGX to deliver industry-specific offerings to enterprises across the globe.
Enterprises running AI need to protect not just customer data, but also the AI models that transform the data into actions. By combining an NVIDIA Mellanox SmartNIC, the industry standard for secure, high-performance networking, with our AI processors in the NVIDIA EGX A100 converged accelerator, we’re introducing fundamental new innovations for edge AI.
Enhanced Security and Performance
A secure, authenticated boot of the GPU and SmartNIC from Hardware Root-of-Trust ensures the device firmware and lifecycle are securely managed. Third-generation Tensor Core technology in the NVIDIA Ampere architecture brings industry-leading AI performance. Specific to EGX A100, the confidential AI enclave uses a new GPU security engine to load encrypted AI models and encrypt all AI outputs, further preventing the theft of valuable IP.
As the edge moves to encrypted high-resolution sensors, SmartNICs support in-line cryptographic acceleration at line rate. This allows encrypted data feeds to be decrypted and sent directly to the GPU memory, bypassing the CPU and system memory.
The edge also requires a greater level of security to protect against threats from other devices on the network. With dynamically reconfigurable firewall offloads in hardware, SmartNICs efficiently deliver the first line of defense for hybrid-cloud, secure service mesh communications.
NVIDIA Mellanox’s time-triggered transport technology for telco (5T for 5G) ensures commercial off-the-shelf solutions can meet the most time-sensitive use cases for 5G vRAN with our NVIDIA Aerial SDK. This will lead to a new wave of CloudRAN in the telecommunications industry.
With an NVIDIA Ampere GPU and Mellanox ConnectX-6 D on one converged product, the EGX A100 delivers low-latency, high-throughput packet processing for security and virtual network functions.
Simplified Deployment, Management and Security at Scale
Through NGC, NVIDIA’s catalog of GPU-optimized containers, we provide industry application frameworks and domain-specific AI toolkits to simplify getting started and to tune AI applications for new edge environments. They can be used together or individually and open new possibilities for a variety of edge use cases.
And with the NGC private registry, applications can be signed before publication to ensure they haven’t been tampered with in transit, then authenticated before running at the edge. The NGC private registry also supports model versioning and encryption, so lightweight model updates can be delivered quickly and securely.
The future of edge computing requires secure, scalable, resilient, easy-to-manage fleets of AI-powered systems operating at the network edge. By bringing the combined acceleration of NVIDIA GPUs and NVIDIA Mellanox SmartNICs together with NVIDIA EGX, we’re building both the platform and the ecosystem to form the AI nervous system of every global industry.
Original plans for the keynote to be delivered live at NVIDIA’s GPU Technology Conference in late March in San Jose were upended by the coronavirus pandemic.
Huang kicked off his keynote on a note of gratitude.
“I want to thank all of the brave men and women who are fighting on the front lines against COVID-19,” Huang said.
NVIDIA, Huang explained, is working with researchers and scientists to use GPUs and AI computing to treat, mitigate, contain and track the pandemic. Among those mentioned:
Oxford Nanopore Technologies has sequenced the virus genome in just seven hours.
Plotly is doing real-time infection rate tracing.
Oak Ridge National Laboratory and the Scripps Research Institute have screened a billion potential drug combinations in a day.
Structura Biotechnology, the University of Texas at Austin and the National Institutes of Health have reconstructed the 3D structure of the virus’s spike protein.
NVIDIA also announced updates to its NVIDIA Clara healthcare platform aimed at taking on COVID-19.
“Researchers and scientists applying NVIDIA accelerated computing to save lives is the perfect example of our company’s purpose — we build computers to solve problems normal computers cannot,” Huang said.
At the core of Huang’s talk was a vision for how data centers, the engine rooms of the modern global information economy, are changing, and how NVIDIA and Mellanox, acquired in a deal that closed last month, are together driving those changes.
“The data center is the new computing unit,” Huang said, adding that NVIDIA is accelerating performance gains from silicon, to the ways CPUs and GPUs connect, to the full software stack, and, ultimately, across entire data centers.
Systems Optimized for Data Center-Scale Computing
That starts with a new GPU architecture that’s optimized for this new kind of data center-scale computing, unifying AI training and inference, and making possible flexible, elastic acceleration.
NVIDIA A100, the first GPU based on the NVIDIA Ampere architecture, provides the greatest generational performance leap of NVIDIA’s eight generations of GPUs. Also built for data analytics, scientific computing and cloud graphics, it is in full production and shipping to customers worldwide, Huang announced.
Eighteen of the world’s leading service providers and systems builders are incorporating the A100, among them Alibaba Cloud, Amazon Web Services, Baidu Cloud, Cisco, Dell Technologies, Google Cloud, Hewlett Packard Enterprise, Microsoft Azure and Oracle.
The A100, built on the NVIDIA Ampere architecture, boosts performance by up to 20x over its predecessors, Huang said. He detailed five key features of the A100, including:
More than 54 billion transistors, making it the world’s largest 7-nanometer processor.
Third-generation Tensor Cores with TF32, a new math format that accelerates single-precision AI training out of the box. NVIDIA’s widely used Tensor Cores are now more flexible, faster and easier to use, Huang explained.
Structural sparsity acceleration, a new efficiency technique harnessing the inherently sparse nature of AI math for higher performance.
Multi-instance GPU, or MIG, allowing a single A100 to be partitioned into as many as seven independent GPUs, each with its own resources.
Third-generation NVLink technology, doubling high-speed connectivity between GPUs, allowing A100 servers to act as one giant GPU.
The result of all this: 6x higher performance than NVIDIA’s previous generation Volta architecture for training and 7x higher performance for inference.
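TF32, the new math format in those third-generation Tensor Cores, keeps float32’s 8-bit exponent range while shrinking the mantissa to 10 bits, which is why single-precision code accelerates without changes. A rough sketch of the idea (using simple bit truncation for illustration; the hardware’s actual rounding behavior may differ):

```python
import struct

def to_tf32(x: float) -> float:
    """Simulate TF32 precision: keep the sign bit and float32's 8
    exponent bits, but drop the low 13 of the 23 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFE000))[0]

print(to_tf32(1.5))            # exactly representable in 10 mantissa bits: 1.5
print(to_tf32(1.0 + 2**-11))   # below TF32 resolution near 1.0: truncates to 1.0
```

The point of the format is that values keep float32’s dynamic range, so overflow behavior matches single precision even though fine detail is discarded.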
NVIDIA DGX A100 Packs 5 Petaflops of Performance
NVIDIA is also shipping a third generation of its NVIDIA DGX AI system based on NVIDIA A100 — the NVIDIA DGX A100 — the world’s first 5-petaflops server. And each DGX A100 can be divided into as many as 56 independent instances, each running its own application.
This allows a single server to either “scale up” to race through computationally intensive tasks such as AI training, or “scale out,” for AI deployment, or inference, Huang said.
Among initial recipients of the system are the U.S. Department of Energy’s Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19; the University of Florida; and the German Research Center for Artificial Intelligence.
A100 will also be available for cloud and partner server makers as HGX A100.
A data center powered by five DGX A100 systems for AI training and inference, drawing just 28 kilowatts of power and costing $1 million, can do the work of a typical data center with 50 DGX-1 systems for AI training and 600 CPU systems, which consumes 630 kilowatts and costs over $11 million, Huang explained.
“The more you buy, the more you save,” Huang said, in his common keynote refrain.
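The keynote’s cost and power comparison is easy to check as back-of-the-envelope arithmetic:

```python
# Keynote figures: 5 DGX A100 systems vs. a legacy cluster of
# 50 DGX-1 systems (training) plus 600 CPU systems (inference).
new_cost, new_kw = 1_000_000, 28
old_cost, old_kw = 11_000_000, 630

print(old_cost / new_cost)  # 11.0  -> roughly 1/11th the cost
print(old_kw / new_kw)      # 22.5  -> roughly 1/22nd the power
```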
Need more? Huang also announced the next-generation DGX SuperPOD. Powered by 140 DGX A100 systems and Mellanox networking technology, it offers 700 petaflops of AI performance, Huang said, the equivalent of one of the 20 fastest computers in the world.
NVIDIA is expanding its own data center with four DGX SuperPODs, adding 2.8 exaflops of AI computing power — for a total capacity of 4.6 exaflops — to its SATURNV internal supercomputer, making it the world’s fastest AI supercomputer.
Huang also announced the NVIDIA EGX A100, bringing powerful real-time cloud-computing capabilities to the edge. Its NVIDIA Ampere architecture GPU offers third-generation Tensor Cores and new security features. Thanks to its NVIDIA Mellanox ConnectX-6 SmartNIC, it also includes secure, lightning-fast networking capabilities.
Software for the Most Important Applications in the World Today
Huang also announced NVIDIA GPUs will power major software applications for accelerating three critical usages: managing big data, creating recommender systems and building real-time, conversational AI.
These new tools arrive as the effectiveness of machine learning has driven companies to collect more and more data. “That positive feedback is causing us to experience an exponential growth in the amount of data that is collected,” Huang said.
To help organizations of all kinds keep up, Huang announced support for NVIDIA GPU acceleration on Spark 3.0, describing the big data analytics engine as “one of the most important applications in the world today.”
Built on RAPIDS, Spark 3.0 shatters performance benchmarks for extracting, transforming and loading data, Huang said. It’s already helped Adobe Intelligent Services achieve a 90 percent compute cost reduction.
Key cloud analytics platforms — including Amazon SageMaker, Azure Machine Learning, Databricks, Google Cloud AI and Google Cloud Dataproc — will all accelerate with NVIDIA, Huang announced.
“We’re now prepared for a future where the amount of data will continue to grow exponentially from tens or hundreds of petabytes to exascale and beyond,” Huang said.
Huang also unveiled NVIDIA Merlin, an end-to-end framework for building next-generation recommender systems, which are fast becoming the engine of a more personalized internet. Merlin slashes the time needed to create a recommender system from a 100-terabyte dataset from four days to 20 minutes, Huang said.
And he detailed NVIDIA Jarvis, a new end-to-end platform for creating real-time, multimodal conversational AI that can draw upon the capabilities unleashed by NVIDIA’s AI platform.
Huang highlighted its capabilities with a demo that showed him interacting with a friendly AI, Misty, that understood and responded to a sophisticated series of questions about the weather in real time.
Huang also dug into NVIDIA’s swift progress in real-time ray tracing since NVIDIA RTX was launched at SIGGRAPH in 2018, and he announced that NVIDIA Omniverse, which allows “different designers with different tools in different places doing different parts of the same design” to work together simultaneously, is now available for early access customers.
Autonomous vehicles are one of the greatest computing challenges of our time, Huang said, an area where NVIDIA continues to push forward with NVIDIA DRIVE.
NVIDIA DRIVE will use the new Orin SoC with an embedded NVIDIA Ampere GPU to achieve the energy efficiency and performance to offer a 5-watt ADAS system for the front windshield as well as scale up to a 2,000 TOPS, level-5 robotaxi system.
Now automakers have a single computing architecture and single software stack to build AI into every one of their vehicles.
“It’s now possible for a carmaker to develop an entire fleet of cars with one architecture, leveraging the software development across their whole fleet,” Huang said.
The NVIDIA DRIVE ecosystem now encompasses cars, trucks, tier one automotive suppliers, next-generation mobility services, startups, mapping services, and simulation.
And Huang announced NVIDIA is adding NVIDIA DRIVE RC for managing entire fleets of autonomous vehicles to its suite of NVIDIA DRIVE technologies.
BMW’s 30 factories around the globe build one vehicle every 56 seconds: that’s 40 different models, each with hundreds of different options, made from 30 million parts flowing in from nearly 2,000 suppliers around the world, Huang explained.
BMW joins a sprawling NVIDIA robotics global ecosystem that spans delivery services, retail, autonomous mobile robots, agriculture, services, logistics, manufacturing and healthcare.
In the future, factories will, effectively, be enormous robots. “All of the moving parts inside will be driven by artificial intelligence,” Huang said. “Every single mass-produced product in the future will be customized.”
Spend enough time online and what you want will start finding you just when you need it.
This is what’s driving the internet right now.
They’re called recommender systems, and they’re among the most important applications today.
That’s because there is an explosion of choice and it’s impossible to explore the large number of available options.
If a shopper were to spend just one second each swiping on their mobile app through the two billion products available on one prominent ecommerce site, it would take 65 years — almost an entire lifetime — to go through their entire catalog.
This is one of the major reasons the internet is now so personalized; otherwise, it would be impossible for the billions of internet users in the world to connect with the products, services, even expertise — among hundreds of billions of things — that matter to them.
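The 65-year figure above is straightforward back-of-the-envelope arithmetic; at exactly one second per item it comes out just over 63 years, in line with the article’s rounded number:

```python
products = 2_000_000_000              # items in the catalog
seconds_per_year = 60 * 60 * 24 * 365
years = products / seconds_per_year
print(round(years, 1))                # ~63.4 years, roughly a human lifetime
```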
They might be the most human, too. After all, what are you doing when you go to someone for advice? When you’re looking for feedback? You’re asking for a recommendation.
Now, driven by vast quantities of data about the preferences of hundreds of millions of individual users, recommender systems are racing to get better at doing just that.
The internet, of course, already knows a lot of facts: your name, your address, maybe your birthplace. But what recommender systems seek to learn, perhaps better than the people who know you, are your preferences.
Recommender systems aren’t a new idea. Jussi Karlgren formulated the idea of a recommender system, or a “digital bookshelf,” in 1990. Over the next two decades researchers at MIT and Bellcore steadily advanced the technique.
The technology really caught the popular imagination starting in 2007, when Netflix — then in the business of renting out DVDs through the mail — kicked off an open competition with a $1 million prize for a collaborative filtering algorithm that could improve on the accuracy of Netflix’s own system by more than 10 percent, a prize that was claimed in 2009.
Over the following decade, such recommender systems would become critical to the success of Internet companies such as Netflix, Amazon, Facebook, Baidu and Alibaba.
Virtuous Data Cycle
And the latest generation of deep learning-powered recommender systems provides marketing magic, giving companies the ability to boost click-through rates by better targeting users who will be interested in what they have to offer.
Now the ability to collect this data, process it, use it to train AI models and deploy those models to help you and others find what you want is among the largest competitive advantages possessed by the biggest internet companies.
It’s driving a virtuous cycle — with the best technology driving better recommendations, recommendations which draw more customers and, ultimately, let these companies afford even better technology.
That’s the business model. So how does this technology work?
Recommenders work by collecting information — by noting what you ask for — such as what movies you tell your video streaming app you want to see, ratings and reviews you’ve submitted, purchases you’ve made, and other actions you’ve taken in the past.
Perhaps more importantly, they can keep track of choices you’ve made: what you click on and how you navigate. How long you watch a particular movie, for example. Or which ads you click on or which friends you interact with.
All this information is streamed into vast data centers and compiled into complex, multidimensional tables that quickly balloon in size.
They can be hundreds of terabytes large — and they’re growing all the time.
That’s not so much because vast amounts of data are collected from any one individual, but because a little bit of data is collected from so many.
In other words, these tables are sparse — most of the information most of these services have on most of us for most of these categories is zero.
But, collectively these tables contain a great deal of information on the preferences of a large number of people.
And that helps companies make intelligent decisions about what certain types of users might like.
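This sparsity is easy to picture with a toy example. The users, items and ratings below are hypothetical, but they show why storing only observed interactions beats materializing the full table:

```python
# Hypothetical user-item interactions: each user has touched only a
# handful of items, so the full table would be overwhelmingly zeros.
interactions = {
    "alice": {"movie_a": 5.0, "movie_c": 3.0},
    "bob":   {"movie_b": 4.0},
    "carol": {"movie_a": 2.0, "movie_d": 4.5},
}
catalog_size = 10_000  # pretend catalog of 10,000 items

stored = sum(len(items) for items in interactions.values())
dense = len(interactions) * catalog_size
print(f"{stored} stored entries vs {dense} dense cells "
      f"({100 * stored / dense:.2f}% filled)")
```

Real services face the same ratio at vastly larger scale, which is why their interaction tables are kept in sparse formats rather than as dense matrices.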
Content Filtering, Collaborative Filtering
While there are a vast number of recommender algorithms and techniques, most fall into one of two broad categories: collaborative filtering and content filtering.
Collaborative filtering helps you find what you like by looking for users who are similar to you.
So while the recommender system may not know anything about your taste in music, if it knows you and another user share similar taste in books, it might recommend a song to you that it knows this other user already likes.
Content filtering, by contrast, works by understanding the underlying features of each product.
So if a recommender sees you liked the movies “You’ve Got Mail” and “Sleepless in Seattle,” it might recommend another movie to you starring Tom Hanks and Meg Ryan, such as “Joe Versus the Volcano.”
Those are extremely simplistic examples, to be sure.
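Simplistic as they are, both approaches can be sketched in a few lines. Everything below — the users, ratings and feature sets — is hypothetical illustration, not any production system: collaborative filtering scores taste overlap between users, while content filtering matches item features:

```python
from math import sqrt

# --- Collaborative filtering: recommend what similar users liked ---
ratings = {                     # hypothetical user -> {item: rating}
    "you":   {"book_a": 5, "book_b": 4},
    "other": {"book_a": 5, "book_b": 5, "song_x": 4},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = (sqrt(sum(x * x for x in u.values()))
           * sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

sim = cosine(ratings["you"], ratings["other"])
unseen = set()
if sim > 0.5:                   # similar taste in books...
    unseen = set(ratings["other"]) - set(ratings["you"])
    print("collaborative pick:", unseen)   # ...so suggest their song

# --- Content filtering: recommend items with matching features ---
features = {                    # hypothetical item -> feature set
    "youve_got_mail": {"hanks", "ryan", "romcom"},
    "joe_vs_volcano": {"hanks", "ryan", "comedy"},
    "die_hard":       {"willis", "action"},
}
liked = features["youve_got_mail"]
pick = max((i for i in features if i != "youve_got_mail"),
           key=lambda i: len(features[i] & liked))
print("content pick:", pick)    # shares the most features with the liked film
```

Production systems replace these hand-rolled similarities with learned embeddings and deep models, but the two underlying ideas are the same.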
Data as a Competitive Advantage
In reality, because these systems capture so much data, from so many people, and are deployed at such an enormous scale, they’re able to drive tens or hundreds of millions of dollars of business with even a small improvement in the system’s recommendations.
A business may not know what any one individual will do, but thanks to the law of large numbers, they know that, say, if an offer is presented to 1 million people, 1 percent will take it.
But while the potential benefits from better recommendation systems are big, so are the challenges.
Successful internet companies, for example, need to process ever more queries, faster, spending vast sums on infrastructure to keep up as the amount of data they process continues to swell.
Companies outside of technology, by contrast, need access to ready-made tools so they don’t have to hire whole teams of data scientists.
If recommenders are going to be used in industries ranging from healthcare to financial services, they’ll need to become more accessible.
This is where GPUs come in.
NVIDIA GPUs, of course, have long been used to accelerate training times for neural networks — sparking the modern AI boom — since their parallel processing capabilities let them blast through data-intensive tasks.
But now, as the amount of data being moved continues to grow, GPUs are being harnessed more extensively. Tools such as RAPIDS, a suite of software libraries for accelerating data science and analytics pipelines, let data scientists get more work done much faster.
And NVIDIA’s just-announced Merlin recommender application framework promises to make GPU-accelerated recommender systems more accessible still, with an end-to-end pipeline for ingesting data, then training and deploying models.
These systems will be able to take advantage of the new NVIDIA A100 GPU, built on our NVIDIA Ampere architecture, so companies can build recommender systems more quickly and economically than ever.
Our recommendation? If you’re looking to put recommender systems to work, now might be a good time to get started.
BERT is at work in Europe, tackling natural-language processing jobs in multiple industries and languages with help from NVIDIA’s products and partners.
The AI model formally known as Bidirectional Encoder Representations from Transformers debuted just last year as a state-of-the-art approach to machine learning for text. Though new, BERT is already finding use in avionics, finance, semiconductor and telecom companies on the continent, said developers optimizing it for German and Swedish.
“There are so many use cases for BERT because text is one of the most common data types companies have,” said Anders Arpteg, head of research for Peltarion, a Stockholm-based developer that aims to make the latest AI techniques such as BERT inexpensive and easy for companies to adopt.
Natural-language processing will outpace today’s AI work in computer vision because “text has way more apps than images — we started our company on that hypothesis,” said Milos Rusic, chief executive of deepset in Berlin. He called BERT “a revolution, a milestone we bet on.”
Deepset is working with PricewaterhouseCoopers to create a system that uses BERT to help strategists at a chip maker query piles of annual reports and market data for key insights. In another project, a manufacturing company is using NLP to search technical documents to speed maintenance of their products and predict needed repairs.
Peltarion, a member of NVIDIA’s Inception program that nurtures startups with access to its technology and ecosystem, packed support for BERT into its tools in November. It is already using NLP to help a large telecom company automate parts of its process for responding to product and service requests. And it’s using the technology to let a large market research company more easily query its database of surveys.
Work in Localization
Peltarion is collaborating with three other organizations on a three-year, government-backed project to optimize BERT for Swedish. Interestingly, a new model from Facebook called XLM-R suggests training on multiple languages at once could be more effective than optimizing for just one.
“In our initial results, XLM-R, which Facebook trained on 100 languages at once, outperformed a vanilla version of BERT trained for Swedish by a significant amount,” said Arpteg, whose team is preparing a paper on their analysis.
Nevertheless, the group hopes to have before summer a first version of a Swedish BERT model that performs really well, said Arpteg, who headed up an AI research group at Spotify before joining Peltarion three years ago.
BERT also benefits from optimizations for specific tasks such as text classification, question answering and sentiment analysis, said Arpteg. Peltarion researchers plan to publish in 2020 the results of an analysis of the gains from tuning BERT for fields with their own vocabularies, such as medicine and law.
The question-answering task has become so strategic for deepset it created Haystack, a version of its FARM transfer-learning framework to handle the job.
In hardware, the latest NVIDIA GPUs are among the favorite tools both companies use to tame big NLP models. That’s not surprising given NVIDIA recently broke records lowering BERT training time.
“The vanilla BERT has 100 million parameters and XLM-R has 270 million,” said Arpteg, whose team recently purchased systems using NVIDIA Quadro and TITAN GPUs with up to 48GB of memory. It also has access to NVIDIA DGX-1 servers because “for training language models from scratch, we need these super-fast systems,” he said.
More memory is better, said Rusic, whose German BERT models weigh in at 400MB. Deepset taps into NVIDIA V100 Tensor Core GPUs on cloud services and uses another NVIDIA GPU locally.
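Those two numbers are consistent with each other: at four bytes per float32 weight, a 100-million-parameter model occupies roughly 400MB, matching the checkpoint size Rusic quotes:

```python
params = 100_000_000        # "vanilla" BERT parameter count cited above
mb = params * 4 / 1e6       # 4 bytes per float32 weight
print(mb)                   # 400.0 MB, in line with the quoted model size
```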
With this new offering, AI is no longer a research project.
Most companies still keep their data inside their own walls because they see it as their core intellectual property. But for deep learning to transition from research into production, enterprises need the flexibility and ease of development the cloud offers — right beside their data. That’s a big part of what AWS Outposts with T4 GPUs now enables.
The new offering lets enterprises install a fully managed rack-scale appliance next to the large data lakes stored securely in their data centers.
AI Acceleration Across the Enterprise
To train neural networks, every layer of software needs to be optimized, from NVIDIA drivers to container runtimes and application frameworks. AWS services like SageMaker, Elastic MapReduce and many others designed on custom-built Amazon Machine Images require model development to start with training on large datasets. With the introduction of NVIDIA-powered AWS Outposts, those services can now run securely in enterprise data centers.
The GPUs in Outposts accelerate deep learning as well as high performance computing and other GPU applications. They all can access software in NGC, NVIDIA’s hub for GPU-accelerated software optimization, which is stocked with applications, frameworks, libraries and SDKs that include pre-trained models.
For AI inference, the NVIDIA EGX edge-computing platform also runs on AWS Outposts and works with the AWS Elastic Kubernetes Service. Backed by the power of NVIDIA T4 GPUs, these services can process orders of magnitude more information than CPUs alone. They can quickly derive insights from vast amounts of data streamed in real time from sensors in an Internet of Things deployment, whether it’s in manufacturing, healthcare, financial services, retail or any other industry.
On top of EGX, the NVIDIA Metropolis application framework provides building blocks for vision AI, geared for use in smart cities, retail, logistics and industrial inspection, as well as other AI and IoT use cases, now easily delivered on AWS Outposts.
Users of high-end graphics have choices, too. Remote designers, artists and technical professionals who need to access large datasets and models can now get both cloud convenience and GPU performance.
Graphics professionals can benefit from the same NVIDIA Quadro technology that powers most of the world’s professional workstations not only on the public AWS cloud, but on their own internal cloud now with AWS Outposts packing T4 GPUs.
Whether they’re working locally or in the cloud, Quadro users can access the same set of hundreds of graphics-intensive, GPU-accelerated third-party applications.
The Quadro Virtual Workstation AMI, available in AWS Marketplace, includes the same Quadro driver found on physical workstations. It supports hundreds of Quadro-certified applications such as Dassault Systèmes SOLIDWORKS and CATIA; Siemens NX; Autodesk AutoCAD and Maya; ESRI ArcGIS Pro; and ANSYS Fluent, Mechanical and Discovery Live.