Eli Gorovici loves to take friends sailing on the Mediterranean. As the new pilot of Trigo, a Tel Aviv-based startup, he’s inviting the whole retail industry on a cruise to a future with AI.
“We aim to bring the e-commerce experience into the brick-and-mortar supermarket,” said Gorovici, who joined the company as its chief business officer in May.
The journey starts with the sort of shopping anyone who’s waited in a long checkout line has longed for.
You fill up your bags at the market and just walk out. Magically, the store knows what you bought, bills your account and sends you a digital receipt, all while preserving your privacy.
Trigo is building that experience and more. Its magic is an AI engine linked to cameras and a few weighted shelves for small items a shopper’s hand might completely cover.
With these sensors, Trigo builds a 3D model of the store. Neural networks recognize products customers put in their bags.
When shoppers leave, the system sends the grocer the tally and a number it randomly associated with them when they chose to swipe their smartphone as they entered the store. The grocer matches the number with a shopper’s account, charges it and sends off a digital bill.
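That token handoff can be pictured as a simple lookup that keeps the store's side identity-free; a minimal sketch in Python, with all names and the data flow hypothetical:

```python
import secrets

# In-store system: knows only the random token, never the shopper's identity.
sessions = {}

def start_session():
    """Issued when a shopper swipes a phone at the entrance."""
    token = secrets.token_hex(8)
    sessions[token] = []          # basket as observed by the cameras
    return token

def add_item(token, sku, price):
    sessions[token].append((sku, price))

def close_session(token):
    """What gets sent to the grocer: just the token and the tally."""
    tally = sum(price for _, price in sessions.pop(token))
    return {"token": token, "total": tally}

# Grocer's side: maps tokens to accounts, charges them, emails a receipt.
accounts = {}  # token -> account id, recorded at the entrance swipe

token = start_session()
accounts[token] = "account-1234"
add_item(token, "pasta-500g", 1.80)
add_item(token, "olive-oil-1l", 6.20)
bill = close_session(token)
print(accounts[bill["token"]], bill["total"])  # charge the matched account
```

The store-side system never holds the account mapping, which is what preserves privacy in this scheme.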
And that’s just the start.
An Online Experience in the Aisles
Shoppers get the same personalized recommendation systems they’re used to seeing online.
“If I’m standing in front of pasta, I may see on my handset a related coupon or a nice Italian recipe tailored for me,” said Gorovici. “There’s so much you can do with data, it’s mind blowing.”
The system lets stores fine-tune their inventory management systems in real time. Typical shrinkage rates from shoplifting or human error could sink to nearly zero.
AI Turns Images into Insights
Making magic is hard work. Trigo’s system gathers a petabyte of video data a day for an average-size supermarket.
It uses as many as four neural networks to process that data at mind-melting rates of up to a few hundred frames per second. (By contrast, a TV displays high-definition video at 60 fps.)
Trigo used a dataset of up to 500,000 2D product images to train its neural networks. In daily operation, the system uses those models to run millions of inference tasks with help from NVIDIA TensorRT software.
The AI work requires plenty of processing muscle. A supermarket outside London testing the Trigo system uses servers in its back room with 40-50 NVIDIA RTX GPUs. To boost efficiency, Trigo plans to deliver edge servers using NVIDIA T4 Tensor Core GPUs and join the NVIDIA Metropolis ecosystem starting next year.
Trigo got early access to the T4 GPUs thanks to its participation in NVIDIA Inception, a program that gives AI startups traction with tools, expertise and go-to-market support. The program also aims to introduce Trigo to NVIDIA’s retail partners in Europe.
In 2021, Trigo aims to move some of the GPU processing to Google, Microsoft and other cloud services, keeping some latency- or privacy-sensitive uses inside the store. It’s the kind of distributed architecture businesses are just starting to adopt, thanks in part to edge computing systems such as NVIDIA’s EGX platform.
Big Supermarkets Plug into AI
Tesco, the largest grocer in the U.K., has plans to open its first market using Trigo’s system. “We’ve vetted the main players in the industry and Trigo is the best by a mile,” said Tesco CEO Dave Lewis.
Israel’s largest grocer, Shufersal, also is piloting Trigo’s system, as are other retailers around the world.
Trigo was founded in 2018 by brothers Michael and Daniel Gabay, leveraging tech and operational experience from their time in elite units of the Israeli military.
Seeking his next big opportunity in his field of video technology, Gorovici asked friends who were venture capitalists for advice. “They said Trigo was the future of retail,” Gorovici said.
Like sailing in the aqua-blue Mediterranean, AI in retail is a compelling opportunity.
“It’s a trillion-dollar market — grocery stores are among the biggest employers in the world. They are all being digitized, and selling more online now given the pandemic, so maybe this next stage of digital innovation for retail will now move even faster,” he said.
Original plans for the keynote to be delivered live at NVIDIA’s GPU Technology Conference in late March in San Jose were upended by the coronavirus pandemic.
Huang kicked off his keynote on a note of gratitude.
“I want to thank all of the brave men and women who are fighting on the front lines against COVID-19,” Huang said.
NVIDIA, Huang explained, is working with researchers and scientists to use GPUs and AI computing to treat, mitigate, contain and track the pandemic. Among those mentioned:
Oxford Nanopore Technologies has sequenced the virus genome in just seven hours.
Plotly is doing real-time infection rate tracing.
Oak Ridge National Laboratory and the Scripps Research Institute have screened a billion potential drug combinations in a day.
Structura Biotechnology, the University of Texas at Austin and the National Institutes of Health have reconstructed the 3D structure of the virus’s spike protein.
NVIDIA also announced updates to its NVIDIA Clara healthcare platform aimed at taking on COVID-19.
“Researchers and scientists applying NVIDIA accelerated computing to save lives is the perfect example of our company’s purpose — we build computers to solve problems normal computers cannot,” Huang said.
At the core of Huang’s talk was a vision for how data centers, the engine rooms of the modern global information economy, are changing, and how NVIDIA and Mellanox, acquired in a deal that closed last month, are together driving those changes.
“The data center is the new computing unit,” Huang said, adding that NVIDIA is accelerating performance gains from silicon, to the ways CPUs and GPUs connect, to the full software stack, and, ultimately, across entire data centers.
Systems Optimized for Data Center-Scale Computing
That starts with a new GPU architecture that’s optimized for this new kind of data center-scale computing, unifying AI training and inference, and making possible flexible, elastic acceleration.
NVIDIA A100, the first GPU based on the NVIDIA Ampere architecture, delivers the greatest generational performance leap of NVIDIA’s eight generations of GPUs, Huang announced. Built also for data analytics, scientific computing and cloud graphics, the A100 is in full production and shipping to customers worldwide.
Eighteen of the world’s leading service providers and systems builders are incorporating A100 GPUs, among them Alibaba Cloud, Amazon Web Services, Baidu Cloud, Cisco, Dell Technologies, Google Cloud, Hewlett Packard Enterprise, Microsoft Azure and Oracle.
The A100, and the NVIDIA Ampere architecture it’s built on, boost performance by up to 20x over their predecessors, Huang said. He detailed five key features of the A100, including:
More than 54 billion transistors, making it the world’s largest 7-nanometer processor.
Third-generation Tensor Cores with TF32, a new math format that accelerates single-precision AI training out of the box. NVIDIA’s widely used Tensor Cores are now more flexible, faster and easier to use, Huang explained.
Structural sparsity acceleration, a new efficiency technique harnessing the inherently sparse nature of AI math for higher performance.
Multi-instance GPU, or MIG, allowing a single A100 to be partitioned into as many as seven independent GPUs, each with its own resources.
Third-generation NVLink technology, doubling high-speed connectivity between GPUs, allowing A100 servers to act as one giant GPU.
The result of all this: 6x higher performance than NVIDIA’s previous generation Volta architecture for training and 7x higher performance for inference.
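Of the five features, TF32 is the easiest to picture: it keeps float32's 8-bit exponent range but carries only 10 explicit mantissa bits. A rough sketch of the reduced precision (truncation here, where the hardware rounds):

```python
import struct

def tf32_round(x: float) -> float:
    """Approximate TF32 by truncating a float32 mantissa to 10 bits.

    Real Tensor Cores round rather than truncate; this only illustrates
    the precision level, not the exact hardware behavior.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)      # clear the low 13 of 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(tf32_round(1.0))            # exactly representable values pass through
err = abs(tf32_round(3.14159265) - 3.14159265) / 3.14159265
print(err < 2 ** -10)             # relative error within a 10-bit mantissa
```

Because the exponent field is untouched, TF32 covers the same numeric range as float32, which is why single-precision training code can use it without changes.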
NVIDIA DGX A100 Packs 5 Petaflops of Performance
NVIDIA is also shipping a third generation of its NVIDIA DGX AI system based on NVIDIA A100 — the NVIDIA DGX A100 — the world’s first 5-petaflops server. And each DGX A100 can be divided into as many as 56 GPU instances, each running its own application independently.
This allows a single server to either “scale up” to race through computationally intensive tasks such as AI training, or “scale out,” for AI deployment, or inference, Huang said.
Among initial recipients of the system are the U.S. Department of Energy’s Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19; the University of Florida; and the German Research Center for Artificial Intelligence.
A100 will also be available for cloud and partner server makers as HGX A100.
A data center powered by five DGX A100 systems for AI training and inference, drawing just 28 kilowatts of power and costing $1 million, can do the work of a typical data center with 50 DGX-1 systems for AI training and 600 CPU systems, which together consume 630 kilowatts and cost over $11 million, Huang explained.
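The figures quoted in that comparison pencil out to roughly an order of magnitude on both cost and power:

```python
# Huang's keynote comparison, worked through (figures as quoted above).
new_cost, new_power = 1_000_000, 28      # 5 DGX A100 systems: $, kW
old_cost, old_power = 11_000_000, 630    # 50 DGX-1 + 600 CPU servers: $, kW

print(f"cost ratio:  {old_cost / new_cost:.0f}x")    # 11x lower cost
print(f"power ratio: {old_power / new_power:.1f}x")  # 22.5x less power
```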
“The more you buy, the more you save,” Huang said, in his common keynote refrain.
Need more? Huang also announced the next-generation DGX SuperPOD. Powered by 140 DGX A100 systems and Mellanox networking technology, it offers 700 petaflops of AI performance, Huang said, the equivalent of one of the 20 fastest computers in the world.
NVIDIA is expanding its own data center with four DGX SuperPODs, adding 2.8 exaflops of AI computing power — for a total of 4.6 exaflops of total capacity — to its SATURNV internal supercomputer, making it the world’s fastest AI supercomputer.
Huang also announced the NVIDIA EGX A100, bringing powerful real-time cloud-computing capabilities to the edge. Its NVIDIA Ampere architecture GPU offers third-generation Tensor Cores and new security features. Thanks to its NVIDIA Mellanox ConnectX-6 SmartNIC, it also includes secure, lightning-fast networking capabilities.
Software for the Most Important Applications in the World Today
Huang also announced NVIDIA GPUs will power major software applications for accelerating three critical workloads: managing big data, creating recommender systems and building real-time, conversational AI.
These new tools arrive as the effectiveness of machine learning has driven companies to collect more and more data. “That positive feedback is causing us to experience an exponential growth in the amount of data that is collected,” Huang said.
To help organizations of all kinds keep up, Huang announced support for NVIDIA GPU acceleration on Spark 3.0, describing the big data analytics engine as “one of the most important applications in the world today.”
Built on RAPIDS, Spark 3.0 shatters performance benchmarks for extracting, transforming and loading data, Huang said. It’s already helped Adobe Intelligent Services achieve a 90 percent compute cost reduction.
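Enabling the RAPIDS-based acceleration on a Spark 3.0 job typically comes down to a few configuration settings; a hypothetical `spark-submit` sketch (jar paths and the job script are placeholders):

```shell
# Hypothetical invocation enabling the RAPIDS Accelerator for Apache Spark.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --jars rapids-4-spark.jar,cudf.jar \
  my_etl_job.py
```

The plugin rewrites supported SQL and DataFrame operations to run on the GPU; unsupported operations fall back to the CPU, so existing jobs need no code changes.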
Key cloud analytics platforms — including Amazon SageMaker, Azure Machine Learning, Databricks, Google Cloud AI and Google Cloud Dataproc — will all accelerate with NVIDIA, Huang announced.
“We’re now prepared for a future where the amount of data will continue to grow exponentially from tens or hundreds of petabytes to exascale and beyond,” Huang said.
Huang also unveiled NVIDIA Merlin, an end-to-end framework for building next-generation recommender systems, which are fast becoming the engine of a more personalized internet. Merlin slashes the time needed to create a recommender system from a 100-terabyte dataset from four days to 20 minutes, Huang said.
And he detailed NVIDIA Jarvis, a new end-to-end platform for creating real-time, multimodal conversational AI that can draw upon the capabilities unleashed by NVIDIA’s AI platform.
Huang highlighted its capabilities with a demo that showed him interacting with a friendly AI, Misty, that understood and responded to a sophisticated series of questions about the weather in real time.
Huang also dug into NVIDIA’s swift progress in real-time ray tracing since NVIDIA RTX was launched at SIGGRAPH in 2018, and he announced that NVIDIA Omniverse, which allows “different designers with different tools in different places doing different parts of the same design” to work together simultaneously, is now available to early access customers.
Autonomous vehicles are one of the greatest computing challenges of our time, Huang said, an area where NVIDIA continues to push forward with NVIDIA DRIVE.
NVIDIA DRIVE will use the new Orin SoC with an embedded NVIDIA Ampere GPU to achieve the energy efficiency and performance to offer a 5-watt ADAS system for the front windshield as well as scale up to a 2,000 TOPS, level-5 robotaxi system.
Now automakers have a single computing architecture and single software stack to build AI into every one of their vehicles.
“It’s now possible for a carmaker to develop an entire fleet of cars with one architecture, leveraging the software development across their whole fleet,” Huang said.
The NVIDIA DRIVE ecosystem now encompasses cars, trucks, tier one automotive suppliers, next-generation mobility services, startups, mapping services, and simulation.
And Huang announced NVIDIA is adding NVIDIA DRIVE RC for managing entire fleets of autonomous vehicles to its suite of NVIDIA DRIVE technologies.
BMW’s 30 factories around the globe build one vehicle every 56 seconds: that’s 40 different models, each with hundreds of different options, made from 30 million parts flowing in from nearly 2,000 suppliers around the world, Huang explained.
BMW joins a sprawling NVIDIA robotics global ecosystem that spans delivery services, retail, autonomous mobile robots, agriculture, services, logistics, manufacturing and healthcare.
In the future, factories will, effectively, be enormous robots. “All of the moving parts inside will be driven by artificial intelligence,” Huang said. “Every single mass-produced product in the future will be customized.”
When startup Kensho was acquired by S&P Global for $550 million in March 2018, Georg Kucsko felt like a kid in a candy store.
The head of AI research at Kensho and his team had one of Willy Wonka’s golden tickets dropped into their laps: S&P’s 100,000 hours of recorded and painstakingly transcribed audio files.
The dataset helped Kensho build Scribe, considered the most accurate voice recognition software in the finance industry. It transcribes earnings calls and other business meetings fast and at low cost, helping extend S&P’s coverage by 1,500 companies and earning kudos from the company’s CEO in his own quarterly calls.
“We used these transcripts to train speech-recognition models that could do the work faster — that was a new angle no one had thought of. It allowed us to drastically improve the process,” said Kucsko.
Deloitte uses conversational AI in its dTrax software that helps companies manage complex contracts. For instance, dTrax can find and update key passages in lengthy agreements when regulations change or when companies are planning a big acquisition. The software, which runs on NVIDIA GPUs, won a smart-business award from the Financial Times in 2019.
China’s largest insurer, Ping An, already uses conversational AI to sell insurance. It’s a performance-hungry application that runs on GPUs because it requires a lot of intelligence to gauge a speaker’s mood and emotion.
In healthcare, Nuance provides conversational AI software, trained with NVIDIA GPUs and software, that most radiologists use to make transcriptions and many other doctors use to document patient exams.
Voca.ai deploys AI models on NVIDIA GPUs because they cut latency on inference jobs in half compared with CPUs. That’s key for its service that automates responses to customer support calls from as many as 10 million people a month for one of its largest users.
Framing the Automated Conversation
The technology is built on a broad software foundation of conversational AI libraries, all accelerated by GPUs. The most popular ones rack up “stars” on GitHub, the equivalent of “likes” on Facebook or bookmarks in a browser.
To make it easier to get started in conversational AI, NVIDIA provides a growing set of software tools, too.
Kensho and Voca.ai already use NVIDIA NeMo to build state-of-the-art conversational AI algorithms. These machine- and deep-learning models can be fine-tuned on any company’s data to deliver the best accuracy for its particular use case.
When NVIDIA announced NeMo last fall, it also released Jasper, a 54-layer model for automatic speech recognition that can lower word error rates to less than 3 percent. It’s one of several models optimized for accuracy, available from NGC, NVIDIA’s catalog for GPU-accelerated software.
Say Hello to Jarvis, the Valet of Conversational AI
Today we’re rolling out NVIDIA Jarvis, an application framework for building and deploying AI services that fuse vision, speech and language understanding. The services can be deployed in the cloud, in the data center or at the edge.
Jarvis includes deep-learning models for building GPU-accelerated conversational AI applications capable of understanding terminology unique to each company and its customers. It includes NeMo to train these models on specific domains and customer data. The models can take advantage of TensorRT to minimize latency and maximize throughput in AI inference tasks.
Jarvis services can run in 150 milliseconds on an A100 GPU. That’s far below the 300-millisecond threshold for real-time applications, and a fraction of the 25 seconds it would take to run the same models on a CPU.
Jarvis Is Ready to Serve Today
Kensho is already testing some of the tools in Jarvis.
“We are using NeMo a lot, and we like it quite a lot,” said Kucsko. “Insights from NVIDIA, even on using different datasets for training at scale, were crucial for us.”
For Kensho, using such tools is a natural next step in tuning the AI models inside Scribe. When Kensho was developing the original software, NVIDIA helped train those models on one of its DGX SuperPOD systems.
“We had the data and they had the GPUs, and that led to an amazing partnership with our two labs collaborating,” Kucsko said.
“NVIDIA GPUs are indispensable for deep learning work like that. For anything large scale in deep learning, there’s pretty much not another option,” he added.
AI is alive at the edge of the network, where it’s already transforming everything from car makers to supermarkets. And we’re just getting started.
NVIDIA’s AI Edge Innovation Center, a first for this year’s Mobile World Congress (MWC) in Barcelona, will put attendees at the intersection of AI, 5G and edge computing. There, they can hear about best practices for AI at the edge and get an update on how NVIDIA GPUs are paving the way to better, smarter 5G services.
It’s a story that’s moving fast.
AI was born in the cloud to process the vast amounts of data needed for jobs like recommending new products and optimizing news feeds. But most enterprises interact with their customers and products in the physical world at the edge of the network — in stores, warehouses and smart cities.
The need to sense, infer and act in real time as conditions change is driving the next wave of AI adoption at the edge. That’s why a growing list of forward-thinking companies are building their own AI capabilities using the NVIDIA EGX edge-computing platform.
All these software stacks ride on our CUDA-X libraries, tools, and technologies that run on an installed base of more than 500 million NVIDIA GPUs.
Carriers Make the Call
At MWC Los Angeles this year, NVIDIA founder and CEO Jensen Huang announced Aerial, software that rides on the EGX platform to let telecommunications companies harness the power of GPU acceleration.
With Aerial, carriers can both increase the spectral efficiency of their virtualized 5G radio-access networks and offer new AI services for smart cities, smart factories, cloud gaming and more — all on the same computing platform.
In Barcelona, NVIDIA and partners including Ericsson will give an update on how Aerial will reshape the mobile edge network.
Verizon is already using NVIDIA GPUs at the edge to deliver real-time ray tracing for AR/VR applications over 5G networks.
It’s one of several ways telecom applications can be taken to the next level with GPU acceleration. Imagine having the ability to process complex AI jobs on the nearest base station with the speed and ease of making a cellular call.
Your Dance Card for Barcelona
For a few days in February, we will turn our innovation center — located at Fira de Barcelona, Hall 4 — into a virtual university on AI with 5G at the edge. Attendees will get a world-class deep dive on this strategic technology mashup and how companies are leveraging it to monetize 5G.
Sessions start Monday morning, Feb. 24, and include AI customer case studies in retail, manufacturing and smart cities. Afternoon talks will explore consumer applications such as cloud gaming, 5G-enabled cloud AR/VR and AI in live sports.
We’ve partnered with the organizers of MWC on applied AI sessions on Tuesday, Feb. 25. These presentations will cover topics like federated learning, an emerging technique for collaborating on the development and training of AI models while protecting data privacy.
Wednesday’s schedule features three roundtables where attendees can meet executives working at the intersection of AI, 5G and edge computing. The week also includes two instructor-led sessions from the NVIDIA Deep Learning Institute, which trains developers on best practices.
See Demos, Take a Meeting
For a hands-on experience, check out our lineup of demos based on the NVIDIA EGX platform. These will highlight applications such as object detection in a retail setting, ways to unsnarl traffic congestion in a smart city and our cloud-gaming service GeForce Now.
To learn more about the capabilities of AI, 5G and edge computing, check out the full agenda and book an appointment here.
Putting AI to work on a massive scale, Alibaba recently harnessed NVIDIA GPUs to serve its customers on 11/11, the year’s largest shopping event.
During Singles Day, as the Nov. 11 shopping event is also known, Alibaba generated $38 billion in sales. That’s up nearly a quarter from last year’s $31 billion, and more than double online sales on Black Friday and Cyber Monday combined.
Singles Day — which has grown from $7 million a decade ago — illustrates the massive scale AI has reached in global online retail, where no player is bigger than Alibaba.
Each day, over 100 million shoppers comb through billions of available products on its site. Activity skyrockets on peak shopping days, when Alibaba’s systems field hundreds of thousands of queries a second.
And AI keeps things humming along, according to Lingjie Xu, Alibaba’s director of heterogeneous computing.
“To ensure these customers have a great user experience, we deploy state-of-the-art AI technology at massive scale using the NVIDIA accelerated computing platform, including T4 GPUs, cuBLAS, customized mixed precision and inference acceleration software,” he said.
“The platform’s intuitive search capabilities and reliable recommendations allow us to support a model six times more complex than in the past, which has driven a 10 percent improvement in click-through rate. Our largest model shows 100 times higher throughput with T4 compared to CPU,” he said.
One key application for Alibaba and other modern online retailers: recommender systems that display items that match user preferences, improving the click-through rate — which is closely watched in the e-commerce industry as a key sales driver.
Every small improvement in click-through rate directly impacts the user experience and revenues. A 10 percent improvement from advanced recommender models that can run in real time, and at incredible scale, is only possible with GPUs.
Alibaba’s teams employ NVIDIA GPUs to support a trio of optimization strategies around resource allocation, model quantization and graph transformation to increase throughput and responsiveness.
This has enabled NVIDIA T4 GPUs to accelerate Alibaba’s wide and deep recommendation model and deliver 780 queries per second. That’s a huge leap from CPU-based inference, which could only deliver three queries per second.
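A “wide and deep” recommender combines a linear model over sparse cross-features (memorization) with a small neural network over dense embeddings (generalization). A toy forward pass in plain Python, with all weights illustrative and no relation to Alibaba’s actual model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def wide_and_deep(cross_features, embedding, wide_w, deep_w1, deep_w2):
    # Wide part: linear model over binary cross-features (memorization).
    wide = sum(w for f, w in zip(cross_features, wide_w) if f)

    # Deep part: tiny two-layer ReLU MLP over a dense embedding
    # (generalization to feature combinations never seen together).
    hidden = [max(0.0, sum(e * w for e, w in zip(embedding, row)))
              for row in deep_w1]
    deep = sum(h * w for h, w in zip(hidden, deep_w2))

    return sigmoid(wide + deep)   # predicted click-through probability

score = wide_and_deep(
    cross_features=[1, 0, 1],                 # e.g. user_city x item_brand
    embedding=[0.2, -0.1],
    wide_w=[0.5, -0.3, 0.1],
    deep_w1=[[0.4, 0.9], [-0.7, 0.2]],
    deep_w2=[0.3, 0.6],
)
print(round(score, 3))
```

At production scale this forward pass runs over thousands of features per candidate item, for hundreds of thousands of queries a second, which is where GPU inference earns its keep.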
Alibaba has also deployed NVIDIA GPUs to accelerate its systems for automatic ad banner generation, ad recommendation, image processing to help identify fake products, language translation and speech recognition, among others. As the world’s third-largest cloud service provider, Alibaba Cloud provides a wide range of heterogeneous computing products capable of intelligent scheduling, automatic maintenance and real-time capacity expansion.
Alibaba’s far-sighted deployment of NVIDIA’s AI platform is a straw in the wind, indicating what more is to come in a burgeoning range of industries.
Just as its tools filter billions of products for millions of consumers, AI recommenders running on NVIDIA GPUs will find a place among countless other digital services — app stores, news feeds, restaurant guides and music services among them — keeping customers happy.
Financial fraud is on the rise. As the number of global transactions increases and digital technology advances, the complexity and frequency of fraudulent schemes are keeping pace.
Security company McAfee estimated in a 2018 report that cybercrime annually costs the global economy some $600 billion, or 0.8 percent of global gross domestic product.
One of the most prevalent — and preventable — types of cybercrime is credit card fraud, which is exacerbated by the growth in online transactions.
That’s why American Express, a global financial services company, is developing deep learning generative and sequential models to prevent fraudulent transactions.
“The most strategically important use case for us is transactional fraud detection,” said Dmitry Efimov, vice president of machine learning research at American Express. “Developing techniques that more accurately identify and decline fraudulent purchase attempts helps us protect our customers and our merchants.”
Cashing into Big Data
The company’s effort spanned several teams that conducted research on using generative adversarial networks, or GANs, to create synthetic data based on sparsely populated segments.
In most financial fraud use cases, machine learning systems are built on historical transactional data. The systems use deep learning models to scan incoming payments in real time, identify patterns associated with fraudulent transactions and then flag anomalies.
In some instances, like new product launches, GANs can produce additional data to help train and develop more accurate deep learning models.
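The real-time flagging step in such a pipeline reduces, in miniature, to scoring each incoming payment against what a card’s history says is normal. A deliberately simplified Python sketch (production systems use learned models over hundreds of features, not a z-score on amounts):

```python
from statistics import mean, stdev

def flag_anomalies(history, incoming, threshold=3.0):
    """Flag payments whose amount deviates sharply from a card's history."""
    mu, sigma = mean(history), stdev(history)
    flagged = []
    for amount in incoming:
        z = abs(amount - mu) / sigma if sigma else 0.0
        if z > threshold:
            flagged.append(amount)
    return flagged

history = [42.0, 39.5, 45.0, 41.0, 38.0, 44.0]   # typical card activity
print(flag_anomalies(history, [40.0, 43.0, 2999.0]))
```

Synthetic data from a GAN helps precisely where this sketch fails: when a segment has too little history for any statistic, learned or otherwise, to be reliable.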
Given its global integrated network with tens of millions of customers and merchants, American Express deals with massive volumes of structured and unstructured data sets.
Using several hundred data features, including the time stamps for transactional data, the American Express teams found that sequential deep learning techniques, such as long short-term memory and temporal convolutional networks, can be adapted for transaction data to produce superior results compared to classical machine learning approaches.
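Sequence models like LSTMs and temporal convolutional networks consume fixed-length windows of a card’s transaction history ordered by time stamp. A minimal windowing sketch in Python (the two-feature layout is hypothetical):

```python
def to_windows(transactions, window=3):
    """Turn a time-ordered transaction stream into overlapping
    fixed-length windows, one training example per window."""
    txns = sorted(transactions, key=lambda t: t["ts"])
    feats = [[t["amount"], t["merchant_id"]] for t in txns]
    return [feats[i:i + window] for i in range(len(feats) - window + 1)]

stream = [
    {"ts": 3, "amount": 12.0, "merchant_id": 7},
    {"ts": 1, "amount": 5.0,  "merchant_id": 2},
    {"ts": 2, "amount": 8.5,  "merchant_id": 2},
    {"ts": 4, "amount": 99.0, "merchant_id": 9},
]
windows = to_windows(stream, window=3)
print(len(windows), windows[0])
```

Ordering by time stamp is what lets a sequential model pick up temporal patterns, such as a burst of small test charges before a large fraudulent one, that classical per-transaction models miss.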
The results have paid dividends.
“These techniques have a substantial impact on the customer experience, allowing American Express to improve speed of detection and prevent losses by automating the decision-making process,” Efimov said.
Closing the Deal with NVIDIA GPUs
Because of the huge amount of customer and merchant data American Express works with, the company selected NVIDIA DGX-1 systems, which contain eight NVIDIA V100 Tensor Core GPUs, to build models with both TensorFlow and PyTorch software.
Its NVIDIA GPU-powered machine learning techniques are also used to forecast customer default rates and to assign credit limits.
“For our production environment, speed is extremely important with decisions made in a matter of milliseconds, so the best solution to use are NVIDIA GPUs,” said Efimov.
As the systems go into production in the next year, the teams plan on using the NVIDIA TensorRT platform for high-performance deep learning inference to deploy the models in real time, which will help improve American Express’ fraud and credit loss rates.
Efimov will be presenting his team’s work at the GPU Technology Conference in San Jose in March. To learn more about credit risk management use cases from American Express, register for GTC, the premier AI conference for insights, training and direct access to experts on the key topics in computing across industries.
With this new offering, AI is no longer a research project.
Most companies still keep their data inside their own walls because they see it as their core intellectual property. But for deep learning to transition from research into production, enterprises need the flexibility and ease of development the cloud offers — right beside their data. That’s a big part of what AWS Outposts with T4 GPUs now enables.
Enterprises can now install a fully managed rack-scale appliance next to the large data lakes stored securely in their own data centers.
AI Acceleration Across the Enterprise
To train neural networks, every layer of software needs to be optimized, from NVIDIA drivers to container runtimes and application frameworks. AWS services like SageMaker, Elastic MapReduce and many others designed on custom-built Amazon Machine Images support model development that starts with training on large datasets. With the introduction of NVIDIA-powered AWS Outposts, those services can now be run securely in enterprise data centers.
The GPUs in Outposts accelerate deep learning as well as high performance computing and other GPU applications. They all can access software in NGC, NVIDIA’s hub for GPU-optimized software, which is stocked with applications, frameworks, libraries and SDKs, including pre-trained models.
For AI inference, the NVIDIA EGX edge-computing platform also runs on AWS Outposts and works with the AWS Elastic Kubernetes Service. Backed by the power of NVIDIA T4 GPUs, these services are capable of processing orders of magnitude more information than CPUs alone. They can quickly derive insights from vast amounts of data streamed in real time from sensors in an Internet of Things deployment, whether it’s in manufacturing, healthcare, financial services, retail or any other industry.
On top of EGX, the NVIDIA Metropolis application framework provides building blocks for vision AI, geared for use in smart cities, retail, logistics and industrial inspection, as well as other AI and IoT use cases, now easily delivered on AWS Outposts.
Users of high-end graphics have choices, too. Remote designers, artists and technical professionals who need to access large datasets and models can now get both cloud convenience and GPU performance.
Graphics professionals can benefit from the same NVIDIA Quadro technology that powers most of the world’s professional workstations, not only in the public AWS cloud but now also on their own internal cloud, with AWS Outposts packing T4 GPUs.
Whether they’re working locally or in the cloud, Quadro users can access the same set of hundreds of graphics-intensive, GPU-accelerated third-party applications.
The Quadro Virtual Workstation AMI, available in AWS Marketplace, includes the same Quadro driver found on physical workstations. It supports hundreds of Quadro-certified applications such as Dassault Systèmes SOLIDWORKS and CATIA; Siemens NX; Autodesk AutoCAD and Maya; ESRI ArcGIS Pro; and ANSYS Fluent, Mechanical and Discovery Live.
Those who are keeping score in AI know that NVIDIA GPUs set the performance standards for training neural networks in data centers in December and again in July. Industry benchmarks released today show we’re setting the pace for running those AI networks in and outside data centers, too.
NVIDIA Turing GPUs and our Xavier system-on-a-chip posted leadership results in MLPerf Inference 0.5, the first independent benchmarks for AI inference. Before today, the industry was hungry for objective metrics on inference because it’s expected to be the largest and most competitive slice of the AI market.
Among a dozen participating companies, only the NVIDIA AI platform had results across all five inference tests created by MLPerf, an industry benchmarking group formed in May 2018. That's a testament to the maturity of our CUDA-X AI and TensorRT software, which eases the job of harnessing all our GPUs, spanning uses from the data center to the edge.
MLPerf defined five inference benchmarks that cover three established AI applications: image classification, object detection and translation. Each benchmark is measured in four scenarios. Server and offline scenarios are most relevant for data center use cases, while single- and multi-stream scenarios speak to the needs of edge devices and SoCs.
NVIDIA topped all five benchmarks for both data center scenarios (offline and server), with Turing GPUs providing the highest performance per processor among commercially available products.
The offline scenario represents data center tasks such as tagging photos, where all the data is available locally. The server scenario reflects jobs such as online translation services, where data and requests are arriving randomly in bursts and lulls.
For its part, Xavier ranked as the highest performer under both edge-focused scenarios (single- and multi-stream) among commercially available edge and mobile SoCs.
An industrial inspection camera identifying defects in a fast-moving production line is a good example of a single-stream task. The multi-stream scenario tests how many feeds a chip can handle — a key capability for self-driving cars that might use a half-dozen cameras or more.
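The four scenarios above differ mainly in how queries arrive at the chip. A minimal sketch of those arrival patterns might look like the following — this is not MLPerf's actual LoadGen harness, and the latency and arrival-rate numbers are made up for illustration:

```python
"""Illustrative sketch of the four MLPerf inference scenarios.
Not the real MLPerf LoadGen harness; all numbers are hypothetical."""
import random

MODEL_LATENCY = 0.005  # hypothetical per-query model latency, seconds

def offline(num_samples=1000):
    """Offline: all data is local (e.g. photo tagging); metric is throughput."""
    total_time = num_samples * MODEL_LATENCY
    return num_samples / total_time  # samples per second

def server(num_queries=1000, arrival_rate=150.0, seed=0):
    """Server: queries arrive randomly in bursts and lulls; the metric is
    throughput while keeping worst-case latency under a bound."""
    rng = random.Random(seed)
    now, free_at, worst = 0.0, 0.0, 0.0
    for _ in range(num_queries):
        now += rng.expovariate(arrival_rate)  # Poisson arrivals
        start = max(now, free_at)             # queue if the chip is busy
        free_at = start + MODEL_LATENCY
        worst = max(worst, free_at - now)     # end-to-end latency
    return worst

def single_stream():
    """Single-stream: one query at a time, back to back (e.g. an
    inspection camera); the metric is per-query latency."""
    return MODEL_LATENCY

def multi_stream(frame_interval=0.050):
    """Multi-stream: how many concurrent feeds fit in one frame interval
    (e.g. a self-driving car's cameras)."""
    return int(frame_interval / MODEL_LATENCY + 1e-9)
```

At a hypothetical 5 ms per query, a 50 ms frame interval fits ten concurrent feeds, which is why multi-stream is the scenario that matters for multi-camera vehicles.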
The results reveal the power of our CUDA and TensorRT software. They provide a common platform that enables us to show leadership results across multiple products and use cases, a capability unique to NVIDIA.
We competed in data center scenarios with two GPUs. Our TITAN RTX demonstrated the full potential of our Turing-class GPUs, especially in demanding tasks such as running a GNMT model used for language translation.
The versatile and widely used NVIDIA T4 Tensor Core GPU showed strong results across several scenarios. These 70-watt GPUs are designed to easily fit into any server with PCIe slots, enabling users to expand their computing power as needed for inference jobs known to scale well.
MLPerf has broad backing from industry and academia. Its members include Arm, Facebook, Futurewei, General Motors, Google, Harvard University, Intel, MediaTek, Microsoft, NVIDIA and Xilinx. To its credit, the new benchmarks attracted significantly more participants than two prior training competitions.
NVIDIA demonstrated its support for the work by submitting results in 19 of 20 scenarios, using three products in a total of four configurations. Our partner Dell EMC and our customer Alibaba also submitted results using NVIDIA GPUs. Together, we gave users a broader picture of the potential of our product portfolio than any other participant.
Fresh Perspectives, New Products
Inference is the process of running AI models in real-time production systems to filter actionable insights from a haystack of data. It’s an emerging technology that’s still evolving, and NVIDIA isn’t standing still.
Today we announced a low-power version of the Xavier SoC used in the MLPerf tests. At full throttle, Jetson Xavier NX delivers up to 21 TOPS while consuming just 15 watts. It aims to drive a new generation of performance-hungry, power-pinching robots, drones and other autonomous devices.
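Those two figures imply a power efficiency that's easy to sanity-check with back-of-envelope arithmetic:

```python
# Back-of-envelope efficiency check on the figures above:
# up to 21 TOPS while consuming just 15 watts at full throttle.
def tops_per_watt(tops, watts):
    return tops / watts

print(tops_per_watt(21, 15))  # 1.4 TOPS per watt
```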
In addition to the new hardware, NVIDIA released new TensorRT 6 optimizations used in the MLPerf benchmarks as open source on GitHub. You can learn more about the optimizations in this MLPerf developer blog. We continuously evolve this software so our users can reap benefits from increasing AI automation and performance.
Making Inference Easier for Many
One big takeaway from today’s MLPerf tests is that inference is hard. For instance, inference in actual workloads is even more demanding than in the benchmarks because it requires significant pre- and post-processing steps.
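As a toy illustration of those extra steps, a production pipeline wraps the network itself in preprocessing and postprocessing stages. Everything below is a made-up stand-in for a real model, not NVIDIA's code:

```python
"""Toy sketch of a production inference pipeline: the model is only
the middle step. All functions here are illustrative stand-ins."""
import math

def preprocess(pixels):
    # e.g. scale raw 0-255 pixel values into the 0-1 range the model expects
    return [p / 255.0 for p in pixels]

def model(inputs):
    # stand-in for the real network: one raw score (logit) per class
    return [sum(inputs) * w for w in (0.5, 1.5, 1.0)]

def postprocess(logits):
    # softmax the raw scores, then pick the most likely class
    exps = [math.exp(z - max(logits)) for z in logits]
    probs = [e / sum(exps) for e in exps]
    return probs.index(max(probs))

def infer(pixels):
    # the full pipeline a benchmark's raw model number leaves out
    return postprocess(model(preprocess(pixels)))
```

In a deployed system each of those stages costs real latency, which is one reason benchmark figures understate the demands of production inference.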
In his keynote address at GTC last year, NVIDIA founder and CEO Jensen Huang compressed the complexities into one word: PLASTER. Modern AI inference requires excellence in Programmability, Latency, Accuracy, Size-of-model, Throughput, Energy efficiency and Rate of Learning, he said.
That’s why users are increasingly embracing high-performance NVIDIA GPUs and software to handle demanding inference jobs. They include a who’s who of forward-thinking companies such as BMW, Capital One, Cisco, Expedia, John Deere, Microsoft, PayPal, Pinterest, P&G, Postmates, Shazam, Snap, Shopify, Twitter, Verizon and Walmart.
This week, the world’s largest delivery service — the U.S. Postal Service — joined the ranks of organizations using NVIDIA GPUs for both AI training and inference.
Hard-drive maker Seagate Technology expects to realize up to a 10 percent improvement in manufacturing throughput thanks to its use of AI inference running on NVIDIA GPUs. It anticipates up to a 300 percent return on investment from improved efficiency and better quality.
Pinterest relies on NVIDIA GPUs for training and evaluating its recognition models and for performing real-time inference across its 175 billion pins.
Snap uses NVIDIA T4 accelerators for inference on the Google Cloud Platform, increasing advertising effectiveness while lowering costs compared to CPU-only systems.
A Twitter spokesman nailed the trend: “Using GPUs made it possible to enable media understanding on our platform, not just by drastically reducing training time, but also by allowing us to derive real-time understanding of live videos at inference time.”
The AI Conversation About Inference
Looking ahead, conversational AI represents a giant set of opportunities and technical challenges, and NVIDIA is a clear leader here, too.
NVIDIA already offers optimized reference designs for conversational AI services such as automatic speech recognition, text-to-speech and natural-language understanding. Our open-source optimizations for AI models such as BERT, GNMT and Jasper give developers a leg up in reaching world-class inference performance.
Top companies pioneering conversational AI are already among our customers and partners. They include Kensho, Microsoft, Nuance, Optum and many others.
There’s plenty to talk about. The MLPerf group is already working on enhancements to its current 0.5 inference tests. We’ll work hard to maintain the leadership we continue to show on its benchmarks.
MLPerf v0.5 Inference results for data center server form factors and offline and server scenarios retrieved from www.mlperf.org on Nov. 6, 2019, from entries Inf-0.5-15, Inf-0.5-16, Inf-0.5-19, Inf-0.5-21, Inf-0.5-22, Inf-0.5-23, Inf-0.5-25, Inf-0.5-26, Inf-0.5-27. Per-processor performance is calculated by dividing the primary metric of total performance by the number of accelerators reported.
MLPerf v0.5 Inference results for edge form factors and single-stream and multi-stream scenarios retrieved from www.mlperf.org on Nov. 6, 2019, from entries Inf-0.5-24, Inf-0.5-28, Inf-0.5-29.
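The per-processor figure described in the notes above is a simple division; with made-up numbers (not taken from any actual entry), the calculation looks like:

```python
# How a per-processor figure is derived from a submitted result
# (illustrative numbers only, not an actual MLPerf entry)
def per_processor(total_metric, num_accelerators):
    # divide the primary metric of total performance by accelerator count
    return total_metric / num_accelerators

# e.g. a hypothetical entry reporting 44,000 inferences/sec across 8 GPUs
print(per_processor(44000, 8))  # 5500.0 inferences/sec per GPU
```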