Applying for a home mortgage can resemble a part-time job. But whether consumers are seeking out a home loan, car loan or credit card, there’s an incredible amount of work going on behind the scenes in a bank’s decision — especially if it has to say no.
To comply with an alphabet soup of financial regulations, banks and mortgage lenders have to keep pace with explaining the reasons for rejections to both applicants and regulators.
Busy in this domain, Wells Fargo will present at NVIDIA GTC21 next week some of its latest development work behind this complex decision-making using AI models accelerated by GPUs.
To inform their decisions, lenders have historically applied linear and non-linear regression models for financial forecasting and logistic and survivability models for default risk. These simple, decades-old methods are easy to explain to customers.
But machine learning and deep learning models are reinventing risk forecasting and in the process requiring explainable AI, or XAI, to allow for customer and regulatory disclosures.
Machine learning and deep learning techniques are more accurate but also more complex, which means banks need to spend extra effort to be able to explain decisions to customers and regulators.
These more powerful models allow banks to do a better job of understanding the riskiness of loans, and may allow them to say yes to applicants who would have been rejected by a simpler model.
At the same time, these powerful models require more processing, so financial services firms like Wells Fargo are moving to GPU-accelerated models to improve processing, accuracy and explainability, and to provide faster results to consumers and regulators.
What Is Explainable AI?
Explainable AI is a set of tools and techniques that help people understand the math inside an AI model.
XAI maps the data inputs of a model to its data outputs in a way that people can understand.
“You have all the linear sub-models, and you can see which factor is the most significant — you can see it very clearly,” said Agus Sudjianto, executive vice president and head of Corporate Model Risk at Wells Fargo, explaining his team’s recent work on Linear Iterative Feature Embedding (LIFE) in a research paper.
Wells Fargo XAI Development
The LIFE algorithm was developed to deliver high prediction accuracy, ease of interpretation and efficient computation.
LIFE outperforms directly trained single-layer networks, according to Wells Fargo, as well as many other benchmark models in experiments.
Using LIFE, the bank can generate codes tied to the model's interpretation, explaining which variables weighed heaviest in the decision. For example, codes might be generated for a high debt-to-income ratio or a FICO score that fell below a set minimum for a particular loan product.
There can be anywhere from 40 to 80 different variables taken into consideration for explaining rejections.
“We assess whether the customer is able to repay the loan. And then if we decline the loan, we can give a reason from a recent code as to why it was declined,” said Sudjianto.
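As a rough illustration of how reason codes can fall out of a linear model, the sketch below scores an applicant with a tiny logistic model and reports the variables that pushed the score furthest toward a decline. It is not the LIFE algorithm and not Wells Fargo's method; the feature names, weights, baseline applicant and code labels are all hypothetical.

```python
import math

# Hypothetical weights of a tiny logistic default-risk model (illustrative only).
WEIGHTS = {"debt_to_income": 4.0, "fico_score": -0.01, "recent_delinquencies": 0.8}
BIAS = 3.0

# Hypothetical reference applicant; contributions are measured against it.
BASELINE = {"debt_to_income": 0.30, "fico_score": 720.0, "recent_delinquencies": 0.0}

# Hypothetical reason-code labels, one per variable.
REASON_CODES = {
    "debt_to_income": "high debt-to-income ratio",
    "fico_score": "low credit score",
    "recent_delinquencies": "recent delinquencies",
}


def default_probability(applicant):
    """Logistic model: sigmoid of the weighted sum of the inputs."""
    z = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(applicant, top_n=2):
    """Return labels for the variables that raised the risk score the most."""
    contributions = {
        name: WEIGHTS[name] * (applicant[name] - BASELINE[name]) for name in WEIGHTS
    }
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return [REASON_CODES[name] for name in ranked[:top_n] if contributions[name] > 0]


applicant = {"debt_to_income": 0.55, "fico_score": 610.0, "recent_delinquencies": 1.0}
print(round(default_probability(applicant), 3), reason_codes(applicant))
```

Because each variable's contribution is just its weight times its distance from the baseline, the ranking is easy to read off. That is the property that makes linear models simple to explain, and it is what more complex models need XAI techniques to recover.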
Learn more about the LIFE model work by attending the GTC talk by Jie Chen, managing director for Corporate Model Risk at Wells Fargo. Learn about model work on Deep ReLU Networks by attending the talk by Aijun Zhang, a quantitative analytics specialist at Wells Fargo, and Zebin Yang, a Ph.D. student at Hong Kong University.
The Arm ecosystem got a booster shot of advances from NVIDIA at GTC today.
NVIDIA discussed work with Arm-based silicon, software and service providers, showing the potential of energy-efficient, accelerated platforms and applications across client, cloud, HPC and edge computing.
NVIDIA also announced three new processors built around Arm IP, including “Grace,” its first data center CPU, which takes AI, cloud and high performance computing to new heights.
Separately, the new BlueField-3 data processing unit (DPU) sports more Arm cores, opening doors to new, more powerful applications in data center networking.
And NVIDIA DRIVE Atlan becomes the company’s first processor for autonomous vehicles packing an Arm-enabled DPU, showing the potential for high performance networks in automakers’ 2025 models.
A Vision of What’s Possible
In his GTC keynote, NVIDIA CEO Jensen Huang shared his vision for AI, HPC, data science, graphics and more. He also reaffirmed his pledge to expand the Arm ecosystem as part of the Arm acquisition deal NVIDIA announced in September 2020.
On the road to making that vision a reality, NVIDIA described a set of efforts to accelerate CPUs from four key Arm partners with NVIDIA GPUs, DPUs and software, enhancing apps from Arm developers.
GPUs Boost AWS Graviton2 Instances
In the cloud, NVIDIA announced it will provide GPU acceleration for Amazon Web Services Graviton2, the cloud-service provider’s own Arm-based processor. The accelerated Graviton2 instances will provide rich game-streaming experiences and lower the cost of powerful AI inference capabilities.
For example, game developers will use the AWS instances to stream Android games and other services that combine the efficiency of Graviton2 with NVIDIA RTX graphics technologies like ray tracing and DLSS.
In high performance computing, the new NVIDIA Arm HPC Developer Kit provides a high-performance, energy-efficient platform for supercomputers that combine Ampere Computing’s Altra — a CPU packing 80 Arm cores running up to 3.3 GHz — with the latest NVIDIA GPUs and DPUs.
The devkit runs a suite of NVIDIA compilers, libraries and tools for AI and HPC so developers can accelerate Arm-based systems for science and technical computing. Leading researchers including Oak Ridge and Los Alamos National Labs in the U.S. as well as national labs in South Korea and Taiwan will be among its first users.
Pumping Up Client, Edge Platforms
In PCs, NVIDIA is working with MediaTek, the world’s largest supplier of smartphone chips, to create a new class of notebooks powered by an Arm-based CPU alongside an NVIDIA RTX GPU.
The notebooks will use Arm cores and NVIDIA graphics to give consumers energy-efficient portables with no-compromise media capabilities based on a reference platform that supports Chromium, Linux and NVIDIA SDKs.
And in edge computing, NVIDIA is working with Marvell Semiconductor to team its OCTEON Arm-based processors with NVIDIA’s GPUs. Together they will speed up AI workloads for network optimization and security.
Top AI Systems Join Arm’s Family
Two powerful AI supercomputers will come online next year.
The Swiss National Supercomputing Centre is building a system with 20 exaflops of AI performance. And in the U.S., the Los Alamos National Laboratory will switch on a new AI supercomputer for its researchers.
Both will be powered by NVIDIA’s first data center CPU, “Grace,” an Arm-based processor that will deliver 10x the performance of today’s fastest servers on the most complex AI and HPC workloads.
Named after pioneering computer scientist Grace Hopper, this CPU has the plumbing needed for the data-driven AI era. It sports coherent connections running at 900 GB/s to NVIDIA GPUs, thanks to a fourth generation NVLink — that’s 14x the bandwidth of today’s servers.
More Arm Cores for Networking
NVIDIA Mellanox networking is more than doubling down on its investment in Arm. The BlueField-3 DPU announced today packs 400-Gbps links and 5x the Arm compute power of the current DPU, the BlueField-2 available today.
Simple math shows why bulking up on Arm makes sense: One BlueField-3 DPU delivers the equivalent data center services that could consume up to 300 x86 CPU cores.
The advance gives Arm developers an expanding set of opportunities to build fast, efficient and smart data center networks.
Today DPUs offload communications, storage, security and systems-management tasks. That’s enabling whole new classes of systems such as the cloud-native supercomputer NVIDIA announced today.
NVIDIA and Arm Behind the Wheel
Arm cores will debut in next-generation AI-enabled autonomous vehicles powered by NVIDIA DRIVE Atlan, the next leap on NVIDIA’s roadmap.
DRIVE Atlan will pack quite a punch, kicking out more than 1,000 trillion operations per second. Atlan marks the first time the DRIVE platform integrates a DPU, carrying Arm cores that will help it pack the equivalent of data center networking into autonomous vehicles.
The DPU in Atlan provides a platform for Arm developers to create innovative applications in security, storage, networking and more.
The Best Is Yet to Come
The expanding products and partnerships mark progress on our intention announced in October to bring the Arm ecosystem four acceleration suites:
NVIDIA AI – the industry standard for accelerating AI training and inference
RAPIDS – a suite of open-source software libraries maintained by NVIDIA to run data science and analytics on GPUs
NVIDIA HPC SDK – compilers, libraries and software tools for high performance computing
NVIDIA RTX – graphics drivers that deliver ray tracing and AI capabilities
And we’re just getting started. There’s much more to come and much more to say.
Learn about new opportunities combining NVIDIA and Arm at GTC21. Registration is free.
Recommenders personalize the internet. They suggest videos, foods, sneakers and advertisements that seem magically clairvoyant in knowing your tastes and interests.
It’s an AI that makes online experiences more enjoyable and efficient, quickly taking you to the things you want to see. While delivering content you like, it also targets tempting ads for jeans, or recommends comfort dishes that fit those midnight cravings.
But not all recommender systems can handle the data requirements to make smarter suggestions. That leads to slower training and less intuitive internet user experiences.
NVIDIA Merlin is turbocharging recommenders, boosting training and inference. Leaders in media, entertainment and on-demand delivery use the open source recommender framework for running accelerated deep learning on GPUs. Improving recommendations increases clicks, purchases — and satisfaction.
NVIDIA Merlin enables businesses of all types to build recommenders accelerated by NVIDIA GPUs.
Its collection of libraries includes tools for building deep learning-based systems that provide better predictions than traditional methods and increase clicks. Each stage of the pipeline is optimized to support hundreds of terabytes of data, all accessible through easy-to-use APIs.
Merlin is in testing with hundreds of companies worldwide. Social media and video services are evaluating it for suggestions on next views and ads. And major on-demand apps and retailers are looking at it for suggestions on new items to purchase.
Videos with Snap
With Merlin, Snap is improving the customer experience with better load times by ranking content and ads 60% faster while also reducing their infrastructure costs. Using GPUs and Merlin provides Snap with additional compute capacity to explore more complex and accurate ranking models. These improvements allow Snap to deliver even more engaging experiences at a lower cost.
Tencent: Ads that Click
China’s leading online video media platform uses Merlin HugeCTR to help connect over 500 million monthly active users with ads that are relevant and engaging. With such a huge dataset, training speed matters and determines the performance of the recommender model. Tencent deployed its real-time training with Merlin and achieved more than a 7x speedup over the original TensorFlow solution on the same GPU platform. Tencent dives into this further at its GTC presentation.
Postmates Food Picks
Merlin was designed to streamline and support recommender workflows. Postmates uses recommenders to help people decide what’s for dinner. Postmates utilizes Merlin NVTabular to optimize training time, reducing it from 1 hour on CPUs to just 5 minutes on GPUs.
Using NVTabular for feature engineering, the company reduced training costs by 95 percent and is exploring more advanced deep learning models. Postmates delves more into this in its GTC presentation.
Merlin Streamlines Recommender Workflows at Scale
As Merlin is interoperable, it provides flexibility to accelerate recommender workflow pipelines.
The open beta release of the Merlin recommendation engine delivers leaps in data loading and training of deep learning systems.
NVTabular reduces data preparation time by GPU-accelerating feature transformations and preprocessing. NVTabular, which makes loading massive data lakes into training pipelines easier, gets multi-GPU support and improved interoperability with TensorFlow and PyTorch.
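To make "feature transformations and preprocessing" concrete, here is a minimal CPU-only sketch of two common steps in recommender pipelines: encoding categories as integer IDs and normalizing a continuous column. The function names are hypothetical and this is not the NVTabular API; NVTabular performs this kind of work on GPUs at far larger scale.

```python
import statistics


def categorify(values):
    """Map each distinct category to an integer ID in order of first appearance."""
    ids = {}
    encoded = [ids.setdefault(value, len(ids)) for value in values]
    return encoded, ids


def normalize(values):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Hypothetical raw columns from a food-ordering log.
cuisine_ids, vocabulary = categorify(["pizza", "sushi", "pizza", "tacos"])
order_totals = normalize([12.0, 30.0, 18.0, 24.0])
print(cuisine_ids, vocabulary, order_totals)
```

The integer IDs are what an embedding table in a deep learning recommender would be indexed by, which is why this step sits in front of training.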
Merlin’s Magic for Training
Merlin HugeCTR is the main training component. It is a deep neural network training framework designed specifically for recommender workflows, capable of distributed training across multiple GPUs and nodes for maximum performance. It comes with its own optimized data loader, vastly outperforming generic deep learning frameworks, and provides a Parquet data reader to digest the data preprocessed by NVTabular.
The announcements were among the highlights of a streamed presentation from Jeff Fisher, senior vice president of NVIDIA’s GeForce business.
Amid the unprecedented challenges of 2020, “millions of people tuned into gaming — to play, create and connect with one another,” Fisher said. “More than ever, gaming has become an integral part of our lives.” Among the stats he cited:
Steam saw its number of concurrent users more than double from 2018
Discord, a messaging and social networking service most popular with gamers, has seen monthly active users triple to 140 million from two years ago
In 2020 alone, more than 100 billion hours of gaming content have been watched on YouTube
Also in 2020, viewership of esports reached half a billion people
Meanwhile, NVIDIA has been delivering a series of major gaming advancements, Fisher explained.
RTX ‘the New Standard’
Two years ago, NVIDIA introduced a breakthrough in graphics: real-time ray tracing and AI-based DLSS (deep learning super sampling), together called RTX, he said.
NVIDIA quickly partnered with Microsoft and top developers and game engines to bring the visual realism of movies to fully interactive gaming, Fisher said.
In fact, 36 games are now powered by RTX. They include the #1 Battle Royale game, the #1 RPG, the #1 MMO and the #1 best-selling game of all time – Minecraft.
Now, we’re announcing more games that support RTX technology, including DLSS, which is coming to both Call of Duty: Warzone and Square Enix’s new IP, Outriders. And Five Nights at Freddy’s: Security Breach and F.I.S.T.: Forged in Shadow Torch will be adding ray tracing and DLSS.
Fisher announced that Overwatch and Rainbow Six Siege are also adopting NVIDIA Reflex. Now, 7 of the top 10 competitive shooters support Reflex.
And over the past four months, NVIDIA has launched four NVIDIA Ampere architecture-powered graphics cards, from the ultimate BFGPU — the GeForce RTX 3090 priced at $1,499 — to the GeForce RTX 3060 Ti at $399.
“Ampere has been our fastest selling architecture ever, selling almost twice as much as our prior generation,” Fisher said.
GeForce RTX 3060: An NVIDIA Ampere GPU for Every Gamer
With gaming now a key part of global culture, the new GeForce RTX 3060 brings the power of the NVIDIA Ampere architecture to every gamer, Fisher said.
“The RTX 3060 offers twice the raster performance of the GTX 1060 and 10x the ray-tracing performance,” Fisher said, noting that the GTX 1060 is the world’s most popular GPU. “The RTX 3060 powers the latest games with RTX On at 60 frames per second.”
The RTX 3060 has 13 shader teraflops, 25 RT teraflops for ray tracing, and 101 tensor teraflops to power DLSS, an NVIDIA technology introduced in 2019 that uses AI to accelerate games. And it boasts 12 gigabytes of GDDR6 memory.
“With most of the installed base underpowered for the latest games, we’re bringing RTX to every gamer with the GeForce RTX 3060,” Fisher said.
The GeForce RTX 3060 starts at just $329 and will be available worldwide in late February.
That’s why NVIDIA introduced Max-Q four years ago, Fisher explained.
Max-Q is a system design approach that delivers high performance in thin and light gaming laptops.
“It has fundamentally changed how laptops are built, every aspect — the CPU GPU, software, PCB design, power delivery, thermals — are optimized for power and performance,” Fisher said.
NVIDIA’s third-gen Max-Q technologies use AI and new system optimizations to make high-performance gaming laptops faster and better than ever, he said.
Fisher introduced Dynamic Boost 2.0, which for the first time uses AI to shift power between the CPU, GPU and now, GPU memory.
“So your laptop is constantly optimizing for maximum performance,” Fisher said.
Fisher also introduced WhisperMode 2.0, which delivers a new level of acoustic control for gaming laptops.
Pick your desired acoustics and WhisperMode 2.0’s AI-powered algorithms manage the CPU, GPU, system temperature and fan speeds to “deliver great acoustics at the best possible performance,” Fisher explained.
Another new feature, Resizable BAR, uses the advanced capabilities of PCI Express to boost gaming performance.
Games use GPU memory for textures, shaders and geometry — constantly updating as the player moves through the world.
Today, only part of the GPU’s memory can be accessed at any one time by the CPU, requiring many memory updates, Fisher explained.
With Resizable BAR, the game can access the entire GPU memory, allowing for multiple updates at the same time, improving performance, Fisher said.
Resizable BAR will also be supported on GeForce RTX 30 Series graphics cards for desktops, starting with the GeForce RTX 3060. NVIDIA and GPU partners are readying VBIOS updates for existing GeForce RTX 30 series graphics cards starting in March.
Finally, NVIDIA DLSS offers a breakthrough for gaming laptops. It uses AI and RTX Tensor Cores to deliver up to 2x the performance in the same power envelope.
World’s Fastest Laptops for Gamers and Creators
Starting at $999, RTX 3060 laptops are “faster than anything on the market today,” Fisher said.
They’re 30 percent faster than the PlayStation 5 and deliver 90 frames per second on the latest games at ultra settings 1080p, Fisher said.
Starting at $1,299, GeForce RTX 3070 laptops are “a 1440p gaming beast.”
Boasting nearly twice the pixels of 1080p (2560x1440 versus 1920x1080, about 1.8x), this new generation of laptops “provides the perfect mix of high-fidelity graphics and great performance.”
And starting at $1,999, GeForce RTX 3080 laptops will come with up to 16 gigabytes of GDDR6 memory.
They’re “the world’s fastest laptop for gamers and creators,” Fisher said, delivering hundreds of frames per second with RTX on.
As a result, laptop gamers will be able to play at 240 frames per second, across top titles like Overwatch, Rainbow Six Siege, Valorant and Fortnite, Fisher said.
Manufacturers worldwide, starting Jan. 26, will begin shipping over 70 different GeForce RTX gaming and creator laptops featuring GeForce RTX 3080 and GeForce RTX 3070 laptop GPUs, followed by GeForce RTX 3060 laptop GPUs on Feb. 2.
The GeForce RTX 3060 graphics card will be available in late February, starting at $329, as custom boards — including stock-clocked and factory-overclocked models — from top add-in card providers such as ASUS, Colorful, EVGA, Gainward, Galaxy, Gigabyte, Innovision 3D, MSI, Palit, PNY and Zotac.
Look for GeForce RTX 3060 GPUs at major retailers and etailers, as well as in gaming systems by major manufacturers and leading system builders worldwide.
“RTX is the new standard, and the momentum continues to grow,” Fisher said.
Once the founder of a wearable computing startup, Arye Barnehama understands the toils of manufacturing consumer devices. He moved to Shenzhen in 2014 to personally oversee production lines for his brain waves-monitoring headband, Melon.
It was an experience that left an impression: manufacturing needed automation.
His next act is Elementary Robotics, which develops robotics for manufacturing. Elementary Robotics, based in Los Angeles, was incubated at Pasadena’s Idealab.
Founded in 2017, Elementary Robotics recently landed a $12.7 million Series A round of funding, including investment from customer Toyota.
Elementary Robotics is in deployment with customers who track thousands of parts. Its system is constantly retraining algorithms for improvements to companies’ inspections.
“Using the NVIDIA Jetson edge AI platform, we put quite a bit of engineering effort into tracking for 100 percent of inferences, at high frame rates,” said Barnehama, the company’s CEO.
Jetson for Inspections
Elementary Robotics has developed its own hardware and software for inspections used in manufacturing. It offers a Jetson-powered robot that can examine parts for defects. It aims to improve quality with better tracking of parts and problems.
Detecting the smallest of defects on a fast moving production line requires processing of high-resolution camera data with AI in real time. This is made possible with the embedded CUDA-enabled GPU and the CUDA-X AI software on Jetson. As the Jetson platform makes decisions from video streams, these are all ingested into its cloud database so that customers are able to observe and query the data.
The results, along with the live video, are also then published to the Elementary Robotics web application, which can be accessed from anywhere.
Elementary Robotics’ system also enables companies to inspect parts from suppliers before putting them into the production line, avoiding costly failures. It is used for inspections of assemblies on production lines as well as for quality control at post-production.
Its applications include inspections of electronic printed circuit boards and assemblies, automotive components, and gears for light industrial use. Elementary Robotics customers also use its platform in packaging and consumer goods such as bottles, caps and labels.
“Everyone’s demand for quality is always going up,” said Barnehama. “We run real-time inference on the edge with NVIDIA systems for inspections to help improve quality.”
The Jetson platform recently demonstrated leadership in MLPerf AI inference benchmarks in SoC-based edge devices for computer vision and conversational AI use cases.
Elementary Robotics is a member of NVIDIA Inception, a virtual accelerator program that helps startups in AI and data science get to market faster.
Traceability of Operations
The startup’s Jetson-enabled machine learning system can handle split-second anomaly detection to catch mistakes on the production lines. And when there’s a defective part returned, companies that rely on Elementary Robotics can try to understand how it happened. Use cases include electronics, automotive, medical, consumer packaged goods, logistics and other applications.
For manufacturers, such traceability of operations is important so that companies can go back and find and fix the causes of problems for improved reliability, said Barnehama.
“You want to be able to say, ‘OK, this defective item got returned, let me look up when it was inspected and make sure I have all the inspection data,’” added Barnehama.
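The lookup Barnehama describes can be sketched as a small inspection log keyed by part ID. This is only an illustration of the idea: the record fields, class names and part IDs are hypothetical, and an in-memory dictionary stands in for the cloud database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class InspectionRecord:
    """One inspection result for one part (hypothetical schema)."""
    part_id: str
    inspected_at: datetime
    passed: bool
    defect: Optional[str] = None


class InspectionLog:
    """Stores every inspection so a returned part can be traced later."""

    def __init__(self):
        self._by_part = {}

    def record(self, rec):
        self._by_part.setdefault(rec.part_id, []).append(rec)

    def history(self, part_id):
        """All inspections for a part, oldest first; empty if never inspected."""
        return sorted(self._by_part.get(part_id, []), key=lambda r: r.inspected_at)


log = InspectionLog()
log.record(InspectionRecord("PCB-0042", datetime(2021, 3, 2, 9, 30), False, "solder bridge"))
log.record(InspectionRecord("PCB-0042", datetime(2021, 3, 1, 14, 0), True))

# A defective item comes back: pull up when it was inspected and what was found.
for rec in log.history("PCB-0042"):
    print(rec.inspected_at.isoformat(), rec.passed, rec.defect)
```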
NVIDIA Jetson is used by enterprise customers, developers and DIY enthusiasts for creating AI applications, as well as students and educators for learning and teaching AI.
Pinterest now has more than 440 million reasons to offer the best visual search experience. That’s because its monthly active users are tracking this high for its popular image sharing and social media service.
Visual search enables Pinterest users to search for images using text, screenshots or camera photos. It’s the core AI behind how people build their Boards of Pins — collections of images by themes — around their interests and plans. It’s also how people on Pinterest can take action on the inspiration they discover, such as shopping and making purchases based on the products within scenes.
But tracking more than 240 billion images and 5 billion Boards is no small data trick.
This requires visual embeddings — mathematical representations of objects in a scene. Visual embeddings use models for automatically generating and evaluating visualizations to show how similar two images are — say, a sofa in a TV show’s living room compared to ones for sale at retailers.
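A minimal sketch of comparing embeddings follows, assuming cosine similarity as the measure (the article does not say which metric Pinterest uses). The four-dimensional vectors and item names are made up for illustration; real visual embeddings have many more dimensions and come from a trained neural network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Name of the candidate whose embedding is closest to the query."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical embeddings: a sofa spotted in a scene and two catalog items.
sofa_in_scene = [0.9, 0.1, 0.3, 0.0]
catalog = {
    "sofa_for_sale": [0.8, 0.2, 0.4, 0.1],
    "floor_lamp": [0.0, 0.9, 0.1, 0.7],
}

print(most_similar(sofa_in_scene, catalog))
```

At the scale of billions of images, the comparison would go through an approximate nearest-neighbor index rather than the exhaustive scan shown here.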
Pinterest is improving its search results by pretraining its visual embeddings on a smaller dataset. The overall goal is to improve one unified visual embedding that can perform well for its key business features.
Powered by NVIDIA V100 Tensor Core GPUs, this technique pre-trains Pinterest’s neural nets on a subset of about 1.3 billion images to yield improved relevancy across the wider set of hundreds of billions of images.
Improving results on the unified visual embedding in this fashion can benefit all applications on Pinterest, said Josh Beal, a machine learning researcher for Visual Search at the company.
“This model is fine-tuned on various multitask datasets. And the goal of this project was to scale the model to a large scale,” he said.
Benefitting Shop the Look
With so many visuals, and new ones coming in all the time, Pinterest is continuously training its neural networks to identify them in relation to others.
A popular visual search feature, Pinterest’s Shop the Look enables people to shop for home and fashion items. By tapping into visual embeddings, Shop the Look can identify items in Pins and connect Pinners to those products online.
Product matches are key to its visual-driven commerce. And it isn’t an easy problem to solve at Pinterest scale.
Yet it matters. Another Pinterest visual feature is the ability to search for specific products within an image, or Pin. Improving the accuracy of recommendations with visual embeddings improves the magic factor in matches, boosting people’s experience of discovering relevant products and ideas.
An additional feature, Pinterest’s Lens camera search, aims to recommend visually relevant Pins based on the photos Pinners take with their cameras.
“Unified embedding for visual search benefits all these downstream applications,” said Beal.
Making Visual Search More Powerful
Several Pinterest teams have been working to improve visual search on the hundreds of billions of images within Pins. But given the massive scale of the effort and its cost and engineering resource constraints, Pinterest wanted to optimize its existing architecture.
With some suggested ResNeXt-101 architecture optimizations and by simply upgrading to the latest releases of NVIDIA libraries, including cuDNN v8, automated mixed precision and NCCL, Pinterest was able to improve training performance of their models by over 60 percent.
NVIDIA’s GPU-accelerated libraries are constantly being updated to enable companies such as Pinterest to get more performance out of their existing hardware investment.
“It has improved the quality of the visual embedding, so that leads to more relevant results in visual search,” said Beal.
Academic researchers are developing AI to solve challenging problems with everything from agricultural robotics to autonomous flying machines.
To help AI research like this make the leap from academia to commercial or government deployment, NVIDIA today announced the Applied Research Accelerator Program. The program supports applied research on NVIDIA platforms for GPU-accelerated application deployments.
The program will initially focus on robotics and autonomous machines. Worldwide spending on robotics systems and drones is forecast to reach $241 billion by 2023, an 88 percent increase from the $128.7 billion in spending expected for 2020, according to IDC. The program will also extend to other domains such as data science, NLP, speech and conversational AI in the months ahead.
The new program will support researchers and the organizations they work with in rolling out the next generation of applications developed on NVIDIA AI platforms, including the Jetson developer kits and SDKs like DeepStream and Isaac.
Researchers working with sponsoring organizations will also gain support from NVIDIA through technical guidance, hardware grants, funding, grant application support, AI training programs, not to mention networking and marketing opportunities.
NVIDIA is now accepting applications to the program from researchers working to apply robotics and AI for automation in collaboration with enterprises seeking to deploy new technologies in the market.
Accelerating and Deploying AI Research
The NVIDIA Applied Research Accelerator Program’s first group of participants has already demonstrated AI capabilities meriting further development for agriculture, logistics and healthcare.
The University of Florida is developing AI applications for smart sprayers used in agriculture, and working with Chemical Containers Inc. to deploy AI on machines running NVIDIA Jetson to reduce the amount of plant protection products applied to tree crops.
The Institute for Factory Automation and Production Systems at Friedrich-Alexander-University Erlangen-Nuremberg, based in Germany, is working with materials handling company KION and the intralogistics research association IFL to design drones for warehouse autonomy using NVIDIA Jetson.
The Massachusetts Institute of Technology is developing AI applications for disinfecting surfaces with UV-C light using NVIDIA Jetson. It’s also working with Ava Robotics to deploy autonomous disinfection on robots to minimize human supervision and additional risk of exposure to COVID-19.
Applied Research Accelerator Program Benefits
NVIDIA offers hardware grants along with funding in some cases for academic researchers who can demonstrate AI feasibility in practical applications. The program also provides letters of support for third-party grant applications submitted by researchers.
Members will also have access to technical guidance on using NVIDIA platforms, including Jetson, as well as Isaac and DeepStream.
Membership in the new program includes access to training courses via the Deep Learning Institute to help researchers master a wide range of AI technologies.
Virtual this year, the SC20 Student Cluster Competition was still all about teams vying for top supercomputing performance in the annual battle for HPC bragging rights.
That honor went to Beijing-based Tsinghua University, whose six-member undergraduate student team clocked in 300 teraflops of processing performance.
A one teraflop computer can process one trillion floating-point operations per second.
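The unit conversion is simple enough to check directly. The short sketch below, with a hypothetical helper name, turns a teraflops rating into a count of floating-point operations, using the 300-teraflop figure reported for the winning team.

```python
TERA = 10**12  # one trillion


def operations(teraflops, seconds):
    """Floating-point operations completed at a given rate over a given time."""
    return teraflops * TERA * seconds


# The winning cluster's reported 300 teraflops, over one second.
print(operations(300, 1))
```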
The Virtual Student Cluster Competition was this year’s battleground for 19 teams. Competitors consisted of either high school or undergraduate students. Teams were made up of six members, an adviser and vendor partners.
In the 72-hour competition, student teams designed and built virtual clusters running NVIDIA GPUs in the Microsoft Azure cloud. Students completed a set of benchmarks and real-world scientific workloads.
Teams ran the GROMACS molecular dynamics application, tackling COVID-19 research. They also ran the CESM application to work on optimizing climate modeling code. The “reproducibility challenge” called on the teams to replicate results from an SC19 research paper.
Among other hurdles, teams were tossed a surprise exascale computing project mini-application, miniVite, to test their chops at compiling, running and optimizing.
A leaderboard tracked performance results of their submissions and the amount of money spent on Microsoft Azure as well as the burn rate of their spending by the hour on cloud resources.
Roller-Coaster Computing Challenges
The Georgia Institute of Technology competed for its second time. This year’s squad, dubbed Team Phoenix, had the good fortune of landing advisor Vijay Thakkar, a Gordon Bell Prize nominee this year.
Half of the team members were teaching assistants for introductory systems courses at Georgia Tech, said team member Sudhanshu Agarwal.
Georgia Tech used NVIDIA GPUs “wherever it was possible, as GPUs reduced computation time,” said Agarwal.
“We had a lot of fun this year and look forward to participating in SC21 and beyond,” he said.
Pan Yueyang, a junior in computer science at Peking University, joined his university’s supercomputing team before taking the leap to participate in the SC20 battle. But it was full of surprises, he noted.
He said that during the competition his team ran into a series of unforeseen hiccups. “Luckily it finished as required and the budget was slightly below the limitation,” he said.
Jacob Xiaochen Li, a junior in computer science at the University of California, San Diego, said his team was relying on NVIDIA GPUs for the MemXCT portion of the competition to reproduce the scaling experiment along with memory bandwidth utilization. “Our results match the original chart closely,” he said, noting there were some hurdles along the way.
Po Hao Chen, a sophomore in computer science at Boston University, said he committed to the competition because he’s always enjoyed algorithmic optimization. Like many, he had to juggle the competition with the demands of courses and exams.
“I stayed up for three whole days working on the cluster,” he said. “And I really learned a lot from this competition.”
Teams and Flops
Tsinghua University, China
Southern University of Science and Technology
Supercomputing centers worldwide are onboarding NVIDIA Ampere GPU architecture to serve the growing demands of heftier AI models for everything from drug discovery to energy research.
Joining this movement, Fujitsu has announced a new exascale system for Japan-based AI Bridging Cloud Infrastructure (ABCI), offering 600 petaflops of performance at the National Institute of Advanced Industrial Science and Technology.
The debut comes as model complexity has surged 30,000x in the past five years, with booming use of AI in research. With scientific applications, these hulking datasets can be held in memory, helping to minimize batch processing as well as to achieve higher throughput.
To fuel this next research ride, NVIDIA Monday introduced the NVIDIA A100 80GB GPU with HBM2e technology. It doubles the A100 40GB GPU’s high-bandwidth memory to 80GB and delivers over 2 terabytes per second of memory bandwidth.
New NVIDIA A100 80GB GPUs let larger models and datasets run in-memory at faster memory bandwidth, enabling higher compute and faster results on workloads. Reducing internode communication can boost AI training performance by 1.4x with half the GPUs.
Leonardo joins a growing pack of European systems on NVIDIA AI platforms supported by the EuroHPC initiative. Its German neighbor, the Jülich Supercomputing Center, recently launched the first NVIDIA GPU-powered AI exascale system to come online in Europe, delivering the region’s most powerful AI platform. The new Atos-designed Jülich system, dubbed JUWELS, is a 2.5 exaflops AI supercomputer that captured No. 7 on the latest TOP500 list.
Linköping University is planning to build Sweden’s fastest AI supercomputer, dubbed BerzeLiUs, based on the NVIDIA DGX SuperPOD infrastructure. It’s expected to provide 300 petaflops of AI performance for cutting-edge research.
NVIDIA is building Cambridge-1, an 80-node DGX SuperPOD with 400 petaflops of AI performance. It will be the fastest AI supercomputer in the U.K. It’s planned to be used in collaborative research within the country’s AI and healthcare community across academia, industry and startups.
Full Steam Ahead in North America
North America is taking the exascale AI supercomputing ride. NERSC (the U.S. National Energy Research Scientific Computing Center) is adopting NVIDIA AI for projects on Perlmutter, its system packing 6,200 A100 GPUs. NERSC now lays claim to 3.9 exaflops of AI performance.
NVIDIA Selene, a cluster based on the DGX SuperPOD, provides a public reference architecture for large-scale GPU clusters that can be deployed in weeks. The NVIDIA DGX SuperPOD system landed the top spot on the Green500 list of most efficient supercomputers, achieving a new world record in power efficiency of 26.2 gigaflops per watt, and it has set eight new performance milestones for MLPerf inference.
The University of Florida and NVIDIA are building the world’s fastest AI supercomputer in academia, aiming to deliver 700 petaflops of AI performance. The partnership puts UF among leading U.S. AI universities, advances academic research and helps address some of Florida’s most complex challenges.
At Argonne National Laboratory, researchers will use a cluster of 24 NVIDIA DGX A100 systems to scan billions of drugs in the search for treatments for COVID-19.
Los Alamos National Laboratory, Hewlett Packard Enterprise and NVIDIA are teaming up to deliver next-generation technologies to accelerate scientific computing.
All Aboard in APAC
Supercomputers in APAC will also be fueled by NVIDIA Ampere architecture. Korean search engine NAVER and Japanese messaging service LINE are using a DGX SuperPOD built with 140 DGX A100 systems with 700 petaflops of peak AI performance to scale out research and development of natural language processing models and conversational AI services.
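As a rough sanity check on that figure, the sketch below multiplies the system count by 5 petaflops of AI performance per DGX A100, the per-system peak NVIDIA quotes for that machine. The function name is illustrative, and linear scaling of peak performance is an assumption.

```python
# NVIDIA's quoted peak AI performance for a single DGX A100 system.
PETAFLOPS_PER_DGX_A100 = 5

def cluster_ai_petaflops(num_systems: int) -> int:
    """Peak AI performance of a cluster, assuming it scales linearly."""
    return num_systems * PETAFLOPS_PER_DGX_A100

print(cluster_ai_petaflops(140))  # 700, the NAVER and LINE SuperPOD
print(cluster_ai_petaflops(80))   # 400, the 80-node Cambridge-1
```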
The Japan Agency for Marine-Earth Science and Technology, or JAMSTEC, is upgrading its Earth Simulator with NVIDIA A100 GPUs and NVIDIA InfiniBand. The supercomputer is expected to have 624 petaflops of peak AI performance with a maximum theoretical performance of 19.5 petaflops of HPC performance, which today would rank high among the TOP500 supercomputers.
India’s Centre for Development of Advanced Computing, or C-DAC, is commissioning the country’s fastest and largest AI supercomputer, called PARAM Siddhi – AI. Built with 42 DGX A100 systems, it delivers 200 petaflops of AI performance and will address challenges in healthcare, education, energy, cybersecurity, space, automotive and agriculture.
Buckle up. Scientific research worldwide has never enjoyed such a ride.
Computer vision has become so good that the days of general managers screaming at umpires in baseball games in disputes over pitches may become a thing of the past.
That’s because developments in image classification along with parallel processing make it possible for computers to see a baseball whizzing by at 95 miles per hour. Pair that with image detection to help geolocate balls, and you’ve got a potent umpire tool that’s hard to argue with.
But computer vision doesn’t stop at baseball.
What Is Computer Vision?
Computer vision is a broad term for the work done with deep neural networks to develop human-like vision capabilities for applications, most often run on NVIDIA GPUs. It can include specific training of neural nets for segmentation, classification and detection using images and videos for data.
Major League Baseball is testing AI-assisted calls at the plate using computer vision. Judging balls and strikes on baseballs that can take just 0.4 seconds to reach the plate isn’t easy for human eyes. It could be better handled by a camera feed run through image-recognition neural networks on NVIDIA GPUs, which can make split-second decisions at a rate of more than 60 frames per second.
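A quick back-of-the-envelope calculation shows why frame rate matters here. The sketch below assumes the pitch covers roughly 55 feet from the pitcher's release point to the plate at a constant 95 miles per hour. Both the 55-foot distance and the constant speed are simplifying assumptions, not figures from MLB or Hawk-Eye.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def flight_time_seconds(speed_mph: float, distance_feet: float) -> float:
    """Time for a ball at constant speed to cover the given distance."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return distance_feet / feet_per_second

def frames_captured(duration_seconds: float, frames_per_second: float) -> int:
    """Whole frames a camera records during the ball's flight."""
    return int(duration_seconds * frames_per_second)

t = flight_time_seconds(95, 55)   # assumed release distance of 55 feet
print(round(t, 2))                # 0.39 seconds
print(frames_captured(t, 60))     # 23 frames at 60 frames per second
```

At 60 frames per second, a system gets only about two dozen images of the ball in flight, which is why higher frame rates and fast processing of each frame both matter.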
Hawk-Eye, based in London, is making this a reality in sports. Hawk-Eye’s NVIDIA GPU-powered ball tracking and SMART software is deployed in more than 20 sports, including baseball, basketball, tennis, soccer, cricket, hockey and NASCAR.
Yet computer vision can do much more than just make sports calls.
What Is Computer Vision Beyond Sports?
Computer vision can handle many more tasks. Developed with convolutional neural networks, computer vision can perform segmentation, classification and detection for a myriad of applications.
Segmentation: Image segmentation is about classifying pixels to belong to a certain category, such as a car, road or pedestrian. It’s widely used in self-driving vehicle applications, including the NVIDIA DRIVE software stack, to show roads, cars and people. Think of it as a sort of visualization technique that makes what computers do easier to understand for humans.
Classification: Image classification is used to determine what’s in an image. Neural networks can be trained to identify dogs or cats, for example, or many other things with a high degree of precision given sufficient data.
Detection: Image detection allows computers to localize where objects exist. It draws rectangular bounding boxes, like those in the lower half of the image below, that fully contain each object. A detector might be trained to see where cars or people are within an image, for instance, as in the numbered boxes below.
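The three tasks differ mainly in the shape of their output, which a toy example can make concrete. The sketch below uses a hand-written 4x6 grid of labels in place of a real network's prediction (0 for background, 1 for car). The grid and the helper names are illustrative assumptions; a trained model would produce these outputs from pixels rather than derive one from another.

```python
# Toy "segmentation" output: one class label per pixel.
# 0 = background, 1 = car. Hand-written here, not produced by a model.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def bounding_box(mask, label):
    """Smallest rectangle (x_min, y_min, x_max, y_max) that fully contains
    every pixel carrying the label, or None if the label is absent."""
    coords = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value == label]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def labels_present(mask):
    """Classification-style answer: which non-background classes appear."""
    return sorted({value for row in mask for value in row if value != 0})

print(labels_present(mask))    # [1]           -> "there is a car"
print(bounding_box(mask, 1))   # (1, 1, 4, 2)  -> "the car is here"
```

Segmentation answers per pixel, classification answers once per image, and detection answers with a box per object.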
What You Need to Know: Segmentation, Classification and Detection