SANTA CLARA, Calif., March 22, 2021 – Join Intel’s Navin Shenoy, executive vice president in the Data Platforms Group, and Lisa Spelman, corporate vice president in the Xeon and Memory Group, on April 6 for the launch of the latest 3rd Gen Intel® Xeon® Scalable processors (code-named “Ice Lake”) and the latest additions to Intel’s hardware and software portfolio targeting data centers, 5G networks and intelligent edge infrastructure.
The virtual “How Wonderful Gets Done 2021” launch event will feature Intel executives and ecosystem partners addressing some of today’s greatest business opportunities. The event will also include brief remarks by Intel Chief Executive Officer Pat Gelsinger.
What’s New: Intel today joins the National Institutes of Health (NIH) All of Us Research Program in its historic effort to improve population health by making biomedical data from underrepresented groups available to COVID-19 researchers nationwide via the Researcher Workbench. With a goal of recruiting 1 million U.S. participants from different backgrounds, All of Us is on track to build the most diverse health database of its kind and become one of the largest health research efforts in U.S. history.
“The ability to manage, analyze and share data at scale will be critical in this effort to deliver equitable and effective care during the COVID-19 pandemic and beyond. We are proud to support this important effort in partnership with All of Us and Intel.”
–Mike Daniels, vice president of Global Public Sector at Google Cloud
Why It Matters: Driven by the belief that diversity is key to accelerating health research and better understanding health disparities, the All of Us Research Program aims to enable thousands of studies on a wide range of diseases, including COVID-19. The program’s Researcher Workbench data repository will help researchers learn more about the impact of individual differences in lifestyle, socioeconomic, environmental and biological characteristics in order to advance precision diagnosis, prevention and treatment.
“All of Us is dedicated to serving a diverse body of researchers who can come together to tackle our most pressing health challenges,” said Chris Lunt, the chief technology officer of the NIH All of Us Research Program. “We appreciate Intel’s contribution of research credits and Google Cloud’s computing power to enable novel analysis of our dataset to drive greater understanding of COVID-19.”
To date, the program has enrolled more than 366,000 participants and received more than 279,000 biosamples for genomic sequencing, as well as amassed data from more than 233,000 electronic health records (EHRs) and more than 1.34 million completed surveys.
How the Program Works: Already, 80% of participants who completed early steps of the program are from underrepresented groups, including racial and ethnic minorities, those with an annual household income at or below 200% of the federal poverty level and those who have a cognitive or physical disability, to name a few. Participants are asked to answer health questionnaires, provide access to EHRs, give physical measurements, and agree to collection and analysis of biospecimens for genomic assessment.
To speed COVID-19 research, the program is prioritizing performing assays of serum collected between January 2020 and March 2020 to detect antibodies; collecting electronic health data from participants, including those who have tested or are presumed positive for COVID-19; and collecting data from recurring surveys about participants’ experiences during the pandemic. This biomedical data will be made broadly accessible to researchers through the Researcher Workbench, hosted on Google Cloud and powered by Intel® Xeon® Scalable processors. To assist in this effort, Intel is funding compute credits to support data curation and research projects to speed COVID-19 discovery and treatment.
“We are excited to join Google and All of Us in offering compute for analyzing the most diverse health database in the world,” said Prashant Shah, global head of artificial intelligence for Health and Life Sciences at Intel. “The collection and availability of this data is essential to speed scientific research and discovery to not only fight COVID-19, but to also address health disparities in medical research for years to come.”
About Intel’s Healthcare and COVID-19 Response: Intel is committed to accelerating access to technology that can combat the current pandemic and enable scientific discovery that better prepares our world for future crises. Support for NIH’s All of Us Research Program was provided in part by Intel’s Pandemic Response Technology Initiative.
NGC is a catalog of software that is optimized to run on NVIDIA GPU cloud instances, such as the Amazon EC2 P4d instance featuring the record-breaking performance of NVIDIA A100 Tensor Core GPUs. AWS customers can deploy this software free of charge to accelerate their AI deployments.
We first began providing GPU-optimized software through the NVIDIA NGC catalog in 2017. Since then, industry demand for these resources has skyrocketed. More than 250,000 unique users have now downloaded more than 1 million of the AI containers, pretrained models, application frameworks, Helm charts and other machine learning resources available on the catalog.
Teaming Up for Another First in the Cloud
AWS is the first cloud service provider to offer the NGC catalog on its marketplace. Many organizations look to the cloud first for new deployments, so having NGC software available at the fingertips of data scientists and developers can help enterprises hit the ground running. With NGC, they can easily get started on new AI projects without having to leave the AWS ecosystem.
“AWS and NVIDIA have been working together to accelerate computing for more than a decade, and we are delighted to offer the NVIDIA NGC catalog in AWS Marketplace,” said Chris Grusz, director of AWS Marketplace at Amazon Web Services. “With NVIDIA NGC software now available directly in AWS Marketplace, customers will be able to simplify and speed up their AI deployment pipeline by accessing and deploying these specialized software resources directly on AWS.”
NGC AI Containers Debuting Today in AWS Marketplace
To help data scientists and developers build and deploy AI-powered solutions, the NGC catalog offers hundreds of NVIDIA GPU-accelerated machine learning frameworks and industry-specific software development kits. Today’s launch of NGC on AWS Marketplace features many of NVIDIA’s most popular GPU-accelerated AI software offerings in healthcare, recommender systems, conversational AI, computer vision, HPC, robotics, data science and machine learning, including:
NVIDIA Clara Imaging: NVIDIA’s domain-optimized application framework that accelerates deep learning training and inference for medical imaging use cases.
NVIDIA DeepStream SDK: A multiplatform scalable video analytics framework to deploy on the edge and connect to any cloud.
NVIDIA HPC SDK: A suite of compilers, libraries and software tools for high performance computing.
NVIDIA Isaac Sim ML Training: A toolkit to help robotics machine learning engineers use Isaac Sim to generate synthetic images to train an object detection deep neural network.
NVIDIA Merlin: An open beta framework for building large-scale deep learning recommender systems.
NVIDIA NeMo: An open-source Python toolkit for developing state-of-the-art conversational AI models.
RAPIDS: A suite of open-source data science software libraries.
Instant Access to Performance-Optimized AI Software
NGC software in AWS Marketplace provides a number of benefits to help data scientists and developers build the foundations for success in AI.
Faster software discovery: Through the AWS Marketplace, developers and data scientists can access the latest versions of NVIDIA’s AI software with a single click.
The latest NVIDIA software: The NGC software in AWS Marketplace is federated, giving AWS users access to the latest versions as soon as they’re available in the NGC catalog. The software is constantly optimized, and the monthly releases give users access to the latest features and performance improvements.
Simplified software deployment: Users of Amazon EC2, Amazon SageMaker, Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS) can quickly subscribe, pull and run NGC software on NVIDIA GPU instances, all within the AWS console. Additionally, SageMaker users can simplify their workflows by eliminating the need to first store a container in Amazon Elastic Container Registry (ECR).
Continuous integration and development: NGC Helm charts are also available in AWS Marketplace to help DevOps teams quickly and consistently deploy their services.
What’s New: Intel and Google Cloud today announced their collaboration to simplify enterprises’ ability to adopt and deploy cloud-first business models using their existing on-prem, self-managed hardware. The two organizations co-developed reference architectures optimized for the now-generally available “Anthos on bare metal” solution. Targeted at data center and edge computing use cases, the reference architectures let customers rapidly deploy enterprise-class applications on their existing hardware infrastructure and efficiently handle complicated hybrid- and multi-cloud tasks.
“With today’s rapidly evolving business climate, enterprises are constantly looking for new ways to modernize their business while leveraging their existing infrastructure. Running Anthos on bare metal using servers based on Intel® Xeon® Scalable processors will simplify the deployment of a cloud-first approach, opening a wide array of new use cases across retail, telco and manufacturing industries.”
–Jason Grebe, Intel corporate vice president and general manager of the Cloud and Enterprise Solutions Group
What’s the Benefit: Anthos on bare metal helps customers expedite hybrid- and multi-cloud deployments within their enterprise. The co-developed reference architectures are optimized to target the unique feature set of Intel® processors, providing customers consistency as they move containerized applications between common architectures within different cloud environments.
Intel Xeon Scalable processors are the most widely deployed server processors in the world, targeting a broad range of applications from the core of the data center to the edge of the network. Customers can use the data center reference architecture with Intel Xeon Scalable processors to help:
Optimize networking workloads leveraging Intel’s Data Plane Development Kit with single root input/output virtualization.
Deploy artificial intelligence and analytics workloads leveraging Intel® Deep Learning Boost technology.
Accelerate encryption and compression workloads with Intel® QuickAssist technology.
“Anthos on bare metal provides customers with more choice and flexibility over where to deploy applications in the public cloud, on prem or at the edge,” said Rayn Veerubhotla, director, Partner Engineering at Google Cloud. “Intel’s support for Anthos on bare metal ensures that customers can quickly deploy their enterprise applications on existing hardware, simplifying their path to hybrid- and multi-cloud approaches.”
Why It’s Important: Anthos allows applications to be packaged into containers and moved between various public cloud environments without having to rewrite the application for the underlying cloud infrastructure. Anthos on bare metal is an option that now allows enterprises to run Anthos on their existing on-prem physical servers, deployed on an operating system without a hypervisor layer.
What’s Being Done: Intel and Google Cloud have co-developed two reference architectures for Anthos on bare metal: a data center reference architecture and an edge reference architecture. The server-based data center reference architecture features Intel® Xeon® Gold 6240Y processors, Intel® Optane™ persistent memory, Intel® Solid State Drive DC S4500 Series and 10/25 GbE Intel® Ethernet Adapters. The edge reference architecture targets the Intel® NUC 10 performance kit featuring the 10th Gen Intel® Core™ i7-10710U processor, Intel® SSD Pro 7600p and Intel® Ethernet Connection I219-V.
Intel has validated both architectures with Google Cloud for customers to start to deploy today. Customers can work with their OEM partner, systems integrator or reseller to build out the reference architecture in their infrastructure.
Cloud or on premises? That’s the question many organizations ask when building AI infrastructure.
Cloud computing can help developers get a fast start with minimal cost. It’s great for early experimentation and supporting temporary needs.
As businesses iterate on their AI models, however, they can become increasingly complex, consume more compute cycles and involve exponentially larger datasets. The costs of data gravity can escalate, with more time and money spent pushing large datasets from where they’re generated to where compute resources reside.
This AI development “speed bump” is often an inflection point where organizations realize there are opex benefits with on-premises or colocated infrastructure. Its fixed costs can support rapid iteration at the lowest “cost per training run,” complementing their cloud usage.
Conversely, for organizations whose datasets are created in the cloud and live there, procuring compute resources adjacent to that data makes sense. Whether on-prem or in the cloud, minimizing data travel — by keeping large volumes as close to compute resources as possible — helps minimize the impact of data gravity on operating costs.
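The data-gravity tradeoff above can be made concrete with a back-of-the-envelope calculation. The dataset size, per-GB transfer price and run count below are illustrative assumptions, not figures from any provider's price list:

```python
# Rough illustration of how data gravity inflates operating costs.
# All numbers are hypothetical assumptions for the sake of the sketch.

dataset_gb = 10_000      # 10 TB training dataset
egress_per_gb = 0.09     # assumed per-GB transfer price in dollars
training_runs = 50       # training iterations per month

# Moving the dataset to remote compute for every run:
transfer_cost = dataset_gb * egress_per_gb * training_runs

# Keeping compute next to the data: the transfer term drops out.
colocated_transfer_cost = 0.0

print(f"Monthly transfer cost, remote compute: ${transfer_cost:,.0f}")
print(f"Monthly transfer cost, colocated:      ${colocated_transfer_cost:,.0f}")
```

The point is not the specific dollar amounts but that the transfer term scales with both dataset size and iteration count, which is exactly what punishes rapid experimentation on remote data.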
‘Own the Base, Rent the Spike’
Businesses that ultimately embrace hybrid cloud infrastructure trace a familiar trajectory.
One customer developing an image recognition application immediately benefited from a fast, effortless start in the cloud.
As their database grew to millions of images, costs rose and processing slowed, causing their data scientists to become more cautious in refining their models.
At this tipping point — when a fixed cost infrastructure was justified — they shifted training workloads to an on-prem NVIDIA DGX system. This enabled an immediate return to rapid, creative experimentation, allowing the business to build on the great start enabled by the cloud.
The saying “own the base, rent the spike” captures this situation. Enterprise IT provisions on-prem DGX infrastructure to support the steady-state volume of AI workloads and retains the ability to burst to the cloud whenever extra capacity is needed.
It’s this hybrid cloud approach that can secure the continuous availability of compute resources for developers while ensuring the lowest cost per training run.
Delivering the AI Hybrid Cloud with DGX and Google Cloud’s Anthos on Bare Metal
To help businesses embrace hybrid cloud infrastructure, NVIDIA has introduced support for Google Cloud’s Anthos on bare metal for its DGX A100 systems.
For customers using Kubernetes to straddle cloud GPU compute instances and on-prem DGX infrastructure, Anthos on bare metal enables a consistent development and operational experience across deployments, while reducing expensive overhead and improving developer productivity.
This presents several benefits to enterprises. While many have implemented GPU-accelerated AI in their data centers, much of the world retains some legacy x86 compute infrastructure. With Anthos on bare metal, IT can easily add on-prem DGX systems to their infrastructure to tackle AI workloads and manage them in the same familiar way, all without the need for a hypervisor layer.
Without the need for a virtual machine, Anthos on bare metal — now generally available — manages application deployment and health across existing environments for more efficient operations. Anthos on bare metal can also manage application containers on a wide variety of performance-optimized, GPU-accelerated hardware types and allows for direct application access to hardware.
“Anthos on bare metal provides customers with more choice over how and where they run applications and workloads,” said Rayn Veerubhotla, Director of Partner Engineering at Google Cloud. “NVIDIA’s support for Anthos on bare metal means customers can seamlessly deploy NVIDIA’s GPU Device Plugin directly on their hardware, enabling increased performance and flexibility to balance ML workloads across hybrid environments.”
Additionally, teams can access their favorite NVIDIA NGC containers, Helm charts and AI models from anywhere.
With this combination, enterprises can enjoy the rapid start and elasticity of resources offered on Google Cloud, as well as the secure performance of dedicated on-prem DGX infrastructure.
What’s New: At Intel FPGA Technology Day, Intel announced a new, customizable solution to help accelerate application performance across 5G, artificial intelligence, cloud and edge workloads. The new Intel® eASIC N5X is the first structured eASIC family with an Intel® FPGA compatible hard processor system. The Intel eASIC N5X helps customers migrate both their custom logic and designs — using the embedded hard processor in the FPGA — to structured ASICs, bringing benefits like lower unit cost, faster performance and reduced power consumption.
“The potential for data to transform industries and business has never been greater. The announcement of the new Intel eASIC N5X uniquely positions our customers to more fully benefit from the flexibility and time-to-market advantages of Intel FPGAs with the performance benefits and lower operating power of structured ASICs. The combination of FPGA, eASIC and ASIC products from Intel enables customers to take advantage of this potential by providing what we call the ‘custom logic continuum,’ and is a capability not available from any other vendor in the market.”
–Dave Moore, Intel corporate vice president and general manager of the Programmable Solutions Group
Why It Matters: FPGAs offer the best time-to-market advantage and highest flexibility for customer designs, while ASICs and structured ASIC devices provide the best hardware-optimized performance at the lowest power and cost. FPGAs are ideal for enabling agile innovation and are the fastest path to next-generation technology exploration. The programmability of FPGAs helps customers quickly develop hardware for their specific workloads and adapt to changing standards over time – as happened during the early phases of the 5G rollout and the migration toward open RAN implementations.
The new innovative Intel eASIC N5X devices deliver up to 50% lower core power and lower cost compared to FPGAs, while providing faster time to market and lower non-recurring engineering costs compared to ASICs. This allows customers to create power-optimized, high-performance and highly differentiated solutions. Intel eASIC N5X devices also help customers to meet the key security needs of many applications by incorporating a secure device manager adapted from the Intel® Agilex™ FPGA family, including secure boot, authentication and anti-tamper features.
Intel’s Unique Approach: Intel is the world’s only semiconductor company to offer the complete custom logic continuum of FPGAs, such as Intel Agilex and Intel® Stratix™ 10; structured ASICs, such as Intel eASIC N5X; and ASICs to its customers. This comprehensive data-processing portfolio of customizable logic devices helps enable Intel customers to truly optimize unit cost, performance, power consumption and time to market for their market-specific solutions in a manner that’s unique in the industry.
About Intel FPGA Technology Day: This is a one-day virtual event on Nov. 18, 2020, that brings together Intel executives, partners and customers to showcase the latest Intel programmable products and solutions through a series of keynotes, webinars and demonstrations.
Amazon Web Services’ first GPU instance debuted 10 years ago, with the NVIDIA M2050. At that time, CUDA-based applications were focused primarily on accelerating scientific simulations, with the rise of AI and deep learning still a ways off.
Since then, AWS has added to its stable of cloud GPU instances, which has included the K80 (p2), K520 (g2), M60 (g3), V100 (p3/p3dn) and T4 (g4).
The P4d instance delivers AWS’s highest performance, most cost-effective GPU-based platform for machine learning training and high performance computing applications. The instances reduce the time to train machine learning models by up to 3x with FP16 and up to 6x with TF32 compared to the default FP32 precision.
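The TF32 format behind that speedup keeps FP32's 8-bit exponent range but reduces the mantissa to 10 bits (FP16-level precision), so Tensor Cores can multiply much faster. A minimal sketch of what that precision loss looks like, using simple bit truncation where the hardware actually rounds:

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate a float32 value at TF32 precision.

    TF32 keeps FP32's 8 exponent bits but only 10 of its 23 mantissa
    bits. This sketch truncates the low 13 mantissa bits; real Tensor
    Core hardware rounds rather than truncates.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 -> raw bits
    bits &= 0xFFFFE000                                   # zero the low 13 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(3.14159265))  # 3.140625 — close to pi, far cheaper to multiply
```

Because the exponent range is unchanged, most FP32 training code can run under TF32 with no range-related changes, which is why it can serve as a drop-in default.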
Each P4d instance features eight NVIDIA A100 GPUs and, with AWS UltraClusters, customers can get on-demand and scalable access to over 4,000 GPUs at a time using AWS’s Elastic Fabric Adapter (EFA) and scalable, high-performance storage with Amazon FSx. P4d offers 400Gbps networking and uses NVIDIA technologies such as NVLink, NVSwitch, NCCL and GPUDirect RDMA to further accelerate deep learning training workloads. NVIDIA GPUDirect RDMA on EFA ensures low-latency networking by passing data from GPU to GPU between servers without having to pass through the CPU and system memory.
In addition, the P4d instance is supported in many AWS services, including Amazon Elastic Container Services, Amazon Elastic Kubernetes Service, AWS ParallelCluster and Amazon SageMaker. P4d can also leverage all the optimized, containerized software available from NGC, including HPC applications, AI frameworks, pre-trained models, Helm charts and inference software like TensorRT and Triton Inference Server.
P4d instances are now available in US East and West, and coming to additional regions soon. The instances can be purchased as On-Demand, with Savings Plans, with Reserved Instances, or as Spot Instances.
The first decade of GPU cloud computing has brought over 100 exaflops of AI compute to the market. With the arrival of the Amazon EC2 P4d instance powered by NVIDIA A100 GPUs, the next decade of GPU cloud computing is off to a great start.
NVIDIA and AWS are making it possible for applications to continue pushing the boundaries of AI across a wide array of applications. We can’t wait to see what customers will do with it.
The popularity of public cloud offerings is evident — just look at how top cloud service providers report double-digit growth year over year.
However, application performance requirements and regulatory compliance issues, to name two examples, often require data to be stored locally to reduce distance and latency and to place data entirely within a company’s control. In these cases, standard private clouds also may offer less flexibility, agility or on-demand capacity.
To help resolve these issues, Lenovo, Microsoft and NVIDIA have engineered a hyperconverged hybrid cloud that enables Azure cloud services within an organization’s data center.
By integrating Lenovo ThinkAgile SX, Microsoft Azure Stack Hub and NVIDIA Mellanox networking, organizations can deploy a turnkey, rack-scale cloud that’s optimized with a resilient, highly performant and secure software-defined infrastructure.
Fully Integrated Azure Stack Hub Solution
Lenovo ThinkAgile SX for Microsoft Azure Stack Hub satisfies regulatory compliance and removes performance concerns. Because all data is kept on secure servers in a customer’s data center, it’s much simpler to comply with the governance laws of a country and implement their own policies and practices.
Similarly, by reducing the distance that data must travel, latency is reduced and application performance goals can be more easily achieved. At the same time, customers can cloud-burst some workloads to the Microsoft Azure public cloud, if desired.
Lenovo, Microsoft and NVIDIA worked together to make sure everything performs right out of the box. There’s no need to worry about configuring and adjusting settings for virtual or physical infrastructure.
The power and automation of Azure Stack Hub software, the convenience and reliability of Lenovo’s advanced servers, and the high performance of NVIDIA networking combine to enable an optimized hybrid cloud. Offering the automation and flexibility of Microsoft Azure Cloud with the security and performance of on-premises infrastructure, it’s an ideal platform to:
deliver Azure cloud services from the security of your own data center,
enable rapid development and iteration of applications with on-premises deployment tools,
unify application development across entire hybrid cloud environments, and
easily move applications and data across private and public clouds.
Agility of a Hybrid Cloud
Azure Stack Hub also seamlessly operates with Azure, delivering an orchestration layer that enables the movement of data and applications to the public cloud. This hybrid cloud protects the data and applications that need protection and offers lower latencies for accessing data. And it still provides the public cloud benefits organizations may need, such as reduced costs, increased infrastructure scalability and flexibility, and protection from data loss.
A hybrid approach to cloud computing keeps all sensitive information onsite and often includes centrally used applications that may have some of this data tied to them. With a hybrid cloud infrastructure in place, IT personnel can focus on building proficiencies in deploying and operating cloud services — such as IaaS, PaaS and SaaS — and less on managing infrastructure.
A hybrid cloud requires a network that can handle all data communication between clients, servers and storage. The Ethernet fabric used for networking in the Lenovo ThinkAgile SX for Microsoft Azure Stack Hub leverages NVIDIA Mellanox Spectrum Ethernet switches — powered by the industry’s highest-performing ASICs — along with NVIDIA Cumulus Linux, the most advanced open network operating system.
At 25Gb/s data rates, these switches provide cloud-optimized delivery of data at line-rate. Using a fully shared buffer, they support fair bandwidth allocation and provide predictably low latency, as well as traffic flow prioritization and optimization technology to deliver data without delays, while the hot-swappable redundant power supplies and fans help provide resiliency for business-sensitive traffic.
Modern networks require advanced offload capabilities, including remote direct memory access (RDMA), TCP, overlay networks (for example, VXLAN and Geneve) and software-defined storage acceleration. Implementing these at the network layer frees expensive CPU cycles for user applications while improving the user experience.
To handle the high-speed communications demands of Azure Stack Hub, Lenovo configured compute nodes with dual-port 10/25/100GbE NVIDIA Mellanox ConnectX-4 Lx, ConnectX-5 or ConnectX-6 Dx NICs. The ConnectX NICs are designed to address cloud, virtualized infrastructure, security and network storage challenges. They use native hardware support for RoCE, offer stateless TCP offloads, accelerate overlay networks and support NVIDIA GPUDirect technology to maximize performance of AI and machine learning workloads. All of this results in much needed higher infrastructure efficiency.
RoCE for Improved Efficiency
Microsoft Azure Stack Hub leverages Storage Spaces Direct (S2D) and Microsoft’s Server Message Block Direct 3.0. SMB Direct uses high-speed RoCE to transfer large amounts of data with little CPU intervention. SMB Multichannel allows servers to simultaneously use multiple network connections and provide fault tolerance through the automatic discovery of network paths.
The addition of these two features allows NVIDIA RoCE-enabled ConnectX Ethernet NICs to deliver line-rate performance and optimize data transfer between server and storage over standard Ethernet. Customers with Lenovo ThinkAgile SX servers or the Lenovo ThinkAgile SX Azure Hub can deploy storage on secure file servers while delivering the highest performance. As a result, S2D is extremely fast with disaggregated file server performance, almost equaling that of locally attached storage.
Run More Workloads
By using intelligent hardware accelerators and offloads, the NVIDIA RoCE-enabled NICs offload I/O tasks from the CPU, freeing up resources to accelerate application performance instead of making data wait for the attention of a busy CPU.
The result is lower latencies and an improvement in CPU efficiencies. This maximizes the performance in Microsoft Azure Stack deployments by leaving the CPU available to run other application processes. Efficiency gets a boost since users can host more VMs per physical server, support more VDI instances and complete SQL Server queries more quickly.
A Transformative Experience with a ThinkAgile Advantage
Lenovo ThinkAgile solutions include a comprehensive portfolio of software and services that supports the full lifecycle of infrastructure. At every stage — planning, deploying, supporting, optimizing and end-of-life — Lenovo provides the expertise and services needed to get the most from technology investments.
This includes single-point-of-contact support for all the hardware and software used in the solution, including Microsoft’s Azure Stack Hub and the ConnectX NICs. Customers never have to worry about who to call — Lenovo takes calls and drives them to resolution.
Hate hunting and pecking away at your keyboard every time you have a quick question? You’ll love this.
Microsoft’s Bing search engine has turned to Turing-NLG and NVIDIA GPUs to suggest full sentences for you as you type.
Turing-NLG is a cutting-edge, large-scale unsupervised language model that has achieved strong performance on language modeling benchmarks.
It’s just the latest example of an AI technique called unsupervised learning, which makes sense of vast quantities of data by extracting features and patterns without the need for humans to provide any pre-labeled data.
Microsoft calls this Next Phrase Prediction, and it can feel like magic, making full-phrase suggestions in real time for long search queries.
Turing-NLG is among several innovations — from model compression to state caching and hardware acceleration — that Bing has harnessed with Next Phrase Prediction.
Over the summer, Microsoft worked with engineers at NVIDIA to optimize Turing-NLG to their needs, accelerating the model on NVIDIA GPUs to power the feature for users worldwide.
A key part of this optimization was to run this massive AI model extremely fast to power a real-time search experience. With a combination of hardware and model optimization, Microsoft and NVIDIA achieved an average latency below 10 milliseconds.
By contrast, it takes more than 100 milliseconds to blink your eye.
Before the introduction of Next Phrase Prediction, the approach for handling query suggestions for longer queries was limited to completing the current word being typed by the user.
Now type in “The best way to replace,” and you’ll immediately see three suggestions for completing the phrase: wood, plastic and metal. Type in “how can I replace a battery for,” and you’ll see “iphone, samsung, ipad and kindle” all suggested.
With Next Phrase Prediction, Bing can now present users with full-phrase suggestions.
The more characters you type, the closer Bing gets to what you probably want to ask.
And because these suggestions are generated instantly, they’re not limited to previously seen data or just the current word being typed.
So, for some queries, Bing won’t just save you a few keystrokes — but multiple words.
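Conceptually, the difference between the old word completion and full-phrase prediction can be shown with a toy lookup. The tiny corpus and vocabulary here are made-up stand-ins; Bing's feature is driven by the Turing-NLG neural model, not a table like this:

```python
# Toy contrast between completing the current word and predicting a full phrase.
# PHRASES and VOCAB are hypothetical stand-ins for a trained language model.

PHRASES = [
    "the best way to replace wood",
    "the best way to replace plastic",
    "the best way to replace metal",
]
VOCAB = ["replace", "replay", "repeat"]

def complete_word(prefix: str) -> list[str]:
    """Old behavior: only finish the word currently being typed."""
    last = prefix.split()[-1]
    return [w for w in VOCAB if w.startswith(last)]

def predict_phrase(prefix: str) -> list[str]:
    """Next Phrase Prediction: suggest entire continuations."""
    return [p for p in PHRASES if p.startswith(prefix)]

print(complete_word("the best way to repl"))      # candidate word endings only
print(predict_phrase("the best way to replace"))  # whole-phrase suggestions
```

The generative model generalizes where this lookup cannot: it composes continuations on the fly instead of retrieving previously seen strings.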
As a result of this work, the coverage of autosuggestion completions increases considerably, Microsoft reports, improving the overall user experience “significantly.”
Ming-Yu Liu and Arun Mallya were on a video call when one of them started to break up, then freeze.
It’s an irksome reality of life in the pandemic that most of us have shared. But unlike most of us, Liu and Mallya could do something about it.
They are AI researchers at NVIDIA and specialists in computer vision. Working with colleague Ting-Chun Wang, they realized they could use a neural network in place of the software called a video codec typically used to compress and decompress video for transmission over the net.
Their work enables a video call with one-tenth the network bandwidth users typically need. It promises to reduce bandwidth consumption by orders of magnitude in the future.
“We want to provide a better experience for video communications with AI so even people who only have access to extremely low bandwidth can still upgrade from voice to video calls,” said Mallya.
Better Connections Thanks to GANs
The technique works even when callers are wearing a hat, glasses, headphones or a mask. And just for fun, they spiced up their demo with a couple of bells and whistles so users can change their hair styles or clothes digitally or create an avatar.
A more serious feature in the works (shown at top) uses the neural network to align the position of users’ faces for a more natural experience. Callers watch their video feeds, but they appear to be looking directly at their cameras, enhancing the feeling of a face-to-face connection.
“With computer vision techniques, we can locate a person’s head over a wide range of angles, and we think this will help people have more natural conversations,” said Wang.
Say hello to the latest way AI is making virtual life more real.
How AI-Assisted Video Calls Work
The mechanism behind AI-assisted video calls is simple.
A sender first transmits a reference image of the caller, just like today’s systems that typically use a compressed video stream. Then, rather than sending a fat stream of pixel-packed images, it sends data on the locations of a few key points around the user’s eyes, nose and mouth.
A generative adversarial network on the receiver’s side uses the initial image and the facial key points to reconstruct subsequent images on a local GPU. As a result, much less data is sent over the network.
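The scale of the saving is easy to quantify. The frame resolution and key-point count below are assumptions for illustration, and the comparison is against raw, uncompressed frames; real codecs compress heavily, which is why the measured saving is on the order of 10x rather than this raw ratio:

```python
# Compare per-frame payloads: raw pixels vs. facial key points.
# Resolution and key-point count are illustrative assumptions.

width, height, channels = 1280, 720, 3
frame_bytes = width * height * channels   # one uncompressed 720p RGB frame

num_keypoints = 10
keypoint_bytes = num_keypoints * 2 * 4    # (x, y) pairs as 32-bit floats

print(frame_bytes)                     # 2764800 bytes per frame
print(keypoint_bytes)                  # 80 bytes per frame
print(frame_bytes // keypoint_bytes)   # raw ratio: 34560x
```

Only the one-time reference image travels at full size; every subsequent frame costs roughly the key-point payload, with the receiver's GPU doing the reconstruction work.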
Liu’s work in GANs hit the spotlight last year with GauGAN, an AI tool that turns anyone’s doodles into photorealistic works of art. GauGAN has already been used to create more than a million images and is available at the AI Playground.
“The pandemic motivated us because everyone is doing video conferencing now, so we explored how we can ease the bandwidth bottlenecks so providers can serve more people at the same time,” said Liu.
GPUs Bust Bandwidth Bottlenecks
The approach is part of an industry trend of shifting network bottlenecks into computational tasks that can be more easily tackled with local or cloud resources.
“These days lots of companies want to turn bandwidth problems into compute problems because it’s often hard to add more bandwidth and easier to add more compute,” said Andrew Page, a director of advanced products in NVIDIA’s media group.
AI Instruments Tune Video Services
GAN video compression is one of several capabilities coming to NVIDIA Maxine, a cloud-AI video-streaming platform to enhance video conferencing and calls. It packs audio, video and conversational AI features in a single toolkit that supports a broad range of devices.
Announced this week at GTC, Maxine lets service providers deliver video at super resolution with real-time translation, background noise removal and context-aware closed captioning. Users can enjoy features such as face alignment, support for virtual assistants and realistic animation of avatars.
“Video conferencing is going through a renaissance,” said Page. “Through the pandemic, we’ve all lived through its warts, but video is here to stay now as a part of our lives going forward because we are visual creatures.”
Maxine harnesses the power of NVIDIA GPUs with Tensor Cores running software such as NVIDIA Jarvis, an SDK for conversational AI that delivers a suite of speech and text capabilities. Together, they deliver AI capabilities that are useful today and serve as building blocks for tomorrow’s video products and services.