For radiology to benefit from AI, there need to be easy, consistent and scalable ways for hospital IT departments to implement the technology. This means a return to a service-oriented architecture, where logical components are separated and can each scale individually, and an efficient use of the additional compute power these tools require.
AI is coming from dozens of vendors as well as internal innovation groups, and needs a place within the hospital network to thrive. That’s why NVIDIA and the American College of Radiology (ACR) have published a Hospital AI Reference Architecture Framework. It helps hospitals easily get started with AI initiatives.
A Cookbook to Make AI Easy
The Hospital AI Reference Architecture Framework was published at yesterday’s annual ACR meeting for public comment. This follows the recent launch of the ACR AI-LAB, which aims to standardize and democratize AI in radiology. The ACR AI-LAB uses infrastructure such as NVIDIA GPUs and the NVIDIA Clara AI toolkit, as well as GE Healthcare’s Edison platform, which helps bring AI from research into FDA-cleared smart devices.
The Hospital AI Reference Architecture Framework outlines how hospitals and researchers can easily get started with AI initiatives. It includes descriptions of the steps required to build and deploy AI systems, and provides guidance on the infrastructure needed for each step.
To drive an effective AI program within a healthcare institution, there must first be an understanding of the workflows involved, compute needs and data required. It comes from a foundation of enabling better insights from patient data with easy-to-deploy compute at the edge.
Using a transfer client, seed models can be downloaded from a centralized model store. A clinical champion uses an annotation tool to locally create data that can be used for fine-tuning the seed model or training a new model. Then, using the training system with the annotated data, a localized model is instantiated. Finally, an inference engine is used to conduct validation and ultimately inference on data within the institution.
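The four workflows above can be sketched in miniature. Everything here is an illustrative stand-in — the function names, the dictionary "model" and the toy update rule are invented for the sketch and are not the actual AI-LAB or Clara APIs:

```python
# Toy sketch of the four workflows: fetch a seed model, annotate data locally,
# fine-tune it, then run inference. The "model" is just a pair of weights.

def fetch_seed_model():
    """Stand-in for the transfer client downloading from a model store."""
    return {"weights": [0.5, 0.5], "version": "seed-1"}

def annotate(studies):
    """Stand-in for a clinical champion labeling local studies."""
    return [(s, 1.0 if s > 0 else 0.0) for s in studies]

def fine_tune(model, labeled, lr=0.1, epochs=20):
    """Nudge the toy weights toward the locally annotated labels."""
    w0, w1 = model["weights"]
    for _ in range(epochs):
        for x, y in labeled:
            err = (w0 + w1 * x) - y
            w0 -= lr * err
            w1 -= lr * err * x
    return {"weights": [w0, w1], "version": "local-1"}

def infer(model, x):
    """Stand-in for the inference engine scoring new data."""
    w0, w1 = model["weights"]
    return w0 + w1 * x

seed = fetch_seed_model()
local = fine_tune(seed, annotate([-2.0, -1.0, 1.0, 2.0]))
```

The point of the sketch is the hand-off: the seed model never has to leave the institution's network to be adapted to local data.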
These four workflows sit atop AI compute infrastructure, which can be accelerated with NVIDIA GPU technology for best performance, alongside storage for models and annotated studies. These workflows tie back into other hospital systems such as PACS, where medical images are archived.
Three Magic Ingredients: Hospital Data, Clinical AI Workflows, AI Computing
Healthcare institutions don’t have to build the systems to deploy AI tools themselves.
This scalable architecture is designed to support and provide computing power to solutions from different sources. GE Healthcare’s Edison platform now uses NVIDIA’s TRT-IS inference capabilities to help AI run in an optimized way within GPU-powered software and medical devices. This integration makes it easier to deliver AI from multiple vendors into clinical workflows — and is the first example of the AI-LAB’s efforts to help hospitals adopt solutions from different vendors.
Together, Edison and TRT-IS offer a ready-made device inferencing platform optimized for GPU-compliant AI, so models built anywhere can be deployed in an existing healthcare workflow.
Hospitals and researchers are empowered to embrace AI technologies without building their own standalone technology or yielding their data to the cloud, which has privacy implications.
For enterprises looking to get their GPU-accelerated AI and data science projects up and running more quickly, life just got easier.
At Red Hat Summit today, NVIDIA and Red Hat introduced the combination of NVIDIA’s GPU-accelerated computing platform and the just-announced Red Hat OpenShift 4 to speed on-premises Kubernetes deployments for AI and data science.
The result: Kubernetes management tasks that used to take an IT administrator the better part of a day can now be completed in under an hour.
More GPU Acceleration, Less Deployment Hassle
This collaboration comes at a time when enterprises are relying on AI and data science to turn their vast amounts of data into actionable intelligence.
But meaningful AI and data analytics work requires accelerating the full stack of enterprise IT software with GPU computing. Every layer of software — from NVIDIA drivers to container runtimes to application frameworks — needs to be optimized.
Our CUDA parallel computing architecture and CUDA-X acceleration libraries have been embraced by a community of more than 1.2 million developers for accelerating applications across a broad set of domains — from AI to high-performance computing to VDI.
And because NVIDIA’s common architecture runs on every computing device imaginable — from a laptop to the data center to the cloud — the investment in GPU-accelerated applications is easy to justify and just makes sense.
Accelerating AI and data science workloads is only the first step, however. Getting the optimized software stack deployed the right way in large-scale, GPU-accelerated data centers can be frustrating and time consuming for IT organizations. That’s where our work with Red Hat comes in.
Red Hat OpenShift is the leading enterprise-grade Kubernetes platform in the industry. Advancements in OpenShift 4 make it easier than ever to deploy Kubernetes across a cluster. Red Hat’s investment in Kubernetes Operators, in particular, reduces administrative complexity by automating many routine data center management and application lifecycle management tasks.
NVIDIA has been working on its own GPU operator to automate a lot of the work IT managers previously did through shell scripts, such as installing device drivers, ensuring the proper GPU container runtimes are present on all nodes in the data center, and monitoring GPUs.
Thanks to our work with Red Hat, once the cluster is set up, you simply run the GPU operator to add the necessary dependencies to the worker nodes in the cluster. It’s just that easy. This can make it as simple for an organization to get its GPU-powered data center clusters up and running with OpenShift 4 as it is to spin up new cloud resources.
Preview and Early Access Program
At Red Hat Summit, in booth 1039, we’re showing a preview of how easy it is to set up bare-metal GPU clusters with OpenShift and GPU operators.
Also, you won’t want to miss Red Hat Chief Technology Officer Chris Wright’s keynote on Thursday when NVIDIA Vice President of Compute Software Chris Lamb will join him on stage to demonstrate how our technologies work together and discuss our collaboration in further detail.
Red Hat and NVIDIA are inviting joint customers to participate in a white-glove early access program. Customers who want to learn more or participate can sign up at https://www.openshift.com/accelerated-ai.
The good news: astronomers are getting new tools that let them see farther and better than ever before. The bad news: they’ll soon be getting more data than humans can handle.
To turn the vast quantities of data that will soon be pouring out of these instruments into world-changing scientific discoveries, Brant Robertson, currently a visiting professor at the Institute for Advanced Study in Princeton and an associate professor of astronomy at UC Santa Cruz, is turning to AI.
“Astronomy is on the cusp of a new data revolution,” he told a packed room at this week’s GPU Technology Conference in Silicon Valley.
Better Eyes on the Sky
Within a few years, the range of instruments available to the world’s star-gazers will give them once-unimagined capabilities. Measuring an enormous 6.5 meters across, the James Webb Space Telescope — which will be deployed by NASA, the U.S. space agency — will be sensitive enough to give us a peek back at galaxies formed just a few hundred million years after the Big Bang.
The Large Synoptic Survey Telescope gets less press, but it has astronomers equally excited. The telescope, largely funded by the U.S. National Science Foundation and the Department of Energy, and being built on a mountaintop in Chile, will give astronomers the ability to survey the entire southern sky every three nights. This will produce a massive amount of data — 10 terabytes a night.
Finally, the Wide Field Infrared Survey Telescope will put an enormous digital camera into space. With origins in the U.S. spy satellite program, the satellite’s features will include a 288-megapixel multi-band near-infrared camera with a field of view 100 times larger than that of the Hubble.
‘Richly Complex’ Data
Together, these three instruments will generate vast quantities of “richly complex” data, Robertson said. “We want to take that information and learn as much as we can,” he said. “Both from individual pixels and by aggregating them together.”
It’s a task far too large for humans alone. To keep up, Robertson is turning to AI. Morpheus, a deep learning framework created by Ryan Hausen, a Ph.D. student in UC Santa Cruz’s computer science department, classifies astronomical objects, such as galaxies, on a pixel-by-pixel basis from the raw data streaming out of telescopes such as the Hubble Space Telescope.
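Pixel-by-pixel classification can be illustrated in miniature. Morpheus runs a deep network; the hand-tuned scores below are a toy stand-in, included only to show the shape of a per-pixel output:

```python
# A pixel-by-pixel classifier in miniature: every pixel of a tiny "image"
# gets the label whose score is highest, producing a segmentation map the
# same shape as the input.

image = [
    [0.1, 0.2, 0.9],
    [0.0, 0.8, 0.7],
    [0.1, 0.1, 0.2],
]

def classify_pixel(value):
    # Toy "class scores": brighter pixels look more like a source (e.g. a galaxy).
    scores = {"background": 1.0 - value, "source": value}
    return max(scores, key=scores.get)

segmentation = [[classify_pixel(v) for v in row] for row in image]
```

The output is a label per pixel rather than a label per image, which is what lets astronomers aggregate individual pixels into objects.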
“In astronomy we really do care about the technological advances that people in this room are engineering,” Robertson told his audience at GTC.
Translation: in order to find new stars in outer space, this prominent astrophysicist is looking, first, to deep learning stars here on Earth for help.
NVIDIA’s message was unmistakable as it kicked off the 10th annual GPU Technology Conference: it’s doubling down on the datacenter.
Founder and CEO Jensen Huang delivered a sweeping opening keynote at San Jose State University, describing the company’s progress accelerating the sprawling datacenters that power the world’s most dynamic industries.
With GTC registration at a record 9,000 attendees, he rolled out a spate of new technologies, detailed their broad adoption by industry leaders including Cisco, Dell, Hewlett-Packard Enterprise, and Lenovo, and highlighted how NVIDIA technologies are being used by some of the world’s biggest names, including Accenture, Amazon, Charter Spectrum, Microsoft and Toyota.
“The accelerated computing approach that we pioneered is really taking off,” said Huang, who exactly a week ago announced the company’s $6.9 billion acquisition of Mellanox, a leader in high-performance computing interconnect technology. “If you take a look at what we achieved last year, the momentum is absolutely clear.”
To be sure, Huang also detailed progress outside the data center, rolling out innovations targeting everything from robotics to pro graphics to the automotive industry.
Developers, Developers, Developers
The recurring theme, however, was how NVIDIA’s ability to couple software and silicon delivers the advances in computing power needed to transform torrents of data into insights and intelligence.
“Accelerated computing is not just about the chips,” Huang said. “Accelerated computing is a collaboration, a codesign, a continuous optimization between the architecture of the chip, the systems, the algorithm and the application.”
As a result, the GPU developer ecosystem is growing fast, Huang said. The number of developers has grown to more than 1.2 million from 800,000 last year; there now are 125 GPU-powered systems among the world’s 500 fastest supercomputers; and there are more than 600 applications powered by NVIDIA’s CUDA parallel computing platform.
Mellanox — whose interconnect technology helps power more than half the world’s 500 fastest supercomputers — complements NVIDIA’s strength in datacenters and high-performance computing, Huang said, explaining why NVIDIA agreed to buy the company earlier this month.
Mellanox CEO Eyal Waldman, who joined Huang on stage, said: “We’re seeing a great growth in data, we’re seeing an exponential growth. The program-centric datacenter is changing into a data-centric datacenter, which means the data will flow and create the programs, rather than the programs creating the data.”
Bringing AI to Datacenters
These technologies are all finding their way into the world’s datacenters as businesses seek to turn data into a competitive advantage. Enterprises are building more powerful servers (“scaling up,” or “capability” systems, as Huang called them) and networking those servers more closely together than ever (“scaling out,” or “capacity” systems).
To help businesses move faster, Huang introduced CUDA-X AI, the world’s only end-to-end acceleration libraries for data science. CUDA-X AI arrives as businesses turn to AI — deep learning, machine learning and data analytics — to make data more useful, Huang explained.
The typical workflow for all these: data processing, feature determination, training, verification and deployment. CUDA-X AI unlocks the flexibility of our NVIDIA Tensor Core GPUs to uniquely address this end-to-end AI pipeline.
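The five stages named above can be sketched as a toy pipeline. Each function is a placeholder for the real (GPU-accelerated) step; the names and the trivial threshold "model" are invented for illustration — the point is the hand-off of data from one stage to the next:

```python
# The typical AI workflow as composable stages: process -> featurize ->
# train -> verify -> deploy.

def process(raw):             # data processing: clean and normalize
    peak = max(raw)
    return [x / peak for x in raw]

def featurize(clean):         # feature determination: derive model inputs
    return [(x, x * x) for x in clean]

def train(features):          # training: fit a trivial threshold "model"
    mean = sum(x for x, _ in features) / len(features)
    return {"threshold": mean}

def verify(model, features):  # verification: check the model splits the data
    preds = [x > model["threshold"] for x, _ in features]
    return any(preds) and not all(preds)

def deploy(model):            # deployment: expose the model as a callable
    return lambda x: x > model["threshold"]

raw = [2.0, 4.0, 6.0, 8.0]
feats = featurize(process(raw))
model = train(feats)
assert verify(model, feats)
predictor = deploy(model)
```

Every stage consumes the previous stage's output, which is why accelerating only one of them leaves the others as the bottleneck.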
CUDA-X AI has been adopted by all the major cloud services, including Amazon Web Services, Google Cloud Platform, and Microsoft Azure. It’s been adopted by Charter, PayPal, SAS, and Walmart.
“Think about not just the costs that they’re saving, but the most precious resource that these data scientists have — time and iterations,” said Matt Garman, vice president of computing services at Amazon Web Services.
Turing, RTX, and Omniverse
NVIDIA’s Turing GPU architecture — and its RTX real-time ray tracing technology — is also being widely adopted. Huang highlighted more than 20 partners supporting RTX, including Adobe, Autodesk, Dassault Systèmes, Pixar, Siemens, Unity, Unreal, and Weta Digital.
And to support the fast-growing numbers of creative professionals working across an increasingly complex pipeline around the globe, Huang introduced Omniverse, which enables creative professionals to harness multiple applications to create and share scenes across different teams and locations. He described it as a collaboration tool, like Google Docs for 3D designers, who could be located anywhere in the world while working on the same project.
“We wanted to make a tool that made it possible for studios all around the world to collaborate,” Huang said. “Omniverse basically connects up all the designers in the studios, it works with every tool.”
To speed the work of graphics pros using these, and other tools, Huang introduced the NVIDIA RTX Server, a reference architecture that will be delivered with top system vendors.
The massive power savings alone mean these machines don’t just accelerate your work, they pay for themselves. “I used to say ‘The more you buy the more you save,’ but I think I was wrong,” Huang said, with a smile. “RTX Servers are free.”
To accelerate data preparation, model training and visualization, Huang also introduced the NVIDIA-powered Data Science Workstation. Built with Quadro RTX GPUs and pre-installed with CUDA-X AI accelerated machine learning and deep learning software, these systems for data scientists are available from global workstation providers.
Bringing gaming technology to the datacenter, as well, Huang announced the GeForce NOW Alliance. Built around specialized pods, each packing 1,280 GPUs in 10 racks, all interconnected with Mellanox high-speed interconnect technology, it expands NVIDIA’s GFN online gaming service through partnerships with global telecoms providers.
Together, GeForce NOW Alliance partners will scale GeForce NOW to serve millions more gamers, Huang said. Softbank and LG Uplus will be among the first partners to deploy RTX cloud gaming servers in Japan and Korea later this year.
To underscore his announcement, he rolled a witty demo featuring characters in high-tech armor at a futuristic firing range, drawing broad applause from the audience. “Very few tech companies get to sit at the intersection of art and science and it’s such a thrill to be here,” Huang said. “NVIDIA is the ILM of real-time computer graphics and you can see it here.”
Inviting makers to build on NVIDIA’s platform, Huang announced Jetson Nano. It’s a small, powerful CUDA-X AI computer that delivers 472 GFLOPS of compute performance for running modern AI workloads while consuming just 5 watts. It supports the same architecture and software powering America’s fastest supercomputers.
Jetson Nano will come in two flavors: a $99 dev kit for makers, developers, learners and students, available now; and a $129 production-ready module for creating mass-market AI-powered edge systems, available in June 2019.
“Here’s the amazing thing about this little thing,” Huang said. “It’s 99 dollars — the whole computer — and if you use Raspberry Pi and you just don’t have enough computer performance, you just get yourself one of these, and it runs the entire CUDA-X AI stack.”
Huang also announced the general availability of the Isaac SDK, a toolbox that saves manufacturers, researchers and startups hundreds of hours by making it easier to add AI for perception, navigation and manipulation into next-generation robots.
Huang finished his keynote with a flurry of automotive news.
“Today we are announcing that the world’s largest car company is partnering with us from end to end,” Huang said.
The deal builds on an ongoing relationship with Toyota to utilize DRIVE AGX Xavier for AV compute, and expands the collaboration to testing and validation using DRIVE Constellation, which is now available and allows automakers to simulate billions of miles of driving in all conditions.
And Huang announced Safety Force Field — a driving policy designed to shield self-driving cars from collisions, a sort of “cocoon” of safety.
“We have a computational method that detects the surrounding cars and predicts their natural path – knowing our own path – and computationally avoids traffic,” Huang said, adding that the open software has been validated in simulation and can be combined with any driving software.
Whether advancing science, building self-driving cars or gathering business insight from mountains of data, data scientists, researchers and developers need powerful GPU compute. They also need the right software tools.
AI is complex and building models can be time consuming. So container technology plays a vital role in simplifying complex deployments and workflows.
At GTC 2019, we’ve supercharged NGC — a hub of essential software for deep learning, machine learning, HPC and more — with pre-trained AI models, model training scripts and industry-specific software stacks.
With these new tools, no matter your skill level, you can quickly and easily realize value with AI.
NGC Takes Care of the Plumbing, So You Can Focus on Your Business
Data scientists’ time is expensive, and the compute resources they need to develop models are in high demand. If they spend hours or even days compiling a framework from source only to find errors, that’s a loss of productivity, revenue and competitive edge.
Thousands of data scientists and developers have pulled performance-optimized deep learning framework containers like TensorFlow and PyTorch, updated monthly, from NGC because they can bypass time-consuming and error-prone deployment steps and instead focus on building their solutions.
NGC lowers the barrier to entry for companies that want to engage in the latest trends in computing. And for those already engaged, it lets them deliver greater value, faster.
Accelerate AI Projects with Pre-Trained Models and Training Scripts
Many AI applications have common needs: classification, object detection, language translation, text-to-speech, recommender engines, sentiment analysis and more. When developing applications or services with these capabilities, it’s much faster to tune a pre-trained model for your use case than to start from scratch.
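Why tuning a pre-trained model beats starting from scratch can be shown in miniature: keep "pretrained" base weights frozen and fit only a small task-specific head. The numbers and the one-weight head below are invented for illustration, not any NGC model's actual structure:

```python
# Transfer learning in miniature: a frozen pretrained "base" plus a tiny
# trainable "head". Only the head weight is updated.

PRETRAINED_BASE = 2.0   # imagine this weight took weeks of large-scale training

def base_features(x):
    return PRETRAINED_BASE * x   # frozen: never updated below

def fit_head(samples, lr=0.05, epochs=50):
    w = 0.0                      # only the head weight is learned
    for _ in range(epochs):
        for x, y in samples:
            feat = base_features(x)
            err = w * feat - y
            w -= lr * err * feat
    return w

# Task: y = 6x. On top of the frozen base (feature = 2x), the head only
# needs to learn w = 3 — a far smaller search than learning both weights.
head = fit_head([(1.0, 6.0), (2.0, 12.0), (-1.0, -6.0)])
```

Because the base is reused as-is, the retraining step needs only a handful of labeled examples, which is exactly the economy the model registry is meant to exploit.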
NGC’s new model registry provides data scientists and researchers with a repository of the most popular AI models, giving them a starting point to retrain, benchmark and rapidly build their AI applications.
NGC enterprise account holders also can upload, share and version their own models across their organizations and teams through a hosted private registry. The model registry is accessible through https://ngc.nvidia.com and a command line interface, so users can deploy it in a hybrid cloud environment and provide their organizations with controlled access to versioned models.
NGC also provides model training scripts with best practices that take advantage of mixed precision powered by the NVIDIA Tensor Cores that enable NVIDIA Turing and Volta GPUs to deliver up to 3x performance speedups in training and inference over previous generations.
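The core idea behind mixed-precision training can be shown without a GPU: small gradients can underflow to zero in a 16-bit format, so the loss is scaled up before the backward pass and the gradients scaled back down afterward. The flush-to-zero float below is a toy stand-in for real FP16 behavior, not how Tensor Cores actually operate:

```python
# Loss scaling in miniature: values below TINY "underflow" in our pretend
# low-precision format, so we scale up before converting and scale back
# down in full precision.

TINY = 1e-4  # pretend anything smaller than this underflows to zero

def to_low_precision(x):
    return 0.0 if abs(x) < TINY else x

grad = 3e-5                                # a true gradient too small for our format

naive = to_low_precision(grad)             # lost: underflows to 0.0

SCALE = 1024.0                             # powers of two keep the scaling exact
scaled = to_low_precision(grad * SCALE)    # survives in low precision
recovered = scaled / SCALE                 # unscale in full precision
```

Frameworks automate the choice of scale; the sketch only shows why scaling is needed at all.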
By offering models and training scripts that have been tested for accuracy and convergence, NGC provides users with centralization and curation of the most important NVIDIA deep learning assets.
Training and Deployment Stacks for Medical Imaging and Smart Cities
An efficient workflow across industries starts from pre-trained models and then performs transfer learning training with new data. Next, it prunes and optimizes the network, and then deploys to edge devices for inference. The combination of these pre-trained models with transfer learning eliminates the high costs associated with large-scale data collection, labeling and training models from scratch, providing domain experts a jumpstart on their deep learning workflows.
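One of the steps above, pruning, can be sketched in a few lines. Production toolkits prune whole channels or layers; this toy version zeroes out individual small-magnitude weights, purely to show why a pruned network is cheaper to evaluate:

```python
# Magnitude pruning in miniature: keep only the largest-magnitude weights
# and zero out the rest.

def prune(weights, keep_ratio=0.5):
    k = max(1, int(len(weights) * keep_ratio))
    cutoff = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

pruned = prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.002], keep_ratio=0.5)
```

Zeroed weights can be skipped at inference time, which is what makes the optimized model suitable for edge deployment.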
However, the details of training, optimization and deployment differ dramatically by industry. NGC now provides industry-specific workflows for smart cities and medical imaging.
For medical imaging, the NVIDIA Clara Train SDK enables medical institutions to start with pre-trained models of MRI scans for organ segmentation and use transfer learning to improve those models based on datasets owned by that institution. Clara Train produces optimized models, which are then deployed using the NVIDIA Clara Deploy SDK to provide enhanced segmentation on new patient scans.
NGC-Ready Systems — Validated Platforms Optimized for AI Workloads
NGC-Ready systems, offered by top system manufacturers around the world, are validated by NVIDIA so data scientists and developers can quickly get their deep learning and machine learning workloads up and running optimally.
For maximum performance, systems are powered by NVIDIA V100 GPUs, with 640 Tensor Cores and up to 32GB of memory. For maximum utilization, systems are powered by the new NVIDIA T4 GPUs, which excel across the full range of accelerated workloads — machine learning, deep learning, virtual desktops and HPC. View a list of validated NGC-Ready systems.
Deploy AI Infrastructure with Confidence
The adoption of AI across industries has skyrocketed. This has led IT teams to support new types of workloads, software stacks and hardware for a diverse set of users. While the playing field has changed, the need to minimize system downtime, and keep users productive, remains critical.
To address this concern, we’ve introduced NVIDIA NGC Support Services, which provide enterprise-grade support to ensure NGC-Ready systems run optimally and maximize system utilization and user productivity. These new services provide IT teams with direct access to NVIDIA subject-matter experts to quickly address software issues and minimize system downtime.
NGC Support Services are available through sellers of NGC-Ready systems, with immediate availability from Cisco for its NGC-Ready validated NVIDIA V100 system, Cisco UCS C480 ML. HPE will offer the services for the HPE ProLiant DL380 Gen10 server as a validated NGC-Ready NVIDIA T4 server in June. Several other OEMs are expected to begin selling the services in the coming months.
Get Started with NGC Today
Pull and run the NGC containers and pre-trained models at no charge on GPU-powered systems or cloud instances at ngc.nvidia.com.
Introduced today at NVIDIA’s GPU Technology Conference, CUDA-X AI is the only end-to-end platform for the acceleration of data science.
CUDA-X AI arrives as businesses turn to AI — deep learning, machine learning and data analytics — to make data more useful.
The typical workflow for all these: data processing, feature determination, training, verification and deployment.
CUDA-X AI unlocks the flexibility of our NVIDIA Tensor Core GPUs to uniquely address this end-to-end AI pipeline.
Capable of speeding up machine learning and data science workloads by as much as 50x, CUDA-X AI consists of more than a dozen specialized acceleration libraries.
It’s already accelerating data analysis with cuDF, deep learning primitives with cuDNN; machine learning algorithms with cuML; and data processing with DALI, among others.
Together, these libraries accelerate every step in a typical AI workflow, whether it involves using deep learning to train speech and image recognition systems or data analytics to assess the risk profile of a mortgage portfolio.
Each step in these workflows requires processing large volumes of data, and each step benefits from GPU accelerated computing.
Backed by Google, Intel, Baidu, NVIDIA and dozens more technology leaders, the new MLPerf benchmark suite measures a wide range of deep learning workloads. Aiming to serve as the industry’s first objective AI benchmark suite, it covers such areas as computer vision, language translation, personalized recommendations and reinforcement learning tasks.
NVIDIA achieved the best performance in the six MLPerf benchmarks for which it submitted results. These cover a variety of workloads and infrastructure scales, ranging from 16 GPUs on a single node to 640 GPUs across 80 nodes.
The six categories include image classification, object instance segmentation, object detection, non-recurrent translation, recurrent translation and recommendation systems. NVIDIA did not submit results for the seventh category, reinforcement learning, which does not yet take advantage of GPU acceleration.
A key benchmark on which NVIDIA technology performed particularly well was language translation, training the Transformer neural network in just 6.2 minutes. More details on all six submissions are available on the NVIDIA Developer news center.
NVIDIA is the only company to have entered as many as six benchmarks, demonstrating the versatility of V100 Tensor Core GPUs for the wide variety of AI workloads deployed today.
“The new MLPerf benchmarks demonstrate the unmatched performance and versatility of NVIDIA’s Tensor Core GPUs,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Exceptionally affordable and available in every geography from every cloud service provider and every computer maker, our Tensor Core GPUs are helping developers around the world advance AI at every stage of development.”
State-of-the-Art AI Computing Requires Full Stack Innovation
Performance on complex and diverse computing workloads takes more than great chips. Accelerated computing is about more than an accelerator. It takes the full stack.
NVIDIA’s AI platform is also the most accessible and affordable. Tensor Core GPUs are available on every cloud and from every computer maker and in every geography.
The same power of Tensor Core GPUs is also available on the desktop, with the most powerful desktop GPU, NVIDIA TITAN RTX, costing only $2,500. When amortized over three years, this translates to just a few cents per hour.
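The arithmetic behind that amortization claim is easy to check, assuming round-the-clock use over the full three years:

```python
# Amortizing a $2,500 GPU over three years of continuous use.

price = 2500.00
hours = 3 * 365 * 24          # 26,280 hours in three years
cost_per_hour = price / hours  # just under ten cents per hour
```

Any realistic duty cycle below 100 percent raises the per-hour figure proportionally, so the number is a floor, not a typical cost.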
NVIDIA’s Record-Setting Platform Available Now on NGC
The software innovations and optimizations used to achieve NVIDIA’s industry-leading MLPerf performance are available free of charge in our latest NGC deep learning containers. Download them from the NGC container registry.
The containers include the complete software stack and the top AI frameworks, optimized by NVIDIA. Our 18.11 release of the NGC deep learning containers includes the exact software used to achieve our MLPerf results.
Developers can use them everywhere, at every stage of development:
For enterprises, the containers accelerate the application of AI to their data in the cloud with NVIDIA GPU-accelerated instances from Alibaba Cloud, AWS, Baidu Cloud, Google Cloud Platform, IBM Cloud, Microsoft Azure, Oracle Cloud Infrastructure and Tencent Cloud.
For organizations building on-premise AI infrastructure, NVIDIA DGX systems and NGC-Ready systems from Atos, Cisco, Cray, Dell EMC, HP, HPE, Inspur, Lenovo, Sugon and Supermicro put AI to work.
To get started on your AI project, or to run your own MLPerf benchmark, download containers from the NGC container registry.
NVIDIA PhysX, the most popular physics simulation engine on the planet, is going open source.
We’re doing this because physics simulation — long key to immersive games and entertainment — turns out to be more important than we ever thought.
Physics simulation dovetails with AI, robotics and computer vision, self-driving vehicles, and high-performance computing.
It’s foundational for so many different things that we’ve decided to provide it to the world as open source.
Meanwhile, we’re building on more than a decade of continuous investment in this area to simulate the world with ever greater fidelity, with on-going research and development to meet the needs of those working in robotics and with autonomous vehicles.
Free, Open-Source, GPU-Accelerated
PhysX will now be the only free, open-source physics solution that takes advantage of GPU acceleration and can handle large virtual environments.
It will be available as open source starting Monday, Dec. 3, under the simple BSD-3 license.
PhysX solves some serious challenges.
In AI, researchers need synthetic data — artificial representations of the real world — to train data-hungry neural networks.
In robotics, researchers need to train robotic minds in environments that work like the real one.
For self-driving cars, PhysX allows vehicles to drive for millions of miles in simulators that duplicate real-world conditions.
In game development, canned animation doesn’t look organic and is time consuming to produce at a polished level.
In high-performance computing, physics simulations are being done on ever more powerful machines with ever greater levels of fidelity.
The list goes on.
PhysX SDK addresses these challenges with scalable, stable and accurate simulations. It’s widely compatible, and it’s now open source.
PhysX SDK is a scalable multi-platform game physics solution supporting a wide range of devices, from smartphones to high-end multicore CPUs and GPUs.
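What a physics engine does at its core is advance positions and velocities through time. A semi-implicit Euler integrator for a bouncing ball, written as a toy stand-in for the rigid-body solvers in an engine like PhysX:

```python
# One body, one axis: integrate velocity then position each timestep, and
# bounce off the ground with some energy lost to the collision.

GRAVITY = -9.81  # m/s^2

def step(pos, vel, dt=0.01, restitution=0.8):
    vel += GRAVITY * dt           # semi-implicit Euler: velocity first...
    pos += vel * dt               # ...then position, using the new velocity
    if pos < 0.0:                 # hit the ground: bounce with energy loss
        pos = 0.0
        vel = -vel * restitution
    return pos, vel

pos, vel = 10.0, 0.0              # drop from 10 meters
for _ in range(1000):             # simulate 10 seconds
    pos, vel = step(pos, vel)
```

A production engine does the same thing for thousands of interacting bodies with collision detection and constraint solving, which is where GPU acceleration pays off.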
It’s already integrated into some of the most popular game engines, including Unreal Engine (versions 3 and 4) and Unity3D.
No matter the industry, data science has become a universal toolkit for businesses. Data analytics and machine learning give organizations insights and answers that shape their day-to-day actions and future plans. Being data-driven has become essential to lead any industry.
While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. For this reason, scientific computing and deep learning have turned to NVIDIA GPU acceleration. Data analytics and machine learning haven’t yet tapped into the GPU as systematically. That’s changing.
RAPIDS, launched today at GTC Europe, gives data scientists for the first time a robust platform for GPU-accelerated data science: analytics, machine learning and, soon, data visualization. And what’s more, the libraries are open-source, built with the support of open-source contributors and available immediately at www.RAPIDS.ai.
Initial benchmarks show game-changing 50x speedups with RAPIDS running on the NVIDIA DGX-2 AI supercomputer, compared with CPU-only systems, reducing experiment iteration from hours to minutes.
By the Community, for the Community
With a suite of CUDA-integrated software tools, RAPIDS gives developers new plumbing under the foundation of their data science workflows.
To make this happen, NVIDIA engineers and open-source Python contributors collaborated for two years. Building on key open-source projects including Apache Arrow, Pandas and scikit-learn, RAPIDS connects the data science ecosystem by bringing together popular capabilities from multiple libraries and adding the power of GPU acceleration.
RAPIDS will also integrate with Apache Spark, the leading open-source data science framework for data centers, used by more than 1,000 organizations.
A data science workshop following the GTC Europe keynote will feature a panel with luminaries of the open-source community — Travis Oliphant and Peter Wang, co-founders of Anaconda, as well as Wes McKinney, founder and creator of Apache Arrow and the Pandas software library, and a contributor to RAPIDS.
These pioneers will discuss the potential for RAPIDS for GPU-accelerated data science before an audience of developers, researchers and business leaders. At the workshop, Databricks, a company founded by the creators of Spark, will present on unifying data management and machine learning tools using GPUs.
It was a natural step for NVIDIA, as the creator of CUDA, to develop the first complete solution that integrates Python data science libraries with CUDA at the kernel level. By keeping it open source, we welcome further growth and contributions from other developers in the ecosystem.
This community is vast: the core data science libraries are downloaded tens of millions of times each year via the Conda package manager. Open-source development makes it easier for data scientists to rapidly adopt RAPIDS and maintain the flexibility to modify and customize tools for their applications.
There are 120 repositories on our GitHub page, including research algorithms, the CUTLASS library for matrix multiplication in CUDA and NVcaffe, our fork of the Caffe deep learning framework. And we’ll continue to contribute to RAPIDS alongside the open-source community, supporting data scientists as they conduct efficient, granular analysis.
Delivering Rapid Answers to Data Science Questions
Data scientists, and the insights they extract, are in high demand. But when relying on CPU systems, there’s always been a limit on how fast they can crunch data.
Depending on the size of their datasets, scientists may have a long wait for results from their machine learning models. And some may aggregate or simplify their data, sacrificing granularity for faster results.
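A toy, standard-library-only illustration of that granularity trade-off (the sales figures are hypothetical):

```python
# Toy illustration (stdlib only, hypothetical data): aggregating data is
# cheap but discards detail that a full, granular analysis keeps.
from statistics import mean

daily_sales = {
    "store_a": [10, 12, 11],
    "store_b": [40, 38, 42],
}

# Simplified view: one chain-wide average. Fast to compute, but it hides
# the large per-store differences.
overall = mean(v for vals in daily_sales.values() for v in vals)

# Granular view: per-store averages, the kind of detail data scientists
# give up when compute is too slow to process the full dataset.
per_store = {store: mean(vals) for store, vals in daily_sales.items()}

print(overall)    # 25.5
print(per_store)  # {'store_a': 11, 'store_b': 40}
```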
With the adoption of RAPIDS and GPUs, data scientists can ramp up iteration and testing, providing more accurate predictions that improve business outcomes. Typical training times can shrink from days to hours, or from hours to minutes.
In the retail industry, this speed could allow grocery chains to estimate the optimal amount of fresh fruit to stock in each store location. For banks, GPU-accelerated insights could alert lenders to homeowners at risk of defaulting on a mortgage.
Access to the RAPIDS open-source suite of libraries is immediately available at www.RAPIDS.ai, where the code is being released under the Apache license. Containerized versions of RAPIDS will be available this week on the NVIDIA GPU Cloud container registry.
Unveiled last month, Turing is one of the biggest leaps in computer graphics in 20 years. As the first Turing-based GPUs hit the shelves, we’re delivering software innovations that help developers take advantage of this powerful architecture and boost computing performance.
Here’s a look at new software optimized for Turing and geared to advance the next generation of research, AI services and graphics.
cuDNN 7.3
A software library built on the CUDA platform, cuDNN accelerates deep learning training. Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. It allows them to focus on training neural networks and developing software applications rather than spending time on low-level GPU performance tuning.
Deep learning frameworks using cuDNN 7.3 can leverage new features and performance of the Turing architecture to deliver faster training performance. cuDNN 7.3 highlights include improved grouped convolution and dilated convolution performance for popular deep learning applications in computer vision and speech.
TensorRT 5
Turing GPUs are loaded with Tensor Cores that accelerate deep learning inference, which is when neural networks are deployed in the field. These cores accelerate applications such as natural language processing, recommender apps and neural machine translation.
With new optimizations and APIs, TensorRT 5 delivers up to 40x faster inference performance over CPUs, helping developers run AI services in real time. In addition to Linux, TensorRT 5 introduces support for new operating systems: Windows and CentOS.
NCCL 2.3
NCCL enables fast communication between GPUs across various network interfaces, allowing deep learning frameworks to deliver efficient multi-node, multi-GPU training at scale. Using NCCL 2.3 and later, developers can leverage new features of the Turing architecture and benefit from improved low-latency algorithms.
NGX SDK
The NVIDIA NGX technology stack provides pre-trained networks and AI-based features that enhance graphics, accelerate video processing and improve image quality. These features rely on Tensor Cores found in RTX GPUs to maximize their efficiency and performance.
With the new RT Cores and advanced shaders on Turing GPUs, the NGX SDK can boost game performance and enhance digital content creation pipelines. The NGX SDK will be available in the next few weeks.
VRWorks Graphics SDK
Turing enables real-time ray tracing, AI and advanced shading techniques to bring virtual reality experiences to a level of realism far beyond the capabilities of traditional VR rendering.
The new VRWorks Graphics SDK improves performance and quality for VR applications by taking advantage of Turing’s variable rate shading abilities to concentrate rendering power where the eye is focused. It will also accelerate the next generation of ultra-wide field of view headsets with multi-view rendering.
Nsight Tools
The Nsight tool suite equips developers with powerful debugging and profiling tools to optimize performance, analyze bottlenecks and observe system activities.
Nsight Systems helps developers identify bottlenecks across their CPUs and GPUs, providing the insights needed to optimize their software. The new version includes CUDA 10 support and other enhancements.
Nsight Compute is an interactive CUDA API debugging and kernel profiling tool. The current version offers fast data collection of detailed performance metrics and API debugging via a user interface and command line tool.
Nsight Visual Studio Edition is an application development environment that allows developers to build, debug, profile and trace GPU applications. The new version includes graphics debugging with ray-tracing support and enhanced compute debugging and analysis with CUDA 10 support.
Nsight Graphics supports debugging, profiling and exporting frames built with popular graphics APIs. The new version makes GPU Trace publicly available and adds support for Vulkan ray tracing extensions.