How to choose the best GPU for your machine learning project

Scaling Up Your Deep Learning: GPUs, Cloud, and Everything in Between

As more organizations implement machine learning operations (MLOps), teams are looking for ways to speed up their processes. This is especially true for organizations working with deep learning (DL), where training and inference can take a very long time to run. You can speed up these workloads by using graphics processing units (GPUs) on-premises or in the cloud.

A GPU is a powerful tool for speeding up a data pipeline that includes a deep neural network (DNN). The first reason to use a GPU is that DNN inference runs 3-4 times faster on a GPU than on a CPU at the same price. The second reason is that it takes load off the CPU, which allows more work to be done on the same instance and reduces network load.

A typical deep learning pipeline with a GPU consists of:

– Data preprocessing (CPU)

– DNN execution: training or inference (GPU)

– Data post-processing (CPU)

Data transfer between CPU RAM and GPU RAM is the most common bottleneck. Therefore, there are two main aims when designing a data science pipeline architecture. The first is to reduce the number of data transfers by aggregating several samples (images) into a batch. The second is to reduce the size of each sample by filtering the data before transferring it.
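The two aims above can be sketched in plain Python. This is a minimal illustration rather than the pipeline described in the quote: `keep_sample`, `make_batches`, and the sample dictionaries are hypothetical, and each yielded batch simply stands in for one CPU-to-GPU transfer.

```python
def keep_sample(sample):
    """Hypothetical filter: drop samples that do not need GPU processing."""
    return sample["has_object"]


def make_batches(samples, batch_size):
    """Group samples into lists of at most batch_size items.

    Each yielded batch stands in for one CPU-to-GPU transfer.
    """
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


# Ten hypothetical images; every third one is marked as not worth sending.
samples = [{"id": i, "has_object": i % 3 != 0} for i in range(10)]

# Aim 2: filter on the CPU before any transfer happens.
filtered = [s for s in samples if keep_sample(s)]

# Aim 1: aggregate the remaining samples into batches.
batches = list(make_batches(filtered, batch_size=4))

transfers_unbatched = len(samples)  # one transfer per raw sample
transfers_batched = len(batches)    # one transfer per batch
print(transfers_unbatched, transfers_batched)
```

In a real pipeline the batch would typically be a single contiguous array copied to GPU memory in one call, and the best batch size depends on the model and the available GPU memory.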

Ilia Iorin - Data Science Engineer at MobiDev

How are GPUs Advancing Deep Learning?

Training and deploying DL models requires deep neural networks (DNNs) and datasets with hundreds of thousands of data points. These networks require significant resources, including memory, storage, and processing power. While central processing units (CPUs) can provide this power, graphics processing units (GPUs) can substantially speed up the process.

GPUs are specialized microprocessors designed to process many tasks in parallel, and they can be optimized to increase performance in artificial intelligence and deep learning workloads. In particular, the benefits of using GPUs with deep learning include:

  • The number of cores: GPUs can have a large number of cores, can be clustered, and can be combined with CPUs. This enables you to significantly increase processing power.
  • Higher memory bandwidth: GPUs can offer higher memory bandwidth than CPUs (up to 750GB/s vs 50GB/s). This enables you to more easily handle the large amounts of data required for deep learning.
  • Flexibility: GPUs' capacity for parallelism enables you to combine GPUs in clusters and distribute tasks across the cluster. Alternatively, you can use GPUs individually, with each one assigned to training an individual algorithm.
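To make the bandwidth figures above concrete, the following sketch estimates how long it would take to stream a dataset through memory once at each peak rate. The 30 GB dataset size is an assumed example value, and real throughput is usually below peak bandwidth, so treat the result as an idealized comparison only.

```python
def stream_time_seconds(data_gb, bandwidth_gb_per_s):
    """Idealized time to read data_gb once at the given peak bandwidth."""
    return data_gb / bandwidth_gb_per_s


DATASET_GB = 30.0      # assumed example size, not a figure from the article
GPU_BANDWIDTH = 750.0  # GB/s, upper figure quoted in the list above
CPU_BANDWIDTH = 50.0   # GB/s, figure quoted in the list above

gpu_time = stream_time_seconds(DATASET_GB, GPU_BANDWIDTH)
cpu_time = stream_time_seconds(DATASET_GB, CPU_BANDWIDTH)
ratio = cpu_time / gpu_time

print(f"GPU: {gpu_time:.2f} s, CPU: {cpu_time:.2f} s, ratio: {ratio:.0f}x")
```

The ratio is independent of the dataset size: it is simply 750 / 50 = 15.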

On-Premises GPU Options for Deep Learning

When using GPUs for on-premises implementations you have multiple vendor options. Two popular choices are NVIDIA and AMD.


NVIDIA

NVIDIA is a popular option at least in part because of the libraries it provides, known as the CUDA toolkit. These libraries made it easy to set up deep learning workloads and formed the base of a strong machine learning community around NVIDIA products. This can be seen in the widespread support that many DL libraries and frameworks provide for NVIDIA hardware.

In addition to GPUs, the company also offers libraries supporting popular DL frameworks, including PyTorch. The Apex library in particular is useful and includes several fused, fast optimizers, such as FusedAdam. 

The downside of NVIDIA is its licensing. The company changed the license terms for its consumer GeForce drivers to prohibit data center deployment. As a result, data center workloads are expected to run on Tesla GPUs rather than on the less expensive RTX or GTX hardware.

This has serious budget implications for organizations training DL models. It is especially problematic because Tesla GPUs do not offer significantly more performance than the other options, yet the units cost up to 10x as much.


AMD

AMD provides its own libraries, known as ROCm. These libraries are supported by TensorFlow and PyTorch as well as all major network architectures. However, support for the development of new networks is limited, as is community support.

Another issue is that AMD does not invest as much in its deep learning software as NVIDIA does. Because of this, apart from their lower price points, AMD GPUs offer limited functionality compared with NVIDIA.

Cloud Computing with GPUs

An option that is growing in popularity with organizations training DL models is the use of cloud resources. These resources can provide pay-for-use access to GPUs in combination with optimized machine learning services. The three major providers (Azure, AWS, and Google Cloud) all offer GPU resources along with a host of configuration options.


Microsoft Azure

Azure offers a variety of instance options for GPU access. These instances have been optimized for high computation tasks, including visualization, simulations, and deep learning.

Within Azure, there are three main series of instances you can choose from: 

  • NC-series: instances optimized for network and compute-intensive workloads, such as CUDA and OpenCL-based simulations and applications. Instances provide high performance based on NVIDIA Tesla GPUs such as the V100, paired with Intel Haswell or Intel Broadwell CPUs.
  • ND-series: instances optimized for inference and training scenarios for deep learning. Instances provide access to NVIDIA Tesla GPUs such as the P40, paired with Intel Broadwell or Intel Skylake CPUs.
  • NV-series: instances optimized for virtual desktop infrastructures, streaming, encoding, or visualizations, with support for DirectX and OpenGL. Instances provide access to NVIDIA Tesla M60 or AMD Radeon Instinct MI25 GPUs.


AWS

In AWS you can choose from four different options, each available in a variety of instance sizes. Options include EC2 P3, P2, G4, and G3 instances. These options enable you to choose between NVIDIA Tesla V100, K80, T4 Tensor Core, or M60 GPUs. You can scale up to 16 GPUs depending on the instance.

To enhance these instances, AWS also offers Amazon Elastic Graphics, a service that enables you to attach low-cost GPU options to your EC2 instances. This enables you to use GPUs with any compatible instance as needed. This service provides greater flexibility for your workloads. Elastic Graphics provides support for OpenGL 4.3 and can offer up to 8GB of graphics memory.

Google Cloud

Rather than dedicated GPU instances, Google Cloud enables you to attach GPUs to your existing instances. For example, if you are using Google Kubernetes Engine you can create node pools with access to a range of GPUs. These include NVIDIA Tesla K80, P100, P4, V100, and T4 GPUs.

Google Cloud also offers the Tensor Processing Unit (TPU). Unlike a GPU, a TPU is a custom application-specific integrated circuit (ASIC) designed for performing fast matrix multiplication. It provides similar performance to Tesla V100 instances with Tensor Cores enabled. The benefit of TPUs is that they can provide cost savings through parallelization.

Each TPU is the equivalent of four GPUs, enabling comparatively larger deployments. Additionally, TPUs are now at least partially supported by PyTorch.

Should You Choose On-Premises or Cloud for Deep Learning Infrastructure?

When the time comes to choose your infrastructure you need to decide between an on-premises and a cloud approach. Cloud resources can significantly lower the financial barrier to building a DL infrastructure. 

These services can also provide scalability and provider support. However, these infrastructures are best suited to short-term projects, since sustained resource use can cause costs to balloon.

In contrast, on-premises infrastructures are more expensive upfront but provide you with greater flexibility. You can use your hardware for as many experiments as you want over as long a period as you want with stable costs. You also retain full control over your configurations, security, and data.

For organizations that are just getting started, cloud infrastructures make more sense. These deployments enable you to get up and running with minimal upfront investment and give you time to refine your processes and requirements. However, once your operation grows large enough, switching to on-premises could be the better choice.
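One way to frame this decision is as a break-even calculation. The sketch below uses entirely hypothetical prices (`ONPREM_UPFRONT`, `ONPREM_HOURLY`, `CLOUD_HOURLY`) and ignores factors such as hardware depreciation and staffing, so substitute your own figures before drawing conclusions.

```python
def break_even_hours(onprem_upfront, onprem_hourly, cloud_hourly):
    """Hours of GPU use after which on-premises becomes cheaper than cloud.

    Returns None if the cloud hourly rate does not exceed the on-premises
    running cost, in which case on-premises never catches up.
    """
    hourly_saving = cloud_hourly - onprem_hourly
    if hourly_saving <= 0:
        return None
    return onprem_upfront / hourly_saving


# Hypothetical figures for illustration only.
ONPREM_UPFRONT = 10_000.0  # hardware purchase, USD
ONPREM_HOURLY = 0.50       # power, cooling, maintenance per GPU-hour, USD
CLOUD_HOURLY = 3.00        # pay-for-use rate per GPU-hour, USD

hours = break_even_hours(ONPREM_UPFRONT, ONPREM_HOURLY, CLOUD_HOURLY)
print(f"Break-even after about {hours:.0f} GPU-hours")
```

If your expected usage is well below the break-even point, the cloud is likely cheaper; if it is well above, on-premises hardware starts to pay for itself.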


Conclusion

To advance quickly, machine learning workloads require high processing capabilities. Compared with CPUs, GPUs can provide more processing power, higher memory bandwidth, and a capacity for parallelism.

You can use GPUs on-premises or in the cloud. Popular on-premises GPU vendors include NVIDIA and AMD. Cloud-based GPUs are offered by many cloud vendors, including the top three: Azure, AWS, and Google Cloud. When choosing between on-premises and cloud GPU resources, you should consider budget and skills.

On-premises resources typically come with a high upfront cost, but costs can stabilize in the long term. However, if you do not have the necessary skills to operate on-premises resources, you should consider machine learning services or cloud offerings, which can be easier to scale and often come with managed options.
