AWS Deep Learning Guide

Quick guide to set up deep learning in the cloud

First, some basic background:

Amazon Web Services (AWS) lets us run our programs on rented virtual computers. Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides configurable compute capacity in the cloud. To launch a virtual machine (an EC2 instance), you need to select an Amazon Machine Image (AMI), which is a type of virtual appliance that defines the software the instance starts with. Deep learning practitioners typically use the AWS Deep Learning AMI (DLAMI), a machine image built specifically for deep learning tasks. The DLAMI supports many instance types, both CPU- and GPU-powered. Moreover, it comes preconfigured with NVIDIA CUDA and popular deep learning frameworks such as TensorFlow, PyTorch, Caffe, and Keras.

This guide aims to be simple and hands-on. To learn more about AWS and its services, see the detailed documentation that AWS provides. I actually avoid reading it unless I run into a specific problem. Now let’s dive into the setup process.


1: Sign in to EC2 Console

After signing up for an AWS account, sign in to the Console.

Select the EC2 service. You can search for it under Find Services. In the future it will appear under Recently visited services.


2: Choose an instance type (and request a limit increase if you’re a first-timer)

For details on the different instance types, you can check out this page. Since P2 instances are intended for general-purpose GPU compute applications, I decided to use the p2.xlarge instance type, which has a single GPU and costs $0.90/hour. Now that I have an instance type in mind, I need to check its current limit.
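To get a feel for what that hourly rate adds up to, here is a minimal Python sketch. It uses the hourly rate quoted above, which may have changed since; the usage pattern (hours per day, number of days) and the function name are made up for illustration. It only covers compute time and ignores storage and data transfer charges.

```python
from decimal import Decimal

# Hourly on-demand rate quoted in this guide for p2.xlarge (assumption: check current pricing).
P2_XLARGE_HOURLY_USD = Decimal("0.90")


def estimate_compute_cost(hours_per_day, days, hourly_rate=P2_XLARGE_HOURLY_USD):
    """Return the estimated compute cost in USD for a given usage pattern."""
    if hours_per_day < 0 or days < 0:
        raise ValueError("hours_per_day and days must be non-negative")
    return Decimal(hours_per_day) * Decimal(days) * hourly_rate


if __name__ == "__main__":
    # Hypothetical usage: 4 hours a day for 30 days.
    print(f"${estimate_compute_cost(4, 30):.2f}")
    # The same 30 days if the instance is never stopped.
    print(f"${estimate_compute_cost(24, 30):.2f}")
```

Comparing the two numbers is a good reminder of why stopping the instance when you are done matters.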

In the left column of the Console, under EC2 Dashboard, select Limits.

When you first create an account, AWS sets the limits listed on this page, but you have the option to request a limit increase for your selected instance type.

Under Instance Limits, the current limit for Running On-Demand p2.xlarge instances is 0 in the us-west-2 (Oregon) region, so I need to request a limit increase to 1. Click Request limit increase and fill in the request form. Under Case description, I just put “request a limit increase for deep learning projects”. After you submit the request, you should hear back within a few hours. Once the increase has been approved, you don’t need to do this again.


3: Key Pair

The last step before launching an instance is to set up a secure key pair.

  1. Open Terminal
  2. Type ssh-keygen and press Enter. You’ll see the path to your id_rsa key.
  3. cd to the location of your .ssh folder
  4. Type open . to open that folder in the Finder
  5. Copy the id_rsa.pub file in the Finder and paste it to the Desktop or any other directory (you’ll delete the copy later)

Back in the EC2 Console, in the left column under NETWORK & SECURITY, select Key Pairs, then select Import Key Pair. Choose the file you just copied, which is the public key used to connect to your instances. Name your key if you want, then hit Import and you’re done.
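It is easy to grab id_rsa (the private key) instead of id_rsa.pub by mistake. Here is a rough Python sketch that checks whether a file's contents have the general shape of an OpenSSH public key before you import it. The function name and the list of accepted prefixes are my own assumptions, and the check only looks at the format; it does not verify that the key is actually valid.

```python
import base64

# Key type prefixes that commonly start an OpenSSH public key line (assumed list, not exhaustive).
PUBLIC_KEY_PREFIXES = ("ssh-rsa", "ssh-ed25519")


def looks_like_public_key(text):
    """Return True if text has the 'type base64-data [comment]' shape of a public key."""
    parts = text.strip().split()
    if len(parts) < 2 or parts[0] not in PUBLIC_KEY_PREFIXES:
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(parts[1], validate=True)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Made-up contents for illustration; a real key body is much longer.
    fake_public = "ssh-rsa " + base64.b64encode(b"placeholder").decode() + " me@laptop"
    fake_private = "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n"
    print(looks_like_public_key(fake_public))
    print(looks_like_public_key(fake_private))
```

To check your own file, you could read it with pathlib, for example Path("~/.ssh/id_rsa.pub").expanduser().read_text(), and pass the result to looks_like_public_key.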


4: Create launch template

I may create the same instance type multiple times in the future, so it’s easiest to create a template.

Inside the EC2 Console, in the left column under INSTANCES, select Launch Templates, then select Create launch template. Enter a template name and, if you’d like, a description.

Under Launch template contents, fill in the AMI ID by selecting Search for AMI and typing deep learning. The dropdown menu should list something like Deep Learning AMI (Ubuntu) Version xx. Just select the top result in this format.

For the instance type, choose the one you selected earlier. I chose p2.xlarge, so I typed p2.

Lastly, under key pair name, select the key you just imported.

Now just scroll down and hit Create launch template.


5: Launch instance

You should be directed back to the Launch Templates page in the EC2 Console. Click the Actions button and select Launch instance from template. Review everything and launch.

Now you can check your instance under Instances. It may take a while for your Instance State to go from pending to running. Once it’s running, look for the IPv4 Public IP of your instance under Description. For example, your IP may look like 12.13.123.123. Click Copy to clipboard right next to it.

  1. Open Terminal
  2. Type ssh -L localhost:8888:localhost:8888 ubuntu@ followed by the IP you just copied
  3. For example, ssh -L localhost:8888:localhost:8888 ubuntu@12.13.123.123
  4. If prompted Are you sure you want to continue connecting (yes/no)? type yes
  5. Type jupyter notebook
  6. Follow the direction: Copy/paste this URL into your browser when you connect for the first time
  7. In the future, copy and paste the URL shown after The Jupyter Notebook is running at:
  8. To quit Jupyter, press Ctrl+C and then type y
  9. Type source activate tensorflow_p36 to enter the TensorFlow virtual environment with Python 3.
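If you connect often, it can help to build the ssh command from the IP instead of retyping it. Below is a minimal Python sketch of the command from step 2. The function name is hypothetical, and the defaults (user ubuntu, port 8888) are simply the values used in this guide. The -L option forwards port 8888 on your machine to port 8888 on the instance, which is why the Jupyter URL opens in your local browser.

```python
import shlex


def build_tunnel_command(public_ip, user="ubuntu", port=8888):
    """Build the ssh command that forwards a local port to the same port on the instance."""
    forward = f"localhost:{port}:localhost:{port}"
    return ["ssh", "-L", forward, f"{user}@{public_ip}"]


if __name__ == "__main__":
    # 12.13.123.123 is the placeholder IP used in this guide, not a real instance.
    command = build_tunnel_command("12.13.123.123")
    # Print the command in a form you can paste into Terminal.
    print(shlex.join(command))
```

The function returns a list so that it could also be passed directly to subprocess.run(command) if you prefer to open the session from Python.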

6: Stop / Terminate

To avoid unnecessary charges, remember to always stop or terminate your running instance under Actions → Instance State on your instance page.

Once you terminate an instance, its storage is deleted, so you cannot restart it. Stopping an instance, on the other hand, keeps the storage volume (which may still incur costs), so you can start it again later.

You can check out the instance lifecycle and the costs here.


Monitor GPUs:

If you want to monitor the current status of the GPU(s), it’s useful to know the NVIDIA System Management Interface (nvidia-smi). It is a command-line utility for managing and monitoring NVIDIA GPU devices.

  1. Open a new Terminal
  2. Type ssh -L localhost:8888:localhost:8888 ubuntu@ followed by your IPv4 Public IP
  3. Type nvidia-smi

It will show a summary table of your GPUs with their live memory consumption.

  • Type nvidia-smi -h to see help
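If you would rather read the numbers from a script than from the summary table, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. Here is a small Python sketch, meant to run on the instance itself, that parses that output. The function names and dictionary keys are my own, the sample line is made up, and the parser assumes every field is reported as a plain number.

```python
import csv
import subprocess

# Fields requested from nvidia-smi, in the order the parser expects them.
QUERY_FIELDS = ["index", "name", "memory.used", "memory.total"]


def parse_gpu_csv(output):
    """Parse 'csv,noheader,nounits' output into one dict per GPU (memory in MiB)."""
    rows = csv.reader(output.strip().splitlines(), skipinitialspace=True)
    gpus = []
    for index, name, used, total in rows:
        gpus.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the current machine and return the parsed result."""
    command = [
        "nvidia-smi",
        f"--query-gpu={','.join(QUERY_FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample line; real values depend on your instance.
    sample = "0, Tesla K80, 0, 11441\n"
    print(parse_gpu_csv(sample))
```

On the instance, calling query_gpus() instead of using the sample string would return the current values for each GPU.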

More questions:

This is the official tutorial to get started on DLAMI.


That’s it! Hope AWS doesn’t seem that scary now 😜
