First, some basic background:
Amazon Web Services (AWS) lets us run our programs on rented virtual computers. Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides configurable compute capacity in the cloud. To launch a virtual machine as an EC2 instance, you need to select an Amazon Machine Image (AMI), which is a type of virtual appliance. Deep learning practitioners typically use the AWS Deep Learning AMI (DLAMI), an image specially dedicated to deep learning tasks. The DLAMI supports many instance types, both CPU and GPU powered. Moreover, it comes preconfigured with NVIDIA CUDA and all the popular deep learning frameworks like TensorFlow, PyTorch, Caffe, Keras and so on.
This guide aims to be simple and hands-on. To learn more about AWS and its services, AWS provides plenty of detailed documentation; I actually refrained from reading it unless I ran into specific problems. Now let’s dive into the setup process.
1: Sign in to EC2 Console
After signing up for an AWS account, Sign In to the Console.
Select the EC2 service; you can search for it under Find Services. In the future it will appear under Recently visited services.
2: Choose an instance (and reset limit for first-timer)
For details on the different instance types, you can check out this page. Since P2 instances are intended for general-purpose GPU compute applications, I decided to use the p2.xlarge model with a single GPU at $0.9/hour. Now that I have an instance type in mind, I need to check its current limit.
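Since on-demand instances bill by the hour, a quick back-of-the-envelope estimate helps put the price in perspective. This is just a sketch using the quoted $0.9/hour rate; adjust `RATE` and `HOURS` for your own instance type:

```shell
# Rough on-demand cost at the quoted p2.xlarge rate of $0.9/hour
RATE=0.9
HOURS=24   # e.g. a full day accidentally left running
awk -v r="$RATE" -v h="$HOURS" 'BEGIN { printf "$%.2f\n", r * h }'
# prints $21.60
```

A forgotten instance adds up quickly, which is why the stop/terminate step at the end of this guide matters.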
In the left column of the Console, under EC2 Dashboard, select Limits.
When you first create an account, AWS sets the limits listed on this page, but you have the option to request a limit increase for your selected instance type.
Under Instance Limits, the current limit for Running On-Demand p2.xlarge instances is 0 in the us-west-2 (Oregon) region, so I need to request a limit increase to 1. Click Request limit increase and simply fill in the request form. Under Case description, I just put “request a limit increase for deep learning projects”. Once you submit the request, you should hear back within a few hours. After your limit increase is approved, you won’t need to do this again.
3: Key Pair
Last step before launching an instance is to set up a secure key pair.
- Open Terminal.
- Type `ssh-keygen` and press enter. You’ll see a path to your key files (by default they are saved under `~/.ssh/`).
- `cd` to the location where your `id_rsa.pub` file is saved.
- Copy the `id_rsa.pub` file in the Finder and paste it to Desktop or any directory (you’ll delete it later).
Back inside the EC2 Console, in the left column under NETWORK & SECURITY, select Key Pairs, then select Import Key Pair, then choose the file you just copied and pasted, which is your public key used to connect with AWS. Name your key if you want, then hit Import and you’re done.
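If you prefer doing the key-generation step entirely from the command line, the flow above can be sketched as follows. The key path `~/.ssh/aws_key` is just an example (`ssh-keygen`’s default is `~/.ssh/id_rsa`):

```shell
# Where to put the new key (example path; ssh-keygen's default is ~/.ssh/id_rsa)
KEY_PATH="${KEY_PATH:-$HOME/.ssh/aws_key}"
mkdir -p "$(dirname "$KEY_PATH")"
# Generate an RSA key pair non-interactively (-N "" = empty passphrase, -q = quiet)
ssh-keygen -t rsa -b 4096 -f "$KEY_PATH" -N "" -q
# Print the public half -- this is the file you import under Key Pairs in the EC2 Console
cat "$KEY_PATH.pub"
```

The `.pub` file is the only half you ever upload; the private key stays on your machine.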
4: Create launch template
I may create the same instance type multiple times in the future, so it’s easiest to create a template.
Inside the EC2 Console, in the left column under INSTANCES, select Launch Templates, then select Create launch template. Put in your template name, and a description if you’d like.
Under Launch template contents, to fill in AMI ID, select Search for AMI and type `deep learning`; the dropdown menu should list entries like Deep Learning AMI (Ubuntu) Version xx. Just select the top result of this format.
For instance type, choose the one you selected earlier. Here I typed `p2.xlarge`.
Lastly, under key pair name, select the key you just imported.
Now just scroll down and hit Create launch template.
5: Launch instance
You should be directed back to the Launch Templates page in the EC2 Console. Click the Actions button and select Launch instance from template. Review everything and launch.
Now you can check your instance under Instances. It may take a while for your Instance State to go from pending to running. Once it’s running, look for the IPv4 Public IP of your instance under Description. For example, your IP may look like `126.96.36.199`. Click Copy to clipboard right next to it.
- Open Terminal.
- Type `ssh -L localhost:8888:localhost:8888 ubuntu@<the IP you just copied>` and press enter.
- For example, `ssh -L localhost:8888:localhost:8888 ubuntu@126.96.36.199`
- If prompted `Are you sure you want to continue connecting (yes/no)?`, type `yes`.
- Type `source activate tensorflow_p36` to enter the virtual env of TensorFlow with Python 3.
- Type `jupyter notebook` to start the notebook server, then follow the direction: `Copy/paste this URL into your browser when you connect for the first time`.
- In the future, copy/paste the URL that appears after `The Jupyter Notebook is running at:`.
- To quit jupyter, type `control+c` in the Terminal.
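If you connect often, an entry in `~/.ssh/config` saves retyping the port-forwarding flags each time. The host alias `dlami` and the IP below are hypothetical; substitute your own instance’s IPv4 Public IP:

```
Host dlami
    HostName 126.96.36.199
    User ubuntu
    LocalForward 8888 localhost:8888
```

With this in place, `ssh dlami` is equivalent to the full command above.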
6: Stop / Terminate
To prevent incurring charges, remember to always stop or terminate your running instance under Actions –> Instance State on your instance page.
Once you terminate an instance, all storage is deleted, so you cannot restart it. On the other hand, stopping your instance keeps the storage volume (which may still incur costs), and you can restart it later.
You can check out the instance lifecycle and the costs here.
If you want to monitor the current status of the GPU(s), it’s useful to know the NVIDIA System Management Interface (nvidia-smi), a command-line utility that aims to manage and monitor GPU devices.
- Open a new Terminal.
- Type `ssh -L localhost:8888:localhost:8888 ubuntu@<your IPv4 Public IP>` to connect to your instance.
- Type `nvidia-smi`. It will show a summary table of your GPUs with their live memory consumption.
- Type `nvidia-smi -h` to see the help.
This is the official tutorial to get started on DLAMI.
That’s it! Hope AWS doesn’t seem that scary now 😜