Deep Learning on a spot instance - save money and run only when needed on AWS!
At some point (perhaps mid-2021), AWS seems to have disabled Spot as an option for P-class instances. It may be possible to request a limit increase to run a P spot instance. See notes on instance types at the end.
I had a web scraping job that collected a lot of news articles, which I wanted to summarize using machine learning (PEGASUS, to be specific). Summarization required a large EC2 instance that I didn't want to leave running, so I decided to "bid" on a spot instance every day when I needed it. Spot prices were often 25 cents on the dollar, and the instance only ran when needed, so overall my deep learning predictions were 98% cheaper than they would have been on an on-demand instance running all the time.
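The 98% figure can be sanity-checked with a little arithmetic. The sketch below assumes spot costs 25% of the on-demand price and that the instance runs about 2 hours per day. The 2 hours is an assumption suggested by the 120-minute shutdown timer used later in this post, not a measured value, and `spot_savings` is a hypothetical helper name.

```python
def spot_savings(spot_price_ratio: float, hours_per_day: float) -> float:
    """Fraction saved versus an on-demand instance running 24 hours a day."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    # Cost relative to always-on on-demand:
    # price discount multiplied by the fraction of the day actually used.
    relative_cost = spot_price_ratio * (hours_per_day / 24)
    return 1 - relative_cost


# Assumed inputs: spot at 25% of on-demand, roughly 2 hours of runtime per day.
savings = spot_savings(spot_price_ratio=0.25, hours_per_day=2)
print(f"{savings:.1%} cheaper")
```

With these assumed inputs the relative cost is 0.25 × 2/24, or about 2% of the always-on price, which lines up with the roughly 98% saving quoted above.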
Below is a breakdown of how to request a spot instance and run a task on it, for deep learning or whatever your use case may be! I used Python and boto3 for this.
Python Initiated Spot Request
Import packages and create client
Make request
And finally I make the request with the price I'm willing to pay
ClientToken is a unique string that AWS uses to ensure your request isn't submitted multiple times. I use the first 10 characters of a timestamp string (the [:10] slice), which is the date, so I can't accidentally spin up multiple instances in the same day.
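As a rough sketch of how the request and the daily ClientToken could fit together: the helper names below (`daily_client_token`, `build_spot_request`, `request_spot_instance`) are hypothetical, and the token is assumed to be the first 10 characters of an ISO timestamp. `request_spot_instances` and its `SpotPrice`, `InstanceCount`, `Type`, `ClientToken` and `LaunchSpecification` parameters are part of the boto3 EC2 client; the choice of a one-time request for a single instance is an assumption.

```python
import datetime


def daily_client_token(now=None):
    """Return the date part (YYYY-MM-DD) of an ISO timestamp for use as ClientToken."""
    now = now or datetime.datetime.now()
    return now.isoformat()[:10]


def build_spot_request(launch_specification, max_price, now=None):
    """Build keyword arguments for the EC2 request_spot_instances call."""
    return {
        "SpotPrice": str(max_price),  # the API expects the price as a string
        "InstanceCount": 1,           # assumption: one instance per day
        "Type": "one-time",           # assumption: do not re-request after termination
        "ClientToken": daily_client_token(now),
        "LaunchSpecification": launch_specification,
    }


def request_spot_instance(ec2_client, launch_specification, max_price):
    """Submit the spot request; ec2_client is expected to be a boto3 EC2 client."""
    kwargs = build_spot_request(launch_specification, max_price)
    return ec2_client.request_spot_instances(**kwargs)
```

With a real client this might be called as `request_spot_instance(boto3.client("ec2", "us-east-2"), launchSpecification, "0.10")`, where the price is a made-up placeholder. Keeping the argument building separate from the API call makes it possible to inspect the request without touching AWS.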
Final script
import base64
import datetime

import boto3

client = boto3.client("ec2", "us-east-2")

# Multipart user data: the cloud-config part makes the shell script run on every boot.
userData = '''
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
sudo shutdown -P +120; aws s3 cp s3://my-bucket-name/ ./ && sudo runuser -l ubuntu -c '/bin/bash /tmp/ > run_pred.log 2>&1'
--//--
'''

launchSpecification = {
    "IamInstanceProfile": {
        "Arn": "arn:aws:iam::MY-AWS-ACCOUNT-ID:instance-profile/MY-INSTANCE-IAM",
    },
    "BlockDeviceMappings": [
        {
            "DeviceName": "/dev/sda1",  # note: '/dev/xvda' on some instance types
            "Ebs": {
                "DeleteOnTermination": True,
                "VolumeSize": 80,
                "VolumeType": "standard",
            },
        },
    ],
    "ImageId": "ami-MY-AMI-ID",
    "KeyName": "MY-SSH-KEY",
    "SecurityGroups": ["MY-SG-1-NAME", "MY-SG-2-NAME"],
    "InstanceType": "m4.xlarge",
    "Placement": {
        "AvailabilityZone": "us-east-2a",
    },
    "SecurityGroupIds": ["sg-MY-SG-1", "sg-MY-SG-2"],
    # The API expects user data as a base64-encoded string
    "UserData": base64.b64encode(userData.encode("utf-8")).decode("ascii"),
}
For Reference:
Instance type notes:
If you hit an error like botocore.exceptions.ClientError: An error occurred (MaxSpotInstanceCountExceeded) when calling the RequestSpotInstances operation: Max spot instance count exceeded
check which instance types you're able to use. When I searched for "spot" in June 2022 I saw that "All P Spot Instance Requests" only allowed "0 vCPUs", which explains why my old script with p2.xlarge was failing!
For deep learning you'll want a GPU instance. AWS seems to have created a new class that has GPUs and can be run as spot (the DL instance type), but that's $15+ per hour! You may be able to request a limit increase to run P instances if you need a GPU.
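The spot limits can also be looked up from Python. The sketch below assumes the limits are exposed through the Service Quotas API under the "ec2" service code; `spot_quotas` and `fetch_ec2_quotas` are hypothetical helper names, while `list_service_quotas` and its paginator belong to the boto3 "service-quotas" client.

```python
def spot_quotas(quotas):
    """Return {quota name: value} for quotas whose name mentions Spot."""
    return {
        q["QuotaName"]: q["Value"]
        for q in quotas
        if "spot" in q["QuotaName"].lower()
    }


def fetch_ec2_quotas(service_quotas_client):
    """Collect all EC2 quotas; expects a boto3 "service-quotas" client."""
    paginator = service_quotas_client.get_paginator("list_service_quotas")
    quotas = []
    for page in paginator.paginate(ServiceCode="ec2"):
        quotas.extend(page["Quotas"])
    return quotas
```

With credentials configured, something like `spot_quotas(fetch_ec2_quotas(boto3.client("service-quotas", "us-east-2")))` should list entries such as "All P Spot Instance Requests" along with their vCPU limits. A value of 0 for an instance family would explain a MaxSpotInstanceCountExceeded error for that family.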
set -e
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
# Set up directories
mkdir -p /home/ubuntu/analysis/pred
# fetch the data from S3
aws s3 sync s3://MY_BUCKET/data /home/ubuntu/analysis/data
aws s3 sync s3://MY_BUCKET/model /home/ubuntu/analysis/model
cd /home/ubuntu/analysis/model
# Install packages
pip3 install --user -r requirements.txt
pip3 install --user pipenv ipython
pipenv --site-packages
pipenv install
# Predict
pipenv run python
# Send predictions to S3
aws s3 sync /home/ubuntu/analysis/pred "s3://MY_BUCKET/pred/$(date +"%Y%m%d")"
# Shutdown after making predictions to limit costs. This was commented out in case someone happened to copy and paste this accidentally...
# sudo shutdown