I recently went down the rabbit hole trying out the newest bioinformatics workflow manager, redun. While installation and running workflows locally went off without a hitch, I experienced some trouble getting jobs deployed to AWS Batch. Here’s a list of my troubleshooting steps, in case you experience the same issues. To start, I followed the instructions for the “05_aws_batch” example workflow.
I was deploying the workflow on my AWS account at Loyal, so your experience may differ if you’re using a new AWS account or have different security policies in place.
Building docker images
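By default, the Docker daemon socket is owned by root, so building and pushing images to a registry usually means putting “sudo” before every docker command. A quick fix is the command sudo chmod 666 /var/run/docker.sock, though note that this makes the socket writable by every user on the machine, so it is best kept to single-user machines.
Or see the longer fix in this Stack Overflow post.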
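Before changing any permissions, it can help to confirm that socket access is really the problem. Here’s a minimal Python sketch, assuming the default socket path /var/run/docker.sock; the function name is my own and not part of Docker or redun.

```python
import os

DEFAULT_DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default location


def can_use_docker_socket(path: str = DEFAULT_DOCKER_SOCKET) -> bool:
    """Return True if the current user can read and write the socket at `path`."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if can_use_docker_socket():
        print("Docker socket is accessible without sudo.")
    else:
        print("No access to the Docker socket; sudo or a permissions fix is needed.")
```

If this reports no access, the socket permissions (or your user’s group membership) are the thing to fix.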
Submitting jobs to AWS Batch
I experienced the following error when submitting jobs to AWS Batch:
upload failed: - to s3://MY-BUCKET/redun/jobs/ca27a7f20526225015b01b231bd0f1eeb0e6c7d8/status
An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden
I suspected this was due to an error in the “role” setting, which turned out to be correct. I first tried using the generic role
arn:aws:iam::ACCOUNT-ID:role/aws-service-role/batch.amazonaws.com/AWSServiceRoleForBatch
but that didn’t work.
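A likely explanation is that AWSServiceRoleForBatch is a service-linked role: its trust policy only lets the Batch service itself assume it, so the job’s container can’t use it to write to S3. Service-linked roles live under the aws-service-role/ path, so a quick check on the ARN can flag them. This is a small sketch with a hypothetical helper name, not something redun provides.

```python
# Service-linked roles are created under this IAM path.
SERVICE_LINKED_PATH = ":role/aws-service-role/"


def is_service_linked_role(role_arn: str) -> bool:
    """Return True if the ARN points at a service-linked role."""
    return SERVICE_LINKED_PATH in role_arn


if __name__ == "__main__":
    # ACCOUNT-ID is a placeholder, as in the ARN above.
    arn = (
        "arn:aws:iam::ACCOUNT-ID:role/aws-service-role/"
        "batch.amazonaws.com/AWSServiceRoleForBatch"
    )
    if is_service_linked_role(arn):
        print("Service-linked role: use a custom job role instead.")
```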
I then created a custom IAM role in AWS with S3, EC2, ECS and Batch permissions. I also set the role’s trust policy so that ECS tasks are allowed to assume it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
And then everything worked as expected.
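If you’d rather script the role setup than click through the console, something like the following could work. It’s a rough sketch using boto3, assuming your credentials are allowed to create IAM roles; the function names, and whatever role name and policy ARNs you pass in, are placeholders rather than anything redun requires.

```python
import json


def build_trust_policy(service: str = "ecs-tasks.amazonaws.com") -> dict:
    """Build a trust policy that lets the given AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_job_role(role_name: str, policy_arns: list) -> str:
    """Create the role, attach managed policies, and return the role's ARN."""
    import boto3  # third-party; imported here so the policy builder works without it

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_trust_policy()),
    )
    for policy_arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```

The ARN returned by create_job_role is the value that goes in the “role” setting. The managed policies you attach decide what the job can do, so scoping S3 access to just your scratch bucket is tighter than a full-access policy.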
ECS unable to assume role
I heard from someone else trying redun for the first time that they got Batch submission working with the (similar) instructions in this Stack Overflow post.
I hope this helps anyone trying to deploy redun to AWS Batch for the first time!