tl;dr: security is hard, authorization is hard, authentication is hard, and authenticating non-humans is hard. Don't despair. Use least privilege, apply policies on both clients and resources, monitor usage of credentials, and if you are forced to use hard-coded credentials, let them only be able to call sts:AssumeRole and rotate them frequently.
There’s a lot of discussion here about hard-coding AWS credentials into files, and how to authenticate and authorize non-human tools for AWS access. Here are my two cents.
Firstly, when it comes to AWS resources there are always two sides to the coin - the client and the resource. Your blog post talks about how to authorize a client to access an S3 bucket, but never mentions the IAM resource policy on the S3 bucket itself. This is because when you call boto3's create_bucket function, which calls S3's CreateBucket API, for backwards-compatibility reasons it creates a publicly-readable S3 bucket, which is why the news is full of "omg the cloud is insecure, everyone's personal information got leaked". Please avoid creating publicly-readable S3 buckets; the console makes it extremely difficult to do this any more, but the API is just an API and will let you do it.
AWS users who use S3 buckets should enable and run the Access Analyzer for S3, identify publicly accessible S3 buckets in their account, and take appropriate action. They may just be e.g. static websites, which is fine, or may store personally identifiable information (PII), which is not fine. Also consider whether your data should be encrypted at rest, with a KMS key that you own and control, or even a straight-up symmetric key you own. This is again defense-in-depth in case someone gets access to the S3 bucket.
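For concreteness, here is a minimal boto3 sketch of locking a bucket down along these lines - blocking all public access and setting default encryption with a KMS key you own. The bucket name and key ARN are placeholders, not anything from the post:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # placeholder bucket name

# Block all forms of public access for this bucket (both ACLs and policies).
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Default encryption-at-rest with a KMS key you own (key ARN is a placeholder).
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE",
                }
            }
        ]
    },
)
```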
Secondly, if your goal is "give a user access to a particular bucket/prefix and only be able to use a subset of APIs", instead of creating credentials you could use S3 access points. An S3 access point creates a completely new DNS endpoint https://[access_point_name]-[accountID].s3-accesspoint.[region].amazonaws.com and when users hit this DNS endpoint to perform operations, S3 enforces policies like "You can only call GetObject and PutObject" on your behalf. This is an easy way to enforce IAM resource policies on the S3 side instead, and you can create multiple S3 access points for a single bucket (unlike an S3 bucket policy).
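A rough sketch of what that could look like with boto3's s3control client; the account ID, region, bucket name, access point name and role ARN are all made-up placeholders:

```python
import json
import boto3

ACCOUNT_ID = "123456789012"  # placeholder account ID
REGION = "us-east-1"         # placeholder region
AP_NAME = "backup-writer"    # placeholder access point name

s3control = boto3.client("s3control", region_name=REGION)

# Create an access point in front of an existing bucket.
s3control.create_access_point(
    AccountId=ACCOUNT_ID,
    Name=AP_NAME,
    Bucket="my-example-bucket",  # placeholder bucket name
)

# Attach a policy that only allows GetObject/PutObject through this access point,
# and only for one specific role (role ARN is a placeholder).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:role/backup-writer-role"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:{REGION}:{ACCOUNT_ID}:accesspoint/{AP_NAME}/object/*",
        }
    ],
}
s3control.put_access_point_policy(
    AccountId=ACCOUNT_ID, Name=AP_NAME, Policy=json.dumps(policy)
)
```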
That just simplifies the authorization story ("what am I allowed to do?"). But authentication ("who am I?") is always tricky. The DNS endpoint for an S3 access point is not a secret, nor should it be - anyone can call it. If I am a human I can call e.g. AWS STS AssumeRoleWithWebIdentity, assume some role, and then on the S3 access point only allow access from that role. OK. But how do I know e.g. the backup script that is running nightly on your VPS is the backup script?
This is simple to answer for e.g. an AWS EC2 instance. If you attach an IAM role to the EC2 instance (via its instance profile), you delegate this problem to AWS. But this doesn't help you: you want to access an S3 bucket from a VPS host. That's fine, but then you need to solve the problem on the VPS host.
If you hard-code credentials on the VPS host that e.g. gives sts:AssumeRole permission for this new role, what happens if you accidentally version-control those credentials or the VPS provider steals those credentials? Maybe this is OK if you rotate these hard-coded credentials regularly and monitor AWS CloudTrail logs for who uses the credentials.
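As a sketch of that arrangement (the user name, policy name and role ARN below are made-up placeholders): the long-lived keys belong to an IAM user whose only permission is sts:AssumeRole on a single role, and the script on the VPS trades them for short-lived credentials at runtime:

```python
import json
import boto3

# The hard-coded user can do nothing except assume one role. The role's trust
# policy must allow this user as a principal; that part is not shown here.
assume_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "arn:aws:iam::123456789012:role/vps-backup-role",  # placeholder
        }
    ],
}
boto3.client("iam").put_user_policy(
    UserName="vps-backup-user",          # placeholder user that holds the static keys
    PolicyName="assume-backup-role-only",
    PolicyDocument=json.dumps(assume_only_policy),
)

# On the VPS: use the static keys only to obtain short-lived credentials, then
# do all S3 work with those. The damage from a leak is bounded by the role.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/vps-backup-role",
    RoleSessionName="nightly-backup",
)["Credentials"]
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```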
You could enforce that only clients from certain IP ranges are allowed to assume a role, but this is not a sufficient control on its own.
Aha! Maybe an EC2 instance has permissions and you SSH to it before uploading files or something. But how do you provision and control access to the SSH private/public key pair?
Set up some secure service that you call over HTTP to get temporary credentials. But how do you authenticate your script with the secure service?
It’s at this point that people throw up their hands and say “Authentication is hard! I’ll just put the credentials in a file and set a calendar reminder to rotate them once every 30 days, and set up some tools to analyze CloudTrail access logs”. For your threat model this may be fine.
But maybe you can dive deeper into your threat model and think…hmmm. I'm backing up data from a VPS to S3. Surely this can be an append-only backup, and I only need to grant s3:PutObject permission to some role that the VPS can assume using STS. That way if someone steals the credentials the worst they can do is put more data in. I'd have to pay for it, which sucks, but they can't read my data. Hmm, could they overwrite it? But I can set up object locks to prevent overwrites. And so on.
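A sketch of that kind of setup, with placeholder role, policy and bucket names: an inline policy granting only s3:PutObject, plus Object Lock with a default retention rule so written object versions can't be deleted or replaced during the retention window, even by whoever holds the write credentials:

```python
import json
import boto3

BUCKET = "my-backup-bucket"  # placeholder bucket name
s3 = boto3.client("s3")
iam = boto3.client("iam")

# Append-only permission: the role the VPS assumes can put objects and nothing else.
put_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}
iam.put_role_policy(
    RoleName="vps-backup-role",       # placeholder role name
    PolicyName="append-only-backups",
    PolicyDocument=json.dumps(put_only_policy),
)

# Object Lock is simplest to enable when the bucket is created (this also turns on
# versioning). The default retention rule then keeps each written version from being
# deleted or replaced for 30 days.
s3.create_bucket(Bucket=BUCKET, ObjectLockEnabledForBucket=True)
# Note: outside us-east-1 you also need a CreateBucketConfiguration/LocationConstraint.
s3.put_object_lock_configuration(
    Bucket=BUCKET,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
    },
)
```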
> This is because when you call boto3's create_bucket function, which calls S3's CreateBucket API, for backwards-compatibility reasons it creates a publicly-readable S3 bucket, which is why the news is full of "omg the cloud is insecure everyone's personal information got leaked".
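In the code I'm using s3.create_bucket(...) without any extra options - https://github.com/simonw/s3-credentials/blob/0.3/s3_credentials/cli.py#L92-L100 - and, as far as I can tell, the resulting buckets are not public. I just tried creating one, uploaded a file to it and then attempted to access the file by URL and got a permission error: https://simonw-test-bucket-is-this-public.s3.amazonaws.com/yourfilename.csv - am I missing something here? I definitely don't want to be creating public buckets by default!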
> If you hard-code credentials on the VPS host that e.g. gives sts:AssumeRole permission for this new role, what happens if you accidentally version-control those credentials or the VPS provider steals those credentials?
This is why I wanted separate per-bucket credentials in the first place: I want to minimize the damage someone could do with stolen credentials should they access them. I'd much rather my VPS provider steal credentials for a single bucket than for my entire account!
> But maybe you can dive deeper into your threat model and think…hmmm. I'm backing up data from a VPS to S3. Surely this can be an append-only backup, and I only need to grant s3:PutObject permission to some role that the VPS can assume using STS.
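That's essentially what my s3-credentials create name-of-bucket --write-only option does - it creates a brand new user and applies this inline policy to them so that they can only write (with PutObject) to the specified bucket: https://github.com/simonw/s3-credentials/blob/0.3/s3_credentials/policies.py#L38-L48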
Object locks are interesting - I hadn't seen those! I like the idea of using them to prevent leaked write-only credentials from being used to overwrite previously written paths.
> Secondly, if your goal is "give a user access to a particular bucket/prefix and only be able to use a subset of APIs", instead of creating credentials you could use S3 access points.
Whoa, I had not seen those before - looks like they were only added in 2019. Shall investigate, thank you!
I'd love to provide a link in the README to material people can read that has solid, easy-to-follow recommendations for the absolute best practices on this kind of stuff - but I've been hoping to run into useful information like that for years, and the best material still seems to show up in comments like this one!
No, then I am wrong: when you call CreateBucket the bucket is not publicly-readable by default.
Maybe a blog post idea for me! It's always tough putting your neck out giving prescriptive best-practice advice, because you may be wrong, or things may change. Also it's a broad topic; it's hard to focus and come up with narrow objectives.
As someone who spent a lot of this year on compliance audits and thinking about “least privilege in AWS”, this is such a detailed but also clear write up on this topic. Thank you for taking the time to write and post it.
PS: at first I wondered “who are you, who are so wise in the ways of science?”, but then I checked and saw you work at AWS, https://asim.ihsan.io/about/, and went 💡
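You're welcome!
My views don't represent AWS. And maybe I'm missing something obvious about authenticating non-human tools. Please correct or enlighten me, I'm always learning.
This is a fantastically useful comment, thank you!
Do you mind if I quote bits of it in this issue thread? https://github.com/simonw/s3-credentials/issues/7
Yes, you can quote parts of it, just add "These views don't represent AWS" at the end.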
A word of caution: embedding the secrets in your apps is a big NO-NO in AWS land. I am afraid this solution goes exactly against what you are supposed to do. In the real case:
- you'll create a role to access the bucket
- your app will call STS (the Security Token Service) assume_role_with_web_identity or similar (after successful authentication; you can use OpenID or SAML, or other federation if that's your thing)
- THAT will give you a set of tokens you can use to deal with the bucket
Note that Amplify is likely the easiest way to deal with this currently.
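A minimal sketch of that flow with boto3, assuming you already have a web identity token from your identity provider after the user or workload authenticates; the role ARN and token are placeholders:

```python
import boto3

# An OIDC ID token issued by your identity provider after authentication.
web_identity_token = "..."  # placeholder

# AssumeRoleWithWebIdentity is an unsigned call: no AWS credentials are needed,
# only the federated identity token.
sts = boto3.client("sts")
resp = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/bucket-access-role",  # placeholder
    RoleSessionName="my-app-session",
    WebIdentityToken=web_identity_token,
)
creds = resp["Credentials"]  # temporary AccessKeyId / SecretAccessKey / SessionToken

# Use the temporary credentials to talk to the bucket.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```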
Came here to post basically the same thing - storing credentials is the wrong approach; assume a role that's scoped to exactly what it needs and leave secret issuance to STS.
I don’t understand how I can build my projects against this. If I’m going to call assume role I need to have credentials that let me call that, right? So something needs to be stored somewhere.
Here are some examples of things I have built or want to build with S3:
- A backup script that runs nightly on a VPS via cron and sends data to S3. I want to set this up once and forget about it.
- A GitHub Actions workflow that runs on every commit, builds an asset of some sort and stores that in S3. This needs to work from stable credentials that are stored in GitHub's secrets mechanism.
- A stateless web application deployed to Vercel that needs to be able to access assets in a private S3 bucket.
- Logging configuration: I want to use a tool like Papertrail and give it the ability to write gathered logs to an S3 bucket that I own.
None of these cases feature an authenticated user session of any type - they all require me to create long-lived credentials that I store in secrets.
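Can I use assume role for these? If so, how?
For GitHub Actions you can now use OIDC to assume a role rather than long-lived credentials: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments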
That does look like a good option for GitHub Actions - where my code is running in a context that has an authenticated session I can exchange for another token - but it doesn't help for cron scripts or anything where I want my code to run in a situation that doesn't have access to credentials that can be exchanged in that way.
Confession: I've read that GitHub documentation on OIDC a couple of times now and it gives me the impression that actually implementing it would take me the best part of a day to figure out (mostly at the AWS end) - it seems really hard! I wish it wasn't.
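There was an article that went around the other week about using AWS IoT to get temporary credentials for machines in a home lab: https://ideas.offby1.net/posts/automating-letsencrypt-route53-using-aws-iot.html
Does it have configurable endpoints? I'm using Wasabi S3, and might try this tool out.
It doesn't but it could do - adding that to the ticket: https://github.com/simonw/s3-credentials/issues/2#issuecomment-959554514
Nice, thanks!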
I looked at the source code (before even reading the post) and I was extremely happy: more code should be written like this. Granted it’s a simple tool, but the code was simple and to the point. Most of my difficulties in understanding it came from not knowing much about S3 bucket credentials; I was never confused because the author used too many abstractions or tried to be clever in how they wrote the code. Congrats!
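Thanks! I code for myself in six months' time, assuming I've been working on other things and have forgotten every detail of the current project.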
I’ve said before that Amazon should create a new version of S3 (“S4”) that has sane defaults and workflows. There are whole business models that exist just because creating an S3 bucket is harder than it needs to be (JAMStack).
I’m reading more and more about different use cases for datasette, which is what the author is using this for, mostly.
On the one hand, I want to install it and play with my data. I'm kind of busy though, and I can't find the time to integrate Datasette with my existing workflows. For example, I don't want to deal with transferring my photos from Google Photos to S3.
On the other hand, I also want to just start small and make my own little set of personal data mining tools scoped precisely to myself. But that would take even more time.
In the end I just never dig into it in any way and use the precious little free time to read a book.
This is a constant challenge for me - I’m really keen on driving down the friction involved in trying out Datasette, because I have exactly the same problem exploring other projects myself.
The biggest improvements I’ve made here are the following:
- Datasette has a macOS Desktop app now which doesn't require you to have a working Python install - you download Datasette.app from https://datasette.io/desktop and open the icon
- brew install datasette works and will get you the Datasette CLI tool
I’m always looking for new ways I can make getting started easier! I have a beta version of a SaaS hosted version too, which I’m currently making some major upgrades to.
Would definitely welcome suggestions!
Very nice, exactly what I needed!
I should look into doing that for SES too