Ever locked yourself out of your own EC2 instance? I did exactly that this week while "improving" my SSH configuration. The good news is I was able to get back in with an EC2 trick. I'll show you a proven method to regain access: EC2 user data scripts that can run on every boot.
Let me start with a confession: I'm supposed to know better. After >20 years of software engineering and preaching best practices about security and backup/recovery, I still managed to make a spectacularly dumb mistake. I was working on my remote development box, trying to rejigger git commit signing verification to work just the way I want... You can probably guess what happened next.
One nano ~/.ssh/authorized_keys session later, I had successfully locked myself out of my own EC2 instance, with the classic Permission denied (publickey) error staring back at me mockingly. After a brief (albeit audible) facepalm, I managed to get back in. I also realized this was actually a great opportunity to document the recovery process. Because if I can make this mistake, so can others.
SSH authentication failures typically show up as Permission denied (publickey) errors when you try to connect. In my case, the culprit was a mangled ~/.ssh/authorized_keys file: I had tried to add keys in the wrong format. Keys that work perfectly for Git commit-signature verification (the gpg.ssh.allowedSignersFile format) are in a different format than what SSH authentication's authorized_keys file expects.
Before we dive into the recovery methods, make sure you have access to the AWS console (or CLI) with permission to stop, modify, and start the instance, plus a copy of the public key you want to restore.
This method uses EC2 user data with a special MIME multipart format to fix SSH access on every boot. It's the approach that saved me, and it's surprisingly clean once you know the right format. I figured out the right format from this article.
On your local machine, get your public key:
ssh-add -L
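If ssh-add prints nothing (no agent running, or no keys loaded), you can read the public key file directly instead; a fallback, assuming the default Ed25519 key location:

# Print the public half of the default key pair.
cat ~/.ssh/id_ed25519.pub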
Copy the entire key line that looks like:
ssh-rsa AAAAB3NabcC1xyzEAAAADAQAB... your-key-name
Here's where it gets interesting. AWS has specific requirements for making user data run on every boot, not just the first time. You need this exact MIME multipart format (again my reference for this was this article):
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash
# Recreate .ssh and overwrite authorized_keys with a known-good key.
mkdir -p /home/YOUR_USERNAME/.ssh
echo "YOUR_PUBLIC_KEY_HERE" > /home/YOUR_USERNAME/.ssh/authorized_keys
# sshd (with default StrictModes) rejects keys with loose permissions,
# so lock ownership and modes down too.
chown -R YOUR_USERNAME:YOUR_USERNAME /home/YOUR_USERNAME/.ssh
chmod 700 /home/YOUR_USERNAME/.ssh
chmod 600 /home/YOUR_USERNAME/.ssh/authorized_keys
--//--
The MIME format seemed to be what triggered it to run on every boot rather than just the first one (i.e. via the scripts-user, always bit in the cloud-config part).
Replace:
- YOUR_USERNAME with your SSH username (common ones: ec2-user, ubuntu, vagrant)
- YOUR_PUBLIC_KEY_HERE with the full public key from step 1

Then apply the updated user data to the instance, as shown in the sketch below.
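One wrinkle: user data can only be edited while the instance is stopped, either in the console (Actions > Instance settings > Edit user data) or via the AWS CLI. A rough sketch of the CLI route, where i-0123456789abcdef0 is a placeholder instance ID and recover.mime is the multipart file above; depending on your CLI version and configuration, you may need to base64-encode the file first:

aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0

# userData can only be modified while the instance is stopped.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --attribute userData --value file://recover.mime

aws ec2 start-instances --instance-ids i-0123456789abcdef0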
Wait for the instance to fully boot (grab some coffee), then test:
ssh your-username@your-instance-ip
If it works, you should feel that familiar rush of relief. If not, don't worry - there is another option...
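Before reaching for that option, though, it can help to see what the SSH client is actually doing; -v makes ssh print each key it offers and the server's response:

# Verbose output shows which identities are tried and why they fail.
ssh -v your-username@your-instance-ip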
Once SSH works, immediately remove the user data script: stop the instance once more, clear the user data field the same way you set it, and start it back up. This prevents the script from running on every future boot and potentially overwriting intentional SSH changes.
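If you went the CLI route, clearing it is symmetric; a sketch with the same placeholder instance ID (and the instance stopped again):

# An empty Value removes the stored user data.
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --user-data Value=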
If the user data method doesn't work, this more involved approach should work. I didn't have to try it, so I'm not sure of the exact detailed steps, but the high-level steps are:
- Stop the broken instance and detach its root EBS volume.
- Launch a temporary rescue instance in the same Availability Zone and attach the detached volume to it as a secondary disk.
- SSH into the rescue instance and mount the volume:
ssh ec2-user@rescue-instance-ip
sudo mkdir /mnt/broken-disk
sudo mount /dev/xvdf1 /mnt/broken-disk # might be /dev/nvme1n1p1
- Fix /mnt/broken-disk/home/YOUR_USERNAME/.ssh/authorized_keys (setting permissions as we did in the script above; a sketch follows after this list).
- Unmount and detach the volume, reattach it to the original instance as /dev/sda1, and start it back up.
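The fix on the mounted volume would mirror the user data script; a sketch with the same placeholders:

# Overwrite the broken authorized_keys on the mounted volume.
sudo mkdir -p /mnt/broken-disk/home/YOUR_USERNAME/.ssh
echo "YOUR_PUBLIC_KEY_HERE" | sudo tee /mnt/broken-disk/home/YOUR_USERNAME/.ssh/authorized_keys
sudo chmod 700 /mnt/broken-disk/home/YOUR_USERNAME/.ssh
sudo chmod 600 /mnt/broken-disk/home/YOUR_USERNAME/.ssh/authorized_keys
# Ownership must match the UID/GID on the broken system, which may differ
# from the rescue instance's users; check with ls -ln before running chown.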
That's it? Again, I didn't try this, but if you did and it worked (or not) let me know with a comment!
After getting back into my instance and fixing my original Git signing issue, I realized a few things: back up critical files before editing them, and test changes from a second session before disconnecting. The backup part is a one-liner that would have saved me this whole adventure:
cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys.backup
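As for the testing part: before closing your working session, verify from a second terminal that the key still authenticates. A quick non-interactive check:

# BatchMode fails fast instead of falling back to interactive prompts,
# so a broken authorized_keys shows up immediately.
ssh -o BatchMode=yes your-username@your-instance-ip true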
Since my original problem was confusing Git signing key formats with SSH authentication, here's the distinction:
SSH authorized_keys format:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAXMo4T... comment
Git allowed_signers format:
user@example.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAXMo4T... comment
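For reference, the signing side is wired up in Git config roughly like this (the allowed_signers path here is just an example):

# Use SSH keys for commit signing and point Git at the allowed signers file.
git config --global gpg.format ssh
git config --global gpg.ssh.allowedSignersFile ~/.config/git/allowed_signers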
Different files, different formats. Who knew? Again, see the references linked above.
SSH lockouts happen to the best of us - even those of us who should definitely know better. The user data method with MIME multipart format is the solution that worked for me, while EBS volume recovery is a more involved option.
Remember to clean up temporary fixes and implement prevention measures. And maybe, just maybe, don't edit critical SSH configuration files when you're feeling eager to get Git commit signing working "real quick". Or at least test it in a second terminal before disconnecting!
Trust me on that last one.