Backup to Amazon S3

Published: June 10, 2018  •  linux, selfhosted

There are many ways to back up your data. My preferred solution is to store backups on Amazon S3: it's cheap and not that complicated to set up. First you need an Amazon account, if you don't already have one. Go to https://aws.amazon.com/s3/ and click Sign Up.

In the following tutorial I will set up a backup for a self-hosted Gitea server, but you can apply the same steps to any other files you want to back up to S3.


Create Bucket

After signing up, open the S3 web console: https://s3.console.aws.amazon.com/s3/
Create a new bucket, choose a bucket name and select the region. Be aware that prices differ between regions; check the S3 pricing page.

This bucket doesn't need any special properties. Make sure that you do not grant public access to it.


Lifecycle rule

I usually add a lifecycle rule that automatically moves files from S3 to Glacier after a few days. Storing data on Glacier is much cheaper than on S3, but downloading from Glacier costs more, and files have to stay on Glacier for at least 90 days; if you delete them earlier, additional fees apply. This makes Glacier especially useful for backups, because you rarely need to download them.
Also check the documentation about other storage classes: https://aws.amazon.com/s3/storage-classes/

Adding a lifecycle rule only makes sense when each backup is stored under its own file name, for instance backup-1.zip, backup-2.zip, backup-3.zip or backup-20180601.tar.gz, backup-20180602.tar.gz. If you always overwrite the old backup file, every upload resets the object's age, so with daily backups the lifecycle rule never applies.
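If the tool that creates your backups always writes the same file name, a date stamp in the name is enough to turn each backup into a separate object (the Gitea dump used later in this post already gets a new name on every run). A minimal sketch; the backup- prefix and the commented tar call are only examples:

```shell
#!/bin/sh
# Print a dated archive name such as backup-20180601.tar.gz.
# "backup" is an example prefix; add %H%M%S to the date format
# if you create more than one backup per day.
backup_name() {
  printf 'backup-%s.tar.gz\n' "$(date +%Y%m%d)"
}

# Example (hypothetical path): archive a directory under today's name
#   tar -czf "$(backup_name)" /path/to/data
backup_name
```

Because the date format is year, month, day, the names also sort chronologically in the S3 console.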

Click on the bucket name and open the Management tab, then click on Lifecycle and Add lifecycle rule.

Enter a rule name and click Next.

Under Transitions select Current Version and add a transition to Glacier after 5 days.

Under Expiration select Current Version and expire objects after 95 days. Each file therefore spends 90 days on Glacier, the minimum storage duration, so no early deletion fee applies.
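The same rule can also be written as a JSON lifecycle configuration, which is handy if you want to keep it under version control. This is a sketch that assumes the AWS CLI (aws), which this tutorial does not otherwise use, credentials that are allowed to change the bucket configuration, and the bucket name ralscha.giteabackup used later in this post; the rule ID gitea-backup-to-glacier is made up. The script only writes lifecycle.json unless you run it with APPLY=1:

```shell
#!/bin/sh
set -e
# Values from the console steps above
TRANSITION_DAYS=5
EXPIRATION_DAYS=95
BUCKET="ralscha.giteabackup"   # replace with your own bucket name

# Glacier bills at least 90 days, so do not expire objects earlier than that
if [ $((EXPIRATION_DAYS - TRANSITION_DAYS)) -lt 90 ]; then
  echo "objects would leave Glacier before 90 days" >&2
  exit 1
fi

# One rule for every object in the bucket (empty prefix)
cat > lifecycle.json <<EOF
{
  "Rules": [
    {
      "ID": "gitea-backup-to-glacier",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": ${TRANSITION_DAYS}, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": ${EXPIRATION_DAYS} }
    }
  ]
}
EOF

# Apply only on request; needs credentials that may change the bucket
# configuration (the upload-only user created in this tutorial cannot)
if [ "${APPLY:-0}" = "1" ]; then
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$BUCKET" \
    --lifecycle-configuration file://lifecycle.json
fi
```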


Policy

Next we create a policy that only allows uploading files to the bucket. Open the IAM console: https://console.aws.amazon.com/iam, go to Policies and create a new policy.

Select the S3 service and, under Actions, select the PutObject action. Under Resources, specify the bucket we created before and make sure that the object part of the resource ends with /*, so the policy covers all objects in the bucket.

Review the policy, give it a name and create it.
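For reference, the policy created in the visual editor should look roughly like the following JSON. The sketch writes it to a file and, only when run with APPLY=1, creates it with the AWS CLI; the CLI, the file name gitea-backup-policy.json and the policy name gitea-backup-upload are assumptions, not part of the console workflow above:

```shell
#!/bin/sh
set -e
BUCKET="ralscha.giteabackup"   # replace with your own bucket name

# Allow only s3:PutObject on objects inside the bucket (note the /* suffix);
# keys with this policy cannot list, download or delete objects.
cat > gitea-backup-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# Create the policy only on request (needs IAM permissions)
if [ "${APPLY:-0}" = "1" ]; then
  aws iam create-policy \
    --policy-name gitea-backup-upload \
    --policy-document file://gitea-backup-policy.json
fi
```

Restricting the key to uploads limits the damage if the server is compromised: an attacker who steals the key cannot read the existing backups.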


User

Go to Users and click on Add user.

Enter a user name and select Programmatic access.

Click on Next: Permissions, click Attach existing policies directly and search for the policy you created in the previous step. Select the checkbox in front of the policy name.

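Click Next: Review and then Create user.
The next dialog shows the Access key ID and the Secret access key. You can also download both keys as a CSV file (Download .csv). Store the secret access key now: AWS shows it only once, and if you lose it you have to create a new key pair.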


Install tools

On the VPS we install s3cmd, a command line client for Amazon S3.

sudo apt install s3cmd

Upload an arbitrary file to check if everything is set up correctly.

sudo s3cmd --access_key=AKIA.... --secret_key=LDf...  put /home/git/gitea/gitea s3://ralscha.giteabackup
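Passing the keys on the command line works, but they end up in the shell history and, while s3cmd runs, in the process list. s3cmd can read them from a configuration file instead: s3cmd --configure writes ~/.s3cfg interactively, and the -c option selects another file. A minimal sketch that creates a separate, owner-only file; the default path ~/.s3cfg-gitea-backup is a hypothetical name, and the two key values are placeholders for your own keys:

```shell
#!/bin/sh
set -e
# Target file; pass a path as the first argument, e.g. /root/.s3cfg-gitea-backup
CFG="${1:-$HOME/.s3cfg-gitea-backup}"

# Create the file so that only its owner can read it (umask 077 -> mode 600)
umask 077
cat > "$CFG" <<'EOF'
[default]
access_key = YOUR_ACCESS_KEY_ID
secret_key = YOUR_SECRET_ACCESS_KEY
EOF
# Also tighten the mode if the file already existed with looser permissions
chmod 600 "$CFG"

echo "wrote $CFG"
```

Run it as root with an explicit path, for example sudo sh write-s3cfg.sh /root/.s3cfg-gitea-backup (write-s3cfg.sh being whatever you named the file), and upload with s3cmd -c /root/.s3cfg-gitea-backup put /home/git/gitea/gitea s3://ralscha.giteabackup.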

The next package we install is gpg, which we use to encrypt the backups. Encryption protects them from the eyes of Amazon and of anybody else who gains access to the S3 bucket. This step is optional; if you don't care about the confidentiality of your backup files, you can skip it.

sudo apt install gpg

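We will encrypt the backup with AES, a symmetric encryption algorithm. Here is an example of how you can use gpg to encrypt and decrypt a text file with AES. The first command creates test.txt.gpg; the second decrypts it into test-decrypted.txt (decrypting back into test.txt would make gpg ask whether to overwrite the existing file, which it cannot do in batch mode).

gpg --cipher-algo AES256 --symmetric --batch --passphrase the_passphrase test.txt
gpg --decrypt --batch --passphrase the_passphrase -o test-decrypted.txt test.txt.gpg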
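With --passphrase, the passphrase is visible to other users in the process list while gpg runs. gpg can read it from a file with --passphrase-file instead. A minimal round-trip sketch; backup.pass and test-decrypted.txt are hypothetical names, the temporary GNUPGHOME only keeps the experiment away from your real ~/.gnupg, and --pinentry-mode loopback is added because the GnuPG 2.1+ manual requires it when a passphrase is passed this way:

```shell
#!/bin/sh
set -e
# Throwaway gpg home directory, so this test does not touch ~/.gnupg
export GNUPGHOME="$(mktemp -d)"

# Passphrase file readable only by its owner (first line = passphrase)
umask 077
printf '%s\n' 'the_passphrase' > backup.pass

echo "hello backup" > test.txt

# Encrypt with AES256: creates test.txt.gpg (--yes overwrites an old one)
gpg --batch --yes --pinentry-mode loopback --passphrase-file backup.pass \
    --cipher-algo AES256 --symmetric test.txt

# Decrypt into a separate file and compare it with the original
gpg --batch --yes --pinentry-mode loopback --passphrase-file backup.pass \
    --output test-decrypted.txt --decrypt test.txt.gpg
cmp test.txt test-decrypted.txt && echo "round trip OK"
```

In a real backup script you would drop the GNUPGHOME line and point --passphrase-file at a file that only root can read.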

Backup Gitea

Gitea provides a dump command that writes the configuration, the database and the repositories into a single zip file. We have to run the command as the git user.

cd /home/git/gitea/
sudo -H -u git bash -c "/home/git/gitea/gitea dump"

With all the pieces in place, we can create a shell script that runs the dump, encrypts the file, uploads it to S3 and then deletes the local copies.

cd /home/git/gitea/
sudo nano gitea-backup

Insert the following code and replace the access_key, secret_key and the_passphrase values with your own.

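#!/bin/sh
# Stop at the first failing command; systemd then reports the run as failed
set -e
cd /home/git/gitea
# Remove leftovers of an earlier failed run, so *.zip matches only the new dump
rm -f /home/git/gitea/*.zip /home/git/gitea/*.zip.gpg
# Dump configuration and repositories into a zip file, as the git user
sudo -H -u git bash -c "/home/git/gitea/gitea dump"
# Encrypt the dump with AES256; gpg writes <name>.zip.gpg next to it
gpg --cipher-algo AES256 --symmetric --batch --passphrase the_passphrase *.zip
# Upload the encrypted file to the bucket
s3cmd --access_key=AKIA... --secret_key=LDf...  put /home/git/gitea/*.zip.gpg s3://ralscha.giteabackup
# Delete the local copies once the upload succeeded
rm /home/git/gitea/*.zip /home/git/gitea/*.zip.gpg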

Save it (ctrl+o), close the editor (ctrl+x) and change the permissions so the script can be executed. The script contains the AWS keys and the passphrase, so only root should be able to read it.

sudo chmod 700 gitea-backup

Test the script with sudo ./gitea-backup. Visit the Amazon S3 web console and check if the file is stored in the bucket.


Set up systemd timer

Next we schedule the script to run periodically. We don't need an extra package for this: systemd has a built-in timer service.

Create a timer file

cd /home/git/gitea
sudo nano gitea-backup.timer

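Add the following content. It instructs systemd to run the backup script every day at 5 am (server local time), delayed by a random amount of up to 30 seconds (RandomizedDelaySec). Persistent=true makes systemd start a missed run as soon as possible if the server was down at 5 am.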

[Unit]
Description=Run gitea-backup once a day

[Timer]
OnCalendar=*-*-* 05:00:00
RandomizedDelaySec=30
Persistent=true

[Install]
WantedBy=timers.target

Create the corresponding service file:

sudo nano gitea-backup.service

[Unit]
Description=gitea-backup

[Service]
WorkingDirectory=/home/git/gitea
Type=oneshot
ExecStart=/home/git/gitea/gitea-backup

Then we link the two files into the /lib/systemd/system folder, reload the systemd configuration, start the timer and enable it, so that it starts automatically every time the server boots.

sudo ln -s /home/git/gitea/gitea-backup.timer /lib/systemd/system/gitea-backup.timer
sudo ln -s /home/git/gitea/gitea-backup.service /lib/systemd/system/gitea-backup.service
sudo systemctl daemon-reload
sudo systemctl start gitea-backup.timer
sudo systemctl enable gitea-backup.timer

Check that the timer is installed

sudo systemctl list-timers

To test the service, run it once manually:

sudo systemctl start gitea-backup