Cloud Backup: Protecting Your Digital Treasures
Your archive database is working. Duplicates are identified. You know exactly what you have. Now comes the most important part: making sure you never lose it.
Local storage fails. Hard drives die. Houses burn down. Laptops get stolen. Floods happen. The only backup strategy that survives disaster is offsite backup. That means cloud storage.
Why Cloud Backup Matters
Hard drives have a 100% failure rate. Not “might fail.” Will fail. The question is when, not if.
Your external drive spinning in your closet right now? Average lifespan is 3 to 5 years. SSDs? 5 to 7 years, then the NAND flash cells start degrading. NAS drives? Better, but still mechanical parts that wear out.
Local backups protect against accidents, not disasters. Time Machine on an external drive saves you when you accidentally delete a file. It does nothing when your house burns down. Or floods. Or gets robbed. Or when both your laptop and backup drive fail in the same week because they were both in your backpack.
Cloud storage survives everything. Proper cloud object storage like S3 is designed for 99.999999999% durability (11 nines). Your data is replicated across multiple data centers (availability zones). AWS could lose an entire data center and your files would be fine. Your house could disappear and your files would be fine.
That wedding video? Those photos of your kids? Your college thesis? If they only exist on local drives, they are temporary. Eventually, you will lose them. Cloud backup makes them permanent.
The Deduplication Advantage
Here is where the archive system pays off. After deduplication, you don’t need to back up 1.47 TB. You need to back up 732 GB.
```
Total data across all backups:    1.47 TB
Unique data after deduplication:   732 GB
Duplicate waste:                   735 GB (50%)
```
You save 50% on storage costs immediately. That wedding video you had in four places? Back it up once. Those photo library copies scattered across three drives? One backup covers all of them.
This is the power of content-based deduplication. The cloud backup job processes unique files only. Duplicates are marked “skipped” and reference the canonical copy. You pay once, protect everything.
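The arithmetic behind that claim can be sketched in plain Ruby. The hashes and sizes below are made up for illustration: given a list of `(content_hash, size)` records, total bytes count every copy, while unique bytes count each hash once.

```ruby
# Illustration only: three copies of one file plus one unique file.
# Each record is [content_hash, size_bytes].
records = [
  ["aaa111", 100],
  ["aaa111", 100],
  ["bbb222", 50],
  ["aaa111", 100]
]

total_bytes  = records.sum { |_hash, size| size }        # every copy counted
unique_bytes = records.uniq { |hash, _size| hash }       # one record per hash
                      .sum { |_hash, size| size }
savings = total_bytes - unique_bytes

puts "total=#{total_bytes} unique=#{unique_bytes} saved=#{savings}"
```

The backup job applies exactly this logic at scale: it sums sizes over distinct content hashes, not over file paths.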
S3-Compatible Storage Options
“S3-compatible” means it speaks Amazon’s S3 API. Your Rails app doesn’t care who runs the actual storage. AWS, Backblaze, Wasabi, or MinIO all work the same way from the application layer.
AWS S3 (Standard)
- Cost: $0.023/GB/month
- Egress: $0.09/GB
- 732 GB archive: $16.84/month
- Pros: Most compatible, every tool supports it, instant availability
- Cons: Expensive for long-term storage, egress fees add up
Backblaze B2
- Cost: $0.006/GB/month
- Egress: First 3x storage free, then $0.01/GB
- 732 GB archive: $4.39/month
- Pros: Cheapest storage, generous free egress (2.2 TB/month free)
- Cons: Slower than AWS in some regions
Wasabi Hot Cloud Storage
- Cost: $0.0069/GB/month (min 1 TB billing)
- Egress: $0 (unlimited free)
- 732 GB archive: $6.90/month (billed at the 1 TB minimum; 732 GB alone would be $5.05)
- Pros: No egress fees, fast, predictable costs
- Cons: 1 TB minimum billing, 90-day minimum retention
MinIO (Self-Hosted)
- Cost: Your server costs
- Egress: Free (your bandwidth)
- 732 GB archive: Depends on your server
- Pros: Full control, no vendor lock-in, free software
- Cons: You manage infrastructure, redundancy, backups of backups
Why This Choice: Wasabi
This project uses Wasabi for three reasons:
1. No egress fees. Restoring your entire archive costs $0. With AWS S3, downloading 732 GB costs $65.88. With Wasabi, it costs nothing. For a disaster recovery scenario where you need everything back, this is huge.
2. Predictable pricing. The price is the price. No surprise charges for API calls, no tiered storage math, no calculating free egress allowances. $0.0069/GB/month, period.
3. S3-compatible API. ActiveStorage works out of the box. No custom adapters, no compatibility layers. It is S3 as far as Rails is concerned.
The 1 TB minimum is fine. This archive is 732 GB now and will grow. Photos and videos accumulate over time. You will hit 1 TB eventually.
Cost Comparison Table
| Provider | Storage Cost | Egress Cost | Monthly (732 GB) | Full Restore Cost |
|---|---|---|---|---|
| AWS S3 Standard | $0.023/GB | $0.09/GB | $16.84 | $65.88 |
| Backblaze B2 | $0.006/GB | $0.01/GB* | $4.39 | $0 (free tier) |
| Wasabi | $0.0069/GB | $0 | $6.90** | $0 |
| MinIO | Server cost | $0 | Varies | $0 |
\* First 3x storage size is free (2.2 TB/month for 732 GB)
\*\* Billed at the 1 TB minimum; 732 GB alone would be $5.05
Over one year:
- AWS S3: $202/year + restore costs
- Backblaze B2: $53/year
- Wasabi: $83/year (1 TB billing)
- MinIO: Your server costs
For 732 GB of irreplaceable family photos and videos, $83/year is cheap insurance.
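The monthly figures above are straightforward arithmetic. A quick Ruby sketch using the published per-GB rates reproduces them, including Wasabi's 1 TB billing floor:

```ruby
gb = 732

# Monthly storage, per provider's published per-GB rate
aws_monthly    = (gb * 0.023).round(2)                # AWS S3 Standard
b2_monthly     = (gb * 0.006).round(2)                # Backblaze B2
wasabi_monthly = ([gb, 1000].max * 0.0069).round(2)   # Wasabi bills at least 1 TB

# One full restore from AWS at $0.09/GB egress
aws_restore = (gb * 0.09).round(2)

puts "AWS: $#{aws_monthly}/mo, B2: $#{b2_monthly}/mo, " \
     "Wasabi: $#{wasabi_monthly}/mo, AWS full restore: $#{aws_restore}"
```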
ActiveStorage Configuration
Rails 8 includes ActiveStorage for cloud file management. Configure Wasabi (or any S3-compatible provider) in config/storage.yml:
```yaml
# config/storage.yml
wasabi:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:wasabi, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:wasabi, :secret_access_key) %>
  region: <%= Rails.application.credentials.dig(:wasabi, :region) || 'us-east-1' %>
  endpoint: https://s3.<%= Rails.application.credentials.dig(:wasabi, :region) || 'us-east-1' %>.wasabisys.com
  bucket: avi-archive
  force_path_style: true
```
Why force_path_style: true? S3 supports two URL styles: virtual-hosted (bucket.s3.amazonaws.com) and path-style (s3.amazonaws.com/bucket). AWS prefers virtual-hosted. Wasabi requires path-style. The flag forces path-style URLs.
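To make the difference concrete, here is an illustration-only sketch of both URL shapes for a hypothetical bucket and key:

```ruby
bucket = "avi-archive"
key    = "2024/wedding.mp4"

# Virtual-hosted style: the bucket name becomes part of the hostname (AWS default)
virtual_hosted = "https://#{bucket}.s3.amazonaws.com/#{key}"

# Path style: the bucket name is the first path segment (what Wasabi expects)
path_style = "https://s3.us-east-1.wasabisys.com/#{bucket}/#{key}"

puts virtual_hosted
puts path_style
```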
Store credentials securely:
```shell
# Edit encrypted credentials
rails credentials:edit
```

```yaml
# Add Wasabi credentials
wasabi:
  access_key_id: YOUR_ACCESS_KEY
  secret_access_key: YOUR_SECRET_KEY
  region: us-east-1
  bucket: your-archive-bucket
```
Configure the service in your environment:
```ruby
# config/environments/production.rb
config.active_storage.service = :wasabi
```

```ruby
# config/environments/development.rb
config.active_storage.service = :local # Use local disk in dev
```
ActiveStorage now routes all file operations through Wasabi in production.
Backup Status Tracking
The Backupable concern adds cloud backup status to any model:
```ruby
# app/models/concerns/backupable.rb
module Backupable
  extend ActiveSupport::Concern

  BACKUP_STATUSES = %w[pending uploaded skipped error].freeze

  included do
    validates :backup_status, inclusion: { in: BACKUP_STATUSES }, allow_nil: true

    scope :backup_pending,  -> { where(backup_status: [nil, "pending"]) }
    scope :backup_complete, -> { where(backup_status: "uploaded") }
    scope :backup_error,    -> { where(backup_status: "error") }
  end

  def mark_uploaded!(url)
    update!(backup_status: "uploaded", backup_url: url, backed_up_at: Time.current)
  end

  def mark_error!(message)
    update!(backup_status: "error", backup_error: message)
  end

  def mark_skipped!
    update!(backup_status: "skipped")
  end
end
```
Status values:
- `pending` (or `nil`): Not yet uploaded
- `uploaded`: Successfully backed up to cloud
- `skipped`: Duplicate, references another file's backup
- `error`: Upload failed, see `backup_error` for details
Models using Backupable:
- `UniqueFile`: One row per unique content hash (732 GB of unique files)
- `Photo`: Individual photos with EXIF data
- `Video`: Video files
- `AudioFile`: Audio files
- `Document`: Documents
The Backup Workflow
Upload unique files only. Skip duplicates. Track everything.
Step 1: Filter for unique files
```ruby
# Find all unique files that need backup
files_to_backup = UniqueFile
  .where(needs_backup: true)
  .where(backup_status: [nil, "pending", "error"])
  .where.not(content_hash: nil)
  .order(:category, size_bytes: :desc) # Prioritize: photos first, largest first
  .limit(1000)
```
This query:
- Filters files marked `needs_backup: true`
- Skips already uploaded files
- Includes failed uploads for retry
- Orders by category (photos first) and size (largest first)
- Batches 1000 at a time
Step 2: Upload each file
```ruby
# lib/tasks/backup.rake
namespace :backup do
  desc "Upload unique files to cloud storage"
  task upload: :environment do
    files_to_backup = UniqueFile.backup_pending.limit(1000)

    files_to_backup.find_each do |unique_file|
      begin
        # Check that the file exists on disk before attempting an upload
        unless File.exist?(unique_file.canonical_path)
          unique_file.mark_error!("File not found: #{unique_file.canonical_path}")
          next
        end

        # Open the file in a block so the handle is closed after the upload
        blob = File.open(unique_file.canonical_path) do |file_content|
          ActiveStorage::Blob.create_and_upload!(
            io: file_content,
            filename: File.basename(unique_file.canonical_path),
            content_type: MIME::Types.type_for(unique_file.extension).first&.content_type || "application/octet-stream",
            metadata: {
              content_hash: unique_file.content_hash,
              original_path: unique_file.canonical_path,
              size_bytes: unique_file.size_bytes,
              category: unique_file.category
            }
          )
        end

        # Mark as uploaded
        unique_file.mark_uploaded!(blob.url)
        puts "[OK] Uploaded: #{unique_file.canonical_path} (#{unique_file.size_bytes} bytes)"
      rescue => e
        unique_file.mark_error!(e.message)
        puts "[FAIL] Failed: #{unique_file.canonical_path} - #{e.message}"
      end
    end
  end
end
```
Step 3: Skip duplicates
```ruby
# Mark duplicates as skipped (they reference the canonical backup)
namespace :backup do
  desc "Mark duplicate files as skipped"
  task skip_duplicates: :environment do
    # For each uploaded unique file, mark all duplicate references as skipped
    UniqueFile.backup_complete.find_each do |unique_file|
      # Find all scanner file entries with the same content hash
      Scanner::FileEntry
        .where(content_hash: unique_file.content_hash)
        .where.not(path: unique_file.canonical_path)
        .find_each do |duplicate|
          # Update Photo/Video/AudioFile/Document records that reference this duplicate
          Photo.find_by(file_entry_id: duplicate.id)&.mark_skipped!
          Video.find_by(file_entry_id: duplicate.id)&.mark_skipped!
          AudioFile.find_by(file_entry_id: duplicate.id)&.mark_skipped!
          Document.find_by(file_entry_id: duplicate.id)&.mark_skipped!
        end
    end

    puts "Marked duplicates as skipped"
  end
end
```
Step 4: Verify uploads
```ruby
namespace :backup do
  desc "Verify uploaded files exist in cloud storage"
  task verify: :environment do
    UniqueFile.backup_complete.find_each do |unique_file|
      begin
        # Look up the blob by the key embedded in backup_url
        blob = ActiveStorage::Blob.find_by(key: extract_blob_key(unique_file.backup_url))

        if blob&.service&.exist?(blob.key)
          puts "[OK] Verified: #{unique_file.canonical_path}"
        else
          unique_file.mark_error!("Blob missing from cloud storage")
          puts "[FAIL] Missing: #{unique_file.canonical_path}"
        end
      rescue => e
        unique_file.mark_error!("Verification failed: #{e.message}")
        puts "[FAIL] Error: #{unique_file.canonical_path} - #{e.message}"
      end
    end
  end

  def extract_blob_key(url)
    # Extract the ActiveStorage blob key from a URL
    # Example URL: https://s3.us-east-1.wasabisys.com/avi-archive/abc123xyz
    url.split('/').last
  end
end
```
Content Hash Verification
Before marking a file as uploaded, verify the content hash matches:
```ruby
def verify_upload(unique_file, blob)
  # Stream the blob and hash the full content. Hashing only the first
  # chunk would produce a false mismatch for any file larger than one chunk.
  digest = Digest::SHA256.new
  blob.download { |chunk| digest.update(chunk) }

  remote_hash = digest.hexdigest
  local_hash = unique_file.content_hash

  unless remote_hash == local_hash
    raise "Hash mismatch: local=#{local_hash}, remote=#{remote_hash}"
  end

  true
end
```
For large files, consider streaming hash calculation:
```ruby
def streaming_hash(io)
  digest = Digest::SHA256.new
  while (chunk = io.read(8192))
    digest.update(chunk)
  end
  digest.hexdigest
end
```
This avoids loading entire files into memory.
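As a sanity check, the streaming approach produces the same digest as hashing the whole payload at once. Here it is demonstrated against an in-memory `StringIO` standing in for a large file:

```ruby
require 'digest'
require 'stringio'

# Hash an IO in 8 KB chunks without loading the whole payload into the digest call
def streaming_hash(io)
  digest = Digest::SHA256.new
  while (chunk = io.read(8192))
    digest.update(chunk)
  end
  digest.hexdigest
end

data = "x" * 100_000                       # pretend this is a large file
streamed = streaming_hash(StringIO.new(data))
oneshot  = Digest::SHA256.hexdigest(data)  # one-shot hash for comparison

puts streamed == oneshot  # both paths yield the same digest
```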
Disaster Recovery Considerations
Scenario: Total data loss. Your laptop dies. Your NAS fails. Your external drives are gone. Everything local is destroyed.
Recovery plan:
- Install Rails app on a new machine
- Restore PostgreSQL database from backup (you are backing this up too, right?)
- Run restore rake task to download all files from cloud storage
- Verify content hashes match database records
- Rebuild local directory structure
Restore rake task:
```ruby
namespace :backup do
  desc "Restore all files from cloud storage"
  task restore: :environment do
    restore_path = ENV['RESTORE_PATH'] || Rails.root.join('restored_archive')
    FileUtils.mkdir_p(restore_path)

    UniqueFile.backup_complete.find_each do |unique_file|
      begin
        blob = ActiveStorage::Blob.find_by(key: extract_blob_key(unique_file.backup_url))

        # Recreate the directory structure
        relative_path = unique_file.canonical_path.sub(/^\/Volumes\/[^\/]+/, '')
        full_path = File.join(restore_path, relative_path)
        FileUtils.mkdir_p(File.dirname(full_path))

        # Stream the download to disk, appending each chunk. Calling
        # File.write per chunk would truncate and keep only the last chunk.
        File.open(full_path, 'wb') do |file|
          blob.download { |chunk| file.write(chunk) }
        end

        puts "[OK] Restored: #{full_path}"
      rescue => e
        puts "[FAIL] Failed: #{unique_file.canonical_path} - #{e.message}"
      end
    end
  end
end
```
Run with:
```shell
RESTORE_PATH=/Volumes/NewDrive/Archive rails backup:restore
```
Database backup is critical. The cloud storage holds your files. The database holds the metadata, EXIF data, GPS locations, categories, and deduplication mapping. Back up PostgreSQL regularly:
```shell
# Dump database
pg_dump avi_archive > avi_archive_backup_$(date +%Y%m%d).sql

# Upload to cloud (add --endpoint-url when targeting an S3-compatible provider like Wasabi)
aws s3 cp avi_archive_backup_$(date +%Y%m%d).sql s3://avi-archive-db-backups/
```
Automate this with cron or a scheduled job. Daily database backups are cheap insurance.
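A minimal crontab sketch of that automation follows. The paths, schedule, and bucket name are assumptions; note that `%` must be escaped as `\%` inside a crontab entry, and the aws CLI needs `--endpoint-url` when pointed at Wasabi instead of AWS:

```shell
# crontab -e — dump the database nightly at 02:30 and copy it to cloud storage
30 2 * * * pg_dump avi_archive > /backups/avi_archive_$(date +\%Y\%m\%d).sql && aws s3 cp /backups/avi_archive_$(date +\%Y\%m\%d).sql s3://avi-archive-db-backups/ --endpoint-url https://s3.us-east-1.wasabisys.com
```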
Watch Out: Egress Fees
The hidden cost of cloud storage is getting your data back.
AWS S3 charges $0.09/GB for egress (downloads). Restoring 732 GB costs $65.88. If you restore twice, you have paid $131.76 in bandwidth fees. That is more than a year of storage.
Backblaze B2 offers 3x your storage size as free egress per month. For 732 GB stored, you get 2.2 TB/month free downloads. Restoring once per month is free. Restoring twice costs money.
Wasabi has zero egress fees. Restore your entire archive ten times. Download it daily. Transfer it to another provider. It costs $0.
Why this matters: Disaster recovery means downloading everything. Testing your backups (which you should do) means downloading files. Migrating to another provider means downloading everything. Egress fees turn disaster recovery into a financial decision. That is a bad position to be in when your house is on fire.
Free egress removes this friction. You can test restores without anxiety. You can move data freely. You can recover from disasters without calculating costs.
Adapt This: Choose Your Storage Provider
This guide uses Wasabi, but the architecture works with any S3-compatible provider. Swap the config/storage.yml configuration and everything else stays the same.
If you prioritize lowest cost: Use Backblaze B2 ($4.39/month for 732 GB). The free egress tier covers most disaster recovery scenarios.
If you prioritize compatibility: Use AWS S3. Every tool, every library, every service supports it. You pay more, but you never fight compatibility issues.
If you prioritize control: Use MinIO on your own server. You manage the infrastructure, but you own the data completely. Good for privacy-sensitive archives.
If you prioritize simplicity: Use Wasabi. One price, no surprise fees, works exactly like S3. This is the “set it and forget it” option.
The Rails ActiveStorage layer abstracts the provider. The backup workflow is identical regardless of where files land. Choose the provider that matches your priorities, update the config, and ship it.
Backup Priorities
Not all files are equally important. Prioritize backups by category:
Priority 1: Photos (418k files, ~300 GB). Irreplaceable. Family memories, travel photos, events. Back these up first.
Priority 2: Videos (4.4k files, ~200 GB). Also irreplaceable. Graduations, weddings, kids growing up. Second priority.
Priority 3: Documents (98k files, ~50 GB). Tax records, college papers, legal documents. Important but often reproducible.
Priority 4: Audio (40k files, ~80 GB). Music collections, voice memos, podcasts. Nice to have, but often re-downloadable.
Priority 5: Code Projects (2.2k projects, ~100 GB). Your own projects are valuable. Dependencies and node_modules are not. Back up source, skip build artifacts.
Run separate backup jobs for each priority. Start with photos and videos. If you run out of budget or storage, at least the irreplaceable stuff is safe.
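One way to sketch that ordering in plain Ruby (the category names and sizes here are hypothetical) is to sort pending files by category priority, largest-first within each category:

```ruby
# Categories in backup-priority order; unknown categories sort last
PRIORITY = %w[photo video document audio code].freeze

pending = [
  { category: "audio", size_bytes: 9_000 },
  { category: "photo", size_bytes: 2_000 },
  { category: "photo", size_bytes: 5_000 },
  { category: "video", size_bytes: 7_000 }
]

ordered = pending.sort_by do |f|
  # Sort key: [category priority, negated size] => photos first, biggest first
  [PRIORITY.index(f[:category]) || PRIORITY.size, -f[:size_bytes]]
end

puts ordered.map { |f| f[:category] }.inspect
```

The same idea maps directly onto the database query shown earlier: an `ORDER BY` on category priority and descending size before batching uploads.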
Monitoring and Alerts
Track backup progress with simple metrics:
```ruby
# app/models/backup_stats.rb
class BackupStats
  def self.summary
    total = UniqueFile.count
    uploaded = UniqueFile.backup_complete.count

    {
      total_files: total,
      uploaded: uploaded,
      pending: UniqueFile.backup_pending.count,
      errors: UniqueFile.backup_error.count,
      skipped: UniqueFile.where(backup_status: "skipped").count,
      total_bytes_backed_up: UniqueFile.backup_complete.sum(:size_bytes),
      # Guard against division by zero before any files are registered
      percent_complete: total.zero? ? 0.0 : (uploaded.to_f / total * 100).round(2)
    }
  end
end
```
Add a dashboard page:
```erb
<!-- app/views/backup_status/index.html.erb -->
<h1>Cloud Backup Status</h1>

<% stats = BackupStats.summary %>

<div class="stats">
  <div class="stat">
    <h3>Files Uploaded</h3>
    <p><%= number_with_delimiter(stats[:uploaded]) %> / <%= number_with_delimiter(stats[:total_files]) %></p>
    <p><%= stats[:percent_complete] %>%</p>
  </div>
  <div class="stat">
    <h3>Bytes Backed Up</h3>
    <p><%= number_to_human_size(stats[:total_bytes_backed_up]) %></p>
  </div>
  <div class="stat">
    <h3>Pending</h3>
    <p><%= number_with_delimiter(stats[:pending]) %></p>
  </div>
  <div class="stat">
    <h3>Errors</h3>
    <p><%= number_with_delimiter(stats[:errors]) %></p>
  </div>
</div>
```

(`number_with_delimiter` replaces the older `to_s(:delimited)`, which was removed from ActiveSupport in recent Rails versions.)
Set up alerts for errors:
```ruby
# Check for upload errors daily
namespace :backup do
  desc "Report backup errors"
  task report_errors: :environment do
    errors = UniqueFile.backup_error.limit(100)

    if errors.any?
      puts "Backup errors detected:"
      errors.each do |file|
        puts "  #{file.canonical_path}: #{file.backup_error}"
      end
      # Send email alert (optional)
      BackupMailer.error_report(errors).deliver_later
    else
      puts "No backup errors"
    end
  end
end
```
Run this daily with cron or a scheduled job.
Conclusion
Cloud backup is the final step. Everything before this (scanning, deduplication, categorization) prepares the data. This step makes it permanent.
732 GB of unique content. Not 1.47 TB. Deduplication saves 50% of backup costs.
S3-compatible storage. Wasabi, Backblaze, AWS, or MinIO. The architecture works with any provider.
Backup status tracking. Every file knows if it is pending, uploaded, skipped, or errored.
Verification. Content hashes confirm uploads are correct.
Disaster recovery. Restore everything from cloud with a single rake task.
Your digital archive is now protected against every disaster except the end of the internet. Hard drives will fail. Laptops will die. Houses might burn. Your files will survive.
Next: Customization Guide - Making the system work for your specific needs.