The Web Application: Browsing Your Digital Life
After scanning your files and understanding your archive structure, you need a way to explore it. A web application transforms your archive from a directory of files into a browsable, searchable system that helps you understand what you have and decide what to keep.
This guide walks through building a Rails-based archive browser that separates read-only file scanning from application-level data management.
Why Rails?
Rails 8 provides everything needed for an archive browser without reinventing the wheel:
- Convention over configuration: Standard patterns for models, controllers, and routes mean less decision fatigue
- Batteries included: Active Storage, background jobs, and the asset pipeline work out of the box
- Hotwire: Turbo and Stimulus deliver modern interactions without heavy JavaScript frameworks
- Mature ecosystem: Gems like Kaminari (pagination) and Ransack (search) solve common problems
The goal is rapid development. This is a personal tool, not a product. Rails lets you focus on archive-specific logic instead of plumbing.
Adapt This: Django, Laravel, or Phoenix work equally well. The key is choosing a framework you know that handles database queries, file uploads, and background jobs without extra setup.
The Hybrid Model Architecture
The most important architectural decision is separating the scanner layer from the application layer. This prevents Rails from accidentally modifying your file scan data.
Two-Layer Design
+------------------------------------------+
|      Application Layer (Read/Write)      |
|                                          |
|  Directory   Project   Photo   Video     |
|  UniqueFile  AudioFile Document          |
|                                          |
|  Populated by rake tasks                 |
|  Used by controllers and views           |
+------------------+-----------------------+
                   |
                   |  References via file_entry_id
                   |
+------------------v-----------------------+
|        Scanner Layer (Read-Only)         |
|                                          |
|  Scanner::FileEntry                      |
|                                          |
|  Wraps files table from Python scanner   |
|  Rails NEVER writes to this table        |
+------------------------------------------+
Scanner Layer
The scanner layer consists of a single model that wraps the files table:
# app/models/scanner/file_entry.rb
module Scanner
class FileEntry < ApplicationRecord
self.table_name = 'files'
# Read-only model - prevent accidental writes
def readonly?
true
end
# Relationships to application models
has_one :directory
has_one :project
has_one :photo
has_one :video
has_one :audio_file
has_one :document
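# UniqueFile has no file_entry_id; it is matched on the content hash
has_one :unique_file, primary_key: :sha256, foreign_key: :sha256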
# Useful scopes
scope :directories, -> { where(is_dir: true) }
scope :files, -> { where(is_dir: false) }
scope :by_path, -> { order(:path) }
end
end
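The readonly? override makes Active Record raise ActiveRecord::ReadOnlyRecord whenever Rails tries to save, update, or destroy one of these records. Bulk operations such as update_all and delete_all bypass this check, so avoid them on this model. The scanner owns this data.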
Application Layer
Application models are populated by rake tasks that read from Scanner::FileEntry and create specialized records:
# app/models/photo.rb
class Photo < ApplicationRecord
include FileEntryBacked
include Backupable
belongs_to :file_entry, class_name: 'Scanner::FileEntry'
# Scopes for common queries
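# Chain where.not so BOTH columns must be present (a single where.not with two keys is a NAND)
scope :with_gps, -> { where.not(latitude: nil).where.not(longitude: nil) }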
scope :by_date, -> { order(date_taken: :desc) }
scope :recent, -> { where('date_taken > ?', 1.year.ago) }
# EXIF metadata
# latitude, longitude, date_taken, camera_make, camera_model, etc.
end
This separation means:
- The scanner runs independently: Python can rescan files without touching application data
- No write conflicts: Rails never modifies scan results
- Flexible schemas: Application models can have different structures than the raw file data
- Easy repopulation: Drop application tables and rebuild without losing scan data
Key Models and Their Purpose
The application layer uses specialized models optimized for specific file types and browsing patterns.
Directory (1.1M records)
Pre-computed folder statistics for fast navigation:
# app/models/directory.rb
class Directory < ApplicationRecord
include FileEntryBacked
belongs_to :file_entry, class_name: 'Scanner::FileEntry'
belongs_to :parent, class_name: 'Directory', optional: true
has_many :children, class_name: 'Directory', foreign_key: :parent_id
# Pre-computed stats (updated by rake tasks)
# total_size, file_count, dir_count, depth
scope :root, -> { where(parent_id: nil) }
scope :by_size, -> { order(total_size: :desc) }
def path
file_entry.path
end
end
Without pre-computation, calculating folder sizes requires recursively scanning children. With 1.1 million directories, that is too slow for web requests. The populate:directories task calculates these stats once.
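The pre-computation can be sketched in plain Ruby, independent of Rails. This is a minimal illustration, not the actual populate:directories task: the Entry struct and the directory_totals method are hypothetical names, and the rows are assumed to carry id, parent_id, is_dir, and size.

```ruby
# Hypothetical flat rows, shaped like the scanner's files table.
Entry = Struct.new(:id, :parent_id, :is_dir, :size, keyword_init: true)

# Returns { directory_id => total bytes of every file beneath it }.
def directory_totals(entries)
  by_id = entries.to_h { |entry| [entry.id, entry] }
  totals = Hash.new(0)

  entries.each do |entry|
    next if entry.is_dir

    # Walk up the ancestor chain, crediting this file to each directory.
    parent_id = entry.parent_id
    while parent_id
      totals[parent_id] += entry.size
      parent_id = by_id.fetch(parent_id).parent_id
    end
  end

  totals
end

entries = [
  Entry.new(id: 1, parent_id: nil, is_dir: true,  size: 0),   # /
  Entry.new(id: 2, parent_id: 1,   is_dir: true,  size: 0),   # /photos
  Entry.new(id: 3, parent_id: 2,   is_dir: false, size: 500), # /photos/a.jpg
  Entry.new(id: 4, parent_id: 1,   is_dir: false, size: 100)  # /notes.txt
]

totals = directory_totals(entries)
puts totals[1] # prints 600
puts totals[2] # prints 500
```

Each file is visited once and credited to every ancestor, so the work happens in one offline pass instead of on every page load.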
Project (2,200 records)
Code repositories with git metadata:
# app/models/project.rb
class Project < ApplicationRecord
include FileEntryBacked
belongs_to :file_entry, class_name: 'Scanner::FileEntry'
# git_remote, github_url, is_git_repo, language
scope :with_git, -> { where(is_git_repo: true) }
scope :on_github, -> { where.not(github_url: nil) }
scope :by_language, ->(lang) { where(language: lang) }
end
Of 2,200 projects, 326 have git repositories and 294 are on GitHub. This model helps identify which code is backed up to cloud hosting.
UniqueFile (1.2M records)
Deduplicated files tracked by SHA-256 hash:
# app/models/unique_file.rb
class UniqueFile < ApplicationRecord
include Backupable
has_many :file_entries,
class_name: 'Scanner::FileEntry',
primary_key: :sha256,
foreign_key: :sha256
# sha256, size, file_count, backup_status, uploaded_at
scope :duplicates, -> { where('file_count > 1') }
scope :large, -> { where('size > ?', 100.megabytes) }
def human_size
ActiveSupport::NumberHelper.number_to_human_size(size)
end
end
This model powers duplicate detection and backup tracking. When multiple files share the same hash, you only need to back up the content once.
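To make the idea concrete, here is a small plain-Ruby sketch of grouping files by content hash. FileRow and duplicate_report are hypothetical names used only for illustration; the real UniqueFile table is built from scanner data.

```ruby
# Hypothetical row: one entry per file on disk.
FileRow = Struct.new(:path, :sha256, :size, keyword_init: true)

# Groups rows by content hash and reports the space taken by extra copies.
def duplicate_report(rows)
  rows.group_by(&:sha256).filter_map do |sha, copies|
    next if copies.size < 2 # unique content, nothing to report

    {
      sha256: sha,
      copies: copies.size,
      wasted_bytes: copies.first.size * (copies.size - 1)
    }
  end
end

rows = [
  FileRow.new(path: "/a/cat.jpg",  sha256: "aaa", size: 10),
  FileRow.new(path: "/b/cat.jpg",  sha256: "aaa", size: 10),
  FileRow.new(path: "/a/note.txt", sha256: "bbb", size: 4)
]

report = duplicate_report(rows)
puts report.first[:wasted_bytes] # prints 10
```

Only one copy per hash needs to be uploaded; the remaining copies are the space you could reclaim.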
Photo (418,000 records)
Images with EXIF and GPS data:
# app/models/photo.rb
class Photo < ApplicationRecord
include FileEntryBacked
include Backupable
# latitude, longitude, date_taken, camera_make, camera_model
# width, height, orientation
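# Chain where.not so BOTH columns must be present (a single where.not with two keys is a NAND)
scope :with_gps, -> { where.not(latitude: nil).where.not(longitude: nil) }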
scope :portraits, -> { where(orientation: 'portrait') }
scope :by_camera, ->(make) { where(camera_make: make) }
def location?
latitude.present? && longitude.present?
end
end
Only 19,786 photos (4.7%) have GPS coordinates. This model helps find geotagged photos and identify camera equipment.
Video, AudioFile, and Document
Similar specialized models for other media types:
# 4,400 videos
class Video < ApplicationRecord
include FileEntryBacked
include Backupable
# duration, codec, resolution, frame_rate
end
# 40,000 audio files
class AudioFile < ApplicationRecord
include FileEntryBacked
include Backupable
# title, artist, album, duration, bitrate
end
# 98,000 documents
class Document < ApplicationRecord
include FileEntryBacked
include Backupable
# page_count, author, title, word_count
end
Each model includes only metadata relevant to its file type. This keeps queries fast and schemas focused.
Shared Concerns
Two concerns provide common functionality across application models.
FileEntryBacked
Links application models to scanner data:
# app/models/concerns/file_entry_backed.rb
module FileEntryBacked
extend ActiveSupport::Concern
included do
belongs_to :file_entry, class_name: 'Scanner::FileEntry'
delegate :path, :name, :size, :modified_at, to: :file_entry
end
def file_exists?
File.exist?(file_entry.path)
end
def human_size
ActiveSupport::NumberHelper.number_to_human_size(size)
end
def extension
File.extname(path).downcase.delete('.')
end
end
This concern eliminates repetitive delegation code and provides common file operations.
Backupable
Tracks backup status through a status column, with a helper method for each transition:
# app/models/concerns/backupable.rb
module Backupable
extend ActiveSupport::Concern
included do
# backup_status: pending, uploaded, skipped, error
# uploaded_at, backup_key, backup_error
scope :pending_backup, -> { where(backup_status: 'pending') }
scope :backed_up, -> { where(backup_status: 'uploaded') }
scope :backup_failed, -> { where(backup_status: 'error') }
end
def mark_uploaded!(key)
update!(
backup_status: 'uploaded',
backup_key: key,
uploaded_at: Time.current
)
end
def mark_failed!(error)
update!(
backup_status: 'error',
backup_error: error.to_s
)
end
def needs_backup?
backup_status == 'pending'
end
end
This concern standardizes backup tracking across Photos, Videos, and other media models.
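The concern records a status but does not restrict which changes are allowed. If you want that guarantee, a transition table is one option. The sketch below is plain Ruby with an assumed set of allowed transitions; the BackupState class and its TRANSITIONS table are hypothetical and not part of the application above.

```ruby
class BackupState
  # Assumed transition table: which statuses may follow each status.
  TRANSITIONS = {
    "pending"  => %w[uploaded skipped error],
    "error"    => %w[pending uploaded],
    "skipped"  => %w[pending],
    "uploaded" => []
  }.freeze

  attr_reader :status

  def initialize(status = "pending")
    @status = status
  end

  # Moves to new_status, or raises if the table does not allow it.
  def transition_to(new_status)
    allowed = TRANSITIONS.fetch(status)
    unless allowed.include?(new_status)
      raise ArgumentError, "cannot go from #{status} to #{new_status}"
    end

    @status = new_status
    self
  end
end

state = BackupState.new
state.transition_to("error").transition_to("pending")
puts state.status # prints pending
```

In a Rails model the same table could back a validation, so an uploaded file is never silently reset to pending.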
Essential Features
The web application provides four core features for archive exploration.
1. Dashboard with Archive Statistics
The homepage shows aggregate statistics:
# app/controllers/dashboard_controller.rb
class DashboardController < ApplicationController
def index
@stats = Rails.cache.fetch('archive_stats', expires_in: 1.hour) do
{
total_files: Scanner::FileEntry.files.count,
total_size: Scanner::FileEntry.sum(:size),
directories: Directory.count,
photos: Photo.count,
photos_with_gps: Photo.with_gps.count,
videos: Video.count,
projects: Project.count,
projects_on_github: Project.on_github.count,
unique_files: UniqueFile.count,
duplicates: UniqueFile.duplicates.count
}
end
end
end
Statistics are cached for one hour because calculating aggregates over millions of records is expensive.
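The fetch-with-expiry pattern is easy to reason about once you see it in miniature. The following is a simplified plain-Ruby sketch of the idea, not how Rails.cache is implemented; TtlCache and its injectable clock are hypothetical.

```ruby
# A tiny in-memory cache: compute once, reuse until the entry expires.
class TtlCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @store = {}
    @clock = clock
  end

  # Returns the cached value for key, or runs the block and caches its result
  # for expires_in seconds.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

cache = TtlCache.new
stats = cache.fetch("archive_stats", expires_in: 3600) { { photos: 418_000 } }
puts stats[:photos] # prints 418000
```

The expensive block only runs on a miss, which is why a one-hour expiry turns millions of row scans into one per hour.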
2. Directory Navigation
Browse the archive by folder structure:
# app/controllers/browse_controller.rb
class BrowseController < ApplicationController
def show
# The wildcard segment arrives without its leading slash; restore it to match stored absolute paths
path = "/#{params[:path]}"
@directory = Directory.joins(:file_entry)
.find_by(file_entries: { path: path })
return redirect_to root_path unless @directory
# Get immediate children
@subdirectories = @directory.children.by_size.limit(100)
# Get files in this directory
@files = Scanner::FileEntry.files
.where(parent_id: @directory.file_entry.id)
.order(:name)
.page(params[:page])
end
end
Routes:
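# config/routes.rb
# format: false keeps a trailing ".something" in a folder name from being parsed as a request format
get '/browse/*path', to: 'browse#show', as: :browse, format: false
get '/browse', to: 'browse#show'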
The wildcard route captures nested paths like /browse/Users/avi/Documents.
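Rails hands the wildcard segment to the controller without its leading slash (Users/avi/Documents), while the scanner stores absolute paths. The sketch below shows one way to normalize the parameter and derive breadcrumb links from it. Both method names are hypothetical, and it assumes archive paths are absolute and use / as the separator.

```ruby
# Turns the captured wildcard segment into an absolute archive path.
# Empty, "." and ".." segments are dropped so the result stays inside the tree.
def archive_path(param)
  segments = param.to_s.split("/").reject { |s| s.empty? || s == "." || s == ".." }
  "/" + segments.join("/")
end

# Returns [label, path] pairs from the root down to the given path.
def breadcrumbs(path)
  crumbs = [["/", "/"]]
  current = ""
  path.split("/").reject(&:empty?).each do |segment|
    current = "#{current}/#{segment}"
    crumbs << [segment, current]
  end
  crumbs
end

puts archive_path("Users/avi/Documents") # prints /Users/avi/Documents
breadcrumbs("/Users/avi").each { |label, path| puts "#{label} -> #{path}" }
```

A view helper could map each pair to a link with browse_path, which is roughly what a breadcrumb helper needs to do.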
3. Category Browsing
View files by type:
# app/controllers/categories_controller.rb
class CategoriesController < ApplicationController
MODELS = {
'photos' => Photo,
'videos' => Video,
'audio' => AudioFile,
'documents' => Document,
'projects' => Project
}
def show
@category = params[:category]
@model = MODELS[@category]
return redirect_to root_path unless @model
@items = @model.includes(:file_entry)
.page(params[:page])
.per(50)
end
end
Routes:
get '/category/:category', to: 'categories#show', as: :category
This provides fast access to all photos, all videos, or all projects without navigating the directory tree.
4. Search
Case-insensitive substring search across file paths:
# app/controllers/search_controller.rb
class SearchController < ApplicationController
def index
@query = params[:q]
return if @query.blank?
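# sanitize_sql_like is an Active Record class method, so call it on the model
pattern = "%#{Scanner::FileEntry.sanitize_sql_like(@query)}%"
@results = Scanner::FileEntry.where('path ILIKE ?', pattern).limit(100)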
# Filter by category if specified
if params[:category].present?
@results = filter_by_category(@results, params[:category])
end
end
private
def filter_by_category(results, category)
case category
when 'photos'
results.joins(:photo)
when 'videos'
results.joins(:video)
# etc.
else
results
end
end
end
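PostgreSQL’s ILIKE operator provides case-insensitive substring matching. Because the pattern starts with a wildcard, a regular B-tree index on path cannot help, so each search scans the table; a trigram (pg_trgm) index is one way to speed this up. For larger archives, also consider full-text search with pg_search or Elasticsearch.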
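Escaping matters because % and _ are wildcards inside a LIKE pattern: a search for 100%_done should match those literal characters. The sketch below mirrors the idea behind Active Record's sanitize_sql_like in plain Ruby. It is an illustration with hypothetical method names, not the Rails source, and it assumes backslash as the escape character.

```ruby
# Escapes LIKE wildcards (% and _) and the escape character itself.
def escape_like(value, escape_char = "\\")
  value.gsub(/[#{Regexp.escape(escape_char)}%_]/) { |char| escape_char + char }
end

# Wraps the escaped query in wildcards for a substring match.
def contains_pattern(query)
  "%#{escape_like(query)}%"
end

puts contains_pattern("vacation") # prints %vacation%
puts contains_pattern("100%")     # prints %100\%%
```

The pattern is still passed to the database as a bind parameter; escaping only controls which characters act as wildcards.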
Routes and Controllers
The complete routing structure:
# config/routes.rb
Rails.application.routes.draw do
root 'dashboard#index'
# Directory navigation
get '/browse/*path', to: 'browse#show', as: :browse, format: false
get '/browse', to: 'browse#show'
# Category browsing
get '/category/:category', to: 'categories#show', as: :category
# Search
get '/search', to: 'search#index'
# File details
resources :files, only: [:show]
# Projects
resources :projects, only: [:index, :show] do
collection do
get :github
get :local
end
end
# Photos
resources :photos, only: [:index, :show] do
member do
post :backup # Queue a backup job for one photo
end
collection do
get :map # GPS-tagged photos on a map
get :timeline
end
end
# Admin
namespace :admin do
get '/stats', to: 'stats#index'
post '/sync', to: 'sync#create'
end
end
Controllers follow Rails conventions:
- DashboardController: Homepage with statistics
- BrowseController: Directory tree navigation
- CategoriesController: Type-specific browsing
- SearchController: Full-archive search
- FilesController: Individual file details
- ProjectsController: Code repository management
- PhotosController: Photo gallery with map and timeline views
Adapt This: If you prefer GraphQL or JSON APIs, replace controller actions with resolvers or API endpoints. The model layer remains the same.
Population Workflow
The application layer starts empty. Rake tasks populate it from scanner data.
Main Population Task
# lib/tasks/populate.rake
namespace :populate do
desc "Populate all application models from scanner data"
task all: :environment do
Rake::Task['populate:directories'].invoke
Rake::Task['populate:unique_files'].invoke
Rake::Task['populate:projects'].invoke
Rake::Task['populate:photos'].invoke
Rake::Task['populate:videos'].invoke
Rake::Task['populate:audio'].invoke
Rake::Task['populate:documents'].invoke
end
task directories: :environment do
puts "Populating directories..."
Scanner::FileEntry.directories.find_each do |entry|
Directory.find_or_create_by!(file_entry: entry) do |dir|
# Find parent directory
parent_path = File.dirname(entry.path)
parent_entry = Scanner::FileEntry.find_by(path: parent_path)
dir.parent = Directory.find_by(file_entry: parent_entry) if parent_entry
# Calculate stats (calculate_total_size is a helper you define; it is not shown here)
dir.depth = entry.path.count('/')
dir.file_count = Scanner::FileEntry.files
.where(parent_id: entry.id)
.count
dir.total_size = calculate_total_size(entry)
end
end
puts "Created #{Directory.count} directories"
end
task photos: :environment do
puts "Populating photos..."
Scanner::FileEntry.where(category: 'image').find_each do |entry|
next if Photo.exists?(file_entry: entry)
# Extract EXIF data (extract_exif is a helper you define; it is not shown here)
exif = extract_exif(entry.path)
Photo.create!(
file_entry: entry,
latitude: exif[:latitude],
longitude: exif[:longitude],
date_taken: exif[:date_taken],
camera_make: exif[:camera_make],
camera_model: exif[:camera_model],
width: exif[:width],
height: exif[:height],
backup_status: 'pending'
)
end
puts "Created #{Photo.count} photos"
end
# Similar tasks for other models...
end
These tasks are idempotent: records that already exist are skipped, so running them again does not create duplicates. Existing records are not refreshed, though; the sync:stats task below handles updated statistics.
Sync Tasks
After the scanner runs, sync tasks update application models:
# lib/tasks/sync.rake
namespace :sync do
desc "Sync all application models with scanner data"
task all: :environment do
Rake::Task['sync:cleanup'].invoke # Remove deleted files
Rake::Task['sync:stats'].invoke # Update statistics
Rake::Task['populate:all'].invoke # Add new files
end
task cleanup: :environment do
puts "Removing records for deleted files..."
# Find photos whose file_entry no longer exists
Photo.where.missing(:file_entry).destroy_all
# Repeat for other models...
end
task stats: :environment do
puts "Updating directory statistics..."
Directory.find_each do |dir|
dir.update!(
file_count: Scanner::FileEntry.files
.where(parent_id: dir.file_entry_id)
.count,
total_size: calculate_total_size(dir.file_entry)
)
end
end
end
Run rails sync:all after each scanner run to keep the application layer in sync with file system changes.
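Conceptually, a sync compares two sets of IDs: what the scanner currently knows about and what the application layer already references. A minimal plain-Ruby sketch of that comparison follows; sync_plan is a hypothetical helper, not one of the rake tasks above.

```ruby
require "set"

# Given the scanner's current IDs and the file_entry_ids the application
# already references, work out which records to create and which to remove.
def sync_plan(scanner_ids, application_ids)
  scanned = scanner_ids.to_set
  known = application_ids.to_set

  {
    create: (scanned - known).to_a.sort, # new files found by the scanner
    remove: (known - scanned).to_a.sort  # records whose file has disappeared
  }
end

plan = sync_plan([1, 2, 3], [2, 3, 4])
puts plan[:create].inspect # prints [1]
puts plan[:remove].inspect # prints [4]
```

sync:cleanup corresponds to the remove list and populate:all to the create list.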
Background Jobs with Solid Queue
Long-running operations use background jobs:
# app/jobs/backup_file_job.rb
class BackupFileJob < ApplicationJob
queue_as :default
def perform(photo_id)
photo = Photo.find(photo_id)
# Upload to Wasabi S3
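# File.extname keeps the leading dot, so the key ends in ".jpg" rather than "jpg"
key = "photos/#{photo.file_entry.sha256}#{File.extname(photo.path).downcase}"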
File.open(photo.path, 'rb') do |file|
S3_CLIENT.put_object(
bucket: ENV['WASABI_BUCKET'],
key: key,
body: file
)
end
photo.mark_uploaded!(key)
rescue => e
photo&.mark_failed!(e) # photo is nil if Photo.find itself raised
raise
end
end
Queue jobs from controllers or rake tasks:
# app/controllers/photos_controller.rb
class PhotosController < ApplicationController
def backup
@photo = Photo.find(params[:id])
BackupFileJob.perform_later(@photo.id)
redirect_to @photo, notice: 'Backup queued'
end
end
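Solid Queue (the default Active Job backend in Rails 8) persists jobs in the database, so it needs no external dependencies like Redis or Sidekiq. It also supports scheduled and recurring jobs; retries are declared in the job class with Active Job's retry_on.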
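The job above names each upload after the file's SHA-256 hash, so identical content always maps to the same object key. A plain-Ruby sketch of building such a key is shown below. The method names and the photos prefix are illustrative assumptions, and in the application the hash would normally come from the scanner rather than being recomputed.

```ruby
require "digest"

# Streams the file through SHA-256, so large files are not read into memory at once.
def sha256_of(path)
  Digest::SHA256.file(path).hexdigest
end

# Builds a content-addressed object key: prefix/hash plus the lowercased extension.
def backup_key(prefix, sha256, path)
  "#{prefix}/#{sha256}#{File.extname(path).downcase}"
end

puts backup_key("photos", "abc123", "/Users/avi/Pictures/IMG_0001.JPG")
# prints photos/abc123.jpg
```

Two copies of the same photo in different folders produce the same key, so uploading the second copy overwrites an identical object instead of storing it twice.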
UI with Hotwire and Tailwind
Views use Turbo for reactive updates and Tailwind for styling.
Directory Listing
<%# app/views/browse/show.html.erb %>
<div class="max-w-7xl mx-auto px-4">
<nav class="breadcrumbs text-sm text-gray-600 mb-4">
<%= render_breadcrumbs(@directory) %>
</nav>
<div class="grid grid-cols-1 gap-4">
<%# Subdirectories %>
<% @subdirectories.each do |subdir| %>
<%= link_to browse_path(subdir.path),
class: "block p-4 bg-white rounded shadow hover:shadow-lg" do %>
<div class="flex items-center justify-between">
<div>
<h3 class="font-semibold"><%= subdir.name %></h3>
<p class="text-sm text-gray-600">
<%= subdir.file_count %> files, <%= number_to_human_size(subdir.total_size) %>
</p>
</div>
<svg class="w-6 h-6 text-gray-400"><!-- folder icon --></svg>
</div>
<% end %>
<% end %>
<%# Files %>
<% @files.each do |file| %>
<%= link_to file_path(file),
class: "block p-4 bg-gray-50 rounded hover:bg-gray-100" do %>
<div class="flex items-center justify-between">
<span><%= file.name %></span>
<span class="text-sm text-gray-600"><%= number_to_human_size(file.size) %></span>
</div>
<% end %>
<% end %>
</div>
<%= paginate @files %>
</div>
Photo Gallery
<%# app/views/photos/index.html.erb %>
<div class="photo-grid grid grid-cols-4 gap-4">
<% @photos.each do |photo| %>
<%= link_to photo_path(photo),
data: { turbo_frame: "modal" } do %>
<div class="aspect-square bg-gray-200 rounded overflow-hidden">
<%= image_tag photo_thumbnail_url(photo),
class: "w-full h-full object-cover",
loading: "lazy" %>
</div>
<% end %>
<% end %>
</div>
<%# Modal frame for photo details %>
<turbo-frame id="modal" class="modal"></turbo-frame>
Turbo Frames load photo details in a modal without full page reloads.
Search with Live Updates
<%# app/views/search/index.html.erb %>
<%= form_with url: search_path, method: :get,
data: { controller: "search", turbo_frame: "results", turbo_action: "advance" } do |f| %>
<%= f.text_field :q,
value: @query,
placeholder: "Search files...",
class: "w-full px-4 py-2 border rounded",
data: { action: "input->search#submit" } %>
<%= f.select :category,
options_for_select([['All', ''], ['Photos', 'photos'],
['Videos', 'videos'], ['Projects', 'projects']],
params[:category]),
{},
class: "ml-2 px-4 py-2 border rounded" %>
<% end %>
<turbo-frame id="results">
<%= render @results if @results.present? %>
</turbo-frame>
With a Stimulus controller for debounced search:
// app/javascript/controllers/search_controller.js
import { Controller } from "@hotwired/stimulus"
export default class extends Controller {
submit() {
clearTimeout(this.timeout)
this.timeout = setTimeout(() => {
this.element.requestSubmit()
}, 300)
}
}
This provides Google-style search with results appearing as you type.
Database Indexes
Proper indexing is critical for fast queries over millions of records:
# db/migrate/..._add_indexes.rb
class AddIndexes < ActiveRecord::Migration[8.0]
def change
# Scanner layer
add_index :files, :path, unique: true
add_index :files, :parent_id
add_index :files, :sha256
add_index :files, :category
add_index :files, [:is_dir, :parent_id]
# Directories
add_index :directories, :parent_id
add_index :directories, :file_entry_id, unique: true
add_index :directories, :total_size
# Photos
add_index :photos, :file_entry_id, unique: true
add_index :photos, [:latitude, :longitude]
add_index :photos, :date_taken
add_index :photos, :backup_status
# UniqueFiles
add_index :unique_files, :sha256, unique: true
add_index :unique_files, :backup_status
add_index :unique_files, :size
# Projects
add_index :projects, :file_entry_id, unique: true
add_index :projects, :is_git_repo
add_index :projects, :github_url
end
end
Without these indexes, queries like “find all photos with GPS” or “list directories by size” would require full table scans.
Configuration and Environment
Key configuration files:
Database
# config/database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
development:
  <<: *default
  database: avi_archive
production:
  <<: *default
  database: avi_archive
  username: <%= ENV['DATABASE_USER'] %>
  password: <%= ENV['DATABASE_PASSWORD'] %>
Storage
# config/storage.yml
wasabi:
  service: S3
  access_key_id: <%= ENV['WASABI_ACCESS_KEY'] %>
  secret_access_key: <%= ENV['WASABI_SECRET_KEY'] %>
  region: us-east-1
  bucket: <%= ENV['WASABI_BUCKET'] %>
  endpoint: https://s3.wasabisys.com
Queue
# config/queue.yml
production:
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 3
      processes: 2
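In a default Rails 8 app, Solid Queue is the queue adapter for production only; development uses Active Job's in-process async adapter unless you change it. In production, the Solid Queue supervisor runs either inside Puma through the Solid Queue Puma plugin or as a separate process started with bin/jobs.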
Testing Strategy
Focus tests on business logic, not framework behavior:
# test/models/photo_test.rb
class PhotoTest < ActiveSupport::TestCase
test "marks photo as uploaded with key" do
photo = photos(:one)
photo.mark_uploaded!('photos/abc123.jpg')
assert_equal 'uploaded', photo.backup_status
assert_equal 'photos/abc123.jpg', photo.backup_key
assert_not_nil photo.uploaded_at
end
test "identifies photos needing backup" do
pending = photos(:pending)
uploaded = photos(:uploaded)
assert pending.needs_backup?
refute uploaded.needs_backup?
end
end
# test/models/directory_test.rb
class DirectoryTest < ActiveSupport::TestCase
test "calculates depth from path" do
root = directories(:root)
nested = directories(:deeply_nested)
assert_equal 1, root.depth
assert_equal 5, nested.depth
end
test "builds parent hierarchy" do
child = directories(:child)
parent = directories(:parent)
assert_equal parent, child.parent
assert_includes parent.children, child
end
end
Integration tests verify the population workflow:
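# test/integration/populate_test.rb
require "test_helper"
require "rake"
class PopulateTest < ActiveSupport::TestCase
setup do
# Rake tasks are not loaded in the test environment by default
Rails.application.load_tasks if Rake::Task.tasks.empty?
end
test "populate:photos creates photo records" do
assert_difference 'Photo.count' do
Rake::Task['populate:photos'].execute
end
end
test "sync:cleanup removes orphaned records" do
photo = photos(:deleted_file)
# FileEntry is read-only, so destroy would raise; delete the row directly
Scanner::FileEntry.where(id: photo.file_entry_id).delete_all
assert_difference 'Photo.count', -1 do
Rake::Task['sync:cleanup'].execute
end
end
end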
System tests verify the UI with real browser interactions:
# test/system/browse_test.rb
class BrowseTest < ApplicationSystemTestCase
test "navigates directory tree" do
visit root_path
click_on "Browse"
assert_selector "h1", text: "/"
click_on "Users"
assert_selector "h1", text: "/Users"
click_on "avi"
assert_selector "h1", text: "/Users/avi"
end
test "searches for files" do
visit root_path
fill_in "Search", with: "vacation"
assert_selector ".search-result", count: 3
end
end
Run tests with rails test and system tests with rails test:system.
Deployment
Deploy with Kamal (included in Rails 8) for zero-downtime deployments:
# config/deploy.yml
service: archive-browser
image: archive/browser
servers:
  web:
    hosts:
      - 192.168.1.10
    options:
      network: "private"
env:
  DATABASE_URL: postgresql://user:pass@localhost/avi_archive
  WASABI_ACCESS_KEY: <%= ENV['WASABI_ACCESS_KEY'] %>
registry:
  server: registry.example.com
  username: deploy
  password: <%= ENV['REGISTRY_PASSWORD'] %>
accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: 5432
    env:
      POSTGRES_PASSWORD: <%= ENV['POSTGRES_PASSWORD'] %>
    directories:
      - data:/var/lib/postgresql/data
Deploy with:
kamal setup # First time
kamal deploy # Subsequent deploys
For simpler deployments, run the app on a VPS as a systemd service:
# /etc/systemd/system/archive-browser.service
[Unit]
Description=Archive Browser
After=network.target
[Service]
Type=simple
User=deploy
WorkingDirectory=/home/deploy/archive-browser
Environment="RAILS_ENV=production"
Environment="DATABASE_URL=postgresql://..."
ExecStart=/home/deploy/.rbenv/shims/bundle exec puma -C config/puma.rb
Restart=always
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable archive-browser
sudo systemctl start archive-browser
Next Steps
With a functional web application, you can:
- Browse your archive: Navigate directories and view file metadata
- Search effectively: Find files by path and filter the results by category
- Track backups: Monitor which files have been uploaded to cloud storage
- Analyze patterns: Use aggregate statistics to understand your digital footprint
Later guides cover backup strategies, cloud storage integration, and ensuring your archive survives hardware failures.
Adapt This: The concepts here work with any web framework. The key principles are:
- Separate read-only scanner data from application data
- Use specialized models for different file types
- Pre-compute expensive statistics
- Index heavily for query performance
- Build features incrementally
Start with basic directory browsing and add features as needed. This is your archive; build what helps you most.
Next: Deduplication Analysis - Finding and managing duplicate files across your archive.