Last Week
Recording outside as summer comes to an end! Last week was productive - I successfully set up Kermit the Log and used it to diagnose a phantom UI error. The database insert was succeeding, but the frontend was showing failure. Turns out the post-insert fetch operation had a parsing error that was masquerading as an insert failure. Fixed the error handling to report these separately and resolved the parsing issue itself.
Email Infrastructure Overhaul
This week I’m tackling one of the bigger architectural challenges: email infrastructure. The naive implementation from earlier this year - one Supabase Edge Function invocation per email - won’t scale. Edge Functions aren’t truly serverless when it comes to billing; you pay for setup and teardown time on every invocation.
I’m implementing an outbox pattern with cron-based draining and webhook integration for event tracking. This touches the database schema, Edge Functions, and external service integration. Once complete, the MVP is essentially done.
What does it mean in English?
Instead of sending emails immediately when triggered, the app now queues them in an “outbox” table. A background job checks this outbox periodically and sends all pending emails in batches. This dramatically reduces costs, and webhook-based event tracking from the email service adds reliability and visibility.
Nerdy Details
The current email architecture has fundamental scaling problems that need addressing before production deployment. Let me walk through the complete redesign, implementation details, and architectural decisions.
The Problem with Naive Edge Functions
Edge Functions are marketed as “serverless” - you only pay for what you use. But this is misleading. Every Edge Function invocation includes:
- Cold start time (100-500ms for Deno runtime initialization)
- Network handshake (50-200ms for TLS negotiation)
- Function execution (your actual code)
- Cleanup and teardown (50-100ms)
For a simple email send that might take 200ms of actual execution, you’re paying for roughly 200-800ms of overhead on top (call it ~400ms on average). At scale, this becomes expensive:
// Current naive implementation
async function sendNotificationEmail(guestId: string, message: string) {
// This triggers a new Edge Function invocation
const { data, error } = await supabase.functions.invoke('send-email', {
body: {
to: getGuestEmail(guestId),
subject: 'Your table is ready!',
html: message
}
});
// Total time billed: ~600ms
// Actual work time: ~200ms
// Overhead: 66%
}
At 1000 emails per day, you’re paying for 400 seconds of overhead. At 100,000 emails per day, that’s 11 hours of wasted compute time.
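A quick sanity check on those numbers (a throwaway sketch; the ~400ms figure is the average overhead estimated above, not measured billing data):
// Daily compute overhead from one-invocation-per-email, at ~400ms each
const overheadSecondsPerDay = (emailsPerDay: number) => emailsPerDay * 0.4;

console.log(overheadSecondsPerDay(1_000));   // 400 seconds/day
console.log(overheadSecondsPerDay(100_000)); // 40,000 seconds/day ≈ 11.1 hours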
The Outbox Pattern Architecture
The outbox pattern decouples message production from message sending. Here’s the complete implementation:
-- Email outbox table schema
CREATE TABLE email_outbox (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  scheduled_for TIMESTAMPTZ DEFAULT NOW(),
  priority INTEGER DEFAULT 5, -- 1 = highest, 10 = lowest
  attempts INTEGER DEFAULT 0,
  max_attempts INTEGER DEFAULT 3,
  last_attempt_at TIMESTAMPTZ,
  status TEXT DEFAULT 'pending', -- pending, sending, sent, failed, cancelled
  -- Email content
  to_email TEXT NOT NULL,
  cc_emails TEXT[],
  bcc_emails TEXT[],
  subject TEXT NOT NULL,
  html_body TEXT,
  text_body TEXT,
  -- Metadata
  correlation_id UUID, -- Links to original business event
  tenant_id UUID NOT NULL,
  email_type TEXT NOT NULL, -- notification, marketing, transactional
  -- Resend specific
  resend_id TEXT, -- ID returned by Resend API
  resend_status TEXT, -- delivered, bounced, complained, etc.
  -- Error tracking
  error_message TEXT,
  error_details JSONB
);
-- Indexes for performance (Postgres doesn't allow inline INDEX clauses,
-- so these are separate statements)
CREATE INDEX idx_outbox_status_scheduled ON email_outbox (status, scheduled_for);
CREATE INDEX idx_outbox_priority_status ON email_outbox (priority, status, scheduled_for);
CREATE INDEX idx_outbox_correlation ON email_outbox (correlation_id);
CREATE INDEX idx_outbox_resend_id ON email_outbox (resend_id);
-- Separate high-priority outbox for urgent notifications.
-- A separate table allows different cron intervals; note that
-- INHERITS copies columns but not indexes, so recreate those here.
CREATE TABLE email_outbox_priority () INHERITS (email_outbox);
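With this schema in place, the producer side of the pattern is just an insert. A minimal sketch, assuming a `supabase` client plus `tenantId` and `waitlistId` values in scope (the literal field values here are illustrative, not from the real codebase):
// Producer: instead of invoking an Edge Function per email,
// write a row to the outbox and let the cron-driven drainer send it
const { error } = await supabase.from('email_outbox').insert({
  to_email: 'guest@example.com',
  subject: 'Your table is ready!',
  html_body: '<p>Come on in.</p>',
  email_type: 'notification',
  tenant_id: tenantId,        // assumed in scope
  correlation_id: waitlistId  // links back to the business event
});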
The Drainer Implementation
The drainer is a cron-triggered Edge Function that processes the outbox:
// Edge Function: drain-email-outbox.ts
import { createClient, SupabaseClient } from '@supabase/supabase-js';
import { Resend } from 'resend';
const BATCH_SIZE = 50; // Process up to 50 emails per invocation
const LOCK_TIMEOUT_MS = 30000; // 30 second timeout for processing
interface EmailOutboxRow {
id: string;
to_email: string;
cc_emails?: string[];
bcc_emails?: string[];
subject: string;
html_body?: string;
text_body?: string;
attempts: number;
max_attempts: number;
priority: number;
email_type: string;
correlation_id?: string;
tenant_id: string;
}
export default async function drainOutbox(req: Request) {
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_KEY')!
);
const resend = new Resend(Deno.env.get('RESEND_API_KEY')!);
  // Use a Postgres advisory lock to prevent concurrent drainers.
  // Built-ins like pg_try_advisory_lock aren't exposed through PostgREST,
  // so this assumes a thin SQL wrapper function with the same name.
  const lockResult = await supabase.rpc('pg_try_advisory_lock', {
    key: 123456 // Arbitrary lock ID for the email drainer
  });
  if (!lockResult.data) {
    return new Response(
      JSON.stringify({ message: 'Another drainer is running' }),
      { status: 200 }
    );
  }
  try {
    // Fetch-and-claim a batch of pending emails. supabase-js can't express
    // column-to-column filters (attempts < max_attempts) or row locking,
    // so this assumes a small SQL function wrapping:
    //   SELECT * FROM email_outbox
    //   WHERE status IN ('pending', 'failed')
    //     AND scheduled_for <= NOW()
    //     AND attempts < max_attempts
    //   ORDER BY priority, scheduled_for
    //   LIMIT batch_size
    //   FOR UPDATE SKIP LOCKED; -- skip rows another drainer already holds
    const { data: emails, error: fetchError } = await supabase
      .rpc('claim_pending_emails', { batch_size: BATCH_SIZE });
if (fetchError) throw fetchError;
if (!emails || emails.length === 0) {
return new Response(
JSON.stringify({ message: 'No emails to send' }),
{ status: 200 }
);
}
// Process emails in parallel with controlled concurrency
const results = await processEmailBatch(emails, resend, supabase);
return new Response(
JSON.stringify({
processed: results.length,
successful: results.filter(r => r.success).length,
failed: results.filter(r => !r.success).length
}),
{ status: 200 }
);
  } finally {
    // Always release the advisory lock (same wrapper-function assumption)
    await supabase.rpc('pg_advisory_unlock', { key: 123456 });
  }
}
async function processEmailBatch(
emails: EmailOutboxRow[],
resend: Resend,
supabase: SupabaseClient
) {
// Process in chunks of 10 for rate limiting
const chunks = chunkArray(emails, 10);
const results = [];
for (const chunk of chunks) {
const chunkResults = await Promise.allSettled(
chunk.map(email => sendEmail(email, resend, supabase))
);
results.push(...chunkResults);
// Rate limiting pause between chunks
await new Promise(resolve => setTimeout(resolve, 100));
}
  return results.map((result, index) => ({
    emailId: emails[index].id,
    success: result.status === 'fulfilled',
    error: result.status === 'rejected' ? result.reason : null
  }));
}

// Helper used above: split an array into fixed-size chunks
function chunkArray<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
async function sendEmail(
email: EmailOutboxRow,
resend: Resend,
supabase: SupabaseClient
) {
try {
// Update status to sending
await supabase
.from('email_outbox')
.update({
status: 'sending',
last_attempt_at: new Date().toISOString(),
attempts: email.attempts + 1
})
.eq('id', email.id);
    // Send via Resend. getFromAddress is a helper (not shown) assumed to
    // map email type and tenant to a verified sender address
    const { data, error } = await resend.emails.send({
      from: getFromAddress(email.email_type, email.tenant_id),
to: email.to_email,
cc: email.cc_emails,
bcc: email.bcc_emails,
subject: email.subject,
html: email.html_body,
text: email.text_body,
tags: [
{ name: 'tenant_id', value: email.tenant_id },
{ name: 'email_type', value: email.email_type },
{ name: 'correlation_id', value: email.correlation_id || 'none' }
]
});
if (error) throw error;
// Update with success
await supabase
.from('email_outbox')
.update({
status: 'sent',
resend_id: data.id,
error_message: null,
error_details: null
})
.eq('id', email.id);
return { success: true, resendId: data.id };
} catch (error) {
// Update with failure
const isRetryable = isRetryableError(error);
const newStatus =
email.attempts + 1 >= email.max_attempts ? 'failed' :
isRetryable ? 'pending' : 'failed';
await supabase
.from('email_outbox')
.update({
status: newStatus,
error_message: error.message,
error_details: {
          error: { name: error.name, message: error.message }, // Error objects don't JSON.stringify cleanly
retryable: isRetryable,
timestamp: new Date().toISOString()
},
// Exponential backoff for retries
scheduled_for: isRetryable ?
new Date(Date.now() + Math.pow(2, email.attempts) * 60000).toISOString() :
null
})
.eq('id', email.id);
throw error;
}
}
function isRetryableError(error: any): boolean {
// Network errors, rate limits, and temporary failures
const retryableStatuses = [429, 500, 502, 503, 504];
const retryableMessages = [
'ECONNREFUSED',
'ETIMEDOUT',
'ENOTFOUND',
'rate limit',
'quota exceeded'
];
if (error.status && retryableStatuses.includes(error.status)) {
return true;
}
const message = error.message?.toLowerCase() || '';
return retryableMessages.some(msg => message.includes(msg.toLowerCase()));
}
Webhook Integration for Event Tracking
The second major improvement is implementing webhook handling for Resend events:
// Edge Function: resend-webhook.ts
import { createClient, SupabaseClient } from '@supabase/supabase-js';
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';
interface ResendWebhookEvent {
type: 'email.sent' | 'email.delivered' | 'email.delivery_delayed' |
'email.bounced' | 'email.complained' | 'email.opened' | 'email.clicked';
created_at: string;
data: {
email_id: string;
from: string;
to: string[];
subject: string;
// Event-specific fields
bounce_type?: string;
complaint_type?: string;
click?: { link: string };
};
}
export default async function handleResendWebhook(req: Request) {
  // Verify the webhook signature before trusting the payload
  const signature = req.headers.get('resend-signature');
  const timestamp = req.headers.get('resend-timestamp');
  const body = await req.text();
  if (!signature || !timestamp || !verifyWebhookSignature(body, signature, timestamp)) {
    return new Response('Invalid signature', { status: 401 });
  }
const event: ResendWebhookEvent = JSON.parse(body);
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_KEY')!
);
// Record the event
const { error: insertError } = await supabase
.from('email_events')
.insert({
resend_id: event.data.email_id,
event_type: event.type,
event_timestamp: event.created_at,
event_data: event.data,
processed: false
});
if (insertError) {
console.error('Failed to record event:', insertError);
return new Response('Failed to process', { status: 500 });
}
// Process the event based on type
switch (event.type) {
case 'email.delivered':
await handleDelivered(event, supabase);
break;
case 'email.bounced':
await handleBounced(event, supabase);
break;
case 'email.complained':
await handleComplaint(event, supabase);
break;
case 'email.opened':
await trackEngagement(event, supabase, 'opened');
break;
case 'email.clicked':
await trackEngagement(event, supabase, 'clicked');
break;
}
// Mark event as processed
await supabase
.from('email_events')
.update({ processed: true })
.eq('resend_id', event.data.email_id)
.eq('event_type', event.type)
.eq('event_timestamp', event.created_at);
return new Response('OK', { status: 200 });
}
function verifyWebhookSignature(
  body: string,
  signature: string,
  timestamp: string
): boolean {
  const secret = Deno.env.get('RESEND_WEBHOOK_SECRET')!;
  const expectedSignature = createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex');
  // The header can carry several space-separated signatures (e.g. during
  // secret rotation); accept if any v1 entry matches
  return signature.split(' ').some(sig => {
    const [version, hash] = sig.split('=');
    return version === 'v1' && constantTimeCompare(hash, expectedSignature);
  });
}

// Constant-time comparison to prevent timing attacks
function constantTimeCompare(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}
async function handleBounced(
event: ResendWebhookEvent,
supabase: SupabaseClient
) {
// Update outbox status
await supabase
.from('email_outbox')
.update({
resend_status: 'bounced',
error_message: `Email bounced: ${event.data.bounce_type}`,
error_details: {
bounce_type: event.data.bounce_type,
timestamp: event.created_at
}
})
.eq('resend_id', event.data.email_id);
// Update guest email validity if this is a hard bounce
if (event.data.bounce_type === 'permanent') {
const { data: email } = await supabase
.from('email_outbox')
.select('correlation_id, to_email')
.eq('resend_id', event.data.email_id)
.single();
if (email?.correlation_id) {
// Mark guest email as invalid
await supabase
.from('guests')
.update({
email_valid: false,
email_bounce_type: event.data.bounce_type
})
.eq('email', email.to_email);
}
}
}
async function handleComplaint(
event: ResendWebhookEvent,
supabase: SupabaseClient
) {
  // Complaints are spam reports - immediately suppress this address.
  // supabase-js expresses "insert or update on conflict" via upsert
  await supabase
    .from('email_suppressions')
    .upsert({
      email: event.data.to[0],
      reason: 'complaint',
      complaint_type: event.data.complaint_type,
      created_at: event.created_at
    }, { onConflict: 'email' });
// Update outbox
await supabase
.from('email_outbox')
.update({
resend_status: 'complained',
error_message: 'Recipient marked as spam'
})
.eq('resend_id', event.data.email_id);
}
Database Schema for Event Tracking
-- Email events table for webhook data
CREATE TABLE email_events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  resend_id TEXT NOT NULL,
  event_type TEXT NOT NULL,
  event_timestamp TIMESTAMPTZ NOT NULL,
  event_data JSONB NOT NULL,
  processed BOOLEAN DEFAULT FALSE
);
CREATE INDEX idx_events_resend_id ON email_events (resend_id);
CREATE INDEX idx_events_processed ON email_events (processed, created_at);
CREATE INDEX idx_events_type ON email_events (event_type, event_timestamp);
-- Email suppressions for managing unsubscribes and complaints
CREATE TABLE email_suppressions (
  email TEXT PRIMARY KEY,
  reason TEXT NOT NULL, -- bounce, complaint, unsubscribe
  bounce_type TEXT,
  complaint_type TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  expires_at TIMESTAMPTZ -- For temporary suppressions; NULL means permanent
);
CREATE INDEX idx_suppressions_reason ON email_suppressions (reason);
CREATE INDEX idx_suppressions_expires ON email_suppressions (expires_at);
-- Analytics aggregation table
CREATE TABLE email_analytics (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
date DATE NOT NULL,
tenant_id UUID NOT NULL,
email_type TEXT NOT NULL,
sent_count INTEGER DEFAULT 0,
delivered_count INTEGER DEFAULT 0,
opened_count INTEGER DEFAULT 0,
clicked_count INTEGER DEFAULT 0,
bounced_count INTEGER DEFAULT 0,
complained_count INTEGER DEFAULT 0,
UNIQUE(date, tenant_id, email_type)
);
Cron Configuration
The cron jobs are configured in Supabase:
-- Standard outbox drainer (every 60 seconds)
SELECT cron.schedule(
'drain-email-outbox',
'* * * * *', -- Every minute
$$
SELECT net.http_post(
url := 'https://[project-ref].functions.supabase.co/drain-email-outbox',
headers := jsonb_build_object(
'Authorization', 'Bearer ' || current_setting('app.service_key')
)
);
$$
);
-- Priority outbox drainer (every 10 seconds)
SELECT cron.schedule(
  'drain-priority-outbox',
  '10 seconds', -- pg_cron interval syntax; sub-minute schedules require pg_cron 1.5+
$$
SELECT net.http_post(
url := 'https://[project-ref].functions.supabase.co/drain-priority-outbox',
headers := jsonb_build_object(
'Authorization', 'Bearer ' || current_setting('app.service_key')
)
);
$$
);
-- Cleanup old events (daily at 2 AM)
SELECT cron.schedule(
'cleanup-email-events',
'0 2 * * *',
$$
DELETE FROM email_events
WHERE created_at < NOW() - INTERVAL '30 days'
AND processed = true;
$$
);
-- Analytics aggregation (hourly)
SELECT cron.schedule(
'aggregate-email-analytics',
'0 * * * *',
$$
INSERT INTO email_analytics (date, tenant_id, email_type, sent_count, delivered_count, ...)
SELECT
DATE(event_timestamp),
o.tenant_id,
o.email_type,
COUNT(*) FILTER (WHERE e.event_type = 'email.sent'),
COUNT(*) FILTER (WHERE e.event_type = 'email.delivered'),
...
FROM email_events e
JOIN email_outbox o ON e.resend_id = o.resend_id
WHERE e.processed = true
AND e.event_timestamp >= NOW() - INTERVAL '1 hour'
GROUP BY DATE(event_timestamp), o.tenant_id, o.email_type
ON CONFLICT (date, tenant_id, email_type)
DO UPDATE SET
sent_count = email_analytics.sent_count + EXCLUDED.sent_count,
...;
$$
);
Application Layer Integration
The application code now queues emails instead of sending directly:
// EmailService.kt
class EmailService(
private val supabaseClient: SupabaseClient,
private val logger: Logger
) {
suspend fun queueNotificationEmail(
guestId: String,
waitlistId: String,
message: String,
priority: EmailPriority = EmailPriority.NORMAL
): Result<String> {
return try {
val guest = supabaseClient.from("guests")
.select()
.eq("id", guestId)
.single<Guest>()
// Check suppression list
if (isEmailSuppressed(guest.email)) {
logger.warn { "Email suppressed for ${guest.email}" }
return Result.failure(EmailSuppressedException(guest.email))
}
val emailId = UUID.randomUUID().toString()
val outboxTable = when (priority) {
EmailPriority.URGENT -> "email_outbox_priority"
else -> "email_outbox"
}
supabaseClient.from(outboxTable).insert(
mapOf(
"id" to emailId,
"to_email" to guest.email,
"subject" to "Your table is ready!",
"html_body" to renderEmailTemplate(message, guest),
"text_body" to stripHtml(message),
"priority" to priority.value,
"email_type" to "notification",
"correlation_id" to waitlistId,
"tenant_id" to guest.tenantId,
"scheduled_for" to if (priority == EmailPriority.SCHEDULED) {
// Allow scheduling future emails
calculateScheduledTime(waitlistId)
} else {
Instant.now().toString()
}
)
)
logger.info { "Email queued: $emailId for ${guest.email}" }
Result.success(emailId)
} catch (e: Exception) {
logger.error(e) { "Failed to queue email" }
Result.failure(e)
}
}
  private suspend fun isEmailSuppressed(email: String): Boolean {
    // A NULL expires_at means a permanent suppression, so match entries
    // that either never expire or haven't expired yet
    val suppression = supabaseClient.from("email_suppressions")
      .select()
      .eq("email", email)
      .or("expires_at.is.null,expires_at.gte.${Instant.now()}")
      .singleOrNull<EmailSuppression>()
    return suppression != null
  }
suspend fun getEmailStatus(emailId: String): EmailStatus? {
return supabaseClient.from("email_outbox")
.select("status, resend_status, attempts, error_message")
.eq("id", emailId)
.singleOrNull<EmailStatus>()
}
suspend fun cancelScheduledEmail(emailId: String): Result<Unit> {
return try {
supabaseClient.from("email_outbox")
.update(mapOf("status" to "cancelled"))
.eq("id", emailId)
.eq("status", "pending") // Only cancel if not yet sent
Result.success(Unit)
} catch (e: Exception) {
Result.failure(e)
}
}
}
enum class EmailPriority(val value: Int) {
URGENT(1), // Guest notifications
HIGH(3), // Account verifications
NORMAL(5), // General updates
LOW(7), // Marketing
SCHEDULED(5) // Future scheduled emails
}
Monitoring and Observability
The system includes comprehensive monitoring:
-- Email health check view
CREATE VIEW email_health AS
SELECT
COUNT(*) FILTER (WHERE status = 'pending') as pending_count,
COUNT(*) FILTER (WHERE status = 'sending') as sending_count,
COUNT(*) FILTER (WHERE status = 'failed' AND attempts >= max_attempts) as failed_count,
MIN(scheduled_for) FILTER (WHERE status = 'pending') as oldest_pending,
MAX(last_attempt_at) as last_activity,
AVG(attempts) FILTER (WHERE status = 'sent') as avg_attempts_to_send
FROM email_outbox
WHERE created_at > NOW() - INTERVAL '24 hours';
-- Alert when outbox is backing up
CREATE OR REPLACE FUNCTION check_email_backlog()
RETURNS void AS $$
DECLARE
pending_count INTEGER;
oldest_pending TIMESTAMPTZ;
BEGIN
SELECT
COUNT(*),
MIN(scheduled_for)
INTO pending_count, oldest_pending
FROM email_outbox
WHERE status = 'pending'
AND scheduled_for < NOW();
IF pending_count > 100 OR
(oldest_pending IS NOT NULL AND oldest_pending < NOW() - INTERVAL '10 minutes') THEN
-- Trigger alert via webhook
PERFORM net.http_post(
url := current_setting('app.alert_webhook_url'),
body := jsonb_build_object(
'type', 'email_backlog',
'pending_count', pending_count,
'oldest_pending', oldest_pending
)
);
END IF;
END;
$$ LANGUAGE plpgsql;
-- Run health check every 5 minutes
SELECT cron.schedule(
'check-email-health',
'*/5 * * * *',
'SELECT check_email_backlog();'
);
Cost Analysis
The new architecture dramatically reduces costs:
Before (Naive Implementation):
- 1000 emails/day = 1000 Edge Function invocations
- Average billed time per invocation: 600ms
- Total daily compute time: 600 seconds
- Monthly cost (at $0.50 per million ms): ~$9
After (Outbox Pattern):
- 1000 emails/day with batch size 50 = 20 invocations
- Average billed time per invocation: 3000ms (processing 50 emails)
- Total daily compute time: 60 seconds
- Monthly cost: ~$0.90
- Savings: 90%
At scale (100K emails/day), the savings become massive:
- Before: ~$900/month
- After: ~$90/month
- Savings: $810/month or $9,720/year
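The arithmetic above is easy to sanity-check in code. A rough model (the $0.50-per-million-ms rate and per-invocation times are the estimates from this section, not published pricing):
// Rough monthly cost model for Edge Function email sending
const DOLLARS_PER_MS = 0.50 / 1_000_000; // assumed billing rate

function monthlyCost(emailsPerDay: number, batchSize: number, msPerInvocation: number): number {
  const invocationsPerDay = Math.ceil(emailsPerDay / batchSize);
  return invocationsPerDay * msPerInvocation * 30 * DOLLARS_PER_MS;
}

console.log(monthlyCost(1_000, 1, 600));    // naive: ~$9.00/month
console.log(monthlyCost(1_000, 50, 3_000)); // outbox: ~$0.90/month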
Error Recovery and Resilience
The system handles various failure scenarios:
- Network failures: Automatic retry with exponential backoff (see the sketch after this list)
- Rate limiting: Respects Resend’s rate limits, queues for later
- Invalid emails: Marked and suppressed after hard bounces
- Service outages: Emails remain queued until service recovers
- Partial batch failures: Only failed emails retry, successful ones marked complete
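The backoff in the first bullet is the same formula the drainer uses when rescheduling a retryable failure: the nth attempt is pushed out by 2^n minutes. A minimal sketch of that calculation:
// Exponential backoff matching the drainer: delay = 2^attempts minutes
function nextRetryAt(attempts: number, now: Date = new Date()): Date {
  return new Date(now.getTime() + Math.pow(2, attempts) * 60_000);
}

// attempts = 0 -> +1 min, 1 -> +2 min, 2 -> +4 min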
Testing Strategy
Comprehensive testing ensures reliability:
// EmailServiceTest.kt
class EmailServiceTest {
@Test
fun `should handle concurrent drainer invocations`() = runTest {
// Insert 100 test emails
val emails = (1..100).map { createTestEmail() }
// Simulate concurrent drainers
    val results = coroutineScope {
      (1..5).map {
        async { drainOutbox() }
      }.awaitAll()
    }
// Verify no email was processed twice
val processedIds = results.flatMap { it.processedEmails }
assertEquals(processedIds.size, processedIds.distinct().size)
}
@Test
fun `should respect priority ordering`() = runTest {
// Insert emails with different priorities
val urgent = createTestEmail(priority = 1)
val normal = createTestEmail(priority = 5)
val low = createTestEmail(priority = 10)
insertEmails(low, normal, urgent)
val result = drainOutbox(batchSize = 2)
// Verify processing order
assertEquals(listOf(urgent.id, normal.id), result.processedEmails)
}
@Test
fun `should handle webhook signature verification`() = runTest {
val webhook = ResendWebhook(secret = "test_secret")
// Valid signature
val validBody = """{"type":"email.delivered","data":{"email_id":"123"}}"""
val validSig = webhook.generateSignature(validBody)
assertTrue(webhook.verify(validBody, validSig))
// Invalid signature
assertFalse(webhook.verify(validBody, "invalid_signature"))
// Tampered body
val tamperedBody = """{"type":"email.bounced","data":{"email_id":"123"}}"""
assertFalse(webhook.verify(tamperedBody, validSig))
}
}
Performance Optimizations
Several optimizations ensure smooth operation at scale:
- Database indexes: Carefully chosen for query patterns
- Row-level locking: Prevents duplicate processing
- Advisory locks: Prevents concurrent drainer execution
- Batch processing: Reduces round trips to Resend API
- Connection pooling: Reuses database connections
- Lazy loading: Only loads the email body when needed (sketched below)
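The lazy-loading point is the only one not visible in the code above (the drainer currently selects `*`). A sketch of the idea, assuming a `supabase` client in scope:
// Claim lightweight metadata first; html_body/text_body can be large
const { data: pending } = await supabase
  .from('email_outbox')
  .select('id, to_email, subject, priority, attempts, max_attempts')
  .eq('status', 'pending')
  .limit(50);

// Fetch the heavy body columns only when a row is actually being sent
for (const row of pending ?? []) {
  const { data: body } = await supabase
    .from('email_outbox')
    .select('html_body, text_body')
    .eq('id', row.id)
    .single();
  // ...send using body.html_body / body.text_body
}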
Future Enhancements
The architecture supports future improvements:
- Template management: Store templates separately, reference by ID
- A/B testing: Route emails through different templates/content
- Analytics dashboard: Real-time email performance metrics
- Multi-channel: Extend to SMS, push notifications
- Smart scheduling: ML-based optimal send time prediction
- Fallback providers: Switch to SendGrid/SES if Resend fails
Next Week
With the email infrastructure overhauled, the MVP will be nearly complete. The outbox pattern implementation might bleed into next week given its complexity, but after that the focus shifts to final testing, performance optimization, and preparing for the beta launch. Once it’s done, we’ll have a production-ready email system that can scale from 10 to 100,000 emails per day without breaking a sweat.