Building a Collaborative DAW in the Browser
When you think about building a Digital Audio Workstation (DAW), the browser probably isn’t the first platform that comes to mind. But what if you could build a fully featured, collaborative music production tool that runs entirely in the browser, with real-time synchronization, multi-track editing, effects chains, and sample-accurate audio playback?
The Challenge: Real-Time Audio Collaboration
Building a DAW is complex enough on its own. Add real-time collaboration, and you’ve got a whole new set of challenges:
- Timeline synchronization: When one user drags a clip, everyone needs to see it move instantly
- Audio sample management: Large audio files need efficient storage and streaming
- Sample-accurate playback: Audio scheduling needs millisecond precision
- Ownership and permissions: Users should only delete their own clips
- Live mix state: Volume, mute, solo—should these sync across users or stay local?
Let me walk you through how I solved these problems using Convex for real-time data sync and the Web Audio API for precise audio scheduling.
Convex: Schema-Driven Real-Time Database
The heart of the collaborative experience is Convex, a real-time backend that automatically syncs data across clients. What makes Convex special is its schema-first approach with automatic TypeScript generation, which speeds up development and gives LLM coding assistants a precise, typed surface to work against.
Here’s the core schema that powers the timeline:
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
tracks: defineTable({
roomId: v.string(),
index: v.number(),
volume: v.number(),
muted: v.optional(v.boolean()),
soloed: v.optional(v.boolean()),
})
.index("by_room", ["roomId"])
.index("by_room_index", ["roomId", "index"]),
clips: defineTable({
roomId: v.string(),
trackId: v.id("tracks"),
startSec: v.number(),
duration: v.number(),
leftPadSec: v.optional(v.number()),
bufferOffsetSec: v.optional(v.number()),
name: v.optional(v.string()),
sampleUrl: v.optional(v.string()),
})
.index("by_room", ["roomId"])
.index("by_track", ["trackId"]),
ownerships: defineTable({
roomId: v.string(),
ownerUserId: v.string(),
clipId: v.optional(v.id("clips")),
trackId: v.optional(v.id("tracks")),
})
.index("by_clip", ["clipId"])
.index("by_track", ["trackId"])
.index("by_owner", ["ownerUserId"]),
});
Notice the intentional minimalism here. I deliberately avoid storing track names or complex metadata in Convex and instead focus on what actually needs to be synchronized: positioning, timing, and ownership. Everything else stays client-side.
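Reading this data back is just as minimal. For example, the client can subscribe to every clip in a room through a single indexed query; a sketch of what that looks like (listClips is an illustrative name, not necessarily what the project uses):
import { v } from "convex/values";
import { query } from "./_generated/server";

// All clips in a room, kept live on every connected client by Convex's reactive queries.
export const listClips = query({
  args: { roomId: v.string() },
  handler: async (ctx, { roomId }) => {
    return await ctx.db
      .query("clips")
      .withIndex("by_room", (q) => q.eq("roomId", roomId))
      .collect();
  },
});
When any collaborator moves a clip, every client subscribed to this query re-renders with the new positions automatically.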
Room-Based Multi-Tenancy
The ownerships table is key to authorization. When a user creates a clip or track, an ownership record is created alongside it, which lets Convex mutations enforce owner-only deletions:
import { v } from "convex/values";
import { mutation } from "./_generated/server";

// Example mutation: only owners can delete their clips
export const deleteClip = mutation({
args: { clipId: v.id("clips") },
handler: async (ctx, { clipId }) => {
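    // getUserId is an app-specific auth helper (e.g. a thin wrapper around ctx.auth.getUserIdentity())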
const userId = await getUserId(ctx);
const ownership = await ctx.db
.query("ownerships")
.withIndex("by_clip", (q) => q.eq("clipId", clipId))
.first();
if (!ownership || ownership.ownerUserId !== userId) {
throw new Error("Unauthorized: only the owner can delete this clip");
}
await ctx.db.delete(clipId);
await ctx.db.delete(ownership._id);
},
});
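On the client, calling this from React is a one-liner with Convex’s generated API. A rough sketch, assuming the mutation lives in convex/clips.ts and the standard codegen import paths (the component itself is illustrative):
import { useMutation } from "convex/react";
import { api } from "../convex/_generated/api";
import type { Id } from "../convex/_generated/dataModel";

// Illustrative UI: delete a clip; the server-side ownership check does the enforcement.
function DeleteClipButton({ clipId }: { clipId: Id<"clips"> }) {
  const deleteClip = useMutation(api.clips.deleteClip);
  return (
    <button onClick={() => deleteClip({ clipId }).catch(console.error)}>
      Delete
    </button>
  );
}
The important part is that authorization lives in the mutation, not the UI: even a modified client can’t delete someone else’s clip.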
Effects Chains with Convex
EQ and reverb settings are also stored in Convex, enabling synchronized effect chains across collaborators:
effects: defineTable({
roomId: v.string(),
targetType: v.string(), // 'track' | 'master'
trackId: v.optional(v.id("tracks")),
index: v.number(), // chain order
type: v.string(), // 'eq' | 'reverb'
params: v.any(), // flexible for different effect types
})
.index("by_track", ["trackId"])
.index("by_track_order", ["trackId", "index"])
This flexible schema allows different effect types while maintaining ordered chains per track or master bus.
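Reading a track’s chain back in order is then a single indexed query, since Convex returns documents in index order. A sketch (the query name is mine, not from the project):
import { v } from "convex/values";
import { query } from "./_generated/server";

// Effects for one track, returned in chain order thanks to the [trackId, index] index.
export const effectsForTrack = query({
  args: { trackId: v.id("tracks") },
  handler: async (ctx, { trackId }) => {
    return await ctx.db
      .query("effects")
      .withIndex("by_track_order", (q) => q.eq("trackId", trackId))
      .collect();
  },
});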
Web Audio API: Sample-Accurate Scheduling
The second pillar is a custom audio engine built on the Web Audio API. The challenge with browser DAWs is achieving sample-accurate playback while maintaining the flexibility of a timeline editor.
The Transport System
At the core is a transport system that maps timeline seconds to audio context time:
export class AudioEngine {
private transportEpochCtxTime = 0;
private transportEpochTimelineSec = 0;
private transportRunning = false;
private timelineToCtxTime(timelineSec: number) {
if (!this.audioCtx) return 0;
const delta = timelineSec - this.transportEpochTimelineSec;
return this.transportEpochCtxTime + Math.max(0, delta);
}
onTransportStart(playheadSec: number) {
if (!this.audioCtx) return;
this.transportEpochCtxTime = this.audioCtx.currentTime;
this.transportEpochTimelineSec = Math.max(0, playheadSec);
this.transportRunning = true;
this.scheduleAllClipsFromPlayhead();
}
}
When you hit play, the engine captures the current audioContext.currentTime as an epoch and maps all timeline positions relative to it. This enables precise scheduling regardless of when playback starts.
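The UI needs the inverse mapping too: given the running audio clock, where is the playhead right now? A minimal sketch of that method inside AudioEngine, assuming the same epoch fields (this method isn’t in the excerpt above):
// Map the audio clock back to a timeline position, e.g. to drive the
// playhead cursor from requestAnimationFrame while the transport runs.
getCurrentPlayheadSec(): number {
  if (!this.audioCtx || !this.transportRunning) {
    return this.transportEpochTimelineSec;
  }
  const elapsed = this.audioCtx.currentTime - this.transportEpochCtxTime;
  return this.transportEpochTimelineSec + Math.max(0, elapsed);
}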
Clip Scheduling with Offsets
Each clip can have trimming and padding, so scheduling needs to account for buffer offsets:
scheduleClip(
clip: Clip,
buffer: AudioBuffer,
playheadSec: number
) {
if (!this.audioCtx) return;
const startSec = clip.startSec + (clip.leftPadSec || 0);
if (playheadSec > startSec + clip.duration) return; // already past
const bufferOffset = clip.bufferOffsetSec || 0;
const playOffset = Math.max(0, playheadSec - startSec);
const bufferStart = bufferOffset + playOffset;
const remainingDuration = clip.duration - playOffset;
const scheduleTime = this.timelineToCtxTime(
Math.max(startSec, playheadSec)
);
const source = this.audioCtx.createBufferSource();
source.buffer = buffer;
source.connect(this.getTrackChain(clip.trackId));
source.start(scheduleTime, bufferStart, remainingDuration);
this.activeSources.push(source);
}
This handles:
- Left padding: Silent space before audio starts in the clip window
- Buffer offset: Trimming from the left (start playing X seconds into the sample)
- Playhead offset: Resume playback mid-clip if user seeks
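The flip side is stopping. Because every scheduled AudioBufferSourceNode is pushed into activeSources, pausing is a matter of stopping and dropping them. Roughly, as a sketch rather than the project’s exact method:
onTransportStop() {
  this.transportRunning = false;
  for (const source of this.activeSources) {
    // Silences anything currently playing and cancels future-scheduled sources.
    try { source.stop(); } catch {
      // Ignore sources that were never started or have already been stopped.
    }
  }
  this.activeSources = [];
}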
Effect Chains: Per-Track EQ and Reverb
Each track has its own effects chain built with Web Audio nodes:
private rebuildTrackRouting(trackId: string) {
const input = this.trackInputs.get(trackId);
const gain = this.trackGains.get(trackId);
if (!input || !gain) return;
// Disconnect current routing
try { input.disconnect(); } catch {}
const eqChain = this.eqChains.get(trackId) || [];
// Build the chain: input -> [EQ nodes] -> gain -> master
let currentNode: AudioNode = input;
for (const eqNode of eqChain) {
currentNode.connect(eqNode);
currentNode = eqNode;
}
const reverb = this.trackReverbs.get(trackId);
if (reverb?.enabled) {
// Parallel wet/dry routing
currentNode.connect(reverb.dryGain);
currentNode.connect(reverb.preDelay);
reverb.preDelay.connect(reverb.convolver);
reverb.convolver.connect(reverb.wetGain);
reverb.dryGain.connect(gain);
reverb.wetGain.connect(gain);
} else {
currentNode.connect(gain);
}
gain.connect(this.masterGain);
}
This routing system:
- Connects EQ nodes in series (each BiquadFilter applies frequency shaping)
- Implements parallel dry/wet routing for reverb
- Maintains clean separation between track input, effects, and output gain
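For reference, the reverb bundle referenced above can be built entirely from stock Web Audio nodes. Here is a sketch of what an entry in trackReverbs might hold; the field names mirror the routing code, while the gain values and pre-delay time are assumptions:
// Sketch: build the wet/dry reverb node bundle used by rebuildTrackRouting.
private createReverbBundle(impulseResponse: AudioBuffer) {
  const ctx = this.audioCtx!;
  const preDelay = ctx.createDelay(1.0);   // allow up to 1s of pre-delay
  preDelay.delayTime.value = 0.02;         // 20ms before the reverb tail starts
  const convolver = ctx.createConvolver();
  convolver.buffer = impulseResponse;      // room impulse response
  const dryGain = ctx.createGain();
  dryGain.gain.value = 0.7;
  const wetGain = ctx.createGain();
  wetGain.gain.value = 0.3;
  return { enabled: true, preDelay, convolver, dryGain, wetGain };
}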
Metronome with Lookahead Scheduling
The metronome demonstrates precise timing with a lookahead scheduler:
private scheduleMetronomeTicks() {
if (!this.audioCtx || !this.metronomeBuffer) return;
if (!this.transportRunning || !this.metronomeEnabled) return;
const nowCtx = this.audioCtx.currentTime;
const scheduleUntil = nowCtx + this.metronomeLookaheadSec; // 250ms ahead
const secondsPerBeat = 60 / this.bpm;
let nextBeatTimeline = this.nextMetronomeBeatTimelineSec;
while (true) { // exits via the break below once the lookahead window is filled
const eventTime = this.timelineToCtxTime(nextBeatTimeline);
if (eventTime > scheduleUntil) break;
if (eventTime >= nowCtx - 0.02) { // small tolerance
const source = this.audioCtx.createBufferSource();
source.buffer = this.metronomeBuffer;
source.connect(this.metronomeGain);
source.start(eventTime);
this.metronomeSources.push(source);
}
nextBeatTimeline += secondsPerBeat;
}
this.nextMetronomeBeatTimelineSec = nextBeatTimeline;
}
This runs every 50ms, scheduling clicks 250ms in advance. The Web Audio API’s precise scheduling ensures sample-accurate timing even if JavaScript execution is delayed.
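The driver for this can be deliberately coarse, since the 250ms lookahead absorbs timer jitter. Something like the following is enough; the timer field and method names here are my own, not from the project:
private metronomeTimer?: number; // assumed field, not shown in the excerpt above

// Coarse JS timer: wake up every 50ms and top up the sample-accurate Web Audio schedule.
private startMetronomeScheduler() {
  this.stopMetronomeScheduler();
  this.metronomeTimer = window.setInterval(() => this.scheduleMetronomeTicks(), 50);
}

private stopMetronomeScheduler() {
  if (this.metronomeTimer !== undefined) {
    window.clearInterval(this.metronomeTimer);
    this.metronomeTimer = undefined;
  }
}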
MediaBunny: Browser-Native Audio Recording
One of the most interesting features of the DAW is the ability to record audio directly from your microphone and add it to the timeline. For this, I use MediaBunny—a pure TypeScript media toolkit that handles audio recording, encoding, and metadata extraction entirely in the browser.
MediaBunny is perfect for browser-based DAWs because it’s:
- Dependency-free: No need for server-side processing or FFmpeg binaries
- Format-agnostic: Works with WebM, MP4, WAV, and more
- TypeScript-native: Excellent type safety and IDE support
- Lightweight: Only ~5KB when tree-shaken
Recording Pipeline with MediaBunny
Here’s how the recording workflow integrates MediaBunny with the browser’s MediaStream API:
import {
Output,
BufferTarget,
WebMOutputFormat,
MediaStreamAudioTrackSource,
QUALITY_MEDIUM,
} from 'mediabunny';
const startRecording = async () => {
// Request microphone access
const mediaStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
sampleRate: 44100,
},
});
const audioTrack = mediaStream.getAudioTracks()[0];
// Create MediaBunny output with WebM container
const output = new Output({
format: new WebMOutputFormat(),
target: new BufferTarget(), // Record to in-memory buffer
});
// Create audio source from the live MediaStream track
const audioSource = new MediaStreamAudioTrackSource(audioTrack, {
codec: 'opus',
bitrate: QUALITY_MEDIUM,
});
output.addAudioTrack(audioSource);
await output.start(); // Begin recording
};
This creates a live audio pipeline that continuously encodes microphone input to Opus-compressed WebM. The BufferTarget accumulates the encoded data in memory, ready for export when recording stops.
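One detail the excerpt glosses over: mediaStream and output are locals inside startRecording, but stopRecording below needs to reach them. In practice they have to live in some shared state; a minimal sketch, with activeRecording being my own naming:
import type { Output } from 'mediabunny';

// Hypothetical shared state so stopRecording can reach the live recorder objects.
let activeRecording: { mediaStream: MediaStream; output: Output } | null = null;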
Stopping and Analyzing Recordings
When the user stops recording, MediaBunny finalizes the output and provides the encoded buffer:
const stopRecording = async () => {
// Stop all media tracks
mediaStream.getTracks().forEach(track => track.stop());
// Finalize MediaBunny output
await output.finalize();
const buffer = (output.target as BufferTarget).buffer;
// Create a blob from the encoded data
const blob = new Blob([buffer], { type: 'audio/webm' });
// Analyze the recording to extract metadata
const audioFile = await analyzeAudioFile(blob, `recording-${Date.now()}.webm`);
};
Metadata Extraction
MediaBunny’s Input class can parse audio files and extract detailed metadata—crucial for properly scheduling clips on the timeline:
import { Input, ALL_FORMATS, BlobSource } from 'mediabunny';
const analyzeAudioFile = async (blob: Blob, fileName: string) => {
const input = new Input({
formats: ALL_FORMATS,
source: new BlobSource(blob),
});
// Extract duration and track metadata
const duration = await input.computeDuration();
const audioTrack = await input.getPrimaryAudioTrack();
if (!audioTrack) {
throw new Error('No audio track found');
}
return {
name: fileName,
blob,
duration,
sampleRate: audioTrack.sampleRate,
numberOfChannels: audioTrack.numberOfChannels,
url: URL.createObjectURL(blob),
};
};
This metadata extraction is essential because:
- Duration: Determines the clip’s length on the timeline
- Sample rate: Ensures proper playback speed through Web Audio API
- Channel count: Enables stereo/mono routing decisions
From Recording to Timeline
Once MediaBunny has captured and analyzed the audio, the workflow is:
- Upload to R2: The audio blob is sent to the Cloudflare Worker for storage
- Create Convex clip: A new clip document is created with the R2 URL
- Schedule playback: The audio engine decodes the buffer and schedules it for playback
This seamless integration between MediaBunny, Convex, and the audio engine enables a professional recording workflow entirely in the browser: no server-side audio processing required.
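Concretely, the hand-off can be sketched like this. The upload endpoint is the Worker route shown in the next section; createClip stands in for a Convex mutation I haven’t shown here, and the fields simply mirror the clips schema above:
import type { Id } from '../convex/_generated/dataModel';

// Hypothetical glue code: upload the recording, then create a clip at the playhead.
async function addRecordingToTimeline(opts: {
  roomId: string;
  trackId: Id<'tracks'>;
  playheadSec: number;
  recording: { blob: Blob; name: string; duration: number };
  createClip: (args: Record<string, unknown>) => Promise<unknown>;
}) {
  const { roomId, trackId, playheadSec, recording, createClip } = opts;

  // 1. Upload the encoded blob to R2 through the Worker endpoint
  const form = new FormData();
  form.append('file', recording.blob, recording.name);
  form.append('roomId', roomId);
  const res = await fetch('/api/samples', { method: 'POST', body: form });
  const { url } = await res.json();

  // 2. Create the synchronized clip document; collaborators see it immediately
  await createClip({
    roomId,
    trackId,
    startSec: playheadSec,
    duration: recording.duration,
    name: recording.name,
    sampleUrl: url,
  });
}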
Edge Infrastructure: Cloudflare Workers + R2
Audio files are large, so efficient storage and streaming is critical. I use Cloudflare R2 for object storage, accessed through a Hono Worker API:
// Upload endpoint
app.post('/api/samples', async (c) => {
const { user } = c.var;
if (!user) return c.json({ error: 'Unauthorized' }, 401);
const formData = await c.req.formData();
const file = formData.get('file') as File;
const roomId = formData.get('roomId') as string;
const key = `rooms/${roomId}/clips/${file.name}`;
await c.env.AUDIO_SAMPLES.put(key, file.stream(), {
httpMetadata: { contentType: file.type },
});
const url = `/api/samples/${roomId}/${encodeURIComponent(file.name)}`;
return c.json({ url, key });
});
The streaming endpoint serves audio with proper caching headers:
app.get('/api/samples/:roomId/:filename', async (c) => {
const { roomId, filename } = c.req.param();
const key = `rooms/${roomId}/clips/${filename}`;
const object = await c.env.AUDIO_SAMPLES.get(key);
if (!object) return c.notFound();
return new Response(object.body, {
headers: {
'Content-Type': object.httpMetadata?.contentType || 'audio/wav',
'Cache-Control': 'public, max-age=31536000',
'X-R2-Key': key,
},
});
});
This edge-first approach means audio samples are served from Cloudflare’s global network with minimal latency.
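On the playback side, the audio engine only needs to fetch that URL and hand the bytes to the Web Audio decoder. A minimal sketch of the loading path inside AudioEngine, where the bufferCache field is my own addition rather than the project’s exact caching strategy:
// Fetch an encoded sample from the Worker/R2 endpoint and decode it for scheduling.
private bufferCache = new Map<string, AudioBuffer>(); // assumed cache, keyed by sample URL

async loadSample(url: string): Promise<AudioBuffer> {
  const cached = this.bufferCache.get(url);
  if (cached) return cached;
  const response = await fetch(url); // e.g. /api/samples/<roomId>/<filename>
  const encoded = await response.arrayBuffer();
  // decodeAudioData handles any container/codec the browser supports
  // and resamples to the AudioContext's sample rate.
  const buffer = await this.audioCtx!.decodeAudioData(encoded);
  this.bufferCache.set(url, buffer);
  return buffer;
}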
Lessons Learned
Building this DAW taught me several valuable lessons:
1. Embrace eventual consistency: Some state, like a user’s solo/mute monitoring preferences, works better as local-only state, while structural edits sync globally. Not everything needs real-time sync.
2. Schema minimalism: Convex works best with minimal, focused schemas. Store only what truly needs synchronization.
3. Lookahead scheduling is essential: Web Audio scheduling is sample-accurate, but JavaScript isn’t. Always schedule audio events ahead of time.
4. TypeScript everywhere pays off: From Convex schema to audio engine types, type safety caught countless bugs during development.
5. Edge infrastructure matters: Serving audio samples from R2 through Cloudflare’s global network keeps load times low no matter where collaborators are.
What’s Next?
The project is live and functional, but there’s always more to explore:
- MIDI support for virtual instruments
- Audio effects like compression and delay
- Collaborative mixing with per-user mix snapshots
- Offline editing with sync when reconnected
If you’re interested in real-time collaboration, audio programming, or edge computing, I encourage you to check out the live demo and explore the source code.
Building a DAW in the browser pushes the limits of what’s possible on the web platform—and it’s incredibly rewarding to see it all come together.