High Availability at Braintree

Who am I?

What am I talking about?

  • Performing maintenance in 30 second windows
  • Suspending traffic during small maintenance windows
  • Automating infrastructure changes

Part 1:

Tiny Maintenance Windows

How?

postgres = hot

Adding an Index

The Standard Approach

class AddNameIndex < ActiveRecord::Migration
  def self.up
    # A plain CREATE INDEX blocks writes to people until the build finishes.
    add_index :people, :name
  end

  def self.down
    remove_index :people, :name
  end
end

Adding an Index

The Concurrent Approach

class AddNameIndex < ActiveRecord::Migration
  def self.up
    # Builds the index without blocking writes to people.
    execute "CREATE INDEX CONCURRENTLY index_people_on_name ON people (name)"
  end

  def self.down
    execute "DROP INDEX index_people_on_name"
  end
end

Doesn't block reads

Doesn't block writes!

but

Drawbacks

From the PostgreSQL documentation for CREATE INDEX CONCURRENTLY:

"When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment."

postgres = hot?

postgres = hot_migrations!

Hot Migrations

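class HotMigrator < ActiveRecord::Migrator
  # Run each migration without wrapping it in a transaction:
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
  def ddl_transaction(&block)
    # Concurrent builds take a long time, so disable the statement timeout.
    ActiveRecord::Base.connection.execute "SET statement_timeout = 0"
    block.call
  end
end

# Runs only the migrations in db/migrate_hot/, optionally up to VERSION.
task :migrate_hot => :environment do
  HotMigrator.migrate(
    "db/migrate_hot/",
    ENV["VERSION"] ? ENV["VERSION"].to_i : nil
  )
end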
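
A hot migration in db/migrate_hot/ mostly comes down to one concurrent statement and its inverse. As a rough sketch, a small helper could build both strings so that the up and down steps stay in sync. The module name HotIndexSql and its methods are hypothetical and not part of the talk or of Rails. The sketch assumes trusted table and column names and does no identifier quoting.

```ruby
# Hypothetical helper (an illustration, not from the talk): builds the SQL
# strings a hot migration would pass to `execute`.
module HotIndexSql
  module_function

  # Index name in the index_<table>_on_<column> pattern used by the
  # AddNameIndex migration.
  def index_name(table, column)
    "index_#{table}_on_#{column}"
  end

  # Statement for the up step: build the index without blocking writes.
  def create(table, column)
    "CREATE INDEX CONCURRENTLY #{index_name(table, column)} ON #{table} (#{column})"
  end

  # Statement for the down step.
  def drop(table, column)
    "DROP INDEX #{index_name(table, column)}"
  end
end

puts HotIndexSql.create(:people, :name)
puts HotIndexSql.drop(:people, :name)
```

Inside a migration, the up step would then read `execute HotIndexSql.create(:people, :name)`.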

Release Scheduling

Old release schedule

  • Do some work for release 1
  • Merge to release branch
  • Tag
  • Deploy
  • Do some work for release 2
  • Merge to release branch
  • Tag
  • Deploy

New release schedule

  • Release 1.0 is the same
  • Do some work for release 2
  • Create hot migration for release 2
  • Cherry-pick hot migration into release branch
  • Tag 2.0
  • Deploy
  • Merge to release branch
  • Tag 2.1
  • Deploy

Part 2:

Suspending Traffic

A picture is worth a bunch of words

Broxy

Dispatchers

To suspend traffic

just stop the dispatchers
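
Why stopping the dispatchers suspends traffic instead of dropping it can be sketched with a toy in-memory model. It assumes that the proxy accepts every request into a queue and that dispatchers are the only thing draining that queue. The slides do not show Broxy's internals, so the class QueueingProxy and its methods are hypothetical.

```ruby
# Hypothetical model (not Broxy's actual code): requests are always accepted
# into a pending queue; they reach the application only while dispatching
# is running.
class QueueingProxy
  attr_reader :pending, :responses

  def initialize
    @pending = []    # accepted but not yet dispatched
    @responses = []  # results of dispatched requests, in order
    @running = true
  end

  # Clients can always hand over a request, even during maintenance.
  def accept(request)
    @pending << request
    dispatch if @running
  end

  # "Stop the dispatchers": requests keep queueing, nothing is forwarded.
  def suspend
    @running = false
  end

  # Restart the dispatchers and drain everything that queued up meanwhile.
  def resume
    @running = true
    dispatch
  end

  private

  def dispatch
    @responses << "ok: #{@pending.shift}" until @pending.empty?
  end
end

proxy = QueueingProxy.new
proxy.accept("charge 1")
proxy.suspend              # maintenance window starts
proxy.accept("charge 2")   # waits in the queue instead of failing
proxy.resume               # maintenance window ends
puts proxy.responses.inspect
```

In this model a client whose request arrives during the window only sees added latency, which is why the window has to stay short.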

Suspending Traffic

Part 3:

Automation

"Piano Key" Deploy

"With Broxy" Deploy

Database Failover

Shameless Self-promotion Slide

Chicago Tech Drinkup - Tomorrow, 6:00pm, Haymarket Pub and Brewery

Node.js meetup - Oct. 18th, 6:00pm

Q & A
