Active Record Database Documentation 12 November 2017

Database Documentation

Documentation Folders CC image from Pixabay

Active Record Database Documentation

This post covers how to use Active Record's database comments feature (added in Rails 5) to add comments to your tables, columns, and indexes. It makes it easy to keep your database documentation up to date, because you can add descriptions at the same time as you add or update tables and columns.

Documentation Embedded with Change Process

Why would we want to use Rails to build our database documentation?

I believe documentation close to code and embedded in the code change process has a better chance of staying up to date and relevant.

I also think we want to add human context to our data using the same tools we use to build our database. We want to do this in trackable source code for the same reason we run database migrations in Rails, as opposed to having a DBA make schema changes outside of our application's code change process.

When we embed our database documentation in our standard code change process, we gain several advantages:

  • DB comments are tracked in Git, so you can see how they change over time
  • They are searchable in your code editor (and on GitHub)
  • Documentation can be reviewed as part of PRs by a data team, analysts, or other folks who might be the target documentation audience

By having the documentation embedded directly in the database, additional value is unlocked.

  • The documentation shows up in most DB explorer tools (SQL Workbench, Postico, etc.).
  • It becomes a single source of truth. It is easy to generate documentation and push it to other repositories (Markdown, HTML, Confluence) from Rails, CI, or any other tool in your workflow (see examples below).

Database Documentation in Postico

Postico OSX Postgres client showing comments as you explore DB structure

Code Samples

Below are some code samples to help you get started with a workflow around database documentation.

Migration

Below is a migration adding comments to an existing table. You can add descriptions to call out deprecated fields, gotchas, and planned refactorings, or add historical context that may help the next developer understand what a field means.

class AddContactComments < ActiveRecord::Migration[5.1]
  def change
    msg = 'Contacts table holds individual details about our contacts, it is associated with leads and customers'
    change_table_comment(:contacts, msg)

    change_column_comment(:contacts, :first_name, "the contact's first name")
    change_column_comment(:contacts, :last_name, "the contact's last name")
    change_column_comment(:contacts, :house_latitude, 'the latitude of the house the contact lives at')
    change_column_comment(:contacts, :house_longitude, 'the longitude of the house the contact lives at')
    change_column_comment(:contacts, :house_location_accuracy, 'the accuracy range of the GPS reading we captured')
    change_column_comment(:contacts, :deleted_at, "the date the contact was 'hidden' from our DB")
    
    # call out gotchas
    # tasks have assignee_id while contacts still use agent_id in the DB, this is a recommended refactoring
    msg = 'the agent_id field holds who the contact is currently assigned to; various places in the code and API refer to it as assignee_id'
    change_column_comment(:leads, :agent_id, msg)
  end
end
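
After this migration runs, the comments are also dumped into db/schema.rb (the Postgres and MySQL adapters support comments), so the documentation gets versioned and reviewed along with the rest of the schema. The dump ends up looking roughly like the sketch below; the column types here are assumed for illustration.

create_table "contacts", comment: "Contacts table holds individual details about our contacts, it is associated with leads and customers" do |t|
  t.string "first_name", comment: "the contact's first name"
  t.string "last_name", comment: "the contact's last name"
  t.datetime "deleted_at", comment: "the date the contact was 'hidden' from our DB"
end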

Ruby DB Docs Access

To generate documentation that can be pushed to a wiki, HTML, Confluence, or elsewhere, you can iterate through a table's columns and fetch the comments.

> ActiveRecord::Base.connection.table_comment('leads')
=> "Leads table holds individual details about leads, related associations, event timestamps, and joins to contact"	
> Contact.columns_hash['literacy'].comment
=> reading level: {"-1"=>"none", "0"=>"no_read", "1"=>"limited_read", "2"=>"read_fluent", "none"=>"none", "no_read"=>"no_read", "limited_read"=>"limited_read", "read_fluent"=>"read_fluent"}

# iterate through all the columns on a table and output their comments to your documentation file or API
Contact.columns.each do |c|
  puts c.comment 
end
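
For example, a small script along these lines could dump the table and column comments into a Markdown file that CI pushes to wherever your documentation lives. The model list and output path are just placeholders for illustration.

# sketch: write table and column comments for a few models out to Markdown
conn = ActiveRecord::Base.connection
File.open('doc/db_docs.md', 'w') do |doc|
  [Contact, Lead].each do |model|
    doc.puts "## #{model.table_name}"
    doc.puts conn.table_comment(model.table_name)
    doc.puts
    model.columns.each do |column|
      doc.puts "* `#{column.name}` (#{column.type}): #{column.comment}"
    end
    doc.puts
  end
end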

SQL DB Docs Access

Obviously you don’t need Rails to get at this information; you can pull it out with raw SQL as well, as covered in this post: Querying table, view, column and function descriptions

-- get all the table comments in your DB
SELECT c.relname As tname, CASE WHEN c.relkind = 'v' THEN 'view' ELSE 'table' END As type, 
    pg_get_userbyid(c.relowner) AS towner, t.spcname AS tspace, 
    n.nspname AS sname,  d.description
   FROM pg_class As c
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = 0)
   WHERE c.relkind IN('r', 'v') AND d.description > ''
   ORDER BY n.nspname, c.relname ;

-- get all the column comments for the contacts table
SELECT a.attname As column_name,  d.description
   FROM pg_class As c
    INNER JOIN pg_attribute As a ON c.oid = a.attrelid
   LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
   LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
   LEFT JOIN pg_description As d ON (d.objoid = c.oid AND d.objsubid = a.attnum)
   WHERE  c.relkind IN('r', 'v') AND  n.nspname = 'public' AND c.relname = 'contacts'
   ORDER BY n.nspname, c.relname, a.attname ;

Databases Across Environments 26 October 2017

Random Links

DB Syncs CC image from Pixabay

Syncing Databases Across Environments

It seems that as a business grows, at some point it becomes hard to create useful QA, development, and staging data that covers all the cases that occur on production. Eventually, there is a need to replicate certain cases or debug various issues, which is much easier when you can pull the production data to another system to experiment and debug safely. Nearly everywhere I have worked, we eventually need to clone large portions of the production DB to staging… quickly followed by wanting to pull staging data into local development. While various privacy and security concerns sometimes must be taken into account, in general we are just talking about replicating a database or tables from one place to another. I will cover some approaches to moving DBs.

DB Rotation on AWS

If you are in a cloud environment, use the tools your provider offers when you can and avoid building a custom solution. We use AWS, and we run our Postgres DB via the RDS service. We have backup and retention rules and the ability to restore snapshots under different names, and we leverage those snapshots to move DBs from one environment to another. If you are using Google Cloud or Heroku, there are similar options. I am not going to lay out our full automation (you can also do all of this via the AWS GUI), but our basic setup in the cloud is detailed below, followed by a rough sketch of the restore step.

  • Production: has a nightly snapshot taken
  • Staging:
    • about once every other month we restore the staging DB from a production snapshot
    • after the snapshot is restored, some scripts run (clean up, sanitize, and truncate large un-needed tables) so it runs better on a smaller DB instance
    • be careful about test data that managers and your QA team expect; a restore can destroy a lot of their work
    • we handle that by having our QA regression suite detect that the QA team's data is missing and recreate the example QA data for the team
    • staging has nightly snapshots taken
  • Dev Stacks: deployed development environments
    • these are created dynamically by developers from git branches
    • they restore a new DB from the latest staging snapshot
    • then run migrations and whatnot on the branch to get it to the correct state
    • we don’t take any backups of these stacks, which are typically destroyed after a few days of use
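
For reference, the restore step might look something like the sketch below using the aws-sdk-rds gem. The instance names, region, and instance class are placeholders, and in practice you would delete or rename the old staging instance first.

# rough sketch: restore staging from the latest automated production snapshot
require 'aws-sdk-rds'

rds = Aws::RDS::Client.new(region: 'us-east-1')

# find the most recent automated snapshot of the production instance
snapshot = rds.describe_db_snapshots(
  db_instance_identifier: 'prod-db',
  snapshot_type: 'automated'
).db_snapshots.max_by(&:snapshot_create_time)

# restore it to a new staging instance on smaller hardware
rds.restore_db_instance_from_db_snapshot(
  db_instance_identifier: 'staging-db',
  db_snapshot_identifier: snapshot.db_snapshot_identifier,
  db_instance_class: 'db.t2.large'
)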

Full DB to local Development

While the AWS toolchain is great, you can’t restore an AWS snapshot to your local Postgres DB. For that we created a simple script that dumps Postgres and uploads it to S3 in a compressed format. We have some options to exclude extra-large tables that we generally don’t care about for development; you can always pull those in later using the single-table method covered further down.

Full Postgres to S3 Dump Rake Task

  def db_to_dump
    ActiveRecord::Base.connection_config[:database]
  end

  def db_host
    ActiveRecord::Base.connection_config[:host]
  end

  def db_port
    ActiveRecord::Base.connection_config[:port]
  end

  def db_user
    ActiveRecord::Base.connection_config[:username]
  end

  def db_pass
    ActiveRecord::Base.connection_config[:password]
  end
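
  # NOTE: the tasks below reference a few helpers not shown in the original post
  # (pg_pass_file, write_pg_pass, s3_and_bucket). The versions here are only
  # guesses at what they might look like, following the same patterns used below.
  def pg_pass_file
    # default location libpq checks, so pg_dump/pg_restore pick up the password
    File.expand_path('~/.pgpass')
  end

  def write_pg_pass
    File.open(pg_pass_file, 'w') { |f|
      f.write("#{db_host}:#{db_port}:#{db_to_dump}:#{db_user}:#{db_pass}")
    }
    `chmod 600 #{pg_pass_file}`
  end

  def s3_and_bucket
    s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY_ID'],
                     secret_access_key: ENV['S3_SECRET_ACCESS_KEY'])
    [s3, s3.buckets['dev-db-bucket']]
  end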

  ###
  # Run this on the target environment you wish to dump (staging dev stack)
  # bundle exec rake data:dump_full_db
  #
  # The DB files should get uploaded to S3 ready 
  # to be pulled into another environment
  ###
  desc 'dump full DB to S3'
  task dump_full_db: :environment do
    tables_to_exclude = %w[really_big_tables papertrail_versions]
    exclude_lines = tables_to_exclude.map { |table| "--exclude-table-data=#{table}" }.join(' ')
    full_db_file = '/tmp/full_dev_dump_data.sql'

    begin
      s3 = AWS::S3.new(access_key_id: ENV['S3_ACCESS_KEY_ID'],
                       secret_access_key: ENV['S3_SECRET_ACCESS_KEY'])
      bucket = s3.buckets['dev-db-bucket']

      # HACK: write a pgpass file so we don't have to pass the password on the command line
      File.open(pg_pass_file, 'w') { |f|
        f.write("#{db_host}:#{db_port}:#{db_to_dump}:#{db_user}:#{db_pass}")
      }
      `chmod 600 #{pg_pass_file}`
      
      `pg_dump #{exclude_lines} -O -v -x -F c -f #{full_db_file} -h #{db_host} -p #{db_port} -U #{db_user} #{db_to_dump}`

      [full_db_file].each do |file|
        if File.exist?(file)
          puts "uploading #{file}"
          path = Pathname.new(file)
          obj = bucket.objects[path.basename.to_s]
          obj.write(path)
        end
      end
    ensure
      `rm #{pg_pass_file} > /dev/null 2>&1`
    end
  end
  
  ###
  # Run this on the environment you wish to create or restore
  # the most recent dump.
  #
  # bundle exec rake reload_full_from_s3
  ###
  desc 'reload dev DB from S3'
  task reload_full_from_s3: :environment do
    unless ENV['SKIP_DOWNLOAD']
      s3, bucket = s3_and_bucket
      full_db_file = '/tmp/full_dev_dump_data.sql'

      [full_db_file].each do |file|
        puts "downloading #{file}"
        path = Pathname.new(file)
        obj = bucket.objects[path.basename.to_s]

        File.open(path.to_s, 'wb') do |s3_file|
          obj.read do |chunk|
            s3_file.write(chunk)
          end
        end
      end
    end

    db_to_dump = ActiveRecord::Base.connection_config[:database]
    ActiveRecord::Base.remove_connection

    puts `psql -c "drop database #{db_to_dump};"`
    if $? != 0
      puts 'drop DB failed (you likely need to close console or app)'
      exit 1
    end
    puts `psql -c "create database #{db_to_dump};"`
    puts `pg_restore  --verbose --dbname #{db_to_dump} -F c #{full_db_file}`
  end

Single table to Staging or Development

OK, that makes it easy for a new dev to get a copy of the latest full DB, but what if you just want to pull the most recent payments, customers, or products? A much faster way is to load just a table or two.

Postgres Single table S3 Dump & Load Rake Task

  ###
  # Run on target environment to dump a table
  # TABLE_NAME=phone_block_lists bundle exec rake db:dump_db_table
  ###
  desc 'dump single DB table to S3'
  task dump_db_table: :environment do
    table = ENV['TABLE_NAME'] || 'users'
    table_file = "/tmp/#{table}.sql"

    begin
      s3, bucket = s3_and_bucket
      write_pg_pass
      `pg_dump --no-owner --no-acl -v -F c -h #{db_host} -p #{db_port} -U #{db_user} --table public.#{table} #{db_to_dump} > #{table_file}`

      [table_file].each do |file|
        if File.exist?(file)
          puts "uploading #{file}"
          path = Pathname.new(file)
          obj = bucket.objects[path.basename.to_s]
          obj.write(path)
        end
      end
    ensure
      `rm #{pg_pass_file} > /dev/null 2>&1`
    end
  end
  
  ###
  # Run on environment where you want to load the table
  # TABLE_NAME=phone_block_lists bundle exec rake db:reload_table_from_s3
  ###
  desc 'reload dev DB table from S3'
  task reload_table_from_s3: :environment do
    table = ENV['TABLE_NAME'] || 'administrative_areas'
    table_file = "/tmp/#{table}.sql"
    db_to_dump = ActiveRecord::Base.connection_config[:database]

    unless ENV['SKIP_DOWNLOAD']
      s3, bucket = s3_and_bucket

      [table_file].each do |file|
        puts "downloading #{file}"
        path = Pathname.new(file)
        obj = bucket.objects[path.basename.to_s]

        File.open(path.to_s, 'wb') do |s3_file|
          # this is slow perhaps move to AWS CLI download
          obj.read do |chunk|
            s3_file.write(chunk)
          end
        end
      end
    end

    puts `psql --dbname #{db_to_dump} -c "truncate table #{table}"`
    puts `pg_restore  --verbose --data-only --dbname #{db_to_dump}  -F c #{table_file}`
  end

Database Helper Scripts

These are small examples extracted from our database helper rake tasks; we have a number of other little helpers as well. It can be useful to build up some small, simple tools to help your team move data between environments. It can also really simplify the process of bringing new developers onto the team. I hope the simple examples above are helpful to some folks.


Markdown to Confluence Wikis 27 August 2017


Markdown converted for Atlassian’s Confluence

If you ever have a need to convert a bunch of Markdown or HTML files to Confluence, it is really easy. Their REST API is pretty clean, and if you use Ruby there is a small Confluence API gem to make it even easier.
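
If you would rather not pull in the gem, a minimal sketch of the push using the raw REST API might look like the following, using kramdown for the Markdown conversion. The wiki URL, space key, page title, and credential env vars are all placeholders.

# sketch: convert a Markdown file to HTML and create a Confluence page via the REST API
require 'net/http'
require 'json'
require 'kramdown'

html = Kramdown::Document.new(File.read('docs/db_schema.md')).to_html

uri = URI('https://yourcompany.atlassian.net/wiki/rest/api/content')
request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
request.basic_auth(ENV['CONFLUENCE_USER'], ENV['CONFLUENCE_API_TOKEN'])
request.body = {
  type: 'page',
  title: 'DB Schema Docs',
  space: { key: 'DEV' },
  body: { storage: { value: html, representation: 'storage' } }
}.to_json

response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code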

Why Move To Confluence?

Why would we move our documentation from Markdown in a git repo to Confluence? Well, our team is growing and we interact with more and more non-developers who could benefit from and contribute to our documentation. While some of our documentation is very technical, much of it isn’t, and it would be helpful to our Support, IT, Project Management, and Stakeholder teams. Markdown and git aren’t the best way to get those folks comfortable reading or contributing to our documentation.

  • Increased collaboration
  • Simplified access control to read
  • Ability to mix ticketing, widgets, and wiki docs

That was enough for us to decide on moving our documentation over.


Information Architecture Changes 13 August 2017


Visualizing DB Schema Changes: even at a quick glance, better structures stand out

Information Architecture Changes

Creating a robust Data Model for your business & application is one of the most important things to get right, yet we still don’t have great tools for discussing proposed database schema changes. Fixing a bad data model after data has started flowing on a production system is more complicated & time consuming than folks estimate. A series of not fully thought out data decisions early on can cut a team’s velocity significantly, because time ends up being spent fixing bad & invalid data, adding missing DB constraints & validations, and refactoring towards a more appropriate data model.

What can we do to ensure a more robust model from the start and increase the confidence we have in data model changes?

A Proposal For More Communicative Data Model Changes

Below, I will lay out a proposal we are iterating on as part of our OGE team best practices.

Our Goals:

  • Ability to communicate anticipated impacts of our data model changes clearly across Development, Project Management, & Business Intelligence teams
  • Ability to quickly visualize changes (faster feedback loop)
  • Reduce bad data models making it to production
  • Support both interactive & asynchronous communication styles that work best for our distributed team
  • Ability to see high and low level details using the same data
  • Easily integrates into our tool chain

Most Valuable Code 20 July 2017


Obama writing JS with code.org.

The Value of Code

While both of the quotes below are obviously a bit exaggerated, they do capture the point: the code we write has value because it solves some challenge.

The Best Code is No Code At All
-Jeff Atwood

or

Programming is a means to an end, not an end in itself. You should be trying to do as little of it as possible to make the thing that you want.
-Jake Levine

Most Valuable Code I’ve Written

Given that, I think the most valuable code I have ever written is a stupid and simple method. Below is a recreation of the method, avoiding unimportant internals.
