As systems grow sometimes tables and models come and go. Often things don’t get fully cleaned up or removed from the system. As time goes on it is often harder and harder to know what is safe to remove, as people involved with the code may have moved on. I have used lots of little tricks to clean up dead code. I even maintain Coverband to find code usage in production. One trick Tom Copeland pointed out to me was checking the max created at on a given table to see when it was last used. It was so simple and helpful I ended up wanting to do that across all tables not just individual models.

Rake DB cleanup

I first tried to do this in pure SQL with a subquery, but when I was having trouble making it work, I found a post basically saying don’t do that. It’s point was if you are going to run a subquery over a collection of all your tables us code not SQL. In the end I made a simple Rake task to output the data I wanted. You should be able to drop it into your own project and get a good idea of tables that might not be needed and get to drop the related code as well. In my case it brought about 14 tables to our attention to schedule for cleanup.

namespace :database do
  desc 'List all tables in surge DB and last usage, help find tables to remove'
  task :list_recent => :environment do
    table_data = {}
    skip_large_slow_tables = ['some_table', 'versions']

    sql = "select * from information_schema.tables where table_schema='public' and table_type='BASE TABLE';"
    tables_array = ActiveRecord::Base.connection.execute(sql)
    tables_array.each do |table|
      table_name = table['table_name']
      next if skip_large_slow_tables.include?(table_name)
      puts "running: #{table_name}"
        sql = "select max(created_at) from #{table_name};"
        max_result = ActiveRecord::Base.connection.execute(sql).first['max']
        max_result = 'N/A'
      table_data[table_name] = max_result
    table_data.sort_by{|k,v| v.to_s }.each do |pair|
      table = pair.first
      max = pair.last
      puts "#{table}:#{max}"

blog comments powered by Disqus
Dan Mayer Profile Pic
Welcome to Dan Mayer's development blog. I primary write about Ruby development, distributed teams, and dev/PM process. The archives go back to my first CS classes during college when I was first learning programming. I contribute to a few OSS projects and often work on my own projects, You can find my code on github.

Twitter @danmayer

Github @danmayer