Safely Removing Image Assets from Rails 28 October 2019

cleaning, photo credit: pixabay

Why Cleanup Rails Image Assets?

Why would we want a safer way to delete image assets?

  • a clean repo is easier to jump into and maintain over time
  • cruft that isn’t in use can be confusing over time
  • image assets can slow down your test and deploy pipeline
    • Rails tests frequently need to build all assets dynamically or at initialization, which is often a slow hit on a Rails test suite
  • Deployment needs to work with assets as well, often in multiple steps
    • building all assets rake assets:precompile
    • compressing asset bundle
    • uploading assets to a CDN

All of this time adds up. Asset compilation on a large legacy Rails app still adds around 40 seconds to our build time, down from 1m30s in the past, while asset preparation and deployment added over 3 minutes to our deploy time before we cleaned up and optimized that flow and its files.

How To Safely Delete Image Assets

OK, hopefully by now you would love to delete all the images in your app/assets/images folder that you don’t use… But which images are out of use and safe to delete? I have looked at a number of ideas for this.

  • grepping with various scripts
  • using log aggregation search results to ensure no hits were being made of an image asset
  • using Sprockets options; unknown_asset_fallback alone doesn’t make you entirely safe…

What I wanted was a way to delete a folder of images or a single image that I believed was no longer in use, but have the build fail if there was any reference to the image. I wanted Rails to fail in these cases:

  • a page is loaded in dev mode referencing a missing asset
  • tests are run against a page referencing a missing asset (ActionDispatch::IntegrationTest, request spec, etc)
  • bundle exec rake assets:precompile

Sprockets Unknown Asset Fallback

Not surprisingly, other folks have wanted this and Sprockets has a built-in option config.assets.unknown_asset_fallback, which gets close to what I wanted. From the docs, this option claims to:

When set to a truthy value, a result will be returned even if the requested asset is not found in the asset pipeline. When set to a falsey value it will raise an error when no asset is found in the pipeline. Defaults to true.

So let’s set it to false: Rails.application.config.assets.unknown_asset_fallback = false. Now if you have a deleted image referenced like below:

<%= image_tag("deleted_image.svg") %>

You will get an error when visiting the page or running tests:

bundle exec rake

...S.......E

Error:
HomeControllerTest#test_should_get_index:
ActionView::Template::Error: The asset "deleted_image.svg" is not present in the asset pipeline.
    app/views/home/index.html.erb:6:in `_app_views_home_index_html_erb___957919561084124106_70092585694780'
    test/controllers/home_controller_test.rb:5:in `block in <class:HomeControllerTest>'

This doesn’t make one entirely safe, as images that are referenced in your scss, css, or other styles would still not cause an error. They would silently lead to broken images.

Patch To Force Asset Compilation To Fail on Unknown Assets

Sadly, I couldn’t find any option or configuration that would cause compiling stylesheets to fail. I thought this would block my goal of being able to remove a ton of assets safely with confidence… Well, after lots of digging, I figured out how to patch sprockets-rails so that it will blow up and fail when it references an unknown asset.

You can apply this patch in your config/initializers/assets.rb.
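The rough shape of the patch looks like the sketch below. Treat it as a sketch, not a drop-in: it assumes sprockets-rails 3.x internals (compute_asset_path, resolve_asset_path, and the AssetNotFound error), so double-check the method names against the version you actually run.

# config/initializers/assets.rb
Rails.application.config.assets.unknown_asset_fallback = false

# Rough sketch of the patch idea (assumes sprockets-rails 3.x internals):
# reopen the sprockets-rails helper so an asset that cannot be resolved always
# raises, even when the lookup comes from scss/css compilation.
module Sprockets
  module Rails
    module Helper
      def compute_asset_path(path, options = {})
        debug = options[:debug]
        if (asset_path = resolve_asset_path(path, debug))
          File.join(assets_prefix || "/", legacy_debug_path(asset_path, debug))
        else
          # Fail loudly instead of silently falling back to a broken asset path.
          raise AssetNotFound, "path not resolved: #{path}"
        end
      end
    end
  end
end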

Now if a file in your styles, like app/assets/stylesheets/application.scss, references an image, your asset pipeline will blow up when the image is missing.

.broken-image-demo {
  background-image: image-url('deleted_image.svg');
}

Depending on how your tests run, they will fail when precompiling assets; otherwise, the failure will occur when you call rake assets:precompile, as shown below.

bundle exec rake assets:precompile
...
Done in 1.32s.
rake aborted!
Sprockets::Rails::Helper::AssetNotFound: path not resolved: deleted_image.svg
/Users/danmayer/projects/coverband_demo/config/initializers/assets.rb:56:in `rescue in compute_asset_path'
/Users/danmayer/projects/coverband_demo/config/initializers/assets.rb:51:in `compute_asset_path'
/Users/danmayer/.rvm/gems/ruby-2.6.2/gems/actionview-5.2.2.1/lib/action_view/helpers/asset_url_helper.rb:200:in `asset_path'
...

Asset Failure Demo

If you want to see this in action, feel free to clone the coverband demo. Install gems and get the tests passing, then read the comments and uncomment the example lines before running the tests or compiling assets to trigger the expected errors. Key Files:

A Final Note

On an old legacy application we were able to delete over 50% of the total asset disk size by clearing out old unused image assets. This made it easier to find and navigate our assets folder, and it significantly sped up both our test suite and deployment. I wouldn’t expect most projects to have as much image cruft sitting around, but especially in older applications it is easy for these unused assets to really build up over time.

While the above should make it easier to delete image assets and do the housekeeping yourself, it still takes a bit of time. You need to go through a process:

  • find a likely set of unused images
  • delete them, run tests
  • add back images that were still used
  • repeat until satisfied

This obviously looks like a process that could be automated to clean up all your image assets for you. That is totally true, and I will cover how to do that in a future post. This post covers the prerequisite to automating the cleanup: ensuring that your app will properly and very loudly fail when an image that is still required has been removed.

Flaky Ruby Tests 07 September 2019

Restoration of a Ruby Test Suite

I want to talk about some recent work to restore a Rails app’s test suite to a usable state. The goal went beyond the test suite, to restoring trust in the continuous deployment pipeline, but this post will mostly focus on the RSpec suite. I started on this work because the state of deployment had become “dangerous”: various folks preferred to avoid the project, as it was difficult to work in and release, yet it was still critical to our overall architecture. At its very worst, deploys were taking over 30 minutes, with a deployment pipeline failure rate of 45%. The issue became clear and high priority to me when one day I had two small PRs to release and, due to bad luck with the failure rate, it took me nearly 6 hours to get my changes deployed: a constant distraction that dragged on through meetings and other work. With the pains of the team extremely clear and personal, I decided an effort to get things back into a safe state should be taken on.

I wanted to share some of what I learned as there has been some recent discussion in the Ruby community about fixing flaky tests @samsaffron @tenderlove @ctietze @SonjaBPeterson @saramic.

A Restoration Plan

Taking a step back and thinking about what needed to happen and how to get there was the first step. I thought this would fit in well with the old agile advice…

Make It Work. Make It Right. Make It Fast.

Make It Work

What didn’t work about the deployment? In my opinion, it was broken because:

  • it didn’t meet a 95% or better success rate
  • deploys were too slow to watch and review whether changes succeeded; they should take 10 minutes or less
  • the test suite relied on CI parallelism and was too slow to ever run locally; a local suite run needs to be possible in 1 hour or less

With a definition of what success looks like for making it work, I was able to start digging into the details of how to get there.

Delete Flaky Tests That Aren’t Valuable

This is often a hard one for some teams. An honest discussion of the purpose and value of tests is likely needed. I found good success by having a small PR removing a few flaky tests, and pointing to similar tests in the suite that exercised functionality in a more reliable way. For example, moving from a complete feature spec that tested several endpoints in a long workflow, to a couple of feature tests exercising individual endpoints, along with unit tests for the underlying service providers. The team might need to have a discussion ahead of time, or you might be surprised that others quickly agree that really flaky tests aren’t providing value.

Fixing Flaky Tests

The primary cause of our unstable deploy pipeline was flaky tests. Given that deployments were failing 45% of the time, we had a large number of flaky tests causing issues. Let’s dive into some of the techniques for resolving flaky test failures.

Flaky Paint

Divide and Conquer, with Rspec-Retry

Quarantining flaky tests initially helps reduce their damage to the rest of the suite, but you still have to fix them soon after.

Since before I joined the project, it has used rspec-retry as a way to isolate some flaky tests. Yes, this is a band-aid, but in terms of getting back to a “make it work” baseline, it is a wonderful tool. This is how I used it: for a period of about a month, I watched every failure on our CI spec suite. Every time a test failed more than once, I would add it to an internal wiki marking the test as flaky. When we had time, other folks from the team and I would “adopt” a test and try to fix it. If someone timeboxed an hour or two and couldn’t figure out and fix the underlying issue, we would tag it flaky, so that rspec-retry would run the test multiple times trying to achieve success. We ran our flaky-tagged specs in a special CI job, bundle exec rspec --tag retry_flaky_test, isolated from our other tests. This CI job had a success rate of 99%, so the flaky tests would pass on retry and stay split off from the others. Then, with logging and debugging, we could dig in deeper, resolve the underlying issues, and move the tests back into the standard test suite. This approach is great because it very quickly got the test suite back into usable condition, tagged all the future work still needing to be addressed, and captured metrics about which tests are most flaky (or are no longer flaky as we resolve issues). At our current stage, we still need to go back and resolve a number of flaky tests, but they no longer slow down or block our CI.
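For reference, wiring rspec-retry to a tag like that looks something like the sketch below; this is a generic sketch rather than our exact configuration, and the retry count is just illustrative.

# spec/spec_helper.rb
require 'rspec/retry'

RSpec.configure do |config|
  # Log each retry so flaky failures stay visible in the CI output.
  config.verbose_retry = true
  config.display_try_failure_messages = true

  # Only examples tagged retry_flaky_test get retried; everything else still
  # fails on its first failure.
  config.around(:each, :retry_flaky_test) do |example|
    example.run_with_retry retry: 3
  end
end

# Tagging a known-flaky example:
# it 'loads the dashboard', :retry_flaky_test do ... end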

Isolate Flaky Test Recipe:

  • capture data and identify flaky tests (use metric tracking, or just eyeball it from your CI history)
  • quickly try to fix them timeboxed to a short amount of time
  • if you can’t fix them, tag them for rspec-retry, to isolate them and remove them as a blocker for CI
  • Find a way to distribute this process across folks on the team, and explain the plan on how to follow through with cleanup.

Fix the Worst Offenders

From the process above you will likely find some worst-offender tests, or hopefully patterns that impact multiple tests. Even with rspec-retry, these may still fail too often to be reliable. If you dig into a few of the tests (during the timeboxing) you are likely to find some patterns. For example, @TildeWill fixed a whole class of flaky tests related to Capybara negative matchers. We also fixed entire categories of failing tests that weren’t properly using Capybara’s asynchronous matchers; each of these fixes added around 6% to the success rate of our CI suite per PR.

Common Flaky Test Issues

I won’t cover all the various types of flaky tests in as much detail as you can find in @samsaffron’s post, Tests that sometimes fail. Here are some of the most common issues we found while resolving issues.

  • fix timing issues (timezone dependent)
  • stale state issues due to non-DB stores (global vars, Redis, memcache, Rails.cache, etc)
  • fix order / dependency issues… Tests that only pass in a specific order
    • running specs in documentation mode can really help find ordering issues… Run it this way every time so you have a clear log of what ran when: time bundle exec rspec --format documentation
  • Capybara
    • devise auth issues
    • not using the asynchronous matchers
      • example: expect(page.current_url).to eq(/something/) is bad; switch to the waiting version expect(page).to have_current_path(/something/), which is good
    • using sleep as opposed to the correct waiting matchers (see the sketch after this list)
    • in general wait_for_ajax is dangerous
  • VCR allowing non-stubbed network requests can be dangerous; try to get your suite passing with VCR.configure { |c| c.allow_http_connections_when_no_cassette = false }
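To make a couple of those concrete, here is a sketch of the sleep-versus-waiting-matcher change and of pinning time in a timezone-dependent spec; the spec, factory, and path names are stand-ins for whatever your app actually has.

# spec/features/dashboard_spec.rb -- illustrative only
require 'rails_helper'

RSpec.feature 'Dashboard' do
  # travel_to requires: config.include ActiveSupport::Testing::TimeHelpers
  let(:user) { create(:user) }

  scenario 'shows a welcome message after login' do
    sign_in user
    visit dashboard_path

    # Bad: sleep 2; expect(page.current_url).to eq(dashboard_url)
    # Good: these matchers wait and retry up to Capybara's default wait time.
    expect(page).to have_current_path(dashboard_path)
    expect(page).to have_content('Welcome back')
  end

  scenario 'marks invoices overdue after 30 days' do
    # Pin the clock so the spec does not depend on when or where CI runs it.
    travel_to Time.zone.parse('2019-09-01 12:00') do
      visit dashboard_path
      expect(page).to have_content('0 overdue invoices')
    end
  end
end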

Tips and Tools to Fix Flaky Tests

A few tips to help debug and fix flaky tests. I found each of these scripts extremely valuable in moving our success rate forward.

Single Flaky Test Case

Quickly verify whether a test fails randomly even in isolation, with this great tip.

from @_swanson
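The gist of the tip is to hammer one example in a loop and count the failures; a rough Ruby version of the idea looks something like this (the spec path is just a placeholder):

#!/usr/bin/env ruby
# flaky_check.rb -- a rough sketch of the idea, not the script pictured above.
# Usage: ruby flaky_check.rb spec/features/dashboard_spec.rb:12 20
spec = ARGV.fetch(0, 'spec/features/dashboard_spec.rb:12') # placeholder path
runs = Integer(ARGV.fetch(1, 10))

failures = 0
runs.times do |i|
  passed = system('bundle', 'exec', 'rspec', spec, out: File::NULL, err: File::NULL)
  failures += 1 unless passed
  puts format('run %02d: %s', i + 1, passed ? 'pass' : 'FAIL')
end

puts "#{failures}/#{runs} runs failed (#{(100.0 * failures / runs).round(1)}% failure rate)"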

Single Flaky Spec File

Quickly check the success rate of a single spec file. I use this to report a before and after of a fix in a PR.
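A tiny variation of the same idea works per file; something like this sketch, which prints a one-line success rate that is easy to paste into a PR description (paths and run counts are placeholders):

#!/usr/bin/env ruby
# spec_success_rate.rb -- hypothetical sketch: run one spec file repeatedly and
# print a success rate suitable for a before/after note in a PR.
file = ARGV.fetch(0, 'spec/features/checkout_spec.rb') # placeholder path
runs = Integer(ARGV.fetch(1, 25))

passed = runs.times.count do
  system('bundle', 'exec', 'rspec', file, out: File::NULL, err: File::NULL)
end

puts "#{file}: #{passed}/#{runs} passed (#{(100.0 * passed / runs).round(1)}% success rate)"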

Calculate Success Rate of a Branch on CircleCI

Push a fix on a branch and run it a number of times to see if it improves the overall success rate.
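Roughly, the idea is to ask CircleCI for the recent builds on the branch and count the outcomes; here is a sketch assuming the v1.1 “recent builds for a branch” endpoint and its outcome field (org, repo, branch, and token are placeholders):

#!/usr/bin/env ruby
# branch_success_rate.rb -- hypothetical sketch, not the script from this post.
require 'json'
require 'net/http'

org, repo, branch = 'my-org', 'my-repo', 'flaky-fix-branch' # placeholders
token = ENV.fetch('CIRCLE_TOKEN')

uri = URI("https://circleci.com/api/v1.1/project/github/#{org}/#{repo}/tree/#{branch}" \
          "?circle-token=#{token}&limit=100&filter=completed")
builds = JSON.parse(Net::HTTP.get(uri))
abort('no completed builds found') if builds.empty?

passed = builds.count { |build| build['outcome'] == 'success' }
puts "#{passed}/#{builds.size} builds passed " \
     "(#{(100.0 * passed / builds.size).round(1)}% success rate)"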

Additional Resources on Resolving Flaky Tests

Make It Right

Most of the remaining work on restoring this test suite now falls into this category. The deployment pipeline now succeeds over 95% of the time and runs in around 10 minutes, which are acceptable numbers for our project. What we haven’t done is resolve all of the flaky tests that only pass because of retry attempts. Until we can make all the tests fully reliable, there is work to be done.

Make It Fast

I will dive into this more in a future article, but with some effort the team was able to get our CI deployment pipeline down from over 30m on average to only 10m on average. The CI jobs that run just our tests are down to around 5m, with the full deployment jobs taking longer. I expect that as we continue to make improvements and fix some of the known bad actors in our test suite, this number will continue to go down. Why did we make so much progress on “Make It Fast” before finishing “Make It Right”? Well, we needed a better and faster feedback loop to find and fix flaky tests, as well as to make it right. A fast feedback loop is really required to make progress quickly. Until we could increase the iteration cycles, we could only get so many flaky-fix PRs through the pipeline in a day, and at the beginning testing locally wasn’t really possible. In terms of making it fast, I did want to mention that there are still two efforts underway.

Local Test Suite Speed

If the test suite is too slow to ever run locally, it is also hard to test and ensure it runs reliably anywhere other than on CI. Initially, the test suite was so slow it would either crash or stall out most of the time; occasionally, with many failures, it would complete after 9+ hours… After using CI to drive most of the fixes, the local spec suite now reliably runs on a single machine in 45 minutes. This is still far too slow for my liking, but it is headed in the right direction.

Deployment Pipeline Improvements

The CI deployment pipeline includes the test suite, but also much more. This article isn’t going to focus on deployment improvements beyond those related to the tests and fixing flaky test failures; various improvements cut our deployment time to a third of what it was, and I will detail this more in a future article. It involved breaking down all the continuous deployment steps, finding inefficiencies and redundancy, and improving parallelization.

Building SVG Badges in Ruby 10 April 2019

Building SVG Badges in Ruby

A while ago I needed to create a simple dynamic SVG. After a bit of tinkering, it was easy enough to build a simple Ruby class to do what I needed to do.

Example Badge

SVG created from the below code

It was a super quick thing to put together and it solved a problem I was having. I was reminded of that today when I had another quick issue I wanted to resolve: I needed to pull some data that wasn’t available via a service’s APIs, but that was easily accessible to me in its web views, so I quickly hacked together a web scraper, pulling the data I needed into a Google Sheet.

Being a Developer Unblocks You

A thing I have always loved about being a developer is that you can solve your own problems. You have to be careful not to get sucked in and waste a bunch of time, but you also aren’t blocked just because something you need isn’t already available. If you build a quick hack, make sure it is just that, a quick hack, and that you won’t regret putting it into your workflow. The SVG example below was a tiny helper for devs, while today’s journey helped me pull CircleCI metrics that I want to track for myself monthly. In either case, if it breaks it is no issue and can be fixed in a few minutes.

Being able to solve the problems you run into along the way is one of the great parts of being a developer.

Full Ruby SVG Badge Code

class SvgFormatter
  def initialize(output = nil)
    @output = output || STDOUT
  end

  def format(result)
    percentage = SOME_DATA_SOURCE.round(1) # placeholder: substitute whatever metric you want to render
    File.open('badge/results.svg', 'w') { |f| f.write(template(percentage)) }
  rescue RuntimeError => e
    @output.puts e.message
    @output.puts 'SvgFormatter was unable to generate a badge.'
  end

  private

  def template(percentage)
    file_content = <<~SVGTEMPLATE
      <?xml version="1.0"?>
      <svg xmlns="http://www.w3.org/2000/svg" width="90" height="20">
        <linearGradient id="a" x2="0" y2="100%">
          <stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
          <stop offset="1" stop-opacity=".1"/>
        </linearGradient>
        <rect rx="3" width="90" height="20" fill="#555"/>
        <rect rx="3" x="51" width="39" height="20" fill="#007ec6"/>
        <rect rx="3" width="90" height="20" fill="url(#a)"/>
        <g fill="#fff" text-anchor="middle" font-family="DejaVu Sans,Verdana,Geneva,sans-serif" font-size="11">
          <text x="24.5" y="15" fill="#010101" fill-opacity=".3">Title</text>
          <text x="24.5" y="14">Title</text>
          <text x="68.5" y="15" fill="#010101" fill-opacity=".3">#{percentage}%</text>
          <text x="69.5" y="14">#{percentage}%</text>
        </g>
      </svg>
    SVGTEMPLATE
    file_content
  end
end
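Usage is just a couple of lines; the nil argument here reflects the placeholder SOME_DATA_SOURCE above, so swap in whatever metric you actually render:

require 'fileutils'

FileUtils.mkdir_p('badge')   # the formatter writes to badge/results.svg
SvgFormatter.new.format(nil) # result is unused while SOME_DATA_SOURCE is a placeholder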

Zola Electric Career 14 February 2019

Zola Electric Career

Tomorrow is my last day at Zola Electric; it is time for a different challenge. It has been an amazing ride over the last 4+ years, helping the company grow from a single country with non-connected devices to smart devices that can power a large variety of appliances. During my time with Zola, it has grown to provide energy access to over 1 million people without grid access in Africa. It has been a rewarding and challenging experience, and I wanted to share my favorite story from working at Zola over the last few years.

Installing Solar

My first installation in Tanzania

Satisfying Work

On my first work visit to Arusha, we had scheduled a number of activities for me to better understand the business and our operations, one of which was to go watch an installation near Arusha. On arrival, the children were playing in the yard and extremely excited to see us visit. After a bit of discussion and planning of where the cabling, lights, and M-series box would go, we got started. I volunteered to install the panel; with no ladder available, I climbed a tree, jumped onto the tin roof, and attached the solar panel in place, using a screwdriver to part the roof panels and run the panel cabling through the roof to the box. I came down to help finish installing some of the lights. The mother, holding the youngest baby, turned on her Zola lights for the first time, and the kids cheered and danced by her feet while a smile of pride and happiness filled her face. It is possibly the best and most satisfying work moment of my life.

Happy Mother

The happy mother after installation of lighting system

More Photos!

Climbing the tree

Climbing to the roof

Jumping over

Jumping over to the roof, sure hope it holds me!!!

Adding a light

Hanging a light as part of the installation, seen earlier with the Mother below

Coverband 3 03 October 2018

Coverband 3

Coverband gives deep insight into the production usage of every line of code in your Ruby app. Find dead code, find hotspots, see code paths that might be unexpectedly in high usage.

This release marks deep dives into Ruby performance and benchmarking, with almost all features driven by improving performance. The release is a major performance improvement, resolving some issues where Coverband 2 could have outliers that performed worse than the original Coverband 1 gem.

Me First and the Gimme Gimmes, a punk cover band, image from Wikimedia

Want to see it in action, visit the Coverband Demo Site, visit a few pages and watch the coverage report change as you use the site.

What breaking changes are there?

  • drops Tracepoint
  • drops Ruby <= 2.3.0
  • drops JSON Gem dependency
  • drops various other features not needed without Tracepoint
    • memory cache, sampling, restricted to app folders, etc

What is new?

  • standardizes on Coverage array format vs sparse hash
  • rewrites store methods, for 60X perf!
    • implemented for Redis and File store
  • improved mountable web interface (see the routing sketch below)
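If you want to poke at the mountable web interface, the routing looks roughly like the sketch below; check the Coverband README for the exact class and options in your version.

# config/routes.rb -- rough sketch; protect this route like any other internal tool
Rails.application.routes.draw do
  mount Coverband::Reporters::Web.new, at: '/coverage'
end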

What exactly is faster?

tldr;

Storing coverage data to a data store. While this didn’t happen often, when it did it could generate thousands of Redis requests on large apps, which could make for a very slow outlier request…

Long Version

The primary improvements come down to changing the storage format and moving from a method of storage that required one call per file and assumed a small list of files, to a two-pass storage of all the data. Basically, the previous storage mechanism created an N+1-type issue. Coverband 2 had solved the issue of capturing Coverage data, which was no longer a performance concern; in fact, it captured all files by default, as there wasn’t really a reason to filter for performance anymore… This created a new problem when trying to store coverage data. Eventually a very specific benchmark was added that simulates trying to store coverage reports for an app with 2934 files tracking coverage. In Coverband 3, by changing the storage format and the storage plan, the storage benchmark improved by nearly 60X!
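As a generic illustration of the N+1 shape of the problem (this is not Coverband’s actual storage code), compare one Redis round trip per file with batching the whole report into a single write:

require 'json'
require 'redis'

redis = Redis.new
coverage = {
  'app/models/user.rb' => [1, 2, nil, 4],
  'app/models/post.rb' => [1, nil, 3]
}

# N+1 style: thousands of tracked files means thousands of round trips.
coverage.each do |file, lines|
  redis.set("coverage.#{file}", lines.to_json)
end

# Batched style: one key and one round trip; merge/split the data on read instead.
redis.set('coverage.report', coverage.to_json)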

Simulate repeatedly storing coverage reports with Coverband 2.0.3:

rake benchmarks:redis_reporting
runs benchmarks on reporting large sets of files to redis
Warming up --------------------------------------
       store_reports     1.000  i/100ms
Calculating -------------------------------------
       store_reports      0.814  (± 0.0%) i/s -     13.000  in  15.982911s

Simulate repeatedly storing coverage reports with Coverband 3.0.0:

rake benchmarks:redis_reporting
runs benchmarks on reporting large sets of files to redis
Warming up --------------------------------------
       store_reports     4.000  i/100ms
Calculating -------------------------------------
       store_reports     47.111  (± 4.2%) i/s -    708.000  in  15.066259s

How did this impact Rails Apps?

In general, depending on various settings, the outliers either weren’t that noticeable or they made Coverband unusable for folks. I configured a Rails app with non-ideal, but not terrible, options to show what this could look like and how Coverband 3 resolves the issue. If configured ideally, even Coverband 2 would mostly only show up in outliers, so this test shows how even in a bad setup V3 performs with nearly no impact.

The benchmarks below are a further iteration of my AB Benchmarking Rails apps process. They are now much easier to read and compare, thanks to graphing the results by following this Apache Bench via Gnuplot guide.

                      No Coverband          Coverband 2           Coverband 3
mean                  27.272 [ms]           35.762 [ms]           28.460 [ms]
requests per second   36.67 [#/sec] (mean)  27.96 [#/sec] (mean)  35.14 [#/sec] (mean)

Notice that with no Coverband and Coverband 3, the majority of requests show little variation, while Coverband 2 has frequent outliers pushing request timing all over the place. See full sized images of the benchmarks below.

What is next?

I am glad you asked, as we introduced a roadmap and set of feature discussions for upcoming Coverband improvements. Check out the changes.md for the latest, but in short here is some of what we have in the works.

Please give Coverband a shot and add any issue you have to the repo.

Footnotes

For those crazy few that really want all the data on the benchmarks…

No Coverband

No Coverband benchmark graph
rake benchmarks:coverband_demo_graph
Benchmarking coverband-demo.herokuapp.com (be patient)

Concurrency Level:      10
Time taken for tests:   54.543 seconds
Complete requests:      2000
Failed requests:        0
Requests per second:    36.67 [#/sec] (mean)
Time per request:       272.717 [ms] (mean)
Time per request:       27.272 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%    262
  66%    269
  75%    273
  80%    276
  90%    288
  95%    302
  98%    328
  99%    350
 100%   1400 (longest request)

Coverband 2

Coverband 2 benchmark graph
rake benchmarks:coverband_demo_graph
Benchmarking coverband-demo.herokuapp.com (be patient)

Concurrency Level:      10
Time taken for tests:   71.524 seconds
Complete requests:      2000
Failed requests:        0
Requests per second:    27.96 [#/sec] (mean)
Time per request:       357.619 [ms] (mean)
Time per request:       35.762 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%    325
  66%    353
  75%    378
  80%    398
  90%    489
  95%    545
  98%    622
  99%    682
 100%   1442 (longest request)

Coverband 3

Coverband 3 benchmark graph

rake benchmarks:coverband_demo_graph
Benchmarking coverband-demo.herokuapp.com (be patient)

Concurrency Level:      10
Time taken for tests:   56.919 seconds
Complete requests:      2000
Failed requests:        0
Total transferred:      16565120 bytes
HTML transferred:       14974000 bytes
Requests per second:    35.14 [#/sec] (mean)
Time per request:       284.597 [ms] (mean)
Time per request:       28.460 [ms] (mean, across all concurrent requests)

Percentage of the requests served within a certain time (ms)
  50%    273
  66%    282
  75%    290
  80%    296
  90%    319
  95%    344
  98%    381
  99%    410
 100%   1358 (longest request)

Lead Developer London 18 July 2018

A Unique Conference

I attended #LeadDevLondon this year, and it was a nice and different conference. Generally, I have attended more language-specific conferences; there were basically no language-specific talks at all during this conf, as everything was a bit more general, about building and working well with technology teams. Given that these days a large part of what I do is manage teams of devs, it was good to hear other folks talk through some of the challenges and solutions they have worked through during their journeys.

leaddev london stickers

My swag bag full of stickers

Some Favorite Slides

For my very favorite slide see the section with my tweets below ;) Otherwise, enjoy some favorite slides with little to no context ;) You can generally find the full presentation slides on the talks page and the full videos of the #LeadDevLondon talks on youtube.

leaddev london slides

@alicegoldfuss explaining containers

leaddev london slides

@alicegoldfuss explaining what else ops does ;)

leaddev london slides

@alexhillphd going over the art of code reviews, this talk had my favorite slide of the conf

leaddev london slides

@cmccarrick how to scale yourself and make important decisions

leaddev london slides

@tara_ojo on how to help Jr’s with good 1:1s

leaddev london slides

@nmeans lessons learned from Three Mile Island

leaddev london slides

@ClareSudbery

leaddev london slides

@dbussink shows how distributed his team is

leaddev london slides

@dbussink explains how to make that an advantage

leaddev london slides

@WebDevBev on how to run awesome internships

leaddev london slides

@jqiu25 the journey to safe ways to edit production data

Four Slides From The Legacy Code Talk

I guess I really liked what @ramtop had to say about legacy code

leaddev london slides

leaddev london slides

leaddev london slides

leaddev london slides

A Talk Pattern I Want To See

One thing I noticed and really came away with from the talks is that every piece of advice and best practice really fits into a part of a company’s growth curve. Even the most agreed-upon advice can’t and shouldn’t apply in the same way to a single-person startup and to a company of thousands of people and hundreds of engineers. This made me crave a talk that, rather than describing how some team approaches a challenge and the ways it works for them, shows how a best practice or process changes and adapts across the different growth stages of a company…

As an example for a single talk, let’s consider continuous delivery over the growth of a team. The end-state best practices involve style checkers, peer code review, automated test suites, deployment to one or more staging environments, feature-flag or percentage-based releases to production, and before/after metric analysis on the impact of deploys… It would not be possible to start there, nor would it be sensible to set up all that complexity when just getting started… I feel like the flow would naturally look something more like this:

  • 1 person startup -> manually run CLI deployment task from the dev machine
  • 2-3 people -> CI test suite, manual CLI deployment
  • 3-8 people -> static style checker, CI test suite, code review, staging deployment with stakeholder / QA review, push-button deployments and rollbacks automated on CI
  • 8-16 -> all the above, with multiple staging environments, some integrated with 3rd party sandboxes, feature flag based releases, metrics tied to specific deployed releases, and deployment queue to help schedule deployments
  • etc, etc, etc…

I think it would be great to see detailed talks on tech org structures, testing, monitoring, alerting, etc..

My Tweets

Some of the tweets I shared during the conf.

The Most Important Part, Stickers

Seriously, thanks to all the sponsors and friends handing out stickers. I love collecting stickers at conferences to come back and cover my climbing wall. I am building a large collage covering the entire climbing wall in layers of stickers. After returning from a conference I always have a huge collection of stickers to use, which is great. In this case, I basically was able to make an entire tile of my climbing wall a #leaddevlondon tile full of stickers I picked up during the trip.

leaddev london stickers

leaddev london stickers

leaddev london stickers

If you don’t know what to do with all the stickers you have collected at various conferences, feel free to send them my way ;)

Benchmarks Bugs 10 June 2018

Benchmarking Bugs

In my previous post I covered benchmarking Rubygems… I was working on some follow-up to improve the benchmarks and discovered a bug rendering one of the benchmark comparisons invalid. In this post we will cover what I had missed, how to avoid some gotchas, and ways to improve the readability of your benchmarks.

image from pixabay

Spot The Bug

This is a simplified example from the source code. The bug is hard to notice without knowing the internals of the library, but if you understand Ruby’s Coverage library you might spot it.

namespace :benchmarks do
  desc 'set up coverband with coverage redis'
  task :setup_coverage do
    clone_classifier
    $LOAD_PATH.unshift(File.join(classifier_dir, 'lib'))
    require 'benchmark'
    require 'classifier-reborn'

    Coverband.configure do |config|
      config.root               = Dir.pwd
      config.percentage         = 100.0
      config.logger             = $stdout
      config.collector          = 'coverage'
      config.memory_caching     = ENV['MEMORY_CACHE'] ? true : false
      config.store              = Coverband::Adapters::RedisStore.new(Redis.new)
    end
  end
  
  def run_work
    puts "benchmark for: #{Coverband.configuration.inspect}"
    puts "store: #{Coverband.configuration.store.inspect}"
    Benchmark.bm(15) do |x|
      x.report 'coverband' do
        SAMPLINGS.times do
          Coverband::Collectors::Base.instance.sample do
            work
          end
        end
      end

      x.report 'no coverband' do
        SAMPLINGS.times do
          work
        end
      end
    end
    Coverband::Collectors::Base.instance.stop
    Coverband::Collectors::Base.instance.reset_instance
  end

  desc 'runs benchmarks coverage'
  task run_coverage: :setup_coverage do
    puts 'Coverband Coverage configured with to use default redis store'
    SAMPLINGS = 5
    run_work
  end
end

The issue is that the benchmark is trying to compare running Coverband with Ruby’s Coverage lib against code that doesn’t collect usage data. When Ruby’s Coverage is loaded and started, it changes how ALL other code is loaded and interpreted, which means that once the Coverage library is loaded, both benchmarks run with the performance impact. The only difference is that one is collecting and reporting the coverage to Redis via Coverband. Since the Coverage library affects the Ruby runtime, it is best to run the benchmarks as entirely isolated processes to avoid blending the impacts of one benchmark into the other.

Spot the Bug in the Stats

Even if one doesn’t notice the issue in the code, the output from the benchmarks should make it stand out. In my previous post I noticed that I could no longer detect the performance impact of Coverband, which seemed too good to be true.

                      user     system      total        real
coverband         0.320000   0.010000   0.330000 (  0.322387)
no coverband      0.320000   0.000000   0.320000 (  0.321767)

It also didn’t match real-world Rails benchmarks. At the time of that post, I believed my benchmarks just weren’t sufficient to capture the small impact. Wanting to make my benchmarks easier to read, and able to detect the known performance impact, is what led me to find the bug.

Making Benchmarks More Readable

As I was working to detail the performance impact of Coverband, I decided I should port my benchmarks from the std-lib benchmark to the excellent benchmark-ips gem by @evanphx. The gem produces output in the format below, which, compared to the default format, makes it much easier to understand performance differences; in this case, because of the bug, it made it clear that an expected difference was missing. The improved gem also helped me find better iteration counts to detect changes in my existing and working benchmarks.

Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
           coverband     1.000  i/100ms
        no coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband     14.690  (±27.2%) i/s -    149.000  in  12.045429s
        no coverband     15.112  (±33.1%) i/s -    151.000  in  12.240970s

Comparison:
        no coverband:       15.1 i/s
           coverband:       14.7 i/s - same-ish: difference falls within error

The improved format, along with increased iterations (5 seconds warmup and 12 seconds runtime), made it clear that there was truly ZERO difference between the Coverage benchmarks, which was wrong. This made it very clear that there was a bug in my methodology.

Fixing the Issue

Once I realized Coverage was loaded for both runs, it was pretty clear what was wrong. I needed to do a couple things.

  • The Coverage lib needs to be both required and started before any code that we plan to track (this also wasn’t done in the previous benchmark, so actual coverage data wasn’t being collected correctly.)
  • Since Coverage changes how Ruby actually interprets the code when loaded, I needed to run the two timed tasks independently and then compare the data.

It was pretty easy to port my old benchmark code to benchmark-ips and to clean things up along the way, as well as handle both of the mentioned issues. See the new benchmark code below (simplified here; see GitHub for the full source code details).

namespace :benchmarks do
  desc 'setup standard benchmark'
  task :setup do
    clone_classifier
    $LOAD_PATH.unshift(File.join(classifier_dir, 'lib'))
    require 'benchmark'
    require 'benchmark/ips'

    # NOTE: when we require files is what makes the performance impact of Coverage interesting (moving this require above or below Coverage.start gives detectable vs. not detectable performance impacts)
    require 'classifier-reborn'
    if ENV['COVERAGE']
      puts 'Coverage library loaded and started'
      require 'coverage'
      ::Coverage.start
    end
    require 'redis'
    require 'coverband'
    require File.join(File.dirname(__FILE__), 'dog')
  end

  desc 'set up coverband with coverage Redis'
  task :setup_coverage do
    Coverband.configure do |config|
      config.root               = Dir.pwd
      config.percentage         = 100.0
      config.logger             = $stdout
      config.collector          = 'coverage'
      config.memory_caching     = ENV['MEMORY_CACHE'] ? true : false
      config.store              = Coverband::Adapters::RedisStore.new(Redis.new)
    end
  end
  
  def run_work(hold_work = false)
    suite = GCSuite.new
    #puts "benchmark for: #{Coverband.configuration.inspect}"
    #puts "store: #{Coverband.configuration.store.inspect}"
    Benchmark.ips do |x|
      x.config(:time => 12, :warmup => 5, :suite => suite)
      x.report 'coverband' do
        Coverband::Collectors::Base.instance.sample do
          work
        end
      end
      Coverband::Collectors::Base.instance.stop
      x.report 'no coverband' do
        work
      end
      x.hold! 'temp_results' if hold_work
      x.compare!
    end
    Coverband::Collectors::Base.instance.reset_instance
  end
  
  desc 'runs benchmarks coverage'
  task run_coverage: [:setup, :setup_coverage] do
    puts 'Coverband Coverage configured with to use default Redis store'
    run_work(true)
  end

  desc 'compare Coverband Ruby Coverage with normal Ruby'
  task :compare_coverage do
    puts 'comparing with Coverage loaded and not, this takes some time for output...'
    puts `COVERAGE=true rake benchmarks:run_coverage`
    puts `rake benchmarks:run_coverage`
  end
end

As you can see, to handle the hard part of running the benchmarks independently and comparing them, I used a cool feature of benchmark-ips. Loading the Coverage library or not is now controlled by ENV['COVERAGE'], making it easy to run with or without a change to how Ruby interprets the code.

Using benchmark-ips hold!

As I was starting to look at how to compare data across two runs, I checked to see if the new benchmarking gem I was using could help me. The documentation covered a hold! feature, which seemed like exactly what I needed.

If you are comparing multiple implementations of a piece of code you may want to benchmark them in separate invocations of Ruby so that the measurements are independent of each other. You can do this with the hold! command.
benchmark-ips, independent-benchmarking

Sounds great, but hold! wasn’t detailed very well in the documentation. In fact, another user had created an issue on the project trying to figure out how to properly use hold!. I decided to read through the source code to figure it out and verify that it could do what I needed.

PR with example benchmark-ips hold! Usage

Well, as long as I was taking the time to figure it out, and knowing others have struggled to understand the usage, we might as well try to make it clear to everyone. The project has an examples folder detailing several usages. Adding a small PR with a hold! usage example gave a simple usage scenario, which was a simplified version of the approach I used to benchmark the Ruby Coverage library’s performance impact in Coverband.
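A minimal hold! usage, in the spirit of that example, looks something like the sketch below (the file name and reports are arbitrary): run the script once to measure and hold the first report, then run it again to measure the second and print the comparison.

# hold_sketch.rb -- minimal hold! sketch; see the benchmark-ips examples folder
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('to_s')  { 1_000.times { |i| i.to_s } }
  x.report('no-op') { 1_000.times { |i| i } }

  # Results are persisted to this file between invocations, so each report can
  # be measured in its own Ruby process.
  x.hold! 'held_results'
  x.compare!
end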

Updated Benchmark Conclusions

After fixing the bug, benchmarking Coverage is still very interesting. Even small changes in when Coverage is started and which files are required can have a massive impact on the overall performance. Take a look at the diff below, which shows moving a single line, require 'classifier-reborn', to be above or below where we require and start Coverage, which means the library’s code is either included in or excluded from the Coverage collection.

-    require 'classifier-reborn'
     if ENV['COVERAGE']
       puts 'Coverage library loaded and started'
       require 'coverage'
       ::Coverage.start
     end
+    require 'classifier-reborn'
     require 'redis'
     require 'coverband'
     require File.join(File.dirname(__FILE__), 'dog')

The first benchmark shows when classifier-reborn is required prior to setting up Coverage, and therefore its data is excluded from the results.

rake benchmarks:compare_coverage
comparing with Coverage loaded and not, this takes some time for output...
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
           coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband     11.950  (±16.7%) i/s -    140.000  in  12.041935s

Pausing here -- run Ruby again to measure the next benchmark...
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
        no coverband     1.000  i/100ms
Calculating -------------------------------------
        no coverband     14.497  (±20.7%) i/s -    161.000  in  12.020487s

Comparison:
        no coverband:       14.5 i/s
           coverband:       12.0 i/s - same-ish: difference falls within error

In this second benchmark we ensure the Coverage library is loaded and started prior to requiring classifier-reborn, which means we are capturing all the usage of that library code.

rake benchmarks:compare_coverage
comparing with Coverage loaded and not, this takes some time for output...
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
           coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband      9.117  (±11.0%) i/s -    109.000  in  12.108073s

Pausing here -- run Ruby again to measure the next benchmark...
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
        no coverband     1.000  i/100ms
Calculating -------------------------------------
        no coverband     14.184  (±14.1%) i/s -    164.000  in  12.017139s

Comparison:
        no coverband:       14.2 i/s
           coverband:        9.1 i/s - 1.56x  slower

In these examples the only difference is that the library classifier-reborn is either included in the recorded coverage or ignored by the Coverage library, based on when we require it. The improved benchmark code:

  • is easier to read than it was previously (both the code and the benchmark output)
  • clearly captures and allows for easy performance testing of Coverband using Coverage
  • makes it easy to see how much of a win Coverage is over the older TracePoint API
    • Coverage shows either no difference or 1.56x slower (depending on the coverage scope)
    • TracePoint shows around 3.95x slower

Full Benchmark Output

Here is the full benchmark comparison output across all benchmarked configurations of Coverband.

Coverband tracepoint configured with file store
Warming up --------------------------------------
           coverband     1.000  i/100ms
        no coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband      3.348  (±29.9%) i/s -     39.000  in  12.095922s
        no coverband     13.921  (±21.6%) i/s -    160.000  in  12.048469s

Comparison:
        no coverband:       13.9 i/s
           coverband:        3.3 i/s - 4.16x  slower

Coverband tracepoint configured with default Redis store
Warming up --------------------------------------
           coverband     1.000  i/100ms
        no coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband      3.726  (± 0.0%) i/s -     45.000  in  12.286824s
        no coverband     14.736  (±27.1%) i/s -    150.000  in  12.059519s

Comparison:
        no coverband:       14.7 i/s
           coverband:        3.7 i/s - 3.95x  slower

comparing with Coverage loaded and not, this takes some time for output...
Coverage library loaded and started
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
           coverband     1.000  i/100ms
Calculating -------------------------------------
           coverband     12.173  (±16.4%) i/s -    143.000  in  12.017026s

Pausing here -- run Ruby again to measure the next benchmark...
Coverband Coverage configured with to use default Redis store
Warming up --------------------------------------
        no coverband     1.000  i/100ms
Calculating -------------------------------------
        no coverband     15.045  (±13.3%) i/s -    177.000  in  12.024386s

Comparison:
        no coverband:       15.0 i/s
           coverband:       12.2 i/s - same-ish: difference falls within error

Driving Performance Impact Improvements Based on Learnings

Now with an ability to detect performance differences when running Coverband with Ruby’s Coverage library, I can build features and benchmarks to show how to reduce the performance impact on real world use cases.

For example, in general with Coverband the goal is to show the coverage of application code, ignoring framework and gem code. Knowing that which files are loaded, and when, relative to Coverage starting can impact performance, I can build a feature that has Coverband configure the Coverage library so that ONLY the application code is being tracked, which will help ensure the lowest possible performance impact.

I will be working on this feature (Coverband Safelist) and new benchmarks to show how much application developers could reduce the performance burden on standard Rails apps by using a Safelist, as opposed to letting various Rails framework code be tracked alongside their application code, which the current version of Coverband requires given how Rails loads code. Look forward to another post when that feature and benchmarks are ready.
