Dan Mayer
10 April 2022

Redis & Sidekiq

Redis & Sidekiq

A collection of notes about working with Sidekiq and Redis. A previous post about Ruby & Redis briefly touched on some of this, but I will get into more specifics in this post.

Redis and Background Jobs

A common usage of Redis for Rubyists is background jobs. Two popular libraries for jobs are Sidekiq and Resque. At this point, I highly recommend Sidekiq over Resque, as it is more actively maintained and has more community support around it. I am not going to get into too many specifics of Sidekiq and Resque, but will talk a bit more about how they use Redis. There are always some gotchas when working with Redis; ask around and someone will have a story about an incident caused by a keys, flushall, or flushdb command. Some of these commands are destructive, which is always something to be careful with, but they also have very slow performance characteristics. It is also worth noting how some of the calls in Resque and Sidekiq scale with queue depth, which is critical to understand.

An Incident Caused by our Sidekiq/Redis Observability

UPDATE: We no longer think the line below was the culprit; we observe latency growth and decline with queue size, but we are unsure of the cause and unable to reproduce. As seen in the NOTE below, the Sidekiq latency call is O(1+1) and therefore fast and predictable.

We got into trouble when moving from Resque to Sidekiq because our observability instrumentation was frequently making an O(S+N) call (Sidekiq’s queue latency). It wasn’t much of a problem until we had one of our common traffic spikes, which results in a briefly deep queue. Our previous Resque code had similar instrumentation being sent to Datadog and didn’t have any issues. While our Sidekiq code had been live for days, this behavior, where our processing speed decreased with queue depth, hadn’t been observed or noticed. The problem came to light when, on a weekend (of course), a small spike caused a background job backlog, an expected and common case for us. The latency went way up due to our instrumentation, and we started processing jobs slower than we enqueued them. This fairly quickly filled our entire Redis, leading to OOM errors.

Redis Sidekiq Analytics

Analytics Monitoring our Recovery

These charts are from after the incident. We moved to a new Redis to get things back up and running during the incident, and after things were under control we worked on draining the old, full Redis in an isolated way that couldn’t impact production load. In this graph, you can see that as we reduce the queue size, the latency of our Redis calls also reduces in step. I included CPU to show how hard we were taxing our Redis. This chart isn’t 1:1 as we were adding and removing workers and making some other tweaks, but the queue size -> latency relationship is a direct correlation.

Code Culprit

NOTE: Update: Mike responded that he doesn’t think the latency call is the issue, so we are investigating further.

UPDATE: We no longer think the line below was the culprit; we observe latency growth and decline with queue size, but we are unsure of the cause and unable to reproduce. As seen in the NOTE above, the Sidekiq latency call is O(1+1) and therefore fast and predictable.

As mentioned, it wasn’t any of our normal code that was really the problem; it was this line, part of our instrumentation and observability tooling: Sidekiq::Queue.new(queue_name).latency. As with any incident, there were a ton of other related factors, but it is worth noting that this seemingly simple line could have some hidden gotchas or an outsized impact on your Redis performance. We believed that latency call scaled linearly with queue size, since it calls Redis’s LRANGE under the hood, which is an O(S+N) operation.
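Whatever the true root cause, one way to limit the blast radius of instrumentation like this is to rate-limit how often any expensive Redis-touching metric call runs. A minimal sketch, with hypothetical names (nothing here is from our actual incident code):

```ruby
# Hypothetical guard to cap how often an expensive observability call runs.
class MetricThrottle
  def initialize(interval_seconds)
    @interval = interval_seconds
    @last_emitted_at = nil
  end

  # Returns true at most once per interval; callers skip the expensive
  # call (like a queue latency check) the rest of the time.
  def allow?(now = Time.now)
    return false if @last_emitted_at && (now - @last_emitted_at) < @interval
    @last_emitted_at = now
    true
  end
end

throttle = MetricThrottle.new(60)
# report_queue_latency if throttle.allow?
```

Sampling the metric once a minute instead of on every instrumentation tick would have dramatically reduced the load we put on Redis during the spike.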

Sidekiq / Redis Performance

A colleague, @samsm, helped dig into this incident by putting together the queue size -> latency charts above as well as all the helpful tables I am sharing below. These show how Sidekiq calls translate into their underlying Redis operations and what those operations cost.

Redis / Sidekiq Math

Doing the math on various Sidekiq operations: how much will they impact Redis?

Big O Complexity Notation 101

Big O Notation

Redis Big O

Sidekiq Operation Complexity

Redis Sidekiq Mapping

Additional Sidekiq / Redis Reading

Some additional reading if you want to dig in further on working with Sidekiq and Redis.


Dan Mayer
04 April 2022


I am starting a series of self-reference materials on staples of web application delivery: a place to collect and document my learnings and understanding of various technologies I often reach for when designing systems. The first in this series was about Ruby and Redis, while this one will focus on Content Delivery Networks. These reference pages will be updated over time and evolve as my usage or focus changes.

Content Delivery Network (CDN)

A Content Delivery Network can help serve high-traffic or high-performance websites, as well as offer a number of other features.

A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet[1][2] as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today… – wikipedia

Minimum of what you want your CDN to be doing for you

  • Smart routing: last-mile network distribution
  • Speedy established TLS: speed up your TLS handshakes
  • DDoS Protection: Cached pages are already somewhat protected, but many CDNs offer DDoS protection (you can also do this at your load balancer layer)
  • Serving assets: handling serving assets to avoid having static file load hit dynamic servers
  • Caching: at least assets, but even better for HTML / API content
  • Compression: gzip and brotli

CDN Setups

Often you might want different configurations and settings for different purposes and uses of CDNs. For example, it is fairly common to have an asset CDN with long-lived caches for assets, while you might want applications to be able to specify fine-grained caching headers for HTML and API content.
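At the application layer, this split usually comes down to the Cache-Control values you send for each content type. A small illustrative helper (the function name and defaults are my own, not from any framework):

```ruby
# Hypothetical helper for building Cache-Control header values.
# Assets get long-lived caching; HTML/API responses get short, fine-grained
# directives the CDN can honor per-page.
def cache_control(max_age:, shared: true, stale_while_revalidate: nil)
  parts = [shared ? "public" : "private", "max-age=#{max_age}"]
  parts << "stale-while-revalidate=#{stale_while_revalidate}" if stale_while_revalidate
  parts.join(", ")
end

cache_control(max_age: 31_536_000)                     # long-lived, for fingerprinted assets
cache_control(max_age: 60, stale_while_revalidate: 30) # fine-grained, for HTML/API content
```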

Why You Should have A CDN

It can help even if you aren’t quite ready to fully leverage it. CDNs have a lot of hidden advantages, and with some early setup they can very quickly be utilized to handle insane amounts of traffic, such as handling the load of your largest advertising campaigns for the initial experience, even if all those new signups have to be pushed into a queue to be handled when you can. As this tweet points out, even after folks think of all the obvious reasons to have CDNs, there are often many other clever ways to use them.

A Start Up Story

My first startup got taken down by a viral link, and by the time we got page caching working on our Rails server, a few hours had passed and we had lost the bump… For that first bit of traction, a CDN could have made all the difference. The content would have been easy to protect, with the single viral page behind the CDN and appropriate cache headers. That startup never succeeded, but early on much time was wasted trying to improve app-layer caching and performance where a CDN could have been a big help.

CDN Gotchas

While CDNs are great, nothing is free, and every abstraction adds some complexity to your system. Understanding the value it can provide, as well as some of the gotchas, can help your team decide if it is the right decision for the system.

  • Accidentally caching private pages/data!
    • caches including things like a Set-Cookie header (sessions, return_to, etc.) that should be user-specific
  • CSRF: many traditional protections, like the built-in Rails CSRF protection, don’t work well with cached pages
  • Difficulties with various security implementations, like content-security-policy nonce implementations
  • Having different rules for cache keys and what information is sent to the origin
    • you may only cache on specific cookies and headers… but you might want your origin to receive all headers to help with debugging or other info; this is an additional mental load when understanding a request-response cycle

Understand the reliability risk and how to mitigate it

I have written previously about request depth and availability. A CDN is an additional layer in the request depth of your application stack… It also only has a 99.9% AWS SLA; as the entry point to your systems, that is the upper limit on your overall SLA, pretending for a moment everything else was 100% reliable. Now, AWS and most CDNs actually have far better real-world uptime and success rates during normal operations, but all of the major CDNs have had notable major outages, including CloudFront having a major outage the day before Thanksgiving. There are a few protective measures a team can take if it really needs to ensure higher availability of their site; neither suggestion is cheap.

  • A team can implement and support High Availability Origin Failover as a protective layer within CloudFront, to protect against origin-level failures.
  • A team can implement multi-CDN DNS failover. This can be done in an automated or manual fashion depending on complexity and cost concerns.

CDNs vs Load Balancers

These days CDNs can do a lot of what load balancers do, routing different types of requests to different back-end “origin” servers, and offering geo-targeted routing to support locales and the nearest servers… While this overlaps with load balancers, I generally end up with a configuration where I have both a CDN and load balancers in place. My setup often looks like so: web browser -> CDN -> ALB -> Application Servers.

CDNs Enhancing Request Payloads

As part of the ability to route different requests, CDNs now often handle basic IP address geolocation, allowing one to route requests to different origins based on country, language, city, postal code, or more. Even if your application doesn’t use different origins, the geolocation information is often extremely useful. You can avoid additional network calls or 3rd-party integrations by leveraging the CDN’s built-in geolocation support. For example, AWS CloudFront can provide geolocation info on all requests to the CDN, both to impact caching logic and as header hints to your application servers. It is helpful to use this data to detect if a user might be accessing your site on the wrong domain given the country they are in. Example additional header data:

CloudFront-Viewer-Country-Name: United States
CloudFront-Viewer-Country-Region: MI
CloudFront-Viewer-Country-Region-Name: Michigan
CloudFront-Viewer-City: Ann Arbor
CloudFront-Viewer-Postal-Code: 48105
CloudFront-Viewer-Time-Zone: America/Detroit
CloudFront-Viewer-Latitude: 42.30680
CloudFront-Viewer-Longitude: -83.70590
CloudFront-Viewer-Metro-Code: 505

Note: inside a Ruby app, Rails/Rack will reformat the headers, adding an HTTP prefix and upcasing, so you could access this data in a typical Rails app like so: request.headers["HTTP_CLOUDFRONT_VIEWER_CITY"].
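Pulling a few of those headers together, a small sketch of a helper that reads the Rack-formatted keys out of a headers hash (the helper and its name are illustrative; the header names are the real CloudFront ones):

```ruby
# Hypothetical helper: extract CloudFront viewer geo data from a
# Rack-style headers hash (keys upcased with an HTTP_ prefix).
def cloudfront_geo(headers)
  {
    country:   headers["HTTP_CLOUDFRONT_VIEWER_COUNTRY_NAME"],
    region:    headers["HTTP_CLOUDFRONT_VIEWER_COUNTRY_REGION_NAME"],
    city:      headers["HTTP_CLOUDFRONT_VIEWER_CITY"],
    time_zone: headers["HTTP_CLOUDFRONT_VIEWER_TIME_ZONE"]
  }
end

headers = {
  "HTTP_CLOUDFRONT_VIEWER_COUNTRY_NAME" => "United States",
  "HTTP_CLOUDFRONT_VIEWER_CITY"         => "Ann Arbor"
}
cloudfront_geo(headers)[:city] # => "Ann Arbor"
```

In a Rails controller you could pass request.headers to something like this; any header CloudFront wasn’t configured to forward simply comes back as nil.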

CDNs Are Evolving

This post covers more traditional CDNs and features common consumer-facing sites should consider leveraging. CDNs are now moving into the realm of cloud infrastructure, with Lambda@Edge and other CDN offerings allowing code (most commonly JavaScript) to be deployed to the CDN network. The features and options available when you have a fully supported runtime at the edge open up new architecture and application stack options. I am looking at FaaS and edge-deployed code, but haven’t leveraged it in any significant production environment yet. It is definitely a space to keep an eye on that promises to simplify global app distribution while maintaining extremely performant consumer experiences.

Additional CDN Links


Dan Mayer
26 March 2022

Ruby & Redis

Ruby & Redis

A collection of notes and some tips about using Redis.

Redis Setup

Redis is super easy to set up, and in dev mode it often just works right out of the box, but as you leverage and scale it in production, you might want to think more about its setup beyond just setting a default REDIS_URL ENV var. Often a basic Redis for a simple product is just set up like so…

Redis.current = Redis.new(url: ENV['REDIS_URL'])

This has some issues: Redis.current is deprecated, and with no timeout or reconnect settings, a slow or flaky Redis can hang your app.

A better setup adding in configurable options:

redis = Redis.new(
  url: ENV['REDIS_URL'],
  timeout: ENV.fetch("REDIS_TIMEOUT", 1).to_f,
  reconnect_attempts: ENV.fetch("REDIS_RECONNECT_ATTEMPTS", 3).to_i,
  reconnect_delay: ENV.fetch("REDIS_RECONNECT_DELAY", 0.5).to_f,
  reconnect_delay_max: ENV.fetch("REDIS_RECONNECT_DELAY_MAX", 5).to_f
)

If you want to configure Redis and use it across threads, using a Redis connection pool is recommended.

pool_size = ENV.fetch("RAILS_MAX_THREADS", 10).to_i
redis_pool = ConnectionPool.new(size: pool_size) do
  Redis.new(
    url: ENV['REDIS_URL'],
    timeout: ENV.fetch("REDIS_TIMEOUT", 1).to_f,
    reconnect_attempts: ENV.fetch("REDIS_RECONNECT_ATTEMPTS", 3).to_i,
    reconnect_delay: ENV.fetch("REDIS_RECONNECT_DELAY", 0.5).to_f,
    reconnect_delay_max: ENV.fetch("REDIS_RECONNECT_DELAY_MAX", 5).to_f
  )
end

Although this means that when using it, you need to grab a pool connection first:

# original style, which is deprecated and would block across threads
Redis.current.set("some_key", "some_value")

# utilizing a pool
redis_pool.with do |conn|
  conn.set("some_key", "some_value")
end

thx @ericactripp, for sharing the link about connection pools

Redis in Common Libraries

All the above helps when you are working with Redis directly, but often we are configuring common libraries with Redis. How many of them are able to leverage the same kinds of benefits, like a connection pool?

# common config that won't leverage a Redis connection pool
config.cache_store = :redis_cache_store, {
  url: ENV["REDIS_URL"]
}

# by setting the pool size and timeout, you can leverage a connection pool with your Redis
config.cache_store = :redis_cache_store, {
  url: ENV["REDIS_URL"],
  pool_size: 8,
  pool_timeout: 6
}

Investigate Combining Redis Calls

If you have an app that is making many sequential Redis calls, there is a good chance you could make a significant improvement by leveraging Redis pipelining or MGET. I think the Flipper codebase is a great way to learn and see various Ruby techniques; it is high quality and widely adopted, so you can trust it has been put through its paces. If you want to dig into combining calls, read about the differences between pipelining and MGET in terms of code and latency.
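The win comes from round trips: sequential calls pay one network round trip each, while a pipelined batch or a single MGET pays roughly one. A back-of-envelope model (the 1.0ms RTT is an assumed illustrative number, and server-side processing time is ignored):

```ruby
# Rough latency model: sequential calls cost one round trip per key,
# a pipelined/MGET batch costs roughly one round trip total.
def total_latency_ms(num_keys, rtt_ms:, batched:)
  batched ? rtt_ms : num_keys * rtt_ms
end

total_latency_ms(100, rtt_ms: 1.0, batched: false) # => 100.0 (100 GETs, one at a time)
total_latency_ms(100, rtt_ms: 1.0, batched: true)  # => 1.0   (one pipelined/MGET call)
```

Even with a fast Redis, a loop of 100 GETs spends almost all its time on the network, which is exactly what batching eliminates.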

Pipeline Syntax Change

As long as we are updating some of our calls, it is worth being aware of another deprecation: “Pipelining commands on a Redis instance is deprecated and will be removed in Redis 5.0.0.”

redis.pipelined do
  redis.get("key")
end

# should be replaced by

redis.pipelined do |pipeline|
  pipeline.get("key")
end

Redis Usage Beyond The Basics

If you are looking to do a bit more than the default Rails.cache capabilities with Redis, you will find it supports a lot of powerful features and, along with pipelining, can be extremely performant. If you are pushing for all the performance you can get, set up hiredis-rb as the connection for redis-rb; it uses C extensions to be as fast as possible. This post goes into some details on where direct caching with Redis can provide more powerful capabilities than using Rails.cache.

Redis CLI

A few useful tips around using Redis and understanding how your application is using Redis.

  • brew install redis: install redis via homebrew
  • brew uninstall redis: uninstall redis via homebrew
  • brew info redis: Get info on currently installed redis
  • redis-cli ping: Check if redis service is running
  • redis-cli monitor: stream every command the server receives (use with care, it adds load in production)
  • redis-cli slowlog get 100: show the 100 most recent slow commands

Exciting Things Are Happening with Ruby Redis

A good deal of things will be changing in redis-rb 5.0; we mentioned the Redis.current and redis.pipelined changes above. These changes and others support a move to a simpler and faster redis-client under the hood.

A move to simplify the redis-rb codebase and drop a mutex looks like it will roll out redis-client, which can significantly speed up some use cases. It looks like Sidekiq, for example, will move to this in the near future.

Update: Looks like that perf win was a bit too good to be true.


Dan Mayer
18 January 2022

Micro-Service Request Depth Availability

Micro-Service Request Depth Availability

In systems that use micro-services, the growth and interaction of the services often evolve organically over time. While this enables teams to move quickly and integrate whatever they need, it leads to some known bad patterns in micro-service interactions that have serious impacts on availability. This post explains two micro-service integration anti-patterns, calling them “The Mesh” (I prefer Distributed Monolith) and “Services in Depth”. In both, the issue is that a single request into your system can fan out to many individual services, both in breadth and depth. Most micro-service systems I have seen have a mix of fan-outs in breadth and deep lines of service calls.

Understand the Implications of Deep Service Call Depth

Let’s consider service depth, as it is the simpler version of the problem to reason about. When a team is investing in micro-services, some breadth and depth of calls is to be expected, but it is important to understand what that means and to consider the impacts when designing the system. Below we will assume that each application has an aggregate request availability of 99.9%.

What does the request success rate look like for a request with 6 micro-service call depth?

> (0.999) * 100
=> 99.9
> (0.999 * 0.999) * 100
=> 99.8001
> (0.999 * 0.999 * 0.999) * 100
=> 99.7002999
> (0.999 * 0.999 * 0.999 * 0.999) * 100
=> 99.6005996001
> (0.999 * 0.999 * 0.999 * 0.999 * 0.999) * 100
=> 99.5009990004999
> (0.999 * 0.999 * 0.999 * 0.999 * 0.999 * 0.999) * 100
=> 99.4014980014994

Assuming each service has an aggregate availability of 99.9%, a service call depth of 6 has a request availability of 99.4%.

This is likely a lot lower than teams expect. Also, depending on your infrastructure, it is often a lot easier to stack up six network calls than you may think. While I am not covering the impact on latency in this post, understand that a deep request call stack will also have a large negative impact on latency.
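The console math above generalizes to a one-line helper: compound availability is just the product of each hop’s success rate.

```ruby
# Compound availability across sequential service calls: multiply each
# hop's success rate. Mirrors the console math above.
def compound_availability(per_service_slas)
  per_service_slas.reduce(1.0) { |total, sla| total * sla }
end

(compound_availability([0.999] * 6) * 100).round(4) # => 99.4015
```

Mixed SLAs work the same way; pass the individual rates, e.g. compound_availability([0.999, 0.9999, 0.9995]).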

Visualizing The Request Failure Rate

A nice way to think about the combined success rate is to think of each network hop as having a small opportunity for failure. These failure threads peel off as requests navigate the micro-service call stack. As the complexity of the network communications increases and the call depth deepens, the likelihood of failure increases as well. This includes things like your load balancer, DBs, app servers, and application caches.

Each network hop is an opportunity for failure, in the above showing 7 failure opportunities

Request Depth Failure Trends

Another way to visualize this is just a simple bar chart showing a decline of expected availability as service depth grows.

Micro-Service Request Availability Calculator

The below calculator will let you quickly estimate your theoretical availability based on the estimated SLA across multiple service calls. Consider each part of your infrastructure (CDNs, load balancers, DBs, Caches) as well as the total services involved in a successful response to a request.

Micro-Service Call Depth Availability Calculator


So far we have been talking about micro-services and their availability; we should also consider the services often used to host our application code in the cloud. A very popular cloud hosting service is AWS, which publishes all AWS SLAs. Let’s look at this from the perspective of a typical AWS application: assuming a single app stack (no micro-services), a pretty standard setup, and app code with a runtime SLA of 99.9%, the combined math leaves a total theoretical max request success expectation of 99.6%. If you are building something that needs extremely high reliability, are you able to keep your availability promises?

You can see as you stack AWS services, regardless of your application stability, request success percentage decreases…

Service          SLA %    Success Math                                        Request % Success
CloudFront       99.9     0.999                                               99.9%
ALB              99.99    0.999 * 0.9999                                      99.89%
ECS              99.99    0.999 * 0.9999 * 0.9999                             99.88%
Custom App       99.9     0.999 * 0.9999 * 0.9999 * 0.999                     99.78%
Elasticache      99.9     0.999 * 0.9999 * 0.9999 * 0.999 * 0.999             99.68%
RDS (Postgres)   99.95    0.999 * 0.9999 * 0.9999 * 0.999 * 0.999 * 0.9995    99.63%

Mitigations / Considerations

When you realize, as the system scales and grows, how many total micro-service dependencies a typical request into your system may have, it is worth considering some mitigation strategies. As opposed to pushing towards Five Nines: Chasing The Dream?, embrace failure and resilience, and find an acceptable and achievable level of availability for your service. Then invest in mitigation techniques and strategies to deliver a reliable client experience on an unreliable internet. A few examples of mitigations are listed below.

  • Client Side Retries: A good reason to have client-side retries and avoid implementing retries at all levels of the infrastructure (some other special cases may make sense to avoid full round trips). See Google’s SRE book, sections Client-Side Throttling and Deciding to Retry from the Handling Overload chapter.

  • Be Wary of Circular Graphs: Detect circular call graphs; even if these can technically be supported in your infrastructure, it may be best to avoid them as a way to force folks to think through more robust and scalable solutions.

  • Avoid Duplicate Service Calls: This happens when one service holds a very common piece of data; before you know it, all your micro-services call this in-demand service. You might have an initial request fan out to 3 micro-services that all call this common data service under the hood. This often happens for something like user data.
    • Consider common data and look at data forwarding, which can, early in request processing, add metadata that is sent along with all upstream requests, avoiding each upstream service making individual network requests for the data.
  • Agree on Constraints: Consider alerting on requests that exceed an agreed maximum service call depth or circular call graphs.
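For the client-side retries mentioned above, a common building block is a capped exponential backoff schedule. A hypothetical sketch (names and numbers are illustrative; real implementations should also add random jitter so clients don’t retry in lockstep, left out here to keep the sketch deterministic):

```ruby
# Hypothetical client-side retry schedule: capped exponential backoff.
# Each attempt doubles the wait, up to a ceiling, so a struggling
# downstream service isn't hammered by synchronized retries.
def backoff_delays(attempts, base_seconds: 0.1, cap_seconds: 5.0)
  (0...attempts).map { |i| [base_seconds * (2**i), cap_seconds].min }
end

backoff_delays(4) # => [0.1, 0.2, 0.4, 0.8]
```

Pairing a schedule like this with a retry budget (as described in the Google SRE book chapter referenced above) keeps retries from amplifying an outage.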


There are a lot of benefits to microservices, but I feel like the expectations around reliability and latency are often overlooked when folks move from a larger shared codebase and adopt them. Companies looking for faster deployments and independently moving teams do not fully grasp that they have slowly turned every method call or DB join into a remote network request, with all the failure and performance characteristics that come with it.


Dan Mayer
08 January 2022

Revisiting Front-Ends

Revisiting Front-Ends

I have been a ‘full-stack’ developer for a long time. These days, depending on where you work and how your org works, being full-stack isn’t really viable anymore. Given the growing complexity of both front-end and back-end systems, it is more and more necessary to specialize. That being said, I feel there are good reasons to understand and think across the front-end boundary. For example, if you care about user-perceived performance, how you design backend APIs and deploy front-ends can have a massive impact: fully supporting and leveraging CDNs, pre-fetching, cached API queries, and more. Anyway, as my front-end skills fell further behind and some exciting changes were occurring in the front-end world, it was a good time to spend a little while refreshing my knowledge and sharpening my tools.

Where I started

I decided to look at a couple of different projects and ways to explore the space. This hasn’t brought me fully up to speed with the amazing front-end folks I work with, but I have learned a lot and enjoyed working in a bit more visual space.

  • I worked on a side project, modernizing its look and design and updating its Semantic-UI/Fomantic-UI designs.
  • I picked up Modern Front-End Development for Rails and worked through some exercises.
  • I dug into CDNs, caching, and compression options (Brotli)… modernizing our setup at work
  • I helped automate and setup lighthouse tracking on projects at work, and fixed some low hanging fruit
  • I played around with hotwire
  • I picked up Tailwind CSS for a few toy projects
  • I converted my blog to Tailwind CSS from an old customized Twitter bootstrap theme
  • I started working on some visualizations in D3.js, diagramming network traffic, failure rates, and org/system structures.
  • I built a presentation as a Tailwind / D3 microsite vs a slide show

Some of the next things coming up?

  • I will likely convert over the Semantic-UI/Fomantic-UI site to Tailwind
  • I am helping support our frontend team on efforts to decouple our Front End deployment from Rails
  • I will be digging into our custom React design system a bit more at work and porting over a few pages to it
  • Adding some more visualizations directly to this blog


After trying out a few frameworks, including a bit of a deep dive on Semantic-UI/Fomantic-UI, I wasn’t satisfied, and the growing buzz around Tailwind pulled me in. I still have a lot to learn and a ways to go, but it better matches my needs/desires for front-end support than anything has in a long time. As I played with Tailwind, I needed a few projects to drive a bit more real-world usage.

Converting the Blog to Tailwind

This blog you are reading moved from Twitter Bootstrap; it now supports PurgeCSS for the various pages and has Tailwind layouts, templates, and dev support. Nothing too complicated, but I feel it looks much better than it did: simpler header, more readable fonts / white space, dropped most of the sidebar, etc. I had to reformat some of my Markdown and post tags, so I wrote a conversion script to reformat my post history.

Bootstrap vs Tailwind

Tailwind Learning Sources

There are a lot of great resources out there and I wanted to share a few. I have also been using Tailwind on some test Rails projects, so some of the links are more Rails -> Tailwind specific.

Other Learning Projects

Other than the blog, a few other examples from my recent front-end exploration.

Coverband Semantic

Coverband Web built in Semantic-UI

D3 Network request flow chart

Org / System Relationships

Presentation breaking down org charts, team / system relationships, and network request flows (D3 and Tailwind)

D3 SLA Chart

D3 Request Max SLA Calculator (interactive visualization)

Final Thoughts

It is good to revisit and resharpen skills in an area even if you aren’t planning to be an expert. While I don’t really do full-stack work in my daily workflow anymore, I am often heavily influencing our system design and architecture as it relates to microservices, mobile, and the future of our front-ends. I want to ensure I am still looking closely enough to know what questions to ask and understand when folks are sharing ideas and concerns… I want and need to know the landscape, so to speak, including some of the toolchains, pain points, and benefits over older styles of development. A quick bit of focused exploratory work can help one stay fresh while also not slowing down or getting in the way of the experts doing the real front-end work where I work. I am able to be a better and more capable partner in discussions and designs.

As I continue learning more about Tailwind and Visualization tools, please share any good links with me.


Dan Mayer
04 January 2022

Principal / Staff Engineer Resources

Principal / Staff Engineer Resources

A friend who was recently promoted to Staff Engineer and wanted to learn a bit more about the role asked me if I had any good resources. I put together a small set of resources that could be helpful when aspiring or transitioning to Principal or Staff level engineering roles, and figured it would be good to share with others who might be interested. I broke the resources into a few different categories, so folks can dig in where relevant. It is worth noting that higher-level technical roles have a lot of variance between companies. Folks will find that they can often shape the role or find a company that defines the role in a way that interests them. Don’t feel like you need to fit into a tightly defined box; as long as you are providing high-level value, there is a place for you to grow and apply your technical experience. You will hear of staff engineers who never code, while others with the same title still consider software development a core part of their job responsibilities. Whichever way you want to grow and increase the value of your work, there is always more to learn as you expand your technical software career and look to have a broader impact on software development.

Staff and Leadership Resources

A few things actually focus on this narrow niche of technology roles. Overall, these are all excellent resources that I highly recommend. This list includes sources that continually create and add new content.

Staff and Leadership Articles

There are many good articles as well, but obviously these are more one-offs.

Keeping Up with Tech in General

As you are shaping technology decisions that can have impacts for years, plan to keep up with some of what is happening in the industry in general. Find sources that relate to your field, attend conferences, read newsletters, or listen to podcasts. Figure out the best low-effort way for you to keep your ear to the ground. A few resources I have enjoyed:

Distributed Team Resources

My friend and I both work on remote-first teams. A few resources around remote work, and how to lead tech outside of centralized teams.

  • First, if you are working remote long term, it is worth getting a good remote audio/video setup
  • I don’t think top-down leadership scales particularly well for software, and I think it is even less effective when distributed. Scaling Architecture Conversationally sets out a number of approaches to encourage a more team-sourced architecture with guidance.
  • Hashicorp: Distributed, Async, and Document Driven - learn from how Hashicorp builds distributed software
  • Remote: while this company seemed to go off the Rails (pun intended) and lose a lot of the goodwill they had in the remote community, much of the advice can still be useful.


Although there are lots of great posts, talks, and threads about technology leadership, sometimes nothing can give the big picture and depth of a book. The shortlist of books below stands out in my mind as worth the time.

  • Accelerate - If you want to shape how a software org functions, ensure you are basing it on research-backed successes. This book summarizes what actually works best, based on research.
  • The First 90 Days - While I strongly disagree with a few things in this book, there is enough actionable advice to help you have an impact, as well as to understand what motivates other new leaders, that it is worth reading.
  • The Manager’s Path - Even if you are looking to stay on the Staff / technical track many parts of this book will help you grow with parts of your job.
  • Working Backwards - Another book where I don’t agree with or like all the recommendations, but it has enough practical and actionable advice on how Amazon scaled and managed to stay agile through massive growth that it is worth reading.
  • Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems - Even if you aren’t handling big data (yet), this book will let you understand when and why to reach for distributed systems and tools.
  • Domain-Driven Design - One of the most important things while scaling, and one of the hardest to fix later, is bad domain modeling; deeply learn about building good data models.

Ruby Specific

I am mostly operating in the Ruby world, and my friend is also at a Ruby-based company, so some Ruby-specific resources. As you grow as a tech leader in some sense you will care less about a specific technology, but there are reasons to keep an ear to the ground for the technologies your business has invested so much in. As I have grown my career, I have found it valuable to stay up to date with things going on in the community. If your company is invested in Ruby it means your hiring, training, operations, and infrastructure costs are all related to Ruby, ensure you are using it at its best.

Enjoy the Role

Good luck with the new role and growth. As always, there is lots to learn, but there is a growing community out there in which to find friends and mentors and talk about what you want out of your career. The highly defined ladder is changing, and roles are more malleable as we move to more hybrid and distributed ways of working. Feel free to explore and help shape the ways folks can lead in technology; it doesn’t have to be a path to being a manager, director, or VP in all cases anymore. It is an exciting time to be a technical learner and leader.

If you have any good articles, sites, or books please share them with me, as I am always looking to learn more about what others in this area are doing.