How to Troubleshoot a Problem

Building computers early on in life was an invaluable experience. Fighting with motherboards & hunting for drivers taught me more about software development (& customer support) than it did about hardware. I've been trying to summarize those lessons into an easy framework for debugging, troubleshooting and fighting most problems. Below is what I have so far. Hopefully it's useful in some way.

1) Don't Panic - if this sounds familiar, congrats, you're a geek. You have to be calm & stop freaking out to be of any use. Someone going ape shit in your face is the fastest way to stop your brain from working. So take a moment, read Zen Habits and just chill for a second.

2) Identify the Problem - before doing anything, ANYTHING, you must figure out what's going wrong. Is the page not displaying? Is an error appearing? What's in the stack trace? What is happening that shouldn't be? About 3 times out of 10 you will stop here because nothing is wrong at all. Someone was just going crazy. If it's not one of those times, be thankful that you figured out what is going on & are closer to fixing it.

3) Stop the Bleeding - if your server is not responding, your website is down or a database is just hung, stop thinking about why it's happening & get it back up & running! Seriously. Stop reading! Go do it! It's easier to explain what went wrong after things are working than to explain why the Apocalypse is inevitably happening RIGHT NOW.

4) Document the Symptoms - ok, you don't actually have to write anything down, but at least understand what is going on. Get out a whiteboard. List it out in front of your face. You may recognize something.

5) Diagnose the Disease with Peers - come up with some kind of diagnosis and run it by your buddies and/or co-workers. Start talking about it. Someone may say something that triggers a memory that helps you identify the root problem. Or someone may already know what's wrong but just hasn't shared it yet.

6) Start Treatment - make forward progress. Come up with a regimen of patched, upgraded, caching, mapreducin', memory-managed, leak-destroying updates. Throw them at the problem & see what happens.
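
For the code-minded, steps 2 and 4 above can be roughly sketched in Python. This is just an illustration, not a real tool: `Incident`, `identify_problem` and `load_page` are names I made up for the example, and `load_page` is a stand-in for whatever is reportedly broken. The idea is to run the thing, see whether it fails at all, and keep the stack trace around as a symptom you can put in front of your peers.

```python
import traceback
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Incident:
    """Hypothetical record of what was actually observed (steps 2 and 4)."""
    problem: Optional[str] = None
    symptoms: List[str] = field(default_factory=list)


def identify_problem(check: Callable[[], object]) -> Incident:
    """Run a check and record what really happened instead of guessing."""
    incident = Incident()
    try:
        check()
    except Exception as exc:
        # Step 2: name the problem from the error that actually occurred.
        incident.problem = f"{type(exc).__name__}: {exc}"
        # Step 4: keep the full stack trace as a symptom to share later.
        incident.symptoms.append(traceback.format_exc())
    return incident


def load_page() -> str:
    # Made-up stand-in for the thing that is reportedly broken.
    raise ConnectionError("database is not responding")


if __name__ == "__main__":
    incident = identify_problem(load_page)
    if incident.problem is None:
        print("Nothing is wrong. Someone was just going crazy.")
    else:
        print("Problem:", incident.problem)
        for symptom in incident.symptoms:
            print(symptom)
```

If `identify_problem` comes back with `problem` still set to `None`, you may be in one of those times where nothing is wrong. Otherwise you have a named problem and a written-down symptom before anyone starts guessing.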
