When the show was on the Food Network, I always enjoyed the Japanese battles the best. I was a fan of Rokusaburo Michiba because of the skills that he demonstrated. Not just his cooking proficiency, but the very way he approached every battle. I like to imagine myself approaching battle in the same way. The difference is that my opponent isn’t another chef or DBA. My opponent is a production outage.
Eye of the Storm
The beginning of a battle involved a lot of hectic running around trying to get lots of dishes started. It was difficult to figure out what any one person was doing and you definitely couldn’t see any semblance of a team plan for the battle. Just everybody doing something and somehow making it work.
On Michiba’s side of the kitchen, something different was occurring. He would get his assistants started on different preparations while he took a piece of paper and a pen and wrote out the menu in beautiful Japanese characters. At first, I thought this was crazy. He’s wasting precious time. That’s 4 or 5 minutes he can’t get back at the end of the battle when he’ll need it. Yet somehow, he never needed it. In fact, his dishes always seemed to finish with perfect timing right at the very end. At the same time, I often saw his opponents talk about dishes they had planned to make but ran out of time to finish. How could this apparent waste of time work out so perfectly, so often?
The answer of course is planning. When others were panicking and rushing around, he was writing out his menu. After that, he and all of his assistants knew exactly what needed to be done and when. They had a plan, and they knew how to follow a plan. The trouble with this is that nobody wants to plan when there is a production outage. Business wants action. They want you to cowboy up and get it back online by any means. Nobody wants to look like they’re not giving it their all, so planning hardly ever comes into play.
Here’s a real-life example where failure to plan was costly. Someone called me at home one morning because their replication publisher (mirrored asynchronously) was down and they needed to fail over to the mirror. I was in the shower when they called. Because I didn’t answer, they asked a “friend” who is a “DBA” what to do. He gave his suggestions and they followed them, but what he had advised wasn’t working. I called them back once I was out of the shower and had listened to their message.
I asked them what was going on, and their reply was, “We need to failover the publisher to the mirror. I’ve dropped mirroring, now what do I do?” I listed what they would need to do and then walked them through each step. The publisher wasn’t online, so there was no option to re-establish mirroring. Their friend had told them that the only way to bring the mirror online when mirroring asynchronously was to drop mirroring.
Here’s what they actually did to rectify the problem (actual time: 5 hours):
- Recover the mirror database
- Change the application connection strings to point to the original mirror as the primary (for write operations)
- Change the application connection strings to point to the original mirror for read operations
- Drop replication
- Recreate replication
- Reinitialize and reconfigure the 5 subscribers to the publisher
- Change the application connection strings to repoint to the subscribers for read operations
Here’s what they should have done (estimated time: 5 seconds):
- Force service on the mirror with allow data loss
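For reference, that one step is a single T-SQL statement run on the mirror server. This is a minimal sketch; the database name SalesDB is a placeholder I made up, not the database from this story.

```sql
-- Run this on the mirror server instance while the principal is unavailable.
-- SalesDB is a hypothetical database name used only for illustration.
-- Any transactions that had not yet reached the mirror are lost.
ALTER DATABASE SalesDB SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;
```

The mirroring session is kept rather than dropped. When the old principal comes back, it rejoins as the mirror with the session suspended, and mirroring can be resumed from there with ALTER DATABASE ... SET PARTNER RESUME.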
To be a good DBA, you don’t need to write out a full menu every time a problem occurs, but you should be doing at least some minimal planning. Make sure everyone knows what needs to be done and what their part of the plan is. Be like Rokusaburo Michiba. Let others run around wildly while you take a couple of minutes to understand the situation and devise the best plan. Then take action. You’ll be surprised how planning can help you reach your end goal right on time instead of struggling to get there.