Tue, 15 May 2012

Monitoring and Testability

At UDS last week there was another "Testing in Ubuntu" session. During the event I gave a brief presentation on monitoring and testability. The thesis was that there are a lot of parallels between monitoring and testing, so many that it's worth thinking of monitoring as a type of testing at times. Because of that, great monitoring requires a testable system, and it pays to think about monitoring right at the start so that you build a monitorable system as well as a testable one.

You can watch a video of the talk here. (Thanks to the video team for recording it and getting it online quickly.)

I have two main questions. Firstly, what are the conventional names for the "passive" and "active" monitoring that I describe? Secondly, do you agree with me about monitoring?

Posted at: 03:15 | category: /tech | Comments (5)


I'm not sure there is a conventional name for what you describe as passive monitoring. The lean startup group consider detecting folk that don't complete purchases an essential metric in the sales pipeline. I think I'd probably classify that just as metric gathering.

'Active' is what most folk mean when they talk about monitoring of a site.

I do agree that monitoring is as important a part of the delivery of a product as a unit/functional test suite. There are some very effective tools for writing such monitoring as behavioural tests using cucumber.

Posted by Robert Collins at Wed May 16 01:29:21 2012

I think measuring those that don't complete purchases is a great idea. It seems that in general it won't give you immediately actionable things, so it isn't a good fit for alerting. A pager going off when someone cancels an order would be a sign of fanatical pursuit of sales, but not what most people are aiming for.

It's interesting that most people talk about 'active' monitoring, because most of the services I have seen only support it in a very shallow way, e.g. checking for 200 response codes on a few pages. It's good that it's considered the ideal, though :-)
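
To be concrete about "shallow", the depth of checking I mean is roughly the following sketch: probe a few pages and treat anything other than a 200 as a failure. The URLs here are placeholders, not any real service.

    # A rough sketch of a shallow "active" check: fetch a handful of
    # pages and report anything that doesn't come back with a 200.
    from urllib.request import urlopen
    from urllib.error import URLError

    # Placeholder URLs for illustration only.
    PAGES = [
        "https://example.com/",
        "https://example.com/login",
        "https://example.com/basket",
    ]

    def check_pages(pages):
        failures = []
        for url in pages:
            try:
                with urlopen(url, timeout=10) as response:
                    if response.status != 200:
                        failures.append((url, "HTTP %d" % response.status))
            except OSError as e:  # URLError and timeouts are subclasses of OSError
                failures.append((url, str(e)))
        return failures

    if __name__ == "__main__":
        for url, problem in check_pages(PAGES):
            print("FAIL %s: %s" % (url, problem))

That tells you the pages are up, but nothing about whether a user could actually get through a purchase.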

Thanks for the reference to cucumber, I'll check it out.

Posted by James Westby at Wed May 16 02:59:27 2012

So, it's totally normal that some folk won't complete an order process. Paging on something you don't need a human to action isn't very useful - and the relevant thing to do with order processing stuff is to learn why they cancelled. That's best done by automation and experimentation.

On the automation side:
- gathering the data in the first place
- presenting it sensibly (e.g. "80% of folk that start an order fail to complete" rather than "10 people failed to complete")
- adding exit interviews (you're cancelling the process, can you tell us why?)

On the experimentation side, run a number of tests (separately or concurrently) to see what makes people more or less likely to complete an order. Things like page design, prose, price, discounts, bundles, should all be part of that.

That may seem like a digression, but it actually ties right back to actionability: if your normal situation was (say) 20% of folk that start an order failing to complete, you can alert when that rises to 25%, and then start tweaking in real time - you can observe after quite a short window (minutes if you have enough folk landing on the page) whether a particular test helped or hindered.
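
A minimal sketch of that kind of threshold alert (all the numbers and the notification hook are just illustrative):

    # Compare the abandonment rate over a recent window against a baseline
    # and only page a human when it is clearly worse than normal.
    BASELINE_FAILURE_RATE = 0.20   # ~20% of started orders normally don't complete
    ALERT_THRESHOLD = 0.25         # page once it rises to 25%
    MIN_SAMPLES = 50               # don't alert on a handful of orders

    def should_alert(started, completed):
        """Return True if the abandonment rate over the window is alarming."""
        if started < MIN_SAMPLES:
            return False
        failure_rate = (started - completed) / float(started)
        return failure_rate >= ALERT_THRESHOLD

    # e.g. in the last 10 minutes: 200 orders started, 140 completed -> 30%
    if should_alert(started=200, completed=140):
        print("ALERT: order abandonment is above %d%%" % (ALERT_THRESHOLD * 100))

The point is that the alert fires on the rate moving away from normal, not on any individual cancelled order.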

Posted by Robert Collins at Wed May 16 03:08:02 2012

Agreed on all of those points.

When I referenced purchases in the talk I meant things like alerting if a purchase fails because the purchasing service is still down after 3 retries. That's certainly something that I would want to be alerted about, and it's not related to the user's intent, so it will be low on false positives.
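
Something like this sketch is what I have in mind - submit_purchase and page_oncall are made-up names standing in for whatever the real purchasing client and alerting hook would be:

    # Retry a purchase a few times; if the purchasing service stays down
    # through every retry, treat it as a system failure and alert.
    import time

    MAX_RETRIES = 3

    def submit_with_retries(order, submit_purchase, page_oncall):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return submit_purchase(order)
            except ConnectionError as e:
                last_error = e
                time.sleep(2 ** attempt)  # back off before retrying
        # The service stayed down for every attempt: this reflects a broken
        # system, not the user's intent, so page a human.
        page_oncall("purchase failed after %d retries: %s" % (MAX_RETRIES, last_error))
        raise last_error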

Posted by James Westby at Wed May 16 03:10:50 2012

I think another way to look at this is visibility into what your project is doing. Monitoring and testing can both be seen as part of that.

Posted by John A Meinel at Wed May 16 09:30:04 2012