Variety, Black Swans and Platform 3.0

By Stuart Boardman, KPN

Enterprises today are subject to, and increasingly make use of, a range of technological and business phenomena that enormously increase the number of factors affecting an enterprise’s ability to carry out its business effectively and efficiently. Some examples (Cloud, Big Data, the Internet of Things, Social Media/Business and Mobility) are the focus of The Open Group’s Open Platform 3.0™ Forum. An enterprise participating in some way in this world (i.e. any enterprise unable to lock itself inside its own walls) will have to find ways of matching the variety these phenomena introduce. I’m using the term Variety here in the sense defined by W. Ross Ashby – most notably in his Law of Requisite Variety (1952), which I’ve written about more extensively elsewhere.

Variety can be internal or external to a system (an enterprise is a system) but it’s the external variety that is increased so dramatically by these new phenomena, because they typically involve having some part of an enterprise’s business performed by another party – or a network of parties, not all of whom are necessarily directly known to that enterprise.

Ashby’s law says that the more variety a system has to deal with, the more variety is needed in its responses. Variety must be matched by variety. You need at least to be able to monitor each factor and assess changes in its behavior, if you are to have any hope of responding.
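
The counting behind the law can be made concrete with a small Python sketch. It assumes the simplified textbook form of the law (the variety of outcomes can be no lower than the variety of disturbances divided by the variety of available responses). The function name and the numbers are illustrative assumptions, not part of Ashby's text or of Platform 3.0.

```python
def min_outcome_variety(disturbances: int, responses: int) -> int:
    """Lower bound on the number of distinct outcomes a regulator can achieve.

    Simplified counting form of the Law of Requisite Variety:
    outcomes >= disturbances / responses, rounded up to a whole number.
    """
    if disturbances < 1 or responses < 1:
        raise ValueError("both counts must be at least 1")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-disturbances // responses)


# Hypothetical enterprise facing 12 distinguishable kinds of disturbance.
few_responses = min_outcome_variety(disturbances=12, responses=3)
enough_responses = min_outcome_variety(disturbances=12, responses=12)

print(few_responses)     # at best 4 different outcomes, not all of them wanted
print(enough_responses)  # only now can a single desired outcome be held
```

With three responses the enterprise cannot do better than four different outcomes; holding everything to one desired outcome needs as many responses as there are disturbances. That is the sense in which variety must be matched by variety.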

There are three main elements involved in developing a strategy to match variety.

First we need to find ways of identifying relevant variety and of understanding what its effect on our enterprise might be. That’s going to tell us what meaningful options for response exist. We should not make the mistake of thinking that a deterministic response to any given type of variety is always possible. Ashby himself was very clear about this. Some factors (especially those involving people) don’t behave in a predictable manner. It’s therefore useful to classify each form of variety according to some schematic. I use Tom Graves’s SCAN framework.

There are other frameworks – I just find Tom’s semantics rich but easy to follow.

Second we need to understand the level of risk that a particular form of variety might pose. How much damage might a particular event do to our business? In the “always on” world that Platform 3.0 encompasses, there’s a tendency to assume that being offline is a drama. But is that always true? The size and cost of a response mechanism needs to be in proportion with the risk involved.
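
A rough way to check that proportion is to compare the cost of a response with the loss it is expected to avoid. The Python sketch below is only an illustration under stated assumptions: the `Risk` class, the `response_is_proportionate` helper and all the figures are hypothetical, and real risk assessments rarely reduce to a single probability and a single cost.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    yearly_probability: float  # assumed chance of the event in a year (0..1)
    impact_cost: float         # assumed cost to the business if it happens

    def expected_loss(self) -> float:
        """Expected yearly loss: probability times impact."""
        return self.yearly_probability * self.impact_cost


def response_is_proportionate(risk: Risk, response_cost: float,
                              loss_reduction: float) -> bool:
    """True when the loss avoided is at least the yearly cost of the response.

    loss_reduction is the assumed fraction (0..1) of the expected loss
    that the response mechanism removes.
    """
    return risk.expected_loss() * loss_reduction >= response_cost


# Hypothetical numbers for "being offline for an hour".
offline = Risk("offline for an hour", yearly_probability=0.5, impact_cost=10_000)

cheap_control = response_is_proportionate(offline, response_cost=2_000, loss_reduction=0.8)
costly_control = response_is_proportionate(offline, response_cost=6_000, loss_reduction=0.8)
print(cheap_control, costly_control)
```

In this made-up case the cheaper control pays for itself and the more expensive one does not, even though both address the same event.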

Lastly we must decide what kind of response mechanism we actually want to implement – assuming the level of risk and the available options indicate any need for response at all. The fact that we could put a control mechanism in place doesn’t necessarily mean that it’s a good idea, as Nassim Nicholas Taleb shows in his book Antifragile (of which more in a minute).

Here’s an example from the Internet of Things (IoT): “Smart Charging” for electric automobiles. Here we know that the number of parties involved is quite small (Distribution Network Operator, Charging Provider, Local Controller/Provider and Automobile/User) and that both functional and legal/commercial contracts between parties will apply. If we look at an individual device (sensor, monitor, controller…) and its relationship to someone else’s device, there’s a good chance we can describe the behaviour with some confidence. So we’re talking about a Simple situation that’s amenable to a rules-based (“if A happens, do B”) response. But of course it’s not usually so straightforward. One can expect at least a one-to-many relationship between “our” device and the devices with which it exchanges information. So in reality we’re dealing with a Complicated situation. That doesn’t mean you can’t determine a reliable set of behaviours and responses but it will be a sizeable matrix and will require significant analysis effort.
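
To give a feel for how quickly that matrix grows, here is a minimal Python sketch of a rules-based response table. The party names follow the list above; the event names, the action names and the helper functions are hypothetical and exist only for illustration.

```python
from itertools import product

# Parties taken from the smart charging example above.
PEERS = ["network_operator", "charging_provider", "local_controller", "automobile"]
# Hypothetical events that a peer device might report.
EVENTS = ["overload_warning", "price_change", "connection_lost"]


def build_rule_matrix(peers, events, default_action="log_only"):
    """One rule per (peer, event) pair: the 'if A happens, do B' table."""
    return {(peer, event): default_action for peer, event in product(peers, events)}


def respond(rules, peer, event):
    """Look up the rule; anything outside the matrix goes to a person."""
    return rules.get((peer, event), "escalate_to_human")


rules = build_rule_matrix(PEERS, EVENTS)
rules[("network_operator", "overload_warning")] = "reduce_charging_rate"

print(len(rules))  # 4 peers x 3 events = 12 rules to analyse
print(respond(rules, "network_operator", "overload_warning"))
print(respond(rules, "weather_service", "storm_forecast"))
```

Even this toy version needs twelve rules, each of which has to be analysed, and any event outside the table falls through to a human. Every extra peer or event type multiplies the analysis effort rather than adding to it.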

So what’s the risk? Well, that depends on who you are. A car owner, a charging provider and a network operator have quite different perceptions of what constitutes an event of business significance. They all have a common interest in the efficient functioning of the system as a whole but quite different views on which events require a response and what sort of response. A sensor or controller problem could lead to a failure to detect a potential network overload. So could faulty data about weather or consumption patterns or poor (big) data analytics, all of which fall at best into the Ambiguous category. For a car owner this isn’t really a risk until something goes seriously wrong – and even then one can always work from home. For the network operator the significance is far greater, as they are legally responsible for providing sufficient capacity and additional infrastructure is expensive. On the other hand, if the network operator decides to play safe and reduce the capacity allocated to the charging provider, that will at least lead to irritation for car owners due to incomplete or slow charging of their cars. That is not usually a business-critical event but the possibility exists. For the charging provider an isolated local event is not much more than an annoyance but a widespread effect can have direct financial or customer relationship consequences.

Then there’s the third consideration. Just because we could set up a control, does that mean we really should do so? In Antifragile Taleb shows that many systems are fragile exactly because they try to control everything. Now in general this applies to social/economic systems, which in SCAN terms are Ambiguous or Not-known and therefore not really amenable to tight control anyway. But even mechanical systems can suffer from this problem. It’s not uncommon that a response to some stimulus has knock-on effects elsewhere in the system and if there’s a two way relationship between a source of variety and our response mechanism, all kinds of unexpected things could happen. So we need to be very sure about what we’re doing.

Moreover, tightly controlled systems have great difficulty with black swan events (another Taleb book), because these by definition are not catered for in the rule book. An over-reaction or mistaken reaction can have disastrous consequences. No reaction may sometimes be a better tactic. All of which brings me to another example.

The example is based on the (in)famous Amazon outage of a couple of years back and is in no way intended to knock Amazon. I’ve written about this in detail in another blog but the central point is that when there is a significant outage we (the customer) are in the Not-known territory. We have no direct ability to respond to the variety that caused the problem, so we need a different way of responding – something that we can decide for ourselves but which can’t possibly be based on a rules driven approach. I described a response that involved creating a separate back-up/recovery strategy – potentially with multiple options. But of course that comes at a price, so our risk assessment needs to be well thought through.

This example has another interesting aspect to it. The scale of the problem arose from a failure of a control structure that could manage expected events but which actually made things worse in the face of something in the order of a black swan event. And of course this isn’t just about machines – there were people involved too. The control structure was intended to be robust but was in fact fragile. But in the end how much damage was done? As far as I know no-one went bust. Amazon learned from the experience and continued to do so – and so did everyone else. So actually the whole system proved to be anti-fragile. It got better as a result of a few knocks. I don’t know exactly how Amazon do it now but I hope they’ve given up trying to control everything with a rule book.

You could say that the mission of the Open Platform 3.0 Forum is to help enterprises gain the benefits they seek from all those phenomena. So here’s a great opportunity for the Forum to take a lead in an area that too often gets shoved off into the non-sexy world of “non-functional requirements”. I hope we can describe ways for enterprises to deal with variety in an intelligent and adequate manner – to reliably manage what can be managed without driving themselves crazy trying to manage the unmanageable.

Stuart Boardman is a Senior Business Consultant with KPN where he co-leads the Enterprise Architecture practice as well as the Cloud Computing solutions group. He is co-lead of The Open Group Cloud Computing Work Group’s Security for the Cloud and SOA project and a founding member of both The Open Group Cloud Computing Work Group and The Open Group SOA Work Group. Stuart is the author of publications by the Information Security Platform (PvIB) in The Netherlands and of his previous employer, CGI. He is a frequent speaker at conferences on the topics of Cloud, SOA, and Identity.

6 responses to “Variety, Black Swans and Platform 3.0”

  1. Really nice, Stuart – and many thanks for including my work in this!

    One quick comment: it’s not just the variety itself that’s the challenge for a Simple or Complicated schema – it’s the way that the set of variety-parameters itself can change, or what I’ve called ‘variety-weather’, ‘the variety of the variety’. In some circumstances and contexts a fitness-landscape can turn into something like a fitness-seascape, with tumultuous seas of uncertainty and change that can switch a ‘high-fitness’ point into a ‘low-fitness’ point in a matter of moments. The result is that rules that _did_ work well can suddenly stop working (i.e. stop applying validly) – and unless we watch out for that possibility, we may not realise that it’s happened.

    • Tom, the beauty of variety is that it’s a relative, observer-, situation-, purpose-, etc. dependent measure of complexity. Adding other dimensions like diversity for numbers, variation for change in time and others, can be useful in some cases, but I see variety as better left as it is and if a variety-of-variety is needed, then it’s again a matter of ‘variety’. Adding a new dimension is like adding a new variable, it’s just an increase of variety with a new set of states.

      • tetradian

        Ivo: yes, agreed, the real one-line summary is that “variety is variety is variety”. Either there are no dimensions to it, or it’s almost entirely dimensions, but it probably doesn’t all that much matter which it is: it’s _all_ ‘variety’.

        The catch is that many people _do_ tend to assume that once ‘the variety’ has been ‘defined’ – such as in a requirements-document, or a software-application – that it will then stay that way forever. Which it doesn’t, especially over long periods of time: in other words, the variety that people _think_ is ‘the variety’ itself has variety (which then forms part of the _real_ variety of the context, as you say). It’s that distinction that I was aiming to capture in that concept of ‘variety-weather’: the difference between the subset of the real variety that is captured in someone’s ‘control-system’, versus the larger scope in which that subset of variety no longer delivers ‘control’.

  2. Tom and Ivo, thanks for this discussion. It led me to think about something Ruth Malan wrote recently. It’s another aspect of the overall discussion and I think it’s very important.
    Ruth pointed out that our response itself has to be able to change, even if the variety in question doesn’t change, because the variety in the system as a whole may require a change in our response. After all, our chosen response is based not just on the behaviour of the source but on what we regard as desirable outcomes or on financial feasibility, possible knock-on effects etc. These can be changed by physical, market, political or environmental factors, new opportunities or anything else that might cause us to revise our strategy. It may simply not be desirable to respond as we have previously done. Ruth calls this requisite flexibility. I think it’s a powerful idea.

    • Stuart, the response itself can be seen as a vector with cognitive and behavioural variables. Commonly, dealing with a stimulus includes both efforts to attenuate the variety of the stimulus (cognitive component) and efforts to amplify the variety of response (behavioural component). There are several limitations to that. One is the transducer’s variety and the other is the actual response budget. That concept was very convincingly developed by Max Boisot, on the basis of Ashby’s work enriched by the findings of Murray Gell-Mann, Holland, March and others.

      Well to apply that to our discussion, both your post and this comment are very good and stimulating but I’ll find another way for a more elaborated response.

      Now, a short and probably low-variety response: the choice to distinguish the change in the variety of one stimulus from that of the system is a choice of the observer. The boundary of the system is another and that goes on for distinguishing market and political factors. Responses, especially human ones, are neither rational nor linearly dependent on the stimuli, even if we had the capacity to detect all of them.
