Tag Archives: Real Time Embedded Systems Forum

Why Technology Must Move Toward Dependability through Assuredness™

By Allen Brown, President and CEO, The Open Group

In early December, a technical problem at the U.K.’s central air traffic control center in Swanwick, England caused significant delays that were felt at airports throughout Britain and Ireland, also affecting flights in and out of the U.K. from Europe to the U.S. At Heathrow—one of the world’s largest airports—alone, there were a reported 228 cancellations, affecting 15 percent of the 1,300 daily flights flying to and from the airport. With a ripple effect that also disturbed flight schedules at airports in Birmingham, Dublin, Edinburgh, Gatwick, Glasgow and Manchester, the British National Air Traffic Services (NATS) were reported to have handled 20 percent fewer flights that day as a result of the glitch.

According to The Register, the problem was caused when a touch-screen telephone system that allows air traffic controllers to talk to each other failed to update during what should have been a routine shift change from the night to daytime system. According to news reports, the NATS system is the largest of its kind in Europe, containing more than a million lines of code. It took the engineering and manufacturing teams nearly a day to fix the problem. As a result of the snafu, Irish airline Ryanair even went so far as to call on Britain’s Civil Aviation Authority to intervene to prevent further delays and to make sure better contingency efforts are in place to prevent such failures happening again.

Increasingly complex systems

As businesses have come to rely more and more on technology, the systems used to keep operations running smoothly from day to day have grown not only larger but also more complex. We are long past the days when a single mainframe was used to handle a few batch calculations.

Today, large global organizations, in particular, have systems that are spread across multiple centers of technical operations, often scattered in various locations throughout the globe. And with industries also becoming more inter-related, even individual company systems are often connected to larger extended networks, such as when trading firms are connected to stock exchanges or, as was the case with the Swanwick failure, airlines are affected by NATS’ network problems. Often, when systems become so large that they are part of even larger interconnected systems, the boundaries of the entire system are no longer always known.

The Open Group’s vision for Boundaryless Information Flow™ has never been closer to fruition than it is today. Systems have become increasingly open out of necessity because commerce takes place on a more global scale than ever before. This is a good thing. But as these systems have grown in size and complexity, there is more at stake when they fail than ever before.

The ripple effect felt when technical problems shut down major commercial systems cuts far, wide and deep. Problems such as what happened at Swanwick can affect the entire extended system. In this case, NATS, for example, suffers from damage to its reputation for maintaining good air traffic control procedures. The airlines suffer in terms of cancelled flights, travel vouchers that must be given out and angry passengers blasting them on social media. The software manufacturers and architects of the system are blamed for shoddy planning and for not having the foresight to prevent failures. And so on and so on.

Looking for blame

When large technical failures happen, stakeholders, customers, the public and now governments are beginning to look for accountability, for someone to blame. When the Obamacare website didn’t operate as expected, the U.S. Congress went looking for blame and jobs were lost. In the NATS fiasco, Ryanair asked for the government to intervene. Risk.net has reported that after the Royal Bank of Scotland experienced a batch processing glitch last summer, the U.K. Financial Services Authority wrote to large banks in the U.K. requesting they identify the people in their organizations responsible for business continuity. And when U.S. trading company Knight Capital lost $440 million in 40 minutes when a trading software upgrade failed in August, U.S. Securities and Exchange Commission Chairman Mary Schapiro was quoted in the same article as stating: “If there is a financial loss to be incurred, it is the firm committing the error that should suffer that loss, not its customers or other investors. That more than anything sends a wake-up call to the entire industry.”

As governments, in particular, look to lay blame for IT failures, companies—and individuals—will no longer be safe from the consequences of these failures. And it won’t just be reputations that are lost. Lawsuits may ensue. Fines will be levied. Jobs will be lost. Today’s organizations are at risk, and that risk must be addressed.

Avoiding catastrophic failure through assuredness

As any IT person or Enterprise Architect well knows, completely preventing system failure is impossible. But mitigating system failure is not. Increasingly, the task of keeping systems from failing, rather than just keeping them up and running, will be the job of CTOs and Enterprise Architects.

When systems grow to a level of massive complexity that encompasses everything from old legacy hardware to Cloud infrastructures to worldwide data centers, how can we make sure those systems are reliable, highly available, secure and maintain optimal information flow while still operating at a maximum level that is cost effective?

In August, The Open Group introduced the first industry standard to address the risks associated with large complex systems, the Dependability through Assuredness™ (O-DA) Framework. This new standard is meant to help organizations both determine system risk and help prevent failure as much as possible.

O-DA provides guidelines to make sure large, complex, boundaryless systems run according to the requirements set out for them while also providing contingencies for minimizing damage when stoppage occurs. O-DA can be used as a standalone or in conjunction with an existing architecture development method (ADM) such as the TOGAF® ADM.

O-DA encompasses lessons learned within a number of The Open Group’s forums and work groups—it borrows from the work of the Security Forum’s Dependency Modeling (O-DM) and Risk Taxonomy (O-RT) standards and also from work done within the Open Group Trusted Technology Forum and the Real-Time and Embedded Systems Forums. Much of the work on this standard was completed thanks to the efforts of The Open Group Japan and its members.

This standard addresses the issue of responsibility for technical failures by providing a model for accountability throughout any large system. Accountability is at the core of O-DA because without accountability there is no way to create dependability or assuredness. The standard is also meant to address and account for the constant change that most organizations experience on a daily basis. The two underlying principles within the standard provide models for both a change accommodation cycle and a failure response cycle. Each cycle, in turn, provides instructions for creating a dependable and adaptable architecture, providing accountability for it along the way.

Ultimately, O-DA will help organizations identify potential anomalies and create contingencies for dealing with problems before or as they happen. The more organizations can do to build dependability into large, complex systems, the fewer technical disasters are likely to occur. As systems continue to grow and their boundaries continue to blur, assuredness through dependability and accountability will be an integral part of managing complex systems into the future.

Allen Brown

Allen Brown is President and CEO, The Open Group – a global consortium that enables the achievement of business objectives through IT standards. For over 14 years Allen has been responsible for driving The Open Group’s strategic plan and day-to-day operations, including extending its reach into new global markets, such as China, the Middle East, South Africa and India. In addition, he was instrumental in the creation of the AEA, which was formed to increase job opportunities for all of its members and elevate their market value by advancing professional excellence.

Filed under Dependability through Assuredness™, Standards

Call for Submissions

By Patty Donovan, The Open Group

The Open Group Blog is celebrating its second birthday this month! Over the past few years, our blog posts have tended to cover Open Group activities – conferences, announcements, our lovely members, etc. While several members and Open Group staff serve as regular contributors, we’d like to take this opportunity to invite our community members to share their thoughts and expertise as guest contributors on topics related to The Open Group’s areas of focus.

Here are a few examples of popular guest blog posts that we’ve received over the past year

Blog posts generally run between 500 and 800 words and address topics relevant to The Open Group workgroups, forums, consortiums and events. Some suggested topics are listed below.

  • ArchiMate®
  • Big Data
  • Business Architecture
  • Cloud Computing
  • Conference recaps
  • DirectNet
  • Enterprise Architecture
  • Enterprise Management
  • Future Airborne Capability Environment (FACE™)
  • Governing Board Businesses
  • Governing Board Certified Architects
  • Governing Board Certified IT Specialists
  • Identity Management
  • IT Security
  • The Jericho Forum
  • The Open Group Trusted Technology Forum (OTTF)
  • Quantum Lifecycle Management
  • Real-Time Embedded Systems
  • Semantic Interoperability
  • Service-Oriented Architecture
  • TOGAF®

If you have any questions or would like to contribute, please contact opengroup (at) bateman-group.com.

Please note that all content submitted to The Open Group blog is subject to The Open Group approval process. The Open Group reserves the right to deny publication of any contributed works. Anything published shall be copyright of The Open Group.

Patricia Donovan is Vice President, Membership & Events, at The Open Group and a member of its executive management team. In this role she is involved in determining the company’s strategic direction and policy as well as the overall management of that business area. Patricia joined The Open Group in 1988 and has played a key role in the organization’s evolution, development and growth since then. She also oversees the company’s marketing, conferences and member meetings. She is based in the U.S.

Filed under Uncategorized

TOGAF™ to the Platform: Developing Dependability Cases, 2011 RTESF San Diego Meeting

By G. Edward Roberts, Elparazim

The Open Group RTES (Real Time Embedded Systems) Forum has embarked on a project to define an RTES version of TOGAF™. To accomplish this task, the Forum has looked at technologies and techniques that represent “best-of-breed” practices in the industry. So far, the Forum has studied the modeling side of development: the AADL (Architecture Analysis and Design Language) standard from SAE (Society of Automotive Engineers), and SysML (Systems Modeling Language) and MARTE (Modeling and Analysis of Real-Time and Embedded Systems) from the OMG (Object Management Group). These technologies and their use will definitely be covered in the guidelines being added to this vertical domain instance of TOGAF™.

In this afternoon’s session of the Forum during The Open Group Conference, San Diego, there will be a continuation of a discussion started in a webinar in September 2010. That webinar outlined proposals by some of the members on what the Forum could accomplish in developing Dependability Cases for systems. One interesting proposal was the development of a multi-level taxonomy/ontology of Assurance attributes that would need to be captured by any tools supporting the development of Dependability Cases. These discussions will help shape the roadmap for the Forum’s work in this area.

At this Conference, the RTES Forum will start to examine the technologies and techniques in the industry surrounding the development of Dependability Cases. Many systems lack dependability (aka Assurance) in certain areas, e.g. MILS, security and deadlock avoidance, because insufficiently detailed development fails to detect flaws (assumptions, missing data, lack of testing) in one’s design of a Real-Time and/or Embedded System. In the past, systems requiring a high level of Assurance in some area had to be formally (i.e. mathematically) proved correct (called ‘Formal Methods’). This was an extremely costly endeavor. The industry has recognized this dilemma and shown that a somewhat lesser degree of Assurance can be obtained by making a formal structured argument that the system meets certain requirements, i.e. a Dependability Case, which keeps track of the details of what one has to provide as evidence to prove the case. This technique can represent formal methods as well as these lesser Assurance arguments.
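To make the idea of a structured argument with tracked evidence more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `Claim`, `missing_evidence` and `is_supported`, and the example claims themselves, are hypothetical, and the sketch does not follow GSN, SACM, the D-Case tool or any other notation discussed at the Conference. It simply treats a case as a tree of claims in which every lowest-level claim must be backed by at least one piece of evidence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """One node in a structured argument (hypothetical, for illustration only)."""
    statement: str
    evidence: List[str] = field(default_factory=list)       # e.g. test reports, proofs
    subclaims: List["Claim"] = field(default_factory=list)  # claims this one rests on


def missing_evidence(claim: Claim) -> List[str]:
    """Return the statements of leaf claims that have no evidence attached."""
    if not claim.subclaims:
        # A leaf claim must carry its own evidence.
        return [] if claim.evidence else [claim.statement]
    gaps: List[str] = []
    for sub in claim.subclaims:
        gaps.extend(missing_evidence(sub))
    return gaps


def is_supported(claim: Claim) -> bool:
    """In this sketch, a case holds only when every leaf claim has evidence."""
    return not missing_evidence(claim)


# Hypothetical example: an argument that a controller never deadlocks.
case = Claim(
    "The controller never deadlocks",
    subclaims=[
        Claim("Locks are always acquired in a fixed order",
              evidence=["static analysis report"]),
        Claim("No task blocks while holding a lock"),  # no evidence yet
    ],
)

print(is_supported(case))      # False
print(missing_evidence(case))  # ['No task blocks while holding a lock']
```

The point of keeping the argument as data is that the gaps become visible: the case above is not yet supported, and the list of unsupported claims tells the developer exactly which evidence still has to be produced.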

On Tuesday during the Conference, there will be a set of presentations on Dependability Case technologies and the processes needed to develop them. First, I will present an update to the Forum on the work currently being done on this project. Included with this report is the work being done in modeling TOGAF™ and its importance to the RTES effort. The second presentation will look at the technologies surrounding Dependability Cases: ARM (Argumentation Metamodel) and SAEM (Software Assurance Evidence Metamodel) from the SysA group in the OMG, soon to be combined into a single standard, SACM (Structured Assurance Case Metamodel); the GSN (Goal Structuring Notation); and a general discussion of Stephen Toulmin’s reasoning model, which has influenced these technologies.

The third lecture, by Rance DeLong of LynuxWorks, will deal with some of the theory and practice of building Dependability Cases using his recent work on MILS Protection Profiles. This lecture will deal with how one does Compositional Certification: given components that have some level of Assurance, how does one combine them to develop systems that are assured? Also included in this lecture will be a discussion of the Common Criteria Authoring Environment and new MILS research directions.

The fourth and final presentation on this topic will be right after lunch on Tuesday at 1:30pm, and will be given by Dr. Matsuno of the University of Tokyo on D-Case technology. This is a process and a soon-to-be-released tool on the Eclipse platform for developing Dependability Cases for systems. The Forum is excited to have Dr. Matsuno present and hopes that this will open up a process description that will be part of the RTES plugin to TOGAF™.

G. Edward Roberts is owner of Elparazim, a consulting company on Enterprise/Software Architecture and Development. Edward holds degrees in Electrical Engineering and Mathematics, and worked for most of his professional life as an Advanced Technology Researcher for the US Navy. He is currently working with the Real-Time Embedded Systems Forum, of which he is a member, to develop a domain-specific TOGAF™ for that sector, and with the Architecture Forum (of which he is also a member) to model TOGAF 9. Edward is a TOGAF™ 9 Certified Architect and a certified Professional Engineer in EE.

Filed under Enterprise Architecture, TOGAF®, Uncategorized