Natural Language Processing for Standards Development

By Chris Harding, Principal, Lacibus Ltd., and Member of The Open Group

Natural Language Processing (NLP) is emerging as a powerful technology with many applications. Here are some thoughts on its application to standards development, with a call to participate in The Open Group work that uses it.

NLP Can Extract Topics and Ideas from Document Text

The Power of NLP

ChatGPT has caused a huge stir recently. It packages the state of the art in NLP into a chatbot. It is free for everyone to try and, looking at the current email and social media buzz, you might think that practically everyone has tried it.

The buzz isn’t confined to the technical press. Mainstream media such as CNN in the United States, the London Times, and Germany’s Bild have featured articles on how it could affect their readers. It is forecast to disrupt industries ranging from healthcare to real estate, and to have a major impact on society as a whole.

Disruption of some industries is already well under way. Chatbots play an increasing role in sales and technical support, as anyone who has visited a product website recently can testify. ChatGPT is attracting attention because it can work at a more general level, responding to any prompt on any topic, like a person. It uses technology similar to that of sales and support chatbots, but has a large model that has been trained on a much bigger set of data, and this makes a crucial difference.

The Limitations

The technology does have limitations. It can do commodity content creation based on previously documented information, but without the accuracy and insight of a human expert. The best sales and support chatbots hand over to humans when they realize that their limitations have been reached.

Its strength is in presenting ideas extracted from existing material, rather than in creating something that is radically new. It can produce a good summary of quantum theory, but almost certainly would not have been able to invent quantum theory. I asked it whether it could produce a better theory than quantum theory. Its response included, “As an AI language model, I am not capable of doing experimental research or contributing to the evolution of a physical theory such as quantum mechanics.” I couldn’t have expressed this better.

It doesn’t have a real logical reasoning capability. For example, it couldn’t solve the following riddle. “Mike’s Mum has four children. Three of them are Tom, Dick, and Harry. What is the name of her fourth child?”

Also, some of its answers are factually incorrect. This is not surprising, since the body of material it was trained on probably included false statements, and its probabilistic model may select a more probable answer when a less probable one is actually right.

Back in the 1960s, an AI program called Eliza created a similar stir. It could produce quite a realistic response to a prompt by picking out patterns and phrases in it, and using an algorithm to generate phrases that might come next in a conversation. For example, prompted with, “I’m frightened by artificial intelligence,” it might respond, “How long have you been frightened by artificial intelligence?” ChatGPT is Eliza on steroids. Its responses are based, not just on what is in the prompt, but on the huge body of material that it was trained on. They are generated, not by a simple algorithm, but by a complex probabilistic model.
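
The Eliza-style behaviour described above can be illustrated with a few lines of Python. This is a minimal sketch of the general pattern-and-template idea, not a reconstruction of the original program: the rule table, the `respond` function, and the default reply are all assumptions made for illustration.

```python
import re

# Hypothetical rule table: each entry pairs a pattern to look for in the
# prompt with a template that reuses the captured phrase in the reply.
RULES = [
    (re.compile(r"\bi'?m\s+(.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bi feel\s+(.+)", re.IGNORECASE), "Why do you feel {0}?"),
]

# Fallback used when no pattern matches (also an assumption).
DEFAULT_REPLY = "Please tell me more."


def respond(prompt: str) -> str:
    """Return a reply built from the first rule whose pattern matches."""
    # Normalise curly apostrophes and drop trailing punctuation so the
    # captured phrase can be slotted straight into the template.
    text = prompt.replace("\u2019", "'").strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT_REPLY


print(respond("I'm frightened by artificial intelligence."))
# prints: How long have you been frightened by artificial intelligence?
```

Everything in the reply comes from the prompt and a fixed template. Nothing is drawn from a trained model, which is the essential difference from ChatGPT.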

Collaborative Intelligence

The best humans had always beaten machines at chess until the mid-1990s, when grandmaster Garry Kasparov lost to supercomputer Deep Blue. This gave a big impetus to the debate on human versus artificial intelligence. Kasparov himself has given the matter serious thought, and has been exploring collaboration between people and AI. Writing with David De Cremer in the Harvard Business Review, he describes an experiment showing that a well-organized partnership between people and machines can be superior not only to the best people but also to the best machines. They conclude that AI should augment human intelligence, not replace it.

At the present time, using AI to augment human intelligence looks to be the way forward. Machines can assimilate a far greater body of knowledge than any human, and can present conclusions from it in ways that humans can understand. They can capture and express a consensus view. Humans add a moral and ethical dimension, and perhaps also some “common sense”, and are able to guide the machines towards what is “right” for their societies. Humans are also better able to form new ideas and develop new theories.

Use in Standards Development

Despite its limitations, the natural language processing technology that underlies ChatGPT has huge potential for disruptive change in many areas. One of these is standards development. Last year, I gave a presentation to The Open Group Architecture Forum on the possibilities for using NLP. The subsequent rise to prominence of ChatGPT suggests that the possibilities are even more exciting, and could be realized sooner than I thought.

Standards enable people to work together by giving them a common foundation for communication. Their scope has expanded in recent times. The UNIX® standard, a standard of The Open Group, helped software developers collaborate by defining a particular kind of operating system. The TOGAF® standard, a standard of The Open Group, helps Enterprise Architects collaborate by describing how they should think about architecture development.

At The Open Group 2020 event in San Antonio, Texas, two of its VPs, Andrew Josey and David Lounsbury, presented the concept of Standards as Code, where a standard may consist of executable code, provided that the code is subject to consensus-led change control. They recognized the potential of computer software to serve as practical standards. This software can now include programs such as ChatGPT, with standards consisting of their language models. Standards as Language Models may well be the biggest disruption of the standards world since the French Academy of Sciences defined the metre.

As well as providing a way of representing standards, natural language processing can support their development, and help people use them. The Open Group Data Integration Work Group is exploring these possibilities.

The Data Integration Work Group

The Data Integration Work Group is writing a Guide to Data Integration using The Open Group standards. To establish a basis, it is researching use cases for and current trends in data integration, and reviewing the corpus of The Open Group standards to identify relevant clauses.

To aid this work, it is using a prototype Ideas Browser, which analyses a set of web pages so that users can browse the topics and ideas without reading all the words. Its summaries are generated by the language model used by ChatGPT. It won’t replace people, but will enable them to review much more material much more quickly, so that they can take faster decisions, and produce better work.
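
To give a feel for what browsing topics "without reading all the words" might involve, here is a deliberately simple sketch. It is not how the Ideas Browser works: the prototype uses a language model to generate summaries, whereas this stand-in only counts frequent words. The `top_terms` function, the stop-word list, and the page identifiers are all hypothetical.

```python
import re
from collections import Counter

# A small, illustrative stop-word list (an assumption, not a standard one).
STOP_WORDS = {
    "the", "and", "for", "from", "that", "this", "with", "are", "was",
    "many", "can", "has", "have", "its", "not", "but",
}


def top_terms(pages: dict[str, str], n: int = 3) -> dict[str, list[str]]:
    """Return the n most frequent non-trivial words for each page."""
    result = {}
    for page_id, text in pages.items():
        # Lower-case the text and keep alphabetic words only.
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(
            w for w in words if len(w) > 2 and w not in STOP_WORDS
        )
        result[page_id] = [term for term, _ in counts.most_common(n)]
    return result


# Hypothetical page texts, keyed by made-up identifiers.
pages = {
    "page-a": "Data integration combines data from many sources. Data quality matters.",
    "page-b": "Standards enable people to work together. Standards evolve.",
}
print(top_terms(pages, n=1))
```

A word count like this gives only a rough indication of what a page is about. Summaries generated by a language model can express the ideas in a page rather than just its vocabulary, which is why the prototype takes that approach.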

Participation in this group is open to all members of the Architecture Forum, and the group is looking for new members. If you are interested in joining it, and working at the leading edge of standards development, contact Forum Director Dan Hutley.

Dr. Chris Harding is Founder and Principal of Lacibus Ltd. He formed the company to provide services based on virtual data lakes and data-centered architecture. Chris developed the ideas that led to the formation of the company while working as Director of the Open Platform 3.0™ Forum of The Open Group.

Chris was a staff member of The Open Group for many years, supporting its member activities in data communications, directory interoperability, web, service-oriented architecture, cloud computing, and other areas. He was the lead author of The Open Group Guide: Cloud Computing for Business, has helped produce a number of other publications by The Open Group, and has written many online articles. He remembers the early development of the TOGAF® Standard, a standard of The Open Group, and maintains an interest in Enterprise Architecture as a member of the Work Group on TOGAF Supporting the Digital Enterprise. His main focus is now on data platforms. He follows several industry initiatives related to this, and participates in The Open Group Data Integration Work Group.

Before joining The Open Group, Chris was a data communications consultant and, before that, a software engineer and team leader. He has a PhD in Mathematical Logic.

He lives with his wife in Lincolnshire, UK, where he has scope to pursue his hobbies of gardening and photography.