Linguistically Intelligent Authoring Assistance in Requirements Engineering

Executive Summary

Having control over language is crucial when formulating requirements in requirements engineering.

The advantage of writing requirements in natural language is that no training in a special metalanguage is needed. Authors also have relatively extensive linguistic freedom. However, this freedom also has the disadvantage that there is more room for errors, inconsistencies, ambiguities, and other text quality problems.

Poor text quality can result in a variety of fundamental problems in requirements engineering.

To address these problems, commonly used software solutions for text quality assurance in requirements engineering usually rely on word-based methods. In simple terms, they use word lists to check for "weak words", incompleteness markers and other lexicon-based patterns.

Linguistically intelligent machine-based authoring assistance combines lexicon-based methods with a more in-depth method: morpho-syntactic analysis. Words and sentences are broken down into their smallest meaningful components and then analyzed. This makes it possible to identify complex problems such as missing word references, incorrect word order or stylistic shortcomings. Linguistically intelligent authoring assistance can also check a document according to its structure. This means it can apply specialized checking rules to specific text elements such as headings, lists or body text.

Such authoring assistance software is rule-based, which is another big advantage: it can draw on a very extensive and proven repertoire of rules. Individual rule configurations can be put together and new rules can be added. An authoring assistance tool can therefore check general text quality criteria in requirements engineering. If desired, it can also cover company-specific language requirements, including specific corporate terminology. Modern authoring assistance software applies its linguistic analysis methods here as well and ensures consistent adherence to corporate terminology.

Sentence reuse is largely ignored by popular software solutions for text quality assurance in requirements engineering, yet it offers significant advantages. If linguistically polished requirements already exist, they can be stored in a sentence repository and reused at any time. In the best case, the accepted sentence then needs few or no adjustments. This saves authors time (and money) because they don't have to rewrite everything. The text quality of the accepted requirement has also already been verified. Sentence reuse also keeps wording consistent, which improves the flow of reading and ultimately the comprehensibility and unambiguity of the requirements. This benefits everyone involved in the requirements engineering process.

In summary: In requirements engineering, you can ensure the linguistic quality of your requirements using a modern machine-based authoring assistance tool such as Congree Authoring Server. Unlike other software solutions, it uses linguistically intelligent methods, so even relatively complex text quality problems can be found and corrected. Consistency is also ensured, especially through adherence to corporate terminology.

Introduction

Requirements engineering is the process of defining, documenting and maintaining requirements in the development process (of software, for example). Requirements play an important and multifaceted role in this process. Two basic questions reveal the logic behind requirements engineering:

What are the expectations for the product? What do clients, end users, investors and other stakeholders expect?
What must be implemented by the developers and how?

In addition, complete, correct and properly documented requirements are vital so that they can be used as the starting point for subsequent product development (source).

In this whitepaper, we explain why text quality is an essential factor in meaningful requirements engineering. Building on this, we show how linguistically intelligent machine-based authoring assistance can be used to avoid linguistic stumbling blocks and ensure text quality.

Requirements engineering: linguistic features

Basics

Requirements are sometimes written in formal languages such as UML (Unified Modeling Language), which were designed to describe systems. Most often, however, requirements in the requirements engineering process are written in natural language. Natural language does not have to be learned in advance, which makes it easier to get started with defining requirements (source). This benefits not only authors but all stakeholders: they can read and understand the requirements without having to learn a new syntax or language. Some specialized syntaxes and formal languages cover only specific areas of requirements (such as software). Compared with them, natural language has yet another advantage: it is universal and can be used for all types of requirements (source: Reuther, Ursula and Koch, Matthias (n.d.): Automatisierte Qualitätssicherung von Anforderungen mit Hilfe linguistischer Regeln).

Natural language usually appears in requirements engineering either as free text or in the context of structured templates (source: see above).

Linguistic challenges

Compared to formal languages, natural language gives authors more leeway in their wording. The challenge is that the more freedom there is, the more the wording will vary, because everyone has a distinct writing style and individual strengths and weaknesses. This leaves comparatively more room for errors and unnecessary variants.

One of the key issues affecting precision and unambiguity is lexical and often subject-specific: "weak words", also called imprecise words, violate the specified quality criteria in certain contexts. Weak words are "words or phrases that, when used in a free text, suggest that the free text is very likely to be imprecise" (source).

In 2003, Kandt investigated the primary problems occurring in practice by examining 967 natural language requirements. Of these, 223 requirements (about 23 percent, or almost a quarter of the total) contained weak words as discussed above. The second most frequent issue was grammatical: 83 requirements were not written in the singular as they should have been. Stylistic issues were also found: 53 requirements were not short and simple, 37 requirements were incomplete, and 23 requirements used negations (source: Kandt, R. K. 2003: Software Requirements Engineering: Practices and Techniques. JPL Document D-24994. SQI Report R-3. Jet Propulsion Laboratory. California Institute of Technology).

The most frequent linguistic errors and problems span virtually all categories. Any text quality assurance, whether manual or software-based, must therefore cover a broad linguistic spectrum.

Why text quality is so important in requirements engineering

To answer this question, we will first take a look at the disadvantages that result when requirements are not documented at all. This extreme case leads to:

Software developers not knowing exactly what has to be done
Customers (as part of paid development) unable to understand the scope of the product they are paying for before it is finished
No clear definition of when a product is “finished”
Quality reviewers with no guidelines for testing the finished product (source).

Poorly written requirements are better than no documented requirements at all. Nevertheless, in extreme cases, a lack of text quality can also lead to the problems described above. In a worst-case scenario, money may have to be repaid to the customer if the finished product does not live up to expectations. Clear, understandable requirements can save a great deal of money here.

Standards for text quality in requirements engineering

Guidance on how to write requirements comes from the ISO/IEC/IEEE 29148 standard and from the guiding principle of the three Cs: "Completeness, Consistency and Correctness" (source). Alspaugh et al. also identify "clarity" as an important linguistic quality attribute of requirements specifications (source: Alspaugh, T.A., Elliott, S., Winbladh, S.K., Diallo, M.H., Naslavsky, L., Ziv, H., & Richardson, D. (2006). The Importance of Clarity in Usable Requirements Specification Formats).

But how can these quality attributes be implemented in language? This will be examined in the following sections.

Correctness: Spelling and Grammar

Correctness is a basic property of text. It applies both at the content level and at the formal level of language. This section deals with the "how", that is, with the formal level. Textual correctness is not about stylistic features. It is about whether a text is free from errors when measured against linguistic quality requirements. These requirements are laid down in spelling and grammar rules; in German, for example, the Duden reference books are extremely widespread and widely accepted.

Clarity: Authoritativeness, Unambiguity, Comprehensibility

Clarity is a property of text that cannot be outlined as precisely as correctness. This is where the parameters of authoritativeness, unambiguity and comprehensibility come in, which together define the property "clarity" in greater detail. The following sections describe what these properties mean for a text. Each of the three properties has further features besides those mentioned here, which will not be discussed in detail.

Authoritativeness

Expressing authoritativeness by language means, above all, avoiding vague modes of expression. In German, this includes:

"man": The indefinite pronoun "man" translates as "one", a pronoun that has largely fallen out of use in modern English. It acts as the subject of the sentence but does not refer to anyone in particular. In practice, an instruction phrased with "man" may address no one explicitly, so no one feels responsible for carrying it out. The use of "man" can also leave stakeholders unclear about which people will play specific roles in the development process. This can easily lead to misunderstandings.
Past subjunctive: The past subjunctive sometimes occurs together with "man", but also frequently with a specific subject. Even then, constructions such as "Jana would use the software" remain vague: they make no clear statement as to whether or not Jana will really use the software.
Modal verbs: Like the subjunctive, modal verbs can also detract from the authoritativeness of a statement. They add a degree of uncertainty to the main verb. “The system may crash if XY occurs” says nothing about what will really happen if XY actually does occur. In requirements engineering, however, it is important for facts to be described as precisely as possible and not just as a possibility.
"beziehungsweise" and "bzw.": The German word "beziehungsweise" and its abbreviation "bzw." are inappropriate for use in written requirements. Why? "Beziehungsweise" has two meanings: "more precisely" and "in the other case" (source).
For sentences following the pattern “[Statement 1] beziehungsweise [Statement 2]”, this means:
- Statement 2 substantiates statement 1
- Statement 2 adds a “counter case” to statement 1
If a statement needs to be made more precise, it was probably not formulated authoritatively and meaningfully enough in the first place. This can severely impair how the content of the requirement is received. Formulating a counter case within the same sentence is neither clear nor unambiguous; it is better to write two separate statements. Authoritativeness also comes into play again here: a statement quickly appears non-authoritative if a contrary statement is made in the same sentence.
Softeners: "Vielleicht" (maybe, perhaps) is a good example of how softeners (linguistic hedges) diminish authoritativeness. If a requirement states that action A causes reaction B in the system, it is clear to the development team how the system must be designed to behave. The statement that action A may cause reaction B, on the other hand, is completely non-authoritative and thus essentially useless.

Unambiguity

Ambiguities must be avoided to achieve unambiguity. To do so, it is worth looking more closely at the following linguistic entities:

Passive constructions: Passive constructions often pose a problem because they focus on an action but leave out the agent. Depending on the type of text, this can make sense, but usually not in the context of requirements engineering. A requirement is most unambiguous when the agent is specified.
Pronominal resumption: Referring back to a noun with a pronoun is not in itself a bad thing. However, in some cases it can lead to ambiguities.
“The ball was round. It was on the table”, for example, does not present a problem. The first sentence has only one noun and the second sentence only one pronoun. This works grammatically and immediately suggests that “it” resumes the noun “ball”.
The following colloquial example, however, does pose a problem: "The dog put the ball on the table. It was beat." It cannot be clearly determined whether "it" refers to the dog, the ball or the table, so an ambiguity has been created. Such constructions should therefore be avoided, e.g. by repeating the noun instead of resuming it with a pronoun.
Subject-object placement: The position of subject and object plays an important role in avoiding ambiguity. In German, the subject usually comes before the object. It is permissible, though unconventional, to put the object before the subject, but this reversed order can cause ambiguities. German readers subconsciously expect subject-object order and may therefore interpret the sentence incorrectly. German publications, for example, use object-subject order to liven up texts. In requirements engineering, however, the focus should be on unambiguity, so subject-object order must always be used.

Comprehensibility

There are many factors that influence the comprehensibility of a text. Avoiding or limiting the following constructions will put you on the right track to comprehensible texts:

Prepositional phrases: Prepositional phrases are not a problem in and of themselves; they only become a problem when a sentence contains too many of them. Above a certain number, it becomes difficult to identify how the prepositional phrases relate to each other. An extreme example is "leakage at the corrugated fuel vent pipe from the right tank chamber to the tank filler neck as a result of kink damage during tank assembly." There is no generally agreed method for determining how many prepositional phrases are too many. A threshold of four per sentence, for example, may be appropriate, but the exact value must be determined individually for each company or department.
Units of meaning: Many units of meaning in a sentence result in a high density of information. If information density is very high, text comprehension suffers; simply put, the brain has to absorb too much input in too little space. As with prepositional phrases, the line between many and too many units of meaning must be determined on a case-by-case basis.
Long sentences: Long sentences tend to reduce comprehensibility. The longer the sentence, the greater the risk that readers no longer know exactly how it began by the time they reach the end. The sentence may have to be read several times, and extracting the information becomes more difficult overall. It is therefore important to keep sentences as short as possible, especially in requirements engineering. A maximum length of 20 to 26 words could serve as a reference point, but company-specific or subject-specific quality requirements should also be taken into consideration here.
The verb part problem: One feature of the German language is the occurrence of verbal brackets. An example: “Mit dem Produkt kann der Schalter, der für das Licht zuständig ist, ohne Probleme an die Wand angebracht werden” (“With the product, the switch responsible for the light can be attached to the wall without any problems”). There are 13 words between the verb parts “kann” (“can”) and “angebracht werden” (“be attached”). The more words that separate the verb parts, the less understandable the sentence. This challenge often occurs in very long sentences. To maintain comprehensibility, it is important to keep verb parts as close together as possible.
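The thresholds discussed above lend themselves to simple numeric checks. The following Python sketch is a rough illustration, not how Congree or any other specific tool works. It counts the words in a sentence and approximates prepositional phrases by counting tokens from a small, hypothetical preposition list. It also measures the gap between two verb parts. The caller has to pass in the verb parts explicitly, because this sketch has no parser (a real tool would identify them by syntactic analysis). The limits of 26 words and 4 prepositions are the example values from the text, not fixed rules.

```python
import re

# Example thresholds from the text; real values are set per company or department.
MAX_WORDS = 26
MAX_PREPOSITIONS = 4

# Hypothetical, deliberately small list of English prepositions. Counting these
# tokens is only a crude proxy for counting prepositional phrases.
PREPOSITIONS = {"at", "by", "during", "for", "from", "in", "of", "on", "to", "with"}


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lowercase word tokens (Unicode-aware, keeps umlauts)."""
    return re.findall(r"\w+", sentence.lower())


def check_sentence(sentence: str) -> list[str]:
    """Return findings for sentence length and preposition count."""
    tokens = tokenize(sentence)
    findings = []
    if len(tokens) > MAX_WORDS:
        findings.append(f"sentence too long ({len(tokens)} words)")
    prepositions = sum(1 for token in tokens if token in PREPOSITIONS)
    if prepositions > MAX_PREPOSITIONS:
        findings.append(f"too many prepositions ({prepositions})")
    return findings


def verb_bracket_gap(sentence: str, first_part: str, second_part: str) -> int:
    """Count the words between two verb parts, e.g. "kann" ... "angebracht".

    The caller supplies the verb parts; finding them automatically would
    require a syntactic parser, which this sketch does not include.
    """
    tokens = tokenize(sentence)
    start = tokens.index(first_part.lower())
    end = tokens.index(second_part.lower(), start + 1)
    return end - start - 1


leakage = ("leakage at the corrugated fuel vent pipe from the right tank chamber "
           "to the tank filler neck as a result of kink damage during tank assembly.")
switch = ("Mit dem Produkt kann der Schalter, der für das Licht zuständig ist, "
          "ohne Probleme an die Wand angebracht werden")

print(check_sentence(leakage))                         # ['too many prepositions (5)']
print(verb_bracket_gap(switch, "kann", "angebracht"))  # 13
```

The leakage example has 26 words, so it just stays within the length limit, but its five listed prepositions exceed the example threshold. The gap of 13 words reproduces the count given for the German example sentence.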

Consistency: terminology and reuse

Consistency is a complex property that applies at several linguistic levels. One is stylistic consistency: a consistent style across sentences, texts and possibly even departments. One example is the requirement to always present the same facts in the same way. This is an important factor for linguistic success in requirements engineering. However, this chapter focuses on terminological consistency and on maintaining consistency through reuse.

Terminological consistency

The correct and consistent use of terminology benefits companies in many ways, and requirements engineering is no exception. For this reason, it is advisable to include a glossary with the requirements specification that provides an overview of the relevant technical vocabulary in the document.

If the requirements consistently contain the terms defined in the glossary, the positive consequences are (source):

Stakeholders who do not yet know the technical vocabulary are able to familiarize themselves with it.
All stakeholders "speak the same language", which avoids misunderstandings and potential ambiguities.

When drafting requirements, it can be a challenge to use technical vocabulary consistently without creating terminological variants. Overcoming this challenge should be given high priority. Inconsistent terminology not only nullifies the benefits described, it also makes content in the requirements specification difficult or impossible to find. The wording no longer matches the defined terminology, which serves as an aid for searching and as orientation for stakeholders.
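The basic idea of checking requirements against a glossary can be shown with a deliberately simple Python sketch. It assumes a hypothetical glossary excerpt that maps known, non-approved variants to their approved terms, and it flags every variant it finds. This is not the linguistic method described later in this whitepaper. The sketch only matches surface strings, so inflected forms or variants missing from the list will slip through.

```python
import re

# Hypothetical glossary excerpt: non-approved variant -> approved term.
GLOSSARY_VARIANTS = {
    "e-mail": "email",
    "user name": "username",
    "log-on": "login",
}


def find_term_variants(text: str) -> list[tuple[str, str]]:
    """Return (variant found, approved term) pairs in order of appearance.

    Surface-string matching only: unlisted or inflected variants are missed.
    """
    hits = []
    for variant, approved in GLOSSARY_VARIANTS.items():
        # \b restricts matches to whole words; re.escape handles "-" and spaces.
        pattern = r"\b" + re.escape(variant) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), approved))
    # Sort by position so findings follow the reading order of the text.
    return [(found, approved) for _, found, approved in sorted(hits)]


requirement = "The user name and E-mail address shall be validated at log-on."
print(find_term_variants(requirement))
# [('user name', 'username'), ('E-mail', 'email'), ('log-on', 'login')]
```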

Consistency from reuse

The opposite of consistency is, simply put, a diversity of variants. This applies to terminology as well as to entire sentences or even modules. Companies create more and more content over time, and requirements engineering is no exception. It is unlikely that every new sentence expresses a new idea; frequently, the same content already exists in a similar form. When wording is unconstrained, however, the newly drafted sentence will probably be phrased differently from the existing one. As a result, two linguistic variants of the same content wind up in the company's sentence repertoire.

These sentence variants must be avoided. Saving sentences that meet all linguistic quality requirements and proposing them for reuse when writing requirements has numerous benefits. Reusing proposed sentences shortens the time spent writing. Stakeholders who have already worked with similar requirements documents will encounter familiar sentences that they have already understood. Using existing, tried-and-tested phrasing in a document therefore increases the comprehensibility of the text.
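Fuzzy string matching can illustrate how a tool might propose an existing sentence for reuse. The Python example below uses the standard library's `difflib.SequenceMatcher` to compare a new draft against a small, hypothetical repository of approved sentences. It proposes the closest match if the similarity exceeds a threshold (0.8 here, an arbitrary assumption). Production tools may use different, linguistically informed matching; this sketch only demonstrates the principle.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical repository of sentences that have already passed review.
APPROVED_SENTENCES = [
    "The system shall log every failed login attempt.",
    "The system shall lock the user account after three failed login attempts.",
    "The operator shall confirm each emergency stop on the control panel.",
]

SIMILARITY_THRESHOLD = 0.8  # assumption; would need tuning in practice


def similarity(a: str, b: str) -> float:
    """Case-insensitive, character-based similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_reuse(draft: str) -> Optional[str]:
    """Return the most similar approved sentence, or None if none is close enough."""
    best = max(APPROVED_SENTENCES, key=lambda sentence: similarity(draft, sentence))
    return best if similarity(draft, best) >= SIMILARITY_THRESHOLD else None


print(propose_reuse("The system must log every failed log-in attempt."))
# The system shall log every failed login attempt.
print(propose_reuse("The display shall show the current tank pressure in bar."))
# None
```

Here, the first draft is a wording variant of an approved sentence, so the approved version is proposed instead. The second draft expresses new content and gets no proposal.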

Completeness: sentence templates

In requirements engineering, completeness is usually understood as a content-related criterion. In linguistic terms, syntactic completeness, i.e. structural completeness at the sentence level, is also relevant: only complete sentences result in correct requirements.

Syntactic completeness can be formalized, for example, by using sentence templates.

Common methods for text quality assurance in requirements engineering

There are already widespread methods for meeting the text-related challenges accompanying natural language requirements in requirements engineering. They can be roughly divided into two areas:

Restriction of the linguistic inventory
Word list and pattern-based search for unwanted constructions

Both are examined in the following sections.

Restriction of the linguistic inventory

Types of text such as poetry and prose draw on the myriad of words and styles of a language. In requirements engineering, the opposite is true: texts should be functional, not “beautiful”. Correct, consistent and complete. One way to achieve this is to apply a “requirements syntax”. The quote below explains the “requirements syntax” principle:

"There is a set syntax (structure), with an underlying ruleset. A small number of keywords are used to denote the different clauses of an EARS requirement. The clauses are always in the same order, following temporal logic. The syntax and the keywords closely match common usage of English and are therefore intuitive."

Although the quote refers to the well-known Easy Approach to Requirements Syntax (EARS), it aptly illustrates what “requirements syntax” is all about.
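The following Python sketch makes the idea of a requirements syntax concrete. It checks whether a sentence matches one of the basic EARS patterns: ubiquitous, event-driven, state-driven, unwanted behavior and optional feature. The regular expressions are a simplified assumption. They check only the keywords and their order, not whether each clause is meaningful, and they do not cover complex EARS requirements that combine several keywords.

```python
import re
from typing import Optional

# Simplified regular expressions for the basic EARS patterns.
# Each one checks only the keywords and the order of the clauses.
EARS_PATTERNS = {
    "event-driven": r"When .+, the .+ shall .+",
    "state-driven": r"While .+, the .+ shall .+",
    "unwanted behavior": r"If .+, then the .+ shall .+",
    "optional feature": r"Where .+, the .+ shall .+",
    "ubiquitous": r"The .+ shall .+",
}


def classify_ears(sentence: str) -> Optional[str]:
    """Return the name of the first matching EARS pattern, or None."""
    for name, pattern in EARS_PATTERNS.items():
        if re.fullmatch(pattern, sentence.strip()):
            return name
    return None


for requirement in [
    "When the tank is full, the pump shall stop.",
    "If the sensor fails, then the controller shall raise an alarm.",
    "The system shall log every failed login attempt.",
    "The pump may stop if the tank is full.",
]:
    print(classify_ears(requirement), "|", requirement)
```

The last sentence is rejected because it does not follow any of the fixed structures, which is exactly the kind of deviation a requirements syntax is meant to prevent.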

A fixed, rule-based structure and a set of permitted words: there is some similarity here to prescriptive controlled languages.

Controlled languages following the prescriptive approach define "permitted" terms and structures, while proscriptive approaches define language entities that are "not permitted". Both approaches have advantages and disadvantages.

The advantage of the prescriptive approach is that defining permitted structures leaves less wording leeway than the proscriptive approach does. This can have a positive effect on the consistency of the text, especially if several authors are involved. However, authors may perceive the restricted leeway as a disadvantage, which can create acceptance problems. Moreover, the higher number of rules to be learned suggests a relatively higher training overhead for the prescriptive approach.

The advantage of the proscriptive approach is that it leaves the technical writer more wording leeway. It also results in comparatively more specific notifications from checking programs, as they clearly name the author's violations. Its disadvantage is that the many "non-incorrect" ways of writing a text leave more room for inconsistency. In simple terms, the weaknesses of one approach are the strengths of the other.
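A toy Python checker can make the contrast between the two approaches visible. Both word sets are hypothetical. The prescriptive check reports every word that is not in an approved vocabulary. The proscriptive check reports only words from a list of forbidden expressions, which is why its messages can name the specific violation.

```python
import re

# Hypothetical approved vocabulary for a prescriptive check.
ALLOWED_WORDS = {"the", "system", "shall", "log", "every", "failed", "login", "attempt"}

# Hypothetical forbidden expressions for a proscriptive check, each with a reason.
FORBIDDEN_WORDS = {
    "maybe": "softener: states a possibility instead of a requirement",
    "should": "weak modal verb: use 'shall' for binding requirements",
}


def words(sentence: str) -> list[str]:
    return re.findall(r"\w+", sentence.lower())


def prescriptive_check(sentence: str) -> list[str]:
    """Flag every word that is not explicitly permitted."""
    return [f"'{w}' is not permitted" for w in words(sentence) if w not in ALLOWED_WORDS]


def proscriptive_check(sentence: str) -> list[str]:
    """Flag only explicitly forbidden words, with a specific reason for each."""
    return [f"'{w}': {FORBIDDEN_WORDS[w]}" for w in words(sentence) if w in FORBIDDEN_WORDS]


weak = "The system should maybe log every failed login attempt."
rephrased = "The system shall record each failed login attempt."
print(prescriptive_check(weak))       # generic messages for "should" and "maybe"
print(proscriptive_check(weak))       # messages that explain each violation
print(prescriptive_check(rephrased))  # "record" and "each" are not in the vocabulary
print(proscriptive_check(rephrased))  # nothing forbidden, so no findings
```

The rephrased sentence shows the trade-off. The prescriptive check rejects perfectly reasonable wording, while the proscriptive check accepts it and thereby allows more variants.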

Word list and pattern-based search for unwanted constructions

In addition to the option of applying a “requirements syntax”, other methods exist for ensuring text quality in requirements engineering. These can be classified under the proscriptive approach and are often implemented on the basis of word lists. Specifically, it is a matter of detecting and flagging the following undesirable constructions in requirements:

Weak words
Multiplicity
Negations
Incompleteness (e.g. indicated by authors as “TBA” or “TBD”)
Passive
Subjective formulations
Optionality
Implicitness
Ambiguity
Comparative phrases (source: Reuther, Ursula and Koch, Matthias (n.d.): Automatisierte Qualitätssicherung von Anforderungen mit Hilfe linguistischer Regeln)
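A few lines of Python are enough to sketch a word-list check of this kind. The category names follow the list above, but the word lists are small, hypothetical examples; real tools use much larger, curated lists. The check is purely lexical and ignores where in the document the sentence appears.

```python
import re

# Hypothetical, heavily abbreviated word lists per category.
WORD_LISTS = {
    "weak word": ["appropriate", "user-friendly", "fast", "as far as possible"],
    "negation": ["not", "never", "no"],
    "incompleteness": ["TBD", "TBA"],
    "optionality": ["possibly", "optionally", "if necessary"],
}


def word_list_check(sentence: str) -> list[tuple[str, str]]:
    """Return (category, matched expression) pairs found in the sentence."""
    findings = []
    for category, expressions in WORD_LISTS.items():
        for expression in expressions:
            # \b prevents matches inside longer words, e.g. "no" in "note".
            pattern = r"\b" + re.escape(expression) + r"\b"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                findings.append((category, expression))
    return findings


print(word_list_check("The response time shall be appropriate (value TBD)."))
# [('weak word', 'appropriate'), ('incompleteness', 'TBD')]
```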

The methods presented recognize a host of critical linguistic constructs in the context of requirements engineering and help in the drafting of appropriate texts.

But they also have weaknesses. In the case of "requirements syntax", the first weakness is the need for training. In addition, the syntax still leaves gaps that writers have to fill in themselves, which opens up a certain margin of error.

Detecting errors is fundamental to the proscriptive approach. Common methods based on word lists detect and flag many problematic issues. However, they miss underlying textual problems that word lists cannot capture.

Moreover, word list searches operate without context. They find many potential errors, but they also generate many unnecessary warnings. The reason is that text quality problems matter in different ways in different text structures (headings, lists, body text, etc.). This context insensitivity remains when word list searches are extended by pattern-based search methods, e.g. based on regular expressions.

Pattern-based search methods perform better than the word list method. However, linguistically complex phenomena such as inflection or term variants can be covered only inadequately or not at all. Similarly, purely string-based search cannot identify syntactic dependencies. For example, "no longer" may count as a weak word in a sentence such as "The length must be no longer than 7 meters." However, a string-based rule that flags this construction will also flag correct sentences such as "The value no longer appears as an arrow, but as a pulsing signal."
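The false positive described above is easy to reproduce. The Python sketch below defines a hypothetical string-based rule that flags "no longer" as a weak word. The rule matches both example sentences, although only the first contains the comparative construction it is aimed at.

```python
import re

# Hypothetical string-based rule aimed at the comparative "no longer (than)".
NO_LONGER_RULE = re.compile(r"\bno longer\b", re.IGNORECASE)

sentences = [
    "The length must be no longer than 7 meters.",                        # intended hit
    "The value no longer appears as an arrow, but as a pulsing signal.",  # false positive
]

flagged = [s for s in sentences if NO_LONGER_RULE.search(s)]
for sentence in flagged:
    print("flagged:", sentence)
```

Narrowing the pattern to "no longer than" would fix this particular case. However, each such patch is another string-level special case, not an analysis of the sentence structure.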

Modern authoring assistance in requirements engineering

Differences from traditional approaches

In most cases, commonly used software for text quality assurance in requirements engineering is based on a proscriptive approach: it checks for non-permitted constructions on the basis of formal rules. Word lists and pattern-based searches can be used for this purpose.

Software like Congree Authoring Server goes a step further. Its language check is based on linguistic algorithms. This means that the software’s linguistic engine breaks down the texts being analyzed into their smallest meaningful elements, morphemes, and identifies the sentence structure through syntactic analysis. Even complex grammar errors, stylistic errors or complex variants of non-permitted terms can thus be detected.

General spelling and grammar
- Detected by traditional approaches: to a limited extent
- Detected by Congree: yes
"man"
- Detected by traditional approaches: yes
- Detected by Congree: yes
Past subjunctive
- Detected by traditional approaches: to a limited extent
- Detected by Congree: yes
Modal verbs
- Detected by traditional approaches: to a limited extent
- Detected by Congree: yes
"beziehungsweise"/"bzw."
- Detected by traditional approaches: yes
- Detected by Congree: yes
Softeners
- Detected by traditional approaches: yes
- Detected by Congree: yes
Passive
- Detected by traditional approaches: to a limited extent
- Detected by Congree: yes
Pronominal resumption
- Detected by traditional approaches: no
- Detected by Congree: yes
Subject-object placement
- Detected by traditional approaches: no
- Detected by Congree: yes
Too many prepositional phrases
- Detected by traditional approaches: yes
- Detected by Congree: yes
Too many units of meaning
- Detected by traditional approaches: no
- Detected by Congree: yes
Long sentences
- Detected by traditional approaches: yes
- Detected by Congree: yes
Verb parts too far apart
- Detected by traditional approaches: no
- Detected by Congree: yes
Terminology
- Detected by traditional approaches: yes
- Detected by Congree: yes, terminological variants included
Sentence reuse
- Detected by traditional approaches: no
- Detected by Congree: yes
Sentence templates
- Detected by traditional approaches: yes
- Detected by Congree: no

The main difference between modern linguistic-based authoring assistance software and traditional text quality assurance tools for requirements engineering is a deeper, more complex linguistic analysis. Another distinguishing feature is context-sensitive checking: software such as Congree Authoring Server applies its checking rules depending on the document structure. Depending on the configuration, different rules apply to titles, paragraphs and lists. This minimizes the risk of false positives.
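One way to illustrate context-sensitive checking is to select rules according to the structural element a text belongs to. The Python sketch below is an assumption-based illustration, not Congree's configuration format. The element types, rules and limits are hypothetical, including the rule that a paragraph must contain a binding "shall". Headings only get a short length limit. As a result, a heading such as "Error handling" does not trigger rules that only make sense for full requirement sentences.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A rule takes the text of one element and returns a message, or None if it passes.
Rule = Callable[[str], Optional[str]]


@dataclass
class TextElement:
    kind: str  # hypothetical element types: "heading", "list_item", "paragraph"
    text: str


def max_words(limit: int) -> Rule:
    """Build a rule that flags texts longer than `limit` words."""
    def rule(text: str) -> Optional[str]:
        count = len(text.split())
        return f"more than {limit} words ({count})" if count > limit else None
    return rule


def requires_shall(text: str) -> Optional[str]:
    """Hypothetical full-sentence rule: a requirement must contain 'shall'."""
    return None if "shall" in text.lower().split() else "no binding 'shall'"


# Hypothetical rule configuration per structural element.
RULES_BY_KIND: dict[str, list[Rule]] = {
    "heading": [max_words(8)],
    "list_item": [max_words(20)],
    "paragraph": [max_words(26), requires_shall],
}


def check_document(elements: list[TextElement]) -> list[tuple[str, str, str]]:
    """Apply only the rules configured for each element's type."""
    findings = []
    for element in elements:
        for rule in RULES_BY_KIND.get(element.kind, []):
            message = rule(element.text)
            if message is not None:
                findings.append((element.kind, element.text, message))
    return findings


document = [
    TextElement("heading", "Error handling"),
    TextElement("paragraph", "The system shall log every failed login attempt."),
    TextElement("paragraph", "Failed login attempts are logged."),
]
for finding in check_document(document):
    print(finding)
```

Only the last paragraph is reported. The heading contains no "shall" either, but that rule is not configured for headings.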

Feasibility

A large number of rules have already been applied successfully in technical documentation for many years. Most of these rules also correspond to the rules used in requirements engineering (source: Reuther, Ursula and Koch, Matthias (n.d.): Automatisierte Qualitätssicherung von Anforderungen mit Hilfe linguistischer Regeln). The repertoire of rules can also be expanded to cover company-specific rule requirements.

The question of feasibility always includes the specific technical implementation. Language checking software must be available where text is created, which is why this type of software usually offers a wide range of integration options. Congree Authoring Server, as an example of modern, linguistic-based authoring assistance software, offers several basic paths to integration:

Full integration

The software can be installed as a plugin for a wide range of desktop applications. Examples include the Microsoft Office applications, such as Microsoft Word, which is frequently used as an editor for requirements documents.

Browser plugin

The Congree browser plugin is designed for requirements documents that are created or edited in browser-based applications. It enables the use of the Congree Language Check wherever text is created online, e.g. in Atlassian JIRA.

APIs

If the software used in requirements engineering has no full integration option, authoring assistance can still be easily integrated via APIs.

Conclusion

Requirements engineering takes place when requirements are formulated for a system or product, usually in natural language. Every project involves different stakeholders with different interests, levels of knowledge and perspectives, so comprehensibility plays a key role. However, other criteria such as correctness, unambiguity, authoritativeness, and terminological as well as stylistic consistency are also linguistically relevant for formulating good, target-oriented requirements.

There is a wide range of traditional approaches and tools that cover many linguistic problem areas in requirements engineering. However, they leave many gaps in coverage, primarily for methodological reasons: such tools rely on word list and pattern-based analysis methods. Modern, linguistic-based authoring assistance software fills these gaps. By breaking down linguistic content into morphemes, it can analyze sentences in depth. The analysis is context-sensitive, which minimizes unnecessary false positives. One example of such linguistic-based software is Congree Authoring Server, which combines its Congree Language Check with a terminology component and the option to reuse sentences.

This enables requirements to be optimized with software assistance, mitigating the problems caused by poorly written requirements documents. This benefits all stakeholders in the process.
