Our term extraction services provide you with optimal support for compiling your terminology. If you already use term extraction tools, you are welcome to compare their results with those of our linguistically based approach.
You do not have any company-specific terminology yet but would like to create a corporate terminology?
You want to extend your existing terminology on the basis of your document collection?
So far you have been compiling your terminology without tool assistance?
The basis of our analysis is a set of mature language technologies. A morphological analysis component determines the lemma of each word, its syntactic category and other grammatical information such as gender and number of nouns.
Lemmatization collects different word forms under one lemma (configuration, configurations). A grammatical analysis component detects agreement within word groups and recognizes sentence patterns.
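The grouping of word forms under one lemma can be illustrated with a minimal sketch. It assumes a small hand-written lookup table (`LEMMA_TABLE`, hypothetical) in place of a real morphological analysis component, and the function name `group_by_lemma` is likewise only illustrative.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for a real morphological analyzer.
LEMMA_TABLE = {
    "configuration": "configuration",
    "configurations": "configuration",
    "absorbers": "absorber",
}

def group_by_lemma(word_forms):
    """Collect the word forms seen in a text under their lemma."""
    groups = defaultdict(set)
    for form in word_forms:
        key = form.lower()
        # Unknown forms fall back to themselves as lemma (an assumption).
        lemma = LEMMA_TABLE.get(key, key)
        groups[lemma].add(form)
    return dict(groups)

groups = group_by_lemma(["configuration", "Configurations", "absorbers"])
```

Here both forms of "configuration" end up under a single entry, which is what later allows frequencies to be counted per term rather than per word form.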
Term candidates are determined on the basis of linguistic patterns, in particular:
- multi-word compounds built from adjectives and nouns (acoustic absorber, active navigation system)
- compound nouns (acrylester, handgrip, microsurgery)
- derived or simplex nouns (maneuverability, rubber)
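As a rough illustration of pattern-based candidate detection, the following sketch collects runs of adjectives followed by one or more nouns from tokens that are already lemmatized and tagged. The tag names (`ADJ`, `NOUN`) and the function `extract_candidates` are assumptions made for this example. The sketch does not analyze the internal structure of compound nouns.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (lemma, part-of-speech tag); tag set is assumed

def extract_candidates(tokens: List[Tagged]) -> List[str]:
    """Return maximal sequences of the form ADJ* NOUN+ as term candidates."""
    candidates = []
    i, n = 0, len(tokens)
    while i < n:
        j = i
        while j < n and tokens[j][1] == "ADJ":   # optional adjectives
            j += 1
        k = j
        while k < n and tokens[k][1] == "NOUN":  # one or more nouns
            k += 1
        if k > j:
            candidates.append(" ".join(lemma for lemma, _ in tokens[i:k]))
            i = k
        else:
            # No noun followed: skip the adjectives (or the current token).
            i = max(j, i + 1)
    return candidates

sentence = [
    ("the", "DET"), ("active", "ADJ"), ("navigation", "NOUN"),
    ("system", "NOUN"), ("use", "VERB"), ("an", "DET"),
    ("acoustic", "ADJ"), ("absorber", "NOUN"), ("and", "CONJ"),
    ("rubber", "NOUN"),
]
candidates = extract_candidates(sentence)
```

For the sample sentence this yields the multi-word candidates "active navigation system" and "acoustic absorber" as well as the simplex noun "rubber".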
Statistical analysis determines the frequency of occurrence of term candidates as well as the number of documents in which a term candidate occurs. These statistical figures help you decide whether a term candidate should be added to your terminology.
The extraction results include:
- lemmatized terms
- grammatical features
- different types of term formation patterns
- frequency of occurrence per term candidate
- number of documents in which a term occurs
- list of documents in which a term occurs
- sample contexts
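The statistical figures in the list above (frequency, number of documents, list of documents) could be computed along the following lines. This is only a sketch: it assumes that the term candidates of each document are already available as a list of lemmatized strings, and the names `candidate_statistics` and the file names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List

def candidate_statistics(docs: Dict[str, List[str]]) -> Dict[str, dict]:
    """Compute frequency, document count and document list per candidate."""
    frequency = defaultdict(int)
    documents = defaultdict(set)
    for doc_name, candidates in docs.items():
        for term in candidates:
            frequency[term] += 1            # every occurrence counts
            documents[term].add(doc_name)   # each document counts once
    return {
        term: {
            "frequency": frequency[term],
            "document_count": len(documents[term]),
            "documents": sorted(documents[term]),
        }
        for term in frequency
    }

# Hypothetical input: candidates already extracted per document.
stats = candidate_statistics({
    "manual.txt": ["handgrip", "handgrip", "rubber"],
    "spec.txt": ["handgrip", "acoustic absorber"],
})
```

A candidate that is frequent but confined to one document may deserve different treatment than one spread across the whole collection, which is why both figures are kept side by side.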
The sample contexts for each term candidate are determined by a combination of statistical and linguistic methods. They allow you to check the distribution of a term and how it is used in different environments.
Taking metadata into account, such as the document type or the department for which a document is relevant, gives you a specific view of how terms are used across departments, text types and the like.
The term extraction process can take existing terminology into account. Terms that are already part of the existing terminology are marked. This gives you an idea of how frequently existing terminology is used in your documents and indicates which terms need to be added to it.
You also see which terms from the existing terminology may be irrelevant because they do not occur in any of your documents.
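The comparison with an existing terminology can be sketched as a simple set comparison. The sketch assumes that candidate frequencies are available as a dictionary and that the existing terminology is a set of lemmatized terms. The function name and the result keys are illustrative only.

```python
from typing import Dict, Set

def compare_with_terminology(
    candidate_freq: Dict[str, int], existing_terms: Set[str]
) -> Dict[str, object]:
    """Split candidates into known and new terms and list unused terms."""
    known = {t: f for t, f in candidate_freq.items() if t in existing_terms}
    new = {t: f for t, f in candidate_freq.items() if t not in existing_terms}
    # Existing terms that never occur in the document collection.
    unused = sorted(existing_terms - set(candidate_freq))
    return {"known": known, "new": new, "unused": unused}

# Hypothetical data for illustration.
report = compare_with_terminology(
    {"handgrip": 3, "rubber": 1, "acoustic absorber": 1},
    {"handgrip", "microsurgery"},
)
```

In this example "handgrip" is marked as known, "rubber" and "acoustic absorber" are candidates for addition, and "microsurgery" is reported as unused.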
You can combine term extraction with terminology evaluation. This provides you with information on:
- whether term candidates contain misspelled items,
- which term candidates are problematic according to term formation rules (for example, extremely complex compounds),
- which term candidates are possibly variants of each other, such as cost reduction, reduction of costs, or reducing costs.
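One simple way to spot possible variants is to reduce each candidate to its set of content-word stems and group candidates that share the same set. The sketch below is a deliberately crude stand-in for real linguistic analysis: the stopword list, the suffix list and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from typing import List

STOPWORDS = {"of", "the", "a", "an", "for"}   # assumed, far from complete
SUFFIXES = ("tion", "ing", "s")               # crude suffix stripping only

def crude_stem(word: str) -> str:
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_variants(candidates: List[str]) -> List[List[str]]:
    """Group candidates whose content words reduce to the same stems."""
    groups = defaultdict(list)
    for candidate in candidates:
        words = [w for w in candidate.lower().split() if w not in STOPWORDS]
        groups[frozenset(crude_stem(w) for w in words)].append(candidate)
    # Only groups with more than one member are possible variants.
    return [members for members in groups.values() if len(members) > 1]

variants = group_variants(
    ["cost reduction", "reduction of costs", "reducing costs", "handgrip"]
)
```

With this input the three cost-related candidates fall into one group, while "handgrip" has no variant. Such groups are suggestions for review, not decisions about which form to prefer.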
Contact us with your requirements for terminology extraction from your existing documents.
We can meet your special requirements for term extraction as well.