Friday, March 17, 2017

Vocabulary Control

Information Access Through The Subject


The term ‘vocabulary control’ refers to the use of a limited set of terms that must be employed both to index documents and to search for them in a particular system. Such a controlled vocabulary may be defined as a list of terms, showing their relationships, that is used to represent the specific subjects of documents.

An information system may help the user by explicitly assigning index terms (that is, words or notations) to the documents and by controlling, at least in alphabetical (word) systems, the semantic and often the syntactic relationships between these index terms. The words (which may be subject headings or descriptors) are assigned from recognized subject heading lists or thesauri, and the notations from recognized classification schedules; such systems thus use a controlled vocabulary. A controlled vocabulary is one in which there is only one term or notation for any one concept. The Library of Congress Subject Headings is an example of a controlled alphabetical vocabulary, and the Dewey Decimal Classification is an example of a notational vocabulary (by definition, all notational vocabularies must be controlled).

The controlled vocabulary performs several tasks:
  • It usually records the hierarchical and affinitive/associative relations of a concept explicitly. Examples: Allergy, narrower term: Hay fever; 385 (Railroad transportation), 385.1 (economic aspects of railroad transportation).
  • It establishes the size and scope of each topic. For example, whether or not the word baseball or the notation 796.357 is to include the concept softball.
In addition, in word-based systems the controlled vocabulary identifies synonymous terms and selects one preferred term among them. For homonyms, it explicitly identifies the multiple concepts expressed by that word or phrase. In short, vocabulary control helps overcome the problems that arise from using natural language to express a document’s subject. Without vocabulary control, different indexers, or the same indexer on different occasions, may use different terms for the same concept when indexing documents on the same subject, and searchers may use yet another set of terms for that subject. The result is a mismatch between index terms and search terms, which degrades information retrieval.
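
The mismatch described above can be illustrated with a minimal Python sketch. It assumes a hypothetical entry vocabulary (the `PREFERRED` dictionary) that maps every variant to one preferred term; the terms and document ids are illustrative only and are not taken from any real subject heading list.

```python
# Hypothetical entry vocabulary: each variant (lower-cased) -> its preferred term.
PREFERRED = {
    "disabilities": "Handicaps",
    "handicaps": "Handicaps",
    "labour": "Labor",
    "labor": "Labor",
}


def control(term: str) -> str:
    """Return the preferred term for a variant; unknown terms pass through unchanged."""
    return PREFERRED.get(term.strip().lower(), term)


def build_index(assignments: dict, controlled: bool) -> dict:
    """Map each index term to the set of document ids it was assigned to."""
    index: dict = {}
    for doc_id, terms in assignments.items():
        for term in terms:
            key = control(term) if controlled else term
            index.setdefault(key, set()).add(doc_id)
    return index


def search(index: dict, term: str, controlled: bool) -> set:
    """Look up a search term, normalising it first if the vocabulary is controlled."""
    key = control(term) if controlled else term
    return index.get(key, set())


# Two indexers describe documents on the same subject with different words.
assignments = {"doc1": ["Disabilities"], "doc2": ["Handicaps"]}

free_index = build_index(assignments, controlled=False)
ctrl_index = build_index(assignments, controlled=True)

print(sorted(search(free_index, "Handicaps", controlled=False)))   # ['doc2']
print(sorted(search(ctrl_index, "Disabilities", controlled=True)))  # ['doc1', 'doc2']
```

Without control, a search for ‘Handicaps’ misses doc1; with control, even a search on the non-preferred word retrieves both documents.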


Vocabulary Control
  • Subject Heading List
    • List of Subject Headings-General Principles
  • Thesaurus
    • Structure of Thesaurus
    • Relationship Between Terms
    • Thesauri and Subject Headings List
    • Thesauri and Classification Schemes
    • Thesauro-Facet
    • Classaurus
  • Systematic Arrangement

Subject Heading List

Subject heading has been defined as a word or group of words indicating a subject under which all materials dealing with the same theme are entered in a catalogue or bibliography, or are arranged in a file.

A vocabulary control device depends on a master list of terms that can be assigned to documents. Such a master list of terms is called ‘List of Subject Headings’. A list of subject headings contains the subject access terms (preferred terms) to be used in the cataloguing or indexing operation at hand.

List of Subject Headings-General Principles

The general principles that guide the indexers in the choice and rendering of subject headings from the standard list of subject headings are discussed in the following sub-sections:

Specific and Direct Entry - The principles of specific and direct entry require that a document be entered directly under the most specific heading that accurately and precisely represents its subject content.

Common Usage - This principle states that the word(s) used to express a subject must represent common usage.

Uniformity - The principle of uniform heading is adopted in order to show what the library collection has on a given subject. One uniform term must be selected from several synonyms and this term must be applied consistently to all documents on the topic. The heading chosen must also be unambiguous. Similarly, if there are variant spellings of the same term or different possible forms of the same headings, only one is used as the heading.

Consistent and Current Terminology - A term chosen on the basis of common usage may become obsolete with the passage of time, so a list of subject headings must be revised to incorporate current terminology. In such a situation a subject authority file must be maintained. Once a heading is changed, every record that was linked to the old heading can be relinked to the new heading, and this decision is recorded in the subject authority file.
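
The relinking step can be modelled very simply. The Python sketch below is illustrative only: the `AuthorityFile` class, the record ids and the example headings are all hypothetical. It replaces an obsolete heading, relinks every record that carried it, and logs the decision.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AuthorityFile:
    headings: set = field(default_factory=set)   # currently authorised headings
    history: list = field(default_factory=list)  # (date, old heading, new heading)

    def change_heading(self, old: str, new: str, records: dict, when: date) -> int:
        """Replace `old` by `new`, relink every record that used it, and log it.

        `records` maps a record id to the set of headings assigned to it.
        Returns the number of records that were relinked.
        """
        self.headings.discard(old)
        self.headings.add(new)
        self.history.append((when, old, new))
        relinked = 0
        for assigned in records.values():
            if old in assigned:
                assigned.remove(old)
                assigned.add(new)
                relinked += 1
        return relinked


# Hypothetical catalogue records and a terminology change.
auth = AuthorityFile({"Aeroplanes"})
records = {"r1": {"Aeroplanes", "Aviation"}, "r2": {"Aeroplanes"}, "r3": {"Ships"}}
n = auth.change_heading("Aeroplanes", "Airplanes", records, when=date(2017, 3, 17))
print(n)  # 2
```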

Form Heading - In addition to the subject headings, there are form headings that have the same appearance as topical subject headings but refer to the literary or artistic form or form of material. Libraries that want to provide access to these kinds of materials may assign appropriate form headings to individual works as well as to collections and materials about the form.

Cross Reference - Cross-references direct the user from terms not used as headings to the term that is used, and from broader and related topics to the one chosen to represent a given subject. Three types of cross-references are used in the subject headings structure. These are discussed below:

a) See (or USE) references - These references guide users from terms that are not used as headings to the authorized headings for the subject in question. ‘See’ or ‘USE’ references ensure that, in spite of different names for a given subject, a user will still be able to locate materials on it.

b) See also (including BT, NT, and RT) references - These references guide users to other headings, also used as entries in the catalogue, that are related either hierarchically or associatively. By connecting associatively related headings, ‘See also’ (RT, for related term) references draw the user’s attention to material related to his interest. By linking hierarchically related headings, ‘See also’ (BT, for broader term; NT, for narrower term) references help the user to search specific aspects of his subject of interest.

c) General references - General references direct the user to a group or category of headings instead of to individual headings; such a reference is sometimes called a ‘blanket reference’. The provision of general references in a standard list of subject headings obviates the need for long lists of specific references and thus saves space.

Thesaurus

An indexing language is a language used to describe the subject or another aspect of a document in an index. The authority list that helps to encode a document’s subject both at the input stage (index terms) and at the searching stage (search terms) is called a thesaurus.

A more formal definition of a thesaurus might be: An organized list of terms from a specialized vocabulary arranged to facilitate the selection of index terms as well as search terms.

A thesaurus differs from a conventional authority list such as Sears List of Subject Headings in that its terms need not be used singly but may be coordinated with other terms. The relationships between the terms are clearly defined by the following standard abbreviations:
  • SN    Scope Note
  • USE   Use (the preferred term)
  • UF    Used For
  • BT    Broader Term
  • NT    Narrower Term
  • RT    Related Term
  • SA    See Also
The alphabetical listing of index terms in a thesaurus consists of the following types of terms:

a) Descriptor - A term that can be used as an index term to describe concepts contained in a document. It is also known as a ‘preferred term’.

b) Non-descriptor - A term that cannot be used as an index term but appears in the thesaurus to enlarge the entry vocabulary of the indexing language, leading the user to the descriptor. Non-descriptors are also known as ‘non-preferred terms’.

Abrasive Paper    (preferred term/descriptor)
UF    Sand Paper    (non-preferred term/non-descriptor)
BT    Paper         (broader term; itself a descriptor)
RT    Abrasives     (related term; itself a descriptor)

Structure of Thesaurus

The internal form of individual entries and the arrangement of entries in relation to one another constitute the structure of a thesaurus. Cross-references make explicit the way in which entries relate to each other in a network of concepts. Each entry in a thesaurus consists of a cluster of terms related to the descriptor in different ways. The different terms in an entry are displayed in the following format:
  • DESCRIPTOR (with scope note whenever needed)
  • Synonyms and quasi-synonyms (displaying the equivalence relationship and denoted by the relationship indicators USE and UF, Used For)
  • Broader Terms (displaying hierarchical-superordinate relationship and denoted by BT)
  • Narrower Terms (displaying hierarchical-subordinate relationship and denoted by NT)
  • Related Terms (displaying associative relationship and denoted by RT)
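
The entry format above maps naturally onto a small record type. In this Python sketch the `Entry` class and its field names are assumptions, not a standard; it prints an entry in the order listed, using an abrasive-paper entry like the example given earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry: a descriptor and the terms related to it."""
    descriptor: str
    scope_note: str = ""                          # SN, only when needed
    used_for: list = field(default_factory=list)  # UF: synonyms and quasi-synonyms
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    related: list = field(default_factory=list)   # RT

    def display(self) -> str:
        """Render the entry: descriptor, SN, then UF, BT, NT and RT lines."""
        lines = [self.descriptor]
        if self.scope_note:
            lines.append(f"SN {self.scope_note}")
        for tag, terms in (("UF", self.used_for), ("BT", self.broader),
                           ("NT", self.narrower), ("RT", self.related)):
            lines.extend(f"{tag} {term}" for term in sorted(terms))
        return "\n".join(lines)


entry = Entry("Abrasive Paper", used_for=["Sand Paper"],
              broader=["Paper"], related=["Abrasives"])
print(entry.display())
```

Empty relationship lists simply produce no lines, so the same record type serves both rich entries and bare ones.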

A thesaurus may be either alphabetical or classified, and it may or may not include a graphical display. In an alphabetical thesaurus, the descriptors, each followed by its relationships, are listed in alphabetical sequence. In a classified thesaurus, the descriptors are listed in accordance with the hierarchical relationships represented in the thesaurus, with the various levels of the hierarchy shown by appropriate indentation. Graphical displays are multi-dimensional ways of representing the relationships between terms; such relationships are indicated by arrows, by lines, or by presenting terms in concentric circles showing hierarchy.
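
How a classified display derives its indentation from the hierarchy can be sketched in a few lines of Python. The `NT` mapping below is a hypothetical fragment (only ‘Rodents’ and ‘Mice’ appear elsewhere in this post), and the two-space indent per level is an arbitrary choice.

```python
# Hypothetical narrower-term (NT) links: term -> list of its narrower terms.
NT = {
    "Mammals": ["Rodents"],
    "Rodents": ["Squirrels", "Mice"],
}


def classified_display(narrower: dict, term: str, depth: int = 0) -> list:
    """Return display lines for `term` and all terms below it, indented by level."""
    lines = ["  " * depth + term]
    for child in sorted(narrower.get(term, [])):  # sibling terms in alphabetical order
        lines.extend(classified_display(narrower, child, depth + 1))
    return lines


print("\n".join(classified_display(NT, "Mammals")))
# Mammals
#   Rodents
#     Mice
#     Squirrels
```
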
Reciprocal entries appear for each term in a thesaurus whenever a relationship, whether hierarchical or non-hierarchical, is established between two terms.
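
The rule that every relationship gets a reciprocal entry is easy to enforce mechanically. In this minimal Python sketch the `Thesaurus` class and the `RECIPROCAL` table are illustrative, not taken from any particular thesaurus software; recording one direction of a relationship automatically records the other, pairing BT with NT, USE with UF, and RT with RT.

```python
from collections import defaultdict

# The tag that must appear on the other term's entry for each relationship.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self) -> None:
        # term -> relationship tag -> set of related terms
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, term: str, tag: str, other: str) -> None:
        """Record `term TAG other` together with its reciprocal on `other`."""
        self.entries[term][tag].add(other)
        self.entries[other][RECIPROCAL[tag]].add(term)


t = Thesaurus()
t.relate("Mice", "BT", "Rodents")             # hierarchical
t.relate("Birds", "RT", "Ornithology")        # associative
t.relate("Disabilities", "USE", "Handicaps")  # equivalence
print(sorted(t.entries["Rodents"]["NT"]))     # ['Mice']
print(sorted(t.entries["Handicaps"]["UF"]))   # ['Disabilities']
```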

Relationship Between Terms

The inter-relationships between the terms in a thesaurus are brought out by two basic types of relationships: (1) Hierarchical relationships, and (2) Non-hierarchical relationships.

Hierarchical Relationships - Hierarchical relationships are the superordinate and subordinate relationships of a concept. This relationship may be of three types:
a) Genus-Species (Generic) relationship
b) Hierarchical Whole-Part relationship
c) Instance relationship

a) Genus-species (Generic) Relationship links a genus and its species and forms the basis of scientific, taxonomic systems. For example,

    Rodents
    NT Mice

    Mice
    BT Rodents

Here, ‘Rodents’ is a genus (broader concept) while ‘Mice’ represents its species (narrower concept).

b) Hierarchical Whole-Part Relationship means that the name of a part implies the name of its whole in any context. For example,

    Cardio-Vascular System
    BT Circulatory System

    Circulatory System
    NT Cardio-Vascular System

Here ‘Cardio-Vascular System’ always refers to a part of its whole ‘Circulatory System’.

c) Instance Relationship links a common noun, representing a general class of things or events, with an individual instance of that class, usually represented by a proper name. For example,

    Mountain Regions
    NT Alps
    NT Himalayas

    Alps
    BT Mountain Regions

    Himalayas
    BT Mountain Regions

Instance relationship is often not shown in a thesaurus to control its size.

Non-hierarchical Relationship - When two terms are related in a way other than hierarchically, the relationship is called a non-hierarchical relationship. This relationship may be further grouped as:
a) Equivalence (or preferential) relationship, and
b) Associative (or affinitive) relationship.

a) Equivalence (or Preferential) Relationship - This relationship identifies the preferred terms and distinguishes them from the non-preferred terms. The symbols used to represent it in a thesaurus are USE and UF (Used For). It helps to control the following problems:

Synonyms:
Disabilities USE Handicaps
Handicaps UF Disabilities
Antonyms:
Temperance USE Intemperance
Intemperance UF Temperance
Spelling Variant:
Labour USE Labor
Labor UF Labour
Abbreviation:
PVC USE Polyvinyl chloride
Polyvinyl chloride UF PVC
Specific to general:
Alsatians USE Dogs
Dogs UF Alsatians

b) Associative (or Affinitive) Relationship - This relationship covers other relationships between terms that are related but are neither consistently hierarchical nor equivalent. In other words, the two terms are conceptually associated on any of a number of different bases, while satisfying the requirement that one of the terms should function as a component in any explanation or definition of the other. The relationship is indicated by the code RT (Related Term). Some examples of associative relationships are given below:

i) A discipline or field of study and the object or phenomenon studied:
Birds                             Ornithology
RT Ornithology             RT Birds

ii) An action and its property:
Indexing                        Efficiency
RT Efficiency                RT Indexing

iii) An action and resulting product:
Weaving                       Cloth
RT Cloth                       RT Weaving

iv) Coordinate ideas:
Classification                Cataloguing
RT Cataloguing            RT Classification

v) Ideas having common elements in their definition:
Management                Administration
RT Administration        RT Management

Thesauri and Subject Headings List

Both thesauri and subject headings lists are vocabulary control devices, but they are used in different situations. The essential characteristics that differentiate them are considered below:

a) A subject headings list fulfils the needs of pre-coordinate indexes, whereas a thesaurus is designed to meet the specific needs of post-coordinate indexes.

b) Thesauri generally contain terms that are more specific than those found in a conventional subject headings list.

c) A thesaurus normally avoids inverted terms such as ‘Psychology, children’.

d) The relationship display is more extensive in a thesaurus than in a traditional subject headings list. Incidentally, some well-known subject headings lists, such as Library of Congress Subject Headings and Sears List of Subject Headings, have adopted the thesaurus format in their latest editions, thereby showing the relationships existing between terms.

e) In many cases the relationships between terms listed in a thesaurus are not carried over into the indexes themselves, whereas dictionary catalogues normally provide ‘See’ and ‘See also’ references linking related subject headings.

The above are some significant aspects that distinguish a thesaurus from a conventional subject heading list.
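
Point a) can be made concrete. In a pre-coordinate index the indexer combines concepts into a single heading at indexing time; in a post-coordinate index each document is indexed under single-concept descriptors, and the searcher combines them at search time, typically by intersecting the sets of documents posted under each term. The Python sketch below shows that intersection; the postings are hypothetical.

```python
# Hypothetical postings: descriptor -> ids of documents indexed under it.
POSTINGS = {
    "Indexing": {"d1", "d2", "d3"},
    "Efficiency": {"d2", "d4"},
    "Cataloguing": {"d3", "d4"},
}


def coordinate(*descriptors: str) -> set:
    """Return the documents indexed under every one of the given descriptors (AND)."""
    if not descriptors:
        return set()
    sets = [POSTINGS.get(d, set()) for d in descriptors]
    return set.intersection(*sets)


print(sorted(coordinate("Indexing", "Efficiency")))   # ['d2']
print(sorted(coordinate("Indexing", "Cataloguing")))  # ['d3']
```

The same three descriptors support any combination the searcher needs, which is why a thesaurus for post-coordinate use can keep its terms short and specific.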

Thesauri and Classification Schemes

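A classification scheme, especially a faceted and hierarchical one, is able to show hierarchical, faceted and phase relationships, but often misses other associative and equivalence relationships. However, the real difference between a classification scheme and a thesaurus lies in their purpose and use: a classification scheme is designed mainly to arrange documents or entries in a helpful systematic order, whereas a thesaurus is designed to control the terms used for indexing and searching.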

Thesauro-Facet

Thesauro-Facet was developed by Jean Aitchison and others for the English Electric Company. It is basically a faceted classification integrated with a thesaurus, and it consists of two sections: a) a faceted classification scheme, and b) an alphabetical thesaurus. Here, the thesaurus replaces the alphabetical subject index that normally follows the schedules in a conventional faceted classification. Terms appear twice, once in the schedules and once in the alphabetical thesaurus, the link between the two locations being the notation or class number. It can be used in both pre- and post-coordinate indexing systems.

Classaurus

The classaurus is also a vocabulary control device. Developed by Dr. Ganesh Bhattacharyya at DRTC, it incorporates features of both a faceted classification scheme and a conventional alphabetical thesaurus. It is an elementary category-based (faceted) systematic scheme of hierarchical classification in the verbal plane, incorporating all the necessary and sufficient features of a conventional information retrieval thesaurus. Like any classification scheme, it displays hierarchical relationships among terms in its schedules. Like a faceted classification scheme, it has separate schedules for each of the Elementary Categories (Entity, Property, and Action) and for common modifiers such as Form, Time, Place, and Environment. Like any thesaurus, each of the terms in the hierarchical schedules is enriched by synonyms, quasi-synonyms, etc. Unlike a thesaurus, a classaurus does not include other associatively related terms (RTs), because of its category-based (faceted) structure: a term in one elementary category has a high chance of being associatively related to a term in another category, depending on the subject of the document. The assumption is that RTs should be dictated not by the designer of the classaurus but by the document itself, since any term may be associatively related to other terms depending on the thought content of the document. The classaurus has two parts: the Systematic Part and the Alphabetical Index Part.
The Systematic Part consists of the schedules of the elementary categories and of the common modifiers. Each entry in the Systematic Part consists of the Descriptor; a Definition/Scope Note (whenever needed); Synonyms (UF), if any; parts; and species/types. The hierarchy of terms is shown by indentation, indicated by ‘.’ (dot). Terms in an array are arranged in alphabetical order. Each term in the Systematic Part is assigned a unique alphanumeric code. The Alphabetical Index Part of the classaurus contains each and every term occurring in the Systematic Part, including synonyms, along with its address (i.e. its alphanumeric code).
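
The two parts can be sketched together. The Python example assumes a hypothetical plain-text schedule in which leading dots give the level and `| UF:` introduces synonyms; the codes it generates (a category letter followed by the position of the term at each level) are likewise an illustration, not the coding used in the actual classaurus. The alphabetical index lists every term, synonyms included, with its code.

```python
# Hypothetical fragment of an Entity schedule; leading dots give the level.
SCHEDULE = """\
Animals
.Birds
..Parrots
.Mammals
..Rodents
...Mice | UF: House mice
"""


def parse_schedule(text: str, prefix: str = "E"):
    """Assign a code to each term and build the alphabetical index part.

    Assumes a well-formed schedule: each level is at most one deeper than the last.
    """
    counters = []    # counters[d] = position of the current term at depth d
    systematic = []  # (code, depth, descriptor, synonyms) in schedule order
    index = {}       # every term, synonyms included -> code
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip("."))
        descriptor, _, uf = line[depth:].partition(" | UF: ")
        descriptor = descriptor.strip()
        synonyms = [s.strip() for s in uf.split(";") if s.strip()]
        del counters[depth + 1:]  # starting a new branch resets deeper numbering
        if len(counters) == depth:
            counters.append(0)
        counters[depth] += 1
        code = prefix + ".".join(str(n) for n in counters)
        systematic.append((code, depth, descriptor, synonyms))
        for term in [descriptor, *synonyms]:
            index[term] = code
    alphabetical = sorted(index.items(), key=lambda item: item[0].lower())
    return systematic, alphabetical


systematic, alphabetical = parse_schedule(SCHEDULE)
for code, depth, term, synonyms in systematic:
    print(code.ljust(10), "." * depth + term)
for term, code in alphabetical:
    print(term.ljust(12), code)
```

Because ‘House mice’ receives the same address as ‘Mice’, a user who looks up the synonym in the index is led straight to the preferred term’s place in the schedule.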

Systematic Arrangement

The above discussion of showing semantic relationships applies to one method of arrangement: the alphabetical. We can also show relationships by juxtaposition, that is, by grouping related concepts together in a systematic arrangement to form a classification scheme. Such an arrangement will show hierarchical relationships as well as coordinate relationships, and may well also show others, such as instruments and materials. In this way, a substantial part of the cross-reference structure required by an alphabetical arrangement is eliminated, because the relationships are shown by the way the concepts are grouped. We normally arrange books on the shelves of a library in this way to help users, who will find the books they are interested in shelved in the same area.

There is, however, a price to be paid for this advantage. If we group our preferred terms systematically, the order in which they occur is no longer self-evident, and we are forced to introduce a notation, or code vocabulary, to show the order and to enable us to find particular concepts within the systematic arrangement. The entry vocabulary now becomes doubly important: not only does it contain all the non-preferred terms as well as the preferred terms, but, being arranged alphabetically, it also forms our only means of access to the systematic arrangement, via the code vocabulary. We need to look up the terms in which we are interested in the entry vocabulary, which will tell us what codes have been used to denote them:

Electronics    621.381    (DDC)
Cyclotrons    621.384.61    (UDC)
Disease (Medicine)    L:491    (CC6)

Equivalence relationships are catered for by simply showing the same code for each; in fact, all the entries in the entry vocabulary may be regarded as equivalence relationships, in that they show the heading used for arrangement (in this case a piece of notation) for both preferred and non-preferred terms. In the schedules of the scheme, i.e. the list of index vocabulary terms in systematic order, we shall find only the preferred terms.
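
The division of labour between the entry vocabulary and the schedules can be sketched as two lookup tables. In this Python example the code 621.381 and the preferred term Electronics come from the DDC line above, while the non-preferred term ‘Electronic engineering’ is a hypothetical addition. Both entry terms lead to the same code, and the schedules hold only the preferred term.

```python
# Entry vocabulary: preferred and non-preferred terms, each with its code.
ENTRY_VOCABULARY = {
    "Electronics": "621.381",
    "Electronic engineering": "621.381",  # hypothetical non-preferred term
}

# Schedules: code -> the preferred term only.
SCHEDULES = {
    "621.381": "Electronics",
}


def look_up(term: str):
    """Go from any entry term to its code, then to the preferred term in the schedules."""
    code = ENTRY_VOCABULARY[term]
    return code, SCHEDULES[code]


print(look_up("Electronic engineering"))  # ('621.381', 'Electronics')
```
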
Another problem arises because a concept may occur in more than one hierarchy, where we find both generic (permanent) relationships and quasi-generic relationships representing applications. So the same basic concept may be represented by more than one code, depending on the context within which it appears:

                  botany                 583.29
                  hygiene                613.8
                  social customs     394.1

Systematic arrangement can show many of the categories of relationships we have identified, either by juxtaposition in the schedules or by the complementary juxtaposition of entries in the alphabetical sequence of the entry vocabulary. However, this does not cover all the affinitive/associative group, some of which may actually be hidden by the arrangement. The only way in which these may be drawn to the attention of the indexer or searcher is through cross-references in the schedules or in the entry vocabulary. Unfortunately, such cross-references are the exception rather than the rule in most classification schemes; this may well reflect the fact that only in recent years have we begun to clarify the nature of the relationships that may occur between concepts.


Information Access Through The Subject : An Annotated Bibliography / by Salman Haider. - Online : OpenThesis, 2015. (408 pages ; 23 cm.)

Annotated bibliography titled Information Access Through The Subject covering Subject Indexing, Subject Cataloging, Classification, Artificial Intelligence, Expert Systems, and Subject Approaches in Bibliographic and Non-Bibliographic Databases etc. 


The project "annotated bibliography" was worked out as a Master of Library & Information Science (MLIS) dissertation in the Department of Library and Information Science, A.M.U., India. Information Access Through The Subject is a highly appreciated work. It earned the author the S. Bashiruddin – P. N. Kaula Gold Medal, a Post Graduate Merit Scholarship, a First Division, and the second position in the MLIS program.



  • Written 2017-03-18

