SP Publications
International Journal Of English and Studies (IJOES)
An International Peer-Reviewed Journal ; Volume-4, Issue-1, 2022 www.ijoes.in ISSN: 2581-8333; Impact Factor: 5.421(SJIF)
ISSN: 2581-8333 Copyright © 2022 SP Publications Page 73
RESEARCH ARTICLE
Presenting Sambalpuri-Kosli Language: A Demonstration of Limited-Resourced Language
Dr. Bipin Bihari Dash, Assistant Professor in English, Odisha University of Technology and Research (OUTR), Bhubaneswar-751029, Odisha, India
Abstract:
Language is a resource in which everybody has a stake; it encapsulates the culture of a community, its indigenous knowledge paradigm, its social and religious values, its folklore and so on. One of the salient objectives of this research is to document and describe the Sambalpuri-Kosli language by preparing an online dictionary that could serve as a stepping stone for technological advancement and future research on the language. The present dictionary is a multilingual, web-based, thematic dictionary of around 600 words collected from three domains: flora and fauna, kinship, and body parts. To document one's own language is to archive and disseminate it for posterity, and a web-based dictionary is well suited to that purpose. The lexical data has been encoded with Toolbox, and Lexique Pro has been used for the online launch. The data has been analyzed and processed in such a manner that it can be comprehended by researchers of other disciplines. The paper argues that dictionaries are not only repositories of lexical items representing linguistic knowledge but also databases of the cultural, anthropological, ethnographic and social knowledge of a speech community embedded in its language. Furthermore, it examines how other ontological information is inherently tied to language.
Keywords: Toolbox, Lexique Pro, Lexicography,
Documentation, Lexicon
INTRODUCTION
One of the pertinent issues in the study of language at present is the alarming rate at which languages are becoming extinct. It has been feared that the coming century will witness the rapid disappearance of languages ‘without being adequately recorded’ (Krauss, 1992; Crystal, 2002:19). The current world-wide distribution of languages shows that the majority of the world's languages (3586) are spoken by a meagre share of the population (approximately 0.2%), whereas a small number of languages (83) are spoken by around 79% of the world's population (Harrison, 2007:14). Besides, most languages are less-resourced and less-described, in terms of the availability of electronic corpora on the one hand and the amount of linguistic research on the other. Where a dominant language exists, minor languages fail to attract government patronage; as a result they are consistently neglected, which in turn leads to their endangerment.
As Ostler (1999) puts it, languages that lack an active presence in electronic media are liable to become endangered. Such languages are either dialects or languages without government patronage (Behera et al., 2015). As a consequence, the situation of these languages in South Asia in general, and of Indic languages in particular, is ‘relatively bleak’ (McEnery et al., 2000). Although India is a land of more than 1500 languages from five prominent language families (Abbi, 2001), only 22 are scheduled and the rest are fighting for their survival. It is therefore indispensable to document, describe and archive these languages and to make them available online for posterity, so that further natural language processing research and development can be conducted on them.
Description of a language refers to describing its formal properties at the level of the phoneme, the morpheme, the sentence and higher units. Language documentation complements language description, which aims at describing a language's abstract system of structures and rules in the form of a grammar or dictionary.
Documentation, as put forth by Himmelmann (2006:1), is a “lasting, multipurpose record of a language”. Broadly speaking, it is concerned with the compilation and preservation of
linguistic primary data and of the interfaces between the primary data and the analyses based on it. The primary data includes audio and video recordings of communicative events as well as field notes taken during elicitation sessions. Typical steps involve recording, transcribing (often using the International Phonetic Alphabet and/or a "practical orthography" devised for that language), annotation and analysis, translation into a language of wider communication, archiving and dissemination.
One of the innovative ways to document a language and the phenomena pertaining to it is lexicographic documentation. Lexicography is the applied study of the meaning, evolution and function of the vocabulary units of a language for the purpose of compiling them in book form. Put most simply, it is a scholarly discipline that involves compiling, writing or editing dictionaries. Lexicography is widely considered an independent scholarly discipline, though it is also treated as a subfield of linguistics.
There are two types of lexicography:
Practical: the art or craft of compiling, writing and editing dictionaries.
Theoretical: the scholarly discipline of analyzing and describing the semantic, syntagmatic and pragmatic relationships within the lexicon of a language, and of developing theories of dictionary components, of the structures linking the data in dictionaries, of the information needs of users in specific types of situation, and of how users may best access the data in printed and electronic dictionaries. This is sometimes referred to as 'metalexicography'.
“These dictionaries of endangered languages comprise a wider inventory from a variety of speech genres, with sophisticated multimedia materials, and new ways of preserving cultural memory and representing semantic and cultural ontologies.” (Ogilvie, 2011: 389-404)
Language documentation is multidisciplinary in nature and draws on theoretical concepts and methods from linguistics, ethnography, folklore studies, psychology, information and library science, archiving and museum studies, digital humanities, media and recording arts, pedagogy, ethics, and other research areas. Its major goal is the creation of well-organized, long-lasting corpora that can be used for a variety of purposes, including theoretical research and practical needs such as language and cultural revitalization. Another prominent feature is its attention to the rights and desires of language speakers and communities, and its collaboration with them in the recording, analysis, archiving, dissemination and support of their own languages.
AIMS AND OBJECTIVES
To create an electronic lexicon of Sambalpuri-Kosli so that future research can be initiated on the linguistic aspects of the language.
To present the socio-cultural, anthropological and ethnographic aspects of the region where the language is spoken, so as to delve deep into the indigenous knowledge system that underlies the language.
To make the language available to researchers in other disciplines so that other language-related aspects can be explored in future.
To cater to the linguistic needs of the Western Odisha region and to make the dictionary usable in teaching and learning through the language.
To publish in both print and electronic versions so that the dictionary reaches everyone, both those who have access to the technology and those who do not.
The dictionary has been documented in three languages, viz. English, Hindi and Sambalpuri, so that it is comprehensible to all speakers.
BACKGROUND
Sambalpuri-Kosli (ISO 639-3 spv) belongs to the Indo-Aryan language family. It is spoken across a vast area covering ten districts (Sambalpur, Bargarh, Bolangir, Sonepur, Kalahandi, Sundargarh, Boud, Deogarh, Nuapada and Jharsuguda) by approximately 18 million people (Census Report, 2001) of western, south-western and north-western Odisha, as well as in some parts of Jharkhand and Chhattisgarh.
Although an adequate amount of literature is available in Sambalpuri-Kosli, linguistic research and development on it is negligible. The attitude of the speakers towards the language is quite positive, and it is used more in informal settings than in formal ones. The language is not used as a medium in the
pedagogic process; rather, the dominant language, Odia, takes precedence there.
The present paper concerns the documentation of a less-resourced and less technologically advanced language, Sambalpuri-Kosli, with Lexique Pro (see Figure 1) and Toolbox. It reports an effort to build an online Sambalpuri dictionary under the semantic domains of body parts, kinship terms, and flora and fauna. The domain of body parts has been subdivided into two broad categories: internal and external. The domain of flora and fauna has been subcategorized into six categories: creepers, fruit plants, vegetable plants, flowers, weather, and other trees. Kinship terms have been categorized into affinal and non-affinal. The dictionary is a multilingual (Sambalpuri-Hindi-English) dictionary (see Fig 1) comprising approximately four hundred words under the aforementioned semantic categories.
Fig 1 Lexique Pro Sambalpuri-Kosli Lexicon Snapshot
The languages employed in the dictionary are Sambalpuri, for phonetic transcriptions and examples; English, for descriptions, glosses, parts of speech and examples; and Hindi, for descriptions, examples and the gloss of each word. “These dictionaries challenge the traditional types of dictionaries because they are everything in one. They combine aspects of the learners dictionary, historical dictionary, encyclopaedic dictionary, talking dictionary, pictorial dictionary, video dictionary, and visual thesaurus” (Ogilvie, 2011: 389-404). The dictionary also contains a picture and an audio recording of each word in electronic format, which makes it a talking dictionary. In addition, it provides scientific nomenclature, etymological references, cross-references, details of the source, morphemic breaks where needed, and metadata such as the date of entry and the part of speech of each word.
Fig 2 An Exemplary Entry of the Lexicon
Special Features of the Dictionary:
Multilingual
Contains images and audios of each word
Morphemic breaks
Parts of speech
Entry of the source
Contains scientific names of words pertaining to flora and fauna
Etymological reference
Cross reference
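The features listed above can be pictured as the fields of a single record. The following is a minimal sketch only: the class name, the field names and the audio file name are illustrative assumptions, not the structure actually used in the Toolbox or Lexique Pro project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LexiconEntry:
    """One dictionary entry; every field name here is a hypothetical choice."""
    headword_ipa: str                      # Sambalpuri headword in phonetic transcription
    part_of_speech: str
    gloss_english: str
    gloss_hindi: str
    domain: str                            # e.g. "body parts"
    morphemic_break: Optional[str] = None  # given only where needed
    scientific_name: Optional[str] = None  # flora and fauna entries only
    etymology: Optional[str] = None
    cross_references: List[str] = field(default_factory=list)
    source: Optional[str] = None
    date_of_entry: Optional[str] = None
    image_file: Optional[str] = None
    audio_file: Optional[str] = None

    def is_talking_entry(self) -> bool:
        """True when the entry carries both a picture and a recording."""
        return self.image_file is not None and self.audio_file is not None


entry = LexiconEntry(
    headword_ipa="kɑn",
    part_of_speech="n",
    gloss_english="ear",
    gloss_hindi="कान",
    domain="body parts",
    audio_file="kan.wav",  # hypothetical file name
)
print(entry.is_talking_entry())  # no image attached yet, so this prints False
```

Optional fields default to empty so that an entry can be created as soon as the headword and glosses are known and enriched later with media and metadata.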
Method of Data Collection: The data has been collected from Sambalpuri speakers of western Odisha. The remaining data is to be collected from Sambalpuri blogs and from native speakers on Facebook. The data collected and documented so far comes from the three domains or themes listed below (see Fig 3).
Body parts: a. Internal
b. External
Kinship terms: a. Affinal
b. Non-Affinal
Flora and fauna: a. Animals: birds, reptiles, insects, and wild animals.
b. Plants: creepers, herbs, trees
Fig 3 Three Domains of Data Collection
Method of Data Analysis:
After data collection, the complete lexicon was entered with the Toolbox software, and Lexique Pro was used for uploading and launching it online.
The audio files were recorded with the Audacity software and an Angel SV 200mA recorder.
The analysis has been conducted at two levels, linguistic and cultural, and also considers the relation between the two in a dictionary-making enterprise.
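Toolbox stores lexical data as plain text in which each field starts with a backslash marker. As a rough illustration of how such a file could be read by other tools, the sketch below parses a tiny sample. The marker names (\lx for the lexeme, \ps for the part of speech, \ge for the English gloss) follow the widely used MDF convention and are an assumption here; the paper does not list the markers used in the project, and the function name parse_sfm is hypothetical.

```python
from typing import Dict, List

# Two sample records built from words discussed in this paper.
# The markers are assumed MDF-style markers, not taken from the project files.
SAMPLE = r"""
\lx kɑn
\ps n
\ge ear

\lx nɑk
\ps n
\ge nose
"""


def parse_sfm(text: str, record_marker: str = "lx") -> List[Dict[str, str]]:
    """Split standard-format text into one dict per record."""
    records: List[Dict[str, str]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and lines without a marker
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}  # a new headword starts a new record
            records.append(current)
        if current is not None:
            current[marker] = value.strip()
    return records


records = parse_sfm(SAMPLE)
for record in records:
    print(record["lx"], record["ps"], record["ge"])
```

Keeping the data in this marker-per-line form is what allows the same lexicon to be reused for print, for the web and for later computational work.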
Linguistic Analysis:
Sambalpuri has borrowed many words from other Indian languages and beyond. Of the total number of lexical items, around 33 percent are indigenous Sambalpuri, about 38 percent are from Odia, approximately 27 percent are from Hindi, and the rest come from other languages (see Chart 1).
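The share of the remaining languages is not stated, but it follows from the three approximate figures given. A small sketch of that arithmetic, using only the percentages reported above:

```python
# Approximate shares reported in the text, in percent.
reported = {"Sambalpuri": 33, "Odia": 38, "Hindi": 27}

# The text gives no figure for the remaining languages; it follows by subtraction.
others = 100 - sum(reported.values())
shares = {**reported, "English and others": others}

# Everything that is not indigenous Sambalpuri counts as borrowed.
borrowed = 100 - shares["Sambalpuri"]

for language, percent in shares.items():
    print(f"{language}: about {percent}%")
print(f"Borrowed in total: about {borrowed}%")
```

On these figures roughly 2 percent of the lexicon comes from English and other languages, and about two thirds of the items are borrowed.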
[Chart 1: share of the lexicon by source language; categories: Sambalpuri, Odia, Hindi, English and others]
[Figure: semantic domains. Body Parts: Internal, External. Kinship Terms: Affinal, Non-Affinal. Flora and Fauna: Creepers, Vegetables, Flower, Fruit, Weather, Miscellaneous]
Noun (body parts)        Verb (n > v)
/kɑn/ ‘ear’              /kənɑ/ ‘hear’
/kɑnd̪ʰ/ ‘shoulder’       /kənd̪ʰɑ/ ‘bear’

Noun                     Verb (to + infinitive)
/nɑk/ ‘nose’             /nəkɑbɑr/ ‘to smell’
/kɑn/ ‘ear’              /kənɑbɑr/ ‘to hear’

Compound nouns (flora & fauna)
Noun + noun:        /bɪleɪ/ ‘cat’ + /ɑɛ̃kʰ/ ‘eye’ > /bɪleɪ ɑɛ̃kʰ/ ‘cat’s eye’
Noun + adjective:   /hɑt̪ɪ/ ‘elephant’ + /muɖɪɑ/ ‘headed’ > /hɑt̪ɪ muɖɪɑ/ ‘elephant-headed’
With respect to verbalization, verbs are formed from monosyllabic nouns by adding ‘-ɑ’ and reducing the stem vowels /ɑ/ and /ə/ to /ə/ and /ʊ/ respectively. For the infinitive construction, the suffix ‘-bɑr’ is added to the stems of the verbs of directions, with the same reduction of the vowels /ɑ/ and /ə/ to /ə/ and /ʊ/ respectively. In compound noun formation two types of construction are noticeable, viz. noun + noun and noun + adjective.
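The noun-to-verb pattern can be stated as a small rule. The sketch below covers only the /ɑ/ to /ə/ reduction attested in the examples above; the function names are hypothetical, and the rule is an illustration of the pattern, not a full morphological analyser for Sambalpuri.

```python
def reduce_vowel(stem: str) -> str:
    """Reduce the first /ɑ/ of a monosyllabic stem to /ə/."""
    return stem.replace("ɑ", "ə", 1)


def verbalize(noun: str) -> str:
    """Derive a verb from a body-part noun: vowel reduction plus the suffix -ɑ."""
    return reduce_vowel(noun) + "ɑ"


def infinitive(noun: str) -> str:
    """Derive the infinitive by adding -bɑr to the derived verb."""
    return verbalize(noun) + "bɑr"


for noun in ["kɑn", "kɑnd̪ʰ", "nɑk"]:
    print(noun, verbalize(noun), infinitive(noun))
```

Applied to /kɑn/ ‘ear’ the rule gives /kənɑ/ ‘hear’ and /kənɑbɑr/ ‘to hear’, matching the forms in the table.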
Body parts (n > adj)
Noun                     Adjective
/hɑ:t̪/ ‘hand’            /hɑ:t̪e/ ‘hand-sized’
/peʈ/ ‘stomach’          /peʈe/ ‘full-stomach’
/pɑ:d̪/ ‘foot’            /pɑ:d̪e/ ‘one foot’
/ɑ̃:ʈʰʊ/ ‘knee’           /ɑ̃:ʈʰe/ ‘length up to knee’
/mʊɖ/ ‘head’             /mʊɖɑ/ ‘bent’
In the domain of body parts, adjectives are formed by adding the derivational morphemes ‘-e’ and ‘-ɑ’ to the roots. In addition, in some cases the long vowel at the nucleus of the noun (e.g. /ɑ:/) is centralized (e.g. to /ə/) in the adjective.
Flora and fauna (n > adj)
Noun                     Adjective
/kʊkʊr/ ‘dog’            /kʊkʊrɪɑ/ ‘doggish’
/mɑkəɖ/ ‘monkey’         /mɑkəɖɪɑ/ ‘ugly’
/gʰʊsrɪ/ ‘pig’           /gʰʊsrɪɑ/ ‘piggish’
So far as the words of flora and fauna are concerned, adjectives are formed by adding the suffix ‘-ɪɑ’ to the root.
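The two adjective-forming patterns can be sketched in the same way. The function names and the vowel inventory below are assumptions made for illustration; the ‘-ɑ’ pattern seen in /mʊɖ/ > /mʊɖɑ/ and the vowel centralization mentioned for body parts are left out.

```python
# Plain vowel symbols used in the transcriptions of this paper (assumed inventory).
VOWELS = "ɑəɪʊeoiu"


def body_part_adjective(noun: str) -> str:
    """Add -e to the root, dropping a stem-final vowel first (as in 'knee')."""
    stem = noun[:-1] if noun[-1] in VOWELS else noun
    return stem + "e"


def fauna_adjective(noun: str) -> str:
    """Add -ɪɑ to the root; a stem-final /ɪ/ merges with the suffix (as in 'pig')."""
    stem = noun[:-1] if noun.endswith("ɪ") else noun
    return stem + "ɪɑ"


print(body_part_adjective("peʈ"))
print(fauna_adjective("kʊkʊr"))
print(fauna_adjective("gʰʊsrɪ"))
```

Rules of this kind make it possible to check derived forms in the lexicon for consistency, although any mismatch would need to be judged by a speaker of the language.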
Cultural Analysis:
Words are cultural, archaeological and environmental signatures of a community. “But more important, for humanity in general, is the need to preserve cultural diversity and knowledge systems that can be encoded in a dictionary” (Ogilvie, 2011: 389-404). Many words, such as /ʈə̃ɖʰɛɪ pok/ ‘praying mantis’, /ərəkʰ gəcʰ/ ‘calotropis tree’, /d̪ʊd̪ʰrɑ gəcʰ/ ‘stramonium plant’, /kəĩ gəcʰ/ ‘water lily plant’, /d̪ʰəmnɑ/ ‘the female cobra’ and /cɑt̪ək/ ‘the swallow bird’, provide ample information about the culture of the speech community. For instance, /ʈə̃ɖʰɛɪ/ means ‘witch’ and /pok/ means ‘worm’ in Sambalpuri, so /ʈə̃ɖʰɛɪ pok/ (see Figure 5) denotes ‘the worm of the witch’. The praying mantis is believed to be the agent of a witch, which sucks the blood of the person on whom a spell has been cast at night, especially on full moon and new moon nights.
This is one of the popular superstitions among the speakers of the region. /cɑt̪ək/ (see Figure 6) ‘the swallow bird’ is believed to be one of the rarest birds, one that does not drink water lying on the earth's surface but drinks rainwater directly as it falls. In Sambalpuri /cɑhə̃/ means ‘want’ or ‘look’, so the word /cɑt̪ək/ has probably been derived from /cɑhə̃/.
Fig 5 praying mantis Fig 6 the swallow bird
/ərəkʰ gəcʰ/ (see Figure 7) ‘calotropis tree’ and /d̪ʊd̪ʰrɑ gəcʰ/ (see Figure 8) ‘stramonium plant’, two plants from the flora and fauna domain, point to the religious life of the region. Their flowers are offered to Lord Shiva and cannot be used for worshipping any other god or goddess. This indicates that most speakers of the language belong to the Hindu community. /d̪ʰəmnɑ/ (see Figure 9) ‘the female cobra’ likewise points to the religious life of the community: witnessing the mating of the king and queen cobra is considered auspicious.
Fig 7 Calotropis tree
Fig 8 the female cobra
Fig 9 Stramonium plant
/kəĩ gəcʰ/ (see Figure 10) ‘water lily plant’ generally grows in ponds, and these ponds are extremely deep. There is a belief that the lilies are the abodes of the gods and goddesses and that one must not pluck them; whoever plucks them is sure to face problems. This belief is also manifested in a festival known as /bərʊɑ/ ‘Barua’, for which a large number of people gather. During the festival certain persons are possessed by one of the deities and behave with the characteristics of that deity.
Fig 10 water lily plant
Conclusion:
English, Chinese, Spanish, French, Japanese and many of the European and Western languages are high-resource: a vast corpus of data already exists in those languages and can be tapped for training and learning. Low-resource languages are those that have relatively little data available for training conversational systems. One of the best aspects of today's hyperconnected world is that the fruits of technological innovation can spread across the globe. There is no reason why a new technological breakthrough could not be replicated to help societies in India. Odia is a comparatively well-supported language in Odisha, but Sambalpuri-Kosli is a limited-resourced language.
From the foregoing discussion and analysis it can be affirmed that dictionaries provide readers with an abundant storehouse of the language under study. In addition, dictionaries, being products of the language, also encapsulate a wealth of other semantic and ontological information concerning socio-cultural aspects, the indigenous knowledge paradigm, philosophical and religious values, folklore and so on.
REFERENCES
[1] Abbi, A. Manual of Linguistic Fieldwork and Structures of Indian Languages. Lincom Europa. 2001
[2] Behera, P., Ojha, A. K., & Jha, G. N. Issues and Challenges in Developing Statistical POS Taggers for Sambalpuri. In Proceedings of LTC-2015, Poland, Springer Verlag. Accessed on 23.02.2016 http://ltc.amu.edu.pl/book/papers/LRL-13.pdf. 2015
[3] Behera, P. & Ojha, A. K. Developing an Automated SVM POS Tagger for Sambalpuri: the Case of a Lesser-known Language. In Proceedings of ELKL-4, 2016, Cambridge Scholars Publishing (to be published), India. 2016
[4] Behera, P. Issues and Challenges in Corpus Collection and Annotation of Sambalpuri: the Case of a Lesser-known Language. In Proceedings of ELKL-4, 2016, Cambridge Scholars Publishing (to be published), India. 2016
[5] Buseman, A. & Buseman, K. Toolbox Self-Training: How to use the Field Linguist’s Toolbox, Version 1.5.9 Ma. 2011
[6] Coward, D. F., & Grimes, C. E. Making Dictionaries. North Carolina: SIL. 2000
[7] Himmelmann, N. P. Language Documentation: What is it and what is it good for. Essentials of language documentation, 178, 1. 2006
[8] Jha, G. N., Hellan, L., Beermann, D., Singh, S., Behera, P. & Banerjee, E. Indian Languages on the TypeCraft Platform: The Case of Hindi and Odia. In Proceedings of WILDRE-2014 (ISBN: 978-2-9517408-8-4), Reykjavik, Iceland. Accessed on 23.02.2016 http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-WILDRE%20Proceedings.pdf. 2014
[9] Kushal, G. Case and Agreement in Sambalpuri. Centre for Linguistics, Jawaharlal Nehru University. 2015
[10] Mathai, E. K. & Kelsall, J. Sambalpuri of Orissa, India: A Brief Sociolinguistic Survey. SIL International. 2013
[11] McEnery, T., Baker, P., & Burnard, L. Corpus resources and minority language engineering. In LREC. 2000
[12] Ostler, N. Language technology and the Smaller Language. ELRA Newsletter, 4(2). 1999
[13] Ogilvie, S. Linguistics, Lexicography, and the Revitalization of Endangered Languages. International Journal of Lexicography, Vol. 24 No. 4, pp. 389–404. doi:10.1093/ijl/ecr019. 2011
[14] Ojha, A. K., Behera, P., Singh, S. & Jha, G. N. Training & Evaluation of POS Taggers in Indo-Aryan Languages: A Case of Hindi, Odia and Bhojpuri. In Proceedings of LTC-2015, Poland, Springer Verlag. Accessed on 23.02.2016 http://ltc.amu.edu.pl/book/papers/TANO2-2.pdf. 2015
[15] Patel, Kunjabana. A Sambalpuri Phonetic Reader. Sambalpur: Menaka Prakashani. 2017