Computational Investigation of Palestinian Arabic Dialects
Ezra Daya, Rafi Talmon, Shuly Wintner
Background
The fieldwork study covers Arabic dialects spoken in 250 localities:
Northern and central parts of Israel.
Localities in the West Bank.
Southern Lebanese communities in Galilee.
Palestinian refugees of 1948 living in existing Arab localities.
Background cont.
Colloquial Arabic features:
Non-official spoken language, usually not written.
Differs from place to place.
The similarity/distance between the Arabic dialects can be measured.
Considered by its speakers less prestigious than official Arabic.
Background cont.
Work performed by special teams:
Collecting and processing fieldwork material such as recorded interviews and linguistic questionnaires.
Transcribing the material that constitutes the basis of our work.
Defining an accurate description of the language varieties of Palestinian colloquial Arabic, their characteristics, and their geographical distribution.
Transcribed Text Sample
Objectives
Publish the vast collected material, using computational linguistic techniques in order to:
Create lexicons and glossaries for Arabic dialects automatically.
Create a linguistic atlas that graphically represents the similarities among the dialects.
Gain a better understanding of the morphological and phonemic features of the dialects.
Linguistic Atlas
The challenge – Rich Morphology
Semitic languages such as Arabic have a rich morphology and contain highly inflected forms.
Example: axdat is the 3rd person, singular, feminine, past form of the verb axad.
It is obtained by concatenating the suffix ‘at’ to the base axad and reducing the second vowel ‘a’ of the base.
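The axad/axdat example can be sketched in a few lines of Python. This is only a toy illustration: the function name `attach_suffix` and the rule it encodes (drop the last vowel of the base before a vowel-initial suffix) are assumptions made for this one example, not a description of the project's actual analyzer or of Palestinian Arabic phonology in general.

```python
VOWELS = set("aeiou")  # assumption: a simplified vowel inventory for the transcription


def attach_suffix(base: str, suffix: str) -> str:
    """Attach a suffix to a base form.

    Toy rule (hypothetical): if the suffix starts with a vowel, the last
    vowel of the base is dropped before concatenation, provided it is not
    the first character of the base.
    """
    vowel_positions = [i for i, ch in enumerate(base) if ch in VOWELS]
    if suffix and suffix[0] in VOWELS and vowel_positions:
        last = vowel_positions[-1]
        if last > 0:
            base = base[:last] + base[last + 1:]
    return base + suffix


if __name__ == "__main__":
    print(attach_suffix("axad", "at"))
```

Even this single form needs a rule beyond plain concatenation, which is the point of the slide.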
Rich Morphology cont.
Arabic has a complex system of morphology based
on triconsonantal roots that is common in Semitic
languages.
For example, there are 10 verb patterns, each
of which can be inflected in 3 numbers, 2 genders,
3 persons, several tenses and aspects, and can be
suffixed by several pronominal forms.
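Root-and-pattern word formation can be illustrated with a minimal sketch. The template notation (a `C` marks a slot for a root consonant), the function name `interdigitate`, and the sample root k-t-b are assumptions chosen for illustration; they do not come from the fieldwork material.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill each 'C' slot of the pattern with the next root consonant.

    The 'C' slot notation is a hypothetical convention for this sketch.
    """
    if pattern.count("C") != len(root):
        raise ValueError("pattern must have one 'C' slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


if __name__ == "__main__":
    # Illustrative root k-t-b combined with two templates.
    print(interdigitate("ktb", "CaCaC"))
    print(interdigitate("ktb", "CaaCiC"))
```

Combining every root with every pattern, and then with the number, gender, person and pronominal suffix options listed above, is what makes the number of surface forms so large.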
Traditional Approach
Linguists perform grammatical analysis of the transcribed texts and manually create the lexicon, glossaries and linguistic atlas.
Disadvantages:
Lack of sophistication.
Time consuming.
Expensive human resources.
Innovative Approach
Devise an automated analysis of these transcribed texts, in order to obtain:
Automated creation of a glossary that organizes all the lexical items by grammatical features, e.g. root, pattern, etc.
Isolation of the phonetic and morphological features and characteristics of specific dialects in the surveyed area.
Measurement of dialect similarity.
Automated processing provides accuracy and efficiency.
Linguistic Technologies
For this research we intend to exploit existing computational linguistics technology for the investigation of Palestinian Arabic dialects, using:
Finite-state technology.
Machine learning techniques.
Computational dialectology.
Finite State Technology
Employing the Xerox finite-state tools and techniques, which:
Are useful and efficient programs for processing natural-language text.
Concentrate on morphological analysis and generation.
Give access to finite-state operations and a regular expression compiler.
Machine Learning
Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.
Two learning frameworks are distinguished according to the amount of supervision used:
– Supervised learning: the learning algorithm is presented with pairs of strings of symbols, e.g. inflected and uninflected forms.
– Unsupervised learning: the algorithm is presented merely with a single set of words, and must work out what the morphological relationships are.
Computational Dialectology
Use measures to compute the distance between two given dialects and to define geographical dialect boundaries.
Example: Edit Distance.
The distance can be made sensitive to phonological similarities.
Example: $\mathrm{dist}(|d|, |t|) < \mathrm{dist}(|d|, |f|)$
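A phonologically sensitive edit distance can be sketched as follows. The standard Levenshtein recurrence is used; the feature table `FEATURES` and the substitution cost `feature_cost` (the fraction of differing features) are hypothetical and deliberately tiny, chosen only so that replacing d with t costs less than replacing d with f. They are not the measure used in the project.

```python
from typing import Callable

# Hypothetical feature table: (place, manner, voicing). Illustration only.
FEATURES = {
    "d": ("alveolar", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "f": ("labiodental", "fricative", "voiceless"),
}


def unit_cost(a: str, b: str) -> float:
    """Plain Levenshtein substitution cost: 0 for identical symbols, else 1."""
    return 0.0 if a == b else 1.0


def feature_cost(a: str, b: str) -> float:
    """Substitution cost as the fraction of differing features (assumption)."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # unknown symbols fall back to the unit cost
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def edit_distance(s: str, t: str,
                  sub_cost: Callable[[str, str], float] = unit_cost) -> float:
    """Edit distance with insertion/deletion cost 1 and a pluggable substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1.0,                # deletion
                            curr[j - 1] + 1.0,            # insertion
                            prev[j - 1] + sub_cost(a, b)))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("d", "t", feature_cost))
    print(edit_distance("d", "f", feature_cost))
```

One possible way to compare two dialects, under the same assumptions, would be to average this distance over the transcriptions of a shared word list.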
Previous Related Work
Morphological Tagging of the Qur’an:
The system facilitates a variety of queries on the Qur’anic text that refer to the words and their linguistic attributes, and it provides full morphological tagging of its words.
The core of the system is a set of finite-state based rules which describe the morpho-phonological and morpho-syntactic phenomena of the Qur’anic language.
The system is currently being used for teaching and research purposes.