
Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models

Asif Hassan (a), Dr. Nabeel Mohammed (a,b), Dr. Abul Kalam al Azad (a)

(a) Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
(b) Faculty of Information Technology, Monash University, Clayton Campus, Melbourne, Australia

Abstract

Sentiment Analysis (SA) is an active research area in the digital age. With the rapid and constant growth of online social media sites and services, and the increasing amount of textual data available in them, such as statuses, comments, reviews and blogs, the application of automatic SA is also on the rise. However, most research on SA in natural language processing (NLP) is based on the English language. Despite being the sixth most widely spoken language in the entire world, Bangla still does not have a dataset that is both large and standard. As a result, recent research works in Bangla have failed to produce results that are both comparable to the work of others and reusable as stepping stones for future researchers in this field. Therefore, in our work we first provide a textual dataset that includes not just Bangla but Romanized Bangla texts as well, and is substantial, post-processed and validated multiple times, for use in SA experiments. We tested this dataset on a deep recurrent model, specifically Long Short Term Memory (LSTM), using two loss functions, binary crossentropy and categorical crossentropy, and also did some experimental pre-training by using data from one validation to pre-train for the other and vice versa. Lastly, we documented the results, along with some analysis, which was promising.

1.0: Introduction

The purpose of this thesis is to discuss our work on Sentiment Analysis (SA) on Bangla (Bengali) and Romanized Bangla texts using deep recurrent models. Bangla is one of the top 10 most widely spoken languages in the world, with almost 200 million speakers worldwide, 160 million of whom are Bangladeshi [2]. With a growing economy, the declining price of technology and Government incentives, the traditional businesses that adopted IT, and the IT sector as a whole in Bangladesh, have enjoyed considerable and rapid growth. This in turn has widened the scope for more Bangladeshi people to get involved in online activities such as connecting with friends and families through social media, expressing opinions and thoughts on popular micro-blogging and social networking sites, commenting on online news portals, and shopping through online marketplaces. While there are many advantages for online-based businesses, there are disadvantages too. It becomes increasingly harder for such businesses to monitor and analyze market trends, especially when this is done by analyzing the reactions of customers to their products or services, because of the little or no human-to-human interaction in such businesses. Moreover, the task of going through comments and reviews from each individual customer and figuring out the sentiments within them is tedious and in some cases simply intractable, especially considering that a very high volume of data is usually generated very quickly in this day and age of digital connectivity. Therefore, the application of automatic SA can play a vital role here in increasing efficiency and productivity.

Sentiment Analysis is itself a very important area of research, and a vast number of studies have been done over the past few years. SA has been defined as:

"Sentiment analysis, also called opinion mining, is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes." [3]

Since it has a large area of application, it goes by many other names, e.g. opinion extraction, sentiment mining, opinion mining, subjectivity analysis, emotion analysis, review mining etc., depending on the area it is applied to. Thus, opinion mining and SA actually point to the same field of study. Most of the research works we find on SA are based on the English language, and not as many on Bangla. The interesting work by Das and Bandyopadhyay [4] on subjectivity detection included Bangla, but it is not self-sufficient, as English is also needed. We have discussed other studies on Bangla in more detail in chapter 2 (Literature review). However, none of those works truly considered Bangladesh's perspective. We need to consider not just standardized Bangla, but also Banglish (Bangla words mixed with English words) and Romanized Bangla. These three major types can again be loosely categorized as good, standard, bad, wrong, totally wrong, particular to a specific location (almost arcane) etc., depending on the level of clarity, grammatical correctness, meaningfulness, personal idiosyncrasies, impact of localization and so on. Moreover, for Romanized Bangla there is added complexity due to the variation in transliteration between people who know English well and those who don't [5]. The fact that no clear standard is followed when 160 million Bangladeshi people write in any of these types makes it all the more complicated and challenging to work with. The following table has some examples of texts in all three major types:

Samples | Type | Comments
অত্যন্ত সময় োপয়যোগী এবং উপকোরী পদয়েপ। (Translation: a very timely and helpful step) | Bangla | Standard, Meaningful, Good etc.
পপথীপবয়ত্ মোনূল মোয়েই ভূ কয় (Translation: To err is human) | Bangla | Bad, Wrong etc.
এইডো পক কইপ রর মোমো!! (Translation: What are you saying dude!!) | Bangla | Non-standard, Meaningful, Bad etc.
গম আয়সো পন বো? (Translation: How are you? / Are you doing well?) | Bangla | Highly localized, Non-standard etc.
I hate you! আর রকোনপদন রত্োমোয়ক love রকোরয়বো নো। never ever!! (Translation: I hate you! I won't love you anymore! Never ever!!) | Banglish | Okay, Meaningful etc.
Ottonto shomoyopojogi ebong upokari podokhkhep. (Translation: a very timely and helpful step) | Romanized Bangla | Standard, Meaningful, Good etc.
amar bareta akan teke car mael dora (Translation: my house is four miles from here) | Romanized Bangla | Non-standard, Transliteration error, Wrong etc.

Table 1: Examples of Bangla text variants

In the recent past, deep learning methods, specifically recurrent deep learning models, have enjoyed much more success in NLP (Natural Language Processing) than conventional machine learning methods [6]. While there are other approaches to SA, in this thesis we will concentrate exclusively on such deep techniques. Our key contributions are:

1. Pre-processing the data in a way that makes it readily usable by researchers.
2. Application of deep recurrent models on a Bangla and Romanized Bangla text corpus.
3. Pre-training on the dataset of one label for the other (and vice versa) to demonstrate its usefulness.

The paper is organized as follows. In chapter 2 we discuss the background of our work and the works of others in the same field that inspired and helped us. In chapter 3 we discuss in detail the dataset that we used for our experiments. Chapter 4 discusses the methodology and also includes the experimental setup for the deep recurrent models, as elaborately as possible. Chapter 5 (Results and discussion) discusses the various results found from our experimentation, and lastly chapter 6 concludes.

2.0: Background

Let us now look into the works of others to describe, summarize, evaluate and clarify our work on Sentiment Analysis using deep recurrent models for Bangla and Romanized Bangla texts.

2.1: Sentiment Analysis

Although the term "Sentiment Analysis" may have appeared for the first time in Nasukawa and Yi [7], research works on sentiment appeared as early as 2000 [8-10]. With the advent of social media on the internet, e.g. Facebook, Twitter, forum discussions and reviews, and its rapid growth, we were introduced to a humongous amount of digital data (mostly opinionated texts, e.g. statuses, comments, arguments etc.) like never before, and to deal with this huge amount of data the SA field enjoyed a similar growth. Since the early 2000s, sentiment analysis has become one of the most active research areas in NLP (Natural Language Processing) [3].

However, most of the work is highly concentrated on the English language, with just a few research papers for Bangla. English SA research has enjoyed great progress, favored by the presence of standard datasets. Standard datasets allow researchers to do their own experiments and compare their contributions with those of others. For the English language, an example of such a standard SA dataset is the IMDB Movie Review Dataset, which contains 50,000 movie reviews made by viewers, each annotated as a positive or negative review. This dataset was originally created by Maas, Daly [11] and since then has been used by a multitude of different studies.

A detailed survey paper [12] presented an overview of the recent updates in SA algorithms and applications, categorizing and summarizing a total of 54 articles published up to 2014. Figures 1 and 2 were taken from their paper.

Figure 1: Sentiment analysis on product review

Figure 2: Sentiment classification techniques

Godbole, Srinivasaiah [13] collected opinions from newspapers and blogs, and performed SA by assigning scores indicating positive or negative opinion to each distinct entity in the text corpus.

In [14], the authors proposed and investigated a paradigm to mine sentiment from a popular real-time micro-blogging service such as Twitter, and fashioned a hybrid approach using both corpus-based and dictionary-based methods to determine the semantic orientation of the tweets.

Figure 3: System architecture

2.2: Sentiment Analysis for Bangla

It is quite unfortunate that there is no standard collection of data for Bangla texts, such as the IMDB dataset, Twitter corpus etc. One effort towards standardization came from an automatic translation of the positive and negative words of SentiWordNet [15]. However, no corpus was created from this work, thereby limiting its usage to word-level determination of sentiment, rather than the more complex natural language processing methods. Additionally, such simplified techniques do not consider the variety of ways in which people usually write, e.g. spelling mistakes, use of colloquial terms etc.

A small dataset of Bangla Tweets was collected along with Hindi and Tamil by Patra, Das [16], where the authors reported on the outcome of a shared Sentiment Analysis task on Indian languages. They used 999 Bangla tweets for training and 499 for testing. They did some post-processing, such as pruning emoticons from the tweets and removing duplicated posts. This data was annotated manually by native speakers. However, in terms of accuracy, the dataset's insignificant size may have been their only setback.

Another similar collection was done in [17], where 1400 Bangla Tweets were collected. However, their dataset is not publicly available, and the size of the dataset is rather small when it comes to the question of usability for recent deep learning-based NLP techniques, as over-training on data this small is highly likely for such deep models.

A slightly larger corpus was collected, automatically annotated and manually verified by Das and Bandyopadhyay [4]; their collection comprised almost 2500 Bangla text samples from news items and blog posts. The uniqueness of their collection over the ones collected by others [16, 17] was the average size of 288 words per sample, which is quite a bit larger than the 140-character Tweet limit.

With most of the other works having proceeded in a similar way, the two biggest issues with the current state of affairs in Bangla SA research are, first and foremost, the absence of a standard and big enough dataset to compare against, which makes comparison of research work extremely difficult, and secondly, that none of the Bangla SA research takes into account the very prominent practical aspect of the use of Romanized Bangla [5].

2.3: Deep recurrent models

The models we used to run our experiments are deep recurrent models. The following sections give some insight into the background of deep recurrent models and the algorithms they apply.

2.3.1: Deep learning

AI (Artificial Intelligence) has traditionally been done in two ways: i) knowledge based, and ii) representation learning based. The knowledge-base approach to AI uses logical inference rules to reason about statements input by users. Cyc was one of the most famous of such projects [18]. However, these projects didn't see much success. The failure of the knowledge-based approach was the driving force in finding a way to give AI the ability to gather its own knowledge by extracting patterns, or learning, from data, popularly known as Machine Learning. These new algorithms were based on a representation of the data, or features; that is, the system is given a number of features about the task at hand, on which it makes a decision. Clearly, if any of the features is wrong, this means a wrong representation of the data and the system will not perform well. To rectify this situation, representation learning [19] was used. Such algorithms gave better results than manually tailored representations of the data, and allowed systems to adapt to new tasks with ease. However, they required that high-level abstract features be extracted from the raw data without any error caused by misinterpretation due to factors of variation, as there can be factors (e.g. an accent in a speaker's speech) that cause a false representation in the absence of a highly sophisticated (human-like) understanding. Deep learning handles this issue better, as it provides complex representations expressed in terms of a number of other, simpler representations. It may appear that deep learning arrived fairly recently, but in reality it has existed under different names since as early as the 1940s. However, deep learning did not get much attention until recently, and with this newfound importance the term "deep learning" has become popular.

The following Venn diagram shows the relationships between deep learning, machine learning and other AI technologies [20].

Figure 4: Venn diagram to show relationships among deep learning, machine learning and other AI technologies

2.3.2: Artificial Neural Network

An artificial neural network, or ANN for short, is a computational model inspired by the biological neural networks of natural neurons [21]. A neuron (also known as a nerve cell) is a special biological information-processing cell composed of a cell body (or soma), two types of outward tree-like branches, axons and dendrites, and, at the terminals of these branches, synapses. Signals from other neurons are received through the dendrites, and the signal generated in the cell body after processing is transmitted through the axons. Neurons are connected to each other through synapses, where the axon of one neuron connects to a dendrite of another neuron [22].

The very first conceptual model of artificial neurons was proposed by Warren S. McCulloch, who was a neuroscientist, and Walter Pitts [23]. In their paper they described the mathematical aspects of an artificial neuron: it computes a weighted sum of n input signals and outputs 1 if the sum is greater than a certain threshold, and 0 otherwise. This could be considered the very first conceptualization of perceptrons. The following is a graphical representation of a perceptron.

Figure A: graphical representation of a perceptron (inputs X1..Xn, weights w1..wn, weighted sum, activation function, output)
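To make the thresholding computation above concrete, here is a minimal numpy sketch of such a unit; the inputs, weights and threshold are made-up values for illustration only:

import numpy as np

def perceptron(x, w, threshold=0.5):
    # McCulloch-Pitts style unit: weighted sum of the inputs followed by a hard threshold
    s = np.dot(w, x)
    return 1 if s > threshold else 0

# hypothetical inputs and weights, just to show the computation
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.4, 0.3, 0.3])
print(perceptron(x, w))   # prints 1, since the weighted sum 0.7 exceeds the threshold 0.5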

2.3.3: Recurrent Neural Network

Recurrent Neural Networks, or RNNs for short, are widely used in speech recognition, handwriting recognition, natural language processing and other areas. Moreover, the RNN is the precursor to the LSTM, which makes it important to discuss and understand RNNs before we can get into LSTMs. While traditional neural networks failed to create a persistent model that would somewhat mimic the way our memory cells work for learning and remembering information, the RNN, a class of ANN, has an interesting model design with a loop that makes information persistent. As described by Bullinaria [24], "The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feed-back connection, so the activations can flow round in a loop. That enables the networks to do temporal processing and learn sequences."

Figure 5 below shows a simple RNN diagram with a feed-back connection [1]. Here A takes input xt and outputs ht. The loop enables the flow of information from one step to the next.

Figure 5: A rolled RNN diagram

To better understand this looping mechanism in RNNs, we should consider the next figure, where an unrolled RNN diagram is shown (Figure 6).

Figure 6: Unrolled simple RNN Diagram [1]

In the diagram we see how input vector x0 generates output vector h0 and passes information on to the next step, where input vector x1 generates h1 and again passes information on to the following step. It is as if there were multiple copies of the same network, where each successor receives information from all its predecessors, connected in an architecture that excels at processing sequential data. Apart from the input and output vectors, a simple RNN uses the following formula to calculate the hidden vector [25]:

h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t)
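To make the recurrence concrete, the following plain-numpy sketch applies the update above over a short sequence; the dimensions and random weights are hypothetical, chosen only to illustrate how the hidden state is carried from step to step:

import numpy as np

def rnn_forward(x_seq, W_xh, W_hh):
    # simple RNN: h_t = tanh(W_hh h_{t-1} + W_xh x_t), starting from h_0 = 0
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_xh @ x_t)   # the feed-back connection: h depends on the previous h
        outputs.append(h)
    return outputs

# hypothetical sizes: 4-dimensional inputs, 3-dimensional hidden state, sequence of length 5
rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 4))
W_xh = rng.normal(size=(3, 4))
W_hh = rng.normal(size=(3, 3))
print(rnn_forward(x_seq, W_xh, W_hh)[-1])    # hidden state after the last step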

2.3.4: LSTM

While the RNN's success was critical in speech and pattern recognition due to its ability to memorize long-term dependencies, it was not without problems. RNNs were able to connect previous information to the current task only when the gap between the pieces of information was small. As the gap widened, RNNs started to perform poorly. It is typical for all traditional RNNs to have this vanishing gradient problem as the depth and complexity of the layers are increased, unlike Long Short Term Memory neural networks, LSTM for short. An LSTM neural network is like an extension of the simple RNN [26]. In 1997 Hochreiter and Schmidhuber introduced the LSTM, in which a memory cell has a linear dependence between its present activity and its past activity. Input and output gates were introduced to efficiently modulate the input and output. However, the introduction of forget gates was crucial for effective modulation of the information flow between present and past activities [27, 28].

In [25], the authors presented an extension of the LSTM using a gate function called the depth gate, and provided the following equations of the LSTM (1.1 to 1.5):

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1})                    (1.1)
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1})                    (1.2)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1})        (1.3)
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1})                    (1.4)
h_t = o_t \odot \tanh(c_t)                                                    (1.5)
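Read literally, equations (1.1)-(1.5) translate into the following numpy sketch of a single LSTM step with peephole connections; the weight shapes are hypothetical and biases are omitted, as in the equations above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    # W is a dict of weight matrices named after the subscripts in equations (1.1)-(1.5)
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + W['ci'] @ c_prev)    # (1.1) input gate
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + W['cf'] @ c_prev)    # (1.2) forget gate
    c_t = f_t * c_prev + i_t * np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev)  # (1.3) cell state
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + W['co'] @ c_prev)    # (1.4) output gate
    h_t = o_t * np.tanh(c_t)                                              # (1.5) hidden state
    return h_t, c_t

# hypothetical sizes: 4-dimensional input, 3-dimensional hidden and cell states
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(3, 4)) if k.startswith('x') else rng.normal(size=(3, 3))
     for k in ['xi', 'hi', 'ci', 'xf', 'hf', 'cf', 'xc', 'hc', 'xo', 'ho', 'co']}
h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), W)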

2.4: Software tools used

We used the library and tools provided by Keras to design our models. Keras is a compact and highly modular neural networks library. It is written in Python and is compatible with Python 2.7~3.5. Keras runs on top of a back-end; Theano and TensorFlow are both compatible back-ends, but Keras may use only one of them at a time. Models are the core data structure of Keras; a model is a way to organize layers. There are two types of models in Keras:

1. the Sequential model, and
2. the Model class used with the functional API (Keras functional API).

All of our experiments ran in a Sequential model.

2.4.1: Sequential model characteristics

The Keras Sequential model is a linear stack of layers, which can be created either by passing a list of layer instances directly to the constructor or by using the .add() method. It is necessary to specify the input shape for the model, and only the first layer in a Sequential model needs this information, as the following layers can do automatic shape inference. This can easily be done by passing an input_shape argument to the very first layer, describing a tuple of integers or None entries; the latter tells the model that any positive integer can be expected. One can pass a batch_input_shape argument instead, where the batch dimension is included. For 2D layers such as Dense, the input shape can be specified via the input_dim argument, whereas 3D temporal layers support two arguments, input_dim and input_length.

Before the Sequential model can be trained, it is necessary to configure the learning process, which is done with the .compile() method. The following three arguments are quite important for the model:

1. Optimizer: usually a string identifier of an existing optimizer, e.g. rmsprop.
2. Loss function: the objective function that the model tries to minimize; again, usually a string identifier of an existing loss function, e.g. binary_crossentropy.
3. Metrics: for now, only the accuracy metric is supported.

Once the previous steps are taken care of, we come to the part of training the model. Numpy ndarrays of input data and labels are used for Keras model training. The method used for training is .fit(). Some of the important arguments of the training function are as follows:

1. X - input data, as a Numpy array, or a list of Numpy arrays in case there are multiple inputs.
2. y - labels, also as a Numpy array.
3. batch_size - number of samples per gradient update.
4. nb_epoch - the number of epochs for training the model.
5. validation_data - tuple of (X, y) to be used as validation data.

This function returns a history object that holds a record of the training loss and metric values for each epoch, as well as the validation loss and validation metric values. By using functions from modules such as h5py it is possible to save the model weights, and it is possible to save the entire model as well using the applicable modules. The .evaluate() method takes input data and labels (X and y) as arguments and returns the scalar test loss, or a list of scalars, on the input data, batch by batch (configured by the batch_size argument). All the information in this section was based on the Keras documentation, which is very well documented and helpful [29].
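The workflow just described (build, .compile(), .fit(), .evaluate()) can be sketched as follows, in the style of the Keras 1.x API this chapter refers to; the layer sizes and the random toy data are placeholders, not our experimental configuration:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy data: 1000 samples with 20 features each, and binary labels
X = np.random.random((1000, 20))
y = np.random.randint(2, size=(1000,))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))   # only the first layer declares the input shape
model.add(Dense(1, activation='sigmoid'))

# configure the learning process: optimizer, loss function and metrics
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# .fit() returns a History object holding per-epoch loss and accuracy values
history = model.fit(X, y, batch_size=32, nb_epoch=5, validation_data=(X[:100], y[:100]))

# .evaluate() returns the test loss and the configured metrics, computed batch by batch
score = model.evaluate(X, y, batch_size=32)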

2.4.2: Embedding Layers

The whole idea of the Embedding layer, or word embedding, is an outgrowth of the recent innovation called word2vec [30]. Put simply, word2vec is a technique that converts words into unique discrete values and then maps each word into a continuous vector space. Likewise, Keras' Embedding layer takes positive integers as indexes and turns them into dense vectors of fixed size. To use an embedding, one must use it as the first layer in a model. The Embedding layer takes input_dim as its first argument, which is the size of the vocabulary, or in other words the number of unique words, such that the largest integer (i.e. word index) in the input should be no larger than input_dim - 1. If it is larger, there will be errors during the model run.
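This constraint on input_dim can be illustrated with a small sketch; the vocabulary size, output dimension and sequence length below are illustrative values, not the settings used in our experiments:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

vocab_size = 1000                       # input_dim: the largest token index must be at most 999
model = Sequential()
model.add(Embedding(vocab_size, 64, input_length=10))   # must be the first layer of the model
model.compile(optimizer='rmsprop', loss='mse')

tokens = np.random.randint(vocab_size, size=(32, 10))   # 32 sequences of 10 token indexes
vectors = model.predict(tokens)
print(vectors.shape)                    # (32, 10, 64): one dense 64-d vector per token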

3.0: Dataset details

The dataset we used is primarily a BRBT (Bangla and Romanized Bangla Text) dataset, based on the work of M. R. Amin [31]; the version used in our work varies mostly in terms of a modified number of posts and a bit more polishing. Currently, the Bangla Sentiment Analysis (SA) dataset consists of a total of 9337 post samples. The dataset is unique not only because it is really big but also because it encompasses the until-now-ignored Romanized Bangla. Romanized Bangla is Bangla written in the English alphabet. The inclusion of Romanized Bangla is paramount, because the ease of writing Bangla on any standard QWERTY keyboard (without a Bangla keyboard, e.g. the Bijoy keyboard) and the simplicity of using English as the base language for posts have lifted the popularity of Romanized Bangla not just in personal messages and micro-blogs but also in Govt.-sanctioned mass messages/announcements. The dataset is currently kept private for safe keeping and further improvement. However, it may be made available by personally contacting the owner/authors.

Figure 7: Bangla and Romanized Bangla data ratio

3.1: Data Statistics

Total number of entries: 9337 (number of rows in sheet: 9338)
Bangla entries: 6698 (number of rows in sheet: 6699)
Romanized Bangla entries: 2639 (number of rows in sheet: 2640)

3.1.1: Data Sources

Data were collected from various micro-blog sites, such as Facebook, Twitter and YouTube, as well as some online news portals, product review panels etc. The data sources break down as follows:

From Facebook: 4621
From Twitter: 2610
From YouTube: 801
From online news portals: 1255
From product review pages: 50

Figure 8: Data comparison by data source

3.1.2: Post collection data processing

Removal of emoticons: emoticons and hash-tags were removed to give annotators unbiased, text-only content on which to make a decision based on three criteria: positive, negative and ambiguous.

Removal of proper nouns: proper nouns were replaced with tags to provide ambiguity. All text samples were collected from publicly available sources and do not reflect the opinion of the authors. (The original text samples have been preserved but are not publicly available. These can be obtained by emailing the authors directly and signing the required consent form.)

Manual validation (by native speakers): collected data samples were manually annotated into one of three categories: positive (1), negative (0) and ambiguous (A). Each text sample was independently annotated by two different native Bangla-speaking individuals, for a total of two validations. Each annotator validated the data without knowing the decisions made by the other. This ensures that the validations are unbiased and personal.

Text Sample | 1st Annotator | 2nd Annotator
অয়নক ভোয়ো হয় য়ে গোন! (Translation: very nice song!) | Positive | Positive
মম মোপন্তক সক দুঘ মটনো ৩ জন পনহত্। (Translation: 3 dead in a tragic road accident.) | Negative | Negative
Chotobelar modhur din gulo khub miss kori (Translation: really miss the sweet childhood days) | Positive | Negative
Sympony er set gula kemon? (Translation: How are Symphony mobile sets?) | Positive | Ambiguous
আয়ো আয়ো তু্পম কখয়নো আমোর হয়বনো (Translation: Light, light, you'll never be mine) | Ambiguous | Negative

Table 2: Dataset validation samples

3.2: Double validation analysis

Table 3 gives us a better perspective of the agreement and disagreement between the first and second validations. Rows give the first validation's agreement/disagreement for all three annotation types (first row positive, next row negative, last row ambiguous) against the second validation's positive, negative and ambiguous annotations (in that order), and the columns do the same thing for the second validation's agreement/disagreement with the first validation. For example, the first row tells us that, for all positive annotations of the first validation, the second validation agreed on 2817 as positive and disagreed on 538 as negative and 392 as ambiguous. We can also see that there were a total of 2817 + 538 + 392 = 3747 positive annotations in the first validation, of which the second validation agreed with 2817 and disagreed with 3747 - 2817 = 930; in other words, the second annotator agreed with the first annotator about 75% of the time for positive annotations, and so on. That is why this confusion matrix is of great importance for doing all sorts of analysis on the two validations.

                      Second Validation
First Validation   Positive   Negative   Ambiguous
Positive               2817        538         392
Negative                178       3864         404
Ambiguous                27         95        1022

Table 3: Confusion matrix of the two validations
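The agreement rates quoted above follow directly from Table 3; a small numpy sketch of the calculation:

import numpy as np

# rows: first validation (positive, negative, ambiguous)
# columns: second validation (positive, negative, ambiguous)
confusion = np.array([[2817,  538,  392],
                      [ 178, 3864,  404],
                      [  27,   95, 1022]])

row_totals = confusion.sum(axis=1)            # annotations per class in the first validation
agreement = np.diag(confusion) / row_totals   # fraction on which the second annotator agreed
print(agreement)   # roughly [0.75, 0.87, 0.89] for positive, negative and ambiguous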

3.3: Dataset preparation

We prepared the data from the dataset for our convenience, so that we can easily access any specific type of data (e.g. Bangla posts only, Romanized Bangla posts only etc.), and store and distribute it; this way one can reuse parts (or the whole) of the dataset for his/her experiments with the models without accessing the actual dataset (the xlsx file). We used Python's pickling mechanism to make pickle files serializing the data from the datasheets. We have explained in detail how we prepared the dataset in chapter 4 (Methodology). We have uploaded all the .pkl files (along with the Python code) to GitHub under public access [32].

3.3.1: Pickle file details

Although the readme files attached to the GitHub repository have all the information a potential experimenter needs, we give here some details of what the repository holds and what each part does, and explain the methodology in chapter 4. There are two folders:

1. pickled-sheets, and
2. pickled-sheet-split.

Pickled-sheets (folder):

This folder holds a single .pkl.gz file for each individual sheet in the BRBT dataset (three in total). Each .pkl file consists of a shuffled Numpy array of [[data], [label1], [label2]], where data means the tokens from the tokenized strings, and label1 and label2 are the first and second validations respectively.

1. Sentiment_Analysis.pkl.gz for the sheet that has both Bangla and Romanized Bangla posts.
2. Bangla_Sentiment_Analysis.pkl.gz for the sheet with only the Bangla posts.
3. Romanized_Bangla_Sentiment_Analysis.pkl.gz for the Romanized Bangla sheet only.

Pickled-sheet-split (folder):

This folder holds .pkl.gz files for each .pkl file from the "pickled-sheets" folder, split into three sets: training, testing and validation. So for each "pickled-sheets" file there are three files, for a total of 3x3 = 9 files. Each split set is a Numpy array of [[data], [label1], [label2]], where data means the tokens, and label1 and label2 correspond to the first and second validations of the data. The intent was to keep 80% of the total data as the training set, 15% for testing and 5% for validation; however, the ratio could not be maintained exactly in most cases. The exact lengths taken for each datasheet are as follows.

Sheet1 (Sentiment_Analysis.pkl.gz): total length 9337
1. Training set length: 7500 (brbt_split_train.pkl.gz)
2. Validation set length: 500 (brbt_split_validate.pkl.gz)
3. Test set length: 1337 (brbt_split_test.pkl.gz)

Bangla (Bangla_Sentiment_Analysis.pkl.gz): total length 6698
1. Training set length: 5400 (bangla_split_train.pkl.gz)
2. Validation set length: 400 (bangla_split_validation.pkl.gz)
3. Test set length: 898 (bangla_split_test.pkl.gz)

Romanized Bangla (Romanized_Bangla_Sentiment_Analysis.pkl.gz): total length 2639
1. Training set length: 2200 (rb_split_train.pkl.gz)
2. Validation set length: 150 (rb_split_validation.pkl.gz)
3. Test set length: 289 (rb_split_test.pkl.gz)

This folder also contains a simple Python script, split_three_ways.py, that reads the .pkl.gz files found in the pickled-dataset folder and splits them into the three sets mentioned above, based on the lengths defined in the source code. It is quite basic in terms of coding, and its methods are self-explanatory for users who would want to try out different lengths for their sets. The code expects the pickled files to be in the same directory as the code file.
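For a reader who wants to load these files, a minimal un-pickling sketch might look like the following; it assumes Python 3, gzip-compressed pickles and the [[data], [label1], [label2]] layout described above (depending on how the files were written, a Python 2/cPickle protocol or an encoding argument to pickle.load may be needed):

import gzip
import pickle

# file name taken from the list above
with gzip.open('brbt_split_train.pkl.gz', 'rb') as f:
    sheet = pickle.load(f)                    # shuffled array of [[data], [label1], [label2]]

data, label1, label2 = sheet[0], sheet[1], sheet[2]
print(len(data), len(label1), len(label2))    # token sequences and the two validations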

4.0: Dataset setup

In this chapter we discuss the methods used for the collection and preparation of the dataset, setting up the models for the experiments, labeling each experiment, and explaining our models and experiments.

We have already discussed the statistical details of our dataset in the previous chapter (chapter 3). In this section we briefly discuss the methods used for data collection and for setting up the dataset to make it research-ready, not just for ourselves but for other interested researchers as well.

The data was manually picked from various online micro-blog sites, product review panels, news portals etc. For tweets, the 'bn' parameter was used in the search option to access Bangla tweets only. There are over 10000 total Bangla and Romanized Bangla posts in the dataset [31].

We checked for empty rows or columns, missing annotations, proper tagging (for the dataset with proper nouns replaced), proper categorization etc. The resulting dataset is now both unique and free of the abovementioned flaws.

Two additional sheets were added: one for Bangla texts only and the other for Romanized Bangla posts only. The code was written in Python, and modules such as openpyxl and cPickle were used in scripts to automate tasks such as the following (see the sketch after this list):

1. Reading data from the "xlsx" files
2. Converting textual data into tokens
3. Saving the data as a tuple ([data], [label1], [label2])
4. Applying a random shuffle on the Numpy array converted from the simple tuple
5. Serializing each datasheet, splitting three sets from each, and making them publicly available for download, so that others can un-pickle and use them in their models.

For our experiments we also applied the tokenizing, splitting and serializing scripts on the "full-text" column (the unmodified texts of the dataset with all the proper nouns, emoticons etc. intact), hence creating additional sets of pickle files. However, we did not make these publicly available, as they were only produced for experimental purposes.
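A rough sketch of that preparation pipeline is given below; the workbook name, the column layout (text plus the two annotations) and the use of the Keras tokenizer are our illustrative assumptions, not the exact scripts used:

import gzip
import pickle
import numpy as np
from openpyxl import load_workbook
from keras.preprocessing.text import Tokenizer

# 1. read texts and the two annotations from the workbook (column positions assumed)
ws = load_workbook('BRBT_dataset.xlsx').active
rows = [(r[0].value, r[1].value, r[2].value) for r in ws.iter_rows(min_row=2) if r[0].value]
texts, label1, label2 = zip(*rows)

# 2. convert the textual data into integer tokens
tok = Tokenizer()
tok.fit_on_texts(texts)
data = tok.texts_to_sequences(texts)

# 3-4. pack as ([data], [label1], [label2]) and apply a random shuffle
idx = np.random.permutation(len(data))
packed = np.array([[data[i] for i in idx],
                   [label1[i] for i in idx],
                   [label2[i] for i in idx]], dtype=object)

# 5. serialize the sheet as a compressed pickle file
with gzip.open('Sentiment_Analysis.pkl.gz', 'wb') as f:
    pickle.dump(packed, f)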

5.0: Model Implementation

Our dataset consists of three categories:

1. Positive,
2. Negative, and
3. Ambiguous.

Depending on the dataset used and the number of categories classified, we used three variants of the fully connected neural network layer, known as the Dense layer in Keras: Dense(1), Dense(2) and Dense(3) (Figures 9, 10 and 11 respectively).

Figure 9: Dense(1) model (80 tokens -> Embedding layer -> LSTM layer (128) -> Dense(1))
Figure 10: Dense(2) model (80 tokens -> Embedding layer -> LSTM layer (128) -> Dense(2))
Figure 11: Dense(3) model (80 tokens -> Embedding layer -> LSTM layer (128) -> Dense(3))

While Dense(1) is sufficient to output 1 and 0 values (1 for positive and 0 for negative), when categorical crossentropy is used as the loss and the Ambiguous category is taken into consideration ('A' changed to the integer value 2), we used Dense(3). Dense(2) was used for the "Ambiguous removed" experiment sets, where we omitted data entries with an 'A' annotation (by either the 1st or the 2nd validation) and only positives and negatives were counted; in this case, 0 and 1 go to two different neurons instead of one.

We used the data of one validation set as pre-training for the other validation set. More specifically, we first fit the data with the 1st-validation labels to the model, to pre-train it for the 2nd-validation data, which is then fit to the same model afterwards. Likewise, we fit the data with the 2nd-validation labels to pre-train for the 1st-validation data. This sort of pre-training was done to check whether it can be useful to pre-train on independently annotated sentiment analysis data even if the labels do not match.

6.0: Experiments

In this chapter we discuss everything about the experiments: the setup, the model labels and tags used to distinguish the different experiments, and tables showing all experiments by label, number of epochs, dense layer size, type of max_features etc.

6.1: Experimental setup

Our model is based on Recurrent Neural Networks (RNN); more specifically, we used an LSTM neural network. We used Keras' model-level library, since it has all the required features to help us develop our deep learning model, with Theano as the back-end for Keras. All our models are Keras Sequential models. The first layer of the Sequential model is the Embedding layer, which we used to implement the word-to-vector representation of the words in our dataset. We used a variable named max_features as the input dimension argument for the Embedding layer; it corresponds to the highest token value returned by the tokenizer during tokenization of our words, which means that max_features is also the vocabulary size (input_dim). The value of max_features must be equal to or higher than the vocabulary size for the model to run without a runtime error. The second layer is a Long Short Term Memory (LSTM) layer with an internal state of 128 dimensions. The third is a fully connected NN layer, known in Keras terminology as a Dense layer. Usually a one-dimensional Dense layer is used, which outputs to a single neuron; for our model implementation we need to work with two types of values, positive and negative, represented by 1 and 0 respectively, and a single output neuron can hold the values 0 and 1. However, for some experiments we needed Dense layers of more than one dimension: we used a two-dimensional Dense layer, and a three-dimensional Dense layer with the categorical loss function, to include the additional value, Ambiguous or neutral.

Figure 12: Model schematic (80 tokens -> Embedding layer -> LSTM layer (128) -> Dense layer (1/2/3))

The input to our Sequential model is a series of tokens; this is the reason we tokenized the words from our dataset first, during data preparation. For the input we took a maximum of 80 tokens at a time. The consequence of this is that our proposed model cannot process more than 80 words at a time; however, that may not be much of a limitation, since 80 words at a time is still large enough. We applied the 'sigmoid' activation function to the output. Depending on our data and labels, we used both 'binary_crossentropy' and 'categorical_crossentropy' as loss functions. Dropouts of 0.2 were used in both the Embedding layer and the LSTM layer, which helps reduce overfitting by randomly setting a fraction of the input units to 0 at each update during training [33]. A sketch of this setup, together with the pre-training procedure of chapter 5, follows below.
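The setup above can be sketched as follows in the Keras 1.x-style API used in this work, here for the binary "Ambiguous removed" case; the synthetic data, the variable names and the rmsprop optimizer are illustrative choices rather than the exact training script:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Activation
from keras.preprocessing import sequence

maxlen = 80              # at most 80 tokens per post
max_features = 35000     # vocabulary size; must be at least the highest token index + 1

# synthetic stand-ins for the pickled token sequences and the two validations' labels
X = [list(np.random.randint(1, max_features, size=np.random.randint(5, 60))) for _ in range(500)]
y_val1 = np.random.randint(2, size=500)      # 1st-validation labels (0/1)
y_val2 = np.random.randint(2, size=500)      # 2nd-validation labels (0/1)
X = sequence.pad_sequences(X, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128, dropout=0.2))    # word-to-vector layer, dropout 0.2
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))      # 128-dimensional internal state
model.add(Dense(1))                                     # Dense(1) for the binary setup
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# pre-training as in chapter 5: fit on one validation's labels, then continue on the other
model.fit(X, y_val1, batch_size=32, nb_epoch=5)         # pre-train on the 1st validation
model.fit(X, y_val2, batch_size=32, nb_epoch=5)         # continue training on the 2nd validation
score, acc = model.evaluate(X, y_val2, batch_size=32)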

6.2: Experiment model label tags

There are 36 unique experiments using the same LSTM model, depending on the dataset used, the processing of the texts, the loss function used, the processing of the labels (annotations on the data), and the input_dim value for the Embedding layer. This turns into a total of 72 experiments: one half of the experiments where label 1 (1st validation) is used for pre-training, and the other half where label 2 (2nd validation) is used for pre-training. The following are the tags used in the experiment labels and what they mean.

Tags used for the different types of dataset:

Dataset type | Tag used in experiment labels
Bangla and Romanized Bangla (total) | brbt
Bangla (only) | bangla
Romanized Bangla (only) | rb

Tags used depending on the processing of texts/posts:

Processing of texts | Tag used in experiment labels
Proper nouns replaced and other modifications | PN
Full texts (no modification) | FT

Tags used based on the loss function:

Loss function used | Tag used in experiment labels
binary_crossentropy | bin
categorical_crossentropy | cat

Tags used based on annotation data modification:

Annotation data modification | Tag used in experiment labels
Annotation value of 'A' removed (label along with data removed) | ra
Annotation value of 'A' converted to 2 | ato2

Tags used based on the type of max_features applied:

max_features type | Tag used in experiment labels
Non-fixed, ranging from 20,000 ~ 40,000 depending on the dataset type and size | 1
Value fixed at 500 | 2

6.3: Experiment model table

The following table has all 36 experiment labels and their other significant specifications, which were the same for both sets of pre-training. These labels denote the experiment set where label 1 is used for pre-training; for the alternate experiments, where label 2 is used for pre-training, the only change is an 'ALT' prefix on the experiment label.

Experiment label | Dense layer dimension | max_features value | Number of epochs
brbt_bin_PN_ra_1 | 1 | 35000 | 50
brbt_bin_PN_ra_2 | 1 | 500 | 50
brbt_bin_FT_ra_1 | 1 | 40000 | 50
brbt_bin_FT_ra_2 | 1 | 500 | 50
brbt_cat_PN_ra_1 | 2 | 35000 | 25
brbt_cat_PN_ra_2 | 2 | 500 | 25
brbt_cat_FT_ra_1 | 2 | 40000 | 25
brbt_cat_FT_ra_2 | 2 | 500 | 25
brbt_cat_PN_ato2_1 | 3 | 35000 | 50
brbt_cat_PN_ato2_2 | 3 | 500 | 50
brbt_cat_FT_ato2_1 | 3 | 35000 | 50
brbt_cat_FT_ato2_2 | 3 | 500 | 50
bangla_bin_PN_ra_1 | 1 | 35000 | 25
bangla_bin_PN_ra_2 | 1 | 500 | 25
bangla_bin_FT_ra_1 | 1 | 28000 | 25
bangla_bin_FT_ra_2 | 1 | 500 | 25
bangla_cat_PN_ra_1 | 2 | 35000 | 25
bangla_cat_PN_ra_2 | 2 | 500 | 25
bangla_cat_FT_ra_1 | 2 | 35000 | 25
bangla_cat_FT_ra_2 | 2 | 500 | 25
bangla_cat_PN_ato2_1 | 3 | 40000 | 25
bangla_cat_PN_ato2_2 | 3 | 500 | 25
bangla_cat_FT_ato2_1 | 3 | 40000 | 25
bangla_cat_FT_ato2_2 | 3 | 500 | 25
rb_bin_PN_ra_1 | 1 | 20000 | 25
rb_bin_PN_ra_2 | 1 | 500 | 25
rb_bin_FT_ra_1 | 1 | 25000 | 25
rb_bin_FT_ra_2 | 1 | 500 | 25
rb_cat_PN_ra_1 | 2 | 20000 | 25
rb_cat_PN_ra_2 | 2 | 500 | 25
rb_cat_FT_ra_1 | 2 | 20000 | 25
rb_cat_FT_ra_2 | 2 | 500 | 25
rb_cat_PN_ato2_1 | 3 | 20000 | 25
rb_cat_PN_ato2_2 | 3 | 500 | 25
rb_cat_FT_ato2_1 | 3 | 35000 | 25
rb_cat_FT_ato2_2 | 3 | 500 | 25

Table 4: Experiment labels table

7.0: Results and discussion

7.1: Result table

The following table holds the results of the experiments where the 2nd validation is pre-trained with the 1st validation data:

Experiment label | Validation used | Test score (loss) | Test accuracy
brbt_bin_PN_ra_1 | 1st validation | 1.88182373031 | 0.623299319728
                 | 2nd validation | 1.65080066764 | 0.679593720705
brbt_bin_PN_ra_2 | 1st validation | 0.95320324427 | 0.593537415169
                 | 2nd validation | 1.10312330084 | 0.632502309063
brbt_bin_FT_ra_1 | 1st validation | 1.84389834437 | 0.627986347919
                 | 2nd validation | 1.4472491316 | 0.691244240345
brbt_bin_FT_ra_2 | 1st validation | 0.913010644424 | 0.622866894401
                 | 2nd validation | 1.02413836789 | 0.639631336625
brbt_cat_PN_ra_1 | 1st validation | 1.49438894849 | 0.636904761905
                 | 2nd validation | 1.58974196519 | 0.660203139923
brbt_cat_PN_ra_2 | 1st validation | 1.00040410733 | 0.577380952786
                 | 2nd validation | 1.19003349273 | 0.62973222486
brbt_cat_FT_ra_1 | 1st validation | 1.46728742489 | 0.654436859661
                 | 2nd validation | 1.43848987329 | 0.688479261904
brbt_cat_FT_ra_2 | 1st validation | 0.703368600115 | 0.640784983139
                 | 2nd validation | 0.774347296124 | 0.658064515854
brbt_cat_PN_ato2_1 | 1st validation | 2.37038821664 | 0.529543754318
                 | 2nd validation | 2.8942532849 | 0.519820494088
brbt_cat_PN_ato2_2 | 1st validation | 1.40178696362 | 0.507105460253
                 | 2nd validation | 1.69921113683 | 0.471204188281
brbt_cat_FT_ato2_1 | 1st validation | 2.43535874321 | 0.546746447716
                 | 2nd validation | 2.65894489638 | 0.519820493286
brbt_cat_FT_ato2_2 | 1st validation | 1.24154179383 | 0.519072550219
                 | 2nd validation | 1.56229666569 | 0.501121914378
bangla_bin_PN_ra_1 | 1st validation | 1.4950274012 | 0.625790138914
                 | 2nd validation | 1.41194984732 | 0.6910344828
bangla_bin_PN_ra_2 | 1st validation | 0.84057880322 | 0.608091023568
                 | 2nd validation | 0.902013720808 | 0.649655173154
bangla_bin_FT_ra_1 | 1st validation | 1.59519414057 | 0.61772151944
                 | 2nd validation | 1.57762829749 | 0.675900277091
bangla_bin_FT_ra_2 | 1st validation | 0.771766250948 | 0.593670885321
                 | 2nd validation | 0.874607833799 | 0.639889197171
bangla_cat_PN_ra_1 | 1st validation | 1.53535538588 | 0.633375473933
                 | 2nd validation | 1.3721139773 | 0.707586207554
bangla_cat_PN_ra_2 | 1st validation | 0.818257555196 | 0.60809102387
                 | 2nd validation | 0.898441794659 | 0.663448275903
bangla_cat_FT_ra_1 | 1st validation | 1.54158453458 | 0.635443038126
                 | 2nd validation | 1.50628914397 | 0.667590027783
bangla_cat_FT_ra_2 | 1st validation | 0.783895753758 | 0.61772151944
                 | 2nd validation | 0.902959698125 | 0.649584488195
bangla_cat_PN_ato2_1 | 1st validation | 2.21905702797 | 0.525027808676
                 | 2nd validation | 2.35434000338 | 0.533926585161
bangla_cat_PN_ato2_2 | 1st validation | 1.11962867512 | 0.516129032258
                 | 2nd validation | 1.28997873955 | 0.526140155762
bangla_cat_FT_ato2_1 | 1st validation | 2.15015999162 | 0.530589544004
                 | 2nd validation | 2.39394572473 | 0.529477196885
bangla_cat_FT_ato2_2 | 1st validation | 1.06338288429 | 0.539488320389
                 | 2nd validation | 1.32858297875 | 0.521690767519
rb_bin_PN_ra_1 | 1st validation | 1.52705907256 | 0.608695648876
                 | 2nd validation | 1.67791411082 | 0.6375
rb_bin_PN_ra_2 | 1st validation | 0.954761880424 | 0.612648221108
                 | 2nd validation | 1.15460999012 | 0.65
rb_bin_FT_ra_1 | 1st validation | 1.22663058467 | 0.682203389831
                 | 2nd validation | 1.32435384307 | 0.638766519824
rb_bin_FT_ra_2 | 1st validation | 0.859760753179 | 0.648305085756
                 | 2nd validation | 1.23520370973 | 0.621145374449
rb_cat_PN_ra_1 | 1st validation | 1.45886840415 | 0.62450592814
                 | 2nd validation | 1.81545053323 | 0.616666666667
rb_cat_PN_ra_2 | 1st validation | 1.04351792529 | 0.59288537478
                 | 2nd validation | 1.09681313038 | 0.6375
rb_cat_FT_ra_1 | 1st validation | 1.11829374705 | 0.648305083736
                 | 2nd validation | 1.29434119126 | 0.665198237885
rb_cat_FT_ra_2 | 1st validation | 0.933010570074 | 0.610169490515
                 | 2nd validation | 1.13431040043 | 0.656387665461
rb_cat_PN_ato2_1 | 1st validation | 1.85035417814 | 0.477508650519
                 | 2nd validation | 2.3243691055 | 0.456747404844
rb_cat_PN_ato2_2 | 1st validation | 1.31008294462 | 0.508650519031
                 | 2nd validation | 1.47633354969 | 0.463667820069
rb_cat_FT_ato2_1 | 1st validation | 2.00189424318 | 0.525951557093
                 | 2nd validation | 2.16044338199 | 0.505190311419
rb_cat_FT_ato2_2 | 1st validation | 1.56623152981 | 0.536332179931
                 | 2nd validation | 1.91411103592 | 0.512110726644

Table 5: Experiment results for pre-training label 2

Experiment label | Validation used | Test score (loss) | Accuracy
ALTbangla_bin_FT_ra_1 | 1st validation | 1.67468570637 | 0.6602
                 | 2nd validation | 1.38265603005 | 0.7825
ALTbangla_bin_FT_ra_2 | 1st validation | 0.963279091859 | 0.6741
                 | 2nd validation | 0.800574915561 | 0.7704
ALTbangla_cat_FT_ra_1 | 1st validation | 1.5892310106 | 0.6407
                 | 2nd validation | 1.41864707728 | 0.7523
ALTbangla_cat_FT_ra_2 | 1st validation | 0.903364171258 | 0.6713
                 | 2nd validation | 0.769961072963 | 0.7855

Table 6: Experiment results for pre-training label 1

Table 5 shows the results of the half of the experiments where the 1st validation was used for pre-training, and Table 6 shows the completed experiments from the alternate set. In Table 5, the highest accuracy was attained on the Bangla dataset with categorical crossentropy loss, modified text, Ambiguous removed and non-fixed max_features, with about 70% accuracy, which is 20% more than chance for a two-category dataset. The corresponding experiment on the BRBT dataset with categorical loss, modified text and ambiguous converted to 2 has a lower accuracy score of 55%, but for a three-category task it scores 22% more than chance (33%). Therefore, it is clear that most of the experiment sets (dataset-wise, PN-FT tag-wise, loss function-wise, and label category-wise) scored above chance. However, none of the experiments with a fixed max_features (vocabulary size for the Embedding layer) scored well compared to the non-fixed variants.

Following are the graphs for some of the experiments with high accuracy scores:

Figure 13: loss-val_loss graph for bangla_cat_PN_ra_1 (2nd validation)
Figure 14: acc-val_acc graph for bangla_cat_PN_ra_1 (2nd validation)
Figure 15: loss-val_loss graph of bangla_bin_PN_ra_1 (2nd validation)
Figure 16: acc-val_acc graph of bangla_bin_PN_ra_1 (2nd validation)

8.0: Conclusion

Our goals in this project were:

1. Pre-processing the data in a way that makes it readily usable by researchers.
2. Application of deep recurrent models on a Bangla and Romanized Bangla text corpus.
3. Pre-training on the dataset of one label for the other (and vice versa) to demonstrate its usefulness.

In meeting our goals, we pre-processed a BRBT (Bangla and Romanized Bangla Text) dataset with a total of 9337 entries, 6698 of which are Bangla and 2639 Romanized Bangla texts. The dataset was then split and serialized into training, testing and validation sets of the lengths defined in section 3.3.1, and made publicly available so that it can be used by researchers.

For our experiments, we applied LSTM, which is a deep recurrent model. There are a total of 36 different experiments based on the same model, differing only in the dataset used, the loss function applied, the modifications done (or not) on the data (proper nouns replaced with tags, duplicate removal etc.) and so on (this has been discussed in detail in chapter 6). While most of the experiments scored an accuracy higher than chance, the Bangla dataset with categorical crossentropy as the loss function, non-fixed max_features for the Embedding layer and "Ambiguous removed" scored highest, with 78% accuracy for two categories (comparing results from both pre-training sets of experiments), and the Bangla and Romanized Bangla dataset (modified text set) with categorical crossentropy loss, non-fixed max_features and "Ambiguous converted to 2" scored highest for three categories, with 55% accuracy.

Our implementation of pre-training on the dataset of one label for the other has shown that it is useful to pre-train on independently annotated SA data even if the labels do not match. Due to time constraints we could not finish all 36 alternate experiments, which use label 2 for pre-training for the label 1 data; we intend to complete these before writing the paper for this research. However, the four experiments done from the alternate experiment set showed consistent results on the 2nd validation data (label 2).

References:

1. Olah, C., Understanding LSTM Networks. 2016.
2. Banglapedia. Bangla Language. Available from: http://en.banglapedia.org/index.php?title=Bangla_Language.
3. Liu, B., Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 2012. 5(1): p. 1-167.
4. Das, A. and S. Bandyopadhyay, Subjectivity detection in English and Bengali: A CRF-based approach. Proceeding of ICON, 2009.
5. Khan, S., Convergence in spelling, and spell-checker for Romanized Bangla in computers and mobile phones. In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on. 2014. IEEE.
6. LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.
7. Nasukawa, T. and J. Yi, Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd International Conference on Knowledge Capture. 2003. ACM.
8. Pang, B., L. Lee, and S. Vaithyanathan, Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10. 2002. Association for Computational Linguistics.
9. Das, S. and M. Chen, Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA). 2001. Bangkok, Thailand.
10. Wiebe, J., Learning subjective adjectives from corpora. In AAAI/IAAI. 2000.
11. Maas, A.L., et al., Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1. 2011. Association for Computational Linguistics.
12. Medhat, W., A. Hassan, and H. Korashy, Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 2014. 5(4): p. 1093-1113.
13. Godbole, N., M. Srinivasaiah, and S. Skiena, Large-scale sentiment analysis for news and blogs. ICWSM, 2007. 7(21): p. 219-222.
14. Kumar, A. and T.M. Sebastian, Sentiment analysis on Twitter. IJCSI International Journal of Computer Science Issues, 2012. 9(4): p. 372-373.
15. Das, D. and S. Bandyopadhyay, Developing Bengali WordNet Affect for analyzing emotion. In International Conference on the Computer Processing of Oriental Languages. 2010.
16. Patra, B.G., et al., Shared task on sentiment analysis in Indian languages (SAIL) tweets - an overview. In International Conference on Mining Intelligence and Knowledge Exploration. 2015. Springer.
17. Chowdhury, S. and W. Chowdhury, Performing sentiment analysis in Bangla microblog posts. In Informatics, Electronics & Vision (ICIEV), 2014 International Conference on. 2014. IEEE.
18. Lenat, D.B. and R.V. Guha, Building large knowledge-based systems; representation and inference in the Cyc project. 1989: Addison-Wesley Longman Publishing Co., Inc.
19. Bengio, Y., A. Courville, and P. Vincent, Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013. 35(8): p. 1798-1828.
20. Goodfellow, I., Y. Bengio, and A. Courville, Deep Learning. 2016.
21. Gershenson, C., Artificial neural networks for beginners. arXiv preprint cs/0308031, 2003.
22. Jain, A.K., J. Mao, and K.M. Mohiuddin, Artificial neural networks: A tutorial. IEEE Computer, 1996. 29(3): p. 31-44.
23. McCulloch, W.S. and W. Pitts, A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 1943. 5(4): p. 115-133.
24. Bullinaria, J.A., Recurrent neural networks. Neural Computation: Lecture, 2013. 12.
25. Yao, K., et al., Depth-Gated Recurrent Neural Networks. arXiv preprint arXiv:1508.03790, 2015.
26. Elman, J.L., Finding structure in time. Cognitive Science, 1990. 14(2): p. 179-211.
27. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780.
28. Gers, F.A., J. Schmidhuber, and F. Cummins, Learning to forget: Continual prediction with LSTM. Neural Computation, 2000. 12(10): p. 2451-2471.
29. Chollet, F., Keras. 2015; Available from: https://github.com/fchollet/kera.
30. Mikolov, T. and J. Dean, Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 2013.
31. Amin, M.R., BRBT: A dataset of Bangla and Romanized Bangla Texts for Sentiment Analysis. 2016, University of Liberal Arts Bangladesh.
32. Hassan, A., Repository for BRBT pickle files. 2016; Available from: https://github.com/Asif-Hassan/BRBT-dataset-pickles.
33. Srivastava, N., et al., Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014. 15(1): p. 1929-1958.