using content-based filtering in a system of recommendation in the context of digital mobile...

5

Click here to load reader

Upload: elaine-cecilia-gatto

Post on 19-May-2015

338 views

Category:

Education


2 download

TRANSCRIPT

Page 1: Using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

Using Content-Based Filtering in a System of

Recommendation in the Context of Digital Mobile

Interactive TV

Elaine Cecília Gatto, Sergio Donizetti Zorzo IEEE Conference Publishing

Computer Science Department. Federal University of São Carlos – UFSCar

Highway Washington Luís, Km 235, PO Box 676, 13565-9. São Carlos, São Paulo, Brazil.

{elaine_gatto, zorzo}@dc.ufscar.br.

Abstract-Recommendation systems provide suggestions based

on information about the preferences of users. The filtering information is used by recommender systems for the processing

of information and suggestions to users and content-based filtering is an approach to filtering information widely used in recommender systems. Content-Based Filtering on analyzing the

correlation of the content of items with the profile, suggesting relevant items and discarding the irrelevant. Widely used on the Internet, recommendation systems are being studied for use in

the context of Digital TV, there are already several studies in this direction. Just as occurs on the Internet, recommendation systems can be used in Digital TV for recommendation of TV

programs, advertising and publicity and also electronic commerce. Thus, the items in the context of digital TV, may be programs, publicity / advertising and the products to be sold.

Applying Content Filtering Based on the recommendation of programs, for example, it should correlate the content of these programs with user preferences, which in this scenario are the

types of programs he has preferred to watch. This paper presents the studies performed with Content Filtering Based on Data Applied to Digital TV. The studies seek to observe and evaluate

how some techniques of content-based filtering can be used in recommendation systems in the context of Digital TV.

I. INTRODUCTION

Digital TV implementation in Brazil provides new markets

which can be explored. Well-succeeded technologies as those

in Web environment, for example, can be applied in Digital

TV domain and achieve the same success.

The interaction either through the remote control or the cell

phone keyboard etc by the user today, will allow many

applications to be carried to this environment.

One of the areas which has been extensively studied and is

well-succeeded in the Web is that of personalization. There

are some surveys concerning recommendation systems for

Digital TV as for example [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] among

others.

Recommendation systems can contribute to a better use of

Digital TV in residences, in groups or individually, in a cell

phone, for example. These systems can help the user to choose

the program, avoiding waste of time and of course, suggesting

to the user programs which really interest him. Moreover,

recommendation systems can be applied to publicity and

advertisement on Digital TV, as well as in the T-Commerce.

This work is organized as follows: section 1 introduces the

paper, section 2 presents a comparison between digital TV and

portable devices for homes, section 3 presents related works,

section 4 talks about content-based filtering, section 5 presents

our recommendation system, Section 6 talks about the

characteristics of households, the EPG, the user history and

methodology used for the tests, Section 7 talks about the

results and Section 8 presents the conclusion.

II. COMPARING IDTV IN RESIDENCES AND IDTV FOR CELL

PHONES

The use of IDTV for cell phones will quickly boom due to

the increasingly quantity of these devices surpassing television

sets in Brazil, when cell phones with IDTV are available to

population. Thus, some differences between IDTV for

residences and for cell phones can be noticed.

IDTV standard adopted in Brazil calls full-seg the fixed

devices like set-top-box, and one-seg, devices like cell phones,

miniTVs, PDAs, etc. In residences, the IDTV is used by all

residents while in the cell phone it is normally used by only

one user, the owner of the device.

Another characteristic is the size of the display. In

residences, the IDTV television sets have screens bigger than

30”, where is possible to have a more flexible development,

presentation and displaying of the content. However, cell

phones screens are smaller than 10" requiring a higher effort

in development to display the content on the screen avoiding

image pollution and confusion to the user.

An exceptional characteristic in this environment is that

IDTV for cell phones can be seen anywhere and anytime. On

the other hand, IDTV viewing period in residences can be

longer than in cell phones which are used in situations of

waiting and displacement.

IDTV in cell phones can use already existent 2G/3G net

architecture, and 4G in the future, as a return channel, making

interactivity possible in this environment before occurring in

IDTV.

The middleware adopted in Brazil has national technology

and is called Ginga. Ginga-NCL and Ginga-J declarative and

imperative portions of the middleware are necessary for full-

Page 2: Using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

seg devices. For one-seg devices, only Ginga-NCL declarative

portion is required. There is a reference implementation of the

middleware for full-seg devices. For one-seg devices, this

reference implementation is not available yet, but are working

in this middleware development, as PUC-RIO and UFES

(Symbian e Android). [11, 12, 13]

The users of these devices need special attention due to

current characteristics of this environment like processing

power, storage capacity and battery.

III. RELATED WORK

There are many works involving recommendation systems

for IDTV for set-top-box and more recently for portable

devices. This section presents two recent works about

recommendation systems for IDTV.

In [5] the recommendation system fits the systems with

content-based filtering category, using text mining. The

system uses a simple interface with the user and accepts a

natural language as text entry as well as four values which

reflect user preferences for comedy, action, horror and erotic.

First, the system extracts texts and then searches for emotions

in the text and the distances among themes are calculated.

Finally, an index is calculated for each entry and a list of

programs organized by this index is returned.

In [3] the main aim of the system is substituting the

common content by a personalized and adapted content in a

more attractive way for the user. Therefore, this system

accepts and allows TV reception either through broadcast, or

multimedia streaming. The system uses explicit collection –

when using for the first time it is necessary to inform the

preferences – and also implicit collection – user’s actions in

the device are monitored, stored and sent to the server. The

personalized content– chosen based on preferences – is sent to

the user’s portable device by the Server in order to be

previously stored before being exhibited.

The ZapTV [4] developed for DVB-H standard allows the

user to create his own content, offering aggregated value

services as multimodal access (Web and Cell phones), return

channel, video note, personalized sharing and distribution of

content. Besides the technology provided by DVB-H, ZapTV

comprehends other technologies as TV-Anytime,

Technologies emerging from Web 2.0 and involved in the

Semantic Web. The main functionalities of ZapTV include a

social net, personalized content broadcasting (implicit or

explicit recommendation), thematic channels diffusion

planning (age-group, genre or specific theme), client

application and transmission of the electronic programming

guide. ZapTV seeks to improve the recommendation using an

intelligent personalization mechanism which matches

information filtering with semantic logic processes and it was

based on the principles of participation and sharing between

Web 2.0 users, so that the creation, sharing, classification and

note of content make the search for content easier.

Our recommendation system is in the portable device, and

the inclusion of servers in Brazilian IDTV architecture for cell

phones is not necessary to provide recommendation and,

consequently the need for remote communication is also not

necessary, avoiding that the user pay for the data traffic in the

net in order to receive recommendation or send his data, and

thus, protecting the user’s data privacy.

IV. CONTENT-BASED FILTERING

Content-Based Filtering (CBF) uses the content attributes to

describe the content of the items and then calculate the

similarity. This approach does not depend on other users’

evaluation about the items. CBF is an information recovery

technique which bases its forecast on the fact that previous

preferences of the users are reliable indicators for future

behavior. In order to formulate recommendations, a variety of

algorithms has been proposed to evaluate the content of

documents and find regularities. Some of these algorithms

operate with classification knowledge and others operate with

the problem of regression. Some of the problems and

limitations found in systems using CBF are super

specialization, the problem of the new user and the analyses of

limited content. [7, 6, 14]

V. RECOMMENDER SYSTEM

Our recommendation system aims to facilitate the IDTV

user’s routine by interacting with a simple interface which

provides content of preference without spending so much time

to find it.

The process starts when the user turns on the TV in the cell

phone. The user history data collected is submitted to

information filtering based on content in order to find the

user’s profile. Data resulting from this process are formatted.

The user profile is stored in a database with the date and time

of the generation. With the user profile updated it is possible

to look in the EPG for compatible TV programs and which are

going to be transmitted around the current time, providing a

list of these programs. This list is also stored in a data base

with the date and time of generation.

The recommendations are presented to the user and those

required are stored with the user history. During the time the

IDTV for cell phone is turned on, all programs viewed by the

user are stored in the database which has the user history. This

process is repeated every time the user turns on the TV.

Ginga-NCL middleware has a layer for resident applications

responsible for exhibition, other layer for the common center

responsible for offering several services and a last layer

regarding the protocols stack.

The recommendation system is considered as an element in

Ginga-NCL architecture, in Ginga Common Core, due to the

need to continue using data locally and also use Tunner

libraries – in order to obtain information about the channels

tune – ESInformation – in order to obtain information about

EIT table generating the EPG – and Context Manager – to

obtain system information.

As GINGA-NCL middleware is mandatory in Brazil for

portable devices, the recommendation system was planned,

Page 3: Using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

designed and modeled according to Brazilian rules which refer

to portable devices, thus meeting these devices needs. More

details on our system can be obtained in. [15]

VI. TESTS

For the tests we used the data corresponding to TV viewing

and program schedule. These data were provided by IBOPE

which is a Brazilian multinational private equity firms and a

leading market research in Latin America. 67 years ago to

IBOPE provides a wide range of information and media

studies, public opinion, voting intention, consumption, brand

and market behavior. In the following subsections the

characteristics of these data and the tests will be detailed. [16]

A. Characteristics of Residences

Data that contain information provided IBOPE EPG (TV

programming), history of the user's view (what the viewer saw)

and also the socioeconomic information. All these data were

separated and stored in MySQL database the data correspond

to 15 days of programming and monitoring of six Brazilian

households with TV programming Open. These households

were monitored every minute, and each individual was also

monitored separately.

TABLE I NUMBER OF INDIVIDUALS BY RESIDENCE

Residence 1 2 3 4 5 6

Individuals 2 3 3 2 2 3

TVs 1 1 2 2 1 2

B. Characteristics of Date

The data used for these tests have undergone a process of

manual adjustment. For each of the algorithms used, was a

necessary pre-processing for correct use and analysis.

Subsections C, D, E and F detail the composition of these data.

C. EPG

The EPG (Electronic Program Guide) is composed of 15

TXT files called programming files, one for each day

(05/03/2008 to 19/03/2008) with a grid of 10 TV stations

Open, starting at 00:00:00 and ends at 05:59:00. After

understanding the files that make up the EPG, the data were

copied from the archives of programming a spreadsheet

BrOffice and then was done cleaning up unnecessary data.

We noticed some inconsistencies in schedules, which were

immediately corrected so that future analysis will not generate

erroneous results. This was repeated for each of the 15

programming files, generating a single spreadsheet containing

the entire 15 days of EPG.

Was added to these data the day of week and duration of the

program. The EPG, this step is not complete, missing the

genre and subgenre of each program. Searched for it on the

official websites of the gender of each station broadcast

programs and then identified according to the Brazilian

standard ABNT NBR 15603-3:2007, Annex C, "gender

descriptor in the descriptor content." [17]

A new table was created, identical to the EPG table, but

with added fields with the names of genre, to be used with the

technique of the cosine. These fields were populated with 0 or

1 depending on the program or not fit in that genre, becoming

a matrix.

D. User History

Historical data viewing of users are needed for the

discovery of these preferences. In the context of digital TV

that we are considering, these data are collected and stored

implicitly.

Spreadsheets sent by tuning IBOPE, which contains user

data, were modified so that filtering techniques based on

content could be applied.

The data were then separated by households and although

some homes have more than one TV in these households it

was noticed that there is no record of monitoring more than

one TV at the same time and therefore it is considered that the

household has only one TV.

Data were also formatted: date in yyyy-mm-dd, time in

hh:mm:ss format TV and 00X. The resulting sheets were

converted to CSV files that were then inserted in MySQL

were also added to user data, information for day of week,

time of day and duration of display.

E. Methodology

We simulate the Content Filtering Based on using two

different techniques, the Apriori and cosine, both using as a

target attribute to gender. In the case of Apriori, we apply the

settings shown in Figure 1.

Then start the simulation for Apriori. For each household,

the process was the same on the first day to generate

recommendations for the second day, the second day, based

on what was seen the day before and the present day, to

generate recommendations for the third day, and so on.

First we opened the CSV file corresponding to the X home

and day 1. After convert some attributes String to Numeric

Nominal and others for Nominal (applying filters). Then the

executable Apriori and ultimately save the output. With the

data saved, it was possible to assess whether the day after,

someone from the household attended any of the genera found

by Apriori.

To find the cosine first and save the profile, then calculate

the distance of the cosine, cosine and the standard itself and

finally found the right answers. The process is iterative and is

performed for each day and each household.

F. Apriori e Cosine

The algorithms of association techniques identify

associations between data records that are somehow related.

The basic premise is evidence on the presence of others in the

same transaction, to determine what things are related.

Page 4: Using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

Figure 1. Parameters used in Apriori.

The association rules interconnect objects in an attempt to

expose patterns and trends. The discovery of associations must

show both associations as trivial associations not trivial.

The Apriori algorithm is often used to mine association

rules. Apriori can work with a high number of attributes,

generating various combinations between them and

performing successive searches across the database,

maintaining optimum performance in terms of processing time.

The algorithm tries to find all the relevant association rules

between items, which has the form X (history) ==> Y

(consequent). If x% of transactions that contain X also contain

Y, then x% represents the factor of trust (under the rule of

confidence). The support factor is a measure representing x%

of cases in which X and Y occur simultaneously over the total

number of records (often). [18]

The cosine is a measure of similarity, a metric that can be

applied to find out if an item has a correlation or not with the

user profile. A binary vector is a set of two elements, x and y.

In an n-dimensional space, where n is the number of items in

the vector. You can therefore calculate the cosine between the

vectors, measured as similarity between the user profile and its

history. The similarity is high when the value of cosine is high,

the closer to 1, the greater the similarity. [19]

VII. RESULTS

For all households were made in excel spreadsheets to

account for the percentage of success of each technique to

each household. As the number of recommendations were

generated in May, then the basic formula for calculating the

percentage of correct answers was used:

Moreover, we also calculated the average percentage of

correct answers for the number of recommendations generated

using the following formula:

Figure 2 and Figure 3 show the chart with the average

achieved for the percentages calculated for each hit home for

the techniques of the Cosine and Apriori. Households 2, 4 and

6 had the best results for the cosine and the Apriori

households 4:06. As can be seen in Figure 4, which presents

the comparison chart between the two techniques in general,

the cosine has outperformed over the Apriori.

Figure 2. Parameters used in Cosine.

Figure 3. Parameters used in Apriori.

Figure 4. Comparison between means of Apriori and Cosine.

(1)

(2)

Page 5: Using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

VIII. CONCLUSION

During the tests, we observed some peculiarities. Our

system recommends content based on the kinds of programs,

and our analysis were made according to that parameter. With

the Apriori algorithm, the format of the data is already

collected correctly for use. For cosine, the EPG needs to be

changed to an array before starting the discovery process

profiles and recommendations.

The Apriori is able to mine only the user viewing history,

discovering your profile from the rules. To select the programs

to be recommended, another technique should be used. The

cosine can do both. However, Apriori can discover more

features in the user history, for example, "the user stands in

front of the TV more often at night, he enjoys watching

movies and watching TV more frequently in the second."

The cosine cannot find these features, but can reach our

goal. To find similar patterns in association rules, it is

necessary to use more complex queries to the bank.

The output from the Apriori must be crafted to generate the

correct profile of the user, ie, the rules should be interpreted,

which in terms of implementation becomes somewhat

cumbersome. The cosine output is more readable, its result

goes straight to the intended goal, allowing the output to be

used without the need for a post-treatment.

In relation to entry to the Apriori there is no need of

treatment, since the data will be used the way they are

collected. But for the cosine, where the EPG is updated, the

table containing the matrix of the EPG should be modified

according to the new EPG, becoming somewhat laborious.

Then simulated with both techniques the process of delivery

and acceptance of recommendations by calculating the

percentage of correct and generating graphics. The profile of

genres found by both algorithms are similar. Although both

techniques to cover the needs of the system, the cosine is one

that can be better utilized.

On a desktop, as was the case of our tests, the return of

calculating the cosine is faster in relation to the return of the

Apriori association rules. However, further studies on the

consumption of processing of these algorithms in a cell with

TVDI is not yet possible in Brazil. The time that the whole

process of recommendation takes to complete varies according

to the technique of customization to be used. In our tests and

simulations, the cosine ends the process before the Apriori.

Studies show that although both algorithms meet our needs,

these two techniques the Cosine can be better worked in the

recommendation system for TVDPI.

ACKNOWLEDGMENT

We thank IBOPE for providing real data about the

electronic program guide and also the viewer’s behavior data

from March, 05, 2008 to March, 19, 2008.

REFERENCES

[1] Avila, P. M. TV Recommender: Application Development Support of Recommendation for the Brazilian System of Digital TV. Dissertation. Graduate in Computer Science. Department of Computer Science. Federal University of Sao Carlos. 90 pages, 2010.

[2] Lucas, A. Customization for Digital TV using the strategy of Recommender System for multiuser environments. Dissertation. Graduate in Computer Science. Department of Computer Science. Federal University of Sao Carlos. 103 pages, 2009.

[3] Uribe, S. et al. Mobile TV Targeted Advertisement and Content Personalization. In 16th International Workshop Conference on Systems, Signals and Image Processing, Chalkida, Greece, 18-19/06/2009.

[4] Solla, A. G. et al. ZapTV: Personalized User-Generated Content for Handheld Devices in DVB-H Mobile Newtorks. In Proceedings 6th European Interactive TV Conference, p.193-203, Salzburg, Áustria, 03- 04/07/2008.

[5] Bär, A. et al. A Lightweight Mobile TV Recommender: Towards a One-Click-to-Watch Experience. In Proceedings 6th European Interactive TV Conference, p.142-147, Salzburg, Áustria, 03-04/07/2008.

[6] Einarsson, O. P. Content Personalization for Mobile TV Combining Content-Based and Collavorative Filtering. Master Thesis. Center for Information and Communication Technologies. Technical Univesity of Denmark. August 22, 2007

[7] Chorianopoulos, K. Personalized and mobile digital TV applications. In Proceedings of the Multimedia Tools and Aplications, p. 1- 10, vol.36, 27 January 2007.

[8] Choi, J. Y.; Koh, D.; Lee, J. Ex-ante simulation of mobile TV market based on consumers’ preference data. In Proceedings of the Technological Forecasting & Social Change, p. 1043-1053, 2007.

[9] Yu, Z. et al. TV program recommendation for multiple viewers based on user profile merging. In Proceedings of the User Model User-Adap Inter, p. 63-82, 2006.

[10] Das, D. and ter Horst, H. Recommder Systems for TV. In Proceedings of 15 th AAAI Conference, Madison, Wisconsin, July 1998.

[11] Ginga. Disponível em: <http://www.ginga.org.br/>, Acessado em 06 de janeiro de 2010. http://www.ginga.org.br/

[12] Ginga-NCL. Disponível em: <http://www.ginga.org.br/>, Acessado em 06 de janeiro de 2010. http://www.gingancl.org.br/

[13] Ginga-J. Disponível em: <http://www.ginga.org.br/>, Acessado em 06 de janeiro de 2010. http://dev.openginga.org/

[14] Pazzani, M. J. A framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review, p. 393-408, December 1999.

[15] Gatto, Elaine C., Zorzi, Sergio D. Recommender System for Digital TV Portable Interactive Brasileira. In 8th International Information and Telecommunication Technologies Symposium - December 09-11, 2009. Florianopolis, Santa Catarina, Brazil.

[16] IBOPE. Disponível em <www.ibope.com.br> [17] ABNT NBR 15603-2. Digital terrestrial television – Multiplexing and

service information (SI) Part 2: Data structure and definition of basic information of SI.

[18] Witten, I. H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, 525 pages, June 2005.

[19] Cristo, M. Sistemas de Recomendação, Métodos e Avaliação. 81 slides. 2009.