master thesis: imputation of missing product information ... · global retail ecommerce sales will...

24
Chair of Software Engineering for Business Information Systems (sebis) Faculty of Informatics Technische Universität München wwwmatthes.in.tum.de Master Thesis: Imputation of missing Product Information using Deep Learning A Use Case on Amazon Product Catalogue Aamna Najmi, 18.01.2019, Kickoff Advisor: Ahmed Elnaggar

Upload: others

Post on 23-Aug-2020

4 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Chair of Software Engineering for Business Information Systems (sebis)

Faculty of Informatics

Technische Universität München

wwwmatthes.in.tum.de

Master Thesis: Imputation of missing Product Information using

Deep LearningA Use Case on Amazon Product Catalogue

Aamna Najmi, 18.01.2019, Kickoff

Advisor: Ahmed Elnaggar

Page 2: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Objective

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 2

Page 3: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Objective

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 3

Page 4: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Introduction

▪ Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail

spending worldwide [1]

▪ 20% of purchase failures are potentially a result of missing or unclear product information [2]

▪ Detailed product information = improved customer experience and company profit

© sebisKickoff Master Thesis – Aamna Najmi 4

Page 5: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Objective

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 5

Page 6: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Motivation

© sebisKickoff Master Thesis – Aamna Najmi 6

Organizational Benefits Customer Experience

Machine Learning

Transfer

LearningMulti Task

LearningNLP

Computer

Vision

Reliability

Informed

Decision

Making

Enhanced

Website

Navigation

Inventory

Management

Delivery and

TransportationCompany

Profit

Page 7: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Objective

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 7

Page 8: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Objective

© sebisKickoff Master Thesis – Aamna Najmi 8

Predict missing information (Category, Color and Gender) of Amazon products belonging to the

Fashion department in the European market using textual information in five languages,

namely English, German, French, Spanish and Italian, and product images as inputs to the

model.

Page 9: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Objective

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 9

Page 10: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Approach

© sebisKickoff Master Thesis – Aamna Najmi 10

• Multi-task learning : Train one model to

perform multiple tasks concurrently [3]

• Transfer Learning : Use pre-trained network

to apply its weights and biases to a different

domain [4]

Page 11: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 11

Page 12: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Research Questions

1. Could training on multiple tasks and transfer learning perform better than training on one task

independently using the Amazon Product Catalog Dataset?

2. What architecture choices and hyperparameters shall we use in both multi-task and transfer

learning to obtain better performance?

3. Can multi-task learning and transfer learning be useful in the ecommerce domain to enhance user-

experience?

© sebisKickoff Master Thesis – Aamna Najmi 12

Page 13: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 13

Page 14: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Dataset

▪ Source: Scraped data from five regional Amazon websites (UK, DE, FR, IT,ES)

▪ Records: 200k from each website, 1 million in total

▪ Attributes: Product ID, Product Title, Product Description, Color, Category, Product Summary, Product

Specifications, Product Image

▪ Sample:

© sebisKickoff Master Thesis – Aamna Najmi 14

Page 15: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Dataset (contd.)

© sebisKickoff Master Thesis – Aamna Najmi 15

Note: The above image is representative of the data scraped from www.amazon.co.uk

Page 16: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Dataset (contd.)

© sebisKickoff Master Thesis – Aamna Najmi 16

Note: The above image is representative of the data scraped from www.amazon.co.uk

Page 17: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Dataset (contd.)

© sebisKickoff Master Thesis – Aamna Najmi 17

Note: The above image is representative of the data scraped from www.amazon.co.uk

Page 18: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Dataset (contd.)

© sebisKickoff Master Thesis – Aamna Najmi 18

Note: The above image is representative of the data scraped from www.amazon.co.uk

Page 19: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 19

Page 20: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Methodology

© sebisKickoff Master Thesis – Aamna Najmi 20

Web ScrapingPreparing the

dataset Integrate dataset

Train the model on MTL and TL architecture

Evaluation of the results

Verify and validate the research

questions

Page 21: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

▪ Introduction

▪ Motivation

▪ Approach

▪ Research Questions

▪ Dataset

▪ Methodology

▪ Timeline of the Project

Outline

© sebisKickoff Master Thesis – Aamna Najmi 21

Page 22: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Timeline

© sebisKickoff Master Thesis – Aamna Najmi 22

Thesis start

Kick-off presentation

26/ Nov/ 18 5/ Jan/ 19 14/ Feb/ 19 26/ Mrz/ 19 5/ Mai/ 19 14/ Jun/ 19

Literature Review

Implementation

Evaluation

Documentation

Review

Literature ReviewImplementationEvaluationDocumentationReview

Start Date December 1, 2018January 1, 2019February 1, 2019March 1, 2019April 1, 2019

Duration 120119887560

End Date March 31, 2019April 30, 2019April 30, 2019May 15, 2019May 31, 2019

Thesis end

Hand in Thesis

Page 23: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

References

© sebisKickoff Master Thesis – Aamna Najmi 23

[1] https://www.emarketer.com/Article/Worldwide-Retail-Ecommerce-Sales-Will-Reach-1915-Trillion-This-Year/1014369

[2] https://www.nngroup.com/reports/ecommerce-user-experience/

[3] Sebastian Ruder. An overview of Multi-Task Learning in Deep Neural Networks, 2017

[4] Sebastian Ruder. http://ruder.io/transfer-learning/

Page 24: Master Thesis: Imputation of missing Product Information ... · Global retail ecommerce sales will reach about $4 trillion in 2020, accounting for 14.6% of total retail spending worldwide

Technische Universität München

Faculty of Informatics

Chair of Software Engineering for Business

Information Systems

Boltzmannstraße 3

85748 Garching bei München

Aamna Najmi

Imputation of missing Product Information

using Deep Learning

[email protected]