D1.4 - Event, Weather and Multilingual Data Services Deliverable n: 1.4 Date: 27 December 2018 Status: Final Version: 1.0 Authors: Aljaž Košmerlj (JSI), Matteo Palmonari (UNIMIB), Flavio De Paoli (UNIMIB) Contributors: JSI, UNIMIB Reviewers: Matej Žvan (BT), Dumitru Roman (SINTEF) Distribution: Public Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1




EW-Shopp GA number: 732590 H2020-ICT-2016-2017/H2020-ICT-2016-1

History of Changes

Version  Date        Description                                  Revised by
0.1      30/10/2018  Tentative Table of Contents                  Aljaž Košmerlj (JSI)
0.2      18/12/2018  Wrote Chapter 4 and Section 5.2              Aljaž Košmerlj (JSI)
0.3      19/12/2018  Wrote Section 2.2                            Flavio De Paoli, Matteo Palmonari (UNIMIB)
0.4      19/12/2018  Improved Section 2.2 and added Section 4.1   Flavio De Paoli, Matteo Palmonari (UNIMIB)
0.9      27/12/2018  Finalized document                           Aljaž Košmerlj (JSI)
1.0      28/12/2018  Final check by coordinator and minor edits   Matteo Palmonari (UNIMIB)


Executive Summary

This deliverable describes the event, weather and multilingual data services of the EW-Shopp project. These services supply contextual information to the business data provided by the project business partners as well as the cross-lingual linking of datasets in different languages. The text builds on specifications and descriptions from previous deliverables and details extensions and additions made based on experience from the deployment of business case pilots. Chief among these additions is the introduction of the custom events ontology for the description of custom business-impacting events. The ontology stems from the alignment of custom event data that business partners use or plan to use to the Schema.org vocabulary, so as to maximize interoperability with the increasing volume of event data published using this vocabulary.

The deliverable expands on specifications and descriptions from deliverable D1.3. It describes services used in the deployment of pilots outlined in deliverable D4.2. The results of the evaluation of the services based on their performance in the pilots are reported in deliverable D2.3.


Table of contents

History of Changes
Executive Summary
List of figures
List of tables
Chapter 2 Introduction
  2.1 Relationship to Other Deliverables
  2.2 Abbreviations and Acronyms
  2.3 Document Structure
Chapter 3 Event Data
  3.1 Updates Since D1.3
  3.2 Custom Events
    3.2.1 Objectives of an Ontology for Custom Events
    3.2.2 Methodology
    3.2.3 Schema.org Event Model
    3.2.4 Partners' Event Data
    3.2.5 Use Cases for Interoperable Descriptions of Custom Event Data
    3.2.6 Guidelines for the Design of the Ontology
    3.2.7 The EW-Shopp Ontology Based on Schema.org
Chapter 4 Weather Data
  4.1 Updates Since D1.3
  4.2 Alternative Sources of Weather Data
Chapter 5 Multilingual Data Linking Services
  5.1 Updates Since D1.3
  5.2 Handling Keywords in the JOT Dataset
Chapter 6 Conclusion
References


List of figures

Figure 1. Main types used in the custom event ontology and their mutual relations


List of tables

Table 1. Abbreviations and acronyms
Table 2. Short references for project partners
Table 3: Summary of the tools in the EW-Shopp toolkit
Table 4 - Ceneje event data properties
Table 5 - Big Bang event data properties
Table 6 - CDE event data properties
Table 7 - Facebook event data properties
Table 8 - EW-Shopp properties
Table 9 - Property mappings


Chapter 2 Introduction

The EW-Shopp project aims to support modern e-commerce businesses by giving them the means to place their business data into context. Roughly, this context can be split into environmental and social. For the environmental aspect, the project focuses on weather and its effect on consumers. For the social influences, the project offers tools to explore the impact of events as well as tools to link business data across languages in multilingual settings.

The project's event, weather and multilingual data services were all developed early in the project, since they were needed for the development of the pilots described in deliverable D4.2 [7]. Because of this, the deliverable with the specification of these data services, D1.3 [3], already described their technical details and APIs. This document therefore focuses on updates since D1.3 and does not unnecessarily repeat content.

For event data, the largest development is the introduction of custom events. These are events that impact businesses but are not covered in news media and cannot be detected using data from Event Registry, a news media monitoring platform and the primary source of project event data. Since these events are business-specific, we introduce an ontology for their description.

Weather data services have received few functional updates, with most of the work focusing on improving their stability and removing bugs. In this document we explore potential sources of weather data for after the project, when data from the European Centre for Medium-Range Weather Forecasts may not be available.

For the multilingual data services there are also few updates to report, but we introduce a new task that arose during the development of the JOT business case. JOT data contains large numbers of keywords that need to be clustered based on their semantics to enable efficient processing. We describe the problem and the clustering approach.

2.1 Relationship to Other Deliverables

This deliverable describes the updated versions of the EW-Shopp data services introduced and described in D1.3 [3]. The services use data formats specified by deliverable D1.2 [2] and fulfil the interoperability requirements from deliverable D1.1 [1]. The services were used in the deployment of pilots described in deliverable D4.2 [7]. Based on the pilots' outcome, the performance of the services was evaluated in deliverable D2.3 [4].

2.2 Abbreviations and Acronyms

Abbreviations and acronyms used in the document are explained in Table 1.


Table 1. Abbreviations and acronyms

Abbreviation  Description
API           Application Programming Interface
BC            Business Case
CSV           Comma Separated Values
EAN           European Article Number
EC            European Commission
ECMWF         European Centre for Medium-Range Weather Forecasts
EU            European Union
HTTP          Hypertext Transfer Protocol
ID            Identifier
JSON          JavaScript Object Notation
JSON-LD       JavaScript Object Notation for Linked Data
MARS          Meteorological Archival and Retrieval System
RDF           Resource Description Framework
RDFS          Resource Description Framework Schema
OWL           Web Ontology Language
REST          Representational State Transfer web services
URI           Uniform Resource Identifier
URL           Uniform Resource Locator
UTF           Unicode Transformation Format
W3C           World Wide Web Consortium

Table 2 shows the project partners along with their short references for easier mentions throughout the document.

Table 2. Short references for project partners

No.  Beneficiary (partner) name as in [GA]                     Short reference
1    UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA                  UNIMIB
2    CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO      CE
3    BROWSETEL (UK) LIMITED                                    BT
4    GfK EURISKO SRL                                           GfK
5    BIG BANG, TRGOVINA IN STORITVE, DOO                       BB
6    MEASURENCE LIMITED                                        ME
7    JOT INTERNET MEDIA ESPAÑA SL                              JOT
8    ENGINEERING – INGEGNERIA INFORMATICA SPA                  ENG
9    STIFTELSEN SINTEF                                         SINTEF
10   INSTITUT JOZEF STEFAN                                     JSI

Finally, Table 3 contains a summary of the tools and components of the EW-Shopp toolkit that are mentioned in this document.

Table 3: Summary of the tools in the EW-Shopp toolkit.

Component Name  Short description
DataGraft       DataGraft is a cloud-based platform for data hosting and interactive data transformations. In the toolkit it has the role of the data wrangler component together with Grafterizer, its data transformation interface. It is developed and maintained by SINTEF.
ASIA            A tool for the semantic enrichment of data available in tabular formats. It is supported by ABSTAT, a tool to profile knowledge graphs represented in RDF based on linked data summarization mechanisms. It is included as a plugin in DataGraft and is developed and maintained by UNIMIB.
QMiner          QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data. In the toolkit it has the role of the data analyser component. It is developed and maintained by JSI.
Knowage         Knowage is a business intelligence suite with strong support for producing high-quality reports of the transformed, enriched and analysed information obtained from the toolkit. In the toolkit, it has the role of the data-reporting component. It is developed and maintained by ENG.


2.3 Document Structure

This document has the following structure. Chapter 2 contains the introduction together with the list of abbreviations and acronyms used in the document text and this structure overview. The following three chapters, Chapter 3, Chapter 4 and Chapter 5, describe event, weather and multilingual data services respectively. Chapter 6 closes the deliverable with concluding remarks.

Chapter 3 Event Data

Events represent the social context of the shoppers' journey: the occurrences and observances that may encourage them to spend more, direct them to particular products, drive them to request customer support en masse or distract them from shopping altogether. This chapter describes the data sources and formats of event data in the EW-Shopp project.

3.1 Updates Since D1.3

The source of global events in the EW-Shopp project is Event Registry, a platform for monitoring mass news media. Since it is an established platform and already has an extensive and comprehensive API, little to no extension was needed for the purposes of the project. The API and its data format are described in detail in deliverable D1.3 [3].

Event Registry is a very rich data source, but it only allows us to observe the world through the lens of news media. During the development of the project pilots it became increasingly clear that this is not sufficient. There is a wide range of events not covered in the news that can have a massive effect on consumer behaviour. Marketing campaigns with special offer events and discounts are perhaps the clearest example of this. Both the price change from the discount offered and the increased visibility from additional advertisement can move the market. Another example, for the case of call centre management, is the date when receipts for mobile phone subscription packages are sent to the subscribers. Customers commonly have questions regarding their receipts, which increases call volume. If the company issuing the receipts has by some chance made a systematic error when calculating receipt amounts, the call centre is flooded with callers.

The previous paragraph presented two examples of events we have named custom events. There are many more events of this nature, and we call them custom since they are specific to each business and typically need to be at least partially tailored to its needs. To support custom events in the EW-Shopp toolkit we introduce an ontology for describing them in Section 3.2.

3.2 Custom Events

This section is devoted to the definition of the EW-Shopp Ontology to model custom events of interest to the project.


3.2.1 Objectives of an Ontology for Custom Events

The definition of the EW-Shopp custom event ontology has the goal of harmonizing the description of events that are provided or used by partners in the EW-Shopp project. In EW-Shopp, events are used to enrich information about other measures that describe a business phenomenon of interest in order to build predictive models. These measures differ between business cases. We refer to D4.2 [7] and D2.3 [4] for a detailed description of the data preparation and analytics workflows that need to be supported in the project.

The ontology for internal events has the aim of defining a shared terminology to describe events and support the integration of data about these events with existing data, so as to build integrated data that are fed to the analytical modelling steps in the workflows. Since in EW-Shopp we refer to this integration step as semantic enrichment, the aim of the internal event ontology is to support semantic enrichment of a dataset with event data. More generally, we can set the goal of this event ontology as supporting event-based analytics workflows in the industry.

3.2.2 Methodology

The methodology adopted to design the EW-Shopp custom event ontology is inspired by a methodology for the agile and simplified design of ontologies proposed by Silvio Peroni [8], one of the most recent methodologies proposed for ontology design. In particular, this methodology proposes a cycle consisting of the following three phases: M1) collection of domain information with the help of domain experts, definition of usage scenarios and test cases, definition of a modelet (ontology piece) based on these principles and meeting the usage requirements, definition of test cases and release of the modelet; M2) integration of the test cases with the current ontology; M3) refactoring of the current ontology. The methodology also includes several recommendations in its sub-steps: usage of a glossary (terms to be considered) for the definition of the test cases, reuse of ontology design patterns and existing ontologies, keeping the modelets and the ontology simple and close to the requirements specified in the test set, and best practices for entity names.

The work on the definition of the EW-Shopp internal event ontology has therefore been organized in the following phases (we include references to the above-mentioned methodology).

1. (M1) State of the art: a comprehensive review of the literature and available tools was already conducted in D1.1 [1]. This preliminary study allowed us to identify the recurrent patterns for modelling events and to rank ontologies by popularity and completeness. To this study we added the analysis of event descriptions in Schema.org (enclosed in this document). The outcome is that Schema.org is the most popular event ontology and the most complete according to EW-Shopp requirements. This ontology in fact provides several patterns for modelling events and related information (a guideline recommended in the adopted methodology), as discussed in Section 3.2.3.

2. (M1) Sample event data collection: the actual definition of the EW-Shopp ontology started by collecting event data samples from partners to identify the main concepts and data of interest for each partner. Samples are tables with data extracted from actual datasets and are presented in Section 3.2.4.


3. (M1) Samples schema alignment: sample tables have been compared to identify common concepts (properties for the description of events) and preliminary data type definitions (reported in Section 3.2.4).

4. (M1) Use and test cases definition: the usage of ontology-compliant event descriptions, with consequent test cases, is well defined in EW-Shopp: it consists in the enrichment of corporate data with custom event data relevant for their analysis, as defined in the business cases.

5. (M1) Definition of guidelines for the definition of the ontology: based on the review of the state of the art and on the analysis of samples of event data used by the partners, we have derived a set of guidelines that have inspired the definition of the ontology, which are reported in Section 3.2.6.

6. Ontology definition: Schema.org has been adopted as the starting ontology to define mappings where possible and add new concepts to comply with the EW-Shopp needs. The main advantage is to keep compliance with existing tools and systems that already adopt Schema.org as reference ontology. The results of this phase are discussed in Section 3.2.7. This definition phase has followed the following sub-steps:
   6.1. (M2) definition of the subset of Schema.org of interest based on the vocabulary used in the sample schemas;
   6.2. (M2-M3) for each event data source: mapping of each data schema to Schema.org and extension of the ontology with the source properties not covered by Schema.org;
   6.3. (M3) refactoring of the ontology and finalization of the first version.

The results of these phases are further described in the next subsections.

3.2.3 Schema.org Event Model

According to the definition given in the Schema.org official documentation (Schema.org event model: http://schema.org/Event), the data model used is very generic and inspired by RDF Schema (http://www.w3.org/TR/rdf-schema/). Schema.org defines:

1. a set of types arranged in a multiple inheritance hierarchy where each type may be a sub-class of multiple types.
2. a set of properties:
   I. each property may have one or more types as its domains. The property may be used for instances of any of these types.
   II. each property may have one or more types as its ranges. The value(s) of the property should be instances of at least one of these types.

The decision to allow multiple domains and ranges was purely pragmatic and related to the difficulty of specifying multiple possible domains and ranges with ontology web languages like RDFS and OWL. While the computational properties of systems with a single domain and range, like the ones based on RDFS, are easier to understand, in practice this forces the creation of a lot of artificial types, which are there purely to act as the domain/range of some properties. Alternatively, OWL supports the specification of multiple domains and ranges by specifying as a domain (or range) the disjunction of several classes or datatypes (here we will use the term type to refer to a class or a data type). However, OWL was found too complicated to be understood by a large number of practitioners. For this reason, domains and ranges are specified in Schema.org using the meta predicates domainIncludes and rangeIncludes.

In the following, when we say that a property P has domain "C or D", we indicate that P has C and D as recommended domains, i.e., that <P, domainIncludes, C> and <P, domainIncludes, D>. These specifications are not strong enough to support inference but are intended as pragmatic recommendations about the usage of the properties. This implies that there is no logical enforcing on domains and ranges in Schema.org, and that specifications of properties' domains and ranges provided in Schema.org can be overwritten or slightly changed in an ontology that reuses these properties without causing proper inconsistencies.

Finally, it should be noted that properties in Schema.org are used polymorphically, with both classes and literals as ranges (we will refer to this feature as polymorphic property usage). So it may happen that a property, e.g., schema:identifier, has Text and Product as ranges, which means that the value of such a property may be a piece of text (e.g., a value of the xsd:string datatype) or a URI identifying an instance of Product.
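The advisory nature of domainIncludes and rangeIncludes can be illustrated with a minimal Python sketch. The triples below are a small hand-written subset chosen for illustration, not loaded from the official vocabulary files, and the helper names `recommended` and `usage_notes` are hypothetical.

```python
# Illustrative subset of Schema.org-style recommendations, written by hand
# for this sketch (an assumption, not the official vocabulary dump).
TRIPLES = [
    ("schema:location", "domainIncludes", "schema:Event"),
    ("schema:location", "domainIncludes", "schema:Organization"),
    ("schema:location", "rangeIncludes", "schema:Place"),
    ("schema:location", "rangeIncludes", "schema:Text"),
]


def recommended(prop, meta_predicate):
    """Return the set of types recommended for `prop` by the given meta predicate."""
    return {o for s, p, o in TRIPLES if s == prop and p == meta_predicate}


def usage_notes(prop, subject_type, value_type):
    """Return advisory notes only; nothing is rejected, since there is no logical enforcement."""
    notes = []
    if subject_type not in recommended(prop, "domainIncludes"):
        notes.append(f"{subject_type} is not a recommended domain of {prop}")
    if value_type not in recommended(prop, "rangeIncludes"):
        notes.append(f"{value_type} is not a recommended range of {prop}")
    return notes


# "location has domain Event or Organization", in the phrasing used above.
print(sorted(recommended("schema:location", "domainIncludes")))
# A recommended usage produces no notes.
print(usage_notes("schema:location", "schema:Event", "schema:Place"))
# An unlisted subject type yields a note, not an error.
print(usage_notes("schema:location", "schema:Product", "schema:Text"))
```

An ontology that reuses such a property could simply append further domainIncludes triples, which is one way to read the remark that recommendations can be changed without causing inconsistencies.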

The canonical machine representation of Schema.org is in RDFa (https://schema.org/docs/schema_org_rdfa.html). Representations in JSON-LD, Microdata, and OWL (https://schema.org/docs/schemaorg.owl) are also available.

Schema.org was not designed to become a universal ontology. Instead, it is expected to be used alongside other vocabularies that share the basic data model and the use of underlying standards like JSON-LD, Microdata and RDFa as proposed by Schema.org. We observe that JSON-LD (particularly in combination with Schema.org) is a W3C-supported language that has gained a significant uptake among practitioners and programmers. Polymorphic usage of properties can be used with JSON-LD, which imposes fewer restrictions than RDF when used under OWL specifications. These observations will be considered in the design of the custom event ontology described in this deliverable (see Section 3.2.7).

As of today, the event model has been adopted in many (between 100,000 and 250,000) systems, as reported on the official site. These include a popular WordPress plugin that adds a complete JSON-LD based schema (structured data) to event posts, with the following features and supported plugins:

• Standard Google Event Rich Snippet Schema.

• Event detail page Event schema.

• Automatically create Event Rich Snippet for your event, no manual work.

• Work with leading Event Calendar plugins like Event Manager, All in one Event Calendar, EventOn, WP Event Aggregator, Import Facebook Events, Import Eventbrite Events, Import Meetup Events.

• Events Manager Ticket can be shown in Google Schema.

• For All In One Event Calendar by Time.ly, support for Event List, Agenda, Day, Month, Week, Posterboard, Stream. (Pro)

Maturity, flexibility and popularity of the Schema.org model are the main reasons for its adoption for developing the EW-Shopp event ontology.
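As a minimal sketch of how the Schema.org event model looks in JSON-LD, and of the polymorphic property usage discussed in this section, the Python snippet below builds two event descriptions in which `location` is once a text value and once a nested Place node. The event names, dates and coordinates are invented for illustration, and `location_label` is a hypothetical helper; only the Schema.org terms (Event, Place, GeoCoordinates, name, startDate, endDate, location, geo) come from the vocabulary itself.

```python
import json

# Hypothetical custom event whose location is a plain text value.
campaign = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Spring discount campaign",
    "startDate": "2018-04-01",
    "endDate": "2018-04-15",
    "location": "Online store",
}

# Same property, but here the value is a nested Place node.
store_opening = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Store opening",
    "startDate": "2018-05-10",
    "location": {
        "@type": "Place",
        "name": "Ljubljana showroom",
        "geo": {"@type": "GeoCoordinates", "latitude": 46.05, "longitude": 14.51},
    },
}


def location_label(event):
    """Return a readable label whether `location` is a text value or a Place node."""
    location = event.get("location")
    if isinstance(location, dict):
        return location.get("name", "")
    return location or ""


# JSON-LD is plain JSON, so the standard json module can serialize it.
print(json.dumps(store_opening, indent=2))
print(location_label(campaign), "|", location_label(store_opening))
```

A consumer that has to cope with both shapes needs a small normalisation step like `location_label`, which is the practical cost of the flexibility described above.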

3.2.4 Partners' Event Data

3.2.4.1 BC1-Pilot1-Ceneje

Table 4 reports the properties that have been found in the event data samples collected from Ceneje.

Table 4 - Ceneje event data properties

Property            Description
DateTime            Event date and time
ProductDescription  Product name
EanCode             EAN code if exists
CenejeProductId     Ceneje internal product id
CategoryId          Ceneje internal category id
SellerId            Seller id
SellerProductId     Seller product id
PriceChanged        1 or 0 if price changed compared to previous day
Price               Price
Change              Price change in %
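The relation between the Price, PriceChanged and Change properties in Table 4 can be sketched as follows. This is only one plausible reading of the property descriptions: the formula for Change (relative difference to the previous day's price, in percent) and the treatment of the first day are assumptions, and `price_change_rows` is a hypothetical helper, not part of the Ceneje data pipeline.

```python
def price_change_rows(daily_prices):
    """Derive PriceChanged and Change for each day from (date, price) pairs sorted by date."""
    rows = []
    previous = None
    for date_time, price in daily_prices:
        if previous is None or previous == 0:
            # Assumption: without a usable reference price we report "no change".
            changed, change_pct = 0, 0.0
        else:
            changed = 1 if price != previous else 0
            # Assumption: Change is the relative difference to the previous day, in percent.
            change_pct = round(100.0 * (price - previous) / previous, 2)
        rows.append({
            "DateTime": date_time,
            "Price": price,
            "PriceChanged": changed,
            "Change": change_pct,
        })
        previous = price
    return rows


# Invented sample prices for a single product over three days.
for row in price_change_rows([("2018-03-01", 100.0), ("2018-03-02", 100.0), ("2018-03-03", 90.0)]):
    print(row)
```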

3.2.4.2 BC1-Pilot2-BigBang

Table 5 reports the properties that have been found in the event data samples collected from Big Bang.

Table 5 - Big Bang event data properties

Property            Description
ProductID           Product ID
ProductDescription  Product description / model name
EANCode             EAN code
ProductGroupLevel4  Product group
ActivityID          Unique id of activity
ActivityID_new      Unique id of activity
ActivityType        Activity type according to classification
ActivityTypeDesc    Activity type description
ActivityTitle       Short activity description
ChannelID           Mkt channel id
ChannelDesc         Mkt channel description
ProductCatalogID    ID of the catalog (distributed to households)
ProductCatalogDesc  Catalog #
BeginDate           Beginning of the activity
EndDate             End of the activity
PriceDiscount       Whether model includes discount in % or lower price in EUR
Discount            Discount in %
Price               Price on the price list (selling price in action = price with discount, tax incl.)

3.2.4.3 BC1-Pilot3-Browsetel-CDE

Table 6 reports the properties proposed by CDE to describe internal events.

Table 6 - CDE event data properties

Property                    Description
ID                          Event ID
NAME                        Short description
DESCRIPTION                 Description
START_DATE                  Start date
[END_DATE]                  End date (optional)
START_TIME                  Start time
END_TIME                    End time
CLASSIFICATION_ID           ID to classification definition
QUANTITY                    Quantitative value (e.g., number of calls)
QUANTITY_UNIT_ID            Type associated with values (e.g., counting in positive integer)
PRODUCT_ID                  ID to product classification
LOCATION_ID                 ID to location definition

CLASSIFICATION_DEFINITION
CLASSIFICATION_CODE         Classification code
CLASSIFICATION_DESCRIPTION  Description

PRODUCT DEFINITION          Product ID
EAN_CODE                    EAN code
PRODUCT_DESCRIPTION         Description

LOCATION DEFINITION         Catalog #
LOCATION_NAME               Name
LOCATION_DESCRIPTION        Description
GIS_X                       Longitude
GIS_Y                       Latitude

3.2.4.4 BC3-Measurence

This case is different from the ones of other partners since Measurence is interested in capturing events associated with campaigns that are promoted as Facebook events. Therefore, the EW-Shopp model should capture the peculiar properties from the Facebook event data model. Table 7 reports the properties of interest taken from the Facebook API descriptions (Facebook event API: https://developers.facebook.com/docs/graph-api/reference/event/).

Table 7 - Facebook event data properties

Property          Description
id                The event ID
name              Event name
description       Long-form description
start_time        Start time
end_time          End time, if one has been set
interested_count  Number of people interested in the event
attending_count   Number of people attending the event
PlaceID           Event Place information

PLACE DEFINITION
name              Name
city              City
country           Country
country_code      Country code
latitude          Latitude
longitude         Longitude
region            Region
street            Street
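To give a feel for how a record with the properties of Table 7 could be expressed with Schema.org terms, the following Python sketch maps a Facebook-style event to a JSON-LD Event. The mapping is an assumption made for illustration and is not the project's own property mapping; the input shape (a flat dictionary with a hypothetical nested `place` key) follows Table 7 rather than the exact Graph API response, and the two counters are left out because no mapping for them is stated in this part of the document.

```python
import json


def facebook_event_to_jsonld(fb_event):
    """Map a dict with Table 7-style keys to a Schema.org Event description (illustrative mapping)."""
    event = {
        "@context": "https://schema.org",
        "@type": "Event",
        "identifier": fb_event["id"],
        "name": fb_event["name"],
        "startDate": fb_event["start_time"],
    }
    # Optional properties are copied only when present.
    for source_key, target_key in (("description", "description"), ("end_time", "endDate")):
        if fb_event.get(source_key):
            event[target_key] = fb_event[source_key]
    place = fb_event.get("place")  # hypothetical key holding the place definition
    if place:
        address = {
            "streetAddress": place.get("street"),
            "addressLocality": place.get("city"),
            "addressRegion": place.get("region"),
            "addressCountry": place.get("country_code"),
        }
        event["location"] = {
            "@type": "Place",
            "name": place.get("name"),
            "address": {"@type": "PostalAddress",
                        **{k: v for k, v in address.items() if v is not None}},
            "geo": {"@type": "GeoCoordinates",
                    "latitude": place.get("latitude"),
                    "longitude": place.get("longitude")},
        }
    return event


# Invented sample record.
sample = {
    "id": "1234567890",
    "name": "Summer sale kick-off",
    "start_time": "2018-06-01T18:00:00+0200",
    "place": {"name": "City centre store", "city": "Milan", "country_code": "IT",
              "latitude": 45.46, "longitude": 9.19},
}
print(json.dumps(facebook_event_to_jsonld(sample), indent=2))
```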

3.2.5 Use Cases for Interoperable Descriptions of Custom Event Data

Enrichment with custom event data has already been done with ad-hoc coding strategies for the implementation of BC1 pilot services (as developed by the companies Browsetel, Ceneje, and Big Bang). Deliverable D2.3 [4] reports on the evaluation of the capacity of the toolkit to replicate the data workflows using the tools developed in the project.

The integration of Facebook events for usage in BC3 (developed by Measurence) is ongoing, while event data used in BC4 will rely on Event Registry.

Test cases for the custom event data ontology consist in the successful development of data enrichment workflows that use representations of custom events based on the vocabulary of this ontology. In particular, the ontology has the goal of making it possible for partners to share their custom events via APIs, which return event data in JSON-LD format based on the EW-Shopp event ontology. The ontology will work as a recommendation for partners about the terminology to use for sharing and exchanging custom event data. We plan to extend the ASIA tool to consume these representations, possibly defining widgets in the ASIA GUI to fetch events similar to the widget developed for weather data and described in D2.3 [4].
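The kind of enrichment workflow described above can be sketched as a date-based join between business data rows and custom events received as JSON-LD. This is a simplified illustration under stated assumptions, not the ASIA implementation: the function `enrich`, the column names `event_count` and `event_names`, and the sample data are hypothetical, and dates are assumed to be plain ISO dates (YYYY-MM-DD) without a time component.

```python
from datetime import date


def enrich(rows, events):
    """Add flat event columns to each row whose date falls within an event's interval."""
    enriched = []
    for row in rows:
        day = date.fromisoformat(row["date"])
        active = [
            event["name"]
            for event in events
            # Assumption: an event without endDate lasts a single day.
            if date.fromisoformat(event["startDate"])
            <= day
            <= date.fromisoformat(event.get("endDate", event["startDate"]))
        ]
        # Event information ends up in ordinary table columns.
        enriched.append({**row, "event_count": len(active), "event_names": "; ".join(active)})
    return enriched


# Invented custom events, in the shape a partner API might return.
events = [
    {"@type": "Event", "name": "Spring discount campaign",
     "startDate": "2018-04-01", "endDate": "2018-04-15"},
    {"@type": "Event", "name": "Bills sent to subscribers", "startDate": "2018-04-10"},
]
# Invented business data: daily call volume.
rows = [{"date": "2018-03-31", "calls": 120}, {"date": "2018-04-10", "calls": 310}]

for row in enrich(rows, events):
    print(row)
```

The output rows are flat dictionaries, so they can be written directly to a CSV file or a table for the analytical modelling steps.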

3.2.6 Guidelines for the Design of the Ontology

Based on the goal of the event ontology, i.e., supporting event-based analytics workflows in the industry, and on the previous steps of the adopted methodology, we have drawn the following guidelines to drive the design of the ontology:


1. Harmonization re-using shared ontologies. To make the ontology valuable and extensible beyond the specific data preparation and analytics workflows supported in the project, we will try to use the terminology of existing ontologies to harmonize the terminology used to describe events.

2. Limited nesting of event descriptions. After the semantic enrichment step, event data will appear in columns of a table that contains the enriched data; as a consequence, when used in the analytical modelling steps, event descriptions are flattened into a table; the event ontology should, therefore, natively support the enrichment step.

3. Intuitive rendering of properties as table attributes in event descriptions. Because of (2), the column headers should intuitively describe the content of the column; while seeking to harmonize the terminology used to describe events, i.e., reducing the number of different terms used to describe similar properties of the events, the terminology must keep the data understandable by users who will work with them in the analytical modelling steps of the workflow. As a consequence, some of the terminology used by partners to describe their events will be preserved in the ontology.

4. Polymorphic property usage and heuristic specifications of domains and ranges. We found that the reasons that motivated the polymorphic property usage and the heuristic specification of domains and ranges, i.e., as a recommendation more than as a normative specification, also apply to the context where this event ontology is used. For example, also in this case, the event ontology would be mostly used to specify the meaning of properties used in data exchanged using the JSON-LD format. When event data appear in an enriched dataset, events will be modelled either in JSON or in a tabular format; in the first case, JSON-LD is fully JSON compliant; in the second case, ontology types will not appear while property names will be headers of the columns.
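Guidelines 2 and 3 can be illustrated with a minimal sketch of how a nested event description might be flattened into table columns. The function name `flatten_event`, the dot separator and the sample values are assumptions made for illustration; only the property names are taken from the ontology.

```python
def flatten_event(event, parent_key="", sep="."):
    """Flatten a nested event description into a single-level dict.

    Keys of nested objects are joined with `sep`, so that every resulting
    key can serve as a column header in the enriched table.
    """
    row = {}
    for key, value in event.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested descriptions (e.g. location -> address).
            row.update(flatten_event(value, column, sep))
        else:
            row[column] = value
    return row


# Hypothetical event with two levels of nesting.
event = {
    "name": "Spring sale",
    "startDate": "2018-03-01",
    "location": {
        "name": "Main store",
        "address": {"addressLocality": "Ljubljana", "addressCountry": "SI"},
    },
}

row = flatten_event(event)
```

The deeper the nesting, the longer and less intuitive the generated column headers become, which is one way to read the preference for limited nesting.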

3.2.7 The EW-Shopp Ontology Based on Schema.org

We introduce a property-driven ontology, which means that the primary goal is to harmonize the properties used to describe events. When data are collected as JSON data, JSON-LD can be used to reuse the ontology properties; when data are collected as, or factored into, a table, properties can provide the header for each column. For this reason we mostly specify the properties of the ontology, identifying a minimal number of types that are relevant because they are used as types of subjects or objects (values) for these properties.
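As a rough illustration of reusing ontology properties in JSON-LD, the sketch below serialises one custom event and derives candidate column headers from the same property names. The `ews` namespace URI (`http://example.org/ews#`) is a placeholder, since the actual namespace is not given here, and all values are invented.

```python
import json

# Prefix declarations; the "ews" URI is a placeholder, not the real namespace.
context = {
    "schema": "http://schema.org/",
    "ews": "http://example.org/ews#",
}

# A hypothetical marketing event described with ontology properties.
event = {
    "@context": context,
    "@type": "ews:MarketingEvent",
    "schema:identifier": "evt-001",
    "schema:name": "Spring sale",
    "schema:startDate": "2018-03-01",
    "schema:endDate": "2018-03-07",
    "ews:channelCode": "TV",
    "schema:location": {
        "@type": "schema:Place",
        "schema:name": "Main store",
        "schema:address": {
            "@type": "schema:PostalAddress",
            "schema:addressLocality": "Ljubljana",
            "schema:addressCountry": "SI",
        },
    },
}

# JSON-LD is plain JSON, so the standard json module is sufficient.
document = json.dumps(event, indent=2, ensure_ascii=False)

# In the tabular case the same property names can act as column headers;
# JSON-LD keywords (starting with "@") carry no data and are skipped.
headers = [key for key in event if not key.startswith("@")]
```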


Figure 1. Main types used in the custom event ontology and their mutual relations

According to the decision to adopt Schema.org as the reference ontology, we identified among the available properties those that can be mapped to the ones in use by partners. For concepts of interest that the existing properties do not represent, we introduced new properties as specializations of existing Schema.org properties, so as to keep the highest possible compliance.

Table 8 reports the properties taken from Schema.org (with the schema prefix and highlighted with an orange background), and the ones introduced by EW-Shopp (with the ews prefix and highlighted with a light orange background). In the notes, we report: references to the Schema.org types from which a property is derived (when possible), where by “derived from a type” we mean that the type is specified among the domains of the property, and, for a new property, the Schema.org property of which it is a subproperty.

The EW-Shopp Event Ontology is an ontology specified in RDF. The main types used in the ontology and their mutual relations are depicted in Figure 1. We omit from the figure the properties that have data types (e.g., integers, floats, etc.) as ranges, with the only exception of time-related information, which is crucial for event representation; other properties that have literal values or describe more detailed information, e.g., of postal addresses, are also omitted and described later in this subsection. The dark orange colour indicates types and properties specified in Schema.org, the light orange colour indicates types and properties introduced in the EW-Shopp ontology (with the “ews:” prefix), the green colour indicates data types, and the purple colour indicates the generic URI type (considered equivalent to Thing) and one type from another ontology. We omit the prefixes of all types and properties that are either reused from Schema.org or based on xsd: types (i.e., Time and DateTime).

[Figure 1 node labels: Event, Place, PostalAddress, Product, ews:MarketingEvent, URI | skos:Concept, Date | DateTime. Edge labels: location, category, address, ews:product, startDate, endDate, subClassOf.]

The ontology has the following properties:

• It is based on an extension of the Schema.org ontology.

• Like Schema.org, it uses polymorphic properties and heuristic domain/range specifications (with domainIncludes and rangeIncludes); these features make it difficult to properly depict multiple domain and range specifications in Figure 1 (where we represent multiple range specifications as single nodes with several labels separated by the “|” symbol and only report the main types used as ranges).

• The main types considered in the ontology are derived from Schema.org, where they are listed among the most frequently used types⁶. These types are:

o schema:Event, which is the type associated with all events;
o schema:Product, which is the type associated with products;
o schema:Place, which is the type associated with locations.

• Additional types used in the ontology are:

o ews:MarketingEvent, which is the only new type introduced in the ontology, and is defined as a subclass of schema:Event;
o skos:Concept, which is defined as a possible type for the property schema:category, which is introduced to associate a category with an event; the type schema:CategoryCode is pending in the Schema.org definition and has not been used so far in domains similar to the ones addressed in EW-Shopp; for this reason we reused a type (i.e., an OWL class) defined in SKOS, a W3C-recommended language to define simple categorization systems;
o schema:PostalAddress, which is used because it is the recommended value for the schema:address property that is attributed to locations (instances of schema:Place); in practice, a postal address is a placeholder used to aggregate more specific address information specified using a number of properties; leveraging the non-normative specification of domains and ranges in Schema.org, we also consider descriptions where these properties (e.g., schema:postalCode) are directly referred to places without using an instance of postal address as intermediary.

Schema.org does not provide properties to describe measures of events’ aspects, e.g., the number of attendees. We introduced several properties to describe these measures; in this case, we preferred to keep the terminology as close as possible to that used by the partners to specify these measures. However, we linked these properties to Schema.org by specifying their superproperties in Schema.org.
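The superproperty links listed in the notes of Table 8 can be read as a small hierarchy. The sketch below, which is only an illustration and not part of the ontology specification, encodes those links in a dictionary and follows them until a Schema.org property is reached; the function names are hypothetical.

```python
# Subproperty links as reported in the notes column of Table 8.
SUBPROPERTY_OF = {
    "ews:quantity": "ews:simpleMeasure",
    "ews:interestedAudience": "ews:simpleMeasure",
    "ews:attendingAudience": "ews:simpleMeasure",
    "ews:priceChange": "ews:simpleMeasure",
    "ews:priceChanged": "ews:booleanMeasure",
    "ews:simpleMeasure": "schema:value",
    "ews:booleanMeasure": "schema:value",
    "ews:quantyUnitId": "schema:identifier",
    "ews:catalogId": "schema:identifier",
    "ews:product": "schema:about",
}


def superproperty_chain(prop):
    """Return the property followed by all of its known superproperties."""
    chain = [prop]
    while chain[-1] in SUBPROPERTY_OF:
        chain.append(SUBPROPERTY_OF[chain[-1]])
    return chain


def schema_org_anchor(prop):
    """Return the first Schema.org property in the chain, or None."""
    for candidate in superproperty_chain(prop):
        if candidate.startswith("schema:"):
            return candidate
    return None


chain = superproperty_chain("ews:attendingAudience")
anchor = schema_org_anchor("ews:catalogId")
```

A consumer that only understands Schema.org could use such a lookup to fall back to the closest Schema.org property for any ews property.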

Table 8 - EW-Shopp properties

NAME | RANGE | DESCRIPTION | NOTES

EW-Shopp custom event definition (properties that describe instances of schema:Event)

schema:identifier | Text or URI | An identifier of an item | schema:Thing
schema:name | Text | The name of the item. | schema:Thing
schema:description | Text | A description of the item. | schema:Thing
ews:source | Text | A description of the source of the event |
ews:channelCode | Text | A code associated with a channel in a marketing event | ews:MarketingEvent
ews:channelDescription | Text | A description associated with a channel in a marketing event |
schema:startDate | Date or DateTime | The start date and time of the item (in ISO 8601 date format). | schema:Event
schema:endDate | Date or DateTime | The end date and time of the item (in ISO 8601 date format). | schema:Event
schema:category | URI | A category for an item | schema:Thing (subproperty of schema:about; rec. range is skos:Concept)
ews:quantity | xsd:int | A number identifying a generic quantity | Subproperty of ews:simpleMeasure
ews:quantyUnitId | URI or Text | The specification of the unit in which a quantity is measured | Subproperty of schema:identifier
ews:interestedAudience | xsd:int | The number of people interested in an event | Subproperty of ews:simpleMeasure
ews:attendingAudience | xsd:int | The number of people who plan to attend an event | Subproperty of ews:simpleMeasure
ews:priceChanged | Boolean | A measure that assigns a boolean value to specify if the price of a product has changed or not | Subproperty of ews:booleanMeasure
schema:discount | Text or Boolean | Any discount applied (to an Order) | schema:Order
ews:priceChange | xsd:float | Price change in % | Subproperty of ews:simpleMeasure
schema:price | xsd:float | The offer price of a product, or of a price component when attached to PriceSpecification and its subtypes. | schema:Offer
ews:product | URI or Product | The product the event refers to - if we are describing events about products | Subproperty of schema:about
schema:location | Place or PostalAddress or Text | The location of for example where the event is happening, an organization is located, or where an action takes place. | schema:Event
ews:simpleMeasure | xsd:float or xsd:int | A measure used to | Subproperty of schema:value
ews:booleanMeasure | xsd:float or xsd:int | A measure that assigns a boolean value | Subproperty of schema:value

EW-Shopp classification definition

schema:description | Text | A description of the item. | schema:Thing

EW-Shopp product definition (properties that describe instances of schema:Product)

schema:gtin13 | Text | The GTIN-13 code of the product, or the product to which the offer refers. This is equivalent to 13-digit ISBN codes and EAN UCC-13. | schema:Product
schema:description | Text | A description of the item. | schema:Thing
schema:seller | URI | An entity which offers (sells / leases / lends / loans) the services / goods. A seller may also be a provider. | schema:BuyAction or schema:Offer or schema:Order
schema:sku | Text | The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for a product or service, or the product to which the offer refers. | schema:Product or schema:Offer
ews:catalogId | Text | Specify the identifier | Subproperty of schema:identifier
schema:description | Text | A description of the item. | schema:Thing
schema:category | URI | Specified as subproperty of schema:about; range is skos:Concept | schema:Product or schema:Thing

EW-Shopp location definition (properties that describe instances of schema:Place and PostalAddress)

schema:name | Text | The name of the item. | schema:Thing
schema:description | Text | A description of the item. | schema:Thing
schema:addressLocality | Text | The locality. For example, Mountain View. | schema:PostalAddress
schema:addressCountry | Country or Text | The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code. | schema:PostalAddress
schema:addressCountry | Country or Text | The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code. | schema:PostalAddress
schema:latitude | Number or Text | The latitude of a location. For example 37.42242 (WGS 84). | schema:GeoCoordinates
schema:longitude | Number or Text | The longitude of a location. For example -122.08585 (WGS 84). | schema:GeoCoordinates
schema:addressRegion | Text | The region. For example, CA. | schema:PostalAddress
schema:streetAddress | Text | The street address. For example, 1600 Amphitheatre Pkwy. | schema:PostalAddress
schema:postalCode | Text | The postal code. For example, 94043. | schema:PostalAddress
schema:address | Text or PostalAddress | The address, possibly specified as a structured PostalAddress specification. | schema:Place or schema:Person or schema:Organization or schema:GeoShape or schema:GeoCoordinates

6 https://schema.org/docs/gs.html#schemaorg_types

3.2.7.1 Mapping Between EW-Shopp Ontology Properties and Partners’ Properties

Table 9 reports the mappings between the properties defined in the EW-Shopp ontology and the properties discussed in the previous section (the intended semantics of a mapping between two properties is that they represent equivalent relations). A full description of the mappings can be found in an online spreadsheet⁷.

Table 9 – Property Mappings

EW-Shopp | Partner properties (columns BT, CE, BB, ME (Facebook); listed in source order)

CUSTOM EVENT DEFINITION

schema:identifier | ID, ActivityID, id
schema:name | NAME, ActivityTitle, name
schema:description | DESCRIPTION, description
ews:source | SOURCE
ews:channelCode | ChannelID
ews:channelDescription | ChannelDesc
schema:startDate | START, DateTime, BeginDate, start_time
schema:endDate | [END], EndDate, end_time
schema:category | CLASSIFICATION_CODE, CategoryId, ActivityType
ews:quantity | QUANTITY
ews:quantyUnitId | QUANTITY_UNIT_ID
ews:interestedAudience | interested_count
ews:attendingAudience | attending_count
ews:priceChanged | PriceChanged
schema:discount | PriceDiscount
ews:priceChange | Change, Discount
schema:price | Price, Price
ews:product | PRODUCT_ID, CenejeProductId, ProductID
schema:location | LOCATION_ID, Place/LocationID
ews:simpleMeasure |
ews:booleanMeasure |

CLASSIFICATION DEFINITION

schema:description | CLASSIFICATION_DESCRIPTION, ActivityTypeDesc

PRODUCT DEFINITION

schema:gtin13 | EAN_CODE, EanCode, EANCode
schema:description | PRODUCT_DESCRIPTION, ProductDescription, ProductDescription
schema:seller | SellerId
schema:sku | SellerProductId
ews:catalogId | ProductCatalogID
schema:description | PRODUCT_DESCRIPTION, ProductDescription, ProductDescription
schema:category | ProductGroupLevel4

LOCATION DEFINITION

schema:name | LOCATION_NAME, name
schema:description | LOCATION_DESCRIPTION
schema:addressLocality | city
schema:addressCountry | country
schema:addressCountry | country_code
schema:latitude | GIS_X, latitude
schema:longitude | GIS_Y, longitude
schema:addressRegion | region
schema:streetAddress | street
schema:postalCode | zip
schema:address |

7 https://docs.google.com/spreadsheets/d/1DgaWlVJiI2ZvXT_z8B3kGx4W6XcmwWGdOw7SGL0HbK8/edit?usp=sharing
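The mappings in Table 9 can be applied mechanically by renaming the keys of a partner record. The following minimal sketch does this for a few event-level fields that we assume belong to the ME (Facebook) column, because they match Facebook event field names; the function name `harmonize` and the sample record are hypothetical.

```python
# Partner field -> EW-Shopp property, for fields assumed to be in the
# ME (Facebook) column of Table 9.
ME_TO_EWS = {
    "id": "schema:identifier",
    "name": "schema:name",
    "description": "schema:description",
    "start_time": "schema:startDate",
    "end_time": "schema:endDate",
    "interested_count": "ews:interestedAudience",
    "attending_count": "ews:attendingAudience",
}


def harmonize(record, mapping):
    """Rename mapped keys to ontology properties; keep unmapped keys as-is."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical partner record.
partner_event = {
    "id": "1234",
    "name": "Open-air concert",
    "start_time": "2018-07-01T20:00:00",
    "attending_count": 120,
    "ticket_uri": "https://example.org/tickets",  # not covered by the mapping
}

harmonized = harmonize(partner_event, ME_TO_EWS)
```

Keeping unmapped keys unchanged is a design choice of this sketch; a stricter workflow might instead drop or report them.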

3.2.7.2 Encoding and Data Formats

All data and, in particular, textual data will be represented using the Unicode UTF-8 character encoding to support interoperability across languages at the alphabet level.

Chapter 4 Weather Data

Weather is a major factor in the environmental context of the shopper’s journey. To analyse and model its effects on shopper behaviour, the project has an agreement with the European Centre for Medium-Range Weather Forecasts⁸ (ECMWF) to access its MARS⁹ weather data archive. As this is their operational archive, the project can obtain historic weather state data, historic weather forecast data as well as current weather forecast data. A wrapper API around the ECMWF API was developed for project purposes. It was presented in deliverable D1.3 [3] and has been used for the development of all the project pilots.

8 https://www.ecmwf.int/
9 Meteorological Archival and Retrieval System

4.1 Updates Since D1.3

ECMWF provides an API in Python for access to its weather data archive. It is intended and developed for use in meteorological institutions, which means it is not well suited for business use. For example, it uses a set of internal codes to denote the individual weather parameters to access (temperature, pressure, etc.) and it retrieves the data in GRIB, a binary format commonly used in meteorology.

To enable the use of weather data in project workflows, a wrapper API was developed in Python, which streamlines the retrieval of business-related weather parameters (i.e. those that may reasonably be expected to influence shopper behaviour). For example, it retrieves the relative humidity of the air but skips the temperature of lake water. The API also supports aggregation over time and geographical regions.
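To make the idea of aggregation over time concrete, the sketch below averages sub-daily readings into daily values. It deliberately does not use the wrapper API, whose signatures are not reproduced here; the function name `daily_mean` and the readings are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean(readings):
    """Aggregate (ISO 8601 timestamp, value) pairs into a mean per day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        by_day[day].append(value)
    # Sort by day so the result is stable regardless of input order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical temperature readings in degrees Celsius.
readings = [
    ("2018-06-01T00:00:00", 14.0),
    ("2018-06-01T12:00:00", 22.0),
    ("2018-06-02T00:00:00", 15.0),
    ("2018-06-02T12:00:00", 25.0),
]

daily = daily_mean(readings)
```

Aggregation over a geographical region could follow the same pattern, grouping by a region key instead of by day.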

The API is available in a public GitHub repository¹⁰. Documentation with examples of use is provided on its wiki page¹¹. Since it was necessary for the development of the pilots, its development was prioritized early in the project and it was in a very mature state when it was described in D1.3 [3]. In the past year, there were few to no functional updates to it, only bug fixes and stability improvements.

4.2 Alternative Sources of Weather Data

The agreement the project has with ECMWF to access their data is for research purposes and will end with the project. To use weather-based analytics in the toolkit after the project, an appropriate source of weather data will need to be found. Here we provide a set of options collected after a survey of the field.

ECMWF: The simplest option would be to reach a new agreement with ECMWF and continue using their data. Some appropriate level of compensation would of course need to be negotiated. The main consumers of ECMWF data are research institutions and weather forecast agencies, whose needs differ from those of the workflows developed in the project. In our early talks with ECMWF they were open to the idea of exploring new possibilities for exploiting their data. The project will reopen this dialogue and present the results collected from the pilots in order to jointly examine possibilities for future collaboration.

National weather forecast agencies: Most European countries have their own national weather agencies that produce local forecasts, conduct meteorological research and provide weather forecast data to other institutions in the country. Most of these countries are also member or cooperating states in ECMWF and share data with it. They are a good option for providing weather data, as the format would likely be very close to that of ECMWF and little modification would be needed. On the downside, an individual agreement would need to be made with the agency of each country to be covered.

10 https://github.com/JozefStefanInstitute/weather-data
11 https://github.com/JozefStefanInstitute/weather-data/wiki

OpenWeatherMap¹²: OpenWeatherMap is a service inspired by the well-known OpenStreetMap¹³ project that provides access to global weather data – both historic data and forecasts – via a REST API. They offer a free-tier service that allows 60 requests per minute for data on either the current weather state or a forecast in 3-hour steps for the next 5 days, which can reasonably cover the prediction needs of several pilots. They also offer commercial user accounts with more requests and daily predictions for up to 16 days ahead. Access to historic data for the past 5 years is also available at a cost, so they can also provide bulk learning data. The weather parameters they offer correspond to a large degree to the ones used in the project, so the service may be a good match even for models built on ECMWF data.
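As an indication of how light-weight such a REST service is to call, the sketch below only builds the request URL for the 5-day forecast in 3-hour steps. The endpoint path and parameter names reflect our understanding of the OpenWeatherMap public documentation and should be checked against its current version; the API key and coordinates are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the 5-day / 3-hour forecast service.
BASE_URL = "https://api.openweathermap.org/data/2.5/forecast"


def forecast_url(lat, lon, api_key, units="metric"):
    """Build the forecast request URL for a pair of geo-coordinates."""
    query = urlencode({"lat": lat, "lon": lon, "units": units, "appid": api_key})
    return f"{BASE_URL}?{query}"


# Placeholder key and approximate coordinates, for illustration only.
url = forecast_url(46.05, 14.51, "YOUR_API_KEY")
```

With the free tier limited to 60 requests per minute, a client covering many locations would additionally need to space its calls at least one second apart on average.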

Weather Underground¹⁴: This service is a subsidiary of the IBM-owned The Weather Company. They offer a subscription-based weather data API where it is possible to obtain global weather state and forecast data for up to 15 days ahead. At an extra cost, historic data from July 2011 is also available.

Dark Sky¹⁵: Like Weather Underground, they offer a subscription-based weather data API for weather state, forecasts for up to 7 days ahead and historic data going back decades.

AccuWeather¹⁶: Another well-known service offering a subscription-based API. Forecast data is available for 15 days ahead. Some historic weather data is also available as a separate expense, but it is unclear how far back it goes.

12 https://openweathermap.org/
13 https://www.openstreetmap.org/
14 https://www.wunderground.com/
15 https://darksky.net/dev
16 https://developer.accuweather.com/

Chapter 5 Multilingual Data Linking Services

The tools for multilingual data linking ensure the interoperability of the EW-Shopp toolkit across different languages. Modern e-commerce businesses typically operate across diverse geographical regions. In order to leverage data and insights across language borders, some means of interlinking this data must be provided. This chapter covers the services developed within EW-Shopp to support such interlinking. As with the previous data services described in this deliverable, they were developed early to support deployment of the pilots and were already described in deliverable D1.3 [3]. Here we focus on updates made since then. We also describe the keyword clustering task that emerged during development of the JOT business case pilot.

5.1 Updates Since D1.3

Multilinguality is covered in EW-Shopp by supporting cross-lingual linking with the ASIA data enrichment tool. We refer to deliverable D3.2 [5] (Chapter 3) for details about data linking in ASIA.

ASIA supports cross-lingual instance-level annotations by plugging in cross-lingual reconciliation services. Cross-lingual reconciliation services are based on multilingual indexes for the reference data (the data used for reconciliation). The services covering multilinguality that are available for ASIA are:

• Wikifier, which covers Wikipedia entities in 130 languages (described above in the deliverable) – see deliverable D1.3 [3] for details.

• GeoNames, which covers labels of spatial entities in a large variety of languages. The covered languages change from entity to entity, but usually include the local language of a toponym. This means that a data provider that provides data about companies for a given jurisdiction where toponyms are named using the local language would be able to reconcile the toponyms against GeoNames.

• Wikidata, which provides different reconciliation services (one per language) to import as needed.

Wikidata and Wikifier are currently not used in EW-Shopp data enrichment workflows, but Wikifier is used by the Event Registry. The key multilingual data reconciliation service for EW-Shopp data enrichment workflows is GeoNames, which helps reconcile location toponyms to the GeoNames knowledge base, where locations are associated with geo-coordinates, which, in turn, can be used to fetch weather data from the MARS APIs.
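The reconciliation step can be pictured as a lookup in a multilingual label index that returns one entity with its coordinates, whichever language the toponym is written in. The sketch below uses a tiny in-memory index in place of GeoNames; the entity keys, labels and approximate coordinates are illustrative assumptions, and the function name `reconcile` is hypothetical.

```python
# Toy stand-in for a multilingual gazetteer (not real GeoNames records).
PLACES = {
    "vienna": {"labels": ["Vienna", "Wien", "Dunaj"], "lat": 48.21, "lon": 16.37},
    "munich": {
        "labels": ["Munich", "München", "Monaco di Baviera"],
        "lat": 48.14,
        "lon": 11.58,
    },
}

# Multilingual index: every label, case-folded, points to its entity key.
INDEX = {
    label.casefold(): key
    for key, place in PLACES.items()
    for label in place["labels"]
}


def reconcile(toponym):
    """Return (entity key, latitude, longitude) for a toponym, or None."""
    key = INDEX.get(toponym.strip().casefold())
    if key is None:
        return None
    place = PLACES[key]
    return key, place["lat"], place["lon"]


match = reconcile("Wien")
```

The returned coordinates are what a subsequent enrichment step would pass on to the weather data service.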

Finally, we also remark that all the data linking services will use the Unicode UTF-8 character encoding to support interoperability across languages at the alphabet level.

5.2 Handling Keywords in the JOT Dataset

The task in the JOT business case is to analyse and predict the dynamics of impressions of Google keywords in target regions based on environmental and social context, i.e. weather and events. Their pilot is unique among all project pilots in the sheer volume of text data (for details see deliverables D4.1 [6] and D4.2 [7]). The JOT pilot dataset contains information about the time and regional distribution of millions of Google keywords. So far, the approach in the project has been to model them individually, but pilot experiments have shown that this does not scale to the volume of data involved.

A more feasible and practical approach would be to model groups of related keywords together. This would ensure better quality of data and a stronger signal for the models, as well as reduce the number of models needed. Indeed, the original idea of the pilot was to use Google keyword categories, but unfortunately it was not possible to obtain them below the top level, which is too coarse.

To overcome this, a method for clustering the keywords together based on their semantics is needed. A state-of-the-art approach for this is to use word embeddings – mappings of individual words into vectors of real numbers. This reduces the dimensionality of the word space from the number of all words to some fixed, much smaller number. By using a large corpus of text, the embedding can be built from data in such a way that vectors of semantically related words are closer together in the embedding space than those of unrelated words.
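The notion of related words being closer together in the embedding space can be sketched with cosine similarity and a simple greedy grouping. The three-dimensional vectors below are made up for illustration (real embeddings have hundreds of dimensions), and the greedy threshold clustering is only one possible choice, not necessarily the method used in the project.

```python
import math

# Toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "umbrella": [0.9, 0.1, 0.0],
    "raincoat": [0.8, 0.2, 0.1],
    "sunscreen": [0.1, 0.9, 0.1],
    "sunglasses": [0.0, 0.8, 0.2],
}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(words, threshold=0.8):
    """Greedy clustering: join the first cluster whose first member is
    similar enough, otherwise start a new cluster."""
    clusters = []
    for word in words:
        for group in clusters:
            if cosine(EMBEDDINGS[word], EMBEDDINGS[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            clusters.append([word])
    return clusters


groups = cluster(["umbrella", "raincoat", "sunscreen", "sunglasses"])
```

Each resulting group could then be modelled as a single unit instead of one model per keyword.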

An advantage of this approach is that the word embeddings can be pre-built and then reused for several problems. Since computing them is a very intensive batch job that demands a lot of data and hardware resources, some institutions have also started offering free, open-source embeddings for public use. For clustering of Google keywords, the embeddings released with the fastText¹⁷ text classification library developed by Facebook [9] are well suited. They offer embeddings for 157 languages [10], including Spanish and German, which are relevant for the JOT case. The embeddings are built on a combination of Common Crawl and Wikipedia data (for details see [10]), which are both large and well-curated datasets, promising good quality embeddings.

One final non-trivial technical detail is how to use the word embeddings to represent the Google keywords. Though Google uses the term “keywords”, they are in fact multi-word phrases. This means we need a way to represent the entire keyword using the embeddings of its words. Our plan is to use the state-of-the-art approach described in [11], where a weighted average of the word vectors is computed and then corrected using a dimensionality reduction technique.
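A minimal sketch of the idea in [11], as we understand it, is given below: each word vector is weighted by a / (a + p(w)), where p(w) is the word's estimated frequency, the weighted vectors are averaged per phrase, and the projection onto the first principal direction of all phrase vectors is removed. The toy vectors, the word probabilities and the value of `a` are assumptions, and power iteration is used here only to avoid external dependencies where a library SVD would normally be used.

```python
import math

# Toy word vectors and word probabilities, invented for illustration.
VECTORS = {
    "cheap": [0.5, 0.5, 0.1],
    "rain": [0.9, 0.1, 0.0],
    "umbrella": [0.8, 0.2, 0.1],
    "sunscreen": [0.1, 0.9, 0.1],
}
WORD_PROB = {"cheap": 0.01, "rain": 0.002, "umbrella": 0.0005, "sunscreen": 0.0005}


def weighted_average(phrase, vectors, word_prob, a=1e-3):
    """Average the word vectors, down-weighting frequent words."""
    words = [w for w in phrase.split() if w in vectors]
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for w in words:
        weight = a / (a + word_prob[w])
        for i, x in enumerate(vectors[w]):
            total[i] += weight * x
    return [x / len(words) for x in total] if words else total


def first_component(rows, iterations=100):
    """Approximate the first principal direction by power iteration."""
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        proj = [sum(r[i] * u[i] for i in range(dim)) for r in rows]
        nxt = [sum(p * r[i] for p, r in zip(proj, rows)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]
    return u


def phrase_embeddings(phrases, vectors, word_prob, a=1e-3):
    """Weighted averages with the common component removed."""
    rows = [weighted_average(p, vectors, word_prob, a) for p in phrases]
    u = first_component(rows)
    result = []
    for r in rows:
        p = sum(x * y for x, y in zip(r, u))
        result.append([x - p * y for x, y in zip(r, u)])
    return result


phrases = ["cheap umbrella", "rain umbrella", "cheap sunscreen"]
embedded = phrase_embeddings(phrases, VECTORS, WORD_PROB)
```

The resulting phrase vectors could then be fed to a clustering step based on a similarity measure such as cosine similarity.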

This section presents the planned approach for handling the large amount of Google keywords in the JOT business case to enable scalable analytics. At the time of writing, the approach is being implemented¹⁸. At this point it is unclear whether the embedding-based clustering functionality should be offered as part of the EW-Shopp toolkit or remain a case-specific pre-processing step. That depends at least in part on the final performance of this approach for the JOT case as well as the effort needed to incorporate it into the toolkit environment.

17 https://fasttext.cc/
18 The need to cluster similar keywords emerged during the development of JOT pilot services, as documented in D4.2.

Chapter 6 Conclusion

This deliverable presented the event, weather and multilingual data services developed and used in the EW-Shopp project. These services were first introduced in the earlier deliverable D1.3 [3], where their specifications and APIs are already described. This document focuses on the updates and extensions to the services based on experience from the development and deployment of the business case pilots.


Of all the extensions described, the largest is the development of the ontology for custom events in Section 3.2. This ontology provides businesses using the EW-Shopp toolkit with the means to describe any events impacting their business dynamics that are not captured by the Event Registry data source.

Some aspects of the data services remain open. For weather data it is unclear which of the possible alternative data sources listed in Section 4.2 is the best. It is possible that different sources may suit different businesses. Also, the keyword clustering tool for the JOT data described in Section 5.2 is still under development. Though the methodological details of the approach are clear, its technical implementation is still emerging and, with it, its role in the EW-Shopp toolkit. These issues remain a topic of ongoing work and dialogue with the business partners and will be revisited in following deliverables.

References

[1] D1.1: Interoperability Requirements Specification

[2] D1.2: Spatial, Temporal and Product Data Format Specification

[3] D1.3: Event, Weather and Multilingual Data Services Specification

[4] D2.3: EW-Shopp Platform Evaluation Assessment

[5] D3.2: EW-Shopp Components as a Service: Transformation, Linking and Analytics

[6] D4.1: Business Case Requirements

[7] D4.2: Pilots Deployment

[8] Peroni, S., 2016. A simplified agile methodology for ontology development. In OWL: Experiences and Directions – Reasoner Evaluation (pp. 55-69). Springer, Cham.

[9] Joulin, A., Grave, E., Bojanowski, P. and Mikolov, T., 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.

[10] Grave, E., Bojanowski, P., Gupta, P., Joulin, A. and Mikolov, T., 2018. Learning word vectors for 157 languages. arXiv preprint arXiv:1802.06893.

[11] Arora, S., Liang, Y. and Ma, T., 2016. A simple but tough-to-beat baseline for sentence embeddings.