D1.3 - Event, Weather and Multilingual Data Services Specification

Deliverable n: 1.3
Date: 21 December 2017
Status: Final
Version: 1.2
Authors: Aljaž Košmerlj (JSI), Flavio De Paoli (UNIMIB)
Contributors: JSI, UNIMIB
Reviewers: Olga Melnyk (ME), Matej Žvan (Browsetel), Matjaž Dolenc (CDE)
Distribution: Public
Grant n. 732590 - H2020-ICT-2016-2017/H2020-ICT-2016-1




EW-Shopp GA number: 732590 H2020-ICT-2016-2017/H2020-ICT-2016-1


History of Changes

Version   Date         Description                                        Revised by
0.1       24/10/2017   Tentative Table of Contents                        Aljaž Košmerlj
0.9       19/12/2017   All content except for Section 4.3                 Aljaž Košmerlj
1.0       20/12/2017   Section 4.3 on ASIA                                Flavio De Paoli
1.1       21/12/2017   Implemented the comments of internal reviewers     Aljaž Košmerlj


Executive Summary

In this deliverable we provide the specification for the event, weather and multilingual data services. These services supply contextual information to the business data provided by the project business partners as well as the cross-lingual linking of datasets in different languages. The requirements for the services based on business needs are presented and the APIs addressing them are outlined.

The deliverable is based on requirements collected in deliverable D1.1 and data format specifications in deliverable D1.2. The business requirements for the described services are based on business case descriptions from deliverable D4.1.


Table of contents

History of Changes
Executive Summary
List of figures
List of tables
Chapter 1 Introduction
    1.1 Relationship to Other Deliverables
    1.2 Abbreviations and Acronyms
    1.3 Document Structure
Chapter 2 Event Data
    2.1 Data Source
    2.2 Project Requirements
    2.3 Event Data API
        2.3.1 API
        2.3.2 Data format
Chapter 3 Weather Data
    3.1 Data Source
    3.2 Project Requirements
    3.3 Weather Data API
        3.3.1 API
        3.3.2 Data Format
Chapter 4 Multilingual Data Linking Services
    4.1 Project Requirements
    4.2 Wikifier
    4.3 ASIA
    4.4 XLing
Chapter 5 Conclusion
References
Appendix
    5.1 Event Registry Data model
    5.2 Weather Data Attributes
    5.3 Wikifier Annotation Format

List of figures

Figure 1: Event Registry (http://eventregistry.org/) online graphical user interface displaying the advanced search options

List of tables

Table 1. Abbreviations and acronyms
Table 2. Short references for project partners
Table 3: Summary of the tools in the EW-Shopp platform
Table 4: List of weather attributes supported by the EW-Shopp weather API

Chapter 1 Introduction

One of the main aims of the EW-Shopp project is to contextualize business data with social and environmental factors. These are represented as events reported in the global media and as weather data. Both intuitively and in the experience of the project business partners, these factors have a strong influence on consumer behaviour. Analysing these influences for insights and using them to build predictive models can offer a competitive advantage on the market.

Modern businesses also operate on several different markets and commonly handle data in several different languages. To enable efficient management and integration of such multilingual data, a set of tools is needed for cross-lingual linking and annotation.

This deliverable provides the specifications for accessing the contextual data sources. Event Registry, a platform for real-time global media monitoring and analysis, is used as the data source for event data, and the data source for weather data is the operational archive of the European Centre for Medium-Range Weather Forecasts. The specification for the cross-lingual annotation services developed by project technical partners is also provided to address the challenges of managing multilingual datasets.

Whenever possible, the specifications are presented using the current state of the service APIs and examples of their use. Most of the services and APIs described in this document are still in active development and may evolve in the coming months. Development and deployment of project pilot services is especially likely to have an impact. All subsequent changes to these services will be reported in follow-up deliverables.

1.1 Relationship to Other Deliverables

This deliverable specifies data services used in the EW-Shopp project. The data formats used by the services are specified by deliverable D1.2 [2] and follow the interoperability requirements specified in deliverable D1.1 [1]. The functionalities of the described data services were designed following the business case requirements and pilot specifications from deliverable D4.1 [3].

1.2 Abbreviations and Acronyms

Abbreviations and acronyms used in the document are explained in Table 1.

Table 1. Abbreviations and acronyms

Abbreviation   Description
API            Application Programming Interface
BC             Business Case
CSV            Comma Separated Values
EAN            European Article Number
EC             European Commission
ECMWF          European Centre for Medium-Range Weather Forecasts
EU             European Union
FTP            File Transfer Protocol
SFTP           Secure File Transfer Protocol
HTTP           Hypertext Transfer Protocol
ID             Identifier
JSON           JavaScript Object Notation
KG             Knowledge Graph
LOD            Linked Open Data
RDF            Resource Description Framework
REST           Representational State Transfer (web services)
URI            Uniform Resource Identifier
URL            Uniform Resource Locator

Table 2 shows the project partners along with their short references for easier mentions throughout the document.

Table 2. Short references for project partners

No.   Beneficiary (partner) name as in [GA]                        Short reference
1     UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA                     UNIMIB
2     CENEJE DRUZBA ZA TRGOVINO IN POSLOVNO SVETOVANJE DOO         CE
3     BROWSETEL (UK) LIMITED                                       BT
4     GfK EURISKO SRL                                              GfK
5     BIG BANG, TRGOVINA IN STORITVE, DOO                          BB
6     MEASURENCE LIMITED                                           ME
7     JOT INTERNET MEDIA ESPAÑA SL                                 JOT
8     ENGINEERING – INGEGNERIA INFORMATICA SPA                     ENG
9     STIFTELSEN SINTEF                                            SINTEF
10    INSTITUT JOZEF STEFAN                                        JSI

Finally, Table 3 contains a summary of the tools and components which comprise the EW-Shopp platform.

Table 3: Summary of the tools in the EW-Shopp platform.

Component Name   Short description

DataGraft        DataGraft is a cloud-based platform for data hosting and interactive data transformations. In the platform it has the role of the data wrangler component together with Grafterizer, its data transformation interface. It is developed and maintained by SINTEF.

ASIA             A tool for the semantic enrichment of data available in tabular formats. It is supported by ABSTAT, a tool to profile knowledge graphs represented in RDF based on linked data summarization mechanisms. It is included as a plugin in DataGraft and is developed and maintained by UNIMIB.

QMiner           QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data. In the platform it has the role of the data analyser component. It is developed and maintained by JSI.

Knowage          Knowage is a business intelligence suite with strong support for producing high-quality reports of the transformed, enriched and analyzed information obtained from the platform. In the platform it has the role of the data reporting component. It is developed and maintained by ENG.

1.3 Document Structure

The document is organized as follows. Chapter 1 has introduced the topic of the deliverable, namely the specification of event, weather and multilingual data services, and has placed this document among its related deliverables. In Chapter 2 the event data service based on the Event Registry platform is specified. Chapter 3 describes the weather data service and the API developed for its use. Finally, in Chapter 4 the three tools comprising the multilingual data service are outlined.


Chapter 2 Event Data

In order to measure the effect of social context on consumer behaviour, a reliable source of events from the entire world is needed. In EW-Shopp this role is fulfilled by Event Registry¹, a platform for real-time monitoring of global news. This chapter describes Event Registry, outlines the event data requirements in the project and specifies the event data API for use within EW-Shopp.

2.1 Data Source

Event Registry is a platform for real-time monitoring and analysis of global news. Its news feed service collects on average 175,000 news articles daily from over 26,000 news sources. The collected articles come from all over the world and are in 15 different languages. The sources include major global news outlets such as CNN or BBC, international news agencies such as Reuters or Associated Press, as well as smaller local news publishers. All the news items are collected from the publishers’ RSS feeds and are stored together in the Event Registry database. Event Registry has been running continuously since December 2013 and has amassed over 209 million articles and over 7 million events.

All articles are processed with a lexical and semantic analysis pipeline and are then clustered together according to their content. All articles in a cluster discuss the same real-world event. In this document we use the term ‘event’ interchangeably for the occurrence and the article cluster describing it.

Semantic processing in Event Registry includes semantic annotation using the Wikifier service, which is described in Section 4.2. This means entity mentions in all articles are linked to their concepts, denoted by the URLs of the Wikipedia pages describing them. All articles are also categorized into a three-level category taxonomy which consists of the top three levels of the DMOZ taxonomy². Concept data is aggregated for events and can be further aggregated for any set of given events during analysis.

For all events the date of the event and the location of the event are determined. Note that the date of the event is not necessarily the same as the date when the articles were published. Event Registry analyses the dates mentioned in article content and determines the most likely date when the event discussed in the article occurred. Only if none of the mentions is reliably deemed to be correct is the median publishing date of the articles in the event taken. A similar approach with location mentions is taken when determining the event location, with the difference that the location is left empty if it cannot be determined from the content. The location can be a city and/or a country and is represented using its GeoNames³ URI. Details about the representation of location data can be found in deliverable D1.2 [2].

¹ http://eventregistry.org/
² http://dmoztools.net/
³ http://www.geonames.org/
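The event date rule described in Section 2.1 (prefer a reliably mentioned date, otherwise fall back to the median publishing date) can be sketched as follows. This is only an illustration and not Event Registry's actual implementation: the function name, the use of the most frequent mention as the "most likely" date and the `min_share` reliability threshold are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from typing import Optional, Sequence


def resolve_event_date(
    mentioned_dates: Sequence[date],
    publishing_dates: Sequence[date],
    min_share: float = 0.5,
) -> Optional[date]:
    """Pick an event date from date mentions, else use the median publishing date.

    `min_share` is a hypothetical reliability threshold: the most frequently
    mentioned date is accepted only if it accounts for at least this share
    of all date mentions in the cluster.
    """
    if mentioned_dates:
        candidate, count = Counter(mentioned_dates).most_common(1)[0]
        if count / len(mentioned_dates) >= min_share:
            return candidate
    if not publishing_dates:
        return None
    ordered = sorted(publishing_dates)
    # Lower median, so the result is always one of the actual publishing dates.
    return ordered[(len(ordered) - 1) // 2]


# Made-up sample values, for illustration only.
published = [date(2017, 12, 2), date(2017, 12, 3), date(2017, 12, 4)]
agreed = resolve_event_date(
    [date(2017, 12, 1), date(2017, 12, 1), date(2017, 11, 30)], published
)
fallback = resolve_event_date(
    [date(2017, 11, 28), date(2017, 11, 29), date(2017, 11, 30)], published
)
```

In the first call two of the three mentions agree, so the mentioned date is used; in the second call no mention reaches the threshold and the median publishing date is returned instead.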


Figure 1: Event Registry (http://eventregistry.org/) online graphical user interface displaying the advanced search options.

All the listed event information can be used to query for events and articles in Event Registry. This can be performed using the online graphical interface (shown in Figure 1) with a number of rich visualization options. An alternative is the official Event Registry API for Python⁴ and JavaScript⁵. Both APIs support querying events and articles through any of their properties and obtaining the result in JSON format.

⁴ https://github.com/EventRegistry/event-registry-python
⁵ https://github.com/EventRegistry/event-registry-node-js

2.2 Project Requirements

Event data provides the social context for the business processes we are analysing in the project. Examples of this, based on experience from the project business partners, are:

• After highly publicised product releases from major electronics companies (e.g. Apple, Samsung, Huawei) there is increased interest in the released product and the brand to which it belongs. There is a spike in sales once the product actually reaches the stores.

• Certain major sports events have a positive effect on sales of select items. For example, the world championship in football is accompanied by an increase in sales of television sets.

• On the other hand, popular sports events can have a negative effect on web traffic. High-profile games attract people into stadiums or in front of TVs and away from active web browsing. This can have a significant impact on the level of impressions and click-through rates of online marketing campaigns. This effect can be even more pronounced if the local team is playing.

Based on such experience the business partners have formed the business case requirements described in deliverable D4.1 [3]. From those we can derive the event data requirements, which can be summarized in the following list. The event data service needs to support:

1.) querying for events from a given time range;
2.) querying for events from a given geographical area;
3.) querying for events from a given category;
4.) querying for events related to a given concept or keyword;
5.) aggregation of event data for a set of events;
6.) any combination of the previous requirements.
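To make the six requirements concrete, the sketch below applies them to a small in-memory list of events. It does not use the Event Registry API: the `Event` record, its field names and the sample values are hypothetical simplifications chosen only to show how the filters combine and how concept data can be aggregated over the selected events.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event record (not the Event Registry data model)."""
    uri: str
    event_date: date
    country: Optional[str]       # location may be empty, see Section 2.1
    categories: FrozenSet[str]   # taxonomy paths such as "Sports/Football"
    concepts: FrozenSet[str]     # concept identifiers


def filter_events(
    events: Iterable[Event],
    *,
    date_start: Optional[date] = None,
    date_end: Optional[date] = None,
    country: Optional[str] = None,
    category: Optional[str] = None,
    concept: Optional[str] = None,
) -> List[Event]:
    """Requirements 1-4 and 6: every filter is optional and all of them combine."""
    selected = []
    for ev in events:
        if date_start is not None and ev.event_date < date_start:
            continue
        if date_end is not None and ev.event_date > date_end:
            continue
        if country is not None and ev.country != country:
            continue
        # A category matches if any of the event's taxonomy paths starts with it.
        if category is not None and not any(c.startswith(category) for c in ev.categories):
            continue
        if concept is not None and concept not in ev.concepts:
            continue
        selected.append(ev)
    return selected


def aggregate_concepts(events: Iterable[Event]) -> Counter:
    """Requirement 5: count how often each concept occurs over a set of events."""
    counts: Counter = Counter()
    for ev in events:
        counts.update(ev.concepts)
    return counts


# Made-up sample records, for illustration only.
EVENTS = [
    Event("e1", date(2017, 9, 12), "A", frozenset({"Business/Electronics"}),
          frozenset({"PhoneMaker", "NewPhone"})),
    Event("e2", date(2017, 9, 15), "B", frozenset({"Sports/Football"}),
          frozenset({"WorldCup"})),
    Event("e3", date(2017, 11, 3), "A", frozenset({"Business/Electronics"}),
          frozenset({"PhoneMaker"})),
]

september_business = filter_events(
    EVENTS, date_start=date(2017, 9, 1), date_end=date(2017, 9, 30), category="Business"
)
concepts_in_a = aggregate_concepts(filter_events(EVENTS, country="A"))
```

Keeping each criterion optional is what makes requirement 6 (any combination) fall out naturally: a query is just the subset of filters that were set.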

The exact mechanism of integrating event data with other datasets based on temporal and spatial information is covered in deliverable D1.2 [2]. Precise matching may not always be possible for either temporal or spatial matches: event time is only determined at the level of the date, not the hour, and event location is determined at most at city level. It is worth noting that this is not an issue. Most relevant events, such as those in the examples at the beginning of this section, have an effect with a wide scope. By this we mean that their effect lasts for a significant time period and affects a wider geographical region. Thus it will be captured even with a coarser matching approach.

This requirements list was built before the deployment of pilots and is based on the understanding of the business cases by the business partners and the expertise of the technical partners. The requirements may evolve over time as this understanding deepens and the business hypotheses are tested in the pilots.

Finally, it is important to note that Event Registry only lists events reported in the media. There are types of events with influence on consumers that are not commonly covered by news outlets. These include occurrences perhaps thought too banal to be newsworthy, such as the date when cars are legally obliged to switch to winter tires in countries with cold winters. Also not included are any kind of company internal events such as marketing campaigns, service updates, competitions for prizes etc. Should business partners have access to data regarding any such events, they have to be integrated through the platform as a separate data source. Efforts will be made during development of pilots to build a data model for such custom events that will cover the needs of all business cases.


2.3 Event Data API

After review of the requirements listed in Section 2.2, we conclude that the Event Registry API fulfils all of them and no major extension is needed at this point. It is likely that some extensions will become necessary during development of the pilots, especially for streamlined filtering of events relevant to pilot domains. Since the extent and nature of those is unclear at this time, they will be reported in follow-up documentation. The details of the API are presented in this section.

2.3.1 API

Event Registry has an API for Python and JavaScript. Though the language is different, both APIs follow the same structure and in the background use the same REST API to perform all the online queries. It is also possible to use the REST API directly by crafting the HTTP requests by hand. However, that would involve a lot more technical work and offer no advantage. Since the two APIs are so closely related, we refer to them in the singular in the rest of the text and the description holds for both.

The API uses JSON as the format for both the requests sent to the Event Registry service as well as the event data returned. The request object holds the query parameters such as the event time range or location, article language etc. The request object is then transmitted to the Event Registry server which computes the response and returns the data. A detailed overview of the API functions is beyond the scope of this deliverable and can be found online⁶. The main functionality can be summarized into the following operations:

• Searching for events/articles – In this operation Event Registry is queried for a list of events or articles that fall within the bounds of the request parameters such as the time range, category or relevance to some concept. The data returned is the list of events or articles with a limited amount of basic data fields such as the event or article title. Besides these, aggregated information over all the events or articles can be requested, such as an aggregation of relevant concepts or categories over all events or articles returned by the query.

• Obtaining information about events/articles – This operation is used to obtain detailed information about a set of given Event Registry events or articles. This list most commonly comes from a previously executed search operation. All the detailed information that Event Registry holds about specific events or articles can be obtained using this operation.

• Matching keywords to appropriate concepts – Event Registry supports queries using normal keywords, similar to any web search engine. However, searching using concept annotation is preferred, since it takes advantage of the entity disambiguation performed during annotation. This operation obtains the appropriate concepts (represented by their Wikipedia URIs) for given keywords.
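As an illustration of the JSON request object described above, the following sketch serialises only those query parameters that were actually set. The function and all field names (`dateStart`, `conceptUri`, `aggregations` and so on) are hypothetical placeholders and are not the parameter names of the Event Registry REST API; the online documentation remains the reference for the real ones.

```python
import json
from typing import Optional


def build_search_request(
    *,
    date_start: Optional[str] = None,
    date_end: Optional[str] = None,
    location_uri: Optional[str] = None,
    category_uri: Optional[str] = None,
    concept_uri: Optional[str] = None,
    language: Optional[str] = None,
    aggregate_concepts: bool = False,
) -> str:
    """Build a JSON request body for a search operation.

    All field names are hypothetical and used for illustration only.
    """
    candidates = {
        "dateStart": date_start,
        "dateEnd": date_end,
        "locationUri": location_uri,
        "categoryUri": category_uri,
        "conceptUri": concept_uri,
        "lang": language,
    }
    # Leave out parameters that were not set, so the query stays minimal.
    body = {name: value for name, value in candidates.items() if value is not None}
    if aggregate_concepts:
        # Ask for aggregated concept information in addition to the result list.
        body["aggregations"] = ["concepts"]
    return json.dumps(body, sort_keys=True)


request_body = build_search_request(
    date_start="2017-12-01",
    date_end="2017-12-21",
    concept_uri="http://en.wikipedia.org/wiki/Weather",
    aggregate_concepts=True,
)
```

In practice the Python or JavaScript API builds and sends such request objects internally, so project code would normally not construct them by hand.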

Event Registry offers more functionality than covered in this overview, such as information about long-term trends in events or about the sharing of articles in social media. However, we do not foresee any direct use for this data at this point. Should it prove relevant in future, we will add it to the project data and report this in future documentation.

⁶ https://github.com/EventRegistry/

2.3.2 Data format

As mentioned before, the format of the returned data is JSON. The full data model is available in the Event Registry online documentation⁷ and contains the data model for the Event Registry representation of article, event, category, concept and news source. A subset of the data fields for a selection of the most relevant entities is presented in the appendix in Section 5.1.

Chapter 3 Weather Data

While the events represent the social context of consumer behaviour, the weather represents the environmental context. To obtain high-quality weather data the EW-Shopp project has an agreement with the European Centre for Medium-Range Weather Forecasts⁸ (ECMWF). ECMWF is an independent intergovernmental organisation supported by 34 states and is both a research institute and a 24/7 operational weather analysis and forecast service. This chapter presents the ECMWF data services, outlines the weather data requirements in the project and specifies the weather data API developed for EW-Shopp.

3.1 Data Source

ECMWF is one of the leading meteorological institutions in the world. It was established in 1975 and today joins experts and resources from 34 supporting countries. Besides being a research institute it is an operational weather analysis and forecast service providing weather forecast data to its member states. This supports the national weather services of member countries, which use this data to prepare the weather forecasts their citizens follow in the media.

As a part of its operational activities, ECMWF uses weather measurement data from all the member states and partner institutions to compute the model of the full state of global weather twice per day with a forecast for the following ten days. The weather state and forecast data is stored in its Meteorological Archival and Retrieval System⁹ (MARS) archive. EW-Shopp has obtained full access to this archive spanning back over 30 years.

⁷ https://github.com/EventRegistry/event-registry-python/wiki/Data-models
⁸ https://www.ecmwf.int/
⁹ https://www.ecmwf.int/en/faq/what-mars

3.2 Project Requirements

Weather has a strong influence over human behaviour. It can cause physical discomfort with great heat, limit our movement with a strong downpour or even damage our property with high winds or hail. Of course this extends to consumer behaviour, as illustrated by the following examples based on the business partner experience:

• Warm sunny days mean a decrease in web traffic as more people are outdoors and not using their devices to browse the web.

• The first wet season in autumn triggers a surge in sales of clothes dryers, as hanging clothes out to dry becomes significantly less effective.

• Bad weather conditions such as strong rain and high winds result in less foot traffic in front of stores as people take refuge from the elements.

As for event data, we can form requirements for weather data from the business case requirements described in deliverable D4.1 [3]. The weather data service needs to support:

1.) querying for either the actual weather state or a weather forecast made at some given time;
2.) querying for weather data from a given time range;
3.) querying for weather data from a given geographical area;
4.) aggregating weather data over some time range or geographical area; aggregation includes computing the minimum, maximum or average values.

As with event data requirements, weather data requirements may change during the course of the project as deeper insights into the effects of weather on consumers are discovered. All potential extensions will be documented in future deliverables.

3.3 Weather Data API

MARS can be accessed via a REST API using a keyword-based query language. For example:

retrieve,
    class = od,
    type = an,
    expver = 1,
    date = 19990215,
    time = 12,
    param = t,
    levtype = pressure level,
    levelist = 1000/850/700/500,
    target = "t.grb"

This request retrieves 1000, 850, 700 and 500 hPa temperatures from the 15th of February 1999 12:00 UTC analysis. The request syntax is complex and requires a lot of knowledge about the MARS internal data structure. An official Python wrapper around this API is available, which improves on this but is unfortunately still in development and of limited use. Both approaches also download the data in the GRIB (GRIdded Binary) format commonly used in meteorology. This format, though efficient with respect to disk space, is not easy to decode and needs to be transformed for practical use. To alleviate these issues JSI developed a dedicated wrapper for use in EW-Shopp.
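To illustrate what compiling such a request involves, the sketch below assembles the request text shown above from a Python mapping. It only formats text and sends nothing; the helper name `format_mars_request` is hypothetical and is not part of the ECMWF client or of the EW-Shopp weather API.

```python
from typing import Mapping, Sequence, Union

Value = Union[str, int, Sequence[Union[str, int]]]


def format_mars_request(verb: str, params: Mapping[str, Value]) -> str:
    """Render a verb and its keyword/value pairs in the MARS request syntax.

    Lists and tuples are joined with "/" as in `levelist = 1000/850/700/500`.
    """
    lines = [verb]
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            value = "/".join(str(item) for item in value)
        lines.append(f"    {key} = {value}")
    # Every line except the last one ends with a comma.
    return ",\n".join(lines)


# The same request as in the example above, expressed as a mapping.
request_text = format_mars_request(
    "retrieve",
    {
        "class": "od",
        "type": "an",
        "expver": 1,
        "date": 19990215,
        "time": 12,
        "param": "t",
        "levtype": "pressure level",
        "levelist": [1000, 850, 700, 500],
        "target": '"t.grb"',
    },
)
```

Even with such a helper the caller still has to know the valid keywords and values, which is the main motivation for the dedicated wrapper.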


3.3.1 API

The EW-Shopp weather API developed by JSI is freely available on the web¹⁰, but data can only be obtained with an appropriate API key provided by the ECMWF. JSI is in charge of distributing the API key among the project partners. The API is written in Python and uses a combination of the official Python API and direct compilation of MARS requests in the background.

Since the API is dedicated for use in the EW-Shopp project, it has been designed to return only weather attributes relevant to the project. The MARS archive has a rich selection of over 100 meteorological attributes to choose from; however, most of them are relevant for the analysis of meteorological phenomena and not to the project aims. Fifteen relevant attributes were manually selected for inclusion into the project weather data. The full list with descriptions is included in the appendix in Section 5.2.

The API is designed around two main objects: WeatherApi and WeatherExtractor. WeatherApi handles data acquisition from the MARS archive. It builds a request, transmits it to the MARS data service and downloads the requested data. The request can specify the parameters listed below.

• date (range): The date or the start/end dates of the dataset.

• time: Selection of the time of day when the data and forecasts are computed. Either 00:00:00 or 12:00:00.

• step: Time(s) in hours from the time of computation (value of the ‘time’ parameter) for which forecast data is returned. Values can be in [0, 1, 2, ..., 89] ∪ [90, 93, 96, ..., 141] ∪ [144, 150, 156, ..., 240].

• area: The latitude/longitude boundaries of the area for which the data is returned.

• resolution: The data is computed over a grid. This parameter specifies the latitude/longitude resolution for which the data is returned. The values are in degrees, for example: [0.125, 0.125] for a grid of roughly 15 km x 15 km.
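The parameter constraints listed above can be captured in a small validation sketch. `WeatherRequest` and its field names are hypothetical and do not reproduce the signature of the actual WeatherApi object; in particular, the north/west/south/east ordering of `area` is an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Tuple

# Allowed forecast steps in hours: hourly up to 89, every 3 hours from 90 to 141,
# every 6 hours from 144 to 240.
VALID_STEPS = (
    frozenset(range(0, 90))
    | frozenset(range(90, 144, 3))
    | frozenset(range(144, 241, 6))
)
# Data and forecasts are computed twice per day.
VALID_TIMES = (time(0, 0), time(12, 0))


@dataclass(frozen=True)
class WeatherRequest:
    """Hypothetical request description, validated on construction."""
    date_start: date
    date_end: date
    run_time: time
    steps: Tuple[int, ...]
    area: Tuple[float, float, float, float]   # north, west, south, east (assumed order)
    resolution: Tuple[float, float] = (0.125, 0.125)  # degrees latitude/longitude

    def __post_init__(self) -> None:
        if self.date_end < self.date_start:
            raise ValueError("date_end must not be before date_start")
        if self.run_time not in VALID_TIMES:
            raise ValueError("run_time must be 00:00:00 or 12:00:00")
        invalid = [s for s in self.steps if s not in VALID_STEPS]
        if invalid:
            raise ValueError(f"unsupported forecast steps: {invalid}")
        north, _west, south, _east = self.area
        if not -90 <= south <= north <= 90:
            raise ValueError("area latitudes must satisfy -90 <= south <= north <= 90")
        if any(r <= 0 for r in self.resolution):
            raise ValueError("resolution values must be positive")


# Made-up example: one week of 12:00 runs with forecasts for 0, 24 and 48 hours ahead.
example = WeatherRequest(
    date_start=date(2017, 12, 1),
    date_end=date(2017, 12, 7),
    run_time=time(12, 0),
    steps=(0, 24, 48),
    area=(47.0, 13.0, 45.0, 17.0),
)
```

Rejecting invalid combinations before a request is sent avoids a round trip to the archive for requests that cannot be served.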

The data is downloaded in GRIB format and stored into a file on disk. Extraction and management of the data in the GRIB file is handled by the WeatherExtractor object, which is described in the next section.

3.3.2 Data Format

As described in the previous section, weather data is downloaded by the API to disk in the GRIB format. The WeatherExtractor object loads the data from the GRIB file, transforms it into a pandas¹¹ DataFrame, which serves as the internal representation of the data, and performs filtering and aggregation operations. Pandas is an open-source Python data analysis library and DataFrame is its main data structure. Due to its generality, this format is well suited for integration and use by the project business partners. From it the data can easily be transformed into a number of other data formats such as JSON, SQL, CSV etc.

¹⁰ https://github.com/JozefStefanInstitute/weather-data
¹¹ https://pandas.pydata.org/
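A minimal sketch of the conversions mentioned above, using only standard pandas calls. The column names and values are made up for illustration and do not reflect the actual output of WeatherExtractor; an in-memory SQLite database stands in for whatever SQL store a partner may use.

```python
import json
import sqlite3

import pandas as pd

# Made-up sample values; the column names are illustrative assumptions.
frame = pd.DataFrame(
    {
        "date": ["2017-12-01", "2017-12-01", "2017-12-02"],
        "latitude": [46.0, 46.125, 46.0],
        "longitude": [14.5, 14.5, 14.5],
        "temperature": [2.5, 2.0, -1.0],
    }
)

# JSON: one object per row.
as_json = frame.to_json(orient="records")

# CSV: header line followed by one line per row, without the index column.
as_csv = frame.to_csv(index=False)

# SQL: write the rows into a table of an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
frame.to_sql("weather", connection, index=False)
row_count = connection.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
```

The same three calls work unchanged on aggregated results, since those are also represented as a DataFrame.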

Several aggregation operations are also available, where aggregation can be the computation of the minimum, maximum or average value. The data can be aggregated over time and over geographical area. For time, an hour, day or week range can be specified. For area, either an area with specific latitude and longitude boundaries can be specified or a set of target points can be given. In the latter case aggregated values are computed for the given target points. Each grid data point is aggregated only to the target point closest to it. The results of all aggregation operations are again internally represented as a pandas DataFrame and can be transformed into some other format for use or output.
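The target-point aggregation described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the WeatherExtractor implementation: the function names are hypothetical, and the text does not say how "closest" is measured, so great-circle (haversine) distance is assumed here.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, assuming a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def aggregate_to_targets(
    grid: Iterable[Tuple[Point, float]],
    targets: List[Point],
) -> Dict[Point, Dict[str, float]]:
    """Assign every grid value to its nearest target point, then aggregate.

    Target points that attract no grid point are absent from the result.
    """
    buckets: Dict[Point, List[float]] = defaultdict(list)
    for point, value in grid:
        nearest = min(targets, key=lambda target: haversine_km(point, target))
        buckets[nearest].append(value)
    return {
        target: {"min": min(values), "max": max(values), "mean": mean(values)}
        for target, values in buckets.items()
    }


# Made-up grid values (for example temperatures) and two target points.
targets = [(46.0, 14.5), (45.5, 9.2)]
grid = [
    ((46.0, 14.5), 2.0),
    ((46.125, 14.625), 4.0),
    ((45.5, 9.25), 7.0),
]
result = aggregate_to_targets(grid, targets)
```

Because each grid point contributes to exactly one target point, no measurement is counted twice when the results for several target points are combined later.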

Note that business partners may be aware of weather-related events that may have special meaning. For example, the first snow of winter commonly causes a much larger disruption than when it snows later. They may also have records of extreme weather events such as strong hail or frost which impacted their services significantly. Any such records are a complementary data source to the weather data service, which holds measurements, not discrete events. These special weather events will be integrated via the same custom event data model mentioned in the last paragraph of Chapter 2.

Chapter 4 Multilingual Data Linking Services

Project business partners operate over diverse international geographical areas. As a consequence, the data they handle is multilingual. In order to enable interoperability of such data and enable the use of insights from areas with one language in areas with another, a set of tools for cross-lingual data linking must be provided in the project. This chapter presents the requirements for these services in the project and three tools providing the needed functionality: the Wikifier, ASIA and XLing.

4.1 Project Requirements

Textual data in the project comes in two main forms:

• free text – such as news articles, product descriptions and documentation, marketing materials;

• tabular data – most data provided by business partners is in tables containing textual pieces of data such as the product name, brand, manufacturer, colour and other non-numerical properties.


For both forms, most of the data coming into the project is unprocessed and unlinked. Such raw text is not well suited for machine processing. Though direct string matching techniques can be used to support certain functionality, these do not work in a cross-lingual setting.

In discussion with business partners, the three main functionalities of the multilingual data linking services listed below were identified.

• annotation of concept mentions in free text – In text data analysis this task is known as semantic annotation. The purpose is to find mentions of concepts, disambiguate them and link them to some reference knowledge base. In the scope of EW-Shopp this, for example, means identifying mentions of manufacturers, brands and technologies in free text product descriptions or mentions of products in news article text.

• linking tabular data to concepts – Most business data (perhaps most of all data) is stored in tabular form. Record ids or some other identifying values are typically used to link tables of data between each other (keys in relational databases are an example of this). This approach is highly efficient but depends on the existence of such matching values. In cases when they are unavailable such an approach is impossible. It is common to have reference data (e.g. a curated repository of product data) and a dataset we want to link to this reference data (e.g. a catalogue of products available on a web store) that share no identifiers. We need a method that is able to link two records from these two datasets based on the semantics of their values. An example of this would be to match products from the Ceneje web store catalogue to GfK product reference data based on product properties since they do not share a common set of product ids.

• measuring semantic relatedness of free texts – A large amount of information is still stored in the form of unstructured text, such as product descriptions, marketing materials and news items. Humans easily parse such information and can identify the same type of product (e.g. televisions or refrigerators) or identical products (e.g. the very same model of toaster) from different descriptions. Ceneje, as a comparison shopping engine, regularly needs to solve this problem when matching product descriptions from different stores on their website. Therefore, a method for automatically measuring such semantic relatedness between pieces of text is needed.

All three functions must work in a cross-lingual setting. The following sections present three tools, each answering one of the requirements listed above, in the same order.

4.2 Wikifier

The Wikifier¹² is a semantic annotation service developed and maintained by JSI. It performs a subtype of semantic annotation called wikification. This name stems from its use of Wikipedia as a reference database and the source of concept identifiers. In this setting, Wikipedia is treated as a large and fairly general-purpose ontology: each page is thought of as representing a concept, while the relations between concepts are represented by internal hyperlinks between different Wikipedia pages, as well as by Wikipedia's category memberships and cross-language links. Wikifier is the semantic annotation tool used to compute article and event concepts in the Event Registry.

¹² http://wikifier.org/

The task of performing wikification on an input document can be broken down into several closely interrelated subtasks:

1.) identify phrases (or words) in the input document that may refer to a concept;

2.) perform disambiguation - determine which concept is the one that a phrase is referring to;

3.) determine which concepts are relevant enough to the entire document, so that they should be included in the output of the system.

The Wikifier uses an approach similar to the one described in [4], based on the PageRank algorithm [5], to perform steps 2.) and 3.). A detailed technical description is outside the scope of this document. In layman's terms, it selects those concepts from the annotation candidates that are "close" to each other in Wikipedia (i.e. following internal hyperlinks) and are relevant to the general document content. The result is a list of all concepts mentioned in the document along with the exact location of their mentions. Each annotation also has an estimated relevance score which can be used to control the level of annotation uncertainty.

The Wikifier service is maintained by JSI and is available via a REST API. Full documentation is available online¹³. It can annotate documents in any of the top 100 languages sorted by the size of their respective Wikipedia corpora. A document can be submitted via an HTTP GET or POST request and a JSON response is returned with the annotated result. Some examples of the Wikifier response structure are presented in the Appendix in Section 5.3.
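As a minimal sketch of how a document could be submitted with an HTTP POST request, the snippet below uses only the Python standard library. The endpoint URL and the parameter names (text, lang, userKey) are assumptions based on the publicly documented service and must be checked against the online documentation; the function names and the placeholder key are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the Wikifier online documentation.
WIKIFIER_URL = "http://www.wikifier.org/annotate-article"


def build_wikifier_request(text, lang, user_key):
    """Prepare (but do not send) an HTTP POST request for one document."""
    # Assumed parameter names: text, lang, userKey.
    params = {"text": text, "lang": lang, "userKey": user_key}
    data = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a request body makes urllib issue a POST request.
    return urllib.request.Request(WIKIFIER_URL, data=data)


def annotate(text, lang, user_key, timeout=60):
    """Send the request and decode the JSON response."""
    request = build_wikifier_request(text, lang, user_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access; "YOUR-KEY" is a placeholder.
request = build_wikifier_request("New York City.", "en", "YOUR-KEY")
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the first part testable offline; only annotate() contacts the service.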

4.3 ASIA

ASIA is a tool for the semantic enrichment of data available in tabular formats, developed by UNIMIB. Joining tabular data that does not use the same record ids or some other identifying values is not straightforward, due to the unavailability of direct joining points (i.e. equal values). Addressing this issue requires creating links from table values to a shared system of identifiers, which makes it possible to bridge the gap between data.

ASIA aims to help users in creating these links by implementing semantic reconciliation algorithms that perform entity linking on tables, that is, linking table values to some external reference data. In particular, the entity linking is performed at two different levels:

• Schema-level linking: linking table schema values (i.e. the header of a table) to shared vocabularies and ontologies;

• Instance-level linking: linking data values to shared systems of identifiers.

Even if instance-level linking might be enough to enable the enrichment, schema-level linking is also expected to be considered, since it is helpful in addressing some issues related to instance-level linking (e.g. schema-level linking results are used to apply blocking techniques while performing the instance-level linking).

¹³ http://wikifier.org/info.html

Schema-level and instance-level links are created by ASIA as annotations for the table. Users can create schema-level annotations through the ASIA interface by validating ASIA suggestions about the classes and properties, defined in several ontologies and vocabularies, to be used. If a user specifies a different class (or property), ASIA suggests classes (or properties) that syntactically match the user input (autocomplete functionality). In contrast, the instance-level annotations are expected to be created by ASIA automatically, because the data values are too numerous to be validated one by one.

Table annotations enable two different functionalities of ASIA:

• Generation of knowledge graphs (KGs) from a tabular dataset: the schema-level annotations are transformed into executable data transformations to publish tabular data as a KG; data values will be used to create new instances and populate the graph.

• Enrichment of tabular data with third-party data: instance-level annotations (along with schema-level annotations, when needed), which link table values to reference KGs, are used to facilitate enrichment of business data with data from these reference KGs (e.g., the link to a product in the products dataset also allows retrieving the product brand, stored in the same dataset) or from third-party data by using links as bridges (e.g., the link to a GeoNames location can be used to retrieve the GeoNames location identifier, which is required for retrieving events from the Event Registry). ASIA supports this key enrichment process by providing data enrichment widgets, which exploit links to reference KGs to ease the extraction of additional data from third-party sources and their fusion into the original tabular data.

The ASIA interface is developed as a component of Grafterizer, a tool for tabular data cleaning and transformation developed at SINTEF. The ASIA Backend is developed and maintained by UNIMIB. All linking services implemented in the ASIA Backend are made accessible via REST APIs.

The current version of ASIA provides only the schema-level linking suggestions, based on ABSTAT. This service can be invoked via an HTTP GET request, passing some parameters (string to be autocompleted, type of requested suggestion – class or property, number of results to be returned, and so on). The service returns a JSON document with all classes or properties that are syntactically similar to the string passed as a parameter.

4.4 XLing

XLing¹⁴ is a cross-lingual document semantic similarity measuring service and is one of the inputs used for article clustering in Event Registry. It receives two documents as input and computes a similarity score as a number in the [0, 1] range. Like the Wikifier, it works on the top 100 languages sorted by the size of their respective Wikipedia corpora.

¹⁴ http://xling.ijs.si


The approach used to compute the similarity score is based on canonical correlation analysis. It avoids direct translation of the documents and rather uses a statistical hub language to connect them. The hub language is computed using Wikipedia articles as an aligned multilingual text corpus. Details of the approach can be found in [6].

The service is available via a simple REST API. The two documents are submitted in a POST request with their respective languages (i.e. their ISO 639-1 codes). The similarity score is returned in the HTTP response.

Chapter 5 Conclusion

This document presented the specification of event, weather and multilingual data services for the EW-Shopp platform. Data sources and tools for each data type were presented with their requirements and APIs. The requirements were based on descriptions of business partners' workflows and specifications of their pilot services in deliverable D4.1. Extensions to the tools were planned and developed where necessary; however, other issues and needs will likely be discovered during deployment of the pilots.

One such important factor is the workload resilience and responsiveness of the services. The services were developed with performance in mind; however, once the platform is tested on real data streams, more optimizations might prove to be necessary. Caching of queried data and improved support for bulk data retrieval (i.e. multiple queries per request) are examples of techniques that could be used to address such issues if they arise.

API security is also an open issue. Most of the services listed in the document have been developed in an academic setting and are very lenient in that respect. As they get plugged into a commercial platform, the danger of attacks and exploits will likely rise sharply. There are ongoing discussions in the project on addressing this satisfactorily.

The services will continue to evolve during the project. All subsequent changes will be reported in future deliverables.

References

[1] D1.1: Interoperability Requirements Specification

[2] D1.2: Spatial, temporal and product data format specification

[3] D4.1: Business Case Requirements

[4] Zhang, L., Rettinger, A., 2014. Final ontological word-sense-disambiguation prototype. Deliverable D3.2.3, xLike Project.


[5] Page, L., Brin, S., Motwani, R. and Winograd, T., 1999. The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.

[6] Rupnik, J., Muhic, A., Leban, G., Skraba, P., Fortuna, B. and Grobelnik, M., 2016. News across languages-cross-lingual document similarity and event tracking. Journal of Artificial Intelligence Research, 55, pp.283-316.

Appendix

5.1 Event Registry Data Model

We present selected elements from the Event Registry data model using example data. Further details can be studied in the Event Registry online documentation¹⁵.

¹⁵ https://github.com/EventRegistry/event-registry-python/wiki/Data-models

Article

{
    // article's URI (newsfeed id)
    "uri": "143701955",
    // web url
    "url": "http://www.Newsmax.com/Newsfront/obama-staff-veterans-revamp/2013/12/18/id/542478",
    // article's title
    "title": "Desperate Obama Tries to Reset Agenda with New Staff",
    // article's full body
    "body": "Highlight the link and press CTRL/Command+C to copy the link to your clipboard.\nAs Phil Schiliro arrived at his first meeting last...",
    // date and time of publishing
    "date": "2013-12-18",
    "time": "11:40:00",
    "datetime": "2013-12-18T11:40:00Z",
    // event URI to which the article is assigned (if any)
    "eventUri": "20588",
    "source": {
        // details about the news source (see Source data model)
    },
    "categories": [
        // list of categories (see Category data model)
    ],
    "concepts": [
        // list of concepts (see Concept data model)
    ],
    // dates that were extracted from the article
    "extractedDates": [
        {
            "amb": false, // ambiguous?
            "date": "2013-12-03", // normalized date
            "dateEnd": "2013-12-08",
            "detectedDate": "Dec. 3-8", // detected string
            "imp": true, // was the year value imputed?
            "posInText": 6164, // location in text
            "textSnippet": "ublican attacks. A Dec. 3-8 poll of 86 competit"
        },
        // remaining list of extracted dates
    ],
    "id": "565", // internal ER id - do not use!
    "lang": "eng", // language of the article
    "location": null // was there explicit location extracted from dateline?
}

Event

{
    // event URI
    "uri": "3403979",
    // total articles reporting about the event
    "totalArticleCount": 100,
    // articles per language
    "articleCounts": {
        "deu": 82,
        "eng": 18
    },
    "concepts": [
        // list of concepts (see Concept data model)
    ],
    "categories": [
        // list of categories (see Category data model)
    ],
    // event title in available languages
    "title": {
        "deu": "Obama kommt zur Er\u00f6ffnung der Hannover Messe",
        "eng": "White House says Obama will make 5th visit to Germany, take in trade show"
    },
    // event summary in available languages
    "summary": {
        "deu": "Hannover (dpa) US-Pr\u00e4sident Obama kommt 2016 wieder nach Deutschland: In Hannover er\u00f6ffnet er die weltgr\u00f6\u00dfte Industrieschau. Die Sicherheitsma\u00dfnahmen werden sch\u00e4rfer sein als 2013. …",
        "eng": "HONOLULU, Hawaii - The White House says President Barack Obama will travel to Germany in late April to attend the world's largest trade show for industrial technology …"
    },
    // which dates have been frequently found in articles about this event
    "commonDates": [
        {
            "date": "2016-04-24",
            "freq": 11
        },
        // remaining common dates
    ],
    // when the event happened
    "eventDate": "2016-04-24",
    // how much impact on social media did articles about the event get
    "socialScore": 91.4,
    // where did the event happen
    "location": {
        "country": {
            "area": 357021,
            "code2": "DE",
            "code3": "DEU",
            "continent": "Europe",
            "geoNamesId": "2921044",
            "label": {
                "eng": "Germany",
                "spa": "Alemania"
            },
            "lat": 51.5,
            "long": 10.5,
            "type": "country",
            "wikiUri": "http://en.wikipedia.org/wiki/Germany"
        },
        "geoNamesId": "2910831",
        "label": {
            "eng": "Hanover",
            "spa": "Hannover"
        },
        "lat": 52.37052,
        "long": 9.73322,
        "population": 515140,
        "type": "place",
        "wikiUri": "http://en.wikipedia.org/wiki/Hanover"
    },
    // if the event is provided as a result of a query, wgt represents relevance to the query (in range 0-100)
    "wgt": 98
}

Category

{
    // category's URI
    "uri": "dmoz/Society/Issues/Warfare_and_Conflict",
    // URI of the parent category
    "parentUri": "dmoz/Society/Issues",
    // category label
    "label": "Society/Issues/Warfare_and_Conflict",
    // URIs of children categories
    "childrenUris": [],
    // how much was the category trending in the last days
    "trendingHistory": {
        "latestArticleTimestamp": "2016-03-17 03:44:00",
        "news": [
            5867, // 29 days ago
            6818,
            5927,
            ...
            3371,
            5957,
            5782, // 2 days ago
            5139, // yesterday
            646 // today
        ]
    },
    // internal ER id - do not use!
    "id": 283
}

Concept

{
    // concept's URI
    "uri": "http://en.wikipedia.org/wiki/United_States",
    // concept type - person, loc, org or wiki
    "type": "loc",
    // concept labels in requested languages
    "label": {
        "eng": "United States",
        "spa": "Estados Unidos"
    },
    // what classes does the concept belong to
    "conceptClassMembership": [
        "http://dbpedia.org/ontology/Country"
    ],
    // synonyms for the concept, if any
    "synonyms": {
        "eng": ["USA", "U.S.A."]
    },
    // textual description of the concept
    "description": "<p>The United States of America (USA), commonly referred to as the United States (U.S.), America, and sometimes the States, is a federal republic consisting of 50 states, 16 territories, and a federal district. ...",
    // internal ER id - do not use
    "id": "233"
}

5.2 Weather Data Attributes

The API downloads the weather attributes listed in Table 4.

Table 4: List of weather attributes supported by the EW-Shopp weather API.

name | unit | shortname | description
Cloud base height | m | cbh | Searching from the 2nd lowest model level upwards, the height of the level where cloud fraction becomes > 1% and condensate content > 1.E-6.
Maximum temperature at 2 metres in the last 6 hours | K | mx2t6 | Maximum temperature at 2 metres in the last 6 hours
Minimum temperature at 2 metres in the last 6 hours | K | mn2t6 | Minimum temperature at 2 metres in the last 6 hours
10 metre wind gust in the last 6 hours | m/s | 10fg6 | 10 metre wind gust in the last 6 hours
Surface pressure | Pa | sp | Air pressure at ground level
Total column water vapour | kg/m2 | tcwv | Vertically integrated water vapour
Snow depth | m of water equivalent | sd | Depth of snow coverage
Snowfall | m of water equivalent | sf | Convective + stratiform snowfall. Accumulated field.
Total cloud cover | (0-1) | tcc | Total cloud cover derived from model levels using the model's overlap assumption
2 metre temperature | K | 2t | Temperature 2 m above the ground
Total precipitation | m | tp | Height of water in one m^2 from precipitation. Convective precipitation + stratiform precipitation (CP + LSP). Accumulated field.
Precipitation type | code table (see description) | ptype | Describes the type of precipitation at the surface at the validity time. A precipitation type is assigned wherever there is a non-zero value of precipitation in the model output field (however small). The precipitation type should be used together with the precipitation rate to provide, for example, indication of potential freezing rain events. Precipitation type (0-8) uses WMO Code Table 4.201. Values of ptype defined in the IFS: 0 = No precipitation, 1 = Rain, 3 = Freezing rain (i.e. supercooled), 5 = Snow, 6 = Wet snow (i.e. starting to melt), 7 = Mixture of rain and snow, 8 = Ice pellets
Visibility | m | vis | Visibility in metres.
Maximum total precipitation rate in the last 6 hours | kg m-2 s-1 | mxtpr6 | The total precipitation is calculated from the combined large-scale and convective rainfall and snowfall rates every time step and the maximum is kept since the last 6 hours.
Minimum total precipitation rate in the last 6 hours | kg m-2 s-1 | mntpr6 | The total precipitation is calculated from the combined large-scale and convective rainfall and snowfall rates every time step and the minimum is kept since the last 6 hours.
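When consuming these attributes, two small conversions are commonly needed: decoding the ptype code and converting temperatures from kelvin. The sketch below uses the code values listed in the table above; the function names and the "Unknown code" fallback are hypothetical and not part of the weather API.

```python
# Precipitation type codes as listed in the ptype row of the table.
PTYPE_LABELS = {
    0: "No precipitation",
    1: "Rain",
    3: "Freezing rain",
    5: "Snow",
    6: "Wet snow",
    7: "Mixture of rain and snow",
    8: "Ice pellets",
}


def ptype_label(code):
    """Return the label for a ptype code, or a fallback for unlisted codes."""
    return PTYPE_LABELS.get(int(code), "Unknown code")


def kelvin_to_celsius(kelvin):
    """Convert a temperature attribute (e.g. 2t, mx2t6, mn2t6) to degrees Celsius."""
    return kelvin - 273.15


print(ptype_label(3), kelvin_to_celsius(273.15))
```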

5.3 Wikifier Annotation Format

The Wikifier returns a JSON response of the following form:

{
    "annotations": [ ... ],
    "spaces": ["", " ", " ", "."],
    "words": ["New", "York", "City"],
    "ranges": [ ... ]
}


Here the annotations field contains a list of annotation objects; the words and spaces fields contain the tokenized document text and can be used to reconstruct it; and the ranges field lists the concept mention candidates. Each annotation has the following structure:

{
    "title": "New York City",
    "url": "http:\/\/en.wikipedia.org\/wiki\/New_York_City",
    "lang": "en",
    "pageRank": 0.102831,
    "cosine": 0.662925,
    "secLang": "en",
    "secTitle": "New York City",
    "secUrl": "http:\/\/en.wikipedia.org\/wiki\/New_York_City",
    "wikiDataClasses": [
        {"itemId": "Q515", "enLabel": "city"},
        {"itemId": "Q1549591", "enLabel": "big city"},
        ...
    ],
    "wikiDataClassIds": ["Q515", "Q1549591", ...],
    "dbPediaTypes": ["City", "Settlement", "PopulatedPlace", ...],
    "dbPediaIri": "http:\/\/dbpedia.org\/resource\/New_York_City",
    "supportLen": 2.000000,
    "support": [ ... a list of ranges in the text that are mentions of this concept ... ]
}

A detailed description of all annotation fields is available in the Wikifier online documentation.
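The response structure above can be processed with a few lines of Python. The sketch below rebuilds the document text from the words and spaces fields and keeps only annotations above a pageRank threshold. It assumes, as in the example response, that spaces holds one more element than words; the function names and the threshold value are hypothetical, and the sample response is a shortened copy of the example above.

```python
def reconstruct_text(response):
    """Interleave the spaces and words lists to rebuild the document text."""
    spaces, words = response["spaces"], response["words"]
    parts = [spaces[0]]
    # Each word is followed by the separator that came after it in the text.
    for word, space in zip(words, spaces[1:]):
        parts.append(word)
        parts.append(space)
    return "".join(parts)


def relevant_titles(response, min_page_rank):
    """Return titles of annotations whose pageRank reaches the threshold."""
    return [
        annotation["title"]
        for annotation in response["annotations"]
        if annotation["pageRank"] >= min_page_rank
    ]


# Shortened sample response with only the fields used here.
response = {
    "annotations": [{"title": "New York City", "pageRank": 0.102831}],
    "spaces": ["", " ", " ", "."],
    "words": ["New", "York", "City"],
}

text = reconstruct_text(response)
titles = relevant_titles(response, min_page_rank=0.05)
print(text, titles)
```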