
A Multi-media Approach to Cross-lingual Entity Knowledge Transfer

Di Lu 1, Xiaoman Pan 1, Nima Pourdamghani 2, Shih-Fu Chang 3, Heng Ji 1, Kevin Knight 2

1 Computer Science Department, Rensselaer Polytechnic Institute, {lud2,panx2,jih}@rpi.edu

2 Information Sciences Institute, University of Southern California, {damghani,knight}@isi.edu

3 Electrical Engineering Department, Columbia University, [email protected]

Abstract

When a large-scale incident or disaster occurs, there is often a great demand for rapidly developing a system to extract detailed and new information from low-resource languages (LLs). We propose a novel approach to discover comparable documents in high-resource languages (HLs), and project Entity Discovery and Linking results from HL documents back to LLs. We leverage a wide variety of language-independent forms from multiple data modalities, including image processing (image-to-image retrieval, visual similarity and face recognition) and sound matching. We also propose novel methods to learn entity priors from a large-scale HL corpus and knowledge base. Using Hausa and Chinese as the LLs and English as the HL, experiments show that our approach achieves a 36.1% higher Hausa name tagging F-score than a costly supervised model, and 9.4% higher Chinese-to-English Entity Linking accuracy than the state of the art.

1 Introduction

In many situations such as disease outbreaks and natural calamities, we often need to develop an Information Extraction (IE) component (e.g., a name tagger) within a very limited time to extract information from low-resource languages (LLs) (e.g., locations of Ebola outbreaks from Hausa documents). The main challenge lies in the lack of labeled data and linguistic processing tools in these languages. A potential solution is to extract and project knowledge from high-resource languages (HLs) to LLs.

A large amount of non-parallel, domain-rich, topically-related comparable corpora naturally exist across LLs and HLs for breaking incidents, such as coordinated news streams (Wang et al., 2007) and code-switching social media (Voss et al., 2014; Barman et al., 2014). However, without effective Machine Translation techniques, even just identifying such data in HLs is not a trivial task. Fortunately, many such comparable documents are presented in multiple data modalities (text, image and video), because press releases with multimedia elements generate up to 77% more views than text-only releases (Newswire, 2011). In fact, they often contain the same or similar images and videos, which are language-independent.

In this paper we propose to use images as a hub to automatically discover comparable corpora. We then apply Entity Discovery and Linking (EDL) techniques in HLs to extract entity knowledge, and project the results back to LLs by leveraging multi-source multi-media techniques. In the following we elaborate the motivations and detailed methods for the two most important EDL components: name tagging and Cross-lingual Entity Linking (CLEL). For CLEL we choose Chinese as the LL and English as the HL because Chinese-to-English is one of the few language pairs for which we have ground-truth annotations from official shared tasks (e.g., TAC-KBP (Ji et al., 2015)). Since Chinese name tagging is a well-studied problem, we choose Hausa instead of Chinese as the LL for the name tagging experiment, because we can use the ground truth from the DARPA LORELEI program1 for evaluation.

Entity and Prior Transfer for Name Tagging: In the first case study, we attempt to use HL extraction results directly to validate and correct names

1 http://www.darpa.mil/program/low-resource-languages-for-emergent-incidents

Figure 1: Image Anchored Comparable Corpora Retrieval.

extracted from LLs. For example, in the Hausa document in Figure 1, it would be challenging to identify the location name “Najeriya” directly from the Hausa document because it differs from its English counterpart. But since its translation “Nigeria” appears in the topically-related English document, we can use it to infer and validate the name boundary.

Even if topically-related documents do not exist in an HL, similar scenarios (e.g., disease outbreaks) and similar activities of the same entity (e.g., meetings among politicians) often repeat over time. Moreover, by running a high-performing HL name tagger on a large amount of documents, we can obtain entity prior knowledge, which gives the probability of a related name appearing in the same context. For example, if we already know that “Nigeria”, “Borno”, “Goodluck Jonathan” and “Boko Haram” are likely to appear, then we could also expect “Mouhammed Ali Ndume” and “Mohammed Adoke” to be mentioned, because they were both important politicians appointed by Goodluck Jonathan to consider opening talks with Boko Haram. More generally, if we know the LL document is about politics in China in the 1990s, we can estimate that famous politicians of that time such as “Deng Xiaoping” are likely to appear in the document.

Next we will project these names extracted from HL documents directly to LL documents to identify and verify names. In addition to textual evidence, we check visual similarity to match an HL name with its equivalent in the LL. We also apply face recognition techniques to verify person names via image search. This idea matches the human knowledge acquisition procedure as well: when a child watching a cartoon shifts between versions in two languages, s/he can easily infer translation pairs for the same concept whose images appear frequently (e.g., “宝宝 (baby)” and “螃蟹 (crab)” in “Dora the Explorer”, “海盗 (pirate)” in “the Garden Guardians”, and “亨利 (Henry)” in “Thomas the Tank Engine”), as illustrated in Figure 2.

Figure 2: Examples of Cartoons in Chinese (left) and English (right).

Representation and Structured Knowledge Transfer for Entity Linking: Besides data sparsity, another challenge for low-resource language IE lies in the lack of knowledge resources. For example, there are advanced knowledge representation parsing tools (e.g., Abstract Meaning Representation (AMR) (Banarescu et al., 2013)) and large-scale knowledge bases available for English Entity Linking, but not for other languages, including some medium-resource ones such as Chinese. For example, the following documents are both about the event of Pistorius killing his girlfriend Reeva:

• LL document: 南非残疾运动员皮斯托瑞斯被指控杀害女友瑞娃于其茨瓦内的家中。皮斯托瑞斯是南非著名的残疾人田径选手,有“刀锋战士”之称。...(The disabled South African sportsman Oscar Pistorius was charged with killing his girlfriend Reeva at his home in Tshwane. Pistorius is a famous runner in South Africa, also known as “Blade Runner”...)

Figure 3: Cross-lingual Knowledge Transfer for Entity Linking. [The figure shows image-anchored comparable document retrieval connecting LL and HL documents; an AMR-based knowledge graph built from the HL document (kill-01, die-01, Oscar Pistorius, South Africa, Blade Runner, Reeva, with coreference and modify edges); KB links from the graph to en/South_Africa, en/Pretoria and en/Reeva_Steenkamp, reaching the LL mentions 茨瓦内 (Tshwane) and 瑞娃 (Reeva) through langlink and redirect links; and a Knowledge Base Walker retrieving a relevant entity set (en/Johannesburg, en/Cape_Town, en/Bivane_River, en/Jacob_Zuma, ...).]

• HL document: In the early morning of Thursday, 14 February 2013, “Blade Runner” Oscar Pistorius shot and killed South African model Reeva Steenkamp...

From the LL documents we may only be able to construct a co-occurrence based knowledge graph, and thus it is difficult to link rare entity mentions such as “瑞娃 (Reeva)” and “茨瓦内 (Tshwane)” to an English knowledge base (KB). But if we apply an HL (e.g., English) entity linker, we can construct much richer knowledge graphs from HL documents using deep knowledge representations such as AMR, as shown in Figure 3, and link all entity mentions to the KB accurately. Moreover, if we start to walk through the KB, we can easily reach from related English entities to the entities mentioned in LL documents. For example, we can walk from “South Africa” to its capital “Pretoria” in the KB, which is linked to its LL form “比勒陀利亚” through a language link and then redirected to “茨瓦内”, mentioned in the LL document, through a redirect link. Therefore we can infer that “茨瓦内” should be linked to “Pretoria” in the KB.

Compared to most previous cross-lingual projection methods, our approach does not require domain-specific parallel corpora or lexicons, or in fact any parallel data at all. It also does not require any labeled data in LLs. Using Hausa and Chinese as the LLs and English as the HL for case studies, experiments demonstrate that our approach achieves a 36.1% higher Hausa name tagging F-score than a costly supervised model trained from 337 documents, and 9.4% higher Chinese-to-English Entity Linking accuracy than a state-of-the-art system.

2 Approach Overview

Figure 4 illustrates the overall framework. It consists of two steps: (1) apply language-independent key phrase extraction methods to each LL document, use the key phrases as a query to retrieve seed images, then use the seed images to retrieve matching images and the HL documents containing them (Section 3); (2) extract knowledge from the HL documents, and apply knowledge transfer methods to refine the LL extraction results.

Figure 4: Overall Framework.

We present two case studies, on name tagging (Section 5) and cross-lingual entity linking (CLEL) (Section 6). Our projection approach consists of a series of non-traditional multi-media multi-source methods based on textual and visual similarity, face recognition, as well as entity priors learned from both unstructured data and a structured KB.

3 Comparable Corpora Discovery

In this section we describe the detailed steps of acquiring HL documents for a given LL document via anchoring images. Using a cluster of images as a hub, we attempt to connect topically-related documents in the LL and HL. We walk through each step for the motivating example in Figure 1.

3.1 Key Phrase Extraction

For an LL document (e.g., Figure 1 for the walk-through example), we start by extracting its key phrases using the following three language-independent methods: (1) TextRank (Mihalcea and Tarau, 2004), a graph-based ranking model for determining key phrases; (2) topic modeling based on the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003), which can generate a small number of key phrases representing the main topics of each document; (3) the title of the document, if available.
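To make the graph-based ranking concrete, the following is a minimal TextRank-style sketch: build a word co-occurrence graph over a sliding window and rank nodes with a PageRank-style update. It is language-independent, as Section 3.1 requires; the window size, damping factor and iteration count here are illustrative defaults, not the paper's settings.

```python
from collections import defaultdict

def textrank_keyphrases(tokens, window=2, top_k=5, iters=30, d=0.85):
    """Rank candidate key words by a PageRank-style score over a
    co-occurrence graph (sketch of TextRank, Mihalcea and Tarau, 2004)."""
    # Build an undirected co-occurrence graph over a sliding window.
    graph = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[tokens[i]].add(tokens[j])
                graph[tokens[j]].add(tokens[i])
    # Iterate the PageRank-style update until (approximately) converged.
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[v] / len(graph[v])
                                      for v in graph[w])
                 for w in graph}
    return sorted(score, key=score.get, reverse=True)[:top_k]
```

In practice the extracted phrases would be combined with LDA topics and the document title before querying image search.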

3.2 Seed Image Retrieval

Using the extracted key phrases together as one single query, we apply Google Image Search to retrieve the top 15 ranked images as seeds. To reduce the noise introduced by image search, we filter out images smaller than 100×100 pixels because they are unlikely to appear in the main part of web pages. We also filter out an image if its web page contains fewer than half of the tokens in the query. Figure 1 shows the anchoring images retrieved for the walk-through example.

3.3 HL Document Retrieval

Using each seed image, we apply Google image-to-image search to retrieve more matching images, and then use the TextCat tool (Cavnar et al., 1994) as a language identifier to select HL documents containing these images. Figure 1 shows three English documents retrieved for the first image.

For related topics, more images may be available in HLs than in LLs. To compensate for this data sparsity problem, using the retrieved HL documents as a seed set, we repeat the above steps one more time, extracting key phrases from the HL seed set to retrieve more images and gather more HL documents. For example, a Hausa document about the “Arab Spring” covers protests that happened in Algeria, Bahrain, Iran, Libya, Yemen and Jordan. The HL documents retrieved by LL key phrases and images in the first step missed the detailed information about the protests in Iran, but the second step, based on key phrases and images from the HL, successfully retrieved detailed related documents about them.

Applying the above multimedia search, we automatically discover domain-rich non-parallel data. Next we extract facts from HLs and project them to LLs.

4 HL Entity Discovery and Linking

4.1 Name Tagging

After we acquire HL (English in this paper) comparable documents, we apply a state-of-the-art English name tagger (Li et al., 2014) based on a structured perceptron to extract names. From the output we filter out uninformative names such as news agencies. If the same name receives multiple types across documents, we use the majority one.
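The majority-type rule at the end of the paragraph amounts to a vote over per-document type decisions; a minimal sketch (our own helper, not the authors' code):

```python
from collections import Counter

def resolve_name_type(type_votes):
    """Return the majority entity type for a name that received
    different types across retrieved HL documents (Section 4.1)."""
    return Counter(type_votes).most_common(1)[0][0]
```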

4.2 Entity Linking

We apply a state-of-the-art Abstract Meaning Representation (AMR) parser (Wang et al., 2015a) to generate rich semantic representations. Then we apply an AMR based entity linker (Pan et al., 2015) to link all English entity mentions to the corresponding entities in the English KB. Given a name n_h, this entity linker first constructs a Knowledge Graph g(n_h) with n_h at the hub and leaf nodes obtained from names reachable by AMR graph traversal from n_h. A subset of the leaf nodes are selected as collaborators of n_h. Names connected by AMR conjunction relations are grouped into sets of coherent names. For each name n_h, an initial ranked list of entity candidates E = {e_1, ..., e_M} is generated based on a salience measure (Medelyan and Legg, 2008). Then a Knowledge Graph g(e_m) is generated for each entity candidate e_m in n_h's candidate list E. The entity candidates are then re-ranked according to the Jaccard similarity between g(n_h) and g(e_m): J(g(n_h), g(e_m)) = |g(n_h) ∩ g(e_m)| / |g(n_h) ∪ g(e_m)|. Finally, the entity candidate with the highest score is selected as the appropriate entity for n_h. Moreover, the Knowledge Graphs of coherent mentions are merged and linked collectively.
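The Jaccard re-ranking step can be sketched as follows, simplifying the knowledge graphs to sets of neighbor names and assuming the salience-ranked candidate list arrives as a mapping from candidate to graph (both simplifications are ours, not the entity linker's actual data structures):

```python
def rerank_candidates(mention_graph, candidate_graphs):
    """Re-rank entity candidates by Jaccard similarity between the
    mention's knowledge graph and each candidate's graph (Section 4.2).

    mention_graph:    set of names connected to the mention.
    candidate_graphs: dict mapping candidate entity -> set of names.
    Returns candidates sorted by descending similarity.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    return sorted(candidate_graphs,
                  key=lambda e: jaccard(mention_graph, candidate_graphs[e]),
                  reverse=True)
```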

4.3 Entity Prior Acquisition

Given the English entities discovered above, we aim to automatically mine related entities to further expand the expected entity set. We use a large English corpus and an English knowledge base respectively, as follows.

If a name n_h appears frequently in the retrieved English documents,2 we further mine other related names n′_h which are very likely to appear in the same context as n_h in a large-scale news corpus (we use the English Gigaword V5.0 corpus3 in our experiment). For each pair of names ⟨n′_h, n_h⟩, we compute P(n′_h | n_h) based on their co-occurrences in the same sentences. If P(n′_h | n_h) is larger than a threshold4 and n′_h is a person name, then we add n′_h to the expected English name set.

Let E_0 = {e_1, ..., e_N} be the set of entities in the KB that all mentions in the English documents are linked to. For each e_i ∈ E_0, we ‘walk’ one step from it in the KB to retrieve all of its neighbors N(e_i). We denote the set of neighbor nodes as E_1 = {N(e_1), ..., N(e_N)}. We then extend the expected English entity set to E_0 ∪ E_1. Table 1 shows some retrieved neighbors for the entity “Elon Musk”.

Relation        Neighbor
is founder of   SpaceX
is founder of   Tesla Motors
is spouse of    Justine Musk
birth place     Pretoria
alma mater      University of Pennsylvania
parents         Errol Musk
relatives       Kimbal Musk

Table 1: Neighbors of Entity “Elon Musk”.
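The corpus-based half of the entity prior can be sketched as a conditional co-occurrence estimate. Here each sentence is reduced to the set of names tagged in it, and the 0.02 threshold is the paper's setting; everything else (function name, input shape) is our own illustration of the described procedure, which in the paper runs over name-tagger output on Gigaword.

```python
from collections import defaultdict

def cooccurrence_priors(sentences, anchor, threshold=0.02):
    """Estimate P(n' | anchor) from sentence-level name co-occurrence
    and keep names above the threshold (Section 4.3 sketch).

    sentences: list of sets, each the names tagged in one sentence.
    anchor:    the frequent name n_h to condition on.
    """
    anchor_count = 0
    co = defaultdict(int)
    for names in sentences:
        if anchor in names:
            anchor_count += 1
            for n in names:
                if n != anchor:
                    co[n] += 1
    if anchor_count == 0:
        return {}
    return {n: c / anchor_count for n, c in co.items()
            if c / anchor_count > threshold}
```

The paper additionally keeps only person names among the mined n′_h; a name-type filter would be applied on top of this.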

5 Knowledge Transfer for Name Tagging

In this section we present the first case study, on name tagging, using English as the HL and Hausa as the LL.

5.1 Name Projection

After expanding the English expected name set using entity priors, we carefully select, match and project each expected name n_h from English to its counterpart n_l in the Hausa documents. We scan every n-gram (n = 3, 2, 1, in that order) in the Hausa documents to see whether it matches an English name, based on the following multi-media, language-independent, low-cost heuristics.

2 For our experiment we choose those that appear more than 10 times.
3 https://catalog.ldc.upenn.edu/LDC2011T07
4 0.02 in our experiment.

Spelling: If n_h and n_l are identical (e.g., “Brazil”), have an edit distance of one after lower-casing and removing punctuation (e.g., n_h = “Mogadishu” and n_l = “Mugadishu”), or match as substrings (n_h = “Denis Samsonov” and n_l = “Samsonov”).

Pronunciation: We check the pronunciations of n_h and n_l using the Soundex (Odell, 1956), Metaphone (Philips, 1990) and NYSIIS (Taft, 1970) algorithms. We consider two codes to match if they are exactly the same or one code is part of the other. If at least two of the three coding systems match between n_h and n_l, we consider them equivalents.
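The spelling heuristic above can be sketched directly; this implements only the spelling rules (identity, edit distance one after normalization, substring match) and omits the three pronunciation coders, which would be layered on top via an off-the-shelf phonetic library:

```python
import re

def normalize(name):
    # Lower-case and strip punctuation, per the Spelling heuristic.
    return re.sub(r"[^\w\s]", "", name.lower())

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def spelling_match(n_h, n_l):
    """Spelling heuristic of Section 5.1: identical, edit distance of
    one after normalization, or substring match."""
    a, b = normalize(n_h), normalize(n_l)
    return a == b or edit_distance(a, b) <= 1 or a in b or b in a
```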

Visual Similarity: When two names refer to the same entity, they usually share certain visual patterns in their related images. For example, the textual clues above are not sufficient to find the Hausa equivalent “Majalisar Dinkin Duniya” for “United Nations”, because their pronunciations are quite different. However, Figure 5 shows that the images retrieved by “Majalisar Dinkin Duniya” and “United Nations” are very similar.5

We first retrieve the top 50 images for each mention using Google image search. Let I_h and I_l denote the two sets of images retrieved by an n_h and a candidate n_l (e.g., n_h = “United Nations” and n_l = “Majalisar Dinkin Duniya” in Figure 5), with i_h ∈ I_h and i_l ∈ I_l. We apply the Scale-Invariant Feature Transform (SIFT) detector (Lowe, 1999) to count the number of matched key points between two images, K(i_h, i_l), as well as the number of key points in each image, P(i_h) and P(i_l). A SIFT key point is a circular image region with an orientation, which provides a feature description of the object in the image. Key points are maxima/minima of the Difference of Gaussians after the image is convolved with Gaussian filters at different scales; they usually lie in high-contrast regions. We then define the similarity (0 ∼ 1) between two phrases as:

S(n_h, n_l) = max_{i_h ∈ I_h} max_{i_l ∈ I_l} K(i_h, i_l) / min(P(i_h), P(i_l))    (1)

Based on empirical results from a separate small development set, we decide that two phrases match if S(n_h, n_l) > 10%. This visual similarity computation method, though seemingly simple, has been one of the principal techniques for detecting near-duplicate visual content (Ke et al., 2004).

5 Although existing Machine Translation (MT) tools like Google Translate can correctly translate this example phrase from Hausa to English, here we use it as an example to illustrate the low-resource setting where direct MT is not available.

Figure 5: Matched SIFT key points for images retrieved by (a) Majalisar Dinkin Duniya and (b) United Nations.
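Given per-pair SIFT statistics, Equation 1 reduces to a maximum over ratios. The sketch below implements only the score; producing the (K, P(i_h), P(i_l)) triples would fall to an off-the-shelf SIFT detector and matcher (e.g., OpenCV), which we leave out here:

```python
def visual_similarity(pair_stats):
    """Compute S(n_h, n_l) of Equation 1.

    pair_stats: list of (K, P_h, P_l) triples, one per image pair, where
    K is the number of matched SIFT key points and P_h, P_l the key-point
    counts of the two images. Returns the max ratio over all pairs.
    """
    return max((k / min(p_h, p_l) for k, p_h, p_l in pair_stats),
               default=0.0)
```

Per the text, two phrases would then be declared a match when the returned score exceeds 0.10.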

5.2 Person Name Verification through Face Recognition

For each name candidate, we apply Google image search to retrieve the top 10 images (examples in Figure 6). If more than 5 images contain, and only contain, 1–2 faces, we classify the name as a person. We apply a face detection technique based on Haar features (Viola and Jones, 2001). This technique is a machine learning based approach in which a cascade function is trained from a large amount of positive and negative images. In the future we will try alternative methods using different feature sets such as Histograms of Oriented Gradients (Dalal and Triggs, 2005).

Figure 6: Face Recognition for Validating Person Name “Nawaz Shariff”.
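The decision rule from Section 5.2 can be written as a short predicate over per-image face counts. Only the rule is sketched here; the face counts themselves would come from a detector such as an OpenCV Haar cascade, which this sketch assumes has already been run.

```python
def is_person(face_counts):
    """Classify a name candidate as a person if more than 5 of its top
    10 retrieved images contain exactly 1-2 faces (Section 5.2 rule).

    face_counts: number of detected faces per retrieved image.
    """
    good = sum(1 for c in face_counts[:10] if 1 <= c <= 2)
    return good > 5
```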

6 Knowledge Transfer for Entity Linking

In this section we present the second case study, on Entity Linking, using English as the HL and Chinese as the LL. We choose this language pair because ground-truth Entity Linking annotations are available through the TAC-KBP program (Ji et al., 2015).

6.1 Baseline LL Entity Linking

We apply a state-of-the-art language-independent cross-lingual entity linking approach (Wang et al., 2015b) to link names from Chinese to an English KB. For each name n, this entity linker uses a cross-lingual surface form dictionary ⟨f, {e_1, e_2, ..., e_M}⟩, where E = {e_1, e_2, ..., e_M} is the set of entities with surface form f in the KB according to their properties (e.g., labels, names, aliases), to locate a list of candidate entities e ∈ E, and computes an importance score using an entropy based approach.

6.2 Representation and StructuredKnowledge Transfer

Then, for each expected English entity e_h, if there is a cross-lingual link connecting it to an LL (Chinese) entry e_l in the KB, we add the title of the LL entry, or of its redirected/renamed page, c_l as its LL translation. In this way we are able to collect a set of pairs ⟨c_l, e_h⟩, where c_l is an expected LL name and e_h is its corresponding English entity in the KB. For example, in Figure 3, we can collect pairs including “(瑞娃, Reeva Steenkamp)”, “(瑞娃·斯廷坎普, Reeva Steenkamp)”, “(茨瓦内, Pretoria)” and “(比勒陀利亚, Pretoria)”. For each mention in an LL document, we then check whether it matches any c_l; if so, we use e_h to override the baseline LL Entity Linking result. Table 2 shows some ⟨c_l, e_h⟩ pairs with frequencies. Our approach not only successfully retrieves translation variants of “Beijing” and “China Central TV”, but also aliases and abbreviations.

e_h               e_l              c_l                          Freq.
Beijing           北京市           北京 (Beijing)                553
                                   北京市 (Beijing City)         227
                                   燕京 (Yanjing)                 15
                                   京师 (Jingshi)                  3
                                   北平 (Beiping)                  2
                                   首都 (Capital)                  1
                                   蓟 (Ji)                         1
                                   燕都 (Yan capital)              1
China Central TV  中国中央电视台   央视 (Central TV)              19
                                   CCTV                           16
                                   中央电视台 (Central TV)        13
                                   中国央视 (China Central TV)     3

Table 2: Representation and Structured Knowledge Transfer for Expected English Entities “Beijing” and “China Central TV”.
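The override step itself is a dictionary lookup over the collected ⟨c_l, e_h⟩ pairs; a minimal sketch (function name and argument shapes are ours):

```python
def transfer_link(mention, baseline_link, transfer_pairs):
    """Override the baseline LL entity-linking result when the mention
    matches an expected LL surface form c_l collected via cross-lingual
    and redirect links (Section 6.2).

    transfer_pairs: dict mapping c_l -> e_h (KB entity).
    """
    return transfer_pairs.get(mention, baseline_link)
```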

7 Experiments

In this section we will evaluate our approach onname tagging and Cross-lingual Entity Linking.

7.1 Data

For name tagging, we randomly select 30 Hausa documents from the DARPA LORELEI program as our test set. It includes 63 person names (PER), 64 organizations (ORG), and 225 geo-political entities (GPE) and locations (LOC). For this test set, we retrieved 810 topically-related English documents in total. We found that 80% of the names in the ground truth appear at least once in the retrieved English documents, which shows the effectiveness of our image-anchored comparable data discovery method.

For comparison, we trained a supervised Hausa name tagger based on Conditional Random Fields (CRFs) on the remaining 337 labeled documents, using lexical features (character n-grams, adjacent tokens, capitalization, punctuation, numbers and frequency in the training data).

We learn entity priors by running the Stanford name tagger (Manning et al., 2014) on the English Gigaword V5.0 corpus.6 The corpus includes 4.16 billion tokens and 272 million names (8.28 million of which are unique).

For Cross-lingual Entity Linking, we use 30 Chinese documents from the TAC-KBP2015 Chinese-to-English Entity Linking track (Ji et al., 2015) as our test set. It includes 678 persons, 930 geo-political names, 437 organizations and 88 locations. The English KB is derived from BaseKB, a cleaned version of English Freebase. 89.7% of these mentions can be linked to the KB. Using the multi-media approach, we retrieved 235 topically-related English documents.

7.2 Name Tagging Performance

Table 3 shows name tagging performance. Our approach dramatically outperforms the supervised model. We conduct the Wilcoxon Matched-Pairs Signed-Ranks Test on ten folds. The results show that the improvement from visual evidence is significant at the 95% confidence level and the improvement from entity priors is significant at the 99% confidence level. Visual evidence greatly improves organization tagging because most organization names cannot be matched by spelling or pronunciation.

Face detection helps identify many person names missed by the supervised name tagger. For example, in the following sentence, “Nawaz Shariff” is mistakenly classified as a location by the supervised model due to the designator “kasar (country)” appearing in its left context. Since faces can be detected in all of the top 10 retrieved images (Figure 6), we fix its type to person.

• Hausa document: “Yansanda sun dauki wannan matakin ne kwana daya bayanda PM kasar Nawaz Shariff ya fidda sanarwar inda ya bukaci... (The Police took this step a day after the PM of the country Nawaz Shariff threw out the statement in which he demanded that...)”

6 https://catalog.ldc.upenn.edu/LDC2011T07

Face detection is also effective for resolving classification ambiguity. For example, the common person name “Haiyan” can also refer to the typhoon in Southeast Asia. Both our HL and LL name taggers mistakenly label “Haiyan” as a person in the following documents:

• Hausa document: “...a yayinda mahaukaciyar guguwar teku da aka lakawa suna Haiyan ta fada tsibiran Leyte da Samar. (...as the violent typhoon, which has been given the name Haiyan, has swept through the islands of Leyte and Samar.)”

• Retrieved English comparable document: “As Haiyan heads west toward Vietnam, the Red Cross is at the forefront of an international effort to provide food, water, shelter and other relief...”

In contrast, using the face detection results we successfully remove it, based on processing the retrieved images shown in Figure 7.

Figure 7: Top 9 Retrieved Images for ‘Haiyan’.

Entity priors successfully provide more detailed and richer background knowledge than the comparable English documents. For example, the main topic of one Hausa document is the former president of Nigeria, Olusegun Obasanjo, accusing the current president, Goodluck Jonathan, and a comment by Bawa Abdullah Wase, the former 1990s military administrator of Kano, is quoted. Bawa Abdullah Wase is not mentioned in any related English documents. However, based on entity priors we observe that “Bawa Abdullah Wase” appears frequently in the same contexts as “Nigeria” and “Kano”, and thus we successfully project it back to the Hausa sentence: “Haka ma Bawa Abdullahi Wase ya ce akawai abun dubawa a kalamun tsohon shugaban kasa kuma tsohon jamiin tsaro. (In the same vein, Bawa Abdullahi Wase said that there were things to take away from the former President’s words.)”. The impact of entity priors on person names is much more significant than on other categories, because multiple person entities often co-occur in certain events or related topics which might not be fully covered in the retrieved English documents. In contrast, most expected organizations and locations already exist in the retrieved English documents.

System                            Identification F-score            Classification  Overall
                                  PER    ORG    LOC7   ALL          Accuracy        F-score
Supervised                        36.52  38.64  42.38  40.25        76.84           30.93
Our Approach                      77.69  60.00  70.55  70.59        95.00           67.06
Our Approach w/o Visual Evidence  73.77  46.58  70.74  67.98        94.77           64.43
Our Approach w/o Entity Prior     64.91  60.00  70.55  67.59        94.71           64.02

Table 3: Name Tagging Performance (%).

For the same local topic, Hausa documents usually describe more details than English documents, and include more non-salient entities. For example, for the presidential election in Ivory Coast, a Hausa document mentions officials of the electoral body such as “Damana Picasse”: “Wani wakilin hukumar zaben daga jamiyyar shugaba Gbagbo, Damana Picasse, ya kekketa takardun sakamako na gaban yan jarida, ya kuma ce ba na halal ba ne. (An official of the electoral body from president Gbagbo’s party, Damana Picasse, tore up the result document in front of journalists, and said it is not legal.)”. In contrast, no English comparable documents mention their names. The entity prior method is able to extract many names which appear frequently together with the president’s name “Gbagbo”.

7.3 Entity Linking Performance

Table 4 presents the Cross-lingual Entity Linking performance. Our approach significantly outperforms both our baseline and the best reported results on the same test set (Ji et al., 2015). It is particularly effective for rare nicknames (e.g., “C罗” (C Luo) is used to refer to Cristiano Ronaldo) and ambiguous abbreviations (e.g., “邦联” (federal) can refer to the Confederate States of America, 邦联制 (Confederation) and many other entities), for which the contexts in LLs are insufficient for making correct linking decisions due to the lack of rich knowledge representation. Our approach produces worse linking results than the baseline in a few cases where the same abbreviation refers to multiple entities in the same document. For example, when “巴” is used to refer to both “巴西 (Brazil)” and “巴勒斯坦 (Palestine)” in the same document, our approach mistakenly links all mentions to the same entity.

Figure 8: An Example of Cross-lingual Cross-media Knowledge Graph.

7.4 Cross-lingual Cross-media Knowledge Graph

As an end product, our framework constructs cross-lingual cross-media knowledge graphs. An example for the Ebola scenario is presented in Figure 8, including entity nodes extracted from both Hausa (LL) and English (HL), anchored by images, and edges extracted from English.
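The structure of such a graph can be sketched as follows: entity nodes carry a language tag and a set of anchor-image IDs, relation edges come from the HL side, and cross-lingual links are induced when an LL node and an HL node share an anchor image. The class and field names below are illustrative assumptions, not our actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str        # entity mention, e.g. "Ebola"
    language: str    # "ha" (Hausa, LL) or "en" (English, HL)
    images: set = field(default_factory=set)  # anchor image IDs

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, relation, dst), HL-side

    def add_node(self, name, language, images):
        self.nodes[(name, language)] = Node(name, language, set(images))

    def cross_lingual_links(self):
        """Pair LL and HL nodes that share at least one anchor image."""
        pairs = []
        for (n1, l1), a in self.nodes.items():
            for (n2, l2), b in self.nodes.items():
                if l1 == "ha" and l2 == "en" and a.images & b.images:
                    pairs.append((n1, n2))
        return pairs
```

Anchoring by shared images is what lets the two monolingual subgraphs be merged without any parallel text.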

                              Overall                            Linkable Entities
Approach                      PER    ORG    GPE    LOC    ALL    PER    ORG    GPE    LOC    ALL
Baseline                      49.12  60.18  80.97  80.68  66.57  67.27  67.61  81.05  80.68  74.70
State-of-the-art              49.85  64.30  75.38  96.59  65.87  68.28  72.24  75.46  96.59  73.91
Our Approach                  52.36  67.05  93.33  93.18  74.92  71.72  75.32  93.43  93.18  84.06
Our Approach w/o KB Walker    50.44  67.05  84.41  90.91  70.32  69.09  75.32  84.50  90.91  78.91

Table 4: Cross-lingual Entity Linking Accuracy (%).

8 Related Work

Some previous cross-lingual projection methods focused on transferring data/annotation (e.g., (Pado and Lapata, 2009; Kim et al., 2010; Faruqui and Kumar, 2015)), shared feature representation/model (e.g., (McDonald et al., 2011; Kozhevnikov and Titov, 2013; Kozhevnikov and Titov, 2014)), or expectation (e.g., (Wang and Manning, 2014)). Most of them relied on a large amount of parallel data to derive word alignment and translations, which are inadequate for many LLs. In contrast, we do not require any parallel data or bilingual lexicon. We introduce new cross-media techniques for projecting HLs to LLs, by inferring projections using domain-rich, non-parallel data automatically discovered by image search and processing. Similar image-mediated approaches have been applied to other tasks such as cross-lingual document retrieval (Funaki and Nakayama, 2015) and bilingual lexicon induction (Bergsma and Van Durme, 2011). Besides visual similarity, their methods also relied on distributional similarity computed from a large amount of unlabeled data, which might not be available for some LLs.

Our name projection and validation approaches are similar to other previous work on bilingual lexicon induction from non-parallel corpora (e.g., (Fung and Yee, 1998; Rapp, 1999; Shao and Ng, 2004; Munteanu and Marcu, 2005; Sproat et al., 2006; Klementiev and Roth, 2006; Hassan et al., 2007; Udupa et al., 2009; Ji, 2009; Darwish, 2010; Noeman and Madkour, 2010; Bergsma and Van Durme, 2011; Radford et al.; Irvine and Callison-Burch, 2013; Irvine and Callison-Burch, 2015)) and name translation mining from multi-lingual resources such as Wikipedia (e.g., (Sorg and Cimiano, 2008; Adar et al., 2009; Nabende, 2010; Lin et al., 2011)). We introduce new multi-media evidence such as visual similarity and face recognition for name validation, and also exploit a large amount of monolingual HL data for mining entity priors to expand the expected entity set.

For Cross-lingual Entity Linking, some recent work (Finin et al., 2015) also found that cross-lingual coreference resolution can greatly reduce ambiguity. Some other methods also utilized global knowledge in the English KB to improve linking accuracy, via quantifying link types (Wang et al., 2015b), computing pointwise mutual information for the Wikipedia categories of consecutive pairs of entities (Sil et al., 2015), or using linking as feedback to improve name classification (Sil and Yates, 2013; Heinzerling et al., 2015; Besancon et al., 2015; Sil et al., 2015).

9 Conclusions and Future Work

We describe a novel multi-media approach to effectively transfer entity knowledge from high-resource languages to low-resource languages. In the future we will apply visual pattern recognition and concept detection techniques to perform deep content analysis of the retrieved images, so we can do matching and inference at the concept/entity level instead of relying on shallow visual similarity. We will also extend anchor image retrieval from the document level to the phrase or sentence level to obtain richer background information. Furthermore, we will exploit edge labels while walking through a knowledge base to retrieve more relevant entities. Our long-term goal is to extend this framework to other knowledge extraction and population tasks such as event extraction and slot filling, to construct multimedia knowledge bases effectively from multiple languages at low cost.

Acknowledgments

This work was supported by the U.S. DARPA LORELEI Program No. HR0011-15-C-0115, ARL/ARO MURI W911NF-10-1-0533, DARPA Multimedia Seedling grant, DARPA DEFT No. FA8750-13-2-0041 and FA8750-13-2-0045, and NSF CAREER No. IIS-1523198. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

Eytan Adar, Michael Skinner, and Daniel S Weld. 2009. Information arbitrage across multi-lingual Wikipedia. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 94–103.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the Association for Computational Linguistics 2013 Workshop on Linguistic Annotation and Interoperability with Discourse.

Utsab Barman, Amitava Das, Joachim Wagner, and Jennifer Foster. 2014. Code mixing: A challenge for language identification in the language of social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing Workshop on Computational Approaches to Code Switching.

Shane Bergsma and Benjamin Van Durme. 2011. Learning bilingual lexicons using the visual similarity of labeled web images. In Proceedings of the International Joint Conference on Artificial Intelligence.

Romaric Besancon, Hani Daher, Herve Le Borgne, Olivier Ferret, Anne-Laure Daquo, and Adrian Popescu. 2015. CEA LIST participation at TAC EDL English diagnostic task. In Proceedings of the Text Analysis Conference.

David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

William B Cavnar, John M Trenkle, et al. 1994. N-gram-based text categorization. In Proceedings of SDAIR 1994.

Navneet Dalal and Bill Triggs. 2005. Histograms of oriented gradients for human detection. In Proceedings of the Conference on Computer Vision and Pattern Recognition.

Kareem Darwish. 2010. Transliteration mining with phonetic conflation and iterative training. In Proceedings of the 2010 Named Entities Workshop.

Manaal Faruqui and Shankar Kumar. 2015. Multilingual open relation extraction using cross-lingual projection. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics.

Tim Finin, Dawn Lawrie, Paul McNamee, James Mayfield, Douglas Oard, Nanyun Peng, Ning Gao, Yiu-Chang Lin, Josh MacLin, and Tim Dowd. 2015. HLTCOE participation in TAC KBP 2015: Cold start and TEDL. In Proceedings of the Text Analysis Conference.

Ruka Funaki and Hideki Nakayama. 2015. Image-mediated learning for zero-shot cross-lingual document retrieval. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Pascale Fung and Lo Yuen Yee. 1998. An IR approach for translating new words from nonparallel and comparable texts. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 1.

Ahmed Hassan, Haytham Fahmy, and Hany Hassan. 2007. Improving named entity translation by exploiting comparable and parallel corpora. In AMML 2007.

Benjamin Heinzerling, Alex Judea, and Michael Strube. 2015. HITS at TAC KBP 2015: Entity discovery and linking, and event nugget detection. In Proceedings of the Text Analysis Conference.

Ann Irvine and Chris Callison-Burch. 2013. Combining bilingual and comparable corpora for low resource machine translation. In Proceedings of the Workshop on Statistical Machine Translation.

Ann Irvine and Chris Callison-Burch. 2015. Discriminative bilingual lexicon induction. Computational Linguistics, 1(1).

Heng Ji, Joel Nothman, Ben Hachey, and Radu Florian. 2015. Overview of TAC-KBP2015 tri-lingual entity discovery and linking. In Proceedings of the Text Analysis Conference.

Heng Ji. 2009. Mining name translations from comparable corpora by creating bilingual information networks. In Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora.

Yan Ke, Rahul Sukthankar, and Larry Huston. 2004. Efficient near-duplicate detection and sub-image retrieval. In Proceedings of the ACM International Conference on Multimedia.

Seokhwan Kim, Minwoo Jeong, Jonghoon Lee, and Gary Geunbae Lee. 2010. A cross-lingual annotation projection approach for relation detection. In Proceedings of the 23rd International Conference on Computational Linguistics.

Alexandre Klementiev and Dan Roth. 2006. Weakly supervised named entity transliteration and discovery from multilingual comparable corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.

Mikhail Kozhevnikov and Ivan Titov. 2013. Cross-lingual transfer of semantic role labeling models. In Proceedings of the Association for Computational Linguistics.

Mikhail Kozhevnikov and Ivan Titov. 2014. Cross-lingual model transfer using feature representation projection. In Proceedings of the Association for Computational Linguistics.

Qi Li, Heng Ji, Yu Hong, and Sujian Li. 2014. Constructing information networks using one single model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Wen-Pin Lin, Matthew Snover, and Heng Ji. 2011. Unsupervised language-independent name translation mining from Wikipedia infoboxes. In Proceedings of the First Workshop on Unsupervised Learning in NLP, pages 43–52.

David G Lowe. 1999. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision.

Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of the Association for Computational Linguistics.

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Olena Medelyan and Catherine Legg. 2008. Integrating Cyc and Wikipedia: Folksonomy meets rigorously defined common-sense. In Wikipedia and Artificial Intelligence: An Evolving Synergy, Papers from the 2008 AAAI Workshop.

Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Dragos Stefan Munteanu and Daniel Marcu. 2005. Improving machine translation performance by exploiting non-parallel corpora. Computational Linguistics, 31(4):477–504.

Peter Nabende. 2010. Mining transliterations from Wikipedia using pair HMMs. In Proceedings of the 2010 Named Entities Workshop, pages 76–80.

PR Newswire. 2011. Earned media evolved. White Paper.

Sara Noeman and Amgad Madkour. 2010. Language independent transliteration mining system using finite state automata framework. In Proceedings of the 2010 Named Entities Workshop, pages 57–61.

Margaret King Odell. 1956. The profit in records management. Systems (New York).

Sebastian Pado and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.

Xiaoman Pan, Taylor Cassidy, Ulf Hermjakob, Heng Ji, and Kevin Knight. 2015. Unsupervised entity linking with abstract meaning representation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Lawrence Philips. 1990. Hanging on the metaphone. Computer Language, 7(12).

Will Radford, Xavier Carreras, and James Henderson. Named entity recognition with document-specific KB tag gazetteers.

Reinhard Rapp. 1999. Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 519–526.

Li Shao and Hwee Tou Ng. 2004. Mining new word translations from comparable corpora. In Proceedings of the 20th International Conference on Computational Linguistics, page 618.

Avirup Sil and Alexander Yates. 2013. Re-ranking for joint named-entity recognition and linking. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management.

Avirup Sil, Georgiana Dinu, and Radu Florian. 2015. The IBM systems for trilingual entity discovery and linking at TAC 2015. In Proceedings of the Text Analysis Conference.

Philipp Sorg and Philipp Cimiano. 2008. Enriching the crosslingual link structure of Wikipedia: a classification-based approach. In Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence.

Richard Sproat, Tao Tao, and ChengXiang Zhai. 2006. Named entity transliteration with comparable corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.

Robert L Taft. 1970. Name Search Techniques. New York State Identification and Intelligence System, Albany, New York, US.

Raghavendra Udupa, K Saravanan, A Kumaran, and Jagadeesh Jagarlamudi. 2009. MINT: A method for effective and scalable mining of named entity transliterations from large comparable corpora. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 799–807.

Paul Viola and Michael Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the Conference on Computer Vision and Pattern Recognition.

Clare R Voss, Stephen Tratz, Jamal Laoudi, and Douglas M Briesch. 2014. Finding romanized Arabic dialect in code-mixed tweets. In Proceedings of the Language Resources and Evaluation Conference.

Mengqiu Wang and Christopher D Manning. 2014. Cross-lingual projected expectation regularization for weakly supervised learning. Transactions of the Association for Computational Linguistics.

Xuanhui Wang, ChengXiang Zhai, Xiao Hu, and Richard Sproat. 2007. Mining correlated bursty topic patterns from coordinated text streams. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Chuan Wang, Nianwen Xue, and Sameer Pradhan. 2015a. Boosting transition-based AMR parsing with refined actions and auxiliary analyzers. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.

Han Wang, Jin Guang Zheng, Xiaogang Ma, Peter Fox, and Heng Ji. 2015b. Language and domain independent entity linking with quantified collective validation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.