
Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 2251–2260, Marseille, 11–16 May 2020.
© European Language Resources Association (ELRA), licensed under CC-BY-NC.

Rad-SpatialNet: A Frame-based Resource for Fine-Grained Spatial Relations in Radiology Reports

Surabhi Datta¹, Morgan Ulinski², Jordan Godfrey-Stovall¹, Shekhar Khanpara³, Roy F. Riascos-Castaneda³, Kirk Roberts¹

¹ School of Biomedical Informatics, The University of Texas Health Science Center at Houston
² Department of Computer Science, Columbia University
³ McGovern Medical School, The University of Texas Health Science Center at Houston
{surabhi.datta,kirk.roberts}@uth.tmc.edu

Abstract

This paper proposes a representation framework for encoding spatial language in radiology based on frame semantics. The framework is adopted from the existing SpatialNet representation in the general domain with the aim to generate more accurate representations of spatial language used by radiologists. We describe Rad-SpatialNet in detail along with illustrating the importance of incorporating domain knowledge in understanding the varied linguistic expressions involved in different radiological spatial relations. This work also constructs a corpus of 400 radiology reports of three examination types (chest X-rays, brain MRIs, and babygrams) annotated with fine-grained contextual information according to this schema. Spatial trigger expressions and elements corresponding to a spatial frame are annotated. We apply BERT-based models (BERT-BASE and BERT-LARGE) to first extract the trigger terms (lexical units for a spatial frame) and then to identify the related frame elements. The results of BERT-LARGE are decent, with an F1 of 77.89 for spatial trigger extraction and an overall F1 of 81.61 and 66.25 across all frame elements using gold and predicted spatial triggers, respectively. This frame-based resource can be used to develop and evaluate more advanced natural language processing (NLP) methods for extracting fine-grained spatial information from radiology text in the future.

    Keywords: spatial relations, radiology, frame semantics

1. Introduction

Radiology reports are one of the most important sources of medical information about a patient and are thus one of the most-targeted data sources for natural language processing (NLP) in medicine (Pons et al., 2016). These reports describe a radiologist's interpretation of one or more two- or three-dimensional images (e.g., X-ray, computed tomography, magnetic resonance imaging, ultrasound, positron emission tomography). As a consequence, these reports are filled with spatial relationships between medical findings (e.g., tumor, pneumonia, inflammation) or devices (e.g., tube, stent, pacemaker) and some anatomical location (e.g., right ventricle, chest cavity, T4, femur). These relations describe complex three-dimensional interactions, requiring a combination of rich linguistic representation and medical knowledge in order to process.

Table 1 shows an example radiology report.¹ The example demonstrates the large number of medical findings and the relationships with various anatomical entities often found within radiology reports.

A flexible way of incorporating fine-grained linguistic representations with knowledge is through the use of frames for spatial relations (Petruck and Ellsworth, 2018). Notably, SpatialNet (Ulinski et al., 2019) extends FrameNet-style frames by enabling the connection of these frames to background knowledge about how entities may interact in spatial relationships.

In this paper, we propose an extension of SpatialNet for radiology, which we call Rad-SpatialNet.

¹ Unlike the data annotated in this paper, the example in Table 1 is publicly available without restriction from openi.nlm.nih.gov (image ID: CXR1000 IM-0003-1001).

Comparison: XXXX PA and lateral chest radiographs
Indication: XXXX-year-old male, XXXX.
Findings: There is XXXX increased opacity within the right upper lobe with possible mass and associated area of atelectasis or focal consolidation.
The cardiac silhouette is within normal limits.
XXXX opacity in the left midlung overlying the posterior left 5th rib may represent focal airspace disease.
No pleural effusion or pneumothorax.
No acute bone abnormality.
Impression: 1. Increased opacity in the right upper lobe with XXXX associated atelectasis may represent focal consolidation or mass lesion with atelectasis.
Recommend chest CT for further evaluation.
2. XXXX opacity overlying the left 5th rib may represent focal airspace disease.

Table 1: Example radiology report containing spatial language. Findings are in green, anatomical locations are in blue, while the spatial expressions are in cyan. Note: XXXX corresponds to phrases stripped by the automatic de-identifier.

Rad-SpatialNet is composed of 8 broad spatial frame types (as instantiated by the relation types), 9 spatial frame elements, and 14 entity types. We also describe an initial study in the annotation of the Rad-SpatialNet schema on a corpus of 400 radiology reports comprising 3 different imaging modalities from the MIMIC-III clinical note corpus (Johnson et al., 2016). The annotated dataset will soon be made publicly available. There are 1101 sentences in the corpus that contain at least one spatial relation. In total, 1372 spatial trigger expressions (lexical units for spatial frames) are identified, with the most frequently occurring spatial relation types being 'Containment', 'Description', and 'Directional'. Finally, we describe a baseline system for automatically extracting this information using BERT (Devlin et al., 2019), which is based on a deep bidirectional transformer architecture.

    2. Related Work

A range of existing work has focused on extracting isolated radiological entities (e.g., findings/locations) utilizing NLP on radiology reports (Hassanpour and Langlotz, 2016; Hassanpour et al., 2017; Cornegruta et al., 2016; Bustos et al., 2019; Annarumma et al., 2019). There are also recently published radiology image datasets labeled with important clinical entities extracted from the corresponding report text (Wang et al., 2017; Irvin et al., 2019). Most of these studies have been targeted toward attribute extraction from text without focusing on recognizing relations among these entities. Only a few studies have focused on relation extraction from radiology text (Sevenster et al., 2012; Yim et al., 2016; Steinkamp et al., 2019). However, the datasets were limited to specific report types (e.g., hepatocellular carcinoma) and the relations extracted do not capture spatial information.

Extracting spatial relations has been studied in the general domain for various purposes such as geographic information retrieval (Yuan, 2011), text-to-scene generation (Coyne et al., 2010; Coyne and Sproat, 2001), and human-robot interaction (Guadarrama et al., 2013). One of the schemas developed for representing spatial language in text is spatial role labeling (Kordjamshidi et al., 2010; Kordjamshidi et al., 2017). In the medical domain, some studies have extracted spatial relations from text, including biomedical and consumer health text (Kordjamshidi et al., 2015; Roberts et al., 2015). In radiology, two prior works have identified spatial relations between radiological findings and anatomical locations; however, the reports were specific to appendicitis and these works did not extract other common, clinically significant contextual information linked to the spatial relations (Roberts et al., 2012; Rink et al., 2013).

Frame semantics provides a useful way to represent information in text and has been utilized in constructing semantic frames to encode spatial relations (Petruck and Ellsworth, 2018). The Berkeley FrameNet project (Baker, 2014) contains 29 spatial frames with a total of 409 spatial relation lexical units. A recent work (Ulinski et al., 2019) proposed a framework known as SpatialNet to map spatial language expressions to actual spatial orientations by utilizing resources such as FrameNet (Baker, 2014) and VigNet (Coyne et al., 2011). In this paper, we aim to extend the SpatialNet framework to encode spatial language in radiology and utilize the framework to develop a dataset of reports annotated with detailed spatial information.

    Figure 1: Relationship between entities in Rad-SpatialNet

3. Rad-SpatialNet Description

Rad-SpatialNet provides a framework for representing fine-grained spatial information in radiology reports by converting the linguistic expressions denoting spatial relations into radiology-specific spatial meanings. For this, we extend the core design proposed in the general-domain SpatialNet (Ulinski et al., 2019), which is based on FrameNet and VigNet, and tailor the framework specifically to encode spatial language in the domain of radiology. We present an overview of radiological spatial relations and the main participating entities in Figure 1. SpatialNet describes spatial frames by linking surface language to lexical semantics and further mapping these frames to represent the real spatial configurations. We update SpatialNet in this work with the aim of disambiguating the various spatial expressions used by radiologists in documenting their interpretations of radiographic images.

To build Rad-SpatialNet, we first utilize the language in radiology reports to construct a set of spatial frames, leveraging linguistic rules or valence patterns. The fundamental principle in forming the radiology spatial frames is the same as that used for constructing frames in FrameNet/SpatialNet. However, the main difference is that the target words (lexical units) of the frames are the spatial trigger words that are more common in radiology, which are usually prepositions, verbs, and prepositional verbs. We then construct a list of spatial vignettes to transform these high-level spatial frames into more fine-grained frame versions by incorporating semantic, contextual, or relation type constraints as well as radiology domain knowledge. The fine-grained frames reveal the true meaning of the spatial expressions from a radiology perspective. We refer to these final frames containing the actual spatial configurations as spatio-graphic primitives. Unlike SpatialNet, which uses the VigNet ontology to map the different lexical items into semantic categories, we utilize the publicly available radiology lexicon, RadLex, to map the different radiological entities mentioned in the reports to standard terminologies recommended in radiology practice.

The main components involved in the proposed Rad-SpatialNet framework are described in the following subsections.

3.1. Ontology of Radiology Terms

We leverage RadLex (Langlotz, 2006) to map the various radiological entities, along with other clinically important information in the report text, to standard unified vocabularies, which facilitates standardized reporting and decision support in radiology practice as well as research. RadLex consists of a set of standardized radiology terms with their corresponding codes in a hierarchical structure. Rad-SpatialNet utilizes the RadLex ontology to map all possible contextual information with reference to any spatial relation in the reports to the broader RadLex classes, which capture the semantic types of the various pieces of information. For example, 'Endotracheal tube', which is a TUBE and a type of IMPLANTABLE DEVICE, is mapped to the broad RadLex class MEDICAL DEVICE. Similarly, 'Ground-glass opacity' belongs to the RadLex class OPACITY, which falls under the broad class IMAGING OBSERVATION. Thus, terms such as 'opacity', 'opacification', and 'Ground-glass opacity' are mapped to IMAGING OBSERVATION. We also link the various modifier or descriptor entities to standard RadLex descriptors. For instance, in 'rounded parenchymal opacity', 'rounded' is mapped to the MORPHOLOGIC DESCRIPTOR class of RadLex, which is one of the categories of RadLex descriptors. Unlike VigNet, RadLex does not contain any graphical relations representing spatial configurations, so we utilize RadLex mainly to map the terms in reports to radiology-specific semantic categories and not for creating spatio-graphic primitives. The mapped entities are utilized in the following steps to construct the spatial frames and subsequently the radiology-specific spatio-graphic primitives.
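As an illustration of this mapping step, a small lookup of the kind sketched below could stand in for the full RadLex hierarchy; the mini-lexicon and the helper function are illustrative only (mirroring the examples above) and are not part of RadLex or of the released resource.

```python
# Illustrative stand-in for the RadLex lookup described above; the real ontology is a
# large coded hierarchy, and this mini-lexicon only mirrors the examples in the text.
BROAD_CLASS = {
    "endotracheal tube":    "MEDICAL DEVICE",         # a TUBE, a type of IMPLANTABLE DEVICE
    "ground-glass opacity": "IMAGING OBSERVATION",    # an OPACITY
    "opacity":              "IMAGING OBSERVATION",
    "opacification":        "IMAGING OBSERVATION",
    "rounded":              "MORPHOLOGIC DESCRIPTOR", # a RadLex descriptor category
}

def semantic_type(term: str) -> str:
    """Return the broad RadLex-style class for a report term ('UNKNOWN' if unmapped)."""
    return BROAD_CLASS.get(term.lower().strip(), "UNKNOWN")

print(semantic_type("Ground-glass opacity"))  # IMAGING OBSERVATION
print(semantic_type("rounded"))               # MORPHOLOGIC DESCRIPTOR
```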

3.2. Spatial Frames

The spatial frames organize the information in a radiology report sentence containing any spatial relation between common radiological entities (e.g., imaging observations and anatomical structures) according to frame semantic principles, similar to FrameNet and SpatialNet. All the spatial frames created are inherited from the SPATIAL CONTACT frame in FrameNet. We adopt valence patterns similar to those defined in SpatialNet by specifying various lexical and syntactic constraints to automatically identify the frame elements from a sentence. However, there are differences in the set of frame elements in Rad-SpatialNet compared to SpatialNet. For example, besides FIGURE and GROUND, some of the other common elements in the Rad-SpatialNet frames are HEDGE, DIAGNOSIS, DISTANCE, and RELATIVE POSITION.

To do this, we identify the most frequent words or phrases expressing spatial relations in radiology. Such a word/phrase also forms the lexical unit for a spatial frame. The type or sense of the spatial trigger is also recognized in order to include the spatial relation type information in the frame. For example, in the sentence describing the exact position of a medical device, 'The umbilical venous catheter tip is now 1 cm above the right hemidiaphragm', 'above' is the spatial trigger having a directional sense. This instantiates a spatial frame with Directional as the relation type and above.prep as the lexical unit. Some other common spatial relation types in Rad-SpatialNet are Containment, triggered by lexical units such as in.prep, within.prep, and at.prep; Descriptive, triggered by lexical units such as shows.v, are.v, and with.prep; and Spread, triggered by lexical units such as extend (into).prep, throughout.prep, and involving.v. The spatial relation types are shown in Table 2 and the elements identified for the spatial frames are described in Table 3.

The semantic types of the FIGURE and GROUND elements are identified for each spatial frame constructed using the RadLex ontology. For the example above related to the positioning of the umbilical venous catheter tip, the semantic type of the FIGURE is MEDICAL DEVICE and the type of the GROUND is ANATOMICAL LOCATION. All of this information, combining the semantic types and the spatial relation type, is used to further refine the spatial frame FIGURE-DIRECTIONAL-GROUND-SF as MEDICAL DEVICE-DIRECTIONAL-ANATOMICAL LOCATION-SF.
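A minimal sketch of such a frame and its semantic-type refinement, assuming a simple Python representation (the class and field names here are ours, not part of the released resource), might look as follows:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SpatialFrame:
    """One spatial frame instantiated by a spatial trigger (the lexical unit)."""
    lexical_unit: str            # e.g. "above.prep"
    relation_type: str           # e.g. "Directional" (see Table 2)
    elements: Dict[str, str] = field(default_factory=dict)  # frame element -> text span

def refined_name(frame: SpatialFrame, figure_type: str, ground_type: str) -> str:
    """Refine the generic FIGURE-<RELATION>-GROUND-SF name with RadLex semantic types."""
    return f"{figure_type}-{frame.relation_type.upper()}-{ground_type}-SF"

# "The umbilical venous catheter tip is now 1 cm above the right hemidiaphragm"
frame = SpatialFrame(
    lexical_unit="above.prep",
    relation_type="Directional",
    elements={
        "FIGURE": "umbilical venous catheter tip",
        "GROUND": "right hemidiaphragm",
        "DISTANCE": "1 cm",
    },
)
print(refined_name(frame, "MEDICAL DEVICE", "ANATOMICAL LOCATION"))
# -> MEDICAL DEVICE-DIRECTIONAL-ANATOMICAL LOCATION-SF
```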

3.3. Spatial Vignettes

The main idea of spatial vignettes is also adopted from SpatialNet. In this paper, we develop vignettes primarily to resolve the ambiguities involved in using the same spatial trigger words or phrases to describe different radiological contexts. In other words, the same spatial expressions might have different spatial configurations based on the context or the associated radiological entities.

The vignettes connect the spatial frames to spatio-graphic primitives utilizing the RadLex ontology, semantic/relation type constraints, as well as domain knowledge to generate more accurate spatial representations of radiology language. Consider the following two sentences having the same spatial trigger 'extends into':

1. There is interval increase in the right pleural effusion which extends into the fissure.

2. There is an NG tube which extends into the stomach.

The first sentence contains a radiologist's description of a fluid disorder, 'pleural effusion', with respect to an anatomical reference, the fissure in the pleural cavity (a closed space around the lungs), whereas the second sentence describes the positioning of a feeding tube. It is difficult to interpret the actual spatial meaning of the same prepositional verb 'extends into' in these two different contexts solely from the lexical information. The spatial vignettes map the spatial frames corresponding to these sentences to different spatio-graphic primitives representing the actual spatial orientation of 'extends into', utilizing radiology domain knowledge.

Semantic constraints are applied to the FIGURE and GROUND frame elements, whereas a spatial relation type constraint is also added to the relation sense. If the semantic category of the FIGURE is MEDICAL DEVICE and the relation type is DIRECTIONAL, with the lexical unit extends (into).prep, then a spatial vignette will generate the spatio-graphic primitive MEDICAL DEVICE-TERMINATES INTO-ANATOMICAL LOCATION-SGP from the spatial frame MEDICAL DEVICE-DIRECTIONAL-ANATOMICAL LOCATION-SF (corresponding to example 2 above). Another vignette will produce the spatio-graphic primitive DISORDER-EXTENDS INTO-ANATOMICAL LOCATION-SGP if the semantic category of the FIGURE is DISORDER instead of MEDICAL DEVICE and particularly refers to fluid-related disorders like 'pleural effusion' (corresponding to example 1 above).


Containment: Denotes that a finding/observation/device is contained within an anatomical location ("There is again seen high T2 signal within the mastoid air cells bilaterally")
Directional: Denotes a directional sense in which a radiological entity is described wrt location ("An NGT has its tip below the diaphragm")
Contact: Denotes an entity is in contact with an anatomical structure ("NGT reaches the stomach")
Encirclement: Denotes a finding is surrounding an anatomical location or another finding ("Left temporal hemorrhage with surrounding edema is redemonstrated")
Spread: Denotes traversal of an entity toward an anatomical location ("An NG tube extends to the level of the diaphragms.")
Description: Denotes an anatomical location being described with any abnormality or observation ("There is also some opacification of the mastoid air cells.")
Distance: Denotes a qualitative distance between a radiographic finding and an anatomical location ("There are areas of T2 hyperintensity near the lateral ventricles.")
Adjacency: Denotes a radiographic finding is located adjacent to a location ("There is a small amount of hypodensity adjacent to the body of the right lateral ventricle.")

Table 2: Broader categories of spatial relations in radiology

Elements with respect to a spatial trigger:
FIGURE: The object whose location is described through the spatial trigger (usually a finding/location/disorder/device/anatomy/tip/port)
GROUND: The anatomical location of the trajector described (usually an anatomical structure)
HEDGE: Uncertainty expressions used by radiologists (e.g., 'could be related to', 'may concern for', etc.)
DIAGNOSIS: Clinical condition/disease associated with a finding/observation suggested as a differential diagnosis; usually appears after the hedge-related terms
REASON: Clinical condition/disease that acts as the source of the finding/observation/disorder
RELATIVE POSITION: Terms used for describing the orientation of a radiological entity wrt an anatomical location (e.g., 'posteriorly' in "Blunting of the costophrenic sulci posteriorly is still present", 'high' in "The UV line tip is high in the right atrium.")
DISTANCE: The actual distance of the finding or device from the anatomical location (e.g., '1 cm' in "ETT tube is 1 cm above the carina.")
POSITION STATUS: Any position-related information, usually in the context of a device (e.g., 'terminates' in "A right PIC catheter terminates in the mid SVC.")
ASSOCIATED PROCESS: Any process/activity associated with a spatial relation (e.g., 'intubation' in "may be related to recent intubation")

Elements with respect to a radiological entity:
STATUS: Indicating status of entities (e.g., 'stable', 'normal', 'mild')
MORPHOLOGIC: Indicating shape (e.g., 'rounded')
DENSITY: Terms referring to densities of findings/observations (e.g., 'hypodense', 'lucent')
MODALITY: Indicating modality characteristics (e.g., 'attenuation')
DISTRIBUTION: Indicating distribution patterns (e.g., 'scattered', 'diffuse')
TEMPORAL: Indicating any temporality (e.g., 'new', 'chronic')
COMPOSITION: Indicating composition of any radiological observation (e.g., 'calcified')
NEGATION: The negated phrase related to a finding/observation (e.g., 'without evidence of')
SIZE: The actual size of any finding/observation (e.g., '14-mm' describing the size of a lytic lesion)
LATERALITY: Indicating side (e.g., 'left', 'bilateral')
QUANTITY: Indicating the quantity of any radiological entity (e.g., 'multiple', 'few')

Table 3: Frame elements in Rad-SpatialNet

Figure 2: Examples of spatial vignettes for differentiating the spatial meanings of three commonly found spatial expressions in radiology reports.


The vignettes here determine the spatial meanings of radiology sentences based on both the semantic type of the FIGURE element and specific properties of the DISORDER type (for example, fluidity in this case). Similar ambiguities are observed for spatial expressions containing lexical units such as projects (over).prep and overlying.v, as they often occur in both device- and observation- or disorder-related contexts. Thus, the vignettes differentiate the real orientation of the spatial expression when it is used in the context of a medical device versus any other radiographic finding. Consider another set of examples below, where both sentences are related to the tip position of two medical devices:

1. The umbilical arterial catheter tip projects over the T8-9 interspace.

2. The tip of the right IJ central venous line projects over the upper right atrium.

Here, the spatial vignettes produce a more specific spatial representation of the prepositional verb 'projects over' based on the details of the anatomical location. A spatial vignette will produce the spatio-graphic primitive MEDICAL DEVICE TIP-IN FRONT OF-ANATOMICAL LOCATION-SGP for the first sentence. IN FRONT OF is derived because the anatomical structure corresponds to the 'T8-9 interspace' in the spine, which is often used as the level of reference to indicate the position of catheters and tubes, and these tubes and catheters lie in front of the spine. Another vignette will produce MEDICAL DEVICE TIP-TERMINATES AT LEVEL OF-ANATOMICAL LOCATION-SGP for the second sentence, as the IJ venous line lies within the internal jugular vein and might go up to the 'right atrium'.

Consider the following sentences containing 'overlying.v' as the lexical unit:

1. There is a less than 1 cm diameter rounded nodular opacity overlying the 7th posterior rib level.

2. The left IJ pulmonary artery catheter's tip is currently overlying the proximal SVC.

If the FIGURE is IMAGING OBSERVATION and the GROUND is particularly associated with anatomical locations such as ribs, a spatial vignette will produce the primitive IMAGING OBSERVATION-PROJECTS OVER-ANATOMICAL LOCATION-SGP for the first sentence. Ribs are used in the same way as the spine to describe the level of objects or pathology. However, for the second sentence, the spatio-graphic primitive will be MEDICAL DEVICE TIP-TERMINATES AT LEVEL OF-ANATOMICAL LOCATION-SGP, as the FIGURE is MEDICAL DEVICE and the GROUND (anatomical location) is 'SVC'.

The following examples contain the spatial trigger 'of' connecting two anatomical structures or parts of anatomical structures:

1. There is increased signal identified within the pons extending to the right side of the midline.

2. The UA catheter tip overlies the left pedicle of the T9 vertebral body.

In the above cases with respect to the spatial preposition 'of', the semantic types of both FIGURE and GROUND refer to ANATOMICAL LOCATIONs. However, there is a difference in the interpretation of the same trigger word 'of'.

Item                       First 50   Next 150   Last 200
Relation Types
  CONTAINMENT              0.73       0.81       0.81
  DIRECTIONAL              0.29       0.73       0.46
  CONTACT                  0.44       0.1        0.03
  ENCIRCLEMENT             0.67       0.57       0
  SPREAD                   0.24       0.55       0.45
  DESCRIPTION              0.42       0.54       0.58
  DISTANCE                 0          0.5        0.4
  ADJACENCY                0          0.25       0.44
Main entities
  SPATIAL TRIGGER          0.58       0.81       0.78
  ANATOMY                  0.48       0.68       0.76
  DEVICE                   0.35       0.82       0.83
  TIP                      0.38       0.98       0.97
  FINDING/OBSERVATION      0.28       0.43       0.38
  DESCRIPTORS              0.29       0.64       0.71
Frame Elements
  FIGURE                   0.33       0.58       0.62
  GROUND                   0.42       0.67       0.70
  DIAGNOSIS                0          0.51       0.54
  HEDGE                    0.19       0.48       0.45
  REASON                   0          0.38       0
  RELATIVE POSITION        0.07       0.48       0.58
  DISTANCE                 0.4        0.86       0.71
  POSITION STATUS          0          0.62       0.42
  ASSOCIATED PROCESS       0.57       0          0

Table 4: Annotator agreement (F1) for the first 50, next 150, and last 200 reports.

The vignette adds a constraint that if the spatial relation type is DESCRIPTIVE and both FIGURE and GROUND have the semantic type ANATOMICAL LOCATION, then the meaning of the preposition is determined based on the words of the FIGURE element. If the words are either 'side' or 'aspect', then 'of' refers to a subarea/side of the GROUND anatomical location. For other words, 'of' refers to a specific identifiable anatomical structure contained within the GROUND element, similar to the 'pedicle' present at each vertebra in Example 2 above. The spatial vignettes corresponding to the three spatial expressions described above (extends (into).prep, projects (over).prep, and of.prep) are illustrated in Figure 2.
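A simplified sketch of how such vignettes could be encoded as constraint-to-primitive rules is given below for the first two expressions (the of.prep case would add a further constraint on the words of the FIGURE element). The rule encoding is illustrative only, written from the descriptions above, and does not reproduce SpatialNet's actual declarative format.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each vignette pairs a constraint test over a frame with the spatio-graphic
# primitive it yields. Frame keys assumed here: lexical_unit, relation_type,
# figure_type, ground (surface text of the GROUND element).
Frame = Dict[str, str]

def is_level_reference(ground: str) -> bool:
    # Spine interspaces and ribs are used as level-of-reference anatomy.
    return "interspace" in ground or "rib" in ground

VIGNETTES: List[Tuple[Callable[[Frame], bool], str]] = [
    (lambda f: f["lexical_unit"] == "extends (into).prep"
               and f["figure_type"] == "MEDICAL DEVICE"
               and f["relation_type"] == "Directional",
     "MEDICAL DEVICE-TERMINATES INTO-ANATOMICAL LOCATION-SGP"),
    (lambda f: f["lexical_unit"] == "extends (into).prep"
               and f["figure_type"] == "DISORDER",          # fluid-related disorders
     "DISORDER-EXTENDS INTO-ANATOMICAL LOCATION-SGP"),
    (lambda f: f["lexical_unit"] in ("projects (over).prep", "overlying.v")
               and f["figure_type"] == "MEDICAL DEVICE TIP"
               and is_level_reference(f["ground"]),
     "MEDICAL DEVICE TIP-IN FRONT OF-ANATOMICAL LOCATION-SGP"),
    (lambda f: f["lexical_unit"] in ("projects (over).prep", "overlying.v")
               and f["figure_type"] == "MEDICAL DEVICE TIP",
     "MEDICAL DEVICE TIP-TERMINATES AT LEVEL OF-ANATOMICAL LOCATION-SGP"),
    (lambda f: f["lexical_unit"] == "overlying.v"
               and f["figure_type"] == "IMAGING OBSERVATION"
               and is_level_reference(f["ground"]),
     "IMAGING OBSERVATION-PROJECTS OVER-ANATOMICAL LOCATION-SGP"),
]

def apply_vignettes(frame: Frame) -> Optional[str]:
    """Return the first spatio-graphic primitive whose constraints match the frame."""
    for test, primitive in VIGNETTES:
        if test(frame):
            return primitive
    return None

# "There is an NG tube which extends into the stomach."
print(apply_vignettes({"lexical_unit": "extends (into).prep",
                       "relation_type": "Directional",
                       "figure_type": "MEDICAL DEVICE",
                       "ground": "stomach"}))
# -> MEDICAL DEVICE-TERMINATES INTO-ANATOMICAL LOCATION-SGP
```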

4. Annotation Process

We annotated a total of 400 radiology reports, comprising Chest X-ray reports (136), Brain MRI reports (127), and Babygram reports (137), from the MIMIC-III clinical database (Johnson et al., 2016). The language used in MIMIC reports is more complex, and the report and sentence lengths are longer, compared to other available datasets such as the Open-i chest X-ray reports (Demner-Fushman et al., 2016). We filtered the babygram-related reports following the 'babygram' definition, that is, an X-ray of the whole body of an infant (usually newborn and premature infants). Since babygram reports have frequent mentions of medical device positions and involve multiple body organs, we incorporate this modality mainly with the intention of building a corpus balanced between two major spatially-grounded radiological entities: imaging observations/clinical findings and medical devices. We pre-processed the reports to de-identify some identifiable attributes, including dates and names, and also removed clinically less important contents. All the sentences in the reports containing potential spatial relations were annotated by two human annotators. The annotation was conducted using Brat. The annotations were reconciled three times, following the completion of 50, 200, and 400 reports. Since identifying the spatial frame elements involves interpreting the spatial language from the contextual information in a sentence, the annotations of the first 50 reports (i.e., our calibration phase) differed considerably between the annotators.


    Figure 3: Examples of annotations.

Some of the major disagreements related to spatial relation sense were discussed, and the annotation guidelines were updated after the first two rounds of annotation. Sample annotations are provided in Figure 3.

To provide more insight into some of the complexities encountered in the annotation process, we highlight a few cases here. First, oftentimes two anatomical locations are present in a sentence, and in such scenarios, an intermediate anatomical location (the first location occurrence) is chained as a FIGURE element in the context of the broader anatomical location (the second location occurrence and the GROUND element). However, this chaining is not valid if the two locations are not connected. Consider the sentences below:

1. There are few small air fluid levels in [mastoid air cells]FirstLocation within the left [mastoid process]SecondLocation.

2. Area of increased signal adjacent to the left lateral [ventricle]FirstLocation at the level of [corona radiata]SecondLocation.

For the first sentence, 'mastoid air cells' is contained within 'mastoid process' and these anatomical locations are connected. Therefore, 'mastoid air cells' is annotated as the FIGURE element and 'mastoid process' as the GROUND in the context of the spatial frame formed by the lexical unit 'within.prep'. Note that 'mastoid air cells' is also the GROUND element associated with the FIGURE 'air fluid levels' through the spatial trigger 'in'. However, for the second sentence, 'ventricle' and 'corona radiata' are two separate anatomical references and are not connected. Hence, two separate spatial relations are formed with the radiographic observation 'signal': one between 'signal' and 'ventricle', described through 'adjacent to', and the other between 'signal' and 'corona radiata', described through 'at' and the RELATIVE POSITION 'level of' (both cases are written out as data at the end of this section). Second, some instances require correct interpretation of whether prepositions such as 'of' are SPATIAL TRIGGERs or are part of location descriptors. Note the examples below:

1. PICC line with its tip located at the junction of superior vena cava and left brachiocephalic vein.

2. Abnormal signal in posterior portion of spinal cord.

In the first sentence, 'junction of' is annotated as the RELATIVE POSITION describing the connection between 'superior vena cava' and 'brachiocephalic vein', whereas in the second sentence, 'of' is a SPATIAL TRIGGER connecting the FIGURE 'portion' and the GROUND 'spinal cord'. Third, although location descriptor words such as anterior, lateral, and superior are usually annotated as RELATIVE POSITION, in a few cases they are annotated as SPATIAL TRIGGER. For example, note the following two sentences:

1. There is an area of high signal intensity extending into the [anterior]RELATIVE POSITION mediastinum.

2. There is a 6 mm lymph node [anterior to]SPATIAL TRIGGER the carina.

In the first sentence, 'anterior' is used to describe 'mediastinum' and is annotated as a RELATIVE POSITION, whereas in the second sentence, 'anterior' contributes to the actual spatial sense, and hence 'anterior to' is annotated as a SPATIAL TRIGGER.
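To make the annotation structure concrete, the frames for the two location-chaining examples above can be written out as plain data. The field names and surface trigger forms below are illustrative and do not reflect the Brat configuration used for annotation.

```python
# "There are few small air fluid levels in mastoid air cells within the left mastoid process."
# 'mastoid air cells' is the GROUND of the first frame and the FIGURE of the second (chained locations).
frames_chained = [
    {"trigger": "in",     "relation_type": "Containment",
     "FIGURE": "air fluid levels",  "GROUND": "mastoid air cells"},
    {"trigger": "within", "relation_type": "Containment",
     "FIGURE": "mastoid air cells", "GROUND": "mastoid process"},
]

# "Area of increased signal adjacent to the left lateral ventricle at the level of corona radiata."
# The two locations are not connected, so 'signal' is the FIGURE of two separate frames.
frames_separate = [
    {"trigger": "adjacent to", "FIGURE": "signal", "GROUND": "ventricle"},
    {"trigger": "at",          "FIGURE": "signal", "GROUND": "corona radiata",
     "RELATIVE POSITION": "level of"},
]
print(len(frames_chained), len(frames_separate))  # 2 2
```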

5. Results

5.1. Annotation statistics

The inter-annotator agreement results are shown in Table 4. We calculate the overall F1 agreement for annotating the spatial relation types, the main entities, and the spatial frame elements. The agreement measures are particularly low (around 0.4) for FINDING/OBSERVATIONs, as there is often a higher chance of boundary mismatch in the process of separating the descriptor-related words from the main finding or observation term. There are very few instances of PROCESS entities and of cases where disorder terms act as REASONs in the corpus, which has resulted in agreement measures of zero. Ultimately, Rad-SpatialNet requires an extremely knowledge-intensive annotation process, so low agreement at this stage is not unreasonable. Future work will include additional quality checks to ensure the semantic correctness of the annotations.


Item                                          Freq
General
  Average sentence length                     17.6
  Spatial triggers                            1372
  Sentences with 1 spatial trigger            874
  Sentences with more than 1 spatial trigger  227
Entity Types
  MEDICAL DEVICE                              330
  TIP OF DEVICE                               142
  DEVICE PORT/LEAD                            11
  IMAGING OBSERVATION                         436
  CLINICAL FINDING                            367
  DISORDER                                    390
  ANATOMICAL LOCATION                         1492
  DESCRIPTOR                                  1548
  ASSERTION                                   326
  QUANTITY                                    67
  LOCATION DESCRIPTOR                         398
  POSITION INFO                               167
  PROCESS                                     19
Spatial Frames (SFs)
  CONTAINMENT                                 642
  DESCRIPTION                                 387
  DIRECTIONAL                                 168
  SPREAD                                      69
  CONTACT                                     54
  ADJACENCY                                   32
  ENCIRCLEMENT                                14
  DISTANCE                                    6
Most frequent Lexical Units - Containment SF
  'in.prep'                                   410
  'within.prep'                               134
  'at.prep'                                   86
Most frequent Lexical Units - Description SF
  'of.prep'                                   277
  'are.prep'                                  37
  'with.prep'                                 17
Most frequent Lexical Units - Directional SF
  'above.prep'                                43
  'projecting (over).prep'                    24
  'below.prep'                                18
Spatial Frames based on semantic types
  MEDICAL DEVICE-related                      194
  MEDICAL DEVICE TIP-related                  142
  IMAGING OBSERVATION-related                 436
  CLINICAL FINDING-related                    344
  DISORDER-related                            212
Spatial frame elements
  FIGURE                                      1491
  GROUND                                      1537
  HEDGE                                       249
  DIAGNOSIS                                   190
  REASON                                      33
  RELATIVE POSITION                           398
  DISTANCE                                    45
  POSITION STATUS                             167
  ASSOCIATED PROCESS                          21

Table 5: General corpus statistics.

ELEMENT         OBS   FNDG   DIS   DEVC   ATY
STATUS          231   121    77    1      22
QUANTITY        50    31     14    5      30
DISTRIBUTION    43    14     6     0      2
MORPHOLOGIC     37    20     6     0      6
SIZE DESC.      33    19     31    1      9
NEGATION        32    50     20    0      1
TEMPORAL        23    26     50    14     0
LATERALITY      20    3      22    68     499
SIZE            19    1      3     0      0
COMPOSITION     5     8      2     0      2
DENSITY         5     0      0     0      0
MODALITY        2     0      0     0      0

Table 6: Descriptive statistics of the radiological entities in the annotated corpus. (OBS - Observation, FNDG - Finding, DIS - Disorder, DEVC - Device, ATY - Anatomy)

Model               P (%)    R (%)    F1
BERT-BASE (MIMIC)   92.20    43.04    57.52
BERT-LARGE (MIMIC)  93.72    67.13    77.89

Table 7: 10-fold CV results for spatial trigger extraction. P - Precision, R - Recall.

5.2. Corpus statistics

358 (89.5%) of the reports contain spatial relations. The reconciled annotations contain a total of 1101 sentences with mentions of spatial triggers. There are 1372 spatial trigger terms in total (an average of 3.8 triggers per report). The frequencies of the entity types, the types of spatial relations, and the spatial frames are presented in Table 5. The predominant spatial trigger types instantiating spatial frames are 'Containment', 'Description', and 'Directional'. 81 of the 330 DEVICE entities are described using various descriptor terms; 327 of the 436 OBSERVATION entities, 240 of the 367 CLINICAL FINDINGs, 190 of the 390 DISORDERs, and 564 of the 1492 ANATOMICAL LOCATIONs contain descriptors. The distribution of the various types of descriptors (consistent with RadLex) across the main radiological entities is shown in Table 6. Note that the figures in Table 5 and Table 6 consider only those entities and descriptors that are involved in a spatial relation. There are 1328 frame instances that correspond to the main radiological entities as shown in Table 5. Among the remaining 44 spatial triggers, 11 are related to the Port/Lead of devices and 33 describe how a specific part of an anatomical structure is linked to the main part.

5.3. Baseline system performance

As an initial step toward extracting the spatial trigger terms (lexical units of spatial frames) and the associated spatial information (spatial frame elements) in report sentences, we utilize both the BERT-BASE and BERT-LARGE pre-trained language models as our baseline systems. We formulate both lexical unit identification and frame element extraction as sequence labeling tasks. We initialize the models with parameters obtained by pre-training BERT-BASE and BERT-LARGE on MIMIC-III clinical notes (Si et al., 2019) for 300K steps and fine-tune them on our constructed corpus. For fine-tuning, we set the maximum sequence length to 128, the learning rate to 2e-5, and the number of training epochs to 4, and use the cased version of the models.
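To make the setup concrete, a minimal sketch of this kind of token-classification fine-tuning, written with the Hugging Face transformers API rather than the code used in this work, might look as follows. The checkpoint name and the BIO label set are illustrative placeholders (the MIMIC-pretrained weights are not referenced here); the hyperparameters mirror those stated above.

```python
# Minimal sketch of the token-classification setup (not the authors' code). A generic
# cased BERT checkpoint stands in for the MIMIC-pretrained one, and a simple BIO scheme
# over spatial triggers stands in for the full label inventory.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

labels = ["O", "B-TRIGGER", "I-TRIGGER"]
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")   # placeholder checkpoint
model = BertForTokenClassification.from_pretrained("bert-base-cased",
                                                   num_labels=len(labels))

words = "There is a 6 mm lymph node anterior to the carina .".split()
word_tags = ["O"] * len(words)
word_tags[7], word_tags[8] = "B-TRIGGER", "I-TRIGGER"   # "anterior to"

enc = tokenizer(words, is_split_into_words=True,
                truncation=True, max_length=128, return_tensors="pt")

# Align word-level BIO tags to wordpieces; special tokens get -100 (ignored by the loss),
# and continuation wordpieces simply inherit their word's label for this sketch.
aligned = [-100 if wid is None else labels.index(word_tags[wid])
           for wid in enc.word_ids(batch_index=0)]
enc["labels"] = torch.tensor([aligned])

out = model(**enc)   # fine-tuning would minimize out.loss (lr 2e-5, 4 epochs, cased model)
print(out.loss.item(), out.logits.shape)
```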

10-fold cross-validation is performed to evaluate the models' performance with 80-10-10% training, validation, and test splits of the reports. First, the spatial triggers in a sentence are extracted. Second, we extract the common frame elements with respect to the trigger.
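For illustration, the gold label sequences for the two steps on one sentence might look as follows under a simple BIO encoding. The exact tag inventory and the way the trigger is supplied to the element extractor are not spelled out here, so this is only one plausible encoding, not the precise formulation used for the reported results.

```python
# Illustrative gold label sequences for the two sequence-labeling steps on one sentence.
tokens = ["The", "umbilical", "venous", "catheter", "tip", "is", "now",
          "1", "cm", "above", "the", "right", "hemidiaphragm"]

# Step 1: spatial trigger extraction.
trigger_tags = ["O", "O", "O", "O", "O", "O", "O",
                "O", "O", "B-TRIGGER", "O", "O", "O"]

# Step 2: frame element extraction with respect to the extracted trigger "above".
element_tags = ["O", "B-FIGURE", "I-FIGURE", "I-FIGURE", "I-FIGURE", "O", "O",
                "B-DISTANCE", "I-DISTANCE", "O", "O", "B-GROUND", "I-GROUND"]

assert len(tokens) == len(trigger_tags) == len(element_tags)
```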


                              Gold spatial triggers          Predicted spatial triggers
Main Frame Elements           P (%)    R (%)    F1           P (%)    R (%)    F1
FIGURE                        75.14    84.10    79.35        63.26    40.48    48.00
GROUND                        84.96    88.90    86.87        68.83    42.95    51.64
HEDGE                         64.31    77.46    69.78        56.91    31.84    38.69
DIAGNOSIS                     53.99    79.54    63.93        38.89    26.48    29.18
RELATIVE POSITION             81.27    75.56    78.12        67.01    39.91    48.81
DISTANCE                      86.64    87.50    84.87        77.17    73.67    74.29
POSITION STATUS               62.29    64.07    62.74        56.86    52.05    52.39
ASSOCIATED PROCESS, REASON    0.0      0.0      0.0          0.0      0.0      0.0
OVERALL                       76.53    82.69    79.48        63.98    40.73    48.55

Table 8: 10-fold CV results for extracting spatial frame elements using BERT-BASE (MIMIC). P - Precision, R - Recall.

                              Gold spatial triggers          Predicted spatial triggers
Main Frame Elements           P (%)    R (%)    F1           P (%)    R (%)    F1
FIGURE                        77.42    85.75    81.35        65.51    65.44    65.12
GROUND                        88.92    91.57    90.22        73.31    70.21    71.51
HEDGE                         67.20    77.94    71.59        60.43    57.26    57.82
DIAGNOSIS                     51.96    79.81    62.31        47.06    57.64    50.76
RELATIVE POSITION             81.31    78.42    79.57        66.02    67.76    66.33
DISTANCE                      86.83    87.50    87.00        86.50    90.24    88.05
POSITION STATUS               65.73    65.83    65.38        58.59    63.63    60.37
ASSOCIATED PROCESS, REASON    0.0      0.0      0.0          0.0      0.0      0.0
OVERALL                       78.83    84.64    81.61        66.63    66.28    66.25

Table 9: 10-fold CV results for extracting spatial frame elements using BERT-LARGE (MIMIC). P - Precision, R - Recall.

For element extraction, we evaluate the system performance using both the gold spatial triggers and the predicted triggers on the test set. The results are shown in Table 7, Table 8, and Table 9.

The results demonstrate that BERT-LARGE performs better than BERT-BASE for both spatial trigger prediction and frame element prediction. However, for spatial triggers, recall is low for both models (Table 7). For element extraction, BERT-BASE's overall F1 combining all the frame elements is 79.48 (using gold spatial triggers) and 48.55 (using predicted triggers), whereas for BERT-LARGE the gap between using gold (81.61) and predicted triggers (66.25) is much smaller (15.4 vs. 30.9 points). The F1 values are zero for REASON and ASSOCIATED PROCESS because of their few occurrences in the dataset.

6. Discussion

The Rad-SpatialNet framework attempts to capture all possible contextual information in radiology reports from a spatial perspective. The aim is to utilize linguistic information as well as domain knowledge to accurately represent spatial language in radiology. The proposed examples of spatial vignettes described above have been validated by a practicing radiologist (SK). We are currently in the process of developing the valence patterns and vignettes by taking input from radiologists.

The results of the baseline system illustrate that it is difficult to identify the spatial trigger expressions. This might be because of their wide variation in the reports and because many of them appear as multi-word expressions. We only use the developed corpus to extract the core frame elements in a spatial frame. The results indicate that there is ample scope for improving the predictions by developing more advanced methods in the future.

In this work, we only consider intra-sentence spatial relations. Oftentimes, we encounter inter-sentence relations and scenarios where the differential diagnoses are documented in the sentence following the spatial relation or even far apart in the 'Impression' section (around 12.75% of the reports in our corpus). Other complex information in context to a spatial trigger could be considered for later work. Consider the sentence 'The tip of the catheter has a mild rightward curve, suggesting that it may be directed into a portal vein.' Here, information about 'intermediate position change' might also be annotated as a frame element. Further, some phrases such as 'needs repositioning' can be differentiated from POSITION STATUS as POSITION RECOMMENDATION. Thus, further refinements to Rad-SpatialNet are still necessary.

7. Conclusion

This work aims to develop a resource for encoding fine-grained spatial information in radiology reports. For this, we extend an existing open-domain spatial representation framework, SpatialNet, to accurately represent radiology spatial language. We describe the components of Rad-SpatialNet. Construction of the linguistic rules and the spatial vignettes is currently under development. We annotate 400 radiology reports with important spatial information and apply a baseline model based on BERT to extract the spatial triggers and the related elements. The results demonstrate that the task of extracting fine-grained spatial information is challenging, and the corpus constructed in this work can serve as an initial resource for developing methods with improved performance in the future.

Acknowledgments

This work was supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB: R21EB029575), the U.S. National Library of Medicine (NLM: R00LM012104), the Patient-Centered Outcomes Research Institute (PCORI: ME-2018C1-10963), and the Cancer Prevention Research Institute of Texas (CPRIT: RP160015).


8. Bibliographical References

Annarumma, M., Withey, S. J., Bakewell, R. J., Pesce, E., Goh, V., and Montana, G. (2019). Automated Triaging of Adult Chest Radiographs with Deep Artificial Neural Networks. Radiology, 291(1):196–202.

Baker, C. F. (2014). FrameNet: A Knowledge Base for Natural Language Processing. In Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore, number 1968, pages 1–5.

Bustos, A., Pertusa, A., Salinas, J.-M., and de la Iglesia-Vayá, M. (2019). PadChest: A large chest x-ray image dataset with multi-label annotated reports. arXiv preprint arXiv:1901.07441.

Cornegruta, S., Bakewell, R., Withey, S., and Montana, G. (2016). Modelling Radiological Language with Bidirectional Long Short-Term Memory Networks. In Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis, pages 17–27.

Coyne, B. and Sproat, R. (2001). WordsEye: An automatic text-to-scene conversion system. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 487–496.

Coyne, B., Sproat, R., and Hirschberg, J. (2010). Spatial relations in text-to-scene conversion. In Computational Models of Spatial Language Interpretation, Workshop at Spatial Cognition, volume 620, pages 9–16.

Coyne, B., Bauer, D., and Rambow, O. (2011). VigNet: Grounding Language in Graphics using Frame Semantics. In Proceedings of the ACL 2011 Workshop on Relational Models of Semantics, pages 28–36.

Demner-Fushman, D., Kohli, M. D., Rosenman, M. B., Shooshan, S. E., Rodriguez, L., Antani, S., Thoma, G. R., and McDonald, C. J. (2016). Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310.

Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Guadarrama, S., Riano, L., Golland, D., Gohring, D., Jia, Y., Klein, D., Abbeel, P., and Darrell, T. (2013). Grounding spatial relations for human-robot interaction. In IEEE International Conference on Intelligent Robots and Systems, pages 1640–1647.

Hassanpour, S. and Langlotz, C. P. (2016). Information extraction from multi-institutional radiology reports. Artificial Intelligence in Medicine, 66:29–39.

Hassanpour, S., Bay, G., and Langlotz, C. P. (2017). Characterization of Change and Significance for Clinical Findings in Radiology Reports Through Natural Language Processing. Journal of Digital Imaging, 30(3):314–322.

Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., Seekins, J., Mong, D. A., Halabi, S. S., Sandberg, J. K., Jones, R., Larson, D. B., Langlotz, C. P., Patel, B. N., Lungren, M. P., and Ng, A. Y. (2019). CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. In AAAI Conference on Artificial Intelligence.

Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., and Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035.

Kordjamshidi, P., Otterlo, M. V., and Moens, M.-F. (2010). Spatial Role Labeling: Task Definition and Annotation Scheme. In Proceedings of the Language Resources and Evaluation Conference, pages 413–420.

Kordjamshidi, P., Roth, D., and Moens, M.-F. (2015). Structured learning for spatial information extraction from biomedical text: Bacteria biotopes. BMC Bioinformatics, 16(1):1–15.

Kordjamshidi, P., Rahgooy, T., and Manzoor, U. (2017). Spatial Language Understanding with Multimodal Graphs using Declarative Learning based Programming. In Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing, pages 33–43.

Langlotz, C. P. (2006). RadLex: a new method for indexing online educational materials. Radiographics, 26(6):1595–1597.

Petruck, M. R. and Ellsworth, M. (2018). Representing Spatial Relations in FrameNet. In Proceedings of the First International Workshop on Spatial Language Understanding, pages 41–45.

Pons, E., Braun, L. M., Hunink, M. M., and Kors, J. A. (2016). Natural Language Processing in Radiology: A Systematic Review. Radiology, 279(2).

Rink, B., Roberts, K., Harabagiu, S., Scheuermann, R. H., Toomay, S., Browning, T., Bosler, T., and Peshock, R. (2013). Extracting actionable findings of appendicitis from radiology reports using natural language processing. In AMIA Joint Summits on Translational Science Proceedings, volume 2013, page 221.

Roberts, K., Rink, B., Harabagiu, S. M., Scheuermann, R. H., Toomay, S., Browning, T., Bosler, T., and Peshock, R. (2012). A machine learning approach for identifying anatomical locations of actionable findings in radiology reports. In AMIA Annual Symposium Proceedings, volume 2012, pages 779–788.

Roberts, K., Rodriguez, L., Shooshan, S., and Demner-Fushman, D. (2015). Automatic Extraction and Post-coordination of Spatial Relations in Consumer Language. In AMIA Annual Symposium Proceedings, volume 2015, pages 1083–1092.

Sevenster, M., Van Ommering, R., and Qian, Y. (2012). Automatically correlating clinical findings and body locations in radiology reports using MedLEE. Journal of Digital Imaging, 25(2):240–249.

Si, Y., Wang, J., Xu, H., and Roberts, K. (2019). Enhancing clinical concept extraction with contextual embeddings. Journal of the American Medical Informatics Association, pages 1–8.

Steinkamp, J. M., Chambers, C., Lalevic, D., Zafar, H. M., and Cook, T. S. (2019). Toward Complete Structured Information Extraction from Radiology Reports Using Machine Learning. Journal of Digital Imaging, 32(4):554–564.

Ulinski, M., Coyne, B., and Hirschberg, J. (2019). SpatialNet: A Declarative Resource for Spatial Relations. In Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP), pages 61–70.

Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., and Summers, R. M. (2017). ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3462–3471.

Yim, W.-W., Denman, T., Kwan, S. W., and Yetisgen, M. (2016). Tumor information extraction in radiology reports for hepatocellular carcinoma patients. In AMIA Joint Summits on Translational Science Proceedings, volume 2016, pages 455–464.

Yuan, Y. (2011). Extracting spatial relations from document for geographic information retrieval. In Proceedings of the 2011 19th International Conference on Geoinformatics. IEEE.
