
Data Privacy in Biomedicine

Lecture 7: More Scrubbing

Bradley Malin, PhD

Professor of Biomedical Informatics, Biostatistics, & Computer Science

Vanderbilt University

February 5, 2020

© 2020 Bradley Malin. Data Privacy in Biomedicine: Lecture 7 – Scrubbing

Today’s Lecture

◼ Dictionaries and Rules

Concept Match [B2W]

MedLEE [B2W]

Lexicon [W2B]

◼ Machine Learning and Trained Systems

◼ Resynthesis


Concept Match (Berman ’03)

◼ Clinical Concept Dictionary-based approach

◼ If a word is in the dictionary, it remains in the document; otherwise it is removed

◼ Each retained concept is swapped for “synonym”

◼ Retains high frequency stop words

Berman JJ. Concept-match medical data scrubbing: how pathology text can be used in research. Archives of Pathology and Laboratory Medicine. 2003; 127(6): 680-686.

Matching Process (Berman ‘03)

1. Parse all input into sentences

2. Parse each sentence into words

3. Each stop word (high-frequency) is preserved in its original place

E.g., “the”, “a”, “of”

4. Map remaining words / phrases to a standard nomenclature (e.g., UMLS)

Larger terms subsume smaller substrings

5. Replace each mapped term with an alternate term mapping to the same concept code

E.g., “renal cell carcinoma” → C0007134 → “rcc” or “hypernephroma”

6. Non-mapped words are blocked out

Berman JJ. Concept-match medical data scrubbing: how pathology text can be used in research. Archives of

Pathology and Laboratory Medicine. 2003; 127(6): 680-686.
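The matching steps above can be sketched in a few lines of Python. This is a minimal illustration, not Berman's implementation: the stop-word list, the tiny phrase dictionary, the `X0000001` placeholder code, the maximum phrase length, and the `*` blocking symbol are all assumptions made for the example. Only the `C0007134` mapping comes from the slide.

```python
import re

# Hypothetical miniature resources; a real system would load UMLS.
STOP_WORDS = {"the", "a", "of", "with", "was", "in"}
PHRASE_TO_CODE = {
    "renal cell carcinoma": "C0007134",   # code taken from the slide
    "rcc": "C0007134",
    "hypernephroma": "C0007134",
    "kidney": "X0000001",                 # placeholder code, not a real CUI
}
CODE_TO_TERMS = {
    "C0007134": ["renal cell carcinoma", "rcc", "hypernephroma"],
    "X0000001": ["kidney"],
}
MAX_PHRASE_LEN = 3   # longest dictionary phrase, in words
BLOCK = "*"          # symbol used for non-mapped words


def alternate_term(code, original):
    """Return a different term for the same concept code, if one exists."""
    for term in CODE_TO_TERMS[code]:
        if term != original:
            return term
    return original


def scrub_sentence(sentence):
    """Apply steps 2 to 6 to a single sentence."""
    words = re.findall(r"[A-Za-z0-9']+", sentence.lower())
    out = []
    i = 0
    while i < len(words):
        if words[i] in STOP_WORDS:
            out.append(words[i])          # step 3: keep stop words in place
            i += 1
            continue
        # Step 4: try the longest phrase first so that larger terms
        # subsume smaller substrings.
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in PHRASE_TO_CODE:
                # Step 5: swap in an alternate term for the same code.
                out.append(alternate_term(PHRASE_TO_CODE[phrase], phrase))
                i += n
                break
        else:
            out.append(BLOCK)             # step 6: block non-mapped words
            i += 1
    return " ".join(out)


print(scrub_sentence(
    "John Smith was diagnosed with renal cell carcinoma of the kidney"))
```

Note that the patient name is blocked only because it is absent from the dictionary, which is also why misspelled clinical terms are dropped.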


UMLS

◼ Unified Medical Language System metathesaurus

◼ Very large / multi-purpose / multi-lingual vocabulary of

biomedical and health related concepts

◼ Over 100 different sources

International Classification of Diseases (ICD)

Current Procedural Terminology (CPT)

◼ Over 2 million medical terms

◼ Over 900,000 medical concepts

https://www.nlm.nih.gov/research/umls/


Concept Matching Sample (Berman ’03)

[Figure not recovered]


Match Limits (Berman ‘03)

◼ There are some limitations

Misspelled terms are automatically dropped

Synonym replacement may obscure semantic meaning

Does not handle ambiguous terms

Terms in dictionaries may be sensitive (e.g.,

“homicide”, “abuse”, ...)




Another Variation on Concepts(Morrison ‘09)

◼ MedLEE - Medical Language Extraction & Encoding System

http://www.medlingmap.org/taxonomy/term/80

◼ 100 Clinical follow-up notes (PHI annotated by human)

F. Morrison et al. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes. Journal of the American Medical Informatics Association. 2009; 16: 37-39.

PHI Type     Instances of PHI   Instances in Output   % Leaked
Age > 89                    7                     5      71.4%
Clinician                 157                     6       3.8%
Date                      300                     0         0%
Hospital                  100                     7         7%
Location                   45                     3       6.7%
Patient                   126                     4       3.2%
Telephone                  33                     1       3.0%
IDs                        41                     0         0%
Total                     809                    26       3.2%


Another Variation on Concepts(Morrison ‘09)

◼ Leaked location:

“st” (meant as “Street” or as “Saint” in a hospital name) was interpreted as part of an EKG

◼ Examples of leaked names

Colors: “Green” and “Brown”

Common English: “Rose”

Disease names: “Dias” vs. “Dias Disease”

F. Morrison et al. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes. Journal of the American Medical Informatics Association. 2009; 16: 37-39.



Semantic Lexicon (Ruch et al. ’00)

◼ Identifier discovery modeled as a knowledge extraction problem

◼ Exhaustively list patterns of names and numbers

“IDentity Marker” (IDM) [Sir, Mr., …] followed by terms

IDM [maybe MD]

◼ Think regular expressions

Expressions specified for dates, phone numbers (nnn-nnnn), …

◼ Replaces names with “X” and terms with “x”

Ruch P, Baud RH, Rassinoux AM, Bouillon P, Robert G. Medical document anonymization with a semantic lexicon. In Proceedings of the 2000 AMIA Annual Fall Symposium. 2000; 729-733.
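As a rough illustration of the "think regular expressions" idea, the sketch below masks a capitalized name that follows an identity marker and masks phone numbers in the nnn-nnnn form. The marker list, the capitalization heuristic, and the choice to mask letters with “X” and digits with “x” are assumptions for this example; they are not the patterns used by Ruch et al.

```python
import re

# Hypothetical identity markers (IDMs); the real lexicon is far larger.
IDM = r"(?:Sir|Mr|Mrs|Ms|Dr|Prof)\.?"

# An IDM followed by one or more capitalized terms (the candidate name).
NAME_AFTER_IDM = re.compile(rf"\b({IDM})((?:\s+[A-Z][a-z]+)+)")

# Phone numbers in the nnn-nnnn form.
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def mask_name(match):
    """Keep the marker, replace each letter of the name with 'X'."""
    marker, name = match.group(1), match.group(2)
    return marker + re.sub(r"[A-Za-z]", "X", name)


def scrub(text):
    """Mask names after IDMs, then mask the digits of phone numbers."""
    text = NAME_AFTER_IDM.sub(mask_name, text)
    return PHONE.sub(lambda m: re.sub(r"\d", "x", m.group(0)), text)


print(scrub("Dr. Stone called 555-1234 today"))
print(scrub("Doctors observed the patient"))
```

The second call leaves the text unchanged because “Doctors” is not followed by a capitalized term, which mirrors the “Doctors observed” versus “Doctors Smith and Johnson observed” distinction.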



◼ Performs word-sense disambiguation via a morpho-syntactic (MS) tagger, then rule-based word sense (WS) tagging

Rules ranked by “reliability”

◼ Added terms to the MEDTAG Lexicon* (based on UMLS; contains 5131 entries) for identifier detection

such as those focused on medical institutions, lists of drugs, and medical device names

◼ Detects whether a potential identifier term is followed by an actual identifier

e.g., “Doctors observed” as opposed to “Doctors Smith and Johnson observed”

*P. Ruch et al. MEDTAG: tag-like semantics for medical document indexing. Proc AMIA Symp. 1999.


Semantic Disambiguation (Ruch et al. ’00)

◼ Morpho-syntactic tagger (MS) makes the part of speech explicit

Rule (Tok level → MS level): v/cn {TOK:miss} ; np cn;pn

Given a word that is ambiguous between a v (verb) and a cn (common noun)

Example: “Little Miss Tuffet” vs. “I miss the diagnosis”

If the word “miss” is followed by a pn (proper noun), then tag it as cn; otherwise tag it as v

Ruch P, Baud RH, Rassinoux AM, Bouillon P, Robert G. Medical document anonymization with a semantic lexicon. In Proceedings of the 2000 AMIA Annual Fall Symposium. 2000; 729-733.
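A single disambiguation rule of this kind is easy to express as code. The sketch below is only an illustration: it assumes each token arrives with a set of candidate part-of-speech tags, and it assumes the fallback for “miss” is the verb reading (as in “I miss the diagnosis”). The tag names other than v, cn, and pn are made up for the example.

```python
def disambiguate(tokens, candidate_tags):
    """Resolve the v/cn ambiguity of "miss" using the following token.

    candidate_tags[i] is the set of possible tags for tokens[i].
    """
    resolved = []
    for i, (token, candidates) in enumerate(zip(tokens, candidate_tags)):
        if token.lower() == "miss" and candidates == {"v", "cn"}:
            # The rule: followed by a proper noun -> common noun (a title).
            next_is_pn = (i + 1 < len(tokens)
                          and candidate_tags[i + 1] == {"pn"})
            resolved.append("cn" if next_is_pn else "v")
        else:
            # Tokens outside the rule: pick a tag deterministically.
            resolved.append(sorted(candidates)[0])
    return resolved


print(disambiguate(["Little", "Miss", "Tuffet"],
                   [{"adj"}, {"v", "cn"}, {"pn"}]))
print(disambiguate(["I", "miss", "the", "diagnosis"],
                   [{"pro"}, {"v", "cn"}, {"det"}, {"cn"}]))
```

A full tagger would hold many such rules and apply them in order of their “reliability” ranking.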

Semantic Disambiguation (Ruch et al. ’00)

◼ Rule-based word sense (WS) tagging leverages the previous round of disambiguation to derive entity-specific predictions

Again, rules ranked by “reliability”

Rule (MS level → WS level): idm/pers rel {MS:sp} pers; rel

Given a word that is ambiguous between an idm and a pers

Example: “doctors said” vs. “Doctors Smith and Wesson”

If the word “doctors” is followed by a rel (relationship), and then by sp (“preposition”, according to MS), then tag it as pers




Extraction(Ruch et al ‘00)

◼ Extraction module processes the 3-level stream (token → MS level → WS level)

◼ Switches extraction mode on when it reads a token tagged as id at the WS level

◼ Switches extraction mode off when it hits a barrier (i.e., a token not tagged as id)

◼ Specialized rules handle multi-part last names (e.g., “van Winkle”)
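The on/off behaviour of the extraction module resembles a small state machine. The sketch below assumes the stream is a list of `(token, ms_tag, ws_tag)` triples and uses a hypothetical particle list as a stand-in for the specialized multi-part name rules; neither detail is specified on the slide.

```python
# Hypothetical particles used to keep multi-part last names together.
NAME_PARTICLES = {"van", "von", "de", "la"}


def extract_identifiers(stream):
    """Collect runs of tokens tagged `id` at the WS level.

    stream: list of (token, ms_tag, ws_tag) triples.
    """
    spans, current = [], []
    for i, (token, _ms_tag, ws_tag) in enumerate(stream):
        is_id = ws_tag == "id"
        # Special rule: a particle is kept if the next token is an id.
        if not is_id and token.lower() in NAME_PARTICLES:
            is_id = i + 1 < len(stream) and stream[i + 1][2] == "id"
        if is_id:
            current.append(token)              # extraction mode is on
        elif current:
            spans.append(" ".join(current))    # barrier: switch off
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


stream = [("Dr.", "np", "idm"), ("Rip", "pn", "id"), ("van", "sp", "x"),
          ("Winkle", "pn", "id"), ("slept", "v", "acto")]
print(extract_identifiers(stream))
```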



Evaluation(Ruch et al ‘00)

◼ 1000 medical documents from University Hospital Geneva

80,784 tokens

600 Post-operative reports

200 Laboratory and test results

200 discharge summaries

◼ Set A: 20% of documents for training

◼ Set B: 80% for testing

◼ If word is not observed in training – throw it out





Semantic Tagset (Ruch et al ‘00)

Rank   Tag     Frequency   Definition                 Example
1      qual    0.101       Qualifier                  fat
2      acto    0.095       General act                leave
3      loc     0.093       Organ / body location      liver
4      spat    0.087       Spatial concept            high
5      temp    0.053       Temporal concept           late
6      mod     0.051       Modal                      maybe
7      quant   0.047       Quantitative concept       five
8      papr    0.045       Pathological process       infection
9      find    0.042       Signs or symptoms          fever
10     cpt     0.041       Other concept              idea
…      …       …           …                          …
31     idm     0.006       Identity marker            Dr.
??     id      <<0.001     Identifier / proper noun   Louise


Semantic Tagset (Ruch et al. ’00)

[Figure: tag frequency (y-axis, 0 to 0.12) versus term rank (x-axis, 0 to 40), with the idm and id tags marked]




Evaluation (Ruch et al ‘00)

◼ Over 40 rules written based on Set A

Took ~3 weeks

Observed 124 identifiers in 16,456 tokens

◼ Six types of results

Result                                 Count   % of corpus
Identifiers in corpus                    467   100%
Identifiers correctly removed (ICR)      452   96.8%
ICR + additional terms                     8   1.7%
Identifiers incompletely removed           3   0.6%
Identifiers left in text                   4   0.9%
Non-identifier tokens removed              0   0%

Removed (ICR, with or without additional terms): 98.5%; incompletely removed or left in text: 1.5%




◼ Limitations

Requires exhaustive specification

Hand curation of rules

Claim of generalizability, but no proof






Trained Semantic Templates(TBK ‘02)

◼ Tagged references for name and local context

e.g., “Johnny” tagged as name

e.g., “underwent” tagged as context

e.g., type of surgery, etc.

◼ Made logical relation of predicate and ordered list of 1

argument

◼ Predicates defined by word order, not spacing

◼ Calculated frequency of relations in training set

Taira R, Bui A, Kangarloo H. Identification of patient name references within medical documents. In

Proceedings of the 2002 AMIA Annual Fall Symposium. 2002; 757-761.



Example of Predicates

Predicate              Relative Frequency   Example
Patient-healthStatus   0.189                John was doing well
Patient-age            0.181                John is 3 years old
Patient-condition      0.140                John developed a fever
Patient-procedure      0.109                John received therapy
Patient-gender         0.108                John is a 5 year-old male
Patient-anaphora       0.102                John is a patient with …
Patient-ADT            0.061                John was discharged
Patient-relative       0.035                John’s mother
Patient-ethnicity      0.028                John is an Asian male
Patient-heightWeight   0.022                John is a chubby male



◼ Algorithm

For each token

◼ If token not excluded***

Locate all possible logical relational constructs, relating to an identifier, that are associated with the token

For each construct

▪ Determine the probability that the token satisfies the construct

▪ If the probability > threshold, then predict identifier


Exclusions

◼ Drug name list (6,200 entries)

◼ Part of physician name based on tokens

e.g., “Dr.”, “M.D.”

◼ Followed by diagnostic qualifier

e.g., “Syndrome”, “Disease”, “Procedure”

◼ Part of department or institution

e.g., “Medical Center”

◼ Associated with article / determiner attachment


Classification Model(TBK ‘02)

◼ 2-class problem with a maximum entropy model and log-linear basis

◼ Constructs “learned” from training set

Example: John is a 5 year old male with disease X…

Construct: isofAge(John, 5 year old)

Construct: isofGender(John, male)

$$P(c \mid b) = \frac{1}{Z(b)} \exp\left( \sum_i \lambda_i \, f_i(b, c) \right)$$

c = predicted class (identifier or not)

b = vector of terms associated with construct

f_i = “indicator” functions associated with terms, such as word ordering

λ_i = weight associated with feature (from training)

Z = normalization constant (mass over all classes of predictions)
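To make the roles of the weights and the normalization constant concrete, here is a minimal sketch of a 2-class log-linear (maximum entropy) scorer followed by a threshold decision. The feature names and weight values are invented for illustration; the 0.55 threshold matches the operating point reported later in the lecture, and nothing here reproduces the trained model of Taira et al.

```python
import math


def maxent_probability(active_features, weights):
    """Return P(class | b) for each class under a log-linear model.

    active_features: {class_label: [names of indicator features that fire]}
    weights:         {feature_name: lambda_i}
    """
    scores = {
        label: math.exp(sum(weights.get(name, 0.0) for name in names))
        for label, names in active_features.items()
    }
    z = sum(scores.values())   # normalization constant Z
    return {label: score / z for label, score in scores.items()}


# Hypothetical weights, as if learned from a training set.
weights = {"isofAge": 1.2, "isofGender": 0.8, "followed_by_Disease": -2.0}

# "John is a 5 year old male": two constructs fire for the "name" class.
probs = maxent_probability(
    {"name": ["isofAge", "isofGender"], "not_name": []}, weights)

THRESHOLD = 0.55
is_identifier = probs["name"] > THRESHOLD
print(probs, is_identifier)
```

Raising the threshold trades recall for precision, which is why the operating point is usually read off an ROC curve.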



Alternative Models

◼ Any trained classifier over terms can be applied

Support vector machines

Naïve Bayes

Boosted Decision Trees

Conditional Random Fields

Recurrent Neural Networks (Deep Learning)

Uzuner, et al. A de-identifier for medical discharge summaries. Artif Intell Med. 2008;42: 13-35.

Wellner, et al. Rapidly retargetable approaches to de-identification in medical records. J Am Med Inform Assoc. 2007;14:564-73.


Conditional Random Fields (CRFs)

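◼ Define the conditional probability of a tag (i.e., label) sequence a = a_1, …, a_T given an observed sequence of tokens b = b_1, …, b_T as

$$P(a \mid b) = \frac{1}{Z(b)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(a_t, a_{t-1}, b, t) \right)$$

where each f_k is a feature function, λ_k is its weight, and Z(b) normalizes over all label sequences

Wellner B, et al. Rapidly retargetable approaches to de-identification in medical records. Journal of the American Medical Informatics Association. 2007; 14: 564-573.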



◼ Each feature function is basically a predicate over a particular configuration of the observation relative to the current position t, for a particular label pair a_t, a_{t-1}

◼ Feature weights indicate how strongly the predicate (over the observations) correlates with a particular label pair

Wellner B, et al. Rapidly retargetable approaches to de-identification in medical records. Journal of the American Medical Informatics Association. 2007; 14: 564-573.

CRF Example

◼ Feature for a contextual cue for “Dr.”

An indication we’re about to begin a DOCTOR phrase

$$f(a_t, a_{t-1}, b, t) = 1 \ \text{if } a_t = B_D \text{ and } a_{t-1} = O \text{ and } \mathrm{WORD}(b_{t-1}) = \text{“Dr”}$$

Token:   Copy   to   Dr.   Stone   ,   U.   BATESSE   HOSPITAL
Label:   O      O    O     BD      O   BH   IH        IH

O = outside of a phrase

BD = beginning of a “doctor” phrase

BH = beginning of a “hospital” phrase

IH = inside or end of a “hospital” phrase



◼ Feature weights (lambdas) come from maximizing the conditional log likelihood of the training data D

$$\text{likelihood}_{\Lambda}(D) = \sum_{a,b} \log P(a \mid b) + R_{\sigma}(\Lambda)$$

◼ Latter term is a penalty to prevent overfitting

◼ Maximization achieved through iterative gradient descent on the function

◼ Most likely label sequence from Viterbi (dynamic programming) algorithm

Wellner B, et al. Rapidly retargetable approaches to de-identification in medical records. Journal of the American Medical Informatics Association. 2007; 14: 564-573.
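Decoding the most likely label sequence can be sketched with a small Viterbi routine. The example below is an illustration under stated assumptions: it uses only two labels (O and BD), one hand-set weight for the “Dr.” cue, and one hand-set weight favouring O. A trained CRF would have many more features, and these weight values are invented.

```python
def viterbi(tokens, labels, score):
    """Return the highest-scoring label sequence for a non-empty token list.

    score(prev_label, label, tokens, t) is the weighted feature sum
    sum_k lambda_k * f_k(a_t, a_{t-1}, b, t) at position t.
    """
    # best[t][a] = (best total score of a path ending in a at t, that path)
    best = [{a: (score(None, a, tokens, 0), [a]) for a in labels}]
    for t in range(1, len(tokens)):
        column = {}
        for a in labels:
            prev = max(labels,
                       key=lambda p: best[t - 1][p][0] + score(p, a, tokens, t))
            total = best[t - 1][prev][0] + score(prev, a, tokens, t)
            column[a] = (total, best[t - 1][prev][1] + [a])
        best.append(column)
    return max(best[-1].values(), key=lambda entry: entry[0])[1]


def score(prev, label, tokens, t):
    """Hypothetical weighted features for the "Dr." example."""
    s = 0.0
    if label == "O":
        s += 0.5      # assumed weight of a feature that fires on O
    if label == "BD" and prev == "O" and t > 0 and tokens[t - 1] == "Dr.":
        s += 2.0      # assumed weight of the "Dr." contextual cue
    return s


print(viterbi(["Copy", "to", "Dr.", "Stone"], ["O", "BD"], score))
```

Because the normalization constant is the same for every label sequence of a given input, decoding only needs the unnormalized scores.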


Biasing the CRF

◼ Works on the “outside of phrase” scenario

$$f_O(a_t, a_{t-1}, b, t) = \begin{cases} 1 & \text{if } a_t = O \\ 0 & \text{otherwise} \end{cases}$$

◼ λ_O: weight for the corresponding feature

◼ Large negative values → label tokens as identifiers

◼ Large positive values → label tokens as non-identifiers

◼ Tune λ_O using a Gauss-Newton line search [see Machine Learning Course]

◼ Terminate when evaluation results (e.g., recall) differ by a small amount (e.g., 0.01%)

Minkov E, et al. NER Systems that suit user’s preferences: adjusting the recall-precision tradeoff for entity extraction. Proceedings of the Human Language Technology Conference of the NAACL. 2006: 93-96.
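The effect of the bias weight can be seen in a deliberately simplified setting. The sketch below scores each token independently (no label transitions, so it is not a full CRF) and simply tries two bias values rather than running a line search; the scores and gold labels are made up.

```python
def label_tokens(scores, lambda_o):
    """Label each token after adding the bias lambda_o to its O score.

    scores: list of (score_O, score_PHI) pairs, one per token.
    """
    return ["O" if s_o + lambda_o > s_phi else "PHI"
            for s_o, s_phi in scores]


def recall_precision(pred, gold):
    """Token-level recall and precision for the PHI class."""
    tp = sum(p == g == "PHI" for p, g in zip(pred, gold))
    fp = sum(p == "PHI" and g == "O" for p, g in zip(pred, gold))
    fn = sum(p == "O" and g == "PHI" for p, g in zip(pred, gold))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical per-token scores and gold labels.
scores = [(1.0, 0.2), (0.4, 0.6), (0.9, 0.7), (0.3, 0.2)]
gold = ["O", "PHI", "PHI", "O"]

for lambda_o in (0.0, -0.5):
    pred = label_tokens(scores, lambda_o)
    print(lambda_o, pred, recall_precision(pred, gold))
```

Making the bias more negative pushes borderline tokens toward the identifier class, raising recall at the cost of precision.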



Bigger Picture of the Process

L. Deleger, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013 Jan 1;20(1):84-94.



◼ Trained system on 1350 pediatric reports

Tagged references for name and local context

◼ e.g., “Johnny” tagged as name

◼ e.g., “underwent” tagged as context

◼ e.g., type of surgery

Taira R, Bui A, Kangarloo H. Identification of patient name references within medical documents. In

Proceedings of the 2002 AMIA Annual Fall Symposium. 2002; 757-761.



◼ Out of 1350 pediatric reports

36% of documents contained name

907 name instances found in non-header

information

◼ Tested with 900 records

Area under the ROC curve of 0.9735

Operating point of 0.55 threshold

◼ 99.2% Precision and 93.9% Recall

Is this good enough?



◼ False Positives

Valid name syntax, but semantically incorrect

◼ “Dear Mark, Robert was in our office today”

Identification of a patient’s relative rather than the patient

◼ “Johnny’s sister Mary is 7 years old”

Patient and physician have same name

Rare use of gender description not describing Patient name

◼ “Tanner 4 female”

Drug names that could not be ruled out

Medical conditions that could not be ruled out



◼ Limitations

False Negatives?

Logical relation not modeled

Grammatically difficult expressions

Only name references, not all HIPAA identifiers

May require retraining for each type of dataset

(pediatric versus cardiology)

May identify false semantic templates in training



(Back to) The AMIA “Bakeoff”

◼ 2006 Natural Language Processing Challenge at

the American Medical Informatics Association

Annual Symposium (AMIA)

◼ 889 records from Partners Healthcare (Boston)

669 for training, 220 for testing

◼ Classes for challenge:

Patients

Doctors

Hospitals

IDs

Dates

Locations

Phone #’s

Ages over 90

O. Uzuner, et al. J Am Med Inform Assoc. 2007 Sep-Oct; 14(5): 550–563



Performance Measure Levels

◼ Token–level scores: measure performance for each token

◼ Instance–level scores*: Extends the model to account for…

Type: PHI type in the instance (e.g., patient vs. doctor)

Content: terms included in the instance (e.g., “Dr. John Smith”)

Extent: beginning and ending of the instance

*Model used by NIST in their named entity recognition (NER) tasks


Instance-Level Performance

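◼ Instead of true and false, we have correct, incorrect, missing, and spurious

◼ Total number of correct entities

$$C = \sum_{e=1}^{\#\text{entities}} c_e, \quad \text{where } c_e = \begin{cases} 1 & \text{if type, content, \& extent are all correct} \\ 0 & \text{otherwise} \end{cases}$$

◼ Substitution Error

$$S = \sum_{e=1}^{\#\text{entities}} s_e, \quad \text{where } s_e = \begin{cases} 1 & \text{if one of type, content, \& extent is incorrect} \\ 0 & \text{otherwise} \end{cases}$$

◼ Insertion Error

$$I = \sum_{e=1}^{\#\text{entities}} i_e, \quad \text{where } i_e = \begin{cases} 1 & \text{if type, content, \& extent are all spurious} \\ 0 & \text{otherwise} \end{cases}$$

◼ Deletion Error

$$D = \sum_{e=1}^{\#\text{entities}} d_e, \quad \text{where } d_e = \begin{cases} 1 & \text{if type, content, \& extent are all missing} \\ 0 & \text{otherwise} \end{cases}$$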


Examples

◼ 12/07

Ground truth = 12 → date; 07 → non-PHI

Predict 12/07 is ID → Incorrect type

Predict 12/07 is date

◼ Incorrect type, and

◼ Substitution error

◼ Usually: all partial matches are considered substitution errors

◼ Mendelian Gene

Ground truth = non-PHI

Predict Mendelian Gene is Name

◼ Spurious type, content, and extent → Insertion error

◼ John Smith

Ground truth = name

Predict non-PHI for both

◼ Missing type, content, and extent → Deletion error


Instance Level Metrics

◼ Instance Level Precision (ILP)

C / (C + S + I)

◼ Instance Level Recall (ILR)

C / (C + S + D)

◼ F-measure = 2* ILP * ILR / (ILP + ILR)
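The three instance-level metrics follow directly from the four counts. The helper below is a small sketch; the function name and the example counts are made up for illustration.

```python
def instance_level_metrics(c, s, i, d):
    """Compute (ILP, ILR, F) from correct, substitution, insertion,
    and deletion counts."""
    ilp = c / (c + s + i) if c + s + i else 0.0   # precision
    ilr = c / (c + s + d) if c + s + d else 0.0   # recall
    f = 2 * ilp * ilr / (ilp + ilr) if ilp + ilr else 0.0
    return ilp, ilr, f


# Hypothetical counts: 90 correct, 5 substitutions, 5 insertions, 10 deletions.
ilp, ilr, f = instance_level_metrics(c=90, s=5, i=5, d=10)
print(f"ILP={ilp:.3f} ILR={ilr:.3f} F={f:.3f}")
```

Substitutions hurt both precision and recall, while insertions hurt only precision and deletions hurt only recall.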


Significance Testing

◼ Null hypothesis: the absolute difference in performance of two systems on your favorite criterion (recall, precision, F) is ~ 0.

◼ Randomization technique: Shuffle a system’s responses to “units” in

the test set N times (e.g., N = 9999).

◼ Create N pairs of pseudo-systems

◼ Count the number of times, n, when difference between the

performances of the pseudo-system pairs is greater than the

difference between the performance of the two actual systems

s = (n + 1) / (N + 1)

◼ If s > threshold, then the difference is explained by chance. Otherwise

difference is significant at the threshold level.

AMIA Test: Threshold set to 0.1; “unit” equals all tokens (or instances) in a record

N. Chinchor. The statistical significance of the MUC-4 Results. In Proceedings of the 4th Conference on

Message Understanding. 1992: 30-50.
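The randomization procedure can be sketched as follows. This is an illustration with several assumptions: each system is represented by one score per unit (e.g., per record), the metric is computed over the list of unit scores, shuffling means swapping the two systems' responses on each unit with probability 0.5, and ties are counted along with strictly greater differences (a common convention that goes slightly beyond the wording above).

```python
import random


def approximate_randomization(units_a, units_b, metric,
                              n_shuffles=9999, seed=0):
    """Return the significance level s = (n + 1) / (N + 1).

    units_a, units_b: per-unit responses of the two systems, aligned.
    metric: function mapping a list of unit responses to a number.
    """
    rng = random.Random(seed)
    observed = abs(metric(units_a) - metric(units_b))
    n = 0
    for _ in range(n_shuffles):
        pseudo_a, pseudo_b = [], []
        for a, b in zip(units_a, units_b):
            if rng.random() < 0.5:     # swap this unit between systems
                a, b = b, a
            pseudo_a.append(a)
            pseudo_b.append(b)
        if abs(metric(pseudo_a) - metric(pseudo_b)) >= observed:
            n += 1
    return (n + 1) / (n_shuffles + 1)


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-record recall values for two systems.
system_a = [1.0] * 20
system_b = [0.0] * 20
print(approximate_randomization(system_a, system_b, mean, n_shuffles=999))
```

With the AMIA threshold of 0.1, a returned value above 0.1 would mean the observed difference is attributed to chance.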


Competition Systems

[Figures: competition systems and their overall results, with the rule-based systems marked, shown as an instance-level comparison and a token-level comparison]


Some Observations

◼ Machine learning outperforms rule-based systems

◼ Performance is never optimal

E.g., phone numbers with strange formats

◼ Machine learning can overfit

E.g., training on Mr. Smith / Mrs. Jones, when test set is

John Smith / Jane Jones


Software: From Theory to Practice with Conditional Random Fields

HIDE (Gardner & Xiong 2009)

MIST (Aberdeen et al. 2010)

J. Gardner, L. Xiong. An integrated framework for de-identifying unstructured medical data. Data and Knowledge Engineering (DKE), 2009; 68(12).

J. Aberdeen, et al. The MITRE Identification Scrubber Toolkit: design, training, and assessment. Int J Med Inform. 2010;79(12):849-59.


MIST Installation & Training


CRFs (MIST) Beyond AMIA(Aberdeen et al. 2010)

            Discharge   Laboratory   Letter   Order   All
Train       200         400          200      400     1200
Test        50          100          50       100     300
Precision   0.946       0.905        0.931    0.993   0.943
Recall      0.986       0.966        0.956    0.999   0.978

Precision: 0.91 – 0.99; Recall: 0.95 – 0.99

◼ Vanderbilt’s EMR (No Name or Place Dictionaries invoked)

◼ Specialized version of DE-ID provides the “gold standard”

◼ De-identification model based on CRFs

◼ Four document classes: Discharge Summaries (DS), Letters, Labs, Orders


Findings With Diverse EMRs

◼ Another alternative: MCRF (Mallet Conditional Random Field) at Cincinnati Children’s Hospital

◼ Not all types of identifiers are found at the same rate

◼ ~3500 clinical notes over 22 note types; > 30,000 identifiers

Virtually indistinguishable from human de-identification

L. Deleger, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013 Jan 1;20(1):84-94.


Negligible Impact on Medication Extraction (Deleger et al. 2013)

◼ CRF Scrubbing @ Cincinnati

◼ ~3500 clinical notes over 22 note types

            Original Notes   Scrubbed Notes
Precision   96.3             96.3 – 96.5
Recall      89.3             88.9 – 89.5
F-measure   92.6             92.5 – 92.7


Can you Trust Synthetic Results?

◼ AMIA challenge used resynthesized data

◼ Do the results hold true on real data?

◼ What are the limits of resynthesis?


An Experimental Model(Yeniterzi et al, 2010)

◼ OO: de-identification model was trained and tested

with original medical records

replicates ideal training and evaluation

◼ RR: model was trained and tested with resynthesized

medical records

replicates AMIA evaluation

◼ OR: model was trained with original and tested on

resynthesized

replicates ideal training and evaluation

◼ RO: model was trained with resynthesized and tested

with original

replicates “off the shelf” application

R. Yeniterzi, et al. Effects of personal identifier resynthesis on clinical text de-identification. Journal of the American Medical Informatics Association. 2010; 17: 59-68.



Environment(Yeniterzi et al, 2010)

◼ Vanderbilt’s EMR

◼ Specialized version of DE-ID provides the “gold standard”

◼ De-identification model based on CRFs in MIST

◼ Resynthesis is improved model of AMIA (more realistic in replaced terms)

◼ Four document classes: Discharge Summaries (DS), Letters, Labs, Orders

◼ Fifth class (HYBRID) uses 50 documents from each class for training, and all test documents

Evaluation documents per record class:

Record Class   Train   Test
DS             200     50
LETTER         200     50
LAB            400     100
ORDER          400     100
HYBRID         200     300


A Systemic Analysis (Yeniterzi et al, 2010)

[Figure: workflow diagram. Components shown: Original Records; DE-ID + ReLinking; Annotated Original Records; Resynthesis Engine; Resynthesized Records; TRAIN Orig and TEST Orig; TRAIN Resynth and TEST Resynth; MITRE Model Builder (Carafe); MIST ORIG MODEL; MIST RESYNTH MODEL; Exp 1 through Exp 4 producing Score O-O, Score R-R, Score O-R, and Score R-O]


PHI Exposure = 1 − label-blind recall

Record Class   Recall   Precision   F-Measure   Accuracy   PHI Exposure

OO Experiment
DS             0.986    0.946       0.966       0.993      0.014
LAB            0.966    0.905       0.935       0.983      0.034
LETTER         0.956    0.931       0.944       0.986      0.040
ORDER          0.999    0.993       0.996       0.999      0.001
AGGREGATE      0.978    0.943       0.960       0.990      0.022
HYBRID         0.962    0.925       0.943       0.986      0.035

RR Experiment
DS             0.986    0.972       0.979       0.998      0.010
LAB            0.995    0.991       0.993       0.999      0.005
LETTER         0.965    0.962       0.963       0.996      0.032
ORDER          0.990    0.989       0.989       0.999      0.010
AGGREGATE      0.983    0.977       0.980       0.998      0.014
HYBRID         0.970    0.960       0.965       0.997      0.022

OR Experiment
DS             0.871    0.919       0.894       0.990      0.101
LAB            0.731    0.843       0.783       0.987      0.268
LETTER         0.832    0.910       0.869       0.987      0.155
ORDER          0.788    0.984       0.875       0.992      0.212
AGGREGATE      0.816    0.913       0.862       0.989      0.171
HYBRID         0.842    0.911       0.875       0.990      0.147

RO Experiment
DS             0.674    0.887       0.766       0.961      0.324
LAB            0.348    0.723       0.470       0.899      0.652
LETTER         0.769    0.852       0.808       0.955      0.224
ORDER          0.766    0.834       0.799       0.926      0.234
AGGREGATE      0.642    0.841       0.728       0.942      0.355
HYBRID         0.404    0.789       0.535       0.914      0.592


Readings for Next Lecture

◼ E. Ratliff. Writer Evan Ratliff tried to vanish: here's what happened. Wired. November 20, 2009.

◼ V. Blue. Strava's fitness heatmaps are a "potential catastrophe". Engadget. February 2, 2018.

Optional

◼ A. Acquisti and R. Gross. Predicting Social Security Numbers from public data. Proceedings of the National Academy of Sciences USA. 2009; 106(27): 10975-10980.

◼ V. Griffith and M. Jakobsson. Messin' with Texas: Deriving mother's maiden names using public records. Proceedings of the Applied Cryptography and Network Security Conference. 2005: 91-103.

◼ S. Munson, et al. Attitudes towards online availability of US public records. Proceedings of the 12th Annual International Digital Government Research Conference. 2011: 2-9.

◼ G. Friedland, et al. Sherlock Holmes' evil twin: on the impact of global inference for online privacy. Proceedings of the New Security Paradigms Workshop. 2011: 105-114.
