Medical Informatics

Oleg Pianykh, PhD

opiany@gmail.com

Part 1: Making sense of MI standards

Part 2: Introduction to medical images and image compression

Part 1: Making sense of MI standards

Anything wrong with this image?

What’s the point?

OK, we have MI standards, but:

- Vendors may not like them.
- Hospitals may not like them.
- Physicians may not like them.
- Standards are hard to learn and even harder to enforce.

So:

- Can one misuse the standards? Yes!
- Can we use standards productively? Yes!

Let’s consider a few real projects.

Project 1: Patient identification

Patient identification is essential in any clinical workflow.

Identifying people by their names is not a good idea. The names:

- are not unique;
- can be misspelled and mistyped (letters, commas, blanks between the name parts, swapping name parts), for example: Smith,John / Smith John / John SMITH…
- can change (due to marriage, legal issues, and so on);
- can be hard to transliterate when typing foreign names on a DICOM unit that does not support a specific foreign alphabet (consider entering Japanese names on an English-based CT scanner, for example);
- can violate patient privacy.

DICOM hierarchy

Patient: Patient ID (0010,0020)
    Study: Study Instance UID (0020,000D)
        Series: Series Instance UID (0020,000E)
            Image: Image SOP Instance UID (0008,0018)

Each level has a 1-n relationship with the level below it: one patient has many studies, one study has many series, and one series has many images.
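The 1-n hierarchy can be sketched as nested dictionaries. This is a minimal illustration, assuming the four identifiers have already been read from each image; the record keys and UID values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-image records; each key mirrors one identifier of the hierarchy.
images = [
    {"PatientID": "P001", "StudyUID": "1.2.1", "SeriesUID": "1.2.1.1", "SOPUID": "1.2.1.1.1"},
    {"PatientID": "P001", "StudyUID": "1.2.1", "SeriesUID": "1.2.1.1", "SOPUID": "1.2.1.1.2"},
    {"PatientID": "P001", "StudyUID": "1.2.1", "SeriesUID": "1.2.1.2", "SOPUID": "1.2.1.2.1"},
    {"PatientID": "P001", "StudyUID": "1.2.2", "SeriesUID": "1.2.2.1", "SOPUID": "1.2.2.1.1"},
]

def build_hierarchy(records):
    """Nest image UIDs as patient -> study -> series -> [images]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rec in records:
        tree[rec["PatientID"]][rec["StudyUID"]][rec["SeriesUID"]].append(rec["SOPUID"])
    return tree

hierarchy = build_hierarchy(images)
studies_of_p001 = len(hierarchy["P001"])
```

If the Patient ID is mistyped on even one image, that image lands under a different top-level key, which is why the manually entered ID matters so much.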

Do not use names; use Patient IDs!

Patient IDs

Patient IDs solve many patient name problems, but they still need some work:

- Universal ID rules are needed (the main task behind most global EPR initiatives!).
- Universal ID repositories are needed.
- Local IDs can be abused just like patient names.
- Data consolidation tasks, such as merging/splitting patient records, often require human intervention (they cannot be solved automatically, “on the fly”).

Typical Patient ID misuses

Patient ID is the only ID in the DICOM hierarchy that is entered manually.

Consequently, Patient ID is the most misused item in healthcare workflow. Get it right!

When proper item identification is not present, integrated digital workflow is impossible.

Samples from real hospitals (confidential data anonymized)

Project 2: Patient tracking

Outpatient center X wants to optimize/reduce patient wait times.

Solutions:

- Provide more physicians/staff for faster patient processing.
- Serve free coffee in the waiting rooms.
- Ask patients about their waiting experience.

Can HL7 help?

Tracking patient in HIS

[Figure: patient path to the front desk, to the exam room, and from the exam room, tracked with HL7 ADT messages, with the wait time marked.]

The HIS database stores all important HL7 updates of patient status. Use them!
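Once the status updates are extracted from the HIS, the wait time is a simple difference of timestamps. The sketch below assumes the updates have already been reduced to (patient, event, time) rows; the event labels are illustrative names, not HL7 trigger-event codes, and the data is hypothetical.

```python
from datetime import datetime

# Hypothetical, simplified status events derived from HL7 ADT messages.
events = [
    ("P001", "front_desk", "2020-03-02 09:00"),
    ("P001", "exam_room_in", "2020-03-02 09:25"),
    ("P001", "exam_room_out", "2020-03-02 09:55"),
    ("P002", "front_desk", "2020-03-02 09:10"),
    ("P002", "exam_room_in", "2020-03-02 09:50"),
]

def wait_minutes(rows):
    """Return {patient_id: minutes between front-desk arrival and exam-room entry}."""
    times = {}
    for pid, event, stamp in rows:
        times.setdefault(pid, {})[event] = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    waits = {}
    for pid, t in times.items():
        # Skip patients with an incomplete event trail.
        if "front_desk" in t and "exam_room_in" in t:
            waits[pid] = (t["exam_room_in"] - t["front_desk"]).total_seconds() / 60
    return waits

waits = wait_minutes(events)
```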

Results:

We can study patient arrival rate (at each minute during a week):

Results:

We can study the size of patient waiting line at each minute during a week:
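The waiting-line size can be derived from the same timestamps. This is a minimal sketch, assuming each patient is reduced to a hypothetical (arrival minute, called minute) pair counted from clinic opening.

```python
def queue_sizes(intervals, start, end):
    """Count waiting patients at each minute in [start, end).

    intervals: list of (arrival_minute, called_minute) pairs. A patient
    counts as waiting from arrival up to, but not including, the minute
    they are called.
    """
    sizes = [0] * (end - start)
    for arrival, called in intervals:
        for minute in range(max(arrival, start), min(called, end)):
            sizes[minute - start] += 1
    return sizes

# Hypothetical data: minutes since the clinic opened.
waiting = [(0, 3), (1, 4), (2, 3)]
sizes = queue_sizes(waiting, 0, 5)
```

Counting only the arrival minutes in the same way gives the arrival rate.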

Typical HIS data failures

Manual entry supplies most of HIS data!

Result: incorrect or strangely-biased numerical results

Example: the histogram of MRI exam lengths (blue curve) shows unusual spikes at multiples of 5 minutes, because the times were entered with 5-minute accuracy. The black curve shows the distribution with this noise filtered out.

Q: Can you explain this?

Q: Can you suggest other ways to identify incorrect input?
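One simple check for rounded manual entry is the share of values that are multiples of 5. The sketch below uses hypothetical durations; the 0.4 threshold is an illustrative assumption, not a validated cutoff.

```python
def multiple_of_five_share(durations_min):
    """Fraction of exam durations (whole minutes) that are multiples of 5."""
    if not durations_min:
        return 0.0
    hits = sum(1 for d in durations_min if d % 5 == 0)
    return hits / len(durations_min)

# Hypothetical durations. If minutes were recorded exactly, roughly 1 in 5
# values (20%) would be a multiple of 5; a much larger share suggests rounding.
durations = [30, 45, 32, 40, 25, 60, 35, 41, 30, 50]
share = multiple_of_five_share(durations)
suspicious = share > 0.4  # illustrative threshold only
```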

Project 3: Equipment utilization

Hospital Z wants to increase its revenue by shortening patient scanning time (or maybe just tracking this time reliably). So they want to study their scanning times…

Options:

- Assume that each scan should take the same amount of time (ha-ha!).
- Ask hospital staff to record the beginning and the end times for each scan (really?).

Can standards help?

Any observation distorts the system being observed!

Solution: remember the (0008,0032) tag?

[Figure: AIF contrast change from baseline (HU) plotted against time t_i = i sec.]

The plot was built using a single DICOM tag, Acquisition Time (0008,0032), stored in the format HHMMSS.FFFFFF.

Computing scan length

Follow the DICOM hierarchy (Patient, Study, Series, Image):

We know t_n, the acquisition time of each image.

We find the series time r_k = max(t_n) - min(t_n) over all images in the series.

We find the study time s_m = max(t_n) - min(t_n) over all images in all series of the study, that is, from the start of the earliest series to the end of the latest one.
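The scan-length computation can be sketched as follows. It assumes each image is reduced to a hypothetical (study UID, series UID, acquisition time) tuple, that the time string contains at least HHMMSS, and that a scan does not cross midnight. The study duration is taken here as the span over all images in the study.

```python
def tm_to_seconds(tm):
    """Convert a DICOM time string HHMMSS.FFFFFF to seconds since midnight."""
    hours, minutes = int(tm[0:2]), int(tm[2:4])
    seconds = float(tm[4:])  # "SS" or "SS.FFFFFF"
    return hours * 3600 + minutes * 60 + seconds

def scan_durations(images):
    """Return (series_durations, study_durations) in seconds."""
    series_times, study_times = {}, {}
    for study, series, tm in images:
        t = tm_to_seconds(tm)
        series_times.setdefault((study, series), []).append(t)
        study_times.setdefault(study, []).append(t)

    def span(ts):
        return max(ts) - min(ts)

    return ({k: span(v) for k, v in series_times.items()},
            {k: span(v) for k, v in study_times.items()})

# Hypothetical study with two series.
imgs = [
    ("S1", "A", "101500.000000"),
    ("S1", "A", "101530.500000"),
    ("S1", "B", "102000.000000"),
    ("S1", "B", "102100.000000"),
]
series_len, study_len = scan_durations(imgs)
```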

Result: Problem areas in utilization

Result: exam length analysis

Project 4: Tracking patient radiation dose

Certain types of medical studies/imaging are ionizing: they expose patients to radiation.

Radiation is cumulative and contributes to the risk of cancer.

We need to track patient radiation dose to avoid cumulative overexposure. How can we do this?

Dose can be computed from DICOM tags

DICOM records exposure time, X-ray tube current, and slice thickness for each image.

The product D = Time × Current × (Scan Length) is then proportional to the radiation dose.
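A rough sketch of the D index follows. It assumes the scan length can be approximated by summing slice thickness over the images, and that the three values per image have already been read from the DICOM tags; the units and sample numbers are hypothetical, and the result is only a relative index, not a dose in physical units.

```python
def dose_index(images):
    """Sum exposure_time * tube_current * slice_thickness over the images.

    images: iterable of (exposure_time_ms, tube_current_mA, slice_thickness_mm).
    Summing per-slice products approximates Time x Current x (Scan Length).
    """
    return sum(t * i * s for t, i, s in images)

# Hypothetical series: 3 slices, 500 ms exposure, 200 mA, 5 mm thick.
series = [(500, 200, 5.0)] * 3
index = dose_index(series)
```

Such an index is useful for comparing exams against each other, for example to rank protocols by relative exposure.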

Moreover…

Using other DICOM tags, we can develop a complete dose control and analysis framework:

- Identify high-dose equipment and protocols (exam types).
- Identify referring physicians (tag (0008,0090)) who order the most irradiating exams.
- Identify exam-performing staff members (“Operator’s name” in (0008,1070)) who scan with the highest doses.
- Study the effects of dose related to patient’s age, body part, weight, diagnosis, etc.

Result: Tracking patient dose

Result: Analyzing dose data

Project 5: Tracking tumor growth

It is important to track tumor sizes in a timely fashion (how they change, their response to medication, etc.).

Radiologists label tumors on images using DICOM-compatible annotations.

Result: Automating tumor tracking

- Extract annotation coordinates from DICOM tags.
- Use these coordinates to estimate tumor size, for instance Size = (L1 + L2)/2 or Size = L1 × L2, where L1 and L2 are the two annotated lengths.
- Automatically store new sizes in the patient database (DICOM Patient ID).
- Even better, run CAD (computer-aided diagnostics) software on the extracted image regions.
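The two size estimates can be sketched from annotation endpoints. This assumes each annotated length is given by two (x, y) endpoints that are already converted to millimeters; the coordinates below are hypothetical.

```python
import math

def tumor_sizes(axis1, axis2):
    """axis1, axis2: ((x1, y1), (x2, y2)) endpoints of the two annotated lines."""
    l1 = math.dist(axis1[0], axis1[1])  # Euclidean length L1
    l2 = math.dist(axis2[0], axis2[1])  # Euclidean length L2
    return {"L1": l1, "L2": l2, "mean": (l1 + l2) / 2, "product": l1 * l2}

# Hypothetical annotation endpoints in millimeters.
sizes = tumor_sizes(((0, 0), (30, 40)), ((10, 0), (10, 20)))
```

If the annotations are stored in pixel coordinates, each length would first need scaling by the Pixel Spacing attribute.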

Result: Tumor size-driven decision support

Decision support software suggests the best option based on the HIS/PACS data (such as tumor size).

Project 6: Surgical planning, implants

Complex surgeries require sophisticated planning and visualization.

DICOM 3D software and 3D printers can help.

Source: http://www.radiologycases.com/index.php/radiologycases/article/view/889/pdf

3D reconstruction from DICOM images

Plaster model from 3D printer

Project 7: Vendor-Neutral Archiving

Standards are all about being vendor-neutral.

Vendor dependencies:

- make your data incompatible;
- lock you inside a single-vendor solution (cannot migrate, cannot evolve);
- diminish your ability to implement the best tools available;
- don’t even allow you to bargain.

VNA, and vendor-neutral environment/software in general, has finally become one of the major trends in healthcare.

Source: http://www.himss.org/files/HIMSSorg/content/files/TeraMedicaWhatisaVNA.pdf

Conclusions

Standards can facilitate many clinical projects, provided that:

1. you know how the standards work;

2. your equipment and workflow comply with the standards;

3. your data complies with the standards.

Getting it right

A hospital hires you to develop an EHR (Electronic Health Record) system. Your first step would be to:

1. ask about your salary.

2. inspect the hospital DICOM units.

3. ask about Patient ID assignment policies.

4. check whether they work with foreign patients.

5. ask about their preferences for database engine.

Part 2: Image data in DICOM

Image data holds a special place in the medical workflow because the images are:

- ambiguous: images cannot be read directly as text;
- original data: images are taken to capture pathologies; reports and interpretations come later;
- bulky: 99% of DICOM data is image pixels;
- super-informative: manipulating image data with smart processing algorithms can reveal new, often invisible phenomena.

Medical image analysis

Digital image trivia

[Figure: zoomed image fragment showing individual pixels.]

Digital images are stored as pixel matrices (2D).

Once acquired, images retain the same size, resolution, quality, and format (great displays cannot fix poor image quality).

Countless algorithms and software can process pixel matrices in many wonderful ways…

Images in DICOM

Image in (7FE0,0010) tag

Byte #       1-4              5-6       7-8        9-12                     13 … 65548
Decimal      224 127 16 0     O B       0 0        0 0 1 0                  0 3 0 … 10 10
Hexadecimal  E0 7F 10 00      4F 42     00 00      00 00 01 00              00 03 00 … 0A 0A
Meaning      g=7FE0 e=0010    VR type   Reserved   VR length L=0x00010000   VR value (pixel samples)

The image is stored in the (7FE0,0010) tag as a sequence of image pixel samples.

VR: OB (including RGB) or OW (deep grayscale)

Image height is stored as the “Rows” attribute (0028,0010). Image width is stored as the “Columns” attribute (0028,0011). Image pixel data is stored as the “Pixel Data” attribute (7FE0,0010).
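The first 12 bytes of the byte table above can be decoded with a few lines of code. This sketch assumes the little-endian, explicit-VR layout shown there (group, element, two VR characters, two reserved bytes, 4-byte length); the variable names are illustrative.

```python
import struct

# The 12 header bytes from the byte table (bytes 1 to 12).
header = bytes([0xE0, 0x7F, 0x10, 0x00, 0x4F, 0x42, 0x00, 0x00,
                0x00, 0x00, 0x01, 0x00])

# "<" little endian; H = 2-byte unsigned; 2s = 2 raw bytes; I = 4-byte unsigned.
group, element, vr, reserved, value_length = struct.unpack("<HH2sHI", header)

tag = f"({group:04X},{element:04X})"
last_pixel_byte = len(header) + value_length  # position of the final value byte
```

The low byte comes first, so E0 7F reads as group 7FE0, and the length 00 00 01 00 reads as 0x00010000 = 65536 bytes, which is why the value ends at byte 65548.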

Important image attributes

Tag           Name                     VR      VM
(0018,0050)   Slice Thickness          DS      1
(0018,0088)   Spacing between Slices   DS      1
(0018,1063)   Frame Time               DS      1
(0020,0032)   Image Position           DS      3
(0020,0037)   Image Orientation        DS      6
(0028,0002)   Samples per Pixel        US      1
(0028,0008)   Number of Frames         IS      1
(0028,0010)   Rows (image height)      US      1
(0028,0011)   Columns (image width)    US      1
(0028,0030)   Pixel Spacing            DS      2
(0028,0100)   Bits Allocated, Ba       US      1
(0028,0101)   Bits Stored, Bs          US      1
(0028,0102)   High Bit, Bh             US      1
(7FE0,0010)   Pixel Data               OW/OB   1

Example: checking DICOM tags

Videos in DICOM

(0028,0008) “Number of Frames” attribute (VR=“IS”) stores number of static frames

(0018,1063) “Frame Time” (VR=“DS”) defines frame playback speed

Example: 360 images (frames), with Frame Time = 1/24 sec.

Recent DICOM editions added MPEG4 – true video format

Pixel samples: basic…

[Figure: storing an RGB (color) image with 12 bits per color channel. Wasted 4 bits!]

… and more complex

[Figure: more complex pixel sample layouts, with 4 wasted bits in one case and 2 wasted bits in another.]

Two approaches to the “wasted space” problem:

1. Stick some other data into the empty bits (old way)

2. Use image compression

Never underestimate your data size!

Typical sizes of digital images and studies

Image modality              Typical image matrix                  Image size,       Typical number of    Typical study size,
                            (height × width × bytes per pixel)    kilobytes (KB)    images in a study    megabytes (MB)
Nuclear medicine, NM        128 × 128 × 1                         16                100                  1.5
Magnetic resonance, MR      256 × 256 × 2                         128               200                  25
Computed tomography, CT     512 × 512 × 2                         512               500                  250
Color ultrasound, US        600 × 800 × 3                         1,400             500                  680
Computed radiography, CR    2140 × 1760 × 2                       7,356             4                    30
Color 3D reconstructions    1024 × 1024 × 3                       3,000             20                   60
Digital mammography, MG     Up to 6400 × 4800 × 2                 60,000            4                    240
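The sizes in this table follow directly from the image matrix. A minimal sketch, assuming 1 KB = 1024 bytes and 1 MB = 1024 KB, and ignoring the (small) non-pixel part of each file:

```python
def image_kb(rows, cols, bytes_per_pixel):
    """Uncompressed pixel data size in kilobytes (1 KB = 1024 bytes)."""
    return rows * cols * bytes_per_pixel / 1024

def study_mb(rows, cols, bytes_per_pixel, n_images):
    """Uncompressed study size in megabytes (1 MB = 1024 KB)."""
    return image_kb(rows, cols, bytes_per_pixel) * n_images / 1024

ct_image = image_kb(512, 512, 2)       # one CT slice
ct_study = study_mb(512, 512, 2, 500)  # a 500-slice CT study
```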

Can image compression help?

What is image compression doing, really?

1. Reducing image sizes

2. Reducing image colors

3. Repacking image pixels in more compact form

Disk space is getting cheaper, networks are getting faster – do we still need to compress?

Why/How do we compress data?

Q1. Disk space is cheap, networks are fast – why should we compress anything at all?

Q2. How does data and image compression work?

Remove blanks and spaces?

Decrease image resolution?

Convert to binary?

Anything else to make data size smaller???

How data compression works: lossless (text, images, anything)

Lossless compression: the decompressed image is identical to the original

Imagine a sequence of data values:

1000, 1001, 1002, 1002, 1000, 1000, 1001, 1057,….

We can encode the most frequent symbol with the shortest code word. For example, encoding 1000 with a yields:

a, 1001, 1002, 1002, a, a, 1001, 1057,….

Our sequence is getting shorter. This is the idea behind ZIP, RAR, and other lossless compression algorithms.
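A lossless round trip can be demonstrated with Python's standard zlib module, which implements DEFLATE, the method commonly used inside ZIP files (it combines repeated-pattern matching with short codes for frequent symbols). The repetition factor and the 2-byte packing below are illustrative choices.

```python
import struct
import zlib

# The example sequence, repeated to give the compressor something to work with.
values = [1000, 1001, 1002, 1002, 1000, 1000, 1001, 1057] * 100

# Pack each value as a 2-byte little-endian unsigned integer.
raw = struct.pack(f"<{len(values)}H", *values)

packed = zlib.compress(raw)         # compressed bytes
restored = zlib.decompress(packed)  # must equal raw exactly

is_lossless = restored == raw
is_smaller = len(packed) < len(raw)
```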

Lossy compression (images)

Lossy: better compression, but the decompressed data is different from the original

Imagine a sequence of pixel values:

1000, 1001, 1002, 1002, 1000, 1000, 1001, 1057,….

Close pixel values (1000, 1001, 1002) can hardly be distinguished. So we can set them all to 1000 (lossy), and replace 1000 with a:

a, a, a, a, a, a, a, 1057,….

The sequence is getting much shorter. This is the idea behind lossy compression algorithms.
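The merging of close values can be sketched with a simple tolerance rule. This is an illustration of the idea only, not how JPEG or any other real codec works; the function name and the tolerance of 2 are assumptions.

```python
def merge_close_values(values, tolerance):
    """Replace each value by the current representative if it is within tolerance."""
    out, rep = [], None
    for v in values:
        if rep is None or abs(v - rep) > tolerance:
            rep = v  # start a new representative value
        out.append(rep)
    return out

pixels = [1000, 1001, 1002, 1002, 1000, 1000, 1001, 1057]
lossy = merge_close_values(pixels, tolerance=2)
max_error = max(abs(a - b) for a, b in zip(pixels, lossy))
```

The result contains only two distinct values, so it compresses far better, at the cost of an error of up to 2 per pixel.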

Compression ratio

Compression ratio is defined as:

$$R_{comp} = \frac{\text{original data size}}{\text{compressed data size}}$$

A diagnostically safe compression ratio is hard to define, as it always depends on the particular image properties. You (and your doctors) need to experiment with different compression settings to select the optimal compression.

Summary of recommended compression rates in Canada, England, and Germany

              Canada   England   Germany
Radiography   20-30    10        10
Mammography   15-25    20        15
CT            8-15     5         5-8
MR            16-24    5         7
RF/XA         n/a      10        6

Compression and information

Less informative (redundant) data is easier to compress:

- “blah-blah-blah-blah-blah”: nearly zero information, easy to compress
- “Medical informatics is cool!”: less redundant, harder to compress

Information can be quantified with entropy.

Entropy

Shannon’s entropy (1948) formula: consider events e_1, …, e_n with probabilities p_1, …, p_n. Then their entropy is

$$H = \sum_i p_i \log_2 \frac{1}{p_i} = -\sum_i p_i \log_2 p_i$$

Note that p_i = 1 means a 100%-probable event. Therefore H = 0 (no information).

Less probable events (low p_i) correspond to higher information (higher log(1/p_i)).

When all events are equally probable (p_i = p_j), we have the highest entropy H (most informative, most uncertain case).

Example: infant language

Infant language: word probabilities

Baby              Older
mommy   0.35      mommy   0.05
dad     0.2       dad     0.05
cat     0.2       cat     0.025
a-a-a   0.25      car     0.025
                  eat     0.025
                  milk    0.025
                  no      0.8
H1=1.96           H2=1.22

H1>H2, because in the older child’s language we nearly always hear “no.”
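The two entropies can be checked numerically. A minimal sketch that applies Shannon's formula with base-2 logarithms to the two probability lists from the example:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = sum(p * log2(1/p)), skipping p = 0."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

baby = [0.35, 0.2, 0.2, 0.25]
older = [0.05, 0.05, 0.025, 0.025, 0.025, 0.025, 0.8]

h1 = entropy(baby)
h2 = entropy(older)
certain = entropy([1.0])  # a single 100%-probable event carries no information
```

The older child's vocabulary is larger, yet its entropy is lower because one word dominates.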

Predictive models

What if we can compute the next data item from the previous one? Then we can use our formula to recover the entire data stream and achieve the best possible compression.

Example: instead of writing

1000, 1000, 1000, 1000, 1000, 1000

we can write

1000 (5)

where (5) means “repeat 5 times” (run-length compression). Predictive models are ideal for data with natural redundancy (images).
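Run-length compression can be sketched in a few lines. Note one difference from the slide's notation: this sketch stores the total run length (6 for six values) rather than the number of additional repeats (5).

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [v for v, n in pairs for _ in range(n)]

data = [1000] * 6
encoded = rle_encode(data)
decoded = rle_decode(encoded)
```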

Predictive compression with images

We can do a better compression job, predicting each pixel from its neighbors:

[Figure: known neighboring pixels a and b, and the next pixel x to compress.]

Model 1: x = a + ε

Model 2: x = (a + b)/2 + ε

...

As a result, we compress prediction error values ε instead of the original pixels.

The better we predict, the lower ε becomes, and the better we can compress.

For normally distributed ε with standard deviation σ, their entropy is

$$H = \frac{1}{2}\log\left(2\pi e \sigma^2\right)$$

(it decays with decreasing ε).
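The gain from prediction can be seen on a toy example. This sketch applies Model 1 (predict each pixel from its left neighbor) to a hypothetical smooth image row and compares the empirical entropy of the raw pixels with that of the prediction errors; the helper names are illustrative.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy (bits per symbol) of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return sum(c / total * math.log2(total / c) for c in counts.values())

def residuals(row):
    """Model 1: error of predicting each pixel from its left neighbor.

    The first pixel has no neighbor and is kept as is.
    """
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]

def reconstruct(res):
    """Invert residuals() by cumulative summation (lossless)."""
    row = [res[0]]
    for e in res[1:]:
        row.append(row[-1] + e)
    return row

# Hypothetical smooth image row: a slow intensity ramp.
row = list(range(1000, 1032))
res = residuals(row)

h_raw = entropy_bits(row)  # all 32 values differ
h_res = entropy_bits(res)  # almost all residuals equal 1
```

The residuals carry the same information, since the row can be rebuilt exactly, but their entropy is far lower, so they compress much better.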

Image compression problem in medicine

Image compression is always in demand.

Only lossy compression can substantially reduce large image volume.

Popular lossy image compression algorithms (JPEG, JPEG2000) define image loss ε on average. That is, ε can get high at some pixels, obstructing the diagnostic information.

Lossy artifacts: JPEG

[Figure: JPEG lossy compression at Rcomp=20, showing checkerboard artifacts.]

Lossy artifacts: JPEG2000

[Figure: JPEG2000 lossy compression at Rcomp=20, showing blur.]

Diagnostic compression

Diagnostic compression algorithms control ε at each pixel, bounding the maximum error (max ε) rather than the average.

JPEG-LS is an example of diagnostic compression.

JPEG2000 can set ε depending on the region. But who selects this region?

Region-specific compression: JPEG2000

[Figure: original compression compared with an enhanced region.]

It would be interesting to develop an algorithm that automatically determines the “diagnostic value” of each pixel area, to use stronger or lighter compression.

Streaming compression

Idea: high-quality data on demand

Download low-resolution images first

For specific areas of interest, provide high-resolution data

Pro: full data is never loaded

Con: user waits for each image update

Compression vs. network speed

Compression takes time!

Always test your compression with your network to find out whether you are actually gaining anything.
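The trade-off can be estimated with simple arithmetic before running a real test. The sketch below compares sending raw data with compressing, sending, and decompressing; all numbers (study size, ratio, link speeds, codec times) are hypothetical.

```python
def transfer_seconds(size_mb, bandwidth_mb_per_s):
    """Time to send size_mb megabytes over a link of the given speed."""
    return size_mb / bandwidth_mb_per_s

def compression_pays_off(size_mb, ratio, bandwidth_mb_per_s,
                         compress_s, decompress_s):
    """True if compress + send + decompress beats sending the raw data."""
    plain = transfer_seconds(size_mb, bandwidth_mb_per_s)
    packed = (compress_s
              + transfer_seconds(size_mb / ratio, bandwidth_mb_per_s)
              + decompress_s)
    return packed < plain

# Hypothetical: a 250 MB study, ratio 5, 6 s to compress, 4 s to decompress.
slow_link = compression_pays_off(250, 5, 10, 6, 4)   # 10 MB/s link
fast_link = compression_pays_off(250, 5, 100, 6, 4)  # 100 MB/s link
```

With these numbers compression helps on the slow link (15 s against 25 s) but hurts on the fast one (10.5 s against 2.5 s).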

Compression vs. noise

Lossy compression can be noise-removing, which can help us define diagnostically-acceptable Rcomp.

[Figure: variance σ in vessel diameter, observed at different JPEG compression ratios Rc (axis marks at 1, 13, 20, and 26), with separate curves for σ compression and σ noise.]

Lossy is safe before the marked point: compression loss is lower than the original image noise.