Finding & Eliminating Rogue Hex Characters in Text Fields
Martha Cox, Cancer Outcomes Research Program, CDHA / Dalhousie

Post on 29-Dec-2015


Page 1: Finding & Eliminating Rogue Hex Characters in Text Fields

Martha Cox
Cancer Outcomes Research Program
CDHA / Dalhousie

Page 2: The Problem

Chart abstraction data containing several comment fields (255 chars each)

Some values with "random" line feeds

Page 3:

Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery.
            Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure.-No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out.
            Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------

Page 4: So I emailed my SAS buddies...

Page 5: Lots of suggestions

• compress? kcompress?
  Returns seem to be between words. Compress would smash 2 words together.

• translate or tranwrd?
  Should work, but these wouldn't take a hex value for me.

Besides, which character(s) is the problem?

Page 6: How to find the Bad Word

    data charlist;
      set shrug.sample1 (where=(PATIENT in (28)));
      length single singlhex $1;
      /* $HEX2. prints singlhex as its two-digit hex code; the slide's code */
      /* omits a format, but the listing on the next slide shows hex values */
      format singlhex $hex2.;
      loopx = length(trim(COMMENT));
      /* write one observation per character of the comment */
      do i = 1 to loopx;
        single = substr(COMMENT, i, 1);
        singlhex = single;
        output;
      end;
      keep single singlhex;
    run;

Page 7: Patient 28's comment, one char at a time

    Obs  single  singlhex
     20    g        67
     21    e        65
     22    r        72
     23    y        79
     24    .        2E
     25             20
     26             0D
     27             0A
     28             0D
     29             0A
     30    B        42
     31    i        69
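The listing above covers a single patient. A minimal sketch of the same idea applied to every record, writing out only the control characters it finds, might look like the following. It is an illustration under stated assumptions, not the author's program: the WORK data sets sample and badchars and the variable pos are hypothetical names, and the test values are made up to mimic the problem. RANK() returns a character's position in the host collating sequence, which is the ASCII code on an ASCII host.

```sas
/* Hypothetical test data: one clean comment, one with two embedded CR/LF pairs. */
data sample;
  length PATIENT 8 COMMENT $255;
  PATIENT = 13;
  COMMENT = 'Found hyperplastic polyp';
  output;
  PATIENT = 28;
  COMMENT = 'Pt did not have surgery. ' || '0D0A0D0A'x || 'Biopsy from endoscopy';
  output;
run;

/* Scan every comment and keep one observation per control character found. */
data badchars;
  set sample;
  length single $1;
  format single $hex2.;              /* print the character as its hex code */
  do pos = 1 to length(COMMENT);
    single = substr(COMMENT, pos, 1);
    /* ASCII control characters are codes 0-31 and 127 (assumes an ASCII host) */
    if rank(single) < 32 or rank(single) = 127 then output;
  end;
  keep PATIENT pos single;
run;

proc print data=badchars;
run;
```

Printing only the offenders keeps the listing short when the comment fields are long, and the patient number and position show where each rogue character sits.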

Page 8: Repair Program

    data shrug.sample2;
      set shrug.sample1;
      /* the "bad word" is the two-byte pair carriage return + line feed */
      badword = trim('0D'x) || left('0A'x);
      goodword = ' ';
      /* replace each CR/LF pair with a single blank */
      COMMENT = tranwrd(COMMENT,
                        badword,
                        goodword);
      drop badword goodword;
    run;
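For comparison, TRANSLATE, one of the functions suggested earlier, also accepts hex constants. The sketch below is an alternative under stated assumptions, not the method used in the deck: the WORK data sets sample and fixed are hypothetical, and the test value is made up. TRANSLATE works character by character, so each carriage return and each line feed becomes its own blank; COMPBL then collapses every run of blanks to one.

```sas
/* Hypothetical test data with two embedded CR/LF pairs. */
data sample;
  length PATIENT 8 COMMENT $255;
  PATIENT = 28;
  COMMENT = 'Pt did not have surgery. ' || '0D0A0D0A'x || 'Biopsy from endoscopy';
  output;
run;

data fixed;
  set sample;
  /* translate(source, to, from): CR ('0D'x) and LF ('0A'x) each map to a blank */
  COMMENT = translate(COMMENT, '  ', '0D0A'x);
  /* collapse the resulting runs of blanks into single blanks */
  COMMENT = compbl(COMMENT);
run;

proc print data=fixed;
run;
```

COMPBL also squeezes any intentional multiple blanks elsewhere in the comment, so the TRANWRD approach is the safer choice when spacing must be preserved exactly.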

Page 9: Results

Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------

Page 10: Hmm...

Noticed that the breaks seemed to be occurring where one might have used a slash ("/").

Working in a VMS batch environment; no Display Manager.

Looking at the data via PROC REPORT with "flow" for the comments column.

So, is this a data problem or a reporting problem?

Page 11: after much digging through SAS manuals...

Page 12: The Answer!

Split character in PROC REPORT

• not just for column headers
• also used to split long text values in the body of the report
• default character is slash
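The transcript does not show the corrected report code. One plausible fix, sketched here under that caveat, is to set the SPLIT= option on the PROC REPORT statement to a character that never occurs in the comments, so that slashes print as ordinary text. The tilde is an assumed choice, and the WORK data set sample, the column widths, and the Z3. format are hypothetical.

```sas
/* Hypothetical test data containing slashes. */
data sample;
  length PATIENT 8 COMMENT $255;
  PATIENT = 84;
  COMMENT = 'Most questions N/A for laparatomy.';
  output;
  PATIENT = 157;
  COMMENT = '3/4 tumor above reflection, 1/4 was below reflection';
  output;
run;

/* split='~' moves the split character away from the default slash, */
/* so FLOW wraps the comment on blanks and leaves "/" untouched.    */
proc report data=sample nowd headline split='~';
  column PATIENT COMMENT;
  define PATIENT / display 'Patient ID' format=z3. width=10;
  define COMMENT / display 'Comments' width=60 flow;
run;
```

WIDTH= and FLOW apply to traditional listing output, which is what a batch job like the one described would produce.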

Page 13: Final Results

Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N/A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3/4 tumor above reflection, 1/4 was below reflection
--------------------------------------------------------------------------------------------

Page 14: Any questions?