Finding & Eliminating Rogue Hex Characters in Text Fields
Martha Cox
Cancer Outcomes Research Program
CDHA / Dalhousie
The Problem
Chart abstraction data containing several comment fields
(255 chars each)
Some values with "random" line feeds
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery.
            Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure.-No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out.
            Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------
So I emailed my SAS buddies...
Lots of suggestions
• compress? kcompress?
  Returns seem to be between words. Compress would smash 2 words together.
• translate or tranwrd?
  Should work, but these wouldn't take a hex value for me.
  Besides, which character(s) is the problem?
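How to find the Bad Word
   /* Write patient 28's comment out one character per observation. */
   data charlist;
      set shrug.sample1 (where=(PATIENT in (28)));
      length single singlhex $1;
      /* Assumed addition: show singlhex as two hex digits when printed. */
      format singlhex $hex2.;
      loopx = length(trim(COMMENT));
      do i = 1 to loopx;
         single   = substr(COMMENT, i, 1);
         singlhex = single;
         output;
      end;
      keep single singlhex;
   run;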
Patient 28's comment, one char at a time
   Obs  single  singlhex
    20  g       67
    21  e       65
    22  r       72
    23  y       79
    24  .       2E
    25          20
    26          0D
    27          0A
    28          0D
    29          0A
    30  B       42
    31  i       69
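Once the dump points to 0D and 0A, the same check can be run across every record rather than one patient at a time. The following is only a sketch under assumptions: shrug.sample1, PATIENT and COMMENT come from the slides, while the data set name flagged and the variable firstpos are hypothetical.

```sas
/* Sketch: list every record whose comment contains a CR ('0D'x) or LF ('0A'x). */
data flagged;                                /* hypothetical output data set */
   set shrug.sample1;
   /* INDEXC returns the position of the first character of COMMENT that      */
   /* appears in the second argument, or 0 if none of them occurs.            */
   firstpos = indexc(COMMENT, '0D0A'x);
   if firstpos > 0;                          /* keep only affected records */
   keep PATIENT firstpos;
run;

proc print data=flagged noobs;
run;
```

A listing like this shows how widespread the problem is before any repair is attempted.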
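Repair Program
   data shrug.sample2;
      set shrug.sample1;
      /* The "bad word" is a carriage return followed by a line feed. */
      badword  = trim('0D'x) || left('0A'x);
      goodword = ' ';
      /* Replace every CR+LF pair in the comment with a single blank. */
      COMMENT = tranwrd(COMMENT,
                        badword,
                        goodword);
      drop badword goodword;
   run;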
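For comparison, here is a hedged sketch of how the other suggested functions treat the same two characters. It assumes a SAS release in which hex constants such as '0D0A'x are accepted as function arguments, which may not hold in every environment. The data set names demo and demo_fixed and the variables squeezed and spaced are hypothetical.

```sas
/* Hypothetical one-row test data set with an embedded CR+LF. */
data demo;
   length COMMENT $255;
   COMMENT = 'Pt did not have surgery.' || '0D0A'x || 'Biopsy from endoscopy';
run;

data demo_fixed;
   set demo;
   /* COMPRESS deletes the listed characters, so the neighbouring words join. */
   squeezed = compress(COMMENT, '0D0A'x);
   /* TRANSLATE(source, to, from) swaps each CR or LF for a blank, one for one. */
   spaced = translate(COMMENT, '  ', '0D0A'x);
run;
```

TRANSLATE works character by character, so it also catches a lone CR or LF, whereas the TRANWRD repair only replaces the two-character pair.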
Results
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------
Hmm...
Noticed that the breaks seemed to be occurring where one might have used a slash (“/”).
Working in a VMS batch environment; no Display Manager.
Looking at the data via PROC REPORT with “flow” for the comments column.
So, is this a data problem or a reporting problem?
After much digging through SAS manuals...
The Answer!
Split character in PROC REPORT
• not just for column headers
• also used to split long text values in the body of the report
• default character is slash
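The split-character behaviour can be worked around by giving PROC REPORT a split character that never occurs in the comments. This is a minimal sketch, not the author's actual program: the tilde, the column width of 80, the header text and the choice of shrug.sample2 as input are all assumptions.

```sas
/* Sketch: override the default split character ('/') so that slashes in   */
/* the comment text print as ordinary characters.                          */
proc report data=shrug.sample2 nowd headline split='~';  /* '~' is an assumed choice */
   column PATIENT COMMENT;
   define PATIENT / display 'Patient ID';
   define COMMENT / display flow width=80 'Comments';    /* width=80 is an assumption */
run;
```

Any character would serve as long as it does not appear in the data. With FLOW still in effect, long comments wrap at the column width instead of at each slash.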
Final Results
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N/A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3/4 tumor above reflection, 1/4 was below reflection
--------------------------------------------------------------------------------------------
Any questions?