Finding & Eliminating Rogue Hex Characters in Text Fields
Martha Cox
Cancer Outcomes Research Program
CDHA / Dalhousie
The Problem
Chart abstraction data containing several comment fields
(255 chars each)
Some values with "random" line feeds
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery.
            Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure.-No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out.
            Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------
So I emailed my SAS buddies...
Lots of suggestions
• compress? kcompress?
  Returns seem to be between words. Compress would smash 2 words together.
• translate or tranwrd?
  Should work, but these wouldn't take a hex value for me.
Besides, which character(s) is the problem?
How to find the Bad Word
data charlist;
   set shrug.sample1 (where=(PATIENT in (28)));
   length single $1 singlhex $2;
   loopx = length(trim(COMMENT));
   do i = 1 to loopx;
      single = substr(COMMENT, i, 1);    /* one character of the comment */
      singlhex = put(single, $hex2.);    /* its two-digit hex code */
      output;
   end;
   keep single singlhex;
run;
Patient 28's comment, one char at a time
Obs  single  singlhex
 20  g       67
 21  e       65
 22  r       72
 23  y       79
 24  .       2E
 25          20
 26          0D
 27          0A
 28          0D
 29          0A
 30  B       42
 31  i       69
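The listing above covers one patient. To see which other records are affected, a sketch along the following lines might help. It assumes the same `shrug.sample1` data set and `COMMENT` variable; the output data set name `crlf_check` and the variables `cr_pos` and `lf_pos` are hypothetical names chosen for illustration.

```sas
/* Flag every record whose COMMENT holds a carriage return or line feed. */
data crlf_check;                       /* hypothetical output data set name */
   set shrug.sample1;
   cr_pos = indexc(COMMENT, '0D'x);    /* position of first CR, 0 if none */
   lf_pos = indexc(COMMENT, '0A'x);    /* position of first LF, 0 if none */
   if cr_pos > 0 or lf_pos > 0;        /* keep only the affected records */
   keep PATIENT cr_pos lf_pos;
run;

proc print data=crlf_check noobs;
run;
```

Running it before and after the repair step gives a quick check that no stray CR or LF characters remain.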
Repair Program
data shrug.sample2;
   set shrug.sample1;
   badword = trim('0D'x) || left('0A'x);
   goodword = ' ';
   COMMENT = tranwrd(COMMENT,
                     badword,
                     goodword);
   drop badword goodword;
run;
Results
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3 4 tumor above reflection, 1 4 was below reflection
--------------------------------------------------------------------------------------------
Hmm...
Noticed that the remaining breaks seemed to be occurring where one might have used a slash (“/”).
Working in a VMS batch environment; no Display Manager.
Looking at the data via PROC REPORT with “flow” for the comments column.
So, is this a data problem or a reporting problem?
After much digging through SAS manuals...
The Answer!
Split character in PROC REPORT
• not just for column headers
• also used to split long text values in the body of the report
• default character is slash
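One way to keep slashes in the comment text from acting as line breaks is to give PROC REPORT a different split character. The sketch below is an assumption about how the report might be written, not the program actually used: the tilde as split character, the column width of 80, and the use of `shrug.sample2` as input are all illustrative choices. The split character should be one that never occurs in the data.

```sas
/* Use a split character that does not occur in the comments,   */
/* so that "/" is printed as ordinary text.                     */
proc report data=shrug.sample2 nowindows split='~';
   column PATIENT COMMENT;
   define PATIENT / display 'Patient ID';
   define COMMENT / display 'Comments' width=80 flow;  /* width is an assumed value */
run;
```

With FLOW still in effect, long comments wrap within the column width, but values such as N/A and 3/4 should no longer be broken at the slash.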
Final Results
Patient ID  Comments
--------------------------------------------------------------------------------------------
013         Found hyperplastic polyp
--------------------------------------------------------------------------------------------
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
--------------------------------------------------------------------------------------------
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
--------------------------------------------------------------------------------------------
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
--------------------------------------------------------------------------------------------
038         report not present.
--------------------------------------------------------------------------------------------
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
--------------------------------------------------------------------------------------------
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
--------------------------------------------------------------------------------------------
084         lap attempted X 2 but resection could not be carried out. Most questions N/A for laparatomy.
--------------------------------------------------------------------------------------------
155         had a hemicolectomy
--------------------------------------------------------------------------------------------
157         3/4 tumor above reflection, 1/4 was below reflection
--------------------------------------------------------------------------------------------
Any questions?