Page 1: [IEEE 2011 Seventh International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) - Dalian, China (2011.10.14-2011.10.16)] 2011 Seventh International

Invisible Communication through Portable Document File (PDF) Format

Gundeep Singh Bindra

Computer Science Engineering

SRM University New Delhi, India

[email protected]

Abstract- With the fast-paced development of the internet, exchanging and concealing private data has become a serious concern. This paper proposes a new, hitherto unidentified steganographic technique for concealing data, which can additionally be encrypted through cryptography, in a .pdf file. The information-theoretic method deals with text-to-text steganography and allows the carrier file and the hidden message to be restored after they are bound together in a PDF file. The paper also suggests the possibility of adding further layers of security to the proposed information hiding technique.

Keywords- Portable Document Format; Invisible; Document Threat; Communication; Security;

I. INTRODUCTION

Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity. The word steganography is of Greek origin and means "concealed writing". Generally, messages will appear to be something else: images, articles, shopping lists, or some other cover-text and, classically, the hidden message may be in invisible ink between the visible lines of a private letter.[1]

Steganography includes the concealment of information within computer files. In digital steganography, electronic communications may include steganographic coding inside of a transport layer, such as a document file, image file, program or protocol. Media files are ideal for steganographic transmission because of their large size. As a simple example, a sender might start with an innocuous image file and adjust the color of every 100th pixel to correspond to a letter in the alphabet, a change so subtle that someone not specifically looking for it is unlikely to notice it.
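The every-100th-pixel example can be sketched in a few lines. This is only an illustration and not a method from this paper: it assumes a grayscale image stored as a flat `Uint8Array`, a message made of the letters A-Z, and a hypothetical convention of storing each letter's index (0-25) in the low five bits of the chosen pixel. The names `embedLetters`, `extractLetters` and `STRIDE` are introduced here for the example.

```javascript
// Hypothetical sketch: hide letters in every 100th pixel of a grayscale image.
const STRIDE = 100; // spacing between modified pixels, as in the example in the text

// Store the index of each letter (A=0 ... Z=25) in the low 5 bits of a pixel.
function embedLetters(pixels, message) {
  const out = Uint8Array.from(pixels); // copy, so the cover image stays untouched
  if (message.length * STRIDE > out.length) {
    throw new RangeError("cover image too small for message");
  }
  for (let k = 0; k < message.length; k++) {
    const index = message.charCodeAt(k) - 65; // 'A' has char code 65
    if (index < 0 || index > 25) throw new RangeError("only A-Z supported");
    const pos = k * STRIDE;
    out[pos] = (out[pos] & 0b11100000) | index; // keep top 3 bits, replace low 5
  }
  return out;
}

// Read the low 5 bits of every 100th pixel back into letters.
function extractLetters(pixels, length) {
  let message = "";
  for (let k = 0; k < length; k++) {
    message += String.fromCharCode(65 + (pixels[k * STRIDE] & 0b00011111));
  }
  return message;
}
```

Overwriting five low bits can shift a pixel by up to 31 gray levels, so a practical scheme would likely spread the bits over several pixels to stay less visible.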

Cryptographic and steganographic methodologies are very different. The advantage of steganography over cryptography is that a person looking at the object in which information is hidden will not notice the information stashed there, and therefore will not attempt to decrypt it.

For many years, information hiding has captured the imagination of researchers. In this paper we introduce a textual steganographic technique based on PDF and explore the possibility of developing a new steganographic method that uses PDF carrier files to protect information and conceal secrets.

A. Introduction to PDF Format

Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange. PDF is used for representing two-dimensional documents in a manner independent of the application software, hardware, and operating system.[3] The PDF file format is also described as a true programming language of its own, strongly dedicated to document creation and manipulation, which has accumulated many powerful programming features from version to version.[4]

B. Information Hiding

Information hiding is the principle of segregating the design decisions in a computer program that are most likely to change, so that other parts of the program are protected from extensive modification if a design decision changes. This is achieved by providing a stable interface that shields the remainder of the program from the implementation.[6]

II. PROBLEM DEFINITION

The goal is to provide secure communication through a safe format. We are limited only by our imagination in the many ways information and data can be exploited to conceal additional information. Information hiding techniques provide an interesting challenge for digital forensic investigations.

Hidden information can easily traverse firewalls undetected. Research into steganalysis techniques aids in the discovery of such hidden information and also leads research towards improved methods for hiding information. As we see it today, this form of communication, known as "Invisible Communication", is among the most popular and essential.

A. Existing Problem

Cryptology prior to the modern age was almost synonymous with encryption, the conversion of information from a readable state to apparent nonsense. The sender retained the ability to decrypt the information and therefore avoided unwanted persons being able to read it.

Modern cryptography[7] follows a strongly scientific approach and designs cryptographic algorithms around computational hardness assumptions, that is, around problems assumed to be hard for an adversary to solve.

2011 Seventh International Conference on Intelligent Information Hiding and Multimedia Signal Processing

978-0-7695-4517-2/11 $26.00 © 2011 IEEE

DOI 10.1109/IIHMSP.2011.103


B. Disadvantages of the Traditional Approach

1) These systems are breakable in theory, but doing so is difficult for practical adversaries, depending on the complexity of the algorithm used to encipher the information. Information-theoretically secure schemes that provably cannot be broken exist, but they are less practical than computationally secure mechanisms. An example of such a scheme is the one-time pad.

2) Plainly visible encrypted messages will arouse suspicion and may in themselves be incriminating, especially in countries where encryption is illegal.
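The one-time pad mentioned in item 1 can be illustrated with a short sketch. It assumes a byte-wise XOR pad whose key is as long as the message and is never reused; the function name `otpXor` and the fixed key are hypothetical and chosen only to make the example reproducible.

```javascript
// Sketch of a one-time pad over bytes: ciphertext = plaintext XOR key.
// Assumption: the key is truly random, as long as the message, and never reused.
function otpXor(data, key) {
  if (key.length !== data.length) {
    throw new RangeError("key must be exactly as long as the data");
  }
  return data.map((byte, i) => byte ^ key[i]);
}

const plaintext = new TextEncoder().encode("0009011991");
// Illustrative fixed key so the example is reproducible; a real pad must be random.
const key = Uint8Array.from([7, 42, 199, 3, 88, 250, 16, 61, 134, 77]);
const ciphertext = otpXor(plaintext, key);
// Applying the same pad again restores the plaintext.
const recovered = new TextDecoder().decode(otpXor(ciphertext, key));
```

The result also illustrates item 2: the ciphertext bytes are visibly scrambled, which is exactly what may arouse suspicion.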

III. METHODOLOGY

This paper emphasizes the combination of steganography and cryptography. Whereas cryptography protects the contents of a message, steganography can be said to protect both the message and the communicating parties. Hence the need for communication through a safe format, PDF, which allows data to be concealed without raising any suspicion.

A. Data Validation

In computer science, data validation[8] is the process of ensuring that a program operates on clean, correct and useful data. It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system. Incorrect data validation can lead to data corruption or security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.
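A validation rule of the kind described above might look like the following. This is a minimal sketch under an assumption that is not stated in the paper: a code is treated as valid input only if it is a string of exactly ten decimal digits. The function name `isValidCode` is hypothetical.

```javascript
// Hypothetical validation rule: accept only a string of exactly ten decimal digits.
function isValidCode(input) {
  return typeof input === "string" && /^[0-9]{10}$/.test(input);
}
```

Keeping the code as a string rather than a number preserves leading zeros, which matter for a value such as 0009011991.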

B. Hiding Other Acrobat Form Fields

Suppose the user enters a number which is not the unique code; then both the fields are hidden. Otherwise, both the fields become active (visible).[10]

    var code = this.getField("NumCtaPat");  // controlling form field
    var i = this.getField("DesInc.");       // form field holding the concealed data
    // Hide the data field unless the value entered is the unique code.
    if (code.value < 0009011991 || event.value > 0009011991) {
        i.hidden = true;
    } else {
        i.hidden = false;
    }

Hence, "Invisible Communication through PDF" involves a combination of the above two techniques i.e. data validation and information hiding.
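The script in Section III.B relies on Acrobat's `getField` and the field property `hidden`, which are not available outside a PDF viewer. The sketch below imitates the same logic in plain JavaScript with stand-in field objects so that it can be run and tested anywhere. The mock document and the names `makeMockDoc`, `onCodeEntered` and `UNIQUE_CODE` are assumptions made for this illustration; only the field names and the code value come from the paper.

```javascript
// Stand-in for an Acrobat document: a map from field names to simple objects.
// Assumption: real Acrobat fields expose `value` and `hidden` in a similar way.
function makeMockDoc() {
  const fields = {
    "NumCtaPat": { value: "", hidden: false }, // controlling form field
    "DesInc.": { value: "", hidden: true },    // field holding the concealed data
  };
  return { getField: (name) => fields[name] };
}

const UNIQUE_CODE = "0009011991"; // the code given in the paper

// Mirror of the validation logic: show the data field only for the unique code.
function onCodeEntered(doc, entered) {
  const code = doc.getField("NumCtaPat");
  const data = doc.getField("DesInc.");
  code.value = entered;
  data.hidden = entered !== UNIQUE_CODE; // string comparison keeps leading zeros
}
```

Comparing strings instead of numbers is a design choice of this sketch: with a numeric comparison, an entry without the leading zeros would compare equal to the unique code as well.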

IV. IMPLEMENTATION

1) Download and open the scripted PDF document (The document has been uploaded at http://gundeepbindra.com/ICTPDF/ICTPDF.pdf)

2) Figure 1 below shows the default value entered in the main form field when the document is opened, hereafter referred to as the controlling form field. I have not included any asterisks or password feature because doing so would, firstly, negate the invisibility theme of this paper and, secondly, arouse unnecessary suspicion.

Figure 1. Default View

3) Based on the Data validation technique, entering the unique code in the controlling form field alone will regulate the visibility of the other two form fields containing the user information.

4) After the unique code (already known to the recipient) is entered into the controlling form field, the other two form fields become available for entering the information that is to be concealed, as shown in Figure 2 below.

5) The unique code for this file is 0009011991.

Figure 2.

6) Figure 3 below shows the numeric value entered in the first form field by the user, in this case the sender, i.e. the person performing the hiding technique.


7) Similarly, a numeric value can also be entered in the second form field. For example, the first field can hold a credit card number and the second field can hold the ATM PIN associated with it. (I am using only numeric information to keep the implementation simple. The same can be extended to string information as well.)

Figure 3.

8) Figure 4 below shows a random number entered in the controlling form field, which hides the two form fields containing the information to be concealed. To ward off suspicion, the sender can present the number in the controlling form field as everyday information that is not security sensitive, such as a university roll number.

9) The PDF document must be saved after the information to be communicated has been incorporated by the sender. The resultant PDF will just have a few KBs of extra size. The value entered last to hide the information will be the default value after the PDF is opened next.

Figure 4.

10) The document is ready to be sent over a secured communication channel to the receiver.

11) The receiver can successfully open the PDF document to retrieve the data entered by the sender by entering the unique code set by the user in the script of the PDF file.

12) The input will trigger the display of the hidden fields with the information entered by the sender.

13) Figure 5 below shows the two form fields displayed as a result of the unique code entered by the receiver.

Figure 5.

14) On receiving the secured information, the receiver must enter a random number to switch off the visibility of the two form fields and then save the PDF file.

15) The value entered last (the random number entered by the receiver) in the controlling form field will be the default value when the PDF is opened next. The saved PDF will again result in an addition of a few KBs of extra size.

16) The Step by step Instruction Manual has been uploaded at http://www.gundeepbindra.com/ICTPDF/SbSIM.mov in addition to being embedded at the beginning of this paper.[11]
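The sender and receiver steps above can be walked through with a small simulation. It is a hypothetical model, not the scripted PDF itself: the document is represented as a plain object, saving and reopening is modeled as a JSON round trip, and the default value, the decoy number and the two concealed values are made-up illustrative data. Only the unique code is taken from the paper.

```javascript
// Hypothetical model of the scripted PDF's state; field names are illustrative.
const CODE = "0009011991"; // unique code from the paper

function newDocument() {
  return { controlling: "0000000000", secret1: "", secret2: "", hidden: true };
}

// Entering a value in the controlling field toggles visibility of the two fields.
function enterControlValue(doc, value) {
  doc.controlling = value;
  doc.hidden = value !== CODE;
}

// The concealed fields can be written or read only while they are visible.
function writeSecrets(doc, first, second) {
  if (doc.hidden) throw new Error("fields are hidden");
  doc.secret1 = first;
  doc.secret2 = second;
}

function readSecrets(doc) {
  return doc.hidden ? null : [doc.secret1, doc.secret2];
}

// Saving and reopening is modeled as a JSON round trip.
const saveDocument = (doc) => JSON.stringify(doc);
const openDocument = (text) => JSON.parse(text);

// Sender: unlock, fill in, hide again with an innocuous number, save.
const senderDoc = newDocument();
enterControlValue(senderDoc, CODE);
writeSecrets(senderDoc, "4111111111111111", "1234"); // illustrative values
enterControlValue(senderDoc, "2011103456");          // decoy, e.g. a roll number
const file = saveDocument(senderDoc);

// Receiver: open, see only the decoy value, then unlock with the code.
const receiverDoc = openDocument(file);
const beforeUnlock = readSecrets(receiverDoc);
enterControlValue(receiverDoc, CODE);
const afterUnlock = readSecrets(receiverDoc);
```

In this model the last value entered in the controlling field is what the reopened document shows, which matches steps 9 and 15.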

V. LIMITATION

Although an intruder is unaware that invisible information is being communicated, there is a possibility, however small, that the intruder may conjecture that some transmission of information is taking place. In that case, cracking the numerical code by brute force on a computer is not an ambitious task: the computer has to iterate at most $10^{10}$ times to hit the unique code.
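The brute-force search described here can be sketched as a simple loop. The example below is illustrative only: `isCorrect` stands in for whatever check an attacker can observe, which is an assumption, and the demonstration uses a hypothetical 4-digit code so that it finishes quickly, whereas the code in this paper has 10 digits.

```javascript
// Brute-force search over all zero-padded numeric codes of a given length.
// `isCorrect` stands in for whatever check the attacker can observe (an assumption).
function bruteForce(digits, isCorrect) {
  const total = 10 ** digits; // size of the search space
  for (let n = 0; n < total; n++) {
    const guess = String(n).padStart(digits, "0");
    if (isCorrect(guess)) return { guess, attempts: n + 1 };
  }
  return null;
}

// Demo on a 4-digit code to keep the run short; the paper's code has 10 digits.
const demo = bruteForce(4, (g) => g === "0911");
const fullSpace = 10 ** 10; // 10,000,000,000 candidates for a 10-digit code
```

The worst case for a 10-digit code is `fullSpace` attempts, which is well within reach of an ordinary computer.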

VI. OVERCOMING THE CHALLENGE

Rather than a standalone form field, a brute-force-resistant option is to use a set of three related form fields. This would considerably enhance the robustness of the mechanism proposed in this paper. The visibility of the information would then depend on the correctness of all three controlling form fields of the triad instead of a single controlling field.

Figure 6 below illustrates the use of the triad of form fields, where the three unique codes are 0009011991, 0009011989 and 0011081963 instead of a single code (0009011991) as in the earlier case. The triad would have vastly more combinations that need to be tried before hitting the unique code, making a brute-force attack practically infeasible. To unlock the hidden fields, it is mandatory to enter the correct number in ALL THREE form fields of the triad.
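The triad check and the size of its search space can be sketched as follows. This is a hedged illustration: `triadUnlocks` and `countWrongTriples` are hypothetical names, the three fields are assumed to be independent, and the enumeration uses a small number `n` of possible values per field (with exactly one correct value each) because enumerating $10^{10}$ values per field is not feasible.

```javascript
// The three codes given in the paper for the triad of controlling fields.
const TRIAD = ["0009011991", "0009011989", "0011081963"];

// The concealed fields become visible only if all three entries are correct.
function triadUnlocks(entries) {
  return entries.length === 3 && entries.every((e, i) => e === TRIAD[i]);
}

// Count wrong triples by exhaustive enumeration when each field has `n`
// possible values and exactly one of them is correct (value 0, by convention).
function countWrongTriples(n) {
  let wrong = 0;
  for (let a = 0; a < n; a++) {
    for (let b = 0; b < n; b++) {
      for (let c = 0; c < n; c++) {
        if (!(a === 0 && b === 0 && c === 0)) wrong++;
      }
    }
  }
  return wrong;
}
```

For small `n` the enumeration returns $n^3 - 1$ wrong triples (26 for `n = 3`), which suggests about $10^{30}$ wrong triples for $n = 10^{10}$ under these assumptions. This direct count is larger than the closed-form figure derived later in the paper, since the case in which all three entries are wrong contributes $(n-1)^3$ combinations on its own.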


Figure 6.

Table I gives the various cases of outcomes on entry of number codes in the form fields. A tick (✓) denotes a correct entry and a cross (X) denotes an incorrect one.

TABLE I. VARIOUS CASES OF OUTCOMES ON ENTRY OF NUMBER CODES IN THE FORM FIELDS

Case      Form Field No. 1   Form Field No. 2   Form Field No. 3   Outcome: hitting the unique code in the controlling form field
CASE 1    ✓                  X                  X                  Not Achieved
CASE 2    X                  ✓                  X                  Not Achieved
CASE 3    X                  X                  ✓                  Not Achieved
CASE 4    ✓                  ✓                  X                  Not Achieved
CASE 5    ✓                  X                  ✓                  Not Achieved
CASE 6    X                  ✓                  ✓                  Not Achieved
CASE 7    X                  X                  X                  Not Achieved
CASE 8    ✓                  ✓                  ✓                  Achieved

Mathematical interpretation of the above, given that each form field has 10 digits, implies that

Total number of combinations = $10^{10}$

Total number of correct combinations = 1

Total number of incorrect combinations = $10^{10} - 1$

Adding all the "Not Achieved" cases, we get:

$$
\begin{aligned}
&3[1 \times (10^{10}-1) \times (10^{10}-1)] + 3[1 \times 1 \times (10^{10}-1)] + (1 \times 1 \times 1) \\
&= 3(10^{10}-1)^2 + 3(10^{10}-1) + 1 \\
&= 3[10^{20} + 1 - 2 \cdot 10^{10} + 10^{10} - 1] + 1 \\
&= 3[10^{20} - 10^{10}] + 1 \\
&= 3[10^{10} \cdot 10^{10} - 10^{10}] + 1 \\
&= 3 \cdot 10^{10}[10^{10} - 1] + 1
\end{aligned}
$$

TABLE II. EFFICIENCY COMPARISON BETWEEN THE TWO TYPES OF CASES

Type of Form Field   Number of Wrong Hits                   Remarks
Standalone           $10^{10} - 1$                          Can be cracked by brute-force method
Triad                $3 \times 10^{10}[10^{10} - 1] + 1$    Efficiency increases by 29999999999 times*

*Efficiency = $[3 \times 10^{10}(10^{10} - 1) + 1 - (10^{10} - 1)] / 10^{10} \approx 29999999999$ times

The efficiency increases to about 10 to the power 12 percent, i.e. $10^{12}$ %.

CONCLUSION AND FUTURE WORK

It is a cat-and-mouse game. New techniques are being developed every day. To ensure yet another layer of security for the information being communicated, cryptography or other encryption techniques can come into play in case the concealed information is seen by an intruder. While the merit of cryptography is retained, its limitation, namely that scrambled information arouses suspicion, is already countered.

ACKNOWLEDGMENT

At the outset, in all humility, I thank the Almighty for the plentiful blessings He has showered on me to undertake and see through to completion this paper. I wish to gratefully acknowledge the support of my parents, my sister Harshana, Girish Sir and thank all the authors and researchers whose work I have consulted.

REFERENCES

[1] Wikipedia, "The Free Encyclopedia". Accessed: March 9, 2010.

[2] Pahati, OJ (2001-11-29). "Confounding Carnivore: How to Protect Your Online Privacy". Archived from the original on 2007-07-16. Retrieved on 2008-09-02. Accessed: March 11, 2010.

[3] Adobe Systems Incorporated, PDF Reference, Sixth edition, version 1.23, Page 33. Accessed: March 11, 2010.

[4] Eric Filiol, "Portable Document Format (PDF) Security Analysis and Malware Threats", Page 3, IEEE Journal Paper. September 2009. Accessed: March 3, 2010.

[5] Adobe PDF 101: Quick overview of PDF File Format. Accessed: March 25, 2010.

[6] Petitcolas, F.A.P.; Anderson, R.J.; Kuhn, M.G.; Computer Lab., Cambridge University. "Information hiding-a survey". ISSN: 0018-9219. Issue Date: July 1999. Volume: 87, Issue: 7. Page(s): 1062-1078. Accessed: April 2, 2010.

[7] David L. Parnas, Professor of Software Engineering, McMaster University. "The secret history of information hiding". ISBN: 3-540-43081-4. April 2002. Accessed: February 21, 2010.

[8] Victor R. Basili, Richard W. Selby Jr., Tsai-Yun Phillips, "Metric Analysis and Data Validation". Dec 1981. IEEE Trans Software Engg. Accessed: March 23, 2010.

[9] John Deubert, "Adobe Acrobat 9 for Windows and Macintosh: Visual QuickStart Guide". Book. Accessed: May 9, 2010.

[10] PDF Scripting, Tutorials, Tools, Scripts and Samples for Scripting and PDFWebSite. http://www.pdfscripting.com. Accessed: April 2, 2011.

[11] Randy Pausch, "The Last Lecture - September 18, 2007", "First Steps Towards Storytelling in Virtual Reality." '96 Conference Proceedings, Computer Graphics, August 1996, pages 193-203. Inspired me to make the embedded video. Accessed: June 2, 2011.