Data Hiding in Arabic Text
Ministry of Higher Education and Scientific Research
University of Technology
School of Technical Education
Data Hiding in Arabic Text
A Thesis
Submitted to Technical Education Department
University of Technology/ Baghdad
in Partial Fulfillment of the Requirement for the
Degree of Doctor of Philosophy
in Engineering Education Technology/
Electrical Engineering
By
Auday Jamal Fawzi
Supervised By
Asst. Prof. Dr. Saleh M. Al-Karaawy
Prof. Dr. Shawket T. Al-Hiazay
2007
Language Certification
This is to certify that I have read the thesis titled “Data
Hiding in Arabic Text” and corrected any mistake in grammar
and style.
Dr. Moutaz S. Abdul Wahab
Assistant Professor
Supervisor’s Certificate
We certify that this thesis entitled “Data Hiding in Arabic Text” was
prepared by (Auday Jamal Fawzi) under our supervision at the
Department of Technical Education / University of Technology / Baghdad,
in partial fulfillment of requirements for the degree of Doctor of
Philosophy in Engineering Education Technology / Electrical Engineering.
Signature:
Name: Dr. Saleh M. Al-Karaawy
Engineering Supervisor
Date: / 1 / 2007
Signature:
Name: Prof. Dr. Shawkat T. Al-Hiazay
Technical Supervisor
Date: / 1 / 2007
Examination Certificate
We certify that this thesis, entitled “Data Hiding in Arabic Text”, was
submitted by the student (Auday Jamal Fawzi), that we, as the Examining
Committee, examined the student in its content, and that, in our opinion,
it meets the standard of a thesis for the degree of Doctor of Philosophy in
Engineering Education Technology.
Signature:
Prof. Dr. Emad Al-Hussani
(member) Date: / 1 / 2007
Signature:
Assist Prof. Dr. Nasser K. Al-Ani
(member) Date: / 1 / 2007
Signature:
Assist Prof. Dr. Adnan Al-Sultani
(member) Date: / 1 / 2007
Signature:
Assist Prof. Dr. Ibtesam R. Karhiy
(member) Date: / 1 / 2007
Signature:
Prof. Dr. Hilal H. Saleh
(Chairman) Date: / 1 / 2007
Signature:
Assist Prof. Dr. Saleh M. Al-Karaawy
(Supervisor) Date: / 1 / 2007
Signature:
Prof. Dr. Shawket T. Al-Hiazay
(Supervisor) Date: / 1 / 2007
Signature:
Dr. Dhari Yousif Mahmood
(Head of the Technical Education Department) Date: / 1 / 2007
Dedication
I’d like to present this work to
My family with love
My teachers with respect
And to the memory of my teacher
Dr. Awatif Barsoum
Acknowledgment
I would like to express my deep gratitude to my supervisors,
Dr. Saleh M. Al-Karaawy and Dr. Shawkat T. Al-Hiazay, for their
willingness to discuss the research, their continual encouragement, and
their gentle and valuable comments.
Abstract
Interest in information, and in how it is transferred over networks that
link the entire world, has grown rapidly, and this growth calls for ways to
maintain the privacy and security of information. That need motivates this
research, whose goal is to find a new technique to cipher and hide
information in Arabic text documents.
To achieve this goal, a program was prepared to cipher a message and then
hide it in an Arabic paragraph. The program handles two file types, the
document (DOC) file and the Rich Text Format (RTF) file, both of which are
compatible with the Microsoft Word application.
The program ciphers a message and then hides it using the white space and
word shift methods, which are known from English paragraphs and are applied
here to hide a message in an Arabic text document. A further method hides
the message by taking advantage of the extension character (ـ) used in
Arabic text.
These methods were implemented, and the results show that a more efficient
method is still needed to achieve greater security. For this reason, a new
technique for hiding in Arabic text, named the “Unicode system method”, is
proposed; it uses the Arabic character codes to hide the message. After
implementing this technique on Arabic text, it was found that the target
file has the same size as the source file and that a third party cannot
distinguish the source file from the target file by eye, which makes the
hidden message difficult to break.
Because hiding is carried out by people who use the World Wide Web, and
because this network involves different operating systems, RTF files are
used. They serve as a standard for data transfer between word processing
software, as a document format, and as a means of migrating content from
one operating system to another.
Implementing the RTF file technique on Arabic text gives two advantages: a
third party cannot distinguish the source file from the target file by eye,
and the amount of information that can be hidden in the file increases. One
disadvantage is that the file size increases after the hiding process. To
avoid this problem, a subroutine was written to compress the file so that
the difference between its size and that of the source file is as small as
possible.
An educational program was prepared according to instructional design
concepts, using the tutorial method to present the concepts and information
of the ciphering and hiding processes. To ensure that the target population
benefits from the program, a questionnaire form was prepared so that a
number of experts and students could evaluate it. Based on the
questionnaire results and the feedback process, the program was developed
further to link the theoretical and practical sides of the research.
Table of Contents
Abstract
Contents
List of Symbols
List of Abbreviations
Chapter One: Research Foundations
1.1 Introduction
1.2 Information Security Concepts
1.3 Motivations to Use Steganography
1.4 Information Hiding Applications
1.5 Research Problem
1.6 Research Importance
1.7 Research Aims
1.8 Research Limits
1.9 Terminology
1.10 Literature Review
1.10.1 Engineering Literatures
1.10.2 Instructional Technology Literatures
1.11 Thesis Organization
Chapter Two: Theoretical Concepts of Data Hiding
2.1 Introduction
2.2 Cryptography
2.2.1 General Concepts
2.2.2 Stream Cipher
2.2.3 Random Number Generation
2.2.4 Shift Register Based Schemes
2.2.4.1 Linear Feedback Shift Register
2.2.4.2 Combination and Filter Generators
2.2.4.3 Multiplexers
2.2.4.4 Desirable Properties of LFSR-Based Keystream Generators
2.2.4.5 Life Cycle of a Key
2.2.5 Statistical Tests
2.3 Steganography (Data Hiding in Text)
2.3.1 General Concepts
2.3.2 Coding Methods
2.3.2.1 Open Space Methods
2.3.2.2 Syntactic Methods
2.3.2.3 Semantic Methods
2.3.2.4 Shift Coding
2.3.2.5 Feature Coding
2.3.3 Steganographic Protocols
2.3.3.1 Pure Steganography
2.3.3.2 Secret Key Steganography
2.3.3.3 Public Key Steganography
2.4 Unicode System
2.4.1 Characters
2.4.2 Arabic Characters
2.5 Data Compression
2.5.1 Static Huffman Coding
2.5.1.1 Encoding
2.5.1.2 Decoding
2.6 Technical Framework
2.6.1 Introduction
2.6.2 Instructional Design
2.6.3 Instructional Package
Chapter Three: The Proposed Hiding Algorithm
3.1 Introduction
3.2 Specification of the Proposed Software
3.3 The Proposed Software Structure
3.4 Operation of the Proposed Software
3.4.1 Providing a Plain Text and a Password
3.4.2 Load Microsoft Word File
3.4.3 Selecting Hiding Method
3.4.4 Debrief the Time from the Computer Clock
3.4.5 Generate Keystream
3.4.5.1 Labels
3.4.5.2 The Registers
3.4.5.3 Initialization Registers
3.4.5.4 Design Principles
3.4.5.5 Keystream Generation
3.4.5.6 Keystream Testing
3.4.6 Huffman Code
3.4.7 Check Document File
3.4.8 Encryption
3.4.9 Hide Cipher Text
3.4.9.1 Hyphen Method
3.4.9.2 White Space Method
3.4.9.3 Change Word Position Method
3.4.9.4 Unicode System Method
3.4.10 Hiding the Time
3.5 Hiding Data in a Rich Text Format (RTF) File
3.5.1 Contents of an RTF File
3.5.2 Paragraph Formatting Properties
3.5.3 Hiding Algorithm
3.5.4 Unhiding Algorithm
3.5.5 Compression Algorithm
3.6 Design Instructional Package
3.6.1 Analysis
3.6.2 Construction
3.6.3 Evaluation
3.6.4 Statistical Method
Chapter Four: Results and Discussion
4.1 Introduction
4.2 Ciphering and Hiding Data in .DOC Document Files
4.2.1 Open Document File
4.2.2 Select Hiding Method
4.2.3 Write the Message
4.2.4 Write the Password
4.2.5 Start Hiding Process
4.2.6 Hiding Data with Unicode System Method
4.2.7 Hiding Data with White Space Method
4.2.8 Hiding Data with Hyphen Method
4.2.9 Hiding Data with Change Position Method
4.3 Hiding Data in .RTF Document Files
4.3.1 Open Document File
4.3.2 Write the Message
4.3.3 Start Hiding Process
4.4 Discussion
4.5 Instructional Technology Side Results
4.5.1 Opinion List Results of Experts View Point Analysis
4.5.2 Questionnaire Results of Learners View Point Analysis
4.6 Conclusions
4.7 Recommendations
4.8 Suggestions
References
Appendixes
Appendix A Unicode Tables
Appendix B Program Subroutines
Appendix C Expert’s and Learner’s Questionnaire Forms
List of Symbols
X1 Frequency test value
X2 Serial test value
X3 Poker test value
X4 Run test value
X5 Autocorrelation test value
χ² Chi-square
List of Abbreviations
ANSI American National Standards Institute
AppWd Application Word Document
ASCII American Standard Code for Information Interchange
DOC Document file format
LFSR Linear Feedback Shift Register
RTF Rich Text Format
Unicode Universal Character Encoding Standard
XOR Exclusive OR gate
Chapter One
Research Foundations
1.1 Introduction
Steganography is the art of covered or hidden writing. Its purpose is
covert communication: hiding a message from a third party. This differs
from cryptography, the art of secret writing, which is intended to make a
message unreadable by a third party but does not hide the existence of the
secret communication. Although steganography is separate and distinct from
cryptography, there are many analogies between the two, and some authors
categorize steganography as a form of cryptography, since hidden
communication is a form of secret writing [1].
Steganography hides the covert message but not the fact that two
parties are communicating with each other. The steganography process
generally involves placing a hidden message in some transport medium,
called the carrier. The secret message is embedded in the carrier to form
the steganography medium. A steganography key may be employed to encrypt
the hidden message and to randomize the steganography scheme. In summary
[2]:
steganography_medium = hidden_message + carrier + steganography_key
As an increasing amount of data is stored on computers and
transmitted over networks, it is not surprising that steganography has
entered the digital age. On computers and networks, steganography
applications allow someone to hide any type of binary file in any other
binary file [3].
1.2 Information Security Concepts
Information security includes two fields, cryptography and
steganography:
1. Cryptography is the science of information security. The word is derived
from the Greek kryptos, meaning hidden. Cryptography is closely
related to the disciplines of cryptology and cryptanalysis. Cryptography
includes techniques such as microdots, merging words with images,
and other ways to hide information in storage or transit. However, in
today's computer-centric world, cryptography is most often associated
with scrambling plaintext (ordinary text, sometimes referred to as
cleartext) into ciphertext (a process called encryption), then back again
(known as decryption). Individuals who practice this field are known as
cryptographers [4].
2. Steganography, on the other hand (pronounced stehg-uh-nah-gruhf-ee,
from Greek steganos, or "covered," and graphie, or "writing"), is the art
of concealing the existence of information within seemingly innocuous
carriers. Steganography can be viewed as akin to cryptography. Both
have been used throughout recorded history as means to protect
information [4].
Steganography is the art of hiding signals inside other signals. This
basically comes down to using unnecessary bits (holes) in an innocent
file to store the sensitive data. The techniques used are intended to
make it impossible to detect that there is anything inside the innocent
file, while the intended recipient can still obtain the hidden data. A
further challenge is to fill these holes with data in a way that remains
invariant to a large class of host signal transformations [5, 6].
While cryptography is about protecting the content of messages
(their meaning), steganography is about concealing their very existence. It
is usually interpreted to mean hiding information in other information.
Examples include sending a message to a spy by marking certain letters in
a newspaper using invisible ink, and adding a sub-perceptible echo at
certain places in an audio recording. It is often thought that
communications may be secured by encrypting the traffic, but this has
rarely been adequate in practice [7].
1.3 Motivations to Use Steganography
Interest in this subject has grown rapidly over the last few years, for
many reasons [8, 9]:
1. The publishing and broadcasting industries have become interested in
techniques for hiding encrypted copyright marks and serial numbers in
digital films, audio recordings, books and multimedia products; an
appreciation of new market opportunities created by digital
distribution is coupled with a fear that digital works could be too easy
to copy.
2. Moves by various governments to restrict the availability of encryption
services have motivated people to study methods by which private messages
can be embedded in seemingly innocuous cover messages. The ease
with which this can be done may be an argument against imposing
restrictions.
3. Data must be protected from compromise or disclosure; for example, the
design for a new business system should be protected from disclosure.
4. People hide data because they do not want anyone else to see it.
5. People hide data for covert communication, which can be very effective
if no one expects communication to take place in that way.
6. Someone may not want anyone to see the data because it contains a virus
or Trojan.
1.4 Information Hiding Applications
Data hidden in text has a variety of applications, including copyright,
verification, authentication, and annotation. Making copyright information
inseparable from the text is one way for publishers to protect their products
in an era of increasing electronic distribution. Annotation can be used for
tamper protection. For example, if a cryptographic hash of the document is
encoded into the document itself, it is a simple matter to determine
whether or not the file has been changed. Verification is among the tasks
that could easily be performed by a server which, in this case, would
return the judgment “authentic” or “unauthentic” as appropriate [5].
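The tamper-protection idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name a hash function, so SHA-256 is assumed here, the function names are hypothetical, and the digest is assumed to be carried separately from the visible text (for example, hidden in the document's formatting) so that it does not alter the text it covers.

```python
import hashlib


def text_digest(text: str) -> str:
    """Return the SHA-256 digest of the visible text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify(text: str, embedded_digest: str) -> str:
    """Mimic the server's judgment described in the text."""
    return "authentic" if text_digest(text) == embedded_digest else "unauthentic"


original = "نص عربي قصير"
embedded = text_digest(original)           # digest that would be hidden in the file

unchanged_result = verify(original, embedded)
tampered_result = verify(original + " ", embedded)  # one extra space
print(unchanged_result, tampered_result)
```

Even a single added space changes the digest, which is why a hidden hash is enough to flag a modified file.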
1.5 Research Problem
The research problem can be summarized as follows:
1. The well-known methods for hiding information in a document file do
not offer an effective way to resist attacks. It is therefore time to
think about a new method that is suitable for Internet applications such
as e-mail.
2. With the spread of computers connected to networks for data transfer,
there is a need to transfer data securely between these computers.
1.6 Research Importance
The importance of the work comes from the following aspects:
1. The software could be used in governmental and special offices to hide
personal and security information on their local computers.
2. It is proposed to implement a computer network at the University of
Technology, and most of the files to be transferred between users are
in Arabic. There is therefore an opportunity to exploit this medium to
hide secure information (for example, to transmit secure data between
the university chairmanship and the departments).
3. This work could be used as a practical course in the field of
ciphering and hiding data in Arabic text.
4. Previous research hides information in English text only, while the
current research hides it in Arabic text.
5. Some previous research studies the information hiding technique from
the theoretical side only, while this research studies both its
theoretical and practical sides.
1.7 Research Aims
The main aims of this work are:
1. Design software that is capable of ciphering and hiding data in Arabic
text.
2. Design an instructional package to represent or view the scientific
concepts of cryptography and steganography.
1.8 Research Limits
This work is limited to the following:
1. Designing and implementing software that is capable of hiding
information in Arabic document files.
2. The document files used to hide data are Microsoft Word documents
with the extensions .DOC and .RTF.
3. The hidden message uses only Arabic characters.
4. Designing an instructional package to describe the concepts of
cryptography and steganography techniques in Arabic document files.
5. The software can be used by:
A. Computer engineers, computer scientists, and communication
engineers.
B. Postgraduate students in computer science, computer engineering,
and communication engineering departments.
6. The academic year 2005-2006 at the University of Technology.
1.9 Terminology
Several terms used in this research need to be defined:
1. Network: A network is defined as two or more computers linked
together for the purpose of communicating and sharing information
and other resources. Most networks are constructed around a cable
connection that links the computers. This connection permits the
computers to talk (and listen) through a wire.
2. Encoding: The process of transforming information from one
format into another. A character encoding is a code that pairs a set of
natural language characters (such as an alphabet) with a set of
something else, such as numbers.
3. Decoding: The reverse of encoding, that is, the process of
transforming encoded information back into its original format.
4. Package: An instructional package is one of the instructional design
programs; it consists of three elements: printed materials,
audio materials, and visual materials.
1.10 Literature Review
1.10.1 Engineering Literatures
1. Brassil, J. T., et al., “Copyright Protection for the Electronic
Distribution of Text Documents”, 1999. The researchers proposed a
watermarking method called word-shift coding. In this method, each
line is first divided into groups of words, each containing a sufficient
number of characters. Each even group is then shifted to the left or the
right according to the value of a specific bit in the payload. The odd
groups are used as references for measuring and comparing the
distances between the groups during the decoding stage. A correlation
method was suggested for detecting the watermark. This method
requires the original document, especially when the inter-word spacing
is variable [10].
2. Shaar, Mahmoud, et al., “A Hybrid Hiding Encryption Algorithm
for Data Communication Security”, 2003. The researchers present an
encryption algorithm that can be used in hardware-implemented
applications to secure data communications. The algorithm is based on
hiding a number of bits from the plaintext message in a random vector of
bits. The locations of the hidden bits are determined by a key known to
the sender and the receiver. The name of the paper reflects the two basic
operations of this algorithm, hiding and encryption: part of the
plaintext bits is inserted into a cover to hide it from recognition. No
conventional operations are performed on the ciphered text, just plain
hiding in a random bit string [11].
3. Kim, Young-Won, et al., “A Text Watermarking Algorithm based
on Word Classification and Inter-word Space Statistics”, 2003. The
researchers propose a text watermarking algorithm that exploits the
novel concepts of word classification and inter-word space statistics.
The words are classified using some features. Several adjacent words
are grouped into a segment, and the segments are also classified using
the word class information. The same amount of information is inserted
into each of the segment classes. The information is encoded by
modifying some statistics of the inter-word spaces of the segments
belonging to the same class. Several advantages over the conventional
word-shift algorithms come from the concepts of word and segment
classification and from using the statistical distributions of inter-word
spaces, whereas in the conventional algorithms individual lines or
words hide a portion of the total watermarking information independently
of other lines or words [12].
4. Sui, Xin-Giiang, and Lilo, Hui, “A New Steganography method
Based on Hypertext”, 2004. The researchers analyze the structure of
hypertext files and propose a new secure steganography method. The
method hides secret information in hypertext by modifying the written
states of the markup letters. Experiments and analysis indicate that the
method has high efficiency and security: because it modifies only the
markup letters and not the content itself, the stego-hypertext and the
cover show no difference in normal display. In addition, the algorithm
does not lengthen the file, since it modifies the markup letters instead
of adding letters [13].
5. Topkara, M., et al., “Natural Language Watermarking”, 2005. The
researchers discuss natural language watermarking, which uses the
structure of the sentence constituents in natural language text to
insert a watermark. This approach differs from the techniques
collectively referred to as “text watermarking,” which embed
information by modifying the appearance of text elements such as
lines, words, or characters. The goal of the paper is to review the
current state of the art in natural language watermarking. The type of
text being modified for watermarking has an important effect
on the process of evaluation. For example, when watermarking a
magazine article or a novel, the emphasis may be on the preservation of
the author’s style. On the other hand, when watermarking a cooking
recipe or a user manual, preserving the preciseness and jargon would be
more important [14].
6. Voloshynovskiy, S., et al., “Text Data-Hiding for Digital and Printed
Documents: Theoretical and Practical Considerations”, 2006. In this
paper, the researchers propose a new theoretical framework for the
data-hiding problem of digital and printed text documents. The main
idea of this interpretation is to consider a text character as a data
structure consisting of multiple quantifiable features such as shape,
position, orientation, size, color, etc. The researchers also introduce
color quantization, a new semi-fragile text data-hiding method that is
fully automatable, has a high information embedding rate, and can be
applied to both digital and printed text documents. The main idea of this
method is to quantize the color or luminance intensity of each character
in such a manner that the human visual system is not able to distinguish
between the original and quantized characters, while a specialized
reading machine can easily do so. The implementation of
this method in a digital-only environment is straightforward. In the
experiments, the researchers implemented a prototype for Microsoft
Office Word documents capable of embedding and extracting any
arbitrary message. The experimental work confirmed that this method
has high perceptual invisibility, a high information embedding rate, and
is fully automatable [15].
1.10.2 Instructional Technology Literatures
1. Uden, L., and Alderson, A., “Teaching and Learning Using
Instructional Design”, 2000. The researchers presented an Instructional
System Design module to final-year computing science students at
Staffordshire University. The aim was to teach the students the
various instructional design theories and Instructional System Design
processes, and to establish whether these theories and processes helped
students to understand their learning better and improve their work. They
concluded that applying instructional design theories and the
Instructional System Design processes offers many benefits in helping
students learn. It enables students to classify the subject into
learning outcomes using a taxonomy such as Gagne’s. The Instructional
System Design processes help students to identify the activities
involved in learning the subject. Finally, they also help students to
assess their learning against the appropriate learning outcomes [16].
2. Tubsree, Chalong, and Tubsree, Nai-Fen Yu, “Designing Effective
Instruction for Computer in Education Courses”, 2002. The researchers
set out to design and develop effective instruction for a computer
in education course. By the end of the study, seven instructional
packages had been developed. The researchers then evaluated the packages
by considering students’ performance after studying them. It
was found that all students performed at the mastery level and showed
high satisfaction with the problem-solving and construction tasks.
This indicated that the developed instructional packages helped
students learn [17].
3. Mushtaq, Rasha F., “Educational Package for Detecting hidden
Information Embedded in an Image”, 2006. The researcher aims to
design an educational package presenting the scientific concepts of
steganalysis by building an instructional computer program that uses
the tutorial method to display its content, and to have it evaluated
by experts. The study reached the following conclusions:
a. The instructional package helped the learners to develop
themselves, because the package provides feedback directly and
quickly.
b. The instructional package, as a learning device, has
psychological and educational impacts, because the learners
depend on themselves [18].
1.11 Thesis Organization
This thesis consists of four chapters. In addition to Chapter One, they are
as follows:
• Chapter Two: Presents the ideas behind cryptography and random
keystream generation, steganography and its hiding methods and
protocols, the Unicode system, and the data compression technique.
• Chapter Three: Presents the design of the proposed software, which
hides a message using five methods (white space method, word shift
method, hyphen method, Unicode method, and hiding in the RTF file
format).
• Chapter Four: Presents the software implementation and results, as
well as conclusions, recommendations and suggestions for future
work.
Chapter Two
Theoretical Concepts of Data Hiding
2.1 Introduction
Cryptography and steganography are effective methods used to
protect a plaintext message by encrypting and hiding it. The security of
the system is based on the difficulty of the inverse computation.
2.2 Cryptography
2.2.1 General Concepts
A method of encryption and decryption is called a cipher. Its goal
is to protect information from unauthorized users. Modern algorithms
use a key to control encryption and decryption; a message can be
decrypted only if the decryption key matches the encryption key.
There are two classes of key-based encryption algorithms,
symmetric (or secret-key) and asymmetric (or public-key) algorithms.
The difference is that symmetric algorithms use the same key for
encryption and decryption, whereas asymmetric algorithms use a different
key for encryption and decryption.
Symmetric algorithms can be divided into stream ciphers and block
ciphers. Stream ciphers can encrypt a single bit of plaintext at a time,
whereas block ciphers take a number of bits (typically 64 bits), and encrypt
them as a single unit.
Asymmetric ciphers (also called public-key cryptography) permit
the encryption key to be public, allowing anyone to encrypt with the key,
whereas only the proper recipient (who knows the decryption key) can
decrypt the message. The encryption key is also called the public key and
the decryption key is the private key or secret key [19].
There are hundreds of types of cipher systems, ranging from very
simple paper-and-pencil systems to very complex cipher-machine or
computer-based systems. These can be categorized as transposition,
substitution, or a combination of the two. In a transposition
system, the plaintext characters of a message are systematically rearranged.
After transposing a message, the same characters are still present, but the
order of the letters is changed. In a substitution system, the plaintext
characters of a message are systematically replaced by other characters.
After the substitution takes place, the order of the underlying plaintext is
unchanged, but the same characters are no longer present [19].
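The difference between the two categories can be seen in a small Python sketch. Both functions are toy ciphers chosen for illustration only (a fixed-width columnar transposition and a Caesar-style shift over A-Z); neither the function names nor the parameters come from the thesis, and neither cipher offers real security.

```python
def transpose(text: str, width: int = 4) -> str:
    """Toy columnar transposition: write the text in rows of `width`
    characters, then read it out column by column."""
    return "".join(text[col::width] for col in range(width))


def substitute(text: str, shift: int = 3) -> str:
    """Toy substitution: replace each letter A-Z by the letter `shift`
    positions later in the alphabet; other characters are unchanged."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


plain = "HIDDENMESSAGE"
transposed = transpose(plain)     # same letters, different order
substituted = substitute(plain)   # same order, different letters
print(transposed)
print(substituted)
```

Sorting the letters of `transposed` gives the same multiset as the plaintext, whereas `substituted` keeps every position but changes every letter.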
2.2.2 Stream Cipher
A stream cipher is a type of symmetric encryption algorithm. Stream
ciphers can be designed to be exceptionally fast, much faster than any
block cipher. While block ciphers operate on large blocks of data, stream
ciphers typically operate on smaller units of plaintext, usually bits. The
encryption of any particular plaintext with a block cipher will result in
the same ciphertext when the same key is used. With a stream cipher, the
transformation of these smaller plaintext units will vary, depending on
when they are encountered during the encryption process.
A stream cipher generates what is called a keystream (a sequence of
bits used as a key). Encryption is accomplished by combining the
keystream with the plaintext, usually with the bitwise XOR operation. The
generation of the keystream can be independent of the plaintext and
ciphertext, yielding what is termed a synchronous stream cipher, or it can
depend on the data and its encryption, in which case the stream cipher is
said to be self-synchronizing [20].
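A minimal Python sketch of a synchronous stream cipher follows, in which the keystream depends only on a seed and not on the data. It is an illustration under stated assumptions: `random.Random` stands in for a real keystream generator and is not cryptographically secure, the function names are hypothetical, and the cipher works on bytes rather than single bits for brevity.

```python
import random


def toy_keystream(seed: int):
    """Yield an endless stream of key bytes from a seeded generator.

    Assumption: random.Random is only a stand-in for a real keystream
    generator; it is NOT suitable for actual encryption.
    """
    rng = random.Random(seed)
    while True:
        yield rng.randrange(256)


def xor_stream(data: bytes, seed: int) -> bytes:
    """Combine each data byte with the next keystream byte using XOR."""
    return bytes(b ^ k for b, k in zip(data, toy_keystream(seed)))


message = "رسالة سرية".encode("utf-8")
ciphertext = xor_stream(message, seed=2007)
recovered = xor_stream(ciphertext, seed=2007)  # same keystream undoes the XOR
print(recovered.decode("utf-8"))
```

Because XOR is its own inverse, the same function serves for encryption and decryption as long as both sides regenerate the same keystream.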
Stream ciphers are generally faster than block ciphers. They are also
more appropriate, and in some cases mandatory, when buffering is limited
or when characters must be processed individually as they are received.
Because they have limited or no error propagation, stream ciphers may
also be advantageous in situations where transmission errors are highly
probable [21].
2.2.3 Random Number Generation
A random number generator is an algorithm that outputs a sequence
of 0s and 1s such that at any point, the next bit cannot be predicted based
on the previous bits. However, true random number generation is difficult
to do on a computer, since computers are deterministic devices: if the
same generator is run twice from the same starting state, it produces
identical results. True random number generators take input from something
in the physical world, for example, the rate of neutron emission from a
radioactive substance.
Because of these difficulties, random number generation on a
computer is usually only pseudo-random number generation. A pseudo-random
number generator starts from an initial value, called the seed, and
produces a sequence of bits that has a random-looking distribution. With
each different seed, the pseudo-random number generator generates a
different pseudo-random sequence [22].
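The role of the seed can be shown with a short sketch based on Python's standard `random` module. The helper name and the seed values are assumptions made for illustration; this general-purpose generator is not intended for cryptographic use.

```python
import random


def pseudo_random_bits(seed, count):
    """Return `count` pseudo-random bits produced from the given seed."""
    rng = random.Random(seed)  # deterministic generator initialised by the seed
    return [rng.getrandbits(1) for _ in range(count)]


first = pseudo_random_bits(seed=2007, count=16)
second = pseudo_random_bits(seed=2007, count=16)  # same seed, same sequence
other = pseudo_random_bits(seed=2008, count=16)   # different seed
```

Running the generator twice with the same seed reproduces the sequence exactly, which is what allows a receiver who knows the seed (key) to regenerate the same keystream.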
2.2.4 Shift Register Based Schemes
The vast majority of proposed keystream generators are based in
some way on the use of linear feedback shift registers because their
behavior is easily analyzed using algebraic techniques [23].
2.2.4.1 Linear Feedback Shift Register
A Linear Feedback Shift Register (LFSR) is a mechanism for
generating a sequence of binary bits (keystream). It consists of a number of
stages numbered from left to right as 0…L-1 with feedback from each to
stage 0, as shown in Figure (2.1). The contents of the L stages of a register
describe its state.
The register is controlled by a clock, and at each clock instant
the contents of stage i are moved to stage i+1. The contents of stage L-1 are
output and form part of the sequence, while the new contents of stage 0 are
calculated as some linear function of the previous contents of
stages 0…L-1. The particular function depends on the feedback used, where
each ci is a control bit that determines whether stage i takes part in the
feedback [23, 24].
Figure (2.1) Linear Feedback Shift Register
LFSRs are fast and easy to implement in both hardware and
software. With a judicious choice of feedback taps the sequences that are
generated can have a good statistical appearance. However, the sequences
generated by a single LFSR are not secure, because a powerful
mathematical framework developed over the years allows
for their straightforward analysis. Nevertheless, LFSRs are useful as
building blocks in more secure systems.
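The clocking rule described in this section can be sketched as follows. The register length (4 stages), the initial state, and the feedback control bits are assumptions chosen so the example stays small; they do not correspond to the 128-bit register used later in this thesis.

```python
def lfsr_keystream(state, taps, count):
    """Generate `count` output bits from a linear feedback shift register.

    state: contents of stages 0..L-1
    taps:  control bits c_0..c_{L-1}; stage i feeds back when c_i is 1
    """
    state = list(state)
    output = []
    for _ in range(count):
        output.append(state[-1])       # stage L-1 is the output
        feedback = 0
        for c, s in zip(taps, state):  # linear function: XOR of the tapped stages
            feedback ^= c & s
        state = [feedback] + state[:-1]  # stage i moves to stage i+1
    return output


# Assumed example: 4 stages, feedback from stages 0 and 3.
bits = lfsr_keystream(state=[1, 0, 0, 0], taps=[1, 0, 0, 1], count=30)
```

With these assumed taps the register passes through all 15 non-zero states before repeating, so the output has period 15, the largest possible for 4 stages.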
2.2.4.2 Combination and Filter Generators
When using linear feedback shift registers there are two obvious
ways to generate an alternative output. The first is to use several registers
in parallel and to combine their output in some cryptographically secure
way. A generator like this is conventionally called a combination
generator. Another alternative is to generate the output sequence as some
nonlinear function of the state of a single register; such a generator is
termed a filter generator [21].
2.2.4.3 Multiplexers
A multiplexer is a logic device that selects one input from a set of
inputs according to the value of another index input. The keystream
generator is conventionally described using two sequences and the
multiplexer is used to combine these two sequences in a highly nonlinear
way [23]. Figure (2.2) shows a two-to-one multiplexer block diagram.
Figure (2.2) Two-to-one multiplexer
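A two-to-one multiplexer that combines two bit sequences under the control of a selector sequence can be sketched as below. The three input lists are fixed values assumed for illustration; in a keystream generator they would be taken from shift registers.

```python
def multiplex(selector, input0, input1):
    """Two-to-one multiplexer: output input0[i] when selector[i] is 0, else input1[i]."""
    return [b if s else a for s, a, b in zip(selector, input0, input1)]


selector = [0, 1, 1, 0, 1]  # assumed selector sequence
input0 = [1, 1, 0, 0, 1]    # assumed first data sequence
input1 = [0, 1, 1, 1, 0]    # assumed second data sequence

combined = multiplex(selector, input0, input1)
```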
2.2.4.4 Desirable Properties of LFSR-Based Keystream Generators
For essentially all possible secret keys, the output sequence of an
LFSR-based keystream generator should have the following properties [21]:
1. Large period.
2. Large linear complexity.
3. Good statistical properties.
2.2.4.5 Life Cycle of a Key
Keys have limited lifetimes for a number of reasons. The most
important reason is protection against cryptanalysis. Each time the key is
used, it generates a number of ciphertexts. Using a key repetitively allows
an attacker to build up a store of ciphertexts which may prove sufficient for
a successful cryptanalysis of the key value. Thus keys should have a
limited lifetime [25].
[Figure (2.2) block labels: Data Input, Selector, Multiplexer, Data Output]
2.2.5 Statistical Tests
Several statistical tests are designed to measure the quality of a
generator purported to be a random bit generator; they help detect
certain kinds of weaknesses the generator may have. This is accomplished
by taking a sample output sequence of the generator and subjecting it to
various statistical tests. Each statistical test determines whether the
sequence possesses a certain attribute that a truly random sequence would
be likely to exhibit. If the sequence is deemed to have failed any one of the
statistical tests, the generator may be rejected as being non-random. On the
other hand, if the sequence passes all of the statistical tests, the generator is
accepted as being random (more precisely, it is not rejected). Five tests
are discussed below [21].
a. Frequency Test (Mono Bit Test)
The purpose of this test is to determine whether the number of 0’s
and 1’s in a binary sequence (s) are approximately the same, as would be
expected for a random sequence. Let n0 and n1 denote the number of 0’s
and 1’s in s, respectively. The statistics used is [21]:
$X_1 = \frac{(n_0 - n_1)^2}{n}$ … 2.1
where X1: Frequency test value
n: length of the sequence
X1 approximately follows a χ2 (Chi Square) distribution with
one degree of freedom if n ≥ 10.
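Equation 2.1 can be computed directly from the bit counts. The following is a minimal sketch; the function name and the sample sequence are assumptions made for illustration.

```python
def frequency_test(bits):
    """Frequency (mono-bit) statistic X1 = (n0 - n1)^2 / n of equation 2.1."""
    n = len(bits)
    n1 = sum(bits)  # number of 1's
    n0 = n - n1     # number of 0's
    return (n0 - n1) ** 2 / n


sample = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # assumed sample: six 1's, four 0's
x1 = frequency_test(sample)
```

A perfectly balanced sequence gives X1 = 0, and the value grows as the counts of 0's and 1's diverge.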
b. Serial Test (Two-Bit Test)
The purpose of this test is to determine whether the number of
occurrences of 00, 01, 10, and 11 as subsequences of s are approximately
the same, as would be expected for a random sequence. Let n0, n1 denote
the number of 0’s and 1’s in s, respectively, and let n00, n01, n10, and n11
denote the number of occurrences of 00, 01, 10, 11 in s, respectively.
Note that n00 + n01 + n10 + n11 = (n - 1), since the subsequences are
allowed to overlap. The statistic used is [21]:
$X_2 = \frac{4}{n-1} \left( n_{00}^2 + n_{01}^2 + n_{10}^2 + n_{11}^2 \right) - \frac{2}{n} \left( n_0^2 + n_1^2 \right) + 1$ … 2.2
where X2: Serial test value
X2 approximately follows a χ2 (Chi Square) distribution with
two degrees of freedom if n ≥ 21.
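A minimal sketch of equation 2.2 follows, counting the overlapping two-bit subsequences. The function name and the short test sequence are assumptions for illustration; a real test would use a sequence of at least 21 bits.

```python
def serial_test(bits):
    """Serial (two-bit) statistic X2 of equation 2.2."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    pairs = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(bits, bits[1:]):  # overlapping pairs: n - 1 in total
        pairs[(a, b)] += 1
    sum_pairs = sum(c * c for c in pairs.values())
    return 4 / (n - 1) * sum_pairs - 2 / n * (n0 ** 2 + n1 ** 2) + 1


# Assumed tiny example: pairs are 01, 11, 10 (one each), n0 = n1 = 2.
x2 = serial_test([0, 1, 1, 0])
```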
c. Poker Test
Let m be a positive integer such that $\lfloor n/m \rfloor \geq 5 \times 2^m$, and let $k = \lfloor n/m \rfloor$.
Divide the sequence s into k non-overlapping parts each of length m, and
let ni be the number of occurrences of the ith type of sequence of length m,
1 ≤ i ≤ 2^m. The poker test determines whether the sequences of length m
each appear approximately the same number of times in s, as would be
expected for a random sequence. The statistic used is [21]:
$X_3 = \frac{2^m}{k} \left( \sum_{i=1}^{2^m} n_i^2 \right) - k$ … 2.3
where X3: Poker test value
X3 approximately follows a χ2 distribution with 2^m − 1 degrees of
freedom.
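Equation 2.3 can be sketched as below. The function name and the short inputs are assumptions for illustration; the sketch does not enforce the condition ⌊n/m⌋ ≥ 5 × 2^m, which a real test would need to satisfy.

```python
def poker_test(bits, m):
    """Poker statistic X3 of equation 2.3 for non-overlapping parts of length m."""
    k = len(bits) // m  # number of complete parts; leftover bits are ignored
    counts = {}
    for j in range(k):
        part = tuple(bits[j * m:(j + 1) * m])
        counts[part] = counts.get(part, 0) + 1
    # Patterns that never occur contribute zero to the sum of squares.
    return (2 ** m / k) * sum(c * c for c in counts.values()) - k


uniform = poker_test([0, 0, 0, 1, 1, 0, 1, 1], m=2)  # each 2-bit pattern once
constant = poker_test([0] * 8, m=2)                  # only the pattern 00
```

When every pattern occurs equally often the statistic is 0; a sequence dominated by one pattern gives a large value.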
d. Runs Test
The purpose of the runs test is to determine whether the number of
runs of various lengths in the sequence s is as expected for a random
sequence. A run is a maximal subsequence of identical consecutive bits; a
run of 1's is called a block and a run of 0's is called a gap. The expected
number of gaps (or blocks) of length i in a random sequence of length n is
$e_i = (n - i + 3)/2^{i+2}$. Let k be equal to the largest integer i for
which $e_i \geq 5$. Let Bi, Gi be the number of blocks and gaps, respectively,
of length i in s for each i, 1 ≤ i ≤ k. The statistic used is [21]:
$X_4 = \sum_{i=1}^{k} \frac{(B_i - e_i)^2}{e_i} + \sum_{i=1}^{k} \frac{(G_i - e_i)^2}{e_i}$ … 2.4
where X4: Runs test value
X4 approximately follows a χ2 distribution
with 2k-2 degrees of freedom.
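The runs statistic of equation 2.4 can be sketched as follows, taking a block to be a run of 1's and a gap a run of 0's. The function name and the 160-bit sample (a 40-bit pattern repeated four times) are assumptions chosen for illustration.

```python
from itertools import groupby


def runs_test(bits):
    """Return (X4, k) for the runs test of equation 2.4."""
    n = len(bits)

    def expected(i):
        # Expected number e_i of blocks (or gaps) of length i.
        return (n - i + 3) / 2 ** (i + 2)

    k = 0
    while expected(k + 1) >= 5:  # largest i with e_i >= 5
        k += 1
    if k == 0:
        raise ValueError("sequence too short for the runs test")

    blocks = [0] * (k + 1)  # blocks[i]: number of runs of 1's of length i
    gaps = [0] * (k + 1)    # gaps[i]: number of runs of 0's of length i
    for bit, run in groupby(bits):
        length = len(list(run))
        if length <= k:     # longer runs are not used by the statistic
            (blocks if bit == 1 else gaps)[length] += 1

    x4 = 0.0
    for i in range(1, k + 1):
        e = expected(i)
        x4 += (blocks[i] - e) ** 2 / e + (gaps[i] - e) ** 2 / e
    return x4, k


# Assumed sample: a 40-bit pattern repeated four times (n = 160).
pattern = "11100" "01100" "01000" "10100" "11101" "11100" "10010" "01001"
sample = [int(c) for c in pattern * 4]
x4, k = runs_test(sample)
```

For this sample, k = 3 and the statistic is about 31.79, which would be compared against a χ2 distribution with 2k − 2 = 4 degrees of freedom.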
e. Autocorrelation Test
The purpose of this test is to check for correlation between the
sequence s and its (non-cyclic) shifted versions.
Let d be a fixed integer, 1 ≤ d ≤ ⌊n/2⌋. With the bits of s indexed
s_0, …, s_{n−1}, the number of bits in s not equal to their d-shifts is [21]:
$A(d) = \sum_{i=0}^{n-d-1} s_i \oplus s_{i+d}$ … 2.5
where ⊕ denotes the XOR operator. The statistic used is
$X_5 = 2 \left( A(d) - \frac{n-d}{2} \right) / \sqrt{n-d}$ … 2.6
where X5: Autocorrelation test value
X5 approximately follows a standard normal distribution N(0, 1) if n − d ≥ 10.
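Equations 2.5 and 2.6 can be sketched together. The function name and the short alternating sequence are assumptions for illustration; a real test would require n − d ≥ 10.

```python
from math import sqrt


def autocorrelation_test(bits, d):
    """Autocorrelation statistic X5 (equation 2.6) for shift d."""
    n = len(bits)
    if not 1 <= d <= n // 2:
        raise ValueError("d must satisfy 1 <= d <= floor(n/2)")
    # A(d): number of positions where s_i differs from s_{i+d} (equation 2.5).
    a = sum(bits[i] ^ bits[i + d] for i in range(n - d))
    return 2 * (a - (n - d) / 2) / sqrt(n - d)


alternating = [1, 0, 1, 0, 1, 0]                    # assumed sample
x5_shift1 = autocorrelation_test(alternating, d=1)  # every bit differs from its shift
x5_shift2 = autocorrelation_test(alternating, d=2)  # every bit equals its shift
```

Both large positive and large negative values indicate correlation, so a two-sided test is appropriate for this statistic.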
2.3 Steganography (Data Hiding in Text)
2.3.1 General Concepts
Soft-copy text is in many ways the most difficult place to hide data.
This is due largely to the relative lack of redundant information in a text
file as compared with a picture or a sound. While it is often possible to
make imperceptible modifications to a picture, even an extra letter or
period in text may be noticed by a casual reader. There are many methods
of encoding data in text, among them open space methods that encode
through manipulation of white space, syntactic methods that utilize
punctuation, and semantic methods that encode using manipulation of the
words themselves [26].
2.3.2 Coding Methods
There are many categories of coding methods; some of them are described below.
2.3.2.1 Open Space Methods
There are two reasons why the manipulation of white space in
particular yields useful results. First, changing the number of trailing
spaces has little chance of changing the meaning of a phrase or sentence.
Second, a casual reader is unlikely to take notice of slight modifications to
white space. There are three methods of using white space to encode data.
The methods exploit inter-sentence spacing, end-of-line spaces, and inter-
word spacing in justified text.
a. Inter-Sentence Spacing
The first method encodes a binary message into a text by
placing either one or two spaces after each terminating
character, e.g., a period (.) or comma (,). A single space
encodes a “0,” while two spaces encode a “1.” This method has
a number of inherent problems. First, it is inefficient, requiring a
great deal of text to encode very few bits: one bit per sentence
equates to a data rate of approximately one bit per 160 bytes,
assuming sentences are on average two 80-character lines of
text. Second, its ability to encode depends on the structure of the
text. Third, many word processors automatically set the number of
spaces after periods to one or two characters. Finally, inconsistent
use of white space is not transparent [26]. Figure (2.3) shows an
example of data hiding using the inter-sentence spacing method.
تعتبر وحدة المعالجة المركزية في الحاسب من أهم الأجزاء بل أهمها على الإطـلاق لأنها بمثابة العقل في الجهاز، كما أنها تعمل على إنجاز كافة العمليات الحـسابية فـي سرعات مذهلة، بالإضافة إلى معالجة مختلف أنواع البيانات والتنسيق بين جميع أجزاء
ج من أكثر الأجهزة تعقيدا، حيـث يحتـوي علـى ملايـين الحاسب، ويعتبر المعال الترانزستورات والتي تترابط مع بعضها البعض بواسطة شعيرات معدنية من الزجاج
.المصهور والتي لها سمكها أرق مئات المرات من سمك الشعرة الواحدة للإنسان
Figure (2.3) Example of data hidden using Inter-sentence spacing
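The inter-sentence spacing method can be sketched as below. The function names and the English sample sentences are assumptions for illustration; the sketch treats only the period as a terminating character and carries exactly one bit per sentence boundary.

```python
import re


def embed_inter_sentence(sentences, bits):
    """Join sentences with one space for a 0 bit and two spaces for a 1 bit."""
    if len(bits) != len(sentences) - 1:
        raise ValueError("need exactly one bit per sentence boundary")
    text = sentences[0]
    for bit, sentence in zip(bits, sentences[1:]):
        text += ("  " if bit else " ") + sentence
    return text


def extract_inter_sentence(text):
    """Read one bit from the spacing that follows each period."""
    gaps = re.findall(r"\.( {1,2})(?=\S)", text)
    return [1 if gap == "  " else 0 for gap in gaps]


sentences = [  # assumed cover text
    "The processor is the most important part of the computer.",
    "It performs all arithmetic operations.",
    "It coordinates the other parts.",
    "It contains millions of transistors.",
]
stego = embed_inter_sentence(sentences, [1, 0, 1])
hidden = extract_inter_sentence(stego)
```

Since only space characters are changed, the same approach applies to Arabic cover text, provided the software handling the file does not normalize the spacing.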
b. End-of-Line Spaces
A second method of exploiting white space to encode data is to
insert spaces at the end of lines. The data are encoded by allowing for a
predetermined number of spaces at the end of each line. Two spaces
encode one bit per line, four encode two, eight encode three, etc.,
dramatically increasing the amount of information that can be encoded over
the previous method. In Figure (2.4), the text has been selectively
justified, and spaces have then been added to the end of lines to encode
more data. Additional advantages of this method are that it can be used
with any text and that it will go unnoticed by readers, since the additional
white space is peripheral to the text. As with the previous method, some
programs, e.g., “sendmail,” may inadvertently remove the extra space
characters. A problem unique to this method is that the hidden data
cannot be retrieved from hard copy [26].
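One possible reading of the end-of-line method is sketched below: each line carries a fixed number of bits, stored as the number of trailing spaces (between 0 and 2^b − 1 for b bits per line). This mapping, the function names, and the sample lines are assumptions for illustration, not the exact scheme of [26].

```python
def embed_end_of_line(lines, bits, bits_per_line=2):
    """Hide `bits` as trailing spaces, `bits_per_line` bits on each line."""
    if len(bits) % bits_per_line:
        raise ValueError("bit count must be a multiple of bits_per_line")
    values = [
        int("".join(str(b) for b in bits[i:i + bits_per_line]), 2)
        for i in range(0, len(bits), bits_per_line)
    ]
    if len(values) > len(lines):
        raise ValueError("not enough lines to carry the message")
    stego = []
    for index, line in enumerate(lines):
        line = line.rstrip(" ")            # discard any existing trailing spaces
        if index < len(values):
            line += " " * values[index]    # the space count encodes the bits
        stego.append(line)
    return stego


def extract_end_of_line(lines, bit_count, bits_per_line=2):
    """Recover `bit_count` bits from the trailing spaces of each line."""
    bits = []
    for line in lines:
        trailing = len(line) - len(line.rstrip(" "))
        bits.extend(int(c) for c in format(trailing, f"0{bits_per_line}b"))
    return bits[:bit_count]


cover = ["alpha", "beta", "gamma"]  # assumed cover lines
stego_lines = embed_end_of_line(cover, [1, 0, 0, 1, 1, 1])
recovered_bits = extract_end_of_line(stego_lines, bit_count=6)
```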
Figure (2.4) Example of data hidden using End-of-line spaces
جزاء بل أهمها على الإطلاق لأنهـا تعتبر وحدة المعالجة المركزية في الحاسب من أهم الأ بمثابة العقل في الجهاز، كما أنها تعمل على إنجاز كافة العمليات الحسابية في سـرعات مذهلة، بالإضافة إلى معالجة مختلف أنواع البيانات والتنـسيق بـين جميـع أجـزاء
لـى ملايـين الحاسب، و يعتبر المعالج من أكثر الأجهزة تعقيدا، حيـث يحتـوي ع الترانزستورات والتي تترابط مع بعضها البعض بواسطة شعيرات معدنيـة مـن الزجـاج
. المصهور والتي لها سمكها أرق مئات المرات من سمك الشعرة الواحدة للإنسان
c. Inter-Word Spacing
A third method of using white space to encode data involves
left-justification of text. Data are encoded by controlling where the extra
spaces are placed: one space between words is interpreted as a “0”, and
two spaces are interpreted as a “1”. This method results in several bits
encoded on each line, as shown in Figure (2.5). Because of constraints
upon justification, not every inter-word space can be used as data. To
determine which of the inter-word spaces represent hidden data
bits and which are part of the original text, a Manchester-like encoding
method can be used. Manchester encoding groups bits in
sets of two, interpreting “01” as a “1” and “10” as a “0”; the bit strings
“00” and “11” are null. For example, the encoded message
“01100101010001” is reduced to “101111”, while “110011” is a null
string [26].
Figure (2.5) Example of data hidden using Inter-word spacing
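The Manchester-like decoding rule described above can be sketched as follows; the function name is an assumption for illustration. It reproduces the two examples given in the text.

```python
def manchester_like_decode(encoded):
    """Decode pairs of bits: "01" -> "1", "10" -> "0", "00" and "11" -> null."""
    mapping = {"01": "1", "10": "0"}
    pairs = [encoded[i:i + 2] for i in range(0, len(encoded) - 1, 2)]
    return "".join(mapping.get(pair, "") for pair in pairs)


message = manchester_like_decode("01100101010001")
null_message = manchester_like_decode("110011")
```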
2.3.2.2 Syntactic Methods
There are many circumstances where punctuation is ambiguous or
where mispunctuation has low impact on the meaning of the text. For
example, the phrases “bread, butter, and milk” and “bread, butter and milk”
are both considered correct usage of commas in a list. By exploiting the fact
that the choice of form is arbitrary, alternation between forms can
represent binary data, e.g., any time the first phrase structure (characterized
by a comma appearing before the “and”) occurs, a “1” is inferred, and
any time the second phrase structure is found, a “0” is inferred. Other
examples include the controlled use of contractions and abbreviations.
While written English affords numerous cases for the application of
syntactic data hiding, these situations occur infrequently in typical prose.
The expected data rate of these methods is on the order of only several bits
per kilobyte of text.
Although many of the rules of punctuation are ambiguous or
redundant, inconsistent use of punctuation is noticeable to even casual
readers. Finally, there are cases where changing the punctuation will
impact the clarity, or even meaning, of the text considerably. This method
should be used with caution. Figure (2.6) shows the data hiding with this
method.
Figure (2.6) Example of data hidden using Syntactic methods
2.3.2.3 Semantic Methods
Semantic methods are similar to the syntactic methods. Rather than
encoding binary data by exploiting ambiguity of form, these methods
assign a primary or secondary value to each of two synonyms. For example,
the word “big” could be considered primary and “large” secondary. Whether
a word has primary or secondary value bears no relevance to how often it
will be used, but, when decoding, primary words will be read as ones and
secondary words as zeros.
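The synonym scheme can be sketched as below. The pair ("big", "large") is taken from the text; the second pair, the function names, and the sample sentence are hypothetical additions for illustration.

```python
# (primary, secondary) synonym pairs; the second pair is a hypothetical example.
PAIRS = [("big", "large"), ("small", "little")]
PRIMARY = {p: s for p, s in PAIRS}    # primary word -> its secondary synonym
SECONDARY = {s: p for p, s in PAIRS}  # secondary word -> its primary synonym


def embed_semantic(words, bits):
    """Replace each synonym in `words` so that primary = 1 and secondary = 0."""
    remaining = iter(bits)
    result = []
    for word in words:
        if word in PRIMARY or word in SECONDARY:
            primary = word if word in PRIMARY else SECONDARY[word]
            bit = next(remaining, None)
            if bit is None:
                result.append(word)  # no bits left: keep the word unchanged
            else:
                result.append(primary if bit else PRIMARY[primary])
        else:
            result.append(word)
    return result


def extract_semantic(words):
    """Read primary words as ones and secondary words as zeros."""
    bits = []
    for word in words:
        if word in PRIMARY:
            bits.append(1)
        elif word in SECONDARY:
            bits.append(0)
    return bits


cover_words = "a big dog chased a small cat near a large house".split()
stego_words = embed_semantic(cover_words, [0, 1, 1])
decoded = extract_semantic(stego_words)
```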
2.3.2.4 Shift Coding
a. Line-Shift Coding
Line-shift coding is very easy to perform and is considered the
most resistant to degradation due to copying. In line-shift coding, the
lines of text are shifted vertically to encode the document, see
Figure(2.7).
Figure (2.7) Example of data hidden using Line-shift coding
By determining which lines have been shifted, the
encoded bits can be discovered. Although this method
withstands copying, the human eye and other measurements can
easily detect it. It can also be easily defeated through respacing
or reformatting of the text.
b. Word-Shift Coding
Word-shift coding can also be easily done. In word-shift
coding, code words are coded into a document by shifting the
horizontal location of words within lines of text, see Figure (2.8). In
doing so, the appearance of natural spacing must be maintained
in order not to arouse suspicion. By determining the locations
where unnatural spacing has occurred, the encoded bits can be
revealed.
Figure (2.8) Example of data hidden using word-shift coding
There are advantages in using word-shift coding instead of line-shift
coding. Word-shift coding is less obvious to the unsuspecting
reader, since readers are used to reading text that has been justified for a
better presentation. However, there are also ways in which word-shift coding
can be detected. If an attacker knows the spacing algorithm, the attacker
can calculate the differences in spacing and work out the encoded data.
Like line-shift coding, word-shift coding can also be easily defeated
through respacing or justification of the text.
2.3.2.5 Feature Coding
Feature coding is another way of embedding data into a text file. In
feature coding, certain text features are altered depending on the embedded
data. For example, one type of feature coding would be extending the
vertical lines of characters such as “l”, “d”, “b”, “h”, “p”, and “q”. In order
for this type of feature coding to work, the text must be altered by
randomizing the lengths of the vertical lines before applying this algorithm.
The randomness will help the text look less suspicious to its readers.
To decode, the text after the randomization but before the embedding
can be compared with the text containing the embedded data to retrieve
the encoded bits. This type of
feature coding can be easily defeated if the vertical line length is adjusted
to a fixed length before the file is opened.
2.3.3 Steganographic Protocols
There are basically three types of steganographic protocols:
Pure Steganography, Secret Key Steganography and Public Key
Steganography [27].
2.3.3.1 Pure Steganography
Pure Steganography requires no prior exchange of secret information, such
as a stego-key, between the communicating parties. It is the least secure
means by which to communicate secretly, because the sender and receiver can
rely only upon the presumption that no other parties are aware of the secret
message. On open systems such as the Internet, this presumption does not hold.
2.3.3.2 Secret Key Steganography
Secret Key Steganography is defined as a steganographic system that
requires the exchange of a secret key (stego-key) prior to communication.
Here, steganography takes a cover message and embeds the secret message
inside it by using a secret key (stego-key). Only the parties who know the
secret key can reverse the process and read the secret message. Unlike Pure
Steganography, where a perceived invisible communication channel is
present, Secret Key Steganography exchanges a stego-key, which makes it
more susceptible to interception. The benefit of Secret Key Steganography
is that, even if the communication is intercepted, only parties who know the
secret key can extract the secret message.
2.3.3.3 Public Key Steganography
Public Key Steganography takes its concepts from Public Key Cryptography.
It is defined as a steganographic
system that uses a public key and a private key to secure the
communication between the parties wanting to communicate secretly. The
sender uses the public key during the encoding process, and only the
private key, which has a direct mathematical relationship with the public
key, can decipher the secret message. Public Key Steganography provides a
more robust way of implementing a steganographic system because it can
utilize a much more robust and researched technology in Public Key
Cryptography. It also has multiple levels of security in that unwanted
parties must first suspect the use of steganography and then they would
have to find a way to crack the algorithm used by the public key system
before they could intercept the secret message.
2.4 Unicode System
Unicode is a universal character encoding standard, designed to
represent text for computer interchange, processing, and display of many
modern written languages. It was originally designed as a 16-bit encoding
(its code space has since been extended beyond 16 bits) and encompasses
many characters used in general text interchange throughout the world,
including the principal written languages of Europe, America, the Middle
East, India, Africa, Asia, and Pacifica. Each Unicode index (code point)
refers unambiguously to a given character. Unicode allows a larger range of
characters to be addressed than is possible using a single-byte character
encoding [28]. Figure (2.9) shows the layout of this encoding system.
Figure (2.9) Unicode's encoding layout
2.4.1 Characters
A character is the smallest component of written language that has
semantic value. It refers to the abstract meaning and/or shape, rather than
a specific shape, though in code tables some form of visual representation
is essential for the reader to understand [28].
2.4.2 Arabic Characters
Arabic script is a cursive writing system used for the Arabic
language. The appearance of a letter changes depending on its
context/position: isolated, initial (joined on the left), medial (joined on both
sides), and final (joined on the right). The Arabic code points in the range
U+0600 - U+06FF of the Unicode table (Appendix A) represent the letters
without regard to their position; it is up to the font to show the letter with
the proper appearance. For compatibility with existing standards, Unicode
also defines code points with explicit positions for most letters (Arabic
Presentation Forms-A and Arabic Presentation Forms-B) [29].
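The relation between a position-independent Arabic letter and its explicit positional code points can be inspected with Python's standard `unicodedata` module. The sketch below uses the letter BEH (U+0628) and its four code points in Arabic Presentation Forms-B as an example; the variable names are assumptions for illustration.

```python
import unicodedata

BEH = "\u0628"  # position-independent letter in the U+0600 - U+06FF range

# Explicit positional code points for BEH (Arabic Presentation Forms-B).
positional_forms = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

for position, form in positional_forms.items():
    # Compatibility normalization (NFKC) maps each positional form
    # back to the position-independent letter.
    base = unicodedata.normalize("NFKC", form)
    print(position, f"U+{ord(form):04X}", unicodedata.name(form),
          "->", f"U+{ord(base):04X}")
```

All four positional code points look different on screen but normalize to the same underlying letter, which is the property that position-based hiding schemes can draw on.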
2.5 Data Compression
In computer science, data compression is the process of encoding
information using fewer bits than a more obvious representation would
use [30].
As is the case with any form of communication, compressed data
communication only works when both the sender and receiver of the
information understand the encoding scheme. Compression is important
because it helps to reduce the consumption of expensive resources, such as
disk space or connection bandwidth.
The task of compression consists of two components, an encoding
algorithm that takes a message and generates a “compressed”
representation (hopefully with fewer bits), and a decoding algorithm that
reconstructs the original message or some approximation of it from the
compressed representation [31].
There are lossless and lossy forms of data compression. Lossless data
compression is used when the data has to be uncompressed exactly as it
was before compression. Text files are stored using lossless techniques,
since losing a single character can in the worst case make the text
dangerously misleading. Lossy compression, in contrast, works on the
assumption that the data doesn't have to be stored perfectly. Much
information can be simply thrown away from images, video data, and audio
data, and when uncompressed such data will still be of acceptable
quality [32].
2.5.1 Static Huffman Coding
The basic idea in Huffman coding is to assign short codewords to
those input blocks with high probabilities and long codewords to those with
low probabilities. It is a variable length coding technique that provides a
systematic approach to designing a variable length code which is best for
a given finite-alphabet source.
The Huffman algorithm uses the notion of prefix code. A prefix code
is a set of words containing no word that is a prefix of another word of the
set. The advantage of such a code is that decoding is immediate. Moreover
it can be proved that this type of code does not weaken the compression.
A prefix code on the binary alphabet {0,1} corresponds to a binary
tree in which the links from a node to its left and right children are labeled
by 0 and 1 respectively. Such a tree is called a (digital) tree. Leaves of the
tree are labeled by the original characters and labels of branches are the
words of the code (codewords of characters). Working with prefix code
implies that codewords are identified with leaves only. Moreover, in the
present method codes are complete: they correspond to complete trees, i.e.,
trees in which every internal node has exactly two children.
In the model where characters of the text are given new codewords,
the Huffman algorithm builds a code that is optimal in the sense that the
compression is the best possible: the length of the encoded text is
minimal. The code depends on the input text, and more precisely on the
frequencies of characters in the text. The most frequent characters are given
short codewords while the least frequent symbols correspond to the longest
codewords [33].
2.5.1.1 Encoding
The complete compression algorithm is composed of three steps:
count of character frequencies, construction of the prefix code, encoding of
the text. The last two steps use information computed by their preceding
step [33].
The first step consists of counting the number of occurrences of each
character in the original text. This step can be skipped if fixed
statistics on the alphabet are used; in that case, however, the method is
optimal according to those statistics but not necessarily for the specific text.
The second step builds the tree of a prefix code, called a
Huffman tree, using the frequency freq(a) of each character a, as given in
Algorithm (2.1).
Algorithm (2.1) Creating Huffman tree
Create a one-node tree t for each character a,
    setting weight(t) = freq(a) and label(t) = a
Repeat until only one tree remains:
    Extract the two least weighted trees t1 and t2
    Create a new tree t3 having left subtree t1, right subtree t2,
        and weight(t3) = weight(t1) + weight(t2)
    Add t3 to the set of trees
Figure (2.10) shows an example of the Huffman tree.
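Algorithm (2.1) can be sketched with a priority queue as follows. The function name is an assumption, and the weights are the probabilities of Figure (2.10) expressed as integer percentages; the exact 0/1 labels may differ from the figure, since ties and left/right order are a matter of convention.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Build a Huffman code from {character: weight} and return {character: codeword}."""
    order = count()  # tie-breaker so that trees themselves are never compared
    heap = [(weight, next(order), char) for char, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two least weighted trees
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (t1, t2)))
    _, _, tree = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a character
            codes[node] = prefix or "0"

    walk(tree, "")
    return codes


# Weights from Figure (2.10), as percentages.
codes = huffman_code({"A": 16, "B": 51, "C": 9, "D": 13, "E": 11})
```

With these weights the most frequent character B receives a one-bit codeword and the other four characters receive three-bit codewords, giving an average of 1.98 bits per character.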
Figure (2.10) Example of a Huffman code represented as a binary tree
2.5.1.2 Decoding
Decoding a file containing a text compressed by Huffman algorithm
is a mere programming exercise. First the coding tree is rebuilt and then the
original text is recovered by parsing the compressed text with the coding
tree. The process begins at the root of the coding tree and follows a left
edge when a 0 is read or a right edge when a 1 is read. When a leaf is
encountered the corresponding character (in fact the original codeword of
it) is produced and the parsing resumes at the root of the tree. The process
ends when the codeword of the end marker is encountered [33].
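The tree-walking decoder described above can be sketched as below. The coding tree is an assumed example written as nested (left, right) pairs for the characters of Figure (2.10); the handling of an end marker is omitted for brevity.

```python
# Assumed coding tree: each internal node is a (left, right) pair, leaves are characters.
TREE = ((("C", "E"), ("D", "A")), "B")


def huffman_decode(bits, tree):
    """Decode a string of '0'/'1' characters by walking the coding tree."""
    decoded = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]  # left edge on 0, right edge on 1
        if not isinstance(node, tuple):            # reached a leaf
            decoded.append(node)
            node = tree                            # resume at the root
    if node is not tree:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(decoded)


text = huffman_decode("1011000", TREE)
```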
2.6 Technical Framework
2.6.1 Introduction
Instructional design has developed as a prescriptive science based on
a system approach linking basic research in the psychological processes of
learning with concrete solutions to instructional problems such as optimal
learning retention and transfer.
[Figure (2.10) node labels: leaves p(A)=0.16, p(D)=0.13, p(E)=0.11, p(C)=0.09, p(B)=0.51; internal nodes p(AD)=0.29, p(CE)=0.20, p(ADCE)=0.49; root p(ADCEB)=1.00; branches labeled 0 and 1]
Many students in higher education find learning difficult, especially
when it comes to understanding the course content and doing their
coursework or assignments. Students are complaining that they do not
know what is expected of them in their subjects. Although they attend
lectures regularly, they often fail to know what they are supposed to know.
This is especially true when it comes to the examinations. Students have
little idea what they should be revising and what types of questions they
would expect to be asked. This results in students becoming demoralized
and frustrated. This problem can be resolved by the adoption of
instructional design research in teaching [16].
2.6.2 Instructional Design
Instructional design is concerned with understanding, improving and
applying methods of instruction. It is a process of deciding what methods
of instruction are best for bringing about desired changes in student
knowledge and skills for a specific student population [34]. Instructional
design has evolved over the past half-century from an initial
narrow focus on programmed instruction to a multidimensional field of study,
integrating psychology, education, measurement and management [35].
Instructional design theory is a set of prescriptions for determining
the appropriate instructional strategies to enable learners to acquire
instructional goals. The theory is prescription-based and founded on
instructional theory and related disciplines. The emphasis is on what works
rather than on the steps to carry out the design and development process [15].
Instructional design [36] is a set of procedures for systematically
designing and developing instructional materials. The emphasis is primarily
on what to do, rather than how to do it or why it works. Instructional design
has many variations, but all involve seven basic phases [37]:
1. Planning.
2. Classification.
3. Analysis.
4. Construction.
5. Implementation.
6. Evaluation.
7. Development.
At the most general level, Instructional design is a process for
determining what to teach and how to teach it. The assumption is made that
there is a target population that should learn something. To determine what
is to be learned, the designer analyses a goal statement to identify
subordinate skills and formulates specific objectives and associated criteria
referenced assessments [16].
2.6.3 Instructional Package
An instructional package is a program that has the ability to create
instructional events by interacting with the user. This makes the learning
sequential and graded in continual steps [38].
The instructional package, in general, is formed from the following items:
1. Title which represents the title of the package.
2. Introduction that shows the idea of the contents.
3. Target community identification.
4. Instructional target which can be measured and observed by the
learner to expect what he will do during his study of the package.
5. Help about using the package.
6. Contents of the package units show the units used by the package.
7. Pre-test to know the skills of the learner.
8. Instructional activities and alternatives which are suitable for learner
characteristics and take into consideration the personal differences.
9. Exercise shows the range of the package benefit, which contains
feedback.
10. Post-test which is the final test used after finishing all units to
determine whether the aims of the package are achieved.
Chapter Three
The Proposed Hiding Algorithm
3.1 Introduction
The proposed method provides a new technique for encrypting and
hiding information in Arabic document files, demonstrating how one
can easily encode and embed a secret message in a text file format.
First, a linear feedback shift register with a 128-bit key is proposed
to build a stream cipher that generates a binary keystream sequence, which
is combined by exclusive-OR with the plain message to obtain the ciphered
message.
Second, a new method is proposed to hide the ciphered message in an
Arabic text file, benefiting from the characteristics of the Unicode system.
3.2 Specification of the Proposed Software
The software hides the information in a file with the extension (.DOC),
which means that it is fully compatible with Microsoft Word, a part
of Microsoft Office. This lets everyone use the program to hide
information in Microsoft Word document files.
The software is written in the Visual Basic language, whose features
are used to design an information hiding editor and to manage the
Microsoft Word objects that handle Microsoft Word files.
The process of hiding a stream of information in a file can be
achieved using the following hiding methods:
1. Using hyphens.
2. Using spaces between words.
3. Change the word position (to: right, left, up, or down).
4. Unicode system method.
5. Hiding in RTF file format.
An essential part of creating a useful program is providing a simple
and consistent way for the user to interact with it. Menus and toolbars
provide a quick, convenient, and widely accessible way to expose simple
commands and options to the user. They are easy to customize and are
controlled through the Visual Basic language, and the program is written in a
way that makes future modifications easy.
3.3 The Proposed Software Structure
The program consists of several tasks, each designed to perform a
specific operation. Data encrypting and hiding comprise the following
steps:
1. Provide a plain text (to hide) and a password (for encryption).
2. Read the time from the computer clock.
3. Mix the password with the time.
4. Convert the text to a binary stream (with Huffman coding).
5. Initialize the linear feedback shift registers.
6. Generate the keystream.
7. Test the keystream.
8. Check the document file.
9. Encrypt the plain text with the keystream to get the cipher text.
10. Hide the cipher text in the document file.
11. Hide the time in the document file.
Figure (3.1) shows the steps of the proposed software.
Figure (3.1) Steps of data encrypting and hiding
[Flowchart boxes: Enter text to hide; Enter password for encryption;
Debrief time from computer clock; Mix password with time to produce
encryption key; Convert plain text to binary stream (using Huffman
coding); Initialize the linear feedback shift register; Generate
keystream; Test keystream (Pass / Fail); Check paragraph (Pass / Fail);
Encrypt plain text (binary) with keystream to get cipher text; Hide cipher
text in document file; Hide time in document file; End process.]
3.4 Operation of the Proposed Software
To explain the operation of the proposed software, the following
subsections illustrate each step described in section (3.3).
3.4.1 Providing a Plain Text and a Password
The user provides a text (to be hidden) and a password (for
encryption). The password consists of ten characters (letters, digits, or
a mixture of both).
3.4.2 Loading Microsoft Word File
A document file can be loaded from the File menu. It must be
compatible with the Microsoft Word editor and contain text written in the
Arabic language.
Automating Word from Visual Basic allows the programmer to
export, edit, and return data by referencing another application's objects,
properties, and methods. Application objects that are referenced in another
application are called Automation objects. The first step toward making
Word available to Visual Basic for Automation is to create a reference to
the Word type library. This reference is created by clicking References on
the Tools menu in the Visual Basic Editor and then selecting the check box
next to Microsoft Word 8.0 Object Library.
The program then creates a Word Application object and assigns it to
appWd, after which the objects, properties, and methods of the Word
Application object can be used. The following example opens an existing
Word document.
appWd.Documents.Open filename
3.4.3 Selecting Hiding Method
One of the hiding methods can be selected from the Select methods
submenu of the Stego menu. The menu offers four methods to hide the
information in the Arabic text: the Unicode system method, the white space
method, the hyphen method, and the change position method.
3.4.4 Debriefing the Time from the Computer Clock
The program first reads the time from the computer clock when the
process starts. The process is started by pressing the hide icon on the
toolbar, so the program gets the time (in seconds) from the computer clock
at that moment. The time consists of eight characters, for example
52170.63: the first five digits represent whole seconds and the two digits
after the point represent the fraction of a second. The point is ignored,
leaving seven pure digits. These seven digits are mixed with the ten
characters of the password entered by the user according to the following
steps:
• Convert each character of the password to its ASCII code to create an
encryption key of twenty digits (each character is represented by its
two-digit ASCII code).
• Multiply specific digit pairs of the encryption key by the digits of the
time to generate a new encryption key. Even if the same password is used
many times, a different encryption key is generated to encode the data,
because the computer clock time differs at each moment.
• For example, if the password provided is “D6JU3SHU80”, the
encryption key will be “68547485518372855648”. If the time at the start of
the process is “72270.03”, the new encryption key will be
“68532191356072805192”, obtained through the algorithm shown in
figure (3.2).
In the flowchart, the first value between the brackets represents the
digit position and the second represents the number of digits. For example,
if i=1, old_key takes two digits from position four (47) and time takes one
digit from position one (7), so the equation
new_key(4) = old_key(4,2) * time(1,1)
becomes
new_key(4) = 47 * 7
The condition (i=6) in the flowchart is used to skip location six in the
time string, which holds the point, for example “72270.03”.
Figure (3.2) Generation process of the encryption key
[Flowchart boxes: Start; i = 1; decision i = 6 (Yes / No);
new_key(2+(i*2)) = old_key(2+(i*2),2) * time(i,1); i = i + 1;
decision i <= 8 (Yes / No); End.]
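The key-mixing step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes that each product is written back into the key in place as a decimal string starting at the same digit position, so that later steps read the already modified digits. That reading is not stated in the text, but it reproduces the worked example above.

```python
def ascii_key(password):
    """Concatenate the decimal ASCII codes of the password characters.

    Assumes every character has a two-digit code (digits and upper-case
    letters), so a ten-character password yields a twenty-digit key.
    """
    return "".join(str(ord(ch)) for ch in password)


def mix_key_with_time(key, time_str):
    """Mix the clock string (for example "72270.03") into the key.

    Positions are 1-based, as in the flowchart: for i = 1..8 (skipping
    i = 6, the decimal point) the two digits at position 2 + 2*i are
    multiplied by the i-th time character.
    Assumption: the product overwrites the key in place.
    """
    digits = list(key)
    for i in range(1, 9):
        if i == 6:                      # location six holds the point
            continue
        pos = 2 + 2 * i                 # 1-based digit position
        pair = int("".join(digits[pos - 1:pos + 1]))
        product = str(pair * int(time_str[i - 1]))
        for offset, ch in enumerate(product):
            if pos - 1 + offset < len(digits):
                digits[pos - 1 + offset] = ch
    return "".join(digits)


old_key = ascii_key("D6JU3SHU80")
new_key = mix_key_with_time(old_key, "72270.03")
print(old_key)   # 68547485518372855648
print(new_key)   # 68532191356072805192
```

Both printed values agree with the example given in section 3.4.4.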
3.4.5 Generate Keystream
The keystream generator is built from five registers R1, R2, R3, R4,
and R5. Each register has a variable number of cells that depends on the
encryption key (the cells of the five registers sum to 128, giving a
128-bit key size).
3.4.5.1 Labels
Assign a label to the registers and to each part of them, with a name that represents its activity.
• Register’s name
Reg(i), where 1 ≤ i ≤ 5
• Register’s length
Reg(i)_length, where ∑ Reg(i)_length = 128
• Bit state
Reg(i)_cell(j) = {0, 1}, where 1 ≤ j ≤ Reg(i)_length
• Feedback taps
Reg(i)_tap(j) = {0, 1}, where 1 ≤ j ≤ Reg(i)_length
• Transfer bit locations
(Active cells), sixteen cells ∈ Reg(i)_cell(j), where 1 ≤ j ≤ Reg(i)_length
• Transfer address
(Address_UP), four cells ∈ Reg(i)_cell(j), where 1 ≤ j ≤ Reg(i)_length
(Address_DOWN), four cells ∈ Reg(i)_cell(j), where 1 ≤ j ≤ Reg(i)_length
• Multiplexer selector (MS)
(Mux selector1), one cell ∈ Reg(2)_cell(j), where 1 ≤ j ≤ Reg(i)_length
(Mux selector2), one cell ∈ Reg(3)_cell(j), where 1 ≤ j ≤ Reg(i)_length
(Mux selector3), one cell ∈ Reg(4)_cell(j), where 1 ≤ j ≤ Reg(i)_length
• Keystream output
keystream(j) = {0, 1}, where 1 ≤ j ≤ Text_length
[Figure: Register 1 to Register 5, each with its own Feedback Function,
connected to a multiplexer (Mux) whose output is the keystream.]
Figure (3.3) LFSRs proposed to encode data
3.4.5.2 The Registers
Each cell in the registers contains one bit, and each register has a
variable set of feedback tap positions that depends on the encryption key.
The proposed registers are shown in figure (3.3).
Each register is connected to the following register by a one-bit
connection: after each shift operation, a register transfers one bit to the
following register to change its bit stream. The connection changes
depending on the bits in the transfer bit locations.
3.4.5.3 Initialization Registers
The encryption key generated in the previous stage initializes the
LFSRs (linear feedback shift registers) by specifying their characteristics
(register length, initial states, feedback taps, transfer bit locations,
transfer address, and multiplexer selector), which are used to produce the
keystream. Figure (3.4) shows the register characteristics mapped onto the
encryption key table (Appendix B/ Subroutine-1 shows the full register
initialization program).
[Figure: a table whose columns are the encryption key digits 1 to 20 and
whose rows mark the digits used for each characteristic: Transfer bit,
Reg. Length, Initial state, Feedback tap, AD_Up, AD_Down, Ms3, Ms2, Ms1.]
Figure (3.4) Encryption key table
a. Registers Length
To obtain a 128-bit key size, the cells of the five registers sum to 128.
The lengths of the first four registers are generated randomly between
20 - 27 cells by the algorithm explained in the flowchart shown in
figure (3.5).
To produce a random integer number in a given range, the following
formulas are used:
rnd_key = (key(15+i, 1) + 1) / 10 …. 3.1
Int((upperboundary – lowerboundary + 1) * Rnd + lowerboundary) …. 3.2
where
upperboundary is the highest number in the range,
lowerboundary is the lowest number in the range,
Rnd is the rnd_key generated from eq. (3.1).
To ensure that the total register length is 128 cells, the length of
register number five is calculated from the equation
reg(5)_length = 128 − [reg(1)_length + reg(2)_length + reg(3)_length + reg(4)_length] …. 3.3
Figure (3.5) Flowchart represents generation of register length
[Flowchart boxes: Start; i = 0; i = i + 1;
rnd_key = (key(15+i, 1) + 1) / 10;
reg(i)_length = Int((30-20+1) * rnd_key + 20);
decision i <= 4 (Yes / No); End.]
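A minimal Python sketch of equations (3.1) to (3.3) is given below. The function name and parameters are hypothetical. The sketch assumes that the loop index runs over i = 1 to 4 and takes the bounds 20 and 30 that appear in the flowchart as defaults. Integer arithmetic is used in place of the floating-point rnd_key, which gives the same result as Int(...) for these values.

```python
def register_lengths(key, lower=20, upper=30, total=128):
    """Derive the five register lengths from a twenty-digit key string.

    Assumptions (not fixed by the text): i runs from 1 to 4, key
    positions are 1-based, and the bounds default to the flowchart's
    20 and 30.
    """
    span = upper - lower + 1
    lengths = []
    for i in range(1, 5):
        digit = int(key[15 + i - 1])            # key(15+i, 1), 1-based
        # rnd_key = (digit + 1) / 10, so Int(span * rnd_key + lower)
        # equals span * (digit + 1) // 10 + lower. Note that a digit
        # of 9 gives rnd_key = 1.0 and therefore upper + 1.
        lengths.append(span * (digit + 1) // 10 + lower)
    lengths.append(total - sum(lengths))        # eq. (3.3)
    return lengths


print(register_lengths("68532191356072805192"))   # [21, 26, 22, 31, 28]
```

Whatever the first four lengths are, the fifth is chosen so that the total is 128 cells.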
b. Initial State Bits
The registers are initialized from the encryption key as shown in the
flowchart of figure (3.6), where every cell is initialized with a binary
value {0, 1}. Eq. (3.4) takes two digits from the encryption key, which the
subsequent steps use to generate a random number.
r1 = key(4+j , 2) + 1 … 3.4
where 1 <= j <= number of registers
The function right in the algorithm cuts a specific number of digits
from the right of the variable.
Figure (3.6) Flowchart represents generation of initial state
[Flowchart boxes: Start; j = 0; j = j + 1; r1 = key(4+j, 2) + 1; i = 1;
r1 = r1 / (key(5+i,1)+1); r2 = right(r1,3);
reg(j)_cell(i) = r2 Mod 2; i = i + 1;
decision i <= reg(j)_length (Yes / No); decision j <= 5 (Yes / No); End.]
c. Feedback Taps
A random set of feedback taps is generated for all registers, depending
on the encryption key, by the algorithm shown in the flowchart of
figure (3.7).
Figure (3.7) Flowchart represents generation of feedback tap
[Flowchart boxes: Start; j = 0; j = j + 1; r1 = key(12+j, 2) + 1; i = 1;
r1 = r1 / (key(13+i,1)+1); r2 = right(r1,3);
reg(j)_tap(i) = r2 Mod 2; i = i + 1;
decision i <= reg(j)_length (Yes / No); decision j <= 5 (Yes / No); End.]
d. Transfer Bit Locations
Transfer bit locations are sixteen active cells in each register. They
are used to transfer a bit from one active cell to the next register. The
algorithm used to generate these locations is represented by the flowchart
shown in figure (3.8).
Figure (3.8) Flowchart represents generation of active cells
[Flowchart boxes: Start; j = 0; j = j + 1; r1 = key(1+j, 2) + 1; i = 1;
r1 = r1 / (key(1+i,1)+1); r2 = right(r1,4);
reg(j)_cell( r2 Mod reg(j)_length+1 ) = Active; i = i + 1;
decision i <= 16 (Yes / No); decision j <= 5 (Yes / No); End.]
e. Transfer Address
A transfer address consists of eight cells in each register: four cells
address an active cell together with the previous register, and the other
four address an active cell together with the next register. The algorithm
used to generate these cells is represented by the flowcharts shown in
figure (3.9).
Figure (3.9) Flowcharts represent generation of transfer address cells
(a) Generation of transfer address cells with previous register
[Flowchart boxes: Start; j = 0; j = j + 1; r1 = key(5+j, 2) + 1; i = 1;
r1 = r1 / (key(5+i,1)+1); r2 = right(r1,4);
reg(j)_cell(r2 Mod reg(j)_length+1) = AD_U; i = i + 1;
decision i <= 4 (Yes / No); decision j <= 5 (Yes / No); End.]
(b) Generation of transfer address cells with next register
[Flowchart boxes: Start; j = 0; j = j + 1; r1 = key(10+j, 2) + 1; i = 1;
r1 = r1 / (key(10+i,1)+1); r2 = right(r1,4);
reg(j)_cell(r2 Mod reg(j)_length+1) = AD_D; i = i + 1;
decision i <= 4 (Yes / No); decision j <= 5 (Yes / No); End.]
f. Multiplexer Selector
The multiplexer selector consists of three cells, one each in
register 2, register 3, and register 4. They select which register
output forms the next bit of the keystream. The algorithm used to
generate these cells is represented in the flowchart shown in
figure (3.10).
Figure (3.10) Flowchart represents generation of multiplexer selector cells
[Flowchart boxes: Start; r1 = key(20, 2) + 1; r1 = r1 / (key(20,1)+1);
r2 = right(r1,4); reg(2)_cell(r2 Mod reg(2)_length+1) = MS1;
r1 = key(15, 2) + 1; r1 = r1 / (key(15,1)+1); r2 = right(r1,4);
reg(3)_cell(r2 Mod reg(2)_length+1) = MS2;
r1 = key(7, 2) + 1; r1 = r1 / (key(7,1)+1); r2 = right(r1,4);
reg(4)_cell(r2 Mod reg(2)_length+1) = MS3; End.]
3.4.5.4 Design Principles
The generator uses five LFSRs connected to each other by a random
connection. This connection depends on the bits in the transfer address
cells (which address the active cells in each register), and all register
outputs are connected to a multiplexer. The output chosen by the
multiplexer (whose selector changes at each clock pulse) represents a bit
of the keystream, as described in figure (3.3).
3.4.5.5 Keystream Generation
The combined shift registers perform the following operations,
starting from register 1, i.e. i=1 (Appendix B/ Subroutine-2 shows the full
stream generation program).
i. The content of reg(i)_cell(j) is shifted to reg(i)_cell(j-1) (one bit to
the right) for each j, 1≤ j ≤ L, where L is reg(i)_length.
ii. The new content of reg(i)_cell(L) is the feedback bit, calculated
from a random feedback function, as shown in algorithm (3.1).
Algorithm (3.1) Feedback bit calculation
Feedback = Reg(i)_cell(L) ,
For 1≤ j ≤ reg(i)_length ,
feedback = feedback XOR [reg(i)_cell(j) * reg(i)_tap(j)] ,
Reg(i)_cell(L) = feedback ,
iii. Calculate the address of transfer bit from reg(i), and the address of
transfer bit to reg(i+1) from address locations assigned in reg(i) and
reg(i+1) , as shown in algorithm (3.2).
Algorithm (3.2) Transfer bit Calculation
address_reg(i) = 0, k = 0
For 1≤ j ≤ reg(i)_length ,
If reg(i)_cell(j) = Address_DOWN = 1 ,
address_reg(i) = address_reg(i) + 2^k ,
k = k + 1 ,
address_reg(i+1) = 0, k = 0
For 1≤ j ≤ reg(i+1)_length ,
If reg(i+1)_cell(j) = Address_UP = 1 ,
address_reg(i+1) = address_reg(i+1) + 2^k ,
k = k + 1 ,
For Active_cell only
reg(i+1)_cell(address_reg(i+1)) = reg(i)_cell(address_reg(i)) ,
iv. Repeat from (i) to (iii) , to the remaining registers.
v. Calculate the address of the multiplexer selector from the three cells
one in each of the registers (2), (3) and (4), as shown in
algorithm(3.3).
Algorithm (3.3) Multiplexer selector address calculation
Mux_selector = ( reg(2)_cell(mux_selector1) ) * 2^0 +
( reg(3)_cell(mux_selector2) ) * 2^1 +
( reg(4)_cell(mux_selector3) ) * 2^2
vi. The output of the multiplexer forms part of the keystream, as shown
in algorithm (3.4).
Algorithm (3.4) Keystream selected from multiplexer
Select case Mux_selector
If 0 or 1, then keystream = keystream + reg(1)_cell(1)
If 2, then keystream = keystream + reg(2)_cell(1)
If 3 or 4, then keystream = keystream + reg(3)_cell(1)
If 5, then keystream = keystream + reg(4)_cell(1)
If 6 or 7, then keystream = keystream + reg(5)_cell(1)
vii. Repeat steps (i) to (vi), until generating the keystream used to
encode the text.
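The shift, feedback, and multiplexer steps above can be sketched in Python as follows. This is an illustrative reading rather than the thesis program: the names are hypothetical, the feedback bit is computed on the register contents after the shift, and the inter-register transfer of algorithm (3.2) is left out.

```python
def lfsr_step(cells, taps):
    """One shift of a single register, following steps (i) and (ii).

    cells[0] plays the role of reg(i)_cell(1). Assumption: the feedback
    of algorithm (3.1) is computed after the shift, with cell(L) still
    holding its old value.
    """
    output_bit = cells[0]                  # bit that leaves the register
    shifted = cells[1:] + [cells[-1]]      # cell(j) moves to cell(j-1)
    feedback = shifted[-1]                 # Feedback = reg(i)_cell(L)
    for cell, tap in zip(shifted, taps):
        feedback ^= cell & tap             # XOR of the tapped cells
    shifted[-1] = feedback
    return shifted, output_bit


# Algorithm (3.4): selector value -> register number (1-based).
MUX_TO_REGISTER = [1, 1, 2, 3, 3, 4, 5, 5]


def mux_selector(bit_reg2, bit_reg3, bit_reg4):
    """Algorithm (3.3): combine the three selector cells into 0..7."""
    return bit_reg2 * 2**0 + bit_reg3 * 2**1 + bit_reg4 * 2**2


def keystream_bit(registers, selector):
    """Return reg(k)_cell(1) of the register chosen by the selector."""
    return registers[MUX_TO_REGISTER[selector] - 1][0]


new_cells, out = lfsr_step([1, 0, 1, 1], [1, 0, 0, 1])
print(new_cells, out)            # [0, 1, 1, 0] 1
print(mux_selector(1, 0, 1))     # 5, which selects register 4
```

Because eight selector values map onto five registers, registers 1, 3, and 5 are each chosen by two values and registers 2 and 4 by one value.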
3.4.5.6 Keystream Testing
A test subroutine is used to determine whether the keystream
possesses the specific characteristics expected of a truly random
sequence. Five statistical tests are used (frequency test, serial test,
poker test, run test, and autocorrelation test). The keystream is checked
by all these tests, and if any one of them fails, the program regenerates
a new keystream (Appendix B/ Subroutine-3 shows the full stream test
program).
a. Frequency test, using equation (2.1) with threshold value 3.8415 (one
degree of freedom and significance level 0.05), as shown in algorithm (3.5).
Algorithm (3.5) Calculate frequency test value
n0 = 0, n1 = 0,
For 1 ≤ i ≤ n,
If keystream(i) = 0, n0 = n0 + 1,
Else n1 = n1 + 1,
f_test = ( n0 - n1 )^2 / n,
If f_test < 3.8415, then PASS, Else FAIL,
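Algorithm (3.5) can be written as a short Python function. The function name is hypothetical, and the keystream is assumed to be a list of 0 and 1 integers.

```python
def frequency_test(bits, threshold=3.8415):
    """Return (statistic, passed) for the frequency test.

    bits is a non-empty list of 0/1 integers; the threshold is the one
    quoted in the text.
    """
    n = len(bits)
    n1 = sum(bits)          # number of ones
    n0 = n - n1             # number of zeros
    f_test = (n0 - n1) ** 2 / n
    return f_test, f_test < threshold


print(frequency_test([0, 1] * 50))   # (0.0, True)
print(frequency_test([1] * 100))     # (100.0, False)
```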
b. Serial test, using equation (2.2) with threshold value 5.9915
(two degrees of freedom and significance level 0.05), as shown in
algorithm (3.6).
Algorithm (3.6) Calculate serial test value
n0 = 0, n1 = 0,
n00 = 0, n01 = 0, n10 = 0, n11 = 0,
For 1 ≤ i ≤ n,
If keystream(i) = 0, n0 = n0 + 1,
Else n1 = n1 + 1,
bits_check = two bits of keystream from position i (for i ≤ n-1),
If bits_check =00, n00 = n00 + 1,
Else if bits_check =01, n01 = n01 + 1,
Else if bits_check =10, n10 = n10 + 1,
Else if bits_check =11, n11 = n11 + 1,
s_test = [4/(n-1) * (n00^2 + n01^2 + n10^2 + n11^2)] - [(2/n) * (n0^2 + n1^2)] + 1,
If s_test < 5.9915, then PASS,
Else FAIL,
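A possible Python rendering of algorithm (3.6) is shown below. The function name is hypothetical, and the sketch assumes that the two-bit groups overlap, so that a sequence of n bits contributes n-1 pairs.

```python
def serial_test(bits, threshold=5.9915):
    """Return (statistic, passed) for the serial (two-bit) test.

    Assumption: pairs are taken at every position i and i+1, so they
    overlap and there are n - 1 of them. Needs at least two bits.
    """
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    pair_counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for first, second in zip(bits, bits[1:]):
        pair_counts[(first, second)] += 1
    squares = sum(count * count for count in pair_counts.values())
    s_test = 4 / (n - 1) * squares - 2 / n * (n0 * n0 + n1 * n1) + 1
    return s_test, s_test < threshold


value, passed = serial_test([0, 1] * 50)
print(round(value, 4), passed)   # 99.0202 False
```

A strictly alternating sequence passes the frequency test but fails here, because only the pairs 01 and 10 ever occur.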
c. Poker test, using equation (2.3) with threshold value 14.0617 (seven
degrees of freedom (2^m - 1 = 2^3 - 1 = 7), and significance level 0.05),
as shown in algorithm (3.7).
Algorithm (3.7) Calculate poker test value
m = 3, ‘ length of block
bno = Int( n / m ) ‘ blocks number
n000=0, n001 =0, n010 =0, n011 =0, n100 =0, n101=0, n110 =0, n111=0,
For 1 ≤ i ≤ n step by m,
bits_check = three bits of keystream from position i,
If bits_check =000, n000 = n000 + 1,
Else if bits_check =001, n001 = n001 + 1,
Else if bits_check =010, n010 = n010 + 1,
Else if bits_check =011, n011 = n011 + 1,
Else if bits_check =100, n100 = n100 + 1,
Else if bits_check =101, n101 = n101 + 1,
Else if bits_check =110, n110 = n110 + 1,
Else if bits_check =111, n111 = n111 + 1,
p_test = [(2^m / bno) * (n000^2 + n001^2 + n010^2 + n011^2 + n100^2 + n101^2 + n110^2 + n111^2)] - bno
If p_test < 14.0617, then PASS,
Else FAIL,
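Algorithm (3.7) can be sketched in Python with a counter in place of the eight explicit variables. The function name is hypothetical; blocks are non-overlapping, and trailing bits that do not fill a block are ignored.

```python
from collections import Counter


def poker_test(bits, m=3, threshold=14.0617):
    """Return (statistic, passed) for the poker test with block length m.

    The sequence is cut into Int(n / m) non-overlapping blocks; leftover
    bits at the end are ignored. Needs at least m bits.
    """
    block_count = len(bits) // m
    counts = Counter(
        tuple(bits[i * m:(i + 1) * m]) for i in range(block_count)
    )
    squares = sum(count * count for count in counts.values())
    p_test = (2 ** m / block_count) * squares - block_count
    return p_test, p_test < threshold


# Each of the eight three-bit patterns occurs exactly once.
balanced = [int(c) for c in "000001010011100101110111"]
print(poker_test(balanced))      # (0.0, True)
print(poker_test([0] * 24))      # (56.0, False)
```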
d. Run test, using equation (2.4) with threshold value 9.4877 (four
degrees of freedom (2k-2=2*3-2=4), and significance level 0.05), as
shown in algorithm (3.8).
Algorithm (3.8) Calculate run test value
k=3 ‘ largest no. of bits
bl1=0, bl2=0, bl3=0 ‘ blocks
ga1=0, ga2=0, ga3=0 ‘ gaps
For 1 ≤ i ≤ n,
bl1 = calculate no of “1” in sequence,
bl2 = calculate no of “11” in sequence,
bl3 = calculate no of “111” in sequence,
ga1 = calculate no of “0” in sequence,
ga2 = calculate no of “00” in sequence,
ga3 = calculate no of “000” in sequence,
For 1 ≤ i ≤ k,
e(i) = (n – i + 3) / 2^(i+2) ‘ expected no. of gaps or blocks
r_test = ∑(i=1..k) [bl(i) - e(i)]^2 / e(i) + ∑(i=1..k) [ga(i) - e(i)]^2 / e(i)
If r_test < 9.4877, then PASS, Else FAIL,
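The run test can be sketched in Python as below. The function name is hypothetical. The sketch assumes the usual definition of the test, in which a block (or gap) of length i is a maximal run of exactly i ones (or zeros), and runs longer than k are not counted.

```python
from itertools import groupby


def run_test(bits, k=3, threshold=9.4877):
    """Return (statistic, passed) for the run test.

    Assumption: blocks are maximal runs of ones and gaps are maximal
    runs of zeros; only runs of length 1..k are counted.
    """
    n = len(bits)
    blocks = [0] * (k + 1)      # blocks[i] = runs of exactly i ones
    gaps = [0] * (k + 1)        # gaps[i]   = runs of exactly i zeros
    for value, group in groupby(bits):
        length = sum(1 for _ in group)
        if length <= k:
            if value == 1:
                blocks[length] += 1
            else:
                gaps[length] += 1
    r_test = 0.0
    for i in range(1, k + 1):
        expected = (n - i + 3) / 2 ** (i + 2)
        r_test += (blocks[i] - expected) ** 2 / expected
        r_test += (gaps[i] - expected) ** 2 / expected
    return r_test, r_test < threshold


pattern = "1110001100010001010011101111001001001001"
sample = [int(c) for c in pattern * 4]
value, passed = run_test(sample)
print(round(value, 4), passed)   # 31.7913 False
```

The sample sequence has 25, 4, and 5 blocks and 8, 20, and 12 gaps of lengths 1, 2, and 3, against expected counts of 20.25, 10.0625, and 5.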
e. Autocorrelation test, using equations (2.5) and (2.6) with threshold
value 1.6449 (significance level 0.05), as shown in algorithm (3.9).
Algorithm (3.9) Calculate autocorrelation test value
A=0, ‘ autocorrelation value
d=8, ‘ shift value
For 1 ≤ i ≤ n-d,
A = A + [ keystream(i) XOR keystream(i+d) ]
a_test = 2 * [ A – (n-d) / 2 ] / sqrt(n-d)
If a_test < 1.6449, then PASS,
Else FAIL,
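Algorithm (3.9) can be sketched in Python as follows. The function name is hypothetical. The loop stops at n-d so that keystream(i+d) always exists, and the pass condition is the one-sided comparison used in the algorithm above.

```python
import math


def autocorrelation_test(bits, d=8, threshold=1.6449):
    """Return (statistic, passed) for the autocorrelation test at shift d.

    Counts the positions where the sequence differs from its copy
    shifted by d, then normalizes that count. Needs more than d bits.
    """
    n = len(bits)
    differing = sum(bits[i] ^ bits[i + d] for i in range(n - d))
    a_test = 2 * (differing - (n - d) / 2) / math.sqrt(n - d)
    return a_test, a_test < threshold


pattern = "1110001100010001010011101111001001001001"
sample = [int(c) for c in pattern * 4]
value, passed = autocorrelation_test(sample)
print(round(value, 4), passed)   # 3.8933 False
```

For the sample sequence, 100 of the 152 compared positions differ, which is well above the 76 expected for a random sequence.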
3.4.6 Huffman Code
The text to be hidden is converted from characters (ASCII code) to
binary. Each character is normally represented by eight bits; to reduce
the total number of message bits, a Huffman code is used. Figure (3.11)
shows the binary stream generation scheme.
Figure (3.11) Binary stream generation scheme
[Figure: Plain text → Huffman coding → Binary stream]
First, the frequencies (probabilities) of each character are counted
over many text files containing Arabic characters. More than 96% of the
text to hide consists of only 36 characters (the Arabic letters and the
space), which can be used to make an appropriate compression scheme.
Figure (3.12) shows a histogram of Arabic character probabilities.
To build the Huffman tree of a prefix code from the character
probabilities (as in figure (3.12)), the following steps are used:
• Order the characters from highest to lowest probability.
• Select the two least-probable characters, group them together logically,
and add their probabilities. This begins the construction of a
"binary tree" structure.
• Again select the two elements with the lowest probabilities and
combine them into a single element.
• Continue in the same way, selecting the two elements with the lowest
frequency, grouping them together, and adding their frequencies, until no
elements remain.
• The result is known as a "Huffman tree". To obtain the Huffman code
itself, each branch of the tree is labeled with a 1 or 0.
• Tracing down the tree gives the "Huffman codes", with the shortest
codes assigned to the characters with the greatest probability.
Figure (3.13) shows the Huffman tree, and table (3.1) lists the Arabic
characters from highest to lowest probability with their Huffman codes.
Each character to be hidden is encoded to its Huffman code to get a binary
stream of data labeled binary_stream(i), where [1 ≤ i ≤ stream_length]
(Appendix B/ Subroutine-4 shows the full text to Huffman code conversion
program, Subroutine-5 the Huffman code to text conversion program, and
Subroutine-8 the Huffman array creation).
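The tree-building steps listed above can be sketched with a priority queue in Python. This is a generic Huffman construction with hypothetical function names, not the program of Appendix B. The bit patterns it produces depend on how ties and branch labels are resolved, so they need not match table (3.1) even for the same probabilities.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Build a prefix code from a {symbol: probability} mapping.

    Repeatedly merges the two least-probable elements, as in the steps
    above. The lower-probability branch is labeled "0" and the other "1".
    Assumes at least two symbols.
    """
    order = count()     # tie-breaker so the dicts are never compared
    heap = [(p, next(order), {symbol: ""})
            for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, low = heapq.heappop(heap)
        p_high, _, high = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in low.items()}
        merged.update({s: "1" + code for s, code in high.items()})
        heapq.heappush(heap, (p_low + p_high, next(order), merged))
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every character into one bit string."""
    return "".join(codes[ch] for ch in text)


codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)                   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(encode("abad", codes))   # 0100111
```

The most probable symbol receives the shortest code, so a text dominated by frequent characters needs fewer than eight bits per character on average.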
Figure (3.12) Histogram of Arabic characters probability
Figure (3.13) Huffman tree and the generated codes
Table (3.1) Huffman codes for Arabic characters
Letter   Probability %   Code (Bit)
SP       18.11           0 0 0
ا        12.00           1 0 0
ل        9.93            0 0 1 0
ي        5.84            1 0 1 0
م        5.51            0 1 1 0
و        4.37            1 1 1 0
ت        3.99            0 0 0 0 1
ن        3.80            1 0 0 0 1
ر        3.71            0 1 0 0 1
ف        2.73            1 1 0 0 1
ة        2.66            0 0 0 1 0 1
ع        2.46            1 0 0 1 0 1
هـ       2.44            0 1 0 1 0 1
ب        2.39            1 1 0 1 0 1
س        2.25            0 0 1 1 0 1
ق        2.24            1 0 1 1 0 1
ك        2.12            0 1 1 1 0 1
د        1.79            1 1 1 1 0 1
أ        1.66            0 0 0 0 1 1
ح        1.36            1 0 0 0 1 1
ش        1.24            0 1 0 0 1 1
ص        0.98            1 1 0 0 1 1
ى        0.86            0 0 1 0 1 1
ذ        0.84            1 0 1 0 1 1
ج        0.81            0 1 1 0 1 1
خ        0.75            1 1 1 0 1 1
إ        0.75            0 0 0 1 1 1
ط        0.71            1 0 0 1 1 1
ث        0.43            0 1 0 1 1 1
ض        0.26            1 1 0 1 1 1
ظ        0.22            0 0 1 1 1 1
ئ        0.19            1 0 1 1 1 1
غ        0.19            0 0 1 1 1 1 1
ز        0.19            1 0 1 1 1 1 1
ء        0.18            0 1 1 1 1 1 1
ؤ        0.08            1 1 1 1 1 1 1
3.4.7 Check Document File
This task checks whether the paragraphs in a document file have
enough room to hide the data. The way of checking depends on the hiding
method selected by the user. If there is not enough room in the file, the
program reports this in a message that gives the number of bits to be
hidden and the number of bits that can be hidden in the file.
3.4.8 Encryption
The binary_stream is encrypted with a stream cipher: a keystream of
the same length, generated in the previous step, is combined with it by a
bitwise XOR operation to produce the cipher_stream, where
cipher_stream(i) = binary_stream(i) XOR keystream(i)
where 1 ≤ i ≤ stream_length
Appendix B/ Subroutine-6 shows the full encipher program, and
Subroutine-7 the full decipher program.
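The encryption equation can be illustrated with a few lines of Python. The function name is hypothetical, and both streams are assumed to be lists of 0 and 1 integers. Because XOR is its own inverse, the same function also deciphers.

```python
def xor_stream(binary_stream, keystream):
    """Bitwise XOR of a bit list with a keystream of at least equal length."""
    if len(keystream) < len(binary_stream):
        raise ValueError("keystream is shorter than the data")
    return [bit ^ key for bit, key in zip(binary_stream, keystream)]


plain = [1, 0, 1, 1]
key = [0, 1, 1, 0]
cipher = xor_stream(plain, key)
print(cipher)                    # [1, 1, 0, 1]
print(xor_stream(cipher, key))   # [1, 0, 1, 1]
```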
3.4.9 Hide cipher text
Four methods are proposed to hide the cipher_stream in an Arabic text
document:
3.4.9.1 Hyphen method
A hyphen (ـ), or kashida, is a small line used to connect Arabic
characters. It stretches characters to increase the length of words so
that a paragraph can be justified to a specific margin.
The hyphen must be added between two linked characters. In this
work, a word with no inserted hyphen is interpreted as “0”, and a word
with one inserted hyphen is interpreted as “1”, as shown in
algorithm (3.10).
Algorithm (3.10) Data hiding using hyphen method
k = 0
For 1 < i <= words_Count
  k = k + 1
  p = no_of_hyphen_can_add_to_word(i)
  If p >= cipher_stream(k) Then
    p = 0
    char_curr = word(i)_char(1)
    new_text = new_text + char_curr
    For 2 < j < word_char_count
      char_prev = word(i)_char(j-1)
      char_curr = word(i)_char(j)
      If char_curr = "س" or "ش" then
        new_text = new_text + char_curr + "ـ" , p = p + 1
        If cipher_stream(k) = p Then
          new_text = new_text + Cut word(i) from right(word(i)_char_count-i)
          Go to next
      ElseIf [(char_curr = "ا" or "ة" or "ي") and (char_prev <> "ر" and "ز" and "و" and "ذ" and "د" and "ا" and "أ" and "آ" and "إ" and "ء" and "ل")] then
        new_text = new_text + "ـ" + char_curr , p = p + 1
        If cipher_stream(k) = p Then
          new_text = new_text + Cut word(i) from right(word(i)_char_count-i)
          Go to next
      Else
        new_text = new_text + char_curr
      If (i = l) Then
        char_prev = new_text(1 + p - 1)
        If char_prev <> "ر" and "ز" and "و" and "ذ" and "د" and "ا" and "أ" and "آ" and "إ" and "ء" and "ل" then
          new_text = new_text(1,word_char_count+p-1)+"ـ"+char_curr, p=p+1
          If cipher_stream(k) = p then Go to next
next:
3.4.9.2 White spaces method
This method adds additional white space between words to hide data:
one space between words is interpreted as a “0”, and two spaces are
interpreted as a “1”, as shown in algorithm (3.11).
Algorithm (3.11) Data hiding using white space method
k = 0
For 1 ≤ i ≤ words_Count
k = k + 1
If cipher_stream(k) = 1 Then
Add and justify space between word(i) and word(i+1)
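The white space rule can be sketched on plain strings in Python. The function names are hypothetical, and the sketch works on a string rather than on a Microsoft Word document, so the justification step is omitted.

```python
import re


def hide_in_spaces(cover_text, bits):
    """Encode one bit per word gap: one space for 0, two spaces for 1."""
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps")
    pieces = [words[0]]
    for index, word in enumerate(words[1:]):
        is_one = index < len(bits) and bits[index] == 1
        pieces.append(("  " if is_one else " ") + word)
    return "".join(pieces)


def extract_from_spaces(stego_text, bit_count):
    """Read back the first bit_count gaps of the stego text."""
    gaps = re.findall(r" +", stego_text.strip())
    return [1 if len(gap) == 2 else 0 for gap in gaps[:bit_count]]


stego = hide_in_spaces("a b c d", [1, 0, 1])
print(repr(stego))                      # 'a  b c  d'
print(extract_from_spaces(stego, 3))    # [1, 0, 1]
```

The capacity is one bit per gap between words, so a cover text of w words can carry at most w - 1 bits with this rule.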
3.4.9.3 Change word position method
Data is hidden in a document by setting the vertical position of a
word relative to the text baseline, as shown in algorithm (3.12).
Algorithm (3.12) Data hiding using word position method
k = 0
For 1 ≤ i ≤ words_Count
k = k + 1
If cipher_stream(k) = 1 Then
word(i)_Position = UP
Else
word(i)_Position = Normal
3.4.9.4 Unicode system method
The Arabic Unicode table (range 0600 – 06FF, shown in Appendix A)
represents the standard forms of all characters used in the Arabic
language, and another Unicode table (range FE70 – FEFF) represents Arabic
Presentation Forms-B, which includes the isolated forms of the Arabic
characters.
The idea of using the Unicode system to hide data is to change the
code of the isolated characters (i.e. any character not connected to others
within a word). Each word in the paragraph is taken and checked for an
isolated character (the Microsoft Word document saves a character as a
Unicode value with the standard Arabic code, range 0600 – 06FF), which is
then replaced with the same glyph but with the Form-B Arabic code.
For example, Table (3.2) lists some Arabic characters with their standard
code and Form-B code.
Table (3.2) Arabic characters with different codes
Character   Description   Standard code (Hex value)   Form-B code (Hex value)
ا           Alef          0627                        FE8D
ب           Beh           0628                        FE8F
ت           Teh           062A                        FE95
ث           Theh          062B                        FE99
ج           Jeem          062C                        FE9D
ح           Hah           062D                        FEA1
خ           Khah          062E                        FEA5
د           Dal           062F                        FEA9
ذ           Thal          0630                        FEAB
ر           Reh           0631                        FEAD
ز           Zain          0632                        FEAF
Algorithm (3.13) shows data hiding using the Unicode system method
(Appendix B/ Subroutine-9 shows the full hiding program, Subroutine-10 the
full unhiding program, and Subroutine-11 the creation of the ASCII code
with Unicode table).
Algorithm (3.13) Data hiding using Unicode system method
k = 0
For 1 ≤ i ≤ words_Count
char_prev = Nothing
For 1 ≤ j ≤ word(i)_char_count
char_curr = word(i)_char(j)
If
[j=1 And (char_curr="ا" or "أ" or "د" or "ذ" or "ر" or "ز"
or "و")]
** check first character **
or
[(char_prev="ا" or "أ" or "د" or "ذ" or "ر" or "ز" or "و") And
(char_curr="ا" or "أ" or "د" or "ذ" or "ر" or "ز" or "و")]
** check characters in middle **
or
[(j=word(i)_char_count) and (char_prev="ا" or "أ" or "د"
or "ذ" or "ر" or "ز" or "و")]
** check last character **
Then
k = k + 1
If cipher_stream(k) = 1 then exchange word(i)_char(j)
End If
char_prev = char_curr
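The Unicode method can be sketched in Python as follows. The names are hypothetical, and the mapping contains only the characters of Table (3.2), so an isolated character without an entry is simply skipped in this sketch and carries no bit. The sketch works on a plain string rather than on a Microsoft Word document.

```python
# Letters that do not connect to the following letter (from the algorithm).
NON_CONNECTING = set("\u0627\u0623\u062F\u0630\u0631\u0632\u0648")

# Standard code -> Form-B isolated code, taken from Table (3.2).
FORM_B = {
    "\u0627": "\uFE8D", "\u0628": "\uFE8F", "\u062A": "\uFE95",
    "\u062B": "\uFE99", "\u062C": "\uFE9D", "\u062D": "\uFEA1",
    "\u062E": "\uFEA5", "\u062F": "\uFEA9", "\u0630": "\uFEAB",
    "\u0631": "\uFEAD", "\u0632": "\uFEAF",
}
STANDARD = {form_b: std for std, form_b in FORM_B.items()}


def is_isolated(word, j):
    """Apply the three checks of the algorithm to word[j] (0-based)."""
    first = j == 0 and word[j] in NON_CONNECTING
    after_break = j > 0 and word[j - 1] in NON_CONNECTING
    middle = after_break and word[j] in NON_CONNECTING
    last = after_break and j == len(word) - 1
    return first or middle or last


def hide_bits(text, bits):
    """Swap isolated characters to Form-B where the bit is 1.

    Returns the stego text and the number of bits actually embedded.
    """
    used = 0
    out_words = []
    for word in text.split(" "):
        chars = list(word)
        for j in range(len(word)):
            usable = is_isolated(word, j) and word[j] in FORM_B
            if used < len(bits) and usable:
                if bits[used] == 1:
                    chars[j] = FORM_B[word[j]]
                used += 1
        out_words.append("".join(chars))
    return " ".join(out_words), used


def extract_bits(stego_text, bit_count):
    """Recover the bits: a Form-B code reads as 1, a standard code as 0."""
    bits = []
    for word in stego_text.split(" "):
        plain = "".join(STANDARD.get(ch, ch) for ch in word)
        for j, ch in enumerate(word):
            usable = is_isolated(plain, j) and plain[j] in FORM_B
            if len(bits) < bit_count and usable:
                bits.append(1 if ch in STANDARD else 0)
    return bits


cover = "\u062F\u0627\u0631 \u0628\u0627\u0628"    # two Arabic words
stego, used = hide_bits(cover, [1, 0, 1, 1])
print(used)                        # 4
print(extract_bits(stego, used))   # [1, 0, 1, 1]
```

In the first word all three letters are isolated and carry a bit each; in the second word only the final letter does, because it follows a non-connecting Alef.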
3.4.10 Hiding the Time
The time was read from the computer clock in a previous step (it
consists of seven digits). A multiple of ten is added to each digit to
give a word location, and the word after that location is shifted
down, as algorithm (3.14) shows.
Algorithm (3.14) Hiding time process
For 1 ≤ i ≤ 7
time_loc(i) = time(i, 1) + [ (i - 1) * 10 ]
For 1 ≤ i ≤ 7
word(time_loc(i) + 1) = SHIFT POSITION_DOWN
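The location rule of algorithm (3.14) can be checked with a small Python sketch. The function names are hypothetical, and the sketch only computes which word numbers would be shifted down; it does not touch a document.

```python
def time_word_positions(time_str):
    """Return the 1-based word numbers marked for the seven time digits.

    The point in the clock string is dropped first. Digit i (1-based)
    gives time_loc(i) = digit + (i - 1) * 10, and word time_loc(i) + 1
    is the one shifted down.
    """
    digits = [int(ch) for ch in time_str if ch.isdigit()]
    if len(digits) != 7:
        raise ValueError("expected seven time digits")
    return [digit + index * 10 + 1 for index, digit in enumerate(digits)]


def recover_time_digits(positions):
    """Invert the rule: recover the seven digits from the word numbers."""
    return [pos - 1 - index * 10 for index, pos in enumerate(positions)]


positions = time_word_positions("72270.03")
print(positions)                        # [8, 13, 23, 38, 41, 51, 64]
print(recover_time_digits(positions))   # [7, 2, 2, 7, 0, 0, 3]
```

Each digit falls into its own block of ten words, so the seven marked words never collide and the document needs at least 70 words to hold the time.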
3.5 Hiding Data in a Rich Text Format (RTF) File
The Rich Text Format (RTF) is a method of encoding formatted text
and graphics for easy transfer between applications.
The RTF Specification provides a format for text and graphics
interchange that can be used with different operating environments and
operating systems. RTF uses the ANSI character set to control the
representation and formatting of a document [40].
3.5.1 Contents of an RTF File
An RTF file has the following syntax:
<File> '{' <header> <document> '}'
It consists of unformatted text, control words, control symbols, and
groups. For ease of transport, a standard RTF file can consist of only
7-bit ASCII characters; however, converters that communicate with Microsoft
Word for Windows should expect 8-bit characters.
A control word is a specially formatted command that RTF uses to
mark printer control codes and information that applications use to manage
documents.
A control symbol consists of a backslash followed by a single,
nonalphabetic character.
A group consists of text and control words or control symbols
enclosed in braces { }. The opening brace ‘{’ indicates the start of the
group and the closing brace ‘}’ indicates the end of the group. Each group
specifies the text affected by the group and the different attributes of that
text. The RTF file can also include groups for fonts, styles, screen color,
pictures, footnotes, comments (annotations), headers and footers, summary
information, fields, and bookmarks, as well as document, section,
paragraph, and character formatting properties. If the font, file, style, screen
color, and summary-information groups and document-formatting
properties are included, they must precede the first plain-text character in
the document. These groups form the RTF file header.
Document text should be emitted as ANSI characters. If there are
Unicode characters that do not have corresponding ANSI characters, they
should be output using the \ucN keywords.
3.5.2 Paragraph Formatting Properties
There are many control words that specify generic paragraph
formatting properties. These control words can appear anywhere in the
body of the paragraph, not just at the beginning. If the \pard control word is
present, the current paragraph resets to default paragraph properties.
3.5.3 Hiding Algorithm
The proposed algorithm exploits the special format of the RTF file to
hide data. In an RTF file, the control words, control symbols, and braces
constitute control information. All other characters in the file are plain
text.
If the RTF reader cannot find a particular control word or control
symbol in the lookup table specified for the file format, the control word
or control symbol should be ignored. The proposal exploits this
characteristic by creating a dummy control symbol and using it as a cover
to hide the data. The following pseudo program opens a source RTF document
file and produces a target document file that contains the hidden data;
algorithm (3.15) shows the hiding process.
Algorithm (3.15) Hiding data with RTF file format
Text = Convert Arabic text to English characters
Do While Not End of Source File
pointer1 = pointer1 + 1
pointer2 = pointer2 + 1
String_5bytes = Get from Source_file(pointer1)
String_byte = Get from Source_file(pointer1)
Put into Target_file(pointer2) = String_byte
If String_5bytes = "\pard" and then "{" and then "\rtlch", Then
If String_byte = Space, Then pointer2 = pointer2 + 1
If coun >= Text_length Then
Put into Target_file(pointer2) = "\azEND"
pointer2 = pointer2 + 5
Exit Loop
End If
Put into Target_file(pointer2) = "\az" + Text(coun,3)
pointer2 = pointer2 + 5
coun = coun + 3
End If
If String_5bytes = "}", Then
End If
Loop
3.5.4 Unhiding Algorithm
This algorithm reverses the hiding algorithm: the program searches for
the dummy control symbol in the target RTF document file and extracts the
data from it. The following pseudo program extracts the hidden data from
the target file; algorithm (3.16) shows the unhiding process.
Algorithm (3.16) Unhiding data from RTF file format
Pointer = 0
Do While Not End of Target File
String_3bytes_1 = Get from Target_file(pointer)
If String_3bytes_1 = "\az" And q <> 1 Then
pointer = pointer + 3
String_3bytes_2 = Get from Target_file(pointer)
If String_3bytes_2 = "END", Then Exit Loop
Text = Text + String_3bytes_2
End If
pointer = pointer + 1
Loop
Convert English characters(Text) to Arabic text
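The hiding and unhiding steps of algorithms (3.15) and (3.16) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names hide and unhide are hypothetical, the message is assumed to be already converted to lowercase English letters, each dummy control word is placed immediately before a \pard keyword (the pseudo program looks for a longer \pard, "{", \rtlch sequence), and a final chunk shorter than three letters is allowed.

```python
import re

MARKER = "\\az"        # dummy control word prefix, as in algorithm (3.15)
END = MARKER + "END"   # terminator chunk, as in algorithm (3.15)


def hide(rtf: str, text: str, anchor: str = "\\pard") -> str:
    """Insert text, three letters per dummy control word, before successive anchors."""
    # Assumption: lowercase letters only, so no chunk can collide with "END".
    if not (text.isascii() and text.isalpha() and text.islower()):
        raise ValueError("sketch assumes a lowercase ASCII-letter message")
    chunks = [MARKER + text[i:i + 3] for i in range(0, len(text), 3)] + [END]
    parts = rtf.split(anchor)
    if len(parts) - 1 < len(chunks):
        raise ValueError("cover has too few anchors for this message")
    out = [parts[0]]
    for k, part in enumerate(parts[1:]):
        if k < len(chunks):
            out.append(chunks[k])      # dummy control word goes before the anchor
        out.append(anchor + part)
    return "".join(out)


def unhide(rtf: str) -> str:
    """Collect the letters that follow each dummy control word until END."""
    letters = []
    for match in re.finditer(r"\\az([A-Za-z]{1,3})(?![A-Za-z])", rtf):
        if match.group(1) == "END":
            break
        letters.append(match.group(1))
    return "".join(letters)
```

For example, hiding "salam" in a cover with three \pard keywords produces the chunks \azsal, \azam, and \azEND, and unhide returns "salam" from the result. The sketch relies on the property stated above, that a reader ignores control words it does not recognize; it does not check whether the cover already contains a control word beginning with \az.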
3.5.5 Compression Algorithm
Arabic characters are stored in the RTF document file using a control
symbol of the form \'hh, where hh is a hexadecimal value in the specified
character set, much like an ASCII code. The proposed software exploits
this characteristic by replacing the four-byte character code \'hh with the
single byte that represents the character. The following pseudo program
compresses the target file; algorithm (3.17) shows the compression process.
Algorithm (3.17) Compress data of RTF file format
Do While Not End of Source File
    pointer1 = pointer1 + 1
    pointer2 = pointer2 + 1
    String_2bytes = Get from Source_file(pointer1)
    String_byte = Get from Source_file(pointer1)
    Put into Target_file(pointer2) = String_byte
    If String_2bytes = "\'" Then
        pointer1 = pointer1 + 2
        String_2bytes = Get from Source_file(pointer1)
        If String_2bytes = Coded Arabic character, Then
            Convert to original Arabic character using lookup table
            Put into Target_file(pointer2) = String_byte
            pointer1 = pointer1 + 1
        End If
    End If
Loop
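The compression step can be sketched in Python as below. This is an illustration under assumptions rather than the thesis code: the function name compress_escapes is hypothetical, the lookup table is replaced by a direct hexadecimal conversion, and only values of 0x80 and above are converted so that a backslash or brace is never written raw. Whether a given RTF reader accepts raw 8-bit bytes is not something this sketch checks.

```python
import re

# Matches the four-byte escape \'hh, where hh is two hexadecimal digits.
_ESCAPE = re.compile(rb"\\'([0-9a-fA-F]{2})")


def compress_escapes(data: bytes) -> bytes:
    """Replace each \\'hh escape for a high (8-bit) character with the single byte hh."""
    def to_byte(match):
        value = int(match.group(1), 16)
        # Leave 7-bit escapes untouched so "\", "{" and "}" are never emitted raw.
        return bytes([value]) if value >= 0x80 else match.group(0)
    return _ESCAPE.sub(to_byte, data)
```

Each converted escape shrinks the file by three bytes, which is the saving the section describes (four bytes replaced by one).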
3.6 Design Instructional Package
The researcher uses the systems approach in designing the
package; figure (3.14) represents the design of the package
according to this method [41].
Figure (3.14) Process of designing the learning package (stages: Analysis, Construction, and Evaluation, linked by feedback)
3.6.1 Analysis
This stage specifies the scientific structure of the package, and it
consists of:
1. Specify needs
The first step of analyzing any package is to identify the learning
needs, which represent the need to produce the package. The need to
produce the current package is to construct and develop the learner’s
knowledge and capabilities in the field of ciphering and hiding data in an
Arabic text document.
2. Specify Aims
The package aims inform the learners of the skills, practical
abilities, and information that they can expect to have gained after
finishing the package. The aims of the current package are to define the
concepts of cryptography and steganography, to become familiar with
ciphering a stream of data, and to become familiar with hiding data in
Arabic text.
To ensure that these aims are achieved, behavioral aims are presented
before each unit:
• Describe the major milestones of cryptography.
• Explain the principles of steganography.
• Describe the methods of data hiding.
• Recognize the data compression techniques.
• Recognize the Huffman coding method.
• Identify the Unicode system.
• Recognize the Arabic character’s characteristics.
3. Specify community
For the package to be of benefit, the designer must take into
account the characteristics, knowledge level, and background
knowledge of the learners, so that the materials of the package can be
set up according to the learners’ level. For this reason the researcher
identifies the target community as:
• Computer engineering, computer science, communication
engineering.
• Higher education students in computer science, computer engineering,
and communication engineering.
3.6.2 Construction
In this stage the package is constructed according to the needs and
aims identified above, and it consists of:
1. Scientific contents
For each instructional package there is content identified by the aims
given above. The researcher specifies the scientific contents of the
current package to teach the learners about data hiding in Arabic text.
The contents of the current package are divided into five units:
1. Cryptography principles.
2. Steganography principles.
3. Huffman coding.
4. Unicode system.
5. Cipher simulation.
These units are designed as a computer presentation that exploits the
capabilities of the computer to build the units using backgrounds,
colors, photos, etc.
2. Set tests
The researcher uses three types of test. The first is a pre-test, which
is used before beginning each unit to determine the knowledge level
of the learner. The second is exercises, which are used after finishing
each unit to determine the information and concepts acquired by
the learner. The last is a post-test, which is used after finishing all
units to determine whether the aims of the package have been achieved.
The researcher observes several points when writing the questions:
• The questions are presented simply.
• The questions are a part of the scientific contents of the package.
• The questions must be comprehensive.
• Help is provided on how to answer the questions.
• Feedback is given if any answer is false.
3. Produce the package
To produce the package the researcher uses the Visual Basic language
to program the package and the PhotoShop program to design the pages,
because they provide facilities to build the presentation screens with full
capacity to use colors, fonts, sounds, and easy transitions between
screens. Figure (3.15) shows the flowchart of the learning package; it
represents one unit, and the other units take the same structure.
Figure (3.15) Flowchart of the learning package (nodes: Title; Introduction, Community, Aims, Help; Contents; Unit aim; Pre-test; Scientific content; Exercise; Go to another unit; Post test; End; with Pass/Fail and Yes/No branches)
3.6.3 Evaluation
This is the last stage in designing the package; it is used to
find out the negative and positive points in the package.
The evaluation process is first performed by the supervisors to ensure
that the following points are achieved:
• The quality of the aims and scientific contents.
• Simplicity of the language used.
• The questions and their feedback.
• The help of using the package.
• The background and the colors of the presentation.
To measure the package performance, two questionnaires are
distributed:
1. A questionnaire for the viewpoints of experts, which includes (11) items
shown in Appendix (C); the names of the experts are also listed in
Appendix (C).
2. A users questionnaire, which contains (12) items shown in Appendix
(C).
Each item is answered by placing the symbol (√) in the field
that reflects the user’s opinion. The questionnaire gives a measure of five
different degrees, which reflect the comprehension of information and
concepts.
3.6.4 Statistical Method
The researcher used the standard deviation to analyze the results of
the questionnaires, where there are five answers for each item, ranging
from very large to very little.
Standard deviation is a measure of statistical dispersion, indicating how
the values in a data set are spread out. If the data points are all close to
the mean, then the standard deviation is close to zero. If many data points
are far from the mean, then the standard deviation is far from zero [42].
The standard deviation is calculated from the following equation:
$$S = \sqrt{\frac{\sum_{i} F_i \,(X_i - \bar{X})^2}{N - 1}} \qquad (3.5)$$
where $X_i$ : degree of the given item
$\bar{X}$ : average
$F_i$ : repetition number
$N$ : sample number
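As a small illustration of equation (3.5), the following Python sketch computes the standard deviation of one questionnaire item from the degree values and the number of times each degree was chosen. The function name questionnaire_std and the sample numbers are hypothetical, and the equation is read here as the usual sample standard deviation with frequency weights.

```python
import math


def questionnaire_std(degrees, frequencies):
    """Sample standard deviation of item scores given as (degree, frequency) pairs."""
    n = sum(frequencies)                      # N: number of respondents
    if n < 2:
        raise ValueError("at least two responses are needed")
    mean = sum(x * f for x, f in zip(degrees, frequencies)) / n
    squares = sum(f * (x - mean) ** 2 for x, f in zip(degrees, frequencies))
    return math.sqrt(squares / (n - 1))
```

For instance, questionnaire_std([5, 4, 3], [2, 1, 1]) treats the responses as 5, 5, 4, 3; the mean is 4.25 and the result is the square root of 2.75 / 3, roughly 0.957.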
Chapter Four
Results and Discussion
4.1 Introduction
The proposed software consists of two programs, the first hides
Arabic text in a document file with extension (.DOC) using four methods
(Unicode system, white space, add hyphen, and change position), and the
second program hides Arabic text in a document file with extension (.RTF)
by hiding the message in the data part of the file, where the two extensions
are Microsoft Word compatible.
4.2 Ciphering and Hiding Data in .DOC Document Files
The ciphering and hiding process is carried out in several steps,
which are discussed briefly below using an Arabic plain text
document file as a cover, an Arabic message to hide, and a password for
the encryption process. Figure (4.1) shows the main window of the
proposed software.
Figure (4.1) Main window of the proposed software
As an example, cover paragraphs shown in figure (4.2) will be taken
to hide the message.
أهمية وحدة المعالجة المركزيةتعتبر وحدة المعالجة المركزية في الحاسب من أهم الأجزاء بل أهمهـا على الإطلاق لأنها بمثابة العقل في الجهاز، كما أنها تعمل على إنجـاز كافـة العمليات الحسابية في سرعات مذهلة، بالإضافة إلى معالجـة مختلـف أنـواع
أجزاء الحاسب، و يعتبر المعالج من أكثر الأجهزة البيانات والتنسيق بين جميع تعقيدا، حيث يحتوي على ملايين الترانزستورات والتي تتـرابط مـع بعـضها
والتي لهـا سـمكها ) من الزجاج المصهور ( البعض بواسطة شعيرات معدنية .أرق مئات المرات من سمك الشعرة الواحدة للإنسان بساعة النظـام، ولكـن لا يوجد بداخل كل حاسب ساعة خاصة تسمى
تستخدم هذه الساعة لمعرفة الوقت، وإنما لإرسال نبضات كهربائية صغيرة إلى وحدة المعالجة والتي بدورها تقوم باستخدام هذه النبضات للتحكم في العمليـات التي تنجزها، ولوجود هذه الساعة علاقة وثيقة بسرعة تردد المعالج، فعلى سبيل
هيرتز يـستطيع أن يـستقبل 300ي يقوم بالعمل على تردد المثال المعالج الذ مليون نبضة في الثانيـة وبمـا أن 300النبضات الكهربائية من الساعة بمعدل
من نبضات ( المعالجات تقوم عادة بإنجاز عملية واحدة فقط لكل نبضة كهربائية و . انيـة مليون عملية لكل ث 300فبالتالي بإمكان المعالج إنجاز ) ساعة النظام
بـشكل أصـغر ) أو الدوائر التـي بـداخلها ( من أهم أسباب جعل المعالجات فأصغر من قبل شركات تصنيع المعالجات هو جعل مسافات انتقال الكهرباء بين الترانزوستورز بداخل وحدة المعالجة أقصر الأمر الذي يعمل على زيادة سرعة
.المعالج أقسام، أهم هذه الأقسام والتـي تتكون وحدة المعالجة المركزية من عدة
يتم من خلالها معالجة البيانات والقيام بمختلف العمليات في الحاسب هما وحـدة .التحكم و وحدة التنفيذ
Figure (4.2) Arabic plain text paragraphs
Following are the steps of Ciphering and hiding process:
4.2.1 Open Document File
Open an Arabic text document file from the <File> menu in the menu
bar of the software. The file, which is used to hide the message, must have
a “ .DOC ” extension and must have been saved using the Microsoft Word
application. The proposed software opens the file in the Microsoft Word
environment, as shown in figure (4.3).
Figure (4.3) Arabic text document file - cover file
4.2.2 Select Hiding Method
This step selects the hiding method, as shown in figure (4.4).
There are four hiding methods: Unicode system, white space,
change position, and hyphen.
Figure (4.4) Stego menu of the proposed software - Select method submenu
4.2.3 Write the Message
In this step the message to be hidden is written in the dedicated
text box, as shown in figure (4.5); all Arabic characters as well as the
space are allowed.
Figure (4.5) Message window
4.2.4 Write the Password
In this step a password for encryption is written in the specific text
box as shown in figure (4.6), where all English characters (upper case and
lower case) as well as numbers will be allowed to be entered.
Figure (4.6) Password window
4.2.5 Start Hiding Process
Start the cipher and hide process by selecting the “Hide” item from
the <Stego> menu, as shown in figure (4.7).
Figure (4.7) Menu bar of the proposed software - Stego menu
The cipher and hide process consists of several steps; each step is a
part (subroutine) of the software and is discussed below:
1. Huffman Code Subroutine: converts the message characters (written in
the previous step) to binary using the Huffman code, by taking each
character from the message and converting it to its equivalent binary code
using table (3.1). The result of the conversion is shown in figure (4.8);
for the example message shown in figure (4.5), the binary stream is
196 bits long.
0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 1 0 1 1 0 0 0 1 0 1 1 0 1 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 1 1 1 1 0 1 1 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 1 1 1 1 0 1 1 0 0 1 0 1
Figure (4.8) Binary bits of the converted plain text message
2. Check Subroutine: calculates the number of words and the number of
bits that can be hidden in the document, as well as the number of characters
in the message, number of bits before and after compression, and
compression ratio as shown in figure (4.9).
Document (cover)
    No. of words                       239
    No. of bits can hide               221
Message (to hide)
    No. of characters                  42
    No. of bits-before compression     336
    No. of bits-after compression      196
    Compression ratio                  46%
Figure (4.9) Check subroutine results
The subroutine checks whether the number of bits in the message is
larger than the number of bits that can be hidden in the document; if so, a
message box like that in figure (4.10) appears. If the answer is “Yes”, the
program continues with the hiding process but ignores the last part of the
message, for which there is not enough space in the document; if the
answer is “No”, the program cancels the hiding process.
Figure (4.10) No enough space message box
3. Initialize Registers Subroutine: builds the registers which are used to
generate the keystream, and this can be achieved by many steps described
as follows:
• Convert each character of the password to its ASCII code to get the
key, as shown below
Char     H    8    3    D    2    F    V    7    S    R
ASCII    72   56   51   68   50   70   86   55   83   82
After converting the characters, the key is 72565168507086558382.
• Read the time from the computer clock; the timer value is, for
example, (78538.46).
               tim(1)  tim(2)  tim(3)  tim(4)  tim(5)       tim(6)  tim(7)
Timer digit      7       8       5       3       8     .      4       6
• Mix the key with time to get a key varying with time, by multiplying
each specific digit of the key with the specific digit of the time as
shown in figure (4.11), the result of each multiplication step changes
the value of part of the key.
Key location          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Original              7  2  5  6  5  1  6  8  5  0  7  0  8  6  5  5  8  3  8  2
Tim(1) = 7   Key 1    7  2  5  4  5  5  6  8  5  0  7  0  8  6  5  5  8  3  8  2
Tim(2) = 8   Key 2    7  2  5  4  5  4  4  8  5  0  7  0  8  6  5  5  8  3  8  2
Tim(3) = 5   Key 3    7  2  5  4  5  4  4  4  2  5  7  0  8  6  5  5  8  3  8  2
Tim(4) = 3   Key 4    7  2  5  4  5  4  4  4  2  1  7  1  8  6  5  5  8  3  8  2
Tim(5) = 8   Key 5    7  2  5  4  5  4  4  4  2  1  7  1  4  4  5  5  8  3  8  2
Tim(6) = 4   Key 6    7  2  5  4  5  4  4  4  2  1  7  1  4  4  5  2  3  2  8  2
Tim(7) = 6   Final    7  2  5  4  5  4  4  4  2  1  7  1  4  4  5  2  3  1  6  8
Figure (4.11) Generating a key that varies with time
After the last multiplication step, the new key is the final
encryption key, 72545444217144523168, which is used to generate
the registers' lengths, state bits, feedback taps, transfer bit locations,
transfer addresses, and multiplexer selectors.
• The encryption key generated in the previous step is used to
produce the registers' lengths, using the algorithm in the
flowchart shown in figure (3.5). The resulting register lengths are
listed in table (4.1).
Table (4.1) Registers length
Register number    Length (cells)
1                  22
2                  23
3                  21
4                  26
5                  36
• Initialize state bits for each register using algorithm in the flowchart
shown in figure (3.6), where each cell has a binary value either ‘0’ or
‘1’. The state bits of all registers are listed in table (4.2).
Table (4.2) State Bits for each register
Register 1 1 0 1 0 0 1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 0
Register 2 0 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0
Register 3 0 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1
Register 4 0 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1
Register 5 1 1 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1
• Generate feedback taps for each register using algorithm in the
flowchart shown in figure (3.7), where each tap has a binary
value either ‘0’ or ‘1’. The feedback taps of all registers are
listed in table (4.3).
Table (4.3) Feedback taps bit for each register
Register 1 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 1 0
Register 2 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 0
Register 3 0 1 1 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 1 0 0
Register 4 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 0
Register 5 1 1 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 0
• Generate transfer bit locations using the algorithm in the flowchart
shown in figure (3.8). Each register has sixteen active cells, which are
listed in table (4.4).
Table (4.4) Locations for transition for each register
Register no.    Cell no. of Active cells
1               1 2 3 4 5 6 7 8 10 12 13 17 19 20 21 22
2               1 2 3 6 8 9 10 11 12 14 16 18 19 20 21 22
3               1 2 3 4 5 9 10 12 13 14 15 16 17 18 19 21
4               2 3 4 5 6 7 8 12 13 14 15 18 20 21 24 26
5               3 6 7 11 13 14 17 18 23 24 26 27 28 33 35 36
• Generate transfer address using algorithm in the flowchart shown in
figure (3.9). Each register has four address cells used to address an
active cell with previous register as listed in table (4.5), and other four
address cells used to address an active cell with next register as listed
in table (4.6).
Table (4.5) Locations of transition for each register
Register no.    Cell no. of address up
1               8 9 18 22
2               7 8 9 20
3               1 2 3 6
4               5 20 21 23
5               2 7 8 24
Table (4.6) Locations of transition for each register
Register no.    Cell no. of address down
1               6 11 17 19
2               1 3 6 21
3               4 5 8 11
4               3 20 22 26
5               2 3 16 35
• Generate the multiplexer selector cells using the algorithm in the
flowchart shown in figure (3.10). Table (4.7) lists the cell number for
each selector.
Table (4.7) Locations for transition for each register
Register no.    Cell no. for multiplexer selector
2               3
3               12
4               24
After all register parameters are initialized, the five registers are built as shown in figure (4.12).
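The first steps of the Initialize Registers subroutine, converting the password to a key and mixing it with the timer value, can be checked with a short Python sketch. The function names are hypothetical. The mixing rule below is not stated in this chapter; it is inferred from the worked example in figure (4.11), where each timer character j (counting the decimal point, which changes nothing) selects the two key digits starting at position 2j + 2, multiplies them by the timer digit, and writes the three-digit product back starting at the same position. Padding products shorter than three digits with zeros is an assumption, since the example contains no such case.

```python
def password_to_key(password: str) -> str:
    """Concatenate the decimal ASCII codes of the password characters."""
    return "".join(str(ord(ch)) for ch in password)


def mix_with_time(key: str, timer: str) -> str:
    """Mix key digits with timer digits (rule inferred from figure (4.11))."""
    digits = list(key)
    for j, ch in enumerate(timer, start=1):
        if not ch.isdigit():
            continue                 # the decimal point uses up a slot, changes nothing
        p = 2 * j + 1                # 0-based index of 1-based key position 2j + 2
        if p + 3 > len(digits):
            break                    # assumption: stop when the key is too short
        product = int("".join(digits[p:p + 2])) * int(ch)
        digits[p:p + 3] = f"{product:03d}"   # assumption: zero-pad to three digits
    return "".join(digits)
```

With the password H83D2FV7SR, password_to_key gives 72565168507086558382, and mixing that key with the timer value 78538.46 gives 72545444217144523168, the final key listed in figure (4.11).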
Figure (4.12) The proposed Registers design as an example
Figure (4.12) shows five registers. The output of each register is
used as an input to the multiplexer, and the multiplexer selector
takes its value from the content of the cells (drawn as dark gray
rectangles with two arrows) assigned in the previous step. The active
cells of each register are drawn as light gray boxes, and the
feedback cells of each register are drawn as small gray boxes
above the cell boxes. Cells used for transition address-up are drawn as
gray rectangles marked -up-, and cells used for transition address-down
are drawn as gray rectangles marked -down-.
4. Keystream Generation Subroutine: this subroutine uses the register
model shown in figure (4.12) to generate a keystream whose length equals
the length of the message bits after compression. The result of the
generation is shown in figure (4.13), where the binary stream is 196 bits
long.
0 1 0 1 1 0 1 0 0 1 0 1 1 0 1 0 1 1 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 0 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 1 0 0 0 1 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0 1 0 1 1 0 1 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 1 0
Figure (4.13) Keystream bits generated
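To make the roles of the state bits and feedback taps concrete, the following Python sketch steps a single generic feedback shift register. It is only an illustration: the proposed design combines five registers through transfer cells and a multiplexer as described in chapter three, none of which is reproduced here, and the function name lfsr_bits and the shifting direction are assumptions.

```python
def lfsr_bits(state, taps, count):
    """Return count output bits from one feedback shift register.

    state and taps are equal-length lists of 0/1 values; a tap value of 1
    means that cell takes part in the XOR feedback.
    """
    if len(state) != len(taps) or not any(taps):
        raise ValueError("state and taps must match and use at least one tap")
    cells = list(state)
    out = []
    for _ in range(count):
        out.append(cells[-1])            # output is the last cell
        feedback = 0
        for cell, tap in zip(cells, taps):
            if tap:
                feedback ^= cell         # XOR of all tapped cells
        cells = [feedback] + cells[:-1]  # shift by one, feedback enters first cell
    return out
```

For a four-cell register with state 1 0 0 1 and taps 1 0 0 1, the first five output bits are 1 0 0 1 0. The state and tap rows of tables (4.2) and (4.3) could be passed in the same way, although the output would not be the keystream of figure (4.13), because the transfer and multiplexer stages are missing from this sketch.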
5. Test Subroutine: tests the keystream generated in the previous step
using statistical tests. The software checks the test values; if one or more
of them is above the permitted value, the software reads another time
value and calculates a new key to generate a new keystream, until all test
values are acceptable (in range). Table (4.8) lists the threshold value
for each statistical test and the test value (for this example).
Table (4.8) Test values of the keystream
Test type               Test Value    Threshold value
Frequency test          00.184        03.8415
Serial test             00.447        05.9915
Poker test              02.938        14.0671
Runs test               01.650        09.4877
Autocorrelation test    00.001        01.9600
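As an illustration of the first row of table (4.8), the following Python sketch computes a frequency test value, assuming the thesis uses the standard monobit statistic (n0 - n1)^2 / n compared against the chi-square threshold 3.8415. The function and constant names are hypothetical, and the other four tests are not sketched.

```python
FREQUENCY_THRESHOLD = 3.8415  # threshold value listed in table (4.8)


def frequency_test(bits):
    """Monobit statistic: squared difference of zero and one counts over the length."""
    n = len(bits)
    if n == 0:
        raise ValueError("empty bit stream")
    ones = sum(bits)
    zeros = n - ones
    return (zeros - ones) ** 2 / n


def passes_frequency_test(bits):
    """True when the statistic does not exceed the threshold."""
    return frequency_test(bits) <= FREQUENCY_THRESHOLD
```

Under this assumption, a 196-bit stream whose counts of zeros and ones differ by 6 gives 36 / 196, about 0.184, which is the size of value reported in the table; the bit counts of the thesis keystream itself are not verified here.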
6. Encryption Subroutine: this task is used to XOR the plain text
message bits with the keystream bits to generate a cipher stream as
shown in figure (4.14).
0 1 1 1 0 1 0 0 1 1 0 1 0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0 1 1 1 1 0 0 0 1 1 1 0 0 1 0 1 1 1 1 0 0 0 1 1 0 1 1 0 1 0 1 1 0 1 0 0 1 0 1 1 0 0 0 1 1
Figure (4.14) Cipher stream bits
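The encryption step is a bitwise XOR of two equal-length streams, which the following Python sketch illustrates. The function name xor_bits is hypothetical; the sample values are the first sixteen bits of figures (4.8), (4.13), and (4.14).

```python
def xor_bits(message_bits, keystream_bits):
    """XOR two equal-length lists of 0/1 values, bit by bit."""
    if len(message_bits) != len(keystream_bits):
        raise ValueError("streams must have the same length")
    return [m ^ k for m, k in zip(message_bits, keystream_bits)]


plain = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0]      # start of figure (4.8)
keystream = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # start of figure (4.13)
cipher = xor_bits(plain, keystream)
```

Here cipher equals 0 1 1 1 0 1 0 0 1 1 0 1 0 0 0 0, the first sixteen bits of figure (4.14). Applying xor_bits to the cipher bits with the same keystream returns the plain bits, which is how decryption works with a stream cipher of this kind.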
7. Hiding Subroutine: this subroutine is used to hide the cipher stream that
is produced from the previous step. The hiding process is dependent on the
selected method as shown in figure (4.4), and the following sections
describe all the four methods. Figure (4.15) shows the window of the
proposed software after hiding process is complete.
Figure (4.15) Main window of the proposed software after hiding process
4.2.6 Hiding Data with Unicode System Method
This method exploits the character code tables shown in
Appendix A, where for Arabic text there are two code tables, one for
standard characters and the other for isolated characters. The method
either converts each isolated character in the words from one code
number to the other or leaves it unchanged, depending on the 0’s and 1’s
of the cipher stream bits, without affecting the appearance of the document
when it is viewed in a word document viewer such as Microsoft Word or
WordPad.
The text document after hiding the message is shown in figure (4.16).
Its view is identical to that of the source document shown in figure (4.3),
the two files have the same size, and the software can hide 221 bits in the
source document file.
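The idea of the Unicode system method can be sketched in Python as follows. This is a simplified illustration under assumptions, not the thesis implementation: the two code tables of Appendix A are assumed to correspond to the standard Arabic block and the isolated forms in the Arabic Presentation Forms-B block, only six letters that never connect to the following letter are used as carriers, and a letter is treated as isolated when the character before it does not connect forward. All names are hypothetical.

```python
import unicodedata

CARRIERS = "ادذرزو"            # assumed carrier letters (never join the next letter)
NON_FORWARD = "ءآأؤإاةدذرزوى"   # letters assumed not to connect to the next letter

# Base letter -> isolated presentation form, read from the Unicode database.
ISOLATED = {}
for cp in range(0xFE70, 0xFF00):
    fields = unicodedata.decomposition(chr(cp)).split()
    if len(fields) == 2 and fields[0] == "<isolated>":
        ISOLATED[chr(int(fields[1], 16))] = chr(cp)
BASE = {form: letter for letter, form in ISOLATED.items()}


def _carriers(text):
    """Indices of carrier letters that appear in isolated position."""
    base = [BASE.get(c, c) for c in text]
    found = []
    for i, c in enumerate(base):
        prev = base[i - 1] if i > 0 else " "
        prev_joins = "\u0620" <= prev <= "\u064A" and prev not in NON_FORWARD
        if c in CARRIERS and not prev_joins:
            found.append(i)
    return found


def hide_bits(text, bits):
    """Bit 1 swaps a carrier to its isolated code point; bit 0 leaves it standard."""
    slots = _carriers(text)
    if len(bits) > len(slots):
        raise ValueError("cover has too few isolated characters")
    chars = list(text)
    for i, bit in zip(slots, bits):
        letter = BASE.get(chars[i], chars[i])
        chars[i] = ISOLATED[letter] if bit else letter
    return "".join(chars)


def extract_bits(text, count):
    """Read back count bits from the carrier positions."""
    return [1 if text[i] in BASE else 0 for i in _carriers(text)[:count]]
```

Because each swap replaces one code point with another, the text keeps the same number of characters, which is consistent with the unchanged file size reported above. Whether a particular viewer draws both code points identically depends on its fonts and is not checked by this sketch.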
Figure (4.16) Document file after hiding the message (using Unicode system method)
Figure (4.17) shows part of the document, zoomed to 300%, before the
hiding process, and figure (4.18) shows the same part of the document
after hiding. The two figures look identical, so a third party cannot
detect the change.
Figure (4.17) Part of the document file before hiding process
Figure (4.18) Part of the document file after hiding process
(using Unicode system method)
4.2.7 Hiding Data with White Space Method
This method uses the space between words to hide data: one
space between two words represents the Boolean value zero, and two
spaces between two words represent the Boolean value one.
Part of the document is shown in figure (4.19), and the text
document after hiding the message is shown in figure (4.20). A third
party can easily detect the difference between the files by eye. The target
file and the source file have the same file size, and the software can hide
235 bits in the source document file.
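The white space rule can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a plain-text cover in which words are separated by single spaces; handling of the .DOC file itself is not shown.

```python
import re


def hide_in_spaces(cover: str, bits) -> str:
    """Write one space for bit 0 and two spaces for bit 1 between successive words."""
    words = cover.split()
    if len(bits) > max(len(words) - 1, 0):
        raise ValueError("cover has too few word gaps for this message")
    pieces = words[:1]
    for k, word in enumerate(words[1:]):
        gap = "  " if k < len(bits) and bits[k] else " "
        pieces.append(gap + word)
    return "".join(pieces)


def extract_from_spaces(stego: str, count: int):
    """Read count bits back from the widths of the first gaps."""
    gaps = re.findall(r" +", stego)
    return [1 if len(gap) == 2 else 0 for gap in gaps[:count]]
```

For example, hiding the bits 1 0 1 in "a b c d e" gives "a  b c  d e", and extract_from_spaces returns 1 0 1 from that string. The doubled spaces are what make this method easy to notice by eye.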
Figure (4.19) Part of the document file after hiding process
(using white space method)
Figure (4.20) Document file after hiding the message (using white space method)
4.2.8 Hiding Data with Hyphen Method
This method exploits the Arabic characters that are connected to
each other, hiding data by adding a hyphen between them: a word with
no hyphen represents the Boolean value zero, and a word with one
hyphen represents the Boolean value one.
The text document after hiding the message is shown in
figure (4.21), and part of the document is shown in figure (4.22).
In this method a third party cannot easily detect the difference,
because Arabic paragraphs already contain hyphens for justification
and alignment of the text. The target file and the source file have
the same file size, and the software can hide 218 bits in the
source document file.
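A minimal Python sketch of the hyphen rule follows. It rests on several assumptions that the text does not confirm: the hyphen is taken to be the Arabic tatweel (U+0640), it is inserted after the first letter of a word that connects to the letter following it, and the cover is assumed to contain no tatweel of its own (otherwise the hidden bits could not be separated from existing hyphens without the original). All names are hypothetical.

```python
TATWEEL = "\u0640"
NON_FORWARD = set("ءآأؤإاةدذرزوى")   # letters assumed not to connect to the next letter


def _is_letter(ch):
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def _slot(word):
    """Index where a tatweel can be inserted, or None if no two letters connect."""
    for i in range(len(word) - 1):
        a, b = word[i], word[i + 1]
        if _is_letter(a) and _is_letter(b) and a not in NON_FORWARD and b != "ء":
            return i + 1
    return None


def hide_with_kashida(cover, bits):
    """One bit per usable word: bit 1 inserts a tatweel, bit 0 leaves the word as is."""
    if TATWEEL in cover:
        raise ValueError("sketch assumes the cover contains no tatweel")
    remaining = list(bits)
    out = []
    for word in cover.split(" "):
        pos = _slot(word)
        if pos is not None and remaining:
            if remaining.pop(0):
                word = word[:pos] + TATWEEL + word[pos:]
        out.append(word)
    if remaining:
        raise ValueError("cover has too few usable words")
    return " ".join(out)


def extract_from_kashida(stego, count):
    """Read count bits back from the usable words."""
    found = []
    for word in stego.split(" "):
        if _slot(word.replace(TATWEEL, "")) is not None:
            found.append(1 if TATWEEL in word else 0)
    return found[:count]
```

Words with no pair of connected letters carry no bit in this sketch, which would explain a capacity somewhat below the word count, as in the 218 bits reported for a 239-word document.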
Figure (4.21) Document file after hiding the message (using hyphen method)
Figure (4.22) Part of the document file after hiding process
(using hyphen method)
4.2.9 Hiding Data with Change Position Method
This method uses the line of a row as a baseline and either shifts a
word up or leaves it in place, depending on the data to hide: no shift
represents the Boolean value zero, and shifting the word by one pixel
represents the Boolean value one.
The text document after hiding the message is shown in
figure (4.23). The target file and the source file have the same file
size, and the software can hide 239 bits in the source document
file.
Figure (4.23) Document file after hiding the message (using change position method)
Part of the text with zooming 500% is shown in figure (4.24). In
figure (4.23) the third party cannot detect the difference by eye, but when
the text is zoomed to 500% the difference can be detected by eye as shown
in figure (4.24 a) and figure (4.24 b).
a. Text before hiding (original text)
b. Text after hiding
Figure (4.24) Part of the document file (using change position method)
4.3 Hiding Data in .RTF Document Files
The second proposed software hides data in the RTF file, a Microsoft
Word compatible format. The main window of the software is
shown in figure (4.25).
Figure (4.25) Main window of the proposed software
As an example, the same cover paragraphs that are shown in
figure (4.2) will be taken to hide the same message. Following are
the steps of the hiding process:
4.3.1 Open Document File
Open a word document file, which is used to hide the message (using
the same document file that is shown in figure (4.2)).
4.3.2 Write the Message
Write the message to hide in the dedicated text box, as shown in
figure (4.26); all Arabic characters as well as the space are
allowed.
Figure (4.26) Message window
4.3.3 Start Hiding Process
Start hiding process by selecting the <Hide> item of the <Process>
menu from the menu bar as shown in figure (4.27).
Figure (4.27) Main menu of the software – Process submenu
The hiding process hides the message in the data zone of the RTF file.
Figure (4.28) shows part of the file viewed in hexadecimal code. The
software uses a dummy control symbol such as ( \az ) to carry the message
characters; this is shown in figure (4.29), where three message characters
follow each dummy control symbol.
Figure (4.30) represents the document file after hiding the message;
its view is identical to that of the source document shown in figure (4.3).
Figure (4.28) View of the file before hiding
Figure (4.29) View of the file after hiding
Figure (4.30) Document file after hiding the message (using RTF file format)
Figure (4.31) shows part of the document, zoomed to 300%, after
hiding; it looks identical to figure (4.17), so a third party cannot
detect the change.
Figure (4.31) Part of the document file after hiding process
(using RTF file method)
In this method the file after hiding is larger than the source file.
To address this, a subroutine is written to compress the file back toward
its original size. Figure (4.32) shows the document file after compression;
the view of this file is identical to that in the previous figure.
Figure (4.32) Document file after compression
Figure (4.33) shows the main window of the proposed software after
implementing processes (hide, unhide, compress) on a document file.
Figure (4.33) Main window of the proposed software after hiding and compression processes
4.4 Discussion
From the previous sections, and after examining fifteen different
documents to hide messages by all the methods used in this thesis, the best
way of hiding data in an Arabic text document is the Unicode system
method: the target file has the same size as the source file, a third party
cannot recognize the difference by eye, the original document is not
required for detecting the hidden message, and changes in font name, font
size, font style, and paragraph justification do not affect the hidden
message.
The RTF file method is similar to the previous method, except that the
file size increases after hiding; the proposed software compresses the file
to make the difference between its size and the source file size as small as
possible.
The other three methods can be used for the hiding domain but with
some risk if the third party has knowledge of hiding data. On the other
hand, using cryptography before hiding improves the security of the
message.
The researcher can summarize from the literature introduced in the
previous chapter that:
1. Brassil, et al. used a word shift coding method (shifting words to the
left or right), while the current study uses the same method but shifts
words up or leaves them in place.
2. Shaar proposed hiding a number of bits from a plain text message
in a random bit vector, with the locations of the hidden bits
determined by a key. The current study uses the same
strategy when hiding the time (which is a part of the password) in the
document file.
3. Kim used an inter-word space method, while the current study uses the
same method but in a different way.
4. Sui proposed a method to hide information in a hypertext file. The final
stego-file is similar to that of the current study when the Unicode
system method is used to hide data: the stego-file and the cover have
no difference in normal appearance, and the algorithm does not
lengthen the file.
5. Topkara uses a linguistic method to hide information. That study
differs from the current study in that it changes the words of a sentence
while maintaining its meaning, whereas the current study changes the
features of the characters to hide data.
6. Voloshynovskiy proposed a method to hide data in the characters’
color, using a Microsoft Word document as a cover file. The current
study uses the same facility, with two types of file that are compatible
with Microsoft Word: the Document file and the RTF file.
7. The current study agrees with the Alderson, Tubsree, and Fahim studies
in the educational technology field in preparing an instructional
computer program.
Table (4.9) reviews the hiding methods for two different document
files and their file size, capacity to hide data and document view before and
after hiding the message.
Table (4.9) Hiding methods review
Method type      File   Size before    Size after hiding (Byte)                      No. of words  No. of bits       Document view
                 index  hiding (Byte)                                                in document   can hide in file  before and after
Unicode system   1.doc  21,504         21,504                                        239           221               Identical
White space      1.doc  21,504         21,504                                        239           235               Not identical
Hyphenation      1.doc  21,504         21,504                                        239           218               Not identical
Change position  1.doc  21,504         21,504                                        239           239               Identical (Normal view)
RTF              1.rtf  8,784          10,303 before compress, 8,784 after compress  239           6072              Identical
Unicode system   2.doc  20,992         20,992                                        202           208               Identical
White space      2.doc  20,992         20,992                                        202           180               Not identical
Hyphenation      2.doc  20,992         20,992                                        202           198               Not identical
Change position  2.doc  20,992         20,992                                        202           202               Identical (Normal view)
RTF              2.rtf  8,439          9,892 before compress, 8,440 after compress   202           5808              Identical
4.5 Instructional Technology Side Results
4.5.1 Opinion List Results of Experts Viewpoint Analysis
Table (4.10) presents the standard deviation for each item. From the
table, item (6) gives the highest standard deviation, while item (3) gives
the lowest. This means that the experts agree closely that the information
and concepts displayed in the package are scientifically suitable, and that
they differ most on the package display style.
Table (4.10) Opinion list result of experts
No. The items Large Medium Little Standard deviation
1 The instructions of using the package are simple. 6 3 1 0.707
2 The density of displaying information on computer screen is suitable. 7 3 - 0.483
3 The information that displayed in the package are suitable scientifically. 8 2 - 0.421
4 Designing of the instructional package takes into account the personal differences. 7 2 1 0.699
5 Clearness of the item’s titles. 3 6 1 0.632
6 Displaying package style is limiting. 4 3 3 0.875
7 The attached images in the instructional units are participating to understand the concepts. 1 7 2 0.567
8 Understanding the producing questions in the program. 4 5 1 0.674
9 Suitable of the used color in the package. 2 6 2 0.666
10 The questions are including all items in the instructional package. 3 5 2 0.737
11 The language style that used to explain the scientific concepts and information is clear. 3 4 3 0.816
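The standard deviations in the table can be checked with a short calculation. The sketch below assumes that the answers are scored Large = 3, Medium = 2, Little = 1 and that the sample (n - 1) formula is used; neither assumption is stated in the text, but together they reproduce the tabulated values for the items checked here to within 0.001 (the table appears to truncate rather than round). The function name `likert_std` is hypothetical.

```python
import math


def likert_std(large, medium, little):
    """Sample standard deviation of answers scored Large=3, Medium=2, Little=1."""
    counts = {3: large, 2: medium, 1: little}
    n = sum(counts.values())
    mean = sum(score * count for score, count in counts.items()) / n
    squared = sum(count * (score - mean) ** 2 for score, count in counts.items())
    return math.sqrt(squared / (n - 1))


item_1 = likert_std(6, 3, 1)  # tabulated as 0.707
item_3 = likert_std(8, 2, 0)  # tabulated as 0.421 (lowest: strongest agreement)
item_6 = likert_std(4, 3, 3)  # tabulated as 0.875 (highest: weakest agreement)
```

A small standard deviation means the answers cluster in one category, which is why the lowest value is read as the strongest agreement among the respondents.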
4.5.2 Questionnaire Results of Learners Viewpoint Analysis
Table (4.11) presents the standard deviation for each item. From the table, item (4) gives the highest standard deviation, while item (2) gives the lowest. This means that the students agree closely that the scientific concepts displayed in the package were simple, and that they differ most on whether the flowchart helped increase their understanding of the scientific information.
Table (4.11) Opinion list result of learners
No. The items Large Medium Little Standard deviation
1 The division of the instructional package subject into five typical units participated to increase your understanding of the package. 10 4 1 0.632
2 The scientific concepts that displayed in the package were simple. 12 3 - 0.414
3 The information that displayed in the package was clear. 10 4 1 0.632
4 The flowchart assisted to increase in understanding the scientific information. 9 3 3 0.828
5 The harmony between display image and related information was fine. 6 9 - 0.507
6 Moving steps between the instructional package screens were simple. 7 8 - 0.516
7 English language is better than Arabic language in displaying the instructional package materials. 5 6 4 0.798
8 The language style to explain the scientific concepts is understandable. 7 7 1 0.632
9 Using of colors cleared the displayed concepts. 5 8 2 0.676
10 Titles of the instructional package items are clear. 8 6 1 0.639
11 Immediate support for the answer, increased your desire to continue with the package. 7 6 2 0.723
12 The package increases your learning desire. 11 4 - 0.457
4.6 Conclusions
In this thesis, a new technique to hide information in Arabic text is
proposed. The technique takes into consideration the parameters that can be
used to detect the existence of hidden information in a text file, such as
file size, justification, font size, and other font characteristics.
The experiments were carried out on an actual network, where files
containing hidden information were transferred by e-mail.
Even though hiding information in a text file has some limitations, it is
worth considering techniques that can improve the performance
of such a method. At the University of Technology there is a plan
to connect the University departments in a common network, and most
of the documents and files transferred between users across the University
are Arabic text files. Some secret information, possibly between heads of
departments, could then be transferred as hidden information in Arabic text
files.
Many conclusions can be drawn from this work, the most important
of which are:
1. The tests show that the best method is the Unicode system method,
since:
a. The method changes only the Unicode codes of selected letters, not
the content itself, so the stego-file and the cover show no difference
in normal view.
b. The algorithm does not increase the file size, since it only replaces
Unicode letter codes instead of adding letters.
2. The RTF file method performs at a level comparable to the Unicode system
method once the file is compressed, taking into consideration
the parameters that affect the detection methods.
3. The other methods can also be used, but they carry
some risk of detection by a third party.
4. This system deals with document files that are compatible with
Microsoft Word Documents.
5. Statistical tests are used to measure the quality of the generated
keystream, so that the generator behaves as a random bit generator.
6. The instructional package helps solve the problems that
face the student in connecting the theoretical explanation of
information hiding with the practical application of these concepts.
7. The instructional package takes into consideration the cognitive
difference between the learners, where they use the package
according to their learning speed.
8. The instructional package assists the learners to develop their
knowledge, using the feedback provided by the package.
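The statistical tests mentioned in conclusion 5 can be illustrated with the first two tests of Subroutine 3 in Appendix B. The Python sketch below is a re-expression for illustration, not the thesis program: the function names are hypothetical, while the formulas and the limits 3.8415 and 5.9915 are the ones used in that listing (they are the 5% chi-square limits for one and two degrees of freedom).

```python
FREQUENCY_LIMIT = 3.8415  # limit used in Appendix B for the frequency test
SERIAL_LIMIT = 5.9915     # limit used in Appendix B for the serial test


def frequency_test(bits):
    """Frequency statistic: (n0 - n1)^2 / n."""
    n0 = bits.count(0)
    n1 = bits.count(1)
    return (n0 - n1) ** 2 / len(bits)


def serial_test(bits):
    """Serial statistic built from counts of overlapping bit pairs."""
    n = len(bits)
    n0 = bits.count(0)
    n1 = bits.count(1)
    pairs = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for first, second in zip(bits, bits[1:]):
        pairs[(first, second)] += 1
    pair_sum = sum(count * count for count in pairs.values())
    return 4 / (n - 1) * pair_sum - 2 / n * (n0 * n0 + n1 * n1) + 1


# A perfectly alternating stream is balanced but obviously not random.
alternating = [0, 1] * 50
frequency_value = frequency_test(alternating)
serial_value = serial_test(alternating)
passes_frequency = frequency_value < FREQUENCY_LIMIT
passes_serial = serial_value < SERIAL_LIMIT
```

The alternating stream passes the frequency test, because zeros and ones are balanced, but fails the serial test, because the pairs 00 and 11 never occur. This is why a keystream is checked with several tests rather than one.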
4.7 Recommendations
1. The researcher recommends implementing
the proposed system in an actual University computer network.
2. The instructional package should be used to enhance the ordinary
teaching method for the cryptography and steganography subject.
4.8 Suggestions
Several suggestions can be taken up as further research in
the information hiding process:
1. Develop this system to hide voice data in Arabic text documents.
2. Develop a system to hide information in Adobe Acrobat files.
3. Build an instructional package to teach students about hiding
voice data in Arabic text documents.
4. Develop a system to hide information with online data transfer.
References
1. Bauer, F. L., “Decrypted Secrets: Methods and Maxims of Cryptology”,
3rd ed. Springer-Verlag, New York, 2002.
2. Arnold, M., Schmucker, M., and Wolthusen, S. D., “Techniques and
Applications of Digital Watermarking and Content Protection”, Artech
House, Norwood, Massachusetts, 2003.
3. Kessler, Gary C., “An Overview of Steganography for the Computer
Forensics Examiner”, Forensic Science Communications, No.3-Vol.6-
July 2004.
4. Pawliw, Borys and Neijts, Roberto, “Definitions”, 2002.
www.searchSecurity.com
5. Bender, W., Gruhl, D., Morimoto, N., and Lu, A., “Techniques for Data
Hiding”, IBM Systems Journal 35, Nos. 3&4, 313–336 (1996).
6. Engelfriet, Arnoud, “Steganography”, 2000.
www.stack.nl/galactus/remailers/index-privacy.html
7. Watermarking World web site, 2005.
www.watermarkingworld.org/faq.html
8. Franz, E., Jerichow, A., Möller, S., Pfitzmann, A., and Stierand, I.,
“Computer Based Steganography”, in Information Hiding, Springer
Lecture Notes in Computer Science v.1174, pp.7-21, 1996.
9. Linux, Fu-King, “Basic Data Hiding Tutorial”, 2003.
http://www.antionline.com/showthread.php?threadid=251463
10. Brassil, J. T., Low, S., and Maxemchuk, N.F., “Copyright protection for
the electronic distribution of text documents”, Proceedings of IEEE,
Vol.87, No.7, pp.1181- 1196, July 1999.
11. Shaar, M., Saeb, M. and Badawi, U., “A Hybrid Hiding Encryption
Algorithm for Data Communication Security”, Cairo University,
Faculty of Science Mathematics Dept., Computer Science Division,
1997.
12. Kim, Young-Won, Moon, Kyung-Ae, and Oh, Il-Seok, “A Text
Watermarking Algorithm based on Word Classification and Inter-word
Space Statistics”, Department of Computer Science, Chonbuk National
University, Korea, 2003.
13. Sui, Xin-Guang, and Luo, Hui, “A New Steganography method Based
on Hypertext”, National Key Lab of Modern Signal Processing, IEEE,
2004.
14. Topkara, M., Taskiran, C. M., and Delp, E. J., “Natural Language
Watermarking”, Video and Image Processing Laboratory, School of
Electrical and Computer Engineering, Purdue University, Indiana,
2005.
15. Voloshynovskiy, S., Villán, R., and Koval, O., Vila, J., “Text Data-
Hiding for Digital and Printed Documents”, Computer Vision and
Multimedia Laboratory - University of Geneva, Switzerland, 2006.
16. Uden, L. and Alderson, A., “Teaching and Learning Using Instructional
Design”, School of Computing, Staffordshire University, England,
2000.
17. Tubsree, Chalong, and Tubsree, Nai-Fen Yu, “Designing Effective
Instruction for Computer in Education Courses”, International
Conference on Computers in Education, IEEE, 2002.
18. Fahim, Rasha, “Educational package for detecting hidden information
embedded in an image”, Ph.D. thesis, Technical Education Department,
University of Technology, Iraq, 2006.
19. SSH communications security web site, 2004.
http://www.ssh.fi/support/cryptography/introduction/algorithms.html
20. RSA Security web site, 2004.
http://www.rsasecurity.com/rsalabs/node.asp?id=2164
21. Menezes, Alfred J., van Oorschot, Paul C., and Vanstone, Scott A.,
“Handbook of Applied Cryptography”, CRC Press Inc., Fifth Printing,
2001.
22. RSA Security web site, 2004.
http://www.rsasecurity.com/rsalabs/node.asp?id=2209
23. Robshaw, M.J.B., “Stream Ciphers”, RSA Laboratories, a division of
RSA Data Security, Inc., 1995.
24. RSA Security web site, 2004.
http://www.rsasecurity.com/rsalabs/node.asp?id=2174
25. RSA Security web site, 2004.
http://www.rsasecurity.com/rsalabs/node.asp?id=2266
26. Bender, W., Gruhl, D., Morimoto, N., and Lu, A., “Techniques for data
hiding”, IBM Systems Journal, Vol.35, 1996.
27. Dunbar, Bret, “A detailed look at Steganographic Techniques and their
use in an Open-Systems Environment”, SANS Institute, 2002.
28. Unicode Inc. web site, 2005.
http://www.unicode.org
29. McCreedy, David, “Gallery of Unicode Fonts”, 2005.
http://www.travelphrases.info/fonts.html
30. Wikipedia web site, “Data compression”, The free encyclopedia, 2005.
http://en.wikipedia.org
31. Blelloch, Guy E., “Introduction to Data Compression”, Computer
Science Department, Carnegie Mellon University, 2001.
32. Goebel, Greg, “Lossless Data Compression”, public domain, 2005.
http://www.vectorsite.net/ttdcmp1.html
33. Crochemore, M. and Lecroq, T., “Text data compression algorithms, in
(Algorithms and Theory of Computation Handbook)”, Chapter 10,
CRC Press, Boca Raton, 1998.
34. Reigeluth, C. M., “Instructional Design Theories and Models”,
Lawrence Erlbaum Associates. Hillsdale, NJ, 1983.
35. Tennyson, R.D., Schott, F., and Dijkstra, S., “Instructional Design:
international Perspectives, Theory, Research and Models”, Lawrence
Erlbaum Associates, Hillsdale, NJ, 1997.
36. Dick, W. and Carey, L. M., “The Systematic Design of Instruction”,
Scott Foresman, Glenview, IL, 1997.
37. Astin, B.H, “Principles of instructional Design”, University of Chicago
Press, 1997.
38. Al-Hela, M. M., “Design and produce an Instructional materials”,
Jordan, 2000.
39. Instructional Technology Center web site, “Thirteen Steps to Better
Instructional Visuals for Electronic Presentation”, Iowa State
University, 1999.
40. Microsoft web site, “Microsoft Office Word 2003 Rich Text Format
(RTF) Specification”, White Paper, Published: April 2004.
http://www.microsoft.com/office.htm
41. McCulloch, Bob, “Instructional Design”, University of Calgary, 1998.
http://www.ucalgary.ca/~edtech/688/getstart.htm
42. Wikipedia web site, “Standard Deviation”, The free encyclopedia,
2006. http://en.wikipedia.org/wiki/Standard_deviation
Appendixes
Appendix A Unicode Tables
Arabic Characters Standard Form, Range 0600-06FF
The table below is taken from the Unicode Standard, version 4.1, 2005.
Arabic Characters Form-B, Range FE70-FEFF
The table below is taken from the Unicode Standard, version 4.1, 2005.
Appendix B Program Subroutines
Subroutine 1: Initialize registers
Public Sub initialize(key)
    For i = 1 To 10
        a1 = a1 + Mid(Str(Asc(Mid(key, i, 1))), 2, 2)
    Next
    tim = Format(Timer, "00000.00")
    For i = 1 To 8
        c1 = Val(Mid(a1, 2 + (i * 2), 2))
        c2 = Val(Mid(tim, i, 1))
        If i = 6 Then GoTo q
        Mid(a1, 2 + (i * 2)) = c1 * c2
q:
    Next i
    key = a1
    ' *** generate register length ***
    For i = 1 To 4
        r = Mid(key, i + 15, 1) / 10
        b(i) = Int((30 - 20 + 1) * r + 20)
    Next i
    b(5) = 128 - b(1) - b(2) - b(3) - b(4)
    '*** generate locations for transition * c ***
    For j = 1 To 5
        r = Mid(key, 1 + j, 2) + 1
        For i = 1 To 16
l2:
            r = r / (1 + (Val(Mid(key, 1 + i, 1)) + 1) / 1000)
            t1 = Str(r)
            t2 = Right(r, 4)
            t3 = Val(t2) Mod b(j) + 1
            a = t3
            j1 = 0
            For k = 1 To 16
                If c(j, k) = a Then j1 = j1 + 1
            Next k
            If j1 = 0 Then c(j, i) = a Else GoTo l2
        Next i
    Next j
    '*** sort c ***
    For k = 1 To 5
        For i = 1 To 16
            For j = 1 To 16
                If c(k, i) < c(k, j) Then
                    t = c(k, i)
                    c(k, i) = c(k, j)
                    c(k, j) = t
                End If
            Next j
        Next i
    Next k
    '*** generate locations of address * up * d ***
    For j = 1 To 5
        r = Mid(key, 5 + j, 2) + 1
        For i = 1 To 4
l3:
            r = r / (1 + (Val(Mid(key, 5 + i, 1)) + 1) / 1000)
            t1 = Str(r)
            t2 = Right(r, 4)
            t3 = Val(t2) Mod b(j) + 1
            a = t3
            j1 = 0
            For k = 1 To 4
                If d(j, k) = a Then j1 = j1 + 1
            Next k
            If j1 = 0 Then d(j, i) = a Else GoTo l3
        Next i
    Next j
    '*** sort d ***
    For k = 1 To 5
        For i = 1 To 4
            For j = 1 To 4
                If d(k, i) < d(k, j) Then
                    t = d(k, i)
                    d(k, i) = d(k, j)
                    d(k, j) = t
                End If
            Next j
        Next i
    Next k
    '*** generate locations of address * down * e ***
    For j = 1 To 5
        r = Mid(key, 10 + j, 2) + 1
        For i = 1 To 4
l4:
            r = r / (1 + (Val(Mid(key, 10 + i, 1)) + 1) / 1000)
            t1 = Str(r)
            t2 = Right(r, 4)
            t3 = Val(t2) Mod b(j) + 1
            a = t3
            j1 = 0
            For k = 1 To 4
                If e(j, k) = a Then j1 = j1 + 1
            Next k
            If j1 = 0 Then e(j, i) = a Else GoTo l4
        Next i
    Next j
    '*** sort e ***
    For k = 1 To 5
        For i = 1 To 4
            For j = 1 To 4
                If e(k, i) < e(k, j) Then
                    t = e(k, i)
                    e(k, i) = e(k, j)
                    e(k, j) = t
                End If
            Next j
        Next i
    Next k
    '*** generate locations for multiplexer address * f ***
    r = Mid(key, 20, 2) + 1
    r = r / (1 + (Val(Mid(key, 20, 1)) + 1) / 1000)
    t1 = Str(r)
    t2 = Right(r, 4)
    t3 = Val(t2) Mod b(2) + 1
    f(1) = t3
    r = Mid(key, 15, 2) + 1
    r = r / (1 + (Val(Mid(key, 15, 1)) + 1) / 1000)
    t1 = Str(r)
    t2 = Right(r, 4)
    t3 = Val(t2) Mod b(3) + 1
    f(2) = t3
    r = Mid(key, 7, 2) + 1
    r = r / (1 + (Val(Mid(key, 7, 1)) + 1) / 1000)
    t1 = Str(r)
    t2 = Right(r, 4)
    t3 = Val(t2) Mod b(4) + 1
    f(3) = t3
    '*** generate bits for each register ***
    For j = 1 To 5
        r = Mid(key, 12 + j, 2) + 1
        For i = 1 To b(j)
            r = r / (1 + (Val(Mid(key, 13, 1)) + 1) / 1000)
            t1 = Str(r)
            t2 = Right(r, 3)
            t3 = Val(t2) Mod 2
            g(j, i) = t3
        Next
    Next
    For j = 1 To 5
        r = Mid(key, 4 + j, 2) + 1
        For i = 1 To b(j)
            r = r / (1 + (Val(Mid(key, 5, 1)) + 1) / 1000)
            t1 = Str(r)
            t2 = Right(r, 3)
            t3 = Val(t2) Mod 2
            x(j, i) = t3
        Next
    Next
End Sub

Subroutine 2: Shift bits
Public Sub shifter(s1)
    k = 0
    For j = 1 To s1
        k = k + 1
        '*** register 1 ***
        '*** calculate feedback bit ***
        xone1 = x(1, 1)
        For i = 2 To b(1)
            If g(1, i) = 1 Then xone1 = xone1 Xor x(1, i)
        Next
        '*** shift ***
        For i = b(1) To 2 Step -1
            x(1, i) = x(1, i - 1)
        Next
        x(1, 1) = xone1
        '*** put bit in next reg. ***
        h1 = 0: h2 = 0
        For i = 1 To 4
            If x(1, d(1, i)) = 1 Then h1 = h1 + 2 ^ (4 - i)
            If x(2, e(2, i)) = 1 Then h2 = h2 + 2 ^ (4 - i)
        Next
        x(2, c(2, h2)) = x(2, c(2, h2)) Xor x(1, c(1, h1))
        '*** register 2 ***
        '*** calculate feedback bit ***
        xone2 = x(2, 1)
        For i = 2 To b(2)
            If g(2, i) = 1 Then xone2 = xone2 Xor x(2, i)
        Next
        '*** shift ***
        For i = b(2) To 2 Step -1
            x(2, i) = x(2, i - 1)
        Next
        x(2, 1) = xone2
        '*** put bit in next reg. ***
        h1 = 0: h2 = 0
        For i = 1 To 4
            If x(2, d(2, i)) = 1 Then h1 = h1 + 2 ^ (4 - i)
            If x(3, e(3, i)) = 1 Then h2 = h2 + 2 ^ (4 - i)
        Next
        x(3, c(3, h2)) = x(3, c(3, h2)) Xor x(2, c(2, h1))
        '*** register 3 ***
        '*** calculate feedback bit ***
        xone3 = x(3, 1)
        For i = 2 To b(3)
            If g(3, i) = 1 Then xone3 = xone3 Xor x(3, i)
        Next
        '*** shift ***
        For i = b(3) To 2 Step -1
            x(3, i) = x(3, i - 1)
        Next
        x(3, 1) = xone3
        '*** put bit in next reg. ***
        h1 = 0: h2 = 0
        For i = 1 To 4
            If x(3, d(3, i)) = 1 Then h1 = h1 + 2 ^ (4 - i)
            If x(4, e(4, i)) = 1 Then h2 = h2 + 2 ^ (4 - i)
        Next
        x(4, c(4, h2)) = x(4, c(4, h2)) Xor x(3, c(3, h1))
        '*** register 4 ***
        '*** calculate feedback bit ***
        xone4 = x(4, 1)
        For i = 2 To b(4)
            If g(4, i) = 1 Then xone4 = xone4 Xor x(4, i)
        Next
        '*** shift ***
        For i = b(4) To 2 Step -1
            x(4, i) = x(4, i - 1)
        Next
        x(4, 1) = xone4
        '*** put bit in next reg. ***
        h1 = 0: h2 = 0
        For i = 1 To 4
            If x(4, d(4, i)) = 1 Then h1 = h1 + 2 ^ (4 - i)
            If x(5, e(5, i)) = 1 Then h2 = h2 + 2 ^ (4 - i)
        Next
        x(5, c(5, h1)) = x(5, c(5, h1)) Xor x(4, c(4, h2))
        '*** register 5 ***
        '*** calculate feedback bit ***
        xone5 = x(5, 1)
        For i = 2 To b(5)
            If g(5, i) = 1 Then xone5 = xone5 Xor x(5, i)
        Next
        '*** shift ***
        For i = b(5) To 2 Step -1
            x(5, i) = x(5, i - 1)
        Next
        x(5, 1) = xone5
        '*** put bit in next reg. ***
        h1 = 0: h2 = 0
        For i = 1 To 4
            If x(5, d(5, i)) = 1 Then h1 = h1 + 2 ^ (4 - i)
            If x(1, e(1, i)) = 1 Then h2 = h2 + 2 ^ (4 - i)
        Next
        x(1, c(1, h2)) = x(1, c(1, h2)) Xor x(5, c(5, h1))
        '*** calculate output bit ***
        h3 = x(2, f(1)) * 2 ^ 0 + x(3, f(2)) * 2 ^ 1 + x(4, f(3)) * 2 ^ 2
        Select Case h3
            Case 0, 1: ks(k) = x(1, b(1))
            Case 2: ks(k) = x(2, b(2))
            Case 3, 4: ks(k) = x(3, b(3))
            Case 5: ks(k) = x(4, b(4))
            Case 6, 7: ks(k) = x(5, b(5))
        End Select
    Next
End Sub

Subroutine 3: Test stream
Public Sub test(n)
    '*** frequency test ***
    n0 = 0: n1 = 0
    For i = 1 To n
        If ks(i) = 0 Then n0 = n0 + 1
        If ks(i) = 1 Then n1 = n1 + 1
    Next
    test1 = (n0 - n1) ^ 2 / n
    If test1 < 3.8415 Then
        Form1.Text4(0).Text = Form1.Text4(0).Text + " pass"
    Else
        Form1.Text4(0).Text = Form1.Text4(0).Text + " fail"
    End If
    '*** Serial test ***
    n0 = 0: n1 = 0
    n00 = 0: n01 = 0: n10 = 0: n11 = 0: t = ""
    For i = 1 To n
        If ks(i) = 0 Then n0 = n0 + 1
        If ks(i) = 1 Then n1 = n1 + 1
        t = t + Mid(Str(ks(i)), 2, 1)
    Next
    For i = 1 To n - 1
        t1 = Mid(t, i, 2)
        If t1 = "00" Then n00 = n00 + 1
        If t1 = "01" Then n01 = n01 + 1
        If t1 = "10" Then n10 = n10 + 1
        If t1 = "11" Then n11 = n11 + 1
    Next
    test2 = (4 / (n - 1) * (n00 ^ 2 + n01 ^ 2 + n10 ^ 2 + n11 ^ 2)) _
        - (2 / n) * (n0 ^ 2 + n1 ^ 2) + 1
    If test2 < 5.9915 Then
        Form1.Text4(1).Text = Form1.Text4(1).Text + " pass"
    Else
        Form1.Text4(1).Text = Form1.Text4(1).Text + " fail"
    End If
    '*** Poker test ***
    bm = 3 ' length of block
    f1 = Int(n / bm)
    For i = 1 To n
        t = t + Mid(Str(ks(i)), 2, 1)
    Next
    n000 = 0: n001 = 0: n010 = 0: n011 = 0
    n100 = 0: n101 = 0: n110 = 0: n111 = 0
    For i = 1 To n Step bm
        t1 = Mid(t, i, bm)
        If t1 = "000" Then n000 = n000 + 1
        If t1 = "001" Then n001 = n001 + 1
        If t1 = "010" Then n010 = n010 + 1
        If t1 = "011" Then n011 = n011 + 1
        If t1 = "100" Then n100 = n100 + 1
        If t1 = "101" Then n101 = n101 + 1
        If t1 = "110" Then n110 = n110 + 1
        If t1 = "111" Then n111 = n111 + 1
    Next
    test3 = (2 ^ bm / f1) * (n000 ^ 2 + n001 ^ 2 + n010 ^ 2 + n011 ^ 2 _
        + n100 ^ 2 + n101 ^ 2 + n110 ^ 2 + n111 ^ 2) - f1
    If test3 < 14.0671 Then
        Form1.Text4(2).Text = Form1.Text4(2).Text + " pass"
    Else
        Form1.Text4(2).Text = Form1.Text4(2).Text + " fail"
    End If
    '*** Run test ***
    Dim e1(20)
    b1 = 0: b2 = 0: b3 = 0: g1 = 0: g2 = 0: g3 = 0
    k = 3: p = 0: t = ks(1): p = 1
    For i = 2 To n
        If ks(i) = t Then
            p = p + 1
        Else
            ' a run of length p has just ended; ks(i) = 0 means it was a run of ones (block)
            If ks(i) = 0 And p = 1 Then b1 = b1 + 1
            If ks(i) = 0 And p = 2 Then b2 = b2 + 1
            If ks(i) = 0 And p = 3 Then b3 = b3 + 1
            If ks(i) = 1 And p = 1 Then g1 = g1 + 1
            If ks(i) = 1 And p = 2 Then g2 = g2 + 1
            If ks(i) = 1 And p = 3 Then g3 = g3 + 1
            t = ks(i): p = 1
        End If
    Next
    ' count the final run; its value is ks(n), so ones go to blocks (b) and
    ' zeros to gaps (g), matching the convention used inside the loop
    ' (the original listing had these two cases swapped)
    If ks(i - 1) = 1 And p = 1 Then b1 = b1 + 1
    If ks(i - 1) = 1 And p = 2 Then b2 = b2 + 1
    If ks(i - 1) = 1 And p = 3 Then b3 = b3 + 1
    If ks(i - 1) = 0 And p = 1 Then g1 = g1 + 1
    If ks(i - 1) = 0 And p = 2 Then g2 = g2 + 1
    If ks(i - 1) = 0 And p = 3 Then g3 = g3 + 1
    For i = 1 To k
        e1(i) = (n - i + 3) / 2 ^ (i + 2)
    Next
    test4 = (((b1 - e1(1)) ^ 2 / e1(1)) + ((b2 - e1(2)) ^ 2 / e1(2)) _
        + ((b3 - e1(3)) ^ 2 / e1(3))) + (((g1 - e1(1)) ^ 2 / e1(1)) _
        + ((g2 - e1(2)) ^ 2 / e1(2)) + ((g3 - e1(3)) ^ 2 / e1(3)))
    If test4 < 9.4877 Then
        Form1.Text4(3).Text = Form1.Text4(3).Text + " pass"
    Else
        Form1.Text4(3).Text = Form1.Text4(3).Text + " fail"
    End If
    '*** Autocorrelation test ***
    d1 = 8: ad = 0
    ' ks() is filled from index 1, so the sum runs over i = 1 .. n - d1
    ' (the original listing started at index 0, which is never assigned)
    For i = 1 To n - d1
        ad = ad + (ks(i) Xor ks(i + d1))
    Next
    test5 = 2 * (ad - (n - d1) / 2) / Sqr(n - d1)
    ' two-sided test: large negative values are as suspicious as large positive ones
    If Abs(test5) < 1.96 Then
        Form1.Text4(4).Text = Form1.Text4(4).Text + " pass"
    Else
        Form1.Text4(4).Text = Form1.Text4(4).Text + " fail"
    End If
End Sub

Subroutine 4: Convert text to Huffman code
Public Sub text_to_binary_huffman()
    t5 = Form1.Text1.Text
    t1 = Format(Len(Form1.Text1.Text), "000") + Form1.Text1.Text
    q3 = ""
    Call huffarray
    For i = 1 To Len(t5)
        xt = Mid(t5, i, 1)
        For j = 1 To 36
            If huff(1, j) = xt Then q3 = q3 + huff(2, j): Exit For
        Next
    Next
    hidelen = Len(q3)
    s = Len(q3)
    s1 = Space(10)
    For j = 1 To 10
        t4 = s Mod 2
        s = Int(s / 2)
        Mid(s1, 11 - j, 1) = Mid(Str(t4), 2, 1)
    Next j
    q3 = s1 + q3
    hidelen = Len(q3)
    k = 0: coun = 0
    For i = 0 To hidelen
        k = k + 1
        m(k) = Val(Mid(q3, i + 1, 1))
    Next
    hidelen = k - 1
End Sub

Subroutine 5: Convert Huffman code to text
Public Sub binary_to_text_huffman()
    t1 = 0
    For j = 0 To 9
        t1 = t1 + (2 ^ (9 - j) * m(j + 1))
    Next j
    bitlen = t1
    t1 = ""
    For i = 11 To bitlen + 10
        t1 = t1 + Mid(Str(m(i)), 2, 1)
    Next i
    Call huffarray
    k = 1: c1 = 3: p = 0
    Do
        xt = Mid(t1, k, c1)
        For j = 1 To 36
            If huff(2, j) = xt Then st = st + huff(1, j): p = 1: Exit For
        Next
        If p = 1 Then
            k = k + c1
            p = 0: c1 = 3
            If k > bitlen Then Exit Do
        Else
            c1 = c1 + 1
        End If
    Loop
End Sub

Subroutine 6: Encipher process
Public Sub encipher()
    For i = 1 To hidelen
        ci(i) = m(i) Xor ks(i)
    Next
End Sub

Subroutine 7: Decipher process
Public Sub decipher()
    For i = 1 To hidelen
        m(i) = ci(i) Xor ks(i)
    Next
End Sub

Subroutine 8: Create Huffman array
Public Sub huffarray()
    huff(1, 1) = " ": huff(2, 1) = "000"
    huff(1, 2) = "ا": huff(2, 2) = "001"
    huff(1, 3) = "ل": huff(2, 3) = "0100"
    huff(1, 4) = "ي": huff(2, 4) = "0101"
    huff(1, 5) = "م": huff(2, 5) = "0110"
    huff(1, 6) = "و": huff(2, 6) = "0111"
    huff(1, 7) = "ت": huff(2, 7) = "10000"
    huff(1, 8) = "ن": huff(2, 8) = "10001"
    huff(1, 9) = "ر": huff(2, 9) = "10010"
    huff(1, 10) = "ف": huff(2, 10) = "10011"
    huff(1, 11) = "ة": huff(2, 11) = "101000"
    huff(1, 12) = "ع": huff(2, 12) = "101001"
    huff(1, 13) = "ه": huff(2, 13) = "101010"
    huff(1, 14) = "ب": huff(2, 14) = "101011"
    huff(1, 15) = "س": huff(2, 15) = "101100"
    huff(1, 16) = "ق": huff(2, 16) = "101101"
    huff(1, 17) = "ك": huff(2, 17) = "101110"
    huff(1, 18) = "د": huff(2, 18) = "101111"
    huff(1, 19) = "أ": huff(2, 19) = "110000"
    huff(1, 20) = "ح": huff(2, 20) = "110001"
    huff(1, 21) = "ش": huff(2, 21) = "110010"
    huff(1, 22) = "ص": huff(2, 22) = "110011"
    huff(1, 23) = "ى": huff(2, 23) = "110100"
    huff(1, 24) = "ذ": huff(2, 24) = "110101"
    huff(1, 25) = "ج": huff(2, 25) = "110110"
    huff(1, 26) = "خ": huff(2, 26) = "110111"
    huff(1, 27) = "إ": huff(2, 27) = "111000"
    huff(1, 28) = "ط": huff(2, 28) = "111001"
    huff(1, 29) = "ث": huff(2, 29) = "111010"
    huff(1, 30) = "ض": huff(2, 30) = "111011"
    huff(1, 31) = "ظ": huff(2, 31) = "111100"
    huff(1, 32) = "ئ": huff(2, 32) = "111101"
    huff(1, 33) = "غ": huff(2, 33) = "1111100"
    huff(1, 34) = "ز": huff(2, 34) = "1111101"
    huff(1, 35) = "ء": huff(2, 35) = "1111110"
    huff(1, 36) = "ؤ": huff(2, 36) = "1111111"
End Sub

Subroutine 9: Hide process (Unicode method)
Public Sub hide_text1()
    Call asc_to_uni
    k1 = 0
    ' the Word application object is "doc" (the original listing wrote "x" here,
    ' which is the register array)
    For i = 1 To doc.ActiveDocument.Words.Count
        With doc.ActiveDocument.Words(i)
            t = Trim(.Text)
            X1 = ""
            For j = 1 To Len(t)
                X2 = Mid(t, j, 1)
                x3 = Mid(t, j + 1, 1)
                If (j = 1 And (X2 = "ا" Or X2 = "أ" Or X2 = "د" Or X2 = "ذ" Or X2 = "ر" _
                    Or X2 = "ز" Or X2 = "و")) Or ((X1 = "ا" Or X1 = "أ" Or X1 = "د" Or _
                    X1 = "ذ" Or X1 = "ر" Or X1 = "ز" Or X1 = "و") And (X2 = "ا" Or _
                    X2 = "أ" Or X2 = "د" Or X2 = "ذ" Or X2 = "ر" Or X2 = "ز" Or _
                    X2 = "و")) Or (j = Len(t) And (X1 = "ا" Or X1 = "أ" Or X1 = "د" Or _
                    X1 = "ذ" Or X1 = "ر" Or X1 = "ز" Or X1 = "و")) Then
                    k1 = k1 + 1
                    If ci(k1) = 1 Then
                        h1 = AscW(.Characters(j).Text)
                        For k2 = 1 To 36
                            If h1 = ascii(k2) Then
                                .Characters(j).Text = ChrW(unicode(k2))
                                Exit For
                            End If
                        Next k2
                        GoTo f
                    End If
                End If
                X1 = X2
            Next j
        End With
f:
    Next i
End Sub

Subroutine 10: Unhide process (Unicode method)
Public Sub unhide_text1()
    Call asc_to_uni
    k1 = 0
    For i = 1 To doc.ActiveDocument.Words.Count
        With doc.ActiveDocument.Words(i)
            t = Trim(.Text)
            If Len(t) = 1 Then
                For k = 1 To 36
                    If 2 ^ 16 + AscW(t) = unicode(k) Then
                        k1 = k1 + 1: ci(k1) = 1
                        GoTo f
                    End If
                Next k
            End If
            If Len(t) = 2 Then
                For k = 1 To 36
                    If 2 ^ 16 + AscW(Left(t, 1)) = unicode(k) Then
                        k1 = k1 + 1: ci(k1) = 1
                        k1 = k1 + 1: ci(k1) = 1
                        i = i + 1: GoTo f
                    End If
                Next k
            End If
            If Len(t) = 3 Then
                For k = 1 To 36
                    If 2 ^ 16 + AscW(Left(t, 1)) = unicode(k) Then
                        k1 = k1 + 1: ci(k1) = 1
                        k1 = k1 + 1: ci(k1) = 1
                        k1 = k1 + 1: ci(k1) = 1
                        GoTo f
                    End If
                Next k
            End If
            X1 = ""
            For j = 1 To Len(t)
                X2 = Mid(t, j, 1)
                x3 = Mid(t, j + 1, 1)
                If (j = 1 And (X2 = "ا" Or X2 = "أ" Or X2 = "د" Or X2 = "ذ" Or X2 = "ر" _
                    Or X2 = "ز" Or X2 = "و")) Or ((X1 = "ا" Or X1 = "أ" Or _
                    X1 = "د" Or X1 = "ذ" Or X1 = "ر" Or X1 = "ز" Or _
                    X1 = "و") And (X2 = "ا" Or X2 = "أ" Or X2 = "د" Or _
                    X2 = "ذ" Or X2 = "ر" Or X2 = "ز" Or X2 = "و")) _
                    Or (j = Len(t) And (X1 = "ا" Or X1 = "أ" Or X1 = "د" Or _
                    X1 = "ذ" Or X1 = "ر" Or X1 = "ز" Or X1 = "و")) Then
                    k1 = k1 + 1: ci(k1) = 0
                End If
                X1 = X2
            Next j
        End With
f:
    Next i
End Sub

Subroutine 11: Lookup table between ASCII code and Unicode
Public Sub asc_to_uni()
    ascii(1) = 1569: unicode(1) = 65152
    ascii(2) = 1570: unicode(2) = 65153
    ascii(3) = 1571: unicode(3) = 65155
    ascii(4) = 1572: unicode(4) = 65157
    ascii(5) = 1573: unicode(5) = 65159
    ascii(6) = 1574: unicode(6) = 65161
    ascii(7) = 1575: unicode(7) = 65165
    ascii(8) = 1576: unicode(8) = 65167
    ascii(9) = 1577: unicode(9) = 65171
    ascii(10) = 1578: unicode(10) = 65173
    ascii(11) = 1579: unicode(11) = 65177
    ascii(12) = 1580: unicode(12) = 65181
    ascii(13) = 1581: unicode(13) = 65185
    ascii(14) = 1582: unicode(14) = 65189
    ascii(15) = 1583: unicode(15) = 65193
    ascii(16) = 1584: unicode(16) = 65195
    ascii(17) = 1585: unicode(17) = 65197
    ascii(18) = 1586: unicode(18) = 65199
    ascii(19) = 1587: unicode(19) = 65201
    ascii(20) = 1588: unicode(20) = 65205
    ascii(21) = 1589: unicode(21) = 65209
    ascii(22) = 1590: unicode(22) = 65213
    ascii(23) = 1591: unicode(23) = 65217
    ascii(24) = 1592: unicode(24) = 65221
    ascii(25) = 1593: unicode(25) = 65225
    ascii(26) = 1594: unicode(26) = 65229
    ascii(27) = 1601: unicode(27) = 65233
    ascii(28) = 1602: unicode(28) = 65237
    ascii(29) = 1603: unicode(29) = 65241
    ascii(30) = 1604: unicode(30) = 65245
    ascii(31) = 1605: unicode(31) = 65249
    ascii(32) = 1606: unicode(32) = 65253
    ascii(33) = 1607: unicode(33) = 65257
    ascii(34) = 1608: unicode(34) = 65261
    ascii(35) = 1609: unicode(35) = 65263
    ascii(36) = 1610: unicode(36) = 65265
End Sub

Variable Definition
Public x(5, 50) As Byte          ' Registers bits
Public g(5, 50) As Byte          ' Feedback bits
Public b(5) As Byte              ' Register length
Public c(5, 16) As Byte          ' Locations for transition
Public d(5, 4) As Byte           ' Locations of address * up
Public e(5, 4) As Byte           ' Locations of address * down
Public f(3) As Byte              ' Locations for multiplexer selector
Public ks(5000)                  ' Key stream
Public tim As String             ' Timer
Public m(5000) As Byte           ' Plain text
Public ci(5000) As Byte          ' Cipher text
Public hidelen As Integer        ' Length of data to cipher
Public timloc(7) As Byte         ' Timer locations
Public huff(2, 50)               ' Huffman array
Public ascii(40), unicode(40)    ' ASCII code and Unicode tables
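The coding and enciphering chain of Subroutines 4 to 8 (Huffman coding of the message, then XOR with the keystream) can be sketched in a few lines of Python. This is an illustration under assumptions, not the thesis program: only the first six entries of the Subroutine 8 code table are copied, the 10-bit length prefix written by Subroutine 4 is omitted, the keystream is an arbitrary made-up bit list rather than the output of the shift-register generator, and the function names are hypothetical.

```python
# First six entries of the code table in Subroutine 8 (space and five letters).
HUFFMAN = {
    " ": "000",
    "\u0627": "001",   # alef
    "\u0644": "0100",  # lam
    "\u064A": "0101",  # yeh
    "\u0645": "0110",  # meem
    "\u0648": "0111",  # waw
}
DECODE = {code: char for char, code in HUFFMAN.items()}


def encode(text):
    """Concatenate the code words of the characters into a list of bits."""
    return [int(bit) for char in text for bit in HUFFMAN[char]]


def decode(bits):
    """Grow a candidate code word bit by bit until it matches the table."""
    text = []
    current = ""
    for bit in bits:
        current += str(bit)
        if current in DECODE:
            text.append(DECODE[current])
            current = ""
    return "".join(text)


def xor_stream(bits, keystream):
    """Encipher or decipher: XOR each bit with the matching keystream bit."""
    return [bit ^ key for bit, key in zip(bits, keystream)]


message = "\u0644\u0627 \u0645\u0627\u0644"       # lam alef, space, meem alef lam
plain_bits = encode(message)
keystream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0] * 3   # hypothetical keystream bits
cipher_bits = xor_stream(plain_bits, keystream)
recovered = decode(xor_stream(cipher_bits, keystream))
```

Because no code word in the table is a prefix of another, the decoder can recognize each character as soon as its last bit arrives, and because XOR is its own inverse, the same routine serves for enciphering and deciphering.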
Appendix C Expert’s and Learner’s Questionnaire Forms
Expert’s Questionnaire Form
University of Technology
Technical Education Dept.
Electricity engineering section
Dear Sir ……………………
Please answer the items of this questionnaire, which concerns (Hiding information in Arabic text), by putting (*) in the suitable field, from your point of view.
Thank you very much in advance for your cooperation.
Auday Jamal
The researcher
No. The items Large Medium Little
1 The instructions of using the package are simple.
2 The density of displaying information on computer screen is suitable.
3 The information that displayed in the package are suitable scientifically.
4 Designing of the instructional package takes into account the personal differences.
5 Clearness of the item’s titles.
6 Displaying package style is limiting.
7 The attached images in the instructional units are participating to understand the concepts.
8 Understanding the producing questions in the program.
9 Suitable of the used color in the package.
10 The questions are including all items in the instructional package.
11 The language style that used to explain the scientific concepts and information is clear.
Learner’s Questionnaire Form
University of Technology
Technical Education Dept.
Electricity engineering section
Dear learner ………………….
Please answer the items of this questionnaire, which concerns (Hiding information in Arabic text), by putting (*) in the suitable field, from your point of view.
Thank you very much in advance for your cooperation.
Auday Jamal
The researcher
No. The items Large Medium Little
1 The division of the instructional package subject into five typical units participated to increase your understanding of the package.
2 The scientific concepts that displayed in the package were simple.
3 The information that displayed in the package was clear.
4 The flowchart assisted to increase in understanding the scientific information
5 The harmony between display image and related information was fines.
6 Moving steps between the instructional package screens were simple.
7 English language is better that Arabic language in displaying the instructional package materials.
8 The language style to explain the scientific concepts is understand.
9 Using of colors cleared the displayed concepts.
10 Titles of the instructional package items are clear.
11 Immediate support for the answer, increased your desire to continue with the package.
12 The package increases your learning desire.
Expert Names
No. Name Place of work
1 Dr. Krikor S. Krikor Technical Education Department
2 Dr. Sameera Abdulla Technical Education Department
3 Dr. Inaam A. Al-Sadik Technical Education Department
4 Dr. Ibtesam Raheem Karhiy Technical Education Department
5 Dr. Hosham Salim Technical Education Department
6 Dr. Sahar Radiy Technical Education Department
7 Dr. Intethar Institute of Instructor Composing
8 Ashuaq Kassem Technical Education Department
9 Asia Mohammad Institute of Technology
10 Nagham Ezat Institute of Technology
Abstract
In a world where information, and the way it travels across a network spanning the whole globe, has advanced, a method for preserving the privacy and protection of information has become necessary. This need motivated the present research, whose aim is to find a new technique for encrypting and hiding data in Arabic text files.
To achieve this aim, a computer program was prepared that encrypts textual information in order to hide it inside files. Two file types were considered: the first is the Document file and the second is the Rich Text Format file. Both types are compatible with the Microsoft Word Processor.
This work focuses on building a program that encrypts Arabic text and then hides it using both the White Space Method and Word Shift Coding, two methods designed for English text that were applied here to hiding Arabic text. A further program was prepared that hides Arabic text by exploiting the extension (Extension) used in Arabic writing.
The methods above were applied to hide text, and it became clear that a more efficient method was still needed, given the high secrecy that the hiding process requires. A new hiding method for Arabic text was therefore developed and named the Unicode System method; it relies on the code (Code) of the Arabic letters. It was then applied in practice to Arabic text. Its positive results were that the file size after hiding equals the file size before hiding, and that the file after hiding, when viewed in the word processor, matches the file before hiding completely, which makes it difficult for a person intercepting the message to detect the hidden data.
Encryption and hiding are used by users of the global information network, and this network involves different operating systems, some of which may have difficulty displaying Document files. To ensure that text passes smoothly between sender and receiver, another file type that all operating systems can recognize had to be found.
Rich Text Format files were therefore used for hiding data. The source code (Source Code) of these files was handled by a computer program prepared for hiding within the file.
The positive results obtained were that the file after hiding, when viewed in the word processor, matches the file before hiding completely, and that the amount of information that can be hidden can be multiplied. A drawback of this method is that the file size grows with the amount of hidden information. To avoid this, a subprogram was prepared that compresses the file and reduces its size so that it is very close to the size of the original file.
Another aim of the research was to design and implement an instructional package based on the principles of instructional design, using the tutorial method to present the concepts and information related to encryption and hiding. To ensure that the target group benefits from the instructional package, a questionnaire was prepared so that the package could be reviewed and evaluated by a group of expert professors and a group of students. Based on the questionnaire results and with the help of the feedback process, the modifications required by the instructional package were made in order to link the practical and theoretical sides of the research.
Republic of Iraq
Ministry of Higher Education and Scientific Research
University of Technology
Technical Education Department
Data Hiding in Arabic Text
A Thesis Submitted to
the Technical Education Department, University of Technology
in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
in
Engineering Education Technology / Electrical Engineering
By
Auday Jamal Fawzi
Supervised By
Ass. Prof. Dr. Saleh M. Al-Karaawy Prof. Dr. Shawket T. Al-Hiazay
2007