XML Watermarking & Information Hiding
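孙星明 (Sun Xingming), Ph.D., Professor, Doctoral Supervisor
Hunan Provincial Key Laboratory of Network and Information Security, School of Computer and Communication, Hunan University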
Markup Language
SGML (Standard Generalized Markup Language)
XML (Extensible Markup Language)
HTML (HyperText Markup Language)
XHTML
Publishing Information in WWW
XML Document
XML element type
text
image
video
audio
executable code
…
Corresponding watermarking
and information hiding
techniques can be employed
Can we use the document's own (markup) information for watermarking or
information hiding?
Known content-based techniques
Changing font size or color
Appending white space at the end of a line:
0 = space ( )
1 = tab (	)
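The end-of-line white-space technique above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (one bit per line, a space for 0 and a tab for 1, and the message ends at the first line without a trailing marker); the function names are hypothetical and not taken from any published tool.

```python
def embed_trailing_ws(text: str, bits: str) -> str:
    """Hide one bit per line: a trailing space means 0, a trailing tab means 1."""
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines for the message")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")  # remove any white space already at the end
        if i < len(bits):
            line += " " if bits[i] == "0" else "\t"
        out.append(line)
    return "\n".join(out)


def extract_trailing_ws(text: str) -> str:
    """Read bits back until the first line that carries no trailing marker."""
    bits = []
    for line in text.split("\n"):
        if line.endswith(" "):
            bits.append("0")
        elif line.endswith("\t"):
            bits.append("1")
        else:
            break
    return "".join(bits)


cover = "Student Number:\nStudent Name:\nClass:"
stego = embed_trailing_ws(cover, "101")
print(extract_trailing_ws(stego))  # 101
```

The sketch also makes the listed shortcomings concrete: every hidden bit adds a byte to the page, and selecting the text in a browser or editor reveals the trailing characters.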
Shortcomings
White space at the end of a line:
Increases page size
Layout might change
Very easy to detect by selecting the text
Specification
Element (entity):
<name attribute_1 … attribute_n> contents </name>
<name attribute_1 … attribute_n> </name>
<name attribute_1 … attribute_n>
Attribute: name="value"
Example: <font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
Properties of markup labels
Property 1: Element and attribute names are case-insensitive (in HTML; XML and XHTML names are case-sensitive)
<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<Font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<font face="Verdana" size="4" color="#FFFF00">Student Number: </Font>
<Font face="Verdana" size="4" color="#FFFF00">Student Number: </Font>
…
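Property 1 suggests a "case change" channel: in an HTML page each start tag can carry one bit in the case of its element name. The sketch below assumes a lower-case name encodes 0 and a capitalized name (as in `<Font>`) encodes 1, and uses a regular expression rather than a real HTML parser, so tags inside comments or scripts would also be counted. It is a hypothetical illustration and only applies to HTML, not to case-sensitive XML/XHTML.

```python
import re

# A start tag begins with "<" followed by a letter; end tags ("</") and
# declarations ("<!") are deliberately not matched.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")


def embed_case(html: str, bits: str) -> str:
    """Hide one bit per start tag: lower-case name = 0, capitalized name = 1."""
    if len(bits) > len(START_TAG.findall(html)):
        raise ValueError("page has too few start tags for the message")
    remaining = iter(bits)

    def recase(match):
        name = match.group(1).lower()
        # Tags beyond the message are normalized to lower case (bit 0).
        if next(remaining, "0") == "1":
            name = name.capitalize()
        return "<" + name

    return START_TAG.sub(recase, html)


def extract_case(html: str, length: int) -> str:
    """Read `length` bits back from the first `length` start tags."""
    names = START_TAG.findall(html)[:length]
    return "".join("1" if n[0].isupper() else "0" for n in names)


page = '<font face="Verdana">Student Number: </font><b>x</b><p>y</p>'
print(embed_case(page, "101"))
```

End tags are left untouched, which HTML tolerates (`<Font>…</font>`), but the modified names are plainly visible in the source, matching the "easy to perceive by code" rating in the comparison table.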
Properties of markup labels
Property 2: Attributes are order-insensitive
<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<font size="4" face="Verdana" color="#FFFF00">Student Number: </font>
Pair attributes technique
Pair attributes order (Corinna John)
Key attribute, corresponding attribute
Key/corresponding order encodes 1; corresponding/key order encodes 0:
<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</Font>
Requires a key/corresponding attribute table
No change in page size; difficult to detect
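The pair-attributes idea can be sketched as follows. Elements are modeled as lists of `(name, value)` tuples instead of parsed HTML, and `PAIR_TABLE` is a hypothetical key/corresponding table invented for illustration; in practice the table is the extra payload that embedder and detector must share. One bit is stored per element containing a listed pair: key before corresponding means 1, otherwise 0.

```python
# Hypothetical pair table: key attribute -> corresponding attribute.
PAIR_TABLE = {"face": "size", "href": "target"}


def find_pair(attrs):
    """Return (key_index, corresponding_index) for the first listed pair, or None."""
    names = [name for name, _ in attrs]
    for key, corr in PAIR_TABLE.items():
        if key in names and corr in names:
            return names.index(key), names.index(corr)
    return None


def embed_pairs(elements, bits):
    """Swap key/corresponding attributes so their order spells out `bits`."""
    out, i = [], 0
    for attrs in elements:
        attrs = list(attrs)  # copy so the caller's data is not modified
        pos = find_pair(attrs)
        if pos is not None and i < len(bits):
            k, c = pos
            if (k < c) != (bits[i] == "1"):
                attrs[k], attrs[c] = attrs[c], attrs[k]
            i += 1
        out.append(attrs)
    if i < len(bits):
        raise ValueError("not enough attribute pairs for the message")
    return out


def extract_pairs(elements):
    """One bit per element that contains a listed pair."""
    bits = []
    for attrs in elements:
        pos = find_pair(attrs)
        if pos is not None:
            k, c = pos
            bits.append("1" if k < c else "0")
    return "".join(bits)


font = [("face", "Verdana"), ("size", "4"), ("color", "#FFFF00")]
print(extract_pairs(embed_pairs([font, font], "01")))  # 01
```

Only the positions of the two paired attributes change, so the page size stays the same; the third attribute (`color` here) is ignored by this scheme.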
Attributes permutation technique
Equivalent attribute permutations:
<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font face="Verdana" color="#FFFF00" size="4">Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</font>
<font size="4" color="#FFFF00" face="Verdana" >Student Name:</font>
<font color="#FFFF00" face="Verdana" size="4" >Student Name:</font>
<font color="#FFFF00" size="4" face="Verdana" >Student Name:</font>
Lexicographic (alphabetic) order: a permutation f precedes a permutation g iff f(k) < g(k) for the smallest k such that f(k) ≠ g(k).
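Generating the permutations in the lexicographic order just defined can be done with the standard "next permutation" algorithm (the same idea as C++ `std::next_permutation`). The sketch below is one common formulation, not necessarily the one used in the original work; attribute names are compared as ordinary strings, so `color < face < size`.

```python
def next_permutation(seq):
    """Return the lexicographic successor of seq, or None if seq is the last one."""
    a = list(seq)
    # 1. Find the rightmost position i with a[i] < a[i + 1].
    i = len(a) - 2
    while i >= 0 and a[i] >= a[i + 1]:
        i -= 1
    if i < 0:
        return None  # seq is in descending order: no successor
    # 2. Find the rightmost j > i with a[j] > a[i] and swap them.
    j = len(a) - 1
    while a[j] <= a[i]:
        j -= 1
    a[i], a[j] = a[j], a[i]
    # 3. Reverse the suffix after i to make it the smallest possible.
    a[i + 1:] = reversed(a[i + 1:])
    return a


def all_permutations_lex(names):
    """Yield every ordering of `names`, starting from the sorted one."""
    perm = sorted(names)
    while perm is not None:
        yield perm
        perm = next_permutation(perm)


for p in all_permutations_lex(["face", "size", "color"]):
    print(" ".join(p))
```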
Attributes permutation technique
Generating attributes permutations in lexicographical order
<font color="#FFFF00" face="Verdana" size="4" >Student Name:</font>
<font color="#FFFF00" size="4" face="Verdana" >Student Name:</font>
<font face="Verdana" color="#FFFF00" size="4">Student Name:</font>
<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font size="4" color="#FFFF00" face="Verdana" >Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</font>
Attribute permutation    Order number
color face size          0
color size face          1
face color size          2
face size color          3
size color face          4
size face color          5
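An element's order number can be computed directly, without enumerating all permutations, by ranking in the factorial number system. The sketch below assumes distinct attribute names and strict lexicographic order (`color < face < size`); `perm_to_order` and `order_to_perm` are illustrative names. To hide a value v in an element, its attributes would be rewritten in the order `order_to_perm(names, v)`; the detector applies `perm_to_order` to the attributes it finds.

```python
from math import factorial


def perm_to_order(perm):
    """0-based lexicographic rank of a permutation of distinct names."""
    remaining = sorted(perm)
    order = 0
    for name in perm:
        idx = remaining.index(name)
        # Each smaller unused name at this position skips (len-1)! permutations.
        order += idx * factorial(len(remaining) - 1)
        remaining.pop(idx)
    return order


def order_to_perm(names, order):
    """Inverse of perm_to_order: the permutation with the given rank."""
    remaining = sorted(names)
    if not 0 <= order < factorial(len(remaining)):
        raise ValueError("order number out of range")
    perm = []
    while remaining:
        idx, order = divmod(order, factorial(len(remaining) - 1))
        perm.append(remaining.pop(idx))
    return perm


print(order_to_perm(["size", "face", "color"], 3))  # ['face', 'size', 'color']
```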
Attributes permutation technique
If an element has at least two attributes, the order of its attributes can be used to embed hidden information or a watermark.
Let $\{E_i\}_{i=1}^{n}$ be the elements of a web page whose number of attributes satisfies $|E_i| \ge 2$; the embedding capacity is
$\sum_{i=1}^{n} \log_2(|E_i|!)$ bits
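The capacity formula is easy to evaluate once the attribute counts of a page's elements are known. The sketch below takes a plain list of counts (collecting them from a real page is left out) and returns both the formula's value and a more conservative figure that assumes each element independently carries a whole number of bits; the second function is our own addition for comparison, not part of the original scheme.

```python
from math import factorial, floor, log2


def capacity_bits(attr_counts):
    """Sum of log2(|E_i|!) over elements with at least two attributes."""
    return sum(log2(factorial(k)) for k in attr_counts if k >= 2)


def whole_bits_per_element(attr_counts):
    """Capacity if every element carries floor(log2(|E_i|!)) independent bits."""
    return sum(floor(log2(factorial(k))) for k in attr_counts if k >= 2)


counts = [3, 2, 1]  # e.g. <font face size color>, <a href target>, <td width>
print(capacity_bits(counts))           # about 3.585
print(whole_bits_per_element(counts))  # 3
```

The gap between the two numbers comes from fractional bits: treating all elements together as one mixed-radix number recovers (up to rounding down once) the full sum, while per-element embedding rounds down for every element.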
Embedded capacity example
Web page             Capacity (bytes)
www.163.com          48
www.sina.com.cn      279
www.sohu.com.cn      338
www.microsoft.com    15
www.ebay.com         78
www.yahoo.com        33
Perceivability
Cannot be perceived when browsing the page
Hard to perceive by reading the source code
Robust against editing: the contents can be changed
Robust against editing: font, size, and color values can be changed
Security
Apply a hash to the concatenation of the attributes and a secret key to obtain the order number:
$\mathrm{hash}(\text{attributes} \,\|\, \text{key})$
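One way to turn "hash of attributes and key" into order numbers without collisions is to hash every permutation together with the key and rank the permutations by their hash values. This ranking step, the SHA-256 choice, and the `","`/`"|"` separators are our assumptions for the sketch; the slide only specifies hashing the concatenation of attributes and key.

```python
import hashlib
from itertools import permutations


def keyed_order_table(names, key):
    """Assign secret order numbers to all permutations of `names`.

    Each permutation is scored with SHA-256(attributes || key) and ranked by
    that score, so every order number 0..k!-1 is used exactly once and the
    mapping cannot be rebuilt without `key`.
    """
    def score(perm):
        data = ",".join(perm) + "|" + key  # concatenation of attributes and key
        return hashlib.sha256(data.encode("utf-8")).hexdigest()

    ranked = sorted(permutations(sorted(names)), key=score)
    return {perm: order for order, perm in enumerate(ranked)}


def embed_value(names, value, key):
    """Return the attribute order that carries `value` under `key`."""
    inverse = {order: perm for perm, order in keyed_order_table(names, key).items()}
    return list(inverse[value])


def extract_value(attr_order, key):
    """Recover the hidden value from an element's attribute order."""
    return keyed_order_table(attr_order, key)[tuple(attr_order)]


order = embed_value(["color", "face", "size"], 4, "secret")
print(order, extract_value(order, "secret"))
```

A keyed MAC such as Python's `hmac.new(key, msg, hashlib.sha256)` would be the more conventional keyed construction; plain concatenation is used here only to mirror the slide.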
Performance comparison
Type                    Size change   Perceivable (view)   Perceivable (code)   Capacity (bits)                    Extra payload
White space             Y             easy                 easy                 number of page lines               N
Case change             N             N                    easy                 number of tags                     N
Attribute pair          N             N                    hard                 $\sum_{i=1}^{n} |E_i|/2$           pair table
Equivalent attributes   N             N                    hard                 $\sum_{i=1}^{n} \log_2(|E_i|!)$    N
Other potential properties
String delimiters: name="value"
name='value'
White Space Between the Element's Name and the First Attribute
<font face="verdana" size="3">
<font  face="verdana" size="3">
White Space Between Attributes
<font face="verdana" size="3">
<font face="verdana"  size="3">
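The string-delimiter property can carry one bit per attribute value. The sketch assumes `"` encodes 0 and `'` encodes 1, and skips values that contain either quote character, since those cannot switch delimiters safely (embedder and detector skip the same values, so they stay in sync). It works on a single tag string with a regular expression and is a hypothetical illustration only; the white-space properties could be encoded in the same way.

```python
import re

# name="value" or name='value'; the backreference \2 makes the closing quote
# match the opening one.
ATTR = re.compile(r"""([A-Za-z_:][-A-Za-z0-9_:.]*)=(["'])(.*?)\2""")


def _usable(value):
    return '"' not in value and "'" not in value


def embed_quotes(tag, bits):
    """Re-quote attribute values: double quote = 0, single quote = 1."""
    remaining = iter(bits)

    def requote(match):
        name, quote, value = match.groups()
        if _usable(value):
            bit = next(remaining, None)
            if bit is not None:
                quote = "'" if bit == "1" else '"'
        return f"{name}={quote}{value}{quote}"

    out = ATTR.sub(requote, tag)
    if next(remaining, None) is not None:
        raise ValueError("not enough attribute values for the message")
    return out


def extract_quotes(tag):
    """One bit per usable attribute value, in document order."""
    return "".join("1" if q == "'" else "0"
                   for _, q, v in ATTR.findall(tag) if _usable(v))


print(embed_quotes('<font face="verdana" size="3">', "10"))
```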
Other potential properties
White Space After "="
<font face="verdana" size="3">
<font face= "verdana" size="3">
White Space Between Elements
<td>con1</td><td>con2</td>
<td>con1</td> <td>con2</td>
Other potential properties
The default value of an attribute (an attribute set to its default value can be written out or omitted)
<font face="verdana" size="3">
<font face="verdana">
Current progress
Introduce insignificant attributes
<font face="verdana">
<font face="verdana" xyz="abcd">
Break through the capacity bottleneck
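Introducing an insignificant attribute removes the capacity bound of the ordering techniques, because the attribute value itself can hold arbitrary data. The sketch reuses the slide's example name `xyz` and assumes the payload is stored as lower-case hex; both the helper names and the hex encoding are illustrative assumptions, and the code only handles ordinary (non-self-closing) start tags.

```python
import re


def add_insignificant_attr(tag: str, payload: bytes, name: str = "xyz") -> str:
    """Append name="<hex payload>" to an ordinary start tag such as <font ...>."""
    if not tag.endswith(">") or tag.endswith("/>"):
        raise ValueError("expected a non-self-closing start tag")
    return tag[:-1].rstrip() + f' {name}="{payload.hex()}">'


def read_insignificant_attr(tag: str, name: str = "xyz"):
    """Return the hidden bytes, or None if the attribute is absent."""
    m = re.search(r"\s" + re.escape(name) + r'="([0-9a-f]*)"', tag)
    return bytes.fromhex(m.group(1)) if m else None


stego = add_insignificant_attr('<font face="verdana">', b"\xab\xcd")
print(stego)  # <font face="verdana" xyz="abcd">
```

Browsers ignore unknown attributes, so rendering is unchanged, but an invented attribute makes the page invalid against the HTML DTD and is far easier to spot in the source than a change in attribute order.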
Web page watermarking
Text watermarking
$\sum_{i=1}^{n} \log_2(|E_i|!)$
Our research focus in watermarking
Text content security: funded by NSFC Key Project 60736016
Funded by NSFC 60373062
Software watermarking: funded by NSFC 60573045
Wireless sensor network security: funded by 973 Project 2006CB303000
Funded by NSFC 60873198
Steganalysis: funded by 115 Project
HyperText Markup Language (HTML), version 4.0, the publishing language of the World Wide Web
Recall that in HTML, element and attribute names are case-insensitive; the convention is meant to encourage readability.
Element and attribute names in this document have been marked up and may be rendered specially by some user agents.
http://www.w3.org/TR/1998/REC-html40-19980424/about.html#h-1.2.1
http://www.w3.org/TR/html/#xhtml
HTML 4 [HTML4] is an SGML (Standard Generalized Markup Language) application
conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.
SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.
SGML has been around since the middle 1980's and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.
HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.
In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to interoperability problems for documents across different platforms.
XML™ is the shorthand name for Extensible Markup Language [XML].
XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features.
While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4]. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in more detail in [XHTMLMOD].
XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:
XHTML documents are XML conforming. As such, they are readily viewed, edited, and validated with standard XML tools.
XHTML documents can be written to operate as well or better than they did before in existing HTML 4-conforming user agents as well as in new, XHTML 1.0 conforming user agents.
XHTML documents can utilize applications (e.g. scripts and applets) that rely upon either the HTML Document Object Model or the XML Document Object Model [DOM].
As the XHTML family evolves, documents conforming to XHTML 1.0 will be more likely to interoperate within and among various XHTML environments.
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.
Terrorism: http://www.arabteam2000-forum.com/
Partial English translation of a Jihad training manual on information hiding techniques (in Arabic)
Watermark embedding
Watermark detection
Classification of watermarking by host:
Image
Audio
Video
Text (Document)
Software / executable code
Database
Text watermarking & Information Hiding
Hosts: web pages; books (PDF, Word, WPS, PS, etc.); unformatted TXT
Goals: watermarking and information hiding
Any redundancy?
Character code: one-to-one mapping, so no redundancy in either case
Utilize format information
Line-shift Coding
vertically displacing an entire text line
Word-shift Coding
horizontally shifting the location of a word within a text line
Character feature coding
altering a particular feature of an individual character
Utilize language information
Synonym substitution
Syntactic transform
TMR tree (text meaning representation)
Add spaces at the end of a line
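Of the language-based methods listed, synonym substitution is the simplest to sketch. The example below uses a tiny hypothetical table of synonym pairs, where the position of a word within its pair encodes the bit; real systems need large, context-aware synonym sets to keep the text natural, which this toy version ignores (it also drops capitalization on substituted words).

```python
# Hypothetical synonym pairs; a word's index within its pair encodes the bit.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
LOOKUP = {w: (pair, i) for pair in SYNONYM_PAIRS for i, w in enumerate(pair)}


def embed_synonyms(words, bits):
    """Replace each substitutable word by the synonym whose index is the next bit."""
    out, i = [], 0
    for w in words:
        entry = LOOKUP.get(w.lower())
        if entry is not None and i < len(bits):
            pair, _ = entry
            w = pair[int(bits[i])]
            i += 1
        out.append(w)
    if i < len(bits):
        raise ValueError("not enough substitutable words for the message")
    return out


def extract_synonyms(words):
    """One bit per word found in the synonym table."""
    return "".join(str(LOOKUP[w.lower()][1]) for w in words if w.lower() in LOOKUP)


stego = embed_synonyms("we begin with a quick big test".split(), "101")
print(" ".join(stego))  # we start with a quick large test
```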
Text recoverable watermarking
Format-based watermarking?
Natural language watermarking?
How to combine them?
Text recoverable watermarking?