classification of cipher using machine learning techniques

This project was submitted in partial fulfillment of the final-year dissertation in 2015 by Om Prakash at Jaypee Institute of Information Technology, Noida. The project is developed with the Accord.NET framework.

Page 1: Classification of Cipher Using Machine Learning Techniques

Literature Survey

Classification of Ciphers

Submitted By: Om Prakash

Enrollment No: 13303015

Supervisor Name: Dr. Satish Chandra

Introduction

In cryptography, a cipher is an algorithm for performing encryption or decryption. The earliest ciphers, which aim only to hide the readability of the plaintext, are known as Classical Ciphers. Ciphertexts produced by these ciphers always reveal statistical information about the plaintext. As the computational power of computers increased, nearly all such ciphers became more or less readily breakable. Thus, in the mid-1970s, new ciphers came into existence that rely heavily on mathematical theory and computer science practice to achieve computational hardness; these are classified as Modern Ciphers.

As the computational hardness of ciphers grew, breaking them became a challenging new area known as Cryptanalysis.

Attack Scenarios:

Ciphertext-only attack: This is the most basic type of attack and refers to the scenario where the adversary just observes a ciphertext and attempts to determine the plaintext that was encrypted.

Known-plaintext attack: Here, the adversary learns one or more pairs of plaintexts/ciphertexts encrypted under the same key. The aim of the adversary is then to determine the plaintext that was encrypted to give some other ciphertext (for which it does not know the corresponding plaintext).

Chosen-plaintext attack: In this attack, the adversary has the ability to obtain the encryption of any plaintext(s) of its choice. It then attempts to determine the plaintext that was encrypted to give some other ciphertext.

Chosen-ciphertext attack: The final type of attack is one where the adversary is even given the capability to obtain the decryption of any ciphertext(s) of its choice. The adversary's aim, once again, is then to determine the plaintext that was encrypted to give some other ciphertext (whose decryption the adversary is unable to obtain directly).

A ciphertext-only attack is the easiest to carry out in practice; the only thing the adversary needs is to eavesdrop on the public communication line over which encrypted messages are sent. In this scenario, the first thing the adversary needs to do is identify the cipher used to encrypt the message. This is the foundation for research and development in the area of classification of ciphers.

Contemporary challenging R & D problems in classification of ciphers

Very little work in this area has been published in the public domain so far. Classifying classical ciphers using frequency analysis is a trivial task, yet classifying modern ciphers is considerably more difficult, and even the best solution has a success rate below 50%. Classifying modern ciphers, whether from a given group or from the universe of all ciphers, is therefore a very challenging research and development problem. My research focus is to classify modern and classical ciphers drawn from a group of the most commonly used ones.

Paper 1:

Summary: In this paper, the classical substitution ciphers Playfair, Vigenère and Hill are classified using neural-network-based identification. The features of the cipher methods under consideration are extracted and a back-propagation neural network is trained. The network is tested on random texts with random keys of various lengths. The ciphertext size is fixed at 1 Kb. The results obtained were encouraging.

Title of paper: Classification of Substitution Ciphers using Neural Networks
Authors: G. Sivagurunathan, V. Rajendran, and Dr. T. Purusothaman
Year of Publication: 2010
Web link: http://paper.ijcsns.org/07_book/201003/20100340.pdf
Publishing Details: Sivagurunathan, G., Rajendran, V., & Purusothaman, T. (2010). Classification of Substitution Ciphers using Neural Networks. IJCSNS, 10(3), 274.

Paper 2:

Title of paper: Classification of Ciphers
Authors: Pooja Maheshwari
Year of Publication: 2001
Web link: http://www.security.iitk.ac.in/pages/projects/cryptanalysis/repository/pooja.ps
Publishing Details: Maheshwari, P. (2001). Classification of ciphers (Doctoral dissertation, Indian Institute of Technology, Kanpur).

Summary: This paper deals with classifying classical ciphers, namely the Substitution Cipher, the Permutation Cipher, a combination of Substitution and Permutation Ciphers, and the Vigenère Cipher, and modern ciphers, namely DES and IDEA. For the classical ciphers the main attack is frequency distribution analysis, whereas for classifying DES and IDEA several approaches are used, such as randomness tests, XOR operations, and threshold functions, and some encouraging results are found.
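
As an illustration of the kind of randomness test mentioned in the summary, the following minimal Python sketch implements the frequency (monobit) test, which checks whether ones and zeros are about equally frequent in a bit stream. The summary does not say which tests the thesis actually uses, so the choice of this test, the function name `monobit_p_value`, and the sample inputs are assumptions made for illustration only.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test: p-value under the hypothesis that
    each bit is 1 with probability 1/2, independently of the others."""
    n = len(data) * 8
    if n == 0:
        raise ValueError("need at least one byte of data")
    ones = sum(bin(byte).count("1") for byte in data)
    # Mapping bits to +1/-1 and summing gives ones - zeros = 2*ones - n.
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Illustrative inputs (not data from the thesis).
english = b"the quick brown fox jumps over the lazy dog " * 50
balanced = bytes([0b01010101]) * 1000

print(monobit_p_value(english))   # very small: ASCII text has a biased bit balance
print(monobit_p_value(balanced))  # 1.0: exactly half of the bits are ones
```

The second input shows the limit of any single test: a perfectly regular pattern passes the monobit test, so a classifier would normally combine several such statistics as features.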

Paper 3:

Summary: In this paper the author classifies Blowfish, RC4 and Camellia using a Support Vector Machine. A goodness threshold is established, with which trivially good test vectors are obtained; these are then modified to get better results.

At the beginning, a set of test vectors is generated by solving the following linear program.

Maximize the objective function

$$f = \sum_{i} c_i$$

subject to a set of constraints of the form

$$\sum_{i=1}^{320} c_i b_i > T$$

and a set of RC4 constraints

$$\sum_{i=1}^{320} c_i b_i \le T$$

where C1 may be either Blowfish or Camellia.
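
Each constraint above compares a weighted sum of 320 terms against the threshold T. The Python sketch below only evaluates such a constraint for one input; it does not solve the linear program or train the SVM. It assumes, for illustration, that the b_i are the first 320 bits of a ciphertext and that the c_i are real-valued coefficients of a test vector; the function names are hypothetical.

```python
from typing import List, Sequence

N_TERMS = 320  # number of terms in each constraint sum


def first_bits(data: bytes, n_bits: int = N_TERMS) -> List[int]:
    """Return the first n_bits of data as 0/1 values, most significant bit first.

    Assumption: b_i is taken to be bit i of the ciphertext."""
    if len(data) * 8 < n_bits:
        raise ValueError("not enough data for the requested number of bits")
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(n_bits)]


def weighted_sum(c: Sequence[float], b: Sequence[int]) -> float:
    """Compute sum_i c_i * b_i."""
    if len(c) != len(b):
        raise ValueError("c and b must have the same length")
    return sum(ci * bi for ci, bi in zip(c, b))


def exceeds_threshold(c: Sequence[float], b: Sequence[int], t: float) -> bool:
    """True if the sum is > T (first constraint form), False if it is <= T."""
    return weighted_sum(c, b) > t


# Illustrative values only: a uniform test vector and two extreme inputs.
c = [1.0] * N_TERMS
all_ones = first_bits(b"\xff" * 40)
all_zeros = first_bits(b"\x00" * 40)
print(exceeds_threshold(c, all_ones, 160.0))
print(exceeds_threshold(c, all_zeros, 160.0))
```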

Title of paper: Classification of Ciphers Using Machine Learning
Authors: Gaurav Saxena
Year of Publication: 2008
Web link: http://www.security.iitk.ac.in/contents/publications/more/ciphers_machine_learning.pdf
Publishing Details: Saxena, G. (2008). Classification of ciphers using machine learning. Master's thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur.

The values of the training and testing errors seem to indicate that it is easier to separate good test vectors from bad ones when lower values of the goodness threshold are considered.

Paper 4:

Summary: In this paper the block cipher algorithms DES, IDEA, AES, and RC, operating in ECB mode, are considered. Eight different classification techniques are used to identify the cipher that produced a given ciphertext: Naïve Bayesian (NB), Support Vector Machine (SVM), neural network (MLP), Instance-based learning (IBL), Bagging (Ba), AdaBoostM1, Rotation Forest (RoFo), and Decision Tree. The study aims to find the best classification algorithm for identifying the encryption method. The performance of each classifier is presented, and the simulation results show that, in general, the RoFo classifier has the highest classification accuracy.

Title of paper: Classifying Encryption Algorithms Using Pattern Recognition Techniques
Authors: Suhaila O. Sharif, L.I. Kuncheva and S.P. Mansoor
Year of Publication: 2010
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5689769&isnumber=5688739
Publishing Details: Sharif, S.O.; Kuncheva, L.I.; Mansoor, S.P., "Classifying encryption algorithms using pattern recognition techniques," Information Theory and Information Security (ICITIS), 2010 IEEE International Conference on, pp. 1168-1172, 17-19 Dec. 2010

Paper 5:

Summary: In this paper, the authors propose an approach for identifying the encryption method of block ciphers using support vector machines. Five block ciphers, namely DES (CBC), 3DES, Blowfish, AES and RC5, are identified, and the resulting accuracy is reported.

Title of paper: Identification of Block Ciphers using Support Vector Machines
Authors: Dileep A. D. and C. Chandra Sekhar
Year of Publication: 2006
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1716462&isnumber=36115
Publishing Details: Dileep, A.D.; Sekhar, C.C., "Identification of Block Ciphers using Support Vector Machines," Neural Networks, 2006. IJCNN '06. International Joint Conference on, pp. 2696-2701

Paper 6:

Summary: Ciphers encrypted with the same key are called ciphers in depth. A depth attack is a form of cryptanalysis that takes advantage of finding ciphers in depth and could break a cryptosystem without even knowing the encryption algorithm. The first task in a depth attack is to cluster ciphers according to their common keys and is called depth detection. Then one may want to know the file type of the underlying message of each cipher. In this paper depth detection is accomplished for stream ciphers with a hit rate of 99.5%. Ciphers in depth are further classified according to the file types of their underlying messages with an accuracy of over 90%. One important goal of this research is not to use the structure and key words of any specific file types as this allows the result to be applied to general file types. Also, the features extracted from the test samples for classification are simple ones, leaving room for improving the performance by adopting more complicated features.

Title of paper: Classifying File Type of Stream Ciphers in Depth Using Neural Networks
Authors: James George Dunham, Ming-Tan Sun and Judy C. R. Tseng
Year of Publication: 2005
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1387088&isnumber=30191
Publishing Details: Dunham, J.G.; Ming-Tan Sun; Tseng, J.C.R., "Classifying file type of stream ciphers in depth using neural networks," Computer Systems and Applications, 2005. The 3rd ACS/IEEE International Conference on, pp. 97, 2005
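
To see why ciphers in depth are exploitable, consider a stream cipher that XORs the plaintext with a keystream: two ciphertexts produced with the same keystream XOR to the XOR of their plaintexts, so the key drops out entirely. The Python sketch below demonstrates this. The `toy_keystream` function (SHA-256 in counter mode) is a hypothetical stand-in chosen for illustration and is not a cipher studied in the paper.

```python
import hashlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """Hypothetical keystream generator: SHA-256 of key || counter, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stream encryption: plaintext XOR keystream (decryption is the same call)."""
    return xor_bytes(plaintext, toy_keystream(key, len(plaintext)))


key = b"shared key"
p1 = b"attack at dawn!!"
p2 = b"retreat at noon."
c1 = encrypt(key, p1)
c2 = encrypt(key, p2)

# The keystream cancels: c1 XOR c2 equals p1 XOR p2, with no key needed.
print(xor_bytes(c1, c2) == xor_bytes(p1, p2))  # True
```

Because the XOR of two in-depth ciphertexts carries the statistics of the two plaintexts, features computed on it can hint at depth and at the underlying file type, which is the kind of signal a classifier can learn from.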

Paper 7:

Summary: Genetic algorithms (GAs) are a class of optimization algorithms. In this paper the authors propose a genetic algorithm to decipher a mono-alphabetic substitution cipher, using frequency analysis to obtain the objective function.

The following is an outline of the proposed algorithm:

Title of paper: Using Genetic Algorithm to Break a Mono-Alphabetic Substitution Cipher
Authors: S. S. Omran, A. S. Al-Khalid and D. M. Al-Saady
Year of Publication: 2010
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5720065&isnumber=5719958
Publishing Details: Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "Using Genetic Algorithm to break a mono-alphabetic substitution cipher," Open Systems (ICOS), 2010 IEEE Conference on, pp. 63-67, 5-7 Dec. 2010

Results Obtained:

Classification of Classical Cipher

Motivation: Relative letter-frequency analysis of the entries in the Concise Oxford Dictionary, as reported by trusted compilers, shows very interesting results.

[Figure: Unigram relative frequency of letters in the English language]

Bigram frequency in the English language

Bigram  Frequency    Bigram  Frequency    Bigram  Frequency
th      1.52         en      0.55         ng      0.18
he      1.28         ed      0.53         of      0.16
in      0.94         to      0.52         al      0.09
er      0.94         it      0.50         de      0.09
an      0.82         ou      0.50         se      0.08
re      0.68         ea      0.47         le      0.08
nd      0.63         hi      0.46         sa      0.06
at      0.59         is      0.46         si      0.05
on      0.57         or      0.43         ar      0.04
nt      0.56         ti      0.34         ve      0.04
ha      0.56         as      0.33         ra      0.04
es      0.56         te      0.27         ld      0.02
st      0.55         et      0.19         ur      0.02

16 most common character-level trigrams in English language

1. the

2. and

3. tha

4. ent

5. ing

6. ion

7. tio

8. for

9. nde

10. has

11. nce

12. edt

13. tis

14. oft

15. sth

16. men
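
The unigram, bigram and trigram statistics above can be computed from any text with a few lines of code. The Python sketch below is one way to do it; the function names are hypothetical, and the choice to strip everything except the letters a-z before counting is an assumption. That choice lets n-grams span word boundaries, which is consistent with entries such as "edt" and "oft" in the trigram list.

```python
import re
from collections import Counter
from typing import Dict, List


def ngram_frequencies(text: str, n: int) -> Dict[str, float]:
    """Relative frequency, in percent, of each character n-gram.

    Only the letters a-z are kept, so n-grams may span word boundaries."""
    letters = re.sub(r"[^a-z]", "", text.lower())
    total = len(letters) - n + 1
    if total <= 0:
        return {}
    counts = Counter(letters[i:i + n] for i in range(total))
    return {gram: 100.0 * count / total for gram, count in counts.items()}


def top_ngrams(text: str, n: int, k: int) -> List[str]:
    """The k most frequent n-grams; ties are broken alphabetically."""
    freqs = ngram_frequencies(text, n)
    return sorted(freqs, key=lambda g: (-freqs[g], g))[:k]


sample = "the cat and the hat"  # illustrative text, far too short for real statistics
print(top_ngrams(sample, 3, 1))             # ['the']
print(ngram_frequencies(sample, 1)["t"])    # share of the letter t, in percent
```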

Proposed Algorithm:

$\text{Cost}_1 = \sum_{i=0}^{25} |KU(i) - DU(i)|$    // cost for unigram frequency of letters in alphabetical order
$\text{Cost}_2 = \sum_{i=0}^{25} |KUS(i) - DUS(i)|$    // cost for sorted unigram frequency of letters
$\text{Cost}_3 = \sum_{i=0}^{25} |KBS(i) - DBS(i)|$    // cost for sorted bigram frequency of letters
$\text{Cost}_4 = \sum_{i=0}^{25} |KTS(i) - DTS(i)|$    // cost for sorted trigram frequency of letters

if (Cost1 ≤ Tval1)
    return P Cipher
if (Cost2 ≤ Tval2)
    if (Cost3 ≤ Tval3 && Cost4 ≤ Tval4)
        return S Cipher
    else
        return PS Cipher
else
    return Unclassified
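
The decision procedure above can be sketched in Python as follows. Several points are assumptions rather than details given in the text: the K-series are read as known reference statistics and the D-series as statistics derived from the ciphertext; P, S and PS are read as permutation, substitution and combined ciphers; frequencies are fractions rather than percentages; and the threshold values in `tvals` are placeholders, since no Tval values are stated. The `caesar` and `deinterleave` helpers are toy ciphers used only to exercise the classifier.

```python
import re
from collections import Counter
from typing import List, Sequence

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TERMS = 26  # each cost sums 26 terms (i = 0..25)


def letters_only(text: str) -> str:
    return re.sub(r"[^a-z]", "", text.lower())


def unigram_in_order(text: str) -> List[float]:
    """Relative frequency of a..z, in alphabetical order (KU / DU)."""
    s = letters_only(text)
    counts = Counter(s)
    return [counts[ch] / len(s) if s else 0.0 for ch in ALPHABET]


def sorted_ngram(text: str, n: int) -> List[float]:
    """The 26 largest n-gram relative frequencies, descending (KUS, KBS, KTS, ...)."""
    s = letters_only(text)
    total = len(s) - n + 1
    counts = Counter(s[i:i + n] for i in range(max(total, 0)))
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return (freqs + [0.0] * TERMS)[:TERMS]


def cost(known: Sequence[float], derived: Sequence[float]) -> float:
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(k - d) for k, d in zip(known, derived))


def classify(reference: str, ciphertext: str,
             tvals: Sequence[float] = (0.05, 0.05, 0.05, 0.05)) -> str:
    """Return 'P', 'S', 'PS' or 'Unclassified'; tvals are placeholder thresholds."""
    t1, t2, t3, t4 = tvals
    cost1 = cost(unigram_in_order(reference), unigram_in_order(ciphertext))
    cost2 = cost(sorted_ngram(reference, 1), sorted_ngram(ciphertext, 1))
    cost3 = cost(sorted_ngram(reference, 2), sorted_ngram(ciphertext, 2))
    cost4 = cost(sorted_ngram(reference, 3), sorted_ngram(ciphertext, 3))
    if cost1 <= t1:
        return "P"
    if cost2 <= t2:
        if cost3 <= t3 and cost4 <= t4:
            return "S"
        return "PS"
    return "Unclassified"


def caesar(text: str, shift: int) -> str:
    """Toy substitution cipher: shift each letter a-z by a fixed amount."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
                   for ch in text.lower())


def deinterleave(text: str) -> str:
    """Toy permutation cipher: even-indexed characters, then odd-indexed ones."""
    return text[0::2] + text[1::2]


sample = "meet me at the station at ten and bring the letters with you"
print(classify(sample, deinterleave(sample)))  # P
print(classify(sample, caesar(sample, 3)))     # S
```

In this demonstration the reference text and the plaintext are the same, so the matching costs are exactly zero. With reference statistics taken from a separate English corpus the costs would be small but nonzero, and the thresholds would have to be tuned on sample data.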

Implementation:

Appendix

A. Gantt Chart

[Figure: Gantt chart showing Start Date and Duration for the tasks below]

Survey of Challenging Problems in Cryptography
Defining Problem Statement
Literature Survey on Classification of Ciphers
Implementation of Selected Ciphers
Implementation of Classical Cipher Classifier
Choosing the Modern Ciphers to Classify
Implementing Modern Ciphers Classifier
Analysing Correctness
Improving the Accuracy

B. Details of practice with new tool/technology

I am using a Windows Forms application to implement the Classical Cipher Classifier and MATLAB to implement the Modern Cipher Classifier.

Windows Forms is a GUI application framework in the .NET Framework, which supports various languages at the back end such as C++, C#, VB, and F#. The application can run on Windows, or even on Linux under Wine.

MATLAB is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, one can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable us to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java. One can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. MATLAB is widely used today in industry and academia as the language of technical computing.

I will use MATLAB initially for result evaluation; once I get good results, I will port my MATLAB code to another language or library such as Java or OpenCV.

C. References

[1]. Sivagurunathan, G., Rajendran, V., & Purusothaman, T. (2010). Classification of Substitution Ciphers using Neural Networks. IJCSNS, 10(3), 274.

[2]. Sharif, S.O.; Kuncheva, L.I.; Mansoor, S.P., "Classifying encryption algorithms using pattern recognition techniques," Information Theory and Information Security (ICITIS), 2010 IEEE International Conference on, pp. 1168-1172, 17-19 Dec. 2010

[3]. Dileep, A.D.; Sekhar, C.C., "Identification of Block Ciphers using Support Vector Machines," Neural Networks, 2006. IJCNN '06. International Joint Conference on, pp. 2696-2701

[4]. Dunham, J.G.; Ming-Tan Sun; Tseng, J.C.R., "Classifying file type of stream ciphers in depth using neural networks," Computer Systems and Applications, 2005. The 3rd ACS/IEEE International Conference on, pp. 97, 2005

[5]. Khadivi, P.; Momtazpour, M., "Application of data mining in cryptanalysis," Communications and Information Technology, 2009. ISCIT 2009. 9th International Symposium on, pp. 358-363, 28-30 Sept. 2009

[6]. Khadivi, P.; Momtazpour, M., "Cipher-text classification with data mining," Advanced Networks and Telecommunication Systems (ANTS), 2010 IEEE 4th International Symposium on, pp. 64-66, 16-18 Dec. 2010

[7]. Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "Using Genetic Algorithm to break a mono-alphabetic substitution cipher," Open Systems (ICOS), 2010 IEEE Conference on, pp. 63-67, 5-7 Dec. 2010

[8]. Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "A cryptanalytic attack on Vigenère cipher using genetic algorithm," Open Systems (ICOS), 2011 IEEE Conference on, pp. 59-64, 25-28 Sept. 2011

[9]. De Canniere, C.; Biryukov, Alex; Preneel, B., "An introduction to Block Cipher Cryptanalysis," Proceedings of the IEEE, vol. 94, no. 2, pp. 346-356, Feb. 2006

[10]. Toemeh, R., & Arumugam, S. (2008). Applying Genetic Algorithms for Searching Key-Space of Polyalphabetic Substitution Ciphers. Int. Arab J. Inf. Technol., 5(1), 87-91.

[11]. Dureha, A., & Kaur, A. (2013). A Generic Genetic Algorithm to Automate an Attack on Classical Ciphers. International Journal of Computer Applications, 64(12), 20-25.

[12]. Mishra, S., & Bali, S. (2013). Public key cryptography using genetic algorithm. International J Recent Technol. Eng. (IJRTE), 2(2), 150-154.

[15]. Maheshwari, P. (2001). Classification of ciphers (Doctoral dissertation, Indian Institute of Technology, Kanpur).

[16]. Saxena, G. (2008). Classification of ciphers using machine learning. Master's thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur.

[17]. Nagireddy, S. (2008). A Pattern Recognition Approach To Block.

[18]. Rao, M. B. (2003). Classification of RSA and IDEA Ciphers (Doctoral dissertation, Indian Institute of Technology, Kanpur).

[19]. http://en.wikipedia.org/wiki/Letter_frequency

[20]. http://en.wikipedia.org/wiki/Bigram

[21]. http://en.wikipedia.org/wiki/Trigram