Plagiarism Detection as a Problem of Machine Learning


Post on 18-Jan-2016

TRANSCRIPT

Page 1: Plagiarism Detection as a Problem of Machine Learning

Plagiarism Detection as a Problem of Machine Learning

Academician Yuri I. Zhuravlev

Corresponding Member of RAS Konstantin V. Rudakov

Gleb V. Nikitov

Computing Center of Russian Academy of Sciences

Forecsys Corporation

Page 2: Plagiarism Detection as a Problem of Machine Learning

About the problem

Detect citing in students’ papers

Do it quickly and conveniently

Do it with high quality and with substantiation

Page 3: Plagiarism Detection as a Problem of Machine Learning

Solutions

Turnitin

Mydropbox

www.antiplagiat.ru

Page 4: Plagiarism Detection as a Problem of Machine Learning

Working scheme

[Diagram with the elements: Paper, Instructor, Antiplagiat, Collection of documents]

Page 5: Plagiarism Detection as a Problem of Machine Learning

Search domain

Internet

Page 6: Plagiarism Detection as a Problem of Machine Learning

Already using

Higher School of Economics

Moscow Institute of Economics, Management and Law

Moscow Pedagogical State University

Moscow Municipal Psychological and Pedagogical Institute

Nizhni Novgorod State University

Academy of Budget and Treasury of the Russian Ministry of Finance

Page 7: Plagiarism Detection as a Problem of Machine Learning

Non-educational use

Higher Attestation Commission

Russian State Library (formerly named after Lenin)

Page 8: Plagiarism Detection as a Problem of Machine Learning

Negotiations

Moscow State University

Moscow Physical and Technical Institute

Russian Academy of Justice

International Academy of Enterprise

Page 9: Plagiarism Detection as a Problem of Machine Learning

Quality and Performance

Leading operating speed without loss of quality in the results

70 thousand registered users

About 20 thousand originality reports generated every day

Continual improvement of search algorithms and expansion of functionality

Page 10: Plagiarism Detection as a Problem of Machine Learning

Plagiarism, what is it?

Page 11: Plagiarism Detection as a Problem of Machine Learning

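Formulation of problem

Permissible objects: $\{ S_i \mid i \in \{1,\dots,N\} \}$, where $S_i = (Fr_i^1, Fr_i^2)$, $i = 1,\dots,N$

Descriptive functions: $Descr = \{ D \mid D : \dots \}$ [domain and codomain lost in extraction]

Fixed set of functions: $D^0$, applied to an object as $D^0(S)$ [remainder of the formula, which involves $n$, lost in extraction]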

Page 12: Plagiarism Detection as a Problem of Machine Learning

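Formulation of problem

The problem: construct an algorithm $A : I_i \to I_f$

Initial information: $I_i = D^0(S)$, the descriptions of the permissible objects

Final information: $I_f = \{0, 1, \dots, k-1\}$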

Page 13: Plagiarism Detection as a Problem of Machine Learning

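Formulation of problem

Precedent information: $\{ (S^1, An^1), \dots, (S^q, An^q) \}$, where $S^j$ is a permissible object and $An^j \in I_f$ is its known answer, $j = 1,\dots,q$

Precedent conditions: $\forall j \in \{1,\dots,q\}: \; A(D^0(S^j)) = An^j$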
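The precedent conditions can be read as a direct check: a candidate algorithm is correct on the precedents if it returns the known answer for the description of every precedent object. The Python sketch below assumes that reading. The word-overlap description `describe`, the threshold rule `algorithm`, and the 0.5 cut-off are hypothetical illustrations and do not come from the slides.

```python
from typing import Callable, Hashable, Iterable, Tuple

FragmentPair = Tuple[str, str]  # an object S: a pair of text fragments


def describe(pair: FragmentPair) -> Tuple[int, int]:
    """Hypothetical description D0(S): (number of shared distinct words,
    number of distinct words in the smaller fragment)."""
    first, second = (set(fragment.split()) for fragment in pair)
    return len(first & second), min(len(first), len(second))


def algorithm(description: Tuple[int, int]) -> int:
    """Hypothetical algorithm A: answer 1 if at least half of the smaller
    fragment's words are shared, otherwise 0."""
    shared, smaller = description
    return 1 if smaller > 0 and shared / smaller >= 0.5 else 0


def satisfies_precedents(
    precedents: Iterable[Tuple[FragmentPair, int]],
    algorithm: Callable[[Hashable], int],
    describe: Callable[[FragmentPair], Hashable],
) -> bool:
    """True if A(D0(S^j)) equals An^j for every precedent (S^j, An^j)."""
    return all(algorithm(describe(obj)) == answer for obj, answer in precedents)


precedents = [
    (("a b c", "a b d"), 1),  # two of three words shared
    (("a b c", "x y z"), 0),  # no shared words
]
print(satisfies_precedents(precedents, algorithm, describe))
```

Any pair of functions with the same shapes can be passed in place of the toy `describe` and `algorithm`.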

Page 14: Plagiarism Detection as a Problem of Machine Learning

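Formulation of problem

Transitive and reflexive relation on fragments, written $\preceq$ here because the original symbol was lost in extraction. [Defining formula garbled; it involves fragments $Fr_1^0, Fr_2^0$ and the fragments $Fr_i^1, Fr_i^2$, $i \in \{1,\dots,N\}$]

Example: [formula garbled in extraction; it relates $Fr_1$ and $Fr_2$ through $WL(Fr_1, Fr_2)$ and $\min(L(Fr_1), L(Fr_2))$]

[A further formula, garbled in extraction, compares the pairs $(Fr_1^1, Fr_1^2)$ and $(Fr_2^1, Fr_2^2)$ through their components]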

Page 15: Plagiarism Detection as a Problem of Machine Learning

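Formulation of problem

Additional conditions:

$\forall i_1, i_2 \in \{1,\dots,N\}: \; Fr_{i_1}^1 \preceq Fr_{i_2}^1, \; Fr_{i_1}^2 \preceq Fr_{i_2}^2 \;\Rightarrow\; A(D^0(Fr_{i_1}^1, Fr_{i_1}^2)) \le A(D^0(Fr_{i_2}^1, Fr_{i_2}^2))$

where $\preceq$ denotes the transitive and reflexive relation on fragments.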
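Assuming the additional conditions express monotonicity (if both fragments of one pair precede the matching fragments of another pair, the answer must not decrease), they can be checked by enumerating all comparable pairs. In the sketch below the relation `precedes` (substring containment, which is reflexive and transitive) and both scoring functions are hypothetical stand-ins for the relation and for the composition of A with D0.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

FragmentPair = Tuple[str, str]


def precedes(a: FragmentPair, b: FragmentPair) -> bool:
    """Hypothetical relation: each fragment of `a` is a substring of the
    matching fragment of `b`."""
    return a[0] in b[0] and a[1] in b[1]


def monotonicity_violations(
    objects: Sequence[FragmentPair],
    precedes: Callable[[FragmentPair, FragmentPair], bool],
    score: Callable[[FragmentPair], int],
) -> List[Tuple[FragmentPair, FragmentPair]]:
    """List every ordered pair (a, b) with a preceding b but score(a) > score(b)."""
    return [
        (a, b)
        for a, b in product(objects, repeat=2)
        if precedes(a, b) and score(a) > score(b)
    ]


def growing_score(pair: FragmentPair) -> int:
    """Length of the shorter fragment; cannot decrease when both fragments grow."""
    return min(len(pair[0]), len(pair[1]))


def shrinking_score(pair: FragmentPair) -> int:
    """Negated length; decreases when the fragments grow, so it breaks monotonicity."""
    return -growing_score(pair)


objects = [("ab", "ab"), ("abc", "abd"), ("xyz", "q")]
print(monotonicity_violations(objects, precedes, growing_score))
print(monotonicity_violations(objects, precedes, shrinking_score))
```

The check is quadratic in the number of objects, which is acceptable for a small illustrative collection.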

Page 16: Plagiarism Detection as a Problem of Machine Learning

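Criteria

Solvability criteria: For the existence of a correct algorithm A it is necessary and sufficient that the following conditions are met:

$\forall j_1, j_2 \in \{1,\dots,q\}$ with $An^{j_1} \ne An^{j_2}$, $\forall i_1, i_2 \in \{1,\dots,N\}$: $\; S_{i_1} = S^{j_1} \;\&\; S_{i_2} = S^{j_2} \;\Rightarrow\; D^0(S_{i_1}) \ne D^0(S_{i_2})$

$\forall j_1, j_2 \in \{1,\dots,q\}: \; S^{j_1} = S^{j_2} \;\Rightarrow\; An^{j_1} = An^{j_2}$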
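Read informally, solvability says that the precedents must not contradict each other: two precedents with different answers may not share a description, otherwise no algorithm working on descriptions can be correct on both. The Python sketch below assumes that reading; both description functions are hypothetical examples.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

FragmentPair = Tuple[str, str]


def is_solvable(
    precedents: Iterable[Tuple[FragmentPair, int]],
    describe: Callable[[FragmentPair], Hashable],
) -> bool:
    """True if no two precedents share a description while having different answers."""
    answer_by_description: Dict[Hashable, int] = {}
    for obj, answer in precedents:
        # setdefault stores the first answer seen for this description
        if answer_by_description.setdefault(describe(obj), answer) != answer:
            return False
    return True


def describe_coarse(pair: FragmentPair) -> int:
    """Hypothetical description: number of shared distinct words."""
    first, second = (set(fragment.split()) for fragment in pair)
    return len(first & second)


def describe_fine(pair: FragmentPair) -> Tuple[int, Tuple[str, ...]]:
    """Hypothetical richer description: the shared-word count plus the shared words."""
    first, second = (set(fragment.split()) for fragment in pair)
    shared = tuple(sorted(first & second))
    return len(shared), shared


precedents = [
    (("a b", "a c"), 1),
    (("x y", "x z"), 0),
]
print(is_solvable(precedents, describe_coarse))  # both objects get description 1
print(is_solvable(precedents, describe_fine))    # descriptions differ
```

The example shows that solvability depends on the fixed set of descriptive functions: the same precedents are contradictory under the coarse description and consistent under the finer one.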

Page 17: Plagiarism Detection as a Problem of Machine Learning

Criteria

Regularity: Definition (according to Zhuravlev). The problem Z is regular if all the problems with arbitrary final information are simultaneously solvable.

Regularity criteria: For a problem to be regular it is necessary and sufficient that the following conditions are met:

$\forall j_1 \ne j_2 \in \{1,\dots,q\}: \; S^{j_1} \ne S^{j_2}$

$\forall j_1 \ne j_2 \in \{1,\dots,q\}$, $\forall i_1, i_2 \in \{1,\dots,N\}$: $\; S_{i_1} = S^{j_1} \;\&\; S_{i_2} = S^{j_2} \;\Rightarrow\; D^0(S_{i_1}) \ne D^0(S_{i_2})$
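Under the definition above, regularity means that the precedent objects stay solvable whatever answers are attached to them. A plausible reading of the criterion is that the descriptions of the precedent objects must be pairwise distinct. The Python sketch below compares that shortcut with a brute-force check over every answer assignment; the description function and the sample objects are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Sequence, Tuple

FragmentPair = Tuple[str, str]


def describe(pair: FragmentPair) -> int:
    """Hypothetical description: number of shared distinct words."""
    first, second = (set(fragment.split()) for fragment in pair)
    return len(first & second)


def is_solvable(objects: Sequence[FragmentPair], answers: Sequence[int],
                describe: Callable[[FragmentPair], Hashable]) -> bool:
    """True if equal descriptions never carry different answers."""
    seen: Dict[Hashable, int] = {}
    return all(seen.setdefault(describe(obj), answer) == answer
               for obj, answer in zip(objects, answers))


def is_regular(objects: Sequence[FragmentPair],
               describe: Callable[[FragmentPair], Hashable]) -> bool:
    """Shortcut: all descriptions of the precedent objects are pairwise distinct."""
    descriptions = [describe(obj) for obj in objects]
    return len(set(descriptions)) == len(descriptions)


def is_regular_brute_force(objects: Sequence[FragmentPair],
                           describe: Callable[[FragmentPair], Hashable],
                           k: int = 2) -> bool:
    """Definition: solvable for every assignment of answers from {0, ..., k-1}."""
    return all(is_solvable(objects, answers, describe)
               for answers in product(range(k), repeat=len(objects)))


distinct = [("a b c", "a b c"), ("a b c", "a b z"), ("a b c", "x y z")]
clashing = [("a b", "a c"), ("x y", "x z")]
print(is_regular(distinct, describe), is_regular_brute_force(distinct, describe))
print(is_regular(clashing, describe), is_regular_brute_force(clashing, describe))
```

The brute-force version grows exponentially with the number of precedents and is included only to illustrate the definition.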

Page 18: Plagiarism Detection as a Problem of Machine Learning

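Criteria

Monotone solvability criteria: For monotone solvability of the problem it is necessary and sufficient that the conditions of the solvability criteria are met and that the following conditions are also met:

[Formula garbled in extraction: a condition quantified over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ that relates $An^{j_1}, An^{j_2}$, the objects $S_{i_1}, S^{j_1}, S_{i_2}, S^{j_2}$ and the descriptions $D^0(S_{i_1}), D^0(S_{i_2})$; the relation symbols are lost]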

Page 19: Plagiarism Detection as a Problem of Machine Learning

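Criteria

Monotone regularity criteria: For monotone regularity of the problem it is necessary and sufficient that the following conditions are met:

$\forall j_1 \ne j_2 \in \{1,\dots,q\}: \; S^{j_1} \ne S^{j_2}$

[Second formula garbled in extraction: a condition quantified over $j_1, j_2 \in \{1,\dots,q\}$ and $i_1, i_2 \in \{1,\dots,N\}$ that relates $S_{i_1}, S^{j_1}, S_{i_2}, S^{j_2}$ and $D^0(S_{i_1}), D^0(S_{i_2})$; the relation symbols are lost]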

Page 20: Plagiarism Detection as a Problem of Machine Learning

Criteria

Supercompleteness: The family of algorithms M is called supercomplete in the described class of problems if for each problem Z from the set of solvable problems there exists in M at least one correct algorithm.

$Z \in [Z]_S$

Page 21: Plagiarism Detection as a Problem of Machine Learning

Criteria

Completeness: The family of algorithms M is called complete in the described class of problems if for each problem Z from the set of regular problems there exists in M at least one correct algorithm.

$Z \in [Z]_R$

Page 22: Plagiarism Detection as a Problem of Machine Learning

Criteria

Supercompleteness criteria: For the family of algorithmic operators $M^0$ to be supercomplete it is necessary and sufficient that the following conditions are met:

$\forall i_1, i_2 \in \{1,\dots,N\}, \; S_{i_1} \ne S_{i_2}: \; \exists B \in M^0: \; B(D^0(S_{i_1})) \ne B(D^0(S_{i_2}))$

Page 23: Plagiarism Detection as a Problem of Machine Learning

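Criteria

Completeness criteria: For the family of algorithmic operators $M^0$ to be complete it is necessary and sufficient that the following conditions are met:

[Formula garbled in extraction: for all $i_1, i_2 \in \{1,\dots,N\}$ satisfying two conditions on $S_{i_1}, S_{i_2}$ and on $S_{i_2}, S_{i_1}$ joined by "&" (relation symbols lost), there exists $B \in M^0$ such that $B(D^0(S_{i_1}))$ and $B(D^0(S_{i_2}))$ differ]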