algorithms and their applications cs2004 (2012-2013) dr stephen swift 17.1 exam revision
TRANSCRIPT
CS2004 Exam Revision Slide 2
IntroductionThis lecture is a revision lecture for the examination for this moduleThe exam is on Thursday 7th May at 2pm
Numerous locations…Please note I was sent a lot of topics for this revision lecture
We can only cover some of them I am afraid...Some of them were sent rather late! ;-)
In this lecture we are going to cover:The assessmentThe format of the examExam topicsBrief revision of some topicsSome example questions
CS2004 Exam Revision Slide 3
Assessment
The overall structure for the assessment is as follows:
60% Coursework/Worksheets These should now be done!The last chance for assessment is this weekAt the usual times…
40% Exam (pending)
CS2004 Exam Revision Slide 4
Exam
A three hour examThe location should now be available
This will consist of:A multiple choice part A worth 40%An essay-type question part B worth 60%
All questions should be attempted
CS2004 Exam Revision Slide 5
What to Revise – Part 1
The exam will cover the theoretical aspects of the moduleThere will be no programming needed
No questions on Java or EclipseHowever you may need to understand and/or write some pseudo code
CS2004 Exam Revision Slide 6
What to Revise – Part 2Topics could include (not restricted to):
Algorithmic conceptsWhat is an algorithm, a program, etc…
Time Complexity and Asymptotic NotationT(n) and O(n)
Data structuresStacks, lists, arrays, queues, etc…
Sorting AlgorithmsBubblesort, Quicksort, etc…
CS2004 Exam Revision Slide 7
What to Revise – Part 3Topics could include (not restricted to):
Graph Traversal AlgorithmsDepth First, Breadth First, A*, MST, etc…
SearchSearch, Search Space, Fitness, Parameter optimisation, etc...
Heuristic Search MethodsHC, SHC, RRHC, SA, ILS, etc...
Evolutionary Computation and Other Methods
Genetic Algorithms, PSO, ACO, etc...
ApplicationsBin Packing, Data Clustering, TSP, etc...
CS2004 Exam Revision Slide 8
Exam – Part A – Part 1
Multiple choice questions (40%)
Choose an answer from a list of options20 questions of 2 marks eachSpend about 3 minutes per questionUse the multiple choice answer sheet
CS2004 Exam Revision Slide 9
Exam – Part A – Part 2
Good strategies for multiple choice questions:
Read through all of the questions firstAnswer the ones that you know firstDo NOT spend too much time on a single questionOften ruling out answers can reduce the options down to a fewDo not leave any blank!
CS2004 Exam Revision Slide 10
Exam – Part B – Part 1
Essay type questions4 questions of 15 marksSpend approximately 25 minutes per questionWrite your answers in the answer book
CS2004 Exam Revision Slide 11
Exam – Part B – Part 2
Good strategies for essay type questions:
Read through all of the questions firstAnswer the ones that you know firstDo NOT spend too much time on a single questionSketching a draft answer can help in laying out complex answersCross out anything you do not want marked
CS2004 Exam Revision Slide 12
Requested TopicsThe following subjects were emailed to me for revision topicsNote that I can only briefly go over them today given the time…
Ant Colony OptimisationParticle Swarm OptimisationData ClusteringBig-O Notation
CS2004 Exam Revision Slide 13
Swarm Intelligence Algorithms
Study of collective behaviour of decentralised, self-organised systems
No central controlOnly simple rules for each individualMost Popular Algorithms
Ant Colony Optimisation (ACO)Particle Swarm Optimisation (PSO)
CS2004 Exam Revision Slide 14
ACO – Key ConceptsUsed for solving graph based problemsAnts (blind) navigate from nest to food sourcesThe shortest path is discovered via pheromone trailsIndirect coordination/communication between agents or actions
Stigmergy
Route selection is determined according to the amount of pheromone on each edgePheromone levels on edges is changed
Ants deposit pheromone on the edges they traversePheromone evaporates over time
CS2004 Exam Revision Slide 15
PSO – Key ConceptsUsed for solving parameter estimation/optimisation based problemsIt uses a number of agents (particles) that constitute a swarm moving around in the search space looking for the best solutionEach particle is treated as a point in an n-dimensional space which adjusts its “flying” according to its own flying experience as well as the flying experience of other particlesA record is kept of each particles position, personal best (location) and the global best (location)
Each particle is accelerated towards its personal best and the global best
CS2004 Exam Revision Slide 16
Data Clustering – Part 1Data Clustering is a common technique for data
analysis, which is used in many fields, including
machine learning, data mining, pattern recognition,
image analysis and bioinformaticsData Clustering is the process of arranging objects (as
points) into a number of sets (k) according to “distance” Each set (ideally) shares some common trait - often
similarity or proximity for some defined distance measureEach set will be referred to as a cluster/groupFor the purposes of this module, each set is mutually
exclusive, i.e. an item cannot be in more than one cluster
CS2004 Exam Revision Slide 17
Data Clustering – Part 2The data that we are clustering usually consists of a number of examples (rows) (n) where we have measured a number of features (variables) (m =3 in the example below)We want to cluster the rows together based on how similar their features areWe shall assume that the data we are clustering is a table or matrix X, where xi is the ith row of X and xij is the jth variable (feature) or row i
For example x92 is 2.9 in the table below:Sample Sepal Length in cm Sepal Width in cm Petal Length in cm
1 5.1 3.5 1.42 4.9 3.0 1.43 4.7 3.2 1.34 4.6 3.1 1.55 5.0 3.6 1.46 5.4 3.9 1.77 4.6 3.4 1.48 5.0 3.4 1.59 4.4 2.9 1.410 4.9 3.1 1.5Etc... 5.4 3.7 1.5
CS2004 Exam Revision Slide 18
Data Clustering – Part 3
Data
ClusteringMethod
DataSimilarity
Number ofClusters
ClusterWorth
CS2004 Exam Revision Slide 19
Representing a ClusterA cluster will be represented as a vector C where ci=j means that object/item/row i is in cluster jFor example C = {1,2,3,2,3,3} (k=3)Largest cluster
The number that appears the most in the vector/cluster
Number of clustersThe most frequent number in the vector/count the clusters
1
23
6Cluster 1
Cluster 2
Cluster 3
4 5
CS2004 Exam Revision Slide 20
Big-O NotationUsed to compare algorithms
Why?
Gives a long term, large input size, limit on the growth rate of the effort an algorithm takesWe measure an algorithms effort in terms of counting primitive operationsWe derive a function T(n) (exact count) from the pseudo code, parameterised in terms of the algorithms input sizeSimple rules are applied to T(n) to get O(n)
What are they?
We will go over a worked example later…
CS2004 Exam Revision Slide 21
Example Multiple Choice – Part 1
What is pseudo code?a) A Java programb) A programming language independent description of an algorithmc) The intermediate code produced by the Java compilerd) An algorithmic description of a computer programe) None of the above
CS2004 Exam Revision Slide 22
Example Multiple Choice – Part 2Algorithm A runs in time complexity O(n2), algorithm B in O(n3) and algorithm C in O(n2). Which of the following statements is true?
a) Algorithm B is always faster than Cb) Algorithm A and C are the samec) Algorithm A is slower than Algorithm Bd) Algorithm A and C are asymptotically similar in performancee) None of the above
CS2004 Exam Revision Slide 23
Example Multiple Choice – Part 3Which of the following data structures would be appropriate for storing a list of names and addresses?
a) A stackb) A queuec) An arrayd) A linked liste) It depends on the application
CS2004 Exam Revision Slide 24
Example Multiple Choice – Part 4Which of the following is not part of a graph:
a) An arrowb) A nodec) An edged) A vertexe) An arc
CS2004 Exam Revision Slide 25
Example Multiple Choice – Part 5What is the fastest time that the following list of number could be sorted in descending order:
{10,9,8,7,6,5,4,3,2,1}a) No time – it is already sortedb) O(n)c) O(nln(n))d) O(n2)e) It depends on the algorithm
CS2004 Exam Revision Slide 26
Example Multiple Choice – Part 6A* search can be used for:
a) Finding the shortest route between two stations on the undergroundb) Optimising the parameters of a Neural Networkc) Organising similar objects into mutually exclusive setsd) Creating compilation CDse) Decision Planning
CS2004 Exam Revision Slide 27
Example Multiple Choice – Part 7Which of the following Heuristic Search Methods is the odd one out?
a) Random Restart Hill Climbingb) Stochastic Hill Climbingc) Genetic Algorithmd) Hill Climbinge) Simulated Annealing
CS2004 Exam Revision Slide 28
Example Multiple Choice – Part 8Which of the following statements is true regarding bin packing and data clustering?
a) Bin packing is slower than data clusteringb) Bin packing is used on small sized objects whilst data clustering is used on large objectsc) Both algorithms arrange objects into groupsd) Bin packing is used on 1-dimensional data whilst data clustering can be used on n-dimensional datae) They perform the same function
CS2004 Exam Revision Slide 29
Example Essay Type – Part 1Consider the following algorithm:
Algorithm 1. MaxArray(A)Input: An n row by m column Array A1) Let max = element 1,1 (A(1,1)) of Array A2) For i = 1 to n3) For j = 1 to m4) If A(i,j) > max Then5) Let max = A(i,j)6) End If7) End For8) End ForOutput: max- the largest element in array A
CS2004 Exam Revision Slide 30
Example Essay Type – Part 2For a total of 15 marks:Describe the algorithm in words (not pseudo code)Compute T(n) for algorithm 1Compute O(n) for algorithm 1How would you modify the algorithm to create a new algorithm MinArray?
CS2004 Exam Revision Slide 31
Example Essay Type – Part 3The algorithm is designed to locate the largest value in an array, passed as a parameter. It assumes that the maximum is equal to the first element (1,1) and then systematically examines each element in turn, updating the maximum if the current item under scrutiny is larger than the maximum. This maximum value is then returned by the algorithm.
CS2004 Exam Revision Slide 32
Example Essay Type – Part 4For T(n) count all variables reads, writes and operators, note we have two input sizes n and m
Algorithm 1. MaxArray(A)Input: An n row by m column Array A1) Let max = element 1,1 (A(1,1)) of Array A 22) For i = 1 to n n 3) For j = 1 to m n (m)4) If A(i,j) > max Then n m (5) [Assume
worse]5) Let max = A(i,j) n m (4)6) End If none7) End For none8) End For noneOutput: max- the largest element in array A none
CS2004 Exam Revision Slide 33
Example Essay Type – Part 5For T(n) continued...
2nnm5nm4nmT(n) = 10nm + n + 2
O(n) = nmTo create algorithm MinArray
We change the > on line 4 to a <We would also rename the algorithm name and results variable
CS2004 Exam Revision Slide 34
Example Essay Type – Part 6
B.3.1) Describe the main similarities and differences between a Genetic Algorithm (GA) and an Evolutionary Program (EP). [4 marks]
Taken from Part B of the CS2004 past exam paper 2011-2012
CS2004 Exam Revision Slide 35
Example Essay Type – Part 7Both maintain a populationOnly a GA has crossoverBoth have mutation but an EP has a much more complex mutation operator
All individual mutate in an EPOnly some in a GA
Both have selectionHowever a GA usually uses the Roulette Wheel which allows an individual to be selected zero or more timesAn EP uses Tournament Selection which allows an individual to be selected zero times or once