A Research In Data Structures (Trees and Graph)

Submitted by: Ryan Paolo Salvador

Submitted to: Mrs. Elsa Salvador

October 9, 2013

Trees and indexes

The tree is one of the most powerful of the advanced data structures, and it often appears in even more advanced subjects such as AI and compiler design. Surprisingly, though, the tree is also important in a much more basic application: keeping an efficient index. Whenever you use a database, an index is almost certainly involved somewhere.

The simplest type of index is a sorted listing of the key field. This provides a fast lookup because you can use a binary search to locate any item without having to look at each one in turn.

The trouble with a simple ordered list only becomes apparent once you start adding new items and have to keep the list sorted. It can be done reasonably efficiently, but it takes some advanced juggling. A more important defect in these days of networking and multi-user systems is related to the file locking properties of such an index. Basically, if you want to share a linear index and allow more than one user to update it, then you have to lock the entire index during each update. In other words, a linear index isn't easy to share, and this is where trees come in. I suppose you could say that trees are shareable.

Tree ecology

A tree is a data structure consisting of nodes organised as a hierarchy - see Figure 1.

Figure 1: Some tree jargon

There is some obvious jargon that relates to trees and some that is not so obvious. Both kinds are summarised in the glossary, and selected examples are shown in Figure 1. I will try to avoid overly academic definitions or descriptions in what follows, but if you need a quick definition of any term then look it up in the glossary.

Binary trees

A worthwhile simplification is to consider only binary trees. A binary tree is one in which each node has at most two direct descendants (children): a node can have just one, but it can't have more than two. Clearly each node in a binary tree can have a left and/or a right descendant. The importance of a binary tree is that it can create a data structure that mimics a "yes/no" decision making process.

For example, if you construct a binary tree to store numeric values such that each left sub-tree contains larger values and each right sub-tree contains smaller values, then it is easy to search the tree for any particular value. (This is the mirror image of the ordering used in the binary search tree definition later in this paper; either works as long as it is applied consistently.) The algorithm is simply a tree search equivalent of a binary search:

    start at the root
    REPEAT until you reach a terminal node
        IF value at the node = search value THEN found
        IF value at node < search value THEN move to left descendant
        ELSE move to right descendant
    END REPEAT

Of course if the loop terminates because it reaches a terminal node then the search value isn't in the tree, but the fine detail only obscures the basic principles.
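The pseudocode above can be sketched in C++ as a simple loop. This is only an illustration: the Node type and the find function are hypothetical names, and the sketch follows the ordering used in this section (larger values in the left sub-tree, smaller values in the right). It also handles the "not found" case by stopping when it runs off the bottom of the tree.

```cpp
// Hypothetical node type for illustration; the names are not from the paper.
struct Node {
    int value;
    Node* left;   // sub-tree holding larger values (ordering used in this section)
    Node* right;  // sub-tree holding smaller values
};

// Returns the node holding `target`, or nullptr when the search runs off the tree.
const Node* find(const Node* root, int target) {
    const Node* current = root;
    while (current != nullptr) {
        if (current->value == target) return current;
        // Node value smaller than the target: the target can only be on the left.
        current = (current->value < target) ? current->left : current->right;
    }
    return nullptr;
}
```

Each pass through the loop discards one whole sub-tree, which is what makes the search comparable to a binary search when the tree is well shaped.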

The next question is how the shape of the tree affects the efficiency of the search. We all have a tendency to imagine complete binary trees like the one in Figure 2a, and in this case it isn't difficult to see that in the worst case a search would have to go down to the full depth of the tree. If you are happy with maths you will know that if the tree in Figure 2a contains n items then its depth is approximately log2 n, and so at best a tree search is as fast as a binary search.

Figure 2a: The "perfect" binary tree.

The worst possible performance is produced by a tree like that in Figure 2b. In this case all of the items are lined up on a single branch, making a tree with a depth of n. The worst case search of such a tree would take n compares, which is the same as searching an unsorted linear list.

So, depending on the shape of the tree, search efficiency varies from that of a binary search of a sorted list to that of a linear search of an unsorted list. Clearly, if it is going to be worth using a tree, we have to ensure that it is going to be closer in shape to the tree in Figure 2a than to that in Figure 2b.

Figure 2b: This may be an extreme binary tree but it still IS a binary tree
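To make the effect of shape concrete, here is a minimal sketch that builds a tree by plain (unbalanced) insertion and counts its levels. The names Node, insert, levels and levels_after_inserting are hypothetical, and the sketch assumes the common convention of smaller keys on the left; the shape argument is the same whichever side holds the smaller keys.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Plain unbalanced insertion: smaller keys go left, all others go right.
void insert(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        slot = (key < (*slot)->key) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = std::make_unique<Node>(key);
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int levels(const std::unique_ptr<Node>& node) {
    if (!node) return 0;
    return 1 + std::max(levels(node->left), levels(node->right));
}

// Builds a tree from `keys` in the given order and reports how many levels it has.
int levels_after_inserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (int k : keys) insert(root, k);
    return levels(root);
}
```

Inserting the keys 1 to 7 in sorted order gives a seven-level chain like Figure 2b, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives a three-level tree like Figure 2a.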

Common Uses of Trees

1. Manipulate hierarchical data.
2. Make information easy to search (traversal).
3. Manipulate sorted lists of data.
4. Act as a workflow for compositing digital images for visual effects.
5. Router algorithms.

How trees are represented in memory

Binary Tree

A binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at the root node and repeatedly following references to either the left or right child. A tree which does not have any node other than the root node is called a null tree. In a binary tree, the degree of every node is at most two. A tree with n nodes has exactly n - 1 branches (edges).

Binary trees are used to implement binary search trees and binary heaps, finding applications in efficient searching and sorting algorithms.
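One possible in-memory layout for such a node, sketched under the assumption that each node keeps the optional reference to its parent, is shown below. All names (Node, attach_left, attach_right, count_nodes, count_edges) are made up for this illustration. The two counting functions can be used to check the n - 1 relationship between nodes and branches.

```cpp
struct Node {
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;  // optional back-reference mentioned in the text
};

// Attach `child` under `parent` and keep the back-reference in sync.
void attach_left(Node& parent, Node& child)  { parent.left = &child;  child.parent = &parent; }
void attach_right(Node& parent, Node& child) { parent.right = &child; child.parent = &parent; }

// Number of nodes in the tree rooted at `n`.
int count_nodes(const Node* n) {
    return n ? 1 + count_nodes(n->left) + count_nodes(n->right) : 0;
}

// Every non-null child pointer is one branch (edge).
int count_edges(const Node* n) {
    if (!n) return 0;
    return (n->left ? 1 : 0) + (n->right ? 1 : 0)
         + count_edges(n->left) + count_edges(n->right);
}
```

For any non-empty tree built with these helpers, count_edges returns one less than count_nodes, because every node except the root is reached by exactly one branch.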

A directed edge refers to the link from the parent to the child (the arrows in the picture of the tree).
The root node of a tree is the node with no parents. There is at most one root node in a rooted tree.
A leaf node has no children.
The depth of a node is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero.
The depth (or height) of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a depth of zero.
Siblings are nodes that share the same parent node.
A node p is an ancestor of a node q if it exists on the path from the root to node q. The node q is then termed a descendant of p.
The size of a node is the number of descendants it has, including itself.
The in-degree of a node is the number of edges arriving at that node.
The out-degree of a node is the number of edges leaving that node.
The root is the only node in the tree with in-degree = 0.
All the leaf nodes have out-degree = 0.

BINARY SEARCH TREE

A binary search tree (BST), sometimes also called an ordered or sorted binary tree, is a node-based binary tree data structure which has the following properties:[1]

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree must each also be a binary search tree.
There must be no duplicate nodes.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.

The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.

Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.
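The four binary search tree properties can be checked mechanically. The following is a minimal sketch, not a definitive implementation; Node, within and is_bst are hypothetical names. It passes down the range of keys that a subtree is allowed to contain, and the strict comparisons are what rule out duplicates.

```cpp
#include <climits>

struct Node { int key; Node* left; Node* right; };

// True when every key in the subtree lies strictly between `low` and `high`
// and the subtree itself obeys the ordering rule.
bool within(const Node* n, long long low, long long high) {
    if (n == nullptr) return true;
    if (n->key <= low || n->key >= high) return false;
    return within(n->left, low, n->key) && within(n->right, n->key, high);
}

// Checks the binary search tree properties for the whole tree.
bool is_bst(const Node* root) {
    return within(root, LLONG_MIN, LLONG_MAX);
}
```

Checking only each node against its two children would not be enough, because a key deep in the left subtree could still be larger than the root; carrying the allowed range down the tree catches that case.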

TREE TRAVERSALS

Preorder traversal: To traverse a binary tree in preorder, the following operations are carried out: (i) visit the root, (ii) traverse the left subtree, and (iii) traverse the right subtree. Therefore, the preorder traversal of the example tree outputs: 7, 1, 0, 3, 2, 5, 4, 6, 9, 8, 10

Inorder traversal: To traverse a binary tree in inorder, the following operations are carried out: (i) traverse the left subtree, (ii) visit the root, and (iii) traverse the right subtree. Because the left subtree is always finished first, the output starts at the leftmost node. Therefore, the inorder traversal of the example tree outputs: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Postorder traversal: To traverse a binary tree in postorder, the following operations are carried out: (i) traverse the left subtree, (ii) traverse the right subtree, and (iii) visit the root. Every node is output only after all of its descendants, so the internal nodes "bubble up" after their leaves. Therefore, the postorder traversal of the example tree outputs: 0, 2, 4, 6, 5, 3, 1, 8, 10, 9, 7
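The three traversals differ only in where the "visit" step sits relative to the two recursive calls, as the sketch below tries to show. The tree built by example_tree is an assumption: it is the binary tree reconstructed from the preorder and inorder outputs quoted above, and the names Node, preorder, inorder, postorder and example_tree are hypothetical.

```cpp
#include <vector>

struct Node { int key; Node* left; Node* right; };

void preorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    out.push_back(n->key);       // visit the root first
    preorder(n->left, out);
    preorder(n->right, out);
}

void inorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->key);       // visit the root between the subtrees
    inorder(n->right, out);
}

void postorder(const Node* n, std::vector<int>& out) {
    if (!n) return;
    postorder(n->left, out);
    postorder(n->right, out);
    out.push_back(n->key);       // visit the root last
}

// Tree reconstructed from the quoted outputs (an assumption):
//          7
//        /   \
//       1     9
//      / \   / \
//     0   3 8   10
//        / \
//       2   5
//          / \
//         4   6
const Node* example_tree() {
    static Node n[11];
    for (int i = 0; i < 11; ++i) n[i] = Node{i, nullptr, nullptr};
    n[7].left = &n[1];  n[7].right = &n[9];
    n[1].left = &n[0];  n[1].right = &n[3];
    n[3].left = &n[2];  n[3].right = &n[5];
    n[5].left = &n[4];  n[5].right = &n[6];
    n[9].left = &n[8];  n[9].right = &n[10];
    return &n[7];
}
```

Running the three functions on this tree reproduces the three sequences listed above; the inorder result comes out sorted because the tree is a binary search tree.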

The binary tree is a fundamental data structure used in computer science. It is useful for rapidly storing sorted data and rapidly retrieving stored data. A binary tree is composed of nodes, each of which stores data and also links to up to two child nodes, which can be visualized spatially as below the first node, with one placed to the left and one placed to the right. It is the relationship between the child nodes and the node that links to them, known as the parent node, which makes the binary tree such an efficient data structure. The child on the left has a lesser key value (i.e., the value used to search for a node in the tree), and the child on the right has an equal or greater key value. As a result, the nodes on the farthest left of the tree have the lowest values, whereas the nodes on the farthest right have the greatest values. More importantly, as each node connects to up to two other nodes, it is the beginning of a new, smaller, binary tree. Due to this nature, it is possible to easily access and insert data in a binary tree using search and insert functions recursively called on successive nodes.

The typical graphical representation of a binary tree is essentially that of an upside down tree. It begins with a root node, which contains the original key value. The root node can have up to two child nodes; each child node might have its own child nodes. Ideally, the tree would be structured so that it is a perfectly balanced tree, with each node having the same number of descendants to its left and to its right. A perfectly balanced tree allows for the fastest average insertion of data or retrieval of data. The worst case scenario is a tree in which each node only has one child node, so it becomes as if it were a linked list in terms of speed. The typical representation of a binary tree looks like the following:

          10
        /    \
       6      14
      / \    /  \
     5   8  11  18

The node storing the 10, represented here merely as 10, is the root node, linking to the left and right child nodes, with the left node storing a lower value than the parent node, and the node on the right storing a greater value than the parent node. Notice that if one removed the root node and the right child nodes, the node storing the value 6 would be the root of a new, smaller, binary tree.

The structure of a binary tree makes the insertion and search functions simple to implement using recursion. In fact, the two functions are also very similar.

To insert data into a binary tree, a function searches for an unused node in the proper position in the tree in which to insert the key value. The insert function is generally a recursive function that continues moving down the levels of a binary tree until there is an unused node in a position which follows the rules of placing nodes. The rules are that a lower value should be to the left of the node, and a greater or equal value should be to the right. Following the rules, an insert function should check each node to see if it is empty; if so, it would insert the data to be stored along with the key value (in most implementations, an empty node will simply be a NULL pointer from a parent node, so the function would also have to create the node). If the node is filled already, the insert function should check whether the key value to be inserted is less than the key value of the current node, and if so, it should be recursively called on the left child node; if the key value to be inserted is greater than or equal to the key value of the current node, it should be recursively called on the right child node.

The search function works in a similar fashion. It should check whether the key value of the current node is the value being searched for. If not, it should check whether the value being searched for is less than the value of the node, in which case it should be recursively called on the left child node; if it is greater than the value of the node, it should be recursively called on the right child node. Of course, it is also necessary to check that the left or right child node actually exists before calling the function on it.

Because a balanced binary tree has about log (base 2) n layers, the average search time for such a tree is proportional to log (base 2) n.

Each node of the tree is a struct with the members used by the code that follows:

    struct node
    {
        int key_value;
        node *left;
        node *right;
    };

The struct has the ability to store the key_value and contains pointers to the two child nodes, which define the node as part of a tree. In fact, the node itself is very similar to the node in a linked list. A basic knowledge of the code for a linked list will be very helpful in understanding the techniques of binary trees. Essentially, pointers are necessary to allow the arbitrary creation of new nodes in the tree.

It is most logical to create a binary tree class to encapsulate the workings of the tree in a single place and to make it reusable. The class will contain functions to insert data into the tree and to search for data. Due to the use of pointers, it will be necessary to include a function to delete the tree in order to free its memory once the tree is no longer needed.

    class btree
    {
        public:
            btree();
            ~btree();

            void insert(int key);
            node *search(int key);
            void destroy_tree();

        private:
            void destroy_tree(node *leaf);
            void insert(int key, node *leaf);
            node *search(int key, node *leaf);

            node *root;
    };

The insert and search functions that are public members of the class are designed to allow the user of the class to use the class without dealing with the underlying design. The insert and search functions which will be called recursively are the ones which take two parameters, allowing them to travel down the tree. The destroy_tree function without arguments is a front for the destroy_tree function which will recursively destroy the tree, node by node, from the bottom up.

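The code for the class would look similar to the following:

    #include <cstddef> //Defines NULL

    btree::btree()
    {
        root=NULL;
    }

    btree::~btree()
    {
        destroy_tree(); //Frees every node when the tree object goes away
    }

The public destroy_tree function will set off the recursive function destroy_tree shown below, which will actually delete all nodes of the tree.

    void btree::destroy_tree(node *leaf)
    {
        if(leaf!=NULL)
        {
            destroy_tree(leaf->left);
            destroy_tree(leaf->right);
            delete leaf;
        }
    }

    void btree::destroy_tree()
    {
        destroy_tree(root);
        root=NULL; //Prevents a second deletion if destroy_tree is called again
    }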

The function destroy_tree goes to the bottom of each part of the tree, that is, it keeps descending while there is a non-null node, deletes the node it finds there, and then works its way back up. The function deletes the leftmost node, then the right child node of the leftmost node's parent, then the parent node itself; it then moves on to the other child of the parent of the node it just deleted, and it continues this deletion, working its way up to the node of the tree upon which destroy_tree was originally called. In the example tree above, the order of deletion of nodes would be 5 8 6 11 18 14 10. Note that it is necessary to delete all the child nodes to avoid wasting memory.
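The deletion order stated above can be confirmed with a small experiment. The sketch below repeats the same recursion as destroy_tree but records each key just before its node is deleted; destroy_and_record and make_example are hypothetical helper names, and the node struct is assumed to have the key_value, left and right members used throughout this paper.

```cpp
#include <vector>

struct node { int key_value; node* left; node* right; };

// Same recursion as destroy_tree, but it records each key just before the node is deleted.
void destroy_and_record(node* leaf, std::vector<int>& order) {
    if (leaf != nullptr) {
        destroy_and_record(leaf->left, order);
        destroy_and_record(leaf->right, order);
        order.push_back(leaf->key_value);
        delete leaf;
    }
}

// Builds the example tree on the heap: root 10, children 6 (5, 8) and 14 (11, 18).
node* make_example() {
    auto leaf = [](int k) { return new node{k, nullptr, nullptr}; };
    node* six = new node{6, leaf(5), leaf(8)};
    node* fourteen = new node{14, leaf(11), leaf(18)};
    return new node{10, six, fourteen};
}
```

Calling destroy_and_record on the tree returned by make_example fills the vector with 5, 8, 6, 11, 18, 14, 10, which is simply a postorder traversal: children are always deleted before their parent.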

    void btree::insert(int key, node *leaf)
    {
        if(key< leaf->key_value)
        {
            if(leaf->left!=NULL)
                insert(key, leaf->left);
            else
            {
                leaf->left=new node;
                leaf->left->key_value=key;
                leaf->left->left=NULL;  //Sets the left child of the child node to null
                leaf->left->right=NULL; //Sets the right child of the child node to null
            }
        }
        else if(key>=leaf->key_value)
        {
            if(leaf->right!=NULL)
                insert(key, leaf->right);
            else
            {
                leaf->right=new node;
                leaf->right->key_value=key;
                leaf->right->left=NULL;  //Sets the left child of the child node to null
                leaf->right->right=NULL; //Sets the right child of the child node to null
            }
        }
    }

The case where the root node is still NULL will be taken care of by the insert function that is nonrecursive and available to non-members of the class. The insert function searches, moving down the tree of child nodes and following the prescribed rules (left for a lower value to be inserted and right for a greater or equal value), until it finds an empty position, where it creates a node using the 'new' keyword and initializes it with the key value while setting the new node's child pointers to NULL. After creating the new node, the insert function will no longer call itself.

    node *btree::search(int key, node *leaf)
    {
        if(leaf!=NULL)
        {
            if(key==leaf->key_value)
                return leaf;
            if(key<leaf->key_value)
                return search(key, leaf->left);
            else
                return search(key, leaf->right);
        }
        else return NULL;
    }

The search function shown above recursively moves down the tree until it either reaches a node with a key value equal to the value for which the function is searching or reaches a NULL pointer, meaning that the value being searched for is not stored in the binary tree. It returns a pointer to the node (or NULL) to the previous instance of the function which called it, handing the pointer back up to the search function accessible outside the class.

    void btree::insert(int key)
    {
        if(root!=NULL)
            insert(key, root);
        else
        {
            root=new node;
            root->key_value=key;
            root->left=NULL;
            root->right=NULL;
        }
    }

    node *btree::search(int key)
    {
        return search(key, root); //Starts the recursive search at the root
    }
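As a point of comparison, the manual clean-up can be avoided in modern C++ by letting each node own its children through std::unique_ptr. The sketch below is a variation on the class described above, not the same class: the name btree_owned and the contains function are made up for this illustration, and it keeps the same ordering rule (lower keys to the left, greater or equal keys to the right).

```cpp
#include <memory>

class btree_owned {
public:
    void insert(int key) { insert(key, root_); }
    bool contains(int key) const { return search(key, root_.get()) != nullptr; }

private:
    struct node {
        int key_value;
        std::unique_ptr<node> left, right;  // children are freed automatically
        explicit node(int k) : key_value(k) {}
    };

    // An empty unique_ptr plays the role of the NULL pointer in the original code.
    static void insert(int key, std::unique_ptr<node>& leaf) {
        if (!leaf) leaf = std::make_unique<node>(key);
        else if (key < leaf->key_value) insert(key, leaf->left);
        else insert(key, leaf->right);
    }

    static const node* search(int key, const node* leaf) {
        if (leaf == nullptr) return nullptr;
        if (key == leaf->key_value) return leaf;
        return search(key, key < leaf->key_value ? leaf->left.get() : leaf->right.get());
    }

    std::unique_ptr<node> root_;
};
```

When a btree_owned object goes out of scope, the root pointer releases its node, which in turn releases its children, so no destroy_tree function is needed. One trade-off is that a very deep, unbalanced tree is still destroyed recursively.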

Graph

A graph is a mathematical structure consisting of a set of vertices (also called nodes) and a set of edges. An edge is a pair of vertices. The two vertices are called the edge endpoints. Graphs are ubiquitous in computer science. They are used to model real-world systems such as the Internet (each node represents a router and each edge represents a connection between routers); airline connections (each node is an airport and each edge is a flight); or a city road network (each node represents an intersection and each edge represents a block). The wireframe drawings in computer graphics are another example of graphs.

A graph may be either undirected or directed. Intuitively, an undirected edge models a "two-way" or "duplex" connection between its endpoints, while a directed edge is a one-way connection, and is typically drawn as an arrow. A directed edge is often called an arc. Mathematically, an undirected edge is an unordered pair of vertices, and an arc is an ordered pair. For example, a road network might be modeled as a directed graph, with one-way streets indicated by an arrow between endpoints in the appropriate direction, and two-way streets shown by a pair of parallel directed edges going both directions between the endpoints. You might ask, why not use a single undirected edge for a two-way street? There's no theoretical problem with this, but from a practical programming standpoint, it's generally simpler and less error-prone to stick with all directed or all undirected edges.

For a graph with n vertices, an undirected graph can have at most n(n+1)/2 edges (one for each unordered pair, counting the pairing of a vertex with itself), while a directed graph can have at most n^2 edges (one per ordered pair). A graph is called sparse if it has many fewer than this many edges (typically O(n) edges), and dense if it has closer to n^2 edges. A multigraph can have more than one edge between the same two vertices. For example, if one were modeling airline flights, there might be multiple flights between two cities, occurring at different times of the day.

A path in a graph is a sequence of vertices v1, v2, ..., vk such that there exists an edge or arc between consecutive vertices. The path is called a cycle if v1 = vk. A connected undirected acyclic graph is equivalent to an undirected tree. A directed acyclic graph is called a DAG. It is not necessarily a tree.

Nodes and edges often have associated information, such as labels or weights. For example, in a graph of airline flights, a node might be labeled with the name of the corresponding airport, and an edge might have a weight equal to the flight time. The popular game "Six Degrees of Kevin Bacon" can be modeled by a labeled undirected graph. Each actor becomes a node, labeled by the actor's name. Nodes are connected by an edge when the two actors appeared together in some movie. We can label this edge by the name of the movie. Deciding if an actor is separated from Kevin Bacon by six or fewer steps is equivalent to finding a path of length at most six in the graph between Bacon's vertex and the other actor's vertex. (This can be done with the breadth-first search algorithm found in the companion Algorithms book. The Oracle of Bacon at the University of Virginia has actually implemented this algorithm and can tell you the path from any actor to Kevin Bacon in a few clicks.)

Uses of Graph

Applications of graph theory are primarily, but not exclusively, concerned with labeled graphs and various specializations of these.

Structures that can be represented as graphs are ubiquitous, and many problems of practical interest can be represented by graphs. The link structure of a website could be represented by a directed graph: the vertices are the web pages available at the website, and a directed edge from page A to page B exists if and only if A contains a link to B. A similar approach can be taken to problems in travel, biology, computer chip design, and many other fields. The development of algorithms to handle graphs is therefore of major interest in computer science. There, the transformation of graphs is often formalized and represented by graph rewrite systems. They are either directly used, or properties of the rewrite systems (e.g. confluence) are studied.

A graph structure can be extended by assigning a weight to each edge of the graph. Graphs with weights, or weighted graphs, are used to represent structures in which pairwise connections have some numerical values. For example if a graph represents a road network, the weights could represent the length of each road. A digraph with weighted edges in the context of graph theory is called a network.
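A weighted road network of this kind might be stored as a nested map from each vertex to its neighbours and the weight of the connecting edge. The sketch below is one possible layout under that assumption; WeightedGraph, add_road and route_length are hypothetical names, and the weights are taken to be road lengths in kilometres.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Weighted directed graph stored as: vertex name -> (neighbour name -> weight).
using WeightedGraph = std::map<std::string, std::map<std::string, double>>;

void add_road(WeightedGraph& g, const std::string& from, const std::string& to, double km) {
    g[from][to] = km;
    g[to];  // make sure the destination also exists as a vertex
}

// Sums the edge weights along `route`; returns -1 if some consecutive pair is not connected.
double route_length(const WeightedGraph& g, const std::vector<std::string>& route) {
    double total = 0.0;
    for (std::size_t i = 0; i + 1 < route.size(); ++i) {
        auto from = g.find(route[i]);
        if (from == g.end()) return -1.0;
        auto edge = from->second.find(route[i + 1]);
        if (edge == from->second.end()) return -1.0;
        total += edge->second;
    }
    return total;
}
```

Because the edges are directed, a road added from A to B does not imply a road from B to A; a two-way street would be added as two edges.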

Networks have many uses in the practical side of graph theory, network analysis (for example, to model and analyze traffic networks). Within network analysis, the definition of the term "network" varies, and may often refer to a simple graph.

Many applications of graph theory exist in the form of network analysis. These split broadly into three categories. Firstly, analysis to determine structural properties of a network, such as the distribution of vertex degrees and the diameter of the graph. A vast number of graph measures exist, and the production of useful ones for various domains remains an active area of research. Secondly, analysis to find a measurable quantity within the network, for example, for a transportation network, the level of vehicular flow within any portion of it. Thirdly, analysis of dynamical properties of networks.
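As a small illustration of the first category, the distribution of vertex degrees can be computed directly from a list of edges. The sketch below assumes an undirected graph whose vertices are identified by integers; Edge, degrees and degree_distribution are names invented for this example.

```cpp
#include <map>
#include <utility>
#include <vector>

// Undirected graph given as a list of edges between integer vertex ids.
using Edge = std::pair<int, int>;

// Degree of each vertex that appears in at least one edge.
std::map<int, int> degrees(const std::vector<Edge>& edges) {
    std::map<int, int> degree;
    for (const Edge& e : edges) {
        ++degree[e.first];
        ++degree[e.second];
    }
    return degree;
}

// Degree distribution: how many vertices have each degree.
std::map<int, int> degree_distribution(const std::vector<Edge>& edges) {
    std::map<int, int> histogram;
    for (const auto& entry : degrees(edges)) ++histogram[entry.second];
    return histogram;
}
```

For the edges 0-1, 0-2, 0-3 and 1-2, vertex 0 has degree 3, vertices 1 and 2 have degree 2, and vertex 3 has degree 1, so the distribution records one vertex of degree 1, two of degree 2 and one of degree 3.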

Graph theory is also used to study molecules in chemistry and physics. In condensed matter physics, the three dimensional structure of complicated simulated atomic structures can be studied quantitatively by gathering statistics on graph-theoretic properties related to the topology of the atoms, for example Franzblau's shortest-path (SP) rings. In chemistry a graph makes a natural model for a molecule, where vertices represent atoms and edges represent bonds. This approach is especially used in computer processing of molecular structures, ranging from chemical editors to database searching.

Graph theory is also widely used in sociology as a way, for example, to measure actors' prestige or to explore diffusion mechanisms, notably through the use of social network analysis software.

How are Graphs Represented in Memory?

Graph Creation

The following program constructs the graph shown in the introduction using the intuitive representation, MBgraph1, and then enumerates the vertices, neighbours and edges:

    #include <stdio.h>
    /* Also include the header that declares the MBgraph1, MBvertex, MBedge and MBiterator API. */

    int main(void)
    {
        MBgraph1 *graph;
        MBvertex *vertex;
        MBvertex *A, *B, *C, *D, *E;
        MBiterator *vertices, *edges;
        MBedge *edge;

        /* Create a graph */
        graph = MBgraph1_create();

        /* Add vertices */
        A = MBgraph1_add(graph, "A", NULL);
        B = MBgraph1_add(graph, "B", NULL);
        C = MBgraph1_add(graph, "C", NULL);
        D = MBgraph1_add(graph, "D", NULL);
        E = MBgraph1_add(graph, "E", NULL);

        /* Add edges */
        MBgraph1_add_edge(graph, A, B);
        MBgraph1_add_edge(graph, A, D);
        MBgraph1_add_edge(graph, B, C);
        MBgraph1_add_edge(graph, C, B);
        MBgraph1_add_edge(graph, D, A);
        MBgraph1_add_edge(graph, D, C);
        MBgraph1_add_edge(graph, D, E);

        /* Display each vertex followed by a comma-separated list of its neighbours */
        printf("Vertices (%d) and their neighbours:\n\n", MBgraph1_get_vertex_count(graph));
        vertices = MBgraph1_get_vertices(graph);
        while ((vertex = MBiterator_get(vertices))) {
            MBiterator *neighbours;
            MBvertex *neighbour;
            unsigned int n = 0;
            printf("%s(%d): ", MBvertex_get_name(vertex), MBgraph1_get_neighbour_count(graph, vertex));
            neighbours = MBgraph1_get_neighbours(graph, vertex);
            while ((neighbour = MBiterator_get(neighbours))) {
                printf("%s", MBvertex_get_name(neighbour));
                if (n < MBgraph1_get_neighbour_count(graph, vertex) - 1) {
                    fputs(", ", stdout);
                }
                n++;
            }
            putchar('\n');
            MBiterator_delete(neighbours);
        }
        putchar('\n');
        MBiterator_delete(vertices);

        /* Display each edge as a pair of vertex names */
        printf("Edges (%d):\n\n", MBgraph1_get_edge_count(graph));
        edges = MBgraph1_get_edges(graph);
        while ((edge = MBiterator_get(edges))) {
            printf("<%s, %s>\n", MBvertex_get_name(MBedge_get_from(edge)), MBvertex_get_name(MBedge_get_to(edge)));
        }
        putchar('\n');
        MBiterator_delete(edges);

        /* Delete */
        MBgraph1_delete(graph);
        return 0;
    }
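For readers without the MBgraph1 library, the same graph can be held in a plain adjacency list. The following C++ sketch is an independent illustration and is not part of that library: Digraph and make_example_graph are hypothetical names, and the vertices and directed edges are the ones added by the program above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class Digraph {
public:
    void add_vertex(const std::string& name) { adjacency_[name]; }

    void add_edge(const std::string& from, const std::string& to) {
        adjacency_[from].push_back(to);
        adjacency_[to];  // ensure the target exists as a vertex
    }

    std::size_t vertex_count() const { return adjacency_.size(); }

    std::size_t edge_count() const {
        std::size_t total = 0;
        for (const auto& entry : adjacency_) total += entry.second.size();
        return total;
    }

    // Neighbours of `name` in insertion order; empty if the vertex is unknown.
    std::vector<std::string> neighbours(const std::string& name) const {
        auto it = adjacency_.find(name);
        if (it == adjacency_.end()) return {};
        return it->second;
    }

private:
    // Each vertex name maps to the list of vertices its outgoing edges point to.
    std::map<std::string, std::vector<std::string>> adjacency_;
};

// Same vertices and directed edges as the program above.
Digraph make_example_graph() {
    Digraph g;
    for (const char* name : {"A", "B", "C", "D", "E"}) g.add_vertex(name);
    g.add_edge("A", "B");
    g.add_edge("A", "D");
    g.add_edge("B", "C");
    g.add_edge("C", "B");
    g.add_edge("D", "A");
    g.add_edge("D", "C");
    g.add_edge("D", "E");
    return g;
}
```

An adjacency list stores only the edges that exist, which suits sparse graphs; a dense graph might be better served by an n by n matrix of flags.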

Graph Traversal

Graph traversal is the problem of visiting all the nodes in a graph in a particular manner, updating and/or checking their values along the way. Tree traversal is a special case of graph traversal.

Unlike tree traversal, graph traversal may require that some nodes be visited more than once, since it is not necessarily known before transitioning to a node that it has already been explored. As graphs become more dense, this redundancy becomes more prevalent, causing computation time to increase; as graphs become more sparse, the opposite holds true.

Thus, it is usually necessary to remember which nodes have already been explored by the algorithm, so that nodes are revisited as infrequently as possible (or in the worst case, to prevent the traversal from continuing indefinitely). This may be accomplished by associating each node of the graph with a "color" or "visitation" state during the traversal, which is then checked and updated as the algorithm visits each node. If the node has already been visited, it is ignored and the path is pursued no further; otherwise, the algorithm checks/updates the node and continues down its current path.

Several special cases of graphs imply the visitation of other nodes in their structure, and thus do not require that visitation be explicitly recorded during the traversal. An important example of this is a tree, during a traversal of which it may be assumed that all "ancestor" nodes of the current node (and others depending on the algorithm) have already been visited. Both the depth-first and breadth-first graph searches are adaptations of tree-based algorithms, distinguished primarily by the lack of a structurally determined "root" node and the addition of a data structure to record the traversal's visitation state.
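The role of the visitation state can be seen in a short breadth-first search. The sketch below is a minimal illustration under the assumption that the graph is stored as an adjacency list keyed by vertex name; Graph and bfs_distances are hypothetical names. The map of distances doubles as the "visited" record, so a cycle cannot make the traversal run forever.

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

using Graph = std::map<std::string, std::vector<std::string>>;

// Breadth-first search from `start`. A vertex appears in the returned map once it
// has been discovered, together with its distance from `start` in edges.
std::map<std::string, int> bfs_distances(const Graph& g, const std::string& start) {
    std::map<std::string, int> distance;
    std::queue<std::string> frontier;
    distance[start] = 0;
    frontier.push(start);
    while (!frontier.empty()) {
        const std::string current = frontier.front();
        frontier.pop();
        const int d = distance[current];
        auto it = g.find(current);
        if (it == g.end()) continue;  // vertex with no outgoing edges listed
        for (const std::string& next : it->second) {
            if (distance.count(next) == 0) {  // not visited yet
                distance[next] = d + 1;
                frontier.push(next);
            }
        }
    }
    return distance;
}
```

The same idea answers the "six degrees" question from the Graph section: an actor is within six steps of the start vertex exactly when the distance recorded for that actor is at most six.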