Triple-Triple RDF Store with Greedy Graph Based Grouping

Name: Vinoth Chandar ([email protected])
Partner's Name: -
RDBMS Platform: MySQL 5.0




Abstract

Triple stores based on relational databases have received wide attention among database researchers and Semantic Web enthusiasts. Improving query performance on very large RDF datasets is a challenging problem that must be addressed for efficient implementation of such triple stores. In this paper, we explore promising approaches to this problem. We examine the possibility of storing the RDF triples in different orders in three tables and develop a query rewriting scheme for them. We also look at optimizing the physical schema with graph clustering techniques that aim to bring related triples closer to each other on disk. We present experimental results from an implementation of the scheme over a million triples. Our results show that the scheme can yield significant performance benefits on complex queries.

1. Introduction

In recent years, RDF [1] stores, or triple stores, which can store (subject, property, object) triples of ontologies, have received significant attention from database researchers. Many efforts have been made to implement RDF stores using relational databases and to devise efficient schemes for accessing information from such stores. These efforts are focused on the larger vision of the 'Semantic Web'. To realize this vision, RDBMS-based triple stores should be able to store and query enormous numbers of triples describing web pages on the Internet.

The Billion Triple Challenge [2] serves as a common platform on which the state of the art is evaluated and its progress towards the vision is assessed. Efficiently querying such triple stores presents many challenges. Since RDF is essentially a graph-based data format, queries involve multiple joins and become very slow when scaling to a billion triples. If the 'Semantic Web' vision is to be realized, we require very fast query retrieval techniques, since long response times would be unacceptable to a normal Internet user.

In this paper, we explore promising new ideas for triple store implementation. In particular, we take the Triple-Triple idea (explained later) to its logical conclusion and develop SPARQL-to-SQL query rewriting mechanisms for it. We further enhance the Triple-Triple idea by introducing a computationally feasible clustering scheme that attempts to reduce the number of disk pages accessed, by moving related subjects/objects/properties closer to each other on disk. In fact, this clustering scheme can be applied to any general indexing scheme for triple stores.

Section 2 details related work in this area. Section 3 presents various approaches that were considered for improving query performance. Section 4 presents a query rewriting technique corresponding to the Triple-Triple idea. Section 5 identifies and analyzes the benefits of grouping related triples in the same data block to reduce the number of disk I/O operations. Section 6 presents the experimental results and Section 7 concludes.

2. Related Work

[3] establishes the validity of using relational databases to store and query ontologies. The paper extends SQL with a set of ontology-related operators that can help obtain more pertinent results for ontology-driven applications. At the same time, the applications can also benefit from the efficient storage and retrieval mechanisms of an RDBMS. Simplistically, the baseline physical model for storing ontologies in RDF format using an RDBMS consists of two tables: a Symbol table and a Triples table (refer to Figure 1). An ontology describing the elements of the Web contains URLs and URIs, which are long strings (the lex field in the Symbol table). To avoid redundancy and wasted disk space, these elements are assigned unique integer identifiers (the hash field in the Symbol table), and the mapping from each element to its identifier is stored in the Symbol table. The Triples table has three columns, s (subject), p (property) and o (object), as per RDF conventions, and each tuple in the table represents an RDF triple. The table has a compound primary key on all three columns. Such a naive representation of the triples enables us to analyze clearly where the benefits come from when evaluating more sophisticated physical schemas.
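As an illustration, the baseline model can be sketched in a few lines of Python over sqlite3 (a stand-in for MySQL 5.0; the sequential ids and the `intern` helper are our own simplification of the hash field):

```python
import sqlite3

# A sketch of the baseline physical model (sqlite3 stands in for MySQL 5.0;
# sequential ids replace the hash field for simplicity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SymbolTable (hash INTEGER PRIMARY KEY, lex TEXT UNIQUE);
CREATE TABLE Triples (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o));
""")

def intern(lex):
    """Map a long URI/literal to its compact integer id via the symbol table."""
    row = conn.execute("SELECT hash FROM SymbolTable WHERE lex = ?", (lex,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO SymbolTable (lex) VALUES (?)", (lex,)).lastrowid

def add_triple(s, p, o):
    conn.execute("INSERT OR IGNORE INTO Triples VALUES (?, ?, ?)",
                 (intern(s), intern(p), intern(o)))

add_triple("ex:Product444", "rdfs:label", "'Gizmo'")
add_triple("ex:Product444", "ex:producer", "ex:Producer9")
print(conn.execute("SELECT COUNT(*) FROM Triples").fetchone()[0])  # 2
```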

Many research efforts have proposed alternate physical schemas and improved SPARQL-to-SQL query rewriting techniques to improve query performance over the baseline model. This is based on the realization that the baseline model can serve as a simple logical data model alone. The Kowari metastore [4] proposes an RDF store based on AVL trees, with each triple stored in three different orders (spo, osp, pos) to help lookups based on each of the three elements of the triple. However, it works with its own query language rather than the general-purpose SPARQL and RDQL. Abadi et al. [5] pursue an interesting path, arguing for a table for each property, holding the subjects and objects related by that property. Such a vertical partitioning approach tends to reduce query response time by incorporating fast linear merge joins when each table is sorted by subject or object. However, this approach inherently assumes that queries are property-bound; a non-property-bound query would require querying across all the tables. Hexastore [6] furthers the multiple-indexing approach taken by Kowari by storing the three elements of a triple in six different orders. For example, the spo ordering is stored as a sorted list of subjects, with each subject pointing to a sorted list of the properties defined for that subject, and each property in that list pointing to a sorted list of the objects defined for that subject, property pair. Thus, all joins can be converted into fast linear merge joins. Hexastore occupies five times more space than a single triples table, but this is acceptable with ever-falling storage costs.
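A toy sketch of one of Hexastore's six orderings, and of the linear merge join the sorted lists enable, might look like this in Python (the integer ids and triples are hypothetical, and nested dicts stand in for Hexastore's native sorted vectors):

```python
from collections import defaultdict

# One of Hexastore's six orderings (spo) over toy integer ids: each subject
# maps to sorted properties, each of which maps to a sorted list of objects.
triples = [(1, 10, 100), (1, 10, 102), (1, 11, 100), (2, 10, 100), (2, 10, 101)]

spo = defaultdict(lambda: defaultdict(list))
for s, p, o in triples:
    spo[s][p].append(o)
for props in spo.values():
    for objs in props.values():
        objs.sort()

def merge_join(xs, ys):
    """Linear merge join of two sorted id lists (the payoff of the ordering)."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i]); i += 1; j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return out

# Objects shared by (s=1, p=10) and (s=2, p=10):
print(merge_join(spo[1][10], spo[2][10]))  # [100]
```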

3. Promising Directions

We will now explore some promising directions in which we can make further improvements over the baseline physical model.

Figure 1: Baseline physical model


We will detail each idea and also present some motivation for pursuing [or abandoning] it. We also present concrete, motivating examples for our arguments, using MySQL.

3.1 Applicability of Spatial Indexes

Potentially faster query execution times can be achieved if the joins between triples are materialized in Euclidean space, in terms of fast minimum bounding rectangle (MBR) operations. For example, simply storing each triple as a line segment from (s,p) to (p,o) materializes subject-subject joins as an MBR-contains operation between the segment (s,minhash)-(s,maxhash) and the start point of each triple's line segment, where minhash and maxhash are the minimum and maximum integer ids in the symbol table. However, support for R-tree operations remains limited in commercial DBMSs: MySQL does not support spatial joins, PostgreSQL does not support R-trees, and only Oracle Enterprise supports spatial joins. We were unable to pursue this direction further, due to the unavailability (or rather infeasibility) of an Oracle installation.

3.2 Triple-Triple

One promising idea is to create three redundant copies of the triples table with the compound primary keys spo, pos and osp, such that each table has its tuples sorted on subject, property and object in the order of occurrence in the primary key. Figure 2 presents the Triple-Triple physical model. From here on, we will refer to a table by its primary key; i.e., the spo table denotes the triples table with (s,p,o) as the compound primary key. In the spo table, the triples are clustered first on subject, then on property, then on object. Such clustering ensures that the triples are stored on disk in sorted order, so fast linear merge joins can be applied. Note that this scheme requires only three times the space of the single triples table described in the baseline model; hence, the approach is definitely promising. The compound primary index is also useful for any query that involves a prefix of the compound key. For example, the spo table can answer select queries on the s, sp or spo columns using the primary index.

Figure 2: Triple-Triple physical model

Though MySQL does not support merge joins [7], the idea should still yield faster execution times, since the sorted nature of the data ensures efficient use of the index. Remember that, even with a secondary index, if the relevant tuples are spread across the table in different blocks, the end performance can be worse than a table scan. For the Semantic Web vision, MySQL plays a pivotal role, since many web sites are powered by MySQL; this gives enough motivation to continue exploring the idea using MySQL. For example, a simple three-table join on 25K triples using Triple-Triple yields very encouraging benefits over the baseline model (Figure 3). In Figure 3, Triples_s denotes the spo table, Triples_p denotes the pos table and Triples_o denotes the osp table. The Triples table denotes the baseline triples table, and the Nodes table denotes the symbol table. All following examples in the paper use the same conventions.
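A minimal sketch of the Triple-Triple physical model, again using sqlite3 in place of MySQL (the `add_triple` helper and the toy ids are ours):

```python
import sqlite3

# The Triple-Triple physical model: three redundant copies of the triples,
# clustered by different compound primary keys (sqlite3 stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples_s (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o));
CREATE TABLE Triples_p (p INTEGER, o INTEGER, s INTEGER, PRIMARY KEY (p, o, s));
CREATE TABLE Triples_o (o INTEGER, s INTEGER, p INTEGER, PRIMARY KEY (o, s, p));
""")

def add_triple(s, p, o):
    """Every triple is written to all three tables."""
    conn.execute("INSERT INTO Triples_s VALUES (?, ?, ?)", (s, p, o))
    conn.execute("INSERT INTO Triples_p VALUES (?, ?, ?)", (p, o, s))
    conn.execute("INSERT INTO Triples_o VALUES (?, ?, ?)", (o, s, p))

add_triple(1, 10, 100)
add_triple(1, 11, 101)

# A select bound on s uses the spo primary index; one bound on o would use osp.
print(conn.execute("SELECT p, o FROM Triples_s WHERE s = 1 ORDER BY p").fetchall())
# [(10, 100), (11, 101)]
```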

In comparison to Hexastore, Triple-Triple stores only three of the possible orderings of the elements of a triple. We explore whether these orderings are sufficient to answer the same range of queries answered by Hexastore. The only possible joins are subject-subject, object-object, property-property and subject-object joins, all of them equijoins. Hence, mechanisms using Triple-Triple must judiciously choose which table (spo, pos or osp) to use for each join operation. Also, we must be able to support selects on any combination of the three triple elements. These decisions are listed in Table 1.

Operation                                          Method
subject-subject join                               spo JOIN spo
property-property join                             pos JOIN pos
object-object join                                 osp JOIN osp
subject-object join                                spo JOIN osp
subject select                                     spo
property select                                    pos
object select                                      osp
subject-property / property-subject select         spo [no need for a separate pso]
subject-object / object-subject select             osp [no need for a separate sop]
object-property / property-object select           pos [no need for a separate ops]
subject-property-object select                     any table

Table 1: Answering queries using Triple-Triple

Hence, to our understanding, the three orderings spo, pos and osp are sufficient for handling the same set of queries that Hexastore handles. The only missing piece in building a complete Triple-Triple store is a mechanism to convert SPARQL to SQL queries using the appropriate tables for each triple. We explore this problem in the next section.
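The decisions of Table 1 are mechanical enough to capture in a small lookup, sketched below in Python (the function and set encodings are illustrative, not part of our implementation):

```python
# The decisions of Table 1 as a lookup (sketch; names are illustrative).
JOIN_TABLE = {
    ("subject", "subject"): ("spo", "spo"),
    ("property", "property"): ("pos", "pos"),
    ("object", "object"): ("osp", "osp"),
    ("subject", "object"): ("spo", "osp"),
}

# Which table answers a select, given the set of bound columns.
SELECT_TABLE = {
    frozenset("s"): "spo",
    frozenset("p"): "pos",
    frozenset("o"): "osp",
    frozenset("sp"): "spo",   # no need for a separate pso
    frozenset("so"): "osp",   # no need for a separate sop
    frozenset("po"): "pos",   # no need for a separate ops
    frozenset("spo"): "spo",  # any table works
}

def table_for_select(bound_columns):
    return SELECT_TABLE[frozenset(bound_columns)]

print(table_for_select({"s", "o"}))  # osp
```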

Figure 3: Benefits of Triple-Triple over baseline

mysql> select * from Triples_s t_s, Triples_o t_o, Triples_p t_p where t_s.s = t_o.o and t_s.s = t_p.p;
Empty set (0.28 sec)

mysql> select * from Triples t_1, Triples t_2, Triples t_3 where t_1.s = t_2.o and t_1.s = t_3.p;
Empty set (2 min 10.83 sec)

3.3 Applicability of Secondary Indexes

It is tempting to create secondary indexes on other columns that do not form a prefix of the primary key of the triples table (both in the baseline and in the Triple-Triple scheme). For example, the spo table could have secondary indexes on po and o, to speed up selects on those columns. However, initial experiments showed no improvement in query response times: either the bulk of the benefit already came from the primary index, or the matching values were spread across so many disk pages that the secondary index did not help. For example, there are relatively few properties in the data compared to subjects or objects, so a secondary index on p would not be beneficial. Along similar lines, a secondary index on the lex field of the symbol table did not yield significant benefits. Hence, we stick with the plain Triple-Triple model.

4. Add-Join

Converting SPARQL to a SQL query on the baseline is straightforward. When deciding which tables to use for each triple in the Triple-Triple store, however, we face interesting tradeoffs. For example, consider the SPARQL query in Figure 4.

The triples involved in the SPARQL query are marked t1, t2, t3 and t4. It is easy to observe that t1 joins t2, t3 and t4 on the subject, and t3 joins t4 using a subject-object join. Our task is to select one of the three triple tables (spo, pos or osp) judiciously, so that all joins make use of the primary index and hence the fastest access path to the relevant data. Since all joins of t1 are subject joins, we safely choose the spo table for t1. Since all joins involving t4 are subject joins, we choose the spo table for t4. However, t3 presents a difficulty: for the t1-t3 join to be efficient, we need the spo table for t3, but for the t3-t4 join to be efficient, we need the osp table for t3. Clearly, only one of these options is possible. In general, a SPARQL-to-SQL compiler for Triple-Triple has to make these hard decisions dynamically at runtime. From our experiments, we noticed that the MySQL optimizer (like other DBMS optimizers) does not do a good job of choosing the right join order for the tables, and substantial performance gains can be achieved simply by rewriting the query with an explicitly specified join order. These are hard search problems, so even in the context of Triple-Triple, the SPARQL compiler cannot be expected to do a very good job of optimizing the choice of tables.

Hence, we adopt a method we term Add-Join, which tries to achieve the best of both worlds by using multiple triples tables for a single triple in the SPARQL query. In effect, we add extra joins to the resulting SQL query. But, as we show, the cost of these additional joins is in no way prohibitive and can be traded off for a deterministic, simple SPARQL-to-SQL compilation technique. In the above example, we use two tables for triple t3: t3_o (an osp table) and t3_s (an spo table). We join t1 with t3_s and t3_o with t4, and finally join t3_s and t3_o on all three columns. The final join is very fast since it involves all three columns, so the primary index can be used. In effect, all of the joins in the query can use the clustered primary index. Figure 5 shows, on the same example, that the additional joins are not prohibitively expensive. Also, when rewriting the query for Triple-Triple, we must ensure that we start with as few rows as possible. Since MySQL uses a single-sweep multi-join

Figure 4: Tradeoff in SQL conversion

SELECT ?label ?producer ?comment
WHERE {
t1   dataFromProducer9:Product444 rdfs:label ?label .
t2   dataFromProducer9:Product444 rdfs:comment ?comment .
t3   dataFromProducer9:Product444 bsbm:producer ?p .
t4   ?p rdfs:label ?producer }


algorithm [8], this ordering ensures that we match as few tuples as possible in each stage of the multi-join.
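The Add-Join rewrite for t3 can be sketched end to end over sqlite3 (toy ids; sqlite3 stands in for MySQL): t3 appears once as t3_s in the spo copy and once as t3_o in the osp copy, and the two copies are reconciled by a join on all three columns.

```python
import sqlite3

# Add-Join sketch: triple t3 is represented twice, once per ordering, and the
# two representations are joined on all three columns so every join in the
# query can use a clustered primary index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Triples_s (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o));
CREATE TABLE Triples_o (o INTEGER, s INTEGER, p INTEGER, PRIMARY KEY (o, s, p));
""")
rows = [(1, 20, 5), (1, 21, 6), (5, 30, 7)]   # t1.s = 1; subject 5 is also an object
for s, p, o in rows:
    conn.execute("INSERT INTO Triples_s VALUES (?, ?, ?)", (s, p, o))
    conn.execute("INSERT INTO Triples_o VALUES (?, ?, ?)", (o, s, p))

q = """SELECT t3_s.s, t3_s.p, t3_s.o, t4.p, t4.o
       FROM Triples_s AS t3_s
       JOIN Triples_o AS t3_o
         ON t3_s.s = t3_o.s AND t3_s.p = t3_o.p AND t3_s.o = t3_o.o
       JOIN Triples_s AS t4 ON t3_o.o = t4.s
       WHERE t3_s.s = 1"""
print(conn.execute(q).fetchall())  # [(1, 20, 5, 30, 7)]
```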

Hence, in addition to joins, we can also involve multiple triple tables for a single triple when there is an initial select operation on the triple. For example, though a triple joins on s, it might involve a select on p as a bound input value. In such cases, selecting on p using the spo table may not be efficient, so we introduce an additional pos table for the triple, perform the select on it, and later join the pos and spo tables. We now present the algorithm to convert SPARQL to SQL based on the Add-Join method.

Query rewriting method:
Step 0. Convert the SPARQL query to SQL on the baseline.
Step 1. Identify the triples that have bound values for their elements, i.e. the inputs for the SQL query.
Step 2. In the explicit join order that we intend to provide, start with the triple with bound input values and follow it with triples that join with that triple.
Step 3. For selects on the non-join attributes, insert entries for 'Nodes' as necessary in the explicit order.
Step 4. Once the explicit ordering is done, introduce additional triple tables for each triple, as per the Add-Join method described above.

5. Graph Based Grouping

We observe that further benefits can be achieved only through physical optimizations. One interesting observation is that selects on the triples tables can be improved if related tuples are brought closer to each other on disk. For example, consider the spo table with selects on po, p or o; this is a common scenario when triples with different properties are joined on the subject. Remember that we introduce additional joins only for joins between triples and for selects using bound input values.

By bringing subjects with common properties and objects closer to each other, we reduce the number of disk pages across which the result of the select operation is distributed. Thus, such a scheme results in a direct reduction of the I/O cost of the operation. The same idea applies to the other two tables, bringing together related properties and objects respectively.
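The effect can be quantified with a toy calculation (the row positions and block capacity here are hypothetical):

```python
# Sketch: the I/O benefit of grouping is fewer distinct blocks per select.
def blocks_touched(row_positions, rows_per_block):
    """Number of distinct disk blocks covering the given row positions."""
    return len({pos // rows_per_block for pos in row_positions})

scattered = [0, 1300, 2600, 3900]   # matching rows spread across the table
grouped   = [0, 1, 2, 3]            # the same rows after grouping
print(blocks_touched(scattered, 1300), blocks_touched(grouped, 1300))  # 4 1
```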

The integer identifiers assigned to the elements of the triples determine the order in which the triples appear on disk. Right now, these ids are assigned randomly; hence we cannot use general-purpose clustering schemes based on Euclidean distances to group related triples together. Also, without Triple-Triple, it would be impossible to give equal priority to each of subject, property and object: the first element of the compound primary key determines the order on disk, so a clustering scheme would have to choose among the three elements. Another approach is to abandon the primary keys altogether and define a physical representation that brings related triples close to each other based on all three columns; however, such a scheme would compromise having the data in sorted order. The Triple-Triple idea lends flexibility by allowing us to optimize with respect to subject, property and

Figure 5: Cost of additional joins

Add-Join:
select * from Triples_s as t1 STRAIGHT_JOIN Triples_s as t3_s STRAIGHT_JOIN Triples_o as t3_o STRAIGHT_JOIN Triples_s as t4 where t1.s = -1280067686087321383 and t1.s = t3_s.s and t3_o.o = t4.s and t3_s.s = t3_o.s and t3_s.p = t3_o.p and t3_s.o = t3_o.o;
0.2 sec

Use spo for t3:
select * from Triples_s as t1 STRAIGHT_JOIN Triples_s as t3 STRAIGHT_JOIN Triples_s as t4 where t1.s = -1280067686087321383 and t1.s = t3.s and t3.o = t4.s;
0.17 sec


object, using the spo, pos and osp tables respectively. Hence, the integer ids need to be assigned intelligently in order to leverage these benefits. We now define metrics that quantify the strength of the relationship between two elements: the S-score, P-score and O-score for interrelating subjects, properties and objects respectively.

S-score(s1, s2) = number of triple pairs (t1, t2) with t1.s = s1, t2.s = s2, and (t1.p = t2.p or t1.o = t2.o)

This defines the S-score for two subjects s1 and s2; the P-score and O-score are defined similarly. Once we have computed these metrics, we build three graphs (S-graph, P-graph, O-graph) that depict the relationships between subjects, properties and objects, using the S-score, P-score and O-score as edge weights respectively. The S-graph has a vertex for each subject, with the S-score between two subjects as the edge weight. Note that no subject or object occurs as a property; hence, the problem of assigning ids to properties can be solved independently of the other two elements. However, some subjects also occur as objects, and only one id can be assigned to such an element. We therefore prune the O-graph by removing all vertices and edges corresponding to such overlapping subjects.
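A direct, if naive, implementation of the S-score definition over toy integer-id triples:

```python
# S-score between two subjects (sketch over hypothetical integer-id triples):
# the number of triple pairs that share a property or an object.
def s_score(triples, s1, s2):
    t1 = [t for t in triples if t[0] == s1]
    t2 = [t for t in triples if t[0] == s2]
    return sum(1 for a in t1 for b in t2 if a[1] == b[1] or a[2] == b[2])

triples = [(1, 10, 100), (2, 10, 200), (2, 11, 100), (3, 12, 300)]
# Subjects 1 and 2 share a property (10) in one pair and an object (100)
# in another, so their S-score is 2; subjects 1 and 3 share nothing.
print(s_score(triples, 1, 2), s_score(triples, 1, 3))  # 2 0
```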

Each disk page can be viewed as a cluster or group, and the problem of finding the most related subjects can be formulated as an optimization problem, as described in Figure 6. The formulation generalizes to the P-graph and O-graph as well. The problem is an instance of the knapsack-constrained maximum weighted cluster problem [9].

The formulation aims to extract the cluster from the graph such that the sum of all the edge weights in the cluster is maximal, subject to the constraint that there can be at most B triples on a block. For MySQL, which uses 16KB data blocks, B = 16KB/12 bytes, roughly 1300. Once such a maximal cluster is extracted, we assign consecutive ids to all the elements of the cluster. We then repeat the algorithm, pruning the assigned vertices and edges from the graph. In practice, however, this problem is NP-hard and grows computationally unrealistic for large data sets involving thousands of subjects. The S-graph is also very dense, which complicates the solution further; for example, a 25K-triples database contains 2367 subjects and 200K edges. Hence, when we scale to a billion triples, the graph construction itself may become a very long process. There are graph clustering tools such as MCL [10] and Graclus [11] for unsupervised learning from graphs. Though these tools do not solve the exact problem described above, they attempt to provide clusters from the graph based on connected components. Attempts at hierarchical clustering using the MCL algorithm yield only 3 clusters, reflecting the dense nature of these graphs and the inapplicability of standard graph clustering techniques.

Hence, we attempt to develop computationally feasible greedy algorithms for the problem. One such greedy algorithm is described in Figure 7. The algorithm greedily constructs parts of the relationship graphs and assigns identifiers based on these partial graphs. It closely approximates the optimal solution for certain parts of the graph. Nonetheless, it is suboptimal, since we ignore the

Figure 6: Optimal clustering of subjects

Let S denote a cluster and Si denote a subject belonging to S.

Objective:
  Maximize  Σ_{i != j} S-score(Si, Sj)

Constraint:
  Σ_i numTriples(Si) <= B

where numTriples(Si) denotes the number of triples with subject Si, and B denotes the number of triples per block.


strength of the relationships between the discovered subjects Si.

In Section 6, we validate the effectiveness of this algorithm. Once again, the id assignment for properties can be done in a symmetric fashion. For the objects, we additionally need to ignore objects which are also subjects.
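The greedy algorithm of Figure 7 can be sketched as follows (the `score` and `num_triples` inputs here are toy stand-ins for the real S-scores and per-subject triple counts):

```python
# The greedy grouping of Figure 7 as a sketch: pick an unassigned subject,
# give it the next id, then hand out consecutive ids to its related subjects
# (in increasing S-score order, per the figure) until the block budget B is used.
def greedy_group(subjects, score, num_triples, B):
    ids, next_id = {}, 0
    for s in subjects:
        if s in ids:
            continue
        ids[s] = next_id; next_id += 1
        budget = num_triples[s]
        related = sorted((t for t in subjects
                          if t not in ids and score(s, t) > 0),
                         key=lambda t: score(s, t))
        for t in related:
            if budget + num_triples[t] > B:
                break
            ids[t] = next_id; next_id += 1
            budget += num_triples[t]
    return ids

# Toy scores: subjects 1-2 and 2-3 are related; 2 is pulled next to 1.
score = lambda a, b: {frozenset((1, 2)): 1, frozenset((2, 3)): 1}.get(frozenset((a, b)), 0)
ids = greedy_group([1, 2, 3], score, {1: 2, 2: 2, 3: 2}, B=4)
print(ids)  # {1: 0, 2: 1, 3: 2}
```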

6. Empirical Results

We present empirical results that demonstrate the validity of the approaches proposed in this paper. Specifically, we study the query performance compared to the baseline, and the real benefits of the grouping technique described in the previous section. Our experimental setup is detailed in Table 2.

6.1 Query Performance

For each scheme that we evaluate, we define a metric called speedup to compare it against the baseline: the query response time for the baseline divided by the query response time for the scheme on the same query. The higher the speedup, the better the scheme; it indicates how large a performance improvement the scheme achieves. Figures 8 and 9 present the speedups for three schemes: Rewrite (simply rewriting the query with an explicitly specified join order), Add-Join, and Add-Join with grouping. Results are presented for queries 3, 4, 5 and 6 [12]. Queries 4 and 5 are typical examples of the complex queries that cause scalability problems for applications.

RDBMS                 MySQL 5.0
OS                    Ubuntu 8.04
Processor             AMD Turion TL-58
32/64 bit             32
Processor speed       1.9 GHz
L1 cache (KB)         128
L2 cache (KB)         512
FSB speed (MHz)       800
RAM (GB)              2
Disk capacity (GB)    160
Disk rotation (RPM)   5400
Buffering (MB)        8

Table 2: Platform details

The improvements on queries 3 and 6 are not significant: their baseline timings are not very large in the first place, and they involve fewer joins. For example, query 6 involves only one triple; Add-Join uses two tables for the query and offers the same performance as the baseline. There are very significant benefits on queries 4 and 5. The Add-Join method accounts for the bulk of the benefit, amounting to approximately a 50x improvement over the baseline. The direct benefits of the grouping technique amount to an additional 10x-20x over the baseline, when compared to the Add-Join method without grouping. It remains to be seen whether better grouping techniques can yield significantly higher benefits. However, these results

Figure 7: Greedy Grouping

While there is a subject S that has not been assigned an id:
- Assign the next available id to S
- Compute all subjects Si related to S, i.e. those with a non-zero S-score with S
- Compute S-score(S, Si) for all such discovered subjects
- Assign ids to the Si in increasing order of S-score(S, Si) until Σ numTriples(Si) <= B


demonstrate the validity of the techniques described earlier.

Figure 8: Speedup from baseline (250K triples)

Figure 9: Speedup from baseline (1M triples)


6.2 Validation of the Grouping Scheme

It is also important to validate the grouping scheme presented in Section 5. We expect it to reduce the amount of disk I/O for selects on the triples tables. Table 3 presents the number of rows the query optimizer expects to examine when answering selects on the triple tables. It can be seen that grouping has resulted in a decrease in the number of rows examined.

Table       No grouping   With grouping
spo table   1000619       973866
pos table   1000619       805152
osp table   1000619       828004

Table 3: Expected number of rows accessed for selects

We also measure the amount of interrelationship between the triples in each block, with and without grouping, to observe the effectiveness of our grouping algorithm. For each disk block (i.e. a set of 1300 tuples) we construct the S-graph, P-graph and O-graph representing that cluster. We then compute the sum of all the edge weights of each such cluster and average it across all data blocks. This allows us to quantify the effectiveness of our grouping scheme. Ideally, we would also compare our grouping scheme against the optimal solution; however, the optimal solution is very hard to compute, as mentioned earlier, and cannot be predicted accurately, since it depends on the nature of the triples. We divide the average edge weight for a table with grouping by the average edge weight for the corresponding table without grouping, to obtain a metric called the R-score, or relationship score, for those two tables. Figure 10 presents the R-scores for all three triples tables, for the 250K and 1M triple tables.
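The R-score computation can be sketched on toy data (the score function, orderings and block size below are hypothetical):

```python
from itertools import combinations

# R-score sketch: average per-block sum of pairwise scores, with grouping
# divided by without grouping.
def avg_block_weight(ordered, score, per_block):
    """Average sum of pairwise scores within each block of `per_block` elements."""
    blocks = [ordered[i:i + per_block] for i in range(0, len(ordered), per_block)]
    return sum(sum(score(a, b) for a, b in combinations(blk, 2))
               for blk in blocks) / len(blocks)

score = lambda a, b: 1 if abs(a - b) == 1 else 0   # neighbouring ids are "related"
random_order = [1, 4, 2, 3]
grouped_order = [1, 2, 3, 4]
r = avg_block_weight(grouped_order, score, 2) / avg_block_weight(random_order, score, 2)
print(r)  # 2.0
```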

The results indicate that the grouping algorithm has been quite effective, increasing the interrelationship to 2x-10x that of the random id assignment. The R-score for the pos table is lower since there are fewer properties, and hence in a number of cases a single data block is full of triples with the same property; in fact, the average edge weight for the pos table is much lower than those of the spo and osp tables. The values for the osp table are higher since there are far more objects than subjects, in which case there are more edges in the constructed cluster graph.

7. Conclusions and Future Work

We have explored some promising approaches to improving query performance in relational triple stores. We discussed an interesting query rewriting mechanism that introduces additional joins to speed up query execution, and proposed an optimization of the physical schema that leverages the interrelationships between the elements of a triple. A greedy grouping algorithm which is simple and computationally feasible has been

Figure 10: R-score


proposed and validated. The results show that our approach is promising and can potentially be combined with other techniques in the literature to yield faster RDF stores. As future work, we intend to compare the performance of the system with Hexastore, and potentially enhance Hexastore with our grouping algorithm. As mentioned earlier, we would like to develop better grouping algorithms by leveraging parallel computing techniques to overcome the computational issues; we believe that better grouping algorithms can yield significantly higher performance. Another key observation is that no single physical schema performs best for all types of queries. Hence, with the cheap availability of storage, multiple physical schemas can coexist within the same RDF store, and the SPARQL compiler can judiciously employ them based on the type of the query.

8. References

[1] Resource Description Framework. http://www.w3.org/RDF/
[2] The Billion Triples Track. http://iswc2008.semanticweb.org/calls/call-for-semantic-web-challenge-and-billion-triples-tracks/
[3] Das, S., Chong, E. I., Eadon, G., and Srinivasan, J. 2004. Supporting ontology-based semantic matching in RDBMS. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30 (Toronto, Canada, August 31 - September 03, 2004). M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. VLDB Endowment, 1054-1065.
[4] David Wood. "Kowari: A Platform for Semantic Web Storage and Analysis." In XTech 2005 Conference.
[5] Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23 - 27, 2007). VLDB Endowment, 411-422.
[6] Weiss, C., Karras, P., and Bernstein, A. 2008. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1, 1 (Aug. 2008), 1008-1019. DOI= http://doi.acm.org/10.1145/1453856.145396
[7] Nested-Loop Join Algorithms. http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html
[8] Using EXPLAIN Syntax. http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
[9] Anuj Mehrotra and Michael A. Trick. "Cliques and Clustering: A Combinatorial Approach."
[10] MCL: an algorithm for clustering graphs. http://www.micans.org/mcl/
[11] Graclus. http://www.cs.utexas.edu/users/dml/Software/graclus.html
[12] Class project website. http://www.cs.utexas.edu/~jsequeda/cs386d/project.html