hello datastax enterprise graph
TRANSCRIPT
Hello DSE Graph: Let's build a knowledge graph. Markus Höfer, codecentric AG. DataStax Summit Europe 2016
2
What is this talk about?
3
Knowledge Graph What is the problem?
4
15 locations
5
Many customers
6
300+ employees
7
A lot of knowledge
8
Results in graph
9
How can we answer the following questions?
• Which projects are currently running?
• Which employee has expertise in required fields and doesn’t work for a project at the requested time?
• Which employees already worked for a customer?
• Which employees are currently not working for a project?
• Which skills are requested most by customers?
10
Implementation
11
The Setup
12
Java driver
• Query execution very similar to "normal" DataStax Java Driver queries
GraphResultSet executeGraph(String var1);
GraphResultSet executeGraph(String var1, Map<String, Object> var2);
GraphResultSet executeGraph(GraphStatement var1);
13
Create a graph
GraphResultSet resultSet = dseSession.executeGraph(
    "system.createGraph(name).ifNotExist().build()",
    ImmutableMap.<String, Object>of("name", GRAPH_NAME)
);
14
The Schema
// Vertex label
schema.vertexLabel('employee').ifNotExists().create()
schema.vertexLabel('project').ifNotExists().create()
// Edge label
schema.edgeLabel('worked_for').ifNotExists().create()
// Property keys
schema.propertyKey('from').Timestamp().ifNotExists().create()
schema.propertyKey('name').Text().ifNotExists().create()
15
Indexes
// Materialized
schema.vertexLabel('customer').index('byName')
    .materialized().by('name').add()
// Secondary
schema.vertexLabel('employee').index('byCity')
    .secondary().by('city').add()
// Search
schema.vertexLabel('project').index('search')
    .search().by('description').add()
16
Insert Data
Add employee vertex
g.V().has('employee', 'name', 'Markus Höfer')
    .tryNext().orElseGet {
        graph.addVertex(
            label, 'employee',
            'name', 'Markus Höfer',
            'geo', POINT(51.926164 7.718504)
        )
    }
17
Insert Data
Add project vertex
g.V().has('project', 'name', 'ProjectX')
    .tryNext().orElseGet {
        graph.addVertex(
            label, 'project',
            'name', 'ProjectX'
        )
    }
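The insert snippets follow a get-or-create idiom: look the vertex up, and only add it when the lookup comes back empty. The plain Java sketch below mirrors that shape with java.util.Optional. The class GetOrCreateSketch and its map-backed store are hypothetical stand-ins for illustration, not part of DSE Graph or the Java driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a vertex store (assumption, not the DSE Graph API).
class GetOrCreateSketch {
    private final Map<String, Map<String, String>> verticesByKey = new HashMap<>();
    private int created = 0;

    // Lookup by label and name; empty when no such vertex exists.
    Optional<Map<String, String>> find(String label, String name) {
        return Optional.ofNullable(verticesByKey.get(label + "/" + name));
    }

    // Unconditionally adds a new vertex with the given label and name.
    Map<String, String> add(String label, String name) {
        Map<String, String> vertex = new HashMap<>();
        vertex.put("label", label);
        vertex.put("name", name);
        verticesByKey.put(label + "/" + name, vertex);
        created++;
        return vertex;
    }

    // Same shape as tryNext().orElseGet { addVertex(...) }: add only if the lookup is empty.
    Map<String, String> getOrCreate(String label, String name) {
        return find(label, name).orElseGet(() -> add(label, name));
    }

    int createdCount() {
        return created;
    }
}
```

Calling getOrCreate("project", "ProjectX") twice returns the same vertex and creates it only once. The sketch is single-threaded; a read followed by a write like this is not atomic on its own.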
18
Add edge
def employee = g.V(employee_id).next()
def project = g.V(project_id).next()
employee.addEdge(
    'worked_for', project,
    'from', Instant.parse('2015-11-30T00:00:00.00Z'),
    'until', Instant.parse('2016-11-30T00:00:00.00Z')
)
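The 'from' and 'until' properties on the worked_for edge are what make time-based questions answerable. As a rough illustration of the check involved, here is a plain Java sketch using java.time.Instant. The class AssignmentWindow is hypothetical, and treating 'until' as an exclusive end is an assumption; the slides do not say which convention is used.

```java
import java.time.Instant;

// Hypothetical value type mirroring the 'from'/'until' edge properties.
class AssignmentWindow {
    final Instant from;   // inclusive start
    final Instant until;  // exclusive end (assumption)

    AssignmentWindow(Instant from, Instant until) {
        if (until.isBefore(from)) {
            throw new IllegalArgumentException("until must not be before from");
        }
        this.from = from;
        this.until = until;
    }

    // True if the assignment is running at the given moment.
    boolean isActiveAt(Instant moment) {
        return !moment.isBefore(from) && moment.isBefore(until);
    }

    // True if this assignment shares any time with the half-open range [start, end).
    boolean overlaps(Instant start, Instant end) {
        return from.isBefore(end) && start.isBefore(until);
    }
}
```

With the dates from the slide, isActiveAt(Instant.parse("2016-06-01T00:00:00Z")) is true, while a request starting exactly at 2016-11-30T00:00:00Z does not overlap the assignment.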
19
Subgraph looks like this
20
Working with DataStax Studio
21
Working with DataStax Studio
• Markdown for documentation
• Gremlin for schema and traversal development
22
Working with DataStax Studio
• Schema aware content assist!
23
Working with DataStax Studio
Profiling: without index vs. with materialized index
24
The Result
25
Looking back to our initial questions
26
Working with DataStax Studio
Which skills are requested most by customers?
g.V().hasLabel('project')
    .out('requires').groupCount().by('name')
    .order(local).by(valueDecr).limit(local, 10)
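To make the steps of this traversal concrete: it counts how often each skill name is reached over a 'requires' edge, sorts the counts in descending order, and keeps the first ten. The plain Java sketch below does the same over an in-memory list. The class TopSkills and its input list are hypothetical; the list is assumed to contain one skill name per 'requires' edge.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper; operates on plain strings, not on graph vertices.
class TopSkills {
    // Counts each skill name, orders by count descending, keeps the first `limit` entries.
    static Map<String, Long> topSkills(List<String> requiredSkillNames, int limit) {
        // Equivalent of groupCount().by('name')
        Map<String, Long> counts = requiredSkillNames.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Equivalent of order(local).by(valueDecr).limit(local, limit);
        // a LinkedHashMap keeps the sorted order in the result.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(limit)
            .collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue,
                (a, b) -> a, LinkedHashMap::new));
    }
}
```

The relative order of skills with equal counts is not defined in this sketch.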
27
Working with DataStax Studio
Which employee has expertise in required fields and doesn't work for a project at the requested time?
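The traversal for this question is not part of the transcript. As a hedged illustration of the logic only, the plain Java sketch below filters an in-memory list of employees by required skills and by the absence of an overlapping assignment. The class AvailableExperts, its nested Employee type, and the sample names are hypothetical, and 'until' is assumed to be an exclusive end.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; not the graph schema from the talk.
class AvailableExperts {
    static class Employee {
        final String name;
        final Set<String> skills = new HashSet<>();
        final List<Instant[]> assignments = new ArrayList<>(); // each entry: {from, until}

        Employee(String name, String... skills) {
            this.name = name;
            this.skills.addAll(Arrays.asList(skills));
        }

        // Records one worked_for-style assignment; returns this for chaining.
        Employee workedFor(String from, String until) {
            assignments.add(new Instant[] { Instant.parse(from), Instant.parse(until) });
            return this;
        }

        // True if no assignment overlaps the half-open range [start, end).
        boolean isFreeBetween(Instant start, Instant end) {
            for (Instant[] a : assignments) {
                if (a[0].isBefore(end) && start.isBefore(a[1])) {
                    return false;
                }
            }
            return true;
        }
    }

    // Names of employees who have every required skill and are free in [start, end).
    static List<String> find(List<Employee> employees, Set<String> required,
                             Instant start, Instant end) {
        List<String> result = new ArrayList<>();
        for (Employee e : employees) {
            if (e.skills.containsAll(required) && e.isFreeBetween(start, end)) {
                result.add(e.name);
            }
        }
        return result;
    }
}
```

Passing an empty set of required skills reduces the filter to availability alone, which corresponds to the broader question of which employees are not working for a project in a given period.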
28
Working with DataStax Studio
Which employees are currently not working for a project?
29
What's next?
Add more connections, e.g.
- Community events
- Blog posts
- Trainings
- Geospatial traversals
- etc.
30
Questions?
Markus Höfer IT Consultant
www.codecentric.de blog.codecentric.de/en
#HashtagMarkus