Graph Day Texas: Open Source Graph Projects from PokitDok

Posted on 23-Jan-2017


A tour of the PokitDok Health Graph and some open source graph projects

Graph Day Texas, Jan 2016
Denise Gosnell, PhD

Twitter and GitHub: @pokitdok

@denisekgosnell

Confidential 2

PokitDok APIs:

The business of health, for developers.

https://platform.pokitdok.com/


PokitDok APIs: Marketplace


Doctor on Demand: Powered by PokitDok


Onward to graphs.


Talk Outline:

1. What we built: the HealthGraph
2. What we’ve open sourced:
   • A Gremlin-Python library
   • Custom Titan build
   • Dynamic JSON Graph [WIP]
   • HealthGraph DSL [WIP]

The PokitDok HealthGraph


X12 Data Standard: ETL hell from the 1970s


HealthGraph: Transactions as Trees

• We treat transactions as first-class objects in the graph

• Buried deep within each X12 transaction are the entities of interest


Interactive graph available at: https://fullmetalhealth.com/dsl/
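The point of this slide, that the entities of interest sit deep inside each transaction tree, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the nested dict is a hypothetical, heavily simplified stand-in for a parsed X12 transaction (real X12 loops and segments are much richer), and the entity keys `subscriber`, `provider` and `payer` are made up for the example rather than taken from the HealthGraph schema. The transaction becomes a first-class vertex, and every entity found at any depth is linked to it.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical entity types to pull out of the transaction tree.
ENTITY_KEYS = {"subscriber", "provider", "payer"}


def extract_entities(tree: Dict[str, Any], txn_id: str) -> Tuple[List[dict], List[tuple]]:
    """Walk a nested transaction tree and return (vertices, edges).

    The transaction itself becomes a vertex; every entity found at any depth
    becomes a vertex linked to the transaction by an edge labeled with its type.
    """
    vertices: List[dict] = [{"id": txn_id, "label": "transaction"}]
    edges: List[tuple] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ENTITY_KEYS and isinstance(value, dict):
                    entity_id = f"{key}:{value['id']}"
                    # Entity properties first, so our graph id and label win.
                    vertices.append({**value, "id": entity_id, "label": key})
                    edges.append((txn_id, key, entity_id))
                walk(value)  # keep descending: entities can be buried deeper
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(tree)
    return vertices, edges


# Hypothetical, simplified transaction tree.
claim = {
    "header": {"control_number": "0001"},
    "loops": [
        {"payer": {"id": "P1", "name": "Acme Health"}},
        {"detail": {"subscriber": {"id": "S9", "name": "Jane Doe"},
                    "provider": {"id": "D3", "npi": "1234567890"}}},
    ],
}
vertices, edges = extract_entities(claim, "txn:0001")
```

The recursive walk is what lets the transaction keep its original tree shape while still surfacing the entities as graph vertices.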


HealthGraph: Property Graph Model


HealthGraph: Probabilistic Inferences


HealthGraph: Data Inferences


HealthGraph: Predictive Models

• What is the probability that claim X will be denied?

• A new customer just searched for “family practice”; recommend the best provider within 10 miles.

• Given a CPT code, what is the expected reimbursement rate from insurance company A in zip code 37601?
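In its simplest form, the last question reduces to an empirical average over historical claims. Here is a minimal baseline sketch. It assumes a hypothetical list of claim records (CPT code, payer, zip code, billed and paid amounts) that has already been pulled out of the graph. It is not PokitDok's predictive model: a real model would also have to cope with sparse or missing combinations.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Claim:
    """Hypothetical historical claim record extracted from the graph."""
    cpt: str        # procedure code
    payer: str      # insurance company
    zip_code: str   # where the service was rendered
    billed: float   # amount billed
    paid: float     # amount reimbursed by the payer


def expected_reimbursement_rate(claims: Iterable[Claim], cpt: str,
                                payer: str, zip_code: str) -> Optional[float]:
    """Mean paid/billed ratio over matching claims, or None if there are none."""
    ratios = [c.paid / c.billed for c in claims
              if c.cpt == cpt and c.payer == payer
              and c.zip_code == zip_code and c.billed > 0]
    return mean(ratios) if ratios else None


# Made-up history for illustration only.
history = [
    Claim("99213", "A", "37601", 100.0, 80.0),
    Claim("99213", "A", "37601", 120.0, 90.0),
    Claim("99213", "B", "37601", 100.0, 50.0),
]
rate = expected_reimbursement_rate(history, "99213", "A", "37601")
```

Returning `None` rather than `0.0` keeps "no data" distinct from "never reimbursed". A fuller model would need that distinction to decide when to fall back to a wider region or a different payer.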


HealthGraph: Top 100k Providers


@MacraeAlec

PokitDok Open Source:

Gremlin-Python


Our HealthGraph Production Stack

• Titan 0.5.3

• TinkerPop’s Blueprints 2.5.0

• Cassandra and Elasticsearch

Gremlin-Python


Gremlin-Python Motivation

• Lighter context switching between development tools and environments

• Syntax incompatibilities between Gremlin and Python

• Using Python.

Twitter and GitHub: @corbinbs


Gremlin-Python Test Drive

Option 1: Grab our Docker container
1. Install Docker: https://www.docker.com/docker-toolbox
2. Jump into the "Docker Quickstart Terminal"
3. Fire up our example container: docker run -i -t pokitdok/gremlin-python-test-drive

Option 2: Shell script install
1. Clone our repo: https://github.com/pokitdok/gremlin-python
2. Run the set-up scripts: $ ./test_drive/setup.sh && ./test_drive/run.sh

Twitter and GitHub: @corbinbs


Bipartite Graph Recommendation System

[Figure: Customer vertices linked to Doctor vertices by "viewed" and "scheduled_with" edges]


# Rank doctors by the number of scheduled_with edges pointing at them (in_v is the Doctor end)
g.E.has('edge_type', 'scheduled_with').in_v().group_count(ranked_docs, lambda it: it.full_name, lambda it: it.b + 1.0)
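If you do not have a Titan instance available, you can sketch the same ranking in plain Python over an in-memory edge list. This version does not use the Gremlin-Python library. The edge tuples and doctor names are made up, and each `scheduled_with` edge is assumed to point from a Customer to a Doctor, as in the bipartite figure. Like the `group_count` call above with its `it.b + 1.0` value function, it adds 1.0 to a doctor's score for every incoming `scheduled_with` edge.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (customer, edge_type, doctor_full_name): hypothetical sample edges
Edge = Tuple[str, str, str]


def rank_doctors(edges: List[Edge], edge_type: str = "scheduled_with") -> List[Tuple[str, float]]:
    """Count incoming edges of one type per doctor, highest count first."""
    ranked_docs: Dict[str, float] = defaultdict(float)
    for _customer, etype, doctor in edges:
        if etype == edge_type:          # analogous to g.E.has('edge_type', ...)
            ranked_docs[doctor] += 1.0  # analogous to the it.b + 1.0 value function
    # Sort by descending count, breaking ties alphabetically by name.
    return sorted(ranked_docs.items(), key=lambda kv: (-kv[1], kv[0]))


edges = [
    ("alice", "viewed", "Dr. Smith"),
    ("alice", "scheduled_with", "Dr. Jones"),
    ("bob", "scheduled_with", "Dr. Jones"),
    ("carol", "scheduled_with", "Dr. Smith"),
    ("carol", "viewed", "Dr. Jones"),
]
ranking = rank_doctors(edges)
```

Passing `edge_type="viewed"` ranks by views instead. That is a weaker signal than an actual appointment, which is why the example above filters on `scheduled_with`.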


PokitDok Open Source:

Custom Build of Titan 0.5.3 to Integrate with CDH5 Containers


Motivation for Release of Custom Build:

• Graph production stack: Titan 0.5.x ships with Hadoop 2.2

• API production stack: contains Cloudera’s CDH5 containers and Hadoop 2.6.0

• You guessed it: infrastructure dependency errors upon integration. The Hadoop 2.6.0 API is not fully backward compatible with Hadoop 2.2.


Released: a modification of the Titan 0.5.3 build that upgrades to Hadoop 2.6.0 and resolves numerous conflicts among transitive dependencies.

… someone had to do it.

Grab it here: https://github.com/pokitdok/titan/tree/0.5.3-hadoop2.6.0

Tested with Cassandra but not HBase.


HealthGraph Dynamic JSON Load

Open Source Version [WIP]


Dynamic JSONLoader:

Goal: bulk load JSON from sequenced HDFS files straight into a Titan DB
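The talk does not show the loader's internals. One step that a schema-dynamic loader plausibly needs is schema discovery: scanning the JSON records to learn which property keys appear, and with which value types, before defining those keys in the graph database. Here is a minimal sketch of that one step. It assumes newline-delimited JSON records that have already been read out of HDFS, and it leaves out the HDFS and Titan I/O entirely. The field names are hypothetical.

```python
import json
from typing import Dict, Iterable, Set


def infer_property_keys(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each top-level JSON key to the set of Python type names seen for it."""
    schema: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for key, value in record.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


# Hypothetical newline-delimited JSON records.
sample = [
    '{"npi": "1234567890", "name": "Dr. Jones", "rating": 4.5}',
    '',
    '{"npi": "0987654321", "name": "Dr. Smith", "rating": 4}',
]
schema = infer_property_keys(sample)
```

A key that shows up with more than one type (here `rating`, which appears as both `int` and `float`) needs a resolution rule, such as widening to float, before a single property key can be declared for it.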


Dynamic JSONLoader Future Work:

1. Extract PokitDok HealthGraph-specific features
2. Move to Titan 1.0 and TP3 compatibility
3. Release on PokitDok GitHub

HealthGraph DSL

Open Source Version [WIP]


X12 Data Standard: ETL hell from the 1970s


X12 Spec Trees vs. Graph DSL:

Interactive graph available at: https://fullmetalhealth.com/dsl/


Graph DSL with TinkerPop 2.5:

DSL Future Work:

1. Move to Titan 1.0 and TP3 compatibility
2. Release on PokitDok GitHub
3. Current open question: we are looking for(ward to) more documentation on implementing custom Gremlin steps (DSLs) in TP3
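Setting the TP3 mechanics aside, the core idea of a graph DSL is to wrap generic traversal steps in domain-named steps. Here is a toy, in-memory Python sketch of that idea. Everything in it is hypothetical: the `Traversal` class, the `claims()` and `providers()` steps, and the `filed_claim` and `billed_by` edge labels. None of it is the TinkerPop API or the HealthGraph DSL.

```python
from typing import Dict, List, Tuple

# Adjacency list: vertex -> list of (edge_label, target_vertex). Hypothetical data model.
Graph = Dict[str, List[Tuple[str, str]]]


class Traversal:
    """A tiny fluent traversal with one generic step and two domain-specific steps."""

    def __init__(self, graph: Graph, start: List[str]):
        self.graph = graph
        self.current = list(start)

    def out(self, label: str) -> "Traversal":
        """Generic step, loosely analogous to Gremlin's out(label)."""
        nxt = [dst for v in self.current
               for (lbl, dst) in self.graph.get(v, []) if lbl == label]
        return Traversal(self.graph, nxt)

    # Domain-specific steps: readable names composed from the generic step.
    def claims(self) -> "Traversal":
        return self.out("filed_claim")

    def providers(self) -> "Traversal":
        return self.out("billed_by")

    def to_list(self) -> List[str]:
        return list(self.current)


graph: Graph = {
    "patient:1": [("filed_claim", "claim:10"), ("filed_claim", "claim:11")],
    "claim:10": [("billed_by", "provider:D3")],
    "claim:11": [("billed_by", "provider:D7")],
}
providers = Traversal(graph, ["patient:1"]).claims().providers().to_list()
```

Each step returns a new `Traversal`, so a partially built traversal can be reused without later steps mutating it. Domain users read `claims().providers()`, and the edge labels stay hidden inside the DSL.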

And there will be more…!


Reach Out

Dev Blog: FullMetalHealth.com

Twitter and GitHub: @PokitDok @DeniseKGosnell

Denise.Gosnell@pokitdok.com
