the bigger data story
![Page 1: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/1.jpg)
the BIGger data story
Ben Keller [@vinegarbin, bjkeller.github.io, linkedin.com/in/bjkeller]
STEAM Vent, 8 January 2015
ger/
Creative Commons Attribution-ShareAlike 4.0 International License
![Page 2: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/2.jpg)
prelude
![Page 3: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/3.jpg)
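Given
$$\coprod_{i=1}^{n} w_i\Lambda \xrightarrow{f} \coprod_{i=1}^{m} v_i\Lambda,$$
where $\Lambda = K\Gamma/I$, compute
$$\coprod_{i=1}^{l} x_i\Lambda \xrightarrow{g} \coprod_{i=1}^{n} w_i\Lambda$$
such that $\operatorname{Im} g = \ker f$.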
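The slide states the problem for a general algebra $\Lambda = K\Gamma/I$. In the simplest special case, where $\Gamma$ is a single vertex with no edges and $I = 0$, we get $\Lambda = K$, every summand $w_i\Lambda$ is a copy of $K$, and $f$ is just an $m \times n$ matrix; computing $g$ then means finding a matrix whose columns span the null space of $f$. The sketch below covers only that special case, assuming $K = \mathbb{Q}$ (exact arithmetic via `fractions.Fraction`); `kernel_map` and `matmul` are hypothetical helper names, and the general path-algebra case is not attempted.

```python
from fractions import Fraction


def kernel_map(f):
    """Return g as an n x l matrix whose columns form a basis of ker f.

    Special case only (assumption): the graph has one vertex and no edges,
    so Lambda = K, taken here to be the rationals. f is an m x n matrix
    acting on column vectors in K^n, and the result satisfies Im g = ker f.
    """
    m = len(f)
    n = len(f[0]) if m else 0
    a = [[Fraction(x) for x in row] for row in f]

    # Gauss-Jordan elimination to reduced row echelon form.
    pivots = []
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if a[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c: it is a free variable
        a[r], a[p] = a[p], a[r]
        pivot = a[r][c]
        a[r] = [x / pivot for x in a[r]]
        for i in range(m):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1

    # One kernel basis vector per free column: set that variable to 1,
    # the other free variables to 0, and solve for the pivot variables.
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        v = [Fraction(0)] * n
        v[fc] = Fraction(1)
        for row, pc in enumerate(pivots):
            v[pc] = -a[row][fc]
        basis.append(v)

    # The columns of g are the basis vectors, so g maps K^l onto ker f.
    return [[vec[i] for vec in basis] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


f = [[1, 1, 0], [0, 0, 1]]
g = kernel_map(f)
print([[str(x) for x in row] for row in g])  # [['-1'], ['1'], ['0']]
print(matmul(f, g) == [[0], [0]])            # True
```

Checking that $f \circ g = 0$ confirms $\operatorname{Im} g \subseteq \ker f$; because the columns come from distinct free variables they are independent, and there are $n - \operatorname{rank} f$ of them, so the image is all of $\ker f$.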
![Page 4: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/4.jpg)
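algorithmic problem
input: $\Lambda = K\Gamma/I$ and $\coprod_{i=1}^{n} w_i\Lambda \xrightarrow{f} \coprod_{i=1}^{m} v_i\Lambda$
output: $\coprod_{i=1}^{l} x_i\Lambda \xrightarrow{g} \coprod_{i=1}^{n} w_i\Lambda$ such that $\operatorname{Im} g = \ker f$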
![Page 5: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/5.jpg)
$\Lambda = K\Gamma/I$, where $\Gamma$ is a graph
![Page 6: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/6.jpg)
a (directed) graph
[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d]
![Page 7: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/7.jpg)
a (directed) graph
vertices: 1, 2, 3
![Page 8: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/8.jpg)
a (directed) graph
an edge (arc): one of the arrows a, b, c, d
![Page 9: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/9.jpg)
graphs
[figure: directed graph with vertices 1, 2, 3 and edges a, b, c, d]
used to represent relationships
in our algebra, the graph represents which multiplications work:
ab, abc, ad, aba, …
while others don’t:
ba = bd = cb = … = 0
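A product of edges "works" when the edges line up head to tail, so that together they form a path through the graph. The sketch below checks this, assuming edges are read left to right (each edge must end where the next one starts). The edge endpoints in `EDGES` are hypothetical, chosen only so that ab, abc, ad work and ba, bd, cb vanish; they are not read off the slide’s figure. The sketch also ignores the relations in the ideal $I$ and only tests composability in the path algebra.

```python
# Hypothetical edge endpoints (start vertex, end vertex), for illustration only.
EDGES = {"a": (1, 2), "b": (2, 3), "c": (3, 1), "d": (2, 3)}


def is_path(word):
    """True if each edge ends where the next one starts (read left to right)."""
    return all(EDGES[x][1] == EDGES[y][0] for x, y in zip(word, word[1:]))


def multiply(p, q):
    """Product of two paths, ignoring the relations in I:
    the concatenated path if the paths line up, otherwise 0."""
    return p + q if is_path(p + q) else 0


for p, q in [("a", "b"), ("ab", "c"), ("a", "d"), ("b", "a"), ("b", "d"), ("c", "b")]:
    print(p, "*", q, "=", multiply(p, q))
```

With these endpoints the first three products give the paths ab, abc and ad, and the last three give 0.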
![Page 10: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/10.jpg)
act I
![Page 11: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/11.jpg)
recommender systems
[figure: users (Abby, Brian, Charles, David) linked to the items they like (apples, bananas, cherries, doughnuts, eggs)]
recommender systems tell us what we might like, based on similarities and on what others have liked
we can represent the data as a graph
how do algorithms work with respect to the graph?
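One way to make the likes data a graph is to store the edges from both sides, user to items and item to users, so that an algorithm can walk the graph by following neighbors. This is a minimal sketch; the edges loosely follow the slides (Abby likes apples and bananas, Brian and Charles like apples and cherries), and David’s likes are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (user, item) "like" edges; David's likes are invented.
LIKES = [
    ("Abby", "apples"), ("Abby", "bananas"),
    ("Brian", "apples"), ("Brian", "cherries"),
    ("Charles", "apples"), ("Charles", "cherries"),
    ("David", "bananas"), ("David", "doughnuts"), ("David", "eggs"),
]


def adjacency(edges):
    """Store the bipartite graph from both sides: user -> items, item -> users."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in edges:
        user_items[user].add(item)
        item_users[item].add(user)
    return dict(user_items), dict(item_users)


user_items, item_users = adjacency(LIKES)

# A two-step walk through the graph: item -> users who like it -> their other items.
others = {i for u in item_users["apples"] for i in user_items[u]} - {"apples"}
print(sorted(others))  # ['bananas', 'cherries']
```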
![Page 12: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/12.jpg)
recommender graphs
[figure: Brian and Charles each linked to cherries and apples]
model similarity of items by shared likes of users
![Page 13: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/13.jpg)
recommender graphs
model similarity of items by shared likes of users, to construct new edges weighted by the number of shared users
[figure: a new edge between cherries and apples]
![Page 14: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/14.jpg)
similarity graph
giving a new graph representing similarity between items
[figure: similarity graph over apples, bananas, cherries, doughnuts, eggs]
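The similarity graph can be built by walking each user’s likes and adding 1 to the edge between every pair of items that user likes, so each edge ends up weighted by the number of shared users. This is a minimal sketch with hypothetical data loosely following the slides; `similarity_graph` is an illustrative name, and edges are stored as `frozenset` pairs because the similarity is symmetric.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical likes (user -> items); David's likes are invented.
LIKES = {
    "Abby": {"apples", "bananas"},
    "Brian": {"apples", "cherries"},
    "Charles": {"apples", "cherries"},
    "David": {"bananas", "doughnuts", "eggs"},
}


def similarity_graph(likes):
    """Item-item graph: each edge is weighted by the number of users who like both items."""
    weights = defaultdict(int)
    for items in likes.values():
        for x, y in combinations(sorted(items), 2):
            weights[frozenset((x, y))] += 1
    return dict(weights)


sim = similarity_graph(LIKES)
print(sim[frozenset({"apples", "cherries"})])  # 2 (Brian and Charles)
```

In graph terms this is the one-mode projection of the user-item graph onto the items.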
![Page 15: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/15.jpg)
making recommendations with similarity graph
combining graph of “likes”
[figure: Abby’s likes (apples, bananas) combined with the similarity graph over apples, bananas, cherries, doughnuts, eggs]
![Page 16: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/16.jpg)
making recommendations
recommend items similar to those a user likes
make a list by ranking them by weight
[figure: items (apples, bananas, cherries, doughnuts, eggs) ranked for Abby]
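Combining the likes graph with the similarity graph, each item a user has not liked can be scored by the total weight of its similarity edges to items the user does like, and the list is ranked by that score. This is one simple scoring rule assumed for illustration, not necessarily the one in the talk; the data and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical likes (user -> items); David's likes are invented.
LIKES = {
    "Abby": {"apples", "bananas"},
    "Brian": {"apples", "cherries"},
    "Charles": {"apples", "cherries"},
    "David": {"bananas", "doughnuts", "eggs"},
}


def similarity_graph(likes):
    """Item-item edges weighted by the number of users who like both items."""
    weights = defaultdict(int)
    for items in likes.values():
        for x, y in combinations(sorted(items), 2):
            weights[frozenset((x, y))] += 1
    return dict(weights)


def recommend(user, likes):
    """Score each item the user has not liked by the summed weight of its
    similarity edges to items the user does like, highest score first."""
    liked = likes[user]
    scores = defaultdict(int)
    for pair, weight in similarity_graph(likes).items():
        x, y = tuple(pair)
        if x in liked and y not in liked:
            scores[y] += weight
        elif y in liked and x not in liked:
            scores[x] += weight
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend("Abby", LIKES))  # [('cherries', 2), ('doughnuts', 1), ('eggs', 1)]
```

Cherries come first for Abby because two users who share her liking for apples also like cherries.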
![Page 17: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/17.jpg)
interlude
![Page 18: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/18.jpg)
genetic disease
• causal factors of disease are inherited
• assumed to manifest themselves as variations of the genome
• may combine in complex ways
![Page 19: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/19.jpg)
act II
![Page 20: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/20.jpg)
a genetic disease question
[figure: paired regions on chr A and chr B]
we have paired regions of the genome where variations co-occur in bipolar disorder patients
how are these regions related by biology?
![Page 21: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/21.jpg)
a genetic disease question
[figure: genes in the regions of chr A and chr B, and the “biology” of those genes]
![Page 22: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/22.jpg)
a genetic disease question
[figure: genes and their “biology”]
![Page 23: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/23.jpg)
a familiar graph
• model the biological factors of genes by words found in descriptions of what each gene does
• this gives us a graph similar to the starting graph for recommenders
• form the similarity graph only between genes in the regions
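Here words play the role that users played in the recommender, and genes play the role of items: two genes are similar when their descriptions share words. Restricting the similarity graph to genes in the regions means only forming edges between a gene in one region and a gene in the other. This is a minimal sketch; the gene names, description words and region lists are hypothetical placeholders, not data from the talk.

```python
from itertools import product

# Hypothetical gene names and description words, for illustration only.
GENE_WORDS = {
    "geneA1": {"dopamine", "signalling", "cocaine"},
    "geneA2": {"ethanol", "intake"},
    "geneB1": {"dopamine", "signalling"},
    "geneB2": {"ethanol", "intake", "consumption"},
}
REGION_A = ["geneA1", "geneA2"]
REGION_B = ["geneB1", "geneB2"]


def cross_region_similarity(gene_words, region_a, region_b):
    """Similarity edges only between a gene in region A and a gene in region B,
    weighted by the number of description words the two genes share."""
    edges = {}
    for g, h in product(region_a, region_b):
        shared = len(gene_words[g] & gene_words[h])
        if shared:
            edges[(g, h)] = shared
    return edges


print(cross_region_similarity(GENE_WORDS, REGION_A, REGION_B))
# {('geneA1', 'geneB1'): 2, ('geneA2', 'geneB2'): 2}
```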
![Page 24: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/24.jpg)
[figure: local word connections between genes: CDKN2A/B, PPARG, HHEX and TCF7L2 with "mortality", "g1", "repression"; MTHFR and TNF with "ethanol", "intake", "consumption" and gene-environment interaction; NURR1 and FOS with "cocaine" and dopamine signalling]
look at local connections between genes
words can help find explanations
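Looking locally means starting from one gene and listing its neighbors, labeling each connection with the words the two descriptions share; those shared words are candidate explanations for why the genes are linked. This is a minimal sketch with hypothetical gene names and words; `local_explanations` is an illustrative name.

```python
# Hypothetical gene names and description words, for illustration only.
GENE_WORDS = {
    "geneA1": {"dopamine", "signalling", "cocaine"},
    "geneA2": {"ethanol", "intake"},
    "geneB1": {"dopamine", "signalling"},
    "geneB2": {"ethanol", "intake", "consumption"},
}


def local_explanations(gene, gene_words):
    """For each other gene sharing description words with `gene`,
    list the shared words as a candidate explanation of the connection."""
    words = gene_words[gene]
    return {
        other: sorted(words & other_words)
        for other, other_words in gene_words.items()
        if other != gene and words & other_words
    }


print(local_explanations("geneA1", GENE_WORDS))
# {'geneB1': ['dopamine', 'signalling']}
```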
![Page 25: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/25.jpg)
interlude
![Page 26: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/26.jpg)
user's ways of thinking
how user thinks about:
tasks
biology of disease
![Page 27: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/27.jpg)
tool support
how tools represent information and allow manipulations
![Page 28: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/28.jpg)
[figure: manipulations, representation, tasks, knowledge]
the user has to manage these relationships in her head
![Page 29: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/29.jpg)
cognitive engineering
Interdisciplinary approach to supporting user performance in complex systems
![Page 30: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/30.jpg)
act III
![Page 31: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/31.jpg)
a model of biology
[figure: diagram with nodes A, B, C, D]
![Page 32: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/32.jpg)
genetic variations
![Page 33: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/33.jpg)
genetic variations affect how cells/organs work
![Page 34: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/34.jpg)
genetic variations affect how cells/organs work, resulting in symptoms of disease
![Page 35: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/35.jpg)
groups of genetic variations affect how cells/organs work in particular ways, resulting in shared symptoms of certain diseases
and there are other things we don’t know about
![Page 36: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/36.jpg)
the data are observations of measurements in the cells/organs of individuals
![Page 37: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/37.jpg)
with our graph trick we can take data relating people to observations
and create a graph showing which observations are shared by groups of people
![Page 38: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/38.jpg)
and we can do it for each kind of observation
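The same trick applies per kind of observation: for each kind, join two observations with an edge weighted by how many people have both. This is a minimal sketch assuming each person’s data is stored as kind → set of observed values; the people, kinds (`variant`, `symptom`) and values are hypothetical, and only pairs of observations are recorded, not larger groups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical people, kinds of observation, and observed values.
OBSERVATIONS = {
    "person1": {"variant": {"v1", "v2"}, "symptom": {"s1"}},
    "person2": {"variant": {"v1", "v2"}, "symptom": {"s1", "s2"}},
    "person3": {"variant": {"v2", "v3"}, "symptom": {"s2"}},
}


def graphs_by_kind(observations):
    """For each kind of observation, build a graph whose edges join two
    observations, weighted by how many people share both. A kind in which
    nobody has two observations produces no graph."""
    graphs = defaultdict(lambda: defaultdict(int))
    for per_kind in observations.values():
        for kind, values in per_kind.items():
            for pair in combinations(sorted(values), 2):
                graphs[kind][pair] += 1
    return {kind: dict(edges) for kind, edges in graphs.items()}


print(graphs_by_kind(OBSERVATIONS))
# {'variant': {('v1', 'v2'): 2, ('v2', 'v3'): 1}, 'symptom': {('s1', 's2'): 1}}
```

Each kind yields its own separate graph, which is exactly what leaves open how the graphs relate to one another.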
![Page 39: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/39.jpg)
but…
the scientist is left to decide: how do these relate?
![Page 40: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/40.jpg)
the scientist wants to know
how groups here affect values below
and how values here are affected by groups of values above
![Page 41: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/41.jpg)
but she cannot answer this easily with the tools we’ve chosen
![Page 42: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/42.jpg)
we will ultimately solve this problem by supporting the scientist in her reasoning, not by choosing our favorite tool
![Page 43: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/43.jpg)
finale
![Page 44: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/44.jpg)
data science
math/statistics
computation
![Page 45: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/45.jpg)
data science
math/statistics
computation
human reasoning
![Page 46: The Bigger Data Story](https://reader033.vdocuments.site/reader033/viewer/2022051414/55a4e6861a28ab36748b47b7/html5/thumbnails/46.jpg)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.