higher-order organization of interactions in human chromosomes: 1d sequence → 3d structure → ...
TRANSCRIPT
Higher-order organization of interactions in human chromosomes:
1D sequence → 3D structure → topologically associated domain (TAD) via
network community detection
Sang Hoon Lee, School of Physics, Korea Institute for Advanced Study
http://newton.kias.re.kr/~lshlj82
@ Indianapolis, 20 June, 2017
[Figure: supplementary figure from Cell 159, 1665–1680, December 18, 2014 ©2014 Elsevier Inc., p. S10; legend not shown]
details unknown, in many aspects
Hi-C: the interaction map of chromatin loci
Figure 1 (panels A–D). We Used In Situ Hi-C to Map over 15 Billion Chromatin Contacts across Nine Cell Types in Human and Mouse, Achieving 1 kb Resolution in Human Lymphoblastoid Cells
(A) During in situ Hi-C, DNA-DNA proximity ligation is performed in intact nuclei.
(B) Contact matrices from chromosome 14: the whole chromosome, at 500 kb resolution (top); 86–96 Mb/50 kb resolution (middle); 94–95 Mb/5 kb resolution (bottom). Left: GM12878, primary experiment; Right: biological replicate. The 1D regions corresponding to a contact matrix are indicated in the diagrams above and at left. The intensity of each pixel represents the normalized number of contacts between a pair of loci. Maximum intensity is indicated in the lower left of each panel.
(C) We compare our map of chromosome 7 in GM12878 (last column) to earlier Hi-C maps: Lieberman-Aiden et al. (2009), Kalhor et al. (2012), and Jin et al. (2013).
(D) Overview of features revealed by our Hi-C maps. Top: the long-range contact pattern of a locus (left) indicates its nuclear neighborhood (right). We detect at least six subcompartments, each bearing a distinctive pattern of epigenetic features. Middle: squares of enhanced contact frequency along the diagonal (left) indicate the presence of small domains of condensed chromatin, whose median length is 185 kb (right). Bottom: peaks in the contact map (left) indicate the presence of loops (right). These loops tend to lie at domain boundaries and bind CTCF in a convergent orientation.
See also Figure S1, Data S1, I–II, and Tables S1 and S2.
Cell 159, 1665–1680, December 18, 2014 ©2014 Elsevier Inc., p. 1667
matrix axes: locus i, locus j
value: the interaction frequency between loci i and j, or physical proximity
E. Lieberman-Aiden et al., Science 326, 289 (2009): ~1 Mb resolution
S. S. P. Rao et al., Cell 159, 1665 (2014): ~1 kb resolution
320×480 resolution (163 ppi)
1080×1920 resolution (401 ppi)
topologically associated domains (TADs)
adjacency matrix of a weighted network whose nodes are the loci and the weights are the interaction frequency
detecting the topologically associated domains (TADs) ≡ detecting the community structures in networks
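Once each locus carries a community label, candidate domain boundaries along the chromosome are simply the positions where the label changes. The sketch below illustrates that bookkeeping only; the function names and the toy label vector are hypothetical and are not taken from the slides. Note that a community found on the network need not be contiguous in 1D, so one community can give rise to several domains.

```python
import numpy as np

def community_boundaries(labels):
    """Indices b such that loci b-1 and b belong to different communities."""
    labels = np.asarray(labels)
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1

def contiguous_domains(labels):
    """Maximal runs of equal labels, as half-open (start, end) locus intervals."""
    labels = np.asarray(labels)
    cuts = community_boundaries(labels)
    starts = np.concatenate(([0], cuts))
    ends = np.concatenate((cuts, [len(labels)]))
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

# Hypothetical community labels for 10 consecutive loci (illustration only).
labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 1]
boundaries = community_boundaries(labels)
domains = contiguous_domains(labels)
print(boundaries.tolist())
print(domains)
```

In this toy vector, community 1 appears in two separate runs, which is why the domains are reported as intervals rather than as label sets.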
[Figure: adjacency matrix of an example network (sparsity plot, both axes 0–100, nz = 2730); parameters p1=0.5, p2=0.05, p3=0.5; pS=0, dS=0]
community structures in networks
“modularity” (the objective function to be maximized)
review papers: M. A. Porter, J.-P. Onnela, and P. J. Mucha, Not. Am. Math. Soc. 56, 1082 (2009); S. Fortunato, Phys. Rep. 486, 75 (2010).
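\[ Q = \frac{1}{2m} \sum_{i \neq j} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta(g_i, g_j) \]
\gamma: the resolution parameter
Newman-Girvan null model term: \frac{k_i k_j}{2m}
k_i = \sum_j A_{ij} = \sum_j A_{ji}
g_i: the community to which node i belongs
2m = \sum_{i \neq j} A_{ij} = \sum_i k_i
A = \{A_{ij}\}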
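The modularity above can be evaluated directly for any given partition. The following is a minimal NumPy sketch of that formula, assuming a symmetric weight matrix whose diagonal is ignored (the sum runs over i ≠ j); the function name and the two-triangle test graph are illustrative choices, not part of the talk.

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Q = (1/2m) * sum_{i != j} (A_ij - gamma * k_i k_j / 2m) * delta(g_i, g_j)."""
    A = np.array(A, dtype=float)          # copy, so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)              # the sum excludes i == j
    labels = np.asarray(labels)
    k = A.sum(axis=1)                     # strengths k_i
    two_m = k.sum()                       # 2m = sum_i k_i
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)         # drop the i == j terms of the null model too
    B = A - gamma * np.outer(k, k) / two_m
    return (B * same).sum() / two_m

# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

q_split = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
print(q_split, q_single)
```

Splitting the graph into its two triangles gives a higher Q than keeping all six nodes in one community, as expected for γ = 1.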
[Inset: page 036104-3 of "Taxonomies of networks from community structure," Physical Review E 86, 036104 (2012), Sec. III, "Mesoscopic response functions (MRFs)." The page describes how a network disintegrates into communities as the resolution parameter λ is increased from its minimum to its maximum value, summarized by three diagnostics: the effective energy H_eff, the effective partition entropy S_eff, and the effective number of communities η_eff, each in [0, 1] and each a function of the effective fraction of antiferromagnetic edges ξ ∈ [0, 1]. FIG. 1 of that page: (a) schematic of some of the ways that a network can break up into communities as the value of λ (or ξ) is increased; (b) Zachary Karate Club network for different values of ξ (η = 1, 8, 12, 17, 24, 34 communities at ξ = 0, 0.2, 0.4, 0.6, 0.8, 1); (c) the H_eff, S_eff, and η_eff MRFs, and the interaction matrix J for different values of ξ.]
community detection with tunable resolution
method: I. S. Jutla, L. G. S. Jeub, and P. J. Mucha, GenLouvain (generalized Louvain) version 2.1 [November, 2016] ref) http://netwiki.amath.unc.edu/GenLouvain/GenLouvain
original version: V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, JSTAT 2008 (10), P10008.
data: the Hi-C map in S. S. P. Rao et al., Cell 159, 1665 (2014). the human B-lymphoblastoid cell (GM12878) (data) resolution: 1 kb, … , 1 Mb available. We are using intrachromosomal interactions with the 100 kb and 10 kb resolutions
5.6 GB 152 GB
with the normalization scheme introduced in P. A. Knight and D. Ruiz, IMA J. Numer. Anal. 33, 1029 (2013): the same one used in Cell 159, 1665 (2014).
time complexity: O(n log n)
ref) http://www.curetoday.com/tumor/childhood/treatment/cdr0000258001
\[ \tilde{A}_{ij} = c_i A_{ij} c_j \quad \text{such that} \quad \sum_i \tilde{A}_{ij} = \sum_j \tilde{A}_{ij} = 1 \]
with the locus-specific correction factor \{c_i\}
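To make the balancing condition concrete, here is a small sketch that finds correction factors c_i for a symmetric, strictly positive toy matrix. It uses a simple damped fixed-point iteration, which is a stand-in chosen for brevity and is not the Knight-Ruiz algorithm cited above; the function name, iteration limits, and random test matrix are all assumptions. Real Hi-C maps contain empty rows that would need to be filtered before any such scaling.

```python
import numpy as np

def balance_symmetric(A, n_iter=500, tol=1e-10):
    """Find c with sum_j c_i A_ij c_j = 1 for every i (simple damped iteration)."""
    A = np.asarray(A, dtype=float)
    c = np.ones(A.shape[0])
    for _ in range(n_iter):
        row_sums = c * (A @ c)            # row sums of diag(c) A diag(c)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        c = c / np.sqrt(row_sums)         # damped update; fixed point has row_sums == 1
    A_tilde = c[:, None] * A * c[None, :]
    return c, A_tilde

# Hypothetical symmetric positive matrix standing in for a contact map.
rng = np.random.default_rng(0)
M = rng.random((6, 6)) + 0.1
A = (M + M.T) / 2
c, A_tilde = balance_symmetric(A)
print(A_tilde.sum(axis=0))
print(A_tilde.sum(axis=1))
```

Because the input is symmetric and the same factor is applied to rows and columns, the balanced matrix stays symmetric and its row and column sums coincide.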
goal: providing a systematic way to detect TAD with tunable resolution
γ = 0.6
γ = 1
γ = 1.4
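The effect of the resolution parameter can be seen without running any optimizer: on a nested toy matrix, a coarse and a fine partition can be scored at several values of γ. The sketch below does this with the modularity defined earlier in the talk (sum over i ≠ j). The toy matrix, its weights (1.0 within a block, 0.4 within a pair of blocks, 0.05 otherwise), and the helper names are hypothetical; this is not the GenLouvain procedure itself.

```python
import numpy as np

def modularity(A, labels, gamma):
    """Q with resolution gamma, summed over i != j (zero diagonal assumed)."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)
    two_m = k.sum()
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)
    return ((A - gamma * np.outer(k, k) / two_m) * same).sum() / two_m

def toy_contact_matrix(n_blocks=4, block_size=5, w_in=1.0, w_pair=0.4, w_out=0.05):
    """Nested structure: small blocks, grouped two by two into larger domains."""
    n = n_blocks * block_size
    block = np.arange(n) // block_size    # fine partition
    pair = block // 2                     # coarse partition
    A = np.full((n, n), w_out)
    A[pair[:, None] == pair[None, :]] = w_pair
    A[block[:, None] == block[None, :]] = w_in
    np.fill_diagonal(A, 0.0)
    return A, block, pair

A, fine, coarse = toy_contact_matrix()
preferred = {}
for gamma in (0.6, 1.0, 1.4):
    q_fine = modularity(A, fine, gamma)
    q_coarse = modularity(A, coarse, gamma)
    preferred[gamma] = "fine" if q_fine > q_coarse else "coarse"
    print(gamma, round(q_fine, 4), round(q_coarse, 4), preferred[gamma])
```

For these weights the coarse partition scores higher at γ = 0.6 and γ = 1.0, and the fine partition takes over at γ = 1.4, which is the sense in which γ tunes the scale of the detected domains.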
CTCF - Wikipedia (https://en.wikipedia.org/wiki/CTCF, printed 2017. 4. 6.)
Transcriptional repressor CTCF also known as 11-zinc finger protein or CCCTC-binding factor is a transcription factor that in humans is encoded by the CTCF gene.[3][4] CTCF is involved in many cellular processes, including transcriptional regulation, insulator activity, V(D)J recombination[5] and regulation of chromatin architecture.[6]
Discovery
CCCTC-Binding factor or CTCF was initially discovered as a negative regulator of the chicken c-myc gene. This protein was found to be binding to three regularly spaced repeats of the core sequence CCCTC and thus was named CCCTC binding factor.[7]
Function
The primary role of CTCF is thought to be in regulating the 3D structure of chromatin.[6] CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina.[8] It also defines the boundaries between active and heterochromatic DNA.
Since the 3D structure of DNA influences the regulation of genes, CTCF's activity influences the expression of genes. CTCF is thought to be a primary part of the activity of insulators, sequences that block the interaction between enhancers and promoters. CTCF binding has also been both shown to promote and repress gene expression. It is unknown whether CTCF affects gene expression solely through its looping activity, or if it has some other, unknown, activity.[6]
Observed activity
The binding of CTCF has been shown to have many effects, which are enumerated below. In each case, it is unknown if CTCF directly evokes the outcome or if it does so indirectly (in particular through its looping role).
Transcriptional regulation
The protein CTCF plays a heavy role in repressing the insulin-like growth factor 2 gene, by binding to the H-19 imprinting control region (ICR) along with differentially-methylated region-1 (DMR1) and MAR3.[9][10]
Insulation
Binding of targeting sequence elements by CTCF can block the interaction between enhancers and promoters, therefore limiting the activity of enhancers to certain functional domains. Besides acting as enhancer blocking, CTCF can also act as a chromatin barrier[11] by preventing the spread of heterochromatin structures.
Regulation of chromatin architecture
CTCF physically binds to itself to form homodimers,[12] which causes the bound DNA to form loops.[13] CTCF also occurs frequently at the boundaries of sections of DNA bound to the nuclear lamina.[8] Using chromatin immuno-precipitation (ChIP) followed by ChIP-seq, it was found that CTCF localizes with cohesin genome-wide and affects gene regulatory mechanisms and the higher-order chromatin structure.[14]
Regulation of RNA splicing
CTCF binding has been shown to influence mRNA splicing.[15]
DNA binding
CTCF binds to the consensus sequence CCGCGNGGNGGCAG (in IUPAC notation).[16] This sequence is defined by 11 zinc finger motifs in its
benchmark: the same sequence region (different cell type, though) in R. E. Boulos et al., Phys. Rev. Lett. 111, 118102 (2013).
remarkable linear gradient of the average replication fork polarity (the mean orientation of the processing replication machinery) [15–17]. These replication domains coincide with a remarkable gene arrangement [14] with, in particular, an overrepresentation of highly expressed genes close to domain borders [18]. In fact, replication domain borders appear to be specified by a region (∼200 kb) of open and transcriptionally active chromatin [16,19] that is significantly enriched in insulator DNA-binding proteins such as CTCF (the CCCTC-binding factor) [16]. Putative replication origins at domain borders are thus associated with distinctive attributes that make these origins key features of the replication-associated organization of the genome, qualifying them as "master" replication origins [19]. We focused our analysis on the coupling between the structural data and the replication domain organization in the human genome. We have performed the analysis of interaction data obtained from high-throughput chromosome conformation capture (Hi-C) technology [2] by mainly concentrating on the intra- and interchromosomal contact maps obtained in the human erythroid cell line K562 (100 kb resolution maps with GEO accession number GSE18199). These Hi-C contact maps are positively defined and symmetric and so can be represented and analyzed using graph theory [20]. We consider the Hi-C contact matrix as the adjacency matrix of a weighted graph, where the vertices v_i are the 100 kb DNA loci and the edges are weighted according to the number of Hi-C binary interactions. Because the number of intrachromosome interactions decreases very fast when increasing the separation s between the loci (∼ s^{-1}) [2,20], the weighted network amounts to focus on interactions between loci separated by short genomic distances (≲ 10 Mb) over which contact probabilities are the highest.
Alternatively, the non-weighted version of the network takes equally into account short-range and long-range interactions within a chromosome. In this case, we optionally remove from the data all binary interactions that are present only once (t = 1) or twice (t = 2), as some of these may well be attributed to experimental noise (t = 0 corresponds to no thresholding).
In Fig. 1 is shown a Hi-C contact matrix [Fig. 1(b)] corresponding to intrachromosome interactions on a 12 Mb fragment of human chromosome 10 where four replication domains were identified [Fig. 1(a)] in K562 as U-shaped patterns in the mean replication timing (MRT) profile of this cell line [16,17]. As sketched by the dashed squares in Fig. 1(b), these four U domains likely correspond to four matrix-square blocks of enriched interactions. This observation suggests that MRT U domains correspond to some spatial compartmentalization into self-interacting structural chromatin units where the bordering early initiation zones prevent cross talk between these domains [16]. To quantify the importance of these U-domain borders in the Hi-C contact interaction graph, we perform a statistical analysis over the 876 U domains (≤ 3 Mb) identified in K562 [16]. We also consider 140 additional "split domains" of size ≥ 3 Mb whose borders have similar gene organization and chromatin structure as replication domain borders [9].
FIG. 1 (color online). (a) MRT profile (thick black curve) [16,24] from early 0 to late 1 along a 12 Mb fragment of human chromosome 10 in K562. The horizontal colored bars correspond to the four replication domains identified as MRT U-shaped patterns [16] (red segments, 200 kb borders; dark blue segments, 400 kb center; light blue segments, interior). CTCF enrichment profile (thin purple curve) (ENCODE release 3, March 2010) [25]. (b) Corresponding intrachromosome Hi-C contact matrix [2]. Each pixel represents the total number of interactions between pairs of 100 kb loci. The dashed squares delimit interactions within the four U domains. (c) Stationary configuration obtained for this 12 Mb fragment when using the 2D particle model to layout the chromosome 10 interaction graph [Fig. 3(a)]. Vertices are colored according to their position relative to replication domains: the border is represented in red, the center in dark blue, the interior in light blue, and the exterior in black. The represented edges correspond to connections between, respectively, replication domain borders (red symbols and lines) and centers (dark blue symbols and lines) with their neighbors distant from more than 4 Mb. The contact threshold t = 2 (see the text).
PRL 111, 118102 (2013), PHYSICAL REVIEW LETTERS, week ending 13 SEPTEMBER 2013, 118102-2
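The count thresholding mentioned in the excerpt above (t = 0, 1, 2) can be sketched as follows. This assumes that an unweighted edge is kept only when the contact count exceeds t, which is one reading of the quoted text rather than a confirmed detail of that paper; the function name, the removal of self-contacts, and the 3×3 count matrix are likewise illustrative.

```python
import numpy as np

def unweighted_adjacency(counts, t=0):
    """Binary adjacency: edge i-j is kept when counts[i, j] > t (assumed convention)."""
    counts = np.asarray(counts)
    adj = (counts > t).astype(int)
    np.fill_diagonal(adj, 0)              # ignore self-contacts (a choice made here)
    return adj

# Hypothetical symmetric contact counts between three loci.
counts = np.array([[0, 1, 3],
                   [1, 0, 2],
                   [3, 2, 0]])

for t in (0, 1, 2):
    adj = unweighted_adjacency(counts, t)
    print(t, int(adj.sum()) // 2, "edges")
```

With these counts the network keeps three, two, and one edge for t = 0, 1, and 2, respectively.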
[Figure: chr10, normalized values. Hi-C contact matrix; both axes: position (100 kb), 820–920; logarithmic color scale from 10^-5 to 10^0; labels γ = 2, γ = 5, γ = 10, γ = 20]
[Figure: chr3, CTCF. y-axis: GM12878-Ctcf-StdRaw (amean), 0–14; x-axis: position (10 kb), 0–20000; legend: raw: γ = 0.6, 1.0, 1.4; normalized: γ = 0.6, 1.0, 1.4]
the biological factors (curves) vs the community boundaries (points)
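One simple way to put numbers on such a comparison is to average a signal track at the detected boundaries and away from them. The sketch below is only an illustration of that idea: the function name, the optional flank width, and the toy signal and boundary positions are hypothetical, and no claim is made that this is the statistic used in the talk.

```python
import numpy as np

def boundary_enrichment(signal, boundaries, flank=0):
    """Mean signal within `flank` bins of any boundary vs. mean signal elsewhere."""
    signal = np.asarray(signal, dtype=float)
    near = np.zeros(signal.size, dtype=bool)
    for b in boundaries:
        lo = max(b - flank, 0)
        hi = min(b + flank + 1, signal.size)
        near[lo:hi] = True                # mark bins close to this boundary
    return signal[near].mean(), signal[~near].mean()

# Hypothetical binned signal (e.g. a ChIP-seq track) and boundary bins.
signal = [1, 1, 5, 1, 1, 1, 6, 1]
boundaries = [2, 6]
at_boundary, elsewhere = boundary_enrichment(signal, boundaries)
print(at_boundary, elsewhere)
```

A ratio well above one between the two means would suggest that the factor is enriched at community boundaries; with real data this would need a proper null comparison, such as randomly placed boundaries.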
[Figures: chr3, histone marks. Five panels with the same layout: x-axis: position (10 kb), 0–20000; legend: raw: γ = 0.6, 1.0, 1.4; normalized: γ = 0.6, 1.0, 1.4. y-axes: GM12878-H3k79me2-StdSig (amean), 0–40; GM12878-H3k27ac-StdSig (amean), 0–90; GM12878-H3k9ac-StdSig (amean), 0–20; GM12878-H3k4me3-StdSig (amean), 0–20; GM12878-H3k36me3-StdSig (amean), 0–8. Panel labels: H3k36me3, H3k4me3, H3k9ac, H3k27ac, H3k79me2]
CTCF - Wikipedia (https://en.wikipedia.org/wiki/CTCF, retrieved 6 April 2017)
CTCF
Transcriptional repressor CTCF, also known as 11-zinc finger protein or CCCTC-binding factor, is a transcription factor that in humans is encoded by the CTCF gene.[3][4] CTCF is involved in many cellular processes, including transcriptional regulation, insulator activity, V(D)J recombination[5] and regulation of chromatin architecture.[6]
Discovery
CCCTC-binding factor or CTCF was initially discovered as a negative regulator of the chicken c-myc gene. This protein was found to be binding to three regularly spaced repeats of the core sequence CCCTC and thus was named CCCTC binding factor.[7]
Function
The primary role of CTCF is thought to be in regulating the 3D structure of chromatin.[6] CTCF binds together strands of DNA, thus forming chromatin loops, and anchors DNA to cellular structures like the nuclear lamina.[8] It also defines the boundaries between active and heterochromatic DNA.
Since the 3D structure of DNA influences the regulation of genes, CTCF's activity influences the expression of genes. CTCF is thought to be a primary part of the activity of insulators, sequences that block the interaction between enhancers and promoters. CTCF binding has also been shown to both promote and repress gene expression. It is unknown whether CTCF affects gene expression solely through its looping activity, or if it has some other, unknown, activity.[6]
Observed activity
The binding of CTCF has been shown to have many effects, which are enumerated below. In each case, it is unknown if CTCF directly evokes the outcome or if it does so indirectly (in particular through its looping role).
Transcriptional regulation
The protein CTCF plays a heavy role in repressing the insulin-like growth factor 2 gene, by binding to the H-19 imprinting control region (ICR) along with differentially-methylated region-1 (DMR1) and MAR3.[9][10]
Insulation
Binding of targeting sequence elements by CTCF can block the interaction between enhancers and promoters, therefore limiting the activity of enhancers to certain functional domains. Besides acting as an enhancer blocker, CTCF can also act as a chromatin barrier[11] by preventing the spread of heterochromatin structures.
Regulation of chromatin architecture
CTCF physically binds to itself to form homodimers,[12] which causes the bound DNA to form loops.[13] CTCF also occurs frequently at the boundaries of sections of DNA bound to the nuclear lamina.[8] Using chromatin immuno-precipitation (ChIP) followed by ChIP-seq, it was found that CTCF localizes with cohesin genome-wide and affects gene regulatory mechanisms and the higher-order chromatin structure.[14]
Regulation of RNA splicing
CTCF binding has been shown to influence mRNA splicing.[15]
DNA binding
CTCF binds to the consensus sequence CCGCGNGGNGGCAG (in IUPAC notation).[16] This sequence is defined by 11 zinc finger motifs in its
Histone Modifications - What is Epigenetics? (http://www.whatisepigenetics.com/histone-modifications/, retrieved 6 April 2017)
Histone Modifications
Schematic representation shows the organization and packaging of genetic material. Nucleosomes are represented by DNA (grey) wrapped around eight histone proteins, H2A, H2B, H3, and H4 (colored circles). N-terminal histone tails (blue) are shown protruding from H3 and H4.
A histone modification is a covalent post-translational modification (PTM) to histone proteins which includes methylation, phosphorylation, acetylation, ubiquitylation, and sumoylation. The PTMs made to histones can impact gene expression by altering chromatin structure or recruiting histone modifiers. Histone proteins act to package DNA, which wraps around the eight histones, into chromosomes. Histone modifications act in diverse biological processes such as transcriptional activation/inactivation, chromosome packaging, and DNA damage/repair. In most species, histone H3 is primarily acetylated at lysines 9, 14, 18, 23, and 56, methylated at arginine 2 and lysines 4, 9, 27, 36, and 79, and phosphorylated at Ser10, Ser28, Thr3, and Thr11. Histone H4 is primarily acetylated at lysines 5, 8, 12 and 16, methylated at arginine 3 and lysine 20, and phosphorylated at serine 1. Thus, quantitative detection of various histone modifications would provide useful information for a better understanding of epigenetic regulation of cellular processes and the development of histone modifying enzyme-targeted drugs.
Histone Acetylation/Deacetylation
Histone acetylation occurs by the enzymatic addition of an acetyl group (COCH3) from acetyl coenzyme A. The process of histone acetylation is tightly involved in the regulation of many cellular processes including
comparison between the boundary points and CTCF peaks
[Diagram: three tracks aligned along the sequence, with boundary points marked on each: communities by Louvain (our method); TADs from Rao et al., Cell (2014); CTCF peak]
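The comparison above relies on turning a community assignment along the 1D sequence into a set of boundary points. The slides do not spell out that step, so the following is only a minimal sketch under an assumed definition: a boundary point is a bin whose right-hand neighbour carries a different community label, and bins labelled `None` stand for empty sites. The function name `boundary_points` and the label format are hypothetical.

```python
def boundary_points(labels):
    """Return the indices i where bin i and bin i + 1 have different labels.

    Assumption of this sketch: `labels[i]` is the community of genomic bin i,
    and `None` marks an empty site, which never creates a boundary.
    """
    points = set()
    for i in range(len(labels) - 1):
        left, right = labels[i], labels[i + 1]
        if left is None or right is None:
            continue  # skip pairs that touch an empty site
        if left != right:
            points.add(i)
    return points


# Toy assignment for nine consecutive bins (illustrative values only).
labels = [0, 0, 0, 1, 1, None, 1, 2, 2]
print(sorted(boundary_points(labels)))  # prints [2, 6]
```

Communities found by modularity optimization need not be contiguous along the sequence, so a single community can contribute more than two boundary points under this definition.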
[Figure: TPR (sensitivity) and TNR (specificity) vs. γ (10^-1 to 10^2) for chr1. Curves: Rao et al.: TPR, TNR; CTCF peak (1): TPR, TNR; Rao et al.: TPR and TNR, randomized; CTCF peak (1): TPR and TNR, randomized; Rao et al. vs CTCF peak (1): TPR, TNR]
sensitivity and specificity: 10 kb resolution
“sensitivity” (true positive rate: TPR) = TP / (TP + FN)
“specificity” (true negative rate: TNR) = TN / (TN + FP)
$BP$ = the set of boundary points by Louvain (our method)
$BP^c$ = the set of nonempty sites $-$ $BP$
$BP_{ext}$ = the set of boundary points by Rao et al. or the CTCF peaks
$BP^c_{ext}$ = the set of nonempty sites $-$ $BP_{ext}$
true positive (TP) = $|BP \cap BP_{ext}|$
true negative (TN) = $|BP^c \cap BP^c_{ext}|$
false positive (FP) = $|BP \cap BP^c_{ext}|$
false negative (FN) = $|BP^c \cap BP_{ext}|$
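The set definitions above translate directly into set operations. The sketch below assumes that sites are integer bin indices and that both boundary sets are restricted to the nonempty sites; the function names and the toy data are illustrative and not taken from the talk.

```python
def confusion_counts(bp, bp_ext, nonempty):
    """Return (TP, TN, FP, FN) for boundary sets defined on the nonempty sites."""
    nonempty = set(nonempty)
    bp = set(bp) & nonempty          # BP
    bp_ext = set(bp_ext) & nonempty  # BP_ext
    bp_c = nonempty - bp             # BP^c
    bp_ext_c = nonempty - bp_ext     # BP_ext^c
    tp = len(bp & bp_ext)
    tn = len(bp_c & bp_ext_c)
    fp = len(bp & bp_ext_c)
    fn = len(bp_c & bp_ext)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """TPR = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """TNR = TN / (TN + FP)."""
    return tn / (tn + fp)


# Toy example with ten nonempty sites (illustrative values only).
nonempty = range(10)
bp = {1, 4, 7}          # hypothetical Louvain boundary points
bp_ext = {1, 5, 7, 9}   # hypothetical external boundaries or peaks
tp, tn, fp, fn = confusion_counts(bp, bp_ext, nonempty)
print(tp, tn, fp, fn)        # prints 2 5 1 2
print(sensitivity(tp, fn))   # prints 0.5
```

The four counts always sum to the number of nonempty sites, which is a quick consistency check on real data.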
randomly chosen boundary points (1000 realizations)
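A randomized baseline of this kind can be sketched as follows. The slides only state that boundary points were chosen randomly over 1000 realizations; this sketch assumes the same number of boundary points is drawn uniformly without replacement from the nonempty sites in every realization, and that the external set lies within the nonempty sites. The function names and toy data are hypothetical.

```python
import random


def true_positive_rate(bp, bp_ext):
    """TPR = |BP ∩ BP_ext| / |BP_ext|, because TP + FN = |BP_ext|."""
    return len(bp & bp_ext) / len(bp_ext)


def randomized_tpr(n_boundaries, bp_ext, nonempty, realizations=1000, seed=0):
    """Mean TPR of uniformly random boundary sets with n_boundaries points each."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    sites = sorted(nonempty)
    total = 0.0
    for _ in range(realizations):
        random_bp = set(rng.sample(sites, n_boundaries))
        total += true_positive_rate(random_bp, bp_ext)
    return total / realizations


# Toy setting: 100 nonempty sites, 20 external boundaries, 30 random boundary points.
nonempty = set(range(100))
bp_ext = set(range(0, 100, 5))
print(round(randomized_tpr(30, bp_ext, nonempty), 2))
```

Under these assumptions the expected TPR of a random boundary set is simply the fraction of sites chosen (30/100 in the toy setting), so the averaged value should come out close to 0.3.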
comparison between the boundary points and peaks
[Diagram: three tracks aligned along the sequence, with boundary points marked on each: communities by Louvain (our method); TADs from Rao et al., Cell (2014); CTCF peak, from Prof. Per Stenberg @ Umeå University]
[Figure: PPV (precision) vs. γ (10^-1 to 10^3) for chr1. Curves: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1)]
precision: 10 kb resolution
“precision” (positive predictive value: PPV) = TP / (TP + FP)
$BP$ = the set of boundary points by Louvain (our method)
$BP^c$ = the set of nonempty sites $-$ $BP$
$BP_{ext}$ = the set of boundary points by Rao et al. or the CTCF peaks
$BP^c_{ext}$ = the set of nonempty sites $-$ $BP_{ext}$
true positive (TP) = $|BP \cap BP_{ext}|$
true negative (TN) = $|BP^c \cap BP^c_{ext}|$
false positive (FP) = $|BP \cap BP^c_{ext}|$
false negative (FN) = $|BP^c \cap BP_{ext}|$
randomly chosen boundary points (1000 realizations)
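Because TP + FP counts every point of BP exactly once, the precision reduces to the fraction of detected boundary points that also appear in the external set. A minimal sketch, assuming both sets are already restricted to the nonempty sites (the function name and the toy sets are illustrative):

```python
def precision(bp, bp_ext):
    """PPV = TP / (TP + FP) = |BP ∩ BP_ext| / |BP|."""
    bp, bp_ext = set(bp), set(bp_ext)
    if not bp:
        raise ValueError("precision is undefined when no boundary point is detected")
    return len(bp & bp_ext) / len(bp)


# Two of the three detected boundary points coincide with external ones.
print(precision({1, 4, 7}, {1, 5, 7, 9}))
```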
Their Arrowhead + HiCCUPS method is very slow
O(n^4) → O(n^2) by dynamic programming
O(n log n)
other null-model terms in the modularity than Newman-Girvan?
or “communities.” Intuitively, a community consists of a set of nodes that are connected among one another more densely than they are to nodes in other communities. A popular way to identify community structure is to optimize a quality function, which can be used to measure the relative densities of intra-community connections versus inter-community connections. See Refs. 16, 20, and 23 for recent reviews on network community structure and Refs. 24–27 for discussions of various caveats that should be considered when optimizing quality functions to detect communities.
One begins with a network of N nodes and a given set of connections between those nodes. In the usual case of single-layer networks (e.g., static networks with only one type of edge), one represents a network using an N × N adjacency matrix A. The element $A_{ij}$ of the adjacency matrix indicates a direct connection or “edge” from node i to node j, and its value indicates the weight of that connection. The quality of a hard partition of A into communities (whereby each node is assigned to exactly one community) can be quantified using a quality function. The most popular choice is modularity [16,20,21,28,29]
$$Q_0 = \sum_{ij} [A_{ij} - \gamma P_{ij}] \, \delta(g_i, g_j), \qquad (1)$$
where node i is assigned to community $g_i$, node j is assigned to community $g_j$, the Kronecker delta $\delta(g_i, g_j) = 1$ if $g_i = g_j$ and it equals 0 otherwise, $\gamma$ is a resolution parameter (which we will call a structural resolution parameter), and $P_{ij}$ is the expected weight of the edge connecting node i to node j under a specified null model. The choice $\gamma = 1$ is very common, but it is important to consider multiple values of $\gamma$ to examine groups at multiple scales [16,30,31]. Maximization of $Q_0$ yields a hard partition of a network into communities such that the total edge weight inside of modules is as large as possible (relative to the null model and subject to the limitations of the employed computational heuristics, as optimizing $Q_0$ is NP-hard [16,20,32]).
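To make Eq. (1) concrete, the sketch below evaluates $Q_0$ for a given hard partition by summing $A_{ij} - \gamma P_{ij}$ over all ordered node pairs in the same community. It only evaluates the quality function and does not optimize it. The constant null model used in the example is an arbitrary illustrative choice, not one taken from the paper, and the function name is hypothetical.

```python
def modularity_q0(A, P, g, gamma=1.0):
    """Evaluate Q_0 = sum_ij [A_ij - gamma * P_ij] * delta(g_i, g_j)."""
    n = len(A)
    q = 0.0
    for i in range(n):
        for j in range(n):
            if g[i] == g[j]:  # Kronecker delta of the community labels
                q += A[i][j] - gamma * P[i][j]
    return q


# Four nodes: strong edges 0-1 and 2-3, one weak edge 1-2.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
P = [[0.25] * 4 for _ in range(4)]  # illustrative constant null model

print(modularity_q0(A, P, [0, 0, 1, 1]))             # prints 2.0
print(modularity_q0(A, P, [0, 0, 0, 0]))             # prints 1.0
print(modularity_q0(A, P, [0, 0, 1, 1], gamma=2.0))  # prints 0.0
```

In this toy case the two-community partition scores higher than the single community at $\gamma = 1$, and raising $\gamma$ lowers the score of any fixed partition because the null-model term is weighted more heavily.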
Recently, the null model in the quality function (1) has been generalized so that one can consider sets of L adjacency matrices, which are combined to form a rank-3 adjacency tensor A that can be used to represent time-dependent or multiplex networks. One can thereby define a multilayer modularity (also called “multislice modularity”) [3]
$$Q = \frac{1}{2\mu} \sum_{ijlr} \{ (A_{ijl} - \gamma_l P_{ijl}) \, \delta_{lr} + \delta_{ij} \, \omega_{jlr} \} \, \delta(g_{il}, g_{jr}), \qquad (2)$$
where the adjacency matrix of layer l has components $A_{ijl}$, the element $P_{ijl}$ gives the components of the corresponding layer-l matrix for the optimization null model, $\gamma_l$ is the structural resolution parameter of layer l, the quantity $g_{il}$ gives the community assignment of node i in layer l, the quantity $g_{jr}$ gives the community assignment of node j in layer r, the element $\omega_{jlr}$ gives the connection strength (i.e., an “interlayer coupling parameter,” which one can call a temporal resolution parameter if one is using the adjacency tensor to represent a time-dependent network) from node j in layer r to node j in layer l, the total edge weight in the network is $\mu = \frac{1}{2} \sum_{jr} \kappa_{jr}$, the strength (i.e., weighted degree) of node j in layer l is $\kappa_{jl} = k_{jl} + c_{jl}$, the intra-layer strength of node j in layer l is $k_{jl} = \sum_i A_{ijl}$, and the inter-layer strength of node j in layer l is $c_{jl} = \sum_r \omega_{jlr}$.
Equivalent representations that use other notation can, of course, be useful. For example, multilayer modularity can be recast as a set of rank-2 matrices describing connections between the set of all nodes across layers [e.g., for spectral partitioning [29,33,34]]. One can similarly generalize Q for higher-rank tensors, which one can use when studying community structure in networks that are both time-dependent and multiplex, through appropriate specification of inter-layer coupling tensors.
B. Network diagnostics
To characterize multilayer community structure, we compute four example diagnostics for each hard partition: the modularity Q, the number of modules n, the mean community size s (which is equal to the number of nodes in the community and is proportional to 1/n), and the stationarity $\zeta$ [35]. To compute $\zeta$, we calculate the autocorrelation function $U(t, t+m)$ of two states of the same community $G(t)$ at m time steps (i.e., m network layers) apart
$$U(t, t+m) \equiv \frac{|G(t) \cap G(t+m)|}{|G(t) \cup G(t+m)|}, \qquad (3)$$
where $|G(t) \cap G(t+m)|$ is the number of nodes that are members of both $G(t)$ and $G(t+m)$, and $|G(t) \cup G(t+m)|$ is the number of nodes in the union of the community at times t and t+m. Defining $t_0$ to be the first time step in which the community exists and $t'$ to be the last time in which it exists, the stationarity of a community is [35]
$$\zeta \equiv \frac{\sum_{t=t_0}^{t'-1} U(t, t+1)}{t' - t_0}. \qquad (4)$$
This gives the mean autocorrelation over consecutive time steps [36].
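Equations (3) and (4) amount to a Jaccard overlap between consecutive states of a community, averaged over its lifetime. The sketch below assumes the community is given as a list of node sets, one per time step from the first to the last step in which it exists; the function names and the toy history are illustrative.

```python
def autocorrelation(g_t, g_tm):
    """Eq. (3): |G(t) ∩ G(t+m)| / |G(t) ∪ G(t+m)| for two node sets."""
    union = g_t | g_tm
    if not union:
        raise ValueError("both community states are empty")
    return len(g_t & g_tm) / len(union)


def stationarity(states):
    """Eq. (4): mean autocorrelation over consecutive time steps."""
    if len(states) < 2:
        raise ValueError("need at least two time steps")
    steps = len(states) - 1  # number of consecutive pairs between first and last step
    total = sum(autocorrelation(states[t], states[t + 1]) for t in range(steps))
    return total / steps


# A community that loses node 3 and later gains node 4 (toy history).
history = [{1, 2, 3}, {1, 2, 3}, {1, 2}, {1, 2, 4}]
print(stationarity(history))  # (1 + 2/3 + 2/3) / 3, about 0.78
```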
In addition to these diagnostics, which are defined using the entire multilayer community structure, we also compute two example diagnostics on the community structures of the component layers: the mean single-layer modularity $\langle Q_s \rangle$ and the variance $\mathrm{var}(Q_s)$ of the single-layer modularity over all layers. The single-layer modularity $Q_s$ is defined as the static modularity quality function, $Q_s = \sum_{ij} [A_{ij} - \gamma P_{ij}] \, \delta(g_i, g_j)$, computed for the partition g that we obtained via optimization of the multilayer modularity function Q. We have chosen to use a few simple ways to help characterize the time series for $Q_s$, though of course other diagnostics can also be informative.
C. Data sets
We illustrate dynamic network null models using two example network ensembles: (1) 75-time-layer brain networks drawn from each of 20 human subjects and (2) behavioral networks with about 150 time layers drawn from each of 22 human subjects. Importantly, the use of network
013142-3 Bassett et al. Chaos 23, 013142 (2013)
Downloaded 18 Mar 2013 to 128.146.70.188. Redistribution subject to AIP license or copyright; see http://chaos.aip.org/about/rights_and_permissions
ref) D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos 23, 013142 (2013).
clustering techniques have been developed to identify communities, and they have yielded insights in the study of the committee structure in the United States Congress [18], functional groups in protein interaction networks [19], functional modules in brain networks [4], and more. A particularly successful technique for identifying communities in networks [16,20] is optimization of a quality function known as “modularity” [21], which recently has been generalized for detecting communities in time-dependent and multiplex networks [3].
Modularity optimization allows one to algorithmically partition a network’s nodes into communities such that the total connection strength within groups of the partition is more than would be expected in some null model. However, modularity optimization always yields a network partition (into a set of communities) as an output whether or not a given network truly contains modular structure. Therefore, application of subsequent diagnostics to a network partition is potentially meaningless without some comparison to benchmark or null-model networks. That is, it is important to establish whether the partition(s) obtained appear to represent meaningful community structures within the network data or whether they might have reasonably arisen at random. Moreover, robust assessment of network organization depends fundamentally on the development of statistical techniques to compare structures in a network derived from real data to those in appropriate models (see, e.g., Ref. 22). Indeed, as the constraints in null models and network benchmarks become more stringent, it can become possible to make stronger claims when interpreting organizational structures such as community structure.
In the present paper, we examine null models in time-dependent networks and investigate their use in the algorithmic detection of cohesive, dynamic communities in such networks (see Fig. 2). Indeed, community detection in temporal networks necessitates the development of null models that are appropriate for such networks. Such null models can help provide bases of comparison at various stages of the community-detection process, and they can thereby facilitate the principled identification of dynamic structure in networks. Indeed, the importance of developing null models extends beyond community detection, as such models make it possible to obtain statistically significant estimates of network diagnostics.
Our dynamic network null models fall into two categories: optimization null models, which we use in the identification of community structure; and post-optimization null models, which we use to examine the identified community structure. We describe how these null models can be selected in a manner appropriate to known features of a network’s construction, identify potentially interesting network scales by determining values of interest for structural and temporal resolution parameters, and inform the choice of representative partitions of a network into communities.
II. METHODS
A. Community detection
Community-detection algorithms provide ways to decompose a network into dense groups of nodes called “modules”
FIG. 1. An important property of many real-world networks is community structure, in which there exist cohesive groups of nodes such that a network has stronger connections within such groups than it does between such groups. Community structure often changes in time, which can lead to the rearrangement of cohesive groups, the formation of new groups, and the fragmentation of existing groups.
FIG. 2. Methodological considerations important in the investigation of dynamic community structure in temporal networks. (A) Depending on the system under study, a single network layer (which is represented using an ordinary adjacency matrix with an extra index to indicate the layer) might by definition only allow edges from some subset of the complete set of node pairs, as is the case in the depicted chain-like graph. We call such a situation partial connectivity. (B) Although the most common optimization null model employs random graphs (e.g., the Newman-Girvan null model, which is closely related to the configuration model [1,16]), other models can also provide important insights into network community structure. (C) After determining a set of partitions that maximize the modularity Q (or a similar quality function), it is interesting to test whether the community structure is different from, for example, what would be expected with a scrambling of time layers (i.e., a temporal null model) or node identities (i.e., a nodal null model) [4].
As described in more detail in Ref. 13, we construct an ensemble of 66 behavioral networks from 22 individuals and 3 experimental conditions. These networks represent a set of finger movements in the same simple motor learning experiment from which we constructed the brain networks in data set 1. Subjects were instructed to press a sequence of buttons corresponding to a sequence of 12 pseudo-musical notes shown to them on a screen.
Each node represents an interval between consecutive button presses. A single network layer consists of N = 11 nodes (i.e., there is one interval between each pair of notes), which are connected in a chain via weighted, undirected edges. In Ref. 13, we examined the phenomenon of motor “chunking,” which is a fascinating but poorly understood phenomenon in which groups of movements are made with similar inter-movement durations. (This is similar to remembering a phone number in groups of a few digits or grouping notes together as one masters how to play a song.) For each experimental trial l and each pair of inter-movement intervals i and j, we define the weight of an edge connecting inter-movement i to inter-movement j as the normalized similarity in inter-movement durations. The normalized similarity between nodes i and j is defined as
$$\rho_{ijl} = \frac{\bar{d}_l - d_{ijl}}{\bar{d}_l}, \qquad (6)$$
where $d_{ijl}$ is the absolute value of the difference of lengths of the ith and jth inter-movement time intervals in trial l and $\bar{d}_l$ is the maximum value of $d_{ijl}$ in trial l. These weights yield the elements $W_{ijl}$ of a weighted, undirected multilayer network W. Because finger movements occur in series, inter-movement i is connected in time to inter-movement $i \pm 1$ but not to any other inter-movements $i + n$ for $|n| \neq 1$.
To encode this conceptual relationship as a network, we set all non-contiguous connections in W to 0 and thereby construct a weighted, undirected chain network A. In Fig. 3(b), we show an example trial layer from A for a single subject in this experimental data. We couple layers of A to one another with weight $\omega_{jlr}$, which gives the connection strength between node j in experimental trial r and node j in trial l. In a given instantiation of the network, we again let $\omega_{jlr} \equiv \omega \in [0.1, 40]$ be identical for all nodes j for all connections between nearest-neighbor layers. (Again, $\omega_{jlr} = 0$ in all other cases.) Because inter-movement nodes are ordered, one can apply community-detection algorithms to identify communities of nodes in sequence. Each community represents a motor “chunk.”
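The construction of one chain layer from Eq. (6) can be sketched as follows. The input is assumed to be a list of inter-movement durations for a single trial (here in milliseconds); the function name `chain_layer`, the units, and the toy values are illustrative. Setting the diagonal to 0 along with all other non-neighbouring pairs is an assumption of this sketch.

```python
def chain_layer(durations):
    """Return (W, A): the full similarity matrix of Eq. (6) and its chain version."""
    n = len(durations)
    # d_ij: absolute difference between the i-th and j-th durations
    d = [[abs(durations[i] - durations[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d)  # d-bar: the largest difference in this trial
    if d_max == 0:
        raise ValueError("all durations are equal, so Eq. (6) is undefined")
    W = [[(d_max - d[i][j]) / d_max for j in range(n)] for i in range(n)]
    # Keep only connections between consecutive intervals (i, i ± 1).
    A = [[W[i][j] if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    return W, A


# Two fast intervals followed by two slow ones (toy data, in ms).
W, A = chain_layer([200, 250, 600, 550])
print(A[0][1], A[1][2], A[2][3])  # prints 0.875 0.125 0.875
```

In the toy trial the weak middle edge separates two groups of similar durations, which is the kind of structure a motor "chunk" would correspond to.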
III. RESULTS
A. Modularity-optimization null models
After constructing a multilayer network A with elements $A_{ijl}$, it is necessary to select an optimization null model P in Eq. (2). The most common modularity-optimization null model used in undirected, single-layer networks is the Newman-Girvan null model [16,20,21,28,29]
$$P_{ij} = \frac{k_i k_j}{2m}, \qquad (7)$$
where $k_i = \sum_j A_{ij}$ is the strength of node i and $m = \frac{1}{2} \sum_{ij} A_{ij}$. The definition (7) can be extended to multilayer networks using
$$P_{ijl} = \frac{k_{il} k_{jl}}{2 m_l}, \qquad (8)$$
where $k_{il} = \sum_j A_{ijl}$ is the strength of node i in layer l and $m_l = \frac{1}{2} \sum_{ij} A_{ijl}$. Optimization of Q using the null model (8) identifies partitions of a network into groups that have more connections (in the case of binary networks) or higher connection densities (in the case of weighted networks) than would be expected for the distribution of connections (or connection densities) expected in a null model. We use the notation $A_l$ for the layer-l adjacency matrix composed of elements $A_{ijl}$ and the notation $P_l$ to denote the layer-l null-model matrix with elements $P_{ijl}$. See Fig. 4(a) for an example layer $A_l$ from a multilayer behavioral network and Fig. 4(b) for an example instantiation of the Newman-Girvan null model $P_l$.
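Equation (7) can be written out for a single layer in a few lines. This is a minimal sketch for a symmetric weighted adjacency matrix stored as nested lists; applying it layer by layer gives Eq. (8). The function name is illustrative.

```python
def newman_girvan_null(A):
    """Return P with P_ij = k_i * k_j / (2m), where k_i = sum_j A_ij."""
    n = len(A)
    k = [sum(row) for row in A]  # node strengths
    two_m = sum(k)               # 2m = sum_ij A_ij
    if two_m == 0:
        raise ValueError("the network has no edge weight")
    return [[k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


# Three-node path 0 - 1 - 2 with unit weights.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
P = newman_girvan_null(A)
print(P[0][1], P[0][2], P[1][1])  # prints 0.5 0.25 1.0
```

Each row of P sums to the strength of the corresponding node, and P assigns nonzero expected weight to the pair (0, 2) even though those nodes are not adjacent in the chain.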
1. Optimization null models for ordered node networks
The Newman-Girvan null model is particularly useful for networks with categorical nodes, in which a connection
FIG. 3. Network layers and community assignments from two example data sets: (A) a brain network based on correlations between blood-oxygen-level-dependent (BOLD) signals [4] and (B) a behavioral network based on similarities in movement times during a simple motor learning experiment [13]. We use these data sets to illustrate situations with categorical nodes and ordered nodes, respectively. In the bottom panels, we show community assignments obtained using multilayer community detection for (C) the brain networks and (D) the behavioral networks.
between any pair of nodes can occur in theory. However, when using a chain network of ordered nodes, it is useful to consider alternative null models. For example, in a network represented by an adjacency matrix $A$, one can define
$$P_{ij} = \rho A'_{ij}, \qquad (9)$$
where $\rho$ is the mean edge weight of the chain network and $A'$ is the binarized version of $A$, in which nonzero elements of $A$ are set to 1 and zero-valued elements remain unaltered. Such a null model can also be defined for a multilayer network that is represented by a rank-3 adjacency tensor A. One can construct a null model P with components
$$P_{ijl} = \rho_l A'_{ijl}, \qquad (10)$$
where $\rho_l$ is the mean edge weight in layer l and $A'$ is the binarized version of A. The optimization of Q using this null model identifies partitions of a network whose communities have a larger strength than the mean. See Fig. 4(c) for an example of this chain null model $P_l$ for the behavioral network layer shown in Fig. 4(a).
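The chain null model of Eq. (9) can be sketched for one layer as follows. The sketch assumes that the mean edge weight is taken over the existing edges only (the nonzero entries of the layer); that reading, the function name, and the toy layer are illustrative.

```python
def chain_null_model(A):
    """Return P with P_ij = rho * A'_ij, where A' is the binarized A."""
    weights = [w for row in A for w in row if w != 0]
    if not weights:
        raise ValueError("the layer has no edges")
    rho = sum(weights) / len(weights)  # mean weight over existing edges (assumption)
    return [[rho if w != 0 else 0.0 for w in row] for row in A]


# Chain layer with two strong edges and one weak edge in the middle.
A = [
    [0.0, 0.875, 0.0, 0.0],
    [0.875, 0.0, 0.125, 0.0],
    [0.0, 0.125, 0.0, 0.875],
    [0.0, 0.0, 0.875, 0.0],
]
P = chain_null_model(A)
print(P[0][1], P[0][2])             # prints 0.625 0.0
print(A[0][1] - P[0][1], A[1][2] - P[1][2])  # prints 0.25 -0.5
```

With $\gamma = 1$, only edges heavier than the mean contribute positively to the modularity, so in the toy layer the pairs (0, 1) and (2, 3) are favoured as communities while the weak edge (1, 2) is penalized.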
In Fig. 4(d), we illustrate the effect that the choice of optimization null model has on the modularity values Q of the behavioral networks as a function of the structural resolution parameter. (Throughout the manuscript, we use a Louvain-like locally greedy algorithm to maximize the multilayer modularity quality function [57,58].) The Newman-Girvan null model gives decreasing values of Q for $\gamma \in [0.1, 2.1]$, whereas the chain null model produces lower values of Q, which behaves in a qualitatively different
FIG. 4. Modularity-optimization null models. (A) Example layer $A_l$ from a behavioral network. (B) Newman-Girvan and (C) chain null models $P_l$ for the layer shown in panel (A). (D) Optimized multilayer modularity value Q, (E) number of communities n, and (F) mean community size s for the complete multilayer behavioral network employing the Newman-Girvan (black) and chain (red) optimization null models as a function of the structural resolution parameter $\gamma$. (G) Optimized modularity value Q, (H) number of communities n, and (I) mean community size s for the multilayer behavioral network employing chain optimization null models as a function of the effective fraction $\xi_{ml}(\gamma)$ of edges that have larger weights than their null-model counterparts. We averaged the values of Q, n, and s over the 3 different 12-note sequences and C = 100 optimizations. Box plots in (D-F) indicate quartiles and 95% confidence intervals over the 22 individuals in the study. The error bars in panels (G-I) indicate a standard deviation from the mean. In some instances, this is smaller than the line width. The temporal resolution-parameter value is $\omega = 1$.
013142-6 Bassett et al. Chaos 23, 013142 (2013)
Downloaded 18 Mar 2013 to 128.146.70.188. Redistribution subject to AIP license or copyright; see http://chaos.aip.org/about/rights_and_permissions
between any pair of nodes can occur in theory. However, when using a chain network of ordered nodes, it is useful to consider alternative null models. For example, in a network represented by an adjacency matrix $A$, one can define
$$P_{ij} = \rho A'_{ij}, \tag{9}$$
where $\rho$ is the mean edge weight of the chain network and $A'$ is the binarized version of $A$, in which nonzero elements of $A$ are set to 1 and zero-valued elements remain unaltered. Such a null model can also be defined for a multilayer network that is represented by a rank-3 adjacency tensor $A$. One can construct a null model $P$ with components
$$P_{ijl} = \rho_l A'_{ijl}, \tag{10}$$
where $\rho_l$ is the mean edge weight in layer $l$ and $A'$ is the binarized version of $A$. The optimization of $Q$ using this null model identifies partitions of a network whose communities have a larger strength than the mean. See Fig. 4(c) for an example of this chain null model $P_l$ for the behavioral network layer shown in Fig. 4(a).
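As a minimal sketch of Eqs. (9) and (10), the chain null model can be built from a weighted adjacency matrix in a few lines of Python. The function name `chain_null_model`, the toy matrix, and the reading of "mean edge weight" as the mean over the nonzero entries of $A$ are assumptions made for illustration; they are not taken from Bassett et al.

```python
def chain_null_model(A):
    """Return P with P[i][j] = rho * A'[i][j], following Eq. (9).

    A' is the binarized adjacency matrix. rho is taken here to be the mean
    of the nonzero entries of A (an assumption about "mean edge weight").
    """
    n = len(A)
    binarized = [[1 if A[i][j] != 0 else 0 for j in range(n)] for i in range(n)]
    weights = [A[i][j] for i in range(n) for j in range(n) if A[i][j] != 0]
    rho = sum(weights) / len(weights) if weights else 0.0
    return [[rho * binarized[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy chain of 4 ordered nodes with link weights 1, 2, 3.
A = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 3],
     [0, 0, 3, 0]]
P = chain_null_model(A)

# Multilayer case, Eq. (10): apply the same construction layer by layer.
layers = [A, A]
P_layers = [chain_null_model(layer) for layer in layers]
```

Every existing link receives the same expected weight $\rho$, so a community is rewarded only where its links are heavier than average.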
In Fig. 4(d), we illustrate the effect that the choice of optimization null model has on the modularity values $Q$ of the behavioral networks as a function of the structural resolution parameter. (Throughout the manuscript, we use a Louvain-like locally greedy algorithm to maximize the multilayer modularity quality function; see Refs. 57 and 58.) The Newman-Girvan null model gives decreasing values of $Q$ for $\gamma \in [0.1, 2.1]$, whereas the chain null model produces lower values of $Q$, which behaves in a qualitatively different
FIG. 4. Modularity-optimization null models. (A) Example layer $A_l$ from a behavioral network. (B) Newman-Girvan and (C) chain null models $P_l$ for the layer shown in panel (A). (D) Optimized multilayer modularity value $Q$, (E) number of communities $n$, and (F) mean community size $s$ for the complete multilayer behavioral network employing the Newman-Girvan (black) and chain (red) optimization null models as a function of the structural resolution parameter $\gamma$. (G) Optimized modularity value $Q$, (H) number of communities $n$, and (I) mean community size $s$ for the multilayer behavioral network employing chain optimization null models as a function of the effective fraction $\xi_{ml}(\gamma)$ of edges that have larger weights than their null-model counterparts. We averaged the values of $Q$, $n$, and $s$ over the 3 different 12-note sequences and $C = 100$ optimizations. Box plots in (D-F) indicate quartiles and 95% confidence intervals over the 22 individuals in the study. The error bars in panels (G-I) indicate a standard deviation from the mean. In some instances, this is smaller than the line width. The temporal resolution-parameter value is $\omega = 1$.
013142-6 Bassett et al. Chaos 23, 013142 (2013)
or “communities.” Intuitively, a community consists of a set of nodes that are connected among one another more densely than they are to nodes in other communities. A popular way to identify community structure is to optimize a quality function, which can be used to measure the relative densities of intra-community connections versus inter-community connections. See Refs. 16, 20, and 23 for recent reviews on network community structure and Refs. 24–27 for discussions of various caveats that should be considered when optimizing quality functions to detect communities.
One begins with a network of $N$ nodes and a given set of connections between those nodes. In the usual case of single-layer networks (e.g., static networks with only one type of edge), one represents a network using an $N \times N$ adjacency matrix $A$. The element $A_{ij}$ of the adjacency matrix indicates a direct connection or “edge” from node $i$ to node $j$, and its value indicates the weight of that connection. The quality of a hard partition of $A$ into communities (whereby each node is assigned to exactly one community) can be quantified using a quality function. The most popular choice is modularity (Refs. 16, 20, 21, 28, and 29)
$$Q_0 = \sum_{ij} \left[ A_{ij} - \gamma P_{ij} \right] \delta(g_i, g_j), \tag{1}$$
where node $i$ is assigned to community $g_i$, node $j$ is assigned to community $g_j$, the Kronecker delta $\delta(g_i, g_j) = 1$ if $g_i = g_j$ and it equals 0 otherwise, $\gamma$ is a resolution parameter (which we will call a structural resolution parameter), and $P_{ij}$ is the expected weight of the edge connecting node $i$ to node $j$ under a specified null model. The choice $\gamma = 1$ is very common, but it is important to consider multiple values of $\gamma$ to examine groups at multiple scales (Refs. 16, 30, and 31). Maximization of $Q_0$ yields a hard partition of a network into communities such that the total edge weight inside of modules is as large as possible (relative to the null model and subject to the limitations of the employed computational heuristics, as optimizing $Q_0$ is NP-hard; see Refs. 16, 20, and 32).
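To make Eq. (1) concrete, here is a small Python sketch that evaluates $Q_0$ for a given hard partition, using the Newman-Girvan null model $P_{ij} = k_i k_j / 2m$ as one possible choice of $P$. The function names and the toy graph (two triangles joined by a single edge) are hypothetical; the code only evaluates the quality function and does not optimize it.

```python
def newman_girvan_null(A):
    """Newman-Girvan null model: P[i][j] = k_i * k_j / (2m)."""
    n = len(A)
    k = [sum(row) for row in A]          # node strengths
    two_m = sum(k)                        # twice the total edge weight
    return [[k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def modularity_q0(A, P, g, gamma=1.0):
    """Evaluate Eq. (1): sum of (A_ij - gamma * P_ij) over same-community pairs."""
    n = len(A)
    return sum(A[i][j] - gamma * P[i][j]
               for i in range(n) for j in range(n) if g[i] == g[j])


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

P = newman_girvan_null(A)
q_split = modularity_q0(A, P, [0, 0, 0, 1, 1, 1])   # two communities
q_single = modularity_q0(A, P, [0] * n)              # one big community
```

Splitting the graph into its two triangles scores higher than keeping all nodes together, and raising `gamma` penalizes large communities more strongly.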
Recently, the null model in the quality function (1) has been generalized so that one can consider sets of $L$ adjacency matrices, which are combined to form a rank-3 adjacency tensor $A$ that can be used to represent time-dependent or multiplex networks. One can thereby define a multilayer modularity (also called “multislice modularity”; Ref. 3)
$$Q = \frac{1}{2\mu} \sum_{ijlr} \left\{ \left( A_{ijl} - \gamma_l P_{ijl} \right) \delta_{lr} + \delta_{ij} \omega_{jlr} \right\} \delta(g_{il}, g_{jr}), \tag{2}$$
where the adjacency matrix of layer $l$ has components $A_{ijl}$, the element $P_{ijl}$ gives the components of the corresponding layer-$l$ matrix for the optimization null model, $\gamma_l$ is the structural resolution parameter of layer $l$, the quantity $g_{il}$ gives the community assignment of node $i$ in layer $l$, the quantity $g_{jr}$ gives the community assignment of node $j$ in layer $r$, the element $\omega_{jlr}$ gives the connection strength (i.e., an “interlayer coupling parameter,” which one can call a temporal resolution parameter if one is using the adjacency tensor to represent a time-dependent network) from node $j$ in layer $r$ to node $j$ in layer $l$, the total edge weight in the network is $\mu = \frac{1}{2} \sum_{jr} \kappa_{jr}$, the strength (i.e., weighted degree) of node $j$ in layer $l$ is $\kappa_{jl} = k_{jl} + c_{jl}$, the intra-layer strength of node $j$ in layer $l$ is $k_{jl} = \sum_i A_{ijl}$, and the inter-layer strength of node $j$ in layer $l$ is $c_{jl} = \sum_r \omega_{jlr}$.
Equivalent representations that use other notation can, of course, be useful. For example, multilayer modularity can be recast as a set of rank-2 matrices describing connections between the set of all nodes across layers [e.g., for spectral partitioning (Refs. 29, 33, and 34)]. One can similarly generalize $Q$ for higher-rank tensors, which one can use when studying community structure in networks that are both time-dependent and multiplex, through appropriate specification of inter-layer coupling tensors.
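The multilayer quality function in Eq. (2) can be evaluated directly from its definition. The sketch below is illustrative only: the function name, the nested-list data layout (`A[l][i][j]`, `g[l][i]`, `omega[j][l][r]`), and the two-layer toy example are assumptions, and the code evaluates $Q$ for a given partition rather than maximizing it.

```python
def multilayer_modularity(A, P, g, gamma, omega):
    """Evaluate Eq. (2) for a given multilayer partition.

    A[l][i][j], P[l][i][j]: adjacency and null-model entries of layer l.
    g[l][i]: community of node i in layer l.
    gamma[l]: structural resolution parameter of layer l.
    omega[j][l][r]: coupling of node j between layers l and r.
    """
    L, n = len(A), len(A[0])

    # 2*mu = sum over (j, l) of kappa_jl = k_jl + c_jl
    two_mu = 0.0
    for l in range(L):
        for j in range(n):
            k_jl = sum(A[l][i][j] for i in range(n))
            c_jl = sum(omega[j][l][r] for r in range(L))
            two_mu += k_jl + c_jl

    total = 0.0
    # Intra-layer term: (A_ijl - gamma_l * P_ijl) * delta_lr
    for l in range(L):
        for i in range(n):
            for j in range(n):
                if g[l][i] == g[l][j]:
                    total += A[l][i][j] - gamma[l] * P[l][i][j]
    # Inter-layer term: delta_ij * omega_jlr
    for j in range(n):
        for l in range(L):
            for r in range(L):
                if g[l][j] == g[r][j]:
                    total += omega[j][l][r]
    return total / two_mu


# Hypothetical toy: 2 nodes, 2 identical layers, uniform coupling omega = 1.
layer = [[0, 1], [1, 0]]
null = [[0.5, 0.5], [0.5, 0.5]]           # k_i * k_j / 2m with k = [1, 1]
A = [layer, layer]
P = [null, null]
omega = [[[0, 1], [1, 0]] for _ in range(2)]
Q_same = multilayer_modularity(A, P, [[0, 0], [0, 0]], [1.0, 1.0], omega)
Q_diff = multilayer_modularity(A, P, [[0, 0], [1, 1]], [1.0, 1.0], omega)
```

In this toy case the partition that keeps each node in the same community across layers collects the inter-layer coupling, which is how $\omega$ acts as a temporal resolution parameter.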
B. Network diagnostics
To characterize multilayer community structure, we compute four example diagnostics for each hard partition: the modularity $Q$, the number of modules $n$, the mean community size $s$ (which is equal to the number of nodes in the community and is proportional to $1/n$), and the stationarity $\zeta$ (Ref. 35). To compute $\zeta$, we calculate the autocorrelation function $U(t, t+m)$ of two states of the same community $G(t)$ at $m$ time steps (i.e., $m$ network layers) apart
$$U(t, t+m) \equiv \frac{|G(t) \cap G(t+m)|}{|G(t) \cup G(t+m)|}, \tag{3}$$
where $|G(t) \cap G(t+m)|$ is the number of nodes that are members of both $G(t)$ and $G(t+m)$, and $|G(t) \cup G(t+m)|$ is the number of nodes in the union of the community at times $t$ and $t+m$. Defining $t_0$ to be the first time step in which the community exists and $t'$ to be the last time in which it exists, the stationarity of a community is (Ref. 35)
$$\zeta \equiv \frac{\sum_{t=t_0}^{t'-1} U(t, t+1)}{t' - t_0}. \tag{4}$$
This gives the mean autocorrelation over consecutive time steps (Ref. 36).
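Equations (3) and (4) amount to a Jaccard overlap between consecutive states of a community, averaged over its lifetime. A short Python sketch follows, assuming each state of the community is given as a set of node labels; the function names and the example node sets are hypothetical.

```python
def autocorrelation(G_t, G_tm):
    """Eq. (3): overlap of two states of one community (Jaccard index)."""
    G_t, G_tm = set(G_t), set(G_tm)
    return len(G_t & G_tm) / len(G_t | G_tm)


def stationarity(states):
    """Eq. (4): mean autocorrelation over consecutive time steps.

    `states` lists the node sets of one community from its first time
    step t_0 to its last time step t', so len(states) - 1 = t' - t_0.
    """
    if len(states) < 2:
        raise ValueError("need at least two time steps")
    steps = len(states) - 1
    return sum(autocorrelation(states[t], states[t + 1])
               for t in range(steps)) / steps


# Hypothetical community observed in three consecutive layers.
states = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}]
zeta = stationarity(states)
```

Here the first step contributes an overlap of 1 and the second an overlap of 2/4, so the stationarity is their mean.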
In addition to these diagnostics, which are defined using the entire multilayer community structure, we also compute two example diagnostics on the community structures of the component layers: the mean single-layer modularity $\langle Q_s \rangle$ and the variance $\mathrm{var}(Q_s)$ of the single-layer modularity over all layers. The single-layer modularity $Q_s$ is defined as the static modularity quality function, $Q_s = \sum_{ij} [A_{ij} - \gamma P_{ij}] \delta(g_i, g_j)$, computed for the partition $g$ that we obtained via optimization of the multilayer modularity function $Q$. We have chosen to use a few simple ways to help characterize the time series for $Q_s$, though of course other diagnostics can also be informative.
C. Data sets
We illustrate dynamic network null models using two example network ensembles: (1) 75-time-layer brain networks drawn from each of 20 human subjects and (2) behavioral networks with about 150 time layers drawn from each of 22 human subjects. Importantly, the use of network
$P_{ij} = \rho A'_{ij}$ (chain null model)
$P_{ij} = \frac{k_i k_j}{2m}$ (Newman-Girvan null model)
other null-model terms in the modularity than Newman-Girvan?
ref) D. S. Bassett, M. A. Porter, N. F. Wymbs, S. T. Grafton, J. M. Carlson, and P. J. Mucha, Chaos 23, 013142 (2013).
our data: not exactly (exclusively) the chain shape
taking advantage of the contact probability scaling?
cf) $$p_{\mathrm{contact}} \sim \begin{cases} s^{-3/2} & \text{(equilibrium globule)} \\ s^{-1} & \text{(fractal globule)} \end{cases}$$
ref) E. Lieberman-Aiden et al., Science 326, 289 (2009); L. A. Mirny, Chromosome Res. 19, 37 (2011).
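One way the contact-probability scaling could enter the modularity is as a distance-dependent null model, with the expected weight between loci $i$ and $j$ decaying as a power of their genomic separation $s = |i - j|$. The Python sketch below is purely an illustrative assumption and not a method stated on the slide: the function name `power_law_null_model`, the exponent parameter `alpha`, and the normalization (matching the total off-diagonal weight of $A$) are all hypothetical choices.

```python
def power_law_null_model(A, alpha=1.0):
    """Hypothetical null model: P[i][j] proportional to |i - j| ** (-alpha).

    alpha = 1.0 mimics the fractal-globule scaling and alpha = 1.5 the
    equilibrium-globule scaling. P is rescaled so that its total weight
    equals the total off-diagonal weight of A (an assumed normalization).
    Requires at least two loci.
    """
    n = len(A)
    shape = [[0.0 if i == j else abs(i - j) ** (-alpha) for j in range(n)]
             for i in range(n)]
    total_weight = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    total_shape = sum(sum(row) for row in shape)
    scale = total_weight / total_shape
    return [[scale * shape[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy contact matrix for 4 ordered loci.
A = [[0, 1, 0, 0],
     [1, 0, 2, 0],
     [0, 2, 0, 3],
     [0, 0, 3, 0]]
P_fractal = power_law_null_model(A, alpha=1.0)
P_equilibrium = power_law_null_model(A, alpha=1.5)
```

A matrix built this way can be used as $P_{ij}$ in the modularity in place of the Newman-Girvan or chain terms, so that only contacts exceeding the background distance decay count toward a community.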
[Figure: PPV (precision) versus γ for chr1. Curves: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1). Axis ranges: γ from 0 to 1.2, PPV from 0 to 0.45.]
of the genome inferred from Hi-C. More generally, a strong correlation was observed between the number of Hi-C reads $m_{ij}$ and the 3D distance between locus i and locus j as measured by FISH [Spearman’s r = -0.916, P = 0.00003 (fig. S3)], suggesting that Hi-C read count may serve as a proxy for distance.
Upon close examination of the Hi-C data, we noted that pairs of loci in compartment B showed a consistently higher interaction frequency at a given genomic distance than pairs of loci in compartment A (fig. S4). This suggests that compartment B is more densely packed (15). The FISH data are consistent with this observation; loci in compartment B exhibited a stronger tendency for close spatial localization.
To explore whether the two spatial compartments correspond to known features of the genome, we compared the compartments identified in our 1-Mb correlation maps with known genetic and epigenetic features. Compartment A correlates strongly with the presence of genes (Spearman’s r = 0.431, P < $10^{-137}$), higher expression [via genome-wide mRNA expression, Spearman’s r = 0.476, P < $10^{-145}$ (fig. S5)], and accessible chromatin [as measured by deoxyribonuclease I (DNAseI) sensitivity, Spearman’s r = 0.651, P negligible] (16, 17). Compartment A also shows enrichment for both activating (H3K36 trimethylation, Spearman’s r = 0.601, P < $10^{-296}$) and repressive (H3K27 trimethylation, Spearman’s r = 0.282, P < $10^{-56}$) chromatin marks (18).
We repeated the above analysis at a resolution of 100 kb (Fig. 3G) and saw that, although the correlation of compartment A with all other genomic and epigenetic features remained strong (Spearman’s r > 0.4, P negligible), the correlation with the sole repressive mark, H3K27 trimethylation, was dramatically attenuated (Spearman’s r = 0.046, P < $10^{-15}$). On the basis of these results we concluded that compartment A is more closely associated with open, accessible, actively transcribed chromatin.
We repeated our experiment with K562 cells, an erythroleukemia cell line with an aberrant karyotype (19). We again observed two compartments; these were similar in composition to those observed in GM06990 cells [Pearson’s r = 0.732,
Fig. 4. The local packing of chromatin is consistent with the behavior of a fractal globule. (A) Contact probability as a function of genomic distance averaged across the genome (blue) shows a power law scaling between 500 kb and 7 Mb (shaded region) with a slope of -1.08 (fit shown in cyan). (B) Simulation results for contact probability as a function of distance (1 monomer ~ 6 nucleosomes ~ 1200 base pairs) (10) for equilibrium (red) and fractal (blue) globules. The slope for a fractal globule is very nearly -1 (cyan), confirming our prediction (10). The slope for an equilibrium globule is -3/2, matching prior theoretical expectations. The slope for the fractal globule closely resembles the slope we observed in the genome. (C) (Top) An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. (Middle) An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. (Bottom) A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross section. The structure lacks knots. (D) Genome architecture at three scales. (Top) Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. (Middle) Individual chromosomes weave back and forth between the open and closed chromatin compartments. (Bottom) At the scale of single megabases, the chromosome consists of a series of fractal globules.
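The power-law slope quoted in panel (A) can be estimated from any contact matrix by averaging contacts at each genomic separation and fitting a line in log-log space. The following Python sketch uses a synthetic matrix with an exact $s^{-1}$ decay; the function names, the matrix size, and the use of an unweighted least-squares fit over all separations are assumptions for illustration, not the procedure of Lieberman-Aiden et al.

```python
import math


def contact_probability_by_distance(M):
    """Mean contact count at each separation s = |i - j|, in bins."""
    n = len(M)
    return {s: sum(M[i][i + s] for i in range(n - s)) / (n - s)
            for s in range(1, n)}


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


# Synthetic contact matrix with M_ij = 1 / |i - j| (fractal-globule-like).
n = 50
M = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n)] for i in range(n)]
p = contact_probability_by_distance(M)
separations = sorted(p)
slope = loglog_slope(separations, [p[s] for s in separations])
```

On real Hi-C data the fit would be restricted to a chosen distance window, such as the 500 kb to 7 Mb range mentioned in the caption, and zero-count separations would need to be excluded before taking logarithms.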
9 OCTOBER 2009 VOL 326 SCIENCE www.sciencemag.org 292
REPORTS
on
July
7, 2
016
http
://sc
ienc
e.sc
ienc
emag
.org
/D
ownl
oade
d fr
om
of the genome inferred from Hi-C. More generally, a strong correlation was observed between the number of Hi-C reads m_ij and the 3D distance between locus i and locus j as measured by FISH [Spearman’s r = –0.916, P = 0.00003 (fig. S3)], suggesting that Hi-C read count may serve as a proxy for distance.
Upon close examination of the Hi-C data, we noted that pairs of loci in compartment B showed a consistently higher interaction frequency at a given genomic distance than pairs of loci in compartment A (fig. S4). This suggests that compartment B is more densely packed (15). The FISH data are consistent with this observation; loci in compartment B exhibited a stronger tendency for close spatial localization.
To explore whether the two spatial compartments correspond to known features of the genome, we compared the compartments identified in our 1-Mb correlation maps with known genetic and epigenetic features. Compartment A correlates strongly with the presence of genes (Spearman’s r = 0.431, P < 10^–137), higher expression [via genome-wide mRNA expression, Spearman’s r = 0.476, P < 10^–145 (fig. S5)], and accessible chromatin [as measured by deoxyribonuclease I (DNAseI) sensitivity, Spearman’s r = 0.651, P negligible] (16, 17). Compartment A also shows enrichment for both activating (H3K36 trimethylation, Spearman’s r = 0.601, P < 10^–296) and repressive (H3K27 trimethylation, Spearman’s r = 0.282, P < 10^–56) chromatin marks (18).
We repeated the above analysis at a resolution of 100 kb (Fig. 3G) and saw that, although the correlation of compartment A with all other genomic and epigenetic features remained strong (Spearman’s r > 0.4, P negligible), the correlation with the sole repressive mark, H3K27 trimethylation, was dramatically attenuated (Spearman’s r = 0.046, P < 10^–15). On the basis of these results we concluded that compartment A is more closely associated with open, accessible, actively transcribed chromatin.
We repeated our experiment with K562 cells, an erythroleukemia cell line with an aberrant karyotype (19). We again observed two compartments; these were similar in composition to those observed in GM06990 cells [Pearson’s r = 0.732,
Fig. 4. The local packing of chromatin is consistent with the behavior of a fractal globule. (A) Contact probability as a function of genomic distance averaged across the genome (blue) shows a power law scaling between 500 kb and 7 Mb (shaded region) with a slope of –1.08 (fit shown in cyan). (B) Simulation results for contact probability as a function of distance (1 monomer ~ 6 nucleosomes ~ 1200 base pairs) (10) for equilibrium (red) and fractal (blue) globules. The slope for a fractal globule is very nearly –1 (cyan), confirming our prediction (10). The slope for an equilibrium globule is –3/2, matching prior theoretical expectations. The slope for the fractal globule closely resembles the slope we observed in the genome. (C) (Top) An unfolded polymer chain, 4000 monomers (4.8 Mb) long. Coloration corresponds to distance from one endpoint, ranging from blue to cyan, green, yellow, orange, and red. (Middle) An equilibrium globule. The structure is highly entangled; loci that are nearby along the contour (similar color) need not be nearby in 3D. (Bottom) A fractal globule. Nearby loci along the contour tend to be nearby in 3D, leading to monochromatic blocks both on the surface and in cross section. The structure lacks knots. (D) Genome architecture at three scales. (Top) Two compartments, corresponding to open and closed chromatin, spatially partition the genome. Chromosomes (blue, cyan, green) occupy distinct territories. (Middle) Individual chromosomes weave back and forth between the open and closed chromatin compartments. (Bottom) At the scale of single megabases, the chromosome consists of a series of fractal globules.
9 OCTOBER 2009 VOL 326 SCIENCE www.sciencemag.org292
REPORTS
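The power-law scaling of contact probability quoted in the Fig. 4 caption (slope near –1 for a fractal globule, –3/2 for an equilibrium globule) can be illustrated with a small sketch. This is not the fitting procedure of the cited paper. The function names, the synthetic matrix, and the fitting window are assumptions made for illustration only.

```python
import math

def contact_probability_by_distance(matrix):
    """Average contact count at each genomic separation s = |i - j| >= 1."""
    n = len(matrix)
    profile = {}
    for s in range(1, n):
        values = [matrix[i][i + s] for i in range(n - s)]
        profile[s] = sum(values) / len(values)
    return profile

def fit_power_law_slope(profile, s_min, s_max):
    """Least-squares slope of log P(s) versus log s for s_min <= s <= s_max."""
    points = [(math.log(s), math.log(p)) for s, p in profile.items()
              if s_min <= s <= s_max and p > 0]
    if len(points) < 2:
        raise ValueError("need at least two points in the fitting window")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical synthetic map whose contacts decay exactly as 1/s.
n_bins = 200
synthetic = [[0.0 if i == j else 1.0 / abs(i - j) for j in range(n_bins)]
             for i in range(n_bins)]
slope = fit_power_law_slope(contact_probability_by_distance(synthetic), 5, 70)
print(f"fitted slope: {slope:.3f}")
```

On real Hi-C data the fitting window matters: the caption reports the scaling only between 500 kb and 7 Mb, so `s_min` and `s_max` would have to be chosen in bins matching that range.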
[Figure: PPV (precision) versus the resolution parameter γ for chr1. Panels: the original Newman-Girvan null model, the equilibrium globule null model, and the fractal globule null model, each at 100 kb and at 10 kb resolution. Legend in every panel: Rao et al.; CTCF peak (1); Rao et al.: randomized; CTCF peak (1): randomized; Rao et al. vs CTCF peak (1).]
different null model terms, taking advantage of the contact probability scaling
p_{\mathrm{contact}} \sim \begin{cases} s^{-3/2} & \text{(equilibrium globule)} \\ s^{-1} & \text{(fractal globule)} \end{cases}
the original Newman-Girvan null model (cf):
P^{NG}_{ij} = \frac{2m k_i k_j}{\sum_{i' \neq j'} k_{i'} k_{j'}} = \frac{2m k_i k_j}{(2m)(2m)} = \frac{k_i k_j}{2m}
the equilibrium globule null model:
P^{EG}_{ij} = \frac{2m k_i k_j |i - j|^{-3/2}}{\sum_{i' \neq j'} k_{i'} k_{j'} |i' - j'|^{-3/2}}
the fractal globule null model:
P^{FG}_{ij} = \frac{2m k_i k_j |i - j|^{-1}}{\sum_{i' \neq j'} k_{i'} k_{j'} |i' - j'|^{-1}}
the modularity
Q = \frac{1}{2m} \sum_{i \neq j} \left[ \left( A_{ij} - \gamma P^{(*)}_{ij} \right) \delta(g_i, g_j) \right], with P^{(*)}_{ij} one of P^{NG}_{ij}, P^{EG}_{ij}, P^{FG}_{ij}
better than the original null model?
→ fractal globule at the ~ Mb scale?
overcompensate?
ref) E. Lieberman-Aiden et al., Science 326, 289 (2009); L. A. Mirny, Chromosome Res. 19, 37 (2011).
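The null model terms and the modularity on this slide can be evaluated directly on a small matrix. The sketch below is illustrative and is not the code used for the talk. The function names and the toy 6-bin matrix are assumptions, and `alpha = 0` uses the normalized form of the Newman-Girvan term, which equals k_i k_j / 2m only approximately because the sum excludes i' = j'.

```python
def null_model(A, alpha):
    """P_ij = 2m k_i k_j |i-j|^(-alpha) / sum_{i'!=j'} k_i' k_j' |i'-j'|^(-alpha).

    alpha = 0 gives the (normalized) Newman-Girvan term, alpha = 1.5 the
    equilibrium globule term, alpha = 1 the fractal globule term.
    """
    n = len(A)
    k = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    two_m = sum(k)
    raw = [[0.0 if i == j else k[i] * k[j] * abs(i - j) ** (-alpha)
            for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in raw)
    return [[two_m * raw[i][j] / total for j in range(n)] for i in range(n)]

def modularity(A, P, groups, gamma=1.0):
    """Q = (1/2m) * sum_{i != j} (A_ij - gamma * P_ij) * delta(g_i, g_j)."""
    n = len(A)
    two_m = sum(A[i][j] for i in range(n) for j in range(n) if i != j)
    q = sum(A[i][j] - gamma * P[i][j]
            for i in range(n) for j in range(n)
            if i != j and groups[i] == groups[j])
    return q / two_m

# Hypothetical toy contact map: two 3-bin domains, strong contacts inside
# a domain (5) and weak contacts between domains (1).
groups = [0, 0, 0, 1, 1, 1]
A = [[0 if i == j else (5 if groups[i] == groups[j] else 1)
      for j in range(6)] for i in range(6)]

q_ng = modularity(A, null_model(A, 0.0), groups)
q_eg = modularity(A, null_model(A, 1.5), groups)
q_fg = modularity(A, null_model(A, 1.0), groups)
print(q_ng, q_eg, q_fg)
```

Because the distance-dependent terms already assign more expected weight to nearby bins, the same partition scores lower under them than under the Newman-Girvan term. This is one way to read the "overcompensate?" question on the slide.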
comparison with the fractal globule model
generated with the conformation dependent polymerization (CDP) fractal globule model, cf) M. V. Tamm et al., Phys. Rev. Lett. 114, 178102 (2015).
time and length scales the entanglements play a crucial role and the scaling theory [2] predicts α_ent = 1/4.
To check the predictions of the scaling theory we held out extensive computer simulations using the dissipative particle dynamics (DPD) technique, which is known [64,65] to correctly reflect dynamics of dense polymer systems. The polymer model we use consists of renormalized monomers with the size of order of the chromatin persistence length, corresponding DPD time step is of order 1 nsec or more (see Ref. [47] for more details). Volume interactions between the monomers are chosen to guarantee the absence of chain self-intersections, the entanglement length is N_e ≈ 50 ± 5 monomer units [66]. The modeled chains have N = 2^18 = 262 144 units confined in a cubic volume with periodic boundary conditions. In a chain that is long (N/N_e ≃ 5000) the equilibration time by far exceeds the times accessible in computer simulation, so the choice of starting configurations plays a significant role. Here we provide a short outline of how we construct and prepare the initial states, addressing the reader to [47] for further details.
The first initial state we use is a randomized Moore curve similar to that described in Ref. [26], it has a very distinct domain structure with flat domain walls. The second initial state is generated by a mechanism which we call “conformation-dependent polymerization in poor solvent.” This algorithm, which, for the best of our knowledge, has never been suggested before, is constructing the chain conformation by consecutively adding monomer units in a way that they tend strongly to stick to the already existing part of the chain. In Ref. [47] we show that the resulting conformations show exactly the statistical characteristics expected from fractal globules, while a full account of this new algorithm will be given in Ref. [30]. In what follows, for brevity we call the globule prepared by the randomized Moore algorithm “Moore,” and one prepared by the conformation-dependent polymerization “random fractal.” As a control sample we use a standard equilibrium globule which we call “Gaussian.”
Prior to the diffusion measurements all three initial states are annealed for τ = 3.2 × 10^7 modeling steps. The statistical properties of the random fractal and Gaussian globule do not change visibly during the annealing time, while the Moore globule is evolving with domain walls roughening and its statistical characteristics (e.g., dependence of the spatial distance between monomers on the genomic distance ⟨R²(n)⟩; see [47]) approaching those for the random fractal globule state.
Snapshots of conformations annealed from different initial states are shown in Fig. 1. In fractal states, contrary to the Gaussian one, fragments close along the chain tend to form domains of the same color. The states are further characterized in Fig. 2. The fractal globule curve appears very similar (but for the saturation at large n due to the finite size effects) to the universal spatial size-length curve for unentangled rings discussed in Refs. [20,68]. R²(n) for the Moore state seems to approach the fractal globule curve with growing modeling time suggesting the existence of a unique metastable fractal globule state. Fractal globules prepared by two different techniques are significantly different at first, but converge with growing simulation time, making the results obtained after annealing unsensitive to the details of the initial state.
Monomer spatial displacement was measured for t = 6.5 × 10^7 DPD time steps after the annealing (corresponding to ∼0.1 sec on the real time scale), with results shown in Fig. 3. Impressively, mean-square displacement for the
FIG. 1 (color online). The snapshots of globule conformations: random fractal (top), Moore (middle), and Gaussian (bottom) globules. (a) General view of the modeling cell after initial annealing. Chains are gradiently colored from blue to red. (b)–(d) The evolution of a 1000-monomer subchain conformation: (b) initial conformation at the start of measurement, (c) after 2^18 ≈ 2.5 × 10^5 DPD steps, (d) after 2^26 ≈ 6.5 × 10^7 DPD steps. The cube on the figure corresponds to the whole simulation box and has the size 46 × 46 × 46 DPD length units.
FIG. 2 (color online). Mean-square distance ⟨R²⟩ between monomers as a function of genomic distance n. Gaussian (green) and random fractal (red) states are stable on the modeling time scale (see Fig. 2 in Ref. [47]). Initial Moore state (black) relaxes after annealing to the blue curve, approaching the random fractal state. Inset shows the same plots in (⟨R²⟩ n^−0.8, n/N_e) coordinates used in [20].
PRL 114, 178102 (2015), PHYSICAL REVIEW LETTERS, week ending 1 MAY 2015, 178102-3
text) and assuming that for such small displacements the role of chain connectivity is negligible, one gets the estimate of 1 nsec per DPD time step. The whole accessible timescale is then of order of 0.1 sec.
Note that this is an estimate from below, as the simulated media is, generally speaking, more viscous than pure water, and the chain connectivity in fact does play some role in the self-diffusion of DPD beads even on the small time-scales.
II. INITIAL STATES
In our work we use three different ways to construct initial states of globules which we describe below in detail. In all cases chain of 2^18 = 262144 monomers are generated in a cubic box with periodic boundary conditions, the size of the modeling box is 46 × 46 × 46 reduced DPD units, making the average number density of monomers equal to ρ = 3 (this value is known to be especially good for modeling the dynamic properties of polymer chains). All three initial states are constructed on a cubic lattice with lattice constant equal to 3^−1/3 ≈ 0.69 and after the construction are allowed to anneal for 2^25 = 3.2 × 10^7 DPD time steps. Only after this annealing the self-diffusion measurements are started.
A. Random fractal globule
The mechanism of fractal globule formation suggested below is novel and will be discussed and characterized in full detail in [8]. Here we provide a brief overview of the idea necessary for the reader to understand the main text and convince himself that the initial state we are dealing with indeed has all the properties of a fractal globule.
The idea of this mechanism to design a fractal globule state, which we propose here for the first time, is based on the following considerations. Imagine a polymer chain being synthesized while being in a poor solvent, in a way that all the already synthesized part is forming a tight globule. Assume also the synthesis to be very fast as compared to the internal movements of monomers within a globule. In that way one expects that at all intermediate stages the already formed part of the globule is in a compact state. Also one expect that formation of knots and entanglements will be highly suppressed since the new monomers are mostly to the surface of the existing globule, and cannot go through it as there are no holes left in the structure. Clearly, the conformation thus formed is very reminiscent of a fractal globule.
To exploit this idea we proceed as follows. We construct the polymer conformation as a trajectory of a lattice random walk in a potential strongly attracting the walker to the places it has already visited. At each step a walker on a cubic lattice has 6 neighboring sites (see Figure 1) where he can possibly move. We postulate the probability to go at each of the possible target sites to depend on whether it was already visited, and on how many visited sites it has as its neighbors. In particular, we use the following assumptions:
P_i = N^{-1} \begin{cases} \varepsilon & \text{if the target site is visited,} \\ 1 + A \left( \text{\# of visited neighbors the target site has} \right) & \text{if the target site is not visited,} \end{cases} \qquad N = \sum_{i=1..6} P_i \qquad (6)
Figure 1: The conformation-dependent random walk. (A) on the next step the probabilities of choosing steps 3 and 5 are large compared to steps 1 and 4 (P4 = P1; P3 = (1 + 2A)P1 = 20001 P1, P5 = (1 + A)P1 = 10001 P1, probabilities of steps 2 and 6 are proportional to ε and are essentially zero); (B) trapped configuration: weights of all possible steps equal ε and are equiprobable.
Here, ε should be extremely small so that double visiting of the same sites should be possible only if the walk gets locked (we use ε = 10^−9), while A is a constant defining the strength of attraction to the existing trajectory, and should therefore be large to keep all the intermediate conformations compact. By trial and error we have found A = 10,000 to work best.
A trajectory constructed in this way includes a finite fraction (of order of several percent) of self-intersections. However, the resulting states happen to be almost unknotted:
The segment of 10^4 monomers is reduced to a knot of less than 10^2 monomers. For comparison, a segment of an equilibrium globule of 10^4 monomers is reduced to a knot of 2 · 10^3 points.
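The walk rule of Eq. (6) is simple enough to sketch in a few lines. The code below is a minimal illustration, not the implementation of Tamm et al. It assumes an unbounded cubic lattice instead of the periodic box. It also assumes that the walker's current position is not counted among the visited neighbors of a target site, which is how the weights in the Figure 1 caption (P4 = P1) read. The function name and parameters are hypothetical.

```python
import random

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def cdp_walk(n_monomers, attraction=10_000, epsilon=1e-9, seed=0):
    """Conformation-dependent random walk on an unbounded cubic lattice."""
    rng = random.Random(seed)
    position = (0, 0, 0)
    visited = {position}
    chain = [position]
    for _ in range(n_monomers - 1):
        targets, weights = [], []
        for dx, dy, dz in STEPS:
            target = (position[0] + dx, position[1] + dy, position[2] + dz)
            if target in visited:
                weight = epsilon  # revisits only matter when the walk is trapped
            else:
                # Count visited neighbors of the target, excluding the
                # walker's current position (assumption, see lead-in).
                neighbors = sum(
                    1 for ex, ey, ez in STEPS
                    if (target[0] + ex, target[1] + ey, target[2] + ez) in visited
                    and (target[0] + ex, target[1] + ey, target[2] + ez) != position
                )
                weight = 1 + attraction * neighbors
            targets.append(target)
            weights.append(weight)
        # random.choices normalizes the weights, playing the role of N^-1.
        position = rng.choices(targets, weights=weights)[0]
        visited.add(position)
        chain.append(position)
    return chain

chain = cdp_walk(2000, seed=1)
print(len(chain), "monomers,", len(set(chain)), "distinct lattice sites")
```

The difference between the two printed numbers counts the self-intersections, which the text above puts at the order of several percent for long chains.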
[Panels: γ = 0.2, γ = 0.4, γ = 0.6]
[Panels, the normalized Hi-C map: γ = 0.6, γ = 1, γ = 1.4]
applying the Louvain algorithm with P^{NG}_{ij} = \frac{k_i k_j}{2m}
comparison with the fractal globule model
N = 8192, radius of volume exclusion = 0.2 Kuhn length
N = 8192, radius of volume exclusion = 0.2 Kuhn length, no boundary
Identified TADs = 3975 (raw), 4451 (normalized), γ = 0.2
Radius of gyration & end-to-end distance / radius of gyration
Power law of end-to-end distance / radius of gyration
1) Reference: easy to estimate (from known end-to-end distance scaling and radius of gyration scaling)
2) TAD: for ideal case, it may be calculable
It’s better to extract results from longer globules (longer than ~30000 monomers), to get more reliable scalings at given subchain length domain (10^1 ~ 10^3.5)
0 slope = structure well preserved?
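The two quantities compared on these slides, the radius of gyration and the ratio of end-to-end distance to radius of gyration, can be measured on a conformation roughly as follows. This is a sketch under assumptions: the conformation is a list of 3D coordinates, the function names are hypothetical, and averaging over all contiguous subchains of a given length is one plausible choice that the slides do not spell out.

```python
import math


def end_to_end_distance(coords):
    """Euclidean distance between the first and last monomer."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Root-mean-square distance of the monomers from their center of mass."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)


def subchain_ratio(coords, length):
    """Mean end-to-end distance / radius of gyration over all
    contiguous subchains containing `length` monomers."""
    ratios = []
    for start in range(len(coords) - length + 1):
        sub = coords[start:start + length]
        rg = radius_of_gyration(sub)
        if rg > 0:
            ratios.append(end_to_end_distance(sub) / rg)
    return sum(ratios) / len(ratios)


# A straight rod of five monomers as a trivial check case.
rod = [(i, 0, 0) for i in range(5)]
```

Evaluating `subchain_ratio` over a range of lengths and plotting it on log-log axes would give the kind of curve whose slope the slide asks about; the same functions could be applied to detected TADs instead of sliding windows.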
generated with the conformation-dependent polymerization (CDP) fractal globule model; cf. M. V. Tamm et al., Phys. Rev. Lett. 114, 178102 (2015).
γ = 0.2, averaged over 200 samples
applying the Louvain algorithm with $P^{\mathrm{NG}}_{ij} = k_i k_j / (2m)$
detecting loops?
[Figure 6 of Rao et al. (2014), panels A–F; image not reproduced. Recoverable labels: number of peaks vs. corner score (random vs. peaks); Chr 4, 20.55–22.55 Mb, transitive and intransitive loops; fold change vs. percentage of peak loci bound (CTCF, RAD21, SMC3, YY1, ZNF143); number of peaks by forward/reverse CTCF motif orientation (2%, 3%, 3%, 92%), with motif consensus sequences CTGCCACCTNGTGG and CCACNAGGTGGCAG; Chr 1, 17.4–17.6 Mb at 1 kb resolution, with CTCF, RAD21, and SMC3 tracks and anchor sequences 5’-GAGCAATTCCGCCCCCTGGTGGCAGATCTG-3’ and 5’-GGCGGAGACCACAAGGTGGCGCCAGATCCC-3’; CTCF anchors (arrowhead indicates motif orientation), loop domains and ordinary domains of 110–450 kb.]
Figure 6. Many Loops Demarcate Contact Domains; The Vast Majority of Loops Are Anchored at a Pair of Convergent CTCF/RAD21/SMC3 Binding Sites
(A) Histograms of corner scores for peak pixels versus random pixels with an identical distance distribution.
(B) Contact matrix for chr4:20.55 Mb–22.55 Mb in GM12878, showing examples of transitive and intransitive looping behavior.
(C) Percent of peak loci bound versus fold enrichment for 76 DNA-binding proteins.
(D) The pairs of CTCF motifs that anchor a loop are nearly all found in the convergent orientation.
from S. S. P. Rao et al., Cell 159, 1665 (2014)
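The two CTCF motif consensus strings printed in the figure, CTGCCACCTNGTGG and CCACNAGGTGGCAG, are reverse complements of each other, which is what "forward" and "reverse" motif orientation refers to. A small sketch makes this checkable. The strand encoding ('+' and '-') and the function names are assumptions for illustration, and the orientation labels follow the common convention that a convergent pair has the forward motif at the upstream anchor and the reverse motif at the downstream anchor.

```python
# Complement table, with N (any base) mapped to itself.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def loop_orientation(upstream_strand, downstream_strand):
    """Classify a pair of anchor motifs by strand ('+' or '-').

    Assumed convention: '+' means the forward motif, '-' the reverse motif.
    """
    if upstream_strand == "+" and downstream_strand == "-":
        return "convergent"
    if upstream_strand == "-" and downstream_strand == "+":
        return "divergent"
    return "tandem"
```

With this, `reverse_complement("CTGCCACCTNGTGG")` reproduces the second consensus string from the figure.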
summary and outlook
• While adjusting the resolution parameter, finding the network communities as TADs + comparing with biological factors (“metadata”): CTCF, histone modification, etc.
• Incorporating more tailor-made null model terms for TAD detection
• Trying the model Hi-C map generated from the model based on the fractal globule structure
• detecting loops by measuring the distance between the starting and ending points of TADs
• What can we learn from these various scales of TADs?
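The loop-detection idea in the outlook (measuring the distance between the starting and ending points of TADs) could be prototyped on a model conformation along these lines. Everything specific here is an assumption: TADs are given as (start, end) monomer indices, the coordinates come from some 3D model such as a fractal globule, and the distance cutoff `max_distance` is a free parameter that the slides do not specify.

```python
import math


def loop_candidates(coords, tads, max_distance):
    """Return (start, end, distance) for TADs whose two boundary
    monomers lie within max_distance of each other in 3D."""
    hits = []
    for start, end in tads:
        distance = math.dist(coords[start], coords[end])
        if distance <= max_distance:
            hits.append((start, end, distance))
    return hits


# Toy hairpin: monomer 0 and monomer 5 end up one lattice unit apart.
hairpin = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
```

On the toy hairpin, the domain (0, 5) is reported with a cutoff of 1.0, while the domain (0, 2), whose ends are two units apart, is not.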
in collaboration with
Jae-Hyung Jeon (POSTECH)
Xavier Durang (KIAS)
Sungmin Lee (SKKU)
Ludvig Lizana (Umeå Univ.)
Per Stenberg (Umeå Univ.)
Markus Nyberg (Umeå Univ.)
Rajendra Kumar (Umeå Univ.)
Yeonghoon Kim (POSTECH)
sponsored by the NRF-STINT (Korea-Sweden) Research Cooperation
for details and unlimited discussion . . . come visit me @ the poster presentation (P1057) of the NetSci main conference, 6pm–8pm, tomorrow (. . . and anytime later)
Synthetic yeast genome’s chromosome structure: special issue of Science, March 10, 2017.
BUILDING ON NATURE’S DESIGN
In 1996, a breakthrough was achieved when the sequence of ~12 million base pairs, divided among 16 chromosomes, was reported for baker’s yeast (Saccharomyces cerevisiae). Now, some 20 years later, the Synthetic Yeast Genome Project (Sc2.0) reports on five newly constructed synthetic yeast chromosomes, advancing efforts to substantially reengineer all 16 yeast chromosomes with the goal of creating a fully synthetic eukaryotic genome.
Genomes are in constant flux: They are prone to deletions, duplications, and insertions; recombination and rearrangement; and invasion and disruption by selfish genetic elements such as transposable elements. These many changes are subject to the vagaries of natural selection, resulting in a genome organization not based on principles of efficiency or economy of space, but instead contingent on the evolutionary history of the organism.
Sc2.0 has set out to untangle, streamline, and reorganize the genetic blueprint of one of the most studied of all eukaryotic genomes. Here they report on their development, design, construction, testing, and curation principles, which may be scalable to other, larger genomes. Ultimately, researchers aspire to remove all transposons and repetitive elements, recode UAG stop codons, and move transfer RNA genes to a novel neochromosome without causing fitness defects, while simultaneously adding features to facilitate chromosome construction and manipulation. When complete, the final synthetic yeast strain will be another milestone in our ability to work with and understand the eukaryotic genome.
By Laura M. Zahn and Guy Riddihough*
*Now at Life Science Editors. Email: [email protected]
SCIENCE sciencemag.org 10 MARCH 2017 • VOL 355 ISSUE 6329 1039
Published by AAAS
INSIGHTS | PERSPECTIVES
1024 10 MARCH 2017 • VOL 355 ISSUE 6329 sciencemag.org SCIENCE
SYNTHETIC BIOLOGY
Yeast genome, by design
Scientists are inching closer to generating a synthetic eukaryotic cell
By Krishna Kannan (1) and Daniel G. Gibson (1,2)
(1) Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA. (2) J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA. Email: [email protected]
Photo: Steve Gschmeissner/Science Source
A core theme in synthetic biology, “understanding by creating,” inspired the effort to generate the first synthetic cell, JCVI-Syn1.0 (1). The project Sc2.0 is elevating this concept by attempting to create a synthetic version of a more evolved organism, Saccharomyces cerevisiae, a eukaryotic single-celled yeast. In a set of papers in this issue (2–8), scientists of the Sc2.0 project who previously constructed a single yeast chromosome (9) now report constructing five additional yeast chromosomes (more than one-third of the entire genome) (see the photo). Using a variety of phenotypic assays and structural and functional genomics techniques, the researchers observed that the synthetic chromosomes drive biological processes just like the natural, native chromosomes.
The quintessential first step toward creating a synthetic organism is the careful design of the genomic material, which ultimately controls every physiological process in the cell. Project Sc2.0 built a software framework, BioStudio, to generate chromosomal designs (2). A set of rules were applied while designing each chromosome, including removal of repetitive regions and introns (except for the HAC1 intron), recoding of TAG stop codon to TAA (allowing TAG to be repurposed), and the relocation of transfer RNA genes into a neochromosome. In addition, sites (loxPsym) were introduced throughout the chromosome at the 3ʹ ends of nonessential genes for chemically-inducible genome rearrangements (through Cre-recombinase). This allowed the selection of desired phenotypes and the examination of corresponding genotypes (synthetic chromosome rearrangement and modification by loxP-mediated evolution, or SCRaMbLE). Despite the many variations (thousands) introduced during the construction of synthetic chromosomes, these mutations could still be considered “not drastic,” that is, without radical changes to genome size or its structural and functional organization. This conservative design could be, in part, key to the success of functionalizing each synthetic yeast chromosome created to date.
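One of the design rules described in the article, recoding the TAG stop codon to TAA, is simple enough to illustrate with a toy function. This is not BioStudio and does not reflect its interface; the function name and the assumption that the input is a single in-frame coding sequence (start codon through stop codon, on the coding strand) are made up for illustration.

```python
def recode_stop_codon(cds):
    """Return the coding sequence with a terminal TAG stop recoded to TAA.

    Assumes `cds` is in frame, so its last three bases are the stop codon.
    Sequences ending in another stop codon are returned unchanged.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:].upper() == "TAG":
        return cds[:-3] + "TAA"
    return cds
```

Only the final codon is touched, so an out-of-frame TAG inside the gene (as in `ATGCTAGCTTAG`) is left alone.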
The design rules were implemented in a stepwise, hierarchical assembly of the synthetic chromosomes, as previously described (9, 10), starting with chunks built from oligonucleotides (750 base pairs), which were assembled into 2- to 3-kb minichunks and, subsequently, megachunks of 10-kb or 30- to 60-kb DNA molecules in vitro. Each megachunk (with the exception of terminal megachunks) carried an auxotrophic selectable marker at the 3ʹ end that was used to directly swap out the wild-type chromosomal DNA. This marker was recycled during the swapping of the next synthetic megachunk with the native chromosome (switching auxotrophies progressively by integration, or SwAP-In). Depending on the length of the chromosome, as many as 33 serial SwAP-In experiments were conducted to generate cells carrying a completely synthetic chromosome alongside 15 other native chromosomes. Using this procedure, project Sc2.0 has generated six complete yeast chromosomes (synII, synIII, synV, synVI, synX, synXII) and one half chromosome (synIXR) (3, 10). In what could be an important stride toward generating a completely synthetic yeast, Sc2.0 has initiated the process of combining all the synthetic chromosomes into a single strain by using an endoduplication backcross process (3). Cells carrying two and three completely synthetic chromosomes have been generated with no differential phenotype or genome architecture compared to the wild-type cells (3, 4).
The design of the synthetic chromosomes was not perfect, but close. Few deliberate recoding events with synonymous codons (PCRTags) were included in the synthetic DNA to track the replacement of the native chromosome with synthetic parts. Although most of these “watermarks” were benign, some altered expression of genes, which modified messenger RNA (mRNA) secondary structure and resulted in a conspicuous phenotype (3). Other watermarks altered gene expression by either creating a putative site for transcription factor binding (5) or by directly affecting mRNA translation efficiency potentially due to discrepant decoding efficiency (6).
In some cases, the introduction of loxPsym sites reduced the expression of essential genes, thus creating a detrimental phenotype. Sequencing, meiotic recombination, pooled PCRTag mapping, and electrophoretic karyotyping were used to identify “bugs” (2–8). Simultaneous elimination of multiple bugs was carried out by using clustered regularly interspaced short palindromic repeats (CRISPR)–Cas9 or by using at least a single “selectable” insertion event (8).
The stepwise replacement of a native chromosome with the SwAP-In method allows for detecting adverse phenotypes of the design during several stages of the assembly of the synthetic chromosome. However, it also creates many opportunities for massive duplication and rearrangement events, most likely detected only after the construction of the entire synthetic chromosome (5, 8). Given the recent advancements in complete de novo chromosome synthesis (11, 12) and transplantation technologies (1), in the near future, one can envision the concomitant removal and replacement of native chromosomes with entire chromosomes that are designed and chemically-synthesized from the bottom-up. Indeed, some of these chromosome synthesis and assembly technologies were used by the Sc2.0 scientists to generate “minichunks” and “megachunks.”
Recombination between the native and the synthetic chromosomes could be avoided by introducing sufficient differences (recoding the genes, for example) in the sequence. Knowledge from techniques like in vivo selective 2′-hydroxyl acylation and profiling (13) could avoid disrupting critical mRNA structures during recoding. Complete synthesis and transformation of chromosomes could potentially be accomplished in a fraction of time compared to the SwAP-In technology.
The scope of the design process could be expanded in future studies to elucidate genomic principles that underpin eukaryotic life. Recently, a bacterial cell was designed and synthesized with a minimal set of genes (473) to answer a basic question in biology about the smallest genomic content that could support life (14). The studies reported in this issue open the door to address a similar question pertaining to yeast. Chromosomal design could also be extended to functionally modularize the genome (organizing genes on the chromosome based on their function). This could illuminate complex and conserved regulatory mechanisms that might eventually apply to higher eukaryotes, including humans. Notably, efforts toward understanding genome organization principles and customizing genome design by minimization and modularization are underway in the yeast Kluyveromyces marxianus (15), the fastest-growing eukaryotic organism, thus greatly accelerating the genome design-build-test cycle.
Undoubtedly, progress by the Sc2.0 project will advance our understanding of basic biological processes and how the genome functions. Consequently, computational models of yeasts with highly predictable outcomes could be designed and generated with a large degree of success. Such designer organisms could be exploited as models to comprehend human diseases (8), identify disease targets, and generate therapeutics.
REFERENCES
1. D. G. Gibson et al., Science 329, 52 (2010).
2. S. M. Richardson et al., Science 355, 1040 (2017).
3. L. A. Mitchell et al., Science 355, eaaf4831 (2017).
4. G. Mercy et al., Science 355, eaaf4597 (2017).
5. Y. Wu et al., Science 355, eaaf4706 (2017).
6. W. Zhang et al., Science 355, eaaf3981 (2017).
7. Y. Shen et al., Science 355, eaaf4791 (2017).
8. Z.-X. Xie et al., Science 355, eaaf4704 (2017).
9. N. Annaluru et al., Science 344, 55 (2014).
10. J. S. Dymond et al., Nature 477, 471 (2011).
11. D. G. Gibson et al., Proc. Natl. Acad. Sci. U.S.A. 105, 20404 (2008).
12. D. G. Gibson et al., Nat. Methods 6, 343 (2009).
13. R. C. Spitale et al., Nat. Chem. Biol. 9, 18 (2013).
14. C. A. Hutchison III et al., Science 351, aad6253 (2016).
15. M. Eisenstein, Nat. Methods 14, 117 (2017).
10.1126/science.aam9739
Six synthetic chromosomes for the budding yeast drive biological processes just like their
natural counterparts. S. cerevisiae has 16 chromosomes.
Synthetic yeast genome’s chromosome structure: special issue of Science, March 10, 2017.INSIGHTS | PERSPECTIVES
1024 10 MARCH 2017 • VOL 355 ISSUE 6329 sciencemag.org SCIENCE
By Krishna Kannan
1
and
Daniel G. Gibson
1,2
A core theme in synthetic biology, “un-
derstanding by creating,” inspired the
effort to generate the first synthetic
cell, JCVI-Syn1.0 (1). The project
Sc2.0 is elevating this concept by at-
tempting to create a synthetic version
of a more evolved organism, Saccharomyces
cerevisiae, a eukaryotic single-celled yeast.
In a set of papers in this issue (2–8), sci-
entists of the Sc2.0 project who previously
constructed a single yeast chromosome (9)
now report constructing five additional
yeast chromosomes (more than one-third of
the entire genome) (see the photo). Using a
variety of phenotypic assays and structural
and functional genomics techniques, the re-
searchers observed that the synthetic chro-
mosomes drive biological processes just like
the natural, native chromosomes.
The quintessential first step toward creat-
ing a synthetic organism is the careful design
of the genomic material, which ultimately
controls every physiological process in the
cell. Project Sc2.0 built a software framework,
BioStudio, to generate chromosomal designs
(2). A set of rules were applied while design-
ing each chromosome, including removal of
repetitive regions and introns (except for the
HAC1 intron), recoding of TAG stop codon to
TAA (allowing TAG to be repurposed), and
the relocation of transfer RNA genes into a
neochromosome. In addition, sites (loxPsym)
were introduced throughout the chromo-
some at the 3ʹ ends of nonessential genes
for chemically-inducible genome rearrange-
ments (through Cre-recombinase). This al-
lowed the selection of desired phenotypes
and the examination of corresponding geno-
types (synthetic chromosome rearrangement
and modification by loxP-mediated evolution,
or SCRaMbLE). Despite the many variations
(thousands) introduced during the construc-
for large-scale energy applications. The
nonperiodic layered nanophotonic struc-
ture showed good performance (6), but its
required nanometer precision control of the
thin films is still a challenge for scaling up
to the size of meters, which is needed for
even a small (kilowatt-scale) cooling system.
The unprecedented properties of a meta-
material such as negative refraction and
superlensing originates from its internal
structures instead of its chemical constitu-
ents (7). Because its structural unit cell is of-
ten smaller than the wavelength of interest,
practical implementations of optical meta-
materials have always been challenging. Zhai
et al. devised a glass-polymer metamaterial
in which a set of glass microspheres were
randomly and uniformly dispersed in a vis-
ibly transparent polymer matrix. Because of
the surface phonon-polariton Mie resonance
excited at room temperature on the glass
surface, this amorphous metamaterial has
a maximal broadband emissivity—near the
blackbody limit across the entire atmospheric
window—that results in cooling of the mate-
rial itself (8). Both the polymer and glass are
transparent to the full solar spectrum, so the
hybrid metamaterial minimally absorbs and
reflects most solar energy when backed with
a thin silver mirror (see the figure).
Zhai et al. demonstrated an average ra-
diative cooling flux greater than 110 W m–2
in a continuous 3-day field test. This en-
ergy flux is at a rate similar to that of pho-
tovoltaic solar cell energy conversion but
with the great advantage of running both
day and night. More impressively, the key
roadblock for large area deployment of ra-
diative cooling was removed. Because the
material is amorphous and flexible, the
authors developed a glass-polymer hybrid
manufacturing technique to produce the
microstructured metamaterial, which can
be made as films several meters in length
in a continuous roll-to-roll manner. Using
such a scalable metamaterial, they demon-
strated passive water cooling by nearly 10
Celsius degrees below ambient temperature
without use of electricity.
There are still challenges yet to be ad-
dressed for the implementation of radiative
cooling metamaterials into applications.
Given that the cooling occurs on both sides
of metamaterials, detailed thermal design
will be important to maximize the cooling
rate for the substrate side, and effective
heat exchange strategies therefore must
be developed. In addition, the IR radiation
transport inside metamaterials caused by
volumetric multiple scattering among the
random Mie resonating glass spheres should
be carefully studied so as to further maxi-
mize the total emissive power. Other issues
should also be carefully investigated, such
as how weather conditions negate cooling
performances and how the polymer-based
metamaterial maintains its performance
during long-term outdoor exposure.
Although extraction of the 110 W m–2 heat
flux is a relatively low cooling rate, these
designed metamaterials should find prom-
ising application for cooling large systems
such as buildings in warm climates (9). Pres-
ently, air conditioning uses ~6% of all of the
electricity produced in the United States,
and as a result, more than 100 million met-
ric tons of carbon dioxide are released into
the atmosphere each year. The impact of
such a passive radiative cooling without
use of electricity for building applications
alone can be immense. The broad use of ra-
diative cooling technology not only leads to
energy savings but also reduces fluorinated
greenhouse gases from refrigerants used in
conventional air conditioners, thus improv-
ing air quality. At higher temperatures T,
passive radiative cooling can be drastically
enhanced because the outgoing radiative
flux is proportional to T⁴ according to the
Stefan-Boltzmann law. This scalably manu-
factured metamaterial may enable transfor-
mative cooling farms for power plants and
data centers, which consume unsustainable
amounts of water and electricity.
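As a rough illustration of the T⁴ scaling mentioned above, the following sketch evaluates the Stefan-Boltzmann law for an ideal emitter. It is not the authors' model: it ignores atmospheric back-radiation and absorbed sunlight, and the function names, the default emissivity of 1, and the 300 K and 350 K temperatures are assumptions chosen only for illustration.

```python
# Stefan-Boltzmann constant in W m^-2 K^-4.
SIGMA = 5.670374419e-8


def radiated_flux(temp_kelvin: float, emissivity: float = 1.0) -> float:
    """Power radiated per unit area by a surface: emissivity * sigma * T^4."""
    if temp_kelvin < 0:
        raise ValueError("absolute temperature cannot be negative")
    return emissivity * SIGMA * temp_kelvin ** 4


def flux_ratio(temp_kelvin: float, reference_kelvin: float) -> float:
    """Factor by which the outgoing flux grows between two temperatures."""
    return (temp_kelvin / reference_kelvin) ** 4


if __name__ == "__main__":
    # Hypothetical surface temperatures, not values taken from the article.
    for temp in (300.0, 350.0):
        print(f"T = {temp:.0f} K -> {radiated_flux(temp):.1f} W/m^2")
    print(f"ratio 350 K / 300 K = {flux_ratio(350.0, 300.0):.2f}")
```

Because the exponent is 4, a modest rise in surface temperature produces a disproportionately larger outgoing flux, which is the reason the text expects better performance for hot systems such as power plants and data centers.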
Although radiative cooling is promising, making
better use of this waste energy may
be more desirable. For example, the waste
heat could be converted into electricity by
using thermoelectric devices. Nevertheless,
the passive radiative cooling demonstrated
here unleashes the immense potential of
using the cold universe as a new avenue of
keeping us cool on Earth.
REFERENCES
1. Y. Zhai et al., Science 355, 1062 (2017).
2. F. D. Stacey, P. M. Davis, Physics of the Earth (Wiley, 1977).
3. R. Hillenbrand, T. Taubner, F. Keilmann, Nature 418, 159 (2002).
4. X. Lu et al., Renew. Sustain. Energy Rev. 65, 1079 (2016).
5. E. Rephaeli, A. Raman, S. Fan, Nano Lett. 13, 1457 (2013).
6. A. Raman et al., Nature 515, 540 (2014).
7. Y. Liu, X. Zhang, Chem. Soc. Rev. 40, 2494 (2011).
8. J. A. Schuller, R. Zia, T. Taubner, M. L. Brongersma, Phys. Rev. Lett. 99, 107401 (2007).
9. N. Fernandez, W. Wang, K. Alvine, S. Katipamula, Pacific Northwest National Laboratory Report no. PNNL-24904, Richland, WA (2015).
10.1126/science.aam8566
SYNTHETIC BIOLOGY
Yeast genome, by design
Scientists are inching closer to generating a synthetic eukaryotic cell
¹Synthetic Genomics, Inc., 11149 North Torrey Pines Road, La Jolla, CA 92037, USA. ²J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA 92037, USA. Email: [email protected]
Published by AAAS
10 MARCH 2017 • VOL 355 ISSUE 6329 1025 SCIENCE sciencemag.org
PHOTO: STEVE GSCHMEISSNER/SCIENCE SOURCE
tion of synthetic chromosomes, these muta-
tions could still be considered “not drastic,”
that is, without radical changes to genome
size or its structural and functional organi-
zation. This conservative design could be,
in part, key to the success of functionalizing
each synthetic yeast chromosome created to
date.
The design rules were implemented in a
stepwise, hierarchical assembly of the syn-
thetic chromosomes, as previously described
(9, 10), starting with chunks built from oli-
gonucleotides (750 base pairs), which were
assembled into 2- to 3-kb minichunks and,
subsequently, megachunks of 10-kb or 30- to
60-kb DNA molecules in vitro. Each mega-
chunk (with the exception of terminal mega-
chunks) carried an auxotrophic selectable
marker at the 3ʹ end that was used to directly
swap out the wild-type chromosomal DNA.
This marker was recycled
during the swapping of the
next synthetic megachunk
with the native chromosome
(switching auxotrophies pro-
gressively by integration, or
SwAP-In). Depending on the
length of the chromosome,
as many as 33 serial SwAP-In
experiments were conducted
to generate cells carrying a
completely synthetic chro-
mosome alongside 15 other
native chromosomes. Using
this procedure, project Sc2.0
has generated six complete
yeast chromosomes (synII,
synIII, synV, synVI, synX,
synXII) and one half chro-
mosome (synIXR) (3, 10).
In what could be an impor-
tant stride toward generat-
ing a completely synthetic
yeast, Sc2.0 has initiated the
process of combining all the synthetic chro-
mosomes into a single strain by using an
endoduplication backcross process (3). Cells
carrying two and three completely synthetic
chromosomes have been generated with no
differential phenotype or genome architec-
ture compared to the wild-type cells (3, 4).
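The stepwise SwAP-In replacement described above can be pictured as a loop over megachunks in which each round brings in a marker and displaces the one from the previous round. The sketch below is only a toy model of that bookkeeping: the marker names `MARKER_A` and `MARKER_B`, the 600,000-bp chromosome length, the strict alternation between exactly two markers, and the treatment of only the final megachunk as marker-free are all assumptions made for illustration, not details taken from the Sc2.0 papers.

```python
def pieces_needed(length_bp: int, piece_bp: int) -> int:
    """Number of pieces of a given size needed to tile a sequence (ceiling division)."""
    if length_bp <= 0 or piece_bp <= 0:
        raise ValueError("lengths must be positive")
    return -(-length_bp // piece_bp)


def swap_in_schedule(length_bp, megachunk_bp, markers=("MARKER_A", "MARKER_B")):
    """Return one tuple per round: (round, start, end, marker_gained, marker_lost).

    Each round integrates one megachunk. Its marker replaces the marker left by
    the previous round, so the markers alternate. The last megachunk is modeled
    without a marker, a simplification of the terminal-megachunk exception.
    """
    rounds = pieces_needed(length_bp, megachunk_bp)
    schedule = []
    for i in range(rounds):
        start = i * megachunk_bp
        end = min(start + megachunk_bp, length_bp)
        gained = None if i == rounds - 1 else markers[i % len(markers)]
        lost = markers[(i - 1) % len(markers)] if i > 0 else None
        schedule.append((i + 1, start, end, gained, lost))
    return schedule


if __name__ == "__main__":
    # Hypothetical chromosome length; 30 kb is the low end of the megachunk range in the text.
    for row in swap_in_schedule(600_000, 30_000)[:3]:
        print(row)
    print("750-bp building blocks needed:", pieces_needed(600_000, 750))
```

Under these assumptions the number of rounds is simply the chromosome length divided by the megachunk size, rounded up, which is why longer chromosomes require more serial SwAP-In experiments.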
The design of the synthetic chromosomes
was not perfect, but close. Few deliberate
recoding events with synonymous codons
(PCRTags) were included in the synthetic
DNA to track the replacement of the native
chromosome with synthetic parts. Although
most of these “watermarks” were benign,
some altered expression of genes, which
modified messenger RNA (mRNA) secondary
structure and resulted in a conspicuous phe-
notype (3). Other watermarks altered gene
expression by either creating a putative site
for transcription factor binding (5) or by di-
rectly affecting mRNA translation efficiency
potentially due to discrepant decoding effi-
ciency (6).
In some cases, the introduction of loxPsym
sites reduced the expression of essential
genes, thus creating a detrimental pheno-
type. Sequencing, meiotic recombination,
pooled PCRTag mapping, and electrophoretic
karyotyping were used to identify “bugs”
(2–8). Simultaneous elimination of multiple
bugs was carried out by using clustered regu-
larly interspaced short palindromic repeats
(CRISPR)–Cas9 or by using at least a single
“selectable” insertion event (8).
The stepwise replacement of a native chro-
mosome with the SwAP-In method allows for
detecting adverse phenotypes of the design
during several stages of the assembly of the
synthetic chromosome. However, it also cre-
ates many opportunities for massive duplica-
tion and rearrangement events, most likely
detected only after the construction of the
entire synthetic chromosome (5, 8). Given the
recent advancements in complete de novo
chromosome synthesis (11, 12) and transplan-
tation technologies (1), in the near future, one
can envision the concomitant removal and
replacement of native chromosomes with
entire chromosomes that are designed and
chemically synthesized from the bottom up.
Indeed, some of these chromosome synthesis
and assembly technologies were used by the
Sc2.0 scientists to generate “minichunks” and
“megachunks.”
Recombination between the native and
the synthetic chromosomes could be avoided
by introducing sufficient differences (recod-
ing the genes, for example) in the sequence.
Knowledge from techniques like in vivo selec-
tive 2′-hydroxyl acylation and profiling (13)
could avoid disrupting critical mRNA struc-
tures during recoding. Complete synthesis
and transformation of chromosomes could
potentially be accomplished in a fraction of
the time required by the SwAP-In technology.
The scope of the design process could
be expanded in future studies to elucidate
genomic principles that underpin eukary-
otic life. Recently, a bacterial cell was de-
signed and synthesized with a minimal set
of genes (473) to answer a basic question
in biology about the smallest genomic con-
tent that could support life (14). The studies
reported in this issue open the door to ad-
dress a similar question pertaining to yeast.
Chromosomal design could also be extended
to functionally modularize the genome (or-
ganizing genes on the chromosome based
on their function). This could illuminate
complex and conserved regulatory mecha-
nisms that might eventually
apply to higher eukaryotes,
including humans. Notably,
efforts toward understand-
ing genome organization
principles and customizing
genome design by minimi-
zation and modularization
are underway in the yeast
Kluyveromyces marxianus
(15), the fastest-growing
eukaryotic organism, thus
greatly accelerating the ge-
nome design-build-test cycle.
Undoubtedly, progress by
the Sc2.0 project will ad-
vance our understanding of
basic biological processes
and how the genome func-
tions. Consequently, com-
putational models of yeasts
with highly predictable out-
comes could be designed
and generated with a large
degree of success. Such designer organisms
could be exploited as models to comprehend
human diseases (8), identify disease targets,
and generate therapeutics.
REFERENCES
1. D. G. Gibson et al., Science 329, 52 (2010).
2. S. M. Richardson et al., Science 355, 1040 (2017).
3. L. A. Mitchell et al., Science 355, eaaf4831 (2017).
4. G. Mercy et al., Science 355, eaaf4597 (2017).
5. Y. Wu et al., Science 355, eaaf4706 (2017).
6. W. Zhang et al., Science 355, eaaf3981 (2017).
7. Y. Shen et al., Science 355, eaaf4791 (2017).
8. Z.-X. Xie et al., Science 355, eaaf4704 (2017).
9. N. Annaluru et al., Science 344, 55 (2014).
10. J. S. Dymond et al., Nature 477, 471 (2011).
11. D. G. Gibson et al., Proc. Natl. Acad. Sci. U.S.A. 105, 20404 (2008).
12. D. G. Gibson et al., Nat. Methods 6, 343 (2009).
13. R. C. Spitale et al., Nat. Chem. Biol. 9, 18 (2013).
14. C. A. Hutchison III et al., Science 351, aad6253 (2016).
15. M. Eisenstein, Nat. Methods 14, 117 (2017).
10.1126/science.aam9739
Six synthetic chromosomes for the budding yeast drive biological processes just like their
natural counterparts. S. cerevisiae has 16 chromosomes.
RESEARCH ARTICLE SUMMARY
SYNTHETIC BIOLOGY
3D organization of synthetic and scrambled chromosomes
Guillaume Mercy,* Julien Mozziconacci,* Vittore F. Scolari, Kun Yang, Guanghou Zhao, Agnès Thierry, Yisha Luo, Leslie A. Mitchell, Michael Shen, Yue Shen, Roy Walker, Weimin Zhang, Yi Wu, Ze-xiong Xie, Zhouqing Luo, Yizhi Cai, Junbiao Dai, Huanming Yang, Ying-Jin Yuan, Jef D. Boeke, Joel S. Bader, Héloïse Muller,† Romain Koszul†
INTRODUCTION: The overall organization of budding yeast chromosomes is driven and regulated by four factors: (i) the tethering and clustering of centromeres at the spindle pole body; (ii) the loose tethering of telomeres at the nuclear envelope, where they form small, dynamic clusters; (iii) a single nucleolus in which the ribosomal DNA (rDNA) cluster is sequestered from other chromosomes; and (iv) chromosomal arm lengths. Hi-C, a genomic derivative of the chromosome conformation capture approach, quantifies the proximity of all DNA segments present in the nuclei of a cell population, unveiling the average multiscale organization of chromosomes in the nuclear space. We exploited Hi-C to investigate the trajectories of synthetic chromosomes within the Saccharomyces cerevisiae nucleus and compare them with their native counterparts.
RATIONALE: The Sc2.0 genome design specifies strong conservation of gene content and arrangement with respect to the native chromosomal sequence. However, synthetic chromosomes incorporate thousands of designer changes, notably the removal of transfer RNA genes and repeated sequences such as transposons and subtelomeric repeats to enhance stability. They also carry loxPsym sites, allowing for inducible genome SCRaMbLE (synthetic chromosome rearrangement and modification by loxP-mediated evolution) aimed at accelerating genomic plasticity. Whether these changes affect chromosome organization, DNA metabolism, and fitness is a critical question for completion of the Sc2.0 project. To address these questions, we used Hi-C to characterize the organization of synthetic chromosomes.
RESULTS: Comparison of synthetic chromosomes with native counterparts revealed no substantial changes, showing that the redesigned sequences, and especially the removal of repeated sequences, had little or no effect on average chromosome trajectories. Sc2.0 synthetic chromosomes have Hi-C contact maps with much smoother contact patterns than those of native chromosomes, especially in subtelomeric regions. This improved “mappability” results directly from the removal of repeated elements all along the length of the synthetic chromosomes. These observations highlight a conceptual advance enabled by bottom-up chromosome synthesis, which allows refinement of experimental systems to make complex questions easier to address. Despite the overall similarity, differences were observed in two instances. First, deletion of the HML and HMR silent mating-type cassettes on chromosome III led to a loss of their specific interaction. Second, repositioning the large array of rDNA repeats nearer to the centromere cluster forced substantial genome-wide conformational changes; for instance, inserting the array in the middle of the small right arm of chromosome III split the arm into two noninteracting regions. The nucleolus structure was then trapped in the middle between small and large chromosome arms, imposing a physical barrier between them.
In addition to describing the Sc2.0 chromosome organization, we also used Hi-C to identify chromosomal rearrangements resulting from SCRaMbLE experiments. Inducible recombination between the hundreds of loxPsym sites introduced into Sc2.0 chromosomes enables combinatorial rearrangements of the genome structure. Hi-C contact maps of two SCRaMbLE strains carrying synIII and synIXR chromosomes revealed a variety of cis events, including simple deletions, inversions, and duplications, as well as translocations, the latter event representing a class of trans SCRaMbLE rearrangements not previously observed.
CONCLUSION: This large data set is a resource that will be exploited in future studies exploring the power of the SCRaMbLE system. By investigating the trajectories of Sc2.0 chromosomes in the nuclear space, this work paves the way for future studies addressing the influence of genome-wide engineering approaches on essential features of living systems.
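The Hi-C contact maps discussed in this summary can be thought of as binned count matrices: each observed contact between two loci increments the cell indexed by their genomic bins. The sketch below shows only that binning step for a single chromosome. It is a simplified illustration and not the authors' pipeline; real Hi-C analysis also involves read mapping, filtering, and normalization, and the function name, bin size, and toy coordinates are assumptions.

```python
def contact_matrix(pairs, length_bp: int, bin_bp: int):
    """Build a symmetric matrix of contact counts from (pos_a, pos_b) pairs.

    Positions are 0-based base-pair coordinates on one chromosome of length
    length_bp. Each pair adds one count to the cell for its two bins.
    """
    if length_bp <= 0 or bin_bp <= 0:
        raise ValueError("length_bp and bin_bp must be positive")
    n_bins = -(-length_bp // bin_bp)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        if not (0 <= pos_a < length_bp and 0 <= pos_b < length_bp):
            raise ValueError(f"position outside chromosome: {(pos_a, pos_b)}")
        i, j = pos_a // bin_bp, pos_b // bin_bp
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


if __name__ == "__main__":
    # Toy contacts on a hypothetical 10-kb sequence, binned at 5 kb.
    toy_pairs = [(100, 200), (100, 5200), (9000, 100)]
    for row in contact_matrix(toy_pairs, length_bp=10_000, bin_bp=5_000):
        print(row)
```

In a map built this way, loci that are close along the sequence usually dominate the diagonal, so a rearrangement such as a translocation would appear as unexpectedly high counts between bins that are normally far apart or on different chromosomes.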
RESEARCH | SYNTHETIC YEAST GENOME
Mercy et al., Science 355, 1050 (2017) 10 March 2017 1 of 1
The list of author affiliations is available in the full article online.
*These authors contributed equally to this work.
†Corresponding author. Email: [email protected] (H.M.); [email protected] (R.K.)
Cite this article as G. Mercy et al., Science 355, eaaf4597 (2017). DOI: 10.1126/science.aaf4597
Synthetic chromosome organization. (A) Hi-C contact maps of synII and native (wild-type, WT) chromosome II. Red arrowheads point to filtered bins (white vectors) that are only present in the native chromosome map. kb, kilobases. (B) Three-dimensional (3D) representations of Hi-C maps of strains carrying rDNA either on synXII or native chromosome III. (C) Contact maps and 3D representations of synIXR (yellow) and synIII (pink) before (left) and after (right) SCRaMbLE. Translocation breakpoints are indicated by green and blue arrowheads.
Read the full article at http://dx.doi.org/10.1126/science.aaf4597