Semi-supervised learning: a neural network model


Page 1: Semi-supervised learning: a neural network model

Semi-supervised learning: a neural network model

Data Mining and Machine Learning Lab.

Lecture 8

Master in Data Science for Economics, Business and Finance

2018

Università degli Studi di Milano

Page 2: Semi-supervised learning: a neural network model

Outline

● Semi-supervised learning
– Inductive and transductive learning
– Graph-based learning

● Hopfield neural network model

● COSNet

● COSNet extensions

● Possible developments

Page 3: Semi-supervised learning: a neural network model

Semi-supervised learning

● Semi-supervised learning refers to a branch of machine learning that studies how to learn from both labeled and unlabeled data

● Input: a set S = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} of labeled instances and a set of unlabeled instances U = {x_{n+1}, x_{n+2}, …, x_{n+h}}, with x_i ∈ R^d and y_i ∈ {−1, 1}

● Output: a labeling (y_{n+1}, y_{n+2}, …, y_{n+h}) ∈ {−1, 1}^h for the instances in U

● Inductive learning: infers a function f: R^d → {−1, 1}
● Transductive learning: infers a function f: U → {−1, 1}

● An example of semi-supervised (transductive) learning is given by graph-based methods, which represent instances as nodes and their similarities as edges/connections
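For concreteness, one common way (an illustration of ours, not prescribed by the slides) to build such a graph from instances in R^d is a Gaussian/RBF similarity; the function name and the choice of sigma are assumptions:

```python
import numpy as np

def rbf_graph(X, sigma=1.0):
    """Nodes are the rows of X (instances in R^d); the edge weight is a
    Gaussian similarity w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W
```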

Page 4: Semi-supervised learning: a neural network model

Graph-based learning

Input:
● V instances

[Figure: graph G = (V, W)]

Page 5: Semi-supervised learning: a neural network model

Graph-based learning

Input:
● V instances
● W edge weights

[Figure: graph G = (V, W)]

Page 6: Semi-supervised learning: a neural network model

Graph-based learning

Input:
● V instances
● W connection matrix
● S, U bipartition of V
– S labeled instances
– U unlabeled instances

[Figure: graph G = (V, W) split into labeled part S and unlabeled part U]

Page 7: Semi-supervised learning: a neural network model

Graph-based learning

Input:
● V instances
● W connection matrix
● S, U bipartition of V
– S labeled instances
– U unlabeled instances
● Sp, Sn bipartition of S

[Figure: graph G = (V, W); the labels of the nodes in U are unknown, marked "?"]

Page 8: Semi-supervised learning: a neural network model

Graph-based learning

Input:
● V instances
● W connection matrix
● S, U bipartition of V
– S labeled instances
– U unlabeled instances
● Sp, Sn bipartition of S

Output:
● Up, Un bipartition of U

[Figure: graph G = (V, W) with labeled part S and unlabeled part U]
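As a toy encoding of this input/output specification (all names illustrative):

```python
import numpy as np

# Toy instance of the problem: 6 nodes, symmetric connection matrix W
# (e.g. built with rbf_graph above), zero diagonal.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

S, U = [0, 1, 2], [3, 4, 5]   # labeled / unlabeled bipartition of V
Sp, Sn = [0, 1], [2]          # bipartition of S into positives / negatives
# Goal: produce a bipartition Up, Un of U
```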

Page 9: Semi-supervised learning: a neural network model

Hopfield Networks

● A paper by John Hopfield in 1982 was the catalyst in attracting the attention of many physicists to "Neural Networks".

● His aim: How is one to understand the incredible effectiveness of a brain in tasks such as recognizing a particular face in a complex scene?

● Like all computers, a brain is a dynamical system that carries out its computations by the change of its 'state' with time.

Page 10: Semi-supervised learning: a neural network model

Hebb's rule

● A Hopfield network (HN) is based on the Hebbian rule (Donald Hebb, 1949)

– Hebb's rule states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.

– If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.

– If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.

Page 11: Semi-supervised learning: a neural network model

Neuron

Synapses can start from both dendrites and the axon.

Dendrites represent the "input" to the neuron, the axon its output.

A neuron activates (sends electrical impulses) when its input overcomes a certain energy level ("fired" status); otherwise the neuron is inactive.

Page 12: Semi-supervised learning: a neural network model

Hopfield Networks

[Figure: neuron i with inputs x_1, x_2, …, x_n arriving through weights w_i1, w_i2, …, w_in and producing output x_i; w_ij is the weight of the connection from neuron j to neuron i]

x_i = sgn( Σ_{j=1}^{n} w_ij x_j − θ_i )

sgn(a) = 1 if a > 0, −1 if a ≤ 0

● Hopfield networks try to reproduce brain neurons
● Dynamic model: at each time t, each neuron i has an activation value (state) x_i ∈ {1, −1} (or {1, 0}) and an activation threshold θ_i

Page 13: Semi-supervised learning: a neural network model

Update rule

● The state of the network x(t) = (x_1(t), x_2(t), …, x_n(t)) at time t is the vector of the neuron activation values at time t.
● The neurons are subject to the asynchronous rule, updating one neuron at a time: pick a unit i at random and set

x_i(t+1) = sgn( Σ_{j=1}^{i−1} w_ij x_j(t+1) + Σ_{k=i+1}^{n} w_ik x_k(t) − θ_i )

If the input at neuron i is greater than θ_i, turn it on, otherwise turn it off.
● Moreover, Hopfield assumes symmetric weights: w_ij = w_ji
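As a concrete illustration of this update rule, here is a minimal sketch in Python with numpy (made-up weights, illustrative function name, not code from the course):

```python
import numpy as np

def hopfield_run(W, theta, x0, max_epochs=100, rng=None):
    """Asynchronous Hopfield dynamics: visit units one at a time and
    apply x_i <- sgn(sum_j w_ij x_j - theta_i), with sgn(0) = -1,
    until no unit changes (an equilibrium state) or max_epochs is hit."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    for _ in range(max_epochs):
        changed = False
        for i in rng.permutation(len(x)):   # random asynchronous order
            new = 1.0 if W[i] @ x - theta[i] > 0 else -1.0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:                     # fixed point reached
            break
    return x

# Toy run: 4 units, symmetric weights (w_ij = w_ji), zero diagonal.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)
print(hopfield_run(W, np.zeros(4), np.array([1.0, -1.0, 1.0, -1.0])))
```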

Page 14: Semi-supervised learning: a neural network model

Energy function

● Hopfield defined the state function called “energy function”:

E(x) = −½ Σ_i Σ_j w_ij x_i x_j + Σ_i θ_i x_i

● If we pick unit i and the firing rule (previous slide) does not change its state xi, it will not change E

● Theorem: the dynamics from the initial state follows a trajectory to an equilibrium state, which is a (local) minimum of the energy function
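A companion sketch of this energy function, using the same illustrative names as the dynamics sketch above:

```python
def hopfield_energy(W, theta, x):
    """E(x) = -1/2 * sum_ij w_ij x_i x_j + sum_i theta_i x_i."""
    return -0.5 * x @ W @ x + theta @ x
```

Evaluating this before and after every accepted flip of hopfield_run shows that it never increases, which is exactly the ΔE ≤ 0 computation carried out on the next two slides.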

Page 15: Semi-supervised learning: a neural network model

Convergence

● x_i: −1 to 1 transition

– It means x_i initially equals −1, and Σ_j w_ij x_j > θ_i
– The corresponding change in E is

ΔE = [1 − (−1)] · ( −½ Σ_j (w_ij x_j + w_ji x_j) + θ_i )
= −2 ( Σ_j w_ij x_j − θ_i )   (by symmetry)
≤ 0   (since the neuron passed from state −1 to state 1)

Page 16: Semi-supervised learning: a neural network model

Convergence

● x_i: 1 to −1 transition

– It means x_i initially equals 1, and Σ_j w_ij x_j ≤ θ_i
– The corresponding change in E is

ΔE = (−1 − 1) · ( −½ Σ_j (w_ij x_j + w_ji x_j) + θ_i ) = 2 ( Σ_j w_ij x_j − θ_i ) ≤ 0

● On every update we have ΔE ≤ 0
– Hence the dynamics of the net tends to move E toward a minimum

– We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.

Page 17: Semi-supervised learning: a neural network model

Convergence

● The symmetry condition w_ij = w_ji is crucial for ΔE ≤ 0

● Without this condition, ½ Σ_j (w_ij + w_ji) x_j − θ_i cannot be reduced to Σ_j w_ij x_j − θ_i, so Hopfield's updating rule cannot be guaranteed to yield a passage to an energy minimum

– It might instead yield a limit cycle of order 2

Page 18: Semi-supervised learning: a neural network model

HN as local optimizer

● To design Hopfield nets to solve optimization problems:

– choose weights for the network so that E is a measure of the overall constraint violation.

– A famous example is the traveling salesman problem.

● [HBTNN articles: Neural Optimization; Constrained Optimization and the Elastic Net. See also TMB2 Section 8.2.]

Page 19: Semi-supervised learning: a neural network model

Gene Annotation using Integrated Networks (GAIN)

● Karaoz et al. (2003)

● Discrete Hopfield network (DHN):

– Genes → Neurons
– Gene-gene similarities → Weights (thresholds = 0)
– Input bipartitions → Initial state
– Output of the dynamics → Equilibrium state

Page 20: Semi-supervised learning: a neural network model

GAIN

● Initial state for each neuron i:
– x_i(0) = 1 (positive label)
– x_i(0) = −1 (negative label)
– x_i(0) = 0 (unlabeled, "?")

x_i(t+1) = sgn( Σ_{j=1}^{i−1} w_ij x_j(t+1) + Σ_{k=i+1}^{n} w_ik x_k(t) )
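A minimal sketch of the GAIN dynamics under this convention (our illustration, not the original GAIN code; we assume labeled genes stay clamped at their initial values while unlabeled neurons evolve with threshold 0):

```python
import numpy as np

def gain(W, labels, max_epochs=100, rng=None):
    """GAIN sketch: labels[i] = +1 / -1 for labeled genes, 0 for unlabeled.
    Assumption of this sketch: labeled genes keep their initial state;
    unlabeled neurons are updated asynchronously with threshold 0."""
    rng = np.random.default_rng(rng)
    x = np.asarray(labels, dtype=float)
    unlabeled = np.where(x == 0)[0]
    for _ in range(max_epochs):
        changed = False
        for i in rng.permutation(unlabeled):
            new = 1.0 if W[i] @ x > 0 else -1.0   # sgn with theta_i = 0
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    Up = [i for i in unlabeled if x[i] == 1.0]
    Un = [i for i in unlabeled if x[i] == -1.0]
    return Up, Un
```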

Page 21: Semi-supervised learning: a neural network model

GAIN

● Energy function:

E(x) = −½ · Σ_{i=1}^{n} x_i ( Σ_{j=1}^{n} w_ij x_j )

● Minimizing E means maximizing the weighted sum of consistent edges (i.e. edges connecting nodes in the same state)

● The equilibrium state x = (s, u) characterizes the bipartition of U:
– Up = {i ∈ U | u_i = 1}
– Un = {i ∈ U | u_i = −1}

Page 22: Semi-supervised learning: a neural network model

Drawbacks of GAIN

● "Same relevance" for positive and negative examples
● Data imbalance not managed
● GAIN tries to find a global minimum x of E assuming that the initial state s of the labeled nodes is part of x, i.e. x = (s, u)
● In many cases s is not part of a minimum
● No coherence with the prior knowledge

Page 23: Semi-supervised learning: a neural network model

COSNet [1,2]

GAIN:
● Positive labels := 1
● Negative labels := −1
● Thresholds := 0

COSNet:
● Positive labels := sinα
● Negative labels := −cosα
● Thresholds := γ
(α and γ are parameters to be learned!)

Parametric DHN: ⟨W, γ, α⟩

Page 24: Semi-supervised learning: a neural network model

COSNet

● H = ⟨W, γ, α⟩ DHN on the nodes in V
– W connection matrix
– γ = (γ_1, γ_2, …, γ_n) vector of activation thresholds
– α ∈ (0, π/2); neuron activation values are sinα and −cosα

● In GAIN:
– γ = 0
– α = π/4

Page 25: Semi-supervised learning: a neural network model

Sub-networks

Two subnetworks: H|S,Up and H|U,Sp

[Figure: the network split into the labeled part S and the unlabeled part U]

● Deal with data imbalance and "coherence" with prior knowledge
● Preserve prior knowledge

Page 26: Semi-supervised learning: a neural network model

Sub-network property

● Given:
– DHN H = ⟨W, γ, α⟩ with neurons V
– S, U bipartition of V
– Sp, Sn bipartition of S
– Up, Un bipartition of U

● It holds: if x = (s, u) is a global minimum of the energy of H, then u is a global minimum of the energy of H|U,Sp

Page 27: Semi-supervised learning: a neural network model

Sub-network property

● Having a part s of an energy minimum of H, it is possible to discover the hidden part u by minimizing the energy of H|U,Sp

[Figure: network H with known part s and hidden part u = ?; running H|U,Sp recovers u]

Page 28: Semi-supervised learning: a neural network model

Sketch of COSNet

● INPUT: W similarity matrix; S, U bipartition of V; Sp, Sn bipartition of S
● OUTPUT: Up, Un bipartition of U

1. Generate a temporary solution Up, Un
2. Find the pair (α, γ) such that the initial state of the network H|S,Up is as close as possible to an equilibrium state
– Extend the parameters (α, γ) to the network H|U,Sp
3. Run the network H|U,Sp
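As a sketch, the three steps can be wired together as follows (our own illustration of the pipeline, not the package's API; temporary_solution, optimal_parameters, and run_subnetwork are the illustrative helpers sketched under the next three slides):

```python
import numpy as np

def cosnet(W, labels, rng=0):
    """COSNet sketch: labels[i] = +1 / -1 / 0 (0 = unlabeled)."""
    labels = np.asarray(labels)
    V = np.arange(len(labels))
    S, U = V[labels != 0], V[labels == 0]
    Sp = V[labels == 1]
    # Step 1: temporary bipartition of U
    Up, Un = temporary_solution(U, Sp, S, rng=rng)
    # Step 2: learn (alpha, gamma) from the labeled points P_i
    tmp = labels.copy()
    tmp[list(Up)], tmp[list(Un)] = 1, -1
    pts = np.stack([W[:, tmp == 1].sum(axis=1),     # weight toward positives
                    W[:, tmp == -1].sum(axis=1)],   # weight toward negatives
                   axis=1)
    alpha, gamma = optimal_parameters(pts[S], labels[S])
    # Step 3: run the sub-network on U with the learned parameters
    return run_subnetwork(W, labels, alpha, gamma, rng=rng)
```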

Page 29: Semi-supervised learning: a neural network model

Step 1: generating a temporary solution

● p_s: positive rate in S; p_u: positive rate in U

● FACT: |Sp| / |S| = argmax_x Prob{ p_u = x | p_s = |Sp| / |S| }

Procedure:
● Generate k according to the binomial distribution B(|U|, |Sp| / |S|)
● Up := k elements randomly chosen in U
● Un := U \ Up

[Figure: labeled part S with its positive subset Sp, and unlabeled part U, for which Up = ?]
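A sketch of this step (illustrative names, not the COSNet package API):

```python
import numpy as np

def temporary_solution(U, Sp, S, rng=None):
    """Step 1 sketch: draw k ~ B(|U|, |Sp|/|S|) and pick k random
    elements of U as the temporary positives."""
    rng = np.random.default_rng(rng)
    k = rng.binomial(len(U), len(Sp) / len(S))
    Up = set(rng.choice(list(U), size=k, replace=False))
    Un = set(U) - Up
    return Up, Un
```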

Page 30: Semi-supervised learning: a neural network model

Step 2: finding the optimal parameters

● S is mapped to a set of labeled points in R²; Lp: the positive points, Ln: the negative points
● Each labeled neuron i is mapped to the point

P_i = ( Σ_{j ∈ Sp ∪ Up} w_ij , Σ_{j ∈ Sn ∪ Un} w_ij )

and the label of P_i is the label of i
● AIM: "optimal" separation of Lp from Ln by a straight line y = tanα · x + q, according to the F-score criterion

Page 31: Semi-supervised learning: a neural network model

Step 2: finding the optimal parameters

● γ = −q · cosα (we choose the same γ for each neuron)

[Figure: positive and negative points in the (x, y) plane, separated by the line y = tanα · x + q]

● Fact: if F-score(opt) = 1, then the corresponding state of H|S,Up is an equilibrium point
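A grid-search sketch of this step (a simplification under our own assumptions; COSNet's actual optimization may differ): candidate lines y = tanα·x + q are scanned, points below the line are predicted positive, and the pair with the best F-score is kept, with γ = −q·cosα.

```python
import numpy as np

def optimal_parameters(points, labels, n_alpha=50, n_q=50):
    """Step 2 sketch: points[i] = (weight toward positives, weight toward
    negatives) for labeled neuron i; labels[i] in {+1, -1}.
    Returns (alpha, gamma) maximizing the F-score of y = tan(alpha)*x + q."""
    pts = np.asarray(points, dtype=float)
    y_true = np.asarray(labels) == 1
    best, best_f = (np.pi / 4, 0.0), -1.0
    span = np.abs(pts).max() + 1.0
    for alpha in np.linspace(0.01, np.pi / 2 - 0.01, n_alpha):
        for q in np.linspace(-span, span, n_q):
            # predict positive when the point lies below the line
            y_pred = pts[:, 1] < np.tan(alpha) * pts[:, 0] + q
            tp = np.sum(y_pred & y_true)
            prec = tp / max(y_pred.sum(), 1)
            rec = tp / max(y_true.sum(), 1)
            f = 2 * prec * rec / max(prec + rec, 1e-12)
            if f > best_f:
                best_f, best = f, (alpha, -q * np.cos(alpha))
    return best  # (alpha, gamma)
```

The "predict positive below the line" convention follows from the slide's geometry: a neuron is activated when sinα·x_i − cosα·y_i − γ > 0, which (dividing by cosα > 0 and using γ = −q·cosα) is y_i < tanα·x_i + q.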

Page 32: Semi-supervised learning: a neural network model

Data imbalance management

[Figure: the two imbalance scenarios, each with a differently sloped separating line in the (x, y) plane]

● |Sp| << |Sn|: α > π/4, so |sinα| > |cosα|
● |Sp| >> |Sn|: α < π/4, so |sinα| < |cosα|

Constraint: α ∈ ]0, π/2[

Page 33: Semi-supervised learning: a neural network model

Step 3: finding the final solution

● Run the dynamics of the sub-network H|U,Sp with the found parameters until a fixed point u is reached

● Infer the bipartition of U as follows:
– Up = {i ∈ U | u_i = sinα}
– Un = {i ∈ U | u_i = −cosα}
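A sketch of this final step (illustrative; we assume labeled neurons are clamped to sinα / −cosα, and unlabeled neurons start at −cosα before being updated):

```python
import numpy as np

def run_subnetwork(W, labels, alpha, gamma, max_epochs=100, rng=None):
    """Step 3 sketch: bipolar states are sin(alpha) for positives and
    -cos(alpha) for negatives; labeled neurons (labels != 0) stay fixed,
    unlabeled ones (labels == 0) are updated asynchronously."""
    rng = np.random.default_rng(rng)
    pos, neg = np.sin(alpha), -np.cos(alpha)
    x = np.where(np.asarray(labels) == 1, pos, neg)   # initial guess
    unlabeled = np.where(np.asarray(labels) == 0)[0]
    for _ in range(max_epochs):
        changed = False
        for i in rng.permutation(unlabeled):
            new = pos if W[i] @ x - gamma > 0 else neg
            if new != x[i]:
                x[i], changed = new, True
        if not changed:
            break
    Up = [i for i in unlabeled if x[i] == pos]
    Un = [i for i in unlabeled if x[i] == neg]
    return Up, Un
```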

Page 34: Semi-supervised learning: a neural network model

Extending the number of parameters [3]

● There are some extensions of COSNet [3,4] considering neurons of different types, each one with its own pair of activation values. For instance, with two types:
– Type 1: activation values {sinα_1, −cosα_1}
– Type 2: activation values {sinα_2, −cosα_2}

Page 35: Semi-supervised learning: a neural network model

Conclusions

● We studied:
– A cost-sensitive method based on neural networks for predicting labels in graphs
– Better performance w.r.t. state-of-the-art methods

● The time complexity O(|S|·log|S| + |W|) allows the application to networks with thousands of nodes

● We increased the number of parameters by considering two or more categories of neurons
– Learned by the model or received as an input argument

● COSNet is implemented in R in the package of the same name, available in the Bioconductor repository

Page 36: Semi-supervised learning: a neural network model

References

● [1] A. Bertoni, M. Frasca, G. Valentini. COSNet: a Cost Sensitive Neural Network for Semi-supervised Learning in Graphs. In: European Conference on Machine Learning, ECML PKDD 2011, Athens, Proceedings, Part I, Lecture Notes in Artificial Intelligence, vol. 6911, pp. 219-234, Springer.

● [2] M. Frasca, A. Bertoni, M. Re, and G. Valentini. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Networks, 43:84-98, 2013.

● [3] M. Frasca, A. Bertoni, A. Sion. A neural procedure for Gene Function Prediction. Neural Nets and Surroundings, Smart Innovation, Systems and Technologies, vol. 19, 2013, pp. 179-188, WIRN 2012.

● [4] M. Frasca, S. Bassis, and G. Valentini. Learning node labels with multi-category Hopfield networks. Neural Computing and Applications, 27(6):1677-1692, 2016.