Decision Camp 2014 - Charles Forgy - Factors Affecting Rule Performance
Factors Affecting Rule Performance
Charles L. Forgy
October, 2014
Outline
• Rete: A 10,000 foot view.
• Dealing With Expensive Rules.
• Other Considerations.
• Conclusion.
RETE: A 10,000 FOOT VIEW.
Reactive Rules
We are only going to consider reactive rules.
• Inference rules.
• Monitoring rules.
Rete
Most rule engines use some form of Rete to handle reactive rules.
• Advisor, CLIPS, Drools, ILOG, Jess, OPSJ, SMARTS, TIBCO ...
OPSJ
rule marking
if {
st: Stage(st.value == "marking");
j1: Junction(j1.visited == "now");
e1: Edge(e1.p2 == j1.base_point);
j2: Junction(j2.base_point == e1.p1,
j2.visited == "yes");
} do {
j2.visited = "check";
update(j2);
}
The Recognize-Act Cycle
1. Match: The engine determines which rules have satisfied conditions.
2. Conflict Resolution: One or more rules are selected for execution.
3. Action: The Then parts of the selected rules are executed.
The Recognize-Act cycle repeats until there are no satisfied rules.
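The cycle can be sketched in Java as a loop. This is a minimal sketch, not how OPSJ or any other engine is built: the `Rule` record, the set of integers standing in for working memory, and the "take the first satisfied rule" conflict-resolution strategy are all assumptions made for illustration. It also re-tests every condition against all of working memory on every cycle, which is exactly the repeated work that Rete is designed to avoid.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical rule representation: a condition over all of working memory
// and an action (the Then part) that edits working memory.
record Rule(String name, Predicate<Set<Integer>> condition, Consumer<Set<Integer>> action) {}

class RecognizeAct {
    // Repeats Match / Conflict Resolution / Action until no rule is satisfied
    // (or maxCycles is reached) and returns the names of the rules that fired.
    static List<String> run(List<Rule> rules, Set<Integer> workingMemory, int maxCycles) {
        List<String> fired = new ArrayList<>();
        for (int cycle = 0; cycle < maxCycles; cycle++) {
            // 1. Match: naively re-test every rule against the whole working memory.
            List<Rule> satisfied = new ArrayList<>();
            for (Rule rule : rules) {
                if (rule.condition().test(workingMemory)) satisfied.add(rule);
            }
            if (satisfied.isEmpty()) break;  // no satisfied rules: the cycle ends
            // 2. Conflict resolution: this sketch simply takes the first satisfied rule.
            Rule selected = satisfied.get(0);
            // 3. Action: execute the Then part of the selected rule.
            selected.action().accept(workingMemory);
            fired.add(selected.name());
        }
        return fired;
    }

    public static void main(String[] args) {
        Set<Integer> wm = new HashSet<>(Set.of(3, 5));
        Rule removeMax = new Rule("remove-max", m -> !m.isEmpty(), m -> m.remove(Collections.max(m)));
        System.out.println(run(List.of(removeMax), wm, 100));  // [remove-max, remove-max]
    }
}
```

The single example rule removes the largest element while working memory is non-empty, so it fires once per element and then the cycle stops because no rule is satisfied.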
The Problem for the Engine
• The working memory may contain thousands of objects.
• Systems may contain hundreds to thousands of rules.
• A condition part may contain several conditions, each of which has to match an object.
Saving Information
• Typically, only a few objects are added to or removed from Working Memory on each cycle.
• So, most of the information that was computed for any cycle N is still useful on cycle N+1.
The Task
Given the state of the system on cycle N, to determine the state for cycle N+1:
• Delete the specific pieces of information that are no longer correct.
• Add in the new pieces of information.
More Specifically
Given the state of the system on cycle N, to determine the state for cycle N+1:
• For each object in working memory that was deleted or changed, remove every piece of match information referring to that object.
• For each object in working memory that was changed or added, add any match information that can make use of that object.
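These two steps can be sketched for a hypothetical rule with two conditions over integers and a single test between them. The class name `IncrementalPairs` and the `(a, b) -> a < b` test used in the example are assumptions for illustration; a changed object is treated as a deletion followed by an addition, and `recompute` exists only to check that the incremental result agrees with a from-scratch evaluation.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Match information for a hypothetical two-condition rule whose conditions
// both match integers and whose between-condition test is 'test'.
class IncrementalPairs {
    private final BiPredicate<Integer, Integer> test;
    private final Set<Integer> workingMemory = new LinkedHashSet<>();
    private final Set<List<Integer>> matches = new HashSet<>();

    IncrementalPairs(BiPredicate<Integer, Integer> test) { this.test = test; }

    // Added object: create only the match information that can use x,
    // with x in either condition position.
    void add(int x) {
        workingMemory.add(x);
        for (int y : workingMemory) {
            if (test.test(x, y)) matches.add(List.of(x, y));
            if (test.test(y, x)) matches.add(List.of(y, x));
        }
    }

    // Deleted object: remove every piece of match information referring to x.
    // (This scans the saved matches; an engine would index them by object.)
    void remove(int x) {
        workingMemory.remove(x);
        matches.removeIf(m -> m.contains(x));
    }

    // Changed object: delete the old version, then add the new one.
    void change(int oldValue, int newValue) {
        remove(oldValue);
        add(newValue);
    }

    Set<List<Integer>> matches() { return Collections.unmodifiableSet(matches); }

    // From-scratch evaluation, used only to check the incremental result.
    Set<List<Integer>> recompute() {
        Set<List<Integer>> all = new HashSet<>();
        for (int a : workingMemory)
            for (int b : workingMemory)
                if (test.test(a, b)) all.add(List.of(a, b));
        return all;
    }
}
```

With the test `a < b`, adding 3, 1 and 2 saves the matches [1, 2], [1, 3] and [2, 3]; removing 1 then leaves only [2, 3] without re-running any test.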
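Constraints Within Conditions
Constraints within a condition involve only the object that condition matches; here they compare a field with a constant: st.value == "marking", j1.visited == "now", and j2.visited == "yes".
rule marking
if {
st: Stage(st.value == "marking");
j1: Junction(j1.visited == "now");
e1: Edge(e1.p2 == j1.base_point);
j2: Junction(j2.base_point == e1.p1,
j2.visited == "yes");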
} do {
j2.visited = "check";
update(j2);
}
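Constraints Between Conditions
Constraints between conditions relate fields of objects matched by different conditions: e1.p2 == j1.base_point and j2.base_point == e1.p1.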
rule marking
if {
st: Stage(st.value == "marking");
j1: Junction(j1.visited == "now");
e1: Edge(e1.p2 == j1.base_point);
j2: Junction(j2.base_point == e1.p1,
j2.visited == "yes");
} do {
j2.visited = "check";
update(j2);
}
Handling Inserted Objects
To find the rules that might be affected by a changed working memory object, Rete
1. Uses the constraints within each condition to find the conditions that match at least that information. (“Alpha” tests.)
2. Uses the constraints between conditions to determine whether the changed object fully matches. (“Beta” tests.)
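The split between the two kinds of test can be sketched for the marking rule. The `Stage`, `Junction` and `Edge` records are simplified, hypothetical stand-ins for the Waltz objects (only the fields the rule uses), and the sketch evaluates the whole rule at once instead of propagating a single changed object, so it shows which tests are alpha tests and which are beta tests rather than how Rete schedules them.

```java
import java.util.*;
import java.util.function.Predicate;

// Simplified, hypothetical stand-ins for the objects in the marking rule.
record Stage(String value) {}
record Junction(int basePoint, String visited) {}
record Edge(int p1, int p2) {}

class MarkingMatch {
    // Alpha tests: constraints within one condition (field compared with a constant).
    static final Predicate<Stage> ST = s -> s.value().equals("marking");
    static final Predicate<Junction> J1 = j -> j.visited().equals("now");
    static final Predicate<Junction> J2 = j -> j.visited().equals("yes");

    static <T> List<T> alpha(List<T> objects, Predicate<T> test) {
        return objects.stream().filter(test).toList();
    }

    // Beta tests: constraints between conditions, applied only to objects that
    // already passed their alpha tests:
    //   e1.p2 == j1.base_point   and   j2.base_point == e1.p1
    static List<List<Object>> match(List<Stage> stages, List<Junction> junctions, List<Edge> edges) {
        List<Stage> stMatches = alpha(stages, ST);
        List<Junction> j1Matches = alpha(junctions, J1);
        List<Junction> j2Matches = alpha(junctions, J2);
        List<List<Object>> result = new ArrayList<>();
        for (Stage st : stMatches)
            for (Junction j1 : j1Matches)
                for (Edge e1 : edges)                  // Edge has no alpha test
                    if (e1.p2() == j1.basePoint())
                        for (Junction j2 : j2Matches)
                            if (j2.basePoint() == e1.p1())
                                result.add(List.<Object>of(st, j1, e1, j2));
        return result;
    }
}
```

Only objects that survive the constant tests ever reach the nested joins, which is why the alpha tests are rarely the expensive part.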
Saving Information
Rete saves all the information that it computes while processing each rule.
• It keeps track of the objects that match each condition based on only Alpha tests.
• For each initial sequence of conditions in a condition part, it keeps track of the lists of objects that match those conditions when Beta tests are also applied.
Order Rule
rule order
if {
low: Integer;
mid: Integer(
mid.intValue() > low.intValue());
hi: Integer(
hi.intValue() > mid.intValue());
} do {
delete(mid);
}
Execute:
insert(new Integer(0));
insert(new Integer(1));
Alpha matches:
low: Integer(0), Integer(1)
mid: Integer(0), Integer(1)
hi: Integer(0), Integer(1)
Alpha Matches
rule order
if {
low: Integer;
{Integer(0), Integer(1)}
mid: Integer(
mid.intValue() > low.intValue());
{Integer(0), Integer(1)}
hi: Integer(
hi.intValue() > mid.intValue());
{Integer(0), Integer(1)}
} do {
delete(mid);
}
Execute:
insert(new Integer(0));
insert(new Integer(1));
Alpha matches:
low: Integer(0), Integer(1)
mid: Integer(0), Integer(1)
hi: Integer(0), Integer(1)
Beta matches:
[low, mid]: [Integer(0), Integer(1)]
[low, mid, hi]:
Beta Matches
rule order
if {
low: Integer;
mid: Integer(
mid.intValue() > low.intValue());
{[Integer(0), Integer(1)]}
hi: Integer(
hi.intValue() > mid.intValue());
{ }
} do {
delete(mid);
}
Execute:
insert(new Integer(2));
Alpha matches:
low: Integer(0), Integer(1), Integer(2)
mid: Integer(0), Integer(1), Integer(2)
hi: Integer(0), Integer(1), Integer(2)
Beta matches:
[low, mid]: [Integer(0), Integer(1)],
[Integer(0), Integer(2)],
[Integer(1), Integer(2)]
[low, mid, hi]: [Integer(0), Integer(1), Integer(2)]
Beta Matches
rule order
if {
low: Integer;
mid: Integer(mid.intValue() > low.intValue());
{[Integer(0), Integer(1)],
[Integer(0), Integer(2)],
[Integer(1), Integer(2)]}
hi: Integer(hi.intValue() > mid.intValue());
{[Integer(0), Integer(1), Integer(2)]}
} do {
delete(mid);
}
Handling Deleted Objects
Processing deleted objects is fast. The engine keeps track of which pieces of saved match data each object is involved in, so when an object is deleted, the engine can remove all of its stored match information directly, without re-running any tests.
Order Rule
rule order
if {
low: Integer;
mid: Integer(
mid.intValue()>low.intValue());
hi: Integer(
hi.intValue()>mid.intValue());
} do {
delete(mid);
}
After rule fires.
Alpha matches:
low: Integer(0), Integer(2)
mid: Integer(0), Integer(2)
hi: Integer(0), Integer(2)
Beta matches:
[low, mid]: [Integer(0), Integer(2)]
[low, mid, hi]:
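The order-rule walk-through can be reproduced with a small hand-built network. This is a sketch under several assumptions: it handles only the order rule, keeps a single alpha list because all three conditions accept any `Integer`, assumes the integers are distinct, and names the class and fields (`OrderRuleNetwork`, `lowMid`, `lowMidHi`, `involvedIn`) for illustration only. Insertion joins only the new object against the saved state; deletion uses a reverse index from each object to the saved matches it takes part in.

```java
import java.util.*;

// Hand-built network for the order rule only (hypothetical sketch):
//   low: Integer;  mid: Integer(mid > low);  hi: Integer(hi > mid)
class OrderRuleNetwork {
    final List<Integer> alpha = new ArrayList<>();              // same for low, mid and hi
    final List<List<Integer>> lowMid = new ArrayList<>();       // beta matches [low, mid]
    final List<List<Integer>> lowMidHi = new ArrayList<>();     // beta matches [low, mid, hi]
    // Reverse index: for each object, the saved beta matches it takes part in.
    final Map<Integer, List<List<Integer>>> involvedIn = new HashMap<>();

    void insert(int x) {
        alpha.add(x);
        List<List<Integer>> oldLowMid = new ArrayList<>(lowMid);
        List<List<Integer>> newLowMid = new ArrayList<>();
        for (int other : alpha) {
            if (other > x) newLowMid.add(List.of(x, other));  // x as low
            if (x > other) newLowMid.add(List.of(other, x));  // x as mid
        }
        // New [low, mid] matches are extended with any hi already in memory.
        for (List<Integer> t : newLowMid) {
            save(lowMid, t);
            for (int hi : alpha)
                if (hi > t.get(1)) save(lowMidHi, List.of(t.get(0), t.get(1), hi));
        }
        // Previously saved [low, mid] matches are extended with x as hi.
        for (List<Integer> t : oldLowMid)
            if (x > t.get(1)) save(lowMidHi, List.of(t.get(0), t.get(1), x));
    }

    private void save(List<List<Integer>> memory, List<Integer> match) {
        memory.add(match);
        for (int obj : match)
            involvedIn.computeIfAbsent(obj, k -> new ArrayList<>()).add(match);
    }

    // Deletion: look up the saved matches directly; no tests are re-run.
    void delete(int x) {
        alpha.remove(Integer.valueOf(x));
        for (List<Integer> match : involvedIn.getOrDefault(x, List.of())) {
            lowMid.remove(match);
            lowMidHi.remove(match);
            for (int obj : match)
                if (obj != x) involvedIn.get(obj).remove(match);
        }
        involvedIn.remove(x);
    }
}
```

Inserting 0, 1 and 2 leaves `lowMid` as [[0, 1], [0, 2], [1, 2]] and `lowMidHi` as [[0, 1, 2]], as on the Beta Matches slide; `delete(1)` (the rule's `delete(mid)`) then leaves `lowMid` as [[0, 2]] and `lowMidHi` empty.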
Important Points
• Rete saves state, processing only the changed objects each cycle.
• For each initial sequence of conditions in a condition part, Rete keeps track of the lists of objects that match those conditions.
DEALING WITH EXPENSIVE RULES
Finding Expensive Rules
The expensive rules are not always the ones the developer suspects; profiling is essential.
Rule engines have very different profiling tools.
– Check the documentation for the engine you are using.
OPSJ – Java Profiler
The Slowest Rule
rule start_visit_3_junction
if {
stg: Stage(stg.value == "labeling");
junct: Junction(junct.kind == "3j",
junct.visited == "no");
} do {
junct.visited = "now";
stg.value = "visiting_3j";
update(junct, stg);
}
Excessive Beta Matches
In many cases, expensive rules are caused by creating excessive beta matches.
Typical issues:
– Computing more information than is needed.
– Throwing away information and recomputing it.
How Many Junctions Are There?
rule start_visit_3_junction
if {
stg: Stage(stg.value == "labeling");
junct: Junction(junct.kind == "3j",
junct.visited == "no");
} do {
junct.visited = "now";
stg.value = "visiting_3j";
update(junct, stg);
}
Updating the Stage Causes Beta Matches to be Discarded
rule start_visit_3_junction
if {
stg: Stage(stg.value == "labeling");
junct: Junction(junct.kind == "3j",
junct.visited == "no");
} do {
junct.visited = "now";
stg.value = "visiting_3j";
update(junct, stg);
}
Option 1
rule start_visit_3_junction
if {
stg: Stage(stg.value == "labeling");
junct: Junction(junct.kind == "3j",
junct.visited == "no");
} do {
junct.visited = "now";
update(junct);
insert(new Stage("visiting_3j"));
}
Option 1: Continued
To accommodate this change, other rules must be modified to use insert/delete of stages as well.
Option 2
rule start_visit_3_junction
if {
stg: Stage(stg.value == "labeling");
junct: from j:Junction(j.kind == "3j",
j.visited == "no")
TakeAny;
} do {
junct.visited = "now";
stg.value = "visiting_3j";
update(junct, stg);
}
Effects of Changes
• Option 1 computes a lot of state information, but does not throw it away while it is still useful.
• Option 2 computes only the state that is needed at each step.
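A rough cost model makes the difference concrete. It is an assumption-laden sketch, not a profile: it counts the [stg, junct] beta matches created while `n` unvisited "3j" junctions are each visited once, and it assumes that under the original rule other rules return the stage to "labeling" after every visit, so every remaining junction is joined again. The `Cost` record and the method names are hypothetical.

```java
// created: beta matches built over the whole run; peak: most held at once.
record Cost(long created, long peak) {}

class BetaMatchCost {
    // Original rule: update(stg) discards every [stg, junct] match. Assuming the
    // stage later returns to "labeling", every still-unvisited junction is joined again.
    static Cost original(int n) {
        long created = 0, peak = 0;
        for (int unvisited = n; unvisited > 0; unvisited--) {
            created += unvisited;
            peak = Math.max(peak, unvisited);
        }
        return new Cost(created, peak);
    }

    // Option 1: the "labeling" Stage is never modified, so each match is built
    // once and is dropped only when its own junction is updated.
    static Cost option1(int n) {
        return new Cost(n, n);
    }

    // Option 2: TakeAny joins the stage with a single junction on each cycle.
    static Cost option2(int n) {
        return new Cost(n, Math.min(n, 1));
    }
}
```

With `n = 100`, the model gives 5050 matches created for the original rule (100 + 99 + ... + 1), against 100 for either option; Option 1 holds up to 100 matches at once, Option 2 only one.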
OPSJ – Java Profiler
Initial Boundary Rules
rule initial_boundary_junction_L
if {
stg: Stage(stg.value ==
"find_initial_boundary");
junct: Junction(junct.kind == "2j");
edge1: Edge (edge1.p1 == junct.base_point,
edge1.p2 == junct.p1);
edge2: Edge (edge2.p1 == junct.base_point,
edge2.p2 == junct.p2);
!j2: Junction (j2.base_point >
junct.base_point);
} do { . . . }
Expensive Idioms
Another common problem is using expensive idioms in rule conditions.
Maximize Idiom
rule initial_boundary_junction_L
if {
stg: Stage(stg.value ==
"find_initial_boundary");
junct: Junction(junct.kind == "2j");
edge1: Edge (edge1.p1 == junct.base_point,
edge1.p2 == junct.p1);
edge2: Edge (edge2.p1 == junct.base_point,
edge2.p2 == junct.p2);
!j2: Junction(j2.base_point >
junct.base_point);
} do { . . . }
Why This Is Expensive
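• When a collection of beta matches is compared with a collection of alpha matches, not all tests are equally expensive.
• Equality tests can be handled quite efficiently, for example by indexing the matches on the tested value.
• Order tests (<, >, etc.) take more time to evaluate, because each beta match may have to be compared with many alpha matches.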
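The difference can be sketched by counting the work done when a list of beta matches is joined with an alpha memory. To keep the sketch short, each match is represented only by the integer value being tested, and the `JoinResult` record and method names are hypothetical. The equality join probes a hash map built from the alpha memory (an engine would maintain such an index as objects arrive, so building it is not counted); a hash index cannot answer `>`, so the order join in this sketch compares every pair.

```java
import java.util.*;

// matches: joined pairs found; probes: hash lookups or pairwise comparisons.
record JoinResult(int matches, int probes) {}

class JoinCost {
    // Equality test (alpha == beta): one hash lookup per beta match.
    static JoinResult equalityJoin(List<Integer> betaValues, List<Integer> alphaValues) {
        Map<Integer, Integer> countByValue = new HashMap<>();
        for (int a : alphaValues) countByValue.merge(a, 1, Integer::sum);
        int matches = 0, probes = 0;
        for (int b : betaValues) {
            probes++;
            matches += countByValue.getOrDefault(b, 0);
        }
        return new JoinResult(matches, probes);
    }

    // Order test (alpha > beta): every beta match is compared with every alpha match.
    static JoinResult greaterThanJoin(List<Integer> betaValues, List<Integer> alphaValues) {
        int matches = 0, probes = 0;
        for (int b : betaValues)
            for (int a : alphaValues) {
                probes++;
                if (a > b) matches++;
            }
        return new JoinResult(matches, probes);
    }
}
```

For beta and alpha values 0 to 99, the equality join needs 100 probes, while the `>` join performs 10,000 comparisons to find 4,950 matches.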
The Slow Comparison
rule initial_boundary_junction_L
if {
stg: Stage(stg.value ==
"find_initial_boundary");
junct: Junction(junct.kind == "2j");
edge1: Edge (edge1.p1 == junct.base_point,
edge1.p2 == junct.p1);
edge2: Edge (edge2.p1 == junct.base_point,
edge2.p2 == junct.p2);
!j2: Junction(j2.base_point >
junct.base_point);
} do { . . . }
Avoiding the Idiom
rule initial_boundary_junction_L
if {
stg: Stage(stg.value ==
"find_initial_boundary");
junct: from j:Junction
TakeMax(j.base_point);
test (junct.kind == "2j");
edge1: Edge (edge1.p1 == junct.base_point,
edge1.p2 == junct.p1);
edge2: Edge (edge2.p1 == junct.base_point,
edge2.p2 == junct.p2);
} do { . . . }
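The cost difference between the two versions can be sketched by counting order tests. `JunctionFact`, `Found` and the method names are hypothetical, base points are assumed to be unique, and the negated condition is modeled as testing each "2j" candidate against every junction and counting all matching `j2` objects, as a negation that keeps such a count would, rather than stopping at the first hit.

```java
import java.util.*;

// Hypothetical stand-in for the Waltz Junction; base points assumed unique.
record JunctionFact(int basePoint, String kind) {}

// junction: the selected junction (or null); orderTests: ">" comparisons made.
record Found(JunctionFact junction, long orderTests) {}

class MaximizeIdiom {
    // Original idiom: junct (kind "2j") matches if no j2 has a greater base_point.
    static Found viaNegation(List<JunctionFact> junctions) {
        long tests = 0;
        JunctionFact result = null;
        for (JunctionFact junct : junctions) {
            if (!junct.kind().equals("2j")) continue;   // alpha test on junct
            int greater = 0;                            // count of matching j2 objects
            for (JunctionFact j2 : junctions) {
                tests++;
                if (j2.basePoint() > junct.basePoint()) greater++;
            }
            if (greater == 0) result = junct;           // negated condition satisfied
        }
        return new Found(result, tests);
    }

    // Rewrite: TakeMax(j.base_point) over all junctions, then test(kind == "2j").
    static Found viaTakeMax(List<JunctionFact> junctions) {
        long tests = 0;
        JunctionFact max = null;
        for (JunctionFact j : junctions) {
            if (max == null) { max = j; continue; }
            tests++;
            if (j.basePoint() > max.basePoint()) max = j;
        }
        JunctionFact result = (max != null && max.kind().equals("2j")) ? max : null;
        return new Found(result, tests);
    }
}
```

With 50 "2j" junctions whose base points are 1 to 50, both versions select the junction with base point 50, but the negation version performs 2,500 order tests and the `TakeMax` version 49.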
OTHER CONSIDERATIONS
When To Optimize
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
- Donald Knuth
When To Optimize
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%.”
- Donald Knuth
NP-Complete Problems
There is a class of problems known as “NP-Complete” problems.
There is no known algorithm that can solve NP-Complete problems in polynomial time.
The Bad News
Rule matching is an NP-Complete problem.
But…
Processing SQL queries is also an NP-Complete problem.
CONCLUSIONS
Understand Rule Engine Performance
• Rete is a state-saving algorithm; on each cycle it maps from working memory changes to changes in rule matching information.
• Usually, the constant (“alpha”) tests are not a problem.
• The variable (“beta”) tests can be a problem.
Profiling
• Generally, only a few rules will impact performance significantly.
• It is important to profile the rules rather than guess at the culprits.
Dealing With Expensive Rules
Typical problems to look for:
– Computing more information than is needed.
– Throwing away information that is still useful.
– Using expensive idioms.
There are problems that are inherently expensive. (But we can hope they are rare.)
Thank You