implementing the genetic algorithm in xslt: poc

Post on 20-Jan-2015

DESCRIPTION

Implementing the Genetic Algorithm in XSLT from 2002

TRANSCRIPT

Proof of Concept: SOA Application Composition using the Genetic Algorithm

Jim Fuller

http://www.ruminate.co.uk http://www.slgchorus.com

Introduction

• Technical Director / Internet Services Manager for Stuart Lawrence Group companies

• on-IDLE ltd sponsored the 1st XSLT conference in the world, XSLT UK 2001, along with Dave Pawson

• Co-founder of the EXSLT effort, along with Dave Pawson, Jeni Tennison, Uche Ogbuji, et al.

• Technical reviewer and author for the now-defunct WROX, on books dealing with XML, XSLT and web services

Lecture Overview

• How we use WS today

• XSLT and S-expressions

• Genetic Algorithm refresher

• Early Genetic Experiments with XSLT

• Application composition using Genetic Algorithm

• Conclusions

How we use WS in today's applications

• Indirectly consume web services via WSDL / UDDI and subsequent generation of stub code

• Direct Consumption of SOAP via manual crafting of HTTP Request headers + SOAP envelope

• Primary use cases: Integration and Interoperability

• Emerging use cases: orchestration, higher level business processes, and automated application composition

MVC type architectures are popular

[Diagram: Client Tier, Presentation Tier, Business Tier, Integration Tier and Resource Tier (data repository, XML binding, persistence), mapped onto Model, View and Controller, with internal and external web services attached]

WS MVC with the Browser

[Diagram: the Internet Explorer client sends HTTP GET / HTTP POST requests to the Controller (EventHandler, SOAPEventHandler), which uses internal and external web services; the HTTP response is rendered by the View]

Model: receives events from the Controller and updates itself, sending data which gets transformed by our view components.

View: IE web service client side processing, XSLT templates, CSS, Global.xml, Global.xsl

SOA Anchor

• Stability via web service server: BEA Weblogic, IBM Websphere, Systinet WASP, .NET, ColdFusionMX

• Versioning control of web services

• Easy to deploy same web service through multiple transports

• Smooth out learning curve for many of the underlying XML technologies ( SAML )

• Security integration with underlying PKI

• Instant solution to some problems

• Deploy existing code as web service, no need for ‘special’ web service code embedded in your own code

Bazaar not opened yet

• Currently developers ask how can *I* use them in *my* applications.

• Web services live behind the firewall and solve integration problems; extraprise.

• Google, Amazon and Microsoft are all examples of monolithic web services.

• Many deployed web services are highly specific to a certain problem domain.

• Who will bind a specific public web service with their precious application ? (Amazon in research pane).

The world of ‘millions of web services’

• The question is not ‘how will a developer find a web service?’ but how will a machine find and use the right web service ?

• How will the developer/machine know it’s the right one ? That it’s stable, that it’s the correct version, and that it can be trusted…

• The promise of SOA is real time application composition generating applications or components, based on a set of general evolving criteria

Automatic application composition methods

• One approach, not linked to any problem domain is to use the Genetic Algorithm…though there are obvious constraints using these methods

• Random search of the problem domain

• AI / intelligent software agent methods

Genetic Algorithm Refresher

• The Genetic Algorithm ( GA ) is a model of the evolution of a population of artificial individuals.

• Each individual is a chromosome which contains discrete units of information; in computers this can be a string, binary numbers, etc.

• With each generation the fittest individuals are selected for genetic operations to create a new generation

• The driving force behind the search for new and better solutions is the retention and combination of good partial solutions to a problem

Abridged Genetic Algorithm

• The Fundamental Theorem of Genetic Algorithms

M(H, t): # of individuals in population 't' with the schema 'H'.

f(H): average fitness of the individuals with the schema 'H'.

F: average fitness of the entire population.

p1: probability of the schema being destroyed by crossover.

p2: probability of the schema being destroyed by mutation.
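The formula itself did not survive in this transcript. Using the symbols defined above, the theorem is usually stated as the following lower bound; this is a reconstruction in the standard textbook form, not a copy of the slide:

```latex
M(H, t+1) \;\ge\; M(H, t)\,\frac{f(H)}{F}\,\bigl(1 - p_1 - p_2\bigr)
```

Read this way, a schema whose average fitness f(H) is above the population average F, and which is unlikely to be broken up by crossover or mutation, is expected to appear in more individuals in the next generation.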

GA operations

• Reproduction: An individual is perfectly replicated to a new population

• Crossover ( Recombination ): Parental material is recombined to create offspring to join new population

• Mutation: random changes

• Permutation: reordering

• Editing: evaluation to a terminal

• Encapsulation: single indivisible function

• Decimation: removal of individuals

Genetic Programming Process

Step 0. Create a random initial population of individuals

Step 1. Evaluate the fitness of each individual

Step 2. Select individuals according to their fitness; these will participate in generating offspring (moms + dads)

Step 3. Apply primary and secondary genetic operations to generate a new offspring population

Step 4. Repeat steps 1, 2 and 3 to generate X generations

Step 5. Choose the best fit individual
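The steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: individuals are bit lists rather than XSLT documents, fitness is the number of zero bits (lower is better), selection simply keeps the better half, and the best individual is always reproduced unchanged. None of these choices come from the talk.

```python
import random


def evolve(population, fitness, crossover, generations, rng):
    """Run steps 1 to 5; lower fitness values are better."""
    for _ in range(generations):                       # Step 4: repeat
        ranked = sorted(population, key=fitness)       # Step 1: evaluate and rank
        parents = ranked[: max(2, len(ranked) // 2)]   # Step 2: keep the better half (assumption)
        offspring = [ranked[0]]                        # Step 3: reproduction of the current best
        while len(offspring) < len(population):        # Step 3: crossover fills the rest
            mom, dad = rng.sample(parents, 2)
            offspring.append(crossover(mom, dad, rng))
        population = offspring
    return min(population, key=fitness)                # Step 5: best fit individual


def one_point_crossover(mom, dad, rng):
    """Take the head of one parent and the tail of the other."""
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]


def count_zeros(bits):
    """Toy fitness: how far the individual is from all ones."""
    return bits.count(0)


rng = random.Random(42)
# Step 0: random initial population of 30 individuals, 12 bits each
initial = [[rng.randint(0, 1) for _ in range(12)] for _ in range(30)]
best = evolve(initial, count_zeros, one_point_crossover, generations=20, rng=rng)
print(best, count_zeros(best))
```

Because the best individual is copied into every new generation, the best fitness can never get worse from one generation to the next.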

Symbolic expressions and XSLT

• XSLT List questions….I originally wanted to solve ‘I want to transform source xml to target xml using XSLT’. Could use generic templates or some other automated process.

• Vestigial Lisp memories: s-expressions are similar to XSLT / XML, with data and program in one notation

• XSLT guru David Carlisle’s presence at XSLT UK 2001 opened my eyes to functional programming

• My work with EXSLT defined the limitations of XSLT…which led me to build frameworks to implement complex MVC architectures

Simplest Lisp Example

(+ (* 2 3) 4) evaluates to 10, and its symbolic expression tree looks like:

[Tree diagram: root + with children * and 4; the * node has children 2 and 3]

Hierarchical computer programs are more expressive than manipulating linear strings
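To make the tree view concrete, here is a minimal sketch that stores the Lisp example as nested tuples and evaluates it recursively. The tuple encoding and the `evaluate` helper are illustrative assumptions, not part of the original prototype.

```python
import operator

# Function set for this toy example; the terminals are plain numbers.
OPS = {"+": operator.add, "*": operator.mul}


def evaluate(node):
    """Evaluate a tree given as (operator, child, child, ...) or a number."""
    if isinstance(node, tuple):
        op, *children = node
        values = [evaluate(child) for child in children]
        result = values[0]
        for value in values[1:]:
            result = OPS[op](result, value)
        return result
    return node


tree = ("+", ("*", 2, 3), 4)   # (+ (* 2 3) 4)
print(evaluate(tree))          # prints 10
```

Any subtree, such as `("*", 2, 3)`, is itself a valid program, which is what makes swapping subtrees between individuals a meaningful genetic operation.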

XSLT are also general hierarchical computer programs

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="a">
    <d/>
    <c/>
  </xsl:template>
</xsl:stylesheet>

[Tree diagram: <xsl:stylesheet/> at the root, <xsl:template/> as its child, with <d/> and <c/> as leaves]

There are some differences, e.g. there are a variety of node types within XML

Problem definition

• Create a GA process that will discover an XSLT program which, given a source.xml, generates a target.xml

• Prototype uses ASF ANT to control the whole process

• Used Michael Kay’s excellent SAXON XSLT processor; XSLT 2.0 simplified the situation by removing the need to deal with RTFs (result tree fragments) and node-set usage

• Initially create a simple problem, e.g. that of transforming a source xml into a copy of itself

Source XML

<a>

<b>

<c>

<d></d>

</c>

</b>

</a>

Target XML

<a>

<b>

<c>

<d></d>

</c>

</b>

</a>

Early Genetic Experiment

Step 0. Randomly generate an initial population of XSLT documents

Step 1. Evaluate fitness via an XML diff of target.xml against result.xml

Step 2. Select individuals according to their fitness, for use in step 3

Step 3. Apply primary and secondary genetic operations to generate a new offspring population from the selected individuals

Step 4. Repeat steps 1, 2 and 3 to generate X generations

Step 5. Choose the best fit individual of the last generation

Objective: Generate an XSLT program that transforms source xml into result xml which is equivalent to target xml

Terminal Set: <a/> <b/> <c/> <d/>

Function Set: Subset of XSLT instructions

Fitness Cases: One fitness case

Raw fitness: Node count of the xmldiff patch file (the difference between result xml and target xml)

Standardized fitness: Same as raw fitness; approaching 0 is better fitness

Parameters: M=500, G=51

Step 0. Generate Initial Population

Used IBM xml generator: com.ibm.XMLGenerator.XMLGenerator to generate a population of xslt documents.

<?xml version='1.0'?>
<!-- Created by IBM XML Generator
numberLevels=10, maxRepeats=3, Random seed=1060890913224
fixedOdds=1, impliedOdds=4, defaultOdds=4
maxIdRefs=3, maxEntities=3, maxNMTokens=3
isExplicitRoot=true, root element name is 'xsl:stylesheet'
entOdds=1 Entity list:[]
doctype declaration?false
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="a">
    <c/>
    <d/>
  </xsl:template>
</xsl:stylesheet>

Avoid ‘early taxonomisation’

• No attributes

• No namespaces

• No schemas

• Xmlgenerator DTD defines allowable terminals and functions, e.g. xsl:apply-templates, xsl:for-each, xsl:value-of, xsl:copy-of, xsl:choose, xsl:if, xsl:copy.

• Used <a>, <b>, <c>, <d> as the only allowable elements
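The prototype used the IBM XML Generator driven by a DTD. As a rough stand-in, the sketch below builds random stylesheets from a restricted terminal and function set with Python's standard library. The function names, the depth limit and the probabilities are assumptions made for illustration, and only two attribute-free instructions (xsl:copy and xsl:apply-templates) are used so that every generated stylesheet stays well-formed.

```python
import random
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)

TERMINALS = ["a", "b", "c", "d"]            # the only allowable literal elements
FUNCTIONS = ["apply-templates", "copy"]     # small attribute-free subset (assumption)


def random_body(parent, depth, rng):
    """Append 1 to 3 random children (terminals or XSLT instructions) to parent."""
    for _ in range(rng.randint(1, 3)):
        if depth > 0 and rng.random() < 0.4:
            child = ET.SubElement(parent, f"{{{XSL}}}{rng.choice(FUNCTIONS)}")
            if child.tag.endswith("copy"):  # xsl:copy may have content; apply-templates stays empty
                random_body(child, depth - 1, rng)
        else:
            child = ET.SubElement(parent, rng.choice(TERMINALS))
            if depth > 0 and rng.random() < 0.5:
                random_body(child, depth - 1, rng)


def random_stylesheet(rng, max_depth=3):
    """Return one random individual as a stylesheet string."""
    root = ET.Element(f"{{{XSL}}}stylesheet", {"version": "1.0"})
    template = ET.SubElement(root, f"{{{XSL}}}template", {"match": rng.choice(TERMINALS)})
    random_body(template, max_depth, rng)
    return ET.tostring(root, encoding="unicode")


# A tiny population of 5 individuals; the talk used M=500.
population = [random_stylesheet(random.Random(seed)) for seed in range(5)]
print(population[0])
```

Seeding each individual separately makes a run reproducible, which helps when comparing parameter settings.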

Ant: generate_initial_population

<target name="generate_initial_population">
  <tempfile property="temp.file" prefix="xslt_" suffix=".xsl" destdir="${dirs.src}"/>
  <!-- defines start.TODAY, start.DSTAMP, start.TSTAMP properties //-->
  <tstamp prefix="start"/>
  <!-- current population number //-->
  <property name="xslt.build_number" value="${gen_count}"/>
  <!-- apply transforms using xslt //-->
  <java classname="com.ibm.XMLGenerator.XMLGenerator"
        fork="true"
        failonerror="false"
        output="${temp.file}">
    <arg value="${xslt.initial_dtd}"/>
    <arg value="-root"/>
    <arg value="${xslt.root_node}"/>
    <arg value="-nodecl"/>
    <arg value="-f"/>
    <arg value="1"/>
    <arg value="-l"/>
    <arg value="10"/>
  </java>
</target>

Step 1: Evaluate Fitness

[Diagram: each XSLT individual of the generation transforms Source.xml into result.xml; an XML diff of result.xml against Target.xml is then used to evaluate fitness]

Each individual is ranked, by testing xslt program against a source xml

Step 1. evaluate fitness (cont)

• Could have chosen multiple source and target xml to use in fitness assessment

• Output of transformation (result.xml) is xmldiff’ed with target xml

• I used an extremely simple XML diff tool that just outputs an XML patch

• Converted Diff patch file into a number, which is the number of nodes contained in the patch file
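Turning a patch file into a number can be sketched as follows. The exact counting rule of the prototype is not given, so this version assumes that "nodes" means element nodes below the patch root; the patch text is modelled on the TreeDiffMerge output shown in this deck.

```python
import xml.etree.ElementTree as ET


def raw_fitness(patch_xml):
    """Count the element nodes inside a diff patch, excluding the root itself."""
    root = ET.fromstring(patch_xml)
    return sum(1 for _ in root.iter()) - 1


# A patch that has to insert the whole target tree: a very unfit individual.
INSERT_PATCH = """
<diff xmlns:diff='http://diff.org'>
  <diff:insert dst="1">
    <a><b><c><d /></c></b></a>
  </diff:insert>
</diff>
"""

# An empty patch: result and target are identical.
EMPTY_PATCH = "<diff />"

print(raw_fitness(INSERT_PATCH))  # prints 5
print(raw_fitness(EMPTY_PATCH))   # prints 0
```

A fitness of 0 means the individual reproduced the target exactly, which matches the "approaching 0 is better" convention used for standardized fitness.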

Examples follow in pairs: first the TreeDiffMerge difference patch, then the result XML that the XSLT individual produced from the source XML.

<?xml version="1.0" encoding="UTF-8"?>

<diff xmlns:diff='http://diff.org'>

<diff:insert dst="1">

<a>

<b>

<c>

<d />

</c>

</b>

</a>

</diff:insert>

</diff>

<?xml version="1.0" encoding="UTF-8"?><root/>

<?xml version="1.0" encoding="UTF-8"?>

<diff xmlns:diff='http://diff.org'>

<diff:copy src="2" dst="1">

<diff:copy src="16" dst="2" />

</diff:copy>

</diff>

<?xml version="1.0" encoding="utf-8"?><root>

<a/><a><a><c/><c><a><d/></a><c/></c></a><b><b/><a/><c/><b>

<c>

<d/>

</c>

</b></b><a/></a><d><a><c/><a/><a/></a><c/></d><c/>

</root>

<?xml version="1.0" encoding="UTF-8"?>

<diff />

<?xml version="1.0" encoding="utf-8"?><root><a>

<b>

<c>

<d/>

</c>

</b>

</a></root>

XML Diff issues

• Most diff algorithms are based on a paper published in 1976 by J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison

• XML is not just text, it has a structure; text-based diff programs do not take this into account

• Simple example: <footie/> versus <footie></footie>; logically these are equal
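The `<footie/>` case can be demonstrated with a short sketch. It uses XML canonicalization from Python's standard library (available from Python 3.8) as one possible way to compare documents structurally; this is not the diff tool used in the prototype.

```python
from xml.etree.ElementTree import canonicalize

short_form = "<footie/>"
long_form = "<footie></footie>"

# As plain text the two documents differ...
print(short_form == long_form)

# ...but their canonical forms are identical.
print(canonicalize(short_form) == canonicalize(long_form))
```

A structure-aware diff works on something like the canonical form or the parsed tree, so purely lexical differences such as this one do not count against an individual's fitness.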

Ant: transform_src

<target name="transform_src">
  <java classname="net.sf.saxon.Transform"
        fork="true"
        failonerror="false"
        output="${current_xslt_file}.xml">
    <arg value="${source_xml}"/>
    <arg value="${current_xslt_file}"/>
  </java>
</target>

Ant: fitness_src

<target name="fitness_src">
  <java classname="TreeDiffMerge"
        fork="true"
        failonerror="false"
        output="${current_xslt_file}.fitness.xml">
    <arg value="-d"/>
    <arg value="${current_xslt_file}"/>
    <arg value="${target_xml}"/>
  </java>
</target>

Step 2. Select individuals

• Probabilistic selection chooses which individuals participate in genetic operations

Selected XSLT population

Select individuals for genetic operations, based on their fitness

A word on fitness

• Raw fitness: is the natural representation in terms of the specific problem

• Standardized fitness: lower the better

• Adjusted fitness: lies between 0-1

• Normalized fitness: lies between 0-1 with sum of fitness values = 1

• In our case the lower the number of ‘different’ nodes the better, use standardized fitness
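The relationship between these measures can be sketched in a few lines, following the definitions in Koza's Genetic Programming: adjusted fitness is 1 / (1 + standardized fitness), and normalized fitness is adjusted fitness divided by the population total. The scores and individual names are hypothetical, and the fitness-proportionate selection at the end is one common form of probabilistic selection, not necessarily the one used in the prototype.

```python
import random


def adjusted(standardized):
    """Map standardized fitness (0 is best) into the range (0, 1], where 1 is best."""
    return 1.0 / (1.0 + standardized)


def normalized(standardized_scores):
    """Adjusted fitness scaled so that the values of the population sum to 1."""
    adjusted_scores = [adjusted(score) for score in standardized_scores]
    total = sum(adjusted_scores)
    return [score / total for score in adjusted_scores]


# Hypothetical patch node counts for three individuals.
individuals = ["xslt_0", "xslt_1", "xslt_2"]
scores = [0, 1, 3]
weights = normalized(scores)
print(weights)

# Fitness-proportionate selection: better individuals are picked more often.
rng = random.Random(1)
selected = rng.choices(individuals, weights=weights, k=5)
print(selected)
```

Normalized fitness is convenient here because it can be used directly as a selection probability.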

Step 3. Primary Genetic Operations

Reproduction: an individual from the selected XSLT population is reproduced unchanged into the new generation.

Crossover ( Recombination ): two parents (‘Mom’ and ‘Dad’) are chosen from the selected XSLT population; nodes are swapped between the parent XSLT documents, creating 2 offspring XSLT documents that join the new generation.
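Swapping nodes between two parents can be sketched with Python's ElementTree. In the prototype the genetic operations were themselves written in XSLT, so this is only an illustration of the idea; the helper names and the sample trees are assumptions.

```python
import copy
import random
import xml.etree.ElementTree as ET


def swap_points(root):
    """Every (parent, index) pair, i.e. every non-root node that can be swapped."""
    return [(parent, i) for parent in root.iter() for i in range(len(parent))]


def crossover(mom, dad, rng):
    """Copy both parents, then swap one randomly chosen subtree between the copies."""
    child1, child2 = copy.deepcopy(mom), copy.deepcopy(dad)
    parent1, index1 = rng.choice(swap_points(child1))
    parent2, index2 = rng.choice(swap_points(child2))
    parent1[index1], parent2[index2] = parent2[index2], parent1[index1]
    return child1, child2


# Both parents need at least one child element, otherwise there is nothing to swap.
mom = ET.fromstring("<t><a><b/></a><c/></t>")
dad = ET.fromstring("<t><d/><d><c/></d></t>")
kid1, kid2 = crossover(mom, dad, random.Random(7))
print(ET.tostring(kid1, encoding="unicode"))
print(ET.tostring(kid2, encoding="unicode"))
```

The parents are deep-copied first, so they remain available for further operations in the same generation.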

Step 3. Secondary Genetic Operations

• Mutation: is a form of random crossover

• Permutation: Reorganize nodes

• Editing: evaluate a set of nodes

• Encapsulation: takes a branch and replaces with 1 indivisible node

• Decimation: removes individual based on domain specific criteria

Step 3. Secondary Genetic Operations (illustrated)

Mutation: pick a node in the selected XSLT and randomly mutate it; the offspring XSLT contains a completely new set of instructions at that point.

Permutation: the offspring XSLT has the same nodes as the selected XSLT in a permuted order.

Editing: replace a node of the selected XSLT with its evaluated expression.

Encapsulation: identify useful subtrees in the selected XSLT and encapsulate them by defining a new function.

Decimation: identify very poor fitness individuals and remove them from the population, for example an empty stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"></xsl:stylesheet>
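Two of these operations, mutation and decimation, can be sketched as follows. As before, the prototype implemented its operations in XSLT; the helper names, the depth limit and the fitness threshold here are hypothetical.

```python
import copy
import random
import xml.etree.ElementTree as ET

TERMINALS = ["a", "b", "c", "d"]


def random_subtree(rng, depth):
    """Build a small random tree of terminal elements."""
    node = ET.Element(rng.choice(TERMINALS))
    if depth > 0:
        for _ in range(rng.randint(0, 2)):
            node.append(random_subtree(rng, depth - 1))
    return node


def mutate(individual, rng):
    """Replace one randomly chosen non-root node with a freshly generated subtree."""
    mutant = copy.deepcopy(individual)
    points = [(parent, i) for parent in mutant.iter() for i in range(len(parent))]
    parent, index = rng.choice(points)
    parent[index] = random_subtree(rng, depth=2)
    return mutant


def decimate(population, fitness, worst_allowed):
    """Drop individuals whose standardized fitness is worse than the threshold."""
    return [ind for ind in population if fitness(ind) <= worst_allowed]


original = ET.fromstring("<a><b><c/></b><d/></a>")
mutant = mutate(original, random.Random(3))
print(ET.tostring(mutant, encoding="unicode"))

# Decimation on hypothetical fitness scores, where lower is better.
survivors = decimate([1, 5, 9], fitness=lambda score: score, worst_allowed=5)
print(survivors)  # prints [1, 5]
```

Mutation is what introduces material that no parent carries, which matters when the initial population lacks a needed instruction.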

Ant: select, perform, and generate new population

<target name="select_crossover_population">….xslt transformation selected crossover using xslt</target>

<target name="select_reproduction_population">….xslt transformation selected reproduction using xslt</target>

<target name="perform_genetic_operation">….genetic operations were performed using xslt</target>

<target name="generate_new_generation">…new individuals were copied over to a new directory</target>

Step 4. Generate X populations

• M = 500, G = 51

• Set initial genetic operation probabilities: 90% crossover on selected individuals, 10% reproduction on selected individuals, 0% secondary operations on selected individuals

• Define termination criteria if you want an ongoing process until a desired fitness is obtained.

• Iterate until done
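Choosing an operation according to these probabilities can be sketched like this. The dictionary mirrors the 90% / 10% / 0% settings above; the function name and the sample size are illustrative.

```python
import random

# Probabilities from the slide; secondary operations are switched off.
OPERATION_PROBABILITIES = {
    "crossover": 0.90,
    "reproduction": 0.10,
    "mutation": 0.0,
}


def pick_operation(rng):
    """Pick the genetic operation to apply to the next selected individual."""
    names = list(OPERATION_PROBABILITIES)
    weights = list(OPERATION_PROBABILITIES.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
counts = {name: 0 for name in OPERATION_PROBABILITIES}
for _ in range(1000):
    counts[pick_operation(rng)] += 1
print(counts)
```

Keeping the probabilities in one table makes it easy to enable a secondary operation later by raising its value and lowering another.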

Ant properties

<project name="early_genetic_trial" default="build" basedir=".">

<!-- setup ant-contrib//-->
<property name="ant-contrib.jar" location="c:\java\ant-contrib-0.3.jar"/>
<taskdef resource="net/sf/antcontrib/antcontrib.properties"
         classpath="${ant-contrib.jar}"/>

<!-- genetic parameters//-->
<property name="genetic.pop_size" value="500"/>
<property name="genetic.number_of_generations" value="51"/>
<property name="gen.reproduction_probability" value=".10"/>
<property name="gen.recombinate_probability" value=".90"/>
<property name="gen.mutation_probability" value=".0"/>
<property name="gen.permuation_probability" value=".0"/>
<property name="gen.editing_probability" value=".0"/>
<property name="gen.encapsulation_probability" value=".0"/>
<property name="gen.decimation_probability" value=".0"/>

<!-- xml properties //-->
<property name="source_xml" value="c:\_genetic\generate_initial_population\source.xml"/>
<property name="target_xml" value="c:\_genetic\generate_initial_population\target.xml"/>

<!-- xslt properties //-->
…contained xslt properties

<!-- directory properties //-->
…contained directory properties

<!-- report properties //-->
<property name="xslt.report" value="C:\java\jakarta-ant-1.5.1\etc\log.xsl"/>

Simplified Ant Build Target

<target name="build" depends="clean, create">
  <antcall target="generate_initial_population">
    <param name="no_of_individuals" value="${genetic.pop_size}"/>
  </antcall>
  <antcall target="initiate_genetic_run"/>
  <antcall target="report"/>
  <echo message="successfully ran genetic run"/>
</target>

Results

• Informal (non-normative) results indicate acceptable processing time, e.g. approx 7 minutes to solve this problem on a PIII with 128 MB RAM

• For simple mapping this was an effective technique

• Often the best fit individuals were poorly performing XSLT; the fitness evaluation needed an added criterion that measured processing time

Ruminations

• Early success with XSLT approach proved the applicability of GA with xml based technologies

• Was easy to let people define a source and target xml

• Issues of speed and efficiency can be addressed later on

• How could I involve web services into such a process ?

GA Strategies to Consider

• Could apply the genetic algorithm directly to another language, e.g. Java or C# ?

• Leverage existing XSLT approach and add SOAP as a new function/terminal via XSLT extension

Enhance existing Prototype

• augment XSLT approach and introduce web services into terminal/function set

• Needed a local repository of Web Services to add to existing function set

• Needed to enhance XSLT with a generic SOAP XSLT extension which indirectly invokes a web service via its WSDL definition

• Adjust generate initial population to include soap extension

Simple application composition

Step 0. Randomly generate an initial population of XSLT documents; this is now a 2 stage process, to include web services via the new function

Step 1. Evaluate fitness via an XML diff of target.xml against result.xml

Step 2. Select individuals according to their fitness, for use in step 3

Step 3. Apply primary and secondary genetic operations to generate a new offspring population from the selected individuals

Step 4. Repeat steps 1, 2 and 3 to generate X generations

Step 5. Choose the best fit individual of the last generation

Web Services Search Engine

• Long term storage in WSIL format

• Data was stored in the Xindice XML repository

• Which is accessible via WebDAV and HTTP GET

• Can query using XPath

• Harvested by a combination of Google, scanning and general web robot techniques

Manual Harvesting of Web Services

• Google ‘file: wsil’ or inspection.wsil

• Google ‘file: wsdl’

• Scanning common Application Server ports, sending simple SOAP messages

• Xmethods and general registries

• Did not want to bind to either WSDL or UDDI….

Simple WSIL example

<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl" />
  </service>
</inspection>

WSIL with 2 services

<?xml version="1.0"?>
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
            xmlns:wsiluddi="http://schemas.xmlsoap.org/ws/2001/10/inspection/uddi/">
  <service>
    <abstract>A stock quote service with two descriptions</abstract>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl"/>
    <description referencedNamespace="urn:uddi-org:api">
      <wsiluddi:serviceDescription location="http://www.example.com/uddi/inquiryapi">
        <wsiluddi:serviceKey>4FA28580-5C39-11D5-9FCF-BB3200333F79</wsiluddi:serviceKey>
      </wsiluddi:serviceDescription>
    </description>
  </service>
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="ftp://anotherexample.com/tools/calculator.wsdl"/>
  </service>
  <link referencedNamespace="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
        location="http://example.com/moreservices.wsil"/>
</inspection>

inspection.wsil at XMETHODS

<?xml version='1.0' encoding='UTF-8'?>
<inspection xmlns='http://schemas.xmlsoap.org/ws/2001/10/inspection/'
            xmlns:wsiluddi='http://schemas.xmlsoap.org/ws/2001/10/inspection/uddi/'
            xmlns:wsilxmethods='http://schemas.xmethods.net/ws/2001/10/inspection/'>
  <service>
    <abstract>Get the Barnes &amp; Noble price by ISBN</abstract>
    <description referencedNamespace='http://schemas.xmlsoap.org/wsdl/'
                 location='http://www.abundanttech.com/webservices/bnprice/bnprice.wsdl'/>
    <description referencedNamespace='http://www.xmethods.net/'>
      <wsilxmethods:serviceDetailPage
          location='http://www.xmethods.net/ve2/ViewListing.po?key=uuid:C5119582-90AC-51E7-72AA-ED7D8927C9D1'>
        <wsilxmethods:serviceID>272507</wsilxmethods:serviceID>
      </wsilxmethods:serviceDetailPage>
    </description>
  </service>
  …..
</inspection>
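Pulling WSDL locations out of a WSIL document, as a harvester for the local repository would need to, can be sketched with the standard library. The function name is an assumption, and the sample document is the simple WSIL example from this deck.

```python
import xml.etree.ElementTree as ET

WSIL_NS = "http://schemas.xmlsoap.org/ws/2001/10/inspection/"
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def wsdl_locations(wsil_xml):
    """Return the location of every WSDL description listed in a WSIL document."""
    root = ET.fromstring(wsil_xml)
    query = f"{{{WSIL_NS}}}service/{{{WSIL_NS}}}description"
    return [
        description.get("location")
        for description in root.findall(query)
        if description.get("referencedNamespace") == WSDL_NS
        and description.get("location")
    ]


SIMPLE_WSIL = """
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://example.com/stockquote.wsdl" />
  </service>
</inspection>
"""

print(wsdl_locations(SIMPLE_WSIL))  # prints ['http://example.com/stockquote.wsdl']
```

Descriptions that point at other registries, such as the UDDI entry in the two-service example, are skipped because their referencedNamespace is not the WSDL namespace.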

XSLT Generic SOAP client

• Created extension function in SAXON, which grew out of a SOAP debugging tool effort ( another talk ! )

• Ability to invoke a web service via WSDL and randomly choose web service

• Web service invocation called during xslt transformation

• Function prototype: ws:invoke(wsdl,methodname,nodeset)

Example of using a web service in XSLT

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ws="http://www.ruminate.co.uk/ws"
                version="2.0">
  <xsl:template match="a">
    <xsl:value-of select="ws:invoke('http://somewsdlfile.wsdl','getGUID',a)"/>
    <b/>
  </xsl:template>
</xsl:stylesheet>

Issues

• Step 0 generation required additional stages, to introduce ws:invoke combined with WSIL information

• Encapsulation was applied to XSLT statements that contained the ws:invoke function, so crossover would not change the statement

• Always chose the 1st method ( in order ) in the WSDL

• Step 0 consistently generated highly unfit programs, which required a larger population size

• Seeding mutation with ws:invoke statements vastly sped up the process

• New timeout factors were necessary

• GA process slowed down significantly due to the inclusion of web services

• GA process was more effective with better fitness evaluation; e.g. ranking fitness consisted of 3 sources and targets

Objective: Generate an XSLT program that multiplies 2 numbers, converts to Celsius and returns the number in Chinese

Terminal Set: <a/>, <b/> ( 2 numbers )

Function Set: Subset of XSLT instructions + ws:invoke

Fitness Cases: Three fitness cases

Raw fitness: Node count of the xmldiff patch file (the difference between result xml and target xml)

Parameters: M=1000, G=51

Results

• Multiply 2 numbers, convert to Celsius and return the result in Chinese: average 2 hours

• Tried a variety of more complicated problems, with many runs never converging to a solution; it is apparent that there is not enough ‘genetic material’ online yet

• Prototype proved that GA can be applied

• Assisting the GA always sped up the process

• Many optimization opportunities

Enhancement

• Could have used Dimitre Novatchev’s FXSL, though this would have imposed a pure FP viewpoint on the process

• Use UDDI as web services repository

• Apply GA to ANT or an XML pipeline, or even to BPEL, WS-CAF or any XML vocabulary

• Prototyping with ANT was successful, but eventually will embed in a software framework

The Internet as a maturing Software Framework

• Inheritance versus composition reuse mechanisms

• Hierarchical versus relational data models

• Synchronous versus asynchronous

• Stateful versus stateless

• Declarative versus OO versus procedural

• Coarse grained versus RPC versus Object based web services

Conclusion

• In 5 years’ time, will there be advances in hardware processing that make GA techniques viable?

• Problem domain experts can formulate a representation of a problem to be solved using simple XML

• Coders become farmers

• It’s counterintuitive to generate a million line ‘messy’ program to solve a problem

• Are there any amendments/changes to key specifications that will assist or restrict the GA method ?

Thank you, any questions ?

References

• John R. Koza, Genetic Programming, MIT Press, 1992

• W3C, SOAP Version 1.2

• W3C, XML Version 1

• W3C, XSLT Version 2

• W3C, WSDL Version 1

• WSIL Version 1

• J. W. Hunt and M. D. McIlroy, An Algorithm for Differential File Comparison, 1976

• SAXON XSLT processor by Michael Kay, http://saxon.sourceforge.net

• ASF ANT, http://ant.apache.org

• FXSL, Dimitre Novatchev, http://fxsl.sourceforge.net
