Upload: nathanial-hayslip

Post on 31-Mar-2015

University of Sheffield NLP

GATE development hints

• Reporting bugs

• Submitting a patch

• The user guide

• Continuous integration

Bugs, feature requests

• Use the tracker on SourceForge http://sourceforge.net/projects/gate/support

• Give as much detail as possible:
  - GATE version, build number, platform, Java version (1.5.0_15, 1.6.0_03, etc.)
  - Steps to reproduce
  - Full stack trace of any exceptions, including "Caused by…"

• Check whether the bug is already fixed in the latest nightly build
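
As a rough aid for gathering these details, the Java and platform information can be read from standard system properties, and `Throwable.printStackTrace()` already prints every nested "Caused by:" section. `BugReportInfo` below is a hypothetical helper made up for this sketch, not part of GATE; it does not look up GATE's own version or build number, which you should still copy from your GATE installation.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper (not part of GATE) that collects bug-report details. */
final class BugReportInfo {

  /** Java version, vendor and platform, from standard system properties. */
  static String environment() {
    return "Java version: " + System.getProperty("java.version")
        + "\nJava vendor: " + System.getProperty("java.vendor")
        + "\nPlatform: " + System.getProperty("os.name")
        + " " + System.getProperty("os.version")
        + " (" + System.getProperty("os.arch") + ")";
  }

  /** Full stack trace as text, including every "Caused by:" section. */
  static String fullStackTrace(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }
}
```

Paste both outputs into the report, alongside the GATE version, build number and steps to reproduce.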

Patches

• Use the patches tracker on SourceForge

• Best format is an svn diff against the latest Subversion revision
  - Save the diff as a file and attach it; don't paste the diff into the bug report.

• We generally don't accept patches against earlier versions

Patches (2)

• GATE must compile and run on Java 5
  - It is not sufficient to set source="1.5" and target="1.5" but compile on Java 6
  - This doesn't prevent you calling classes/methods that don't exist in Java 5

• Test your patch on Java 5 before submitting
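
As an illustration (made up for this sketch, not taken from GATE's code): `String.isEmpty()` was only added in Java 6. Compiling `isBlankJava6` below with a JDK 6 javac and `-source 1.5 -target 1.5` succeeds, but the class then fails with `NoSuchMethodError` on a Java 5 JVM. `isBlankJava5` does the same job with Java 5 API only. Building with a Java 5 JDK, or pointing javac's `-bootclasspath` at the Java 5 class library, catches this at compile time.

```java
/** Illustration: a Java 6 API call that -source/-target 1.5 does not catch. */
final class Java5Compat {

  /** Compiles with -source 1.5 -target 1.5 on JDK 6, but String.isEmpty()
      does not exist on Java 5, so this throws NoSuchMethodError there. */
  static boolean isBlankJava6(String s) {
    return s == null || s.trim().isEmpty();
  }

  /** Same behaviour, using only methods that exist in Java 5. */
  static boolean isBlankJava5(String s) {
    return s == null || s.trim().length() == 0;
  }
}
```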

The User Guide

• Everything in GATE is (theoretically) documented in the GATE User Guide http://gate.ac.uk/userguide

• Every change to the core should be mentioned in the change log http://gate.ac.uk/userguide/chap:changes

• User guide is written in LaTeX

Updating the user guide

• Lives in Subversion: https://gate.svn.sourceforge.net/svnroot/gate/userguide/trunk

• Build requires pdflatex, htlatex (tex4ht package), sed, make, etc.
  - On Windows, use Cygwin

• Download http://gate.ac.uk/sale/big.bib and put it in the directory above the .tex files

Updating the user guide (2)

• Edit the .tex files

• Graphics, screenshots, etc. should be .png

• Check in changes to the .tex files; the PDF and HTML are regenerated automatically by…

Hudson

• Continuous integration platform

• Automatically rebuilds GATE and user guide (among others) whenever they change

• Also does a clean build of GATE every night
  - Nightly builds are published at http://gate.ac.uk/download/snapshots

Hudson

• JUnit test results available for each build

• http://gate.ac.uk/hudson

Running GATE Embedded in Tomcat (or any multithreaded system)

Issues and tricks

Introduction

• Scenario: implementing a web service (or other web application) that uses GATE Embedded to process requests
  - Want to support multiple concurrent requests
  - Long-running process: need to be careful to avoid memory leaks, etc.

• Example used is a plain HttpServlet
  - Principles apply to other frameworks (Struts, Spring MVC, Metro/CXF, Grails…)

Setting up

• GATE libraries in WEB-INF/lib
  - gate.jar + JARs from lib

• Usual GATE Embedded requirements:
  - A directory to be "gate.home"
  - Site and user config files
  - Plugins directory
  - Call Gate.init() once (and only once) before using any other GATE APIs

Initialisation using a ServletContextListener

• ServletContextListener is registered in web.xml

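• Called when the application starts up

public class GateInitListener implements ServletContextListener {
  public void contextInitialized(ServletContextEvent e) {
    ServletContext ctx = e.getServletContext();
    File gateHome = new File(ctx.getRealPath("/WEB-INF"));
    Gate.setGateHome(gateHome);
    File userConfig = new File(ctx.getRealPath("/WEB-INF/user.xml"));
    Gate.setUserConfigFile(userConfig);
    // site config is gateHome/gate.xml
    // plugins dir is gateHome/plugins
    try {
      Gate.init();
    } catch(GateException ex) {
      // contextInitialized cannot throw checked exceptions
      throw new RuntimeException("Could not initialise GATE", ex);
    }
  }

  public void contextDestroyed(ServletContextEvent e) {
  }
}

<listener>
  <listener-class>gate.web.example.GateInitListener</listener-class>
</listener>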

GATE in a multithreaded environment

• GATE PRs are not thread-safe
  - Due to the design of parameter passing as JavaBean properties

• Must ensure that a given PR/Controller instance is only used by one thread at a time
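
To see why JavaBean-style parameters are the problem, consider a hypothetical, heavily simplified PR. `ToyPR` is made up for this sketch; real GATE PRs have more parameters, but they share the "set properties, then call `execute()`" pattern. If two threads share one instance, one thread's `setDocument` can land between the other thread's `setDocument` and `execute`. The sketch replays that interleaving deterministically on a single thread rather than relying on a real race.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical PR whose runtime parameter is a JavaBean property. */
final class ToyPR {
  private String document;                        // shared mutable state
  private final List<String> processed = new ArrayList<String>();

  public void setDocument(String document) { this.document = document; }

  /** Processes whatever document is set now, not when the caller set it. */
  public void execute() { processed.add(document); }

  public List<String> getProcessed() { return processed; }

  /** Replays an unlucky interleaving of two "threads" sharing one PR. */
  static List<String> unluckyInterleaving() {
    ToyPR shared = new ToyPR();
    shared.setDocument("doc-A");   // thread A sets its parameter
    shared.setDocument("doc-B");   // thread B overwrites it
    shared.execute();              // thread A runs, but on doc-B
    shared.execute();              // thread B runs on doc-B
    return shared.getProcessed();
  }
}
```

Here doc-A is never processed and doc-B is processed twice, which is why each PR/Controller instance must be confined to one thread at a time.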

First attempt: one instance per request

• Naïve approach: create new PRs for each request

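public void doPost(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
  try {
    ProcessingResource pr = (ProcessingResource)Factory.createResource(...);
    try {
      Document doc = Factory.newDocument(getTextFromRequest(request));
      try {
        // do some stuff
      } finally {
        Factory.deleteResource(doc);
      }
    } finally {
      Factory.deleteResource(pr);
    }
  } catch(ResourceInstantiationException e) {
    throw new ServletException(e);
  }
}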

Many levels of nested try/finally: ugly but necessary to make sure we clean up even when errors occur. You will get very used to these…

Problems with this approach

• Guarantees no interference between threads
• But inefficient, particularly with complex PRs (large gazetteers, etc.)
• Hidden problem with JAPE:
  - Parsing a JAPE grammar creates and compiles Java classes
  - Once created, classes are never unloaded
  - Even with simple grammars, eventually OutOfMemoryError (PermGen space)

Second attempt: using ThreadLocals

• Store the PR/Controller in a thread-local variable

private ThreadLocal<CorpusController> controller =
    new ThreadLocal<CorpusController>() {
      protected CorpusController initialValue() {
        return loadController();
      }
    };

private CorpusController loadController() {
  //...
}

public void doPost(HttpServletRequest request, HttpServletResponse response) {
  CorpusController c = controller.get();
  // do stuff with the controller
}

Better than attempt 1…

• Only initialise resources once per thread

• Interacts nicely with typical web server thread pooling

• But if a thread dies, there is no way to clean up its controller
  - Possibility of memory leaks
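
The runnable sketch below makes the per-thread behaviour concrete. It uses a plain `Object` as a stand-in for `CorpusController`, and `loadController()` here is hypothetical: it only counts how many "expensive" loads happen. Each thread loads once and then reuses its own instance, and a second thread gets a separate one.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the ThreadLocal approach with a stand-in for CorpusController. */
final class PerThreadControllers {
  /** Counts how many (expensive) controllers have been created. */
  static final AtomicInteger created = new AtomicInteger();

  /** Hypothetical stand-in for loading a saved GATE application. */
  static Object loadController() {
    created.incrementAndGet();
    return new Object();
  }

  static final ThreadLocal<Object> controller = new ThreadLocal<Object>() {
    @Override
    protected Object initialValue() { return loadController(); }
  };

  /** Gets the calling thread's controller twice and checks they match. */
  static boolean sameInstanceWithinThread() {
    return controller.get() == controller.get();
  }

  /** Returns the controller seen by a separate, short-lived thread. */
  static Object controllerFromNewThread() throws InterruptedException {
    final Object[] result = new Object[1];
    Thread t = new Thread(new Runnable() {
      public void run() { result[0] = controller.get(); }
    });
    t.start();
    t.join();   // join() makes result[0] visible to the calling thread
    return result[0];
  }
}
```

Note what happens to the short-lived thread's controller once that thread ends: nothing ever calls `Factory.deleteResource()` on it, which is exactly the clean-up gap described above.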

A solution: object pooling

• Manage your own pool of Controller instances

• Take a controller from the pool at the start of a request, return it (in a finally!) at the end

• Number of instances in the pool determines maximum concurrency level

Simple example

private BlockingQueue<CorpusController> pool;

public void init() {
  pool = new LinkedBlockingQueue<CorpusController>();
  for(int i = 0; i < POOL_SIZE; i++) {
    pool.add(loadController());
  }
}

public void doPost(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
  CorpusController c;
  try {
    c = pool.take();
  } catch(InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new ServletException(e);
  }
  try {
    // do stuff
  } finally {
    pool.add(c);
  }
}

public void destroy() {
  for(CorpusController c : pool) Factory.deleteResource(c);
}

take() blocks if the pool is empty; use poll() if you want to handle an empty pool yourself
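
One way to handle an empty pool yourself is `BlockingQueue.poll(timeout, unit)`, which returns `null` if nothing becomes free in time. The sketch below is generic so it runs without GATE; `PoolWithTimeout`, `tryWithInstance` and the `Task` callback are hypothetical names, with `Task` standing in for the "do stuff" part of `doPost`. The caller decides what to do on timeout, for example reply 503 Service Unavailable instead of blocking indefinitely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Sketch: handling an empty pool with poll(timeout) instead of take(). */
final class PoolWithTimeout<T> {
  private final BlockingQueue<T> pool = new LinkedBlockingQueue<T>();

  PoolWithTimeout(Iterable<T> instances) {
    for (T t : instances) pool.add(t);
  }

  /** Runs the task with a pooled instance; returns false without running
      it if no instance becomes free within the timeout. */
  boolean tryWithInstance(long timeoutMillis, Task<T> task)
      throws InterruptedException {
    T instance = pool.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    if (instance == null) return false;    // pool exhausted: caller decides
    try {
      task.run(instance);
    } finally {
      pool.add(instance);                   // always return it, even on error
    }
    return true;
  }

  int available() { return pool.size(); }

  /** Hypothetical callback standing in for "do stuff" in doPost. */
  interface Task<E> { void run(E instance); }
}
```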

Exporting the grunt work: the Spring Framework

• Spring Framework http://www.springsource.org/
  - Handles application startup and shutdown
  - Configure your business objects and connections between them using XML
  - GATE provides helpers to initialise GATE, load saved applications, etc.
  - Built-in support for object pooling
  - Web application framework (Spring MVC)
  - Used by other frameworks (Grails, CXF, …)

Initialising GATE with Spring

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:gate="http://gate.ac.uk/ns/spring">
  <gate:init gate-home="/WEB-INF"
             plugins-home="/WEB-INF/plugins"
             site-config-file="/WEB-INF/gate.xml"
             user-config-file="/WEB-INF/user-gate.xml">
    <gate:preload-plugins>
      <value>/WEB-INF/plugins/ANNIE</value>
    </gate:preload-plugins>
  </gate:init>
</beans>

Loading a saved application

<gate:saved-application id="myApp" location="/WEB-INF/application.xgapp" scope="prototype" />

• scope="prototype" means create a new instance each time we ask for it
  - Default is singleton: one and only one instance

Spring servlet example

• Spring provides the HttpRequestHandler interface for managing servlet-type objects with Spring

• Declare an HttpRequestHandlerServlet in web.xml with the same name as the Spring bean

Spring servlet example

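public class MyHandler implements HttpRequestHandler {
  public void setApplication(CorpusController app) { ... }

  public void handleRequest(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {
    try {
      Document doc = Factory.newDocument(getTextFromRequest(request));
      try {
        // do some stuff with the app
      } finally {
        Factory.deleteResource(doc);
      }
    } catch(ResourceInstantiationException e) {
      throw new ServletException(e);
    }
  }
}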

• Write the handler assuming single-threaded access
  - We will use Spring to handle pooling for us

Tying it together

• web.xml

<!-- set up Spring -->
<listener>
  <listener-class>
    org.springframework.web.context.ContextLoaderListener
  </listener-class>
</listener>

<!-- servlet -->
<servlet>
  <servlet-name>mainHandler</servlet-name>
  <servlet-class>
    org.springframework.web.context.support.HttpRequestHandlerServlet
  </servlet-class>
</servlet>

Tying it together (2)

• applicationContext.xml

<gate:init ... />
<gate:saved-application id="myApp" location="/WEB-INF/application.xgapp"
                        scope="prototype" />

<bean id="myHandlerTarget" class="my.pkg.MyHandler" scope="prototype">
  <property name="application" ref="myApp" />
</bean>

<bean id="handlerTargetSource"
      class="org.springframework.aop.target.CommonsPoolTargetSource">
  <property name="targetBeanName" value="myHandlerTarget" />
  <property name="minIdle" value="3" />
  <property name="maxIdle" value="3" />
  <property name="whenExhaustedActionName" value="WHEN_EXHAUSTED_BLOCK" />
</bean>

<bean id="mainHandler" class="org.springframework.aop.framework.ProxyFactoryBean">
  <property name="targetSource" ref="handlerTargetSource" />
</bean>