Using Alfresco to create an Open Archival Information System

Dr Birgit Plietzsch, Arts Computing Advisor, bp10@st-andrews.ac.uk
Swithun Crowe, Developer for Arts and Humanities Computing projects, cs2@st-andrews.ac.uk

IT Services, University of St Andrews

Upload: robin-marriner

Post on 01-Apr-2015




Structure

1. Introduction to the University of St Andrews Digital Archiving Project (DAP)

2. The DAP Open Archival Information System

3. Developing the OAIS Ingest function in Alfresco


Digital Preservation

Digital Preservation is …
• the active management of digital information over time to ensure its accessibility
• long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required for.
  • Long-term is defined as "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely".
  • Retrieval means obtaining needed digital files from the long-term, error-free digital storage, without possibility of corrupting the continued error-free storage of the digital files.
  • Interpretation means that the retrieved digital files, files that, for example, are of texts, charts, images or sounds, are decoded and transformed into usable representations. This is often interpreted as "rendering", i.e. making it available for a human to access. However, in many cases it will mean able to be processed by computational means.

(Source: Wikipedia)


Institutional context

• Legal requirements (e.g. Freedom of Information Act)

• Protection of institutional intellectual property

• Funding body requirements
  • until 2008: Arts and Humanities Data Service (national depository for arts and humanities research data)
  • no such body exists now for the Arts and Humanities
  • for other subjects, national support is patchy
• Moral obligations
  • protection of cultural and corporate memory


Records of the Parliaments of Scotland project

www.rps.ac.uk

• proceedings of the Scottish Parliament from the first surviving act of 1235 to the union of 1707

• 10 years of research
• no print publication
• c. 16.5m words
• issues:
  • inconsistent editorial practices
  • obsolescence of software originally used
  • long-term sustainability of research data


Digital Archiving Project (DAP)

• Pilot project

• Scope:
  • data contained in electronic resources produced within the Faculty of Arts, University of St Andrews
• Aims:
  • ensure long-term sustainability of RPS data
  • investigate the requirements of digital archiving and obtain experience
  • meet funding body requirement
  • flexible implementation (to allow for additional future uses)


The DAP archive

[Figure: Concepts and Properties of Archives and Hosting in the Strategy and their Relationships. © Charles Beagrie Ltd 2009. Creative Commons Attribution-Share Alike 3.0. Key: solid colour represents core properties and fading colour represents weaker properties of archives and hosting services.]



The DAP Open Archival Information System

• An Open Archival Information System (or OAIS) is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.

• reference model: ISO 14721:2003


Open Archival Information System: workflows

Seven functions

• Ingest
• Archival Storage
• Data Management
• Administration
• Preservation Planning
• Access
• Management

SIP: Submission Information Package
AIP: Archival Information Package
DIP: Dissemination Information Package


Open Archival Information System: data package

Implementation

• Content Information:
  • XML
  • TIFF
  • DOC
  • etc.
• Preservation Description Information:
  • PREMIS
• Descriptive Information:
  • MODS
• Packaging Information:
  • METS


Preservation strategy

• What needs to be preserved?
  • data
  • layout
  • functionality
  • user experience
• What are the significant properties?
  • generic low-level properties (e.g. basic data unit, byte-level encoding, data type, and logical schema)
  • data type specific properties (example: text)
    • underlying abstract forms (font, spacing, layout)
    • sub-properties (e.g. font type, style, family, size, colour)
• How do we preserve?
  • bit stream preservation
  • emulation
  • migration
• Adopted approach:
  • data is preserved
  • combination of bit stream preservation and file format migration upon ingest


Data models

• description needs of different types of material
  • electronic resources
  • digital images
  • video
  • research papers
  • University records
  • etc.
• introduce flexibility
  • future wider uses of the archive


Electronic resources data model

• expressed in MODS

• 3 layers

• use for pilot

• more models can be developed

[Diagram labels: Project; Research data; Documentation; Code; Resource type; Digital object; Resource Discovery Metadata]


Approaches investigated

Monolithic approach
• DSpace
  • issues with Archival Storage and Data Management functions
• EPrints
  • issues with Administration and Access functions
• RODA
  • technical issues
No support for Preservation Planning

Breakdown into OAIS requirements
• Repository framework: Fedora Commons
  • issues with suitable front end for Ingest, Access, Preservation Planning, or Administration functions
  • highly customisable
• Metadata
  • MODS
  • METS
  • PREMIS


Implementation of DAP

Software used
• Alfresco
  • www.alfresco.com
• Fedora Commons
  • fedora-commons.org
• Planets Suite
  • www.openplanetsfoundation.org

[Diagram labels: Ingest; Archival storage & Data Management; Administration; Preservation Planning; Access; Management. Components named in the diagram: Share, Explorer, Records Management; Plato, Testbed.]


The DAP Open Archival Information System


Unresolved issues

• Version control of AIPs
  • Alfresco / Fedora interaction?
• Access front end
  • Fedora Commons front ends do not normally support OAIS functions
• Can extra properties be added to folders and files in Records Management site?

We welcome ideas that might help us resolve the above three issues.



Developing the OAIS Ingest in Alfresco

• FITS and PREMIS
  • Technical metadata
• RPS and MODS
  • Resource discovery metadata
• Antivirus scanning
• METS
  • Wrapping files and metadata

Introduction


FITS and PREMIS

• FITS (File Information Tool Set)
  • http://code.google.com/p/fits/
  • Consolidates file format metadata from 3rd party tools
    • Jhove, DROID, NLNZ ME, Exiftool and others
  • Output as XML
• PREMIS (PREservation Metadata: Implementation Strategies)
  • http://www.loc.gov/standards/premis/
  • Data dictionary of semantic units, maps to XML
• Transform FITS XML to PREMIS using XSLT

Introduction


FITS and PREMIS

• Text property defined in custom aspect for storing FITS XML in node metadata
• Create temporary file containing content of node
• Run FITS on temporary file
• Put output into custom property
• Later on, transform this to PREMIS XML
• Can be run as space rule
• Compile to AMP using Alfresco SDK

The action


FITS and PREMIS

<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
  <!-- I18N strings for the action -->
  <bean id="fits-action-messages" class="org.alfresco.i18n.ResourceBundleBootstrapComponent">
    <property name="resourceBundles">
      <list><value>alfresco.module.FitsAction.fits-action-messages</value></list>
    </property>
  </bean>
  <!-- content model defining the custom FITS aspect -->
  <bean id="fits-model-bootstrap" parent="dictionaryModelBootstrap" depends-on="dictionaryBootstrap">
    <property name="models">
      <list><value>alfresco/module/FitsAction/context/fitsModel.xml</value></list>
    </property>
  </bean>
  <!-- the action itself -->
  <bean id="fits-action" class="uk.ac.st_andrews.repo.action.executer.FitsActionExecuter" parent="action-executer">
    <property name="serviceRegistry"><ref bean="ServiceRegistry"/></property>
  </bean>
</beans>

fits-action-context.xml


FITS and PREMIS

package uk.ac.st_andrews.repo.action.executer;

// Outline only: method bodies are omitted on the slide
public class FitsActionExecuter extends ActionExecuterAbstractBase
{
    public void setServiceRegistry(ServiceRegistry serviceRegistry);
    protected void addParameterDefinitions(List<ParameterDefinition> paramList);
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef);
}

FitsActionExecuter


FITS and PREMIS

// make sure node exists
if (!nodeService.exists(actionedUponNodeRef))
{
    throw new Exception("no node");
}

// make sure that node has fits aspect
QName fitsAspect = QName.createQName(fitsURI, "fitsAspect");
if (!nodeService.hasAspect(actionedUponNodeRef, fitsAspect))
{
    this.nodeService.addAspect(actionedUponNodeRef, fitsAspect, null);
}

// create new FITS instance
Fits fits = new Fits();
Fits.allowRounding = true;
FitsOutput result = null;

FitsActionExecuter.executeImpl (fragment)


FITS and PREMIS

// put input into temp file
ContentReader reader =
    contentService.getReader(actionedUponNodeRef, ContentModel.PROP_CONTENT);
String fileName =
    (String) nodeService.getProperty(actionedUponNodeRef, ContentModel.PROP_NAME);
File inputFile =
    TempFileProvider.createTempFile("FitsActionExecuter_", "." + fileName);
reader.getContent(inputFile);

// transform into technical metadata
result = fits.examine(inputFile);
Document doc = result.getFitsXml();

// put result of transformation into output
XMLOutputter serializer = new XMLOutputter(Format.getPrettyFormat());
String output = serializer.outputString(doc);

// get property to write to
QName fitsProp = QName.createQName(fitsURI, "fitsOutput");
nodeService.setProperty(actionedUponNodeRef, fitsProp, output);

FitsActionExecuter.executeImpl (fragment cont.)


FITS and PREMIS

<identification status="CONFLICT">
  <identity format="Microsoft Word" mimetype="application/msword">
    <tool toolname="Exiftool" toolversion="8.25" />
    <tool toolname="file utility" toolversion="5.04" />
    <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA" />
    <tool toolname="ffident" toolversion="0.2" />
  </identity>
  <identity format="OLE2 Compound Document Format" mimetype="application/octet-stream">
    <tool toolname="Droid" toolversion="3.0" />
    <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/111</externalIdentifier>
  </identity>
</identification>

Fragment of FITS XML showing conflicting file formats


FITS and PREMIS

<premis:format>
  <premis:formatDesignation>
    <premis:formatName>Microsoft Word</premis:formatName>
  </premis:formatDesignation>
</premis:format>
<premis:format>
  <premis:formatDesignation>
    <premis:formatName>OLE2 Compound Document Format</premis:formatName>
  </premis:formatDesignation>
  <premis:formatRegistry>
    <premis:formatRegistryName>Droid (3.0)</premis:formatRegistryName>
    <premis:formatRegistryKey>fmt/111</premis:formatRegistryKey>
    <premis:formatRegistryRole>puid</premis:formatRegistryRole>
  </premis:formatRegistry>
</premis:format>

Corresponding fragment of PREMIS XML
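The FITS-to-PREMIS step can be sketched with the XSLT support built into the JDK. This is a minimal illustration, not the project's stylesheet: the class name, the embedded stylesheet, the premis:objectCharacteristics wrapper element and the PREMIS namespace URI are all assumptions made for the example, and the input is the FITS fragment from the earlier slide with namespaces left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FitsToPremisSketch {
    // Hypothetical stylesheet; the PREMIS namespace URI is an assumption.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:premis='info:lc/xmlschema/premis-v2'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <!-- wrapper element chosen for this sketch so the result has one root -->",
        "  <xsl:template match='/identification'>",
        "    <premis:objectCharacteristics>",
        "      <xsl:apply-templates select='identity'/>",
        "    </premis:objectCharacteristics>",
        "  </xsl:template>",
        "  <!-- every identity reported by FITS becomes one premis:format -->",
        "  <xsl:template match='identity'>",
        "    <premis:format>",
        "      <premis:formatDesignation>",
        "        <premis:formatName><xsl:value-of select='@format'/></premis:formatName>",
        "      </premis:formatDesignation>",
        "      <xsl:apply-templates select='externalIdentifier'/>",
        "    </premis:format>",
        "  </xsl:template>",
        "  <!-- registry identifiers such as PUIDs become premis:formatRegistry -->",
        "  <xsl:template match='externalIdentifier'>",
        "    <premis:formatRegistry>",
        "      <premis:formatRegistryName>",
        "        <xsl:value-of select=\"concat(@toolname, ' (', @toolversion, ')')\"/>",
        "      </premis:formatRegistryName>",
        "      <premis:formatRegistryKey><xsl:value-of select='.'/></premis:formatRegistryKey>",
        "      <premis:formatRegistryRole><xsl:value-of select='@type'/></premis:formatRegistryRole>",
        "    </premis:formatRegistry>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Shortened copy of the FITS fragment shown on the slide.
    static final String SAMPLE_FITS = String.join("\n",
        "<identification status='CONFLICT'>",
        "  <identity format='Microsoft Word' mimetype='application/msword'>",
        "    <tool toolname='Exiftool' toolversion='8.25'/>",
        "  </identity>",
        "  <identity format='OLE2 Compound Document Format' mimetype='application/octet-stream'>",
        "    <tool toolname='Droid' toolversion='3.0'/>",
        "    <externalIdentifier toolname='Droid' toolversion='3.0' type='puid'>fmt/111</externalIdentifier>",
        "  </identity>",
        "</identification>");

    // Applies the stylesheet to a FITS identification fragment.
    static String toPremis(String fitsXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(fitsXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("FITS to PREMIS transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPremis(SAMPLE_FITS));
    }
}
```

Both conflicting identities are carried over as separate premis:format elements, as on the slide; resolving the conflict is left to a later step.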


RPS and MODS

• Records of the Parliaments of Scotland marked up in thousands of XML documents
  • http://www.rps.ac.uk
• Using Text Encoding Initiative (TEI)
  • http://www.tei-c.org/index.xml
• TEI headers contain resource discovery metadata
• Extract metadata from documents and populate custom metadata fields
• Can be run as space rule
• Compile as AMP using Alfresco SDK

Introduction


RPS and MODS

<TEI.2 id="_william_and_mary_t1689_3_6_d6_trans" n="william_and_mary_trans">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A committee appointed for controverted elections</title>
      </titleStmt>
      <editionStmt>
        <edition n="session">william_and_mary_t1689_3_1_d2_trans</edition>
      </editionStmt>
      <publicationStmt>
        <date>16890314</date>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  <text>...</text>
</TEI.2>

TEI example

Annotations:
• id attribute: Unique ID for document
• n attribute: Document belongs to translated version of records from reign of William and Mary
• title: Main heading in document
• edition: Pointer to session that document belongs to
• date: Date of document, in YYYYMMDD format


RPS and MODS

package uk.ac.st_andrews.repo.content.metadata;

// Outline only: method bodies are omitted on the slide
public class RPSMetadataExtracter extends org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter
{
    public RPSMetadataExtracter();
    protected Map<String, Serializable> extractRaw(ContentReader reader);
}

RPSMetadataExtracter


RPS and MODS

// set up parser
SAXParser sp = spf.newSAXParser();
InputStream cis = reader.getContentInputStream();
InputSource is = new InputSource(cis);
RPSSaxParser teip = new RPSSaxParser();

// do parsing
teip.setProperties(map);
sp.parse(is, teip);
map = teip.getProperties();

// loop over properties found and hand each one to the mapping extracter
for (Map.Entry<String, Serializable> entry : map.entrySet())
{
    putRawValue(entry.getKey(), entry.getValue(), rawProperties);
}

RPSMetadataExtracter.extractRaw


RPS and MODS

package uk.ac.st_andrews.repo.content.metadata;

// Outline only: method bodies are omitted on the slide
public class RPSSaxParser extends org.xml.sax.helpers.DefaultHandler
{
    public void setProperties(Map<String, Serializable> prop);
    public Map<String, Serializable> getProperties();
    public void startElement(String uri, String localName, String qName, Attributes attributes);
    public void endElement(String uri, String localName, String qName);
    public void characters(char[] ch, int start, int length);
    private void handleID(String id);
    private void handleDate(String d);
}

RPSSaxParser


RPS and MODS

// property names
private static final String KEY_ID = "rpsID";
private static final String KEY_REIGN = "rpsReign";
private static final String KEY_VERSION = "rpsVersion";
private static final String KEY_HEADING = "rpsHeading";
private static final String KEY_SESSION = "rpsSession";
private static final String KEY_DATE = "rpsDate";
private static final String KEY_TITLE = "cmTitle";

// some properties get set in RPSSaxParser.characters
if (inTitle)
{
    rawProperties.put(KEY_TITLE, new String(ch, start, length));
}
else if (inSession)
{
    rawProperties.put(KEY_SESSION, new String(ch, start, length));
}

RPSSaxParser
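A SAX parser is allowed to deliver the text of one element through several characters() calls, so a handler of this kind usually buffers text and stores it when the element ends. The sketch below shows that pattern on the TEI header from the earlier slide. It is a standalone illustration, not the project's RPSSaxParser: the class name, the restriction to elements inside teiHeader and the conversion of the date to ISO form are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TeiHeaderSketch extends DefaultHandler {
    // TEI example from the slide, compacted
    static final String SAMPLE = String.join("\n",
        "<TEI.2 id='_william_and_mary_t1689_3_6_d6_trans' n='william_and_mary_trans'>",
        "  <teiHeader><fileDesc>",
        "    <titleStmt><title>A committee appointed for controverted elections</title></titleStmt>",
        "    <editionStmt><edition n='session'>william_and_mary_t1689_3_1_d2_trans</edition></editionStmt>",
        "    <publicationStmt><date>16890314</date></publicationStmt>",
        "  </fileDesc></teiHeader>",
        "  <text>...</text>",
        "</TEI.2>");

    private final Map<String, String> properties = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inHeader;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if ("TEI.2".equals(qName)) {
            properties.put("rpsID", attributes.getValue("id"));
        } else if ("teiHeader".equals(qName)) {
            inHeader = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        text.setLength(0);
        if ("teiHeader".equals(qName)) {
            inHeader = false;
            return;
        }
        if (!inHeader) {
            return; // ignore elements in the body of the document
        }
        switch (qName) {
            case "title":   properties.put("cmTitle", value); break;
            case "edition": properties.put("rpsSession", value); break;
            case "date":    properties.put("rpsDate", toIsoDate(value)); break;
            default:        break;
        }
    }

    // Assumed normalisation: YYYYMMDD becomes YYYY-MM-DD
    static String toIsoDate(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE).toString();
    }

    static Map<String, String> extract(String xml) {
        TeiHeaderSketch handler = new TeiHeaderSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse TEI document", e);
        }
        return handler.properties;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```

The keys reuse the property names from the slide (rpsID, cmTitle, rpsSession, rpsDate) so the result could be fed to a mapping file like the one on the next slide.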


RPS and MODS

# Namespaces
namespace.prefix.rps=http://www.rps.ac.uk/ns/1.0
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Mapping of property names to Qualified names used in model
rpsID=rps:id
rpsReign=rps:reign
rpsSession=rps:session
rpsDate=rps:date
rpsVersion=rps:version
rpsHeading=rps:heading
cmTitle=cm:title

RPSMetadataExtracter.properties
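To make the mapping file concrete, the sketch below resolves a raw property name to a fully qualified name in the {namespace-uri}local-name form. It only illustrates how the two parts of the file relate; Alfresco performs this resolution itself, and the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class MappingSketch {
    // Excerpt of the mapping file shown on the slide
    static final String MAPPING = String.join("\n",
        "namespace.prefix.rps=http://www.rps.ac.uk/ns/1.0",
        "namespace.prefix.cm=http://www.alfresco.org/model/content/1.0",
        "rpsID=rps:id",
        "rpsDate=rps:date",
        "cmTitle=cm:title");

    static Properties load(String text) {
        Properties mapping = new Properties();
        try {
            mapping.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mapping;
    }

    // Turns a raw key such as "rpsID" into "{http://www.rps.ac.uk/ns/1.0}id"
    static String resolve(Properties mapping, String rawKey) {
        String prefixed = mapping.getProperty(rawKey);
        if (prefixed == null || prefixed.indexOf(':') < 0) {
            throw new IllegalArgumentException("no prefixed mapping for " + rawKey);
        }
        int colon = prefixed.indexOf(':');
        String uri = mapping.getProperty("namespace.prefix." + prefixed.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("undeclared prefix in " + prefixed);
        }
        return "{" + uri + "}" + prefixed.substring(colon + 1);
    }

    public static void main(String[] args) {
        Properties mapping = load(MAPPING);
        System.out.println(resolve(mapping, "rpsID"));
        System.out.println(resolve(mapping, "cmTitle"));
    }
}
```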


RPS and MODS

<aspect name="rps:metadata">
  <title>RPS Metadata</title>
  <properties>
    <property name="rps:id"><type>d:text</type></property>
    <property name="rps:reign"><type>d:text</type></property>
    <property name="rps:session"><type>d:text</type></property>
    <property name="rps:date"><type>d:text</type></property>
    <property name="rps:heading"><type>d:text</type></property>
    <property name="rps:version"><type>d:text</type></property>
  </properties>
</aspect>

rpsModel.xml (fragment showing aspect)


RPS and MODS

# I18N strings
rpsID=RPS ID
rpsReign=RPS Reign
rpsSession=RPS Session
rpsDate=RPS Date
rpsVersion=RPS Version
rpsHeading=RPS Heading

webclient.properties


RPS and MODS

• Metadata Object Description Schema (MODS)
  • http://www.loc.gov/standards/mods/
• MODS is a resource discovery metadata standard
• Working on defining MODS data models
  • For Project, Resource Type and Digital Object levels
• Will move RPS metadata into MODS fields

Using MODS


Antivirus Action

• Creates an action for scanning files for viruses
• Uses ClamAV
  • http://www.clamav.net/lang/en/
• Can be configured for other tools
• Emails creator of file if virus found
• Deletes file from repository if virus found
• Can be run as space rule
• Compile as AMP using Alfresco SDK

Introduction


Antivirus Action

antivirus-action.xml (fragment)

<bean id="antivirus-action" class="uk.ac.st_andrews.repo.action.executer.AntivirusActionExecuter" parent="action-executer">
  <!-- services needed by bean -->
  <property name="contentService"><ref bean="contentService" /></property>
  <property name="nodeService"><ref bean="nodeService" /></property>
  <property name="templateService"><ref bean="templateService" /></property>
  <property name="actionService"><ref bean="actionService" /></property>
  <property name="personService"><ref bean="personService" /></property>
  <!-- person that email will come from, defined in alfresco-global.properties -->
  <property name="fromEmail">
    <value>${antivirus.mailer}</value>
  </property>
  <!-- path to Freemarker template, defined in alfresco-global.properties -->
  <property name="emailTemplate">
    <value>${antivirus.template}</value>
  </property>


Antivirus Action

antivirus-action.xml (fragment, cont.)

  <property name="command">
    <bean class="org.alfresco.util.exec.RuntimeExec">
      <property name="commandMap">
        <map>
          <!-- command to run, ${antivirus.exe} set in alfresco-global.properties, ${source} in Java class -->
          <entry key=".*" value="${antivirus.exe} ${source}"/>
        </map>
      </property>
      <property name="errorCodes">
        <value>1</value><!-- exit code 1 indicates that virus was found -->
      </property>
    </bean>
  </property>
</bean>
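The configuration above comes down to two ideas: a command template whose ${source} placeholder is filled in at run time, and an exit code that signals a finding. The sketch below shows both with the plain JDK rather than Alfresco's RuntimeExec. The class and method names are hypothetical, and clamscan --no-summary is only an assumed value for ${antivirus.exe}; the exit-code meanings follow ClamAV's clamscan (0 clean, 1 virus found, anything else an error).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ScanCommandSketch {
    enum Verdict { CLEAN, INFECTED, ERROR }

    // Splits the template into arguments first, then fills in placeholders,
    // so a path containing spaces stays a single argument.
    static List<String> buildCommand(String template, Map<String, String> variables) {
        List<String> command = new ArrayList<>();
        for (String token : template.trim().split("\\s+")) {
            for (Map.Entry<String, String> variable : variables.entrySet()) {
                token = token.replace("${" + variable.getKey() + "}", variable.getValue());
            }
            command.add(token);
        }
        return command;
    }

    // clamscan convention: 0 = no virus, 1 = virus found, other = scanner error
    static Verdict classify(int exitCode) {
        switch (exitCode) {
            case 0:  return Verdict.CLEAN;
            case 1:  return Verdict.INFECTED;
            default: return Verdict.ERROR;
        }
    }

    // Runs the scanner on one file; requires the scanner to be installed.
    static Verdict scan(String template, String sourcePath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                buildCommand(template, Map.of("source", sourcePath)))
                .inheritIO()
                .start();
        return classify(process.waitFor());
    }

    public static void main(String[] args) {
        // Only builds the command; nothing is executed here.
        System.out.println(buildCommand("clamscan --no-summary ${source}",
                Map.of("source", "/tmp/anti_virus_check_report.doc")));
        System.out.println(classify(1));
    }
}
```

Treating every code other than 0 and 1 as an error keeps a scanner failure from being mistaken for a clean result.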


Antivirus Action

AntivirusActionExecuter

package uk.ac.st_andrews.repo.action.executer;

// Outline only: method bodies are omitted on the slide
public class AntivirusActionExecuter extends ActionExecuterAbstractBase
{
    public void setContentService(ContentService contentService);
    public void setNodeService(NodeService nodeService);
    public void setTemplateService(TemplateService templateService);
    public void setActionService(ActionService actionService);
    public void setPersonService(PersonService personService);
    public void setFromEmail(String fromEmail);
    public void setCommand(RuntimeExec command);
    public void setEmailTemplate(String emailTemplate);
    public void init();
    protected void addParameterDefinitions(List<ParameterDefinition> paramList);
    protected void executeImpl(final Action ruleAction, final NodeRef actionedUponNodeRef);
}


Antivirus Action

AntivirusActionExecuter.executeImpl (fragment)

// put content into temp file
ContentReader reader =
    contentService.getReader(actionedUponNodeRef, ContentModel.PROP_CONTENT);
String fileName =
    (String) nodeService.getProperty(actionedUponNodeRef, ContentModel.PROP_NAME);
File sourceFile =
    TempFileProvider.createTempFile("anti_virus_check_", "_" + fileName);
reader.getContent(sourceFile);

// set source property for command
Map<String, String> properties = new HashMap<String, String>(1);
properties.put(VAR_SOURCE, sourceFile.getAbsolutePath());

// execute the antivirus command
ExecutionResult result = null;
try
{
    result = command.execute(properties);
}
catch (Throwable e)
{
    throw new AlfrescoRuntimeException("Antivirus check error: \n" + command, e);
}


Antivirus Action

AntivirusActionExecuter.executeImpl (fragment, cont.)

// try to get document creator's details
String creatorName = (String) nodeService.getProperty(actionedUponNodeRef,
    ContentModel.PROP_CREATOR);
if (null == creatorName || 0 == creatorName.length())
{
    throw new Exception("couldn't get creator's name");
}

NodeRef creator = personService.getPerson(creatorName);
if (null == creator)
{
    throw new Exception("couldn't get creator");
}

String creatorEmail = (String) nodeService.getProperty(creator,
    ContentModel.PROP_EMAIL);
if (null == creatorEmail || 0 == creatorEmail.length())
{
    throw new Exception("couldn't get creator's email address");
}


Antivirus Action

AntivirusActionExecuter.executeImpl (fragment, cont.)

// put together message
Map<String, Object> model = new HashMap<String, Object>(8, 1.0f);
model.put("filename", fileName);
model.put("message", result);

String emailMsg = templateService.processTemplate("freemarker", emailTemplate, model);

// send email message
Action emailAction = actionService.createAction("mail");
emailAction.setParameterValue(MailActionExecuter.PARAM_TO, creatorEmail);
emailAction.setParameterValue(MailActionExecuter.PARAM_FROM, fromEmail);
emailAction.setParameterValue(MailActionExecuter.PARAM_SUBJECT,
    "Virus found in " + fileName);
emailAction.setParameterValue(MailActionExecuter.PARAM_TEXT, emailMsg);
emailAction.setExecuteAsynchronously(true);
actionService.executeAction(emailAction, null);

// delete node
nodeService.addAspect(actionedUponNodeRef, ContentModel.ASPECT_TEMPORARY, null);
nodeService.deleteNode(actionedUponNodeRef);


METS and Fedora Commons

• Metadata Encoding and Transmission Standard (METS)
  • http://www.loc.gov/standards/mets/
• METS is a wrapper for other metadata documents
• Plan to generate METS documents containing/referencing:
  • Ingested files
  • Renderings of these files (thumbnails, reference copies, archival formatted versions etc.)
  • Resource discovery metadata
  • Technical metadata
• Fedora Commons can ingest METS documents as SIPs
  • http://fedora-commons.org/

Introduction
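As a rough picture of what such a wrapper could look like, the sketch below builds a skeleton METS document with the JDK's DOM API: one descriptive section for MODS, one technical section for PREMIS, one referenced file and a structural map tying them together. It is an assumption-laden illustration, not the project's generator: the class name, the IDs (DMD1, TECH1, FILE1), the object identifier and the file URL are hypothetical, the metadata wrappers are left empty, and the result has not been validated against the METS schema or tested with Fedora Commons.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MetsWrapperSketch {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    private static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    static Document build(String objectId, String fileHref, String mimeType) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element mets = doc.createElementNS(METS_NS, "mets:mets");
        mets.setAttributeNS(XMLNS_NS, "xmlns:mets", METS_NS);
        mets.setAttributeNS(XMLNS_NS, "xmlns:xlink", XLINK_NS);
        mets.setAttribute("OBJID", objectId);
        doc.appendChild(mets);

        // resource discovery metadata (MODS would go inside xmlData)
        Element dmdSec = child(mets, "dmdSec", "ID", "DMD1");
        child(child(dmdSec, "mdWrap", "MDTYPE", "MODS"), "xmlData");

        // technical metadata (PREMIS would go inside xmlData)
        Element techMD = child(child(mets, "amdSec", "ID", "AMD1"), "techMD", "ID", "TECH1");
        child(child(techMD, "mdWrap", "MDTYPE", "PREMIS"), "xmlData");

        // the ingested file, referenced by URL rather than embedded
        Element fileGrp = child(child(mets, "fileSec"), "fileGrp", "USE", "original");
        Element file = child(fileGrp, "file",
                "ID", "FILE1", "MIMETYPE", mimeType, "ADMID", "TECH1");
        Element flocat = child(file, "FLocat", "LOCTYPE", "URL");
        flocat.setAttributeNS(XLINK_NS, "xlink:href", fileHref);

        // structural map linking the file to its descriptive metadata
        Element div = child(child(mets, "structMap"), "div", "DMDID", "DMD1");
        child(div, "fptr", "FILEID", "FILE1");
        return doc;
    }

    // Appends a METS element with the given attribute name/value pairs.
    private static Element child(Element parent, String name, String... attributes) {
        Element element = parent.getOwnerDocument().createElementNS(METS_NS, "mets:" + name);
        for (int i = 0; i + 1 < attributes.length; i += 2) {
            element.setAttribute(attributes[i], attributes[i + 1]);
        }
        parent.appendChild(element);
        return element;
    }

    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(
                build("rps-example-0001", "file:///archive/example.xml", "application/xml")));
    }
}
```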


Find out more

Project source code available on Alfresco Forge
• FITS in Alfresco
  • http://forge.alfresco.com/projects/fitsinalfresco/
• RPS Metadata Extracter
  • http://forge.alfresco.com/projects/rpsmetadata/
• Antivirus
  • http://forge.alfresco.com/projects/antivirus/

University of St Andrews Digital Archiving Project
• http://www.st-andrews.ac.uk/itsupport/academic/arts
