technical coping strategies for resource discovery - paul walk


Upload: jisc

Post on 26-Jun-2015


DESCRIPTION

Technical Coping Strategies for Resource Discovery: Paul's plenary presentation at the Jisc/British Library Discovery Summit 2013, February 2013, London

TRANSCRIPT

Page 1: Technical Coping Strategies for Resource Discovery - Paul Walk

Paul Walk [email protected]

@paulwalk http://www.paulwalk.net

Technical challenges in resource discovery

Page 2

Contents

1. a general consideration:
   • open or closed

2. a particular challenge:
   • synchronisation in an open world

3. the ‘nothing new’, but doing it better:
   • APIs that work and can be trusted

Page 3

a healthy(?) state of tension between open and closed

Page 4

open and closed worlds

• I’m not talking about licensing or access to data

• open
   • unbounded - like the Web

• closed
   • bounded - like most collections management systems, aggregations etc.

• formally, much of what we do is underpinned by ‘open/closed world’ assumptions:
   • open world assumption: any statement not known to be true is unknown
   • closed world assumption: any statement not known to be true is false
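The difference between the two assumptions shows up as soon as a system is asked about a statement it holds no information on. A minimal Python sketch, assuming facts are stored as (subject, predicate, object) triples; the triples and function names are hypothetical:

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements known to be true about one catalogue record.
KNOWN_TRUE = {
    ("record:123", "heldBy", "British Library"),
    ("record:123", "hasFormat", "print"),
}

def closed_world(statement: Triple) -> bool:
    """Closed world: any statement not known to be true is false."""
    return statement in KNOWN_TRUE

def open_world(statement: Triple) -> Optional[bool]:
    """Open world: any statement not known to be true is unknown (None)."""
    return True if statement in KNOWN_TRUE else None

if __name__ == "__main__":
    query = ("record:123", "hasFormat", "digital")
    print(closed_world(query))  # False
    print(open_world(query))    # None
```

Asked whether record 123 exists in a digital format, the closed-world function answers `False` while the open-world one answers `None`. With only positive facts recorded, the open-world function can never answer `False`; a fuller open-world system would also have to record statements known to be false.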

Page 5

characteristics of an open world

Page 6

characteristics of a closed/bounded world

Page 7

judging where to apply each

• we need our infrastructure (especially integration technology between systems) to be open and relatively unbounded

• the Web is still the best available foundation for this

• however, we still need to manage our resources, maintain quality and honour complex rights management commitments

• we probably need to recognise that users’ experience is often enhanced through the application of a more focussed, targeted and context-aware approach

Page 8

a particular challenge

Page 9

synchronisation

• how is the state of the resource maintained across an infrastructure of ‘federated’ repositories?

• if a resource is changed or deleted, how does the right-hand side aggregation know?

• note - this is based on our existing ‘harvesting’ or ‘pull’ approach

[Diagram: four resource collections harvested by three aggregations - multiple harvest routes, multiple copies]
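Under the pull approach, the simplest way for an aggregation to detect changes and deletions is to re-harvest a collection's full listing and compare it with the copy it already holds. A minimal sketch, assuming each listing maps a resource identifier to a last-modified timestamp; the function name and data shapes are hypothetical:

```python
from typing import Dict, List, Tuple

def diff_snapshots(
    held: Dict[str, str], listing: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare the aggregation's copy with the source's current listing.

    Both dicts map resource identifier -> last-modified timestamp.
    Returns sorted (created, updated, deleted) identifier lists.
    """
    created = sorted(set(listing) - set(held))   # new at the source
    deleted = sorted(set(held) - set(listing))   # gone from the source
    updated = sorted(                            # present in both, but changed
        rid for rid in set(held) & set(listing) if listing[rid] != held[rid]
    )
    return created, updated, deleted

if __name__ == "__main__":
    held = {"a": "2013-01-01T00:00:00Z", "b": "2013-01-01T00:00:00Z",
            "c": "2013-01-01T00:00:00Z"}
    listing = {"a": "2013-01-01T00:00:00Z", "b": "2013-02-01T00:00:00Z",
               "d": "2013-02-02T00:00:00Z"}
    print(diff_snapshots(held, listing))  # (['d'], ['b'], ['c'])
```

Every aggregation on every harvest route has to repeat this full comparison, which becomes costly as collections grow; having content providers publish what changed avoids it.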

Page 10

ResourceSync

• a joint project of NISO and OAI, led by Herbert Van de Sompel of Los Alamos National Laboratory

• a lightweight mechanism to allow the state of web resources to be communicated between web systems

• developing a spec which builds on the Sitemap specification, allowing content providers to publish changesets

• draft: http://bit.ly/WYhTz2

• Jisc have funded UK participation in this
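Because ResourceSync builds on Sitemaps, a changeset is published as an ordinary Sitemap `<urlset>` document extended with elements from a ResourceSync namespace. A minimal Python sketch of building a Change List, assuming the element and attribute names of the ResourceSync 1.0 specification (ANSI/NISO Z39.99-2014), which appeared after this talk, so the 2013 draft may differ in detail. The function name, URLs and timestamps are hypothetical:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SITEMAP_NS)  # Sitemap elements stay unprefixed
ET.register_namespace("rs", RS_NS)     # ResourceSync extension elements

CHANGE_TYPES = {"created", "updated", "deleted"}

def build_change_list(changes, from_time):
    """Serialise (url, lastmod, change) tuples as a Change List document.

    `from_time` marks the start of the period the change list covers.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    ET.SubElement(urlset, f"{{{RS_NS}}}md",
                  {"capability": "changelist", "from": from_time})
    for loc, lastmod, change in changes:
        if change not in CHANGE_TYPES:
            raise ValueError(f"unknown change type: {change}")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{RS_NS}}}md", {"change": change})
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    print(build_change_list(
        [("http://example.org/res/1", "2013-02-01T10:00:00Z", "updated"),
         ("http://example.org/res/2", "2013-02-02T09:00:00Z", "deleted")],
        from_time="2013-01-31T00:00:00Z"))
```

A deletion travels as an explicit entry, which is exactly what a pull-based aggregation cannot see from a snapshot alone. A real Change List would normally also link up to the source's Capability List; this sketch leaves that out.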

Page 11

The sun shone, having no alternative, on the nothing new. Murphy, Samuel Beckett

Page 12

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable

Leslie Lamport

Page 13

a common ‘anti-pattern’

• as a developer, I have no reason to trust that these APIs are any good.

• after all, the service provider doesn’t seem to trust them for their own application...

[Diagram: some aggregated data of broad interest and potential usefulness. The provider's own UI serves end-users directly, while three separate APIs are offered to future 3rd-party developers whose UIs may reach further end-users. Legend: certainty, belief, speculation.]

Page 14

a better pattern

• As a developer, I’m more likely to trust this pattern.

• the content provider is using their own API to deliver their own application.

• they have a vested interest!

[Diagram: some aggregated data of broad interest and potential usefulness, exposed through a single API that serves both the provider's own focussed app and a 3rd-party app, each with a UI for end-users. Legend: certainty, belief, speculation.]
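A toy sketch of this better pattern, assuming a single JSON search API; the records and function names are hypothetical. The point is structural: the provider's own focussed app goes through exactly the same `api_search` call that a 3rd-party app uses, so any bug or gap in the API hits the provider's own application first.

```python
import json
from typing import Dict, List

# Hypothetical aggregated records; in practice these come from a data store.
RECORDS: List[Dict[str, str]] = [
    {"id": "r1", "title": "Map of London", "year": "1746"},
    {"id": "r2", "title": "London street directory", "year": "1852"},
    {"id": "r3", "title": "Railway timetable", "year": "1852"},
]

def api_search(query: str) -> str:
    """The one public API: a JSON response, exactly as third parties see it."""
    hits = [r for r in RECORDS if query.lower() in r["title"].lower()]
    return json.dumps({"query": query, "results": hits})

def focussed_app(query: str) -> str:
    """The provider's own application, built only on the public API."""
    response = json.loads(api_search(query))
    return "\n".join(f"{r['title']} ({r['year']})" for r in response["results"])

def third_party_app(query: str) -> List[str]:
    """A 3rd-party app using the same API for a different purpose."""
    response = json.loads(api_search(query))
    return sorted({r["year"] for r in response["results"]})

if __name__ == "__main__":
    print(focussed_app("london"))
    print(third_party_app("london"))  # ['1746', '1852']
```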

Page 15

APIs are not best thought of as machine-to-machine interfaces

APIs are interfaces for developers

Page 16

messages from developers to content-providers

• These are from yesterday’s developer day held here at the BL in support of this summit:

• please don’t build elaborate APIs which do not allow us to see all of the data, or its extent. It’s not that we simply want to download all the data - but we do need to see what we’re dealing with

• if you give us access to incomplete data (perhaps because you’re worried about revealing poor data quality), then we will tend either to abandon our attempts to use it or to ‘fill in the gaps’ with data from elsewhere. So offering an API which delivers incomplete data is usually self-defeating

• the implicit bargain, made explicit:
   • give us access to the data as soon as possible and we will do some of the work to process it so it is fit for some new purpose - and we will happily share this code with you
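One way to meet the first request is to have every response state the full extent of the data and let a client page through all of it. A minimal sketch; the field names `total`, `items` and `next_offset` are hypothetical and not taken from any particular API:

```python
from typing import Dict, List, Optional

# Hypothetical records held by a content provider.
DATA: List[Dict[str, int]] = [{"id": i} for i in range(5)]

def list_records(offset: int = 0, limit: int = 2) -> Dict:
    """Return one page of records, always stating the dataset's full extent."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    end = offset + limit
    return {
        "total": len(DATA),          # extent of the whole dataset
        "offset": offset,
        "items": DATA[offset:end],
        "next_offset": end if end < len(DATA) else None,
    }

def fetch_all(limit: int = 2) -> List[Dict[str, int]]:
    """Client side: follow next_offset until no pages remain."""
    items: List[Dict[str, int]] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = list_records(offset, limit)
        items.extend(page["items"])
        offset = page["next_offset"]
    return items
```

Because every page reports `total`, a developer can see how much data there is from the first request and check that a full walk with `fetch_all` returned all of it.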

Page 17

Questions for the parallel sessions

1. Which emerging technologies do we need to focus on in 2013?

2. Do we still need to aggregate?

3. What does data quality stop us doing?

Page 18

Which emerging technologies do we need to focus on in 2013?

• Graphs: context (not content) is king

• both Facebook and Google are betting heavily on graph technologies

• closer to home - so are content providers like the BBC

• linking these is an interesting challenge

• databases based on a graph model offer the potential for a richer understanding of entities (users!)

• instrumentation in personal devices makes more context available (e.g. geo-location).
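A small illustration of why a graph model yields richer context about an entity: starting from a user, a breadth-first traversal over typed links reaches items related by subject or by location, not just items the user touched directly. The triples and identifiers are invented, and an in-memory list stands in for a graph database:

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical (subject, predicate, object) links around one user.
EDGES: List[Tuple[str, str, str]] = [
    ("user:alice", "borrowed", "book:1"),
    ("book:1", "subject", "topic:maps"),
    ("book:2", "subject", "topic:maps"),
    ("user:alice", "locatedIn", "place:london"),  # e.g. from geo-location
    ("book:3", "about", "place:london"),
]

def neighbours(node: str) -> Set[str]:
    """Follow links in both directions when gathering context."""
    outgoing = {o for s, _, o in EDGES if s == node}
    incoming = {s for s, _, o in EDGES if o == node}
    return outgoing | incoming

def context(start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: every node within max_hops, with its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nxt in neighbours(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

Within two hops of `user:alice` the traversal finds `book:3` through the shared location; `book:2` appears only at three hops, through the shared subject.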

Page 19

Do we still need to aggregate?

Page 20

Do we still need to aggregate?

yes.

Page 21

Do we still need to aggregate?

• to address systems/network latency - provide a cache

• to showcase!

• for ‘Web Scale concentration’

• network effects if user-facing services are also developed

• to create middleman business opportunities

• as infrastructure to support locally developed services

• as an approach to preservation

yes.
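The first reason, using an aggregation as a cache against systems/network latency, can be sketched as a time-to-live cache in front of a remote source. This is an illustration under assumptions: the `fetch` callable stands in for a hypothetical call to a remote collection, and the TTL value is arbitrary:

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Serve repeat lookups locally, going back to the source after `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch   # the slow call to a remote source
        self.ttl = ttl
        self.clock = clock   # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]               # fresh copy: no network round trip
        value = self.fetch(key)         # missing or stale: ask the source
        self._store[key] = (now, value)
        return value
```

The TTL is the same trade-off as the synchronisation problem in miniature: a longer TTL hides more latency but serves staler copies.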

Page 22

What does data quality stop us doing?

• interpreted as: “what does a concern for data quality stop us doing?”
   • it stops us from releasing data early

• interpreted as: “what does poor/uncertain data quality stop us doing?”
   • it erodes trust, which reduces the likelihood of someone doing something worthwhile with our data

• reconciling these concerns is a major challenge for us.

Page 23

thank you!

Paul Walk [email protected]

@paulwalk http://www.paulwalk.net