Moritz: A Universe of Data

Posted on 18-Dec-2014

Some Notes on Digital Data – with a suggestion
Tom Moritz / Internet Archive
February, 2009

A UNIVERSE OF DATA???

What is “data”? The US NSF DataNet solicitation defines “data” as: “Any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor streams, video, audio, algorithms, software, models and simulations, images, etc.”[i] This definition is technically acceptable but not scientifically epistemic. In fact, it is useful to think of “data” in two distinct ways. “Data” refers (as in the DataNet definition) to the computer-readable code that is stored in, accessed from or flows between computers. “Data” also means precise, well-defined representations of observations, descriptions or measurements of a referent (object or event) recorded in some standard, well-specified way. The more inclusive DataNet definition has the virtue of forcing us to consider a unified, holistic approach to knowledge and to the formal resources that inform and express it; we are forced to confront the Web as it exists today.

HOW MUCH DATA?

In a now-famous quip, Lewis Carroll noted that the perfect scale for maps was 1:1 but that farmers tend to become disgruntled when such maps are unrolled over their fields. The notion that we could theoretically record “everything” in real time (“1:1 capture”) leaves us to ponder the limits of “data” collection, management and longevity: full-life-cycle[ii] curation and stewardship. With the evolution of satellite coverages, nanotechnology, robotics and embedded network sensors, it is possible, for example, to systematically record presence/absence data for birds at a nesting site – at every nesting site in a given area – 24/7, forever [SEE for example: http://www.jamesreserve.edu/webcams.lasso?CameraID=Cam14][iii], or for that matter to record every human heartbeat.[iv] And to archive these data in perpetuity? (The casual assumption that we might comprehensively save all data is belied by a recent forecast projecting that in 2007 the total data produced on earth for the first time exceeded the available storage.[v])

[vi]

WHO’S RESPONSIBLE?

It is also the case that technologies, standards and methodologies, as well as institutions, organizations and professions, have evolved and become established to manage and preserve logical domains of knowledge as well as selected technical formats of data. The point respecting logical segments is relatively clear: natural history museums and herbaria hold preserved (i.e. dead) organisms as specimens; zoos, gardens and aquaria hold living organisms ex situ; protected areas hold living organisms in situ; cryogenics facilities hold tissue samples. Similarly, their libraries hold logically corresponding published or archival works. Respecting technical formats: libraries hold bound paper/print materials; archives hold unbound paper/manuscript or unbound paper/typescript materials; media repositories hold non-print media; computer centers hold datasets and complex models (hypothetical assemblages of data that generate new data); art museums hold paintings and sculptures; a dance company performs dances; and an indigenous group stewards its “old knowledge”. Similarly, librarians and archivists, curators and zookeepers, rangers and information technologists, dancers and shamans have all received vocational charge for siloed segments of our “knowledge base”. But who is responsible for the whole? Before the advent of digital technology this latter question would have been metaphysically interesting but pointless; no longer, it seems. Scanning our society and culture, it seems libraries and librarians are the most eligible candidates for the role. And if the received “compartments” (organizational, professional and logical structures) are no longer dictated by operational constraints (e.g. the ability to curate a dragonfly or to select and conserve a book), how can we most effectively organize the management of knowledge as data? At the national level, there are prime examples of institutions that admirably serve logical domains of our knowledge base; the National Library of Medicine is one.[vii] The Library of Congress alone has the stature and scope of interest to command our trust and expectations.

BUT DATA FOR WHAT???

Harvard biologist Richard Lewontin notes that, like the drunk looking for his keys under a street light “because the light is better there”, research has often been constrained to studies for which career-oriented researchers have the apparatus and methods to produce creditable (i.e. laudable, promotion-worthy) results.[viii] Our current era has seen an evolution of technology that challenges comfortable “disciplinary” categories of research and conventional, format-defined codes of fiduciary responsibility. Not only have traditional distinctions between the domains of the arts, the humanities and the sciences been challenged, but the conventions of scientific disciplines in themselves – as foci for research and investment – are being challenged. New possibilities for trans-disciplinarity are emerging, but the requisite tools and methods are not yet fully formed and organizational paths for such research are not always clear.

AND HOW DOES DATA HAVE MEANING?

When data is considered in the scientific or research context, its semantic properties necessarily become essential. Thus our ability to contextualize data becomes primary. Parameters of time and space are immediately relevant: some data will have a geographic context (deriving one parameter of meaning from location, in situ); other data will be essentially ageographic (ex situ), experimental and independent of geography but not of experimental frame. Time as a parameter of data may similarly be historical or ahistorical. Agency, materials, equipment (calibration) and operations also set primary parameters for data. Huge – dare we say “exorbitant”? – investments have been made in the “metadata industry”, most particularly in library and archival cataloging. In the new-media Web environment, other solutions operating upon natural language and “native [pre-existent] metadata” have produced prodigious, cost-effective (profitable) results.

WHOSE DATA?

In an era when combinations and recombinations of data are routine, “demand side” problems occur respecting validation and certification of results, and “supply side” problems occur respecting attribution and credit for the originators of data. Moreover, scientists’ claims for discrete personal “priority” of discovery are inevitably being challenged. Collaboration is more and more common and – as foreseen by Robert K. Merton[ix] – an individual’s contribution to the whole corpus of knowledge is less and less clearly attributable. Notions of “authorship” are challenged by anonymous institutional/organizational claims to authorship.[x] And “small science” (ecology, field biology, etc.), where the individual scientist is still seen as a single actor, is often perceived as weakly developed, as providing no more than “disaggregated components of an incipient network”.[xi] At the same time there has been a quantum increase in the effort to isolate and to monetize intellectual property.[xii] Intellectual “assets”, whether in the form of genomic discoveries or scientific journal articles, have become increasingly commoditized.[xiii] It is also the case that the digital environment has disrupted traditional economic value chains (this has obviously been true in the publishing and entertainment industries, where the consequences of these pressures have been accusations, threats and lawsuits, often to the bizarre extent that natural allies in the value chain have attacked each other, or even that customers/clients of an industry have been attacked by the industry itself).

A GLOBAL DATA IMPERATIVE???

Perhaps neglecting Faust (?), Thomas Jefferson asserted, “The field of knowledge is the common property of all mankind.” It seems more responsible to consider an ethical scale of need that compels free and open public access to the results of nondestructive research (obviously the definition of “nondestructive” requires debate). This spectrum of common need includes: human health, pharmacology and public health; agrarian and agricultural knowledge; environmental knowledge and conservation; and, more generally, most non-destructive science and technology critical for education. The dilemma we face worldwide is that most developing countries and developing segments of society are those least capable of clearing the thresholds of use imposed by market controls on knowledge in all forms.[xiv] In the naive exuberance that formed the League of Nations, an “International Committee on Intellectual Cooperation” was envisioned as a forum for global focus on common goods; today, in a far more exact way, we have the opportunity to plan and develop technical resources, standards and methodologies that will not deny the benefits of human knowledge to the least privileged. A comprehensive strategy requires that we successfully address four primary modalities of constraint: technology, culture, economy and law. The Internet Archive, focusing on R&D and prototyping, has built essential components of what could ultimately become a full-service, full-life-cycle “collective utility” or “service cloud” for open digital management of human knowledge. This evolution does not require that the Archive itself become this “service cloud”, but that it compose a comprehensive response and, together with other institutions and organizations, programs and initiatives, catalyze a comprehensive response.[xv] Most essential elements are in place, or at least emerging. We can and should act now.

[i] Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation, NSF 07-601, p. 5.
[ii] “the data management life cycle (including data creation, access, use, and preservation)”. Sustainable Digital Data Preservation and Access Network Partners (DataNet) Program Solicitation, NSF 07-601, p. 5.
[iii] Or, as another instance, see the recent NYT article: Natalie Angier, “Tracking forest creatures on the move,” NYT, Feb 2, 2009. http://www.nytimes.com/2009/02/03/science/03angier.html?_r=1&scp=1&sq=tracking%20mammals&st=cse
[iv] The California poet William Everson once asked poignantly: “And when the last coyote has been tagged…?”
[v] “…the amount of information created, captured or replicated exceeded available storage for the first time in 2007. Not all information created and transmitted gets stored, but by 2011, almost half of the digital universe will not have a permanent home.” John Gantz et al. (IDC), The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011 (March 2008). www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf
[vi] Serge Bloch in NYT: Natalie Angier, “Tracking forest creatures on the move,” NYT, Feb 2, 2009. SEE: http://www.nytimes.com/2009/02/03/science/03angier.html?_r=1&scp=1&sq=tracking%20mammals&st=cse
[vii] HISTORIC BUDGET SUPPORT FOR NLM
[viii] R. Lewontin, The Triple Helix: Gene, Organism, Environment.
[ix] “Property rights in science are whittled down to a bare minimum by the rationale of the scientific ethic. The scientist’s claim to “his” intellectual “property” is limited to that of recognition and esteem which, if the institution functions with a modicum of efficiency, is roughly commensurate with the significance of the increments brought to the common fund of knowledge.” Robert K. Merton, “A Note on Science and Democracy,” Journal of Law and Political Sociology 1 (1942): 121.
[x] SEE for example: Peter Galison, “The Collective Author,” in M. Biagioli and P. Galison (eds.), Scientific Authorship: Credit and Intellectual Property in Science. NY: Routledge, 2003.
[xi] SEE: J.M. Esanu and P.F. Uhlir (eds.), The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Steering Committee on the Role of Scientific and Technical Data and Information in the Public Domain, Office of International Scientific and Technical Information Programs, Board on International Scientific Organizations, Policy and Global Affairs Division, National Research Council of the National Academies.
[xii] SEE L. Lessig, Code.
[xiii] SEE Julian Birkinshaw and Tony Sheehan, “Managing the Knowledge Life Cycle,” MIT Sloan Management Review 44 (2), Fall 2002: 77.
[xiv] SEE for ex.:
[xv] A short list is relatively easy to compose…