The Simple Hard Problem of Time

Upload: michael-caruso

Post on 25-Jul-2015

TRANSCRIPT

1. Historical Data: The Simple Hard Problem Of Time

- Time and time series are fundamental to the fabric of relationships that exist in any database that manages historical data.
- Time and time series are independent of the applications and entities they help to model.

Historical Data A Simple Hard Problem 1

2. Historical Databases, Time, and Time Series

- An introduction to some financial data and a simple hard problem.
- An introduction to temporal data from a database designer's perspective.
- An introduction to a query processing architecture that supports this model.

3. A Simple Hard Problem

Find the average P/E for the S&P 500 for the last 12 month ends.

    ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

4. A Simple Hard Problem: Complex Data Varies Over Time

    Named Universe SP500 list

- S&P updates the membership of the S&P 500 monthly, so...
- Time series hold more than simple data like numbers and strings.

5. A Simple Hard Problem: Comparable Data Is Measured At Different Points In Time

    price / eps12

This simple ratio is based on data measured and recorded at different frequencies and at different points in time.
- price is probably measured and recorded on a business day basis.
- eps12 is a quarterly value based on the most recent 4 quarters of data.

6. A Simple Hard Problem: Historical Data Is Restated

Stock splits require adjustments to historical data:
- At the end of 1997, Microsoft reported earnings that made eps12 $3.24 per share.
- Effective February 23, 1998, Microsoft stock split 2 for 1.
- After the split and until a new value is reported, the value of eps12 should be $1.62 per share.
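
The Microsoft figures above can be checked with a short Python sketch. This is only an illustration, not the deck's own mechanism: the function name `restate_per_share` and the representation of splits as `(effective date, ratio)` pairs are assumptions made for the example.

```python
from datetime import date


def restate_per_share(value, measured_on, viewed_on, splits):
    """Restate a per-share value for splits between measurement and viewing.

    `splits` is a list of (effective_date, ratio) pairs; a 2-for-1 split
    has ratio 2.0. Only splits effective after the measurement date and on
    or before the viewing date are applied.
    """
    factor = 1.0
    for effective, ratio in splits:
        if measured_on < effective <= viewed_on:
            factor *= ratio
    return value / factor


# The 2-for-1 split effective February 23, 1998, as stated on the slide.
msft_splits = [(date(1998, 2, 23), 2.0)]
reported = date(1997, 12, 31)

# Viewed just before the split, eps12 is still the reported 3.24.
before_split = restate_per_share(3.24, reported, date(1998, 2, 20), msft_splits)
# Viewed on or after the split date, the same fact reads as 1.62.
after_split = restate_per_share(3.24, reported, date(1998, 2, 23), msft_splits)

print(before_split, after_split)
```

The stored fact never changes here; only the view of it depends on the date from which it is read.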

7. A Simple Hard Problem: Historical Data Is Restated

Split adjustments require the restatement of all historical per share data to make it consistent and comparable.

    Date        Value (as reported)  Value (restated)
    03/31/1997  2.27                 1.13
    06/30/1997  2.64                 1.32
    09/30/1997  3.60                 1.80
    12/31/1997  3.24                 1.62

8. A Simple Hard Problem: Restating Historical Data

There are two ways to restate historical data:
- Convert a simple fact into a massive, complex, and error-prone update.
- Adjust the affected data on access using a time-series of adjustment factors.

9. A Simple Hard Problem: Restating Historical Data

- Split adjustment data is completely irregular and has absolutely no periodicity, so...
- Efficient, irregular, event-oriented time-series are required to store it with minimal redundancy and maximal consistency.

10. A Simple Hard Problem: Seemingly Regular Data Is Not As Regular As It Seems

Companies report their data on a fiscal, not a calendar, basis:
- the fourth quarter of 1998 for Woolworths ends in January, 1998
- the fourth quarter of 1998 for Walgreens ends in August, 1998

11. A Simple Hard Problem: Seemingly Regular Data Is Not As Regular As It Seems

Accessing the most recent earnings per share as of August 25, 1998 means accessing:
- 2nd Quarter, 1999 fiscal data for Woolworths
- 3rd Quarter, 1998 fiscal data for Walgreens

12. A Simple Hard Problem: Currency Conversion

- What if this simple hard problem was based on a universe of international securities?
- What if different data sources report data for the same security in different currencies?
- Currency conversion rates: another time-series required to correctly use financial data.

13. A Simple Hard Problem: A Summary of Some of the Issues

- Complex aggregates, not just numbers and strings, vary over time.
- Comparable data is measured at different points in time.
- Regularly measured data is adjusted for the effects of irregularly spaced events.
- Seemingly regular data is often not as regular as it first appears.

14. A Simple Hard Problem: A Summary of Some of the Needs

- Complex rules are required to correctly interpret and use the data.
- These rules must be encapsulated in a reusable form so that every application does not need to reproduce them.
- These rules must be accessible to the DBMS if it is to be more than a static repository.

15. A Simple Hard Problem: A Summary of Some of the Needs (Simplicity)

Despite the complexity associated with accessing and using the data, simple queries must remain simple to state:

    ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

16. A Simple Hard Problem: A Summary of Some of the Needs

The issue is building and using an historical database, not just storing and retrieving stand-alone time-series.

17. A Designer's Perspective On A Simple Hard Problem

With time providing the context to answer it correctly:

    ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

How do we get there?

18. A Designer's Perspective: Some Underlying Data

Underneath it all, a simple enough workhorse:

    PriceRecord
        defineFixedProperty: security.
        defineFixedProperty: recordDate.
        defineFixedProperty: rawPrice.
        defineFixedProperty: rawVolume.
        defineFixedProperty: adjustmentDate

19. A Designer's Perspective: Temporal Multi-Valued Relationships

...and a temporal, multi-valued relationship from Security to PriceRecord (a.k.a. a TimeSeries).

    Security define: prices withDefault: PriceRecord

[Diagram: Security --prices--> PriceRecord, cardinality [1:n], temporal (T)]

20. A Designer's Perspective: Temporal Multi-Valued Relationships

Temporal multi-valued relationships can be accessed and used as time-series:

    Named Security IBM :prices count
    Named Security IBM :prices minimum: [recordDate]
    Named Security IBM :prices mavg30: [price]
    Named Security IBM :prices asOf: ^today - 6 monthEnds

21. A Designer's Perspective: Temporal Multi-Valued Relationships

But they exhibit their modeling power when combined with the temporal context of an operation to yield the correct single value for that context:

    Named Security IBM prices rawPrice
    19980731 evaluate: [Named Security IBM prices rawPrice]

22. A Designer's Perspective: What About Rules Like Split Adjustment?

Split adjustment requires a time-series of adjustment factors for each Security:

    Security define: adjustmentFactor withDefault: 1.0;

And a rule to compute a relative adjustment factor between an arbitrary date and the present:

    Security defineMethod: [ | adjustmentRelativeTo: aDate |
        (:adjustmentFactor asOf: ^today) / (:adjustmentFactor asOf: aDate)
    ];

23. A Designer's Perspective: What About Rules Like Split Adjustment?

With the rule in place, PriceRecord and Security can use it:

    PriceRecord defineMethod: [ | adjustedPrice |
        rawPrice / adjustmentFactor
    ];
    PriceRecord defineMethod: [ | adjustmentFactor |
        security adjustmentRelativeTo: (adjustmentDate else: recordDate)
    ];
    Security defineMethod: [ | price |
        prices adjustedPrice
    ];
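
For readers unfamiliar with the query language used on the slides, the split-adjustment rule can be approximated in Python. This is a rough sketch, not the engine's implementation. It assumes the adjustment factor is stored as a cumulative factor (1.0 by default, 2.0 from the effective date of a 2-for-1 split), which the slides do not state, and it passes `today` explicitly where the slides rely on the `^today` context. The class name `FactorSeries` and both prices are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class FactorSeries:
    """Irregular, event-driven series of cumulative adjustment factors."""
    default: float = 1.0
    dates: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def record(self, effective: date, factor: float) -> None:
        # Keep both lists sorted by effective date.
        i = bisect_right(self.dates, effective)
        self.dates.insert(i, effective)
        self.values.insert(i, factor)

    def as_of(self, when: date) -> float:
        # Latest factor effective on or before `when`, else the default.
        i = bisect_right(self.dates, when)
        return self.values[i - 1] if i else self.default


@dataclass
class Security:
    name: str
    adjustment_factor: FactorSeries = field(default_factory=FactorSeries)

    def adjustment_relative_to(self, a_date: date, today: date) -> float:
        # Mirrors: (adjustmentFactor asOf: today) / (adjustmentFactor asOf: aDate)
        series = self.adjustment_factor
        return series.as_of(today) / series.as_of(a_date)


@dataclass
class PriceRecord:
    security: Security
    record_date: date
    raw_price: float
    adjustment_date: Optional[date] = None

    def adjusted_price(self, today: date) -> float:
        # Mirrors: rawPrice / adjustmentFactor, with adjustmentDate else recordDate.
        basis = self.adjustment_date or self.record_date
        return self.raw_price / self.security.adjustment_relative_to(basis, today)


msft = Security("MSFT")
msft.adjustment_factor.record(date(1998, 2, 23), 2.0)  # 2-for-1 split from the slides
today = date(1998, 8, 25)

pre_split = PriceRecord(msft, date(1998, 1, 15), raw_price=150.0)  # hypothetical price
post_split = PriceRecord(msft, date(1998, 3, 2), raw_price=80.0)   # hypothetical price

print(pre_split.adjusted_price(today))   # halved: recorded before the split
print(post_split.adjusted_price(today))  # unchanged: recorded after the split
```

Recording the split is a single insert into the factor series; no stored price record is rewritten.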

24. A Designer's Perspective: Queries Revisited

...to enable the simple statement of complex queries:

    Named Security IBM price

    ^today - 1 monthEnds evaluate: [Named Security IBM price]

    ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

25. A Designer's Perspective: What Are Time Series? A Recap

- Time series are date-indexed collections.
- Time series support collection-level operations: select:, average:, min:, max:
- The set of collection-level operations is and must be user extensible: mavg30:, lsgrow:

26. A Designer's Perspective: What Are Time Series? A Recap

- Time series have an associated date type that serves as a calendar.
- The date type defines a time line along which observations are stored.
- The events recorded in a time series divide the time line into intervals.

27. A Designer's Perspective: What Are Time Series? A Recap

Time series support the interval queries needed to project temporal multi-valued relationships to context-dependent single-valued relationships:
- Find the observation on or before a given time point.
- Find the time point that begins (ends) the interval containing a given time point.

28. An Architectural Perspective (Don't Try This At Home)

The engine that powers these examples employs a model of information that integrates database and programming language principles into a scalable database programming language.

29. An Architectural Perspective (Don't Try This At Home)

The examples are database oriented, but the architecture and implementation are not those of a programming language manipulating data extracted from an external database.
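
Returning briefly to the two interval queries listed in the time-series recap (slide 27), they can be sketched in Python with a binary search over sorted event dates. This is an illustrative sketch only: the function names are assumptions, and the quarter-end dates are hypothetical values for a company whose fiscal year ends in August, as the slides describe for Walgreens.

```python
from bisect import bisect_right
from datetime import date


def on_or_before(dates, values, when):
    """Return the observation recorded on or before `when`, or None."""
    i = bisect_right(dates, when)
    return values[i - 1] if i else None


def interval_containing(dates, when):
    """Return (start, end) of the interval that contains `when`.

    `start` is the latest event on or before `when`; `end` is the first
    event after it. Either is None when the time line is open on that side.
    """
    i = bisect_right(dates, when)
    start = dates[i - 1] if i else None
    end = dates[i] if i < len(dates) else None
    return start, end


# Hypothetical fiscal quarter ends for a fiscal year that ends in August.
quarter_ends = [date(1997, 11, 30), date(1998, 2, 28),
                date(1998, 5, 31), date(1998, 8, 31)]
labels = ["Q1 1998", "Q2 1998", "Q3 1998", "Q4 1998"]

query_date = date(1998, 8, 25)
latest_quarter = on_or_before(quarter_ends, labels, query_date)
interval = interval_containing(quarter_ends, query_date)

print(latest_quarter)  # the most recent completed fiscal quarter
print(interval)        # the events bounding the query date
```

With these dates, a query as of August 25, 1998 resolves to the third fiscal quarter, consistent with the Walgreens case on slide 11.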

30. An Architectural Perspective (Don't Try This At Home)

The examples are object-oriented, but the architecture and implementation are not those of a traditional object-oriented programming language.

31. An Architectural Perspective: Key Features of the Model

- Relationship-centric information model based on category theory.
- Objects are abstract entities that have no internal state or structure. They are not records.
- All information is stored in the functions that connect objects.

32. An Architectural Perspective: Inherently Algebraic

The following diagram is a simplified view of the algebraic structure of a time-series lookup operation:

[Diagram not recoverable from the transcript. Surviving labels: Elements, Result, Series, Dates, Query, csel, esel, ksel, and the run "ekepart".]

33. An Architectural Perspective: Inherently Collection Centric And Parallel

For example, when processing the price method in:

    ^today - 1 monthEnds to: ^today - 12 monthEnds by: 1 monthEnds.
    evaluate: [
        ^date print;
        Named Universe SP500 list average: [price / eps12]. printNL
    ];

the engine is operating on a set of Security objects, not a single Security.

34. An Architectural Perspective: Globally Optimizable

Optimizations apply to the entire application, not just the database or programming language portions of it:
- query precision
- computation flows tuned to clustering
- morphism factoring
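
As a closing illustration, the running query can be approximated in plain Python to show how the temporal context drives every lookup: membership, price, and eps12 are each resolved as of the month end being evaluated, and each step works on the whole member list at once. This is a sketch under stated assumptions, not the engine's algorithm. The tickers AAA, BBB, and CCC, all data values, and the helper names are hypothetical, and the reading of "n monthEnds before today" as the n-th preceding calendar month end is an assumption.

```python
from bisect import bisect_right
from datetime import date, timedelta


def as_of(series, when):
    """Latest value in a sorted [(date, value), ...] list on or before `when`."""
    i = bisect_right([d for d, _ in series], when)
    if i == 0:
        raise LookupError(f"no observation on or before {when}")
    return series[i - 1][1]


def previous_month_ends(today, count):
    """The `count` calendar month ends before `today`, most recent first."""
    ends, cursor = [], today
    for _ in range(count):
        cursor = cursor.replace(day=1) - timedelta(days=1)
        ends.append(cursor)
    return ends


# Hypothetical universe whose membership changes on 1998-07-01.
universe = [(date(1998, 1, 1), ["AAA", "BBB"]),
            (date(1998, 7, 1), ["AAA", "CCC"])]
# Hypothetical business-day prices and quarterly eps12 values.
price = {
    "AAA": [(date(1998, 5, 29), 40.0), (date(1998, 6, 30), 44.0), (date(1998, 7, 31), 48.0)],
    "BBB": [(date(1998, 5, 29), 20.0), (date(1998, 6, 30), 24.0), (date(1998, 7, 31), 28.0)],
    "CCC": [(date(1998, 7, 31), 30.0)],
}
eps12 = {
    "AAA": [(date(1998, 3, 31), 4.0)],
    "BBB": [(date(1998, 3, 31), 2.0)],
    "CCC": [(date(1998, 6, 30), 5.0)],
}


def average_pe(when):
    """Average price / eps12 over the universe as it stood on `when`."""
    members = as_of(universe, when)
    # Each step below handles the whole member list, not one security.
    prices = [as_of(price[s], when) for s in members]
    earnings = [as_of(eps12[s], when) for s in members]
    ratios = [p / e for p, e in zip(prices, earnings)]
    return sum(ratios) / len(ratios)


results = {d: average_pe(d) for d in previous_month_ends(date(1998, 8, 25), 3)}
for month_end, avg in results.items():
    print(month_end, avg)
```

Only three month ends are evaluated to keep the sample data small; passing 12 to `previous_month_ends` would follow the original query more closely, given enough history.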