Biosurveillance 2.0: Lecture at Emory University
DESCRIPTION
Invited lecture at Emory University Rollins School of Public Health. We presented Evolve (http://instedd.org/evolve), InSTEDD's global early warning and response social platform, with a live demonstration.
TRANSCRIPT
![Page 1: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/1.jpg)
Biosurveillance 2.0: Collaboration and Web 2.0/3.0 Semantic Technologies for Better Early Disease Warning and Effective Response
Taha Kass-Hout, Nicolás di Tada
Invited by Dr. Barbara Massoudi, PhD, MPH
Lecture at Emory University Rollins School of Public Health
Public Health Informatics, INFO 503
Atlanta, GA, USA
![Page 2: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/2.jpg)
![Page 3: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/3.jpg)
Background
![Page 4: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/4.jpg)
Late Detection and Response
[Figure: cases by day, with the opportunity for control marked]
![Page 5: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/5.jpg)
Early Detection and Response
[Figure: cases by day, with the opportunity for control marked]
![Page 6: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/6.jpg)
Public Health Measures
• Representativeness
• Completeness
• Predictive Value
• Timeliness
![Page 7: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/7.jpg)
Public Health Measures
1000 Malaria infections (100%)
50 Malaria notifications (5%)
Get as close to the bottom of the pyramid as possible
Urge frequent reporting: weekly, then daily, then immediately
Specificity / Reliability
Sensitivity / Timeliness
• Main attributes
o Representativeness
o Completeness
o Predictive value positive
![Page 8: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/8.jpg)
Public Health Measures
Analyze and interpret
Signal as early as possible
Automated analysis/thresholds
Health care hotline
Time
• Main attributes
o Timeliness
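As a rough illustration of what "automated analysis/thresholds" can mean, the sketch below flags a day when its case count exceeds the mean of the preceding days by a chosen number of standard deviations. The function name, the 7-day window, the multiplier, and the sample counts are all assumptions made for this example; they are not taken from the lecture.

```python
from statistics import mean, stdev


def flag_anomalies(counts, window=7, z=2.0):
    """Return indexes of days whose count exceeds mean + z * stdev
    of the preceding `window` days (a simple moving baseline)."""
    flagged = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        threshold = mean(baseline) + z * stdev(baseline)
        if counts[day] > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily case counts with a spike on day index 9.
daily_cases = [4, 5, 3, 6, 4, 5, 4, 5, 4, 15, 6]
print(flag_anomalies(daily_cases))
```

A shorter window reacts faster (better timeliness) but tends to raise more false signals, which is the trade-off the pyramid slide describes.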
![Page 9: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/9.jpg)
Public Health – Two Perspectives
• Case management
– Individual cases of notifiable diseases
– Relationship networks (contact tracing)
• Population surveillance
– Larger risk patterns
![Page 10: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/10.jpg)
Case Management
• Questions and problems:
– Is a case due to recent transmission?
– If so, does the case share any feature with other recent cases?
• Current methods:
– Investigations and interviews
– Meeting with other investigators
![Page 11: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/11.jpg)
Population Surveillance
• Questions and problems:
– Are more cases happening than expected?
– Does an excess suggest ongoing transmission in a specific region?
• Current methods:
– Semi-automated routine temporal and space-time statistical analysis
![Page 12: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/12.jpg)
Why location matters: Case Management
• If you are studying a case of a disease that was just declared,
• it is harder to picture the situation by looking at something like this...
![Page 13: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/13.jpg)
Why location matters: Case Management
![Page 14: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/14.jpg)
Why location matters: Case Management
• ...than by looking at this.
![Page 15: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/15.jpg)
Why location matters: Case Management
![Page 16: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/16.jpg)
Why location matters: Population Surveillance
• If you are studying the spatial distribution of a set of disease clusters, the view on the next slide is harder to read...
![Page 17: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/17.jpg)
Why location matters: Population Surveillance
![Page 18: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/18.jpg)
Why location matters: Population Surveillance
• ...than this.
![Page 19: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/19.jpg)
Why location matters: Population Surveillance
![Page 20: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/20.jpg)
The Problem Space
• Current system design, analysis, and evaluation have been geared towards specific data sources and detection algorithms, not humans
• We have systems in place for the threats we have faced before
The Problem
![Page 21: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/21.jpg)
Traditional DISEASE SURVEILLANCE
• In the past two decades, the focus was on
– automatically detecting anomalous patterns in data (often a single stream)
• Modern methods
– rely on human input and judgment
– incorporate temporal, spatial, and multivariate information
![Page 22: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/22.jpg)
Traditional DISEASE SURVEILLANCE
Huge mass of data:
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
Detection algorithm
Too many alerts
“What are we supposed to do with this?”
![Page 23: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/23.jpg)
Our Approach
• Human-based
• Collaborative and cross-disciplinary
• Web 2.0/3.0 platform
![Page 24: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/24.jpg)
Information Sources
• Event-based: ad hoc unstructured reports issued by formal or informal sources
• Indicator-based: number of cases, rates, proportion of strains…
Timeliness, Representativeness, Completeness, Predictive Value, Quality, …
![Page 25: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/25.jpg)
MODERN DISEASE SURVEILLANCE
Huge mass of data:
9/20, 15213, cough/cold, …
9/21, 15207, antifever, …
9/22, 15213, CC = cough, ...
1,000,000 more records…
Feedback loop
Fewer and more actionable alerts
Effective and coordinated response
![Page 26: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/26.jpg)
Evolve: Main Components
Feature extraction, reference and baseline information
Tags
Multiple Data Streams
User-Generated and Machine Learning Metadata
Comments
Spatio-temporal
Flags/Alerts/Bookmarks
Evolve Bot
Event Classification, Characterization and Detection
Previous Event Training Data
Previous Event Control Data
Metadata extraction
Machine learning
Social network
Professional feedback
Anomaly detection
Collaborative Spaces
Hypotheses generation/testing
Our Solution
![Page 27: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/27.jpg)
Evolve: Main Components
![Page 28: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/28.jpg)
Evolve: Process
[Diagram: Items; Hypothesis; Field Actions and Verifications; Feedback / Confirmation]
![Page 29: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/29.jpg)
Advantages of Machine Learning
P(malaria) = 22%
P(influenza) = 13%
P(other ILI) = 33%
![Page 30: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/30.jpg)
Machine Learning Techniques
1. Classifiers
2. Clustering
3. Bayesian Statistics
4. Neural Networks
5. Genetic Algorithms
![Page 31: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/31.jpg)
How to represent a document:
[Figure: a document shown as a vector over terms such as “cold” and “fever”]
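One common way to represent a document as a vector is to count how often each vocabulary term occurs. The sketch below assumes a two-term vocabulary taken from the slide's labels ("cold", "fever"); the sample reports and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed vocabulary: the two terms visible on the slide.
VOCABULARY = ["cold", "fever"]


def to_vector(text, vocabulary=VOCABULARY):
    """Count occurrences of each vocabulary term in the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return [words[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical report snippets.
report_a = to_vector("Fever and cold reported; fever persists.")
report_b = to_vector("Patient has a cold.")
print(report_a, report_b, cosine_similarity(report_a, report_b))
```

Once items are vectors, classifiers and clustering methods can compare them by distance or angle.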
![Page 32: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/32.jpg)
(1) Classifiers: Problem Definition
• Map items to vectors (Feature extraction)
• Normalize those vectors
• Train the classifier
• Measure the results with new information
• Feed the results back to the classifier
• Separate classes in feature space
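The steps in this problem definition can be sketched end to end. The example below uses a simple perceptron as a stand-in classifier (the lecture goes on to discuss SVMs; the perceptron is used here only because it fits in a few lines). The vocabulary, the training texts, and the labels are all made up for illustration.

```python
import math

# Hypothetical vocabulary; a real system would learn or curate this.
VOCABULARY = ["fever", "cough", "outbreak", "football", "election"]


def extract_features(text):
    """Step 1: map an item to a term-count vector."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


def normalize(vector):
    """Step 2: scale the vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else vector


def predict(weights, bias, vector):
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score > 0 else -1


def train(examples, epochs=20, weights=None, bias=0.0):
    """Step 3: perceptron training; pass weights/bias to continue training."""
    weights = list(weights) if weights else [0.0] * len(VOCABULARY)
    for _ in range(epochs):
        for text, label in examples:
            vector = normalize(extract_features(text))
            if predict(weights, bias, vector) != label:
                weights = [w + label * x for w, x in zip(weights, vector)]
                bias += label
    return weights, bias


# +1 = health-related item, -1 = unrelated item (made-up examples).
training = [
    ("fever and cough outbreak in village", 1),
    ("cough outbreak reported", 1),
    ("football match tonight", -1),
    ("election results announced", -1),
]
weights, bias = train(training)

# Step 4: measure the results with new information.
new_items = [("fever outbreak spreads", 1), ("football election debate", -1)]
correct = sum(
    predict(weights, bias, normalize(extract_features(text))) == label
    for text, label in new_items
)
accuracy = correct / len(new_items)

# Step 5: feed reviewed items back in and continue training.
weights, bias = train(training + new_items, weights=weights, bias=bias)
print(accuracy)
```

The learned weight vector and bias define the boundary that separates the two classes in feature space.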
![Page 33: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/33.jpg)
Classifiers: Support Vector Machines (SVM)
![Page 34: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/34.jpg)
SVM – Margin Maximization
• Support vectors define the separator
![Page 35: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/35.jpg)
SVM – Non-linear?
Φ: x → φ(x)
Map to higher-dimension space
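A small example of why mapping to a higher-dimensional space helps: the one-dimensional points below cannot be split by any single threshold, but after the mapping φ(x) = (x, x²) a straight line on the second coordinate separates them. This only illustrates the mapping idea; it is not an SVM solver, and the data points and the choice of φ are assumptions for the example.

```python
def phi(x):
    """Map a 1-D point into 2-D feature space: x -> (x, x^2)."""
    return (x, x * x)


# Made-up data: the +1 class sits on both sides of the -1 class.
points = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, -1, -1, -1, 1, 1]


def separable_by_threshold(values, labels):
    """True if a single threshold puts all +1 on one side and all -1 on the other."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


in_input_space = separable_by_threshold(points, labels)
in_feature_space = separable_by_threshold([phi(x)[1] for x in points], labels)
print(in_input_space, in_feature_space)
```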
![Page 36: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/36.jpg)
SVM – Filtering or classifying
[Diagram: Training Documents train the Classifier; the Classifier sorts Document 1, Document 2, and Document 3 into Positives and Negatives]
![Page 37: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/37.jpg)
(2) Clustering: Problem Definition
• Map items to vectors (Feature extraction)
• Normalization
• Agglomerative or Partitional
![Page 38: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/38.jpg)
Clustering: AGGLOMERATIVE
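A minimal sketch of agglomerative clustering, assuming single linkage (the distance between two clusters is the distance between their closest members): every point starts as its own cluster, and the two closest clusters are merged until the requested number remains. The coordinates are hypothetical case locations.

```python
import math


def single_linkage(a, b):
    """Distance between two clusters: their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until `target_clusters` remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical case coordinates forming two visible groups.
cases = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.3, 4.9)]
clusters = agglomerate(cases, 2)
print(clusters)
```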
![Page 39: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/39.jpg)
Clustering: PARTITIONAL
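A minimal sketch of partitional clustering, using k-means as one well-known example: each point is assigned to its nearest centroid, centroids are recomputed as the mean of their points, and the two steps repeat. The starting centroids and the coordinates are assumptions chosen for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups


# Hypothetical case coordinates; starting centroids picked by hand.
cases = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.3, 4.9)]
centroids, groups = k_means(cases, centroids=[cases[0], cases[3]])
print(centroids, groups)
```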
![Page 40: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/40.jpg)
(3) Bayesian Statistics
$P(A \mid B) = \dfrac{P(B \mid A) \cdot P(A)}{P(B)}$
• $P(A \mid B)$: probability of disease A (flu) once symptom B (fever) is observed
• $P(B \mid A)$: probability of fever once flu is confirmed
• $P(A)$: probability of flu (prior or marginal)
• $P(B)$: probability of fever (prior or marginal)
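A worked example of Bayes' rule for the flu and fever case. The three input probabilities below are invented for illustration only; they are not epidemiological estimates and do not come from the lecture.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs.
p_flu = 0.05              # P(A): prior probability of flu
p_fever_given_flu = 0.90  # P(B | A): probability of fever once flu is confirmed
p_fever = 0.10            # P(B): overall probability of fever

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(round(p_flu_given_fever, 2))
```

With these made-up numbers, observing fever raises the probability of flu from 5% to about 45%.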
![Page 41: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/41.jpg)
(4) Neural Networks
• Given a set of stimuli, train a system to produce a given output…
![Page 42: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/42.jpg)
Neural Network: Structure
[Diagram: Input Layer {I0, I1, …, In}, Hidden Layer, and Output Layer {O0, O1, …, On}, with a weight on each connection]
$H_n = \sum_{i=0}^{I} (I_i \cdot w_{in})$
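A minimal forward pass through the three layers named on this slide: each unit takes the weighted sum of its inputs and passes it through an activation function. The sigmoid activation and the weight values are assumptions; the slide only shows the weighted sum.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """weights[n][i] connects input i to unit n.
    Each unit computes sum_i(I_i * w_in), then applies the sigmoid."""
    return [
        sigmoid(sum(value * w for value, w in zip(inputs, unit_weights)))
        for unit_weights in weights
    ]


def forward(inputs, hidden_weights, output_weights):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_weights)
    return layer(hidden, output_weights)


# Hypothetical weights: 2 inputs, 2 hidden units, 1 output unit.
hidden_weights = [[1.0, 1.0], [1.0, -1.0]]
output_weights = [[1.0, 1.0]]
print(forward([0.0, 0.0], hidden_weights, output_weights))
```

Training consists of adjusting the weights so that known stimuli produce the desired outputs; that step is not shown here.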
![Page 43: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/43.jpg)
Neural Network: Application
Event?
![Page 44: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/44.jpg)
(5) Genetic Algorithm: Basic
• Define the model that you want to optimize
• Create the fitness function
• Evolve the gene pool, testing against the fitness function
• Select the best individual
![Page 45: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/45.jpg)
Genetic Algorithm: Model
• Model the transmission process (e.g., of an infectious disease) using a set of parameters:
– Onset time between an infection and illness
– Latency period
– Incubation period
– Symptomatic period
– Infectious period
(Onset, Latency, Incubation, Symptomatic, Infectious)
(2 days, 3 days, 1 day, 4 days, 3 days)
![Page 46: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/46.jpg)
Genetic Algorithm: Model Fitness
Fitness = 1/Area
![Page 47: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/47.jpg)
Genetic Algorithm: Process
1. Create an initial population of candidates
2. Use operators to generate new candidates (mating and mutation)
3. Discard the worst individuals or select the best individuals in each generation
4. Repeat from step 2 until a candidate satisfies the search criteria
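The process can be sketched as a small loop over five-parameter candidates (values in days). The slides define fitness as 1/Area without spelling out the area here, so this sketch substitutes a stand-in: the total absolute gap between a candidate and a hypothetical target tuple, with 1 added to avoid division by zero. The target, the 1 to 6 day range, the population size, and the mutation rate are all assumptions.

```python
import random

# Hypothetical "observed" parameters: (onset, latency, incubation, symptomatic, infectious).
TARGET = (2, 3, 1, 4, 3)


def fitness(candidate):
    """Stand-in for 1/Area: the 'area' is the total absolute gap to TARGET."""
    area = sum(abs(c - t) for c, t in zip(candidate, TARGET))
    return 1.0 / (1.0 + area)


def mate(a, b, rng):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    """Replace each gene with a random value with probability `rate`."""
    return tuple(rng.randint(1, 6) if rng.random() < rate else gene for gene in candidate)


def evolve(population, rng, generations=200):
    for _ in range(generations):
        population = sorted(population, key=fitness, reverse=True)
        if fitness(population[0]) == 1.0:  # exact match found
            break
        survivors = population[: len(population) // 2]  # discard the worst half
        children = [
            mutate(mate(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(len(population) - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


rng = random.Random(0)
initial = [tuple(rng.randint(1, 6) for _ in TARGET) for _ in range(20)]
best = evolve(initial, rng)
print(best, fitness(best))
```

Because the better half of each generation always survives, the best fitness never decreases from one generation to the next.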
![Page 48: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/48.jpg)
Genetic Algorithm: Process
(4,5,6,3,5) (4,3,6,2,5)
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(5,3,2,6,5)
(3,4,4,6,2)
![Page 49: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/49.jpg)
Result of incorporating all 5 techniques: Improved Surveillance
![Page 50: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/50.jpg)
InSTEDD Evolve (http://instedd.org/evolve)
Related items (e.g., news articles) are grouped into a thread. Threads are later associated with events (hypothesized or confirmed).
Tag cloud and semantic heatmap
![Page 51: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/51.jpg)
InSTEDD Evolve (http://instedd.org/evolve)
The filter feature automatically filters for related items and updates the map and associated tags.
![Page 52: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/52.jpg)
InSTEDD Evolve (http://instedd.org/evolve)
Auto-generated (machine-learning) tags. These tags are semantically ranked (a statistical probability match). Users can further train the classifier by accepting or rejecting a suggestion. Users can similarly train the geo-locator by accepting or rejecting and updating a location.
![Page 53: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/53.jpg)
InSTEDD Evolve (http://instedd.org/evolve)
Tracking the recent Avian Influenza outbreak in Egypt (reports started to appear in late January 2009). Notice the pattern of reported incidents along the Nile river.
![Page 54: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/54.jpg)
Acknowledgements
![Page 55: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/55.jpg)
Through funding from:
![Page 56: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/56.jpg)
Thank You!
Taha Kass-Hout, Nicolás di Tada
![Page 57: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/57.jpg)
BACKGROUND MATERIAL
![Page 58: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/58.jpg)
Index
• Disease Surveillance References
– Computing
– Automating Laboratory Reporting
– Using EMR data for disease surveillance
– Related Projects
– Misc Readings
• Open Source Software (OSS) References
– Open Source License References
– Open Source References
– Open Source and Public Health References
• Architectural Matters
– Service Oriented Architecture (or SOA)
– Synchronization Architecture
– Cloud Architecture
![Page 59: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/59.jpg)
DISEASE SURVEILLANCE
References and Related Efforts
![Page 60: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/60.jpg)
REFERENCES
• Izadi, M. and Buckeridge, D., Decision Theoretic Analysis of Improving Epidemic Detection, AMIA 2007, Symposium Proceedings 2007
• EpiNorth-based material (http://www.epinorth.org):
– Mereckiene, J., Outbreak Investigation Operational Aspects. Jurmala, Latvia, 2006
– Bagdonaite, J., and Mereckiene, J., Outbreak Investigation Methodological Aspects. Jurmala, Latvia, 2006
– Epidemic Intelligence: Signals from surveillance systems, Anne Mazick, Statens Serum Institut, Denmark, EpiTrain III, Jurmala, August 2006
• Daniel Neill, Incorporating Learning into Disease Surveillance Systems
![Page 61: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/61.jpg)
REFERENCES
• Computing
– The Future of Statistical Computing in Wilkinson (2008)
– Complex Event Processing Over Uncertain Data in Wasserkrug (2008)
– Outbreak detection through automated surveillance: A review of the determinants of detection in Buckeridge (2007)
– Approaches to the evaluation of outbreak detection methods in Watkins (2006)
– Algorithms for rapid outbreak detection: a research synthesis in Buckeridge (2004)
– Data mining in bioinformatics using Weka in Frank (2004)
– Aho-Corasick Algorithm in Kilpeläinen
• Automating Laboratory Reporting
– Automatic Electronic Laboratory-Based Reporting in Panackal (2002)
– Benefits and Barriers to Electronic Laboratory Results Reporting for Notifiable Diseases in Nguyen (2007)
![Page 62: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/62.jpg)
REFERENCES
• Using EMR Data for Disease Surveillance
– Using Electronic Medical Records to Enhance Detection and Reporting of Vaccine Adverse Events in Hinrichsen (2007)
– Electronic Medical Record Support for PH in Klompas (2007)
– A knowledgebase to support notifiable disease surveillance in Doyle (2005)
– Automated Detection of Tuberculosis Using Electronic Medical Record Data in Calderwood (2007)
• Misc Readings
– Breakthrough in modeling emerging disease hotspots in Jones (2008)
– Use of data mining techniques to investigate disease risk classification as a proxy for compromised biosecurity of cattle herds in Wales in Ortiz-Pelaez (2008)
– Euclidean distance: http://en.wikipedia.org/wiki/Euclidean_distance
– Tags/Folksonomy:
  • Tag Decay: A View Over Aging Folksonomy in Russell (2007)
  • Cloudalicious: Folksonomy Over Time in Russell (2006)
![Page 63: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/63.jpg)
RELATED PROJECTS
• InSTEDD Evolve: (http://instedd.org/evolve)
– Collaborative Analytics and Environment for Linking Early Health-Related Event Detection to an Effective Response (http://taha.instedd.org/2008/09/collaborative-analytics-and-environment.html)
• ALPACA: "ALPACA Light Parsing And Classifying Application (ALPACA) is a classifying tool designed for use in community-oriented software as well as in Academia. The application consists of two parts: a parsing tool for transforming raw documents into readable data, and a classifying tool for categorizing documents into user-provided classes. The application provides a user-friendly interface and a Plug-in functionality to provide a simple way to add more parsers/classifiers to the application." http://2008.hfoss.org/ALPACA
• Weka: An open source "...collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/~ml/weka/
![Page 64: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/64.jpg)
RELATED PROJECTS
• The R Project for statistical computing: http://www.r-project.org
– Surveillance Project: An open source R package and disease surveillance framework for "...the development and the evaluation of outbreak detection algorithms in univariate and multivariate routine collected public health surveillance data." http://surveillance.r-forge.r-project.org
  • The R package surveillance in Höhle (multiple articles)
• Google's Research Publications: MapReduce: Simplified Data Processing on Large Clusters (http://labs.google.com/papers/mapreduce.html)
– Hadoop: a software platform that lets one easily write and run applications that process vast amounts of data (http://hadoop.apache.org/core)
![Page 65: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/65.jpg)
OPEN SOURCE SOFTWARE
References and Related Efforts
![Page 66: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/66.jpg)
REFERENCES
• Open Source License References
– http://www.opensource.org/licenses
– http://openacs.org/about/licensing/open-source-licensing
• Open Source References
– http://www.lifehack.org/articles/technology/open-source-life-how-the-open-movement-will-change-everything.html
– http://en.wikipedia.org/wiki/Open_source
– http://www.opensource.org/
• Open Source and Public Health References
– http://www.ibiblio.org/pjones/wiki/index.php/Open_Source_Software_for_Public_Health
– http://en.wikipedia.org/wiki/List_of_open_source_healthcare_software
– http://www.epha.org/a/320
– Open Source Development for Public Health: A Primer with Examples of Existing Enterprise Ready Open Source Applications in Turner (2006)
– A Quick Survey of Open Source Software for Public Health Organizations in Mirabito and Kass-Hout (2007)
![Page 67: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/67.jpg)
ARCHITECTURAL MATTERS
References and Related Efforts
![Page 68: Biosurveillance 2.0: Lecture at Emory University](https://reader036.vdocuments.site/reader036/viewer/2022062406/55923cef1a28abbd778b476a/html5/thumbnails/68.jpg)
REFERENCES
• Service Oriented Architecture (or SOA)
– Proposal for Fulfilling Strategic Objectives of the U.S. Roadmap for National Action on Decision Support through a Service-oriented Architecture Leveraging HL7 Services in Kawamoto (2007)
– Service-oriented Architecture in Medical Software: Promises and Perils in Nadkarni (2007)
– Wiki sources:
  • SOA: http://en.wikipedia.org/wiki/Service_Orientated_Architecture
  • Semantic service oriented architecture: http://en.wikipedia.org/wiki/Semantic_service_oriented_architecture
• Synchronization Architecture
– InSTEDD’s Mesh4x: http://mesh4x.org
• Cloud Architecture
– Google App Engine: Google App Engine Goes Up Against Amazon Web Services in Gartner Report (2008)