Federal Big Data Working Group Meetup

Federal Big Data Working Group Meetup. Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. http://semanticommunity.info/ http://www.meetup.com/Federal-Big-Data-Working-Group/ http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup January 7, 2014


Post on 25-Feb-2016


Page 1: Federal Big Data Working Group  Meetup

Federal Big Data Working Group Meetup

Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

January 7, 2014

Page 2: Federal Big Data Working Group  Meetup

Mission Statement

• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (see Possible Team Presentations below); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.

Page 3: Federal Big Data Working Group  Meetup

Co-organizers

• Brand Niemann and Kate Goodier
• Kate Goodier, Host: Excelerate Solutions offices in Tysons Corner:
– Capacity about 50, with Skype and wifi available. The Silver Line Spring Hill Metro Stop (planned to open in March) is across the street (Route 7 and Spring Hill Road).
• Directions to the building are easy and they have open underground parking:
– See photo below from Excelerate Solutions Office looking south to the Spring Hill Road Silver Line Metro Station (planned to open in March 2014).
• Logistics:
– Refreshments, restrooms, etc.

Page 4: Federal Big Data Working Group  Meetup

Suggested Format

• 6:30 p.m. Tutorials (I will start with the Proposed GMU Course, and hope that others will offer to do tutorials as well) and Refreshments
• 7:00 p.m. Introductions and Announcements (10 seconds per individual, depending on the size of the group)
– Remarks by Dr. George Strawn, Director, NITRD/NCO and co-chair of the Federal Big Data Senior Steering Work Group
• 7:15 p.m. Featured Presentation/Demonstration (where did you get the data, where did you store the data, and what were your results)
– Start with our Semantic Big Data Science Application: Semantic Medline on the YarcData Graph Appliance for the Federal Big Data Senior Steering Work Group, of which our Semantic Data Science Team recently gave a good presentation to Lee Watkins Jr., Director of Bioinformatics at the Institute of Genetic Medicine Center for Inherited Disease Research (CIDR).
• 8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
• 9:00 p.m. Continue Your Conversations Elsewhere (we need to clear out of the space)

Page 5: Federal Big Data Working Group  Meetup

Next Meetups

• Second Meetup: Tuesday, February 4, 6:30 p.m.
– Continue Data Science Tutorial: Practical Data Science for Data Scientists
– What Went Wrong with the Obamacare Web Site, and How Can It Be Fixed? and Why the First Rollout of HealthCare.gov Crashed, an Architectural Assessment, Eric Kavanagh, Inside Analysis, and Geoffrey Malafsky, PSIKORS Institute; Healthcare.gov Data Science, Brand Niemann, Semantic Community; and Healthcare.gov Prototype Video, Kees van Mansom, Be Informed
• Third Meetup: Tuesday, February 18, 6:30 p.m.
– Continue Data Science Tutorial: Modus Operandi Semantic Knowledge Base
– Wave All-Source Semantic Fusion Engine: Eric Little, Modus Operandi; and Department of Defense Metadata Engineers.
• Fourth Meetup: March 4, 6:30 p.m.
– Continue Data Science Tutorial: Graph Databases and Bigdata SYSTAP Literature Survey of Graph Databases
– Bigdata SYSTAP, Michael Personick and Bryan Thompson, SYSTAP
• April Workshop: Date and Location TBA
– 2nd Cloud: SOA, Semantics, Data Science, and Business Concept Computing (16th SOA for eGov Conference).

Page 6: Federal Big Data Working Group  Meetup

Practical Data Science for Data Scientists

http://semanticommunity.info/Data_Science/Practical_Data_Science_for_Data_Scientists

Page 7: Federal Big Data Working Group  Meetup

Resources

• Required Textbook
– Doing Data Science:
  • http://shop.oreilly.com/product/0636920028529.do
  • Free Sampler:
    – http://cdn.oreillystatic.com/oreilly/booksamplers/9781449358655_sampler.pdf (PDF)
• Optional Supplemental Reading:
– Data Science Starter Kit:
  • http://shop.oreilly.com/category/get/data-science-kit.do
– DC Data Community:
  • http://datacommunitydc.org/blog/about/
  • DC Data Community Calendar:
    – http://datacommunitydc.org/blog/calendar/
• Technology Requirements
– Internet and Free Tools like Spotfire Cloud:
  • https://spotfire.cloud.tibco.com/tsc/#!/compproductrequest
– NodeXL:
  • http://nodexl.codeplex.com/

Page 9: Federal Big Data Working Group  Meetup

Tutorial

• Overview: Data Science and the Data Science Process
• My Profile: Breaking Government/AOL Government Data Stories and Products
– Select some interesting content and make it structured
– Select a related data set/table
– Explore both and write a story about it:
  • Where did you get the data?
  • Where did you store the data?
  • What were your results?
  • What were the steps?
• Assignment: Do something like My Profile

Page 10: Federal Big Data Working Group  Meetup

Overview: Data Science

http://semanticommunity.info/Data_Science

Key Concepts Extracted

What is Data Science? The future belongs to the companies and people that turn data into products

See Sidebar Topics

Page 11: Federal Big Data Working Group  Meetup

Overview: Data Science Process

http://semanticommunity.info/Analytics/Predictive_Analytic_World_Government_2013#Story

So my three overlapping circles are: "Find and Prepare Data Sets", "Store and Query Data Sets", and "Discover Data Stories in the Data Sets".

See mapping between the three Venn Diagrams in the table below.
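As a rough illustration of the three circles, the sketch below walks one tiny data set through all three stages using only the Python standard library. The CSV text, the table name, and the column names are hypothetical placeholders, not data from the presentation, and SQLite merely stands in for whatever store a team actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the columns and values are illustrative only.
RAW = "agency,records\nA,10\nB,30\nA,5\n"

def find_and_prepare(raw_csv):
    """Find and Prepare Data Sets: parse the text and coerce column types."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [(row["agency"].strip(), int(row["records"])) for row in reader]

def store(rows):
    """Store and Query Data Sets: load the rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (agency TEXT, records INTEGER)")
    conn.executemany("INSERT INTO counts VALUES (?, ?)", rows)
    return conn

def discover(conn):
    """Discover Data Stories: total the records per agency."""
    cur = conn.execute(
        "SELECT agency, SUM(records) FROM counts GROUP BY agency ORDER BY agency"
    )
    return cur.fetchall()

if __name__ == "__main__":
    print(discover(store(find_and_prepare(RAW))))
```

Each function answers one of the questions the tutorial keeps asking: where the data came from, where it was stored, and what the results were.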

Page 12: Federal Big Data Working Group  Meetup

Select some interesting content

http://breakinggov.com/2012/03/30/defense-department-bets-big-on-big-data/

Page 13: Federal Big Data Working Group  Meetup

Make it structured

http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx

Page 14: Federal Big Data Working Group  Meetup

Select a related data set/table

http://semanticommunity.info/@api/deki/files/27612/SpotfireCloud.xlsx

My Note: This is:
• Categorized (Faceted Search)
• Correlation (Two Numeric Variables)
• Relational (Columns and Rows)
• Linked (URLs)
• Semantic Web (Subject, Predicate, and Object)
• Graph/Network Analytics (Edge and Node Tables)
• Geospatial (Could add Latitude and Longitude)
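To make the note concrete, here is a minimal sketch of how one relational table can be re-expressed as Semantic Web triples, as node and edge tables, and as facets. The rows, column names, and URLs are invented for illustration and are not taken from SpotfireCloud.xlsx.

```python
# Hypothetical rows; the column names and values are illustrative only.
rows = [
    {"name": "file-a", "category": "demo", "excel_kb": 120,
     "spotfire_kb": 450, "url": "http://example.org/file-a"},
    {"name": "file-b", "category": "demo", "excel_kb": 80,
     "spotfire_kb": 95, "url": "http://example.org/file-b"},
]

def to_triples(rows, key="name"):
    """Semantic Web view: one (subject, predicate, object) per non-key cell."""
    return [(r[key], col, val)
            for r in rows for col, val in r.items() if col != key]

def to_graph(rows, key="name", link="category"):
    """Graph view: a node table plus an edge table linking rows to categories."""
    nodes = sorted({r[key] for r in rows} | {r[link] for r in rows})
    edges = [(r[key], r[link]) for r in rows]
    return nodes, edges

def facet(rows, column, key="name"):
    """Categorized view: group row keys by the value of one column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[key])
    return groups

if __name__ == "__main__":
    print(to_triples(rows)[:3])
    print(to_graph(rows))
    print(facet(rows, "category"))
```

The two numeric columns are what a correlation plot would use, and adding latitude and longitude columns would supply the geospatial view in the same way.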

Page 15: Federal Big Data Working Group  Meetup

AOL Gov to BreakingGov Migration

Web Player

Note: The lack of correlation between Excel size and Spotfire size is due to the presence of large boundary (Shape) files.

Page 17: Federal Big Data Working Group  Meetup

Explore both and write a story about it

• Where did you get the data?
– The Web and spreadsheets
• Where did you store the data?
– Spreadsheets
• What were your results?
– All files were accounted for in the two migrations (data quality), versatile formats were created, and visualizations help me and others build on this data science work
• Steps:
– Search MindTouch for Spotfire File Name: Like GDELT-Spotfire
– Find Where It Was Used at One Or More Locations
– Change Web Player Links in Spotfire Dashboard, Story, and Slides
– Test to See If Embedded File Works
– Repeat the Process 283 Times!
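The migration steps above were done by hand, but the find, change, and test loop can be sketched in a few lines. This is only an illustration under stated assumptions: the `pages` dictionary, the page names, and the example.org URLs are hypothetical stand-ins, and nothing here calls a real MindTouch or Spotfire API.

```python
def migrate_links(pages, old_base, new_base):
    """Rewrite links under old_base to new_base; return the changed page names."""
    changed = []
    for name, text in pages.items():
        if old_base in text:  # find where the file was used
            pages[name] = text.replace(old_base, new_base)  # change the link
            changed.append(name)
    return changed

def still_broken(pages, old_base):
    """Test step: list any page that still points at the old location."""
    return [name for name, text in pages.items() if old_base in text]

if __name__ == "__main__":
    # Hypothetical page contents keyed by page name.
    pages = {
        "dashboard": "embed http://old.example.org/wp/GDELT-Spotfire",
        "story": "no embedded files here",
        "slides": "see http://old.example.org/wp/GDELT-Spotfire",
    }
    print(migrate_links(pages, "http://old.example.org/wp/",
                        "http://new.example.org/wp/"))
    print(still_broken(pages, "http://old.example.org/wp/"))
```

Running the check after every batch gives the same data-quality assurance the slide reports: every file is accounted for before moving on.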

Page 18: Federal Big Data Working Group  Meetup

Preview of What You Are Going To Hear

• The Best Way to Get BIG DATA is By Starting Small:
– BIG DATA
– Subcommittee on Networking and Information Technology Research and Development (NITRD Subcommittee)
• These three activities fostered Semantic Medline on the YarcData Graph Appliance for the White House Big Data Initiative.
– Data Science Team Example
– Generic Problems
– Semantic Medline – YarcData Graph Appliance Application for Federal Big Data Senior Steering WG
– Modus Operandi: Mantra, Performance, and Vision
– Knowledge Base: Modus Operandi Web Intelligence in MindTouch
– Big Data in Memory: Innovation Story