Inverted World: Open Data, Open Government
Astana, Kazakhstan, Oct. 4-5, 2014
By Eric Kavanagh, CEO, The Bloor Group
We Live In Interesting Times!
Are we living in a new era of the Child Emperor?
The world of media has gone from a Push to a Pull model.
Craigslist guts revenues for major newspapers by 50%.
With newspapers, what you read was nobody’s business.
When you browse online today, every motion you make can be monitored and tracked!
What happens when practices invert?
Mark Zuckerberg
Why Open Government? Trust!
“If you always tell the truth, you don’t have to remember anything!”
Mark Twain
Trust breeds trust, while mistrust breeds mistrust.
Trust is the foundation of all social and civil transactions.
Without trust, chaos will ensue.
What Happens When Opacity Rules?
World Bank Data in 2000 stated that US GDP was $10T
In that same year, Russia was listed at $340B!
How could one of two Super Powers be nearly 30x the size of the other, in terms of GDP?
The Black Market!
Russia was still mired in practices & behaviors that stemmed from decades of oppression
Most transactions, small and large, took place under the radar of government officials
Lack of openness on rules and data created widespread mistrust
Lack of trust leads to unrest of various kinds!
Time to Return to First Principles?
Philosopher Lao Tzu cautioned against complicated legal codes 2,500 years ago
“When the laws are complex, the bandits will abound!”
You cannot consciously obey laws you don’t get
Enforcement = Arbitrary
Ancient Wisdom Still Applies Today!
All Roads Lead to Open Government?
Policies must be defined to determine which data sets should be open
Procedures must be designed to enable a smooth flow of data
Practices must be devised that enable all parties to collaborate
This Will Take Time!
Today’s Governments Must Embrace Transparency
But How?
Why Not Open Government?
All data is open to someone! If you don’t open the door, someone else will!
Talk about an inverted world! A US Citizen gets political asylum in Russia? The former Soviet Union?
When you try to control too tightly, you can often lose all control!
How Can Open Government Happen?
An ideal solution exists, and it’s called the Hadoop Distributed File System (HDFS)
HDFS is the foundation of Hadoop, and it’s an open-source system, aka FREE!
HDFS scales out by adding commodity servers, and it replicates every block of data (three copies by default) so that a failed disk or node does not lose data
This foundation can serve as the ultimate storage area for open data
Hadoop Players: Strengths & Weaknesses
Purveyors of the pure source code; but navigation is difficult.
$1B invested esp. Intel; proprietary enterprise software.
Purely open-source approach; serious ‘support’ costs.
Focus on traditional data management; proprietary.
Enterprise hardened, tandem with Vertica; proprietary.
Just the Data Will Not Be Enough
Data without the context of process and meaning provides no value
Open Government also requires transparency of process: who does what, when, why, and how?
The complete picture must be viewable such that public eyes can help
Collaboration 3.0? Many Hands…
A range of functionality can enable citizens to assist the government
Questionable expenses or processes can be flagged; a critical mass leads to formal review
Registered users who find valuable issues will earn points and credibility in this meritocracy!
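The flag-and-review idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `FlagBoard`, the threshold of distinct users that counts as a "critical mass", and the points awarded per confirmed issue are all hypothetical and do not come from the presentation.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 3            # hypothetical "critical mass" of distinct users
POINTS_PER_CONFIRMED_FLAG = 10  # hypothetical reward for a confirmed issue


class FlagBoard:
    """Tracks citizen flags on expenses or processes (illustrative only)."""

    def __init__(self, threshold=REVIEW_THRESHOLD):
        self.threshold = threshold
        self.flags = defaultdict(set)   # item id -> users who flagged it
        self.points = defaultdict(int)  # user id -> credibility points

    def flag(self, user, item):
        """Record a flag; return True once the item has enough distinct flaggers."""
        self.flags[item].add(user)  # a set, so repeat flags by one user count once
        return len(self.flags[item]) >= self.threshold

    def confirm(self, item):
        """A formal review confirmed the issue: credit every user who flagged it."""
        for user in self.flags[item]:
            self.points[user] += POINTS_PER_CONFIRMED_FLAG


board = FlagBoard(threshold=2)
board.flag("ann", "expense-17")
board.flag("ann", "expense-17")            # duplicate flag, still one distinct user
needs_review = board.flag("bob", "expense-17")
if needs_review:
    board.confirm("expense-17")            # stands in for the formal review outcome
```

Counting distinct users rather than raw flags is one simple way to keep a single account from forcing a review on its own.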
Beware the Specter of Data Quality
Data Quality is notoriously bad in organizations and government entities
Problems go far beyond misspelled names and bad addresses
Business logic and rules trapped in legacy apps
Faulty integration points
Metadata out of sync with data values
Key Code Analysis
– Invoice data sets extracted with correlation
• CAGE: 984274, DUNS: 973437
– FPDS DUNS and Names extracted & correlated
• 158181 unique DUNS codes
– Will be included in normalized composite IT Asset records
– Composite records for lookup added to Hadoop
• By DUNS or Global DUNS: get all related DUNS, CAGE, names
• By CAGE: get all related DUNS, names
• By name: get all related DUNS, CAGE, names
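The three lookups listed above can be sketched as a small in-memory index. This is a minimal Python illustration, not the Hadoop-based system the slide describes: the sample rows, the field layout, and the rule that DUNS codes sharing a Global DUNS count as "related" are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical composite records: (DUNS, Global DUNS, CAGE, vendor name).
RECORDS = [
    ("100000001", "900000001", "1AAA1", "acme aviation"),
    ("100000002", "900000001", "1AAA2", "acme aviation services"),
    ("100000003", "900000002", "2BBB1", "borealis shipping"),
]
FIELDS = ("duns", "global_duns", "cage", "name")


def build_indexes(records):
    """Map each key value to the positions of the records that carry it."""
    indexes = {field: defaultdict(set) for field in FIELDS}
    for pos, row in enumerate(records):
        for field, value in zip(FIELDS, row):
            indexes[field][value].add(pos)
    return indexes


def related(records, indexes, key_type, value):
    """Return all DUNS, CAGE codes, and names related to one key value."""
    positions = set(indexes[key_type].get(value, ()))
    if key_type == "duns":
        # Assumption: DUNS codes under the same Global DUNS are related.
        for pos in list(positions):
            positions |= indexes["global_duns"][records[pos][1]]
    rows = [records[pos] for pos in positions]
    return {
        "duns": sorted({row[0] for row in rows}),
        "cage": sorted({row[2] for row in rows}),
        "names": sorted({row[3] for row in rows}),
    }


indexes = build_indexes(RECORDS)
by_duns = related(RECORDS, indexes, "duns", "100000001")
by_cage = related(RECORDS, indexes, "cage", "2BBB1")
by_name = related(RECORDS, indexes, "name", "acme aviation")
```

Here `by_duns` picks up both DUNS codes under Global DUNS 900000001, while the CAGE and name lookups match only the records that carry that exact value.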
[Chart: Number CAGE Per DUNS Code. Axis label: Number DUNS Codes With X CAGE Codes; log-scale counts; X from 1 to 119. Callout: One DUNS code has 119 CAGE]
[Chart: FPDS Number DUNS with N Global DUNS. Log-scale counts; N from 0 to 5]
[Chart: FPDS: Number DUNS with N Names. Log-scale counts; N from 1 to 112. Callout: 6849 instances for code = 123456787]
[Chart: FPDS: Number Global DUNS with N DUNS. Y-axis: Number Global DUNS (log scale); x-axis: Number DUNS, 0 to 250]
[Chart: FPDS: Global DUNS with Multiple Names. Y-axis: Number Global DUNS (log scale); x-axis: Number Names, 0 to 1400]
[Chart: FPDS DUNS Code Matches to WAWF Codes. Bars for DUNS and GlobalDUNS, split into Found and NotFound. Data labels: 140827, 13302, 17363, 942]
DUNS NGlobalDUNS Nnames
123456787 0 6849
136666505 0 112
790238851 0 96
103933453 1 35
103385519 1 33
005149120 1 27
067641597 1 25
005103494 0 24
332619535 0 24
020751082 1 22
054781240 1 22
621599893 1 21
790238638 0 21
834476079 1 21
FPDS DUNS With Most Names
123456787 miscellaneous foreign contractors
123456787 etisalat c/o us consulate general dubai
123456787 boswedden house
123456787 turner engine controls b. v.
123456787 swissport hellas cargo s a
123456787 orbit couriers sa
123456787 goldair aviation handling s.a.
123456787 federal egov iae initiative generic duns
123456787 federal egov iae initiative - generic duns
123456787 miscellaneous foreign contractorsan
123456787 prc-desoto
123456787 inversiones sochagota e.u.
123456787 comcel
123456787 transporte y servicio lucio
123456787 jesse james members only maxi taxi svc
123456787 club naval de oficiales
123456787 inchcape shipping services
123456787 dr. thalia abatzi
123456787 central asia development group
123456787 bennett-fouch and associates
123456787 noor al-sabah company
123456787 ait/arc infrasture solutions
123456787 not available
123456787 77 construction company
136666505 adese genc petrol
136666505 amy lily chung
136666505 anderson erin ruth
136666505 andrew william knef
136666505 anduaga-arias laura
136666505 angelica m. de la cruz
136666505 anthony o'brien, 330531-5100194
136666505 batac belle
136666505 bottesini beth ms.
136666505 bouck shannon
136666505 bunn amy b.
136666505 carlene clark
136666505 cho, boong haeng
136666505 choe, sun young
136666505 christina michajlyszyn
136666505 christopher cannon
136666505 christopher l. booth
136666505 chun, kil mo
136666505 conflict + transition consultancies
136666505 cozzone elaine
136666505 deborah p. carney
136666505 denihan patricia joann
136666505 dong sook mcgeorge, 690525-2716816
136666505 dorene d.lukewalton,pharm d.
136666505 dr. terry a. klein
FPDS Global DUNS with Most Names & DUNS
GlobalDUNS NDUNS Nnames
877936518 12 27299
624770475 212 21866
148095086 80 21754
027079776 2 17128
103933453 86 17075
026157235 4 15694
963737366 106 15200
134303192 19 14481
067641597 108 13998
064680213 102 13809
077652761 93 12914
002204600 15 12570
039860122 44 12382
805258373 130 11995

GlobalDUNS NDUNS Nnames
624770475 212 21866
805258373 130 11995
012003349 128 9748
877987347 127 8253
057272486 124 6935
007250079 123 9076
071767334 123 9474
158140041 117 6671
019710586 116 8163
091441089 116 7813
616924770 116 7217
067641597 108 13998
Prompted Collaboration and New Business Information
Showing these results prompted discussions leading to:
– There are generic DUNS heavily used but these are being removed from use via policy changes
– System validation rules are not current with all policy
– Additional “rules” of how to track, audit, align, merge spread by email
• All put back into Data Normalization system and then into modified Java
New results available over all data sets in < 1 day
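One way to surface generic DUNS codes like the ones discussed above is to count the distinct names recorded against each code and flag the outliers. The Python sketch below is illustrative only: the function names and the `max_names` cutoff are hypothetical, the sample pairs are a handful of rows taken from the tables earlier in this deck, and this is not the normalization system's actual Java logic.

```python
from collections import defaultdict

# A few (DUNS, name) pairs taken from the tables earlier in this deck.
SAMPLE_PAIRS = [
    ("123456787", "boswedden house"),
    ("123456787", "orbit couriers sa"),
    ("123456787", "comcel"),
    ("123456787", "Comcel "),            # same name, different case and spacing
    ("136666505", "carlene clark"),
    ("136666505", "bouck shannon"),
]


def names_per_duns(pairs):
    """Count distinct names per DUNS after trimming and lowercasing."""
    names = defaultdict(set)
    for duns, name in pairs:
        names[duns].add(name.strip().lower())
    return {duns: len(found) for duns, found in names.items()}


def flag_generic(pairs, max_names=20):
    """Return DUNS codes with more than max_names distinct names, worst first.

    max_names is a hypothetical cutoff; a real rule would come from policy.
    """
    counts = names_per_duns(pairs)
    flagged = [duns for duns, n in counts.items() if n > max_names]
    return sorted(flagged, key=lambda duns: -counts[duns])


counts = names_per_duns(SAMPLE_PAIRS)
suspects = flag_generic(SAMPLE_PAIRS, max_names=2)
```

With the tiny sample and a cutoff of 2, only 123456787 is flagged. The trim-and-lowercase step only collapses the most trivial variants, so near-duplicates that differ by punctuation would still count as separate names.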
Make the Data Useful: Search 3.0!
Traditional Google-based search is very primitive, pays little attention to context or semantics
Most end users have become dulled by it and don’t invest much time
What’s the loneliest place on the Web? Page two of Google results!
The Search Giant Seems Rather Distracted
One Solution? Zakta Research
Zakta provides a comprehensive search platform
Search results begin with a semantic map: Penguins -- Arctic birds, the Linux logo, or Pittsburgh hockey team?
Guided navigation through topics enables solid research
Give Users the Tools To Discover Issues
Beware: Security! Is IT Safe?
Any system can be hacked, whether from inside or outside
Strategies and tactics of hackers change all the time and must be monitored closely
With so much money living in a digital world, security poses grave challenges for us all
Mischief Makers Are Here to Stay!
One Solution? ExtraHop
Replicates all network traffic to create a mirror image of application architectures
Provides multifaceted view of information landscape; identifies all packets of data
Could have prevented the entire Target breach from ever taking place!
Security Requires More Than Tools and People
Truth: A Roadmap to Prosperity
In all of life’s dealings, truth engenders trust
Trust fosters openness
Openness leads to collaboration
The end result? Peace and prosperity, with very little mischief!
“Please, let it have been a goat!”
Fouad Ajami
Your Presenter Today:
Eric Kavanagh, CEO, The Bloor Group
[email protected]
Twitter: @Eric_Kavanagh
+512.426.7725
http://www.insideanalysis.com
Take the Inside Track to Insight!