big data or the agile turn? · big data is like nothing before!! 1) commercial adaptive systems are...

Post on 17-Jun-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Big Data or The Agile Turn?Seda Gürses

seda@esat.kuleuven.beCOSIC, University of LeuvenCITP, Princeton University

22. March 2018CITIP

1

big data is like nothing before!!

1) commercial adaptive systems are built to capture our grammars of action (for extraction of value)

surveillance model: visual models/knowledge gathering

capture model: linguistic models/reorganization of grammars of action (Agre, 1994)

2) these systems optimize over people, environments and infrastructure

“instantaneous reconfiguring of spatial elements toward any emergent strategic end [extraction of value] by corporate entities. Philips & Cury

2

systems of knowledge vs. optimization

what is at stake?

can privacy laws come to respond to what is at stake?

not a match: laws focus on knowledge/reasoning, current systems on optimization

Optimization involves:

1. techniques of logistics and control

2. discourses legitimating a mathematical state as a solution to social contention

Fenwick McKelvey, 2018

3

the turn to agile

shrink wrap services

waterfall model agile programming

PC cloud

4

• exploratory study (work in progress)

• Privacy after the Agile Turn (forthcoming)

• https://osf.io/27x3q/

• interviews and chats

• devs, devops, product managers, a/b testers, AI/data product developers, data engineers, privacy officers

• industry white papers

• legal and policy literature

methodology

5

shrink wrap software

6

agile methods

SOAcloud

IaaS/PaaS

SaaS

7

the turn to agile

shrink wrap services

waterfall model agile programming

PC cloud

8

shrink wrap services

9

1) All teams will henceforth expose their data and functionality through service interfaces.

2) Teams must communicate with each other through these interfaces.

3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom protocols -- doesn't matter. Bezos doesn't care.

5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

6) Anyone who doesn't do this will be fired.~2001/2002

10

shrink wrap services

server (thin) client modelbinary runs solely on client side

requires matching soft & hardware data “secured” by service

collaborative

updates and maintenance server side

updates & maintenance cumbersome

user has control (oh no!)

pay as you use/trialpay in advance

enterprise apps

Microsoft Word office 365

11

server - thin client model

bundled services

licensing and pricing models intensified tracking

pooling of data

transaction throughout use

implications of the shift to services

agile service integration

12

version+

purchase

shrink wrap software production use

time

pay per use

service bundle

use

13

picture album creation service

authentication payment mapsembedded media

social

CRM

team integration

production tools

UX capture

SDK/PaaS cybersecurity performance

AB Testing

advertisement

data brokers analytics

14

15

http://uservoice.com

http://sproutvideo.com

http://startapp.com

http://fitocracy.com

http://meuspedidos.com.br

http://oyorooms.com

http://urbanclap.com

http://himalayastore.com

http://travelport.com

http://credomobile.com

http://deputy.com

fullstory in top 1 million siteshttp://remitly.com

http://wahoofitness.com

http://wayup.com

http://tieks.com

http://referralcandy.com

http://codeschool.com

http://owler.com

http://surfdome.com

http://autopilothq.com

http://conte.it

http://autoeurope.com

http://moosejaw.com

http://clickminded.com

http://keen.io

http://samcart.com

http://thebouqs.com

http://mymove.com

http://scripted.com

http://namely.com

http://shethinx.com

http://castorama.pl

http://nexojornal.com.br

Thanks to Dillon Reisman from Princeton U. for the web crawl, 2016

16

17

waterfall model agile programming

18

waterfall model

spiralmodel

agile programming

Xtreme programming

19

waterfall modelrequirements analysis and

specification

architectural design

implementation and integration

verification

operation and maintenance

20

21

process and tools

individuals and interactions

working softwarecomprehensive documentation

customer collaboration

contract negotiation

responding to changefollowing a plan

agile manifesto

22

if short iterations are good, make them as short as possible

eXtreme Programming

if simplicity is good, do the simplest thing that can work

if testing is good, test all the time

if code reviews are good, review code continuously

23

server - thin client model

short iterations

data centric development

simplicity

testing testing testing

rapid feature development

reuse and modularity

user centric development

implications of the shift to agile dev

24

rapid feature development

product manager

boss/VC said so

where do features come from?

designers said so

competitor did it

where do features go?

behavioral analytics

feature based optimization

customer

25

data centric development

predictive modeling 4 pricing

user churn

user/behavioral analytics

data productsmetrics

anecdotes

data centric development

26

website

new information panel

A/B testing

27

• recursively keeping track:

• capturing behavior of users

• capturing behavior of service components

• capturing behavior of your capture models

• QA and continuous monitoring become one thing

optimizing people, environment and infrastructure

28

time

pay per use

service bundle

use

feature space

consent

29

WHAT ISSUES DO THESE SYSTEMS RAISE?

30

These systems capture knowledge of people’s behavior, and they reconfigure them through rapid development of features that are able to identify, sequence, reorder and transform human activities.

This also means that they open these human activities to evaluation in terms of economic efficiency. Philip Agre.

Philip Agre: Two models of privacy

31

“ensuring privacy in clouds”who is responsible for privacy?

Lack of transparency, assurance, accountability

lack of clear responsibility

Service Level Agreements: users are dependent on data controller (no leverage on contracts)

where is the data geographically?(jurisdiction: which government will knock on your door/eavesdrop you?)

Service Level Agreements:

what’s the scope of third party access?

what security practices are used?

how are backups and data retention managed?

how is individual consent and subject access managed?

lack of trust regulatory challenges

32

ISSUES AGILE TURN RAISES

these systems are not information or knowledge systems but optimization systems

optimizing ads/search/recommendations -> optimizing people, geographies, infrastructure

algorithmic accountability -> systems are continuously optimized, not just algorithms

centralize decision making, trump reasoning with optimization, remove due process

explanation (towards individuals) not sufficient/misleading: does not show optimization

explanation (towards individuals) not sufficient: does not reveal optimization and externalities

organizational accountability as a way to empower institutions/agency wrt such systems?

33

ISSUES AGILE TURN RAISES

they involve a series of privacy, security and safety issues (Amodei et al. 2016)

Accidents!?!?!: Unintended and harmful behavior that may emerge from ML systems when we specify the wrong objective function, are not careful about learning process,

or commit other ML related implementation errors (∆designer intention & outcome)

algorithmic discrimination: our datasets are already racialized/gendered etc.

lots of studies on deceptive users (adversarial ML) -> few studies on algorithmic manipulation

Most systems are vulnerable to data poisoning attacks (introducing training data that causes a learning system to make mistakes), adversarial examples (inputs designed to be misclassified by machine learning systems), and the exploitation of flaws in the design of autonomous systems’ goals. AI systems may cheat themselves to optimize their system goal

34

thank you!

35

references• Philip E. Agre, Surveillance and capture: Two models of privacy, The

Information Society, Vol. 10, Iss. 2, 1994

• Irina Kaldrack and Martina Leeker, There is no software, just services, Meson Press, 2015.

• Gürses and Van Hoboken, Privacy After the Agile Turn, Cambridge Handbook of Consumer Privacy, https://osf.io/27x3q/ (upcoming)

Interdisciplinary Summer School on Privacy

https://isp.cs.ru.nl

International Workshop on Privacy Engineering

http://IWPE.info

36

top related