Phil Pearce - BlackHat Analytics

Post on 29-Aug-2014

DESCRIPTION

Don't miss the next year of Marketing Festival Brno - http://www.marketingfestival.cz You can also buy a video of this presentation at marketingfestival.cz

TRANSCRIPT

[photo of General Zorg]

BlackHat Analytics 2: Privacy Wars in the Future

#BlackhatAnalytics #MktFest

Web Analytics Exchange mentor

750 GA questions answered

Tracking protection group (DNT)

Welcome

Phil Pearce
PPC, Privacy and Analytics Expert
Freelancer
@philpearce
phil@precisionppc.me
www.linkedin.com/in/philpearce

Summary

1. Background
2. Definition
3. Example Techniques
4. Classifications
5. Penalties
6. Industry issues
7. Why the imbalance?
8. Class action wars
9. Privacy apocalypse & Big Data
10. Look at the future…
11. Questions

A long time ago... in a Google universe far, far away...

Define: Blackhat Analytics

Define: Blackhat Analytics: "0" results

If you do this search now...

Define: Blackhat Analytics

Me

It turns out...

...I know more than Google ;)

Hypothesis

At some point in the future "BlackHat Analytics" or "Faking Conversions" might become more widespread. Because...

1. WA is becoming more important for business decision making.

2. Automatic performance-based PPC bid management systems are becoming more widely used.

3. Increase in online competitiveness & more revenue at stake.

Definition

Intentional act of distorting, deleting, unethically using, or hijacking WA data using technical or legal loopholes, with the goal of making financial gains or obtaining a competitive advantage.

Phil Pearce 2009

Evil tracking from pre-2010

• Referral backlink log spam (deprecated SEO technique)
• GA log spam (spider visit loading JS)
• Visited links CSS hack (History Sniffing)
• Flash cookie respawn (Zombie Cookies)
• EverCookie (all of the above+)
• Ad behavioural targeting (Interest Based Stalking)
• Remarketing Ads (Return Visitor Stalking) - Star Wars stalker
• Safari 3rd party POST cookie (Preference bypassing)
• NEW "Headless Browser" spam

Super evil: EverCookie

The EverCookie was so difficult to delete that even the NSA considered using it!

Source: http://www.slideshare.net/jonbonachon/tor-stinks

DECLASSIFIED

But they decided they didn't need it ;)

Classification

Factor | One extreme | Other extreme
Intent | Accidental | Malicious
Target | Own website | Competitor's website
Purpose of data collection | Same purpose | Different purpose
Scale | Niche | Mass effect
Impact | Data unaffected | GA account deletion

Classifications

[Quadrant chart. One axis runs from "Malintent" to "Unintentional or Accidental"; the other from "Bad/Unreliable Measure Data" to "Good/Accurate Measure Data". Items plotted: Flash Cookie Respawn, Cashback cookies (e.g. Quidco), Flash Cookie, EverCookie, CSS history sniffing, Speed checking robots, Hostname spam, Fake conversions, Google Wifi incident, Google (not provided), Phone call logs, App error logs, Referral log spam.]

Updates

More good/reliable measure data, as there are fewer accidental data mistakes:
• Speed checking robots
• Hostname spam
• Google Wifi incident

If nasty tracking code is installed - Who is liable?

Liability for Privacy & Security

Is the agency liable? No.

BUT the agency is responsible for:
* Upholding professional standards (e.g. GACP status)
* A pro-active client relationship

Local laws say Website Owner is responsible (not agency or Vendor)

Why do people still do this bad stuff?

The Lure of the Dark side is too strong!

It's all about the money! £££

Affiliate networks looking to increase CPA and attract new affiliates.

Online News website looking to retain users & sell stories (e.g. NYT)

Banner networks looking to improve CPM & reduce cookie deletion rates and overcome keywords “not provided”.

Sustained CPC bidding wars

Big data

But there is a disturbance in the task force...

Meet the new Matt Cutts ...

Google Privacy “Red” team soon to be hired in 2013 following FTC settlement.

Mission: discovering and prioritizing subtle, unusual, and emergent privacy & security flaws
https://www.google.com/about/jobs/locations/mountain-view/engineering/systems/data-privacy-engineer-privacy-red-team-mountain-view.html

Hired WebSpam fighter to force quality improvements in 2000.

http://www.mattcutts.com/blog/about-me/

NEW: Matt "Red team leader" Cutts

“Internal” Imperial Bureau Security

New Google Product Manager of Privacy & information security

F@#K - GA account deleted!

You will not collect any data that personally identifies an individual (such as a full name, email address or billing information), or other data which can be reasonably linked to such information by Google.

You must post a Privacy Policy which provides notice that your use of cookies is to collect traffic data.

You must not circumvent any privacy features (e.g. an opt-out) that are part of GA.

www.google.com/analytics/terms/us.html

Why can't GA just remove the bad PII data?

Free WA packages are unable to remove PII without deleting whole GA accounts!

• Raw logs are only stored for ~30 days.
• The right to be forgotten was introduced after GA was designed (although this might be possible with Universal, which is user-centric, not visitor-centric).

“Sensitive” data also is an issue

http://en.wikipedia.org/wiki/Personal_identifier#Examples_of_PID

www.yoursite.com

privacy@google.com
https://support.google.com/adwords/answer/8206?contact=1&rd=1

Example 1: Accidental PII

site:competitor.com inurl:"utm_content * gmail.com"
http://www.google.com/#q=inurl:cz+inurl:utm_content+*+gmail&pws=0&num=100&filter=0&as_qdr=all

e.g. www.fashiondays.cz/campaigns/?cod=xxxx&mail=NAME.REMOVED@gmail.com&utm_medium=email&utm_source=new_registration_pagE&utm_campaign=new_registration_page&utm_content=Layout_B

Solution/Counter-measure for Accidental PII

Add exclude parameters to GWT:
mail, email, utm_source, utm_medium, utm_campaign, utm_content, utm_keyword

Or use a temporary robots.txt fix:
User-agent: *
Disallow: /*utm_medium=email
Disallow: /*gmail.com
Noarchive: /*utm_medium=email
Noarchive: /*gmail.com
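
The same kind of exclusion can also be applied on the page itself, before the URL reaches the analytics tool. The following is only a sketch: `scrubPii`, the parameter names and the rough email pattern are illustrative assumptions, not something taken from the slides.

```javascript
// Hypothetical helper: strips email-like query parameters from a page path
// before it is handed to an analytics tool. All names are illustrative.
const PII_PARAM_NAMES = ["mail", "email"];               // assumed parameter names
const EMAIL_PATTERN = /[^@\s&=]+@[^@\s&=]+\.[^@\s&=]+/;  // rough email shape

function scrubPii(pathAndQuery) {
  const queryStart = pathAndQuery.indexOf("?");
  if (queryStart === -1) return pathAndQuery;            // no query string

  const path = pathAndQuery.slice(0, queryStart);
  const pairs = pathAndQuery.slice(queryStart + 1).split("&");

  const kept = pairs.filter((pair) => {
    const [rawName, rawValue = ""] = pair.split("=");
    let value = rawValue;
    try {
      value = decodeURIComponent(rawValue);              // catch %40-encoded emails
    } catch (e) {
      // malformed encoding: fall back to the raw value
    }
    const nameIsPii = PII_PARAM_NAMES.includes(rawName.toLowerCase());
    return !nameIsPii && !EMAIL_PATTERN.test(value);
  });

  return kept.length ? path + "?" + kept.join("&") : path;
}
```

With the legacy ga.js snippet, the cleaned path could then be passed as the optional page argument, for example `_gaq.push(['_trackPageview', scrubPii(location.pathname + location.search)])`.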

Legal Disclaimer: The purpose of this example is to demonstrate a hole in all Analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY.

By reading this example you agree to NOT use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool.

If you are in any doubt, please seek the advice of the Google legal team www.google.com/contact/ or your local legal counsel BEFORE testing.

Note: This issue was raised on the GACP private discussion forum 6 months ago, prior to this event.

Disclaimer

Example 2: Do you recognise this number?

-9,223,372,036,854,775,807

It is in the quintillions, a "Big Integer": the negative of 2^63 - 1, the largest signed 64-bit integer.

Intentional data damage
WARNING: Don't try this at home!

javascript:_gaq.push(['_setAccount' ,'UA-xxxxxx-1'],['_addTrans','8148350','affiliation','-9223372036854775807','-9223372036854775807','0.00','-','-' ,' -' ],['_addItem','SKU00001','8148350','BIG refund' ,'-','-9223372036854775807','1'],['_trackTrans' ]);

http://www.google-analytics.com/__utm.gif?utmwv=5.4.6&utms=44&utmn=393079074&utmhn=domain.com&utmt=tran&utmtid=8148350&utmtst=affi liation&utmtto=-9223372036854775807&utmttx=-9223372036854775807&utmtsp=0.00&utmtci=-&utmtrg=-&utmtco=-&utmcs=UTF-8&utmsr=1366x768&utmvp=1366x550&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=11.9 r900&utmdt=TITLE&utmhid=509485053&utmr=-&utmp=/&utmht=1385061484294&utmac=UA-XXXXX-1&utmcc=__utma=251194116.2116214072.1385060410.1385060410.1385060410.1; __utmz=251194116.1385060410.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);&utmu=qjAL~

Solution/Counter-measure for intentional Data Damage

Tool to manually fix… bit.ly/bigintegerfix
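
On a site's own pages, one possible extra safeguard is to sanity-check transaction values before they are pushed to the tracker. This is a sketch under stated assumptions: `isPlausibleAmount` and the `MAX_ORDER_VALUE` ceiling are hypothetical, and a check like this cannot stop hits sent straight to the collection endpoint.

```javascript
// Hypothetical guard: rejects transaction totals that are negative, non-numeric
// or above a site-specific ceiling. MAX_ORDER_VALUE is an assumed business limit.
const MAX_ORDER_VALUE = 100000;

function isPlausibleAmount(raw) {
  const text = String(raw).trim();
  // Accept only plain decimals such as "19.99": no sign, no exponent, no blanks.
  if (!/^\d+(\.\d{1,2})?$/.test(text)) return false;
  const amount = Number(text);
  return Number.isFinite(amount) && amount <= MAX_ORDER_VALUE;
}
```

A page could call `isPlausibleAmount(total)` and skip the e-commerce push when it returns false, logging the rejected value for review instead.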

Legal Disclaimer: The purpose of this example is to demonstrate a hole in all Analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY.

By reading this example you agree to NOT use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool.

If you are in any doubt, please seek the advice of the Google legal team www.google.com/contact/ or your local legal counsel BEFORE testing.

Note: This issue was raised on the GACP private discussion forum 6 months ago, prior to this event.

Fine calculator

Fine = (No. users affected * Scale badness * Size of Brand) less (Website Risk assessment + Vendor privacy self certification)

Czech maximum fine is CZK 10,000,000 / £300K (typically the fine is CZK 10,000 / £0.3K, with the highest fine so far being £70K to the Institute for Drug Control for unlawful data collection & processing).

Office for Personal Data Protection
uoou.cz

Fine example of CZK 10,000 to Solus Association last month for misuse of financial data
http://aktualne.centrum.cz/finance/penize/clanek.phtml?id=792359

No cookie fines so far.

Consumers vs Advertisers

But there is still an Imbalance in the force

Maturity in Advertising sector

User data allows better Ad targeting = £

MORE data better targeting = £££

Because…

Data is power

We do'na the datacapt

Rise of the Big Data Empire

Data Greed &

Fear of losing existing user data

Dark motivations:

Triggered…

Group/Class Action Wars

Note: "Class" is a collective of users (e.g. "South Bohemian Mothers group" vs Temelin nuclear power plant)

Define: Class Action Prosecutor: they represent the users.

Like affiliates (i.e. revenue motivated), but with larger resources & cleverer

For example….

US Class Action Prosecutor: like bounty hunters, but more… sophisticated!

BIG class-action fines in US

Do class action lawsuits exist in Europe or are they only in US?

Question…

Class Action Prosecutors: also now active in UK!

e.g. Google UK vs Olswang Class Action (Safari 3rd party cookie bypassing on iOS)

First ever UK "group action" vs Google UK, in Feb 2013, claiming 10m Safari users affected

www.googlelawsuit.co.uk and www.facebook.com/SafariUsersAgainstGooglesSecretTracking

UK test case, could set precedent for EU class-action cases!

BUT… the group action picked the "wrong Google" when the case was submitted, doh!

They accidentally selected "Google Inc", not "Google UK"
http://www.infosecurity-magazine.com/view/34033/google-responds-to-british-lawsuit-uk-privacy-laws-dont-apply/

Successful class action raids in US…

Settlement funds split 50:50 between users and Class Action Lawyers. Previous settlements were 70:30, thus a smaller % cut for Class Action Lawyers, but a huge volume claim.

£13 million hit

£10 per user

£6.5 million

W3C republic – A new hope for Truce

Must be UNSET by default

DNT user signal

Browsers ignore the W3C consensus

Firefox: talks about a blockade of 3rd party cookies

MS: Windows 8 IE10 rolls out DNT=1 ON by default, not UNSET!

Firefox lost battle: too many false positives

Firefox says its Han's are tied for a few months on 3rd party cookies

Dark Side too powerful ;)

MS IE10 DNT=1 browser signal

ON by default…

http://www.ypolicyblog.com/policyblog/2012/10/26/dnt/

http://www.admonsters.com/article/apache-ignores-ie10-dnt-signal

…IE10 DNT signal grounded

…Both Apache & Yahoo threaten to ignore DNT=1 from IE10…

Allow “Good” cookies

Alternative Cookie Clearinghouse proposed (like stopbad malware list)

Block "Bad" cookies

W3C republic 2 years reign – disunity rules!

Peter Swire - Chief, resigns
Jonathan Mayer - Firefox, resigns
Thomas Roessler - joins Google

http://lists.w3.org/Archives/Public/public-tracking/2013Jul/0550.html

Advertising Principles (AdChoices) proposed as alternative principles to W3C's DNT

Privacy in the Universe restored!

Users have choice & freedom within the Global Galactic Empire

But… The secret arms race

BIG Data Centre… with ability to process:
• Device signatures
• UserID respawn
• Custom Remarketing

Also affiliate networks start building Device Signature conversion tracking tools:

"We (tradedoubler.com) are looking at options such as device recognition, using non-personally identifiable information that is freely available from a user's device. Using advanced matching algorithms a single device can be recognized at the point of impression/click and conversion without the use of cookies."
http://www.tradedoubler.com/uk-en/blog/firefox-22-cookies/

The Dark Star

Plans for Device Signatures leaked

Belgium advanced scanner study (by KU Leuven University)

War for Anonymity (aka War of Shadows)

Browsers (excluding Chrome) secretly move to anonymise device signatures

Thus… destroying any shadow tracking

So that all customised device extensions look the same!

Facebook (Borg) & Google (Empire) force browsers to push DNT=0 on user login

Prism Tracker

Unexpected “Snow den monster”

Enforcers/regulators get a boost of user support

Ed

Headless Browser robotic crawlers causing havoc in GA data!

Impossible to differentiate from a real user!
www.webmasterworld.com/search_engine_spiders/4619880.htm

nodejsmodules.org/new/tags/spider

Examples of Headless Browsers:
• Zombie.js
• Phantom.js
• HtmlUnit

Definition: A headless browser is a web browser WITHOUT a user interface.

Authenticated/logged-in user tracking might be the only method of prevention

Jedi Strike

2014 invasion of Privacy officers

Forced 2% global revenue power

University Research divisions expand the use of Taint Droids

Note: Anti-TaintDroid link: http://gsbabil.github.io/AntiTaintDroid/

Polarisation:

Dark get darker (e.g. IE fav icon 3rd party cookies bypassing hole)

White get Whiter (e.g. duckduckgo.com & ixquick.com)

Fines/Lawsuits

Low Chance of Blackhat Detection

High Chance of Blackhat Detection

Balance of Power

Ad Revenue

Class Action Prosecutors

Jedi Enforcers

Google Data Empire

Facebook Borg

Browsers (in the middle)

Affiliates

…HAS CAUSED USER CONFUSION

& A MUDDLE

Because…

LITTLE MISS INFORMATION

So… Are we the bad guys?

In the eyes of the user… YES!!

…How do WE prevent big corporations (and niche bad players) misusing user data/power?

With Great Data comes Great responsibility

Industry needs to govern & enforce itself!

Look to the future…

That means YOU need to agree not to break the analytics code of honour

AND make sure no one else abuses the system!

Good / Bad: report anything that looks a bit "Grey"

Standards & Self regulation

• Vendor built-in privacy & misuse protection
• Adwords & Adsense ToS levels
• Affiliate network guidelines

• WAA Code of Conduct
• GA qualified individual
• GAP certified partner
• WAA Certified Ethical Analyst
• Risk assessment / Compliance audit
• Third party reviews & automated compliance monitoring

Please look out for U.i.O

User Intent Override

Is this a User Intent Override?

UIO?

Need for Industry standards and Honey pots / seeds tests.

Forced Training & Accreditation (e.g. Certified Analyst or MOWA member)

Google Adwords privacy cpc tax and Google organic SERP ranking bonus

Fixes (GA profile filters)

• Hostname include filter: (^|\.)yourdomain\.com$
• ISP location exclude filter (Ask.com bot etc.): ^(inktomi corporation|iac search and media europe ltd|iac search media inc|yahoo\! inc\.|facebook inc\.|stumbleupon inc\.|dub6 ec2|site confidence test agent servers|site ?confidence|apache ltd\.|nielsen netratings|affinity internet inc|microsoft corp)$
• Top content report - Contains box: (email|add|postcode|zipcode|tel) or [?&](.+)=(.*)gmail\.com
• Weekly scheduled report to check for the above
• Check data stored in utm_content, User-defined, CustomFields & Event fields
• Check all GA profiles, including the Raw Data profile, for PII, and add exclude parameters where necessary.
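
Before saving filter patterns like these, it can help to try them against sample values. The sketch below uses JavaScript regular expressions, which are close to but not identical to GA's filter syntax. `yourdomain.com` is a placeholder, the function names are hypothetical, and the dot in the hostname pattern is escaped so that it only matches a literal dot.

```javascript
// Hostname include filter: matches the domain itself and any subdomain.
const HOSTNAME_FILTER = /(^|\.)yourdomain\.com$/;

// PII check for page URLs: a query parameter whose value contains gmail.com.
const PII_IN_QUERY = /[?&](.+)=(.*)gmail\.com/;

// Returns true when the hostname belongs to the site (placeholder domain).
function isOwnHostname(hostname) {
  return HOSTNAME_FILTER.test(hostname.toLowerCase());
}

// Returns true when the URL appears to carry a Gmail address in its query.
function looksLikePii(pageUrl) {
  return PII_IN_QUERY.test(pageUrl);
}
```

Running a handful of real hostnames and landing-page URLs through helpers like these gives a quick check that a filter keeps genuine traffic and flags the pages it is meant to flag.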

Fixes (process changes)

• Account protection
• Training for developers and marketers
• Check scheduled reports are not being sent to unknown users.
• Limit the number of Admin users
• Enable 2-stage authentication if possible.
• Look for unusual variances or data spikes in GA (especially new visits to the homepage)
• CPA audits (GA vs Affiliate report)
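
Looking for unusual spikes can be partly automated on exported daily counts. The following is only a sketch: `isSpike` is a hypothetical helper, and the threshold of three standard deviations is an assumption that would need tuning per site.

```javascript
// Hypothetical check: flags the latest daily count when it sits far above the
// recent average. The default threshold (3 standard deviations) is an assumption.
function isSpike(history, latest, threshold = 3) {
  if (history.length < 2) return false;              // not enough data to judge
  const mean = history.reduce((sum, n) => sum + n, 0) / history.length;
  const variance =
    history.reduce((sum, n) => sum + (n - mean) ** 2, 0) / (history.length - 1);
  const stdDev = Math.sqrt(variance);
  if (stdDev === 0) return latest > mean;             // flat history: any rise stands out
  return (latest - mean) / stdDev > threshold;
}
```

For example, `isSpike([100, 110, 90, 105, 95], 400)` flags the jump, while a value of 108 against the same history does not.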

Back to the present day…

Expected soon

Yikes… are they Disabling Tracking??

…California DNT track law Sept 2013

I'll be track-ed (still)

California just asks for DNT visibility (i.e. does your server see the DNT signal?)
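
Whether a server "sees" the signal comes down to reading the `DNT` request header. This is a minimal sketch, assuming a headers object with lower-cased names (as Node's http module provides in `req.headers`). The function name and the returned labels are hypothetical.

```javascript
// Maps the DNT request header to one of three states. Assumes a headers object
// with lower-cased names. The returned labels are illustrative, not from a spec.
function readDntPreference(headers) {
  const value = headers["dnt"];
  if (value === "1") return "opt-out";         // user asks not to be tracked
  if (value === "0") return "consent";         // user allows tracking
  return "not specified";                      // header absent or unrecognised
}
```

Logging this value alongside each request is one way to show that the signal is at least visible to the server, which is separate from deciding how to honour it.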

Prevention

• Use a tag management system that is configured with digitalData layer privacy features enabled (see appendix)
• Try to use POST requests rather than GET requests where possible, or a form action=/thankyoupage.html
• Keep PDF reader, Flash & Java updated
• Lock down FTP to a fixed set of static IPs, and use 2-stage authentication for GTM write-access.

Closing Remarks

This is how things should be…

Google acts even more responsibly

Facebook introduces a more human privacy interface

Users should not need to rely on despicable class action lawyers

Enforcers are just watchers, not needing to intervene

May the Marketing Force be with you!

Party!

Party tonight:
19:30 NVMERI
20:10 MyCool King + DJ Trush
21:00 Charlie Straight
22:15 midi lidi

Questions

Appendix…

Small Favour http://www.digitalanalyticsassociation.org/?page=codeofethics Google for “DAA code of ethics” Please Sign!

Also see UK institute of analysts code of conduct.

DISCLAIMER - I'm not a lawyer

GA terms of service
http://www.google.com/analytics/terms/us.html
http://www.google.com/analytics/learn/privacy.html

Privacy troubleshooter
http://support.google.com/bin/static.py?hl=en&ts=1291807&page=ts.cs

Report a privacy concern
http://www.google.com/contact/

Contact Google Analytics
http://support.google.com/analytics/bin/request.py?hlrm=en&contact_type=contact_policy
https://support.google.com/adwords/answer/8206?contact=1&rd=1

Report a security concern
security@google.com
http://www.google.com/security.html

Discussion Questions

• How much is your data worth?
• Can you afford to drive traffic in the dark with no insight?
• Is PII or sensitive data or URLs being accidentally tracked?
• Can competitors detect that PII data is being sent into GA?
• Are you in a very competitive industry?
• When was the last time you audited your WA installation?
• Are you capturing data that easily allows an individual to be "linked" or "re-identified" by Google (e.g. detailed demographic data example, or Netflix.com + IMDB.com example1 or example2)?

Related presentations & resources

CookieTAB virus screenshots
https://www.dropbox.com/s/w0gprycb23ajguw/2011_03_18%20CookieTAB%20virus%20screenshots%20.pptx

Effect of EU Cookie law on US businesses
https://www.dropbox.com/s/ces1m53mm7o4gmm/2012-10-04%20GAUGE%20Boston%20-%20Effect%20of%20EU%20Cookie%20law%20on%20US%20organisations.pptx

Recipe for a Cookie Law
https://www.dropbox.com/s/l9n3gchusdv57bm/2011_03_18%20Recipe%20for%20a%20Cookie%20Law%20by%20Phil%20Pearce%20.pptx

Cookie law Implementation Examples
https://www.dropbox.com/s/7q8qfxesk44tpkc/Implimentation%20Examples%20by%20Phil%20Pearce%202012_03_18.pptx

Cookie compliance Audit - Example.docx
https://www.dropbox.com/s/idyrql6c1aniaw6/01%20UK%20Cookie%20compliance%20Audit%20-%20Example.docx

CookieLaw research in 90mb Dropbox
https://www.dropbox.com/s/uapu90d7rc2uxl1/2012_Cookie_Law_Resources_Folder_40mb_Download.zip

Appendix

External privacy feedback mechanisms:
safeharbor.export.gov/companyinfo.aspx?id=16626
feedback-form.truste.com/watchdog/request?url=www.google.com
www.bbb.org/sanjose/business-reviews/internet-services/google-in-mountain-view-ca-214105/file-a-complaint
www.networkadvertising.org/contact-support/report-problem/i-would-report-violation-of-nai-code-nai-member-company-2
www.snapsurveys.com/swh/surveylogin.asp?k=133707671186 [ICO.gov.uk form]
addons.mozilla.org/en-US/firefox/addon/privacy-dashboard/ [W3C feedback mechanism]
www.google.com/trends/explore?hl=en#cat=0-14-54-1281&geo=US&date=today%203-m&cmpt=q [user web searches in category of "privacy" per country]

Security & Privacy prize of up to £13K offered by Google for detecting holes:
www.google.com/about/appsecurity/reward-program/
blog.chromium.org/2012/08/announcing-pwnium-2.html
Example XSS hole in GA found in 2008: derkeiler.com/Mailing-Lists/Full-Disclosure/2008-12/msg00200.html

Open Source feedback techniques:
fourthparty.info/data
appanalysis.org/download.html

Free to check cookie databases:
www.cookielaw.org/cookie-search.aspx?domain=http://www.facebook.com
www.cookiecert.com/cookies-for-facebook.com
privacyscore.com/score_details/2a03b4fe8d9d4eb8b4fb0ccf356cbaaa/showcase

Privacy by Design: Client-side values

Sourced from the customer experience digitalData Privacy sub-group on JavaScript objects with privacy metadata.

digitalData.privacy.mapping = {
  "page": "public", // describes the page itself
  "product": "public", // further describes the product associated with the page
  "cart": "identifiable",
  "transaction": "sensitive",
  "transaction total": "private @analytics", // defined as private, but we make an exception for our analytics tools
  "event": "public",
  "privacy": "public", // everyone should be able to see the general privacy definitions and policy
  "privacy mapping": "@system", // in this example, we decide we don't want to expose the exact policy mapping
  "user": "identifiable",
  "user segment": "public"
};

// PP: Proposed digitalData layer privacy object for VISITOR
digitalData = {
  "visitor": {
    "returningStatus": "new", // new or returning visitor: used to only trigger consent message for new visitors
    "preferenceForDNT": window.navigator.doNotTrack, // yes|no|"not specified". MUST default to "not specified"
    "anonymizeIp": false, // hash last 3 characters of IP address in GA. Defaults to off/false.
    "geoplugin_status": geoplugin_status, // 403 error, 200 is look-up ok
    "geoIPcountryCode": geoplugin_countryCode, // geo-plugin JS variable
    "geoIPcontinentCode": geoplugin_continentCode // geo-plugin JS variable
  },
  // Server-side USER values on login or registration
  "user": {
    "profile": {
      "auth_isSignedIn": true,
      "auth_isNewRegistration": true, // used to only trigger consent message on first registration
      "auth_userIDtoSessionIDoveride": false,
      "profileID": 12345
    }
  }
};

Privacy by Design: Server Response

Effectively this is saying… serverResponseForDNT = obeyDNT|ignoresDNT|inprogressDNT|"not specified"

// From DNT Preference Expression Spec:
// http://www.w3.org/TR/tracking-dnt/#status-representation
// http://www.w3.org/2011/tracking-protection/drafts/tracking-dnt.html#status-representation

{
  "targeting": "yes", // IsOnlineBehaviouralTargeting for Publishers OR onsite remarketing for Advertisers enabled?
  "tracking": "yes", // Is AudienceMeasurementTracking enabled?
  "qualifiers": "afc", // external "A"udit + "F"raud prevention + ad-frequency "C"apping
  "controller": "http://www.yourdomain.com/privacy.html",
  "same-party": [
    "google-analytics.com",
    "stats.g.doubleclick.net",
    "api.youtube.com"
  ],
  "third-party": [
    "googleadservices.com",
    "ads.doubleclick.net"
  ],
  "audit": [
    "http://policy.cookiereports.com/caf4f823-en-gb.html" // e.g. w3.org/P3P/validator.html
  ],
  "policy": "/privacy.html#cookies",
  "edit": "http://www.yourdomain.com/user-dashboard/edit-your-data"
}
