Ensuring Real Estate Website Listing Data Security: Avoid Litigation by Protecting Your Listing Data Before the Theft Occurs

Upload: distil-networks

Post on 13-Apr-2017


TRANSCRIPT

Page 1: Ensuring Real Estate Website Listing Data Security

Ensuring Real Estate Website Listing Data Security

Avoid Litigation by Protecting Your Listing Data

Before the Theft Occurs

Page 2: Ensuring Real Estate Website Listing Data Security

Presenters

Charlie Minesinger
Director of Solution Sales
Distil Networks

Matt Cohen
Chief Technologist
Clareity Consulting

Page 3: Ensuring Real Estate Website Listing Data Security

Agenda

○ Introductions and Background
○ Trends in Scraping Real Estate Websites
○ Overview of Study and Findings
○ Immediate Opportunities and Threats from Scraping
○ Toward Better Security for Real Estate Data Online

Page 4: Ensuring Real Estate Website Listing Data Security

Distil in Real Estate and Premium Brands

Page 5: Ensuring Real Estate Website Listing Data Security

Market Leader in Bot Detection and Mitigation

● Only bot detection vendor to be included in Gartner’s 2015 Online Fraud Detection Market Guide

● Key Attack Trend: “Fraudsters spreading their attacks over thousands of IP addresses”

● Key Inclusion Criteria: “Ability to detect online fraud as transactions occur in real time or near real time”

● Interesting to note: No WAF vendors in this report (as their detection model is primarily rules-based)

Page 6: Ensuring Real Estate Website Listing Data Security

What Is Web Scraping?

Web Scraping
Also known as screen scraping, web scraping is the act of copying large amounts of data from a website, either manually or with an automated program (a bot).

Legitimate Scraping
Scraping can sometimes be benevolent and totally acceptable. For example, the search engine bots that index your website.

Malicious Scraping
A systematic theft of intellectual property accessible on a website, including pricing, content, images, and proprietary data.

Page 7: Ensuring Real Estate Website Listing Data Security

Why Bots / Scraping Is a Problem in Real Estate

MLSs:
○ Obligation to protect copyright
○ Higher cost to use reactive methods - beacons, legal, etc.
○ Duty to enforce NAR Policy (VOWs, so far)
○ Missed revenue opportunities for licensing content

Brokers / Agents:
○ Provided content license on listing for specific purpose
○ Responsible for NAR Policy (VOWs, so far)
○ Stale (scraped) data undermines trust and reputation in brand
○ Higher costs - bots drive up costs for online services

Page 8: Ensuring Real Estate Website Listing Data Security

Why Bots / Scraping Is a Problem in Real Estate

Software Vendors / Publishers:
○ Resource Utilization - more servers and bandwidth costs
○ Poor Website Performance - latency and brownouts, etc.
○ Clean up Marketing Metrics - optimize for humans
○ Ad Fraud - advertisers are not paying for non-human traffic
○ People Resources - keep your team focused on revenue!

Bottom Line
Scrapers scrape because they are making money with your listings! And the Real Estate industry is left with...
→ Higher costs
→ Lost revenues

Page 9: Ensuring Real Estate Website Listing Data Security

Industry Help? ...Way Behind on Bad Bots

Realtor.org offers free tools to track data - Reactive = expensive
○ Checklist for Syndication has many references to data scraping - legal guidance
○ NoScrape - aborted project - no update since 2010?

Ads for Scraping Programs on Realtor.com!

Realtor.com blog to “deter scraping” relies on obsolete IP address blocking and expensive IP litigation:
“REALTOR.com® logging, tracking and monitoring patterns that indicate data is being stolen for these illegitimate purposes. Once an offender is identified, their IP address is blocked from accessing the site.” (Oct 10, 2014)

Problem is not going away

Page 10: Ensuring Real Estate Website Listing Data Security

Web Scraping - Cheap, Easy & DIY

Scraping-as-a-service sites proliferate - scraping is VERY accessible!
○ Search for “web data scraping” on elance.com, odesk.com, freelancer.com, etc.
○ Google search terms: “scraping real estate data” and “scrape MLS listings”
○ Services: Mozenda.com, 80legs.com, webharvey.com, scraping.pro, etc.

Problem is not going away

Page 11: Ensuring Real Estate Website Listing Data Security

Bottom Line on Scraping

Costs of Scraping MLS Data
○ Resource costs - 10% to 40% of server utilization and bandwidth
○ Customer Care - Cost per call from consumer? Calls per month?
○ Website Performance - brownouts result in 3 days of low traffic
○ Ad Fraud - If 30% of ads are seen by bots, are advertisers paying?
○ Lead Gen... $15/mover, $30/storage facility, ... $100s per listing going to third parties, not the broker, not the agent
→ Biggest Losers: MLS and Brokers

Value of solution?
○ Antivirus is $40 to $75 per year per member (= $3 - $6/month)
○ Anti-scraping protection should be same or less cost
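
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the 10% to 40% bot share and the $40 to $75 antivirus price come from the slide, while the $10,000 monthly infrastructure bill and the function names are hypothetical.

```python
def monthly_from_annual(annual_usd: float) -> float:
    """Convert an annual per-member price into a monthly one."""
    return annual_usd / 12


def bot_share_of_bill(monthly_bill_usd: float, bot_share: float) -> float:
    """Portion of an infrastructure bill attributable to bot traffic."""
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return monthly_bill_usd * bot_share


if __name__ == "__main__":
    # Antivirus benchmark from the slide: $40 to $75 per member per year.
    low, high = monthly_from_annual(40), monthly_from_annual(75)
    print(f"Antivirus benchmark: ${low:.2f} to ${high:.2f} per member per month")

    # HYPOTHETICAL monthly server and bandwidth bill, for illustration only.
    bill = 10_000
    for share in (0.10, 0.40):
        cost = bot_share_of_bill(bill, share)
        print(f"Bots at {share:.0%} of a ${bill:,} bill: ${cost:,.0f}")
```

The antivirus benchmark works out to about $3.33 to $6.25 per member per month, which the slide rounds to $3 - $6.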

Page 12: Ensuring Real Estate Website Listing Data Security

Study Methodology

For now, two surveys:

○ MLS Executives - 100 MLS executives representing MLSs with over 600,000 subscribers. Included because they play a part in all scraping contexts: MLS, Publishers, and IDX/VOW.
  ● Technology Selection. Select and contract for the MLS systems.
  ● Data Licensing. Manage the data license agreements with the Advertising Portals.
  ● Industry Policy. Collectively set IDX / VOW rules.

○ IDX Vendors - 14 vendors representing 400,000 IDX & VOW websites. Others would only speak informally. Included because they manage the largest set of scraping targets.

Email invitation, web survey over several weeks.

Page 13: Ensuring Real Estate Website Listing Data Security

MLS Study - Key Results

○ 99% say compliance with rules protecting against misuse of MLS data is important

Implementing anti-scraping should be a priority for MLS vendors:
○ 95% agree that IDX sites should be subject to rules specifically mandating scraping protections. This needs follow-up with NAR committees.
○ 59% of respondents do NOT test VOW sites for anti-scraping compliance
  ● Most testing performed is not rigorous
  ● Some rely on self-reporting
○ 98% of respondents want a set of standardized tests to verify that VOW and syndication sites are protected

Page 14: Ensuring Real Estate Website Listing Data Security

IDX / VOW Study - Key Results

○ 43% of IDX/VOW vendors were not aware of how pervasive the issue is
○ 62% rate compliance with MLS rules as the most important factor in having IDX/VOW vendors implement an anti-scraping solution

Other drivers for adoption of anti-scraping protection:
○ Customer demand for anti-scraping protections
○ Cost of infrastructure use/abuse
○ Security concerns
○ System performance issues

Page 15: Ensuring Real Estate Website Listing Data Security

IDX / VOW Study - Misaligned, Lacking Key Data

○ 50% of IDX vendor respondents believe 15-30% bot traffic is acceptable
○ 50% believe less than 1% bot traffic is acceptable (more like MLS)
○ Most IDX/VOW vendors are using reactive detection tactics
  ● Log analysis - reactive and labor-intensive monitoring
  ● IP-based methods - ineffective against sophisticated scrapers
  ● Obsolete preventions - IP-based rate limiting and CAPTCHAs
  → Likely underestimating (missing bots) with these methods!
○ More than half cannot identify the costs of bots to their business... if you cannot measure it, you cannot manage it, & certainly not budget it
○ While 100% put NAR compliance as a priority, only 25% have budgeted for services to provide anti-scraping service to comply with VOW rules
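
Why IP-based rate limiting tends to undercount bots can be shown with a small simulation. The Python sketch below is an illustration under stated assumptions, not any vendor's detection logic: the limit of 100 requests per IP, the request counts, the function names, and the addresses (taken from documentation-only ranges) are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Set


def flag_ips_over_limit(request_ips: Iterable[str], max_per_ip: int) -> Set[str]:
    """Return the IPs whose request count in the window exceeds the per-IP limit."""
    counts = Counter(request_ips)
    return {ip for ip, n in counts.items() if n > max_per_ip}


def single_ip_scraper(total_requests: int) -> Iterator[str]:
    """A naive scraper that sends every request from one address."""
    for _ in range(total_requests):
        yield "203.0.113.7"  # documentation-range address, hypothetical


def cycling_scraper(total_requests: int, requests_per_ip: int) -> Iterator[str]:
    """A scraper that switches to a new address every `requests_per_ip` requests."""
    for i in range(total_requests):
        yield f"198.51.100.{i // requests_per_ip}"  # hypothetical address pool


if __name__ == "__main__":
    limit = 100  # hypothetical per-IP limit for one time window
    print("naive scraper flagged:", flag_ips_over_limit(single_ip_scraper(1000), limit))
    print("cycling scraper flagged:", flag_ips_over_limit(cycling_scraper(1000, 4), limit))
```

Both scrapers make 1,000 requests, but the one that rotates addresses every 4 requests spreads them over 250 IPs and never crosses the per-IP limit, so a purely IP-based counter reports no bot traffic at all.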

Page 16: Ensuring Real Estate Website Listing Data Security

Modern Anti-Scraping Tool Requirements

It’s An Arms Race … More Detail:

○ Scripts, such as cURL or Ruby, making requests at any rate
○ Selenium, a fully automated browser making requests at any rate
○ Headless browsers, with or without PhantomJS (fully simulating a browser, browser pre-rendering)
○ IP cycling using any bot technology at a rate of fewer than 5 requests per IP address before changing IP
○ Crawlers at any speed, even slow crawlers making 10 requests per minute or fewer
○ Anonymizing proxies used to make requests with any technology or at any rate of requests
○ Spoofed bot user-agents, e.g. a fake “googlebot” or “bingbot” user-agent, IE running on Linux, etc.
○ Non-browser user-agents, and spoofed user-agents for mobile browsers or mobile applications
○ Blocking traffic from data centers and hosting providers (why would consumers be using those IPs?)
○ Blocking bots from consumer ISPs while letting legitimate requests through
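
One of the checks listed above, catching a spoofed “googlebot” or “bingbot” user-agent, is commonly done with a forward-confirmed reverse DNS lookup. The Python sketch below illustrates that general technique and is not a description of any vendor's implementation. The function names and the lookup parameters are hypothetical, and the hostname suffixes are assumptions that should be checked against each search engine's current documentation.

```python
import socket
from typing import Callable, List, Optional

# Hostname suffixes assumed for each crawler; verify against current vendor docs.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name the user-agent claims to be, if any."""
    ua = user_agent.lower()
    for name in CRAWLER_DOMAINS:
        if name in ua:
            return name
    return None


def is_verified_crawler(
    ip: str,
    user_agent: str,
    reverse_lookup: Optional[Callable[[str], str]] = None,
    forward_lookup: Optional[Callable[[str], List[str]]] = None,
) -> bool:
    """True only if the claimed crawler's IP passes reverse and forward DNS checks."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    name = claimed_crawler(user_agent)
    if name is None:
        return False  # the client does not claim to be a known crawler
    try:
        host = reverse_lookup(ip)  # step 1: IP -> hostname
        if not host.lower().endswith(CRAWLER_DOMAINS[name]):
            return False  # hostname is outside the crawler's domains
        return ip in forward_lookup(host)  # step 2: hostname -> IPs must include ip
    except OSError:
        return False  # lookup failed, so the claim cannot be confirmed
```

The two lookup parameters default to live DNS calls but can be replaced with stubs, which keeps the check testable offline. A request that claims to be a search engine crawler and fails this check is a candidate for the bot handling described elsewhere in the deck.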

Page 17: Ensuring Real Estate Website Listing Data Security

The Facts on Scraping Real Estate Data

○ 7 of top 10 sources of bots are Consumer ISPs: (1) Comcast, (2) Time Warner Cable, (3) Verizon FIOS, (4) Charter, (5) Cox, (6) CenturyLink, and (7) AT&T Uverse
○ 50% - 75% of bot traffic on RE sites is from Consumer ISPs
○ Most Consumer ISPs had 1,500+ IPs with bot traffic

Highlights of Bot Sophistication in Real Estate
○ 18-45% Automated browsers - mimicking humans
○ 14-25% in Bot Database - fingerprinted, known bots
○ 16-42% Slow Crawlers - recycling IPs and user agents

Page 18: Ensuring Real Estate Website Listing Data Security

Purpose Built Solution, Not a Feature

Bot Detection is a New Category, NOT a Feature

○ NOT a Content Delivery Network (CDN)

○ NOT a Distributed Denial of Service (DDoS) protection solution

○ NOT a simple IP list or set of scripts

○ NOT a Web Application Firewall (WAF)

A purpose built bot detection solution is always updating and evolving

Page 19: Ensuring Real Estate Website Listing Data Security

Catch 99.9% of Malicious Bots with Distil

[Figure: detection layers compared]

A Typical WAF Catches 20%
○ IP block
○ User agent testing

Distil Catches up to 99.9%
○ IP analysis
○ User agent testing
○ JavaScript test
○ Cookie
○ Selenium test
○ Browser rate limiting
○ Automated browser
○ PhantomJS
○ Machine learning
○ IP cycling

Page 20: Ensuring Real Estate Website Listing Data Security

Detect Your Bot Traffic

Page 21: Ensuring Real Estate Website Listing Data Security

Control Over Your Bot Traffic

Monitor
Monitor to inspect requests and record the traffic to Distil and/or your own server logs

Block
Set to Block to serve the client an unblock verification form

CAPTCHA
Serve a hardened CAPTCHA to test the client for verification

Drop
Drop them to present them with an access denied page
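
The four actions can be pictured as a small dispatch table. The Python sketch below is purely illustrative and is not Distil's API or configuration format: the `BotAction` and `Outcome` names are hypothetical, and the assumptions that only Monitor forwards the request to the origin server and that every action is logged are inferred from the slide wording.

```python
from dataclasses import dataclass
from enum import Enum


class BotAction(Enum):
    MONITOR = "monitor"
    BLOCK = "block"
    CAPTCHA = "captcha"
    DROP = "drop"


@dataclass(frozen=True)
class Outcome:
    forwarded_to_origin: bool  # ASSUMPTION: does the request reach the web server?
    logged: bool               # ASSUMPTION: every action records the request
    served: str                # what the client receives, per the slide text


def apply_action(action: BotAction) -> Outcome:
    """Map a configured action to what happens to a suspected bot request."""
    if action is BotAction.MONITOR:
        return Outcome(True, True, "origin response")
    if action is BotAction.BLOCK:
        return Outcome(False, True, "unblock verification form")
    if action is BotAction.CAPTCHA:
        return Outcome(False, True, "hardened CAPTCHA challenge")
    if action is BotAction.DROP:
        return Outcome(False, True, "access denied page")
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for action in BotAction:
        print(f"{action.value:8s} -> {apply_action(action)}")
```

Keeping the action separate from detection is the design point here: the same detection result can be handled leniently (Monitor) on one site and strictly (Drop) on another.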

Page 22: Ensuring Real Estate Website Listing Data Security

Flexible Deployment Options

Cloud

○ Deploys in hours

○ Blazing fast Anycast DNS-based GeoIP Routing. Automatic content compression optimizes for faster delivery

○ 17 datacenters automatically fail over when a primary location goes offline

○ Automatically increases infrastructure and bandwidth to accommodate spikes

[Diagram: User → Distil Cloud CDN → Load Balancer → Web Server]

Page 23: Ensuring Real Estate Website Listing Data Security

Flexible Deployment Options

Physical or Virtual Appliance(s)

○ Install on virtualized or Bare Metal appliance(s)

○ Deploys in days

○ High availability configurations with failover monitoring

○ Heartbeat up to Distil Cloud

[Diagram labels: User, Internet, Load Balancer, Web Server, Distil Appliance]

Page 24: Ensuring Real Estate Website Listing Data Security

Selection Criteria for Anti-Scraping

Best of Breed Solution will Include:

○ 99% accuracy; cannot rely on IP address to identify bots or use rate limiting on IP
○ Dedicated service - NOT a button/feature/add-on
○ Layers of tactics: multiple detection tactics, with ongoing R&D
○ Easy to implement - deploy in days or weeks
○ Real-time detection and mitigation - be proactive to save time and money
○ Flexible, configurable options for actions to mitigate bots
○ Affordable cost per member, per site, or per MLS - flexible business model

Page 25: Ensuring Real Estate Website Listing Data Security

QUESTIONS… COMMENTS?

Call Charlie
1.703.962.1614
charlie@distilnetworks.com
www.distilnetworks.com
http://resources.distilnetworks.com/h/c/175726-real-estate