data driven security
Security in the context of APIs = Adaptive and Data Driven
Source: Incapsula
Velocity and Exposure to Abuse are two sides of the same coin.
Exposure
Undesired Uses
KPI Data Pollution
Cost Increases
Attacks
Velocity
Integration
Things
Quality Improvements
DevOps
How can you make sense in a Fishmarket?
Apigee Sense: In a nutshell
Bot Attack Stopped
Legitimate Traffic
sense
data signatures
A global processing pipeline for data flowing through Apigee Edge with a feedback loop which allows traffic shaping on Edge.
Collect + Analyze + Act
Collect
We collect over 1 billion records each day from traffic running through Apigee Edge. This data is collected at over 1,000 different API endpoints (servers) and delivered to the data lake with less than 5 minutes of end-to-end latency by a high-throughput, fully distributed data flow engine. There is negligible data loss within this system. The system is designed for better than 99.99% availability.
These records represent API calls in a large number of industry segments: Hospitality, Telco, Retail, Healthcare, Manufacturing, and more.
Apigee Edge Data Lake
Thousands of Servers, globally distributed. Running a highly available Managed API Service.
Over a billion API calls per day served with 99.99% availability
Over a Terabyte of data stored each day. Globally distributed. Accessible from a high throughput analysis system. Managed for a 90 day or greater retention period.
High throughput data flow engine.
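To put the collection figures in perspective, the short calculation below turns them into per-second and per-record averages. It uses only the numbers quoted on the slides. Because the slides say "over", the results are lower bounds, and the choice of 1 TB = 10^12 bytes is an assumption.

```python
# Back-of-the-envelope averages derived from the slide figures.
# The slides say "over", so treat every result as a lower bound.
RECORDS_PER_DAY = 1_000_000_000   # "over 1 Billion records each day"
ENDPOINTS = 1_000                 # "over 1000 different API endpoints"
BYTES_PER_DAY = 10**12            # "over a Terabyte"; assumes 1 TB = 10^12 bytes
AVAILABILITY = 0.9999             # "99.99% availability"
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate across the whole system and per endpoint.
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
records_per_second_per_endpoint = records_per_second / ENDPOINTS

# Average stored size of one record.
bytes_per_record = BYTES_PER_DAY / RECORDS_PER_DAY

# Downtime budget implied by the availability target (365-day year).
downtime_minutes_per_year = (1 - AVAILABILITY) * 365 * 24 * 60

print(f"{records_per_second:,.0f} records/s on average")
print(f"{records_per_second_per_endpoint:.1f} records/s per endpoint")
print(f"{bytes_per_record:,.0f} bytes per record")
print(f"{downtime_minutes_per_year:.1f} minutes of downtime per year")
```

On these assumptions the system averages roughly 11,600 records per second, about 1 KB per record, and a downtime budget of under an hour per year.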
Analyze
The data in the data lake is automatically analyzed by a large cluster running Machine Learning algorithms. The results are stored back into the data lake. The cluster runs algorithms which consider all of the data, not just the data belonging to any one customer. These algorithms consider data seen over large time windows (24 hours or more). This system enables our customer network to engage in mutually beneficial network effects. An attack on any one of our customers will be used to learn and defend all of our customers.
The cluster is designed to do this with low latency (a few minutes) between when data is available and result computation is completed. The cluster is able to auto-scale to process more data when data rates are higher, and scale down to keep costs under control when data rates are lower.
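The scale-up and scale-down behaviour can be pictured as a sizing rule: choose enough workers to keep up with the current data rate, within a floor and a ceiling. The sketch below is purely illustrative. The function name, the per-worker throughput and the worker limits are all hypothetical, and nothing here describes how the Sense cluster is actually sized.

```python
import math

def workers_needed(records_per_minute, records_per_worker_minute,
                   min_workers=2, max_workers=200):
    """Return a worker count that keeps up with the incoming data rate.

    All parameters are hypothetical illustration values:
    - records_per_minute: current ingest rate into the data lake
    - records_per_worker_minute: what one worker can analyze per minute
    - min_workers / max_workers: floor for availability, ceiling for cost
    """
    if records_per_worker_minute <= 0:
        raise ValueError("records_per_worker_minute must be positive")
    needed = math.ceil(records_per_minute / records_per_worker_minute)
    # Clamp: never drop below the floor, never exceed the cost ceiling.
    return max(min_workers, min(max_workers, needed))

# Higher data rates ask for more workers, lower rates for fewer.
print(workers_needed(700_000, 50_000))   # busy period
print(workers_needed(10_000, 50_000))    # quiet period, floor applies
```

The ceiling is what keeps costs under control. The floor keeps a quiet period from scaling the cluster to nothing.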
Data Lake
Analysis Cluster
Machine Learning Algorithms run both “per customer” and “global analysis” and then interpret the combined analysis in a per customer context.
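One way to picture the combination of "per customer" and "global" analysis is the sketch below. It uses a plain request-count threshold as a stand-in for the Machine Learning algorithms, which the slides do not describe. The record shape, the threshold and all names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def analyze(records, per_customer_threshold=100):
    """records: iterable of (customer, client) pairs, one pair per API call.

    A simple count threshold stands in for the real (undisclosed) algorithms.
    """
    calls = defaultdict(Counter)
    for customer, client in records:
        calls[customer][client] += 1

    # Per-customer pass: clients that look abusive within one customer's traffic.
    local = {
        customer: {c for c, n in counts.items() if n >= per_customer_threshold}
        for customer, counts in calls.items()
    }

    # Global pass: any client flagged for any customer.
    global_flagged = set().union(*local.values())

    # Interpret the combined result in each customer's own context:
    # only report globally flagged clients that this customer has actually seen.
    report = {}
    for customer, counts in calls.items():
        seen = set(counts)
        report[customer] = {
            "local": local[customer],
            "seen_elsewhere": (global_flagged & seen) - local[customer],
        }
    return report

records = ([("a", "bot")] * 150 + [("a", "u1")] * 3
           + [("b", "bot")] * 2 + [("b", "u2")] * 5)
print(analyze(records))
```

In this example customer "b" has received only two calls from "bot", well below the threshold, yet "bot" still appears under "seen_elsewhere" because of what it did to customer "a". That is the network effect in miniature.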
The cluster scales to balance the needs for timeliness and cost.
Terabytes of data move between the cluster and the data lake each day.
Act
The results are presented on a dashboard. A Monitoring Engine will also generate actionable alerts when attacks are detected. The dashboard will show a drill-down view of every attack. Any action taken at the dashboard is stored back in the data lake.
Actions are then read and used to shape the traffic running through Apigee Edge. Other than enabling the Sense service, there is no footprint on the Edge API Proxy. This means that we can effectively separate the concerns around security and defense of the API from those around programming and delivering the API program.
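The separation of concerns can be sketched as follows: the shaping decision is made from stored actions before the request reaches the proxy logic, and the proxy handler itself contains no security code. This is not the Apigee Edge or Sense API. Every class, function and action name below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    allowed: bool
    reason: str

class TrafficShaper:
    """Holds actions read from the action store (hypothetical shape:
    client id -> "block" or "allow")."""

    def __init__(self, actions):
        self.actions = dict(actions)

    def decide(self, client):
        action = self.actions.get(client, "allow")
        if action == "block":
            return Decision(False, "blocked by dashboard action")
        return Decision(True, "no blocking action recorded")

def proxy_handler(request):
    # The API program itself: no security logic lives here.
    return f"200 OK for {request['path']}"

def serve(shaper, request):
    # Shaping happens outside the proxy's own code path.
    decision = shaper.decide(request["client"])
    if not decision.allowed:
        return "403 Forbidden"
    return proxy_handler(request)

shaper = TrafficShaper({"bot-1": "block"})
print(serve(shaper, {"client": "bot-1", "path": "/v1/items"}))
print(serve(shaper, {"client": "user-7", "path": "/v1/items"}))
```

Because the handler never inspects the action table, the API team can change and redeploy the proxy without touching the defense rules, and the reverse also holds.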
Data Lake Apigee Edge Dashboard and Monitoring
Traffic shaping on Apigee Edge is implemented outside the mainline API proxy development and deployment path in order to separate the concerns around security from those around delivering the API program.
Alerting will watch for you. Drill down so that you know who is hitting you and how. Act so that you can stop or manage them. Maintain history for audit purposes.