CLARUS - H2020-ICT-2014-1 - G.A. 644024
© CLARUS Consortium 1 / 48
A framework for user centred privacy and security in the cloud
The CLARUS Platform v1
Type (distribution level) Public
Contractual date of Delivery 31-12-2016
Actual date of delivery 28-02-2017
Deliverable number D5.4
Deliverable name The CLARUS Platform v1
Version V1.0
Number of pages 49
WP/Task related to the deliverable WP5/T5.3
WP/Task responsible MTI
Author(s) THALES
Partner(s) Contributing AKKA, OFFIS, URV, FCRB, MTI
Document ID CLARUS-D5.4- The CLARUS Platform V1-v1.0
Abstract This document describes the design and current status of the CLARUS Platform. It details the integration of the different CLARUS proxy modules into a single CLARUS proxy, and the instantiation of this proxy in the context of the project case studies: the geo publication and e-health case studies.
This document is intended to be read by an administrator of the CLARUS proxy who needs to configure it for a specific usage.
Disclaimer
CLARUS (644024) is a Research and Innovation Actions project funded by the EU Framework Programme for Research and Innovation Horizon 2020. This document contains information on CLARUS core activities, findings and outcomes. Any reference to content in this document should clearly indicate the authors, source, organization and date of publication. The content of this publication is the sole responsibility of the CLARUS consortium and cannot be considered to reflect the views of the European Commission.
Table of Contents
1 INTRODUCTION
1.1 SCOPE OF THE DOCUMENT
1.2 REVISION HISTORY
1.3 NOTATIONS, ABBREVIATIONS AND ACRONYMS
2 INTEGRATION OF CLARUS MODULES TO BUILD THE CLARUS PROXY
2.1 INTEGRATION OPTIONS
2.1.1 Micro-services paradigm
2.1.2 Virtualized modules
2.1.3 CLARUS modules as libraries or plugins
2.2 HYBRID INTEGRATION FOR PERFORMANCE OPTIMIZATION
2.2.1 CLARUS modules integration
2.2.2 Session management
2.2.3 Generic sequence diagrams for the CLARUS proxy
2.2.4 Extension of the CLARUS proxy: Plugin-ability
2.2.5 Generic vs. Specific CLARUS proxy
3 INTEGRATION OF CLARUS PROXY IN THE CONTEXT OF CLARUS CASE STUDIES
3.1 GEO PUBLICATION CASE STUDY
3.1.1 Description
3.1.2 CLARUS proxy integration
3.2 EHEALTH CASE STUDY
3.2.1 Description
3.2.2 CLARUS proxy integration
4 CONCLUSIONS
APPENDIX A. THE CLARUS PLATFORM
A.1 PLATFORM DESIGN
A.2 USER MANUAL
1 Introduction
1.1 Scope of the document
This document specifies the integrated CLARUS platform solution. It presents several integration options for building the CLARUS proxy and details the one adopted in the project. It also presents the instantiation of this proxy in the two project case studies, i.e. geo publication and e-health, to demonstrate the genericity of the CLARUS solution.
This document is intended for administrators who want to integrate the CLARUS solution within their infrastructure. The first part of the document describes the mechanisms and tools provided by CLARUS, and how the CLARUS proxy modules are integrated to build a single framework. The second part describes two successful integrations of the CLARUS proxy in real-life industrial case studies.
1.2 Revision History
Version | Date | Author | Description
0.1 | 2016/06/09 | THALES | Document initialization
0.2 | 2016/06/13 | THALES | Draft part of Environment and connection parts
0.3 | 2016/06/15 | THALES | Draft of the GIT service
0.4 | 2016/06/15 | THALES | Draft of the VM Provisioning part and the release process
0.6 | 2016/11/23 | MTI | Add explanations in Chapter 4
0.7 | 2016/11/23 | THALES | Add architecture scheme and explanation about the “Public CLOUD Services” zone
0.8 | 2016/11/28 | FCRB | User to CLARUS interface components of the eHealth usecase
0.8 | 2016/11/30 | THALES | Re-organization of the ToC in order to place the Thales platform description in Annex
0.10 | 2016/12/09 | AKKA | Description of the Geolocalisation integration
0.10-bis | 2016/12/15 | FCRB & OFFIS | Description of the ehealth integration. Session management and authentication solution interfacing
0.11 | 2016/12/18 | THALES | Contribution aggregated and re-organised. Scope described. Document header update
0.12 | 10/01/2017 | OFFIS | Specification of the authentication API
0.13 | 11/01/2017 | THALES & MTI | Section 2.1 updated and workflow in section 2.2.5 added
0.14 | 27/01/2017 | ALL | Final contribution gathering and formatting.
0.15 | 01/02/2017 | AKKA | Comments on the documents
0.16 | 02/02/2017 | OFFIS | Update of the contribution to the authentication API
0.17 | 21/02/2017 | MTI | Contribution to sections 2.2.1 and 2.2.5
0.18 | 22/02/2017 | MTI | Consolidation of the document
 | 27/02/2017 | KHUL | Review 1 of the document
 | 27/02/2017 | OFFIS | Review 2 of the document
1.0 | 28/02/2017 | MTI/THALES | Finalization of the document
1.3 Notations, abbreviations and acronyms
Acronym Definition
ARG Attack Response Graph
CSP Cloud Service Provider
DoS Denial of Service
DDoS Distributed Denial of Service
FaaS Function as a service
HIDS Host-based Intrusion Detection System
IDS Intrusion Detection System
JAR Java Archive
LDAP Lightweight Directory Access Protocol
MMT Montimage Monitoring Tool
NIDS Network-based Intrusion Detection System
NNID Neural Network Intrusion Detection
OGC Open Geospatial Consortium
SaaS Software as a service
SOA Service Oriented Architecture
SUO System Under Observation
TCP Transmission Control Protocol
VM Virtual Machine
2 Integration of CLARUS modules to build the CLARUS proxy
This section describes the available options for deploying the CLARUS modules in order to build the CLARUS proxy solution. It also details the interfaces of the extendable CLARUS modules (based on a plugin architecture), to clarify how CLARUS can be used and adapted to an existing environment or context by supporting a new network communication protocol, adding a new data protection mechanism, or interfacing with a specific authentication solution. Generic workflows of the data managed by the proxy are also presented in this section.
2.1 Integration options
2.1.1 Micro-services paradigm
The microservices paradigm is a specialisation of the implementation approach for service oriented architectures (SOA). It is generally used to build software systems that are flexible and independently deployable. Microservices are small, granular processes that communicate with each other over a network in order to fulfil a specific goal. The microservices approach is one of the most popular realisations of service oriented architecture that followed the introduction of DevOps for building continuously deployed systems.
In microservices architecture, services generally have small granularity and rely on lightweight protocols.
One of the most important properties of microservices is their independence in the sense that a
microservice should be independently deployable and does not require an external process to be run
properly.
The main benefit of distributing different goals of the system into different smaller services is that it
enhances the cohesion and decreases the coupling. This makes it easier to dynamically modify and/or
add functions to the system at any time. It also allows the architecture of an individual service to
emerge through continuous refactoring, and hence reduces the need for a big up-front design and
allows for releasing software early and continuously.
In the context of CLARUS, different proxy modules, such as data operations, can be specified as microservices. This brings more flexibility to the proxy functionalities, since the proxy can orchestrate different operations and combine them according to the end user domain requirements. Nevertheless, since we decided in CLARUS that only one obfuscation operation (encryption, anonymization or splitting) will be used for a specific domain, we think that this option provides more flexibility than required and can perform worse than other solutions (e.g., direct method calls).
2.1.2 Virtualized modules
Each module in the CLARUS proxy can run as a single virtual machine or container to provide the functionality that it is built for. This deployment strategy can be considered as belonging to the “Function as a Service” (FaaS) category of cloud computing services, which provides a platform allowing customers to develop, run, and manage application functionalities without the complexity of building and maintaining the infrastructure typically associated with developing and launching an application. Note that this option can be built upon the microservices paradigm and can also regroup several microservices into one functionality offered as a virtualized function. In the context of CLARUS, several functionalities, for example monitoring and attack tolerance, can be built as a separate VM that can be launched on the same host as the CLARUS proxy or remotely on another server.
2.1.3 CLARUS modules as libraries or plugins
Several modules in CLARUS can be provided as standard libraries. This is the case, for instance, of the data protection modules, which can be called by the protocol module to perform specific obfuscation tasks on predefined sensitive data. These libraries can implement a common interface and can then be provided as plugins that are loaded at the initialisation of the proxy or at runtime when a specific need is detected. This option is very interesting in the context of CLARUS because it avoids network-based communications between the proxy modules, which can introduce delays and thus degrade performance.
2.2 Hybrid integration for performance optimization
2.2.1 CLARUS modules integration
The main objective of the CLARUS proxy is to secure sensitive data to be stored in the cloud and to process them efficiently. To reach this objective, it is very important to design and build optimal modules and to integrate them in a way that avoids delays and bottlenecks.
For this reason, our main idea was to avoid network-based communications between modules as much as possible. We opted for communication based on Java method calls, since all the proxy modules except the attack tolerance module are implemented in Java.
Following this reasoning, the CLARUS proxy has been implemented as a main process with a plugin architecture, as shown in Figure 1. This process loads different plugins, compiled as Java JARs and respecting the APIs presented in section 2.2.3. These plugins are of three types:
• Protocol plugins: The communication between the client application and the CSP (which stores and manipulates sensitive data) relies natively on a dedicated protocol or list of protocols. These protocols should be added as plugins to manage the client application requests and the CSP responses. The data communicated through the proxy will be modified/obfuscated but will always rely on the native protocols supported by the client application and the CSP. This ensures the transparency of the proxy, mainly from the end-user point of view. The CLARUS proxy natively supports the TCP protocol, which facilitates the integration of protocols on top of it.
• Data protection plugins: The CLARUS proxy proposes different protection mechanisms for sensitive data. These mechanisms generally depend on the client application requirements in terms of data security and privacy. Different plugins are already implemented or under implementation in the context of the CLARUS project: simple encryption, searchable encryption, homomorphic encryption, anonymization, splitting, and verifiable keyword search.
• Authentication plugin: In the current state of the implementation, only LDAP (Lightweight Directory Access Protocol) is implemented as a plugin. This is because it is one of the most common methods to authenticate users in a company network and it is easy to connect to this authentication and identification service. Other authentication methods can be added in future developments.
[Figure 1: the CLARUS proxy main process with its protocols loader, authentication mechanism loader and data protections loader, each loading the corresponding plugins (e.g., data encryption), together with the Security Policy Manager, Access Control Policy Management, Inter-Proxy communication, Monitoring and Administration components.]
Figure 1. CLARUS modules integration
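As an illustration of the plugin architecture in Figure 1, the following minimal Java sketch shows how a main process could keep track of plugins by type. The names PluginType, ProxyPlugin, SimplePlugin and PluginRegistry are hypothetical and are not the CLARUS APIs. Using java.util.ServiceLoader to discover plugins inside JARs is likewise an assumption made for the example, since the text only states that plugins are compiled as Java JARs.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

/** The three plugin families described in the text. */
enum PluginType { PROTOCOL, DATA_PROTECTION, AUTHENTICATION }

/** Hypothetical minimal plugin contract (not the CLARUS API). */
interface ProxyPlugin {
    PluginType type();
    String name();
}

/** Trivial plugin used only to exercise the registry. */
final class SimplePlugin implements ProxyPlugin {
    private final PluginType type;
    private final String name;

    SimplePlugin(PluginType type, String name) {
        this.type = type;
        this.name = name;
    }

    @Override public PluginType type() { return type; }
    @Override public String name() { return name; }
}

/** Hypothetical registry held by the proxy main process. */
final class PluginRegistry {
    private final Map<PluginType, List<ProxyPlugin>> byType = new EnumMap<>(PluginType.class);

    /** Registers one plugin under its declared type. */
    void register(ProxyPlugin plugin) {
        byType.computeIfAbsent(plugin.type(), t -> new ArrayList<>()).add(plugin);
    }

    /** Assumption: plugins are declared as service providers in the JARs visible to the loader. */
    void discover(ClassLoader loader) {
        for (ProxyPlugin plugin : ServiceLoader.load(ProxyPlugin.class, loader)) {
            register(plugin);
        }
    }

    /** Returns an immutable snapshot of the plugins of the given type. */
    List<ProxyPlugin> pluginsOf(PluginType type) {
        return List.copyOf(byType.getOrDefault(type, List.of()));
    }
}
```

Because the plugins are reached through plain Java method calls on the registry, no network hop is involved between the main process and a plugin, which is the performance argument made in this section.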
The CLARUS proxy main process can also handle inter-proxy communication. In this context, we consider that the communication between two proxies is secured, for instance by using a VPN connection. Two cases are to be handled:
• The proxy is deployed between the client application and a remote proxy. In this case, this proxy relays/forwards the communication without any modification. Indeed, all the obfuscation/de-obfuscation tasks are delegated to the remote proxy that owns the data.
• The proxy is deployed between a first proxy and the CSP. In this case, the first proxy is considered as the client application of the main proxy, which treats the first proxy as a technical user.
One other process must run on the same host (physical or virtual) as the CLARUS proxy process: the administration process, implemented as a command line utility named clarus-adm. It allows:
• configuring the repository used by the CLARUS proxy for the access rights management
• configuring the user authentication module
• configuring the Cloud Service Providers
  o registering a Cloud Service Provider (CSP)
  o deleting a CSP
  o updating a CSP configuration
  o enabling or disabling a CSP
• configuring the failover mode
• configuring the deployment of modules
  o registering a new module
  o deleting a module
  o updating a module
Two other separate processes should preferably be deployed on the same host as the CLARUS proxy process, although co-location is not mandatory. These modules are:
• The policy manager module, which defines the CLARUS security policies, i.e. what to protect in the outsourced datasets and how to protect it. The output of this program is a JSON file needed by the CLARUS main process to configure itself.
• The access control policy manager, which manages the access rights of the different authenticated proxy users. This manager defines the access rights of the users on the storage/processing services protected by CLARUS. It also defines the permissions of the users on the outsourced datasets.
Finally, the monitoring module is defined as a standalone module that can run on the proxy host as a separate process, in a virtual machine on this host, or on any standalone (physical or virtual) host. In the context of the CLARUS project, we propose to deploy this monitoring service as a pre-configured VM to facilitate its usage. The monitoring service has a graphical user interface that displays different reports to the proxy administrator and detects potential security issues in near real time. Figure 2 presents an example of an alert notified to the administrator after the detection of a side channel attack.
Figure 2. MMT GUI showing the detected attack trace
2.2.2 Session management
The CLARUS proxy should be transparent to the user. Because of this, user authentication is done implicitly. The CLARUS proxy will, for example, identify or authenticate a user based on the protocol used to establish a connection to the proxy and thus to the underlying application or service. However, such an authentication depends on several factors and especially on the protocol used. Some protocols may not allow authentication with each command/request sent by an application. To work around this, a user session management mechanism is introduced in the proxy to keep a session alive as long as an authenticated user continues using the CLARUS proxy with the same application and protocol.
This session management may for example rely on network sessions (TCP), SQL sessions or on complex sessions like HTTP sessions. In this case, we rely on a single sign-on (SSO) mechanism to ensure authentication. In general, the session management depends entirely on the protocol that is used to connect to the CLARUS proxy and, as a consequence, the session management will initially be handled by the user protocol module plugin. The details of these modules are described in deliverable D5.2. The session management in the CLARUS proxy can be roughly described as follows:
1. When an initial incoming network connection is made to the CLARUS proxy, the user protocol module tries once to authenticate or identify the user. Based on this information, a user session is created.
2. Subsequent network packets/commands do not trigger another authentication process, as long as this user session is valid and the connection has not been suspended.
3. When the connection has ended, the user session is destroyed. Another attempt to connect restarts the process, hence a new authentication is performed and a new user session is created.
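The three steps above can be sketched in Java as follows. This is only an illustrative sketch: UserSession and SessionManager are hypothetical names, the connection is identified by a plain string, and the authenticator function stands in for whatever authentication plugin (e.g., LDAP) is configured.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical session record: one per authenticated connection. */
final class UserSession {
    final String connectionId;
    final String userId;

    UserSession(String connectionId, String userId) {
        this.connectionId = connectionId;
        this.userId = userId;
    }
}

/** Hypothetical session manager following steps 1 to 3. */
final class SessionManager {
    private final Map<String, UserSession> sessions = new ConcurrentHashMap<>();
    // Stand-in for the authentication plugin: maps credentials to a user id, or empty on failure.
    private final Function<String, Optional<String>> authenticator;

    SessionManager(Function<String, Optional<String>> authenticator) {
        this.authenticator = authenticator;
    }

    /** Step 1: authenticate once when the connection is opened and create the session. */
    Optional<UserSession> onConnect(String connectionId, String credentials) {
        Optional<UserSession> session = authenticator.apply(credentials)
                .map(userId -> new UserSession(connectionId, userId));
        session.ifPresent(s -> sessions.put(connectionId, s));
        return session;
    }

    /** Step 2: later packets reuse the session without re-authenticating. */
    Optional<UserSession> onPacket(String connectionId) {
        return Optional.ofNullable(sessions.get(connectionId));
    }

    /** Step 3: the session is destroyed when the connection ends. */
    void onDisconnect(String connectionId) {
        sessions.remove(connectionId);
    }
}
```

In this sketch the authenticator is invoked only in onConnect, so a protocol that cannot carry credentials in every request still works as long as the connection identifier stays stable.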
2.2.3 Generic sequence diagrams for the CLARUS proxy
This section presents the main data workflow for the integrated CLARUS proxy. It shows the interaction
between different modules to provide the desired obfuscation service of sensitive data.
2.2.3.1 Generic sequence diagrams in the basic case of 1 CLARUS proxy
The CLARUS solution provides a dedicated proxy for the end user towards the CSP, as shown in Figure 3.
The CLARUS proxy may be deployed within the client computer, in a server within the user’s domain, or
in any other location trusted by the user. It allows managing requests from one or many users and
dealing with one or many accounts in one or more CSPs.
Figure 3. CLARUS solution using a unique proxy
The generic sequence diagram for data storage is described in the following Figure 4. After the
initialisation of the CLARUS proxy and loading different plugins and reading the security and access
policies, the proxy can receive requests from the application user.
If the request is related to data storage, the proxy will extract the user identity to check if he/she is
authorised to store data. In the case of a positive answer, the data will be extracted from the packet
depending on the protocol structure (the extraction methodology is implemented in the protocol
plugin). Besides, the sensitive data specified in the security policy will be also extracted to be obfuscated
using the dedicated data protection mechanism.
The last step is to use the same protocol to send the obfuscated data to the cloud. For specific data protection mechanisms, several steps are needed to properly store the data in the cloud. This is the case, for example, of the verifiable keyword search mechanism.
[Figure 4: sequence diagram between the User, Proxy Main Process, Proxy Protocol Plugin, Authentication, Data operation and CSP. Steps: init proxy, load security policy, load access policy; application request (data storage request); extract user; authenticate; extract data; obfuscate sensitive data; secured CSP request to store the secured data; results returned to the user.]
Figure 4. Data storage sequence diagram (without buffering)
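The storage flow of Figure 4 can be sketched as a single method, shown below. This is a simplified illustration under several assumptions: StorageFlow is a hypothetical name, a record is a flat map of attribute names to values, the access policy is reduced to a set of authorised users, and the protection function is a stand-in for a data protection plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical storage flow: authorise, extract, obfuscate sensitive fields, forward. */
final class StorageFlow {
    private final Set<String> authorisedUsers;      // reduced access policy
    private final Set<String> sensitiveAttributes;  // reduced security policy
    private final UnaryOperator<String> protection; // stand-in for a data protection plugin

    StorageFlow(Set<String> authorisedUsers, Set<String> sensitiveAttributes,
                UnaryOperator<String> protection) {
        this.authorisedUsers = authorisedUsers;
        this.sensitiveAttributes = sensitiveAttributes;
        this.protection = protection;
    }

    /** Returns the record as it would be sent to the CSP, or throws if the user may not store data. */
    Map<String, String> prepareForCsp(String user, Map<String, String> record) {
        if (!authorisedUsers.contains(user)) {
            throw new SecurityException("user not authorised to store data: " + user);
        }
        Map<String, String> secured = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            boolean sensitive = sensitiveAttributes.contains(field.getKey());
            // Only the attributes named in the security policy are obfuscated.
            secured.put(field.getKey(),
                    sensitive ? protection.apply(field.getValue()) : field.getValue());
        }
        return secured;
    }
}
```

Non-sensitive attributes pass through unchanged, so the request sent to the CSP keeps the shape expected by the native protocol.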
Depending on the data protection mechanism, it is sometimes not possible to perform the data obfuscation on partial data. In this case, we need to use a “buffering mode” to assemble the whole data to obfuscate before processing it. This is typically the case for the anonymization mechanism. The alternative sequence diagram for this case is presented in Figure 5.
[Figure 5: sequence diagram with the same participants as Figure 4 (User, Proxy Main Process, Proxy Protocol Plugin, Authentication, Data operation, CSP). Application requests 1 to n are handled iteratively (data storage request, extract data) and the complete data is built before the secured data is stored at the CSP.]
Figure 5. Data storage sequence diagram with buffering
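A minimal sketch of the buffering mode is given below. BufferingStage is a hypothetical name, and the way the proxy learns that a chunk is the last one (the lastChunk flag) is an assumption. The toy whole-dataset operation, which replaces every value by the rounded mean, is only a placeholder for a mechanism such as anonymization that needs the complete data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

/** Hypothetical buffering stage: accumulates chunks, then applies a whole-dataset operation. */
final class BufferingStage {
    private final List<Integer> buffer = new ArrayList<>();
    private final UnaryOperator<List<Integer>> wholeDatasetOperation;

    BufferingStage(UnaryOperator<List<Integer>> wholeDatasetOperation) {
        this.wholeDatasetOperation = wholeDatasetOperation;
    }

    /** Buffers the chunk; produces a result only once the last chunk has arrived. */
    Optional<List<Integer>> accept(List<Integer> chunk, boolean lastChunk) {
        buffer.addAll(chunk);
        if (!lastChunk) {
            return Optional.empty();
        }
        List<Integer> result = wholeDatasetOperation.apply(List.copyOf(buffer));
        buffer.clear();
        return Optional.of(result);
    }

    /** Toy operation that needs all values at once: replace each value by the rounded mean. */
    static List<Integer> replaceByMean(List<Integer> values) {
        double average = values.stream().mapToInt(Integer::intValue).average().orElse(0);
        int mean = (int) Math.round(average);
        return Collections.nCopies(values.size(), mean);
    }
}
```

Nothing is forwarded to the CSP until accept returns a non-empty result, which mirrors the iterative part of Figure 5.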
In the same way, when a retrieval request is intercepted by the proxy, buffering may be needed before processing the obfuscated data to build the clear data in the trusted zone. The sequence diagram in Figure 6 presents the different steps of data retrieval. From the data request, the user is identified and authenticated. If he/she is authorized to retrieve data, the request is forwarded to the CSP. This retrieval request can be modified depending on the data protection applied to the original data. The obfuscated data is then sent from the CSP to the proxy, which builds the clear data, either at runtime or in buffering mode, and sends it to the application user using the same communication protocol.
[Figure 6: sequence diagram between the User, Proxy Main Process, Proxy Protocol Plugin, Authentication, Data operation and CSP. Steps: init proxy, load security policy, load access policy; application request (data retrieval request); extract user; authenticate; extract retrieval request; retrieval request sent to the CSP; obfuscated data packets 1 to n received; build complete obfuscated data; de-obfuscate data; build answer with clear data; reply.]
Figure 6. Data retrieval sequence diagram
Other alternative sequence diagrams are possible, mainly if specific processing is mandatory at the CSP side or if the authentication is done on each packet. In the latter case, the session management is not performed by the proxy since the user information is always present in the processed packets.
2.2.3.2 Generic sequence diagram in the case of 2 CLARUS proxies
In the multiple CLARUS proxy use case, there are at least two different CLARUS proxies that work collaboratively on the same data. In the following example, CLARUS Proxy 2 owns the data and stores it in its CSP, while CLARUS Proxy 1 belongs to a partner that is granted working access to this data (cf. Figure 7).
Figure 7. CLARUS solution using multiple proxies
Before data access can be achieved, the two CLARUS proxies must establish a secure connection. This is
done via VPN. All subsequent communication is performed over this secure VPN connection.
Figure 8. Sequence diagram for collaboration between 2 CLARUS proxies
Since data access inside a CLARUS proxy is tied to individual users, an external CLARUS proxy (CLARUS Proxy 1 in this example) must also provide user credentials. However, the different “real” users that connect to CLARUS Proxy 1 are not known to Proxy 2 (and they should not be known), so a technical user must be used for the authentication process. This technical user is used for all requests that come from CLARUS Proxy 1. CLARUS Proxy 2 registers this technical user for the subsequent requests from Proxy 1. This is especially important because all data requests are routed through the User Protocol Module, the data flow being the same as from a local client. Hence, the User Protocol Module can inspect the communication from Proxy 1 and take the appropriate actions, i.e. route the requests to the specific data modules. The requests coming from Proxy 1 are in fact also requests from the users’ applications behind Proxy 1. But instead of processing these requests internally, Proxy 1 forwards them to the external Proxy 2. Proxy 2 processes the requests in the same way as explained in section 2.2.3.1.
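The technical-user mechanism can be illustrated with the following Java sketch. All names (ProxyRequest, RequestHandler, OwningProxy, ForwardingProxy) are hypothetical, the VPN link between the proxies is not modelled, and authentication is reduced to a lookup in a set of registered users.

```java
import java.util.Set;

/** Hypothetical request: the authenticated user plus an opaque payload. */
final class ProxyRequest {
    final String user;
    final String payload;

    ProxyRequest(String user, String payload) {
        this.user = user;
        this.payload = payload;
    }
}

interface RequestHandler {
    String handle(ProxyRequest request);
}

/** Plays the role of Proxy 2: owns the data and only knows its registered users. */
final class OwningProxy implements RequestHandler {
    private final Set<String> registeredUsers;

    OwningProxy(Set<String> registeredUsers) {
        this.registeredUsers = registeredUsers;
    }

    @Override
    public String handle(ProxyRequest request) {
        if (!registeredUsers.contains(request.user)) {
            throw new SecurityException("unknown user: " + request.user);
        }
        // Real processing (obfuscation, CSP access) is omitted in this sketch.
        return "processed[" + request.payload + "] for " + request.user;
    }
}

/** Plays the role of Proxy 1: forwards every request under a single technical user. */
final class ForwardingProxy implements RequestHandler {
    private final String technicalUser;
    private final RequestHandler remoteProxy;

    ForwardingProxy(String technicalUser, RequestHandler remoteProxy) {
        this.technicalUser = technicalUser;
        this.remoteProxy = remoteProxy;
    }

    @Override
    public String handle(ProxyRequest request) {
        // The real user identity stays local; only the technical user reaches Proxy 2.
        return remoteProxy.handle(new ProxyRequest(technicalUser, request.payload));
    }
}
```

From the point of view of OwningProxy, the forwarding proxy is just another client, which is why the same request handling path can be reused.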
2.2.3.3 Generic sequence diagram for proxy monitoring and attack tolerance
[Figure 9: sequence diagram between the Attack Tolerant Framework, the MMT-Network, MMT-System and MMT-Application monitoring agents, the counter-measure component and the GUI. Labels: Configure(list of metrics); continuous monitoring with network metrics and events, system metrics and events, and application events; Analyse Security; NotifyAlert(); Display(); React() and Result; NotifyCounterMeasure(); Display().]
Figure 9. Sequence diagram for CLARUS monitoring and attack tolerance
The monitoring solution in CLARUS relies on the MMT(1) monitoring tool, which has three agents:
1- MMT-Network agent: monitors the network traffic, measures network related metrics and detects attacks and abnormal behaviours.
2- MMT-Application agent: logs the proxy internals, collects application related metrics and detects abnormal behaviours.
3- MMT-System agent: inspects the usage of proxy host resources in terms of memory and CPU to detect any abnormal usage.
(1) MMT stands for Montimage Monitoring Tool. More details are available at montimage.com
As shown in Figure 9, these three agents send security related metrics and events to the attack tolerant framework, which detects potential attacks or misbehaviours based on attack signatures and behavioural analysis. Alerts are then sent to the monitoring tool GUI to inform the proxy administrator, and a set of counter-measures is triggered automatically or semi-automatically. The administrator is also informed about these corrective actions.
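The alert and counter-measure loop described above can be sketched as follows. This is a deliberately reduced illustration and not the MMT implementation: MetricEvent and AttackTolerantFramework are hypothetical names, and a simple per-metric threshold replaces the signature-based and behavioural analysis mentioned in the text.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical metric event sent by a monitoring agent. */
final class MetricEvent {
    final String agent;
    final String metric;
    final double value;

    MetricEvent(String agent, String metric, double value) {
        this.agent = agent;
        this.metric = metric;
        this.value = value;
    }
}

/** Hypothetical framework: threshold check, alert to the GUI, then counter-measure. */
final class AttackTolerantFramework {
    private final Map<String, Double> thresholds;       // metric name -> maximum accepted value
    private final Consumer<String> gui;                 // stand-in for the monitoring GUI
    private final Consumer<MetricEvent> counterMeasure; // stand-in for the reaction component

    AttackTolerantFramework(Map<String, Double> thresholds, Consumer<String> gui,
                            Consumer<MetricEvent> counterMeasure) {
        this.thresholds = thresholds;
        this.gui = gui;
        this.counterMeasure = counterMeasure;
    }

    /** Returns true when the event raised an alert and triggered a counter-measure. */
    boolean onEvent(MetricEvent event) {
        Double limit = thresholds.get(event.metric);
        if (limit == null || event.value <= limit) {
            return false;
        }
        gui.accept("ALERT " + event.agent + " " + event.metric + "=" + event.value);
        counterMeasure.accept(event);
        // The administrator is also informed about the corrective action.
        gui.accept("COUNTER-MEASURE applied for " + event.metric);
        return true;
    }
}
```

A semi-automatic mode could be modelled by having the counter-measure callback wait for administrator confirmation, which is left out of this sketch.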
2.2.4 Extension of the CLARUS proxy: Plugin-ability
2.2.4.1 Adding a new protocol
The communication between the client application and the CSP can rely on different protocols depending on the nature of the provided services. In the context of the CLARUS project, the HTTP, PostgreSQL and WFS protocols are studied. But other protocols may be needed to store, retrieve and search sensitive data deployed in an untrusted cloud.
To support more protocols, the CLARUS proxy has been designed with a plugin architecture that allows adding new protocol plugins delivered as JAR files. The ISO protocol layers must be respected, and the packet reconstruction mechanisms needed to access the sensitive data, generally transmitted in the packet payloads, are to be implemented in these plugins.
The protocol plugin should respect the API designed in section 4.3 of deliverable D5.1. An example for the PostgreSQL protocol has been implemented and is presented in section 3.2 of deliverable D5.2. In general, the protocol plugin manages functionalities such as:
• Processing requests and responses (buffering mode and streaming mode)
• Processing the computation requests (orchestration mode)
• Management of computation commands
• Notification of data protection operations and the results
• User identification
2.2.4.2 Adding a new data operation
In the context of WP3, different data operation modules have been designed and implemented in order to protect sensitive data, e.g., data encryption, data splitting, data anonymization, etc. Details about these mechanisms are provided in deliverable D3.3.
The CLARUS proxy has been designed to support new data protection mechanisms by relying on a plugin architecture. The Data operations public API is a common interface for all the data operation modules, both those already implemented or under development within the CLARUS project and future ones. It plays a key role in the plugin architecture of the CLARUS proxy. This API is still being worked on and is subject to change. Its latest version was presented in section 4.1 of deliverable D5.2 and is updated in this section.
Table 1. Data operation API
eu.clarussecure.dataoperations Interface DataOperation
Version 1.0
Type Method and description
void delete(java.lang.String[] attributeNames, java.lang.String[] criteria)
Outbound DELETE Operation, deletes data specified by criteria.
java.lang.String[][] get(Promise promise, java.lang.String[][][] contents)
Inbound GET operation (RESPONSE), reconstructs data received by CSP.
Promise get(java.lang.String[] attributeNames, java.lang.String[] criteria, Operation operation)
Outbound GET operation.
java.lang.String[][][] post(java.lang.String[] attributeNames, java.lang.String[][] contents)
Outbound POST Operation, modifies data according to security policy.
java.lang.String[][] put(java.lang.String[] attributeNames, java.lang.String[] criteria, java.lang.String[][] contents)
Outbound PUT Operation, modifies data specified by criteria, according to security policy.
The Data operation API provides methods to create, retrieve, update and delete (CRUD) protected data and
follows REST naming conventions. The POST method takes as parameters the attribute names (as in the
original dataset schema) and the contents of the dataset as a string table, whose columns must be in the
same order as the attribute names. It returns an array of protected datasets. The PUT method works in
the same way, except that it takes an additional criteria parameter to filter the records to be updated.
The GET operation is divided into two methods. The outbound get method takes as inputs the attribute
names, the search criteria and an operation to be performed on the returned data (such as a sum, an
average or a count), and returns an object that implements the Promise interface. This object contains
book-keeping information, such as a unique ID, as well as the details of the GET call. The Promise
object is then passed to the inbound get method, together with the data returned by the CSP. The inbound
method unprotects or reconstructs the data (when applicable), applies the requested operation to it and
returns the unprotected data. The DELETE method takes as inputs the attribute names and the deletion
criteria.
Instantiating a new Data operation module requires the security policy in XML format.
Table 2. Promise Interface
eu.clarussecure.dataoperations Interface Promise
Type Method and description
java.lang.String[] getAttributeNames()
Attribute names involved in the call.
java.lang.String[][] getCall()
Modified call, to operate with protected data.
int getId()
Internal ID of the call.
Operation getOperation()
Operation applied to the returned data.
The Promise interface has to be implemented by the developer of a new data operation module and has to
include at least the following methods. getId returns the unique identifier of the Promise object.
getAttributeNames returns the attribute names included in the GET call. getCall returns the modified
call to pass to the protocol module. getOperation returns the operation to be executed on the result of
the call.
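The two interfaces of Tables 1 and 2 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the CLARUS implementation: the Operation type is not detailed in this section, so it is modelled here as a hypothetical enum; IdentityModule is a made-up pass-through module that applies no protection and targets a single CSP; the attribute names and values in main are sample data.

```java
import java.util.Arrays;

class DataOperationSketch {

    // Hypothetical stand-in: the Operation type is not detailed in this section.
    enum Operation { NONE, COUNT }

    // Mirrors Table 2.
    interface Promise {
        int getId();
        String[] getAttributeNames();
        String[][] getCall();
        Operation getOperation();
    }

    // Mirrors Table 1.
    interface DataOperation {
        String[][][] post(String[] attributeNames, String[][] contents);
        String[][] put(String[] attributeNames, String[] criteria, String[][] contents);
        Promise get(String[] attributeNames, String[] criteria, Operation operation);
        String[][] get(Promise promise, String[][][] contents);
        void delete(String[] attributeNames, String[] criteria);
    }

    // Illustrative pass-through module: one CSP, no protection applied.
    static class IdentityModule implements DataOperation {
        private int nextId = 0;

        public String[][][] post(String[] attributeNames, String[][] contents) {
            return new String[][][] { contents };   // one dataset per CSP
        }

        public String[][] put(String[] attributeNames, String[] criteria, String[][] contents) {
            return contents;
        }

        public Promise get(String[] attributeNames, String[] criteria, Operation operation) {
            final int id = nextId++;
            // A real module would rewrite names and criteria to match the protected data.
            final String[][] call = { attributeNames, criteria };
            return new Promise() {
                public int getId() { return id; }
                public String[] getAttributeNames() { return attributeNames; }
                public String[][] getCall() { return call; }
                public Operation getOperation() { return operation; }
            };
        }

        public String[][] get(Promise promise, String[][][] contents) {
            String[][] rows = contents[0];          // nothing to reconstruct in this sketch
            if (promise.getOperation() == Operation.COUNT) {
                return new String[][] { { String.valueOf(rows.length) } };
            }
            return rows;
        }

        public void delete(String[] attributeNames, String[] criteria) {
            // Nothing to do: this sketch keeps no state about stored data.
        }
    }

    public static void main(String[] args) {
        DataOperation module = new IdentityModule();
        String[] names = { "code_bss", "altitude" };              // sample attribute names
        String[][] rows = { { "id-1", "230.00" }, { "id-2", "151.00" } };

        String[][][] stored = module.post(names, rows);            // outbound POST
        Promise promise = module.get(names, new String[0], Operation.COUNT); // outbound GET
        String[][] result = module.get(promise, stored);           // inbound GET
        System.out.println(Arrays.deepToString(result));           // prints [[2]]
    }
}
```

In this sketch the data "returned by the CSP" is simply the output of post. In a real deployment the protocol module would forward promise.getCall() to the CSP and hand the response back to the inbound get method.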
2.2.4.3 Interfaces to other authentication solutions
Authentication to the CLARUS proxy relies on existing authentication systems such as LDAP. If no such
system is available, a very simple user-based authentication system is provided with the CLARUS proxy.
It manages and authenticates users based on usernames and passwords. However, using the company’s own
system for authentication and identification is always recommended, since it already contains all users
and incurs no additional management. CLARUS therefore includes a plugin for the well-known LDAP system,
and plugins for other authentication systems can easily be added. The authentication system and its
modules are described in deliverable D5.2: version 1 gives a first view of these modules, and the next
deliverable, which will contain version 2 of the CLARUS modules, will add more details on the
authentication modules.
Any authentication plugin (for example for LDAP) must implement the following interface:
Table 3: CLARUS Authentication
Module Name: CLARUS Authentication
Owner: OFFIS
Input Parameters:
- Authenticate: Username and Password
- Identify: Username
Output Parameters:
- Authenticate: True, if a user exists with matching username and password, else false
- Identify: True, if a user with the username exists
Functionality
Interdependencies with other modules: CLARUS User registration and CLARUS Protocol Module
Implementation Status: Implemented
Platform: Java library (JAR)
Testability and Acceptability Criteria: Function against LDAP server
Delivery date: V1 Delivery - Intermediate evaluation (M24)
Delivery date: V2 Delivery (M28), Case study based evaluation (M34), Final delivery and demonstration (M36)
Authentication interface ClarusAccess:
Table 4: CLARUSAccess Interface
eu.clarussecure.proxy.access Interface ClarusAccess
Type Method and description
boolean authenticate(org.apache.http.auth.UsernamePasswordCredentials cr)
Authenticates a user by the given username and password.
boolean authenticate(java.lang.String username, java.lang.String password)
Authenticates a user by the given username and password.
boolean identify(java.lang.String username)
Identifies a user by a given username.
Hence, it is sufficient for the authentication service to report whether a user exists (identification)
or whether a username and password combination is correct (authentication).
The authentication library is deployed as a JAR file that includes the interface and the necessary
helper classes. Implementations of the interface (currently only LDAP) are added as plugins and are
found automatically when a corresponding JAR file containing a plugin is placed in a specified
directory.
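The contract of Table 4 can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the overload taking org.apache.http.auth.UsernamePasswordCredentials is left out so that the example has no external dependency, and InMemoryAccess is a hypothetical plugin written for this example, loosely modelled on the simple username and password system mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

class ClarusAccessSketch {

    // Two of the three methods listed in Table 4.
    interface ClarusAccess {
        boolean authenticate(String username, String password);
        boolean identify(String username);
    }

    // Hypothetical in-memory plugin, for illustration only. A real plugin would
    // query LDAP or another directory and would not keep clear-text passwords.
    static class InMemoryAccess implements ClarusAccess {
        private final Map<String, String> users = new HashMap<>();

        void addUser(String username, String password) {
            users.put(username, password);
        }

        public boolean authenticate(String username, String password) {
            String stored = users.get(username);
            return stored != null && stored.equals(password);
        }

        public boolean identify(String username) {
            return users.containsKey(username);
        }
    }

    public static void main(String[] args) {
        InMemoryAccess access = new InMemoryAccess();
        access.addUser("alice", "s3cret");                            // sample user
        System.out.println(access.identify("alice"));                 // true
        System.out.println(access.authenticate("alice", "wrong"));    // false
        System.out.println(access.authenticate("alice", "s3cret"));   // true
    }
}
```

How the proxy discovers plugin JAR files in the configured directory is not described in this section, so the sketch does not attempt to reproduce it.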
2.2.5 Generic vs. Specific CLARUS proxy
The CLARUS proxy can be deployed in one of two modes, called “Generic” and “Specific” proxies.
Figure 10. Generic proxy (left figure) vs. Specific proxy (right figure)
The generic CLARUS proxy
The generic CLARUS proxy embeds all the modules and plugins developed at each proxy release. Its
deployment is suitable for all domains. It involves less maintenance, and the administrator does not
need a specific background in system or network architecture. Re-configuring the CLARUS server for
another type of dataset is also more flexible, since only the security policies (denoting the needed
protocols, data protection mechanisms, etc.) need to be updated. On the other hand, since the entire
CLARUS application is contained in a single server, the failure of one component may result in the
failure of the whole CLARUS service. Besides, this monolithic deployment is more difficult to scale and
to maintain, since it involves different domains and heterogeneous data operations and can target
different protocols.
[Figure 10 annotations. Labels: Protocol module, Data operation module, CLARUS Server, CLARUS JVM.]
- Generic proxy: all modules are loaded, even the unnecessary ones. Failure of one module impacts the
others inside the same application. Scaling of modules needs to be done by scaling the whole CLARUS
application.
- Specific proxy: the application loads only the necessary modules. Failure of one module is easily
handled and does not impact the application. Scaling of modules is also easier.
The specific CLARUS proxy
The specific CLARUS proxy embeds only the modules needed for a specific usage (e.g. only one protocol
plugin, one data operation and one authentication plugin). With this deployment mode, each CLARUS proxy
differs from the others, since it contains only the modules needed at runtime. The different services
of CLARUS can then be scaled up easily, and targeting one specific domain improves maintainability.
This mode also implies a lighter footprint inside the Java Virtual Machine, since only the classes
needed at runtime are loaded into memory, which should improve performance. This kind of architecture
also improves the reliability of the whole solution, since the failure of one component does not imply
the failure of the complete service. This deployment mode is better suited to a microservices
architecture.
3 Integration of CLARUS proxy in the context of CLARUS case studies
At the present stage of the project, the implementation of the proxy is still ongoing, and so the generic
installation instructions are evolving. Given this constraint, it may not be appropriate to provide a
detailed description of how to set up the CLARUS proxy. Instead, it is preferable to describe the main
outlines of how to integrate the CLARUS proxy in the specific context of the demonstration cases.
Even though the code of a client application does not need to be modified when integrating CLARUS, the
client application needs to be reconfigured to point to the proxy instead of the cloud service. For
example, for a client application that uses the PostgreSQL protocol to communicate with a database in
the cloud, the database connection configuration has to be modified by replacing the URL of the cloud
database with the URL of the CLARUS proxy.
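As an illustration of this reconfiguration, the sketch below assumes a Java client that connects through JDBC, which is only one possible kind of client. The host names and the database name are hypothetical. Only the host part of the connection URL changes; the rest of the client is untouched.

```java
class ProxyRedirectSketch {

    // Builds a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical host names and database name.
        String direct = jdbcUrl("db.cloud.example.org", 5432, "geodata");
        String viaProxy = jdbcUrl("clarus-proxy.example.org", 5432, "geodata");
        System.out.println("before: " + direct);
        System.out.println("after:  " + viaProxy);
    }
}
```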
One of the most important steps when integrating CLARUS in an existing application is to define the
security policy to be applied, i.e. what to protect in the outsourced dataset and how to protect it.
This is done with the clarus-spm command line interface defined in the CLARUS interface dossier D5.1.
clarus-spm <command> <arguments> <options>
In the following subsections, the security policy for securing each demonstration case is described in
detail. A security policy (and a proxy process) is needed per protection technique and per protocol.
This is why different proxies are sometimes specified for storage and retrieval, when the protocols
used for storing and for retrieving data differ (e.g. Postgres to store and WFS to retrieve).
3.1 Geo Publication case study
3.1.1 Description
Three datasets have been defined for the Geo Publication use case:
- a set of groundwater boreholes locations for demonstrating web mapping
- a gas distribution network for demonstrating geo collaboration
- a set of mineral measurements at different points of a given region for demonstrating geo
processing (Kriging)
More details about the Geo Publication case study are presented in D2.1 “Definition of application
cases” deliverable.
3.1.1.1 Initial DB storage
These three datasets are uploaded to the cloud with a database client tool (e.g. pgAdmin). With this
tool, the data provider initializes the remote spatial database (PostgreSQL/PostGIS) with a database
script that defines the schema of the database and/or contains geodata. The data provider connects
directly to the remote spatial database using a user account, specifying all connection information:
host, port, database name, user and password.
Figure 11 - Geo data storage
Typical query (SQL insert) :
INSERT INTO "public"."groundwater_boreholes"
("nom_com","adresse","code_bss","denominati","type_point","district","circonscri","precision","altitude","prof_max",geom)
VALUES ('CAROMB','CAMPING COMMUNAL, BORDURE DU CD.13','09155X0062/F',NULL,'Qualitomètre',
'Le Rhône et les cours d''eau côtiers méditerranéens','Rhône-Méditerranée-Corse',NULL,'230.00','151.000',
'0101000020110F0000B3E3F5FE505921419D2A0DCB35EA5441');
3.1.1.2 Geo publication
In order to publish the geo data according to OGC standards (http://www.opengeospatial.org/standards),
we use GeoServer, a web application running in the cloud that exposes the data stored in the PostGIS
database through standardized web services (e.g. Web Feature Service, WFS).
Figure 12 - Geo data publication
The client application in this case is an OpenLayers JavaScript application running in the web browser
and requesting WFS layers.
Figure 13 - Geo data search/retrieval
Typical query (WFS search by attribute):
HTTP request URL: http://10.15.0.91:8080/geoserver/wfs
HTTP request Body (Payload):
<wfs:GetFeature service="WFS" version="1.0.0"
    outputFormat="GML2"
    xmlns:topp="http://www.openplans.org/topp"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.0.0/WFS-basic.xsd">
  <wfs:Query typeName="clarus:groundwater_boreholes_3857">
    <ogc:Filter>
      <ogc:PropertyIsEqualTo>
        <ogc:PropertyName>clarus:code_bss</ogc:PropertyName>
        <ogc:Literal>08705X0008/HY</ogc:Literal>
      </ogc:PropertyIsEqualTo>
    </ogc:Filter>
  </wfs:Query>
</wfs:GetFeature>
3.1.1.3 Geo collaboration
In order to collaborate on geo data, again we use GeoServer to expose the data stored in the spatial
database. Here the protocol used is WFS-T, standing for transactional Web Feature Service. As the name
suggests, a WFS-T service is a WFS service managing transactions for creation, deletion and update of
features. Typical WFS-T queries are similar to WFS queries but they are transactional.
Figure: Geo data update
3.1.1.4 Geo processing
The Kriging process is implemented using a WPS container (52°North WPS4R,
http://52north.org/communities/geoprocessing/wps/backends/52n-wps-r.html) interconnected with an R
server.
Figure 14 - Geo processing (upload application)
The interconnection between the Java module located at the WPS Container and R is handled by the
TCP/IP server Rserve.
Figure 15 - Geo processing (execute process)
Typical query (WPS execute):
HTTP request URL: http://10.15.0.91:8080/wps/WebProcessingService?Request=Execute&Service=WPS&version=1.0.0&Identifier=org.n52.wps.server.r.krige3857
HTTP request Body (Payload):
<p0:Execute xmlns:p0="http://www.opengis.net/wps/1.0.0"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:wps="http://www.opengis.net/wps"
    xmlns:ows="http://www.opengis.net/ows/1.1" service="WPS"
    version="1.0.0">
  <ows:Identifier>org.n52.wps.server.r.krige3857</ows:Identifier>
  <p0:DataInputs>
    <p0:Input>
      <ows:Identifier>n1</ows:Identifier>
      <p0:Data><p0:LiteralData>2</p0:LiteralData></p0:Data>
    </p0:Input>
    <p0:Input>
      <ows:Identifier>cx</ows:Identifier>
      <p0:Data><p0:LiteralData>178460,178460</p0:LiteralData></p0:Data>
    </p0:Input>
    <p0:Input>
      <ows:Identifier>cy</ows:Identifier>
      <p0:Data><p0:LiteralData>330140,330180</p0:LiteralData></p0:Data>
    </p0:Input>
    <p0:Input>
      <ows:Identifier>vari</ows:Identifier>
      <p0:Data><p0:LiteralData>cadmium</p0:LiteralData></p0:Data>
    </p0:Input>
  </p0:DataInputs>
  <p0:ResponseForm>
    <p0:RawDataOutput>
      <ows:Identifier>essai</ows:Identifier>
    </p0:RawDataOutput>
  </p0:ResponseForm>
</p0:Execute>
3.1.2 CLARUS proxy integration
The aim of this section is to describe (from a high level perspective) how to integrate the CLARUS
solution in the specific context of the Geo publication use cases described above.
3.1.2.1 Integrate the CLARUS proxy for initial DB storage
In the specific case of geo publication, the upload of the dataset is done at once via a SQL script
executed on a spatial database in the cloud. Other operations on this dataset like search, retrieval,
update or computation are described in the following sections as they are performed using other
protocols than SQL (i.e. OGC web services).
Therefore the SQL proxies to be integrated for initial DB storage only need to be deployed and run for a
limited time (the time required for initializing the database). Once the database is initialized, the proxy is
stopped.
This is a particular use with specific constraints. Scalability is likely to be critical in this case
(temporary high traffic), so it is advisable to launch several proxies and distribute the workload
across these instances.
Proxy for storing coarsened data (geopublication)
Security policy for the proxy
The security policy allows the security manager to define what to protect in the dataset and how to
protect it. It contains the protocol and the port.
Policy definition (data usage, port, protocol)
Protocol Port Data usage
postgreSQL 5432 store
Data to protect
In the groundwater_boreholes PostGIS table dedicated to anonymized search (geopublication):
- the attribute geom contains sensitive locations and is qualified as identifier
- the adresse field (address) is marked as confidential
Attribute path Attribute type Data type
/groundwater_boreholes/geom Identifier geometric_object
/groundwater_boreholes/nom_com non_confidential categoric
/groundwater_boreholes/adresse confidential categoric
/groundwater_boreholes/code_bss non_confidential categoric
/groundwater_boreholes/denominati non_confidential categoric
/groundwater_boreholes/type_point non_confidential categoric
/groundwater_boreholes/district non_confidential categoric
/groundwater_boreholes/circonscri non_confidential categoric
/groundwater_boreholes/altitude non_confidential numeric_continuous
/groundwater_boreholes/prof_max non_confidential numeric_continuous
How to protect the data (anonymization module)
Attribute type Protection Parameter
Identifier Coarsening radius=5000
Confidential Suppression
Non-confidential -
Run the proxy
The proxy is launched via a shell/bat script, passing the security policy and the IP address of the backend
as arguments. For example:
$ proxy.sh --security-policy "../securitypolicies/boreholes_coarsening_psql.xml" <IP>
Proxy for storing split data (geocollaboration)
Security policy for the proxy
Policy definition (data usage, port, protocol)
Protocol Port Data usage
postgreSQL 5432 store
Data to protect
In the gasleak PostGIS table dedicated to retrieval and update (geocollaboration):
- the attribute geometry contains sensitive locations and is qualified as identifier
- the address field is marked as confidential
Attribute path Attribute type Data type
/gasleak/geometry identifier geometric_object
/gasleak/address confidential categoric
/gasleak/methoddetection non_confidential categoric
/gasleak/leaksurvey non_confidential categoric
/gasleak/facility non_confidential categoric
/gasleak/surfacetype non_confidential categoric
/gasleak/leakstatus non_confidential categoric
/gasleak/leakcause non_confidential categoric
/gasleak/datereport non_confidential datetime
/gasleak/leakrepair non_confidential categoric
/gasleak/comments non_confidential categoric
How to protect the data (splitting module)
Attribute type Protection Parameter
Identifier Splitting clouds=2
Confidential Encryption id_key=123
Non-confidential -
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/gasleak_splitting_psql.xml" <IP>
Proxy for storing split data (geoprocessing)
Security policy for the proxy
Policy definition (data usage, port, protocol)
Protocol Port Data usage
postgreSQL 5432 store
Data to protect
In the meuse_kriging PostGIS table dedicated to geostatistical computation (geoprocessing):
- the attribute geom (geometry) contains sensitive locations and is qualified as identifier
- the cadmium field contains sensitive measurements and is marked as confidential
- the copper field contains sensitive measurements and is marked as confidential
- the lead field contains sensitive measurements and is marked as confidential
- the zinc field contains sensitive measurements and is marked as confidential
Attribute path Attribute type Data type
/meuse_kriging/geom identifier geometric_object
/meuse_kriging/cadmium confidential numeric_continuous
/meuse_kriging/copper confidential numeric_continuous
/meuse_kriging/lead confidential numeric_continuous
/meuse_kriging/zinc confidential numeric_continuous
/meuse_kriging/elev non_confidential numeric_continuous
/meuse_kriging/dist non_confidential numeric_continuous
/meuse_kriging/om non_confidential numeric_continuous
/meuse_kriging/ffreq non_confidential numeric_discrete
/meuse_kriging/soil non_confidential numeric_discrete
/meuse_kriging/lime non_confidential numeric_discrete
/meuse_kriging/landuse non_confidential categoric
/meuse_kriging/dist.m non_confidential numeric_continuous
How to protect the data (splitting module)
Attribute type Protection Parameter
Identifier Splitting clouds=2
Confidential Encryption id_key=456
Non-confidential -
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/meuse_splitting_psql.xml" <IP>
3.1.2.2 Integrate the CLARUS proxy for WFS search/retrieval/update
In this case the CLARUS proxies to be deployed need to run permanently in order to intercept and parse
requests and responses to/from the GeoServer in the cloud.
One of the main constraints here is high availability: the proxy must remain up and available for long
periods of time.
Proxy for searching/retrieving coarsened data
Security policy for the proxy
The security policy allows the security manager to define what to protect/unprotect in the dataset and
how to protect/unprotect it. It contains the protocol and the port.
Policy definition (data usage, port, protocol)
Protocol Port Data usage
WFS 8080 search
Data to unprotect and how to unprotect
In the case of anonymization, no operation is needed to protect the requests or unprotect the
responses.
Run the proxy
The proxy is launched via a shell/bat script, passing the security policy and the IP address of the backend
as arguments. For example:
$ proxy.sh --security-policy "../securitypolicies/boreholes_coarsening_wfs.xml" <IP>
Proxy for searching/updating split data
Security policy for the proxy
Policy definition (data usage, port, protocol)
Protocol Port Data usage
WFST 8080 update
Data to protect/unprotect
In the gasleak GeoServer layer dedicated to retrieval and update (geocollaboration):
- the attribute geometry contains sensitive locations and is qualified as identifier
- the attribute address is marked as confidential
Attribute path Attribute type Data type
/gasleak/geometry identifier geometric_object
/gasleak/address confidential categoric
/gasleak/methoddetection non_confidential categoric
/gasleak/leaksurvey non_confidential categoric
/gasleak/facility non_confidential categoric
/gasleak/surfacetype non_confidential categoric
/gasleak/leakstatus non_confidential categoric
/gasleak/leakcause non_confidential categoric
/gasleak/datereport non_confidential datetime
/gasleak/leakrepair non_confidential categoric
/gasleak/comments non_confidential categoric
How to protect/unprotect the data (splitting module)
Attribute type Protection Parameter
Identifier Splitting clouds=2
Confidential Encryption id_key=123
Non-confidential -
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/gasleak_splitting_wfst.xml" <IP>
3.1.2.3 Integrate the CLARUS proxy for WPS computation
In this case, the proxy is in charge of intercepting and parsing computation operations. A computation
operation is detected by searching the request for one of the computation commands configured in the
security policy. The computation operation is then modified by the data operation embedded in the
proxy. For instance, in the case of splitting, the data operation splits the original request into
several ordered requests, which are sent to the CSP(s).
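The detection step described above can be sketched as follows. This is a deliberately simple illustration that uses a plain substring search over the request body; the actual protocol module may well parse the request properly. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ComputationDetectionSketch {

    // Returns the first configured computation command found in the request body, if any.
    static Optional<String> detectCommand(String requestBody, List<String> configuredCommands) {
        return configuredCommands.stream()
                .filter(requestBody::contains)
                .findFirst();
    }

    public static void main(String[] args) {
        // Commands as they could be listed in a security policy (sample value).
        List<String> commands = Arrays.asList("eu.clarussecure.wps.krige");

        String execute = "<ows:Identifier>eu.clarussecure.wps.krige</ows:Identifier>";
        String other = "<wfs:GetFeature service=\"WFS\"/>";

        System.out.println(detectCommand(execute, commands).isPresent()); // true
        System.out.println(detectCommand(other, commands).isPresent());   // false
    }
}
```

A request in which no configured command is found would simply be forwarded without being treated as a computation.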
Proxy for computing over split data
Security policy for the proxy
In the computation case, the security policy defines the commands (or processes) to protect/unprotect
and how to protect/unprotect them. Again, it specifies the protocol and the port.
Policy definition (data usage, port, protocol)
Protocol Port Data usage
WPS 8080 compute
Commands to protect/unprotect
The eu.clarussecure.wps.krige WPS process, executed on the meuse PostGIS table, is dedicated to
geostatistical computation (geoprocessing).
- Command:
  The eu.clarussecure.wps.krige process operates on protected data and is qualified as a “kriging”
  operation.
- Input parameters:
  - a dataset attribute corresponding to the data layer(s) on which the process needs to be executed
    (i.e. meuse in this case)
  - the x, y attributes containing sensitive information not to be disclosed to the CSP and
    corresponding to the coordinates of point M to be estimated
  - the vari attribute corresponding to the variable "cadmium" or "copper" or "lead" or "zinc" to be
    estimated at point M
- Output parameters:
  - the result attribute contains sensitive information and corresponds to the estimated value of vari
    at point M
How to protect/unprotect the computation (splitting module)
The computation is done consistently with how the dataset was protected during storage. The data
operation therefore derives its parameters (data types, attribute types, protection parameters) from
the security policy defined for storage.
The x and y values are kept in CLARUS and are not disclosed to the CSPs: the CSPs are only requested to
compute (partial) distances on the coordinates they store, and the operations that depend on the actual
x, y position to interpolate are done by CLARUS.
The vari attribute is non-confidential.
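The idea of partial distances can be illustrated with the sketch below. It rests on an assumption that is not stated in this section: with clouds=2, one CSP is assumed to store the x coordinates of the sample points and the other the y coordinates. Each CSP then computes squared differences on the single coordinate it holds, and CLARUS combines both partial results into Euclidean distances between stored points. All names are hypothetical, and this is not the actual CLARUS kriging protocol.

```java
class PartialDistanceSketch {

    // Run by one CSP on the single coordinate it stores:
    // squared differences between every pair of stored points.
    static double[][] partialSquaredDistances(double[] coordinate) {
        int n = coordinate.length;
        double[][] partial = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double d = coordinate[i] - coordinate[j];
                partial[i][j] = d * d;
            }
        }
        return partial;
    }

    // Run by CLARUS: adds the partial results of both CSPs and takes the square root.
    static double[][] combine(double[][] fromCloudA, double[][] fromCloudB) {
        int n = fromCloudA.length;
        double[][] distances = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                distances[i][j] = Math.sqrt(fromCloudA[i][j] + fromCloudB[i][j]);
            }
        }
        return distances;
    }

    public static void main(String[] args) {
        double[] xs = { 0.0, 3.0 };   // held by the first CSP (sample values)
        double[] ys = { 0.0, 4.0 };   // held by the second CSP (sample values)

        double[][] distances = combine(partialSquaredDistances(xs), partialSquaredDistances(ys));
        System.out.println(distances[0][1]);   // prints 5.0
    }
}
```

Neither CSP sees a complete location in this scheme, and the coordinates of the point M to estimate never leave CLARUS.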
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/meuse_splitting_wps.xml" <IP>
3.2 eHealth case study
3.2.1 Description
The following scenario represents the eHealth case and consists of three different components:
- A web application created with HTML, CSS and AngularJS that allows the user to perform the
search of Passive Electronic Medical Records (EMR) and the statistical computations.
- A RESTful webservice deployed on a Tomcat server which receives the requests from the web
application and sends the appropriate SQL queries to the database.
- A PostgreSQL database containing the dataset from the Hospital.
Figure 16. eHealth case study
This scenario does not yet include the CLARUS proxy, whose integration is described in section 3.2.2.
3.2.1.1 Dataset queries
The dataset of the eHealth case contains information related to patients, discharge records, laboratory
results and different medical information based on real data from the Hospital. This information is
distributed across different tables in the database, so querying a single table is not enough to get
all the information related to a discharge document or a laboratory result.
To avoid queries with a large number of JOIN clauses, we have created SQL views that act as tables, so
that the queries look like ordinary SQL queries. Thus, instead of running this query over all the
required tables:
SELECT p.pat_id, p.pat_name, p.pat_last1, p.pat_last2, p.pat_gen, p.pat_zip,
       e.ep_id, e.ep_age, e.ep_range,
       dr.dis_id, dr.dis_ver, dr.dis_serv, dr.dis_adm, dr.dis_dis, dr.dis_days,
       dr.dis_adtp, dr.dis_dest, dr.dis_sig1, dr.dis_sig2, dr.dis_pdf,
       dia.dia_id, dia.dia_desc
FROM patient p
INNER JOIN episode e ON e.ep_pat = p.pat_id
INNER JOIN discharge_report dr ON dr.dis_ep = e.ep_id
INNER JOIN document_diagnose doc ON (doc.dis_id, doc.dis_ver) = (dr.dis_id, dr.dis_ver)
INNER JOIN diagnose_cie9mc dia ON dia.dia_id = doc.dia_id;
We can get the same results by querying the view:
SELECT * FROM discharge_advanced;
3.2.1.2 Initial DB storage
As in the Geo Publication use case, the dataset is uploaded to the cloud using a database client tool. The
data provider will run some SQL scripts to initialize the eHealth database defining the schema, the views
and inserting the required data.
3.2.1.3 Passive EMR search
In this case the user is allowed to retrieve a set of documents based on different parameters. We
distinguish between simple and advanced searches. The simple search retrieves documents either by
patient identifier, which is more specific because the results relate to a unique identifier, or by
patient name, where the specificity of the results varies with the number of filters applied (e.g.
search by name, last name, or both). The advanced search is an extension of the simple search. It
allows the user to retrieve documents by identifier or name and/or by other criteria, such as a
specific admission or discharge date, a specific diagnosis or a specific medical service.
The table below shows how the search requests done to the webservice are mapped into SQL queries:
HTTP request: http://webservicehost:8080/FCRBWS/rest/eHealth/search?pat_id=00001234
SQL query: SELECT * FROM discharge_advanced WHERE pat_id = '00001234';
HTTP request: http://webservicehost:8080/FCRBWS/rest/eHealth/search?pat_name=SANDRA&pat_last1=GARCIA
SQL query: SELECT * FROM discharge_advanced WHERE pat_name = 'SANDRA' AND pat_last1 = 'GARCIA';
HTTP request: http://webservicehost:8080/FCRBWS/rest/eHealth/search?pat_gen=F&ep_range=08&dis_serv=END
SQL query: SELECT * FROM discharge_advanced WHERE dis_serv = 'END' AND ep_range = '08' AND pat_gen = 'F';
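The mapping from search parameters to SQL shown above can be sketched in Java as follows. This is not the code of the eHealth webservice, which is not reproduced in this document. It is a minimal illustration that assumes the request parameters arrive as a map, restricts them to the columns that appear in the examples, and produces a parameterized query (with ? placeholders) instead of concatenating values into the SQL text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SearchQuerySketch {

    // Columns of the discharge_advanced view that appear in the examples above.
    static final List<String> ALLOWED = Arrays.asList(
            "pat_id", "pat_name", "pat_last1", "pat_gen", "ep_range", "dis_serv");

    // Builds "SELECT ... WHERE a = ? AND b = ?" and collects the values to bind, in order.
    static String buildQuery(Map<String, String> params, List<String> valuesOut) {
        StringBuilder sql = new StringBuilder("SELECT * FROM discharge_advanced");
        String separator = " WHERE ";
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (!ALLOWED.contains(entry.getKey())) {
                throw new IllegalArgumentException("Unknown search field: " + entry.getKey());
            }
            sql.append(separator).append(entry.getKey()).append(" = ?");
            valuesOut.add(entry.getValue());
            separator = " AND ";
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Parameters of .../search?pat_name=SANDRA&pat_last1=GARCIA
        Map<String, String> params = new LinkedHashMap<>();
        params.put("pat_name", "SANDRA");
        params.put("pat_last1", "GARCIA");

        List<String> values = new ArrayList<>();
        System.out.println(buildQuery(params, values));
        // prints SELECT * FROM discharge_advanced WHERE pat_name = ? AND pat_last1 = ?
        System.out.println(values);
        // prints [SANDRA, GARCIA]
    }
}
```

The collected values would then be bound to a java.sql.PreparedStatement. When the CLARUS proxy is in place, the JDBC connection used for that statement points to the proxy rather than to the cloud database.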
3.2.1.4 Statistical computations
In this case the user is allowed to retrieve statistics, such as percentages, means and standard
deviations, computed over the dataset. The results vary with the number of filters applied to the
computation (e.g. percentage by gender of patients with a diabetes diagnosis for the endocrinology
service).
The table below shows how the computation requests done to the webservice are mapped into SQL
queries:
HTTP request: http://webservicehost:8080/FCRBWS/rest/eHealth/statistics/dis_serv?pat_gen=F
SQL query:
SELECT DISTINCT dis_serv AS field,
  count(dis_serv) OVER (PARTITION BY dis_serv) AS total,
  (100.0*(count(dis_serv) OVER (PARTITION BY dis_serv))/(count(*) OVER ())) AS percentage
FROM discharge_advanced WHERE pat_gen = 'F' ORDER BY field;
3.2.2 CLARUS proxy integration
The CLARUS modules used by the eHealth case are the Searchable Encryption (SE) module for the
Passive EMR Search application and the Data Anonymization module for the Statistical Computation
application. Each integration case is described in more detail in the following subsections.
Figure 17. eHealth case study integration
3.2.2.1 Integrate the CLARUS proxy for Passive EMR Search
For the passive EMR search application of the eHealth case, the upload of the dataset is done at once via
a SQL script executed on a database in the cloud. The proxy will run continuously because the protocol
used for storing and retrieving data will be the same (PostgreSQL).
Proxy configuration
The data used by the Passive EMR Search application will be encrypted using searchable encryption,
which allows search and retrieval operations to run directly over encrypted data without prior
decryption. Only authorized users should be able to retrieve this information in clear, and only
through the proxy. Thus, a user who accesses the cloud directly will find the data useless.
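The idea of searching without decrypting can be illustrated with a deliberately simplified sketch. It is not the scheme implemented by the CLARUS Searchable Encryption module: it only shows equality search through deterministic keyed tokens (HMAC), the key and the patient name MARIA are made up, and the encryption of the record payload itself is omitted.

```python
import hashlib
import hmac

# Hypothetical key held only by the proxy; in practice it would come from a key store.
INDEX_KEY = b"demo-key-known-only-to-the-proxy"


def search_token(column, value):
    """Deterministic keyed token for an equality search on (column, value)."""
    message = f"{column}={value}".encode("utf-8")
    return hmac.new(INDEX_KEY, message, hashlib.sha256).hexdigest()


# What the cloud would store: tokens instead of clear values.
cloud_rows = [
    {"pat_name": search_token("pat_name", "SANDRA"), "record": "<ciphertext 1>"},
    {"pat_name": search_token("pat_name", "MARIA"), "record": "<ciphertext 2>"},
]

# The proxy rewrites the query value into a token; the cloud matches it
# without ever seeing the clear value "SANDRA".
wanted = search_token("pat_name", "SANDRA")
matches = [row["record"] for row in cloud_rows if hmac.compare_digest(row["pat_name"], wanted)]
print(matches)
```

Deterministic tokens reveal which rows share the same value, so real searchable encryption schemes use stronger constructions; the sketch only conveys why a user without the key cannot exploit the stored data.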
Security policy for the proxy
Policy definition
Protocol Port Data usage
PostgreSQL 5432 store and retrieve
Data to protect and how to protect
As all the information is considered sensitive, all the datasets must be encrypted.
Data to unprotect and how to unprotect the data
All the data must be decrypted in order to get the clear data.
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/ehealth_encryption.xml" <IP>
3.2.2.2 Integrate the CLARUS proxy for Statistical Computations
For the Statistical Computations application of the eHealth case, the dataset is uploaded in one go
via an SQL script executed on a database in the cloud. The proxy runs continuously, since the same
protocol (PostgreSQL) is used both for storing and for retrieving data.
Proxy configuration
The data used by the Statistical Computations application will be anonymized using k-anonymity and
t-closeness, because we have to guarantee that patient privacy is not compromised while the data
remains clear enough for computations.
Security policy for the proxy
Policy definition
Protocol Port Data usage
PostgreSQL 5432 store and retrieve
Data to protect
Several attributes from different tables have to be anonymized to preserve patient privacy.
- Identifiers: patient id and names can identify the patient directly, so they must be removed or
encrypted.
- Quasi-identifiers: patient gender, zip code, age and age range, admission and discharge dates
and the number of days stayed at the hospital could identify a patient if they are combined, so
they must be masked.
- Confidential attributes: the discharge report id and version and the related diagnosis of a
patient are sensitive information that could identify the patient indirectly, so they should be
masked too.
The table below lists only the attributes whose type is not non-confidential.
Attribute path                        Attribute type     Data type
eHealthDB/patient/pat_id              identifier         categoric_ordinal
eHealthDB/patient/pat_name            identifier         categoric
eHealthDB/patient/pat_last1           identifier         categoric
eHealthDB/patient/pat_last2           identifier         categoric
eHealthDB/patient/pat_gen             quasi_identifier   categoric
eHealthDB/patient/pat_zip             quasi_identifier   categoric
eHealthDB/episode/ep_age              quasi_identifier   numeric_discrete
eHealthDB/episode/ep_range            quasi_identifier   numeric_discrete
eHealthDB/discharge_report/dis_adm    quasi_identifier   date
eHealthDB/discharge_report/dis_dis    quasi_identifier   date
eHealthDB/discharge_report/dis_days   quasi_identifier   numeric_discrete
eHealthDB/discharge_report/dis_id     confidential       categoric_ordinal
eHealthDB/discharge_report/dis_ver    confidential       categoric_ordinal
eHealthDB/document_diagnose/dia_id    confidential       categoric
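For illustration, the classification in the table can be held in a simple lookup structure. The dictionary layout and the helper names below are assumptions made for this sketch and do not reflect the format of the CLARUS security policy file; the paths and types are copied from the table, and any attribute not listed is treated as non-confidential, as stated above.

```python
# Attribute paths and types copied from the table; the dict layout is an assumption.
ATTRIBUTE_TYPES = {
    "eHealthDB/patient/pat_id": "identifier",
    "eHealthDB/patient/pat_name": "identifier",
    "eHealthDB/patient/pat_last1": "identifier",
    "eHealthDB/patient/pat_last2": "identifier",
    "eHealthDB/patient/pat_gen": "quasi_identifier",
    "eHealthDB/patient/pat_zip": "quasi_identifier",
    "eHealthDB/episode/ep_age": "quasi_identifier",
    "eHealthDB/episode/ep_range": "quasi_identifier",
    "eHealthDB/discharge_report/dis_adm": "quasi_identifier",
    "eHealthDB/discharge_report/dis_dis": "quasi_identifier",
    "eHealthDB/discharge_report/dis_days": "quasi_identifier",
    "eHealthDB/discharge_report/dis_id": "confidential",
    "eHealthDB/discharge_report/dis_ver": "confidential",
    "eHealthDB/document_diagnose/dia_id": "confidential",
}


def attribute_type(path):
    """Type of an attribute path; unlisted attributes are non-confidential."""
    return ATTRIBUTE_TYPES.get(path, "non_confidential")


def attributes_of_type(kind):
    """All listed attribute paths of the given type, in alphabetical order."""
    return sorted(path for path, value in ATTRIBUTE_TYPES.items() if value == kind)


print(attribute_type("eHealthDB/patient/pat_gen"))
print(attributes_of_type("confidential"))
```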
How to protect the data
For the k-anonymity protection, typical values for k lie between 3 and 10. The t-closeness
protection can be optional, depending on the level of anonymization that we want to apply to the
dataset. We have to make sure that the chosen configuration strikes a good balance between
anonymization and data utility.
Attribute type     Protection     Parameter
identifier         suppression    -
quasi_identifier   k-anonymity    k=5
confidential       t-closeness    t=0.1
non_confidential   none           -
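The two guarantees configured above can be checked on an already anonymized dataset with a short sketch. It is a verification aid under simplifying assumptions, not the algorithm used by the CLARUS Data Anonymization module: the sample rows are made up, and t-closeness is measured with the total variation distance, which is only appropriate for categorical confidential attributes.

```python
from collections import Counter, defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on all quasi-identifiers."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return list(classes.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k rows."""
    return all(len(c) >= k for c in equivalence_classes(rows, quasi_identifiers))


def distribution(rows, attribute):
    """Relative frequency of each value of `attribute` in `rows`."""
    counts = Counter(row[attribute] for row in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def is_t_close(rows, quasi_identifiers, confidential, t):
    """True if, in every class, the distribution of `confidential` is within
    total variation distance t of its distribution in the whole dataset."""
    overall = distribution(rows, confidential)
    for group in equivalence_classes(rows, quasi_identifiers):
        local = distribution(group, confidential)
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        if distance > t:
            return False
    return True


# Made-up, already generalized rows: two classes of five patients each.
rows = (
    [{"pat_gen": "F", "ep_range": "08", "dia_id": "D1"}] * 3
    + [{"pat_gen": "F", "ep_range": "08", "dia_id": "D2"}] * 2
    + [{"pat_gen": "M", "ep_range": "07", "dia_id": "D1"}] * 3
    + [{"pat_gen": "M", "ep_range": "07", "dia_id": "D2"}] * 2
)
quasi = ["pat_gen", "ep_range"]
print(is_k_anonymous(rows, quasi, k=5))
print(is_t_close(rows, quasi, "dia_id", t=0.1))
```

Raising k or lowering t makes the checks stricter, which is the trade-off between anonymization and data utility mentioned above.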
Data to unprotect and how to unprotect the data
In this case, as the user will not need to get the original data and will only receive the computation
results, no operation is needed.
Run the proxy
$ proxy.sh --security-policy "../securitypolicies/ehealth_anonymization.xml" <IP>
4 Conclusions
In this document, which is part of the CLARUS implementation work package (WP5), we presented the
different integration possibilities offered by CLARUS. We also described the flexibility of CLARUS
and how users can extend it by implementing new plugins.
To do so, we began by introducing the possible paradigms for CLARUS integration. As a security
provider performing fast computation, CLARUS needs an integration mechanism that is fault tolerant,
scalable and high-performing, which is why we chose a hybrid method to integrate the modules.
We then listed the requirements for building CLARUS extensions. Since CLARUS is built on an
extensible architecture, guidance on creating new data operation modules, protocol modules or even
new authentication mechanisms is a valuable asset to the platform.
To demonstrate the successful integration of the CLARUS proxy, we described the platform used
within the project to represent the two CLARUS use cases, both of which rely on the integration
mechanism chosen at the beginning of this document.
In conclusion, this document is useful to a system administrator who wants to integrate the CLARUS
solution into an existing infrastructure.
Appendix A. The CLARUS platform
The CLARUS platform hosts all the CLARUS proxy modules and allows them to be integrated in order to
build different proof-of-concept demonstrations. This platform is hosted on the THALES premises.
More details are provided in the next sections.
A.1 Platform Design
A.1.1 Architecture
Figure 18. CLARUS platform architecture
Figure 18 presents the global architecture of the CLARUS platform. The platform features are
detailed in subsection A.2.
A.2 User Manual
This annex is intended to guide the CLARUS partners on how to use the platform to build
proof-of-concept demonstrators integrating different versions of the CLARUS proxy and use cases.
This manual can also serve as initial documentation for an early adopter of the CLARUS solution.
A.2.1 Environment
The CLARUS platform is also intended to facilitate the development and delivery of the CLARUS
modules. It is currently divided into 4 main networks:
- integ-dev
- geoLoc
- eHealth
- services
The first (integ-dev) is an open network where developers can start/instantiate virtual
machine(s) based on OpenStack (see footnote 4) in order to deploy their own modules. This network
has no specific constraints, so every user can build new VMs to test specific integrations between
different modules.
The second and third networks (GeoLoc and eHealth) are only reachable through ports 80 and 443.
They host the PoC demonstrators of the project use cases and are intended to be easily and
automatically reproducible.
The final network (services) is where all the platform-related services are available. For
instance, a Git repository based on the GitLab tool is available so that every partner can store
the source code, the configuration files and the documentation of the module(s) it develops. An
aptitude relay is also available so that every partner can install packages into its VM(s) on the
integ-dev network.
WARNING: No Internet access is available in any environment.
In order to correctly deploy the two use case applications, THALES also provides public cloud
services supporting Postgres and S3/Swift for Storage as a Service. One Postgres server for each
use case and three S3 CSPs are available for different usages.
4 https://www.openstack.org/
A.2.2 Connection
An X.509 certificate must be requested from the THALES administration team (Romain FERRARI and
Jérôme VENANT) in order to set up the VPN connection to the platform.
Access to the VPN server is performed using OpenVPN over UDP on port 11194. An example of a
configuration file for Linux is provided below:
#### VPN definition
client
remote <IP_OF_THE_CLARUS_VPN> <PORT_OF_CLARUS_VPN> udp
nobind
dev tapKalEL
auth-nocache
resolv-retry infinite
#### SSL/TLS root certificate, certificate, private key
askpass
ca client.certs/ca.crt
cert client.certs/client.crt
key client.certs/client.key #### MUST BE KEPT SECRET
# Wireless networks often produce a lot of duplicate packets. Set
# this flag to silence duplicate packet warnings
mute-replay-warnings
#### Misc
comp-lzo
daemon
log /tmp/ovpn.log
#### MTU troubles Fix
fragment 1300
mssfix
# Set log file verbosity.
verb 3
# Silence repeating messages
mute 10
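Before starting the client, a configuration file such as the one above can be sanity-checked with a small script. The sketch below is an illustration only: the set of known directives is simply the list of directives used in the sample above (not an exhaustive OpenVPN list), the choice of required directives is an assumption, the host name vpn.example.org is a placeholder, and comment handling is simplistic (quoting rules are ignored).

```python
# Directives used by the sample configuration in this section
# (assumption: not an exhaustive list of OpenVPN directives).
KNOWN = {
    "client", "remote", "nobind", "dev", "auth-nocache", "resolv-retry",
    "askpass", "ca", "cert", "key", "mute-replay-warnings", "comp-lzo",
    "daemon", "log", "fragment", "mssfix", "verb", "mute",
}
# Directives this sketch insists on (assumption made for the example).
REQUIRED = ("client", "remote", "ca", "cert", "key")


def parse_directives(config_text):
    """Return [(name, args)] ignoring blank lines and # or ; comments."""
    directives = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplistic: ignores quoting rules
        if not line or line.startswith(";"):
            continue
        name, *args = line.split()
        directives.append((name, args))
    return directives


def check_config(config_text):
    """Return (missing required directives, directives not in KNOWN)."""
    names = [name for name, _ in parse_directives(config_text)]
    missing = [name for name in REQUIRED if name not in names]
    unknown = [name for name in names if name not in KNOWN]
    return missing, unknown


sample = """\
client
remote vpn.example.org 11194 udp
# Wireless networks often produce a lot of duplicate packets. Set
this flag to silence duplicate packet warnings
mute-replay-warnings
ca client.certs/ca.crt
cert client.certs/client.crt
"""
print(check_config(sample))
```

On this sample the check reports that the key directive is missing and flags the word "this" as an unknown directive, which is how a comment that wrapped onto a second line without its leading "#" shows up.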
A.2.3 VM Provisioning
A.2.3.1 Account configuration
The Thales Administration Team provides authorized partners with a login and a password to connect
to the OpenStack Dashboard in order to manage their VMs.
After the first connection, the first thing to do is to create an RSA key pair. You can either
import keys created on your own computer or generate them directly in the web interface available
at this URL: http://Horizon.kalel.theresis.org
Figure 19. Authentication on the CLARUS platform
After connecting, follow these steps:
- Click on your username at the top right of the screen, then on parameters.
- Click on the “Compute” link on the left side, then click on “Access and Security”.
- Choose the Key Pair tab.
Figure 20. OpenStack Interface – Access & Security
- Click on create a Key Pair, choose a name and click on “Create key pair”. The key pair will be
created inside the interface and you will have to download the private part. DO NOT LOSE
YOUR PRIVATE KEY. If you lose it, we will have no way of retrieving it and you will no longer
have access to your VMs.
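Once the private key has been downloaded, it is worth restricting its file permissions, since OpenSSH refuses private keys that are readable by other users. The helper below is a minimal sketch for a POSIX system; the function name is made up and the key path is the one used in the ssh example of this manual.

```python
import os
import stat


def restrict_private_key(path):
    """Limit the key file to owner read/write (mode 0600) and return the new mode."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Path used in the ssh example of this manual; adjust to where you saved the key.
    key_path = os.path.expanduser("~/clarus/myKeySandbox")
    if os.path.exists(key_path):
        print(oct(restrict_private_key(key_path)))
```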
A.2.3.2 Create a VM
The VM in the sandbox environment MUST be provisioned using the CLARUS Template image. It is an
Ubuntu-based image (Ubuntu 16.04) with the configuration needed to work with the CLARUS
integration platform.
- Connect to your VPN.
- Open the following URL in your browser: http://Horizon.kalel.theresis.org
- Type in the user and password given to you by the Thales Administration Team.
Figure 21. Openstack interface - Instances
- Click on the “Compute” link on the left side, then click on “Instances”.
- Then click on “Launch Instance” on the right side of the screen.
Figure 22. Openstack interface – Creating a new VM
- Fill in the form. For a standard VM you can use the same configuration as the one above.
- The “Access and Security” tab should show the name of the previously imported SSH key. The
“Network” tab should show the integDev network.
- No more configuration is needed at this stage. Just click on launch to instantiate the VM.
- Wait until the VM generation is finalized; you can then connect to your VM using SSH with the
user “ubuntu” and the IP written in the description.
Figure 23. Openstack interface – List of VMs
$ ssh ubuntu@<IP> -i ~/clarus/myKeySandbox
A.2.3.3 CLARUS Release Process
The release process can and probably will evolve in order to be as easy and stable as possible.
Docker Releases
The releases must be supplied as Docker containers. Each Dockerfile must be provided in the GIT
repository of the use-case project (Geoloc and eHealth) in the branch of each module. The reference
Dockerfile must be present under the directory docker of each branch. If there is no branch describing
your module, create one.
A README.md must be provided under the docker folder. Such a README must give instructions about
how to work with the corresponding Docker container.
Each Dockerfile must follow the best practices described here:
https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/
To merge the stable version of each use case, a Merge Request needs to be created and should be
approved by the person responsible for the use case.
A Dockerfile sample will be provided in the project repository.
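The layout rules above (a Dockerfile and a README.md under the docker directory of each module branch) can be verified before opening a Merge Request. The following sketch assumes the branch has been checked out locally; the function name is hypothetical and only the presence of the two files is checked, not their content.

```python
from pathlib import Path

# Files the release rules above expect in every module branch.
REQUIRED_FILES = ("docker/Dockerfile", "docker/README.md")


def missing_release_files(checkout_dir):
    """Return the required files that are absent from a checked-out branch."""
    root = Path(checkout_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Check the current directory, assumed to be the root of the checkout.
    print(missing_release_files("."))
```

An empty list means the branch satisfies the layout requirement.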
A.2.3.4 Services
Git
A Git repository is available for all CLARUS partners to use. It is accessible at
http://gitlab.kalel.theresis.org once the VPN has been set up correctly.
The web interface is based on the GitLab OSS version 8.8.4 (https://gitlab.com/gitlab-org/gitlab-ce).
All users will be able to create projects, groups and branches. The users will be added by the
Thales Administration Team at the same time as the X.509 certificate for the VPN is created.
Aptitude relay
An aptitude relay is set up in order to properly update the VMs’ environment. To configure it,
issue the following command (as root):
$ echo "Acquire::http::proxy \"http://apt.kalel.theresis.org:3142\";" > /etc/apt/apt.conf.d/01proxy
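When VMs are provisioned by a script rather than by hand, the same configuration line can be generated programmatically. This is a small sketch with hypothetical function names; the host, port and target file are the ones given in the command above, and writing to the target file requires root privileges.

```python
def apt_proxy_line(host="apt.kalel.theresis.org", port=3142):
    """Return the apt configuration line that routes HTTP downloads through the relay."""
    return f'Acquire::http::proxy "http://{host}:{port}";\n'


def write_apt_proxy(path="/etc/apt/apt.conf.d/01proxy", **kwargs):
    """Write the relay configuration to `path` (needs root for the default path)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(apt_proxy_line(**kwargs))


if __name__ == "__main__":
    # Only print the line here; call write_apt_proxy() as root to install it.
    print(apt_proxy_line(), end="")
```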