Exhibition watch report: Mobile IT & Big Data

For any information: contact@veillesalon.com - Tel. 08 71 57 21 78 - Fax 01 34 35 04 89. A site produced and published by VIEDOC Solutions, 8 rue de Malleville, 95880 Enghien les bains

A report made by the VIEDOC company 2 rue de Hélène Boucher, 78280 Guyancourt, FRANCE

For any further information: contact@veillesalon.com ‐ Tel : +33(0)1 30 43 45 27 Websites : www.veillesalon.com and www.viedoc.fr

EXHIBITION WATCH REPORT

Mobile IT & Big Data
23rd-25th of October, 2012
Paris, Porte de Versailles
Applications for mobile, Business Intelligence

TABLE OF CONTENTS

ABSTRACT
RESUME
Part 1. Innovations on mobile IT
  1.1 Background on Gamification on mobile
    1.1.1 Definition of gamification
    1.1.2 Gamification market forecast
    1.1.3 Innovations from platform providers
  1.2 Nomalys by Nomalys (Mobile application)
  1.3 Teopad by Thales
Part 2. Big data
  2.1 Background on Big Data
    2.1.1 Defining big data
    2.1.2 Characteristics of Big Data: The four Vs
    2.1.3 The Importance of Big Data
    2.1.4 Estimations of IT spending driven by Big Data issues
  2.2 Big Data Architecture Capabilities and their primary technologies
    2.2.1 Comparison of information architectures
    2.2.2 Storage and Management Capability
    2.2.3 Database Capability
    2.2.4 Processing Capability
    2.2.5 Data Integration Capability
    2.2.6 Statistical Analysis Capability
  2.3 Trends on big data
    2.3.1 The Internet of Things already here
    2.3.2 Getting to the right business model(s) for data
    2.3.3 Adding a Social layer to traditional activities
    2.3.4 The New Frontier of Business Intelligence & Semantics at petabyte scale
  2.4 Key companies in the Big Data exhibition in Paris
    2.4.1 Data Publica
    2.4.2 Altic
    2.4.3 Talend
Conclusion
About VEILLE SALON
PRESENTATION of VIEDOC SARL

DISCLAIMER

This report was compiled from interviews we conducted with the exhibitors present at each event, from information gathered and analyzed at the conferences, and from information subsequently compiled from the web. The data contained in this report are therefore provided for information only. Although the objective is to disseminate timely and accurate information, VEILLE SALON cannot guarantee the result. Any damage that may result from the use of this information cannot be imputed to this site. The use or reproduction of all or part of this document is prohibited without the prior written consent of VEILLE SALON. For the full terms and conditions of use of this report, please contact us.

ABSTRACT

According to the organizers, the Mobile IT and Big Data exhibitions did not attract many visitors. Major players such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were absent. The conferences on the evolution of these sectors, however, were very successful. In a gloomy atmosphere in which visitors and exhibitors talked openly about tiny information technology budgets, some sectors were nevertheless healthy and innovative. This was the case for equipment manufacturers, developers of next-generation telephony and web hosting providers. There were some impressive innovations in the field of smartphones, coming from a large number of young companies specializing in mobile business solutions. The advent of smartphones and tablets is revolutionizing enterprise mobility. Judicious use of interfaces borrowed from the video game industry makes applications more playful and easier for customers to adopt. This phenomenon, known as "gamification", is about to take off commercially in the short term. The conferences on Big Data drew quite a crowd and allowed visitors to discover an emerging sector that should weigh heavily in the development of enterprises. In only 10 years, the amount of data has increased exponentially. Data storage is a costly problem for businesses, yet these data remain relatively untapped. The idea of big data is to create added value from very diverse data. People now talk about flows, exchanges and collaboration rather than storage. Nothing is sorted but everything can be found. Big Data (from 10 TB of data upwards) is revolutionizing information technology infrastructure. Environments such as Hadoop provide flexibility in resources and adapt to the workload by adding inexpensive servers in parallel. Big Data generated revenue of $17 billion in 2011, and this figure is expected to double by 2016. The great debate around big data is how to strike a balance between data transparency and the privacy of citizens.

Key words: mobile, gamification, smartphone, security, big data, data scientists, hadoop, business intelligence

RESUME

By the organizers' own admission, the Mobile IT and Big Data exhibitions did not attract many visitors, and major players such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were absent. The technical and societal conferences on the evolution of these sectors were nevertheless a great success. In a gloomy atmosphere in which visitors and exhibitors speak openly of falling information technology budgets, some sectors are in robust health: equipment manufacturers, developers of next-generation telephony and hosting providers, to name but a few. Innovation is particularly flourishing in the field of smartphones, with a large number of young companies specializing in professional mobile solutions. The arrival of smartphones and tablets is revolutionizing mobility in the enterprise. Judicious use of interfaces from the video game industry gives applications a playful side, which helps users adopt them. This is known as "gamification", a phenomenon expected to take off commercially in the very short term. The conferences on Big Data introduced visitors to a nascent sector that should weigh very heavily in the development of businesses. The volume of data has exploded over the past 10 years. Data storage is a costly issue for companies, yet these data are relatively little exploited. The idea of big data is to create added value from data of very diverse kinds. The reasoning is now in terms of flows, exchange and collaboration rather than storage. Nothing is classified but everything can be found. Big Data (from 10 TB of data upwards) is revolutionizing information technology infrastructures. Environments such as Hadoop offer great flexibility in resources and adapt to the workload by adding inexpensive servers in parallel. Big Data already generated revenue of 17 billion dollars in 2011, and this figure is expected to double by 2016. The great debate raised by the increasingly fine-grained exploitation of big data will be where to set the cursor between data transparency and respect for citizens' privacy.

Key words: big data, storage, data, value creation, geolocation, mobility, smartphone, server, Hadoop

PART 1. INNOVATIONS ON MOBILE IT

1.1 BACKGROUND ON GAMIFICATION ON MOBILE

1.1.1 Definition of gamification

Gamification is the use of games or competition to encourage a user to complete an action or set of actions. Users respond to a range of prompts and are encouraged to return regularly to the application. The prompts include:

What makes gamification so attractive is the fact that we generally enjoy actively participating and engaging with others through entertainment. It is in our human nature to interact and be entertained with playful applications, particularly when engaging game design elements are employed. Consumer games and digital entertainment continue to attract attention given the public's interest in games. Compelling game mechanics and design are at the core of an engaging user experience. Gamification must therefore work to enhance the user experience in order to better engage, retain, motivate and promote overall participation. Gamification takes advantage of game mechanics to deliver engaging applications and make non-game applications more entertaining and appealing. By deploying these dynamics in a co-ordinated application, a company can use games to motivate behaviours and drive outcomes for both the customer and the organisation.

1.1.2 Gamification market forecast

The application of game mechanics in non-traditional industries has grown exponentially in the past 18 months. This is due in part to the growth of social and mobile games, as well as to increasing consumer adoption of social media.

M2 Research estimates that market spending on gamification solutions, which apply game mechanics and behavioral analytics in non-traditional applications, will reach $242 million by the end of 2012, more than double the 2011 figure. The revenue estimates comprise a number of components, including:

1. Platform vendor revenue
2. Agency and production revenue
3. Internal development

1.1.3 Innovations from platform providers

2012 is a milestone year for gamification; as it grows, it will evolve into a serious component of consumer and employee engagement. It will be critical for both platform providers and deploying organizations to understand that implementing gamification is not a short-term strategy. It is a long-term commitment that requires diligence in audience research, application design and activation/maintenance in order to benefit ultimately from the opportunities that gamification principles offer.

Despite the anticipated growth rates, gamification will remain a market that potential customers of platform providers evaluate carefully. Mobile IT has taken real advantage of gamification in its applications, and the main innovations displayed at the Mobile IT exhibition in Paris came from platform providers.

1.2 NOMALYS BY NOMALYS (MOBILE APPLICATION)

Address: 46 rue Auguste Blanqui, 94250 Gentilly, France
Tel: 01 46 65 21 58
Fax: 01 79 73 55 89

Contact: Celine BLANC
Email: contact@nomalys.com
Website: http://www.nomalys.com/

NOMALYS gives mobile professionals using a smartphone or tablet (iPhone, iPad, Android, BlackBerry and Windows Phone 8) access to all of their company's strategic data.

Source: Nomalys, 2012

Every company equipped with a structured IT system can connect it to the Nomalys application. The application's ergonomics, engine and algorithms have been designed to be generic, which means that any IT system can be browsed from any mobile device with the same ergonomics and colorful user interface. However, Nomalys is not only a way to make your CRM or ERP mobile. It is also an opportunity for each company to use the power and innovative ergonomics of Nomalys to build an application that displays its full range of products and services. Access is immediate, intuitive, dynamic and secure. Users can be alerted in real time to any important event occurring in their database.

Source: Nomalys, 2012

The connection is made to existing CRM or ERP software. This gives access to data such as clients, prospects, stocks, invoices, quotations, payroll, human resources and complaints.

Source: Nomalys, 2012

With the solution developed by NOMALYS, your software becomes mobile, dynamic, interactive and fully promoted. Nomalys received a Convergence 2012 award at the Mobile IT exhibition. Nomalys has developed close partnerships with CNRS and Institut Telecom to develop unique algorithms.

1.3 TEOPAD BY THALES

Address: Thales Communications & Security, 45 rue de Villiers, 92200 Neuilly-sur-Seine Cedex
Website: http://www.thalesgroup.com

Contact: Raphaël BINET, Product Marketing Manager
Email: Raphael.binet@thalesgroup.com
Tel: +33 1 46 13 29 52
Mobile: +33 6 08 17 93 91

TEOPAD is a security solution for professional applications on smartphones and tablets, developed by Thales for companies and public services. TEOPAD creates a secure professional environment on the terminal that can coexist with an open personal context. This professional environment takes the form of an application that is started, after strong user authentication, from a simple icon on the terminal's native desktop. The user then accesses a second desktop, which constitutes his/her professional environment. This environment is completely isolated from the personal, native part by a patented sandboxing technology.

Source: Thales, 2012

This part is entirely encrypted and controlled, and contains all the applications, data and settings the user needs for his/her business activity:

Applications of all types: web browser, e-mail client, viewers, note pads, telephony client, business applications, etc.

Documents, contact database, personal organizer, e-mail archives, etc.

The innovations developed by Thales give TEOPAD significant differentiators with respect to other solutions on the market:

Flexibility in choosing the terminal: for a given OS, the solution may be deployed on most of the terminals on the market that use this OS.

Flexibility in choosing the applications: for a given OS, most of the applications available on the market may be hosted and protected in the secure environment. This applies to native applications as well as to third-party applications or applications developed by the company for its own needs.

Protection of the information in all its forms: information remains vulnerable when it is manipulated, transmitted or stored. There is therefore little point in encrypting only e-mails or telephony, as most current solutions do. TEOPAD protects information in every context in which it is edited, viewed or exchanged.

Flexibility of the secure perimeter: thanks to the "TEOPAD Market Place", the company can make new secure applications available to its employees at any time. For instance, the applications can be adapted to the employees' missions or business trips. This flexibility enables an employee to travel in complete safety with a terminal whose content is strictly adapted to his/her needs. He/she can leave with a terminal containing no professional context, which is then downloaded securely once he/she has reached his/her destination.

Simplicity of deployment for the user: once he/she has received his/her means of authentication, the user downloads the TEOPAD application and his/her customized professional context from the "TEOPAD Market Place" available on his/her company's intranet.

User-friendly interface: TEOPAD fully preserves the ergonomics of the native OS and of the applications used.

No additional specific infrastructure: TEOPAD connects very simply to the existing information system. There is no need to deploy proprietary servers or gateways, which greatly limits costs.

Offer of high-quality professional services dedicated to the users.

Flexible operation: it may be partially or completely entrusted to a trusted third party.

Source: Thales, 2012

The TEOPAD sandboxing technology is a unique, patented technology that creates a duality on the terminal between two environments, professional and personal, which work simultaneously but independently, without resorting to proprietary applications. This technology does not rely on virtualization, which makes it particularly lightweight, with the corresponding benefits in terms of performance and battery life. Android applications are authorized to perform specific tasks or reach system components depending on the privileges they have received. The TEOPAD SANDBOX system controls these authorizations and then filters the exchanges between:

professional and personal applications;

professional applications and the operating system.

This mechanism allows the Information System Department to limit the interaction capabilities of professional applications with their environment. The ring-fenced professional environment is then generated and displayed in the form of a separate desktop on the terminal.
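
The filtering principle described above can be pictured as a policy check applied to every exchange. The following is a purely illustrative sketch, not Thales's implementation (which this report does not describe): the `App` and `SandboxPolicy` classes, the environment labels and the component names are all hypothetical, and leaving personal applications to the native OS rules is an assumption.

```python
from dataclasses import dataclass, field

PROFESSIONAL = "professional"
PERSONAL = "personal"


@dataclass(frozen=True)
class App:
    name: str
    environment: str  # PROFESSIONAL or PERSONAL


@dataclass
class SandboxPolicy:
    """Hypothetical policy defined by the Information System Department."""
    # System components each professional app may reach, e.g. {"mail": {"network"}}.
    allowed_system_access: dict = field(default_factory=dict)

    def allows_exchange(self, sender: App, receiver: App) -> bool:
        # Exchanges never cross the professional/personal boundary.
        return sender.environment == receiver.environment

    def allows_system_call(self, app: App, component: str) -> bool:
        # Assumption: personal apps stay under the native OS rules only.
        if app.environment == PERSONAL:
            return True
        return component in self.allowed_system_access.get(app.name, set())


mail = App("mail", PROFESSIONAL)
crm = App("crm", PROFESSIONAL)
game = App("game", PERSONAL)
policy = SandboxPolicy({"mail": {"network"}})

print(policy.allows_exchange(mail, game))        # professional -> personal
print(policy.allows_system_call(mail, "camera"))  # not granted to "mail"
```

In this toy model the professional mail client can talk to the professional CRM client and reach the network, but cannot exchange data with the personal game or reach a component it was not granted.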

The TEOPAD SANDBOX technology provides effective means of combating intrusions, information leaks and malicious tampering with professional applications. Its advantages are:

customized compartmentalization of professional applications and data with respect to the rest of the terminal;

professional desktop that can host any type of applications available on the market or developed by the company (no mandatory Thales proprietary application);

simultaneous operation of the professional and personal environments, with a single notification interface for the user (the native Android bar);

application content exclusively from the company's Teopad Market Place and entirely under control of the latter;

protection of professional data, including data being viewed, when they are no longer encrypted;

very small footprint on the terminal, which fully preserves its performance;

user-friendly interface maintained.

The TEOPAD SANDBOX compartmentalization service is offered independently of the local encryption service on the terminal. The two services are complementary.

Source: Thales, 2012

The TEOPAD solution is composed of the following elements:

For the user:
  o The TEOPAD application to be installed on the terminal.
  o The TEOPAD Market Place client application.

For the company:
  o The TEOPAD infrastructure is particularly light, as it does not require any proprietary element to connect the users to the information system.
  o It allows centralized, industrialized deployment and then operation of TEOPAD. In particular, the tools make it possible to create generic or customized profiles and to adapt to large fleets or to fleets specialized by business activity.

PART 2. BIG DATA

2.1 BACKGROUND ON BIG DATA

2.1.1 Defining big data

Big data typically refers to the following types of data:

Traditional enterprise data: includes customer information from CRM systems, transactional ERP data, web store transactions and general ledger data.

Machine-generated/sensor data: includes Call Detail Records ("CDR"), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust) and trading systems data.

Social data: includes customer feedback streams, micro-blogging sites like Twitter and social media platforms like Facebook.

The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020. But while it is often the most visible parameter, the volume of data is not the only characteristic that matters. Big Data is sized in petabytes, exabytes and, perhaps soon, zettabytes. Beyond volume, the approach to analysis must contend with data content and structure that cannot be anticipated or predicted. These analytics, and the science behind them, filter low-value or low-density data to reveal high-value or high-density data. As a result, new and often proprietary analytical techniques are required. Big Data also poses a broad array of interesting architecture challenges.
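
The two growth figures quoted above can be checked against each other with simple compounding. The sketch below assumes annual compounding over the 11 yearly steps from 2009 to 2020; the function names are illustrative only.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Total growth factor after compounding annual_rate for `years` years."""
    return (1 + annual_rate) ** years


def implied_annual_rate(multiple: float, years: int) -> float:
    """Constant annual growth rate that produces `multiple` over `years` years."""
    return multiple ** (1 / years) - 1


YEARS = 2020 - 2009  # assumption: 11 yearly compounding steps

multiple_at_40_percent = growth_multiple(0.40, YEARS)
rate_for_44x = implied_annual_rate(44, YEARS)

print(f"40% per year over {YEARS} years: {multiple_at_40_percent:.1f}x")
print(f"44x over {YEARS} years implies {rate_for_44x:.1%} per year")
```

Under these assumptions, 40% per year compounds to roughly 40x and a 44x increase corresponds to about 41% per year, so the two quoted figures are broadly consistent.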

2.1.2 Characteristics of Big Data: The four Vs

In fact, there are four key characteristics that define big data: Volume, Velocity, Variety and Value. It is often said that data volume, velocity, and variety define Big Data, but the unique characteristic of Big Data is the manner in which the value is discovered.

a) Volume.

Machine-generated data is produced in much larger quantities than traditional data. For instance, a single jet engine can generate 10 TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem. In practice, people speak of big data when the volume exceeds 10 TB.
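
A back-of-the-envelope calculation shows how the jet engine figures reach the petabyte range. The 10 TB per 30 minutes and 25,000 flights per day come from the text; the flight durations and engine counts are hypothetical inputs, and decimal units (1 PB = 1,000 TB) are assumed.

```python
TB_PER_ENGINE_PER_HALF_HOUR = 10   # figure quoted in the text
FLIGHTS_PER_DAY = 25_000           # figure quoted in the text
TB_PER_PB = 1_000                  # assumption: decimal units


def daily_volume_pb(flight_hours: float, engines: int) -> float:
    """Daily data volume in PB for a hypothetical flight length and engine count."""
    tb_per_flight = TB_PER_ENGINE_PER_HALF_HOUR * (flight_hours / 0.5) * engines
    return tb_per_flight * FLIGHTS_PER_DAY / TB_PER_PB


# Lower bound: one engine recorded for a single 30-minute window per flight.
lower_bound = daily_volume_pb(flight_hours=0.5, engines=1)

# Hypothetical scenario: twin-engine aircraft on two-hour flights.
twin_engine_two_hours = daily_volume_pb(flight_hours=2.0, engines=2)

print(f"Lower bound: {lower_bound:,.0f} PB per day")
print(f"Twin engine, 2 h flights: {twin_engine_two_hours:,.0f} PB per day")
```

Even the most conservative reading gives hundreds of petabytes per day, which supports the claim that this single source runs into the petabytes.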

b) Velocity.

Social media data streams – while not as massive as machine‐generated data – produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day).

c) Variety.

Traditional data formats tend to be relatively well described and change slowly. In contrast, non‐traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.

d) Value.

The economic value of different data varies significantly. Typically there is good information hidden amongst a larger body of non-traditional data; the challenge is identifying what is valuable and then transforming and extracting that data for analysis. With Big Data, the value is discovered through an iterative modeling process: make a hypothesis, create statistical, visual or semantic models, validate, then make a new hypothesis. This requires either a person interpreting visualizations or making interactive knowledge-based queries, or the development of adaptive "machine learning" algorithms that can discover meaning. And in the end, the algorithm may be short-lived.

2.1.3 The Importance of Big Data

The growth of big data is a result of the increasing channels and variety of data in today’s world. Some of the new data sources are user‐generated content through social media, web and software logs, cameras, information‐sensing mobile devices, aerial sensory technologies, genomics, and medical records.

Source: Cisco, "VNI Service Adoption Forecast, 2011-2016", May 2012

Companies have realized that there is competitive advantage in this information and that now is the time to put this data to work. To make the most of big data, enterprises must evolve their IT infrastructures to handle the rapid delivery of extreme volumes of data, with varying data types, which can then be integrated with the organization's other enterprise data for analysis.

When big data is distilled and analyzed in combination with traditional enterprise data, enterprises can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation, all of which can have a significant impact on the bottom line.

For example, in the delivery of healthcare services, management of chronic or long-term conditions is expensive. Use of in-home monitoring devices to measure vital signs and monitor progress is just one way that sensor data can be used to improve patient health and reduce both office visits and hospital admissions.

Manufacturing companies deploy sensors in their products to return a stream of telemetry. Sometimes this is used to deliver services like OnStar, which delivers communications, security and navigation services. Perhaps more importantly, this telemetry also reveals usage patterns, failure rates and other opportunities for product improvement that can reduce development and assembly costs.

The proliferation of smartphones and other GPS devices offers advertisers an opportunity to target consumers when they are in close proximity to a store, a coffee shop or a restaurant. This opens up new revenue for service providers and offers many businesses a chance to target new customers.

Retailers usually know who buys their products. Use of social media and web log files from their e-commerce sites can help them understand who didn't buy and why they chose not to, information not available to them today. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies.

Finally, social media sites like Facebook and LinkedIn simply wouldn't exist without big data. Their business model requires a personalized experience on the web, which can only be delivered by capturing and using all the available data about a user or member.

2.1.4 Estimations of IT spending driven by Big Data issues

The huge volumes of data generated by today's digital businesses, known as "big data", will drive $28 billion of worldwide IT spending in 2012 and $34 billion in 2013, according to a forecast from Gartner, the IT research firm.

At the same time, Gartner predicted that by 2015, 4.4 million IT jobs will be created to support big data, including 1.9 million in the US, but warned that there will be a scramble for the limited number of IT professionals qualified to fill these jobs.

Sales totaling $232 billion are projected across all categories in the forecast from 2011 to 2016. Growth from $24.4 billion in 2011 to $43.7 billion in 2016 represents a 12.42% CAGR for the total market.
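
For readers who want to reproduce the growth rate, the compound annual growth rate (CAGR) can be recomputed from the two endpoints. This is a minimal sketch assuming five compounding periods between 2011 and 2016; the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted in the text, in billions of dollars.
market_cagr = cagr(24.4, 43.7, 2016 - 2011)

print(f"CAGR 2011-2016: {market_cagr:.2%}")
```

With the rounded endpoints above, the result comes out at roughly 12.4%, close to the quoted figure; the small remaining gap presumably reflects rounding in the published endpoints.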

2.2 BIG DATA ARCHITECTURE CAPABILITIES AND THEIR PRIMARY TECHNOLOGIES

2.2.1 Comparison of information architectures

Big data differs from other data realms in many dimensions. In the following table you can compare and contrast the characteristics of big data alongside the other data realms.

Source: Oracle, 2012

These different characteristics have influenced how you capture, store, process, retrieve, and secure your information architectures. As you evolve into Big Data, you can minimize your architecture risk by finding synergies across your investments, allowing you to leverage your specialized organizations and their skills, equipment, standards, and governance processes.

Here is an example of a data flow architecture diagram in which big data is used for combined analytics.

Source: Oracle, 2012

2.2.2 Storage and Management Capability

a) Hadoop Distributed File System (HDFS)

HDFS has two main layers:

Namespace
  o Consists of directories, files and blocks.
  o Supports all the namespace-related file system operations such as create, delete, modify and list files and directories.

Block Storage Service, which has two parts:
  o Block Management (which is done in the Namenode)
      Provides datanode cluster membership by handling registrations and periodic heartbeats.
      Processes block reports and maintains the location of blocks.
      Supports block-related operations such as create, delete, modify and get block location.
      Manages replica placement and replication of a block for under-replicated blocks, and deletes blocks that are over-replicated.
  o Storage, which is provided by datanodes that store blocks on the local file system and allow read/write access.

In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated, that is, the Namenodes are independent and don't require coordination with each other. The datanodes are used as common storage for blocks by all the Namenodes. Each datanode registers with all the Namenodes in the cluster. Datanodes send periodic heartbeats and block reports and handle commands from the Namenodes.

Here are the key benefits:

Namespace Scalability - HDFS cluster storage scales horizontally but the namespace does not. Large deployments, or deployments using a lot of small files, benefit from scaling the namespace by adding more Namenodes to the cluster.

Performance - File system operation throughput is limited by a single Namenode in the prior architecture. Adding more Namenodes to the cluster scales the file system read/write operations throughput.

Isolation - A single Namenode offers no isolation in a multi-user environment. An experimental application can overload the Namenode and slow down production-critical applications. With multiple Namenodes, different categories of applications and users can be isolated to different namespaces.
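
The block management duties and the federation model described above can be illustrated with a toy in-memory model. This is a sketch only, not Hadoop code: the `ToyNamenode` class, its method names and the namespace and block identifiers are hypothetical, and the replication target of three is an assumption matching the commonly cited HDFS default.

```python
from collections import defaultdict

REPLICATION_TARGET = 3  # assumption: three replicas per block


class ToyNamenode:
    """Hypothetical stand-in for a Namenode's block management role."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.datanodes = set()                    # registered datanode ids
        self.block_locations = defaultdict(set)   # block id -> datanodes holding it

    def register(self, datanode_id):
        self.datanodes.add(datanode_id)

    def block_report(self, datanode_id, block_ids):
        """Replace what this Namenode knows about one datanode's blocks."""
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode_id)

    def under_replicated(self):
        return sorted(b for b, holders in self.block_locations.items()
                      if len(holders) < REPLICATION_TARGET)

    def over_replicated(self):
        return sorted(b for b, holders in self.block_locations.items()
                      if len(holders) > REPLICATION_TARGET)


# Federation: two independent Namenodes share the same pool of datanodes.
sales, logs = ToyNamenode("/sales"), ToyNamenode("/logs")
for dn in ("dn1", "dn2", "dn3", "dn4"):
    for nn in (sales, logs):
        nn.register(dn)  # each datanode registers with every Namenode

for dn in ("dn1", "dn2", "dn3", "dn4"):
    sales.block_report(dn, ["s-1"])   # four copies: one too many
for dn in ("dn1", "dn2"):
    logs.block_report(dn, ["l-1"])    # two copies: one too few

print(sales.over_replicated(), logs.under_replicated())
```

Each toy Namenode tracks only the blocks of its own namespace and never consults the other, which mirrors the independence that federation relies on for scalability and isolation.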

By way of conclusion, here are the main characteristics of HDFS known by developers:

o An Apache open source distributed file system (http://hadoop.apache.org)
o Expected to run on high-performance commodity hardware
o Known for highly scalable storage and automatic data replication across three nodes for fault tolerance
o Automatic data replication across three nodes eliminates need for backup
o Write once, read many times
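To make the idea of three-way replication concrete, here is a minimal Python sketch. It assumes a simple round-robin placement, which is a deliberate simplification and not the actual HDFS placement policy; the function names, block names and node names are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        start = i % len(nodes)
        # Consecutive positions on the ring are distinct nodes.
        placement[block] = [nodes[(start + k) % len(nodes)]
                            for k in range(replication)]
    return placement


def surviving_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica after some nodes fail."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items()
            if set(holders) - failed]


placement = place_replicas(["b0", "b1", "b2", "b3"],
                           ["n0", "n1", "n2", "n3", "n4"])
after_two_failures = surviving_blocks(placement, ["n0", "n1"])
after_three_failures = surviving_blocks(placement, ["n0", "n1", "n2"])
```

With three replicas per block, any two node failures leave every block readable in this example, whereas losing the three nodes that hold the same block makes that block unavailable.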

b) Cloudera Manager:

Cloudera Manager is the market‐leading management platform for CDH (Cloudera's Distribution, including Apache Hadoop). As the industry’s first end‐to‐end management application for Apache Hadoop, Cloudera Manager sets the standard for enterprise deployment by delivering granular visibility into and control over every part of CDH ‐ empowering operators to improve cluster performance, enhance quality of service, increase compliance and reduce administrative costs.

Here are the main characteristics of Cloudera Manager:

o Cloudera Manager is an end-to-end management application for Cloudera's Distribution of Apache Hadoop (http://www.cloudera.com)
o Cloudera Manager gives a cluster-wide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster; and incorporates a full range of reporting and diagnostic tools to help optimize cluster performance and utilization.

2.2.3 Database Capability

a) Oracle NoSQL

Oracle NoSQL Database delivers scalable throughput with bounded latency, easy administration, and a simple programming model. It scales horizontally to hundreds of nodes with high availability and transparent load balancing. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database.

Source: Oracle, 2012

Here are the main characteristics of Oracle NoSQL:

o Dynamic and flexible schema design
o High-performance key-value pair database. A key-value pair is an alternative to a pre-defined schema. Used for non-predictive and dynamic data
o Able to efficiently process data without a row and column structure
o Major + Minor key paradigm allows multiple record reads in a single API call
o Highly scalable multi-node, multiple data center, fault tolerant, ACID operations
o Simple programming model, random index reads and writes
o Not Only SQL. Simple pattern queries and custom-developed solutions to access data, such as Java APIs
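The "Major + Minor key" paradigm listed above can be sketched as follows. This is a toy in-memory store written for illustration only; it is not the Oracle NoSQL API, and the class name, method names and sample keys are all hypothetical. Records that share a major key are grouped together, so one call can return all of them.

```python
from collections import defaultdict


class MajorMinorStore:
    """Toy key-value store with two-part keys (illustrative only)."""

    def __init__(self):
        # major key -> {minor key -> value}
        self._data = defaultdict(dict)

    def put(self, major, minor, value):
        self._data[tuple(major)][tuple(minor)] = value

    def get(self, major, minor):
        """Read one record, or None if it does not exist."""
        return self._data.get(tuple(major), {}).get(tuple(minor))

    def multi_get(self, major):
        """Return every record stored under one major key in a single call."""
        return dict(self._data.get(tuple(major), {}))


store = MajorMinorStore()
store.put(["user", "42"], ["profile", "name"], "Alice")
store.put(["user", "42"], ["profile", "email"], "alice@example.com")
store.put(["user", "43"], ["profile", "name"], "Bob")

# One call fetches both records of user 42 and nothing from user 43.
records = store.multi_get(["user", "42"])
```

Grouping by major key is what would let a distributed store keep related records on the same node, so that a multi-record read does not have to contact several nodes.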

b) Apache HBase

Apache HBase™ is the Hadoop database, a distributed, scalable, big data store. You can use Apache HBase when you need random, real-time read/write access to your Big Data. The project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Here are the main characteristics of Apache HBase:

o Allows random, real-time read/write access
o Strictly consistent reads and writes
o Automatic and configurable sharding of tables
o Automatic failover support between Region Servers
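Sharding a table by row key, as mentioned in the list above, can be pictured with the following Python sketch. It assumes that a table is cut into contiguous key ranges, each served by one server; the class name, the split keys and the server names are hypothetical, and this is not the HBase client API.

```python
import bisect


class RegionMap:
    """Toy row-key range sharding (illustrative only)."""

    def __init__(self, split_keys, servers):
        # N split keys define N + 1 contiguous key ranges.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)

    def server_for(self, row_key):
        # bisect_right sends a key equal to a split point to the upper range.
        index = bisect.bisect_right(self.split_keys, row_key)
        return self.servers[index]


regions = RegionMap(split_keys=["g", "p"], servers=["rs1", "rs2", "rs3"])
lookups = {key: regions.server_for(key) for key in ["alice", "g", "mike", "zoe"]}
```

Because the ranges are sorted, locating the server for a row is a binary search, and a range that grows too large could be split by adding one more split key.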

c) Apache Cassandra

The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data.

Here are the main characteristics of Apache Cassandra:

o Data model offers column indexes with the performance of log-structured updates, materialized views, and built-in caching
o Fault tolerance capability is designed for every node, replicating across multiple datacenters
o Can choose between synchronous or asynchronous replication for each update
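The choice between synchronous and asynchronous replication for each update can be illustrated with a toy Python model. It is not Cassandra's API: the class, its methods and the `sync_replicas` parameter are hypothetical names introduced for this sketch. Each write is applied immediately to a chosen number of replicas and queued for the others.

```python
class ReplicatedRegister:
    """Toy model of a value replicated on several nodes (illustrative only)."""

    def __init__(self, replica_count):
        self.replicas = [None] * replica_count
        self.pending = []          # (replica index, value) not yet applied

    def write(self, value, sync_replicas):
        """Apply to `sync_replicas` replicas now; queue the rest for later."""
        if not 1 <= sync_replicas <= len(self.replicas):
            raise ValueError("sync_replicas out of range")
        for index in range(len(self.replicas)):
            if index < sync_replicas:
                self.replicas[index] = value            # synchronous part
            else:
                self.pending.append((index, value))     # asynchronous part
        return sync_replicas       # acknowledgements the writer waits for

    def flush(self):
        """Deliver the queued asynchronous updates in order."""
        for index, value in self.pending:
            self.replicas[index] = value
        self.pending.clear()


register = ReplicatedRegister(replica_count=3)
register.write("v1", sync_replicas=3)      # fully synchronous update
register.write("v2", sync_replicas=1)      # mostly asynchronous update
stale_view = list(register.replicas)       # some replicas still hold "v1"
register.flush()
```

The trade-off shown here is the usual one: waiting for more replicas makes a write slower but leaves fewer stale copies, while waiting for fewer replicas is faster but readers may briefly see the old value.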

d) Apache Hive

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and query it using a SQL-like language called HiveQL. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Hive is based on Hadoop, which is a batch processing system. As a result, Hive does not and cannot promise low latencies on queries. The paradigm is strictly one of submitting jobs and being notified when they complete, as opposed to real-time queries. In systems such as Oracle, analysis is run on a significantly smaller amount of data but proceeds much more iteratively, with response times between iterations of less than a few minutes. By contrast, Hive response times can be of the order of several minutes even for the smallest jobs, and larger jobs (e.g., jobs processing terabytes of data) may run for hours.

In summary, low-latency performance is not the top priority of Hive's design principles. What Hive values most are scalability (scaling out with more machines added dynamically to the Hadoop cluster), extensibility (with the MapReduce framework and UDF/UDAF/UDTF), fault tolerance, and loose coupling with its input formats.

Here are the main characteristics of Hive:

o Tools to enable easy data extract/transform/load (ETL) from files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase
o Uses a simple SQL-like query language called HiveQL
o Query execution via MapReduce
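The idea of projecting structure onto raw data at query time can be sketched in Python as follows. This is only an illustration of the principle, not Hive itself: the file content, the column names and the `page_views` table are hypothetical, and the aggregation is written by hand instead of being compiled from HiveQL.

```python
# Raw tab-separated lines, as they might sit in a file on HDFS (hypothetical data).
raw_lines = ["u1\tFR\t3", "u2\tDE\t5", "u3\tFR\t1"]

# The structure is declared separately from the data and applied when reading.
schema = [("user", str), ("country", str), ("views", int)]


def project(line, schema, delimiter="\t"):
    """Turn one raw line into a typed record according to the schema."""
    fields = line.split(delimiter)
    return {name: cast(value) for (name, cast), value in zip(schema, fields)}


def total_views_by_country(lines, schema):
    """Roughly what a query such as
    SELECT country, SUM(views) FROM page_views GROUP BY country
    would compute over these lines."""
    totals = {}
    for line in lines:
        row = project(line, schema)
        totals[row["country"]] = totals.get(row["country"], 0) + row["views"]
    return totals


totals = total_views_by_country(raw_lines, schema)
```

The raw file is never rewritten: changing the schema only changes how the next query reads it, which is one reason such a system stays loosely coupled with its input formats.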

2.2.4 Processing Capability

a) MapReduce

Source: Oracle, 2012

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model.

Here are the main characteristics of MapReduce:

o Defined by Google in 2004
o Breaks a problem up into smaller sub-problems
o Able to distribute data workloads across thousands of nodes
o Can be exposed via SQL and in SQL-based BI tools
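The map and reduce functions described above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and is not a Hadoop job: `run_mapreduce` is a hypothetical helper that stands in for the framework and groups the intermediate values by key between the two phases.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word of a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


# Input records are (line number, text) pairs.
documents = [(0, "big data is big"), (1, "data is everywhere")]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```

In a real cluster the map calls would run in parallel on the nodes that hold the input, and the grouping step would move the intermediate pairs across the network; only the two user functions would stay the same.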

b) Apache Hadoop

Apache Hadoop is 100% open source, and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big. And in today's hyper-connected world, where more and more data is being created every day, Hadoop's breakthrough advantages mean that businesses and organizations can now find value in data that was recently considered useless.

Here are the main characteristics of Apache Hadoop:

o Leading MapReduce implementation
o Highly scalable parallel batch processing
o Highly customizable infrastructure
o Writes multiple copies across the cluster for fault tolerance

2.2.5 Data Integration Capability

a) Oracle Big Data Connectors, Oracle Loader for Hadoop, Oracle Data Integrator

Built from the ground up by Oracle, Oracle Big Data Connectors delivers a high‐performance Hadoop to Oracle Database integration solution and enables optimized analysis using Oracle’s distribution of open source R analysis directly on Hadoop data. By providing efficient connectivity, Big Data Connectors enables analysis of all data in the enterprise – both structured and unstructured.

Here are the main characteristics of Big Data Connectors:

o Exports MapReduce results to RDBMS, Hadoop, and other targets
o Connects Hadoop to relational databases for SQL processing
o Includes a graphical user interface integration designer that generates Hive scripts to move and transform MapReduce results
o Optimized processing with parallel data import/export
o Can be installed on Oracle Big Data Appliance or on a generic Hadoop cluster

2.2.6 Statistical Analysis Capability

a) Open Source Project R and Oracle R Enterprise:

R is a free software environment for statistical computing and graphics. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Here are the main characteristics of project R:

o Programming language for statistical analysis
o Introduced into Oracle Database as a SQL extension to perform high-performance in-database statistical analysis
o Oracle R Enterprise allows reuse of pre-existing R scripts with no modification

2.3 TRENDS ON BIG DATA

At the Big Data exhibition in Paris, few innovations were actually on display, because the Big Data world is still evolving continuously in a direction that no one can really predict. Experts therefore mostly exchanged views on what they are doing and, most importantly, on how they see the future of Big Data. They all agreed on one thing: Big Data is going to fuel the 21st century, and it is almost impossible to forecast how large its economic impact will be. A new kind of job is emerging: the data scientist. However, all the experts pointed out that there will be a shortage of talent for these jobs. The advance of Big Data shows no signs of slowing. Data scientists are difficult and expensive to hire and, given the very competitive market for their services, difficult to retain. There simply are not a lot of people with their combination of scientific background and computational and analytical skills.

From the conferences, it was possible to identify some trends in the Big Data world.

2.3.1 The Internet of Things already here

Not so long ago, the “Internet of Things” (a vast collection of small devices seamlessly connected to the Net) was still just a concept in research papers. Before you know it, it is here and, like Monsieur Jourdain, people do not quite fully understand it. Even if you think that calling your smartphone a “Thing” is debatable (and yet it is a “Thing” that sends lots and lots of information to many servers worldwide), you would be amazed by the number of anonymous devices that are already fully connected. For example, La Poste has worked with Exalead on connecting to the Net the opto-electronic machines that it uses to filter and sort our mail. It then uses all the information gathered to build a full-fledged business intelligence tool, used to monitor the system operationally. Another example: did you know that high-end car manufacturers have turned their vehicles into “Things” that keep sending monitoring information to central servers to ensure better service and maintenance? One has to understand that every such “Thing” creates huge logs of, literally, hundreds of billions of records: that is more than the number of pages on the entire Web!

2.3.2 Getting to the right business model(s) for data

Data is the new frontier these days: Big Data, Open Data, DaaS (Data as a Service), whatever one calls it. Data is like software in that it is very scalable: one invests heavily to create data sets, and then sells them by the millions, with zero or very small marginal costs. At least that is how the theory goes. In fairness, it is hard to say that anybody has cracked the right business model for data. For instance, one interesting question remains: to be scalable, a data set needs to be reusable by many applications and developers. But then the value of such a data set is probably very low, unless it is absolutely needed to build everybody's application and you have exclusivity, which is likely to be a very rare case, especially with Open Data. At the other end of the spectrum, using the Big Data artillery to build a very specific data set can yield a very exclusive “product” that can only be used by one or maybe a handful of non-competing companies. Such a data set can be very expensive (to build and to buy), and can also create a lot of value for the company that uses it. But that is an entirely different business model from the intrinsically scalable one of the software industry (especially SaaS). At least until someone cracks it.

2.3.3 Adding a Social layer to traditional activities

This is also a very interesting trend: using social networks like Twitter to produce real-time “voice of the customer” applications. Indeed, Facebook knows what you are doing, Twitter knows what you are saying, and Google knows what you are thinking. For instance, Mesagraph is working with broadcasters to build iPad applications connected to TV programs so that you can comment and interact with other viewers in real time while you are watching a show. That is truly revolutionary: finally, a way to connect back to the broadcasters. Consumers quite obviously stand to benefit, but at the same time, think of the implications in terms of advertising. Real-time advertising, even. Fine-grained audience segmentation. This is an entirely new field with all sorts of promises and challenges. Another very interesting application, presented at WWW2012, is the use of tweets to monitor the Netflix media streaming service by detecting tweets containing phrases like “is out” (come on, guys, you can do better than that). Even with very simple heuristics, about 90% of outages were correctly detected.
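A phrase-based heuristic of the kind mentioned above can be sketched in a few lines of Python. The method actually presented at WWW2012 is not described in this report, so everything below is an assumption made for illustration: the second phrase ("is down"), the threshold of three matching tweets per minute, the function name and the sample tweets.

```python
from collections import Counter

# "is out" comes from the text above; "is down" is an added assumption.
OUTAGE_PHRASES = ("is out", "is down")


def outage_minutes(tweets, threshold=3):
    """Return the minutes in which at least `threshold` tweets look like
    outage reports. `tweets` is an iterable of (minute, text) pairs."""
    matches_per_minute = Counter()
    for minute, text in tweets:
        lowered = text.lower()
        if "netflix" in lowered and any(p in lowered for p in OUTAGE_PHRASES):
            matches_per_minute[minute] += 1
    return sorted(m for m, n in matches_per_minute.items() if n >= threshold)


# Hypothetical sample: a burst of complaints at minute 10, noise elsewhere.
sample_tweets = [
    (10, "Netflix is out again"),
    (10, "netflix is down for me"),
    (10, "Is Netflix down? netflix is out"),
    (11, "Watching a film on Netflix"),
    (12, "Netflix is out"),
]
flagged = outage_minutes(sample_tweets)
```

A substring test this crude also matches harmless sentences such as "Netflix is outstanding", which is why a threshold on the number of matching tweets, rather than a single match, is used here to limit false alarms.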

2.3.4 The New Frontier of Business Intelligence & Semantics at petabyte scale

The Internet of Things is making petabyte scales a reality today (a petabyte is 1,000 terabytes, or 1,000,000 gigabytes). A copy of the entire Web amounts to several petabytes. Big Data technologies are therefore needed to handle such a vast amount of data, and one has to perform some form of Business Intelligence to make sense of it. There are two major breakthroughs for handling this challenge. On the one hand, RAM-based databases, where data is organized in “columns” as opposed to “rows”, allow very fast processing of large quantities of data (as long as this data fits in RAM, that is). Slicing and dicing couldn't be any faster or easier. On the other hand, search engines, which are “columnar” in essence, are evolving to handle many more kinds of data (semantic, numeric, etc.), are becoming more and more transactional (“ACID”, in barbarian terms) and can process even larger data sets, since they do not require that entire data sets fit in RAM. You get to choose your favorite. But one thing is clear: semantic treatment of textual data will be a major requirement for next-generation Business Intelligence platforms. That is the next frontier for Big Data. And search engines are uniquely positioned to win this race.
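The difference between organizing data in rows and in columns can be shown with a small Python sketch. The order records and field names are hypothetical, and plain Python lists stand in for the in-memory structures of a real columnar database.

```python
# Row-oriented layout: one complete record per entry.
rows = [
    {"order_id": 1, "country": "FR", "amount": 120.0},
    {"order_id": 2, "country": "DE", "amount": 80.0},
    {"order_id": 3, "country": "FR", "amount": 50.0},
]

# Column-oriented layout: one contiguous list per attribute.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(rows):
    """Aggregating over rows touches every whole record."""
    return sum(row["amount"] for row in rows)


def total_column_store(columns):
    """Aggregating over columns scans only the attribute that is needed."""
    return sum(columns["amount"])


def total_for_country(columns, country):
    """A simple slice: filter on one column, aggregate another."""
    return sum(amount
               for c, amount in zip(columns["country"], columns["amount"])
               if c == country)


grand_total = total_column_store(columns)
french_total = total_for_country(columns, "FR")
```

Both layouts hold the same information; the columnar one simply lets an aggregation read two short lists instead of every field of every record, which matters once the table no longer has three rows but billions.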

2.4 KEY COMPANIES AT THE BIG DATA EXHIBITION IN PARIS

2.4.1 Data Publica

Address: Data Publica, 8 rue Jouffroy d’Abbans, 75017 Paris, France
Website: http://www.data-publica.com/
Contact: M. François BANCILHON
Mail: francois.bancilhon@data-publica.com

Created in July 2011, Data Publica is one of the leading, longest-established open data players in France. The company benefited from technological investments made in 2010 as part of an R&D project. It was initially funded by a group of "angels" and the seed fund IT Translation. Data Publica assembles data sets built from both public data and open data, and then sells these data sets to companies to help them build innovative applications. Data Publica describes itself as a “Data Vendor”: in the domain of Open Data, it is what “Software Vendors” are to the domain of software.

2.4.2 Altic

Address: 95 Avenue Victor Hugo, 93360 NEUILLY PLAISANCE
Tel: 09 53 64 63 69
Website: http://www.altic.org/
Contact: Marc SALLIERES (CEO), contact@altic.org

ALTIC is an ALTernative of Information and Communication. It is an open source software integrator created in June 2004, and a founding member of the ASS2L. ALTIC assists companies and administrations in implementing open source management software. It works in the following domains and with the following open source solutions: Business Solutions (SpagoBI, Talend, JasperReports, BIRT, LemonOLAP), Management Solutions (Compiere, Vtiger, SQL/Ledger), Communication Solutions (Joomla!, Tutos, LemonLDAP). Altic also supports the LemonLDAP project, an open source Web SSO.

2.4.3 Talend

Address: Talend SA, 9 rue Pagès, 92150 Suresnes, France
Website: http://fr.talend.com/
Contact: M. Cédric CARBONE
Tel: +33 1 46 25 06 00
Mail: sales.fr@talend.com

Talend is one of the largest pure play vendors of open source software, offering a breadth of middleware solutions that address both data management and application integration needs. Since the emergence of data integration and data quality tools in the 1990s, and the more recent appearance of Master Data Management solutions, the data management market has been dominated by a small ‐ and quickly consolidating ‐ number of traditional vendors offering proprietary, closed solutions, which only the largest and wealthiest organizations can afford. The situation in the application integration space is quite similar, with significant consolidation occurring as well. As a result, only a minority of organizations use commercial solutions to meet their data management and application integration needs. Indeed, these solutions not only demand a steep initial investment, but they also often require significant resources to manage implementation and ongoing operation. Furthermore, companies are faced with exponential growth in the volume and heterogeneity of the data and applications they need to manage and control. A key challenge that IT departments face today is ensuring the consistency of their data and processes by using modeling tools, workflow management and storage, the foundations of data governance in any company today. This challenge is actually faced by organizations of all sizes ‐ not only the largest corporations. In just a few years, Talend has become the recognized market leader in open source data management. The acquisition in 2010 of Sopera, a leader in open source application integration, has reinforced Talend’s market coverage, creating a global leader in open source middleware. Many large organizations around the globe use Talend's products and services to optimize the costs of data integration, data quality, Master Data Management (MDM) and application integration. 
With an ever growing number of product downloads and paying customers, Talend offers the most widely used and deployed data management solutions in the world.

CONCLUSION

According to the organizers, the Mobile IT and Big Data exhibition did not attract many visitors. Big companies such as Orange, SFR, Bouygues and Free for telecoms, or Intel, Dell and IBM for Big Data, were absent. The conferences on the evolution of these sectors, however, were very successful. In a gloomy atmosphere where visitors and exhibitors talked openly about tiny information technology budgets, some sectors were nevertheless quite healthy and innovative. This was the case for equipment manufacturers, developers of next-generation telephony and web providers. There were some impressive innovations in the field of smartphones, coming from a large number of young companies specializing in mobile business solutions. The advent of smartphones and tablets is revolutionizing enterprise mobility. Judicious use of interfaces borrowed from the video game industry produces playful applications that are friendlier for customers to use. This is the "gamification" phenomenon, which is expected to take off commercially in the short term.

The conferences on Big Data drew quite a crowd and allowed visitors to discover an emerging sector that should weigh heavily in the development of enterprises. In only 10 years, the amount of data has increased exponentially. Data storage is a costly problem for businesses, yet these data remain relatively untapped. The idea of Big Data is to create added value from very diverse data. People now talk about flows, exchanges and collaborations rather than storage. Nothing is sorted but everything can be found. Big Data (from 10 TB of data upwards) is revolutionizing information technology infrastructure. Environments such as Hadoop provide flexibility in resources and adapt to the workload by adding inexpensive servers in parallel. Big Data generated revenue of $17 billion in 2011, and this figure is estimated to double by 2016. The great debate around Big Data is how to find a balance between data transparency and citizens' privacy. Big Data is rapidly emerging as a market force, not just a single market unto itself. Big Data IT services spending is expected to grow at a 10.20% CAGR from 2011 to 2016. By 2020, Big Data functionality will be part of the baseline of enterprise software, with enterprise vendors using it to enhance the value of their applications.
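The growth figures quoted above can be checked with a short calculation. The sketch below only applies the standard compound annual growth rate (CAGR) formula to the numbers given in this conclusion; the function names are hypothetical.

```python
def grow(value, rate, years):
    """Value reached after compounding `rate` once a year for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# A 10.20% CAGR sustained over the five years from 2011 to 2016.
services_growth_factor = grow(1.0, 0.1020, 5)

# Revenue of $17 billion in 2011 doubling to $34 billion by 2016.
doubling_rate = implied_cagr(17.0, 34.0, 5)
```

A 10.20% CAGR multiplies spending by roughly 1.63 over five years, whereas doubling in five years corresponds to a rate of about 14.9% per year. The two figures are therefore compatible only if they measure different things, here IT services spending on one side and overall Big Data revenue on the other.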

ABOUT VEILLE SALON

Officially launched in early 2010 by VIEDOC Consulting, a business, competitive and technological intelligence company, VeilleSalon.com is the first professional service for watching and reporting on trade show innovations for companies. It is based on one of the largest global directories of trade shows, symposiums and other international events. This professional service is designed for visitors and companies as well as for exhibitors and trade show organizers. Through a bilingual directory, VEILLE SALON has already referenced more than 7,500 exhibitions and international events, sorted and searchable by business area:

o Industrial sector: Aerospace, Agriculture, Agribusiness, Automotive, Materials, Construction, Consumer goods, Cosmetics, Electronics, Defense, Energy, Optics, Pharmaceuticals, Telecommunications ...
o Tertiary sector: Banking / Insurance, Hospitality, Real Estate, Media / Advertising, Human Services, Tourism ...
o Business area: Chemistry, Design / Architecture, Distribution, Packaging, Education / Training, Health & Environment, Computing, Innovation, Maintenance, Mechanical, Quality, Human Resources.

Besides powerful multi-criteria search features (dates, places, keywords, sectors, organizers, exhibitors ...), VeilleSalon.com also offers visitors a customized and interactive calendar of forthcoming exhibitions, a monthly newsletter, a forum and many other services. For potential exhibitors and event organizers, VeilleSalon.com is a real communication tool: registration of new events, presentation of your company and of its latest news (product and process innovations, new services), free or paid conference proceedings, real-time information for the visitor ... VeilleSalon.com is also a forum where visitors can contact you directly to best prepare their visit and get information about your company.

Why offer a professional service dedicated to trade show innovation watching? Watching trade show innovations is an ideal way to identify and analyze competitors, suppliers, new products, equipment and services, to detect technology transfers and innovations, to develop business with potential new customers and to improve knowledge of markets and trends. The VEILLE SALON team, made up of experienced consultants and seasoned business intelligence engineers from VIEDOC Consulting, therefore offers a range of services: reporting on trade show innovations in France and abroad, on-site support for individuals at events, on-demand investigations and interviews, staff training ... So whether you are a company wishing to maximize your trade show innovation watch, a future exhibitor or an event organizer, we have developed tailored solutions to meet your expectations. To access our website: http://www.veillesalon.com.

PRESENTATION OF VIEDOC SARL

VIEDOC CONSULTING's core business is information. VIEDOC is your company's partner from strategy to operation. VIEDOC aims to assist its customers in the first stages of their activities (business intelligence, knowledge management, competitive analysis, technological watch, market research, patent monitoring, benchmarking, technology transfers, state of the art ...) by collecting and analyzing information relevant to their business. Business intelligence does not necessarily require permanent in-house skills, but it does require getting the right information at the right time. VIEDOC has worked for customers over both extended and short periods of time to assist them in decision making. VIEDOC advises companies from all industries (automotive, aerospace and defense, food, cosmetics, health, materials, optics, packaging, telecommunications ...). VIEDOC can assist companies that are ambitious and aware of the importance of investing at this level: from the small innovative company looking for strategic advice within tight deadlines, up to major industrial groups anxious to keep their leadership position.

Methodology: We have a pragmatic approach built on a rigorous methodology covering the collection, processing, analysis and dissemination of high added-value information. Through its multi-sector experience, VIEDOC provides its clients with services tailored to their needs by listening to their concerns and adapting to their requirements and methods. To successfully help its customers at different stages of the life of their company (from creation to recovery), of their products (from design to sale) or of their projects (from the first study to the end of the project), VIEDOC works on both process and product innovation. VIEDOC deals with both technical and economic information. You can benefit from our experience as specialists in collecting and analyzing value-added information, and from our methodologies and analytical capacity, to obtain qualified and validated information. As experts in technology transfer identification, we have consistently broadened our multi-sector vision by bringing our professionalism and expertise to many clients, both large industrial groups and SMEs, in a dozen distinct sectors. This experience allows us today to offer our customers a meaningful analysis that does not neglect any technical, economic, legal or human implications and that fully complies with the ethical rules that guide all the activities of our company.

www.veillesalon.com

A service made by:

VIEDOC SARL

2 Rue Hélène Boucher 78280 Guyancourt (France) Tel : +33 (0)1 30 43 45 27 Email : info@viedoc.biz Website : www.viedoc.fr
