002 ugm2013 whats new final

Post on 12-Nov-2014

What's new…

Bernd Wiswedel

KNIME.com AG, Zurich, Switzerland

Two feature releases last year: 2.6 & 2.7

Documented in the changelog, the “What's new” summary, and as videos on YouTube

“What's new” page on knime.org

KNIMETV YouTube channel

Outline

Illustrative examples

• Swiss Survival Analysis

• KNIME Forum Analysis

• (Next Best Offer)

New Features in 2.6 & 2.7

Swiss Survival Analysis

• Survival Analysis / Actuarial Tables

• Using population and deaths data to predict longevity

• Creating the tables

• Investigating the tables

• Creating customer tables for:

• Overall

• Personal

• Historical

• Forecasting

• Make it easy to use for the non-expert!
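The slides do not show how the tables are computed. As a rough sketch only, assuming per-age population and death counts are available, a simple period life table could be derived as follows. The function name `life_table`, the toy numbers, and the mid-interval approximation q = m / (1 + m/2) are illustrative assumptions and are not taken from the workflow.

```python
def life_table(population, deaths, radix=100_000):
    """Build a simple period life table from per-age counts.

    population[i] and deaths[i] refer to age i (hypothetical inputs).
    Returns a list of dict rows with keys: age, qx, lx, ex.
    """
    n = len(population)
    qx = []
    for pop, d in zip(population, deaths):
        mx = d / pop                               # central death rate
        qx.append(min(1.0, mx / (1 + 0.5 * mx)))   # probability of dying at this age
    qx[-1] = 1.0                                   # close the table at the last age

    lx = [float(radix)]                            # survivors at exact age x
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))

    # Person-years lived per age interval (deaths assumed at mid-interval).
    Lx = [lx[i] - 0.5 * lx[i] * qx[i] for i in range(n)]

    rows = []
    for i in range(n):
        ex = sum(Lx[i:]) / lx[i]                   # remaining life expectancy
        rows.append({"age": i, "qx": qx[i], "lx": lx[i], "ex": ex})
    return rows


# Made-up toy data for three ages, only to show the call.
table = life_table(population=[1000, 900, 800], deaths=[100, 90, 400])
```

Each row then holds the death probability, the number of survivors out of the starting cohort, and the remaining life expectancy for that age.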

KNIME Forum Analysis

Learn something about the KNIME forum:

http://tech.knime.org/forum

Challenges:

• Get data into KNIME

• Extract simple statistics (how many posts, response time, response length)

• Classify topics and detect topic shifts

• Identify content and users

Forum Analysis – Get Data

Two alternatives:

• Connect to the underlying database and read the content

Doable but complicated and not generic: 7+ tables need to be read, prepared and joined

• Crawl the web pages and parse the HTML

• Use the XML parser and Palladian’s HTML retriever nodes
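In the talk the crawling is done with KNIME nodes. Purely as an illustration of the "crawl and parse" idea, the following Python sketch pulls thread links out of a page using only the standard library. The class and function names, the `/forum/` path marker, and the sample markup are assumptions; the real forum pages may be structured differently, and fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ThreadLinkParser(HTMLParser):
    """Collects href targets of <a> tags whose path contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_thread_links(html, base_url, marker="/forum/"):
    """Return absolute URLs of all links that look like forum threads."""
    parser = ThreadLinkParser(base_url, marker)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; the real forum markup may look different.
sample = (
    '<ul><li><a href="/forum/knime-general/123">Topic</a></li>'
    '<li><a href="/about">About</a></li></ul>'
)
links = extract_thread_links(sample, "http://tech.knime.org/forum")
```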

Forum Analysis – Structure of forum

Several categories: “KNIME General”, “KNIME Reporting”, “Palladian”, … (~20 in total)

Discussion threads on several sub-pages

Each thread consists of an initial post and a variable number of comments

Forum Analysis – Crawler Flow

Input for all subsequent workflows!

Forum Analysis – Simple Statistics

Input table from crawler workflow

Meta nodes perform simple preprocessing, e.g. average number of active users per month

Many different reporting nodes with different statistics. Reporting extension to generate PDF, DOC, …

Number of active users per year

An active user is a user with at least one comment or one post in that year.
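The definition above translates directly into a small aggregation. This is a sketch, assuming the crawler output can be reduced to (author, date) pairs covering both posts and comments; the function name and the sample rows are made up.

```python
from collections import defaultdict
from datetime import date


def active_users_per_year(records):
    """records: iterable of (author, date) for posts and comments alike."""
    users = defaultdict(set)
    for author, day in records:
        users[day.year].add(author)      # a set counts each user once per year
    return {year: len(names) for year, names in sorted(users.items())}


# Made-up rows; author names and dates are placeholders.
records = [
    ("alice", date(2010, 3, 1)),
    ("alice", date(2010, 7, 9)),
    ("bob", date(2010, 12, 24)),
    ("alice", date(2011, 1, 5)),
]
counts = active_users_per_year(records)
```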

Number of posts per year

Numbers are just posts (new discussion threads), not comments

Number of posts per month and year

Big increase in early 2011. Coincidentally, Simon Richards (richards99) joined

Who comments/answers on posts?

Response time

Number of comments per post

Forum Analysis – Classify Posts

• Use text mining to classify forum posts into categories such as ‘io’, ‘manipulation’, ‘mining’, …

• No training set available: (mis-)use the KNIME node descriptions

• See the evolution of discussion topics over the years

Want to classify forum posts (only the first post, no comments) using the KNIME node description text as labeled training set

Reads node descriptions from XML dumps (generated with the KNIME command line tool)

Uses the forum data input file and prepares it with text mining tools

Unzips an archive with all XML files into a temp location

XML files are read in a loop and preprocessed (header and footer removed)
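The workflow does this step with KNIME file handling and XML nodes. A comparable unzip-then-loop step might look like this in Python. The `<description>` element name and the function name are assumptions about the dump layout, and selecting one element here merely stands in for the header/footer removal mentioned above.

```python
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_descriptions(archive_path):
    """Unzip the archive to a temp dir and return {file name: description text}."""
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(tmp)
        # Loop over every extracted XML file.
        for xml_file in sorted(Path(tmp).rglob("*.xml")):
            root = ET.parse(xml_file).getroot()
            # Hypothetical layout: the text of interest sits in <description>.
            node = root.find(".//description")
            if node is not None:
                text = "".join(node.itertext())
                result[xml_file.name] = " ".join(text.split())  # normalize whitespace
    return result
```

The temporary directory is deleted automatically once all files have been read.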

The description is converted into a KNIME text document, from which (stemmed) terms are extracted

Training data extracted. Learning attributes are keyword occurrences; target is document category

Verify the model by splitting the data into train/test.

Using a random forest classifier to address the high dimensionality of the small (and sparse) data set
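The train/test verification can be sketched in a few lines of Python. Note that this does not implement a random forest: a deliberately trivial keyword-vote model is swapped in so the example stays self-contained, and all names and sample rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train_keyword_votes(train):
    """Stand-in model (NOT a random forest): per-keyword category counts."""
    votes = defaultdict(Counter)
    for keywords, category in train:
        for keyword in keywords:
            votes[keyword][category] += 1
    return votes


def predict(votes, keywords, default=None):
    """Sum the per-keyword category counts and return the top category."""
    total = Counter()
    for keyword in keywords:
        total.update(votes.get(keyword, {}))
    return total.most_common(1)[0][0] if total else default


def accuracy(votes, test):
    """Fraction of held-out rows whose category is predicted correctly."""
    hits = sum(predict(votes, keywords) == category for keywords, category in test)
    return hits / len(test)


# Made-up (keywords, category) rows standing in for node descriptions.
rows = [
    ({"read", "file", "csv"}, "io"),
    ({"write", "file"}, "io"),
    ({"filter", "row"}, "manipulation"),
    ({"join", "column", "row"}, "manipulation"),
    ({"tree", "learner"}, "mining"),
    ({"cluster", "learner"}, "mining"),
]
train, test = split_train_test(rows, test_fraction=0.5, seed=1)
score = accuracy(train_keyword_votes(train), test)
```

With so few rows the score itself is meaningless; the point is only that the model is fitted on one part of the data and judged on the other.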

… continuing with the main input branch (input table from crawler workflow)

Preprocessing similar to before, extracting date, author, title, …

Extracting the attribute table using the keywords from the node description (training) data.
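The important detail is that the forum posts are described with the vocabulary learned from the training documents, so both tables share the same columns. A minimal Python sketch, with made-up texts and hypothetical function names, and without the stemming used in the workflow:

```python
import re


def tokenize(text):
    """Lower-case word tokens; no stemming in this sketch."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(training_texts):
    """Sorted list of all terms seen in the training documents."""
    return sorted({term for text in training_texts for term in tokenize(text)})


def to_attribute_row(text, vocabulary):
    """Occurrence count per vocabulary term; terms outside the vocabulary are ignored."""
    tokens = tokenize(text)
    return [tokens.count(term) for term in vocabulary]


# Placeholder texts standing in for node descriptions and a forum post.
node_descriptions = ["Reads a CSV file", "Joins two tables"]
vocabulary = build_vocabulary(node_descriptions)
row = to_attribute_row("How do I read a csv file? The file is large.", vocabulary)
```

Here "read" in the post does not match "reads" in the vocabulary, which is the kind of mismatch that stemming is meant to remove.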

The remainder of the workflow ranks the predictions and prepares them for the report.

Hot topics have always been manipulation and mining … tasks that KNIME is very good at.

Note also the increase of ‘flowcontrol’ over the years and the low ‘r’ traffic (separate forum category, not part of this data set)

Forum Analysis – Content & Users

• Look at individual categories (KNIME General, Developer, Reporting, …)

• Learn what is discussed

• See who is contributing

Input are all discussions in one forum category…

Output is a multi-page report with tag cloud and user connection graph

Combines KNIME’s text and network mining extensions

Input table from crawler workflow

Main loop over all ~20 categories

General statistics per category

User network analysis

Text analytics

Text analysis: Forum posts converted to documents and tagged (persons, node names, node categories)

Terms fed into tag cloud, colors represent persons (‘kilian’), nodes (‘bow creator’), node categories (‘xml’), …

Network analysis: User connections (content ignored)

Network analysis: Ignore topics, only look at user relationships. Network nodes represent users, connections represent (directed) relationships between users
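One way to picture such a directed user graph is as a weighted edge list. In this sketch an edge points from the commenter to the author of the thread; that direction, the function name, and the sample threads are assumptions, since the slides do not say how the relationships are defined.

```python
from collections import Counter


def build_user_graph(threads):
    """threads: iterable of (thread_author, [comment_authors]).

    Returns a Counter mapping (commenter, thread_author) -> number of replies.
    The edge direction is an assumption made for this sketch.
    """
    edges = Counter()
    for author, commenters in threads:
        for commenter in commenters:
            if commenter != author:          # ignore replies to one's own thread
                edges[(commenter, author)] += 1
    return edges


# Made-up threads; user names are placeholders.
threads = [
    ("alice", ["bob", "carol", "alice"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),
]
edges = build_user_graph(threads)
```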

Network analysis: Very simple user graph, visualized with the standard KNIME graph viewer

Data collected and sent to the reporting extension

Multi-page PDF output for different forum categories

Text Mining forum category

RDKit (community chemistry extension)

KNIME Users – not dominated by any particular users

Reviewing all workflows

What do all these flows have in common? They all require the “Crawler” data

• All workflows rely on the same input data

• Requires re-running the “Crawler” workflow and updating parameters in each analysis flow

• Better: use a meta node and share it between all instances

Now use it in all the analysis flows

Nice … but now all workflows fetch the data each time they execute!

Let’s add a cache option.

Quickform node defining a switch:

- get data from the web, or

- use a cached file (lives on the server)
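The switch boils down to a small piece of branching logic. The sketch below mirrors it in Python; `load_forum_data`, its parameters, and the decision to refresh the cache after every fresh fetch are assumptions, not a description of the actual meta node.

```python
from pathlib import Path


def load_forum_data(use_cache, cache_file, fetch):
    """Return forum data either from a cached file or from a fresh fetch.

    use_cache  -- the switch (a Quickform in the slides, a bool here)
    cache_file -- path of the cached copy
    fetch      -- zero-argument callable that crawls and returns text
    """
    cache_file = Path(cache_file)
    if use_cache and cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    data = fetch()
    cache_file.write_text(data, encoding="utf-8")   # assumption: refresh the cache
    return data
```

Every analysis flow can then call the same function and only the first call (or a call with the switch off) pays for the crawl.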

Meta Node Templates

• Meta nodes as isolated functional units

• Shared on KNIME Server (or teamspace) for use in other workflows or by other users

• Quickforms to expose relevant parameters in the meta node dialog or in wizard execution

• Can also be used on the KNIME Server…

KNIME Web Portal

NBO as a typical Project

Collect training data from multiple sources:

- DB tables

- text files

- Excel files

- SAS files

- binary tables

- map files

Define file paths and parameters

Train and evaluate a number of prediction algorithms to predict variable Target

Retrieve old model that has been working decently so far

Compare performances and choose best model

Recalculate predictions based on best model and save

Save best model

Read current data
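The "compare performances and choose best model" step can be illustrated with a few lines of Python. This is a sketch under assumptions: models are compared by a single mean error in %, lower is better, and the model names and numbers are invented.

```python
def choose_best_model(candidates, old_model=None):
    """Pick the (name, error) pair with the lowest error.

    candidates -- dict of model name -> mean error in % on held-out data
    old_model  -- optional (name, error) of the previously saved model
    """
    pool = dict(candidates)
    if old_model is not None:
        name, error = old_model
        pool.setdefault(name, error)     # a freshly trained model of the same name wins
    best = min(pool, key=pool.get)
    return best, pool[best]


# Invented numbers, only to show the call.
candidates = {"decision tree": 12.5, "naive bayes": 15.0}
best = choose_best_model(candidates, old_model=("old model", 11.0))
```

In this toy run the old model is kept, because none of the newly trained candidates beats its error.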

NBO as an Example

Collect training data from multiple sources

Select best prediction model

Apply best model to score data

Select files and define parameters

Build a report

NBO Report

Mean Error in %

E-mail notification: Me.me@mycompany.com

Global Flow Variables

Quickform dialogs

Execution Wizard

File Upload Quickforms

Value Selection Quickform

Integer Input Quickform

“Workflow Stopped” light

Status

“Workflow Running” icon

“Workflow Running” light

Errors and Warnings

Report

Export report as

Results of past Executions

New Features in 2.6 & 2.7 - Highlights

• Enhanced database functionality

• File Handling node collection

• More flexible R integration

• Streaming API

• Better (Java) scripting support

• Hypothesis testing nodes

• UI Changes

Enhanced DB functionality

• Database update and delete

• New type support: Boolean and Blobs

File Handling Nodes

• Set of nodes to read, (un)zip, copy, move, convert, … files

• Add notion of uniform resource identifier (URI) and MIME types; used in 3rd party extensions

• Nodes to upload and download files: ssh, http, ftp, …

Hypothesis Testing Nodes

• Collection of nodes to extract statistical measures

• Different t-tests

• ANOVA

• (Crosstab)
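For readers unfamiliar with what a t-test computes, here is one common variant (Welch's two-sample test) in plain Python. It is shown only as an illustration: the slides do not say which variants the nodes offer, and the p-value lookup is omitted because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na           # squared standard error of mean A
    vb = variance(sample_b) / nb           # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Made-up samples, only to show the call.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```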

Flexible R integration

• Before KNIME 2.7:

• With KNIME 2.7:

Scripting – Java Snippet & friends

• Enhanced functionality:

• Define multiple outputs at once

• Script templates

• Better editor

• Syntax highlighting

• Auto completion

Streaming API

Enhanced programming interface in KNIME enabling nodes to be streamed and distributed.
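The following is not the KNIME API. It is only a Python analogy for what "streamed" means here, assuming the idea is that rows flow through connected steps one at a time instead of each step first building a complete table. All function names and the sample lines are made up.

```python
def read_rows(lines):
    """Source: yields one parsed row at a time."""
    for line in lines:
        yield line.rstrip("\n").split(",")


def filter_rows(rows, column, value):
    """Row-wise step: passes matching rows on without storing the table."""
    for row in rows:
        if row[column] == value:
            yield row


def count_rows(rows):
    """Sink: consumes the stream and counts the rows that arrive."""
    return sum(1 for _ in rows)


# Made-up input lines (user, category).
lines = ["alice,io\n", "bob,mining\n", "carol,io\n"]
n_io = count_rows(filter_rows(read_rows(lines), column=1, value="io"))
```

Because each step is a generator, no intermediate list of rows is ever held in memory, which is also what makes such steps easy to distribute.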

KNIME UI Changes

KNIME Explorer replaces “Workflow Projects”

Customizable node repository (getting from 1500+ nodes to <100)

Tons more …

Summary

Discussed KNIME usage examples; check the “Examples” server for even more

New functionality constantly added, thanks to community, partners and customers

And more is coming…
