[d2 campus] tech meet-up `data science` presentation slides

Data Science & NAVER, Choi Jae-geol (최재걸), Integrated Search

Upload: naver-d2

Post on 12-Apr-2017

TRANSCRIPT

  • &

  • 2003

    : ?

    : . .

  • 2017

  • ..

  • 1.

  • ML DM STAT

    Data Mining (KDD)

    Machine Learning

    ( AI )

    Statistics

    From http://www.kdnuggets.com/2014/06/data-science-skills-business-problems.html

  • 1.1Data Mining

    From www.saedasayad.com

    - Solving everything
    - Algorithmic & Efficient

  • 1.2 Machine Learning

    From http://www.humphreysheil.com/blog/deep-learning-and-machine-learning

    - AI is all of computer science
    - Learn, learn and learn

  • 1.3 Statistics

    From www.quora.com

    - The World is probabilistic
    - Model and Distribution

    Too formal but strong

    https://www.quora.com/What-can-you-do-with-a-double-major-in-Computer-Science-and-Statistics

  • 1.4 Why statistics?

    Data Mining (KDD)

    Machine Learning

    ( AI )

    Statistics

    DATA Probability inevitably

    Association Rule (Conditional Probability)

    K-means (EM)

    1. NO BLACK BOX

    2. BREAKTHROUGH
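
The slide pairs Association Rule with Conditional Probability. As a minimal sketch of that link, the confidence of a rule A -> C can be read as an estimate of P(C | A). The basket data and the helper names `support` and `confidence` are hypothetical and do not come from the talk.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the data")
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy baskets, for illustration only.
baskets = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk"}),
    frozenset({"bread"}),
]

rule_confidence = confidence(baskets, frozenset({"milk"}), frozenset({"bread"}))
print(f"confidence(milk -> bread) = {rule_confidence:.3f}")
```

Here milk appears in three of four baskets and milk with bread in two, so the ratio is an empirical conditional probability and nothing more.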

  • 2003 2017

    2007

    2010

    2012

    AI

  • .

  • NEXT..

  • 2.

  • 1.

    ? ?

    ..

  • 2.

    . .

  • 3.

    Data Mining (KDD)

    Machine Learning

    ( AI )

    Statistics

  • 3.

  • 1.

  • 1.

    ? ? ? ?

  • 2.

    ? , , ? ? ? ?

  • &

    LDA

    Fraud detection

    Team matching

    ROBOT IR ( )

    TOPIC

  • UX - UI

    -7%

  • [Figure: boxplot. Pixel axis with ticks from 1000px to 5000px; label "User". Three groups of marked values: 1st 2750px, 2nd 3600px, 3rd 4550px; 1st 560px, 2nd 1060px, 3rd 2050px; 1st 560px, 2nd 610px, 3rd 720px. Valid max 4050 (98%).]
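
The boxplot on this slide marks 1st, 2nd and 3rd values in pixels. Assuming those labels denote quartiles, a minimal sketch of how such a summary could be computed follows. The sample values and the helper name `box_summary` are hypothetical and are not the talk's data.

```python
import statistics
from typing import Dict, List


def box_summary(values: List[float]) -> Dict[str, float]:
    """Return the quartiles and interquartile range that a boxplot draws."""
    ordered = sorted(values)
    # method="inclusive" interpolates between the sample minimum and maximum.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}


# Hypothetical per-user pixel measurements, for illustration only.
pixels = [300, 450, 560, 800, 1200, 1900, 2600, 3400, 4100]
summary = box_summary(pixels)
print(summary)
```

The interquartile range is the height of the box; a skewed sample like this one shows up as a median that sits closer to the first quartile than to the third.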

  • [Graph: y-axis ticks from -50000 to 350000, x-axis ticks from 0 to 4500, with three medians marked.]

  • 4.

  • NLP ? text ? ?

  • 5. &

  • 0. 5 .

    (1) Data Collection
    (2) Descriptive Statistics
    (3) Exploratory data analysis
    (4) Hypothesis testing
    (5) Estimation

  • 1. (Descriptive Statistics) -

    ?

    -> .

    -> , ,

    Median, quantile, variance,
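
As a small illustration of the measures named on this slide, the sketch below compares mean, median and variance on a sample with one extreme value. The numbers are hypothetical; the point is only that the median barely moves when an outlier is present, while the mean and variance do.

```python
import statistics

# Hypothetical measurements with a single outlier (250), for illustration only.
sample = [12, 14, 15, 15, 16, 18, 250]

mean_value = statistics.mean(sample)
median_value = statistics.median(sample)
variance_value = statistics.variance(sample)  # sample variance (n - 1 denominator)

# The same statistics with the outlier removed, for comparison.
trimmed = sample[:-1]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)

print(f"mean={mean_value:.2f} median={median_value} variance={variance_value:.2f}")
print(f"without outlier: mean={trimmed_mean:.2f} median={trimmed_median}")
```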

  • 1. (Descriptive Statistics) -

    [ multi modal ]

    , .

    ! , .

  • 2. (Exploratory Data Analysis ) - ( )

    ?

  • 2. (Exploratory Data Analysis ) -

    - Sequence Mining
    - Clustering
    - Classification
    - Topic modeling
    - Deep learning

    [ clustering ]
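
Clustering is one of the exploratory techniques listed here, and K-means is named earlier in the deck. The following is a minimal sketch of plain K-means (Lloyd's algorithm) on 2-D points. The data, the fixed starting centroids and the function name `kmeans` are assumptions made for illustration; the talk does not show an implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(
    points: Sequence[Point], initial_centroids: Sequence[Point], iterations: int = 20
) -> Tuple[List[Point], List[int]]:
    """Alternate assignment and centroid-update steps until nothing changes."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated: List[Point] = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(
                    (
                        sum(x for x, _ in members) / len(members),
                        sum(y for _, y in members) / len(members),
                    )
                )
            else:
                updated.append(centroids[k])  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical points forming two visibly separate groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assignment = kmeans(data, initial_centroids=[data[0], data[3]])
print(centers, assignment)
```

Starting centroids are passed in explicitly so the run is deterministic; in practice the result depends on initialization.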

  • 3. (Hypothesis testing) -

    ?

    -> : , ..

  • 3. (Hypothesis testing) -

    - P-value
    - T test, Chi square test
    - Likelihood ratio
    - Cross validation
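
To make the T test item concrete, here is a minimal sketch that computes Welch's t statistic and its approximate degrees of freedom for two independent samples. It stops short of a p-value, since that needs the Student t distribution, which the Python standard library does not provide. The two samples and the function name `welch_t` are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Per-sample variance of the mean: s^2 / n.
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t_stat, dof


# Hypothetical measurements from two groups, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.6, 4.3]

t_value, degrees = welch_t(group_a, group_b)
print(f"t = {t_value:.3f}, degrees of freedom = {degrees:.2f}")
```

A large absolute t relative to the t distribution with these degrees of freedom corresponds to a small p-value; a statistics package would normally be used for that last step.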

  • 4. (Estimation) - ,

  • 4. (Estimation) - ,

    - Bayesian Inference
    - Deep Learning
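
As one small instance of Bayesian inference for estimation, the sketch below updates a Beta prior on a success rate with binomial observations. Both distributions appear in the deck's distribution list, but this particular example, its numbers and the helper names are assumptions made for illustration.

```python
from typing import Tuple


def update_beta(alpha: float, beta: float, successes: int, failures: int) -> Tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior plus binomial counts gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1), then 7 hypothetical successes and 3 failures.
posterior = update_beta(1.0, 1.0, successes=7, failures=3)
estimate = beta_mean(*posterior)
print(f"posterior = Beta{posterior}, mean = {estimate:.3f}")
```

The posterior mean 8/12 sits between the prior mean 0.5 and the observed rate 0.7, which is the usual shrinkage effect of a prior.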

  • 5. (Data Collection) -

    O

    X

    ? SNS

  • 3.1 Agony..

    D-

    Drop

    ..

  • 3.2 Learn from problem-solving Gaussian Mixture Model for MUSIC ( 2012 )

    Beat

    ,

    .

    +

    .
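
The slide names a Gaussian Mixture Model applied to music beats but its details did not survive in this transcript. As a generic sketch only, the code below fits a two-component 1-D Gaussian mixture with EM. The inter-beat intervals, starting parameters and function names are hypothetical and do not reproduce the 2012 work.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(
    data: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
    weights: Sequence[float],
    iterations: int = 50,
    min_var: float = 1e-6,
) -> Tuple[List[float], List[float], List[float]]:
    """Run EM for a 1-D Gaussian mixture from the given starting parameters."""
    mu, var, w = list(means), list(variances), list(weights)
    k = len(mu)
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            var[j] = max(spread, min_var)  # floor avoids a collapsing component
            w[j] = nj / len(data)
    return mu, var, w


# Hypothetical inter-beat intervals in seconds: a fast and a slow group.
intervals = [0.48, 0.50, 0.52, 0.49, 0.51, 0.98, 1.00, 1.02, 0.99, 1.01]
fit_means, fit_vars, fit_weights = fit_gmm_1d(
    intervals, means=[0.4, 1.1], variances=[0.05, 0.05], weights=[0.5, 0.5]
)
print(fit_means, fit_weights)
```

This is the soft-assignment counterpart of K-means: the E-step replaces hard cluster labels with probabilities, which matches the "K-means ( EM )" pairing earlier in the deck.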

  • 3.3 Roughly saying about Statistics..

  • 3.5 .

    t

    F

    ()

    continuous discrete

    :

  • 3.5 .

    t

    F

    ()

    continuous discrete

    Bernoulli binomial

    Poisson

    multinomial multivariate normal

    gaussian

    beta

    dirichlet

    Student t

    Chi-square

    F

    Gamma

    -

  • 3.7 .

    From wikipedia

  • ..

    -Welcome!

    -

  • Q&A

  • Thank You