neural nets: how regular expressions brought about deep learning

Post on 08-Jan-2017

Category:

Technology

TRANSCRIPT

Neural Nets: How Regular Expressions brought about Deep Learning

For millennia people have asked

Why?

What is mind? What is consciousness?

How does my brain work?

We need a new science!

Exploring how neurons work since 1890s™

Guillaume-Benjamin-Amand Duchenne (de Boulogne)

Fast forward to 1943, just after

After Turing

in the USA

McCulloch, Neuroscientist

Pitts, Logician

published

During WW2

which tried to understand

During WW2

by modelling the neurons

they didn’t know it but...

Their paper had an enormous impact on computer science

Shortly after in 1951

In 1956

Stephen Kleene

set out to prove what

their nets of model neurons

could compute

In 1956

Kleene proposed a simple algebra

(well, simple for algebra)

that defined a

Regular Language

WTF’s a Regular Language?

Let’s start with

an alphabet

{0,1,2,3,4,5,6,7,8,9}

some words

123012334

23848484

a language

0040050070012089

But what about a Regular Language?

If you can decide if a “word” is in a “language” by reading

its “letters” one at a time, in order, remembering only a fixed, finite amount about what came before

like

a bit like...

then it’s a Regular Language

Isn’t everything regular?

Nope

for example

Palindromes are not Regular

aa

abba

abccba

abcddcba

abcdeedcba

So we call

a bit like...

Finite State Machine or

Deterministic Finite Automaton

In 1956

Kleene proved that

a Finite State Machine

is equivalent to

a McCulloch-Pitts neural net

He showed a Regular Language can be defined using a

grammar with 3 operators

Concatenation

{"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}

Alternation

{"ab", "c"}|{"ab", "d", "ef"} = {"ab", "c", "d", "ef"}

Kleene Star

{"ab", "c"}* = {"", "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }

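The three operators above can be tried out on small sets of strings. This is only a sketch: the function names `concat`, `alternate` and `star` are made up for illustration, and because a real Kleene star is an infinite set, `star` here takes an assumed `max_len` cut-off.

```python
from itertools import product


def concat(a, b):
    """Concatenation: every word of a followed by every word of b."""
    return {x + y for x, y in product(a, b)}


def alternate(a, b):
    """Alternation: words that are in a or in b (plain set union)."""
    return a | b


def star(a, max_len):
    """Kleene star, truncated to words of at most max_len letters.

    The true star is infinite, so max_len is an assumption made
    purely so that the result can be printed.
    """
    result = {""}          # zero repetitions gives the empty word
    frontier = {""}
    while frontier:
        # Append one more word from a, keep only new, short-enough words.
        frontier = {w for w in concat(frontier, a) if len(w) <= max_len} - result
        result |= frontier
    return result


print(sorted(concat({"ab", "c"}, {"d", "ef"})))
print(sorted(alternate({"ab", "c"}, {"ab", "d", "ef"})))
print(sorted(star({"ab", "c"}, 4), key=lambda w: (len(w), w)))
```

Raising `max_len` simply yields more of the words that the "..." on the slide stands for.
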
Look familiar?

TADA!

A Regular Expression defines a Regular Language

00(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*

00[1-9][0-9]*

00[1-9]\d*
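A quick way to convince yourself that the three spellings above describe the same language is to run them side by side, next to a hand-written state machine. This is a sketch using Python's standard `re` module; the state names (`start`, `seen_0`, `seen_00`, `accept`, `dead`) and the helper names are invented for illustration.

```python
import re

# The three spellings from the slides. re.ASCII keeps \d limited to 0-9;
# without it Python's \d also matches other Unicode digits.
PATTERNS = [
    re.compile(r"00(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*", re.ASCII),
    re.compile(r"00[1-9][0-9]*", re.ASCII),
    re.compile(r"00[1-9]\d*", re.ASCII),
]


def regex_verdicts(word):
    """Whether each pattern matches the whole word."""
    return [p.fullmatch(word) is not None for p in PATTERNS]


def dfa_step(state, ch):
    """One transition of a hand-written finite state machine."""
    if state == "start":
        return "seen_0" if ch == "0" else "dead"
    if state == "seen_0":
        return "seen_00" if ch == "0" else "dead"
    if state == "seen_00":
        return "accept" if ch in "123456789" else "dead"
    if state == "accept":
        return "accept" if ch in "0123456789" else "dead"
    return "dead"  # once dead, always dead


def dfa_accepts(word):
    """Read the letters one at a time, remembering only the current state."""
    state = "start"
    for ch in word:
        state = dfa_step(state, ch)
    return state == "accept"


for word in ["0040050070012089", "123012334", "23848484", "00", "001"]:
    print(word, regex_verdicts(word), dfa_accepts(word))
```

The machine never looks back at earlier letters; the current state is all it remembers, which is exactly the "finite memory" idea from earlier.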

But hold up….

Kleene

didn’t mention computers in

his 1956 paper

he was considering

McCulloch and Pitts’s model neurons

so how did they end up central to Computer Science?

Fast-forward to 1968

Ken Thompson

published

and introduced Regular Expressions in his port of the

editor QED for CTSS

and the next year

wrote “ed”

which is still part of *nix today

It has the command

g/{some regular expression}/p

g/re/p

grep

Then in the 80s

Larry Wall

created Perl

It has a further extended syntax

with features like

capturing subgroups: (\d{5})(?:-(\d{4}))?

backrefs: (.*)\1

lookarounds: (?<=a)bcd

recursion: (\w)(?:(?R)|\w?)\1
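To see what the first three features do, here is a small sketch using Python's standard `re` module; the sample strings are made up. The recursion example is left out because, as far as I know, the standard `re` module does not support `(?R)`; engines such as Perl and PCRE do.

```python
import re

# Capturing subgroups: five digits, optionally followed by "-" and four more.
with_ext = re.fullmatch(r"(\d{5})(?:-(\d{4}))?", "12345-6789")
without_ext = re.fullmatch(r"(\d{5})(?:-(\d{4}))?", "12345")
print(with_ext.groups())     # both groups captured
print(without_ext.groups())  # the optional group is None

# Backreference: \1 must repeat exactly what group 1 captured,
# so the pattern matches "some text, written twice".
doubled = re.fullmatch(r"(.*)\1", "abcabc")
not_doubled = re.fullmatch(r"(.*)\1", "abcabd")
print(doubled.group(1), not_doubled)

# Lookbehind: match "bcd" only when an "a" sits immediately before it.
# The "a" itself is not part of the match.
after_a = re.search(r"(?<=a)bcd", "xabcd")
no_a = re.search(r"(?<=a)bcd", "xbcd")
print(after_a.group(0), after_a.span(), no_a)
```

The backreference is the interesting one: remembering an arbitrarily long piece of text and demanding it again is more than a fixed, finite memory can do.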

which is more expressive and flexible

so this is the default in Perl, Java, JS, Ruby, PHP, etc.

(*features may be missing from some implementations)

These RegEx aren’t always equivalent to a

Finite State Machine

for example: Palindromes

in fact in the

Noam Chomsky

defined a hierarchy that splits up the languages

Perl “RegEx” reaches into the Context-Free Languages (and, with backrefs, some that aren’t even context-free)

so...

“RegEx” ⊃ Regular Expression

What about today?

RegEx is the workhorse of data extraction

from Log collation

to Web Scraping

in fact

may generate a RegEx when you train extractors

We wanted to try making this more accessible, so we put it up

as a labs API

regexpgen.import.io

Let’s check it out...

Remember I said about

McCulloch, Neuroscientist

Pitts, Logician

Their paper had an enormous impact on computer science

Even though they weren’t thinking about computers at all

That paper not onlygave rise to RegEx

but also

every neural network since, including

“Deep Learning”

using technology such as

Recurrent Neural Nets

and of course I have to mention

In fact at

we use neural nets for complex data extraction tasks

where a rule-based system, or even ML, is simply not practical

such as

working out the name of a company from their website

I want to leave you with a final thought...

When you are training an extractor

and it generates a RegEx

that’s a Regular Expression

it’s defining a Regular Language

which can be detected by a Finite State Machine

which is equivalent to a McCulloch-Pitts neural net

which blows my mind
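As a loose illustration of that last link in the chain, here is a sketch of a net of threshold units that recognises the language of `00[1-9][0-9]*`. It is a simplification, not the 1943 construction itself: there are no inhibitory inputs, the neuron delays are replaced by a plain loop over the letters, and every name in it (`neuron`, `net_step`, `net_accepts`, the state names) is invented for this example.

```python
DIGITS = "0123456789"
STATES = ["start", "seen_0", "seen_00", "accept"]

# Transition table of the state machine: (state, symbol) -> next state.
# Any pair that is missing leads nowhere, i.e. the word is rejected.
TRANSITIONS = {("start", "0"): "seen_0", ("seen_0", "0"): "seen_00"}
for d in "123456789":
    TRANSITIONS[("seen_00", d)] = "accept"
for d in DIGITS:
    TRANSITIONS[("accept", d)] = "accept"


def neuron(inputs, threshold):
    """Threshold unit: fire (1) if enough binary inputs are active."""
    return 1 if sum(inputs) >= threshold else 0


def net_step(state_bits, ch):
    """Feed one letter through two layers of threshold units."""
    # Input lines: one binary line per symbol of the alphabet.
    symbol_bits = {d: int(d == ch) for d in DIGITS}
    # Layer 1: one AND unit per transition (threshold 2:
    # "we are in this state AND we see this symbol").
    and_out = {
        (s, d): neuron([state_bits[s], symbol_bits[d]], 2)
        for (s, d) in TRANSITIONS
    }
    # Layer 2: one OR unit per state (threshold 1:
    # "some transition into this state fired").
    return {
        t: neuron([and_out[k] for k, nxt in TRANSITIONS.items() if nxt == t], 1)
        for t in STATES
    }


def net_accepts(word):
    """Run the net over the word; accept if the 'accept' unit ends up firing."""
    state_bits = {s: int(s == "start") for s in STATES}
    for ch in word:
        state_bits = net_step(state_bits, ch)
    return state_bits["accept"] == 1


for word in ["0040050070012089", "123012334", "00", "001"]:
    print(word, net_accepts(word))
```

At every step at most one state unit is firing, so the pattern of active neurons plays the role of the machine's current state.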

Thank you

matthew.painter@import.io
