neural nets: how regular expressions brought about deep learning
TRANSCRIPT
For millennia people have asked
Why?
What is mind? What is consciousness?
How does my brain work?
We need a new science!
Exploring how neurons work since the 1890s™
Guillaume-Benjamin-Amand Duchenne (de Boulogne)
Fast forward to 1943, just after Turing
in the USA
Warren McCulloch (neuroscientist)
Walter Pitts (logician)
published
“A Logical Calculus of the Ideas Immanent in Nervous Activity”
during WW2
which tried to understand the brain
by modelling the neurons
they didn’t know it but...
Their paper had an enormous impact on computer science
Shortly after, in 1951
Stephen Kleene
set out to prove what
these neural nets
could compute
In 1956
Kleene proposed a simple algebra
(well, simple for algebra)
that defined a
Regular Language
WTF’s a Regular Language?
Let’s start with
an alphabet
{0,1,2,3,4,5,6,7,8,9}
some words
123012334
23848484
a language
0040050070012089
But what about a Regular Language?
If you can decide whether a “word” is in a “language” by inspecting
its “letters” one at a time, in sequence
a bit like... [FSM diagram]
then it’s a Regular Language
Isn’t everything regular?
Nope
for example
Palindromes are not Regular
aabbaa
abccba
abcddcba
abcdeedcba
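Why not? Deciding a palindrome requires remembering an unbounded amount of the input, which no machine with a fixed, finite set of states can do. A minimal sketch of that intuition (the function name is illustrative):

```python
# Checking a palindrome needs unbounded memory: you must remember the
# whole first half of the word to compare against the second half.
# A machine with finitely many states cannot store arbitrarily long prefixes.

def is_palindrome(word):
    stack = []
    for letter in word:          # first pass: remember everything seen
        stack.append(letter)
    for letter in word:          # second pass: compare against the reversal
        if letter != stack.pop():
            return False
    return True

print(is_palindrome("abcdeedcba"))  # True
print(is_palindrome("abcde"))       # False
```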
So we call such a recogniser a
Finite State Machine or
Deterministic Finite Automaton
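The “inspect letters one at a time” idea can be sketched directly as code. This is a minimal DFA in Python, using a hypothetical example language (binary strings with an even number of 1s) rather than one from the slides:

```python
# A minimal sketch of a Deterministic Finite Automaton (DFA).
# Membership in the language is decided by inspecting letters one at a
# time, in sequence, tracking only the current state.

def make_dfa(transitions, start, accepting):
    """Return a membership test for the language of this DFA."""
    def accepts(word):
        state = start
        for letter in word:
            if (state, letter) not in transitions:
                return False          # no transition for this letter: reject
            state = transitions[(state, letter)]
        return state in accepting
    return accepts

# Two states: "even" / "odd" number of 1s seen so far.
even_ones = make_dfa(
    transitions={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    },
    start="even",
    accepting={"even"},
)

print(even_ones("1010"))  # True: two 1s
print(even_ones("111"))   # False: three 1s
```

Note the machine never looks back at earlier letters; its finite state is all the memory it has.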
In 1956
Kleene proved that
a Finite State Machine
is equivalent to
a McCulloch-Pitts neural net
He showed a Regular Language can be defined using a
grammar with 3 operators
Concatenation
{"ab", "c"}{"d", "ef"}= {"abd", "abef", "cd", "cef"}
Alternation
{"ab", "c"}|{"ab", "d", "ef"}= {"ab", "c", "d", "ef"}
Kleene Star
{"ab", "c"}* = {"", "ab", "c", "abab", "abc", "cab", "cc",
"ababab", "abcab", ... }.
Look familiar?
TADA!
A Regular Expression defines a Regular Language
00(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*
00[1-9][0-9]*
00[1-9]\d*
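That last form runs unchanged in Python's `re` module. A quick check, reusing the example word from earlier in the talk:

```python
# The regular expression from the slides: "00", then a non-zero digit,
# then any number of digits.
import re

pattern = re.compile(r"00[1-9]\d*")

print(bool(pattern.fullmatch("0040050070012089")))  # True: 00, then 4, then digits
print(bool(pattern.fullmatch("000123")))            # False: third character is 0
```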
But hold up….
In 1956
Kleene didn’t mention computers
he was considering what
neural nets could compute
so how did Regular Expressions end up central to Computer Science?
Fast-forward to 1968
Ken Thompson
published
and introduced Regular Expressions in his port of the
editor QED for CTSS
and the next year
wrote “ed”
which is still part of *nix today
It has the command
g/{some command}/p
g/re/p
grep
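ed's `g/re/p` lives on as the standalone `grep` command. A sketch, assuming a POSIX shell (file path and contents are illustrative):

```shell
# "g/re/p": globally print every line matching a regular expression.
printf 'item 00123\nitem 0456\nitem 00987\n' > /tmp/lines.txt

# Lines containing "00" followed by a non-zero digit:
grep '00[1-9]' /tmp/lines.txt
```

This prints `item 00123` and `item 00987`, skipping `item 0456`.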
Then in the 80s
Larry Wall
created Perl
It has a further extended syntax
with features like
capturing subgroups: (\d{5})(?:-(\d{4}))?
backrefs: (.*)\1
lookarounds: (?<=a)bcd
recursion: (\w)(?:(?R)|\w?)\1
which is more expressive and flexible
so this is the default in Perl, Java, JS, Ruby, PHP, etc.
(*features may be missing from some implementations)
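Several of these extensions can be tried in Python's `re` module, which implements the Perl-style syntax (though, as the slide warns, `(?R)` recursion is one of the features missing from the stdlib engine):

```python
import re

# Backreference: (.+)\1 matches a string that is some chunk repeated twice.
print(bool(re.fullmatch(r"(.+)\1", "abab")))  # True: "ab" + "ab"
print(bool(re.fullmatch(r"(.+)\1", "abc")))   # False

# Lookbehind: (?<=a)bcd matches "bcd" only when preceded by "a".
print(re.search(r"(?<=a)bcd", "xabcd").group())  # "bcd"
print(re.search(r"(?<=a)bcd", "xbcd"))           # None

# Capturing subgroups: the ZIP+4 pattern, with the 4-digit part optional.
m = re.fullmatch(r"(\d{5})(?:-(\d{4}))?", "12345-6789")
print(m.groups())  # ('12345', '6789')
```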
These RegEx aren’t always equivalent to a
Finite State Machine
for example: Palindromes
in fact, in the 1950s
Noam Chomsky
defined a hierarchy that splits up the languages
Perl “RegEx” covers Context-Free Languages (and some other ones too)
so...
“RegEx” ⊃ Regular Expression
What about today?
RegEx is the workhorse of data extraction
from Log collation
to Web Scraping
in fact
import.io may generate a RegEx when you train extractors
We wanted to try making this more accessible, so we put it up
as a labs API
regexpgen.import.io
Let’s check it out...
Remember what I said about
Warren McCulloch (neuroscientist)
Walter Pitts (logician)
Their paper had an enormous impact on computer science
Even though they weren’t thinking about computers at all
That paper not only gave rise to RegEx
but also
every neural network since, including
“Deep Learning”
using technology such as
Recurrent Neural Nets
and of course I have to mention
In fact at import.io
we use neural nets for complex data extraction tasks
where a rule based system, or even ML, is simply not practical
such as
working out the name of a company from their website
Let’s have a look...
I want to leave you with a final thought...
When you are training an extractor
and it generates a RegEx
that’s a Regular Expression
it’s defining a Regular Language
which can be detected by a Finite State Machine
which is equivalent to a McCulloch-Pitts neural net
which blows my mind
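The last link in that chain can be made concrete. A sketch of a single McCulloch-Pitts neuron, assuming the classic 1943 threshold formulation (the AND/OR examples are illustrative; networks of such units are what Kleene showed to be equivalent to Finite State Machines):

```python
# A McCulloch-Pitts neuron: binary inputs, fixed weights, and it "fires"
# (outputs 1) iff the weighted sum of its inputs reaches a threshold.

def mcculloch_pitts(weights, threshold):
    """Return a neuron: fires (1) iff sum(w * x) >= threshold."""
    def fire(inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return fire

# Logic gates as single threshold units:
AND = mcculloch_pitts([1, 1], threshold=2)
OR  = mcculloch_pitts([1, 1], threshold=1)

print(AND([1, 1]), AND([1, 0]))  # 1 0
print(OR([1, 0]), OR([0, 0]))    # 1 0
```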
Thank you
[email protected]