Regular expressions 1
Day 6 - 9/08/14
LING 3820 & 6820
Natural Language Processing
Harry Howard
Tulane University
Course organization
08-Sept-2014
NLP, Prof. Howard, Tulane University
http://www.tulane.edu/~howard/LING3820/
The syllabus is under construction: http://www.tulane.edu/~howard/CompCultEN/
Review
The quiz was the review.
Open Spyder
§4. Regular expressions
Regular expressions, or regex
>>> import re
re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string.
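A minimal sketch of what re.findall() returns, using made-up strings that are not from the slides: matches are collected left to right without overlapping, and a pattern that never matches gives an empty list.

```python
import re

# After each match, the search resumes where that match ended,
# so 'aa' is found twice in 'aaaa', not three times.
pairs = re.findall('aa', 'aaaa')
print(pairs)  # ['aa', 'aa']

# No match yields an empty list rather than None.
nothing = re.findall('xyz', 'aaaa')
print(nothing)  # []
```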
4.2. Fixed-length matching
The test string
>>> S = '''This above all: to thine own self be true,
... And it must follow, as the night the day,
... Thou canst not then be false to any man.'''
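The triple-quoted string keeps its line breaks as newline characters inside S, which matters once patterns use `.`. A quick check, written as a script rather than an interactive session:

```python
S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# The two line breaks are stored as '\n' characters in S.
print(S.count('\n'))        # 2
print(len(S.splitlines()))  # 3
```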
Strings as regular expressions
>>> re.findall(' be ', S)
[' be ', ' be ']
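A plain string matches itself only as long as it contains no metacharacters such as `.`, `|`, `(` or `[`. A minimal sketch using the standard re.escape(), which backslash-escapes such characters so they are matched literally:

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Unescaped, '.' matches any single character except a newline.
any_char = re.findall('.', S)
# Escaped, it matches only a literal period.
literal_dot = re.findall(re.escape('.'), S)

print(len(any_char) == len(S) - S.count('\n'))  # True
print(literal_dot)                               # ['.']
```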
Match one alternative of a disjunction with |
>>> re.findall(' to | be | it | as ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
>>> set(re.findall(' to | be | it | as ', S))
set([' it ', ' as ', ' to ', ' be '])
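Two details of `|`, sketched below. Because `|` has the lowest precedence of the regex operators, it splits the whole pattern: ' to|be ' means ' to' or 'be ', not ' to ' or ' be '. And since Python 3 prints a set as {...} in no guaranteed order, sorting the set gives a reproducible list.

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# The alternatives here are ' to' and 'be ', each with only one space.
alt = re.findall(' to|be ', S)
print(alt)  # [' to', 'be ', 'be ', ' to']

# set() removes duplicates; sorted() fixes the display order.
uniq = sorted(set(re.findall(' to | be | it | as ', S)))
print(uniq)  # [' as ', ' be ', ' it ', ' to ']
```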
Match a group of characters with capturing or non-capturing parentheses, ()
>>> re.findall(' (to|be|it|as) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (?:to|be|it|as) ', S)
[' to ', ' be ', ' it ', ' as ', ' be ', ' to ']
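By default, parentheses capture the substring they match, and when a pattern contains a capturing group, re.findall() returns only the captured text instead of the whole match. The ?: prefix turns capturing off, so findall() returns the whole match, spaces included. For the rest of this discussion, we use capturing groups so that the spaces are excluded from the output.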
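The standard re.finditer() yields match objects rather than strings, so both views are available at once: group(0) is the whole match and group(1) is the first capturing group. A sketch; the second pattern, with two groups and a `+` ("one or more"), is an extra illustration not taken from the slides, showing that findall() returns tuples when a pattern has more than one group.

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# group(0): whole match, spaces included; group(1): the captured word.
pairs = [(m.group(0), m.group(1))
         for m in re.finditer(' (to|be|it|as) ', S)]
print(pairs[0])  # (' to ', 'to')

# Two capturing groups: findall() returns one tuple per match.
tuples = re.findall(' (to|be) ([a-z]+)', S)
print(tuples)
# [('to', 'thine'), ('be', 'true'), ('be', 'false'), ('to', 'any')]
```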
Match one character from a range with [], and one character outside that range with [^]
>>> re.findall(' ([a-z][a-z]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([^0-9][^0-9]) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' ([a-e][a-e]) ', S)
['be', 'be']
>>> re.findall(' ([^a-e][^a-e]) ', S)
['to', 'it', 'to']
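A negated class matches any character outside the listed range, including uppercase letters, punctuation, spaces and newlines, not only the letters one might have in mind. A sketch that collects every character of S that is neither a lowercase letter nor a space:

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# [^a-z ] matches one character that is not in a-z and is not a space.
others = re.findall('[^a-z ]', S)
print(others)
# ['T', ':', ',', '\n', 'A', ',', ',', '\n', 'T', '.']
```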
Match a given number of repetitions of the preceding character with {}
>>> re.findall(' ([a-z]{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
Match any character (except a newline, by default) with .
>>> re.findall(' (..) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
>>> re.findall(' (.{2}) ', S)
['to', 'be', 'it', 'as', 'be', 'to']
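Because `.` skips newlines by default, a pattern cannot cross a line break in S unless the standard re.DOTALL flag is passed as the third argument to findall(). A minimal sketch:

```python
import re

S = '''This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man.'''

# Without the flag, '.' cannot match the '\n' after 'day,'.
plain = re.findall('day,.Thou', S)
# With re.DOTALL, '.' matches newlines as well.
dotall = re.findall('day,.Thou', S, re.DOTALL)
print(plain, dotall)  # [] ['day,\nThou']
```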
Next time
4.2.7. and following