Regular Expressions · regexp-module · Author: gvwilson · Created: 10/21/2010 1:22:46 PM
Regular Expressions
More Tools
Copyright © Software Carpentry 2010
This work is licensed under the Creative Commons Attribution License
See http://software-carpentry.org/license.html for more information.
Thousands of papers and theses written in LaTeX
Granger's work on graphs \cite{dd-gr2007,gr2009},
particularly ones obeying Snape's Inequality
\cite{ snape87 } (but see \cite{quirrell89}),
has opened up new lines of research. However,
studies at Unseen University \cite{stibbons2002,
stibbons2008} highlight several dangers.
⋮
All share a common bibliography
Want to see how often citations appear together
First step: extract citation sets from documents
Citations enclosed in \cite{…}
Granger's work on graphs \cite{dd-gr2007,gr2009},
particularly ones obeying Snape's Inequality
\cite{ snape87 } (but see \cite{quirrell89}),
has opened up new lines of research. However,
studies at Unseen University \cite{stibbons2002,
stibbons2008} highlight several dangers.
⋮
Multiple labels separated by commas
May be white space (including line breaks)
And multiple citations per line
Idea #1: capture everything in cite{…} in a group
print re.search('cite{(.+)}', 'a \\cite{X} b').groups()
('X',)
What about multiple citations?
print re.search('cite{(.+)}', 'a \\cite{X} b \\cite{Y} c').groups()
('X} b \\cite{Y',)
Matching is greedy
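To make the greedy behaviour concrete, here is a minimal sketch in Python 3 syntax (the slides use Python 2 print statements). It compares '.+' with the non-greedy '.+?'. The non-greedy form is not the approach this lecture takes; it is shown only for contrast, and the variable names are illustrative.

```python
import re

text = 'a \\cite{X} b \\cite{Y} c'

# Greedy: '.+' grabs as much as it can, so the match runs to the LAST '}'.
greedy = re.search('cite{(.+)}', text).group(1)

# Non-greedy: '.+?' grabs as little as it can, so it stops at the FIRST '}'.
lazy = re.search('cite{(.+?)}', text).group(1)

print(repr(greedy))
print(repr(lazy))
```

The greedy capture swallows everything between the first '{' and the final '}', which is exactly the problem the next idea addresses.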
Idea #2: match everything inside '{}' except '}'
Use '[^}]' to negate the set containing only '}'
print re.search('cite{([^}]+)}', 'a \\cite{X} b').groups()
('X',)
What about multiple citations?
print re.search('cite{([^}]+)}', 'a \\cite{X} b \\cite{Y} c').groups()
('X',)
Need to extract all matches, not just the first
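A small sketch (Python 3 syntax, illustrative variable names) of why re.search is not enough here: it returns a single match object for the first hit and never looks at the rest of the string.

```python
import re

pattern = 'cite{([^}]+)}'
text = 'a \\cite{X} b \\cite{Y} c'

m = re.search(pattern, text)

first_label = m.group(1)   # text captured by the parenthesized group
first_span = m.span()      # (start, end) of the whole match within text

# Everything after the first match is left unexamined by re.search.
rest = text[m.end():]

print(first_label, first_span, repr(rest))
```

The second citation is still sitting in the unexamined remainder, so a different library function is needed to collect every match.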
Idea #3: use re.findall instead of re.search
"A programmer is only as good as her knowledge
of her language's libraries."
print re.findall('cite{([^}]+)}', 'a \\cite{X} b \\cite{Y} c')
['X', 'Y']
What about spaces?
print re.findall('cite{([^}]+)}', 'a \\cite{ X} b \\cite{Y } c')
[' X', 'Y ']
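What re.findall returns depends on how many groups the pattern contains. The sketch below (Python 3 syntax; the second and third patterns are variations made up for illustration) shows the three cases.

```python
import re

text = 'a \\cite{X} b \\cite{Y} c'

# No group: findall returns each whole match.
no_group = re.findall('cite{[^}]+}', text)

# One group: findall returns only the text captured by that group.
one_group = re.findall('cite{([^}]+)}', text)

# Two groups: findall returns one tuple per match.
two_groups = re.findall('(cite){([^}]+)}', text)

print(no_group)
print(one_group)
print(two_groups)
```

The one-group form is the one used here, because only the label text inside the braces is wanted.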
Could tidy this up after matching using string.strip()
Let's modify the pattern instead
print re.findall('cite{\\s*([^}]+)\\s*}', 'a \\cite{ X} b \\cite{Y } c')
['X', 'Y ']
Still capturing the space after 'Y'
Match the word-to-nonword transition as well
print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', 'a \\cite{ X} b \\cite{Y } c')
['X', 'Y']
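For comparison, here is a sketch (Python 3 syntax, illustrative names) of the alternative mentioned above: match loosely, then strip each label afterwards. It is set side by side with the '\b' pattern on the same input.

```python
import re

text = 'a \\cite{ X} b \\cite{Y } c'

# Option A: capture everything between the braces, then strip whitespace.
raw = re.findall('cite{([^}]+)}', text)
tidy = [label.strip() for label in raw]

# Option B: let the pattern discard the padding via \s* and \b.
with_boundaries = re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', text)

print(raw)
print(tidy)
print(with_boundaries)
```

Both options give the same labels for this input. Note that the '\b' version relies on each label beginning and ending with a word character (letter, digit, or underscore).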
What about multiple labels in a single citation?
print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', '\\cite{X,Y} ')
['X,Y']
print re.findall('cite{\\s*\\b([^}]+)\\b\\s*}', '\\cite{X, Y, Z} ')
['X, Y, Z']
Separating the labels with a single pattern can actually be done, but it's very complex
Use re.split() to break matches on '\\s*,\\s*'
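A quick sketch of that splitting step on its own (Python 3 syntax; the sample strings stand in for what the citation pattern would capture):

```python
import re

SPLIT = '\\s*,\\s*'   # a comma with optional white space on either side

pair = re.split(SPLIT, 'X,Y')
triple = re.split(SPLIT, 'X, Y, Z')

# \s also matches line breaks, so labels wrapped across lines split cleanly.
wrapped = re.split(SPLIT, 'stibbons2002,\nstibbons2008')

# A string with no comma comes back as a one-element list.
single = re.split(SPLIT, 'snape87')

print(pair, triple, wrapped, single)
```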
# Start with a working skeleton.
def get_citations(text):
    '''Return the set of all citation tags found in a block of text.'''
    return set()

if __name__ == '__main__':
    test = '''\
Granger's work on graphs \cite{dd-gr2007,gr2009},
particularly ones obeying Snape's Inequality
\cite{ snape87 } (but see \cite{quirrell89}),
has opened up new lines of research. However,
studies at Unseen University \cite{stibbons2002,
stibbons2008} highlight several dangers.'''
    print get_citations(test)

set([])
import re

CITE = 'cite{\\s*\\b([^}]+)\\b\\s*}'
SPLIT = '\\s*,\\s*'

def get_citations(text):
    '''Return the set of all citation tags found in a block of text.'''
    result = set()
    match = re.findall(CITE, text)
    if match:
        for citation in match:
            cites = re.split(SPLIT, citation)
            for c in cites:
                result.add(c)
    return result
import re

CITE = re.compile('cite{\\s*\\b([^}]+)\\b\\s*}')
SPLIT = re.compile('\\s*,\\s*')

def get_citations(text):
    '''Return the set of all citation tags found in a block of text.'''
    result = set()
    match = CITE.findall(text)
    if match:
        for citations in match:
            label_list = SPLIT.split(citations)
            for label in label_list:
                result.add(label)
    return result
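The same logic can be written more compactly. The version below is only a sketch in Python 3 syntax, not the lecture's own code: it uses raw strings for the patterns and a set comprehension in place of the nested loops, and the sample input is made up for illustration.

```python
import re

# Raw strings avoid doubling every backslash; the patterns are unchanged.
CITE = re.compile(r'cite{\s*\b([^}]+)\b\s*}')
SPLIT = re.compile(r'\s*,\s*')

def get_citations(text):
    '''Return the set of all citation tags found in a block of text.'''
    return {label
            for citations in CITE.findall(text)
            for label in SPLIT.split(citations)}

sample = r'see \cite{a, b} and also \cite{ c }'
labels = get_citations(sample)
print(sorted(labels))
```

No explicit check for an empty result is needed, because findall returns an empty list when nothing matches and the comprehension then yields an empty set.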
# Now test it all out.
if __name__ == '__main__':
    test = '''\
Granger's work on graphs \cite{dd-gr2007,gr2009},
particularly ones obeying Snape's Inequality
\cite{ snape87 } (but see \cite{quirrell89}),
has opened up new lines of research. However,
studies at Unseen University \cite{stibbons2002,
stibbons2008} highlight several dangers.'''
    print get_citations(test)

set(['gr2009', 'stibbons2002', 'dd-gr2007', 'stibbons2008',
     'snape87', 'quirrell89'])
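The slides stop at extraction. As one possible next step toward the stated goal of seeing how often citations appear together, the sketch below counts label pairs across documents. It is written in Python 3 syntax; the function name count_pairs and the three hand-written citation sets are assumptions for illustration, standing in for the per-document output of get_citations.

```python
from collections import Counter
from itertools import combinations

def count_pairs(citation_sets):
    '''Count how many documents cite each pair of labels together.'''
    counts = Counter()
    for labels in citation_sets:
        # Sort first so each pair is always stored in the same order.
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts

# Hypothetical citation sets, one per document.
docs = [
    {'snape87', 'quirrell89'},
    {'snape87', 'quirrell89', 'gr2009'},
    {'gr2009'},
]
pair_counts = count_pairs(docs)
print(pair_counts.most_common())
```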
Created by Greg Wilson
June 2010