Regular Expression
Mohsen Mollanoori
What is RegeX?
“A notation to describe regular languages.”
“Not necessarily (and not usually) regular”
“A Powerful String Processing Tool”
“A pattern that can be matched against a string”
“A Language But Not A Language”
What Does RegeX Do?
String Processing:
Matching Strings against a Specific Pattern
Splitting Strings
Changing Substrings
Extracting Substrings
What Programming Languages Support RegeX?
Almost All of Them: Perl, Java, .Net (C#, VB.Net, …), PHP, Ruby, JavaScript, …
And even Many IDEs, Editors & Utilities: grep, Eclipse, Visual Studio .Net, vim, emacs, …
The Notation
Symbol   Meaning                                      Example
.        Any single char                              /.at/ matches “cat”, “bat”, “pat”, “mat”
*        Zero or more occurrences of preceding char   /a*b/ matches “b”, “aaaaab”
+        One or more occurrences of preceding char    /a+b/ matches “ab”, “aaaaab”
?        Zero or one occurrence of preceding char     /a?b/ matches “ab” and “b”
Example 1
String: “Term”, “Term1”, “Term2”
Pattern: /Term./
Result: “Term1”, “Term2”
Example 2
String: “Term”, “Term1”, “Term2”
Pattern: /Term.?/
Result: “Term”, “Term1”, “Term2”
Example 3
String: “Term”, “Term1”, “Term2”
Pattern: /Term1?/
Result: “Term”, “Term1”
Example 4
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1+/
Result: “Term1”, “Term11”
Example 5
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1*/
Result: “Term1”, “Term11”, “Term”
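The results in Examples 1 to 5 hold when the whole string has to match the pattern. A minimal Java sketch of that check follows; the class and method names are made up for illustration, and it relies on String.matches, which only succeeds on a full match.

```java
import java.util.ArrayList;
import java.util.List;

class QuantifierDemo {
    // Returns the candidates that the pattern matches in full, in input order.
    static List<String> fullMatches(String regex, String... candidates) {
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (s.matches(regex)) { // matches() requires the entire string to match
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fullMatches("Term.", "Term", "Term1", "Term2"));            // Example 1
        System.out.println(fullMatches("Term.?", "Term", "Term1", "Term2"));           // Example 2
        System.out.println(fullMatches("Term1?", "Term", "Term1", "Term2"));           // Example 3
        System.out.println(fullMatches("Term1+", "Term1", "Term11", "Term2", "Term")); // Example 4
        System.out.println(fullMatches("Term1*", "Term1", "Term11", "Term2", "Term")); // Example 5
    }
}
```

With a search (find) instead of a full match, /Term1?/ would also hit the “Term” prefix of “Term2”, so the choice between the two matters.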
Character Classes
Example          Meaning
[pnm]            “p” or “n” or “m”
[Qq]             “Q” or “q”
[A-Z]            Upper Case Letters
[A-Za-z]         Letters
[^A-Z]           Every char EXCEPT A-Z
[A-Z&&[^C-E]]    A-Z but NOT C-E
Example 6
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc]at/
Result: “Cat”, “cat”
Example 7
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc][Aa][Tt]/
Result: “CAT”, “Cat”, “cat”
Example 8
String: “Term”, “Term1”, “Term2”
Pattern: /[A-Za-z]+/
Result: “Term”
Example 9
String: “Term”, “Term1”, “Term222”
Pattern: /.*[0-9]+/
Result: “Term1”, “Term222”
Example 10
String: “Term”, “Term1”, “Term222”
Pattern: /[^0-9]+/
Result: “Term”
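The character classes above can be tried out with a few one-line checks. This is only a sketch: the class and method names are invented, and the intersection syntax [A-Z&&[^C-E]] is the Java flavour, so it may not work in other engines.

```java
class CharClassDemo {
    // Example 7: any mix of upper and lower case letters spelling "cat".
    static boolean isCat(String s) {
        return s.matches("[Cc][Aa][Tt]");
    }

    // Example 9: anything that ends with one or more digits.
    static boolean endsWithDigits(String s) {
        return s.matches(".*[0-9]+");
    }

    // A single upper case letter that is not C, D or E (Java class intersection).
    static boolean upperButNotCtoE(String s) {
        return s.matches("[A-Z&&[^C-E]]");
    }

    public static void main(String[] args) {
        System.out.println(isCat("CAT"));           // true
        System.out.println(endsWithDigits("Term")); // false
        System.out.println(upperButNotCtoE("D"));   // false
    }
}
```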
Repeating Chars (Intervals)
Example   Description
a{3}      Matches “aaa”
a{3,5}    Matches “aaa”, “aaaa”, “aaaaa”
a{3,}     Matches “aaa”, “aaaa”, …
Predefined Character Classes
Class   Description
\d      Digit
\D      Non Digit
\s      Whitespace
\S      Non Whitespace
\w      Word character (alphanumeric or underscore)
\W      Non word character
\b      Word Boundary
\B      Non Word Boundary
\A      The beginning of the input
\z      The end of the input
Example 11
String: “This is some text !”
Pattern: /is/
Result: two matches, the “is” inside “This” and the separate word “is”
Example 12
String: “This is some text !”
Pattern: /\bis\b/
Result: one match, only the separate word “is”
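The difference between Examples 11 and 12 shows up when the matches are counted. A small Java sketch, with an invented helper name, using Matcher.find to search anywhere in the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts how many times the pattern is found anywhere in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "This is some text !";
        System.out.println(countMatches("is", text));       // 2: inside "This" and the word "is"
        System.out.println(countMatches("\\bis\\b", text)); // 1: only the word "is"
    }
}
```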
Example 13
Variable Names
Pattern: /[A-Za-z]\w{0,15}/
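Read as a full-match rule, the pattern of Example 13 accepts names that start with a letter and are at most 16 characters long. A hedged Java sketch of such a check (the method name is an assumption, not part of the slides):

```java
class VariableNameDemo {
    // One letter, then up to 15 word characters (letters, digits or underscore).
    static boolean isVariableName(String s) {
        return s.matches("[A-Za-z]\\w{0,15}");
    }

    public static void main(String[] args) {
        System.out.println(isVariableName("count_1"));           // true
        System.out.println(isVariableName("1count"));            // false: starts with a digit
        System.out.println(isVariableName("abcdefghijklmnopq")); // false: 17 characters
    }
}
```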
Groups
email addresses:
/[A-Za-z0-9_]+@.+\.\w+/
/([A-Za-z0-9_]+)@(.+)\.(\w+)/
$1   Username
$2   Server
$3   Domain
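In Java the three groups are read with Matcher.group(1) to group(3) instead of $1 to $3. The sketch below uses the grouped pattern from above; the class name, the method name and the sample addresses are made up for illustration. Because (.+) is greedy, everything up to the last dot ends up in the server group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailGroups {
    private static final Pattern EMAIL =
            Pattern.compile("([A-Za-z0-9_]+)@(.+)\\.(\\w+)");

    // Returns {username, server, domain}, or null if the address does not match.
    static String[] parts(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] p = parts("user@mail.example.org"); // hypothetical address
        System.out.println(p[0] + " / " + p[1] + " / " + p[2]);
    }
}
```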
RegeX & Perl
open(IN, "File.txt");    # open file
while ($line = <IN>)     # read line by line
{
    # the + after the character class captures the whole username, not just one char
    if ($line =~ /([A-Za-z0-9_]+)@(.+)\.(\w+)/) {
        print 'User = ', $1, ", Server = ", $2, "\n";
    }
}
close(IN);
RegeX & Ruby
open('in.txt', 'r').readlines.each do |line|
puts line if line =~ /^([a-z0-9_]+)@(.+)\.(.+)$/i
end
RegeX & Java
java.util.regex.Pattern
java.util.regex.Matcher
java.util.Scanner
java.lang.String: replaceAll(regex, replacement), replaceFirst(regex, replacement), matches(regex), split(regex)
Example 16
String email = readEmailFromSomewhere();
if (email.matches("([A-Za-z0-9_]+)@(.+)\\.(\\w+)")) {
    System.out.println("valid email");
} else {
    System.out.println("invalid email");
}
Example 17
String str = "098 123-456-789";
String[] nums = str.split("[\\s-]");
for (String num : nums) {
System.out.println(num);
}
Example 18
// Remove Tags from HTML
String html = "<html><head><title>This is a title.</title></head>"
        + "<body>This is <b>body</b> of a <i>HTML</i> file"
        + "!</body></html>";
String text = html.replaceAll("<[^>]+>", " ");
String normalizedText = text.replaceAll("\\s+", " ");
System.out.println(normalizedText);
Example 19
// hyperlink urls
String html = "<html>Please Visit http://myhomepage.com</html>";
html = html.replaceAll("https?://([-.A-Za-z]+)", "<a href='$0'>$1</a>");
System.out.println(html);
Example 20
Convert MixedCase to underlined_format
String MixedCase = "ThisIsSomeTextInMixedCaseFormat";
String temp = MixedCase.replaceAll("([a-z])([A-Z])", "$1_$2");
String underlined_format = temp.toLowerCase();
System.out.println(underlined_format);
// result: this_is_some_text_in_mixed_case_format
Convert underlined_format to MixedCase
?
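One possible answer to the question above, sketched in Java. replaceAll alone cannot change the case of a captured group, so this version walks the matches with Matcher.appendReplacement. The class and method names are invented, and the input is assumed to consist of lower case words joined by underscores.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToMixedCase {
    static String toMixedCase(String underlined) {
        // Match the first letter of the input, or an underscore followed by a letter.
        Matcher m = Pattern.compile("(?:^|_)([a-z])").matcher(underlined);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Replace the whole match (underscore included) with the upper-cased letter.
            m.appendReplacement(sb, m.group(1).toUpperCase());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMixedCase("this_is_some_text_in_mixed_case_format"));
        // prints: ThisIsSomeTextInMixedCaseFormat
    }
}
```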
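Example 21
The Pipe Sign
Find strings of 0s & 1s that have an even number of 1s or an even number of 0s
str = '110100101'
# each repeated group holds exactly two 0s (or two 1s), so strings such as '00100' are accepted too
puts str =~ /^((1*01*0)*1*|(0*10*1)*0*)$/ ? 'Yes' : 'No'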
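The same check can be sketched in Java. The pattern here is written so that each repeated group holds exactly two 0s (or two 1s), which keeps inputs such as “00100” from being rejected; the class and method names are assumptions for illustration. String.matches already anchors the pattern at both ends, so ^ and $ are not needed.

```java
class EvenParityDemo {
    // True if the string of 0s and 1s has an even number of 0s or an even number of 1s.
    static boolean hasEvenZerosOrOnes(String bits) {
        return bits.matches("(1*01*0)*1*|(0*10*1)*0*");
    }

    public static void main(String[] args) {
        System.out.println(hasEvenZerosOrOnes("110100101") ? "Yes" : "No"); // four 0s: Yes
        System.out.println(hasEvenZerosOrOnes("01") ? "Yes" : "No");        // one 0, one 1: No
    }
}
```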
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
?
Back References
\i refers to the i-th matched group
Example: /(.)\1/ matches “aa”, “bb”, “11”, “##”
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
if text =~ /(\b\w+\b)\W+\1/
puts $1 + " is repeated more than once"
end
# some is repeated more than once
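A Java sketch of the same idea, for comparison. It adds a trailing \b after the back reference so that a pair like “the theme” is not reported as a repeat; that extra boundary, like the class and method names, is an assumption beyond the slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWordDemo {
    // Returns the first word that is immediately repeated, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+)\\W+\\1\\b").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String word = firstRepeatedWord("hello, this is some some text!");
        if (word != null) {
            System.out.println(word + " is repeated more than once");
        }
    }
}
```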
You Needn't Even Write Code
An Editor that supports RegeX
Eclipse find/replace dialog box
Microsoft VS.NET Quick Replace
Use Regular Expression
Extracting Timestamps From a Log File
Some Rewriting System!
rewrite(input)
    temp = input
    do
        before = temp
        temp = rewrite temp using rule1
        temp = rewrite temp using rule2
        ...
        after = temp
    while (before != after)
    return temp
XML
<Students >
<Student faculty="Computer Engineering" student-id="8017024">
<Name first="Mohsen" last="Mollanoori"/>
<Terms >
<Term num="1">
<Lesson name="Statistics" mark="10"/>
<Lesson name="Math" mark="10"/>
</Term>
</Terms>
</Student>
</Students>
MML
@Students
{
@Student(faculty="Computer Engineering" student-id="8017024")
{
@Name(first="Mohsen" last="Mollanoori");
@Terms
{
@Term(num="1")
{
@Lesson(name="Statistics" mark="10");
@Lesson(name="Math1" mark="10");
}
}
}
}
Example 23
MML 2 XML
do {
    before = mml;
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?;",
        "<$1 $3/>");
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?\\{([^\\{\\}]*)\\}",
        "<$1 $3>$4</$1>");
    after = mml;
} while (!before.equals(after));
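To see the loop at work, the two rules can be wrapped in a method and run on a small input. This is a sketch under assumptions: the class and method names are invented, and the sample input is written without whitespace between an element name and its brace, since the patterns as given do not allow any there.

```java
class MmlToXml {
    static String convert(String mml) {
        String before;
        do {
            before = mml;
            // Rule 1: leaf elements ending in ';' become self-closing tags.
            mml = mml.replaceAll(
                    "@([A-Za-z]+)(\\(([^)]*)\\))?;",
                    "<$1 $3/>");
            // Rule 2: innermost blocks (no braces inside) become open/close tag pairs.
            mml = mml.replaceAll(
                    "@([A-Za-z]+)(\\(([^)]*)\\))?\\{([^\\{\\}]*)\\}",
                    "<$1 $3>$4</$1>");
        } while (!before.equals(mml)); // stop once a pass changes nothing
        return mml;
    }

    public static void main(String[] args) {
        System.out.println(convert("@Term(num=\"1\"){@Lesson(name=\"Math\" mark=\"10\");}"));
    }
}
```

Each pass can only rewrite blocks whose contents are already free of braces, which is why nested input needs several passes. An element without attributes keeps the space from the replacement string, giving tags such as `<Students >`.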
Example 24
Remove Text from XML (Keep Tags Only)
Is this Correct?
String xml = "<b><em>Text Here</em></b>";
xml = xml.replaceAll(">[^<]*<", "");
Match: “><”, “>Text Here<”, “><”
Result: “<bem/em/b>” (the angle brackets are removed together with the text)
Look Ahead & Look Behind
String xml = "<b><em>Text Here</em></b>";
xml = xml.replaceAll("(?<=>)[^<]*(?=<)", "");
Look Behind using ?<= to see a ‘>’
Look Ahead using ?= to see a ‘<’
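The two patterns can be compared side by side. In this Java sketch (class and method names are made up), the plain pattern consumes the ‘>’ and ‘<’ it matches, while the lookaround version only checks that they are there and leaves them in place.

```java
class LookaroundDemo {
    // Consumes the surrounding '>' and '<' together with the text.
    static String stripNaive(String xml) {
        return xml.replaceAll(">[^<]*<", "");
    }

    // Only the text between '>' and '<' is matched; the brackets themselves are kept.
    static String stripWithLookaround(String xml) {
        return xml.replaceAll("(?<=>)[^<]*(?=<)", "");
    }

    public static void main(String[] args) {
        String xml = "<b><em>Text Here</em></b>";
        System.out.println(stripNaive(xml));          // <bem/em/b>
        System.out.println(stripWithLookaround(xml)); // <b><em></em></b>
    }
}
```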
Example 25
Over Matching
xml = "<a> aaa </a><b> bbb </b>";
xml = xml.replaceFirst(">.*<", "");
Match: “> aaa </a><b> bbb <” (from the first ‘>’ to the last ‘<’)
Result: xml = "<a/b>";
Greedy & Non Greedy
Greedy   Non Greedy
*        *?
+        +?
?        ??
{a,b}    {a,b}?
Example 26
Solution to Over Matching
xml = "<a> aaa </a><b> bbb </b>";
xml = xml.replaceFirst(">.*?<", "");
Match: “> aaa <” (from the first ‘>’ to the nearest ‘<’)
Result: xml = "<a/a><b> bbb </b>";
Example 27
String xml = "aabb";
xml = xml.replaceAll(".{2,3}", "-");
System.out.println(xml); // result: -b

String xml = "aabb";
xml = xml.replaceAll(".{2,3}?", "-");
System.out.println(xml); // result: --
Example 28
String xml = "aabb";
xml = xml.replaceAll(".?", "-");
System.out.println(xml); // result: -----

String xml = "aabb";
xml = xml.replaceAll(".??", "-");
System.out.println(xml); // result: -a-a-b-b-
Further Reading & Works
“Teach Yourself Regular Expressions in 10 Minutes”, Sams Publishing, February 28, 2004, ISBN: 0-672-32566-7
“Mastering Regular Expressions, 3rd Edition”, by Jeffrey E. F. Friedl, O'Reilly, August 2006, ISBN: 0-596-52812-4
Java Regular Expression Documents
Practice, Practice, Practice
Thanks