Python Stringology

Marcin Młotkowski

27th March, 2013

1 Regular expressions

2 Results grouping

3 html processing

4 XML processing

Marcin Młotkowski Python

Regular expressions in examples

MS Windows system

c:\WINDOWS\system32> dir *.exe

Result
accwiz.exe
actmovie.exe
ahui.exe
alg.exe
append.exe
arp.exe
asr_fmt.exe
asr_ldm.exe
...

Examples, cont.

?N*X, *BSD

$ rm *.tmp

Examples of regular expressions

reg. exp.      words
'alamakota'    { 'alamakota' }
'(hop!)*'      { '', 'hop!', 'hop!hop!', 'hop!hop!hop!', ... }
'br+um'        { 'brum', 'brrum', 'brrrum', ... }
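The "words" column can be checked mechanically. The following is a minimal sketch, assuming Python 3 (the slides themselves use Python 2 syntax): re.fullmatch accepts a word only if the whole word belongs to the language of the expression. The helper name in_language is made up for this illustration.

```python
import re

def in_language(pattern, word):
    """Return True if the whole word belongs to the language of pattern."""
    return re.fullmatch(pattern, word) is not None

print(in_language('alamakota', 'alamakota'))   # the only word of this language
print(in_language('(hop!)*', ''))              # zero repetitions
print(in_language('(hop!)*', 'hop!hop!'))      # two repetitions
print(in_language('br+um', 'brrrum'))          # three times 'r'
print(in_language('br+um', 'bum'))             # no 'r' at all
```

Only the last call should report False, because 'r+' requires at least one 'r'.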

Searching and matching

re library

import re

matching

if re.match('brr+um', 'brrrrum!!!'): print 'matches'

searching

if re.search('brr+um', 'Automobile sounds brrrrum!!!'): print 'exists'
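The difference between matching and searching is easiest to see on one text. A small sketch, assuming Python 3: match() tries the pattern only at the beginning of the text, while search() scans the text for the first place where the pattern fits.

```python
import re

text = 'Automobile sounds brrrrum!!!'

# match() only tries the pattern at the very beginning of the text.
at_start = re.match('brr+um', text)
print(at_start)                              # no match: the text starts with 'Automobile'
print(re.match('brr+um', 'brrrrum!!!'))      # a match object

# search() scans the text for the first position where the pattern fits.
found = re.search('brr+um', text)
print(found.group())
```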

Regular expression compilation

import re
automat = re.compile('brr+um')
automat.search('brrrrum')
automat.match('brrrrum')

Result interpretation

>>> re.search('brr+um', 'brrrum!!!')

MatchObject

.group(): matched text

.start(): beginning of matched text

.end(): end of matched text
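The three accessors can be tried on a longer text. A short sketch, assuming Python 3; the sample sentence is reused from the searching example:

```python
import re

text = 'Automobile sounds brrrum!!!'
m = re.search('brr+um', text)

print(m.group())   # the matched text
print(m.start())   # index of the first matched character
print(m.end())     # index just past the last matched character

# start() and end() can be used directly as slice bounds.
print(text[m.start():m.end()] == m.group())
```

Note that end() points one position past the match, so that slicing with start() and end() returns exactly the matched text.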

Advanced example

Task
On an HTML page, find all references to other pages.

Examples
www.ii.uni.wroc.pl
www.gogole.com

Solution

Implementation

adres = '([a-zA-Z]+\.)*[a-zA-Z]+'
automat = re.compile('http://' + adres)
tekst = fh.read()    # fh: a file object opened on the HTML page

[ url.group() for url in automat.finditer(tekst) ]
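The solution can be run without a file. The sketch below assumes Python 3 and replaces fh.read() with a short, made-up page; the pattern is the one from the slide, written as a raw string.

```python
import re

# Hypothetical page content, used only to exercise the pattern.
tekst = '''<a href="http://www.ii.uni.wroc.pl">Institute</a>
<a href="http://www.gogole.com/search">Search</a>'''

adres = r'([a-zA-Z]+\.)*[a-zA-Z]+'
automat = re.compile('http://' + adres)

urls = [url.group() for url in automat.finditer(tekst)]
print(urls)
```

The pattern describes host names only, so the path '/search' in the second link is not part of the result.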

Metasymbols overview

symbol     description
w*         zero or more repetitions of w
w+         at least one repetition of w
w1|w2      alternative: w1 or w2
w{m,n}     w occurs at least m times and at most n times
.          any character except newline
w?         0 or 1 occurrence of w
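Each row of the table can be tested with one pattern. A minimal sketch, assuming Python 3; the patterns and words are invented for this illustration, and re.fullmatch is used so that the whole word has to fit the pattern.

```python
import re

# (pattern, word) pairs; each word is tested against the whole pattern.
cases = [
    ('(ab)*',   ''),       # zero repetitions are allowed
    ('(ab)+',   ''),       # at least one repetition is required
    ('cat|dog', 'dog'),    # alternative
    ('a{2,3}',  'aaa'),    # between 2 and 3 repetitions
    ('a{2,3}',  'aaaa'),   # too many repetitions
    ('a.c',     'a\nc'),   # the dot does not match a newline
    ('colou?r', 'color'),  # optional 'u'
]

results = [re.fullmatch(pattern, word) is not None for pattern, word in cases]
for (pattern, word), ok in zip(cases, results):
    print(repr(pattern), repr(word), ok)
```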

Popular abbreviations

symbol     description
\d         any digit
\w         alphanumeric character or underscore (depends on LOCALE)
\Z         end of text
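A short sketch of the three abbreviations, assuming Python 3. The sample texts are invented; the patterns are written as raw strings (r'...') so that Python passes the backslash to the regexp engine unchanged.

```python
import re

# \d{4} picks out runs of four digits.
years = re.findall(r'\d{4}', 'Lectures in 2012 and 2013')
print(years)

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r'\w+', 'file_1.txt')
print(words)

# \Z anchors the pattern at the very end of the text.
ends_with_exe = re.search(r'exe\Z', 'alg.exe') is not None
backup = re.search(r'exe\Z', 'alg.exe.bak') is not None
print(ends_with_exe, backup)
```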

Problem with backslash

Role of backslash in Python

'Name\tSurname\n'
print 'Tabulator is a character \\t'
'c:\\WINDOWS\\win.ini'

Backslash in regular expressions

Searching for '['

re.match('\[', '[')

A puzzle

How to find '\['?

Approaches

Pattern '\[':
re.match('\[', '\[')      # result: None
re.match('\[', '[')       # result: a match, but only of '['

Pattern '\\[':
re.match('\\[', '\[')     # result: None (the same regexp as '\[')
re.match('\\[', '[')      # result: a match, but only of '['

Patterns '\\\[' and '\\\\[':
re.match('\\\[', '\[')    # error of regexp compilation
re.match('\\\\[', '\[')   # error of regexp compilation

Ultimate solution

A solution
re.match('\\\\\[', '\[')
re.match(r'\\\[', '\[')
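The solution can be verified step by step. The sketch below assumes Python 3 and writes every literal either as a raw string or with fully doubled backslashes, so that no unrecognised escape such as \[ is left for Python to guess at. The use of re.escape is an addition to the slide: it builds the same pattern from the text to be found.

```python
import re

subject = '\\['          # two characters: a backslash and '['
pattern = r'\\\['        # regexp: an escaped backslash, then an escaped '['

print(len(subject))
found = re.match(pattern, subject)
print(found.group() == subject)

# re.escape() builds the same pattern from the text to be found.
escaped = re.escape(subject)
print(escaped == pattern)

# An ordinary literal needs six backslashes to spell the same pattern.
print('\\\\\\[' == pattern)

# With one level of escaping missing, '[' opens an unterminated character set.
try:
    re.compile('\\\\[')
    compiled = True
except re.error:
    compiled = False
print(compiled)
```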

String processing

String processing by Python

string in Python    'true' character
'\n'                0x0A
'\t'                0x09
'\\'                0x5C

String processing by regular expressions

string in regex     'true' character
'\['                0x5B
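The two stages of processing can be observed separately. A minimal sketch, assuming Python 3: ord() shows the character that Python produces from each escape sequence, and a match shows the character that the regexp engine makes of \[.

```python
import re

# Stage 1: Python turns each escape sequence in a literal into one character.
codes = [hex(ord(literal)) for literal in ('\n', '\t', '\\')]
print(codes)

# Stage 2: the regexp engine reads \[ as the literal character '['.
m = re.match(r'\[', '[')
print(hex(ord(m.group())))
```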

A few words on groups

res = re.match('a(b*)a.*(a)', 'abbabbba')
print res.groups()

Result
('bb', 'a')
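Groups can also be read one at a time. A short sketch, assuming Python 3, on the same pattern and text: group 0 is the whole match, and the parenthesised groups are numbered from 1 in the order of their opening parentheses.

```python
import re

res = re.match('a(b*)a.*(a)', 'abbabbba')

print(res.group(0))   # the whole match
print(res.group(1))   # text captured by the first parenthesis
print(res.group(2))   # text captured by the second parenthesis
print(res.groups())   # all captured groups as a tuple
print(res.span(1))    # position of the first group in the text
```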

Grouping expression

(?P<name>regexp)

Task

From a date in the format '20061204', extract the day, month, and year.

A solution

Regular expression

wzor = r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})'

res = re.search(wzor, 'On 20110406 there is a Python lecture')

print res.group('year'), res.group('month')
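Named groups can be read all at once as well. A minimal sketch, assuming Python 3, with the pattern and the sentence from the slide: groupdict() returns every named group in a dictionary, and a named group is still reachable by its number.

```python
import re

wzor = re.compile(r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})')
res = wzor.search('On 20110406 there is a Python lecture')

print(res.group('year'), res.group('month'), res.group('day'))

# All named groups at once, keyed by group name.
parts = res.groupdict()
print(parts)

# Named groups keep their positional numbers.
print(res.group(1) == res.group('year'))
```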

html processing

An HTML file is a sequence of tags:

<html>
<title>Title</title>
<body bgcolor="red">
<div align="center">Text</div>
</body>
</html>

Opening tags
<html>, <body>, <div>

Closing tags
</body>, </div>, </html>

sgmllib

import sgmllib

class sgmllib.SGMLParser:
    def start_tag(self, attrs):    # "tag" stands for a tag name, e.g. start_a
    def end_tag(self):             # e.g. end_a

How to use sgmllib

Task
Find all 'href' references
<a href="adres">Text</a>

class MyParser(sgmllib.SGMLParser):

    def start_a(self, attrs):
        for (atr, val) in attrs:
            if atr == 'href': print val

p = MyParser()
p.feed(dokument)
p.close()
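sgmllib is a Python 2 module and is not available in Python 3. The sketch below is not the code from the slide: it shows the same task with html.parser.HTMLParser from the Python 3 standard library, which calls a single handle_starttag method for every opening tag instead of one start_tag method per tag name. The sample document and the hrefs attribute are made up for this illustration.

```python
from html.parser import HTMLParser

class MyParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, as in sgmllib.
        if tag == 'a':
            for (atr, val) in attrs:
                if atr == 'href':
                    self.hrefs.append(val)

# Hypothetical document, used only for the demonstration.
dokument = ('<html><body><a href="http://www.ii.uni.wroc.pl">Text</a> '
            '<a name="x">no link</a></body></html>')

p = MyParser()
p.feed(dokument)
p.close()
print(p.hrefs)
```

Collecting the addresses in a list instead of printing them keeps the parser reusable; the second anchor has no href attribute and is skipped.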

XML

Example
<?xml version="1.0" encoding="UTF-8"?>
<biblioteka>
  <ksiazka egzemplarze="3">
    <autor>Ascher, Martelli, Ravenscroft</autor>
    <tytul>Python cookbook</tytul>
  </ksiazka>
  <ksiazka>
    <autor/>
    <tytul>Python for beginners</tytul>
  </ksiazka>
</biblioteka>

XML processing

process the elements one after another (SAX, saxutils)
build a tree (DOM) corresponding to the XML document

SAX: Simple API for XML

elements of the document are read step by step
for each element the corresponding handler method is called

Parser implementation

Default parser

from xml.sax import *

class saxutils.DefaultHandler:
    def startDocument(self): pass
    def endDocument(self): pass
    def startElement(self, name, attrs): pass
    def endElement(self, name): pass
    def characters(self, value): pass

Own parser implementation

class SaxReader(saxutils.DefaultHandler):

    def characters(self, value):
        print value

    def startElement(self, name, attrs):
        for x in attrs.keys():
            pass    # the loop body is missing on the slide

How to use parser

from xml.sax import make_parser
from xml.sax.handler import feature_namespaces
from xml.sax import saxutils

parser = make_parser()
parser.setFeature(feature_namespaces, 0)
dh = SaxReader()
parser.setContentHandler(dh)
parser.parse(fh)    # fh: a file object or file name with the XML document
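The whole SAX pipeline fits into one short program. The sketch below assumes Python 3, where the handler base class is xml.sax.handler.ContentHandler (used here in place of saxutils.DefaultHandler from the slides) and xml.sax.parseString creates and runs the parser in one call. The events list and the shortened document are made up for this illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class SaxReader(ContentHandler):
    """Record element names, attributes, and non-blank text as they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name, dict(attrs.items())))

    def characters(self, value):
        # Whitespace between elements is reported too; skip it here.
        if value.strip():
            self.events.append(('text', value))

    def endElement(self, name):
        self.events.append(('end', name))

# Shortened version of the library example.
dokument = b'''<?xml version="1.0" encoding="UTF-8"?>
<biblioteka>
<ksiazka egzemplarze="3"><tytul>Python cookbook</tytul></ksiazka>
</biblioteka>'''

dh = SaxReader()
xml.sax.parseString(dokument, dh)
for event in dh.events:
    print(event)
```

The handler never sees the document as a whole: it receives one event at a time, in document order, and has to remember whatever it needs later.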

SAX: summary

read-only processing;
the document is processed part by part;
SAX is fast and needs little memory.

DOM: Document Object Model

A document is kept entirely as a tree;
A document (its tree) can be modified;
Processing needs time and memory, because the whole tree is kept in memory;
The DOM specification is maintained by the W3C.

Reminder

Example
<?xml version="1.0" encoding="UTF-8"?>
<biblioteka>
  <ksiazka egzemplarze="3">
    <autor>Ascher, Martelli, Ravenscroft</autor>
    <tytul>Python. Receptury</tytul>
  </ksiazka>
  <ksiazka>
    <autor/>
    <tytul>Python. Od podstaw</tytul>
  </ksiazka>
</biblioteka>

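A picture

[Figure: DOM tree of the example. The Document node (<?xml version="1.0" encoding="UTF-8"?>) contains the Element <biblioteka>; its children are the <ksiazka> Elements, separated by Text nodes (""); below them are the Elements <autor> and <tytul> with Text nodes such as "Ascher, ..." and "Python. Od ...".]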

Python libraries

xml.dom: DOM Level 2
xml.dom.minidom: Lightweight DOM implementation, DOM Level 1

minidom implementation

A class Node

class attribute    example
.nodeName          biblioteka, ksiazka, autor
.nodeValue         "Python cookbook"
.attributes        <ksiazka egzemplarze="3">
.childNodes        list of subnodes

Tree creation

XML file processing
import xml.dom.minidom

def wezel(node):
    print node.nodeName
    for n in node.childNodes:
        wezel(n)

doc = xml.dom.minidom.parse('content.xml')
wezel(doc)
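The traversal can be tried without a file on disk. A minimal sketch, assuming Python 3: parseString builds the tree from a string, and wezel is extended with two hypothetical parameters, depth and out, so that it returns the visited nodes instead of printing them directly. The document is a shortened version of the library example.

```python
import xml.dom.minidom

# Shortened version of the library example.
dokument = ('<biblioteka><ksiazka egzemplarze="3">'
            '<tytul>Python cookbook</tytul></ksiazka></biblioteka>')

def wezel(node, depth=0, out=None):
    """Collect (depth, nodeName) pairs in document order."""
    if out is None:
        out = []
    out.append((depth, node.nodeName))
    for n in node.childNodes:
        wezel(n, depth + 1, out)
    return out

doc = xml.dom.minidom.parseString(dokument)
nodes = wezel(doc)
for depth, name in nodes:
    print('  ' * depth + name)

# Attributes are read from the element node itself.
copies = doc.getElementsByTagName('ksiazka')[0].getAttribute('egzemplarze')
print(copies)
```

The document node and the text node have no tag, so minidom reports them under the names '#document' and '#text'.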

DOM processing

Node manipulation

appendChild(newChild)
removeChild(oldChild)
replaceChild(newChild, oldChild)

New node creation
new = document.createElement('chapter')
new.setAttribute('number', '5')
document.documentElement.appendChild(new)

print document.toxml()
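All three manipulation methods can be exercised on a small tree. A sketch, assuming Python 3 and xml.dom.minidom; the one-element document and the second chapter element are made up for this illustration.

```python
import xml.dom.minidom

document = xml.dom.minidom.parseString(
    '<ksiazka><tytul>Python cookbook</tytul></ksiazka>')
root = document.documentElement

# Create a new element and attach it to the root.
new = document.createElement('chapter')
new.setAttribute('number', '5')
root.appendChild(new)
after_append = root.toxml()

# Replace it with another element, then remove the title.
newer = document.createElement('chapter')
newer.setAttribute('number', '6')
root.replaceChild(newer, new)
root.removeChild(root.getElementsByTagName('tytul')[0])

print(after_append)
print(root.toxml())
```

Nodes are created by the document but belong to no parent until appendChild or replaceChild places them in the tree.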

Summary: DOM

processes the entire tree
needs a lot of time and memory for large files
