1 one thing in one place data, schemas and applications week 8
Post on 20-Dec-2015
216 views
TRANSCRIPT
2
Data Normalisation
• Formal technique to reduce complex relations to a set of simpler relations which are better structured and reduce redundancy
• We shall look at it informally on this module. On a full database module this would be covered in more theoretical depth.
3
Facilities Table
FacilityNo Facility Type Facility Unit Price per session1 Squash court 1 £4.502 Squash court 2 £4.503 Squash court 3 £4.504 Tennis court 1 £4.005 Tennis court 2 £4.006 Tennis court 3 £4.007 Astroturf pitch 1 £8.008 Astroturf pitch 1 £8.009 Hall court 1 £6.00
10 Hall court 2 £6.0011 Skittles alley £7.50
4
Potential anomalies
• UPDATE the price per session of facility 1– now different rates for squash courts - suspect business rule is same rate for all facilities for a ‘sport’
• ADD a new tennis court – must also ensure the same rate is used as for
other tennis courts
• DELETE facility 11– lose the rate per session for skittles
5
Two better tables
FacilityFacilityType
Ftypeno name price1 Squash £4.502 Tennis £4.003 Astroturf £8.004 Hall £6.005 Skittles £7.50
FacilityNo FtypenoFacility Unit1 1 court 12 1 court 23 1 court 34 2 court 15 2 court 26 2 court 37 3 pitch 18 3 pitch 19 4 court 1
10 4 court 211 5 alley
FacilityNoFaciltyUnit
FtypeNoNamePrice
6
Normalisation• Must ensure no data has been lost in this process.
• Can recover the Original data using join:select * from facility natural join facilityType
• Informally we feel that the second solution is ‘better’ than the first.– Less redundancy– Update anomalies avoided– Simpler tables reusable
• Queries are slower because a join needed– original table:
select price from facility where facilityNo=6– normalised tables:
select price from facility natural join facilityType where faciltyNo = 6
7
Exercise
• Normalise this data on outstanding video rentals• Describe your result as an ER model
(relationships not foreign keys)
Source: www.gc.maricopa.edu/business/sylvester/cis164/norm-rev.htm
8
1st Normal form - 1NF
• the data must be in the form of valid relations - i.e. with uniquely named columns, each column containing only atomic (indivisible) values
• each row unique - identify the primary key
9
2nd Normal Form
• Non-key fields are fully dependent on the primary key
• ‘Dependent’ – Column A is dependent on Column B if, in the
data, knowing the value of B will determine the value of A
– E.g. E#, Age• Age is dependent on E# • E# is not dependent on Age ( several people with
the same age , different employee numbers)
10
3rd Normal Form 3NF
• A relation R is in 3NF if and only if– the non-key fields are fully dependent on the
primary key i.e is 2NF and– the non-key fields are mutually independent– e.g R(A,B, C)– then
• B dependent on A• C dependent on A• B and C independent
11
Mnemonic
• “the key, the whole key and nothing but the key”
• 2NF – “the key” : non-key fields depend on the key– “the whole key”: non-key fields depend fully on
the key
• 3NF– “nothing but the key”: non-key fields are not
dependent on each other
12
Summary : Data Normalisation• Place common relations in separate tables and join them when
required• Normalisation is bottom-up – given a mess of data in a table (a
spreadsheet for example), how to restructure it to reduce redundancy
• More names required – name for each relation at least – more concepts but more understandable
• Normalised tables may be reusable• Reduced performance due to extra joins – may have to de-
normalise to get better performance • Improved performance due to smaller database• Normalisation is a bottom-up process – starting from some mess –
like a spreadsheet• Top-down models are usually normalised anyway
13
Code ‘Normalisation’
• For very similar reasons, we want to write our code (PHP, HTML, Javascript..) so it is ‘normalised’ too – common, recurrent code in factored into one place.– To avoid update anomalies – update in one
place– To increase readability – less code to read – To create reusable components– But may reduce performance – but who cares
– maybe later if you need to optimize-
To factor (maths) – to divide into the basic units
68 = 2*2*17
14
Recurrent Values
• If the same value is used in several places, assign it to a variable (constant) and use the variable name where it is needed.
• If this common value has to be changed, it can be changed in one place only.
• The variable name provides a understandable explanation of what this value is for.
15
Silly example: colorheads.php<?phpprint "<h1 style='color:4AFF2F;'>This one is OK</h1>";print "<h1 style='color:FF1052;'>This one is BAD</h1>";?>
<p> here is some text </p>
<?phpprint "<h1 style='color:FF1052;'>This one is BAD too</h1>";print "<h1 style='color:4AFF2F;'>This one is OK too</h1>";?>
16
Revision 1 : colorheads2.php<?php
$goodColour = "4AFF2F";$badColour = "FF1052";
print "<h1 style='color:$goodColour;'>This one is OK</h1>";print "<h1 style='color:$badColour;'>This one is BAD</h1>";
?>
<p> here is some text </p>
<?php
print "<h1 style='color:$badColour;'>This one is BAD too</h1>";print "<h1 style='color:$goodColour;'>This one is OK too</h1>";
?>
17
Revision 2 : colorheads3.php<html><style>.good {color:4AFF2F;}.bad {color:FF1052;}</style></head><body>
<h1 class="good">This one is OK</h1><h1 class="bad">This one is BAD</h1><p> here is some text </p>
<h1 class="bad">This one is BAD</h1><h1 class="good">This one is OK</h1></body></html>
18
Common code
• A common piece of code can be put in on place and linked into the programme
• Suppose you have an HTML footer to go on every page (it might include to call Google Analytics for example)
• We want to put this in one file and then include it in the scripts for each of the pages
19
<hr/><p style="font-size: 10pt; color: blue">This page is generated on <?php print date("l, d/m/Y") ?> at <?php print date("H:i") ?> from a script written by Chris Wallace.</p>
Sample page (page1.php)
<html><head><title>First Page</title></head><body><p style="color:silver; font-size:15pt;">This is the main body of page 1 which is linked to <a href="page2.php">page 2</a></p><?php include("footer.php") ?></body></html>
Footer code (footer.php)
php function to include php script from another file
20
Modularisation
• Modularisation as a common pattern
• In PHP there are several slightly different ways to include code:• include(), include-once()• require(), require-once()
• Since PHP is a scripting language, code can’t be pre-compiled as in Java.
21
Functions
• Common code but for a few variations
• Variations can be expressed as parameters
• Create a function
• Call the function where required
22
Function declaration and use makefooter.php
<?php
// define the function function footer($author, $size) {return "<hr/><p style='font-size: {$size}pt; color: blue'>This page is generated on ". date("l, d/m/Y")." at ".date("H:i"). " from a script written by $author.</p>";}
//call the functionprint footer("Chris Wallace", 20);
//call with different parametersprint footer("Mark Butler", 30);
?>
Two parameters
{ } means evaluate
expression
Can’t write
$sizept
. Dot for concatenate
strings
23
Functions and Modules
• Common to combine these two ideas
• Place the function definitions in a separate file
• Include file in every program which needs these functions
• Call the function wherever required.
• E.g. page3.php
24
<html><head><title>Third Page</title></head><?php include("functs.php") ?><body><p style="color:silver; font-size:15pt;">This is the main body of page 3.</a></p><?php print footer("Chris Wallace", 25) ?></body></html>
<?php// generate a footer function footer($author, $size) {return "<hr/><p style='font-size: {$size}pt; color: blue'>This page is generated on ". date("l, d/m/Y")." at ".date("H:i"). " from a script written by $author.</p>";}?>
Function library functs.php
Use of function
25
Code normalisation : Summary
• Breaking a programming task cleanly into into simpler, re-useable units
• The sign of a good programmer• Good structure is arrived at iteratively
– You notice common parts as the code is written– You re-factor the code to put the common parts in
one place
• Avoid merely copying code because this creates update anomolies – an error has to be corrected in all copies
26
One thing in one place• An idea which is common to both data and code normalisation
– In database it means splitting big data sets into smaller, independent and reusable ones with no redundancy
– In programming it means splitting a large amount of code into smaller reusable items – variables, modules, functions and classes
• Where else do we see this principle?– A class or type separates the fixed parts of a set of things (every
employee has a name) from the variable (Empno ‘1111’ has name ‘Fred’)
– In learning, it is about identifying common themes or patterns, separating the general from the particular :
• String concept implemented differently in each language• Code and Annotations in one script• Several Languages in one script
27
Tradeoffs and judgement
• Normalisation requires judgement– Are these two things really the same?– Will they always be the same?
• Normalisation incurs a performance penalty – because the separate bits have to be put back together in execution – joins in a database, calls in code
• Data and code design involves trade-offs