Information Technology in Plant Protection
Presentation
TÁMOP-4.1.2.A/2-10/1-2010-0012
• Prepared by: Dr. János Busznyák
GIS tools for Plant Protection
2
• Methods of Obtaining Spatial Data
  – Manual
  – Geodesy
  – Global positioning
  – Photogrammetry
  – Remote sensing
  – Manual map digitization
  – Scanning maps
  – From digital files
Digital Mapping Tools for Plant Protection
3
• A digital map is more than the contents of a map converted into a form that a computer can use.
• It needs no segmentation into map sheets, its elements are stored at real size, it fits together accurately, it has topology, and it often uses layers and objects.
• Primary data acquisition methods
  – Measurements (GPS)
  – Existing reports
  Primary methods mostly yield vector data.
• Secondary sources
  – Digitization, followed by automatic or manual vectorization
  If a secondary method includes georeferencing and vectorization, the result is also a vector map. If secondary data collection (scanning) is not followed by vectorization, the result is a digital raster map.
Digital Map
4
• Aim
  – New level of GPS analysis (vector)
  – New publication possibilities
  – Lower storage and transfer capacity needs
• Preparatory steps
  – Digitization of map sheets
  – Georeferencing, eliminating distortions, projection conversion (a lot of work)
• Pre-processing
• Vectorization
  – Of areas
  – Of line-like objects
  – Of objects
• Post-processing
Raster-Vector Transformation
5
• Vectorization
  – Manual
  – Semi-automatic
  – Automatic
Vectorization II.
6
• Automatic vectorization of a soil map
  – Single bit
  – Low data density
• Automatic vectorization of a topographic map
  – 8-bit
  – High data density
Black: converted to lines. Blue: segmented pixels.
Application of the Automatic Method
7
• Coordinates of the shapefile vertex points
  site,lat,long,name,HOTLINK
  1,38.889,-77.035,Washington Monument,http://www.nps.gov/wamo
  2,38.889,-77.050,Lincoln Memorial,c:/ESRI/AEJEE/DATA/WASHDC/linc.jpg
  3,38.898,-77.036,White House,c:/ESRI/AEJEE/DATA/WASHDC/whse.txt
  4,38.889,-77.009,Capitol,c:/ESRI/AEJEE/DATA/WASHDC/cap.pdf
ESRI Arc Explorer JEE tutorial
Data Input from Text File
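A minimal sketch of how the comma-separated point list on this slide could be read into a program. It uses only the Python standard library; the function name `read_points` and the dictionary keys are illustrative assumptions, not part of the ArcExplorer tutorial.

```python
import csv
import io

# Records copied from the slide; in practice they would be stored in a text file.
SAMPLE = """site,lat,long,name,HOTLINK
1,38.889,-77.035,Washington Monument,http://www.nps.gov/wamo
2,38.889,-77.050,Lincoln Memorial,c:/ESRI/AEJEE/DATA/WASHDC/linc.jpg
3,38.898,-77.036,White House,c:/ESRI/AEJEE/DATA/WASHDC/whse.txt
4,38.889,-77.009,Capitol,c:/ESRI/AEJEE/DATA/WASHDC/cap.pdf
"""


def read_points(text):
    """Parse the point list into dictionaries with numeric coordinates."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "site": int(row["site"]),
            "lat": float(row["lat"]),
            "lon": float(row["long"]),
            "name": row["name"],
            "hotlink": row["HOTLINK"],
        })
    return points


points = read_points(SAMPLE)
for p in points:
    print(p["site"], p["name"], p["lat"], p["lon"])
```

Each record becomes one point feature: the lat and long columns give the geometry, the remaining columns become attributes.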
8
• Hybrid systems allow raster and vector data to be used together.
  – Vector, raster and attribute data are stored separately, each in the way most suitable for its model.
  – The systems carry out each operation in the model that is most suitable for that operation.
  – The systems apply a wide variety of vector-raster transformations before and after the operations.
  – The GoogleMaps service is based on a hybrid data model.
Hybrid Data Model, Mashup Map
9
• Factors that most influence data quality:
  – Origin of data
  – Geometric accuracy
  – Accuracy of attribute data
  – Consistency of attribute data
  – Topological consistency
  – Completeness and validity of data
Data Quality
10
• Georeferencing is the process of scaling, rotating, translating and deskewing the image to match a particular size and position. The word was originally used to describe the process of referencing a map image to a geographic location. Source: http://wintopo.com/help/html/georef.htm
• Usual ways:
  – World file
  – Header (GeoTIFF, GeoJP2…)
Georeferencing
11
• Certain image formats include georeferencing information in the header of the image file:
  – img
  – bsq
  – bil
  – bip
  – EXIF
  – ITT
  – GeoTIFF
  – grid
Header
12
• Georeferencing information is stored in a separate world file:
  – The world file contains the 6 parameters of an affine transformation that connects the image coordinate system with the world coordinate system.
  – The images are stored as raster data, where each cell of the image is identified by a row and column number.
  – The world file must have the same name as the image file and be in the same folder.
World File
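The six-parameter affine transformation can be sketched as follows. The sketch assumes the common world file layout, in which the six numbers appear one per line in the order A, D, B, E, C, F and map a pixel (column, row) to x = A·col + B·row + C and y = D·col + E·row + F. The sample values and function names are hypothetical.

```python
def parse_world_file(text):
    """Return the six world file parameters in file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file holds exactly six numbers")
    return tuple(values)


def pixel_to_world(params, col, row):
    """Apply the affine transformation to pixel (col, row)."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 2 m pixels, no rotation,
# upper-left pixel centre at (500000, 200000).
WORLD_FILE = """2.0
0.0
0.0
-2.0
500000.0
200000.0
"""

params = parse_world_file(WORLD_FILE)
print(pixel_to_world(params, 0, 0))
print(pixel_to_world(params, 100, 50))
```

The negative E value reflects that image rows grow downwards while map y coordinates grow northwards.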
13
Georeferencing with the Help of 2 Reference Points
14
Graphic Georeferencing - Rubber sheeting
15
• Projection, datum
• Geoid, geoid undulation
• Uniform National Projection (UNP - EOV)
• Transformation
• Base points, base point systems
Projection Systems, Conversion
16
• Based on the shape of the image surface
  – Cylindrical projection
  – Conic projection
  – Azimuthal (planar) projection
  – Other projection
• Based on the axis of the image surface
  – Normal (polar)
  – Transverse (equatorial)
  – Oblique (neither of the above)
• Based on the contact of the image surface and the base surface
  – Tangent
  – Secant
Classification of Projection
17
• Systems without projection
• Dual projection Hungarian systems
• Stereographic projection systems (Budapest, Marosvásárhely)
• Oblique Mercator Projection
• HÉR, HKR, HDR
• EOV
• Gauss-Krüger
• UTM (Universal Transverse Mercator)
• GEOREF (World Geographic Reference System)
Important Projection Systems
18
• Reference ellipsoids approximate an area of the Earth's surface
• The centre of the ellipsoid is that of the Earth
• The axis of rotation is that of the Earth
  – Parameters
    • Major axis (equatorial radius)
    • Flattening (relation between the equatorial and polar radius)
• If the centre of the ellipsoid is shifted until it fits the examined area with the least error, we get the geodetic datum.
  – Bessel (stereographic)
  – Krasovsky (Gauss-Krüger)
  – Hayford (UTM)
  – WGS-84 (GPS)
  – IUGG-67 (EOV)
Important Ellipsoids
19
• Geographic Projection
  – WGS 1984 Datum
• Orthographic Projection
  – SPHERE Datum
• Eckert IV Projection
  – WGS 1984 Datum
Some Interesting Projections
20
• GPS measurement gives the height above the ellipsoid (h). When calculating the height above sea level (H), the geoid undulation has to be taken into account.
• Geoid undulation is the separation between the equipotential surface that represents the mean ocean surface and a reference ellipsoid (h = H + N, where N is the geoid undulation at the point).
• Geoid: the surface of the oceans and seas, as if connected by small canals under the land (Listing, 1873).
Geoid Undulation
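The relation h = H + N can be rearranged to obtain the height above sea level from a GPS height. A minimal sketch follows; the numeric values are hypothetical and only illustrate the arithmetic.

```python
def orthometric_height(h_ellipsoidal, n_undulation):
    """Height above sea level H from h = H + N, i.e. H = h - N (metres)."""
    return h_ellipsoidal - n_undulation


# Hypothetical values: GPS height 150.00 m at a point where the geoid
# lies 44.50 m above the reference ellipsoid.
H = orthometric_height(150.00, 44.50)
print(f"H = {H:.2f} m")
```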
21
• The origin of the coordinate system has been shifted 200 km to the south and 650 km to the west. Thus the X coordinates are always lower than 400 km and the Y coordinates are always higher than 400 km, which makes them easy to distinguish.
Uniform National Projection
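Because one coordinate always stays below 400 km and the other always above it, a swapped pair can be detected automatically. The sketch below assumes the EOV convention in which X is the smaller value and Y the larger one, as in the sample point 25001 of the survey description in this presentation (X = 142686.277, Y = 505893.164). The function name is hypothetical.

```python
EOV_SPLIT_M = 400_000.0  # one coordinate stays below, the other above this value


def order_eov_pair(first, second):
    """Return (x, y) from two EOV coordinates given in unknown order (metres)."""
    x, y = sorted((first, second))
    if not (x < EOV_SPLIT_M < y):
        raise ValueError("not a plausible EOV coordinate pair")
    return x, y


# Point 25001 with its coordinates deliberately given in swapped order.
print(order_eov_pair(505893.164, 142686.277))
```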
22
• The first levelling of Hungary was carried out relative to the Mediterranean base level between 1873 and 1913.
  – Height of the Nadap main base point: 173.8385 m.
• Baltic base level after World War II.
  – Height of the Nadap main base point: 173.1638 m, which is 0.6747 m lower.
Uniform National Elevation Network (EOMA)
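The difference between the two base levels follows directly from the two heights of the Nadap point. The sketch below reproduces that arithmetic and applies the offset to another height. Treating the offset as constant for every point is a simplifying assumption made here, and the 105 m sample height is hypothetical.

```python
from decimal import Decimal

NADAP_OLD = Decimal("173.8385")     # Nadap height in the first levelling, m
NADAP_BALTIC = Decimal("173.1638")  # Nadap height above the Baltic base level, m
OFFSET = NADAP_OLD - NADAP_BALTIC   # difference between the two base levels, m


def to_baltic(height_old):
    """Shift a height from the earlier base level to the Baltic base level."""
    return Decimal(str(height_old)) - OFFSET


print(OFFSET)
print(to_baltic("105.0000"))
```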
23
• ETRS89 (OGPSH) points transformed into the Uniform National Projection (EOV) system and back
• The points for the transformation are chosen automatically
• Local transformation based on the common points of the OGPSH and EOV systems
• With 8 common points in Hungary
• With refined geoid undulation data
EHT: Etrs89-Eov-Hivatalos-Helyi-Térbeli-Transzformáció (official local spatial ETRS89-EOV transformation)
Transformation
24
• Database of Altitudinal Base Points
• Database of Horizontal Base Points
• Database of OGPS Base Points
• Points of the National GPS Network (Országos GPS Hálózat, OGPSH)
Base Points
25
• Video
  – Georeferencing (graphical)
• Animation
  – Georeferencing
  – Geoid undulation
  – Shape (create)
Videos and Animations for Chapter 1.
26
Question I: Determine the geoid undulation at the Parliament Building, Budapest, Hungary, using the EHT (or any other) software.
Question II: Digitize any map sheet with a scanner. Georeference it with 3 reference points using the GEOREGARCVIEW software. The necessary coordinates can be obtained from map servers (e.g. GoogleMaps).
Question III: Digitize another map sheet that overlaps the previous one with a scanner. Georeference it with 3 reference points using the GEOREGARCVIEW software. Open it together with the georeferenced file of the previous task in ArcExplorer JEE (or any other software) and check its accuracy. The necessary coordinates can be obtained from map servers (e.g. GoogleMaps).
Tasks for Chapter 1.
27
• Global Positioning
  – The coordinates of 3 satellites at a given time are needed.
  – If time can be measured accurately, the signal propagation speed and the travel time give the distance from the satellite.
  – With 1 satellite, the possible positions form the surface of a sphere.
GNSS Device System
28
• If there is a connection with 2 satellites, we are on the spheres of both satellites. The intersection of the two spheres is a circle.
• The intersection of the third satellite's sphere with this circle is two points, one of which can always be excluded (e.g. a point far from the Earth's surface).
Global Positioning II.
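The sphere intersection described on these two slides can be sketched numerically. The example below is an idealised illustration: it assumes three exactly known satellite positions and error-free ranges, and it ignores the receiver clock error that real receivers have to handle. All coordinates are hypothetical, and the point closer to the origin is kept as a stand-in for "close to the Earth's surface".

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # signal propagation speed in m/s


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def scale(a, k):
    return tuple(k * p for p in a)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def range_from_travel_time(seconds):
    """Distance to a satellite from the signal travel time."""
    return SPEED_OF_LIGHT * seconds


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return the two points where three spheres intersect."""
    # Local frame: p1 at the origin, p2 on the x axis, p3 in the xy plane.
    d = norm(sub(p2, p1))
    ex = scale(sub(p2, p1), 1.0 / d)
    i = dot(ex, sub(p3, p1))
    ey_raw = sub(sub(p3, p1), scale(ex, i))
    ey = scale(ey_raw, 1.0 / norm(ey_raw))
    ez = cross(ex, ey)
    j = dot(ey, sub(p3, p1))
    # Spheres 1 and 2 fix x (their common circle); sphere 3 then fixes y.
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    y = (r1 ** 2 - r3 ** 2 + i ** 2 + j ** 2) / (2 * j) - (i / j) * x
    z_squared = r1 ** 2 - x ** 2 - y ** 2
    if z_squared < 0:
        raise ValueError("the three spheres do not intersect")
    z = math.sqrt(z_squared)
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return add(base, scale(ez, z)), sub(base, scale(ez, z))


receiver = (1.0, 2.0, 3.0)  # hypothetical true position
satellites = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
ranges = [norm(sub(receiver, s)) for s in satellites]
candidates = trilaterate(satellites[0], ranges[0],
                         satellites[1], ranges[1],
                         satellites[2], ranges[2])
position = min(candidates, key=norm)  # exclude the point far from the origin
print(position)
```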
29
Differential Correction
30
• GNSSNet
• NtripCaster IP address and port: 84.206.45.44:2101
Network RTK in Hungary (2010)
31
• Geotrade GNSS
  – Host: www.geotradegnss.hu
  – Port: 2101
Multi-Base System in Hungary (2010)
32
• Georgikon RTK coverage
• DGPS for the whole country of Hungary
  – http://gnss.georgikon.hu
  – 193.224.81.88:2101
Single-Base System (2010)( 2009
33
Trimble European VRS System
34
• CSD (Circuit Switched Data)
  – Circuit-switched mobile internet - 9.6 kbit/s - 1G
• GPRS (General Packet Radio Service)
  – Packet-switched - 115 kbit/s - 2G
• EDGE (Enhanced Data Rates for GSM Evolution)
  – GPRS enhancement - 236 kbit/s (112-400) - 2.5G
• 3G
  – 3G mobile network, video call - 384 kbit/s - 3G
• HSPA (High-Speed Downlink/Uplink Packet Access)
  – HSDPA theoretical data transfer speed, depending on device and coverage: up to 21 Mbit/s - 3.5G
• 4G LTE (Long Term Evolution)
  – 1 Gbit/s - 4G
Mobile Internet
35
• Video
  – Trimble VRS system
• Animation
  – GNSSNet service
  – Geotrade GNSS
  – Georgikon GNSS Base
Videos and Animations for Chapter 2.
36
Question I: Find the data of the accessible satellites of the Galileo and BEIDOU systems at a given time.
Question II: Find the ground control stations of the Navstar GPS system at a given time.
Question III: Find the worst measurement site on the Earth's surface with respect to the state of the ionosphere at the current time. Use the 'space weather forecast' of Australia (or any other information source).
Tasks for Chapter 2.
37
http://www.ips.gov.au/Space_Weather
• GNSS Measurement
  – Planning (almanac)
  – Execution (online correction: processing as well)
  – Data transfer (exchange formats, RINEX - Receiver Independent Exchange Format)
  – Processing (vectors, transformation, error correction)
  – Network adjustment (OGPSH - National GPS Network)
Terrain GNSS Measurement and Processing
38
• Guarantee of integrity
  – GNSS
  – Way of correction
• Guarantee of the needed accuracy
  – Accuracy of the rover device
  – Way of correction
  – Satellite constellation
  – Minimization of other disturbing factors
Aim of GNSS Measurement Planning
39
• GNSS satellite data
  – Almanac
    • Trimble Planning
    • Leica Satellite Availability
    • Topcon Occupation Planning
• Receiving correction data
  – Mobile internet
    • GPRS coverage
    • Style, devices, realization
Devices for Planning
40
• Timing
  – Forward in time
  – Back in time
• General
  – YUMA format, US Coast Guard Navigation Center
  – The connection between the date and the GPS week in the GPS calendar
• Trimble
• Leica
• Topcon
Almanac
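The connection between a calendar date and the GPS week can be sketched as a simple day count. The sketch assumes the GPS time origin of 6 January 1980 and counts weeks continuously, so it does not apply the 1024-week rollover that some almanac files use, and it ignores leap seconds. The sample date is the day of the topographic survey described later in this presentation.

```python
from datetime import date

GPS_EPOCH = date(1980, 1, 6)  # start of GPS week 0 (a Sunday)


def gps_week_and_day(day):
    """Return (GPS week, day of week with Sunday = 0) for a calendar date."""
    elapsed = (day - GPS_EPOCH).days
    if elapsed < 0:
        raise ValueError("date precedes the GPS epoch")
    return divmod(elapsed, 7)


print(gps_week_and_day(date(2008, 12, 21)))
```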
41
Trimble Planning
42
• Relative
  – Real time
  – Radio
  – Satellite
  – Internet
• Post-processed
  – Digital data transfer
Channels of Correction Data
43
• Connection to satellites, controller
• Connection to correction service
• Setting measurement style
• Starting measurement
• Recording data
Realisation of Measurement
44
• Obtain, check and convert existing spatial data
• Set up a measurement plan
  – Need for accuracy
  – Available devices and services
  – Specialities of the area
  – Select measurement method
• Places of measurement
• Conversion to the format of the terrain device
• Upload data to the terrain device
Preparation of Measurement
45
• Check measurement data
• Inspection
• Delete, edit
• New recording
• Data
• Export in needed formats
• Turn off terrain device
End of Measurement
46
• Load data from terrain device
  – Formats
  – Give coordinate system and datum
  – Examine data load mistakes
  – Inspection
  – Delete, edit
• Export to the format used for processing
Processing Data
47
• Upload data to GIS system
  – Conversions
  – Analyses
  – Interpolations
  – Model building
  – Simulation
  – Statistical analysis
  – Publication
• Online correction
  – Processing
• Offline correction
  – Time of measurement
  – Obtain correction data
  – Correction
  – Check
GIS Processing and Analysis of Data
48
• EHT software
  – Data input
    • From file
    • Via keyboard
  – Set format of data input
  – Set data conversion direction
  – Give coordinates
Checking Transformation
49
• Data collector
  – Navigation accuracy
    • ArcPad / palmtop with GPS antenna
  – GPS accuracy
    • GPS Pathfinder Office / Trimble GeoXH
  – Geodetic accuracy
    • Trimble Survey Controller / Trimble 5800
• Data processing
  – GPS Analyst
  – GPS Pathfinder Office
  – Trimble Geomatics Office
  – ArcGIS
Typical Terrain Device System
50
Sample
• Aim of survey: automatic data collection for a 3D relief model
• Place of survey: Kányavári Island, Hungary
• Time of survey: 21 December 2008, 09:20-15:30
• Type of survey: RTK; format of message transfer: CMR+
• PDOP mask: 6; elevation cutoff: 10 degrees; antenna: Trimble 5800; hant: 2 m
• Coordinate system: Hungary; zone: Hungarian EOV
• Project datum: HD72 (Hungary)
• Vertical datum: geoid model EGM96 (Global)
• Coordinate units: meters; distance units: meters; height units: meters
• Name of point | DeltaX | DeltaY | DeltaZ | Slope distance | RMS
• 25001 | 13189.539 m | 1880.080 m | 11396.001 m | 17531.898 m | 0.002 m
• Name of point | X | Y | H
• 25001 | 142686.277 | 505893.164 | 109.042
Description of Continuous Topographic GPS Survey
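The slope distance in the sample record can be checked against the three vector components, since it is the length of the vector (DeltaX, DeltaY, DeltaZ). A minimal sketch using the values from the slide:

```python
import math


def slope_distance(dx, dy, dz):
    """Length of the vector between base and rover (metres)."""
    return math.sqrt(dx * dx + dy * dy + dz * dz)


# Components of point 25001 as listed in the survey description.
distance = slope_distance(13189.539, 1880.080, 11396.001)
print(f"{distance:.3f} m")
```

The result agrees with the listed 17531.898 m to within about a millimetre; the small remainder comes from the rounding of the published components.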
51
• Sampling
• Yield mapping
• Sensors
• Auto pilot system
• Mass flow or sprayer control
• Row control
• Seeder control
Basic GPS Elements of Precision Farming
52
• 1. GPS survey of field blocks, soil sampling plan
• 2. Take soil samples according to the plan, every 3-5 acres
• 3. Soil examination (extended and holistic)
• 4. Make nutrient content maps
• 5. Information, services for professional advice, analyses
• 6. Agrochemical service
• 7. Differentiated fertiliser plan
• 8. Differentiated nutrient output, plant number plan
• 9. Seeding with base station
• 10. Precision herbicide plan (based on Hu, KA, pH map and weed survey)
• 11. Fertiliser quantity, upload into professional advice system
• 12. Download data from the Internet
Precision Management System (IKR)
53
• Spreadsheet
Evaluation of Tillage Experiments
55
• Spreadsheet
• GIS software (weed density)
Evaluation of Tillage Experiments II.
56
3D Model
57
• Video
  – GNSSNet OGPSH
• Animation
Videos and Animations for Chapter 3.
58
Question I: Create a forecast for tomorrow at 12:00 and 12:15, above a 10 degree elevation cutoff, for the area of the Helikon strand, Keszthely, Hungary (φ = 46 degrees 45 minutes, λ = 17 degrees 15 minutes, h = 150 m).
  GDOP =
  PDOP =
  HDOP =
  VDOP =
  TDOP =
  Number of GPS satellites =
  Number of Glonass satellites =
  Number of Galileo satellites =
  Number of Compass satellites =
Question II: In the IKR precision management system, which service(s) can use GNSS base correction data?
Question III: Is soil sampling in the IKR precision management system carried out on the basis of a yield map or a grid?
Tasks for Chapter 3.
59
• Remote Sensing
  – With the help of remote sensing, objects can be examined that are not in direct contact with the sensor.
  – In a narrow sense, the concept of remote sensing is usually used for aerial and space images. In a wider sense, it also covers, for example, remote measurements or medical applications.
  – Remote sensing is the acquisition of information about an object or phenomenon without making physical contact with the object. In modern usage, the term generally refers to the use of aerial sensor technologies to detect and classify objects on Earth (both on the surface, and in the atmosphere and oceans) by means of propagated signals (e.g. electromagnetic radiation emitted from aircraft or satellites).
Remote Sensing Device System, 3D Modelling
60
• The measurement does not influence the examined object or change its state.
• It can be used at wavelengths outside the visible range. The result can be examined in the visible spectrum.
• Objective, exact data can be obtained.
• Spatial, multi-dimensional data can be obtained.
• Large amounts of data can be obtained from large areas in a short time.
• Areas that cannot be reached or examined with other methods can be examined.
Characteristics of Remote Sensing
61
• Active sensors
  – sense the reflection of their own radiation
• Passive sensors
  – have no emission
• One or more wavelength ranges
• Images with more than one band are called multispectral or hyperspectral, depending on the number of bands.
Classification of Sensors
62
• Geometric
  – pixel: the area on the Earth's surface that corresponds to one point of the image, its real extent.
• Spectral
  – the value of radiation from the object
• Radiometric
  – characterises the colour depth of the pixels
• Temporal
  – the time interval between the images
Information from Sensors
63
• Wavelength, frequency
  – Visible light (0.4 - 0.7 µm)
  – Infrared (above 0.7 µm)
  – Ultraviolet (below 0.4 µm)
Electromagnetic Spectrum
64
• Scattering (multi-path scattering)
• Absorption
  – Influencing factors
    • Travelled distance
    • Radiation energy
    • Composition of the atmosphere
    • Size of particles
    • Wavelength
Atmospheric Effects
65
• Chlorophyll absorbs energy at wavelengths between 0.45 and 0.67 µm, mostly the blue and red colours, so a healthy plant looks green.
• In an unhealthy plant, the decrease of chlorophyll increases red reflection, which together with the green can produce a yellow colour.
• Reflection in the range between 0.7 and 1.3 µm increases dramatically and depends strongly on leaf structure (species-specific).
• Effect of stratification, water absorption bands above 1.3 µm.
• Above 1.3 µm, reflection is inversely proportional to the total water content of the leaf.
Visible and Infrared Range
66
• The reflection curves of plant species are identifiable.
• Image correction (atmospheric distortion)
• Sample points
• Spectrum
Visible and Infrared Range II.
67
• TM 1 | 0.45 – 0.52 µm | blue | 30 m
• TM 2 | 0.52 – 0.60 µm | green | 30 m
• TM 3 | 0.63 – 0.69 µm | red | 30 m
• TM 4 | 0.76 – 0.90 µm | near infrared | 30 m
• TM 5 | 1.55 – 1.75 µm | medium infrared | 30 m
• TM 6 | 10.42 – 12.50 µm | thermal infrared | 120 m
• TM 7 | 2.08 – 2.35 µm | middle infrared | 30 m
Spectral Bands and Resolution of Landsat TM
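The band table can be turned into a small lookup that tells which Landsat TM band records a given wavelength. This is an illustrative sketch: the band limits and resolutions are taken from the slide, while the function name is hypothetical.

```python
# (band, lower limit in µm, upper limit in µm, description, ground resolution in m)
TM_BANDS = [
    ("TM 1", 0.45, 0.52, "blue", 30),
    ("TM 2", 0.52, 0.60, "green", 30),
    ("TM 3", 0.63, 0.69, "red", 30),
    ("TM 4", 0.76, 0.90, "near infrared", 30),
    ("TM 5", 1.55, 1.75, "medium infrared", 30),
    ("TM 6", 10.42, 12.50, "thermal infrared", 120),
    ("TM 7", 2.08, 2.35, "middle infrared", 30),
]


def bands_for_wavelength(wavelength_um):
    """Return the names of all TM bands whose range contains the wavelength."""
    return [name for name, low, high, _, _ in TM_BANDS
            if low <= wavelength_um <= high]


print(bands_for_wavelength(0.65))  # red light
print(bands_for_wavelength(0.80))  # near infrared
print(bands_for_wavelength(1.00))  # falls into a gap between bands
```

Wavelengths in the gaps between bands, such as 1.0 µm, return an empty list, which shows that the sensor does not cover the spectrum continuously.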
68
• ASPRS (ASPRS satellite database)
Planned Objects of Satellite Sensing
69
• 2002: DLR DAIS, a 79-band system
• 2006: the University of Debrecen (Hungary) and the Ministry of Rural Development launched an aerial data collection service using an AISA DUAL hyperspectral camera.
  – Senses in up to 498 bands, at wavelengths of 0.45–2.45 micrometres.
Hyperspectral Imaging in Hungary
70
• National Aeronautics and Space Administration (NASA) and U.S. Geological Survey (USGS) (1999)
• Images in 7 bands (6 bands at 30 m, thermal infrared at 60 m ground resolution)
• Sun-synchronous orbit (the satellite passes over a given site at the same local time)
• Orbits at an altitude of 705 km
• Can take images of an area of 185x170 km every 16 days
LANDSAT 5 TM
71
• TM 1 0.45 – 0.52 µm: differentiation of land from plants, mapping of artificial surfaces.
• TM 2 0.52 – 0.60 µm: mapping plant cover, identification of artificial surfaces.
• TM 3 0.63 – 0.69 µm: differentiation of planted surfaces from plantless surfaces, identification of artificial surfaces.
• TM 4 0.76 – 0.90 µm: identification of plant species, determination of green mass, survey of plant vitality, mapping water surfaces, mapping soil water content.
• TM 5 1.55 – 1.75 µm: examination of soil and plant water content, differentiation of cloud cover from snow cover.
• TM 6 10.42 – 12.50 µm: mapping heat emission (plant stress, heat pollution)
• TM 7 2.08 – 2.35 µm: differentiation between rock types, mapping plant water content
Application of Landsat Images
72
• Imaging: central perspective
• Photogrammetry: determines the dimensions of real objects from sizes measured on the image
  – The resulting orthophoto (georeferenced image data of the Earth's surface obtained by satellite or aerial data collectors) can be used comprehensively with GPS systems
  – During the planning and execution of imaging, a GPS device system and adequate relief data are needed.
Orthophoto
73
• Photogrammetric evaluation is based on stereoscopy, using the perspective relationship between aerial or space images taken with central projection.
  – The essence of stereoscopy is that given terrain objects are mapped differently in images taken from different positions. The task of photogrammetry is to measure the parallax differences and calculate spatial coordinates from them.
Photogrammetry
74
• Differentiation of types of vegetation
• Cover and yield
• Calculation
• Productivity of biomass
• Vitality and disease of flora
• State of soil
  – IMG files
    • View
    • Select bands
    • Colour bands
  – Erdas ViewFinder 2.1
  – http://rst.gsfc.nasa.gov/Front/overview.html
  – FÖMI tutorial (Institute of Geodesy, Cartography and Remote Sensing, Hungary)
Remote Sensing Data in Agriculture
75
• Model of objects
• Relief model
• Terrain model
• Elevation model
  – A digital elevation model (DEM) is the topographic visualisation of the Earth's surface. It is usually used for relief maps, 3D visualisation, water flow modelling and the correction of aerial images. It is built from remote sensing data or traditional land surveying data.
  – Raster based elevation model
  – Vector based elevation model
Application of 3D Models
76
• Raster model: the source elevation data form regular grid cells. The size of the cell is constant within the model. The height of the relevant geographic area is considered constant within a grid cell.
• Vector model: divides space into non-overlapping triangles.
  – The vertices of every triangle are data points with x, y and z values.
  – The points are connected with lines, which gives Delaunay triangles.
  – A TIN (Triangulated Irregular Network) is a complete graph that preserves the topological relations between its elements (nodes, edges and triangles).
  – Input data fit directly into the model.
Raster and Vector Models
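The difference between the two models shows in how a height is read from them. The sketch below is illustrative only: the raster lookup returns the constant height of the grid cell that contains the point, while the TIN lookup interpolates linearly inside one triangle using barycentric weights. The grid layout (row 0 at the northern edge), all names and all numbers are assumptions made for the example.

```python
def raster_height(grid, x0, y0, cell, x, y):
    """Height of the grid cell containing (x, y).

    grid[row][col] holds heights, row 0 is the northern edge and
    (x0, y0) is the upper-left corner of the grid.
    """
    col = int((x - x0) // cell)
    row = int((y0 - y) // cell)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the grid")
    return grid[row][col]


def tin_height(triangle, x, y):
    """Linear interpolation of z inside one TIN triangle."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical 2 x 2 grid with 10 m cells, upper-left corner at (0, 20).
grid = [[110.0, 112.0],
        [105.0, 107.0]]
print(raster_height(grid, 0.0, 20.0, 10.0, 15.0, 5.0))

# Hypothetical triangle with three surveyed vertices (x, y, z).
triangle = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)]
print(tin_height(triangle, 5.0, 5.0))
```

In the raster model every point of a cell gets the same height, whereas the TIN reproduces the measured points exactly and varies smoothly between them.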
77
• SRTM (Shuttle Radar Topography Mission 2000) program
  – Digital relief of about 80% of the Earth's surface, with the help of a radar system (Endeavour, 11 days)
  – Radar interferometry, with two receivers 60 m from one another
  – Mapped area: from 60 degrees north to 57 degrees south
  – Resolution: 3 (USA: 1) arcsec
Global Relief Model
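An angular resolution in arc seconds can be converted into an approximate ground distance. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is a simplification; the latitude of 47 degrees is a hypothetical value roughly in the range of Hungary.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an assumption of this sketch


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Return (north-south, east-west) ground length of an angle in arc seconds."""
    angle = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * angle
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = arcsec_to_metres(3.0, latitude_deg=47.0)
print(f"3 arcsec: {ns:.1f} m north-south, {ew:.1f} m east-west")
```

The east-west cell size shrinks with the cosine of the latitude, so cells of an arc-second grid are not square on the ground away from the equator.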
78
• TanDEM-X 2010 (TerraSAR-X)
  – Mapping of the whole surface of the Earth
  – Horizontal resolution: 12 m, vertical resolution: 2 m.
  – Two-radar remote sensing satellite with stereo microwave radar device, at an altitude of 514 km
  – Polar Sun-synchronous orbit
  – Radio waves emitted from a satellite using the synthetic-aperture radar (SAR) technique and reflected from the surface are received by the antenna on the satellite, or the same surface is imaged from two different points.
Global Relief Model II.
79
• The digital relief model of Hungary, 5 m resolution
  – The 1:10 000 scale EOTR database was used
  – A GRID derived from vectorized contour lines.
3D Relief Model
80
• Generated from several sources
  – Contour line digitization
  – Digitization of elevation points
  – Import of GPS survey points
  – Correction (aerial photo)
  – Model generation
  – Publication
• Generation from direct GNSS measurement
3D Relief Map
81
• Video
• Animation
  – Elevation Model
Videos and animations for Chapter 4.
82
Question I: Find an aerial image of your place of residence from internet sources.
Question II: Find a space image of your place of residence from internet sources.
Question III: Measure the area of Kányavári Island (Kányavári-sziget), Hungary on the photos from 1990, 1992 and 2002. Use Erdas ViewFinder (or any other IMG viewer). The images can be found on the remote sensing tutorial website of FÖMI.
Tasks for Chapter 4.
83
http://www.fomi.hu/taverzekeles_oktatoanyag
• Types of Mapservers
  – Static webmaps
  – Dynamically created webmaps
  – Animated webmaps
  – Personalized webmaps
  – Open, reusable webmaps
  – Interactive webmaps
  – Webmaps suitable for analysis
  – Collaborative webmaps
Spatial Data Databases
84
• Static webmaps
  – No animation and interactivity
  – Only created once, infrequently updated
  – Mostly scanned paper based maps
• Dynamically created webmaps
  – Created on demand, often from dynamic data sources
  – Created by server (ArcIMS – ArcSDE)
  – WMS protocol
Types of Webmaps II.
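A dynamically created webmap is typically requested through a WMS GetMap call. The sketch below builds such a request URL with the standard WMS 1.1.1 parameters; the server address, the layer name and the bounding box are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layers, bbox, width, height,
                   srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap request URL."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(value) for value in bbox),  # minx, miny, maxx, maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer; the bounding box is given in degrees.
url = wms_getmap_url("http://example.com/wms", ["soil"],
                     (16.0, 45.5, 23.0, 48.6), 800, 400)
print(url)
```

The server renders the map image on demand for exactly the requested extent and size, which is what distinguishes this type from a static webmap.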
85
• Animated webmaps
  – Show changes in the map over time (water currents, wind patterns, traffic info)
  – Real time, data from sensors
  – Updated regularly or on demand
• Personalized webmaps
  – Allow the user to apply own data filtering, selective content
  – Personal styling and symbolization
  – OGC SLD WMS uniform system (Styled Layer Descriptor)
Types of Webmaps III.
86
• Open, reusable webmaps
  – Complex systems, open API (Google Maps, YahooMaps, BingMaps)
  – API compatible with Open Geospatial Consortium and W3C standards
• Interactive webmaps
  – Changeable parameters
  – Easy navigation
  – Events, descriptions, DOM manipulations
Types of Webmaps IV.
87
• Analytic webmaps
  – Offer GIS analysis
    • Geodata uploaded by user
    • Geodata provided by server
  – Analysis is carried out by a server-side GIS, results of the analysis are displayed by the client.
• Collaborative webmaps
  – Geometric features being edited by one person cannot be changed by anyone else at the same time.
  – Quality check is needed before publication (OpenStreetMap, Google Earth, Wikimapia…).
Types of Webmaps V
88
• ‘Institute of Geodesy, Cartography and Remote Sensing’, Hungary
• Important databases of the Institute of Geodesy, Cartography and Remote Sensing (Földmérési és Távérzékelési Intézet)
‘FÖMI’
89
• To continuously inform farmers and experts, to provide professional background knowledge for tenders and developments.
• Its knowledge base consists of professional news, events, articles, studies and publications, published in an organised, regularly updated system.
• A further aim of the site is to prepare for online data services (logbooks, electronic submission of data by farmers working in vulnerable areas), to provide information on data related to agri-environmental management, to publish relevant thematic maps and to provide agricultural forecasts.
Hungarian National Rural Network (AIR)
90
• 1:200,000 scale genetic soil map of Hungary
• 40 soil types, 80 subtypes, shown with colours and colour shades
• Physical soil kinds (9 categories), shown with striping
• Soil-forming rock (28 categories), shown with letter codes
AIR Public Map Library
91
• Obtain, process and store weather data
• Apply weather data in the agrometeorological model of the Crop Growth Monitoring System (CGMS)
• Process NOAA-AVHRR and SPOT-VEGETATION satellite images using CORINE Land Cover (CLC) data
• Joint Research Centre
  – Statistical analysis of data
  – Quantity forecast
  – Short-term crop yield forecast
MARS (Monitoring Agriculture by Remote Sensing) crop yield forecasting system
92
• Monitoring Agriculture by Remote Sensing
MARS
93
‘FÖMI NÖVMON’ (Plant Monitoring)
94
IKR Precision Map Server
95
Soil Data Publication (Georgikon Mapserver, Hun)
96
• (Infrastructure for Spatial Information in the European Community-INSPIRE)
• ‘The INSPIRE Geoportal provide the means to search for spatial data sets and spatial data services, and subject to access restrictions, view and download spatial data sets from the EU Member States within the framework of the Infrastructure for Spatial Information in the European Community (INSPIRE) Directive.
• Aims at making available relevant, harmonised and quality geographic information to support formulation, implementation, monitoring and evaluation of policies and activities which have a direct impact on the environment.’
• (www.inspire-geoportal.eu)
INSPIRE Geoportal
97
• Inspire should be based on the infrastructures for spatial information that are created by the Member States and that are made compatible with common implementing rules and are supplemented with measures at Community level. These measures should ensure that the infrastructures for spatial information created by the Member States are compatible and usable in a Community and transboundary context.
Spatial Data Directive
98
• Member States shall ensure that metadata are created for the spatial data sets and services corresponding to the themes listed in Annexes I, II and III, and that those metadata are kept up to date.
Inspire2008 metadata
99
• Online access to a collection of geographic data and services
• Does not store or maintain data
• Metadata, catalogues can be accessed with several search options
• With the help of a map server service, maps and metadata can be searched for and browsed.
• Personal maps can be created from existing data sources.
INSPIRE Geoportal
100
TÁMOP-4.1.2.A/2-10/1-2010-0012
• INSPIRE Geoportal
INSPIRE Geoportal Viewer
101
TÁMOP-4.1.2.A/2-10/1-2010-0012
• ArcExplorer JEE: Corine Land Cover mashup map from several sources
• http://vektor.georgikon.hukvsz
• http://geo.kvvm.huclc (80% transparency)
Mashup map: a map that includes another (API), made from several internet sources.
Mashup Mapserver Service
102
TÁMOP-4.1.2.A/2-10/1-2010-0012
WebMap and publication
103
Website
MapServer
Picture
Video
Web service
HTML
TÁMOP-4.1.2.A/2-10/1-2010-0012
• Steps of realization:
– 1. Choose a topic
– 2. Create a map, upload data
• a. Create a web album, upload photos
• b. Upload video
– 3. Create a website, embed the map
– 4. Publish the website
Steps of Realization
104
TÁMOP-4.1.2.A/2-10/1-2010-0012
• Video
– Institute of Geodesy, Cartography and Remote Sensing
– Hungarian National Rural Network
– INSPIRE Geoportal
– Google Maps service
• Animation
Videos and Animations for Chapter 5.
105
TÁMOP-4.1.2.A/2-10/1-2010-0012
I. question
Measure the length of the Belső-tó (‘Inner lake’) of Tihany, Hungary with the help of the topographic map service of the Georgikon Mapserver (or any other map server).
II. question
Create a Google Maps map on any agricultural topic with at least 5 objects and inserted images, and embed it into a website on the same topic.
III. question
Embed further map server services (Bing Maps, Yahoo Maps…) into the website you have created.
Tasks for Chapter 5.
106
TÁMOP-4.1.2.A/2-10/1-2010-0012
• Prepared by:– Dr. Máté Csák
Plant protection database
107
Plant Protection Information
Plant protection database
TÁMOP-4.1.2.A/2-10/1-2010-0012
109
Plant protection databases: Topics
• Database management theory
– Information, data
– Database models, databases
– Database Management Systems
• Relation model
– Theoretical basis
– Normalized database
– Catalog, data dictionary
• Plant protection databases
– Practical problems and their solutions
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
110
Database management - Information
• Information is a concept of information technology. The word is of Latin origin and means intelligence, news, message.
• Definitions:
1) In general, information is data or news that we consider relevant and that reduces our lack of knowledge. (Wikipedia)
2) A gain in knowledge, the growth of knowledge; it means a reduction of uncertainty. (SH Atlas)
3) Information is new data or news that removes uncertainty and has consequences. (Kalamár-Csák)
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
111
Theoretical - Information
Information is as much a physical reality of the universe as matter and energy.
[Figure: pure information → information processing → meaningful information. Examples: DNA molecule → protein; computer data input → calculation results.]
TÁMOP-4.1.2.A/2-10/1-2010-0012
112
Theoretical - Information
Manifestations:
• Clearly pronounced – Explicit
– The information is completely clear to everyone and needs no explanation.
– For example: the water of Lake Balaton is 28 °C.
• Hidden – Implicit
– The connection between the data can be revealed only with a method.
– For example: a statistical calculation (average).
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
113
Theoretical - Data
• A data item is the specific value (character state) of a variable (property, attribute, characteristic) of an object (any thing the data relates to).
– A data item is therefore defined only if we specify which object, which variable and which value it refers to. For numeric values, the unit of measurement always belongs to the value.
• For example: Name: Arvalin LR; Agent: Zinc phosphide; Volume: 4 %
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
114
Theoretical – Data model
• A collection of concepts that clearly describes the structure of a database.
– The structure includes the data types, their relationships, and the constraints on the data.
– It is the description of the conceptual level, the logical structure, of the database.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
115
Basic elements of the Entity-Relationship (ER) data model
Plant Protection Information
ENTITIES
ATTRIBUTES
RELATIONSHIPS
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER - Entities
• Entities: the principal data objects, which can be distinguished from all other things and about which information is to be collected.
– The things or persons about which we want to store data.
– For example: Citizens, Workers, Patients, Customers; Plants, Agents, Phenological phases, Harmful organisms; Cars, Goods, Accounts ...
– A specific value of an entity is called an occurrence (instance).
Plant Protection Information
116
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER - Attributes
Attributes:
• The internal structure of the entities.
• Characteristics of entities that provide descriptive detail about them.
– Characteristics of the entity Plants, such as: name, Latin name, ...
• The actual values of the attributes determine an individual occurrence.
– For example: Peach, Prunus persica, …
Plant Protection Information
117
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Attributes - Key
• If an attribute, or a group of attributes, uniquely determines which occurrence of the entity is involved, it is called a key.
– For example: name in Plants
Plant Protection Information
118
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER - Relationship
Relationships:
• The external structure of entities.
• They represent real-world associations among one or more entities.
• They are described in terms of degree, connectivity, and existence.
– For example: Plants-Harmful organisms, Accounts-Goods, ...
• A particular occurrence of a relationship is called a relationship instance.
Plant Protection Information
119
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship - Types
The types of relationships:
•Independent connectivity
•1:1 connectivity
•1:N connectivity
•N:M connectivity
Plant Protection Information
120
TÁMOP-4.1.2.A/2-10/1-2010-0012
121
Data model – ER – Relationship 1.
1. Independent connectivity
– Two entities are independent of each other if no element of one instance set is linked to any element of the other.
• For example: Agent’s Id : Employee’s account
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship 2.
2. One-to-one connectivity (1:1):
• Each element of one instance set is linked to exactly one element of the other instance set.
– For example: Agent’s Id: Agent’s name
Plant Protection Information
122
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship - 1:1 Connectivity
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship 3.
3. One-to-many connectivity (1:N):
• Each element of instance set A can be linked to several elements of instance set B, but not vice versa.
– For example: Aetiologies: Diseases
Plant Protection Information
124
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationships - 1:N connectivity
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship 4.
4. Many-to-many connectivity (N:M):
• Each element of instance set A can be linked to several elements of instance set B, and vice versa.
– For example: Plants : Diseases
126
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model – ER – Relationship - N:M connectivity
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model - ER definition
• The data model is a finite set of entities, a finite set of their properties, and the set of their relationships.
Plant Protection Information
128
TÁMOP-4.1.2.A/2-10/1-2010-0012
Data model - Types
Three basic data models exist, depending on which of the three core elements they physically store.
Model | entity | property | connectivity
• Network, hierarchical | + | - | +
• Relational | + | + | -
• Object-oriented | + | + | +
• + object-relational (mixed data model)
Plant Protection Information
129
TÁMOP-4.1.2.A/2-10/1-2010-0012
130
Databases
• Database: a structured set of related data, stored (typically in digital form) so that multiple users can access it.
• A database is the combination, organized according to the data model, of a finite number of entity occurrences, a finite number of their property values, and their relationship occurrences.
• Benefit: many users can use it at once. Each data item is stored only once.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
131
Integrated database
• All data are linked together and are used by different users in different groupings.
• The data are physically stored centrally, free of redundancy or with minimal, controlled redundancy.
• Centrally controlled:
– data protection,
– entry of new data,
– modification of existing data.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
132
Database Management System (DBMS)
• Software that provides the connection to the database.
• For databases it allows:
– creation,
– querying of data,
– modification,
– maintenance,
– safe long-term storage of large amounts of data.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
133
Database Management System (DBMS)
• Grouping
– According to the number of users
• Single-user
• Multi-user
– According to job sharing
• Single-tasking
• Client-Server
– According to the number of storage locations
• Stored in one place
• Distributed / shared
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
134
Database Management System (DBMS)
• The system components
– Data Definition Language (DDL)
• User level
• Conceptual level
• Physical storage level
– Data Manipulation Language (DML)
– Data Control Language (DCL)
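The DDL and DML components can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical plants table; neither comes from the lecture's own software. SQLite has no DCL statements such as GRANT or REVOKE, so that component is only mentioned in a comment.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of a table.
conn.execute("CREATE TABLE plants (name TEXT PRIMARY KEY, latin_name TEXT)")

# DML: insert, delete and query data.
conn.executemany(
    "INSERT INTO plants VALUES (?, ?)",
    [
        ("Apple", "Malus domestica"),
        ("Potato", "Solanum tuberosum"),
        ("Wheat", "Triticum vulgare"),
    ],
)
conn.execute("DELETE FROM plants WHERE name = 'Wheat'")
rows = conn.execute("SELECT name, latin_name FROM plants ORDER BY name").fetchall()
print(rows)

# DCL (e.g. GRANT, REVOKE) is not supported by SQLite and is therefore omitted.
```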
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
135
Database Management System (DBMS) – Operating concept
TÁMOP-4.1.2.A/2-10/1-2010-0012
DBMS – Operating concept - Explanation
1 Request for information from the database (application program)
2 Interpretation and analysis of the request (DBMS: syntax, existence, rights)
3a Executable → to the operating system
3b Cannot be executed → back to the program
4 Access to the external storage (operating system)
5 Transfer of the requested data (OS, from storage into buffer)
6 Passing of the data and feedback to the program
7 Receipt of the data in the program.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
137
Database Management System (DBMS)
• Two types:
– With its own (autonomous) language
• Oracle (1977)
• DB/2 (1983)
• Sybase (1987)
• Informix (1981)
• Ingres (1980)
– Plug-in type
• IDMS (1983)
• SQL (1986)
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
138
Relation database model – Theoretical basis
• In 1970 Dr. Edgar F. Codd (IBM) created the relational database model.
• The data model describes the various types of data, their relations and connections, and the procedures that protect them.
• The collected data are logically separated into entity types (tables). We determine how the individual entities can be clearly identified, and what additional features (attributes) they have.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – Relation diagram
The VirKor database has seven tables, shown with their properties and relations.
Plant Protection Information
139
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – Relational mode of representation
• Entities are represented as relations (special tables).
• They describe the different entities of the real world and their properties.
• Plants table
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
141
VirKor program - Relational mode of representation
• The connection between the entities can be depicted in relations.
• Data management is carried out with relational operations.
• Plants – Pests relation
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Benefits and disadvantages
Benefits:
• Based on a mathematical (set-theoretical) model,
• Very close to everyday thinking,
• The most flexibly modifiable,
• The three levels are well separable and can be made independent.
Disadvantages:
• Performance is less efficient.
– Today this is no longer a big problem.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – The properties of relations
• The relation has a unique name in the database;
• Rows represent the entity occurrences, columns the entity properties;
• Every row has the same number of columns;
• Each column has a unique name within the relation;
• Any column of a row holds a single value (if there is no value, it is NULL);
• The order of the columns is arbitrary;
• No two rows are identical;
• There is at least one combination of columns that uniquely identifies the row. This is the primary key.
• Any data item can be identified by:
– relation name
– + column name
– + value of the primary key
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Tables, views
We do not physically store every value of every property of every entity.
• Base relation: table
Physically stored.
• Virtual relation: view
Contains no data. Created from tables with relational operations.
• Materialized view
Physically stored. Created from tables with relational operations. Changes when the base tables change.
• Snapshot
Physically stored. The values of tables and views at a certain moment.
• Query, the result of a selection
Not a true relation; exists only temporarily.
• Temporary tables
Needed only temporarily, for a given operation or task.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Keys
Keys ensure data integrity, consistency and freedom from contradictions.
The system automatically checks:
• The primary key and foreign key relations between entities (e.g., the key of Plants in Diseases of plants)
– matching
– cascading change
– cascading delete
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Keys
PRIMARY KEY
• Uniquely identifies the rows of a relation.
• The primary key (or any part of it) may not be NULL, and it should not contain unnecessary columns.
• If there are several options, it is important to decide which should be the primary key (e.g., for a person: identity card number, tax identification number, social insurance number).
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Integrity
Additional integrity options:
• Define a unique index (the same value cannot occur in these columns in two rows).
• Specify conditions that a given field must satisfy (e.g., a check that allows only numeric values).
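Both options can be tried out in a short sketch, assuming Python's sqlite3 module. The pesticides table, its columns, the index name and the helper try_insert are hypothetical, and the prices are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Field condition: the price must not be negative (CHECK constraint).
conn.execute(
    """
    CREATE TABLE pesticides (
        pid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL CHECK (price >= 0)
    )
    """
)
# Unique index: the same name cannot occur in two rows.
conn.execute("CREATE UNIQUE INDEX idx_pesticide_name ON pesticides (name)")
conn.execute("INSERT INTO pesticides VALUES (1, 'Arvalin LR', 10.0)")


def try_insert(row):
    """Insert a row and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute("INSERT INTO pesticides VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


print(try_insert((2, "Arvalin LR", 5.0)))    # duplicate name
print(try_insert((3, "Pesticide B", -1.0)))  # negative price
print(try_insert((4, "Pesticide B", 1.0)))   # satisfies both conditions
```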
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Indexes
• Speed up direct and sequential access by the indexed column.
– Maintained automatically,
– Can be created and deleted at any time,
– Slow down modifications,
– Need storage space.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Keys
FOREIGN KEY
• A column (or combination of columns) of the referencing relation that may take only a NULL value or a value equal to one of the primary key values of the referenced table.
• Establishes the connection in a 1:N relationship. It must remain valid through all changes: data input, modification and deletion.
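A minimal sketch of a foreign key with cascading delete, assuming Python's sqlite3 module. The tables plants and plant_diseases and the helper insert_is_rejected are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE plants (name TEXT PRIMARY KEY, latin_name TEXT)")
conn.execute(
    """
    CREATE TABLE plant_diseases (
        plant   TEXT REFERENCES plants(name)
                ON UPDATE CASCADE ON DELETE CASCADE,
        disease TEXT,
        PRIMARY KEY (plant, disease)
    )
    """
)
conn.execute("INSERT INTO plants VALUES ('Apple', 'Malus domestica')")
conn.execute("INSERT INTO plant_diseases VALUES ('Apple', 'Apple powdery mildew')")


def insert_is_rejected(plant, disease):
    """Return True if the DBMS refuses the row because of a key violation."""
    try:
        conn.execute("INSERT INTO plant_diseases VALUES (?, ?)", (plant, disease))
        return False
    except sqlite3.IntegrityError:
        return True


# 'Pear' does not exist in plants, so the referencing row is refused.
orphan_rejected = insert_is_rejected("Pear", "Blight")

# Cascading delete: removing the plant removes its disease rows as well.
conn.execute("DELETE FROM plants WHERE name = 'Apple'")
remaining = conn.execute("SELECT COUNT(*) FROM plant_diseases").fetchone()[0]
print(orphan_rejected, remaining)
```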
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Foreign key relationship
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Foreign key relationship
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
152
Relation model – Normalization
• Normalization is a formal, algorithmic process that transforms the initial, poorly structured data into a logically more transparent, better form by the consistent, successive application of appropriate rules.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normalization
• The entities produced in the earlier design steps are brought into well-manageable, standard forms.
• Can be expressed as an algorithm.
• Result:
– The data need less storage space;
– Elementary data can be changed faster and with less chance of error;
– The database becomes logically clearer.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normalization – Functional dependence
• In any relation, the values of some attributes depend on the values of other attributes.
• If in relation R the value of one attribute (X, the independent variable) uniquely determines the value of another attribute (Y, the dependent variable), then we say that Y is functionally dependent on X in relation R.
• Naturally, this holds not only for the current content of R but independently of time, as a constraint for the whole lifetime of the database.
• Both X and Y can be composite, that is, they may consist of several columns.
• The usual notation of functional dependence is R.X → R.Y
• It can also be represented in a functional diagram, also called a dependency diagram.
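Whether a dependency X → Y is contradicted by a given set of rows can be checked mechanically. The following Python sketch uses a hypothetical helper, holds, and sample rows taken from the plant examples of this chapter. It inspects only the current content, whereas a real functional dependency is a constraint that must hold for every future state as well.

```python
def holds(rows, x, y):
    """Return True if no two rows agree on columns x but differ on columns y."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in x)
        value = tuple(row[col] for col in y)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"plant": "Apple", "latin": "Malus domestica", "disease": "Impetigo"},
    {"plant": "Apple", "latin": "Malus domestica", "disease": "Apple powdery mildew"},
    {"plant": "Potato", "latin": "Solanum tuberosum", "disease": "Blight"},
]

print(holds(rows, ["plant"], ["latin"]))    # plant -> latin is not contradicted
print(holds(rows, ["plant"], ["disease"]))  # plant -> disease is contradicted
```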
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Representation of functional dependence
Plant Protection Information
The arrow points from the independent attribute to the dependent attribute. In the diagram, Y and Z are functionally dependent on X.
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Full functional dependence
• In general terms, in relation R the attribute Y is fully functionally dependent on the (composite) attribute X if and only if it is functionally dependent on X but not on any proper subset of X.
– If X is not composite, then functional and full functional dependence are the same.
– strong
– weak
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Full dependence
• Let P, Q ⊆ A and P → Q.
• Q is fully (functionally) dependent on P only if Q does not depend on any proper subset of P.
• Otherwise, the dependence is partial.
– For example:
– ORDERITEM {order_id, goods_id, piece}
– REPAYMENTS {deptor_id, month, amount, date}
– VISIT {visitor_id, date, time, subject, period}
Plant Protection Information
157
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Transitive dependence
• S depends transitively on P if there exists Q ⊆ A such that P → Q and Q → S, but the reverse dependencies do not hold.
– Examples: P → Q → S
– WORKER {perid, name, class_code, class_name}
– ORDERHEADER {order_id, custcode, custname, custaddres, date, deadline, totalvalue}
– VISITOR {id, name, firm, firmname, firmaddres, …}
Plant Protection Information
158
TÁMOP-4.1.2.A/2-10/1-2010-0012
159
Relation model – Normal forms
• The entity’s structural state
• 0NF
• 1NF
• 2NF
• 3NF
• 4NF
Plant Protection Information
Plant | Latin name | Diseases | …
Apple | Malus domestica | Apple mosaic, Impetigo, Apple powdery mildew, …
Potato | Solanum tuberosum | Staining virus reticulated, Blight, Black rotting, …
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – First normal form
• A relation is in first normal form (1NF) if
– each column contains one and only one attribute,
– each row is different,
– the order of attributes is the same in every row,
– there are no repeating fields,
– each row has (at least) one unique key, on which all the other attributes are functionally dependent.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
161
Relation model – Normal forms - 1NF - example
Plant Protection Information
Plant | Latin name | Disease | Aetiology | …
Apple | Malus domestica | Applemosaic | virus
Apple | Malus domestica | Impetigo | fungus
Apple | Malus domestica | Apple powdery mildew | fungus
Potato | Solanum tuberosum | Staining virus reticulated | virus
Potato | Solanum tuberosum | Blight | fungus
Potato | Solanum tuberosum | Black stem rotting | bacterium
Wheat | Triticum vulgare | Pitch staining | fungus
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normal forms – 1NF - Anomalies
A lot of redundancy can be seen (e.g., Plant and Latin name).
Hidden error possibilities (update anomalies):
• Deletion anomaly:
– If we delete the wheat disease Pitch staining, the data of wheat are lost as well.
• Modification anomaly:
– If the potato disease Blight is renamed, the name must be modified in every row where it occurs; otherwise a „new” disease appears.
• Insertion anomaly:
– A new disease can be entered only if some plant already has it (no part of the primary key can be NULL).
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normalforms – 2NF
• A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.
– 1NF relations with an elementary (single-column) primary key are automatically in 2NF. For relations with a composite key, however, we need 2NF in order to eliminate update anomalies. (This does not remove all the anomalies, but it can significantly reduce their number.) The transformation is called decomposition of relations.
– Decomposition means that from the 1NF relation we produce, by projection, 2NF relations whose primary keys are the primary key of the original relation or parts of it, and which contain only those columns that are fully dependent on the new primary key.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
164
Relation model – Normalforms – 2NF - Example
Plant Protection Information
Plant | Latin name
Apple | Malus domestica
Potato | Solanum tuberosum
Wheat | Triticum vulgare

Plant | Disease | Latin name | Picture
Apple | Apple mosaic virus | Apple mosaic virus
Apple | Impetigo | Venturia inaequalis
Apple | Apple powdery mildew | Podosphaera leucotricha
Potato | Virus networking staining | Potato leafroll
Potato | Black stem rotting | Erwinia carotovora subsp. atroseptica
Potato | Blight | Phytophthora infestans
Wheat | Pitch staining | Lidophia graminis
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normal forms – Decomposition (1NF → 2NF)
• R(A,B,C,D) before decomposition (1NF)
– PRIMARY KEY (A,B)
– R.A → R.D
• After decomposition (2NF)
– R1(A,D)
– PRIMARY KEY (A)
• and
– R2(A,B,C)
– PRIMARY KEY (A,B)
– FOREIGN KEY (A), refers to R1
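The general scheme can be tried out on a few rows in plain Python. In this sketch A is the plant, B the disease, C a hypothetical picture file name and D the Latin name of the plant; the file names are invented. R1 and R2 are produced by projection, and joining them on A restores the original relation.

```python
# R(A, B, C, D) with PRIMARY KEY (A, B) and the partial dependency A -> D.
R = [
    ("Apple", "Impetigo", "img1", "Malus domestica"),
    ("Apple", "Apple powdery mildew", "img2", "Malus domestica"),
    ("Potato", "Blight", "img3", "Solanum tuberosum"),
]

# Projections: R1(A, D) with key A, and R2(A, B, C) with key (A, B).
R1 = {(a, d) for a, b, c, d in R}
R2 = {(a, b, c) for a, b, c, d in R}

# Join R2 with R1 on A; R1 maps each A to exactly one D because A -> D.
latin_of = dict(R1)
rejoined = {(a, b, c, latin_of[a]) for a, b, c in R2}

print(sorted(R1))
print(rejoined == set(R))
```

The Latin name is now stored once per plant in R1 instead of once per plant-disease pair.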
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normalforms – 3NF
• A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
• In other words, 3NF means that functional dependencies can start only from the primary key and the alternate keys.
• The 2NF relation Employee is not in 3NF because, for example, the class (CLASS) is neither the primary key nor an alternate key, yet other columns (CLASS-NAME, BOSS) are functionally dependent on it.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normal forms – Decomposition (2NF → 3NF)
• Decomposition means that we take the projection of the 2NF relation that includes only those attributes that depend exclusively on the primary key. Its primary key remains the same. In the other new relation (or relations, if there is more than one such dependency), the primary key is the independent attribute of the removed dependency, and its columns are the attributes that depend on it.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
168
Relation model – 3NF - Example
Plant Protection Information
Plant | Latin name
Apple | Malus domestica
Potato | Solanum tuberosum
Wheat | Triticum vulgare

Plant | Disease | Picture
Apple | Apple mosaic virus
Apple | Impetigo
Apple | Apple powdery mildew
Potato | Virus networking staining
Potato | Black stem rotting
Potato | Blight
Wheat | Pitch staining

Disease | Latin name | Aetiology | …
Apple mosaic virus | Apple mosaic virus | virus
Impetigo | Venturia inaequalis | fungus
Apple powdery mildew | Podosphaera leucotricha | fungus
Virus networking staining | Potato leafroll | virus
Blight | Phytophthora infestans | fungus
Black stem rotting | Erwinia carotovora subsp. atroseptica | bacterium
Pitch staining | Lidophia graminis | fungus
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normal forms – Decomposition (2NF → 3NF)
• In general terms, for the columns A, B, C, D (any of them may be composite) of a 2NF relation
– R(A,B,C,D)
• PRIMARY KEY (A)
• R.B → R.C
• the decomposition into 3NF produces the following relations:
– R1(B,C)
• PRIMARY KEY (B)
and
– R2(A,B,D)
• PRIMARY KEY (A)
• FOREIGN KEY (B), refers to R1
• The relation R can be restored unambiguously at any time by joining R1 and R2 (on B), provided that the splitting was done according to the principle set out above.
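As an illustration of restoring R by a join, the following sketch applies the scheme to the WORKER {perid, name, class_code, class_name} example, with A = perid, B = class_code, C = class_name and D = name. It assumes Python's sqlite3 module; the table names class and worker and all row values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- R1(B, C): the class name depends only on the class code.
    CREATE TABLE class (
        class_code TEXT PRIMARY KEY,
        class_name TEXT
    );
    -- R2(A, B, D): the foreign key class_code refers to R1.
    CREATE TABLE worker (
        perid      INTEGER PRIMARY KEY,
        name       TEXT,
        class_code TEXT REFERENCES class(class_code)
    );
    INSERT INTO class VALUES ('C1', 'Laboratory');
    INSERT INTO class VALUES ('C2', 'Field service');
    INSERT INTO worker VALUES (1, 'Worker A', 'C1');
    INSERT INTO worker VALUES (2, 'Worker B', 'C1');
    INSERT INTO worker VALUES (3, 'Worker C', 'C2');
    """
)

# Joining R2 and R1 on B (class_code) restores the original relation R.
restored = conn.execute(
    """
    SELECT w.perid, w.name, w.class_code, c.class_name
    FROM worker AS w
    JOIN class AS c ON c.class_code = w.class_code
    ORDER BY w.perid
    """
).fetchall()
print(restored)
```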
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
Relation model – Normalforms – Decomposition - 3 NF
Notes:
• The 3NF form is not always appropriate (e.g., address and zip code).
• For most database management systems 1NF is enough, and even the primary key is not required!
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
171
Catalog, data dictionary
• Stores, for all tables and views of the database:
– definition,
– relationships,
– storage,
– usage.
• Carries out system administration tasks.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
172
Plant protection database
• Pesticides Register• VirKor – assistant educational material
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
173
Pesticides Register - Records of pesticides
• The aim of the database
The task is to plan a register of the pesticides manufactured by ER-chemistry Co.
• The register should include:
– the origin of the individual pesticides,
– the components needed to produce the pesticide,
– the possible application areas.
• It is assumed that:
– a pesticide may be single- or multi-component,
– a pesticide may be used against several pests,
– one component can come from multiple suppliers.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
174
Pesticides Register - Entity types
• Pesticides (id, name, degree of hazard, price)
• Factories (factory code, name)
• Fields of application of the pesticides (pest, type)
• Components of the pesticides (component name)
• Transporters (transporter code, date, name, address)
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
175
Pesticides Register - Entity type of ER-model
• Pesticides(id, name, hazard, price)
• Factories(id, name)
• Pests(pestname, type)
• Components(name)
• Transporters(id, date, name, address)
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
176
Pesticides Register - ER-diagram
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
177
Pesticides Register - Relations
• Where is it produced? Factories:Pesticides (1:N)
– A pesticide is produced by only one factory, but a factory can produce several pesticides.
• What is it applied against? Pesticides:Pests (M:N)
– A pesticide may be used against several pests, and a pest can be destroyed by several pesticides.
• What are the ingredients? Pesticides:Components (M:N)
– A pesticide consists of several components, and a component may also be part of other pesticides.
• Where does it come from? Components:Transporters (M:N)
– A component can be delivered by several suppliers, and a supplier can deliver several components.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
178
Pesticides Register - Relation model
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
179
Pesticides Register - Relations
Only primary keys
• Pests(pname, type)
• Factories(fid, name)
• Components(cname)
• Transports(tid, tdate, tname, taddress)
Primary keys and foreign keys
• Pesticides(pid, name, hazard, price, fid)
• Applies(pid, pname, term)
• Elements(pid, cname, volume%)
• Origines(cname, tid, tdate, quantity)
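The relation list above could be turned into a schema roughly as follows. This is a sketch under several assumptions, not the register's actual definition: it uses Python's sqlite3 module, the column types are guessed, volume% is renamed to volume_pct, the composite primary key (tid, tdate) of Transports is inferred from the columns of Origines, and the sample rows ('Pesticide A', 'Component X', 'Factory 1') are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Relations with primary keys only.
    CREATE TABLE Pests (pname TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE Factories (fid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Components (cname TEXT PRIMARY KEY);
    CREATE TABLE Transports (
        tid INTEGER, tdate TEXT, tname TEXT, taddress TEXT,
        PRIMARY KEY (tid, tdate)              -- assumed composite key
    );
    -- Relations with primary and foreign keys.
    CREATE TABLE Pesticides (
        pid INTEGER PRIMARY KEY, name TEXT, hazard TEXT, price REAL,
        fid INTEGER REFERENCES Factories(fid)  -- 1:N Factories:Pesticides
    );
    CREATE TABLE Applies (                     -- M:N Pesticides:Pests
        pid INTEGER REFERENCES Pesticides(pid),
        pname TEXT REFERENCES Pests(pname),
        term TEXT,
        PRIMARY KEY (pid, pname)
    );
    CREATE TABLE Elements (                    -- M:N Pesticides:Components
        pid INTEGER REFERENCES Pesticides(pid),
        cname TEXT REFERENCES Components(cname),
        volume_pct REAL,
        PRIMARY KEY (pid, cname)
    );
    CREATE TABLE Origines (                    -- M:N Components:Transports
        cname TEXT REFERENCES Components(cname),
        tid INTEGER, tdate TEXT, quantity REAL,
        PRIMARY KEY (cname, tid, tdate),
        FOREIGN KEY (tid, tdate) REFERENCES Transports(tid, tdate)
    );
    INSERT INTO Factories VALUES (1, 'Factory 1');
    INSERT INTO Components VALUES ('Component X');
    INSERT INTO Pesticides VALUES (1, 'Pesticide A', NULL, NULL, 1);
    INSERT INTO Elements VALUES (1, 'Component X', 4.0);
    """
)

# What are the ingredients of 'Pesticide A'?
components = conn.execute(
    """
    SELECT e.cname, e.volume_pct
    FROM Pesticides AS p JOIN Elements AS e ON e.pid = p.pid
    WHERE p.name = 'Pesticide A'
    """
).fetchall()
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(components, tables)
```

Each M:N relationship becomes its own linking table (Applies, Elements, Origines), while the 1:N relationship needs only the foreign key fid in Pesticides.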
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
180
Pesticides Register - Pest table
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
181
Pesticides Register - Pesticide form
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
182
Pesticides Register - Pests form
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
183
Pesticides Register - Factories form
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program
assistant educational material
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - assistant educational material
VirKor is an assistant educational program that helps students learn to recognize the diseases of different plants.
• Modernized demonstration boards.
• An educational resource for students of the plant doctor programme.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How does it work?
• Stored in the database:
– plants,
– diseases,
– their relations.
• The displayed images are stored in a folder (locally or on a server).
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – Diseases of plant: Apple
Apple proliferation phytoplasma; Podosphaera leucotricha
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - Diseases of plant: Apple
Apple mosaic virus; Monilinia fructigena
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – One disease on different Plants: Mosaic virus on cucumber and apple.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - Symptoms: Necrosis of tissue
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made?
• The demonstration boards were photographed by hand and the images were then cleaned.
• The data on the boards were recorded in an Excel spreadsheet.
• The relational database model was created.
• The application was developed.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made? - Data collection - Digitization
Digitization of the demonstration boards
Original | Cleaned
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made? - Data collection - Stored
Store the data in a worksheet (1NF)
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made? - Modify and correct data structure
In this case we supplemented the data with some other properties, for example: add plant parts.
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made? - Data collection - Analyze the relationship between data
• Functional dependencies:
– The Latin names of the plants depend on the Hungarian names.
– The same applies to the diseases, the symptoms and the aetiology (e.g. virus).
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program – How it’s made? – Create a Relation database model
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - How it’s made? - Table – Entity: Plants
• Apple and its diseases
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - How it’s made? - Table – Entity: Diseases
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - How it’s made? - Table – Entity: Plants’ diseases
Plant Protection Information
TÁMOP-4.1.2.A/2-10/1-2010-0012
VirKor program - How it’s made? - Develop tutor program
• Form of Plants
• Setting the properties of each tool
• Programming each event
– For example
• Load an image file into the picture box
• Change the status of checkboxes
• Etc.
Plant Protection Information
Thank you for your kind attention.
Made by: Máté Csák PhD.
TÁMOP-4.1.2.A/2-10/1-2010-0012
Plant Protection Information
202
TÁMOP-4.1.2.A/2-10/1-2010-0012
• Prepared by:– Sándor Nagy
Bioinformatics
203
Information Technology in Plant Protection
Bioinformatics
Bioinformatics - Databases and homology searching
TÁMOP-4.1.2.A/2-10/1-2010-0012
Contents
• What does Bioinformatics mean?
• Structure and operation of DNA
• Bioinformatical databases
• Using databases
• Exercise
Information Technology in Plant Protection
TÁMOP-4.1.2.A/2-10/1-2010-0012
Definition
• Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.
Information Technology in Plant Protection
TÁMOP-4.1.2.A/2-10/1-2010-0012
Fields of Bioinformatics
• Supra-individual bioinformatics uses systematic modelling in order to understand biological systems
• Molecular bioinformatics deals with the analysis and design of proteins and nucleotide sequences
• Computing bioinformatics focuses on the utilization of biological systems
Aim of Bioinformatics
The aim is to decipher genetically encoded information, which gives us information on the following:
• 3D structure,
• Function,
• Evolutionary relations.
DNA → Protein → Function
Questions answered by Bioinformatics
• In which other organisms can we find the given sequence? (→ ortholog searching)
• What kinds of variants can occur within a given organism? (→ paralog searching)
• How heterogeneous is a given paralog? (→ polymorphism searching)
• Which positions are important in a given sequence? (→ evolutionarily conserved positions)
Basics: Structure of DNA
• Double helix in which the nucleotide bases on the two strands are connected by hydrogen bonds: A:T by 2 and G:C by 3 H-bonds
• Base pairing: the complementary nucleotide bases within the long polymer are A:T and G:C
• Replication
• Genetic code: the order of the bases is not monotonous
• Two helical chains, each coiled round the same axis, each with a pitch of 34 Ångströms (3.4 nanometres) and a radius of 10 Ångströms
• The two strands run in opposite directions to each other and are therefore antiparallel, with 5′ (five prime) and 3′ (three prime) ends
• It contains four bases: adenine (A), cytosine (C), guanine (G), thymine (T)
• Structure of DNA:
– http://www.youtube.com/watch?v=qy8dk5iS1f0&feature=player_embedded
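The base-pairing rule on this slide (A:T with 2 and G:C with 3 hydrogen bonds, antiparallel strands) can be illustrated with a short sketch. The function names and the example sequence are made up for illustration only.

```python
# Illustrative sketch; function names and the example sequence are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # bonds each base forms with its partner


def reverse_complement(seq):
    """Return the opposite strand, written in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq):
    """Count the hydrogen bonds that hold this stretch of double helix together."""
    return sum(H_BONDS[base] for base in seq.upper())


strand = "ATGCGT"
print(reverse_complement(strand))  # ACGCAT
print(hydrogen_bonds(strand))      # 15
```

Because the strands are antiparallel, the partner strand is read in reverse; applying the function twice gives back the original sequence.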
Basics: Replication of DNA
• Following from the base-pairing rule: the hydrogen bonds within the double helix can be pulled apart, and both strands then serve as templates for the synthesis of a new strand. The result of this process is two double helices with the same structure
• The genetic information is encoded in the order of the nucleotides
• DNA replication:
– http://www.youtube.com/watch?v=E8NHcQesYl8&feature=related
Historical overview
• Early 1950s – publication of the insulin sequence
• 1953 Watson-Crick: structure of DNA
• From the early 1970s – development of algorithms for sequence analysis:
– Dot matrix (1970)
– Global (1970) and local (1981) sequence alignment
– BLAST algorithm (1990)
• 1972 first computer-stored databases of protein sequences
• 1979 GenBank prototype
Bioinformatics databases
• GenBank: NCBI (National Center for Biotechnology Information)
– http://www.ncbi.nlm.nih.gov/
• European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI)
– http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ)
– http://www.ddbj.nig.ac.jp/
http://www.ncbi.nlm.nih.gov/
Choosing database
Searching keywords
Navigation in Database - GenBank
• Choosing a database:
– PubMed – database of publications
– Protein – database of proteins
– Nucleotide – database of nucleic acids
– Genome – database of whole genomes
– Gene – database of genes
Navigation in Database - GenBank
• Exercise:
– Looking for information on Phytophthora infestans
• Database: Taxonomy, Keyword: Phytophthora infestans
• Result of our search (next slide)
– Taxonomic classification
– Database results from GenBank
– References to other resources
• By choosing the following reference link on the result page we can reach all sequences in the database: Nucleotide – Direct
• Looking for the INF2A gene within the results
Navigation in Database - GenBank
• Result table:
Identifiers
Information on publishing
Information on structure
Gene identification
Amino acid sequence
Nucleotide sequence
Navigation in Database - GenBank
• Data formats:
– Summary – short description, important information
– GenBank – GenBank's own format, detailed data
– FASTA – name + identifiers + sequence, the most commonly used format
– ASN.1 – international format
Navigation in Database - GenBank
• FASTA format:
– Advantages:
• commonly used
• simple
• small
– Disadvantage:
• less information
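The simplicity of the FASTA format (a ">" header line with name and identifiers, followed by sequence lines) can be seen in a minimal reader. This is only a sketch: the function name is hypothetical and the two records are invented, not real GenBank entries.

```python
# Minimal FASTA reader; the records below are invented for illustration.
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>seq1 hypothetical example record
ATGCGTACGT
TTGACATG
>seq2 another made-up record
CCGGGGA
"""
for header, sequence in parse_fasta(example):
    print(header.split()[0], len(sequence))
```

The sequence lines of one record are simply joined, which is why the format is small but carries little annotation.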
Navigation in Database - GenBank
• Accession number:
– AY693804 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Internationally accepted identifier for nucleic acid and protein sequences
• GI (GenInfo Identifier) number:
– GI:51832280 - Phytophthora infestans INF2A (Inf2A) gene, complete cds
– Identifier used specifically by GenBank
What is it used for?
• To look up the genetic information of a given organism
• To compare and check our results
• Basis for comparing experiments
• Collection of papers and publications
• Basis of research
Exercise:
– Look up the gene sequence and protein sequence of a chosen important pathogen (causative agent) and examine whether its genome is available.
– Save the result in FASTA format as a text file. Keep the saved file; it is required for the next exercise.
Content
• Comparing two sequences
• Searching for homologous sequences – BLAST
• Nucleotide BLAST – BLASTN
• Protein BLAST – BLASTP
• What is it used for?
• Exercise
How to compare two sequences – Dot matrix
• Dot matrix method (Gibbs and McIntyre, 1970): it compares two amino acid or nucleotide sequences by placing one along the horizontal and the other along the vertical axis of a matrix and drawing a dot wherever the two residues match.
• Very well suited to the visual demonstration of mutations, deletions and insertions.
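A dot matrix can be produced in a few lines. The sketch below is an illustration only; the function names and the two short sequences are hypothetical.

```python
# Dot matrix sketch: seq_a runs down the rows, seq_b across the columns.
def dot_matrix(seq_a, seq_b):
    """Return one string per row, with '*' where the two residues are identical."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def print_dot_matrix(seq_a, seq_b):
    """Print the matrix with both sequences as axis labels."""
    print("  " + seq_b)
    for a, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        print(a + " " + row)


print_dot_matrix("GATTACA", "GATACA")
```

An unbroken diagonal marks a matching region; a diagonal that jumps to a neighbouring diagonal indicates an insertion or deletion.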
How to compare two sequences – Global Sequence Alignment
• Another well-known analytical method is global sequence alignment, which uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined over their whole length with the help of a scoring system.
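A minimal sketch of global alignment by dynamic programming (in the style of the Needleman-Wunsch algorithm) is given here. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration, not values from the presentation.

```python
# Global alignment sketch; scoring parameters are illustrative assumptions.
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal end-to-end alignment."""
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Every residue of both sequences appears in the result, which is what distinguishes a global from a local alignment.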
How to compare two sequences – Local Sequence Alignment
• It also uses dynamic programming.
• Essence of the process: the similarity of the sequences is examined with the help of a scoring system. It tries to find the best-scoring alignment between regions of the two sequences instead of aligning them over their whole length.
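Local alignment can be sketched in the same way (in the style of the Smith-Waterman algorithm): scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell. The scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions for illustration.

```python
# Local alignment sketch; scoring parameters are illustrative assumptions.
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best-scoring local alignment."""
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0,                                               # start a new alignment
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_alignment("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

Only the shared region is reported; the unrelated flanking residues are left out of the alignment.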
Pairwise Sequence Similarity Search
• Basic Local Alignment Search Tool (BLAST): the most effective and most common similarity search method
– Characteristics:
• Fast
• Good sensitivity
– Types:
• Blastn – for nucleotide sequences
• Blastp – for proteins
• Blastx – for translated nucleotide sequences
• http://www.ncbi.nlm.nih.gov/blast
Pairwise Sequence Similarity Search
• Types:
BLAST     Query sequence                   Database
Blastn    Nucleotide                       Nucleotide
Blastp    Protein                          Protein
Blastx    6-frame translated nucleotide    Protein
Tblastn   Protein                          6-frame translated nucleotide
Tblastx   6-frame translated nucleotide    6-frame translated nucleotide
Entering sequences - copy - upload
Most important settings: Blastn – Search databases
• The most commonly used one is the non-redundant nucleotide database (the one selected)
• The search can be narrowed by entering taxonomic data in the "Organism" section.
Most important settings: Blastn – program optimization
• Megablast: searches for matches with 95% or higher similarity; very fast.
• Discontiguous megablast: very well suited to cross-species comparison; somewhat slower.
• Blastn: suitable for comparing any sequences; it detects even low similarity; slow.
Most important settings: Blastp – Search databases
• The most commonly used one is the non-redundant protein database (the one selected)
• The search can be narrowed by entering taxonomic data in the "Organism" section.
Most important settings: Blastp – program optimization
• Blastp: simple search in a protein database.
• PSI-BLAST: search algorithm with position-specific scoring.
• PHI-BLAST: search with a pattern-specific scoring system.
BLAST result evaluation
• Look for nucleotide sequences that are similar to AY693804 - Phytophthora infestans INF2A gene.
BLAST result evaluation
Search parameters
Graphical representation of the result
BLAST result evaluation
• Result summary table
– Max score
• A higher value means higher similarity
– Query coverage
• A higher value means that a larger part of the query sequence is covered by the alignment
BLAST result evaluation
• Result summary table
– E value – Expected value
• A lower value means a more significant match (fewer hits of this quality are expected by chance).
– Max ident – the highest percent identity among the alignments to the hit
• A higher value means higher similarity
BLAST result evaluation
• Detailed sequence alignment:
BLAST in a local environment
• It is possible to run the BLAST program in a local environment.
– It is useful in the following cases:
• Comparing sequences to local databases
• Operations requiring a large number of calculations
• ftp://ftp.ncbi.nih.gov/blast/
– Command-line execution with parameter inputs.
– Supports many operating systems (both 32- and 64-bit architectures)
– Detailed help
– The first step is formatting the database; the next step is the similarity analysis.
– It can produce many output formats
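The two steps (database formatting, then similarity analysis) might look as follows when scripted. This sketch assumes the NCBI BLAST+ suite is installed, with its `makeblastdb` and `blastn` programs on the PATH; the helper names and all file names are hypothetical, and other BLAST releases use different program names.

```python
# Sketch of a local BLAST run, assuming BLAST+ (makeblastdb, blastn) is installed.
# Helper names and file names are hypothetical.
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Step 1: format a nucleotide FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def blastn_command(query_path, db_name, out_path, evalue=1e-5):
    """Step 2: similarity search; -outfmt 6 requests tabular output."""
    return ["blastn", "-query", query_path, "-db", db_name,
            "-out", out_path, "-outfmt", "6", "-evalue", str(evalue)]


def run_local_blast(fasta_path, query_path, db_name="local_db", out_path="hits.tsv"):
    """Run both steps; raises if the BLAST+ programs are not available."""
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        raise RuntimeError("BLAST+ executables not found on PATH")
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)
    subprocess.run(blastn_command(query_path, db_name, out_path), check=True)
    return out_path


# Only print the command lines here; nothing is executed.
print(" ".join(makeblastdb_command("sequences.fasta", "local_db")))
print(" ".join(blastn_command("query.fasta", "local_db", "hits.tsv")))
```

Building the argument lists separately from running them makes it easy to check the parameters before starting a long calculation.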
What can we do with it?
• We have an unknown sequence from an unknown source.
– What can the source be?
– Which gene is it similar to?
– What can be the function of the protein coded by this sequence?
• Use the sequence in FASTA format as the query in the BLAST program. From the result we can answer the questions above.
Exercise
– Using the FASTA-format sequence saved in the previous presentation, search for related sequences with a similarity analysis (BLAST).
Contents
• Multiple sequence alignments
• Examining protein sequences
• Protein 3D models
• What is it used for?
• Exercise
Multiple Sequence Alignment
• Essence:
– Trying to align several sequences at the same time. Differences are scored with penalties.
• Use:
– Searching for common features and parameters
– Inserting new sequences - taxonomy
– Protein structures
– Phylogenetic analysis
Multiple Sequence Alignment - An example
TTGACATG CCGGGG---A AACCG
TTGACATG CCGGTG--GT AAGCC
TTGACATG -CTAGG---A ACGCG
TTGACATG -CTAGGGAAC ACGCG
TTGACATC -CTCTG---A ACGCG
******** ?????????? *****
• What is the consensus sequence?
• When the sequences differ, it is difficult to detect common patterns. That’s why we use alignment software.
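For an alignment that is already given, a simple majority consensus can be computed column by column. The sketch below uses the five rows of the slide; the majority rule and the function names are assumptions for illustration, and real alignment software uses more refined rules.

```python
# Majority-rule consensus sketch over the five aligned rows of the slide.
from collections import Counter

ROWS = [
    "TTGACATG CCGGGG---A AACCG",
    "TTGACATG CCGGTG--GT AAGCC",
    "TTGACATG -CTAGG---A ACGCG",
    "TTGACATG -CTAGGGAAC ACGCG",
    "TTGACATC -CTCTG---A ACGCG",
]
ALIGNMENT = [row.replace(" ", "") for row in ROWS]  # drop the block separators


def consensus(rows):
    """Most frequent character per column; ties go to the character seen first."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def conserved_columns(rows):
    """'*' under columns where all rows agree, a blank elsewhere."""
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*rows))


print(consensus(ALIGNMENT))
print(conserved_columns(ALIGNMENT))
```

A gap character can win a column in this simple scheme, which is one reason why the middle block of the example is hard to summarise by eye.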
Multiple sequence alignment
• Types of alignment:
– Manual: done by hand; a laborious and long process. Human errors can occur.
– Automatic: faster, but sometimes it does not take biological requirements into account.
– Combined: the best result is obtained by using the manual and the automatic methods together. First use the computer, then refine the result by hand.
Multiple sequence alignment
• The most widely used program for multiple sequence alignment: CLUSTAL W
– Downloadable local version: http://www.clustal.org
– WWW version: http://www.ebi.ac.uk/clustalw
– Uses the progressive alignment method
– Fast application with low memory usage
– Effective for aligning several sequences
– Able to draw simple phylogenetic trees
Multiple sequence alignment
• Exercise: perform a multiple alignment on the following sequences:
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
Multiple sequence alignment
• Example:
– Examine the catalase sequences of the plants below:
• Paprika (Capsicum annuum)
• Tobacco (Nicotiana tabacum)
• Tomato (Solanum lycopersicum)
• Potato (Solanum tuberosum)
• Sequence file
• http://align.genome.jp/
Multiple sequence alignment
The result with the Jalview program.
Phylogenetics
• The study of evolutionary relatedness among various groups of organisms through molecular sequencing data and morphological data matrices.
Phylogenetics
• Relation between phylogenetic analysis and sequence alignments.
– Sequence alignment determines the similarities and differences of the aligned sequences.
– In the case of orthologous sequences, the differences between sequences from different species arise from the mutations accumulated during their separate evolution.
– The number of mutations, i.e. the degree of difference between the sequences, is related to the evolutionary distance between the two species: the longer ago the two species separated, the greater the sequence difference between their orthologous genes.
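The link between sequence difference and evolutionary distance can be made concrete with the simplest distance measure, the proportion of differing sites in an alignment (often called the p-distance). The sketch below uses invented sequences and hypothetical names; real analyses usually apply an evolutionary model on top of such raw counts.

```python
# p-distance sketch on made-up, already aligned sequences.
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",
    "species_C": "ACGAACCTAT",
}
for (x, y), d in distance_matrix(aligned).items():
    if x < y:
        print(x, y, d)
```

In this toy example species_A and species_B differ least, so a distance-based tree would join them first.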
Phylogenetics
• Steps of phylogenetic analysis:
– 1. Sequence alignment
– 2. Definition of the evolutionary model
– 3. Tree building
– 4. Examination of the tree(s)
Phylogenetics
• Example:
– Perform the multiple sequence alignment with the sequence file below and draw the phylogenetic tree.
– Sequence file
http://align.bmr.kyushu-u.ac.jp/mafft/online/server/
Multiple sequence alignment
• 1. Sequence alignment
Multiple sequence alignment
4. Examination of the tree
Phylogenetics
What is it used for?
• Function prediction
• Finding relatives
• Identification
Exercise
– Look for resistance genes with known sequences. Perform a multiple sequence alignment on them. Evaluate the similarities and create a phylogenetic tree.
– What conclusions can be drawn from the result?
Information Technology in Plant Protection
Bioinformatics - multiple sequence alignment
Protein sequence analysis
Content
• Characteristics of the proteins
• Protein sequence analysis
• Protein 3D models
• Practical task
Characteristics of the proteins
Proteins are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form, facilitating a biological function.
Properties:
• 20 amino acids are coded by triplets
• Four types of bases give 4³ = 64 types of triplets, which is enough to code 20 amino acids
• 61 triplets code amino acids
• 3 stop signals (stop codons): UAA, UAG, UGA
• 1 codon signals the start of translation (start codon: AUG, methionine)
• One triplet codes only one type of amino acid, but the same amino acid can be coded by several triplets (degeneracy)
• Synonymous triplets: triplets coding the same amino acid
• The gene and the protein chain it codes are colinear
• The code is non-overlapping
• The genetic code is universal among living organisms.
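The numbers above (64 triplets, 61 coding, 3 stop codons, degeneracy) can be checked with a short sketch that builds the standard genetic code table. The compact one-letter string lists the amino acids in U, C, A, G order for each codon position; the function name and the example mRNA are made up for illustration.

```python
# Standard genetic code sketch; '*' marks a stop codon.
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(len(CODON_TABLE), len(CODON_TABLE) - len(stops), stops)  # 64 61 ['UAA', 'UAG', 'UGA']
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
print(translate("GGAUGGCUUAAGG"))                              # MA
```

Leucine being coded by six triplets while methionine has only one is an example of the degeneracy mentioned above.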
Characteristics of the proteins
DNA nucleotide order
RNA nucleotide order
Protein amino acid order
Characteristics of the proteins
The code table. (Griffiths et al., An Introduction to Genetic Analysis, 8th Ed., Fig. 9-8.)
Groups of proteins: on the basis of biological activity
• Enzymes (pepsin)
• Protective proteins (immunoglobulin)
• Transport proteins (hemoglobin, myoglobin, transferrin)
• Hormones (insulin, ACTH)
• Structural proteins (collagen, elastin, keratin)
• Toxins (snake venom)
Characteristics of protein sequences
• Protein synthesis video:
– http://www.youtube.com/watch?v=NJxobgkPEAo
• Protein structure video:
– http://www.youtube.com/watch?v=lijQ3a8yUYQ
Characteristics of protein sequences
• Structures:
– Primary structure: the order of amino acids
• "MSASSSSALPPLVPALYRWK"
– Secondary structure: regularly repeating local spatial structures. The most common examples are:
• Alpha helix and beta sheet
– Tertiary structure: the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another.
– Quaternary structure: the arrangement of several protein molecules (polypeptide chains), usually called protein subunits in this context, which function as a single protein complex
Characteristics of protein sequences
Characteristics of protein sequences
• Protein databases:
– Protein database: UniProt http://www.uniprot.org/
– Protein structure database: Protein Data Bank http://www.rcsb.org/pdb/home/home.do
– Protein interaction database: STRING http://string.embl.de/
Protein sequence analysis
• Analysis of the primary structure
– First of all we examine the distribution and the physico-chemical properties of the amino acids.
• Example: HCA - Hydrophobic Cluster Analysis of an acetyltransferase protein sequence from Fusarium.
– http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py?form=HCA
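Examining the distribution and the physico-chemical properties of the amino acids can be sketched as follows. This is not the HCA method itself: as a simpler substitute it counts the amino acid composition and computes a sliding-window average of the Kyte-Doolittle hydropathy scale. The function names and the window size are assumptions; the sequence is the primary-structure example from the earlier slide.

```python
# Composition and hydropathy sketch (Kyte-Doolittle scale), not the HCA method.
from collections import Counter

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Count how often each amino acid occurs."""
    return Counter(sequence)


def hydropathy_profile(sequence, window=7):
    """Average hydropathy of every window; positive values are hydrophobic."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


protein = "MSASSSSALPPLVPALYRWK"
print(composition(protein).most_common(3))
print([round(value, 2) for value in hydropathy_profile(protein)])
```

Stretches with high average values point to hydrophobic regions, which is the kind of information a hydrophobic cluster analysis visualises in more detail.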
Protein sequence analysis
Protein sequence analysis
Protein sequence analysis
• For general similarity studies
• We can gain information from general physical and chemical examinations.
• Determining the structure and the function of a protein from the DNA sequence is a big challenge for today’s technology.
• For instance, several diseases could be defeated if we were able to solve this problem.
Protein sequence analysis
• Prediction according to dimension:
– 1D: amino acid properties that can be written as a 1D string. Examples: sequence, secondary structure, hydrophobicity
– 2D: distances and contacts between amino acid pairs
– 3D: conformation prediction on the basis of all atom coordinates
Protein sequence analysis
• Common analyses of protein structures:
– 3D structure visualization → we can see some important properties.
– 3D structure alignment → similar structure, similar function.
– 3D structure classification → lineage relations, similar function.
– 3D structure prediction → secondary, tertiary, quaternary structure prediction.
– Small-molecule docking → drug candidate molecule for a molecule of known structure.
– Protein structure behaviour → molecular dynamics simulation.
Protein 3D models
• In order to understand the function of a protein, we have to know its 3D structure and quaternary structure. From these we can infer how it can bind to other molecules, proteins and enzymes.
• Let’s see an example, the acetyltransferase protein from Fusarium:
– http://www.rcsb.org/pdb/explore/explore.do?structureId=3FP0
Analysis of protein interactions
• Database of Interacting Proteins
– It collects data on proteins that interact (bind) with each other, based on experimental results.
– It describes about 11 000 interactions of about 6200 proteins.
– Details recorded for an interaction: one protein, the other protein, interacting regions, experimental methods, dissociation constant, references.
– Example: we can display how interaction-network graphs are built (the nodes are clickable).
Analysis of protein interactions
• http://dip.doe-mbi.ucla.edu/
Exercise
– Look up the protein sequence and (if it exists) the 3D structure of an important pathogen (causative agent) in a sequence database of your choice. Examine the possible binding sites.
Information Technology in Plant Protection
Bioinformatics - Genomes
Content
• What is a genome?
• Genome projects
• Genome browser software
• What is it used for?
• Exercise
Definition of the genome
• The genome: „The genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA.” (Wikipedia)
Genomics is the study of the genome
• Genomics: today it encompasses a broader scope of scientific inquiry and associated technologies than when genomics was initially conceived. A genome is the sum total of all an individual organism's genes. Thus, genomics is the study of all the genes of a cell, or tissue, at the DNA (genotype), mRNA (transcriptome), or protein (proteome) levels.
– Functional genomics: attempts to make use of the vast wealth of data produced by genomic projects
– Structural genomics: attempts to determine the structure of every protein encoded by the genome
Tools of genomics
• Microarray (or chip): a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput screening methods.
• Types:
– DNA microarrays (oligonucleotide or cDNA)
– Protein microarrays
– Cellular microarrays
– Tissue microarrays
– Antibody microarrays
Tools of genomics
Genome projects
• In the last decade several genome projects have started around the world. The aim of these projects is to determine the complete genetic code of more and more organisms. As of August 2010, the genomes of nearly 2500 species were known entirely or partly.
• Important projects:
– http://genome.ucsc.edu
Genome projects
• The first successful genome project was that of Haemophilus influenzae in 1995, carried out by Fleischmann et al. The first plant genome was that of Arabidopsis thaliana in 2000. The complete sequencing of the human genome was finished in 2003.
• Bioinformatics tools play an important role in the analytical work following genome sequencing. This is when the genetic code is assembled.
Genome browsers
• Through genome browsers we can view genomic data in a clear and well-arranged format.
• In some cases the genome is the starting point of genetic analysis.
• Some genome browsers:
– http://genome.ucsc.edu
– http://www.ensembl.org
– http://ecrbrowser.dcode.org
– http://www.ncbi.nlm.nih.gov/mapview/
Genome browsers
• Properties of the UCSC Genome Browser
– News and information on the start page
– Genomes grouped by class and genome
– Sequence assembly selectable by date (trackable)
– Searching by text
Chromosome number
Position on Chromosome
Base position
Known genes
Mammalian conservation
Genome browsers
• Options under the graphical interface:
– Gene and gene prediction options
– mRNA and EST options
– Expression and regulation
– Comparison options
– Variations and duplicates
Genome browsers
• NCBI MapViewer properties
– Four-level system
• Start page
• Genome View
• Map View
• Sequence View
– Keyword searching
– Access to the genomes of 800 species
Genome browsers
• NCBI MapViewer
Exercise
– With the help of a genome browser, look up the genome of an important pathogen (causative agent) in a sequence database of your choice. Examine which genes occur in the neighbourhood of the one-millionth nucleotide on the second chromosome of this organism.
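What a genome browser does for such a query can be sketched as an interval search over gene annotations. Everything in the sketch is hypothetical: the gene names, the coordinates, the function name and the 10 000-base flank are invented for illustration and do not describe any real genome.

```python
# Interval query sketch over made-up gene annotations.
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chromosome: str
    start: int  # first base of the gene
    end: int    # last base of the gene


def genes_near(genes, chromosome, position, flank=10_000):
    """Genes on the chromosome that overlap [position - flank, position + flank]."""
    low, high = position - flank, position + flank
    return [
        gene for gene in genes
        if gene.chromosome == chromosome and gene.start <= high and gene.end >= low
    ]


annotations = [
    Gene("geneA", "2", 985_000, 991_500),
    Gene("geneB", "2", 999_200, 1_001_800),
    Gene("geneC", "2", 1_050_000, 1_052_000),
    Gene("geneD", "1", 999_000, 1_002_000),
]
for gene in genes_near(annotations, "2", 1_000_000):
    print(gene.name, gene.start, gene.end)
```

A gene is reported as soon as any part of it falls inside the window, so genes that only partly overlap the neighbourhood are also listed.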
PRESENTATION CAN BE DOWNLOADED FROM:-
Georgikon Faculty
Information Technology in Plant Protection
Prepared by: Dr. János Busznyák - Dr. Máté Csák - Sándor Nagy