
Mälardalen University Press Licentiate Theses No. 141

STEREO VISION ALGORITHMS IN RECONFIGURABLE HARDWARE FOR ROBOTICS APPLICATIONS

Jörgen Lidholm

2011

School of Innovation, Design and Engineering


Copyright © Jörgen Lidholm, 2011
ISBN 978-91-7485-033-8
ISSN 1651-9256
Printed by Mälardalen University, Västerås, Sweden

Abstract

This thesis presents image processing solutions in FPGA-based embedded vision systems. Image processing is demanding, but the information that can be extracted from images is very useful and can be used for many tasks such as mapping and navigation, object detection and recognition, collision detection and more.

Image processing or analysis involves reading images from a camera system, improving an image with respect to colour fidelity and white balance, removing distortion and extracting salient information. These steps are often referred to as low to medium level image processing and involve large amounts of data and fairly simple algorithms suitable for parallel processing. Medium to high level processing involves a reduced amount of data and more complex algorithms. Object recognition, which involves matching image features to information stored in a database, is of higher complexity.

A vision system can be used in anything from a car to industrial processes to mobile robots playing soccer or assisting people in their homes. A vision system often works with video streams that are processed to find pieces that can be handled in an industrial process, to detect obstacles that may be potential hazards in traffic, or to find and track landmarks in the environment that can be used to build a map and navigate from it. This involves large amounts of calculations, which is a problem: even though modern computers are fast, they may still not be able to execute the desired algorithms at the wanted frequency. Even the computers that are fast enough are bulky and require a lot of power, and are therefore not suitable for incorporation on small mobile robots.

In this thesis I will present the image processing sequence to give an understanding of the complexity of the processes involved, and I will discuss some processing platforms suitable for image processing. I will also present my work, which is focused on image algorithm implementations for reconfigurable hardware suitable for mobile robots with requirements on speed and power consumption.


Swedish Summary - Svensk Sammanfattning

Robots are becoming increasingly common in our society. They range from traditional industrial robots to robots that help us at home with the most tedious chores, robots used to monitor large areas, robots we keep purely for pleasure, and a whole host of other variants.

A robot is more than both a machine and an intelligent computer; a robot is a robot precisely when it can sense, plan and act. To sense means that the robot uses sensors to detect physical phenomena in its surroundings, anything from measuring temperature, determining the distance to something, recognising a person and so on. To plan means that the robot, using intelligence and what it knows about its surroundings from sensor data, plans how it should act. Finally, to act means that the robot carries out a physical action, which may consist of moving, lifting an object of some kind, or following a ball with its eyes (cameras).

A central part of a robot is consequently an intelligent sensor that ideally can be used for as much as possible, can deliver the information the robot wants quickly enough for the robot to make a decision, and moreover delivers information the robot actually has use for. The eyes serve an important function for humans: we use them primarily to see what is in our surroundings, we can also determine distances with sufficiently good accuracy, and the eyes also help us keep our balance. To build a system for artificial vision for robots, cameras are needed first of all, but also methods for analysing the images the cameras capture. Analysing image information requires large resources in terms of computational power, partly because the methods normally used are relatively advanced, but mainly because of the large amount of information.


An image contains millions of small colour values; to determine depth in the images, two images from different perspectives are also required, which doubles the amount of information. For a robot to be able to move reasonably fast, an update rate of tens of images per second is also required.

An ordinary modern personal computer contains a general-purpose processor that works at a high clock frequency. Despite the high clock frequency, computations on large amounts of data take time. A so-called FPGA is a device that can be programmed to solve a number of specific tasks and only these. The FPGA allows an enormous number of computations to run in parallel and can therefore yield a huge improvement in computational capacity compared with a standard computer, and at lower energy consumption as well.

In this thesis I give an overall discussion of what a robot is and what different kinds of robots exist. The focus of the thesis, however, is on image analysis for robot applications in FPGAs. Among other things, I discuss problems and solutions for image analysis in FPGAs and also cover related research areas that can be applied to this problem.


Acknowledgements

I would like to take the opportunity to mention the people who have supported my work at Mälardalen University in one way or another. I have really enjoyed the years, which have passed by too fast, and I would not hesitate to make the same journey again.

First I would like to thank those who have provided the financial support that enabled this: the Knowledge Foundation, Robotdalen and Xilinx (which has contributed by providing tools and hardware).

My main supervisor Lars Asplund has not only been my supervisor, he has also been a never-ending source of ideas. It has not always been so easy to keep up, but it has always been great fun! I also consider Lars a dear friend, and I have enjoyed the many discussions we have shared during the years. A great thank you also goes to my two assisting supervisors: Mikael Ekström, who became more and more involved and was of great help while I was writing my final paper, and Giacomo Spampinato, who brought fresh research thinking into the group; I hold you to your word that if I need help, your Sicilian family can help.

During the course of my studies, both at graduate and undergraduate level, I have had the pleasure to meet many great friends. I would like to thank some of you by sending an extra large hug to you. Thank you Andreas Hjertström for all the interesting talks and for being a great friend since we started the computer engineering program in 2001. Fredrik Ekstrand, my comrade in arms, thank you for all the great talks on both private and professional matters. Hüseyin Aysan, the man of strong prepositions and probably the most curious person I've ever met, I'm looking forward to riding Siljan runt with you! Carl Ahlberg, a good friend and an inspiration when it comes to enjoying life; hopefully I can convince you to ride Finnmarskturen or Cykelvasan, or both, with me. Peter Wallin, a great friend since we started the computer engineering program in 2001.


I would also like to thank all the people taking part in coffee break discussions; you are many, and you have all taken part in making my time at the university a pleasure.

A big thank you to my parents, Sonja and Tommy, for supporting me during my life and pushing me to follow my heart; you mean everything to me. To my brother Johan, thank you for being an inspiration in life, for the times we have shared discussing things and life over a whiskey by the whiskey-holk or elsewhere. Thank you my sister Linda for being supportive and a great sister. Thank you Sandra and Robert for believing in me and giving me the privilege to become “godfather” to your three lovely boys, Albin, Hampus and Viktor.

Teuvo and Raili, thank you for all your support.

And last but not least, thank you Pia for always supporting me, you are the love of my life!

Thank you all!!

Jörgen Lidholm
Göteborg, September 2011

List of Publications

Papers Included in the Licentiate Thesis

Paper A: Two Camera System for Robot Applications; Navigation, Jörgen Lidholm, Fredrik Ekstrand and Lars Asplund. In Proceedings of the 13th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA'08), IEEE Industrial Electronics Society, Hamburg, Germany, 2008.

Paper B: Validation of Stereo Matching for Robot Navigation, Jörgen Lidholm, Giacomo Spampinato and Lars Asplund. 14th IEEE International Conference on Emerging Technology and Factory Automation, Palma de Mallorca, Spain, September 2009.

Paper C: Hardware support for image processing in robot applications, Jörgen Lidholm, Lars Asplund, Mikael Ekström and Giacomo Spampinato. In submission.


Additional Papers by the Author

Robotics for SMEs - 3D Vision in Real-Time for Navigation and Object Recognition, Fredrik Ekstrand, Jörgen Lidholm and Lars Asplund. 39th International Symposium on Robotics (ISR 2008), Seoul, Korea.

Stereo Vision Based Navigation for Automated Vehicles in Industry, Giacomo Spampinato, Jörgen Lidholm, Fredrik Ekstrand and Lars Asplund. 14th IEEE International Conference on Emerging Technology and Factory Automation, Palma de Mallorca, Spain, September 2009.

Navigation in a Box: Stereovision for Industry Automation, Giacomo Spampinato, Jörgen Lidholm, Fredrik Ekstrand, Carl Ahlberg, Lars Asplund and Mikael Ekström. In Advances in Theory and Applications of Stereo Vision, edited by Asim Bhatti, ISBN: 978-953-307-516-7, InTech, January 2011.

An Embedded Stereo Vision Module for 6D Pose Estimation and Mapping, Giacomo Spampinato, Jörgen Lidholm, Carl Ahlberg, Fredrik Ekstrand, Mikael Ekström and Lars Asplund. 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. In submission.

Additional Book Chapters by the Author

Navigation in a Box: Stereovision for Industry Automation, Giacomo Spampinato, Jörgen Lidholm, Fredrik Ekstrand, Carl Ahlberg, Lars Asplund and Mikael Ekström (2011). In Advances in Theory and Applications of Stereo Vision, Asim Bhatti (Ed.), ISBN: 978-953-307-516-7, InTech. Available from: http://www.intechopen.com/articles/show/title/navigation-in-a-box-stereovision-for-industry-automation


Contents

I Thesis

1 Introduction
1.1 Thesis outline

2 Background and Motivation
2.1 The Robot: Sense, plan and act
2.1.1 Types of robots
2.1.2 Robotics tasks
2.1.3 Vision in robotics application
2.1.4 Robotics for SME
2.2 Vision and image processing
2.2.1 A camera system/Image sensor
2.2.2 Colour domains
2.2.3 Distortion and correction
2.2.4 Image features
2.2.5 Stereo vision
2.3 Hardware support for heavy computation tasks
2.3.1 Digital Signal Processors
2.3.2 Vector processors
2.3.3 Systolic array processors
2.3.4 Asymmetric multicore processors
2.3.5 General Purpose GPUs (GPGPU)
2.3.6 Field Programmable Gate Arrays
2.4 FPGA for (stereo-) vision applications
2.4.1 Designing FPGA architectures (What they need)
2.5 Embedded FPGA based vision system


2.6 Component based software development for hybrid FPGA systems
2.7 Simultaneous Localization And Mapping (SLAM)
2.8 Summary

3 Summary of papers and their contribution
3.1 Paper A
3.1.1 Contribution
3.2 Paper B
3.2.1 Contribution
3.3 Paper C
3.3.1 Contribution

4 Conclusions and Future work
4.1 Conclusions
4.2 Future work

Bibliography

II Included Papers

5 Paper A: Two Camera System for Robot Applications; Navigation
5.1 Introduction
5.2 Related work
5.3 Experimental platform
5.3.1 Image sensors
5.3.2 FPGA board
5.3.3 Carrier board
5.4 Feature detectors
5.4.1 Stephen and Harris combined corner and edge detector
5.4.2 FPGA implementation of Harris corner detector
5.5 Interest point location
5.5.1 Image sequence feature tracking
5.5.2 Spurious matching and landmark evaluation
5.5.3 Experiments
5.6 Results
5.7 Future work
Bibliography

6 Paper B: Validation of Stereo Matching for Robot Navigation
6.1 Introduction
6.2 Theory
6.2.1 Definitions
6.2.2 Navigation process overview
6.2.3 Stereo triangulation
6.2.4 Back projection of landmarks onto the image sensor
6.2.5 Planar egomotion estimation
6.3 Experiments
6.3.1 Experimental Platform
6.3.2 Stereo Camera Calibration and rectification
6.3.3 Experimental setup
6.4 Results
6.4.1 Stereo matching
6.4.2 Landmark location
6.4.3 Egomotion estimation
6.5 Conclusions
6.6 Future work
Bibliography

7 Paper C: Hardware support for image processing in robot applications
7.1 Introduction
7.2 Image Processing; Hardware and Software support
7.2.1 The image processing sequence
7.2.2 Hardware for Image Processing
7.2.3 Image processing in FPGAs
7.3 Developing Software in FPGA
7.4 Lens distortion correction
7.4.1 Method 1: For cases when the tangential distortion is negligible
7.4.2 Method 2: For cases when the tangential distortion must be corrected
7.5 Implementation details
7.5.1 Proposed hardware design for component based embedded image processing
7.5.2 Fixed point arithmetic
7.5.3 Look-up tables (LUTs)


7.5.4 Method 1: Implementation details
7.5.5 Method 2: Implementation details
7.6 Results
7.6.1 Execution time
7.6.2 Resource requirements
7.6.3 Precision
7.7 Discussion
7.8 Future work
Bibliography

Acronyms

ADC Analog-to-Digital Converter

AGV Autonomous Guided Vehicle

AUV Autonomous Underwater Vehicle

ASIC Application Specific Integrated Circuit

CBSE Component Based Software Engineering

CCD Charge Coupled Device

CMOS Complementary Metal-Oxide Semiconductor

CPU Central Processing Unit

DSP Digital Signal Processor

FPGA Field Programmable Gate Array

GPS Global Positioning System

GPU Graphics Processing Unit

HDL Hardware Description Language

IC Integrated Circuit

ROV Remotely Operated Vehicle

SIMD Single Instruction Multiple Data

MIMD Multiple Instructions Multiple Data


SLAM Simultaneous Localisation And Mapping

SME Small and Medium Enterprises

SoC System-on-Chip

UAV Unmanned Aerial Vehicle

VLIW Very Long Instruction Words

I

Thesis


Chapter 1

Introduction

In industry, robots have existed for decades. The main purpose of an industrial robot is to carry out heavy, monotonous tasks that often require high precision, e.g. car manufacturing or machine tending. However, to maintain high precision over time and to be able to lift heavy objects, the robot is often bolted to the ground. For a machine tending robot that would also mean that the robot is occupying a machine.

A mobile industrial robot that can be utilised when needed, or can tend several machines at the same time, would enable smaller (small to medium size) enterprises to invest in a robot. The robot can easily be moved to free a machine, enabling an operator to run a short series manually. Making an industrial mobile robot flexible, to meet the requirements of small and medium enterprises, requires a navigation system for the robot to know its location and move between different machines or to fetch material from the warehouse when needed. A navigation system should be flexible and easy to operate for a small company, enabling easy changing of the route the robot should move along. One possibility is to let a person guide the mobile platform using a well known interface like a joystick. Basically two different sensors would provide sufficient precision and flexibility for such a system, i.e. laser range finders and cameras. A stereo camera provides higher flexibility and could still be less expensive than a laser range finder. An image must be corrected for distortion and salient information extracted, which can be utilised to create landmarks through multiple camera views. Image analysis requires calculations in several steps on large amounts of data and is the foundation for advanced robot tasks, e.g. navigation. Computational performance is still a problem in image processing, mainly due to the large amount of data.
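As a rough illustration of why a calibrated stereo camera can serve as a range sensor, the depth of a matched image point follows directly from its disparity between the left and right images. The following is a minimal sketch in Python, assuming a rectified camera pair and example values for focal length and baseline; it is not the implementation used in the included papers, where the corresponding computation is carried out in FPGA hardware.

    import numpy as np

    def depth_from_disparity(disparity_px, focal_px=700.0, baseline_m=0.10):
        # Ideal rectified stereo geometry: Z = f * B / d.
        # focal_px and baseline_m are assumed example values, not calibration data.
        d = np.asarray(disparity_px, dtype=np.float64)
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

    # A feature matched with a 35 pixel disparity lies at roughly 2 metres.
    print(depth_from_disparity(35.0))

The same relation also shows why depth resolution degrades with distance: at long range a single pixel of disparity corresponds to a large step in estimated depth.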


Robots are not only for industrial applications; there are several areas in our everyday life where a robot can assist. For many applications the robot needs information on its location and also knowledge of objects in the environment that either can be obstacles or are supposed to be handled in some way. In this thesis I will describe the task at hand when building a system utilising a Field Programmable Gate Array (FPGA), what is required and what limitations there are in such a system. In parallel with this work, two stereo camera systems have been designed and used as validation platforms.

1.1 Thesis outline

The outline of this thesis is divided into two parts. Part I presents the background and motivation for the thesis and puts my papers into context. Chapter 2 presents robots, image processing, hardware for image processing and implementation considerations for FPGAs. Chapter 3 presents my papers and my contributions in each paper. Chapter 4 gives a summary, and I summarise relevant future work. Part II presents the technical contribution of the thesis in the form of three papers.

Chapter 2

Background and Motivation

This chapter gives an overview of the research area and a motivation for my work. I will start with a presentation of my view of what a robot is, continue with an introduction to the processes involved in image analysis, and finally present hardware support for image processing, with focus on the FPGA and the development process involved. Some ideas that emerged during the course of my research, such as the Component Based Software Engineering (CBSE) paradigm as an influence on FPGA architecture development, will also be discussed.

2.1 The Robot: Sense, plan and act

A robot is, in my opinion, defined by three abilities: sense, plan and act. A robot collects information about the environment and uses that information to plan and perform a physical act. The abilities to sense and plan are what differentiate a robot from a mechanical machine. In this section I will discuss the three parts and draw some parallels to humans.

Sense

A sensor is a device that converts a physical phenomenon into an electric signal which, when converted to a digital value, can be used in a software program. A wide range of sensors exists for sensing most physical properties and, in different constructions, they can sense anything from the rotation of a wheel to the distance to a wall.
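As a small, purely illustrative sketch of that conversion chain, the Python function below scales a raw reading from a hypothetical 10-bit analog-to-digital converter into a voltage and then into a distance. The sensor model and all constants are assumptions chosen for the example, not values from any hardware used in this work.

    def adc_to_distance(raw_count, vref=3.3, bits=10, volts_per_metre=1.5):
        # Convert an ADC count into metres: count -> voltage -> physical quantity.
        max_count = (1 << bits) - 1          # 1023 for a 10-bit converter
        voltage = (raw_count / max_count) * vref
        return voltage / volts_per_metre     # assumed linear sensor response

    # A mid-range reading of 512 corresponds to roughly 1.1 metres in this model.
    print(adc_to_distance(512))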


For humans, vision is very important; it helps us to see objects in our vicinity, and we are able to identify or at least categorise them. With our eyes we can estimate depth and the distance between objects to acquire a good knowledge of the environment, comparable to a map. Vision also assists other abilities such as balance.

Because vision is so important for humans, it can also be a powerful source of information for a robot. Modern cameras are fairly low cost but “dumb”; a camera must be complemented with an image analysis system. An image consists of a dense colour matrix, from which salient regions are extracted with a feature detector that can detect corners, edges or blobs, or be matched using custom templates. A human vision system can be mimicked to some extent once the features have been acquired.
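To make the role of such a feature detector concrete, the sketch below computes the Harris corner response, the measure used by the detector in Paper A, in Python with numpy and scipy. It is only an algorithmic illustration: the thesis work implements the detector as a streaming FPGA design, and the smoothing, k and threshold values here are assumptions rather than the parameters used in the papers.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def harris_response(gray, sigma=1.0, k=0.04):
        # Corner response R = det(M) - k * trace(M)^2, where M is the
        # structure tensor built from smoothed products of image gradients.
        gray = gray.astype(np.float64)
        ix = sobel(gray, axis=1)             # horizontal gradient
        iy = sobel(gray, axis=0)             # vertical gradient
        ixx = gaussian_filter(ix * ix, sigma)
        iyy = gaussian_filter(iy * iy, sigma)
        ixy = gaussian_filter(ix * iy, sigma)
        det = ixx * iyy - ixy * ixy
        trace = ixx + iyy
        return det - k * trace * trace

    def corner_points(gray, threshold_ratio=0.01):
        # Keep pixels whose response exceeds a fraction of the strongest response.
        r = harris_response(gray)
        return np.argwhere(r > threshold_ratio * r.max())

Each returned coordinate is a candidate interest point; in the stereo setting such points are matched between the left and right images and triangulated into landmarks.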

There is a good reason why the visual cortex takes up such a large part of the human brain, both due to the enormous amount of information that has to be handled for us to understand what it is we see, and due to the complex understanding humans can gain from the rays of light projected on the retinas of our eyes. A human can easily extract interesting information from what we see, and our understanding is not disrupted by artefacts like shadows, texture, occlusion, light temperature and colour. In computer vision applications, artificial lighting is often used to remove unwanted shadows.

A large amount of research has been performed in the area of image processing and related topics for many years, and the field has matured a great deal; however, computational performance is still an obstacle.

Plan

When a robot is able to understand what is in the environment and its own location, both in relation to interesting objects and on a higher level, it can start to plan an act. Planning is often implemented with artificial intelligence (AI) algorithms. In the AI field there are different types of algorithms that try to mimic intelligence through different concepts, e.g. genetic algorithms, reinforcement learning, neural networks, fuzzy logic and case based reasoning.

A robot should be able to handle a set of cases without knowing exactly what will happen. For example, an automatic door is not a robot; it is too simple and its behaviour is predictable. AI tries to mimic the human way of thinking, which requires abstraction of information. For instance, a location could be described as “in front of the refrigerator in the kitchen” as opposed to (32.3, 123.4, 0.0). In other words, my understanding of my location is only relevant with respect to what I want to accomplish.


Act

A robot acts through a physical action; it can be anything from moving to a different location to pushing a button or grabbing an object. To do this, motors, pneumatics, hydraulics, gearboxes and other mechanical constructions are used. This will not be discussed any further in this thesis, however interesting it may be.

2.1.1 Types of robots

Designing a robot is a matter of finding a good design for what the robot is supposed to do. For simpler tasks a simple machine is all that is required, but some tasks demand intelligence and advanced mechanical functionality. Robots that will help humans in their daily life at home should probably mimic some properties of humans, since the environments created by humans are also created for humans. Other tasks may not be so simple to carry out by robots inspired by humans, and it is often a good solution to take inspiration from animals, be it a snake, spider, dolphin or something else.

Robots exist in many different shapes and designs. Robots in production industry from the middle of the twentieth century looked more or less the same as they do today; the dexterity has increased, as have precision, speed and reliability. The industrial robot is the most common kind and is meant to perform tasks that humans used to do in production industry. Some robots imitate snakes, some are built to climb walls or trees, others to travel along an electric power line. There are also other kinds of robots with wheels, one or more legs, caterpillar tracks, arms or a head. A wheeled robot may have two, three, four or even more wheels; some have wheels based on a design by an inventor named Bengt Ilon, enabling omnidirectional motion. Robots with legs can be inspired by humans, having two legs, while some have six or even eight legs, inspired by insects. There are also autonomous robots designed to fly, Unmanned Aerial Vehicles (UAV), that come in different forms: standard model planes, helicopters and quadrocopters. Of course there are also robots designed for underwater mobility, Autonomous Underwater Vehicles (AUV), where some look like torpedoes.

A robot can be fully autonomous, semi-autonomous or teleoperated (in which case it probably should not be called a robot any more).

Mobile robots are powered by batteries, which are getting increasingly powerful but still pose a limitation. Power constraints enforce power efficient computing devices and efficient algorithms that are advanced enough to accomplish a certain task.

Page 27: Stereo vision algorithms in reconfigurable hardware for robotics applications

8 Chapter 2. Background and Motivation

For humans, vision is very important, it helps us to see objects in our vicin-ity and we are able to identify or at least categories them. With our eyes we canestimate depth and distance between objects to acquire a good knowledge ofthe environment, comparable to a map. Vision also assist other abilities suchas balance.

Because vision is so important for humans, it can also be a powerful sourceof information for a robot. Modern cameras are fairly low cost, however“dumb”, a camera must be complemented with an image analysis system. Animage consists of a dense colour matrix, where salient regions are extractedwith a feature detector which can detect corners, edges, blobs or be matchedusing custom templates. A human vision system can be mimicked to someextend, once the features have been acquired.

There is a good reason why visual cortex takes up such a large part of thehuman brain. Both due to the enormous amount of information that has tobe handled for us to understand what it is we see, and also by the complexunderstanding humans can gain from the rays of light projected on the retina ofour eyes. A human can easily extract interesting information from what we see,and our knowledge is not disrupted by artefacts like shades, texture, occlusion,light temperature and colour. In computer vision applications, artificial lightingis often used to remove any unwanted shadows.

A large amount of research has been performed in the area of image pro-cessing and related topics for many years and have matured a great deal, how-ever computational performance is still an obstacle.

Plan

When a robot is able to understand what is in the environment, its own loca-tion, both in relation to interesting objects as well as on a higher level, it canstart to plan an act. Planning is often implemented with artificial intelligence(AI) algorithms. In the AI field there are different types of algorithms thattries to mimic intelligence through different concepts, i.e. genetic algorithms,reinforcement learning, neural networks, fuzzy logic, case based reasoning.

A robot should be able to handle a set of cases without knowing exactlywhat will happen. For example an automatic door is not a robot, which istoo simple and the behaviour is predictable. AI tries to mimic the human wayof thinking, which require abstraction of information. For instance a locationcould be described as in front of the refrigerator in the kitchen as opposed to(32.3, 123.4, 0.0). In other words, my understanding of my location is onlyrelevant with respect to what I want to accomplish.

2.1 The Robot: Sense, plan and act 9

Act

A robot acts through a physical action, it can be anything from moving to a dif-ferent location, to push a button or grab an object. To do this, motors, pneumat-ics, hydraulics, gearboxes and other mechanical constructions are used. Thiswill not be discussed any further in this thesis however interesting it may be.

2.1.1 Types of robots

Designing a robot is a matter of finding a good design for what the robot issupposed to do. For simpler tasks a simple machine is all that is required butfor some tasks intelligence and advanced mechanical functionality is a require-ment. Robots that will help humans in their daily life at home should probablymimic some properties of humans, since the environment created by humansare also created for humans. Other tasks may not be so simple to carry out byrobots inspired by humans, but it is often a good solution to take inspirationfrom animals, be it a snake, spider, dolphin or something else.

Robots exists in many different shapes and designs. Robots in productionindustry from the middle of the twentieth century looked more or less the sameas they do today. The dexterity has increased as well as precision, speed andreliability. The industrial robot is the most common kind and is also meantto perform tasks that humans used to do in production industry. Some robotsimitates snakes, some are built to climb walls or trees others to travel alongan electric power line. But there are also other kinds of robots with wheels,one or more legs, caterpillar tracks, arms, head. A wheeled robot may havetwo, three, four or even more wheels, some have wheels based on a design byan inventor named Bengt Ilon, enabling omnidirectional motion. Robots withlegs can be inspired by humans, having two legs, some have six or even eightlegs, inspired by insects. There are also autonomous robots designed to fly,Unmanned Aerial Vehicle (UAV), that also come in different forms, standardmodel planes, helicopters and quadrocopters. Of course there are also robotsdesigned for under water mobility, Autonomous Underwater Vehicle (AUV)where some look like torpedoes.

A robot can be fully autonomous, semi-autonomous or teleoperated (in which case it probably should not be called a robot any more).

Mobile robots are powered by batteries, which are getting increasingly powerful but still pose a limitation. Power constraints enforce power-efficient computing devices and efficient algorithms that are still advanced enough to accomplish a certain task.

Mobile robots interact with the environment and thus must also be able to acquire information from it. In the environment there can be obstacles that need to be avoided and interesting objects that the robot should interact with, e.g. a human, a machine, a can of beer. This requires sensory equipment, and the more advanced the sensors are, the more information there is to process, further increasing the load on computing devices and battery consumption.

Early mobile robots were sometimes designed with a traditional desktop computer, whereas modern robots often use laptops (with their own power source) or embedded computer systems.

On the market today there is a large number of power-efficient devices, driven by the mobile market, where performance is becoming increasingly important with smart phones and tablets. A cell phone today can incorporate a System-on-Chip (SoC) with a multi-core processor, Digital Signal Processors (DSPs) and even a Graphics Processing Unit (GPU).

2.1.2 Robotics tasks

Of course, different kinds of robots are meant to execute different tasks, and the design is chosen to best suit the task at hand. For example, there are robots for transportation of beds, material and documents in hospitals, and for military purposes such as precision bombing, surveillance and exploration, both as airborne and ground vehicles. Civilian robots exist for search and rescue, surveillance, pool cleaning, lawn mowing, vacuum cleaning, planetary exploration (Mars rovers), various transportation operations, teleoperated surgery and also “semi-autonomous” and fully autonomous cars (still at the research stage). With the vast amount of research in robotics and related areas, robots are becoming increasingly intelligent and humanoids more expressive through speech synthesis and human-like facial expressions, which allows for a broad application area and a large variety of physical abilities.

Robotics in medical appliances

Teleoperated surgical robot systems are clever for many reasons: they allow skilled surgeons to perform an operation on a patient at a remote location. A Swedish surgeon could perform surgery on a patient in America without having to travel. Such a robotic system also increases precision, by allowing the surgeon to set how much an instrument should move for a given hand motion.

Figure 2.1: Teleoperated robot Giraff. (Photo: Giraff Technologies AB/Robotdalen)

Another type of robot is for elderly care: a teleoperated robot with mobility, a video camera and a screen, see figure 2.1. The robot is operated by a care giver or relatives and allows the operator to virtually "visit" a care taker without physically going there. The robot enables the operator to virtually move around in the house or apartment and see what is going on. A robot like this requires acceptance from both care giver and care taker, but will probably in time become semi-autonomous to simplify the task of moving between different rooms. The increased autonomy will require more advanced sensory equipment and software that allows navigation, obstacle detection and avoidance. This is important, especially in a dynamic environment such as a home, where furniture is moved around and people are moving as well.

Persons with physical handicaps are often bound to a wheelchair; an exoskeleton could be an alternative that could also work as a rehabilitation tool by gradually decreasing the amplification. An exoskeleton enhances a motion using actuators of varying kinds, which can also be utilised to allow a person to lift heavier things than he or she normally would be capable of lifting. Modern technology has reached a level where it is actually feasible to construct an exoskeleton which is light, durable and powerful enough.

Robots for personal use

A robot is meant to simplify our daily life and, to some extent, to provide joy. Vacuum cleaning and lawn mower robots are popular, probably due to a relatively affordable price, but also because those tasks are quite tedious. The robots are simple but perfect examples of use cases for robots; the tasks they perform are non-critical and we (at least I) are just happy if we do not have to do them ourselves, see figure 2.2 for examples.


(a) A vacuum cleaning robot (courtesy of iRobot)

(b) A lawn mowing robot (courtesy of Husqvarna)

Figure 2.2: Robots for your home.

Semi-autonomous cars

The human factor is almost always the main cause of car accidents. This has been recognised by car manufacturers, who introduce more technology to help the driver notice potential hazards and changes of speed limits, and also to act in critical situations, e.g. emergency braking. Comfort systems are also available, e.g. adaptive speed control. Sensors used today are based on both vision and radar technologies.

There are also systems in active development that focus more on the driver's attention, e.g. detecting drowsiness of the driver.

Robots in production industry

The first robots were made for industrial production, where the robot is a solution for improving production quality and lowering production cost through increased production speed. Industrial robots are extremely robust and built for constant operation while maintaining high precision. Industrial robots exist in many different designs to meet different demands, see figure 2.3.

(a) Robots for spot welding (b) High-speed robot for pick and place operations

Figure 2.3: Industrial robots exist in a wide range of configurations. (Photos: Courtesy of ABB)

UAV

Unmanned aerial vehicles have mainly been of interest in military applications, where a UAV allows close and relatively cost-efficient surveillance missions without putting human lives in harm's way. Figure 2.4 shows two examples, but there are other kinds and other applications.

(a) The Predator UAV (U.S. Air Force photo/Tech. Sgt. Erik Gudmundson)

(b) The Reaper UAV (U.S. Air Force photo/Tech. Sgt. Erik Gudmundson)

Figure 2.4: US military UAVs in active use, for different types of missions.


AUV

The most common type of underwater vehicle is the Remotely Operated Vehicle (ROV). ROVs are often used in deep water exploration and have been used for locating drowned persons after boat accidents. The sensory equipment is mainly based on sonar technology. Complementary vision systems allow an operator to see what exists in the near vicinity of the ROV.

An ROV can measure its own location using sonars that sense distinctive formations on the sea floor. Together with localisation relative to a support surface ship used as reference, maps can be created. The step from an ROV to an AUV is not so far; what is needed is intelligence. An AUV can be utilised for surveillance of off-shore installations, e.g. oil rigs, or intruder surveillance of sea ports.

Figure 2.5: The autonomous underwater vehicle, designed by students at Mälardalen University.

Rovers

A rover is a vehicle, manned or unmanned, which is used for terrain exploration. The unmanned Mars rover Spirit landed on Mars at the beginning of 2004 to start its geological research, looking for water and analysing minerals. The rover contains several cameras and tools to drill into rocks and collect samples of the surface for analysis using a wide range of instruments. Collected information was sent to Earth with low-speed, long-range radio.

An autonomous rover must plan paths to travel and be rugged and mobile enough to handle the rough terrain that it will encounter.

Figure 2.6: A concept drawing of Spirit, one of the Mars rovers, designed to withstand the hostile environment on Mars. (Courtesy of NASA)

2.1.3 Vision in robotics application

The old saying “an image says more than a thousand words” is very true. A lot of information can be collected from an image, which can be used for obstacle detection, object detection/recognition, self-localisation and mapping, face recognition and much more.

On top of the obvious ability to see things, a camera system on a robot can be utilised for several purposes, e.g. navigation, object recognition and obstacle avoidance. Modern camera and image processing performance allows video-rate processing and understanding, which is vital for high-speed applications. A robot is only useful if it can perform a given task in reasonable time, and for time-critical systems, e.g. obstacle avoidance and vehicle accident avoidance, the image processing must produce a result with sufficient quality and speed so that a critical situation can be avoided.

Speed is not always the main problem. It is vital for mobile robots, which are battery powered, that an image processing system consumes little power and has low weight. Consider a robot that harvests trees by climbing the trunk and disassembling the tree while it is still standing; carrying a heavy computer for image processing is not an option.

There is a need for power-efficient and compact vision systems that can perform advanced image analysis at video rate (30 Hz).


2.1.4 Robotics for SMEs

Figure 2.7: Opiflex, a mobile industrial platform. The platform is targeted at SMEs. (Photo: Anders Thunell)

In a development project called “Robotics for SMEs”, a mobile platform for an industrial robot was designed. The task was to design a robotic system that would help Small and Medium Enterprises (SMEs) to automate production by introducing a new tool.

The largest obstacle in automation for SMEs is the cost, both the investment in new equipment and the maintenance and setting a robot up for a new product. Typically an SME performs contract manufacturing of products in short series, and manual production is in many cases more cost efficient. The initial idea for the project was to design a robot that could be moved by hand between different machines, mainly to perform machine tending operations. The idea grew into a desire for an autonomous platform that could by itself move around to different machines and tend them. This grew even further into a concept of one or several autonomous platforms that could handle the complete production: fetching raw material from a warehouse, tending all machines used in the production, carrying the material between machines and finally putting the manufactured piece in a box.

An autonomous platform like this would not lock up a machine; normally an industrial robot is bolted to the floor in front of a machine. Thus the machine could still be manually operated if so needed. Furthermore, the factory owner would have increased freedom in positioning the machines.

Realising this kind of autonomous robot platform would require a vision system for navigation that does not need guide lines painted on the floor, magnetic tapes or any other kind of fixed guide system that is otherwise used for Autonomous Guided Vehicle (AGV) systems in industry today. With a vision-based system natural landmarks can be used, and the robot could be guided by an operator, for example using a joystick, to learn a new path. People would probably be moving around in the environment and things would be moved around; this could also be handled by a vision system with obstacle detection and avoidance.

The required infrastructure would be docking stations in front of each machine that is tended. The docking station should incorporate a charging plug, communication channels to the machine, compressed air if needed and, most importantly, position stability and calibration.

Together with the features described, a simplified instruction system is desired. Normally a robot system is programmed with a teach pad that is used to “jog” the robot and set points in space, and by traditional programming. If the robot could be instructed by, for example, voice and gestures, anyone could teach the robot how to work at each machine/station with little prior knowledge.

2.2 Vision and image processing

Image processing can be described as the process of transforming raw image data into an understanding of the image content. This involves improving the original image by correcting distortion, enhancing colours and luminance, as well as converting between different colour domains.

Humans can easily distinguish objects in the environment. An object is often separated from other objects by colour, pattern and/or texture, and the lines separating one object from another are in most cases easily detectable.

In computer vision, the pattern and colour of a single object can appear as different objects. Consider a white wall that is partly shaded, resulting in lower intensity in the shaded region. A human understands that the shaded part is not a new wall or an object on the wall. However, trying to implement that understanding in software running on a computer is tough.

Going from raw image data to an understanding of the environment consists of several steps, where the initial steps typically are referred to as low-level image processing, characterised by a large amount of data and fairly simple algorithms.


After the initial image enhancement algorithms, of low-level complexity, the task of extracting interesting information from the image starts: so-called salient regions that contain information of a certain kind that can be of interest, e.g. lines, corners and colour patterns (blobs). There are a large number of detectors with different properties, e.g. the Harris and Stephens combined corner and edge detector, Sobel edges, the Hough transform, the Moravec detector, SUSAN, and others. A detector still involves fairly simple operations on local regions of a few pixels; 3×3 to 11×11 is normal.

Salient regions can be used for navigation, using a Simultaneous Localisation And Mapping (SLAM) algorithm, or for object recognition/detection. Either way it is yet another processing step that is computationally costly.

In the following sections I will discuss the required steps from the raw image input to an understanding of the image content in more depth.

2.2.1 A camera system/Image sensor

Figure 2.8: Image sensor Bayer pattern

A digital camera consists primarily of a sensor and a lens. The image sensor is either of Charge Coupled Device (CCD) or Complementary Metal-Oxide Semiconductor (CMOS) type; the CCD has traditionally had a higher image quality while the CMOS sensor has been cheaper to manufacture. The CMOS sensors of today are almost as good as the CCD sensors regarding quality but excel in power efficiency.

The sensor consists of a Bayer pattern pixel matrix, as seen in Figure 2.8, where each pixel is primarily sensitive to light in one of three colours: red, green or blue. Effectively half of the sensor consists of green pixels, while red and blue allocate one quarter each. A pixel cell is an element similar to a solar cell which is charged by light, and the charge level is converted to a digital value with an Analog-to-Digital Converter (ADC).

A modern sensor can have millions of pixels and handles speeds of about 100 million pixels per second. For a 5-megapixel sensor, 100 Mpixel/s corresponds to a theoretical 20 frames per second, but synchronisation overhead reduces this to about 15 frames per second. The lens projects rays of light from a certain angle onto each image cell (pixel); without the lens each image cell would be affected by “all” ambient light. A lens is necessary but also affects the image in an undesired way by introducing distortion, which can be described by a radial and a tangential component and will be discussed further in section 2.2.3. The barrel effect is easily visible with a fish-eye lens, cf. Figure 2.9.

Modern camera chips have a multitude of settings that can be modified, such as skipping and binning, which allow for choosing the field of view; skipping reduces resolution but increases the field of view. The binning option combines the values of the output pixels with the skipped lines and/or rows. It is also possible to set which part of the image to use if not the whole image is desired.
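
As a software analogue of the binning option (in the camera the combination is done on the sensor itself), 2×2 binning of an image can be sketched as follows in Python/NumPy; the block size of two and the function name are only illustrative:

import numpy as np

def bin_2x2(img):
    # Average each non-overlapping 2x2 block into one output pixel,
    # halving the resolution while keeping the full field of view.
    h, w = img.shape[0] & ~1, img.shape[1] & ~1   # trim to an even size
    blocks = img[:h, :w].astype(float).reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))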

Figure 2.9: Example of fish–eye effect

The raw sensor data is in most cases pre-processed for white balance and colour correction to ensure proper colour fidelity. Auto white balance is in most cases an option available in the camera chip, but one may want to use a different algorithm than the built-in one.


2.2.2 Colour domains

If the colour information is read in Bayer format, it must be converted to a format appropriate for each use case. A camera of 5 megapixels produces 5 million pixels, each of one colour: red, green or blue. To attain 5 million pixels in, for example, RGB format, the missing colour values must be interpolated for each pixel.

There are three different cases in which the red component must be interpolated from neighbouring red pixels, on green or blue pixels, see figure 2.10. The three cases for interpolation of the blue component are similar.

Figure 2.10: Three different cases for interpolation of the colour red from a Bayer image. The red component for the centre pixel is calculated from either four neighbouring red pixels, two vertical red neighbours or two horizontal red neighbours.

For interpolation of the green component there are two cases, on red or blue pixels, and in both cases the four neighbouring green pixels can be used. A slightly better method for interpolation of the green component is a selective method that uses either the two horizontal or the two vertical neighbours. This method requires evaluation of neighbouring colour components, thus adding computational overhead that has to be weighed against the image quality improvement.

For selective interpolation of the green component on a red pixel, equation 2.1 should be used. Figure 2.11 describes the parameters of the equation.

G(R) =
  \begin{cases}
    (G_1 + G_2)/2 & \text{if } |R_1 - R_3| < |R_2 - R_4| \\
    (G_3 + G_4)/2 & \text{if } |R_1 - R_3| > |R_2 - R_4| \\
    (G_1 + G_2 + G_3 + G_4)/4 & \text{if } |R_1 - R_3| = |R_2 - R_4|
  \end{cases}
  \quad (2.1)
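
To make the selective method concrete, the following Python/NumPy sketch interpolates the green value at a red pixel of a raw Bayer image. It is only an illustration, not the implementation developed in this thesis: it follows the general idea of averaging the two green neighbours along the direction with the smaller red gradient, and the function name, array layout and (absent) border handling are assumptions made for the example.

import numpy as np

def green_at_red(raw, y, x):
    # Green neighbours directly above/below/left/right of the red pixel.
    g_up, g_down = int(raw[y - 1, x]), int(raw[y + 1, x])
    g_left, g_right = int(raw[y, x - 1]), int(raw[y, x + 1])
    # Red neighbours two pixels away, used to estimate the local gradients.
    grad_v = abs(int(raw[y - 2, x]) - int(raw[y + 2, x]))   # vertical, |R1 - R3|
    grad_h = abs(int(raw[y, x - 2]) - int(raw[y, x + 2]))   # horizontal, |R2 - R4|
    if grad_v < grad_h:                      # smoother vertically
        return (g_up + g_down) / 2
    if grad_h < grad_v:                      # smoother horizontally
        return (g_left + g_right) / 2
    return (g_up + g_down + g_left + g_right) / 4

For the red and blue components the corresponding interpolation follows the three cases of figure 2.10 in the same spirit.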

Technical advances in camera sensors are rapid; a modern, low-price compact camera contains an image sensor of approximately 14 megapixels. In raw format that is 14 megabytes of information per photo, given that the colour depth is 8 bits.

Figure 2.11: Selective interpolation of the green component on a red pixel. The centre red pixel R has green neighbours G1, G2, G3 and G4 directly above, to the right, below and to the left, and red neighbours R1, R2, R3 and R4 two pixels away in the same four directions.

Other colour domains such as HSI, RGB, YUV or grey-scale/monochrome can be acquired through binning or through interpolation and transformation. One of these colour domains is usually selected to get a continuous stream of the same kind of data; which one depends on the usage.
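
For example, a grey-scale (or YUV luminance) stream can be computed from interpolated RGB pixels with the common BT.601 luma weights; these particular weights are a standard choice, not one prescribed by this thesis:

import numpy as np

def rgb_to_grey(rgb):
    # Y = 0.299 R + 0.587 G + 0.114 B applied to an (H, W, 3) array.
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(float) @ weights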

2.2.3 Distortion and correction

Few, if any, lenses are perfect, and the imperfection of the lens causes distortion in the image. Thus the first procedure needed to start working with a vision system is to perform a calibration in order to identify the intrinsic and, for multiple-view cameras, the extrinsic parameters. Distortion correction is an important step after the camera calibration, since the correction allows the use of the pin-hole camera model and projective geometry to retrieve 3D information from 2D images.

A popular tool used to model and estimate the camera distortion is the “Camera Calibration Toolbox for Matlab” by Bouguet [6]. The intrinsic parameters identified include the lens distortion map, the principal point coordinates (CC) for the two sensors and the focal lengths (f) in pixel units [7], see figure 2.12. The lens distortion model consists of two parts: the radial model, the first term in equation 2.4, which is symmetric around the principal point, and the tangential model (also known as decentering distortion), shown in equation 2.5, which models the distortion effects caused by a leaning detector surface or a non-constant refraction index in the lens [8].

The radial distortion is modelled by a sixth-order polynomial with coefficients (k_i) that contains only even-order terms.


Figure 2.12: The focal length (f) is the distance from the optical centre of the lens to the principal focus. The principal point (CC) is the point on the image plane where the principal axis is projected. A small misalignment between the lens and the image sensor will make the principal point and the centre of the image plane not coincide, which usually is the case.

r = |P_n| \quad (2.2)

P_n = \begin{bmatrix} P_x - CC_x \\ P_y - CC_y \end{bmatrix} \quad (2.3)

P_d = P_n \, (1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + dp + CC \quad (2.4)

dp = \begin{bmatrix} 2 k_3 P_x P_y + k_4 (r^2 + 2 P_x^2) \\ k_3 (r^2 + 2 P_y^2) + 2 k_4 P_x P_y \end{bmatrix} \quad (2.5)

Typically, one or two coefficients in the radial term are enough to compensate for the lens radial distortion, but in the case of a fish-eye lens all three coefficients may be required.
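
As an illustration of equations 2.2 through 2.5, the sketch below maps an undistorted pixel location to its distorted location. It is a sketch of the model as described above, not code from this thesis; the coefficient names (k1, k2, k3 for the radial part, p1 and p2 for the tangential part) are chosen here for readability and may not match the numbering used by the calibration toolbox.

import numpy as np

def distort_point(p, cc, k1, k2, k3, p1, p2):
    # Shift to coordinates relative to the principal point (eq. 2.3).
    p = np.asarray(p, dtype=float)
    cc = np.asarray(cc, dtype=float)
    pn = p - cc
    x, y = pn
    r2 = x * x + y * y                                     # r^2 with r = |Pn| (eq. 2.2)
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3   # radial term of eq. 2.4
    # Tangential (decentering) term of eq. 2.5.
    dp = np.array([2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x),
                   p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y])
    return pn * radial + dp + cc                           # eq. 2.4

Iterating such a function over every pixel of the corrected output image and sampling the input image at the returned location is the inverse mapping described next.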

The presented model is used to calculate the distorted location of a pixel, which is useful when creating a corrected image by iterating over the output image and picking the colour values from the distorted input image. I call this inverse mapping, see figure 2.13-b. The resulting image will be smaller than the input, effectively cropping the image.

On the other hand, an input pixel's location can be corrected using an iterative method that approximates the correct output location. I call this forward mapping, see figure 2.13-a. The resulting image will be larger than the input, thus creating an image with undefined pixels. If the forward mapping method is used to create a new image, the undefined pixels must be interpolated for a complete output image.

Figure 2.13: To the left (a) an illustration of forward mapping and to the right (b) an illustration of inverse mapping.

2.2.4 Image features

It is easy for humans to find distinctive features in an image and the corresponding features in the room. However, for a computer the image content is just a matrix of colour values, so further processing is required to find salient regions that can be distinguished from other regions in an image. Some popular and simple features are corners and edges. Methods for detecting corners and edges were developed already in the eighties, one of which is the “Harris and Stephens combined corner and edge detector”, which in turn was an improvement of the Moravec operator developed in 1977 [9, 10]. A feature is detected by looking at each pixel and its neighbours and applying a method which gives a value of how well the neighbourhood corresponds to a feature of a certain kind; for example, at a corner the intensity shifts in two directions, while an edge has a shift in one direction.
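
As a rough sketch of how such a detector can be expressed (not the implementation presented later in this thesis), the Harris and Stephens corner response can be computed from image gradients as below; the constant k = 0.04 and the uniform 3×3 window are common textbook choices, not values taken from this work.

import numpy as np
from scipy.ndimage import sobel, uniform_filter

def harris_response(gray, k=0.04, window=3):
    # Image gradients.
    gray = gray.astype(float)
    ix = sobel(gray, axis=1)
    iy = sobel(gray, axis=0)
    # Structure tensor elements, accumulated over the local window.
    sxx = uniform_filter(ix * ix, window)
    syy = uniform_filter(iy * iy, window)
    sxy = uniform_filter(ix * iy, window)
    # Corner response R = det(M) - k * trace(M)^2; large positive values
    # indicate corners, large negative values indicate edges.
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

Thresholding this response and keeping local maxima then yields the corner features and their cornerness values discussed below.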

Every feature has an associated value of how strongly it matches the feature type; for a corner this is called the cornerness value. This is also the only thing defining a feature, which makes it quite hard to associate features from two different images taken in sequence or from different points of view. This is a fundamental problem in stereo vision, where the correspondence between the features from two cameras has to be found. A lot of work has been put into this area and several local descriptors have been developed to increase feature uniqueness. A well-known local descriptor is SIFT (Scale Invariant Feature Transform) [11], which describes a feature by looking at its environment while transforming the image to increase the robustness to changes in scale and rotation. There are also other well-known local descriptors: SURF [12], variants of SIFT, Spin-Images [13], and others, each trying to improve the most important properties of these methods: speed, repeatability and descriptor uniqueness [14].


Figure 2.14: An example of the Harris and Stephens combined corner and edge detector output from a stereo camera, overlaid on the source images.


2.2.5 Stereo vision

In computer vision, disparity is used as the only cue to determine depth, unlike humans, who also use knowledge of the scale of an object, the position of the muscles that move our eyes, as well as knowledge of focus, i.e. how the lenses are shaped to attain focus.

Photographing a view from two different perspectives allows for estimation of the distance to an object, either by area-based correlation and depth maps or by feature-based correlation and landmark triangulation. The relationship between a point in a stereo image pair and the environment can, by considering a pin-hole camera model (a perfect projection of the world onto an image), be described with equations 2.6 through 2.8.

d = \frac{f b}{Z} \quad (2.6)

p_x = \frac{f X}{Z} \quad (2.7)

p_y = \frac{f Y}{Z} \quad (2.8)

Here d is the disparity (separation on the image plane), described by the distance between Il and Ir in figure 2.15, b is the baseline (the separation between the two cameras), f is the focal length of the camera lenses, and X, Y and Z are the 3D space coordinates relative to the camera. px and py are the image coordinates in the reference camera, often the left one. This requires that the cameras are separated only by the baseline b, either vertically or horizontally.
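
Rearranging equations 2.6 through 2.8 gives the 3D position of a landmark directly from its image coordinates and disparity. The following sketch assumes rectified, horizontally separated cameras; the function and the example numbers are purely illustrative.

def triangulate(px, py, d, f, b):
    # Z = f*b/d (eq. 2.6), then X and Y from eqs. 2.7 and 2.8.
    if d <= 0:
        raise ValueError("disparity must be positive")
    z = f * b / d
    x = px * z / f
    y = py * z / f
    return x, y, z

With, for example, f = 500 pixels, b = 0.12 m and a feature seen at disparity d = 10 pixels, the landmark lies at Z = 6 m in front of the reference camera.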

Figure 2.15: An illustration of the relationship between a landmark and a stereo camera.

The depth resolution is high at close range and degrades rapidly (roughly quadratically) with distance. The accuracy degradation is illustrated in figure 2.16. An image feature appears at a discrete location on the image sensor; the modelled error in the figure corresponds to a difference of plus/minus half a pixel. The error at farther distances can be several metres, while at shorter range it is a matter of centimetres.
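
The effect of the plus/minus half-pixel error can be illustrated with a small calculation based on equation 2.6, again with assumed example values f = 500 pixels and b = 0.12 m rather than any configuration used later in this thesis:

def depth_interval(d, f=500.0, b=0.12, err=0.5):
    # Depth computed for disparities d + err, d and d - err (eq. 2.6).
    return f * b / (d + err), f * b / d, f * b / (d - err)

# At d = 10 pixels the nominal depth of 6.0 m lies in roughly [5.7, 6.3] m,
# while at d = 2 pixels the nominal 30 m lies in [24, 40] m.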


Figure 2.16: The degradation of the distance accuracy, illustrated as vertical error bars on a plot of distance (Z) in metres against disparity in pixels. It is obvious that the accuracy degrades greatly with the distance to the object.

2.3 Hardware support for heavy computation tasks

Moore's law suggests that every two years, double the number of transistors can be placed on an Integrated Circuit (IC) inexpensively.

Computational power is a never-ending story: the more we get, the more we need. We are constantly coming up with new ideas for things we could do with computers if there were enough computing power. One area is the mobile phone market, where the trend is smarter phones with a multitude of applications and high-performance games that not so many years ago required a quite fast desktop computer with a high-end graphics card. This is something we carry around in our pockets that originally was designed to make phone calls and later to send simple text messages. Today we navigate using the Global Positioning System (GPS), check what our friends are doing, take photographs and instantly publish them online, play advanced games and read the newspaper on them.

As described in the previous sections, image processing is a computationally demanding task and it still requires fairly hefty hardware to be able to process the increasing amount of data at increasingly higher frame rates.

All processing platforms are designed to make as many calculations as possible under varying demands on power efficiency. Today the trend is of course mobility, but also green computing, both of which strive to increase the number of computations per unit of power.

There are a number of different architectures, e.g. DSPs, Single Instruction Multiple Data (SIMD) and Multiple Instructions Multiple Data (MIMD) CPUs, GPUs, FPGAs, heterogeneous processors and historically interesting architectures like systolic arrays.

2.3.1 Digital Signal Processors

The DSP is designed for (as the name suggests) processing digital signals at high speed. DSPs have historically been the natural choice for image processing tasks.

The DSP is designed for signal processing utilising Very Long Instruction Words (VLIW), 256 bits wide, to supply 32-bit instructions to up to eight function units every clock cycle (TMS320). Two of the function units are for data addressing, two execute multiplications, and four execute arithmetic, logic and branch instructions. The popular TMS320 DSP family was introduced in 1983 and has since been further developed; it has also been incorporated in system-on-chip devices used in embedded devices as well as cell phones. As an example, the TMS320C6416T supports up to 8000 MIPS at a rate of eight 32-bit instructions per clock cycle and 28 operations per clock cycle. The core power requirement is around 1 W for a 1 GHz version, not considering additional I/O power. The VLIW architecture and the power efficiency make the DSP a good choice for embedded systems that perform signal processing tasks.

2.3.2 Vector processors

Vector or array processors are sometimes used as coprocessors, offloading the main computer, which in these cases works as a control device rather than a data cruncher [15]. Vector processors operate on vectors of different sizes as opposed to scalar numbers. This architecture has to some extent been adopted in newer CPUs with the introduction of SIMD instructions such as Intel AVX, MMX and SSEn, PowerPC AltiVec and ARM NEON.



2.3.3 Systolic array processors

Systolic arrays are another architecture family, designed as a matrix of processing elements (PEs) with data communication registers for communication between neighbouring PEs. Data flows through the matrix, entering from two sides, and the result flows out at the two remaining sides. The name of this architecture family comes from the fact that data flows through the processor similarly to how blood flows in the human body. This is a very efficient architecture for vector and matrix operations.
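
The data flow of a systolic array can be mimicked in software. The sketch below simulates an N x N array computing a matrix product: each PE holds an accumulator, multiplies the operands arriving from its left and upper neighbours, and passes them on. This is a textbook formulation written only to illustrate the flow of data through the PE grid, not an implementation used in this work.

    #define N 3

    /* Software simulation of an N x N systolic array computing C = A * B.
     * Each PE multiplies the operand arriving from the left (a) with the one
     * arriving from above (b), accumulates, and forwards the operands to its
     * right and lower neighbours. Meant to illustrate data flow, not speed. */
    void systolic_matmul(const int A[N][N], const int B[N][N], int C[N][N])
    {
        int acc[N][N]  = {{0}};   /* accumulator inside each PE      */
        int a_in[N][N] = {{0}};   /* operand register, flows right   */
        int b_in[N][N] = {{0}};   /* operand register, flows down    */

        for (int t = 0; t < 3 * N - 2; ++t) {          /* global clock ticks */
            /* shift operands one step right / down (back to front) */
            for (int i = N - 1; i >= 0; --i)
                for (int j = N - 1; j >= 0; --j) {
                    a_in[i][j] = (j == 0) ? 0 : a_in[i][j - 1];
                    b_in[i][j] = (i == 0) ? 0 : b_in[i - 1][j];
                }
            /* feed skewed input at the left and top edges */
            for (int i = 0; i < N; ++i) {
                int k = t - i;
                a_in[i][0] = (k >= 0 && k < N) ? A[i][k] : 0;
                b_in[0][i] = (k >= 0 && k < N) ? B[k][i] : 0;
            }
            /* every PE multiplies and accumulates "in parallel" */
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    acc[i][j] += a_in[i][j] * b_in[i][j];
        }
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                C[i][j] = acc[i][j];
    }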

2.3.4 Asymmetric multicore processors

The Cell processor is an asymmetric multicore processor jointly developed by Sony, IBM and Toshiba. It consists of a main core, a Power Architecture core called the Power Processing Element (PPE), and eight additional coprocessors called Synergistic Processing Elements (SPEs); the main task of the PPE is to distribute the workload over the SPEs, which are optimized for data processing [16]. The performance is about 256 GFLOPS, single precision. Figure 2.17 depicts a Cell processor die.

The Sony PlayStation 3 uses such a processor.

Figure 2.17: A photo of a Cell processor die; the eight SPEs are located to the right and the left part contains the Power Processor and control logic. (Courtesy of International Business Machines Corporation, © International Business Machines Corporation.)


2.3.5 General Purpose GPUs (GPGPU)

A modern GPU is capable of processing a few thousand GFLOPS (billion floating-point operations per second) at the cost of two to three hundred watts of power.

The GPU has developed from being a very specialised device for processing pixels and vertices to a generic architecture, the GPGPU. It is constructed as a number of multiprocessors, which in turn consist of a number of processing elements. A modern GPU contains hundreds of processing elements. The design of the GPU makes it well suited for running the same computation on large amounts of data, which is the case in image processing applications. However, the programmer may run into problems concerning data sharing and execution synchronisation, which can drastically reduce performance.

Since the GPU is merely a coprocessor, a CPU is required both for feeding data and instructions to the GPU and for reading back the result. The programmer must take care to utilise the processing elements efficiently and to hide the relatively high GPU memory latency.

2.3.6 Field Programmable Gate Arrays

An FPGA is a cell grid of logic cells, memories, multipliers/DSP elements and a programmable switching network. Both the logic and the switching network are reconfigurable. An FPGA device can be configured in a way that allows for concurrent computation on data. The resources are, per device, fixed and limited.

FPGA manufacturers offer a wide range of devices with different sizes, characteristics and properties. FPGA architectures (configurations) can be described using a Hardware Description Language (HDL) (e.g. VHDL, Verilog), a high-level language (e.g. Handel-C, SA-C), as well as graphical drag-and-drop programming methods.

Modern FPGAs also incorporate DSP blocks, which are beneficial for signal processing applications, e.g. image processing. A typical low-cost FPGA contains 2-3 million system gate equivalents and approximately 100 DSP elements. Each DSP element supports one multiply-and-add operation per clock cycle. The FPGA itself runs at clock frequencies orders of magnitude lower than a computer, but can perform orders of magnitude faster due to the high level of parallelisation that can be achieved, both temporal (pipelining) and spatial. Field programmable gate arrays have been around for some time but are still under active development, with improving power efficiency, size/price ratio and improved technology.



Figure 2.18: An illustration of an FPGA, showing the configurable logic cells, the configurable routing net and I/O buffers.


2.4 FPGA for (stereo-) vision applications

There are a few reasons why using an FPGA can be advantageous: an FPGA can be orders of magnitude faster than a high-end PC, it is almost as flexible as a PC, and its power consumption is very low compared to its performance. As Sirowy et al. [17] write, a typical CPU must fetch an instruction from memory, execute the instruction, and store the result in memory. For an FPGA, on the other hand, the configuration is the instructions. This is one major reason why an FPGA is so fast; another major reason is the possibility of extreme parallelism.

The flexibility of the FPGA comes from the ease of reconfiguration and the wide range of tools available for describing the configuration. There are languages designed specifically for this purpose, but also tools based on more general languages commonly used for writing PC programs, e.g. variants of C, Python and Haskell. An FPGA is also very power efficient; Bonato et al. [18] make a comparison between an FPGA, a Pentium M and an ARM processor. The FPGA uses 1.3% and 12.3% of the power required by the Pentium M and the ARM, respectively, per data unit. There are of course drawbacks, a few being: using an FPGA efficiently requires different skills than software programming, resources are strictly limited, and resources are device dependent, so the developer must consider the device in the design.

Developing algorithms for vision applications is probably a lot easier in software, and modern maths tools often have vision libraries that improve development speed so that the developer can focus on developing new algorithms or systems. However, in production equipment there are a number of factors that need to be considered, such as computational power, component cost and power consumption, which also affect system cost.

Given the rapid development pace of electronic equipment today, the FPGA suits both industrial equipment and consumer products very well. Since the FPGA can be reconfigured, a device can be upgraded with new or improved functionality by altering its FPGA configuration, which in consumer devices is often referred to as firmware.

The strongest aspect of the FPGA is, however, the possibility to implement virtually unlimited instruction parallelism, together with the fact that the data path is the instructions, which in software must be fetched from memory. This is further described by Sirowy et al. [17].

FPGAs are not always the best platform; for high-level image algorithms, which are typically more complex and operate at fairly low data rates, a microprocessor or computer is probably a better platform. A modern (laptop or desktop) computer has large data caches and an unmatched clock frequency, approximately one order of magnitude higher than on an FPGA.

An FPGA does not natively support division, which requires the developer to incorporate division components or utilise shift operators to realise division with a power-of-two denominator. This is probably one of the more common problems that a developer constantly has to deal with. The incorporation of a divisor component means a sacrifice of hardware resources and/or a number of clock cycles of delay. This is especially important when designing for small FPGAs and CPLDs.
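
As an illustration of the shift-based alternative, division by a power of two can be written as a right shift, optionally with a rounding term, which is exactly the trick an FPGA design uses to avoid instantiating a divider. The C sketch below shows the idea for unsigned operands, where the shift semantics are straightforward:

    /* Division by 2^k for unsigned values, as it would typically be realised
     * in hardware: a plain right shift, optionally with rounding to nearest
     * by first adding half of the divisor. */
    unsigned div_pow2_trunc(unsigned x, unsigned k)
    {
        return x >> k;                        /* floor(x / 2^k) */
    }

    unsigned div_pow2_round(unsigned x, unsigned k)
    {
        return (x + (1u << (k - 1))) >> k;    /* round(x / 2^k), k >= 1 */
    }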

2.4.1 Designing FPGA architectures (What they need)

Designing an FPGA architecture is similar to designing software. The architecture itself is, however, very different from software. The design is often data centric, where data flows through a number of components that operate on the data, possibly with some local storage for performing operations on a group of data, e.g. window operators.
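
To make the notion of a window operator with local storage concrete, the sketch below applies a 3x3 mean filter to a greyscale image held in a plain array. In an FPGA realisation the two previous image rows would sit in line buffers (block RAM) so that the window can be formed while pixels stream through, but the arithmetic is the same. The function and image layout are illustrative assumptions, not code from the thesis system.

    /* 3x3 mean filter over an 8-bit greyscale image (row-major, width w, height h).
     * Border pixels are simply copied. In hardware the division by 9 would
     * typically be replaced by a multiply-and-shift approximation. */
    void mean3x3(const unsigned char *in, unsigned char *out, int w, int h)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
                    out[y * w + x] = in[y * w + x];
                    continue;
                }
                int sum = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        sum += in[(y + dy) * w + (x + dx)];
                out[y * w + x] = (unsigned char)(sum / 9);
            }
    }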

Usually the design work starts at a conceptual level, where an algorithm is developed in a scripting tool like Matlab or similar that is suitable for the task.



Then the algorithm is adapted to run on an FPGA, often with simplifications in the calculations to save resources on the FPGA. The FPGA natively supports addition (subtraction), multiplication and bitwise logic operations. Division and square root, for example, require a larger amount of logic to execute and are often replaced with simplifications: a division with a divisor that is a power of two can be implemented as a shift operation, and some constant divisions can be altered to the closest power of two.
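
Square root is a similar case: rather than instantiating a full square-root core, a bit-serial (shift-and-subtract) integer square root can be used, since it needs only additions, subtractions, comparisons and shifts. The C sketch below shows the classic algorithm; it is only meant to illustrate the kind of simplification referred to, not any particular implementation from the appended papers.

    /* Bit-serial integer square root (shift-and-subtract), returning
     * floor(sqrt(x)). Uses only add, subtract, compare and shift, which is
     * why this style maps well onto FPGA logic. */
    unsigned isqrt(unsigned x)
    {
        unsigned res = 0;
        unsigned bit = 1u << 30;   /* highest power of four that fits in 32 bits */

        while (bit > x)
            bit >>= 2;
        while (bit != 0) {
            if (x >= res + bit) {
                x   -= res + bit;
                res  = (res >> 1) + bit;
            } else {
                res >>= 1;
            }
            bit >>= 2;
        }
        return res;
    }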

When an adapted algorithm has been implemented in software and the results are judged sufficiently good, it can be transferred to the target platform. Rewriting a sequential algorithm for a parallel architecture is an error-prone process. The problem has been acknowledged and new languages have evolved, based on the C language with extensions supporting hardware-specific properties, e.g. the bit length of signals ("variables"). Two such examples are Handel-C [19] and SA-C [20].

2.5 Embedded FPGA based vision system

For embedded vision systems a key factor is power consumption, and for real-time vision applications of course also the performance, with respect to the algorithms that have to be run.

There are a number of embedded vision systems designed with different main priorities, e.g. cost, power usage, power efficiency or performance.

Rowe et al. [21] describe a low-cost monocular vision system based on a single CMOS colour camera and an SX52 RISC microcontroller. The system supports 352×288 pixel resolution with a maximum frame rate of 50 fps.

Khaleghi et al. [22] describe a small form factor (5×5 cm) stereo camera system based on two tiny CMOS camera sensors and a state-of-the-art embedded processor (dual Blackfin cores at 600 MHz). The system operates at 20 fps and consumes only 2.3 W.

Sawasaki et al. [23] describe a six-camera vision system for robot navigation based on a combined DSP and FPGA board; a systolic array of 64 processing elements was implemented on the FPGA, which worked as a coprocessor for the DSP. The data from two of the six cameras are selectively input to the image processor via a camera switch, effectively providing alternative baselines. The system operates at 30 fps and consumes 10 W.

As of today, many image processing implementations rely on powerful computers, in some cases in combination with an FPGA that is used foremost for lower-level processing or as a coprocessor. But there are also some standalone vision systems based on FPGAs.

We suggest a system that has multifaceted application areas. A block diagram that gives a system overview can be seen in figure 2.19.

Figure 2.19: Two-Camera system block diagram

The basis is two CMOS camera chips that are directly connected to the FPGA, giving the FPGA full control of the cameras. One of the cameras is located on a break-off board that can be removed for a slimmer size (see figure 2.20).

The FPGA is in turn connected to an on-board Ethernet switch through a PHY-layer chip, so that the system can, in its simplest form, be used as a stereo or mono camera that dumps images to a host system over Ethernet. It can also be a standalone system with an on-board Q7 module that can take part in processing the images at a higher level, or utilise results produced by the FPGA to perform higher-level cognitive tasks, control a robot, provide information to a truck or car driver, and more. The board also incorporates an SD-card reader, USB and LVDS connectivity.



2.6 Component based software development for hybrid FPGA systems

In the software engineering community a new paradigm has evolved in recent years: CBSE. According to Crnkovic and Larsson [24], the major goals of CBSE are:

• To provide support for the development of systems as assemblies of components.

• To support the development of components as reusable entities.

• To facilitate the maintenance and upgrading of systems by customizing and replacing their components.

These are fundamental ideas that, I believe, can be applied in any system, whether it is software, FPGA architectures, or electrical components. To manage component based development in FPGA architectures, we believe that a number of rules need to be applied in component design.

Figure 2.20: A photo of the Two-Camera system developed in our research group


• Minimal signal interfaces

• Self contained components

• Provider / Consumer hierarchy

• Proxies for type conversion as component "glue" (a sketch of such a proxy follows the list), for example:

– RGB to HSI

– RGB to CMYK

– RGB to bin
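
As a concrete, hedged example of such a conversion proxy, the snippet below turns an RGB pixel into a greyscale value using integer weights and then thresholds it to a binary value (roughly the "RGB to bin" case). The weights and the threshold are common textbook choices, not values taken from this work; in an FPGA the same arithmetic would form a small self-contained component with pixel-wide input and output interfaces.

    /* Conversion proxy: RGB -> greyscale -> binary.
     * Integer approximation of the usual luminance weights (0.299, 0.587,
     * 0.114) scaled by 256, so only multiplies, adds and a shift are needed,
     * no division. The threshold is an illustrative value. */
    unsigned char rgb_to_grey(unsigned char r, unsigned char g, unsigned char b)
    {
        return (unsigned char)((77 * r + 150 * g + 29 * b) >> 8);
    }

    unsigned char rgb_to_bin(unsigned char r, unsigned char g, unsigned char b)
    {
        const unsigned char threshold = 128;   /* assumed threshold */
        return rgb_to_grey(r, g, b) >= threshold ? 1 : 0;
    }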

Developing FPGA architectures using an HDL is done through entity specifications, where each entity is a small component (not to be confused with a system component as in CBSE). Designing in an HDL has been compared to writing software in assembly languages [25]; it takes time and it is easy to make mistakes with the traditional 'data-flow' design process, which is further supported by Gaisler in [26]. Gaisler promotes a two-process design methodology that is supposed to improve:

• Development time (writing code, debugging)

• Simulation and synthesis speed

• Readability

• Maintenance and re–use

As in any software design project, regardless of the language used, the importance of consistency, structure and documentation cannot be stressed enough.

2.7 Simultaneous Localization And Mapping (SLAM)

Navigation is a fundamental functionality in mobile robotics. This is an area in robotics that has been thoroughly researched, with a number of developed algorithms. In the beginning there were two categories of SLAM algorithms, based on either the Kalman filter or particle filters.

An important step in most vision systems is the correspondence problem, and the same goes for SLAM systems: landmarks seen at any given time must be matched with landmarks in the map to perform an update of the ego-motion.



This poses a huge problem in kidnapping situations, when the robot is blindfolded and moved to a new location, where basically the whole map must be searched to find the best possible match.

Encoder based odometry is probably the most common way to determine the motion of a robot. On omnidirectional and differential drive robots, slip is a problem that affects the estimation of the robot motion. Over time, encoder based odometry is therefore not a reliable method for determining the robot motion. For short motions it is, however, relatively reliable, which is why encoders are still commonly used as a complementary sensor.
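
For reference, the dead-reckoning update for a differential drive robot is a few lines of arithmetic: the distance travelled by each wheel is obtained from the encoder ticks, and the pose is integrated from the average travel and the travel difference over the wheel base. The sketch below is a textbook formulation with illustrative names, not odometry code from any robot used in the papers.

    #include <math.h>

    /* Differential drive odometry: integrate the pose (x, y, theta) from the
     * distances travelled by the left and right wheels since the last update.
     * wheel_base is the distance between the wheels. Textbook formulation. */
    typedef struct { double x, y, theta; } pose_t;

    void odometry_update(pose_t *p, double d_left, double d_right, double wheel_base)
    {
        double d_center = 0.5 * (d_left + d_right);         /* forward travel */
        double d_theta  = (d_right - d_left) / wheel_base;   /* heading change */

        /* integrate using the heading halfway through the motion */
        p->x     += d_center * cos(p->theta + 0.5 * d_theta);
        p->y     += d_center * sin(p->theta + 0.5 * d_theta);
        p->theta += d_theta;
    }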

In robot navigation, vision is a commonly used sensor for detecting objects or features in the environment and their position relative to the robot's position. Features in the environment can be used as landmarks to build a map and then recover the current position with respect to the map. This is known as SLAM, simultaneous localization and mapping.

A vision system has both the drawback and the benefit of being information dense. The information density provides the opportunity to select different properties from an image and extract those most beneficial for the task at hand. Some commonly used properties of images are corners, edges and intensity regions.

The negative aspect of vision is that it requires substantial processing of the image before the data can be used. In many applications where vision is used, the desired information from the image is depth and direction. To be able to extract the depth of a feature, at least two images are required and the correspondence between the image data has to be found, further increasing the amount of processing required.
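
One simple way to establish such a correspondence for a rectified stereo pair is block matching with a sum-of-absolute-differences (SAD) cost: for a point at (x, y) in the left image, slide a small window along the same row of the right image and keep the horizontal offset with the lowest cost. The sketch below is a generic formulation of that idea, not the matching strategy used in the appended papers.

    #include <stdlib.h>

    /* SAD between a (2*half+1)^2 window centred at (x, y) in the left image
     * and at (x - d, y) in the right image. Images are 8-bit greyscale,
     * row-major, width w. No border checking is done in this sketch. */
    static int sad_cost(const unsigned char *left, const unsigned char *right,
                        int w, int x, int y, int d, int half)
    {
        int cost = 0;
        for (int dy = -half; dy <= half; ++dy)
            for (int dx = -half; dx <= half; ++dx)
                cost += abs(left[(y + dy) * w + (x + dx)] -
                            right[(y + dy) * w + (x + dx - d)]);
        return cost;
    }

    /* Return the disparity (0..max_d) with the smallest SAD cost for a point
     * at (x, y): generic block matching for rectified images. */
    int best_disparity(const unsigned char *left, const unsigned char *right,
                       int w, int x, int y, int max_d, int half)
    {
        int best_d = 0;
        int best_cost = sad_cost(left, right, w, x, y, 0, half);
        for (int d = 1; d <= max_d; ++d) {
            int c = sad_cost(left, right, w, x, y, d, half);
            if (c < best_cost) { best_cost = c; best_d = d; }
        }
        return best_d;
    }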

To do real-time vision at high frame rates, it is necessary to use efficient algorithms and/or one or more high-performance computing devices.

2.8 Summary

Robotics is the science of autonomous machines that interact physically with the environment after sensing it and carefully planning the physical actions. Sensors are central for a robot; vision is very important for humans and can be equally important for a robot. Vision does, however, pose a problem due to the large amount of data and the advanced algorithms required to extract essential information from the images.

Navigation is vital for many mobile robot applications, and vision based navigation would probably involve tasks such as image capture, image colour enhancement, distortion correction, feature extraction, (descriptor calculation,) stereo matching, triangulation, pose estimation and map update. And all this at as high a rate as possible, to increase precision and to enable faster motion.

Matching the amount of calculations to the demanded update frequency can be a problem; sometimes a combination of computing units is used, e.g. DSP, ASIC, FPGA, CPU or GPU. For example, an FPGA can be used for the low-level image processing, where the algorithms are still fairly simple and the amount of data is large, and a CPU/GPU combination for the higher-level tasks, which involve more advanced algorithms.

So there are a number of tasks that need to be performed, which can be translated into software/hardware components. I suggest that each algorithm should be implemented as a self-contained component that receives input of a well-defined type and produces a result of a well-defined type with deterministic performance. The actual implementation is not important. This allows a system to be designed at system level, and components can be exchanged as long as they conform to the input/output types and can match the required performance.
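
In software such a contract could look like the hypothetical interface below: the component advertises its input and output types and a worst-case execution time, and the rest of the system interacts with it only through that interface. All names are illustrative; this thesis does not define such an API.

    /* Hypothetical component contract: well-defined input/output types and a
     * declared worst-case execution time, independent of how the component is
     * implemented (software, FPGA, GPU, ...). All names are illustrative. */
    typedef enum { DATA_IMAGE_GREY, DATA_CORNER_LIST, DATA_LANDMARK_3D } data_type_t;

    typedef struct {
        data_type_t input_type;    /* type of data the component consumes     */
        data_type_t output_type;   /* type of data the component produces     */
        unsigned    wcet_us;       /* declared worst-case execution time [us] */
        /* process one unit of input into one unit of output */
        int (*process)(const void *input, void *output);
    } component_t;

    /* Two components may be connected only if their types match. */
    static int can_connect(const component_t *producer, const component_t *consumer)
    {
        return producer->output_type == consumer->input_type;
    }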

As seen in figure 2.21, there are a number of tasks to be executed, and each task should be implemented as a component as described above. Each task implementation is mapped to a computing device, but could be exchanged for an alternative implementation, either for the same device or for a different one.

Communication overhead should be minimized since it can be a bottleneck, and communicating data back and forth between devices should be avoided for large data sets.

Figure 2.21: An illustration of the tasks involved in autonomous navigation.



Chapter 3

Summary of papers and their contribution

3.1 Paper A

This paper describes a new approach to exclude image features from a feature detector while stereo matching them. The basic principle is that features are extracted from a stereo camera, and every feature from the left image is matched to every feature in the right image that could possibly be the correct match.

Every stereo pair is then triangulated to form a landmark in 3D space. The camera system, mounted on a robot, is then moved and the distance moved is measured using a secondary sensor, e.g. wheel encoders. At the new position the same procedure is performed with the features, and finally each 3D landmark that does not support the motion is excluded, or rather, the landmarks that support the motion are stored. The method creates a number of spurious matches, hence the name.

The algorithm is designed with an FPGA implementation in mind, where deterministic execution is vital.

3.1.1 Contribution

The contribution in this paper is a new method for evaluating the stereo matching. Traditional methods involve statistical evaluation, e.g. least squared distance and similar. Our method supports real-time performance by allowing suspension of the exclusion/inclusion algorithm, and thus allows a maximum execution time to be enforced.



The landmarks that support the motion are later used for ego-motion estimation.

I was the main author of this paper; the concept was developed by me and Fredrik Ekstrand. I developed software for interfacing the FPGA system and performed the experiments.

3.2 Paper B

This paper describes controlled experiments based on the algorithm presented in paper A. A stereo camera is positioned vertically with the cameras facing the ceiling. Features are extracted, the camera is moved different distances by hand, with unknown uncertainty, and the spurious matching is executed on the data sets. The landmarks that are classified as valid are then used for ego-motion estimation, which demonstrates the sufficiency of our classification.

3.2.1 Contribution

The contribution in this paper is the experiments that validate the algorithm described in paper A.

I was the main author of this paper. The development was a proof of concept implementation in software to verify the feasibility of the concept. I developed all software for the experiments and performed them myself.

3.3 Paper C

Lens distortion correction is important in most image processing applications; it can be used both to improve the image before further processing is performed and, more importantly, to correct the mapping between image space and the environment. This paper describes two simplified methods that target field programmable gate array (FPGA) implementation. The methods enable stereo image feature processing to be performed in the FPGA without premature communication to a host system.


3.3.1 Contribution

The contribution in this paper is two novel methods for feature location correction designed for FPGA implementation, with different requirements on resources and precision.

I was the main author of this paper. I took part in designing the radial-only distortion correction algorithm and designed the other algorithm myself. I implemented the algorithms and performed all experiments myself. The radial-only algorithm was mainly designed and tested in software by two master's students, Robert Håkman Grann and Kenneth Ericsson.



Chapter 4

Conclusions and Future work

4.1 Conclusions

Robots are extremely interesting, especially as a concept. The idea of a mechanical device that can think and act similarly to a human, or at least an animal, is staggering. Even though a modern robot cannot be completely disguised as a human, research has come a long way and there is, as described earlier in this thesis, already a wide variety of applications where robots are used.

Most robots still use simple sensors for navigation and other complex tasks; one reason is of course price, but another is the complexity of more versatile sensors like cameras. An important step towards enabling vision systems for more robot applications is to find better hardware for image processing tasks. FPGAs have the potential to be that hardware, but more work in the area is needed. There need to be development models that can shorten development time through increased quality. There will always be a demand for resource-efficient solutions so that more algorithms can be put on a device.

Already today it is almost as simple to work with FPGAs as it is to work with embedded computer systems. This thesis discusses solutions to important vision algorithms and how they can be implemented in an FPGA, e.g. distortion correction. In the future there will be design methods and automated tools that do the bulk of the work. Vision is the future for robots, and the future is now.



4.2 Future work

My strong belief is that the CBSE paradigm can be adopted to improve FPGA systems development. The CBSE paradigm is very interesting, with a lot of ongoing research, and it has already been adopted in industry. It would be interesting to do research on component based systems engineering for heterogeneous systems, with different kinds of computing devices.

More urgent future work involves hardware abstraction layers: a layer that hides system details such as physical component communication, e.g. Ethernet PHY-component configuration and communication, from the main system.

More algorithms need to be implemented for our system. Interesting work in progress is a feature detector that combines a few standard feature detectors running in parallel. By combining the results from the feature detectors, features may be defined better.

Another goal is to implement a producer-subscriber system where the camera system provides a number of different data types, e.g. Harris corners, Sobel edges and 3D landmarks. A computer may then subscribe to one or more of these data streams as it pleases. A producer-subscriber design allows multiple subscribers to the same information; for example, one system could subscribe to the landmark stream to run a navigation algorithm, while another system could subscribe to the Harris corner stream to run some optical flow algorithm.
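
A minimal sketch of how such subscriptions could be represented on the host side is given below. The stream identifiers, the callback style and the fixed-size registry are assumptions made for illustration, not a specification of the planned system.

    /* Hypothetical producer-subscriber registration on the host side.
     * Stream identifiers and the callback signature are illustrative only. */
    typedef enum { STREAM_HARRIS_CORNERS, STREAM_SOBEL_EDGES, STREAM_LANDMARKS_3D } stream_id_t;

    typedef void (*stream_callback_t)(const void *data, unsigned length, void *user);

    typedef struct {
        stream_id_t       id;        /* which data stream to receive        */
        stream_callback_t callback;  /* invoked for every produced data set */
        void             *user;      /* opaque pointer passed to callback   */
    } subscription_t;

    #define MAX_SUBSCRIPTIONS 16

    static subscription_t registry[MAX_SUBSCRIPTIONS];
    static int n_subs = 0;

    /* Register a subscription; returns 0 on success, -1 if the table is full.
     * Several subscribers may register for the same stream. */
    int subscribe(const subscription_t *sub)
    {
        if (n_subs >= MAX_SUBSCRIPTIONS)
            return -1;
        registry[n_subs++] = *sub;
        return 0;
    }

    /* Called on the producer side when new data for a stream is available. */
    void publish(stream_id_t id, const void *data, unsigned length)
    {
        for (int i = 0; i < n_subs; ++i)
            if (registry[i].id == id)
                registry[i].callback(data, length, registry[i].user);
    }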

Bibliography

[1] Jörgen Lidholm, Fredrik Ekstrand, and Lars Asplund. Two camera system for robot applications; navigation. In Emerging Technologies and Factory Automation, 2008. ETFA 2008. IEEE International Conference on, pages 345–352, September 2008.

[2] Jörgen Lidholm, Giacomo Spampinato, and Lars Asplund. Validation of stereo matching for robot navigation. In 14th IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2009, September 2009.

[3] Fredrik Ekstrand, Jörgen Lidholm, and Lars Asplund. Robotics for SMEs - 3D vision in real-time for navigation and object recognition. In 39th International Symposium on Robotics (ISR 2008), pages 70–75, October 2008.

[4] Giacomo Spampinato, Jörgen Lidholm, Fredrik Ekstrand, and Lars Asplund. Stereo vision based navigation for automated vehicles in industry. In 14th IEEE International Conference on Emerging Technologies and Factory Automation, ETFA 2009, September 2009.

[5] Giacomo Spampinato, Jörgen Lidholm, Carl Ahlberg, Fredrik Ekstrand, Mikael Ekström, and Lars Asplund. An embedded stereo vision module for 6D pose estimation and mapping. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 2011.

[6] Jean-Yves Bouguet. Camera calibration toolbox for Matlab. Technical report, http://www.vision.caltech.edu/bouguetj/calib_doc, 2009 [online].

[7] Janne Heikkila and Olli Silven. A four-step camera calibration procedure with implicit image correction. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97), pages 1106–1112, Washington, DC, USA, 1997. IEEE Computer Society.


Computer Vision and Pattern Recognition (CVPR ’97), CVPR ’97, pages1106–1112, Washington, DC, USA, 1997. IEEE Computer Society.

[8] Duane C. Brown. Close-range camera calibration. PHOTOGRAMMET-

RIC ENGINEERING, 37(8):855–866, 1971.

[9] C. Harris and M. Stephens. A combined corner and edge detection.In Proceedings of The Fourth Alvey Vision Conference, pages 147–151,1988.

[10] Hans Moravec. Towards automatic visual obstacle avoidance. In Pro-

ceedings of the 5th International Joint Conference on Artificial Intelli-

gence, page 584, August 1977.

[11] David G. Lowe. Distinctive image features from scale-invariant key-points. Int. J. Comput. Vision, 60(2):91–110, 2004.

[12] Herbert Bay, Tinne Tuytelaars, and Luc J. Van Gool. Surf: Speeded uprobust features. In Ales Leonardis, Horst Bischof, and Axel Pinz, editors,ECCV (1), volume 3951 of Lecture Notes in Computer Science, pages404–417. Springer, 2006.

[13] Andrew Johnson. Spin-Images: A Representation for 3-D Surface Match-

ing. PhD thesis, Robotics Institute, Carnegie Mellon University, Pitts-burgh, PA, August 1997.

[14] Krystian Mikolajczyk and Cordelia Schmid. A performance evaluationof local descriptors. IEEE Transactions on Pattern Analysis & Machine

Intelligence, 27(10):1615–1630, 2005.

[15] S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris,M. Schuette, and A. Saidi. The reconfigurable streaming vector processor(RSVPTM). In Microarchitecture, 2003. MICRO-36. Proceedings. 36th

Annual IEEE/ACM International Symposium on, pages 141 – 150, 2003.

[16] M. Gschwind, H.P. Hofstee, B. Flachs, M. Hopkin, Y. Watanabe, andT. Yamazaki. Synergistic processing in cell’s multicore architecture. Mi-

cro, IEEE, 26(2):10 –24, march-april 2006.

[17] Scott Sirowy and Alessandro Forin. Where’s the beef? why fpgas are sofast,. Technical report, Microsoft Research, 2008.

[18] Vanderlei Bonato, Eduardo Marques, and George A. Constantinides.A floating-point extended kalman filter implementation for autonomousmobile robots. J. Signal Process. Syst., 56(1):41–50, 2009.

[19] Celoxica. Handel-C Language Reference Manual, 2004.

[20] Bruce A. Draper, A. P. Willem BÃuhm, Jeff Hammes, Walid Najjar,J. Ross Beveridge, J. Ross, Charlie Ross, Monica Chawathe, Mitesh De-sai, JosÃl’ Bins, and Faculdade De InformÃatica. Compiling sa-c pro-grams to fpgas: Performance results. In In Proc. of the International

Conference on Vision Systems, pages 220–235. Springer-Verlag, 2001.

[21] A. Rowe, C. Rosenberg, and I. Nourbakhsh. A second generation low costembedded color vision system. In Computer Vision and Pattern Recogni-

tion - Workshops, 2005. CVPR Workshops. IEEE Computer Society Con-

ference on, page 136, june 2005.

[22] B. Khaleghi, S. Ahuja, and Q. Wu. An improved real-time miniaturizedembedded stereo vision system (mesvs-ii). In Computer Vision and Pat-

tern Recognition Workshops, 2008. CVPRW ’08. IEEE Computer Society

Conference on, pages 1 –8, june 2008.

[23] N. Sawasaki, M. Nakao, Y. Yamamoto, and K. Okabayashi. Embeddedvision system for mobile robot navigation. In Robotics and Automation,

2006. ICRA 2006. Proceedings 2006 IEEE International Conference on,pages 2693 –2698, may 2006.

[24] Ivica Crnkovic and Magnus Larsson. Building Reliable Component-

Based Software Systems. Artech House, Inc., Norwood, MA, USA, 2002.

[25] C. T. Johnston, K. T. Gribbon, and D. G. Bailey. Implementing ImageProcessing Algorithms on FPGAs. In Proceedings of the Eleventh Elec-

tronics New Zealand Conference, ENZCon’04, Palmerston North, pages118–123, 2004.

[26] Jiri Gaisler. A structured vhdl design method, 2004.
