
Page 1: Embedded distributed/parallel computing hardware for · PDF file · 2017-07-13Embedded distributed/parallel computing hardware for high school students ... SPI speed is usually the

Embedded distributed/parallel computing hardware for high school students

Hannes Haljaste
Institute of Computer Science
University of Tartu
[email protected]

Abstract—This paper focuses on embedded distributed/parallel computing hardware in which microcontrollers are used as the central processing units. It gives an overview of two controllers: one with the best available performance at the time of writing and one chosen because it is easy to learn. Different development environments are evaluated for teaching distributed and parallel paradigms.

Index Terms—Embedded computing, distributed computing, parallel computing, Cortex-M7, Arduino, high school

I. INTRODUCTION

Building distributed/parallel computing hardware is expensive and therefore usually out of reach for hobbyists and high school students. With more research and development going into these fields, raising interest among younger people is essential. This paper tries to find the best way to demonstrate distributed and parallel paradigms before university, both in high schools and for hobbyists who might have an interest in the field. For this reason, two different microcontrollers and their most likely development environments are evaluated to find the best option and to develop prototype electronics and software for demonstration.

II. HARDWARE

For the embedded distributed/parallel computing hardware, two different microcontrollers are considered: the Microchip ATSAME70 and the ATmega328P. Besides the central processing unit, both devices have various features that make them very useful in applications such as sensor networks, robotics, self-driving cars and aerospace. One of the most useful features is that they can be configured to consume very little power, depending on the performance needed at the time. On a small battery these controllers can stay in sleep mode for years. This opens up many possibilities where high-performance computing is needed only occasionally and/or where the system's power consumption is limited. The ATSAME70's central processing unit is based on the 32-bit Cortex-M7 core running at up to 300 MHz. It also features various hardware peripherals such as the Advanced Encryption Standard (AES) accelerator, True Random Number Generator (TRNG), Universal Serial Bus (USB), Serial Peripheral Interface (SPI), Two Wire Interface (TWI), also known as Inter-Integrated Circuit (I2C), and Universal Synchronous/Asynchronous Receiver/Transmitter (USART), among others. The second microcontroller has an 8-bit core running at up to 20 MHz. It also has

TABLE I
COMPARISON BETWEEN TWO MICROCONTROLLERS.

| | ATSAME70 [4] | ATmega328P [3] |
|---|---|---|
| Core | 32-bit | 8-bit |
| CPU design | RISC | RISC |
| Core clock | 300 MHz | 20 MHz |
| Floating-point support | Hardware | Software |
| Flash memory | 2 MB | 32 KB |
| RAM | 384 KB | 2 KB |
| Maximum power consumption | 300 mW | 20 mW |
| Sleep mode power consumption | 19 µW | 4 µW |
| Other | DSP instructions | - |
| Peripherals | AES, TRNG, Ethernet MAC, USB, SPI, TWI, USART | SPI, TWI, USART |

TABLE II
PI CALCULATION RESULTS USING MONTE CARLO METHOD IN SECONDS.

| Iterations | ATSAME70 at 300 MHz | ATmega328P at 16 MHz |
|---|---|---|
| 10000 | 0.030 s | 2.331 s |
| 100000 | 0.302 s | 23.311 s |
| 1000000 | 3.021 s | 233.111 s |

SPI, TWI and USART communication peripherals to transmit and receive data. In terms of power consumption, both microcontrollers draw about the same amount of power per MHz (roughly 1 mW/MHz at their maximum clock, according to Table I). In sleep mode each consumes only a few µW. On a small 100 mAh battery, the ATmega328P could stay in sleep mode for about 14 years, not taking other factors into consideration. [3] [4]
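The "about 14 years" figure can be reproduced with a short energy-budget calculation. This is a sketch under stated assumptions: the 4 µW sleep figure from Table I is treated as constant, the 100 mAh cell is assumed to deliver its full capacity at about 5 V (the supply voltage that makes the numbers agree), and self-discharge and wake-ups are ignored, as in the text.

```c
/* Hours in an average year (365.25 days). */
#define HOURS_PER_YEAR (365.25 * 24.0)

/*
 * Estimate battery life for a constant sleep-mode power draw.
 * Assumptions (not taken from the datasheets): ideal cell, constant
 * supply voltage, no self-discharge and no wake-up periods.
 */
double sleep_lifetime_years(double capacity_mah, double supply_v,
                            double sleep_power_uw)
{
    double energy_mwh = capacity_mah * supply_v;  /* mAh * V = mWh */
    double power_mw = sleep_power_uw / 1000.0;    /* uW -> mW      */
    double hours = energy_mwh / power_mw;
    return hours / HOURS_PER_YEAR;
}
```

With these assumptions, `sleep_lifetime_years(100.0, 5.0, 4.0)` gives roughly 14.3 years, while the same cell at an assumed 3 V gives about 8.6 years, which shows how sensitive such an estimate is to the supply voltage.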

To compare the actual performance of the two microcontrollers, a simple calculation test was written. Table II shows the results of using the Monte Carlo method to calculate Pi on both microcontrollers. The ATmega328P was clocked at 16 MHz since this frequency is directly supported by Arduino and its millis() function for measuring time. For the ATSAME70, time measurement was implemented with the hardware timers, since no dedicated function for measuring processor time was available. The results show that the ATSAME70 executed the algorithm about 77 times faster, while its clock frequency was only 18.75 times higher. This is mostly thanks to the hardware floating-point unit and the 32-bit core, which speed up the calculations. The ATSAME70 code was compiled with no optimizations.
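The gap between the 77× speedup and the 18.75× clock ratio can be made explicit by dividing one by the other, using the 1,000,000-iteration row of Table II. This is only an illustrative sketch: the helper names are hypothetical, and the simple ratio ignores other differences between the two builds, such as compiler settings and library routines like random().

```c
/* Ratio of two run times for the same workload. */
double speedup(double t_slow_s, double t_fast_s)
{
    return t_slow_s / t_fast_s;
}

/*
 * Part of the speedup that is not explained by the higher clock alone,
 * i.e. how much more work the faster core does per clock cycle.
 */
double per_cycle_gain(double t_slow_s, double t_fast_s,
                      double f_slow_mhz, double f_fast_mhz)
{
    return speedup(t_slow_s, t_fast_s) / (f_fast_mhz / f_slow_mhz);
}
```

For example, `per_cycle_gain(233.111, 3.021, 16.0, 300.0)` is roughly 4.1, so the ATSAME70 does about four times more useful work per clock cycle on this benchmark, which is the share the text attributes to the FPU and the 32-bit core.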


III. COMMUNICATION [3] [4]

When building a multi-microcontroller distributed system for high-performance computing, there are no communication standards that directly support it, unlike for some high-end CPUs. For example, Intel has developed its own point-to-point interconnect called QuickPath, with which microprocessors can exchange data at speeds of up to 19.2 GB/s [7]. For high-performance computing, ARM has developed CoreLink, which allows connecting up to 32 multi-core processors on the same interconnect with a bandwidth exceeding 1 TB/s [8].

Most microcontrollers support various communication interfaces that allow data to be passed between devices. The rate at which data can be transmitted and received is mostly determined by the clock sources supported by the devices or by the protocol used. Less powerful microcontrollers, which have slower clocks, also have limited communication speeds. For example, the ATmega328P's SPI clock is limited to half of the main clock, giving a maximum transfer rate of 10 Mbit/s at 20 MHz. The ATSAME70, on the other hand, can theoretically run SPI at 150 Mbit/s. This section gives an overview of the communication protocols available when designing a distributed/parallel embedded system.
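To put these rates into perspective, the sketch below computes the ideal time to move a payload over SPI. The 1 KiB payload in the usage note is an arbitrary example, and the result is a lower bound: on the ATmega328P, software still has to load every byte into the SPI data register (SPDR), which adds per-byte overhead that is not modelled here.

```c
#include <stdint.h>

/* Highest SPI master clock when it is limited to half the core clock,
 * as on the ATmega328P. */
double spi_max_bitrate_hz(double core_clock_hz)
{
    return core_clock_hz / 2.0;
}

/* Ideal time in microseconds to shift out `bytes` bytes at `bitrate_hz`,
 * ignoring per-byte software overhead and chip-select handling. */
double spi_transfer_time_us(uint32_t bytes, double bitrate_hz)
{
    return (bytes * 8.0) / bitrate_hz * 1e6;
}
```

At 20 MHz, `spi_max_bitrate_hz(20e6)` gives 10 Mbit/s, and sending 1024 bytes then takes at least about 819 µs; at the ATSAME70's theoretical 150 Mbit/s the same transfer takes about 55 µs.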

One of the most versatile communication methods is bit-banging: the microcontroller manipulates the communication lines in software and can therefore implement any protocol to transmit data. Because all of this must be done in software, it is complicated, hard to implement and consumes valuable resources needed for computation. A better option is to use one of the communication interfaces built into the microcontroller, which handles most of the low-level signaling while the central processing unit performs its own tasks.
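The cost of bit-banging is easy to see in a minimal sketch that sends one byte, most significant bit first, over a clock line and a data line. To keep it runnable on a PC, every write to the (hypothetical) output port is recorded in an array instead of going to real hardware; the pin assignments are assumptions, and the timing delays a real implementation needs are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define CLK_MASK  0x01u  /* hypothetical: clock line on port bit 0 */
#define DATA_MASK 0x02u  /* hypothetical: data line on port bit 1  */

/*
 * Bit-bang one byte MSB first. Each port write is appended to
 * `port_log` (which must hold 17 entries); on a board each store
 * would go to a GPIO output register instead.
 * Returns the number of port writes performed.
 */
size_t bitbang_send_byte(uint8_t byte, uint8_t *port_log)
{
    size_t n = 0;
    for (int bit = 7; bit >= 0; bit--) {
        uint8_t data = ((byte >> bit) & 1u) ? (uint8_t)DATA_MASK : 0u;
        port_log[n++] = data;                       /* clock low, set up data bit    */
        port_log[n++] = (uint8_t)(data | CLK_MASK); /* rising edge: receiver samples */
    }
    port_log[n++] = 0u;  /* return both lines to idle low */
    return n;
}
```

Every bit costs two port writes plus shifting and masking, all executed by the CPU, which is exactly the work a hardware SPI or I2C peripheral takes off the processor.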

One of those communication interfaces is the Inter-Integrated Circuit (I2C) bus. It uses two lines: clock (SCL) and data (SDA). The protocol was created by Philips Semiconductors (now NXP Semiconductors) and is widely used by most microcontroller manufacturers. Most devices support the standard clock rates of 100 kHz and 400 kHz, while higher clock rates, up to 5 MHz, are supported only by some higher-end devices. I2C uses at least a 7-bit address space: 128 addresses, of which 16 are reserved by the specification, leaving room for up to 112 devices on the same bus. More advanced microcontrollers also support 10-bit addressing, which provides 1024 addresses on a single bus. Because of its design, the I2C bus is recommended for use only on a single printed circuit board (PCB). Its slow communication speed limits which algorithms can run efficiently on such a distributed/parallel system. [6]
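The addressing rules can be illustrated with a small sketch: the first byte of every I2C transfer carries the 7-bit address followed by the read/write bit, and the specification [6] reserves the ranges 0000xxx and 1111xxx. The helper names are hypothetical; note that Arduino's Wire library takes the plain 7-bit address and adds the R/W bit itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* First byte of an I2C transfer: 7-bit address, then R/W (1 = read). */
uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* Addresses 0000xxx and 1111xxx are reserved, so ordinary devices
 * must use 0x08..0x77. */
bool i2c_address_usable(uint8_t addr7)
{
    return addr7 >= 0x08 && addr7 <= 0x77;
}

/* Count the 7-bit addresses available to ordinary devices. */
int i2c_usable_address_count(void)
{
    int count = 0;
    for (unsigned a = 0; a < 128; a++) {
        if (i2c_address_usable((uint8_t)a)) count++;
    }
    return count;
}
```

For example, a slave at address 0x42 is addressed with the byte 0x84 for a write and 0x85 for a read, and `i2c_usable_address_count()` returns 112.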

While I2C is half duplex, the Serial Peripheral Interface (SPI) can communicate in full duplex. It uses three lines for data transfer, clock (SCLK), master in slave out (MISO) and master out slave in (MOSI), plus one chip select (CS) line for each slave device. SPI can be used in multi-master mode, but this requires additional signaling lines to make sure that no collisions happen on the bus, whereas I2C is multi-master by design. The need for a chip select line per slave limits the number of microcontrollers on the same bus, since the pin count is physically limited by the microcontroller's package. SPI is usually the fastest interface that microcontrollers support. On the ATSAME70, SPI can run at a clock rate of up to 150 MHz, while the ATmega328P supports up to 10 MHz.
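What "full duplex" means for SPI can be shown by simulating the two 8-bit shift registers. On every clock, the master shifts one bit out on MOSI while the slave shifts one bit out on MISO, so after eight clocks the registers have swapped contents. This matches how the ATmega328P behaves, where writing the SPI data register (SPDR) starts a transfer and the same register holds the received byte afterwards; the function below is a hypothetical PC-side model, not driver code.

```c
#include <stdint.h>

/*
 * Simulated 8-bit SPI exchange between a master and one slave.
 * Each clock moves one bit master->slave (MOSI) and one bit
 * slave->master (MISO) at the same time, MSB first.
 */
void spi_exchange(uint8_t *master_reg, uint8_t *slave_reg)
{
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((*master_reg >> 7) & 1u);  /* master drives its MSB */
        uint8_t miso = (uint8_t)((*slave_reg >> 7) & 1u);   /* slave drives its MSB  */
        *master_reg = (uint8_t)((*master_reg << 1) | miso);
        *slave_reg  = (uint8_t)((*slave_reg << 1) | mosi);
    }
}
```

If the master starts with 0x3C and the slave with 0xC3, after one exchange the master holds 0xC3 and the slave holds 0x3C: one byte has travelled in each direction during the same eight clocks.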

Another very popular communication interface is the Universal Synchronous/Asynchronous Receiver/Transmitter (USART). As a standalone interface, it allows only point-to-point communication. To support multi-master or master-slave communication, additional hardware must be used. One option is RS485, which allows multiple devices with a USART to be connected to the same bus. A USART uses two lines, one for transmitting (TXD) and one for receiving (RXD) data, while RS485 uses two lines in half-duplex mode and four lines in full-duplex mode. Since RS485 is not directly supported and no communication protocol is defined for it, the user must implement the protocol in software.
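The protocol a user would have to write for an RS485 bus mostly consists of addressing and framing. The sketch below shows one possible, entirely hypothetical frame format (start byte, destination address, length, payload, 8-bit sum checksum); it is not a standard, and byte stuffing, which a real design would need if the start byte can appear in the payload, is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame: START | ADDR | LEN | PAYLOAD[LEN] | CHECKSUM,
 * where CHECKSUM is the 8-bit sum of ADDR, LEN and the payload. */
#define FRAME_START 0x7Eu

static uint8_t frame_checksum(uint8_t addr, const uint8_t *payload, uint8_t len)
{
    uint8_t sum = (uint8_t)(addr + len);
    for (uint8_t i = 0; i < len; i++) sum = (uint8_t)(sum + payload[i]);
    return sum;
}

/* Build a frame into `out` (at least len + 4 bytes); returns its length. */
size_t frame_encode(uint8_t addr, const uint8_t *payload, uint8_t len, uint8_t *out)
{
    size_t n = 0;
    out[n++] = FRAME_START;
    out[n++] = addr;
    out[n++] = len;
    for (uint8_t i = 0; i < len; i++) out[n++] = payload[i];
    out[n++] = frame_checksum(addr, payload, len);
    return n;
}

/* Return the payload length if the frame is intact and addressed to
 * `my_addr`; otherwise return -1 so the node ignores it. */
int frame_accept(const uint8_t *frame, size_t n, uint8_t my_addr)
{
    if (n < 4 || frame[0] != FRAME_START) return -1;
    uint8_t addr = frame[1], len = frame[2];
    if (n != (size_t)len + 4u) return -1;
    if (addr != my_addr) return -1;
    if (frame_checksum(addr, &frame[3], len) != frame[3 + len]) return -1;
    return len;
}
```

Every node on the bus receives every frame, so each one runs something like `frame_accept()` and silently drops frames meant for other addresses or corrupted in transit.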

Some of the more advanced communication protocols are only available on more advanced microcontrollers. One such protocol is Ethernet, which microcontrollers such as the ATSAME70 support only at the MAC level; an additional integrated circuit is required to transmit and receive the electrical signals. With a lot of research and advancement in the field of the Internet of Things (IoT), SPI-to-Ethernet converters are available that can be used with the ATmega328P. While the ATSAME70 can reach 100 Mbit/s, the ATmega328P is limited by its SPI speed of 10 Mbit/s. With IPv4 running over Ethernet, a local network can hold a practically unlimited number of devices. Ethernet also needs switches to allow communication between more than two devices. The biggest problem with Ethernet is its power consumption, usually around 100 mA to 500 mA for five devices, which is far more than the microcontrollers themselves use. The biggest power consumer is the magnetics that protect the devices from overvoltage and noise. If the distributed/parallel system is designed and implemented on a single PCB, the magnetics can be replaced by capacitors alone to reduce power consumption [2]. This requires special PCB design and testing. Although an IEEE working group maintains the backplane Ethernet standard IEEE 802.3ap, devices that support this standard are not widely commercially available.

Another communication standard found on some devices is the controller area network (CAN), which is mostly used in the automotive industry. Although it is slow, it is very robust. Another option is to build a wireless distributed/parallel computing network. Recently, several manufacturers have released low-energy wireless network modules based on the IEEE 802.15.4 standard. Networks built on it can support IPv6 (for example through 6LoWPAN) and can be self-healing, so losing some of the nodes does not make the network inoperable.

Based on microcontroller capabilities and performance, the ATSAME70 seems like the clear choice, but for high school students and hobbyists other factors are also very important. The most important is that the concepts are easy to understand


and that code development is not frustrating or hard to follow.

IV. SOFTWARE DEVELOPMENT

There are many software development tools available, and finding the right one for a particular task may be hard and confusing. For the ATSAME70 and ATmega328P there is an integrated development environment (IDE) developed by the microcontroller manufacturer itself, called Atmel Studio (Atmel was acquired by Microchip). Microchip also develops a hardware abstraction layer called the Advanced Software Framework (ASF) for both microcontrollers to make developing firmware for the devices easier. The ATmega328P is also supported by Arduino, which has developed its own IDE and hardware abstraction layer. The next two examples demonstrate the difference.

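/* Make sure the PIOD controller is clocked before using it */
if (pmc_is_periph_clk_enabled(ID_PIOD) == 0) {
    pmc_enable_periph_clk(ID_PIOD);
}
/* Configure PD11 as an input with no extra attributes (no pull-up, no filter) */
pio_set_input(PIOD, PIO_PD11, 0);
/* Read the current level of PD11 from the pin data status register */
int value = pio_get(PIOD, PIO_INPUT, PIO_PD11);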

Listing 1. Advanced Software Framework code example: setting a pin as an input and reading its value.

pinMode(3, INPUT);          // configure digital pin 3 as an input
int value = digitalRead(3); // read its level: HIGH or LOW

Listing 2. Arduino code example: setting a pin as an input and reading its value, where 3 is the pin number.

A. Atmel Studio and Advanced Software Framework

Atmel Studio is built on Microsoft Visual Studio and supports some of its features. Its most important advantage is that Microchip develops the environment directly, so starting a project is very easy and does not require advanced knowledge of hardware design, compilers and linkers. Using the Advanced Software Framework (ASF), however, is not very straightforward and requires knowledge of registers, individual bits and how to use them correctly. ASF makes code easier to write and understand, but it does not replace the deep knowledge needed to program microcontrollers. Atmel Studio and ASF include many example projects and demos to help developers.

B. Arduino

The ATmega328P is supported by the Arduino open source hardware initiative, which makes programming very easy and straightforward, with a huge number of examples and wide community support. Arduino has focused on making software writing easy and straightforward for simple projects. All of its hardware designs are open source, which makes creating electric circuits and PCB designs very simple and does not require deep knowledge of the actual microcontroller. Arduino does not support any specific environment for PCB and breadboard design; Eagle PCB is the tool the open source community mostly uses. By focusing on the software side, Arduino has created a hardware abstraction layer which completely hides registers

Fig. 1. Fritzing breadboard design view.

and individual bits from the user, making programming very easy to learn for high school students, hobbyists and beginners. Because of its simplicity and quick development cycle, some companies have used Arduino to bring their products to market.

V. FRITZING

Fritzing is another open source hardware initiative that makes hardware design and software writing easy for hobbyists and beginners. Its focus is more on developing hardware schematics, breadboard layouts and PCB designs. For a small project it is a viable tool, although it currently cannot really compete with tools developed by professional companies that offer free versions of their products for hobbyists, such as Eagle PCB, or with KiCad, which has long been developed by the community. Fritzing features four different views (breadboard, schematic, PCB and code) which are linked in the background: changes made in the schematic view are shown in the breadboard and PCB views, which makes designing electronic circuits quite easy. Fritzing already includes most of the Arduino hardware, and it also allows adding new components, such as the ATSAME70, to build electronics.

VI. PROTOTYPE DESIGN

Some Estonian schools offer optional courses in electronics, programming and robotics. These courses are very short compared to university courses. The most important goal is to teach students about parallel and distributed paradigms. This is best achieved by making the hardware and software design as simple as possible: the development environment must be simple to use and the code must be easy to understand. For this reason, Arduino was chosen as the platform for developing the hardware and software demonstrations.

As a demonstration, a schematic and PCB were designed in Fritzing to show how to build a multi-microcontroller distributed system and perform parallel computations. The printed circuit board (PCB) features two ATmega328P microcontrollers, each


Fig. 2. Fritzing schematic design view with a part of the prototype embedded parallel computing hardware.

Fig. 3. Fritzing PCB design view with a prototype embedded parallel computing hardware.

Fig. 4. Prototype electronics for high schools.

of which has its own USART-to-USB converter for programming and debugging the software. An SPI interface is also available for burning the bootloader onto the chip. The board also includes a reset button for each microcontroller and the other components necessary for it to work. The two microcontrollers are connected by an I2C bus so that data can be passed between them.

The current hardware design has some limitations that should be eliminated in the next version. Additional features would help users better understand how the software works and extend what can be done. One of the most useful features of official Arduino boards is their pin headers, which allow connecting different external devices to the microcontroller. All unused pins should be connected to an additional header. There should also be a header for the I2C bus so that multiple boards can be connected together to extend the parallel/distributed system. For visual feedback, at least one light emitting diode (LED) should be added per microcontroller. The current design uses FTDI chips to communicate over USB. These chips are expensive, so a more cost-effective solution should be considered.

Designing the hardware also revealed shortcomings in the software. Fritzing 0.9.3 seems to be a perfect tool, but as an open source project developed by the community it lacks some basic features that most paid and long-developed tools have. On the PCB side, Fritzing lacks the ability to draw wires at 90° and 45° angles, which is the most basic feature of other PCB design tools. PCB view rendering is also slow, which makes routing time-consuming and annoying; GPU support should be added to help render graphics. One new view that might be very helpful is a circuit simulator like the Falstad circuit simulator. It would let students measure values for different resistors, see LEDs blinking and design more complex circuits, all in one environment.

VII. USE CASE IN HIGH SCHOOLS

In electronics and programming courses, Fritzing can be used as a tool to help students better understand how the different levels of hardware and software design are intertwined. As a first step, students can use the schematic view to


TABLE III
PI CALCULATION RESULTS USING MONTE CARLO METHOD WITH SINGLE AND DUAL MICROCONTROLLER SETUP.

| Iterations | Single microcontroller | Dual microcontroller |
|---|---|---|
| 10000 | 2.338 s | 1.271 s |
| 100000 | 23.367 s | 11.787 s |
| 1000000 | 233.661 s | 116.934 s |

build the desired electronic circuits. After that, they can make breadboard designs, which help them build the circuit in the real world and test whether their design works. As a third step, they can create their own PCB designs, after which they will have a good understanding of how actual hardware development takes place. Once the hardware is designed, they can program the Arduino microcontrollers in the same environment, without needing to download or learn another tool. All this can take place over multiple courses and include a way to introduce the topics of distributed and parallel computing early on. For example, students could learn about simple algorithms like Pi calculation using the Monte Carlo method and build a distributed system to do it faster.

VIII. COMPUTATION RESULTS

The built hardware was used to perform Monte Carlo Pi calculations (Listing 3) with one and two microcontrollers. The two microcontrollers were in a master-slave configuration: the master received commands from the computer and then divided the workload between the two microcontrollers. The results are shown in TABLE III. Using two microcontrollers cut the computation time in half. This was expected, since the workload is divided equally and the microcontrollers communicate only at the beginning and at the end.
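Table III can be summarised with the usual speedup $S$ and efficiency $E$, where $T_1$ and $T_2$ are the single- and dual-microcontroller times and $p = 2$. The overhead term $T_{\text{oh}}$ below is an interpretation that assumes the work is split exactly in half, so any time beyond $T_1/p$ is attributed to communication and coordination:

$$S = \frac{T_1}{T_2}, \qquad E = \frac{S}{p}, \qquad T_{\text{oh}} = T_2 - \frac{T_1}{p}$$

$$10^4:\quad S = \frac{2.338}{1.271} \approx 1.84, \quad E \approx 0.92, \quad T_{\text{oh}} \approx 0.102\ \text{s}$$

$$10^5:\quad S = \frac{23.367}{11.787} \approx 1.98, \quad E \approx 0.99, \quad T_{\text{oh}} \approx 0.104\ \text{s}$$

$$10^6:\quad S = \frac{233.661}{116.934} \approx 1.998, \quad E \approx 0.999, \quad T_{\text{oh}} \approx 0.104\ \text{s}$$

The estimated overhead stays at roughly 0.1 s regardless of problem size, which is consistent with communication happening only at the start and end of the run, and it explains why the speedup falls noticeably below 2 only for the smallest workload.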

void monteCarloPi(unsigned long iterations, unsigned long *Mx, unsigned long *Nx)
{
    unsigned long M = 0; // samples inside the quarter circle
    unsigned long N = 0; // samples outside the quarter circle
    for (; iterations != 0; iterations--) {
        // Arduino's random(0, 65536) returns 0..65535, so x and y lie in [0, 1)
        double x = random(0, 65536) / 65536.0;
        double y = random(0, 65536) / 65536.0;
        if ((x * x + y * y) < 1) {
            ++M;
        } else {
            ++N;
        }
    }
    *Mx = M;
    *Nx = N;
}

double getPi(unsigned long M, unsigned long N)
{
    // the fraction M / (M + N) approximates pi / 4
    return 4.0 * M / (double)(M + N);
}

Listing 3. Monte Carlo Pi source code.
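The master-slave split from Section VIII can be sketched on a PC by simulating the nodes in a loop. This is a hypothetical illustration, not the firmware used on the prototype: the xorshift32 generator stands in for Arduino's random(), the seeds and function names are assumptions, and on the real board each share would be sent to a slave over I2C instead of being computed locally. Only the two counters per node need to travel back, which is why communication is limited to the start and end.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Arduino's random(): Marsaglia's xorshift32 (assumption). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Same counting loop as Listing 3, run by one node with its own seed. */
static void node_monte_carlo(unsigned long iterations, uint32_t seed,
                             unsigned long *inside, unsigned long *outside)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift must not start at 0 */
    unsigned long m = 0, n = 0;
    for (; iterations != 0; iterations--) {
        double x = (xorshift32(&state) & 0xFFFFu) / 65536.0;
        double y = (xorshift32(&state) & 0xFFFFu) / 65536.0;
        if (x * x + y * y < 1.0) ++m; else ++n;
    }
    *inside = m;
    *outside = n;
}

/*
 * Master side: split `total` (> 0) iterations as evenly as possible over
 * `nodes` (> 0) workers, collect the partial counts and apply getPi().
 */
double distributed_pi(unsigned long total, unsigned nodes,
                      unsigned long *sum_inside, unsigned long *sum_outside)
{
    assert(nodes > 0 && total > 0);
    unsigned long m_total = 0, n_total = 0;
    for (unsigned k = 0; k < nodes; k++) {
        unsigned long share = total / nodes + (k < total % nodes ? 1 : 0);
        unsigned long m, n;
        node_monte_carlo(share, 0x9E3779B9u + k, &m, &n);  /* distinct seed per node */
        m_total += m;
        n_total += n;
    }
    *sum_inside = m_total;
    *sum_outside = n_total;
    return 4.0 * m_total / (double)(m_total + n_total);
}
```

Splitting the remainder one iteration at a time keeps the shares within one iteration of each other, so no node finishes noticeably later than the rest.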

To improve the performance of the demonstration, the source code could be written without floating-point math, which should more than double the actual performance. Another idea would be to use assembly language, but it is hardware specific and requires a lot of experience to write efficient code. On microcontrollers, where hardware cost is low, it usually does not pay off; assembly could reduce costs in high-performance computing centers, where costs are much higher.
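One way to remove floating point from the inner loop is to keep the coordinates as integers. In this sketch x and y are 15-bit integers in [0, 32768), i.e. the unit square scaled by 2^15, and a point is inside the quarter circle when x² + y² < 2^30. The largest possible sum, 2 · 32767² = 2,147,352,578, still fits in an unsigned 32-bit integer. The linear congruential generator (constants 1664525 and 1013904223) is only an assumption to keep the example self-contained, and the "more than double" speedup claimed above has not been measured with this code; 32-bit multiplication is itself emulated on the 8-bit AVR, but it should still be much cheaper than software floating point.

```c
#include <stdint.h>

/* Hypothetical 32-bit LCG used only to make the sketch self-contained. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Integer-only version of the Listing 3 counting loop. */
void monte_carlo_pi_fixed(unsigned long iterations, uint32_t seed,
                          unsigned long *inside, unsigned long *outside)
{
    uint32_t state = seed;
    unsigned long m = 0, n = 0;
    for (; iterations != 0; iterations--) {
        uint32_t x = lcg_next(&state) >> 17;  /* top 15 bits: 0..32767 */
        uint32_t y = lcg_next(&state) >> 17;
        if (x * x + y * y < (1UL << 30)) ++m; else ++n;  /* no overflow, see above */
    }
    *inside = m;
    *outside = n;
}
```

The Pi estimate is then computed once at the end, for example with getPi() from Listing 3, so only a single floating-point division remains per run.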

IX. CONCLUSION

The paper describes possible solutions for building embedded distributed/parallel computing hardware. A comparison of two different microcontrollers, an overview of communication interfaces and an overview of available development environments were given. After Fritzing was picked for hardware development, demonstration hardware was designed, possible uses for high school students were described, and results of calculating Pi with the Monte Carlo method on single and dual microcontroller setups were presented. The results show that parallel/distributed paradigms can be demonstrated to high school students with simple examples and affordable hardware. The next step is to set up a curriculum and see how well students understand the concepts.

Project website: http://fritzing.org/projects/embedded-cluster-prototype

REFERENCES

[1] Liem Radita Tapaning Hesti. (2017). GEMM in Multicore Arduinos [Online]. Available: https://courses.cs.ut.ee/MTAT.08.020/2016_fall/uploads/Main/Gemm.pdf

[2] Texas Instruments. (2013). AN-1519 DP83848 PHYTER Transformerless Ethernet Operation [Online]. Available: http://www.ti.com/lit/an/snla088a/snla088a.pdf

[3] Microchip. (2016). ATmega328/P - Complete Datasheet [Online]. Available: http://www.microchip.com/wwwproducts/en/ATmega328P

[4] Microchip. (2016). SAM E70 Datasheet [Online]. Available: http://www.microchip.com/wwwproducts/en/ATSAME70Q21

[5] E. Vita. (2014). Arduino Nano-Rev3.2 [Online]. Available: https://www.arduino.cc/en/Main/arduinoBoardNano

[6] NXP. (2014). UM10204 I2C-bus specification and user manual [Online]. Available: http://www.nxp.com/documents/user_manual/UM10204.pdf

[7] Intel. (2009). An Introduction to the Intel QuickPath Interconnect [Online]. Available: http://www.intel.com/content/www/us/en/io/quickpath-technology/quick-path-interconnect-introduction-paper.html

[8] ARM. (2017, May 10). CoreLink CMN-600 Coherent Mesh Network [Online]. Available: https://www.arm.com/products/system-ip/interconnect/corelink-cmn-600.php