
Page 1: Assignment 9

MSRSAS - Postgraduate Engineering and Management Programme - PEMP

i

ASSIGNMENT

Module Code ESD 532

Module Name Multi core Architecture and Programming

Course M.Sc. [Engg.] in Real Time Embedded Systems

Department Computer Engineering

Name of the Student Bhargav Shah

Reg. No CHB0910001

Batch Full-Time 2011.

Module Leader Padma Priya Dharishini P.

POSTGRADUATE ENGINEERING AND MANAGEMENT PROGRAMME – (PEMP)

M.S.Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programmes(PEMP)

#470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058

Tel: 080 4906 5555, Website: www.msrsas.org


Declaration Sheet

Student Name Bhargav Shah

Reg. No CHB0910001

Course RTES

Batch Full-Time 2011

Module Code ESD 532

Module Title Multi Core Architecture and Programming

Module Date 06-02-2012 to 03-03-2012

Module Leader Padma Priya Dharishini P.

Extension requests: Extensions can only be granted by the Head of the Department in consultation with the module leader.

Extensions granted by any other person will not be accepted and hence the assignment will incur a penalty.

Extensions MUST be requested by using the ‘Extension Request Form’, which is available with the ARO.

A copy of the extension approval must be attached to the assignment submitted.

Penalty for late submission

Unless you have submitted proof of mitigating circumstances or have been granted an extension, the

penalties for a late submission of an assignment shall be as follows:

• Up to one week late: Penalty of 5 marks

• One-Two weeks late: Penalty of 10 marks

• More than Two weeks late: Fail - 0% recorded (F)

All late assignments: must be submitted to Academic Records Office (ARO). It is your responsibility to

ensure that the receipt of a late assignment is recorded in the ARO. If an extension was agreed, the

authorization should be submitted to ARO during the submission of assignment.

To ensure assignment reports are written concisely, the length should be restricted to a limit

indicated in the assignment problem statement. Assignment reports greater than this length may

incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment

report.

Declaration

The assignment submitted herewith is a result of my own investigations, and I have conformed to the

guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and

results, which have been obtained from other sources, are fully referenced. I understand that cheating and

plagiarism constitute a breach of University regulations and will be dealt with accordingly.

Signature of the student Date

Submission date stamp (by ARO)

Signature of the Module Leader and date Signature of Head of the Department and date


M. S. Ramaiah School of Advanced Studies Postgraduate Engineering and Management Programme- Coventry University (UK)

Assessment Sheet

Department Computer Engineering

Course RTES Batch Full-Time 2011

Module Code ESD 532 Module Title Multi core Architecture and Programming

Module Leader Padma Priya Dharishini P.

Module Completion Date 03-03-2012

Student Name Bhargav Shah ID Number CHB0910001

Attendance Details Theory Laboratory Fine Paid (if any for shortage of attendance)

Remarks

Written Examination – Marks – Sheet (Assessor to Fill)

Q. No a b c d Total Remarks

1

2

3

4

5

6

Marks Scored for 100 Marks Scored out of 50

Result PASS FAIL

Assignment – Marks-Sheet (Assessor to Fill)

Part a b c d Total Remarks

A

B

C

Marks Scored for 100 Marks Scored out of 50

Result PASS FAIL

PMAR- form completed for student feedback (Assessor has to mark) Yes No

Overall-Result

Components Assessor Reviewer

Written Examination (Max 50) Pass / Fail

Assignment (Max 50) Pass / Fail

Total Marks (Max 100) (Before Late Penalty) Grade

Total Marks (Max 100) (After Late Penalty) Grade

IMPORTANT

1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields

2. A minimum of 40% is required for a pass in both assignment and written test individually

3. A student cannot fail on application of late penalty (i.e. on application of late penalty if the marks are below 40, cap at 40 marks)

Signature of Reviewer with date Signature of Module Leader with date


Abstract

Multi-core processors may provide higher performance than current embedded processors to support future embedded-system functionality. According to the Industrial Advisory Board, embedded systems will benefit from multi-core processors because these systems comprise mixed applications, i.e. applications with and without hard real-time constraints, that can execute on the same processor.

Moreover, the Industrial Advisory Board also stated that memory operations represent one of the main bottlenecks that current embedded applications must face, being even more important than the performance of the core, which can suffer a degradation of 10-20% without really affecting overall performance. We take advantage of this fact by studying the effect of running several threads per core, that is, by making the core multithreaded. We also studied the effect of caches, a well-known technique in high-performance computing for reducing the memory bottleneck.

Chapter 1 discusses arbitration schemes for memory access in multi-core systems: the types of arbitration schemes that exist to date, which of them is the best, the challenging factors these schemes face at present, and finally a short note on the factors that support the proposed arbitration scheme.

Chapter 2 discusses a multi-threaded producer-consumer concept: how producer and consumer threads share a common queue, how to prioritize the threads when a common resource is shared, and some test cases that exercise the scenarios.

Chapter 3 discusses a different situation with four producers, each with its own queue, and a single consumer. It covers changing the priority level of the consumer so that, in a conflicting condition with the consumer thread, the producer gets the higher priority to execute.


Contents

Declaration Sheet ................................................................................................................................. ii

Abstract .............................................................................................................................................. iv

List of Figures ................................................................................................................................... vii

Symbols ............................................................................................................................................. vii

Nomenclature...................................................................................................................................viii

CHAPTER 1 ....................................................................................................................................... 9

Arbitration schemes of memory access in multi core ..................................................................... 9

1.1 Introduction ...........................................................................................................................9

1.2 Types of arbitration schemes .................................................................................................9

1.3 Challenges in arbitration schemes .......................................................................................10

1.4 Impact of the arbitration schemes on throughput and latency .................................................11

1.5 Proposal of better arbitration scheme with justification ..........................................................11

1.6 Conclusion ...............................................................................................................................12

CHAPTER 2 ..................................................................................................................................... 13

Development of Consumer Producer Application ........................................................................ 13

2.1 Introduction ..............................................................................................................................13

2.2 Sequence diagram ....................................................................................................................13

2.3 Development of parallelized program using Pthread/openMP ................................................14

2.4 Test cases and Testing results for scenario 1 ...........................................................................17

2.4.1 Test cases ........................................................................................................................................ 17

2.4.2 Testing results ................................................................................................................................. 18

2.5 Sequence diagram .............................................................................................................................. 19

2.6 Development of paralleled program using pthread/openMP ...................................................20

2.4 Test cases and Testing results for scenario 2 ...........................................................................23

2.4.1 Test cases ........................................................................................................................................ 23

2.4.2 Testing results ................................................................................................................................. 24

2.5 Conclusion ...............................................................................................................................25

CHAPTER 3 ..................................................................................................................................... 26

Development of Consumer Producer Application with extended priority concept ................... 26

3.1 Introduction ..............................................................................................................................26

3.2 Sequence diagram ................................................................................................................26

3.2 Development of designed application ......................................................................................27

3.3 Test cases and testing results for scenario 3 ........................................................................34

3.3.1 Test cases ........................................................................................................................................ 34

3.3.2 Documentation of the results ........................................................................................................ 35

3.5 Conclusion ...............................................................................................................................36

CHAPTER 4 ..................................................................................................................................... 37

4.1 Module Learning Outcomes .....................................................................................................37

4.2 Conclusion ...............................................................................................................................37

References ......................................................................................................................................... 38

Appendix-1 ........................................................................................................................................ 39

Appendix-2 ........................................................................................................................................ 40


List of Tables

Table 2. 1 Test cases for single producer single consumer ................................................................17

Table 2. 2 Test cases for single producer single consumer ................................................................23

Table 3. 1 Test cases for higher priority consumer thread .................................................................34


List of Figures

Figure 2. 1 Sequence diagram for one producer and one consumer ..................................................14

Figure 2. 2 Including Libraries and files for scenario 1 .....................................................................14

Figure 2. 3 Declaration of mutex and structures for scenario 1 ......................................................14

Figure 2. 4 Function to create new list for scenario 1 ........................................................................15

Figure 2. 5 Main function for Application of scenario 1 ...................................................................15

Figure 2. 6 Body of producer thread for scenario 1 ...........................................................................16

Figure 2. 7 Body of consumer thread for scenario 1 ..........................................................................17

Figure 2. 8 Producer thread is waiting for value in critical region ....................................................18

Figure 2. 9 Consumer thread is printing the value inserted by producer thread .......................18

Figure 2. 10 Sequence diagram of three producer one consumer ......................................................19

Figure 2. 11 Including Libraries and files for scenario 2 ...................................................................20

Figure 2. 12 Declaration of mutex and structures for scenario 2 ....................................................20

Figure 2. 13 Function to create new list for scenario 2 ......................................................................20

Figure 2. 14 Main function for application of scenario 2 ..................................................................21

Figure 2. 15 Body of producer thread for scenario 2 .........................................................................22

Figure 2. 16 Body of consumer thread for scenario 2 ........................................................................22

Figure 2. 17 Producer thread is waiting in critical region ..................................................................24

Figure 2. 18 Consumer thread is active after all the producer threads finish the critical region .......24

Figure3. 1 Sequence diagram for prioritized consumer thread ..........................................................27

Figure3. 2 Including library files for scenario 3 ................................................................................27

Figure3. 3 Declaration of constructive functions for scenario 3 ........................................................28

Figure3. 4 Declaration of list pointers and location pointers ...........................................................28

Figure3. 5 Definition of constructive functions .................................................................................28

Figure3. 6 Declaration of thread function and synchronization objects ............................................29

Figure3. 7 Main function for application of scenario 3 ....................................................29

Figure3. 8 First producer thread .........................................................................................................30

Figure3. 9 Second producer thread ....................................................................................................30

Figure3. 10 Third producer thread .....................................................................................................31

Figure3. 11 Fourth producer thread ...................................................................................................31

Figure3. 12 Consumer thread with highest priority queue .................................................................32

Figure3. 13 Continuation of consumer thread for second priority queue ..........................................33

Figure3. 14 Continuation of consumer thread for third priority queue .............................................33

Figure3. 15 Continuation of consumer thread for last priority queue ................................................34

Figure3. 16 Results of test cases ........................................................................................................35


Nomenclature

WRR Weighted Round Robin

CMP Chip Multiprocessors

SDRAM Synchronous Dynamic Random Access Memory

DRR Deficit Round Robin

SRR Stratified Round Robin

PD Priority Division

PBS Priority Based Budget Scheduler

TDMA Time Division Multiple Access

CCSP Credit Controlled Static Priority


CHAPTER 1

Arbitration schemes of memory access in multi core

1.1 Introduction

The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-

efficiency and performance can be met by using multi core processors (CMP or chip

multiprocessors). On typical medium size CMPs, the cores share a bus to the highest levels of the

memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and

exchange information. An off-chip memory is one of the most common shared resources. SDRAM

is a popular off-chip memory currently used in cost sensitive and performance demanding

applications due to its low price, high data rate and large storage. An asynchronous refresh

operation and a dependence on the previous access make SDRAM access latency vary by an order

of magnitude. The main contribution of this report is to critically compare the throughput and latency of the available arbitration schemes for multi-core systems. At the end, a justification for the better arbitration scheme is derived from the analysis.

1.2 Types of arbitration schemes[1]

There have been many approaches to provide fairness, high throughput and worst case latency

bounds in the arbiter especially in the networks domain.

Weighted Round Robin (WRR) is a work conserving arbiter where cores are allocated a

number of slots within a round robin cycle depending on their bandwidth requirements. If a core

does not use its slot, the next active core in the round robin cycle is immediately assigned to

increase the throughput. Cores producing bursty traffic benefit at the cost of cores which produce

uniform traffic.
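The work-conserving behaviour described above can be sketched in a few lines of C. This is an illustrative model only: the names (wrr_init, wrr_grant) and the flattened slot table are our own formulation, not taken from any cited arbiter implementation.

```c
#include <assert.h>

/* Sketch of a work-conserving Weighted Round Robin arbiter.  Each core
 * gets weights[i] slots per round-robin cycle; if a slot's owner has no
 * pending request, the next active core in the cycle is granted instead,
 * so no cycle is wasted (work conservation). */
#define NCORES 4

static int slot_table[64];  /* flattened slot schedule built from weights */
static int slot_count = 0;
static int wrr_pos = 0;     /* next slot to consider */

void wrr_init(const int weights[NCORES]) {
    slot_count = 0;
    wrr_pos = 0;
    for (int c = 0; c < NCORES; c++)
        for (int s = 0; s < weights[c]; s++) {
            assert(slot_count < 64);     /* keep within the fixed table */
            slot_table[slot_count++] = c;
        }
}

/* Grant the bus for one cycle; active[i] != 0 means core i is requesting.
 * Returns the granted core index, or -1 if nobody is requesting. */
int wrr_grant(const int active[NCORES]) {
    for (int tried = 0; tried < slot_count; tried++) {
        int core = slot_table[(wrr_pos + tried) % slot_count];
        if (active[core]) {
            wrr_pos = (wrr_pos + tried + 1) % slot_count;
            return core;
        }
    }
    return -1;
}
```

Note how a bursty core can absorb the slots of idle cores, which is exactly the unfairness towards uniform-traffic cores mentioned above.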

Deficit Round Robin (DRR) assigns different slot sizes to each master according to its

bandwidth requirements and schedules them in a Round Robin (RR) fashion. The difference between

DRR and RR is that if a master cannot use its slot or part of its slot in the current cycle, the

remaining slot (deficit) is added into the next cycle. In the next cycle, the master can transfer up to

an amount of data equal to the sum of its slot size and the deficit. Thus, the DRR tries to avoid the

unfairness caused to uniform traffic generators in the WRR.
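The deficit accounting itself can be sketched compactly; the following is an illustrative model with hypothetical names (drr_t, drr_serve), not the published DRR algorithm in full, which also manages per-master request queues.

```c
#include <assert.h>

/* Sketch of Deficit Round Robin accounting.  Each master has a fixed
 * quantum (slot size); whatever it cannot use in the current cycle is
 * carried over as deficit into the next cycle. */
#define NMASTERS 3

typedef struct {
    int quantum[NMASTERS];  /* slot size allocated per cycle */
    int deficit[NMASTERS];  /* unused credit carried forward */
} drr_t;

void drr_init(drr_t *d, const int quantum[NMASTERS]) {
    for (int i = 0; i < NMASTERS; i++) {
        assert(quantum[i] > 0);
        d->quantum[i] = quantum[i];
        d->deficit[i] = 0;
    }
}

/* One round-robin visit of master i: 'demand' is how much the master
 * wants to transfer; the amount actually granted is capped by
 * quantum + deficit, and the unused part becomes the new deficit. */
int drr_serve(drr_t *d, int i, int demand) {
    int budget = d->quantum[i] + d->deficit[i];
    int sent = demand < budget ? demand : budget;
    d->deficit[i] = budget - sent;
    return sent;
}
```

A master that stays idle keeps accumulating deficit in this model, which is precisely the worst-case latency problem attributed to DRR in Section 1.4.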

Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one class. After grouping masters into various classes, a two-step arbitration is applied: inter-class and intra-class. The inter-class scheduler schedules each class Fk once every 2^k clock cycles. Hence, the smaller the k, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to select the next master within the class. Due to a more uniform distribution of bandwidth, SRR reduces the worst-case latencies compared to the WRR. However, to achieve a low worst-case latency for a class Fk, k must be minimized, which leads to over-allocation.

Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high

resource utilization. Instead of fixing TDMA slots statically, PD fixes priorities of each master

within the slot statically such that each master has at least one slot where it has the highest priority.

Thus, masters have guarantees equal to TDMA and unused slots are arbitrated based on static

priority to increase the resource utilization. This approach provides benefit over RR or WRR only if

the response time of the shared resource is fixed. In the case of variable response time (e.g.

SDRAM), this approach produces high worst case latencies.

In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of access in a

unit time (Replenishment Period). Moreover, masters are also assigned fixed priorities to resolve

conflicts. Budget relates to master's bandwidth requirements while priority relates to master's

latency requirements. Thus, the coupling between latency and bandwidth is removed. The shared

resource is granted to the active master with the highest priority which still has a budget left. At the

beginning of a replenishment period, each master gets its original budget back.
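A minimal sketch of the PBS grant rule, under the assumption that masters are indexed in priority order (0 = highest); the function names are illustrative, not from the cited work.

```c
#include <assert.h>

/* Priority Based Budget Scheduler sketch: every master has a fixed
 * budget of accesses per replenishment period and a static priority.
 * The resource goes to the highest-priority active master that still
 * has budget left; budgets are restored each replenishment period. */
#define NM 3

typedef struct {
    int initial[NM];  /* budget granted at the start of each period */
    int budget[NM];   /* budget remaining in the current period */
} pbs_t;

void pbs_replenish(pbs_t *p) {
    for (int i = 0; i < NM; i++)
        p->budget[i] = p->initial[i];
}

/* Grant one access; masters are indexed in priority order (0 highest).
 * Returns the granted master, or -1 if none is eligible. */
int pbs_grant(pbs_t *p, const int active[NM]) {
    for (int i = 0; i < NM; i++)
        if (active[i] && p->budget[i] > 0) {
            p->budget[i]--;
            return i;
        }
    return -1;
}
```

Because latency is governed by the priority while bandwidth is governed by the budget, the two requirements are decoupled, as the text notes.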

Akesson et al. introduce a Credit Controlled Static Priority (CCSP) arbiter. The CCSP also uses priorities and budgets within the replenishment period, but instead of using frame-based replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.

1.3 Challenges in arbitration schemes

The traditional shared-bus arbitration schemes such as TDMA and round robin show several defects, such as bus starvation and low system performance. In strict priority scheduling, the higher-priority packets can get most of the bandwidth; therefore the lower-priority packets have to wait longer for resource allocation, which causes starvation of the lower-priority packets. A drawback of WRR and LARD regarding power consumption is that both always keep their servers turned on even though some of them do not serve any requests; therefore, they cannot conserve any power. Weighted Round-Robin and Deficit Round-Robin are extensions that

guarantee each requestor a minimum service, proportional to an allocated rate, in a common

periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the

Deferrable Server, and suffers from an inherent coupling between allocation granularity and

latency, where allocation granularity is inversely proportional to the frame size. Larger frame size

results in finer allocation granularity, reducing over allocation, but at the cost of increased latencies

for all requestors. Another common example of frame-based scheduling disciplines is time-division multiplexing that suffers from the additional disadvantage that it requires a schedule to be stored for

each configuration, which is very costly if the frame size or the number of use cases is large [2].

The above arbitration algorithms cannot handle strict real-time requirements, so a two-level arbitration algorithm called RB_Lottery bus arbitration has been developed. It solves the impartiality, starvation and real-time problems that exist in the Lottery method and reduces the average latency of bus requests [5]. In hardware verification, the proposed arbiter achieves a higher operating frequency than the Lottery arbiter, although it spends more chip area and power than the Lottery arbiter; it also has a lower average latency of bus requests than Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency[4]

In each approach to providing fairness, high throughput and worst-case latency bounds, optimizing one factor degrades the others. In Weighted Round Robin, to provide a low worst-case latency to any core, it has to be assigned more slots in the round robin cycle, which leads to

over allocation. Deficit Round Robin (DRR) has very high latencies in the worst case. For example,

one master stays idle for a long time and gains a high deficit. Afterwards, it continuously requests

the shared resource. Since it has gained a high deficit, it will occupy the shared resource for a long

time incurring very high latencies to other masters.

Due to the presence of priorities, PBS is fair to high priority masters and unfair to low priority

masters. When all masters are executing hard real-time tasks (HRTs), PBS results in large

WCETs for low priority masters. In Credit Controlled Static Priority (CCSP), likewise, the presence of priorities produces large worst-case execution time bounds for lower priority masters.

1.5 Proposal of better arbitration scheme with justification

Stratified Round Robin is better when compared to the other arbitration schemes, since it is a fair-queuing packet scheduler that has good fairness and delay properties and low complexity. It is unique among all other schedulers of comparable complexity in that it provides a single packet delay bound that is independent of the number of flows. Importantly, it also enables a simple

hardware implementation, and thus fills a current gap between scheduling algorithms that have

provably good performance and those that are feasible and practical to implement in high-speed

routers. Interactive applications such as video and audio conferencing require the total delay

experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler

decides the order in which packets are sent on the output link, and therefore determines the queuing

delay experienced by a packet at each intermediate router in the network. Low complexity also matters: with line rates increasing to 40 Gbps, it is critical that all packet processing tasks performed by routers, including output scheduling, be able to operate in nanosecond time frames.

1.6 Conclusion

By critically comparing the throughput and latency of the available arbitration schemes for multi-core systems, Stratified Round Robin proves better than the other arbitration schemes, since it is a fair-queuing packet scheduler that has good fairness and delay properties and low complexity. Even though it still has negative aspects, it is hoped that replacements will be developed in the future.


CHAPTER 2

Development of Consumer Producer Application

2.1 Introduction

Today, the world of software development is presented with a new challenge. To fully

leverage this new class of multi-core hardware, software developers must change the way they

create applications. By turning their focus to multi-threaded applications, developers will be able to

take full advantage of multi-core devices and deliver software that meets the demands of the world.

But this paradigm of multi-threaded software development adds a new wrinkle of complexity for

those who care the utmost about software quality. Concurrency defects such as race conditions and

deadlocks are software defect types that are unique to multi-threaded applications. Complex and

hard-to-find, these defects can quickly derail a software project. To avoid catastrophic failures in

multithreaded applications, software development organizations must understand how to identify

and eliminate these deadly problems early in the application development lifecycle.

As part of this work, a multi-threaded producer-consumer application is created using the given linked-list program. Two scenarios are accommodated in this part of the document. In the first case, the producer inserts one value into the doubly linked list and, at the other end, the consumer reads that value and deletes it. In the second case, three producer threads try to insert values into the linked list, and at the end one consumer thread tries to read and delete them. A proper synchronization mechanism is developed.

2.2 Sequence diagram

A sequence diagram is a kind of interaction diagram that shows how processes operate with

one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram

shows object interactions arranged in time sequence. It depicts the objects and classes involved in

the scenario and the sequence of messages exchanged between the objects needed to carry out the

functionality of the scenario. Sequence diagrams typically are associated with use case realizations

in the Logical View of the system under development.

Figure 2.1 shows the sequence diagram for the one producer and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources. The producer thread is shown at the top left and the consumer thread at the top right. At the start, the producer has to write data into the linked list. But the linked list is shared between the producer and consumer threads, so to provide synchronization between them a mutex is used. The producer locks the mutex and writes data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by the producer and fails. In this case the consumer thread has to wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1. In this application the consumer cannot read the data until the producer produces it and stores it in the linked list. This synchronization mechanism is achieved by using a mutex.

Figure 2. 1 Sequence diagram for one producer and one consumer
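The handshake in Figure 2.1 can be sketched with the pthread API as follows. This is a simplified stand-in for the report's actual code (which lives in the figures and uses a linked list): a single shared integer replaces the list, and all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* One producer stores a value under the mutex; one consumer repeatedly
 * takes the lock and reads the value once it is available.  (A condition
 * variable would avoid the consumer's polling loop; the plain-mutex
 * version mirrors the mechanism described in the text.) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_value = 0;
static int value_ready  = 0;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);    /* enter the critical region */
    shared_value = 42;            /* stands in for inserting into the list */
    value_ready = 1;
    pthread_mutex_unlock(&lock);  /* let the waiting consumer proceed */
    return NULL;
}

static void *consumer(void *arg) {
    int got = 0, done = 0;
    while (!done) {
        pthread_mutex_lock(&lock);   /* blocks while the producer holds it */
        if (value_ready) {
            got = shared_value;      /* read and "delete" the value */
            value_ready = 0;
            done = 1;
        }
        pthread_mutex_unlock(&lock);
    }
    *(int *)arg = got;
    return NULL;
}

/* Run both threads and return the value the consumer observed. */
int run_demo(void) {
    int result = 0;
    pthread_t prod, cons;
    pthread_create(&prod, NULL, producer, NULL);
    pthread_create(&cons, NULL, consumer, &result);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return result;
}
```

Compile with the `-pthread` flag.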

2.3 Development of parallelized program using Pthread/openMP

There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other uses the OpenMP APIs. For this scenario, the pthread APIs are chosen to develop the single-producer, single-consumer application.

Figure 2. 2 Including Libraries and files for scenario 1

Figure 2.2 shows all the preprocessor statements of the code segment. The first is the “ll2.c” file, which contains the definitions of all the functions related to the linked-list operations. The second is “pthread.h”, which contains the declarations of all the threading-related APIs. The last two are standard library headers for common functions. In the last lines of the image, a function named create is declared.

Figure 2. 3 Declaration of mutex and structures for scenario 1

(Figure 2.1 callouts: the producer is in the critical region; the consumer tries to obtain the mutex but fails, and has to wait until the resource is freed by the producer.)


To obtain synchronization in the application a mutex is used. Here “lock” is defined as the pthread mutex object. It is essential to initialize the mutex before using it; here initialization is handled by assigning the macro PTHREAD_MUTEX_INITIALIZER. A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer *p is created to point to the current position for accessing values. All these declarations are shown in Figure 2.3.

Figure 2. 4 Function to create new list for scenario 1

Figure 2.4 shows the definition of the creat function. Calling this function creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. In the second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat().

Figure 2. 5 Main function for Application of scenario 1

Figure 2.5 shows the main function for the single-producer, single-consumer application. In the figure, two functions are declared with a void-pointer argument and a void-pointer return type. The function named “ser” is called by the producer thread; the consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to obtain the return value from the thread functions. Two thread objects are defined, named “t_ser” and “t_cli”. On successful creation of the producer thread, the ID of the thread is stored in


“t_ser”, and the ID of the consumer thread is stored in “t_cli”. To create the threads, the “pthread_create” API is used with the appropriate arguments. In this application two threads are created: one producer thread and one consumer thread. The consumer thread dies automatically if the main thread exits. To avoid this situation the main thread has to wait until the consumer thread exits successfully; this mechanism is provided by the “pthread_join” API.
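The thread creation and joining described above can be sketched as follows. This is a minimal sketch rather than the assignment's exact code: the shared ll2.c list is replaced by a plain integer, the producer stores a fixed value instead of reading one from the user, and run_pair is an illustrative wrapper for what the assignment does in main().

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_value;   /* stands in for the shared linked list */
static int value_ready;    /* set once the producer has stored data */

/* Producer body ("ser" in the assignment): lock, store, unlock */
static void *ser(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_value = 42;          /* the assignment reads this from the user */
    value_ready = 1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Consumer body ("cli" in the assignment): lock, read, unlock */
static void *cli(void *arg) {
    int *out = arg;
    pthread_mutex_lock(&lock);
    *out = value_ready ? shared_value : -1;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* In the assignment this logic lives in main(): create both threads,
 * then join them so the main thread outlives its children. */
int run_pair(void) {
    pthread_t t_ser, t_cli;
    int read_back = 0;
    if (pthread_create(&t_ser, NULL, ser, NULL) != 0) return -1;
    pthread_join(t_ser, NULL);  /* keeps the sketch deterministic */
    if (pthread_create(&t_cli, NULL, cli, &read_back) != 0) return -1;
    pthread_join(t_cli, NULL);
    return read_back;
}
```

Joining the producer before creating the consumer keeps this sketch deterministic; the assignment instead relies on the mutex for the ordering between the two running threads.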

Figure 2. 6 Body of producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread, creat() is called; this creates one new list and assigns the pointer p to the first location. The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition the mutex named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and enter the critical region. After this, the producer thread takes a value from the user into the variable “val”. The entered value is stored in the list and the position pointer p is updated with the new current location. The storing mechanism is provided by the function “list_inserLast”, with arguments of the list object (myList) and the value to be inserted. After successful insertion of the value into the list, any thread can get that value; so, to end the critical region and release the obtained mutex, the “pthread_mutex_unlock” function is used. If the consumer thread tries to take the mutex or access the critical section while the producer holds it, it has to wait until the producer releases it. So, after the producer unlocks the mutex, the consumer thread can acquire the resource.
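The lock-insert-unlock pattern of the producer body can be sketched as below. The list types and list_insert_last here are simplified stand-ins for the assignment's ll2.c API (whose real function is list_inserLast), and the value arrives through the thread argument rather than from the user.

```c
#include <pthread.h>
#include <stdlib.h>

/* Simplified stand-ins for the list types defined in ll2.c */
struct node { int value; struct node *next; };
struct list_head { struct node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static struct list_head myList;   /* shared between producer and consumer */

/* Append a value at the tail, like the assignment's list_inserLast() */
static void list_insert_last(struct list_head *l, int v) {
    struct node *n = malloc(sizeof *n);
    n->value = v;
    n->next = NULL;
    if (!l->first) {
        l->first = n;
    } else {
        struct node *t = l->first;
        while (t->next) t = t->next;
        t->next = n;
    }
    l->size++;
}

/* Producer body: take the mutex, store the value, release the mutex */
void *producer(void *arg) {
    int val = *(int *)arg;
    pthread_mutex_lock(&lock);      /* enter the critical region */
    list_insert_last(&myList, val);
    pthread_mutex_unlock(&lock);    /* let the consumer proceed */
    return NULL;
}
```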

Figure 2.7 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex. After the producer thread unlocks the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named “list_display”. Now the consumer thread has to remove the value. To do this, the function


“list_removeLast” is called with the list object; the return value of this function is the location of the previous data. After removing the data, the mutex taken by the consumer thread is released. This whole sequence is shown by the code in Figure 2.7.
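The consumer's read-and-remove step can be sketched similarly. list_remove_last is a simplified stand-in for the assignment's list_removeLast, and the display step (list_display) is replaced by returning the removed value through the thread's exit pointer.

```c
#include <pthread.h>
#include <stdlib.h>

/* Simplified stand-ins for the list types defined in ll2.c */
struct node { int value; struct node *next; };
struct list_head { struct node *first; int size; };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove the tail element and return its value, like the assignment's
 * list_removeLast(); returns -1 when the list is empty. */
int list_remove_last(struct list_head *l) {
    struct node **pp = &l->first;
    int v;
    if (!l->first) return -1;
    while ((*pp)->next) pp = &(*pp)->next;
    v = (*pp)->value;
    free(*pp);
    *pp = NULL;
    l->size--;
    return v;
}

/* Consumer body: lock, read and delete the value, unlock */
void *consumer(void *arg) {
    struct list_head *l = arg;
    int *out = malloc(sizeof *out);   /* handed back as the exit value */
    pthread_mutex_lock(&lock);
    *out = list_remove_last(l);
    pthread_mutex_unlock(&lock);
    return out;
}
```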

Figure 2. 7 Body of consumer thread for scenario 1

2.4 Test cases and Testing results for scenario 1

2.4.1 Test cases

In this section, test cases are designed for the producer-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.

Table 2. 1 Test cases for single producer single consumer

TCN | Test case | Test Data | Expected Result | Output Obtained
TC_1 | Producer thread will insert value | Int | Consumer should read the value inserted by the producer | Yes
TC_2 | Only after the producer thread unlocks the resource should the consumer acquire it | Any | Proper synchronization should be maintained by the producer and consumer threads | Yes
TC_3 | Main thread should wait until all the child threads exit | Any | The main thread has to stay alive until all threads execute completely | Yes
TC_4 | No kind of deadlock should occur | Any | All functions of the program should execute; resource locking should not create a deadlock | Yes


TC_5 | After reading the data, the consumer thread should delete it | Any | After reading the data entered by the producer, the consumer thread has to delete it properly | Yes

2.4.2 Testing results

Figure 2.8 shows the testing results of TC_1, TC_2 and TC_4. Here the producer (server) thread is waiting for a value from the user; it holds the critical region until it stores the value in the shared list. During this time the consumer (client) thread is waiting to acquire the resource.

Figure 2. 8 Producer thread is waiting for value in critical region

Figure 2.9 shows the results of TC_3 and TC_4. By the time the producer thread leaves the critical region, the consumer thread enters it to read the value entered by the producer. After reading the value, the consumer thread deletes it; this is shown in the figure below.

Figure 2. 9 Consumer thread printing the value inserted by the producer thread


2.5 Sequence diagram

Figure 2.10 shows the sequence diagram for three producers and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources. On the left side of the figure three producer threads are shown, and on the right side one consumer thread.

Figure 2. 10 Sequence diagram of three producer one consumer

At the start, every producer has to write data into the linked list. Since the linked list is shared between the producer and consumer threads, a mutex is used to synchronize them. Each producer locks the mutex and writes data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex held by a producer and fails; in this case the consumer thread has to wait until the producer releases the mutex. This behaviour is shown in Figure 2.10. In this application the consumer cannot read the data until a producer has produced it and stored it in the linked list. This synchronization is achieved using the mutex.

(Figure 2.10 callouts: producer threads 1, 2 and 3 are in the critical region; the consumer thread is in the wait state because the resource is acquired by the producer threads; it tries to obtain the mutex but fails.)


2.6 Development of parallelized program using Pthreads/OpenMP

There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs are chosen to develop the three-producer, single-consumer application. The definition procedures for both scenarios are the same; the only difference is in the main body of the application code.

Figure 2. 11 Including Libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the “ll2.c” file, which contains the definitions of all the functions related to the linked-list operations. The second is “pthread.h”, which contains the declarations of all the threading-related APIs. The last two are standard library headers for common functions. In the last lines of the image, a function named create is declared.

Figure 2. 12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application a mutex is used. Here “lock” is defined as the pthread mutex object. It is essential to initialize the mutex before using it; here initialization is handled by assigning the macro PTHREAD_MUTEX_INITIALIZER. A pointer to structure, *myList, is created to hold the starting address of the list, and a pointer *p is created to point to the current position for accessing values. All these declarations are shown in Figure 2.12.

Figure 2. 13 Function to create new list for scenario 2

Figure 2.13 shows the definition of the creat function. Calling this function creates a new list, which is pointed to by the myList pointer. list_crate() is the function that creates the new list and returns its address in the form of a list_head structure. At the


second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat().

Figure 2.14 shows the main function for the multiple-producer, single-consumer application. In the figure, two functions are declared with a void-pointer argument and a void-pointer return type. The function named “ser” is called by the producer threads; here there are three producer threads, which all call the same function. The consumer thread calls the function “cli”. In main, one void pointer named “exit” is defined to obtain the return value from the thread functions.

Figure 2. 14 Main function for application of scenario 2

Here, five thread objects are defined, named “t_ser”, “t_ser1”, “t_ser2”, “t_ser3” and “t_cli”. On successful creation of each producer thread, its ID is stored in the corresponding thread object, and the ID of the consumer thread is stored in “t_cli”. Before creating the threads, the creat() function is called to generate the list and assign the current location to the pointer p. In the single-producer, single-consumer case this function was called in the producer thread function, because both threads run only once there; in this case the producer function executes three times, so a new list must not be created each time. Once the list is created, all the threads insert their values and advance the location pointer. To create the threads, the “pthread_create” API is used with the appropriate arguments. In this application four threads are created: three producer threads and one consumer thread. The consumer thread dies automatically if the main thread exits; to avoid this situation the main thread has to wait until the consumer thread exits successfully. This mechanism is provided by the “pthread_join” API.
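Spawning several producers from the same thread function and joining them all can be sketched as below. This is a minimal sketch under stated assumptions: the shared list is a plain array, the values are fixed instead of user input, and run_producers is an illustrative wrapper for the main-function logic (the consumer, t_cli, would be created and joined the same way).

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int list[16];      /* stands in for the shared linked list */
static int list_size;

/* The single thread function ("ser") run by all three producers */
static void *ser(void *arg) {
    int val = *(int *)arg;
    pthread_mutex_lock(&lock);
    list[list_size++] = val;     /* append under the lock */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Create the three producers from one function, then wait for all */
int run_producers(void) {
    pthread_t t_ser[3];
    int vals[3] = {1, 2, 3};     /* fixed values instead of user input */
    for (int i = 0; i < 3; i++)
        if (pthread_create(&t_ser[i], NULL, ser, &vals[i]) != 0)
            return -1;
    for (int i = 0; i < 3; i++)
        pthread_join(t_ser[i], NULL);
    return list_size;
}
```

The insertion order depends on scheduling, but the mutex guarantees that all three values end up in the list exactly once.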

The consumer cannot get any value before a producer stores it in the list; in fact, the consumer has to wait until all the producers have stored their values. On the other side, no other producer thread can


insert a value while one producer thread is in the critical region. To achieve this synchronization, the mutex named “lock” is used. The function “pthread_mutex_lock” is used to take the mutex and enter the critical region. After this, each producer thread takes a value from the user into the local variable “val”. The entered value is stored in the list and the position pointer p is updated by every producer thread. The storing mechanism is provided by the function “list_inserLast”, with arguments of the list object (myList) and the value to be inserted at the end. After all the producer threads have successfully inserted their values, the consumer thread can get them; so, to end the critical region and release the obtained mutex, the “pthread_mutex_unlock” function is used. The body of the producer threads is shown in Figure 2.15.

Figure 2. 15 Body of producer thread for scenario 2

Figure 2. 16 Body of consumer thread for scenario 2


Figure 2.16 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex. After the producer threads unlock the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named “list_display”. Now the consumer thread has to remove the values: the function “list_removeLast” is called to remove a single value from the list. In this scenario three values are available in the list, so the reading and deletion procedure is repeated three times. The return value of this function is the location of the previous data. After removing all the data, the mutex taken by the consumer thread is released.

2.4 Test cases and Testing results for scenario 2

2.4.1 Test cases

In this section, test cases are designed for the three-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.

Table 2. 2 Test cases for three producers and one consumer

TCN | Test case | Test Data | Expected Result | Output Obtained
TC_1 | All producer threads should insert a value in the list | Int | Consumer should read the values inserted by the producer threads | Yes
TC_2 | Consumer thread should read the values in the appropriate priority | Any | Priority is assigned to all the producer threads; the consumer should read in the proper priority order | Yes
TC_3 | Main thread should wait until all the child threads exit | Any | The main thread has to stay alive until all threads execute completely | Yes
TC_4 | Two producer threads should not insert values at the same time | Any | Proper synchronization should be maintained by the producer threads when inserting values | Yes


TC_5 | After reading the data, the consumer thread should delete it one by one | Any | After reading the data entered by the producers, the consumer thread has to delete it properly | Yes

2.4.2 Testing results

Figure 2.17 shows the testing results of the developed producer-consumer application. Here the consumer thread waits until all the producer threads leave the critical region; the first priority is assigned to the first producer thread. Results of TC_1, TC_2 and TC_4 are shown in the figure below.

Figure 2. 17 Producer thread is waiting in critical region

Figure 2.18 shows the results of test cases TC_3 and TC_4. Only after all the producer threads leave the critical region can the consumer enter it to read the values from the list. The consumer thread reads the values as per the given priority.

Figure 2. 18 Consumer thread active after all the producer threads leave the critical region

NOTE: In this document all the results are documented for a single iteration of the application, to provide a clear understanding.


2.5 Conclusion

Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Based on the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers. In the coming years, multi-threaded application development will most likely become the dominant paradigm in software. As this shift continues, many development organizations will transition to multi-threaded application development on the fly.

In view of this, a producer-consumer application was successfully created using the pthread APIs. Both threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all test cases passed successfully.


CHAPTER 3

Development of Consumer-Producer Application with Extended Priority Concept

3.1 Introduction

All modern operating systems divide CPU cycles in the form of time quanta among the various processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread scheduling is one of the most important and fundamental services offered by an operating system kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness, throughput, turnaround time, response time and efficiency. Multiprocessor operating systems assume that all cores are identical and offer the same performance.

3.2 Sequence diagram

Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left side of the image: the thin vertical line shows the main thread and the thick overlapped lines show the producer threads. Each producer thread maintains one queue to store data. On the right side of the image one consumer thread is shown. Before spawning the producer and consumer threads, the main thread locks four semaphores; after locking them, main creates four producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in proper priority order. Each producer thread accesses its own message queue and inserts its data; at the end, the producer thread unlocks its semaphore so the consumer thread can gain access to that particular semaphore.

In the figure, the ascending priority order of the producer threads/queues is thread 4, thread 3, thread 2, thread 1. At the end of thread 1 it releases semaphore 1. The consumer thread continuously checks the sizes of all the lists associated with the queues; since the priority assigned to thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread 3 does not have any data in its queue, the consumer thread looks at the lower-priority queues. As a result of this mechanism, if by that time only thread 1 has entered an element in its queue and released its semaphore, the consumer thread first looks for an element in queue 3 and fails; since no other higher-priority queue has data, rather than waiting for the higher-priority threads the consumer thread reads and deletes the data in the lower-priority queue. As soon as a higher-priority producer thread enters a value in its queue, the consumer thread immediately reads and deletes it.


When the consumer and a producer thread try to acquire a resource at the same time, the consumer thread is given priority to access the resource.

Figure3. 1 Sequence diagram for prioritized consumer thread

3.2 Development of designed application

There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other uses the OpenMP APIs. For this part of the scenario, the pthread APIs are chosen to develop the four-producer, single-consumer application.

Figure3. 2 Including library files for scenario 3

(Figure 3.1 callouts: before spawning the consumer and producers, the main task locks 4 semaphores; each producer thread, with its own priority, stores data in its queue and unlocks its semaphore; the consumer thread has to wait until the highest-priority producer releases its semaphore, then obtains it.)


In this scenario the pthread APIs are used; their declarations are included via pthread.h. To provide appropriate synchronization, semaphores are used; the declarations of the semaphore APIs and the semaphore type are included with semaphore.h. Figure 3.2 shows these files included in the application.

Figure3. 3 Declaration of constructive functions for scenario 3

Figure 3.3 shows the declaration of the constructive functions. In this scenario four threads will create four different lists; to fulfil this requirement, one function per thread is declared.

Figure3. 4 Declaration of list pointers and location pointers

A pointer to structure, *myList, is created to hold the starting address of a list. Here there are four different queues, so four different pointers to the list_head structure are created to hold their base addresses. Likewise, four ll_node pointers are created to hold the current location in the four different lists. The declarations of all these objects are shown in Figure 3.4.

Figure3. 5 Definition of constructive functions

Figure 3.5 shows the definition of the constructive functions. Calling each of these functions creates a new list, pointed to by the corresponding myList-series pointer. list_create() is the


function that creates the new list and returns its address in the form of a list_head structure. In the second line, the pointers p, q, r and s are created to hold the current position of the element in the lists; initially the current position is set to the first position by calling list_position_create().

Figure3. 6 Declaration of thread function and synchronization objects

Here, the four producer threads call four different functions, whose declarations are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and released by another, which is not possible with a mutex. The declarations of these objects are also shown in Figure 3.6.

Figure3. 7 Main function for application of scenario 3

Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of main using the sem_init function, which takes three arguments. The first argument is the address of the sem_t (semaphore) object. The second parameter


shows that the semaphore is shared between the threads of the process (value 0). The third parameter gives the initial value of the semaphore; here the initial value is 1, so it is a binary semaphore. After initializing all the synchronization objects, the threads are created after locking the semaphores. So at the end of this, four producer threads and one consumer thread have been created with four locked semaphores, and the main thread waits for the consumer (client) thread to finish execution.
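The semaphore set-up described above can be sketched as follows. This is a sketch under the stated design: each of the four semaphores (names as in the text) is initialized to 1 with sem_init and then immediately locked, as main does before spawning the threads; init_and_lock is an illustrative helper name.

```c
#include <semaphore.h>

/* Four binary semaphores, one per producer queue (names as in the text) */
static sem_t l_th, l_th1, l_th2, l_th3;

/* Initialize each semaphore (second argument 0: shared between the
 * threads of this process; third argument 1: binary semaphore), then
 * lock all four so the consumer must wait until a producer posts. */
int init_and_lock(void) {
    sem_t *sems[4] = { &l_th, &l_th1, &l_th2, &l_th3 };
    for (int i = 0; i < 4; i++) {
        if (sem_init(sems[i], 0, 1) != 0) return -1;
        sem_wait(sems[i]);      /* value drops from 1 to 0: locked */
    }
    return 0;
}
```

After this call, any sem_wait by the consumer blocks until the matching producer calls sem_post.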

Figure3. 8 First producer thread

Figure 3.8 shows the first producer thread. The first thread enters a value into the list named mylist. At the end of the function, thread 1 unlocks the semaphore named l_th, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3 and its semaphore is l_th3. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure3. 9 Second producer thread

Figure 3.9 shows the second producer thread. The second thread enters a value into the list named mylist1. At the end of the function, thread 2 unlocks the semaphore named l_th1,


which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3 and its semaphore is l_th3. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure3. 10 Third producer thread

Figure 3.10 shows the third producer thread. The third thread enters a value into the list named mylist2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to lock semaphore l_th3, which is still locked by the main function. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure3. 11 Fourth producer thread

Figure 3.11 shows the fourth producer thread. The fourth thread enters a value into the list named mylist3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which was taken by the main function before creating the thread. This is the highest-priority thread, for


which the consumer thread is looking. The moment thread 4 releases the semaphore, the consumer thread becomes active. The consumer thread then reads the data from the highest-priority thread down to the lowest-priority thread.

Figure3. 12 Consumer thread with highest priority queue

Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the producer and consumer threads arrive together the consumer should get the highest priority to access the queue. To obtain this, one instance of the structure sched_param is created. Two APIs are used here, pthread_setschedparam() and pthread_setschedprio(). The first API is used to change the scheduling policy of the current thread: for the consumer thread the scheduling policy is set to FIFO. Basically, FIFO is the scheduling policy in which the thread that enters the ready state first gets the chance to execute, and no thread of equal or lower priority can preempt it; in our case, due to FIFO scheduling, no other thread can preempt the consumer thread.
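Setting the consumer thread's policy and priority as described can be sketched as below. make_consumer_high_priority is an illustrative helper, not the assignment's code; note that raising a thread to SCHED_FIFO needs root privileges (or CAP_SYS_NICE on Linux), so the sketch reports the error (typically EPERM) instead of treating it as a bug.

```c
#include <pthread.h>
#include <sched.h>
#include <errno.h>

/* Give the calling (consumer) thread the SCHED_FIFO policy at a high
 * priority, then adjust the priority with pthread_setschedprio(). */
int make_consumer_high_priority(int prio) {
    struct sched_param param;
    param.sched_priority = prio;   /* 1..99 under SCHED_FIFO on Linux */
    int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (rc != 0) return rc;        /* typically EPERM without root */
    return pthread_setschedprio(pthread_self(), prio);
}
```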

On the other side, the requirement is that when the consumer and a producer arrive at the same time, the consumer should get the highest priority in that situation. To fulfil this requirement the priority of the consumer (client) thread is set high, while the producer (server) threads work at normal priority. To assign a priority to a thread, pthread_setschedprio() is used with the thread ID and the priority value as arguments; under FIFO scheduling on Linux, 1 is the lowest and 99 the highest priority. After setting the priority, the consumer thread continuously monitors the size variable in every list


structure of the producer threads. If a producer has stored some data in its list, the consumer reads it and removes it.

Here thread 3 has the highest priority, so the consumer thread first checks the size of the queue associated with thread 3. If the size is not equal to zero, it means some data is available in that queue, so it has to be deleted as the first priority.

Figure3. 13 Continuation of consumer thread for second priority queue

If the highest-priority thread has no data in its queue, it is not worth the consumer thread waiting until it stores data, because the consumer has to serve four producer threads. To achieve this, if the consumer thread does not find data in the highest-priority queue (myList3) it jumps to check for data in the second-priority queue. Here the second priority is assigned to thread 2, and the queue associated with thread 2 is myList2. This mechanism can be seen in Figure 3.13: the consumer thread checks myList2 and, if some data is available there, prints and deletes it.

Figure3. 14 Continuation of consumer thread for third priority queue

If the first two priority threads have no data in their queues, it is not worth the consumer thread waiting for either of them, because the consumer has to serve four producer threads. To achieve this, if the consumer thread does not find data in the first two priority queues (myList3 and myList2) it jumps to check for data in the third-


priority queue. Here the third priority is assigned to thread 1, and the queue associated with thread 1 is myList1. This mechanism can be seen in Figure 3.14: the consumer thread checks myList1 and, if some data is available there, prints and deletes it.

Figure3. 15 Continuation of consumer thread for last priority queue

If the first three priority threads have no data in their queues, it is not worth the consumer thread's while to wait until any of them stores data, because the consumer has to serve four producer threads. To achieve this, if the consumer thread does not find data in the first three priority queues (myList3, myList2 and myList1), it jumps to check for data in the last-priority queue. Here the last priority is assigned to thread 0, and the queue associated with thread 0 is myList0. This mechanism can be seen in figure 3.15: the consumer thread checks myList0, and if data is available there it prints and deletes it.
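The fall-through scan described above — check myList3 first, then myList2, myList1 and finally myList0, consuming the first value found rather than blocking on an empty higher-priority queue — can be sketched in C with pthreads. This is a minimal illustration under stated assumptions: the list layout and the helper names (list_push, list_pop, consume_one) are invented here for clarity and are not the assignment's actual code.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct Node { int value; struct Node *next; } Node;

typedef struct {
    Node *head;
    pthread_mutex_t lock;   /* each queue guarded by its own mutex */
} List;

/* Push one value (LIFO for brevity; the assignment's list may be FIFO). */
static void list_push(List *l, int v)
{
    Node *n = malloc(sizeof *n);
    n->value = v;
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    pthread_mutex_unlock(&l->lock);
}

/* Pop one value; returns 1 on success, 0 if the queue was empty. */
static int list_pop(List *l, int *out)
{
    int ok = 0;
    pthread_mutex_lock(&l->lock);
    if (l->head) {
        Node *n = l->head;
        l->head = n->next;
        *out = n->value;
        free(n);
        ok = 1;
    }
    pthread_mutex_unlock(&l->lock);
    return ok;
}

/* One pass of the consumer: scan from the highest-priority queue
   (prio[0] = myList3) down to the lowest (prio[3] = myList0) and
   consume the first value found; never block on an empty queue. */
static int consume_one(List *prio[4], int *out)
{
    for (int i = 0; i < 4; i++)
        if (list_pop(prio[i], out))
            return 1;
    return 0;   /* all four queues were empty */
}
```

A producer for thread k would simply call list_push on its own queue; the consumer calls consume_one in a loop and, when it returns 0, moves on rather than waiting on the highest-priority producer.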

3.3 Test cases and testing results for scenario 3

3.3.1 Test cases

In this section, test cases are designed for the four-producer, one-consumer system. The table below describes the test cases to be performed. They validate the functionality of the system with corner-case inputs.

Table 3.1 Test cases for higher priority consumer thread

| TCN | Test case | Test data | Expected result | Output obtained |
| --- | --- | --- | --- | --- |
| TC_1 | Consumer should acquire higher priority and run first | NA | At the start of the program the consumer should run and show that the list is empty | Yes |
| TC_2 | If the consumer and a producer try to access the resource together, the consumer should get access first | Any | When the shared resource is accessed, the consumer should get higher priority | Yes |
| TC_3 | Consumer should not wait for the higher-priority thread to enter a value | Any | If the higher-priority thread does not enter a value, the consumer should check the other, lower-priority queues | Yes |
| TC_4 | If two values are entered by any producer thread, the consumer should respond to both | Any | The consumer may be busy printing values while a thread enters two values into its queue; in that case the consumer should read and delete both values | Yes |
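The behaviour TC_4 asks for — the consumer reading and deleting every queued value, not just the first — amounts to popping in a loop until the queue is empty. A hedged sketch follows; the list layout and the helper names (pop, push, drain) are assumptions for illustration, not the assignment's code, and the mutex locking used in the real program is omitted here for brevity.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node { int value; struct Node *next; } Node;

/* Remove the head value; returns 1 on success, 0 if the list is empty. */
static int pop(Node **head, int *out)
{
    if (!*head)
        return 0;
    Node *n = *head;
    *out = n->value;
    *head = n->next;
    free(n);
    return 1;
}

/* Prepend a value (stands in for a producer enqueueing). */
static void push(Node **head, int v)
{
    Node *n = malloc(sizeof *n);
    n->value = v;
    n->next = *head;
    *head = n;
}

/* Consume every value currently queued, as TC_4 requires;
   returns the number of values read and deleted. */
static int drain(Node **head)
{
    int v, count = 0;
    while (pop(head, &v)) {     /* keep reading until the queue is empty */
        printf("consumed %d\n", v);
        count++;
    }
    return count;
}
```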

3.3.2 Documentation of the results

Figure 3.16 shows the results of the test cases developed in the section above. It can be seen from the image that the consumer thread responds to every thread's queue that holds values. The attached callouts give a better understanding of the results.

Figure 3.16 Results of test cases

The callouts attached to the figure note that: the consumer thread executes first, as it has the highest priority; thread 3 has the highest priority but no value in its queue, while thread 1 does have a value, so rather than waiting, the consumer serves the lower-priority thread; and although thread 3's value is the highest-priority item still to be read, the consumer does not wait for thread 3.


3.5 Conclusion

The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a program parallelized using pthreads and test cases to test the functionality.


CHAPTER 4

4.1 Module Learning Outcomes

This module helped build expertise in parallel programming for multi-core architectures. Multi-core processors were studied along with their performance quantification and usability techniques; single-core and multi-core optimization techniques and the development of multi-threaded parallel programs were explained clearly. Virtualization and partitioning techniques were covered in detail along with their specific challenges and solutions, and parallel programming of multi-core processors was illustrated with appropriate case studies using OpenMP and pthreads.

After this module I am able to analyze multi-core architectures, the optimization process for single- and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors was well taught through lab programs, which proved an efficient way of learning pthreads, OpenMP and various synchronization techniques for eliminating deadlock situations.

4.2 Conclusion

In Chapter 1, by critically comparing throughput and latency for the available multi-core arbitration schemes, Round Robin was found to be better than the other schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity; even so, Round Robin has notable drawbacks, and replacements may well emerge in future.

In Chapter 2, multi-core hardware was shown to be clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers.

In view of this, a producer-consumer application was successfully created using pthread APIs. Both threads share the same linked list, and synchronization is provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them passed successfully.

In Chapter 3, the consumer-producer application with the extended priority concept was explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with a program parallelized using pthreads and test cases to test the functionality.




Appendix-1


Appendix-2