IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd :: Parallel Computing

Elysium Technologies Private Limited — ISO 9001:2008
A leading Research and Development Division
Madurai | Chennai | Trichy | Coimbatore | Kollam | Singapore
Website: elysiumtechnologies.com, elysiumtechnologies.info
Email: [email protected]
IEEE Project List 2011-2012

Madurai: Elysium Technologies Private Limited, 230, Church Road, Annanagar, Madurai, Tamilnadu - 625 020. Contact: 91452 4390702, 4392702, 4394702. eMail: [email protected]
Trichy: Elysium Technologies Private Limited, 3rd Floor, SI Towers, 15, Melapudur, Trichy, Tamilnadu - 620 001. Contact: 91431-4002234. eMail: [email protected]
Kollam: Elysium Technologies Private Limited, Surya Complex, Vendor Junction, Kollam, Kerala - 691 010. Contact: 91474 2723622. eMail: [email protected]

20. Churn-Resilient Protocol for Massive Data Dissemination in P2P Networks

Massive data dissemination is often disrupted by the frequent join, departure, or failure of client nodes in a peer-to-peer (P2P) network. We propose a new churn-resilient protocol (CRP) that uses alternate paths and data proximity to accelerate the data dissemination process under network churn. CRP enables the construction of proximity-aware P2P content delivery systems, and we present new data dissemination algorithms based on this proximity-aware overlay design. We simulated P2P networks of up to 20,000 nodes to validate the claimed advantages. Specifically, we make four technical contributions: (1) the CRP scheme promotes proximity awareness, dynamic load balancing, and resilience to node failures and network anomalies; (2) the proximity-aware overlay network yields a 28-50 percent speed gain in massive data dissemination compared with scope-flooding or epidemic-tree schemes in unstructured P2P networks; (3) the CRP-enabled network requires only one third of the control messages used in a large CAM-Chord network; and (4) even with 40 percent node failures, the CRP network guarantees atomic broadcast of all data items.
These results clearly demonstrate the scalability and robustness of CRP networks under churn. The scheme especially suits web-scale applications in digital content delivery, network worm containment, and customer relationship management across hundreds of datacenters in cloud computing services.

21. Cloud Technologies for Bioinformatics Applications

Executing a large number of independent jobs, or jobs comprising many tasks that perform minimal inter-task communication, is a common requirement in many domains. Technologies ranging from classic job schedulers to the latest cloud frameworks such as MapReduce can be used to execute these "many tasks" in parallel. In this paper, we present our experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with the above characteristics: a pairwise Alu sequence alignment application and an Expressed Sequence Tag (EST) sequence assembly program. First, we compare the performance of these cloud technologies on the above applications, and for one application we also compare them with a traditional MPI implementation. Next, we analyze the effect of inhomogeneous data on the scheduling mechanisms of the cloud technologies. Finally, we compare the performance of the cloud technologies on virtualized and non-virtualized hardware platforms.

22. Collective Receiver-Initiated Multicast for Grid Applications

Grid applications often need to distribute large amounts of data efficiently from one cluster to multiple others (multicast). Existing sender-initiated methods arrange nodes in optimized tree structures based on external network monitoring data. This dependence on monitoring data severely impacts both ease of deployment and adaptivity to dynamically changing network conditions.
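The "many-tasks" pattern described in project 21 — many independent tasks with no inter-task communication — can be sketched with any pool-based scheduler. The following toy, which is an illustration and not the paper's Hadoop/DryadLINQ implementation, stands an invented `similarity` kernel in for an expensive pairwise alignment and maps it over all sequence pairs:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def similarity(pair):
    """Toy pairwise score: fraction of positions where the sequences agree.
    Stands in for an expensive alignment kernel such as Alu alignment."""
    a, b = pair
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

def run_many_tasks(seqs, workers=4):
    """Dispatch all independent pairwise tasks. Because tasks never
    communicate, any scheduler (Hadoop, DryadLINQ, or a local pool)
    can run them in parallel."""
    pairs = list(combinations(seqs, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(similarity, pairs))
    return dict(zip(pairs, scores))
```

The same structure applies whether the "pool" is a thread pool on one machine or a map phase across a cluster; only the dispatch layer changes.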
In this paper, we present Robber, a collective, receiver-initiated, high-throughput multicast approach inspired by the BitTorrent protocol. Unlike BitTorrent, Robber is specifically designed to maximize the throughput between multiple cluster computers. Nodes in the same cluster work together as a collective that tries to steal data from peer clusters. Instead of using potentially outdated monitoring data, Robber automatically adapts to the currently achievable bandwidth ratios. Within a collective, nodes automatically tune the amount of data they steal remotely to their relative performance. Our experimental evaluation compares Robber to BitTorrent, to Balanced Multicasting, and to its predecessor MOB. Balanced Multicasting optimizes multicast trees based on external monitoring data, while MOB uses collective, receiver-initiated multicast with static load balancing.
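The adaptive tuning described above — members of a collective stealing remote data in proportion to their relative performance — can be illustrated with a simple proportional-share allocator. This is a hypothetical sketch of the idea, not Robber's actual protocol; the `throughput` measurements and `steal_quotas` helper are invented for illustration:

```python
def steal_quotas(throughput, pieces):
    """Split `pieces` outstanding piece requests among collective members
    in proportion to each member's recently measured throughput (bytes/s).
    Largest-remainder rounding keeps the total exactly equal to `pieces`."""
    total = sum(throughput.values())
    raw = {n: pieces * t / total for n, t in throughput.items()}
    quotas = {n: int(v) for n, v in raw.items()}
    leftover = pieces - sum(quotas.values())
    # Hand remaining pieces to the nodes with the largest fractional parts.
    for n in sorted(raw, key=lambda n: raw[n] - quotas[n], reverse=True)[:leftover]:
        quotas[n] += 1
    return quotas
```

Recomputing the quotas each round from fresh throughput samples is what replaces the stale external monitoring data that sender-initiated trees depend on.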

We show that both Robber and MOB outperform BitTorrent. They are competitive with Balanced Multicasting as long as the network bandwidth remains stable, and outperform it by wide margins when bandwidth changes dynamically. In large environments and heterogeneous clusters, Robber outperforms MOB.

23. Comparing Hardware Accelerators in Scientific Applications: A Case Study

Multicore processors and a variety of accelerators have allowed scientific applications to scale to larger problem sizes. We present a performance, design-methodology, platform, and architectural comparison of several application accelerators executing a Quantum Monte Carlo application. We compare the application's performance and programmability on a variety of platforms, including CUDA with Nvidia GPUs, Brook+ with ATI graphics accelerators, OpenCL running on both multicore and graphics processors, C++ running on multicore processors, and a VHDL implementation running on a Xilinx FPGA. We show that OpenCL provides application portability between multicore processors and GPUs, but may incur a performance cost. Furthermore, we illustrate that graphics accelerators can make simulations involving large numbers of particles feasible.

24. Computing Localized Power-Efficient Data Aggregation Trees for Sensor Networks

We propose localized, self-organizing, robust, and energy-efficient data aggregation tree approaches for sensor networks, which we call Localized Power-Efficient Data Aggregation Protocols (L-PEDAPs). They are based on topologies, such as LMST and RNG, that approximate the minimum spanning tree and can be computed efficiently using only the position or distance information of one-hop neighbors.
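The RNG topology mentioned in project 24 has a simple local rule: an edge (u, v) survives unless some witness node w is strictly closer to both u and v than they are to each other. The sketch below checks the rule centrally for clarity (O(n³) over all points), whereas a sensor node would apply the same test using only its one-hop neighbors:

```python
from math import dist

def relative_neighborhood_graph(points):
    """Keep edge (i, j) unless some witness w is strictly closer to both
    endpoints than their mutual distance. The test only needs distances
    to nearby nodes, which is why it can be computed locally."""
    edges = []
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            q = points[j]
            d = dist(p, q)
            if not any(dist(p, w) < d and dist(q, w) < d
                       for k, w in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
    return edges
```

For three collinear sensors the long edge is pruned because the middle node witnesses it, which is exactly the kind of sparsification that keeps aggregation trees energy-efficient.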
The actual routing tree is constructed over these topologies. We also consider different parent selection strategies while constructing the routing tree. We compare each topology and parent selection strategy and conclude that the best among them is the shortest-path strategy over the LMST structure. Our solution also includes route maintenance procedures that are executed when a sensor node fails or a new node is added to the network. The proposed solution is further adapted to consider the remaining power levels of nodes in order to increase the network lifetime. Our simulation results show that with our power-aware localized approach we can achieve almost the same network lifetime as a centralized solution, and close to 90 percent of an upper bound derived here.

25. Conflicts and Incentives in Wireless Cooperative Relaying: A Distributed Market Pricing Framework

Extensive research in recent years has shown the benefits of cooperative relaying in wireless networks, where nodes overhear and cooperatively forward packets transmitted between their neighbors. Most existing studies focus on physical-layer optimization of the effective channel capacity for a given transmitter-receiver link; however, the interaction among simultaneous flows between different endpoint pairs, and the conflicts arising from their competition for a shared pool of relay nodes, are not yet well understood. In this paper, we study a distributed pricing framework in which sources pay relay nodes to forward their packets, and the payment is shared equally whenever a packet is successfully relayed by several nodes at once. We formulate this scenario as a Stackelberg (leader-follower) game, in which sources set the payment rates they offer and relay nodes respond by choosing the flows to cooperate with. We provide a systematic analysis of the fundamental structural properties of this generic model.
We show that multiple follower equilibria exist in general due to the nonconcave nature of their game, yet only one equilibrium possesses certain continuity properties that further lead to a unique system equilibrium among the leaders. We further demonstrate that the resulting equilibria are reasonably efficient in several typical scenarios.

26. Consensus and Mutual Exclusion in a Multiple Access Channel

We consider the deterministic feasibility and time complexity of two fundamental tasks in distributed computing: consensus and mutual exclusion. Processes have distinct labels and communicate through a multiple access channel. The adversary wakes up some processes in possibly different rounds. In any round, every awake process either listens or transmits. The message of a process i is heard by all other awake processes if i is the only process to transmit in that round. If more than one process transmits simultaneously, there is a collision and no message is heard. We consider three characteristics that may or may not be available in the channel: collision detection (listening processes can distinguish collision from silence), a global clock showing the round number, and knowledge of the number n of all processes.
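The channel semantics in project 26 fit in a few lines, and making them concrete shows exactly what collision detection buys: without it, a listener cannot tell a silent round from a collided one. This is a toy model written for this catalog, not code from the paper:

```python
def channel_round(transmitting, collision_detection=False):
    """One round of a multiple access channel: a message is heard only when
    exactly one process transmits. With collision detection, listeners can
    distinguish a collision from silence; without it, both look the same."""
    if len(transmitting) == 1:
        return ("heard", next(iter(transmitting)))
    if collision_detection:
        return ("collision", None) if transmitting else ("silence", None)
    return ("nothing", None)
```

The exponential complexity gap the abstract reports stems directly from this distinction: a collision/silence bit per round supports binary search over labels, while "nothing" conveys no information.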
If none of these three characteristics is available in the channel, we prove that consensus and mutual exclusion are infeasible; if at least one is available, both tasks are feasible, and we study their time complexity. Collision detection is shown to cause an exponential gap in complexity: if it is available, both tasks can be performed in time logarithmic in n, which is optimal, while without collision detection both tasks require linear time. We then investigate both consensus and mutual exclusion in the absence of collision detection, but under the alternative presence of the two other features. With a global clock, we give an algorithm whose time complexity depends linearly on n and on the wake-up time, and an algorithm whose complexity does not depend on the wake-up time and differs from the linear lower bound only by a factor of O(log^2 n). If n is known, we also show an algorithm whose complexity differs from the linear lower bound only by a factor of O(log^2 n).

27. Cooperative Channelization in Wireless Networks with Network Coding

In this paper, we address congestion of multicast traffic in multihop wireless networks through a combination of network coding and resource reservation. Network coding reduces the number of transmissions required in multicast flows, thus allowing a network to approach its multicast capacity. In addition, it efficiently repairs errors in multicast flows by combining packets lost at different destinations. However, under conditions of extremely high congestion, the repair capability of network coding is seriously degraded. In this paper, we propose cooperative channelization, in which portions of the transmission media are allocated to links that are congested to the point where network coding cannot efficiently repair loss. A health metric is proposed to allow comparison of the channelization needs of different multicast links.
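The repair property project 27 relies on — combining packets lost at different destinations — is easiest to see in the classic single-hop XOR example (a textbook illustration of network coding, not the paper's scheme): if receiver 1 lost p1 and receiver 2 lost p2, one coded retransmission repairs both.

```python
def xor(a, b):
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

# Source packets: receiver 1 missed p1, receiver 2 missed p2.
p1, p2 = b"alpha", b"bravo"
repair = xor(p1, p2)               # one coded retransmission instead of two

recovered_by_r1 = xor(repair, p2)  # receiver 1 still holds p2
recovered_by_r2 = xor(repair, p1)  # receiver 2 still holds p1
```

Under heavy congestion this saving evaporates, because too few correctly received packets remain to XOR against, which is exactly the regime where cooperative channelization steps in.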
Cooperative channelization considers the impact of channelization on overall network performance before resource reservation is triggered. Our results show that cooperative channelization improves overall network performance and is well suited for wireless networks using network coding.

28. Cooperative Search and Survey Using Autonomous Underwater Vehicles (AUVs)

In this work, we study algorithms for cooperative search and survey using a fleet of Autonomous Underwater Vehicles (AUVs). Due to the limited energy, communication range/bandwidth, and sensing range of the AUVs, underwater search and survey with multiple AUVs brings several new challenges, since a large amount of data needs to be collected by each AUV and any AUV may fail unexpectedly. To address these challenges and meet our objectives of minimizing the total survey time and traveled distance of the AUVs, we propose a cooperative rendezvous scheme called Synchronization-Based Survey (SBS) to facilitate cooperation among a large number of AUVs when surveying a large area. In SBS, AUVs form an intermittently connected network (ICN) in which they periodically meet each other for data aggregation, control signal dissemination, and AUV failure detection/recovery.
Numerical analysis and simulations have been performed to compare the performance of three variants of the SBS scheme, namely Alternating Column Synchronization (ACS), Strict Line Synchronization (SLS), and X Synchronization (XS). The results show that XS outperforms the other SBS schemes in terms of survey time and the traveled distance of the AUVs. We also compare XS with a nonsynchronization-based survey and with the lower bound on survey time and traveled distance. The results show that XS achieves close-to-optimal performance.

29. Coordinating Computation and I/O in Massively Parallel Sequence Search

With the explosive growth of genomic information, searching sequence databases has emerged as one of the most computation- and data-intensive scientific applications. Our previous studies suggested that parallel genomic sequence search exhibits highly irregular computation and I/O patterns. Effectively addressing these runtime irregularities is thus the key to designing scalable sequence-search tools for massively parallel computers. While computation scheduling for irregular scientific applications and the optimization of noncontiguous file accesses have each been well studied independently, little attention has been paid to the interplay between the two. In this paper, we systematically investigate computation and I/O scheduling for data-intensive, irregular scientific applications within the context of genomic sequence search. Our study reveals that the lack of coordination between computation scheduling and I/O optimization can result in severe performance issues. We then propose an integrated scheduling approach that effectively improves sequence-search throughput by gracefully coordinating dynamic load balancing of computation with high-performance noncontiguous I/O.

30. Coordinating Power Control and Performance Management for Virtualized Server Clusters

Today's data centers face two critical challenges.
First, various customers need assurance that their required service-level agreements, such as response time and throughput, will be met. Second, server power consumption must be controlled to avoid failures caused by power capacity overload or system overheating due to increasingly high server density. However, existing work controls power and application-level performance separately and thus cannot simultaneously provide explicit guarantees on both. In addition, since power and performance control strategies may come from different hardware/software vendors and coexist at different layers, it is more feasible to coordinate the various strategies to achieve the desired control objectives than to rely on a single centralized control strategy. This paper proposes Co-Con, a novel cluster-level control architecture that coordinates individual power and performance control loops for virtualized server clusters. To emulate current practice in data centers, the power control loop changes hardware power states with no regard to application-level performance. The performance control loop is then designed for each virtual machine to achieve the desired performance even when the system model varies significantly due to the impact of power control. Co-Con configures the two control loops rigorously, based on feedback control theory, for theoretically guaranteed control accuracy and system stability. Empirical results on a physical testbed demonstrate that Co-Con can simultaneously provide effective control of both application-level performance and the underlying power consumption.
31. Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

We have previously suggested mixed-precision iterative solvers specifically tailored to the iterative solution of sparse linear systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated on a number of hardware platforms, in particular single-precision GPUs as accelerators to the general-purpose CPU. This paper reevaluates the situation with new mixed-precision solvers that run entirely on the GPU: we demonstrate that mixed-precision schemes constitute a significant performance gain over native double precision. Moreover, we present a new implementation of cyclic reduction for the parallel solution of tridiagonal systems and employ this scheme as a line relaxation smoother in our GPU-based multigrid solver. With an alternating direction implicit variant of this advanced smoother, we can extend the applicability of the GPU multigrid solvers to very ill-conditioned systems arising from discretization on anisotropic meshes that previously had to be solved on the CPU. The resulting mixed-precision schemes are always faster than double precision alone and outperform tuned CPU solvers consistently by almost an order of magnitude.

32. Data Fusion with Desired Reliability in Wireless Sensor Networks

Energy-efficient and reliable transmission of sensory information is a key problem in wireless sensor networks.
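Cyclic reduction, the kernel at the heart of project 31, eliminates every other unknown of a tridiagonal system at each level; all updates within a level are independent, which is what makes it attractive on GPUs. Below is a sequential Python sketch of the textbook algorithm (assuming n = 2^k - 1 unknowns), not the paper's GPU implementation:

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b, super-diagonal c,
    right-hand side d) by cyclic reduction; requires n = 2**k - 1 unknowns.
    Each forward level halves the active unknowns; on a GPU, every iteration
    of the inner loop runs in parallel."""
    a, b, c, d = list(a), list(b), list(c), list(d)
    n = len(b)
    k = n.bit_length()          # n = 2**k - 1
    h = 1
    for _ in range(k - 1):      # forward reduction
        for i in range(2 * h - 1, n, 2 * h):
            al = a[i] / b[i - h]
            ga = c[i] / b[i + h] if i + h < n else 0.0
            b[i] -= al * c[i - h] + (ga * a[i + h] if i + h < n else 0.0)
            d[i] -= al * d[i - h] + (ga * d[i + h] if i + h < n else 0.0)
            a[i] = -al * a[i - h]
            c[i] = -ga * c[i + h] if i + h < n else 0.0
        h *= 2
    x = [0.0] * n
    x[n // 2] = d[n // 2] / b[n // 2]   # single remaining unknown
    while h > 1:                # back substitution, level by level
        h //= 2
        for i in range(h - 1, n, 2 * h):
            xl = x[i - h] if i - h >= 0 else 0.0
            xr = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
    return x
```

In a multigrid line smoother, one such solve is performed per grid line; the mixed-precision twist is to run these solves in single precision inside a double-precision defect correction loop.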
To save energy, in-network processing such as data fusion is a widely used technique, which, however, may often lead to unbalanced information among nodes in the data fusion tree. Traditional schemes aim at providing reliable transmission for individual data packets from a source node to the sink, but seldom offer the desired reliability for a data fusion tree. In this paper, we explore the problem of Minimum Energy Reliable Information Gathering (MERIG) when performing data fusion. By adaptively using redundant transmission on fusion routes without acknowledgments, packets carrying more information are delivered with higher reliability. For different data fusion topologies, such as star, chain, and tree, we provide optimal solutions for computing the number of transmissions for each node. We also propose practical, distributed approximation algorithms for chain and tree topologies. Analytical proofs and simulation results show that energy-efficient information reliability can be guaranteed in an unreliable wireless environment with the help of our proposed schemes.

33. Data Replication in Data Intensive Scientific Applications with Performance Guarantee

Data replication is well adopted in data-intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data-intensive applications, has proven to be NP-hard and even non-approximable, making it difficult to solve. Meanwhile, most previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee, but can also be implemented in a distributed and practical manner.
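The quantity project 32 optimizes per node — how many unacknowledged redundant transmissions a packet needs — has a simple closed form under the illustrative assumption of independent losses: with per-copy loss probability p, r copies succeed with probability 1 - p^r. This sketch shows the arithmetic only; MERIG's actual optimization balances these counts across a whole fusion tree.

```python
from math import ceil, log

def transmissions_needed(p_loss, reliability):
    """Smallest r with 1 - p_loss**r >= reliability: r unacknowledged
    redundant transmissions of one packet over a link that independently
    drops each copy with probability p_loss (0 < p_loss < 1)."""
    return max(1, ceil(log(1 - reliability) / log(p_loss)))
```

Packets that aggregate more upstream readings get a higher `reliability` target and therefore more copies, which is the "more information, higher reliability" principle from the abstract.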
Specifically, we design a polynomial-time centralized replication algorithm that reduces the total data file access delay by at least half of the reduction achieved by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm that can easily be adopted in a distributed environment such as Data Grids. Extensive simulations are performed to validate the efficiency of our proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and that it is more scalable and adaptive to dynamic changes in file access patterns.

34. Dealing with Nonuniformity in Data Centric Storage for Wireless Sensor Networks

In-network storage of data in Wireless Sensor Networks (WSNs) is considered a promising alternative to external storage, since it helps reduce the communication overhead inside the network.
Recent approaches to data storage rely on Geographic Hash Tables (GHTs) for efficient data storage and retrieval. These approaches, however, assume that sensors are uniformly distributed in the sensor field, which is seldom true in real applications. They also do not allow tuning the redundancy level of the storage according to the importance of the data to be stored. To deal with these issues, we propose an approach based on two mechanisms: the first estimates the real network distribution, and the second exploits a data dispersal method based on the estimated network distribution. Experiments through simulation show that our approach approximates the real distribution of sensors quite closely and that our dispersal protocol considerably reduces data losses due to unbalanced data load.

35. Decomposing Workload Bursts for Efficient Storage Resource Management

The growing popularity of hosted storage services and shared storage infrastructure in data centers is driving the recent interest in resource management and QoS in storage systems. The bursty nature of storage workloads raises significant performance and provisioning challenges, leading to increased resource requirements, management costs, and energy consumption. We present a novel workload shaping framework to handle bursty workloads, in which the arrival stream is dynamically decomposed to isolate its bursts and then rescheduled to exploit available slack. We show how decomposition significantly reduces server capacity requirements and power consumption while affecting QoS guarantees only minimally.
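The decomposition idea in project 35 can be illustrated with a toy shaper: serve each slot up to a capacity limit, queue the burst excess, and drain the queue in the slack of underloaded slots. This is a simplified illustration of the principle, not the paper's RTT/Miser algorithms:

```python
def decompose(arrivals, capacity):
    """Split a bursty per-slot arrival stream into a smooth stream served
    immediately (at most `capacity` requests per slot) and a deferred queue
    drained later using the slack of underloaded slots."""
    smooth, deferred, queue = [], [], 0
    for load in arrivals:
        serve_now = min(load, capacity)
        queue += load - serve_now                   # burst excess is deferred
        drained = min(queue, capacity - serve_now)  # reuse leftover slack
        queue -= drained
        smooth.append(serve_now + drained)
        deferred.append(queue)
    return smooth, deferred
```

For the stream [1, 9, 0, 2] a capacity of 4 suffices, even though the raw peak is 9: the server can be provisioned for the smooth component while the deferred work rides in the slack.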
We present an optimal decomposition algorithm RTT and a recombination algorithm Miser, and show the benefits of the approach by evaluating the performance of several storage workloads using both simulation and Linux implementation.36 Design and Evaluation of MPI File Domain Partitioning Methods under Extent-Based File Locking Protocol MPI collective I/O has been an effective method for parallel shared-file access and maintaining the canonical orders of structured data in files. Its implementation commonly uses a two-phase I/O strategy that partitions a file into disjoint file domains, assigns each domain to a unique process, redistributes the I/O data based on their locations in the domains, and has each process perform I/O for the assigned domain. The partitioning quality determines the maximal performance achievable by the underlying file system, as the shared-file I/O has long been impeded by the cost of file systems data consistency control, particularly due to the conflicted locks. This paper proposes a few file domain partitioning methods designed to reduce lock conflicts under the extent-based file locking protocol. Experiments from four I/O benchmarks on the IBM GPFS and Lustre parallel file systems show that the partitioning method producingMaduraiTrichy KollamElysium Technologies Private Limited Elysium Technologies Private Limited Elysium Technologies Private Limited230, Church Road, Annanagar, 3rd Floor,SI Towers, Surya Complex,Vendor junction,Madurai , Tamilnadu 625 020. 15 ,Melapudur , Trichy,kollam,Kerala 691 010.Contact : 91452 4390702, 4392702, 4394702. Tamilnadu 620 001. Contact : 91474 2723622.eMail: [email protected] : 91431 - 4002234. eMail: [email protected] eMail: [email protected] 7. 
minimum lock conflicts achieves the highest performance. The benefit of removing conflicting locks can be so significant that a more than thirty-fold difference in write bandwidth is observed between the best and worst methods.

37 Design and Evaluation of Multiple-Level Data Staging for Blue Gene Systems
Parallel applications currently suffer from a significant imbalance between computational power and available I/O bandwidth. Additionally, the hierarchical organization of current Petascale systems contributes to increased I/O subsystem latency. In these hierarchies, file access involves pipelining data through several networks with incremental latencies and a higher probability of congestion. Future Exascale systems are likely to share this trait. This paper presents a scalable parallel I/O software system designed to transparently hide the latency of file system accesses from applications on these platforms. Our solution takes advantage of the hierarchy of networks involved in file accesses to maximize the degree of overlap between computation, file I/O-related communication, and file system access. We describe and evaluate a two-level hierarchy for Blue Gene systems consisting of client-side and I/O-node-side caching. Our file cache management modules coordinate data staging between application and storage through the Blue Gene networks.
The experimental results demonstrate that our architecture achieves significant performance improvements through a high degree of overlap between computation, communication, and file I/O.

38 Design and Performance Evaluation of Image Processing Algorithms on GPUs
In this paper, we examine the key factors in the design and evaluation of image processing algorithms on massively parallel graphics processing units (GPUs) using the compute unified device architecture (CUDA) programming model. A set of metrics, customized for image processing, is proposed to quantitatively evaluate algorithm characteristics. In addition, we show that a range of image processing algorithms map readily to CUDA, using multiview stereo matching, linear feature extraction, JPEG2000 image encoding, and nonphotorealistic rendering (NPR) as our example applications. The algorithms are carefully selected from major domains of image processing, so they inherently contain a variety of subalgorithms with diverse characteristics when implemented on the GPU. Performance is evaluated in terms of execution time and is compared with the fastest host-only version implemented using OpenMP. We show that the observed speedup varies extensively depending on the characteristics of each algorithm. An intensive analysis demonstrates the appropriateness of the proposed metrics in predicting the effectiveness of an application for parallel implementation.

39 Design of Distributed Heterogeneous Embedded Systems in DDFCharts
The use of formal models of computation to deal with the increasing complexity of embedded systems design is gaining attention. A successful model of computation must be able to handle both control-dominated and data-dominated behaviors, which are most often simultaneously present in complex embedded systems.
Besides behavioral heterogeneity, direct support for modeling distributed systems is also desirable, since an increasing number of embedded systems belong to this category. In this paper, we present distributed DFCharts (DDFCharts), a language based on a formal model that targets distributed heterogeneous embedded systems. Its top hierarchical level is made suitable for capturing distributed systems. Behavioral heterogeneity is addressed by composing finite-state machines (FSMs) and synchronous dataflow graphs (SDFGs). We illustrate modeling in DDFCharts with practical examples and describe its implementation on a heterogeneous target architecture.

40 Dynamic Resource Provisioning in Massively Multiplayer Online Games
Today's Massively Multiplayer Online Games (MMOGs) can include millions of concurrent players spread across the world and interacting with each other within a single session. Faced with high resource demand variability and with misfit resource renting policies, the current industry practice is to overprovision for each game tens of self-owned data centers, making market entry affordable only for big companies.
Focusing on the reduction of entry and operational costs, we investigate a new dynamic resource provisioning method for MMOG operation using external data centers as low-cost resource providers. First, we identify in the various types of player interaction a source of short-term load variability, which complements the long-term load variability due to the size of the player population. Then, we introduce a combined MMOG processor, network, and memory load model that takes into account both the player interaction type and the population size. Our model is best used for estimating MMOG resource demand dynamically and, thus, for dynamic resource provisioning based on the game-world entity distribution. We evaluate several classes of online predictors for MMOG entity distribution and propose and tune a neural-network-based predictor that consistently delivers good accuracy under real-time performance constraints. Using trace-based simulation, we assess the impact of data center policies on the quality of resource provisioning. We find that dynamic resource provisioning can be much more efficient than its static alternative even when the external data centers are busy, and that data centers with policies unsuitable for MMOGs are penalized by our dynamic resource provisioning method. Finally, we present experimental results showing the real-time parallelization and load balancing of a real game prototype using data center resources provisioned with our method, and show its advantage over a rudimentary client-threshold approach.

41 Edge Self-Monitoring for Wireless Sensor Networks
Local monitoring is an effective mechanism for the security of wireless sensor networks (WSNs). Existing schemes assume the existence of a sufficient number of active nodes to carry out monitoring operations. Such an assumption, however, is often hard to satisfy in a large-scale sensor network.
In this work, we focus on designing an efficient scheme with good self-monitoring capability that also provides an infrastructure for various security protocols using local monitoring. To the best of our knowledge, we are the first to present a formal study of optimizing network topology for edge self-monitoring in WSNs. We show that the problem is NP-complete even under the unit disk graph (UDG) model and give the upper bound on the approximation ratio in various graph models. We provide polynomial-time approximation scheme (PTAS) algorithms for the problem in some specific graphs, for example, the monitoring-set-bounded graph. We further design two distributed polynomial algorithms with provable approximation ratios. Through comprehensive simulations, we evaluate the effectiveness of our design.

42 Efficient Adaptive Scheduling of Multiprocessors with Stable Parallelism Feedback
With the proliferation of multicore computers and multiprocessor systems, an imminent challenge is to efficiently schedule parallel applications on these resources.
In contrast to conventional static scheduling, adaptive schedulers that dynamically allocate processors to jobs have good potential for improving processor utilization and speeding up job execution. In this paper, we focus on adaptive scheduling of malleable jobs with periodic processor reallocations based on the parallelism feedback of the jobs and the allocation policy of the system. We present an efficient adaptive scheduler, ACDEQ, that provides parallelism feedback using an adaptive controller, A-CONTROL, and allocates processors based on the well-known Dynamic Equipartitioning algorithm (DEQ). Compared with A-GREEDY, an existing adaptive scheduler that suffers from feedback instability and thus incurs unnecessary scheduling overheads, A-CONTROL achieves much more stable feedback, among other desirable control-theoretic properties. Furthermore, we analyze algorithmically the performance of ACDEQ in terms of its response time and processor waste for an individual job, as well as makespan and total response time for a set of jobs. To the best of our knowledge, ACDEQ is the first multiprocessor scheduling algorithm that offers both control-theoretic and algorithmic guarantees. We further evaluate ACDEQ via simulations using Downey's parallel job model augmented with internal parallelism variations. The results confirm its improved performance over AGDEQ, and show that ACDEQ excels especially when scheduling overhead becomes high.

43 Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing
Cloud Computing has been envisioned as the next-generation architecture of IT enterprise. It moves application software and databases to centralized large data centers, where the management of the data and services may not be fully trustworthy. This unique paradigm brings about many new security challenges that have not been well understood. This work studies the problem of ensuring the integrity of data storage in Cloud Computing.
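The Dynamic Equipartitioning (DEQ) allocation step that ACDEQ of project 42 builds on can be sketched as follows. This is a simplified sketch: the dict interface and integer-share rounding (any sub-share remainder is left idle) are illustrative assumptions:

```python
def deq(total, desires):
    """Dynamic Equipartitioning sketch: jobs desiring no more than an equal
    share of the remaining processors are fully satisfied first; the rest
    then split the leftover processors equally (integer shares)."""
    alloc = {j: 0 for j in desires}
    pending = dict(desires)              # residual parallelism desires
    left = total
    while pending and left >= len(pending):
        share = left // len(pending)     # current equal share
        satisfied = {j: d for j, d in pending.items() if d <= share}
        if satisfied:
            for j, d in satisfied.items():
                alloc[j] += d            # grant the full desire
                left -= d
                del pending[j]
        else:
            for j in pending:            # everyone wants more than the share
                alloc[j] += share
                left -= share
            pending = {}
    return alloc
```

With 10 processors and desires A=2, B=9, C=8, job A gets its full desire of 2 and B and C equipartition the remaining 8.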
In particular, we consider the task of allowing a third-party auditor (TPA), on behalf of the cloud client, to verify the integrity of the dynamic data stored in the cloud. The introduction of a TPA eliminates the involvement of the client in auditing whether its data stored in the cloud are indeed intact, which can be important in achieving economies of scale for Cloud Computing. Support for data dynamics via the most general forms of data operation, such as block modification, insertion, and deletion, is also a significant step toward practicality, since services in Cloud Computing are not limited to archive or backup data only. While prior work on ensuring remote data integrity often lacks support for either public auditability or dynamic data operations, this paper achieves both. We first identify the difficulties and potential security problems of direct extensions of prior work to fully dynamic data updates, and then show how to construct an elegant verification scheme for the seamless integration of these two salient features in our protocol design. In particular, to achieve efficient data dynamics, we improve existing proof-of-storage models by manipulating the classic Merkle Hash Tree construction for block tag authentication. To support efficient handling of multiple auditing tasks, we further explore the technique of bilinear aggregate signatures to extend our main result to a multiuser setting, where the TPA can perform multiple auditing tasks simultaneously. Extensive security and performance analysis shows that the proposed schemes are highly efficient and provably secure.

44 Energy Conscious Scheduling for Distributed Computing Systems under Different Operating Conditions
Traditionally, the primary performance goal of computer systems has been to reduce the execution time of applications while increasing throughput. This goal has been mostly achieved by the development of high-density computer systems.
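The Merkle Hash Tree that project 43 manipulates for block tag authentication can be sketched generically as follows. This is a textbook MHT sketch, not the paper's exact construction; SHA-256 and the odd-node promotion rule are illustrative choices:

```python
import hashlib

def h(data: bytes) -> bytes:
    """Collision-resistant hash used for leaves and internal nodes."""
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Root of a Merkle Hash Tree over data blocks: leaf = H(block),
    parent = H(left || right); a lone odd node is promoted unchanged.
    Any single-block change propagates up and changes the root."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

An auditor holding only the root can detect a modified block, since the recomputed root no longer matches.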
As witnessed recently, these systems provide very powerful processing capability and capacity. They often consist of tens or hundreds of thousands of processors and other resource-hungry devices, and their energy consumption has become a major concern. In this paper, we address the problem of scheduling precedence-constrained parallel applications on multiprocessor computer systems and present two energy-conscious scheduling algorithms using dynamic voltage scaling (DVS). A number of recent commodity processors are capable of DVS, which enables processors to operate at different voltage supply levels at the expense of sacrificing clock frequencies. In the context of scheduling, this multiple-voltage facility implies a trade-off between the quality of schedules and energy consumption. To effectively balance these two performance goals, we have devised a novel objective function and a variant of it; the main difference between the two algorithms lies in their measurement of energy consumption.
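The DVS trade-off underlying project 44 — a lower supply voltage saves dynamic energy but stretches execution time — can be sketched as follows. The energy model E ≈ C·V²·f·t (so E ∝ V²·cycles) and the level values are illustrative assumptions, not the paper's objective function:

```python
def pick_dvs_level(cycles, deadline, levels):
    """Choose the (volts, hz) level minimizing dynamic energy
    E ~ V**2 * cycles while still finishing `cycles` within `deadline`.
    `levels` is a list of (supply_voltage, clock_frequency) pairs."""
    best = None
    for volts, hz in levels:
        t = cycles / hz
        if t > deadline:
            continue                    # this level is too slow
        energy = volts ** 2 * cycles    # E = C*V^2*f*t with t = cycles/f; C dropped
        if best is None or energy < best[0]:
            best = (energy, volts, hz)
    if best is None:
        raise ValueError("no DVS level meets the deadline")
    return best[1], best[2]
```

With a loose deadline the scheduler can afford the lower-voltage level; tightening the deadline forces the faster, costlier one.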
The extensive comparative evaluations conducted as part of this work show that the performance of our algorithms is very compelling in terms of both application completion time and energy consumption.

45 Energy-Efficient Localized Routing in Random Multihop Wireless Networks
A number of energy-aware routing protocols have been proposed to seek energy-efficient routes in multihop wireless networks. Among them, several geographical localized routing protocols help make smarter routing decisions using only local information and reduce the routing overhead. However, none of the proposed localized routing methods can guarantee the energy efficiency of their routes. In this paper, we first give a simple localized routing algorithm, called Localized Energy-Aware Restricted Neighborhood routing (LEARN), which can guarantee the energy efficiency of its route whenever it finds a route successfully. We then theoretically study its critical transmission radius in random networks, which guarantees that LEARN routing finds a route for any source-destination pair asymptotically almost surely. We also extend the proposed routing to three-dimensional (3D) networks and derive its critical transmission radius in 3D random networks. Simulation results confirm our theoretical analysis of LEARN routing and demonstrate its energy efficiency in large-scale random networks.

46 Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud
In recent years, ad hoc parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing into their product portfolios, making it easy for customers to access these services and deploy their programs.
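A restricted-neighborhood, energy-aware next-hop choice in the spirit of project 45's LEARN can be sketched as follows. The angle bound, path-loss exponent, and energy-per-progress cost are illustrative assumptions, not the paper's exact selection rule:

```python
import math

def next_hop(cur, dst, neighbors, alpha=2.0, theta=math.pi / 3):
    """Pick a next hop among neighbors lying within angle `theta` of the
    direction to `dst` (the restricted neighborhood) that makes positive
    progress, minimizing transmit energy (~ distance**alpha) per unit
    progress. Returns None when no such neighbor exists."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best, best_cost = None, float('inf')
    for n in neighbors:
        progress = dist(cur, dst) - dist(n, dst)
        if progress <= 0:
            continue                               # no progress toward dst
        ang = abs(math.atan2(n[1] - cur[1], n[0] - cur[0])
                  - math.atan2(dst[1] - cur[1], dst[0] - cur[0]))
        ang = min(ang, 2 * math.pi - ang)          # wrap to [0, pi]
        if ang > theta:
            continue                               # outside the restricted cone
        cost = dist(cur, n) ** alpha / progress    # energy per unit progress
        if cost < best_cost:
            best, best_cost = n, cost
    return best
```

Restricting candidates to a cone toward the destination is what lets such local rules bound the energy stretch of the resulting route.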
However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper, we discuss the opportunities and challenges for efficient parallel data processing in clouds and present our research project Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution. Based on this new framework, we perform extended evaluations of MapReduce-inspired processing jobs on an IaaS cloud system and compare the results with the popular data processing framework Hadoop.

47 Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures
The introduction of General-Purpose computation on GPUs (GPGPUs) has changed the landscape for the future of parallel computing. At the core of this phenomenon are massively multithreaded, data-parallel architectures possessing impressive acceleration ratings, offering low-cost supercomputing together with attractive power budgets. Even given the numerous benefits provided by GPGPUs, a number of barriers remain that delay wider adoption of
eMail: [email protected] eMail: [email protected] 11. Elysium Technologies Private LimitedISO 9001:2008 A leading Research and Development DivisionMadurai | Chennai | Trichy | Coimbatore | Kollam| SingaporeWebsite: elysiumtechnologies.com, elysiumtechnologies.infoEmail: [email protected] Project List 2011 - 2012 these architectures. One major issue is the heterogeneous and distributed nature of the memory subsystem commonly found on data-parallel architectures. Application acceleration is highly dependent on being able to utilize the memory subsystem effectively so that all execution units remain busy. In this paper, we present techniques for enhancing the memory efficiency of applications on data-parallel architectures, based on the analysis and characterization of memory access patterns in loop bodies; we target vectorization via data transformation to benefit vector-based architectures (e.g., AMD GPUs) and algorithmic memory selection for scalar-based architectures (e.g., NVIDIA GPUs). We demonstrate the effectiveness of our proposed methods with kernels from a wide range of benchmark suites. For the benchmark kernels studied, we achieve consistent and significant performance improvements (up to 11:4x and 13:5x over baseline GPU implementations on each platform, respectively) by applying our proposed methodology.48 Fast and Cost-Effective Online Load-Balancing in Distributed Range-Queriable Systems Distributed systems such as Peer-to-Peer overlays have been shown to efficiently support the processing of range queries over large numbers of participating hosts. In such systems, uneven load allocation has to be effectively tackled in order to minimize overloaded peers and optimize their performance. In this work, we detect the two basic methodologies used to achieve load-balancing: Iterative key redistribution between neighbors and node migration. We identify these two key mechanisms and describe their relative advantages and disadvantages. 
Based on this analysis, we propose NIXMIG, a hybrid method that adaptively combines these two extremes to achieve both fast and cost-effective load-balancing in distributed systems that support range queries. We theoretically prove its convergence and, as a case study, offer an implementation on top of a Skip Graph, where we thoroughly validate our findings under a variety of static, dynamic, and realistic workloads. We compare NIXMIG with an existing load-balancing algorithm proposed by Karger and Ruhl [1], and our experimental analysis shows that NIXMIG can be as much as three times faster, requiring only one sixth and one third of the message and item exchanges, respectively, to bring the system to a balanced state.

49 FDAC: Toward Fine-Grained Distributed Data Access Control in Wireless Sensor Networks
Distributed sensor data storage and retrieval have gained increasing popularity in recent years for supporting various applications. While a distributed architecture makes a wireless sensor network (WSN) more robust and fault tolerant, it also poses a number of security challenges, especially when applied in mission-critical applications such as the battlefield and e-healthcare. First, as sensor data are stored and maintained by individual sensors, and unattended sensors are easily subject to strong attacks such as physical compromise, it is significantly harder to ensure data security. Second, in many mission-critical applications, fine-grained data access control is a must, as illegal access to sensitive data may cause disastrous results and/or be prohibited by law. Last but not least, sensor nodes are usually resource-constrained, which limits the direct adoption of expensive cryptographic primitives. To address these challenges, we propose in this paper a distributed data access control scheme that is able to enforce fine-grained access control over sensor data and is resilient against strong attacks such as sensor compromise and user collusion.
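The iterative neighbor key-redistribution mechanism that NIXMIG of project 48 combines with node migration can be sketched on a ring of per-node item counts as follows; the halving step, round limit, and tolerance are illustrative assumptions:

```python
def neighbor_exchange(loads, rounds=50, tol=1):
    """Iterative neighbor key redistribution on a ring: each round, every
    node evens out its item count with its clockwise neighbor by shipping
    half the difference. Stops early once no node moves anything."""
    loads = list(loads)
    n = len(loads)
    for _ in range(rounds):
        moved = 0
        for i in range(n):
            j = (i + 1) % n
            diff = loads[i] - loads[j]
            if abs(diff) > tol:
                shift = diff // 2      # ship half the imbalance to the lighter side
                loads[i] -= shift
                loads[j] += shift
                moved += abs(shift)
        if moved == 0:
            break
    return loads
```

A single hotspot diffuses around the ring over several rounds, which is exactly the slowness that motivates combining this mechanism with direct node migration.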
The proposed scheme exploits a novel cryptographic primitive called attribute-based encryption (ABE) and tailors and adapts it for WSNs with respect to both performance and security requirements. The feasibility of the scheme is demonstrated by experiments on real sensor platforms. To the best of our knowledge, this paper is the first to realize distributed fine-grained data access control for WSNs.

50 Flexible Robust Group Key Agreement
A robust group key agreement (GKA) protocol allows a set of players to establish a shared secret key, regardless of network/node failures. Current constant-round GKA protocols are either efficient and nonrobust or robust but not efficient; assuming a reliable broadcast communication medium, the standard encryption-based group key agreement protocol can be robust against an arbitrary number of node faults, but the size of the messages broadcast by every player is proportional to the number of players. In contrast, nonrobust group key agreement can be achieved with each player broadcasting just constant-sized messages. We propose a novel two-round group key agreement protocol that tolerates up to T node failures using O(T)-sized messages, for any T.
We show that the new protocol implies a fully robust group key agreement with logarithmic-sized messages and expected round complexity close to 2, assuming random node faults. The protocol can be extended to withstand malicious insiders at small constant-factor increases in bandwidth and computation. The proposed protocol is secure under the (standard) Decisional Square Diffie-Hellman assumption.

51 Group Strategyproof Multicast in Wireless Networks
We study the dissemination of common information from a source to multiple nodes within a multihop wireless network, where nodes are equipped with uniform omnidirectional antennas and have a fixed cost per packet transmission. While many nodes may be interested in the dissemination service, their valuation or utility for such a service is usually private information. A desirable routing and charging mechanism encourages truthful utility reports from the nodes. We provide both negative and positive results toward such mechanism design. We show that in order to achieve the group strategyproof property, a compromise in routing optimality or budget balance is inevitable. In particular, the fraction of the optimal routing cost that can be recovered through node charges cannot be significantly higher than 1/2. To answer the question of whether constant-ratio cost recovery is possible, we further apply a primal-dual schema to simultaneously build a routing solution and a cost-sharing scheme, and prove that the resulting mechanism is group strategyproof and guarantees 1/4-approximate cost recovery against an optimal routing scheme.

52 HaRP: Rapid Packet Classification via Hashing Round-Down Prefixes
Packet classification is central to a wide array of Internet applications and services, and its approaches mostly involve either hardware support or optimization steps needed by software-oriented techniques (to add precomputed markers and insert rules in the search data structures).
Unfortunately, an approach with hardware support is expensive and has limited scalability, whereas one with optimization fails to handle incremental rule updates effectively. This work deals with rapid packet classification, realized by hashing round-down prefixes (HaRP) in a way that the source and destination IP prefixes specified in a rule are rounded down to designated prefix lengths (DPL) for indexing into hash sets. HaRP exhibits superb hash storage utilization, able not only to outperform earlier software-oriented classification techniques but also to accommodate dynamic creation and deletion of rules. HaRP makes it possible to hold all its search data structures in the local cache of each core within a contemporary processor, dramatically elevating its classification performance. Empirical results measured on an AMD 4-way 2.8 GHz Opteron system (with 1 MB cache per core) under six filter data sets (each with up to 30 K rules) obtained from a public source reveal that HaRP enjoys up to some 3.6 times the throughput level achievable by the best known decision-tree-based counterpart, HyperCuts (HC).

53 hiCUDA: High-Level GPGPU Programming
Graphics Processing Units (GPUs) have become a competitive accelerator for applications outside the graphics domain,
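The round-down indexing idea of project 52 can be sketched as follows; the particular DPL values chosen here are an illustrative assumption, not HaRP's tuned set:

```python
DPL = (32, 24, 16, 8, 0)  # designated prefix lengths, longest first (illustrative)

def round_down(prefix_len):
    """Round a rule's prefix length down to the nearest designated length."""
    return max(l for l in DPL if l <= prefix_len)

def hash_key(addr, prefix_len):
    """Hash-set index for a 32-bit IP prefix: the address truncated to its
    rounded-down designated length, paired with that length. All prefixes
    of lengths 24..31 collapse into the same 24-bit-keyed set, etc."""
    l = round_down(prefix_len)
    return (addr >> (32 - l), l)
```

Because every rule prefix maps to one of only `len(DPL)` lengths, a lookup probes a small, fixed number of hash sets per header field instead of one per possible prefix length.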
mainly driven by the improvements in GPU programmability. Although the Compute Unified Device Architecture (CUDA) is a simple C-like interface for programming NVIDIA GPUs, porting applications to CUDA remains a challenge for average programmers. In particular, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host and GPU memories, and of manually optimizing the utilization of GPU memory. Practical experience shows that the programmer needs to make significant code changes, often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner, applied directly to the sequential code, thus speeding up the porting process. In this paper, we describe the hiCUDA directives as well as the design and implementation of a prototype compiler that translates a hiCUDA program to a CUDA program. Our compiler is able to support real-world applications that span multiple procedures and use dynamically allocated arrays. Experiments using nine CUDA benchmarks show that the simplicity hiCUDA provides comes at no expense to performance.

54 Hybrid Core Acceleration of UWB SIRE Radar Signal Processing
To move High-Performance Computing (HPC) closer to forward operating environments and missions, the Army Research Laboratory is developing approaches using hybrid, asymmetric core computing.
By blending capabilities found in Graphics Processing Units (GPUs) and traditional von Neumann multicore Central Processing Units (CPUs), approaches are being developed and optimized to provide at or near real-time processing speeds for research project applications. Algorithms are designed to partition work to the resources best suited to handle the processing load. The use of commodity resources allows the design to remain flexible throughout the life cycle without the costly and time-consuming delays associated with Application-Specific Integrated Circuit (ASIC) development. This paradigm allows for rapid technology transfer to end users. In this paper, we describe a synchronous impulse reconstruction radar imaging algorithm that has been designed for hybrid CPU-GPU processing. We discuss various optimizations such as asynchronous task partitioning between the CPU and GPU as well as data movement reduction. We also discuss analysis and design of the algorithms within the context of two programming models: NVIDIA's CUDA and AMD's ATI Brook+. Finally, we report on the speedup achieved by this approach, which allowed us to take a code once restricted to post-processing and transform it into one that exceeds real-time performance requirements.

55 Impact of Traffic Influxes: Revealing Exponential Intercontact Time in Urban VANETs
Intercontact time between moving vehicles is one of the key metrics in vehicular ad hoc networks (VANETs) and is central to forwarding algorithms and the end-to-end delay. Due to prohibitive costs, little experimental work has studied intercontact time in urban vehicular environments. In this paper, we carry out an extensive experiment involving thousands of operational taxis in Shanghai.
Studying the taxi trace data on the frequency and duration of transfer opportunities between taxis, we observe that the tail distribution of the intercontact time, that is, the time gap separating two contacts of the same pair of taxis, exhibits an exponential decay over a large range of timescales. This observation is in sharp contrast to recent empirical studies of human mobility, in which the distribution of the intercontact time obeys a power law. By analyzing a simplified mobility model that captures the effect of hot areas in the city, we rigorously prove that common traffic influxes, where large volumes of traffic converge, play a major role in generating the exponential tail of the intercontact time. Our results thus provide fundamental guidelines for the design of new vehicular mobility models in urban scenarios, new data forwarding protocols, and their performance analysis.

56 Integrating Caching and Prefetching Mechanisms in a Distributed Transactional Memory
We present a distributed transactional memory system that exploits a new opportunity to automatically hide network latency by speculatively prefetching and caching objects.
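The exponential-tail observation of project 55 can be checked on any trace with an empirical CCDF: for an exponential intercontact time, log P(X > t) falls linearly in t, whereas a power law is linear on log-log axes; the exponential rate is estimated by 1/mean. A minimal sketch (the sample data are illustrative, not the Shanghai trace):

```python
def ccdf(samples, t):
    """Empirical complementary CDF: fraction of intercontact times > t."""
    return sum(1 for s in samples if s > t) / len(samples)

def exp_rate(samples):
    """Maximum-likelihood rate of an exponential fit: 1 / sample mean."""
    return len(samples) / sum(samples)
```

Plotting `ccdf(samples, t)` on a semi-log axis and checking for a straight line is the standard diagnostic separating exponential from power-law tails.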
The system includes an object caching framework, language extensions to support our approach, and symbolic prefetches. To our knowledge, this is the first prefetching approach that can prefetch objects whose addresses have not yet been computed or predicted. Our approach makes aggressive use of both prefetching and caching of remote objects to hide network latency, while relying on the transaction commit mechanism to preserve the simple transactional consistency model that we present to the developer. We have evaluated this approach on three distributed benchmarks, five scientific benchmarks, and several microbenchmarks. We have found that our approach enables our benchmark applications to effectively utilize multiple machines and benefit from prefetching and caching. We have observed a speedup of up to 7.26 for distributed applications on our system using prefetching and caching, and a speedup of up to 5.55 for parallel applications on our system.

57 Interlacing Bypass Rings to Torus Networks for More Efficient Networks

We introduce a new technique for generating more efficient networks by systematically interlacing bypass rings to torus networks (iBT networks). The resulting network improves on the original torus network by reducing the network diameter and node-to-node distances, and by increasing the bisection width, without increasing wiring and other engineering complexity. We present and analyze the statement that a 3D iBT network proposed by our technique outperforms 4D torus networks of the same node degree. We found that interlacing rings of sizes 6 and 12 into all three dimensions of a torus network with meshes 30 x 30 x 36 generates the best network among all possible networks, including the 4D torus and the hypercube, of approximately 32,000 nodes. This demonstrates that strategically interlacing bypass rings into a 3D torus network enhances the torus network more effectively than adding a fourth dimension, although we may generalize the claim.
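The diameter reduction that bypass links buy can be seen even in one dimension: adding a chord every few nodes to a plain ring shrinks its BFS diameter. A small sketch (the chord spacing of 6 is an assumed layout, not the exact iBT construction):

```python
# Compare the diameter of a 36-node ring with and without bypass chords.
from collections import deque

def diameter(n, edges):
    """Graph diameter via BFS from every node (small graphs only)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    best = 0
    for src in range(n):
        dist = {src: 0}
        q = deque([src])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best

n = 36
ring = [(i, (i + 1) % n) for i in range(n)]
# Bypass: an extra chord from every 6th node to the node 6 positions ahead.
bypass = ring + [(i, (i + 6) % n) for i in range(0, n, 6)]
print(diameter(n, ring), diameter(n, bypass))
```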
We also present a node-to-node distance formula for the iBT networks.

58 Joint Optimization of Complexity and Overhead for the Routing in Hierarchical Networks

The hierarchical network structure was proposed in the early 80s and has become popular nowadays. The routing complexity and the routing table size are the two primary performance measures in a dynamic route guidance system. Although various algorithms exist for finding the best routing policy in a hierarchical network, hardly any work exists on studying and evaluating the aforementioned measures for a hierarchical network. In this paper, a new mathematical framework for deriving the averages of the routing complexity and the routing table size is proposed, expressing the routing complexity and the routing table size as functions of hierarchical network parameters such as the number of hierarchical levels and the subscriber density (cluster population) for each hierarchical level.

59 Key Predistribution Schemes for Establishing Pairwise Keys with a Mobile Sink in Sensor Networks

Security services such as authentication and pairwise key establishment are critical to sensor networks. They enable sensor nodes to communicate securely with each other using cryptographic techniques.
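The routing-table-size trade-off in the hierarchical-routing entry above can be illustrated with a standard back-of-the-envelope model: with k levels and roughly n^(1/k) clusters per level, a node keeps one entry per peer at each level (an illustrative formula, not the paper's exact expressions):

```python
def flat_table_size(n):
    """Flat routing: every node keeps an entry for each other node."""
    return n - 1

def hierarchical_table_size(n, levels):
    """Rough model: k levels, n^(1/k) clusters per level, one entry per
    sibling cluster at each level."""
    branching = n ** (1.0 / levels)
    return levels * (branching - 1)

n = 10_000
for k in (1, 2, 3, 4):
    print(k, round(hierarchical_table_size(n, k)))  # 9999, 198, 62, 36
```

More levels shrink the table at the price of longer (more complex) routes, which is exactly the joint optimization the entry studies.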
In this paper, we propose two key predistribution schemes that enable a mobile sink to establish a secure data-communication link, on the fly, with any sensor node. The proposed schemes are based on the polynomial pool-based key predistribution scheme, the probabilistic generation key predistribution scheme, and the Q-composite scheme. The security analysis in this paper indicates that these two proposed predistribution schemes assure, with high probability and low communication overhead, that any sensor node can establish a pairwise key with the mobile sink. Comparing the two proposed key predistribution schemes with the Q-composite scheme, the probabilistic key predistribution scheme, and the polynomial pool-based scheme, our analytical results clearly show that our schemes perform better in terms of network resilience to node capture than existing schemes when used in wireless sensor networks with mobile sinks.

60 LBMP: A Logarithm-Barrier-Based Multipath Protocol for Internet Traffic Management

Traffic management is the adaptation of source rates and routing to efficiently utilize network resources. Recently, the complicated interactions between different Internet traffic management modules have been elegantly modeled by distributed primal-dual utility maximization, which sheds new light on developing effective management protocols. For single-path routing with given routes, the dual is a strictly concave network optimization problem. Unfortunately, the general form of multipath utility optimization is not strictly concave, making its solution quite unstable. Decomposition-based techniques like TRaffic-management Using Multipath Protocol (TRUMP) alleviate the instability, but their convergence is not guaranteed, nor is their optimality. They are also inflexible in differentiating the control at different links. In this paper, we address the above issues through a novel logarithm-barrier-based approach.
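The polynomial pool idea underlying the key-predistribution entry above rests on a classic building block: a symmetric bivariate polynomial f(x, y) = f(y, x) over a finite field, where node i stores the share f(i, y) and nodes i and j both compute the same key f(i, j) = f(j, i). A minimal single-polynomial sketch (the prime, coefficients, and node IDs are invented; real schemes draw many polynomials from a pool):

```python
# Pairwise keys from a symmetric bivariate polynomial over GF(P).
P = 2_147_483_647  # a Mersenne prime used here as an illustrative field size

# f(x, y) = 7 + 3xy + 5x^2y^2 (mod P) -- symmetric coefficient matrix
COEFFS = [[7, 0, 0], [0, 3, 0], [0, 0, 5]]

def share(node_id):
    """Coefficients of the univariate share f(node_id, y)."""
    return [sum(COEFFS[i][j] * pow(node_id, i, P) for i in range(3)) % P
            for j in range(3)]

def pairwise_key(my_share, peer_id):
    """Evaluate the stored share at the peer's ID."""
    return sum(c * pow(peer_id, j, P) for j, c in enumerate(my_share)) % P

alice, bob = share(17), share(42)
print(pairwise_key(alice, 42) == pairwise_key(bob, 17))  # True
```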
Our approach jointly considers user utility and routing/congestion control. It translates the multipath utility maximization into a sequence of unconstrained optimization problems, with infinite logarithm barriers deployed at the constraint boundary. We demonstrate that setting up barriers is much simpler than choosing traditional cost functions and, more importantly, that it makes the optimal solution achievable. We further demonstrate a distributed implementation, together with the design of a practical Logarithm-Barrier-based Multipath Protocol (LBMP). We evaluate the performance of LBMP through both numerical analysis and packet-level simulations. The results show that LBMP achieves high throughput and fast convergence over diverse representative network topologies. Such performance is comparable to TRUMP, and is often better. Moreover, LBMP is flexible in differentiating the control at different links, and its optimality and convergence are theoretically guaranteed.

61 Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip

Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors only provide architectural support for coarse-grained parallelism, making it necessary to use software-based multithreading environments to effectively implement fine-grained parallelism. Although these software-based environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead involved in thread management and synchronization.
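The logarithm-barrier trick in the LBMP entry above can be shown on a toy instance: two symmetric paths sharing one bottleneck of capacity c, maximizing log-utility with the constraint replaced by a barrier term. The stationary point x = c/(2 + μ) approaches the constrained optimum c/2 as the barrier weight μ shrinks (a generic barrier sketch with invented parameters, not the LBMP protocol itself):

```python
# Maximize 2*log(x) + mu*log(c - 2*x) over 0 < x < c/2 by bisection on the
# stationarity condition 2/x = 2*mu/(c - 2*x).
def barrier_opt(c, mu):
    lo, hi = 1e-9, c / 2 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        # sign of the derivative of the barrier objective at mid
        if 2 / mid - 2 * mu / (c - 2 * mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for mu in (1.0, 0.1, 0.01):
    print(mu, barrier_opt(10.0, mu))  # approaches 5.0 as mu -> 0
```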
In order to address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an unlimited number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimal extra hardware to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. Our results show up to 1.8X better scalability, 1.91X better performance, and, more importantly, 1.74X better performance per watt using the LCMT architecture for irregular and dynamic benchmarks, when compared to the baseline architecture. The LCMT architecture delivers similar performance to the baseline architecture for regular benchmarks.

62 Load Balance with Imperfect Information in Structured Peer-to-Peer Systems

With the notion of virtual servers, peers participating in a heterogeneous, structured peer-to-peer (P2P) network may host different numbers of virtual servers, and by migrating virtual servers, peers can balance their loads proportionally to their capacities.
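The capacity-proportional balancing just described can be sketched with a simple global placement rule: each peer should carry about capacity_i / total_capacity of the total load, and virtual servers are placed largest-first on the peer with the most remaining headroom (an illustrative greedy scheme, not the paper's imperfect-information algorithm):

```python
def assign(vserver_loads, capacities):
    """Place virtual servers so each peer's load tracks its capacity share."""
    total_cap = sum(capacities.values())
    total_load = sum(vserver_loads)
    target = {p: c / total_cap * total_load for p, c in capacities.items()}
    placement = {p: [] for p in capacities}
    for load in sorted(vserver_loads, reverse=True):  # largest first
        # pick the peer with the most unfilled target capacity
        p = max(target, key=lambda q: target[q] - sum(placement[q]))
        placement[p].append(load)
    return placement

placement = assign([4, 2, 1, 1], {"a": 3, "b": 1})
print(placement)  # {'a': [4, 2], 'b': [1, 1]} -- loads 6 and 2, a 3:1 split
```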
The existing decentralized load balancing algorithms designed for heterogeneous, structured P2P networks either explicitly construct auxiliary networks to manipulate global information or implicitly demand that the P2P substrates be organized in a hierarchical fashion. Without relying on any auxiliary networks, and independent of the geometry of the P2P substrates, we present in this paper a novel load balancing algorithm that is unique in that each participating peer uses partial knowledge of the system to estimate the probability distributions of the capacities of peers and the loads of virtual servers, resulting in imperfect knowledge of the system state. With the imperfect system state, peers can compute their expected loads and reallocate their loads in parallel. Through extensive simulations, we compare our proposal to prior load balancing algorithms.

63 Many-Task Computing for Real-Time Uncertainty Prediction and Data Assimilation in the Ocean

Uncertainty prediction for ocean and climate predictions is essential for multiple applications today. Many-Task Computing can play a significant role in making such predictions feasible. In this manuscript, we focus on ocean uncertainty prediction using the Error Subspace Statistical Estimation (ESSE) approach. In ESSE, uncertainties are represented by an error subspace of variable size. To predict these uncertainties, we perturb an initial state based on the initial error subspace and integrate the corresponding ensemble of initial conditions forward in time, including stochastic forcing during each simulation. The dominant error covariance (generated via SVD of the ensemble) is used for data assimilation. The resulting ocean fields are used as inputs for predictions of underwater sound propagation. ESSE is a classic case of Many-Task Computing: it uses dynamic heterogeneous workflows, and ESSE ensembles are data-intensive applications.
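The "dominant error covariance via SVD of the ensemble" step in the ESSE entry above can be sketched in pure Python with power iteration on the ensemble anomalies (a stand-in for a real SVD; the four-member, two-variable ensemble is made up for illustration):

```python
# Dominant error direction of an ensemble via power iteration on A^T A,
# where A holds the anomalies (member minus ensemble mean).
import random

def dominant_direction(ensemble, iters=200):
    n = len(ensemble[0])
    mean = [sum(m[i] for m in ensemble) / len(ensemble) for i in range(n)]
    anom = [[m[i] - mean[i] for i in range(n)] for m in ensemble]

    def cov_mul(v):  # (A^T A) v without forming the covariance matrix
        proj = [sum(a[i] * v[i] for i in range(n)) for a in anom]
        return [sum(p * a[i] for p, a in zip(proj, anom)) for i in range(n)]

    random.seed(0)
    v = [random.random() for _ in range(n)]
    for _ in range(iters):
        v = cov_mul(v)
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

ensemble = [[1.0, 0.1], [3.0, -0.1], [5.0, 0.0], [7.0, 0.2]]  # 4 members, 2 vars
print(dominant_direction(ensemble))  # points almost entirely along variable 1
```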
We first study the execution characteristics of a distributed ESSE workflow on a medium-size dedicated cluster, examining in more detail the I/O patterns exhibited and the throughputs achieved by its components, as well as the overall ensemble performance seen in practice. We then study the performance/usability challenges of employing Amazon EC2 and the TeraGrid to augment our ESSE ensembles and provide better solutions faster.

64 Mars: Accelerating MapReduce with Graphics Processors

We design and implement Mars, a MapReduce runtime system accelerated with graphics processing units (GPUs). MapReduce is a simple and flexible parallel programming paradigm originally proposed by Google for the ease of large-scale data processing on thousands of CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth. However, GPUs are designed as special-purpose coprocessors, and their programming interfaces are less familiar than those on the CPUs to MapReduce programmers. To harness GPUs' power for MapReduce, we developed Mars to run on NVIDIA GPUs, AMD GPUs, as well as multicore CPUs. Furthermore, we integrated Mars into Hadoop, an open-source CPU-based MapReduce system.
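The MapReduce interface that Mars exposes follows the usual shape: map each record to key/value pairs, group values by key, reduce each group. A single-process teaching sketch (unrelated to Mars's GPU kernels):

```python
# Minimal MapReduce runtime: map phase, group-by-key, reduce phase.
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    for record in inputs:                 # map phase
        for key, value in map_fn(record):
            intermediate[key].append(value)
    return {key: reduce_fn(key, values)   # reduce phase
            for key, values in intermediate.items()}

def word_map(line):
    return [(w, 1) for w in line.split()]

def word_reduce(word, counts):
    return sum(counts)

result = map_reduce(["a b a", "b c"], word_map, word_reduce)
print(result)  # {'a': 2, 'b': 2, 'c': 1}
```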
Mars hides the programming complexity of GPUs behind the simple and familiar MapReduce interface, and automatically manages task partitioning, data distribution, and parallelization on the processors. We have implemented six representative applications on Mars and evaluated their performance on PCs equipped with GPUs as well as multicore CPUs. The experimental results show that the GPU-CPU coprocessing of Mars on an NVIDIA GTX280 GPU and an Intel quad-core CPU outperformed Phoenix, the state-of-the-art MapReduce on the multicore CPU, with a speedup of up to 72 times and 24 times on average, depending on the application. Additionally, integrating Mars into Hadoop enabled GPU acceleration for a network of PCs.

65 Massively LDPC Decoding on Multicore Architectures

Unlike the usual VLSI approaches necessary for computation-intensive Low-Density Parity-Check (LDPC) code decoders, this paper presents flexible software-based LDPC decoders. Algorithms and data structures suitable for parallel computing are proposed in this paper to perform LDPC decoding on multicore architectures. To evaluate the efficiency of the proposed parallel algorithms, LDPC decoders were developed on recent multicores, such as off-the-shelf general-purpose x86 processors, Graphics Processing Units (GPUs), and the CELL Broadband Engine (CELL/B.E.). Challenging restrictions, such as memory access conflicts, latency, coalescence, or unknown behavior of thread and block schedulers, were unraveled and worked out. Experimental results for different code lengths show throughputs in the order of 1-2 Mbps on the general-purpose multicores, and ranging from 40 Mbps on the GPU to nearly 70 Mbps on the CELL/B.E. The analysis of the obtained results allows us to conclude that the CELL/B.E. performs better for short- to medium-length codes, while the GPU achieves superior throughputs with larger codes. They achieve throughputs that in some cases approach very well those obtained with VLSI decoders.
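The core object in any parity-check decoder is the syndrome H·x mod 2. A toy hard-decision decoder on the Hamming(7,4) matrix shows the mechanics (a much-simplified stand-in for the iterative message-passing LDPC kernels discussed above):

```python
# Syndrome decoding on Hamming(7,4): the syndrome of a single-bit error
# equals the H-column of the flipped bit, which identifies it directly.
H = [  # each row is one parity check over 7 bits
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word):
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def correct_single_error(word):
    s = syndrome(word)
    if any(s):
        col = s[0] + 2 * s[1] + 4 * s[2]   # columns of H are 1..7 in binary
        word = word[:]
        word[col - 1] ^= 1                 # flip the identified bit
    return word

codeword = [0, 0, 0, 0, 0, 0, 0]
received = codeword[:]; received[4] = 1    # flip bit 5 in transit
print(correct_single_error(received))      # recovers the all-zero codeword
```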
From the analysis of the results, we can predict a throughput increase with the rise of the number of cores. Index Terms: LDPC, data-parallel computing, multicore, graphics.

66 Maximizing the Number of Broadcast Operations in Random Geometric Ad Hoc Wireless Networks

We consider static ad hoc wireless networks whose nodes, equipped with the same initial battery charge, may dynamically change their transmission range. When a node v transmits with range r(v), its battery charge is decreased by B · r(v)^2, where B > 0 is a fixed constant. The goal is to provide a range assignment schedule that maximizes the number of broadcast operations from a given source (this number is denoted as the length of the schedule). This maximization problem, denoted by MAX LIFETIME, is known to be NP-hard, and the best algorithm yields a worst-case approximation ratio of O(log n), where n is the number of nodes of the network. We consider random geometric instances formed by selecting n points independently and uniformly at random from a square of side length √n in the Euclidean plane. We present an efficient algorithm that constructs a range assignment schedule having length not smaller than 1/2 of the optimum with high probability. Then we design an efficient distributed version of the above algorithm, where nodes initially know only n and their own positions. The resulting schedule guarantees the same approximation ratio achieved by the centralized version, thus obtaining the first distributed algorithm having provably good performance for this problem.

67 Measuring Client-Perceived Pageview Response Time of Internet Services

As e-commerce services are growing exponentially, businesses need quantitative estimates of client-perceived response times to continuously improve the quality of their services. Current server-side nonintrusive measurement techniques are limited to nonsecured HTTP traffic.
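The energy model in the MAX LIFETIME entry above directly bounds schedule length: each broadcast round drains B·r(v)^2 from every transmitting node, so the bottleneck transmitter caps the number of rounds. A small sketch (the ranges and battery charge are invented example values):

```python
# Number of broadcasts supported by a fixed transmission tree under the
# B * r(v)^2 energy model with initial battery charge B0 per node.
B = 1.0
B0 = 100.0

def broadcasts_supported(ranges):
    """ranges: transmission range r(v) for each transmitting node."""
    per_round = {v: B * r * r for v, r in ranges.items()}
    return min(int(B0 // cost) for cost in per_round.values())

tree_ranges = {"source": 2.0, "relay1": 3.0, "relay2": 1.0}
print(broadcasts_supported(tree_ranges))  # relay1 pays 9.0 per round -> 11
```

A range assignment *schedule* improves on this by rotating among different trees so no single relay is drained first.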
In this paper, we present the design and evaluation of a monitor, namely sMonitor, which is able to measure client-perceived response times for both HTTP and HTTPS traffic. At the heart of sMonitor is a novel size-based analysis method that parses live packets to delimit different web pages and to infer their response times. The method is based on the observation that most HTTP(S)-compatible browsers send significantly larger requests for container objects than for embedded objects. sMonitor is designed to operate accurately in the presence of complicated browser behaviors, such as parallel downloading of multiple web pages and HTTP pipelining, as well as packet losses and delays. It requires only passive collection of network traffic in and out of the monitored secured services. We conduct comprehensive experiments across a wide range of operating conditions using live secured Internet services, on PlanetLab, and on controlled networks.
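The size-based observation above suggests a simple delimiting rule: a request larger than a threshold starts a new pageview, whose response time runs until the last response before the next large request. A sketch (the 400-byte threshold and the trace are invented; sMonitor's actual analysis is more elaborate):

```python
# Delimit pageviews in a packet trace by request size and report each
# pageview's response time.
THRESHOLD = 400  # bytes; assumed container/embedded request-size split

def pageview_times(trace):
    """trace: list of (timestamp, direction, size); direction 'req'/'resp'."""
    pages, start, last_resp = [], None, None
    for ts, direction, size in trace:
        if direction == "req" and size > THRESHOLD:
            if start is not None:          # close the previous pageview
                pages.append(last_resp - start)
            start, last_resp = ts, ts      # a container request starts a page
        elif direction == "resp":
            last_resp = ts
    if start is not None:
        pages.append(last_resp - start)
    return pages

trace = [(0.0, "req", 900), (0.3, "resp", 5000), (0.4, "req", 120),
         (0.9, "resp", 2000), (5.0, "req", 850), (5.6, "resp", 7000)]
print(pageview_times(trace))  # two pageviews: ~0.9 s and ~0.6 s
```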
The experimental results demonstrate that sMonitor is able to control the estimation error within 6.7 percent, in comparison with the actual measured time at the client side.

68 Metadata Distribution and Consistency Techniques for Large-Scale Cluster File Systems

Most supercomputers nowadays are based on large clusters, which call for sophisticated, scalable, and decentralized metadata processing techniques. From the perspective of maximizing metadata throughput, an ideal metadata distribution policy should automatically balance namespace locality and even distribution of load without manual intervention. None of the existing metadata distribution schemes is designed to make such a balance. We propose a novel metadata distribution policy, Dynamic Dir-Grain (DDG), which seeks to balance the requirements of keeping namespace locality and evenly distributing the load by dynamically partitioning the namespace into size-adjustable hierarchical units. Extensive simulation and measurement results show that DDG policies with a proper granularity significantly outperform traditional techniques such as the Random policy and the Subtree policy, by 40 percent to 62 times. In addition, from the perspective of file system reliability, metadata consistency is an equally important issue. However, it is complicated by dynamic metadata distribution. Metadata consistency of cross-metadata-server operations cannot be solved by traditional metadata journaling on each server. While the traditional two-phase commit (2PC) algorithm can be used, it is too costly for distributed file systems. We propose a consistent metadata processing protocol, S2PC-MP, which combines the two-phase commit algorithm with metadata processing to reduce overheads.
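For reference, the plain 2PC baseline that S2PC-MP improves on works as follows: the coordinator commits only if every participant votes yes in the prepare phase, otherwise all abort (an in-memory sketch; real protocols add logging and recovery):

```python
# Plain two-phase commit across metadata servers.
class Participant:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.state = name, healthy, "init"

    def prepare(self):
        """Phase 1: vote yes (and hold locks) or vote no."""
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def finish(self, commit):
        """Phase 2: apply the coordinator's decision."""
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # phase 1
    decision = all(votes)                         # one "no" aborts everyone
    for p in participants:                        # phase 2
        p.finish(decision)
    return decision

group = [Participant("mds1"), Participant("mds2"), Participant("mds3", healthy=False)]
print(two_phase_commit(group))  # False -- the unhealthy server vetoes
```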
Our measurement results show that S2PC-MP not only ensures fast recovery, but also greatly reduces failure-free execution overheads.

69 Minimum-Delay Service Provisioning in Opportunistic Networks

Opportunistic networks are created dynamically by exploiting contacts between pairs of mobile devices that come within communication range. While forwarding in opportunistic networking has been explored, the investigation of asynchronous service provisioning on top of opportunistic networks is a unique contribution of this paper. Mobile devices are typically heterogeneous, possess disparate physical resources, and can provide a variety of services. During opportunistic contacts, the pairing peers can cooperatively provide (or avail of) their (or other peers') services. This service provisioning paradigm is a key feature of the emerging opportunistic computing paradigm. We develop an analytical model to study the behaviors of service-seeking nodes (seekers) and service-providing nodes (providers) that spawn and execute service requests, respectively. The model considers the case in which seekers can spawn parallel executions on multiple providers for any given request, and determines: 1) the delays at different stages of service provisioning; and 2) the optimal number of parallel executions that minimizes the expected execution time. The analytical model is validated through simulations, and exploited to investigate the performance of service provisioning over a wide range of parameters.
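The "optimal number of parallel executions" trade-off above can be illustrated with a stylized model (not the paper's): if service times are exponential with rate λ, the first of k parallel replicas finishes in expected time 1/(kλ), while each replica adds a fixed spawning/transfer overhead c, giving T(k) = c·k + 1/(kλ) with continuous optimum k* = 1/sqrt(c·λ):

```python
# Pick the replica count k minimizing expected completion time
# T(k) = c*k + 1/(k*lam)   (lam and c below are invented parameters).
def expected_time(k, lam, c):
    return c * k + 1.0 / (k * lam)

def best_k(lam, c, k_max=50):
    return min(range(1, k_max + 1), key=lambda k: expected_time(k, lam, c))

print(best_k(lam=0.5, c=0.08))  # continuous optimum 1/sqrt(0.04) = 5
```

Too few replicas waste the diversity gain; too many drown it in spawning overhead, which is exactly the tension the analytical model resolves.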
70 Multicloud Deployment of Computing Clusters for Loosely Coupled MTC Applications

Cloud computing is gaining acceptance in many IT organizations as an elastic, flexible, and variable-cost way to deploy their service platforms using outsourced resources. Unlike traditional utilities, where a single-provider scheme is common practice, the ubiquitous access to cloud resources easily enables the simultaneous use of different clouds. In this paper, we explore this scenario to deploy a computing cluster on top of a multicloud infrastructure, for solving loosely coupled Many-Task Computing (MTC) applications. In this way, the cluster nodes can be provisioned with resources from different clouds to improve the cost effectiveness of the deployment, or to implement high-availability strategies. We prove the viability of this kind of solution by evaluating the scalability, performance, and cost of different configurations of a Sun Grid Engine cluster deployed on a multicloud infrastructure spanning a local data center and three different cloud sites: Amazon EC2 Europe, Amazon EC2 US, and ElasticHosts. Although the testbed deployed in this work is limited to a reduced number of computing resources (due to hardware and budget limitations), we have complemented our analysis with a simulated infrastructure model, which includes a larger number of resources and runs larger problem sizes.
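One way to compare multicloud cluster configurations like those above is the marginal dollar cost per unit of extra throughput over the local-only baseline. A sketch with entirely invented prices and throughputs (the paper's testbed numbers are not reproduced here):

```python
# Marginal cost of adding cloud resources to a local cluster:
# extra $/hour paid per extra task/hour gained over the baseline.
configs = {
    "local-only":     {"tasks_per_hour": 120, "dollars_per_hour": 0.0},
    "local+cloudA":   {"tasks_per_hour": 200, "dollars_per_hour": 1.6},
    "local+cloudA+B": {"tasks_per_hour": 260, "dollars_per_hour": 3.4},
}

def marginal_cost(base, cfg):
    dt = cfg["tasks_per_hour"] - base["tasks_per_hour"]
    dd = cfg["dollars_per_hour"] - base["dollars_per_hour"]
    return dd / dt

base = configs["local-only"]
for name in ("local+cloudA", "local+cloudA+B"):
    print(name, round(marginal_cost(base, configs[name]), 3))
```

Rising marginal cost as more clouds are added is the kind of diminishing-returns signal such an evaluation surfaces.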
Data obtained by simulation show that the performance and cost results can be extrapolated to large-scale problems and cluster infrastructures.

71 Multipath Routing and Max-Min Fair QoS Provisioning under Interference Constraints in Wireless Multihop Networks

In this paper, we investigate the problem of flow routing and fair bandwidth allocation under interference constraints for multihop wireless networks. We first develop a novel isotonic routing metric, RI3M, considering the influence of interflow and intraflow interference. The isotonicity of the routing metric is proved using virtual network decomposition. Second, in order to ensure QoS, an interference-aware max-min fair bandwidth allocation algorithm, LMX:M3F, is proposed, where multiple paths (determined by using the routing metric) coexist for each user to the base station. In order to solve the algorithm, we develop an optimization formulation that is modeled as a multicommodity flow problem, where the lexicographically largest bandwidth allocation vector is found among all optimal allocation vectors while considering constraints of interference on the flows. We compare our RI3M routing metric and LMX:M3F bandwidth allocation algorithm with various interference-based routing metrics and interference-aware bandwidth allocation algorithms established in the literature. We show that RI3M and LMX:M3F succeed in improving network performance in terms of delay, packet loss ratio, and bandwidth usage.

72 Multispanning Tree Zone-Ordered Label-Based Routing Algorithms for Irregular Networks

In this paper, a diverse range of routing algorithms is classified into a new family of routings called zone-ordered label-based routing algorithms. The proposed classification is based on three common steps (factors) for generating such routings, namely, graph labeling, deadlock-free zones, and zone ordering.
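The max-min fair objective in the LMX:M3F entry above is classically computed by progressive filling: raise all unfrozen flows' rates together until some link saturates, freeze the flows crossing it, and repeat (a generic wired sketch; LMX:M3F additionally handles interference and multiple paths):

```python
# Progressive filling for a max-min fair rate allocation.
def max_min_fair(links, flows):
    """links: {link: capacity}; flows: {flow: [links it crosses]}"""
    rate = {f: 0.0 for f in flows}
    frozen = set()
    while len(frozen) < len(flows):
        # smallest equal increment that saturates some link with unfrozen flows
        inc = min((cap - sum(rate[f] for f in flows if l in flows[f]))
                  / sum(1 for f in flows if l in flows[f] and f not in frozen)
                  for l, cap in links.items()
                  if any(l in flows[f] and f not in frozen for f in flows))
        for f in flows:
            if f not in frozen:
                rate[f] += inc
        for l, cap in links.items():  # freeze flows on saturated links
            if sum(rate[f] for f in flows if l in flows[f]) >= cap - 1e-9:
                frozen |= {f for f in flows if l in flows[f]}
    return rate

links = {"L1": 10.0, "L2": 4.0}
flows = {"f1": ["L1"], "f2": ["L1", "L2"], "f3": ["L2"]}
print(max_min_fair(links, flows))  # f2 and f3 share L2 at 2.0; f1 gets 8.0
```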
The main goal of this classification is to define several new routing concepts and streamline the knowledge on routing algorithms. Following the classification, a novel methodology is proposed to generate routing algorithms for irregular networks. The methodology uses the three mentioned steps to generate deadlock-free routings. Consequently, the methodology-based routings fall into the category of zone-ordered label-based routings. However, the graph labeling method (first step) used in the methodology is based on the construction of multiple spanning trees on the network. The simulation results show that constructing further spanning trees may result in routing algorithms with better performance.

73 Network Immunization with Distributed Autonomy-Oriented Entities

Many communication systems, e.g., the Internet, can be modeled as complex networks. For such networks, immunization strategies are necessary for preventing malicious attacks or viruses from being percolated from a node to its neighboring nodes following their connectivities. In recent years, various immunization strategies have been proposed and demonstrated, most of which rest on the assumption
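A standard baseline for the immunization problem in the last entry is degree-based: vaccinate (remove) the highest-degree nodes and watch the largest connected component collapse (a centralized sketch on an invented hub-and-spoke graph; the paper's autonomy-oriented entities work with distributed, local information instead):

```python
# Degree-based immunization: remove the hub and measure the largest
# remaining connected component via BFS.
from collections import deque

def largest_component(nodes, edges, removed):
    adj = {v: set() for v in nodes if v not in removed}
    for a, b in edges:
        if a not in removed and b not in removed:
            adj[a].add(b); adj[b].add(a)
    seen, best = set(), 0
    for v in adj:
        if v not in seen:
            comp, q = 0, deque([v])
            seen.add(v)
            while q:
                u = q.popleft(); comp += 1
                for w in adj[u]:
                    if w not in seen:
                        seen.add(w); q.append(w)
            best = max(best, comp)
    return best

# A small hub-and-spoke network: node 0 connects everyone.
nodes = list(range(8))
edges = [(0, i) for i in range(1, 8)] + [(1, 2)]
degree = {v: sum(v in e for e in edges) for v in nodes}
hub = max(nodes, key=lambda v: degree[v])
print(largest_component(nodes, edges, set()), largest_component(nodes, edges, {hub}))
```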