Highly-decoupled thread level redundancy

Center and Library of Islamic Studies in European Languages

" Highly-decoupled thread level redundancy "
Muhammad Wasiur Rashid; M. Huang

Document Type	:	Latin Dissertation
Language of Document	:	English
Record Number	:	54019
Doc. No	:	TL23973
Call number	:	3326559
Main Entry	:	Muhammad Wasiur Rashid
Title & Author	:	Highly-decoupled thread level redundancy / Muhammad Wasiur Rashid
College	:	University of Rochester
Date	:	2008
Degree	:	Ph.D.
student score	:	2008
Page No	:	166
Abstract	:	Continued scaling of device dimensions and operating voltage reduces the critical charge and thus the natural noise tolerance level of transistors. As a result, circuits can produce transient upsets that corrupt program execution and data. To prevent operational failure due to these errors, system-level techniques such as redundant execution will increasingly be required for fault detection and tolerance in future processors. The increasing prevalence of multi-core architectures makes coarse-grain thread level redundancy (TLR), a redundant execution paradigm in which duplicate copies of the same thread are executed concurrently either on the same processor or on separate processors, very attractive. In this dissertation, we introduce a highly decoupled TLR architecture that provides a very high level of error coverage without unduly impacting performance for both parallel and single-threaded workloads. One of our main design goals is to support a large number of unverified instructions, so that long latencies in verification can easily be tolerated. Supporting a large slack between execution and verification also provides the flexibility required to execute the duplicate thread in a low-power mode to reduce the energy overhead of redundancy. Another important objective is to have comprehensive coverage that includes not only the computation logic but also the coherence and consistency logic in the memory subsystem. Hence, the redundant copy of the program needs to access the memory independently, and the system needs to manage the non-determinism in parallel execution efficiently. Buffering a large number of unverified stores presents a microarchitectural challenge, as the access latency of a naive store buffer increases significantly with size.
We overcome this challenge by introducing a novel store buffering scheme where most of the slow buffer searches are filtered by the L1 cache, thus giving us the capacity without the performance overhead. To manage the non-determinism in parallel execution, we propose novel memory access order tracking and enforcement algorithms that require simple, light-weight architectural support. The proposed architectural support to mitigate these challenges is entirely off the processor critical path and can easily be disabled when redundancy is not requested. The significant energy overhead associated with redundancy is a major drawback of conventional TLR architectures. The thesis extensively analyzes the energy overhead of several TLR designs. We propose a new architecture that provides TLR support at a considerably lower energy overhead compared to other designs by parallelizing the verification operation and executing it on a chip multiprocessor with support for frequency and voltage scaling. Finally, we show that the principle of highly decoupled TLR can be applied to mitigate bottlenecks in the uniprocessor pipeline by introducing an alternative memory disambiguation technique.
Subject	:	Applied sciences; Decoupled execution; Fault tolerance; Store buffering; Thread-level redundancy; Electrical engineering; Computer science; 0984:Computer science; 0544:Electrical engineering
Added Entry	:	M. Huang
Added Entry	:	University of Rochester