70 results found
1
RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads
January 2016
ACM Transactions on Architecture and Code Optimization (TACO): Volume 12 Issue 4, January 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 13, Downloads (12 Months): 117, Downloads (Overall): 143
Full text available:
 PDF
This article aims to tackle two fundamental memory bottlenecks: limited off-chip bandwidth (bandwidth wall) and long access latency (memory wall). To achieve this goal, our approach exploits the inherent error resilience of a wide range of applications. We introduce an approximation technique, called Rollback-Free Value Prediction (RFVP). When certain safe-to-approximate ...
Keywords:
Load value approximation, memory bandwidth, value prediction, memory latency, GPUs
2
Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses
November 2015
MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 18, Downloads (12 Months): 146, Downloads (Overall): 207
Full text available:
 PDF
Many data structures (e.g., matrices) are typically accessed with multiple access patterns. Depending on the layout of the data structure in physical address space, some access patterns result in non-unit strides. In existing systems, which are optimized to store and access cache lines, non-unit strided accesses exhibit low spatial locality. ...
Keywords:
SIMD, in-memory databases, memory bandwidth, performance, energy, strided accesses, DRAM, caches
3
Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring
October 2015
PACT '15: Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)
Publisher: IEEE Computer Society
Dataflow analysis-based dynamic parallel monitoring (DADPM) is a recent approach for identifying bugs in parallel software as it executes, based on the key insight of explicitly modeling a sliding window of uncertainty across parallel threads. While this makes the approach practical and scalable, it also introduces the possibility of false positives in the analysis. In this ...
4
Fast Bulk Bitwise AND and OR in DRAM
June 2015
IEEE Computer Architecture Letters: Volume 14 Issue 2, July 2015
Publisher: IEEE Computer Society
Bitwise operations are an important component of modern-day programming, and are used in a variety of applications such as databases. In this work, we propose a new and simple mechanism to implement bulk bitwise AND and OR operations in DRAM, which is faster and more efficient than existing mechanisms. ...
5
Toggle-Aware Compression for GPUs
June 2015
IEEE Computer Architecture Letters: Volume 14 Issue 2, July 2015
Publisher: IEEE Computer Society
Memory bandwidth compression can be an effective way to achieve higher system performance and energy efficiency in modern data-intensive applications by exploiting redundancy in data. Prior works studied various data compression techniques to improve both capacity (e.g., of caches and main memory) and bandwidth utilization (e.g., of the on-chip and ...
6
A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps
June 2015
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 11, Downloads (12 Months): 168, Downloads (Overall): 511
Full text available:
 PDF
Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational ...
Also published in:
January 2016
ACM SIGARCH Computer Architecture News - ISCA'15: Volume 43 Issue 3, June 2015
7
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management
June 2015
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 16, Downloads (12 Months): 162, Downloads (Overall): 368
Full text available:
 PDF
Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity, for instance fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page ...
Also published in:
December 2015
ACM SIGARCH Computer Architecture News - ISCA'15: Volume 43 Issue 3, June 2015
8
Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks
December 2014
ACM Transactions on Architecture and Code Optimization (TACO): Volume 11 Issue 4, January 2015
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 11, Downloads (12 Months): 76, Downloads (Overall): 226
Full text available:
 PDF
Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache pollution, and propose a comprehensive mechanism to mitigate prefetcher-caused cache pollution. First, we observe ...
Keywords:
Prefetching, cache insertion/promotion policy, cache pollution, caches
9
Rollback-free value prediction with approximate loads
August 2014
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
Publisher: ACM
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 5, Downloads (12 Months): 39, Downloads (Overall): 147
Full text available:
 PDF
This paper demonstrates how to utilize the inherent error resilience of a wide range of applications to mitigate the memory wall, the discrepancy between core and memory speed. We define a new microarchitecturally-triggered approximation technique called rollback-free value prediction. This technique predicts the value of safe-to-approximate loads when they ...
Keywords:
compilers, rollback-free value prediction, general-purpose approximate computing, memory systems
10
The dirty-block index
June 2014
ISCA '14: Proceedings of the 41st Annual International Symposium on Computer Architecture
Publisher: IEEE Press
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 7, Downloads (12 Months): 50, Downloads (Overall): 255
Full text available:
 PDF
On-chip caches maintain multiple pieces of metadata about each cached block (e.g., dirty bit, coherence information, ECC). Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for ...
Also published in:
October 2014
ACM SIGARCH Computer Architecture News - ISCA '14: Volume 42 Issue 3, June 2014
11
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers
February 2014
ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4, Downloads (12 Months): 61, Downloads (Overall): 298
Full text available:
 PDF
Device drivers are an Achilles' heel of modern commodity operating systems, accounting for far too many system failures. Previous work on driver reliability has focused on protecting the kernel from unsafe driver side-effects by interposing an invariant-checking layer at the driver interface, but otherwise treating the driver as a black ...
Keywords:
device drivers, dynamic analysis
Also published in:
March 2014
ACM SIGPLAN Notices - ASPLOS '14: Volume 49 Issue 4, April 2014
March 2014
ACM SIGARCH Computer Architecture News - ASPLOS '14: Volume 42 Issue 1, March 2014
12
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization
Vivek Seshadri,
Yoongu Kim,
Chris Fallin,
Donghyuk Lee,
Rachata Ausavarungnirun,
Gennady Pekhimenko,
Yixin Luo,
Onur Mutlu,
Phillip B. Gibbons,
Michael A. Kozuch,
Todd C. Mowry
November 2013
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 24
Downloads (6 Weeks): 14, Downloads (12 Months): 109, Downloads (Overall): 436
Full text available:
 PDF
Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and ...
Keywords:
memory bandwidth, bulk operations, in-memory processing, performance, energy, page copy, DRAM, page initialization
13
Linearly compressed pages: a low-complexity, low-latency main memory compression framework
November 2013
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 17, Downloads (12 Months): 114, Downloads (Overall): 479
Full text available:
 PDF
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory ...
Keywords:
DRAM, data compression, memory, memory bandwidth, memory controller, memory capacity
14
Base-delta-immediate compression: practical data compression for on-chip caches
September 2012
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Publisher: ACM
Bibliometrics:
Citation Count: 32
Downloads (6 Weeks): 16, Downloads (12 Months): 108, Downloads (Overall): 445
Full text available:
 PDF
Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need ...
Keywords:
caching, cache compression, memory
15
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
September 2012
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 7, Downloads (12 Months): 40, Downloads (Overall): 334
Full text available:
 PDF
Off-chip main memory has long been a bottleneck for system performance. With increasing memory pressure due to multiple on-chip cores, effective cache utilization is important. In a system with limited cache space, we would ideally like to prevent 1) cache pollution, i.e., blocks with low reuse evicting blocks with high ...
Keywords:
insertion policy, memory, caching, pollution, thrashing
16
Linearly compressed pages: a main memory compression framework with low complexity and low latency
September 2012
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 5, Downloads (12 Months): 32, Downloads (Overall): 196
Full text available:
 PDF
Keywords:
cache compression, main memory compression
17
Chrysalis analysis: incorporating synchronization arcs in dataflow-analysis-based parallel monitoring
September 2012
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4, Downloads (12 Months): 16, Downloads (Overall): 118
Full text available:
 PDF
Software lifeguards, or tools that monitor applications at runtime, are an effective way of identifying program errors and security exploits. Parallel programs are susceptible to a wider range of possible errors than sequential programs, making them even more in need of online monitoring. Unfortunately, monitoring parallel applications is difficult ...
Keywords:
high-level synchronization, data flow analysis, vector clocks, dynamic program monitoring, parallel programming
18
Log-based architectures: using multicore to help software behave correctly
February 2011
ACM SIGOPS Operating Systems Review: Volume 45 Issue 1, January 2011
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 1, Downloads (12 Months): 23, Downloads (Overall): 242
Full text available:
 PDF
While application performance and power-efficiency are both important, application correctness is even more important. In other words, if the application is misbehaving, it is little consolation that it is doing so quickly or power-efficiently. In the Log-Based Architectures (LBA) project, we are focusing on a challenging source of application misbehavior: ...
Keywords:
log-based architectures, parallel monitoring, program monitoring, lifeguards, software bugs
19
Decoupled lifeguards: enabling path optimizations for dynamic correctness checking tools
May 2010
PLDI '10: Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 1, Downloads (12 Months): 16, Downloads (Overall): 379
Full text available:
 PDF
Dynamic correctness checking tools (a.k.a. lifeguards) can detect a wide array of correctness issues, such as memory, security, and concurrency misbehavior, in unmodified executables at run time. However, lifeguards that are implemented using dynamic binary instrumentation (DBI) often slow down the monitored application by 10-50X, while proposals that replace DBI ...
Keywords:
dynamic code optimization, dynamic program analysis, dynamic correctness checking
Also published in:
May 2010
ACM SIGPLAN Notices - PLDI '10: Volume 45 Issue 6, June 2010
20
ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications
March 2010
ACM SIGARCH Computer Architecture News - ASPLOS '10: Volume 38 Issue 1, March 2010
Publisher: ACM
Bibliometrics:
Citation Count: 19
Downloads (6 Weeks): 1, Downloads (12 Months): 14, Downloads (Overall): 438
Full text available:
 PDF
Instruction-grain lifeguards monitor the events of a running application at the level of individual instructions in order to identify and help mitigate application bugs and security exploits. Because such lifeguards impose a 10-100X slowdown on existing platforms, previous studies have proposed hardware designs to accelerate lifeguard processing. However, these accelerators ...
Keywords:
hardware support for debugging, instruction-grain lifeguards, online parallel monitoring
Also published in:
March 2010
ACM SIGPLAN Notices - ASPLOS '10: Volume 45 Issue 3, March 2010
March 2010
ASPLOS XV: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems