151 results found
1
How Emerging Memory Technologies Will Have You Rethinking Algorithm Design
July 2016
PODC '16: Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 28, Downloads (12 Months): 99, Downloads (Overall): 99
We are on the cusp of the emergence of a new wave of nonvolatile memory technologies that are projected to become the dominant type of main memory in the near future. A key property of these new memory technologies is their asymmetric read-write costs: Writes can be an order of ...
Keywords:
persistent memory, asymmetric read-write costs, memory hierarchies, write-efficient algorithms, models of computation, nvram, shared memory algorithms
2
Parallel Algorithms for Asymmetric Read-Write Costs
July 2016
SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 16, Downloads (12 Months): 68, Downloads (Overall): 68
Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small ...
Keywords:
non-volatile memory, breadth-first search, convex hull, list contraction, minimum spanning tree, parallel algorithms, tree contraction, write-avoiding, write-efficient, asymmetric nested-parallel, asymmetric read-write costs, work stealing
3
Experimental Analysis of Space-Bounded Schedulers
June 2016
ACM Transactions on Parallel Computing (TOPC) - Special Issue for SPAA 2014: Volume 3 Issue 1, June 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 5, Downloads (12 Months): 41, Downloads (Overall): 41
The running time of nested parallel programs on shared-memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processor cores on the machine. Recent work proposed “space-bounded schedulers” for scheduling such programs on the multilevel ...
Keywords:
cache misses, multicores, Thread schedulers, memory bandwidth, work stealing, space-bounded schedulers
4
Gather-scatter DRAM: in-DRAM address translation to improve the spatial locality of non-unit strided accesses
November 2015
MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 15, Downloads (12 Months): 139, Downloads (Overall): 211
Many data structures (e.g., matrices) are typically accessed with multiple access patterns. Depending on the layout of the data structure in physical address space, some access patterns result in non-unit strides. In existing systems, which are optimized to store and access cache lines, non-unit strided accesses exhibit low spatial locality. ...
Keywords:
SIMD, in-memory databases, memory bandwidth, performance, energy, strided accesses, DRAM, caches
5
Tracking and Reducing Uncertainty in Dataflow Analysis-Based Dynamic Parallel Monitoring
October 2015
PACT '15: Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)
Publisher: IEEE Computer Society
Dataflow analysis-based dynamic parallel monitoring (DADPM) is a recent approach for identifying bugs in parallel software as it executes, based on the key insight of explicitly modeling a sliding window of uncertainty across parallel threads. While this makes the approach practical and scalable, it also introduces the possibility of false positives in the analysis. In this ...
6
Fast Bulk Bitwise AND and OR in DRAM
June 2015
IEEE Computer Architecture Letters: Volume 14 Issue 2, July 2015
Publisher: IEEE Computer Society
Bitwise operations are an important component of modern day programming, and are used in a variety of applications such as databases. In this work, we propose a new and simple mechanism to implement bulk bitwise AND and OR operations in DRAM, which is faster and more efficient than existing mechanisms. ...
7
Sorting with Asymmetric Read and Write Costs
June 2015
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 11, Downloads (12 Months): 61, Downloads (Overall): 155
Emerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this paper we present models and algorithms that account for this difference, with a focus on write-efficient sorting algorithms. First, we consider the PRAM model ...
Keywords:
asymmetric read-write costs, matrix multiplication, mergesort, cache-oblivious algorithms, fft, non-volatile memory, sorting, external memory model, i/o buffer tree, parallel algorithms, sample sort, write-avoiding, write-efficient, persistent memory
8
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management
June 2015
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 15, Downloads (12 Months): 146, Downloads (Overall): 375
Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity---e.g., fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page ...
Also published in:
December 2015
ACM SIGARCH Computer Architecture News - ISCA'15: Volume 43 Issue 3, June 2015
9
Online Updates on Data Warehouses via Judicious Use of Solid-State Storage
March 2015
ACM Transactions on Database Systems (TODS): Volume 40 Issue 1, March 2015
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 8, Downloads (12 Months): 75, Downloads (Overall): 333
Data warehouses have been traditionally optimized for read-only query performance, allowing only offline updates at night, essentially trading off data freshness for performance. The need for 24x7 operations in global markets and the rise of online and other quickly reacting businesses make concurrent online updates increasingly desirable. Unfortunately, state-of-the-art approaches ...
Keywords:
Materialized sort merge, data warehouses, SSD, online updates
10
Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks
December 2014
ACM Transactions on Architecture and Code Optimization (TACO): Volume 11 Issue 4, January 2015
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 12, Downloads (12 Months): 76, Downloads (Overall): 233
Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache pollution, and propose a comprehensive mechanism to mitigate prefetcher-caused cache pollution. First, we observe ...
Keywords:
Prefetching, cache insertion/promotion policy, cache pollution, caches
11
Sequential random permutation, list contraction and tree contraction are highly parallel
December 2014
SODA '15: Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms
Publisher: Society for Industrial and Applied Mathematics
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 4, Downloads (12 Months): 35, Downloads (Overall): 54
We show that simple sequential randomized iterative algorithms for random permutation, list contraction, and tree contraction are highly parallel. In particular, if iterations of the algorithms are run as soon as all of their dependencies have been resolved, the resulting computations have logarithmic depth (parallel time) with high probability. Our ...
12
Experimental analysis of space-bounded schedulers
June 2014
SPAA '14: Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 0, Downloads (12 Months): 15, Downloads (Overall): 111
The running time of nested parallel programs on shared memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processors on the machine. Recent work proposed “space-bounded schedulers” for scheduling such programs on the multi-level ...
Keywords:
memory bandwidth, thread schedulers, work stealing, space-bounded schedulers, cache misses, multicores
13
Gleaner: mitigating the blocked-waiter wakeup problem for virtualized multicore applications
June 2014
USENIX ATC'14: Proceedings of the 2014 USENIX conference on USENIX Annual Technical Conference
Publisher: USENIX Association
As the number of cores in a multicore node increases in accordance with Moore's law, the question arises as to what are the costs of virtualized environments when scaling applications to take advantage of larger core counts. While a widely-known cost due to preempted spinlock holders has been extensively studied, ...
14
The dirty-block index
June 2014
ISCA '14: Proceedings of the 41st annual international symposium on Computer architecture
Publisher: IEEE Press
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 3, Downloads (12 Months): 56, Downloads (Overall): 226
On-chip caches maintain multiple pieces of metadata about each cached block---e.g., dirty bit, coherence information, ECC. Traditionally, such metadata for each block is stored in the corresponding tag entry in the tag store. While this approach is simple to implement and scalable, it necessitates a full tag store lookup for ...
Also published in:
October 2014
ACM SIGARCH Computer Architecture News - ISCA '14: Volume 42 Issue 3, June 2014
15
The Cost of Fault Tolerance in Multi-Party Communication Complexity
May 2014
Journal of the ACM (JACM): Volume 61 Issue 3, May 2014
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 2, Downloads (12 Months): 33, Downloads (Overall): 349
Multi-party communication complexity involves distributed computation of a function over inputs held by multiple distributed players. A key focus of distributed computing research, since the very beginning, has been to tolerate failures. It is thus natural to ask “If we want to compute a certain function in a fault-tolerant way, ...
Keywords:
Aggregate functions, communication complexity, fault tolerance, promise problems
16
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers
February 2014
ASPLOS '14: Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 4, Downloads (12 Months): 39, Downloads (Overall): 300
Device drivers are an Achilles' heel of modern commodity operating systems, accounting for far too many system failures. Previous work on driver reliability has focused on protecting the kernel from unsafe driver side-effects by interposing an invariant-checking layer at the driver interface, but otherwise treating the driver as a black ...
Keywords:
device drivers, dynamic analysis
Also published in:
April 2014
ACM SIGPLAN Notices - ASPLOS '14: Volume 49 Issue 4, April 2014
ACM SIGARCH Computer Architecture News - ASPLOS '14: Volume 42 Issue 1, March 2014
17
November 2013
NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems
Publisher: Curran Associates Inc.
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 1, Downloads (12 Months): 1, Downloads (Overall): 1
We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees. The parameter server provides an easy-to-use shared interface for read/write access to an ...
18
RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization
Vivek Seshadri,
Yoongu Kim,
Chris Fallin,
Donghyuk Lee,
Rachata Ausavarungnirun,
Gennady Pekhimenko,
Yixin Luo,
Onur Mutlu,
Phillip B. Gibbons,
Michael A. Kozuch,
Todd C. Mowry
November 2013
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 17
Downloads (6 Weeks): 8, Downloads (12 Months): 100, Downloads (Overall): 376
Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and ...
Keywords:
memory bandwidth, bulk operations, in-memory processing, performance, energy, page copy, DRAM, page initialization
19
Linearly compressed pages: a low-complexity, low-latency main memory compression framework
November 2013
MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 14
Downloads (6 Weeks): 16, Downloads (12 Months): 108, Downloads (Overall): 483
Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory ...
Keywords:
DRAM, data compression, memory, memory bandwidth, memory controller, memory capacity
20
Reducing contention through priority updates
July 2013
SPAA '13: Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 2, Downloads (12 Months): 34, Downloads (Overall): 167
Memory contention can be a serious performance bottleneck in concurrent programs on shared-memory multicore architectures. Having all threads write to a small set of shared locations, for example, can lead to orders of magnitude loss in performance relative to all threads writing to distinct locations, or even relative to a ...
Keywords:
memory contention, parallel programming