Joel S. Emer

MIT homepage
emer at acm.org

Bibliometrics: publication history
Average citations per article: 32.63
Citation Count: 2,088
Publication count: 64
Publication years: 1984-2016
Available for download: 42
Average downloads per article: 822.10
Downloads (cumulative): 34,528
Downloads (12 Months): 3,355
Downloads (6 Weeks): 364
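
The two averages above appear to follow from the listed totals. The sketch below checks that arithmetic, assuming each average is a simple ratio rounded half-up to two decimals, and that downloads are averaged over the 42 downloadable articles rather than all 64 publications. The function name `average` is hypothetical and is not part of any ACM tool.

```python
from decimal import Decimal, ROUND_HALF_UP


def average(total: int, count: int) -> Decimal:
    """Return total / count rounded half-up to two decimal places.

    Decimal is used because 2088 / 64 is exactly 32.625, and Python's
    built-in round() uses round-half-even on that value.
    """
    ratio = Decimal(total) / Decimal(count)
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Citation Count / Publication count (assumed definition)
citations_per_article = average(2088, 64)

# Downloads (cumulative) / Available for download (assumed definition)
downloads_per_article = average(34528, 42)

print(citations_per_article)
print(downloads_per_article)
```

Under these assumptions both results match the figures shown on the page. Dividing the cumulative downloads by all 64 publications would not.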
ACM Fellow
66 results found

Result 1 – 20 of 66

1 published by ACM
Hsin-Jung Yang, Kermin Fleming, Felix Winterstein, Michael Adler, Joel Emer
March 2017 ACM Transactions on Reconfigurable Technology and Systems (TRETS) - Special Section on Field Programmable Logic and Applications 2015 and Regular Papers: Volume 10 Issue 2, April 2017
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 22,   Downloads (12 Months): 32,   Downloads (Overall): 32

Full text available: PDF
High-level abstractions separate algorithm design from platform implementation, allowing programmers to focus on algorithms while building complex systems. This separation also provides system programmers and compilers an opportunity to optimize platform services on an application-by-application basis. In field-programmable gate arrays (FPGAs), platform-level malleability extends to the memory system: Unlike general-purpose ...
Keywords: scalable cache, memory hierarchy, resource-aware optimization, FPGA

2 published by ACM
Hsin-Jung Yang, Kermin Fleming, Felix Winterstein, Annie I. Chen, Michael Adler, Joel Emer
February 2017 FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 16,   Downloads (12 Months): 105,   Downloads (Overall): 105

Full text available: PDF
Memory systems play a key role in the performance of FPGA applications. As FPGA deployments move towards design entry points that are more serial, memory latency has become a serious design consideration. For these applications, memory network optimization is essential in improving performance. In this paper, we examine the automatic, ...
Keywords: network on chip, FPGA, compiler optimization, memory network, memory system, reconfigurable computing

3 published by ACM
CLARA: Circular Linked-List Auto and Self Refresh Architecture
Aditya Agrawal, Mike O'Connor, Evgeny Bolotin, Niladrish Chatterjee, Joel Emer, Stephen Keckler
October 2016 MEMSYS '16: Proceedings of the Second International Symposium on Memory Systems
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 5,   Downloads (12 Months): 47,   Downloads (Overall): 47

Full text available: PDF
With increasing DRAM densities, the performance and energy overheads of refresh operations are increasingly significant. When the system is active, refresh commands render DRAM banks unavailable for increasing periods of time. These refresh operations can interfere with regular memory operations and hurt performance. In addition, when the system is idle, ...
Keywords: Self refresh, DRAM, Auto refresh

4
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
Yu-Hsin Chen, Joel Emer, Vivienne Sze
June 2016 ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture
Publisher: IEEE Press
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 27,   Downloads (12 Months): 240,   Downloads (Overall): 240

Full text available: PDF
Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. ...
Also published in:
October 2016  ACM SIGARCH Computer Architecture News - ISCA'16: Volume 44 Issue 3, June 2016

5 published by ACM
LMC: Automatic Resource-Aware Program-Optimized Memory Partitioning
Hsin-Jung Yang, Kermin Fleming, Michael Adler, Felix Winterstein, Joel Emer
February 2016 FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 6,   Downloads (12 Months): 100,   Downloads (Overall): 180

Full text available: PDF
As FPGAs have grown in size and capacity, FPGA memory systems have become both richer and more diverse in order to support the increased computational capacity of FPGA fabrics. Using these resources, and using them well, has become commensurately more difficult, especially in the context of legacy designs ported from ...
Keywords: resource-aware optimization, fpga memory partitioning

6 published by ACM
A scalable architecture for ordered parallelism
Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, Daniel Sanchez
December 2015 MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 15,   Downloads (12 Months): 201,   Downloads (Overall): 356

Full text available: PDF
We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ...
Keywords: ordered parallelism, speculative execution, multicore, fine-grain parallelism, irregular parallelism, synchronization

7 published by ACM
A fast and accurate analytical technique to compute the AVF of sequential bits in a processor
Steven Raasch, Arijit Biswas, Jon Stephan, Paul Racunas, Joel Emer
December 2015 MICRO-48: Proceedings of the 48th International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 3,   Downloads (12 Months): 53,   Downloads (Overall): 150

Full text available: PDF
The rate of particle induced soft errors in a processor increases in proportion to the number of bits. This soft error rate (SER) can limit the performance of a system by placing an effective limit on the number of cores, nodes or clusters. The vulnerability of bits in a processor ...
Keywords: fault injection, sequentials, AVF, fault simulation, soft error, ACE analysis, reliability

8 published by ACM
September 2015 ACM Transactions on Computer Systems (TOCS): Volume 33 Issue 3, September 2015
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 13,   Downloads (12 Months): 79,   Downloads (Overall): 286

Full text available: PDF
There has been recent interest in exploring the acceleration of nonvectorizable workloads with spatially programmed architectures that are designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the ...
Keywords: Spatial programming, reconfigurable accelerators

9
LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories
Hsin Jung Yang, Kermin Fleming, Michael Adler, Joel Emer
May 2014 FCCM '14: Proceedings of the 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 3

Parallel programming has been widely used in many scientific and technical areas to solve large problems. While general-purpose processors have rich infrastructure to support parallel programming on shared memory, such as coherent caches and synchronization libraries, parallel programming infrastructure for FPGAs is limited. Thus, development of FPGA-based parallel algorithms remains ...
Keywords: FPGA shared memory, coherency, synchronization

10 published by ACM
Samantika Subramaniam, Simon C. Steely, Will Hasenplaugh, Aamer Jaleel, Carl Beckmann, Tryggve Fossum, Joel Emer
December 2013 ACM Transactions on Architecture and Code Optimization (TACO): Volume 10 Issue 4, December 2013
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 2,   Downloads (12 Months): 23,   Downloads (Overall): 323

Full text available: PDF
As microprocessor designs integrate more cores, scalability of cache coherence protocols becomes a challenging problem. Most directory-based protocols avoid races by using blocking tag directories that can impact the performance of parallel applications. In this article, we first quantitatively demonstrate that state-of-the-art blocking protocols significantly constrain throughput at large core ...
Keywords: tag directories, Cache coherence, nonblocking, synchronization

11 published by ACM
Triggered instructions: a control paradigm for spatially-programmed architectures
June 2013 ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
Publisher: ACM
Bibliometrics:
Citation Count: 12
Downloads (6 Weeks): 19,   Downloads (12 Months): 129,   Downloads (Overall): 1,050

Full text available: PDF
In this paper, we present triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication ...
Keywords: reconfigurable accelerators, spatial programming
Also published in:
June 2013  ACM SIGARCH Computer Architecture News - ISCA '13: Volume 41 Issue 3, June 2013

12
A Hierarchical Architectural Framework for Reconfigurable Logic Computing
May 2013 IPDPSW '13: Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 0

Recently there has been growing interest in using Reconfigurable Logic (RL) for computation because of the significant performance gains that they can provide over traditional architectures on many classes of workloads. While there is a rich body of prior work proposing a variety of reconfigurable systems, we believe there hasn't ...
Keywords: reconfigurable logic architecture taxonomy

13
Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)
Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, Joel Emer
June 2012 ISCA '12: Proceedings of the 39th Annual International Symposium on Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 51
Downloads (6 Weeks): 13,   Downloads (12 Months): 174,   Downloads (Overall): 1,422

Full text available: PDF
Single-ISA heterogeneous multi-core processors are typically composed of small (e.g., in-order) power-efficient cores and big (e.g., out-of-order) high-performance cores. The effectiveness of heterogeneous multi-cores depends on how well a scheduler can map workloads onto the most appropriate core type. In general, small cores can achieve good performance if the workload ...
Also published in:
September 2012  ACM SIGARCH Computer Architecture News - ISCA '12: Volume 40 Issue 3, June 2012

14 published by ACM
CRUISE: cache replacement and utility-aware scheduling
Aamer Jaleel, Hashem H. Najaf-abadi, Samantika Subramaniam, Simon C. Steely, Joel Emer
March 2012 ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Publisher: ACM
Bibliometrics:
Citation Count: 20
Downloads (6 Weeks): 7,   Downloads (12 Months): 59,   Downloads (Overall): 934

Full text available: PDF
When several applications are co-scheduled to run on a system with multiple shared LLCs, there is opportunity to improve system performance. This opportunity can be exploited by the hardware, software, or a combination of both hardware and software. The software, i.e., an operating system or hypervisor, can improve system performance ...
Keywords: cache replacement, shared cache, scheduling
Also published in:
April 2012  ACM SIGARCH Computer Architecture News - ASPLOS '12: Volume 40 Issue 1, March 2012
June 2012  ACM SIGPLAN Notices - ASPLOS '12: Volume 47 Issue 4, April 2012

15 published by ACM
Leveraging latency-insensitivity to ease multiple FPGA design
February 2012 FPGA '12: Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Publisher: ACM
Bibliometrics:
Citation Count: 6
Downloads (6 Weeks): 4,   Downloads (12 Months): 34,   Downloads (Overall): 323

Full text available: PDF
Traditionally, hardware designs partitioned across multiple FPGAs have had low performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete FPGAs. In this paper, we present a mechanism by which complex designs may be efficiently and automatically partitioned among multiple FPGAs using explicitly programmed latency-insensitive links. We describe the ...
Keywords: high-level synthesis, switch architecture, FPGA, programming languages, DSP, compiler, design automation

16 published by ACM
The gradient-based cache partitioning algorithm
William Hasenplaugh, Pritpal S. Ahuja, Aamer Jaleel, Simon Steely Jr., Joel Emer
January 2012 ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers: Volume 8 Issue 4, January 2012
Publisher: ACM
Bibliometrics:
Citation Count: 2
Downloads (6 Weeks): 5,   Downloads (12 Months): 41,   Downloads (Overall): 496

Full text available: PDF
This paper addresses the problem of partitioning a cache between multiple concurrent threads and in the presence of hardware prefetching. Cache replacement designed to preserve temporal locality (e.g., LRU) will allocate cache resources proportional to the miss-rate of each competing thread irrespective of whether the cache space will be utilized ...
Keywords: adaptive caching, chernoff bound, hill climbing, dynamic control, gradient descent, Cache replacement, dynamic cache partitioning, insertion policy

17 published by ACM
PACMan: prefetch-aware cache management for high performance caching
Carole-Jean Wu, Aamer Jaleel, Margaret Martonosi, Simon C. Steely, Jr., Joel Emer
December 2011 MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 22
Downloads (6 Weeks): 6,   Downloads (12 Months): 69,   Downloads (Overall): 874

Full text available: PDF
Hardware prefetching and last-level cache (LLC) management are two independent mechanisms to mitigate the growing latency to memory. However, the interaction between LLC management and hardware prefetching has received very little attention. This paper characterizes the performance of state-of-the-art LLC management policies in the presence and absence of hardware prefetching. ...
Keywords: prefetch-aware replacement, set dueling, reuse distance prediction, shared cache

18 published by ACM
SHiP: signature-based hit predictor for high performance caching
Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon C. Steely, Jr., Joel Emer
December 2011 MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: ACM
Bibliometrics:
Citation Count: 30
Downloads (6 Weeks): 9,   Downloads (12 Months): 195,   Downloads (Overall): 969

Full text available: PDF
The shared last-level caches in CMPs play an important role in improving application performance and reducing off-chip memory bandwidth requirements. In order to use LLCs more efficiently, recent research has shown that changing the re-reference prediction on cache insertions and cache hits can significantly improve cache performance. A fundamental challenge, ...
Keywords: replacement, reuse distance prediction, shared cache

19 published by ACM
February 2011 FPGA '11: Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Publisher: ACM
Bibliometrics:
Citation Count: 18
Downloads (6 Weeks): 4,   Downloads (12 Months): 96,   Downloads (Overall): 577

Full text available: PDF
Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays and ...
Keywords: memory management, caches, fpga

20
HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing
February 2011 HPCA '11: Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Publisher: IEEE Computer Society
Bibliometrics:
Citation Count: 17

In this paper we present the HAsim FPGA-accelerated simulator. HAsim is able to model a shared-memory multicore system including detailed core pipelines, cache hierarchy, and on-chip network, using a single FPGA. We describe the scaling techniques that make this possible, including novel uses of time-multiplexing in the core pipeline and ...



The ACM Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.