1
EC-cache: load-balanced, low-latency cluster caching with online erasure coding
November 2016
OSDI'16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation
Publisher: USENIX Association
Data-intensive clusters and object stores are increasingly relying on in-memory object caching to meet the I/O performance demands. These systems routinely face the challenges of popularity skew, background load imbalance, and server failures, which result in severe load imbalance across servers and degraded I/O performance. Selective replication is a commonly ...
2
Apache Spark: a unified engine for big data processing
Matei Zaharia,
Reynold S. Xin,
Patrick Wendell,
Tathagata Das,
Michael Armbrust,
Ankur Dave,
Xiangrui Meng,
Josh Rosen,
Shivaram Venkataraman,
Michael J. Franklin,
Ali Ghodsi,
Joseph Gonzalez,
Scott Shenker,
Ion Stoica
October 2016
Communications of the ACM: Volume 59 Issue 11, November 2016
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 1,816, Downloads (12 Months): 107,193, Downloads (Overall): 107,193
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
3
Trends and challenges in big data processing
September 2016
Proceedings of the VLDB Endowment: Volume 9 Issue 13, September 2016
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 17, Downloads (12 Months): 139, Downloads (Overall): 139
Almost six years ago we started the Spark project at UC Berkeley. Spark is a cluster computing engine that is optimized for in-memory processing, and unifies support for a variety of workloads, including batch, interactive querying, streaming, and iterative computations. Spark is now the most active big data project in ...
4
July 2016
ACM Transactions on Database Systems (TODS): Volume 41 Issue 3, August 2016
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 10, Downloads (12 Months): 160, Downloads (Overall): 160
Databases can provide scalability by partitioning data across several servers. However, multipartition, multioperation transactional access is often expensive, employing coordination-intensive locking, validation, or scheduling mechanisms. Accordingly, many real-world systems avoid mechanisms that provide useful semantics for multipartition operations. This leads to incorrect behavior for a large class of applications including ...
Keywords:
materialized views, Atomic visibility, secondary indexing, transaction processing, NoSQL
5
Time-evolving graph processing at scale
June 2016
GRADES '16: Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 20, Downloads (12 Months): 212, Downloads (Overall): 212
Time-evolving graph-structured big data arises naturally in many application domains such as social networks and communication networks. However, existing graph processing systems lack support for efficient computations on dynamic graphs. In this paper, we represent most computations on time-evolving graphs into (1) a stream of consistent and resilient graph ...
6
iOLAP: Managing Uncertainty for Efficient Incremental OLAP
June 2016
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 0
Downloads (6 Weeks): 16, Downloads (12 Months): 190, Downloads (Overall): 228
The size of data and the complexity of analytics continue to grow along with the need for timely and cost-effective analysis. However, the growth of computation power cannot keep up with the growth of data. This calls for a paradigm shift from the traditional batch OLAP processing model to an incremental ...
Keywords:
bootstrap, incremental, OLAP, approximate query processing
7
SparkR: Scaling R Programs with Spark
Shivaram Venkataraman,
Zongheng Yang,
Davies Liu,
Eric Liang,
Hossein Falaki,
Xiangrui Meng,
Reynold Xin,
Ali Ghodsi,
Michael Franklin,
Ion Stoica,
Matei Zaharia
June 2016
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 34, Downloads (12 Months): 386, Downloads (Overall): 462
R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single threaded and can only process data sets that fit in a single machine's memory. We ...
Keywords:
R, spark, statistical computing
8
April 2016
EuroSys '16: Proceedings of the Eleventh European Conference on Computer Systems
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 7, Downloads (12 Months): 116, Downloads (Overall): 218
Systems are increasingly required to provide responses to queries, even if not exact, within stringent time deadlines. These systems parallelize computations over many processes and aggregate them hierarchically to get the final response (e.g., search engines and data analytics). Due to large performance variations in clusters, some processes ...
Keywords:
deadline, order-statistics, partition-aggregate, quality, stragglers
9
HUG: multi-resource fairness for correlated and elastic demands
March 2016
NSDI'16: Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
In this paper, we study how to optimally provide isolation guarantees in multi-resource environments, such as public clouds, where a tenant's demands on different resources (links) are correlated. Unlike prior work such as Dominant Resource Fairness (DRF) that assumes static and fixed demands, we consider elastic demands. Our approach generalizes ...
10
FairRide: near-optimal, fair cache sharing
March 2016
NSDI'16: Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
Memory caches continue to be a critical component of many systems. In recent years, larger amounts of data have been moved into main memory, especially in shared environments such as the cloud. The nature of such environments requires resource allocations to provide both performance isolation for multiple users/applications and high ...
11
Ernest: efficient performance prediction for large-scale advanced analytics
March 2016
NSDI'16: Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
Recent workload trends indicate rapid growth in the deployment of machine learning, genomics and scientific workloads on cloud computing infrastructure. However, efficiently running these applications on shared infrastructure is challenging, and we find that choosing the right hardware configuration can significantly improve performance and cost. The key to addressing the ...
12
March 2016
NSDI'16: Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
Many prior efforts have suggested that Internet video Quality of Experience (QoE) could be dramatically improved by using data-driven prediction of video quality for different choices (e.g., CDN or bitrate) to make optimal decisions. However, building such a prediction system is challenging on two fronts. First, the relationships between video ...
13
FastLane: making short flows shorter with agile drop notification
August 2015
SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 7, Downloads (12 Months): 111, Downloads (Overall): 305
The drive towards richer and more interactive web content places increasingly stringent requirements on datacenter network performance. Applications running atop these networks typically partition an incoming query into multiple subqueries, and generate the final result by aggregating the responses for these subqueries. As a result, a large fraction --- as ...
Keywords:
datacenter networks, transport protocols
14
Low Latency Geo-distributed Data Analytics
August 2015
SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
Publisher: ACM
Bibliometrics:
Citation Count: 9
Downloads (6 Weeks): 20, Downloads (12 Months): 339, Downloads (Overall): 682
Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics ...
Keywords:
data analytics, low latency, wan analytics, network aware, geo-distributed
Also published in:
September 2015
ACM SIGCOMM Computer Communication Review - SIGCOMM'15: Volume 45 Issue 4, October 2015
15
Efficient Coflow Scheduling Without Prior Knowledge
August 2015
SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
Publisher: ACM
Bibliometrics:
Citation Count: 16
Downloads (6 Weeks): 19, Downloads (12 Months): 223, Downloads (Overall): 456
Inter-coflow scheduling improves application-level communication performance in data-parallel clusters. However, existing efficient schedulers require a priori coflow information and ignore cluster dynamics like pipelining, task failures, and speculative executions, which limit their applicability. Schedulers without prior knowledge compromise on performance to avoid head-of-line blocking. In this paper, we present Aalo ...
Keywords:
data-intensive applications, datacenter networks, coflow
Also published in:
September 2015
ACM SIGCOMM Computer Communication Review - SIGCOMM'15: Volume 45 Issue 4, October 2015
16
Scaling Spark in the real world: performance and usability
August 2015
Proceedings of the VLDB Endowment - Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii: Volume 8 Issue 12, August 2015
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 19, Downloads (12 Months): 324, Downloads (Overall): 662
Apache Spark is one of the most widely used open source processing engines for big data, with rich language-integrated APIs and a wide range of libraries. Over the past two years, our group has worked to deploy Spark to a wide range of organizations through consulting relationships as well as ...
17
May 2015
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 7
Downloads (6 Weeks): 22, Downloads (12 Months): 146, Downloads (Overall): 483
The rise of data-intensive "Web 2.0" Internet services has led to a range of popular new programming frameworks that collectively embody the latest incarnation of the vision of Object-Relational Mapping (ORM) systems, albeit at unprecedented scale. In this work, we empirically investigate modern ORM-backed applications' use and disuse of database ...
Keywords:
invariants, orms, concurrency control, impedance mismatch, ruby on rails, application integrity
18
G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data
May 2015
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 24, Downloads (12 Months): 171, Downloads (Overall): 405
Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, ...
Keywords:
online aggregation
19
C3: internet-scale control plane for video quality optimization
May 2015
NSDI'15: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
As Internet video goes mainstream, we see increasing user expectations for higher video quality and new global policy requirements for content providers. Inspired by the case for centralizing network-layer control, we present C3, a control system for optimizing Internet video delivery. The design of C3 addresses key challenges in ensuring ...
20
CellIQ: real-time cellular network analytics at scale
May 2015
NSDI'15: Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation
Publisher: USENIX Association
We present CellIQ, a real-time cellular network analytics system that supports rich and sophisticated analysis tasks. CellIQ is motivated by the lack of support for realtime analytics or advanced tasks such as spatio-temporal traffic hotspots and handoff sequences with performance problems in state-of-the-art systems, and the interest in such tasks ...