10 results found
1
Apache Spark: a unified engine for big data processing
Matei Zaharia,
Reynold S. Xin,
Patrick Wendell,
Tathagata Das,
Michael Armbrust,
Ankur Dave,
Xiangrui Meng,
Josh Rosen,
Shivaram Venkataraman,
Michael J. Franklin,
Ali Ghodsi,
Joseph Gonzalez,
Scott Shenker,
Ion Stoica
October 2016
Communications of the ACM: Volume 59 Issue 11, November 2016
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 1,816, Downloads (12 Months): 107,193, Downloads (Overall): 107,193
Full text available:
Html PDF
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
2
Introduction to Spark 2.0 for Database Researchers
June 2016
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 1
Downloads (6 Weeks): 28, Downloads (12 Months): 448, Downloads (Overall): 520
Full text available:
PDF
Originally started as an academic research project at UC Berkeley, Apache Spark is one of the most popular open source projects for big data analytics. Over 1000 volunteers have contributed code to the project; it is supported by virtually every commercial vendor; many universities are now offering courses on Spark. ...
Keywords:
machine learning, streaming, SQL, spark, Hadoop, big data
3
Scaling Spark in the real world: performance and usability
August 2015
Proceedings of the VLDB Endowment - Proceedings of the 41st International Conference on Very Large Data Bases, Kohala Coast, Hawaii: Volume 8 Issue 12, August 2015
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 5
Downloads (6 Weeks): 19, Downloads (12 Months): 324, Downloads (Overall): 662
Full text available:
PDF
Apache Spark is one of the most widely used open source processing engines for big data, with rich language-integrated APIs and a wide range of libraries. Over the past two years, our group has worked to deploy Spark to a wide range of organizations through consulting relationships as well as ...
4
Spark SQL: Relational Data Processing in Spark
Michael Armbrust,
Reynold S. Xin,
Cheng Lian,
Yin Huai,
Davies Liu,
Joseph K. Bradley,
Xiangrui Meng,
Tomer Kaftan,
Michael J. Franklin,
Ali Ghodsi,
Matei Zaharia
May 2015
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 70
Downloads (6 Weeks): 66, Downloads (12 Months): 1,064, Downloads (Overall): 2,737
Full text available:
PDF
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on our experience with Shark, Spark SQL lets Spark programmers leverage the benefits of relational processing (e.g. declarative queries and optimized storage), and lets SQL users call complex analytics libraries in ...
Keywords:
databases, spark, hadoop, data warehouse, machine learning
5
G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data
May 2015
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 11
Downloads (6 Weeks): 24, Downloads (12 Months): 171, Downloads (Overall): 405
Full text available:
PDF
Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, ...
Keywords:
online aggregation
6
Generalized scale independence through incremental precomputation
June 2013
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Publisher: ACM
Bibliometrics:
Citation Count: 8
Downloads (6 Weeks): 10, Downloads (12 Months): 59, Downloads (Overall): 321
Full text available:
PDF
Developers of rapidly growing applications must be able to anticipate potential scalability problems before they cause performance issues in production environments. A new type of data independence, called scale independence, seeks to address this challenge by guaranteeing a bounded amount of work is required to execute all queries in an ...
Keywords:
materialized view selection, scalability, scale independence
7
PIQL: success-tolerant query processing in the cloud
November 2011
Proceedings of the VLDB Endowment: Volume 5 Issue 3, November 2011
Publisher: VLDB Endowment
Bibliometrics:
Citation Count: 15
Downloads (6 Weeks): 3, Downloads (12 Months): 18, Downloads (Overall): 287
Full text available:
PDF
Newly-released web applications often succumb to a "Success Disaster," where overloaded database machines and resulting high response times destroy a previously good user experience. Unfortunately, the data independence provided by a traditional relational database system, while useful for agile development, only exacerbates the problem by hiding potentially expensive queries under ...
8
The case for PIQL: a performance insightful query language
June 2010
SoCC '10: Proceedings of the 1st ACM symposium on Cloud computing
Publisher: ACM
Bibliometrics:
Citation Count: 4
Downloads (6 Weeks): 5, Downloads (12 Months): 41, Downloads (Overall): 418
Full text available:
PDF
Large-scale, user-facing applications are increasingly moving from relational databases to distributed key/value stores for high-request-rate, low-latency workloads. Often, this move is motivated not only by key/value stores' ability to scale simply by adding more hardware, but also by the easy to understand predictable performance they provide for all operations. For ...
Keywords:
databases, performance
9
PIQL: a performance insightful query language
June 2010
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Publisher: ACM
Bibliometrics:
Citation Count: 3
Downloads (6 Weeks): 21, Downloads (12 Months): 112, Downloads (Overall): 407
Full text available:
PDF
Large-scale websites are increasingly moving from relational databases to distributed key-value stores for high request rate, low latency workloads. Often this move is motivated not only by key-value stores' ability to scale simply by adding more hardware, but also by the easy to understand predictable performance they provide for all ...
Keywords:
databases
10
A view of cloud computing
Michael Armbrust,
Armando Fox,
Rean Griffith,
Anthony D. Joseph,
Randy Katz,
Andy Konwinski,
Gunho Lee,
David Patterson,
Ariel Rabkin,
Ion Stoica,
Matei Zaharia
April 2010
Communications of the ACM: Volume 53 Issue 4, April 2010
Publisher: ACM
Bibliometrics:
Citation Count: 841
Downloads (6 Weeks): 2,404, Downloads (12 Months): 32,568, Downloads (Overall): 313,542
Full text available:
Html PDF
Clearing the clouds away from the true potential and obstacles posed by this computing capability.