June 15, 2019 | Palace Hotel, San Francisco
ACM-IMS Interdisciplinary Summit on the Foundations of Data Science
ACM and the Institute of Mathematical Statistics are bringing together speakers and panelists to address topics such as deep learning, reinforcement learning, fairness, ethics, and the future of data science. Jeannette Wing and David Madigan are the event Co-Chairs.
https://www.acm.org/data-science-summit
The March/April 2019 issue of acmqueue is out now
March/April 2019
The Soft Side of Software
Overly Attached
Kate Matsudaira
Know when to let go of emotional attachment to your work.
A smart, senior engineer couldn't make logical decisions if it meant deprecating the system he and his team had worked on for a number of years. Even though the best thing would have been to help another team create the replacement system, they didn't want to entertain the idea because it would mean putting an end to something they had invested so much in. It is good to have strong ownership, but what happens when you get too attached?
Business and Management,
The Soft Side of Software
Industry-scale Knowledge Graphs: Lessons and Challenges
Natasha Noy, Yuqing Gao, Anshu Jain, Anant Narayanan, Alan Patterson, Jamie Taylor
Five diverse technology companies show how it's done
This article looks at the knowledge graphs of five diverse tech companies, comparing the similarities and differences in their respective experiences of building and using the graphs, and discussing the challenges that all knowledge-driven enterprises face today.
The collection of knowledge graphs discussed here covers the breadth of applications, from search, to product descriptions, to social networks.
The goal here is not to describe these knowledge graphs exhaustively, but rather to use the authors' practical experiences in building knowledge graphs in some of the largest technology companies today as a scaffolding to highlight the challenges that any enterprise-scale knowledge graph will face and where some innovative research is needed.
Data and Databases,
Development,
Networks
The Morning Paper:
GAN Dissection and Datacenter RPCs
Adrian Colyer
Visualizing and understanding generative adversarial networks; datacenter RPCs can be general and fast.
Image generation using GANs (generative adversarial networks) has made astonishing progress over the past few years. While staring in wonder at some of the incredible images, it's natural to ask how such feats are possible. "GAN Dissection: Visualizing and Understanding Generative Adversarial Networks" gives us a look under the hood to see what kinds of things are being learned by GAN units, and how manipulating those units can affect the generated images.
February saw the 16th edition of the Usenix Symposium on Networked Systems Design and Implementation. Kalia et al. blew me away with their work on fast RPCs (remote procedure calls) in the datacenter. Through a carefully considered design, they show that RPC performance with commodity CPUs and standard lossy Ethernet can be competitive with specialized systems based on FPGAs (field-programmable gate arrays), programmable switches, and RDMA (remote direct memory access). It's a fabulous reminder to ensure we're making the most of what we already have before leaping to more expensive solutions.
Data and Databases,
Development,
Networks
January/February 2019
Research for Practice:
Troubling Trends in Machine Learning Scholarship
Zachary C. Lipton, Jacob Steinhardt
Some ML papers suffer from flaws that could mislead the public and stymie future research.
Flawed scholarship threatens to mislead the public and stymie future research by compromising ML's intellectual foundations. Indeed, many of these problems have recurred cyclically throughout the history of AI and, more broadly, in scientific research. In 1976, Drew McDermott chastised the AI community for abandoning self-discipline, warning prophetically that "if we can't criticize ourselves, someone else will save us the trouble." The current strength of machine learning owes to a large body of rigorous research to date, both theoretical and empirical. By promoting clear scientific thinking and communication, our community can sustain the trust and investment it currently enjoys.
Artificial Intelligence,
Research for Practice
Everything Sysadmin
Tom's Top Ten Things Executives Should Know About Software
Thomas A. Limoncelli
Software acumen is the new norm.
Software is eating the world. To do their jobs well, executives and managers outside of technology will benefit from understanding some fundamentals of software and the software-delivery process.
Business and Management,
Everything Sysadmin
Garbage Collection as a Joint Venture
Ulan Degenbaev, Michael Lippautz, Hannes Payer
A collaborative approach to reclaiming memory in heterogeneous software systems
Cross-component tracing (CCT) is a way to solve the problem of reference cycles across component boundaries. This problem appears as soon as components can form arbitrary object graphs with nontrivial ownership across API boundaries. An incremental version of CCT is implemented in V8 and Blink, enabling effective and efficient reclamation of memory in a safe manner.
Languages
The Soft Side of Software
How to Create a Great Team Culture (and Why It Matters)
Kate Matsudaira
Build safety, share vulnerability, and establish purpose.
As leader of the team, you have significant influence over your team's culture. You can institute policies and procedures that help make your team happy and productive, monitor team successes, and continually improve the team. Another important part of team culture, however, is helping people feel they are a part of creating it. How can you expand the job of creating a culture to other team members?
Business and Management,
The Soft Side of Software
Online Event Processing
Martin Kleppmann, Alastair R. Beresford, Boerge Svingen
Achieving consistency where distributed transactions have failed
Support for distributed transactions across heterogeneous storage technologies is either nonexistent or suffers from poor operational and performance characteristics. In contrast, OLEP (online event processing) is increasingly used to provide good performance and strong consistency guarantees in such settings. In data systems it is very common for logs to be used as internal implementation details. The OLEP approach is different: it uses event logs, rather than transactions, as the primary application programming model for data management. Traditional databases are still used, but their writes come from a log rather than directly from the application. The use of OLEP is not simply pragmatism on the part of developers; it also offers a number of advantages. Consequently, OLEP is expected to be increasingly used to provide strong consistency in large-scale systems that use heterogeneous storage technologies.
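As a rough illustration of the programming model described in this abstract, the sketch below has the application append events to a log while two independent stores are updated only by reading that log in order. It is a minimal in-memory sketch, not the authors' implementation: `EventLog`, `Subscriber`, `apply_balance`, and `apply_audit` are hypothetical names, and a real deployment would use a durable, replicated log.

```python
class EventLog:
    """Append-only in-memory log; stands in for a durable log service (assumption)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the newly appended event

    def read_from(self, offset):
        # Return (offset, event) pairs starting at the given offset.
        return list(enumerate(self._events))[offset:]


class Subscriber:
    """Applies events to its own store in log order and remembers its offset."""

    def __init__(self, log, apply):
        self.log, self.apply, self.offset, self.store = log, apply, 0, {}

    def catch_up(self):
        for offset, event in self.log.read_from(self.offset):
            self.apply(self.store, event)
            self.offset = offset + 1


def apply_balance(store, event):
    # Hypothetical "database" view: running balance per account.
    store[event["account"]] = store.get(event["account"], 0) + event["amount"]


def apply_audit(store, event):
    # Hypothetical second store: full history of amounts per account.
    store.setdefault(event["account"], []).append(event["amount"])


log = EventLog()
balances = Subscriber(log, apply_balance)
audit = Subscriber(log, apply_audit)

# The application writes only to the log, never directly to either store.
log.append({"account": "alice", "amount": 50})
log.append({"account": "alice", "amount": -20})

balances.catch_up()
audit.catch_up()
```

Because both stores consume the same ordered log, they end up reflecting the same sequence of events without a distributed transaction spanning them.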
Distributed Development
Kode Vicious
The Worst Idea of All Time
Revelations at 100!
So, is the author behind Kode Vicious really a big, loud jerk who throws coworkers out windows, flattens the tires of the annoying marketing guy, drinks heavily, and beats and berates his colleagues? The answer is both yes and no.
Kode Vicious
Net Neutrality: Unexpected Solution to Blockchain Scaling
Aleksandar Kuzmanovic
Cloud-delivery networks could dramatically improve blockchains' scalability, but clouds must be provably neutral first.
There is a growing expectation, or at least a hope, that blockchains possess a disruptive potential in numerous domains because of their decentralized nature (i.e., no single entity controls their operations). Decentralization comes with a price, however: blockchains do not scale.
Provably neutral clouds are undoubtedly a viable solution to blockchain scaling.
By optimizing the transport layer, not only can the throughput be fundamentally scaled up, but the latency could be dramatically reduced.
The key to this vision, however, lies in establishing the blockchain ecosystem's trust in the underlying networking infrastructure.
This, in turn, is achieved by decoupling authority from infrastructure via a provably neutral network design.
Networks
The Morning Paper:
SageDB and NetAccel
Adrian Colyer
Learned models within the database system; network-accelerated query processing
The CIDR (Conference on Innovative Data Systems Research) runs once every two years, and luckily for us 2019 is one of those years. I've selected two papers from this year's conference that highlight bold and exciting directions for data systems.
Development
November/December 2018
Identity by Any Other Name
Pat Helland
The complex cacophony of intertwined systems
As distributed systems scale in size and heterogeneity, increasingly they are connected by identifiers. Frequently, these terms refer to immutable things. At other times, they refer to stuff that changes as time goes on. Identifiers are even used to represent the nature of the computation working across distrusting systems.
Identity and identifiers provide the immutable linkage. Both sides of this linkage may change, but they provide a semantic consistency needed by the business operation. No matter what you call it, identity is the glue that makes things stick and lubricates cooperative work.
Data and Databases,
Distributed Computing
Research for Practice:
Edge Computing
Nitesh Mor
Scaling resources within multiple administrative domains
Cloud computing taught practitioners how to scale resources within a single administrative domain. Edge computing requires learning how to scale across many administrative domains.
Creating edge computing infrastructures and applications encompasses quite a breadth of systems research. Let's take a look at the academic view of edge computing and a sample of existing research that will be relevant in the coming years.
Data and Databases,
Distributed Computing,
Research for Practice
Achieving Digital Permanence
Raymond Blum, Betsy Beyer
The many challenges to maintaining stored information and ways to overcome them
Today's Information Age is creating new uses for and new ways to steward the data that the world depends on. The world is moving away from familiar, physical artifacts to new means of representation that are closer to information in its essence. We need processes to ensure both the integrity and accessibility of knowledge in order to guarantee that history will be known and true.
Data and Databases,
Web Services
Kode Vicious
Know Your Algorithms
Stop using hardware to solve software problems.
Knowing that your CPU is in use 100 percent of the time doesn't tell you much about the overall system other than it's busy, but busy with what? Maybe it's sitting in a tight loop, or some clown added a bunch of delay loops during testing that are no longer necessary. Until you profile your system, you have no idea why the CPU is busy. All systems provide some form of profiling so that you can track down where the bottlenecks are, and it's your responsibility to apply these tools before you spend money on brand new hardware.
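One way to see what a busy CPU is actually busy with is sketched below, using Python's standard `cProfile` and `pstats` modules. The functions `leftover_delay`, `real_work`, and `handle_request` are hypothetical stand-ins for an application, with the delay loop playing the part of the forgotten test code.

```python
import cProfile
import io
import pstats
import time


def leftover_delay():
    """Stand-in for a delay loop left behind after testing (hypothetical)."""
    deadline = time.perf_counter() + 0.05
    while time.perf_counter() < deadline:
        pass  # spins the CPU without doing useful work


def real_work():
    return sum(i * i for i in range(10_000))


def handle_request():
    leftover_delay()
    return real_work()


# Profile one request and collect the report into a string.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

Sorting by cumulative time puts the functions that account for the most wall-clock time at the top of the report, which is usually the first place to look before buying hardware.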
Development,
Kode Vicious
Metrics That Matter
Benjamin Treynor Sloss, Shylaja Nukala, and Vivek Rau
Critical but oft-neglected service metrics that every SRE and product owner should care about
Measure your site reliability metrics, set the right targets, and go through the work to measure the metrics accurately. Then, you'll find that your service runs better, with fewer outages and much greater user adoption.
Web Services
The Soft Side of Software
Design Patterns for Managing Up
Kate Matsudaira
Four challenging work situations and how to handle them
Have you ever been in a situation where you are presenting to your manager or your manager's manager and you completely flub the opportunity by saying all the wrong things? Look for patterns and be the version of yourself that you want to be. When you have a plan in place, you are much more likely to succeed.
Business and Management,
The Soft Side of Software
A Hitchhiker's Guide to the Blockchain Universe
Jim Waldo
Blockchain remains a mystery, despite its growing acceptance.
It is difficult these days to avoid hearing about blockchain. Despite the significant potential of blockchain, it is also difficult to find a consistent description of what it really is. This article looks at the basics of blockchain: the individual components, how those components fit together, and what changes might be made to solve some of the problems with blockchain technology.
Networks,
Security
September/October 2018
Tear Down the Method Prisons! Set Free the Practices!
Ivar Jacobson, Roly Stimson
Essence: a new way of thinking that promises to liberate the practices and enable true learning organizations
This article explains why we need to break out of this repetitive dysfunctional behavior, and it introduces Essence, a new way of thinking that promises to free the practices from their method prisons and thus enable true learning organizations.
Development
Research for Practice:
Security for the Modern Age
Jessie Frazelle
Securely running processes that require the entire syscall interface
While evidence has shown that "a container with a well-crafted seccomp profile provides roughly equivalent security to a hypervisor," methods are still needed for securely running those processes that require the entire syscall interface. Solving this problem has led to some interesting research.
The container ecosystem is very fast paced. Numerous companies are building products on top of existing technologies, while enterprises are using these technologies and products to run their infrastructures. The focus of the three papers described here is on advancements to the underlying technologies themselves and strategic ways to secure software in the modern age.
Giving operators a usable means of securing the methods they use to deploy and run applications is a win for everyone. Keeping the usability-focused abstractions provided by containers, while finding new ways to automate security and defend against attacks, is a great path forward.
Development,
Performance,
Research for Practice,
Security
Everything Sysadmin
SQL is No Excuse to Avoid DevOps
Thomas A. Limoncelli
Automation and a little discipline allow better testing, shorter release cycles, and reduced business risk.
Using SQL databases is not an impediment to doing DevOps. Automating schema management and a little developer discipline enable more vigorous and repeatable testing, shorter release cycles, and reduced business risk.
Automating releases liberates us. It turns a worrisome, stressful, manual upgrade process into a regular event that happens without incident. It reduces business risk but, more importantly, creates a more sustainable workplace.
When you can confidently deploy new releases, you do it more frequently. New features that previously sat unreleased for weeks or months now reach users sooner. Bugs are fixed faster. Security holes are closed sooner. It enables the company to provide better value to customers.
Data and Databases,
Development,
Everything Sysadmin,
Systems Administration
Understanding Database Reconstruction Attacks on Public Data
Simson Garfinkel, John M. Abowd, and Christian Martindale, U.S. Census Bureau
These attacks on statistical databases are no longer a theoretical danger.
With the dramatic improvement in both computer speeds and the efficiency of solvers for SAT and other NP-hard problems in the last decade, DRAs (database reconstruction attacks) on statistical databases are no longer just a theoretical danger. The vast quantity of data products published by statistical agencies each year may give a determined attacker more than enough information to reconstruct some or all of a target database and breach the privacy of millions of people. Traditional disclosure-avoidance techniques are not designed to protect against this kind of attack.
Faced with the threat of database reconstruction, statistical agencies have two choices: they can either publish dramatically less information or use some kind of noise injection. Agencies can use differential privacy to determine the minimum amount of noise necessary to add, and the most efficient way to add that noise, in order to achieve their privacy protection goals.
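To make noise injection concrete, the sketch below applies the Laplace mechanism, a standard differential-privacy technique, to a single count. The abstract does not say which mechanism the authors use, so this is an illustration only; `laplace_noise`, `noisy_count`, and the parameter values are hypothetical. The noise scale is sensitivity divided by epsilon, so a smaller epsilon (stronger privacy) means more noise.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count perturbed by Laplace noise of scale sensitivity/epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Fixed seed so the illustration is repeatable; the values are assumptions.
rng = random.Random(0)
samples = [noisy_count(100, epsilon=0.5, rng=rng) for _ in range(10_000)]
mean_estimate = sum(samples) / len(samples)
```

Each published value is individually perturbed, yet the noise has zero mean, so aggregate statistics remain useful while any single released count reveals less about the underlying records.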
Data and Databases,
Security
Kode Vicious
Writing a Test Plan
Establish your hypotheses, methodologies, and expected results.
If you can think of each of your tests as an experiment with a hypothesis, a test methodology, and a test result, it should all fall into place rather than falling through the cracks.
Development,
Kode Vicious
The Soft Side of Software
The Importance of a Great Finish
Kate Matsudaira
You have to finish strong, every time.
How can you make sure that you are recognized as a valuable member of your team, whose work is seen as critical to the team's success? Here is how to keep your momentum up and make the right moves to be a visible contributor to the final success of every project.
Business and Management,
The Soft Side of Software
Case Study
CodeFlow: Improving the Code Review Process at Microsoft
A discussion with Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas Panjer, and Terry Coatta
People may associate code reviews with debugging, but that's not as central to the code-review process as you might think. The real win comes in the form of improved long-term code maintainability.
Case Studies,
Workflow
Benchmarking "Hello, World!"
Richard L. Sites
Six different views of the execution of "Hello, World!" show what is often missing in today's tools
Too often a service provider has a performance promise to keep but few tools for measuring the existence of laggard transactions, and none at all for understanding their root causes. As more and more software moves off the desktop and into data centers, and more and more cell phones use server requests as the other half of apps, observation tools for large-scale distributed transaction systems are not keeping up. Know what each tool you use is blind to, know what information you need to understand a performance problem, and then look for tools that can actually observe that information directly.
Development,
Performance
July/August 2018
Using Remote Cache Service for Bazel
Alpha Lam
Save time by sharing and reusing build and test output
Bazel is an actively developed open-source build and test system that aims to increase productivity in software development. It has a growing number of optimizations to improve the performance of daily development tasks. Remote cache service is a new development that significantly saves time in running builds and tests. It is particularly useful for a large code base and any size of development team.
Development
Kode Vicious
A Chance Gardener
Harvesting open-source products and planting the next crop
It is a very natural progression for a company to go from being a pure consumer of open source, to interacting with the project via patch submission, and then becoming a direct contributor. No one would expect a company to be a direct contributor to all the open-source projects it consumes, as most companies consume far more software than they would ever produce, which is the bounty of the open-source garden. It ought to be the goal of every company consuming open source to contribute something back, however, so that its garden continues to bear fruit, instead of rotting vegetables.
Kode Vicious,
Open Source
Why SRE Documents Matter
Shylaja Nukala, Vivek Rau
How documentation enables SRE teams to manage new and existing services
SRE (site reliability engineering) is a job function, a mindset, and a set of engineering approaches for making web products and services run reliably. SREs operate at the intersection of software development and systems engineering to solve operational problems and engineer solutions to design, build, and run large-scale distributed systems scalably, reliably, and efficiently. A mature SRE team likely has well-defined bodies of documentation associated with many SRE functions. If you manage an SRE team or intend to start one, this article will help you understand the types of documents your team needs to write and why each type is needed, allowing you to plan for and prioritize documentation work along with other team projects.
Web Development
How to Live in a Post-Meltdown and -Spectre World
Rich Bennett, Craig Callahan, Stacy Jones, Matt Levine, Merrill Miller, and Andy Ozment
Learn from the past to prepare for the next battle.
The scope of vulnerabilities such as Meltdown and Spectre is so vast that it can be difficult to address. At best, this is an incredibly complex situation for an organization like Goldman Sachs with dedicated threat, vulnerability management, and infrastructure teams. Navigation for a small or medium-sized business without dedicated triage teams is likely harder. We rely heavily on vendor coordination for clarity on patch dependency and still have to move forward with less-than-perfect answers at times.
Good cyber-hygiene practices remain foundational—the nature of the vulnerability is different, but the framework and approach to managing it are not. In a world of zero days and multidimensional vulnerabilities such as Spectre and Meltdown, the speed and effectiveness of the response to triage and prioritizing risk-reduction efforts are vital to all organizations. More high-profile and complex vulnerabilities are sure to follow, so now is a good time to take lessons learned from Spectre and Meltdown and use them to help prepare for the next battle.
Security
The Soft Side of Software
How to Get Things Done When You Don't Feel Like It
Kate Matsudaira
Five strategies for pushing through
If you want to be successful, then it serves you better to rise to the occasion no matter what. That means learning how to push through challenges and deliver valuable results.
Business and Management,
The Soft Side of Software
Tracking and Controlling Microservice Dependencies
Silvia Esparrachiari, Tanya Reilly, and Ashleigh Rentz
Dependency management is a crucial part of system and software design.
Dependency cycles will be familiar to you if you have ever locked your keys inside your house or car. You can't open the lock without the key, but you can't get the key without opening the lock. Some cycles are obvious, but more complex dependency cycles can be challenging to find before they lead to outages. Strategies for tracking and controlling dependencies are necessary for maintaining reliable systems.
Dependencies can be tracked by observing the behavior of a system, but preventing dependency problems before they reach production requires a more active strategy. Implementing dependency control ensures that each new dependency can be added to a DAG (directed acyclic graph) before it enters use. This gives system designers the freedom to add new dependencies where they are valuable, while eliminating much of the risk that comes from the uncontrolled growth of dependencies.
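The idea of admitting a new dependency only if the graph stays acyclic can be sketched as follows. This is a minimal illustration, not the authors' system: `DependencyGraph`, `add_dependency`, and the service names are hypothetical.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical registry that accepts only edges keeping the graph acyclic."""

    def __init__(self):
        self._deps = defaultdict(set)  # service -> services it depends on

    def _reaches(self, start, target):
        """Return True if target is reachable from start along existing edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._deps.get(node, ()))
        return False

    def add_dependency(self, service, depends_on):
        """Record service -> depends_on, rejecting edges that would close a cycle."""
        if service == depends_on or self._reaches(depends_on, service):
            raise ValueError(f"{service} -> {depends_on} would create a cycle")
        self._deps[service].add(depends_on)


graph = DependencyGraph()
graph.add_dependency("frontend", "auth")
graph.add_dependency("auth", "database")

# The database already sits below the frontend, so this edge must be refused.
try:
    graph.add_dependency("database", "frontend")
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

Running the check at the moment a dependency is declared, rather than after an outage, is what separates dependency control from dependency tracking.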
Development,
Web Services
May/June 2018
Kode Vicious:
The Obscene Coupling Known as Spaghetti Code
Teach your junior programmers how to read code
Communication is just a fancy word for storytelling, something that humans have probably been doing since before we acquired language. Unless you are an accomplished surrealist, you tell a story by starting at the beginning, then over the course of time expose the reader to more of the details, finally arriving at the end where, hopefully, the reader experiences a satisfying bit of closure. The goal of the writer (or coder) is to form in the mind of the reader the same image the writer had. That is the process of communication, and it doesn't matter if it's prose, program or poetry—at the end of the day, if the recipient of our message has no clue what we meant, then all was for naught.
Development,
Kode Vicious
Corp to Cloud: Google's Virtual Desktops
Matt Fata, Philippe-Joseph Arida, Patrick Hahn, and Betsy Beyer
How Google moved its virtual desktops to the cloud
Over one-fourth of Googlers use internal, data-center-hosted virtual desktops. This on-premises offering sits in the corporate network and allows users to develop code, access internal resources, and use GUI tools remotely from anywhere in the world. Among its most notable features, a virtual desktop instance can be sized according to the task at hand, has persistent user storage, and can be moved between corporate data centers to follow traveling Googlers.
Until recently, our virtual desktops were hosted on commercially available hardware on Google's corporate network using a homegrown open-source virtual cluster-management system called Ganeti. Today, this substantial and Google-critical workload runs on GCP (Google Cloud Platform). This article discusses the reasons for the move to GCP, and how the migration was accomplished.
Distributed Computing
Mind Your State for Your State of Mind
Pat Helland
The interactions between storage and applications can be complex and subtle.
Applications have had an interesting evolution as they have moved into the distributed and scalable world. Similarly, storage and its cousin databases have changed side by side with applications. Many times, the semantics, performance, and failure models of storage and applications do a subtle dance as they change in support of changing business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This article looks at some of these issues and their impact on systems.
Storage
Research for Practice:
Knowledge Base Construction in the Machine-learning Era
Alex Ratner and Chris Ré
Three critical design points: Joint-learning, weak supervision, and new representations
This installment of Research for Practice features a curated selection from Alex Ratner and Chris Ré, who provide an overview of recent developments in Knowledge Base Construction (KBC). While knowledge bases have a long history dating to the expert systems of the 1970s, recent advances in machine learning have led to a knowledge base renaissance, with knowledge bases now powering major product functionality including Google Assistant, Amazon Alexa, Apple Siri, and Wolfram Alpha. Ratner and Ré's selections highlight key considerations in the modern KBC process, from interfaces that extract knowledge from domain experts to algorithms and representations that transfer knowledge across tasks.
AI,
Research for Practice
The Soft Side of Software
The Secret Formula for Choosing the Right Next Role
Kate Matsudaira
The best careers are not defined by titles or resume bullet points.
When you are searching for the next step in your career, don't just think about the surface-level benefits. Drill down on your biggest goals and do a little thinking about whether or not each job will help you get closer to those goals. The smarter you are about what you choose next, the closer you will get to the things you truly want from your life and your work.
Business and Management,
The Soft Side of Software
The Mythos of Model Interpretability
Zachary C. Lipton
In machine learning, the concept of interpretability is both important and slippery.
Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? Models should be not only good, but also interpretable, yet the task of interpretation appears underspecified. The academic literature has provided diverse and sometimes non-overlapping motivations for interpretability and has offered myriad techniques for rendering interpretable models. Despite this ambiguity, many authors proclaim their models to be interpretable axiomatically, absent further argument. Problematically, it is not clear what common properties unite these techniques.
This article seeks to refine the discourse on interpretability. First it examines the objectives of previous papers addressing interpretability, finding them to be diverse and occasionally discordant. Then, it explores model properties and techniques thought to confer interpretability, identifying transparency to humans and post hoc explanations as competing concepts. Throughout, the feasibility and desirability of different notions of interpretability are discussed. The article questions the oft-made assertions that linear models are interpretable and that deep neural networks are not.
AI
Everything Sysadmin
GitOps: A Path to More Self-service IT
Thomas A. Limoncelli
IaC + PR = GitOps
GitOps lowers the cost of creating self-service IT systems, enabling self-service operations where previously they could not be justified. It improves the ability to operate the system safely, permitting regular users to make big changes. Safety improves as more tests are added. Security audits become easier as every change is tracked.
Everything Sysadmin,
Systems Administration