Sleuthing Google Apps - Part 2: The Google Application Suite

by Estragon

In Part 1 (39:1), we discussed how Google Calendar "busy" time may be utilized to see when people are meeting together, even when meetings are intended to be confidential.

In Part 2, we will see how the history of changes to documents can be illuminating.  First, let's review what the Google application suite is.  It is a set of online applications that are accessible through a web browser and also have native apps for phones and tablets.  The apps include email, a calendar (which was our focus in Part 1), office productivity tools (documents, spreadsheets, presentations), file storage and sharing, and a variety of other things.  The suite can also incorporate non-Google applications.  In addition, a Google login may be used to access non-Google services as part of a single sign-on solution.

Many individuals utilize the Google suite, and thousands of organizations provide their constituents (members, employees, affiliates, etc.) with a Google suite login within the Internet domain space of the organization.  In this article, I will describe how there can be unintended information leakage through the use of applications that allow authorized users to view the history of changes.

Being able to recall and replay history in computer-based tools is a standard feature across a variety of applications and platforms.  For example, the UNIX/Linux history command shows what commands were executed in the shell, and the history can be saved so you can search commands from earlier logins.  Another example is using Ctrl-Z (or similar) as an "undo" command to roll back one step in many applications.  A final example is source code revision tracking, such as that offered by Git, which makes it easy to roll back a set of files (i.e., source code) to an earlier state.  The ability to view history, potentially with features like undo or rollback, is a great convenience.

In the Google application suite being discussed here, an interesting feature of the spreadsheet ("Google Sheets") and word processor ("Google Docs") applications is that the history can track edits by multiple identities.  That is, a single spreadsheet or document can be edited by people with different Google logins - and the history associates each change with the specific login (i.e., person) who made it.

This is a very useful feature.  During collaborative editing, which might take place over days, weeks, or longer, anyone with access to the document's history can see what changed and who made each change.

In a Google document, the granularity of the history is typically the editing session.  You can view a version of the document (even many versions per day) that reflects what the document looked like before a session.  A session usually seems to correspond to a stretch of time at the keyboard adding content or making changes.  If multiple people were editing, each person's edits before someone else made changes would constitute a session, so you could roll back to an earlier version if desired, or see what had been changed.

In a Google spreadsheet, the temporal granularity is similar.  But in addition to letting you switch back to an earlier session, Google helpfully highlights the specific cells in the spreadsheet that were changed.  You can then see at a glance what was changed, as well as which user made the change.

There are a few ways to get access to the history in Docs and Sheets.  The easiest in the web interface is to look toward the top of the screen, where it says when the most recent change was made (something like "Last edit was..." followed by a date or time, such as "5 minutes ago").  Just select (click on) that text, and you'll get the history tracking view of the doc or sheet.  It pops up on the right side, and you can navigate back to different versions.  This behavior seems to be similar across different web browsers (Firefox, Chrome, etc.).  On the phone and tablet apps I tried, the menu item was a little different: "Details and activity."
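For those who prefer scripting to clicking, at least part of the same history is exposed through the Google Drive API.  What follows is a minimal, hypothetical Python sketch: it assumes you have already built a Drive v3 client called `service` with the google-api-python-client library and that you know the file's ID, and the helper names (`fetch_revisions`, `summarize_editors`) are my own.  It pages through `revisions.list` and tallies who edited and when.  Be aware that the API's revision list may be coarser than, or grouped differently from, the sessions the web interface shows.

```python
# Drive API v3 Revision fields to request; nextPageToken is needed for paging.
REVISION_FIELDS = (
    "nextPageToken,"
    "revisions(id,modifiedTime,lastModifyingUser(displayName,emailAddress))"
)


def fetch_revisions(service, file_id):
    """Collect every revision record that revisions.list returns for one file.

    `service` is assumed to be built elsewhere with
    googleapiclient.discovery.build("drive", "v3", credentials=...).
    """
    revisions, page_token = [], None
    while True:
        resp = service.revisions().list(
            fileId=file_id, fields=REVISION_FIELDS, pageToken=page_token
        ).execute()
        revisions.extend(resp.get("revisions", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return revisions


def summarize_editors(revisions):
    """Map each editor to (revision count, earliest time, latest time)."""
    summary = {}
    for rev in revisions:
        user = rev.get("lastModifyingUser", {})
        # Deleted or hidden accounts may have no name or email attached.
        who = user.get("emailAddress") or user.get("displayName") or "(unknown user)"
        # Assumes uniform UTC timestamps ("...Z"), which compare correctly as strings.
        when = rev.get("modifiedTime", "")
        count, first, last = summary.get(who, (0, when, when))
        summary[who] = (count + 1, min(first, when), max(last, when))
    return summary
```

Calling `summarize_editors(fetch_revisions(service, file_id))` gives a per-person view of who touched the file and over what span of time.  Keeping the network call separate from the tallying makes the summary easy to test against saved output.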

These features provide some accountability and traceability for determining what changes were made.  They allow a reasonably granular rollback capability that persists even after the web browser is closed or the user logs out, because the history is part of the document itself.  For older documents, they provide a record of who worked on them and where their contributions were made.  If someone works on a document and their Google login for the organization is later deleted, the history will indicate that an anonymous or unknown user made those changes.

The history tracking features can be a source of information leakage, however.  For example, there might have been earlier versions with content found to be questionable, offensive, incorrect, or otherwise undesirable.  The history tracking means that those earlier versions are still accessible to anyone who can see the document history.

It might be that in some organizations there is sensitivity to the identity of an editor.  If it was a departmental memo, for example, perhaps it would be inappropriate if someone from another department made changes.  In a university setting, what if a term paper ostensibly written by a single student was found to have had sections written by other students?  What if changes were made by someone who had departed the organization, but still had managed to retain a Google login?

A personal experience I had with information leakage builds on a story I told in Part 1.  In that story, I was in a large multi-institutional membership organization where hundreds of people from over a half dozen organizations had a shared Google space.

There was a situation where a group of people in the organization were colluding against the broader organization.  Part 1 described how I was able to gain insights into the people who were colluding: who was colluding, when they had meetings, and even where they met, simply by looking at free and busy time in the calendars I had access to.  This was information leakage through Google Calendar.

The same collusion was also manifesting itself in the shared documents and spreadsheets.  The default settings I am familiar with are that documents, spreadsheets, presentations, and similar types of works are not viewable or findable by other organizational members; only the person who owns them can see them.  That person can then invite others to collaborate.  Collaborators may be invited to view only; to view and comment; or to view, comment, and edit.

However, the setup we used, which is typical of other organizations I've seen, is to have a shared document repository.  Anyone in the organization could access documents in the repository and navigate it via a hierarchy.  The tool for organizing, sharing, moving, etc. is Google Drive (GDrive), and it serves as a web-based interface to a document collection.

If you haven't used GDrive before, or haven't used it in a shared organizational context, it probably still sounds familiar.  The Windows, Icons, Menus, Pointer (WIMP) interaction method, combined with POSIX or POSIX-like capabilities for creating a file (document) and directory (folder) hierarchy, is ubiquitous.  It's the basis of the Windows, Macintosh, and *NIX approaches to files and directories.  This is also how much of the web is presented and experienced, with main pages (files) leading to groups of other pages (directories) in a hierarchy.

So, in the organization I was part of, we had a shared GDrive with many documents.  Most were visible to anyone in the organization, and many were even editable by anyone.  We trusted people to behave, though it would have been possible for someone to purposely delete, rename, or deface documents.  Of course, it would have been easy to find out who had made those changes, unless they did a good job of covering their tracks.

The collusion situation involved some shared organizational documents, set up with visibility limited to a cross-organization group that was working on them.  This included a budget for the whole organization and its component organizations, as well as various documents describing governance processes.  That big budget spreadsheet, though, was the focus.  The group working against the larger organization was, among other things not discussed here, trying to shift the budgets so that some parts of the organization would starve, while others would thrive.

I'm not providing a lot of detail (such as, how would a shared spreadsheet have such a big real-world impact?  Aren't there other processes in place to ensure against misbehavior?).  For this example, the focus isn't on what happened next.  The point is that there was a group within the broader organization that was attempting to hijack the process, by making edits to the spreadsheet in their favor.

Figure 1: A budget planning document intended to be viewed by all organizational members.

Sleuthing to the Rescue!

By looking at the change history, I was able to see that the spreadsheet owner first created the Google spreadsheet by uploading an Excel file.  The original name of the file disclosed the intentions behind the budget: it basically said the budget was focused on enriching some of the organizational members by cutting the budgets of other members.  The spreadsheet was immediately renamed to something less incriminating, but the history still showed the original name.
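A rename like this can also surface outside the version history view.  Google's Drive Activity API (v2) reports renames as actions whose detail carries `oldTitle` and `newTitle`.  Here is a minimal sketch in Python, assuming a client built with `build("driveactivity", "v2", credentials=...)` from google-api-python-client, a known file ID, and an account permitted to see the file's activity; the function names are hypothetical, and this is a different mechanism from the history panel I actually used.

```python
def _time(obj):
    """Return an action's or activity's time: its timestamp, else timeRange end."""
    if "timestamp" in obj:
        return obj["timestamp"]
    return obj.get("timeRange", {}).get("endTime", "")


def rename_events(activities):
    """Extract (time, old title, new title) tuples, oldest first."""
    events = []
    for activity in activities:
        for action in activity.get("actions", []):
            rename = action.get("detail", {}).get("rename")
            if rename is None:
                continue
            # Individual actions may omit their own time; fall back to the activity's.
            when = _time(action) or _time(activity)
            events.append((when, rename.get("oldTitle", ""), rename.get("newTitle", "")))
    return sorted(events)


def fetch_activities(service, file_id):
    """Page through activity().query for a single Drive item.

    `service` is assumed to be built elsewhere with
    googleapiclient.discovery.build("driveactivity", "v2", credentials=...).
    """
    activities, page_token = [], None
    while True:
        body = {"itemName": "items/" + file_id}
        if page_token:
            body["pageToken"] = page_token
        resp = service.activity().query(body=body).execute()
        activities.extend(resp.get("activities", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return activities
```

Sorting the tuples puts the earliest title first, which is exactly the one the renaming was meant to bury.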

Within the spreadsheet, I could see who had made changes to adjust the original and the nature of those changes.  It was evident who was trying to move money away from one part and towards another.  Through the history viewing mechanism described earlier, I could see just what changes were made and how they propagated throughout the spreadsheet.  For example, changing assumptions about annual salary increases for one part of the organization would instantly propagate across the spreadsheet, even across multiple pages in the spreadsheet.  Google helpfully color-codes these changes, according to who made them and when.

Figure 2: Original version showing content that had been removed after the Excel file was uploaded.
Color coding (appearing shaded here) shows what was changed during the editing session.

It was also interesting to see who had not done any editing.  In several cases, I saw that the top administrators for the colluding organizations were making these changes, rather than their finance experts.  In other words, it was the bosses who were colluding to disenfranchise other bosses.

This information leakage is a byproduct of the convenience of a shared editing platform.  I took some screenshots and saved copies of some of the intermediate versions (another convenient feature!) as evidence of the collusion.  The examples in this document are not the actual documents from the incident.  They were created by me to illustrate the fundamentals.

Just as with Part 1 of the article, which described information leakage in the Google calendar, the leakage through Google's spreadsheets and documents is a result of the design.  I didn't need to have administrator privileges, or bypass any technical controls, to get a picture of what had happened during the history of the edits.

Yet it's clear that those making the edits would have preferred that their identities and the nature of their changes not be visible to people outside the colluding group.  After all, the group had made significant efforts to keep their plans hidden (including the secret meeting described in Part 1).  The edits all happened before a big meeting to go over the final proposed budget.

At the budget meeting, the collusion group didn't raise any questions about the new budget or how it had managed to sway resources towards their parts of the organization.  It was left to the disenfranchised to point out the problems.  My sleuthing was instrumental in demonstrating the focused effort to shift budget resources.

Were there steps the group could have taken to avoid making their actions visible?  What general practices might be advised for individuals and organizations utilizing the Google suite?

Firstly, common sense would dictate that anything happening on the shared platform might be visible to others.  In my examples, it was easy for anyone with access to view the spreadsheet to see who had made changes, and the impact of those changes.  This was a result of the design of the tools in the Google suite.

Yet even if the platform didn't make actions easily visible, they would still be visible to people with privileged roles within the Google suite for the organization.  Or, in some cases, perhaps only to Google itself.  For example, private (non-shared) documents are only visible to the username that "owns" them.  But an administrator could force a password change and log in as that username to see the private files, emails, etc.  The person who had been using the account would notice the password change, of course, but not if, for example, they had been fired.  If external authentication was being used (LDAP, OAuth 2.0, or similar), the administrator could even change the password back without the original user knowing about it.

Bottom line: If you are using a shared platform, you should assume that anything you do could be visible to others.  The only issue is how easily it's visible.  In the case of shared documents and spreadsheets, what you do is visible (at some level of granularity) to anyone who can see those shared documents and spreadsheets.

Secondly, as a corollary: Anything you would prefer to keep secret should be done off the shared platform, or at least outside the areas that are easily visible by default.  In my example, the collusion group would have done better to use email to circulate and revise an Excel spreadsheet before uploading the final file as a Google Sheet.

Thirdly, there are some steps to make the history less visible.  In the Google suite, the editing history is part of a specific document.  If you make a copy of the document, the history is not copied.  So, a new document starts with a blank history.

Another way to get a copy without the history is to download it.  If you save a Google document or spreadsheet as a .DOCX or .XLSX file, respectively, the editing history is not saved.  (Note that any comments are saved.)  You could then share the .DOCX or .XLSX file instead of the online document or spreadsheet.  Of course, collaborative editing and the other online features will not be available, but maybe this is desirable.

More generally, if your goal is to share the outcome, and not an editable file, then save/download as a PDF (or even take a screenshot).  You can even put the PDF in your shared Google space.
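If you would rather script the copy and download steps just described, here is a hedged Python sketch using the Drive API v3 `files.copy` and `files.export` methods.  It assumes a `service` client built elsewhere with google-api-python-client; the helper names are hypothetical.  XLSX export applies to Sheets and DOCX to Docs, while PDF works for both.  Per the observations above, neither the copy nor the exported file carries the original editing history, though exported files can still include comments.

```python
# MIME types for Drive API v3 files.export (XLSX for Sheets, DOCX for Docs).
EXPORT_MIME_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "pdf": "application/pdf",
}


def copy_without_history(service, file_id, new_name):
    """Make a copy; the new file starts its own, fresh revision history."""
    new_file = service.files().copy(
        fileId=file_id, body={"name": new_name}, fields="id", supportsAllDrives=True
    ).execute()
    return new_file["id"]


def export_without_history(service, file_id, fmt, out_path):
    """Download a Docs/Sheets file as DOCX, XLSX, or PDF and save it locally."""
    if fmt not in EXPORT_MIME_TYPES:
        raise ValueError(f"unsupported export format: {fmt!r}")
    # files.export returns the converted bytes; the API caps how large an export can be.
    data = service.files().export(
        fileId=file_id, mimeType=EXPORT_MIME_TYPES[fmt]
    ).execute()
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

For sharing only the outcome, `export_without_history(service, file_id, "pdf", "budget.pdf")` produces an immutable file you could then upload to the shared space.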

And finally: Be diligent about default settings for sharing, granularity of who things are shared with, and removing shared access when it is no longer necessary.  This is partly the responsibility of the domain administrator for the Google suite, and partly the responsibility of the individual:

Shared spaces (i.e., a location in Google Drive, as mentioned briefly above) should only be used for items that should be shared.

When allowing access to others, grant the lowest suitable level: View, Comment, or Edit, in increasing order of privilege.  "Edit" capability (versus "View" or "Comment") should not be the default.

Revoke or decrease access when it is no longer needed.

If shared editing is not needed, then do editing in a private space, and share immutable formats like PDF.
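Owners or administrators who want to audit sharing in bulk might start from something like the following minimal Python sketch against the Drive API v3 `permissions` methods.  It assumes an existing `service` client; `least_privileged_role`, `flag_broad_permissions`, and the other helpers are hypothetical, and what counts as "too broad" is my own judgment call here: any link-sharing grant, plus edit rights granted to an entire domain.

```python
# Roles that allow changing or reorganizing content (Drive API v3 role names).
EDIT_ROLES = {"writer", "fileOrganizer", "organizer", "owner"}


def least_privileged_role(needs_comment=False, needs_edit=False):
    """Return the lowest role that still covers what the collaborator needs."""
    if needs_edit:
        return "writer"
    if needs_comment:
        return "commenter"
    return "reader"


def flag_broad_permissions(permissions):
    """Return (permission id, reason) pairs worth reviewing or revoking."""
    flagged = []
    for perm in permissions:
        kind, role = perm.get("type"), perm.get("role", "")
        if kind == "anyone":
            flagged.append((perm.get("id"), f"anyone with the link: {role}"))
        elif kind == "domain" and role in EDIT_ROLES:
            flagged.append((perm.get("id"), f"whole domain {perm.get('domain')}: {role}"))
    return flagged


def fetch_permissions(service, file_id):
    """List a file's permissions (`service` is a Drive v3 client built elsewhere)."""
    perms, page_token = [], None
    while True:
        resp = service.permissions().list(
            fileId=file_id,
            fields="nextPageToken,permissions(id,type,role,emailAddress,domain)",
            supportsAllDrives=True,
            pageToken=page_token,
        ).execute()
        perms.extend(resp.get("permissions", []))
        page_token = resp.get("nextPageToken")
        if not page_token:
            return perms


def revoke(service, file_id, permission_id):
    """Remove one permission once access is no longer needed."""
    service.permissions().delete(
        fileId=file_id, permissionId=permission_id, supportsAllDrives=True
    ).execute()
```

Domain-wide read access is deliberately not flagged, since that is exactly how a shared organizational repository is meant to work.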

One final note on the Google technologies I've written about: The details of features and how to access them change over time, including some changes since the experiences I've described.  While the specifics of what I've described might change over time, the general characteristics of the design of the platform have remained stable.

In closing, please be cautious when you are using shared platforms for document editing or similar purposes.  The platform can keep track of what you are doing, and information about actions that might seem secret may be easily visible to others.
