Data Exposure in Online APIs
Spotify users can visit Spotify.me, log in to their account, and see some interesting figures and inferences about their listening habits. Although Spotify itself runs this service, users still need to grant it permission to analyze personal account data.
Spotify makes this data (and other platform functions) available to external software developers through its application programming interface (API). Given proper user authorization, programmers may use the Spotify API to operate on private user data: modify playlists, analyze users’ favorite albums, integrate Spotify with voice assistants, and more.
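As a rough illustration of what "proper user authorization" involves, the sketch below builds the kind of OAuth 2.0 authorization link a third-party app would send a user to. It assumes Spotify's authorization-code flow and two of its scopes (`user-top-read`, `playlist-modify-private`); the client ID and redirect URI are placeholders, and the helper name `build_authorization_url` is ours, not part of any Spotify library.

```python
from urllib.parse import urlencode

# Spotify's OAuth 2.0 authorization endpoint (authorization-code flow).
AUTHORIZE_URL = "https://accounts.spotify.com/authorize"


def build_authorization_url(client_id, redirect_uri, scopes):
    """Return the URL a user would visit to approve the requested scopes."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        # Scopes are sent as a single space-separated string.
        "scope": " ".join(scopes),
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


url = build_authorization_url(
    client_id="example-client-id",                 # placeholder, not a real ID
    redirect_uri="https://example.com/callback",   # placeholder
    scopes=["user-top-read", "playlist-modify-private"],
)
print(url)
```

The scope list is the part that matters for privacy: it is the only place where the app declares which categories of account data it wants to reach.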
Spotify isn’t unique in giving third-party software access to account-holders’ private data. Many other online services offer similar APIs, which enable a variety of valuable use cases. A Slack bot might integrate with GitHub’s API in order to issue project activity notifications in a team’s Slack channel. A file migration tool might use APIs offered by cloud storage providers (e.g. Dropbox, iCloud, OneDrive) to automatically move files between services. A script might use the Reddit API to export a user’s favorite links.
Online APIs pose privacy risks to users, though, as the Cambridge Analytica scandal showed. A researcher developed a personality quiz app that integrated with Facebook’s API. Quiz-takers were unaware that the app harvested information from their Facebook profiles and those of their friends. Cambridge Analytica later used this data for targeted political advertising. After the harvesting came to light in 2018 and provoked a public backlash, Facebook updated its platform to address API privacy risks. Its API would expose fewer private data types, and user consent would automatically expire for unused apps, among other safeguards.
Google API Usage in the Wild
It would be reasonable to expect other online services to take similar measures to reduce the exposure of private user data through APIs. However, a cursory glance at one’s Google account authorizations suggests that this expectation may not hold.
While Zoom is indeed a valuable product, it’s concerning that an external service can read email data, view calendar events, communicate with unspecified external services, and do so without user interaction. This observation prompted our preliminary investigation into the privacy risks of online APIs and mitigations in place to protect private data. Our research asked the following questions:
- What API authorizations do third-party apps tend to request?
- What safeguards are present to mitigate risks of API misuse?
To answer these, we examined 987 web apps listed on the G Suite Marketplace as of January 2020. This represents only a small portion of Google API integrations; Google APIs can be integrated into mobile apps, standalone desktop programs, and other classes of software not covered in our corpus. Still, the G Suite Marketplace represents a good cross-section of productivity-focused apps and add-ons, many of which have millions of installations each.
We developed a Selenium script to automate the installation of these apps onto a free @gmail.com test account. For each app, we collected the permissions disclosed on the app authorization screen.
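The collection step can be illustrated without the browser automation itself. The sketch below is not our crawler: it assumes the authorization screen has already been saved as HTML and that each disclosed permission sits in a `<li class="scope">` element. That markup, and the names `ScopeListParser` and `extract_permissions`, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class ScopeListParser(HTMLParser):
    """Collect the text of every <li class="scope"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.permissions = []
        self._in_scope = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "scope" in classes:
            self._in_scope = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_scope:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_scope:
            # Collapse runs of whitespace so each permission is one clean line.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.permissions.append(text)
            self._in_scope = False


def extract_permissions(html):
    """Return the list of permission strings found in a saved consent page."""
    parser = ScopeListParser()
    parser.feed(html)
    parser.close()
    return parser.permissions


# Made-up page fragment; the real authorization screen is structured differently.
sample = """
<ul>
  <li class="scope">Connect to an external service</li>
  <li class="scope">Allow this application to run
      when you are not present</li>
  <li class="note">Learn more</li>
</ul>
"""
permissions = extract_permissions(sample)
print(permissions)
```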
Among our corpus of 987 Marketplace apps, the most commonly requested API permissions were:
- 50% display and run third-party web content in prompts and sidebars inside Google applications
- 49% connect to an external service
- 27% see, edit, create, and delete your spreadsheets in Google Drive
- 25% allow this application to run when you are not present
- 21% see, edit, create, and delete all of your Google Drive files
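Percentages like those above come from a simple tally: count how many apps request each permission and divide by the size of the corpus. The following is a minimal sketch of that calculation, assuming the crawl results are held as a mapping from app name to its list of disclosed permissions. The toy corpus is invented and does not reproduce our data.

```python
from collections import Counter


def permission_shares(apps):
    """Map each permission to the fraction of apps that request it."""
    if not apps:
        return {}
    counts = Counter()
    for permissions in apps.values():
        # Use a set so a permission listed twice by one app counts once.
        counts.update(set(permissions))
    total = len(apps)
    return {perm: n / total for perm, n in counts.most_common()}


# Invented toy corpus: app name -> permissions shown on its consent screen.
toy_corpus = {
    "app-a": ["connect to an external service", "full Drive access"],
    "app-b": ["connect to an external service"],
    "app-c": ["display third-party web content"],
    "app-d": ["connect to an external service", "display third-party web content"],
}

shares = permission_shares(toy_corpus)
for permission, share in shares.items():
    print(f"{share:.0%} {permission}")
```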
We opted to look more closely at the apps with the ability to connect to an external service. These apps made up nearly half of the corpus. Among the 481 apps that did this, 21% had full access to users’ Google Drive data, 17% could read Gmail data while running as an add-on, and 3% had full access to Contacts data.
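The conditional figures above (the share of external-service apps that also hold another permission) can be computed by filtering the corpus first and then counting within the subset. This is a sketch under the same assumed data layout, with invented permission labels and toy data rather than our actual results.

```python
def share_among(apps, required, other):
    """Fraction of apps holding `required` that also hold `other`."""
    subset = [perms for perms in apps.values() if required in perms]
    if not subset:
        return 0.0
    return sum(other in perms for perms in subset) / len(subset)


EXTERNAL = "connect to an external service"

# Invented toy corpus: app name -> permissions shown on its consent screen.
toy_corpus = {
    "app-a": [EXTERNAL, "full Drive access"],
    "app-b": [EXTERNAL],
    "app-c": ["full Drive access"],
    "app-d": [EXTERNAL, "read Gmail data"],
    "app-e": [EXTERNAL, "full Drive access", "read Gmail data"],
}

drive_share = share_among(toy_corpus, EXTERNAL, "full Drive access")
gmail_share = share_among(toy_corpus, EXTERNAL, "read Gmail data")
print(f"{drive_share:.0%} of external-service apps also have full Drive access")
print(f"{gmail_share:.0%} of external-service apps can also read Gmail data")
```

Note that the denominator is the subset of apps that connect to an external service, not the whole corpus, which is why these shares are not directly comparable to the corpus-wide percentages.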
As a platform, the G Suite Marketplace does not appear to provide details about what external services apps use or what data is shared with those services. Users have to rely on voluntary disclosures from app developers in order to assess any potential privacy risks Marketplace apps might pose. Because these disclosures are voluntary, not all apps explain how they use external services.
[Update on June 10, 2020. Romain Vialard (developer of Awesome Table and Yet Another Mail Merge) and Stéphane Giron (developer of ezShared Contacts) reached out to me to clarify the app verification process. In 2018, Google implemented OAuth client verification for risky API scopes, but allowed existing apps that use those scopes to remain listed on the G Suite Marketplace as “unverified.” See Romain Vialard’s Medium post for more details.]
Google’s policies recognize risks in giving unrestricted access to user data via API. The platform enforces certain limits on the use of API scopes deemed “sensitive” and “restricted”. The strictest limits apply to the use of private Gmail and Drive data, which requires verification by Google. We encountered unverified apps during the automated crawl and installation of G Suite Marketplace apps. The platform does allow up to 100 new users to install an unverified app, but this allowance is subject to Google’s discretion. Users are unable to install unverified apps beyond this limit.
[Update on June 10, 2020. Romain Vialard (developer of Awesome Table and Yet Another Mail Merge) and Stéphane Giron (developer of ezShared Contacts) reached out to me to clarify that the user counts discussed below include many accounts that didn’t directly authorize the apps. Those accounts are part of G Suite domains for which the administrator enabled apps on their users’ behalf. See Romain Vialard’s Medium post for more details.]
We measured how strictly Google adhered to its stated policy of limiting unverified apps to 100 new users. In our initial crawl of the G Suite Marketplace, we encountered 277 unverified apps, of which 144 were installable and thus included in our analysis corpus. We re-examined those 144 apps 16 days later and found that 124 were still unverified and installable. Of these, 24 apps gained more than 100 new users between the two crawls. The following is a sample of the apps with the largest net user growth:
- +9,398 users — Chemistry Question Generator
- +7,158 users — YouCanBook.me
- +3,632 users — siteMastro
- +2,332 users — Hippo Video
- +1,135 users — ezShared Contacts
ezShared Contacts is of particular interest. We collected this data in January 2020. The app remains unverified but installable as of May 2020. The app also continues to request full access to the user’s Gmail, Drive, and Contacts data.
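The growth comparison between the two crawls amounts to subtracting user counts and flagging apps that exceed the stated limit. The sketch below assumes each crawl is a mapping from app name to its displayed user count and that the second mapping holds only apps that were still unverified and installable. The function name and all figures are invented for illustration.

```python
USER_LIMIT = 100  # stated cap on new users for an unverified app


def apps_over_limit(first_crawl, second_crawl, limit=USER_LIMIT):
    """Return (app, net_new_users) pairs above `limit`, largest growth first."""
    growth = {
        app: second_crawl[app] - users_before
        for app, users_before in first_crawl.items()
        if app in second_crawl  # skip apps missing from the second crawl
    }
    over = [(app, delta) for app, delta in growth.items() if delta > limit]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Invented user counts for two crawls taken some days apart.
first = {"app-a": 1000, "app-b": 50, "app-c": 20000, "app-d": 10, "app-e": 300}
second = {"app-a": 1250, "app-b": 120, "app-c": 20040, "app-e": 900}

flagged = apps_over_limit(first, second)
for app, delta in flagged:
    print(f"+{delta:,} users: {app}")
```

Displayed user counts may include accounts enabled by a domain administrator rather than by individual users, so a difference above the limit is a signal to investigate rather than proof of a policy violation.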
Lessons to Re-Learn?
We believe our examination of G Suite Marketplace apps justifies further research into improved disclosure, control, and auditing mechanisms in this area. We don’t need to look far for productive ideas to this end. Mobile platforms have evolved to address some of these issues. For instance, install-time permissions remain common in online API authorizations, but have fallen out of favor on mobile platforms.
Prior work has found that users struggle to understand install-time permissions on mobile devices. Additionally, mobile apps often exercise permissions in ways that don’t match user expectations. Further research is necessary to determine how much these findings apply to online API permissions. If they are indeed relevant, then existing interventions like contextual permission systems and better user interfaces could be adapted to improve transparency and control for online API integrations.
Although our findings come from a somewhat narrow class of apps that use the Google API, they reveal considerable opportunity for improving user privacy in the context of online API integrations. These integrations are widely used, but mostly invisible to end-users. It took a major breach of privacy as broad and political as the Cambridge Analytica scandal to bring this issue to the public eye. And while Facebook did indeed take sensible steps in response, comparable safeguards do not appear to be universal across online services.
This work was accepted to the 2020 Workshop on Consumer Protection and Technology (ConPro ’20). Further details can be found in the draft report. The G Suite Marketplace scrape data is available for download.