Traditionally, data gathered for electronic recordkeeping followed the same paradigm as files in a filing cabinet. Data was recorded by a human at some point, filed away, and retrieved (and perhaps updated) as needed. Data that was no longer relevant would be discarded to make room for new data.

Early digital systems were similar: data was input by human beings, created by computer systems, or sensed within an environment, and then more or less filed away to be retrieved later, when needed.

Modern data management can be mapped to three key stages:

1. Disclosing / Sensing—humans or machines that gather and record data.

2. Manipulating / Processing—aggregation, transformation, and/or analysis that turns data into useful information.

3. Consuming / Applying—a person or machine uses information to derive insights that can then be used to effect change.


Data Disclosure

Data at Rest

Data may be sourced from archives or other backups


Guideline: Ensure the context of original consent is known and respected; data security practices should be revisited on a regular basis to minimize risk of accidental disclosure. Aggregation of data from multiple sources often represents a new context for disclosure; have the responsible parties made a meaningful effort to renew informed consent agreements for this new context?

Data in Motion

Data is collected in real-time from machine sensors, automated processes, or human input; while in motion, data may or may not be retained, reshaped, corrupted, disclosed, etc.


Guideline: Be respectful of data disclosers and the individuals behind the data. Protect the integrity and security of data throughout networks and supply chains. Only collect the minimum amount of data needed for a specific application. Avoid collecting personally identifiable information or any associated metadata whenever possible. Maximize preservation of provenance.
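
To make the minimization and provenance guidance concrete, here is a minimal Python sketch of collection-time filtering. The field names, the allow-list, and the consent-context label are hypothetical assumptions; the point is that fields outside the allow-list never enter the pipeline and that each record carries its own provenance.

from datetime import datetime, timezone

# Fields the application actually needs (hypothetical allow-list).
ALLOWED_FIELDS = {"temperature_c", "humidity_pct"}

def minimize_and_tag(raw_reading: dict, source_id: str, consent_context: str) -> dict:
    """Keep only allow-listed fields and attach provenance metadata.

    Anything not explicitly needed (names, device owners, location, etc.)
    is dropped at the point of collection rather than filtered out later.
    """
    minimized = {k: v for k, v in raw_reading.items() if k in ALLOWED_FIELDS}
    minimized["_provenance"] = {
        "source": source_id,                        # which sensor or form produced the data
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "consent_context": consent_context,         # the purpose the discloser agreed to
    }
    return minimized

# Example: the owner's name and GPS coordinates never enter the pipeline.
raw = {"temperature_c": 21.4, "humidity_pct": 40, "owner_name": "A. User", "gps": (45.5, -122.6)}
print(minimize_and_tag(raw, source_id="sensor-17", consent_context="room climate monitoring"))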

 

Data Manipulation

Data at Rest

Data is stored locally without widespread distribution channels; all transformations happen locally


Guideline: Set up a secure environment for handling static data so the risk of security breaches is minimized and data is not mistakenly shared with external networks. Data movement and transformation should be fully auditable.
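
One way to make local transformations fully auditable is to wrap each transformation step so that every run leaves an append-only audit entry. The sketch below is illustrative only; the hashing scheme, the log destination, and the step names are assumptions rather than a prescribed implementation.

import hashlib
import json
import logging
from datetime import datetime, timezone
from functools import wraps

logging.basicConfig(filename="transform_audit.log", level=logging.INFO, format="%(message)s")

def fingerprint(records) -> str:
    """Stable hash of a list of dicts, recording what data a step saw and produced."""
    payload = json.dumps(records, sort_keys=True, default=str).encode()
    return hashlib.sha256(payload).hexdigest()[:16]

def audited(transform):
    """Wrap a transformation so each run appends an audit-trail entry."""
    @wraps(transform)
    def wrapper(records, *args, **kwargs):
        before = fingerprint(records)
        result = transform(records, *args, **kwargs)
        logging.info(json.dumps({
            "step": transform.__name__,
            "at": datetime.now(timezone.utc).isoformat(),
            "input_hash": before,
            "output_hash": fingerprint(result),
            "input_rows": len(records),
            "output_rows": len(result),
        }))
        return result
    return wrapper

@audited
def drop_incomplete(records):
    return [r for r in records if all(v is not None for v in r.values())]

rows = [{"id": 1, "score": 0.8}, {"id": 2, "score": None}]
clean = drop_incomplete(rows)  # the audit log now records what this step did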

Data in Motion

Data is actively being moved or aggregated; data transformations use multiple datasets or API calls which might be from multiple parties; the Internet may be used


Guideline: Ensure that data moving between networks and cloud service providers is encrypted; shared datasets should strive to minimize the amount of data shared and anonymize as much as possible. Be sure to destroy any temporary databases that contain aggregated data. Are research outcomes consistent with the discloser’s original intentions?
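
The sketch below illustrates two of these guidelines under assumed conditions: identifiers are pseudonymized with a keyed hash before a dataset is shared, and aggregation happens inside a temporary database that is destroyed once the shareable result has been extracted. The field names, key source, and SQLite layout are hypothetical.

import hashlib
import hmac
import os
import sqlite3
import tempfile

# Keyed hashing so raw identifiers never leave the organization; the key stays internal.
PSEUDONYM_KEY = os.environ.get("PSEUDONYM_KEY", "replace-with-a-managed-secret").encode()

def pseudonymize(identifier: str) -> str:
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

records = [
    {"email": "rider@example.com", "trips": 12},
    {"email": "driver@example.com", "trips": 87},
]

# Aggregate inside a temporary database that is destroyed as soon as the result is exported.
with tempfile.TemporaryDirectory() as workdir:
    db_path = os.path.join(workdir, "aggregation.db")
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE trips (user_pseudonym TEXT, trips INTEGER)")
    con.executemany(
        "INSERT INTO trips VALUES (?, ?)",
        [(pseudonymize(r["email"]), r["trips"]) for r in records],
    )
    shared_rows = list(con.execute("SELECT user_pseudonym, trips FROM trips"))
    con.close()
# The temporary directory (and the database inside it) no longer exists at this point.
print(shared_rows)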

 

Data Consumption

Data at Rest

Data analytics processes do not rely on live or real-time updates


Guideline: Consider how comfortable data disclosers would be with how the derived insights are being applied. Gain consent, preferably informed consent, from data disclosers for application-specific uses of data.

Data in Motion

Data insights could be context-aware, informed by sensors, or might benefit from streamed data or API calls


Guideline: The data at rest guidelines for data consumption are equally important here. In addition, adhere to any license agreements associated with the APIs being used. Encrypt data. Be conscious of the lack of control over streamed data once it is broadcast. Streaming data also carries a unique range of potential harms, such as the ability to track individuals or to decipher network vulnerabilities.
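
As one illustration of the encryption guidance, the following sketch uses the third-party "cryptography" package to encrypt a streamed payload before it crosses a network boundary and to decrypt it only inside the consuming service. The payload fields and key handling are simplified assumptions; in production the key would come from a managed secrets store.

# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

# Illustrative only: real keys should be issued and rotated by a secrets manager.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt a streamed payload before it crosses a network boundary...
payload = b'{"vehicle_id": "anon-4821", "speed_kph": 62}'
token = cipher.encrypt(payload)

# ...and decrypt it only inside the consuming service.
assert cipher.decrypt(token) == payload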


Historically, digital systems were not as interoperable and networked as they are now, and so data could be thought of as being “at rest”—stored statically, just like the files in the filing cabinets of the past. But today, some data is in near-constant motion. When we access social media sites, we’re not just pulling static data from some digital filing cabinet; we are accessing data that is in constant transformation. For example, algorithms shift which news stories are displayed to us based on an ever-evolving model of our tastes and the tastes of other users. An action taken in a connected app, such as an online retailer’s app linked to a user’s social media account, could change the content delivered to that user. Managing the complexity of consent and potential harms in this environment is much harder than it was in the simpler relationship between traditional market research and mass-market broadcast advertising.

This data in motion is much harder to comprehend at scale. The chaotic effects of multiple, interoperable systems and their data playing off each other makes it difficult for design and development stakeholders to see the big picture of how data might affect their users—much less communicate to those users for the purposes of informed consent or doing no harm. Data in motion can be relatively straightforward in the context of the flow of interactions through each stage of disclosing, manipulating, and consuming data. However, although it can be tempting to think of data as a file moving from one filing cabinet to another, it is, in fact, something more dynamic, which is being manipulated in many different ways in many different locations, more or less simultaneously. It becomes even more ambiguous when a lack of interaction with a piece of data could still be used to draw conclusions about a user that they might otherwise keep private.


For example, ride-sharing apps need to collect location information about drivers and passengers to ensure the service is being delivered. This makes sense in “the moment” of using the app. However, if the app’s consent agreement allows location data to be collected regardless of whether or not the driver or rider is actually using the app, a user may be passively providing their location information without being actively aware of that fact. In such cases, the application may be inferring things about that passenger’s interest in various goods or services based on the locations they travel to, even when they’re not using the app.

Given that location data may be moving through mapping APIs, or used by the app provider in numerous ways, a user has little insight into the real-time use of their data and the different parties with whom that data may be shared. For users of the ride-sharing app, this may cause concern that their location data is being used to profile their time spent outside the app—information that could be significant if, for example, an algorithm determines that a driver who has installed the app is also driving with a competing ride-sharing provider.¹ Without clear consent agreements, interpretation of where data moves and how it is used becomes challenging for the user and can erode trust that their best interests are being served.

Trading data among multiple organizations can make data more useful and information more insightful. As a discipline, this practice requires the ability to predict potential effects when data is combined or used in new ways or new combinations, and is best aided when data’s movement can be recorded in a way that makes tracking provenance possible when failures occur. But just as diplomats must consider many complex and sometimes unpredictable risks and opportunities when engaging across borders, so too must leaders and developers.

Organizations must be willing to devote at least as much time to considering the effects of this data-sharing as they devote to exploring its monetization options. They must also find a common language with other organizations—and end-users—to determine what is an effective and acceptable use of data. Informed consent requires that data diplomats—be they business executives, application developers, or marketing teams, among many others—communicate proactively about the potential pitfalls of data exposure.²

Organizations that are effective at communicating their data-sharing efforts stand to win the trust of users. These users will be willing to take a (measured) risk in sharing their data in exchange for more useful data and information, less expensive services, or other benefits.

With the advent of mainstream IoT devices, a sensor on a device may give a user feedback in the form of a raw number that reflects something about the state of their environment. If the user takes this information at face-value, they may start to think less about the device providing the feedback and focus instead on the number itself. This shift in attention is important because overlooking the device or system that provides feedback suggests that how data is handled and any algorithms used to process the data are also being overlooked. The failure to consider these underlying interactions can result in unintended risks.


Take the example of algorithms used to create risk assessment scores that rate the risk that a defendant will commit future crimes. Now widespread in the US justice system, these risk assessments were recently the subject of an in-depth investigation by ProPublica, an independent newsroom producing investigative journalism in the public interest.¹⁸ In 2014, the US Attorney General raised concerns that these scores could be introducing bias to the courts (where they are used to inform decisions on bail, sentencing, and probation). This algorithmic process has been shown to be unreliable in forecasting certain kinds of crime. In fact, in an instance investigated by ProPublica, based on the risk scores assigned to over 7,000 people arrested in a single Florida county in 2013 and 2014, the algorithm used was little more reliable than a coin toss in its ability to accurately identify re-offenders.

With analytics and machine learning, algorithms can be trained to notice where there has been customer upset, and bring it to the attention of a real human—in other words, detecting that harm may have been done and bringing it to the attention of developers and other stakeholders. On social media, sentiment analysis (a set of tools and practices that deconstruct written language and user behaviors to detect mood) could be used to identify situations where a piece of data shared about a user is causing emotional (and potentially physical) harm. Take the example of a user of a social network uploading a picture of another user and “tagging” them in that picture. If that second user starts reacting negatively in comment threads or receiving negative messages from others, machine learning could identify these situations and escalate them to moderators, presenting an opportunity for that user to “untag” themselves or request removal of the photograph in question. Such an approach could go further by then alerting developers and other business teams to consider such scenarios in their user personas and user stories for subsequent app updates and for consent and permissions management. This secondary feedback is key to making sure lessons learned are acted upon and appropriate corrective action is taken.
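
A real sentiment model is beyond the scope of this discussion, but the following sketch shows the escalation pattern described above using a crude keyword-based stand-in: when a tagged user attracts repeated negative reactions, the situation is flagged for a human moderator. The phrases, threshold, and data structures are illustrative assumptions only.

from dataclasses import dataclass

# A crude stand-in for a real sentiment model: negative phrases and a threshold.
NEGATIVE_TERMS = {"remove this", "delete that photo", "humiliating", "stop tagging me"}

@dataclass
class Comment:
    author: str
    tagged_user: str
    text: str

def needs_moderator(comments, tagged_user, threshold=2):
    """Escalate when a tagged user attracts repeated negative reactions."""
    hits = sum(
        1
        for c in comments
        if c.tagged_user == tagged_user
        and any(term in c.text.lower() for term in NEGATIVE_TERMS)
    )
    return hits >= threshold

thread = [
    Comment("friend_a", "user_b", "Please remove this, it's humiliating."),
    Comment("friend_c", "user_b", "Stop tagging me in these."),
]
if needs_moderator(thread, "user_b"):
    print("Escalate to moderators and offer user_b an untag/removal option.")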

Monitoring data transformations through user interviews

Interviewing users who have experienced harm can uncover misunderstandings in the way users perceive or use applications. This is not about placing blame on users; rather, it can be used to identify areas where better communication may be required. Noticing where users say that a use or disclosure of data was not appropriate is a form of qualitative forensics that can be linked to other, quantitative approaches like behavioral analytics. When an app or service makes users feel uncomfortable, that is an indication that consent may not be in place. But information about this discomfort rarely reaches developers or business stakeholders unless they cultivate—and systematize—curiosity about and empathy for users.

In order to think critically and spot potential harms to users, employees must have a working knowledge of how data moves and is transformed, how data (and users) are threatened by cybersecurity breaches, and what end-users expected and consented for their data to be used for. This applies to IT stakeholders, but also to employees in general, given the increasingly digital nature of work across entire companies. Regularly updating the organization’s shared “world view” with both direct and analytics-sourced input from users is an important first step. Once it’s been taken, it can be followed by creating feedback loops into the software development process from both human and machine sources.

This will enable empathy for users to be systematized into actionable updates of practices and programming.

Forensic analysis

Forensic analysis of data breaches is becoming more commonplace when data holders experience cyberattacks; similar methods can be used to track data through various servers and applications to determine where personally identifying data might be vulnerable to disclosure, or has been processed in a way contrary to the intent of the user or designers. However, most organizations are not yet prepared to track data well enough to discover, much less mitigate, harms to users.
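
A heavily simplified starting point for this kind of tracking is scanning application logs for raw identifiers that should never appear there. The patterns and sample log lines below are toy assumptions; real forensic tooling would use far richer detectors and a map of how data actually flows between systems.

import re

# Hypothetical patterns; real forensics would use far more robust detectors.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def scan_log_lines(lines):
    """Yield (line_number, pii_type) wherever raw identifiers appear in application logs."""
    for number, line in enumerate(lines, start=1):
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                yield number, pii_type

sample_log = [
    "2016-05-01 INFO request served in 120ms",
    "2016-05-01 DEBUG lookup for rider rider@example.com",
    "2016-05-01 DEBUG callback to 503-555-0100 scheduled",
]
for line_no, pii_type in scan_log_lines(sample_log):
    print(f"line {line_no}: unexpected {pii_type} in plain-text logs")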

Continual discovery of potential harms

Google is faced with a conundrum: if its machine-learning systems discover that a user may have a medical condition, based on what that user is searching for, is it ethical to tell the user? Or unethical? A recent article by Fast.co Design explored this concept:¹⁹

“If Google or another technology company has the ability to spot that I have cancer before I do, should it ethically have to tell me? As complicated as this question sounds, it turns out that most experts I asked—ranging from an ethicist to a doctor to a UX specialist—agreed on the solution. Google, along with Facebook, Apple, and their peers, should offer consumers the chance to opt-in to medical alerts.”

Such conundrums are not limited to search results, but the uniquely personal (and potentially emotionally and physically harmful) impact of medical analytics is still a nascent conversation that neither healthcare providers nor technology companies are fully prepared to enter into—yet.

Leaders can learn from Google’s example by creating ways for end-users, “observers” (in this case, medical professionals and other researchers), developers, and executives to discover potential harms—even after product launches.

The reality is few organizations are currently able to show you the full impact of a breach—few are able to identify all of the systems that were breached, much less what specific data could have been compromised. Understanding the scope of the damage/harm is predicated on having both the right mindset and logging and monitoring in place.
— Lisa O’Connor, Managing Director, Security R&D, Accenture

Avoiding harm and continuing to seek and clarify informed consent is vitally important. Mitigation of harms takes many forms, and can involve nearly every part of an organization—especially when harms are not immediately noticed or their scope is large.

Scope control and fail-safes

Data-related harms usually fall into one of two categories: unintended disclosure of raw data (such as photos of a user or their credit-card information) or improper decisions made based on data about a user. These decisions can be human (such as a decision on whether or not to prescribe a medication), hybrid (such as a credit report-influenced decision on whether to offer a loan), or machine-made (such as automatic re-routing of vehicles based on traffic data). Strategies to mitigate such harm and respond to it when it occurs depend on the types of decisions being made.

Revocation and distributed deletion

While pre-release design is critical to meet the “do no harm” expectation, designing with the ability to adapt post-release is equally critical. For example, a social network in which users directly offer up data about themselves (whether for public or private consumption) would likely be launched with privacy controls available from day one. However, the system’s owners may find that users are not aware of the available privacy controls, and introduce a feature whereby users are informed/reminded of the available settings. In such a case, users should be able to retroactively affect their past shared data—i.e., any change a user makes to their privacy settings should affect not only future shared data, but anything they have previously shared. In this way, a system that was not initially designed to allow fully informed consent could be adjusted to allow revocation of consent over time. However, such a capability requires the system’s designers to have planned for adaptation and future change. And, given the interdependence of various software features, if a breach or unintended effect occurs, plans should include how data can be removed from the entire data supply chain—not just one company’s servers.
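
One design choice that makes retroactive revocation possible is to evaluate visibility against the owner’s current privacy settings at read time, rather than stamping an audience onto each piece of content when it is shared. The toy model below assumes only two audience levels and is a sketch of the idea, not a description of any particular platform.

from dataclasses import dataclass, field

@dataclass
class Post:
    owner: str
    content: str

@dataclass
class User:
    name: str
    # Current audience preference; "friends" or "public" in this toy model.
    default_audience: str = "public"
    friends: set = field(default_factory=set)

def visible_to(post: Post, viewer: str, users: dict) -> bool:
    """Evaluate visibility against the owner's *current* settings at read time,
    so a later change to privacy settings also applies to past posts."""
    owner = users[post.owner]
    if owner.default_audience == "public" or viewer == post.owner:
        return True
    return viewer in owner.friends

users = {"ana": User("ana", friends={"ben"}), "ben": User("ben")}
posts = [Post("ana", "old vacation photo")]

print(visible_to(posts[0], "stranger", users))   # True while ana's setting is "public"
users["ana"].default_audience = "friends"        # ana tightens her settings later
print(visible_to(posts[0], "stranger", users))   # False: the change applies retroactively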

One practice for mitigating the risks associated with data usage is coordination between stakeholders in webs of shared computing resources. This collective coordination and alignment on key standards is known as “federation” in the IT context. Standards for ethical and uniform treatment of user data should be added alongside existing agreements on uptime, general security measures, and performance.²⁰ Federated identity management (and, as part of that management, single-sign-on tools) is a subset of broader discussions of federation. Identity management is another critical component of managing data ethics: it ensures that stakeholders accessing customer data (as well as customers themselves) are verified as permitted to access that data.
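
In practice, such identity checks often boil down to verifying claims issued by a trusted, federated identity provider before any customer data is released. The sketch below assumes a hypothetical set of decoded token claims, scope names, and issuer URLs; the specifics of SAML, OIDC, or any particular provider are out of scope.

# Hypothetical decoded claims as they might arrive from a federated identity provider;
# the field names and issuer URLs are illustrative only.
TRUSTED_ISSUERS = {"https://idp.partner-a.example", "https://idp.internal.example"}

def can_access_customer_data(claims: dict, required_scope: str = "customer_data:read") -> bool:
    """Grant access only to verified identities holding the agreed-upon scope."""
    return (
        claims.get("identity_verified", False)
        and required_scope in claims.get("scopes", [])
        and claims.get("issuer") in TRUSTED_ISSUERS
    )

claims = {
    "subject": "analyst-42",
    "issuer": "https://idp.partner-a.example",
    "identity_verified": True,
    "scopes": ["customer_data:read"],
}
print(can_access_customer_data(claims))  # True: verified identity, required scope, trusted issuer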

Communicating impact

As part of an investigation into a 2015 shooting incident in San Bernardino, California, the US Federal Bureau of Investigation (FBI) filed a court request for Apple’s assistance in creating software to bypass security protocols on an iPhone owned by one of the perpetrators. Apple’s public letter to its customers explaining its decision to challenge that request provides a glimpse into the complexity and potential risks involved. How society addresses matters such as these will be central to shaping 21st-century ethics and law. Apple chose a very intentional, bold, and values-driven stance.²¹ Perhaps most relevant to the issues of informed consent and intent to do no harm was Apple’s choice to not only make a statement about its position and intention, but also to explain in layman’s terms how the current security features function, how they would potentially be subverted by the proposed software, and what the future risks of having such software in existence would be, should the company comply with the FBI’s request. In the words of Tim Cook, Apple’s CEO, “This moment calls for public discussion, and we want our customers and people around the country to understand what is at stake.”

Brands that declare responsibility for educating their users about data security and use have an opportunity to build trust and loyalty from their customers.

 

This move is in stark contrast to the often-lampooned iTunes EULA (see R. Sikoryak’s “The Unabridged Graphic Adaptation [of] iTunes Terms and Conditions”), which may be seen as a small annoyance users scroll past and accept without reading in order to access their music.²² Like most EULAs, its dense legalese makes it difficult for users to determine the significance of any changes they need to “review” and “accept.”

As users increasingly realize the importance and value of securing their data and sharing it selectively and with intent, brands that declare responsibility for educating their users about data security and use have an opportunity to build trust and loyalty among their customers. By going beyond liability-limiting processes that occur as a technicality in the user flow and instead taking proactive measures (as Apple does by reminding users that location data is being used by apps), companies can establish themselves as industry leaders in ethical data use.

At a minimum, a data literacy program should cover:

  • What happens once data “leaves” the control of any one employee or user.

  • The impossibility of a static enterprise architecture in the age of data interdependency (due to use of third-party APIs, data partnerships and common frameworks).

  • Understanding of data at rest (stored data) versus data in motion (data being transformed into useful information by systems and people).

In the process of modeling potential uses for data, unspoken values will become apparent. If good feedback loops are in place, end users will be able to signal to developers where their values diverge from those of the developers, or where implementation does not bear out the intended result. Doctrine for data handling and management of consent needs to be incorporated not just at the edges of an organization’s human decision-makers, but at the edges of its computing infrastructure as well (as embodied in the algorithms used in analytics or machine-learning processes).

Doctrines, which can be defined as guidelines for effective improvisation, can be used to achieve this requirement. Coined in business usage by Mark Bonchek, the concept is sourced originally from military contexts, where commanders need to empower the soldiers on the front lines to have rules of engagement which not only specifically proscribe or restrict their actions, but also give them sufficient information to make smart decisions in line with a larger strategy, even when not able to directly communicate with the command structure.²³

Wise leaders will attend to management of consent and prevention of harm through good design, seeking informed consent from users, monitoring and managing consent over time and creating harm mitigation strategies. And with data literacy programs in place in their organizations, embodied in clear doctrines, their teams and partners will be able to fulfill the promises companies have made to their users—both to avoid harm and to create new value.

Further discussion of implementing a doctrinal approach can be found in “Code of Ethics.” The incorporation of values into machine teaching and learning is discussed at length in “Ethical Algorithms for Sense & Respond Systems.”

100-day plan:

Over the next three months, these are the actions you can take to improve your informed consent practices and minimize potential harm:

  1. Evaluate existing ethics codes that your business has agreed to follow. Consider whether they have sufficient guidance for data ethics. If not, host a design session to build a draft of your own Code of Data Ethics. Use the 12 guidelines for developing ethics codes as a guide. Coordinate with partners and suppliers to ensure their future ability to honor your new Code.

  2. Build an operations plan for communicating and implementing your Code of Data Ethics by charting the roles that furnish, store, anonymize, access, and transform data on behalf of your customers.

  3. Evaluate any informed consent agreements your organization offers for language that may be unclear and could lead to misunderstandings between your business and your customers. Begin to develop a plan to address these inconsistencies by simplifying language and clarifying intent around data use.

  4. Pilot a data literacy training program for data scientists, technical architects, and marketing professionals. Use their feedback to refine a larger program for all employees.

  5. Implement regular reviews of data-gathering techniques. Involve a diverse group of stakeholders and maximize transparency of the proceedings.

  6. Perform a gap analysis of your company’s current cybersecurity strategies that provide threat intelligence and other ways of discovering and automatically mitigating potential data breaches. Enumerate the potential harms that could impact your customers if your company mishandles or discloses data about them. Identify the organizations responsible for safeguarding against these missteps and communicate your findings with them.

  7. Develop a training toolkit to teach your employees who interface with customers how to identify harms that occur through the use of your products. Prioritize the groups within your company that should receive the training, ranking highest the group that responds to the greatest variety of situations.

  8. Draft and launch a data literacy plan for ensuring shared understanding of data usage and potential harms throughout your organization, including partners and vendors.

365-day plan:

Over the next year, build on top of the short-term goals and scale improvements to include your entire company and ecosystem of stakeholders.

  1. Gain support from your company’s leadership team to ratify your Code of Data Ethics and start working with partners and vendors to integrate the principles into new agreements.

  2. Roll out a data literacy training program for all employees.

  3. Develop standard text to include in consent agreements that is easily understood and accessible. Consider altering the ways these agreements are shared with customers, how interactive they are, and how customers can revisit these agreements over the lifecycle of their relationship with your products, services, and brand. Instantiate varying degrees of these updates in a handful of agreements. Consent agreements should strive to communicate the scope of how data is collected, manipulated, and used as well as the value this data has for all of the stakeholders in the data supply chain who might come in contact with this data.

  4. Now that potential harms have been enumerated, seek out instances of harm—first from existing feedback loops (e.g. call centers, customer service channels, surveys), and then create new methods for finding harms that fill gaps in existing feedback mechanisms. When unintended harms are discovered, document the incident in the form of a use case and share these findings with product owners and their managers.

  5. Deploy your training toolkit to train groups of employees based on their priority ranking. These employees should understand how to identify, document, and internally report instances of harm. If appropriate, consider disclosing these reports publicly.

  6. Align the data use cases defined by product, interface, and data teams with customers’ reasons for sharing their data in the first place.

  7. Share the customer data-centric threat intelligence evaluation report with your CISO (or equivalent) and ask her to address the gaps your team found between what is currently in place and what a stronger posture might include.

References

  1. Cassano, J. (2016, February 2). How Uber Profits Even While Its Drivers Aren’t Earning Money. Retrieved June 1, 2016.

  2. “…We’ve seen corporates play with even very young startups through this sort of “data diplomacy”…enabling an entrepreneur to get some limited access to data in order to test it out and create a product around it.” Beyroutey J. and MJ Petroni (2013, September 17). Personal interview.

  3. Open Letter to Facebook About its Real Names Policy. (2015, October 5). Retrieved April 3, 2016.

  4. Osofsky, J., & Gage, T. (2015, December 15). Community Support FYI: Improving the Names Process on Facebook [facebook Newsroom]. Retrieved April 3, 2016.

  5. Calvo, R. A., & Peters, D. (2014). Positive Computing: Technology for Wellbeing and Human Potential. The MIT Press.

  6. Bonchek, M. (2013, May 3). Little Data Makes Big Data More Powerful. Retrieved June 1, 2016.

  7. Seshagiri, A. (2014, October 1). Claims That Google Violates Gmail User Privacy. The New York Times. Retrieved June 1, 2016.

  10. Kramer, A. D., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS, 111(29), 10779.

  9. Goel, V. (2014, June 29). Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry. The New York Times. Retrieved June 1, 2016.

  10. Schroepfer, M. (2014, October 2). Research at Facebook [facebook Newsroom]. Retrieved April 3, 2016.

  11. Jackman, M., & Kanerva, L. (2016). Evolving the IRB: Building Robust Review for Industry Research. Washington and Lee Law Review Online, 72(3), 442.

  12. Gewirtz, D. (2015, October 15). Europe to US: Stop storing our data on your servers (or else). Retrieved June 1, 2016.

  13. Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and Security (p. 4). ACM. Retrieved March 3, 2016.

  14. Kelley, P. G., Cesca, L., Bresee, J., & Cranor, L. F. (2010). Standardizing privacy notices: an online study of the nutrition label approach. In Proceedings of the SIGCHI Conference on Human factors in Computing Systems (pp. 1573–1582). ACM.

  15. Usable Privacy. (n.d.). Retrieved March 3, 2016

  16. PBS. (n.d.). Retrieved June 1, 2016

  17. Tuerk, A. (2015, February 10). Take a Security Checkup on Safer Internet Day. Retrieved March 15, 2016

  18. Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks. Retrieved June 23, 2016.

  19. The UX Of Ethics: Should Google Tell You If You Have Cancer? (2016, April 18). Retrieved May 7, 2016.

  20. Federation (information technology). (2016, May 21). In Wikipedia, the free encyclopedia. Retrieved March 27, 2016.

  21. Cook, T. (2016, February 16). Customer Letter. Retrieved February 17, 2016.

  22. Sikoryak, R. (n.d.). iTunes Terms and Conditions: The Graphic Novel. Retrieved March 30, 2016.

  23. Bonchek, M., & Fussell, C. (2013, February 20). Use Doctrine to Pierce the Fog of Business. Retrieved February 16, 2016.

Authors

MJ Petroni
Cyborg Anthropologist and CEO, Causeit, Inc.

Jessica Long
Cyborg Anthropologist, Causeit, Inc.

Contributors

Steven C. Tiell
Senior Principal—Digital Ethics
Accenture Labs

Harrison Lynch
Accenture Labs

Scott L. David
University of Washington
Data Ethics Research Initiative

Launched by Accenture’s Technology Vision team, the Data Ethics Research Initiative brings together leading thinkers and researchers from Accenture Labs and over a dozen external organizations to explore the most pertinent issues of data ethics in the digital economy. The goal of this research initiative is to outline strategic guidelines and tactical actions businesses, government agencies, and NGOs can take to adopt ethical practices throughout their data supply chains.

Copyright

This document makes descriptive reference to trademarks that may be owned by others.

The use of such trademarks herein is not an assertion of ownership of such trademarks by Accenture or Causeit, Inc. and is not intended to represent or imply the existence of an association between Accenture, Causeit, Inc.  and/or the lawful owners of such trademarks.

© 2016 Causeit, Inc. All rights reserved. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. Accenture is a trademark of Accenture.