Undergraduate course explores threats to privacy in age of big data
In 2017, Sidewalk Labs and Waterfront Toronto announced their vision for the future of cities: a “smart neighbourhood” on the city’s eastern waterfront called Sidewalk Toronto. Along with autonomous vehicles and robots, the city within a city would feature sensors embedded in its physical infrastructure to collect vast amounts of data about traffic, energy use, mail delivery and even garbage disposal.
But in October 2018, Ann Cavoukian, Ontario's former privacy commissioner, resigned as an adviser for Sidewalk Toronto due to concerns about digital privacy practices – in particular, whether stored and shared data would be stripped of all information that could identify an individual.
“Sidewalk Labs is a great example of our dream of a utopia in which we harness technology to solve problems and enhance decision making,” says David Liu, an assistant professor, teaching stream, in the Faculty of Arts & Science’s department of computer science.
“But I’m fundamentally skeptical about how Sidewalk Toronto’s governance will work, given the huge amount of data that will be collected. I worry whether requests for data will receive nothing more than rubber stamp approvals from the agency responsible and who such an agency would be accountable to.”
Liu is exploring these and other issues in a new course he developed called “What, Who, How: Privacy in the Age of Big Data Collection.” The course explores the countless ways our data are being collected by retailers, corporations and governments, as well as law enforcement and national security agencies – and the potential dangers of that harvest.
“Almost every company we interact with now is collecting data on us,” says Liu. “And it's not just the companies selling us stuff – it’s also the companies who provide free services like Google, Facebook, Instagram and YouTube. They're collecting data on how we interact both with their services and other people using those services.
“What’s more, there’s a growing use of sensors – like those planned for Sidewalk Toronto – that are infiltrating our physical world. So it’s not just when you use an electronic device. It happens as you move through and interact with the physical world. For example, home assistants record your speech, facial recognition software watches you and a device like an Amazon Ring front doorbell video-records you as you walk along a sidewalk.
“Modern technology has enabled mass surveillance in ways that were literally unimaginable 50 years ago. Surveillance that is orders of magnitude more efficient. So it’s really important for us to be cognizant of privacy issues – especially when it comes to public data collection – because it affects everyone.”
“Almost every company we interact with now is collecting data on us,” says David Liu, an assistant professor, teaching stream, in the department of computer science (photo by Diana Tyszko)
Liu and his students explore the digital rewards and risks through case studies that are straight from the headlines.
Students studied the Cambridge Analytica scandal, in which the consulting firm harvested the personal information of millions of Facebook users without their consent and used it to target political advertising during the 2016 U.S. presidential campaign.
They learned how Spotify is sharing more than just traditional user data with marketers – the music-streaming giant is also sharing listeners’ emotional states and what they’re doing as they listen. With this intelligence, retailers target ads based on whether someone is listening to the Happy Beats, Breakup Songs, Girls Night Out or Barbeque playlist. As Spotify says, “You are what you stream,” and marketers can use this insight to their advantage.
Students also learned that it’s more than just song choices, geographic location or purchase histories that are being collected. We share our most personal data – our DNA – with genealogy services like GEDMatch and 23andMe.
Such services are a potentially valuable resource. In 2018, for example, police uploaded to GEDMatch DNA found at the scene of a murder they suspected had been committed by the Golden State Killer. The company’s genetic database of a million users revealed distant relatives of a suspect who – after additional investigation and police work – was eventually arrested and charged.
But genetic databases also pose unique privacy risks, according to Liu. “We share DNA with our family members,” he says. “So, if my second-cousin uploads their DNA without my knowledge, this reveals information about me without my consent. And that’s troubling.
“Also, to protect our privacy we can change passwords, user names, our hair colour – even our faces, but our DNA is immutable. You can’t disguise your DNA.”
For Ella Li, a student at University College, Liu’s course was eye-opening. “I never really thought much about privacy and how it relates to big data before,” she says. “But this course gave me new insight into privacy issues and security breaches around the world.”
For Innis College student Isabella Buklarewicz, the course was frightening. “The scariest thing I learned was the way in which companies use algorithms in the hiring process,” she explains. “They may make hiring easier, but the biases of the algorithm’s creator can be embedded in the code – and that can be advantageous for one group of people and discriminatory toward another.”
As Liu notes, the risks of big data collection are not shared equitably, as “marginalized people are more vulnerable to surveillance and more likely to be targeted because of their race or because of their socioeconomic status or because they’re immigrants.”
The lesson is not lost on Buklarewicz.
“The collection of data isn’t just a privacy issue,” she says. “It’s a human rights issue.”