Cambridge, July 2013
I'm here with Michal Kosinski, Operations Director of the Psychometrics Centre in Cambridge and leader of its e-psychometrics unit. Mr. Kosinski is also a Research Consultant for the Online Services and Advertising Group at Microsoft Research Cambridge and a visiting lecturer in the Mathematics Department of the University of Namur, Belgium. He is currently pursuing a PhD in Social Psychology at Cambridge, working to better understand the role of online social networks in the corporate environment.
Along with his distinguished academic career, Mr. Kosinski has had a flourishing business career as well, founding a successful software development and ITC consultancy start-up and introducing the first VoIP product line in Poland.
Mr. Kosinski's research in the online environment has recently made waves across the Internet through his collaborative work on the myPersonality Facebook app project. The project, which involved over 150 research teams from around the world, investigated the relationship between psychological traits and online behaviour by analysing the profiles of 8 million Facebook users.
He's here to provide some insight on the future of social media security and the implications of Big Data mining for online users and ecommerce retailers.
So let's get started.
Can you tell us a little about how the idea for the myPersonality Facebook app project came about?
The idea came about six or seven years ago. It was originally Dr. David Stillwell's idea; he was just starting his PhD at Nottingham University at the time. He had some time over the holidays and thought, hey, why don't I post online some of the psychological questionnaires I studied as a psychology student? This was when Facebook first opened itself to third-party applications, so anyone could create a Facebook app. David expected maybe a few hundred participants, but the number of people interested in taking the questionnaires exceeded all of his expectations. There were months when we had more than one million visitors.
The interesting bit is that we never forced anyone to give us data for the research, and we never offered any kind of incentive apart from genuine feedback on their scores. In typical psychology research, people are paid, rewarded or even required to participate, as psychology students often are; our approach was completely different. We said: you don't have to participate, we'll just give you feedback, and if you liked the experience and the feedback, you can click a button and donate your data to science. This meant people had no motivation to misrepresent themselves, lie on the questionnaires and so on.
And if you look at the data, it is of unparalleled quality. I work a lot with psychological data and I've never seen data this reliable, especially remembering that it comes from an uncontrolled environment: anyone could access the app from anywhere and fill in the questionnaires at their own convenience.
Big Data is such a buzzword at the moment, and it seems to strike fear into everyone who's worried about large social media companies keeping users' information. How do your role here at Cambridge and your research fit into this debate?
I actually look at Big Data from the perspective of the user and their digital footprint. People are increasingly realizing, or should be, that everything they do is being recorded. As we speak, we are recording our conversation, but your cellphone has the capability to record your conversations all the time, and the network operators carrying calls between callers also have access to this content.
Increasingly we use this kind of information in research, but companies can use it for commercial purposes and governments can use it for eavesdropping. You can easily switch on recording on a cell phone at any time. Your cell phone also knows where it is, so it can record your movements and who you're meeting, and so can anyone who has access to your phone, like an app that you've installed and given permission to use your information. So it's not only phone logs, text messages or email: Big Data is increasingly about records of your actual life, such as how quickly you walk, how early you wake up in the morning, your heartbeat. Imagine I can check not only the pitch of your voice but also your heartbeat, so I can know what your emotions are, and I can derive a lot more from this data.
Some people say they don't really care that their physical movements are being recorded, but if I match your physical movements with those of other people, with who you met and perhaps with your Visa credit card purchases, all of a sudden these little bits of innocent data can reveal deeper meaning about who you are, your intentions, your motivations and the very intimate details of your life.
But this is only the first layer, because everyone knows that someone following you could also figure that out. The thing is that you can now do this with computers on a massive scale. I can take Facebook and, in a matter of minutes, create a detailed psycho-demographic profile of every person on Facebook. And it's not only based on the data that people explicitly share, like gender, age and friends; I can also use data about the books you read, the movies and music you like, and the status updates you post online.
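As a rough illustration of how profile data can be turned into a trait prediction, the sketch below fits a plain logistic regression to a binary user-by-Like matrix. It is not the myPersonality team's pipeline: the six users, four pages and trait labels are invented toy data, and the function names (`train_logistic`, `predict_proba`) are hypothetical. A real system would work with millions of users and far more pages.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(likes, labels, epochs=1000, lr=0.5):
    """Fit one weight per page plus a bias by batch gradient descent on log-loss.

    likes:  list of rows, one per user; each entry is 1 if the user Liked that page, else 0.
    labels: list of 0/1 trait labels (toy data, hypothetical).
    """
    n_users, n_pages = len(likes), len(likes[0])
    w = [0.0] * n_pages
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_pages
        grad_b = 0.0
        for x, y in zip(likes, labels):
            # Prediction error for this user under the current weights.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(n_pages):
                grad_w[j] += err * x[j]
            grad_b += err
        # Average the gradient over users and take one step downhill.
        w = [wj - lr * g / n_users for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_users
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a user with Like vector x has the trait."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data (hypothetical): rows are users, columns are four pages.
likes = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(likes, labels)
print([round(predict_proba(w, b, x), 2) for x in likes])
```

Each learned weight indicates how much Liking a given page shifts the estimated odds of the trait; in this toy setup the first page ends up with a positive weight because every user who Liked it has the trait.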
Before, we had to pay people to spy on other people. Now I can just have a computer give me a very detailed profile of just about anyone, very quickly. This of course has amazing applications. It sounds scary, even to me as I describe it, but now imagine we live in a perfect world where you as a user are given control over this data. Politicians are now saying that we should be able to trade our data, exchanging it for a few pence each time, but I think this only disguises the problem. No one cares about a few pence; I think what really matters is control.
I believe there are huge advantages to this technology, which can automatically predict your future behaviour and individual traits from your digital footprint, and do so on a large scale. But if some people want to opt out of it, people who don't want to use Facebook, cell phones, credit cards and so on because of these predictions, then even if they are a small minority of society, they become handicapped in a way. This becomes extremely expensive not only for them individually but also for the economy at large. Every person who doesn't use email, Facebook, credit cards and whatnot is basically less productive, which affects their friends, family and country. Whether or not they are paranoid about privacy, it affects everyone who would like to be in touch with them as well.
The problem is that, statistically speaking, we are bound to have a big privacy-related issue. With the technology now available you can, as in Minority Report, really predict what people will do in the future and where incidents will most likely occur at a given time, using archived and real-time data about the environment. You can know that a person will be drunk and rowdy at a given time even before that person knows it!
Moreover, you will know people better than they even know themselves, and this is scary. I bet that nowadays people spend more time with their computers and cellphones than with their partners, mothers, fathers and kids. With today's technology, those devices will know you better than those people do.
Where do you see social media sites like Facebook and Twitter, and the privacy debate that's been circling around their data, heading in the future?
I'm not an expert in legal matters, but according to what I've heard from a lawyer involved in drafting privacy laws in the EU, most online services at the moment are simply illegal in Europe. From a psychologist's point of view, if you collect people's IQ scores, you have to store the data in a certain fashion, protect it and destroy it after a certain time, obtain users' consent and so on. There's a whole legal framework around data collected about individuals. Take sexuality, for instance: you cannot go around collecting a list of homosexuals in a certain area and put it on a public bulletin board, publish it in a newspaper or even store it at home. That would be illegal under the framework that exists today.
Now your digital footprint is available to you, your Facebook friends, your internet service provider, your cell phone operator, your university if you use its network, your government, marketing companies that put cookies on your computer and so on. If the same digital footprint can be used, as I have done in my research, to figure out your sexual orientation with 90% accuracy, this simply means that, whether they want to or not, companies holding that footprint are effectively storing information about your sexual orientation along with the rest of what they gather about you, which is illegal.
This creates a paradoxical situation, because I personally wouldn't advise anyone to shut down Facebook. It's an amazing service doing a lot of good for a lot of people, and there are many advantages to using it. It's a great product that I love myself, and I would wish for everyone to be able to use it. But the problem is that what happens at the moment, with so much digital exposure out of our control, is illegal and unacceptable.
Think about digital data markets: there are whole markets where people collect email addresses and some purchase data and put them up for sale so other companies can buy them. Companies can then match what an individual bought in one store with what they bought in another, run some predictive software, add their personality, their IQ, and why not their religion, political views, sexual orientation and all those other pieces of information, and it's entirely beyond your control.
My view is that the only way forward is to give full control of the data to the user. From a technical point of view I actually can't see (remember, I'm a psychologist and not a software engineer) why Facebook would need to store all of my information on its central servers.
In the past, emails were stored on a central server simply because your computer was only online for brief periods. Now all devices are online all the time; you have your account in the Cloud, but the data can be encrypted so that no one necessarily has to look at it. Why not have email that never stays on any centralised server and goes directly into your mailbox? The same goes for purchase records. I don't understand why, technically, Amazon has to store my purchase information. It could be stored on my computer in encrypted form, which Amazon could verify with today's amazing cryptographic algorithms. You could keep all your purchase information on your computer, and Amazon could access it only when you allow them to.
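The arrangement described here can be sketched with a standard building block, a message authentication code (HMAC). This is a swapped-in technique, not anything a retailer is known to use. In this hypothetical scheme the retailer tags each purchase record with a secret key, hands the tagged record to the customer and keeps no copy; later, if the customer chooses to share the record, the retailer can confirm it is genuine and unaltered. An HMAC proves integrity but does not encrypt anything, so the customer would still encrypt the stored receipts separately. The names `RETAILER_KEY`, `issue_receipt` and `verify_receipt` are illustrative.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical: the retailer keeps only this secret key, not the purchase history.
RETAILER_KEY = secrets.token_bytes(32)


def _canonical(purchase: dict) -> bytes:
    """Serialise a record deterministically so the same data always yields the same bytes."""
    return json.dumps(purchase, sort_keys=True).encode("utf-8")


def issue_receipt(purchase: dict) -> dict:
    """Retailer side: tag a purchase record and hand it to the customer to store."""
    tag = hmac.new(RETAILER_KEY, _canonical(purchase), hashlib.sha256).hexdigest()
    return {"purchase": purchase, "tag": tag}


def verify_receipt(receipt: dict) -> bool:
    """Retailer side: check a receipt the customer has chosen to share."""
    expected = hmac.new(RETAILER_KEY, _canonical(receipt["purchase"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, receipt["tag"])


receipt = issue_receipt({"item": "book", "price_pence": 899, "date": "2013-07-01"})
print(verify_receipt(receipt))  # True

tampered = {"purchase": dict(receipt["purchase"], price_pence=1), "tag": receipt["tag"]}
print(verify_receipt(tampered))  # False
```

The design choice is that the retailer stores one key instead of every customer's history; the trade-off is that losing or rotating the key invalidates old receipts.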
Should users then become more educated about these privacy issues, and how should they be reacting?
I believe that to a large extent you as a user are not given the option to protect yourself, apart from dropping out, which is bad for everyone. I don't want to go as far as saying that access to Facebook is part of your human rights, but maybe in 10-15 years people won't laugh at that idea. Having access to the internet today should be a human right, and Facebook is becoming the same thing. If people began to leave Facebook, this would be bad for everyone, much in the same way as leaving school. You cannot really allow young people to choose whether or not to go to school. As a society we decided that going to school is better than not going to school, and we require everyone to attend. I know some people may think that's taking it too far, but the same applies to Facebook as an element of the digital environment. If you're not on Facebook, you are outside your community. I encourage everyone to join Facebook not because I have a stake in it or something (and I actually study the risks associated with using it), but because I do believe the advantages by far outweigh the disadvantages. And I hope that as a civilisation we will be able to solve this problem, the same way we did with nuclear energy and every other technology that has both positive and negative sides.
So users should be educated and should be aware of what's possible. They shouldn't just be told, hey, big companies are doing this. I actually think it's not Facebook, Google or Microsoft that are being spooky about this. They collect the data, they probably feel somewhat uncomfortable even having this data, and they have loads to lose: if Facebook starts stalking users, it can lose a lot of business. For this reason I trust Facebook, and I trust Microsoft and Google. But I do not trust governments, and I do not trust small companies like the ones that make apps you can download on your iPhone. Small companies can collect data on individuals and have little to lose. Governments are too arrogant to care about losing anything.
Now imagine that a rogue country's government has as much access as many corporations to internet archives of what you were doing in the past. Imagine being stopped ten years down the line at this rogue country's border for having expressed strong opinions about it. Imagine if the police in this country, or in the United States, took your Facebook profile or any other digital footprint and ran a scan to figure out whether you're a terrorist, or have the wrong political views, or whatever else they may not like.
So what you're telling me is that there's good news and bad news for the future. Fears about online security will persist unless we change how we distribute and use our personal data, but the increasing utility of social media products means that people want to trust companies with their information. How do you think these advancements will affect marketing techniques in the future?
Well, I think it will just become ultimately personalised. Because of your digital footprint, your computer will know you better than your own mother, or better than you think you know yourself. It will be able to predict your behaviour and probably your motivations, your potential, and what you want to buy and listen to. So I think the message and product offering, and eventually the products themselves, will become ultimately personalised. They will become like other people, actors in your social network. When we approach different people, most of us adjust our behaviour. If you go and visit your grandma, you behave in such a way that she is pleased and happy to have a successful interaction with you. You choose different subjects to discuss with her than with your peers or people younger than you. You behave differently and probably even dress differently. We all do it because it smooths social interactions. We are such successful social animals because we can adjust our interactions and what we say to other people.
I think that both marketing and the products themselves will join the game, so to speak. Your car will be your friend, discussing the weather, swapping jokes, and adjusting the engine parameters to your mood that day and your overall personality. It would change the music on the radio to match what you really want to listen to, without you having to set that up yourself: your car would know better than you.
I have a recent article here by Mark Koltko-Rivera entitled "Can they really predict personality from Facebook Likes? Not well." In it, he argues that even the best-performing aspects of the algorithms behind your myPersonality research are, to a large extent, very poor indicators of one's personality traits and intelligence. How would you respond to this, and do you see any logical flaws in the project?
Well, our research, like any other experiment, has a number of flaws and limitations. First of all, if you are given control over your data, then you are also able to react to the predictions being made. Suppose we don't make spooky predictions, or predictions used against you, but instead share the results of the predictions with you. You can imagine a marketing company approaching you and saying: hey, would you mind if I predict your behaviour? By the way, this is what we think you are; give us feedback on whether we are right or wrong. This is a more collaborative way of working with marketing companies, products or even your car, which try their best to understand you.
So of course you need some kind of prediction mechanism, and if users play along with the predictions, they will work with you to make them as accurate as possible. From just skimming this article, I can see he says the correlations are not that high. But what I think the author didn't really notice is that we deliberately used Facebook Likes as one of the poorest kinds of signal we could use. Facebook Likes are public, so anyone can see them, and people wouldn't Like things that are very intimate or kinky. Your browsing behaviour would be much more informative here, because browsing feels much more private and you visit a much wider variety of websites and places than you would ever click Like on Facebook. In fact, our results indicate that predictions are much more accurate when you use more invasive, less publicly shared signals. Moreover, we have to remember that personality is itself a noisy variable.
The author claims that 81.5% of Openness has nothing at all to do with Facebook Likes. But the same applies to personality questionnaires. Personality is noisy and very difficult to estimate. Even measurements of weight and height carry some measurement error, yet they are very accurate. If you cannot measure something accurately, as with personality, then it is very difficult to predict it. The thing is, we have shown that there is a lot of real-life information in a digital footprint: we can figure out your political views, religious views, IQ, personality and so on. We did not say that personality is the best way to describe people. Now that we have so much data available, maybe we can begin to develop a new model of personality that is much more accurate, is based on much larger samples, has more than five dimensions and describes people better. Such a model would presumably be much easier to predict from.
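The arithmetic behind the quoted figure, and the point that measurement noise caps predictive accuracy, can be made concrete. If a linear predictor correlates r with a trait score, it leaves a share 1 - r² of the score's variance unexplained, so "81.5% unexplained" corresponds to r ≈ 0.43 (0.43² = 0.1849); that r is back-calculated here, not quoted from the article. Spearman's classic attenuation result adds that, when two measures have reliabilities r_xx and r_yy, the observed correlation between them cannot exceed √(r_xx · r_yy). The reliability value in the example below is hypothetical.

```python
import math


def unexplained_share(r: float) -> float:
    """Share of a variable's variance left unexplained by a linear predictor with correlation r."""
    return 1.0 - r * r


def attenuation_ceiling(rel_x: float, rel_y: float) -> float:
    """Largest observable correlation between two measures with reliabilities rel_x and rel_y.

    Observed r = true r * sqrt(rel_x * rel_y), and the true r is at most 1.
    """
    return math.sqrt(rel_x * rel_y)


print(round(unexplained_share(0.43), 4))  # 0.8151
# Hypothetical: a questionnaire with reliability 0.75, predictor treated as error-free.
print(round(attenuation_ceiling(0.75, 1.0), 3))  # 0.866
```

Under that assumed reliability, even a flawless predictor would show a correlation of at most about 0.87 with the measured questionnaire scores.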
That sounds quite exciting for the future, but we're not there yet. Do you think this is the path predictive technology is taking, and in particular where your own research is going?
We are definitely going in that direction, because we have so much more data and can observe people in their natural environment. We are no longer limited to self-reported questionnaires, where you can misrepresent yourself and which are limited to a few hundred questions, the maximum you can make anyone answer. We are now getting access to gigabytes of data about given individuals that we can collect in a day, and you can add to that some more sophisticated brainwave scanning.
I'm surprised myself: when I started my education as a psychologist, properly scanning what's happening in the brain required machines worth millions of dollars. Today you can buy a $70 device on eBay that connects to your computer by USB and gives you a highly accurate picture of what's happening in your brain. You can then use it to pilot little planes or whatnot.
We live in a Wild West of data right now. Legal and social changes take time, whereas technology just keeps speeding up. So this completely new phenomenon, which 10-15 years ago was not a problem, is super positive on one hand and super risky on the other. But you can easily solve it: just give individuals full control of their data. And not control through forcing companies to put ridiculous cookie agreements on their websites, where if you don't agree you can just go away. This is not a good way to go. Companies should be required to offer you the option to use all their services without sharing any data with them.
Interviewed by Victoria Elizabeth