The HRW Innovation team is always keen to take on new challenges to broaden our horizons, so when we had the chance to contribute to HRW’s Summer Book Club series, there was no shortage of volunteers to take part. Read below to find out how Jaz Gill, Rhiannon Philips, Darren Vircavs and Francesca Cooper debated and deliberated their way through July…
Choosing the right book was a tough place to start, with so many of us working through our own personal lists of interesting, innovative literature. But a consistent front runner (and subsequent winner) was “Everybody Lies” by data scientist Seth Stephens-Davidowitz. This book resonated with us, as Big Data has long been a focal point and topic for debate in market research. We were interested in how Seth tells his story of understanding the world and human behaviour through a tool as accessible as Google Trends and, importantly, we wanted to know about the pitfalls and limitations of using big data that Seth battled with throughout his career.
The power of something like Google Trends is that internet searches are increasingly frequent and intrinsically linked with our behaviours, so much so that the way we search can provide a glimpse into our subconscious. Most people tell the internet more than they would confide in any living person and, as a result, analysing this data can shed light on attitudes and behaviours that people don’t recognise in themselves, let alone have the ability or desire to communicate (which can be problematic for interviewing). Observational techniques, such as the analysis of Google search terms, allow us to tap into what people actually do, which is often very different from what they say they do. If you are interested in hearing about some of our experiences with a variety of innovative techniques to access the reality of respondent behaviour this way, email us at firstname.lastname@example.org.
The value of data patterns and trends has become intrinsic to the way many industries run and make decisions. Netflix, for example, used to promote movies that were frequently added to “watchlists”. However, it soon discovered that there is an intention/action gap between what people aspire to watch and what they actually choose to watch (usually much lower-calibre content). So, it changed its algorithm to base its “recommended for you” suggestions on what the viewer watches in reality, and viewership went up. Tracking trends and patterns in data, if understood properly, opens opportunities to make predictions and plan ahead – could we predict how certain demographics will respond to particular treatments? Could we identify early symptoms which may be indicators of, god forbid, the next pandemic?
“Everybody Lies” is jam-packed with intriguing examples where Seth uses Google Trends to identify insights that had never been identified before, including controversially linking racially charged search data with the incidence of people voting for Donald Trump in the 2016 Presidential Election. As you can see from the image below, whilst the patterns are not perfect, it does raise an interesting hypothesis…
One theme of the book that really resonated with me is the premise that bigger is not always better – as Seth describes it, “The size of a dataset, I believe, is frequently overrated”. There is a sense that the more data you accumulate the better, because this adds validity and can elicit more nuanced findings. While this is true, an interesting argument is that the bigger the effect a variable has, the fewer observations are required to identify it. Bigger data, in fact, can lead to the “curse of dimensionality” – a phenomenon where, if you test enough things, at some point you will find a correlation that is down to chance rather than causation. What Seth identifies as the real value in the “bigness” of big data is the ability it gives you to zoom in on smaller segments, and still have the required clarity and detail for robust analysis (something that we take into account when designing segmentations in particular).
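To make the “test enough things” point concrete, here is a toy simulation of our own (it is not taken from the book). It assumes a made-up outcome measured on 30 respondents and compares it against candidate variables that are pure random noise, so any “significant” correlation it finds is spurious by construction. The function names, the sample size and the 0.361 cut-off (roughly the two-sided 5% critical value of Pearson's r for 30 observations) are illustrative assumptions only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spurious_correlations(n_variables, n_observations=30, threshold=0.361, seed=0):
    """Correlate an outcome with n_variables columns of pure noise.

    Returns (hits, strongest): how many noise variables cleared the
    threshold, and the largest absolute correlation seen.
    The threshold is an assumed ~5% critical value for 30 observations.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    outcome = [rng.gauss(0, 1) for _ in range(n_observations)]
    hits = 0
    strongest = 0.0
    for _ in range(n_variables):
        # Each candidate is unrelated to the outcome by construction.
        candidate = [rng.gauss(0, 1) for _ in range(n_observations)]
        r = abs(pearson(candidate, outcome))
        strongest = max(strongest, r)
        if r > threshold:
            hits += 1
    return hits, strongest


if __name__ == "__main__":
    for k in (10, 100, 1000):
        hits, strongest = spurious_correlations(k)
        print(f"{k} noise variables tested: {hits} 'significant', strongest |r| = {strongest:.2f}")
```

With the respondent base held constant, the number of chance “findings” can only grow as more variables are tested, which is why a result picked out of a very wide dataset needs checking against fresh data before it is treated as an insight.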
Rhiannon, as a behavioural scientist, particularly enjoyed reading about Seth doing the undoable: disproving Freud. Freud’s work focuses on the “unconscious”, a concept which is inaccessible and therefore hard to disprove (the same premise as the inability to prove a negative, i.e. how can you prove the Loch Ness Monster doesn’t exist?). As a result, his theories are regarded as unfalsifiable among psychologists, and his work has survived for decades. By analysing search terms for fruit that appears in dreams, Seth disproved the idea that bananas (and other phallic-shaped fruit/veg) appear in dreams more frequently than any other common fruit.
What can we do with all this information? Seth gives an impactful example of how one of Obama’s speeches condemning hate crimes against Muslims unexpectedly led to further intolerance on the internet. Changing the tone of the speech to instead inform people of facts about Muslim people in America resulted in, for the first time in a year, the top search for ‘Muslims’ no longer being “terrorists/ extremists/ refugees” but “athletes”. This is a powerful example of how essential communication can be, not only for how you are perceived, but also for the response you elicit.
There is a gap within all this rich analysis of correlation versus causation: what big data often cannot do is answer the ‘why’, which is often the pivotal detail that converts an observation into an insight that can be acted upon. This is why we always approach studies with big data in conjunction with supporting methodologies to understand beyond what the data reports – and obtain insights that are truly actionable.
The elephant in the room when discussing big data is always ethics. Given that murderers may search for ways to murder before committing the crime, should police be allowed to monitor and intercept those searches? The issue is that the majority of people who search for “murder” have no intention of acting on the search, so at what point does this infringe freedom of speech or the right to be innocent until proven guilty? On top of this, if internet searches do truly expose your subconscious thoughts, how can you feel comfortable with one entity owning that much information on you, or trust them to use that information ethically?
So, what did we take away from the book? Aside from some really great anecdotes and fun facts, “Everybody Lies” sheds a clear and considered light on how something as simple to trace as Google search data can open up a whole new realm of insights into human behaviour. However, supplementary ‘small data’ remains essential.
Words by Francesca Cooper