By John Spooner, Head of Data Science at SAS UK & Ireland.
You may never have heard of it, but the small West Yorkshire town of Hebden Bridge is the UK's best place to live. Whenever a value judgement like this is made, it's easy to dismiss it as subjective. Surely the UK's 'best' place is down to preference and will be different for everyone? Yet the view of Hebden Bridge as paradise isn't opinion; it's an analytical fact.
The Paradise Found project was launched to discover the best place on Earth using advanced analytics and machine learning, one example of the technologies that fall under the much-hyped umbrella term "artificial intelligence" (AI). While Australia's West Perth was crowned the winner, Hebden Bridge was the only UK location to secure a place in the global top seven, thanks to its attractiveness to families, quality infrastructure and local culture.
To say this with any confidence would be impossible were it not backed up by data. This philosophy defined Paradise Found and helped demonstrate the power of data to produce accurate, objective insight into any problem or question you may care to ask.
Separating fact from fiction
In these kinds of projects, opinion is often placed before fact. The usual approach is a consumer questionnaire or survey, made up of predetermined questions and criteria selected by its organisers. In this example, the organiser might be the editor of a travel publication who wants to compile a list of the best places to visit around the world.
Yet while surveys are useful for measuring sentiment, they won't help us get to the truth. The bias of those involved, whether rooted in personal preference or experience, will always influence the result, making it opinionated rather than objective and accurate.
By contrast, the value of Paradise Found was in letting the data speak for itself. SAS used machine learning to analyse masses of publicly available data from global studies, reports, social media and review sites. From the information collected, the model identified the eight most important characteristics that people find attractive in a place. It then ranked locations according to how well they rated in each of these eight areas.
If this sounds straightforward, it wasn't. The project examined 148,233 cities in 193 countries based on five million unique data points from 1,124 different sources. In total, 1,060 international data services, three online geodata services, four social media services, and 57 urban studies contributed.
When dealing with information sources of this volume and diversity, data management becomes as important as the analytics. For example, when we were standardising city names written in a huge variety of languages and alphabets, data quality methods such as profiling, parsing and cleansing were essential. The experience underlined the importance of using an open, transparent analytics platform, not just to make the system work but to make the data usable.
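To make the three data quality steps more concrete, here is a minimal Python sketch of how city-name standardisation could work. It is not the SAS implementation used in Paradise Found: the `ALIASES` table, the example names and the helper functions are hypothetical, and a real pipeline would need far larger reference data (for instance, proper transliteration for non-Latin alphabets rather than a hand-written alias list).

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical alias table: cleansed variant spellings -> one canonical city name.
# Real reference data would be far larger; these entries are illustrative only.
ALIASES: Dict[str, str] = {
    "munchen": "Munich",
    "muenchen": "Munich",
    "munich": "Munich",
    "москва": "Moscow",
    "moskva": "Moscow",
    "moscow": "Moscow",
}


def profile(raw_names: List[str]) -> Dict[str, int]:
    """Profiling: summarise the raw column before changing anything."""
    return {
        "count": len(raw_names),
        "distinct": len(set(raw_names)),
        # Values containing non-ASCII characters need extra care when cleansing.
        "non_ascii": sum(1 for name in raw_names if not name.isascii()),
    }


def parse(raw: str) -> str:
    """Parsing: keep only the city part of values like 'München, Germany'."""
    return raw.split(",")[0]


def cleanse(text: str) -> str:
    """Cleansing: decompose accents, drop them, collapse whitespace, casefold."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split()).casefold()


def standardise(raw: str) -> Optional[str]:
    """Map a raw value to its canonical name, or None if it is not recognised."""
    return ALIASES.get(cleanse(parse(raw)))


if __name__ == "__main__":
    raw = ["München, Germany", "Muenchen", " MOSCOW ", "Москва", "Atlantis"]
    print(profile(raw))
    print([standardise(name) for name in raw])
```

Unrecognised values come back as `None` rather than being guessed, so they can be routed to manual review; keeping profiling separate from cleansing means the raw data can be inspected before any transformation is applied.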
The devil's in the data
Taking the analytical approach didn't show where people think the UK's best place is; it revealed where it actually is. Data doesn't lie, and AI won't disagree with the evidence. The organisers of a survey will likely make assumptions about what people think is important, whereas analytics begins with the data and builds the criteria from there.
Of course, this is not to say that a scientific approach using data and machine learning will always unearth the answer you're looking for. Often it will provide important insights that you can investigate further before making a final decision. For example, after looking more closely at the top locations in the world, you may prefer somewhere further down the list than West Perth. This is why our Paradise Configurator lets users choose which of the eight dimensions matter most to them, so it can generate a list of locations based on those key criteria.
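The idea of re-ranking by user-chosen criteria can be sketched as a weighted average over per-dimension scores. This is an illustrative assumption, not the Configurator's actual method: the function, the 0 to 1 score scale, the weighting scheme and the placeholder locations are all hypothetical, and only three dimension labels (borrowed from the characteristics mentioned for Hebden Bridge) stand in for the eight.

```python
from typing import Dict, List, Tuple

# Hypothetical scores, each already normalised to the range 0..1.
# Location names and values are placeholders, not Paradise Found data.
SCORES: Dict[str, Dict[str, float]] = {
    "location_a": {"families": 0.9, "infrastructure": 0.6, "culture": 0.4},
    "location_b": {"families": 0.5, "infrastructure": 0.9, "culture": 0.8},
}


def rank_locations(
    scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank locations by the weighted mean of the dimensions the user selected.

    Dimensions absent from `weights` are ignored; weights need not sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one dimension needs a positive weight")
    ranked = []
    for location, dims in scores.items():
        score = sum(dims[dim] * w for dim, w in weights.items()) / total
        ranked.append((location, score))
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # A user who cares only about family-friendliness.
    print(rank_locations(SCORES, {"families": 1.0}))
    # A user who weights infrastructure and culture equally.
    print(rank_locations(SCORES, {"infrastructure": 1.0, "culture": 1.0}))
```

With these placeholder scores, the first user sees `location_a` on top while the second sees `location_b`, which illustrates why a single overall winner may not be the best choice for every individual.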
Beyond simple curiosity, Paradise Found holds an important lesson for organisations. When data management, analytics and AI work together they can reveal unexpected and potentially game-changing truths that, when applied correctly, can overcome business challenges. This could involve finding the best location or most valuable customers, uncovering fraud or identifying cost-savings through more efficient processes. Insight is power and, with the right tools, any industry can benefit.