A colleague recently shared with me an amusing article regarding the phenomenon we know as “Big Data”.
As a technical person, I could empathise with many of the author’s observations, and I share the view that many of us feel we ought to be doing big data simply because everyone else seems to be doing it.
If we don’t start doing big data, the worry goes, we will somehow miss out. The trouble is that few people seem to know what the concept really means. Moreover, many of the technologies we have used for years continue to cope just fine, despite ever-increasing amounts of data.
So what is all the excitement about? What is the immediate relevance of big data to the insurance industry? And what problems can it solve?
What is Big Data?
In broad terms, big data refers to:
- The vast increase in the quantity of information being collected (and stored, and processed, and analysed)
- The heterogeneous nature of this information, which can come from online purchases, Facebook status updates, tweets, shared photos, and “check-ins”
- The demand to crunch these mountains of data as quickly and efficiently as possible.
Of these three points, it is the need to analyse heterogeneous and increasingly unstructured data that could be the big driver for the insurance industry.
The traditional systems and databases that our analytical teams at Willis use have, for the most part, adequately managed the continuous growth in data sizes. For example, Willis’ financial modelling system handles hundreds of billions of records in a good old-fashioned relational database. We have never felt the urge to refer to this system as a Big Data platform because it does what many other systems have done for many, many years. However, there are a number of new sources of data that require a radically different approach in order to analyse and interpret them appropriately. I’d like to explore some of these here.
Risk and Insurance Applications
My background comes from the strain of analytics we might refer to as risk analytics, by which I mean catastrophe modelling, financial modelling, spatial analysis or geospatial science. In the last twelve months I have presented at and attended various technology conferences and have observed first-hand the information explosion that is upon us.
Firstly, let’s take situation awareness – the need to know everything going on in a given area. In risk and insurance terms this could mean a kidnap-and-ransom event in the Middle East or the aftermath of a North Atlantic super-storm.
Crunching social media content is one of the primary applications of Big Data technology, with platforms such as Hadoop providing the horsepower to sift through the data very quickly. In practice, this means scanning millions of newsfeeds and social media feeds, such as Twitter, YouTube and Ushahidi, for posts containing certain content (e.g. #earthquake) within a specific “geo-fenced” area.
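To make the idea concrete, here is a minimal sketch of geo-fenced keyword filtering over a stream of posts. The record layout, bounding box and feed contents are purely illustrative assumptions, not any particular platform’s API; at scale the same filter would run in parallel across a cluster rather than in a single loop.

```python
# Illustrative sketch: filter a stream of geotagged posts down to those that
# mention a given tag AND fall inside a rectangular "geo-fence".
# The post structure (dicts with text/lat/lon) is a simplifying assumption.

def in_geofence(lat, lon, box):
    """Return True if (lat, lon) falls inside the bounding box
    (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def matching_posts(posts, tag, box):
    """Yield posts that mention `tag` and were geotagged inside `box`."""
    for post in posts:
        if tag in post["text"].lower() and in_geofence(post["lat"], post["lon"], box):
            yield post

# Example: look for "#earthquake" mentions inside a box around Tokyo.
tokyo_box = (35.0, 139.0, 36.0, 140.5)
feed = [
    {"text": "Strong shaking here #earthquake", "lat": 35.68, "lon": 139.69},
    {"text": "Lovely weather today", "lat": 35.68, "lon": 139.69},
    {"text": "#earthquake felt in Osaka", "lat": 34.69, "lon": 135.50},
]
hits = list(matching_posts(feed, "#earthquake", tokyo_box))
# Only the first post matches both the tag and the geo-fence.
```

The filter itself is trivial; the Big Data problem is applying it to millions of posts per minute, which is where distributed platforms earn their keep.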
What has been applied in other industries to analyse customers and their behaviours can be applied to solve very interesting problems in the insurance industry.
Tracking through Sensor Networks
Sensor networks are another interesting area that perhaps hasn’t yet been embraced in the insurance industry. The number of sensors and devices on the internet already exceeds the number of people.
Applications range from tracking people in risky territories via mobile and Iridium trackers, to tracking ships and planes, through to telemetry systems for cars and driving behaviour.
Tracking mobile assets produces vast volumes of data very quickly, and it takes very high-performance computing to analyse the data and support predictive analytics. Platforms such as Hadoop, and Massively Parallel Processing (MPP) databases such as Oracle Exadata, Teradata and Netezza, allow this data to be stored and provide the data platform required to undertake analysis and interpretation.
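The kind of computation such platforms parallelise can be sketched in plain Python. This is a single-process illustration of the map/reduce shape – group raw GPS position reports by vehicle, then fold each group into a summary (here, total distance travelled) – with an invented record layout; on Hadoop or an MPP database both stages would run across many machines.

```python
# Sketch of a map/reduce-style summary over raw position reports:
# (vehicle_id, timestamp, lat, lon) tuples -> total km travelled per vehicle.
# Record layout is an illustrative assumption.

from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def distance_per_vehicle(reports):
    """Reduce position reports to total distance travelled per vehicle."""
    tracks = defaultdict(list)
    for vid, ts, lat, lon in reports:        # "map" stage: group by vehicle
        tracks[vid].append((ts, lat, lon))
    totals = {}
    for vid, points in tracks.items():       # "reduce" stage: fold each group
        points.sort()                        # order by timestamp
        totals[vid] = sum(
            haversine_km(a[1], a[2], b[1], b[2])
            for a, b in zip(points, points[1:])
        )
    return totals
```

A fleet emitting a report every few seconds generates billions of such tuples per year; the logic stays this simple, but the grouping and folding must be distributed, which is precisely what MapReduce-style platforms provide.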
Satellite imagery and, more generally, spatial information ought to be classed as big data. In particular, remotely-sensed data has been largely under-utilised in the insurance industry, mainly due to the relatively high costs of capture, processing and delivery.
Technology is facilitating change for those that can find ways to harness and consume it. NASA is a particularly good example. Raw flood, fire and climatic data can be downloaded from NASA’s online archives. And the evolution in cloud and SaaS licensing is starting to pave the way for on-demand imagery which can be integrated into tools and products.
ESRI CEO Jack Dangermond recently explained that it will take 15 seconds to retrieve and process raw imagery from the DigitalGlobe archive. Doing this as a pay-as-you-go model illustrates very clearly the direction of travel – what was previously available to the few at high cost suddenly becomes a commodity to all.
It’s not all just technology
Perhaps one of the overlooked drivers for this trend towards more and better access to high-quality data is a general change in data policy. While the rise of social networking has been a powerful driver, the liberation of public domain data has also created a flood of new data to play with.
A recent example in the UK was Ordnance Survey’s OpenData initiative, which released property gazetteers and topographic data into the public domain. In the US, the FAA is changing its policies to make it easier for UAVs (a.k.a. “drones” to you and me) to be flown in US airspace. Such a change opens the door to capturing high-resolution imagery and LIDAR data quickly and cheaply – something that could revolutionise damage detection, change monitoring and claims management. In the past, the entry costs for data collection were very high. Now the cost can be as low as a few tens of thousands of dollars.
Additionally, in more prosperous times, governments allocated more funding to research and research institutions. In the current financial climate, government funding is more difficult to secure, and consequently research funding is increasingly coming from the private sector. This could explain why there is more commercial awareness of data availability and the opportunities it presents. Tighter coupling between research and the consumer might bring more products to market, packaged in a way that facilitates large-scale consumption.
Big data doesn’t answer the question “why”
In my mind, big data is a new term to describe the management and use of data in the modern era. Technology evolution allows us to do things that we could never have thought possible a decade ago but it is not the size of the data mountain that has driven innovation. Rather, it is the demands made by the consumer.
When confronted by the information tidal wave, technology helps us answer questions in a way that was previously not possible. “Who”, “what” and “when” are questions that we have always asked of our data, but these questions were previously constrained by scope, geography, time and domain. Internet search engines, for example, pay little respect to the constraints of the past and will return almost anything vaguely related to the question asked. As impressive as this may be, a lot of unnecessary information is returned, and we all know that the user needs to ask the right question to get the right answer.
Big data does not answer the question “why”. For this we turn to intelligent analysis of the data combined with the most powerful computational system on the planet – the human brain.