If you think your computer’s one terabyte hard drive is big, then think again. Steven Frazer explains how companies are harnessing unimaginable amounts of data to predict consumer behaviour and highlights the businesses set to cash in on this information revolution.

‘Big data’ is everywhere these days. Consultants, analysts and journalists are churning out seemingly endless reams of reports, analyses and column inches flagging the explosion of information that our digital world is spewing out. Whether it’s retailers monitoring the personal shopping habits of millions of customers in a flash or public health officials proposing grand schemes to map the DNA of entire populations in order to eradicate disease, big data has emerged as one of the buzzwords of the past year, and it’s here to stay.

With potential applications across a diverse spectrum of industries and disciplines, big data represents a goldmine for the companies which can monetise it. An IBM (IBM:NYSE) study found that in 2012, 63% of UK firms recognised the competitive advantage big data brings them, a huge jump from the 34% rate recorded in 2010. Similarly, a report from Gartner forecast big data will drive more than £20 billion ($34 billion) of IT spend in 2013.

Specialist UK companies with a true big data slant remain small in number at present but this will change as the opportunities become increasingly apparent to entrepreneurs and existing enterprise professionals. We flag a trio of big data plays: Sheffield-based WANdisco (WAND:AIM), whose US headquarters are in California’s Silicon Valley, Fusionex (FXI:AIM) and 1Spatial (SPA:AIM). The last two are exciting, if higher risk, alternative plays on the theme. But to truly understand the opportunities of these Aim enterprises, we first need to answer a fundamental question: what is big data?

Big data is data that exceeds the processing capacity of conventional database systems. The data is too voluminous, moves too fast, or doesn’t fit the strictures of existing database architectures. To gain value from this data, you must choose an alternative way to process it. Based on statistical analysis of past and present reality, big data attempts to predict and/or influence future events, behaviours and so on. It is a logical extension of data mining, which has been practised since the 1990s. The novelty lies in the fact that with ever larger data volumes and new data sources it is possible to achieve statistically more accurate results with greater commercial value.

Units of data box-out

Retail has been the obvious early beneficiary of big data in both the offline and online worlds. Emerging eCommerce (internet) and mCommerce (mobile) giants such as Amazon (AMZN:NDQ) have been applying big data techniques for several years, for example with books and music recommendations. The mCommerce opportunity in China is huge. WANdisco chief executive officer David Richards tells a semi-comical anecdote that perfectly captures the scope and scale of big data, whether you want it or not.

Last year the New York Times ran a story about an angry father who had stormed into his local Target (TGT:NYSE) homewares store and demanded to know why the firm had been sending his daughter deals for baby products. ‘She’s still in high school,’ barked the irate dad, before grilling the store manager. ‘Are you trying to encourage her to get pregnant?’ The manager was left with little choice but to issue a grovelling apology when, on checking the company’s mailing system, he noticed the girl had been sent baby clothes and nursery furniture mail-outs.

Yet just a few days later, when the store manager called the dad to apologise again, he found that the tables had turned. The father admitted he had just discovered that his daughter was indeed pregnant, and that Target’s data capture system had been bang on the money thanks to its coupon targeting based on detailed customer preferences. This might sound a bit creepy but it illustrates just how much data organisations are able to collect about their customers, information that can be leveraged to drive buying behaviour.

Predicting the future

Here in the UK, that mass of Clubcard data Tesco (TSCO) has on its shoppers and their preferences will soon be put to extra use, as the supermarket giant is mulling an advice system that gives buyers tips on healthy choices. The supermarket chain says its 50 million-odd global customer base has told it they’d like help in choosing healthy options and wants to see whether they would opt in for tailored recommendations to that end. What could turn out to be a really useful system for shoppers would, from the company’s perspective, also answer those critics who claim it prices less healthy options more favourably.

Medical applications also hold huge promise. In his book Physics of the Future, acclaimed physicist Michio Kaku imagines how gene sequencing might develop. Kaku foresees the rolling out of the Human Genome Project to the masses, where every individual’s DNA code is mapped and stored. Already an emerging branch of medical diagnostics, this would take ‘bioinformatics’ to the next level and firmly entrench it as a key big data trend. The idea, explains the scientist, would be for computers to rapidly scan and analyse the data for signs of hereditary illnesses and diseases, such as heart disease, Alzheimer’s, even cancer. This would act as, at the very least, an early warning sign and mean treatments could begin potentially years before symptoms might show up in diagnostic tests.

Of course, cost is a major barrier to mass application of such technology right now, but Moore’s Law means computing power is helping drive down those costs. Intel (INTC:NDQ) co-founder Gordon Moore observed that, over the history of computing hardware, the number of transistors on integrated circuits has doubled approximately every two years, a phenomenon which explains the rapid fall in the cost of equipment despite the exponential growth in processing power. Kaku notes that Stanford engineer Stephen Quake has managed to slash the cost of personal gene sequencing to around $50,000 per individual, and foresees prices plunging to around $1,000 over the next few years, the price point at which personal gene sequencing could be adopted on a mass scale. This implies computing enormous quantities of data on a national, even global scale, but just think of the potential healthcare cost savings of catching a multitude of killer illnesses early from nothing more than a simple blood or saliva swab.
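To put rough numbers on a two-year doubling, here is a minimal Python sketch. It assumes a perfectly regular doubling period, and the helper names `growth_factor` and `years_to_fall` are purely illustrative. It also assumes, for the sake of argument, that a cost halves on the same two-year schedule, which is a simplification rather than anything Moore or Kaku claims.

```python
import math

# Assumption: a clean doubling every two years ("approximately", per Moore).
DOUBLING_PERIOD_YEARS = 2.0


def growth_factor(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Multiple by which transistor counts grow over `years`."""
    return 2 ** (years / doubling_period)


def years_to_fall(start_cost, target_cost, halving_period=DOUBLING_PERIOD_YEARS):
    """Years for a cost to drop from start to target if it halves every period."""
    return halving_period * math.log2(start_cost / target_cost)


ten_year_growth = growth_factor(10)
sequencing_years = years_to_fall(50_000, 1_000)

print(f"Growth over 10 years: {ten_year_growth:.0f}x")
print(f"Years for $50,000 to reach $1,000 at that pace: {sequencing_years:.1f}")
```

On those assumptions a fall from $50,000 to $1,000 would take roughly 11 years, so a drop within ‘the next few years’ would imply sequencing costs falling considerably faster than a Moore’s Law pace.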

Big data box-out

Big data is here to stay

Most leading industry experts predict that big data is here to stay and speculate on the benefits to be gained. Gartner stated last year that big data will deliver transformational benefits and enable enterprises to outperform competitors by 20% in financial metrics. The market research group further estimates the total amount of unstructured data alone will increase to 650% of its current size by 2017.

Last week American venture capitalist Mary Meeker presented her annual Internet Trends Report in which the ‘Queen of the internet’ flagged ‘wearables, drivables, flyables and scannables’ as emerging technology device trends. It doesn’t take much imagination to work out what devices she is referring to with Apple’s (AAPL:NDQ) iWatches and Google’s (GOOG:NDQ) glasses being the ‘wearables’, connected (perhaps even self-driving) cars the ‘drivables’, mini-drones the ‘flyables’ and Quick Response (QR) codes, the matrix barcodes used to quickly relay information to smartphones, the ‘scannables’. Such devices will add enormous quantities of information to an already mind-boggling mountain of data.

Meeker also pointed out the still-growing internet user base, suggesting that 2.4 billion people were online at the end of 2012, just 34% of the world’s seven billion-odd souls. China alone has 564 million internet users, roughly nine times the UK’s entire population. Google remains the world’s biggest internet entity, followed by Microsoft (MSFT:NDQ) and Facebook (FB:NDQ) in second and third spots respectively. But we should not be surprised by the rapid rise up the table of Chinese businesses: conglomerate Tencent (0700:HK) at number nine and search engine Baidu (BIDU:NDQ) at number ten are among the fastest climbers.

Yet it is Meeker’s comments and slides on the world’s data explosion that really catch the eye. According to her presentation, the amount of ‘global digital information created and shared’ - photos, videos, tweets, documents etc - grew nine-fold in the five years to 2011 to nearly two zettabytes, and on to over 2.5 zettabytes last year. The presentation notes that data will pass eight zettabytes by 2015. That’s the equivalent of eight trillion gigabytes, or 125 billion 64 gigabyte iPods. Looked at another way, we are today churning out in 60 seconds the amount of data generated in a year during the 1990s. Staggering.
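Conversions at this scale are easy to get wrong by a few zeros, so it is worth working one through. The Python sketch below assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name `zettabytes_to_gigabytes` is an illustrative label, not part of any library.

```python
# Decimal (SI) byte units: an assumption, as the article does not say
# whether decimal or binary units are intended.
GIGABYTE = 10**9
ZETTABYTE = 10**21

IPOD_CAPACITY_GB = 64  # the 64 gigabyte iPod used as a yardstick


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE


forecast_gb = zettabytes_to_gigabytes(8)        # the 2015 forecast
ipods_needed = forecast_gb // IPOD_CAPACITY_GB  # 64 GB devices to hold it

print(f"8 zettabytes = {forecast_gb:,} gigabytes")
print(f"64 GB iPods needed: {ipods_needed:,}")
```

Using integer arithmetic throughout keeps the result exact: one zettabyte is a trillion gigabytes, so eight zettabytes would fill 125 billion 64 gigabyte devices.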

Clearly we are moving on from the days of counting data sets in megabytes, gigabytes, even terabytes. In future we’ll be talking in many multiples of zettabytes and perhaps even yottabytes (see ‘Units of data’ page 16). To store a yottabyte on terabyte-sized hard drives would require a million city block-sized data centres, enough to fill the states of Delaware and Rhode Island combined. That’s the equivalent of Cumbria and Kent added together. If 64 GB microSD cards (the most compact data storage medium available to the public as of early 2013) were used instead, the total volume of cards required would be approximately 2.5 million cubic metres, or the volume of the Great Pyramid of Giza.
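The microSD comparison can be reproduced with a few lines of Python. This is a sketch under stated assumptions: decimal units (a yottabyte as 10^24 bytes) and a card measuring 15mm by 11mm by 1mm, a dimension supplied here for illustration and not taken from the article.

```python
# Decimal (SI) byte units, assumed.
GIGABYTE = 10**9
TERABYTE = 10**12
YOTTABYTE = 10**24

CARD_CAPACITY_GB = 64
# Assumed microSD dimensions in millimetres (length, width, thickness).
CARD_DIMENSIONS_MM = (15, 11, 1)

# Number of one terabyte drives needed to hold a yottabyte.
drives_needed = YOTTABYTE // TERABYTE

# Number of 64 GB cards needed to hold a yottabyte.
cards_needed = YOTTABYTE // (CARD_CAPACITY_GB * GIGABYTE)

# Volume of one card in cubic millimetres, then the total in cubic metres.
length_mm, width_mm, thickness_mm = CARD_DIMENSIONS_MM
card_volume_mm3 = length_mm * width_mm * thickness_mm
total_volume_m3 = cards_needed * card_volume_mm3 / 10**9  # 1 m^3 = 10^9 mm^3

print(f"Terabyte drives needed: {drives_needed:,}")
print(f"64 GB cards needed: {cards_needed:,}")
print(f"Total card volume: {total_volume_m3:,.0f} cubic metres")
```

With those dimensions the total comes out a little under 2.6 million cubic metres, in line with the approximate figure quoted.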

The truth is that most of us simply cannot comprehend the scale of data represented by these terms. As Calla Knopman, author of a BeyeNETWORK study (part of the TechTarget (TTGT:NDQ) business intelligence organisation), explains: ‘we lack the framework to visualize this amount of data’. The author continues: ‘I cannot visualise a trillion dollars. I can understand money only within the framework that is part of my world. I can see a million dollars as equivalent to a home, granted a nice home in my area, but once I go much beyond this, it doesn’t really have meaning to me.’ This is true for many, if not all, of us, and underlines the famous quotation: a picture paints a thousand words.

China and USA bar charts

A discipline of many parts

Research group BeyeNETWORK splits big data into four component parts: metrics and measurement, event logs, social text media and social multi-media. According to its research, metrics and measurement emanate more or less directly from sensors, monitoring devices and less complex machines, including radio frequency identification (RFID) tags and readers. Already we have multitudes of sensors in modern planes, trains, cars and cameras, to name but a few, and, most interestingly, in smartphones. Such data is highly structured and reflects discrete events or characteristics of the physical world.

The second class, according to BeyeNETWORK’s Barry Devlin, is also machine-sourced and consists of computer event logs, tracking everything from processor usage and database transactions to website click-through rates and instant message distribution. ‘While machine-generated, data in both of these classes are proxies for events in the real world and, in business terms, those that record the results of human actions are of particular interest,’ says Devlin. ‘For example, measurements of speed, acceleration and braking forces from an automobile can be used to make inferences about driver behaviour and thus insurance risk.’
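Devlin’s motoring example can be made concrete with a toy Python sketch. Everything in it is hypothetical: the function name `harsh_braking_intervals`, the threshold of 3 metres per second squared and the sample trip are invented for illustration, and a real insurer’s model would draw on far more data than this.

```python
# Hypothetical threshold: a deceleration of 3 m/s^2 or more counts as harsh.
HARSH_BRAKING_MS2 = -3.0


def harsh_braking_intervals(speeds_ms, interval_s=1.0, threshold=HARSH_BRAKING_MS2):
    """Count sampling intervals in which the car slowed at or beyond the threshold.

    speeds_ms  -- speed readings in metres per second, taken at a fixed interval
    interval_s -- seconds between consecutive readings
    """
    count = 0
    for previous, current in zip(speeds_ms, speeds_ms[1:]):
        acceleration = (current - previous) / interval_s
        if acceleration <= threshold:
            count += 1
    return count


# Invented one-second speed readings for a short trip.
trip = [20.0, 20.5, 21.0, 17.0, 13.5, 13.0, 13.0, 9.0]

print(f"Harsh braking intervals: {harsh_braking_intervals(trip)}")
```

The point is simply that a raw stream of sensor readings becomes a behavioural signal once a rule is applied to it; a count like this could then feed into an assessment of risk.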

Classes three and four feature the social networks and private communications information directly created by humans. These are sub-divided into the more highly structured textual information and the less structured multimedia audio, image and video categories. Devlin says: ‘Statistical analysis of such information gives direct access to people’s opinions and reactions, allowing new methods of individual marketing and direct response to emerging opportunities or problems. Much of the current hype around big data comes from the insights into customer behaviour that internet giants like Google and eBay (EBAY:NDQ) and mega-retailers such as Walmart (WMT:NYSE) can obtain.’ However, in the longer term, machine-generated data is likely to be ‘the big game-changer,’ he adds, ‘simply because of the number of events recorded and communicated’.

The ‘big’ in big data is a function of the volume, variety, and velocity of the information that constitutes it. Yet big data gets a lot more interesting, certainly from a business and investment point of view, when you bring in a fourth ‘V’ - for value.

A report compiled by Skytree, a data analytics automation developer, points out that, while academia may have fuelled the big data movement, as technologies become more mainstream, the need to broaden the scope of work within an organisation’s existing IT talent pool becomes pressing. ‘Many businesses with big data initiatives are grossly under resourced due to the lack of formally trained data scientists in the market today,’ says the report. ‘Our survey shows that over 60% of the respondents who have a big data initiative expressed only zero to medium knowledge of expertise in machine learning and/or advanced analytics, and this is driving the market to deliver out-of-the-box solutions with intuitive interfaces, solutions that are easy to deploy, use, manage and support.’

Chipmaker Intel is a $120 billion giant that enjoys a near monopoly on semiconductors that go into PCs. But when it comes to the data underlying big companies like Facebook and Google, it says it wants to ‘return power to the people’. Intel Labs, the company’s research and development arm, is launching an initiative around what it calls the ‘data economy’ aimed at studying how consumers might capture more of the value of their personal information, like digital records of their location or work history. To make this possible, Intel is funding ‘hackathons’ to urge developers to explore novel uses of personal data. It has also paid for a rebellious-sounding website called ‘We the Data’, featuring raised fists and stories comparing Facebook to Exxon Mobil (XOM:NYSE).

Intel’s effort to stir a debate around personal data is just one example of how some companies, and perhaps society more broadly, are grappling with a basic economic asymmetry of the big data age - they’ve got the data, and we don’t. Internet firms like Google and Amazon are compiling valuable data about consumers on an unprecedented scale as people click around the web. But regulations and social standards haven’t kept up with the technical and economic shift, creating a widening gap between data haves and have-nots.

Consumers fighting back

‘As consumers, we have no right to know what companies know about us,’ says Hilary Mason, chief data scientist at Bit.ly, a social-media company in New York. ‘As companies, we have few restrictions on what we can do with this data. Even though people derive value, and companies derive value, it’s totally chaotic who has rights to what, and it’s making people uncomfortable.’ Earlier this year, for example, legislators in California attempted, without success, to introduce the first US law to give individuals a complete view of digital information companies hold on them. The ‘Right to Know’ bill would have given state citizens the power to demand a detailed report showing all the information firms like LinkedIn (LNKD:NYSE), Facebook or Google had stored on their servers and with whom they had shared it. That bill quickly got shelved but it does demonstrate growing public concern.

Digital consumers around the world are starting to tire of their personal data being collected across the internet, says Ovum. The global industry analysts paint a threatening scenario for the internet economy, as consumers seek out new tools that allow them to remain ‘invisible, untraceable and impossible to target by data means’. Ovum’s latest Consumer Insights Survey reveals that 68% of the internet population across 11 countries would select a ‘do-not-track’ feature if it was easily available, suggesting that a data black hole could soon open up under the online economy. This hardening of consumer attitudes, coupled with tightening regulation, could diminish personal data supply lines and have a considerable impact on targeted advertising, customer relationship management and analytics, all the big data hot spots.

‘Unfortunately, in the gold rush that is big data, taking the supply of little data, or personal data, for granted seems to be an accident waiting to happen,’ says Mark Little, principal analyst at Ovum. ‘However, consumers are being empowered with new tools and services to monitor, control, and secure their personal data as never before, and it seems they increasingly have the motivation to use them.’ Recent data privacy scandals such as WhatsApp’s use of address books, and the continuing issues over privacy and data use policies on Facebook and Google websites have fuelled consumers’ concerns over the protection of their personal data.

Clearly the big data concept is complex, but you dismiss it as a fad at your peril. Yes, big data faces challenges: a skills gap, limitations of the current IT infrastructure, new technological developments, even working out what data is worth collecting and what is not. But the opportunities are also huge for those companies able to spot genuine opportunities early, and the same goes for investors. Don’t expect it to be an easy ride, but five or 10 years from now, buying into big data enablers could look in retrospect to have been a very savvy move.

BIG DATA SWOT ANALYSIS

STRENGTHS

• Soaring data volumes

• Detailed analytics

• Customer engagement

WEAKNESSES

• Infrastructure limitations

• Extracting real value

• Education/skills gap

OPPORTUNITIES

• Increasingly connected world

• Tailored solutions/products

• Access to new markets

THREATS

• Privacy concerns

• Database hacking

• Ongoing development costs

WANdisco (WAND:AIM) 947.5p

STORY

WANdisco is largely about connectivity, supplying an ‘always-on’ data feed across the globe via clever server computer replication technology. The name stands for ‘Wide Area Network DIStributed Computing’ and it is based in Sheffield’s Electric Works digital campus, with offices in Silicon Valley, where chief executive officer David Richards lives. The company joined Aim in June last year raising £15 million at 180p, and the shares have captured the imagination of investors like nothing else since, rising more than 425%.
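The size of that rise is simple to check. The Python sketch below takes the 180p placing price and the 947.5p share price quoted at the top of this profile; the function name `percentage_rise` is just an illustrative label.

```python
def percentage_rise(start_price, end_price):
    """Percentage gain from start_price to end_price."""
    return (end_price - start_price) / start_price * 100


IPO_PRICE_P = 180.0      # placing price in pence, June 2012
CURRENT_PRICE_P = 947.5  # share price in pence quoted in this profile

rise = percentage_rise(IPO_PRICE_P, CURRENT_PRICE_P)
multiple = CURRENT_PRICE_P / IPO_PRICE_P

print(f"Rise since IPO: {rise:.1f}%")
print(f"Current price as a multiple of the IPO price: {multiple:.2f}x")
```

A rise of more than 425% means the shares trade at over five times the placing price, since the gain is measured on top of the original 180p.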

BIG DATA PITCH

WANdisco cut its teeth in innovation, supplying a collaboration software platform called Subversion that effectively allows engineering teams of big companies access to the same open source development systems and projects from anywhere in the world. Based on open source technology, it uses a ‘freemium’ model - it provides free access to clients and up-sells a variety of in-house built enterprise extras, including newly launched tools for the Hadoop 2.0 platform. Hadoop is the big data engine that Facebook, Amazon, Google and almost all the big technology firms rely on. WANdisco unveiled its first big data customer win in April.

WANDISCO - Comparison Line Chart (Rebased to first)

NUMBERS

At IPO the company was servicing five million users worldwide across 2,000 international companies, a base earned thanks to WANdisco’s claim of an internal pay-back inside of a year and return-on-investment of 150% for its users. While the £200 million company is not currently expected to make a profits breakthrough for several years, estimates out there at the moment are based solely on the firm’s existing products. There is the possibility of cashflow neutrality late next year, or early 2015. Richards sees WANdisco developing into a $1 billion revenue business in time, ambitious given last year’s $7.9 million of bookings. That will double this year, reckon Panmure Gordon analysts, hitting over $21 million in 2014. The broker has a discounted cashflow-based valuation of £12.35.

COMPANY SAYS (David Richards, CEO)

‘We are incredibly pleased with the progress achieved since our IPO in June 2012. Our revenues have almost doubled during this period in a fast growing market and our results today are ahead of expectation. Our major investments in talented people and complementary IP during 2012 have enabled us to launch new products for the high growth big data market, which we believe ideally position WANdisco for long term sustained growth. We have delivered a tremendous start to 2013, securing another record quarter in Q1 where we saw 96% growth in bookings. Some of the world’s largest, most innovative companies know they must get to grips with the big data challenge and - perhaps even more importantly - are prepared to trust WANdisco to help them do that.’

1Spatial (SPA:AIM) 9.0p

STORY

Headquartered in Cambridge and with offices in Australia, Ireland and Belgium, 1Spatial’s software is used to create, manage, analyse and display geospatial data. Customers include Unilever (ULVR), Unisys (UIS:NYSE), US Census, Ordnance Survey GB, the Brazilian Army and Ordnance Survey Ireland. Last month it appointed WANdisco chief executive officer David Richards (see page 18) as the non-executive deputy chairman, a boardroom coup.

BIG DATA PITCH

It recently tapped investors for £18 million to buy a 75% stake in Star-Apic, a Belgium-based provider of Geographic Information Systems software and solutions, specialising in land and infrastructure management. Considering that some of the world’s biggest agencies rely on 1Spatial, such as the Ordnance Survey in Britain, which uses its software to update its records every day, this looks a smart move that will significantly add to the company’s capability in the rapidly growing big data market. The fresh funds will also bolster sales and support in a new Middle East office, allow ongoing research and development and provide the firepower for more acquisitions down the line.

1SPATIAL - Comparison Line Chart (Rebased to first)

NUMBERS

It’s been said in the past that 1Spatial was too early to the big data space; it struggled simply because many customers and potential clients assumed the theme was for the future, not today. But they seem to be catching up and this bodes well for the £29 million cap. It chalked up revenues of £5.2 million last year to end January, but that figure was surpassed in this year’s first half alone, with sales jumping 146% to £6.4 million. This implies something around £12 million to £13 million for the full 12 months, to be confirmed when the finals are published, probably next month. No forecasts are available but that could mean a break into the black.

COMPANY SAYS (Marcus Hanke, CEO)

‘Given the recent success and interest 1Spatial has seen in new markets, such as utilities, and the need to map networks and gain accurate insight into assets, Star-Apic is seen as a key acquisition opportunity, providing many benefits to support the group’s expansion plans and goals.’

Fusionex (FXI:AIM) 271.0p

STORY

Fusionex is a Malaysia-based supplier of own IP enterprise software and implementation services that joined Aim in December after raising £12 million from investors at 150p. Its business intelligence product is a rules-based application which enables users to mine data from a variety of structured and semi-structured sources. The transactional engine is Fusionex’s own developed software, customised for a range of clients, notably in banking and insurance.

BIG DATA PITCH

The firm has yet to launch its big data dedicated suite, named Giant, but it’s likely to be rolled out later this year. With its strong background in business intelligence, that’s an easy space to start selling into and Fusionex has already laid out a migration path for existing clients.

FUSIONEX INTERNATIONAL - Comparison Line Chart (Rebased to first)

NUMBERS

Fusionex has been profitable since 2007 and in the 12 months to 30 September 2012 generated revenues of $10 million, on which it made a $4.2 million post-tax profit. Most recent half year numbers beat expectations with revenues hitting $6 million and earnings before interest, tax, depreciation and amortisation rising 14% to $2.4 million. Importantly, 66% of sales are recurring.

COMPANY SAYS (Ivan Teh, CEO)

‘We have started the first six months of the current financial year well supported by a number of key customer wins and continued progress on expanding our geographical footprint. Our business remains well positioned to take further advantage of the significant opportunities that exist in the business intelligence marketplace and most notably in the field of big data.’


