Articles Marketmedia

Big Data and Big Iron

Written by Terry Flanagan | Dec 13, 2012 9:16:32 PM

Data volumes in capital markets have grown explosively in recent years, as traditional sources, namely stock data, churn out exponentially higher volumes alongside new forms of structured and unstructured data.

“Much of this data must be centralized and stored in ways that make it instantly available,” said Patrick Mullevey, executive director of systems integration at Gravitas, which provides IT services to the alternative investment industry. “This has made the data center a mission-critical component of capital markets.”


This explosion of ‘big data’ has affected all industries, but capital markets have their own unique set of issues, such as the need to capture time-series data and merge it with real-time event processing systems.

“This reflects expanded use of data in areas such as risk and trading analytics, regulation, compliance, reference and market data management, as well as decades of archival data for consumption by powerful computing systems with processing and memory capabilities unavailable less than a decade ago,” said Mullevey.

Open Source Databases
Capital markets firms are experimenting with open source database technology capable of capturing, storing and analyzing enormous amounts of data.

“The two major issues are the cost and size of the data,” said Philip Enness, director of markets infrastructure at technology firm IBM. “This is where the cloud opportunity comes in. The challenge will be offering a flexible infrastructure that supports a broad set of analytics while minimizing the duplication of data, which requires maximizing access to that data.”

Storage systems, coupled with low-latency messaging transport technologies and supporting systems, will enable build-outs of that shared infrastructure, said Enness.

“To support this model, market data providers will have to implement data centers that serve as hubs for firms that want to subscribe to services as opposed to building their own,” he said. “The geographical placement of these hub data centers and the interconnecting networks will require a delicate balance to minimize latency and maximize client value while providing services cost effectively.”
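To make the latency side of that balance concrete, here is a minimal sketch, assuming fiber propagation delay dominates and that the goal is to minimize the worst-case one-way latency to any client. The candidate hubs, client sites and route lengths are hypothetical, and the refractive index of 1.47 is a typical value for optical fiber rather than a figure from the article.

```python
# Toy hub-placement calculation. All site names and route lengths are
# hypothetical; real fiber routes are longer than straight-line distances.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # assumed typical value for silica fiber


def one_way_latency_ms(fiber_km: float) -> float:
    """Propagation delay over a fiber route, ignoring switching and queuing.

    Light in fiber travels at roughly c / 1.47, so each kilometer adds
    about 4.9 microseconds of one-way delay.
    """
    return fiber_km * FIBER_REFRACTIVE_INDEX / SPEED_OF_LIGHT_KM_PER_S * 1000.0


def best_hub(route_km: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Pick the candidate hub whose worst-case client latency is smallest."""
    worst = {
        hub: max(one_way_latency_ms(km) for km in clients.values())
        for hub, clients in route_km.items()
    }
    hub = min(worst, key=worst.get)
    return hub, worst[hub]


# Hypothetical fiber route lengths (km) from each candidate hub to each client.
routes = {
    "hub_a": {"client_1": 50.0, "client_2": 1200.0, "client_3": 300.0},
    "hub_b": {"client_1": 700.0, "client_2": 600.0, "client_3": 400.0},
}

hub, worst_ms = best_hub(routes)
print(f"{hub}: worst-case one-way latency {worst_ms:.2f} ms")
```

Here `hub_b` wins because its farthest client is 700 km of fiber away, against 1,200 km for `hub_a`. Minimizing the worst case is only one possible objective; weighting clients by value, or minimizing average latency, would better capture the "client value" side of the trade-off.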

Data centers are playing a key role in the distribution, processing and consumption of market data.

“They are the central points for the relevant infrastructure to process market data and are strategically positioned near fiber backbones for the onward distribution of that data,” said Scott Caudell, vice-president of IT infrastructure at Interactive Data, a provider of financial market data. “They also can be proximate to the raw feed sources themselves, which reduces points of failure, lowers cost and improves quality.”

Open source data systems such as Hadoop and Cassandra are well suited to capital markets applications because they can process and store a high-volume real-time event stream and trigger actions based on it, perform analytics on historical data, and push updated models directly into the application.
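The store, trigger and update cycle described above can be sketched in a few lines of Python. This is a toy, in-memory illustration and does not use the Hadoop or Cassandra APIs: the list `history` stands in for durable storage, a rolling mean stands in for an analytic model, and the `Tick`, `StreamProcessor` and `historical_mean` names, the five-tick window and the 2% alert threshold are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Tick:
    symbol: str
    price: float


class StreamProcessor:
    """Toy single-symbol event processor: store, trigger, update model."""

    def __init__(self, window: int = 5, threshold: float = 0.02) -> None:
        self.threshold = threshold                        # hypothetical 2% alert level
        self.history: list[Tick] = []                     # stand-in for durable storage
        self.recent: deque[float] = deque(maxlen=window)  # rolling-mean model state
        self.alerts: list[str] = []

    def on_tick(self, tick: Tick) -> None:
        self.history.append(tick)  # store every event for later historical analytics
        # Trigger: once the window is full, compare the new price with the model.
        if len(self.recent) == self.recent.maxlen:
            mean = sum(self.recent) / len(self.recent)
            deviation = (tick.price - mean) / mean
            if abs(deviation) > self.threshold:
                self.alerts.append(f"{tick.symbol} {deviation:+.1%} vs rolling mean")
        # Update the model in place; the deque drops the oldest price automatically.
        self.recent.append(tick.price)


def historical_mean(history: list[Tick]) -> float:
    """Batch-style analytics over everything stored so far."""
    return sum(t.price for t in history) / len(history)


proc = StreamProcessor(window=5, threshold=0.02)
for price in [100.0, 100.5, 99.8, 100.2, 100.0, 103.0, 100.1]:
    proc.on_tick(Tick("XYZ", price))
print(proc.alerts)
print(round(historical_mean(proc.history), 2))
```

With these prices, only the jump to 103.0 fires an alert, about 2.9% above the rolling mean of 100.1. Checking each new event against the model before folding it in keeps an outlier from diluting the very baseline it is measured against.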

“Moving forward, as direct normalized [feeds] and FPGA technology gain prevalence, we'll see the same dynamic[, with] distributed [processing] and hardware acceleration becoming commonplace,” said Caudell.

Cassandra is an open source distributed database management system designed to store large amounts of data and provide very low-latency access to it.

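The Cassandra data model is designed for distributed data on a large scale. In a relational database, data is stored in tables, and the tables that make up an application are typically related to one another and combined with joins at query time. Cassandra does not support joins, so data is usually denormalized and organized around the queries an application needs to run.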

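Cassandra is usually described as column-oriented, although it is more precisely a wide-column store: data is grouped into rows by a row key, and each row can hold a large, variable set of columns kept sorted by column name. This layout suits time-series workloads such as tick histories, where new values arrive as additional columns and a contiguous range of them can be read back in a single query.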
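The wide-row layout can be illustrated with a small in-memory model. This is not the Cassandra API or its storage engine; it only mimics the idea of a row key mapping to columns kept sorted by name. The `WideRowStore` class, the symbol-and-day row key, and the use of seconds since midnight as column names are hypothetical choices for illustration.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowStore:
    """Toy model of a wide-column layout: row key -> columns sorted by name."""

    def __init__(self) -> None:
        self.names: dict[str, list[int]] = {}           # row key -> sorted column names
        self.values: dict[str, dict[int, float]] = {}   # row key -> column name -> value

    def insert(self, row_key: str, column: int, value: float) -> None:
        cols = self.values.setdefault(row_key, {})
        if column not in cols:  # new column: place its name in sorted position
            insort(self.names.setdefault(row_key, []), column)
        cols[column] = value    # existing column: value is simply overwritten

    def slice(self, row_key: str, start: int, end: int) -> list[tuple[int, float]]:
        """Return columns with start <= name <= end, in column-name order."""
        names = self.names.get(row_key, [])
        lo, hi = bisect_left(names, start), bisect_right(names, end)
        return [(name, self.values[row_key][name]) for name in names[lo:hi]]


store = WideRowStore()
# Hypothetical row key: one row per symbol per trading day. Column names are
# seconds since midnight; values are trade prices.
for ts, px in [(34200, 100.0), (34260, 100.4), (34320, 100.1), (34380, 99.9)]:
    store.insert("XYZ:2012-12-13", ts, px)
store.insert("XYZ:2012-12-13", 34260, 100.5)  # same column name: value replaced
print(store.slice("XYZ:2012-12-13", 34250, 34330))
```

Keeping column names sorted is what turns a time-range query into a binary search plus a contiguous read, instead of a scan over the whole row. Writing to an existing column name replaces its value, which mirrors Cassandra's upsert-style writes.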

Hadoop is an open source framework that allows for distributed processing of large data sets across clusters of computers. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
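Hadoop's original processing model is MapReduce: map tasks run in parallel over the blocks of data stored on each node, the framework groups their output by key, and reduce tasks combine the values for each key. The sketch below simulates those three phases in plain Python on one machine to show the data flow; it is not Hadoop code, and the trade-record format, the three-way split and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_task(split: list[str]) -> list[tuple[str, int]]:
    """Map phase: parse each 'SYMBOL,PRICE,SIZE' record and emit (symbol, size)."""
    pairs = []
    for line in split:
        symbol, _price, size = line.split(",")
        pairs.append((symbol, int(size)))
    return pairs


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value by key across all map outputs."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups


def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values for one key, here by summing volume."""
    return key, sum(values)


# Hypothetical trade records, split into blocks as if stored on three nodes.
splits = [
    ["XYZ,100.0,200", "ABC,25.1,500"],
    ["XYZ,100.2,300", "DEF,9.8,1000"],
    ["ABC,25.0,100", "XYZ,99.9,50"],
]
mapped = [map_task(s) for s in splits]  # on a cluster, these run in parallel
volumes = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
print(volumes)
```

On a real cluster, the framework schedules each map task on or near the node that holds its block, which is how Hadoop pairs local computation with local storage and avoids shipping raw data across the network.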

“Advanced analytics technologies combined with powerful analytic processors will boost the ability of exchanges and other market data owners and providers to add value to the raw market data by turning it into information that is useful to the investor communities,” said Enness of IBM. “Ultra-fast and large volume data load and analysis capabilities will indeed be the tools to turn data into actionable insights.”

Big Data Market
The big data market, meanwhile, is projected to increase to $53 billion in 2016, up from $5.1 billion in 2012, according to market research firm Wikibon.
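Those two figures imply a very steep growth rate. A quick compound annual growth rate (CAGR) calculation, a derived figure rather than one attributed to Wikibon, puts the forecast at close to 80% growth a year over the four years.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by growth from start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Wikibon's forecast as reported: $5.1 billion in 2012 to $53 billion in 2016.
rate = cagr(5.1, 53.0, 2016 - 2012)
print(f"Implied compound annual growth: {rate:.1%}")
```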

The market, as defined by Wikibon, includes Hadoop software and related hardware; next-generation data warehouses and related hardware; analytic platforms and applications; business intelligence, data mining and data visualization platforms; and data integration platforms.

“Organizations are increasingly mining massive sets of unstructured data, but extracting usable information is becoming more difficult,” said John Gilmartin, vice-president of marketing at Coraid, a provider of cloud-based storage.

“This is where big data analytics comes in and hence why there is a rise in a new set of tools pioneered by Google and Yahoo. While adoption is currently driven by application and business teams, IT administrators will soon need to figure out how to incorporate big data analytics into their operations to stay ahead.”