Nº 1 2014 > Big Data
Big data, big deal, big challenge
The big data phenomenon — emerging technological capabilities for solving huge complex tasks — has been hailed by industry analysts, business strategists and marketing pros as a new frontier for innovation, competition and productivity. The 11th World Telecommunication/ICT Indicators Symposium (WTIS), which took place in Mexico City from 4 to 6 December 2013, saw big data as having tremendous potential for fostering development by providing real-time information at lower cost than data from other sources.
Practically everything that deals with data or business intelligence can be rebranded as “big data”, and the hype looks set to match the stir created by cloud computing, where existing offerings were rebranded as “cloud-enabled” and whole organizations moved overnight to the “cloud”.
Beyond the buzz, big data capabilities motivate researchers from fields as diverse as physics, computer science, genomics and economics. The new analytical power is seen as an opportunity to invent and investigate new methods and algorithms capable of detecting useful patterns or correlations present in big chunks of data. Analysing more data in shorter spaces of time can lead to better, faster decisions in areas spanning finance, health and research.
The Technology Watch report Big Data: Big today, normal tomorrow, written by Martin Adolph of the ITU Telecommunication Standardization Bureau and released in November 2013, looks at different examples and applications associated with the big data paradigm, describes their characteristics, identifies the commonalities among them, and highlights some of the technologies enabling the upsurge of big data. As with many emerging technologies, several challenges need to be identified and addressed. Global standardization can contribute to addressing these challenges and will help companies enter new markets, reduce costs and increase efficiency.
A food scandal in early 2013, which rocked several European countries, illustrates the power of big data in resolving a crisis. The scandal involved a network of fraud, mislabelling and sub-standard supply chain management. It was not the first food scandal, and will surely not be the last. For restaurant chains with thousands of branches and hundreds of suppliers worldwide, monitoring the origin and quality of each ingredient is nearly impossible. Now, however, data availability and sophisticated real-time analytics offer a way of discovering irregularities early (or, better still, preventing them). Through data analysis, the chain of events behind the scandal was uncovered and the crisis resolved. The incident highlights the promise and challenges of data management for multiparty, multidimensional, international systems.
Billions of individual pieces of data are amassed each day, from sources that include supplier data, delivery slips, restaurant locations, employment records, DNA records, data from Interpol’s database of international criminals, and also customer complaints and user-generated content such as location check-ins, as well as messages, photos and videos on social media sites. Gleaning insight and knowledge from this mass of disparate data requires identifying relevant data items and detecting patterns among them in order to draw accurate, comprehensive and actionable conclusions.
This is the big data phenomenon, described in the Big Data: Big today, normal tomorrow report, from which the present article is drawn. The report, along with other Technology Watch reports, can be found at http://itu.int/techwatch.
Big data — easier to recognize than define?
While no precise definition exists, four common characteristics help to describe big data — volume, velocity, variety and veracity.
Volume may be the most compelling attraction of big data analytics. In the healthcare field, for example, evaluating the effectiveness of a treatment on a population-wide basis yields a far more reliable result than the same analysis for a dataset of 100 patients. Although the adjective “big” is not quantified, it is estimated that 90 per cent of the data in the world today has been created in the past two years, with machines and humans both contributing to data growth.
The velocity of decision-making — the time taken from data input to decision output — is a critical factor. Emerging technologies are capable of processing vast volumes of data in real time or near real time. This increases the flexibility with which organizations can respond to changes in the market, shifting customer preferences or evidence of fraud. Championed by high-frequency traders in the financial services market, velocity and tight feedback loops are a key part of gaining competitive advantage in a number of industries.
Variety is the messy reality of big data. Text, sensor data, call records, maps, audio, image, video, click streams, log files and more need time and effort to be shaped into a form fit for processing and analysis. The capacity of a system to analyse data from a variety of sources is crucial, as this can yield insights not achievable by consulting one type of data in isolation.
An ability to assess the veracity of data is essential in establishing a basis for important decisions. Big datasets reflect uncertainties attributable to inconsistency, incompleteness, ambiguities and latency in data items. This varying level of uncertainty must be factored into the decision-making process. A system therefore needs capabilities to distinguish, evaluate, weigh or rank different datasets in order to maintain veracity.
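One way to factor varying uncertainty into a decision is to weight each data source by its reliability. The minimal sketch below is not taken from the report; the sensor readings and reliability weights are hypothetical, chosen only to show the principle of a weighted consensus.

```python
# Illustrative sketch only: combining conflicting measurements of the
# same quantity from sources of varying reliability. The readings and
# reliability weights below are hypothetical.

def weighted_consensus(measurements):
    """Each measurement is a (value, reliability_weight) pair,
    with weights in [0, 1]."""
    total_weight = sum(w for _, w in measurements)
    if total_weight == 0:
        raise ValueError("no usable measurements")
    return sum(v * w for v, w in measurements) / total_weight

# Three hypothetical sensors report a patient's heart rate;
# the noisy third sensor is heavily down-weighted.
readings = [(72.0, 0.9), (74.0, 0.8), (95.0, 0.1)]
print(round(weighted_consensus(readings), 1))  # → 74.2
```

In practice, such weights would be derived from a source's provenance, history and consistency with other data, rather than assigned by hand.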
Big data in health, science, transport…
Data are critical in the healthcare industry to document illnesses and treatment given to individual patients. With medical image archives growing by 20 to 40 per cent annually, an average hospital is expected, by 2015, to be generating 665 terabytes of medical data each year. Applications of big data analytics in the healthcare domain are as numerous as they are multifaceted, both in research and practice. For example, remote patient monitoring systems for chronically ill patients can reduce physician appointments, emergency department visits and in-hospital bed days, improve the targeting of care, and prevent some long-term health complications.
Analysing large datasets of patient characteristics, outcomes of treatments and their cost can help identify the most clinically effective and cost-efficient treatments to apply. Furthermore, analysing global disease patterns to identify trends at an early stage is mission critical, not only in managing public health crises, but also in allowing the pharmaceutical and medical sectors to model future demand for their products as a basis for deciding on research and development investment.
An example of big data, par excellence, is the effort being put into solving the mysteries of the universe. Located just a few minutes' drive from ITU headquarters, the European Organization for Nuclear Research (CERN) is host to one of the biggest experiments in the world. For more than 50 years, CERN has been tackling the growing torrents of data produced by its experiments studying fundamental particles and the forces by which they interact. The Large Hadron Collider (LHC) consists of a 27-kilometre ring of superconducting magnets with a number of accelerating structures to boost the energy of the particles along the way. The detector comprises 150 million sensors and acts as a 3D camera, taking pictures of particle collision events up to 40 million times per second. Responding to the need to store, distribute and analyse as much as 30 petabytes of data produced each year, the Worldwide LHC Computing Grid was established in 2002 to provide a global distributed network of computer centres. Much of CERN's data are unstructured and indicate only that something has happened. Scientists around the world now collaborate to structure, reconstruct and analyse what has happened and why.
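The gap between what the detector sees and what can be stored is worth a rough back-of-envelope calculation using the figures above. The assumption of roughly 1 megabyte per recorded event is introduced here purely for illustration and is not from the report.

```python
# Back-of-envelope calculation, not from the report: how much of what
# the LHC detectors "see" can actually be stored. The figure of roughly
# 1 MB per recorded event is an assumption for illustration only.

SECONDS_PER_YEAR = 365 * 24 * 3600
collisions_per_second = 40e6      # snapshots per second (from the text)
stored_bytes_per_year = 30e15     # 30 petabytes per year (from the text)

stored_bytes_per_second = stored_bytes_per_year / SECONDS_PER_YEAR
print(f"{stored_bytes_per_second / 1e9:.2f} GB stored per second")

# Under the ~1 MB per event assumption, the fraction of collisions kept:
events_kept_per_second = stored_bytes_per_second / 1e6
print(f"about 1 in {collisions_per_second / events_kept_per_second:,.0f}"
      " collision events can be kept")
```

Even at roughly a gigabyte per second of storage, only a tiny fraction of collisions can be retained, which is why aggressive filtering close to the detector is essential.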
Mobile phones leave traces that can be exploited for transport modelling. This is of particular interest where other transport-related data are scarce. For example, to support transport planning to reduce traffic congestion in Abidjan, Côte d’Ivoire, telecommunication provider Orange offered access to anonymized datasets containing 2.5 billion records of local calls and text messages exchanged between 5 million users over a period of 5 months. Similarly, Korea Telecom helped the City of Seoul determine optimal night bus routes. As a result, seven more night-bus routes have been added to the city’s original plan. A similar analysis of Swisscom data in Geneva, Switzerland, is shown in the image below.
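As a toy illustration of how such records support transport planning, the sketch below reduces anonymized call records to a crude origin-destination count. The user identifiers, hours and zones are entirely hypothetical and do not reflect the actual Orange or Korea Telecom datasets.

```python
from collections import Counter

# Hypothetical anonymized call records: (user_id, hour, cell_zone).
# Invented for illustration; not the Orange or Korea Telecom data.
records = [
    ("u1", 8, "A"), ("u1", 18, "B"),
    ("u2", 8, "A"), ("u2", 19, "C"),
    ("u3", 9, "B"), ("u3", 18, "A"),
]

def od_matrix(records):
    """Crude origin-destination counts: the first and last zone in
    which each user was observed during the day."""
    first, last = {}, {}
    for user, hour, zone in sorted(records, key=lambda r: r[1]):
        first.setdefault(user, zone)
        last[user] = zone
    return Counter((first[u], last[u]) for u in first)

print(od_matrix(records))  # each of A→B, A→C, B→A observed once
```

Aggregated over millions of users, counts of this kind reveal dominant travel flows and times, which is the kind of signal the Seoul night-bus analysis exploited.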
On a larger geographical scale, mobile phone data contribute to the analysis of migration patterns and are invaluable in crisis management. Launched by the Executive Office of the United Nations Secretary-General, the Global Pulse initiative responds to the need for more timely information to track and monitor the impacts of global and local socio-economic crises.
In the telecommunication field, network analytics help providers to optimize routing and network assets and to predict faults and bottlenecks before they cause any harm. Combined real-time network insights and complete customer profiles add value, enabling tailor-made offerings that increase revenue opportunities, while attracting and retaining customers. Network analytics are also an important means to detect and mitigate denial of service attacks.
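A very simple form of the anomaly detection mentioned here can be sketched as a threshold test against historical traffic statistics. The traffic figures below are invented for illustration; production systems use far richer models than a single standard-deviation test.

```python
from statistics import mean, stdev

# Hypothetical per-minute packet counts for one network link,
# invented for illustration.
baseline = [1000, 1020, 980, 1010, 990, 1005, 995, 1015]

def is_anomalous(sample, history, k=3.0):
    """Flag a traffic sample more than k standard deviations above
    the historical mean: a crude volumetric attack signal."""
    mu, sigma = mean(history), stdev(history)
    return sample > mu + k * sigma

print(is_anomalous(5000, baseline))  # sudden spike → True
print(is_anomalous(1012, baseline))  # normal fluctuation → False
```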
Data protection, privacy and cybersecurity
The two basic principles of data protection — data avoidance and data minimization — stand in stark contrast to the power of big data to track people's movements, behaviours and preferences, and to predict an individual's behaviour with unprecedented accuracy, often without the individual's consent. For instance, electronic health records and real-time self-quantification (people wearing sensors to monitor, say, their fitness level or sleep pattern) may be an enormous step forward in streamlining drug prescriptions or diet and fitness plans. But many consumers view such data as highly sensitive.
Large sets of mobile call records, even when anonymized and stripped of all personal information, can be used to create fingerprints of users, which when combined with other data such as geo-located tweets or check-ins could reveal the individual’s identity.
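The re-identification risk can be illustrated with a small sketch: given a few externally observed (time, place) points, one checks which "anonymous" traces contain them all. All user identifiers and points below are hypothetical, and a handful of points is often enough to single out one person.

```python
# Illustrative sketch of the re-identification risk described above.
# Each trace is a set of (hour, cell_zone) points; all users and
# points are hypothetical.

traces = {
    "user_a": {(8, "A"), (12, "C"), (18, "B")},
    "user_b": {(8, "A"), (13, "D"), (19, "B")},
    "user_c": {(9, "E"), (12, "C"), (20, "F")},
}

def matching_users(observed_points, traces):
    """Users whose 'anonymous' trace contains every point observed
    from an outside source, e.g. geo-located tweets or check-ins."""
    return [u for u, t in traces.items() if observed_points <= t]

# Two externally observed points already single out one individual.
print(matching_users({(8, "A"), (18, "B")}, traces))  # → ['user_a']
```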
As the amount of personal data and global digital information grows, so does the number of entities accessing and using this information. Assurances must be given that personal data will be used appropriately, in the context of the intended uses and abiding by the relevant laws.
A closely related concern is cybersecurity. Threats and risks need to be reassessed in view of big data, adapting technical solutions in response. The time is ripe to review information security policies, privacy guidelines, and data protection acts.
Important sources of new data, such as information from mobile-cellular networks and, in particular, social networking services, could complement official statistics. The World Telecommunication/ICT Indicators Symposium (WTIS) nevertheless pointed to a number of confidentiality and privacy concerns related to the use of big data. WTIS encouraged regulatory authorities to explore the development of guidelines on how big data could be produced, exploited and stored. National statistical offices, in cooperation with other relevant agencies, should look into the opportunities offered by big data, while addressing current challenges in terms of big data quality, veracity and privacy within the framework of the fundamental principles of official statistics.
Achieving the big data goals set out by business and consumers will require the interworking of multiple systems and technologies.
The standards community has launched several initiatives and working groups on big data. In 2012, the Cloud Security Alliance established a big data working group with the aim of identifying scalable techniques for data-centric security and privacy problems. The group’s investigation is expected to clarify best practices for security and privacy in big data, and also to guide industry and government in the adoption of those best practices.
The United States National Institute of Standards and Technology (NIST) kicked off its big data activities with a workshop in June 2012, and in 2013 launched a public working group. The NIST working group intends to support secure and effective adoption of big data by developing consensus on definitions, taxonomies, secure reference architectures and a technology road map for big data analytic techniques and technology infrastructure. ISO/IEC JTC1's data management and interchange standards committee (SC32) has initiated a study on next-generation analytics and big data. The World Wide Web Consortium (W3C) has created several community groups on different aspects of big data.
ITU’s Telecommunication Standardization Sector (ITU–T) is currently addressing individual infrastructure requirements, noting existing work in domains including optical transport and access networks, future network capabilities (such as software-defined networking), multimedia and security.
ITU–T is studying the relationship between cloud computing and big data in terms of requirements and capabilities. Recommendation ITU–T X.1600 on "Security framework for cloud computing" matches security threats with mitigation techniques, and the future standardization of the described threat-mitigation techniques is expected to incorporate big data use cases. A previous report in the Technology Watch series advocated the use of privacy-enhancing technologies as a means to implement the "privacy by design" principle, which is of great interest to big data applications.
With a worldwide membership comprising governments, telecommunication operators, equipment manufacturers, and academia and research institutes, ITU is ideally positioned to review current practices in the use of aggregated datasets and to develop related technical standards and policies.
ITU has been accelerating its efforts to increase interoperability in electronic health applications in areas such as the exchange of health data and the design of personal health systems. With the boom in personal and wearable “connected health” and fitness products in mind, standardization could enable a smart wristband, say, to exchange data securely with a smartwatch of a different make (uninhibited by vendor or manufacturer boundaries). Big data analytics would then have the ability to integrate the data streams collected from different devices and pinpoint results that could trigger actions beneficial to health.
Having doubled the compression efficiency of its Emmy-winning predecessor, Recommendation ITU–T H.265 is on track to become the web's leading video codec. Considering multimedia's significant share of total Internet traffic, the automatic analysis of digital image, audio and video data is an area to be monitored closely from the big data perspective.
The open data movement is maturing, in emerging economies as well as in highly industrialized countries. With a number of interoperability and policy challenges to be faced, it is opportune for ITU to embrace and advance the cause of open data — in partnership with the many open data champions within and outside its membership. From the standards angle, this might include the development of requirements for data reporting, and mechanisms for the publication, distribution and discovery of datasets.
More work is needed to fully understand the potential of big data, and ITU should further examine the challenges and opportunities related to big data in the ICT sector.