
Indian Official Statistics

Digital Transformation to Honour Citizens

R B Barman (barmanrb@gmail.com) is Chairman, National Statistical Commission.

Official statistics is a public good that informs, supports, and sustains democracy and advances socio-economic development. This article analyses the Indian statistical system and suggests methods for modernising it, using information and communications technology to improve the quality, credibility, coherence, and timeliness of data. An integrated, decentralised information system populated with granular data will enable data to be carried flexibly wherever required, queried, and analysed in business contexts at all levels of governance for deeper insight. Such a system will help the government inform stakeholders about the economy and honour our commitment to the United Nations resolution of 2014.

The difficulty lies not so much in developing new ideas as in escaping from old ones. —John M Keynes

By providing quality information in the public domain, official statistics helps in measuring progress, analysing the interplay of market forces, and shedding light on business opportunities in the changing socio-economic, technological, and political environment. The government owns the system to fulfil its commitment to produce statistics as a public good, make informed decisions in formulating policy, and evaluate performance. As official statistics informs people about the state of progress of a country in various spheres, it has to follow a sound methodology and be authentic, dependable, trustworthy, transparent, and timely.

In a democracy, a government is a contract between those who govern and those who are governed. Official statistics is expected to give concrete empirical evidence about governance.

Governance, defined as the capacity of a country’s institutional matrix to implement and enforce public policies and to improve private sector coordination, affects the incentives of politicians, bureaucrats, and private economic agents alike and determines the terms of exchange among citizens and between them and government officials. (Ahrens 2002)

Governance is distinct from government. This separation requires the statistical system to be independent so that it can provide impartial, verifiable empirical evidence on the quality of governance in its various dimensions.

India is a vast country, united by collaborative federalism. It inherited a stately history along with a fractured society, and has seen great preachers of humanity along with the worst forms of oppression, which muzzled the voice of the poor and deprived. As Smith (2004) observed, “The impoverishment of India is a classic example of plunder-by-trade [emphasis in the original] backed by military might.” Freedom, and a democratic system of governance, provided people—for the first time—an opportunity to participate in socio-economic development and redeem themselves from centuries of subjugation and deprivation. Jawaharlal Nehru, the first Prime Minister of independent India, said in his memorable speech on the eve of independence on 15 August 1947,

Long years ago we made a tryst with destiny, and now the time comes when we shall redeem our pledge, not wholly, or in full measure, but very substantially .... We end today a period of misfortunes and India discovers herself again. The achievement we celebrate today is but a step, an opening of opportunity to the greater triumphs and achievements that await us.

India has come a long way since then. But much more remains to be accomplished.

Official statistics has played an important role in the country’s development effort by providing credible evidence about the state of development. The Indian statistical system is committed to continue providing, through data of professional quality, an independent and impartial account of the country’s socio-economic progress. How can this cause be strengthened by a better system? That is the critical question. An appraisal of the present state of the system, to set the context, and suggestions for modernisation based on available options form the contours of this paper.

The Indian Statistical System

The system for official statistics in India as it exists today owes heavily to the great statistician Prasanta Chandra Mahalanobis (1893–1972) for its foundation. His birthday, 29 June, is celebrated as Statistics Day. The 125th birth anniversary of this great soul was celebrated on 29 June 2018. The Indian Statistical Institute, founded by Mahalanobis, is conducting a year-long programme to commemorate the anniversary.

As honorary statistical adviser to the Cabinet, Government of India, Mahalanobis guided the process of laying a solid foundation for official statistics in the country. To him statistics was the “key technology.” This technology was considered a powerful means not only for scientific investigation but also for supporting the socio-economic development of the country, which was badly ravaged by colonial exploitation. The planning process, as part of the strategic objective of self-reliance, was conceived for the optimal use of resources for fast growth, as per the country’s priorities. This necessitated reliable data on various dimensions of the economy, which formed part of official statistics.

Mahalanobis continued to nurture the development of official statistics as long as he lived. His contributions in the area of statistics are well-documented by Rudra (1996).

The hallmark of excellence of PCM’s scientific work consists in the inseparable relation it represents between theory and application. He had an articulated philosophy of research in statistics. He believed statistics to be a Key Technology meant to help in the analysis of problems in the different sciences. All through his life, in all his research work, he remained true to this philosophy.

The chapter in Rudra (1996) on the Indian statistical system records the creation of the two major wings of the present National Statistics Office—the National Sample Survey Organisation (NSSO) and the Central Statistical Organisation (CSO). “No country, developed, underdeveloped, or over-developed, has such a wealth of information about its people as India has in respect to expenditure, savings, time lost through sickness, employment, unemployment, agricultural production, industrial production. We in this country, though accustomed to work in large-scale sample surveys were aghast at Mahalanobis’ plan for the national sample surveys of India. Their complexity and scope seemed beyond the bounds of possibility, if not beyond anyone else’s imagination, but they took hold and grew,” said W Edwards Deming (quoted in Rudra 1996), a renowned expert in sample survey methodology.

The system developed for official statistics was then one of the best. India was among the earliest countries to adhere to international commitments on quality, comparability, and timeliness, such as the System of National Accounts (SNA), for estimating gross domestic product (GDP) in harmony with other systems on flow of funds, balance of payments, and fiscal statistics. This solid foundation helped it participate in the Special Data Dissemination Standard (SDDS) of the International Monetary Fund (IMF). These are positive aspects of the Indian statistical system, and provide the methodological soundness, consistency, and transparency required for confidence in the data produced. However, the system required continual updating to keep pace with technological, organisational, methodological, and data-related developments and, when certain deficiencies became highly disturbing, it became necessary to review the system.

Rangarajan Commission

The task of reviewing the system was entrusted on 19 January 2000 to a commission chaired by C Rangarajan, the then governor of Andhra Pradesh. The Rangarajan Commission submitted its report on 18 August 2001. The report, a major landmark, examines the whole gamut of data quality, consistency, relevance, and timeliness of collected statistics, along with the systems and processes for their administration. The Rangarajan Commission noted the shortcomings of the Indian statistical system and observed that its credibility, timeliness, and adequacy needed improving. The commission recommended that data gaps be identified and alternative techniques explored to improve the methodology of collecting, compiling, and disseminating data, and suggested reforming the administrative structure, granting it the autonomy required for independence, and upgrading infrastructure. As part of the implementation of the Rangarajan Commission’s recommendations, the National Statistical Commission (NSC) was set up in 2006. It was expected to be empowered to serve as a nodal body for all core statistical activities. The commission was to be backed by an act; one was drafted, but not enacted. Since 2006, the commission has considered many pressing issues confronting the generation of official statistics.

Areas of Weakness

Several standing committees support the development of concepts and methods to maintain the quality of data collected—measurement of variables, survey sampling, updating the base of indices. Some of these are the Advisory Committee on National Accounts, the committees on prices and industrial production, and working groups on sample surveys. Whatever these committees did became available in the public domain. Transparency remains a hallmark of the Indian statistical system but, despite these achievements, timeliness, quality, consistency, and coherence remain weak. It takes almost five years to revise the base-period weighting diagram for price and production indices, whereas advanced economies produce chain-based indices, revising the weighting diagram every year. This is possible because their systems are digitalised. Several attempts have been made in India to prepare an exhaustive register of business units to serve as a frame for drawing samples for various surveys. In a country where any business worth its salt is required to be registered, it is a pity that we cannot have a dependable business register. Collecting data on business units through the economic census is not only expensive; the data are also highly deficient.
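To illustrate the difference, a chained index multiplies year-to-year links, each computed with freshly revised weights, instead of holding one weighting diagram fixed for five years or more. The following sketch uses entirely hypothetical prices and weights for two items.

```python
# Illustrative only: a chained Laspeyres price index whose weights are revised
# every year, in contrast to a fixed-base index updated once in about five years.
prices = {2019: {"cereals": 100.0, "fuel": 100.0},
          2020: {"cereals": 104.0, "fuel": 110.0},
          2021: {"cereals": 108.0, "fuel": 121.0}}
weights = {2019: {"cereals": 0.60, "fuel": 0.40},   # hypothetical expenditure shares,
           2020: {"cereals": 0.55, "fuel": 0.45}}   # re-derived from fresh data each year

def link(y0, y1):
    """Laspeyres link: price relatives for y1 weighted by the previous year's weights."""
    return sum(weights[y0][item] * prices[y1][item] / prices[y0][item]
               for item in weights[y0])

index = 100.0
for y0, y1 in [(2019, 2020), (2020, 2021)]:
    index *= link(y0, y1)          # chain the annual links
print(round(index, 1))             # chained index for 2021 with 2019 = 100
```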

The Annual Survey of Industries (ASI) is an example of how a deficient frame contributes to erroneous estimates. The data on international trade based on customs and payments also differ, for various reasons. There is a wide variation between the consumption data based on the household surveys conducted by the NSSO and the estimates obtained through the national accounts. The present system is not equipped to reconcile these differences and look for the solid evidence needed to do so.

The estimation of gross state domestic product (GSDP) is a new, and more disturbing, issue thrown up by the latest revision in methodology for the current GDP series with base year 2011–12, which replaced the previous series with base year 2004–05. This has happened for several reasons; a major one is the use of Ministry of Corporate Affairs (MCA 21) data on the corporate sector, which replaced the age-old ASI. The ASI data had various shortcomings, particularly under-coverage of the factory sector, and corporate-sector data were considered better for GDP estimation. For Gujarat, the share of the 2011–12 GSDP estimate arrived at by allocation—to align the state’s estimates with those of the CSO—was as high as 74.4% under the new series, against only 30.0% under the previous 2004–05 base (Dholakia and Pandya 2017). The allocation was as high as 100% for mining and quarrying, manufacturing, railways, communications and services related to broadcasting, and financial services. This has thoroughly disturbed the system painstakingly developed over many decades for estimating GSDP. It is particularly so for manufacturing, which should not be worked out by allocation. In a federal structure, where regional development is an important focus, the most important macro-parameter should be sufficiently credible. While pointing out major limitations of the revision carried out in the new series, Dholakia and Pandya (2017) observe that “most of these impacts are negative on the quality, reliability, valid usage, interpretation and meaningful analysis of long term trends of sectors and the economy at the state level in the country.” The story is similar for other states as well.

New sources of data must be tapped. The goods and services tax (GST) system is capturing a lot of information on traded goods and services. An appropriate system must be developed to make use of these data for various estimates. Likewise, e-governance data need to be integrated more closely into official statistics.

The Rangarajan Commission recommended that a data warehouse be built to consolidate fiscal data. The idea is to capture granular data for shedding much more light on the government effort on revenue generation and intervention for promoting socio-economic objectives. This will make available detailed information on government spending under various heads in any geography, district upwards, which could be related with other data for understanding the success of interventionist government policy. There is no such integrated system for fiscal statistics, though aggregated data are available under various heads for both the central and state governments.

Even financial statistics, which is otherwise well organised, needs to address emerging challenges on the distribution of financial assets and liabilities, and their sources and uses, by geography, along with other characteristics, including riskiness and the reasons thereof. Data on the regional distribution of some components of the flow of funds are required on a quarterly basis to understand the nexus and dynamics of interactions between the real and financial sectors in shaping the course of the economy over space and time. This is because an enterprise is a product of opportunities, resources available, market conditions, and availability of risk capital. The success of government intervention in reducing imbalances in development and creating jobs to reap the demographic dividend can also be assessed when these data become available at lower levels of aggregation.

Many areas of official statistics require improvement, and a major effort is needed to improve the legacy systems and processes that support the production of statistics and that are based on disparate conventional practices. The Indian statistical system was developed in the aftermath of independence to support development plans. The system improved over time, but the silos of decentralised systems created under the allocation of responsibilities to administrative ministries remained largely intact. More than one ministry collects and disseminates consumer price data. It is now possible to meet user demand for agricultural, industrial, urban, rural, and composite indices through back-end compilation, using a data repository on consumer prices and the corresponding weighting diagram for each type of index.

In the past it was almost impossible to match the unit-level data going into trade statistics compiled by the Directorate General of Commercial Intelligence and Statistics (DGCI&S) with the balance of payments (BoP) statistics produced by the Reserve Bank of India (RBI). Likewise, the establishment and enterprise approaches for the ASI and company finance data for manufacturing units were not amenable to mapping. Ideally we should have unit-level data on household production, consumption, and saving. While different sources provide these data at the aggregate level, it becomes difficult to compare them for regional distribution. Geocoded data for these three important household components would enable a better understanding of their inter-consistencies. It may be possible to examine household consumption expenditure for possible sources of divergence—households eating out but not reporting it in consumption surveys, midday meals supplied free of cost, and corporates providing their employees subsidised food. The GST data on sales at comparable regional levels could partially help. Digital payments data could be another source, though these data would not provide a perfect mapping—except for some kind of dimensional checking. In short, data with comparable dimensional characteristics—conformed dimensions, in data warehousing parlance—allow data from alternative sources to be pulled together and reconciled through a matching exercise by geography and institutional characteristics. It is relatively easy to reconcile differences in a smaller geography. Consider a few examples of the comparability of data arising from different sources.

The data collected in the ASI detail the items manufactured, industrial classification, materials consumed, and location, but the MCA 21 focuses on the financial performance of manufacturing units. As many large companies operate in multiple states, it is not possible to separately estimate state-wise output. This is how the use of MCA 21 data in the new revision, instead of ASI data, led to a very sizeable allocation of company output across states. The problem can be largely solved if the two sets of data are mapped through a common identity. The matching will not be perfect, but the census sector of the ASI will mostly comprise enterprises; hence, a major part of corporate manufacturing output can be attributed to its respective geography. If no better alternative is available, allocation may be restricted to the unmatched portion. Such mapping will also relate two very important sets of data for various other analytical purposes.
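As a rough illustration of this matching-plus-residual-allocation idea, the sketch below joins a hypothetical MCA 21 extract to an ASI frame on an assumed common identifier (here called "cin"); only the output of unmatched companies would then need to be allocated across states by some rule. All column names and figures are invented for illustration.

```python
import pandas as pd

# Hypothetical extracts; "cin" stands in for an assumed common company identifier.
asi = pd.DataFrame({"cin": ["A1", "A2", "A3"],
                    "state": ["GJ", "MH", "TN"],
                    "asi_output": [120.0, 80.0, 60.0]})
mca = pd.DataFrame({"cin": ["A1", "A2", "A4"],
                    "mca_output": [150.0, 90.0, 200.0]})

merged = mca.merge(asi[["cin", "state"]], on="cin", how="left")
matched = merged[merged["state"].notna()]     # output attributable to a known state
unmatched = merged[merged["state"].isna()]    # residual still needing allocation

print(matched.groupby("state")["mca_output"].sum())      # state-wise matched output
print("output to allocate:", unmatched["mca_output"].sum())
```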

The NSSO is uniquely placed to conduct possibly the largest household survey. The NSSO uses professional, sophisticated, and statistically advanced methods to collect data from about 1,50,000 households. That is a large number of households but a tiny fraction of the 250 million households in India. Using the data to estimate household employment by state and occupation reduces the number of sample households behind each estimate. Even for the all-India estimate, there are issues when it comes to sub-classifications. The NSSO estimate of the urban population is significantly lower than the census figure. The standard error, even if low, does not help in correcting such an estimate. One reason could be a deficiency in the urban frame of households from which the sample is drawn. This deficiency leads to a major problem of consistency and validity, as these estimates are used for value-added components of the unorganised sector.
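A back-of-the-envelope calculation (with assumed figures, purely for illustration) shows how quickly the sample thins out once it is spread over state and occupation cells.

```python
# Rough arithmetic with assumed figures: spreading a national sample over
# state x occupation cells leaves few households behind each estimate.
sample_households = 150_000
states_and_uts = 36            # states and union territories
occupation_groups = 100        # assumed number of broad occupation groups

avg_per_cell = sample_households / (states_and_uts * occupation_groups)
print(round(avg_per_cell, 1))  # about 42 households per cell on average,
                               # before accounting for very uneven cell sizes
```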

The Index of Industrial Production (IIP) is used to estimate quarterly GDP. The annual growth rate of value added by industry based on the IIP differs from the ASI-based rate, and to that extent the quarterly estimate derived using the IIP is deficient. The IIP is a measure of output, not of value added.
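One common way such an indicator is used—sketched below with wholly assumed figures, and not necessarily the CSO's actual procedure—is to distribute an annual value-added benchmark across quarters in proportion to the IIP; any divergence between output and value-added movements then carries straight into the quarterly path.

```python
# Assumed figures only: pro-rata distribution of annual value added over quarters
# using the IIP as the indicator series.
iip_quarterly = [118.0, 122.0, 125.0, 131.0]   # assumed quarterly IIP levels
annual_value_added = 1000.0                    # assumed annual benchmark (Rs billion)

shares = [q / sum(iip_quarterly) for q in iip_quarterly]
quarterly_value_added = [annual_value_added * s for s in shares]
print([round(q, 1) for q in quarterly_value_added])
# Because the IIP tracks gross output, quarters in which input costs and output
# prices move differently will have their value added misstated by this method.
```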

Conceptually, the system for estimating national accounts follows three approaches—production, income, and expenditure—which is useful for cross-checking consistency and coherence, among other things. Many countries follow the production and expenditure approaches for major components. Only a few countries follow all three approaches, because in those countries the income approach is also reliable. India follows the production approach predominantly. There are expenditure data on households, the government, and the corporate sector—the three major institutional sectors. However, the NSSO estimate of household expenditure differs widely from the CSO estimate based on national accounting, and the difference is widening over time. There is no satisfactory way of verifying the various sources of this widening divergence. In big data parlance, production and expenditure data for the government, the corporate sector, and households under different heads at lower levels of aggregation, even if approximate, may allow each estimate to be cross-checked for inter-consistency in dimension and direction of change.

At present, attempts are made to defend differences intellectually by arguing over possible sources of divergence. In some cases, one source is accepted as more credible than the other. Trade data are relied on to calculate exports, although the RBI also collects payments data. As the RBI and the DGCI&S have much better systems now, it may be possible to undertake a matching exercise periodically, at least on a sample basis.

The Indian statistical system is highly decentralised, which brings its own challenges. So far, some of the major concerns arising from legacy conventions have been considered. Now, certain less conventional questions confronting official statistics are raised.

Do we understand the economy very well? It is about human and material resources, and their use over time and space; behavioural traits; market factors; impact on individuals and groups; opportunities and threats; policy prescriptions and their impacts; price formation; wins and losses; and the environment and its degradation. Do we understand poverty, inequality, and deprivation well enough to pursue policy to support vulnerable sections in a focused way and by geography? We need data to focus our programmes for success. To assess progress on the Sustainable Development Goals (SDGs) of the United Nations (UN), which span 15 years up to 2030, India needs a system for collecting data on the 17 goals and their indicators and for formulating plans and milestones. In a vast country of India’s size, non-linearities and heterogeneity cannot be wished away. India needs to use information and communications technology (ICT) to organise data for informing regional and subregional development.

For an economy to function efficiently, authentic information about its functioning at each level must be available in the public domain—that is, there should be no information asymmetry. As pricing is a thermometer of market pressure, an understanding is necessary of how each market prices products and allows competition for efficiency and social welfare. Also, people need to be informed about how effectively government intervention is working to promote an environment conducive to growth, stability, and equity, and how parliamentarians are performing in safeguarding the welfare of the people. As enterprises use household savings for investment, the risks thereon become a public concern. High-quality, dependable data at all levels of governance are needed to address these concerns.

Why We Failed to Overcome Known Shortcomings

The Rangarajan Commission’s review of the Indian statistical system was a bold attempt to identify shortcomings and suggest appropriate action. In addition to methodological improvement, the commission wanted a major thrust on the use of modern technology for reporting; on data processing, using data warehousing; and on strengthening state statistical systems. There is still no data warehouse for national accounting or large-scale sample surveys. When the world is using parallel processing in cloud environments to process voluminous data, India cannot fall behind any further. Procuring and deploying such technology is a complex exercise. While solution providers can help with technology, there is a need to clearly spell out business requirements; source data; develop a methodology for collating data for estimation of parameters that adheres to accepted concepts and definitions; create classifications; and check for consistency, coherence, and timely dissemination as per policy. Conventional tabulations, spreadsheets, and traditional databases involve drudgery, and make for disjointed, relatively inflexible systems; adopting advanced technology will free us to do more worthy and stimulating work.

The system needs to be more sensitive to public criticism. When the new GDP series was released, there was severe criticism, but remedies could not be made for want of appropriate data. If GDP estimates need to be traced back to the granular data and the methods used for aggregation at different layers, there are few options, as the inputs to these estimates are spread in such a way that verification is difficult.

Data governance is another major issue. It includes laying down standards for the maintenance of data, information technology (IT) architecture, and business continuity to ensure the integrity and availability of these data. When spreadsheets are widely used to maintain data, it is not possible to impose rigorous data governance standards. The checks for consistency and coherence applied while processing data remain partial. It is not certain that estimates will withstand the test of inter-consistency and robustness. This is a major concern.

Our International Commitment on Quality Statistics

In 2014 a UN resolution laid down 10 fundamental principles for official statistics. The first is

official statistics provide an indispensable element in the information system of a democratic society, serving the government, the economy, and the public with data about the economic, demographic, social, and environmental situation. To this end, official statistics that meet the test of practical utility are to be compiled and made available on an impartial basis by official statistical agencies to honour citizens’ entitlement to public information.

The other fundamental principles relate to methods and procedures, scientific standards, proper interpretation, all sources of data, confidentiality, rules and regulations, coordination, use of international concepts, and cooperation. The 10 principles were notified by the Government of India through a gazette notification dated 15 June 2016 for adherence as part of our international commitment and also to improve data quality. It is necessary to put in place a system to review progress consistent with the commitments made. A code of practice for promoting the use of scientific methods and procedures for maintaining confidence in produced statistics is a major instrument for ensuring high quality, as will be explained later.

The Indian statistical system can be proud of the variety of data it produces on many dimensions of the economy, but its systems and processes suffer from many legacy issues, particularly in the absorption of technology. Many countries took up this challenge at the highest level because the pressure to do so was very high, as deficient statistics cut at the roots of democracy.

What Are Our Options?

There is a need for a comprehensive review of our official statistical system once again, not only to examine these major deficiencies but also to take advantage of new developments—the explosive growth in the digitisation of business operations and the corresponding data sources—along with technology, methodology, and user demand for data. Some countries are moving away from well-established systems by undertaking a thorough revamp. Apart from methodological and technological aspects, the organisational and administrative machinery also plays an important role, as recognised in the Rangarajan Commission report. If India had acted on some of the recommendations of the commission, the statistical system would have been in a much better state.

An analysis of these recommendations should help to identify the weaknesses and courses of action to put the system on a sound footing. The major reason for advocating a thorough revamp of the Indian statistical system—“creative destruction” (Schumpeter 1976)—is not to criticise the well-established and time-tested systems existing now. These were developed when it was difficult and expensive to collect data and integrate them into multivariate distributions for analytical purposes. The technology and tools for processing and analysing data were vastly different then. The systems, thus, remained disjointed. The main concern is that the changes we make are basically in bits and pieces. The present system is like an old Rolls-Royce, still roadworthy because of continual maintenance. However, maintenance has become costly, and it does not pick up speed the way a new one would. But we cannot abandon it unless we get a new one that is tried and tested. This is where we need creative destruction.

There have been major developments using advanced technology—geographical information systems (GIS); satellite remote sensing; broadband connectivity covering all gram panchayats; and GST for indirect taxes. Most government offices are digitalised and access to information online is widespread. How do we take advantage of these developments in building a system that allows for much more insight into the economy?

System of National Accounts

The GDP is the single most important barometer of the economy. Its compilation is guided by the SNA. The GDP covers all important parameters of economic statistics and is harmonised for consistency. In recognition of new possibilities in measuring GDP, the United Nations et al (2009) suggested:

The sequence of accounts and balance sheets of the SNA could, in principle, be compiled at any level of aggregation, even that of an individual institutional unit. It might therefore appear desirable if the macroeconomic accounts for sectors or total economy could be obtained directly by aggregating corresponding data of individual units. There would be considerable analytical advantages in having micro-databases that are fully compatible with the corresponding macroeconomic accounts for sectors or the economy. Data in the form of aggregates, or averages, often conceal a great deal of useful information about changes occurring within populations to which they relate.

However, a debate is going on over the importance of the GDP as a measure of the economy and over the analysis that assumes its central importance (Coyle 2014, 2016). In a review essay on her work, Syrquin (2016) agreed on one issue of particular relevance: “It may be correct, as argued in the book, that GDP has outlived its usefulness in the digital age.”

Micro–Macro Linkage Matters

The two important factors contributing to strengthening an economy are productivity and competition. Both are much more relevant at micro levels. This does not mean that the macro level is not important; it is definitely useful for growth and stability. However, the quality of growth is equally important. Likewise, if tightening the economy for stability causes disproportionately higher suffering to small and marginal sections, a way must be found to protect them. Thus, macromodels and a calibrated approach may not be enough. Solow (2008) critiqued the dynamic stochastic general equilibrium model, noting that such macromodels are built around a single “representative agent.” There are many other issues (Barman 2016). This is where micro–macro linkages make eminent sense, though there will be challenges in analysing huge volumes of data. Porter (2002) said,

Developing countries, again and again, are tripped up by microeconomic failures … countries can engineer spurts of growth through macroeconomic and financial reforms that bring floods of capital and cause the illusion of progress as construction cranes dot the skyline … Unless firms are fundamentally improving their operations and strategies and competition is moving to a higher level, however, growth will be snuffed out as jobs fail to materialise, wages stagnate, and returns to investment prove disappointing … India heads the list of low income countries with microeconomic capability that could be unlocked by microeconomic and political reforms.

The other major issue is that of poverty, inequality, and deprivation. Income inequality has worsened in the past 35 years; in 2016, the top 10% of earners cornered over half the country’s national income (World Inequality Lab 2018). In pursuing equilibrium for analytical elegance, glossing over basic issues of distribution and equity in a country where one-third of the world’s poor lives can lead only to peril. As Schelling (1978) pointed out, the search for equilibrium is meaningful only when the dust has settled: “The body of a hanged man is in equilibrium when it finally stops swinging, but nobody is going to insist that the man is all right.” For India, the dust has not settled yet. In its tryst with destiny, the country needs to give vulnerable sections sufficient space to realise their potential.

Should official statistics be burdened with these analytical issues? That can be debated. However, there must be a way to extract data to support various possibilities of analysis. There is a need for an integrated system populated with data of ultimate granularity, and tools to extract relevant information flexibly.

Real Sector, Financial Sector, and Fiscal Sector Nexus

In the analysis of an economy, the focus is on production, consumption, saving, investment, and the exchange of goods and services. As a behavioural science, economics is concerned with explaining how society makes choices under conditions of scarcity of the means of production, what enables growth and stability, and how the benefits are distributed. Reinert (2007) observed:

Between the value of the raw material and that of the manufactured product lie much of employment, stable profits under increasing returns and much taxable income for the government. The benefits from manufacturing spread as “triple rents”: (1) to the entrepreneur in the form of profits; (2) to the employee in terms of employment; and (3) through the government in terms of increased taxes.

The framework for official statistics is detailed enough to provide data on these aspects, but these data are not well organised for access to microdata or for masking identity as required. The published data cater to user requirements following certain conventions. To understand the interplay of demand and supply, the domestic economy is divided into three sectors—real, financial, and fiscal. The real sector relates to the production of goods and services; the financial sector to the flow of money supporting transactions; and the fiscal sector to government revenue and expenditure. The issues relate to how market forces behave and respond to inducements, and how they approach equilibrium.

To analyse performance, basic data are collected on these three sectors. There is also an external sector to complete the building blocks for analysis. As we have a reasonably good set of data on transactions forming part of the external sector, the focus here is on the three domestic sectors only. How should data on these sectors be organised, collected, compiled, and disseminated for analysis and for shedding light on behavioural dynamics?

In the present system, microdata on entities are spread over many silos. There are limited data at the level of the village or village panchayat, the lowest tier of governance. We have 2,50,000 rural and urban bodies, and over three million representatives as part of these institutions. The National Institution for Transforming India (NITI Aayog) has the aspirational objective of getting data at this level to formulate credible plans at the village level and aggregate these progressively at higher levels, but new age IT is needed for that.

What New Age Information Technology Provides

To take advantage of the new information age for official statistics, it will be necessary to seamlessly integrate conventional data collection methods with new government initiatives for capturing data digitally—direct benefit transfers, GST, tax collection, dispensation under other social benefit schemes, land records, land use, etc. Digitisation of payments is another exponentially growing source of data. Modernisation of systems for data on employment, health, education, etc, is on the anvil. How can capacity be built to integrate these data so that they shed much better light on the economy and socio-economic development? It should also be possible to navigate the data repository or mine the data to pick up nuggets from the submerged mountains of data.

Big data enables the collection of audio, video, text, and numeric data. These data may be structured, unstructured, or semi-structured. The concern here is mostly with structured data. Hence, the Statistical Data and Metadata eXchange (SDMX) has evolved as an internationally adopted standard for transferring data for processing.

Statistical Data and Metadata eXchange

The SDMX is a new-generation IT standard developed by the UN Statistics Division, along with other international agencies, for statistical reporting and for sharing data and metadata following a common standard. The SDMX reduces delays in data transmission, uses fewer resources for processing at different levels, and improves the overall quality and timeliness of collected statistics. The taxonomies and classifications for data elements, developed by user countries, can be customised for India’s purposes.

Reporting under the MCA 21 uses the eXtensible Business Reporting Language (XBRL), which takes care of the standardisation of concepts and definitions, nomenclature, classification, and hierarchical dimensions. Each individual item has a taxonomy and is assigned a unique computer-readable tag; precise, contextual description makes for seamless aggregation. The XBRL adheres to accounting principles for financial reporting. Similar technology underlies the SDMX; the difference lies in its focus on statistical reporting. India needs a road map to implement the SDMX for the reporting and exchange of data.
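The essential idea—every observation travelling with machine-readable tags for its dimensions and attributes, so that receivers can aggregate it without manual re-keying—can be sketched as follows. This is an illustrative structure only, not actual SDMX-ML or XBRL syntax, and all identifiers are invented.

```python
import json

# Illustrative tagged observation (not real SDMX-ML/XBRL): the dimensions and
# attributes come from an agreed taxonomy, so any receiver can interpret the value.
observation = {
    "dataflow": "CPI_RURAL",                                   # hypothetical dataflow id
    "dimensions": {"FREQ": "M", "REF_AREA": "IN-GJ",
                   "ITEM": "CEREALS", "TIME_PERIOD": "2018-06"},
    "attributes": {"UNIT": "INDEX", "BASE_PERIOD": "2012"},
    "value": 138.4,
}
print(json.dumps(observation, indent=2))   # ready for transmission to a central system
```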

Big Data and Data Warehouse

As defined in the Oxford dictionary, big data is made up of “extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.” Wikipedia defines big data as “a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them.”

Generally, big data originate as transactional data collected through data streaming—the way Google, Facebook, YouTube, and Amazon throw up data. However, these data are mostly unstructured. While such data can be useful for official statistics in certain circumstances and for certain purposes—like tracking the spread of epidemics, the volume of business, and the use of digital modes of payment—official statistics relies on data that are well defined in terms of concepts, definitions, classification, representativeness, etc. The processing capability of big data technology is useful for integrating and processing the granular data that form part of official statistics, taking advantage of parallel processing in clustered environments and of analysis using advanced statistical and machine learning techniques.

Official statistics is built out of huge amounts of microdata. Agriculture data—on land availability, land use, yield rates, area under irrigation, crops cultivated, cost of cultivation, farm gate prices, wholesale prices, consumer prices, trade and transport margins, weather, and topography—are available in digitised form, by way of published, summarised tables, but are not well integrated or geocoded. Researchers have to painstakingly cull these data from different sources. As these data are also pre-aggregated in tabular form, there is no flexibility to delve deeper into non-linearities, heterogeneity, or geography. These issues cannot be overlooked if agricultural productivity is to be examined in appropriate contexts. Also, data are required at much lower levels of aggregation than are available now for the evaluation of policy, monitoring of progress, and the follow-up action needed to raise farms out of distress.

Big data technology has the capability to pull out and process multivariate data of ultimate granularity stored in data lakes for much deeper insight into the issue being investigated, and for policy formulation and implementation. In the present illustrative example on agriculture, it is not only productivity and competitiveness that can be analysed in situational contexts but also the conditions of people engaged in agricultural labour; poverty levels; and malnutrition by age, gender, social group, and skills. Such analysis will be more purposive, relevant, and penetrating, and shed much better light on human distress and possible remedies. It will support not only better, more empirically based decisions at all levels of governance—rather than a broad-brush approach using aggregate data—but also better performance.
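A minimal sketch of what such processing might look like, assuming a hypothetical geocoded plot-level dataset sitting in a data lake (the path, column names, and schema are assumptions, not an actual official dataset):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("agri-granular").getOrCreate()

# Hypothetical plot-level records with columns: state, district, crop,
# area_ha, output_tonnes, irrigated (0/1); the path and schema are assumptions.
plots = spark.read.parquet("/datalake/agri/plot_level/")

district_summary = (plots
    .withColumn("yield_t_per_ha", F.col("output_tonnes") / F.col("area_ha"))
    .groupBy("state", "district", "crop")
    .agg(F.avg("yield_t_per_ha").alias("avg_yield"),
         F.sum("area_ha").alias("total_area_ha"),
         F.avg(F.col("irrigated").cast("double")).alias("irrigation_share")))

# The same granular base can be re-aggregated along any other dimension on demand.
district_summary.write.mode("overwrite").parquet("/datalake/agri/marts/district_summary/")
```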

A data warehouse is “a single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context” (Devlin 1996). A data warehouse is generally subject-oriented, integrated, non-volatile, and time-stamped, which is accomplished by a data model that organises data as facts, with dimensions serving as indexes for easy retrieval of these data as per user needs. The basic data elements are stored in a relational database management system (RDBMS), then de-normalised using a dimensional (star) schema and populated in a multidimensional database (MDDB) server for quick retrieval.
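A toy fact-and-dimension arrangement, with all tables and figures invented purely for illustration, shows how such a structure supports flexible roll-ups:

```python
import pandas as pd

# Toy star schema: a fact table keyed by surrogate ids, joined to small dimension
# tables and rolled up along chosen dimensions (all names and values are invented).
fact_expenditure = pd.DataFrame({
    "geo_id": [1, 1, 2], "head_id": [10, 11, 10], "year": [2017, 2017, 2017],
    "amount": [250.0, 120.0, 340.0]})
dim_geo = pd.DataFrame({"geo_id": [1, 2],
                        "district": ["Pune", "Nashik"], "state": ["MH", "MH"]})
dim_head = pd.DataFrame({"head_id": [10, 11], "head": ["Education", "Health"]})

cube = fact_expenditure.merge(dim_geo, on="geo_id").merge(dim_head, on="head_id")
print(cube.groupby(["state", "head"])["amount"].sum())   # roll-up: state x head
print(cube.groupby("district")["amount"].sum())          # a different roll-up, same facts
```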

The idea is to fuse the two architectures of big data and the data warehouse. There is also a need for metadata to explain the end-to-end life cycle of each data item used for different aggregates. This is the approach explained in Mohanty et al (2013). Its advantage is that it allows for the creation of a data repository, like a data lake, if advisable, into which all data flow through a central system or centrally connected systems. That may be a cloud-based cluster of servers from which a specific user can source data for storing in a data warehouse as per requirements. For example, the CSO may have a data warehouse for national accounting; line ministries can have their own data warehouses as per their requirements, drawing data from the integrated system as a repository.

Can we still have a single version of the truth? There is no easy answer to this question. Many iterations of carefully estimating various aggregates are needed over a long time, along with metadata mapping inputs with outputs, to find sources of differences and reconcile these. It will be a very complex exercise, and it may not always be feasible. Thus, a single version of the truth may not be easily achievable; however, much insight will be gained when our system can respond to such investigation because of virtualisation of data as part of an integrated repository.

What Kind of Technology?

Big data technology has the capacity to handle huge volumes of data, and appears very appropriate as a central system for official statistics for the country. The technology has been under development for the past decade and has now matured enough for adoption. The software for such a system is largely open source, and the hardware is available as a commodity that can be expanded at will to cater to the increasing demand for storage and processing. A cluster of servers in a distributed cloud environment can support very high scalability.

The data processed for official statistics are well defined in terms of concepts and definitions and need to meet high quality standards following sound methodology. Considering that the data originate from many sources spread throughout the country, sourcing data from decentralised systems following clearly defined measurement standards and integrating these can be a daunting task. There is a need for metadata that track the entire life cycle of data, and for discipline in the flow of data from these sources. The data can be administered through a system-driven process for timeliness and quality checks. Considering the huge task at each stage, quality checks can be largely process-driven. The system needs to work under professional supervision for both timely and penetrating analysis of the quality, consistency, and coherence of collected data. The outliers thrown up by the system, and missing data, need to be attended to promptly.

Considering the challenges in data collection and processing and in the management of operations, a modern data warehouse with big data technology is needed for the central system. This can be connected with all the other systems that form part of the national statistical system for two-way flow of data and processing. The requirements would be defined by users—data producers in central and state government ministries and coordinating offices. The details of such arrangements, supporting technology, and integration standards have to be worked out. The broad approach for data flow for the same is shown in Figure 1. It should be noted that this is only an illustration, not a prescription. Help is required from experts to work out appropriate technological solutions.

The data can be captured by various means—surveys, censuses, web-based reporting, administrative records, automated systems for periodic sourcing from feeder systems, satellite images, Facebook, external open sources, and so on. Then, according to predefined concepts, the data are filtered and processed for extraction, transformation, and loading, with validation checks imposed to ensure quality. Complex event processing engines may include spatial engines. These data then move to the spatial big data warehouse. The spatial element is expected to take care of geography right from the village level, wherever applicable. The user interface is a facility to take control of the entire operation according to specific user needs.
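A minimal sketch of the kind of validation check that might sit in such a load pipeline, with entirely assumed records and rules:

```python
import pandas as pd

# Assumed incoming batch: flag records that fail simple completeness and range
# rules before they are loaded into the warehouse.
batch = pd.DataFrame({"unit_id": ["U1", "U2", "U3"],
                      "gross_output": [540.0, -12.0, None],
                      "district_code": ["521", "999", "387"]})

checks = pd.DataFrame({
    "output_present": batch["gross_output"].notna(),
    "output_non_negative": batch["gross_output"].fillna(0) >= 0,
    "district_in_codelist": batch["district_code"].isin(["521", "387"]),  # assumed code list
})
failures = batch[~checks.all(axis=1)]
print(failures)   # these records are returned to the source or flagged for scrutiny
```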

Data warehouses can be used for predefined and ad hoc tabulations. Requirements for data tabulation are set by the SNA along with other harmonised systems for flow of funds, balance of payments, input–output tables, etc. These data are produced by different organisations and the processing systems are also different. Each state is responsible for estimating the state domestic product. Other line departments are responsible for production of data falling under their ambit. While this will continue to be so for a long time, it will be desirable to make provisions for certain checks as required for consistency. The harmonised system of official statistics expects this for quality and coherence of these data.

Dashboard and Data Visualisation

At present, data production and dissemination for official statistics follow a predefined pattern. While this is the primary responsibility of the national statistical system, each line ministry has a set-up to produce data specific to the needs of its users. There are also ad hoc queries, which can be analysed to discover regularities in user needs. These requirements can be systematised by developing dashboards. A dashboard is an information management tool made available to users as a canned report containing data and visual graphics on key performance areas, which help in monitoring progress and evaluating performance against set objectives. A dashboard is easy to read and the visuals are usually revealing. An intelligent dashboard can also be designed to be interactive and to support further requirements for review and analysis.

The dashboard culture is widely prevalent in professional organisations. Bloomberg is a classic example of intelligent dashboards feeding data on how global markets move, economies perform, rates change, and even how opinions differ every day. Some of the information contained in such intelligent content can be an important source of external data for authorised users, subject to terms of agreement. The availability of these data can enhance the usefulness of information systems that look continuously for signals that may have policy implications.

The Indian official statistical system emerged out of the planned era. In a predominantly market-led economy, the requirements are varied and cover a much broader canvas. Price signals, forward trades, and international trade are but a few examples of the demand for information in the present globalised world. It is necessary to work out how far an official statistical system can go to accommodate user requirements for which external sources of data would be required. Official statistics is a public good; hence, data from other countries can be collected from their official sources as admissible. However, data from non-official sources may lack quality and credibility, and need to be evaluated against quality standards.

Data Quality and Code of Practice

Official statistics needs to be compiled following standards laid down for concepts and definitions; data collection and aggregation methodology; a data dissemination policy as per our commitment; and a publicly disclosed policy on transparency to maintain high confidence. The Rangarajan Commission specified quality assurance standards. Our commitment on quality—as set out on 15 June 2016, and consistent with the UN resolution of 2014—stipulates certain core principles: impartiality, objectivity, integrity, sound methods, confidentiality, accessibility, accuracy, reliability, coherence, and clarity. A system of statistical audit is prescribed for ensuring these quality standards. Stringent data quality conditions are stipulated in the code of practice in many countries. For high-quality data, systems and processes need to be upgraded and modernised; resources strengthened; and feedback solicited from users periodically.

The Rangarajan Commission found that

At the moment, as the system operates, there is no effective coordination either horizontally among the different departments at the Centre or vertically between the Centre and the States … For reform of administration of the Indian Statistical System by upgrading its infrastructure and thereby enhancing the credibility of official statistics, the Commission is of the view that an independent statistical authority free from political interference having power to set priorities with respect to Core Statistics is needed to ensure quality standards of statistical processes. Such an authority will also improve the coordination among different agencies collecting data. Though the National Advisory Board on Statistics was constituted with this objective, its impact has been minimal. In view of this, the Commission has recommended the creation of a permanent and statutory apex body—National Commission on Statistics [sic]—through an Act of Parliament, independent of the Government in respect of policy making, coordination, and maintaining quality standards of Core Statistics.

Though the Rangarajan Commission was appointed by the Vajpayee government and its report was implemented by the next government, some vital recommendations have not been acted upon. The NSC needed the backing of an act to be effective; without it, the NSC has remained largely handicapped. The NSC came out with many more reports, but these were not acted upon. Without accountability for the implementation of well-thought-out decisions, it effectively remains helpless.

As the NSC functions with part-time members and a small contingent of staff, it has not been effective in carrying forward its mandate. This is further weakened by its complete dependence on the ministry for administrative and financial matters, which creates various frictions. A way to ensure genuine independence must be found.

Learning from the United Kingdom

The commission approach at the apex of the statistical system does not work well. Such a system in the United Kingdom (UK) was replaced by the UK Statistics Authority (UKSA), a board backed by the Statistics and Registration Service Act, 2007, and directly responsible to parliament through the Cabinet Office. A highly professional membership and robust systems and processes have made a major difference. The UKSA system ensures high standards of transparency and professionalism and makes executives responsible. For the production of official statistics to be independent, the entire machinery must work at arm’s length from ministerial control.

The UKSA has oversight of the Office for National Statistics (ONS), a non-ministerial government department. The UKSA is also responsible for the independent monitoring and assessment of official statistics; maintaining a code of practice for official statistics; and designating code-compliant statistics as “national statistics.” The chief executive of the ONS is the National Statistician and is directly accountable to parliament through the UKSA.

The ONS started the first phase of modernisation in 2001. Its experience is well documented in Penneck (2009):

Pressure to operate more efficiently, respond more rapidly to changing user demands, exploit data more effectively and improve statistical quality have led a number of statistics offices to seek to modernise their statistical systems in similar ways: adopting an information technological environment, using standard tools, and processes across statistical systems, with common business processes driven by metadata.

“Together the Design Authority and IT strategy provide clear direction for modernisation” for delivering “high quality and noticeable business benefits.” This should be the most important learning point for India. The ONS is now going through the second phase of modernisation. The review report by Charles Bean (2016) sets the agenda for this phase of reform, which is challenging organisationally, methodologically, and technologically. The report notes how the methodology focusing on GDP is deficient in many respects.

The UK went about this reform as part of an election manifesto commitment, and parliament was fully involved in the discussion on systems, checks, and balances. This had been in the making for a long time. Jack Straw, a prominent member of parliament and minister (until 2014), told the Royal Statistical Society in 1995:

Democracy is about conceding power to those with whom you disagree; not those with whom you agree; and about ensuring that every citizen has a similar access to the information on which decisions are made, and governments are judged. In a modern democracy, the system of official statistics should be a dignified part of the constitution.

Independence and authority for official statistics have been fortified in many other Western countries through the enactment of legislation and the establishment of a statistical authority. Such an authority is free of any kind of extraneous influence and is vested with the power to produce high-quality statistics. The European Union has prescribed a code of practice that member countries follow. India must adopt a similar organisation, systems, and processes to revamp its statistical system.

Action Points

India needs an exhaustive list of the data going into the estimation of GDP, and of financial and fiscal statistics, with a clear definition, classification, and source for each item. The list may run to over 3,000 items and is expected to be available with the CSO.

A system is needed to capture data from various sources. Milestones for web-based reporting may be defined following reporting standards such as SDMX or XBRL. India needs to build capacity for processing voluminous data using modern big data and data warehousing technology. It is desirable to have a design authority to lay down standards on technology for integration, high value at low cost, and the use of standard tools so that ad hoc local development is avoided.

India needs to develop the capacity to use advanced techniques for survey sampling and for the measurement of variables as contained in the SNA and other manuals, and to undertake experimentation for the discovery of patterns and dependencies using techniques of multilevel analysis, machine learning, and artificial intelligence. The time has come for statisticians to acquire advanced knowledge of software and domain knowledge of the subject area of analysis, and to graduate into data scientists.

Last but not least, an act is needed to make the system truly independent. Official statistics is a public good, and an important part of the democratic process. An act that provides for the creation of an independent, professional authority that can raise the quality of data and confidence in the system will make a major difference.

Conclusions

Official statistics is an important part of the democratic process: it informs people how the economy is progressing; how interventionist government policy helps in maintaining stability conducive to growth and advances the cause of social development, particularly in respect of vulnerable sections of the people; and how private enterprises work in a market economy. These data should be available from the local level upwards to support more efficient use of resources and more responsive governance in all walks of life. The data should withstand rigorous scientific scrutiny and be made available to users as a public good.

New techniques are needed in the social sciences to analyse the complexities of socio-economic dynamics. A prerequisite is the availability of granular data, and of tools for accessing them as per analytical needs. A spatial big data warehouse, which can capture the entire life cycle of the data going into estimates at different levels of geography, is expected to serve as the backbone of analytics and give a new direction to research backed by solid empirical evidence. The flow chart is only an illustration, not a prescription; no specific tool is suggested. The widespread use of artificial intelligence, along with massively parallel processing in cloud environments, should lead to new breakthroughs. The prerequisite is relevant data extracted from various sources for such analysis. A new breed of researchers and data scientists with statistics and machine learning expertise, who understand business objectives and are good at handling huge volumes of data, will find interesting patterns. Tobler’s first law of geography is, “Everything is related to everything else, but near things are more related than distant things.” Multilevel analysis offers a powerful statistical tool for such a layered approach to data analysis. In the process, our long-standing ideas based on too much abstraction will come under scrutiny and pave the way for deeper insight into development issues. This will help create a new vista in our development effort in the present millennium.
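As a sketch of what such a layered (multilevel) analysis might look like—assuming a hypothetical household-level extract with a district identifier; the file name and variables are illustrative, not an actual NSSO dataset:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical survey extract with columns: consumption, income, district.
df = pd.read_csv("household_extract.csv")

# A random intercept for district lets nearby (same-district) households share
# unexplained variation, in the spirit of Tobler's first law.
model = smf.mixedlm("consumption ~ income", data=df, groups=df["district"])
result = model.fit()
print(result.summary())
```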

References

Ahrens, Joachim (2002): Governance and Economic Development: A Comparative Institutional Approach, 14, Edward Elgar.

Barman, R B (2016): “Rethinking Economics, Statistical System and Welfare: A Critique with India as a Case,” Economic & Political Weekly, Vol LI, No 28, pp 46–56.

Barman, R B (2017): “Modernisation of Information System for Economic Statistics: An Integrated Approach for Policy Analysis,” Special Proceedings of 19th Annual Conference of SSCA held at SKUAST, Jammu, 6–8 March.

Bean, Charles (2016): “Independent Review of UK Economic Statistics,” March.

Coyle, Diane (2014): GDP: A Brief but Affectionate History, Princeton University Press.

— (2016): “GDP in the Dock,” Nature, Vol 534, pp 72–74, 23 June.

Devlin, Barry (1996): Data Warehouse: From Architecture to Implementation, UK: Addison-Wesley Professional.

Dholakia, Ravindra H and Manish B Pandya (2017): “Critique of Recent Revisions with Base Year Change for Estimation of State Income in India,” Journal of Indian School of Political Economy, Vol XXIX, Nos 1 and 2, January–June.

Government of UK (2007): “Statistics and Registration Service Act 2007,” UK.

Lee, J G and M Kang (2015): “Geospatial Big Data: Challenges and Opportunities,” Elsevier.

Mohanty, Soumendra, Madhu Jagadeesh and Harsha Srivatsa (2013): Big Data Imperatives: Enterprise Big Data Warehouse, BI Implementations and Analytics, Apress.

Penneck, Stephen (2009): “The Office for National Statistics (ONS) Statistical Modernisation Programme: What Went Right? What Went Wrong?” Proceedings of Modernisation of Statistics Production.

Porter, Michael (2002): “Enhancing the Microeconomic Foundations of Prosperity: The Current Competitiveness Index,” Harvard Business School, September.

Rangarajan, C (2001): “Report of the National Statistical Commission,” Vol 1, August, Government of India.

Reinert, Erik S (2007): How Rich Countries Got Rich ... And Why Poor Countries Stay Poor, 316, New York: Carroll & Graf Publishers.

Rudra, Ashok (1996): Prasanta Chandra Mahalanobis—A Biography, New Delhi: Oxford University Press.

Schelling, Thomas C (1978): Micromotives and Macrobehavior, Fels Lectures on Public Policy Analysis, 25–27, University of Pennsylvania.

Schumpeter, Joseph A (1976): Capitalism, Socialism and Democracy, George Allen & Unwin (Publishers) Ltd, pp 81–86.

Smith, J W (2004): Economic Democracy: Political Struggle of the Twenty-First Century, 50, New Delhi: India Research Press.

Solow, Robert (2008): “The State of Macroeconomics,” Journal of Economic Perspectives, 22 (1): 243–49.

Straw, Jack (1995): As quoted in “Independence for UK Official Statistics: The New UK Statistics Authority” by Richard Laux, Richard Alldritt and Ross Young, UK Statistics Authority.

Syrquin, Moshe (2016): “A Review Essay on GDP: A Brief but Affectionate History, by Diane Coyle,” Journal of Economic Literature, 54(2): 573–88. doi:10.1257/jel.54.2.573.

Tobler, Waldo R (1970): “A Computer Movie Simulating Urban Growth in the Detroit Region,” Economic Geography, Vol 46, pp 234–40, doi:10.2307/143141.

United Nations et al (2009): “System of National Accounts 2008,” p 9.

World Inequality Lab (2018): “World Inequality Report, 2018,” https://wir2018.wid.world/files/download/wir2018-summary-english.pdf.
