What Ails India's Data Economy?

Data is the new currency. Drawing parallels between existing taxation and data structures helps us to understand how data is being monetised by the government and private firms. But, the lack of a robust data protection architecture in India raises serious concerns. 

 

This is part of a six-article series on questions surrounding data, privacy, artificial intelligence, among others. You can read the introduction here.

 

In a modern economy, taxes collected from individuals and organisations are used as public expenditure for commonly agreed goals of the society and for government functions. In a parliamentary democracy, various kinds of taxation applicable to individuals and commercial enterprises are decided upon through discussions. A statement of revenue and expenditure is presented as a budget, and is kept in the public domain. The broader policy objectives and long-term goals are specified in the economic surveys, released annually ahead of the budget presentation. 

The 2020 Economic Survey has hinted at the state's attempt to monetise personal data, essentially making data a commodity. Subsequently, in reply to a question in Parliament, the government stated that vehicle data in the country had been monetised to the tune of Rs 65 crore. Data monetisation by the government raises serious concerns at a time when there is no crystallised data protection architecture in the country, but one also needs to question the fundamental premise of data collection (or rather, data taxation) by the government itself.

Data Taxation

Like taxation, data collection or data taxation is important for the state to perform its functions. It is essential to collect basic demographic details for something as routine as issuing a driving licence. But unlike revenue collected as tax, data collected by the government can be double-spent (that is, it can be sold to multiple entities), and can be linked with other data sets to derive new value. So, it is important to evaluate how data tax is governed, along with a host of related issues, such as the optimal level of data taxation (data collection), how collected data is used, what data should be redistributed, and more broadly, the transparency and accountability concerns that come with it.

Let us try to draw parallels between data and taxation systems to understand the issues surrounding data collection.     

Digital Identifiers 

Analogous to tax identifiers, like the Permanent Account Number (PAN) and bank accounts, among others, that are used to track a person’s or an organisation’s income and to levy taxes, digital identifiers are mandated by law across digital platforms for service delivery. These include mobile numbers, device IDs, digital IDs, and social media account identifiers, among others. While some identifiers are essential in certain contexts, forced cross-linking, such as the Aadhaar-mobile linkage that was subsequently struck down by the Supreme Court, amounts to a form of data taxation that forces an individual or organisation to give away data to private entities under the pretext of the law, when no such legal mandate actually exists.

Data Taxation Systems  

Just as revenue is raised through direct and indirect taxes, data about persons is collected directly and indirectly through various systems. The government directly collects various kinds of data, be it income details for income tax, socio-economic data for welfare schemes, or specific details, like maternal health parameters, for providing maternal and child care and related subsidies. Data can also be collected from us through indirect means, such as goods and services tax (GST) invoices that reveal travel history, and cash transaction data, among others. Such data is routed to the financial intelligence unit, which is an executive agency for the prevention of money laundering. Unlike monetary tax, however, in data tax, digital identifiers can be used as a common foreign key to cross-link and share data between various government and/or private entities, and the resulting cumulative data can be used by multiple parties.

For example, as per the law, caste information cannot be collected for Aadhaar, as it is largely a system to bring together the biometric information and basic identification details of an individual, and it does not reveal anything about a person’s socio-economic status. On the other hand, the National Population Register (NPR), which predates Aadhaar, has been collecting information on an individual’s caste. Once Aadhaar came into the picture, the NPR began collecting both an individual’s caste and Aadhaar information. The information is then stored in the state residential data hubs (SRDH).

When a caste certificate is issued, it is issued via a digital locker, which is operated by the Ministry of Electronics and Information Technology (MeitY). In the background, caste certificates are linked with Aadhaar. The Unique Identification Authority of India (UIDAI) clearly stipulates that Aadhaar remains strictly for the basic identification of an individual and does not capture information regarding an individual's socio-economic status. This cross-linkage of data therefore violates existing laws, and also amounts to profiling of a person.
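
To make the mechanics concrete, the cross-linking described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of how any government system is actually built: the record fields, the "digital_id" key, and the cross_link function are all hypothetical names, and the data is made up. It shows how two data sets that were collected separately, for different purposes, merge into a single profile once they share a common identifier.

```python
from collections import defaultdict

# Hypothetical, made-up records held by two separate systems.
# "digital_id" is a placeholder for any shared identifier, not a real ID format.
identity_records = [
    {"digital_id": "ID-001", "name": "Person A", "district": "District X"},
    {"digital_id": "ID-002", "name": "Person B", "district": "District Y"},
]
certificate_records = [
    {"digital_id": "ID-001", "certificate": "community certificate",
     "category": "Category P"},
]


def cross_link(key, *datasets):
    """Merge records from several data sets that share the same key value."""
    profiles = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            # Every record carrying the same identifier lands in one profile.
            profiles[record[key]].update(record)
    return dict(profiles)


profiles = cross_link("digital_id", identity_records, certificate_records)
print(profiles["ID-001"])
```

In this sketch, the identity data set alone says nothing about a person's category, and the certificate data set alone says nothing about where the person lives. The combined profile reveals both, which is why the shared identifier, rather than any single data set, is what enables profiling.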
 
Every new digital platform built and promoted by the government has inherently been a data taxation network because, as with private entities, there is no purpose limitation on the data that the government collects. When data is unified or merged using digital identifiers, a person's complete data footprint can be accessed.

Data Revenue 

Analogous to monetary revenue, data revenue comes directly and indirectly to the government from both individuals and entities. The census, Aadhaar enrolment, applications for caste and income certificates, and income tax filings, among others, are direct means of collecting data from individuals. Indirect means include know-your-customer (KYC) details and public records, such as crime data and court records, among others. Data revenue also comes from entities in the form of regulatory filings, such as those with the registrar of societies, licensing requirements, and Ministry of Corporate Affairs (MCA) filings, besides GST (which now collects granular data on the movement of goods through the electronic way bill).

Data Expenditure

The collected data is then put to use to serve the original purpose as per the statute. But, just as a portion of revenue is invested for growth, some data is cross-linked to generate newer insights, or to derive knowledge that could help the state improve its functions.

Data triangulation, a method through which different revenue data sets are cross-linked, was proposed in a recent GST Council meeting by the revenue secretary of the Government of India to identify tax evaders in the system. While there are legitimate needs for cross-linking data, excessive and arbitrary cross-linking without checks would amount to 360-degree profiling.

While the use of data for government planning can be considered public expenditure, sharing data with private entities, either openly, as in the case of vehicle data, or selectively, as in the case of Employees’ Provident Fund Organisation (EPFO) data shared with select researchers, can be seen as private expenditure. Illegal repurposing and diversion of monetary revenues for private purposes would be called a scam. Repurposing data, however, is not treated as one, even though privacy has been declared a fundamental right.

Data Centralisation 

Analogous to the increased focus on fiscal centralisation, both through centralised taxation systems, such as GST, and through the growing number of centrally sponsored schemes and projects that encroach on the subjects of states, digitisation paves the way for the centralisation of data. The availability of this data at the national scale also paves the way for generating new value at the national level.

Although water and electricity utilities are managed by the states, the centre’s decision to centralise such bill collections through a single platform, like the Bharat Bill Payment System (BBPS), makes granular data about the receivables of state government-run power distribution companies (discoms) accessible to the central government. This has widespread negative implications for the states.

A significant part of this centralisation happened through the National e-Governance Plan (NeGP), and subsequently through the recommendations of the Nandan Nilekani-led Technology Advisory Group for Unique Projects (TAGUP) during the United Progressive Alliance (UPA) government era.

In the Goods and Services Tax Network (GSTN), UIDAI, the National Payments Corporation of India (NPCI), and many other private non-profit organisations where the government holds strategic control, market players hold significant control. These entities, as they are private, are not subjected to constitutional checks and balances, including transparency laws, such as the Right to Information (RTI) Act. They build national-level databases and derive aggregate value for themselves and their stakeholders.
 
Some examples of the centralised data systems include VAHAN, Sarathi, Bharat Bill Payment System, FASTag, Aadhaar bank mapper, DigiLocker, Crime and Criminal Tracking Network and Systems (CCTNS), Integrated Criminal Justice System (ICJS), among others. Public Credit Registry, National Skills Registry, National Social Registry, and Deoxyribonucleic Acid (DNA) data banks are all proposed large data infrastructures.

While centralisation of some data may help the central government in dealing with certain policy objectives, there is always a threat to federalism, as this gradual centralisation of data would eventually reduce states to glorified municipalities. An example is Tamil Nadu's reservations about accepting the National Food Security Act’s (NFSA) targeted public distribution system (PDS), which moves away from the current universal PDS, and about the Ujwal DISCOM Assurance Yojana (UDAY), a power sector reform scheme for distribution companies. In both cases, central financial assistance is conditional on the production of data, and the Government of Tamil Nadu perceives such clauses as aimed at diluting its power to legislate autonomously. These central databases also pose a grave cyber security risk, as they are potential targets in the event of cyber warfare.

Revenue Deficit and the Cost of Junk Data 

Revenue deficit occurs when actual revenues fall short of expectations. In the data economy, this could mean the inability to form a complete picture through data and the data collection network. Just as strong laws against tax evasion on paper do not automatically bring in revenue, coercing people to mandatorily give data will not address the deficit in the data.

Unlike monetary taxes, where non-compliance means only a shortfall, in data tax it could also mean costs incurred on junk data, or on keeping the data up to date. An example of an inefficient data collection network is Aadhaar, where there has been no document verification of the data at enrolment.

In the pursuit of near-100% enrolment, no data has been captured on Aadhaar holders who have died. Due to these shortcomings in Aadhaar enrolment, the NPR exercise has been projected as a corrective measure, and it would come at a cost of Rs 4,000 crore, in addition to the costs incurred on the census exercise. If data collection under Aadhaar had been efficient, these additional costs could have been avoided. Maintaining meaningful data sets has a cost if the data is to be useful, and the notion of self-cleaning databases is nothing more than a good marketing pitch.

Fiscal Deficit and Trust Deficit 

In an economy, fiscal deficit is how much a state borrows to meet its policy objectives. While every economy borrows, limits are placed on borrowings to maintain the overall financial stability of the state. In a data economy, while one cannot borrow data, we often face a trust deficit in the data shared by governments, particularly when requests for additional transparency and probes are rejected. Again, while the state might have legitimate interests under its sovereignty to “conceal” select data to achieve its goals and national interests (carefully and publicly defined ones), it cannot freely build narratives with selective data or without following transparency norms. When it does so, a trust deficit emerges, and such a deficit is harmful to national interests.

Data-driven Decision-making and Evidence-based Policy 

While the 2019 Economic Survey talks about the use of personal data for “public good,” the draft Personal Data Protection Bill tabled in Parliament gives wide-ranging powers to the government to seek data from any data processor for evidence-based policymaking. This raises concerns not just for civil society, on account of privacy, but also for businesses, which have a sovereign stake in access to data.

While data-driven decision-making and evidence-based policymaking may sound rational, they could still lead to suboptimal outcomes, such as perpetuating biases that already exist in the data and discounting factors that are not visible in the data but would affect the policy. Any policy needs to leave space for democratic participation and debate to arrive at a consensus, and data-based policymaking that ignores ground realities would lead to disasters.

Constitutional Checks and Balances 

When India opened up its economy in the 1990s, a wide range of regulatory institutions and mechanisms were put in place. As we have matured as a democracy, there has been a gradual adoption of technology in governance to increase transparency, alongside legislation like the RTI Act and social audits of government programmes by civil society.

But the digital transformation that has been underway in India comes with few checks and balances to prevent executive overreach. This has been demonstrated a number of times by the knotty problems surrounding Aadhaar and its implementing agency, UIDAI. Below are some ideas that need discussion for putting in place an institutional and procedural system for the data economy in India, one which respects constitutional values and does not alter the relationship between the citizen and the state in an adverse way.

Instituting Data Budget and Floating a Data Comptroller and Auditor General 

Every department in the government, as part of its accountability to citizens, presents its budget, and keeps its revenue and expenditure in the public domain. The Comptroller and Auditor General (CAG) is a constitutional authority that is empowered to audit government departments. In the same manner, a data budget needs to be presented by the government, noting data revenues, data expenditure, and how they are managed. The government should also come out with plans to keep the cost of junk data and the trust deficit low. A data budget would also help in prescribing the right level of data taxation, where trade-offs for privacy are necessary, and how that data can be put to use. A data CAG must be floated and empowered to independently audit the data collection and usage practices of government bodies, to ensure that data is used only for purposes prescribed by the law, in line with the values of a constitutional democracy.

Limitations on Cross-linking Data 

Just as the Fiscal Responsibility and Budget Management Act, 2003 keeps the fiscal deficit at manageable levels for a stable fiscal environment, there is a need for a framework to limit the state’s ability to cross-link data, which violates the fundamental right to privacy, even if it derives economic value for the state. The balance of interests must be clearly and transparently documented and reviewed periodically, just as the finances of the different arms of the government are reviewed and restrictions are placed on them.

Enhancing Data Federalism 

The move towards extreme centralisation of data, and the centre's usurping of the states' powers over data systems, pose a threat to the federal polity of the country. States should be empowered to build digital infrastructure in line with the purpose and the usage of the data. This could mean building alternate digital systems that compete with each other to preserve the rights of the individuals and organisations who have parted with their data. At the same time, effective decentralisation of power over data should be promoted all the way down to local government bodies, and strengthening data silos would be the only way to retain the rights of individuals and organisations.

Empowering Citizens to Question 

While RTI empowers citizens to request information from the government, there have been several instances when information has been denied on the grounds that there is no information with the department concerned in the form requested by the person seeking the information. Given such a scenario, technology should be used to audit data systems put in place, and citizens must be empowered to question and observe the metadata of systems in place to prevent abuse. 

Beefing Up Cyber Security

Data systems, particularly the centralised ones, pose a crucial cyber security challenge. India does not have a good track record in identifying, let alone prosecuting, cyber criminals. No one has been prosecuted so far in the largest debit card hack, which compromised the safety of 32 million card holders in 2019. India needs to ramp up both its law enforcement and defence capabilities to combat cyber wars.

 

 
