Open Data in India: In a Restrictive Copyright Regime, Voluntary Organisations Pitch in to Make Data Accessible

Globally, there has been a shift towards open data policies, but some Government of India departments have gone the opposite way and decided to stop putting data in the public domain. Data enthusiasts and voluntary organisations have been pitching in to build open data resources in India.

 

This is part of a six-article series on questions surrounding data, privacy, artificial intelligence, among others. You can read the introduction here.

 

The idea of open data has caught the imagination of the world. Open data means keeping data freely available for use, remixing, and wider distribution. At most, open data comes with minimal requirements such as “share alike” or “attribution.” Globally, several institutions have been committing to open data access, and governments in particular are joining the open data movement. India is also legally obliged to follow open data guidelines. According to the Open Knowledge Foundation’s “State of Open Government Data Index 2018,” Taiwan tops the index, followed by Australia, the United Kingdom (UK), and France. India stands at 32nd position on the index with a score of 47%, while Taiwan has a score of 90% (Open Knowledge Foundation nd).

A good open data set is hard to come by in India. There have been several efforts over the years to change this. The National Data Sharing and Accessibility Policy (NDSAP), which came into existence in 2012 with the approval of the union cabinet, is one such effort. The NDSAP is empowered by Section 4(2) of the Right to Information (RTI) Act, and makes it the responsibility of every public authority to share its data and information suo motu at regular intervals. The NDSAP applies to all non-personal, non-sensitive data produced using public funds by the central, state, and local governments, and their departments. It covers data in all formats: digital and analog, machine-readable and human-readable. The NDSAP promises to adhere to the principles of open data: openness, transparency, quality, privacy, and machine readability.

As a policy, NDSAP encourages and facilitates the sharing of government-owned data to achieve two primary goals: transparency and the accountability of the government, and innovation and the economic development of the country. 

As part of NDSAP directives, ministries and departments are expected to set up their respective NDSAP cells with chief data officers to oversee them. The primary job of these cells is to gather and identify data sets and to classify them as shareable or non-shareable. It is also their job to make these data sets available in both machine-readable and human-readable form for maximum usage. They are also tasked with keeping the data sets updated and relevant on the data.gov.in website.

India’s open government data (OGD) platform, data.gov.in, is used by various government departments to publish data sets. It is intended to increase transparency and keep government data in the public domain for innovation purposes. It works under the provisions of the NDSAP. Though the OGD platform hosts hundreds of data sets, many of them are incomplete and rarely updated.

The OGD platform has not been able to publish enough government data because data-rich departments, like the Indian Space Research Organisation (ISRO) and the National Sample Survey Office (NSSO), have not been providing good data sets. This could be because the focus of the platform so far has been to bring together the data of departments that already publish information in some format, and to put that data out in a single format. Even a government body like ISRO, despite the need to withhold some information for national security reasons, can keep a lot of its data, like the earth observation data under its Bhuvan project, in the public domain. Similarly, efforts should be made to further ease access to available NSSO data, as currently one has to pay to obtain unit-level data.

Globally, there has been a shift towards open data policies, but some Government of India departments have gone the opposite way and decided to stop putting data in the public domain. Below are a few notable examples, along with their impact on the economy of the country.

Transparency to Opaqueness: The Case of Central Board of Indirect Taxes and Customs 

In particular, the decision by the Central Board of Indirect Taxes and Customs (CBITC) in November 2016 to stop publishing import and export data sets has had an adverse impact on businesses and the economy. This near real-time data set included every item that was imported into or exported out of India. Several market intelligence start-ups and data business firms, like InfoDriveIndia, Cybex, and Eximpulse, among others, have been built on these data sets. These firms in turn provide business intelligence to small- and medium-scale industries.

Many small- and medium-scale industries use this business intelligence to take decisions. Small-scale industries use it to decide, among other things, which products to buy, whom to buy from, who their competitors are, and who else imports similar goods.

The decision by CBITC to stop publishing import and export data sets has economically hurt the companies that relied on them, and data innovation in general. It has been alleged that some big businesses are behind the decision of the CBITC, seeking to protect their business prospects against those of small- and medium-scale industries. It should be noted that CBITC had been a pioneer in publishing data online, well before NDSAP came into existence in 2012. It used to publish some of the most up-to-date, real-time, and relevant information.

Restrictive Copyright Regime and the Bureau of Indian Standards

The Bureau of Indian Standards (BIS) formulates Indian standards for various products and processes, among others, in line with the latest technology and the changes taking place in the economy. The purpose behind building standards is to safeguard the producer and the consumer of a product or a process, and to improve the economy of the country. From standardising measurements for cycle rubber tubes to the national flag, BIS has contributed to the know-how in the country. But these standards are placed behind a paywall, and are guarded by a strong copyright regime. The standard documents are also not machine-readable (they are PDF documents with raster images). Such barriers to information and the deliberate withholding of data are in violation of Section 4(2) of the Right to Information Act and NDSAP guidelines.

Like a private company, BIS sells its data on standards, even though it is funded by taxpayers’ money. The BIS data is sold under highly restrictive licences, and is only available in an archaic PDF format, which cannot be processed by machines. These restrictions make it very difficult for ordinary citizens and small-scale industries to access the information. This information is a standard and a form of legal code that needs to be open, so that manufacturers can access it and adhere to the standards. It is equally important for consumers, as they can ascertain whether or not a particular product complies with the standards of BIS before deciding to buy it.

The Role of Voluntary Organisations 

There is a huge demand in India for clean, digital, complete, regularly updated, and openly licensed data sets from media houses, researchers, and commercial entities, among others. The NDSAP has not been able to meet these demands as it could not enforce its principles on government departments to make data sets publicly available. Against this backdrop, volunteer-based community organisations, like DataMeet, WikiData, lawresource.org, OpenStreetMap, and others, have been filling the gap for open data.

Parliament Constituency Data and the DataMeet

Indian parliamentary elections are the biggest in the world due to the sheer magnitude of the exercise. Election results are celebrated across India, making the counting day one of the biggest events for Indian media houses. Analyses of various kinds are made on the basis of election results by television channels, newspapers, and social media.

Although maps showing the contours of parliament constituencies are the most basic maps required to represent and analyse elections, the Election Commission of India (ECI) does not publish them as open data or in a usable digital format. India has 543 parliamentary constituencies, the boundaries of which are managed by the ECI.

Given this scenario, a group of volunteers from DataMeet painstakingly digitised and georeferenced PDF and image maps of the parliament constituencies in March 2014. Georeferencing is the process of assigning geographical coordinates (latitude and longitude) to each pixel of the raster (image) map. In the case of boundaries, the number of points the mapper georeferences is directly proportional to the quality of the map. It was a lot of manual work.
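The georeferencing step described above can be illustrated with a minimal Python sketch. It is not DataMeet's actual workflow: it assumes a simple affine relationship between pixel positions and coordinates, fitted from exactly three control points, whereas real projects use many more points and dedicated GIS tools. The function names and the control point values are hypothetical and chosen only for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, v):
    """Solve the 3x3 linear system m * x = v using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        mk = [list(row) for row in m]
        for r in range(3):
            mk[r][k] = v[r]  # replace column k with the right-hand side
        solution.append(det3(mk) / d)
    return tuple(solution)


def fit_affine(control_points):
    """Fit lon = a*col + b*row + c and lat = d*col + e*row + f.

    control_points is a list of exactly three ((col, row), (lon, lat)) pairs.
    """
    if len(control_points) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [(col, row, 1.0) for (col, row), _ in control_points]
    lons = [lon for _, (lon, _) in control_points]
    lats = [lat for _, (_, lat) in control_points]
    return solve3(m, lons), solve3(m, lats)


def pixel_to_lonlat(transform, col, row):
    """Apply the fitted affine transform to one pixel position."""
    (a, b, c), (d, e, f) = transform
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical control points: pixel (col, row) -> (longitude, latitude).
gcps = [
    ((0, 0), (77.0, 13.5)),
    ((1000, 0), (78.0, 13.5)),
    ((0, 1000), (77.0, 12.5)),
]
transform = fit_affine(gcps)
# The centre of this 1000 x 1000 image maps to roughly (77.5, 13.0).
print(pixel_to_lonlat(transform, 500, 500))
```

Three points are the minimum needed to determine such a transform. Scanned maps are rarely this regular, which is one reason why adding more control points tends to improve the result.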

DataMeet is a community of over 1,500 data enthusiasts from around the country who have been working towards making data open and accessible to all. They share know-how on accessing and using data, give technical support to people at large, and discuss policy issues regarding data (DataMeet nd). The decision to publish parliament constituency maps was taken by DataMeet volunteers at its annual conference, “Open Data Camp,” held in 2014 in Bengaluru. The parliament constituency maps produced by DataMeet are probably the only such geographical data set available under liberal licences to both media houses and researchers, even after two general elections in 2014 and 2019 (DataMeet nd; Open Data Camp nd).

While voluntary organisations like DataMeet are now working towards making the data of assembly constituencies in the country publicly available, the government and its various departments still fail to recognise the significance of open data.  

Open Geographical Database and the OpenStreetMap Foundation 

The Survey of India (SoI) is the country’s premier organisation for carrying out mapping. It releases two kinds of map series: defence series maps (DSM) for defence purposes and open series maps (OSM) for civilian use. The latter are prepared on 1:2,50,000 (1 centimetre on the map represents 2.5 km on the ground), 1:50,000, and 1:25,000 scales. They are for general public use, and do not contain any grid or classified information. These OSM versions are of interest to the open data communities.
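The scale arithmetic above can be checked with a short Python sketch. The function name is illustrative and not part of any SoI tool. Note that 2,50,000 in Indian digit grouping is 250,000.

```python
CM_PER_KM = 100_000  # 1 km = 1,000 m = 100,000 cm


def ground_distance_km(map_cm, scale_denominator):
    """Ground distance in km for a length measured on a 1:scale_denominator map."""
    return map_cm * scale_denominator / CM_PER_KM


# The three open series map scales mentioned in the text.
for denominator in (250_000, 50_000, 25_000):
    km = ground_distance_km(1, denominator)
    print(f"1:{denominator}: 1 cm on the map is {km} km on the ground")
```

On this arithmetic, 1 centimetre on a 1:50,000 map corresponds to 0.5 km on the ground, and on a 1:25,000 map to 0.25 km.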

Although SoI has published OSM maps on its website, those maps contain glaring errors, like the exclusion of large states like Andhra Pradesh (Survey of India 2020). As SoI publishes only watermarked PDF maps, they are of little use to geographers. They are not under any open data licence, and the licensing terms are unclear. This makes it impossible to use them along with other open data sets. These issues can be resolved easily by attaching an open government licence to the data sets.

One of the biggest geographical data sets available in India today is that of the OpenStreetMap Foundation. OpenStreetMap was started by Steve Coast in 2004, initially with the aim of mapping the UK. In the UK, like in India, mapping projects (surveys) are carried out by the government, but it has failed to keep those maps open. In response to the demand for openly available maps, OpenStreetMap provides geospatial data for anybody to use and share. The maps are contributed by volunteer editors.

Not much happened on this front in India until 2008. There were very few mappers, as mapping required a global positioning system (GPS) device, which was not only expensive but also difficult to obtain. Legal cases against mappers were also not uncommon. But things have changed with the advent of smartphones with built-in GPS. They have brought about a mapping revolution in the country (Times of India 2008).

Figure 1: The State of OpenStreetMap’s Mapping of Bengaluru City in 2007 and 2020

Source: Martijn van Exel of OpenStreetMap 

 

Today, almost 12 years later, the volunteers of the OpenStreetMap Foundation have built the biggest open geographical database in India, and it is made available freely to the public. Despite achieving this impressive feat, building a geographical database is still a work in progress.  

On the other hand, OpenStreetMap still does not have access to other geographical data stuck with the government under restrictive copyright regimes. Among them, the postal index number data (widely known as PIN codes) and the associated geographical boundaries are of significant importance, and they need to be kept in the public domain. PIN codes were introduced in India by the postal department in 1972. They were brought in to simplify the manual sorting of letters and to eliminate delays in the delivery of mail. Over time, the PIN code has gained prominence as a code that represents a geographical area.

Food delivery and e-commerce firms, like Zomato, Swiggy, and Flipkart, among others, use PIN codes to determine shipping. People at large use them to denote locations. But the geographical area that a PIN code covers is still a closely guarded secret. The postal department (the government) is the legitimate owner of this data. It is impossible to add this closed data set to OpenStreetMap and make it usable until the government releases the PIN code boundaries under an open licence. OpenStreetMap, being licensed under the Open Database License (ODbL), can only receive and add data sets that are equally open in terms of licensing. Unless PIN code boundaries are released under a government open data licence, they cannot be merged or remixed with OpenStreetMap data. Issues such as these are an impediment to keeping geographical data sets open for public use.

India’s rapid urbanisation has been a challenge to the mapping exercise. The geographical reality of the country changes every day, and it needs to be reflected in data sets on a daily basis. OpenStreetMap has been able to counter this challenge with its widely distributed voluntary geographers. Although SoI has tried its hand at crowdsourcing maps, it lacks community management, as it does not encourage community participation due to the closed copyright regime (Survey of India nd). In the face of these challenges, voluntary organisations such as OpenStreetMap have been successfully meeting the need for open data.

Village Boundaries 

India is a country of villages. Different government sources give different figures for the number of villages in India. As per Census 2011, there are 6,49,481 villages in India. Every village has an official geographical boundary. But the country still lacks a single, complete, open source of village boundaries. Even where some government departments have published them, they are often in a format that cannot be used by the open data community, or they do not have the attributes required to map other data sets.

The open data community has tried to resolve issues on this front. The issues concerning village boundaries are far worse than those concerning the boundaries of parliament constituencies, given the sheer size of the task and the lack of any information. The DataMeet community has been working on this project since 2016.

Four years into this project, in 2020, the DataMeet community has been able to find and digitise village boundaries in only nine states: Karnataka, Sikkim, Rajasthan, Odisha, Maharashtra, Kerala, Gujarat, Goa, and Bihar. It has been able to do this with the contribution of data sets from a few non-profit organisations, like the Centre for Interdisciplinary Studies in Environment and Development. The progress of the project has been slow due to the lack of data in the public domain and of the requisite funds.

The effort to digitise and publish information on 6,49,481 villages in India needs a government willing to be liberal about open data policy, allowing researchers and private firms to use the data. Given the humongous nature of the task, it cannot be achieved by a few voluntary organisations. There needs to be a proactive approach from the government in terms of funds, policy, and resources.  

The Road Ahead

The copyright regime and the general lack of transparency from data-rich departments are hurting the open data scene in India and, consequently, the economic progress of the country. Numerous start-ups, like food delivery and e-commerce firms, rely heavily on open data. Relief and rehabilitation operations during disasters are also affected: during the floods in Kerala, voluntary organisations had to quickly recreate data to support relief measures.

It is clear that voluntary organisations have been contributing immensely to the open data scene in India. However, some of the data sets can only be produced by the government due to the financial capital they demand, or due to the regulatory constraints or the natural monopoly of the government. Unless the government steps up efforts to enforce the guidelines of NDSAP, open data enthusiasts in their individual capacity and voluntary organisations cannot help data-hungry innovation in India.    
