ISSN (Print) - 0012-9976 | ISSN (Online) - 2349-8846


A Framework for a Securities Market Database in India

This paper presents a framework for developing a reliable, transparent and flexible network to disseminate Indian financial market data and analysis to the global financial community. The analysis that this network would support is valuable to policymakers, academics and market participants in the areas of accounting, banking, economics, finance, management and public policy. The proposed database has the following features: (a) research-friendly focus, (b) web-enabled single window access, (c) multiple data set coverage (company, industry, security, investor and institutional data), and (d) data manipulation utilities.


SPECIAL ARTICLE, Economic & Political Weekly, December 1, 2007

However, such a data infrastructure is vital for educating India’s rapidly increasing investor base as well as for generating reliable analysis to inform the Indian and global finance community. An infrastructure for finance data is already in place in other regional markets. We view our proposal as much more comprehensive, integrating data from the equity, debt and derivative markets in India. Such an undertaking is also consistent with the efforts of the recent high-level committee that proposes the transformation of Mumbai into an international finance centre. We believe that the time is right to create such a network in India.

Throughout this paper we use the term “database” to refer to a specific set of data that is internally created or made available from a vendor, organisation or government agency. We use the term “network” to refer to all the different databases, the associated software and the documentation that is created to handle them.

Following the introduction, the rest of this paper is organised into seven sections. Section 2 provides the motivation behind this project in terms of the policy initiatives it can address. Section 3 briefly surveys the evolution of such networks in the US. Section 4 describes the different types of users that we expect this network to attract. Section 5 discusses issues that arise in designing such a network from the perspective of data quality, data structure and data access. Section 6 discusses the components of the proposed network. Section 7 describes the benefits of single window access to such a network and Section 8 concludes.

2 Motivation

Finance research in India is at an early stage. Several institutions have recently launched finance journals. Many Indian finance faculty have expressed their frustration with the poor quality and reliability of Indian data.
Barua and Verma (2006) highlight the issues involved in the construction of a long time-series of Indian equity index returns. India-based scholars also despair of their inability to publish their work in western finance journals and confess that this is partly due to the perception that Indian data is unreliable and unvalidated. In addition to academic research, there are several policy-oriented studies typically commissioned by high-level expert committees which require fact-based analyses to guide their policymaking. Here again, the onus is on the researcher to obtain the data and conduct the analysis. In many cases, access to data is hampered by sensitivity concerns – although this is easily remedied by disguising the identity of market participants and/or releasing data with a lag.

The common approach to academic research in India has been to apply finance theories and techniques from western developed markets to study the Indian experience.1 While this is a legitimate direction for inquiry in an emerging market, there are also several unique Indian practices whose investigation would become feasible with this network. Below we identify some of these research areas. The purpose is not to attempt a taxonomy of Indian research/policy issues, but rather to showcase the possibilities.

2.1 Equity Markets

First, the open electronic limit order book (ELOB) of the NSE makes it possible to observe the entire demand and supply curve for traded securities – one not available in any other market in the world. Understanding the shape of these curves has important implications for the pricing of financial assets and the efficiency of markets. Studies using this information would enable a better assessment of investor tolerances for risk and enable the identification of behavioural biases that potentially impact their investment activity.
Second, the transaction-level data provided by the NSE also facilitates deeper inquiries into the trading activities of investors, both retail and institutional as well as domestic and foreign, with consequent implications for a wide range of regulatory and policy initiatives. We expand on one persistent policy question – the role of foreign institutional investors (FII). Monitoring, managing and encouraging FII activity involves a trade-off between the beneficial effects of the liquidity they provide on entry and the increase in volatility that can be caused in the event of their correlated exit. Some FII are classic long-term investors whose presence can exert discipline on managers and positively impact governance. Others are rapid traders whose herding behaviour and actions can destabilise markets. As Indian foreign exchange reserves rise and the appeal of India as a destination for equity investors worldwide continues to grow, one would expect the concerns about the correlated exit of investors to abate. It could also be argued that as the participation of domestic institutions in equity markets increases, price volatility concerns around FII flight are likely to be mitigated.2 Therefore, any examination of FII entry and exit should also be supplemented with the behaviour of domestic institutions at that time. Moreover, we also believe that an examination of both the short-term and long-term effects of institutional inflows and outflows from Indian equity markets is warranted. Flows that arise from periodic (and often violent) corrections in markets are of greater concern if they are permanent rather than transitory. The database set-up that we propose will permit an exhaustive investigation of these flows at the transaction level as well as using index returns, individual security returns, and extreme returns to evaluate market impacts.
Third, even the breadth of the equity ownership structure of mutual funds is not widely understood in India. An understanding of the distribution of retail investor versus institutional participation in domestic mutual funds is critical to the market development role that regulators in India have assumed.

2.2 Debt and Derivative Markets

Policymakers have spent considerable time and energy attempting to develop secondary corporate bond markets in India. The focus has been on reforming institutions and designing appropriate trading and settlement systems and this has resulted in a vibrant market in government debt. The primary corporate debt market has grown steadily with a significant portion of the debt being privately placed, ostensibly because of the costs of going public and associated disclosure norms. The secondary market has remained largely quiescent. Concerns about market breadth, institutional domination of the market and the lack of retail investor participation are discussed in the policy space with several prescriptions being suggested [see Anshuman 2004]. However, there are no reliable estimates of the outstanding stock of corporate debt nor any sense of its ownership structure
across different market participants. Few detailed studies of the financing patterns of publicly traded Indian firms are available, and there is little understanding of the range of credit ratings for corporate debt in India. All of these are critical inputs into generating appropriate reforms and all these initiatives would benefit from a data-driven analysis.

On the derivatives front, Indian market participants have embraced single-stock futures to a much larger extent than anywhere else in the world, perhaps because of some familiarity with the badla system of the past. This market provides an excellent laboratory for examining the role of futures in price discovery. As another example, in May 2007, the Reserve Bank of India introduced revised guidelines for transactions in credit default swaps, essentially limited to entities wishing to hedge credit exposure. The caution of the RBI appears almost prescient in light of the liquidity crunch in credit markets in August 2007, triggered by collapsing hedge funds in the US and Europe and spilling over into money markets. This required global central bank interventions on a large scale and aftershocks will continue to reverberate globally for some time. In light of these events, debates regarding financial reform in emerging markets, particularly in the area of asset securitisation, have already begun to take place and the likelihood of regulatory fallout appears fairly high. Our point here is that all sides of this debate would be better served if data, in this instance on swap spreads and yield spreads in global and emerging markets, were made readily available. Analysis of the short-run and long-run behaviour of such spreads and their speed of recovery from disruptions can only make better policy.

2.3 Legal and Other Areas

Data from NSDL and CCIL provide a wealth of information that is also not easily available outside India.
Another area of inquiry is the market response to regulatory actions that are periodically pursued by the Securities and Exchange Board of India (SEBI). How seriously does the market view these actions? Are repeat violators subject to harsher market penalties for non-compliance? Employing data to investigate such questions will potentially lead to the design of more efficient enforcement mechanisms. We are confident that the network we propose would easily attract interested scholars from around the world to participate in mutually beneficial activities.

3 Global Perspective on Data Networks

In the US in the 1950s, academic finance was a collection of heuristics that market participants used to make investment decisions. Four economists at the University of Chicago were approached by Merrill Lynch and Associates to design, construct and maintain a database of security prices. This entity, named the Center for Research in Security Prices (CRSP), essentially spawned modern empirical finance as we know it today. One of the original four economists, Eugene Fama, is widely believed to be in contention for the Nobel prize in economics.3

As interest in finance evolved in western economies, information providers generated data to initially service the money management industry, with company-specific information, analyst research reports, and regulatory filings all being made available in electronic form to subscribers. Data for research use was difficult to obtain and, when available, not easy to handle. Despite these limitations, research contributions in finance continued to shape public policy and guide market surveillance. Two notable recent examples are:

(a) The finding of Christie, Harris and Schultz (1994) that market makers on the NASDAQ avoid odd-eighth quotes, thereby systematically increasing investor trading costs. The study was pivotal in moving financial markets towards decimal prices.
(b) Lie (2005) and Lie and Heron (2007) identify the widespread practice of falsifying the dates on which options were issued to corporate executives (backdating) to boost the value of their equity holdings. This has resulted in several criminal indictments and changes to the governance process underlying executive compensation in the US.4

In the US, commonly used databases are CRSP, S&P Compustat (firm-level accounting data), the NASDAQ and the NYSE’s transactions and quotes (TAQ) database, Zacks/IBES (analyst information) and the SEC’s EDGAR (company filings) database. For academic users, the Wharton Research Data Services (WRDS) at the University of Pennsylvania integrates several of these products and delivers data in a format of the user’s choice – spreadsheets, SAS data sets, or flat ASCII files.

The Pacific Basin region has a database called PACAP that offers features similar to the one we propose. This database covers data from nine regions: China (and Hong Kong), Indonesia, Japan, Korea, Malaysia, Singapore, Taiwan and Thailand. More recently, the Chinese authorities have taken up the task of database creation on a war-footing. Tsinghua University has created the China Financial Research Database, which provides a platform for users to access equity market and audited company financial information of Chinese companies. At present only company-level and equity market data are available. It is anticipated that this single initiative is going to exponentially increase research on Chinese firms in the next decade.

4 Potential Users of the Network

There are at least three broad categories of users that would benefit from being able to access the network we propose. The first are scholars in economics and finance at academic institutions both in India and overseas who would use the network to complement education and research activities.
On the education front, the network would provide critical supporting data to train students and market professionals and contribute towards increasing investor education and literacy. On the research front, data of this sort will facilitate a fact-based approach to policymaking and indeed contribute proactively to a broader policy agenda.

A second group of users for this network are decision-makers at government agencies such as SEBI, RBI, the ministry of finance, and the Insurance Regulatory and Development Authority (IRDA), to name a few. The most common concern articulated by individuals in the policy space is the dearth of fact-based analysis that would enable them to make more informed decisions and recommendations.5 Policymakers wishing to craft processes and procedures consistent with liberalisation goals as well as others

[Figure: classification of security holders. Node labels recovered from the figure: All holders; Promoters; Non Promoters; Indian; Foreign; Institution; Non Institution; Individual; Mutual funds; Corporate; Government; FII; Institutional; Bank; Other; Many others.]
between different units can become very complex. The challenge is to design an optimal framework of data structures that provides the best mode of data extraction. We illustrate this challenge by asking a sequence of incrementally more specific questions in the next section.

Illustrating the Importance of Data Linkages: First consider an effort to generate the aggregate statistics on FII activity reported on the SEBI web site. This data is presently available from the NSE as two files, one with all the transactions on the exchange in a particular month, and another with a list of client codes for all traders (including FII) trading in that month. If these were stored in a relational database with the client code as the common indexing variable, then this query would involve taking a trade from the transactions file and checking it for FII status on the client codes file. Whenever there is a match, the associated price and volume are multiplied and then summed up over the month to generate the requisite result.

Next consider a slightly more specific question. Suppose one wants to determine whether FII are more active in large-capitalisation high-dividend paying stocks in a given year. In addition to the two files above, this requires data on the universe of stocks (indexed by company code) along with their associated market capitalisations and dividend payments. Then these would have to be sorted into (say) quintiles by market capitalisation and then, within it, by dividend payout. Assuming that these are built into the relational database, one queries it again, but this time based on two indexing variables – the company code and the FII client code – before extracting the required information.

Suppose we modify the question still further and ask whether the hypothesised FII strategy of being active in large-capitalisation, high dividend paying stocks is profitable.
Such a question is more akin to a simple research investigation. In addition to the first two steps above, answering this question requires that historical returns data on the individual stocks be obtained for some chosen period, which in turn requires accessing the appropriate database of prices and converting them to returns.6 Then, portfolios of stocks in the different capitalisation and dividend quintiles must be created. Then a model that measures the normal risk-adjusted rate of return for such stocks must be estimated. Only then can we determine whether the strategy generates profits beyond what is “normal”.

The point of this illustration is to emphasise that there are complex relationships between the data that different users will want to exploit.7 If the end-users have a set of standardised queries, one can design the database to accommodate specific relationships across data files while minimising data access times. This would satisfy a large set of commercial users such as company analysts. Research queries tend to be much more complex and often involve imposing several conditions (such as data for large-capitalisation and large dividend paying companies only) on the data being extracted. Moreover, a good research design calls for these conditions to be varied within the same research question. Therefore, by their very nature, researchers’ queries cannot be anticipated beforehand. However carefully the database is designed, it is simply not credible to expect that all possible relationships would have been anticipated at the creation stage. This is tantamount to anticipating all possible research questions that could ever emerge!

In general, users can be classified along the dimension of query complexity. Commercial users have queries with low complexity. Academic researchers have queries with high complexity. In between there is a wide range of users with varying degrees of query complexity.
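The data manipulation steps in the illustration above (converting prices to returns, then sorting stocks by market capitalisation and, within each group, by dividend payout) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the company codes and all the numbers are hypothetical, the rank-based grouping rule is one of several possible conventions, and the final step of estimating a "normal" risk-adjusted return is deliberately left out.

```python
def simple_returns(prices):
    """Convert a price series into simple period-on-period returns."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


def assign_groups(values, n_groups=5):
    """Rank keys by value and map each to a group 0..n_groups-1 (0 = smallest)."""
    ranked = sorted(values, key=values.get)
    return {key: (rank * n_groups) // len(ranked) for rank, key in enumerate(ranked)}


def double_sort(market_cap, dividend_payout, n_groups=5):
    """Group stocks by market capitalisation, then by dividend payout within each group."""
    cap_group = assign_groups(market_cap, n_groups)
    portfolios = {}
    for group in range(n_groups):
        # Dividend payouts of the stocks that fall in this capitalisation group.
        members = {code: dividend_payout[code]
                   for code, g in cap_group.items() if g == group}
        if not members:
            continue
        for code, div_group in assign_groups(members, n_groups).items():
            portfolios[code] = (group, div_group)
    return portfolios


# Hypothetical data: company codes, capitalisations and payouts are invented.
market_cap = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}
dividend_payout = {"A": 0.05, "B": 0.01, "C": 0.02, "D": 0.08}

# Two groups instead of quintiles, only because the toy universe has four stocks.
portfolios = double_sort(market_cap, dividend_payout, n_groups=2)
print(portfolios)
print(simple_returns([100.0, 110.0, 99.0]))
```

Each stock is mapped to a pair (capitalisation group, dividend group), so the large-capitalisation, high-dividend portfolio is simply the set of stocks with the highest value in both positions. Varying `n_groups` or the sorting variables is a one-line change, which is the kind of flexibility a research user would typically want.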
The time to access data depends on the degree to which relationships have already been incorporated in the database design. For queries with low complexity, time to access data can be minimal. For more complex queries, the time to access data also depends on the extent to which the database design provides for data manipulation capabilities because it is just not possible to pre-specify all possible relationships (given current levels of computing power). For researchers, data manipulation capabilities can significantly reduce the effective time to access data.

The optimal framework for designing financial databases should reflect the trade-offs between specifying relationships ex ante versus providing data manipulation flexibility. For academics, the trade-off should be tilted in favour of data manipulation capabilities. This can be achieved by specifying a basic set of relationships between data units and then using programming to manipulate data. In short, what is required is a minimal form of a relational database system that stores data in a way that is consistent with the requirements of a programming environment.

5.3 Data Storage and Retrieval

It is reasonable to expect considerable variation in terms of the types of data that end-users will require as well as in their level of computer literacy. For educational purposes, extracting aggregate data with some graphing capabilities found on spreadsheets will mostly suffice. Other end-users will require data simultaneously from several databases. One can also expect considerable variation in computer literacy. End-users will range from novices with little knowledge of computers and databases to empirical researchers with a variety of data manipulation skills. For novice end-users, the data retrieval method of choice is the relational database driven by a query language like SQL.
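As an illustration of the relational approach, the first query discussed earlier (aggregate FII activity for a month) can be expressed as a single SQL join on the client code. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, the `is_fii` flag and all the sample rows are assumptions made for illustration and do not describe the actual layout of the NSE files.

```python
import sqlite3


def fii_monthly_turnover(transactions, client_codes):
    """Sum price * volume over all trades whose client code is flagged as FII."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table per file, linked by client_code.
    conn.execute(
        "CREATE TABLE transactions ("
        "client_code TEXT, company_code TEXT, price REAL, volume INTEGER)"
    )
    conn.execute(
        "CREATE TABLE client_codes (client_code TEXT PRIMARY KEY, is_fii INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", transactions)
    conn.executemany("INSERT INTO client_codes VALUES (?, ?)", client_codes)

    # Match each trade to its client record, keep FII trades, and aggregate.
    row = conn.execute(
        "SELECT COALESCE(SUM(t.price * t.volume), 0.0) "
        "FROM transactions AS t "
        "JOIN client_codes AS c ON t.client_code = c.client_code "
        "WHERE c.is_fii = 1"
    ).fetchone()
    conn.close()
    return row[0]


# Invented sample rows: (client_code, company_code, price, volume).
transactions = [
    ("C001", "AAA", 100.0, 10),
    ("C002", "BBB", 200.0, 5),
    ("C001", "BBB", 50.0, 4),
]
# Invented client list: (client_code, is_fii).
client_codes = [("C001", 1), ("C002", 0)]

print(fii_monthly_turnover(transactions, client_codes))
```

A pre-defined query of this kind is quick to run because the relationship it needs (trade to client) is already stored in the database. A question that conditions on further variables would require additional tables and joins to have been set up in advance.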
The needs of the research end-user have tended to be met via the creation of large “flat” files, i e, files that have a minimal core set of relationships embedded in them, and permit other kinds of relationships to be created at the user’s discretion. Some simple access programmes enable speedy delivery of data, with the presumption that the end-user will process it further. The challenge to any network design is to serve as many different types of end-users and end-uses as possible. Expanding the set of relationships (indexing variables) embedded in an SQL-type relational database greatly increases its size, slows retrieval speeds and may not serve some users at all. Flat file storage on the other hand calls for considerable programming experience. While the SQL-type relational database system is ideal for quick access to pre-defined problems, it may actually result in slower access for researchers because of its static nature. The research community views data manipulation capabilities as a key

Equity prices

Bond prices

Money market instruments, prices and yields

Corporate filings

Initial public offerings

Executive compensation

Mutual funds data

Ownership structure (equities, debt)

Depository data

Corporate actions

Private equity and venture capital

Foreign exchange data

Derivative markets data

Market surveillance


Macro economic data

Micro-structure, tick-by-tick data

ajayshahblog.blogspot.com/2006/09/public-policy-research-in-india.html.
