Write a statistics paper on the following prompt: "For the research paper, you have the freedom to choose any topics related to the concepts, theories, applications or the successful implementation stories related to business analytics (and/or big data)."
ENTROPY ESTIMATION AND APPLICATIONS-
Estimating the entropy of molecules is an important problem in the molecular sciences. A method commonly used by molecular scientists assumes that the internal molecular coordinates follow a multivariate normal distribution. For the multivariate normal distribution, we have proposed various estimators of entropy and established their optimality properties. The multivariate normal assumption is adequate when the molecule is studied at a low temperature, where fluctuations in the internal coordinates are small. At higher temperatures, however, it is inadequate, because the distributions of the dihedral angles become multimodal and skewed. Moreover, many internal coordinates, notably the dihedral angles, are circular variables, so the assumption of multivariate normality is inappropriate for them. A nonparametric approach based on circular statistics is therefore desirable for estimating entropy. We have adopted such a circular nonparametric approach for estimating the entropy of a molecule, and this approach is attracting growing attention among molecular scientists.
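As a rough illustration of the two settings described above, the sketch below computes (a) the plug-in Gaussian entropy H = (d/2)(1 + log 2π) + (1/2) log|Σ|, obtained by substituting the sample covariance for Σ, and (b) for a single dihedral angle, a leave-one-out resubstitution entropy estimate built on a von Mises kernel density estimate. This is a minimal sketch under stated assumptions, not the estimators proposed in this work: the function names, the von Mises kernel and the concentration `kappa = 20.0` are illustrative choices.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of N(mu, cov): d/2*(1 + log 2*pi) + 1/2*log|cov|."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)  # numerically stable log-determinant
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + 0.5 * logdet

def plugin_gaussian_entropy(x):
    """Plug-in estimate: insert the unbiased sample covariance of x (n rows, d columns)."""
    x = np.asarray(x, dtype=float)
    return gaussian_entropy(np.cov(x, rowvar=False))

def circular_kde_entropy(theta, kappa=20.0):
    """Leave-one-out resubstitution entropy estimate (nats) for one angle in radians.

    Hypothetical illustration: the density at each angle is estimated with a
    von Mises kernel from all *other* angles, and H is the mean of -log f_hat.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    diff = theta[:, None] - theta[None, :]  # pairwise angular differences
    k = np.exp(kappa * np.cos(diff)) / (2.0 * np.pi * np.i0(kappa))
    np.fill_diagonal(k, 0.0)                # exclude each point's own kernel
    f_hat = k.sum(axis=1) / (n - 1)
    return -np.mean(np.log(f_hat))
```

Because the kernel depends on the angle only through cos(θ_i − θ_j), the density estimate automatically respects the wrap-around at ±π, which a Gaussian model of the raw angle does not. For angles uniform on the circle the second estimate should be close to log 2π ≈ 1.838 nats.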
DATA MINING IN FINANCE-
Economic globalization and the evolution of information technology have recently led to huge volumes of financial data being generated and accumulated at an unprecedented pace. Using these massive amounts of financial data effectively and efficiently, through automated, data-driven analysis and modelling, is of critical importance for strategic planning, investment, risk management and other decision-making goals. Data mining techniques have been used to extract hidden patterns and to predict future trends and behaviours in financial markets. Data mining is an interdisciplinary field that brings together techniques from machine learning, pattern recognition, statistics, databases and visualization to extract information from such large databases. Mining such data, especially high-frequency financial data, typically requires advanced statistical, mathematical and artificial intelligence techniques. Solving complex financial problems with wavelets, neural networks, genetic algorithms and computational statistics is thus an active area of work for researchers and practitioners.
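To make the mention of wavelets concrete, here is a minimal sketch of wavelet denoising of a series, using the Haar wavelet and Donoho and Johnstone's universal soft threshold σ√(2 log n), with σ estimated from the median absolute deviation of the finest-level detail coefficients. The Haar basis, the threshold rule and the function names are illustrative assumptions, not a method described in this document, and the sketch requires the series length to be divisible by 2**levels.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x, levels):
    """Orthonormal multi-level Haar transform; returns (coarse approx, details finest-first)."""
    approx = np.asarray(x, dtype=float)
    if approx.size % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)  # high-frequency detail
        approx = (even + odd) / SQRT2          # low-frequency trend
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest level to the finest."""
    x = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * x.size)
        out[0::2] = (x + d) / SQRT2
        out[1::2] = (x - d) / SQRT2
        x = out
    return x

def wavelet_denoise(x, levels=3, threshold=None):
    """Soft-threshold all detail coefficients, then reconstruct the series."""
    approx, details = haar_dwt(x, levels)
    if threshold is None:
        # Universal threshold; noise scale from the MAD of the finest details.
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    return haar_idwt(approx, shrunk)
```

Because the transform is orthonormal, it preserves the sum of squares and is exactly invertible, so all of the smoothing comes from shrinking small detail coefficients, which are dominated by noise.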
RANKING AND SELECTION PROBLEMS-
In the 1950s, statistical inference problems were first formulated in the now-familiar "Ranking and Selection" framework. Ranking and selection problems broadly deal with ordering several populations according to unknown parameters associated with them. We deal with the following aspects of ranking and selection problems:
1. Obtaining optimal ranking and selection procedures using a decision-theoretic approach;
2. Obtaining optimal ranking and selection procedures under heteroscedasticity;
3. Simultaneous confidence intervals for all distances from the best and/or worst populations, where the best (worst) population is the one corresponding to the largest (smallest) value of the parameter;
4. Estimation of ranked parameters when the ranking of the parameters is not known a priori;
5. Estimation of (random) parameters of the populations selected using a given decision rule for ranking and selection problems.
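A small Monte Carlo sketch can illustrate the basic decision rule and why aspect 5 above is a genuine estimation problem. It assumes normal populations with a common known standard deviation and uses the natural selection rule (choose the population with the largest sample mean); it then estimates the probability of correct selection and the bias of the naive estimator, namely the sample mean of the selected population. The function names and parameter values are hypothetical.

```python
import numpy as np

def natural_selection(samples):
    """Select the population (row) with the largest sample mean; return (index, mean)."""
    means = samples.mean(axis=1)
    i = int(np.argmax(means))
    return i, means[i]

def simulate_selection(mu, sigma, n, reps, rng):
    """Monte Carlo estimates of P(correct selection) and of the bias of the
    naive estimator (sample mean of the selected population) for normal data."""
    mu = np.asarray(mu, dtype=float)
    best = int(np.argmax(mu))  # ties: the first largest mean counts as "best"
    correct = 0
    bias = 0.0
    for _ in range(reps):
        x = rng.normal(mu[:, None], sigma, size=(mu.size, n))  # k populations, n obs each
        i, xbar = natural_selection(x)
        correct += int(i == best)
        bias += xbar - mu[i]  # error of the naive estimate for the *selected* population
    return correct / reps, bias / reps
```

When all means are equal, the selected sample mean is the maximum of k noisy averages, so the naive estimator is biased upward (by roughly 0.33 for k = 4, σ = 1, n = 10). This is why estimating parameters after selection needs dedicated procedures.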
STEP STRESS MODELLING-
Traditionally, life-data analysis involves analysing time-to-failure data obtained under normal operating conditions. However, such data are difficult to obtain, owing to the long durability of modern-day products, the short time between designing, manufacturing and releasing products to the market, and similar constraints. Given these difficulties, and the ever-increasing need in today's competitive environment to observe product failures in order to better understand failure modes and life characteristics, methods have been devised to force products to fail more quickly than they would under normal use conditions. Various models and methods have been developed for such "accelerated life testing" (ALT). Step-stress modelling is a special case of ALT in which the levels of one or more stress factors are changed during a life-testing experiment according to a pre-specified design. The failure times, observed as order statistics, are used to estimate the parameters of the lifetime distribution under normal operating conditions. This requires a model relating each stress level to the parameters of the failure distribution at that level. The difficulty of the estimation procedure depends on several factors, such as the lifetime distribution and its number of parameters, whether the data are complete or censored (Type I, Type II, hybrid, progressive, etc.), and whether non-Bayesian or Bayesian estimation is used.
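The sketch below works through the simplest version of this setup, under assumptions that the text does not specify: a simple step-stress test with two stress levels, stress changed at time `tau`, exponential lifetimes with means θ₁ and θ₂ under the cumulative exposure model, and Type-I censoring at `censor_time > tau`. In this case the maximum likelihood estimates have the closed form θ̂ₖ = Uₖ / nₖ, where nₖ is the number of failures at level k and Uₖ is the total time on test at that level. To extrapolate to normal use, a log-linear link log θ = α + β·s is assumed as one illustrative stress-life model. All names and values are hypothetical.

```python
import numpy as np

def simulate_simple_step_stress(n, tau, theta1, theta2, censor_time, rng):
    """Exponential lifetimes under the cumulative exposure model; by memorylessness,
    a unit surviving to tau continues with an Exp(theta2) residual life."""
    e1 = rng.exponential(theta1, n)
    e2 = rng.exponential(theta2, n)
    t = np.where(e1 < tau, e1, tau + e2)
    observed = t <= censor_time  # False means Type-I censored at censor_time
    return np.minimum(t, censor_time), observed

def step_stress_mle(t, observed, tau, censor_time):
    """Closed-form MLEs (theta1_hat, theta2_hat) = (U1/n1, U2/n2)."""
    if censor_time <= tau:
        raise ValueError("censor_time must exceed tau")
    t = np.asarray(t, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    n1 = np.sum(observed & (t < tau))   # failures at stress level 1
    n2 = np.sum(observed & (t >= tau))  # failures at stress level 2
    if n1 == 0 or n2 == 0:
        raise ValueError("the MLE needs at least one failure at each stress level")
    u1 = np.minimum(t, tau).sum()        # total time on test at level 1
    u2 = np.maximum(t - tau, 0.0).sum()  # total time on test at level 2
    return u1 / n1, u2 / n2

def extrapolate_log_linear(s1, theta1, s2, theta2, s0):
    """Fit log(theta) = alpha + beta*s through two stress levels; predict theta at s0."""
    beta = (np.log(theta2) - np.log(theta1)) / (s2 - s1)
    alpha = np.log(theta1) - beta * s1
    return np.exp(alpha + beta * s0)
```

For example, with θ̂₁ ≈ 2 at stress 1 and θ̂₂ ≈ 0.5 at stress 2, the assumed log-linear link predicts a mean lifetime of about 8 at stress 0. Other lifetime distributions, censoring schemes or Bayesian estimation generally lose this closed form, which is where the difficulty described above arises.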