Forecasting can rely on formal statistical methods applied to time series, cross-sectional or panel data, or alternatively on less formal judgmental methods.
For instance, Time Series Analysis can be used to forecast data over time, and Cross Sectional Data Analysis is helpful for comparing data from different places at a single point in time.
To simplify the theory, let’s use a wildlife example. Wild animals are disappearing from planet Earth for many natural and man-made reasons, which is a matter of great concern.
Example 1: To find out whether the Lion population of a specific region has increased or decreased over the last five years, Time Series Analysis is the natural choice for forecasting.
Example 2: To study the population of Lions in different regions during the same time frame, we will use Cross Sectional Analysis.
Example 3: The major problem arises when one tries to study the population of Lions over the past 5 years across different regions. In this instance the data has to be pooled across both regions and years, which gives Panel Data.
Simply put: Panel Data Analysis is a combination of Cross Sectional & Time Series Analysis.
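To make the three examples concrete, here is a minimal sketch in base R. The lion counts, region names and years are made up purely for illustration; they are not real survey data.

```r
# Hypothetical lion counts (made-up numbers, for illustration only)
lions <- data.frame(
  region = rep(c("North", "South", "East"), each = 5),
  year   = rep(2015:2019, times = 3),
  count  = c(120, 118, 115, 110, 108,
              80,  82,  85,  83,  86,
              60,  55,  52,  50,  47)
)

# Time series: one region followed over several years (Example 1)
time_series <- subset(lions, region == "North")

# Cross section: several regions observed in the same year (Example 2)
cross_section <- subset(lions, year == 2019)

# Panel: every region observed in every year (Example 3)
panel_dims <- c(regions = length(unique(lions$region)),
                years   = length(unique(lions$year)))
print(panel_dims)
```

The full data frame is the panel; the time series and the cross section are just slices of it along one dimension.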
To analyse data of this kind, we need specific tools. Here, I am sharing some specific code and commands for R, run in RStudio. The commands use the plm package, plus the car package for the scatterplot.
Two widely used techniques to analyse Panel Data are:
  1. Fixed Effect Models
  2. Random Effect Models
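The idea behind the two models can be sketched in base R. A Fixed Effect Model gives every unit (country, region, ...) its own intercept, while a Random Effect Model treats those unit-specific effects as random draws that are assumed to be uncorrelated with the regressors. The sketch below uses simulated data (all names and numbers are assumptions for illustration) to show that the fixed effects slope can be obtained either by subtracting unit means (the "within" transformation) or by adding one dummy variable per unit.

```r
set.seed(1)  # simulated data, for illustration only
n_units <- 5
n_periods <- 10
unit  <- rep(seq_len(n_units), each = n_periods)
alpha <- rep(rnorm(n_units, sd = 2), each = n_periods)  # unit-specific intercepts
x <- rnorm(n_units * n_periods) + alpha                 # regressor correlated with alpha
y <- alpha + 0.5 * x + rnorm(n_units * n_periods, sd = 0.1)  # true slope is 0.5

# Within transformation: subtract each unit's mean from y and x
y_dm <- y - ave(y, unit)
x_dm <- x - ave(x, unit)
beta_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

# Equivalent least-squares dummy variable (LSDV) regression
beta_lsdv <- coef(lm(y ~ x + factor(unit)))[["x"]]

# Pooled OLS ignores the unit intercepts, so its slope is biased here
beta_pooled <- coef(lm(y ~ x))[["x"]]

print(c(within = beta_within, lsdv = beta_lsdv, pooled = beta_pooled))
```

Because the simulated regressor is correlated with the unit intercepts, this is a case where the random effects assumption would not hold and fixed effects is the safer choice.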
Example: I use the Gasoline data set that ships with the plm package; it records gasoline consumption for 18 OECD countries over the years 1960 to 1978.
The first step is data extraction; once the data is extracted, programming can be done to derive the Fixed Effect & Random Effect Models.
The programming is given below:
library(plm) # panel data models; also provides the Gasoline data set
library(car) # provides scatterplot()
data("Gasoline", package = "plm")
Gasoline # print the data
names(Gasoline) # column names
# lgaspcar by year for each country (older versions of car use reg.line instead of regLine)
scatterplot(lgaspcar ~ year | country, boxplots = FALSE, smooth = TRUE, regLine = FALSE, data = Gasoline)
ols <- lm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = Gasoline)
summary(ols)
fixed <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = Gasoline, index = c("country", "year"), model = "within")
summary(fixed)
fixef(fixed) # Display the fixed effects (constants for each country)
pFtest(fixed, ols) # Testing for fixed effects, null: OLS better than fixed
random <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = Gasoline, index = c("country", "year"), model = "random")
summary(random)
We need to review the results to find out which model is better suited. The Hausman test compares the two:
phtest(fixed, random)
“If the p-value is significant (for example <0.05), then use Fixed Effects. If not, then use Random Effects.”
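The decision rule quoted above can be sketched as a few lines of R. This is only an illustration: it assumes the plm package is installed, and the 0.05 threshold and the variable names (form, hausman, alpha, chosen) are choices made for this sketch.

```r
library(plm)  # assumed to be installed

data("Gasoline", package = "plm")
form <- lgaspcar ~ lincomep + lrpmg + lcarpcap

fixed  <- plm(form, data = Gasoline, index = c("country", "year"), model = "within")
random <- plm(form, data = Gasoline, index = c("country", "year"), model = "random")

# Hausman test; the result is an "htest" object carrying the p-value
hausman <- phtest(fixed, random)

alpha  <- 0.05  # illustrative significance level
chosen <- if (hausman$p.value < alpha) "fixed" else "random"
cat("Hausman p-value:", format.pval(hausman$p.value),
    "-> use", chosen, "effects\n")
```

A cut-off such as 0.05 is a convention rather than a rule, so it is worth looking at the estimates from both models as well as the test result.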
# Regular OLS (pooling model) using plm
pool <- plm(lgaspcar ~ lincomep + lrpmg + lcarpcap, data = Gasoline, index = c("country", "year"), model = "pooling")
summary(pool)
Similarly, other tests can be used to check which model is better:
# Breusch-Pagan Lagrange Multiplier for Random Effects. Null is no Panel Effect (i.e. OLS is better).
plmtest(pool, type=c("bp"))
Data Science will remain the sexiest job of the 21st century despite the problems data scientists face with data preparation. Ultimately, it is up to the user to decide which forecasting method suits their problem.
Panel data has lately been widely used in:
  1. Determinants of public expenditure on health
  2. Health and Growth
     A. Childhood mortality
     B. Economic growth
     C. Effect of income on child health, etc.
  3. Determinants of Investment Pattern in different states for different years
  4. Rural Poverty Study
Thank you,
Parita Dave
Data Analyst Recruiter @ Apidel Technologies