“TIME” is the single most powerful parameter that defines and refines human life. From discoveries to disasters, monumental achievements to minuscule efforts, time stamps its mark on every event recorded in history.
So, to use the ‘Time’ component effectively in predictions, the entire historical data should be divided into multiple homogeneous groups known as ‘cohorts’.
A cohort is defined as the aggregate of individuals who experienced the same event within the same time interval.
A cohort (c_t) can be defined as a function of a homogeneous group and other parameters:
c_t = f(g_t, p_t), for t = 1 to N (N cohorts)
where c_t = the cohort identifier
g_t = the homogeneous group of people
p_t = the other parameters
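As a rough illustration of the definition above, the sketch below groups records into cohorts by the time interval in which an event occurred, plus one grouping attribute. The field names (`id`, `event_date`, `segment`), the monthly interval, and the `build_cohorts` helper are all assumptions made for this example; they are not taken from any particular dataset or library.

```python
from collections import defaultdict
from datetime import date


def build_cohorts(records, group_key="segment"):
    """Group record ids into cohorts keyed by (year, month, group value).

    The monthly interval and the 'segment' attribute are illustrative
    assumptions; any interval or homogeneous grouping could be used.
    """
    cohorts = defaultdict(list)
    for rec in records:
        event = rec["event_date"]
        key = (event.year, event.month, rec[group_key])
        cohorts[key].append(rec["id"])
    return dict(cohorts)


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "event_date": date(2020, 1, 5), "segment": "retail"},
    {"id": 2, "event_date": date(2020, 1, 20), "segment": "retail"},
    {"id": 3, "event_date": date(2020, 1, 9), "segment": "corporate"},
    {"id": 4, "event_date": date(2020, 2, 2), "segment": "retail"},
]

cohorts = build_cohorts(records)
for key, members in sorted(cohorts.items()):
    print(key, members)
```

Here each key plays the role of the cohort identifier c_t, while the grouping attribute stands in for g_t. Any additional parameters (p_t) could be added to the key in the same way.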
Now, to build a forecasting model, we need to:
- gather the historical data on which cohorts are built
- identify which cohorts are applicable (fit for purpose) and which are not *
- perform a series of iterations to arrive at the final set of ‘cohorts’ **
- use the right cohorts to build an effective model and forecast future behavior
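The include/exclude step and its iterative nature can be sketched as follows. The post does not say what makes a cohort ‘fit for purpose’, so the minimum-size rule used here is purely a placeholder assumption, as are the `select_cohorts` helper and the sample data.

```python
def select_cohorts(cohorts, min_size=2):
    """Split cohorts into (included, excluded).

    The 'at least min_size members' rule is a hypothetical stand-in for
    whatever fit-for-purpose analysis is actually applied.
    """
    included, excluded = {}, {}
    for key, members in cohorts.items():
        target = included if len(members) >= min_size else excluded
        target[key] = members
    return included, excluded


# Hypothetical cohorts: key -> member ids.
cohorts = {
    ("2020-01", "retail"): [1, 2, 5],
    ("2020-01", "corporate"): [3],
    ("2020-02", "retail"): [4, 6],
}

included, excluded = select_cohorts(cohorts, min_size=2)

# Iterating over candidate thresholds shows how the final set changes
# as the selection parameter is varied.
kept_per_threshold = {
    m: len(select_cohorts(cohorts, min_size=m)[0]) for m in (1, 2, 3)
}
print(kept_per_threshold)
```

In practice each iteration would re-examine the cohorts against the other parameters and refit the model, rather than only varying a single threshold.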
These ‘cohorts’ will then be used in different modeling techniques, e.g.:
- a) Market Basket Analysis – generating association rules using the ‘Apriori’ algorithm or the ECLAT (Equivalence Class Transformation) algorithm
- b) Linear and Logistic Regression – forecasting techniques, time-series modeling, etc.
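To give a feel for the market basket technique mentioned above, here is a minimal, unoptimized sketch of the level-wise Apriori idea in plain Python. It finds frequent itemsets and computes the confidence of one association rule. The transactions, the `min_support` value, and the function names are illustrative assumptions. A real analysis would more likely use an established library and would be run on the transactions of each selected cohort.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    # Start with all single-item candidates.
    candidates = {frozenset([item]) for b in baskets for item in b}
    result = {}
    k = 1
    while candidates:
        # Keep only candidates whose support meets the threshold.
        level = {}
        for cand in candidates:
            support = sum(1 for b in baskets if cand <= b) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Hypothetical transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

itemsets = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence(itemsets, {"bread"}, {"milk"})
print(len(itemsets), round(rule_conf, 3))
```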
Stay tuned for the next installment in the KYC series, which will cover modeling techniques.
* Once the initial cohorts are built, they are analyzed to see whether they are fit for prediction; based on that, they are either included in or excluded from the process.
(To eliminate ‘bias’ in the modeling process, cohorts are first built and analyzed to see if they are ‘fit for purpose’.)
** This is an iterative process, and the other parameters play a significant role in building the final set of ‘cohorts’, which eventually yields a predictive model that is ‘optimal’ and probably the best fit.
Author – Santanu Mukherjee