Modeling conversion of Television advertisement for Fast Moving Consumer Goods (FMCG) – (Viewer-to-Buyer Conversion)

There are currently more than 107 TV stations in Kenya (with just over 10 free-to-air stations dominating the market), a number that has been growing exponentially since 2001. Further, more than 80% of the country’s population has access to a television. Driven by these two factors and the growing economy, advertising revenue for broadcasters has grown threefold, from $107 million in 2007 to $359 million in 2013. Despite the money that companies invest in TV advertising, tools for measuring the return on such investments in ad spots have limited availability and exposure. Research companies have developed tools to test ads and define the qualities of a good advertisement, but none has zeroed in on estimating the conversion rate of those exposed to an advertisement, i.e. the probability that audience members are converted to buyers of the advertised product. With a special focus on fast moving consumer goods, a generalized linear model is obtained to estimate the probability of conversion from “viewers” to “buyers” for those who have been exposed to a particular TV advertisement. Data were collected for 120 residents of Nairobi, covering demographic characteristics, socio-economic status, exposure, purchase habits and motivators. A multinomial logistic model was constructed using these data, with the response being a three-level multinomial variable: “Will buy”, “Will consider buying” and “Will not buy”. Six variables significantly influence the conversion of television ad viewers to buyers: gender, income/social class, level of education, total time spent watching TV in a day, main television interest and most important feature of an advertisement. The model was validated by (a) a significance test of the overall model, (b) tests of regression coefficients, (c) goodness-of-fit measures, and (d) validation of predicted probabilities.
Three methodological issues were highlighted in the discussion: (1) the use of odds ratio, (2) the Hosmer and Lemeshow test extended to multinomial logistic models, and (3) the missing data problem. Believability and relatability of a television advertisement increase the probability of conversion by three times compared to the length/precision aspect. People with either primary or secondary education are also 3 times more likely to be converted compared to those with tertiary education.


Introduction
A general goal of regression analysis is to estimate the association between one or more explanatory variables and a single outcome variable. The outcome variable is presumed to depend on the explanatory variable(s) partly at random and partly in a way that can be systematically predicted. The explanatory variables are thought to independently affect the outcome variable; hence, they are often known as independent variables. Multiple linear models have several independent/predictor variables, which help us understand how these affect the dependent variable. Through these models we examine the impact of one variable on the dependent variable when the rest of the predictors are held constant. These simple models are built under a set of assumptions, which must be satisfied prior to fitting [1].
Social and economic data often present researchers with outcome variables that are not continuous, such as survey respondents' choices among two or more options, which require different types of non-linear transformations in order to be estimated. The Ordinary Least Squares models (simple and multiple) are rendered ineffectual and restrictive in such cases, specifically because they only cater for continuous response variables having a normal distribution, and because the relationship between the response and the predictor variables is strictly the identity function.
Generalized Linear Modeling, a flexible generalization of ordinary least squares regression, is a common technique used to obtain meaningful results in these cases, since it allows for transformations and the response variables can have a distribution other than normal. These models help us accommodate binary, ordered and multinomial dependent variables, count data and positive-valued continuous distributions [2]. In other terms, generalized linear models extend the distribution of the response variable to the exponential family of distributions: for a random variable $Y$ with probability distribution function $f(y;\theta)$, where $\theta$ is an unknown parameter, the distribution is said to belong to the exponential family of distributions if the probability distribution function can be expressed in a generalized form. The normal, Poisson and binomial (and, by extension, multinomial) distributions can be expressed in this generalized form and belong to the exponential family.

Television Advertising
The very first television advertisement ever broadcast aired on July 1, 1941 during a baseball game on a local New York channel. The 10-second ad advertised Bulova watches and cost a mere four dollars. Due to the overwhelming success of the Bulova advertisement, other companies began to realize that they needed to jump on board with their marketing as well. By 1948, many additional advertisers were using television spots to reach the large audience that owned television sets. Television's spreading popularity merited the formation of the American Association of Advertising Agencies to regulate commercials. Television was so popular during that era that even the movie studios feared that television would dominate all other media.
Locally, in 1959, the Kenya Broadcasting Corporation was founded by the colonial government. By the end of 1962, a transmission station and recording studio had been set up, and television was officially launched the following year, running as an autonomous public corporation. It drew its revenues mostly from advertising. Fast forward to the 1990s and private television stations began as a result of expansion and modernization. Currently, there are more than 12 dominant free to air TV stations in the country, each competing for audiences and advertising dollars subsequently.
With such a large and diverse pool of stations for advertisers to choose from, it is becoming more important for them to evaluate the returns of the money and improve their placement, partly by maximizing on well thought out distribution and placement strategies. Television advertising for consumer goods is done with two general underlying objectives: a) Getting consumer's attention and drawing their attention to the advertised product, thus creating awareness. Building awareness also helps companies to reach out to prospective consumers that either add on to the pool of existing consumers or replace lost ones. b) Prompting immediate action (Call to action) -This is a strategic prompt to consumers to purchase the advertised good, subsequently increasing sales volumes, a measurable metric for Return on Investment. In Kenya, TV remains the most effective channel for advertising, even with increasing traction of the internet.
In a report by media monitoring firm Reelforge, companies in Kenya spent Sh85.8 billion on advertising in 2014, compared to Sh79.2 billion in 2013. Of this, Sh41.8 billion went to TV advertising, accounting for 42% of the total advertising revenue. The increase was partly attributed to an increase in rate cards. Further, PricewaterhouseCoopers (PwC) Kenya, a leading audit and management firm, estimates that the share of TV in advertising revenue is likely to rise to 50% by 2017 (865 million USD). Kenya is believed to have relatively higher rates of advertising agency commission: a TV ad in Kenya costs more than twice the Nigerian price for the same air time. With manufacturers spending more than 400 million USD on TV advertisements, improving how they maximize their impact is paramount. Currently, advertising agencies use qualitative and quantitative techniques to test advertisements before and after deployment. This paper uses statistically rigorous modeling to help establish the numeric chances of getting the consumers targeted by a TV ad to make purchases. This, unlike most approaches, can help provide an estimate of the expected impact and subsequent return on investment, even before launch.

Regression Analysis in TV Advertising
Modeling market response is intended to help scholars and managers understand how consumers individually and collectively respond to marketing activities, and how competitors interact. Appropriately estimated effects constitute a basis for improved decision making in marketing [3]. Much of marketing decision making is of a repetitive or tactical nature. For example, advertising expenditures, sales promotion budgets, shelf space allocations, prices, margins, etc. have to be determined for each period. The consideration of changes in decisions is facilitated by the development of ever more detailed data, whose availability also makes it easier to justify the use of econometric modeling (e.g. bimonthly audit data would not permit the estimation of deal effect curves). And the increasing frequency and amount of marketplace feedback also demands a systematic approach for data analysis. Standardized models have become important tools to improve the quality of tactical marketing decisions at functional levels such as brand management.

Problem Statement
With 85% TV penetration in Kenya and over $359 million being spent yearly on advertising, manufacturers of fast moving consumer goods and the advertising agencies have tools to test advertisements before going live, but have no rigorous, tested and definitive tools for measuring the conversion rates and probabilities for the target demographics. [4] used multinomial logistic regression to profile adolescents who were at greatest behavioral risk in Indiana University. They applied the model to a data set that they collected themselves. Results showed that gender, intention to drop out of school, family structure, self-esteem, and emotional risk were effective predictors. The model was then validated by the overall significance test, tests of regression coefficients, goodness-of-fit measures and validation of predicted probabilities. Three methodological issues were highlighted in the discussion: the use of odds ratios, the Hosmer and Lemeshow test extended to multinomial logistic models, and the missing data problem.

Literature Review
[5] lays out the benefits of Generalized Linear Models simply: linear modeling theory provides a platform for choosing appropriate linear combinations of explanatory variables to predict a response. In generalized linear models, statisticians have the added ability to widen the class of distributions to allow for handling of other types of non-normal outcomes. This broad class includes, as special cases, the normal, Bernoulli and Poisson distributions. Further, generalized linear model methods are used widely in various fields because the maximum likelihood estimators can be computed quickly through the iteratively reweighted least squares technique.
[6] used a multinomial linear regression model on a marketing data set that was further analyzed by [7]. The subjects under observation, based on Nielsen data, consist of n=100 households in Springfield, Missouri. The response of interest was the type of yogurt purchased, consisting of four brands: Yoplait, Dannon, Weight Watchers and Hiland. The households were monitored over a two-year period with the number of purchases ranging from 4 to 185; the total number of purchases was N=2,412. The two marketing variables of interest were PRICE and FEATURES. For the brand purchased, PRICE was recorded as price paid, that is, the shelf price net of the value of coupons redeemed. FEATURES was classified as a binary variable, defined to be one if there was a newspaper feature advertising the brand at time of purchase, and zero otherwise. One brand was the omitted alternative. The results obtained showed that every parameter was statistically significantly different from zero. Thus, the parameter estimates were useful when predicting the probability of choosing a brand of yogurt. Moreover, in a marketing context, the coefficients have important substantive interpretations. The results were interpreted to suggest that a consumer is 1.634 times more likely to purchase a product that is featured in a newspaper ad compared to one that is not. For the PRICE coefficient, a one cent decrease in price suggests that a consumer is 1.443 times more likely to purchase a brand of yogurt.
Market response models are intended to help scholars and managers understand how consumers individually and collectively respond to marketing activities, and how competitors interact, according to [8]. Appropriately estimated effects constitute a basis for improved decision making in marketing. In the past years market response models have diffused in the practitioners' community. Leading firms, especially in consumer goods and services, database marketing companies and traditional market research companies develop and use increasingly sophisticated models and analyses. The successful implementation of models depends on data availability, the methodology used, and other characteristics. It appears that many models appearing in the academic literature have little relation to marketing practice. Such models often deal with specific problems, are more descriptive than prescriptive, and include complexities that reduce the chance of implementation in practice. Academics are challenged to make models that can be easily integrated into real corporate problems, yielding sound solutions.
[9] imply in their research the near impossibility of measuring returns on advertising using classical theories to measure the causal impact and conversion probabilities. Their paper explains that statistical evidence from randomized trials is very weak because individual-level sales are incredibly volatile relative to the per capita cost of a campaign: a "small" impact on a noisy dependent variable can generate positive returns.
According to [10], an accurate television viewing choice model is a critical working tool for both television network executives, who face difficult programming, scheduling and marketing decisions, and advertisers, who want to get the most from their spend. [10] claim that such a model can help television executives maximize ratings by improving both the scheduling and the characteristics of their shows. In addition it can help advertisers predict ratings and the demographic composition of the audiences. Many researchers have developed such rating models [11,12,13], however, the usefulness of these models to advertisers is questionable when there is no accompanying prediction of the frequency of channel switching as an indicator of reduced advertising attention. In this study we develop a model which describes the frequency of channel switching through the use of network loyalty measures.
The demographic and socio-economic characteristics of television viewers play a pivotal role in ascertaining why certain shows are frequently watched. There has been extensive research on understanding the motivations and determinants for TV viewing from a British perspective [12]. Various researchers have over the years also looked for ways to examine the relationship between viewers and programs. [14] performed demographic comparisons for repeat viewing rates, finding that repeat viewing rates are slightly higher for women and for older people. It is therefore suggested that demographic and socio-economic effects will also influence the impact of advertisements as well as the resultant action or inaction.
Statisticians have used proportional hazard models in the past to predict online advertisement conversions. [15], however, used a syndicated approach that married the leading edge of online advertising conversion attribution (Engagement Mapping) to the proportional hazard model, eventually producing a tool that can be used to find optimal settings for advertiser models of online conversion attribution. [16] discussed three different dimension reduction approaches to fitting logistic regression models in advertising for classification purposes. For Principal Component Logistic Regression (PCLR), [16] proposes two different cross-validation methods, the R-statistic and PRESS MSEP, and the average eigenvalue rule for determining the significant set of principal components.
Further, [16] suggests that there exists no ultimate rule for choosing the number of principal components, and recommends taking three, or at least two, criteria into consideration. [16] also introduces and discusses General Partial Least Squares.

Ordinary Least Square Models
In most research projects, researchers aim to explain a random variable $Y$ with one or more explanatory variables, $X$. An outcome variable is presumed to be systematically explained by one or more predictor variables. Further, the independent variables are assumed to independently impact on the outcome variable.
The simple linear regression model has one predictor/independent variable and takes the form $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the response variable, $X$ is the predictor variable and $\varepsilon$ is the random error. The error $\varepsilon$ is a result of both measurement and non-measurement errors, and is assumed to have a normal distribution with mean $0$ and variance $\sigma^2$, i.e. $\varepsilon \sim N(0, \sigma^2)$. The variance of the error is checked for constancy and, if found not to be constant, heteroscedasticity is inferred. Heteroscedasticity must be corrected before proceeding. The multiple linear regression model is characterized by more than one predictor/independent variable and takes the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$, where $\beta_1, \beta_2, \dots, \beta_p$ are the regression parameters/coefficients. This equation is said to be linear in the parameters/coefficients and not in the variables.

Generalized Linear Models
As discussed in the preceding section, a general ordinary least squares (OLS) regression model is of the form $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i$. The response variable $y_i$, $i = 1, 2, \dots, n$, is modeled as a linear function of the predictor variables $x_j$, $j = 1, 2, \dots, p$, as well as an error term. The response variable is a continuous variable that has a normal distribution, a result of the normal distribution of the error term. Generalized Linear Models (GLM) go beyond this in two major respects: (i) The response variables can have a distribution other than normal, i.e. any distribution within the class of distributions known as the "exponential family of distributions". (ii) Instead of having $E[y_i] = \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$, we can allow for a transformation of the equation into $g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$. Generalized linear models are thus a flexible generalization of ordinary least squares regression that relates the distribution of the response variable to the systematic portion of the experiment (the linear predictor) through a function called the link function. Generalized Linear Models (GLM) have three components:

Random Component
For any random variable $Y$ with probability distribution function $f(y;\theta)$, where $\theta$ is an unknown parameter, if the probability distribution function can be expressed in the form $f(y;\theta) = \exp[a(y)b(\theta) + c(\theta) + d(y)]$, then the distribution is said to belong to the exponential family of distributions. Further, if $a(y) = y$, then the distribution is said to be in canonical form and $b(\theta)$ is known as the natural parameter. Some examples of distributions that belong to the exponential family of distributions and can be written in canonical form include the normal and Poisson distributions. Below is an illustration of how the binomial distribution and, by extension, the multinomial distribution fall into the exponential family.
The Binomial distribution
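As a sketch of this illustration, assume $Y \sim \text{Binomial}(n, \pi)$ with $n$ known, and write the exponential family in one common form, $f(y;\theta) = \exp[a(y)b(\theta) + c(\theta) + d(y)]$. The symbols $n$, $\pi$ and the functions $a$, $b$, $c$, $d$ are notation assumed here for illustration.

```latex
\begin{align*}
f(y;\pi) &= \binom{n}{y}\,\pi^{y}(1-\pi)^{n-y} \\
         &= \exp\left[\, y\log\pi + (n-y)\log(1-\pi) + \log\binom{n}{y} \right] \\
         &= \exp\left[\, y\log\frac{\pi}{1-\pi} + n\log(1-\pi) + \log\binom{n}{y} \right]
\end{align*}
```

Matching terms gives $a(y) = y$, $b(\pi) = \log[\pi/(1-\pi)]$, $c(\pi) = n\log(1-\pi)$ and $d(y) = \log\binom{n}{y}$. Since $a(y) = y$, the distribution is in canonical form and the log odds $\log[\pi/(1-\pi)]$ is the natural parameter.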

Systematic Component
This is the linear predictor part of the model. It is the quantity which incorporates the information about the predictor variables into the model.
For predictor $j$ and observation $i$, the linear predictor is $\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}$. It is related to the expected value of the data through the link function.

Link Function
This function links the systematic component to the random component. It provides for the relationship between the linear predictor and mean of the distribution function.
Let $E[Y_i] = \mu_i$ and let $g$ be a monotonic, differentiable function.
If $g(\mu_i) = \eta_i$, then $g$ is a link function. The relationship between $\mu_i$ and $\eta_i$ is in general non-linear.
For this exponential family of distributions that can be written in canonical form, the natural parameter is used as the link function. It is thus known as the canonical link function.
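A minimal Python sketch of a canonical link may help here. It assumes a Bernoulli/binomial response, for which the canonical link is the logit; the function names `logit` and `inverse_logit` are chosen for illustration and do not come from the paper's analysis.

```python
import math

def logit(mu):
    """Canonical link for a Bernoulli/binomial mean: maps (0, 1) to the real line."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse link: maps a linear predictor eta back to a mean in (0, 1)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # avoids overflow for large negative eta
    return z / (1.0 + z)

# Round trip: a mean of 0.25 corresponds to a linear predictor of log(1/3).
eta = logit(0.25)
mu_back = inverse_logit(eta)
```

The link is monotonic and differentiable, so each value of the linear predictor corresponds to exactly one mean.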
Breslow (1999) lists some of the critical assumptions that underlie generalized linear models, many of which apply to any regression model.

Logistic Regression Model
Logistic regression models are probability models whose origins can be traced to epidemiological studies. They are now commonly used in a wide range of fields including engineering, social studies, economics and finance. In contrast to ordinary least squares linear regression, where the dependent variable is continuous in nature, logistic regression deals with dependent variables that are categorical in nature, with either two (binary) or more (multinomial) categories.
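We can define a binary response variable as $Y = 1$ if the outcome is a success and $Y = 0$ if the outcome is a failure. If $\Pr(Y=1) = p$ and $\Pr(Y=0) = 1-p$, then $E[Y] = p$. $Y$ is then said to be a Bernoulli random variable. If there are $n$ such variables, $Y_1, Y_2, Y_3, \dots, Y_n$, that are independent with probabilities $\Pr(Y_i = 1) = p_i$, $\forall i$, then their joint probability distribution is given by $\prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i} = \exp\left[\sum_{i=1}^{n} y_i \log\frac{p_i}{1-p_i} + \sum_{i=1}^{n} \log(1-p_i)\right]$. This is a member of the exponential family of distributions.
Subsequently, a generalized linear model can be developed with $Y_1, Y_2, Y_3, \dots, Y_n$ as the response variables.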

Multinomial Logistic Regression
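For response variables that have no natural order among the categories, as is the case in this paper, nominal logistic regression models are used. One category is arbitrarily chosen as the reference category, sometimes with a consideration of how the researcher wants to report results. In this case, where the response variable has three categories (buys, considers buying, does not buy), the 'does not buy' category is used as the reference. The logits for the other categories are then defined by $\text{logit}(\pi_j) = \log(\pi_j/\pi_1) = \mathbf{x}^T\boldsymbol{\beta}_j$, for $j = 2, \dots, J$. (12) The $(J-1)$ logit equations are used simultaneously to estimate the parameters $\boldsymbol{\beta}_j$. In this case $J = 3$, so there are two logit equations. The linear predictors $\mathbf{x}^T\mathbf{b}_j$ can be calculated once the parameter estimates $\mathbf{b}_j$ have been obtained. From equation (12), $\hat{\pi}_j = \hat{\pi}_1 \exp(\mathbf{x}^T\mathbf{b}_j)$ for $j = 2, \dots, J$; since $\hat{\pi}_1 + \hat{\pi}_2 + \dots + \hat{\pi}_J = 1$, it follows that $\hat{\pi}_1 = 1/[1 + \sum_{j=2}^{J} \exp(\mathbf{x}^T\mathbf{b}_j)]$ and $\hat{\pi}_j = \exp(\mathbf{x}^T\mathbf{b}_j)/[1 + \sum_{k=2}^{J} \exp(\mathbf{x}^T\mathbf{b}_k)]$. The fitted values for each of the covariate patterns can be calculated by multiplying the estimated probabilities $\hat{\pi}_j$ by the total frequency of the covariate pattern.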
The Pearson chi-squared residuals are obtained by $r_i = (o_i - e_i)/\sqrt{e_i}$, where $o_i$ and $e_i$ are the observed and expected frequencies for $i = 1, 2, \dots, N$. $N$ is $J$ times the number of distinct covariate patterns. We can assess the adequacy of the model using the residuals.
The chi-square statistic is given by $X^2 = \sum_{i=1}^{N} r_i^2$.
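The residual and chi-square computations can be sketched in a few lines of Python, taking each Pearson residual as (observed - expected) / sqrt(expected) and the statistic as the sum of squared residuals. The frequencies below are hypothetical values chosen for illustration and are not the survey data.

```python
import math

def pearson_residuals(observed, expected):
    """Pearson residuals (o - e) / sqrt(e), one per cell."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

def pearson_chi_square(observed, expected):
    """Pearson chi-square statistic: the sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(observed, expected))

# Hypothetical frequencies: 2 covariate patterns x 3 response categories (N = 6 cells).
observed = [12, 20, 8, 10, 25, 5]
expected = [10.0, 22.0, 8.0, 12.0, 24.0, 4.0]

residuals = pearson_residuals(observed, expected)
x2 = pearson_chi_square(observed, expected)
```

Cells with large absolute residuals are the ones where the fitted model reproduces the observed frequencies least well.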
The deviance statistic is defined in terms of the maximum values of the log-likelihood function for the fitted model, $l(\mathbf{b})$, and for the maximal model, $l(\mathbf{b}_{\max})$: $D = 2[l(\mathbf{b}_{\max}) - l(\mathbf{b})]$. The deviance statistic $D$ describes the lack of fit; the larger the value, the poorer the fit. An adequate fit is one for which the test gives a non-significant result.
Alternatively, we can use a null model (one with only the intercept) to test the adequacy of the fitted model. The hypotheses for this test are $H_0$: the null model (intercept only) is a better fit; $H_1$: the fitted model is a better fit.
In this comparison the statistic describes the goodness of fit: the test gives a significant result when the fitted model fits significantly better than the null model.
The likelihood ratio chi-square statistic is defined in terms of the maximum value of the log-likelihood function for the minimal model, $l(\mathbf{b}_{\min})$. It is given by $C = 2[l(\mathbf{b}) - l(\mathbf{b}_{\min})]$. We can also evaluate the pseudo $R^2$. As noted in the cited 2002 text, it is often easier to interpret the effects of explanatory factors in terms of odds ratios than in terms of the parameters $\boldsymbol{\beta}$. More contextually, we consider the three response categories in this project. Further, we consider interpretation of the binary explanatory variable "Remembrance": if the respondent remembers the last TV advertisement they watched, then $x = 1$, and if they do not remember, then $x = 0$. The odds ratio for remembrance for response $j$ ($j = 2, 3$) relative to the reference category $j = 1$ is $OR_j = \frac{\pi_{jr}/\pi_{jn}}{\pi_{1r}/\pi_{1n}}$, where $\pi_{jr}$ and $\pi_{jn}$ denote the probabilities of response category $j$ ($j = 1, 2, 3$) for a respondent who remembers or does not remember, respectively. Assuming that "Remembrance" is the only explanatory variable, we have the model $\log(\pi_j/\pi_1) = \beta_{0j} + \beta_{1j}x$, $j = 2, 3$. The log odds are given by $\log(\pi_{jn}/\pi_{1n}) = \beta_{0j}$ when $x = 0$, indicating that they do not remember the last TV advertisement watched, and by $\log(\pi_{jr}/\pi_{1r}) = \beta_{0j} + \beta_{1j}$ when $x = 1$. Hence $\log OR_j = \beta_{1j}$, and the odds ratio is estimated by $\exp(b_{1j})$.
We can also calculate the 95% confidence intervals for the odds ratios $OR_j$. They are given by $\exp[b_{1j} \pm 1.96 \times \text{s.e.}(b_{1j})]$. Confidence intervals that do not include 1 correspond to $\beta$ values that are significantly different from zero.
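As a small Python sketch of this interval, the odds ratio is taken as exp(b) and the 95% limits as exp(b ± 1.96 × s.e.(b)). The coefficient and standard error below are hypothetical values for illustration only; they are not estimates from this study.

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient b."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient and standard error (not taken from the fitted model).
or_hat, lower, upper = odds_ratio_ci(b=0.8, se=0.3)

# The coefficient differs significantly from zero at the 5% level
# exactly when the interval for the odds ratio excludes 1.
significant = not (lower <= 1.0 <= upper)
```

Working on the log scale and exponentiating the limits keeps the interval inside the positive range that an odds ratio must occupy.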

Statistical Inference on Regression Coefficients and Coefficient Interpretation
The asymptotic sampling distribution of the estimator $b$ of a regression coefficient $\beta$ is the normal distribution, i.e. $(b - \beta)/\text{s.e.}(b) \sim N(0,1)$.
We use this sampling distribution to construct the 95% confidence interval and to test the hypothesis of the significance of $\beta$. To test the hypothesis $H_0: \beta = 0$ against $H_1: \beta \neq 0$, we can use either of the following test statistics: $z = b/\text{s.e.}(b) \sim N(0,1)$ or the Wald statistic $W = [b/\text{s.e.}(b)]^2 \sim \chi^2_1$. If the response variable under evaluation has $J$ categories, we fit $J-1$ binary logistic models, one for each of the dummy variables. Each model explains the effect of the predictors on the probability of category $j$, $j = 2, 3, \dots, J$, in comparison to the reference category 1. Each of the $J-1$ models has its own intercept and regression coefficients.
The Wald test, derived from the sampling distribution of the maximum likelihood estimator, is used to evaluate whether or not a specific predictor variable is statistically significant in differentiating between the two categories in each of the underlying binary logistic comparisons. A predictor variable that has an overall relationship with the response variable may or may not be statistically significant in differentiating between particular pairs of groups defined by the response variable. Each of the models is interpreted individually. The effect of each explanatory variable on the odds ratio is calculated by taking the exponent of the corresponding coefficient, $\exp(b)$. The effect is then interpreted by holding all other variables constant.
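The predicted probabilities are computed as follows. For the categories $j = 2, 3, \dots, J$ used to fit the binary logistic models: $\hat{\pi}_j = \exp(\mathbf{x}^T\mathbf{b}_j)/[1 + \sum_{k=2}^{J} \exp(\mathbf{x}^T\mathbf{b}_k)]$. And for the reference category: $\hat{\pi}_1 = 1/[1 + \sum_{k=2}^{J} \exp(\mathbf{x}^T\mathbf{b}_k)]$.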
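A minimal Python sketch of this prediction step follows. It assumes one coefficient vector per non-reference category, with the reference category's linear predictor fixed at zero. The function name, the predictor values and the coefficients are all hypothetical and are not the fitted model.

```python
import math

def predicted_probabilities(x, coefficients):
    """Multinomial logit probabilities with category 1 as the reference.

    x            : predictor values, with a leading 1.0 for the intercept
    coefficients : J-1 coefficient vectors, one per non-reference category
    Returns [p_1, p_2, ..., p_J], where p_1 belongs to the reference category.
    """
    for beta in coefficients:
        if len(beta) != len(x):
            raise ValueError("each coefficient vector must match the length of x")
    # Linear predictor for each non-reference category.
    etas = [sum(b * xi for b, xi in zip(beta, x)) for beta in coefficients]
    denominator = 1.0 + sum(math.exp(eta) for eta in etas)
    return [1.0 / denominator] + [math.exp(eta) / denominator for eta in etas]

# Hypothetical respondent (intercept, two dummy predictors) and hypothetical
# coefficients for "considers buying" and "buys" relative to "does not buy".
x = [1.0, 1.0, 0.0]
coefficients = [[0.2, 0.5, -0.3], [-0.4, 0.9, 0.1]]
probabilities = predicted_probabilities(x, coefficients)
```

The three probabilities sum to one by construction, and the ratio of any non-reference probability to the reference probability equals the exponent of that category's linear predictor.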

Analysis Software
Data analysis has been carried out using MS Excel, R-GUI and SPSS. Data cleaning, formatting and basic descriptive statistics were done using MS Excel. Model fitting, diagnosis, and prediction were done using R-GUI and SPSS for comparison of results. Both R and SPSS are also used to carry out model significance tests.

Analysis
In this chapter, we analyze the data, fit the best model and interpret the results. The data collected has 12 socio-economic and TV watching habits variables, one of which is the response variable. Data for 120 people is used to estimate the model.

Response Variable
The response variable is classified into three categories based on the respondent's history of purchasing fast moving consumer goods as a result of watching the said good's advertisement on television. The descriptive statistics for the response variable are as below. About half of the respondents have contemplated purchase of a fast moving consumer good in the past one year after watching the good's advertisement on television but did not do it. The model proposes to use the "No" category as the reference category. Multinomial logistic regression provides an effective and reliable way to obtain the estimated probability of belonging to a specific population (e.g. buying a product based on its television advertisement) and the estimate of the odds ratio of buying or considering buying compared to not buying at all as a result of various reasons [4,17]. Furthermore, the multinomial logistic regression procedure yields estimates of the net effects of a set of explanatory variables on the dependent variable. In this case, a set of regression coefficients for socio-demographic and television watching habits variables will be used to obtain the odds ratios and predicted probabilities.

Predictor Variables
From the administered survey, respondents were, in addition to purchase habits, asked for their social and demographic characteristics. 30% of the respondents were female. 68% of the respondents have tertiary education, while 114 out of the 120 respondents have access to a television set in their household, a key variable as this is one of the bases upon which the model is built: a person must have access to a television set to have visibility of advertisements. Additionally, an underlying caveat is that a person who makes a decision to purchase or has been in consideration must remember the last advertisement they watched; this variable is thus not used in the model but used as an indicator of what variable should be considered. The Kenya National Bureau of Statistics in 2010 revised the income groups for both urban and rural settings in Kenya. These levels are currently as follows:
• Lower income group: Households spending KSh. 23,670 or less per month (they constitute 72.12% of the households).
• Middle-income group: Households spending between KSh. 23,671 up to and including KSh. 120,000 per month (they constitute 24.12% of the households).
• Upper income group: Households spending above KSh. 120,000 per month (they constitute 3.76% of the households).
Respondents in the survey were asked for their income range and placed into the corresponding groups. 58% of the respondents earn and spend less than 23,670 shillings per month at household level, with the rest spending between 23,671 and 120,000 per month.
Due to the versatility of the advertising industry, one key assumption in regard to television advertising is made, leaving room for further model enhancement. The study and model assume that all television channels that respondents are exposed to have the same reach and effect, and do not influence the respondent's perception and subsequent decision to purchase or otherwise.
In order to achieve optimal results for a multinomial logistic model, it is paramount to evaluate the matrix/contingency table obtained from the response variable and each of the predictor variables, and ensure that none of the values is less than two. Furthermore, the number of cases per predictor variable is required to be greater than 15 for model optimization. In this case, due to the limited number of cases, direct entry is used, with evaluation of the most significant variables and the impact they have on the overall significance of the model. This narrows the model to 6 predictor variables, which implies about 20 cases per variable.
Subsequently, assessment of the overall model fit is done with Significance test of the model log likelihood (Change in -2LL), Measures analogous to R² (Cox and Snell R², Nagelkerke R²), classification matrices as a measure of model accuracy, and Numerical Problems checks.

Parameter Estimation and Model Fit Tests
The initial −2 log likelihood value (190.778) is a measure of a model with no independent variables, i.e. only a constant or intercept. The final −2 log likelihood value (159.496) is the measure computed after all of the independent variables have been entered into the logistic regression. The difference between these two measures is the model chi-square value that is tested for statistical significance. This test is analogous to the F-test for R² or change in R² value in multiple regression, which tests whether or not the improvement in the model associated with the additional variables is statistically significant. In this problem the model chi-square value of 31.282 (190.778 − 159.496) is significant at the 0.05 level since 0.027 < 0.05. We conclude that there is a significant relationship between the dependent variable and the set of independent variables. Still, the sample used for model construction is somewhat small, so the design is likely to have low power for detecting effects.
The Cox and Snell R² measure operates like R², with higher values indicating greater model fit. However, this measure is limited in that it cannot reach the maximum value of 1, so Nagelkerke proposed a modification that has the range from 0 to 1. We rely upon Nagelkerke's measure as indicating the strength of the relationship. There is a relationship, but it is not very strong based on the Nagelkerke R². The classification matrix in logistic regression serves the same function as the classification matrix in Multinomial Logistic Regression, i.e. evaluating the accuracy of the model.
If the predicted and actual group memberships are the same, i.e. No and No, Consideration and Consideration, or Yes and Yes, then the prediction is accurate for that case. If predicted group membership and actual group membership are different, the model "misses" for that case. The overall percentage of accurate predictions, 63.2% in this case, is the measure relied on most heavily in assessing the model because it has a meaning that is readily communicated, i.e. the percentage of cases for which the model predicts accurately.
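
The significance test of the change in log likelihood can be checked with a short Python sketch. The two log likelihood values are the ones reported in the text. The degrees of freedom are not stated in this section, so the value of 18 used here is an assumption (nine dummy-coded terms from the six predictors, times the two non-reference response categories). The helper `chi_square_sf_even_df` is a hypothetical name, and it uses the closed-form upper-tail probability that holds only for an even number of degrees of freedom.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    # Accumulate sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


initial_value = 190.778  # intercept-only model (from the text)
final_value = 159.496    # model with all six predictors (from the text)
model_chi_square = initial_value - final_value

# ASSUMPTION: 9 dummy-coded terms x 2 non-reference response categories.
assumed_df = 18

p_value = chi_square_sf_even_df(model_chi_square, assumed_df)
print(round(model_chi_square, 3), round(p_value, 3))
```

Under the assumed 18 degrees of freedom, the p-value comes out at roughly 0.027, which agrees with the significance level reported for the model.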

There are several numerical problems that can occur in multinomial logistic regression without being detected by the statistical package used. These include multicollinearity among the independent variables, zero cells for a dummy-coded independent variable because all of the subjects have the same value for the variable, and "complete separation", whereby the groups of the dependent variable can be perfectly separated by scores on one of the independent variables. All of these problems produce large standard errors (over 2) for the variables included in the analysis and very often produce very large B coefficients as well. If we encounter large standard errors for the predictor variables, we should examine frequency tables, one-way ANOVAs, and correlations for the variables involved to try to identify the source of the problem. None of the standard errors or B coefficients in the fitted model is excessively large, so there is no evidence of a numerical problem with this analysis.

Interpretation and Prediction
The response category "No", containing those who have not purchased a fast moving consumer good as a result of seeing it advertised on television, is set as the reference category. We first consider the odds of considering a purchase relative to not purchasing a good at all based on its television advertisement, and then the odds of purchasing relative to not purchasing at all. Each predictor variable's effect on the odds is interpreted while holding the rest of the variables constant. We then consider a consumer with an assumed set of characteristics and estimate the probability of near conversion or total conversion.
Persons with primary/secondary education are 36% more likely to consider buying a product over not buying it at all based on its television advertisement compared to those with tertiary education.
Monthly Income (High=0, Medium=1)
b = −0.506, exp(b) = 0.60; Persons with low income levels are 40% less likely to consider buying a product over not buying it at all based on its television advertisement compared to persons with medium monthly income levels.
TV hours per day (more than 4 hours=0, {Less than one hour, 2-4 hours} =1)
b = 0.736, exp(b) = 2.087; Persons who watch less than one hour of television every day are about two times more likely to consider purchasing a product over not buying it at all based on its television advertisement compared to those who watch television for more than 4 hours a day.
b = 2.637, exp(b) = 13.969; Persons who watch between two and four hours of television every day are about 14 times more likely to consider purchasing a product over not buying it at all based on its television advertisement compared to those who watch television for more than 4 hours a day.

Main Television interest (Local news/Programs=0, {Live events, International news/Programs} =1)
b = −1.806, exp(b) = 0.164; Those who are most interested in live events on television are 84% less likely to consider purchasing a product over not purchasing it based on its television advertisement compared to those whose preference is local news and programming.
b = 0.062, exp(b) = 1.064; On the other hand, those who are most interested in international news and programming are 6% more likely to consider purchasing a product over not purchasing it based on its television advertisement compared to those whose preference is local news and programming.
Key feature of a television advertisement (Short and precise=0, {Entertaining, Believable and relatable} =1)
b = 1.506, exp(b) = 4.511; People who look out for the entertainment aspect of a television advertisement and regard it as the key element are about 4.5 times more likely to consider purchasing the advertised product over not purchasing it at all compared to those who value the length and precision of the advertisement.
b = 2.687, exp(b) = 14.689; Additionally, those who regard the believability and relatability of a good's television advertisement as the most important feature are about 15 times more likely to consider purchasing as opposed to not purchasing the good at all compared to those who value the length and precision of the advertisement. Believability and relatability of an advertisement, based on this, emerge as the prime factor in getting people to even consider buying a good based on its advertisement.
Turning to the odds of purchasing relative to not purchasing: persons with primary/secondary education are 3 times more likely to buy an advertised fast moving good over not buying it at all based on its television advertisement compared to those with tertiary education.
Monthly Income (High=0, Medium=1)
b = −1.096, exp(b) = 0.334; Persons with low monthly income levels are 67% less likely to purchase an advertised fast moving good over not buying it at all based on its television advertisement compared to persons with medium monthly income levels.
TV hours per day (more than 4 hours=0, {Less than one hour, 2-4 hours} =1)
b = 0.734, exp(b) = 2.084; Persons who watch less than one hour of television every day are approximately two times more likely to purchase an advertised fast moving good over not buying it at all based on its television advertisement compared to those who watch television for more than 4 hours a day.
b = 1.852, exp(b) = 6.37; Persons who watch between two and four hours of television every day are about 6 times more likely to purchase an advertised fast moving good over not buying it at all based on its television advertisement compared to those who watch television for more than 4 hours a day.

Main Television interest (Local news/Programs=0, {Live events, International news/Programs} =1)
b = −0.74, exp(b) = 0.477; Those who are most interested in live events on television are about 50% less likely to purchase an advertised fast moving good over not purchasing it based on its television advertisement compared to those whose main preference is local news and programming.
b = 0.374, exp(b) = 1.453; In contrast, those who are most interested in international news and programming are about 45% more likely to purchase the advertised fast moving good over not purchasing it based on its television advertisement compared to those whose preference is local news and programming.
Key feature of a television advertisement (Short and precise=0, {Entertaining, Believable and relatable} =1)
b = 1.059, exp(b) = 2.884; People who value the entertainment aspect of a television advertisement and regard it as the key element are approximately 3 times more likely to purchase the advertised fast moving good over not purchasing it at all compared to those who value the length and precision of the advertisement.
b = 1.786, exp(b) = 5.967; Additionally, those who regard the believability and relatability of a fast moving good's television advertisement as the most important feature are about 6 times more likely to purchase as opposed to not purchasing the good at all compared to those who value the length and precision of the advertisement. Believability and relatability of an advertisement, based on this, again emerge as the prime factor in getting people to purchase the advertised good.
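
The odds ratios quoted above follow from exponentiating each coefficient, and the percentage statements are (exp(b) − 1) × 100. A minimal Python sketch of that arithmetic is given below. The coefficients are the ones reported in the text; the dictionary keys and function names are illustrative labels introduced here, not the authors' variable names.

```python
import math

# Coefficients b as reported in the text, keyed by (comparison, predictor level).
reported_b = {
    ("consider vs no", "income"): -0.506,
    ("consider vs no", "tv: less than one hour"): 0.736,
    ("consider vs no", "tv: 2-4 hours"): 2.637,
    ("consider vs no", "interest: live events"): -1.806,
    ("consider vs no", "interest: international"): 0.062,
    ("consider vs no", "feature: entertaining"): 1.506,
    ("consider vs no", "feature: believable"): 2.687,
    ("buy vs no", "income"): -1.096,
    ("buy vs no", "tv: less than one hour"): 0.734,
    ("buy vs no", "tv: 2-4 hours"): 1.852,
    ("buy vs no", "interest: live events"): -0.74,
    ("buy vs no", "interest: international"): 0.374,
    ("buy vs no", "feature: entertaining"): 1.059,
    ("buy vs no", "feature: believable"): 1.786,
}


def odds_ratio(b):
    """Odds ratio relative to the reference level of the predictor."""
    return math.exp(b)


def percent_change_in_odds(b):
    """Percentage change in the odds relative to the reference level."""
    return (math.exp(b) - 1.0) * 100.0


for (comparison, level), b in reported_b.items():
    print(f"{comparison:15s} {level:26s} b={b:7.3f} "
          f"OR={odds_ratio(b):7.3f} change={percent_change_in_odds(b):8.1f}%")
```

Small differences in the third decimal place relative to the reported exp(b) values are expected, since the coefficients are themselves rounded.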
We can calculate the probability of each response category for a consumer, given the predictor variable categories into which that consumer falls.
For the Purchase consideration and Purchase categories, this is given by the standard multinomial logistic expression: the exponentiated linear predictor for the category, divided by one plus the sum of the exponentiated linear predictors of the two non-reference categories.
Consider a female consumer who has access to a television in the household and remembers the last fast moving consumer good advertisement she watched on television. The consumer has studied up to university level and earns a medium level monthly income. She is employed and is able to watch television for about two and a half hours every day, with her main preference being local news and programming. She values the ability of a television advertisement to be believable and directly relatable to her day to day life. We calculate the probability of converting her from a viewer of an advertisement for any particular fast moving consumer good into a buyer of the product.
There is a 40% chance of converting her from a viewer to a buyer.
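
A probability of this kind is obtained from the linear predictors of the two non-reference categories. The Python sketch below shows the standard multinomial logistic calculation with "No" as the reference category. It is an illustration only: the function name `category_probabilities` is hypothetical, and the intercepts and gender coefficients are not reported in this section, so the numeric inputs are placeholders and the 40% figure is not reproduced here.

```python
import math


def category_probabilities(eta_consider, eta_buy):
    """Multinomial logit probabilities with "No" as the reference category.

    eta_consider and eta_buy are the linear predictors (intercept plus the sum
    of the coefficients that apply to the consumer) for the two sub-models.
    """
    exp_consider = math.exp(eta_consider)
    exp_buy = math.exp(eta_buy)
    denominator = 1.0 + exp_consider + exp_buy
    return {
        "No": 1.0 / denominator,
        "Consider": exp_consider / denominator,
        "Buy": exp_buy / denominator,
    }


# HYPOTHETICAL linear predictors, chosen only to show the mechanics.
example = category_probabilities(eta_consider=0.5, eta_buy=1.0)
for category, probability in example.items():
    print(f"{category}: {probability:.3f}")
```

With both linear predictors equal to zero the three categories are equally likely, which is a quick sanity check on the function.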

Conclusions and Recommendations
The six predictor variables used in the model can be grouped into two sets: socio-economic variables (gender, education level, income level) and television watching variables (TV hours per day, main television interest, most important television advertisement feature). Overall these variables provide a good fit based on all statistical evaluation criteria. Some of the individual variables at sub-model level are not significant, and this is mostly attributed to the associated small cell counts. Males are more likely to just consider purchasing a product without going ahead with the purchase, as opposed to women, who are more likely to be converted from advertisement viewers to buyers. Furthermore, holding other variables constant, persons with primary or secondary education are more likely to be converted from viewers to buyers compared to those with tertiary education. This could be a case of the more educated being more cautious and critical when making purchase decisions. Controlling for all other variables, medium income earners are more likely to consider purchasing advertised goods, or to actually purchase them, compared to low income earners. With the middle class in Kenya growing and spurring the growth of various consumer goods industries, targeting middle income earners is a key way to increase the conversion rates of advertisements.
People who watch television for an average of between two and four hours a day are also more likely to be converted from viewers to buyers of advertised fast moving consumer goods compared to those who watch too little (less than two hours) or too much (more than 4 hours). Further, consumers who are more interested in international programming and events, as well as local programming, are more likely to be converted compared to those who prefer live events on television. The results show that, holding all other variables constant, a believable and relatable television advertisement is more likely than a short and precise one, or even an entertaining one, to prompt a viewer to consider purchasing the good or to actually purchase it. Believability is therefore pivotal. Given the effect of the socio-economic and television watching variables in the model on the conversion of viewers to buyers, it is statistically feasible to estimate the conversion probabilities of an advertisement and to make more effective decisions based on the target audience. Advertisers should thus consider using models such as this one in addition to conventional advertisement testing techniques to achieve more precision and rigor in their advertisement campaigns. This in turn earns them more benefit from the money spent to buy space on television.

Limitations and Future Research
The results from this model and other researchers' experiences indicate some limitations of the multinomial logistic model fit that pose challenges:
• The residuals cannot be normally distributed (OLS assumption)
• Information, data, and power are lost: predictor categories, and sometimes even the response categories, have to be lumped together, especially when the sample is not large enough
• The coding is completely arbitrary, i.e. recoding the dependent variable can give very different results.
Moreover, because the data used to generate and test this model is based on only 120 participants, there are limitations on the predictor variables that can be used. Further research using a bigger pool of responses, as well as the inclusion of other variables that are assumed in this study, is required before the model can be considered fully validated for industry and academic use. This study only lays the groundwork.