A Quantitative Business Research Model for Comparing Coffee Shops


      India has always been predominantly a tea-drinking nation, and coffee had been only moderately popular, mainly in some southern states. Recently, however, this trend has changed sharply: coffee has become more and more popular, especially among the youth, thanks largely to new entrants in the segment such as Barista, Café Coffee Day (CCD) and others.
      Cafés are increasingly becoming more than places to sip coffee; a great deal of life and work now happens over a cup. India has become one of the fastest-growing coffee markets in the world and is taking great strides on two counts: its presence in the world market and its domestic retail arena, as more and more Indians take to the drink.
       A single visit to a shopping mall, high street, neighborhood market, school, college, hospital or any other public place shows how popular coffee shops have become in India. The brands on offer are not just local ones; many international brands have also captured a share of the Indian coffee market. Whether it is school or college friends meeting up, an official interview or meeting, or a happy moment shared with loved ones, the preferred place to meet is over a cup of aromatic coffee. Coffee shops have become the most popular places to relax after a gruelling day at work. Cafés are pleasant not only for the working class but also for shoppers, who can rest their feet there after a hectic day of shopping. All these benefits have led a large number of national and international café chains to expand their networks via the franchise route. Let us first look at the factors that have encouraged the growth of café brands before reading about these brands and their success stories.

Encouraging factors

  • Dominance of the youth segment: Beverages like coffee are especially preferred by the young, and about 50 per cent of the country's population is 25 years or younger. This has contributed to the growing popularity of cafés, and the trend is expected to continue, since research suggests the youth share of the population will reach 55 per cent by 2015.
  • Increase in disposable income: Students receive more pocket money to spend at cafés with their friends, and the earning class does not mind spending a few extra bucks on a delicious cup of coffee in the comfortable ambience of a café.
  • Cafés as a social hub: With coffee culture brewing across India, cafés have become a social hub where all kinds of people chat and relax over a cup of coffee.
  • Increase in private offices: The growing MNC culture has also contributed to the popularity of café chains, since working professionals often prefer to hang out at these places in their free time.
  • Low-cost kiosks: The foremost factor in the growth of coffee culture is its low cost; any interested entrepreneur can set up a coffee kiosk because the investment required is small.
  • Availability: Café outlets and kiosks are available almost everywhere, and because they are so easy to find, customers who are short of time can opt for take-away coffee.
  • Franchises pan India: Café chains commonly expand through franchised kiosks, which require a 50-100 sq. ft area and a location with high footfall.
  • International players ready to foray: Costa Coffee and Gloria Jean's Coffee are among the international brands that have made their mark in India, having become popular in the café industry via the franchise route.

            This study sets out to investigate the specific factors that affect the attitudes and buying behavior of consumers in India, mainly distinguishing those who prefer homemade coffee from those who prefer branded coffee shops.

Factor Analysis:

Factor analysis is used to identify a small number of underlying factors that explain the correlations among a larger set of observed variables, and to show which variables matter most for each factor.

In this project, factor analysis is used to identify the important factors that influence consumer preference for coffee shops.

The variables that have been chosen for analysis are as follows:

Preference (Preference for a coffee shop over homemade coffee)

Frequency (How frequently people drink coffee in a coffee shop)

Reason (Why people go to a coffee shop)

Beverages (Beverages other than coffee that people like to have in a coffee shop)

Knick Knacks (Food items that people like to eat along with coffee)

Premium Amount (Whether people are willing to pay a premium in a branded coffee shop)

The factor analysis was performed using SAS Enterprise Guide 4.2. The resulting tables and their interpretation are as follows:

KMO value:

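The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA) indicates whether the data are suitable for factor analysis. It compares the size of the observed correlations between variables with the size of their partial correlations: when the variables share a lot of common variance, the partial correlations are small and KMO approaches 1. A KMO value greater than 0.5 is conventionally taken as acceptable; otherwise the analysis should be revisited, for example by collecting more responses or dropping variables with a low individual MSA.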

Kaiser’s Measure of Sampling Adequacy: Overall MSA = 0.59658972

Variable          MSA
Preference        0.63089608
Frequency         0.59348846
Reason            0.56644823
Premium Amount    0.60319500
Knick Knacks      0.67119815
Beverages         0.46908893

             Table no. 1

Table 1 shows the KMO values obtained from the analysis of 200 responses.

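Since the overall KMO value (0.59658972) is greater than 0.5, the data are adequate for factor analysis, although only marginally: values between 0.5 and 0.6 sit at the bottom of Kaiser's acceptable range, and Beverages on its own (0.469) falls below 0.5.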
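A minimal sketch of how the KMO statistic can be computed from a correlation matrix, assuming NumPy and the standard definition (squared correlations compared with squared partial correlations taken from the inverse correlation matrix). The 200 × 6 data set below is randomly generated as a stand-in for the survey, so its values will not reproduce Table 1; the function name `kmo` is hypothetical.

```python
import numpy as np

def kmo(corr):
    """Kaiser-Meyer-Olkin measure of sampling adequacy.

    corr: p x p correlation matrix. Returns (overall MSA, per-variable MSA array).
    """
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlation of each pair of variables, controlling for all the others
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)
    # Only off-diagonal entries enter the ratio
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diag, corr ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    per_variable = r2.sum(axis=1) / (r2.sum(axis=1) + p2.sum(axis=1))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, per_variable

# Synthetic stand-in for the survey: 200 respondents x 6 items driven by 2 latent traits
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = rng.normal(size=(2, 6))
items = latent @ weights + rng.normal(size=(200, 6))

overall, per_variable = kmo(np.corrcoef(items, rowvar=False))
print(f"Overall MSA = {overall:.4f}")
print("Per-variable MSA:", np.round(per_variable, 4))
```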

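Eigenvalues:

An eigenvalue of the correlation matrix measures how much of the total variance of the standardized variables a factor explains; with six variables the eigenvalues sum to 6. Under the Kaiser criterion, a factor with an eigenvalue greater than 1 explains more variance than a single standardized variable (which contributes exactly 1) and is retained; factors with eigenvalues below 1 can be ignored.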

Eigenvalues of the Correlation Matrix: Total = 6, Average = 1

Factor   Eigenvalue   Difference   Proportion   Cumulative
1        1.62876265   0.46635549   0.2715       0.2715
2        1.16240716   0.23130315   0.1937       0.4652
3        0.93110401   0.03803337   0.1552       0.6204
4        0.89307064   0.17436043   0.1488       0.7692
5        0.71871021   0.05276489   0.1198       0.8890
6        0.66594533                0.1110       1.0000

Table no. 2

Table 2 shows that only the first two eigenvalues are greater than 1. Hence the MINEIGEN criterion retains two factors, which together explain 46.52% of the total variance, and the remaining four are ignored.
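The Proportion and Cumulative columns of Table 2, and the number of factors kept by the eigenvalue-greater-than-one rule, can be checked with a few lines of Python. This is an illustrative sketch using the eigenvalues copied from Table 2; the helper names are hypothetical.

```python
from itertools import accumulate

# Eigenvalues of the 6 x 6 correlation matrix, copied from Table 2
eigenvalues = [1.62876265, 1.16240716, 0.93110401, 0.89307064, 0.71871021, 0.66594533]

def variance_summary(eigs):
    """Return (proportion, cumulative) of total variance explained by each factor."""
    total = sum(eigs)  # equals the number of variables (6) for a correlation matrix
    proportion = [e / total for e in eigs]
    cumulative = list(accumulate(proportion))
    return proportion, cumulative

def kaiser_count(eigs, threshold=1.0):
    """Number of factors retained by the eigenvalue-greater-than-one (MINEIGEN=1) rule."""
    return sum(1 for e in eigs if e > threshold)

proportion, cumulative = variance_summary(eigenvalues)
for i, (e, p, c) in enumerate(zip(eigenvalues, proportion, cumulative), start=1):
    print(f"{i}  {e:.8f}  {p:.4f}  {c:.4f}")
print("factors retained:", kaiser_count(eigenvalues))
```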

Scree Plot:

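A scree plot graphs each factor's eigenvalue against its factor number; the point where the curve levels off (the "elbow") suggests how many factors are worth retaining.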

Graph 1

Factor Pattern:

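The factor pattern is the matrix of factor loadings. With standardized variables and uncorrelated factors, each loading is the correlation between a variable and a factor, and its square is the share of that variable's variance explained by the factor.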

Factor Pattern
  Factor1 Factor2
Preference 0.71016 -0.02296
Frequency 0.53154 0.37027
Reason -0.29179 -0.58213
Premium Amount 0.69153 -0.12074
Knick Knacks -0.47445 0.17219
Beverages -0.23117 0.80105

              Table no. 3

The above mentioned table shows the factor loadings between all the six variables and the two factors.
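As a quick consistency check, here is a sketch in plain Python: summing the squared loadings down each column of Table 3 reproduces the first two eigenvalues in Table 2 (about 1.6288 and 1.1624), which is what one expects when the factors are extracted by the principal component method, and summing across each row gives the variable's communality. The extraction method is an inference from these numbers, not something the output above states.

```python
# Unrotated factor pattern (Table 3): variable -> (Factor1, Factor2) loadings
pattern = {
    "Preference":     (0.71016, -0.02296),
    "Frequency":      (0.53154,  0.37027),
    "Reason":         (-0.29179, -0.58213),
    "Premium Amount": (0.69153, -0.12074),
    "Knick Knacks":   (-0.47445,  0.17219),
    "Beverages":      (-0.23117,  0.80105),
}

# Communality: share of each variable's variance explained by the two retained factors
communality = {v: sum(l * l for l in loads) for v, loads in pattern.items()}

# Column sums of squared loadings: variance explained by each factor
explained = [sum(loads[j] ** 2 for loads in pattern.values()) for j in range(2)]

for v, h2 in communality.items():
    print(f"{v:<15} {h2:.4f}")
print("variance explained per factor:", [round(x, 4) for x in explained])
```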

Rotated Factor Pattern:

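The rotated factor pattern is obtained by rotating the factor axes while keeping them at 90 degrees to each other (an orthogonal rotation). Rotation does not change how much variance the two factors explain together, nor any variable's communality; it redistributes the loadings so that each variable loads strongly on one factor and weakly on the other, which makes the factors easier to interpret.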

The rotated factor pattern can be used to assign the variables to the suitable factors.
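The source does not say which orthogonal rotation SAS applied; varimax is a common choice, so the sketch below assumes it. It is the textbook SVD-based varimax iteration in NumPy, without Kaiser row normalization, applied to the Table 3 loadings. Because normalization and convergence settings may differ from SAS's, the output is not guaranteed to match Table 4 digit for digit (columns may also come out reordered or with flipped signs). What any orthogonal rotation must preserve is each variable's communality, and that is what the sketch checks.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal rotation of a p x k loading matrix (varimax when gamma = 1).

    Returns (rotated_loadings, rotation_matrix). No Kaiser row normalization.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lr = L @ R
        # Gradient of the varimax criterion with respect to the rotation
        G = L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt          # nearest orthogonal matrix: axes stay at 90 degrees
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R, R

unrotated = np.array([
    [0.71016, -0.02296],   # Preference
    [0.53154,  0.37027],   # Frequency
    [-0.29179, -0.58213],  # Reason
    [0.69153, -0.12074],   # Premium Amount
    [-0.47445,  0.17219],  # Knick Knacks
    [-0.23117,  0.80105],  # Beverages
])
rotated, R = varimax(unrotated)
print(np.round(rotated, 5))
```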

Rotated Factor Pattern
  Factor1 Factor2
Preference 0.69585 0.14369
Frequency 0.43024 0.48427
Reason -0.14762 -0.63421
Premium Amount 0.70059 0.04427
Knick Knacks -0.50156 0.05650
Beverages -0.41203 0.72481

             Table no. 4

Table 4 shows the loadings of the six variables on the two factors after rotation. Each variable is assigned to the factor on which its absolute loading is largest, as follows (Frequency is a borderline case, loading almost equally on both factors: 0.430 and 0.484):

Factor1                                               Factor2

Preference                                           Frequency

Premium Amount                               Reason

Knick Knacks                                     Beverages
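The assignment rule used above can be written out as a short sketch: each variable goes to the factor with its largest absolute rotated loading. The dictionary holds the Table 4 loadings; the function name `assign` is hypothetical.

```python
# Rotated factor pattern (Table 4): variable -> (Factor1, Factor2) loadings
rotated_pattern = {
    "Preference":     (0.69585,  0.14369),
    "Frequency":      (0.43024,  0.48427),
    "Reason":         (-0.14762, -0.63421),
    "Premium Amount": (0.70059,  0.04427),
    "Knick Knacks":   (-0.50156,  0.05650),
    "Beverages":      (-0.41203,  0.72481),
}

def assign(pattern):
    """Group variables by the factor on which their absolute loading is largest."""
    groups = {}
    for var, loads in pattern.items():
        best = max(range(len(loads)), key=lambda j: abs(loads[j]))
        groups.setdefault(f"Factor{best + 1}", []).append(var)
    return groups

print(assign(rotated_pattern))
```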

The above mentioned factors can be named based on the characteristics of the variables lying underneath.

Factor 1 can be named fondness-related variables, since these variables relate to what a consumer likes or prefers in a coffee shop.

Factor 2 can be named intellection-related variables, since these variables relate to when and why a consumer goes to a coffee shop.

Factor Scoring Coefficients:

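Standardized scoring coefficients are the weights used to compute each respondent's factor scores from the standardized variables, so they indicate how much each variable contributes to a factor.

As a summary of each variable's importance, the values listed after Table 6 were obtained by multiplying each variable's Factor 1 standardized scoring coefficient (Table 5) by its Factor 1 loading (Table 6); Factor 2 does not enter the calculation.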

Standardized Scoring Coefficients
  Factor1 Factor2
Preference 0.42855 0.08272
Frequency 0.24283 0.38601
Reason -0.05711 -0.52880
Premium Amount 0.43709 -0.00173
Knick Knacks -0.31786 0.07592
Beverages -0.29910 0.63685

Table no. 5

Factor Pattern
  Factor1 Factor2
Preference 0.71016 -0.02296
Frequency 0.53154 0.37027
Reason -0.29179 -0.58213
Premium Amount 0.69153 -0.12074
Knick Knacks -0.47445 0.17219
Beverages -0.23117 0.80105

Table no. 6

Variable          Factor 1 coefficient × Factor 1 loading
Preference        0.3043
Frequency         0.1291
Reason            0.0167
Premium Amount    0.3023
Knick Knacks      0.1508
Beverages         0.0691
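A short sketch that reproduces the list above from Tables 5 and 6, using only the Factor 1 columns (adding the Factor 2 product as well would give 0.3024 rather than 0.3043 for Preference):

```python
# Factor 1 only: (standardized scoring coefficient from Table 5, loading from Table 6)
factor1 = {
    "Preference":     (0.42855,  0.71016),
    "Frequency":      (0.24283,  0.53154),
    "Reason":         (-0.05711, -0.29179),
    "Premium Amount": (0.43709,  0.69153),
    "Knick Knacks":   (-0.31786, -0.47445),
    "Beverages":      (-0.29910, -0.23117),
}

scores = {v: coef * load for v, (coef, load) in factor1.items()}

# Print from most to least influential
for v, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{v:<15} {s:.4f}")
```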

From these values, Preference (whether people prefer a coffee shop or homemade coffee) has the highest value (0.3043) and therefore influences consumer behavior towards coffee shops the most, with Premium Amount a very close second (0.3023).

Hence the factor analysis was helpful in identifying the factors that influence the consumers’ preference towards the coffee shop.

Discriminant analysis

1. This table gives the sample size, the number of independent variables and the number of categories (groups) of the dependent variable.

Total Sample Size 200 DF Total 199
Variables 6 DF Within Classes 194
Classes 6 DF Between Classes 5

2. This table shows missing values, if any. Since the number of observations read equals the number of observations used, there are no missing values here.

Number of Observations Read 200
Number of Observations Used 200

3. This table gives information about the dependent variable. Equal prior probabilities were assumed for the six categories, so each prior is 1/6 = 0.166667, even though the actual group sizes differ widely (from 7 to 82 respondents). Equal priors (1/k for k groups, which equals 0.5 only in the two-group case) are the default when we have no information on how the population is divided among the categories. If we do have such information, SAS can set the priors proportionally to the sample group sizes (PRIORS PROPORTIONAL in PROC DISCRIM) or to specified values.

Class Level Information

Among the following, my fav_0001   Variable              Frequency   Weight    Proportion   Prior
BARISTA                            BARISTA               41          41.0000   0.205000     0.166667
CCD                                CCD                   82          82.0000   0.410000     0.166667
COSTA COFFEE                       COSTA COFFEE          42          42.0000   0.210000     0.166667
FIESTA                             FIESTA                 7           7.0000   0.035000     0.166667
MINERVA COFFEE SHOP                MINERVA COFFEE SHOP    7           7.0000   0.035000     0.166667
TESTA ROSSA CAFFÈ                  TESTA ROSSA CAFFÈ     21          21.0000   0.105000     0.166667

Table no. 7
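The degrees of freedom in the first table and the two possible prior settings can be illustrated with a small sketch. The MINERVA COFFEE SHOP count (7) is taken from the row totals of the classification tables further below; the option names in the comments refer to the PRIORS statement of PROC DISCRIM.

```python
# Group sizes of the dependent variable
counts = {
    "BARISTA": 41, "CCD": 82, "COSTA COFFEE": 42,
    "FIESTA": 7, "MINERVA COFFEE SHOP": 7, "TESTA ROSSA CAFFÈ": 21,
}
n = sum(counts.values())   # total sample size
k = len(counts)            # number of classes

df_total, df_within, df_between = n - 1, n - k, k - 1

equal_priors = {g: 1 / k for g in counts}                     # PRIORS EQUAL (default)
proportional_priors = {g: c / n for g, c in counts.items()}   # PRIORS PROPORTIONAL

print(n, k, df_total, df_within, df_between)
for g in counts:
    print(f"{g:<20} proportion={proportional_priors[g]:.6f}  equal prior={equal_priors[g]:.6f}")
```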

4. The table below is equivalent to the "Log Determinants" table in SPSS. The difference is that SPSS lists a log determinant for each group as well as for the pooled within-groups matrix, whereas SAS reports only the pooled value (the last row of the SPSS table). The SAS EG output used here contains no Box's M test results.

Pooled Covariance Matrix Information

Matrix Rank   Natural Log of the Determinant of the Covariance Matrix
6             3.93903

Table no. 8
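A sketch of what the reported figure is, assuming NumPy: the pooled within-group covariance matrix is the sum of each group's (n_g - 1) * S_g divided by N - k, and SAS reports its rank and the natural log of its determinant. The ratings below are synthetic, so the result will not equal 3.93903; the function name is hypothetical.

```python
import numpy as np

def pooled_within_cov(X, labels):
    """Pooled within-group covariance: sum over groups of (n_g - 1) * S_g, divided by (N - k)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    p = X.shape[1]
    scatter = np.zeros((p, p))
    for g in groups:
        Xg = X[labels == g]
        centered = Xg - Xg.mean(axis=0)
        scatter += centered.T @ centered   # equals (n_g - 1) * S_g
    return scatter / (len(X) - len(groups))

# Synthetic ratings: 200 respondents, 6 predictors, 6 coffee-shop groups (NOT the survey data)
rng = np.random.default_rng(1)
labels = rng.integers(0, 6, size=200)
X = rng.normal(loc=3.0, scale=1.5, size=(200, 6))

S = pooled_within_cov(X, labels)
sign, logdet = np.linalg.slogdet(S)   # numerically stable log-determinant
print("Matrix rank:", np.linalg.matrix_rank(S))
print("Natural log of the determinant:", round(float(logdet), 5))
```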

5. The table below is similar to the "Tests of Equality of Group Means" table in SPSS. The Total SD, Pooled SD and Between SD columns are of little use here; concentrate on the last column, Pr > F, which is the same as the Sig. column in SPSS and gives the p-values. Only ambience is significant at the 5% level (p = 0.0448). For price, taste, quantity, variety and location the p-values range from 0.2354 to 0.8250, so there is no statistical evidence that their means differ across the six coffee shops; ambience is the only independent variable that on its own helps to discriminate between the categories of the dependent variable. Next, consider the R-Square column (subtracting the R-Square value from 1 gives the univariate Wilks' Lambda for each variable). R-Square indicates the proportion of the variation in a single independent variable that is explained by differences among the dependent-variable categories. For example, ambience, the strongest discriminating variable, has an R-Square of 0.0564, i.e. 5.64%; the 0.0598 in the next column is R-Square/(1 - R-Square), not R-Square itself.

Univariate Test Statistics
F Statistics, Num DF=5, Den DF=194

Variable   Total SD   Pooled SD   Between SD   R-Square   R-Square/(1-RSq)   F Value   Pr > F
price      1.6265     1.6189      0.3288       0.0342     0.0354             1.37      0.2354
taste      1.2194     1.2226      0.1886       0.0200     0.0205             0.79      0.5555
quantity   1.6134     1.6250      0.1853       0.0110     0.0112             0.43      0.8250
ambience   1.4240     1.4009      0.3696       0.0564     0.0598             2.32      0.0448
variety    1.4886     1.4876      0.2647       0.0265     0.0272             1.06      0.3867
location   1.6289     1.6390      0.2023       0.0129     0.0131             0.51      0.7702

Table no. 9
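The F values in Table 9 follow directly from the R-Square column, because for a one-way comparison of six groups F = [R²/(1 - R²)] × (194/5). The sketch below, with values copied from Table 9, recomputes F and the univariate Wilks' Lambda (1 - R²) and flags which variables are significant at the 5% level. Small differences from the SAS figures come from R-Square being rounded to four decimals.

```python
# R-Square, reported F Value and Pr > F for each predictor, copied from Table 9
table9 = {
    "price":    (0.0342, 1.37, 0.2354),
    "taste":    (0.0200, 0.79, 0.5555),
    "quantity": (0.0110, 0.43, 0.8250),
    "ambience": (0.0564, 2.32, 0.0448),
    "variety":  (0.0265, 1.06, 0.3867),
    "location": (0.0129, 0.51, 0.7702),
}
NUM_DF, DEN_DF = 5, 194   # k - 1 and N - k for k = 6 groups, N = 200 respondents
ALPHA = 0.05

def f_from_r2(r2, num_df=NUM_DF, den_df=DEN_DF):
    """One-way ANOVA F statistic recovered from R-Square."""
    return (r2 / (1 - r2)) * (den_df / num_df)

for var, (r2, f_sas, p) in table9.items():
    wilks = 1 - r2   # univariate Wilks' Lambda
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{var:<9} Wilks={wilks:.4f}  F={f_from_r2(r2):.2f} (SAS: {f_sas:.2f})  p={p:.4f}  {verdict}")

significant = [var for var, (_, _, p) in table9.items() if p < ALPHA]
print("Significant at 5%:", significant)
```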

6. The table below is similar to the "Eigenvalues" table in SPSS. Start with the first column, the canonical correlation: squaring the first value gives 0.292793² = 0.085728, the squared canonical correlation. This indicates that the first discriminant function explains 8.57% of the discrimination between the categories of the dependent variable; stated more precisely, approximately 8.57% of the variance in the discriminant scores is explained by differences among the groups. The last column should also be checked: the likelihood-ratio test of all five functions together gives Wilks' Lambda = 0.8273 (F = 1.23, p = 0.1896), so even the first function does not discriminate significantly among the six coffee shops at the 5% level.

Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)
Test of H0: The canonical correlations in the current row and all that follow are zero

(CanCorr = canonical correlation; Adj = adjusted; SE = approximate standard error; Sq = squared)

   CanCorr    Adj CanCorr  SE         Sq CanCorr  Eigenvalue  Difference  Proportion  Cumulative  Likelihood Ratio  Approx F  Num DF  Den DF   Pr > F
1  0.292793   0.186753     0.064811   0.085728    0.0938      0.0347      0.4788      0.4788      0.82730832        1.23      30      758      0.1896
2  0.236103   0.176764     0.066936   0.055745    0.0590      0.0326      0.3014      0.7802      0.90488171        0.97      20      631.11   0.5031
3  0.160412   .            0.069064   0.025732    0.0264      0.0113      0.1349      0.9150      0.95830193        0.68      12      505.63   0.7677
4  0.122056   .            0.069832   0.014898    0.0151      0.0136      0.0772      0.9923      0.98361234        0.53      6       384      0.7848
5  0.038891   .            0.070781   0.001513    0.0015                  0.0077      1.0000      0.99848748        0.15      2       193      0.8641

Table no. 10
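The relationships among the columns of Table 10 can be checked with a short sketch: each eigenvalue equals r²/(1 - r²) for the squared canonical correlation r², and the likelihood ratio in row i is the product of (1 - r²) over that row and all rows after it (Wilks' Lambda). Only the canonical correlations are taken from the table; everything else is recomputed.

```python
# Canonical correlations (first column of Table 10)
can_corr = [0.292793, 0.236103, 0.160412, 0.122056, 0.038891]

sq = [r * r for r in can_corr]          # squared canonical correlations
eig = [s / (1 - s) for s in sq]         # eigenvalues of Inv(E)*H = CanRsq / (1 - CanRsq)
prop = [e / sum(eig) for e in eig]      # share of discriminating power per function

def wilks_from(i):
    """Likelihood ratio for H0: function i and all later functions have zero canonical correlation."""
    lam = 1.0
    for s in sq[i:]:
        lam *= 1 - s
    return lam

for i in range(len(can_corr)):
    print(f"{i + 1}  {sq[i]:.6f}  {eig[i]:.4f}  {prop[i]:.4f}  {wilks_from(i):.8f}")
```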

7. The table below is the discriminant loadings matrix (the structure matrix), which is interpreted much like factor loadings: each value is the correlation between a predictor variable and a discriminant function. It is preferable to judge the strength of the predictors on the basis of this structure matrix, because it is considered more reliable and less affected by multicollinearity among the variables than the standardized coefficients. Each discriminant function is named after the variables that load highly on it.

This table is read together with the Univariate Test Statistics table: first identify the significant discriminating variables, then check their discriminant loadings to judge each variable's discriminating power. Here ambience, the only significant variable, also has the largest loading on the first function (-0.782).

Pooled Within Canonical Structure
Variable Can1 Can2 Can3 Can4 Can5
price 0.199888 0.688744 0.055365 0.459585 0.516915
taste -0.139108 -0.218914 -0.743423 -0.247219 0.435684
quantity 0.041338 0.319034 -0.366515 0.269634 -0.481089
ambience -0.782018 -0.201699 0.022977 0.008311 0.163086
variety 0.119532 -0.633160 -0.205030 0.259332 0.199497
location 0.183970 0.151962 0.346928 -0.592870 0.192306

8. The standardized coefficients allow variables measured on different scales to be compared. Coefficients with large absolute values correspond to variables with greater discriminating ability.

Pooled Within-Class Standardized Canonical Coefficients
Variable Can1 Can2 Can3 Can4 Can5
price 0.085171068 0.519161498 0.103813766 0.545930743 0.702351598
taste -0.047678641 0.136660609 -0.997160810 -0.526994939 0.423073841
quantity 0.048332853 0.439269152 -0.408407886 0.149549382 -0.750017915
ambience -1.081197585 0.052523964 0.320821120 0.173011118 0.130047778
variety 0.558835378 -0.811436501 0.251745336 0.628704704 0.229370505
location 0.337174077 0.191037661 0.425150885 -0.698316135 0.129086861

9. This table shows the group centroids, i.e. the mean discriminant (canonical) scores of each group. If discriminant scores are used for classification, the centroids are useful for calculating optimal cut-off scores.

Class Means on Canonical Variables
Among the following, my fav_0001 Can1 Can2 Can3 Can4 Can5
BARISTA -0.033780254 0.135037103 -0.022885540 -0.055598468 0.069801548
CCD 0.081639110 -0.225347494 0.092505485 0.044394331 -0.003944585
COSTA COFFEE 0.003206959 0.382693089 0.058677253 0.066406709 -0.033620752
FIESTA -0.118319509 -0.028275201 0.166074582 -0.587293723 -0.064305976
MINERVA COFFEE SHOP -1.504012439 -0.195732556 -0.189713147 0.068696815 -0.021179645
TESTA ROSSA CAFFÈ 0.281534033 -0.074432482 -0.426005586 -0.024747683 -0.025139838

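10. The table below gives the linear discriminant (classification) function coefficients, one column per coffee shop. For a respondent, each column yields a score, and the respondent is assigned to the coffee shop with the highest score. The resulting classifications are summarized in classification tables, in which the rows are the observed categories of the dependent variable and the columns are the predicted categories. When prediction is perfect, all cases lie on the diagonal, and the percentage of cases on the diagonal is the percentage of correct classifications. The cross-validated table is a more honest presentation of the power of the discriminant function than the original classifications and often produces a poorer outcome.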

Linear Discriminant Function for Among the following, my fav_0001

Variable   BARISTA    CCD        COSTA COFFEE   FIESTA     MINERVA COFFEE SHOP   TESTA ROSSA CAFFÈ
Constant   -6.93204   -6.56254   -7.10828       -6.32001   -7.63554              -6.84312
price      1.46664    1.36627    1.54951        1.18446    1.27497               1.35942
taste      0.42324    0.21572    0.29458        0.43694    0.49461               0.67016
quantity   0.42296    0.34321    0.52947        0.34177    0.38518               0.52369
ambience   0.70110    0.63043    0.70598        0.72538    1.79210               0.35256
variety    0.77950    1.06985    0.70773        0.62342    0.41789               0.94239
location   0.83828    0.80154    0.83578        1.06683    0.39388               0.75354
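A sketch of how these coefficients classify a respondent: compute one score per coffee shop and pick the largest. It assumes the columns follow the class order used elsewhere in the output (BARISTA, CCD, COSTA COFFEE, FIESTA, MINERVA COFFEE SHOP, TESTA ROSSA CAFFÈ) and that respondents rate each variable on a numeric scale; the respondent shown is hypothetical. Because all six priors are equal, any prior term would shift every score by the same amount, so it does not affect which shop wins.

```python
SHOPS = ["BARISTA", "CCD", "COSTA COFFEE", "FIESTA", "MINERVA COFFEE SHOP", "TESTA ROSSA CAFFÈ"]
VARIABLES = ["price", "taste", "quantity", "ambience", "variety", "location"]

# Linear discriminant function coefficients; column order assumed to match SHOPS
CONSTANTS = [-6.93204, -6.56254, -7.10828, -6.32001, -7.63554, -6.84312]
COEFFICIENTS = {
    "price":    [1.46664, 1.36627, 1.54951, 1.18446, 1.27497, 1.35942],
    "taste":    [0.42324, 0.21572, 0.29458, 0.43694, 0.49461, 0.67016],
    "quantity": [0.42296, 0.34321, 0.52947, 0.34177, 0.38518, 0.52369],
    "ambience": [0.70110, 0.63043, 0.70598, 0.72538, 1.79210, 0.35256],
    "variety":  [0.77950, 1.06985, 0.70773, 0.62342, 0.41789, 0.94239],
    "location": [0.83828, 0.80154, 0.83578, 1.06683, 0.39388, 0.75354],
}

def classification_scores(ratings):
    """Score every shop's classification function for one respondent (dict keyed by variable)."""
    return {
        shop: CONSTANTS[j] + sum(COEFFICIENTS[v][j] * ratings[v] for v in VARIABLES)
        for j, shop in enumerate(SHOPS)
    }

def classify(ratings):
    """Assign the respondent to the shop with the highest score."""
    scores = classification_scores(ratings)
    return max(scores, key=scores.get)

# Hypothetical respondent: rates everything 3 except ambience, rated 5 (scale assumed)
respondent = {"price": 3, "taste": 3, "quantity": 3, "ambience": 5, "variety": 3, "location": 3}
print(classify(respondent))
```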

11. The following two tables are classification matrices. The first is for the analysis sample; the second validates the proposed model (cross-validation).

The diagonal values show the correct predictions and are used to calculate the hit ratio, which measures how accurately the model predicts which coffee shop each respondent prefers.

Hit Ratio = (2+25+16+2+4+6)/200 = 55/200 = 27.5%

For the cross-validated table, the hit ratio is (1+20+14+0+4+5)/200 = 44/200 = 22.0%.

Number of Observations and Percent Classified into Among the following, my fav_0001 (analysis sample; count and row percentage)

From \ Into           BARISTA     CCD          COSTA COFFEE   FIESTA       MINERVA COFFEE SHOP   TESTA ROSSA CAFFÈ   Total
BARISTA               2 (4.88)    8 (19.51)    10 (24.39)     8 (19.51)    5 (12.20)             8 (19.51)           41 (100.00)
CCD                   4 (4.88)    25 (30.49)   17 (20.73)     13 (15.85)   9 (10.98)             14 (17.07)          82 (100.00)
COSTA COFFEE          0 (0.00)    5 (11.90)    16 (38.10)     7 (16.67)    5 (11.90)             9 (21.43)           42 (100.00)
FIESTA                0 (0.00)    1 (14.29)    2 (28.57)      2 (28.57)    1 (14.29)             1 (14.29)           7 (100.00)
MINERVA COFFEE SHOP   0 (0.00)    1 (14.29)    1 (14.29)      1 (14.29)    4 (57.14)             0 (0.00)            7 (100.00)
TESTA ROSSA CAFFÈ     1 (4.76)    4 (19.05)    5 (23.81)      3 (14.29)    2 (9.52)              6 (28.57)           21 (100.00)
Total                 7 (3.50)    44 (22.00)   51 (25.50)     34 (17.00)   26 (13.00)            38 (19.00)          200 (100.00)
Priors                0.16667     0.16667      0.16667        0.16667      0.16667               0.16667

Number of Observations and Percent Classified into Among the following, my fav_0001 (cross-validation; count and row percentage)

From \ Into           BARISTA     CCD          COSTA COFFEE   FIESTA       MINERVA COFFEE SHOP   TESTA ROSSA CAFFÈ   Total
BARISTA               1 (2.44)    8 (19.51)    11 (26.83)     8 (19.51)    5 (12.20)             8 (19.51)           41 (100.00)
CCD                   4 (4.88)    20 (24.39)   17 (20.73)     14 (17.07)   9 (10.98)             18 (21.95)          82 (100.00)
COSTA COFFEE          0 (0.00)    5 (11.90)    14 (33.33)     8 (19.05)    5 (11.90)             10 (23.81)          42 (100.00)
FIESTA                1 (14.29)   2 (28.57)    2 (28.57)      0 (0.00)     1 (14.29)             1 (14.29)           7 (100.00)
MINERVA COFFEE SHOP   0 (0.00)    1 (14.29)    1 (14.29)      1 (14.29)    4 (57.14)             0 (0.00)            7 (100.00)
TESTA ROSSA CAFFÈ     1 (4.76)    5 (23.81)    5 (23.81)      3 (14.29)    2 (9.52)              5 (23.81)           21 (100.00)
Total                 7 (3.50)    41 (20.50)   50 (25.00)     34 (17.00)   26 (13.00)            42 (21.00)          200 (100.00)
Priors                0.16667     0.16667      0.16667        0.16667      0.16667               0.16667
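The hit ratios can be recomputed from the two tables, and comparing them with chance benchmarks helps judge whether 27.5% is a good result. The sketch below uses two common benchmarks: the proportional chance criterion (the sum of squared group shares, about 26.8% here) and the maximum chance criterion (always predicting the largest group, CCD, which would be right 41% of the time). Counts are copied from the tables; the percentages are dropped.

```python
# Rows = observed shop, columns = predicted shop, in the order
# BARISTA, CCD, COSTA COFFEE, FIESTA, MINERVA COFFEE SHOP, TESTA ROSSA CAFFÈ
resubstitution = [
    [2, 8, 10, 8, 5, 8],
    [4, 25, 17, 13, 9, 14],
    [0, 5, 16, 7, 5, 9],
    [0, 1, 2, 2, 1, 1],
    [0, 1, 1, 1, 4, 0],
    [1, 4, 5, 3, 2, 6],
]
cross_validated = [
    [1, 8, 11, 8, 5, 8],
    [4, 20, 17, 14, 9, 18],
    [0, 5, 14, 8, 5, 10],
    [1, 2, 2, 0, 1, 1],
    [0, 1, 1, 1, 4, 0],
    [1, 5, 5, 3, 2, 5],
]

def hit_ratio(matrix):
    """Share of respondents on the diagonal, i.e. correctly classified."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total

group_sizes = [sum(row) for row in resubstitution]
n = sum(group_sizes)
shares = [g / n for g in group_sizes]
proportional_chance = sum(s * s for s in shares)   # random assignment in sample proportions
maximum_chance = max(shares)                       # always predict the largest group

print(f"Hit ratio (analysis sample):  {hit_ratio(resubstitution):.3f}")
print(f"Hit ratio (cross-validation): {hit_ratio(cross_validated):.3f}")
print(f"Proportional chance: {proportional_chance:.4f}  Maximum chance: {maximum_chance:.2f}")
```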


Factor Analysis

From the factor analysis, Preference (whether people prefer a coffee shop or homemade coffee) has the strongest influence on consumer behavior towards coffee shops. The analysis helped identify the factors that influence consumers' preference for coffee shops: fondness-related variables and intellection-related variables.

Discriminant Analysis 

With the results from SAS and the manually calculated discriminant scores, the data can be classified into six groups: CCD, BARISTA, COSTA COFFEE, FIESTA, MINERVA COFFEE SHOP and TESTA ROSSA CAFFÈ. For any new entry in the respondent sheet, the model can then predict which coffee shop the new respondent would prefer.

For example, if a new respondent X states his preferences in terms of price and quantity, the results indicate that he would prefer CCD.

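Thus, the analysis yields one linear classification function per coffee shop (one column of the Linear Discriminant Function table). A respondent's score is computed with each function, and the respondent is assigned to the coffee shop with the highest score. For example, the function for MINERVA COFFEE SHOP is:

D(MINERVA COFFEE SHOP) = -7.63554 + (1.27497*price) + (0.49461*taste) + (0.38518*quantity) + (1.79210*ambience) + (0.41789*variety) + (0.39388*location)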

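Hence, discriminant analysis provides a way to predict the choice of coffee shop from consumers' preferences for variables like price, quantity, location, ambience, variety and taste. Its predictive power in this sample is modest, however: only ambience differs significantly across the groups, and the hit ratio is 27.5% (22.0% when cross-validated).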


  • Branded coffee shops should prefer locations near large corporate offices, so that employees can drop in to relax from job tensions and to hold meetings.
  • Branded coffee shops need to reduce their overall costs and offer many alternative variants of coffee and snacks.
  • Branded coffee shops should never lose their core competency, taste, when serving hot coffee to customers.
