Introduction:
India has always been predominantly a tea-drinking nation, and coffee was only moderately popular in some southern states. This trend has changed quickly in recent times: coffee is becoming more and more popular, especially among the youth, thanks to new entrants in the segment such as Barista, Café Coffee Day (CCD) and others.
Cafés are increasingly becoming more than places to sip coffee; a great deal of life and work now happens over a cup. India has become one of the fastest growing coffee markets in the world. It is making great strides on two counts: its presence in the world market and its domestic retail arena, as more and more Indians prefer the drink.
A single visit to a shopping mall, high street, neighborhood market, school, college, hospital or any other public place shows the growing popularity of coffee shops in India. The brands available are not just local ones; many international brands have also captured a share of the Indian coffee market. Whether it is a meeting of school or college friends, an official interview or meeting, or a moment of happiness shared with loved ones, a favoured place to meet is over a cup of aromatic coffee. Coffee shops have become popular places to relax after a gruelling day at work. Cafés are a pleasure not only for working people but also for shoppers, who can rest their feet there after a hectic day of shopping. These benefits have led a large number of national and international café chains to expand their networks via the franchise route. Let us look at the factors that have encouraged the growth of café brands before turning to the brands and their success journey.
Encouraging factors
- Dominance of youth segment: Beverages like coffee are especially preferred by the youth. About 50 per cent of the country’s population is 25 years or younger, which contributes to the growing popularity of cafés. This trend is expected to pick up in future, because research shows that the youth population is expected to reach 55 per cent by 2015.
- Increase in disposable income: Students get more pocket money to spend at cafés with their friends. The earning class also does not mind spending a few extra bucks on a delicious cup of coffee in the comfortable ambience of a café.
- Cafés as a social hub: With coffee culture brewing up pan India, the café has become a great social hub for all kinds of people to chat and relax over a cup of coffee.
- Increase in private offices: The growing MNC culture has also helped the popularity of café chains. Working professionals usually prefer to hang out at these places during their free time.
- Low cost kiosks: A foremost factor in the growth of coffee culture is low cost. Any interested entrepreneur can take up a coffee kiosk because of the low investment needed.
- Availability factor: Café outlets or kiosks are available at almost all places. Because coffee is so easily available, customers who are running out of time can also opt for ‘take-away coffee’.
- Franchisees pan India. It popularly operates through its kiosks which require 50-100 sq.ft area and a location that has high footfalls.
- International players ready for foray. These are a few of the brands that have become popular in the café industry via the franchise route. Costa Coffee and Gloria Jean’s Coffee are some of the international brands that have made their mark in India.
This study investigates the factors that affect the attitude and buying behavior of consumers in India, mainly those who prefer homemade coffee and those who prefer branded coffee shops.
Factor Analysis:
Factor Analysis is done to identify the important factors or variables that influence the variable being measured.
In this project, Factor Analysis is done to identify the important factors that influence consumer preference towards the coffee shop.
The variables that have been chosen for analysis are as follows:
Preference (Preference of Coffee shop over Home-made coffee)
Frequency (How frequently people drink coffee in a coffee shop)
Reason (Reason as to why do people go to coffee shop)
Beverages (Beverages other than coffee that people like to have in a coffee shop)
Knick Knacks (Food items that people like to eat along with coffee)
Premium Amount (Do people prefer to pay a premium amount in a branded coffee shop)
The factor analysis was done using SAS Enterprise Guide 4.2; the tables and results obtained are analyzed as follows:
KMO value:
The KMO (Kaiser-Meyer-Olkin) value is a measure of sampling adequacy. It tells whether the sample taken for analysis is sufficient or not. If the KMO value is greater than 0.5, the sample is sufficient. Otherwise the analysis has to be repeated with a larger number of samples.
Kaiser’s Measure of Sampling Adequacy: Overall MSA = 0.59658972
Preference | Frequency | Reason | Premium Amount | Knick Knacks | Beverages |
0.63089608 | 0.59348846 | 0.56644823 | 0.60319500 | 0.67119815 | 0.46908893 |
Table no. 1
Table no. 1 shows the KMO values obtained from the analysis of 200 samples.
Since the overall KMO value obtained (0.59658972) is greater than 0.5, the number of samples taken for the analysis is sufficient.
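The adequacy check on Table no. 1 can be sketched in a few lines of Python. The values are copied from the table; applying the same 0.5 cut-off to each individual variable is an assumption made here for illustration, since the text applies it only to the overall value.

```python
# Per-variable MSA values copied from Table no. 1.
msa = {
    "Preference": 0.63089608,
    "Frequency": 0.59348846,
    "Reason": 0.56644823,
    "Premium Amount": 0.60319500,
    "Knick Knacks": 0.67119815,
    "Beverages": 0.46908893,
}
OVERALL_MSA = 0.59658972
THRESHOLD = 0.5  # cut-off the text uses for the overall KMO value

# Overall check, as described in the text.
overall_adequate = OVERALL_MSA > THRESHOLD

# Assumption: the same cut-off is applied to each variable on its own.
below_threshold = [name for name, value in msa.items() if value <= THRESHOLD]

print("Overall sample adequate:", overall_adequate)
print("Variables at or below the cut-off:", below_threshold)
```

Under that assumption, the per-variable check is a quick way to spot a variable that may deserve a second look even when the overall value passes.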
Eigen Values:
The Eigen value of a factor measures the sum of the variances of the variables that the factor accounts for. If the Eigen value for a factor is greater than 1, the factor is significant; otherwise it can be ignored.
Eigen values of the Correlation Matrix: Total = 6 Average = 1
Factor | Eigen value | Difference | Proportion | Cumulative |
1 | 1.62876265 | 0.46635549 | 0.2715 | 0.2715 |
2 | 1.16240716 | 0.23130315 | 0.1937 | 0.4652 |
3 | 0.93110401 | 0.03803337 | 0.1552 | 0.6204 |
4 | 0.89307064 | 0.17436043 | 0.1488 | 0.7692 |
5 | 0.71871021 | 0.05276489 | 0.1198 | 0.8890 |
6 | 0.66594533 | | 0.1110 | 1.0000 |
Table no. 2
The above mentioned table shows that the Eigen values for two factors are greater than one. Hence only two factors will be retained by the MINEIGEN criterion and the rest would be ignored.
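As a rough illustration of the eigenvalue-greater-than-one rule, the following Python sketch recomputes the Proportion and Cumulative columns of Table no. 2 and lists the factors that would be retained. The eigenvalues are copied from the table; the variable names are illustrative only.

```python
# Eigenvalues of the correlation matrix, copied from Table no. 2.
eigenvalues = [1.62876265, 1.16240716, 0.93110401,
               0.89307064, 0.71871021, 0.66594533]

total = sum(eigenvalues)  # equals the number of variables (6)

# Share of total variance attributed to each factor.
proportions = [ev / total for ev in eigenvalues]

# Running total of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Retain only factors whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), 1):
    print(f"Factor {i}: eigenvalue={ev:.4f} proportion={p:.4f} cumulative={c:.4f}")
print("Retained factors:", retained)
```

The two retained factors together account for a little under half of the total variance, which matches the Cumulative column of the table.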
Scree Plot:
Scree plot shows the number of factors that are significant.
Graph 1
Factor Pattern:
Factor Pattern is a matrix showing the factor loadings, i.e. the correlations between the variables and the factors.
Factor Pattern | ||
Factor1 | Factor2 | |
Preference | 0.71016 | -0.02296 |
Frequency | 0.53154 | 0.37027 |
Reason | -0.29179 | -0.58213 |
Premium Amount | 0.69153 | -0.12074 |
Knick Knacks | -0.47445 | 0.17219 |
Beverages | -0.23117 | 0.80105 |
Table no. 3
The above mentioned table shows the factor loadings between all the six variables and the two factors.
Rotated Factor Pattern:
The rotated factor pattern is obtained by rotating the factor axes while keeping them at 90 degrees to each other (an orthogonal rotation). This is done so that each variable loads strongly on one factor and weakly on the other, which reduces the effect of the variables with the least correlation.
The rotated factor pattern can be used to assign the variables to the suitable factors.
Rotated Factor Pattern | ||
Factor1 | Factor2 | |
Preference | 0.69585 | 0.14369 |
Frequency | 0.43024 | 0.48427 |
Reason | -0.14762 | -0.63421 |
Premium Amount | 0.70059 | 0.04427 |
Knick Knacks | -0.50156 | 0.05650 |
Beverages | -0.41203 | 0.72481 |
Table no. 4
Table no. 4 shows the loadings of the six variables on the two factors after rotation. Each variable is assigned to the factor on which its loading is larger in absolute value, which gives the following grouping:
Factor1 | Factor2 |
Preference | Frequency |
Premium Amount | Reason |
Knick Knacks | Beverages |
The above mentioned factors can be named based on the characteristics of the variables lying underneath.
Factor 1 can be named fondness related variables, since its variables relate to what a consumer likes or prefers in a coffee shop.
Factor 2 can be named intellection related variables, since its variables relate to when a consumer goes to a coffee shop.
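The assignment of variables to factors can be reproduced with a short Python sketch. It assumes the rule implied by the grouping, namely that a variable goes to the factor on which its rotated loading is largest in absolute value; the loadings are copied from Table no. 4 and the function name is illustrative.

```python
# Rotated loadings (Factor1, Factor2) copied from Table no. 4.
rotated = {
    "Preference": (0.69585, 0.14369),
    "Frequency": (0.43024, 0.48427),
    "Reason": (-0.14762, -0.63421),
    "Premium Amount": (0.70059, 0.04427),
    "Knick Knacks": (-0.50156, 0.05650),
    "Beverages": (-0.41203, 0.72481),
}


def assign_to_factors(loadings):
    """Group variables by the factor with the larger absolute loading."""
    groups = {"Factor1": [], "Factor2": []}
    for variable, (f1, f2) in loadings.items():
        key = "Factor1" if abs(f1) >= abs(f2) else "Factor2"
        groups[key].append(variable)
    return groups


groups = assign_to_factors(rotated)
for factor, variables in groups.items():
    print(factor, "->", ", ".join(variables))
```

Note that Frequency loads almost equally on both factors (0.43024 and 0.48427), so its placement in Factor 2 rests on a small margin.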
Factor Scoring Coefficients:
This is a measure of the importance of each variable, i.e. how much a variable influences the measuring factor.
For each variable it is calculated here as the product of its Factor1 standardized scoring coefficient (Table no. 5) and its Factor1 loading in the factor pattern (Table no. 6).
Standardized Scoring Coefficients | ||
Factor1 | Factor2 | |
Preference | 0.42855 | 0.08272 |
Frequency | 0.24283 | 0.38601 |
Reason | -0.05711 | -0.52880 |
Premium Amount | 0.43709 | -0.00173 |
Knick Knacks | -0.31786 | 0.07592 |
Beverages | -0.29910 | 0.63685 |
Table no. 5
Factor Pattern | ||
Factor1 | Factor2 | |
Preference | 0.71016 | -0.02296 |
Frequency | 0.53154 | 0.37027 |
Reason | -0.29179 | -0.58213 |
Premium Amount | 0.69153 | -0.12074 |
Knick Knacks | -0.47445 | 0.17219 |
Beverages | -0.23117 | 0.80105 |
Table no. 6
Variable | Importance |
Preference | 0.3043 |
Frequency | 0.1291 |
Reason | 0.0167 |
Premium Amount | 0.3023 |
Knick Knacks | 0.1508 |
Beverages | 0.0691 |
From the values above it is clear that Preference, i.e. whether people prefer a coffee shop or home-made coffee, has the greatest influence on consumer behavior towards coffee shops, closely followed by Premium Amount.
Hence the factor analysis was helpful in identifying the factors that influence the consumers’ preference towards the coffee shop.
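The importance values listed above can be checked with a small Python sketch. Working backwards from the reported figures, each value equals the Factor1 standardized scoring coefficient multiplied by the Factor1 loading; that reading is an inference from the numbers, not something the SAS output states. Inputs are copied from Tables no. 5 and 6.

```python
# Factor1 standardized scoring coefficients (Table no. 5).
scoring_f1 = {
    "Preference": 0.42855,
    "Frequency": 0.24283,
    "Reason": -0.05711,
    "Premium Amount": 0.43709,
    "Knick Knacks": -0.31786,
    "Beverages": -0.29910,
}

# Factor1 loadings from the unrotated factor pattern (Table no. 6).
pattern_f1 = {
    "Preference": 0.71016,
    "Frequency": 0.53154,
    "Reason": -0.29179,
    "Premium Amount": 0.69153,
    "Knick Knacks": -0.47445,
    "Beverages": -0.23117,
}

# Importance = scoring coefficient x loading, rounded as in the report.
importance = {
    name: round(scoring_f1[name] * pattern_f1[name], 4)
    for name in scoring_f1
}

most_influential = max(importance, key=importance.get)

for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {value:.4f}")
print("Most influential variable:", most_influential)
```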
Discriminant analysis
1. This table simply gives information about the sample size, number of independent variables and categories or groups of dependent variable.
Total Sample Size | 200 | DF Total | 199 |
Variables | 6 | DF Within Classes | 194 |
Classes | 6 | DF Between Classes | 5 |
2. This table indicates missing values, if any. Since the number of observations read equals the number of observations used, there are no missing values here.
Number of Observations Read | 200 |
Number of Observations Used | 200 |
3. This table gives information about the dependent variable in particular. Since the number of observations was assumed to be equal in all six categories, the prior probability of each category is 1/6 = 0.166667. Equal priors are the default when there is no information on how the sample divides among the categories; with two categories this gives 0.5, and with six it gives 0.166667. If prior information is available, SAS has options to set the priors in proportion to the sample characteristics.
Class Level Information
Among the following, my fav_0001 | Variable Name | Frequency | Weight | Proportion | Prior Probability |
BARISTA | BARISTA | 41 | 41.0000 | 0.205000 | 0.166667 |
CCD | CCD | 82 | 82.0000 | 0.410000 | 0.166667 |
COSTA COFFEE | COSTA COFFEE | 42 | 42.0000 | 0.210000 | 0.166667 |
FIESTA | FIESTA | 7 | 7.0000 | 0.035000 | 0.166667 |
MINERVA COFFEE SHOP | MINERVA COFFEE SHOP | 7 | 7.0000 | 0.035000 | 0.166667 |
TESTA ROSSA CAFFÈ | TESTA ROSSA CAFFÈ | 21 | 21.0000 | 0.105000 | 0.166667 |
Table no. 7
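The two ways of setting prior probabilities can be contrasted with a minimal Python sketch. The class frequencies are copied from Table no. 7; the variable names are illustrative, and the proportional priors are shown only for comparison since the analysis itself uses equal priors.

```python
# Class frequencies copied from Table no. 7.
counts = {
    "BARISTA": 41,
    "CCD": 82,
    "COSTA COFFEE": 42,
    "FIESTA": 7,
    "MINERVA COFFEE SHOP": 7,
    "TESTA ROSSA CAFFÈ": 21,
}

n = sum(counts.values())

# Equal priors: every one of the k classes gets 1/k.
equal_priors = {shop: 1 / len(counts) for shop in counts}

# Proportional priors: each class gets its share of the sample.
proportional_priors = {shop: c / n for shop, c in counts.items()}

for shop in counts:
    print(f"{shop:20s} equal={equal_priors[shop]:.6f} "
          f"proportional={proportional_priors[shop]:.6f}")
```

The gap between the two columns is largest for CCD, which makes up 41% of the sample, and for FIESTA and MINERVA COFFEE SHOP, which make up 3.5% each.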
4. The table below is equivalent to the “log determinants” table of SPSS. The difference is that SPSS gives three rows of data, whereas SAS gives only the last row of that table. There are no Box’s M test results in the SAS EG output.
Pooled Covariance Matrix Information
Covariance Matrix Rank | Natural Log of the Determinant of the Covariance Matrix |
6 | 3.93903 |
Table no. 8
5. The table below is similar to the “Tests of equality of group means” table of SPSS. The Total SD, Pooled SD and Between SD columns are of little use here; concentrate on the last column, Pr > F, which is the same as the Sig column of SPSS and gives the p values. At the 5% level only ambience is significant (p = 0.0448); price, taste, quantity, variety and location all have p values above 0.05. The table therefore gives statistical evidence of a difference between the means of the six categories of the dependent variable for ambience only, so ambience is the one variable that clearly helps to discriminate between the categories. Next, the values of the R-Square column are taken into consideration. (Subtracting the R-Square value from 1 gives the Wilks’ Lambda value for an individual variable.) The R-Square value indicates the proportion of the discrimination among the categories of the dependent variable that a single independent variable explains. For example, ambience explains 5.64% of the discrimination in the dependent variable, which makes it the strongest discriminating independent variable.
Univariate Test Statistics
F Statistics, Num DF=5, Den DF=194
Variable | Total Standard Deviation | Pooled Standard Deviation | Between Standard Deviation | R-Square | R-Square / (1-RSq) | F Value | Pr > F |
price | 1.6265 | 1.6189 | 0.3288 | 0.0342 | 0.0354 | 1.37 | 0.2354 |
taste | 1.2194 | 1.2226 | 0.1886 | 0.0200 | 0.0205 | 0.79 | 0.5555 |
quantity | 1.6134 | 1.6250 | 0.1853 | 0.0110 | 0.0112 | 0.43 | 0.8250 |
ambience | 1.4240 | 1.4009 | 0.3696 | 0.0564 | 0.0598 | 2.32 | 0.0448 |
variety | 1.4886 | 1.4876 | 0.2647 | 0.0265 | 0.0272 | 1.06 | 0.3867 |
location | 1.6289 | 1.6390 | 0.2023 | 0.0129 | 0.0131 | 0.51 | 0.7702 |
Table no. 9
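The reading of Table no. 9 can be sketched in Python as follows. The R-Square and Pr > F values are copied from the table; the 0.05 significance level is an assumed conventional threshold, and the names are illustrative.

```python
# (R-Square, Pr > F) per variable, copied from Table no. 9.
univariate = {
    "price": (0.0342, 0.2354),
    "taste": (0.0200, 0.5555),
    "quantity": (0.0110, 0.8250),
    "ambience": (0.0564, 0.0448),
    "variety": (0.0265, 0.3867),
    "location": (0.0129, 0.7702),
}
ALPHA = 0.05  # assumed significance level

# Variables whose group means differ significantly.
significant = [v for v, (_, p) in univariate.items() if p < ALPHA]

# Univariate Wilks' Lambda is 1 minus R-Square.
wilks_lambda = {v: 1 - r2 for v, (r2, _) in univariate.items()}

# The variable with the largest R-Square discriminates most strongly.
strongest = max(univariate, key=lambda v: univariate[v][0])

for v, (r2, p) in univariate.items():
    flag = "significant" if p < ALPHA else "not significant"
    print(f"{v:9s} R2={r2:.4f} Wilks={wilks_lambda[v]:.4f} p={p:.4f} {flag}")
print("Strongest discriminator:", strongest)
```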
6. The table below is similar to the “Eigen values” table of SPSS. The first column, the canonical correlation, is analyzed first: its square (0.292793² = 0.085728) appears in the Squared Canonical Correlation column. This value indicates that the first discriminant function explains 8.57% of the discrimination that exists between the categories of the dependent variable. A more technical way of saying this is that approximately 8.57% of the variance in the discriminant scores is explained by the differences among the groups.
Columns Eigen value to Cumulative: Eigen values of Inv(E)*H = CanRsq/(1-CanRsq). Columns Likelihood Ratio to Pr > F: Test of H0: The canonical correlations in the current row and all that follow are zero.
Function | Canonical Correlation | Adjusted Canonical Correlation | Approximate Standard Error | Squared Canonical Correlation | Eigen value | Difference | Proportion | Cumulative | Likelihood Ratio | Approximate F Value | Num DF | Den DF | Pr > F |
1 | 0.292793 | 0.186753 | 0.064811 | 0.085728 | 0.0938 | 0.0347 | 0.4788 | 0.4788 | 0.82730832 | 1.23 | 30 | 758 | 0.1896 |
2 | 0.236103 | 0.176764 | 0.066936 | 0.055745 | 0.0590 | 0.0326 | 0.3014 | 0.7802 | 0.90488171 | 0.97 | 20 | 631.11 | 0.5031 |
3 | 0.160412 | . | 0.069064 | 0.025732 | 0.0264 | 0.0113 | 0.1349 | 0.9150 | 0.95830193 | 0.68 | 12 | 505.63 | 0.7677 |
4 | 0.122056 | . | 0.069832 | 0.014898 | 0.0151 | 0.0136 | 0.0772 | 0.9923 | 0.98361234 | 0.53 | 6 | 384 | 0.7848 |
5 | 0.038891 | . | 0.070781 | 0.001513 | 0.0015 | | 0.0077 | 1.0000 | 0.99848748 | 0.15 | 2 | 193 | 0.8641 |
Table no. 10
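The relation between the columns of Table no. 10 can be illustrated with a brief Python sketch. It squares each canonical correlation and applies the formula given in the table caption, eigenvalue = CanRsq/(1-CanRsq). The correlations are copied from the table; the names are illustrative.

```python
# Canonical correlations copied from Table no. 10.
canonical_correlations = [0.292793, 0.236103, 0.160412, 0.122056, 0.038891]

# Squared canonical correlation: share of variance in the discriminant
# scores explained by differences among the groups.
squared = [r ** 2 for r in canonical_correlations]

# Eigenvalue of Inv(E)*H, using the formula from the table caption.
eigenvalues = [s / (1 - s) for s in squared]

for i, (r, s, ev) in enumerate(zip(canonical_correlations, squared, eigenvalues), 1):
    print(f"Function {i}: r={r:.6f} r^2={s:.6f} ({s:.2%}) eigenvalue={ev:.4f}")
```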
7. The table below is the discriminant loadings matrix, which is interpreted in the same way as factor loadings. It represents the correlation of each predictor variable with the discriminant function. It is preferable to comment on the strength of the predictors to discriminate among groups based on the structure matrix, as it is considered more accurate and free from the multicollinearity issues that may exist among the variables. The discriminating factor is named according to the variables that load highly on the discriminating function.
This table is read in combination with the Univariate Test Statistics table. First the significant discriminating variables are determined, then the discriminant loadings are checked to comment on the strength of each variable’s discriminating power.
Pooled Within Canonical Structure | |||||
Variable | Can1 | Can2 | Can3 | Can4 | Can5 |
price | 0.199888 | 0.688744 | 0.055365 | 0.459585 | 0.516915 |
taste | -0.139108 | -0.218914 | -0.743423 | -0.247219 | 0.435684 |
quantity | 0.041338 | 0.319034 | -0.366515 | 0.269634 | -0.481089 |
ambience | -0.782018 | -0.201699 | 0.022977 | 0.008311 | 0.163086 |
variety | 0.119532 | -0.633160 | -0.205030 | 0.259332 | 0.199497 |
location | 0.183970 | 0.151962 | 0.346928 | -0.592870 | 0.192306 |
8. The Standardized coefficients allow comparing variables measured on different scales. The coefficients with large absolute values correspond to variables with greater discriminating ability.
Pooled Within-Class Standardized Canonical Coefficients | |||||
Variable | Can1 | Can2 | Can3 | Can4 | Can5 |
price | 0.085171068 | 0.519161498 | 0.103813766 | 0.545930743 | 0.702351598 |
taste | -0.047678641 | 0.136660609 | -0.997160810 | -0.526994939 | 0.423073841 |
quantity | 0.048332853 | 0.439269152 | -0.408407886 | 0.149549382 | -0.750017915 |
ambience | -1.081197585 | 0.052523964 | 0.320821120 | 0.173011118 | 0.130047778 |
variety | 0.558835378 | -0.811436501 | 0.251745336 | 0.628704704 | 0.229370505 |
location | 0.337174077 | 0.191037661 | 0.425150885 | -0.698316135 | 0.129086861 |
9. This table shows the group centroids, i.e. the mean of each canonical variable (discriminant score) within each group. If discriminant scores are used for classification, the centroids are useful in calculating optimal cut-off scores.
Class Means on Canonical Variables | |||||
Among the following, my fav_0001 | Can1 | Can2 | Can3 | Can4 | Can5 |
BARISTA | -0.033780254 | 0.135037103 | -0.022885540 | -0.055598468 | 0.069801548 |
CCD | 0.081639110 | -0.225347494 | 0.092505485 | 0.044394331 | -0.003944585 |
COSTA COFFEE | 0.003206959 | 0.382693089 | 0.058677253 | 0.066406709 | -0.033620752 |
FIESTA | -0.118319509 | -0.028275201 | 0.166074582 | -0.587293723 | -0.064305976 |
MINERVA COFFEE SHOP | -1.504012439 | -0.195732556 | -0.189713147 | 0.068696815 | -0.021179645 |
TESTA ROSSA CAFFÈ | 0.281534033 | -0.074432482 | -0.426005586 | -0.024747683 | -0.025139838 |
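10. The table below gives the linear discriminant function of each coffee shop: a constant and one coefficient per predictor variable. A respondent’s ratings are substituted into all six functions, and the respondent is classified into the shop whose function gives the highest score. In the classification tables that follow, the rows are the observed categories of the dependent variable and the columns are the predicted categories. When prediction is perfect all cases lie on the diagonal, and the percentage of cases on the diagonal is the percentage of correct classifications. The cross-validated set of data is a more honest presentation of the power of the discriminant function than the original classifications and often produces a poorer outcome.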
Linear Discriminant Function for Among the following, my fav_0001 | ||||||
Variable | BARISTA | CCD | COSTA COFFEE | FIESTA | MINERVA COFFEE SHOP | TESTA ROSSA CAFFÈ |
Constant | -6.93204 | -6.56254 | -7.10828 | -6.32001 | -7.63554 | -6.84312 |
price | 1.46664 | 1.36627 | 1.54951 | 1.18446 | 1.27497 | 1.35942 |
taste | 0.42324 | 0.21572 | 0.29458 | 0.43694 | 0.49461 | 0.67016 |
quantity | 0.42296 | 0.34321 | 0.52947 | 0.34177 | 0.38518 | 0.52369 |
ambience | 0.70110 | 0.63043 | 0.70598 | 0.72538 | 1.79210 | 0.35256 |
variety | 0.77950 | 1.06985 | 0.70773 | 0.62342 | 0.41789 | 0.94239 |
location | 0.83828 | 0.80154 | 0.83578 | 1.06683 | 0.39388 | 0.75354 |
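To show how the Linear Discriminant Function table is used, here is a minimal Python sketch that scores a respondent on all six functions and picks the shop with the highest score. The constants and coefficients are copied from the table; the function names and the respondent's ratings are hypothetical and chosen only for illustration.

```python
VARIABLES = ["price", "taste", "quantity", "ambience", "variety", "location"]

# shop -> (constant, coefficients in the order of VARIABLES), from the table.
FUNCTIONS = {
    "BARISTA": (-6.93204, [1.46664, 0.42324, 0.42296, 0.70110, 0.77950, 0.83828]),
    "CCD": (-6.56254, [1.36627, 0.21572, 0.34321, 0.63043, 1.06985, 0.80154]),
    "COSTA COFFEE": (-7.10828, [1.54951, 0.29458, 0.52947, 0.70598, 0.70773, 0.83578]),
    "FIESTA": (-6.32001, [1.18446, 0.43694, 0.34177, 0.72538, 0.62342, 1.06683]),
    "MINERVA COFFEE SHOP": (-7.63554, [1.27497, 0.49461, 0.38518, 1.79210, 0.41789, 0.39388]),
    "TESTA ROSSA CAFFÈ": (-6.84312, [1.35942, 0.67016, 0.52369, 0.35256, 0.94239, 0.75354]),
}


def discriminant_scores(ratings):
    """Return the linear discriminant score of every shop for one respondent."""
    scores = {}
    for shop, (constant, coefficients) in FUNCTIONS.items():
        scores[shop] = constant + sum(
            coef * ratings[var] for coef, var in zip(coefficients, VARIABLES)
        )
    return scores


def classify(ratings):
    """Assign the respondent to the shop with the highest score."""
    scores = discriminant_scores(ratings)
    return max(scores, key=scores.get)


# Hypothetical respondent; the rating scale is an assumption.
respondent = {"price": 3, "taste": 4, "quantity": 2,
              "ambience": 5, "variety": 3, "location": 2}
print(classify(respondent))
```

Because the analysis uses equal priors, comparing the raw scores is enough; no further adjustment for class size is needed in this sketch.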
11. The following two tables are classification matrices. The first is for the analysis sample; the second is for validation of the proposed model.
The diagonal values should be checked here for improvement in predictions. The hit ratio is calculated from the diagonal values and measures the proportion of respondents whose preferred coffee shop the model predicted correctly.
Hit Ratio = (2+25+16+2+4+6)/200 = 55/200 = 27.5%
Number of Observations and Percent Classified into Among the following, my fav_0001
From Among the following, my fav_0001 | BARISTA | CCD | COSTA COFFEE | FIESTA | MINERVA COFFEE SHOP | TESTA ROSSA CAFFÈ | Total |
BARISTA | 2 4.88 | 8 19.51 | 10 24.39 | 8 19.51 | 5 12.20 | 8 19.51 | 41 100.00 |
CCD | 4 4.88 | 25 30.49 | 17 20.73 | 13 15.85 | 9 10.98 | 14 17.07 | 82 100.00 |
COSTA COFFEE | 0 0.00 | 5 11.90 | 16 38.10 | 7 16.67 | 5 11.90 | 9 21.43 | 42 100.00 |
FIESTA | 0 0.00 | 1 14.29 | 2 28.57 | 2 28.57 | 1 14.29 | 1 14.29 | 7 100.00 |
MINERVA COFFEE SHOP | 0 0.00 | 1 14.29 | 1 14.29 | 1 14.29 | 4 57.14 | 0 0.00 | 7 100.00 |
TESTA ROSSA CAFFÈ | 1 4.76 | 4 19.05 | 5 23.81 | 3 14.29 | 2 9.52 | 6 28.57 | 21 100.00 |
Total | 7 3.50 | 44 22.00 | 51 25.50 | 34 17.00 | 26 13.00 | 38 19.00 | 200 100.00 |
Priors | 0.16667 | 0.16667 | 0.16667 | 0.16667 | 0.16667 | 0.16667 |
Number of Observations and Percent Classified into Among the following, my fav_0001 | |||||||
From Among the following, my fav_0001 | BARISTA | CCD | COSTA COFFEE | FIESTA | MINERVA COFFEE SHOP | TESTA ROSSA CAFFÈ | Total |
BARISTA | 1 2.44 | 8 19.51 | 11 26.83 | 8 19.51 | 5 12.20 | 8 19.51 | 41 100.00 |
CCD | 4 4.88 | 20 24.39 | 17 20.73 | 14 17.07 | 9 10.98 | 18 21.95 | 82 100.00 |
COSTA COFFEE | 0 0.00 | 5 11.90 | 14 33.33 | 8 19.05 | 5 11.90 | 10 23.81 | 42 100.00 |
FIESTA | 1 14.29 | 2 28.57 | 2 28.57 | 0 0.00 | 1 14.29 | 1 14.29 | 7 100.00 |
MINERVA COFFEE SHOP | 0 0.00 | 1 14.29 | 1 14.29 | 1 14.29 | 4 57.14 | 0 0.00 | 7 100.00 |
TESTA ROSSA CAFFÈ | 1 4.76 | 5 23.81 | 5 23.81 | 3 14.29 | 2 9.52 | 5 23.81 | 21 100.00 |
Total | 7 3.50 | 41 20.50 | 50 25.00 | 34 17.00 | 26 13.00 | 42 21.00 | 200 100.00 |
Priors | 0.16667 | 0.16667 | 0.16667 | 0.16667 | 0.16667 | 0.16667 |
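The hit ratio for both classification matrices can be computed with a short Python sketch. The counts are copied from the two tables above (rows are observed shops, columns are predicted shops, in the order BARISTA, CCD, COSTA COFFEE, FIESTA, MINERVA COFFEE SHOP, TESTA ROSSA CAFFÈ); treating the second table as the cross-validated one follows the description in item 11, and the function name is illustrative.

```python
# Analysis-sample classification counts (first table).
analysis = [
    [2, 8, 10, 8, 5, 8],
    [4, 25, 17, 13, 9, 14],
    [0, 5, 16, 7, 5, 9],
    [0, 1, 2, 2, 1, 1],
    [0, 1, 1, 1, 4, 0],
    [1, 4, 5, 3, 2, 6],
]

# Validation classification counts (second table).
validation = [
    [1, 8, 11, 8, 5, 8],
    [4, 20, 17, 14, 9, 18],
    [0, 5, 14, 8, 5, 10],
    [1, 2, 2, 0, 1, 1],
    [0, 1, 1, 1, 4, 0],
    [1, 5, 5, 3, 2, 5],
]


def hit_ratio(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(f"Analysis sample hit ratio:   {hit_ratio(analysis):.1%}")
print(f"Validation sample hit ratio: {hit_ratio(validation):.1%}")
```

As the text notes, the validation figure comes out lower than the analysis-sample figure.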
Conclusion:
Factor Analysis
From the factor analysis report, it is known that Preference, i.e. whether people prefer a coffee shop or home-made coffee, has the greatest influence on consumer behavior towards coffee shops. The analysis was helpful in identifying the factors that influence the consumers’ preference towards the coffee shop. The factors identified are fondness related variables and intellection related variables.
Discriminant Analysis
With the results from SAS and manually calculated discriminant scores, it can be concluded that the data are categorized into six groups: CCD, BARISTA, COSTA COFFEE, FIESTA, MINERVA COFFEE SHOP and TESTA ROSSA CAFFÈ. For any new entry in the respondent sheet, the model can predict which coffee shop the new entrant would prefer.
For example: if a new entrant X defines his preference as price and quantity, the results would give the inference that he prefers CCD.
The analysis gives one discriminant function per coffee shop, with the constant and coefficients taken from that shop’s column of the Linear Discriminant Function table. For CCD, for example, the discriminant equation obtained is:
D(CCD) = -6.56254 + (1.36627*price) + (0.21572*taste) + (0.34321*quantity) + (0.63043*ambience) + (1.06985*variety) + (0.80154*location)
The new entrant is assigned to the coffee shop whose function gives the highest score.
Hence, discriminant analysis has helped to determine the choice of coffee shop based on the preference of the consumers for variables like price, quantity, location, ambience, variety, taste.
Recommendation:
Branded coffee shops should prefer locations near large corporate offices, so that employees feel like visiting to relax from job tensions and meetings.
Branded coffee shops need to reduce their overall costs and offer many alternative variants of coffees and snacks.
Branded coffee shops should never lose their core competency in taste when serving hot coffee to customers.