Introduction

The automotive industry has historically designed vehicles around two broad considerations: utility and overall efficiency. First, consider utility. Vehicles are used to transport people and things from one place to another. Passenger capacity, payload capacity and power are all considered and maximized to give the customer the best utility package for a desired level of performance. To deliver the expected performance, manufacturers typically combine large, powerful engines with robust powertrains in a roomy and usually large vehicle. On the opposite side of the spectrum is efficiency. Small aerodynamic designs and fuel-efficient engines give the customer a vehicle that operates economically. These designs typically produce small, limited-capacity, lower-performance vehicles.

With the recent increase in gas prices, customers are becoming more aware of vehicle efficiency, but most are not willing to give up the performance they need. This is forcing a change in the way automotive manufacturers design their vehicles. With the wide variety of engines and powertrains available, customers and manufacturers alike have choices when considering a vehicle’s design and how it will fit its functional purpose. This paper investigates the influence of engine size on vehicle fuel economy. Although several factors influence a vehicle’s performance and fuel economy, it is expected that engine size alone will show a direct correlation with fuel economy.

Literature Review

The industry standard measure of engine size is volume, typically reported in liters or cubic centimeters. This suggests that engine size will influence fuel economy: a larger engine has a larger volume and so tends to use more gasoline, reducing the fuel efficiency of the vehicle. The American Automobile Association (2006) reported that on December 18, 2006 the national average price for one gallon of regular unleaded gasoline was $2.30. The difference in cost can be illustrated with two vehicles that differ in engine size and performance capacity. The EPA (2006) reports on its website that the 2004 Honda Civic with a 1.4L engine has a city fuel economy of 31 miles per gallon, and that the 2004 Ford Expedition with a 4.6L engine has a city fuel economy of 14 miles per gallon. If both vehicles were driven 500 city miles, the gasoline usage and cost would break down as follows. The Honda Civic would use about 16 gallons of gas at a cost of $37.10. The Ford Expedition would use about 36 gallons of gas at a cost of $82.14, more than double that of the Civic.
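The arithmetic behind this comparison can be checked with a short sketch. The helper name trip_cost is illustrative and not from any source; the price and fuel economy figures are the ones quoted above.

```python
# Worked version of the 500-mile cost comparison.
# trip_cost is a hypothetical helper name used only for illustration.

PRICE_PER_GALLON = 2.30  # national average on December 18, 2006, as quoted
CITY_MILES = 500


def trip_cost(miles, city_mpg, price_per_gallon=PRICE_PER_GALLON):
    """Return (gallons used, fuel cost in dollars) for a trip."""
    gallons = miles / city_mpg
    return gallons, gallons * price_per_gallon


civic_gallons, civic_cost = trip_cost(CITY_MILES, 31)
expedition_gallons, expedition_cost = trip_cost(CITY_MILES, 14)

print(f"Civic:      {civic_gallons:.1f} gal, ${civic_cost:.2f}")
print(f"Expedition: {expedition_gallons:.1f} gal, ${expedition_cost:.2f}")
print(f"Cost ratio: {expedition_cost / civic_cost:.2f}")
```

The cost ratio equals 31/14, about 2.2, whatever the price of gasoline, because both vehicles cover the same distance at the same price per gallon.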

The above example is only a single instance. Many arguments can be made about why the two vehicles differ. For instance, one expert may point out that the weights of the two vehicles are very different. This is true, but the discussion about fuel economy almost always seems to come back to small cars with small engines.

Method

Data for this analysis were collected by Roger W. Johnson for 428 new 2004 model year vehicles. The data cover all categories of cars and trucks. The source for the data is Kiplinger’s Personal Finance December 2003 vol. 57 no. 12. Nineteen different variables are included in the data set, including vehicle type, price, manufacturer, engine size, weight and fuel economy. For the purposes of this paper, a random sample of 40 vehicles was selected and analyzed, concentrating on two variables: engine size and city miles per gallon.
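One way to draw such a sample is sketched below. How the sample for this paper was actually drawn is not recorded here; identifying vehicles by row numbers 1 to 428 and the seed value are assumptions made purely for illustration.

```python
import random

POPULATION_SIZE = 428  # vehicles in the full data set
SAMPLE_SIZE = 40       # vehicles analyzed in this paper

# Assumption: each vehicle is identified by its row number, 1..428.
vehicle_rows = range(1, POPULATION_SIZE + 1)

# A fixed seed (arbitrary value) makes the draw repeatable.
rng = random.Random(2004)

# Simple random sample without replacement: every vehicle is equally
# likely to be chosen and no vehicle can be chosen twice.
sample_rows = sorted(rng.sample(vehicle_rows, SAMPLE_SIZE))

print(sample_rows)
```

Sampling without replacement matters here because a vehicle drawn twice would be counted twice in every statistic that follows.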

Results

As described in the Method section, a random sample of 40 vehicles was selected, and statistical analysis was performed on the engine size and city miles per gallon (MPG) of those vehicles. Histograms and boxplots were created to check whether the data are normally distributed and to investigate any outliers that may influence the sample mean. Figures 1 and 2 show the histograms for each variable. Neither variable is normally distributed. Figure 1 shows the engine size data slightly skewed to the right: smaller engines occur more frequently than larger engines, and the tail extends toward the larger sizes. This is consistent with what is expected of the whole population, since smaller engines are common in passenger cars and small SUVs, which make up a large part of it.

The histograms also tell us about the mean and the standard deviation of the sample. Figure 2 shows a city MPG mean of 20.46 with a standard deviation of 3.926. Note that the city MPG data have two bars far to the right, which may be outliers and could be affecting the sample mean. Next we will examine the boxplot for each variable to learn more about the distribution and variation of the data.

Figures 3 and 4 show the boxplot for each variable. Figure 3 shows the engine size data, which look as expected from the histogram. The median lies near the bottom of the box, toward the smaller engine sizes, and no outliers are shown, so the sample mean is not being distorted by extreme values.

Figure 4 shows the city MPG boxplot. As seen in the histogram, there are two outliers in this sample. These outliers affect the sample mean, but they are not incorrect values. They are most likely city MPG values for extreme cases, possibly hybrid or small compact vehicles. To gain a higher degree of accuracy, this analysis could be redone by separating the vehicles into segments (e.g. small, medium and large) and determining what effect the segment has, combined with engine size, on fuel economy.
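Boxplots conventionally flag as outliers any points lying more than 1.5 times the interquartile range beyond the quartiles. That the boxplots here follow exactly this convention is an assumption. The sketch below applies it to the quartiles and extremes reported in the descriptive statistics later in this section.

```python
def boxplot_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences using the k * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes as reported in the descriptive statistics.
engine = {"q1": 2.225, "q3": 3.725, "min": 1.300, "max": 5.300}
city_mpg = {"q1": 18.000, "q3": 22.000, "min": 14.000, "max": 32.000}

for name, s in (("Engine size", engine), ("City MPG", city_mpg)):
    low, high = boxplot_fences(s["q1"], s["q3"])
    has_outlier = s["min"] < low or s["max"] > high
    print(f"{name}: fences {low:.3f} to {high:.3f}, outliers present: {has_outlier}")
```

Under this convention the engine size extremes fall inside the fences, while the city MPG maximum of 32 lies above its upper fence, which agrees with what the two boxplots show.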

The above analysis suggests that the sample is a reasonable representation of the population and that the data are fit to be analyzed further for a correlation between the two variables. Descriptive statistics were computed for each variable. The results for engine size are as follows:

Descriptive Statistics: Engine Size n=40

Variable          Mean   SE Mean  StDev  Variance  Minimum  Q1     Median  Q3     Maximum
Engine Size n=40  2.925  0.168    1.063  1.131     1.300    2.225  2.500   3.725  5.300

Using the range rule of thumb (mean ± 2 standard deviations), the range of usual values is 0.799 – 5.051. This means that we expect most vehicles to have an engine size within this range; anything outside the range would be considered unusual. The sample shows two unusual values, but they are very close to the range and do not materially affect the analysis.

The descriptive analysis for city MPG is as follows:

Descriptive Statistics: City MPG n=40

Variable       Mean    SE Mean  StDev  Variance  Minimum  Q1      Median  Q3      Maximum
City MPG n=40  20.462  0.629    3.926  15.413    14.000   18.000  21.000  22.000  32.000

Applying the range rule of thumb again with the mean and standard deviation gives a range of usual values of 12.61 – 28.31. The sample shows two unusual values outside this range; these are the outliers described above.
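Both ranges of usual values can be reproduced from the reported means and standard deviations. The function name usual_range is illustrative only.

```python
def usual_range(mean, stdev):
    """Range rule of thumb: usual values lie within mean +/- 2 standard deviations."""
    return mean - 2 * stdev, mean + 2 * stdev


engine_low, engine_high = usual_range(2.925, 1.063)
mpg_low, mpg_high = usual_range(20.462, 3.926)

print(f"Engine size: {engine_low:.3f} to {engine_high:.3f}")
print(f"City MPG:    {mpg_low:.2f} to {mpg_high:.2f}")

# Check whether each sample maximum falls above its upper limit.
print(5.300 > engine_high, 32.000 > mpg_high)
```

Both sample maxima (5.300 and 32.000) lie above their upper limits, while both minima (1.300 and 14.000) lie inside the ranges, so the unusual values in each variable are all on the high side.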

All the analysis thus far supports continuing the investigation. We have not discovered any flaws in the sample data, so the next step is to construct a scatterplot. Figure 5 shows the scatterplot with a fitted line, with city MPG on the x-axis and engine size on the y-axis. This plot gives us further evidence about the sample.

The scatterplot is helpful in showing clusters in the data. A cluster on the scatterplot would be a concentrated grouping of data points in one region of the graph. Two or more separate clusters would suggest that the sample was drawn from different populations. Figure 5 does not exhibit any such clusters. It is also important to note that the fitted line shows a general trend for the data. A couple of points do not fit the trend, but overall the data show that as engine size decreases, city MPG goes up.

The results of the statistical analysis of the sample give enough evidence to test for a correlation between the two variables. The next section discusses this correlation and the remaining findings of the analysis.

Discussion

The statistical analysis above suggests a correlation between engine size and city MPG. Minitab was used to test for a correlation, and the results are as follows:

Correlations: Engine Size n=40, City MPG n=40

Pearson correlation of Engine Size n=40 and City MPG n=40 = -0.689

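Interpreting the data, the correlation coefficient r is -0.689; the negative sign indicates that city MPG falls as engine size rises. For a sample size of 40, the critical values of r at the 95% and 99% confidence levels (significance levels of 0.05 and 0.01) are 0.312 and 0.402 respectively. The absolute value of the correlation coefficient, 0.689, exceeds both critical values, indicating a statistically significant linear correlation between engine size and city MPG.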
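An equivalent way to check significance is to convert r to a t statistic with n - 2 degrees of freedom. This test statistic is a standard alternative to the critical value table and is not part of the Minitab output shown above. The sketch below also repeats the table comparison using the critical values quoted in the text.

```python
import math

r = -0.689  # Pearson correlation from the Minitab output
n = 40      # sample size
critical_r = {0.05: 0.312, 0.01: 0.402}  # critical values quoted in the text

# Table method: compare |r| with each critical value.
for alpha, r_crit in critical_r.items():
    print(f"alpha = {alpha}: |r| > {r_crit} is {abs(r) > r_crit}")

# Equivalent test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"t = {t_stat:.2f} with {n - 2} degrees of freedom")
```

A t statistic of this size, far from zero on 38 degrees of freedom, leads to the same conclusion as the table comparison.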

Now that a significant correlation has been established, the regression formula can be used to predict values of engine size and city MPG. The last thing that must be considered is the r^2 value. In Figure 5 the r^2 is shown as 47.5%. This means that 47.5% of the variation in engine size is explained by its linear relationship with city MPG, as expressed in the regression formula below:

Engine Size = 6.623 – 0.1828 * City MPG

Put simply, the regression formula accounts for 47.5% of the variation in engine size across the sample. The remaining 52.5% is due to factors not included in the formula, such as vehicle weight, and to random variation. It does not mean that the formula predicts the correct value 47.5% of the time. The formula is still useful for prediction, but its predictions are estimates, and an individual vehicle may differ noticeably from the fitted line.
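The r^2 value follows directly from the correlation coefficient. A brief check, using the r reported by Minitab:

```python
r = -0.689  # Pearson correlation reported by Minitab

r_squared = r ** 2           # share of variation explained by the fitted line
unexplained = 1 - r_squared  # share left to other factors and random variation

print(f"r^2 = {r_squared:.1%}, unexplained = {unexplained:.1%}")
```

Squaring removes the sign, so r^2 says how strong the linear relationship is but not its direction; the direction comes from the sign of r or of the slope.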

The regression formula may be used to predict the required engine size given a desired city MPG, or it can be rearranged to predict the expected city MPG given a known engine size. Automotive manufacturers would use the first form when designing a vehicle: if they were targeting a specific fuel economy range, they could determine what size engine to design into the vehicle. Customers shopping for a vehicle would use the second form to compare the expected city MPG of two different available engine sizes.
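Both uses of the formula can be sketched as follows. The function names are illustrative. Note that solving the fitted equation for city MPG is a purely algebraic rearrangement; a separate regression of city MPG on engine size would generally give a somewhat different line, so the second function should be read as a rough guide only.

```python
INTERCEPT = 6.623  # from the regression formula in the Discussion
SLOPE = -0.1828    # change in engine size per 1 city MPG


def predict_engine_size(city_mpg):
    """Engine size predicted by the fitted line for a target city MPG."""
    return INTERCEPT + SLOPE * city_mpg


def predict_city_mpg(engine_size):
    """City MPG obtained by solving the fitted line for city MPG."""
    return (engine_size - INTERCEPT) / SLOPE


print(f"{predict_engine_size(25):.2f}")  # manufacturer targeting 25 city MPG
print(f"{predict_city_mpg(2.5):.1f}")    # customer considering a 2.5 engine
print(f"{predict_city_mpg(3.5):.1f}")    # customer considering a 3.5 engine
```

Predictions are most trustworthy inside the range of the sample, roughly 14 to 32 city MPG and engine sizes of 1.3 to 5.3; outside that range the line is an extrapolation.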