Analysis of the Factors Affecting Grain Output

To study the factors that affect grain output in China, economic theory suggests that grain output is related to the following four factors. The model is specified as:

Y = α + β1X1 + β2X2 + β3X3 + β4X4 + U

X1: total power of agricultural machinery (10,000 kW)
X2: effective irrigated area (1,000 hectares)
X3: chemical fertilizer applied (10,000 tons)
X4: agricultural workforce (10,000 persons)
Y: total grain output (10,000 tons)

The data are as follows:

Region            X1        X2        X3       X4        Y
Beijing           399.2     328.2     17.9     69.7      144.2
Tianjin           593.4     353.2     16.6     79.7      124.1
Hebei             7000.4    4482.3    270.6    1665.5    2551.1
Shanxi            1701.3    1105      87       658.3     853.4
Inner Mongolia    1350.3    2371.7    74.8     524.3     1241.9
Liaoning          1339.8    1440.7    109.8    651.2     1140.0
Jilin             1015.4    1315.1    112.1    516.8     1638.0
Heilongjiang      1613.8    2032      121.6    744.1     2545.5
Shanghai          142.5     285.9     19.3     84.6      174.0
Jiangsu           2925.3    3900.9    335.5    1480.2    3106.6
Zhejiang          1990.1    1403.2    89.7     1014.9    1217.7
Anhui             2975.9    3197.2    253.2    2001.8    2472.1
Fujian            873.3     940.2     123.3    768.7     854.7
Jiangxi           902.3     1903.4    106.9    983.4     1614.6
Shandong          7025.2    4824.9    423.2    2462.6    3837.7
Henan             5780.6    4725.3    419.5    3558.6    4101.5
Hubei             1414.0    2072.5    247.1    1159.1    2218.5
Hunan             2209.7    2677.5    182.2    2071.4    2767.9
Guangdong         1763.9    1478.5    176.2    1570.1    1760.1
Guangxi           1467.9    1501.6    157.8    1556.8    1528.5
Hainan            200.9     179.8     26.3     177.2     199.6
Chongqing         586.5     624.6     72       921.5     1106.9
Sichuan           1679.7    2469      212.6    2631.1    3372.0
Guizhou           618.6     653.4     71.3     1372.1    1161.3
Yunnan            1301.3    1403.4    112.1    1674.3    1467.8
Tibet             114.5     157       2.5      90.1      96.2
Shaanxi           1042.9    1308      131.2    1002.2    1089.1
Gansu             1056.9    981.5     64.5     697.5     713.5
Qinghai           256.2     211.4     7.2      142.3     82.7
Ningxia           380.6     398.8     23.6     153.1     252.7
Xinjiang          851.2     3094.3    79.2     314.5     783.7

First, OLS estimation

Dependent Variable: Y
Method: Least Squares
Date: 05/16/04   Time: 14:53
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
X1          -0.136288      0.087494      -1.557681      0.1314
X2          0.301594       0.134812      2.237136       0.0341
X3          5.578372       1.919377      2.906345       0.0074
X4          0.359531       0.151924      2.366526       0.0257
C           79.59973       119.3616      0.666879       0.5107

R-squared             0.902706     Mean dependent var       1490.890
Adjusted R-squared    0.887738     S.D. dependent var       1141.343
S.E. of regression    382.4131     Akaike info criterion    14.87757
Sum squared resid     3802234.     Schwarz criterion        15.10886
Log likelihood        -225.6023    F-statistic              60.30791
Durbin-Watson stat    1.447710     Prob(F-statistic)        0.000000

The estimates show that the model fits well overall: the coefficient of determination is R2 = 0.9027, and the F-statistic of 60.31 is highly significant. Coefficient significance tests: for β1 the t-statistic is negative (the coefficient has an unexpected sign) and |t| = 1.558 with p = 0.1314 > 0.05, so β1 does not pass the test at the 5% level. Total agricultural machinery power therefore has no significant effect on grain output, and we tentatively decide to drop X1.

Second, all four explanatory variables are closely related to grain output, so they may be strongly collinear with one another. We test for multicollinearity:

(1) Using the simple correlation coefficient formula, the correlation matrix of the four explanatory variables is:

      X1                X2                X3                X4
X1    1                 0.882038357851    0.863333559223    0.714970041093
X2    0.882038357851    1                 0.901769706417    0.731461937668
X3    0.863333559223    0.901769706417    1                 0.848157708636
X4    0.714970041093    0.731461937668    0.848157708636    1

The correlation between X2 and X3 is the highest (0.9018), which indicates that the two may be collinear.

(2) Correction

Using OLS, regress Y on X1, X2, X3 and X4 one at a time.

Regression of Y on X1

Dependent Variable: Y
Method: Least Squares
Date: 05/16/04   Time: 15:00
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
X1          0.496455       0.073612      6.744229       0.0000
C           648.9313       180.3059      3.599057       0.0012

R-squared             0.610658     Mean dependent var       1490.890
Adjusted R-squared    0.597232     S.D. dependent var       1141.343
S.E. of regression    724.3415     Akaike info criterion    16.07074
Sum squared resid     15215448     Schwarz criterion        16.16326
Log likelihood        -247.0965    F-statistic              45.48463
Durbin-Watson stat    1.403900     Prob(F-statistic)        0.000000

Regression of Y on X2

Dependent Variable: Y
Method: Least Squares
Date: 05/16/04   Time: 15:01
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
X2          0.727633       0.074609      9.752561       0.0000
C           227.6144       164.1219      1.386862       0.1761

R-squared             0.766341     Mean dependent var       1490.890
Adjusted R-squared    0.758284     S.D. dependent var       1141.343
S.E. of regression    561.1372     Akaike info criterion    15.56015
Sum squared resid     9131373.     Schwarz criterion        15.65266
Log likelihood        -239.1823    F-statistic              95.11244
Durbin-Watson stat    0.880823     Prob(F-statistic)        0.000000

Regression of Y on X3

Dependent Variable: Y
Method: Least Squares
Date: 05/16/04   Time: 15:02
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
X3          9.366144       0.684930      13.67460       0.0000
C           238.0023       119.2935      1.995098       0.0555

R-squared             0.865737     Mean dependent var       1490.890
Adjusted R-squared    0.861108     S.D. dependent var       1141.343
S.E. of regression    425.3585     Akaike info criterion    15.00608
Sum squared resid     5246965.     Schwarz criterion        15.09860
Log likelihood        -230.5943    F-statistic              186.9948
Durbin-Watson stat    1.848900     Prob(F-statistic)        0.000000

Regression of Y on X4

Note that the output below was run with X4 as the dependent variable and Y as the regressor. In a two-variable regression R2 is the same in either direction, so the comparison of R2 values is unaffected.

Dependent Variable: X4
Method: Least Squares
Date: 05/16/04   Time: 15:02
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
Y           0.659424       0.073216      9.006522       0.0000
C           53.24803       136.6499      0.389668       0.6996

R-squared             0.736645     Mean dependent var       1036.377
Adjusted R-squared    0.727564     S.D. dependent var       876.9039
S.E. of regression    457.7038     Akaike info criterion    15.15266
Sum squared resid     6075291.     Schwarz criterion        15.24518
Log likelihood        -232.8663    F-statistic              81.11745
Durbin-Watson stat    1.402526     Prob(F-statistic)        0.000000

X3 gives the highest R2 (0.8657, against 0.7663 for X2, 0.7366 for X4 and 0.6107 for X1), so the linear relationship between Y and X3 is the strongest. Combining economic meaning with the statistical tests, we choose the following regression equation as the base (t-statistics in parentheses):

Y = 238.0023 + 9.366X3
    (1.995)    (13.6746)
R2 = 0.866   SE = 425.3585   F = 186.9948

Starting from this equation, the other variables are added stepwise:

1. Y = 230.705 - 0.054X1 + 10.112X3
       (1.905)   (-0.629)  (7.0372)
   R2 = 0.868   SE = 429.86   F = 91.749

2. Y = 148.389 - 0.135X1 + 0.259X2 + 8.379X3
       (1.184)   (-1.428)  (1.798)   (5.125)
   R2 = 0.882   SE = 413.7   F = 67.1

3. Y = 79.599 - 0.136X1 + 0.302X2 + 5.578X3 + 0.359X4
       (0.667)  (-1.56)   (2.237)   (2.906)   (2.367)
   R2 = 0.903   SE = 382.413   F = 60.308

In every specification the coefficient on X1 is negative and insignificant (|t| no larger than 1.56), so X1 is dropped. The resulting model is:

Dependent Variable: Y
Method: Least Squares
Date: 05/16/04   Time: 15:19
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
X2          0.202652       0.122011      1.660927       0.1083
X3          4.802746       1.901996      2.525107       0.0177
X4          0.358288       0.155883      2.298444       0.0295
C           125.2837       118.7193      1.055293       0.3006

R-squared             0.893626     Mean dependent var       1490.890
Adjusted R-squared    0.881807     S.D. dependent var       1141.343
S.E. of regression    392.3843     Akaike info criterion    14.90227
Sum squared resid     4157066.     Schwarz criterion        15.08731
Log likelihood        -226.9853    F-statistic              75.60752
Durbin-Watson stat    1.440474     Prob(F-statistic)        0.000000
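The screening in the second step relies on two quantities: the pairwise correlation coefficient and the R2 of a one-regressor fit. The following is a minimal sketch of both in plain Python, using the X3 and Y columns from the data table. The function names `pearson_r` and `simple_ols` are illustrative and are not part of the original EViews session; the printed values can be compared against the reported output for the regression of Y on X3.

```python
from math import sqrt

# X3 (fertilizer, 10,000 tons) and Y (grain output, 10,000 tons), in table order.
X3 = [17.9, 16.6, 270.6, 87, 74.8, 109.8, 112.1, 121.6, 19.3, 335.5, 89.7,
      253.2, 123.3, 106.9, 423.2, 419.5, 247.1, 182.2, 176.2, 157.8, 26.3,
      72, 212.6, 71.3, 112.1, 2.5, 131.2, 64.5, 7.2, 23.6, 79.2]
Y = [144.2, 124.1, 2551.1, 853.4, 1241.9, 1140.0, 1638.0, 2545.5, 174.0,
     3106.6, 1217.7, 2472.1, 854.7, 1614.6, 3837.7, 4101.5, 2218.5, 2767.9,
     1760.1, 1528.5, 199.6, 1106.9, 3372.0, 1161.3, 1467.8, 96.2, 1089.1,
     713.5, 82.7, 252.7, 783.7]


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


a, b, r2 = simple_ols(X3, Y)
print(f"Y = {a:.4f} + {b:.4f} * X3, R2 = {r2:.4f}")
# With one regressor, R2 equals the squared correlation, so it does not
# depend on which of the two variables is treated as dependent.
print(f"r(X3, Y)^2 = {pearson_r(X3, Y) ** 2:.4f}")
```

Because R2 in a one-regressor fit is just the squared correlation, ranking the four candidate variables by R2 is the same as ranking them by their absolute correlation with Y.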
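Third, the random disturbance term may contain omitted factors that affect grain output, so it may be autocorrelated. We test for this as follows:

(1) Graphical check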
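The residual plot shows a linear autoregressive pattern in the residuals, which indicates that the disturbance term is autocorrelated.

DW test

DW = 1.44, dL = 1.229, dU = 1.650. Since dL < DW < dU, the statistic falls in the inconclusive region: the test cannot determine whether autocorrelation is present, and further checking is needed.

Correction

From DW = 1.44 we obtain ρ = 1 - DW/2 = 0.28. Applying generalized differencing to Y, X2, X3 and X4 (for example, DY(t) = Y(t) - 0.28Y(t-1)) gives the following model:

Dependent Variable: DY
Method: Least Squares
Date: 05/16/04   Time: 15:56
Sample(adjusted): 2 31
Included observations: 30 after adjusting endpoints

Variable    Coefficient    Std. Error    t-Statistic    Prob.
DX2         0.255981       0.113195      2.261408       0.0323
DX3         4.192208       1.727270      2.427072       0.0225
DX4         0.357679       0.154627      2.313172       0.0289
C           82.65742       100.1592      0.825260       0.4167

R-squared             0.892301     Mean dependent var       1111.730
Adjusted R-squared    0.879874     S.D. dependent var       1106.490
S.E. of regression    383.5008     Akaike info criterion    14.86013
Sum squared resid     3823895.     Schwarz criterion        15.04695
Log likelihood        -218.9019    F-statistic              71.80427
Durbin-Watson stat    1.818796     Prob(F-statistic)        0.000000

After generalized differencing the DW value rises from 1.44 to 1.82, which lies above dU and below 4 - dU, so the autocorrelation has been removed.

Fourth, because of measurement error in the sample data and possible misspecification of the model, the variance of the random error term may change with one of the explanatory variables, so we test for heteroskedasticity.

Graphical check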
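The plot shows that the model exhibits heteroskedasticity of a complex form. We correct the model with a logarithmic transformation, which gives the following new model:

Dependent Variable: LY
Method: Least Squares
Date: 05/16/04   Time: 16:06
Sample: 1 31
Included observations: 31

Variable    Coefficient    Std. Error    t-Statistic    Prob.
LX2         0.372139       0.142715      2.607568       0.0147
LX3         0.520076       0.141261      3.681673       0.0010
LX4         0.137348       0.085233      1.611433       0.1187
C           1.043700       0.627773      1.662544       0.1080

R-squared             0.933482     Mean dependent var       6.857225
Adjusted R-squared    0.926091     S.D. dependent var       1.148277
S.E. of regression    0.312173     Akaike info criterion    0.629398
Sum squared resid     2.631209     Schwarz criterion        0.814428
Log likelihood        -5.755664    F-statistic              126.3013
Durbin-Watson stat    1.321994     Prob(F-statistic)        0.000000

After the logarithmic transformation the model's R2 rises from 0.8936 to 0.9335, and the heteroskedasticity is removed.

The tests and corrections above lead to the following final model:

LY = α + β2LX2 + β3LX3 + β4LX4 + U

Let Y* = LY, α* = α, β2* = β2, β3* = β3, β4* = β4, U* = U, X2* = LX2, X3* = LX3, X4* = LX4. Then:

Y* = α* + β2*X2* + β3*X3* + β4*X4* + U*

This model drops the irrelevant explanatory variable X1 and corrects for multicollinearity, autocorrelation and heteroskedasticity, and so attains a high goodness of fit. The conclusion is:

Grain output is related to the following three factors: effective irrigated area, chemical fertilizer applied, and the agricultural workforce.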
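The two corrections used in the third and fourth steps (generalized differencing with ρ taken from the DW statistic, and the log transformation) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the original EViews procedure: the function names are hypothetical, L is assumed to denote the natural logarithm (consistent with the reported mean of LY, 6.857), and the back-transformed prediction simply exponentiates the fitted log value without any retransformation adjustment.

```python
from math import exp, log


def dw_decision(dw, d_l, d_u):
    """Classify a Durbin-Watson statistic against the bounds d_l and d_u."""
    if dw < d_l:
        return "positive autocorrelation"
    if dw <= d_u:
        return "inconclusive"
    if dw < 4 - d_u:
        return "no autocorrelation"
    if dw <= 4 - d_l:
        return "inconclusive"
    return "negative autocorrelation"


def rho_from_dw(dw):
    """Approximate the first-order autocorrelation coefficient: rho = 1 - DW/2."""
    return 1.0 - dw / 2.0


def generalized_difference(series, rho):
    """Return z(t) - rho * z(t-1) for t = 2..n; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Coefficients of the reported log-log model (natural logs assumed).
ALPHA, B2, B3, B4 = 1.043700, 0.372139, 0.520076, 0.137348


def predict_y(x2, x3, x4):
    """Fitted grain output from LY = ALPHA + B2*LX2 + B3*LX3 + B4*LX4."""
    ly = ALPHA + B2 * log(x2) + B3 * log(x3) + B4 * log(x4)
    return exp(ly)


print(dw_decision(1.44, 1.229, 1.650))      # "inconclusive"
rho = rho_from_dw(1.44)
print(round(rho, 2))
# Generalized differences of the first three Y observations in the table.
print(generalized_difference([144.2, 124.1, 2551.1], rho))
# Fitted output for the first row of the table (X2=328.2, X3=17.9, X4=69.7).
print(predict_y(328.2, 17.9, 69.7))
```

In the log-log form each slope is an elasticity: holding the other inputs fixed, a 1% increase in fertilizer use multiplies fitted output by 1.01 raised to the power 0.520076, that is, roughly a 0.52% increase.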