Wednesday, May 6, 2020

Datasets On Rented Properties Samples - MyAssignmenthelp.com

Question: Discuss the datasets on rented properties.

Answer:

This report looks at two datasets on properties rented by students in Australia. It uses primary and secondary data to identify a few determinants of the rent paid by students: the chosen suburb, the number of bedrooms (as a proxy for size), the type of property (flat or house) and the bond amount paid for the property. Because we are working with a sample, the usefulness of the results is limited by the sampling technique and by the data collected.

Section 1

We first consider a primary dataset, obtained by interviewing students about their weekly rent. This dataset is problematic because:

- It is very small (only 5 data points).
- It is not comparable to the secondary data, which also provides information on suburb, number of bedrooms, dwelling category (flat or house) and bond amount paid.

The other dataset is secondary data with 500 observations. It is rich in additional information and can be used to make estimates campus-wise or to determine associations among variables. It can also be used for regression, to determine what affects rents in a significant way. Such conclusions can help students search for the right property based on their own requirements. A snapshot of this dataset is as follows:

Bond Amount   Weekly Rent   Dwelling Type   Number Bedrooms   Postcode   Suburb
$2,700        $675          Flat            3                 2031       RANDWICK
$3,000        $750          Flat            2                 2031       RANDWICK
$1,540        $385          Flat            2                 2144       AUBURN
$2,360        $590          House           3                 2144       AUBURN
$2,600        $650          Flat            1                 2000       SYDNEY

Section 2

We now summarise the primary data numerically and visually.

Weekly rent
Mean                 160
Standard Error       24.2899156
Median               150
Mode                 #N/A
Standard Deviation   54.31390246
Sample Variance      2950
Kurtosis             -1.952887101
Skewness             0.327662152
Range                130
Minimum              100
Maximum              230
Sum                  800
Count                5

It can be seen that the highest rent is $230, while the lowest is only $100, less than half of the maximum. The mean rent is $160 while the median is $150. The sample is small, but it is slightly skewed to the right (skewness 0.33, with the mean above the median).
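The entries in the Section 2 summary table are linked by simple formulas, so they can be cross-checked against each other. The following is a minimal Python sketch that starts only from the count, sum, minimum, maximum and sample variance reported above; the variable names are illustrative and the raw interview data is not used, because it is not listed in the report.

```python
import math

# Figures taken from the Section 2 summary table (weekly rent, primary data).
count = 5
total = 800
minimum, maximum = 100, 230
sample_variance = 2950

mean = total / count                      # arithmetic mean
std_dev = math.sqrt(sample_variance)      # sample standard deviation
std_error = std_dev / math.sqrt(count)    # standard error of the mean
value_range = maximum - minimum           # spread between the extremes

print(f"mean               = {mean:.2f}")
print(f"standard deviation = {std_dev:.4f}")
print(f"standard error     = {std_error:.4f}")
print(f"range              = {value_range}")
```

The derived mean, standard deviation, standard error and range agree with the values in the table, which suggests the table is internally consistent.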
Section 3

Next we consider the second (secondary) dataset and look at the variable Dwelling Type. The following points are clear:

- Most students prefer to live in flats: 474/500, or 94.8% of the sample.
- Sydney is the most preferred location, with 167/500 = 33.4% of students staying there, followed by Parramatta with 156/500 = 31.2%.
- Sydney is also the suburb most dominated by flats, with just 1/167 = 0.6% of its students staying in a house. In the other suburbs, 17.8% (Auburn), 5.8% (Parramatta) and 2.9% (Randwick) stay in houses.

The difference between flat and house residency by suburb is summarised below.

SUBURB        Flat   House   Grand Total
AUBURN          60      13            73
PARRAMATTA     147       9           156
RANDWICK       101       3           104
SYDNEY         166       1           167
Grand Total    474      26           500

We now test the hypothesis that houses are preferred by less than 10% of students.

Sample proportion of students in houses = 26/500 = 0.052

H0: p = 0.1
H1: p < 0.1 (left-tail test)

z = (0.052 - 0.1) / SE, where SE = (0.052 × 0.948 / 500)^0.5 = 0.00993 (estimated from the sample proportion)
z = -0.048 / 0.00993 = -4.834

At the 5% significance level, the critical z value is -1.645. The p-value of the test is P(z < -4.834) ≈ 0. As the test value lies below the critical value and the p-value is less than 0.05, we reject the null hypothesis. There is statistical evidence that the proportion of houses is less than 0.1. This conclusion supports what the table above shows: the high share of flats is not merely a matter of luck or a sampling problem. The low share of houses is systematic, and may have deeper reasons which we are unable to examine in this assignment.

Section 4

Next we move on to dwellings with 2 bedrooms only, flats and houses together. To compare them we group them by suburb and use the mean weekly rent as the comparison metric. The table shows that Auburn has the lowest average rent, $404.67, while Sydney is at the other extreme with $838.04. A visual description is shown for easier comparison.
Suburb        Average weekly rent ($)
AUBURN        404.67
PARRAMATTA    461.31
RANDWICK      618.04
SYDNEY        838.04

Just as we tested the statistical significance of the proportion of houses, we can check whether the differences in average rent are a matter of luck or sample design, or whether they are systematic. For this we use an ANOVA test.

H0: μ1 = μ2 = μ3 = μ4 (1, 2, 3 and 4 refer to the four suburbs)
H1: at least one suburb mean differs from the others

Using the ANOVA function in Excel we get the following table.

Source of Variation   SS            df    MS         F
Between Groups        7640126.16      3   2546709    456.9565
Within Groups         1616227.32    290   5573.198
Total                 9256353.49    293

From the F test, the p-value is effectively zero, as P(F > 456.96) ≈ 0. This implies that the differences are statistically significant. Accordingly, a student choosing a suburb should treat these differences in mean rent as real rather than as sampling noise.

Section 5

Lastly, we consider the relation between two quantitative variables, Weekly Rent and Bond Amount, in a scatterplot. We can see that most data points lie on the regression line (the red line) or are very close to it. This indicates a very strong association, with virtually no outliers. The coefficient of determination is 0.985, so the correlation coefficient is 0.985^0.5 = 0.992, which is extremely high. The bond amount can therefore act as a very good and reliable guide to the rent. Properties with a lower bond amount are likely to have lower rents.

Section 6

To conclude, the secondary data is richer and more useful. It can still be validated using primary data. However, we need more detail in the primary data to make it comparable with the secondary data. The data could also be improved by including more parameters that affect rents: the size of the dwelling in square feet, shared or single occupancy, and the provision of a kitchen are some examples.

References

Hypothesis testing. (n.d.).
Retrieved May 30, 2017, from https://onlinecourses.science.psu.edu/statprogram/node/138
Hypothesis testing. (n.d.). Retrieved June 2, 2017, from https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/
Mean, median, mode. (n.d.). Retrieved May 31, 2017, from https://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/measuresofaveragerev6.shtml
Measures of spread. (n.d.). Retrieved September 13, 2017, from Statistics.laerd.com: https://statistics.laerd.com/statistical-guides/measures-of-spread-range-quartiles.php
Measures of dispersion. (n.d.). Retrieved September 11, 2017, from Simon.cs.vt.edu: https://simon.cs.vt.edu/SoSci/converted/Dispersion_I/
Regression analysis. (n.d.). Retrieved June 6, 2017, from Home.iitk.ac.in: https://home.iitk.ac.in/~shalab/regression/Chapter2-Regression-SimpleLinearRegressionAnalysis.pdf
Sampling techniques. (n.d.). Retrieved June 18, 2017, from Rgs.org: https://www.rgs.org/OurWork/Schools/Fieldwork+and+local+learning/Fieldwork+techniques/Sampling+techniques.htm
The 5 steps in hypothesis testing. (n.d.). Retrieved June 5, 2017, from Learn.bu.edu: https://learn.bu.edu/bbcswebdav/pid-826908-dt-content-rid-2073693_1/courses/13sprgmetcj702_ol/week04/metcj702_W04S01T05_fivesteps.html
What is a P value. (n.d.). Retrieved May 29, 2017, from stat.ualberta.ca: https://www.stat.ualberta.ca/~hooper/teaching/misc/Pvalue.pdf
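As a supplementary check, the test statistics quoted in Sections 3 to 5 can be re-derived from the counts and sums of squares given in the report. The Python sketch below uses only the standard library. The variable and function names are illustrative. The second z-test variant, which builds the standard error from the null proportion of 0.1, is a common textbook alternative and is an assumption here, not the method used in the report.

```python
import math
from statistics import NormalDist

# --- Section 3: left-tail z-test for a proportion ---------------------
houses, n = 26, 500          # counts from the dwelling-type table
p0 = 0.10                    # proportion under the null hypothesis
p_hat = houses / n           # sample proportion of students in houses


def z_and_p(standard_error):
    """Return the z statistic and its left-tail p-value."""
    z = (p_hat - p0) / standard_error
    return z, NormalDist().cdf(z)


# Standard error built from the sample proportion, as in the report.
z_report, p_report = z_and_p(math.sqrt(p_hat * (1 - p_hat) / n))
# Variant (assumption, not the report's method): standard error under H0.
z_null, p_null = z_and_p(math.sqrt(p0 * (1 - p0) / n))
# Left-tail critical value at the 5% significance level.
z_critical = NormalDist().inv_cdf(0.05)

# --- Section 4: ANOVA mean squares and F statistic ---------------------
ss_between, df_between = 7640126.16, 3
ss_within, df_within = 1616227.32, 290
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# --- Section 5: correlation from the coefficient of determination -----
r_squared = 0.985
r = math.sqrt(r_squared)     # positive root, since rent rises with bond

print(f"z (report SE) = {z_report:.3f}, p = {p_report:.2e}")
print(f"z (null SE)   = {z_null:.3f}, p = {p_null:.2e}")
print(f"critical z    = {z_critical:.3f}")
print(f"F             = {f_stat:.4f}")
print(f"r             = {r:.3f}")
```

Both versions of the z statistic (about -4.83 and about -3.58) fall below the critical value of about -1.645, so the conclusion of Section 3 does not depend on which standard error is used. The recomputed F statistic and correlation coefficient agree with the reported 456.9565 and 0.992.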
