Mountain Everest Deaths and Ascents between 1922-2020.
Time in Nepal |
Nepal1 is tomorrow~
I got two datasets, peak ascents and deaths reports for Everest from The Himalayan Database and combined them into one dataset. Most other online databases only provide one or two recent year data for free, and I didn’t find more available data, so this dataset has limited variables for analysis. Probably could add some weather data like daily wind speed and average temperature as predictors. 👉National Geographic
something is wrong here, NA is more than 215, and the totals are not equal?
NA_cnt <- table(is.na(total))
NA_pct <- prop.table(NA_cnt)
cbind(NA_cnt, NA_pct)
NA_cnt NA_pct
FALSE 135076 0.998410833
TRUE 215 0.001589167
FALSE
145698
counts <- table(total$Dth)
proportions <- prop.table(counts)
cbind(counts, proportions)
counts proportions
0 10101 0.97059671
1 306 0.02940329
Age Male climbers’ age range is wider than females due to some outliers. We can see the oldest climbers are over 80. 50% of death in females are between the age of 30-50 while males are between 30-45. Both genders’ interquartile ranges are similar among success ascents groups.
ggplot(data = total,
mapping = aes(x = Sex,
y = Age,
fill = Dth), na.rm = TRUE) +
labs(x = "Sex",
y = "Age",
title = "Age Distribution by Sex and Death",
tag = "Fig. 1") +
geom_boxplot(alpha = .3)
Causes More people died the altitude below 7500m because of natural disasters compared to the altitude above 7500m. Death zone2 is defined as an altitude above 8000m, which can cause mountaineers to lose physical strength or even vital functions due to a low level of concentration of oxygen. I guess I could add some ratio variables for the death rates in different altitude zones? Like death zone and Khumbu Icefall?
=(Had issues with some columns even though I had already removed the missing values and it did work when I ran the chunk individually. However, it did not work when I knit the whole post. 😩
total %>%
drop_na(Altitude) %>%
ggplot(mapping = aes(x = Cause,
y = Altitude,
fill = Cause)) +
labs(title = "Causes of Death by Altitude",
tag = "Fig. 2") +
geom_boxplot(alpha = .3) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 90))
Oxy
order <- c(1, 2, 3, 4)
total %>%
drop_na(Altitude) %>%
ggplot(mapping = aes(x = factor(Season, level = order),
y = Altitude,
fill = Oxy)) +
labs(title = "Oxygen Consumption by Season",
x = "Season",
tag = "Fig. 3") +
geom_boxplot(alpha = .3) +
scale_x_discrete(labels = c("1" = "Spring", "2" = "Summer", "3" = "Fall", "4" = "Winter")) +
scale_fill_discrete(labels = c("Yes", "No", "NA"))
Oxy by Age Most climbers are between 25-45 years old
total %>%
drop_na(Age) %>%
ggplot(mapping = aes(x = Age,
fill = Oxy)) +
geom_histogram(position = "stack",
alpha = .6,
stat="count") +
labs(title = "Histogram: Credit Limit vs Attrition",
tag = "Fig. 4")
Oxy by Altitude Higher consumption of oxygen above 7500m altitude.
total %>%
drop_na(Altitude) %>%
ggplot(mapping = aes(x = Altitude,
group = Oxy,
fill = Oxy)) +
geom_density(adjust = 1.5, alpha = .4) +
labs(title = "Altitude vs Oxygen Consumption",
tag = "Fig. 5")
Season Most expeditions happened in spring and some in the fall
order <- c(1, 2, 3, 4)
total %>%
drop_na(Altitude) %>%
ggplot(mapping = aes(x = Altitude,
fill = factor(Season, level = order))) +
geom_histogram(position = "stack",
alpha = .5) +
labs(title = "Altitude vs Season",
tag = "Fig. 6") +
scale_fill_discrete(name = "Season", labels = c("Spring", "Summer", "Fall", "Winter"))
Error: StatBin requires a continuous x variable: the x variable is discrete.Perhaps you want stat="count"?
comparison <- matrix(c(0.9199, 0.2818, 0.6264, 0.9287,
0.9327, 0.3152, 0.6044, 0.9426,
0.9503, 0.3963, 0.6154, 0.9604),
ncol = 4, byrow = TRUE)
colnames(comparison) <- c("Accuracy", "Kappa", "Recall", "Sepecificity")
rownames(comparison) <- c("i.SMOTE", "ii.UP", "iii.DOWN")
comparison <- as.data.frame.matrix(comparison)
kable(comparison) %>%
row_spec(3, color = "white", background = "#bdaeea")
Accuracy | Kappa | Recall | Sepecificity | |
---|---|---|---|---|
i.SMOTE | 0.9199 | 0.2818 | 0.6264 | 0.9287 |
ii.UP | 0.9327 | 0.3152 | 0.6044 | 0.9426 |
iii.DOWN | 0.9503 | 0.3963 | 0.6154 | 0.9604 |
index <- createDataPartition(total1$Dth, p = .7, list = FALSE, times = 1)
train <- total1[index,]
test <- total1[-index,]
glm <- train(
form = Dth ~ Age + factor(Oxy),
data = train,
trContrl =trainControl(method = "cv", number = 10),
method = "glm",
family = "binomial",
na.action = na.omit
)
Something is wrong; all the Accuracy metric values are missing:
Accuracy Kappa
Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA
Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA
NA's :1 NA's :1
Error: Stopping
knn1$bestTune
k
10 23
pred1 <- predict(knn1, test)
confusionMatrix(pred1, test$Dth,
positive = "1")
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 2814 34
1 216 57
Accuracy : 0.9199
95% CI : (0.9098, 0.9292)
No Information Rate : 0.9708
P-Value [Acc > NIR] : 1
Kappa : 0.2818
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.62637
Specificity : 0.92871
Pos Pred Value : 0.20879
Neg Pred Value : 0.98806
Prevalence : 0.02916
Detection Rate : 0.01826
Detection Prevalence : 0.08747
Balanced Accuracy : 0.77754
'Positive' Class : 1
knn3$bestTune
k
1 5
pred3 <- predict(knn3, test)
confusionMatrix(pred3, test$Dth,
positive = "1")
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 2856 36
1 174 55
Accuracy : 0.9327
95% CI : (0.9234, 0.9413)
No Information Rate : 0.9708
P-Value [Acc > NIR] : 1
Kappa : 0.3152
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.60440
Specificity : 0.94257
Pos Pred Value : 0.24017
Neg Pred Value : 0.98755
Prevalence : 0.02916
Detection Rate : 0.01762
Detection Prevalence : 0.07337
Balanced Accuracy : 0.77348
'Positive' Class : 1
knn4$bestTune
k
10 23
pred4 <- predict(knn4, test)
confusionMatrix(pred4, test$Dth,
positive = "1")
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 2910 35
1 120 56
Accuracy : 0.9503
95% CI : (0.9421, 0.9577)
No Information Rate : 0.9708
P-Value [Acc > NIR] : 1
Kappa : 0.3963
Mcnemar's Test P-Value : 1.509e-11
Sensitivity : 0.61538
Specificity : 0.96040
Pos Pred Value : 0.31818
Neg Pred Value : 0.98812
Prevalence : 0.02916
Detection Rate : 0.01794
Detection Prevalence : 0.05639
Balanced Accuracy : 0.78789
'Positive' Class : 1
R.I.P all the bodies and Hillary Step