
2018 HiMCM Problem A: Roller Coaster

Team 8593

Team 8593 HiMCM 2018 Page 1 of 13


Summary

Many modern rankings of roller coasters base the thrill of a roller coaster on the opinions of experts. According to numerous credible sources, however, the thrill associated with riding a roller coaster is due to various factors, including the speed, the height, the g-forces experienced, and the number of inversions. G-forces especially contribute to thrill because they provide the rider with uncommon feelings such as weightlessness or an increase in weight. We used this fact to create a model that ranked the top ten roller coasters out of a given data set, purely through the use of objective criteria.

The provided data set contained many missing values. Rather than look up each missing value individually, we modified a machine learning algorithm to impute the missing data based on patterns that were observed between other categories of data. The results of the imputation were quite accurate and provided us with data that was within 13% of the actual value (which was researched for confirmation). With the completed dataset, we entered the values into our model and then ranked the roller coasters based on the thrill score produced by the model.

The top ten ranking that our model produced was compared to other top ten rankings found online. While the online rankings were based on subjective opinions, the ranking produced by our model was based on objective criteria such as height, duration, and maximum g-force. In addition, the negative g-force that a rider would experience on the first drop was calculated using the provided data, and was used in our model. All of the separate criteria were divided by the average values of each category in order to ensure that the criteria were weighted evenly.

After developing our model, we developed a mockup of a mobile app that adjusted the weights of the criteria based on user input. This allows for a top ten list that is personalized to the user, and thus, more accurate to the user’s needs. In addition, we made a news article that describes our model and app.

News Release

To those looking for that surge of adrenaline and heart-pounding excitement, nothing can quite measure up to that perfect roller coaster. The perfect combination of breakneck speeds, terrifying heights, massive drops, insane twists and turns, and g-forces that make riders feel weightless one moment and massive the next can make any avid rider want to stay on forever. Unfortunately, many are left wanting more from a standard, lackluster roller coaster. What is needed is an ideal ranking system based not on the writer’s opinions of quality, but on objective data collected from roller coasters around the world. This objective data was fed into an algorithm to generate a “thrill score,” and the highest scores were selected for the top ten list. Based on the generated “thrill scores,” the top ten roller coasters, in order, are as follows: Smiler in England, Kingda Ka in the United States, Steel Dragon 2000 in Japan, Altair in Italy, Fury 325 in the United States, Takabisha in Japan, Top Thrill Dragster in the United States, Leviathan in Canada, Millennium Force in the United States, and Steel Vengeance in the United States. The algorithm summed the ratios of each roller coaster's data to the average of each category with an impact on quality: maximum height, maximum speed, track length, number of inversions, drop, time, g-force, and negative g's. These are standard aspects of a roller coaster that make it enjoyable, but the importance of each in identifying the perfect roller coaster varies from person to person. It is this disagreement between riders that makes it important for any public-facing algorithm to be modular enough to account for changes in preference.

Making this modular ranking system as accessible as possible to the public led us to design a user-friendly mobile app. With this application, any user can select the type of roller coaster, location, and the effect of various key features on ranking. Users will determine what makes a roller coaster most enjoyable and this will be applied as a weight to each term, affecting the rank. Please download the application from the App Store and the Google Play Store. We look forward to hearing the responses to our product in the future.



Problem Restatement

Create a mathematical algorithm that represents the “Thrill Factor” experienced by riders of various roller coasters (Part 1). Utilize this algorithm to create a list of the ten best roller coasters. Then, compare your rankings with two online rankings (Part 2). Additionally, design an app that will help a user find the roller coasters that would provide them with the most excitement based on their personal preference (Part 3). Finally, design a newspaper article that discusses the algorithm, results of the roller coaster ranking, and mobile application (Part 4).

Assumptions and Justifications

Assumption

The calculations assume that the cart does not experience friction and drag as it goes through the track. However, other assumptions, such as the location of max speed, may incorporate friction and drag.

Justification

Accounting for friction and drag requires a lot of highly specific information, such as the mass of the cart, the current angle of the track, or the coefficient of friction. This information is not given by the dataset, and would be near impossible to get for every single roller coaster. Therefore, it was decided that the work done by friction and drag can reasonably be ignored.

Assumption

It is assumed that when the cart is going down the drop, the acceleration it experiences is uniform, and the track descends linearly.

Justification

It would be near impossible to find the actual acceleration and shape of the track with the current information, as more information on the curve of the drop than just the initial angle of descent would be needed.

Assumption

The cart only has work done on it, via launch systems such as lift hills, electromagnetic propulsion, or hydraulics, at the beginning of the track. Afterwards, the kinetic energy never surpasses the initial energy at the start of the track.

Justification

It is commonplace for carts to be moved up the initial incline of a roller coaster and then to let momentum keep the cart moving. After all the speed is exhausted due to friction and drag, the roller coaster stops and the ride is over. There are, however, rides that have multiple launch systems in order to increase the length of the ride. It would be unreasonable to look up every track to see the number of launch systems it has, so we assumed that every roller coaster has one and only one, at the beginning.

Assumption

Experienced riders will want the roller coaster to have the maximum value possible for all the categories that are being implemented into the algorithm.

Justification

While certain aspects of roller coasters are not liked by everyone, it is reasonable to assume that experienced riders will rate more extreme roller coasters as having a higher thrill factor. Even acknowledging this fact, different people have different tastes for specific aspects of the roller coaster. In order to make a model, we assumed that every single aspect is wanted by the hypothetical rider.

Assumption

Although certain individuals will prefer either wooden or metal roller coasters, for the ranking algorithm the material that the roller coaster is made out of will be ignored.

Justification

For a ranking system that is supposed to be as objective as possible, it is reasonable to only account for the actual statistics of the roller coaster and ignore things that are dependent on individual taste. While the main ranking algorithm will not take into account the material of the coasters, the user-friendly app could ask the user for his or her preference and incorporate that into the algorithm.

Assumption

The max speed of the cart is located at the bottom of the first drop.

Justification

If the cart experienced no friction and drag, the location where the cart experienced maximum speed would be at the lowest part of the coaster. However, in the real world, which is where this data was collected, friction and drag constantly slow down the cart. This means that the max speed is most likely going to be at the bottom of the first drop, as afterwards the work done by friction and drag will constantly reduce the maximum kinetic energy achievable by the cart. Theoretically it is possible to have a small initial drop, and then have a much greater drop afterwards. If the difference in the cart's kinetic energy at the bottom of the initial and subsequent drop is greater than the loss of energy due to friction and drag, then the max speed would be located at the bottom of the subsequent drop. This scenario, however, is very unlikely to happen in a real roller coaster, so it is safe to assume that the bottom of the first drop is the location of the cart's highest speed.

Objective Scoring Algorithm

Part 1: Preliminary Research

The first step taken by our group was to research what aspects of a roller coaster induce thrill. Our group found many different aspects that were not provided by the initial data, and determined the information needed to calculate them. Certain aspects had enough information to calculate, while others could not be calculated due to either a lack of data or our assumptions.

G-Forces:

There are 4 main types of g-forces: positive g’s, negative g’s, lateral g’s, and linear g’s.

Positive g’s occur when the cart is being pushed up by an external force. They cause a feeling of increased weight for the rider. Common causes of positive g’s are the cart pulling up from a drop and banked turns.

Negative g’s occur when the cart is accelerating downwards. They cause a feeling of reduced weight for the rider. They are considered by many to be the most fun g-force. They are mainly caused by the cart dropping sharply.

Lateral g’s are caused when the cart makes an unbanked turn. They cause the rider to be pushed towards the outer side of the cart. Most roller coasters use banked turns in order to convert the lateral g’s into positive g’s.

Linear g’s occur when the cart accelerates straight forward. This force pushes riders back into their seats. These typically only happen when the cart is initially launched.

After some research, it was discovered that the factors that cause a majority of the g-force did not have enough data to be accounted for. However, we still decided to derive formulas that would incorporate these features if the data were available.

Rotation:

A large amount of acceleration is experienced when the cart experiences rotation. This is commonly experienced when the cart makes a banked turn or goes through an inversion.

Inversions:

The force felt when going through an inversion is caused by centripetal acceleration. The centripetal acceleration can be calculated by the formula:

$$a_c = \frac{v^2}{r}$$

Using the equations for kinetic energy and potential energy, one can obtain an equation for velocity squared in terms of the difference in height from the top of the track, $\Delta h$ (taking the cart to start from rest at the top and ignoring friction):

$$\tfrac{1}{2} m v^2 = m g \Delta h \quad\Rightarrow\quad v^2 = 2 g \Delta h$$

Combining these two equations, one can create an equation that finds the centripetal acceleration of the cart in terms of the difference in height from the top of the track and the radius of the loop:

$$a_c = \frac{2 g \Delta h}{r}$$

An issue with perfectly circular loops is that as the cart descends the loop, the distance from the top of the loop increases, which increases the g-force being felt, oftentimes to unsafe levels. To get around this, inversions are made in a clothoid shape. This shape reduces the radius as the velocity goes down, which helps keep the centripetal acceleration uniform. The formula for the g-force of the cart uses r as the radius at any point, v as the velocity going into the loop, h as the height from the bottom of the loop, and θ as the current angle around the loop.
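One standard frictionless form consistent with these variables, offered as an assumption rather than as the team's exact expression, can be derived as follows. It takes the angle θ to be measured from the bottom of the loop and introduces $v_h$ (speed at height h), N (normal force), and G (g-force felt) as illustrative symbols:

```latex
% Assumptions: no friction; theta measured from the bottom of the loop;
% v is the entry speed at the bottom, h the height gained, r the local radius.
\begin{align}
v_h^2 &= v^2 - 2gh \\
N - mg\cos\theta &= \frac{m v_h^2}{r} \\
G = \frac{N}{mg} &= \frac{v^2 - 2gh}{g\,r} + \cos\theta
\end{align}
```

At the bottom of the loop (θ = 0, h = 0) this gives G = v²/(gr) + 1, and at the top (θ = 180°) it gives the speed term minus 1, which is why a smaller radius near the top helps keep the rider pressed into the seat.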

This equation means that if we were given the radius and height of the inversion, we could find the g-force caused by centripetal acceleration at any point.

Turns:

When turning, roller coasters generally have banked curves in order to convert some of the lateral g-force into positive g-force. The g's felt by the rider on a banked curve can be found by the following formula, where θ is the ideal angle at which gravity is the only force needed to keep the cart from sliding up or down the turn.

$$G = \frac{1}{\cos\theta}$$

This equation allows us to calculate the g-force on any banked turn, as long as we have the angle. This also assumes zero friction for the same reason as mentioned previously.
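As a minimal sketch of the banked-turn calculation, the following Python assumes the standard frictionless relations for an ideally banked curve: tan θ = v²/(rg) for the ideal angle and G = 1/cos θ for the g-force felt. The function names and the sample speed and radius are illustrative and do not come from the dataset.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def ideal_bank_angle_deg(speed_ms: float, radius_m: float) -> float:
    """Bank angle at which no friction is needed: tan(theta) = v^2 / (r g)."""
    return math.degrees(math.atan(speed_ms ** 2 / (radius_m * G)))

def banked_turn_g(theta_deg: float) -> float:
    """G-force felt on an ideally banked, frictionless turn: 1 / cos(theta)."""
    return 1.0 / math.cos(math.radians(theta_deg))

# Hypothetical turn: 20 m/s through a 30 m radius curve.
theta = ideal_bank_angle_deg(speed_ms=20.0, radius_m=30.0)
print(f"ideal bank angle: {theta:.1f} deg, g-force: {banked_turn_g(theta):.2f}")
```

Only the angle is needed for the g-force itself. The speed and radius matter only for finding what the ideal angle would be.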

Finding g-force experienced on the maximum drop:

After some research, it was determined that negative g's are the most thrilling for riders. Due to this fact, we decided that the negative g's felt going down the largest drop were an important factor in determining the rating of a roller coaster. The first step in finding the g-force of the drop is to find the initial velocity of the cart, which follows from the equations for potential and kinetic energy together with the law of conservation of energy.

After obtaining the value for the initial velocity, we are able to use kinematic equations to obtain the acceleration of the cart.

The resulting expression can then be simplified to give the g-force of the cart.

This g-force can be determined from the given data.
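The chain of steps described above can be sketched in Python. This is a hedged reconstruction under the paper's stated assumptions (no friction, a straight drop at the given vertical angle, uniform acceleration, maximum speed at the bottom of the drop), not necessarily the exact expressions the team used. The function and parameter names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def drop_g_force(max_speed_ms: float, drop_m: float, angle_deg: float) -> float:
    """Acceleration along a straight, frictionless drop, expressed in g."""
    # Energy conservation: squared speed at the top of the drop.
    v_top_sq = max_speed_ms ** 2 - 2.0 * G * drop_m
    # Length of a straight track that descends drop_m at angle_deg.
    track_length = drop_m / math.sin(math.radians(angle_deg))
    # Kinematics: v_bottom^2 = v_top^2 + 2 a d, solved for a.
    accel = (max_speed_ms ** 2 - v_top_sq) / (2.0 * track_length)
    return accel / G

# Hypothetical coaster: 40 m/s top speed, 60 m drop, 80 degree descent.
print(round(drop_g_force(40.0, 60.0, 80.0), 3))
```

Under these assumptions the speed terms cancel and the result reduces to sin(angle), so a drop close to 90 degrees gives a value close to 1 g.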

Jerk:

Jerk can be calculated by taking the third derivative of a position vs. time function. It is the change in acceleration over time. Individuals can perceive jerk, with a higher jerk being more thrilling. In addition, the jerk is more enjoyable if, when graphed, it resembles a continuous function rather than a stepwise one. This would mean that there are no sudden large changes in acceleration; rather, the acceleration goes up in a uniform way. Jerk can be calculated by the following equation:

$$j = \frac{da}{dt} = \frac{d^3 x}{dt^3}$$

We realized, however, that since it was stated in our assumptions that the cart had constant acceleration, the jerk would be zero.

Part 2: Developing the Algorithm

The algorithm we created takes 8 different variables into account and adds them together to produce a final Thrill Score. Before summing, each variable is divided by the average value of that variable in order to scale all the variables evenly.
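A minimal Python sketch of this scoring rule is shown below. It assumes the eight criteria are the ones listed in the news release (height, speed, length, inversions, drop, duration, g-force, negative g's). The dictionary keys and the two demo coasters are made up for illustration.

```python
from statistics import mean

# The eight criteria named in the paper; the dictionary keys are illustrative.
CRITERIA = ["height", "speed", "length", "inversions",
            "drop", "duration", "g_force", "negative_g"]

def thrill_scores(coasters: dict[str, dict[str, float]]) -> dict[str, float]:
    """Sum each coaster's criteria after dividing by the category average."""
    averages = {c: mean(row[c] for row in coasters.values()) for c in CRITERIA}
    return {
        name: sum(row[c] / averages[c] for c in CRITERIA)
        for name, row in coasters.items()
    }

# Made-up numbers for two hypothetical coasters, not values from the dataset.
demo = {
    "Coaster A": dict(height=100, speed=40, length=1500, inversions=2,
                      drop=90, duration=120, g_force=4.0, negative_g=0.9),
    "Coaster B": dict(height=60, speed=30, length=900, inversions=6,
                      drop=50, duration=180, g_force=5.0, negative_g=0.7),
}
scores = thrill_scores(demo)
print(sorted(scores, key=scores.get, reverse=True))
```

Dividing by the category average means a coaster that is exactly average in every category scores 8, so each criterion contributes on the same scale regardless of its units.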



Part 3: Implementing Generated Data Using Machine Learning

For numerous variables provided, data was missing. Simply discarding this data would have detrimental effects on the “Thrill Score” of these roller coasters. In addition, attempting to research every single missing value would be an inefficient use of time, and at times the data may not even be accessible. The method we decided to use to account for the missing values in the sparse dataset is Multiple Imputation. In statistics, Multiple Imputation is the use of machine learning to estimate missing values in a dataset. A simpler method would be single imputation, which would simply replace the missing values with the mean of that variable. Multivariate Imputation by Chained Equations (MICE) is a cutting-edge method of statistical imputation. By creating linear regression models relating the variable with missing values to the variables with known values, the computer is able to provide an accurate estimate for the missing values. In the statistical software RStudio, we modified a machine learning algorithm and utilized the MICE package to perform the Multiple Imputation of the dataset.
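To illustrate the difference between mean imputation and regression-based imputation, here is a small standard-library Python sketch. It is not the R MICE package and not the team's code: it fits a single least-squares line on one predictor, which is only the basic building block that chained-equation methods repeat across variables. All names and numbers are made up.

```python
from statistics import mean

def mean_impute(values):
    """Single imputation: replace each missing value (None) with the mean."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def regression_impute(target, predictor):
    """Fill missing target values from a least-squares line on one predictor."""
    pairs = [(x, y) for x, y in zip(predictor, target) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, target)]

# Made-up example: drop is missing for one coaster whose height is known.
height = [50.0, 60.0, 80.0, 100.0]
drop = [45.0, 54.0, None, 90.0]
print(mean_impute(drop))
print(regression_impute(drop, height))
```

In this toy data the drop is always 90% of the height, so the regression estimate for the missing entry follows that pattern, while the mean estimate ignores the coaster's height entirely.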

In order to achieve the most accurate results from the machine learning, we had to manually prepare the dataset. First, we removed all non-numeric variables except for name and park so that we could identify the roller coasters after the values had been generated. Additionally, we made sure that no regression models were created using the non-numeric variables. Doing so would result in inaccurate predictions because the program would assume that variables such as park or name have a significant impact on the maximum speed. This is not the case, so we excluded these variables to increase the accuracy of our machine learning algorithm. The next step in cleaning the dataset was converting the time values from minutes:seconds to just seconds. This allowed us to perform more calculations on the dataset. Finally, we researched the duration of all the roller coasters and entered them into the Excel sheet so that the program would have more information to use. The variables that the program had to impute values for were G-Force, Vertical Angle, and Drop.
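The minutes:seconds conversion mentioned above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes every duration is written as `m:ss` with no hours field.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a 'minutes:seconds' string such as '2:30' to total seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + int(seconds)

print(duration_to_seconds("2:30"))
```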

Ranking Comparison

To find alternative rankings, a simple search was performed online. Source 1 determines its list of top ten roller coasters by tallying open-community votes. Users who create an account may up-vote the roller coasters they enjoy most and down-vote the roller coasters they enjoy least. Source 2 derived its rankings from the personal experiences of riders and included non-operational roller coasters, which defeats the point of recommending roller coasters for future riders.



Top Ten Comparison

Rank  Algorithm            Source 1             Source 2
1     Smiler               Millennium Force     Takabisha
2     Kingda Ka            Steel Vengeance      Altair CCW-0204
3     Steel Dragon 2000    Top Thrill Dragster  10 Inversion Roller Coaster
4     Altair               Maverick             Fury 325
5     Fury 325             El Toro              Leviathan
6     Takabisha            Fury 325             Dodonpa
7     Top Thrill Dragster  Intimidator 305      Hades 360
8     Leviathan            Kingda Ka            Steel Dragon 2000
9     Millennium Force     The Voyage           Kingda Ka
10    Steel Vengeance      Apollo's Chariot     Top Thrill Dragster

Mobile Device Application

In an attempt to develop a user-friendly mobile app that allows users to find the best roller coasters, we felt it was important to allow user input when creating lists, in addition to the overall rankings. Leaving all parameters unchecked would produce the original algorithm’s rankings seen in the above table, but manipulating the preference parameters will affect how each criterion is weighted, thus producing a new arrangement of roller coasters. Figure 1 demonstrates how the finished application will function and shows the design of the application with random rankings not based on our algorithm.
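One way the preference weighting could work is sketched below in Python. This is an assumption about the app's logic, not a description of an implemented product: each normalized criterion is multiplied by a user-chosen weight, and any criterion the user leaves untouched keeps a weight of 1.0 so that the default reproduces the unweighted ranking. All names and values are hypothetical.

```python
def rank_coasters(ratios, weights, top_n=10):
    """Rank coasters by a weighted sum of their normalized criteria.

    ratios:  {coaster: {criterion: value / category average}}
    weights: {criterion: user weight}; criteria left out default to 1.0.
    """
    scores = {
        name: sum(weights.get(c, 1.0) * r for c, r in row.items())
        for name, row in ratios.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Made-up normalized values for two hypothetical coasters.
ratios = {
    "Coaster A": {"speed": 1.3, "inversions": 0.5},
    "Coaster B": {"speed": 0.9, "inversions": 1.5},
}
print(rank_coasters(ratios, weights={}))              # no preferences set
print(rank_coasters(ratios, weights={"speed": 3.0}))  # user values speed more
```

With no preferences the inversion-heavy coaster ranks first. Tripling the weight on speed moves the faster coaster to the top, which is the kind of personalized reordering the app is meant to provide.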




Figure 1: Application Concept

Conclusion

There were two main problems that we had to address before we could input the data into our algorithm and rank the roller coasters. The dataset provided contained numerical values that were formatted in a way that Microsoft Excel could not perform calculations on. An example of this is duration, which was formatted as mm:ss. To perform calculations on these values, we converted all of the quantities to seconds. A second major issue we had to solve before we could utilize our algorithm was the fact that hundreds of values were missing from the dataset. To solve this problem, we ran our machine learning algorithm on the dataset and derived all the missing values.

After the data was entered into the “Thrill Score” model and the results were recorded, we wanted to analyze our results to determine if our solution worked correctly. We noticed that for the roller coasters with vertical angles close to 90 degrees, the g-force of the carts was close to 1. This makes sense because if a roller coaster were accelerating straight downward (at a 90 degree angle), the acceleration due to gravity would be 9.8 m/s^2, or 1 g. In addition, we checked some of the imputed values against their actual values by researching the roller coasters online. The imputed values were extremely accurate. For example, when analyzing the imputed values of drop height, we noticed that some values were within 16 feet of the actual value. We also wanted to determine if there was a difference between our roller coaster ranking with the imputed data and without the imputed data. We found that without the imputed data, the resulting top ten list had very few similarities with the top ten lists found online. However, after generating a top ten list using the imputed data, the list we generated had more entries that matched other lists, albeit in a different order.



Future Extensions

In the future, we would try to collect more verifiable data rather than resorting to using a machine learning algorithm to generate all missing information. The data would focus on inversions and banked turns, which would allow us to calculate the centripetal acceleration at those locations. We would also try to collect more data on the initial descent, as this would allow for non-uniform acceleration, enabling us to compare the jerk of each roller coaster. We would also like to research how the g-forces riders experience are affected by different ride types, and account for those effects in our model.



Bibliography

Chen, Tianqi, and Michaël Benesty. “XG Boost Presentation.” The Comprehensive R Archive Network, Comprehensive R Archive Network (CRAN), cran.r-project.org/web/packages/xgboost/vignettes/xgboostPresentation.html#basic-prediction-using-xgboost.

Martin, Karen. “Seven Ways to Make up Data: Common Methods to Imputing Missing Data.” The Analysis Factor RSS, 12 Sept. 2018, www.theanalysisfactor.com/seven-ways-to-make-up-data-common-methods-to-imputing-missing-data/.

Mitchell, Aric, and Leora X. “Top 27: Best Roller Coasters (in the World).” 10 Dangers Of Tap Water You Should Know About (Never Drink This Stuff), 11 Oct. 2017, www.ideahacks.com/best-roller-coasters-in-the-world/.

Ridgway, Andy. “The Thrill Engineers.” Science Focus - BBC Focus Magazine, Science Focus, 16 Aug. 2018, www.sciencefocus.com/future-technology/the-thrill-engineers/.

Swalin, Alvira. “How to Handle Missing Data – Towards Data Science.” Towards Data Science, Towards Data Science, 31 Jan. 2018, towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4.

“The Best Roller Coasters in the World.” Ranker, www.ranker.com/crowdranked-list/best-roller-coasters?ref=also_ranked&pos=1&a=0&l=1589685
