Data Mining Disasters A Report Mary McGlohon SIGBOVIK Commission for Workplace Safety

Post on 19-Dec-2015


Data Mining Disasters

A Report

Mary McGlohon

SIGBOVIK Commission for Workplace Safety

Data Mining Safety

•Data mining disasters are a hazard to the progress of scientific research.

•We will review some common mining disasters and make recommendations for prevention.

Numeric Overflow

“In 2007, numeric floods were responsible for over $600 million in property damages.”

- Department of Made-Up Statistics

Numeric Overflow

ERROR::NUMERICOVERFLOW

Nobody expected the breach of the levees.

Numeric Overflow

•Also caused loss of several hundred nerd-hours.

•1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours
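The exchange rates above can be sketched as a small converter. This is only an illustration: the function name `convert` and the table name `UNITS_PER_NERD_HOUR` are assumptions, and the only figures used are the ones on the slide.

```python
from fractions import Fraction

# Exchange rates from the slide: how many of each unit equal one nerd-hour.
# The table and function names are hypothetical, chosen for this sketch.
UNITS_PER_NERD_HOUR = {
    "nerd-hour": Fraction(1),
    "grad-student-hour": Fraction(1),
    "faculty-hour": Fraction(1, 4),
    "undergrad-hour": Fraction(6),
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of time between the units listed on the slide."""
    # Go through nerd-hours as the common unit, then out to the target unit.
    nerd_hours = Fraction(amount) / UNITS_PER_NERD_HOUR[from_unit]
    return nerd_hours * UNITS_PER_NERD_HOUR[to_unit]


# One faculty-hour expressed in undergrad-hours.
print(convert(1, "faculty-hour", "undergrad-hour"))
```

Using `Fraction` keeps the conversion exact, so a round trip between units returns the starting amount.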

Numeric Overflow

•Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
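Read as a pun on logarithms (an interpretation, not something the slide spells out), the recommendation matches a common way to stay afloat: keep large quantities in log space. The sketch below uses the log-sum-exp trick; the function name `log_sum_exp` and the sample values are assumptions made for illustration.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without overflowing."""
    m = max(log_values)
    if math.isinf(m):
        # All terms are -inf, or at least one is +inf; the answer is m itself.
        return m
    # Subtracting the maximum keeps every exponent at or below zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


log_terms = [1000.0, 1000.0]  # hypothetical log-scale quantities

try:
    naive = math.log(sum(math.exp(v) for v in log_terms))
except OverflowError:
    naive = None  # math.exp(1000.0) does not fit in a float

safe = log_sum_exp(log_terms)
print(naive, safe)
```

The naive version floods at the first `math.exp` call, while the log-space version never leaves a range that a float can represent.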

Power Law Failures

•Occurs when confusing heavy-tailed distributions such as:

• Power Law (incl. Pareto, Zipf)

• Lognormal

• Weibull

• Burr

• Log-gamma

• Log-Log-Log-Log-Mushroom-Mushroom
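One way to see why these distributions get confused, sketched under assumed parameters: on a log-log plot, the tail of a power law is a straight line, while the tail of a lognormal only looks straight over a limited range and keeps steepening. The parameter values (`alpha=1.5`, `sigma=2.0`) and the function names below are illustrative assumptions, not taken from the report.

```python
import math


def pareto_ccdf(x, x_min=1.0, alpha=1.5):
    """P(X > x) for a Pareto (power-law) distribution, valid for x >= x_min."""
    return (x_min / x) ** alpha


def lognormal_ccdf(x, mu=0.0, sigma=2.0):
    """P(X > x) for a lognormal distribution."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def loglog_slope(ccdf, x, factor=1.1):
    """Finite-difference estimate of d log(CCDF) / d log(x) near x."""
    return (math.log(ccdf(x * factor)) - math.log(ccdf(x))) / math.log(factor)


# Compare the local log-log slope of both tails at increasing x.
for x in (10.0, 100.0, 1000.0):
    print(x, loglog_slope(pareto_ccdf, x), loglog_slope(lognormal_ccdf, x))
```

The Pareto slope stays at `-alpha` everywhere, whereas the lognormal slope grows steeper as `x` increases, so a fit over a narrow range of `x` can mistake one for the other.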

Power Law Failures

•Many natural phenomena have heavy tails.

• Magnitude of earthquakes

• Size of human settlements

• Degree distribution of “real” graphs

• Time to response in CS professors’ email

• Your mom

•However, confusing heavy-tailed distributions results in...

Power Law Failures

•Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.

Power Law Failures

•Statisticians get mean when they get religious. (SIGBOVIK07)

•Recommendation: Calm the hell down.

Decision Tree Forest Fires

•Pruning is used to prevent overfitting.

•When overpruning occurs, trees are burned to stumps.

•This spreads, torching entire forests.

(Aww...)
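The overpruning hazard above can be sketched on a toy tree. This is a minimal illustration, assuming simple depth-based pruning as a stand-in for whatever pruning method a researcher actually uses; the `Node` class, the helper names, and the data are all hypothetical. Here a tree "burned to a stump" means one collapsed to a single leaf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    counts: dict                      # training label counts reaching this node
    feature: Optional[str] = None     # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch taken when value <= threshold
    right: Optional["Node"] = None    # branch taken when value > threshold


def predict(node, sample):
    """Walk down to a leaf and return its majority label."""
    while node.feature is not None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return max(node.counts, key=node.counts.get)


def prune_to_depth(node, max_depth):
    """Return a copy of the tree with everything below max_depth collapsed."""
    if node.feature is None or max_depth == 0:
        return Node(counts=node.counts)
    return Node(node.counts, node.feature, node.threshold,
                prune_to_depth(node.left, max_depth - 1),
                prune_to_depth(node.right, max_depth - 1))


def accuracy(tree, data):
    """Fraction of (sample, label) pairs the tree classifies correctly."""
    return sum(predict(tree, s) == label for s, label in data) / len(data)


# A one-split tree, and the same tree after pruning away its only split.
full = Node({"a": 4, "b": 3}, "x", 0.5, Node({"a": 4}), Node({"b": 3}))
burned = prune_to_depth(full, 0)
data = [({"x": 0.2}, "a"), ({"x": 0.9}, "b")]
print(accuracy(full, data), accuracy(burned, data))
```

After pruning to depth zero the tree predicts the majority label for every sample, which is the underfitting end of the trade-off that pruning is meant to balance.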

Decision Tree Forest Fires

•Recommendation: Researchers should obtain a burning permit before pruning with fire.

•Smoking while researching is not recommended. If you choose to do so, make sure your “butts are out.”

Voting Fraud by One-Armed Bandits

•Cascading failures from other fields may cause disasters in data mining.

•Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.

Voting Fraud by One-Armed Bandits

•One-armed bandits commit voting fraud by:

• Impersonating real voting machines.

• Cramming cake into voting machines.

• (The cake is a lie.)

Other safety measures

•Cool mining helmets

Conclusion

•The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters.

•When faced with data mining disasters:

• Remain Calm.

• Blame it on one-off errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.