

Big Data and the Earth Sciences: Grand Challenges Workshop

Summary of the Workshop

May 31 to June 2nd, 2017

Hosted by: University of California San Diego (UCSD)

Calit2’s Qualcomm Institute (Calit2/QI) at UCSD

Center for Western Weather and Water Extremes

Scripps Institution of Oceanography

This workshop is supported by:

NSF award ACI-1541349

CA AR Program award 4600010378

NOAA PSD award NA15OAR4320071

Scripps Institution of Oceanography’s Director’s Office


Highlights from the Big Data and the Earth Sciences: Grand Challenges Workshop

Convener: Scott L. Sellars

Planning Committee: Thomas A. DeFanti, Julie Kalansky, Anna Wilson, Minghua Zheng.

This summary provides an overview of key points, challenges, and priorities discussed at the Big Data and the Earth Sciences: Grand Challenges Workshop held from May 31 to June 2, 2017 in La Jolla, California. The workshop was hosted by the Pacific Research Platform (PRP), a consortium of universities in the Western U.S. that is building a “science-driven, high-capacity, data-centric freeway system on a large regional scale,” and the Center for Western Weather and Water Extremes (CW3E) of UC San Diego’s Scripps Institution of Oceanography. In addition to this summary, recordings of the four “Grand Challenges Lectures” and the final panel are posted on UC San Diego’s PRP website (https://www.youtube.com/user/Calit2ube).

Outline:

Section 1. Introduction
Section 2. Workshop Context and Goal
Section 2.1. Day One: Cyberinfrastructure and the Emergence of Data Sciences
Section 2.2. Day Two: Computational Science - Modeling and Methods
Section 2.3. Day Three: Caveats and Grand Challenges, the Path Forward
Section 2.4. Meeting the Challenge – Paths Forward for Big Data in Earth Sciences
Section 2.4.1. Education
Section 2.4.2. Discipline Knowledge and Reward Structure for “Renaissance Teams”
Section 2.4.3. Cyberinfrastructure and Big Data Partners in the Earth Sciences


1. Introduction:

The objective of the workshop was to assemble researchers in the Earth sciences, computer sciences, and information technology to learn, collaborate, network, and focus on the challenges they all face in harnessing and using Big Data in the Earth sciences. In the Earth sciences, hyper-dimensional data from satellite- and ground-based observations and from cutting-edge weather and climate models are ever increasing. Technologies emerging from projects like the National Science Foundation (NSF) funded Pacific Research Platform (such as the Flash I/O Network Appliance (FIONA) and an end-to-end 10-100 Gb/s network backbone for data transfers) seek to transform scientific data analysis capabilities with state-of-the-art networking technologies.

The workshop was open to faculty, researchers, and graduate students in a broad range of fields such as computer science, machine learning, atmospheric science, hydrometeorology, civil engineering, oceanography, and related fields, with the goal of highlighting key technological advances and Big Data methods that are emerging as possible tools to improve our understanding and the predictability of Earth systems. In addition, over 60 highly motivated undergraduate and graduate students from the SIO209/ECE285 class, Machine Learning for Physical Applications, attended the workshop and provided summaries for many of the talks.

The convergence of advanced cyberinfrastructure, technologies, and Big Data approaches is emerging as a powerful tool for understanding complex, large volumes of data, especially in the Earth and environmental sciences. Traditionally, the Earth sciences focused on fundamental theories of nature and mathematics, leaving the knowledge of cyberinfrastructure, non-physically based approaches (statistical modeling), and the emerging field of “data science” scattered among the many disparate Earth science disciplines.

As we continue to get deeper into the era of the “fourth paradigm,” as described in essays based on Jim Gray’s vision of data science in the book The Fourth Paradigm: Data-Intensive Scientific Discovery [Hey et al., 2009], there is a realization that linking cyberinfrastructure, technology, and Big Data is necessary to provide faster discoveries and a broader understanding of ways to study Earth systems. The Grand Challenges aspect of the workshop focused on bringing together thought leaders to discuss how to bridge the various disciplines needed for the Earth science community to take full advantage of the latest tools and methods provided by cyberinfrastructure. The three main topics of discussion of Earth sciences research included:


• Cyberinfrastructure technological advancements: for Big Data acquisition, collection, management, storage, access, and collaboration.

• Computational Science: statistical sampling, modeling and methods for Earth sciences data exploration, analysis, understanding and interpretation.

• Challenges: those faced in Big Data approaches for Earth science investigation.

Each day had at least one Grand Challenge lecture, laying the foundation for the sessions that followed that day. The overall message conveyed by all lecturers was that, although each Earth science discipline requires independent knowledge and expertise, future Earth science research will depend on successful collaboration with cyberinfrastructure scientists and on the integration of knowledge from each area. The four lectures were delivered by distinguished researchers and experts who have engaged at some level in the steps necessary for the integration of these areas:

● Dr. Larry Smarr, Founding Director of the California Institute for Telecommunications and Information Technology (Calit2), a UC San Diego/UC Irvine partnership, holds the Harry E. Gruber professorship in Computer Science and Engineering (CSE) at UC San Diego's Jacobs School.

● Dr. Michael Wehner, Senior Staff Scientist, Computational Research Division at the Lawrence Berkeley National Laboratory.

● Dr. Vipin Kumar, Regents Professor at the University of Minnesota, holds the William Norris Endowed Chair in the Department of Computer Science and Engineering.

● Dr. Padhraic Smyth, Professor, Director, UC Irvine Data Science Initiative and Associate Director, Center for Machine Learning and Intelligent Systems, UC Irvine.

2. Workshop Context and Goal

Over the two-and-a-half-day timeframe, the interactions between the participants allowed the common communication barriers that exist between the diverse disciplines to become less restrictive and field-specific terminology to become more familiar. These interactions provided essential, comprehensive detail on the advanced methods and technologies presented, so that researchers in different Earth science disciplines can decide how best to harness and apply these methods in the context of the science questions they are addressing.

2.1. Day One: Cyberinfrastructure and the Emergence of Data Science


Technological advances in hardware and software have allowed data-driven approaches to emerge as powerful tools that can be used in the era of Big Data and “deep analysis.” Big Data collaborations involve massive data transfers, storage, and specialized analysis approaches. John Graham, Senior Engineer at UC San Diego’s Qualcomm Institute/Calit2, stated early in his talk that “we can't even keep up [referring to technology], and that is a good thing.” His statement emphasizes the fast pace of innovation in the field of Big Data, technology, and data science, and that even the top centers struggle to keep up with these rapid advances.

Dr. Larry Smarr kicked off the workshop by presenting the progress made over the last decade in science data networking and architecture by universities. He also laid out his vision for a National Research Platform, the next iteration of the PRP that was originally envisioned in 2009, that would “link together universities across the country on a national scale”.

Throughout the first day, terms like ESnet, CENIC, Internet2, XSEDE, Globus, Kubernetes, non-von Neumann processors, Rook, and Kepler Workflows were used. These terms sent many in the audience online in search of definitions for the tool names, ideas, and processes that were discussed. Although the overarching session relied on discipline-specific jargon, the benefits of these technologies for handling Big Data were clear. Many participants were very interested not only in learning about the state of the art in Big Data technologies and data sciences but also in how to start engaging with computer scientists and technologists.

Non-technical talks also focused on aspects of Big Data that are often overlooked by researchers. For example, UCSD librarian David Minor discussed how the UC San Diego library is engaging in Big Data technologies to store and host scientific data.

Questions from the participants emphasized the importance of educating scientists in the new technologies and software approaches that address Big Data in the Earth sciences. Dr. Ilkay Altintas, from UC San Diego’s data science office, emphasized the importance of students learning about these technological and software advances, and indicated that she teaches an online data science course that is taken by 25,000 students. In her talk, Dr. Altintas discussed the data science software Kepler Workflows and how workflow software can be used to organize a scientific workflow environment for entire research processes (e.g., data transfer, pre-processing, analysis, and post-processing). She mentioned that students “get workflows,” highlighting the fact that millennials and current students, with their never-ending access to multi-dimensional data and analysis, tend to grasp these approaches that link together various research methodologies, software, and technologies.
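To make the idea of a scientific workflow concrete, the four stages named above (data transfer, pre-processing, analysis, and post-processing) can be sketched as a chain of steps in plain Python. This is only an illustration and does not use Kepler or its API; the `Workflow` class, the stage functions, the file names, and the toy readings are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Workflow:
    """Hypothetical minimal workflow: named steps run in order."""
    steps: List[Tuple[str, Callable]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, name: str, func: Callable) -> "Workflow":
        self.steps.append((name, func))
        return self

    def run(self, data):
        for name, func in self.steps:
            data = func(data)      # each step consumes the previous step's output
            self.log.append(name)  # simple provenance: which steps ran, in order
        return data


# Illustrative stand-ins for the four stages named in the text.
def transfer(paths):
    # Pretend each requested file arrives as a list of raw readings (None = missing).
    return [[1.0, None, 3.0], [2.0, 4.0, None]]


def preprocess(records):
    # Drop missing values from every record.
    return [[v for v in rec if v is not None] for rec in records]


def analyze(records):
    # Reduce each record to its mean.
    return [sum(rec) / len(rec) for rec in records]


def postprocess(means):
    # Round for reporting.
    return [round(m, 2) for m in means]


wf = (Workflow()
      .add("transfer", transfer)
      .add("preprocess", preprocess)
      .add("analyze", analyze)
      .add("postprocess", postprocess))
result = wf.run(["a.nc", "b.nc"])  # hypothetical file names
print(result, wf.log)
```

Keeping each stage as a separate named step is what lets a workflow tool record which steps ran and rerun or swap out a single stage; real workflow systems add scheduling, data movement, and richer provenance on top of this basic shape.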


Dr. Michael Wehner’s Grand Challenges Lecture that afternoon emphasized the challenges that large-scale climate modeling projects present, given the need to transfer and analyze the “copious” amounts of data that numerical climate models produce. He discussed the current prominent methodologies used in large-scale weather and climate science, including international climate modeling intercomparison projects. He suggested that in the era of Big Data, these projects may not be able to succeed without a strategic plan for storing and distributing these massive datasets so that research teams can access them. Beyond access to data, he highlighted the serious challenges scientists face in analyzing the many model realizations, runs, and variables. He indicated that he has over 4 Petabytes (PB) of data, much of it accessible mainly at Lawrence Berkeley National Laboratory (LBNL). His point became ever more evident throughout the two and a half days of talks and discussions.

From the discussions on Day One, the enthusiasm for the Earth sciences to engage in these new technologies and capabilities was clear. The successes in the private sector with data science, from Google to Amazon, show that these capabilities are powerful. However, many scientists were not aware that research projects such as the PRP are at the forefront of these advances and are meant to help the many disciplines dealing with Big Data at universities.

2.2. Day Two: Computational Science - Modeling and Methods

Beyond the technological capabilities, lecturers on Day Two presented approaches in predictive modeling that are advancing rapidly. Dr. William W. Hsieh (University of British Columbia), Dr. David Gagne (National Center for Atmospheric Research), and graduate student Ata Akbari Asanjan (University of California, Irvine) discussed Extreme Learning Machines, Generative Adversarial Networks, and Recurrent Neural Networks, respectively. These algorithms are state-of-the-art machine learning methods being applied to pressing Earth science prediction problems such as precipitation, cloud, and streamflow forecasting. They are mostly available in open source software packages, such as TensorFlow, Torch, Caffe, and others, which provide some of the most sophisticated machine learning algorithms in the industry at no cost.

In the Earth sciences, numerical weather and climate prediction models have also advanced, including data assimilation, higher space and time resolution, advanced physics and optimization, coupling of dynamic earth systems, and interpretability of the data. Many participants who have worked in modeling physically based systems using numerical approaches continue to raise caution about the lack of physical understanding in machine learning methods that rely on data-driven approaches. This point was made throughout Days Two and Three.

Dr. Bruce Cornuelle, Senior Researcher and Oceanographer at Scripps Institution of Oceanography, led off his talk with the question: “How can we merge machine learning with data assimilation?” He then discussed how physical models and data-driven models are competing in real-world prediction problems and how we need to bring these two closer together. He suggested that our efforts should focus on improved optimization for physical models and better diagnostics for data-driven models. In the end, he posed a powerful question that turned out to be more of a challenge to the computer science community: “Could a data-driven model infer the equations of motion from a sparse, incomplete, and noisy ocean dataset?” This is a grand question indeed, and it highlights the need for multi-disciplinary collaboration and the inclusion of discipline-specific knowledge to address these problems.

Dr. Vipin Kumar showed how he and his colleagues are utilizing machine learning approaches to enable scientists to understand land use and land cover change dynamics on a global scale. He also cautioned about the challenges that traditional data science approaches face when applied to Earth science data. His concerns include the “unstructured” nature of the data, the quality and/or scope of the data, and the sources of the data, which include many different sensors and different space and time modalities. Although these issues do exist, he reframed them as exciting opportunities for the computer science arena. He showed examples of research on “Rare Target Class Using Imperfect Labels” [Mithal et al., 2017], and “Using Physics Guided Labeling to Handle Poor Data Quality” [Jia et al., 2016, 2017, Khandelwal et al., 2015].

In addition, a presentation from Dr. Stefan Liess, a colleague of Dr. Kumar, explained that harnessing pattern recognition approaches allowed climate scientists to detect and investigate a previously unknown stationary “Rossby Wave” pattern (a large-scale atmospheric motion phenomenon) between the West Siberian Plain and the tropical Pacific that can impact global climate [Liess et al., 2017]. He reported that it was the use of Big Data approaches that led to this discovery.

However, even with all these challenges, Jeanine Jones from California’s Department of Water Resources emphasized the dire need for practical and accurate long-range weather forecasts for planning water allocations, reservoir operation, and the preparation for possible floods and drought. She highlighted the fact that long-range precipitation forecasts for California have been an issue since the 1920s, and she asked whether the Big Data and computational sciences community could help. This was just one of many forecasting challenges that were discussed over the two and a half days, and it stimulated many side conversations about possible collaborations to tackle these forecasting challenges.

2.3. Day Three: Caveats and Grand Challenges, the Path Forward...

Dr. Padhraic Smyth, in the fourth and final Grand Challenges Lecture of the workshop, cautioned the participants that these promising new approaches are not always easy to apply directly to Earth science problems. He noted, for instance, that a predictive model trained on data from one region will not, in general, transfer to other regions. Dr. Kumar also highlighted this on Day Two by demonstrating how a region-specific model (California) for identifying burnt forests using satellite data failed when applied to other regions (Montana) because of the complexities in the geography and other factors.

Dr. Smyth shared another example of the challenges by reporting results from a study of a state-of-the-art pattern recognition algorithm trained to detect either guitars or penguins [Nguyen et al., 2015]. The algorithm showed very high accuracy when presented with pictures of one or the other (upwards of 98.90% accuracy for guitars and 99.99% accuracy for penguins). The issue was that it was also extremely confident (99.99% certainty) that a picture of an abstract pattern with similar colors to a penguin/guitar was a penguin/guitar. To a human observer, it is obvious that none of these patterns resemble a penguin or guitar. These and other issues exist with these powerful algorithms and highlight Dr. Cornuelle’s point about the importance of domain knowledge.

2.4. Panel Discussion: Meeting the Challenge – Paths Forward for Big Data in Earth Sciences

The final panel on Day Three, which included three of the four Grand Challenges Lecturers (Dr. Smarr, Dr. Kumar, and Dr. Smyth), provided a wealth of insight into the many ways forward. The panelists expressed that there are just as many opportunities for Big Data approaches in the Earth sciences as there are challenges. The three main themes of the panel are summarized below.

2.4.1. Education

It was described in the wrap-up panel that there may be a need to “take a hammer” to undergraduate course curriculum around Big Data and data sciences. As Dr. Smyth pointed out, in general, students with interests in data sciences from disciplines outside of the traditional Statistics and Computer Science “can’t just pop in” on courses that focus on statistical and computational methods because of the required prerequisites. These requirements will, inevitably, limit what curious students can explore.


Some universities are noticing a shift toward interdisciplinary needs and offerings. For example, UC Berkeley’s freshman classes in data sciences will be used as connectors to other disciplines. Dr. Smarr reported that UC Berkeley has now appointed a new Vice Chancellor of Data Sciences and that Calit2 has been working to create multidisciplinary educational projects and teams to bridge many of the gaps across the disparate disciplines, including working with over 400 private sector companies to engage with students and the University.

Dr. Kumar pointed out that an ideal curriculum would allow a student to learn computer science, machine learning, systems thinking, as well as Earth sciences (or other disciplines for that matter), yet he agreed that it is unclear how to do this, given that most students are rooted in a much more specific domain, which requires a significant amount of time to learn effectively. This led to an exchange that focused on the best way to interface computer science, machine learning, and Earth sciences, and that explored questions like: “what kind of knowledge should be required from one side or the other in the limited amount of time that we have to educate our students?” This question needs further discussion.

Dr. Kumar suggested that we need to “build the paradigm of machine learning that can incorporate the knowledge of these different disciplines.” The panelists agreed that there is much work to be done to incorporate disciplinary knowledge in machine learning techniques and methods. In the end, the recognition that there is a dire need for people with skills in both camps was unanimous, but there was no clear answer on how best to integrate or coordinate their knowledge, or what should be expected from students and researchers who participate in this interface. Dr. Smarr fittingly stated, “It is a lot of fun to get a little bit out of your narrow specialty and actually solve a real problem.” The Data Science and Big Data landscape is evolving quickly in the Earth sciences, and if we are to keep up, continuous engagement and education are key.

2.4.2. Discipline Knowledge and Reward Structure for “Renaissance Teams”

“Cutting edge research requires teams,” as stated by Dr. Kumar during the final panel. Cross-disciplinary engagement is viewed by academia as both very challenging and exciting, and yet Earth science peer-reviewed journal articles, posters, and conference talks provide little recognition or credit for such engagement on the CVs of computer scientists. Thus these outputs are of limited value for computer scientists who collaborate with researchers in Earth sciences. For computer scientists, peer-reviewed conference papers and presentations are the gold standard for publishing. This difference in publishing styles and rewards makes bringing the two communities together challenging.

Dr. Smarr described what his colleague, Dr. Donna Cox from the National Center for Supercomputing Applications (NCSA), calls “Renaissance Teams.” The goal of these multidisciplinary teams is for members to learn enough about each other’s disciplines to be productive. Such teams are still quite rare, but they are necessary for innovative approaches to be successful. One important observation is that multidisciplinary thinking is becoming required for addressing problems that deal with real-world challenges and affect many disciplines. Science is in transition away from problems that belong to traditional disciplinary boxes. However, there must be recognition, venues, journals, and workshops focused on the skills and achievements of these multidisciplinary teams. Fortunately, more of these have been developing recently.

Dr. Smarr emphasized that the key to success in creating these teams is making sure that the members are brought on “as equals,” something that is often challenging in scientific and academic arenas, as years of experience, and sometimes the academic hierarchical structure, make it difficult for students, faculty, and staff to integrate and collaborate in this fashion. It was noted that the private sector, where most students will end up, is actively looking for people and students who have the skills and experience of working on Renaissance Teams. Yet there is currently no reward structure for researchers who engage in these multidisciplinary endeavors, and thus there is little incentive for young scientists to participate. The reward structure was brought up throughout the workshop, and there was agreement that there are major barriers to bringing together the disciplines. It seemed clear that if a reward structure were set up to support these types of teams and projects, more students, scientists, and researchers would participate.

2.4.3. Cyberinfrastructure and Big Data Partners in the Earth Sciences

Geosciences are major drivers for cyberinfrastructure investment and use. Yet, with these drivers in place, and even considering that there has been more standardization over the decades, there is still little national data set conformity. “There is still not a national Big Data cyberinfrastructure that would allow for very quickly pulling together disparate datasets,” stated Dr. Smarr. Any graduate student working in the Earth sciences knows this well, as obtaining and organizing data from various research groups and modeling centers takes up a major portion of their time. To alleviate this, from a research perspective, we need to have a national strategy for linking Earth science researchers and data.


Multi-agency planning is challenging agencies to work together, and with the emergence of global coverage of remote sensing observations and modeling products, the situation has created what Dr. Smyth described as a “forcing function” for change in data sharing in the Earth sciences. Dr. Smyth recalled working at NASA’s Jet Propulsion Laboratory at a time when “data on tapes used to be mailed to researchers.”

Beyond Big Data’s emergence and its impact on data sharing, Dr. Smyth highlighted that where we really need improvement is in metadata, which describes the data to be used in research (i.e., what is measured, what type of device measured it, and what units are used). He concluded that metadata are important and that these types of improvements are necessary for the longevity of the data and sustained community involvement. The vast community of computer scientists developing algorithms for these data sets needs standardized metadata to ensure proper use and understanding of the data, and he felt that this should be a priority for multi-disciplinary projects.

Beyond the academic use of Earth science data, the data can be used in many commercial applications. The panel discussed how they hoped the synergy between the private sector and academia would evolve. Dr. Smyth stated that, “Companies are much more interested in the application of the knowledge instead of basic research.” A key example of an important byproduct of diverse investment by government, the private sector, and academia in Earth science research is the initial military investment in satellites, which have made tremendous contributions to private industries and the sciences. It was pointed out that, hopefully, the recent commercialization of satellite data will have similar byproducts, but in the end, companies have shareholders and stock prices that drive their priorities. So, getting companies interested in the application of Earth science problems is necessary for the business model to be successful. Additional thought on how to better develop this synergy is needed.
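As a small illustration of the metadata point raised in this section, the three items Dr. Smyth listed (what is measured, what type of device measured it, and what units are used) can be written down as a minimal record with a completeness check. This is a sketch only, not an existing metadata standard; the field names, the `missing_fields` helper, and the example values are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical minimal metadata record for one measured variable."""
    quantity: str    # what is measured
    instrument: str  # what type of device measured it
    units: str       # what units are used


REQUIRED = ("quantity", "instrument", "units")


def missing_fields(record: dict) -> list:
    """Return the required metadata keys that are absent or empty."""
    return [k for k in REQUIRED if not str(record.get(k) or "").strip()]


# Illustrative values only.
good = asdict(VariableMetadata("precipitation rate", "rain gauge", "mm/hr"))
bad = {"quantity": "sea surface temperature", "units": ""}

print(missing_fields(good))
print(missing_fields(bad))
```

A check like this is the kind of thing a shared convention makes possible: once every dataset is expected to carry the same small set of fields, an algorithm developer outside the discipline can verify them automatically before using the data.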

3. Summary of Recommendations

There are many areas of opportunity to develop a sustainable engagement between cyberinfrastructure, network technologies, and Big Data in the Earth sciences disciplines. Recommendations include:

1. Developing an educational framework for undergraduate and graduate course curriculum around Big Data and data sciences that is constructed to train students to think about discipline-specific data in a statistical and computational fashion. To accomplish this, discussion is needed around the development of a strategic plan. Computer science, machine learning, and the Earth sciences are very different disciplines, and understanding how to encourage cross-discipline education and engagement is the first step in effectively developing an educational framework that supports the idea.

2. In addition to a formal education framework, engagement of students on multidisciplinary teams (e.g., “Renaissance Teams”) can help create the mindset that fosters collaboration and builds the professional relationships that are needed for sustained multidisciplinary projects. To ensure the success of these teams, a reward structure with clear incentives should be developed for multidisciplinary teams. As stated by Dr. Kumar, this would include “building the paradigm of machine learning that can incorporate the knowledge of these different disciplines” and encouraging multidisciplinary teams to build a culture of “equals,” creating a less hierarchical structure than those often found in team management in academia.

3. With the growing amounts of Earth science data, it will be essential to create a national cyberinfrastructure plan, and secure the investments needed, to connect the research universities that are dealing with Big Data in the Earth sciences to the broader community. There is still little national data set conformity, and the Earth sciences need to improve and standardize the “metadata” for the sustainability of the data and its usage by disciplines both inside and outside of the Earth sciences. One potential direction to assist in accomplishing this goal is to develop a plan to engage private companies interested in the solutions of Earth science problems and their applications. Public-private partnerships are needed to allow universities to take advantage of the recent commercialization of satellite data and other Earth science data.

By building on improved Big Data and data science education, encouraging multidisciplinary teams, and improving cyberinfrastructure, a robust synergy can be developed to encourage a public/private/academic collaboration that would enhance most scientific research and projects and provide students with real-world experience that will benefit their careers and our sciences.

References:

Hey, T., S. Tansley, and K. Tolle (Eds.) (2009), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft, Redmond, Wash.

Jia, X., Khandelwal, A., Gerber, J., Carlson, K., West, P., and Kumar, V. (2016). Learning Large-scale Plantation Mapping from Imperfect Annotators. In IEEE Big Data (Big Data).

Jia, X., Khandelwal, A., Gerber, J., Carlson, K., Samberg, L., West, P., and Kumar, V. (2017). Automated Plantation Mapping in Southeast Asia Using Remote Sensing Data. In Department of Computer Science and Engineering-Technical Reports.

Khandelwal, A., Mithal, V., and Kumar, V. (2015). Post Classification Label Refinement Using Implicit Ordering Constraint Among Data Instances. Proceedings of the IEEE International Conference on Data Mining.

Liess, S., Agrawal, S., Chatterjee, S., and Kumar, V. (2017). A Teleconnection between the West Siberian Plain and the ENSO Region. Journal of Climate, 30(1), 301-315. DOI: 10.1175/JCLI-D-15-0884.1

Mithal, V., Nayak, G., Khandelwal, N., Kumar, V., Oza, N., and Nemani, R. (2017). RAPT: Rare Class Prediction in Absence of True Labels. IEEE Transactions on Knowledge and Data Engineering.

Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 427–436). http://doi.org/10.1109/CVPR.2015.7298640

Editorial. (2012). Database bonanza. Nature Climate Change, 2(10), 703. http://doi.org/10.1038/nclimate1713