Data to Action: Increasing the Use and Value of Earth Science Data and Information
For 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public.

The ESIP Summer Meeting has already taken place, but check out the ESIP Summer Meeting Highlights Webinar: https://youtu.be/vbA8CuQz9Rk.
Thursday, July 18
 

10:30am PDT

Meet The Maintainers: commoning for data infrastructure durability
Because they care about and for the infrastructure that houses every bit of data, every byte of the cloud, and every line of code, maintainers sustain the technology infrastructure that makes Earth data use possible. Maintainers work in many arenas, of course: they keep energy grids up, roadways repaired, and buildings secure. Data infrastructure experts are now in conversation with other maintainers. Recently, a group of maintainers (technicians, engineers, historians, social scientists, and sysadmins, the ones you call on to reboot the system when it’s down) started a conversation and created a group called The Maintainers. With support from the Alfred P. Sloan Foundation, ESIP is bringing the Maintainer conversation to Tacoma. We’ve invited several of them to talk about the real issues involved in stewarding hardware and systems, not just data. By caring for your hardware, they let you focus on other tasks. Join us to discover how ESIP’s goals of sustaining the Earth science data endeavor rely upon those who choose not to innovate today, but rather to navigate the problematics of keeping everything running most of the time.

Session recording here.

Introduction:
Bruce Caron
Presentation Title: Culture, Kindness, and Care: Commoning for Earth Knowledge Sustainability
Slides: https://doi.org/10.6084/m9.figshare.8969912

Moderator: Mark Parsons
Slides: https://doi.org/10.6084/m9.figshare.8969915

Invited Presentations
Presenter: Emily Jane Sylak-Glassman
Presentation Title: The Importance of Maintaining Earth Observational Data for Long-Term Climate Record Reconstruction
Slides: https://doi.org/10.6084/m9.figshare.8969918

Presenter: Daniella Lowenberg
Presentation Title: Maintaining and Growing Research Data Publishing at CDL & Dryad
Slides: https://doi.org/10.6084/m9.figshare.8980454

Presenter: Jason A. Gallo
Presentation Title: The Scale and Value of Earth Observation Infrastructure
Slides: https://doi.org/10.6084/m9.figshare.8969924

Presenter: Fred C. Beach
Presentation Title: U.S. Energy Infrastructure: ‘What’s Past is Prologue’
Slides: https://doi.org/10.6084/m9.figshare.8969921

Session Take-Aways
  1. ESIP scientists, data scientists, and project managers are only one part of the larger team that keeps Earth information active and durable. The Maintainer organization can be an “ESIP for the rest of the team”, where ESIP maintainers gather with others to solve their problems. ESIP will have a session at the next Information Maintainer conference in DC in October. Maintenance is about fostering and caring for relationships: it is all of us who decide to maintain. This requires awareness and kindness.
  2. Earth data resources often have multiple inputs, some of them quite complex. Just identifying and mapping these inputs is an important maintainer activity. Also, archiving data and software at the same time makes good sense (Dryad and Zenodo). Maintenance is more complex than it seems: the US has no energy policy, and predicting one variable (sea ice extent) requires dozens of inputs and complex interactions.
  3. Infrastructure (such as our energy infrastructure) can get to the point where the trillions of dollars needed to update it might be better spent replacing it with something highly distributed.



Speakers

Bruce Caron

Executive Director, New Media Studio

Mark Parsons

Editor in Chief, Data Science Journal


Thursday July 18, 2019 10:30am - 12:00pm PDT
Ballrm A
  Ballrm A, Breakout

10:30am PDT

Challenges and Opportunities in Adopting Cloud technologies for Data Intensive Science
The amount of data generated by public and private sector organizations has increased manyfold in the last decade. In recent years, consumers and providers of data have faced an increasing challenge in managing the quantity and quality of information produced. The advent of cloud technologies has been a boon for the big data era, offering a solution to the information overload. While cloud technologies have provided an excellent opportunity, the challenges and opportunities in utilizing them are still to be explored. The complex business and infrastructure aspects of the cloud paradigm and the rapid pace of technical development have made transitions complex and, at times, confusing. In this session, we hope to share case studies of migration to and utilization of cloud technologies for data intensive science. We hope the challenges and opportunities revealed by those case studies will inform stakeholders, collaborators, and other interested parties, and that the lessons learned will inform future work and help expedite progress in the field of Earth Science informatics.

Developing Applications Using Earth Science Data in the AWS Cloud with PODPAC

Matt Ueckermann
Observational and modeled data products from NASA encompass petabytes of scientific data available for analysis, analytics, and exploitation. Unfortunately, these data sets are highly underutilized by the scientific community due to: (1) vast computational resource requirements; (2) disparate formats, projections, and resolutions that hinder data fusion and integrated analyses across different data sets; (3) complex and disjoint data access and retrieval protocols; and (4) task-specific and non-reusable code development processes that hinder algorithm sharing and collaboration. In response, NASA EOSDIS is actively investigating migration of its vast data archives to storage on commercial cloud services such as Amazon Web Services (AWS). However, to maximize the benefit of cloud-based data storage, cloud-based data analysis and analytics are needed to process data “close” to where it is stored. Recognizing that migrating workflows to the cloud requires a high degree of cloud computing expertise, we are developing the Pipeline for Observational Data Analysis and Collaboration (PODPAC). PODPAC is a Python library designed to automatically harmonize disparate data sources, seamlessly access NASA earth science data, and analyze data in the AWS cloud. PODPAC is built around the tools of the Python data ecosystem (NumPy, SciPy, xarray) and aims to bridge the gap between data sources, analysis, and the cloud. In this talk, we will introduce PODPAC and demonstrate on-demand cloud computation of a value-added derived product using NASA data.
Opportunities for Accelerating Science in the Cloud
Christopher Lynnes
As the data holdings of the Earth Observing System Data and Information System expand over the next several years, the typical data analysis process of downloading data to local compute resources will become increasingly inefficient. However, cloud computing promises to mitigate that by allowing the user to process close to the data. These improvements will be obtained via a variety of mechanisms: 1) improving the ability of data transformation services to reduce the data prior to analysis; 2) providing cloud-native analysis capabilities for common analysis functions; and 3) providing the ability to work directly with data in Web Object Storage.

The role of data stewards in a cloud-based platform
Amanda Leon

Google Earth Engine has a growing user community as a cloud-based platform for analysis and visualization of geospatial data. This adoption is heavily driven by the ease of access Earth Engine’s Data Catalog provides to a wealth of satellite imagery and other geospatial data.  As stewards of NASA EOSDIS data, Distributed Active Archive Centers (DAACs) can play a key role in supporting and maximizing the utility of Earth Engine for the scientific community.  The NSIDC DAAC has been assessing various data stewardship topics to support the sustainment and expansion of NASA EOSDIS data in Google Earth Engine including: 1) data inclusion decisions based on science use cases; 2) optimized workflows for preparing 

Open Source Data-Intensive Platform for the Cloud
 
Thomas Huang
JPL has a long history of building innovative solutions for onboard instruments, ground operations and data systems, and archive and distribution for our missions. As the rate of data generated by our missions continues to increase and is expected to rise significantly in the near future, JPL is engaging in reusable data-intensive technologies for mission operations and to enable science. This talk discusses the open source solutions we have developed for the Cloud platform to address three challenges from our growing collections of scientific data (interactive analysis, in situ match-up, and search relevancy) and their applications.

Developing a roadmap for cloud services
Suresh Vannan

The Physical Oceanography Distributed Active Archive Center (PO.DAAC) will be the data repository for the Surface Water and Ocean Topography (SWOT) mission. SWOT presents new challenges and opportunities for PO.DAAC: a large data volume (20 TB/day) and a new community of users (hydrologists). This presentation will show how PO.DAAC plans to address them. PO.DAAC first assessed what tools and services current and new users will need to discover, access, and utilize SWOT data. This analysis informed a roadmap that shows what services PO.DAAC (and ESDIS) will migrate and/or develop in a Cloud-based environment for the user community.

Leveraging an interoperable scalable data platform to support Earth Observation Data
Sudhir Raj Shrestha (sshrestha@esri.com)
With an ever-increasing wealth of scientific data produced from various sources and platforms, including earth observations, models, and forecasts, come exciting and challenging opportunities to exploit such vast amounts of data to produce valuable information products. These data are widely used by government agencies like NOAA, NASA, and USGS and by private industry for monitoring and analysis of measurements associated with physical, chemical, and biological phenomena across earth’s oceans, atmosphere, and land masses. The volume, diversity, and complexity of multidimensional earth science data have posed challenges in the past for how they are shared with a diverse community, visualized intuitively, and integrated for answering scientific questions. With advances in geospatial science and technology, these data and analytics can now advantageously be hosted in the cloud. This will have a tremendous impact on how scientists, policy makers, and the public ingest, manage, analyze, visualize, and share complex scientific data. GIS software is evolving in step with the technology industry to help meet these challenges. In this presentation, I will briefly discuss how the current technology trend is driving more scalable, interoperable, and format-agnostic capabilities. We will share how the ArcGIS platform supports this “Open Science” and share use cases in place at NOAA and NASA. We will also share recent advancements in the cloud, spatial machine learning, and geospatial data science that support various domains of science applications.

Session recording here.

Speakers

Sudhir Raj Shrestha

Solution Engineer Researcher, Esri
Solution Engineer and Scientific Data enthusiast with keen interest in making data easily Discoverable and Interoperable. Passionate about geospatially driven Hydrological Modeling and Heuristic Soil Modeling and develop, implement new and innovative geospatial methods, techniques...

Amanda Leon

DAAC Manager, NASA National Snow and Ice Data Center DAAC

Thomas Huang

Technical Group Supervisor, JPL

Christopher Lynnes

Researcher, Self
Christopher Lynnes recently retired from NASA as System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He worked on EOSDIS for 30 years, over which time he worked on multiple generations of data archive systems, search engines and interfaces...

Suresh Vannan

Project Manager, NASA/Caltech Jet Propulsion Laboratory



Thursday July 18, 2019 10:30am - 12:00pm PDT
Ballrm BC
  Ballrm BC, Breakout

1:30pm PDT

How to build your data "groups" for optimal discovery?
How does your Earth Science community define a collection that is discoverable in catalogs AND  yet can be simply understood by humans? Many areas need to be evaluated such as definitions, elements, vocabularies and more...oh, my!  Help us create a cheat sheet to help the data management community by having fun.

Whether you come to our session or not, please help us by filling in the blanks for the following statement in Slido.
I work with _______ datatype and it is aggregated by ___________. An example answer would be: sonar data; single cruise + instrument type.

https://app.sli.do/event/dttuqvzw
or
Go to https://www.slido.com and enter the code #C690

Session recording here.

Speakers

Anna Milan

Metadata Standards Lead, NOAA NCEI
~*~Metadata Adds Meaning~*~

Heather Brown

Archive Data Management Specialist, Riverside for NESDIS/NCEI



Thursday July 18, 2019 1:30pm - 3:00pm PDT
Ballrm D
  Ballrm D, Working Session

1:30pm PDT

HDF Town Hall
Data in HDF file formats continues to play an important role for Earth Scientists in the U.S. and around the world. The HDF Group will update ESIP members on the state of HDF software and HDF5 Roadmap, and will share our experience on working with HDF5 in the Cloud. We will discuss our technical approaches, and lessons learned from different projects including a NASA ACCESS project that transformed NASA HDF data into GeoTIFF in AWS. We will also update ESIP members on our involvement in standardization efforts and demonstrate how HDF tools support ESDIS data from product initial design to production, and to compliance with the standards. We will encourage ESIP members participating in the session to share their experiences with the HDF software and to contribute to the HDF5 Roadmap.

Talks   
Google Colaboratory for HDF-EOS - Joe Lee

Abstract: Google provides a free Jupyter notebook environment called Colaboratory (also known as Colab). It is a simple, easy, and awesome Python environment for data scientists. We present how NASA Earthdata in HDF can be used with Google Colab using the existing comprehensive examples on the HDF-EOS Tools and Information Center website (http://hdfeos.org/zoo). We also present how OPeNDAP can be used with Colab to achieve 100%-cloud data analysis.

Keywords: Python, Google Colab, Jupyter notebook, HDF-EOS, OPeNDAP, Cloud computing.
Slides: https://doi.org/10.6084/m9.figshare.8976464

Leveraging the Cloud for HDF Software Testing - Larry Knox

Abstract: In this talk we will discuss how we leverage the Cloud for daily regression testing of HDF software, including testing of the HDF5 parallel library on a Cloud cluster using OrangeFS.
Keywords: HDF5, Cloud, CI testing.
Parallel Computing with HDF Server - John Readey

Abstract: To deal with really big data you need to be able to harness the power of multiple machines, but many users are put off by the complexity involved in setting up a cluster and then figuring out how to utilize it effectively. However, by using HDF Server (HSDS) with Kubernetes, it’s much easier than you would think. In this talk we’ll walk through some examples of using xarray, h5netcdf, and h5py with HSDS to illustrate how you can scale up your compute to match your data size.
Keywords: HDF5, h5netcdf, h5py

HDF5 Roadmap 2019-2020 - Elena Pourmal

Abstract: In this talk we will give an overview of the new features of the upcoming HDF5 release 1.12.0, and outline the HDF5 roadmap for the next year. We will demonstrate new open source file drivers to access HDF5 files via Amazon Simple Storage Service (Amazon S3) and on the Hadoop Distributed File System (HDFS). We will use this presentation to get feedback on the HDF5 roadmap from ESDIS users and application developers.
Keywords: HDF5, Amazon S3, HDFS, Cloud, Object Store.

Session recording here.

Moderators

Aleksandar Jelenek

The HDF Group

Speakers

John Readey

Developer, The HDF Group

Larry Knox

The HDF Group

Elena Pourmal

Engineering Director, HDF Group
HDF

Hyokyung Joe Lee

Software Engineer, The HDF Group
Data Modeling: HDF Product Designer
Data Format: HDF(-EOS) / netCDF / Parquet / ONNX / ArcGIS CRF / GDAL
Data Service: OPeNDAP (Hyrax / THREDDS / Pydap) / ArcGIS Enterprise
Data @Scale: Cloud / AWS S3 & Lambda & ECS / Docker & Kubernetes / Conda & Dask
Data Analytics: Big data / Apache...



Thursday July 18, 2019 1:30pm - 3:00pm PDT
Room 318
  Room 318, Breakout
 
Friday, July 19
 

10:00am PDT

Community Ontology Repository (COR) Administration, Development and Planning
This session will consist of two main themes: discussion of current and pending important administrative tasks; and hands-on exercises covering the development aspect toward improvement of the software itself, as well as possible complementary tools that could be integrated (e.g., ontology viewers/visualizers). Participants will gain understanding of how this particular instance of the ORR software is deployed on Amazon and how they can contribute in various ways including further core development and integration of new tools and client libraries to leverage the powerful API and SPARQL endpoint capabilities of the COR server.

NOTES: HERE

Session recording here.

Moderators

Annie Burgess

Lab Director, ESIP

John Graybeal

Technical Program Manager, CEDAR and BioPortal, Stanford University
Metadata, semantics, and cool repositories for metadata and semantics. Cool Earth Science (or biomedical) projects that will change the world. Or at least, change the way we manage metadata about the world.

Speakers

Lewis McGibbney

Enterprise Search Technologist III, Jet Propulsion Laboratory

Carlos Rueda

Sr Software Engineer, MBARI
My areas of expertise and interest include scientific data management, visualization, data integration and interoperability, programming languages, and semantic web. https://www.mbari.org/rueda-carlos/


Friday July 19, 2019 10:00am - 11:30am PDT
Room 317
 

