The enterprise data fabric has become a model for companies striving to provide large-scale user access to well-managed, integrated, and secure data assets. Researchers at universities and national laboratories are now putting a similar resource to work in their own fields: the National Science Data Fabric (NSDF).
The NSDF is a pioneering project, funded by the National Science Foundation (NSF), designed to create a data fabric that fosters connections among research centers around the globe. It was initiated two years ago by a team of five researchers: Valerio Pascucci from the University of Utah, Michela Taufer from the University of Tennessee at Knoxville, Alex Szalay from Johns Hopkins University, John Allison from the University of Michigan in Ann Arbor, and Frank Wuerthwein from the San Diego Supercomputer Center.
“We came together as a group of scientists and computer scientists recognizing the need for a fabric that would serve our scientific community,” Taufer said during a recorded webinar earlier this year.
The NSDF’s conceptual foundation lies in introducing a novel, transdisciplinary approach to integrated data delivery, granting seamless access to shared storage, networking, computing, and educational resources, thereby democratizing data-driven scientific discovery. The National Science Data Fabric envisions a world where cutting-edge research is unencumbered by the limits of today’s data infrastructure.
The NSDF provides a unified, modular, containerized data fabric that bridges gaps in current computational infrastructure. By offering a single, domain-agnostic stack accessible through an API, it seamlessly integrates core data fabric services with connectors to various storage, compute, and networking resources across participating sites.
The NSDF pilot provides entry to the stack through a diverse range of storage repositories, including local file systems, regional Ceph stores, OSG StashCache origin nodes, NRP storage pods (FIONAs), cloud object stores, and edge data streams, as stated on the NSDF website.
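As a concrete illustration of what uniform access across such heterogeneous backends can look like, here is a minimal sketch using the open-source fsspec library (with s3fs providing the S3 protocol). The endpoint URL, bucket, and file names are placeholders, and this is not the NSDF’s actual access layer:

```python
# A minimal sketch of domain-agnostic storage access via fsspec.
# Requires: pip install fsspec s3fs
# The paths, bucket, and endpoint below are hypothetical examples.
import fsspec

# Read from a local file system.
local_fs = fsspec.filesystem("file")
with local_fs.open("/tmp/example.csv", "rb") as f:
    local_bytes = f.read()

# Read the same kind of object from an S3-compatible store
# (for example, a Ceph RADOS gateway); endpoint is a placeholder.
s3_fs = fsspec.filesystem(
    "s3",
    anon=True,
    client_kwargs={"endpoint_url": "https://objects.example.edu"},
)
with s3_fs.open("example-bucket/example.csv", "rb") as f:
    remote_bytes = f.read()
```

The point of the sketch is that the calling code is identical regardless of where the bytes live; only the connector configuration changes.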
The NSDF stack itself is divided into distinct components, including:
- A user-facing layer comprising command-line tools, domain-specific utilities, interactive notebooks (such as Jupyter), and data visualization dashboards.
- A three-tiered programmable data fabric comprising data management and compute connections, data discovery and curation tools, advanced processing and analytics capabilities, interactive visualization instruments, and automated workflow management.
- A decentralized, scalable content delivery network comprising a core kernel and modular plug-ins, accessible via software development kits (SDKs), application programming interfaces (APIs), and microservices (see the sketch after this list).
- Core data fabric services that help providers deploy capabilities such as a data catalog, security monitoring, provenance tracking, and container orchestration for seamless deployment and management.
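The “core kernel plus modular plug-ins” design mentioned above can be illustrated with a short sketch. Everything here, including the class and function names and the URI-scheme dispatch, is hypothetical and stands in only for the general pattern, not the NSDF’s actual implementation:

```python
# A minimal sketch of a core kernel with pluggable storage connectors.
# All names here are illustrative, not part of any NSDF API.
from typing import Callable, Dict

class FabricKernel:
    """Core kernel that routes data requests to registered connectors."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Callable[[str], bytes]] = {}

    def register(self, scheme: str, connector: Callable[[str], bytes]) -> None:
        # Plug-ins register themselves under a URI scheme (e.g., "s3").
        self._connectors[scheme] = connector

    def fetch(self, uri: str) -> bytes:
        # Dispatch on the URI scheme to the matching storage plug-in.
        scheme, _, path = uri.partition("://")
        if scheme not in self._connectors:
            raise ValueError(f"No connector registered for scheme '{scheme}'")
        return self._connectors[scheme](path)

# Example plug-in: a trivial local-file connector.
def file_connector(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

kernel = FabricKernel()
kernel.register("file", file_connector)
# data = kernel.fetch("file:///tmp/example.bin")
```

New backends (object stores, caches, edge streams) would then be added by registering additional connectors rather than changing the kernel.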
With the National Science Data Fabric (NSDF) deployed on this infrastructure, participating users can leverage local storage and services, as described on the NSDF website. Data is shared among the various institutions and universities over Internet2 at high speed, via a backbone running at 100 gigabits per second, with select sites upgraded to support transmission rates of up to one terabit per second.
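To put those link speeds in perspective, here is a quick back-of-envelope calculation (a sketch in Python; the 1 TB dataset size is an arbitrary example, and protocol overhead and congestion are ignored):

```python
# Idealized transfer times for a 1 TB dataset at the link speeds above.
dataset_bits = 1e12 * 8  # 1 TB expressed in bits

for label, gbps in [("100 Gbps backbone", 100), ("1 Tbps site", 1000)]:
    seconds = dataset_bits / (gbps * 1e9)
    print(f"{label}: ~{seconds:.0f} s")  # ~80 s and ~8 s respectively
```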
DoubleCloud, part of the National Science Data Democratization Consortium (NSDDC), is hosting an NSDF Catalog, allowing users to discover and access vast archives of scientifically curated data. Approximately 65 research organizations have contributed to the catalog, which features listings from prominent institutions such as AWS OpenData, Arizona State University, the University of Virginia, and the University of the West Indies, among others.
“Our service offers a fine-grained index of scientific data at the file or object level, enabling the optimization of data movement and enhancing the user experience by providing granular data distribution mechanisms from the user’s perspective.”
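As an illustration of how a client might consume such a file-level index, here is a hedged sketch of a REST query. The endpoint URL, query parameters, and response fields are assumptions for the example, not the published NSDF Catalog API:

```python
# A hypothetical query against a file-level catalog index over REST.
import requests

CATALOG_URL = "https://catalog.example.org/api/v1/search"  # placeholder

resp = requests.get(
    CATALOG_URL,
    params={"q": "synchrotron", "limit": 10},  # free-text query, page size
    timeout=30,
)
resp.raise_for_status()

for entry in resp.json().get("results", []):
    # Each entry is assumed to describe a single file or object.
    print(entry.get("name"), entry.get("size"), entry.get("location"))
```

Indexing at the file or object level, rather than at the dataset level, is what lets a client like this locate and move only the pieces of a collection it actually needs.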
Since launch, the NSDF has expanded to numerous sites and systems, including Jetstream at the University of Arizona, Indiana University, and the Texas Advanced Computing Center (TACC) at the University of Texas at Austin; Stampede2 at TACC at the University of Texas at Austin; IBM Cloud sites in Dallas, Texas, and Ashburn, Virginia; Chameleon at the University of Chicago and TACC; CloudLab at the University of Utah, University of Wisconsin-Madison, and Clemson University; the Center for High-Performance Computing at the University of Utah; CloudBank in various AWS regions; the Open Science Grid (OSG); the Open Storage Network at multiple institutions; and CyVerse.
The NSDF pilot currently supports various research projects, including collaborations with the IceCube neutrino observatory, which observes deep space from Antarctica; the XENON1T dark matter detector at the Gran Sasso underground laboratory in Italy; and the Cornell High Energy Synchrotron Source (CHESS) at Cornell University, among others.
You’ll find additional information on the NSDF at /about/nsdf.