Progress Reports – Neurodata Without Borders

Below are progress reports from NWB core developments. It is beyond the scope of this page to cover all developments in the broader NWB community.

NWB:N: A Data Standard and Software Ecosystem for Neurophysiology (R24MH116922)

Project Summary

Brain function is produced by the coordinated activity of multiple neuronal types that are widely distributed across many brain regions. Neuronal signals are acquired using extra- and intracellular recordings, and increasingly optical imaging, during sensory, motor, and cognitive tasks. Neurophysiology research generates large, complex, heterogeneous datasets at terabyte scale. Data size and complexity are expected to continue to grow with the increasing sophistication of experimental apparatus. The lack of standards for neurophysiology data and related metadata is the single greatest impediment to fully extracting return-on-investment from neurophysiology experiments, impeding interchange and reuse of data and reproduction of derived conclusions. This gap motivates the launch of Neurodata Without Borders (NWB:N). The goal of NWB:N is to develop a standardized format and methods for neurophysiology data and metadata. Following a successful pilot, initial efforts have begun on the second phase. Using modern software engineering principles, a beta of NWB:N 2.0 has been developed with a new modular software architecture and APIs that enable users to efficiently interact with the NWB:N data format, format files, and specifications. However, despite these innovations, substantial software development remains to fully deliver on the promise of NWB:N. Based on the foundations of NWB:N, the goal of this project is to develop a next generation data format and software ecosystem to enable standardization, sharing, and reuse of neurophysiology data and analyses, enhancing discovery and reproducibility. 
To achieve this goal we will: 1) develop and maintain an accessible and sustainable open source software ecosystem for NWB:N, 2) design methods for integration of controlled vocabularies, provenance and modeling of data relationships to make data findable, interpretable, and (re)usable, and 3) develop tools for facilitating community adoption, extension, and curation of NWB:N for integration of new use cases.

09/18/2018 – 05/31/2019

Accomplishments

In order to accelerate release of NWB 2.0 to the community and enhance community adoption and extension, we have prioritized work in the current reporting period towards Aim 1 and Aim 3 and we have also made significant efforts towards community engagement and adoption.

Aim 1: To develop and maintain an accessible and sustainable open source ecosystem for NWB:N.

In the reporting period we had several main milestones for this Aim:

Hiring of staff and gathering of data analysis requirements: We have conducted an extensive search of candidates and have hired Ryan Ly as new staff at LBNL who will work full time on the project. To gather requirements, we have worked in the reporting period with neuroscience labs, in particular the FrankLab (UCSF), BouchardLab (LBNL), and the Allen Institute for Brain Science, and with many other laboratories (e.g., ChangLab (UCSF) and BuzsakiLab (NYU), among others), as well as with tool builders, including DataJoint, Neo, BrainStorm, and CaImAn, among others.

Finalize NWB 2.0 format changes and release of NWB:N 2.0: In January 2019 we completed the full final release of NWB:N 2.0 including the release of: 1) the final NWB:N 2.0 data standard specification, 2) version 1.0 of the PyNWB Python API, including many advances in the API to enhance usability, 3) the MatNWB Matlab API (MatNWB development is supported by Kavli/Simons/HHMI), 4) the specification language specification, 5) the HDF5 data storage specification, 6) the nwb-docutils with tools for documentation of the NWB:N data standard and extensions, and 7) extensive online documentation and tutorials. This has been a major milestone for this project and provides our users with a reliable standard and software tools; a critical step towards promoting adoption of NWB:N. As part of NWB:N 2.0 we have implemented a wide array of advances in the NWB:N data standard, e.g., support for tables, ragged arrays, explicit data referencing, and compound data types among others. These advances enabled us to significantly improve the organization and management of electrodes, spiking units, ROIs, sweeps, time intervals, spectral decomposition and many other critical data types. We published a pre-print detailing the advances in the NWB:N data standard in conjunction with the 2.0 release (see https://doi.org/10.1101/523035).
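To make the ragged-array support mentioned above more concrete, the following is a minimal Python sketch. It assumes one common encoding, in which variable-length rows are stored as a single flat array of values plus an index of cumulative end offsets. The function names and the example spike times are hypothetical and are not part of the PyNWB API.

```python
from typing import List, Sequence, Tuple


def flatten_ragged(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Encode variable-length rows as flat values plus cumulative end offsets."""
    values: List[float] = []
    index: List[int] = []
    for row in rows:
        values.extend(row)
        index.append(len(values))  # end offset of this row within `values`
    return values, index


def get_row(values: Sequence[float], index: Sequence[int], i: int) -> List[float]:
    """Recover row i from the flat encoding."""
    start = index[i - 1] if i > 0 else 0
    return list(values[start:index[i]])


# Hypothetical spike times (seconds) for three units with different spike counts.
spike_times = [[0.1, 0.5, 0.9], [], [0.2, 0.3]]
values, index = flatten_ragged(spike_times)
```

With this encoding, a row of any length (including an empty one) can be recovered with two index lookups, which is why it suits per-unit data such as spike times.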

Develop and set up open source infrastructure for deployment and testing of NWB:N APIs: For PyNWB and HDMF we use a test-driven development practice. We implement both unit and integration tests. Code contributions are reviewed and approved by the core development team as part of GitHub’s pull request (PR) process before they are merged with the main code base. As part of this process, all unit and integration tests and tutorials are executed on all major platforms (Windows, Mac, Linux) and the main Python versions (e.g., 2.7.x, 3.5, and 3.6). We further perform automated testing of coding style guidelines via flake8 and code coverage tests. To ease use and installation, we release PyNWB and HDMF via the PyPI/PIP and conda Python package managers.

Support modular storage and streaming I/O via HDF5: NWB:N 2.0 and PyNWB support self-contained (i.e., all data in one file) as well as modular storage (i.e., data distributed across several files). Using the concept of external links, users can distribute data across files while being able to access the data transparently across the files as if stored in a single NWB:N file [2]. PyNWB further supports iterative data write via the concept of data-chunk-iterators, with broad applications to data streaming, optimization for iterative import of large-scale data, and optimization of sparse data I/O [3]. PyNWB further supports advanced data I/O features, e.g., chunking and compression, for optimization of data layout to reduce storage and I/O cost [4].
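The idea behind iterative write can be sketched in plain Python. This is a conceptual illustration only, not PyNWB's data-chunk-iterator API: the function names, the chunk size, and the in-memory list standing in for an on-disk dataset are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(samples: Iterable[float], chunk_size: int) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_offset, chunk) pairs without materializing the whole stream."""
    buffer: List[float] = []
    start = 0
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == chunk_size:
            yield start, buffer
            start += len(buffer)
            buffer = []
    if buffer:  # final, possibly shorter, chunk
        yield start, buffer


def write_iteratively(chunks: Iterable[Tuple[int, List[float]]]) -> List[float]:
    """Stand-in for a file writer: place each chunk at its offset as it arrives."""
    target: List[float] = []
    for start, chunk in chunks:
        # Grow the target as needed, mimicking a resizable on-disk dataset.
        if len(target) < start + len(chunk):
            target.extend([0.0] * (start + len(chunk) - len(target)))
        target[start:start + len(chunk)] = chunk
    return target


# Hypothetical acquisition stream: a generator, so samples exist only on demand.
stream = (float(i) for i in range(10))
written = write_iteratively(iter_chunks(stream, chunk_size=4))
```

Because each chunk carries its own offset, the writer never needs the full array in memory, and chunks that contain no data can simply be skipped, which is the basis of the sparse-data use case.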

First annual hackathon: On May 13-14, 2019 we organized the NWB:N User Days aimed at user training, adoption, and use cases to facilitate adoption of NWB:N. Following the User Days, on May 15-16, 2019 we hosted the NWB:N Developer Hackathon aimed at bringing together the NWB:N developer community to further development of NWB:N as well as integration of NWB:N with other tools [1].

Other main accomplishments: Many of the tools and methods for hierarchical data modeling and design of sustainable and usable data interfaces that we developed for NWB:N are application-agnostic and useful beyond the scope of neurophysiology and more broadly neuroscience. To facilitate reuse of our methods for data standardization we have refactored PyNWB and released the Hierarchical Data Modeling Framework (HDMF) as a separate package. With HDMF we provide the community a general software framework for modeling, creating, and interacting with new hierarchical data standards. To ensure software quality we established the same core software practices we use for PyNWB also for HDMF, including continuous testing and deployment via PIP and Conda. As part of the development of NWB:N 2.0 we also made initial progress towards our goal for integration of PyNWB with common data analysis libraries (e.g., numpy and pandas).

Aim 2: To design methods for integration of controlled vocabularies, provenance, modeling of data relationships and external data management systems.

This aim is divided into two major sub-tasks.

Task 2.1 in this grant is to extend NWB:N metadata standardization focusing on three key areas: (a) location of cells and stimuli in standardized spatial coordinate systems and anatomical ontologies, (b) a genotype data model, and (c) a schema for behavior, including sensory stimuli. We have established engagement in several of the use cases outlined as part of the conversion of laboratory data to the NWB:N standard by the team. We have an active collaboration with the BRAIN Initiative Cell Census Network (BICCN) Brain Cell Data Center group to co-develop and consolidate data models for anatomy, spatial coordinates and genotype metadata standards. We have integrated high density Neuropixels data and visual behavior experiments from the Allen Institute into the NWB:N 2.0 format. The Neuropixels data will be publicly released in Fall 2019. We are currently investigating whether Bpod’s description language for describing behavior is fit for purpose for this grant. We anticipate Task 2.1 to be a high priority in the next reporting period.

Task 2.2 is to develop methods for integration of provenance, data relationships and data management systems with NWB:N. To enable cross-referencing of diverse data, the previous NWB:N 1.0.x standard supported the creation of links. However, links are mainly useful in cases where we need to reference single objects, but their utility is limited in cases where we need to store large collections of references. To address this challenge, NWB:N 2.0 adds support for object and region-references, enabling efficient storage of large collections of references to groups, datasets, and subsets of datasets stored as values of datasets or attributes. The ability to store object references has been central to enable many of the advancements in the NWB:N 2.0 data standard schema and significantly improves documentation of relationships between different data products as well as data and their associated metadata in NWB:N 2.0. We are also currently working on adding support for specification of dimension scales, i.e., the ability to associate and describe the relationship between datasets and select dimensions of other datasets. This will allow us to make currently implicit relationships explicit (e.g., the relationship between timestamps and the corresponding time dimensions of recorded data). Support for dimension scales will allow us to enhance the specificity of the format and enable programmatic introspection of data dimensions. We anticipate completing a first draft implementation of this feature in this reporting period while evaluation and deployment will likely occur in the next performance period. To evaluate the requirements for integration with data management systems we have engaged with Vathes (DataJoint) to define possible strategies to enhance integration and interoperability between NWB:N and data management systems. We have also engaged with NIX, Zarr, ExDIR and HDF5 to foster integration with NWB:N and evaluation of alternate storage backends for NWB:N. 
In collaboration with Dr. Suren Byna (LBNL) and Dr. Spyros Blanas (OSU), we have initiated a project to evaluate and compare Zarr, HDF5, and other storage backends for NWB:N. This has resulted in the creation of a new Zarr-based storage backend for HDMF/PyNWB.

Aim 3: To develop tools for facilitating community adoption, extension, and curation of NWB:N for new use cases.

The goal of this aim is to enable the neurophysiology community to adopt and curate the NWB:N data format, and facilitate the integration of new use-cases.

In this reporting period we have focused on creating the fundamental tools and methods for creating, using and sharing extensions. As part of PyNWB (and HDMF) we provide dedicated APIs for creating specifications for new format extensions using the NWB:N specification language. Based on the specification of extensions, the PyNWB and MatNWB data APIs directly support read/write of data using extensions without the need for custom user code. We currently support sharing of extension specifications via YAML and NWB:N files. We also released the nwb-docutils, which provide critical tools to support automatic generation of Sphinx-based documentation for format extensions. Using Sphinx supports direct integration with the ReadTheDocs service for continuous documentation and creation of documentation in common formats, e.g., PDF, HTML, ePub, among many others.

In addition, another main focus has been on evaluation of extension needs and creation of rules, guidelines, and design specifications required for establishing the NWB:N Hub, i.e., a web-accessible catalog of NWB:N extensions with support for testing, deployment and review of extensions. We have created draft guidelines for sharing extensions and rules for extension versioning. We have further evaluated strategies for sharing extensions, establishing the core design for the NWB:N Hub. We anticipate completing implementation and initial release of the core templates by the end of this reporting period. We engaged with science collaborators at the ChangLab and FrankLab, among others, to develop relevant extensions to evaluate and refine our extension sharing strategies, methods, and tools. Finally, we have developed a draft for strategies for review of extensions and integration with the NWB:N core. We will work with the NWB:N Executive Board and Technical Advisory Board to approve and release these standard rules.

Community Engagement and Adoption.

To facilitate community engagement and adoption, our activities are broad and include both explicit interactions (multiple use-cases, guidance by the NWB:N-EB, video conferences, workshops, and hackathons), as well as intrinsic mechanisms via online community resources (e.g., GitHub, Slack, NWB:N-Hub, tutorials, and documentation). In this reporting period we have engaged in a large number of activities to foster engagement with the community and adoption of NWB:N; specifically:

Workshops, conferences, and hackathons: As mentioned above (see Aim 1), we have organized the NWB:N User Days (May 13-14, 2019) and NWB:N Developer Hackathon (May 15-16, 2019) aimed at bringing the experimental neurophysiology community together to further adoption and the development of NWB:N [1]. In addition, on March 4, 2019 we organized a tutorial on the NWB:N data standard and on using PyNWB and MatNWB at the Cosyne 2019 Workshops [5]. Our team further participated in and presented NWB:N at the Neuro Reproducibility Hackashop (organized by Joshua Vogelstein) in March 2019. As indicated in Section C1, we also presented a poster on NWB:N 2.0 at the Neuroscience 2018 (SfN) conference in Nov. 2018 and published a pre-print describing the NWB:N 2.0 data standard in Jan. 2019. Finally, the PI presented a poster on NWB:N 2.0 and gave a talk about this project at the BRAIN PI meeting in April 2019.

Other significant contributors and other support: We have secured additional support from the Kavli Foundation to support Benjamin Dichter (0.5 FTE) as a consultant to the NWB:N effort to work with neuroscience laboratories on adoption of NWB:N and community outreach and training activities. The Kavli Foundation is further providing travel support for participants at the NWB:N User Days and Developer Hackathon and has also provided additional support for the NWB:N tutorial at Cosyne 2019. HHMI Janelia is providing additional support for the NWB:N User Days and Developer Hackathon with regard to housing for participants as well as space and organizational support for the events. The Kavli Foundation and the Simons Foundation are further providing additional support to Vidrio for development of MatNWB. The Kavli Foundation and MathWorks are further providing additional support for users via $10K seed grants to individual neuroscience laboratories to promote adoption. These activities are synergistic to this grant while focusing on additional activities not supported by this grant.

Outreach and engagement with tool developers: We have engaged with developers of a broad range of analysis tools, including CaImAn, Neo, BrainStorm, MountainSort, and Neuropixels, among others. This has resulted in preliminary efforts towards integration of NWB:N with the analysis tools. We have also engaged with NIX, Zarr, ExDIR and HDF5 to foster integration with NWB:N and evaluation of alternate storage backends for NWB:N. In collaboration with Dr. Suren Byna (LBNL) and Dr. Spyros Blanas (OSU), we have initiated a project to evaluate and compare Zarr, HDF5, and other storage backends for NWB:N. Many of the efforts listed here will be the focus of projects at the NWB:N Developer Hackathon.

Outreach and engagement with data standards committees: We have engaged with the INCF SBP subcommittee and submitted an application for evaluation of NWB:N as an INCF endorsed data standard (application in review). We have further engaged with the IEEE P2794 “Standard for Reporting of In Vivo Neural Interface Research.” We have participated in several phone calls with the IEEE P2794 team.

Outreach and engagement with industry partners: We are actively coordinating with Vidrio on development of MatNWB. We have also engaged with Vathes. Vathes is exploring integration of NWB:N with DataJoint and is also working with the Svoboda Lab and the Churchland Lab on integration of data with NWB:N. We have also engaged with Kitware on continuous integration for PyNWB. Kitware is also working on the development of visualization tools for NWB:N. The industry partners are not supported by this grant and their activities are independent of the research in this grant. However, the activities by industry partners are synergistic to this grant as they further adoption of NWB:N by end users and tool developers.

Outreach and engagement with neuroscience laboratories and projects: Many groups are already actively exploring adoption of NWB:N. This includes individual neuroscience labs, e.g., the FrankLab (UCSF), ChangLab (UCSF), Allen Institute for Brain Science, BouchardLab (LBNL), SvobodaLab (HHMI), MeisterLab (HHMI), and ChurchlandLab (CSHL), among many others. Also, a cross-U19 interest group is exploring NWB:N and we have also engaged with U19 projects directly (e.g., Soltesz, sharp wave ripple U19).

Community online resources: To facilitate engagement and adoption, we offer a broad range of web-based community resources, including online documentation and tutorials as well as developer and user channels via Slack, GoogleGroups, and Twitter. All websites and software repositories are publicly available via GitHub, enabling users to freely access all code and contribute via pull requests and our issue trackers. We distribute software documentation online via the ReadTheDocs service. We also host websites for NWB:N training events (e.g., hackathons and workshops) via GitHub. All online resources are conveniently accessible via our central https://neurodatawithoutborders.github.io/ website. Over the course of the project we have made significant improvements and updates across all our online resources. For further details see Section C2 and C3 of the report.

References:

[1] NWB:N User Days and Developer Hackathon website: https://neurodatawithoutborders.github.io/nwb_hackathons/HCK06_2019_Janelia/

[2] PyNWB tutorial on modular data storage: https://pynwb.readthedocs.io/en/stable/tutorials/general/linking_data.html#sphx-glr-tutorials-general-linking-data-py

[3] PyNWB iterative data write tutorial: https://pynwb.readthedocs.io/en/stable/tutorials/general/iterative_write.html#sphx-glr-tutorials-general-iterative-write-py

[4] PyNWB advanced data I/O tutorial: https://pynwb.readthedocs.io/en/stable/tutorials/general/advanced_hdf5_io.html#sphx-glr-tutorials-general-advanced-hdf5-io-py

[5] Cosyne 2019, NWB:N tutorial website: https://neurodatawithoutborders.github.io/nwb_hackathons/Cosyne_2019/

[6] Rübel O, Tritt A, Cain NH, Dichter B, Fillion-Robin J, Ozturk D, Frank LM, Chang EF, Sommer FT, Svoboda K, Grauer M, Schroeder W, Ng L, Bouchard K. NWB:N: Advances towards an ecosystem for standardizing neurophysiology. Neuroscience 2018; 2018 November 06; San Diego, CA, USA. Available from: https://abstractsonline.com/pp8/#!/4649/presentation/22546.

[7] Rübel O, Tritt A, Dichter B, Braun T, Cain N, Clack N, Davidson TJ, Dougherty M, Fillion-Robin J, Graddis N, Grauer M, Kiggins JT, Niu L, Ozturk D, Schroeder W, Soltesz I, Sommer FT, Svoboda K, Ng L, Frank LM, Bouchard K. NWB:N 2.0: An Accessible Data Standard for Neurophysiology. https://www.biorxiv.org/ [Preprint]. 2019 January 17. Available from: https://www.biorxiv.org/content/10.1101/523035v1. DOI: https://doi.org/10.1101/523035.

6/1/2019 – 11/30/2019

Accomplishments

Summary:

With the release of NWB 2.0 in January 2019, a main focus of our efforts in this reporting period has been to develop tools for facilitating community adoption, extension, and curation of NWB:N for new use cases (Aim 3). In particular, we have developed and released the Neurodata Extensions Catalog [E3.2, E4.1] and associated templates [E4.2], tools [E4.3, E4.9, E4.4], and guidelines [E3.3, E3.4, E3.5, E3.6] as a novel resource that enables the NWB community to easily develop, share and maintain extensions to the NWB data standard (Aim 3). To ensure accessibility, reliability, stability, support, and functionality of NWB software, another main focus of our efforts has been on maintaining an accessible and sustainable open source software ecosystem for NWB (Aim 1). In addition to major enhancements to our software processes and new API features, particular highlights have been the NWB User Days and the NWB Developer Days in May 2019, which were attended by 43 users and developers from 29 major labs and research institutions (see [E1.3] for a detailed report). Finally, community outreach and adoption has been a main focus of our efforts, including: (1) NWB received a prestigious R&D 100 Award from R&D World magazine in November 2019, (2) NWB is currently in review to become an INCF Supported Best Practice [E3.1], (3) we presented NWB at leading conferences, including at INCF, SfN, and IEEE Big Data, (4) we have engaged with tool developers to support integration of NWB with important neuroscience tools, e.g., CaImAn, SpikeInterface, BrainStorm, DataJoint, OpenSourceBrain, and DANDI among others, and (5) we have engaged with users to support integration of neuroscience data with NWB and public release of data. The Allen Institute for Brain Science has released its first, and the world’s largest, dataset of electrical brain activity in NWB:N format, gathered using the new Neuropixels high-resolution silicon probe [E3.10]. 
Going forward, in addition to continued efforts on maintenance and development of NWB software (Aim 1) and community outreach and adoption, a major focus of our efforts will be on Aim 2 to design methods for integration of controlled vocabularies, provenance, modeling of data relationships and external data management systems with NWB.

Aim 1: To develop and maintain an accessible and sustainable open source ecosystem for NWB:N.

In the reporting period, we had several main accomplishments for this Aim:

Hackathons for user and developer training: On May 13-14, we held the NWB User Days aimed at training users and helping them convert their data to NWB:N, followed on May 15-16 by the NWB Developer Days aimed at engaging with tool developers, integrating NWB with community tools, and advancing development of NWB:N. 43 users and developers from 29 major labs and research institutions attended the event. Substantial progress was made towards converting data from 14+ different labs to NWB:N and integrating NWB:N with important data analysis tools (see [E1.3] for a detailed report).

Integration with community tools: NWB has been integrated with a number of data analysis and visualization tools, such as: 1) CaImAn for calcium imaging processing and analysis; 2) SpikeInterface, which facilitates access to more than six modern spike sorters; 3) Brainstorm, a powerful toolbox for MEG, EEG, ECoG, and invasive animal electrophysiology; 4) NWBExplorer, a web-based tool for visualizing intracellular electrophysiology data; and 5) NWB Widgets, a set of dedicated Jupyter widgets for NWB, among others (see [E3.7]). Integrations with other major tools, e.g., RAVE, DataJoint, JRClust, Neo and others are also ongoing.

Improved software development processes for enhanced reliability and stability: We have made major improvements to our software development processes and software products. First, we have released general data structures used in NWB that are not specific to neurosciences as a new “HDMF-common-schema” package [E4.6] and moved functionality related to the HDMF common schema from PyNWB to the Hierarchical Data Modeling Framework (HDMF). This approach supports reuse of our products across research communities, helps insulate neuroscience users from technical details, and facilitates software support and maintenance. Second, to facilitate the management of our NWB software products and their interdependencies (e.g., PyNWB, HDMF, NWB Schema, HDMF Schema, etc.), we have enhanced our software management practices to use git submodules to automatically link, manage, and bundle our software packages. Third, we have streamlined our continuous integration process by using Azure Pipelines. Fourth, we have enhanced testing of HDMF to automatically also test against PyNWB to ensure continued interoperability between our software. Fifth, we have continued our efforts to increase unit and coverage testing of PyNWB and HDMF.

New software features and bug fixes to enhance accessibility, stability and functionality: We have added many new features and fixed bugs in PyNWB, HDMF, and the NWB schema. Some select new features include: support for links to links; improved support for scalar data types; support for iterating over large datasets along arbitrary axes; improved support for data chunk iteration; support for unique IDs in tables; enhanced support for search and selection of rows in tables; improved documentation of all data types; and improved support for creating extensions, among many others. NWB version 2.1 was released in Sep. 2019 to add support for these new features and address minor consistency and accessibility issues raised by the community. See also Aim 2, Task 2.2, and Aim 3 for additional features added in support of those aims. For a detailed list of features and bug fixes, we refer to our release notes [E3.11].

Aim 2, Task 2.1: Design and integrate standardized metadata models with associated controlled vocabularies and ontologies with NWB:N.

Accomplishments for this period include:

Hiring of staff: Pamela Baker has joined the team. She has a background in neurophysiology, modeling and software development. Baker has begun work on collecting the metadata requirements for the focus areas identified in the grant (anatomical location & structure, genetic tools, stimulus, and behavior) from the external use case labs as well as identifying key issues in the NWB task and stimulus representations with AIBS users.

Meeting with use case labs and gathering of metadata requirements: Baker has worked with Allen Institute scientists Marina Garrett and Douglas Ollerenshaw to discuss details about the visual change detection task (UC1). Baker also reached out to the external use case labs and set up meetings with the relevant personnel from the collaborating labs at the SfN meeting in October. We met with Nuo Li to discuss the movement planning data task (UC2) being used by the Mesoscale Activity Project (MAP) led by the Svoboda lab; Loren Frank offered to share with us 2 types of maze tasks (UC3) used in his lab; and we met with Brian Gereke, Vyassa Baratham and Max Dougherty to go over a 3D grasping task from the Bouchard lab (UC4). We discussed the plans for the new NWB ontology and were able to get feedback from the use case labs on their critical needs and challenges with respect to task and stimulus representation. Details collected on the use cases are published and updated in [E4.5]. In addition to the use cases described in the grant, Baker also met with Ken Harris (UCL) and Niccolo Bonacchi (Champalimaud) from the International Brain Laboratory (IBL) consortium to discuss the new task and stimulus ontology development. The IBL members agreed to share their experimental metadata for task, stimulus, genetic tools and anatomical registration with the NWB team as an additional external use case (UC5). IBL uses the same control software (Bpod) as the Kepecs and MAP labs, so this new use case should integrate easily with the original use cases determined in the grant, and the high profile of the IBL project makes their feedback valuable for encouraging wider community adoption.

Development of behavioral task ontology: We are teaming up with Adam Kepecs to leverage his recent R01 grant in the development of a language for representing tasks and behavior. As part of this collaboration, we also agreed to work on shared use cases, including the IBL consortium’s experimental task. During the SfN meeting, we met with Kepecs, postdoctoral researcher Marion Bosc, and lab engineer Michael Wulf to discuss their plans and how to coordinate our efforts. Baker had a follow-up video conference with Dr. Bosc in early December to further discuss the details of the task representation and has begun working on translating the use case behavioral tasks into this format. Baker is also planning to visit the Kepecs lab in Q1 2020 to offer feedback on the process of converting use case tasks into their format.

Development of stimulus ontology: Baker has had meetings with Allen Institute scientists to discuss the work on developing a new stimulus ontology that was started at the NWB Hackathon in April 2019. Baker has begun planning with developers, internal Allen Institute scientists (Jim Berg, Luke Campagnola), and external Blue Brain Project (BBP) scientists to work on stimulus representation for intracellular electrophysiology (ICEphys) experiments (UC6) at the February Hackathon. We begin with the ICEphys stimuli because, while they are relatively simple and stereotyped, they share the protocol complexity (conditions, runs/sequences, trials, repetitions, delays) of the more elaborate sensory and behavioral experiments. Also, there are two large releases of ICEphys data planned in 2020, both from the Allen Institute and the BBP, so it is timely to consider ICEphys representation now.

Aim 2, Task 2.2: Develop methods for integration of provenance, data relationships and data management systems with NWB:N.

Accomplishments for this period include:

Evaluate new features for integration with data management systems: To evaluate the utility of NWB with data management systems, we have worked directly with DataJoint (a leading data pipeline management system used in the neurophysiology community) and the Frank Lab (UCSF) to develop a strategy for NWB-DataJoint integration so that labs can use both tools in tandem, where DataJoint helps labs manage tabular metadata, and NWB stores large data files and is used for sharing the full datasets. This integration is currently being tested by the Frank Lab at UCSF. As a result of this collaboration, we have created support for unique object IDs and scratch space to facilitate integration of NWB with data management and analysis.

Support unique object identifiers for integration with data management systems: Data management systems rely on the ability to uniquely reference data. To support unique identification and referencing of data stored in NWB:N, we have enhanced the NWB:N data standard and APIs to support unique object IDs using UUIDs. Every neurodata-type object (e.g., TimeSeries recording) is automatically assigned a unique ID on write, allowing for unique identification and referencing of data stored in NWB.
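As a rough illustration of why UUID-based object IDs help with data management, here is a minimal Python sketch using only the standard library. The DataObject class and the catalog dictionary are hypothetical stand-ins, not the NWB:N APIs, and the ID is assigned when the object is created rather than on write, purely to keep the example short.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical stand-in for a neurodata-type object such as a TimeSeries."""
    name: str
    # A random (version 4) UUID is generated once per object.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Two objects may share a name yet remain distinguishable by ID.
a = DataObject(name="lfp")
b = DataObject(name="lfp")

# An external catalog (e.g., a database table) can key on the ID.
catalog = {obj.object_id: obj for obj in (a, b)}
```

Keying on the ID rather than on a name or file path means a reference stays valid even if the object is renamed or moved within the file.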

Support scratch space for data analysis: For exploratory and lab-specific analysis, users often need to cache unstructured intermediate results which are not intended as end-products for sharing. However, these products are often critical as intermediates to accelerate and facilitate analysis, while working towards the final data products. To ensure that users can effectively utilize NWB for data management and exploratory analysis, we have added a new scratch space, which allows users to store unstructured data during data analysis.

Cache schema in NWB:N files to ensure long-term accessibility: To facilitate integration of NWB:N with data archives (e.g., DANDI) and ensure that NWB:N files remain readable as newer versions of the schema are released, we have enhanced the NWB:N data standard and APIs to cache the schema used to generate a file in the file itself.
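As a rough illustration of why caching the schema helps, the sketch below stores a copy of the schema next to the data and validates on read against that cached copy rather than whatever schema the reader has installed. It assumes a heavily simplified JSON file and a toy schema; the names `SCHEMA_V1`, `write_file`, `read_file`, and the `cached_schema` key are hypothetical and do not reflect the actual NWB:N file layout.

```python
import json
import os
import tempfile

# Hypothetical, heavily simplified schema: maps a type name to its required fields.
SCHEMA_V1 = {"version": "1.0", "types": {"TimeSeries": ["data", "unit"]}}


def write_file(path, objects, schema):
    """Store the data together with a copy of the schema used to produce it."""
    with open(path, "w") as f:
        json.dump({"cached_schema": schema, "objects": objects}, f)


def read_file(path):
    """Validate objects against the schema cached in the file, not the reader's own."""
    with open(path) as f:
        content = json.load(f)
    types = content["cached_schema"]["types"]
    for obj in content["objects"]:
        missing = [k for k in types[obj["type"]] if k not in obj]
        if missing:
            raise ValueError(f"{obj['type']} is missing {missing}")
    return content


path = os.path.join(tempfile.mkdtemp(), "session.json")
write_file(path, [{"type": "TimeSeries", "data": [0.1, 0.2], "unit": "V"}], SCHEMA_V1)
loaded = read_file(path)
```

In this sketch, a reader that ships a newer schema version can still interpret the file, because everything needed to interpret it travels with the file.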

NWB+DANDI integration: We are also working closely with the team from the BRAIN Initiative-funded Distributed Archives for Neurophysiology Data Integration (DANDI) project to provide search, visualization, and computation features that are NWB-aware. In addition to the schema cache, a main focus of our engagements with the DANDI team has been on enhancements to NWB for data validation and provenance.

Zarr backend for NWB: Data storage backends are often designed to facilitate specific use cases (e.g., long-term storage vs. active use). NWB recognized this early on, and HDMF has been architected carefully to facilitate the integration and use of new data storage backends. To evaluate and demonstrate the use of alternate storage backends, we have developed a prototype integration of Zarr (a Python array library) to enable storage of NWB files via collections of folders (Groups), JSON files (metadata), and flat binary files (datasets) [E4.7]. To evaluate the performance of Zarr compared to HDF5, we have engaged with the ExaHDF5 team, which resulted in a poster at SuperComputing 2019 [E2.1] and a paper submitted to the IPDPS 2020 conference [E1.1].
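The folder/JSON/binary layout described above can be illustrated with the Python standard library alone. The sketch below is not the Zarr on-disk format and does not use the Zarr library; the file names (`attrs.json`, `<name>.bin`, `<name>.json`) and helper functions are assumptions made for this example.

```python
import json
import tempfile
from array import array
from pathlib import Path


def write_group(root, name, attrs):
    """A group is a folder; its metadata lives in a JSON file inside it."""
    group = Path(root) / name
    group.mkdir(parents=True, exist_ok=True)
    (group / "attrs.json").write_text(json.dumps(attrs))
    return group


def write_dataset(group, name, values):
    """A dataset is a flat binary file plus JSON metadata describing how to decode it."""
    data = array("d", values)  # 'd' = double-precision floats
    (group / f"{name}.bin").write_bytes(data.tobytes())
    (group / f"{name}.json").write_text(json.dumps({"typecode": "d", "length": len(data)}))


def read_dataset(group, name):
    """Decode the binary file using the type recorded in its JSON metadata."""
    meta = json.loads((group / f"{name}.json").read_text())
    data = array(meta["typecode"])
    data.frombytes((group / f"{name}.bin").read_bytes())
    return list(data)


root = tempfile.mkdtemp()
acquisition = write_group(root, "acquisition", {"description": "raw recordings"})
write_dataset(acquisition, "voltage", [0.5, 1.5, 2.5])
values = read_dataset(acquisition, "voltage")
```

One consequence of a layout like this is that each dataset and each piece of metadata is a separate file, which differs from HDF5, where everything is stored in a single file.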

Aim 3: To develop tools for facilitating community adoption, extension, and curation of NWB:N for new use cases.

The NWB:N standard supports the storage of most data types in neurophysiology. However, as new experiments, acquisition, and analysis methods emerge, users need to store non-standard and lab-specific data along with standard NWB:N data. In order to support emerging data types and use cases, we designed NWB:N to be modular and allow user-defined extensions to the standard. In this reporting period, we developed a set of tools and processes to facilitate the creation, sharing, and adoption of NWB extensions in the community, as well as the curation of the NWB:N standard through integration of extensions. Accomplishments for this period include:

Neurodata Extensions Catalog: To help users share their NWB extensions, find and collaborate on others’ extensions, and enable the neurophysiology community to curate and contribute to NWB:N via extension proposals, we developed and released the Neurodata Extensions (NDX) Catalog (formerly called “NWB:N-Hub”) [E3.2, E4.1]. The NDX Catalog stores for each extension a description and metadata record with the name, version, source code location, license, and maintainers. Each catalog entry is managed and maintained in a dedicated GitHub repository as part of the NDX Catalog GitHub organization. New extensions can be submitted to the catalog via a dedicated registry. Upon review and approval, new catalog entries (consisting of a new Git repository and associated sources and CI) are automatically generated by our NDX catalog smithy tool [E4.4]. Updates to existing catalog entries are managed via Git pull requests, supporting automatic testing, review, and approval. Importantly, the extension developers maintain ownership of their extension sources, while the NDX catalog manages metadata important for sharing, release and deployment.
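For illustration, a catalog metadata record with the fields listed above could be modeled as follows. This is a sketch under our own assumptions, not the catalog's actual record format: the class `CatalogRecord`, its field names, the helper `missing_fields`, and all values shown are hypothetical placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class CatalogRecord:
    """Hypothetical metadata record; the fields follow the list in the text."""
    name: str
    version: str
    src: str          # source code location
    license: str
    maintainers: list


def missing_fields(record):
    """Return the names of fields left empty, e.g. as a check before registration."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = CatalogRecord(
    name="ndx-example",                      # placeholder extension name
    version="0.1.0",
    src="https://example.org/ndx-example",   # placeholder URL
    license="BSD-3",
    maintainers=[],                          # intentionally left empty
)
problems = missing_fields(record)
```

A check of this kind could run as part of automated testing on a pull request, so that incomplete records are flagged before review.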

NDX Catalog Web UI: To facilitate search and interaction with the NDX catalog for end users (often with little programming experience), we developed a dedicated web-based interface for the NDX catalog [E3.2]. The web UI allows users to explore extensions registered with the catalog, supports text search of extension metadata to help users quickly find extensions relevant to their use cases, and provides convenient access to relevant documentation. Planned work includes adding advanced search (e.g., by author, version, license, and source code language), tracking of older extension versions, and enhanced automated testing of extensions in the NDX Catalog.

Template and tool for creating NWB extensions: To help users write NWB extensions, we created an extension template and released a convenient software tool that guides users through the process of creating new extensions via a question-and-answer-based process, which makes it easy for novice users to create new extensions while helping to ensure the use of best practices [E4.2]. Users enter basic metadata about their new extension, such as the name, description, and authors of the extension. The templating tool in turn automatically creates the required extension folder structure, source code, and documentation with the user’s metadata filled in. Users then specify their new data types and/or functionality in Python or Matlab, following our examples and documentation for writing extensions. Finally, running a simple command generates the YAML specifications for the extension. This process is tested continuously using Azure Pipelines and has already been used by the community to make several extensions.

Tools for testing and documentation of extensions: After users create an NWB extension, they are encouraged to publish and share their extension with the community via the NDX Catalog. In order to ensure that all extensions in the catalog work as intended, are well documented, and do not duplicate functionality of existing extensions, we developed a quality control review process for extensions before they are registered in the NDX Catalog [9]. To help users pass this review, we created and provide tools for automatically testing and documenting extensions. These tools include “nwb-docutils”, a software package that generates Sphinx-based documentation for the new data types and functionality and allows for the creation of documentation in PDF and HTML formats through the ReadTheDocs service [7]. We also created a tool, “nwb-extensions-smithy”, to streamline the extension registration process: after an NDX Catalog maintainer approves an extension, this tool automatically creates a “record” repository for the extension in the catalog and adds the authors as maintainers of the repository [E4.3, E4.4]. Future work includes enhancing these tools to run user-defined unit and coverage tests of extensions.

Guidelines and strategies for versioning, sharing, and review of extensions: We had previously drafted documents to guide users in the development, versioning, and sharing of NWB extensions, as well as in the process of creating and reviewing proposal extensions for integration with the NWB core schema. Through our development of the NDX Catalog, development of extension-related tools described above, and feedback from users creating and registering extensions in the catalog, we have refined these documents and released them publicly on the Neurodata Extensions Catalog website [E3.2, E3.3, E3.4, E3.5, E3.6].

Developed proposal for extensions for intracellular ephys: To evaluate our methods and enhance NWB, we have engaged with the Allen Institute for Brain Science and Channelpedia (Blue Brain Project) to develop a proposal extension to enhance storage of intracellular electrophysiology metadata [11]. Labs commonly organize intracellular electrophysiology data hierarchically, with metadata associated with each level of the hierarchy. To encode this hierarchy, we created an extension with new data types for tables at each hierarchical level, where the rows reference other tables. We have been working closely with developers and users to add convenience functions for accessibility and ensure the new structure works well with existing data pipelines. We expect to complete development of the proposal extension at the Developer Hackathon in Feb. 2020, after which it will be open for community review for integration with the core schema.
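The pattern of one table per hierarchical level, with rows referencing rows in the table below, can be sketched with plain Python lists. The level names used here (recordings, sequences, conditions) and all values are hypothetical and chosen only for illustration; they are not the data types defined by the proposal extension.

```python
# Lowest level: one row per recording, with per-recording metadata.
recordings = [
    {"stimulus": "step_10pA"},
    {"stimulus": "step_20pA"},
    {"stimulus": "ramp"},
]

# Middle level: each row groups recordings by referencing their row indices.
sequences = [
    {"protocol": "current_steps", "recordings": [0, 1]},
    {"protocol": "ramps", "recordings": [2]},
]

# Top level: each row references rows of the table one level below.
conditions = [
    {"condition": "control", "sequences": [0, 1]},
]


def resolve(condition_row):
    """Follow row references downward and return the stimuli in one condition."""
    stimuli = []
    for seq_idx in condition_row["sequences"]:
        for rec_idx in sequences[seq_idx]["recordings"]:
            stimuli.append(recordings[rec_idx]["stimulus"])
    return stimuli


control_stimuli = resolve(conditions[0])
```

Metadata is attached at the level where it applies (for example, the protocol on a sequence row), and a helper such as `resolve` plays the role of the convenience functions mentioned above.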

Released extensions via the NDX Catalog: To demonstrate and evaluate the utility of the NDX catalog we released several existing extensions for simulation output, ECoG data, and miniscope metadata via the NDX catalog.

Community Outreach and Engagement:

Community engagement and outreach activities are central to fostering adoption and use of the NWB:N data standard and software. To facilitate training of users, our activities are broad and include both explicit interactions (multiple use-cases, guidance by the NWB:N-EB, video conferences, workshops, and hackathons), as well as intrinsic mechanisms via online community resources (e.g., GitHub, Slack, NDX Catalog, tutorials, and documentation). In this reporting period, we have engaged in a large number of activities to foster engagement with the community and promote adoption of NWB:N; specifically:

Hackathons and tutorials: As mentioned above, we held the NWB:N User Days (May 13-14, 2019) and NWB:N Developer Hackathon (May 15-16, 2019) which brought experimental neurophysiologists and developers together to further adoption and the development of NWB:N (see [E1.3] for a detailed report). As described in our community outreach and engagement plan in Section D.1, we have a large number of training and outreach events planned in 2020, including two NWB hackathons and tutorials at Cosyne 2020 and INCF Neuroinformatics 2020 among others. These dedicated NWB training events provide a great opportunity for onboarding of new users, deep dives, and direct engagement with the community for dissemination and to collect critical feedback to ensure NWB:N meets user needs.

Community-driven training events: The Allen Institute for Brain Science also carries out training courses and workshops to disseminate the use of the data they have released in NWB format. Highlights include the “Summer Workshop on the Dynamic Brain: August 25 – September 8, 2019” and the “Exploring the Allen Brain Observatory: An Open Database of Cortical Cell Physiology”, SfN Satellite workshop. Since all the data, tools, and course materials are open access, it is possible for the neuroscience community to run and extend these workshops. An example of this is the Seattle Global Brainhack 2019 event organized by Ariel Rokem at the eScience Institute at the University of Washington to introduce the Neuropixels NWB data release, attended by members of the Allen NWB team. While these events have not been funded by this grant, they are important community events that help train users and facilitate dissemination of NWB. It is through outreach events such as these that we receive good feedback on usability and missing features.

Invited talks and workshops: In September, O. Rübel gave an invited talk on NWB at the OpenSourceBrain workshop in Alghero, Sardinia [E2.6]. The OpenSourceBrain project is adopting NWB for sharing of experimental data in conjunction with neurophysiology simulation models. In November, A. Tritt and B. Dichter gave invited talks on NWB at a neuroscience data management workshop in Trondheim, Norway [E2.8]. Invited presentations provide a great opportunity for outreach to new users and engagement with new projects interested in, or already committed to, adopting NWB:N. At both events, we connected with users and developers and discussed integration of various data sharing platforms with NWB:N, as well as strategies for facilitating data sharing, management, and standardization in the community. For example, two important outcomes of our participation at the OpenSourceBrain workshop have been the engagement with the Channelpedia team from the Blue Brain Project and the ongoing exploration of opportunities for a joint hackathon in London with the OpenSourceBrain team.

Publications in Journals and Conferences: As shown in detail in Section E, we continue to make our research publicly available via a broad set of publication activities. A. Tritt will present a paper on HDMF in December at the renowned IEEE Big Data conference [E1.2], we have contributed to a paper comparing Zarr and HDF5 in the context of neurophysiology that was submitted to IPDPS 2020 [E1.1], and we have published a detailed report from the 2019 NWB User- and Developer Days [E1.3]. We have further presented 9+ posters, talks, and demonstrations on NWB research at SfN, INCF, BRAIN Initiative Alliance Social, DOE Data Day, and other conferences and workshops [E2.1 – E2.9]. These events allow us to reach out to the international community, connect with existing users and engage with new users interested in adopting NWB. We have released two NWB newsletters [E3.13, E3.14], created curated lists for public NWB datasets [E3.8] and analysis and visualization tools that support NWB:N [E3.7], developed a new interactive course on NWB to help new users (particularly those without strong coding skills) get started using NWB [E3.9], and released a broad range of online resources (Sec. E.3). Finally, all our software is available open source (Sec. E.4).

Public data releases: In the reporting period, NWB users have released a number of high-value datasets in NWB:N format to the neuroscience community:

In November, the Steinmetz Lab at Univ. of Washington publicly released a massive dataset of spiking activity from 30,000 neurons for Steinmetz et al. in NWB.
A collaboration between DataJoint and the Svoboda Lab, with advising from the NWB team, has resulted in 11 high-value datasets from the Svoboda Lab being converted to NWB and shared publicly.
In October, the Allen Institute released the world’s largest dataset of electrical brain activity gathered using Neuropixels, a new high-resolution silicon probe that can read out activity from hundreds of neurons simultaneously [E3.10]. This data release consists of spiking activity from nearly 100,000 neurons from wild-type mice and 3 transgenic lines, across a variety of regions in the cortex, hippocampus, and thalamus during a visual task. A key feature of these experiments is simultaneously recording across as many as 8 visual regions, which will enable scientists to study inter-areal neural communication patterns in greater detail than ever before.
The Allen Institute also released a synaptic physiology data collection consisting of mouse and human datasets of simultaneous patch clamp recordings, accompanied by software tools to explore the data. The datasets describe 1368 chemical synapses from mouse primary visual cortex and 363 from human cortex. Software tools include online Jupyter notebooks and direct API access for download and manipulation of data.

The Allen Institute for Brain Science is also planning a large release of intracellular electrophysiology data in 2020. We are also working with developers of the Channelpedia platform of the Blue Brain Project towards a large-scale release of intracellular electrophysiology data in NWB:N 2.0 in 2020. We also anticipate further data releases from recipients of Kavli and Simons seed grants and other NWB user labs. We have compiled a list of select public NWB:N datasets on our website at [E3.8].

Public Recognition: In November, the NWB:N project was selected for a 2019 R&D 100 Award by R&D World magazine. The R&D 100 Awards have served as one of the most prestigious innovation awards programs for the past 56 years, honoring great R&D pioneers and revolutionary ideas in science and technology. NWB:N is also currently in review to become an INCF Supported Best Practice [E3.1].

Other contributors and support to others for community outreach and adoption: This summer, the Kavli Foundation provided seed grants to six labs to convert a high-value dataset to NWB or integrate NWB with a data analysis and visualization tool. The Simons Foundation also started a new pilot project on conversion of data from leading neurophysiology labs to NWB. With this additional support from the Simons Foundation, NWB community liaison Ben Dichter has worked closely with Lisa Giocomo’s lab at Stanford Univ., Beth Buffalo’s lab at Univ. of Washington, and Richard Axel’s lab at Columbia Univ. to convert a representative dataset from each lab to NWB and create a plan for harmonization of their lab-specific data pipelines with NWB. The Kavli Foundation further continues to provide additional support to Benjamin Dichter (0.5 FTE) as a consultant to the NWB:N effort to work with neuroscience laboratories on adoption of NWB:N and community outreach and training activities. For the upcoming 2020 NWB:N User Days and Developer Hackathon (May 2020), we have further secured travel support for participants from the Kavli Foundation, as well as support from HHMI Janelia for participant housing, space, and organization of the events. The Kavli and Simons Foundations further continue to provide additional support to Vidrio for development of MatNWB. Finally, travel support for invited talks by NWB team members at the OpenSourceBrain workshop, data management workshop in Norway, and upcoming International Brain Initiative meeting in Tokyo has been provided by the respective event hosts.

Industry partners and external projects: We are actively coordinating with Vidrio on development of MatNWB. We are also working with Vathes and the Frank Lab towards integration of NWB:N with DataJoint. Further, Kitware is also working on the development of visualization tools for NWB:N. We have further engaged with the recently funded DANDI project to ensure NWB:N can support the DANDI needs. These industry partners and projects are not supported by this grant and their activities are independent of the research in this grant. However, these activities are synergistic to this grant as they further adoption of NWB:N by end users and tool developers.

Community online resources: To facilitate engagement and adoption, we offer a broad range of web-based community resources, including online documentation and tutorials as well as developer and user channels via Slack, GoogleGroups, and Twitter. All websites and software repositories are publicly available via GitHub, enabling users to freely access all code and contribute via pull requests and our issue trackers. We distribute software documentation online via the ReadTheDocs service. We also host websites for NWB:N training events (e.g., hackathons and workshops) via GitHub. All online resources are accessible via our central nwb.org website. We recently consolidated the nwb.org and neurodatawithoutborders.github.io websites so that users can now access all sites related to NWB via nwb.org as the main central resource. Over the course of the project, we have made significant improvements and updates across all our online resources. For further details, see Section E.

Outreach and engagement with tool developers: As described above, we have also continued to engage with tool developers to integrate NWB with community data tools.

E: Products and References

E.1: Journal Publications, Conference Papers and Reports

[E1.1] Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, “Predicting and Comparing Performance of Array Management Libraries,” submitted to the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2020), May 18-22, 2020, New Orleans, LA, USA

[E1.2] A. J. Tritt, O. Rübel, B. Dichter, R. Ly, D. Kang, E. F. Chang, L. M. Frank, K. E. Bouchard, “HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards,” IEEE Big Data, December, 2019.

[E1.3] Oliver Rübel, Andrew Tritt, Ryan Ly, Benjamin Dichter eds., “Report: 6th NWB:N Hackathon at HHMI Janelia: User Days and Developer Hackathon,” Online Technical Report, June, 2019. [Online] at https://tinyurl.com/y5rt6dho (Online Report)

E.2: Conference Posters and Presentations

[E2.1] Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas, “Comparison of Array Management Library Performance – A Neuroscience Use Case,” SuperComputing 2019, November 21, 2019, Denver, CO, USA [Online] at https://sc19.supercomputing.org/presentation/?id=rpost191&sess=sess348

[E2.2] O. Rübel, A. J. Tritt, B. Dichter, R. Ly, M. Dougherty, V. Baratham, T. J. Davidson, L. Ng, L. M. Frank, K. E. Bouchard, “Extending the NWB:N neurophysiology data standard: Methods and applications,” Neuroscience 2019, October 23, 2019, Chicago, IL, USA, Poster #706.05

[E2.3] B. K. Dichter, M. Dougherty, V. Baratham, K. Nasiotis, A. J. Tritt, O. Rübel, E. F. Chang, K. E. Bouchard, “Neurodata without borders: Neurophysiology as a cross-species standard for electrocorticography,” Neuroscience 2019, October 21, 2019, Chicago, IL, USA, Poster #432.04

[E2.4] O. Rübel, A. J. Tritt, R. Ly, K. E. Bouchard, “Creating data standards for modern experimental and observational sciences,” DOE Data Day 2019, September 25, 2019, Livermore, CA, USA

[E2.5] B. Dichter, M. Dougherty, V. Baratham, K. Nasiotis, O. Woolnough, M. Feyder, A. J. Tritt, O. Rübel, N. Tandon, E. F. Chang, “Extending Neurodata Without Borders: Neurophysiology for Electrocorticography,” Neuroinformatics 2019, September 1, 2019, Warsaw, Poland

[E2.6] O. Rübel, NWB:N 2.0: An Ecosystem for Neurophysiology Data Standardization, Open Source Brain Workshop, Sept. 10, 2019, Alghero, Sardinia, Italy [Online] at https://tinyurl.com/uom5pqz

[E2.7] R. Ly, B. Dichter, and O. Rübel, “Meet NWB,” Tools & Tech: A BRAIN Initiative Alliance Social at SfN 2019, October 20, 2019, Chicago, IL, USA

[E2.8] A. Tritt and B. Dichter, NWB at Workshop – Getting your hands-on data management, Trondheim, Norway, Nov. 6-8, 2019 [Online] at https://tinyurl.com/yx3vonoc

[E2.9] R. Ly, B. Dichter, and O. Rübel, NWB Demo at INCF Booth at SfN 2019, Neuroscience 2019, October 21 and 22, 2019, Chicago, IL, USA

E.3: Website(s) or Other Internet site(s)

[E3.1] Maryann Martone, Richard Gerkin, Roman Moucek, Samir Das, Wojtek Goscinski, Jeanette Hellgren-Kotaleski, Eric Tatt Wei Ho, David Kennedy, Trygve Leergaard, Mathew Abrams, “Call for community review of Neurodata Without Borders: Neurophysiology (NWB:N) 2.0–a data standard for neurophysiology”, October, 2019. [Online] at https://f1000research.com/documents/8-1731

[E3.2] Neurodata Extension (NDX) Catalog [Online] at https://nwb-extensions.github.io/

[E3.3] Oliver Rübel, Andrew Tritt, Benjamin Dichter, Ryan Ly, “Guidelines for sharing NWB extensions (NDX),” October, 2019. [Online] at https://nwb-extensions.github.io/sharing_guidelines

[E3.4] Oliver Rübel, Andrew Tritt, Benjamin Dichter, Ryan Ly, “Strategies for sharing NWB:N extensions (NDX),” October, 2019. [Online] at https://nwb-extensions.github.io/sharing_strategies

[E3.5] Oliver Rübel, Andrew Tritt, Benjamin Dichter, Ryan Ly, “NWB Proposal Review Process,” October, 2019. [Online] at https://nwb-extensions.github.io/proposal_review

[E3.6] O. Rübel et al., “Versioning NWB:N Specification Namespaces,” October, 2019. [Online] at https://nwb-extensions.github.io/versioning_guidelines

[E3.7] Analysis and Visualization Tools that support NWB:N [Online] at https://www.nwb.org/tools/

[E3.8] Select public datasets in NWB:N [Online] at https://www.nwb.org/example-datasets/

[E3.9] PyNWB online training course [Online] at 1) Course https://pynwb-course.netlify.com and 2) Sources https://github.com/NeurodataWithoutBorders/pynwb-course

[E3.10] Rob Piercy, “Allen Institute debuts new window into brain cell communication,” October, 2019. [Online] at https://tinyurl.com/vavql9m

[E3.11] Release notes for PyNWB, HDMF, NWB-schema are available [Online] at: 1) https://github.com/NeurodataWithoutBorders/pynwb/releases, 2) https://github.com/hdmf-dev/hdmf/releases 3) https://nwb-schema.readthedocs.io/en/latest/format_release_notes.html

[E3.12] Proposal for extensions of NWB to enhance support for intracellular electrophysiology metadata, [Proposal Online] at https://tinyurl.com/wmt4yq4 [Extension Online] at https://github.com/oruebel/ndx-icephys-meta

[E3.13] “NWB:N Newsletter Summer 2019,” August, 2019. [Online] at https://mailchi.mp/49a33d0ef48e/nwbn-newsletter-aug-2019?e=8ccd838458

[E3.14] “NWB:N Newsletter Winter 2019,” November, 2019. [Online] at https://mailchi.mp/2a6045ef0b56/nwb-newsletter-nov-2019

E.4: Software Repositories and Other Products

[E4.1] NWB Extensions GitHub Organization [Online] at https://github.com/nwb-extensions

[E4.2] NDX Template for creating new NWB extensions [Online] at https://github.com/nwb-extensions/ndx-template

[E4.3] Staged extensions repo for submitting new extensions to the NDX catalog, [Online] at https://github.com/nwb-extensions/staged-extensions

[E4.4] Ryan Ly, “NWB Extensions Smithy,” October, 2019. [Online] at https://github.com/nwb-extensions/nwb-extensions-smithy

[E4.5] NWB ontology project use cases [Online] at https://github.com/NeurodataWithoutBorders/ontology-project

[E4.6] Andrew Tritt, Ryan Ly, Oliver Rübel, “HDMF Common Schema,” August 2019. [Online] at https://github.com/hdmf-dev/hdmf-common-schema

[E4.7] Pull request with Zarr storage backend implementation [Online] at https://github.com/hdmf-dev/hdmf/pull/98

[E4.8] Slides for all talks and tutorials presented at the 6th NWB:N Developer Hackathon and User Days [Online] at https://drive.google.com/drive/folders/18oG1rRJpluXQJJQaH4xbz6u58LXPiZbI

[E4.9] NWB documentation utilities for creating documentation for extensions. [Online] at 1) Sources https://github.com/NeurodataWithoutBorders/nwb-docutils and 2) PIP https://pypi.org/project/nwb-docutils/

[E4.10] Hierarchical Data Modeling Framework (HDMF) [Online] at 1) Sources https://github.com/hdmf-dev/hdmf , 2) PIP https://pypi.org/project/hdmf/ , 3) Conda https://anaconda.org/conda-forge/hdmf

[E4.11] PyNWB [Online] at 1) Sources https://github.com/NeurodataWithoutBorders/pynwb, 2) PIP https://pypi.org/project/pynwb/, 3) Conda https://anaconda.org/conda-forge/pynwb

[E4.12] MatNWB [Online] at 1) Sources https://github.com/NeurodataWithoutBorders/matnwb 2) MathWorks https://www.mathworks.com/matlabcentral/fileexchange/67741-neurodatawithoutborders-matnwb