Last year was very intense for the cTuning foundation and dividiti: we continued working closely with the AI, ML and systems communities to automate experimentation and improve the reproducibility and reusability of results, in line with our long-term vision.
First of all, we had a chance to apply the new ACM artifact reviewing and badging policy (which we had co-authored a year earlier as part of the new ACM taskforce on reproducibility, based on our prior Artifact Evaluation experience) at ACM CGO’18 and PPoPP’18. The good news is that we had yet another record number of submissions: nearly half of the accepted papers submitted artifacts for validation, 30 in total! We also switched from EasyChair to HotCRP for artifact submission and evaluation, since the latter provides a more convenient and anonymous communication mechanism between authors and reviewers, allowing issues to be resolved continuously during evaluation! This also motivates us to drop the “technical clarification” session during Artifact Evaluation at future conferences, since our reviewers already communicate with authors during evaluation!
We also noticed that our Artifact Appendix template, which we had prepared several years earlier in an attempt to unify CGO, PPoPP and PACT AE, is now used at other conferences including Supercomputing (see the Artifact Description in this SC’17 paper, which will be used for the CLUSTER competition at SC’18)! We plan to collaborate with our CLUSTER colleagues as part of the ACM pilot projects to automate artifact evaluation and to introduce workflow frameworks that convert these artifacts to our Collective Knowledge format (see the SC’16 paper and CGO’17 article with artifacts and workflows shared in the CK format).
Finally, we received very positive feedback from the community about the open artifact evaluation at the past CGO/PPoPP’17, which we are considering using even more in the future (see our motivation):
At the same time, we noticed several ambiguities in the new policy for the “artifacts available” and “artifacts reusable” badges.
After consulting with our ACM colleagues, we updated the reviewing criteria for the “artifacts available” badge at http://cTuning.org/ae/reviewing.html:
The author-created artifacts relevant to this paper will receive an ACM “artifacts available” badge only if they have been placed on a publicly accessible archival repository such as Zenodo, FigShare or Dryad. A DOI will then be assigned to the artifacts and must be provided in the Artifact Appendix! The authors can also share their artifact via the ACM DL; in that case, they should contact the AE chairs to obtain a DOI (this step is not yet automated, unlike with the above repositories).
The criteria for the “artifacts reusable” badge turned out to be even more vague, particularly for systems research, where experimental workflows often involve a very complex and continuously changing algorithm/software/hardware stack. Many authors considered that making their artifact public with a README, a few ad-hoc scripts to build and run experiments, and a Docker image was enough to receive the “artifacts reusable” badge.
Each year we see the burden on our evaluators, who must figure out how to deal with numerous ad-hoc, non-portable and often failing scripts, non-unified data formats, and ad-hoc validation. That is why we argue that common workflow frameworks combined with cross-platform package managers and common APIs should be used to make artifacts easily reusable, portable and customizable (see Collective Knowledge, Spack, EasyBuild, etc.), as illustrated by the sketch below. Such automation and unification can help make Artifact Evaluation sustainable in the longer term, particularly as more artifacts are submitted.
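As a rough illustration of what such a common API looks like in practice, the following sketch uses the dictionary-based Python interface of the Collective Knowledge framework to detect a compiler for the current platform instead of hard-coding paths in ad-hoc build scripts; the exact action and tags shown are illustrative assumptions rather than a definitive recipe:

```python
# Minimal sketch, assuming the CK framework is installed ("pip install ck").
# The action and tags below are illustrative; they mirror the CLI call
# "ck detect soft --tags=compiler,gcc" but may differ from the exact interface.
import ck.kernel as ck

# Every CK action takes and returns a JSON-compatible dictionary,
# which is what makes workflows portable across platforms and tools.
r = ck.access({'action': 'detect',
               'module_uoa': 'soft',
               'tags': 'compiler,gcc',
               'out': 'con'})
if r['return'] > 0:
    # CK reports errors via the 'return' code and the 'error' string.
    raise RuntimeError(r['error'])
```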
However, since we do not want to enforce our views and have not yet reached a satisfying conclusion, we started discussing these issues at the open Artifact Evaluation discussion session at CGO/PPoPP’18. We also described some of these issues and possible solutions in our CNRS’17 presentation “Enabling open and reproducible computer systems research: the good, the bad and the ugly”.
At the same time, we continue working with ACM, the Raspberry Pi Foundation and the community to improve automation and experiment crowdsourcing using our open-source Collective Knowledge platform, and to share artifacts and workflows as portable, customizable and reusable components with a common Python API and JSON meta-information.
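As a minimal sketch of what such a component looks like from the user side (assuming CK is installed and a repository containing a hypothetical program entry called image-classification has been pulled; the names are illustrative), every shared artifact is loaded through the same Python call and described by JSON meta-information:

```python
# Minimal sketch of the common CK Python API; the entry name below is hypothetical.
import ck.kernel as ck

# Load a shared component: each CK entry is addressed by a module and a data
# identifier and described by a JSON meta file with a unified schema.
r = ck.access({'action': 'load',
               'module_uoa': 'program',
               'data_uoa': 'image-classification'})
if r['return'] > 0:
    raise RuntimeError(r['error'])

meta = r['dict']  # JSON meta-information describing the component
print(sorted(meta.keys()))
```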
You can see practical examples of such “plug&play” artifacts and workflows in our recent interactive and reproducible CK-based article “A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques”. It presents our long-term educational initiative to teach students how to benchmark and co-design the software and hardware stack for self-optimizing computer systems in a collaborative and reproducible way. All workflows, artifacts and results are shared as portable, customizable and reusable CK components via GitHub and FigShare to let the community validate, reuse, improve and build upon them while crowdsourcing experiments via our public CK repository. Furthermore, we want researchers to quickly reuse and compare against shared performance results for common benchmarks and datasets on specific platforms and compilers during feedback-directed compilation and autotuning, rather than spend considerable effort rebuilding and rerunning such experiments!
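For instance, shared experiment entries in a local copy of such a public CK repository could be listed and compared against through the same dictionary-based API; the repository name and tags below are assumptions for illustration:

```python
# Minimal sketch, assuming a shared repository (e.g. "ck pull repo:ck-crowdtuning")
# has been pulled locally; the repository name and tags are illustrative.
import ck.kernel as ck

r = ck.access({'action': 'search',
               'module_uoa': 'experiment',
               'repo_uoa': 'ck-crowdtuning',
               'tags': 'autotuning'})
if r['return'] > 0:
    raise RuntimeError(r['error'])

# Each entry points to shared results that can be reused or compared against.
for entry in r['lst']:
    print(entry['data_uoa'], entry['path'])
```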
We are also very excited to organize the 1st ACM ReQuEST tournament based on the Collective Knowledge platform with a consortium of leading universities (Cornell, Washington, Toronto, EPFL, Cambridge) and a strong advisory board: http://cKnowledge.org/request . This novel competition series focuses on reproducible and Pareto-efficient co-design and optimization of the whole application/software/hardware stack for AI, ML, deep learning and other emerging workloads in terms of speed, accuracy, power and cost. More importantly, a growing number of participants will continue improving the common CK-based optimization workflow and sharing portable and customizable AI/ML blocks optimized across diverse models, data sets and platforms from IoT to HPC. The benchmarking results and winning SW/HW/model configurations will be visualized on a public interactive dashboard and grouped according to certain categories (e.g. embedded vs. server). They can also be reproduced, reused, improved and compared against, thanks to the common CK framework. Our eventual goal is to share all winning algorithms and related artifacts as “plug&play” CK components with a common API to let the community immediately validate, customize, reuse and build upon them, thus removing the technology transfer gap and enabling open systems/AI/ML research!
The first edition of ReQuEST will serve mainly as a testbed for our approach, framework and repository, so we decided to limit submissions to deep learning algorithms for image classification. It will be co-located with ASPLOS’18, the ACM conference on Architectural Support for Programming Languages and Operating Systems, which is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking. Authors of Pareto-efficient or original submissions will be invited to present their findings at the associated ACM workshop. At the end of the tournament, we will provide our advisory board with a report covering the outcome of the tournament, open issues, possible solutions and next steps.
You can find more details about the ReQuEST long-term vision in the following documents:
We look forward to collaborating with all of you in 2018 to automate research and experimentation, improve reproducibility of published results, develop efficient systems for AI and other emerging workloads, accelerate AI/ML/systems research, make a breakthrough in AI and enable intelligent systems everywhere!
Miscellaneous resources (2017)
- Artifact Evaluation at SC’17 (uses our Artifact Appendix template)
- NIPS’17 paper implementation challenge
- Digital Infrastructures for Research (2017)
- Computational Reproducibility at Exascale Workshop (CRE2017)
- Workshop on Open Source Supercomputing (OpenSuCo-2017)
- ACM SIGCOMM 2017 Reproducibility Workshop (Reproducibility’17)
Examples of artifacts in the ACM Digital Library:
- https://dl.acm.org/citation.cfm?doid=2807591.2807619 - SC’16 paper which used Collective Knowledge to share artifacts
- https://doi.org/10.1145/3159940 (see source materials)
- CGO'18 replication package (artifact) with a linked paper
- Example of artifact badges in ACM Digital Library
- ACM SIGPLAN's Empirical Evaluation Checklist (beta)
- Popper: Practical Falsifiable Research
- The 1st Open Science platform launched by Aarhus University, Denmark
- OpenNeuro platform – Open and Reproducible Science as a Service
- The Journal of Open Source Software
- Collaborative Open Computer Science
- Reproducible Science Project
- JSON for Linking Data
- Common Workflow Language, v1.0 (GitHub)
- Data Science Workflow: Overview and Challenges
- Automated tool wrapper/converter for Common Workflow Language
- Open Dataset and Software Track at ACM Multimedia Systems 2017
- Reproducibility and Comprehensive Assessment of Next Generation Sequencing Bioinformatics Software
- The CodeMeta Project
- Research Data at Springer Nature
- Permanent archives acceptable for ACM “artifacts available” badge
- Portable workflows using Collective Knowledge Framework
- Spack: portable package manager for HPC (we plan to connect CK and Spack in the future)
- SCons: a software construction tool (we added support for SCons to CK)
- Facebook Buck (a fast build system)
- Genome Analysis Toolkit 4 (GATK4) as open source resource to accelerate research
- NextFlow: A DSL for data-driven computational pipelines
- LabPal: Easily run experiments on a computer
- Anonymous Github: a proxy server to support anonymous browsing of Github repositories for open-science code and data
- Singularity containers
- VC++ Packaging Tool
- Google CoLab
Benchmarks and data sets
- Example of an interactive and reproducible article on autotuning and machine learning
- Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge
- What is a Good PhD? By Lasse Natvig
- Addressing threats to reproducibility through research transparency
- Empirical Software Engineering using R
- Hidden Technical Debt in Machine Learning Systems
- New method to ensure reproducibility in computational experiments
- Why should I believe your supercomputing research?
- The hard road to reproducibility
- How to run a lab for reproducible research
- Taming the Complexity of Artifact Reproducibility
- What Should the Scientific Community of Tomorrow Look Like?
- Barbagroup reproducibility syllabus
- Déjàvu: A Map of Code Duplicates on Github
- A guide to sustainability models for research software projects
- 10 Ways to keep your successful scientific software alive
- Linux Foundation Launches Open Data Licensing Agreements
- The Bootstrap Blog
- Management Science
- A truly reproducible scientific paper?
- RCE Podcast Looks at Reproducibility of Scientific Results
- Posters presented at SIAM CSE17 PP108 Minisymposterium: Software Productivity and Sustainability for CSE and Data Science
- How I learned to stop worrying and love the coming archivability crisis in scientific software
- Software for reproducible science: let’s not have a misunderstanding
- Dynamic curation of artifacts and experiments is changing the way digital libraries will operate
- Report on the 1st IEEE Workshop on The Future of Research Curation and Research Reproducibility (2016.11)
- Joint CGO/PPoPP Artifact Evaluation Discussion Session slides
- Enabling open and reproducible computer systems research: the good, the bad and the ugly
See all related resources here.