Sunday, 4 March 2018

Digest of reproducibility activities from the non-profit cTuning foundation and dividiti in 2017

Last year was a very intense one for the cTuning foundation and dividiti - we continued working closely with the AI, ML and systems communities to automate experimentation while improving the reproducibility and reusability of results, in line with our long-term vision.

First of all, we had a chance to apply the new ACM artifact reviewing and badging policy (which we had co-authored a year earlier as part of the new ACM taskforce on reproducibility, based on our prior Artifact Evaluation experience) at ACM CGO’18 and PPoPP’18. The good news is that we had yet another record number of submissions: nearly half of the accepted papers submitted artifacts for validation, 30 in total! We also switched from EasyChair to HotCRP for artifact submission and evaluation, since the latter provides a more convenient and anonymous communication channel that lets authors and reviewers resolve issues continuously during evaluation. This also motivates us to drop the separate “technical clarification” session from Artifact Evaluation at future conferences, since our reviewers already communicate with authors throughout the evaluation!

We also noticed that our Artifact Appendix Template, which we had prepared several years earlier in an attempt to unify CGO, PPoPP and PACT AE, is now used at other conferences including Supercomputing (see the Artifact Description in this SC’17 paper, which will be used for the CLUSTER competition at SC’18)! We plan to collaborate with our CLUSTER colleagues as part of the ACM pilot projects to automate artifact evaluation and to introduce workflow frameworks that convert these artifacts to our Collective Knowledge format (see the SC’16 paper and the CGO’17 article with artifacts and workflows shared in the CK format).

Finally, we received very positive feedback from the community about the open artifact evaluation at the past CGO/PPoPP’17, which we are considering using even more in the future (see our motivation).

At the same time, we noticed several ambiguities in the new policy regarding the “artifacts available” and “artifacts reusable” badges.

After consulting with our ACM colleagues, we updated the reviewing criteria for the “artifacts available” badge at http://cTuning.org/ae/reviewing.html:

The author-created artifacts relevant to this paper will receive an ACM "artifacts available" badge only if they have been placed on a publicly accessible archival repository such as Zenodo, FigShare or Dryad. A DOI will then be assigned to the artifacts and must be provided in the Artifact Appendix! The authors can also share their artifacts via the ACM DL; in that case they should contact the AE chairs to obtain a DOI (this step is not yet automated, unlike on the repositories above).

The criteria for the “artifacts reusable” badge turned out to be even more vague, particularly for systems research, where experimental workflows often involve a very complex and continuously changing algorithm/software/hardware stack. Many authors assumed that making their artifact public with a README, a few ad-hoc scripts to build and run experiments, and a Docker image was enough to earn the “artifacts reusable” badge.

Each year we see how much our evaluators struggle to figure out how to deal with numerous ad-hoc, non-portable and often failing scripts, non-unified data formats, and ad-hoc validation. That is why we argue that some sort of common workflow framework, combined with cross-platform package managers and common APIs, must be used to make an artifact easily reusable, portable and customizable (see Collective Knowledge, Spack, EasyBuild, etc.). Such automation and unification can help make Artifact Evaluation sustainable in the longer term, particularly as more artifacts are submitted.
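
To give a feel for what such a common API could look like, here is a deliberately simplified Python sketch (a hypothetical illustration, not the actual interface of CK, Spack or EasyBuild, and assuming a Unix-like host): every workflow action takes and returns a JSON-compatible dictionary, so an evaluator can drive any artifact through the same entry point instead of deciphering artifact-specific scripts.

    import json
    import platform
    import shutil
    import subprocess

    def detect_compiler(request):
        # Hypothetical unified action: find a C compiler on the host and
        # return a normalized JSON description instead of hard-coding paths.
        for candidate in request.get("candidates", ["gcc", "clang"]):
            path = shutil.which(candidate)
            if path:
                return {"return": 0, "compiler": candidate, "path": path,
                        "host_os": platform.system()}
        return {"return": 1, "error": "no supported C compiler found"}

    def run_benchmark(request):
        # Hypothetical unified action: compile and run one benchmark
        # (Unix-like host assumed), returning measurements as JSON that
        # any dashboard or validation script can consume.
        comp = detect_compiler(request)
        if comp["return"] > 0:
            return comp
        subprocess.check_call([comp["path"], request["source"], "-O3", "-o", "bench"])
        output = subprocess.check_output(["./bench"], text=True)
        return {"return": 0, "compiler": comp, "raw_output": output}

    if __name__ == "__main__":
        # "benchmark.c" is just an example file name.
        print(json.dumps(run_benchmark({"source": "benchmark.c"}), indent=2))

The specific code does not matter; the point is that a JSON-in/JSON-out convention lets reviewers, dashboards and other workflows compose such actions without knowing anything about an artifact's internals.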

However, since we do not want to impose our views and have not yet reached a satisfying conclusion, we started discussing these issues at the open Artifact Evaluation discussion session at CGO/PPoPP’18. We also described some of these issues and possible solutions in our CNRS’17 presentation “Enabling open and reproducible computer systems research: the good, the bad and the ugly”.

At the same time, we continue working with ACM, the Raspberry Pi foundation and the community to improve automation and experiment crowdsourcing using our open-source Collective Knowledge (CK) platform, as well as the sharing of artifacts and workflows as portable, customizable and reusable components with a common Python API and JSON meta-information.
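
In CK, this idea takes the form of a single Python entry point, ck.access(), which takes and returns JSON-compatible dictionaries. The snippet below is a minimal sketch assuming CK is installed (e.g. via pip) and at least one repository with shared program workflows has been pulled; the tag used in the query is purely illustrative.

    import ck.kernel as ck

    # Search shared CK entries of module 'program' by tag
    # (the tag 'benchmark' is only an example).
    r = ck.access({'action': 'search',
                   'module_uoa': 'program',
                   'tags': 'benchmark'})
    if r['return'] > 0:
        # Every CK call returns a dictionary with a 'return' code
        # and an 'error' string on failure.
        ck.err(r)

    for entry in r.get('lst', []):
        print(entry['data_uoa'])

The same actions are also exposed via the ck command-line front-end, so the identical workflow can be reused interactively, in scripts or from other tools.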

You can see practical examples of such “plug&play” artifacts and workflows in our recent interactive and reproducible CK-based article “A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques”. It presents our long-term educational initiative to teach students how to benchmark and co-design the software and hardware stack for self-optimizing computer systems in a collaborative and reproducible way. All workflows, artifacts and results are shared as portable, customizable and reusable CK components via GitHub and FigShare, letting the community validate, reuse, improve and build upon them while crowdsourcing experiments via our public CK repository. Furthermore, we want researchers to quickly reuse and compare against shared performance results for common benchmarks and datasets on specific platforms and compilers during feedback-directed compilation and autotuning, rather than spending considerable effort rebuilding and rerunning such experiments!
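
As a rough illustration of this kind of reuse (the file name and JSON fields below are hypothetical and do not reflect the exact CK experiment format), comparing a new measurement against previously shared results boils down to loading their JSON meta-information:

    import json
    import statistics

    def load_shared_results(path):
        # Load previously shared measurements for a given benchmark,
        # platform and compiler (illustrative structure only).
        with open(path) as f:
            return json.load(f)  # e.g. {"execution_time_s": [0.92, 0.95, 0.91]}

    def compare(new_time_s, shared):
        baseline = statistics.median(shared["execution_time_s"])
        return {"baseline_s": baseline,
                "new_s": new_time_s,
                "speedup": baseline / new_time_s}

    if __name__ == "__main__":
        shared = load_shared_results("shared_results.json")  # hypothetical file name
        print(compare(0.80, shared))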

We are also very excited to be organizing the 1st ACM ReQuEST tournament based on the Collective Knowledge platform together with a consortium of leading universities (Cornell, Washington, Toronto, EPFL, Cambridge) and a strong advisory board: http://cKnowledge.org/request . This novel competition series focuses on reproducible and Pareto-efficient co-design and optimization of the whole application/software/hardware stack for AI, ML, deep learning and other emerging workloads in terms of speed, accuracy, power and cost. More importantly, a growing number of participants will continue improving the common CK-based optimization workflow and sharing portable and customizable AI/ML blocks optimized across diverse models, data sets and platforms from IoT to HPC. The benchmarking results and winning SW/HW/model configurations will be visualized on a public interactive dashboard and grouped into categories (e.g. embedded vs. server). They can also be reproduced, reused, improved and compared against, thanks to the common CK framework. Our eventual goal is to share all winning algorithms and related artifacts as “plug&play” CK components with a common API, letting the community immediately validate, customize, reuse and build upon them, thus closing the technology transfer gap and enabling open systems/AI/ML research!
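
For readers less familiar with Pareto efficiency in this multi-objective setting, the following sketch (purely illustrative, not the actual ReQuEST scoring code; the submission names and numbers are made up) shows how measured configurations are reduced to a Pareto frontier: a configuration survives only if no other configuration is at least as good on every objective and strictly better on at least one.

    # All objectives are normalized so that lower is better
    # (e.g. latency in ms, energy in J, cost in USD, error rate).
    def dominates(a, b):
        # True if configuration a is no worse than b on every objective
        # and strictly better on at least one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_frontier(configs):
        return {name: objs for name, objs in configs.items()
                if not any(dominates(other, objs)
                           for other_name, other in configs.items()
                           if other_name != name)}

    if __name__ == "__main__":
        submissions = {                     # hypothetical measurements
            "resnet50-gpu":  (20.0, 5.0, 0.50, 0.24),
            "mobilenet-arm": (90.0, 0.8, 0.05, 0.30),
            "vgg16-cpu":     (400.0, 9.0, 0.60, 0.29),
        }
        # Only resnet50-gpu and mobilenet-arm survive; vgg16-cpu is dominated.
        print(pareto_frontier(submissions))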


The first edition of ReQuEST will serve mainly as a testbed for our approach, framework and repository, so we decided to limit submissions to deep learning algorithms for image classification. It will be co-located with ASPLOS'18, the ACM conference on Architectural Support for Programming Languages and Operating Systems, which is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking. Authors of Pareto-efficient or original submissions will be invited to present their findings at the associated ACM workshop. At the end of the tournament, we will provide our advisory board with a report on the outcome of the tournament, open issues, possible solutions and next steps.

We look forward to collaborating with all of you in 2018 to automate research and experimentation, improve the reproducibility of published results, develop efficient systems for AI and other emerging workloads, accelerate AI/ML/systems research, make breakthroughs in AI and enable intelligent systems everywhere!

Miscellaneous resources (2017)

Events

Initiatives 

Repositories

Tools

Articles


See all related resources here.

Thursday, 8 February 2018

ACM ReQuEST: 1st open and reproducible tournament to co-design Pareto-efficient deep learning (speed, accuracy, energy, size, costs)

The first Reproducible Quality-Efficient Systems Tournament (ReQuEST) will debut at ASPLOS’18 (the ACM conference on Architectural Support for Programming Languages and Operating Systems, the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking).

Organized by a consortium of leading universities (Washington, Cornell, Toronto, Cambridge, EPFL) and the cTuning foundation, ReQuEST aims to provide an open-source tournament framework, a common experimental methodology and an open repository for the continuous evaluation and multi-objective optimization of the quality vs. efficiency trade-offs (Pareto optimality) of a wide range of real-world applications, models and libraries across the whole software/hardware stack.

ReQuEST will use the established artifact evaluation methodology together with the Collective Knowledge framework, both validated at leading ACM/IEEE conferences, to reproduce results, display them on a live dashboard and share artifacts with the community. Distinguished entries will be presented at the associated workshop and published in the ACM Digital Library. To win, an entry's results do not necessarily have to lie on the Pareto frontier, as an entry can also be recognized for its originality, reproducibility, adaptability, scalability, portability, ease of use, etc.

The first ReQuEST competition will focus on deep learning for image recognition, with the ambitious long-term goal of building a public repository of portable and customizable “plug&play” AI/ML algorithms optimized across diverse data sets, models and platforms from IoT to supercomputers (see the live demo). Future competitions will consider other emerging workloads, as suggested by our Industrial Advisory Board.

For more information, please visit http://cKnowledge.org/request


Saturday, 20 January 2018

Public CGO-PPoPP'18 artifact evaluation discussion session on the 26th of February


We have successfully completed the PPoPP'18 artifact evaluation (AE). Just like at CGO'18, we received a record number of artifact submissions: 15. The results are now available at http://ctuning.org/ae/artifacts.html !


For the first time, we used the new ACM Artifact Review and Badging policy, which we co-authored last year. Note that it is now possible to search for papers with specific badges in the ACM Digital Library: go to https://dl.acm.org/advsearch.cfm?coll=DL&dl=ACM , select "Artifact Badge" as the field, and then choose the badges to search for! Since we see AE as a cooperative process to improve the reproducibility of experiments, authors, reviewers and chairs worked closely together to improve the artifacts and pass evaluation. We would like to thank them all for their hard work: http://cTuning.org/ae/committee.html !


Although there were no major problems, we noticed that the "reusability/customization" criteria in the new guidelines are quite vague and caused ambiguity in the evaluation of several complex artifacts.

Another problem is that every artifact comes with its own ad-hoc formats and scripts, while we need to automate this process as much as possible to make AE sustainable. ACM is now evaluating several technologies to pack, share and evaluate artifacts automatically: https://www.acm.org/publications/artifacts

We plan to evaluate these technologies further during the 1st open tournament on reproducible and Pareto-efficient co-design of the whole software/hardware/model stack for deep learning and other emerging workloads: http://cKnowledge.org/request

We would like to discuss all these issues with the community, in order to improve the next AE, during an open CGO-PPoPP AE discussion session on the 26th of February 2018 (17:15). Please join us and feel free to provide your feedback!



Tuesday, 5 September 2017

My CNRS webcast "Enabling open and reproducible research at computer systems conferences: good, bad and ugly"

This spring I was kindly invited by Dr. Arnaud Legrand (a CNRS research scientist promoting reproducible research in France) to present our practical experience of enabling open and reproducible research at computer systems conferences (the good, the bad and the ugly).

This CNRS webinar took place in Grenoble on March 14, 2017 with a very lively audience.

You can find the following online resources related to this talk:

Monday, 4 September 2017

Microsoft sponsors non-profit cTuning foundation

We would like to thank Microsoft for providing an Azure sponsorship to our non-profit cTuning foundation to host our public repository of cross-linked artifacts and optimization results in a unified and reusable Collective Knowledge format.

Many thanks to Dr. Aaron Smith for his assistance!

Successful PhD defense at the University of Paris-Saclay (advised by cTuning foundation members)

We would like to congratulate Abdul Memon (a PhD student advised by Dr. Grigori Fursin from the cTuning foundation) on successfully defending his thesis "Crowdtuning: Towards Practical and Reproducible Auto-tuning via Crowdsourcing and Predictive Analytics" at the University of Paris-Saclay.

Most of the software, data sets and experiments were shared in a unified, reproducible and reusable way using the Collective Mind framework and later converted to the new Collective Knowledge framework.

We helped prepare ACM policy on Result and Artifact Review and Badging

After organizing many Artifact Evaluations to reproduce and validate experimental results from papers published at various ACM and IEEE computer systems conferences (CGO, PPoPP, PACT, SC), we saw the need for a common reviewing and badging methodology.

In 2016, the cTuning foundation joined the ACM internal workgroup on reproducibility and provided feedback and suggestions to help develop a common methodology for artifact evaluation. The outcome of this collaborative effort is the common ACM policy on Result and Artifact Review and Badging, published here:
We have also started aligning artifact submission and reviewing procedures across computer systems conferences:
We expect this document to gradually evolve based on our AE experience - please stay tuned for more news!