Sunday, 4 March 2018

Digest of reproducibility activities from the non-profit cTuning foundation and dividiti in 2017

Last year was a very intense one for the cTuning foundation and dividiti - we continued working closely with the AI, ML and systems communities to automate experimentation while improving the reproducibility and reusability of results, in line with our long-term vision.

First of all, we had a chance to apply the new ACM artifact reviewing and badging policy (which we had co-authored a year earlier as part of the new ACM taskforce on reproducibility, based on our prior Artifact Evaluation experience) at ACM CGO’18 and PPoPP’18. The good news is that we had yet another record number of submissions: nearly half of the accepted papers submitted artifacts for validation, 30 in total! We also switched from EasyChair to HotCRP for artifact submission and evaluation, since the latter provides a more convenient and anonymous communication channel that lets authors and reviewers resolve issues continuously during evaluation. This also motivates us to drop the separate “technical clarification” session from Artifact Evaluation at future conferences, since our reviewers already communicate with authors throughout the evaluation!

We also noticed that our Artifact Appendix Template, which we had prepared several years earlier in an attempt to unify CGO, PPoPP and PACT AE, is now used at other conferences including Supercomputing (see the Artifact Description in this SC’17 paper, which will be used for the CLUSTER competition at SC’18)! We plan to collaborate with our CLUSTER colleagues as part of the ACM pilot projects to automate artifact evaluation and to introduce workflow frameworks that convert these artifacts to our Collective Knowledge format (see the SC’16 paper and the CGO’17 article with artifacts and workflows shared in the CK format).

Finally, we received very positive feedback from the community about the open artifact evaluation at the past CGO/PPoPP’17, which we are considering using even more in the future (see our motivation).

At the same time, we noticed several ambiguities in the new policy regarding the “artifacts available” and “artifacts reusable” badges.

After consulting with our ACM colleagues, we updated the reviewing criteria for the “artifacts available” badge at http://cTuning.org/ae/reviewing.html:

The author-created artifacts relevant to this paper will receive an ACM "artifacts available" badge only if they have been placed on a publicly accessible archival repository such as Zenodo, FigShare or Dryad. A DOI will then be assigned to the artifacts and must be provided in the Artifact Appendix! The authors can also share their artifacts via the ACM DL; in that case they should contact the AE chairs to obtain a DOI (this step is not yet automated, unlike on the repositories above).

The criteria for the “artifacts reusable” badge turned out to be even more vague, particularly for systems research, where experimental workflows often involve a very complex and continuously changing algorithm/software/hardware stack. Many authors assumed that making their artifact public with a README, a few ad-hoc scripts to build and run experiments, and a Docker image was enough to earn the “artifacts reusable” badge.

Each year we see how much our evaluators struggle to figure out how to deal with numerous ad-hoc, non-portable and often failing scripts, non-unified data formats, and ad-hoc validation. That is why we argue that some sort of common workflow framework, combined with cross-platform package managers and common APIs, must be used to make an artifact easily reusable, portable and customizable (see Collective Knowledge, Spack, EasyBuild, etc.). Such automation and unification can help make Artifact Evaluation sustainable in the longer term, particularly as more artifacts are submitted.
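
To give a feel for what such a common API could look like, here is a deliberately simplified Python sketch (a hypothetical illustration, not the actual interface of CK, Spack or EasyBuild, and assuming a Unix-like host): every workflow action takes and returns a JSON-compatible dictionary, so an evaluator can drive any artifact through the same entry point instead of deciphering artifact-specific scripts.

    import json
    import platform
    import shutil
    import subprocess

    def detect_compiler(request):
        # Hypothetical unified action: find a C compiler on the host and
        # return a normalized JSON description instead of hard-coding paths.
        for candidate in request.get("candidates", ["gcc", "clang"]):
            path = shutil.which(candidate)
            if path:
                return {"return": 0, "compiler": candidate, "path": path,
                        "host_os": platform.system()}
        return {"return": 1, "error": "no supported C compiler found"}

    def run_benchmark(request):
        # Hypothetical unified action: compile and run one benchmark
        # (Unix-like host assumed), returning measurements as JSON that
        # any dashboard or validation script can consume.
        comp = detect_compiler(request)
        if comp["return"] > 0:
            return comp
        subprocess.check_call([comp["path"], request["source"], "-O3", "-o", "bench"])
        output = subprocess.check_output(["./bench"], text=True)
        return {"return": 0, "compiler": comp, "raw_output": output}

    if __name__ == "__main__":
        # "benchmark.c" is just an example file name.
        print(json.dumps(run_benchmark({"source": "benchmark.c"}), indent=2))

The specific code does not matter; the point is that a JSON-in/JSON-out convention lets reviewers, dashboards and other workflows compose such actions without knowing anything about an artifact's internals.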

However, since we do not want to impose our views and have not yet reached a satisfying conclusion, we started discussing these issues at the open Artifact Evaluation discussion session at CGO/PPoPP’18. We also described some of these issues and possible solutions in our CNRS’17 presentation “Enabling open and reproducible computer systems research: the good, the bad and the ugly”.

At the same time, we continue working with ACM, the Raspberry Pi foundation and the community to improve automation and experiment crowdsourcing using our open-source Collective Knowledge (CK) platform, as well as the sharing of artifacts and workflows as portable, customizable and reusable components with a common Python API and JSON meta-information.
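
In CK, this idea takes the form of a single Python entry point, ck.access(), which takes and returns JSON-compatible dictionaries. The snippet below is a minimal sketch assuming CK is installed (e.g. via pip) and at least one repository with shared program workflows has been pulled; the tag used in the query is purely illustrative.

    import ck.kernel as ck

    # Search shared CK entries of module 'program' by tag
    # (the tag 'benchmark' is only an example).
    r = ck.access({'action': 'search',
                   'module_uoa': 'program',
                   'tags': 'benchmark'})
    if r['return'] > 0:
        # Every CK call returns a dictionary with a 'return' code
        # and an 'error' string on failure.
        ck.err(r)

    for entry in r.get('lst', []):
        print(entry['data_uoa'])

The same actions are also exposed via the ck command-line front-end, so the identical workflow can be reused interactively, in scripts or from other tools.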

You can see practical examples of such “plug&play” artifacts and workflows in our recent interactive and reproducible CK-based article “A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques”. It presents our long-term educational initiative to teach students how to benchmark and co-design the software and hardware stack for self-optimizing computer systems in a collaborative and reproducible way. All workflows, artifacts and results are shared as portable, customizable and reusable CK components via GitHub and FigShare, letting the community validate, reuse, improve and build upon them while crowdsourcing experiments via our public CK repository. Furthermore, we want researchers to quickly reuse and compare against shared performance results for common benchmarks and datasets on specific platforms and compilers during feedback-directed compilation and autotuning, rather than spending considerable effort rebuilding and rerunning such experiments!
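
As a rough illustration of this kind of reuse (the file name and JSON fields below are hypothetical and do not reflect the exact CK experiment format), comparing a new measurement against previously shared results boils down to loading their JSON meta-information:

    import json
    import statistics

    def load_shared_results(path):
        # Load previously shared measurements for a given benchmark,
        # platform and compiler (illustrative structure only).
        with open(path) as f:
            return json.load(f)  # e.g. {"execution_time_s": [0.92, 0.95, 0.91]}

    def compare(new_time_s, shared):
        baseline = statistics.median(shared["execution_time_s"])
        return {"baseline_s": baseline,
                "new_s": new_time_s,
                "speedup": baseline / new_time_s}

    if __name__ == "__main__":
        shared = load_shared_results("shared_results.json")  # hypothetical file name
        print(compare(0.80, shared))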

We are also very excited to be organizing the 1st ACM ReQuEST tournament based on the Collective Knowledge platform together with a consortium of leading universities (Cornell, Washington, Toronto, EPFL, Cambridge) and a strong advisory board: http://cKnowledge.org/request . This novel competition series focuses on reproducible and Pareto-efficient co-design and optimization of the whole application/software/hardware stack for AI, ML, deep learning and other emerging workloads in terms of speed, accuracy, power and cost. More importantly, a growing number of participants will continue improving the common CK-based optimization workflow and sharing portable and customizable AI/ML blocks optimized across diverse models, data sets and platforms from IoT to HPC. The benchmarking results and winning SW/HW/model configurations will be visualized on a public interactive dashboard and grouped into categories (e.g. embedded vs. server). They can also be reproduced, reused, improved and compared against, thanks to the common CK framework. Our eventual goal is to share all winning algorithms and related artifacts as “plug&play” CK components with a common API, letting the community immediately validate, customize, reuse and build upon them, thus closing the technology transfer gap and enabling open systems/AI/ML research!
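
For readers less familiar with Pareto efficiency in this multi-objective setting, the following sketch (purely illustrative, not the actual ReQuEST scoring code; the submission names and numbers are made up) shows how measured configurations are reduced to a Pareto frontier: a configuration survives only if no other configuration is at least as good on every objective and strictly better on at least one.

    # All objectives are normalized so that lower is better
    # (e.g. latency in ms, energy in J, cost in USD, error rate).
    def dominates(a, b):
        # True if configuration a is no worse than b on every objective
        # and strictly better on at least one.
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def pareto_frontier(configs):
        return {name: objs for name, objs in configs.items()
                if not any(dominates(other, objs)
                           for other_name, other in configs.items()
                           if other_name != name)}

    if __name__ == "__main__":
        submissions = {                     # hypothetical measurements
            "resnet50-gpu":  (20.0, 5.0, 0.50, 0.24),
            "mobilenet-arm": (90.0, 0.8, 0.05, 0.30),
            "vgg16-cpu":     (400.0, 9.0, 0.60, 0.29),
        }
        # Only resnet50-gpu and mobilenet-arm survive; vgg16-cpu is dominated.
        print(pareto_frontier(submissions))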


The first edition of ReQuEST will serve mainly as a testbed for our approach, framework and repository, so we decided to limit submissions to deep learning algorithms for image classification. It will be co-located with ASPLOS'18, the ACM conference on Architectural Support for Programming Languages and Operating Systems, which is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking. Authors of Pareto-efficient or original submissions will be invited to present their findings at the associated ACM workshop. At the end of the tournament, we will provide our advisory board with a report on the outcome of the tournament, open issues, possible solutions and next steps.

We look forward to collaborating with all of you in 2018 to automate research and experimentation, improve the reproducibility of published results, develop efficient systems for AI and other emerging workloads, accelerate AI/ML/systems research, make breakthroughs in AI and enable intelligent systems everywhere!

Miscellaneous resources (2017)

Events

Initiatives 

Repositories

Tools

Articles


See all related resources here.

Thursday, 8 February 2018

ACM ReQuEST: 1st open and reproducible tournament to co-design Pareto-efficient deep learning (speed, accuracy, energy, size, costs)

The first Reproducible Quality-Efficient Systems Tournament (ReQuEST) will debut at ASPLOS’18 (the ACM conference on Architectural Support for Programming Languages and Operating Systems, the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking).

Organized by a consortium of leading universities (Washington, Cornell, Toronto, Cambridge, EPFL) and the cTuning foundation, ReQuEST aims to provide an open-source tournament framework, a common experimental methodology and an open repository for the continuous evaluation and multi-objective optimization of the quality vs. efficiency trade-offs (Pareto optimality) of a wide range of real-world applications, models and libraries across the whole software/hardware stack.

ReQuEST will use the established artifact evaluation methodology together with the Collective Knowledge framework, both validated at leading ACM/IEEE conferences, to reproduce results, display them on a live dashboard and share artifacts with the community. Distinguished entries will be presented at the associated workshop and published in the ACM Digital Library. To win, an entry's results do not necessarily have to lie on the Pareto frontier, as an entry can also be recognized for its originality, reproducibility, adaptability, scalability, portability, ease of use, etc.

The first ReQuEST competition will focus on deep learning for image recognition, with the ambitious long-term goal of building a public repository of portable and customizable “plug&play” AI/ML algorithms optimized across diverse data sets, models and platforms from IoT to supercomputers (see the live demo). Future competitions will consider other emerging workloads, as suggested by our Industrial Advisory Board.

For more information, please visit http://cKnowledge.org/request


Saturday, 20 January 2018

Public CGO-PPoPP'18 artifact evaluation discussion session on the 26th of February


We have successfully completed the PPoPP'18 artifact evaluation (AE). Just like at CGO'18, we received a record number of artifact submissions: 15. The results are now available at http://ctuning.org/ae/artifacts.html !


For the first time, we used the new ACM Artifact Review and Badging policy, which we co-authored last year. Note that it is now possible to search for papers with specific badges in the ACM Digital Library: go to https://dl.acm.org/advsearch.cfm?coll=DL&dl=ACM , select "Artifact Badge" as the field, and then choose the badges to search for! Since we see AE as a cooperative process to improve the reproducibility of experiments, authors, reviewers and chairs worked closely together to improve the artifacts and pass evaluation. We would like to thank them all for their hard work: http://cTuning.org/ae/committee.html !


Although there were no major problems, we noticed that the "reusability/customization" criteria in the new guidelines are quite vague and caused ambiguity in the evaluation of several complex artifacts.

Another problem is that every artifact comes with its own ad-hoc formats and scripts, while we need to automate this process as much as possible to make AE sustainable. ACM is now evaluating several technologies to pack, share and evaluate artifacts automatically: https://www.acm.org/publications/artifacts

We plan to evaluate these technologies further during the 1st open tournament on reproducible and Pareto-efficient co-design of the whole software/hardware/model stack for deep learning and other emerging workloads: http://cKnowledge.org/request

We would like to discuss all these issues with the community, in order to improve the next AE, during an open CGO-PPoPP AE discussion session on the 26th of February 2018 (17:15). Please join us and feel free to provide your feedback!



Tuesday, 5 September 2017

My CNRS webcast "Enabling open and reproducible research at computer systems conferences: good, bad and ugly"

This spring I was kindly invited by Dr. Arnaud Legrand (a CNRS research scientist promoting reproducible research in France) to present our practical experience of enabling open and reproducible research at computer systems conferences (the good, the bad and the ugly).

This CNRS webinar took place in Grenoble on March 14, 2017 with a very lively audience.

You can find the following online resources related to this talk:

Monday, 4 September 2017

Microsoft sponsors non-profit cTuning foundation

We would like to thank Microsoft for providing an Azure sponsorship to our non-profit cTuning foundation to host our public repository of cross-linked artifacts and optimization results in a unified and reusable Collective Knowledge format.

Many thanks to Dr. Aaron Smith for his assistance!

Successful PhD defense at the University of Paris-Saclay (advised by cTuning foundation members)

We would like to congratulate Abdul Memon (a PhD student advised by Dr. Grigori Fursin from the cTuning foundation) on successfully defending his thesis "Crowdtuning: Towards Practical and Reproducible Auto-tuning via Crowdsourcing and Predictive Analytics" at the University of Paris-Saclay.

Most of the software, data sets and experiments were shared in a unified, reproducible and reusable way using the Collective Mind framework and later converted to the new Collective Knowledge framework.

We helped prepare ACM policy on Result and Artifact Review and Badging

After organizing many Artifact Evaluations to reproduce and validate experimental results from papers published at various ACM and IEEE computer systems conferences (CGO, PPoPP, PACT, SC), we saw the need for a common reviewing and badging methodology.

In 2016, the cTuning foundation joined the ACM internal workgroup on reproducibility and provided feedback and suggestions to help develop a common methodology for artifact evaluation. The outcome of this collaborative effort is the common ACM policy on Result and Artifact Review and Badging, published here:
We have also started aligning artifact submission and reviewing procedures across computer systems conferences:
We expect this document to gradually evolve based on our AE experience - please stay tuned for more news!