cTuning foundation - enabling collaborative and reproducible AI/ML/computer systems/quantum R&D

Thursday, 19 July 2018

New CK released with 500+ reusable, portable and customizable AI/ML components (libraries, models, data sets)

We are pleased to announce a new release of our open-source, community-driven Collective Knowledge framework (CK) with a completely redesigned website: http://cKnowledge.org.

With CK, you can convert your code and data into unified CK components with a common Python API and JSON meta description, and share them with others via private or public repositories (e.g. GitHub). The community can then reuse your components and help adapt them to new research scenarios by extending APIs, meta descriptions and functionality. The open and decentralized nature of CK liberates the community from being locked into any proprietary tools, formats and services.
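As a rough illustration of the idea (not CK's actual implementation), a component can be pictured as a JSON meta description paired with a Python function that takes a dictionary and returns a dictionary. Everything named below is an assumption made for this sketch: the `describe` function, the meta fields `name`, `tags` and `deps`, and the convention of signalling success with a zero `return` code.

```python
import json

# Hypothetical JSON meta description of a shared component.
# The field names are illustrative, not CK's real schema.
META_JSON = """
{
  "name": "my-image-classifier",
  "tags": ["program", "image-classification"],
  "deps": ["tensorflow", "imagenet-val"]
}
"""

def describe(i):
    """Uniform dict-in/dict-out call; i is the input dictionary.

    Returns {'return': 0, ...} on success, or a non-zero 'return'
    plus an 'error' message on failure (an assumed convention).
    """
    try:
        meta = json.loads(i["meta_json"])
        name = meta["name"]
        deps = sorted(meta.get("deps", []))
    except (KeyError, ValueError) as e:
        return {"return": 1, "error": "cannot read meta: " + repr(e)}
    return {"return": 0, "name": name, "deps": deps}

r = describe({"meta_json": META_JSON})
if r["return"] == 0:
    print(r["name"], "depends on", ", ".join(r["deps"]))
```

Keeping the description in JSON and the behaviour behind one calling convention is what lets other people extend either side without breaking existing users.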

For example, you can now take advantage of over 540 CK packages shared by our community to automatically install various AI/ML frameworks and libraries (TensorFlow, TFLite, MXNet, NNVM, TVM, VTA, Caffe, Caffe2, CNTK, cuDNN, ArmCL, PyTorch), models and data sets on Linux, macOS, Windows and Android.

You can also quickly reuse over 340 customizable CK programs, from traditional systems benchmarks to emerging AI applications. This includes all workflows from the 1st ACM ReQuEST tournament to collaboratively benchmark and co-design an efficient SW/HW stack for deep learning inference from the cloud to the edge!

All CK programs automatically manage dependencies using CK packages, unified compilation and customized execution across diverse platforms, frameworks, libraries, models and data sets. Adding a new CK program has also become easier: just invoke “ck add program:my-new-program” and select one of the multiple shared templates! This approach simplifies developing customizable, portable and extensible benchmarks, and can assist new benchmarking initiatives such as MLPerf.

We also continue to improve our universal and ML/AI-based CK autotuner/crowd-tuner with new practical use-cases: performing multi-objective autotuning/co-design of MobileNets across the full software/hardware stack, crowdsourcing benchmarking of different AI frameworks and libraries (TFLite, TensorFlow, Caffe, ArmCL) across Android devices provided by volunteers, and automatically generating adaptive libraries!

Based on user feedback, we have introduced a virtual CK environment with over 200 CK plugins to automatically detect software and data dependencies required by CK programs and experimental workflows. We have also shared over 150 CK modules and over 50 CK productivity functions with a common API which can help you automate and unify various AI/ML/systems research tasks.

We have updated the CK documentation, including first steps, the portable package manager and how to add your own workflows and components. We also plan to redesign our public repository with crowdsourced experiments to make it more dynamic and user-friendly: http://cKnowledge.org/repo

Please join us to discuss CK and related technology at ResCuE-HPC at Supercomputing’18, the 1st workshop on reproducible, customizable and portable workflows which we co-organize with Todd Gamblin (Lawrence Livermore National Laboratory, USA), Michela Taufer (University of Delaware, USA) and Milos Puzovic (The Hartree Centre, UK).

We are now preparing many exciting CK-based projects with our academic and industrial partners around automating artifact evaluation across different AI/ML/systems conferences (SysML, CGO, PPoPP, PACT, SC), collaboratively co-designing efficient SW/HW stacks for emerging AI/ML and quantum workloads, starting new ReQuEST tournaments, and much more! Please get in touch if you would like to know more or participate!

Enjoy,

Your Collective Knowledge team.

Sunday, 1 July 2018

ACM proceedings with reusable Collective Knowledge workflows from the 1st ReQuEST-ASPLOS'18 tournament on reproducible SW/HW co-design of efficient deep learning

Artificial Intelligence (AI), Machine Learning (ML) and other emerging workloads demand efficient computer systems from the cloud to the edge. Systems designers, however, face numerous challenges, from tackling the ever-growing space of design and optimization choices (including algorithms, models, software frameworks, libraries, hardware platforms, optimization techniques) to balancing multiple objectives (including accuracy, speed, throughput, power, size, price). Furthermore, the lack of a common experimental framework and methodology makes it even more challenging to keep up with and build upon the latest research advances.

The ACM Reproducible Quality-Efficient Systems Tournaments initiative (ReQuEST) invites a multidisciplinary community (workloads/software/hardware) to decompose the complex multi-objective benchmarking, co-design and optimization process into customizable workflows with reusable components (see the introduction to ReQuEST). We leverage the open Collective Knowledge workflow framework (CK) and the rigorous ACM artifact evaluation methodology (AE) to allow the community to collaboratively explore quality vs. efficiency trade-offs for rapidly evolving workloads across diverse systems.

The 1st ReQuEST tournament served as a proof-of-concept of our approach. We invited the community to submit complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. For several weeks, four volunteers collaborated with the authors to convert their artifacts into a common CK format and evaluate the converted artifacts on the original or similar platforms. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts).

Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we have opted instead to select multiple implementations from a Pareto-frontier, based on their uniqueness or simply to obtain a reference implementation. The authors of such selected solutions were given an opportunity to share their insights at the associated ReQuEST workshop co-located with the 23rd ACM ASPLOS conference at the end of March 2018 in Williamsburg, VA, USA (ASPLOS is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking).
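To make the selection idea concrete, here is a minimal sketch of picking the non-dominated entries from a set of results. It is not the tournament's actual scoring code; the submission names and numbers are made up, and accuracy is expressed as an error rate so that lower is better for every metric.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and strictly
    better on at least one. Metrics are tuples where lower is better."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_frontier(points):
    """Return the names of entries that no other entry dominates."""
    return [name for name, m in points.items()
            if not any(dominates(other, m)
                       for o, other in points.items() if o != name)]

# Hypothetical submissions: (top-1 error, latency in s/image, price in $)
submissions = {
    "A": (0.30, 0.050, 600),
    "B": (0.25, 0.200, 600),
    "C": (0.30, 0.060, 700),  # no better than A on any metric, worse on two
    "D": (0.40, 0.010, 100),
}

print(pareto_frontier(submissions))  # ['A', 'B', 'D']
```

Each of A, B and D wins on a different metric, so none can be discarded without first deciding how the metrics should be weighted, which is exactly the decision the frontier avoids making.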

The ReQuEST-ASPLOS’18 proceedings, available in the ACM Digital Library, include five papers with Artifact Appendices and a set of ACM reproducibility badges. The proceedings are accompanied by snapshots of Collective Knowledge workflows covering a very diverse model/software/hardware stack:

Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.
Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).
AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.
Platforms: Xilinx Pynq-Z1 FPGA, Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399), a farm of Raspberry Pi devices, NVIDIA Jetson TX1 and TX2, and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The validated ReQuEST-ASPLOS’18 results, available on the ReQuEST scoreboard, also exhibit amazing diversity:

Most importantly, the community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated ReQuEST’18 GitHub projects. First, the workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (e.g. libraries) or rebuilding dependencies via an integrated package manager supporting Linux, Windows, macOS and Android. Second, the workflows can be customized by swapping in new models, data sets, frameworks, libraries, and so on. Third, the workflows can be extended to expose new design and optimization choices (e.g. quantization), as well as evaluation metrics (e.g. power or memory consumption). Finally, the workflows can be used for collaborative autotuning (“crowd-tuning”) to explore huge optimization spaces using devices such as Android phones and tablets, with the best solutions being made available to the community on the online CK scoreboard.
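The "detect first, rebuild otherwise" behaviour mentioned in the first point can be pictured with the sketch below. It only illustrates the control flow and is not how CK implements it: `resolve`, `fake_detect`, `fake_install` and the paths are all hypothetical, and the default detector simply looks for an executable on the PATH.

```python
import shutil

def resolve(dep, detect=shutil.which, install=None):
    """Return (path, how) for a dependency.

    Reuse the dependency if the detector finds it; otherwise fall back
    to the installer, or fail if no installer was supplied.
    """
    path = detect(dep)
    if path:
        return path, "detected"
    if install is None:
        raise LookupError(dep + " not found and no installer given")
    return install(dep), "installed"

# Stand-in detector and installer so the example runs anywhere.
installed = {"gcc": "/usr/bin/gcc"}

def fake_detect(name):
    return installed.get(name)

def fake_install(name):
    path = "/opt/pkgs/" + name
    installed[name] = path  # later lookups will now detect it
    return path

for dep in ["gcc", "libfoo", "libfoo"]:
    print(dep, resolve(dep, fake_detect, fake_install))
```

The second lookup of `libfoo` is reported as detected rather than installed, which is the property that makes a workflow cheap to rerun on a machine that has already been set up.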

Our overwhelmingly positive experience has also allowed us to critically assess several potential issues with scaling up this approach and suggest how to overcome them:

Fair competitive benchmarking between different platforms, frameworks and models is hard work. It requires carefully considering model equivalence (e.g. performing the same mix of operations), input equivalence (e.g. preprocessing the inputs in the same way), output equivalence (e.g. validating the outputs for each input, not just calculating the usual aggregate accuracy score), etc. Formalizing the benchmarking requirements and encapsulating them in shared CK components (e.g. using a framework-independent model representation such as ONNX) and workflows (e.g. for input conversion and output validation) should help standardize and automate the benchmarking process and thus bring order and peace to the galaxy ;).
Thorough artifact evaluation can take several person-weeks. Each submitted workflow needs to be studied in detail in its original form and then converted into a common format. However, the more reusable CK components (such as workflows, modules/plugins, packages) are shared by the community, the easier the conversion becomes. For example, we have successfully reused several previously shared components for models, frameworks and libraries, as well as the universal CK workflow for program benchmarking and autotuning. We propose to introduce a new ACM reproducibility badge for such unified “plug&play” components. This could eventually lead to creating a “marketplace” for Pareto-efficient implementations (code and data) shared as portable, customizable and reusable CK components.
Artifact evaluation may require access to expensive computational resources (e.g. cloud instances with 72-core servers), proprietary tools (e.g. Intel compilers), and auxiliary hardware (e.g. power meters). Raising the profile of AE by widely recognizing its benefits and impact should help us obtain access, licenses and sponsorship from the industry and funding agencies.
Full experimental evaluation can take many weeks (for example, when validating accuracy on 50,000 images on a 100 MHz FPGA board). The AE committee can collaborate with the authors to determine a minimally useful scope for evaluation which would still provide insights to the community. The community can eventually crowdsource full evaluation. In other words, AE can be “staged” with a quick check that the artifacts are “functional” before the camera-ready deadline followed by full evaluation using the ReQuEST methodology. In fact, ReQuEST can grow into a non-profit service to conferences and journals. Sponsorship should help attract experienced full-time evaluators, as well as part-time volunteers to work on unifying and evaluating artifacts and workflows.
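The point about output equivalence in the first issue above can be shown with a small sketch: two implementations may report the same aggregate accuracy while disagreeing on individual inputs. The labels and predictions are invented for illustration, and `accuracy` and `mismatches` are hypothetical helper names.

```python
def accuracy(pred, truth):
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def mismatches(pred_a, pred_b):
    """Indices of inputs on which the two implementations disagree."""
    return [i for i, (a, b) in enumerate(zip(pred_a, pred_b)) if a != b]

truth  = ["cat", "dog", "car", "cat"]
impl_a = ["cat", "dog", "cat", "cat"]  # wrong on input 2
impl_b = ["cat", "cat", "car", "cat"]  # wrong on input 1

print(accuracy(impl_a, truth), accuracy(impl_b, truth))  # 0.75 0.75
print(mismatches(impl_a, impl_b))                        # [1, 2]
```

An aggregate score alone would treat these two implementations as interchangeable; the per-input comparison shows that they are not.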

Our future plans include:

collaborating with the community, our Advisory Board and ACM to address the above issues;
using the ReQuEST experience to assist AE at the upcoming SysML’19 conference;
replacing non-representative benchmarks with realistic workloads;
creating realistic training sets based on mispredictions shared by the community;
improving the benchmarking and co-design methodology, and contributing to emerging benchmarking initiatives such as MLPerf;
collaborating with other competitions such as LPIRC, DAWNBench and SCC on developing a common experimental framework;
standardizing multi-objective autotuning and co-design workflows;
extending unified collection of platform information in CK;
improving and documenting the experimental framework and scoreboard;
generating reproducible and interactive reports (see examples 1 and 2);
adding new shared components such as workloads, data sets, tools and platforms;
automating AE “at the source” by integrating CK workflows with e.g. HotCRP;
standardizing APIs and meta-descriptions of shared components to make them “marketplace-ready”;
running new ReQuEST competitions for other workloads!

We are also organizing a related ResCuE-HPC workshop at Supercomputing’18 to discuss some of the above issues with the community, focusing on how to develop reproducible, customizable and portable workflows for high-performance computing (HPC).

Our long-term vision is to dramatically reduce the complexity and costs of the development and deployment of AI, ML and other emerging workloads. We believe that having an open repository (marketplace) of customizable workflows with reusable components helps to bring together the multidisciplinary community to collaboratively co-design, optimize and autotune computer systems across the full model/software/hardware stack. Systems integrators will also benefit from being able to assemble complete solutions by adapting such reusable components to their specific usage scenarios, requirements and constraints. We envision that our community-driven approach and decentralized marketplace will help accelerate adoption and technology transfer of novel AI/ML techniques similar to the open-source movement.

ACM proceedings with reusable CK workflows and AI/ML components:

"Highly Efficient 8-bit Low Precision Inference of Convolutional Neural Networks with IntelCaffe" [Paper DOI] [Artifact DOI] [CK workflow]
"Optimizing Deep Learning Workloads on ARM GPU with TVM" [Paper DOI] [Artifact DOI] [CK workflow]
"Real-Time Image Recognition Using Collaborative IoT Devices" [Paper DOI] [Artifact DOI] [CK workflow]
"Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference" [Paper DOI] [Artifact DOI] [CK workflow]
"Multi-objective autotuning of MobileNets across the full software/hardware stack" [Paper DOI] [Artifact DOI] [CK workflow]

Organizers (A-Z):

Luis Ceze, University of Washington, USA
Natalie Enright Jerger, University of Toronto, Canada
Babak Falsafi, EPFL, Switzerland
Grigori Fursin, cTuning foundation, France
Anton Lokhmotov, dividiti, UK
Thierry Moreau, University of Washington, USA
Adrian Sampson, Cornell University, USA
Phillip Stanley-Marbell, University of Cambridge, UK

Advisory board (A-Z):

Michaela Blott, Xilinx
Unmesh Bordoloi, General Motors
Ofer Dekel, Microsoft
Maria Girone, CERN openlab
Wayne Graves, ACM
Vinod Grover, NVIDIA
Sumit Gupta, IBM
James Hetherington, Alan Turing Institute
Steve Keckler, NVIDIA
Wei Li, Intel
Colin Osborne, Arm
Andrew Putnam, Microsoft
Boris Shulkin, Magna
Greg Stoner, AMD
Alex Wade, Chan Zuckerberg Initiative
Peng Wu, Huawei
Cliff Young, Google

Acknowledgments:
We thank the ReQuEST Advisory Board for their enthusiastic support of our vision; the ReQuEST authors for being very responsive when converting their workflows to the CK format and during artifact evaluation; Flavio Vella and Nikolai Chunosov for their help with unifying and evaluating submissions; Xipeng Shen and James Tuck for their support for organizing the ReQuEST workshop at ASPLOS’18; Craig Rodkin, Asad Ali and Wayne Graves for helping to prepare the ACM DL proceedings with CK workflows, and the CK community for their contributions.

Sunday, 4 March 2018

Digest of reproducibility activities from the non-profit cTuning foundation and dividiti in 2017

Last year was very intense for the cTuning foundation and dividiti - we continued working closely with the AI, ML and systems communities to automate experimentation while improving reproducibility and reusability of results based on our long-term vision.

First of all, we had a chance to apply the new ACM artifact reviewing and badging policy (which we had co-authored a year earlier as a part of the new ACM taskforce on reproducibility, based on our prior Artifact Evaluation experience) at ACM CGO’18 and PPoPP’18. The good news is that we had yet another record number of submissions: nearly half of the accepted papers submitted artifacts for validation, totaling 30! We also switched from EasyChair to HotCRP for artifact submission and evaluation, since the latter provides a more convenient and anonymous communication mechanism between authors and reviewers, allowing them to resolve issues continuously during evaluation! This also motivates us to remove the “technical clarification” session from Artifact Evaluation at future conferences, since our reviewers already communicate with authors during evaluation!

We also noticed that our Artifact Appendix Template, which we had prepared several years earlier in an attempt to unify AE at CGO, PPoPP and PACT, is now used at other conferences including Supercomputing (see the Artifact Description in this SC’17 paper, which will be used for the CLUSTER competition at SC’18)! We plan to collaborate with our CLUSTER colleagues as a part of the ACM pilot projects to automate artifact evaluation and introduce workflow frameworks to convert these artifacts to our Collective Knowledge format (see the SC’16 paper and the CGO’17 article with artifacts and workflows shared in the CK format).

Finally, we received very positive feedback from the community about our open artifact evaluation at the past CGO/PPoPP’17, which we are considering using even more in the future (see our motivation):

https://github.com/thu-pacman/self-checkpoint/issues/1
https://gitlab.com/michel-steuwer/cgo_2017_artifact/issues/1
https://github.com/SamAinsworth/reproduce-cgo2017-paper/issues/6

At the same time, we noticed several ambiguities with the new policy for “artifacts available” and “artifacts reusable” badges.

After consulting with our ACM colleagues, we updated the reviewing criteria for the “artifacts available” badge at http://cTuning.org/ae/reviewing.html:

The author-created artifacts relevant to this paper will receive an ACM "artifacts available" badge only if they have been placed on a publicly accessible archival repository such as Zenodo, FigShare or Dryad. A DOI will be then assigned to their artifacts and must be provided in the Artifact Appendix! The authors can also share their artifact via ACM DL - in such case they should contact AE chairs to obtain DOI (not yet automated unlike above repositories).

The criteria for the “artifacts reusable” badge turned out to be even vaguer, particularly for systems research, where experimental workflows often involve a very complex and continuously changing algorithm/software/hardware stack. Many authors considered that making their artifact public with a README, a few ad-hoc scripts to build and run experiments, and a Docker image is enough to get the “artifacts reusable” badge.

Each year we see how much our evaluators struggle to deal with numerous ad-hoc, non-portable and often failing scripts, non-unified data formats, and ad-hoc validation. That is why we argue that some sort of common workflow framework, combined with cross-platform package managers and common APIs, must be used to make an artifact easily reusable, portable and customizable (see Collective Knowledge, spack, easybuild, etc.). Such automation and unification can help make Artifact Evaluation sustainable in the longer term, particularly as more artifacts are submitted.

However, since we do not want to enforce our views and have not yet reached a satisfying conclusion, we started discussing these issues at the open Artifact Evaluation discussion session at CGO/PPoPP’18. We also described some of these issues and possible solutions in our CNRS’17 presentation “Enabling open and reproducible computer systems research: the good, the bad and the ugly”.

At the same time, we continue working with ACM, Raspberry Pi foundation and the community to improve automation and experiment crowdsourcing using our open-source Collective Knowledge platform as well as sharing of artifacts and workflows as portable, customizable and reusable components with a common Python API and JSON meta-information.

You can see practical examples of such “plug&play” artifacts and workflows in our recent interactive and reproducible CK-based article “A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques”. It presents our long-term educational initiative to teach students how to benchmark and co-design the software and hardware stack for self-optimizing computer systems in a collaborative and reproducible way. It has all workflows, artifacts and results shared as portable, customizable and reusable CK components via GitHub and FigShare to let the community validate, reuse, improve and build upon them while crowdsourcing experiments via our public CK repository. Furthermore, we want researchers to quickly reuse and compare against shared performance results for common benchmarks and datasets on specific platforms and compilers during feedback-directed compilation and autotuning, rather than spending considerable effort rebuilding and rerunning such experiments!

We are also very excited to organize the 1st ACM ReQuEST tournament based on the Collective Knowledge platform with a consortium of leading universities (Cornell, Washington, Toronto, EPFL, Cambridge) and a strong advisory board: http://cKnowledge.org/request. This novel competition series focuses on reproducible and Pareto-efficient co-design and optimization of the whole application/software/hardware stack for AI, ML, deep learning and other emerging workloads in terms of speed, accuracy, power and costs. More importantly, a growing number of participants will continue improving the common CK-based optimization workflow and sharing portable and customizable AI/ML blocks optimized across diverse models, data sets and platforms from IoT to HPC. The benchmarking results and winning SW/HW/model configurations will be visualized on a public interactive dashboard and grouped according to certain categories (e.g. embedded vs. server). They can also be reproduced, reused, improved and compared against, thanks to the common CK framework. Our eventual goal is to share all winning algorithms and related artifacts as “plug&play” CK components with a common API to let the community immediately validate, customize, reuse and build upon them, thus closing the technology transfer gap and enabling open systems/AI/ML research!

The first edition of ReQuEST will serve mainly as a testbed for our approach, framework and repository, so we decided to limit submissions to deep learning algorithms for image classification. It will be co-located with ASPLOS'18, the ACM conference on Architectural Support for Programming Languages and Operating Systems, which is the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking. Authors of Pareto-efficient or original submissions will be invited to present their findings at the associated ACM workshop. At the end of the tournament, we will provide a report to our advisory board presenting the outcome of the tournament, issues, possible solutions and next steps.

You can find more details about the ReQuEST long-term vision in the following documents:

We look forward to collaborating with all of you in 2018 to automate research and experimentation, improve reproducibility of published results, develop efficient systems for AI and other emerging workloads, accelerate AI/ML/systems research, make a breakthrough in AI and enable intelligent systems everywhere!

Miscellaneous resources (2017)

Events

Initiatives

Example of artifacts in the ACM Digital Library:
- https://dl.acm.org/citation.cfm?doid=2807591.2807619 - SC’16 paper which used Collective Knowledge to share artifacts
- https://doi.org/10.1145/3159940 (see source materials)
- CGO'18 replication package (artifact) with a linked paper

Repositories

Research Data at Springer Nature
Permanent archives acceptable for ACM “artifacts available” badge:
- Zenodo
- FigShare
- Dryad

Tools

Portable workflows using Collective Knowledge Framework
Spack: portable package manager for HPC (we plan to connect CK and Spack in the future)
Scons: a software construction tool (we added support for Scons to CK)
Facebook Buck (a fast build system)
Genome Analysis Toolkit 4 (GATK4) as open source resource to accelerate research
NextFlow: A DSL for data-driven computational pipelines
LabPal: Easily run experiments on a computer
Anonymous Github: a proxy server to support anonymous browsing of Github repositories for open-science code and data
Singularity containers
VC++ Packaging Tool
Google CoLab

Benchmarks and data sets

Articles

See all related resources here.