Monday 16 February 2015

Artifact Evaluation Experience presentation online

I presented our Artifact Evaluation experience at CGO'15/PPoPP'15. The presentation is now available online: http://www.slideshare.net/GrigoriFursin/presentation-fursin-aecgoppopp2015

Overall, the feedback was positive, and we plan to continue AE for CGO'16/PPoPP'16.

Our main task now is to improve the guidelines for artifact submission and reviewing.

We will continue validating our new publication model for ADAPT'16

For a few years we have been promoting a new publication model where papers and related material are submitted to open-access archives, then publicly discussed via Slashdot and Reddit, and only then validated and selected by the program committee.

Though we did not have participants for this publication model at ADAPT'15, it attracted considerable interest, and some colleagues are willing to participate at ADAPT'16.

Furthermore, one of the papers made it to Slashdot, generating considerable feedback and thus supporting our idea. By the way, we just noticed that a similar approach is being proposed in other sciences!

Therefore, we plan to continue validating our new publication model at ADAPT'16 - please stay tuned!

Highest ranked artifacts for CGO'15/PPoPP'15

We would like to congratulate the authors of the following two highest-ranked artifacts from CGO'15/PPoPP'15:


1st place (sponsored by Nvidia)

"The SprayList: A scalable relaxed priority queue"
Justin Kopinsky, Dan Alistarh, Jerry Li and Nir Shavit

Prize: Nvidia Quadro K6000

2nd place (sponsored by the cTuning Foundation)

"A graph-based higher-order intermediate representation"
Roland Leißa, Marcel Köster and Sebastian Hack

Prize: Acer C720P

Sunday 8 February 2015

ADAPT'15 outcome and new publication model for ADAPT'16

A few words about the ADAPT'15 outcome:

* The final program with all PDFs is available online here.

* The following paper received the Nvidia best paper award (a Tesla K40):

"A Self-adaptive Auto-scaling Method for Scientific Applications on HPC Environments and Clouds"
Kiran Mantripragada (IBM Research - Brazil), Alecio Binotto (IBM Research - Brazil) and Leonardo Tizzei (IBM Brazil)

* We had a very interesting discussion about our new open publication model. In spite of some possible issues, there seems to be enough support to try it for ADAPT'16. Interestingly, we just found out that a very similar model is being proposed for other scientific fields (see this blog article). Furthermore, we found the following public discussion on Slashdot about one of the ADAPT'15 papers, which supports our idea: as a researcher, you normally publish to present your work to a broad community, initiate discussions and get feedback to improve your work, unless it's purely for academic promotion.

Please follow our announcements about ADAPT'16 (it will likely be co-located with HiPEAC'16 in Prague and will likely feature the new publication model).

Anaconda Scientific Python Distribution

I recently discovered the Anaconda Scientific Python Distribution. It contains all the libraries needed by the Collective Knowledge framework that I use for auto-tuning, statistical analysis and predictive analytics, so I wanted to share it with you:

https://store.continuum.io/cshop/anaconda
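
As a small illustration of the kind of statistical analysis I mean, here is a minimal sketch, assuming only NumPy and SciPy (both bundled with Anaconda), that summarizes repeated benchmark timings with a mean and a 95% confidence interval. The timing values are made up for the example, and this snippet is my own illustration, not CK code:

  # Minimal sketch: summarizing repeated benchmark timings with NumPy/SciPy.
  # The values below are made up for illustration only.
  import numpy as np
  from scipy import stats

  # Hypothetical wall-clock times (in seconds) from repeated runs of a kernel.
  times = np.array([1.02, 0.98, 1.05, 1.01, 0.97, 1.33, 1.00, 0.99])

  mean = times.mean()
  # 95% confidence interval for the mean (Student's t distribution).
  ci_low, ci_high = stats.t.interval(0.95, len(times) - 1,
                                     loc=mean, scale=stats.sem(times))

  print("mean = %.3f s, 95%% CI = [%.3f, %.3f]" % (mean, ci_low, ci_high))

Even this tiny example exposes a common benchmarking issue: one outlier run (1.33 s above) noticeably widens the confidence interval, which is why reporting a single timing is rarely enough.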

Wednesday 4 February 2015

New Year's digest on collaborative & reproducible research

This list is aggregated from public and private messages and from web browsing. Don't hesitate to send me links via our public mailing list or LinkedIn group (so that your contribution can be acknowledged):
https://groups.google.com/forum/#!forum/collective-mind
http://www.linkedin.com/groups/Reproducible-research-experimentation-in-computer-7433414


=== Misc articles ===

* "Research Wranglers: Initiatives to Improve Reproducibility of Study Findings"
  http://ehp.niehs.nih.gov/122-a188

* Dennis McCafferty, "Should Code be Released?",
  Communications of the ACM, Vol. 53, No. 10, October 2010, DOI: 10.1145/1831407.1831415
  http://dl.acm.org/citation.cfm?id=1831415

* Chris Drummond, "Replicability is not Reproducibility: Nor is it Good Science"
  Proc. of the Evaluation Methods for Machine Learning Workshop
  at the 26th ICML, Montreal, Canada, 2009.
  Copyright: National Research Council of Canada
  http://cogprints.org/7691/7/ICMLws09.pdf

* Science is in a reproducibility crisis - how do we resolve it?
  http://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998

* My blog article on "Automatic performance tuning and reproducibility as a side effect"
  for the Software Sustainability Institute:
  http://www.software.ac.uk/blog/2014-07-22-automatic-performance-tuning-and-reproducibility-side-effect

* Puzzling Measurement of "Big G" Gravitational Constant Ignites Debate
  http://www.scientificamerican.com/article/puzzling-measurement-of-big-g-gravitational-constant-ignites-debate-slide-show/

* White House takes notice of reproducibility in science, and wants your opinion
  http://retractionwatch.com/2014/09/05/white-house-takes-notice-of-reproducibility-in-science-and-wants-your-opinion/

* Problems during performance benchmarking:
  https://homes.cs.washington.edu/~bornholt/post/performance-evaluation.html

  We also experienced many similar issues during our work on auto-tuning
  and machine learning:
  http://hal.inria.fr/hal-01054763
  http://arxiv.org/abs/1406.4020

* ACM SIGOPS Operating Systems Review - Special Issue on Repeatability
  and Sharing of Experimental Artifacts:
  http://dl.acm.org/citation.cfm?id=2723872

* Vinton G. Cerf, "Avoiding 'Bit Rot': Long-Term Preservation of Digital Information" [Point of View]
  http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=5768098

* Less related though interesting (about citations):
  http://www.nature.com/news/the-top-100-papers-1.16224

=== Future events ===

* February 9, 2015, CGO/PPoPP joint session on artifact evaluation experience
  San Francisco, 17:15 - 17:35
  Grigori Fursin and Bruce Childers

* November 1-4, 2015, Dagstuhl Perspectives Workshop
  "Artifact Evaluation for Publications"
  Bruce Childers, Shriram Krishnamurthi, Grigori Fursin, Andreas Zeller

* March 13-18, 2016, Dagstuhl Seminar 16111
  "Rethinking Experimental Methods in Computing"

=== Past events ===

* Oct 27-30, 2014, Washington DC, US
  "1st International Workshop on Collaborative methodologies to Accelerate
  Scientific Knowledge discovery in big data (CASK) 2014"

  In conjunction with 2014 IEEE International Conference on Big Data
  (IEEE BigData 2014)
 
  http://bigscientificdata.org/cask14

* September 1, 2014:  Special journal issue on reproducible research methodologies
  in IEEE Transactions on Emerging Topics in Computing (TETC).

  http://www.occamportal.org/images/reproduce/TETC-SI-REPRODUCE.pdf

* January 2015:

  ACM SIGOPS Operating Systems Review
  Special Issue on Repeatability and Sharing
  of Experimental Artifacts
 
  http://www.sigops.org/osr.html

=== Journals/Conferences with reproducible articles ===
* IPOL Journal: Image Processing On Line
  http://www.ipol.im

=== Tools ===
* NGS pipelines - integrates pipelines and user interfaces
  to help biologists analyse data output by biological
  applications such as RNAseq, sRNAseq, ChipSeq and BS-seq:
  https://mulcyber.toulouse.inra.fr/projects/ngspipelines

* Skoll: a process and infrastructure for distributed, continuous quality assurance
  http://www.cs.umd.edu/projects/skoll/Skoll/Home.html

* NEPI: Simplifying network experimentation:
  http://nepi.inria.fr

* RR (Mozilla project): records nondeterministic executions and debugs them deterministically
  http://rr-project.org

* Burrito: Rethinking the Electronic Lab Notebook
  http://pgbovine.net/burrito.html

* Collective Knowledge (cTuning v4): our tool and repository to simplify code and data sharing as reusable components (for collaborative and reproducible R&D); a minimal usage sketch follows below:
  http://github.com/ctuning/ck
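
  A rough sketch of how CK can be scripted from Python via its kernel API
  (the module and action names here follow the CK documentation, but they
  may differ across versions, so treat them as assumptions):

    # Rough sketch: driving Collective Knowledge from Python.
    # Assumes CK is installed (e.g. from the GitHub repository above).
    import ck.kernel as ck

    # List locally registered CK repositories.
    r = ck.access({'action': 'list', 'module_uoa': 'repo'})
    if r['return'] > 0:
        ck.err(r)  # print the error message and exit

    for entry in r.get('lst', []):
        print(entry['data_uoa'])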

=== Online workflows ===

* RunMyCode:
  http://www.runmycode.org

* AptLab:
  https://www.aptlab.net

=== Projects ===
* OpenLab:
  http://www.ict-openlab.eu

* EU Recode project
  http://recodeproject.eu/events/upcoming-events

* CERN: opendata
  http://opendata.cern.ch

* Research Data Alliance:
  https://rd-alliance.org

* Open Data Institute:
  http://opendatainstitute.org

=== Online archives/repos ===

* Olive Archive (preserving executable content):
  https://olivearchive.org

* Tera-PROMISE:
  http://openscience.us/repo

* OpenAIRE (CERN):
  https://www.openaire.eu

* Zenodo:
  https://zenodo.org

* ResearchCompendia:
  http://researchcompendia.org

* Internet Archive:
  https://archive.org

* The national archives:
  http://www.nationalarchives.gov.uk

* Wikidata:
  http://www.wikidata.org/wiki/Wikidata:Introduction

* The Digital Preservation Network:
  http://www.dpn.org

* Open datasets:
  https://open-data.europa.eu

* DataHub:
  http://datahub.io

* DataCite: citing data via DOIs (based in Germany, has connections with CERN)
  https://www.datacite.org/contact

* CrossRef:
  http://www.crossref.org

* International DOI Foundation
  http://www.doi.org

* Our new pilot Collective Knowledge repository:
  http://cknowledge.org/repo