Make generated datasets available in Swift
The output datasets generated by the content gap pipelines need to be accessible, e.g.
- aggregated metrics to be used by the content gap web interface
- raw article content gap features dataset for research
For this purpose, the research team has storage available on the Swift cluster, requested and approved with this phab. Data engineering added support for propagating credentials, see details here.
Note, however, that once the file objects are stored in Swift (even with a public ACL), they are only reachable from inside the WMF network. An internal service could be used to "proxy" the files and make them publicly available.
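
To make the proxy idea more concrete, here is a rough sketch of what such an internal service could look like, using only the Python standard library. It is an illustration, not a proposal for the actual implementation: the base URL, account name, and container allowlist are hypothetical placeholders, and it assumes the objects are readable without a token (public read ACL) through the standard Swift object path /v1/<account>/<container>/<object>.

```python
import shutil
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import quote, unquote, urlsplit

# Hypothetical values, for illustration only; the real endpoint,
# account and container names are not specified in this task.
SWIFT_BASE_URL = "https://swift.example.internal"
SWIFT_ACCOUNT = "AUTH_research"
ALLOWED_CONTAINERS = {"content-gap-metrics"}


def swift_object_url(container, object_name):
    """Build the internal Swift URL for an object in an allowlisted container."""
    if container not in ALLOWED_CONTAINERS or not object_name:
        raise ValueError("container or object not exposed by the proxy")
    return "{}/v1/{}/{}/{}".format(
        SWIFT_BASE_URL, quote(SWIFT_ACCOUNT), quote(container), quote(object_name)
    )


class SwiftProxyHandler(BaseHTTPRequestHandler):
    """Serve GET /<container>/<object> by streaming the object from Swift."""

    def do_GET(self):
        path = unquote(urlsplit(self.path).path).lstrip("/")
        container, _, object_name = path.partition("/")
        try:
            url = swift_object_url(container, object_name)
        except ValueError:
            self.send_error(404, "Unknown dataset")
            return
        try:
            with urllib.request.urlopen(url, timeout=30) as upstream:
                self.send_response(200)
                self.send_header(
                    "Content-Type",
                    upstream.headers.get("Content-Type", "application/octet-stream"),
                )
                self.end_headers()
                # Stream the body instead of loading the whole file in memory.
                shutil.copyfileobj(upstream, self.wfile)
        except urllib.error.HTTPError as err:
            # Pass Swift's status (e.g. 404) on to the client.
            self.send_error(err.code)
        except urllib.error.URLError:
            self.send_error(502, "Swift unreachable")


def run(port=8080):
    """Start the proxy; blocks until interrupted."""
    HTTPServer(("", port), SwiftProxyHandler).serve_forever()
```

Calling run() starts the service, after which a request to /content-gap-metrics/<object> would be forwarded to the corresponding Swift object. The allowlist keeps the proxy from exposing every container in the account; a real service would also need caching, rate limiting, and a decision on which datasets are safe to publish.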