@gmodena I did this while I was on a long train ride on my break in early December. Lemme know what you think!
This seems like something we'd probably want to have before too many other folks (enterprise?) use this API. We've already abstracted away Flink Row; we should probably abstract these other Flink types too?
In this MR we move code that depends on datacenter partitions to always depend on both main datacenters (eqiad and codfw).
Since we now have canary events on all streams, this change avoids having to manually update the code every time we do a datacenter switchover.
Bug: T338796
We should probably change this to event.mediawiki.page_change_v1 sometime.
Since we are removing the refinery oozie directory, this util script has been moved into the python folder.
Needs: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/983674/
Bug: T336739
Perhaps:
from analytics.config.dag_config import artifact, create_easy_dag, dataset
# ...
wait_for_webrequest = dataset("hive_wmf_webrequest_all_sources").get_sensor_for(dag)
Or, are you intending to run this before the 'refined' Hive partition is created?
We previously only implemented conversion between PyFlink Rows and Python dicts. We'd prefer not to expose any of the PyFlink types if we can help it. This change implements:
Bug: T349640
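To illustrate the idea of hiding PyFlink types behind plain dicts (this is only a sketch with a hypothetical stand-in Row class, not the actual eventutilities-python implementation):

```python
# Illustrative sketch: `Row` here is a minimal stand-in for
# pyflink.common.Row, so callers never touch PyFlink types directly.
class Row:
    def __init__(self, **fields):
        self._fields = dict(fields)

    def as_dict(self):
        return dict(self._fields)


def row_to_dict(value):
    """Recursively convert (possibly nested) Row values to plain dicts."""
    if isinstance(value, Row):
        return {k: row_to_dict(v) for k, v in value.as_dict().items()}
    if isinstance(value, list):
        return [row_to_dict(v) for v in value]
    return value


def dict_to_row(value):
    """Recursively convert (possibly nested) plain dicts to Row values."""
    if isinstance(value, dict):
        return Row(**{k: dict_to_row(v) for k, v in value.items()})
    if isinstance(value, list):
        return [dict_to_row(v) for v in value]
    return value
```

The point is that conversion is recursive, so nested structs and arrays round-trip cleanly and user code only ever sees dicts and lists.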
Ottomata (efc47935) at 12 Dec 16:53
Finish conversions for pyflink and python native types
Bug: T345806
Requires https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/merge_requests/84 to be merged and released.
Version bump eventutilities-python to 0.18.
Adds support for partitioning Kafka topics with a record message key.
Bug: T345806
Version bump mediawiki-event-utilities to 1.3.3.
This version introduces support for producing Kafka messages with a record key, which allows partitioning Kafka topics while preserving per-key message ordering. The logic is implemented in the upstream Java code, and the resulting record serializer instantiation is transparent to Python code. This change introduces no overhead to the Python process.
Kafka message keys must be declared in the stream configuration (see added unit test for an example).
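For context on why keying preserves ordering: Kafka's default partitioner maps equal keys to the same partition, and ordering is guaranteed within a partition. A toy sketch of that property (the real partitioner uses murmur2 over the serialized key; any deterministic hash shows the same behaviour):

```python
import hashlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Toy stand-in for Kafka's default partitioner. Kafka actually
    uses murmur2 on the serialized key bytes; md5 is used here only
    because it is deterministic and available in the stdlib."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# All events keyed by the same page id land on one partition, so a
# consumer of that partition sees them in production order.
assert partition_for(b"page_123", 8) == partition_for(b"page_123", 8)
```

Events with different keys may land on different partitions, so there is no global ordering across the topic, only per-key ordering.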
Ottomata (8891c0cc) at 05 Dec 16:26
WIP - Finish conversions for pyflink and python native types
This should work, and it's probably good to have to make sure the perms are correct, but I think if you change this on the parent dir, the newly created partition dirs should inherit the proper group settings.
Ottomata (2f5d54dc) at 08 Nov 14:12
Ottomata (843583ac) at 08 Nov 14:12
Merge branch 'T350732_workflow_utils_fix' into 'main'
... and 1 more commit
These have some recent fixes that allow the trigger_release job to work correctly.
Bug: T350732
Yeah, this was temporary; I used it to vet that the change to workflow_utils worked before releasing workflow_utils.
Forgot to merge this yesterday: !76 (merged)
Merging now.
Picks up T350732.
Bug: T350732
Ottomata (2f5d54dc) at 07 Nov 20:32
.gitlab-ci.yml - use workflow_utils v0.14.0 CI templates