
Sharing Context Between Tasks in Databricks Workflows


Databricks Workflows is a fully managed service on Databricks that makes it easy to build and manage complex data and ML pipelines in your lakehouse without the need to operate complex infrastructure.

Often, a task in an ETL or ML pipeline depends on the output of an upstream task. An example would be evaluating the performance of a machine learning model and then having a task decide whether to retrain the model based on the model's metrics. Since these are two separate steps, it is best to have separate tasks perform the work. Previously, accessing information from an upstream task required storing that information outside of the task's context, such as in a Delta table.

Databricks Workflows is introducing a new feature called "Task Values", a simple API for setting and retrieving small values from tasks. Tasks can now output values that can be referenced in subsequent tasks, making it easier to create more expressive workflows. Looking at the history of a task run also provides more context, by showing the values passed between tasks at the DAG and task levels. Task values can be set and retrieved through the Databricks Utilities API.
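As a minimal sketch of how this could look with dbutils (the "model_accuracy" key, the metric value, and the retraining threshold below are hypothetical; only the "evaluate_model" task name comes from the example above):

# In the upstream "evaluate_model" task: emit a small value for downstream tasks.
# dbutils is available by default in Databricks notebooks, so no import is needed.
accuracy = 0.87  # hypothetical metric computed earlier in this task
dbutils.jobs.taskValues.set(key="model_accuracy", value=accuracy)

# In a downstream task: read the value emitted by "evaluate_model".
# debugValue is returned when the notebook runs outside of a job, e.g. interactively.
accuracy = dbutils.jobs.taskValues.get(
    taskKey="evaluate_model",
    key="model_accuracy",
    default=0.0,
    debugValue=0.0,
)

if accuracy < 0.9:  # hypothetical threshold
    print("Accuracy below threshold; trigger model retraining.")

Values set this way are meant to be small, so larger artifacts such as datasets or model files should still be passed through storage like Delta tables.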

The history of the run shows that the "evaluate_model" task has emitted a value.
When clicking on the task, you can see the values emitted by that task.

Task values are now generally available. We'd love for you to try out this new functionality and tell us how we can improve orchestration even further!


