ZenML:
Introduction
ZenML is an extensible MLOps framework for creating reproducible pipelines.
Features
Guaranteed reproducibility of training experiments via
- Versioned data, code and models
- Automatically tracked experiments
- Declarative pipeline configs
Guaranteed comparability between experiments
Ability to quickly switch between local and cloud environments (e.g. orchestrate pipelines on Kubernetes)
Built-in and extensible abstractions for:
- Distributed pre-processing on large datasets
- Cloud-based training jobs
- Model serving
Pre-built helpers to compare and visualize parameters and results:
- Automated evaluation of each pipeline run with tensorboard + TFMA
- Automated statistics visualization of each pipeline run with TFDV
Cached pipeline states for faster experiment iterations
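The cached-pipeline-states feature above can be pictured as content-addressed caching: if a step's inputs, parameters, and name are unchanged, its previous result is reused. The sketch below is illustrative only (the names `run_step` and `_cache_key` are hypothetical, not ZenML's API), and real caching also keys on data, code, and model versions:

```python
import hashlib
import json

# Illustrative in-memory cache; a real system would persist this
# in an artifact/metadata store.
_CACHE = {}

def _cache_key(step_name, params, input_digest):
    # Deterministic key: same step + same params + same inputs -> same key.
    payload = json.dumps(
        {"step": step_name, "params": params, "inputs": input_digest},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(step_name, params, input_digest, fn):
    key = _cache_key(step_name, params, input_digest)
    if key in _CACHE:          # unchanged inputs: skip recomputation
        return _CACHE[key]
    result = fn()              # cache miss: actually run the step
    _CACHE[key] = result
    return result
```

Running the same step twice with identical inputs executes the work only once, which is what makes consecutive experiment iterations fast.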
Usage
1. Connect your data
- Choose a prebuilt connector for common sources (S3, Google Storage, BigQuery, SQL)
- Write your own connector to your data or feature store
- Automatic versioning and caching of your data for faster pipeline start - built-in!
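A custom connector boils down to two things: reading rows from a source and exposing a stable fingerprint of the data so the pipeline can version and cache it. A minimal sketch, assuming a hypothetical `CSVSource` class (this is not ZenML's actual connector interface):

```python
import csv
import hashlib
import io

class CSVSource:
    """Hypothetical connector: reads CSV text and exposes a content
    digest a pipeline could use for automatic versioning and caching."""

    def __init__(self, text):
        self.text = text

    def version(self):
        # Content hash: any change to the data yields a new version.
        return hashlib.sha256(self.text.encode()).hexdigest()[:12]

    def read(self):
        return list(csv.DictReader(io.StringIO(self.text)))
```

Because the version is derived from content rather than filenames, two identical datasets share one version and one cache entry.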
2. Splitting
- All common splitting methods supported.
- Natively distributable.
- A multitude of custom data splits.
- All common data types supported, including time-series.
- Auto-format data to TFRecords for 7x faster training.
- Automatic caching of split results for faster starts of consecutive pipelines.
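The two most common split methods above can be sketched in a few lines (illustrative helpers, not ZenML's split API): a seeded random split for reproducibility, and a time-based split for time-series data so no future information leaks into training:

```python
import random

def random_split(rows, train_frac=0.8, seed=42):
    """Seeded shuffle split: the fixed seed keeps the split reproducible."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def time_split(rows, key, cutoff):
    """Time-series split: train strictly before the cutoff, eval from
    the cutoff onward, so the model never trains on the future."""
    train = [r for r in rows if r[key] < cutoff]
    evaluation = [r for r in rows if r[key] >= cutoff]
    return train, evaluation
```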
3. Transform
- Distributable data preprocessing for lightning-fast pipelines.
- All common preprocessing methods supported - including time-series.
- Support for custom tf.functions for custom preprocessing.
- All transforms are embedded in the training graph for seamless serving.
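The key idea behind embedding transforms in the training graph is that preprocessing statistics are fitted on training data once, and the resulting transform travels with the model. A conceptual sketch (the function name is hypothetical; ZenML achieves this via the training graph itself):

```python
def fit_standardizer(values):
    """Fit mean/std on training data only, and return the transform as a
    plain function so the identical preprocessing can later be applied
    in the serving path with no training/serving skew."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5 or 1.0   # guard against zero variance
    return lambda x: (x - mean) / std
```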
4. Train
- Training on pre-configured GPU containers.
- Hyperparameter-tuning natively baked in.
- Distributable training for large model architectures.
- Automated resource provisioning.
- Leverage cloud resources on GCP, AWS, Azure.
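Hyperparameter tuning at its simplest is a sweep over a parameter grid, where each configuration could be dispatched as its own (GPU or cloud) training job. A naive sequential sketch, with hypothetical names:

```python
from itertools import product

def grid_search(train_fn, grid):
    """Naive hyperparameter sweep; a real orchestrator would run each
    configuration as a separate, possibly parallel, training job."""
    best_score, best_params = float("-inf"), None
    keys = sorted(grid)
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = train_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```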
5. Evaluate
- Automated evaluation for every pipeline.
- Clearly trace which pipeline steps lead to which results.
- Absolute freedom - access raw results from Jupyter Notebooks.
- Bring-your-own-tooling: Evaluate with your own metrics and tools.
- Compare between pipelines to gain cross-training evaluation insights.
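Cross-pipeline comparison ultimately means ranking recorded runs by a metric of interest. A bring-your-own-tooling sketch (hypothetical helpers, standing in for richer tooling like TFMA):

```python
def compare_runs(runs, metric):
    """Rank recorded pipeline runs by one metric so results can be
    compared across trainings."""
    return sorted(runs, key=lambda r: r["metrics"][metric], reverse=True)

def best_run(runs, metric):
    # Convenience: the single best run by the chosen metric.
    return compare_runs(runs, metric)[0]
```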
6. Serve
- Every pipeline yields a servable model - guaranteed.
- Preprocessing, including custom functions, is embedded in the graph automatically.
- Trace each model with full lineage.
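A servable model with embedded preprocessing can be pictured as a single callable that chains the fitted transform and the predictor, so the endpoint accepts raw inputs. A sketch with hypothetical names (not ZenML's serving API):

```python
class ServableModel:
    """Bundle the fitted preprocessing with the model so serving applies
    exactly the training-time transforms to raw inputs."""

    def __init__(self, transform, predict):
        self.transform = transform
        self.predict = predict

    def __call__(self, raw_input):
        # Preprocessing runs inside the model boundary, not in the client.
        return self.predict(self.transform(raw_input))
```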
7. Integrations
- Powerful, out-of-the-box integrations to various backends like Kubernetes, Dataflow, Kubeflow, Seldon, Sagemaker, Google AI Platform, and more.
- Support for remote and local artifact stores.
- Easy integration of centralized Metadata Stores (MySQL).
- Extensible Interfaces to build your own custom integrations.
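One common way to build the kind of extensible interface described above is a backend registry: custom backends register under a name, and the pipeline looks them up at run time. This is an illustrative pattern, not ZenML's exact integration interface:

```python
_BACKENDS = {}

def register_backend(name):
    """Extension point: custom orchestration backends plug in by name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator

@register_backend("local")
class LocalBackend:
    def run(self, steps):
        # Run every pipeline step in-process, in order.
        return [step() for step in steps]

def get_backend(name):
    return _BACKENDS[name]()
```

A Kubernetes or Dataflow backend would register the same way and submit steps to the cluster instead of running them in-process.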
8. Collaborate across your organization
- Execute distributed data pipelines with a simple configuration.
- Separate configuration and code for robust pipelines.
- Reuse pipeline states across users and pipelines.
- Clearly trace which pipeline steps lead to which results.
- Compare results of training pipelines over time and across pipelines.
- Share pipeline layouts with your team.
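Separating configuration from code means the pipeline layout lives in a declarative config that teammates can share and rerun, while the code stays generic. A minimal sketch, assuming a JSON config and a hypothetical `run_from_config` helper:

```python
import json

def run_from_config(config_text, step_registry):
    """Execute a pipeline described declaratively: the config names the
    steps and their params; the registry maps names to step functions."""
    config = json.loads(config_text)
    data = config["input"]
    for step in config["steps"]:
        fn = step_registry[step["name"]]
        data = fn(data, **step.get("params", {}))
    return data
```

Because the config is plain data, it can be versioned, diffed, and shared across users independently of the code.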
FAQ
Q: Why did you build ZenML?
We built it because we scratched our own itch while deploying multiple ML models in production over the last 3 years. Our team struggled to find a simple yet production-ready solution whilst developing large-scale ML pipelines, so we built one ourselves - and we are now proud to share it with all of you!
Q: Can I integrate my own, custom processing backend?
Absolutely. We have a clever design for our integration interfaces, so you can simply add your own!