ZenML


Introduction

ZenML is an extensible MLOps framework for creating reproducible ML pipelines.
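
To make this concrete, the sketch below wires up a complete training pipeline in the style of the early ZenML `TrainingPipeline` API. The module paths, step classes, hyperparameters, and the dataset path are assumptions drawn from that era of the project and may not match the version you have installed.

```python
# Sketch of a legacy-style ZenML training pipeline.
# NOTE: module paths, class names, and the dataset URL are assumptions
# based on early ZenML releases; check your installed version's docs.
from zenml.core.pipelines.training_pipeline import TrainingPipeline
from zenml.core.datasources.csv_datasource import CSVDatasource
from zenml.core.steps.split.random_split import RandomSplit
from zenml.core.steps.preprocesser.standard_preprocesser.standard_preprocesser import (
    StandardPreprocesser,
)
from zenml.core.steps.trainer.feedforward_trainer import FeedForwardTrainer
from zenml.core.steps.evaluator.tfma_evaluator import TFMAEvaluator

pipeline = TrainingPipeline(name="quickstart")

# 1. Connect data: the datasource is versioned and cached automatically.
pipeline.add_datasource(
    CSVDatasource(name="diabetes", path="gs://zenml_quickstart/diabetes.csv")
)

# 2. Split: 70/30 random train/eval split.
pipeline.add_split(RandomSplit(split_map={"train": 0.7, "eval": 0.3}))

# 3. Transform: standard preprocessing, embedded in the serving graph.
pipeline.add_preprocesser(
    StandardPreprocesser(features=["bmi", "age", "insulin"], labels=["has_diabetes"])
)

# 4. Train: a simple feed-forward network.
pipeline.add_trainer(FeedForwardTrainer(epochs=20, loss="binary_crossentropy"))

# 5. Evaluate: TFMA metrics, sliced by feature.
pipeline.add_evaluator(
    TFMAEvaluator(slices=[["age"]], metrics={"has_diabetes": ["binary_accuracy"]})
)

# Run locally; swap in an orchestration backend to run on the cloud.
pipeline.run()
```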

Features

Guaranteed reproducibility of training experiments via
  • Versioned data, code and models
  • Automatically tracked experiments
  • Declarative pipeline configs
Guaranteed comparability between experiments
Ability to quickly switch between local and cloud environments (e.g. orchestrate pipelines on Kubernetes)
Built-in and extensible abstractions for:
  • Distributed pre-processing on large datasets
  • Cloud-based training jobs
  • Model serving
Pre-built helpers to compare and visualize parameters and results:
  • Automated evaluation of each pipeline run with TensorBoard + TFMA
  • Automated statistics visualization of each pipeline run with TFDV
Cached pipeline states for faster experiment iterations

Usage

1. Connect your data

  • Choose a prebuilt connector for common sources (S3, Google Storage, BigQuery, SQL)
  • Write your own connector to your data or feature store
  • Automatic versioning and caching of your data for faster pipeline start - built-in!
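
For instance, a prebuilt connector can point at a file in a cloud bucket or at a BigQuery table; ZenML then snapshots and versions that data on the first run. The connector class names and arguments below follow the legacy API style and are assumptions.

```python
# Hypothetical connector usage in the legacy API style; names are assumptions.
from zenml.core.datasources.csv_datasource import CSVDatasource
from zenml.core.datasources.bq_datasource import BQDatasource  # assumed name

# A CSV file on Google Cloud Storage (S3 paths work the same way).
csv_source = CSVDatasource(name="sales_csv", path="gs://my-bucket/sales.csv")

# A BigQuery table; the project/dataset/table arguments are illustrative.
bq_source = BQDatasource(
    name="sales_bq",
    query_project="my-project",
    query_dataset="analytics",
    query_table="sales",
)

# Adding either source to a pipeline triggers automatic versioning:
# re-running on unchanged data reuses the cached, versioned snapshot.
# training_pipeline.add_datasource(csv_source)
```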

2. Splitting

  • All common splitting methods supported.
  • Natively distributable.
  • A multitude of custom data splits.
  • All common data types supported, including time-series.
  • Auto-format data to TFRecords for 7x faster training.
  • Automatic caching of split results for faster starts of consecutive pipelines.
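
Beyond a plain random split, grouped or categorical splits keep related rows together, which matters for time-series and per-entity evaluation. The split classes below follow the legacy naming and are assumptions; the split_map pattern (named partitions with ratios or domain values) is the relevant idea.

```python
# Split-step sketches in the legacy API style; class names are assumptions.
from zenml.core.steps.split.random_split import RandomSplit
from zenml.core.steps.split.categorical_domain_split import CategoricalDomainSplit  # assumed name

# Plain random 80/10/10 split.
random_split = RandomSplit(split_map={"train": 0.8, "eval": 0.1, "test": 0.1})

# Domain-based split: route whole categories (e.g. regions) into partitions,
# keeping each group intact - useful for time-series or per-entity evaluation.
domain_split = CategoricalDomainSplit(
    categorical_column="region",
    split_map={"train": ["EU", "US"], "eval": ["APAC"]},
)

# training_pipeline.add_split(random_split)
```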

3. Transform

  • Distributable data preprocessing for lightning-fast pipelines.
  • All common preprocessing methods supported - including time-series.
  • Support for custom tf.functions for custom preprocessing.
  • All transforms are embedded in the training graph for seamless serving.
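
Because every transform is expressed as TensorFlow ops, the same preprocessing runs at training and at serving time. A custom function in the TensorFlow Transform style might look like the following; whether your ZenML version accepts exactly this hook is an assumption, but writing the logic as pure tf/tft ops is the key point.

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Custom preprocessing expressed as TensorFlow ops so it can be
    embedded in the training graph and reused unchanged at serving time."""
    outputs = {}
    # Scale a numeric feature to zero mean / unit variance.
    outputs["bmi_scaled"] = tft.scale_to_z_score(inputs["bmi"])
    # Map a string feature to an integer vocabulary index.
    outputs["region_id"] = tft.compute_and_apply_vocabulary(inputs["region"])
    # Arbitrary tf ops are fine too, e.g. a log transform.
    outputs["insulin_log"] = tf.math.log1p(tf.cast(inputs["insulin"], tf.float32))
    # Pass the label through untouched.
    outputs["has_diabetes"] = inputs["has_diabetes"]
    return outputs
```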

4. Train

  • Training on pre-configured GPU containers.
  • Hyperparameter-tuning natively baked in.
  • Distributable training for large model architectures.
  • Automated resource provisioning.
  • Leverage cloud resources on GCP, AWS, Azure.
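
A trainer step carries the model definition and hyperparameters; pointing it at a GPU container or a cloud training backend is then a configuration change rather than a code change. The trainer class and its keyword arguments below are assumptions in the legacy API style.

```python
# Trainer-step sketch in the legacy API style; names and arguments are assumptions.
from zenml.core.steps.trainer.feedforward_trainer import FeedForwardTrainer

trainer = FeedForwardTrainer(
    batch_size=32,
    epochs=50,
    hidden_layers=[64, 32],          # illustrative architecture knobs
    loss="binary_crossentropy",
    last_activation="sigmoid",
    output_units=1,
    metrics=["accuracy"],
)

# training_pipeline.add_trainer(trainer)
# To train on cloud GPUs, the same step is typically paired with a training
# backend (e.g. a GCP AI Platform backend) instead of changing the code above.
```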

5. Evaluate

  • Automated evaluation for every pipeline.
  • Clearly trace which pipeline steps lead to which results.
  • Absolute freedom - access raw results from Jupyter Notebooks.
  • Bring-your-own-tooling: Evaluate with your own metrics and tools.
  • Compare between pipelines to gain cross-training evaluation insights.
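
Evaluation results are regular TFMA artifacts, so they can be pulled into a notebook and sliced with your own tooling. The evaluator class and the notebook helper named below are assumptions in the legacy API style.

```python
# Evaluation sketch; class names and notebook helpers are assumptions.
from zenml.core.steps.evaluator.tfma_evaluator import TFMAEvaluator

evaluator = TFMAEvaluator(
    # Compute metrics overall and sliced by the 'region' feature.
    slices=[["region"]],
    metrics={"has_diabetes": ["binary_accuracy", "auc"]},
)
# training_pipeline.add_evaluator(evaluator)
# training_pipeline.run()

# In a Jupyter notebook, the raw TFMA results can then be rendered, e.g.:
# training_pipeline.evaluate()        # assumed helper that opens TFMA views
```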

6. Serve

  • Every pipeline yields a servable model - guaranteed.
  • Preprocessing, including custom functions, is embedded in the graph automatically.
  • Trace each model with full lineage.
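
Because preprocessing is baked into the exported graph, each run produces a standard TensorFlow SavedModel that accepts raw inputs. The export path below is a placeholder, and the `examples` input name follows the usual TFX serving convention (an assumption); loading and calling the model looks like this.

```python
import tensorflow as tf

# Placeholder path: wherever your pipeline's artifact store exported the model.
EXPORT_DIR = "/path/to/artifact-store/serving_model_dir"

model = tf.saved_model.load(EXPORT_DIR)
infer = model.signatures["serving_default"]

# The serving signature already contains the preprocessing, so raw
# (serialized) examples can be sent directly, as TF Serving would.
example = tf.train.Example(features=tf.train.Features(feature={
    "bmi": tf.train.Feature(float_list=tf.train.FloatList(value=[27.3])),
    "age": tf.train.Feature(int64_list=tf.train.Int64List(value=[44])),
}))
print(infer(examples=tf.constant([example.SerializeToString()])))
```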

7. Integrations

  • Powerful, out-of-the-box integrations to various backends like Kubernetes, Dataflow, Kubeflow, Seldon, Sagemaker, Google AI Platform, and more.
  • Support for remote and local artifact stores.
  • Easy integration of centralized Metadata Stores (MySQL).
  • Extensible Interfaces to build your own custom integrations.
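
Switching where a pipeline runs is meant to be a matter of attaching different backends rather than rewriting steps. The backend classes named below are assumptions in the legacy API style, shown only to illustrate the pattern.

```python
# Backend sketch; module paths, class names, and arguments are assumptions.
from zenml.core.backends.orchestrator.kubernetes_backend import (
    OrchestratorKubernetesBackend,   # assumed name
)
from zenml.core.backends.processing.dataflow_backend import (
    ProcessingDataFlowBackend,       # assumed name
)

# The same pipeline object defined earlier, now pointed at remote backends:
# orchestration happens on a Kubernetes cluster and heavy preprocessing is
# distributed on Dataflow - the step code itself does not change.
# training_pipeline.run(
#     backends=[
#         OrchestratorKubernetesBackend(image="eu.gcr.io/my-project/zenml:latest"),
#         ProcessingDataFlowBackend(project="my-project", region="europe-west1"),
#     ]
# )
```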

8. Collaborate across your organization

  • Execute distributed data pipelines with a simple configuration.
  • Separate configuration and code for robust pipelines.
  • Reuse pipeline states across users and pipelines.
  • Clearly trace which pipeline steps lead to which results.
  • Compare results of training pipelines over time and across pipelines.
  • Share pipeline layouts with your team.
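
Pipeline runs, their configs, and their artifacts live in a shared repository backed by the metadata and artifact stores, which is what makes cross-user comparison possible. The `Repository` helpers below are assumptions in the legacy API style.

```python
# Repository sketch; the module path and helper names are assumptions.
from zenml.core.repo.repo import Repository

repo = Repository()

# List every pipeline known to this repo (local or shared via the
# centralized metadata store), regardless of who ran it.
for p in repo.get_pipelines():          # assumed helper
    print(p.name)

# Compare training runs across pipelines and over time, e.g. in a notebook.
# repo.compare_training_runs()          # assumed helper
```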

FAQ

Q: Why did you build ZenML?

We built it to scratch our own itch after deploying multiple ML models in production over the last three years. Our team struggled to find a simple yet production-ready solution for developing large-scale ML pipelines, so we built one ourselves, and we are now proud to share it with all of you!

Q: Can I integrate my own, custom processing backend?

Absolutely. We have a clever design for our integration interfaces, so you can simply add your own!
