mltrace documentation
===================================

mltrace_ is a lightweight, open-source Python tool to get "bolt-on" observability in ML pipelines. It offers the following:

- interface to define data and ML tests for components in pipelines
- coarse-grained lineage and tracing
- Python API to log versions of data and pipeline components
- database to store information about component runs
- UI and CLI to show the trace of steps in a pipeline taken to produce an output, flag outputs for review, and identify what steps of the pipeline to investigate first when debugging

``mltrace`` is designed specifically for Agile or multidisciplinary teams collaborating on machine learning or complex data pipelines. A more detailed blog post on why the tool was developed can be found here_.

.. _mltrace: https://github.com/loglabs/mltrace
.. _here: https://www.shreya-shankar.com/introducing-mltrace/

Design principles
^^^^^^^^^^^^^^^^^

- Simplicity (users should know *exactly* what the tool does)
- Rinse and repeat other successful designs:

  - Decorator design similar to Dagster solids_
  - Logging design similar to MLflow tracking_

- API designed for both engineers and data scientists
- UI designed for people to help triage issues *even if they didn’t build the ETL or models themselves*

.. _solids: https://docs.dagster.io/concepts/solids-pipelines/solids
.. _tracking: https://www.mlflow.org/docs/latest/tracking.html

Roadmap
^^^^^^^

We are actively working on the following:

- Component input and output monitoring
- Stateful testing (i.e., being able to use historical component inputs and outputs in testing and monitoring)
- API to log from any type of file, not just a Python file
- Prometheus integrations to monitor component output distributions
- Support for finer-grained lineage (at the record level)

Guides
^^^^^^

.. toctree::
   :maxdepth: 2

   changelog
   quickstart
   concepts
   logging
   querying
   All functions