8 databases supporting in-database machine learning

In my August 2020 article, “How to choose a cloud machine learning platform,” my first guideline for choosing a platform was, “Be close to your data.” Keeping the code near the data keeps latency low, since the speed of light limits transmission speeds. After all, machine learning, and especially deep learning, tends to make multiple passes over all your data (each pass is called an epoch).

I said at the time that the ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent. The natural next question is, which databases support internal machine learning, and how do they do it? I’ll discuss those databases in alphabetical order.

Amazon Redshift

Amazon Redshift is a managed, petabyte-scale data warehouse service designed to make it simple and cost-effective to analyze all of your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year.

Amazon Redshift ML is designed to make it easy for SQL users to create, train, and deploy machine learning models using SQL commands. The CREATE MODEL command in Redshift SQL defines the data to use for training and the target column, then passes the data to Amazon SageMaker Autopilot for training via an encrypted Amazon S3 bucket in the same zone.
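To make the shape of that command concrete, here is a minimal sketch in Python that only composes a CREATE MODEL statement as text. It does not connect to Redshift. The model name, table, columns, function name, IAM role ARN, and bucket name are all hypothetical placeholders, and the clause layout is an assumption based on the basic form of the command, so check it against the current Redshift documentation before use.

```python
def build_create_model_sql(model_name, training_query, target_column,
                           function_name, iam_role_arn, s3_bucket):
    """Compose a Redshift ML CREATE MODEL statement as a string.

    Values are interpolated directly, so pass only trusted identifiers;
    this is an illustration, not a safe query builder.
    """
    return "\n".join([
        f"CREATE MODEL {model_name}",
        f"FROM ({training_query})",          # data to use for training
        f"TARGET {target_column}",           # column the model should predict
        f"FUNCTION {function_name}",         # name of the prediction SQL function
        f"IAM_ROLE '{iam_role_arn}'",        # role allowed to use SageMaker and S3
        f"SETTINGS (S3_BUCKET '{s3_bucket}');",  # bucket used to hand off the data
    ])


# All names below are hypothetical placeholders.
create_model_sql = build_create_model_sql(
    model_name="customer_churn_model",
    training_query="SELECT age, plan_type, monthly_charges, churned FROM customers",
    target_column="churned",
    function_name="predict_churn",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-redshift-ml-bucket",
)
print(create_model_sql)
```

You would run the resulting statement from whatever SQL client you already use with your cluster; training then happens outside the database, in SageMaker Autopilot.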

After AutoML training, Redshift ML compiles the best model and registers it as a prediction SQL function in your Redshift cluster. You can then invoke the model for inference by calling the prediction function inside a SELECT statement.
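As a rough illustration of the inference step, the Python sketch below builds a SELECT statement that calls a registered prediction function. The function name predict_churn, the table new_customers, and the column names are hypothetical; the real function name is whatever you chose when creating the model, and its arguments must match the feature columns used for training.

```python
def build_prediction_sql(function_name, feature_columns, table_name, id_column):
    """Compose a SELECT that invokes a Redshift ML prediction function.

    Identifiers are interpolated directly, so pass only trusted values.
    """
    args = ", ".join(feature_columns)  # features in the order used for training
    return (
        f"SELECT {id_column}, {function_name}({args}) AS prediction\n"
        f"FROM {table_name};"
    )


# Hypothetical names, for illustration only.
prediction_sql = build_prediction_sql(
    function_name="predict_churn",
    feature_columns=["age", "plan_type", "monthly_charges"],
    table_name="new_customers",
    id_column="customer_id",
)
print(prediction_sql)
```

Because the model is exposed as an ordinary SQL function, the same call can appear anywhere a scalar expression is allowed, such as a WHERE clause or a view definition.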

Summary: Redshift ML extracts the data you specify in a SQL statement to an S3 bucket, then uses SageMaker Autopilot to automatically create prediction models from that data. The best model found is registered as a prediction function in the Redshift cluster.

Copyright © 2021 IDG Communications, Inc.
