Magento Business Intelligence Help Center

Private Data Warehouse FAQ

JUMP TO

BROUGHT TO YOU BY

Avatar

Don't see your question here? Get in touch with us!

What is Private Data Warehouse?

The Private Data Warehouse (PDW) is a clone of your Magento BI data warehouse exported to an Amazon Redshift instance. PDW allows you to interact with and explore your data any way you like, whether it's running queries or layering on statistical analysis tools.

Note that PDW is an additional feature we provide for Magento BI, not a requirement. You can still interact with your data via the Data Warehouse Manager. PDW is intended for users who want to directly query their data, build custom applications, or layer third-party tools on top of their warehouse.

How does the update cycle work for PDW?

PDW updates are directly tied to Magento BI updates, so every updated Magento BI batch will have a corresponding PDW update that happens around the same time.

How is my data organized in Amazon Redshift?

All data from Magento BI integrations are stored within tables inside a schema in Redshift titled rjm_current. This schema represents the latest batch of data pulled from your data sources.

In addition to rjm_current, another schema that starts with rjm_pending and a timestamp might also be visible at times. You can ignore this schema, as it’s used for loading the next batch of data and is renamed to rjm_current after importing is complete.

As new batches finish importing into Redshift, the existing rjm_current schema is dropped and the rjm_pending schema is renamed to rjm_current. This allows queries to be saved and reused while at the same time allowing for new batches of data to appear.

Hey! The database views I made are gone - what happened?

Because the rjm_pending schema replaces the rjm_current schema during updates, any database views that point to tables inside of rjm_current will be lost when a new batch of data arrives. We recommend you save the definition of these views and rerun them before beginning any analysis work.

Why are the column names different from what I see in the Data Warehouse Manager?

The column names for each table should match closely to the column names you see inside the Data Warehouse Manager, but there are a few situations where they may be slightly different. This is to accommodate naming restrictions within Redshift. Data typing should also be consistent with what’s shown in the Data Warehouse Manager.

How are nested data structures handled?

Many SaaS integrations used nested data structures, which means each attribute (or column) in a table could have its own set of attributes. PDW is designed to deconstruct these nested structures into separate tables to easier to query the data. Note that Magento BI currently supports only two levels of nesting.

Let’s use Shopify as an example. Shopify orders data uses nested records, which are comprised of three parts - an order record, line items, and tax lines. In the example below, we’ll only use a few attributes in each part to explain how the deconstruction works. Here’s what a Shopify order record might look like:

{ “order_id” : “1234” {
 “created_at” : “2015-01-01 00:00:00”,
 “customer” : “100”,
 “line_items” : [
     { “product_id” : “5008798”,
       “price” : “5.99”,
       “quantity” : “1” } ]
}
}

Order Record (top level)

This is the core data that comprises the order, including the order time, the customer, and so on. The primary key for orders - order_id - is also included. PDW will store this table as orders in Redshift.

order_id [pk] created_at customer
1234 2015-01-01 00:00:00 100
1235 2015-01-01 00:05:10 376

 

Line_Items (second level)

This level contains contains one or more sub records that hold the various items ordered as part of the top level order. Attributes include product information, price, and so on.

PDW will store this table as orders__line_items in the data warehouse, and will append a few additional fields to help connect these rows back to the top level record:

  • _id_of_parent - This is the top level key and points to the top level record this row is associated with.
  • _sub_id - This forms part of a composite primary key for this row.
_id_of_parent _sub_id product_id price quantity
1234 0 5008798 5.99 1
1234 1 3445689 10.99 1
1235 0 3435543 6.99 2

 

If we wanted to return all line items for order number 1234, we’d run the following query:

SELECT * FROM orders__line_items li
WHERE _id_of_parent = 1234

We recommend always joining the top level table to the nested table - this will allow you to avoid queries that may have outdated data.

Was this article helpful?
0 out of 0 found this helpful

Comments

Powered by Zendesk