Today more than ever, success with E-Commerce (E-Comm) requires constant supervision and evolution. Online platforms like Shopify and Amazon make it easy to enter a market, but they also create an extremely competitive environment. Part of keeping E-Comm profitable is an analysis commonly called “Marketing Spend Analysis”. There are many versions of this analysis, most of which include metrics related to advertising costs, budget, conversions, and revenue.
In the following writeup, we cover an actual project designed for the E-Comm (D2C) portion of a customer’s business. The customer is a high-growth, mid-sized company that used many of the most popular platforms – Shopify, Amazon MWS, Google AdWords, Microsoft Advertising / Bing, Google Analytics and more. However, the manual process of collecting data no longer supported their business. They concluded that a timelier, more accurate solution would lead to better decisions and result in greater profits.
Because the customer relied on so many platforms, the solution had to be capable of sourcing data from each of them, storing it, harmonizing it, and presenting it to upper management and the marketing teams.
The following systems were considered in scope:
· Facebook Advertising – Marketing
The above is a diagram of the customer’s workbook-based solution. All downloads were performed manually by one or more staff members. Each file in the diagram represents an actual download file, with Google Analytics requiring four distinct downloads for each required time period.
The customer faced many of the same issues that every growing department faces. In the beginning, a workbook-based solution started small and met their needs. As E-Comm became a higher priority, the growing platform count, data volume, and visibility drove the need for a long-term solution. Their analysts were spending more time collecting data than actually analyzing it, and human errors were gaining wider visibility.
The solution used tools from the traditional Microsoft Data Warehousing domain along with newer tools such as Azure Data Factory (ADF) and Azure Databricks. Below is a discussion of the components used and their overall purpose.
Microsoft Azure was used as the platform on which to build the solution. Its tight integration with the customer’s existing Office 365 installation, along with the low cost of the required components (ADF, MS SQL Server, Azure AD, etc.), made it the winning cloud platform for this particular customer.
Azure Data Factory (ADF) – ADF is an ETL/integration platform used to move data from one environment to another. At the time of this project, ADF had roughly 80 pre-defined connectors designed to seamlessly connect sources and targets. In practice, many of these connectors provide only basic connectivity, and a working knowledge of the source’s API is still required.
Example: the Shopify connector. The ADF connector allows for a quick connection to Shopify’s REST API (via OAuth2); however, a developer must be familiar with the API, its URLs/endpoints, and its parameters to control data flow. For most connectors, ADF provides basic connectivity and the customer must build out the rest as needed.
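To illustrate the kind of API knowledge the connector still assumes, below is a minimal Python sketch of pulling orders from Shopify’s REST Admin API. The shop name, API version, token, and date filter are hypothetical placeholders; the endpoint path, the X-Shopify-Access-Token header, and the Link-header (page_info) pagination follow Shopify’s published REST Admin API, but verify them against the API version you target.

```python
import requests

SHOP = "example-shop"        # hypothetical store name
API_VERSION = "2023-04"      # the version your app is pinned to
TOKEN = "shpat_xxx"          # OAuth2 access token obtained separately

def fetch_orders(created_at_min):
    """Pull orders page by page from the Shopify REST Admin API."""
    url = f"https://{SHOP}.myshopify.com/admin/api/{API_VERSION}/orders.json"
    params = {"status": "any", "limit": 250, "created_at_min": created_at_min}
    headers = {"X-Shopify-Access-Token": TOKEN}
    orders = []
    while url:
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        orders.extend(resp.json()["orders"])
        # Cursor-based pagination: the next page URL arrives in the Link header.
        url = resp.links.get("next", {}).get("url")
        params = None  # page_info URLs already carry the query parameters
    return orders

if __name__ == "__main__":
    print(len(fetch_orders("2020-01-01T00:00:00Z")))
```

This is the sort of endpoint and parameter detail the developer supplies on top of the basic connectivity ADF provides.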
Azure SQL Database – A relational database was chosen to store data for reporting. Analysts are accustomed to working with relational databases and writing SQL, and Power BI can easily connect to one. Azure SQL Database comes with automatic backups, supports replication, and contains most of the features of an on-premises SQL Server installation. Management of the database is performed via the Azure portal, and connectivity is through standard tooling such as SQL Server Management Studio. A managed-instance deployment option is also available; for this project, we chose the pay-as-you-go option, which includes a base price for storage plus processing-time fees.
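As a rough illustration of how analysts and downstream tools reach the reporting database, the sketch below connects with pyodbc and runs a simple aggregation. The server, database, credentials, and table/column names are placeholders for illustration, not the customer’s actual schema.

```python
import pyodbc

# Placeholder connection details – substitute your own server, database, and credentials.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:yourserver.database.windows.net,1433;"
    "Database=MarketingSpend;"
    "Uid=analyst;Pwd=YourStrongPassword;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

query = """
SELECT  channel,
        SUM(ad_spend) AS total_spend,      -- hypothetical columns
        SUM(revenue)  AS total_revenue
FROM    dbo.MarketingSpendDaily            -- hypothetical harmonized table
GROUP BY channel
ORDER BY total_spend DESC;
"""

# The same database is what Power BI connects to for the dashboards.
with pyodbc.connect(conn_str) as conn:
    for row in conn.cursor().execute(query):
        print(row.channel, row.total_spend, row.total_revenue)
```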
Azure Storage Account w/ Azure Data Lake Storage (ADLS Gen2) – Azure Blob Storage is an inexpensive way to store any kind of data. An Azure Storage Account is fully integrated with Azure security and includes the File Share, Table, and Blob storage types. Basically, anything can be stored in it, in any format, securely and redundantly. For this project, a data lake was created using the File Share and Blob storage types. The data lake was used to support data ingestion and archiving.
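For a sense of the ingestion/archiving pattern, here is a minimal sketch that lands a raw extract in the lake using the azure-storage-blob package. The connection string, container name, and folder layout are assumptions for illustration, not the project’s actual naming.

```python
import json
from datetime import date
from azure.storage.blob import BlobServiceClient

# Placeholder connection string for the storage account behind the lake.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

def archive_raw_extract(source_name, payload):
    """Write one day's raw API extract into a source/date folder in the lake."""
    service = BlobServiceClient.from_connection_string(CONN_STR)
    container = service.get_container_client("raw")          # hypothetical container
    blob_path = f"{source_name}/{date.today():%Y/%m/%d}/extract.json"
    container.upload_blob(blob_path, json.dumps(payload), overwrite=True)

archive_raw_extract("shopify_orders", [{"id": 1, "total_price": "19.99"}])
```

Keeping the raw extracts in cheap blob storage also preserves an audit trail if a load ever needs to be replayed.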
Databricks – Azure Databricks is a managed analytics platform built on open-source Apache Spark and commonly programmed in Python. It is designed to process large amounts of data across multiple nodes (think “Big Data”). In this project, each of the sources had its own API. For some sources, we used Python code to access the API, and Databricks was used to run that code. Other Azure options were available, but the tight integration between Azure Data Factory and Databricks made it the logical choice.
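The notebook pattern looked roughly like the sketch below: pull from a source API with plain Python, hand the result to Spark, and write it to the lake. The endpoint, columns, and output path are hypothetical stand-ins, and in a real Databricks notebook the `spark` session is already provided for you.

```python
import requests
from pyspark.sql import SparkSession

# In a Databricks notebook `spark` already exists; this line only matters
# when running the sketch elsewhere.
spark = SparkSession.builder.getOrCreate()

# Hypothetical source API – the real project had one such call per platform.
rows = requests.get(
    "https://api.example.com/v1/ad_spend",
    params={"date": "2020-01-01"},
).json()

# Convert the JSON records to a Spark DataFrame and land them in the lake.
df = spark.createDataFrame(rows)
(df.write
   .mode("overwrite")
   .parquet("abfss://raw@yourlake.dfs.core.windows.net/ad_spend/2020-01-01"))
```

ADF then orchestrates these notebooks, so each source’s extract runs on a schedule rather than as a manual download.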
Power BI (PBI) was selected as the visualization tool. It includes all the presentation controls needed (charts, graphs, slicers, filters, etc.) and, at the time, was also the lowest-cost option. For this use case, when comparing the value proposition of PBI to other tools, PBI was the clear winner.
Power BI Service – The PBI Service is a cloud service hosted by Microsoft. It is subscription-based and is used to publish and share dashboards. The PBI Service comes in a “shared” capacity via “Power BI Pro” licensing (CPU time is shared with other PBI customers) or in a dedicated capacity via a “Premium” subscription. The client’s needs were met by PBI Pro, and the monthly fee of $9.99 per user fell within budget. Free end-user access is also available for some view-only scenarios.
Power BI Desktop – The PBI Desktop is where all the development work is done. The Desktop is locally installed and used to load, wrangle, model, and visualize data. It’s free, widely distributable and built with an analyst in mind. Once a data model is completed and is ready for a wider audience, it can be published to the PBI Service where updates can be scheduled, and access controlled.
Note: Some of the reporting capabilities of PBI Desktop are embedded in the PBI Service – so power users do not need to use the Desktop. We, however, prefer the Desktop for most of our development work.
Miscellaneous – Microsoft Power Apps was used to provide end users the ability to manually update specific database tables (mapping & status tables). Azure Logic Apps was used to send status/notification emails from the Azure Data Factory (ADF).
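The notification piece is essentially an HTTP call to a Logic App’s request trigger, which then formats and sends the email. In the solution that call is made from ADF (via a Web activity); the Python sketch below shows the same POST purely for illustration, with a placeholder trigger URL and a hypothetical status payload.

```python
import requests

# Placeholder URL – a Logic App with an HTTP "request" trigger exposes a callback URL like this.
LOGIC_APP_URL = "https://prod-00.eastus.logic.azure.com/workflows/.../triggers/manual/paths/invoke?..."

payload = {
    "pipeline": "LoadShopifyOrders",   # hypothetical pipeline name
    "status": "Succeeded",
    "rows_loaded": 12345,
}

# In the actual solution ADF performs this POST; the Logic App then emails
# the status to the appropriate distribution list.
requests.post(LOGIC_APP_URL, json=payload, timeout=30)
```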
The above solution takes data from today’s leading cloud platforms and combines it for analysis, providing timely insight into both marketing spend and its return. Many companies struggle with this, so if you have this issue, you are not alone; many customers contact us with issues related to just a subset of these sources. Feel free to contact us with questions about a single source or about this solution in its entirety. We will help you however we can. Thank you for reading!
The above solution was designed and implemented by Eastern Analytics, Inc. Please direct all questions to Scott.Pietroski@eastern-analytics.us