E-Commerce – Marketing Spend Analysis – Using Microsoft Azure and Power BI
- March 28 2022
- Scott Pietroski
Marketing Spend Analysis – Intro
Today more than ever, success with E-Commerce (E-Comm) requires constant supervision and evolution. Online platforms like Shopify and Amazon make it easy to enter a market, but they also create an extremely competitive environment. Part of keeping E-Comm profitable is an exercise called “Marketing Spend Analysis”. There are many versions of it, most of which include metrics related to advertising costs, budget, conversions, and revenue.
In the following writeup, we cover an actual project built for the direct-to-consumer (D2C) E-Comm portion of a customer’s business. The customer is a high-growth, mid-sized company that used many of the most popular platforms – Shopify, Amazon MWS, Google AdWords, Microsoft Advertising (Bing), Google Analytics, and more. However, the manual process of collecting data no longer supported their business. They concluded that a timelier, more accurate solution would lead to better decisions and greater profits.
Required Platforms/Data Sources:
The customer used many of the most popular platforms. The solution had to be capable of sourcing data from each platform, storing it, harmonizing it, and presenting it to upper management and the marketing teams.
The following systems were considered in scope:
- Facebook Advertising – Marketing
- Google AdWords – Marketing
- Microsoft Advertising (Bing) – Marketing
- Shopify – E-Comm Order Processing
- Google Analytics – Web traffic analytics
- Amazon Advertising – Marketing
- Amazon Web Marketplace – E-Comm Order Processing and fulfillment (FBA)
Existing Environment – Workbook-based solution:
Figure: The original marketing spend analysis (workbook-based solution).
The diagram above shows the customer’s workbook-based solution. All downloads were performed manually by one or more staff members. Each file in the diagram represents an actual download file, with Google Analytics requiring four distinct downloads for each time period.
Business Problem:
The customer faced many of the same issues every growing department faces. In the beginning, the workbook-based solution started small and met their needs. As E-Comm became a higher priority, the growing platform count, data volume, and visibility drove the need for a long-term solution. Their analysts were spending more time collecting data than actually analyzing it, and human errors were gaining wider visibility.
- Manual Data Collection – All 7 source platforms have their own internal analytics/reporting functionality. During the manual collection process, an analyst would log into each system, then identify and execute individual reports. The results were exported to a file (.CSV or .XLS) and later added to a ‘master workbook’. Some sources (e.g., Google Analytics) required 4 different exports per time period.
- Timeliness of Data – Due to the required effort, some platforms were downloaded weekly and others monthly. All data was extracted at the day level (by date); however, the download process was so cumbersome that the analysis was only being performed monthly. To gain real value from their data, the business needed to have visibility daily. They needed to be able to determine which ads were working and pause those that were not.
- Error Prone – All 7 sources provide data in distinct formats and at distinct levels of detail. An analyst would take the data, add it to the ‘master workbook’, and attempt to harmonize it. The harmonization process relied heavily on VLOOKUPs, formulas, cross-references, and adjustments. The analyst had to continually adjust the workbook to support new values, new formulas, higher volumes, and system changes. Most months this process took 2-4 days, including reconciliations and final adjustments.
- Scalability – As new sources and rows were added, the complexity continued to grow. From some systems, new variants of data were arriving monthly. As the data evolved so did the workbook and its complexity. Eventually, file size even became an issue.
- Security – There was no real security when it came to workbook-based reporting. Workbooks were usually stored on a shared drive and emailed out. The ‘master’ workbook approach worked well for those authorized, but it could easily be shared, which sometimes led to outside scrutiny. Releasing only a subset of the data was not always possible, so in the end visibility was usually restricted even though other groups might have benefited from the data. The customer needed a solution that allowed control over access to a report and also to the data within it. Ideally, each manager would be restricted to their own data, while upper management could see it all.
- One Version of the Truth – The more versions of the workbook that existed, the easier it was for someone to become misinformed. Version management had become such a problem that directors would email the analyst requesting the most recent workbook rather than rely on the published version. Thus ensued the inefficient monthly process of email tag, checking the dates on files, and so on.
The Solution – Built Using Microsoft Azure and Power BI
Microsoft Azure and Power BI: Tools and services used
The solution used tools from the traditional Microsoft Data Warehousing domain along with newer tools such as Azure Data Factory (ADF) and Azure Databricks. Below is a discussion of the components used and their overall purpose.
Microsoft Azure
Microsoft Azure was used as the platform on which to build the solution. Its tight integration with the customer’s existing Office 365 installation, along with the low cost of the required components (ADF, MS SQL Server, AD, etc.), made it the winning cloud platform for this particular customer.
Azure Data Factory (ADF) – ADF is an ETL/integration platform used to move data from one environment to another. At the time of this project, ADF had roughly 80 pre-defined connectors designed to seamlessly connect sources and targets. In practice, many connectors provide only basic connectivity, and a working knowledge of the source’s API is still required.
Example: the Shopify connector. The ADF connector allows a quick connection to Shopify’s REST API (via OAuth2); however, a developer must be familiar with the API, its URL/endpoints, and its parameters to control data flow. For most connectors, ADF provides basic connectivity and the customer must build it out as needed.
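To illustrate the kind of API familiarity the connector assumes, below is a minimal Python sketch of a daily order pull against Shopify’s REST Admin API. The shop name, API version, and access token are placeholders, and the endpoint and parameters should be checked against the API version your app is approved for.

```python
import requests

# Placeholders -- real values come from the custom Shopify app / OAuth2 setup.
SHOP = "your-store"              # hypothetical shop name
API_VERSION = "2022-01"          # pin an explicit API version
TOKEN = "shpat_xxxxxxxx"         # access token obtained via OAuth2

BASE_URL = f"https://{SHOP}.myshopify.com/admin/api/{API_VERSION}"
HEADERS = {"X-Shopify-Access-Token": TOKEN}

def fetch_orders(created_at_min: str) -> list:
    """Pull all orders created on/after a timestamp, following the
    cursor-based (Link header) pagination Shopify uses."""
    url = f"{BASE_URL}/orders.json"
    params = {"status": "any", "created_at_min": created_at_min, "limit": 250}
    orders = []
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        orders.extend(resp.json().get("orders", []))
        # The rel="next" link already carries the query string, so the
        # explicit params are only sent on the first request.
        url = resp.links.get("next", {}).get("url")
        params = None
    return orders

daily_orders = fetch_orders("2022-03-01T00:00:00-05:00")
```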
Azure SQL Database – A relational database was chosen to store the data for reporting. Analysts are accustomed to working with relational databases and writing SQL, and Power BI can easily connect to one. Azure SQL Database comes with automatic backups, supports replication, and includes most of the features of an on-premises SQL Server installation. Management of the database is performed via the Azure portal, and connectivity is available through standard SQL Server tools such as SQL Server Management Studio (SSMS). Several service tiers are available; for this project, we chose the pay-as-you-go option, which includes a base price for storage plus processing-time fees.
Azure Storage Account w/ Azure Data Lake Storage Gen2 (ADLS Gen2) – Azure Blob Storage is an inexpensive way to store any kind of data. An Azure Storage Account is fully integrated with Azure security and includes the File Share, Table, and Blob storage types. Basically, anything can be stored in it, in any format, securely and redundantly. For this project, a data lake was created using the File Share and Blob storage types. The data lake was used to support data ingestion and archiving.
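As a small illustration of the ingestion/archiving pattern, the sketch below lands a raw API extract in Blob Storage with the azure-storage-blob SDK. The connection string, container name, and folder layout are assumptions for illustration, not the project’s actual naming.

```python
import json
from datetime import date
from azure.storage.blob import BlobServiceClient

# Connection string, container name, and folder layout are illustrative only.
service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("raw")   # hypothetical "raw" zone of the lake

def archive_extract(source: str, entity: str, payload: list) -> str:
    """Land a raw API extract in the data lake, partitioned by source and load date."""
    blob_path = f"{source}/{entity}/load_date={date.today():%Y-%m-%d}/{entity}.json"
    container.upload_blob(name=blob_path, data=json.dumps(payload), overwrite=True)
    return blob_path

archive_extract("shopify", "orders", [{"id": 1001, "total_price": "42.00"}])  # sample payload
```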
Azure Databricks – Databricks is an analytics environment built on open-source Apache Spark and accessed here through Python notebooks. It is designed to process large amounts of data across multiple nodes (think “Big Data”). In this project, each source had its own API. For some sources, we used Python code to access the API, and Databricks was used to run that code. Other Azure options were available, but the tight integration between Azure Data Factory and Databricks made it the logical choice.
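For illustration, here is a minimal Databricks-style sketch: it reads a raw JSON extract from the data lake, trims it to a few columns, and writes a daily aggregate back out as Parquet. It assumes it runs in a Databricks notebook (where `spark` is already defined); the storage paths and column choices are hypothetical.

```python
# Runs inside a Databricks notebook, where `spark` is already defined.
from pyspark.sql import functions as F

# Storage account, container names, and paths are illustrative.
raw_path = "abfss://raw@<storageaccount>.dfs.core.windows.net/shopify/orders/load_date=2022-03-01/"
stage_path = "abfss://stage@<storageaccount>.dfs.core.windows.net/shopify/orders_daily/"

# The raw extract was landed as a single JSON array, hence multiLine.
orders = spark.read.option("multiLine", True).json(raw_path)

# Reduce the nested API payload to the handful of columns the reporting model needs,
# then aggregate to one row per day.
orders_daily = (
    orders
    .select(
        F.col("id").alias("order_id"),
        F.to_date("created_at").alias("order_date"),
        F.col("total_price").cast("decimal(18,2)").alias("revenue"),
    )
    .groupBy("order_date")
    .agg(F.sum("revenue").alias("revenue"), F.count("order_id").alias("order_count"))
)

orders_daily.write.mode("overwrite").parquet(stage_path)
```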
Microsoft Power BI (PBI)
PBI was selected as the visualization tool. It includes all the presentation controls needed (charts, graphs, slicers, filters, etc.) and, at the time, was also the lowest-cost option. For this use case, when comparing the value proposition of PBI to the alternatives, PBI was the clear winner.
Power BI Service – The PBI Service is a cloud service hosted by Microsoft. It is subscription-based and is used to publish/share dashboards. The PBI Service comes in a “shared” capacity via “Power BI Pro” licensing (shared CPU time with other PBI customers) or in a dedicated capacity via a “Premium” subscription. The client’s needs were met by PBI Pro, and the fee of $9.99 per user per month fell within budget. Free end-user access is also available for some view-only scenarios.
Power BI Desktop – The PBI Desktop is where all the development work is done. The Desktop is locally installed and used to load, wrangle, model, and visualize data. It’s free, widely distributable and built with an analyst in mind. Once a data model is completed and is ready for a wider audience, it can be published to the PBI Service where updates can be scheduled, and access controlled.
Note: Some of the reporting capabilities of PBI Desktop are embedded in the PBI Service – so power users do not need to use the Desktop. We, however, prefer the Desktop for most of our development work.
Miscellaneous – Microsoft Power Apps was used to provide end users the ability to manually update specific database tables (mapping & status tables). Azure Logic Apps was used to send status/notification emails from the Azure Data Factory (ADF).
Marketing Spend Analysis – Product Tips and Tricks
- Connectivity – Approvals: Authorizations and approvals are required to extract data from each source/platform. Some sources allow the user to generate access keys and tokens with no additional support. Other sources (e.g., Facebook) require a developer to be approved, a custom application to be created, and the application to be approved/authorized. For planning purposes, expect an average of 1-2 weeks to obtain approval for any one REST API connection.
- Connectivity – REST/API: Each platform has its own API. There is no standard among them other than REST, which is an architectural style rather than a functional specification. There is a learning curve for each API; prior experience with REST APIs is recommended.
- Connectivity – ADF: When connecting Azure Data Factory (ADF) to a new API, download the API’s sample code and use it as a reference; it will help you get past sticking points. Each platform has API documentation, but it can be limited and is not always intuitive.
- Data – Timeliness: Timeliness of data is always important. Design the architecture to support both incremental and full loads. There will be times when an API’s connectivity is limited, requiring recovery from a failed load. Most systems allow for extraction as frequently as needed; however, some will throttle extractions at peak times (causing an extraction to fail). For this project, data was extracted daily (in the early morning) without many failures. Design the system with flexibility in mind.
- Data – Integrity: Some sources will update data weeks or months later. An example is a return in Shopify: an order record may be created today, and that same record may be updated with return information 3 weeks from now. The solution should be designed accordingly (see the load sketch after this list).
- Data – Granularity: Some sources provide very few available columns; others provide so many that an analysis is needed to choose among them. Extract as many columns/attributes as possible, then aggregate. Be careful to avoid ‘sampling’ – when a source returns a ‘representative’ data set rather than your actual data.
- Data – Best Practices: An analyst is probably already pulling data from the sources and wrangling it. Work with the analyst to determine entity relationships, sticking points, and harmonization issues. Many APIs allow access to data points not easily accessed via the platform’s own front end, so consider the API when determining scope. Add extra time for unexpected modeling requirements; data does not usually come out of an API the way it is displayed in the platform’s reports.
- Data – Harmonization: Use mapping tables for harmonization where necessary. Mapping tables allow for variations in the source data without requiring periodic rework.
- PBI – Visualization: Minimize the number of PBI datasets where possible. Overlapping datasets create additional maintenance work. Architect Power BI as you would any other analytics environment.
- PBI – Deployment: PBI is designed with power users in mind. Dashboards and reports will multiply quickly once users have access. Store the ‘official’ datasets in one workspace and let power users perform ad-hoc analysis in their own workspaces.
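To make the “Data – Timeliness” and “Data – Integrity” points concrete, below is the load sketch referenced above: each day, re-extract a trailing window of orders and upsert them into the reporting database so late-arriving changes such as returns are captured. This is one possible approach, not the project’s actual implementation, and the connection string, table, and column names are illustrative only.

```python
import pyodbc

# Connection string, table, and column names are illustrative only.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};Server=<server>;Database=<db>;UID=<user>;PWD=<pwd>"
)

MERGE_SQL = """
MERGE dbo.fact_order AS tgt
USING (SELECT ? AS order_id, ? AS order_date, ? AS revenue, ? AS financial_status) AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED THEN
    UPDATE SET tgt.revenue = src.revenue,
               tgt.financial_status = src.financial_status   -- picks up late returns/refunds
WHEN NOT MATCHED THEN
    INSERT (order_id, order_date, revenue, financial_status)
    VALUES (src.order_id, src.order_date, src.revenue, src.financial_status);
"""

def upsert_orders(orders: list) -> None:
    """Upsert a trailing window of orders (e.g., the last 30-45 days) so that
    records updated after the initial load are refreshed rather than duplicated."""
    cur = conn.cursor()
    for o in orders:
        cur.execute(MERGE_SQL, o["id"], o["created_at"][:10],
                    o["total_price"], o["financial_status"])
    conn.commit()
```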
Marketing Spend Analysis – Conclusion
The solution described above takes data from today’s leading cloud platforms and combines it for analysis, providing a timely view of both marketing spend and its return. Many companies struggle with this, so if you have this issue, you are not alone. Many customers contact us with issues related to just a subset of these sources. Feel free to contact us with questions about a single source or the solution in its entirety. We will help however we can. Thank you for reading!
The above solution was designed and implemented by Eastern Analytics, Inc. Please direct all questions to Scott.Pietroski@eastern-analytics.us