Building Data Products
You don't need to be a technical product manager to build a data product. This article covers the 5 key steps to follow when preparing your raw data to build a valuable data product.
We hear a lot about data and products, but have you stopped to think about how a data product is made? In this article, I'll share some of the basics of data products to help you become a data-literate product manager.
Let’s zoom out first: why are data products even important? They provide the insights needed to inform decision-making, improve operational efficiency, build personalisation into products, detect fraud and drive innovation. The basics of data products require us to learn a new language: you’ll need to understand which raw data you need, and how to prepare and process it to get the results and insights your product needs.
If you haven’t built a data product before, it can seem daunting to know where to start. Luckily, you don’t need to be a technical product manager, or to have been a developer in a past life, to understand the key steps to preparing your data correctly.
But first
It’s so easy to jump straight into solutionising, but before you start building your data product, here are some questions to answer.
Who will use it?
What question(s) will they be using the data to answer? What story does the data need to tell?
How do they intend to use the data?
What are the boundaries of the data you will need to capture and store?
How will you ensure the security of the data, especially if you’re handling personally identifiable information (PII)?
Is your proposed solution scalable?
Once you’ve determined what to build and why, then you’re ready to build!
What are the key data preparation steps?
There are 5 key steps when preparing your data set.
1. Data Selection
This step is all about choosing the right data for the question you’re trying to answer. It’s essential that your data sources are accurate, so it’s always worth doing extra work up-front to verify them. If a user’s confidence in your product is eroded, it can be difficult to regain their trust, so make sure the source data is reliable.
Finally, keep in mind your goal - that is, what you’re trying to achieve with the data. Remember that most times, less is more.
2. Data Preprocessing
Here, the raw data is cleaned and organised. You should also check for any errors and eliminate bad data. Once the data is cleaned, there are a number of other steps that you could perform to process the data, depending on the outcome you’re trying to achieve.
You may need to Harmonise the data; this is where the clean, raw, unstructured data is converted into something more meaningful or understandable.
You may need to Format the data; if the source data was available in different formats, you can make it uniform during this step.
If a smaller data set is sufficient for your purpose, you can Sample a portion of the full data set to save time, memory and operating costs.
Finally, if you need to handle vast amounts of data, think about how you might Aggregate it to make it easier to transform in the next step.
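To make the preprocessing options above more concrete, here is a minimal Python sketch of cleaning, formatting, sampling and aggregating. The order records, field names and helper functions are all hypothetical and invented for illustration; a real pipeline would depend on your own data and tooling.

```python
import random
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records: mixed date formats, a duplicate, and a bad row.
RAW_ORDERS = [
    {"order_id": "A1", "date": "2024-01-05", "amount": "20.00"},
    {"order_id": "A2", "date": "05/01/2024", "amount": "15.50"},
    {"order_id": "A2", "date": "05/01/2024", "amount": "15.50"},  # duplicate
    {"order_id": "A3", "date": "2024-01-06", "amount": ""},       # missing amount
    {"order_id": "A4", "date": "06/01/2024", "amount": "4.50"},
]


def parse_date(text):
    """Format: accept ISO or day/month/year dates and return a date object."""
    for pattern in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, pattern).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {text!r}")


def clean(records):
    """Clean: drop rows with a missing amount and repeated order IDs."""
    seen, cleaned = set(), []
    for row in records:
        if not row["amount"] or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": row["order_id"],
            "date": parse_date(row["date"]),
            "amount": float(row["amount"]),
        })
    return cleaned


def sample(records, fraction, seed=0):
    """Sample: keep a reproducible random fraction of the records."""
    rng = random.Random(seed)
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


def aggregate_by_day(records):
    """Aggregate: total order amount per calendar day."""
    totals = defaultdict(float)
    for row in records:
        totals[row["date"].isoformat()] += row["amount"]
    return dict(totals)


orders = clean(RAW_ORDERS)
daily_totals = aggregate_by_day(orders)
```

Here the five raw rows become three clean orders, and the daily totals are what would be handed on to the transformation step.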
3. Data Transformation
The data transformation step can be simple or complex, depending on the changes needed to get the source data ready for its intended use. It could include mapping between two or more data sets, and the data might need to be enriched with more information to make it more useful. This is typically the step where you perform any calculations or queries to support the intended output.
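As a rough illustration of mapping, enrichment and calculation, the sketch below joins orders to a customer lookup and then totals revenue by region. The data sets, field names and function names are assumptions made up for this example.

```python
# Hypothetical data sets: orders, plus a customer lookup used for enrichment.
orders = [
    {"order_id": "A1", "customer_id": "C1", "amount": 20.0},
    {"order_id": "A2", "customer_id": "C2", "amount": 15.5},
    {"order_id": "A4", "customer_id": "C1", "amount": 4.5},
]
customers = {
    "C1": {"region": "North"},
    "C2": {"region": "South"},
}


def enrich(orders, customers):
    """Map each order to its customer and add the customer's region."""
    enriched = []
    for order in orders:
        customer = customers.get(order["customer_id"], {})
        enriched.append({**order, "region": customer.get("region", "Unknown")})
    return enriched


def revenue_by_region(enriched_orders):
    """Calculate total revenue per region."""
    totals = {}
    for order in enriched_orders:
        region = order["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


report = revenue_by_region(enrich(orders, customers))
```

Orders whose customer is missing from the lookup are labelled "Unknown" rather than dropped, so gaps in the source data stay visible in the output.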
4. Data Output
This is the final step before the data is published in some way so that it can be consumed by non-data specialists. You can present the data as a graph, chart, video or report, to name a few formats. Refer back to the questions you answered before you started building to determine the best way to visualise the output.
Depending on the type of data product you’re building, there may be considerations about how 'live' the data is and how you should communicate to users if the data does not capture the latest information. You might also need a way to communicate to users if there is missing data at this point.
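One simple way to communicate how 'live' the data is could be a freshness notice shown alongside the output. This is only a sketch: the function name, the 24-hour threshold and the message wording are assumptions, not a recommendation.

```python
from datetime import datetime, timedelta, timezone


def freshness_notice(last_updated, now, max_age=timedelta(hours=24)):
    """Return a message for users, or None when the data is fresh enough."""
    age = now - last_updated
    if age <= max_age:
        return None
    hours = int(age.total_seconds() // 3600)
    return f"Data last updated {hours} hours ago; recent activity may be missing."


# Example: the data was last refreshed two days before the user opens the report.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
notice = freshness_notice(datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc), now)
```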
5. Data Storage
Consider how and where the data is stored for future use. In determining the best storage options, speak to your tech lead to understand the balance between cheaper long term storage, and more costly but typically quicker to access options.
For example, you could cut your storage costs by up to 50% by using AWS S3 Standard-Infrequent Access (IA) storage rather than S3 Standard. However, if you need to retrieve the data often, the per-GB retrieval charges on IA storage can quickly add up and wipe out the savings you made by using IA in the first place.
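The trade-off can be worked through with some quick arithmetic. The prices below are illustrative assumptions only, so check the current AWS pricing for your region; the sketch also ignores request charges and minimum storage durations.

```python
# Illustrative prices in USD per GB; assumptions only, check current AWS pricing.
STANDARD_STORAGE_PER_GB = 0.023   # per GB stored per month
IA_STORAGE_PER_GB = 0.0125        # per GB stored per month
IA_RETRIEVAL_PER_GB = 0.01        # per GB retrieved


def monthly_cost_standard(stored_gb):
    """Monthly storage cost for the standard tier (no retrieval fee)."""
    return stored_gb * STANDARD_STORAGE_PER_GB


def monthly_cost_ia(stored_gb, retrieved_gb):
    """Monthly cost for the IA tier: cheaper storage plus a retrieval fee."""
    return stored_gb * IA_STORAGE_PER_GB + retrieved_gb * IA_RETRIEVAL_PER_GB


def cheaper_option(stored_gb, retrieved_gb):
    """Name the cheaper tier for a given monthly usage pattern."""
    standard = monthly_cost_standard(stored_gb)
    ia = monthly_cost_ia(stored_gb, retrieved_gb)
    return "IA" if ia < standard else "Standard"
```

Under these assumed prices, with 1,000 GB stored, IA stays cheaper until you retrieve roughly 1,050 GB a month; beyond that point the retrieval fees outweigh the storage saving.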
Useful non-functional requirements to put in place
In addition to thinking about the core features of your data product, there are a number of NFRs to review and iterate to improve:
Performance; how can you build it in a way that optimises time and computation power during read and write operations?
Scalability; as the product scales to handle more data points, is the loading time maintained whether it’s processing 100,000 data points or 1,000,000?
Completeness; how can you minimise missing or incomplete data in your product?
Reliability; how reliable is your solution and can you instil confidence in your end users to use the product?
Monitoring and Notification; how will you be alerted if the system goes down or fails?
Compliance; does the data need to be processed in a compliant way, for example purging personal data after 5 years?
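As one example, the compliance requirement above could be sketched as a simple retention check. The record structure, the collected_on field and the function names are hypothetical; your real retention rules should come from your legal or compliance team.

```python
from datetime import date


def retention_cutoff(today, years=5):
    """Return the oldest collection date that may still be kept."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no match in a non-leap year, so fall back to the 28th.
        return today.replace(year=today.year - years, day=28)


def purge_expired(records, today, years=5):
    """Keep only records collected on or after the retention cutoff."""
    cutoff = retention_cutoff(today, years)
    return [r for r in records if r["collected_on"] >= cutoff]


# Hypothetical personal-data records with the date they were collected.
records = [
    {"user": "u1", "collected_on": date(2019, 5, 31)},
    {"user": "u2", "collected_on": date(2019, 6, 1)},
    {"user": "u3", "collected_on": date(2023, 1, 15)},
]
kept = purge_expired(records, today=date(2024, 6, 1))
```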
In summary
Building a data product shouldn’t be daunting, even if you haven’t done it before. Once you’ve determined how the data will be used and what question(s) it needs to answer, then you simply adapt the key data preparation steps to achieve the desired outcome.
Have you built a data product recently? How did you approach the steps to prepare your data set(s) for processing? I’d love to hear your examples too, so please share them in the comments below.