I get questions like these a lot:
- Where did this data come from?
- How do I know I can trust the source?
- What types of QA checks were applied to this data?
Data lineage is a chronic issue in data engineering. This blog post from Airbyte gives a good overview and mentions some interesting products and projects that might help with it.
Unfortunately, I have limited flexibility to purchase or install tools for this in my current role. Anyone rolled their own solution for this?
Apache NiFi maintains a lineage table for its data movement and transformations.