Design and Implementation of Data Lakehouse Architecture for Self-Service Analytics

Authors

  • Soon Kien Yuan Soon, Centre for Mathematical Science, Universiti Malaysia Pahang Al-Sultan Abdullah, Lebuhraya Persiaran Tun Khalil Yaakob, 26300 Gambang, Pahang, Malaysia.
  • Nor Azuana Ramli, Centre for Mathematical Science, Universiti Malaysia Pahang Al-Sultan Abdullah, Lebuh Persiaran Tun Khalil Yaakob, 26300 Kuantan, Pahang, Malaysia. https://orcid.org/0000-0002-4158-2890
  • Mohd Zaid Waqiyuddin Mohd Zulkifli, Credence, 1 Jalan Damansara, Damansara Kim, 60000 W.P. Kuala Lumpur, Malaysia.

DOI:

https://doi.org/10.58915/amci.v15i2.2267

Keywords:

architecture, data lakehouse, data warehouse, cloud, data management

Abstract

This paper presents the design of a data lakehouse architecture for self-service analytics. The objectives were to create a collaborative analytics environment, streamline the management of multiple extract, transform, and load (ETL) processes, adopt a cost-effective and non-proprietary architecture, integrate with business intelligence (BI) tools, ensure high query performance for interactive visualization, enable data warehousing capabilities, and offer a self-service data discovery and metadata platform. The research followed an iterative development methodology comprising requirement gathering and planning, design, implementation, testing, deployment, and maintenance phases. The logical design comprises six layers: data ingestion, storage, catalog, semantics, processing, and consumption. For the physical design, Dremio was used as the core component, while Apache Iceberg was used for data format and query processing. The case study presented in this paper adopted an Integrated Multi-Zone Analytics Framework to handle data tasks and workloads. The paper concludes by suggesting future enhancements, such as considering the Dremio Enterprise Edition for advanced features and exploring Databricks and MLflow if extensive machine learning workloads are expected. These enhancements could further improve the architecture and its outcomes.

Published

02-06-2026

How to Cite

Soon, S. K. Y., Nor Azuana Ramli, & Mohd Zulkifli, M. Z. W. (2026). Design and Implementation of Data Lakehouse Architecture for Self-Service Analytics. Applied Mathematics and Computational Intelligence (AMCI), 15(2), 75–91. https://doi.org/10.58915/amci.v15i2.2267

Section

Articles
