interested in building a data pipeline with dbt cloud and starburst galaxy? if so, this post presents recorded videos of 7 lab exercises plus the lab guide itself so you can work through them on your own & at your own pace
Tag Archives: hive
hive acid transactions work on trino (can even update a partitioned column)
it seems that folks who haven’t used hive in production are always quick to say that hive doesn’t have classic crud operations, much less the merge statement, and that simply isn’t true – this post shows you that you can create a hive acid table and mutate its contents with trino
building a sql-based data pipeline with trino & starburst (5 slick videos)
a collection of videos presented as an overview of how you could build a sql-based data transformation pipeline utilizing trino/starburst and automating it with dbt
determining # of splits w/trino/starburst/galaxy (hive table format)
ever wondered how trino decides how many splits to use in a query when reading files from your data lake? if so, come along and ride on a fantastic voyage
starburst galaxy’s materialized views (using apache iceberg)
join me on a quick test drive of materialized views in starburst galaxy (saas offering powered by trino), which use apache iceberg for persistence and offer some pretty cool capabilities around snapshots and awareness of stale data
hive, trino & spark features (their journeys to sql, performance & durability)
each big data sql engine is created to address something the existing ones lacked focus on, but sooner or later they all start looking like each other in their features and observable behaviors
securing hive entities (ranger and atlas to the rescue)
video showing how to use ranger & atlas to create security policies on hive tables, columns and rows as well as implement data masking and tag-based restrictions
hive’s merge statement (it drops a lot of acid)
hive’s merge command provides another option for acid transactions beyond insert, update and delete; this post walks you through a simple example and looks at all the base, delta and delete_delta files created on the underlying filesystem to support this standard sql command
hive delta file compaction (minor and major)
a quick walk-thru of how minor and major compactions occur for hive transactional tables, ensuring all the delta files eventually roll into base ones
hive acid transactions with partitions (a behind the scenes perspective)
let’s take a deeper look at what happens under the hood of hive on these “acid” activities such as insert, update and delete, including a look at the actual directories and orc files created