understanding rag ai apps (and the pipelines that feed them)

i’m learning all about rag ai apps and wanted to try to explain, at a high-level, what these are all about plus do the same for the etl pipelines that are key to their success

iceberg snapshots affect storage footprint (not performance)

it is easy to understand why most folks initially imagine that iceberg’s ability to maintain a long history of snapshots will cause performance problems, but that is not the case — the real gotcha is that keeping many versions can quickly consume 2-10+ times the amount of data lake storage space