I’d go so far as to call Redshift the “engineer’s DW” when compared to Snowflake. Redshift is great for optimized big queries.

AWS do not actually enumerate the capabilities of Spectrum worker nodes - the docs say “worker nodes can do stuff like this” and then give a list of examples. Spectrum workers are capable of handling only some aspects of SQL - basically, all functionality which can be executed without needing to talk to any other worker, as they all run in parallel and do not communicate with each other. Anything the workers cannot do leads to the necessary data being returned to the cluster proper, where Redshift itself does that work. The problem here is that users need to know how to write queries correctly, or they end up bringing back the entirety of S3 to the cluster, which will then fall over, having run out of disk space.

There is a maximum of 10 Spectrum workers per Redshift slice, each of which can read one file at a time. So if you have a single query running, that query has all the bandwidth and I/O to S3; if you have ten queries running, each runs at one tenth of the speed. To my eye it is in no way comparable to Snowflake external tables, although here note strongly that I have read the Snowflake white paper but I have never touched Snowflake.

Spectrum is quite a limited feature, in a number of ways. Some of it is just frustrating but predictable, like the weak windowing support. Other things are impossible to follow, like having a query fail at runtime because it compiles into an unsupported path, with no way to see how exactly it compiles down or what condition triggers the failure. And things like generating a series - say, a list of days to fill in blanks where there isn’t data - become very awkward recursive CTEs.
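As a sketch of the workaround that missing series generation forces, here is the recursive-CTE shape for materializing a run of dates. The date range and names are my own illustration, not anything from Redshift’s documentation:

```sql
-- Illustrative sketch: build 30 consecutive days with a recursive CTE,
-- standing in for the generate_series() you would reach for in Postgres.
WITH RECURSIVE days (d) AS (
    SELECT DATE '2023-01-01' AS d
    UNION ALL
    SELECT (d + 1)::date
    FROM days
    WHERE d < DATE '2023-01-30'
)
SELECT d
FROM days
ORDER BY d;
```

You would then left-join your fact table against `days` so the dates with no data still appear as rows.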
A lot of the stuff you’re used to having in Postgres does not exist in Redshift. There’s nominal recursive CTE support, but you can’t do things like detect cycles. There are gaps in ANSI SQL: some parts of windowing are supported, some are not. There’s a lack of common types like UUID and IP - sure, you can use a VARCHAR or a byte field, but it’s 2023, and that in turn gets you into trouble with string lengths. You’ll run into a lot of border cases on joins and subqueries; sometimes they’ll work fine in one case but not in another, and sometimes it depends on what data the intermediate tables hold, because that can affect the query plan. Anything involving aggregation on array types is painful: the Redshift SUPER type is incredibly weak, and if you ever use it you’ll find yourself casting to and from strings really fast, with awkward hacks. It’s a long list and I don’t have the time to really write a comprehensive answer.
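To illustrate the string round-tripping: a sketch, assuming a hypothetical `events` table with a SUPER column `tags`, of the kind of hack the weak SUPER aggregation support pushes you into:

```sql
-- Illustrative sketch: there is no direct way to group or aggregate on a
-- SUPER array value, so the array is serialized to VARCHAR and the query
-- groups on the text representation instead.
SELECT json_serialize(tags)::varchar AS tags_text,
       COUNT(*)                      AS n
FROM events
GROUP BY json_serialize(tags)::varchar;
```

Anything that needs the value back as an array afterwards then has to parse the string again, which is exactly the casting-to-and-from-strings dance described above.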