ECE366 - Lesson 12

Selected Topics

Instructor: Professor Hong

Review & Questions

## Postgres Conference [https://postgresconf.org/conferences/2024/schedule/events](https://postgresconf.org/conferences/2024/schedule/events)
## What are conferences? Why are they important? - A gathering of people to share ideas. - A place to learn about new technologies. - A place to network. - A place to learn about new job opportunities. - A place to learn about new products, services, companies, research, trends. - A place to sell products.
## Day 1
## Conference Launch & Keynote - Synergy - Robbers Cave Experiment - Stop & Process - Golden Rule: Treat others how you want to be treated - Platinum Rule: Treat others how they want to be treated
## Why We Put GPUs in the DB - Various extensions to PG are geared towards AI - PGVector, PGML - Interesting story about Instacart - Massive amount of load on Sunday morning for grocery shopping - Elastic Search couldn't handle the load so they switched to sharded PG - Deployed the new infrastructure in real time to handle the load - Save time with no network data movement
## Enabling New Operational Use Cases with PostgreSQL - Sales pitch for Materialize
## Innovating for all with PostgreSQL at AWS - Amazon did 19% of the contributions to PG16 - 10 committers - Added a trusted language extensions - Worked on PGvector - searching for distance between records
## Looking for the Differences in Your Data - CEO of dbeaver - Tatiana Krupenya - Ways to diff data: - by eye -- good for small tables - excel/comparison tools -- small table, no SQL knowledge - SQL - all data in PG - comparison tools big tables in PG, data in other dbs
## Looking for the Differences in Your Data SQL Modified ``` select it.id, lt.uid lt_uid, lt."name" lt_name, rt.uid rt_uid, rt."name" rt_name from table_diff_1 lt join table_diff_2 rt on rt.id = lt.id and (rt.uid <> lt.uid or rt."name" <> lt."name"); ```
## Looking for the Differences in Your Data SQL Deleted ``` select lt.* from table_diff_1 lt where not exists (select 1 from table_diff_2 rt where rt.id=lt.id); ```
## Looking for the Differences in Your Data SQL Added ``` select rt.* from table_diff_2 rt w here not exists (select 1 from table_diff_1 lt where rt.id=lt.id); ```
## pg_hint_plan – get the right plan without surprises - Extension that wasn't accepted in PG - Allows you to modify the query plan for PG query - Sales pitch for yugabyte as pg_hint_plan is in their version of PG - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/pg_hint_plan-get-the-right-plan-without-surprises-322ab02e-ff9b-40c4-8961-123a15367066)
## EXPLAIN ANALYZE ``` explain analyze select player_name from player where player_name = 'malek'; explain analyze select player_name from player where password = 'password2'; ``` - Seq Scan is slow - Index Scan is fast
## Use Cases for Multi-Master Distributed Postgres - Sales pitch for pgEdge - Use case for multi-master db - Need to quickly publish data to local location - Challenge - Keeping all the master nodes in sync
## How can you measure PSQL portability? - Sales pitch for Google cloud, Alloydb, Spanner - Highly Available -- 99.99999% of the time - Translation from PSQL AST (Abstract Syntax Tree) to Spanner AST - More about AST on [Wikipedia](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
## Reducing Costs and Improving Performance With Data Modeling in Postgres - Locality of data affects performance - CPU Registers, CPU Cache, Main Memory, Secondary Storage (SSD, HDD) - Fixed page size of 8kb; model the data to fit the page size and reduce padding - Think about the sizes of each data type - [TOAST](https://www.postgresql.org/docs/16/storage-toast.html) - The Oversized-Attribute Storage Technique - Speed up by an avg of 15%
## Achieving High Availability (HA) in PSQL - Hosting PG is not as simple as running a container - Operational Continuity - Redundancy and Automated Fallover - Data Integrity with Synchronous Replication - Seamless Fallover Mechanisms - Monitoring and Regular Testing - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/achieving-high-availability-ha-in-postgresql-strategies-tools-and-best-practices)
## Day 2
## The First Vacuum Daemon - Added to PostgreSQL 6.5.0 in 1999 - Added/updated records => a new tuple is created - Old tuples need to be regularly vacuumed or cleaned up to reduce bloat - Bitemporal Data Model - Effective Time - when a tuple is effective in the real world - Asserted Time - when a tuple exists in the db - Primary Key: Business keys + Effective Time + Asserted Time
## The future of databases and Generative AI - Sales pitch for Google Cloud, Alloydb - Large Language Model + Postgres (PGvector)
## JSON and analytics in Postgres using index and columnar - JSON vs JSONB (binary, better) - Duplicate Keys - JSON preserves; JSONB stores only the last value - Operations with Keys - JSON can update, add, delete keys; JSONB can do nested keys and values - Performance - JSON insert is faster; JSONB indexes, extracts faster
## JSON and analytics in Postgres using index and columnar - Columnar representation of data in memory - Entire table or selected columns - Push down aggregations - Read more [here](https://www.percona.com/blog/powering-postgresql-15-with-columnar-tables/)
## MERGE() - A Quick Introduction - [Merge](https://www.postgresql.org/docs/current/sql-merge.html) documentation - Postgres 15+ - Reduces round trips to the db - PG 17 new features - RETURNING, MERGE_ACTION - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/merge-a-quick-introduction)
## Everything you want to know about Postgres autovacuum - Multi-version concurrrency control - when a query updates an existing row in a table, th eDBMS makes a copy of that row and applies the changes to this new version instead of overwriting the original version. - Readers do not block writers, writers do not block readers - Introduces additional overhead and issues and results in bloat
## Vacuum and Autovacuum - 3 main phases - scan heap - scan a target table to build a list of dead tuples stored in the local memory - vacuum index - remove index tuples by referring to the dead tuple list - vacuum heap - remove dead tupes from the heap - autovacuum can be configured to regularly clean up bloat - vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
## Enhancing OLTP Systems with pgvector: Best Practices in Database Design - pgvector - an open-source extension that - adds support for storage, indexing, searching, metadata with choice of distance - 16k dimensions, 3 distance metrics, 2 index types - General Tips - Ensure that there is sufficient memory - Connection Pooling - Tune Autovacuum - Still changing A LOT
## Advanced strategies for PostgreSQL Lock Management - Locks happen tables / records are modified in incompatible ways (e.g. non-backwards compatitible schema changes) - Analyze the queries that are running - ```select * from pg_stat_activity;``` - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/advanced-strategies-for-postgresql-lock-management)
## Monitoring PostgreSQL - It's important to monitor the processes of Postgres. - Built-in tools - pg_stat_statements - pg_stat_monitor - pg_stat_plans - pgstattuple - pg_buffercache
## Monitoring PostgreSQL - External Tools - pgwatch2 - PGObserver - pgCluu - pgexporter - Prometheus & Grapfana - Percona PMM (sales pitch!)
## Open Source At A Crossroads - Lots of companies are selling shovels - Some companies are taking open source and making money off of it rather than contributing back
## Top 5 PostgreSQL Query Tuning Tips - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/top-5-postgresql-query-tuning-tips)
## Day 3
## Working With The Citus Extension
## PostgreSQL Partitioning: Beyond the obvious
## Mastering PostgreSQL Partitioning: Supercharge Performance and Simplify Maintenance
## The Future of MySQL is Postgres
## Transaction Isolation in Postgres
## Local-first application architecture using Postgres logical replication
## Scaling PostgreSQL: Navigating Horizontal and Vertical Scalability Pathways - Slides [here](https://postgresconf.org/conferences/2024/program/proposals/scaling-postgresql-navigating-horizontal-and-vertical-scalability-pathways)