ECE366 - Lesson 12
Selected Topics
Instructor: Professor Hong
## Postgres Conference
[https://postgresconf.org/conferences/2024/schedule/events](https://postgresconf.org/conferences/2024/schedule/events)
## What are conferences? Why are they important?
- A gathering of people to share ideas.
- A place to learn about new technologies.
- A place to network.
- A place to learn about new job opportunities.
- A place to learn about new products, services, companies, research, trends.
- A place to sell products.
## Conference Launch & Keynote
- Synergy
- Robbers Cave Experiment
- Stop & Process
- Golden Rule: Treat others how you want to be treated
- Platinum Rule: Treat others how they want to be treated
## Why We Put GPUs in the DB
- Various extensions to PG are geared towards AI
- PGVector, PGML
- Interesting story about Instacart
- Massive amount of load on Sunday morning for grocery shopping
- Elastic Search couldn't handle the load so they switched to sharded PG
- Deployed the new infrastructure in real time to handle the load
- Save time with no network data movement
## Enabling New Operational Use Cases with PostgreSQL
- Sales pitch for Materialize
## Innovating for all with PostgreSQL at AWS
- Amazon did 19% of the contributions to PG16
- 10 committers
- Added a trusted language extensions
- Worked on PGvector - searching for distance between records
## Looking for the Differences in Your Data
- CEO of dbeaver - Tatiana Krupenya
- Ways to diff data:
- by eye -- good for small tables
- excel/comparison tools -- small table, no SQL knowledge
- SQL - all data in PG
- comparison tools big tables in PG, data in other dbs
## Looking for the Differences in Your Data SQL
Modified
```
select it.id,
lt.uid lt_uid,
lt."name" lt_name,
rt.uid rt_uid,
rt."name" rt_name
from table_diff_1 lt
join table_diff_2 rt
on rt.id = lt.id
and (rt.uid <> lt.uid
or rt."name" <> lt."name");
```
## Looking for the Differences in Your Data SQL
Deleted
```
select lt.* from table_diff_1 lt
where not exists (select 1
from table_diff_2 rt
where rt.id=lt.id);
```
## Looking for the Differences in Your Data SQL
Added
```
select rt.*
from table_diff_2 rt w
here not exists (select 1
from table_diff_1 lt
where rt.id=lt.id);
```
## pg_hint_plan – get the right plan without surprises
- Extension that wasn't accepted in PG
- Allows you to modify the query plan for PG query
- Sales pitch for yugabyte as pg_hint_plan is in their version of PG
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/pg_hint_plan-get-the-right-plan-without-surprises-322ab02e-ff9b-40c4-8961-123a15367066)
## EXPLAIN ANALYZE
```
explain analyze
select player_name
from player
where player_name = 'malek';
explain analyze
select player_name
from player
where password = 'password2';
```
- Seq Scan is slow
- Index Scan is fast
## Use Cases for Multi-Master Distributed Postgres
- Sales pitch for pgEdge
- Use case for multi-master db
- Need to quickly publish data to local location
- Challenge
- Keeping all the master nodes in sync
## How can you measure PSQL portability?
- Sales pitch for Google cloud, Alloydb, Spanner
- Highly Available -- 99.99999% of the time
- Translation from PSQL AST (Abstract Syntax Tree) to Spanner AST
- More about AST on [Wikipedia](https://en.wikipedia.org/wiki/Abstract_syntax_tree)
## Reducing Costs and Improving Performance With Data Modeling in Postgres
- Locality of data affects performance
- CPU Registers, CPU Cache, Main Memory, Secondary Storage (SSD, HDD)
- Fixed page size of 8kb; model the data to fit the page size and reduce padding
- Think about the sizes of each data type
- [TOAST](https://www.postgresql.org/docs/16/storage-toast.html) - The Oversized-Attribute Storage Technique
- Speed up by an avg of 15%
## Achieving High Availability (HA) in PSQL
- Hosting PG is not as simple as running a container
- Operational Continuity
- Redundancy and Automated Fallover
- Data Integrity with Synchronous Replication
- Seamless Fallover Mechanisms
- Monitoring and Regular Testing
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/achieving-high-availability-ha-in-postgresql-strategies-tools-and-best-practices)
## The First Vacuum Daemon
- Added to PostgreSQL 6.5.0 in 1999
- Added/updated records => a new tuple is created
- Old tuples need to be regularly vacuumed or cleaned up to reduce bloat
- Bitemporal Data Model
- Effective Time - when a tuple is effective in the real world
- Asserted Time - when a tuple exists in the db
- Primary Key: Business keys + Effective Time + Asserted Time
## The future of databases and Generative AI
- Sales pitch for Google Cloud, Alloydb
- Large Language Model + Postgres (PGvector)
## JSON and analytics in Postgres using index and columnar
- JSON vs JSONB (binary, better)
- Duplicate Keys - JSON preserves; JSONB stores only the last value
- Operations with Keys - JSON can update, add, delete keys; JSONB can do nested keys and values
- Performance - JSON insert is faster; JSONB indexes, extracts faster
## JSON and analytics in Postgres using index and columnar
- Columnar representation of data in memory
- Entire table or selected columns
- Push down aggregations
- Read more [here](https://www.percona.com/blog/powering-postgresql-15-with-columnar-tables/)
## MERGE() - A Quick Introduction
- [Merge](https://www.postgresql.org/docs/current/sql-merge.html) documentation
- Postgres 15+
- Reduces round trips to the db
- PG 17 new features - RETURNING, MERGE_ACTION
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/merge-a-quick-introduction)
## Everything you want to know about Postgres autovacuum
- Multi-version concurrrency control - when a query updates an existing row in a table, th eDBMS makes a copy of that row and applies the changes to this new version instead of overwriting the original version.
- Readers do not block writers, writers do not block readers
- Introduces additional overhead and issues and results in bloat
## Vacuum and Autovacuum
- 3 main phases
- scan heap - scan a target table to build a list of dead tuples stored in the local memory
- vacuum index - remove index tuples by referring to the dead tuple list
- vacuum heap - remove dead tupes from the heap
- autovacuum can be configured to regularly clean up bloat
- vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
## Enhancing OLTP Systems with pgvector: Best Practices in Database Design
- pgvector - an open-source extension that
- adds support for storage, indexing, searching, metadata with choice of distance
- 16k dimensions, 3 distance metrics, 2 index types
- General Tips
- Ensure that there is sufficient memory
- Connection Pooling
- Tune Autovacuum
- Still changing A LOT
## Advanced strategies for PostgreSQL Lock Management
- Locks happen tables / records are modified in incompatible ways (e.g. non-backwards compatitible schema changes)
- Analyze the queries that are running
- ```select * from pg_stat_activity;```
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/advanced-strategies-for-postgresql-lock-management)
## Monitoring PostgreSQL
- It's important to monitor the processes of Postgres.
- Built-in tools
- pg_stat_statements
- pg_stat_monitor
- pg_stat_plans
- pgstattuple
- pg_buffercache
## Monitoring PostgreSQL
- External Tools
- pgwatch2
- PGObserver
- pgCluu
- pgexporter
- Prometheus & Grapfana
- Percona PMM (sales pitch!)
## Open Source At A Crossroads
- Lots of companies are selling shovels
- Some companies are taking open source and making money off of it rather than contributing back
## Top 5 PostgreSQL Query Tuning Tips
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/top-5-postgresql-query-tuning-tips)
## Working With The Citus Extension
## PostgreSQL Partitioning: Beyond the obvious
## Mastering PostgreSQL Partitioning: Supercharge Performance and Simplify Maintenance
## The Future of MySQL is Postgres
## Transaction Isolation in Postgres
## Local-first application architecture using Postgres logical replication
## Scaling PostgreSQL: Navigating Horizontal and Vertical Scalability Pathways
- Slides [here](https://postgresconf.org/conferences/2024/program/proposals/scaling-postgresql-navigating-horizontal-and-vertical-scalability-pathways)