分享更有价值
被信任是一种快乐

怎么从零学习PostgreSQL Page结构

文章页正文上

这篇文章主要为大家展示了“怎么从零学习PostgreSQL Page结构”,内容简而易懂,条理清晰,希望能够帮助大家解决疑惑,下面让小编带领大家一起研究并学习一下“怎么从零学习PostgreSQL Page结构”这篇文章吧。pg中的page和Oracle中的数据块一样,指的是数据库的块,操作系统块的整数倍个,默认是8K也就是两个操作系统块(4k的文件系统块)。这个大小在pg编译安装configure的时候通过–with-blocksize参数指定,单位是Kb。2.2PageHeaderData数据结构 (页头)
可以看到一个Page有 Pager header(页头),后面是linp(行指针),pd_lower和pd_upper分别是空闲空间的开始位置和结束位置;后面就是行数据(pg里面的行就是tuple)和special空间。整个page的结构比Oracle的数据块结构简单多了。typedefstructPageHeaderData{/* XXX LSN is member of *any* block, not only page-organized ones */PageXLogRecPtrpd_lsn;/* LSN: next byte after last byte of xlog* record for last change to this page */uint16pd_checksum;/* checksum */uint16pd_flags;/* flag bits, see below */LocationIndexpd_lower;/* offset to start of free space */LocationIndexpd_upper;/* offset to end of free space */LocationIndexpd_special;/* offset to start of special space */uint16pd_pagesize_version;TransactionIdpd_prune_xid;/* oldest prunable XID, or zero if none */ItemIdDatapd_linp[FLEXIBLE_ARRAY_MEMBER];/* line pointer array */}PageHeaderData;具体的长度和描述也都有详细说明:FieldTypeLengthDescriptionpd_lsnPageXLogRecPtr8 bytesLSN: next byte after last byte of WAL record for last change to this pagepd_checksumuint162 bytesPage checksumpd_flagsuint162 bytesFlag bitspd_lowerLocationIndex2 bytesOffset to start of free spacepd_upperLocationIndex2 bytesOffset to end of free spacepd_specialLocationIndex2 bytesOffset to start of special spacepd_pagesize_versionuint162 bytesPage size and layout version number informationpd_prune_xidTransactionId4 bytesOldest unpruned XMAX on page, or zero if none简单来说,pd_lsn是指最后修改过这个page的lsn(log sequence number),这个和wal(write ahead log,同oracle redo)中记录的lsn一致。数据落盘时redo必须先刷到wal,这个pd_lsn就记录了最后data落盘时的相关redo的lsn。pd_checksum是校验和,在initdb初始化实例的时候通过-k参数指定开启,默认是关闭的,initdb之后不能修改,它基于FNV-1a hash算法,做了相应的更改。这个校验和与Oracle的checksum一样用于数据块在读入和写出内存时的校验。比如我们在内存中修改了一个数据块,写入到磁盘的时候,在内存里面先计算好checksum,数据块写完后再计算一遍cheksum是否和之前在内存中的一致,确保整个写出过程没有出错,保护数据结构不被破坏。pd_flags有以下的值:/** pd_flags contains the following flag bits. Undefined bits are initialized* to zero and may be used in the future.** PD_HAS_FREE_LINES is set if there are any LP_UNUSED line pointers before* pd_lower. This should be considered a hint rather than the truth, since* changes to it are not WAL-logged.** PD_PAGE_FULL is set if an UPDATE doesn’t find enough free space in the* page for its new tuple version; this suggests that a prune is needed.* Again, this is just a hint.*/#definePD_HAS_FREE_LINES0x0001/* are there any unused line pointers? */#definePD_PAGE_FULL0x0002/* not enough free space for new tuple? */#definePD_ALL_VISIBLE0x0004/* all tuples on page are visible to* everyone */#definePD_VALID_FLAG_BITS0x0007/* OR of all valid pd_f免费主机域名lags bits */pd_lower和pd_upper分别表示空闲空间起始位置和结束位置;pd_special在索引page才有效;pd_pagesize_version是page大小和page version的存储位,在不同数据库版本中,page version不一样:数据库版本pd_pagesize_version07.3 & 7.418.028.13>8.34prune_xid表示这个page上最早删除或者修改tuple的事务id,在vacuum操作的时候会用到。(pg没有undo,旧的数据也在page中,用vacuum来清理)lp_off是tuple的开始的偏移量;lp_flags是标志位;lp_len记录了tuple的长度。
FieldLengthDescriptionlp_off15 bitsoffset to tuplelp_flags2 bitsState of iteam pointerlp_len15 bitsByte length of tupletypedefstructHeapTupleFields{TransactionIdt_xmin;/* inserting xact ID */TransactionIdt_xmax;/* deleting or locking xact ID */union{CommandIdt_cid;/* inserting or deleting command ID, or both */TransactionIdt_xvac;/* old-style VACUUM FULL xact ID */}t_field3;}HeapTupleFields;typedefstructDatumTupleFields{int32datum_len_;/* varlena header (do not touch directly!) */int32datum_typmod;/* -1, or identifier of a record type */Oiddatum_typeid;/* composite type OID, or RECORDOID *//** Note: field ordering is chosen with thought that Oid might someday* widen to 64 bits.*/}DatumTupleFields;structHeapTupleHeaderData{union{HeapTupleFieldst_heap;DatumTupleFieldst_datum;}t_choice;ItemPointerDatat_ctid;/* current TID of this or newer tuple (or a* speculative insertion token) *//* Fields below here must match MinimalTupleData! */uint16t_infomask2;/* number of attributes + various flags */uint16t_infomask;/* various flag bits, see below */uint8t_hoff;/* sizeof header incl. bitmap, padding *//* ^ – 23 bytes – ^ */bits8t_bits[FLEXIBLE_ARRAY_MEMBER];/* bitmap of NULLs *//* MORE DATA FOLLOWS AT END OF STRUCT */};(*这部分代码在src/include/access/htup_details.h)也有对应的长度和描述的相详细说明:
FieldTypeLengthDescriptiont_xminTransactionId4 bytesinsert XID stampt_xmaxTransactionId4 bytesdelete XID stampt_cidCommandId4 bytesinsert and/or delete CID stamp (overlays with t_xvac)t_xvacTransactionId4 bytesXID for VACUUM operation moving a row versiont_ctidItemPointerData6 bytescurrent TID of this or newer row versiont_infomask2uint162 bytesnumber of attributes, plus various flag bitst_infomaskuint162 bytesvarious flag bitst_hoffuint81 byteoffset to user dataunion是共享结构体,起作用的变量是最后一次赋值的成员。来看看tuple header的结构。在HeapTupleFields中,t_xmin是插入这行tuple的事务id;t_xmax是删除或者锁住tuple的事务id;union结构中的t_cid是删除或者插入这个tuple的命令id,也就是命令序号;t_xvac是以前格式的vacuum full用到的事务id。在DatumTupleFields中,datum_len_指tuple的长度;datum_typmod是记录的type;datum_typeid是记录的id。页头HeapTupleHeaderData包含了union结构体中的两个变量HeapTupleFields和DatumTupleFields。t_ctid是tuple id,类似oracle的rowid,形式为(块号,行号)。t_infomask2 表示属性和标志位t_infomask 是flag标志位,具体值如下:/** information stored in t_infomask:*/#defineHEAP_HASNULL0x0001/* has null attribute(s) */#defineHEAP_HASVARWIDTH0x0002/* has variable-width attribute(s) */#defineHEAP_HASEXTERNAL0x0004/* has external stored attribute(s) */#defineHEAP_HASOID0x0008/* has an object-id field */#defineHEAP_XMAX_KEYSHR_LOCK0x0010/* xmax is a key-shared locker */#defineHEAP_COMBOCID0x0020/* t_cid is a combo cid */#defineHEAP_XMAX_EXCL_LOCK0x0040/* xmax is exclusive locker */#defineHEAP_XMAX_LOCK_ONLY0x0080/* xmax, if valid, is only a locker *//* xmax is a shared locker */#defineHEAP_XMAX_SHR_LOCK(HEAP_XMAX_EXCL_LOCK|HEAP_XMAX_KEYSHR_LOCK)#defineHEAP_LOCK_MASK(HEAP_XMAX_SHR_LOCK|HEAP_XMAX_EXCL_LOCK|HEAP_XMAX_KEYSHR_LOCK)#defineHEAP_XMIN_COMMITTED0x0100/* t_xmin committed */#defineHEAP_XMIN_INVALID0x0200/* t_xmin invalid/aborted */#defineHEAP_XMIN_FROZEN(HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)#defineHEAP_XMAX_COMMITTED0x0400/* t_xmax committed */#defineHEAP_XMAX_INVALID0x0800/* t_xmax invalid/aborted */#defineHEAP_XMAX_IS_MULTI0x1000/* t_xmax is a MultiXactId */#defineHEAP_UPDATED0x2000/* this is UPDATEd version of row */#defineHEAP_MOVED_OFF0x4000/* moved to another place by pre-9.0* VACUUM FULL; kept for binary* upgrade support */#defineHEAP_MOVED_IN0x8000/* moved from another place by pre-9.0* VACUUM FULL; kept for binary* upgrade support */#defineHEAP_MOVED(HEAP_MOVED_OFF|HEAP_MOVED_IN)#defineHEAP_XACT_MASK0xFFF0/* visibility-related bits */t_hoff表示tuple header的长度t_bits记录了tuple中null值的列它在源码的crontrib目录下面postgres@cs-> cd postgresql-10.4/contrib/pageinspectmake && make installpostgres@cs-> makepostgres@cs-> make installcreate extension就好了postgres@cs-> psqlpsql (10.4)Type “help” for help.postgres=# CREATE EXTENSION pageinspect;CREATE EXTENSIONpost免费主机域名gres=# xExpanded display is on.postgres=# dxList of installed extensions-[ RECORD 1 ]——————————————————Name | pageinspectVersion | 1.6Schema | publicDescription | inspect the contents of database pages at a low level-[ RECORD 2 ]——————————————————Name | plpgsqlVersion | 1.0Schema | pg_catalogDescription | PL/pgSQL procedural language3.2 创建建测试表t1,插入数据这里可以看到1000行数据用了6个数据块来存储(这里数据块从0开始),第6个数据块包含了73条记录(tuple)
这里我们通过两个函数来查看page_header 可以看到页头的数据heap_pageitems 可以看到具体tuple的数据postgres=# xExpanded display ison.postgres=# select* frompage_header(get_raw_page(‘t1’,0));-[ RECORD1]——–lsn | 0/1671188checksum | 0flags | 0lower| 772upper| 784special | 8192pagesize | 8192version | 4prune_xid | 0postgres=#可以看到第0个page的pd_lsn为0/1671188,checksum和flags都是0,这里没有开启checksum;tuple开始偏移是772(pd_lower),结束偏移是784(pd_upper),这个page是个表,所以它没有special,我们看到的sepcial就是8192了;pagesize是8192就是8K,version是4,没有需要清理的tuple,所以存储需要清理的tuple的最早事务的id就是0(prune_xid)。我们来看一行记录,可以看到它是第1行记录(lp=1),tuple的开始偏移量8160(lp_off),tuple的长度是32bytes(lp_len为32,这个tuple是第一个插入的tuple,所以lp_off+lp_len=8160+32=8192),这行记录的插入事务id是557(t_min),和tuple的删除事务id是0(tmax),这里数据没有被删除,所以都是0。我们还可以看到t_ctid是(0,1),这里表示这个tuple是这个page中第一个块的第一条tuple;tinfomask2是2,t_infomask为2306,十六进制就是 0x0902 ,这个我们可以根据上面提到的值去看看具体的含义,0x0902=0x0100+0x0800+0x0002;tuple头部结构(行头)的长度是24(t_hoff),t_data就是16进制存储的真正的数据了。我们删除一行tuple可以看到prune_xid有了值,为559,这个559就是删除这个tuple的事务id(当前最早的删除或更改了tuple的事务id)同样,我们可以看到lp为1的这个tuple的t_xmax为559,这里就是删除这行tuple的事务id。
PostgreSQLPage的物理结构相比Oracle的数据块来说简单很多了,源代码开放也便于学习和研究,pg是个很好很强大的数据库,值得好好学习。以上是“怎么从零学习PostgreSQL Page结构”这篇文章的所有内容,感谢各位的阅读!相信大家都有了一定的了解,希望分享的内容对大家有所帮助,如果还想学习更多知识,欢迎关注云技术行业资讯频道!

相关推荐: 怎么使用MindMaster

本篇内容主要讲解“怎么使用MindMaster”,感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷,实用性强。下面就让小编来带大家学习“怎么使用MindMaster”吧!可谓是思维导图软件中的新秀。因为新,所以界面设计更加时尚大方,超大的操作按钮图标,即便是…

文章页内容下
赞(0) 打赏
版权声明:本站采用知识共享、学习交流,不允许用于商业用途;文章由发布者自行承担一切责任,与本站无关。
文章页正文下
文章页评论上

云服务器、web空间可免费试用

宝塔面板主机、支持php,mysql等,SSL部署;安全高速企业专供99.999%稳定,另有高防主机、不限制内容等类型,具体可咨询QQ:360163164,Tel同微信:18905205712

主机选购导航云服务器试用

登录

找回密码

注册