0%

数据库原理概论——introduction

本专题以《数据库系统概念 原书第六版》为参考,结合软件学院课件、微助教题目,致力于制作一套专属笔记。

第一章 引言

​ 第一章为线上课程,采用老旧的录播课程,课程知识量较大,不过习题比较不错;纯粹的英文课件应采取更好的模式,构建更好的笔记。– 记录于网课结束,便于后期迭代

​ 作为全书的统领,引言介绍了数据库原理概论这门课程要探讨的内容,学习这一部分是比较困难的,众多复杂概念关系应该会在之后的章节中体现,本章旨在全篇引导。

​ 笔记内容以ppt英文为主,辅助以个人理解,希望可以提供帮助。

1.DB、DBMS、DBS

学习数据库相关知识,首先得理解这三个概念;数据数据库(database)、数据库管理系统,数据库系统

Database (DB): contains information about a particular enterprise

* 数据在我们专业中有很多不同的解释,数据结构、数据分析等具体领域也都不尽相同,那么我们数据库原理概论这里的数据,就是指我们在编程中具体遇到的、需要存储并且可以方便操作的信息集合。(个人理解),也就是上述提到的一特定领域企业的信息*

Database Management System (DBMS) : Provide a way to store and retrieve database information that is both convenient and efficient. Like OS, DBMS is a system software

数据库管理系统是一款系统软件,给我们存储、查询数据信息提供了巨大的便利。之前,我们使用文件系统操作数据,在处理数据时会遇到困难的问题,这些阻碍了我们编程的进展,我认为这些问题应该就是数据库原理的基本内容,当我们学完数据库后,编程能力一定会有巨大突破。

关系:数据库管理系统(DBMS)通过调用操作系统(OS)等操作实现对数据(DB)的管理

由计算机硬件、操作系统、DBMS、数据库、应用程序以及用户等组成的一个整体称为数据库系统(DBS)

2.DBS的应用

representative(代表) application:

  • Banking: all transactions(事务)
  • Airlines: reservations(预约), schedules
  • Universities: registration, grades
  • Sales: customers, products, purchases
  • Online retailers(零售商): order tracking(跟踪), customized(定制的) recommendations
  • Manufacturing: production, inventory(存货), orders, supply chain
  • Human resources: employee records, salaries, tax deductions(扣除额)

3.DBS的目标

这一部分对比了文件系统与数据库系统,点明数据库系统的优势,而这些优势正是我们重点关注的内容;毕竟文件系统真的是给我们的编程造成了不少困扰

Drawbacks of using file system:

  • Data redundancy(冗余) and inconsistency
  • Difficulty in accessing data :
    • Need to write a new program to carry out each new task
  • Data isolation (数据孤立)— multiple files and formats
  • Integrity problems(完整性问题)
    • Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly(显示地)
    • Hard to add new constraints or change existing ones
  • Atomicity of updates(更新时的原子性问题)
    • Failures may leave data in an inconsistent state with partial updates carried out
    • Transfer of funds from one account to another should either complete or not happen at all
  • Concurrent(并发) access by multiple users
    • Concurrent access is needed for performance
    • Uncontrolled concurrent accesses can lead to inconsistencies
    • Two people reading a balance and updating it at the same time
  • Security problems

4.View of Data

4.1 Data Abstract

  • database system provide an abstract view of the data
  • Hide certain complexity details of how data is stored and maintained (存储和维护)
  • Achieved through several level of abstraction
    • Physical level
      • describes how a record is stored
    • Logical level
      • describes what data are stored in the database, and the relationships among the data
      • describe entire database
    • View level
      • Describe part of the database
      • A way to hide:
        • (a) details of data types and
        • (b) information (such as an employee’s salary) for security purposes

Levl of Abstraction

4.2 Instances and Schemas(模式)

4.2.1 Similar to types and variables in programming languages

  • Schema – the overall design of the database
    • Analogous(相似的) to type information of a variable in a program
  • Instance – the collection of information stored in the database at a particular point in time
    • Analogous to the value of a variable

4.2.2 Several Schemas of DBS

  • Physical schema: database design at the physical level
  • Logical schema: database design at the logical level
  • Sub schemas: describe different views at the view level

4.2.3 Physical Data Independence(物理数据独立性)

​ – application do not depend on physical schema, and thus need not be rewritten if the physical schema changes

  • Applications depend on the logical schema
  • Physical schema is hidden beneath the logical schema

4.3 Data Models

  • Relational model(关系模型)
  • Entity-Relationship data model(实体-联系模型)
    • mainly for database design
  • Object-based data models (基于对象的数据模型)
    • Object-oriented
    • Object-relational
  • Semistructured data model (半结构化的数据模型)
    • XML
  • Other older models:
    • Network model (网状数据模型)
    • Hierarchical model(层次结构模型)

5.Database Languages

Database system provides

  • Data Definition Language (DDL) to specify database schema
  • Data Manipulation Language (DML) to express database queries and updates

In practice, DDL and DML form parts of a single database language

  • SQL (Structured Query Language)

5.1 DDL

1
2
3
4
create table account (
account-number char(10),
balance
);
  • DDL compiler generates tables according to DDL statements and store the schema definitions of these tables in data dictionary
  • Data dictionary contains metadata(元数据:关于数据的数据)
    • Database schema
    • Integrity constraints(完整性约束)
    • Authorization(授权)

5.2DML

  1. Language for accessing and manipulating the data organized by the appropriate data model
    1. Retrieval of information
    2. Insertion of new information
    3. Deletion of information
    4. Modification of information
  2. DML also known as query language(尽管技术上,这是不准确的)
  3. Two classes of languages:
    1. Procedural(过程化) – user specifies what data is required and how to get those data
    2. Declarative (nonprocedural) (声明式)– user specifies what data is required without specifying how to get those data
  4. SQL is the most widely used nonprocedural query language
1
Select customer.customer_name from   customer where  customer.customer_id =192-83-7465

6.Access from application programs

Application programs generally access databases through one of

  • Language extensions to allow embedded SQL(扩展宿主语言)
  • Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database

7.Relational Databases

  • A relational database is based on the relational data model
  • Data and relationships among the data is represented by a collection of tables
  • Includes both a DML and a DDL
  • Most commercial relational database systems employ the SQL query language

8.Database Design

Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.

  • Business decision – What attributes should we record in the database?
  • Computer Science decision – What relation schemas should we have and how should the attributes be distributed among the various relation schemas?

Physical Design – Deciding on the physical layout of the database

9.Database Users

  • Application programmers – interact with system through DML calls
  • Sophisticated users(见多识广的;老练的;见过世面的;复杂巧妙的;先进的;精密的;水平高的;在行的) – form requests in a database query language
  • Specialized users (专家)– write specialized database applications that do not fit into the traditional data processing framework
  • Naive users (天真)– invoke(调用) one of the permanent application programs that have been written previously

10.Database Administrator (DBA)

  • A person who have central control of the database system, coordinates(协调) all the activities of the database system

  • Has a good understanding of the enterprise’s information resources and needs

  • Database administrator’s duties include:

    • Schema definition
    • Storage structure and access method definition
    • Granting user authority to access the database
    • Specifying integrity constraints
    • Acting as liaison(联络) with users
    • Monitoring performance and responding to changes in requirements

11.Database System Structure

Function components of DBMS

  • Storage Management(存储管理器)
  • Query Processing (查询处理器)

11.1 Storage Management

  • Database data is typically stored on secondary memory (hard disk), using file system

  • Storage Manager is a program module responsible to

    • Interact with the file sytem
    • Efficient storing, retrieving and updating of data
  • Storage Manager include following components

    • Authorization and integrity manager(权限及完整性管理器
    • Transaction manager(事务管理器
    • File manager(文件管理器
    • Buffer manager(缓冲区管理器
  • Authorization and integrity manager

    • Tests for the satisfaction of integrity constraints and check the authority of users to access data
  • Transaction manager

    • Ensure database remains in a consistent state despite system failure
    • Ensure concurrent transaction execution proceed without conflicting
  • File manager

    • Allocate space on disk storage
    • Manage data structures used to represent information stored on disk
  • Buffer manager

    • Fetch data from disk storage to main memory
    • Decide what data to cache in main memory
  • Storage Manager implements following data structures

    • Data File(数据文件): store database itself
    • Data Dictionary(数据字典): store metadata about the structure of database, in particular the schema of database
    • Indices(索引): provide fast access to data items that hold particular value

11.2 Query Processing

  • One of the main purpose of DBS is to simplify and facilitate (促进)access to data

  • DBS achieve this goal by

    • Data abstraction
    • Nonprocedural language (非过程化)
  • Query Processor translates updates and queries written in a nonprocedural language, at the logical level, into an efficient sequence of operations at the physical level

  • Query processor components include

    • DDL Interpreter(解释器)
      • Interprets DDL statement and record definitions in the data dictionary
    • DML Compiler(编译器)
      • Translate DML statement into an evaluation plan consisting low-level instructions
    • Query Evaluation Engine(查询执行引擎)
      • Executes low level instructions generated by DML compiler

(*)查询执行、查询优化器选择合适执行策略的方法

12.Transaction Management

  • A transaction is a collection of operations that performs a single logical function in a database application
  • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
  • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database

13.Database Architecture

  • The architecture of a database systems is greatly influenced by the underlying computer system on which the database is running
    • Centralized
    • Client-server
    • Parallel (multi-processor)
    • Distributed

Architecture

Architecture-cn

14.History of Database Systems

  • 1950s and early 1960s:

    • Data processing using magnetic tapes for storage
      • Tapes provide only sequential access
    • Punched cards for input
  • Late 1960s and 1970s:

    • Hard disks allow direct access to data
    • Network and hierarchical data models in widespread use
    • Ted Codd defines the relational data model
      • Would win the ACM Turing Award for this work
      • IBM Research begins System R prototype
      • UC Berkeley begins Ingres prototype
  • 1980s:

    • Research relational prototypes evolve into commercial systems
      • SQL becomes industrial standard
    • Parallel and distributed database systems
    • Object-oriented database systems
  • 1990s:

    • Large decision support and data-mining applications
    • Large multi-terabyte data warehouses
    • Emergence of Web commerce