All About Distributed Databases

What Is a Distributed Database?

A distributed database is a database system that runs on several computers, or nodes, linked together by a network. Each node can store a portion of the data, and the data stored across all nodes together makes up the entire database.

The system aims to keep the data consistent and accessible to users despite network outages or other failures. Data is both stored and processed across the nodes rather than in one place. The main objectives of a distributed database are to give applications that need access to vast amounts of data high availability, scalability, and performance.

Need for a Distributed Database in DBMS

· An organised collection of information is called a database. In a database, the data can be conveniently accessed, managed, modified, updated, controlled, and organised.

· Databases fall into two main categories: centralised and distributed. The question at hand is why we need a distributed database in a DBMS at all. For the time being, let's imagine that we only have centralised databases.

· All of the data is entered into a single database, which can grow to the point where querying even a single entry takes a long time.

· Since we only have one database, once a fault happens, we are no longer able to fulfil user requests.

· A single machine can only be scaled up so far, and its reduced availability also lowers throughput.

· Distributed databases address throughput, latency, scalability, availability, fault tolerance, and many other difficulties that arise when using a single machine and a single database. This is why we need them. Let's go over them in more depth.

Distributed Database Types

There are two types of distributed databases:

Homogeneous
Heterogeneous

1. Homogeneous

A homogeneous distributed database is a form of distributed database where each site uses the same operating system and database management system (DBMS) software. In this kind of distributed database, data is partitioned and distributed across various sites in a way that ensures data consistency and availability. Each site has a copy of the same database schema.

Homogeneous distributed databases are commonly used in large-scale enterprise applications where data needs to be shared across multiple locations, such as in global supply chain management, financial services, and telecommunications.

2. Heterogeneous

In a heterogeneous distributed database, the system's nodes use different database management systems (DBMS) or data models. This means that the data is stored in various formats and accessed through different languages or protocols.

For example, one node in the system may use a relational DBMS, while another node uses a NoSQL DBMS. Each node may also use different data models, such as hierarchical, network, or object-oriented data models. Because the data must be translated between various formats and languages, managing a heterogeneous distributed database can be more difficult than managing a homogeneous distributed database. Data security, performance, and consistency problems may result from this. Therefore, to manage a heterogeneous distributed database effectively, specialised knowledge and tools are frequently needed.

Distributed Database Storage

Distributed database storage is managed in two ways:

Replication
Fragmentation

1. Replication

The system maintains copies of the data at many locations, as the name would imply. A database is fully redundant if every single record is accessible from numerous locations.

Data replication has the benefit of making more data accessible across more sites. Since the data is spread across several locations, queries can be handled concurrently.

Data availability benefits greatly from replication. Replication does, however, have significant drawbacks. If any site fails to update and synchronise its data with the other sites consistently, the database becomes inconsistent.

Constant updating also makes concurrency control more difficult and adds server overhead.
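
The inconsistency risk can be illustrated with a minimal Python sketch. It is not a real replication protocol: the `Replica` class and the `replicate_write` and `read_all` helpers are hypothetical names invented for this example, and each site is just an in-memory dictionary. It shows one site missing an update while it is offline and then serving stale data.

```python
class Replica:
    """One site holding a full copy of the data (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True

    def apply(self, key, value):
        # An offline site silently misses the update.
        if self.online:
            self.data[key] = value
            return True
        return False


def replicate_write(replicas, key, value):
    """Send a write to every replica; return the names of sites that missed it."""
    return [r.name for r in replicas if not r.apply(key, value)]


def read_all(replicas, key):
    """Read the same key from every site to expose any divergence."""
    return {r.name: r.data.get(key) for r in replicas}


replicas = [Replica("site_a"), Replica("site_b"), Replica("site_c")]

replicate_write(replicas, "balance", 100)   # all sites in sync
replicas[2].online = False                  # site_c drops off the network
missed = replicate_write(replicas, "balance", 80)
replicas[2].online = True                   # site_c returns with stale data

print(missed)
# ['site_c']
print(read_all(replicas, "balance"))
# {'site_a': 80, 'site_b': 80, 'site_c': 100}
```

A real system would need some way to detect and repair the stale copy, for example by replaying the missed writes when the site reconnects. That extra synchronisation work is the overhead mentioned above.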

2. Fragmentation

With fragmentation, the relations are divided into smaller portions called fragments. Each fragment is kept at the location where it is needed.

Fragmentation requires that the fragments can later be reassembled into the original relation without losing data.

Fragmentation avoids data inconsistency, since there are no duplicate copies of the data.

Two different types of fragmentation exist:

Horizontal fragmentation - Each tuple (row) of the relation is assigned to a single fragment.

Vertical fragmentation - The relation schema is divided into smaller schemas, each of which contains a common candidate key that ensures a lossless join.
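
The reassembly requirement can be sketched in Python for both ways of splitting a relation: by rows (horizontal fragmentation) and by columns (vertical fragmentation). This is an illustration only. The `employees` relation, its column names, and the helpers `horizontal_fragments`, `vertical_fragment`, and `rejoin` are all made up for this example, and plain lists of dictionaries stand in for tables.

```python
# A small relation: each dict is one tuple (row); "id" is the candidate key.
employees = [
    {"id": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"id": 2, "name": "Ben", "city": "Leeds", "salary": 42000},
    {"id": 3, "name": "Chen", "city": "Pune", "salary": 61000},
]


def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (one fragment per value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Keep only the key plus the chosen columns of every row."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def rejoin(left, right, key):
    """Join two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: reassemble with a union of the fragments.
by_city = horizontal_fragments(employees, "city")
union = sorted((r for frag in by_city.values() for r in frag), key=lambda r: r["id"])
assert union == employees

# Vertical: reassemble with a join on the key.
personal = vertical_fragment(employees, "id", ["name", "city"])
payroll = vertical_fragment(employees, "id", ["salary"])
assert rejoin(personal, payroll, "id") == employees
```

Every vertical fragment carries the `id` key. Without it there would be no way to match a row in `personal` with its counterpart in `payroll`, and the join would not be lossless.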

Distributed Database Features

Some general features of distributed databases are:

Location independence - Data is physically stored at multiple sites and managed by an independent DDBMS.
Distributed query processing - Distributed databases answer queries in a distributed environment that manages data at multiple sites. High-level queries are transformed into a query execution plan for simpler management.
Distributed transaction management - Provides a consistent distributed database through commit protocols, distributed concurrency control techniques, and distributed recovery methods in case of many transactions and failures.
Seamless integration - Databases in a collection usually represent a single logical database, and they are interconnected.
Network linking - All databases in a collection are linked by a network and communicate with each other.
Transaction processing - Distributed databases incorporate transaction processing, where a transaction is a program unit made up of one or more database operations. A transaction is atomic: it is either executed entirely or not at all.
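
The all-or-nothing behaviour of a distributed transaction can be sketched with two-phase commit, one well-known commit protocol. The text above does not say which protocol a given DDBMS uses, so treat this as an assumed example. The `Participant` class and `two_phase_commit` function are hypothetical names, and the sketch leaves out timeouts, logging, and recovery from a coordinator failure.

```python
class Participant:
    """A site taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere or nowhere; return True if the transaction committed."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:                   # phase 2: global commit
            p.commit()
        return True
    for p in participants:                       # phase 2: global abort
        p.rollback()
    return False


ok_sites = [Participant("site_a"), Participant("site_b")]
mixed_sites = [Participant("site_a"), Participant("site_b", can_commit=False)]

print(two_phase_commit(ok_sites), [p.state for p in ok_sites])
# True ['committed', 'committed']
print(two_phase_commit(mixed_sites), [p.state for p in mixed_sites])
# False ['aborted', 'aborted']
```

In the second run a single "no" vote from `site_b` forces `site_a` to roll back as well, so no site is left with a partially applied transaction.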


Distributed Database Advantages and Disadvantages

Advantages

Modular development - A distributed database developed in a modular fashion can be expanded to new locations or units by adding new servers and data to the current configuration and connecting them to the distributed system. The database continues to operate normally after this kind of expansion.

Reliability - Distributed databases are more reliable than centralised ones. When a centralised database fails, the entire system shuts down. When a failure occurs in a distributed database, the system continues to run, although with reduced performance, until the problem is fixed.

Disadvantages

Large overhead - When database replication is employed, operations spanning many sites require numerous calculations and ongoing synchronisation, adding a significant amount of processing overhead.

Data integrity - When database replication is employed, changing data at several sites can jeopardise data integrity.

Data distribution errors - Effective data distribution is a key factor in how quickly user requests are answered. If data is not evenly distributed across the sites, responsiveness may suffer.
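
The effect of uneven distribution can be sketched in Python by comparing two ways of assigning rows to sites. The `orders` data, the `region` and `order_id` columns, and the `imbalance` helper are invented for this example. The ratio it computes is one simple way to express skew, not a standard metric.

```python
from collections import Counter

# Hypothetical workload: 1,000 orders, 90% of them from one region.
orders = [{"order_id": i, "region": "EU" if i % 10 else "US"} for i in range(1000)]


def imbalance(load):
    """Ratio of the busiest site's row count to the average row count."""
    average = sum(load.values()) / len(load)
    return max(load.values()) / average


# Scheme 1: one site per region. The EU site holds most of the data.
by_region = Counter(order["region"] for order in orders)

# Scheme 2: spread rows over two sites by order_id.
by_id = Counter(order["order_id"] % 2 for order in orders)

print(dict(by_region), imbalance(by_region))
# {'US': 100, 'EU': 900} 1.8
print(dict(by_id), imbalance(by_id))
# {0: 500, 1: 500} 1.0
```

Under the first scheme the EU site carries 1.8 times the average load, so requests that hit it are more likely to queue. The second scheme is balanced, but a query for all orders from one region now has to touch both sites.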
