Google Cloud Storage is an object storage service for large-scale unstructured data in the cloud. Organizations use it to store video and audio files, backups, petabyte-scale analytics data, logs, and more. Google Cloud Storage can also be mounted as a file system on Linux machines for specific use cases.
In this blog, we’ll explore Google Cloud Storage use cases in greater detail, the advantages and disadvantages of this approach, and how you can get started with mounting it as a file system in Linux.
File systems typically use a hierarchical namespace, which isn’t available by default in object storage services such as Google Cloud Storage. However, certain use cases like application lift and shift migrations, distributed file systems, and file sharing demand the scalability and flexibility offered by object storage, but in a hierarchical namespace constructed as a file system. Luckily, there are tools available that can help you do that, such as Cloud Storage FUSE.
This approach is helpful when you can’t make changes to legacy applications to use object storage, even though the architecture would benefit from object storage capabilities. Updating the legacy code bases to include Google Cloud object storage APIs would require investing additional time and effort. The same applications can function in the same way as they did on-premises by using Google Cloud Storage mounted as a file system. It can also help fast-track Google Cloud migrations.
Mounting Google object storage as a file system also helps in collaborative file sharing where files are accessed simultaneously by different users. Building distributed file shares using Google Cloud object storage is beneficial as the service can scale according to your requirements. It’s also economical as you only need to pay for the object storage space you use.
Google Cloud Storage can be mounted on Linux machines as a file system using a tool called Cloud Storage FUSE. Behind the scenes, this tool relies on an open source FUSE adapter. Because the bucket is exposed as a locally mounted folder, applications can access object storage in the same way that they access a standard file system.
Cloud Storage FUSE interprets the “/” characters in object names as directory separators. In other words, objects in the bucket that share the same prefix are handled the same way as files in a directory. Note, however, that the interface doesn’t behave like a network file share such as NFS or CIFS; the bucket appears as a file system mounted on the operating system. Folders mounted this way also aren’t POSIX compliant.
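To make the prefix-to-directory idea concrete, here is a minimal sketch that takes a list of object names and prints the directories those names imply. It illustrates the naming convention only, not how Cloud Storage FUSE is implemented, and the function name implied_dirs and the sample object names are made up for this example.

```shell
# List every directory implied by a set of object names read from stdin.
# Names without a "/" sit at the top level of the bucket and imply nothing.
implied_dirs() {
  while IFS= read -r name; do
    dir=${name%/*}
    if [ "$dir" = "$name" ]; then
      continue
    fi
    # Walk upward: a/b/c.txt implies both a/b and a.
    while :; do
      printf '%s\n' "$dir"
      case $dir in
        */*) dir=${dir%/*} ;;
        *) break ;;
      esac
    done
  done | sort -u
}

# Hypothetical object names, used only for illustration.
printf '%s\n' logs/2024/app.log logs/2024/db.log readme.txt | implied_dirs
```

For the three sample names, the function prints logs and logs/2024, while readme.txt stays at the top level of the bucket.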
Cloud Storage FUSE is ideal for use cases where an application demands the scalability and performance offered by Google Cloud object storage, such as machine learning models that need file system access for data storage, analytics, data models, or log ingestion. Cloud Storage FUSE also integrates with the following Google Cloud products: Google Kubernetes Engine, Vertex AI, Deep Learning VM Images, and Deep Learning Containers.
Let's look at the steps required to mount Google Cloud Storage as a file system on an Ubuntu machine using Cloud Storage FUSE.
Create a storage bucket with the following command, replacing <bucketname> with the name of the bucket you want to create:
gcloud storage buckets create gs://<bucketname>
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install fuse gcsfuse
mkdir "$HOME/mountfolder"
gcsfuse <bucketname> "$HOME/mountfolder"
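Once the mount command returns, it’s worth confirming that the folder really is a mount point before pointing an application at it. The sketch below assumes the same $HOME/mountfolder path used above and that the mountpoint and fusermount utilities are available on the machine; the helper name is_mounted is made up for this example.

```shell
MOUNT_DIR="$HOME/mountfolder"

# Succeed only if the given directory is currently a mount point.
is_mounted() {
  mountpoint -q "$1" 2>/dev/null
}

if is_mounted "$MOUNT_DIR"; then
  ls -l "$MOUNT_DIR"          # objects in the bucket show up as files
  fusermount -u "$MOUNT_DIR"  # unmount the bucket when you're finished
else
  echo "$MOUNT_DIR is not mounted" >&2
fi
```

If the folder isn’t mounted, the script only prints a message and leaves the folder untouched.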
Mounting Google Cloud Storage as a file system is viable for some use cases, but there are also some limitations to this approach that you should be aware of.
Google Cloud Storage mounted as a file system isn’t recommended for performance-intensive workloads like databases, as it has higher latency than a native file system. Throughput can also drop considerably when reading and writing single files.
Storage read/write operations can hit transient errors while accessing data via Cloud Storage FUSE. Therefore, it’s recommended to use Cloud Storage FUSE only for applications that are tolerant of such errors.
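One common way to make a script tolerant of transient errors is to retry the failing command with an increasing delay. The following is a minimal sketch of that general pattern, not something Cloud Storage FUSE provides; the function name retry, its arguments, and the file path in the usage comment are assumptions made for this example.

```shell
# Run a command up to <attempts> times, doubling the wait after each failure.
# Usage: retry <attempts> <initial_delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Hypothetical usage: copy a file out of the mounted bucket, trying up to
# 5 times and waiting 1, 2, 4, then 8 seconds between attempts.
# retry 5 1 cp "$HOME/mountfolder/report.csv" /tmp/
```

Retrying only helps with operations that are safe to repeat, so it suits reads and whole-file copies better than appends.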
Google Cloud Storage mounted through Cloud Storage FUSE may behave inconsistently for use cases that need file transcoding. This occurs because Cloud Storage FUSE doesn’t support reading or modifying objects that are required during encoding or compression.
You can’t use Cloud Storage FUSE with storage buckets that have object versioning or retention policies enabled. Mounting storage buckets with versioning enabled can lead to unpredictable behavior.
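Before mounting a bucket, you can check whether versioning is turned on. The sketch below assumes the gsutil CLI is installed and authenticated, and that gsutil versioning get prints a line such as "gs://<bucketname>: Enabled" or "gs://<bucketname>: Suspended"; the helper name versioning_enabled is made up for this example.

```shell
# Succeed if object versioning is enabled on the named bucket.
# Assumes gsutil prints "gs://<bucketname>: Enabled" when versioning is on.
versioning_enabled() {
  gsutil versioning get "gs://$1" | grep -q ': Enabled'
}

# Hypothetical usage, with my-bucket standing in for your bucket name:
# if versioning_enabled my-bucket; then
#   echo "Versioning is enabled; avoid mounting this bucket" >&2
# fi
```

The check covers versioning only; retention policies need to be reviewed separately.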
The file size and I/O limitations of object storage still apply when Google Cloud Storage is mounted as a file system using Cloud Storage FUSE. For example, the maximum size of a file that can be stored is 5 TB and the maximum requests per second is 1000 per object.
NetApp Cloud Volumes ONTAP simplifies the usage of Google Cloud Storage for file system storage requirements. Cloud Volumes ONTAP is a trusted storage management solution that uses NetApp ONTAP technology to enhance efficiency and optimize the use of native Google Cloud Storage.
Cloud Volumes ONTAP provides a number of file system capabilities that your workloads could benefit from, such as:
Armed with the knowledge of setting up Google Cloud Storage as a file system using Cloud Storage FUSE, you can now use the process for certain use cases, such as creating a shared file system, storing common log/configuration files, and migrating applications without changing any code. However, because Google Cloud Storage isn’t natively designed to be used as file system storage, it’s important to weigh the associated limitations against your business goals and confirm that the approach meets your performance and latency requirements.
Cloud Volumes ONTAP can work as a better alternative, as it uses Google Cloud Storage in the backend and delivers enhanced efficiency, management, and data protection capabilities at a reduced cost.