July 16, 2024


In the last few years, generating textual descriptions of visual data has become a compelling research problem. The reverse problem, producing visual data from written descriptions, is considerably harder because it requires combining Natural Language Processing and Computer Vision techniques. Existing techniques use Generative Adversarial Networks (GANs) to create uncompressed images from textual descriptions. GANs are a type of machine-learning framework that can produce text, photos, videos, and voice recordings. They have previously been used to produce image datasets for training other deep learning algorithms, to produce movies or animations for particular purposes, and to produce appropriate captions for photos.

In practice, most visual data is processed and transmitted in compressed form. To achieve storage and computational efficiency, the proposed work attempts to produce visual data directly in a compressed representation using Deep Convolutional GANs (DCGANs). Researchers from the Computer Vision and Biometrics Lab of IIIT Allahabad and Vignan University in India have recently created a new GAN-based model, T2CI-GAN, that can produce compressed images from text-based descriptions. This approach might serve as a starting point for investigating several options for image storage and content sharing among various smart devices.

In earlier work, the researchers used GANs and other deep learning models to handle various tasks, such as feature extraction from data, text and image data segmentation, word detection in lengthy text extracts, and creating compressed JPEG images. The new model builds on these earlier efforts to tackle a computational problem that has so far received scant attention in the literature. Few of the deep learning-based techniques that other research teams use to create images from text descriptions produce compressed images. Additionally, most existing systems treat image generation and image compression as separate steps, which increases the computational workload and processing time.

The proposed T2CI-GAN is a deep learning-based model that takes text descriptions as input and outputs compressed images. This is a significant departure from traditional approaches, which generate images from text descriptions and then compress them in a separate step. The model’s main selling point is its ability to map text descriptions directly to compressed images.

The research team created two GAN-based models to produce compressed images from text descriptions. The first model was trained on a dataset of JPEG-compressed DCT (discrete cosine transform) images. After training, this model could produce compressed images from text descriptions. The second GAN-based model, by contrast, was trained on a set of RGB images. This model learned to produce JPEG-compressed DCT representations of images; the DCT expresses a series of data points as a sum of cosine functions oscillating at different frequencies. The proposed models were evaluated using both the RGB and the JPEG-compressed versions of the well-known open-source benchmark dataset Oxford-102 Flower. In the JPEG-compressed domain, the model achieved highly encouraging, state-of-the-art performance.
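To make the DCT representation mentioned above more concrete, the following is a minimal sketch of the 2D discrete cosine transform that JPEG applies to 8x8 pixel blocks. It is an illustration only and is not taken from the T2CI-GAN paper: the function names (dct_matrix, dct2, idct2) are hypothetical, the transform uses the orthonormal DCT-II scaling, and the JPEG steps that follow the transform (level shift, quantization, entropy coding) are left out.

```python
import math

N = 8  # JPEG applies the DCT to 8x8 pixel blocks


def dct_matrix(n=N):
    """Build the orthonormal DCT-II basis matrix C (hypothetical helper)."""
    rows = []
    for k in range(n):
        # The first row (k = 0) is the constant "DC" basis and uses a smaller scale.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """Forward 2D DCT: coefficients = C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT: block = C^T * coefficients * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


if __name__ == "__main__":
    # A perfectly flat block has all of its energy in the single DC coefficient.
    flat = [[100.0] * N for _ in range(N)]
    coeffs = dct2(flat)
    print(round(coeffs[0][0], 6))  # DC term
    print(max(abs(coeffs[u][v]) for u in range(N) for v in range(N) if (u, v) != (0, 0)))
```

For a flat block, only the top-left (DC) coefficient is non-zero. Smooth image regions behave similarly, with most energy concentrated in a few low-frequency coefficients, which is what makes the DCT domain a compact target for a generator to produce directly.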

The T2CI-GAN model could be used to enhance automated image retrieval systems when the retrieved images are meant to be easily shared with smartphones or other smart devices. It could also be a valuable tool for media and communications experts, enabling them to find lighter versions of particular photographs to post online.

Due to recent technological advancements, our world is heading toward machine-to-machine and human-to-machine connections. T2CI-GAN could be crucial in this setting because machines need data in compressed form in order to read or interpret it. The model currently creates images only in JPEG-compressed form, so the researchers’ long-term goal is to extend it to produce images in any compressed form, without restriction on the compression algorithm. After the team’s research article is published, the model’s source code will also be made available to the general public.

This article is a research summary written by Marktechpost staff, based on the research paper 'T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network'. All credit for this research goes to the researchers on this project.


Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development, and enjoys learning more about the technical field by participating in challenges.



