Fast and Easy Image Generation with Fabric and OpenAI

Sep 23, 2024

Automation, Blogging, Fabric, Image Generation, LLM, OpenAI

Introduction

In my previous articles, I introduced Fabric and explained how this tool can be integrated into workflows on Mac, iPad, and iPhone. Fabric combines a text input, such as a string typed on the command line or the contents of a text file, with a template referred to as a pattern to produce an optimized prompt for a given problem. This prompt is then submitted to a large language model (LLM), which returns a text output.

Image Generation

A workflow that I use quite often is the creation of an article image using a specific pattern tailored to the style of this blog. Until now, I have created the article images for this blog manually using Stable Diffusion or ChatGPT. To do this, I described the topic of the article in keywords and conveyed the desired image style as well as the type of composition to the tools, subsequently saving the result manually. With Fabric, I can further automate this process by generalizing these style and composition descriptions as a pattern. This way, the text of an article can simply be passed to Fabric, and the finished image will be generated as a result.

Creation of an Article Image

Originally, I intended to implement the image generation in Python. However, the Python example on the OpenAI website failed to generate images in landscape format. So, I adapted the curl example to meet my requirements and integrated it into a shell script. The following prerequisites must be fulfilled:

An OpenAI account and an OpenAI API key (OpenAI Platform).
A functioning installation of Fabric. Please refer to Installation and Getting Started with Fabric – the Prompt Optimizer.
The command-line program jq to process JSON data.
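
Before going further, it can help to confirm that the command-line prerequisites are reachable from the shell. The following is a minimal sketch, assuming the tools are installed under the names fabric, jq, and curl; the helper name missing_tools is made up for this illustration.

```shell
# Hypothetical helper: print every given command name that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Usage: report anything that still needs to be installed.
missing=$(missing_tools fabric jq curl)
if [ -n "$missing" ]; then
  echo "Still missing:" >&2
  echo "$missing" >&2
fi
```

If nothing is printed, all three commands were found.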

The Pattern

Let us begin with the pattern optimized for this blog. To do this, I duplicated an existing pattern folder as a template. The folder $HOME/.config/fabric/prompts/create_art_prompt suited my needs well. I renamed the copy of the folder to create_blog_image, which simultaneously serves as the name under which the new pattern will be invoked in Fabric. I then replaced the contents of the file system.md in the folder with the following content.

# IDENTITY AND GOALS

You are an expert graphic designer and AI whisperer. You know how to take a concept and give it to an AI and have it create the perfect piece of drawing for it.

Take a step back and think step by step about how to create the best result according to the STEPS below.

# STEPS

- Think deeply about the concepts in the input.

- Think about the best possible way to capture that concept visually in a compelling and interesting way.

# OUTPUT

- Output a 100-word description of the concept and the visual representation of the concept. 

- Write the direct instruction to the AI for how to create the drawing, i.e., don't describe the drawing, but describe what it looks like and how it makes people feel in a way that matches the concept. Use the style, colors, mood, and composition descriptions below.

- Style: Vibrant and dynamic with a mix of modern digital and vintage elements.

- Composition: Flowing, ribbon-like elements intertwined with detailed sketches of mathematical equations and geometric shapes. The background features aged parchment paper with modern digital elements like a keyboard and computer code on a screen.

- Colors: Bright, neon colors such as blues, reds, and purples contrasted against warm sepia tones.

- Mood: Abstract and technological, with a futuristic feel emphasized by the bold, futuristic font of the word “FABRIC” integrated into the design.

- Include nudging clues that give the piece the proper style, e.g., "Like you might see in the New York Times", or "Like you would see in a Sci-Fi book cover from the 1980's.", etc. In other words, give multiple examples of the style of the art in addition to the description of the art itself.

# INPUT

INPUT:

With this pattern, a prompt can already be created that can be used in ChatGPT or Stable Diffusion:

cat $HOME/Documents/Blog/new_blog_post.md | fabric -sp create_blog_image

Thus, the first part is complete.
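
The folder duplication described at the start of this section can also be done from the command line. A minimal sketch, assuming the pattern folders live where the article states; the helper name clone_pattern is hypothetical.

```shell
# Hypothetical helper: copy an existing pattern folder under a new name.
# Refuses to overwrite a pattern that already exists.
clone_pattern() {
  src="$1"
  dst="$2"
  if [ ! -d "$src" ]; then
    echo "Source pattern not found: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "Target already exists: $dst" >&2
    return 1
  fi
  cp -R "$src" "$dst"
}

# Usage, with the folder names from this article:
# clone_pattern "$HOME/.config/fabric/prompts/create_art_prompt" \
#               "$HOME/.config/fabric/prompts/create_blog_image"
```

Afterwards, only system.md in the new folder needs to be edited.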

The Script

The script is intended to pass the generated prompt to DALL·E 3 and process the response. This response consists of a JSON payload that includes, among other things, a URL pointing to the generated image and the revised_prompt, which is the prompt DALL·E 3 actually used.
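
To make the shape of that response more concrete, here is a small sketch that pulls the URL out of a hand-written sample. The sample contains only the two fields named above, and the example.com URL is a placeholder. The helper name extract_image_url is hypothetical, and the error object in the comment is an assumption about how a failed request might look.

```shell
# Hand-written sample response; only the fields named above are included.
sample_response='{"data":[{"url":"https://example.com/image.png","revised_prompt":"Two parrots on a roof"}]}'

# Hypothetical helper: print the image URL, or fail if the response has none
# (for example because the API returned an error object instead of data).
extract_image_url() {
  url=$(printf '%s' "$1" | jq -r '.data[0].url // empty')
  if [ -z "$url" ]; then
    echo "No image URL in response" >&2
    return 1
  fi
  printf '%s\n' "$url"
}

# Usage:
#   extract_image_url "$sample_response"                  # prints https://example.com/image.png
#   extract_image_url '{"error":{"message":"bad key"}}'   # fails with a message on stderr
```

The `// empty` alternative keeps jq from printing the literal string null when the field is missing, so the emptiness check is reliable.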

As previously mentioned, the command-line tool jq is required to build the request sent to OpenAI and to parse the response. It can be installed using Homebrew:

brew install jq
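
One reason to build the request with jq rather than by string concatenation is that generated prompts may contain quotes and line breaks, which would otherwise produce invalid JSON. A minimal sketch; the helper name build_payload is hypothetical, and the model and size values are the ones used in the script in this article.

```shell
# Hypothetical helper: wrap an arbitrary prompt in a JSON request body.
# jq escapes quotes, backslashes, and newlines in the prompt.
build_payload() {
  jq -n --arg prompt "$1" \
    '{model: "dall-e-3", prompt: $prompt, n: 1, size: "1792x1024"}'
}

# A prompt with quotes and a line break, as a generated prompt may contain.
tricky_prompt='A sign saying "FABRIC"
on aged parchment'

# Usage: round-trip check, the prompt should come back unchanged.
#   build_payload "$tricky_prompt" | jq -r '.prompt'
```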

In addition to the image, the script will, at least in the test phase, save the prompt generated by Fabric as well as the revised_prompt for analysis purposes and to optimize the pattern. Later, these lines can be commented out or deleted.

So far, I have not found a solution for generating meaningful names for the files, so I use a timestamp to name them. Additionally, the image will be opened at the end in the program designated for PNG files, such as the Preview app.

This leads to the following script:

#!/bin/zsh

# Check if data is being piped into the script
if [ -t 0 ]; then
  # If no input is received via pipe, print a message to stderr and stop
  echo "No input was piped into the script" >&2
  exit 1
else
  # Read all piped input
  prompt=$(cat -)
fi

# Your OpenAI API key should be set as an environment variable
api_key="$OPENAI_API_KEY"

# Create the JSON payload
json_payload=$(jq -n \
  --arg model "dall-e-3" \
  --arg prompt "$prompt" \
  --argjson n 1 \
  --arg size "1792x1024" \
  '{model: $model, prompt: $prompt, n: $n, size: $size}'
)

# Get the current date and time for the filename
timestamp=$(date +"%Y%m%d_%H%M%S")

# Execute the curl command and save the response
response=$(curl -s https://api.openai.com/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $api_key" \
  -d "$json_payload")

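# Extract the image URL from the response
# (printf instead of echo: zsh's echo would expand the \n escapes inside the JSON)
image_url=$(printf '%s\n' "$response" | jq -r '.data[0].url')

# Extract the revised prompt from the response
revised_prompt=$(printf '%s\n' "$response" | jq -r '.data[0].revised_prompt')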

# Make sure the target folder exists, then save the image using the date and time in the filename
mkdir -p "$HOME/Pictures/CreateImage"
curl -s "$image_url" -o "$HOME/Pictures/CreateImage/image_${timestamp}.png"

# Save the used prompt in a text file
echo "$prompt" > "$HOME/Pictures/CreateImage/prompt_${timestamp}.txt"

# Save the revised prompt in a text file
echo "$revised_prompt" > "$HOME/Pictures/CreateImage/revised_prompt_${timestamp}.txt"

# Display the image
open "$HOME/Pictures/CreateImage/image_${timestamp}.png"

# Output success messages
echo "Image saved as image_${timestamp}.png"
echo "Prompt saved as prompt_${timestamp}.txt"
echo "Revised prompt saved as revised_prompt_${timestamp}.txt"


# If the name of the created image file should be piped to the next step,
# comment out the success messages above and uncomment the following line:
# echo "image_${timestamp}.png"
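
For the file-naming problem mentioned before the script, one possible direction is to derive a slug from the article title and keep the timestamp as a suffix for uniqueness. This is a rough sketch and not part of the script above; the helper name slugify and the idea of passing a title to the script are assumptions.

```shell
# Hypothetical helper: turn a title into a lowercase, hyphen-separated slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed -e 's/^-//' -e 's/-$//'
}

# Usage: combine the slug with the timestamp format used in the script.
title="Fast and Easy Image Generation with Fabric and OpenAI"
timestamp=$(date +"%Y%m%d_%H%M%S")
echo "$(slugify "$title")_${timestamp}.png"
```

Note that this sketch replaces every character outside a-z and 0-9 with a hyphen, so umlauts and other non-ASCII letters are lost.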

What to Do with the Scripts

Scripts are text files that, when freshly created, cannot be executed directly. To simplify the invocation of a shell script, I copy it into a directory that is included in the shell’s search path. On a Mac, typical candidates are:

$HOME/Applications 
$HOME/bin
$HOME/.local/bin

When scripts are stored in one of these directories, they are accessible only to the current user. If a script is to be made available to all users on a computer, the appropriate paths are:

/usr/local/bin
/opt/bin

Administrator privileges are required for copying and manipulating files in these locations.

To check which directories are already included in the search path, you can use the command:

echo $PATH
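
To check for one specific directory rather than reading through the whole list, a small test like the following can be used. It is a sketch; the helper name path_contains is hypothetical, and $HOME/Applications is simply one of the folders listed above.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check the current search path for one of the folders listed above.
if path_contains "$PATH" "$HOME/Applications"; then
  echo "$HOME/Applications is in the search path"
else
  echo "$HOME/Applications is not in the search path"
fi
```

Wrapping both strings in colons ensures that only complete entries match, not prefixes of longer paths.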

I prefer the folder $HOME/Applications for my personal programs and scripts. Since this folder is also used for web apps created by browsers, such as those added to the Dock in Safari via “File → Add to Dock”, it is usually already present.

If this folder is not included in the search path, it can easily be added with the following command:

echo 'export PATH=$PATH:$HOME/Applications' >> $HOME/.zshrc
source $HOME/.zshrc
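
Running the echo command more than once appends the same line to the configuration file again each time. If that is a concern, the append can be made conditional. A minimal sketch; the helper name append_once is hypothetical.

```shell
# Hypothetical helper: append a line to a file unless it is already there.
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage with the line from this article (commented out so that running the
# sketch does not modify your shell configuration):
# append_once 'export PATH=$PATH:$HOME/Applications' "$HOME/.zshrc"
```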

The script can then be saved in the folder, for example, under the name CreateImage, and made executable with:

chmod +x $HOME/Applications/CreateImage

Another Environment Variable

In order to create an image using the OpenAI API call, the OpenAI API key must be stored as an environment variable in the shell configuration file. The script will then read it from there. The advantage of this approach is that the key does not need to be specified in the code of every program that calls OpenAI, thus preventing accidental exposure of the key in a blog post like this or on GitHub.

# Added for OpenAI Apps
echo 'export OPENAI_API_KEY="Your OpenAI Key"' >> $HOME/.zshrc
source $HOME/.zshrc
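
Because the script reads the key from the environment, a missing variable only shows up as a failed API call. A guard near the top of the script can report the problem earlier. This is a sketch, not part of the script above; the function name check_api_key is hypothetical.

```shell
# Hypothetical guard: fail early if the API key is not available.
check_api_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; add it to your shell configuration." >&2
    return 1
  fi
}

# Usage inside a script:
# check_api_key || exit 1
```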

The First Test Run

With this command, you can now test whether image generation with the script works, initially without Fabric:

echo "Create an image of two parrots on a skyscraper roof" | CreateImage

You should see a confirmation in the terminal that the image and both prompts have been saved. Additionally, the created image will open in the Preview app:

Now that all components have been created, the workflow can be tested with the draft of this article:

cat $HOME/Documents/Blog/article_draft.md | fabric -sp create_blog_image | CreateImage

With that result:

Further Optimization

However, this still requires too much typing. Therefore, this lengthy command line invocation can be encapsulated in a shell script:

#!/bin/zsh
# Check if an argument has been provided
if [ $# -eq 0 ]; then
    echo "Please provide a file."
    exit 1
fi
# Check if the provided argument is a file
if [ ! -f "$1" ]; then
    echo "The provided argument is not a file."
    exit 1
fi
# If an argument has been provided and it is a file, execute the command
cat "$1" | fabric -sp create_blog_image | CreateImage

Saved in the $HOME/Applications folder, for example as make_article_image, and marked as executable, the invocation then simplifies to:

make_article_image /path/to/article.md
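
One thing the pipeline does not catch: if fabric fails or prints nothing, CreateImage still runs and sends a request with an empty prompt, because it only checks whether its input comes from a terminal. A possible guard is sketched below; the helper name read_prompt is hypothetical.

```shell
# Hypothetical helper: read all of stdin and fail if it contains only whitespace.
read_prompt() {
  input=$(cat)
  if [ -z "$(printf '%s' "$input" | tr -d '[:space:]')" ]; then
    echo "Received an empty prompt" >&2
    return 1
  fi
  printf '%s\n' "$input"
}

# Usage inside CreateImage, in place of prompt=$(cat -):
# prompt=$(read_prompt) || exit 1
```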

For my use case, I am currently seeking a solution that I can directly invoke in Obsidian to create the image from the active note. There are several candidates that might support this, such as the “Templater” or “Shell Commands” plugins, but that will be the next step.

Conclusion

This example of how Fabric can be integrated into a useful workflow should be understood as just that: an example to adapt and develop further yourself. The pattern must be tailored to individual needs; I certainly want the image style of my blog to be consistently reflected everywhere 😉. Optimizing such a pattern will undoubtedly require several iterations and some time. It is worthwhile to take a closer look at the two generated prompts to see which levers need to be adjusted, as the saying goes in German.

I hope these ideas are nonetheless helpful, and as always, I welcome comments, whether regarding potential errors, improvements, praise, or criticism.
