Reading Text From Images
I’m experimenting with the Tesseract OCR engine and ChatGPT. This is part of an old project I want to revive called “Cooking Without The Bullsh*t”. The goal is a website backed by a recipe library that can be added to over time. If I have a bunch of recipes written down by the family, it doesn’t make sense to type them all in manually when that could potentially be automated.
I wrote up a small console app that uses Tesseract to read the text from a recipe image and ChatGPT to format that text as Markdown.
using OpenAI.Chat;
using Tesseract;

try
{
    // Recipe images are read from BASE_FOLDER; results are written to its OUTPUT subfolder.
    var base_folder = Environment.GetEnvironmentVariable("BASE_FOLDER") ?? "C:\\Deploy\\Recipes";
    var output_folder = Path.Combine(base_folder, "OUTPUT");
    Directory.CreateDirectory(output_folder);

    var openai_api_key = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
    if (string.IsNullOrEmpty(openai_api_key))
    {
        Console.WriteLine("The OPENAI_API_KEY environment variable is not set.");
        return;
    }

    // Create the OCR engine and the chat client once and reuse them for every file.
    using var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default);
    var client = new ChatClient("gpt-4o-mini", openai_api_key);

    // Every file directly inside the base folder is treated as a recipe image.
    var files = Directory.GetFiles(base_folder);
    foreach (var file in files)
    {
        Console.WriteLine($"Processing: {file}");
        if (string.IsNullOrEmpty(file) || !File.Exists(file))
        {
            // Skip this file instead of ending the whole run.
            Console.WriteLine($"No file was found at: {file}");
            continue;
        }

        // OCR the image. The page is disposed at the end of each iteration,
        // which the engine needs before it can process the next image.
        using var img = Pix.LoadFromFile(file);
        using var page = engine.Process(img);
        var text = page.GetText();

        // Ask the model to turn the raw OCR text into a Markdown recipe.
        var completion = client.CompleteChat($"This is a recipe that I have scanned in, could you return it back to me with markdown formatting. I only want the recipe in the result because I will ingest that data: \n {text}");

        var output_file_path = Path.Combine(output_folder, $"{Path.GetFileNameWithoutExtension(file)}.txt");
        File.WriteAllText(output_file_path, completion.Value.Content[0].Text);
        Console.WriteLine($"GPT output written to: {output_file_path}");
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
Passed in the following image of a recipe from my grandmother, Virginia Wahus:
Got this back:
Peanut Butter Round-Up Cookies
By: Virginia Wahus
Ingredients
- 1 cup shortening
- 2 cups flour
- 1 cup brown sugar
- 3/4 cup sugar
- 2 teaspoons baking soda
- 1/2 teaspoon salt
- 2 eggs
- 1 cup oatmeal
- 1 cup creamy peanut butter
Instructions
- Preheat the oven to 350°F (175°C).
- Shape the dough into 1-inch balls.
- Place the balls on an ungreased cookie sheet.
- Press each ball with a fork to create a crisscross pattern.
- Bake for 8 to 10 minutes.
Pretty stoked at how well this works. This could be a good way of documenting family recipes in the future. The only problem I foresee is that each image will have to be cropped so that it contains only text pertinent to the recipe.
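One possible way around cropping by hand is to have the OCR step look at only part of each image. The sketch below is not part of the app above. `RecipeCrop` and `InsetRegion` are hypothetical names, and it assumes the unwanted content sits in the margins of the scan, so that trimming a fixed fraction from each edge is good enough. It only does the arithmetic, turning margin fractions into pixel coordinates.

```csharp
using System;

public static class RecipeCrop
{
    // Hypothetical helper: converts margin fractions (0.0 to 1.0, measured
    // from each edge) into the pixel coordinates (x1, y1, x2, y2) of the
    // region to keep. (x1, y1) is the top-left corner and (x2, y2) the
    // bottom-right corner.
    public static (int X1, int Y1, int X2, int Y2) InsetRegion(
        int width, int height,
        double left, double top, double right, double bottom)
    {
        if (left < 0 || top < 0 || right < 0 || bottom < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(left), "Margins must not be negative.");
        }

        int x1 = (int)Math.Round(width * left);
        int y1 = (int)Math.Round(height * top);
        int x2 = width - (int)Math.Round(width * right);
        int y2 = height - (int)Math.Round(height * bottom);

        // Reject margins that leave nothing of the image to read.
        if (x2 <= x1 || y2 <= y1)
        {
            throw new ArgumentException("Margins leave no image area to process.");
        }

        return (x1, y1, x2, y2);
    }
}
```

For a 1000 x 2000 pixel scan, `RecipeCrop.InsetRegion(1000, 2000, 0.0, 0.1, 0.0, 0.2)` keeps the full width and drops the top 10% and bottom 20% of the image. As far as I can tell, the Tesseract wrapper can take coordinates like these as `Rect.FromCoords(x1, y1, x2, y2)` through an `engine.Process(img, region)` overload, with `img.Width` and `img.Height` supplying the dimensions, but treat that as an assumption to check against the wrapper version in use.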