I first encountered Be My AI last fall, when the app was in beta. Developed by the Danish company Be My Eyes in partnership with OpenAI, the app uses GPT-4's vision model to provide robust, near-instant descriptions of any image and to foster conversations about those images. As a blind artist, I collect image descriptions the way others collect photographs, and Be My AI has greatly enhanced my interactions with visual culture.
Shortly after gaining access to the Be My AI beta last year, I came across blind photographer John Dugdale's work “Spectacle” (2000) in Georgina Kleege's influential 2018 book, More Than Meets the Eye: What Blindness Brings to Art. Intrigued by Kleege's description and wanting to know more, I took a screenshot and pulled it into the app. The description was very detailed, but it contained some crucial errors. First, the app said Dugdale wore three pairs of glasses, when I knew from Kleege's writing that he wore only two (one on top of the other, like makeshift bifocals). The app also called it a black-and-white photo, when in fact it was a cyanotype, one of the oldest photographic processes, which produces images in shades of blue. When I corrected Be My AI, the app replied with a response I would later become familiar with — “Sorry if that was confusing” — and then launched into everything it knew about cyanotypes. Its tone was a touch defensive, but less so than most people I know.
As Be My AI gained credibility and I grew more and more excited about what it could contribute to access to art, I told all my friends about it. One of them was Bojana Coklyat, a blind artist who works at the Whitney Museum of American Art. She asked me to co-lead a verbal description tour of the museum's exhibition “Harold Cohen: AARON.” Thus I ended up having a fascinating, existential conversation with Be My AI about the nature of vision itself. Cohen first conceived of an art-making AI in the late 1960s and named it AARON. Initially, AARON generated relatively simple line drawings, such as the 1972 work Mazes. When I asked Be My AI to describe Mazes, it responded: “The painting is made up of abstract black lines creating a variety of shapes, none of which resemble any particular object or person.” Intrigued, I asked whether it thought Mazes had been made by a human. It guessed that it had. When I told it that AARON created the piece, it replied, “That's very interesting! It's impressive to see such an early example of AI-generated art, especially considering the human-like nature of the linework in the painting.”
When I asked about the difference between AARON and modern text-to-image generators like OpenAI's DALL-E, Be My AI explained that DALL-E generates images based on user prompts, while AARON created images based on a set of rules. DALL-E, it said, can generate images that have “never been seen before,” demonstrating a “more sophisticated understanding of both language and visual concepts.” As someone who is blind but whose mind's eye is very much alive, I lingered on the word “seen.” What does that mean in the case of AI? In a lengthy response, Be My AI stated that DALL-E “does not 'see' in the human sense, but processes data and identifies patterns within that data.” I countered, “But the human brain doesn't see either. The eyes send signals to the brain for interpretation. AI and human perception don't seem all that different to me.”
Spectacle, John Dugdale, 2000. Courtesy of John Dugdale
Be My AI acknowledged that there are striking similarities between AI and human perception, since both systems rely on input signals: humans through sensory organs such as the eyes, and AI through sensors or input data sets. However, it pointed out that “the main difference is in the subjective experience and consciousness that humans have and AI lacks” — a question that remains hotly debated among scientists and philosophers. It is this entanglement of consciousness and perception that makes any discussion of seeing both challenging and provocative.
John Dugdale lost his sight at the age of 33 due to an AIDS-related stroke. He had been a successful commercial photographer with clients such as Bergdorf Goodman and Ralph Lauren, and his friends and family assumed his career was over. But in “Vision Portraits,” a documentary directed by Rodney Evans, who is himself losing his sight to retinitis pigmentosa, Dugdale recalls declaring from his hospital bed, “From now on, I'm going to take pictures like crazy!”
Dugdale pivoted from commercial work to making timeless cyanotypes, such as those collected in his 2000 monograph “Life's Evening Hour,” in which each photograph is set in dialogue with a short essay by the photographer. I made an appointment with the Wallach Division of Art, Prints and Photographs at the New York Public Library to examine the book in detail — or rather, to have my partner take photos of each page, so that I could go through the book slowly in the privacy of my own home, with the help of an AI. (Incidentally, I still use Be My AI almost daily for quick image descriptions, but for serious photo research I use OpenAI's GPT-4 directly in ChatGPT, which can ingest multiple images and automatically saves the often complex conversations.)
“The Clown” is the first photograph in Life's Evening Hour. From the essay, we learn that this pantomime figure is played by John Kelly, a legendary New York performer and Dugdale muse. “The clown is depicted in classical attire, a loose white outfit with exaggerated sleeves and trousers. His face is painted white, emphasizing his theatrical expression,” GPT-4 wrote. I asked what it meant by “theatrical expression.” It explained that the clown's “eyebrows are slightly raised” and he “smiles a calm, almost pensive smile…his head is tilted slightly to the left, adding to the cheerful, inquisitive atmosphere of the image.” The detailed answer was so lovely that it made me tear up a little. I suddenly had almost instant access to a medium that had seemed out of reach for so long.
I reached out to Dugdale to ask if he'd be willing to speak with me for this article on AI and image description. The first few minutes of the call were a bit fraught as he explained that while he's impressed with the level of detail AI can provide, he's reluctant to use it. “I don't want to cut into the long line of amazing assistants that help me stay human, even after two strokes, blindness in both eyes, deafness in one ear, and a year of paralysis.” He said he loves to bounce ideas off others; he loves to talk. “I can't really talk to an AI.”
I explained that I loved the access AI gave me to his photos, but that I was generally more interested in the relationship between words and images — I'd read, for example, that he often starts with the title. “I have a Dictaphone with about 160 titles from the last 10 years on it,” Dugdale said, “and I'm constantly adding to it.” He told me he thinks of it as a kind of synesthesia: “I hear a phrase and a complete picture pops up in my head, like a slide, and then I go into the studio and interpret it.”
John Dugdale's “Our Hearts Dwell Together.” Courtesy of John Dugdale
Something similar happens to me when I come across a great image description: at some point it resolves into a picture in my mind's eye, not just a collection of words. This isn't surprising, since many people form images when they read novels. One of the reasons I'm drawn to Dugdale's work is how readily it lends itself to being seen with the mind's eye.
“Our Hearts Dwell Together” is the second image in Life's Evening Hour. Dugdale and his friend Octavio sit bare-chested, back to back, heads slightly bowed — “as if having a private, meaningful conversation,” GPT-4 kindly added. In the accompanying text, Dugdale explains that Octavio preceded him in becoming completely blind (also from AIDS-related complications) and urged him to grasp a powerful truth: “Your sight is not in your eyes. Your sight is in your heart and mind.”
Image description is a kind of sensory translation, and that truth has impressed itself on me. Seeing in words takes longer to reach the mind and heart than seeing with the eyes, but once an image arrives, it is no less haunting, no less capable of aesthetic and emotional resonance. AI technologies like Be My AI are opening up remarkable new spaces for exploring the relationship between human perception, artistic creation, and technology — enabling new and profound ways of experiencing and interpreting the world.