Links: Merlín Software
Related notes
Digital Signage
Digital Photo Frame
Smart Mirror
Planning
The plan is to develop scalable software that lets a small computer (such as a Raspberry Pi) run a client that displays AI-generated pictures.
The client should be able to receive voice commands, which are then processed to generate a new picture.
To make it as flexible as possible, the client should run in a standard browser.
The server side should be able to run on a separate computer to allow a local AI model to generate the images.
A first draft of the possible architecture:
We concluded that we want to avoid physical buttons and use local voice commands to control the generation of images. For that, software like Whisper could transcribe the audio to text.
To control when voice commands are active, a wake word could be used instead of a physical button (check this discussion on GitHub). Voice Activity Detection (VAD) is also important to know when the command ends.
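To make the "when does the command end" idea concrete, here is a minimal sketch of end-of-command detection. It is not a real VAD: it swaps in a plain energy threshold where a trained detector would normally go, and the names `frame_rms`, `find_command_end`, `threshold` and `hangover_frames` are hypothetical, chosen only for this example. It assumes audio arrives as fixed-size frames of float samples in [-1.0, 1.0].

```python
import math
from typing import Optional, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_command_end(
    frames: Sequence[Sequence[float]],
    threshold: float = 0.02,      # assumed energy level separating speech from silence
    hangover_frames: int = 3,     # assumed number of silent frames that closes a command
) -> Optional[int]:
    """Return the index of the last speech frame once enough silence follows it.

    Returns None if no speech was seen, or if the speech has not yet been
    followed by `hangover_frames` consecutive silent frames.
    """
    last_speech: Optional[int] = None
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            last_speech = i
            silent_run = 0
        elif last_speech is not None:
            silent_run += 1
            if silent_run >= hangover_frames:
                return last_speech
    return None
```

In this sketch the audio between the wake word and the returned frame index would be what gets sent on for transcription. The threshold and hangover values would need tuning against the actual microphone.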
More complex configuration, like setting the desired server provider and changing modes, can be left to an app (or web app).
The client has two modes: a voice mode that requests AI images from voice commands, and a carousel mode that requests random images.
The carousel mode has a timer that can be configured beforehand, and requests a new image each time the timer elapses.
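A rough sketch of how the two modes could decide when to ask the server for an image. Everything here is an assumption made for illustration: the `Mode` and `ClientState` names, the `next_request` helper, and the shape of the returned request dictionaries are not part of the plan above. Python is used only to keep the notes in one language; the real client is planned in Flutter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    VOICE = "voice"        # images requested from voice commands
    CAROUSEL = "carousel"  # random images requested on a timer


@dataclass
class ClientState:
    mode: Mode
    interval_s: float = 60.0                 # carousel timer, configured beforehand
    last_request_at: Optional[float] = None  # timestamp of the previous request


def next_request(
    state: ClientState, now: float, prompt: Optional[str] = None
) -> Optional[dict]:
    """Return the request to send at time `now`, or None if nothing is due."""
    if state.mode is Mode.VOICE:
        # Voice mode only reacts to a transcribed command.
        if prompt:
            state.last_request_at = now
            return {"kind": "generate", "prompt": prompt}
        return None
    # Carousel mode: request immediately the first time, then once per interval.
    due = (
        state.last_request_at is None
        or now - state.last_request_at >= state.interval_s
    )
    if due:
        state.last_request_at = now
        return {"kind": "random"}
    return None
```

Calling `next_request` from the client's main loop with the current time keeps the timer logic in one place, so changing `interval_s` from the control app takes effect on the next tick.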
The project is divided into three codebases.
frame_server
The server side, written in Python. It provides a REST API for communicating with the client and sending it images.
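A possible starting point for the REST API, using only the Python standard library. This is a sketch under assumptions, not a decided design: the routes `GET /images/random` and `POST /images`, the JSON body with a `prompt` field, and the `placeholder_generator` stand-in for the local AI model are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple

Response = Tuple[int, str, bytes]  # (status code, content type, body)


def placeholder_generator(prompt: str) -> bytes:
    """Stand-in for the local AI model; returns fake bytes, not a real image."""
    return b"FAKE-IMAGE:" + prompt.encode("utf-8")


def handle(
    method: str,
    path: str,
    body: bytes,
    generate: Callable[[str], bytes] = placeholder_generator,
) -> Response:
    """Route one request. Kept free of socket code so it is easy to test."""
    if method == "GET" and path == "/images/random":
        return 200, "application/octet-stream", generate("random")
    if method == "POST" and path == "/images":
        try:
            payload = json.loads(body or b"{}")
        except ValueError:  # malformed JSON or invalid encoding
            payload = None
        prompt = payload.get("prompt") if isinstance(payload, dict) else None
        if not isinstance(prompt, str) or not prompt.strip():
            error = json.dumps({"error": "missing prompt"}).encode("utf-8")
            return 400, "application/json", error
        return 200, "application/octet-stream", generate(prompt)
    return 404, "application/json", json.dumps({"error": "not found"}).encode("utf-8")


class FrameHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around `handle`."""

    def _reply(self, body: bytes = b"") -> None:
        status, content_type, payload = handle(self.command, self.path, body)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        self._reply()

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        self._reply(self.rfile.read(length))


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Run the server until interrupted (host and port are assumed defaults)."""
    HTTPServer((host, port), FrameHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the routing in `handle` means the image generator can later be replaced with a call into the real model without touching the HTTP layer.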
frame_client
The client, written in Flutter. It sends requests to the server for images and displays them on screen.
Shows a QR code on startup to allow connection with the app.
frame_app
The control app, written in Flutter. It runs on a phone.
Connects to the client through a REST API and sends config options like the request interval and the server IP.
Includes a QR reader for the initial connection and direct communication with the client.
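One way the QR handshake and the config message could fit together, sketched in Python for consistency with the server notes even though the app itself is planned in Flutter. All names and formats here are assumptions for illustration: the QR code is assumed to carry a small JSON object with the client's host and port, and `FrameConfig`, the `/config` path and the field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class FrameConfig:
    server_ip: str              # where the client should request images from
    request_interval_s: float   # how often the carousel requests a new image
    mode: str = "carousel"      # assumed values: "carousel" or "voice"


def encode_qr_payload(client_host: str, client_port: int) -> str:
    """Text the client would render as a QR code on startup."""
    return json.dumps({"host": client_host, "port": client_port}, separators=(",", ":"))


def decode_qr_payload(text: str) -> Tuple[str, int]:
    """Parse the scanned QR text back into the client's address."""
    data = json.loads(text)
    return str(data["host"]), int(data["port"])


def config_request(qr_text: str, config: FrameConfig) -> Tuple[str, bytes]:
    """Build the URL and JSON body the app would send to the client."""
    host, port = decode_qr_payload(qr_text)
    url = f"http://{host}:{port}/config"
    return url, json.dumps(asdict(config)).encode("utf-8")
```

For example, scanning a code produced by `encode_qr_payload("192.168.1.50", 8080)` and calling `config_request` with a `FrameConfig` gives the app everything it needs for a single HTTP request to the client.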