I’m building a system where embedded devices send audio data to my server over HTTP, along with a small amount of numerical data. It’s important that the two are bundled together before they’re written to the database. One more requirement: the client needs to authenticate before it can upload data.

The naive solution would be to send everything as a single JSON object, base64-encoding the audio, with one backend endpoint receiving both the audio and the other data. My concern is that base64 adds significant overhead (4 bytes of output for every 3 bytes of audio), and that’s a problem for me: it means more power usage, and with a moderate amount of data a 33% tax is non-trivial. Compression on the wire might reduce the problem, but I can’t be sure until I test.
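The 4/3 expansion is easy to verify directly with the standard library (the 3 kB dummy payload below stands in for real audio):

```python
import base64

# Measure base64 overhead on a dummy "audio" payload.
raw = bytes(3000)              # pretend this is 3 kB of PCM audio
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))  # 3000 4000: the 4/3 expansion
```

Every 3-byte group becomes 4 output characters, so the overhead is exactly 33⅓% (plus a little padding when the length isn’t a multiple of 3).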

The other solution would be to set up a separate endpoint for the audio that receives the binary data directly, maybe as a stream. This is probably the most efficient approach, but it introduces some complexity around matching the audio up with the other data. I’m not very familiar with backend engineering, so I don’t know whether there’s a straightforward solution that’s just an unknown unknown for me.

Maybe I could hash the audio data on the client, include the hash in a header on both requests, then keep a record of incoming requests keyed by hash in a dictionary on the server (pushing them as they’re received) so the two halves can be matched up efficiently? Feels like a cool solution, but IDK, thoughts? Or should I just eat the 33% overhead from the simple solution and accept that premature optimization is the root of all evil?
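For what it’s worth, the matching idea can be sketched in a few lines. Everything here is hypothetical (the `X-Audio-Hash`-style pairing key, the `"metadata"`/`"audio"` kinds, the `UploadMatcher` name); a real server would also need a timeout to evict half-pairs whose sibling request never arrives:

```python
import hashlib
import threading

class UploadMatcher:
    """Pair up the two halves of an upload (JSON metadata and raw audio)
    that arrive as separate requests sharing the same audio hash."""

    def __init__(self):
        self._pending = {}            # audio hash -> partially received record
        self._lock = threading.Lock() # requests may arrive on different threads

    def submit(self, audio_hash, kind, payload):
        """Store one half; return the complete pair once both have arrived."""
        with self._lock:
            record = self._pending.setdefault(audio_hash, {})
            record[kind] = payload
            if "audio" in record and "metadata" in record:
                return self._pending.pop(audio_hash)  # complete: hand it off
        return None

matcher = UploadMatcher()
audio = b"dummy audio bytes"
h = hashlib.sha256(audio).hexdigest()          # client computes this too

assert matcher.submit(h, "metadata", {"temp": 21.5}) is None  # first half waits
pair = matcher.submit(h, "audio", audio)                       # second completes
assert pair == {"metadata": {"temp": 21.5}, "audio": audio}
```

The dictionary lookup makes matching O(1) regardless of arrival order, which is the property you’d want from this scheme.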

  • zongor [comrade/them, he/him]@hexbear.net · 9 months ago

    Instead of sending everything as one JSON object, you might want to use a multipart request. That way you could send the JSON object with the numerical data as the first part of the request and the audio file as the second part.
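A minimal sketch of what such a request body looks like, built by hand with the standard library just to show the format (the field names and the `clip.opus` filename are made up; most HTTP client libraries will construct this for you):

```python
import json
import uuid

def build_multipart(metadata: dict, audio: bytes):
    """Build a multipart/form-data body with a JSON metadata part and a
    raw binary audio part. Returns (content_type_header, body_bytes)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n'
        "Content-Type: application/json\r\n"
        "\r\n"
        f"{json.dumps(metadata)}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="audio"; filename="clip.opus"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return f"multipart/form-data; boundary={boundary}", head + audio + tail

content_type, body = build_multipart({"reading": 21.5}, b"\x01\x02\x03")
```

The audio bytes go into the body verbatim, so there is no base64 expansion; the only overhead is the boundary and per-part headers, which is a fixed few hundred bytes rather than 33% of the payload.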

    I’m not sure if this would work without more context, but you could also run a small 9p server on your embedded devices to expose the audio and the numeric data as separate files in a namespace. Then, on your server, you could mount all of the embedded devices’ files using 9pfuse, or write a small 9p client to read the data directly.

    • FunkyStuff [he/him]@hexbear.netOP · 9 months ago

      Yeah, that’s reasonable, I’ll try it. Parsing the multipart body might add a little cost on the server side, but it’s probably negligible.

  • farting_weedman [none/use name]@hexbear.net · edited · 9 months ago

    Encode the audio with Opus first, then use base64.

    Audio data is pretty small, and embedded devices are already pretty low-power. What limitations are you trying to work around that are tripping you up?

    Edit: Oh yeah, why do you need to authenticate first? It seems like a stupid question, but you could be establishing TLS alongside the authentication and relying on it for security, or you could just be using authentication to figure out which surveillance button sent a given timestamped audio log.

    • FunkyStuff [he/him]@hexbear.netOP · 9 months ago

      There are a couple of limitations. The main one has nothing to do with the server: the transmitter antennas feeding the uplink are fairly low-bitrate compared to the audio being recorded. I’ll have to research Opus encoding.

      The main reason for auth is that it’s going on a public web server, and I don’t know of a simpler way to make sure only our devices are uploading to it than authenticating (well, I guess I could also just check the request’s IP address, but that feels unreliable; TLS is probably best).
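One lightweight scheme that fits this "only our hardware may upload" requirement is a per-device shared secret used to HMAC-sign each request body, checked server-side with a constant-time comparison. This is a hypothetical sketch (the device IDs, key store, and signature header are all assumptions, and it still belongs behind TLS so the secret provisioning and payloads stay private):

```python
import hashlib
import hmac

# Assumed provisioning: each device is flashed with an ID and a secret key.
DEVICE_KEYS = {"device-01": b"s3cret-key"}

def verify(device_id: str, body: bytes, signature_hex: str) -> bool:
    """Check that the request body was signed by the named device's key."""
    key = DEVICE_KEYS.get(device_id)
    if key is None:
        return False
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position via timing
    return hmac.compare_digest(expected, signature_hex)

# Device side: sign the payload before uploading.
sig = hmac.new(b"s3cret-key", b"payload", hashlib.sha256).hexdigest()
assert verify("device-01", b"payload", sig)
assert not verify("device-01", b"payload", "00" * 32)
```

Unlike IP filtering, this keeps working when devices move networks, and it only admits requests signed with a key you provisioned.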