What I'm working on: ocaml-cfgen

I owe it to myself to blog a bit more often about what I'm doing, but the hard part is only writing about one thing: there's always so much context that I want to convey, but sometimes that's left for a separate, more focused post.

ocaml-cfgen is an attempt to turn CloudFormation resource definitions into OCaml types with support for serialising them to CloudFormation JSON.
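
To make that concrete, here is a minimal sketch of the kind of output such a generator might produce for one resource. It is not the actual ocaml-cfgen output: the `Sqs_queue` module, the `make` and `to_json` functions and the tiny `json` type are all names I've assumed for illustration, and only three of the real `AWS::SQS::Queue` properties are modelled.

```ocaml
(* A tiny JSON representation so the sketch has no dependencies.
   A real generator would more likely target an existing JSON library. *)
type json =
  | String of string
  | Int of int
  | Bool of bool
  | Object of (string * json) list

(* Hypothetical generated module for AWS::SQS::Queue (a few properties only). *)
module Sqs_queue = struct
  type t = {
    queue_name : string option;
    visibility_timeout : int option;
    fifo_queue : bool option;
  }

  let make ?queue_name ?visibility_timeout ?fifo_queue () =
    { queue_name; visibility_timeout; fifo_queue }

  (* Properties that are [None] are left out of the generated JSON. *)
  let to_json t =
    let field name conv v = Option.map (fun x -> (name, conv x)) v in
    let props =
      List.filter_map Fun.id
        [ field "QueueName" (fun s -> String s) t.queue_name;
          field "VisibilityTimeout" (fun i -> Int i) t.visibility_timeout;
          field "FifoQueue" (fun b -> Bool b) t.fifo_queue ]
    in
    Object [ ("Type", String "AWS::SQS::Queue"); ("Properties", Object props) ]
end

(* Escape the characters JSON requires inside string literals. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c when Char.code c < 0x20 ->
        Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Compact serialiser for the [json] type above. *)
let rec to_string = function
  | String s -> "\"" ^ escape s ^ "\""
  | Int i -> string_of_int i
  | Bool b -> string_of_bool b
  | Object fields ->
    let field (k, v) = "\"" ^ escape k ^ "\":" ^ to_string v in
    "{" ^ String.concat "," (List.map field fields) ^ "}"
```

With these definitions, `Sqs_queue.make ~queue_name:"jobs" ~visibility_timeout:30 () |> Sqs_queue.to_json |> to_string` evaluates to `{"Type":"AWS::SQS::Queue","Properties":{"QueueName":"jobs","VisibilityTimeout":30}}`. The point of the typed record is that a misspelt property name or a string where an integer belongs fails at compile time rather than at deploy time.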

This isn't my first attempt at building parser-generators: see also aws-reason, rescript-aws-sdk-v3-wrapper and ocaml-gtk (all in various states of completion, but none in regular use).

I keep gravitating back to them as they give you an instant goal with a clearly defined path to completion (generate the types and get them to compile). On their own, however, they aren't that useful.

ocaml-cfgen is a side-track from another goal, which was to try and build a decent workflow for writing and deploying AWS Lambda functions written in OCaml. AWS doesn't support OCaml as a target language, but they do have support for custom runtimes.
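
For readers who haven't met custom runtimes: the runtime is just a program that polls the Lambda Runtime API for the next event, runs your handler, and posts the result back. Below is a rough sketch of one iteration of that loop. The endpoint paths are the documented `2018-06-01` Runtime API ones; everything else (the `transport` record, `handle_one`, the URL helpers) is hypothetical, and the HTTP client is deliberately left abstract so the sketch doesn't depend on any particular library.

```ocaml
(* One event handed to the runtime: the request id comes from the
   Lambda-Runtime-Aws-Request-Id response header, the body is the event. *)
type invocation = { request_id : string; body : string }

(* Hypothetical transport: a real runtime would back this with an HTTP client. *)
type transport = {
  get_next : url:string -> invocation;       (* GET, blocks until an event arrives *)
  post : url:string -> body:string -> unit;  (* POST a result or an error *)
}

let api_version = "2018-06-01"

let next_url ~host =
  Printf.sprintf "http://%s/%s/runtime/invocation/next" host api_version

let response_url ~host ~request_id =
  Printf.sprintf "http://%s/%s/runtime/invocation/%s/response"
    host api_version request_id

let error_url ~host ~request_id =
  Printf.sprintf "http://%s/%s/runtime/invocation/%s/error"
    host api_version request_id

(* Fetch one event, run the handler, and report the outcome. *)
let handle_one ~host transport handler =
  let inv = transport.get_next ~url:(next_url ~host) in
  match handler inv.body with
  | result ->
    transport.post ~url:(response_url ~host ~request_id:inv.request_id)
      ~body:result
  | exception _ ->
    (* A real runtime would report the exception details, properly escaped. *)
    transport.post ~url:(error_url ~host ~request_id:inv.request_id)
      ~body:{|{"errorType":"HandlerError","errorMessage":"handler raised"}|}

(* The runtime proper is this, forever. *)
let rec run ~host transport handler =
  handle_one ~host transport handler;
  run ~host transport handler
```

In a deployed function, `host` would come from the `AWS_LAMBDA_RUNTIME_API` environment variable that Lambda sets for custom runtimes, and the executable would be started by the `bootstrap` file in the deployment package.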

Creating a development environment for OCaml serverless

To build for an AWS custom Lambda runtime, you need a development environment running Amazon Linux 2 (AL2), as this is the Linux version provided in their Lambda containers. OCaml binaries built on other Linux distributions will probably not work, as they will be linked against your distribution's glibc (and other important libraries), whose ABI will probably be incompatible with the versions on AL2.

An easy way to do this is to build your Lambda functions inside a docker container running AL2, but this is really only good for building and testing inside CI/CD pipelines for deployment environments, where correctness and reproducibility are key.

If you want to iterate locally on your build, deploying it as you go and testing in the cloud, spinning up docker containers and running a full build and then deploying with CloudFormation is not fast.

AWS provide the Serverless Application Model (SAM, written in Python) and the Cloud Development Kit (CDK, written in TypeScript), which both have support for patching your Lambda functions directly in your development workflow. This feature at least lets you bypass the CloudFormation stage, which itself is quite slow, but you still need to build your OCaml code. This solution works well for interpreted languages, where the runtime is already in the target Lambda container (or can be put there with a Lambda layer) and your code doesn't need to be built, but it is unhelpful for compiled languages (like Rust or OCaml) which link against native binaries.

(As an aside: I have moved away from local testing of Lambdas for all but the simplest API projects that don't have integration with other AWS services (except for DynamoDB), as local emulation of these is difficult to configure and you always end up with a mismatch between what the cloud service can do and what the local emulator supports. Any non-trivial project is a nightmare to test locally: start building enterprise software and you'd be mad to attempt local testing unless you want to seriously slow your devs down and force them onto the lowest common denominator capabilities of your target cloud.)

Building OCaml with SAM

I started with building a workflow on top of SAM, but with two separate workflows:

A production deployment workflow, which builds native binaries inside a docker container containing a full OCaml switch installed with your project's dependencies, suitable for CI/CD
A development deployment workflow, where the OCaml switch + dependencies are built inside a docker container once, but the project is rebuilt as OCaml bytecode with a local opam switch, so it can be deployed to the cloud and run with an ocamlrun binary.

To support this workflow, I need to compile an OCaml switch inside a docker container, with all the project dependencies. I build this onto a docker volume mount and then create two versions of it: one version with all the binaries and source code, useful for production builds, and another version which contains just the ocamlrun binary and any shared native libraries, which I zip up and use as a Lambda layer with the custom AL2 runtime to deploy and run my development bytecode versions.

This also needs a local opam switch to be able to run the compiler locally outside of the docker container to support the development workflow (generating bytecode, merlin + lsp support for VSCode usage, etc.).

(This means you need two OCaml switches, which take time to build themselves as they literally download and compile the compiler plus each library. I wish there were better solutions to this problem.)

This is quite cumbersome and held together (poorly) with shell scripts and lots of process, so it's not something I want to productionize and would not propose to any development team. Ideally I would reuse an AWS tool for building out this workflow, and CDK is promising, but multi-language deployment tooling is always horrible to use and maintain and full of kludges.

Integrating this into SAM itself would also be something of an undertaking, given some of the internal architectural details that make it difficult to build code for multi-Lambda projects performantly. SAM builds each Lambda function individually, so this would preclude direct dune integration for building all the binaries in one sweep, with dune's automatic support for incremental builds.

(I face this problem every day with TypeScript-based AWS Lambda projects: SAM builds each Lambda separately with esbuild, which, across dozens of Lambda functions, locks up all your CPUs for about 60s while it spawns dozens of parallel esbuild instances. Building Lambda functions individually in any language seems to be much slower than building them together, if for no other reason than a lot of shared code gets reprocessed.)

Escaping multi-language tool hell

This is all a long-winded way of saying I haven't bothered looking to do this in the CDK, which also seems to have the same broken, single-Lambda build architecture, and is also slow to build and deploy. Instead I'm looking into how I could get the best bits of SAM and CDK (CloudFormation generation, individual Lambda patching, build tool integration, etc.) if I did it myself in OCaml. ocaml-cfgen is one of the steps to get there.

My hypothesis (hope) is that tightly integrating the tooling for AWS Lambda development with the build tooling of the target language (OCaml), along with using a (famously) fast language, will result in a much better developer experience than sticky-taping together the tooling from multiple languages (Python, OCaml, TypeScript).