Category: Maestro

Maestro: First Principles Thinking

Findings

While I think LLMs can be used to generate ideas or compile a list of potential features or technologies, the models aren’t sophisticated enough to understand the context of the problem and how to maximize user value.

This leaves a lot of gaps that humans inevitably need to fill. That’s to be expected. Anyone who has built an app from scratch will tell you that gathering feedback early and often is the best way to create the most useful app.

Novel concepts might be inspired by humans interacting with LLMs, but it’s doubtful a model could come up with anything compelling on its own. Could you use an LLM as a brainstorming tool? Absolutely!

My Experience

The accuracy of the output was mostly consistent, though some models went into more detail than others, probably as a result of token limits and other model parameters. The same was true for completeness of response. Each model attempted to follow the process of first principles toward an MVP.

But none of the models really handled the concept of first principles reasoning. I suspect the results from this test were influenced by the vagueness of my prompt. Each model confidently proclaimed its fundamental truth and proceeded with solutioning. The models didn’t show their work, and it’s difficult to determine whether the code skipped a step.

Likewise, none of the models identified the assumptions to be validated by the MVP, which is a huge part of what we want to learn from end-user testing. A good MVP focuses on a user experience or behavior. A great one focuses on making that interaction as simple and effective as possible.

While I appreciate that the models were able to call out features and technologies that might solve a user’s problem, I find this type of thinking one-dimensional. One model tried to create user stories but didn’t describe the user value.

Test Results


Next