Jul 5, 2024

Using Cloudflare AI and the DeepSeek Coder LLM to Write Code

My code is always filled with // TODO: comments. It’s not because I don’t finish things, but because there are always things to go back and improve. I’m sure I’m not the only one, as Android Studio has a built-in feature to show all of the // TODO: comments in a project.

I could just sit down one rainy Saturday and go through all of them in my projects, but wouldn’t it be a lot more fun to let an LLM take the first pass at them?

While I don’t think we are quite at the point of automating the code writing and pull requests, I do think this is a really interesting area to explore. All the usual caveats apply that you shouldn’t use this approach with proprietary code, and you do need to review the changes before deciding to merge them.

I used the DeepSeek Coder LLM from Hugging Face, which is a large language model trained on code. The Cloudflare AI API hosts the model.

For the demo night, I wrote a JavaScript command line tool that fills in the // TODO: comments in your source code with code statements generated by the LLM. The rough workflow is that you run the tool with an argument of a source code file, it reads your code, sends the file to the LLM, and then replaces the TODO comments with the generated code.

Want to see what it looks like in action? Check out this example, and then find the code listing below.

Example Input


/**
 * Calculate the area of different geometries.
 * @param {string} shape - The shape type (e.g., 'circle', 'rectangle', 'triangle').
 * @param {object} dimensions - The dimensions of the shape.
 * @returns {number} - The calculated area of the shape.
 */
function calculateArea(shape, dimensions) {
    let area = 0;
  
    switch (shape.toLowerCase()) {
      case 'circle':
        const radius = dimensions.radius;
        area = Math.PI * radius * radius;
        break;
      // TODO: Add some more shapes
      default:
        throw new Error('Unsupported shape type');
    }
  
    return area;
  }


function calculateVolume(shape, dimensions) {
  // TODO: Build out this function
}

After running this code through my tool, the first // TODO: statement gets replaced with code generated by the LLM. The second // TODO: statement is left in place, as the LLM only processes one completion at a time.

Example Output

/**
 * Calculate the area of different geometries.
 * @param {string} shape - The shape type (e.g., 'circle', 'rectangle', 'triangle').
 * @param {object} dimensions - The dimensions of the shape.
 * @returns {number} - The calculated area of the shape.
 */
function calculateArea(shape, dimensions) {
    let area = 0;
  
    switch (shape.toLowerCase()) {
      case 'circle':
        const radius = dimensions.radius;
        area = Math.PI * radius * radius;
        break;
      //FROM LLM: Add some more shapes
      case 'rectangle':
        const length = dimensions.length;
        const width = dimensions.width;
        area = length * width;
        break;
      case 'triangle':
        const base = dimensions.base;
        const height = dimensions.height;
        area = (base * height) / 2;
        break;

      default:
        throw new Error('Unsupported shape type');
    }
  
    return area;
  }


function calculateVolume(shape, dimensions) {
  // TODO: Build out this function
}

My Demo Night Code

This is more or less the code that I demo’d at the Cloudflare AI Hack Night in Austin. I’ve cleaned it up just a little, in particular the way the regular expression works for the TODO comments - the old approach appended the TODO comment after the inserted code, and leaving the comment in place where it was makes sense.

The old input file path was also hard-coded, instead of being an argument to the script.

I also tested this version with multiple TODO comments - again, I had about 40 minutes to code this originally, so some of the edge cases weren’t handled!

require('dotenv').config()
const fs = require('fs');
const { exit } = require('process');

const model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"

const CLOUDFLARE_ACCOUNT_ID = process.env.CLOUDFLARE_ACCOUNT_ID;
const CLOUDFLARE_API_TOKEN = process.env.CLOUDFLARE_API_TOKEN;

const headers = {
  "Authorization": `Bearer ${CLOUDFLARE_API_TOKEN}`
};

const url = `https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/run/${model}`;


if (process.argv.length !== 3) {
  console.log('This script takes a filename as an argument, such as todo.rb')
  exit();
}

const filePath = process.argv[2];

const fileContents = readFile(filePath);

const modifiedContents = processTodoComments(fileContents);

if (modifiedContents !== null) {
  console.log('Modified file contents:');
  console.log(modifiedContents);
}

const messageBody = {
  "messages":[
    {
      "role":"system",
      "content":"You are a superstar coder"
    },
    {
      "role":"user",
      "content":modifiedContents
    }
  ]
}

async function callURL() {
    const response = await fetch(url, {
        method: "POST",
        headers: headers,
        body: JSON.stringify(messageBody)
    });
    return await response.json();
}

console.log('==========CALLING CLOUDFLARE==============')

callURL().then(json => {
    const newCode = json.result.response;
    handleLLMResponse(newCode);
})

function handleLLMResponse(newCode) {
  console.log('==========NEW CODE==============')
  console.log(newCode);
  const improvedCode = modifiedContents.replace('<｜fim▁hole｜>', newCode).replace('<｜fim▁end｜>','').replace('<｜fim▁begin｜>','').replace('// TODO:', '//FROM LLM:')
  console.log('\n\n\n\n');
  console.log('==========IMPROVED CODE==============')
  console.log(improvedCode);
}


function readFile(filePath) {
    try {
      // Read the file contents
      let fileContents = fs.readFileSync(filePath, 'utf8');
      return fileContents;
    } catch (err) {
      console.error(`Error reading or processing file: ${err.message}`);
      return null;
    }
  }
  
function processTodoComments(fileContents) {

      // Regular expression to match TODO comments
      const todoRegex = /\/\/ TODO:.*/;
  
      // Prepend the fim_hole tag to TODO comments
      const modifiedContents = fileContents.replace(todoRegex, '$&\n<｜fim▁hole｜>\n');
  
      return '<｜fim▁begin｜>' + modifiedContents + '<｜fim▁end｜>';
}

Next Steps

There are some nice improvements you could make to this tool. The first would be to handle multiple files at once, so that the LLM has more context. I believe that’s possible by including the files into the LLM input, but I haven’t tried it yet.

The second improvement would be to handle multiple TODO comments in a single file. The current implementation only processes one at a time, but it would be nice to process all of them in a single run. The problem with this would be if the LLM generated code that was dependent on the code generated by a previous TODO comment, and that code wasn’t correct.

The third improvement would be to generate some kind of test or validation for the code - this might be one way of ensuring that the generated code is correct.

From a user experience perspective, there could be some better ways to integrate this tool into a developer workflow. For instance, it could generate a pull request with the changes, and then tag you - more of an automated bot approach, than an on-demand command line.

Another user experience improvement would be to integrate this tool with Visual Studio Code, where it could keep track of TODO comments in your project, and then offer up fixes.

Conclusion

From my perspective, this little experiment was a succcess - it was easy to use the Cloudflare AI API, and the DeepSeek Coder LLM wasn’t something that I was personally familiar with.