AI - Architect the cloud

Your own AI coding assistant running on Akamai cloud!

Alesandro Slepčević — Tue, 18 Feb 2025 18:17:12 +0000

What? You want some AI to write my code?

AI-powered coding assistants have been the main talk in the developer world for a while now, there’s no denying that. I can’t count the times I’ve read somewhere that AI will replace developers in the next X years. You’ve probably seen tools like GitHub Copilot, ChatGPT, or Tabnine popping up everywhere.

They promise to boost productivity, help with debugging, and even teach you new coding techniques. Sounds amazing, right? But like anything, AI-powered coding assistants have their downsides too. So, let’s talk about what makes them great—and where they might fall short.

Why AI Coding Assistants Are a Game-Changer

Obviously, one of the biggest advantages of using an AI assistant is the time it saves. Instead of writing the same repetitive boilerplate code over and over and over again, you can generate it in seconds. Need a quick function to parse JSON? AI has you covered. Easy peasy! Stuck on how to structure your SQL query? Just ask. This means less time spent on the boring stuff and more time on actual problem-solving.

AI is also a fantastic debugging tool. It can analyze your code, catch potential issues, and suggest fixes before you even run it. Instead of spending hours combing through error messages and Stack Overflow threads, you get quick, relevant suggestions that help you move forward faster.

And let’s not forget about learning. If you’re picking up a new language or framework, an AI assistant can guide you with real-time examples, explain unfamiliar syntax, and even generate sample projects. It’s like having a 24/7 coding mentor who doesn’t judge your questions.

Beyond just speed and learning, AI can actually help improve code quality. It can suggest best practices, help format your code, and even recommend refactoring when your code gets messy. Plus, if you’re working in a team, it can assist with keeping code style consistent and even generate useful commit messages or documentation. Wouldn’t it be cool if we could plug the AI into our pipeline and make sure that all rules are being followed?

The downsides no one (everyone) talks about?

As cool as AI coding assistants are, you don’t need to be a genius to see that they’re far from perfect. One of the biggest concerns I personally see is over-reliance. If you’re constantly relying on AI to write your code, do you really understand what’s happening under the hood? This can be a problem when something breaks, and you don’t know how to fix it because you never really wrote the thing in the first place. I’m sure you love reading someone else’s codebase and debugging that <3

Another issue is that AI-generated code isn’t always optimized or even correct! It might suggest something that works but isn’t efficient, secure, or maintainable. If you blindly accept AI suggestions without reviewing them, you could end up with a mess of inefficient or buggy code.

Then there’s the question of security. AI assistants are trained on huge datasets, and sometimes they can generate code that includes security vulnerabilities. If you’re working on sensitive stuff, you have to be extra careful about what code you’re using and where it’s coming from.

Let’s talk privacy! Many AI coding tools rely on cloud-based processing, meaning your code might be sent to external servers for analysis. If you’re working on proprietary or confidential code, you need to be aware of the risks and check the privacy policies of the tools you’re using.

And finally, while AI can make you more productive, it can also be a bit of a crutch. Some developers might start relying too much on AI for even basic things, which can slow down their growth and problem-solving skills in the long run.

So, Should You Use One?

AI coding assistants are undeniably powerful tools, but they work best when used wisely. They’re great for boosting productivity, helping with debugging, and learning new technologies—but they shouldn’t replace actual coding knowledge and problem-solving skills. Think of them as a really smart assistant, not a replacement for your own expertise.

If you use AI responsibly (review its suggestions, stay mindful of security risks, and make sure you’re still learning and improving as a developer), it can be a fantastic addition to your workflow. Just don’t let it do all the thinking for you.

Still interested and want to start using AI in your daily work? Enter bolt.diy

bolt.diy is the open source version of Bolt.new (previously known as oTToDev and bolt.new ANY LLM), which allows you to choose the LLM that you use for each prompt! Currently, you can use OpenAI, Anthropic, Ollama, OpenRouter, Gemini, LMStudio, Mistral, xAI, HuggingFace, DeepSeek, or Groq models.

bolt.diy was originally started by Cole Medin but has quickly grown into a massive community effort to build one of the best open source AI coding assistants out there.

What do I need to get this deployed?

Well, just Terraform and a Linode account.
In the backend we will deploy a VM with a GPU attached, install bolt.diy and Ollama, and ask it to write some code! Maybe a simple Tic-Tac-Toe game?

Ideally you would run your bolt.diy deployment on a separate machine from the machine running the model, but for our use case, the current deployment model is more than enough.

Like most of the things on this blog, guess what we’re gonna use? Yes! IaC!!!

Here’s a link to the GitHub repository containing the Terraform code.

The code will do the following:

Deploy a GPU based instance in Akamai Connected Cloud
Use cloud-init to install the following:
- curl
- wget
- nodejs
- npm
- nvtop – great tool to monitor your GPU usage
- Nvidia drivers
Deploy and configure a firewall which will allow SSH and bolt.diy access from your IP.
Configure bolt.diy and Ollama to run as Linux services. For the Ollama service, we always make sure a model is downloaded and created with a 32K context size.
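
The 32K context step can be pictured as an Ollama Modelfile that derives a new model from a base one and raises num_ctx. Here is a minimal sketch in Python that only builds the Modelfile text. The base model name is a placeholder and not necessarily what the repository uses, and the actual cloud-init script may well do this differently.

```python
def build_modelfile(base_model: str, context_tokens: int = 32 * 1024) -> str:
    """Return Modelfile text that derives a model with a larger context window."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    # FROM picks the base model; PARAMETER num_ctx sets the context size in tokens.
    return f"FROM {base_model}\nPARAMETER num_ctx {context_tokens}\n"


if __name__ == "__main__":
    # "example-coder:14b" is a hypothetical model name used for illustration only.
    print(build_modelfile("example-coder:14b"))
```

Saving that text as a file named Modelfile and running ollama create with the -f flag pointing at it would register the derived model under a name of your choice.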

How do you deploy it?

Just fill in your Linode API token, the desired region, and your IP address in the variables.tf file and run the following commands:

git clone https://github.com/aslepcev/linode-bolt.diy
cd linode-bolt.diy
#Fill in the variables.tf file now
terraform init
terraform plan
terraform apply

After a short 5-6 minute wait, everything should be deployed and ready to use. Go ahead and visit the IP address of your VM on port 5173.

Example url: http://172.233.246.209:5173
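
If the page doesn’t load right away, cloud-init may still be running or the firewall may not include your current IP. A small sketch for checking whether the port is reachable at all, using only the Python standard library; the function name is my own and not part of the repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling is_port_open with your VM’s IP address and 5173 should return True once bolt.diy is up. It only proves that something is listening on the port, not that the UI is healthy.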

Make sure that Ollama is selected as a provider and you’re off to the races!

What can it do?

Well, it really depends on the model we are running. With the RTX 4000 Ada GPU, we can comfortably run a 14B parameter model with 32K context size which is “ok” for smaller and simpler stuff.

I tested it out with a simple task of creating a Tic-Tac-Toe game in Node.js. It got the functionality right the first time, but it looked like something only a mother could love.

I just told it to make it a bit prettier and add some color; these were the results I got:

Interestingly, during the coding process, it made a mistake which it managed to identify and fix all on its own! All I did was press the “Ask Bolt” button.

Also, here’s a fully functioning Space Invaders-like game, which it also wrote.

What if I want to run a larger model? 32B parameters or even larger?

That’s very easy! Since Ollama can use multiple GPUs, all we need to do is scale up the VM we are using to one which includes two or more GPUs. Akamai offers a maximum of 4 GPUs per VM, which brings up to 80 GB of VRAM which we can use to run our model. I will not experiment with larger models in this blog post; this is something we will benchmark and try out in the future.
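
To get a feel for when a second GPU becomes necessary, here is a rough back-of-the-envelope sketch. It assumes 20 GB of VRAM per GPU (80 GB across 4 GPUs, as above) and counts only the model weights at an assumed quantization level. The KV cache for a 32K context and runtime overhead come on top, so treat the numbers as lower bounds, not measurements.

```python
import math

GPU_VRAM_GB = 80 / 4  # 20 GB per GPU, derived from "4 GPUs = 80 GB" above


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


def min_gpus_for_weights(params_billion: float, bits_per_param: int) -> int:
    """Smallest GPU count whose combined VRAM holds the weights (nothing else)."""
    return math.ceil(weights_gb(params_billion, bits_per_param) / GPU_VRAM_GB)


if __name__ == "__main__":
    # 4-bit and 8-bit are assumed quantization levels, chosen for illustration.
    for size in (14, 32):
        for bits in (4, 8):
            print(f"{size}B @ {bits}-bit: ~{weights_gb(size, bits):.0f} GB weights, "
                  f">= {min_gpus_for_weights(size, bits)} GPU(s)")
```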

Cheers! Alex.

P.S – parts of this post were written by bolt.diy

The post Your own AI coding assistant running on Akamai cloud! first appeared on Architect the cloud.

Sentiment analysis of 40 thousand movie reviews in 20 minutes using Neural Magic’s DeepSparse inference runtime and Linode virtual machines.

Alesandro Slepčević — Sat, 30 Mar 2024 22:54:00 +0000

First, let me start with a word or two about DeepSparse.

DeepSparse is a sparsity-aware inference runtime that delivers GPU-class performance on commodity CPUs, purely in software, anywhere.

GPUs Are Not Optimal – Machine learning inference has evolved over the years led by GPU advancements. GPUs are fast and powerful, but they can be expensive, have shorter life spans, and require a lot of electricity and cooling.

Another major problem with GPUs, especially in the context of edge computing, is that they can’t be packed as densely and are less power-efficient than CPUs, not to mention their availability these days.

Since Akamai recently partnered up with Neural Magic, I’ve decided to write a quick tutorial on how to easily get started with running a simple DeepSparse sentiment analysis workload.

In case you want to know more about Akamai and Neural Magic’s partnership, make sure to watch this excellent video from TFiR. It will also give you a great summary of Akamai’s Project Gecko.

What is Sentiment analysis?

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.

Why is DeepSparse cool? Because I’m doing analysis of 40 thousand movie reviews in 20 minutes using only TWO DUAL CORE Linode VMs. Mind officially blown.

Let’s do some math here: extrapolating that to 120 thousand processed reviews an hour, with 2 instances and a load balancer, we can process over 86 million requests a month, which will cost you a staggering $82.
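
The arithmetic behind those numbers, spelled out as a quick sketch. It assumes a 30-day month with the two instances running flat out the whole time, which is an idealization.

```python
REVIEWS = 40_000          # reviews processed in the benchmark
MINUTES = 20              # wall-clock time for the benchmark run
MONTHLY_COST_USD = 82     # monthly cost quoted above

per_hour = REVIEWS * 60 // MINUTES        # reviews per hour
per_month = per_hour * 24 * 30            # assumes a 30-day month at full load
cost_per_million = MONTHLY_COST_USD / (per_month / 1_000_000)

print(f"{per_hour:,} reviews/hour")                   # 120,000 reviews/hour
print(f"{per_month:,} reviews/month")                 # 86,400,000 reviews/month
print(f"${cost_per_million:.2f} per million reviews")  # $0.95 per million reviews
```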

If you’re doing that on other cloud providers, you’re paying a five digit monthly bill for that pleasure.

Want to try it yourself? It’s easy!

If you want to try it out on Linode, follow the instructions below.

If you want to check out the Neural Magic DeepSparse repo, head over here.

Step 1. Clone the Repository.

Open your terminal or command prompt and run the following command:

git clone https://github.com/slepix/neuralmagic-linode

This code will deploy 2 x Dedicated 4 GB virtual machines and a NodeBalancer. It will also install Neural Magic’s DeepSparse runtime as a Linux service and install & configure Nginx to proxy requests to the DeepSparse server listening on 127.0.0.1:5543.

WARNING: THIS IS NOT PRODUCTION GRADE SERVER CONFIGURATION!

It’s just a POC! Secure your servers and consult Neural Magic documentation if you want to go to production.

Step 2. – Navigate to the repository

Navigate to the repo using the following command:

cd neuralmagic-linode

If you haven’t already installed Terraform on your machine, you can download it from the official Terraform website and follow the installation instructions for your operating system.

Step 3. – Terraform init

Initialize Terraform by running:

terraform init

Step 4. – Configure your Linode token

Open the variables.tf file and paste in your Linode token. If you don’t know how to create a Linode PAT, check this article here. It should look similar to the picture. You can also adjust the region while you’re here.

Token in the picture is not valid. It's just an example.

Step 5. – Run Terraform apply

After configuring your variables, you can apply the Terraform configuration by running:

terraform apply

Terraform will show you a plan of the changes it intends to make.

Review the plan carefully, and if everything looks good, type “yes” and press Enter to apply the changes. Give it 5-6 minutes to finish everything, and by visiting your NodeBalancer IP, you should be presented with a landing page for the DeepSparse server API.

Step 6.

After the installation is done, it’s finally time to send some data to our API and see how it performs.

We can do that by using curl, or Invoke-WebRequest if you’re on Windows and using PowerShell.

CURL:

sentence="Neural Magic & Akamai are cool!"
nodebalancer="172.233.34.110" #PUT YOUR NODEBALANCER IP HERE
curl -X POST http://$nodebalancer/v2/models/sentiment_analysis/infer -H "Content-Type: application/json" -d "{\"sequences\": \"$sentence\"}"

PowerShell:

$sentence = "Neural Magic & Akamai are cool!"
$nodebalancer = "172.233.34.110"

$path = "v2/models/sentiment_analysis/infer"
$api = "http://$nodebalancer/$path"
$body = @{
   sequences = $sentence
} | ConvertTo-Json

(Invoke-WebRequest -Uri $api -Method Post -ContentType "application/json" -Body $body -ErrorAction Stop).content

In both cases, make sure to paste in the IP address of the NodeBalancer you deployed and modify the sentence as you wish.
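
For completeness, the same request can be sketched in Python with only the standard library. The endpoint path and the "sequences" field are taken from the curl example above; the function names are my own. One benefit over hand-built JSON strings is that json.dumps escapes quotes and other special characters in the sentence for you.

```python
import json
import urllib.request


def build_infer_request(nodebalancer: str, sentence: str) -> urllib.request.Request:
    """Build a POST request for the sentiment analysis endpoint used above."""
    url = f"http://{nodebalancer}/v2/models/sentiment_analysis/infer"
    # json.dumps escapes quotes, backslashes and non-ASCII characters safely.
    body = json.dumps({"sequences": sentence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(nodebalancer: str, sentence: str, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_infer_request(nodebalancer, sentence)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling infer with your NodeBalancer IP and a sentence should return the same JSON body as the curl command.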

Benchmark time!

In the repository, I’ve included a dataset archive called movies.zip (containing movies.csv) and three benchmark scripts: two PowerShell files and one Python file.

movies.zip – unzip this one in the same folder where your benchmark scripts are.

analyze.ps1 – PowerShell based benchmark, sends requests in serial – not performant.

panalyze.ps1 – PowerShell based benchmark, sends requests in parallel – better performance.

pypanalyze.py – Python based benchmark, sends requests in parallel – best performer (doh!) <- use this

All you need to do in order to kick off a benchmark is to update the URL variable with your NodeBalancer IP and you’re off to the races.
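
To illustrate what "sends requests in parallel" means in practice, here is a generic thread-pool sketch. It is not the pypanalyze.py script from the repository. The send callable is a stand-in for whatever function posts one review to the API, and the worker count of 8 is an arbitrary default.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def run_parallel(
    reviews: Iterable[str],
    send: Callable[[str], str],
    workers: int = 8,
) -> Tuple[List[str], float]:
    """Send every review through `send` using a thread pool.

    Returns the results in input order and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the output order aligned with the input order.
        results = list(pool.map(send, reviews))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for a real HTTP call so the sketch runs without a server.
    def fake_send(text: str) -> str:
        time.sleep(0.01)
        return "positive" if "good" in text else "negative"

    labels, seconds = run_parallel(["good film", "bad film"] * 50, fake_send)
    print(f"{len(labels)} reviews in {seconds:.2f}s")
```

Threads fit here because each request spends most of its time waiting on the network, so more workers usually means more requests in flight until the servers become the bottleneck.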

Does it scale?

Yes! For kicks I’ve added a third node and the same job finished in 825 seconds. Feel free to add as many nodes as you like and see what numbers you can get. Additionally, you can play with the number of workers in the Python file.
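
A quick sketch of how close that is to linear scaling. The two-node time is taken as the "20 minutes" from the headline, so it is approximate and the result should be read as a ballpark figure.

```python
TWO_NODE_SECONDS = 20 * 60    # "20 minutes" from the headline, so approximate
THREE_NODE_SECONDS = 825      # run with the third node added

speedup = TWO_NODE_SECONDS / THREE_NODE_SECONDS   # observed speedup
ideal = 3 / 2                                     # perfect linear scaling
efficiency = speedup / ideal

print(f"speedup: {speedup:.2f}x (ideal {ideal:.2f}x)")   # speedup: 1.45x (ideal 1.50x)
print(f"scaling efficiency: {efficiency:.0%}")           # scaling efficiency: 97%
```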

Note 1: The Python script was written with the help of ChatGPT :) Its results matched my PowerShell version on a verified smaller sample (see Note 2), so I'm gonna call it good :)

Note 2: The PowerShell versions don't handle some comments as they should and end up sending garbage to the API. This happens in 3% of the cases. Most probably it's some encoding/character issue which I couldn't be bothered to fix :)

Note 3: The movies.csv file was generated using data from https://kaggle.com/

Cheers,

Alex.

The post Sentiment analysis of 40 thousand movie reviews in 20 minutes using Neural Magic’s DeepSparse inference runtime and Linode virtual machines. first appeared on Architect the cloud.