
Koyeb Sandboxes: Fast, Scalable, Fully Isolated Environments for AI Agents and More

At Koyeb, we provide high-performance serverless infrastructure for intensive applications across CPUs, GPUs, and accelerators. We take code, build it, and run it in fully isolated, secure microVMs on bare-metal servers around the world. We scale automatically when needed, down to zero when idle, with cold starts as low as 250ms. Over the past few months, we’ve been working with an increasing number of teams using Koyeb to orchestrate and run AI-generated code at massive scale.

Top Sandbox Platforms for AI Code Execution in 2025

In 2025, as AI models increasingly generate, refactor, and deploy code on their own, developers face a new challenge: how to safely run code they didn’t write. Sandboxes have become the backbone of this new workflow because they are lightweight, secure environments that let teams test, validate, and monitor code without risking production systems.
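As a rough local analogue of that workflow, the core pattern is to execute untrusted code in a separate, constrained process rather than in-process. A minimal Python sketch (purely illustrative; a real sandbox platform adds a microVM boundary, filesystem isolation, and network policy on top):

```python
import subprocess
import sys
import tempfile
import textwrap

# Illustrative stand-in for AI-generated code we did not write ourselves.
untrusted_code = textwrap.dedent("""
    print(sum(range(10)))
""")

def run_untrusted(code: str, timeout_s: float = 5.0) -> str:
    # Write the code to a temp file and run it in a separate interpreter
    # process with a hard timeout, never in the current process.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, "-I", path],  # -I: isolated mode, ignores env/site
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.stdout.strip()

print(run_untrusted(untrusted_code))  # 45
```

The process boundary plus timeout is the smallest useful unit of isolation; everything a sandbox platform provides is a hardened version of this same idea.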

Inside AI Engineer Paris 2025 Part 3 - How We Organized A Large Conference in 90 Days

In September 2025, we brought AI Engineer to Paris for a two-day conference with 700+ attendees, 25 sponsors, and over 47 talks across 5 tracks. This is the third post in a 4-part series on the event. If you missed AIE Paris, you can watch the replays on the Koyeb YouTube channel. If you're not familiar, the team behind AI Engineer has been running AI Engineer conferences in the US for the past three years.

Inside AI Engineer Paris 2025 Part 2 - How We Built a Photobooth with Flux Kontext + Qwen 3 VLM

On September 23 and 24, we hosted AI Engineer Paris 2025 at Station F — a two-day gathering of builders, researchers, and practitioners exploring the future of applied AI. With five talk tracks, 48 sessions, and 25 sponsors, the event brought together the best of the AI engineering community in Europe and worldwide. If you want a full recap of the key themes and takeaways from the talks, check out our event recap blog post.

Inside AI Engineer Paris 2025 Part 1 - 5 Highlights That Shaped the Stage

At Koyeb, we run a serverless platform for deploying production-grade applications on high-performance infrastructure—GPUs, CPUs, and accelerators. You push code or containers; we handle everything from build to global deployment, running workloads in secure, lightweight virtual machines on bare-metal servers around the world.

Scale-to-Zero: Wake VMs in 200ms with Light Sleep, eBPF, and Snapshots


Avoid Cold Starts With Scale-to-Zero Light Sleep

Today, we're thrilled to announce the public preview of Light Sleep. Waking up from Scale-to-Zero is now imperceptible for CPU workloads with sub-200ms cold starts. A few months ago, we announced the first iteration of Scale-to-Zero on the platform to reduce idling costs. With Scale-to-Zero and Autoscaling, apps sleep and wake up automatically on demand based on requests, and scale out horizontally according to your criteria.
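To make the sleep/wake lifecycle concrete, here is a toy Python model of the scale-to-zero behavior (purely illustrative bookkeeping; real cold starts involve snapshots and VM wake-up, not a flag flip):

```python
import time

class ScaleToZeroService:
    """Toy model: the instance only 'runs' while requests arrive and
    is torn down after sitting idle longer than the timeout."""

    def __init__(self, idle_timeout_s: float = 0.2):
        self.idle_timeout_s = idle_timeout_s
        self.running = False
        self.last_request = 0.0
        self.cold_starts = 0

    def handle(self, request: str) -> str:
        now = time.monotonic()
        # Idle past the timeout: scale down to zero.
        if self.running and now - self.last_request > self.idle_timeout_s:
            self.running = False
        # Cold start: boot the instance before serving.
        if not self.running:
            self.cold_starts += 1
            self.running = True
        self.last_request = now
        return f"ok: {request}"

svc = ScaleToZeroService()
svc.handle("a"); svc.handle("b")  # one cold start, second request is warm
time.sleep(0.3)                   # idle past the timeout
svc.handle("c")                   # wakes again: second cold start
print(svc.cold_starts)  # 2
```

The engineering work in Light Sleep is in making that second `handle("c")` indistinguishable from a warm request.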

TCP Proxy: Expose TCP Ports Publicly

Today, we’re announcing the public preview of TCP Proxy — a new way to expose TCP ports publicly. Until now, services on Koyeb could only be publicly exposed via HTTP, HTTP/2, WebSocket, and gRPC protocols. TCP-based workloads were limited to private access within the mesh network for service-to-service communication. With TCP Proxy, that changes. You can now make any TCP service publicly accessible with minimal configuration.
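The workloads this unlocks are raw-TCP protocols: databases, game servers, custom wire formats. A minimal echo service in Python shows the shape of such a service; a TCP proxy's job is simply to make a port like this reachable from outside (addresses and ports below are illustrative, not Koyeb configuration):

```python
import socket
import threading
import time

# A minimal TCP echo service: the kind of raw-TCP workload that, until
# TCP Proxy, was only reachable over the private mesh network.
def echo_server(port: int) -> None:
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(1)
    conn, _ = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))  # echo one message back
    srv.close()

# Serve one connection in the background, then talk to it the way a
# client would through a public TCP endpoint.
threading.Thread(target=echo_server, args=(5555,), daemon=True).start()
time.sleep(0.1)  # give the server a moment to bind

cli = socket.create_connection(("127.0.0.1", 5555))
cli.sendall(b"ping")
print(cli.recv(1024).decode())  # ping
cli.close()
```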

From Envoy to Consul: Chasing a Latency Spike Across a Globally Distributed Stack

One of the core metrics we track is time to HTTP 200: the time between a successful deployment (excluding build) and the moment the app is ready to accept traffic.
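That metric can be approximated from the outside by polling the freshly deployed endpoint until it first answers 200. A small self-contained sketch (the delayed local server stands in for a deployment; none of this is Koyeb tooling):

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

def time_to_http_200(url: str, timeout_s: float = 10.0,
                     interval_s: float = 0.05) -> float:
    # Poll until the endpoint first answers 200; return elapsed seconds.
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # not up yet, keep polling
        time.sleep(interval_s)
    raise TimeoutError(f"{url} never returned 200")

# Stand-in for a deploying app: a local server that binds after a delay.
def start_later(port: int, delay_s: float) -> None:
    def run():
        time.sleep(delay_s)
        http.server.HTTPServer(
            ("127.0.0.1", port), http.server.SimpleHTTPRequestHandler
        ).serve_forever()
    threading.Thread(target=run, daemon=True).start()

start_later(8099, 0.3)
elapsed = time_to_http_200("http://127.0.0.1:8099/")
print(f"ready after {elapsed:.2f}s")
```

The interesting debugging happens when this number spikes for some regions and not others, which is exactly the chase the post describes.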

Koyeb MCP Server: Interact with your Koyeb Resources in Natural Language

Today, we're announcing the Koyeb MCP Server in public beta to let you interact with your Koyeb resources in natural language. Using the Koyeb MCP Server, LLMs and agents can easily discover and leverage Koyeb primitives, all from your favorite AI assistants like Claude, Cursor, Windsurf, or any other application that supports the Model Context Protocol.
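For context, MCP clients are typically pointed at a server through a JSON configuration block. A sketch of what wiring up the Koyeb MCP Server might look like in a client such as Claude Desktop (the command, package name, and environment variable here are assumptions for illustration, not the documented setup; check the Koyeb docs for the real values):

```json
{
  "mcpServers": {
    "koyeb": {
      "command": "npx",
      "args": ["-y", "koyeb-mcp-server"],
      "env": { "KOYEB_API_TOKEN": "<your-token>" }
    }
  }
}
```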