
James Williams

I'm in love with Guanajuato

Guanajuato is my favourite city in Mexico. The centro histórico is built into a valley, and many of the roads are tunneled through the mountains that surround it. The city itself has been built and rebuilt upon its former self, with a subterranean level comprising the haciendas that once lined the Guanajuato River.

Guanajuato City facing south from Centro

Lens Model iPhone 14 Pro back triple camera 6.86mm f/1.78
ISO 80
Exposure Time 1/121
Image Size 4032x3024
GPS Position 21.018317 N, 101.253494 W
GPS Altitude 2058.7 m Above Sea Level
GPS Img Direction 159.2975464
Create Date 2023:01:03 18:22:08

My first thought when I got here was, why? Why tunnel through three kilometers of mountain to build a town here? The answer is silver. The area around Guanajuato accounted for more than two-thirds of the world’s silver production leading into the 19th century. The primary mine—La Valenciana—is still operational.

Guanajuato was also the site of the first battle of the Mexican War of Independence. You can still see the bullet holes in the Alhóndiga de Granaditas (grain exchange building next to the central mercado).

Guanajuato is about 4 to 5 hours by bus from Mexico City and well worth the trip. It’s also a very reasonable day trip (and beautiful drive) from San Miguel de Allende.


Guanajuato City church in main plaza

Lens Model iPhone 14 Pro back triple camera 6.86mm f/1.78
ISO 80
Exposure Time 1/7576
Image Size 4032x3024
GPS Position 21.016833 N, 101.254242 W
GPS Altitude 2024 m Above Sea Level
GPS Img Direction 116.5262318
Create Date 2023:01:03 12:41:27

Volkswagen Beetle at sunset in Guanajuato City

Lens Model iPhone 14 Pro back triple camera 6.86mm f/1.78
ISO 80
Exposure Time 1/1779
Image Size 4032x3024
GPS Position 21.019817 N, 101.250581 W
GPS Altitude 2076.3 m Above Sea Level
GPS Img Direction 260.5706482
Create Date 2023:01:03 18:07:53

Guanajuato skyline facing northeast

Lens Model iPhone 14 Pro back triple camera 6.86mm f/1.78
ISO 80
Exposure Time 1/1241
Image Size 4032x3024
GPS Position 21.014539 N, 101.254303 W
GPS Altitude 2085.6 m Above Sea Level
GPS Img Direction 40.99349974
Create Date 2023:01:04 17:57:19

Subterranean intersection in Guanajuato

Lens Model iPhone 14 Pro back triple camera 6.86mm f/1.78
ISO 80
Exposure Time 1/604
Image Size 4032x3024
GPS Position 21.016881 N, 101.256767 W
GPS Altitude 2014.6 m Above Sea Level
GPS Img Direction 104.9418297
Create Date 2023:01:05 11:05:21

Full resolution photos are available below the fold.

Note: I’ve edited this post. It was first posted on January 4, 2023 with only the leading image. When I first posted the picture, it was in the early days of ChatGPT and I was of course excited to try to generate some text with it. I did so with a short description of Guanajuato City as a caption, and with the benefit of hindsight I’m no longer comfortable polluting the internet with content I didn’t write myself. All of the photos beyond the first were published on March 23, 2024 and the text is my own.

Pulling a Smartsheet table into Microsoft Excel using Power Query

Well if you thought my first post in eight months would be exotic, go ahead and smash that back button.

I use this technique when we have one-off assignments at work where I need a quick and dirty web-based data store that several people can collaborate on, and that can be easily queried in Excel without any intermediate infrastructure or processing. This would be quite trivial if not for Smartsheet’s intractable API format.

Assuming you have a Smartsheet grid that you want to mirror in Excel and refresh on the fly, you’ll need the sheet’s ID and an API bearer token for a user with viewer permissions.
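
If you want to sanity-check those two values before touching Excel, you can hit the same endpoint the query below uses with curl. This step is optional, and $SHEET_ID and $BEARER_TOKEN are the same placeholders you’ll substitute into the query.

# Optional sanity check: should return the sheet as a JSON document
curl -H "Authorization: Bearer $BEARER_TOKEN" "https://api.smartsheet.com/2.0/sheets/$SHEET_ID"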

In Excel, open Power Query and create a new query using the advanced editor. Make sure to replace $SHEET_ID and $BEARER_TOKEN. The query will bring in both your data and column headers.

let
    Source = Json.Document(
        Web.Contents("https://api.smartsheet.com/2.0/sheets/$SHEET_ID", [
            Headers=[
                #"Content-Type"="application/json",
                Authorization="Bearer $BEARER_TOKEN"
            ]
        ])
    ),

    // Process rows
    RowsData = Source[rows],
    RowsTable = Table.FromList(RowsData, Splitter.SplitByNothing()),
    ExpandedRows = Table.ExpandRecordColumn(
        RowsTable,
        "Column1",
        {"id", "rowNumber", "expanded", "createdAt", "modifiedAt", "cells", "siblingId"},
        {"ID", "RowNumber", "Expanded", "CreatedAt", "ModifiedAt", "Cells", "SiblingId"}
    ),
    ExpandCells = Table.ExpandListColumn(ExpandedRows, "Cells"),
    ExpandedCellsDetails = Table.ExpandRecordColumn(
        ExpandCells,
        "Cells",
        {"columnId", "value", "displayValue"},
        {"ColumnID", "CellValue", "CellDisplayValue"}
    ),
    RemovedCellsMetaColumns = Table.RemoveColumns(
        ExpandedCellsDetails,
        {"ID", "Expanded", "CreatedAt", "ModifiedAt", "CellDisplayValue", "SiblingId"}
    ),
    PivotedCellsByColumnId = Table.Pivot(
        Table.TransformColumnTypes(RemovedCellsMetaColumns, {{"ColumnID", type text}}),
        List.Distinct(Table.TransformColumnTypes(RemovedCellsMetaColumns, {{"ColumnID", type text}})[ColumnID]),
        "ColumnID",
        "CellValue"
    ),
    CleanRowData = Table.RemoveColumns(PivotedCellsByColumnId, {"RowNumber"}),

    // Process columns
    ColumnsData = Source[columns],
    ColumnsTable = Table.FromList(ColumnsData, Splitter.SplitByNothing()),
    ExpandedColumns = Table.ExpandRecordColumn(
        ColumnsTable,
        "Column1",
        {"id", "title"},
        {"ColumnID", "ColumnTitle"}
    ),
    ColumnTitlesMapped = Table.Pivot(
        Table.TransformColumnTypes(ExpandedColumns, {{"ColumnID", type text}}),
        List.Distinct(Table.TransformColumnTypes(ExpandedColumns, {{"ColumnID", type text}})[ColumnID]),
        "ColumnID",
        "ColumnTitle"
    ),

    // Add headers
    CombinedDataTable = Table.Combine({ColumnTitlesMapped, CleanRowData}),
    FinalData = Table.PromoteHeaders(CombinedDataTable)

in
    FinalData

How to checkout and edit a pull request locally

Let’s say you have a dependabot pull request and Charlie Marsh has added a new check to Ruff that causes your lint check to fail. You can fix the lint error and push the changes back to the pull request branch!

First, checkout the pull request locally:

# In this case, I'm updating ruff to v0.0.278
git fetch origin dependabot/pip/ruff-0.0.278
git switch --track origin/dependabot/pip/ruff-0.0.278

We’ve now checked out the PR branch and set it to track the remote. We can use this pattern to keep tabs on long-running PRs, or as in this case, simply push an additional patch before merging. If you’d like a friendlier local branch name, you can append :my-branch-name to the end of the ref in the git fetch call and then call git switch my-branch-name to check it out; just keep in mind that this won’t set the local branch to track the remote.
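
For example, using the same Dependabot branch as above and a hypothetical local name:

# Fetch the PR branch into a local branch called ruff-bump (the name is arbitrary)
git fetch origin dependabot/pip/ruff-0.0.278:ruff-bump
git switch ruff-bump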

In my case, this ruff release doesn’t introduce any new rule categories and my lints still pass; however, I’d like to update the ruff version in my .pre-commit-config.yaml file so that it’s consistent with my requirements.txt. I’ll make that change, then commit and push back to the remote.

git add .pre-commit-config.yaml
git commit -m "Update pre-commit config."
git push

At this point, your checks should fire again and you can merge into your trunk using your preferred merge method. Check out the real pull request to see how this looks server-side.

Running a local Kubernetes cluster with Kind: A step-by-step guide

It has happened. I thought I could avoid it, but here we are. As if getting your program to run on one computer wasn’t hard enough, now we have to run it on multiple computers at the same time? They have played us for absolute fools.

Anyway, assuming we have some shared experience with Docker, let’s introduce some terminology:

  • A Pod is the smallest deployable unit in Kubernetes, often a single instance of an application. As I understand it, a pod is the logical equivalent of a container (though a pod can wrap more than one container).
  • Nodes are the machines that host these pods. More nodes allow for more redundancy.
  • A Cluster is a set of nodes with the same job. A cluster can run multiple nodes, and a node can run multiple pods, and a pod typically consists of between two and fifteen orca whales.
  • A Service is an abstraction which provides a single network entry point to distribute traffic across the cluster.

For local development I am using Kind, a tool which allows you to run Kubernetes clusters in Docker containers. It is a lightweight way to run Docker containers inside Kubernetes inside a Docker container (pause for effect).

The command to create a cluster is: kind create cluster

To deploy the application, it needs to be packaged as a Docker image. After creating the Dockerfile, the image is built and loaded into the Kind cluster with the following commands:

docker build -t my-image-name .

kind load docker-image my-image-name

I should note that in addition to Kind, there is a tool called minikube which is similar, though it requires you to set up a container registry.

The next step is creating a deployment and a service for the application by adding Kubernetes manifest files to your project directory. The simplest possible configuration is something like this:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-image-name-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-image-name
  template:
    metadata:
      labels:
        app: my-image-name
    spec:
      containers:
        - name: my-image-name
          image: my-image-name
          imagePullPolicy: Never # Use for local image
          ports:
            - containerPort: 8000 # Use the port your application runs on

Note that the imagePullPolicy is set to Never because we are using a local image with the implied tag latest. Specifying a specific tag should make this unnecessary, otherwise the default behaviour is to try to pull the image from Docker Hub, which will fail each time (or worse, deploy something unexpected).
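
As a sketch of that alternative: build and load the image with an explicit tag (v1 here is just an illustration) and reference that tag in deployment.yaml; the default pull policy for non-latest tags is IfNotPresent, so the cluster will use the loaded image rather than trying Docker Hub.

# Tagged variant: lets you drop imagePullPolicy: Never
docker build -t my-image-name:v1 .
kind load docker-image my-image-name:v1
# then set image: my-image-name:v1 in deployment.yaml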

In addition to matching the exposed port of the container, your application should be configured to bind to any incoming address (0.0.0.0), not just localhost.
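
As an illustration only, since I haven’t said what the application is: if it were a Python web app served with uvicorn, the bind would look something like this.

# Hypothetical example: bind to all interfaces on the container port from deployment.yaml
uvicorn main:app --host 0.0.0.0 --port 8000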

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-image-name-service
spec:
  type: NodePort
  ports:
    - port: 8000
      nodePort: 30080
  selector:
    app: my-image-name

With these files in place, we can create the deployment and service respectively using kubectl apply -f <file-name> for each. They can be verified using: kubectl get deployments and kubectl get services.
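
Spelled out, assuming the file names used above:

kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl get deployments
kubectl get services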

If there are any issues, logs can be checked using: kubectl logs <pod-name>, and the pod name can be found using kubectl get pods.

Remember to specify environment variables in the deployment.yaml file under env in the containers specification if your application requires them.
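
A minimal sketch of that, as an excerpt of the containers section of deployment.yaml (the variable name and value are placeholders):

# deployment.yaml (excerpt)
      containers:
        - name: my-image-name
          image: my-image-name
          env:
            - name: MY_ENV_VAR # placeholder name
              value: "placeholder-value"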

If you’re running Docker inside a Linux virtual machine, port 30080 should already be exposed. If you’re running Docker Desktop, there’s one more step: forwarding a local port to the service port. This can be done using:

kubectl port-forward service/my-image-name-service 30080:8000

This will map the service to localhost:30080 on your local machine. Launch it in tmux or append an ampersand to the command, as it will otherwise block the terminal.

Fin. Now deploy to prod on a Friday afternoon and you’re done!

Notes from Stephen Wolfram's ChatGPT primer

Source Material: Stephen Wolfram, 2023-02-14

ChatGPT is a large-scale transformer-based language model that is designed to predict the next word in a sentence given the context of what has been said. It is a neural network with 175 billion parameters that has been trained on a vast corpus of text, enabling it to form and apply a semantic structure to human language.

The GPT in ChatGPT stands for Generative Pre-trained Transformer. Generative means that the model is capable of generating new text rather than just recognizing patterns in existing text. Pre-trained means that the model has been trained on a large corpus of text before being fine-tuned for a specific task. Transformer refers to the specific type of neural network architecture, which is designed to better handle long-range dependencies between words in a sentence.

To accomplish its task, ChatGPT uses a technique known as unsupervised learning, which allows it to learn patterns in the data without being explicitly taught. Instead of being trained on explicit examples of inputs and their associated outputs like in supervised learning, the model is given a large corpus of text and is trained to predict the next word in a sentence by masking the latter part of the sentence and having it predict what should come next. It then compares what it generated with the masked text, and iteratively adjusts its parameters to minimize the error.

To evaluate how well the model performs on each iteration, a loss function is used. The loss function calculates how far away the model’s predictions are from the desired outcome, and the neural net weights are adjusted in a way that minimizes the result of the loss function.

Training the model both optimizes the neural net weights and produces embeddings, which are a way of representing the meaning of words as arrays of numbers (in the vague, undefinable sense of ‘meaning’). Words with similar meanings end up with nearby vectors. ChatGPT takes this concept further by generating embeddings not just for individual words, but for entire sequences of words.

These embeddings are then used to predict the probabilities of different words that might come next in a sentence. This is accomplished using a transformer architecture, which is designed to handle dependencies between tokens even when they are far apart in the input sequence. One of the defining features of the transformer is its attention mechanism, in which certain neurons focus more on the relevant parts of the sequence than others. This allows ChatGPT to take the context of the conversation so far into account, even when that context is not adjacent to the token being generated, and it is the main reason ChatGPT comes across as a coherent entity.

Finally, ChatGPT uses a temperature setting to introduce a degree of randomness into its predictions, which can make the output more diverse and interesting.


What strikes me as Stephen’s most profound point is that ChatGPT’s success is itself a scientific discovery: it suggests there may be simple rules describing how the semantics of human language can be arranged that we ourselves don’t yet understand. Studying the pathways and structures ChatGPT uses could help deepen our own understanding of human language.