The Friday Five for 10 October 2025

Oct. 9th, 2025 03:18 pm
[personal profile] anais_pf posting in [community profile] thefridayfive
These questions were originally suggested by [livejournal.com profile] angelicid.

Name five…

1. ... things you can't live without.

2. ... of the best moments in your life.

3. ... celebrities you can't stand.

4. ... books you enjoy(ed) reading.

5. ... items in your purse/backpack/on your desk.

Copy and paste to your own journal, then reply to this post with a link to your answers. If your journal is private or friends-only, you can post your full answers in the comments below.

If you'd like to suggest questions for a future Friday Five, then do so on DreamWidth or LiveJournal. Old sets that were used have been deleted, so we encourage you to suggest some more!

Cuba, My Love!

Oct. 6th, 2025 12:45 pm
[personal profile] paserbyp
The administration of President Donald Trump sent an internal US State Department cable, dated October 2, to dozens of US missions around the world. In it, the US authorities instructed their diplomats to urge allies to vote against a UN resolution demanding that the embargo on Cuba be lifted, citing as one of their arguments reports that thousands of Cubans are fighting on Russia's side in the war against Ukraine.

The cable says the UN resolution "wrongly" blames the United States for Cuba's problems, which, it claims, are caused by the "corruption and incompetence" of that country's authorities.

In addition, the document contains about 20 talking points with accusations against Cuba. Among other things, Washington asserts that the Cuban government actively supports Russia's full-scale invasion of Ukraine.

"After North Korea, Cuba is the largest supplier of foreign troops for Russia's aggression: an estimated one to five thousand Cubans are fighting in Ukraine," the cable says.

The State Department declined to provide details about Cuban mercenaries in Ukraine, but a department spokesperson noted that the Cuban authorities "have failed to protect their citizens from being used as pawns in the Russian-Ukrainian war."

In recent weeks, Ukrainian officials have warned members of the US Congress about the growing scale of Russian recruitment of Cubans to fight in Ukraine.

On September 6, the Ukrainian investigative project InformNapalm published the data of 198 Cubans and one Colombian citizen who had allegedly signed contracts with the Russian Ministry of Defense. The information was obtained when Ukrainian hackers from "Cyber Resistance" breached the email of Anton Perevozchikov, head of the contract military recruitment office in Tula, and shared the data with journalists and OSINT researchers. The leak contains passport scans, migration cards, questionnaires, and contract templates. According to the leak, the oldest Cuban citizen whose data was found in Perevozchikov's email is 68 years old; the youngest is 18.

InformNapalm believes the Cubans go to Russia because of poverty at home. On signing a one-year contract, they are promised a lump-sum payment of 195,000 rubles, a monthly salary of 204,000 rubles, and Russian citizenship.

The Moscow Times has reported that Cubans are also recruited through the Facebook group "Cubans in Moscow" (Cubanos en Moscú). Most of the ads offering to go fight were posted by a user named Elena Shuvalova. In her posts she also invites Cuban citizens to sign a contract with the Russian army, promising a salary of 204,000 rubles a month and a Russian passport. Speaking to The Moscow Times, Shuvalova confirmed that she helps arrange the contracts, including for undocumented migrants. She said she has already helped send several Cuban citizens to the front.

Plum Pie

Oct. 4th, 2025 01:03 pm
[personal profile] anais_pf posting in [community profile] thefridayfive
These questions were originally suggested by [livejournal.com profile] ardnaid.

1. Do you ever wonder if the way you see things visually aren't how other people see them?

2. What kind of sounds are the most annoying?

3. When walking through a store, do you shop with your hands by touching/feeling the texture of things?

4. If you could only smell three scents for the rest of your life, what would they be?

5. What sorts of things do you savor when eating them?

Copy and paste to your own journal, then reply to this post with a link to your answers. If your journal is private or friends-only, you can post your full answers in the comments below.

If you'd like to suggest questions for a future Friday Five, then do so on DreamWidth or LiveJournal. Old sets that were used have been deleted, so we encourage you to suggest some more!

**Remember that we rely on you, our members, to help keep the community going. Also, please remember to play nice. We are all here to answer the questions and have fun each week. We repost the questions exactly as the original posters submitted them and request that all questions be checked for spelling and grammatical errors before they're submitted. Comments re: the spelling and grammatical nature of the questions are not necessary. Honestly, any hostile, rude, petty, or unnecessary comments need not be posted, either.**

Grocery Game, October 2025.

Oct. 1st, 2025 11:11 am
[personal profile] angledge
| Quantity | Item | 10/1/25 Price |
|----------|------|---------------|
| 12 oz | Nestle Toll House Semi Sweet Chocolate Chips | $4.49 |
| 17 fl oz | Private Selection Avocado Oil | $9.99 |
| 20 oz | Seattle's Best 6th Ave Bistro Dark Roast Ground Coffee | $12.49 |
| 1 qt | Kroger 2% Reduced Fat Milk | $2.29 |
| 12 ct | Kroger Medium White Eggs | $1.99 |
| 18 ct | Vital Farms Pasture-Raised Large Eggs | $9.99 |
| 32 oz | Kroger Wild Caught Pacific Cod Fillets, Frozen (BIG DEAL!) | $16.99 |
| 1 lb | Perdue Boneless Skinless Chicken Breasts | $6.30 |
| 1 lb | Black Seedless Grapes | $4.58 |
| 1 ea | Fresh Banana | $1.24 |
| 1 pt | Fresh Blueberries | $4.69 |
| 1 lb | Fresh Strawberries | $2.99 |
| 1 ea | Medium Avocado | $1.25 |
| **Total** | | **$79.28** |


The total cost of this grocery list increased from $78.82 on September 2nd to $79.28. This is an increase of $0.46, or 0.58%. These costs are 3.59% higher than they were on April 1st.

A History of AI

Sep. 29th, 2025 05:11 pm
[personal profile] paserbyp
Alan Turing famously thought that the question of whether machines can think is “too meaningless” to deserve discussion. To better define “thinking machines” or artificial intelligence, Turing proposed “The Imitation Game,” now usually called “The Turing Test,” in which an interrogator has to determine which of two entities in another room is a person and which is a machine by asking them both questions.

In his 1950 paper(https://en.wikipedia.org/wiki/Computing_Machinery_and_Intelligence) about this game, Turing wrote: "I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning. … I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted."

Turing also addressed potential objections(https://plato.stanford.edu/entries/turing-test/#Tur195ResObj) to his claim that digital computers can think. These are discussed at some length in the Stanford Encyclopedia of Philosophy article on the Turing Test.

The Imitation Game wasn't passed by Turing's criteria in 2000, and probably still hadn't been passed as of 2025. Of course, there have been major advances in the field of artificial intelligence over the years, but the new goal is to achieve artificial general intelligence (AGI), which as we'll see is much more ambitious.

Language models go back to Andrey Markov in 1913; that area of study is now called Markov chains(https://en.wikipedia.org/wiki/Markov_chain), a special case of Markov models. Markov showed that in Russian, specifically in Pushkin’s Eugene Onegin, the probability of a character appearing depends on the previous character, and that, in general, consonants and vowels tend to alternate. Markov’s methods have since been generalized to words, to other languages, and to other language applications.
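
To see the idea in action, here's a minimal character-level Markov chain in Python. The stand-in corpus is invented for illustration; Markov, of course, did his counting by hand on Eugene Onegin.

```python
# A minimal character-level Markov chain, sketching Markov's idea:
# the probability of the next character depends only on the current one.
import random
from collections import Counter, defaultdict

def build_chain(text):
    """Count character-to-character transitions."""
    chain = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        chain[current][nxt] += 1
    return chain

def generate(chain, start, length=40):
    """Sample a sequence by repeatedly drawing the next character."""
    out = [start]
    for _ in range(length):
        counts = chain.get(out[-1])
        if not counts:
            break
        chars, weights = zip(*counts.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

corpus = "my uncle's most honest principles"  # stand-in for Eugene Onegin
chain = build_chain(corpus)
print(generate(chain, "m"))
```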

Markov’s work was extended by Claude Shannon in 1948 for communications theory, and again by Fred Jelinek and Robert Mercer of IBM in 1985 to produce a language model based on cross-validation (which they called deleted estimates) and applied to real-time, large-vocabulary speech recognition. Essentially, a statistical language model assigns probabilities to sequences of words.

To quickly see a language model in action, type a few words into Google Search, or a text message app on your phone, and allow it to offer auto-completion options.

In 2000 Yoshua Bengio et al. published a paper on a neural probabilistic language model(https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf) in which neural networks replace the probabilities in a statistical language model, bypassing the curse of dimensionality and improving word predictions (based on previous words) over a smoothed trigram model (then the state of the art) by 20% to 35%. The idea of feed-forward, autoregressive neural network models of language is still used today, although the models now have billions of parameters and are trained on extensive corpora, hence the term "large language models."
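
To make that concrete, here's a toy Bengio-style model in PyTorch: embed the previous two words, concatenate, and predict the next word with a small feed-forward net. The sizes and vocabulary are invented for illustration, and the original paper's architecture differs in its details.

```python
# A toy neural probabilistic language model (a sketch, not the 2000
# paper's exact architecture).
import torch
import torch.nn as nn

class TinyNPLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=16, hidden=32, context=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(context * embed_dim, hidden),
            nn.Tanh(),                       # the original used tanh units
            nn.Linear(hidden, vocab_size),   # scores for every word
        )

    def forward(self, context_ids):          # shape: (batch, context)
        e = self.embed(context_ids)          # (batch, context, embed_dim)
        return self.ff(e.flatten(1))         # logits over the vocabulary

vocab = ["<s>", "the", "cat", "sat", "on", "mat"]
model = TinyNPLM(len(vocab))
logits = model(torch.tensor([[1, 2]]))       # context: "the cat"
probs = logits.softmax(dim=-1)               # P(next word | context)
print(probs.shape)                           # torch.Size([1, 6])
```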

While language models can be traced back to 1913, image models can only be traced back to newspaper printing in the 1920s, and even that's a stretch. In 1962, Hubel and Wiesel published research on functional architecture in the cat's visual cortex(https://pmc.ncbi.nlm.nih.gov/articles/PMC1359523); ongoing research over the next two decades led to the invention of the Neocognitron in 1980, an early precursor of convolutional neural networks (CNNs).

LeNet (1989) was a CNN for digit recognition; LeNet-5 (1998), from Yann LeCun et al. at Bell Labs, was an improved seven-layer CNN. LeCun went on to head Meta's Facebook AI Research (FAIR) and teach at the Courant Institute of New York University, and CNNs became the backbone of deep neural networks for image recognition.

The history of text to speech (TTS) goes back at least to ~1000 AD, when a "brazen head" attributed to Pope Sylvester II was said to speak, or at least that's the legend. (I have visions of a dwarf hidden in the base of the statue.)

More verifiably, there were attempts at “speech machines” in the late 18th century, the Bell Labs vocoder in the 1930s, and early computer-based speech synthesis in the 1960s. In 2001: A Space Odyssey, HAL 9000 sings “Daisy Bell (A Bicycle Built for Two)” thanks to a real-life IBM 704-based demo that writer Arthur C. Clarke heard at Bell Labs in 1961. Texas Instruments produced the Speak & Spell toy in 1978, using linear predictive coding (LPC) chips.

Currently, text to speech is, at its best, almost believably human, available in both male and female voices, and available in a range of accents and languages. Some models based on deep learning are able to vary their output based on the implied emotion of the words being spoken, although they aren’t exactly Gielgud or Brando.

Speech to text (STT) or automatic speech recognition (ASR) goes back to the early 1950s, when a Bell Labs system called Audrey was able to recognize digits spoken by a single speaker. By 1962, an IBM Shoebox system could recognize a vocabulary of 16 words from multiple speakers. In the late 1960s, Soviet researchers used a dynamic time warping algorithm to achieve recognition of a 200-word vocabulary.
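
Dynamic time warping is simple enough to sketch. Here's a minimal version that matches an utterance against a stored template, with toy numbers standing in for real acoustic features:

```python
# Classic dynamic time warping (DTW): find the cheapest monotonic
# alignment between a spoken feature sequence and a stored template.
import numpy as np

def dtw_distance(a, b):
    """O(len(a) * len(b)) DTW on 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Recognize by nearest template: the same "word" spoken at two speeds.
template = [0, 1, 2, 3, 2, 1, 0]
utterance = [0, 0, 1, 1, 2, 3, 3, 2, 1, 0]
print(dtw_distance(template, utterance))   # small distance = good match
```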

In the late 1970s, James and Janet Baker applied the hidden Markov model (HMM) to speech recognition at CMU(https://scholar.harvard.edu/files/adegirmenci/files/hmm_adegirmenci_2014.pdf); the Bakers founded Dragon Systems in 1982. At the time, Dragon was one of the few competitors to IBM in commercial speech recognition. IBM boasted a 20K-word vocabulary. Both systems required users to train them extensively to be able to achieve reasonable recognition rates.
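
The core computation in an HMM-based recognizer is Viterbi decoding: finding the most likely sequence of hidden states given the observations. Here's a bare-bones sketch with invented toy probabilities, not Dragon's or IBM's actual models:

```python
# A bare-bones Viterbi decoder for a two-state HMM over three
# acoustic symbols. States and probabilities are toy values.
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence."""
    n_states = len(start_p)
    path_p = np.zeros((len(obs), n_states))           # best path prob
    back = np.zeros((len(obs), n_states), dtype=int)  # backpointers
    path_p[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, len(obs)):
        for s in range(n_states):
            scores = path_p[t - 1] * trans_p[:, s]
            back[t, s] = scores.argmax()
            path_p[t, s] = scores.max() * emit_p[s, obs[t]]
    state = path_p[-1].argmax()                       # trace back
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t, state]
        path.append(state)
    return path[::-1]

start = np.array([0.6, 0.4])                 # two hidden phone states
trans = np.array([[0.7, 0.3], [0.4, 0.6]])   # state transitions
emit = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emissions
print(viterbi([0, 1, 2], start, trans, emit))
```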

HMMs had long been combined with Gaussian mixture models for acoustic modeling; in the 2000s and early 2010s they were increasingly combined with feed-forward neural networks. Today, the speech recognition field is dominated by long short-term memory (LSTM) models, time delay neural networks (TDNNs), and transformers. Speech recognition systems rarely need speaker training and have vocabularies bigger than most humans'.

Automatic language translation has its roots in the work of Abu Yusuf Al-Kindi(https://plato.stanford.edu/entries/al-kindi), a ninth-century Arabic cryptographer who worked on cryptanalysis, frequency analysis, and probability and statistics. In the 1930s, Georges Artsrouni filed patents for an automatic bilingual dictionary based on paper tape. In 1949 Warren Weaver of the Rockefeller Foundation proposed computer–based machine translation based on information theory, code breaking, and theories about natural language.

In 1954 a collaboration of Georgetown University and IBM demonstrated a toy system using an IBM 701(https://www.ibm.com/history/700) to translate 60 Romanized Russian sentences into English. The system had six grammar rules and 250 lexical items (stems and endings) in its vocabulary, in addition to a word list slanted towards science and technology.

In the 1960s there was a lot of work on automating the Russian-English language pair, with little success. The 1966 US ALPAC report(https://www.mt-archive.net/50/ALPAC-1966.pdf) concluded that machine translation was not worth pursuing. Nevertheless, a few researchers persisted with rule-based mainframe machine translation systems, including Peter Toma, who produced SYSTRAN(https://en.wikipedia.org/wiki/SYSTRAN), and found customers in the US Air Force and the European Commission. SYSTRAN eventually became the basis for Google Language Tools, later named Google Translate.

Google Translate switched from statistical to neural machine translation in 2016, and immediately exhibited improved accuracy. At the time, Google claimed a 60% reduction in errors for some language pairs. Accuracy has only improved since then. Google has refined its translation algorithms to use a combination of long short-term memory (LSTM) and transformer blocks. Google Translate currently supports over 200 languages.

Google has almost a dozen credible competitors for Google Translate at this point. Some of the most prominent are DeepL Translator, Microsoft Translator, and iTranslate.

Code generation models are a subset of language models, but they have some differentiating features. First of all, code is less forgiving than natural language in that it either compiles/interprets and runs correctly or it doesn’t. Code generation also allows for an automatic feedback loop that isn’t really possible for natural language generation, either using a language server running in parallel with a code editor or an external build process.
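
That feedback loop is easy to sketch. In the sketch below, `ask_model` is a hypothetical stand-in for any code-generation API; the loop runs the candidate code, captures errors, and feeds them back as context:

```python
# A sketch of the automatic feedback loop for generated code: run it,
# capture compiler/runtime errors, and hand them back for another try.
import subprocess, sys, tempfile

def ask_model(prompt):                 # placeholder, not a real API
    return 'print("hello world"'       # a deliberately broken draft

def try_run(source):
    with tempfile.NamedTemporaryFile("w", suffix=".py",
                                     delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=10)
    return result.returncode == 0, result.stderr

prompt = "Write a hello-world script."
for attempt in range(3):
    code = ask_model(prompt)
    ok, errors = try_run(code)
    if ok:
        break
    prompt += f"\nYour last attempt failed with:\n{errors}\nPlease fix it."
```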

While several general large language models can be used for code generation as released, it helps if they are fine-tuned on some code, typically training on free open-source software to avoid overt copyright violation. That doesn’t mean that nobody will complain about unfair use, but as of now the court cases are not settled.

Even though new, better code generation models seem to drop on a weekly basis, they still can’t be trusted. It’s incumbent on the programmer to review, debug, and test any code he or she develops, whether it was generated by a model or written by a person. Given the unreliability of large language models and their tendency to hallucinate believably, I treat AI code generators as though they are smart junior programmers with a drinking problem.

Artificial intelligence as a field has a checkered history. Early work was directed at game playing (checkers and chess) and theorem proving, then the emphasis moved on to natural language processing, backward chaining, forward chaining, and neural networks. After the “AI winter” of the 1970s, expert systems became commercially viable in the 1980s, although the companies behind them didn’t last long.

In the 1990s, the DART scheduling application(https://en.wikipedia.org/wiki/Dynamic_Analysis_and_Replanning_Tool) deployed in the first Gulf War paid back DARPA’s 30-year investment in AI, and IBM’s Deep Blue(https://www.ibm.com/history/deep-blue) defeated chess grand master Garry Kasparov. In the 2000s, autonomous robots became viable for remote exploration (Nomad, Spirit, and Opportunity) and household cleaning (Roomba). In the 2010s, we saw a viable vision-based gaming system (Microsoft Kinect), self-driving cars (Google Self-Driving Car Project, now Waymo), IBM Watson defeating two past Jeopardy! champions, and a Go-playing victory against a ninth-Dan ranked Go champion (Google DeepMind’s AlphaGo).

Machine learning can solve non-numeric classification problems (e.g., “predict whether this applicant will default on his loan”) and numeric regression problems (e.g., “predict the sales of food processors in our retail locations for the next three months”), both of which are primarily trained using supervised learning (the training data has already been tagged with the answers). Tagging training data sets can be expensive and time-consuming, so supervised learning is often enhanced with semi-supervised learning (apply the supervised learning model from a small tagged data set to a larger untagged data set and add whatever predicted data that has a high probability of being correct to the model for further predictions). Semi-supervised learning can sometimes go off the rails, so you can improve the process with human-in-the-loop (HITL) review of questionable predictions.
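
Here's a minimal self-training sketch of that semi-supervised loop, using scikit-learn on synthetic data (the 0.95 confidence threshold is an arbitrary choice for illustration):

```python
# Self-training: fit on labeled data, adopt high-confidence predictions
# on unlabeled data as new labels, refit, repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal([[0, 0]] * 20 + [[3, 3]] * 20)
y_labeled = np.array([0] * 20 + [1] * 20)
X_unlabeled = rng.normal([[0, 0]] * 100 + [[3, 3]] * 100)

model = LogisticRegression().fit(X_labeled, y_labeled)
for _ in range(3):  # a few self-training rounds
    probs = model.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) > 0.95         # high-probability only
    if not confident.any():
        break
    X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
    y_labeled = np.concatenate(
        [y_labeled, probs.argmax(axis=1)[confident]])
    X_unlabeled = X_unlabeled[~confident]        # remove adopted points
    model = LogisticRegression().fit(X_labeled, y_labeled)
```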

While the biggest problem with supervised learning is the expense of labeling the training data, the biggest problem with unsupervised learning (where the data is not labeled) is that it often doesn’t work very well. Nevertheless, unsupervised learning does have its uses. It can sometimes be good for reducing the dimensionality of a data set, exploring the data’s patterns and structure, finding groups of similar objects, and detecting outliers and other noise in the data.
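
A quick sketch of two of those uses, dimensionality reduction and finding groups, on synthetic data:

```python
# Unsupervised learning: PCA compresses the data, k-means finds groups,
# all without a single label.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 10)),     # one cloud of points
               rng.normal(5, 1, (50, 10))])    # a second, shifted cloud

X2 = PCA(n_components=2).fit_transform(X)      # compress 10-D to 2-D
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X2)
print(labels[:5], labels[-5:])                 # the two clouds separate
```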

The potential of an agent that learns for the sake of learning is far greater than a system that reduces complex pictures to a binary decision (e.g., dog or cat). Uncovering patterns rather than carrying out a pre-defined task can yield surprising and useful results, as demonstrated when researchers at Lawrence Berkeley National Laboratory ran a text processing algorithm (Word2vec) on several million material science abstracts to predict discoveries of new thermoelectric materials.

Reinforcement learning trains an actor or agent to respond to an environment in a way that maximizes some value, usually by trial and error. That’s different from supervised and unsupervised learning, but reinforcement learning is often combined with them. It has proven useful for training computers to play games and for training robots to perform tasks.
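
Tabular Q-learning is the classic trial-and-error example. Here's a sketch in which an agent learns, purely from reward, to walk right along a five-state corridor (the environment and hyperparameters are invented for illustration):

```python
# Tabular Q-learning: learn action values by trial and error, with an
# epsilon-greedy policy balancing exploration and exploitation.
import random

n_states, actions = 5, [-1, +1]        # positions 0..4; move left/right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for episode in range(200):
    s = 0
    while s != n_states - 1:           # reward lives in the last state
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(max(actions, key=lambda a: Q[(0, a)]))  # learned policy: +1 (right)
```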

Neural networks, which were originally inspired by the architecture of the biological visual cortex, consist of a collection of connected units, called artificial neurons, organized in layers. The artificial neurons often use sigmoid or ReLU (rectified linear unit) activation functions, as opposed to the step functions used for the early perceptrons. Neural networks are usually trained with supervised learning.

Deep learning uses neural networks that have a large number of “hidden” layers to identify features. Hidden layers come between the input and output layers. The more layers in the model, the more features can be identified. At the same time, the more layers in the model, the longer it takes to train. Hardware accelerators for neural networks include GPUs, TPUs, and FPGAs.
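
Here's what those mechanics look like in a few lines of NumPy: a toy network with two hidden layers, ReLU activations between them, and a sigmoid output. The weights here are random rather than trained, so this only shows the forward pass.

```python
# A tiny deep network's forward pass: layers of weighted sums with
# nonlinear activations in between.
import numpy as np

def relu(x):    return np.maximum(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # input -> hidden 1
W2, b2 = rng.normal(size=(8, 8)), np.zeros(8)  # hidden 1 -> hidden 2
W3, b3 = rng.normal(size=(8, 1)), np.zeros(1)  # hidden 2 -> output

x = rng.normal(size=(1, 4))                    # one 4-feature example
h1 = relu(x @ W1 + b1)                         # hidden layer 1
h2 = relu(h1 @ W2 + b2)                        # hidden layer 2
y = sigmoid(h2 @ W3 + b3)                      # probability-like output
print(y)
```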

Fine-tuning can speed up the customization of models significantly by training a few final layers on new tagged data without modifying the weights of the rest of the layers. Models that lend themselves to fine-tuning are called base models or foundation models.
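
In PyTorch, that recipe amounts to freezing the base and training only a new head. The "pretrained" base below is just a stand-in network, not a real foundation model:

```python
# Fine-tuning sketch: freeze the pretrained layers, train a new final
# layer on the new task's labels.
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                     nn.Linear(64, 64), nn.ReLU())  # pretend pretrained
for p in base.parameters():
    p.requires_grad = False            # frozen: weights never change

head = nn.Linear(64, 5)                # new layer for 5 new classes
model = nn.Sequential(base, head)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # head only

x, y = torch.randn(16, 32), torch.randint(0, 5, (16,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()                        # gradients reach only the head
optimizer.step()
```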

Vision models often use deep convolutional neural networks. Vision models can identify the elements of photographs and video frames, and are usually trained on very large photographic data sets.

Language models sometimes use convolutional neural networks, but more recently tend to use recurrent neural networks, long short-term memory networks, or transformers. Language models can be constructed to translate from one language to another, analyze grammar, summarize text, analyze sentiment, and generate text. Language models are usually trained on very large language data sets.

Artificial intelligence can be used in many application areas, although how effective it is for any given use is another issue. For example, in healthcare, AI has been applied to diagnosis and treatment, to drug discovery, to surgical robotics, and to clinical documentation. While the results in some of these areas are promising, AI is not yet replacing doctors, not even overworked radiologists and pathologists.

In business, AI has been applied to customer service, with success as long as there’s a path to loop in a human; to data analytics, essentially as an assistant; to supply chain optimization; and to marketing, often for personalization. In technology, AI enables computer vision, i.e., identifying and/or locating objects in digital images and videos, and natural language processing, i.e., understanding written and spoken input and generating written and spoken output. Thus AI helps with autonomous vehicles, as long as they have multi-band sensors; with robotics, as long as there are hardware-based safety measures; and with software development, as long as you treat it like a junior developer with a drinking problem. Other application areas include education, gaming, agriculture, cybersecurity, and finance.

In manufacturing, custom vision models can detect quality deviations. In plant management, custom sound models can detect impending machine failures, and predictive models can replace parts before they actually wear out.

Language models have a history going back to the early 20th century, but large language models (LLMs) emerged with a vengeance after improvements from the application of neural networks in 2000 and, in particular, the introduction of the transformer deep neural network architecture in 2017. LLMs can be useful for a variety of tasks, including text generation from a descriptive prompt, code generation and code completion in various programming languages, text summarization, translation between languages, text to speech, and speech to text.

LLMs have drawbacks, at least at their current stage of development. Generated text is usually mediocre, and sometimes comically bad and/or wrong. LLMs can invent facts that sound reasonable if you don't know better; in the trade, these inventions are called hallucinations. Automatic translations are rarely 100% accurate unless they've been vetted by native speakers, which mostly happens for common phrases. Generated code often has bugs, and sometimes doesn't even have a hope of running. While LLMs are usually fine-tuned to avoid making controversial statements or recommending illegal acts, these guardrails can be breached by malicious prompts.

Training LLMs requires at least one large corpus of text. Examples for text generation training include the 1B Word Benchmark, Wikipedia, the Toronto Book Corpus, the Common Crawl data set and, for code, the public open-source GitHub repositories. There are (at least) two potential problems with large text data sets: copyright infringement and garbage. Copyright infringement is an unresolved issue that’s currently the subject of multiple lawsuits. Garbage can be cleaned up. For example, the Colossal Clean Crawled Corpus (C4) is an 800 GB, cleaned-up data set based on the Common Crawl data set.

Along with at least one large training corpus, LLMs require large numbers of parameters (weights). The number of parameters grew over the years, until it didn't. ELMo (2018) has 93.6 M (million) parameters; BERT (2018) was released in 100 M and 340 M parameter sizes; GPT-1 (2018) uses 117 M parameters. GPT-2 (2019) has 1.5 B (billion) parameters; T5 (2020) has 220 M parameters in its base size; GPT-3 (2020) has 175 B parameters; and PaLM (2022) has 540 B parameters. GPT-4 (2023) reportedly has 1.76 T (trillion) parameters.

More parameters make a model more accurate, but also make the model require more memory and run more slowly. In 2023, we started to see some smaller models released at multiple sizes. For example, Meta FAIR’s Llama 2 comes in 7B, 13B, and 70B parameter sizes, while Anthropic’s Claude 2 has 93B and 137B parameter sizes.

One of the motivations for this trend is that smaller generic models trained on more tokens are easier and cheaper to use as foundations for retraining and fine-tuning specialized models than huge models. Another motivation is that smaller models can run on a single GPU or even locally.

Meta FAIR has introduced a bunch of improved small language models since 2023, with the latest numbered Llama 3.1, 3.2, and 3.3. Llama 3.1 has multilingual models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.2 multilingual large language models comprise a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out); there are also quantized versions of these models. The Llama 3.2 models are smaller and less capable derivatives of Llama 3.1.

The Llama 3.2-Vision models are pretrained and instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The Llama 3.3 multilingual large language model is a pretrained and instruction-tuned generative model in a 70B size (text in/text out).

Many other vendors have joined the small language model party, for example Alibaba with the Qwen series and QwQ; Mistral AI with Mistral, Mixtral, and Nemo models; the Allen Institute with Tülu; Microsoft with Phi; Cohere with Command R and Command A; IBM with Granite; Google with Gemma; Stability AI with Stable LM Zephyr; Hugging Face with SmolLM; Nvidia with Nemotron; DeepSeek with DeepSeek-V3 and DeepSeek-R1; and Manus AI with Manus. Many of these models are available to run locally in Ollama.

Image generators can start with text prompts and produce images; start with an image and text prompt to produce other images; edit and retouch photographs; and create videos from text prompts and images. While there have been several algorithms for image generation in the past, the current dominant method is to use diffusion models.

Services that use diffusion models include Stable Diffusion, Midjourney, Dall-E, Adobe Firefly, and Leonardo AI. Each of these has a different model, trained on different collections of images, and has a different user interface.

In general, these models train on large collections of labeled images. The training process adds Gaussian noise to each image, iteratively, and then tries to recreate the original image using a neural network. The difference between the original image and the recreated image defines the loss of the neural network.

To generate a new image from a prompt, the method starts with random noise, and iteratively uses a diffusion process controlled by the trained model and the prompt. You can keep running the diffusion process until you arrive at the desired level of detail.
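
Here's a toy numerical sketch of that process. A real model would learn to predict the noise with a large neural network, so the `fake_denoiser` below is a placeholder that cheats by returning the true noise; the point is the shape of the forward-noising step and the loss:

```python
# Toy diffusion sketch: mix signal and Gaussian noise according to a
# schedule, then recover the original using a (faked) noise predictor.
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.05, T)       # noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # remaining signal at each step

x0 = np.array([1.0, -1.0, 0.5])          # stand-in for a training image
t = 60
eps = rng.normal(size=x0.shape)          # Gaussian noise, as in training
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

def fake_denoiser(x, step):              # placeholder for a trained net
    return eps                           # "predicts" the noise perfectly

eps_hat = fake_denoiser(x_t, t)
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps_hat) \
         / np.sqrt(alpha_bar[t])
print(np.mean((x0 - x0_hat) ** 2))       # the training loss; ~0 here
```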

Diffusion-based image generators currently tend to fall down when you ask them to produce complicated images with multiple subjects. They also have trouble generating the correct number of fingers on people, and tend to generate lips that are unrealistically smooth.

Retrieval-augmented generation (RAG) is a technique used to “ground” large language models with specific data sources, often sources that weren’t included in the models’ original training. RAG’s three steps are retrieval from a specified source, augmentation of the prompt with the context retrieved from the source, and then generation using the model and the augmented prompt.
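
The three steps fit in a short sketch. The word-overlap retriever and the `generate` function below are toy stand-ins for a real embedding search and a real LLM call:

```python
# RAG in miniature: retrieve, augment the prompt, generate.
def retrieve(query, docs, k=1):
    """Step 1: score documents by word overlap, return the best k."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def generate(prompt):                    # placeholder, not a real API
    return f"[model answer based on: {prompt[:60]}...]"

docs = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within 30 days with a receipt.",
]
query = "How long does the warranty cover labor?"

context = "\n".join(retrieve(query, docs))            # 1. retrieval
prompt = f"Context:\n{context}\n\nQuestion: {query}"  # 2. augmentation
print(generate(prompt))                               # 3. generation
```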

At one point, RAG seemed like it would be the answer to everything that’s wrong with LLMs. While RAG can help, it isn’t a magical fix. In addition, RAG can introduce its own issues. Finally, as LLMs get better, adding larger context windows and better search integrations, RAG is becoming less necessary for many use cases.

Meanwhile, several new, improved kinds of RAG architectures have been introduced. One example combines RAG with a graph database. The combination can make the results more accurate and relevant, particularly when relationships and semantic content are important. Another example, agentic RAG, expands the resources available to the LLM to include tools and functions as well as external knowledge sources, such as text databases.

Agentic RAG, often called agents or AI assistants, is not at all the same as the agents of the late 1990s. Modern AI agents rely on other programs to provide context to assist them in generating correct answers to queries. The catch here is that other programs have no standard, universal interface or API.

In 2024, Anthropic open-sourced the Model Context Protocol (MCP), which allows all models and external programs that support it to communicate easily. I wouldn’t normally expect other companies to support something like MCP, as it normally takes years of acrimonious meetings and negotiations to establish an industry standard. Nevertheless, there are some encouraging mitigating factors:

* There’s an open-source repository of MCP servers.

* Anthropic has shared pre-built MCP servers for popular enterprise systems, such as Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.

* Claude 3.5 Sonnet is adept at quickly building MCP server implementations.

While no-one can promise wide adoption of MCP, Anthropic seems to have removed the technical barriers to adoption. If only removing the political barriers were as easy.
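
For the curious, here's a minimal MCP server sketch, assuming the FastMCP interface from Anthropic's open-source Python SDK (`pip install mcp`). The tool itself is invented for illustration, and the SDK is still evolving, so check its docs for the current API:

```python
# A minimal MCP server sketch, assuming the mcp SDK's FastMCP interface.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers; exposed to any MCP-speaking model or client."""
    return a + b

if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio by default
```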

Slow training and inference have been serious problems ever since we started using neural networks, and only got worse with the advent of deep learning models, never mind large language models. Nvidia made a fortune supplying GPU hardware to accelerate training and inference, and there are several other hardware accelerators to consider. But throwing hardware at the problem isn’t the only way to solve it, and I’ve written about several of the software techniques, such as model quantization.
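
Model quantization, for instance, is simple at its core: store weights in a smaller numeric type plus a scale factor. Here's a sketch of naive post-training int8 quantization (the per-tensor scale is the simplest possible scheme; real quantizers are more careful):

```python
# Naive post-training quantization: float32 weights -> int8 + scale,
# a 4x memory reduction at some cost in precision.
import numpy as np

weights = np.random.default_rng(3).normal(size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0         # one scale for the tensor
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = q.astype(np.float32) * scale    # used at inference time

print(weights.nbytes, q.nbytes)               # 4000 vs 1000 bytes
print(np.abs(weights - dequantized).max())    # small rounding error
```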

The new goal for the cool kids in the AI space is to achieve artificial general intelligence (AGI). That is defined to require a lot more in the way of smarts and generalization ability than Turing's imitation game. Google Cloud defines AGI this way: "Artificial general intelligence (AGI) refers to the hypothetical intelligence of a machine that possesses the ability to understand or learn any intellectual task that a human being can. It is a type of artificial intelligence (AI) that aims to mimic the cognitive abilities of the human brain.

In addition to the core characteristics mentioned earlier, AGI systems also possess certain key traits that distinguish them from other types of AI:

* Generalization ability: AGI can transfer knowledge and skills learned in one domain to another, enabling it to adapt to new and unseen situations effectively.

* Common sense knowledge: AGI has a vast repository of knowledge about the world, including facts, relationships, and social norms, allowing it to reason and make decisions based on this common understanding.

The pursuit of AGI involves interdisciplinary collaboration among fields such as computer science, neuroscience, and cognitive psychology. Advancements in these areas are continuously shaping our understanding and the development of AGI. Currently, AGI remains largely a concept and a goal that researchers and engineers are working towards."

The obvious next question is how you might identify an AGI system. As it happens, a new suite of benchmarks to answer that very question was recently released: ARC-AGI-2. The ARC-AGI-2 announcement reads: "Today we're excited to launch ARC-AGI-2 to challenge the new frontier. ARC-AGI-2 is even harder for AI (in particular, AI reasoning systems), while maintaining the same relative ease for humans. Pure LLMs score 0% on ARC-AGI-2, and public AI reasoning systems achieve only single-digit percentage scores. In contrast, every task in ARC-AGI-2 has been solved by at least two humans in under two attempts."

Note that the comparison is to ARC-AGI-1, which was released in 2019.

The other interesting initial finding from ARC-AGI-2 is the cost efficiency of each system, including human panels. (CoT means chain of thought, a technique for prompting LLMs to reason through problems step by step.)

By the way, there's a competition with $1 million in prizes (more details: https://arcprize.org/competitions).

Right now, generative AI seems to be a few years away from production quality for most application areas. For example, the best LLMs can currently do a fair to good job of summarizing text, but do a lousy job of writing essays. Students who depend on LLMs to write their papers can expect C’s at best, and F’s if their teachers or professors recognize the tells and quirks of the models used.

Along the same lines, there’s a common description of articles and books generated by LLMs: “AI slop.” AI slop not only powers a race to the bottom in publishing, but it also opens the possibility that future LLMs that train on corpora contaminated by AI slop will be worse than today’s models.

There is research that says that heavy use of AI (to the point of over-reliance) tends to diminish users’ abilities to think critically, solve problems, and express creativity. On the other hand, there is research that says that using AI for guidance or as a supportive tool actually boosts cognitive development.

Generative AI for code completion and code generation is a special case, because code checkers, compilers, and test suites can often expose any errors made by the model. If you use AI code generators as a faster way to write code that you could have written yourself, it can sometimes cause a net gain in productivity. On the other hand, if you are a novice attempting “vibe coding,” the chances are good that all you are producing is technical debt that would take longer to fix than a good programmer would take to write the code from scratch.

Self-driving using AI is currently a mixed bag. Waymo AI, which originated as the Google Self-Driving Car Project, uses lidar, cameras, and radar to synthesize a better image of the real world than human eyes can manage. On the other hand, Tesla Full Self-Driving (FSD), which relies only on cameras, is perceived as error-prone and “a mess” by many users and reviewers.

Meanwhile, AGI seems to be a decade away, if not more. Yes, the CEOs of the major LLM companies publicly predict AGI within five years, but they’re not exactly unbiased, given that their jobs depend on achieving AGI. The models and reasoning systems will certainly keep improving on benchmarks, but benchmarks rarely reflect the real world, no matter how hard the benchmark authors try. And the real world is what matters.