This notebook explores the capability of machine learning algorithms to distinguish between essays written by humans and those generated by Large Language models.
Hypothesis: Certain linguistic and structural patterns unique to AI-generated text can be identified and used for classification. We anticipate that our analysis will reveal distinct characteristics in AI-generated essays, enabling us to develop an effective classifier for this purpose.
Content: We look at the Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, SMOG Index, Automated Readability Index (ARI), and Dale-Chall Readability Score of LLM-generated and human-written essays. The focus is on identifying distinct readability patterns characteristic of AI.
The all-MiniLM-L6-v2 model from Sentence Transformers is used. Known for its efficiency in transforming sentences and paragraphs into 384-dimensional vectors, it is particularly well suited to clustering and semantic search applications.
Flesch-Kincaid Grade Level
This test gives a U.S. school grade level; for example, a score of 8 means that an eighth grader can understand the document. The lower the score, the easier it is to read the document. The formula for the Flesch-Kincaid Grade Level (FKGL) is:
$ FKGL = 0.39 \left( \frac{\text{total words}}{\text{total sentences}} \right) + 11.8 \left( \frac{\text{total syllables}}{\text{total words}} \right) - 15.59 $
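As a rough, self-contained illustration (not the notebook's actual implementation, which may rely on a readability library), the formula can be computed with a naive vowel-group syllable counter:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels (minimum 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

The syllable heuristic over- and under-counts on many English words, so scores will differ slightly from dictionary-based implementations.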
Gunning Fog Index
The Gunning Fog Index is a readability test designed to estimate the years of formal education a person needs to understand a text on the first reading. The index uses the average sentence length (i.e., the number of words divided by the number of sentences) and the percentage of complex words (words with three or more syllables) to calculate the score. The higher the score, the more difficult the text is to understand.
$ GunningFog = 0.4 \left( \frac{\text{words}}{\text{sentences}} + 100 \left( \frac{\text{complex words}}{\text{words}} \right) \right) $
The Gunning Fog Index is particularly useful for ensuring that texts such as technical reports, business communications, and journalistic works are clear and understandable for the intended audience.
Source: Wikipedia
Coleman-Liau Index
The Coleman-Liau Index is a readability metric that estimates the U.S. grade level needed to comprehend a text. Unlike other readability formulas, it relies on characters instead of syllables per word, which can be advantageous for processing efficiency. The index is calculated using the average number of letters per 100 words and the average number of sentences per 100 words.
$ CLI = 0.0588 \times L - 0.296 \times S - 15.8 $
Where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
Source: Wikipedia
SMOG Index
The SMOG (Simple Measure of Gobbledygook) Index is a measure of readability that estimates the years of education needed to understand a piece of writing. It is calculated using the number of polysyllable words and the number of sentences. The SMOG Index is considered accurate for texts intended for consumers.
$ SMOG = 1.043 \times \sqrt{M \times \frac{30}{S}} + 3.1291 $
Where M is the number of polysyllabic words (three or more syllables) and S is the number of sentences.
Source: Wikipedia
Automated Readability Index (ARI)
The Automated Readability Index is a readability test designed to gauge the understandability of a text. The formula outputs a number that approximates the grade level needed to comprehend the text. The ARI uses character counts, which makes it suitable for texts with a standard character-per-word ratio.
$ ARI = 4.71 \times \left( \frac{\text{characters}}{\text{words}} \right) + 0.5 \times \left( \frac{\text{words}}{\text{sentences}} \right) - 21.43 $
Here, characters counts letters and digits, excluding spaces and punctuation.
Source: Wikipedia
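Because the ARI relies only on character, word, and sentence counts, it is straightforward to sketch. The following is an illustrative implementation, not the notebook's code:

```python
import re

def automated_readability_index(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\S+", text)
    # Characters = letters and digits only, per the usual ARI definition.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return (4.71 * (characters / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)
```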
Dale-Chall Readability Score
The Dale-Chall Readability Score is unique in that it uses a list of words that are familiar to fourth-grade American students. The score indicates how many years of schooling someone would need to understand the text. If the text contains more than 5% difficult words (words not on the Dale-Chall familiar words list), a penalty is added to the score.
$ DaleChall = 0.1579 \times \left( \frac{\text{difficult words}}{\text{total words}} \times 100 \right) + 0.0496 \times \left( \frac{\text{total words}}{\text{sentences}} \right) $
$ \text{If difficult words} > 5\%: DaleChall = DaleChall + 3.6365 $
“Difficult words” are those not on the Dale-Chall list of familiar words.
Source: Wikipedia
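The two-part formula above can be sketched as follows. Note that FAMILIAR_WORDS here is a tiny illustrative stand-in: the real Dale-Chall list contains roughly 3,000 familiar words.

```python
import re

# Illustrative stand-in for the real ~3,000-word Dale-Chall familiar-word list.
FAMILIAR_WORDS = {"the", "cat", "sat", "on", "a", "mat", "dog", "ran"}

def dale_chall_score(text: str) -> float:
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    difficult_pct = 100 * sum(w not in FAMILIAR_WORDS for w in words) / len(words)
    score = 0.1579 * difficult_pct + 0.0496 * (len(words) / len(sentences))
    if difficult_pct > 5:
        score += 3.6365  # penalty for texts with many unfamiliar words
    return score
```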
Semantic Density
Semantic Density refers to the concentration of meaning-bearing words within a text, a potential factor in differentiating between human-written and AI-generated essays. The process involves calculating the semantic density of essays by focusing on specific, meaning-rich parts of speech.
Calculating Semantic Density: The function calculate_semantic_density computes this metric as the ratio of meaning-bearing words (identified by the tags in mb_tags) to the total word count. A higher semantic density indicates a text that efficiently uses words with substantial meaning.
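A minimal sketch of such a function, assuming tokens have already been POS-tagged (e.g. with NLTK) into (word, tag) pairs. The tag set below is an assumption; the notebook's actual mb_tags may differ:

```python
# Assumed set of meaning-bearing Penn Treebank tags; the notebook's actual
# mb_tags may differ.
mb_tags = {"NN", "NNS", "NNP", "NNPS", "VB", "VBD", "VBG", "VBN",
           "JJ", "JJR", "JJS", "RB", "RBR", "RBS"}

def calculate_semantic_density(tagged_tokens):
    """Ratio of meaning-bearing tokens to all tokens, given (word, tag) pairs."""
    if not tagged_tokens:
        return 0.0
    meaningful = sum(1 for _, tag in tagged_tokens if tag in mb_tags)
    return meaningful / len(tagged_tokens)
```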
Semantic Flow Variability
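One plausible reading of semantic flow variability, offered here as an assumption rather than the notebook's exact definition, is the variation in topical continuity from sentence to sentence: embed each sentence (e.g. with all-MiniLM-L6-v2) and take the standard deviation of cosine similarities between consecutive embeddings. A dependency-free sketch over precomputed embeddings:

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_flow_variability(sentence_embeddings):
    # Standard deviation of similarities between consecutive sentence embeddings.
    sims = [cosine_similarity(u, v)
            for u, v in zip(sentence_embeddings, sentence_embeddings[1:])]
    mean = sum(sims) / len(sims)
    return math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
```

A low value would suggest a steady, even flow of ideas; a high value, abrupt topical shifts.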
Psycholinguistic Features
Psycholinguistic Features encompass the linguistic and psychological characteristics evident in speech and writing. These features provide insights into the writer’s or speaker’s psychological state, cognitive processes, and social dynamics. Analysis in this domain often involves scrutinizing word choice, sentence structure, and language patterns to deduce emotions, attitudes, and personality traits.
The Linguistic Inquiry and Word Count (LIWC) [3] is a renowned computerized text analysis tool that categorizes words into psychologically meaningful groups. It assesses various aspects of a text, including emotional tone, cognitive processes, and structural elements, covering categories like positive and negative emotions, cognitive mechanisms, and more.
While LIWC is typically accessible through purchase or licensing, this project employs Empath, an open-source alternative to LIWC, to conduct similar analyses. The sentence-embedding model's approach, based on contrastive learning, is key to its effectiveness: it excels at distinguishing sentence pairs from random samples, aligning closely with the study's objective of analyzing semantic flow.
Textual Entropy
The standard method for calculating entropy, outlined below, evaluates the unpredictability of each character or word based on its frequency. This approach is captured by the formula for Shannon entropy:
$\begin{aligned}
H(T) &= -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) &&\quad\text{(Shannon Entropy)}
\end{aligned}$
Shannon Entropy quantifies the level of information disorder or randomness, providing a mathematical framework to assess text complexity.
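The formula maps directly to a few lines of Python. This sketch computes character-level entropy; a word-level variant would tokenize first:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    # H(T) = -sum p(x_i) * log2 p(x_i) over character frequencies.
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```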
Syntactic Tree Patterns
Syntactic Tree Pattern Analysis: essays are parsed into syntactic trees to observe the frequency of recurring patterns, focusing on differences between AI-generated and human-written text. This process employs the Berkeley Neural Parser, part of the Self-Attentive Parser [5][6] suite. The code parses natural language texts, specifically our essay data, using Natural Language Processing (NLP) techniques.
These features collectively provide a comprehensive linguistic and structural analysis of the text, offering valuable insights into the syntactic and semantic characteristics of the processed essays.
num_sentences: Counts the total number of sentences in the text, providing an overview of text segmentation.
num_tokens: Tallies the total number of tokens (words and punctuation) in the text, reflecting the overall length.
num_unique_lemmas: Counts distinct base forms of words (lemmas), indicating the diversity of vocabulary used.
average_token_length: Calculates the average length of tokens, shedding light on word complexity and usage.
average_sentence_length: Determines the average number of tokens per sentence, indicating sentence complexity.
num_entities: Counts named entities (like people, places, organizations) recognized in the text, useful for understanding the focus and context.
num_noun_chunks: Tallies noun phrases, providing insights into the structure and complexity of nominal groups.
num_pos_tags: Counts the variety of part-of-speech tags, reflecting grammatical diversity.
num_distinct_entities: Determines the number of unique named entities, indicative of the text’s contextual richness.
average_entity_length: Calculates the average length of recognized entities, contributing to understanding the detail level of named references.
average_noun_chunk_length: Measures the average length of noun chunks, indicating the complexity and composition of noun phrases.
max_depth: Determines the maximum depth of syntactic trees in the text, a measure of syntactic complexity.
avg_branching_factor: Calculates the average branching factor of syntactic trees, reflecting the structural complexity and diversity.
total_nodes: Counts the total number of nodes in all syntactic trees, indicating the overall structural richness of the text.
total_leaves: Tallies the leaves in syntactic trees, correlated with sentence simplicity or complexity.
unique_rules: Counts the unique syntactic production rules found across all trees, indicative of syntactic variety.
tree_complexity: Measures the complexity of the syntactic trees by comparing the number of nodes to leaves.
depth_variability: Calculates the standard deviation of tree depths, indicating the variability in syntactic complexity across sentences.
The BERT-BiLSTM Classifier model combines the BERT architecture with a Bidirectional Long Short-Term Memory (BiLSTM) network, enhancing the model’s ability to understand context and sequence in text. This model integrates BERT’s transformer layers with a BiLSTM network, a dropout layer for regularization, and a fully connected linear layer with ReLU activation, culminating in a linear classification layer.
BERTBiLSTMClassifier(
(bert): BertModel(
(embeddings): BertEmbeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(token_type_embeddings): Embedding(2, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): BertEncoder(
(layer): ModuleList(
(0-11): 12 x BertLayer(
(attention): BertAttention(
(self): BertSelfAttention(
(query): Linear(in_features=768, out_features=768, bias=True)
(key): Linear(in_features=768, out_features=768, bias=True)
(value): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(output): BertSelfOutput(
(dense): Linear(in_features=768, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(intermediate): BertIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): BertOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(pooler): BertPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)
(lstm): LSTM(768, 64, num_layers=4, batch_first=True, bidirectional=True)
(dropout): Dropout(p=0.06796649993811302, inplace=False)
(fc): Linear(in_features=128, out_features=2, bias=True)
(relu): ReLU()
)
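The printed architecture can be approximated with a short PyTorch sketch. The BERT encoder is injected as a constructor argument, and the pooling strategy and the forward ordering of dropout and ReLU around the final linear layer are assumptions, as the printout does not reveal them:

```python
import torch
import torch.nn as nn

class BERTBiLSTMClassifier(nn.Module):
    """Sketch matching the printed module structure; the BERT encoder is
    injected, and the dropout/ReLU ordering is an assumption."""

    def __init__(self, bert, hidden_size=64, num_layers=4,
                 dropout_p=0.068, num_classes=2):
        super().__init__()
        self.bert = bert
        self.lstm = nn.LSTM(768, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout_p)
        self.fc = nn.Linear(hidden_size * 2, num_classes)  # 128 -> 2, as printed
        self.relu = nn.ReLU()

    def forward(self, input_ids, attention_mask=None):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(hidden)  # (batch, seq_len, 128)
        pooled = lstm_out[:, -1, :]      # final time step of the BiLSTM
        return self.fc(self.relu(self.dropout(pooled)))
```

In practice the encoder would be loaded with `BertModel.from_pretrained("bert-base-uncased")` and its output fed through the BiLSTM head as shown.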
A Balance of Predictive Power and Interpretability
EBMs function like a choir 🎶, where each data feature represents a unique voice. These features individually contribute to the overall prediction, akin to each voice adding to the choir’s harmony. This additive model approach ensures that the impact of each feature is distinct and quantifiable.
EBMs are an advanced form of Generalized Additive Models (GAMs). They enhance predictive power while maintaining high interpretability by combining traditional machine learning techniques with the additive structure of GAMs. This design allows for a clear understanding of the influence of individual features and their combinations on the predicted outcome.
☃ EBMs present a unique combination of high interpretability and predictive accuracy. This makes them ideal for scenarios where understanding the reasoning behind model decisions is as critical as the decisions themselves.
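To make the additive structure concrete, here is a toy additive scorer (not the interpret library's API): each feature contributes through its own shape function, so every contribution can be inspected in isolation, which is the source of an EBM's interpretability:

```python
def ebm_style_predict(x, shape_functions, intercept=0.0):
    # Additive model: score = intercept + sum of per-feature shape functions,
    # so each feature's contribution is individually quantifiable.
    contributions = {name: f(x[name]) for name, f in shape_functions.items()}
    return intercept + sum(contributions.values()), contributions
```

A real EBM learns each shape function from data via bagging and gradient boosting; the additivity shown here is what it shares with GAMs.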
The API analysis indicates a bert_predictions value of 1, suggesting that the essay was likely generated by a language model rather than written by a human. Here’s a detailed breakdown of some key features from the analysis:
The analysis also highlights the presence of various psycholinguistic features, such as emotional (joy: 0.015, sadness: 0.025, affection: 0.025), social dynamics (help: 0.006, family: 0.009), and lifestyle (exercise: 0.009, pet: 0.034), which contribute to the essay’s thematic richness.
The analysis suggests the text is more likely to have been written by a human, with a bert_prediction score of 0. This score indicates a strong likelihood of human authorship.
The essay on remotes demonstrates a structured and coherent narrative typical of human writing, characterized by a clear introduction, body, and conclusion within a concise word limit. The Flesch-Kincaid Grade of 14.2 and Gunning Fog Index of 16.02 suggest the text uses relatively complex language, which could be indicative of a human writer aiming for a specific audience or purpose. The Coleman-Liau Index of 16.01 aligns with this, indicating a higher level of education required to comprehend the text. The usage of 43 unique lemmas out of 43 total tokens, along with an average token length of 5.186, showcases a diverse vocabulary and complex word choice, further supporting the human authorship hypothesis.
The essay’s semantic density of 0.604 and the presence of 13 noun chunks with an average length of 11.538 indicate a rich use of nouns and modifiers, creating detailed descriptions within a limited word count. Additionally, the text’s maximum tree depth of 8 and an average branching factor of 4.5 in its syntactic structure suggest complex sentence constructions, typical of human writing that seeks to convey information efficiently and engagingly.
In conclusion, the combination of advanced language metrics, complex syntactic structures, and a coherent narrative structure strongly supports the likelihood of human authorship for this essay on remotes. The analysis reveals an effective use of language to convey information concisely and clearly, which is a hallmark of skilled human writing.
Relating Natural Language Aptitude to Individual Differences in Learning Programming Languages [1]
“InterpretML: A Unified Framework for Machine Learning Interpretability” (H. Nori, S. Jenkins, P. Koch, and R. Caruana, 2019) [2]
The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods [3]
“Attention Is All You Need” (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin, 2017) [4]
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics [5]
Constituency Parsing with a Self-Attentive Encoder [6]
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks