worksheetsai work

progress:

friday 5/24: 2 hrs scrolling through the jupyter notebook and seeing what’s going on. right now we basically just need something that accurately evaluates grade levels for passages. the normalization/weighting in the notebook’s current method is super off.
sunday (6 hours): we need to properly normalize each measure, as well as test them for accuracy? can optimize weights autoregressively (I need to learn how to do this). might be best to replicate the lexile evaluator as well, since there’s so much annotated data to train on.
friday 5/31: 1.5 hours getting intuitions for the encoding process, embeddings, and transformers in general
saturday 6/1: 4 hours? diving into SBERT docs and looking at what’s possible for semantic similarity specifically.
sunday-tuesday just focusing on final papers.
wednesday 6/5: 3-4 hours meeting and syncing.
thursday 6/6 2 hours getting semantic similarity with SBERT to actually work
friday 6/7: 5 hours getting the lexile predictor found on github to work, and messing around with grade level data in the context
monday 6/10: 6 hours of building the few-shot generator
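
A rough sketch of the "normalize each measure, then optimize the weights" idea from the sunday entry. This is only one possible reading: it z-scores each measure and fits the weights by plain gradient descent against annotated grade levels. The two measures, the toy numbers, and the function names (`zscore_columns`, `fit_weights`, `predict`) are all made up for illustration and are not what the notebook currently does.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Rescale each column (one readability measure) to mean 0 and std 1."""
    cols = list(zip(*rows))
    # Fall back to 1.0 so a constant column does not divide by zero
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def predict(weights, bias, row):
    """Weighted sum of the normalized measures plus a bias term."""
    return sum(w * x for w, x in zip(weights, row)) + bias

def fit_weights(features, targets, lr=0.05, epochs=2000):
    """Fit weights and bias by gradient descent on mean squared error."""
    n, k = len(features), len(features[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, row) - y
                  for row, y in zip(features, targets)]
        for j in range(k):
            grad = 2 / n * sum(e * row[j] for e, row in zip(errors, features))
            weights[j] -= lr * grad
        bias -= lr * 2 / n * sum(errors)
    return weights, bias

# Toy, made-up data: columns are (avg sentence length, avg syllables per word)
raw = [[8.0, 1.2], [12.0, 1.3], [15.0, 1.5], [20.0, 1.6], [24.0, 1.8]]
grades = [2.0, 4.0, 6.0, 9.0, 11.0]  # hypothetical annotated grade levels

normalized = zscore_columns(raw)
weights, bias = fit_weights(normalized, grades)
mae = mean(abs(predict(weights, bias, r) - g)
           for r, g in zip(normalized, grades))
print("weights:", weights, "bias:", bias, "mean abs error:", mae)
```

With real annotated passages, the error should be measured on held-out data rather than on the rows used for fitting, otherwise the accuracy number will look better than it is.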

This will be helpful for evaluating grade levels if we’re measuring text with Lexile:

## Lexile to Grade Level Conversion

def lex_to_grade(lexile):

  # Might change this later if we include the "BR" (Beginning Reader) below-zero levels
  if lexile < 0:
    raise ValueError("Lexile level must be non-negative")
  
  # Define Lexile level thresholds for each grade
  lexile_thresholds = [
    (0, 164, 0),        # Kindergarten
    (165, 424, 1),      # 1st grade
    (425, 644, 2),      # 2nd grade
    (645, 849, 3),      # 3rd grade
    (850, 949, 4),      # 4th grade
    (950, 1029, 5),     # 5th grade
    (1030, 1094, 6),    # 6th grade
    (1095, 1154, 7),    # 7th grade
    (1155, 1204, 8),    # 8th grade
    (1205, 1249, 9),    # 9th grade
    (1250, 1294, 10),   # 10th grade
    (1295, 1349, 11),   # 11th grade
    (1350, 1400, 12)    # 12th grade
  ]
  
  # Check lexile against the thresholds and calculate the grade level
  for lower_bound, upper_bound, grade in lexile_thresholds:
    # Half-open upper limit so fractional inputs (e.g. 164.5) don't fall between bands
    if lower_bound <= lexile < upper_bound + 1:
      # Calculate the fractional part within the grade level range
      decimal_grade = grade + (lexile - lower_bound) / (upper_bound - lower_bound + 1)
      return decimal_grade

  # Anything above the top band (upper bound 1400) is capped at 13
  return 13

print(lex_to_grade(100))  # Expected output: ~0.61 (100 / 165)
print(lex_to_grade(450))  # Expected output: ~2.11 (2 + 25 / 220)
print(lex_to_grade(1250)) # Expected output: 10.0 (bottom of the 10th grade band)
print(lex_to_grade(1700)) # Expected output: 13
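
One way the conversion could be used for evaluation: turn the annotated Lexile scores into decimal grades and compare them with whatever grade levels our evaluator predicts. A minimal sketch, assuming mean absolute error is the metric we want. `lexile_to_decimal_grade` is a compact stand-in for `lex_to_grade` so the block runs on its own, and the predictions and Lexile scores are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

# Lower Lexile bound of each grade band (K through 12), copied from the
# thresholds above; 1401 marks where the capped "13" region starts.
BAND_STARTS = [0, 165, 425, 645, 850, 950, 1030, 1095,
               1155, 1205, 1250, 1295, 1350, 1401]

def lexile_to_decimal_grade(lexile):
    """Grade band index plus the fractional position inside that band."""
    if lexile < 0:
        raise ValueError("Lexile level must be non-negative")
    grade = bisect_right(BAND_STARTS, lexile) - 1
    if grade >= 13:
        return 13
    start, end = BAND_STARTS[grade], BAND_STARTS[grade + 1]
    return grade + (lexile - start) / (end - start)

def mean_absolute_error(predicted_grades, annotated_lexiles):
    """Average gap between predicted grades and Lexile-derived grades."""
    targets = [lexile_to_decimal_grade(lx) for lx in annotated_lexiles]
    return mean(abs(p - t) for p, t in zip(predicted_grades, targets))

# Hypothetical evaluator outputs vs. hypothetical annotated Lexile scores
predicted = [1.0, 3.5, 10.0]
annotated = [165, 645, 1250]
print(mean_absolute_error(predicted, annotated))
```

The sorted list plus `bisect_right` does the same band lookup as the loop over `lexile_thresholds`, just without scanning every band.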