I need fast text-search on a large (not huge, let's say 30k records
in total) list of items. Here's a sample of my raw data (a list of US
cars: model and make)
On 3/4/2023 10:43 PM, Dino wrote:
I need fast text-search on a large (not huge, let's say 30k records
in total) list of items. Here's a sample of my raw data (a list of US
cars: model and make)
I suspect I am really close to answering my own question...
import time
lis = [str(a**2 + a*3 + a) for a in range(0, 30000)]

s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s)
753800
s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() - s)
1068300
s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() - s)
862000
s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() - s)
1447300
s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() - s)
1511100
s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s); print(len(res), res[:10])
926900
2 ['134676021', '313467021']
I can do a substring search in a list of 30k elements in less than 2ms
with Python. Is my reasoning sound?
Dino, sending lots of data to an archived forum is not a great idea. I snipped most of it out below so as not to replicate it.
Your question does not look difficult unless your real question is about speed. Realistically, much of the time is generally spent reading in the file, and the actual search can be quite rapid with a wide range of methods.
The data looks boring enough and seems to not have much structure other than one comma possibly separating two fields. Do you want the data as one wide field, or perhaps in two parts, which is what a CSV file is normally used to represent? Do you ever have questions like "tell me all cars whose name begins with the letter D and that have a V6 engine"? If so, you may want more than a vanilla search.
What exactly do you want to search for? Is it a set of built-in searches or something the user types in?
The data seems to be sorted by the first field and then by the second, and I did not check whether some searches might be ambiguous. Can there be many entries containing III? Yep. Can the same words, like Cruiser or Hybrid, appear more than once?
So is this a one-time search, or multiple searches once loaded, as in a service that stays resident and fields requests? The latter may be worth speeding up.
I don't NEED to know any of this, but I want you to know that the answer may depend on this and similar factors. We had a long discussion lately on whether to search using regular expressions or string methods. If your data is meant to be used once, you may not even need to read the file into memory; you could read something like a line at a time and test it. Or, if you end up with more data, like how many cylinders a car has, it may be time to read it not just into a list of lines, but to get numpy/pandas involved and use their many search methods on something like a DataFrame.
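For illustration, a minimal pandas sketch of that idea (the column names are my invention; the file name comes from the grep example below):

import pandas as pd

# Assumes a two-column CSV with no header row: make,model
df = pd.read_csv("all_cars_unique.csv", names=["make", "model"])

# Case-insensitive substring match on the model column
hits = df[df["model"].str.contains("v60", case=False, na=False)]
print(hits)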
Of course if you are worried about portability, keep using Get Regular Expression Print.
Your example was:
$ grep -i v60 all_cars_unique.csv
Genesis,GV60
Volvo,V60
You seem to have wanted case folding and that is NOT a normal search. And your search is matching anything on any line. If you wanted only a complete field, such as all text after a comma to the end of the line, you could use grep specifications to say that.
But once inside Python, you would need to make choices depending on what kinds of searches you want to allow, but also things like: do you want all matching lines shown if you search for, say, "a"?
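As a rough sketch of that field-restricted, case-folded matching in plain Python (assuming one make,model pair per line):

def match_model(lines, query):
    """Match only the model field (the text after the comma), case-insensitively."""
    q = query.casefold()
    hits = []
    for line in lines:
        make, _, model = line.rstrip("\n").partition(",")
        if q in model.casefold():
            hits.append((make, model))
    return hits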
I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.
If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.
IOW, do the bulk of the work once at startup.
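A minimal sketch of that startup load into SQLite, assuming the two-column make,model CSV from the grep example above (table and column names are my own):

import csv
import sqlite3

con = sqlite3.connect(":memory:")  # in-memory DB, rebuilt at every startup
con.execute("CREATE TABLE cars (make TEXT, model TEXT)")
with open("all_cars_unique.csv", newline="") as f:
    con.executemany("INSERT INTO cars VALUES (?, ?)", csv.reader(f))
con.commit()

# LIKE is case-insensitive for ASCII in SQLite, so this finds V60 and GV60
rows = con.execute("SELECT make, model FROM cars WHERE model LIKE ?",
                   ("%v60%",)).fetchall()
print(rows)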
I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.
The idea that someone types into an input field and matches start
dancing in the browser made me think that this was exactly what I
needed, and hence I figured that asking here about Whoosh would be a
good idea. I now realize that Whoosh would be overkill for my use-case,
as a simple (case-insensitive) substring query would get me 90% of what
I want. Speed is on the order of a few milliseconds out of the box,
which is chump change in the context of a web UI.
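Something like this sketch, doing the case folding once at startup so each keystroke only pays for the scan (file name again from the grep example):

with open("all_cars_unique.csv") as f:
    rows = [line.rstrip("\n") for line in f]
folded = [r.casefold() for r in rows]  # built once at startup

def search(query):
    q = query.casefold()
    return [r for r, fr in zip(rows, folded) if q in fr]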
Not sure if this is what Thomas meant, but I was also thinking dictionaries.
Dino could build a set of dictionaries with keys “a” through “z” that contain the entries with those letters in them (I'm assuming case-insensitive search), and then just search “v” if that's what the user starts with.
Increased performance may be achieved by building dictionaries “aa”, “ab” ... “zz”, and so on.
Of course, it's trading CPU for memory usage, and there's likely a point at which the cost of building dictionaries exceeds the savings in searching.
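A rough sketch of that bucketing idea (my own reading of it, with stand-in data): map each character to the rows containing it, then scan only the bucket for the query's first character:

from collections import defaultdict

rows = ["Volvo,V60", "Genesis,GV60", "Acura,CL"]  # stand-in data
buckets = defaultdict(list)
for row in rows:
    for ch in set(row.casefold()):
        buckets[ch].append(row)

def search(query):
    q = query.casefold()
    # Only rows containing the first character can contain the whole query
    return [row for row in buckets.get(q[0], []) if q in row.casefold()]

print(search("v6"))  # ['Volvo,V60', 'Genesis,GV60']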
On 3/5/2023 9:05 PM, Thomas Passin wrote:
I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.
If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.
Thank you. SQLite would be overkill here, plus all the machinery that I
would need to set up to make sure that the DB is rebuilt/updated regularly. Do you happen to know something about Whoosh? Have you ever used it?
IOW, do the bulk of the work once at startup.
Sound advice
Thank you
Thomas,
I may have missed any discussion where the OP explained more about the proposed usage. If the program is designed to load the full data once, never get updates except by re-reading some file, and then handle multiple requests, then some things may be worth doing.
It looked to me, and I may well be wrong, like he wanted to search for a string anywhere in the text, so a grep-like solution is a reasonable start, with the actual data stored as something like a list of character strings you can search "one line" at a time. I suspect a numpy variant may work faster.
But using lots of dictionaries strikes me as only helping if you are searching for text anchored to the start of a line, so if you ask for "Honda" you instead ask the dictionary called "h" and search perhaps just for "onda", then recombine the prefix in any results. But the example given wanted to match something like "V6" in the middle of the text, and I do not see how that would work, as you would now need to search all 26 dictionaries completely.
And of course any search function he builds can be made to remember some or all previous searches using a cache decorator. That generally uses a dictionary for the search keys internally.
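For instance, a minimal sketch of that caching idea with functools.lru_cache (the search body is just a stand-in):

from functools import lru_cache

lines = ["Volvo,V60", "Genesis,GV60"]  # stand-in data

@lru_cache(maxsize=1024)
def search(query):
    # lru_cache keys on the query string; repeated queries skip the scan
    q = query.casefold()
    return tuple(line for line in lines if q in line.casefold())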
models = {'v60': 'Volvo', 'GV60': 'Genesis', 'cl': 'Acura'}
entry = '60'
candidates = (m for m in models.keys() if entry in m)
list(candidates)
['v60', 'GV60']
-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)
On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
Not sure if this is what Thomas meant, but I was also thinking dictionaries.
Dino could build a set of dictionaries with keys “a” through “z” that contain the entries with those letters in them (I'm assuming case-insensitive search), and then just search “v” if that's what the user starts with.
Increased performance may be achieved by building dictionaries “aa”, “ab” ... “zz”, and so on.
Of course, it's trading CPU for memory usage, and there's likely a point at which the cost of building dictionaries exceeds the savings in searching.
Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.
If mailing space is a consideration, we could all help by keeping our replies short and to the point.
I need fast text-search on a large (not huge, let's say 30k records
in total) list of items. Here's a sample of my raw data (a list of US
cars: model and make)
One issue that was also correctly foreseen by some is that there's going
to be a new request at every user keystroke. Known problem. JavaScript
programmers use a trick called "debouncing" to be reasonably sure that
the user is done typing before a request is issued:
https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript
It must be nice to have a server or two...
On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:
https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript
That could be annoying. My use case is address entry. When the user types
102 ma
the suggestions might be
main
manson
maple
massachusetts
masten
in a simple case. When they enter 's' it's narrowed down. Typically I'm
only dealing with a city or county so the data to be searched isn't huge.
The maps.google.com address search covers the world and they're also
throwing in a geographical constraint so the suggestions are applicable to the area you're viewing.
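A sketch of that prefix narrowing over a sorted street list, using bisect so each keystroke costs a binary search instead of a full scan (street names are stand-ins):

import bisect

streets = sorted(["main", "manson", "maple", "massachusetts", "masten"])

def suggest(prefix):
    # Strings sharing a prefix form a contiguous run in a sorted list
    lo = bisect.bisect_left(streets, prefix)
    hi = bisect.bisect_right(streets, prefix + "\uffff")
    return streets[lo:hi]

print(suggest("ma"))   # all five
print(suggest("mas"))  # ['massachusetts', 'masten']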
How can I implement this? A library called Whoosh seems very promising
(albeit it's so feature-rich that it's almost like shooting a fly with
a bazooka in my case), but I see two problems:
1) Whoosh is either abandoned or the project is a mess in terms of
community and support
(https://groups.google.com/g/whoosh/c/QM_P8cGi4v4), and
2) Whoosh seems to be a Python-only thing, which is great for now,
but I wouldn't want this to become an obstacle should I need to port it to
a different language at some point.
It must be nice to have a server or two...
No kidding
About everything else you wrote, it makes a ton of sense; in fact it's a
dilemma I am facing now. My back-end returns 10 entries (I am limiting
to max 10 matches server side for reasons you can imagine).
As the user keeps typing, should I restrict the existing result set
based on the new information or re-issue an API call to the server?
Things get confusing pretty fast for the user. You don't want too many
cooks in the kitchen, I guess.
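The narrowing option can be sketched like this (a hypothetical helper; note it assumes the previous result set was NOT truncated, which the 10-entry cap above would break):

def narrow(prev_query, prev_results, new_query, full_data):
    # If the new query merely extends the old one, every new match is
    # already among the old matches, so filter the smaller set.
    if prev_query and prev_query in new_query:
        pool = prev_results
    else:
        pool = full_data
    q = new_query.casefold()
    return [row for row in pool if q in row.casefold()]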
Played a little bit with both approaches in my little application. Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI
finesse with stuff like this.
Some of the discussions here leave me confused, as the info we think we got early does not stay intact for long and often morphs into something else, and we find much of the discussion is misdirected or wasted.
But I'll note that I use whoosh from time to time and I find it stable
and pleasant to work with. It's true that development stopped, but it
stopped in a very stable place. I don't recommend using whoosh here, but
I would recommend experimenting with it more generally.
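For anyone curious, a minimal Whoosh sketch (directory, schema, and data are made up for the example):

import os
from whoosh.fields import Schema, TEXT
from whoosh.index import create_in
from whoosh.qparser import QueryParser

os.makedirs("indexdir", exist_ok=True)
schema = Schema(make=TEXT(stored=True), model=TEXT(stored=True))
ix = create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(make="Volvo", model="V60")
writer.add_document(make="Genesis", model="GV60")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("model", ix.schema).parse("v60")
    for hit in searcher.search(query):
        print(hit["make"], hit["model"])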
On 3/7/2023 7:33 AM, Dino wrote:
in fact it's a dilemma I am facing now. My back-end returns 10
entries (I am limiting to max 10 matches server side for reasons you
can imagine). As the user keeps typing, should I restrict the
existing result set based on the new information or re-issue an API
call to the server? Things get confusing pretty fast for the user.
You don't want too many cooks in the kitchen, I guess.
Played a little bit with both approaches in my little application. Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI finesse with stuff like this.
Subject of course to trying this out, I would be inclined to send a much
larger list of responses to the client, and let the client reduce the
number to be displayed. The latency for sending a longer list will be
smaller than establishing a new connection, or even reusing an old one,
to send a new, short list of responses.
On 2023-03-08 00:12:04 -0500, Thomas Passin wrote:
On 3/7/2023 7:33 AM, Dino wrote:
in fact it's a dilemma I am facing now. My back-end returns 10
entries (I am limiting to max 10 matches server side for reasons you
can imagine). As the user keeps typing, should I restrict the
existing result set based on the new information or re-issue an API
call to the server? Things get confusing pretty fast for the user.
You don't want too many cooks in the kitchen, I guess.
Played a little bit with both approaches in my little application.
Re-requesting from the server seems to win hands down in my case.
I am sure that them google engineers reached spectacular levels of UI
finesse with stuff like this.
Subject of course to trying this out, I would be inclined to send a much
larger list of responses to the client, and let the client reduce the
number to be displayed. The latency for sending a longer list will be
smaller than establishing a new connection, or even reusing an old one,
to send a new, short list of responses.
That depends very much on how long that list can become. If it's 200
matches - sure, send them all, even if the client will display only 10
of them. Probably even for 2000. But if you might get 20 million matches
you surely don't want to send them all to the client.
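Capping the match count server-side is cheap either way; a sketch with itertools.islice so the scan stops as soon as the cap is hit (the cap value is arbitrary):

from itertools import islice

def search_capped(lines, query, cap=2000):
    q = query.casefold()
    matches = (line for line in lines if q in line.casefold())
    return list(islice(matches, cap))  # stops scanning after `cap` matches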