Author: Aniq Ur Rahman | @Aniq55
- Image Processing
- Machine Learning
- Natural Language Processing
- Shell Scripting
The memes are collected from popular subreddits using a scraper script scrape/scraper.py
- The memes collected are put in the `raw` folder and the script `standard.py` is run
- Each file name is extracted and stored in a text file next to the new hex-based filename generated for the image
- The standardized images are stored in the `processed` folder
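The standardization step above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `standard.py`: the function name `standardize`, the choice of MD5 over the file contents, and the tab-separated mapping file are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def standardize(raw_dir, processed_dir, mapping_file):
    """Move every file in raw_dir to processed_dir under a hex-digest name.

    Returns a dict mapping each original filename to its new filename.
    """
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for src in sorted(p for p in raw_dir.iterdir() if p.is_file()):
        # Assumption: the digest is computed over the file contents (MD5),
        # so byte-identical images get the same name.
        digest = hashlib.md5(src.read_bytes()).hexdigest()
        new_name = digest + src.suffix.lower()
        shutil.move(str(src), str(processed_dir / new_name))
        mapping[src.name] = new_name
    # Record "original name <TAB> new name", one pair per line.
    with open(mapping_file, "w", encoding="utf-8") as fh:
        for old, new in mapping.items():
            fh.write(f"{old}\t{new}\n")
    return mapping
```

Keeping the original name next to the digest preserves any descriptive text the scraper put in the filename, which can later be searched alongside the OCR output.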
- The entered query is split into words, and synonyms for each word are added to the list of `related queries` using the nltk library
- We scan the database to match words with the words in `related queries`
- This broadens the search area and minimizes zero-output scenarios
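One way the query expansion could look is sketched below. The function name `expand_query` and the `synonyms_of` parameter are hypothetical and exist only so the lookup can be swapped out. The default lookup assumes WordNet is the synonym source, which requires the `wordnet` corpus to have been downloaded through nltk.

```python
def expand_query(query, synonyms_of=None):
    """Return the query words plus their synonyms, lowercased, in first-seen order."""
    if synonyms_of is None:
        # Assumption: synonyms come from WordNet via nltk
        # (needs a one-time nltk.download("wordnet")).
        from nltk.corpus import wordnet

        def synonyms_of(word):
            return [lemma.name()
                    for synset in wordnet.synsets(word)
                    for lemma in synset.lemmas()]

    related = []
    for word in query.lower().split():
        for candidate in [word, *synonyms_of(word)]:
            # WordNet joins multi-word lemmas with underscores.
            candidate = candidate.replace("_", " ").lower()
            if candidate not in related:
                related.append(candidate)
    return related
```

Calling `expand_query("happy dog")` with the default lookup would return the two query words followed by whatever lemmas WordNet lists for them.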
- The memes are ranked by their relevance to the search query
- This is done by assigning a score to each meme present in the database and then sorting in descending order of scores
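The ranking described above might be implemented along these lines. The scoring rule used here (counting how many words of a meme's text appear among the keywords) is an assumption chosen for illustration, and `get_scores` is a hypothetical name; the project's own `getScore` may weigh matches differently.

```python
def get_scores(index, keywords):
    """Return (filename, score) pairs sorted by descending score.

    index maps filename -> text associated with that meme.
    Assumption: the score is the number of words in the text that
    appear in the keyword set.
    """
    keywords = {k.lower() for k in keywords}
    scores = []
    for filename, text in index.items():
        score = sum(1 for word in text.lower().split() if word in keywords)
        if score > 0:  # memes with no match are left out of the results
            scores.append((filename, score))
    # Highest score first; ties are broken by filename so the order is stable.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores
```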
- OCR is done using Tesseract to extract text from the memes, which is an essential part of the project
- The extracted text is not perfectly accurate, so the output from OCR is fed into the spellchecker of the Python `autocorrect` library
- The spellchecker makes the conversion more accurate
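A minimal sketch of the OCR-then-spellcheck pipeline is shown below. The helper names `correct_text` and `extract_text` are illustrative, and the use of the `Speller` class assumes a recent release of `autocorrect` (older releases exposed a different interface). Running `extract_text` also requires the Tesseract binary to be installed.

```python
def correct_text(raw_text, correct_word):
    """Apply correct_word to every whitespace-separated token of raw_text."""
    return " ".join(correct_word(token) for token in raw_text.split())


def extract_text(image_path):
    """OCR the image with Tesseract, then spell-correct the result."""
    # Imported lazily so correct_text stays usable without these packages.
    import pytesseract
    from PIL import Image
    from autocorrect import Speller  # assumes a recent autocorrect release

    raw_text = pytesseract.image_to_string(Image.open(image_path))
    return correct_text(raw_text, Speller(lang="en"))
```

Splitting on whitespace also collapses the line breaks and stray spacing that OCR output tends to contain, which keeps the stored text easy to match against query words.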
To run the GUI and test the functionalities, simply type
sudo bash run.sh
- To collect the memes from subreddits
sudo bash collect.sh
- The bash script prepares the database which allows the Meme Engine to function properly
- To run the Meme Retrieval Engine (Meme Finder) type
sudo bash run.sh
- Enter the query in the text field and click on `Go`
- The memes are sorted based on relevance
- The selected memes can be browsed using the `Next` and `Previous` buttons
- cv2 (OpenCV)
- pytesseract
- nltk
- PIL
- hashlib
- shutil
- autocorrect
- pickle
- Adding functionality to the progress bar
- Correcting the size scaling of memes for display on the canvas
- Adding feature to flush stored memes
- Creating an option to enter the names of subreddits to scrape from
- Storing popular meme templates and checking images for similarity and associating special keywords
- renames the memes present in the `raw` folder to a unique hex-digest-generated filename and moves them to the `processed` folder
- `extractText(image_path)`: extracts text using OCR from the meme at `image_path`
- `generateQuery(query)`: extends the query to include all synonyms related to the input query using the nltk package
- `create_index(database)`: creates a dictionary (index) of all memes stored in the database, where the filename is the `key` and the associated text is the `value`
- `getScore(INDEX, keywords)`: creates a relevance-based score list matched with the filenames in `INDEX` for the given `keywords`
- `load_index(index_name)`: loads an index dictionary from `index_name` using the `pickle` library
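Building and persisting the index could look roughly like this. Only `load_index` mirrors a name from the list above. `build_index`, `save_index`, and the `extract` callback are hypothetical stand-ins, and the exact signature of the project's `create_index` is not shown here.

```python
import os
import pickle


def build_index(image_paths, extract):
    """Map each image's filename to the text that extract() returns for it."""
    return {os.path.basename(path): extract(path) for path in image_paths}


def save_index(index, index_name):
    """Serialize the index dictionary to index_name with pickle."""
    with open(index_name, "wb") as fh:
        pickle.dump(index, fh)


def load_index(index_name):
    """Load an index dictionary previously written by save_index."""
    with open(index_name, "rb") as fh:
        return pickle.load(fh)
```

Because unpickling can execute arbitrary code, an index file should only be loaded if it was produced locally or comes from a trusted source.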
- `meme`: class which contains vital information like `memeList` and `currentImage`; the object of this class is central to the functioning of the GUI
- `getMemeList(query)`: gets the list of memes which match the given `query`
- `display(canvas, image_path)`: displays the image at `image_path` on the `canvas` in the GUI
- `go(canvas, query)`: initiates all the processes essential for the GUI to function. It gets the `memeList` ready based on the entered `query` and also displays the first meme on the `canvas`
- `prev(canvas)`: displays the previous image on the `canvas`
- `next(canvas)`: displays the next image on the `canvas`
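The browsing state behind the `Next` and `Previous` buttons can be illustrated without any GUI code. The class below is a simplified stand-in for the `meme` class, not its actual definition: the name `MemeBrowser`, its attributes, and the choice to stop at either end of the list rather than wrap around are all assumptions.

```python
class MemeBrowser:
    """Tracks a result list and the position of the meme currently shown."""

    def __init__(self, meme_list=()):
        self.meme_list = list(meme_list)
        self.current = 0

    def current_image(self):
        """Return the current image path, or None if there are no results."""
        return self.meme_list[self.current] if self.meme_list else None

    def next(self):
        # Assumption: navigation stops at the last meme instead of wrapping.
        if self.current < len(self.meme_list) - 1:
            self.current += 1
        return self.current_image()

    def prev(self):
        # Assumption: navigation stops at the first meme instead of wrapping.
        if self.current > 0:
            self.current -= 1
        return self.current_image()
```

In a GUI, the button callbacks would call `next()` or `prev()` and pass the returned path to whatever routine draws the image on the canvas.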
